Single-channel Speech Enhancement Using Statistical Modelling
A new speech enhancement method is introduced, based on Maximum A-Posteriori (MAP) estimation using Gaussian Mixture Models (GMMs) of speech and of different noise types. The GMMs model the distribution of speech and noise periodograms in a high-dimensional space and thereby reduce the complexity of the estimation procedure. From the GMMs, the Probability Density Functions (PDFs) of clean speech and noise can be calculated, and applying MAP to these PDFs yields estimates of the speech and noise periodograms that make up the periodogram of the observed noisy speech frame. These estimates are then used in a Wiener filter to enhance the noisy speech and recover a signal as close as possible to the original. Because the PDFs are complicated, and an exact MAP criterion would be more complicated still, approximations are used to derive the MAP criterion. Improvements to this MAP estimation that exploit the characteristics of periodograms are also introduced; they refine the approximations and lead to more accurate estimates of the speech and noise periodograms. Since the accuracy of the MAP estimate depends strongly on the accuracy of the speech and noise power estimates in the noisy frame, a new power estimation method based on Gamma modelling is introduced to replace older methods such as Minimum Statistics. The results of all the estimation methods are used in a classic Wiener filter, which is applied to the noisy frame to enhance it. Because every estimation algorithm is subject to some error, an improved Wiener filter is introduced that attenuates the effect of these errors on the enhanced speech signal. The performance of all the introduced methods is analyzed and reported in terms of quality and intelligibility.
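As a rough illustration of the final step only, the classic Wiener gain can be sketched from per-bin estimates of the speech and noise periodograms. This is a minimal sketch of the textbook filter, not the improved Wiener filter or the MAP estimator of this work; the function names and the `floor` parameter are assumptions made for illustration, and the periodogram estimates are taken as given.

```python
def wiener_gain(speech_psd, noise_psd, floor=0.0):
    """Per-bin classic Wiener gain: S / (S + N).

    speech_psd, noise_psd: estimated speech and noise periodograms
    (non-negative floats, one value per frequency bin).
    floor: hypothetical lower bound on the gain (an assumption, not
    taken from the source); 0.0 leaves the classic gain unchanged.
    """
    gains = []
    for s, n in zip(speech_psd, noise_psd):
        total = s + n
        # Guard against empty bins where both estimates are zero.
        g = s / total if total > 0 else 0.0
        gains.append(max(g, floor))
    return gains


def apply_gain(noisy_spectrum, gains):
    """Scale each (complex) bin of the noisy frame's spectrum by its gain."""
    return [g * x for g, x in zip(gains, noisy_spectrum)]


# Toy example with three frequency bins.
speech_hat = [4.0, 1.0, 0.0]
noise_hat = [1.0, 1.0, 2.0]
print(wiener_gain(speech_hat, noise_hat))  # [0.8, 0.5, 0.0]
```

Bins where the speech estimate dominates pass almost unchanged, while bins dominated by the noise estimate are attenuated, which is why errors in either estimate translate directly into errors in the enhanced signal.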