We present a novel method for estimating formant frequencies by fitting Gaussian mixtures to discrete Fourier transform (DFT) magnitude spectra. The method first estimates the Gaussian parameters for a sequence of wideband spectra using the Expectation-Maximization (EM) algorithm. It then refines the parameters using maximum a posteriori (MAP) adaptation. We evaluated the method on 516 utterances with manually labeled ground-truth data, comparing results against PRAAT's formant tracking algorithm in various noisy environments and against one other state-of-the-art method. We obtained statistically significant improvements in the relative errors for the first three formants over all phonetic classes.
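The two stages named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the update equations, so the code assumes a standard weighted EM in which the normalized magnitude spectrum acts as a histogram over frequency, followed by a relevance-factor style MAP update of the component means. The function names `fit_spectrum_gmm` and `map_adapt_means`, the `relevance` parameter, and the choice of prior means are all hypothetical.

```python
import numpy as np

def fit_spectrum_gmm(freqs, mag, means, sigmas, weights, n_iter=50):
    """Weighted EM for a 1-D Gaussian mixture over frequency.

    Assumption: the magnitude spectrum, normalized to sum to 1, is
    treated as a histogram of "mass" over the frequency bins.
    """
    freqs = np.asarray(freqs, dtype=float)
    p = np.asarray(mag, dtype=float)
    p = p / p.sum()
    means, sigmas, weights = (
        np.asarray(a, dtype=float).copy() for a in (means, sigmas, weights)
    )
    for _ in range(n_iter):
        # E-step: responsibility of each component for each frequency bin.
        d = freqs[None, :] - means[:, None]
        pdf = np.exp(-0.5 * (d / sigmas[:, None]) ** 2) / (
            sigmas[:, None] * np.sqrt(2.0 * np.pi)
        )
        joint = weights[:, None] * pdf
        resp = joint / (joint.sum(axis=0, keepdims=True) + 1e-300)
        # M-step: re-estimate parameters, weighting each bin by its mass.
        nk = (resp * p).sum(axis=1) + 1e-12
        means = (resp * p * freqs).sum(axis=1) / nk
        var = (resp * p * (freqs - means[:, None]) ** 2).sum(axis=1) / nk
        sigmas = np.sqrt(np.maximum(var, 1.0))  # floor avoids collapse
        weights = nk / nk.sum()
    return means, sigmas, weights, nk

def map_adapt_means(prior_means, ml_means, nk, relevance=0.1):
    """Interpolate between prior means and the EM estimate.

    Assumption: alpha = nk / (nk + relevance), so components that
    explain little spectral mass stay close to the prior.
    """
    alpha = nk / (nk + relevance)
    return alpha * ml_means + (1.0 - alpha) * prior_means
```

In this sketch the component means play the role of formant frequency estimates. The prior passed to `map_adapt_means` could be, for example, the estimate from the previous frame; that choice is an assumption of the sketch, not something the abstract specifies.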
@inproceedings{kim13f_interspeech,
  title     = {Formant frequency tracking using Gaussian mixtures with maximum a posteriori adaptation},
  author    = {Jonathan C. Kim and Hrishikesh Rao and Mark A. Clements},
  year      = {2013},
  booktitle = {Interspeech 2013},
  pages     = {3221--3225},
  doi       = {10.21437/Interspeech.2013-714},
  issn      = {2958-1796},
}
Cite as: Kim, J.C., Rao, H., Clements, M.A. (2013) Formant frequency tracking using Gaussian mixtures with maximum a posteriori adaptation. Proc. Interspeech 2013, 3221-3225, doi: 10.21437/Interspeech.2013-714