Many voice conversion algorithms are based on frame-wise mapping from source features into target features. This ignores the inherent temporal continuity that is present in speech and can degrade the subjective quality. In this paper, we propose to optimize the speech feature sequence after a frame-based conversion algorithm has been applied. In particular, we select the sequence of speech features through the minimization of a cost function that involves both the conversion error and the smoothness of the sequence. The estimation problem is solved using sequential Monte Carlo methods. Both subjective and objective results show the effectiveness of the method.
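As a rough illustration of the kind of cost function the abstract describes, the sketch below scores a candidate feature sequence by its squared distance from the frame-wise converted features plus a weighted penalty on frame-to-frame jumps. The function name `sequence_cost`, the squared-error terms, and the `smooth_weight` parameter are assumptions made for this example. They are not taken from the paper, whose actual cost and sequential Monte Carlo solver may differ.

```python
def sequence_cost(converted, candidate, smooth_weight=1.0):
    """Hypothetical cost: conversion error plus weighted roughness.

    converted -- frame-wise converted features, one list of floats per frame
    candidate -- candidate output sequence with the same shape
    smooth_weight -- assumed trade-off between the two terms
    """
    def sq_dist(a, b):
        # Squared Euclidean distance between two feature vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    if len(converted) != len(candidate):
        raise ValueError("sequences must have the same number of frames")

    # Conversion error: distance of each candidate frame from the converted frame.
    error = sum(sq_dist(c, y) for c, y in zip(converted, candidate))
    # Smoothness: penalize large jumps between consecutive candidate frames.
    rough = sum(
        sq_dist(candidate[t], candidate[t - 1]) for t in range(1, len(candidate))
    )
    return error + smooth_weight * rough
```

Under these assumptions, a candidate that flattens an isolated spike in the converted features can score lower than the converted sequence itself, which is the trade-off between conversion error and smoothness that the abstract refers to.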
@inproceedings{helander10_interspeech,
  title     = {Maximum a posteriori voice conversion using sequential {M}onte {C}arlo methods},
  author    = {Elina Helander and Hanna Silén and Joaquin Míguez and Moncef Gabbouj},
  year      = {2010},
  booktitle = {Interspeech 2010},
  pages     = {1716--1719},
  doi       = {10.21437/Interspeech.2010-493},
  issn      = {2958-1796},
}
Cite as: Helander, E., Silén, H., Míguez, J., Gabbouj, M. (2010) Maximum a posteriori voice conversion using sequential Monte Carlo methods. Proc. Interspeech 2010, 1716-1719, doi: 10.21437/Interspeech.2010-493