ISCA Archive: Interspeech 2008

Lip synchronization: from phone lattice to PCA eigen-projections using neural networks

Samer Al Moubayed, Michael De Smet, Hugo Van hamme

Lip synchronization is the process of generating natural lip movements from a speech signal. In this work we address the lip-sync problem using an automatic phone recognizer that generates a phone lattice carrying posterior probabilities. The acoustic feature vector contains the posterior probabilities of all the phones over a time window centered at the current time point. Hence this representation characterizes the phone recognition output including the confusion patterns caused by its limited accuracy. A 3D face model with varying texture is computed by analyzing a video recording of the speaker using a 3D morphable model. Training a neural network using 30 000 data vectors from an audiovisual recording in Dutch resulted in a very good simulation of the face on independent data sets of the same or of a different speaker.
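The pipeline above can be sketched end to end: stack phone posteriors over a window centered on each frame, map the stacked vector through a small neural network to PCA eigen-projection coefficients, and reconstruct face geometry from the morphable model's basis. This is a minimal illustrative sketch, not the authors' implementation; the phone-set size, window length, number of eigen-projections, vertex count, and the random stand-in weights and data are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

N_PHONES = 40   # assumed phone-set size
WINDOW = 11     # assumed context window (frames), centered on the current frame
N_EIGEN = 8     # assumed number of PCA eigen-projections predicted
N_FRAMES = 100  # toy utterance length
N_VERTS = 500   # assumed 3D face vertex count

# Per-frame phone posteriors, as would come from the recognizer's phone
# lattice (random stand-ins here, normalized to sum to 1 per frame).
posteriors = rng.random((N_FRAMES, N_PHONES))
posteriors /= posteriors.sum(axis=1, keepdims=True)

def windowed_features(post, window):
    """Stack posteriors over a window centered at each frame (edge-padded)."""
    half = window // 2
    padded = np.pad(post, ((half, half), (0, 0)), mode="edge")
    return np.stack([padded[t:t + window].ravel() for t in range(len(post))])

X = windowed_features(posteriors, WINDOW)   # (N_FRAMES, WINDOW * N_PHONES)

# One-hidden-layer network mapping acoustic features to PCA coefficients.
# In the paper these weights are trained on audiovisual data; random here.
W1 = rng.standard_normal((X.shape[1], 64)) * 0.1
W2 = rng.standard_normal((64, N_EIGEN)) * 0.1
coeffs = np.tanh(X @ W1) @ W2               # (N_FRAMES, N_EIGEN)

# Reconstruct the 3D face from the morphable model's linear basis:
# shape(t) = mean_shape + eigenvectors @ coeffs(t)
mean_shape = rng.standard_normal(3 * N_VERTS)           # flattened xyz
eigvecs = rng.standard_normal((3 * N_VERTS, N_EIGEN))
faces = mean_shape + coeffs @ eigvecs.T     # (N_FRAMES, 3 * N_VERTS)
```

Each row of `faces` is one animation frame of flattened vertex coordinates, driven entirely by the phone posteriors in the surrounding window.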

@inproceedings{moubayed08_interspeech,
  title     = {Lip synchronization: from phone lattice to PCA eigen-projections using neural networks},
  author    = {Samer Al Moubayed and Michael De Smet and Hugo {Van hamme}},
  year      = {2008},
  booktitle = {Interspeech 2008},
  pages     = {2016--2019},
  doi       = {10.21437/Interspeech.2008-524},
  issn      = {2958-1796},
}

Cite as: Al Moubayed, S., De Smet, M., Van hamme, H. (2008) Lip synchronization: from phone lattice to PCA eigen-projections using neural networks. Proc. Interspeech 2008, 2016-2019, doi: 10.21437/Interspeech.2008-524

doi: 10.21437/Interspeech.2008-524
