Research Article
Published: 18 September 2004

Multimodal Translation System Using Texture-Mapped Lip-Sync Images for Video Mail and Automatic Dubbing Applications

EURASIP Journal on Advances in Signal Processing, volume 2004, Article number: 509796 (2004)

Abstract

We introduce a multimodal English-to-Japanese and Japanese-to-English translation system that also translates the speaker's speech motion by synchronizing it to the translated speech. The system introduces a face synthesis technique that can generate any viseme lip shape and a face tracking technique that can estimate the original position and rotation of a speaker's face in an image sequence. To retain the speaker's facial expression, we substitute only the image of the speech organs with a synthesized one, which is generated from a 3D wire-frame model that is adaptable to any speaker. Our approach provides translated image synthesis with an extremely small database. Face motion in the video is tracked by template matching: the translation and rotation of the face are detected by using a 3D personal face model whose texture is captured from a video frame. We also propose a method to customize the personal face model by using our GUI tool. By combining these techniques with translated voice synthesis, an automatic multimodal translation can be achieved that is suitable for video mail or for automatic dubbing into other languages.
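
As a rough illustration of the template-matching idea mentioned in the abstract, the Python sketch below locates a small template inside a larger grayscale frame by exhaustive search. It is not the authors' method: the system described matches against a textured 3D personal face model and recovers rotation as well as translation, whereas this sketch searches only 2D translation and scores candidates with a sum of squared differences. Both simplifications are assumptions made here, and the function name `match_template` and the synthetic test data are hypothetical.

```python
import numpy as np


def match_template(frame, template):
    """Return ((row, col), error) of the best match of `template` in `frame`.

    Assumption: the score is the sum of squared differences (SSD);
    the abstract does not state which matching criterion is used.
    """
    fh, fw = frame.shape
    th, tw = template.shape
    best_pos, best_err = (0, 0), float("inf")
    # Slide the template over every position where it fits entirely.
    for r in range(fh - th + 1):
        for c in range(fw - tw + 1):
            patch = frame[r:r + th, c:c + tw]
            err = float(np.sum((patch - template) ** 2))
            if err < best_err:
                best_pos, best_err = (r, c), err
    return best_pos, best_err


# Synthetic example: cut a patch out of a random frame and find it again.
rng = np.random.default_rng(0)
frame = rng.random((40, 60))
template = frame[12:20, 25:35].copy()
position, error = match_template(frame, template)
```

In this toy setting the template is an exact crop of the frame, so the search recovers the crop's top-left corner. Tracking a real face would additionally need a search over rotation and a model of appearance change, which the abstract attributes to the 3D personal face model.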

Author information

Authors and Affiliations

School of Science and Engineering, Waseda University, Tokyo, 169-8555, Japan
Shigeo Morishima
ATR Spoken Language Translation Research Laboratories, Kyoto, 619-0288, Japan
Shigeo Morishima & Satoshi Nakamura

Authors

Shigeo Morishima
Satoshi Nakamura

Corresponding author

Correspondence to Shigeo Morishima.

About this article

Cite this article

Morishima, S., Nakamura, S. Multimodal Translation System Using Texture-Mapped Lip-Sync Images for Video Mail and Automatic Dubbing Applications. EURASIP J. Adv. Signal Process. 2004, 509796 (2004). https://doi.org/10.1155/S1110865704404259

Received: 25 November 2002
Revised: 16 January 2004
Published: 18 September 2004
DOI: https://doi.org/10.1155/S1110865704404259

Associated content

Multimedia Human-Computer Interface