Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6064)
Abstract
This work describes a real-time, voice-driven method that synchronizes a speaker's lip shape with the corresponding speech signal on low-bandwidth mobile devices. Phoneme recognition is generally regarded as an important task in the operation of a real-time lip-sync system. In this work, we propose a kernel-based lip shape clustering algorithm inspired by one-class support vector machines (SVM). Speakers with similar lip shapes are clustered, and a cluster-dependent vowel phoneme is then constructed for each cluster. We use the sum of absolute differences (SAD) as the vowel lip shape likelihood for clustering lip shapes into categories. For lip-sync animation, the source and destination lip shape pictures are then blended by adjusting their transparency level with alpha blending. We find that this method outperforms the conventional CHMM method in phoneme error rate (PER): 8.78% versus 32.25%, respectively.
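The abstract names two image-level operations, SAD as a lip shape similarity measure and alpha blending between a source and a destination lip picture, without giving their details. The following is a minimal illustrative sketch of those two operations only, not the authors' implementation. It assumes grayscale images stored as nested lists of equal size; the function names `sad`, `nearest_template`, and `alpha_blend` are hypothetical, and the one-class SVM clustering and the phoneme recognizer are not shown.

```python
from typing import List, Sequence

# Assumption: a grayscale image is a list of rows of pixel intensities.
Image = List[List[float]]


def sad(a: Image, b: Image) -> float:
    """Sum of absolute differences between two equally sized images."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same shape")
    return sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def nearest_template(lip: Image, templates: Sequence[Image]) -> int:
    """Index of the template with the smallest SAD to `lip`.

    A smaller SAD is treated here as a higher lip shape likelihood.
    """
    return min(range(len(templates)), key=lambda i: sad(lip, templates[i]))


def alpha_blend(src: Image, dst: Image, alpha: float) -> Image:
    """Blend two images: alpha=0 returns src values, alpha=1 returns dst values."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [
        [(1.0 - alpha) * ps + alpha * pd for ps, pd in zip(rs, rd)]
        for rs, rd in zip(src, dst)
    ]


if __name__ == "__main__":
    # Toy 2x2 "lip pictures" (hypothetical data for illustration only).
    closed_lips = [[0.0, 0.0], [0.0, 0.0]]
    open_lips = [[10.0, 10.0], [10.0, 10.0]]
    observed = [[9.0, 9.0], [9.0, 9.0]]

    print(nearest_template(observed, [closed_lips, open_lips]))
    # Intermediate frames for a transition from closed to open lips.
    for step in range(5):
        print(alpha_blend(closed_lips, open_lips, step / 4))
```

Stepping `alpha` from 0 to 1 over successive frames produces a cross-fade between the two lip pictures, which is one plausible reading of the transparency adjustment the abstract describes.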
Author information
Authors and Affiliations
Department of Electrical Engineering, National Cheng Kung University, No. 1 University Road, Tainan City, Taiwan
Po-Yi Shih, Jhing-Fa Wang & Zong-You Chen
Editor information
Editors and Affiliations
Department of Computer Science and Engineering, Shanghai Jiao Tong University, 800, Dongchuan Road, 200240, Shanghai, China
Liqing Zhang & Bao-Liang Lu
Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China
James Kwok
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Shih, PY., Wang, JF., Chen, ZY. (2010). Kernel-Based Lip Shape Clustering with Phoneme Recognition for Real-Time Voice Driven Talking Face. In: Zhang, L., Lu, BL., Kwok, J. (eds) Advances in Neural Networks - ISNN 2010. ISNN 2010. Lecture Notes in Computer Science, vol 6064. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13318-3_64
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13317-6
Online ISBN: 978-3-642-13318-3
eBook Packages: Computer Science, Computer Science (R0)