Part of the book series:Springer Handbooks ((SHB))
10kAccesses
156Citations
Abstract
This chapter describes a general representation and algorithmic framework for speech recognition based on weighted finite-state transducers. These transducers provide a common and natural representation for major components of speech recognition systems, including hidden Markov models (HMMs), context-dependency models, pronunciation dictionaries, statistical grammars, and word or phone lattices. General algorithms for building and optimizing transducer models are presented, including composition for combining models, weighted determinization and minimization for optimizing time and space requirements, and a weight pushing algorithm for redistributing transition weights optimally for speech recognition. The application of these methods to large-vocabulary recognition tasks is explained in detail, and experimental results are given, in particular for the North American Business News (NAB) task, in which these methods were used to combine HMMs, full cross-word triphones, a lexicon of 40000 words, and a large trigram grammar into a single weighted transducer that is only somewhat larger than the trigram word grammar and that runs NAB in real time on a very simple decoder. Another example demonstrates that the same methods can be used to optimize lattices for second-pass recognition.
This is a preview of subscription content,log in via an institution to check access.
Access this chapter
Subscribe and save
- Starting from 10 chapters or articles per month
- Access and download chapters and articles from more than 300k books and 2,500 journals
- Cancel anytime
Buy Now
- Chapter
- JPY 3498
- Price includes VAT (Japan)
- eBook
- JPY 68525
- Price includes VAT (Japan)
- Hardcover Book
- JPY 85657
- Price includes VAT (Japan)
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Explore related subjects
Discover the latest articles, books and news in related subjects, suggested using machine learning.Abbreviations
- ASR:
automatic speech recognition
- DARPA:
Defense Advanced Research Projects Agency
- DFA:
deterministic finite automata
- HMM:
hidden Markov models
- MLLR:
maximum-likelihood linear regression
- NAB:
North American Business News
- NFA:
nondeterministic finite automata
- WFSA:
weighted finite-state acceptors
- WFST:
weighted finite-state transducer
References
F. Pereira, R. Wright: Finite-state approximation of phrase-structure grammars. In:Finite-State Language Processing, ed. by E. Roche, Y. Schabes (MIT Press, Cambridge, MA 1997) pp. 149-173
M-J Nederhof: Practical experiments with regular approximation of context-free languages, Computational Linguistics, 26(1) (2000)
M. Mohri, M.-J. Nederhof: Regular approximation of context-free grammars through transformation. In:Robustness in Language and Speech Technology, ed. by J.C. Junqua, G. van Noord (Kluwer Academic, The Netherlands 2001) pp. 153-163
M. Mohri, F. Pereira, M. Riley: The design principles of a weighted finite-state transducer library, Theoret. Comput. Sci.231, 17-32 (2000)
F. Pereira, M. Riley:Finite State Language Processing, chapter Speech Recognition by Composition of Weighted Finite Automata (MIT Press, Cambridge, MA 1997)
M. Mohri: Finite-state transducers in language and speech processing, Computational Linguistics 23(2) (1997)
M. Mohri, F. Pereira, Michael Riley: Weighted automata in text and speech processing, ECAI-96 Workshop, Budapest ECAI (1996)
M. Mohri, M. Riley: Network optimizations for large vocabulary speech recognition, Speech Commun. 25(3) (1998)
M. Mohri, M. Riley, D. Hindle, A. Ljolje, F. Pereira: full expansion of context-dependent networks in large vocabulary speech recognition, Procs. ICASSP (ICASSP ʼ98), Seattle (1998)
M. Mohri, M. Riley: Integrated context-dependent networks in very large vocabulary speech recognition, Proc. 6th Europ. Conf. Speech Commun. Technol. (Eurospeech ʼ99), Budapest (1999)
S. Ortmanns, H. Ney, A. Eiden: Language-model look-ahead for large vocabulary speech recognition, Proc. Int. Conf. Spoken Lang. Proces. (ICSLPʼ96), University of Delaware and Alfred I. duPont Institute (1996) pp. 2095-2098
A. Salomaa, M. Soittola:Automata-Theoretic Aspects of Formal Power Series (Springer, New York 1978)
J. Berstel, C. Reutenauer:Rational Series and Their Languages (Springer, Heidelberg 1988)
W. Kuich, A. Salomaa:Semirings, Automata, Languages, EATCS Monographs on Theoretical Computer Science, Vol. 5 (Springer, Berlin, Heidelberg 1986)
C. Allauzen, M. Riley, J. Schalkwyk, W. Skut, M. Mohri: OpenFST - a General and Efficient Weighted Finite-State Transducer Library, Proceedings of the International Conference on Implementation and Application of Automata, Prague (2007)
J.E. Hopcroft, J.D. Ullman:Introduction to Automata Theory, Languages, and Computation (Addison Wesley, Reading 1979)
A.V. Aho, R. Sethi, J.D. Ullman:Compilers, Principles, Techniques and Tools (Addison Wesley, Reading 1986)
A.V. Aho, J.E. Hopcroft, J.D. Ullman:The Design and Analysis of Computer Algorithms (Addison Wesley, Reading 1974)
D. Revuz: Minimisation of acyclic deterministic automata in linear time, Theoret. Comput. Sci.92, 181-189 (1992)
M. Mohri: Semiring frameworks and algorithms for shortest-distance problems, J. Automata Lang. Combinat.7(3), 321-350 (2002)
M. Mohri, M. Riley: A weight pushing algorithm for large vocabulary speech recognition, Proc. 7th Europ. Conf. Speech Commun. Technol. (Eurospeech ʼ01), Aalborg (2001)
D.J. Lehmann: Algebraic structures for transitive closures, Theoret. Comput. Sci.4, 59-76 (1977)
S. Eilenberg: Automata,Languages and Machines (Academic, San Diego 1974-1976) vol. A-B
J. Berstel:Transductions and Context-Free Languages (Teubner Studienbucher, Stuttgart 1979)
C. Allauzen, M. Mohri: Efficient algorithms for testing the twins property, J. Automata Languages Combinat.8(2), 117-144 (2003)
C. Allauzen, M. Mohri: An optimal pre-determinization algorithm for weighted transducers, Theor. Comput. Sci.328(1-2), 3-18 (2004)
M. Mohri: Statistical natural language processing. InApplied Combinatorics on Words, ed. by M. Lothaire (Cambridge Univ. Press Cambridge 2005)
S.M. Katz: Estimation of probabilities from sparse data for the language model component of a speech recogniser, IEEE Trans Acoust. Speech Signal Process.35(3), 400-401 (1987)
C. Allauzen, M. Mohri, B. Roark, M. Riley: A generalized construction of integrated speech recognition transducers, Proc. ICASSP (ICASSP 2004), Montréal (2004)
M. Riley, F. Pereira, M. Mohri: Transducer composition for context-dependent network expansion, Proc. Eurospeechʼ97, Rhodes (1997)
M. Mohri, F. C.N. Pereira: Dynamic compilation of weighted context-free grammars, 36th Annual Meeting ACL and 17th Int. Conf. Computat. Linguist., vol. 2 (1998) p. 891-897
J.W. Carlyle, A. Paz: Realizations by stochastic finite automaton, J. Comput. Syst. Sci.5, 26-40 (1971)
K. Seymore, R. Rosenfeld: Scalable backoff language models, Proc. ICSLP, Philadelphia (1996)
S. Kanthak, H. Ney, M. Riley, M. Mohri: A comparison of two LVR search optimization techniques, Proc. Int. Conf. Spoken Lang. Proces. 2002 (ICSLP ʼ02), Denver (2002)
M. Saraclar, M. Riley, E. Bocchieri, V. Goffin: Towards automatic closed captioning : Low latency real time broadcast news transcription, In Proceedings of the International Conference on Spoken Language Processing (ICSLPʼ02) (2002)
A. Ljolje, F. Pereira, M. Riley: Efficient general lattice generation and rescoring, Proc. Europ. Conf. Speech Commun. Technol. (Eurospeech ʼ99), Budapest (1999)
C. Leggetter, P. Woodland: Maximum likelihood linear regession for speaker adaptation of continuous density HMMs, Comput. Speech Lang.9(2), 171-186 (1995)
M. Riley, W. Byrne, M. Finke, S. Khudanpur, A. Ljolje, J. McDonough, H. Nock, M. Saraclar, C. Wooters, G. Zavaliagkos: Stochastic pronunciation modelling form hand-labelled phonetic corpora, Speech Commun.29, 209-224 (1999)
R. Sproat: Multilingual text analysis for text-to-speech synthesis, J. Nat. Lang. Eng.2(4), 369-380 (1997)
C. Allauzen, M. Mohri, M. Riley: Statistical modeling for unit selection in speech synthesis, In 42nd Meeting of the Association for Computational Linguistics (ACL 2004), Proceedings of the Conference, Barcelona (2004)
R. M. Kaplan, M. Kay: Regular models of phonological rule systems, Computational Linguistics 20(3) (1994)
L. Karttunen: The replace operator, In 33rd Meeting of the Association for Computational Linguistics (ACL 95), Proceedings of the Conference, MIT, Cambridge ACL (1995)
M. Crochemore, W. Rytter:Text Algorithms (Oxford Univ. Press, Oxford 1994)
K.I.I. Culik, J. Kari: Digital images and formal languages. In:Handbook of Formal Languages, ed. by G. Rozenberg, A Salomaa (Springer, Berlin, Heidelberg 1997) pp. 599-616
Author information
Authors and Affiliations
Courant Institute of Mathematical Sciences, 251 Mercer Street, 10012, New York, NY, USA
Mehryar Mohri Prof.
Department of Computer and Information Science, University of Pennsylvania, 305 Levine Hall, 3330 Walnut Street, 19104, Philadelphia, PA, USA
Fernando Pereira Ph.D
Google, Inc., Research, 111 Eighth AV, 10011, New York, NY, USA
Michael Riley Ph.D
- Mehryar Mohri Prof.
Search author on:PubMed Google Scholar
- Fernando Pereira Ph.D
Search author on:PubMed Google Scholar
- Michael Riley Ph.D
Search author on:PubMed Google Scholar
Corresponding authors
Correspondence toMehryar Mohri Prof.,Fernando Pereira Ph.D orMichael Riley Ph.D.
Editor information
Editors and Affiliations
INRS-EMT, University of Quebec, 800 de la Gauchetiere Ouest, H5A 1K6, Montreal, Quebec, Canada
Jacob Benesty Dr.
Avayalabs Research, 233 Mount Airy Road, 07920, Basking Ridge, NJ, USA
M. Mohan Sondhi Ph.D.
Alcatel-Lucent, Bell Laboratories, 600 Mountain Avenue, 07974, Murray Hill, NJ, USA
Yiteng Arden Huang Dr.
Rights and permissions
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Mohri, M., Pereira, F., Riley, M. (2008). Speech Recognition with Weighted Finite-State Transducers. In: Benesty, J., Sondhi, M.M., Huang, Y.A. (eds) Springer Handbook of Speech Processing. Springer Handbooks. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-49127-9_28
Download citation
Publisher Name:Springer, Berlin, Heidelberg
Print ISBN:978-3-540-49125-5
Online ISBN:978-3-540-49127-9
eBook Packages:EngineeringEngineering (R0)
Share this chapter
Anyone you share the following link with will be able to read this content:
Sorry, a shareable link is not currently available for this article.
Provided by the Springer Nature SharedIt content-sharing initiative
Keywords
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


