Movatterモバイル変換


[0]ホーム

URL:


Skip to main content

Advertisement

Springer Nature Link
Log in

Speech Recognition with Weighted Finite-State Transducers

  • Chapter

Part of the book series:Springer Handbooks ((SHB))

Abstract

This chapter describes a general representation and algorithmic framework for speech recognition based on weighted finite-state transducers. These transducers provide a common and natural representation for major components of speech recognition systems, including hidden Markov models (HMMs), context-dependency models, pronunciation dictionaries, statistical grammars, and word or phone lattices. General algorithms for building and optimizing transducer models are presented, including composition for combining models, weighted determinization and minimization for optimizing time and space requirements, and a weight pushing algorithm for redistributing transition weights optimally for speech recognition. The application of these methods to large-vocabulary recognition tasks is explained in detail, and experimental results are given, in particular for the North American Business News (NAB) task, in which these methods were used to combine HMMs, full cross-word triphones, a lexicon of 40000 words, and a large trigram grammar into a single weighted transducer that is only somewhat larger than the trigram word grammar and that runs NAB in real time on a very simple decoder. Another example demonstrates that the same methods can be used to optimize lattices for second-pass recognition.

This is a preview of subscription content,log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+
from ¥17,985 /Month
  • Starting from 10 chapters or articles per month
  • Access and download chapters and articles from more than 300k books and 2,500 journals
  • Cancel anytime
View plans

Buy Now

Chapter
JPY 3498
Price includes VAT (Japan)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
JPY 68525
Price includes VAT (Japan)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Hardcover Book
JPY 85657
Price includes VAT (Japan)
  • Durable hardcover edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide -see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Similar content being viewed by others

Explore related subjects

Discover the latest articles, books and news in related subjects, suggested using machine learning.

Abbreviations

ASR:

automatic speech recognition

DARPA:

Defense Advanced Research Projects Agency

DFA:

deterministic finite automata

HMM:

hidden Markov models

MLLR:

maximum-likelihood linear regression

NAB:

North American Business News

NFA:

nondeterministic finite automata

WFSA:

weighted finite-state acceptors

WFST:

weighted finite-state transducer

References

  1. F. Pereira, R. Wright: Finite-state approximation of phrase-structure grammars. In:Finite-State Language Processing, ed. by E. Roche, Y. Schabes (MIT Press, Cambridge, MA 1997) pp. 149-173

    Google Scholar 

  2. M-J Nederhof: Practical experiments with regular approximation of context-free languages, Computational Linguistics, 26(1) (2000)

    Google Scholar 

  3. M. Mohri, M.-J. Nederhof: Regular approximation of context-free grammars through transformation. In:Robustness in Language and Speech Technology, ed. by J.C. Junqua, G. van Noord (Kluwer Academic, The Netherlands 2001) pp. 153-163

    Chapter  Google Scholar 

  4. M. Mohri, F. Pereira, M. Riley: The design principles of a weighted finite-state transducer library, Theoret. Comput. Sci.231, 17-32 (2000)

    Article MathSciNet MATH  Google Scholar 

  5. F. Pereira, M. Riley:Finite State Language Processing, chapter Speech Recognition by Composition of Weighted Finite Automata (MIT Press, Cambridge, MA 1997)

    Google Scholar 

  6. M. Mohri: Finite-state transducers in language and speech processing, Computational Linguistics 23(2) (1997)

    Google Scholar 

  7. M. Mohri, F. Pereira, Michael Riley: Weighted automata in text and speech processing, ECAI-96 Workshop, Budapest ECAI (1996)

    Google Scholar 

  8. M. Mohri, M. Riley: Network optimizations for large vocabulary speech recognition, Speech Commun. 25(3) (1998)

    Google Scholar 

  9. M. Mohri, M. Riley, D. Hindle, A. Ljolje, F. Pereira: full expansion of context-dependent networks in large vocabulary speech recognition, Procs. ICASSP (ICASSP ʼ98), Seattle (1998)

    Google Scholar 

  10. M. Mohri, M. Riley: Integrated context-dependent networks in very large vocabulary speech recognition, Proc. 6th Europ. Conf. Speech Commun. Technol. (Eurospeech ʼ99), Budapest (1999)

    Google Scholar 

  11. S. Ortmanns, H. Ney, A. Eiden: Language-model look-ahead for large vocabulary speech recognition, Proc. Int. Conf. Spoken Lang. Proces. (ICSLPʼ96), University of Delaware and Alfred I. duPont Institute (1996) pp. 2095-2098

    Google Scholar 

  12. A. Salomaa, M. Soittola:Automata-Theoretic Aspects of Formal Power Series (Springer, New York 1978)

    Book MATH  Google Scholar 

  13. J. Berstel, C. Reutenauer:Rational Series and Their Languages (Springer, Heidelberg 1988)

    Book MATH  Google Scholar 

  14. W. Kuich, A. Salomaa:Semirings, Automata, Languages, EATCS Monographs on Theoretical Computer Science, Vol. 5 (Springer, Berlin, Heidelberg 1986)

    Google Scholar 

  15. C. Allauzen, M. Riley, J. Schalkwyk, W. Skut, M. Mohri: OpenFST - a General and Efficient Weighted Finite-State Transducer Library, Proceedings of the International Conference on Implementation and Application of Automata, Prague (2007)

    Google Scholar 

  16. J.E. Hopcroft, J.D. Ullman:Introduction to Automata Theory, Languages, and Computation (Addison Wesley, Reading 1979)

    MATH  Google Scholar 

  17. A.V. Aho, R. Sethi, J.D. Ullman:Compilers, Principles, Techniques and Tools (Addison Wesley, Reading 1986)

    MATH  Google Scholar 

  18. A.V. Aho, J.E. Hopcroft, J.D. Ullman:The Design and Analysis of Computer Algorithms (Addison Wesley, Reading 1974)

    MATH  Google Scholar 

  19. D. Revuz: Minimisation of acyclic deterministic automata in linear time, Theoret. Comput. Sci.92, 181-189 (1992)

    Article MathSciNet MATH  Google Scholar 

  20. M. Mohri: Semiring frameworks and algorithms for shortest-distance problems, J. Automata Lang. Combinat.7(3), 321-350 (2002)

    MathSciNet MATH  Google Scholar 

  21. M. Mohri, M. Riley: A weight pushing algorithm for large vocabulary speech recognition, Proc. 7th Europ. Conf. Speech Commun. Technol. (Eurospeech ʼ01), Aalborg (2001)

    Google Scholar 

  22. D.J. Lehmann: Algebraic structures for transitive closures, Theoret. Comput. Sci.4, 59-76 (1977)

    Article MathSciNet MATH  Google Scholar 

  23. S. Eilenberg: Automata,Languages and Machines (Academic, San Diego 1974-1976) vol. A-B

    Google Scholar 

  24. J. Berstel:Transductions and Context-Free Languages (Teubner Studienbucher, Stuttgart 1979)

    Book MATH  Google Scholar 

  25. C. Allauzen, M. Mohri: Efficient algorithms for testing the twins property, J. Automata Languages Combinat.8(2), 117-144 (2003)

    MathSciNet MATH  Google Scholar 

  26. C. Allauzen, M. Mohri: An optimal pre-determinization algorithm for weighted transducers, Theor. Comput. Sci.328(1-2), 3-18 (2004)

    Article MathSciNet MATH  Google Scholar 

  27. M. Mohri: Statistical natural language processing. InApplied Combinatorics on Words, ed. by M. Lothaire (Cambridge Univ. Press Cambridge 2005)

    Google Scholar 

  28. S.M. Katz: Estimation of probabilities from sparse data for the language model component of a speech recogniser, IEEE Trans Acoust. Speech Signal Process.35(3), 400-401 (1987)

    Article  Google Scholar 

  29. C. Allauzen, M. Mohri, B. Roark, M. Riley: A generalized construction of integrated speech recognition transducers, Proc. ICASSP (ICASSP 2004), Montréal (2004)

    Google Scholar 

  30. M. Riley, F. Pereira, M. Mohri: Transducer composition for context-dependent network expansion, Proc. Eurospeechʼ97, Rhodes (1997)

    Google Scholar 

  31. M. Mohri, F. C.N. Pereira: Dynamic compilation of weighted context-free grammars, 36th Annual Meeting ACL and 17th Int. Conf. Computat. Linguist., vol. 2 (1998) p. 891-897

    Google Scholar 

  32. J.W. Carlyle, A. Paz: Realizations by stochastic finite automaton, J. Comput. Syst. Sci.5, 26-40 (1971)

    Article MATH  Google Scholar 

  33. K. Seymore, R. Rosenfeld: Scalable backoff language models, Proc. ICSLP, Philadelphia (1996)

    Google Scholar 

  34. S. Kanthak, H. Ney, M. Riley, M. Mohri: A comparison of two LVR search optimization techniques, Proc. Int. Conf. Spoken Lang. Proces. 2002 (ICSLP ʼ02), Denver (2002)

    Google Scholar 

  35. M. Saraclar, M. Riley, E. Bocchieri, V. Goffin: Towards automatic closed captioning : Low latency real time broadcast news transcription, In Proceedings of the International Conference on Spoken Language Processing (ICSLPʼ02) (2002)

    Google Scholar 

  36. A. Ljolje, F. Pereira, M. Riley: Efficient general lattice generation and rescoring, Proc. Europ. Conf. Speech Commun. Technol. (Eurospeech ʼ99), Budapest (1999)

    Google Scholar 

  37. C. Leggetter, P. Woodland: Maximum likelihood linear regession for speaker adaptation of continuous density HMMs, Comput. Speech Lang.9(2), 171-186 (1995)

    Article  Google Scholar 

  38. M. Riley, W. Byrne, M. Finke, S. Khudanpur, A. Ljolje, J. McDonough, H. Nock, M. Saraclar, C. Wooters, G. Zavaliagkos: Stochastic pronunciation modelling form hand-labelled phonetic corpora, Speech Commun.29, 209-224 (1999)

    Article  Google Scholar 

  39. R. Sproat: Multilingual text analysis for text-to-speech synthesis, J. Nat. Lang. Eng.2(4), 369-380 (1997)

    Article  Google Scholar 

  40. C. Allauzen, M. Mohri, M. Riley: Statistical modeling for unit selection in speech synthesis, In 42nd Meeting of the Association for Computational Linguistics (ACL 2004), Proceedings of the Conference, Barcelona (2004)

    Google Scholar 

  41. R. M. Kaplan, M. Kay: Regular models of phonological rule systems, Computational Linguistics 20(3) (1994)

    Google Scholar 

  42. L. Karttunen: The replace operator, In 33rd Meeting of the Association for Computational Linguistics (ACL 95), Proceedings of the Conference, MIT, Cambridge ACL (1995)

    Google Scholar 

  43. M. Crochemore, W. Rytter:Text Algorithms (Oxford Univ. Press, Oxford 1994)

    MATH  Google Scholar 

  44. K.I.I. Culik, J. Kari: Digital images and formal languages. In:Handbook of Formal Languages, ed. by G. Rozenberg, A Salomaa (Springer, Berlin, Heidelberg 1997) pp. 599-616

    Chapter  Google Scholar 

Download references

Author information

Authors and Affiliations

  1. Courant Institute of Mathematical Sciences, 251 Mercer Street, 10012, New York, NY, USA

    Mehryar Mohri Prof.

  2. Department of Computer and Information Science, University of Pennsylvania, 305 Levine Hall, 3330 Walnut Street, 19104, Philadelphia, PA, USA

    Fernando Pereira Ph.D

  3. Google, Inc., Research, 111 Eighth AV, 10011, New York, NY, USA

    Michael Riley Ph.D

Authors
  1. Mehryar Mohri Prof.
  2. Fernando Pereira Ph.D
  3. Michael Riley Ph.D

Corresponding authors

Correspondence toMehryar Mohri Prof.,Fernando Pereira Ph.D orMichael Riley Ph.D.

Editor information

Editors and Affiliations

  1. INRS-EMT, University of Quebec, 800 de la Gauchetiere Ouest, H5A 1K6, Montreal, Quebec, Canada

    Jacob Benesty Dr.

  2. Avayalabs Research, 233 Mount Airy Road, 07920, Basking Ridge, NJ, USA

    M. Mohan Sondhi Ph.D.

  3. Alcatel-Lucent, Bell Laboratories, 600 Mountain Avenue, 07974, Murray Hill, NJ, USA

    Yiteng Arden Huang Dr.

Rights and permissions

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Mohri, M., Pereira, F., Riley, M. (2008). Speech Recognition with Weighted Finite-State Transducers. In: Benesty, J., Sondhi, M.M., Huang, Y.A. (eds) Springer Handbook of Speech Processing. Springer Handbooks. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-49127-9_28

Download citation

Keywords

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Publish with us

Access this chapter

Subscribe and save

Springer+
from ¥17,985 /Month
  • Starting from 10 chapters or articles per month
  • Access and download chapters and articles from more than 300k books and 2,500 journals
  • Cancel anytime
View plans

Buy Now

Chapter
JPY 3498
Price includes VAT (Japan)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
JPY 68525
Price includes VAT (Japan)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Hardcover Book
JPY 85657
Price includes VAT (Japan)
  • Durable hardcover edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide -see info

Tax calculation will be finalised at checkout

Purchases are for personal use only


[8]ページ先頭

©2009-2025 Movatter.jp