A Hybrid HMM/BN Acoustic Model Utilizing Pentaphone-Context Dependency

Sakriani SAKTI, Konstantin MARKOV, Satoshi NAKAMURA

Summary:

The most widely used acoustic unit in current automatic speech recognition systems is the triphone, which includes the immediate preceding and following phonetic contexts. Although triphones have proved to be an efficient choice, they are believed to be insufficient for capturing all coarticulation effects. A wider phonetic context seems more appropriate, but it often suffers from data sparsity and memory constraints. Efficient modeling of wider contexts is therefore needed before it can be applied realistically in an automatic speech recognition system. This paper presents a new method of modeling pentaphone-context units using the hybrid HMM/BN acoustic modeling framework. Rather than modeling pentaphones explicitly, this approach incorporates the probabilistic dependencies between the triphone context unit and the second preceding/following contexts into the triphone state output distributions by means of the Bayesian network (BN). This has two advantages: the modeled phonetic context is extended within the triphone framework, and a standard decoding system can be used by treating the second preceding/following context variables as hidden during recognition. To handle the increased number of parameters, tying based on knowledge-based phoneme classes and a data-driven clustering method is applied. The evaluation experiments indicate that the proposed model outperforms the standard HMM-based triphone model, achieving a 9-10% relative word error rate (WER) reduction.
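
One way to read the summary's remark about hidden context variables: if the second preceding/following contexts are unobserved at recognition time, they can be summed out of the state output distribution, which leaves an ordinary mixture over context values. The Python sketch below illustrates only that marginalization and is not the authors' implementation. The function names, the one-dimensional Gaussians, the phoneme-class labels, and all numbers are assumptions made for the example.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian N(x; mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def state_output_likelihood(x, context_probs, gaussians):
    """Marginalize the hidden outer contexts out of a triphone state's output.

    context_probs: {(left2, right2): P(left2, right2 | state)}, summing to 1.
    gaussians:     {(left2, right2): (mean, var)} conditional on that context.
    """
    return sum(
        prob * gaussian_pdf(x, *gaussians[ctx])
        for ctx, prob in context_probs.items()
    )

# Hypothetical state whose outer contexts are tied into two phoneme classes.
context_probs = {("vowel", "nasal"): 0.7, ("stop", "vowel"): 0.3}
gaussians = {("vowel", "nasal"): (0.0, 1.0), ("stop", "vowel"): (2.0, 0.5)}

likelihood = state_output_likelihood(1.0, context_probs, gaussians)
print(f"p(x=1.0 | state) = {likelihood:.4f}")
```

Under this reading, each triphone state is scored as a weighted mixture, so a decoder that already evaluates per-state mixtures would need no structural change. That is consistent with the summary's claim that a standard decoding system can be used.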

Publication
IEICE TRANSACTIONS on Information Vol.E89-D No.3 pp.954-961
Publication Date
2006/03/01
Online ISSN
1745-1361
DOI
10.1093/ietisy/e89-d.3.954
Type of Manuscript
Special Section PAPER (Special Section on Statistical Modeling for Speech Processing)
Category
Speech Recognition

Sakriani SAKTI, Konstantin MARKOV, Satoshi NAKAMURA, "A Hybrid HMM/BN Acoustic Model Utilizing Pentaphone-Context Dependency" in IEICE TRANSACTIONS on Information, vol. E89-D, no. 3, pp. 954-961, March 2006, doi:10.1093/ietisy/e89-d.3.954.
URL: https://globals.ieice.org/en_transactions/information/10.1093/ietisy/e89-d.3.954/_p

@ARTICLE{e89-d_3_954,
author={Sakriani SAKTI and Konstantin MARKOV and Satoshi NAKAMURA},
journal={IEICE TRANSACTIONS on Information},
title={A Hybrid HMM/BN Acoustic Model Utilizing Pentaphone-Context Dependency},
year={2006},
volume={E89-D},
number={3},
pages={954-961},
doi={10.1093/ietisy/e89-d.3.954},
ISSN={1745-1361},
month={March},}

TY - JOUR
TI - A Hybrid HMM/BN Acoustic Model Utilizing Pentaphone-Context Dependency
T2 - IEICE TRANSACTIONS on Information
SP - 954
EP - 961
AU - Sakriani SAKTI
AU - Konstantin MARKOV
AU - Satoshi NAKAMURA
PY - 2006
DO - 10.1093/ietisy/e89-d.3.954
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E89-D
IS - 3
JA - IEICE TRANSACTIONS on Information
Y1 - March 2006
ER -
