Improving Rapid Unsupervised Speaker Adaptation Based on HMM-Sufficient Statistics in Noisy Environments Using Multi-Template Models

Randy GOMEZ, Akinobu LEE, Tomoki TODA, Hiroshi SARUWATARI, Kiyohiro SHIKANO

Summary:

This paper describes a multi-template approach to unsupervised speaker adaptation based on HMM-Sufficient Statistics that improves adaptation performance while keeping adaptation time within a few seconds, using just one arbitrary utterance. The adaptation scheme consists of two main processes. The first is performed offline and involves training multiple class-dependent acoustic models and creating speakers' HMM-Sufficient Statistics based on gender and age. The second is performed online, where adaptation begins with a single utterance from a test speaker. From this utterance, the system classifies the speaker's class and then selects the N-best neighbor speakers closest to the utterance using Gaussian Mixture Models (GMM). The template model of the classified speaker's class is then adopted as the base model. From this template model, the adapted model is rapidly constructed using the N-best neighbor speakers' HMM-Sufficient Statistics. Experiments are performed in noisy conditions with office, crowd, booth, and car noise at 20 dB, 15 dB, and 10 dB SNR. The proposed multi-template method achieves a word accuracy of 89.5%, compared with 88.1% for the conventional single-template method and a baseline recognition rate of 86.4% without adaptation. The method is also compared experimentally with Vocal Tract Length Normalization (VTLN) and supervised Maximum Likelihood Linear Regression (MLLR).
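The online stage described in the summary can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: it assumes diagonal-covariance GMMs stored as `(weights, means, variances)` tuples, and per-speaker sufficient statistics stored as `(occupancy, first-order sum, second-order sum)` arrays over a flat list of Gaussians, accumulated offline against the selected class template. All function and variable names (`avg_loglik`, `n_best`, `combine_sufficient_stats`, `adapt`) are hypothetical, and only means and variances are re-estimated.

```python
import numpy as np


def avg_loglik(frames, weights, means, variances):
    """Average per-frame log-likelihood under a diagonal-covariance GMM.

    frames: (T, D); weights: (M,); means, variances: (M, D).
    """
    frames = np.asarray(frames, dtype=float)
    diff = frames[:, None, :] - means[None, :, :]                  # (T, M, D)
    log_norm = -0.5 * np.log(2.0 * np.pi * variances).sum(axis=1)  # (M,)
    log_comp = (np.log(weights) + log_norm
                - 0.5 * (diff ** 2 / variances).sum(axis=2))       # (T, M)
    # Log-sum-exp over mixture components for numerical stability.
    peak = log_comp.max(axis=1, keepdims=True)
    per_frame = peak[:, 0] + np.log(np.exp(log_comp - peak).sum(axis=1))
    return float(per_frame.mean())


def n_best(frames, gmms, n):
    """Return the n GMM names that score the utterance highest."""
    scores = {name: avg_loglik(frames, *gmm) for name, gmm in gmms.items()}
    return sorted(scores, key=scores.get, reverse=True)[:n]


def combine_sufficient_stats(stats):
    """Pool (occupancy, first-order, second-order) statistics and
    re-estimate the mean and variance of each Gaussian."""
    occ = sum(s[0] for s in stats)      # (G,)
    first = sum(s[1] for s in stats)    # (G, D)
    second = sum(s[2] for s in stats)   # (G, D)
    means = first / occ[:, None]
    variances = second / occ[:, None] - means ** 2
    return means, variances


def adapt(frames, class_gmms, speaker_gmms_by_class, stats_by_speaker, n):
    """Classify the speaker's class, pick N-best neighbors inside that
    class, and build adapted parameters from their pooled statistics."""
    speaker_class = n_best(frames, class_gmms, 1)[0]
    neighbors = n_best(frames, speaker_gmms_by_class[speaker_class], n)
    means, variances = combine_sufficient_stats(
        [stats_by_speaker[s] for s in neighbors])
    return speaker_class, neighbors, means, variances
```

Because the per-speaker statistics are precomputed offline, the online cost in this sketch is limited to GMM scoring of one utterance plus a few array sums, which is consistent with the few-second adaptation time the summary reports.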

Publication
IEICE TRANSACTIONS on Information Vol.E89-D No.3 pp.998-1005
Publication Date
2006/03/01
Online ISSN
1745-1361
DOI
10.1093/ietisy/e89-d.3.998
Type of Manuscript
Special Section PAPER (Special Section on Statistical Modeling for Speech Processing)
Category
Speech Recognition

Copyrights notice of machine-translated contents

The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.

Randy GOMEZ, Akinobu LEE, Tomoki TODA, Hiroshi SARUWATARI, Kiyohiro SHIKANO, "Improving Rapid Unsupervised Speaker Adaptation Based on HMM-Sufficient Statistics in Noisy Environments Using Multi-Template Models" in IEICE TRANSACTIONS on Information, vol. E89-D, no. 3, pp. 998-1005, March 2006, doi:10.1093/ietisy/e89-d.3.998.
URL: https://globals.ieice.org/en_transactions/information/10.1093/ietisy/e89-d.3.998/_p

@ARTICLE{e89-d_3_998,
author={Randy GOMEZ and Akinobu LEE and Tomoki TODA and Hiroshi SARUWATARI and Kiyohiro SHIKANO},
journal={IEICE TRANSACTIONS on Information},
title={Improving Rapid Unsupervised Speaker Adaptation Based on HMM-Sufficient Statistics in Noisy Environments Using Multi-Template Models},
year={2006},
volume={E89-D},
number={3},
pages={998-1005},
doi={10.1093/ietisy/e89-d.3.998},
ISSN={1745-1361},
month={March},
}

TY - JOUR
TI - Improving Rapid Unsupervised Speaker Adaptation Based on HMM-Sufficient Statistics in Noisy Environments Using Multi-Template Models
T2 - IEICE TRANSACTIONS on Information
SP - 998
EP - 1005
AU - Randy GOMEZ
AU - Akinobu LEE
AU - Tomoki TODA
AU - Hiroshi SARUWATARI
AU - Kiyohiro SHIKANO
PY - 2006
DO - 10.1093/ietisy/e89-d.3.998
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E89-D
IS - 3
JA - IEICE TRANSACTIONS on Information
Y1 - March 2006
ER -
