ISCA Archive
Interspeech 2022

Deciphering Speech: a Zero-Resource Approach to Cross-Lingual Transfer in ASR

Ondrej Klejch, Electra Wallington, Peter Bell

We present a method for cross-lingual training of an ASR system using absolutely no transcribed training data from the target language, and with no phonetic knowledge of the language in question. Our approach uses a novel application of a decipherment algorithm, which operates given only unpaired speech and text data from the target language. We apply this decipherment to phone sequences generated by a universal phone recogniser trained on out-of-language speech corpora, which we follow with flat-start semi-supervised training to obtain an acoustic model for the new language. To the best of our knowledge, this is the first practical approach to zero-resource cross-lingual ASR which does not rely on any hand-crafted phonetic information. We carry out experiments on read speech from the GlobalPhone corpus, and show that it is possible to learn a decipherment model on just 20 minutes of data from the target language. When used to generate pseudo-labels for semi-supervised training, we obtain WERs that range from 32.5% to just 1.9% absolute worse than the equivalent fully supervised models trained on the same data.
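The abstract describes a three-stage pipeline: run a universal phone recogniser over target-language speech, decipher the resulting phone sequences against unpaired target-language text, and use the deciphered output as pseudo-labels for flat-start semi-supervised training. The sketch below is an illustration only, not the authors' implementation: it covers just the decipherment step, assuming a simple 1:1 substitution channel trained with Viterbi (hard) EM against a bigram phone language model; the paper's actual decipherment model is more elaborate, and all names here (bigram_lm, viterbi, decipher) are hypothetical.

from collections import defaultdict
import math

def bigram_lm(target_seqs, alpha=0.1):
    """Add-alpha smoothed bigram LM over target-language phone sequences."""
    vocab = sorted({p for seq in target_seqs for p in seq})
    counts = defaultdict(lambda: defaultdict(float))
    for seq in target_seqs:
        for a, b in zip(["<s>"] + seq, seq + ["</s>"]):
            counts[a][b] += 1.0
    def logp(prev, cur):
        num = counts[prev][cur] + alpha
        den = sum(counts[prev].values()) + alpha * (len(vocab) + 1)
        return math.log(num / den)
    return vocab, logp

def viterbi(src, tgt_vocab, lm_logp, chan_logp):
    """Most likely target phone sequence for one recognised source sequence."""
    # delta[t] = best log-prob of a path whose current target phone is t
    delta = {t: lm_logp("<s>", t) + chan_logp(t, src[0]) for t in tgt_vocab}
    back = []
    for s in src[1:]:
        new, ptr = {}, {}
        for t in tgt_vocab:
            best_prev = max(delta, key=lambda p: delta[p] + lm_logp(p, t))
            new[t] = delta[best_prev] + lm_logp(best_prev, t) + chan_logp(t, s)
            ptr[t] = best_prev
        delta = new
        back.append(ptr)
    last = max(delta, key=delta.get)
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))

def decipher(source_seqs, target_texts, iters=10):
    """Viterbi (hard) EM over a substitution channel P(source | target)."""
    tgt_vocab, lm_logp = bigram_lm(target_texts)
    src_vocab = sorted({p for seq in source_seqs for p in seq})
    # uniform initialisation of the channel model
    table = {t: {s: 1.0 / len(src_vocab) for s in src_vocab} for t in tgt_vocab}
    for _ in range(iters):
        chan = lambda t, s: math.log(table[t][s] + 1e-12)
        counts = {t: defaultdict(float) for t in tgt_vocab}
        # E-step (hard): decode each source sequence, collect co-occurrences
        for src in source_seqs:
            for t, s in zip(viterbi(src, tgt_vocab, lm_logp, chan), src):
                counts[t][s] += 1.0
        # M-step: renormalise the channel table, with light smoothing
        for t in tgt_vocab:
            total = sum(counts[t].values()) + 1e-3 * len(src_vocab)
            table[t] = {s: (counts[t][s] + 1e-3) / total for s in src_vocab}
    return table

if __name__ == "__main__":
    # toy check: source labels are a relabelled copy of the target phones
    tgt = [list("abab"), list("baba"), list("aabb"), list("abba")]
    src = [list("xyxy"), list("yxyx"), list("xxyy"), list("xyyx")]
    table = decipher(src, tgt)
    # most probable source label per target phone; with this little
    # (and highly symmetric) toy data the mapping may not be exact
    print({t: max(m, key=m.get) for t, m in table.items()})

In the paper's setting, the deciphered sequences would then serve as pseudo-labels for flat-start semi-supervised training of the target-language acoustic model.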

@inproceedings{klejch22_interspeech,
  title     = {Deciphering Speech: a Zero-Resource Approach to Cross-Lingual Transfer in ASR},
  author    = {Ondrej Klejch and Electra Wallington and Peter Bell},
  year      = {2022},
  booktitle = {Interspeech 2022},
  pages     = {2288--2292},
  doi       = {10.21437/Interspeech.2022-10170},
  issn      = {2958-1796},
}

Cite as: Klejch, O., Wallington, E., Bell, P. (2022) Deciphering Speech: a Zero-Resource Approach to Cross-Lingual Transfer in ASR. Proc. Interspeech 2022, 2288-2292, doi: 10.21437/Interspeech.2022-10170

doi: 10.21437/Interspeech.2022-10170
