There has been huge progress in speech recognition over the last several years. Tasks once thought extremely difficult, such as SWITCHBOARD, now approach levels of human performance. The MALACH corpus (LDC catalog LDC2012S05), a 375-hour subset of a large archive of Holocaust testimonies collected by the Survivors of the Shoah Visual History Foundation, presents significant challenges to the speech community. The collection consists of unconstrained, natural speech filled with disfluencies, heavy accents, age-related coarticulations, un-cued speaker and language switching, and emotional speech, all of which remain open problems for speech recognition systems. Transcription is challenging even for skilled human annotators. This paper proposes that the community focus on the MALACH corpus to develop speech recognition systems that are more robust to accents, disfluencies, and emotional speech. To lower the barrier to entry, a lexicon and training and testing setups have been created, and baseline results using current deep learning technologies are presented. The metadata has just been released by LDC (LDC2019S11). It is hoped that this resource will enable the community to build on these baselines so that the extremely important information in these and related oral histories becomes accessible to a wider audience.
@inproceedings{picheny19_interspeech,
  title = {Challenging the Boundaries of Speech Recognition: The MALACH Corpus},
  author = {Michael Picheny and Zoltán Tüske and Brian Kingsbury and Kartik Audhkhasi and Xiaodong Cui and George Saon},
  year = {2019},
  booktitle = {Interspeech 2019},
  pages = {326--330},
  doi = {10.21437/Interspeech.2019-1907},
  issn = {2958-1796},
}
Cite as: Picheny, M., Tüske, Z., Kingsbury, B., Audhkhasi, K., Cui, X., Saon, G. (2019) Challenging the Boundaries of Speech Recognition: The MALACH Corpus. Proc. Interspeech 2019, 326-330, doi: 10.21437/Interspeech.2019-1907