
On-the-fly topic adaptation for YouTube video transcription

Kapil Thadani, Fadi Biadsy, Dan Bikel

Automatic closed-captioning of video is a useful application of speech recognition technology but poses numerous challenges when applied to open-domain user-uploaded videos such as those on YouTube. In this work, we explore a strategy to improve decoding accuracy for video transcription by decoding each video with a language model (LM) adapted specifically to the topics that the video covers. Taxonomic topic classifiers are used to determine the topic content of videos and to build a large set of topic-specific LMs from web documents. We consider strategies for selecting and interpolating LMs in both supervised and unsupervised scenarios in a two-pass lattice rescoring framework. Experiments on a YouTube video corpus show a 10% relative reduction in word error rate (WER) over generic single-pass transcriptions, as well as a statistically significant 2.5% reduction over rescoring with a very large non-adapted LM built from all the documents.
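The abstract does not spell out how topic LMs are selected or combined. Purely as an illustration, the Python sketch below assumes linear interpolation of a generic LM with the top-k topic LMs, weighted by renormalized topic-classifier scores, and uses a short n-best list in place of a lattice for the second pass. The toy bigram tables, the probability floor, the fixed generic weight, and all function names (`select_topic_weights`, `interpolated_prob`, `lm_logprob`, `rescore`) are hypothetical and are not the authors' implementation.

```python
import math
from typing import Dict, List, Tuple

# Sketch under stated assumptions: linear interpolation, top-k topic selection,
# and n-best (not lattice) rescoring. Hypothetical toy LM: (prev, word) -> prob.
BigramLM = Dict[Tuple[str, str], float]

FLOOR = 1e-6  # assumed probability for bigrams missing from a toy LM


def select_topic_weights(topic_scores: Dict[str, float], k: int = 2) -> Dict[str, float]:
    """Keep the k highest-scoring topics and renormalize their scores to sum to 1."""
    top = sorted(topic_scores.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(score for _, score in top)
    return {topic: score / total for topic, score in top}


def interpolated_prob(prev: str, word: str, generic: BigramLM,
                      topic_lms: Dict[str, BigramLM],
                      topic_weights: Dict[str, float],
                      generic_weight: float = 0.5) -> float:
    """P = g * P_generic + (1 - g) * sum_t w_t * P_t (linear interpolation)."""
    p = generic_weight * generic.get((prev, word), FLOOR)
    for topic, w in topic_weights.items():
        p += (1.0 - generic_weight) * w * topic_lms[topic].get((prev, word), FLOOR)
    return p


def lm_logprob(words: List[str], **lm_args) -> float:
    """Sum of interpolated bigram log-probabilities, starting from <s>."""
    logp, prev = 0.0, "<s>"
    for word in words:
        logp += math.log(interpolated_prob(prev, word, **lm_args))
        prev = word
    return logp


def rescore(nbest: List[Tuple[List[str], float]], lm_scale: float = 1.0,
            **lm_args) -> List[str]:
    """Second pass: pick the hypothesis maximizing acoustic score + lm_scale * LM score."""
    best = max(nbest, key=lambda h: h[1] + lm_scale * lm_logprob(h[0], **lm_args))
    return best[0]


# Toy example: the acoustic model slightly prefers "gold", the generic LM agrees,
# but a video classified mostly as "sports" shifts the decision to "goal".
generic = {("<s>", "the"): 0.2, ("the", "goal"): 0.01, ("the", "gold"): 0.02}
topic_lms = {
    "sports": {("<s>", "the"): 0.2, ("the", "goal"): 0.2, ("the", "gold"): 0.01},
    "finance": {("<s>", "the"): 0.2, ("the", "goal"): 0.01, ("the", "gold"): 0.1},
}
weights = select_topic_weights({"sports": 0.8, "finance": 0.2}, k=2)
nbest = [(["the", "gold"], -10.0), (["the", "goal"], -10.2)]

adapted_best = rescore(nbest, generic=generic, topic_lms=topic_lms, topic_weights=weights)
generic_best = rescore(nbest, generic=generic, topic_lms=topic_lms,
                       topic_weights=weights, generic_weight=1.0)
print(adapted_best)  # ['the', 'goal']
print(generic_best)  # ['the', 'gold']
```

Setting `generic_weight=1.0` reduces the sketch to rescoring with the non-adapted LM alone, which is why the two runs disagree. In a real system, `generic_weight`, `k`, and `lm_scale` would presumably be tuned on held-out data.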

@inproceedings{thadani12_interspeech,
  title     = {On-the-fly topic adaptation for YouTube video transcription},
  author    = {Kapil Thadani and Fadi Biadsy and Dan Bikel},
  year      = {2012},
  booktitle = {Interspeech 2012},
  pages     = {210--213},
  doi       = {10.21437/Interspeech.2012-69},
  issn      = {2958-1796},
}

Cite as: Thadani, K., Biadsy, F., Bikel, D. (2012) On-the-fly topic adaptation for YouTube video transcription. Proc. Interspeech 2012, 210-213, doi: 10.21437/Interspeech.2012-69

doi:10.21437/Interspeech.2012-69