We propose a fully convolutional sequence-to-sequence encoder architecture with a simple and efficient decoder. Our model improves WER on LibriSpeech while being an order of magnitude more efficient than a strong RNN baseline. Key to our approach is a time-depth separable convolution block which dramatically reduces the number of parameters in the model while keeping the receptive field large. We also give a stable and efficient beam search inference procedure which allows us to effectively integrate a language model. Coupled with a convolutional language model, our time-depth separable convolution architecture improves by more than 22% relative WER over the best previously reported sequence-to-sequence results on the noisy LibriSpeech test set.
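As a rough illustration of the time-depth separable idea described in the abstract, the sketch below shows one plausible TDS-style block in PyTorch: a 2D convolution whose kernel spans time only (keeping the channels-by-width view of the features), followed by a point-wise fully connected sub-block, each with a residual connection and layer norm. The class name TDSBlock, the dropout and layer-norm placement, and the example sizes are illustrative assumptions, not the authors' exact implementation.

import torch
import torch.nn as nn

class TDSBlock(nn.Module):
    """Illustrative time-depth separable (TDS) style block (a sketch, not the
    authors' exact code). Input shape: (batch, channels c, width w, time T)."""

    def __init__(self, c: int, w: int, kernel_size: int, dropout: float = 0.1):
        super().__init__()
        # 2D convolution over time only: kernel is 1 across width and k across
        # time, costing roughly k * c^2 parameters rather than the k * (c*w)^2
        # of a full 1D convolution over all features.
        self.conv = nn.Sequential(
            nn.Conv2d(c, c, kernel_size=(1, kernel_size),
                      padding=(0, kernel_size // 2)),
            nn.ReLU(),
            nn.Dropout(dropout),
        )
        d = c * w
        # Fully connected sub-block applied point-wise over the c*w features.
        self.fc = nn.Sequential(
            nn.Linear(d, d), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(d, d), nn.Dropout(dropout),
        )
        self.norm1 = nn.LayerNorm(d)
        self.norm2 = nn.LayerNorm(d)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, c, w, T = x.shape
        # Convolution over time with a residual connection, then layer norm
        # over the flattened (c * w) feature dimension.
        y = self.conv(x) + x
        y = y.permute(0, 3, 1, 2).reshape(B, T, c * w)
        y = self.norm1(y)
        # Point-wise feed-forward sub-block with residual and layer norm.
        y = self.norm2(self.fc(y) + y)
        # Restore the (batch, c, w, T) layout for the next block.
        return y.reshape(B, T, c, w).permute(0, 2, 3, 1)

# Example (hypothetical sizes): 10 channels over a width of 80 filterbank
# features, a kernel of 21 frames, and 200 time steps.
block = TDSBlock(c=10, w=80, kernel_size=21)
out = block(torch.randn(4, 10, 80, 200))   # -> shape (4, 10, 80, 200)

Because the convolution mixes information over time only and the fully connected layers mix features point-wise, the block keeps a large receptive field over time while holding the parameter count far below that of a full convolution over all features.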
@inproceedings{hannun19_interspeech,
  title     = {Sequence-to-Sequence Speech Recognition with Time-Depth Separable Convolutions},
  author    = {Awni Hannun and Ann Lee and Qiantong Xu and Ronan Collobert},
  year      = {2019},
  booktitle = {Interspeech 2019},
  pages     = {3785--3789},
  doi       = {10.21437/Interspeech.2019-2460},
  issn      = {2958-1796},
}
Cite as: Hannun, A., Lee, A., Xu, Q., Collobert, R. (2019) Sequence-to-Sequence Speech Recognition with Time-Depth Separable Convolutions. Proc. Interspeech 2019, 3785-3789, doi: 10.21437/Interspeech.2019-2460