Towards End-to-End Spoken Dialogue Systems with Turn Embeddings

Ali Orkan Bayer, Evgeny A. Stepanov, Giuseppe Riccardi

Training task-oriented dialogue systems requires a significant amount of manual effort and the integration of many independently built components; moreover, the pipeline is prone to error propagation. End-to-end training has been proposed to overcome these problems by training the whole system over the utterances of both dialogue parties. In this paper we present an end-to-end spoken dialogue system architecture that is based on turn embeddings. Turn embeddings encode a robust representation of user turns with a local dialogue history and they are trained using sequence-to-sequence models. Turn embeddings are trained by generating the previous and the next turns of the dialogue and additionally performing spoken language understanding. The end-to-end spoken dialogue system is trained using the pre-trained turn embeddings in a stateful architecture that considers the whole dialogue history. We observe that the proposed spoken dialogue system architecture outperforms the models based on local-only dialogue history and that it is robust to automatic speech recognition errors.

@inproceedings{bayer17_interspeech,
  title     = {Towards End-to-End Spoken Dialogue Systems with Turn Embeddings},
  author    = {Ali Orkan Bayer and Evgeny A. Stepanov and Giuseppe Riccardi},
  year      = {2017},
  booktitle = {Interspeech 2017},
  pages     = {2516--2520},
  doi       = {10.21437/Interspeech.2017-1574},
  issn      = {2958-1796},
}

Cite as: Bayer, A.O., Stepanov, E.A., Riccardi, G. (2017) Towards End-to-End Spoken Dialogue Systems with Turn Embeddings. Proc. Interspeech 2017, 2516-2520, doi: 10.21437/Interspeech.2017-1574

doi:10.21437/Interspeech.2017-1574