Training task-oriented dialogue systems requires a significant amount of manual effort and the integration of many independently built components; moreover, the pipeline is prone to error propagation. End-to-end training has been proposed to overcome these problems by training the whole system over the utterances of both dialogue parties. In this paper we present an end-to-end spoken dialogue system architecture that is based on turn embeddings. Turn embeddings encode a robust representation of user turns with a local dialogue history, and they are trained using sequence-to-sequence models. Turn embeddings are trained by generating the previous and the next turns of the dialogue and by additionally performing spoken language understanding. The end-to-end spoken dialogue system is trained using the pre-trained turn embeddings in a stateful architecture that considers the whole dialogue history. We observe that the proposed spoken dialogue system architecture outperforms models based on local-only dialogue history and is robust to automatic speech recognition errors.
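To make the turn-embedding idea concrete, the following is a minimal sketch (an assumed PyTorch formulation, not the authors' released code) of a multi-task model in the spirit the abstract describes: a shared encoder produces a turn embedding from the current user turn, and that embedding conditions two sequence-to-sequence decoders (previous-turn and next-turn generation) plus a per-token spoken language understanding (SLU) tagging head. All module names and dimensions are illustrative assumptions.

import torch
import torch.nn as nn

class TurnEmbeddingModel(nn.Module):
    """Hypothetical multi-task turn-embedding model (sketch only)."""
    def __init__(self, vocab_size, n_slu_tags, emb_dim=128, hid_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Encoder over the current user turn; its final hidden state
        # is used as the turn embedding.
        self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        # Two sequence-to-sequence decoders, both initialized from the
        # turn embedding: one generates the previous turn, one the next.
        self.prev_decoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.next_decoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.gen_out = nn.Linear(hid_dim, vocab_size)
        # SLU head: per-token concept tags over the encoder outputs.
        self.slu_out = nn.Linear(hid_dim, n_slu_tags)

    def forward(self, turn_ids, prev_ids, next_ids):
        # turn_ids / prev_ids / next_ids: (batch, seq_len) token indices;
        # prev_ids and next_ids are used as teacher-forced decoder inputs.
        enc_out, turn_emb = self.encoder(self.embed(turn_ids))  # turn_emb: (1, B, H)
        prev_logits = self.gen_out(
            self.prev_decoder(self.embed(prev_ids), turn_emb)[0])
        next_logits = self.gen_out(
            self.next_decoder(self.embed(next_ids), turn_emb)[0])
        slu_logits = self.slu_out(enc_out)  # per-token SLU tag scores
        return turn_emb.squeeze(0), prev_logits, next_logits, slu_logits

In such a setup, the three heads would be trained jointly with cross-entropy losses; the pre-trained encoder's turn embeddings could then feed a stateful (whole-history) dialogue model, as the abstract outlines.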
@inproceedings{bayer17_interspeech,
  title     = {Towards End-to-End Spoken Dialogue Systems with Turn Embeddings},
  author    = {Ali Orkan Bayer and Evgeny A. Stepanov and Giuseppe Riccardi},
  year      = {2017},
  booktitle = {Interspeech 2017},
  pages     = {2516--2520},
  doi       = {10.21437/Interspeech.2017-1574},
  issn      = {2958-1796},
}
Cite as: Bayer, A.O., Stepanov, E.A., Riccardi, G. (2017) Towards End-to-End Spoken Dialogue Systems with Turn Embeddings. Proc. Interspeech 2017, 2516-2520, doi: 10.21437/Interspeech.2017-1574