Detecting the presence of quotations in speech is a difficult task for automatic natural language understanding. This paper studies the correlation between three prosodic features of a voice command and the presence or absence of quotations. The features are intra-word pause durations, F0 reset and F0 continuity. They were extracted with a combination of lexical and prosodic extraction tools, and the two-sample Kolmogorov-Smirnov test was then used to compare the distributions of the collected measures. The results show a correlation between these features and the presence or absence of quotations. They also show that the features can be used to differentiate direct from indirect quotations.
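The abstract does not say how the test was computed, so the following is only a minimal sketch of a two-sample Kolmogorov-Smirnov comparison in plain Python, not the authors' implementation. The function name `ks_two_sample` and the toy pause durations are hypothetical, and the p-value relies on the asymptotic Kolmogorov series with a common small-sample correction, so it is approximate for small samples.

```python
import math
from bisect import bisect_right


def ks_two_sample(a, b):
    """Return (D, approximate p-value) for the two-sample KS test.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    xs, ys = sorted(a), sorted(b)
    n, m = len(xs), len(ys)

    # D is the largest gap between the two empirical CDFs,
    # checked at every observed value of either sample.
    d = max(
        abs(bisect_right(xs, v) / n - bisect_right(ys, v) / m)
        for v in xs + ys
    )

    # Asymptotic p-value (approximation; assumption of this sketch).
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.2:
        # The series converges slowly here and the p-value is ~1 anyway.
        return d, 1.0
    p = 2.0 * sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        for k in range(1, 101)
    )
    return d, min(max(p, 0.0), 1.0)


if __name__ == "__main__":
    # Invented pause durations in milliseconds, for illustration only.
    with_quotation = [310, 280, 350, 400, 330, 290]
    without_quotation = [120, 150, 90, 200, 140, 160]
    d, p = ks_two_sample(with_quotation, without_quotation)
    print(f"D = {d:.3f}, approximate p = {p:.4f}")
```

A large D with a small p-value would indicate that the two sets of measures are unlikely to come from the same distribution, which is the kind of evidence the abstract reports for the prosodic features.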
@inproceedings{boutin15_interspeech,
  title     = {Audio quotation marks for natural language understanding},
  author    = {Simon Boutin and Réal Tremblay and Patrick Cardinal and Doug Peters and Pierre Dumouchel},
  year      = {2015},
  booktitle = {Interspeech 2015},
  pages     = {1349--1352},
  doi       = {10.21437/Interspeech.2015-46},
  issn      = {2958-1796},
}
Cite as: Boutin, S., Tremblay, R., Cardinal, P., Peters, D., Dumouchel, P. (2015) Audio quotation marks for natural language understanding. Proc. Interspeech 2015, 1349-1352, doi: 10.21437/Interspeech.2015-46