Part of the book series:Lecture Notes in Computer Science ((LNISA,volume 11961))
Abstract
We present a new approach to building a crowd knowledge enhanced multimodal conversational system for travel. It aims to assist users in completing various travel-related tasks, such as searching for restaurants or things to do, in a multimodal conversational manner involving both text and images. To achieve this goal, we ground this research on the combination of multimodal understanding and recommendation techniques, exploring the possibility of a more convenient information seeking paradigm. Specifically, we build the system in a modular manner, where the construction of each module is enriched with crowd knowledge from social sites. To the best of our knowledge, this is the first work that attempts to build an intelligent multimodal conversational system for travel, and it takes an important step towards developing human-like assistants for the completion of daily life tasks. Several current challenges are also pointed out as future directions.
References
Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473 (2014)
Bordes, A., Weston, J.: Learning end-to-end goal-oriented dialog. In: The 3rd International Conference on Learning Representations, pp. 1–14 (2016)
Budzianowski, P., et al.: MultiWOZ - a large-scale multi-domain wizard-of-oz dataset for task-oriented dialogue modelling. In: EMNLP, pp. 5016–5026 (2018)
Chaudhuri, S., Ganjam, K., Ganti, V., Motwani, R.: Robust and efficient fuzzy match for online data cleaning. In: SIGMOD, pp. 313–324. ACM (2003)
Chen, Y.N., Wang, W.Y., Rudnicky, A.I.: Leveraging frame semantics and distributional semantics for unsupervised semantic slot induction in spoken dialogue systems. In: 2014 IEEE Spoken Language Technology Workshop, pp. 584–589 (2014)
Ester, M., et al.: A density-based algorithm for discovering clusters in large spatial databases with noise. In: KDD, vol. 96, pp. 226–231 (1996)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Li, R., Kahou, S.E., Schulz, H., Michalski, V., Charlin, L., Pal, C.: Towards deep conversational recommendations. In: NIPS, pp. 9748–9758 (2018)
Liao, L., He, X., Ren, Z., Nie, L., Xu, H., Chua, T.S.: Representativeness-aware aspect analysis for brand monitoring in social media. In: IJCAI, pp. 310–316 (2017)
Liao, L., Takanobu, R., Ma, Y., Yang, X., Huang, M., Chua, T.: Deep conversational recommender in travel. arXiv preprint arXiv:1907.00710 (2019)
Liu, B., Lane, I.: Attention-based recurrent neural network models for joint intent detection and slot filling. arXiv preprint arXiv:1609.01454 (2016)
Madotto, A., Wu, C.S., Fung, P.: Mem2seq: effectively incorporating knowledge bases into end-to-end task-oriented dialog systems. In: ACL, pp. 1468–1478 (2018)
Rieser, V., Lemon, O.: Natural language generation as planning under uncertainty for spoken dialogue systems. In: Krahmer, E., Theune, M. (eds.) EACL/ENLG 2009. LNCS (LNAI), vol. 5790, pp. 105–120. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15573-4_6
Sukhbaatar, S., et al.: End-to-end memory networks. In: NIPS, pp. 2440–2448 (2015)
Sun, Y., Zhang, Y.: Conversational recommender system. In: SIGIR, pp. 235–244 (2018)
Tur, G., Jeong, M., Wang, Y.Y., Hakkani-Tür, D., Heck, L.: Exploiting the semantic web for unsupervised natural language semantic parsing. In: Thirteenth Annual Conference of the International Speech Communication Association (2012)
Wen, T.H., et al.: A network-based end-to-end trainable task-oriented dialogue system. In: EACL, pp. 438–449 (2017)
Yan, Z., Duan, N., Chen, P., Zhou, M., Zhou, J., Li, Z.: Building task-oriented dialogue systems for online shopping. In: AAAI, pp. 4618–4625 (2017)
Author information
Authors and Affiliations
NGS, National University of Singapore, Singapore, Singapore
Lizi Liao & Tat-Seng Chua
FXPAL, Palo Alto, USA
Lyndon Kennedy & Lynn Wilcox
Corresponding author
Correspondence to Lizi Liao.
Editor information
Editors and Affiliations
Korea Advanced Institute of Science and Technology, Daejeon, Korea (Republic of)
Yong Man Ro
National Chiao Tung University, Hsinchu, Taiwan
Wen-Huang Cheng
Korea Advanced Institute of Science and Technology, Daejeon, Korea (Republic of)
Junmo Kim
National Cheng Kung University, Tainan City, Taiwan
Wei-Ta Chu
Tsinghua University, Beijing, China
Peng Cui
Korea Advanced Institute of Science and Technology, Daejeon, Korea (Republic of)
Jung-Woo Choi
National Tsing Hua University, Hsinchu, Taiwan
Min-Chun Hu
Ghent University, Ghent, Belgium
Wesley De Neve
Appendices
Appendix A: State Tracking
State tracking refers to the maintenance of the dialogue state \(\mathcal {S}_t\), which is the representation of the conversation session up to time t. Based on the state \(\mathcal {S}_{t-1}\) from the former time step and the multimodal understanding result \(\mathcal {U}_t\) for the utterance at time step t, the dialogue state is obtained as follows:
$$\mathcal {S}_t = \mathcal {G}(\mathcal {S}_{t-1}, \mathcal {U}_t)$$
where \(\mathcal {G}\) refers to a set of rules. We summarize the rules as follows:
- (1)
if \(\mathcal {M}_t = Chitchat\), then \(\mathcal {S}_t = \mathcal {S}_{t-1}\);
- (2)
if the domain \(\mathcal {D}_t\) has changed, \(\mathcal {S}_t\) is rebuilt entirely from \(\mathcal {U}_t\);
- (3)
if the domain \(\mathcal {D}_t\) is unchanged and \(\mathcal {M}_t \ne Negation\), \(\mathcal {S}_t\) inherits the information stored in \(\mathcal {S}_{t-1}\);
- (4)
if the domain \(\mathcal {D}_t\) is unchanged and \(\mathcal {M}_t = Negation\), \(\mathcal {S}_t\) inherits the information stored in \(\mathcal {S}_{t-1}\) while updating the negated parts according to \(\mathcal {U}_t\);
- (5)
if the time interval between two consecutive utterances exceeds a pre-defined length at time t, \(\mathcal {S}_t\) is cleared.
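As a sketch, the five rules above can be expressed as a small update function. The `Understanding` container, its field names, and the timeout value are illustrative assumptions; the paper does not specify the exact data structures:

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

@dataclass
class Understanding:
    """Hypothetical container for the understanding result U_t (field names assumed)."""
    intent: str                    # M_t, e.g. "Chitchat", "Negation", "Inform"
    domain: Optional[str]          # D_t, e.g. "restaurant"; None if unmentioned
    slots: Dict[str, Optional[str]] = field(default_factory=dict)  # None value = negated

SESSION_TIMEOUT = 600.0  # rule (5): pre-defined interval in seconds (value assumed)

def track_state(prev_state: Dict[str, str], prev_domain: Optional[str],
                u: Understanding, elapsed: float) -> Tuple[Dict[str, str], Optional[str]]:
    """Apply rules (1)-(5) to produce S_t from S_{t-1} and U_t."""
    if elapsed > SESSION_TIMEOUT:              # rule (5): stale session is cleared
        return dict(u.slots), u.domain
    if u.intent == "Chitchat":                 # rule (1): chitchat leaves S_t = S_{t-1}
        return dict(prev_state), prev_domain
    if u.domain is not None and u.domain != prev_domain:
        return dict(u.slots), u.domain         # rule (2): domain switch rebuilds S_t from U_t
    if u.intent == "Negation":                 # rule (4): update only the negated parts
        merged = dict(prev_state)
        for name, value in u.slots.items():
            if value is None:
                merged.pop(name, None)         # user rejected this constraint
            else:
                merged[name] = value           # user corrected this constraint
        return merged, prev_domain
    merged = dict(prev_state)                  # rule (3): inherit S_{t-1}, add new slots
    merged.update(u.slots)
    return merged, prev_domain
```

The rules are checked in priority order, so a long silence resets the session regardless of the utterance content, and a chitchat turn never disturbs the accumulated constraints.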
Tracking dialogue states is key to a good user experience in multi-turn conversation. The main reason we do not follow previous works in learning statistical tracking models is the lack of dialogue data to train them. We leave leveraging session-level labeled dialogue data to improve state tracking as future work.
Appendix B: Action Decision
At each turn of the conversation between user and agent, the dialogue management module takes the current state tracking result as input and outputs the corresponding actions. Due to the lack of large-scale dialogue training data, we also resort to a set of rules. The main action types considered and the conditions that trigger them are as follows:
Proactive Questioning. This action is triggered when (a) a recommendation intent is detected, (b) a domain is detected, and (c) not enough constraints or attributes are detected in \(\mathcal {S}_t\). It is used to obtain more constraints or attributes to narrow down the search space.
Candidate Listing. This action is triggered when recommendation results are obtained or the Show more intent is detected. As each venue in our dataset is associated with Foursquare photos, we implement candidate listing via a list of images where each image corresponds to a venue. In the interface, the user can conveniently choose a venue by simply clicking its corresponding image.
Venue Recommendation. This action is triggered when the intent \(\mathcal {I}_t\) in \(\mathcal {S}_t\) is Recommendation, and it retrieves results from the recommendation module.
Question Answering. This action is triggered when a venue is selected and one of its slot names is detected in \(\mathcal {U}_t\) without a value nearby. It returns the missing attribute value by looking up the venue database.
Review Summary. This action is triggered when a venue is selected and the intent \(\mathcal {I}_t\) in \(\mathcal {S}_t\) is Ask opinion. It summarizes the reviews of the target venue and presents them in an organized form.
API Call. This action is triggered when a venue is selected and the Map direction intent is present in \(\mathcal {I}_t\). The Google Map API is called with the start position and destination. Currently, only the Map API is integrated; however, other APIs such as a weather report can also be integrated with proper modifications.
Chitchat. This action is triggered when no travel venue seeking related intent is detected. As pointed out by [18], nearly 80% of utterances to e-commerce bots are chitchat queries. If the system cannot reply to them, the conversation may not be able to continue. Thus, the system activates chitchat response generation to obtain a reply.
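The trigger conditions above can be sketched as a single decision function. The state keys (`intent`, `venue`, `queried_slot`, and so on) and the minimum-constraint threshold are illustrative assumptions; the paper describes the conditions only in prose:

```python
def decide_action(state: dict) -> str:
    """Map the tracked dialogue state S_t to one of the agent action types."""
    intent = state.get("intent")
    venue = state.get("venue")                 # currently selected venue, if any
    slots = state.get("slots", {})

    if venue is not None:
        if intent == "Ask opinion":
            return "Review Summary"            # summarize reviews of the target venue
        if intent == "Map direction":
            return "API Call"                  # e.g. call the Google Map API
        if state.get("queried_slot") not in (None, *slots):
            return "Question Answering"        # slot name mentioned without a value
    if intent == "Show more" or state.get("has_results"):
        return "Candidate Listing"             # show a clickable list of venue images
    if intent == "Recommendation":
        if state.get("domain") and len(slots) < 2:   # too few constraints (threshold assumed)
            return "Proactive Questioning"
        return "Venue Recommendation"
    return "Chitchat"                          # no travel-venue-seeking intent detected
```

Venue-specific actions are checked first, so once a venue is selected, follow-up questions about it take precedence over launching a new recommendation.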
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Liao, L., Kennedy, L., Wilcox, L., Chua, T.S. (2020). Crowd Knowledge Enhanced Multimodal Conversational Assistant in Travel Domain. In: Ro, Y., et al. (eds.) MultiMedia Modeling. MMM 2020. Lecture Notes in Computer Science, vol 11961. Springer, Cham. https://doi.org/10.1007/978-3-030-37731-1_33
Publisher Name:Springer, Cham
Print ISBN:978-3-030-37730-4
Online ISBN:978-3-030-37731-1
eBook Packages:Computer ScienceComputer Science (R0)