Multimodal affective computing, which learns to recognize and interpret human affect and subjective information from multiple data sources, remains challenging for two reasons: (i) it is hard to extract informative features that represent human affect from heterogeneous inputs; and (ii) current fusion strategies combine modalities only at abstract levels, ignoring time-dependent interactions between them. To address these issues, we introduce a hierarchical multimodal architecture with attention and word-level fusion that classifies utterance-level sentiment and emotion from text and audio data. Our model outperforms state-of-the-art approaches on published datasets, and we show that it can visualize and interpret synchronized attention over modalities.
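To make the abstract's architecture concrete, below is a minimal sketch of word-level fusion with attention over word-aligned text and audio sequences. It is not the authors' implementation: the module name, layer sizes, bidirectional GRU encoders, and the single learned scoring layer used for attention are all illustrative assumptions.

```python
# Minimal sketch (not the authors' code): fuse text and audio features at
# the word level, then attend over the fused word sequence to produce an
# utterance-level prediction. All dimensions are illustrative choices.
import torch
import torch.nn as nn

class WordLevelFusionClassifier(nn.Module):
    def __init__(self, text_dim=300, audio_dim=74, hidden=128, n_classes=2):
        super().__init__()
        # Per-modality encoders over word-aligned feature sequences.
        self.text_rnn = nn.GRU(text_dim, hidden, batch_first=True, bidirectional=True)
        self.audio_rnn = nn.GRU(audio_dim, hidden, batch_first=True, bidirectional=True)
        # Word-level fusion: concatenate the two modalities per word.
        self.fuse = nn.Linear(4 * hidden, hidden)
        # One attention score per fused word vector.
        self.attn = nn.Linear(hidden, 1)
        self.out = nn.Linear(hidden, n_classes)

    def forward(self, text, audio):
        # text:  (batch, words, text_dim)  word embeddings
        # audio: (batch, words, audio_dim) acoustic features aligned per word
        t, _ = self.text_rnn(text)
        a, _ = self.audio_rnn(audio)
        fused = torch.tanh(self.fuse(torch.cat([t, a], dim=-1)))
        weights = torch.softmax(self.attn(fused), dim=1)   # (batch, words, 1)
        utterance = (weights * fused).sum(dim=1)           # attention-pooled
        # Return logits plus per-word weights for visualization.
        return self.out(utterance), weights.squeeze(-1)

if __name__ == "__main__":
    model = WordLevelFusionClassifier()
    logits, attn = model(torch.randn(2, 10, 300), torch.randn(2, 10, 74))
    print(logits.shape, attn.shape)  # torch.Size([2, 2]) torch.Size([2, 10])
```

Returning the per-word attention weights alongside the logits is what enables the kind of synchronized attention visualization the abstract describes.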
@inproceedings{gu-etal-2018-multimodal,
    title = "Multimodal Affective Analysis Using Hierarchical Attention Strategy with Word-Level Alignment",
    author = "Gu, Yue and
      Yang, Kangning and
      Fu, Shiyu and
      Chen, Shuhong and
      Li, Xinyu and
      Marsic, Ivan",
    editor = "Gurevych, Iryna and
      Miyao, Yusuke",
    booktitle = "Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2018",
    address = "Melbourne, Australia",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/P18-1207/",
    doi = "10.18653/v1/P18-1207",
    pages = "2225--2235",
    abstract = "Multimodal affective computing, which learns to recognize and interpret human affect and subjective information from multiple data sources, remains challenging for two reasons: (i) it is hard to extract informative features that represent human affect from heterogeneous inputs; and (ii) current fusion strategies combine modalities only at abstract levels, ignoring time-dependent interactions between them. To address these issues, we introduce a hierarchical multimodal architecture with attention and word-level fusion that classifies utterance-level sentiment and emotion from text and audio data. Our model outperforms state-of-the-art approaches on published datasets, and we show that it can visualize and interpret synchronized attention over modalities."
}
<?xml version="1.0" encoding="UTF-8"?><modsCollection xmlns="http://www.loc.gov/mods/v3"><mods ID="gu-etal-2018-multimodal"> <titleInfo> <title>Multimodal Affective Analysis Using Hierarchical Attention Strategy with Word-Level Alignment</title> </titleInfo> <name type="personal"> <namePart type="given">Yue</namePart> <namePart type="family">Gu</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Kangning</namePart> <namePart type="family">Yang</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Shiyu</namePart> <namePart type="family">Fu</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Shuhong</namePart> <namePart type="family">Chen</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Xinyu</namePart> <namePart type="family">Li</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Ivan</namePart> <namePart type="family">Marsic</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <originInfo> <dateIssued>2018-07</dateIssued> </originInfo> <typeOfResource>text</typeOfResource> <relatedItem type="host"> <titleInfo> <title>Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</title> </titleInfo> <name type="personal"> <namePart type="given">Iryna</namePart> <namePart type="family">Gurevych</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Yusuke</namePart> <namePart type="family">Miyao</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <originInfo> <publisher>Association for Computational Linguistics</publisher> <place> <placeTerm type="text">Melbourne, Australia</placeTerm> </place> </originInfo> <genre authority="marcgt">conference publication</genre> </relatedItem> <abstract>Multimodal affective computing, learning to recognize and interpret human affect and subjective information from multiple data sources, is still a challenge because: (i) it is hard to extract informative features to represent human affects from heterogeneous inputs; (ii) current fusion strategies only fuse different modalities at abstract levels, ignoring time-dependent interactions between modalities. Addressing such issues, we introduce a hierarchical multimodal architecture with attention and word-level fusion to classify utterance-level sentiment and emotion from text and audio data. Our introduced model outperforms state-of-the-art approaches on published datasets, and we demonstrate that our model is able to visualize and interpret synchronized attention over modalities.</abstract> <identifier type="citekey">gu-etal-2018-multimodal</identifier> <identifier type="doi">10.18653/v1/P18-1207</identifier> <location> <url>https://aclanthology.org/P18-1207/</url> </location> <part> <date>2018-07</date> <extent unit="page"> <start>2225</start> <end>2235</end> </extent> </part></mods></modsCollection>
[Multimodal Affective Analysis Using Hierarchical Attention Strategy with Word-Level Alignment](https://aclanthology.org/P18-1207/) (Gu et al., ACL 2018)