Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12431)
Included in the following conference series: Natural Language Processing and Chinese Computing (NLPCC)
Abstract
We focus on the task of Automatic Live Video Commenting (ALVC), which aims to generate real-time video comments with both video frames and other viewers' comments as inputs. A major challenge in this task is how to properly leverage the rich and diverse information carried by video and text. In this paper, we aim to collect diversified information from video and text for informative comment generation. To achieve this, we propose a Diversified Co-Attention (DCA) model for this task. Our model builds bidirectional interactions between video frames and surrounding comments from multiple perspectives via metric learning, to collect a diversified and informative context for comment generation. We also propose an effective parameter orthogonalization technique to avoid excessive overlap of information learned from different perspectives. Results show that our approach outperforms existing methods in the ALVC task, achieving new state-of-the-art results.
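The abstract names two techniques: co-attention computed from multiple perspectives via metric learning, and parameter orthogonalization to keep those perspectives from learning redundant information. The sketch below is a minimal, hypothetical PyTorch rendering of those two ideas, not the authors' released implementation; every name in it (`DiversifiedCoAttention`, `n_perspectives`, `orthogonality_penalty`) is illustrative, and the bilinear-metric attention and Gram-matrix penalty are one plausible reading of the abstract's description.

```python
# Hedged sketch of the two ideas the abstract names, NOT the authors' code:
# (1) co-attention scored under several learned bilinear metrics
#     ("perspectives"), and (2) a penalty that pushes the metrics of
#     different perspectives toward mutual orthogonality.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DiversifiedCoAttention(nn.Module):
    def __init__(self, dim: int, n_perspectives: int = 4):
        super().__init__()
        # One learnable metric W_k per perspective; initialized near identity
        # so early training behaves like plain dot-product attention.
        self.metrics = nn.Parameter(torch.stack(
            [torch.eye(dim) + 0.01 * torch.randn(dim, dim)
             for _ in range(n_perspectives)]))
        # Shared fusion layer for brevity; separate video/text projections
        # would be an equally plausible choice.
        self.out = nn.Linear(n_perspectives * dim, dim)

    def forward(self, video: torch.Tensor, text: torch.Tensor):
        # video: (B, Tv, D) frame features; text: (B, Tt, D) comment features
        v_ctx, t_ctx = [], []
        for W in self.metrics:  # one bidirectional pass per perspective
            scores = video @ W @ text.transpose(1, 2)          # (B, Tv, Tt)
            v_ctx.append(F.softmax(scores, dim=-1) @ text)     # text -> video
            t_ctx.append(
                F.softmax(scores.transpose(1, 2), dim=-1) @ video)  # video -> text
        # Concatenate per-perspective contexts and fuse back to D dims.
        return (self.out(torch.cat(v_ctx, dim=-1)),
                self.out(torch.cat(t_ctx, dim=-1)))

    def orthogonality_penalty(self) -> torch.Tensor:
        # Penalize overlap between perspectives: squared off-diagonal
        # entries of the Gram matrix of the flattened, normalized metrics.
        flat = F.normalize(self.metrics.flatten(1), dim=-1)    # (K, D*D)
        gram = flat @ flat.t()                                 # (K, K)
        eye = torch.eye(gram.size(0), device=gram.device)
        return ((gram - eye) ** 2).sum()
```

In training, the penalty would simply be added to the generation loss with a small weight, e.g. `loss = nll + 0.01 * model.orthogonality_penalty()`; the weight 0.01 is an illustrative value, not one taken from the paper.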
Notes
- 1. We concatenate all surrounding comments into a single sequence \(\textit{\textbf{x}}\).
Author information
Authors and Affiliations
School of Electronic Engineering and Computer Science, Peking University, Beijing, China
Zhihan Zhang, Zhiyi Yin & Shicheng Li
School of Software Engineering, Huazhong University of Science and Technology, Wuhan, China
Shuhuai Ren
College of Software, Beijing University of Aeronautics and Astronautics, Beijing, China
Xinhang Li
Corresponding author
Correspondence to Zhihan Zhang.
Editor information
Editors and Affiliations
ECE & Ingenuity Labs Research Institute, Queen’s University, Kingston, ON, Canada
Xiaodan Zhu
Department of Computer Science and Technology, Tsinghua University, Beijing, China
Min Zhang
School of Computer Science and Technology, Soochow University, Suzhou, China
Yu Hong
College of Intelligence and Computing, Tianjin University, Tianjin, China
Ruifang He
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Zhang, Z., Yin, Z., Ren, S., Li, X., Li, S. (2020). DCA: Diversified Co-attention Towards Informative Live Video Commenting. In: Zhu, X., Zhang, M., Hong, Y., He, R. (eds) Natural Language Processing and Chinese Computing. NLPCC 2020. Lecture Notes in Computer Science (LNAI), vol 12431. Springer, Cham. https://doi.org/10.1007/978-3-030-60457-8_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60456-1
Online ISBN: 978-3-030-60457-8
eBook Packages: Computer Science, Computer Science (R0)