
DCA: Diversified Co-attention Towards Informative Live Video Commenting

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12431)

Abstract

We focus on the task of Automatic Live Video Commenting (ALVC), which aims to generate real-time video comments with both video frames and other viewers’ comments as inputs. A major challenge in this task is how to properly leverage the rich and diverse information carried by video and text. In this paper, we aim to collect diversified information from video and text for informative comment generation. To achieve this, we propose a Diversified Co-Attention (DCA) model for this task. Our model builds bidirectional interactions between video frames and surrounding comments from multiple perspectives via metric learning, to collect a diversified and informative context for comment generation. We also propose an effective parameter orthogonalization technique to avoid excessive overlap of information learned from different perspectives. Results show that our approach outperforms existing methods in the ALVC task, achieving new state-of-the-art results.
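The abstract names two mechanisms: co-attention between frames and comments computed under several independently learned similarity metrics, and a parameter orthogonalization that keeps those metrics from collapsing onto the same perspective. The sketch below illustrates one plausible reading of both ideas; it is not the authors' released implementation, and the shapes, names, and exact form of the penalty are all assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DiversifiedCoAttention(nn.Module):
    """Co-attention over K learned bilinear metrics, with an orthogonality
    penalty that discourages the metrics from encoding the same perspective.
    A minimal sketch, not the paper's implementation."""

    def __init__(self, d_video: int, d_text: int, d_model: int, n_perspectives: int = 4):
        super().__init__()
        self.proj_v = nn.Linear(d_video, d_model)   # frame features -> shared space
        self.proj_t = nn.Linear(d_text, d_model)    # comment tokens -> shared space
        # One learnable metric W_k per perspective; similarity = V W_k T^T.
        init = torch.eye(d_model).repeat(n_perspectives, 1, 1)
        self.metrics = nn.Parameter(init + 0.01 * torch.randn_like(init))

    def forward(self, video: torch.Tensor, text: torch.Tensor):
        # video: (B, Lv, d_video), text: (B, Lt, d_text)
        V = self.proj_v(video)                       # (B, Lv, d_model)
        T = self.proj_t(text)                        # (B, Lt, d_model)
        ctx_v, ctx_t = [], []
        for W in self.metrics:                       # one affinity map per perspective
            S = V @ W @ T.transpose(1, 2)            # (B, Lv, Lt) bilinear similarity
            ctx_v.append(F.softmax(S, dim=-1) @ T)   # comment-aware video context
            ctx_t.append(F.softmax(S.transpose(1, 2), dim=-1) @ V)  # video-aware text context
        # Concatenate per-perspective contexts for a downstream comment decoder.
        return torch.cat(ctx_v, dim=-1), torch.cat(ctx_t, dim=-1)

    def orthogonality_penalty(self) -> torch.Tensor:
        # Push the Gram matrix of the flattened metrics toward the identity,
        # so each perspective retains information the others do not.
        K = self.metrics.size(0)
        flat = F.normalize(self.metrics.view(K, -1), dim=-1)  # (K, d*d)
        gram = flat @ flat.t()                                # (K, K)
        return ((gram - torch.eye(K, device=gram.device)) ** 2).sum()

In training, the penalty would be added to the decoder's generation loss with a small weight, e.g. loss = nll + 0.01 * model.orthogonality_penalty(); the paper's actual penalty form and weight are not given in this excerpt.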


Notes

  1. We concatenate all surrounding comments into a single sequence \(\textit{\textbf{x}}\).


Author information

Authors and Affiliations

  1. School of Electronic Engineering and Computer Science, Peking University, Beijing, China

    Zhihan Zhang, Zhiyi Yin & Shicheng Li

  2. School of Software Engineering, Huazhong University of Science and Technology, Wuhan, China

    Shuhuai Ren

  3. College of Software, Beijing University of Aeronautics and Astronautics, Beijing, China

    Xinhang Li

Authors
  1. Zhihan Zhang
  2. Zhiyi Yin
  3. Shuhuai Ren
  4. Xinhang Li
  5. Shicheng Li

Corresponding author

Correspondence to Zhihan Zhang.

Editor information

Editors and Affiliations

  1. ECE & Ingenuity Labs Research Institute, Queen’s University, Kingston, ON, Canada

    Xiaodan Zhu

  2. Department of Computer Science and Technology, Tsinghua University, Beijing, China

    Min Zhang

  3. School of Computer Science and Technology, Soochow University, Suzhou, China

    Yu Hong

  4. College of Intelligence and Computing, Tianjin University, Tianjin, China

    Ruifang He

Rights and permissions

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Zhang, Z., Yin, Z., Ren, S., Li, X., Li, S. (2020). DCA: Diversified Co-attention Towards Informative Live Video Commenting. In: Zhu, X., Zhang, M., Hong, Y., He, R. (eds) Natural Language Processing and Chinese Computing. NLPCC 2020. Lecture Notes in Computer Science (LNAI), vol. 12431. Springer, Cham. https://doi.org/10.1007/978-3-030-60457-8_1
