Semantic-Adversarial Graph Convolutional Network for Zero-Shot Cross-Modal Retrieval

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13630)

Abstract

Traditional cross-modal retrieval (CMR) methods assume that the training data covers all the categories that appear in the retrieval stage. However, when multimodal data of new categories arrive, the learned model may perform poorly. Building on the theory of zero-shot learning, zero-shot cross-modal retrieval (ZS-CMR) has emerged to address this problem and has become a new research topic. Existing ZS-CMR methods have the following limitations. (1) The semantic association between seen and unseen categories is important but ignored, so semantic knowledge cannot be fully transferred from seen classes to unseen classes. (2) The cross-modal representations are not semantically aligned, so samples of new categories cannot obtain semantic representations, which in turn leads to unsatisfactory retrieval results. To tackle these problems, this paper proposes the semantic-adversarial graph convolutional network (SAGCN) for ZS-CMR. Specifically, a graph convolutional network is introduced to mine the latent relationships between categories. In addition, adversarial learning and semantic similarity reconstruction are employed to learn a common space in which multimodal embeddings and class embeddings are semantically fused. Finally, a shared classifier is adopted to enhance the discriminative ability of the common space. Experiments on three datasets demonstrate the effectiveness of SAGCN on both traditional CMR and ZS-CMR tasks.
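
The abstract names four building blocks: a GCN over the category graph, modality encoders mapped into a common space, an adversarial modality discriminator, semantic similarity reconstruction, and a shared classifier. The sketch below shows how these pieces could fit together in PyTorch; every class name, dimension, and loss form is an illustrative assumption, not the authors' released implementation.

# A minimal, hypothetical PyTorch sketch of the components named in the
# abstract. All names, dimensions, and loss forms are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphConvolution(nn.Module):
    # One GCN layer: propagate class word vectors over a pre-normalized
    # category adjacency a_hat, e.g. D^{-1/2}(A + I)D^{-1/2}.
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.weight = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, x, a_hat):
        return F.relu(a_hat @ self.weight(x))

class Encoder(nn.Module):
    # Maps one modality's features into the shared common space.
    def __init__(self, in_dim, common_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 1024), nn.ReLU(),
                                 nn.Linear(1024, common_dim))

    def forward(self, x):
        return self.net(x)

# Assumed feature sizes: 4096-d image features, 300-d text/word vectors.
img_enc, txt_enc = Encoder(4096, 300), Encoder(300, 300)
gcn = GraphConvolution(300, 300)
disc = nn.Sequential(nn.Linear(300, 64), nn.ReLU(), nn.Linear(64, 1))

def losses(img_feat, txt_feat, labels, word_vecs, a_hat):
    v, t = img_enc(img_feat), txt_enc(txt_feat)
    cls_emb = gcn(word_vecs, a_hat)               # (C, 300) class embeddings
    # Shared classifier: class embeddings double as classifier weights,
    # semantically fusing multimodal embeddings and class embeddings.
    l_cls = (F.cross_entropy(v @ cls_emb.T, labels) +
             F.cross_entropy(t @ cls_emb.T, labels))
    # Adversarial loss: the discriminator separates modalities while the
    # encoders are trained against it (e.g. via a gradient-reversal layer).
    logits = torch.cat([disc(v), disc(t)]).squeeze(-1)
    target = torch.cat([torch.ones(len(v)), torch.zeros(len(t))])
    l_adv = F.binary_cross_entropy_with_logits(logits, target)
    # Semantic similarity reconstruction: cross-modal similarities should
    # match the similarities of the corresponding class embeddings.
    s_pred = F.normalize(v) @ F.normalize(t).T
    s_true = F.normalize(cls_emb[labels]) @ F.normalize(cls_emb[labels]).T
    return l_cls, l_adv, F.mse_loss(s_pred, s_true)

Under these assumptions, the zero-shot transfer mechanism would be that class embeddings come from the GCN over word vectors, so embeddings for unseen categories can be produced at test time through the category graph without retraining.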

Acknowledgements

This work is supported in part by the Key-Area Research and Development Program of Guangdong Province under Grant 2020B010166006, in part by the National Natural Science Foundation of China under Grants 62176065, 62176066, 62202107, and 61972102, and in part by the Natural Science Foundation of Guangdong Province under Grants 2019A1515011811 and 2021A1515012017.

Author information

Authors and Affiliations

  1. School of Computer Science and Technology, Guangdong University of Technology, Guangzhou, China

    Chuang Li, Lunke Fei, Peipei Kang & Shaohua Teng

  2. School of Automation, Guangdong University of Technology, Guangzhou, China

    Jiahao Liang & Xiaozhao Fang

Corresponding author

Correspondence to Peipei Kang.

Editor information

Editors and Affiliations

  1. CSIRO Australian e-Health Research Centre, Brisbane, QLD, Australia

    Sankalp Khanna

  2. Shanghai Jiao Tong University, Shanghai, China

    Jian Cao

  3. University of Tasmania, Hobart, TAS, Australia

    Quan Bai

  4. University of Technology Sydney, Sydney, NSW, Australia

    Guandong Xu

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Li, C., Fei, L., Kang, P., Liang, J., Fang, X., Teng, S. (2022). Semantic-Adversarial Graph Convolutional Network for Zero-Shot Cross-Modal Retrieval. In: Khanna, S., Cao, J., Bai, Q., Xu, G. (eds) PRICAI 2022: Trends in Artificial Intelligence. PRICAI 2022. Lecture Notes in Computer Science, vol 13630. Springer, Cham. https://doi.org/10.1007/978-3-031-20865-2_34
