Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13630)
Included in the following conference series: Pacific Rim International Conference on Artificial Intelligence (PRICAI)
Abstract
Traditional cross-modal retrieval (CMR) methods assume that the training data contain all the categories that appear at the retrieval stage. However, when multimodal data from new categories arrive, the learned model may perform poorly. Building on the theory of zero-shot learning, zero-shot cross-modal retrieval (ZS-CMR) has emerged to address this problem and has become a new research topic. Existing ZS-CMR methods have the following limitations. (1) The semantic association between seen and unseen categories is important but ignored, so semantic knowledge cannot be fully transferred from seen classes to unseen classes. (2) The cross-modal representations are not semantically aligned, so samples of new categories cannot obtain semantic representations, which further leads to unsatisfactory retrieval results. To tackle these problems, this paper proposes the semantic-adversarial graph convolutional network (SAGCN) for ZS-CMR. Specifically, a graph convolutional network is introduced to mine the potential relationships between categories. In addition, adversarial learning and semantic similarity reconstruction are used to learn a common space in which multimodal embeddings and class embeddings are semantically fused. Finally, a shared classifier is adopted to enhance the discriminative ability of the common space. Experiments on three datasets demonstrate the effectiveness of SAGCN on both traditional CMR and ZS-CMR tasks.
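To make the components named in the abstract concrete, the following PyTorch-style sketch shows how class word vectors might be propagated over a class-affinity graph by a two-layer GCN, fused with image and text features in a common space, and paired with a small modality discriminator for the adversarial part. This is an illustrative sketch under stated assumptions, not the authors' implementation: the cosine-similarity adjacency, all layer sizes, and names such as `ClassEmbeddingGCN` and `ModalityDiscriminator` are assumptions.

```python
# Illustrative sketch only (not the authors' released code). It assumes
# word2vec class-name vectors, a cosine-similarity class graph, and
# VGG-style 4096-d image features; every layer size and name is a guess.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GCNLayer(nn.Module):
    """One graph convolution: H' = A_hat @ H @ W."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, h, a_hat):
        return self.linear(a_hat @ h)


class ClassEmbeddingGCN(nn.Module):
    """Propagates class word vectors over the class graph to obtain
    class embeddings in the common space."""
    def __init__(self, word_dim=300, hidden_dim=512, common_dim=256):
        super().__init__()
        self.gcn1 = GCNLayer(word_dim, hidden_dim)
        self.gcn2 = GCNLayer(hidden_dim, common_dim)

    def forward(self, word_vecs, a_hat):
        h = F.relu(self.gcn1(word_vecs, a_hat))
        return self.gcn2(h, a_hat)


class ModalityDiscriminator(nn.Module):
    """Adversarial piece: predicts whether a common-space embedding came
    from the image or the text branch; the branches learn to fool it."""
    def __init__(self, common_dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(common_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, z):
        return self.net(z)  # modality logit


def normalized_adjacency(word_vecs):
    """Class-affinity graph from word-vector cosine similarity,
    symmetrically normalized: D^{-1/2} (A + I) D^{-1/2}."""
    sim = F.cosine_similarity(word_vecs.unsqueeze(1), word_vecs.unsqueeze(0), dim=-1)
    a = sim.clamp(min=0) + torch.eye(sim.size(0))
    d_inv_sqrt = a.sum(dim=1).pow(-0.5)
    return d_inv_sqrt.unsqueeze(1) * a * d_inv_sqrt.unsqueeze(0)


if __name__ == "__main__":
    n_classes, word_dim = 10, 300
    word_vecs = torch.randn(n_classes, word_dim)       # stand-in for word2vec class vectors
    a_hat = normalized_adjacency(word_vecs)
    class_emb = ClassEmbeddingGCN()(word_vecs, a_hat)  # (10, 256) class embeddings

    img_proj = nn.Linear(4096, 256)                    # image branch -> common space
    txt_proj = nn.Linear(300, 256)                     # text branch  -> common space
    img_z = img_proj(torch.randn(4, 4096))
    txt_z = txt_proj(torch.randn(4, 300))

    # Semantic fusion: compatibility of each sample with every class embedding;
    # a shared classifier would be trained on such scores for both modalities.
    img_scores = img_z @ class_emb.t()                 # (4, 10)
    txt_scores = txt_z @ class_emb.t()                 # (4, 10)

    disc = ModalityDiscriminator()
    adv_logits = disc(torch.cat([img_z, txt_z]))       # (8, 1) for the adversarial loss
    print(img_scores.shape, txt_scores.shape, adv_logits.shape)
```

Because the class embeddings are produced from word vectors propagated over the class graph, embeddings for unseen categories can be generated the same way at test time; retrieval then matches image and text embeddings directly in the common space.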
Acknowledgements
This work is supported in part by the Key-Area Research and Development Program of Guangdong Province under Grant 2020B010166006, in part by the National Natural Science Foundation of China under Grants 62176065, 62176066, 62202107, and 61972102, and in part by the Natural Science Foundation of Guangdong Province under Grants 2019A1515011811 and 2021A1515012017.
Author information
Authors and Affiliations
School of Computer Science and Technology, Guangdong University of Technology, Guangzhou, China
Chuang Li, Lunke Fei, Peipei Kang & Shaohua Teng
School of Automation, Guangdong University of Technology, Guangzhou, China
Jiahao Liang & Xiaozhao Fang
Corresponding author
Correspondence to Peipei Kang.
Editor information
Editors and Affiliations
CSIRO Australian e-Health Research Centre, Brisbane, QLD, Australia
Sankalp Khanna
Shanghai Jiao Tong University, Shanghai, China
Jian Cao
University of Tasmania, Hobart, TAS, Australia
Quan Bai
University of Technology Sydney, Sydney, NSW, Australia
Guandong Xu
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Li, C., Fei, L., Kang, P., Liang, J., Fang, X., Teng, S. (2022). Semantic-Adversarial Graph Convolutional Network for Zero-Shot Cross-Modal Retrieval. In: Khanna, S., Cao, J., Bai, Q., Xu, G. (eds) PRICAI 2022: Trends in Artificial Intelligence. PRICAI 2022. Lecture Notes in Computer Science, vol 13630. Springer, Cham. https://doi.org/10.1007/978-3-031-20865-2_34
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20864-5
Online ISBN: 978-3-031-20865-2
eBook Packages: Computer Science, Computer Science (R0)