Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13630)
Included in the following conference series: Pacific Rim International Conference on Artificial Intelligence (PRICAI)
Abstract
Traditional cross-modal retrieval (CMR) methods assume that the training data contain all the categories that appear at the retrieval stage. However, when multimodal data from new categories arrive, the learned model may perform poorly. Building on the theory of zero-shot learning, zero-shot cross-modal retrieval (ZS-CMR) has emerged to address this problem and has become a new research topic. Existing ZS-CMR methods have the following limitations. (1) The semantic association between seen and unseen categories is important but ignored, so semantic knowledge cannot be fully transferred from seen classes to unseen classes. (2) The cross-modal representations are not semantically aligned, so samples of new categories cannot obtain semantic representations, which further leads to unsatisfactory retrieval results. To tackle these problems, this paper proposes the semantic-adversarial graph convolutional network (SAGCN) for ZS-CMR. Specifically, a graph convolutional network is introduced to mine the potential relationships between categories. In addition, adversarial learning and semantic similarity reconstruction are used to learn a common space in which multimodal embeddings and class embeddings are semantically fused. Finally, a shared classifier is adopted to enhance the discriminative ability of the common space. Experiments on three datasets demonstrate the effectiveness of SAGCN on both traditional CMR and ZS-CMR tasks.
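To make the components named in the abstract concrete, the following PyTorch-style sketch shows how class word vectors might be propagated over a class-affinity graph by a two-layer GCN, fused with image and text features in a common space, and paired with a small modality discriminator for the adversarial part. This is an illustrative sketch under stated assumptions, not the authors' implementation: the cosine-similarity adjacency, all layer sizes, and names such as `ClassEmbeddingGCN` and `ModalityDiscriminator` are assumptions.

```python
# Illustrative sketch only (not the authors' released code). It assumes
# word2vec class-name vectors, a cosine-similarity class graph, and
# VGG-style 4096-d image features; every layer size and name is a guess.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GCNLayer(nn.Module):
    """One graph convolution: H' = A_hat @ H @ W."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, h, a_hat):
        return self.linear(a_hat @ h)


class ClassEmbeddingGCN(nn.Module):
    """Propagates class word vectors over the class graph to obtain
    class embeddings in the common space."""
    def __init__(self, word_dim=300, hidden_dim=512, common_dim=256):
        super().__init__()
        self.gcn1 = GCNLayer(word_dim, hidden_dim)
        self.gcn2 = GCNLayer(hidden_dim, common_dim)

    def forward(self, word_vecs, a_hat):
        h = F.relu(self.gcn1(word_vecs, a_hat))
        return self.gcn2(h, a_hat)


class ModalityDiscriminator(nn.Module):
    """Adversarial piece: predicts whether a common-space embedding came
    from the image or the text branch; the branches learn to fool it."""
    def __init__(self, common_dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(common_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, z):
        return self.net(z)  # modality logit


def normalized_adjacency(word_vecs):
    """Class-affinity graph from word-vector cosine similarity,
    symmetrically normalized: D^{-1/2} (A + I) D^{-1/2}."""
    sim = F.cosine_similarity(word_vecs.unsqueeze(1), word_vecs.unsqueeze(0), dim=-1)
    a = sim.clamp(min=0) + torch.eye(sim.size(0))
    d_inv_sqrt = a.sum(dim=1).pow(-0.5)
    return d_inv_sqrt.unsqueeze(1) * a * d_inv_sqrt.unsqueeze(0)


if __name__ == "__main__":
    n_classes, word_dim = 10, 300
    word_vecs = torch.randn(n_classes, word_dim)       # stand-in for word2vec class vectors
    a_hat = normalized_adjacency(word_vecs)
    class_emb = ClassEmbeddingGCN()(word_vecs, a_hat)  # (10, 256) class embeddings

    img_proj = nn.Linear(4096, 256)                    # image branch -> common space
    txt_proj = nn.Linear(300, 256)                     # text branch  -> common space
    img_z = img_proj(torch.randn(4, 4096))
    txt_z = txt_proj(torch.randn(4, 300))

    # Semantic fusion: compatibility of each sample with every class embedding;
    # a shared classifier would be trained on such scores for both modalities.
    img_scores = img_z @ class_emb.t()                 # (4, 10)
    txt_scores = txt_z @ class_emb.t()                 # (4, 10)

    disc = ModalityDiscriminator()
    adv_logits = disc(torch.cat([img_z, txt_z]))       # (8, 1) for the adversarial loss
    print(img_scores.shape, txt_scores.shape, adv_logits.shape)
```

Because the class embeddings are produced from word vectors propagated over the class graph, embeddings for unseen categories can be generated the same way at test time; retrieval then matches image and text embeddings directly in the common space.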
Acknowledgements
This work is supported in part by the Key-Area Research and Development Program of Guangdong Province under Grant 2020B010166006, in part by the National Natural Science Foundation of China under Grants 62176065, 62176066, 62202107, and 61972102, and in part by the Natural Science Foundation of Guangdong Province under Grants 2019A1515011811 and 2021A1515012017.
Author information
Authors and Affiliations
School of Computer Science and Technology, Guangdong University of Technology, Guangzhou, China
Chuang Li, Lunke Fei, Peipei Kang & Shaohua Teng
School of Automation, Guangdong University of Technology, Guangzhou, China
Jiahao Liang & Xiaozhao Fang
Corresponding author
Correspondence to Peipei Kang.
Editor information
Editors and Affiliations
CSIRO Australian e-Health Research Centre, Brisbane, QLD, Australia
Sankalp Khanna
Shanghai Jiao Tong University, Shanghai, China
Jian Cao
University of Tasmania, Hobart, TAS, Australia
Quan Bai
University of Technology Sydney, Sydney, NSW, Australia
Guandong Xu
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Li, C., Fei, L., Kang, P., Liang, J., Fang, X., Teng, S. (2022). Semantic-Adversarial Graph Convolutional Network for Zero-Shot Cross-Modal Retrieval. In: Khanna, S., Cao, J., Bai, Q., Xu, G. (eds) PRICAI 2022: Trends in Artificial Intelligence. PRICAI 2022. Lecture Notes in Computer Science, vol 13630. Springer, Cham. https://doi.org/10.1007/978-3-031-20865-2_34
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20864-5
Online ISBN: 978-3-031-20865-2
eBook Packages: Computer Science, Computer Science (R0)