- Bin Wang 1,
- Yong Feng ORCID: 0000-0002-8820-8388 1,
- Xian-cai Xiong 2,3,
- Yong-heng Wang 4 &
- Bao-hua Qiang 5
Abstract
Fake news with multimedia content is ubiquitous on the Internet, and it is difficult for users to distinguish it from real news. It is therefore necessary to design automatic multi-modal fake news detectors. However, existing works underutilize visual information and do not fully consider the semantic interaction of multi-modal data. In this paper, we propose the multi-modal transformer using two-level visual features (MTTV) for fake news detection. First, we uniformly model news texts and images as sequences that a transformer can process, and use two-level visual features, i.e., a global feature and entity-level features, to improve the utilization of news images. Second, we extend the transformer model from natural language processing to a multi-modal transformer that lets multi-modal data interact fully and captures the semantic relationships between them. In addition, we propose a scalable classifier to improve classification balance for fine-grained fake news detection, which suffers from class imbalance. Extensive experiments on two public datasets demonstrate that our method achieves significant performance improvements over state-of-the-art methods. The source code is available at https://github.com/cqu-wb/MTTV.
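The abstract only outlines the architecture. As a rough, hypothetical sketch (module names, dimensions, and the single-encoder fusion below are assumptions, not the paper's implementation; see the released code at the repository above for the actual model), the following PyTorch snippet shows one way text token embeddings, a global image feature, and entity-level region features could be projected to a shared width and fused in a single transformer encoder:

```python
import torch
import torch.nn as nn

class TwoLevelMultiModalTransformer(nn.Module):
    """Illustrative sketch only: text tokens, one global image feature, and
    entity/region-level image features are projected to a common width and
    concatenated into a single sequence for joint self-attention."""

    def __init__(self, text_dim=768, img_global_dim=2048, img_region_dim=2048,
                 d_model=256, n_heads=8, n_layers=4, n_classes=2):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, d_model)       # e.g. BERT token embeddings
        self.global_proj = nn.Linear(img_global_dim, d_model)  # e.g. CNN pooled feature
        self.region_proj = nn.Linear(img_region_dim, d_model)  # e.g. detector region features
        self.cls_token = nn.Parameter(torch.zeros(1, 1, d_model))
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.classifier = nn.Linear(d_model, n_classes)

    def forward(self, text_feats, global_feat, region_feats):
        # text_feats:   (B, T, text_dim)
        # global_feat:  (B, img_global_dim)
        # region_feats: (B, R, img_region_dim)
        tokens = torch.cat([
            self.cls_token.expand(text_feats.size(0), -1, -1),
            self.text_proj(text_feats),
            self.global_proj(global_feat).unsqueeze(1),
            self.region_proj(region_feats),
        ], dim=1)                               # (B, 1 + T + 1 + R, d_model)
        fused = self.encoder(tokens)            # joint self-attention across modalities
        return self.classifier(fused[:, 0])     # classify from the [CLS] position
```

Concatenating both visual levels alongside the text tokens lets self-attention relate words to the whole image as well as to individual detected entities, which is the kind of cross-modal interaction the abstract emphasizes.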
Acknowledgements
Supported by Zhejiang Lab (No. 2021KE0AB01), Open Fund of Key Laboratory of Monitoring, Evaluation and Early Warning of Territorial Spatial Planning Implementation, Ministry of Natural Resources (No. LMEE-KF2021008), Technology Innovation and Application Development Key Project of Chongqing (No. cstc2021jscx-gksbX0058), National Natural Science Foundation of China (No.62176029), and Guangxi Key Laboratory of Trusted Software (No. kx202006).
Author information
Authors and Affiliations
College of Computer Science, Chongqing University, Chongqing, 400030, China
Bin Wang & Yong Feng
Key Laboratory of Monitoring, Evaluation and Early Warning of Territorial Spatial Planning Implementation, Ministry of Natural Resources, Chongqing, 401147, China
Xian-cai Xiong
Chongqing Institute of Planning and Natural Resources Investigation and Monitoring, Chongqing, 401121, China
Xian-cai Xiong
8# of Zhejiang Lab, Yuhang District, Hangzhou, 311121, China
Yong-heng Wang
Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin, 541004, China
Bao-hua Qiang
Corresponding author
Correspondence to Yong Feng.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wang, B., Feng, Y., Xiong, X.-c. et al. Multi-modal transformer using two-level visual features for fake news detection. Appl Intell 53, 10429–10443 (2023). https://doi.org/10.1007/s10489-022-04055-5