Abstract
Multi-label text classification (MLTC) is a key task in natural language processing. Its challenge lies in extracting latent semantic features from text and effectively exploiting label-associated features. This work proposes an MLTC model driven solely by attention mechanisms, comprising Graph Attention (GA), Class-Specific Attention (CSA), and Multi-Head Attention (MHA) modules. The GA module captures label dependencies by treating label semantic features as attributes of graph nodes, using graph embedding to preserve the structural relationships of the label graph. The CSA module produces distinctive features for each category from spatial attention scores, improving classification accuracy. The MHA module then enables extensive feature interaction, enriching the expressiveness of text features and supporting long-range dependencies. Experimental evaluations on two MLTC datasets show that the proposed model outperforms existing MLTC algorithms, achieving state-of-the-art performance. These results highlight the effectiveness of the attention-based approach in tackling the complexity of MLTC tasks.
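The full model is not reproduced on this page, but the MHA component the abstract refers to follows the standard scaled dot-product multi-head attention formulation of Vaswani et al. (2017). The sketch below is a minimal NumPy illustration of that general mechanism; the dimensions, weight matrices, and head count are illustrative assumptions, not the authors' configuration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, Wq, Wk, Wv, num_heads):
    """Scaled dot-product multi-head self-attention over token features X (L, d)."""
    L, d = X.shape
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    dh = d // num_heads                       # per-head feature width
    heads = []
    for h in range(num_heads):
        q = Q[:, h * dh:(h + 1) * dh]
        k = K[:, h * dh:(h + 1) * dh]
        v = V[:, h * dh:(h + 1) * dh]
        scores = softmax(q @ k.T / np.sqrt(dh))  # (L, L) attention weights
        heads.append(scores @ v)                 # (L, dh) per-head output
    return np.concatenate(heads, axis=1)         # (L, d) combined output

# Toy example: 6 tokens, 8-dim features, 2 heads (all values random).
rng = np.random.default_rng(0)
L, d, H = 6, 8, 2
X = rng.standard_normal((L, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
out = multi_head_attention(X, Wq, Wk, Wv, num_heads=H)
print(out.shape)  # → (6, 8)
```

Because every token attends to every other token in one step, this mechanism captures the long-range dependencies the abstract mentions without the sequential bottleneck of recurrent models.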
Author information
Authors and Affiliations
School of Artificial Intelligence, Chongqing University of Technology, Chongqing, China
Zhi Liu, Yunjie Huang, Xincheng Xia & Yihao Zhang
Corresponding author
Correspondence to Zhi Liu.
About this article
Cite this article
Liu, Z., Huang, Y., Xia, X. et al. All is attention for multi-label text classification. Knowl Inf Syst 67, 1249–1270 (2025). https://doi.org/10.1007/s10115-024-02253-w