Abstract
Violence against women is a major social issue. One in every three women worldwide has been subjected to physical or sexual violence. The pervasive violence against women in the physical world, the ever-growing presence of social media in our lives, and the lack of content moderation on these platforms have led to an influx of misogynistic social media content. We contribute to preventing violence against women by introducing a BERT architecture with domain-adaptive pre-training to automatically detect misogynistic tweets in Spanish. We used the IberEval 2018 Spanish dataset for automatic misogyny identification, obtaining an accuracy of 84.60%, precision of 79.64%, recall of 86.70%, and F1 score of 83.02%, outperforming the state of the art. We also conducted a manual error analysis and discovered 469 mislabeled tweets and a misogynistic bias in the IberEval 2018 Spanish dataset. Our debiased model outperformed the current literature on automatic misogyny detection with an accuracy of 84.35%, precision of 84.64%, recall of 83.93%, and F1 score of 84.28%. Lastly, we addressed the need for misogyny detection on other social media by experimenting with a manually curated and labeled dataset of Facebook comments in Spanish, on which the model reached an accuracy of 87.85%. Misogyny is a complex social issue, so an interdisciplinary approach might benefit future models for automatically detecting misogyny.
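As a quick consistency check on the figures reported above, the F1 score can be recomputed as the harmonic mean of precision and recall. The sketch below is only an illustration: the function name `f1_score` and the dictionary layout are assumptions made here, not part of the authors' code, and the numbers are copied from the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both metrics are zero
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) as stated in the abstract, converted to fractions.
reported = {
    "original dataset": (0.7964, 0.8670, 0.8302),
    "debiased model": (0.8464, 0.8393, 0.8428),
}

for name, (precision, recall, reported_f1) in reported.items():
    computed = f1_score(precision, recall)
    print(f"{name}: computed F1 = {computed:.4f}, reported F1 = {reported_f1:.4f}")
```

For both models the recomputed value agrees with the reported F1 to two decimal places in percentage terms, so the three metrics are mutually consistent.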
Author information
Authors and Affiliations
Department of Computer Systems, Tecnológico Nacional de México/IT Mexicali, Av Tecnológico s/n, Mexicali, 21376, Baja California, Mexico
Dalia A. Rodríguez, Julia Diaz-Escobar & Arnoldo Díaz-Ramírez
Department of Electrics and Electronics, Tecnológico Nacional de México/IT Tijuana, Blvd. Industrial s/n, Tijuana, 22430, Baja California, Mexico
Leonardo Trujillo
Contributions
D.R. and A.D-R. conceived the presented idea. D.R. and J.D-E. developed the theory, and D.R. performed the computations under the supervision of J.D-E. and L.T. A.D-R. and J.D-E. verified the analytical methods. A.D-R., J.D-E., and L.T. supervised the findings of this work. D.R. wrote the manuscript with input from all authors. All authors discussed the results and contributed to the final manuscript.
Corresponding author
Correspondence to Arnoldo Díaz-Ramírez.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Rodríguez, D.A., Diaz-Escobar, J., Díaz-Ramírez, A. et al. Domain-adaptive pre-training on a BERT model for the automatic detection of misogynistic tweets in Spanish. Soc. Netw. Anal. Min. 13, 126 (2023). https://doi.org/10.1007/s13278-023-01128-2