The rampant spread of cyberbullying content poses a growing threat to societal well-being. However, research on cyberbullying detection in Chinese remains underdeveloped, primarily due to the lack of comprehensive and reliable datasets. Notably, no existing Chinese dataset is specifically tailored for cyberbullying detection. Moreover, while comments play a crucial role within sessions, current session-based datasets often lack detailed, fine-grained annotations at the comment level. To address these limitations, we present a novel Chinese cyberbullying dataset, termed SCCD, which consists of 677 session-level samples sourced from Weibo, a major social media platform. Moreover, each comment within the sessions is annotated with fine-grained labels rather than conventional binary class labels. Empirically, we evaluate the performance of various baseline methods on SCCD, highlighting the challenges for effective Chinese cyberbullying detection.
Qingpo Yang, Yakai Chen, Zihui Xu, Yu-ming Shang, Sanchuan Guo, and Xi Zhang. 2025. SCCD: A Session-based Dataset for Chinese Cyberbullying Detection. In Proceedings of the 31st International Conference on Computational Linguistics, pages 9533–9545, Abu Dhabi, UAE. Association for Computational Linguistics.
@inproceedings{yang-etal-2025-sccd,
    title = "{SCCD}: A Session-based Dataset for {C}hinese Cyberbullying Detection",
    author = "Yang, Qingpo and
      Chen, Yakai and
      Xu, Zihui and
      Shang, Yu-ming and
      Guo, Sanchuan and
      Zhang, Xi",
    editor = "Rambow, Owen and
      Wanner, Leo and
      Apidianaki, Marianna and
      Al-Khalifa, Hend and
      Eugenio, Barbara Di and
      Schockaert, Steven",
    booktitle = "Proceedings of the 31st International Conference on Computational Linguistics",
    month = jan,
    year = "2025",
    address = "Abu Dhabi, UAE",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.coling-main.639/",
    pages = "9533--9545",
    abstract = "The rampant spread of cyberbullying content poses a growing threat to societal well-being. However, research on cyberbullying detection in Chinese remains underdeveloped, primarily due to the lack of comprehensive and reliable datasets. Notably, no existing Chinese dataset is specifically tailored for cyberbullying detection. Moreover, while comments play a crucial role within sessions, current session-based datasets often lack detailed, fine-grained annotations at the comment level. To address these limitations, we present a novel Chinese cyberbullying dataset, termed SCCD, which consists of 677 session-level samples sourced from a major social media platform Weibo. Moreover, each comment within the sessions is annotated with fine-grained labels rather than conventional binary class labels. Empirically, we evaluate the performance of various baseline methods on SCCD, highlighting the challenges for effective Chinese cyberbullying detection."
}
%0 Conference Proceedings
%T SCCD: A Session-based Dataset for Chinese Cyberbullying Detection
%A Yang, Qingpo
%A Chen, Yakai
%A Xu, Zihui
%A Shang, Yu-ming
%A Guo, Sanchuan
%A Zhang, Xi
%Y Rambow, Owen
%Y Wanner, Leo
%Y Apidianaki, Marianna
%Y Al-Khalifa, Hend
%Y Eugenio, Barbara Di
%Y Schockaert, Steven
%S Proceedings of the 31st International Conference on Computational Linguistics
%D 2025
%8 January
%I Association for Computational Linguistics
%C Abu Dhabi, UAE
%F yang-etal-2025-sccd
%X The rampant spread of cyberbullying content poses a growing threat to societal well-being. However, research on cyberbullying detection in Chinese remains underdeveloped, primarily due to the lack of comprehensive and reliable datasets. Notably, no existing Chinese dataset is specifically tailored for cyberbullying detection. Moreover, while comments play a crucial role within sessions, current session-based datasets often lack detailed, fine-grained annotations at the comment level. To address these limitations, we present a novel Chinese cyberbullying dataset, termed SCCD, which consists of 677 session-level samples sourced from a major social media platform Weibo. Moreover, each comment within the sessions is annotated with fine-grained labels rather than conventional binary class labels. Empirically, we evaluate the performance of various baseline methods on SCCD, highlighting the challenges for effective Chinese cyberbullying detection.
%U https://aclanthology.org/2025.coling-main.639/
%P 9533-9545