{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T01:09:36Z","timestamp":1760058576215,"version":"build-2065373602"},"reference-count":46,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2025,4,11]],"date-time":"2025-04-11T00:00:00Z","timestamp":1744329600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],"abstract":"<jats:p>Recent studies have shown that hate speech on social media negatively impacts users\u2019 mental health and is a contributing factor to suicide attempts. On a broader scale, online hate speech can undermine social stability. With the continuous growth of the internet, the prevalence of online hate speech is rising, making its detection an urgent issue. Recent advances in natural language processing, particularly with transformer-based models, have shown significant promise in hate speech detection. However, these models come with a large number of parameters, leading to high computational requirements and making them difficult to deploy on personal computers. To address these challenges, knowledge distillation offers a solution by training smaller student networks using larger teacher networks. Recognizing that learning also occurs through peer interactions, we propose a knowledge distillation method called Deep Distill\u2013Mutual Learning (DDML). DDML employs one teacher network and two or more student networks. While the student networks benefit from the teacher\u2019s knowledge, they also engage in mutual learning with each other. We trained numerous deep neural networks for hate speech detection based on DDML and demonstrated that these networks perform well across various datasets. We tested our method across ten languages and nine datasets. 
The results demonstrate that DDML enhances the performance of deep neural networks, achieving an average F1 score increase of 4.87% over the baseline.<\/jats:p>","DOI":"10.3390\/e27040417","type":"journal-article","created":{"date-parts":[[2025,4,11]],"date-time":"2025-04-11T10:05:51Z","timestamp":1744365951000},"page":"417","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["DDML: Multi-Student Knowledge Distillation for Hate Speech"],"prefix":"10.3390","volume":"27","author":[{"given":"Ze","family":"Liu","sequence":"first","affiliation":[{"name":"School of Cyber Science and Engineering, Sichuan University, Chengdu 610211, China"}]},{"given":"Zerui","family":"Shao","sequence":"additional","affiliation":[{"name":"School of Cyber Science and Engineering, Sichuan University, Chengdu 610211, China"}]},{"given":"Haizhou","family":"Wang","sequence":"additional","affiliation":[{"name":"School of Cyber Science and Engineering, Sichuan University, Chengdu 610211, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0485-1975","authenticated-orcid":false,"given":"Beibei","family":"Li","sequence":"additional","affiliation":[{"name":"School of Cyber Science and Engineering, Sichuan University, Chengdu 610211, China"}]}],"member":"1968","published-online":{"date-parts":[[2025,4,11]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Ouaddah, A., Elkalam, A.A., and Ouahman, A.A. (2017). Towards a novel privacy-preserving access control model based on blockchain technology in IoT. Europe and MENA Cooperation Advances in Information and Communication Technologies, Springer.","DOI":"10.1007\/978-3-319-46568-5_53"},{"key":"ref_2","first-page":"3","article-title":"Hate Speech as an Indicator for the State of the Society","volume":"34","author":"Reiners","year":"2021","journal-title":"J. 
Media Psychol."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"MacAvaney, S., Yao, H.R., Yang, E., Russell, K., Goharian, N., and Frieder, O. (2019). Hate speech detection: Challenges and solutions. PLoS ONE, 14.","DOI":"10.1371\/journal.pone.0221152"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Shawkat, N., Saquer, J., and Shatnawi, H. (2024, January 18\u201320). Evaluation of Different Machine Learning and Deep Learning Techniques for Hate Speech Detection. Proceedings of the 2024 ACM Southeast Conference, Marietta, GA, USA.","DOI":"10.1145\/3603287.3651218"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"101608","DOI":"10.1016\/j.avb.2021.101608","article-title":"Internet, social media and online hate speech. Systematic review","volume":"58","author":"Vega","year":"2021","journal-title":"Aggress. Violent Behav."},{"key":"ref_6","unstructured":"Naveed, H., Khan, A.U., Qiu, S., Saqib, M., Anwar, S., Usman, M., Akhtar, N., Barnes, N., and Mian, A. (2023). A comprehensive overview of large language models. arXiv."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Ranasinghe, T., and Zampieri, M. (2020). Multilingual offensive language identification with cross-lingual embeddings. arXiv.","DOI":"10.18653\/v1\/2020.emnlp-main.470"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Aluru, S.S., Mathew, B., Saha, P., and Mukherjee, A. (2020, January 14\u201318). A deep dive into multilingual hate speech classification. Proceedings of the Machine Learning and Knowledge Discovery in Databases, Applied Data Science and Demo Track: European Conference, Ghent, Belgium.","DOI":"10.1007\/978-3-030-67670-4_26"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Pamungkas, E.W., and Patti, V. (2019, January 29\u201331). Cross-domain and cross-lingual abusive language detection: A hybrid approach with deep learning and a multilingual lexicon. 
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop, Florence, Italy.","DOI":"10.18653\/v1\/P19-2051"},{"key":"ref_10","unstructured":"Hinton, G. (2015). Distilling the Knowledge in a Neural Network. arXiv."},{"key":"ref_11","unstructured":"Devlin, J. (2018). Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Conneau, A. (2019). Unsupervised cross-lingual representation learning at scale. arXiv.","DOI":"10.18653\/v1\/2020.acl-main.747"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Khan, S., Shahid, M., and Singh, N. (2022). White-box attacks on hate-speech BERT classifiers in german with explicit and implicit character level defense. arXiv.","DOI":"10.54646\/bijiiac.004"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Perifanos, K., and Goutsos, D. (2021). Multimodal Hate Speech Detection in Greek Social Media. Multimodal Technol. Interact., 5.","DOI":"10.3390\/mti5070034"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Zampieri, M., Nakov, P., Rosenthal, S., Atanasova, P., Karadzhov, G., Mubarak, H., Derczynski, L., Pitenis, Z., and \u00c7\u00f6ltekin, \u00c7. (2020). SemEval-2020 task 12: Multilingual offensive language identification in social media (OffensEval 2020). arXiv.","DOI":"10.18653\/v1\/2020.semeval-1.188"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Basile, V., Di Maro, M., Croce, D., and Passaro, L. (2020, January 17). Evalita 2020: Overview of the 7th evaluation campaign of natural language processing and speech tools for italian. 
Proceedings of the Ceur Workshop Proceedings, Virtual Event.","DOI":"10.4000\/books.aaccademia.6747"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"597","DOI":"10.1162\/tacl_a_00288","article-title":"Massively multilingual sentence embeddings for zero-shot cross-lingual transfer and beyond","volume":"7","author":"Artetxe","year":"2019","journal-title":"Trans. Assoc. Comput. Linguist."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Ruiz-Garcia, M. (2022). Model architecture can transform catastrophic forgetting into positive transfer. Sci. Rep., 12.","DOI":"10.1038\/s41598-022-14348-x"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"102544","DOI":"10.1016\/j.ipm.2021.102544","article-title":"A joint learning approach with knowledge injection for zero-shot cross-lingual hate speech detection","volume":"58","author":"Pamungkas","year":"2021","journal-title":"Inf. Process. Manag."},{"key":"ref_20","unstructured":"Jiang, A., and Zubiaga, A. (September, January 30). Cross-lingual capsule network for hate speech detection in social media. Proceedings of the 32nd ACM Conference on Hypertext and Social Media, Virtual Event."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Sahin, U., Kucukkaya, I.E., Ozcelik, O., and Toraman, C. (2023, January 5\u20138). Zero and Few-Shot Hate Speech Detection in Social Media Messages Related to Earthquake Disaster. Proceedings of the 2023 31st Signal Processing and Communications Applications Conference, Istanbul, Turkiye.","DOI":"10.1109\/SIU59756.2023.10224056"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Nozza, D. (2021, January 2\u20135). Exposing the limits of zero-shot cross-lingual hate speech detection. 
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Bangkok, Thailand.","DOI":"10.18653\/v1\/2021.acl-short.114"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Montariol, S., Riabi, A., and Seddah, D. (2022). Multilingual auxiliary tasks training: Bridging the gap between languages for zero-shot transfer of hate speech detection models. arXiv.","DOI":"10.18653\/v1\/2022.findings-aacl.33"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Li, B., Shi, Y., Guo, Y., Kong, Q., and Jiang, Y. (2022, January 2\u20135). Incentive and knowledge distillation based federated learning for cross-silo applications. Proceedings of the IEEE INFOCOM 2022-IEEE Conference on Computer Communications Workshops, Virtual Event.","DOI":"10.1109\/INFOCOMWKSHPS54753.2022.9798320"},{"key":"ref_25","unstructured":"Romero, A., Ballas, N., Kahou, S.E., Chassang, A., Gatta, C., and Bengio, Y. (2014). Fitnets: Hints for thin deep nets. arXiv."},{"key":"ref_26","unstructured":"Ding, Q., Wu, S., Sun, H., Guo, J., and Xia, S.T. (2019). Adaptive regularization of labels. arXiv."},{"key":"ref_27","unstructured":"Chen, D., Mei, J.P., Zhang, Y., Wang, C., Wang, Z., Feng, Y., and Chen, C. (2021, January 2\u20139). Cross-layer distillation with semantic calibration. Proceedings of the AAAI Conference on Artificial Intelligence, Virtual Event."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Yang, Z., Li, Z., Zeng, A., Li, Z., Yuan, C., and Li, Y. (2024, January 17\u201318). ViTKD: Feature-based Knowledge Distillation for Vision Transformers. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPRW63382.2024.00145"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Park, W., Kim, D., Lu, Y., and Cho, M. (2019, January 15\u201320). Relational knowledge distillation. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00409"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Zhang, C., and Peng, Y. (2018). Better and faster: Knowledge transfer from multiple self-supervised learning tasks via graph distillation for video classification. arXiv.","DOI":"10.24963\/ijcai.2018\/158"},{"key":"ref_31","unstructured":"Lee, S., and Song, B.C. (2019). Graph-based knowledge distillation by multi-head attention network. arXiv."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Zhang, R., Shen, J., Liu, T., Liu, J., Bendersky, M., Najork, M., and Zhang, C. (2024, January 25\u201329). Knowledge Distillation with Perturbed Loss: From a Vanilla Teacher to a Proxy Teacher. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Barcelona, Spain.","DOI":"10.1145\/3637528.3671851"},{"key":"ref_33","unstructured":"Kim, S.W., and Kim, H.E. (2017, January 24\u201326). Transferring knowledge to smaller network with class-distance loss. Proceedings of the International Conference on Learning Representations, Toulon, France."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Mandl, T., Modha, S., Majumder, P., Patel, D., Dave, M., Mandlia, C., and Patel, A. (2019, January 12). Overview of the hasoc track at fire 2019: Hate speech and offensive content identification in indo-european languages. Proceedings of the 11th Annual Meeting of the Forum for Information Retrieval Evaluation, New York, NY, USA.","DOI":"10.1145\/3368567.3368584"},{"key":"ref_35","unstructured":"Wiegand, M., Siegel, M., and Ruppenhofer, J. (2018, January 21). Overview of the germeval 2018 shared task on the identification of offensive language. 
Proceedings of the GermEval 2018 Workshop, Vienna, Austria."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Sanguinetti, M., Comandini, G., Di Nuovo, E., Frenda, S., Stranisci, M., Bosco, C., Caselli, T., Patti, V., and Russo, I. (2020, January 17). Haspeede 2@ evalita2020: Overview of the evalita 2020 hate speech detection task. Proceedings of the Evaluation Campaign of Natural Language Processing and Speech Tools for Italian, Online.","DOI":"10.4000\/books.aaccademia.6897"},{"key":"ref_37","first-page":"214","article-title":"Overview of the task on automatic misogyny identification at IberEval 2018","volume":"2150","author":"Fersini","year":"2018","journal-title":"Ibereval@sepln"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Zampieri, M., Malmasi, S., Nakov, P., Rosenthal, S., Farra, N., and Kumar, R. (2019). Predicting the type and target of offensive posts in social media. arXiv.","DOI":"10.18653\/v1\/N19-1144"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Deng, J., Zhou, J., Sun, H., Zheng, C., Mi, F., Meng, H., and Huang, M. (2022). COLD: A benchmark for Chinese offensive language detection. arXiv.","DOI":"10.18653\/v1\/2022.emnlp-main.796"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"De Pelle, R.P., and Moreira, V.P. (2017, January 21\u201325). Offensive comments in the brazilian web: A dataset and baseline results. Proceedings of the Anais do VI Brazilian Workshop on Social Network Analysis and Mining, Brasilia, Brazil.","DOI":"10.5753\/brasnam.2017.3260"},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"1086","DOI":"10.1109\/TCSS.2023.3252401","article-title":"Model-agnostic meta-learning for multilingual hate speech detection","volume":"11","author":"Awal","year":"2023","journal-title":"IEEE Trans. Comput. Soc. 
Syst."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"121115","DOI":"10.1016\/j.eswa.2023.121115","article-title":"Improving hate speech detection using cross-lingual learning","volume":"235","author":"Firmino","year":"2024","journal-title":"Expert Syst. Appl."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Fillies, J., Hoffmann, M.P., and Paschke, A. (2023, January 15\u201318). Multilingual Hate Speech Detection: Comparison of Transfer Learning Methods to Classify German, Italian, and Spanish Posts. Proceedings of the 2023 IEEE International Conference on Big Data, Sorrento, Italy.","DOI":"10.1109\/BigData59044.2023.10386244"},{"key":"ref_44","unstructured":"Sanh, V. (2019). DistilBERT, a distilled version of BERT: Smaller, faster, cheaper and lighter. arXiv."},{"key":"ref_45","unstructured":"Ca\u00f1ete, J., Chaperon, G., Fuentes, R., Ho, J.H., Kang, H., and P\u00e9rez, J. (2023). Spanish pre-trained bert model and evaluation data. arXiv."},{"key":"ref_46","first-page":"1","article-title":"Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer","volume":"21","author":"Raffel","year":"2020","journal-title":"J. Mach. Learn. 
Res."}],"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1099-4300\/27\/4\/417\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T17:13:00Z","timestamp":1760029980000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1099-4300\/27\/4\/417"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,4,11]]},"references-count":46,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2025,4]]}},"alternative-id":["e27040417"],"URL":"https:\/\/doi.org\/10.3390\/e27040417","relation":{},"ISSN":["1099-4300"],"issn-type":[{"type":"electronic","value":"1099-4300"}],"subject":[],"published":{"date-parts":[[2025,4,11]]}}}