{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,25]],"date-time":"2026-04-25T01:34:40Z","timestamp":1777080880988,"version":"3.51.4"},"reference-count":26,"publisher":"MDPI AG","issue":"8","license":[{"start":{"date-parts":[[2025,8,10]],"date-time":"2025-08-10T00:00:00Z","timestamp":1754784000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Computation"],"abstract":"<jats:p>Hate speech detection remains a significant challenge due to the nuanced and context-dependent nature of hateful language. Traditional classifiers, trained on specialized corpora, often struggle to accurately identify subtle or manipulated hate speech. This paper explores the potential of utilizing large language models (LLMs) to address these limitations. By leveraging their extensive training on diverse texts, LLMs demonstrate a superior ability to understand context, which is crucial for effective hate speech detection. We conduct a comprehensive evaluation of various LLMs on both binary and multi-label hate speech datasets to assess their performance. Our findings aim to clarify the extent to which LLMs can enhance hate speech classification accuracy, particularly in complex and challenging cases.<\/jats:p>","DOI":"10.3390\/computation13080196","type":"journal-article","created":{"date-parts":[[2025,8,11]],"date-time":"2025-08-11T08:10:32Z","timestamp":1754899832000},"page":"196","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["Beyond Traditional Classifiers: Evaluating Large Language Models for Robust Hate Speech Detection"],"prefix":"10.3390","volume":"13","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9126-7613","authenticated-orcid":false,"given":"Basel","family":"Barakat","sequence":"first","affiliation":[{"name":"School of Computing, Goldsmiths University of London, London SE14 6NW, UK"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5620-0277","authenticated-orcid":false,"given":"Sardar","family":"Jaf","sequence":"additional","affiliation":[{"name":"School of Engineering and Computer Science, University of Sunderland, Sunderland SR1 3SD, UK"}]}],"member":"1968","published-online":{"date-parts":[[2025,8,10]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Gr\u00f6ndahl, T., Pajola, L., Juuti, M., Conti, M., and Asokan, N. (2018, January 15\u201319). All you need is \u201clove\u201d evading hate speech detection. Proceedings of the 11th ACM Workshop on Artificial Intelligence and Security, Toronto, ON, Canada.","DOI":"10.1145\/3270101.3270103"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"44337","DOI":"10.1109\/ACCESS.2022.3160712","article-title":"Political Hate Speech Detection and Lexicon Building: A Study in Taiwan","volume":"10","author":"Wang","year":"2022","journal-title":"IEEE Access"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Badjatiya, P., Gupta, S., Gupta, M., and Varma, V. (2017, January 3\u20137). Deep Learning for Hate Speech Detection in Tweets. Proceedings of the 26th International Conference on World Wide Web Companion, Republic and Canton of Geneva, CHE, Perth, Australia. WWW \u201917 Companion.","DOI":"10.1145\/3041021.3054223"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Jaf, S., and Barakat, B. (2024). Empirical Evaluation of Public HateSpeech Datasets. arXiv.","DOI":"10.2139\/ssrn.4504059"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Plaza-del Arco, F.M., Nozza, D., and Hovy, D. (2023, January 13). Respectful or toxic? using zero-shot learning with language models to detect hate speech. Proceedings of the 7th Workshop on Online Abuse and Harms (WOAH), Toronto, ON, Canada.","DOI":"10.18653\/v1\/2023.woah-1.6"},{"key":"ref_6","first-page":"2849","article-title":"Comparing Fine-Tuning, Zero and Few-Shot Strategies with Large Language Models in Hate Speech Detection in English","volume":"140","author":"Pan","year":"2024","journal-title":"CMES-Comput. Model. Eng. Sci."},{"key":"ref_7","unstructured":"Tunstall, L., Beeching, E., Lambert, N., Rajani, N., Rasul, K., Belkada, Y., Huang, S., Von Werra, L., Fourrier, C., and Habib, N. (2023). Zephyr: Direct distillation of lm alignment. arXiv."},{"key":"ref_8","unstructured":"Saha, P., Agrawal, A., Jana, A., Biemann, C., and Mukherjee, A. (2024). On Zero-Shot Counterspeech Generation by LLMs. arXiv."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Nirmal, A., Bhattacharjee, A., Sheth, P., and Liu, H. (2024). Towards Interpretable Hate Speech Detection using Large Language Model-extracted Rationales. arXiv.","DOI":"10.18653\/v1\/2024.woah-1.17"},{"key":"ref_10","unstructured":"Suryawanshi, S., Chakravarthi, B.R., Arcan, M., and Buitelaar, P. (2020, January 11\u201316). Multimodal meme dataset (MultiOFF) for identifying offensive content in image and text. Proceedings of the Second Workshop on Trolling, Aggression and Cyberbullying, Marseille, France."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Salminen, J., Almerekhi, H., Milenkovi\u0107, M., Jung, S.g., An, J., Kwak, H., and Jansen, B.J. (2018, January 25\u201328). Anatomy of online hate: Developing a taxonomy and machine learning models for identifying and classifying hate in online news media. Proceedings of the Twelfth International AAAI Conference on Web and Social Media, Palo Alto, CA, USA.","DOI":"10.1609\/icwsm.v12i1.15028"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"512","DOI":"10.1609\/icwsm.v11i1.14955","article-title":"Automated Hate Speech Detection and the Problem of Offensive Language","volume":"11","author":"Davidson","year":"2017","journal-title":"Proc. Int. AAAI Conf. Web Soc. Media"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"De Gibert, O., Perez, N., Garc\u00eda-Pablos, A., and Cuadros, M. (2018). Hate speech dataset from a white supremacy forum. arXiv.","DOI":"10.18653\/v1\/W18-5102"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Waseem, Z., and Hovy, D. (2016, January 13\u201315). Hateful Symbols or Hateful People? Predictive Features for Hate Speech Detection on Twitter. Proceedings of the NAACL Student Research Workshop, San Diego, CA, USA.","DOI":"10.18653\/v1\/N16-2013"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Qian, J., Bethke, A., Liu, Y., Belding, E., and Wang, W.Y. (2019). A Benchmark Dataset for Learning to Intervene in Online Hate Speech. arXiv.","DOI":"10.18653\/v1\/D19-1482"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Vidgen, B., Nguyen, D., Margetts, H., Rossini, P., Tromble, R., Toutanova, K., Rumshisky, A., Zettlemoyer, L., Hakkani-Tur, D., and Beltagy, I. (2021, January 6\u201311). Introducing CAD: The contextual abuse dataset. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Online.","DOI":"10.18653\/v1\/2021.naacl-main.182"},{"key":"ref_17","unstructured":"Kennedy, C.J., Bacon, G., Sahn, A., and von Vacano, C. (2020). Constructing interval variables via faceted Rasch measurement and multitask deep learning: A hate speech application. arXiv."},{"key":"ref_18","unstructured":"AI@Meta (2025, August 04). Llama 3 Model Card 2024. Available online: https:\/\/github.com\/meta-llama\/llama3\/blob\/main\/MODEL_CARD.md."},{"key":"ref_19","unstructured":"Microsoft (2024). Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone. arXiv."},{"key":"ref_20","unstructured":"(2025, August 04). Teknium; Theemozilla; Karan4d; Huemin_art. Nous Hermes 2 Mistral 7B DPO. Available online: https:\/\/huggingface.co\/NousResearch\/Nous-Hermes-2-Mistral-7B-DPO."},{"key":"ref_21","unstructured":"Xu, C., Sun, Q., Zheng, K., Geng, X., Zhao, P., Feng, J., Tao, C., and Jiang, D. (2023). WizardLM: Empowering Large Language Models to Follow Complex Instructions. arXiv."},{"key":"ref_22","unstructured":"Song, Q., Liao, P., Zhao, W., Wang, Y., Hu, S., Zhen, H.L., Jiang, N., and Yuan, M. (2025). Harnessing On-Device Large Language Model: Empirical Results and Implications for AI PC. arXiv."},{"key":"ref_23","first-page":"2611","article-title":"The hateful memes challenge: Detecting hate speech in multimodal memes","volume":"33","author":"Kiela","year":"2020","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_24","unstructured":"Carvallo, A., Mendoza, M., Fernandez, M., Ojeda, M., Guevara, L., Varela, D., Borquez, M., Buzeta, N., and Ayala, F. (2025, January 1). Hate Explained: Evaluating NER-Enriched Text in Human and Machine Moderation of Hate Speech. Proceedings of the 9th Workshop on Online Abuse and Harms (WOAH), Vienna, Austria."},{"key":"ref_25","unstructured":"Tao, C., Shen, T., Gao, S., Zhang, J., Li, Z., Tao, Z., and Ma, S. (2024). Llms are also effective embedding models: An in-depth overview. arXiv."},{"key":"ref_26","unstructured":"Lin, L., Wang, L., Guo, J., and Wong, K.F. (2024). Investigating bias in llm-based bias detection: Disparities between llms and human perception. arXiv."}],"container-title":["Computation"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2079-3197\/13\/8\/196\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T18:27:41Z","timestamp":1760034461000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2079-3197\/13\/8\/196"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,8,10]]},"references-count":26,"journal-issue":{"issue":"8","published-online":{"date-parts":[[2025,8]]}},"alternative-id":["computation13080196"],"URL":"https:\/\/doi.org\/10.3390\/computation13080196","relation":{},"ISSN":["2079-3197"],"issn-type":[{"value":"2079-3197","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,8,10]]}}}