{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,27]],"date-time":"2026-04-27T21:24:45Z","timestamp":1777325085772,"version":"3.51.4"},"reference-count":45,"publisher":"MDPI AG","issue":"6","license":[{"start":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T00:00:00Z","timestamp":1750204800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Computers"],"abstract":"<jats:p>The rise of social media has improved communication but has also amplified the spread of hate speech, creating serious societal risks. Automated detection remains difficult because of subjectivity, linguistic diversity, and implicit language. While prior research has focused on high-resource languages, this study addresses the underexplored multilingual challenges of Arabic and Urdu hate speech through a comprehensive approach. To this end, the study makes four key contributions. First, we created a new multilingual, manually annotated binary and multi-class dataset (UA-HSD-2025) sourced from X, which covers five key hate speech categories for multi-class classification. Second, we developed detailed annotation guidelines to produce a robust, high-quality hate speech dataset. Third, we explore two strategies for handling multilingual data: a joint multilingual approach and a translation-based approach. The translation-based approach converts all input text into a single target language before applying a classifier. In contrast, the joint multilingual approach trains a unified model to handle multiple languages simultaneously, enabling it to classify text across languages without translation. 
Finally, we conducted 54 experiments spanning machine learning models with TF-IDF features, deep learning models with pre-trained word embeddings (FastText and GloVe), and pre-trained language models with contextual embeddings. The results show that the pre-trained language model XLM-R outperformed traditional supervised learning approaches, achieving 0.99 accuracy in binary classification on the Arabic, Urdu, and joint multilingual datasets, and 0.95, 0.94, and 0.94 accuracy in multi-class classification on the joint multilingual, Arabic, and Urdu datasets, respectively.<\/jats:p>","DOI":"10.3390\/computers14060239","type":"journal-article","created":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T05:08:39Z","timestamp":1750309719000},"page":"239","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":6,"title":["UA-HSD-2025: Multi-Lingual Hate Speech Detection from Tweets Using Pre-Trained Transformers"],"prefix":"10.3390","volume":"14","author":[{"ORCID":"https:\/\/orcid.org\/0009-0003-8799-8212","authenticated-orcid":false,"given":"Muhammad","family":"Ahmad","sequence":"first","affiliation":[{"name":"Centro de Investigaci\u00f3n en Computaci\u00f3n, Instituto Polit\u00e9cnico Nacional (CIC-PN), Mexico City 07738, Mexico"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Muhammad","family":"Waqas","sequence":"additional","affiliation":[{"name":"Department of Software Engineering, The Islamia University of Bahawalpur, Bahawalpur 63100, Pakistan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ameer","family":"Hamza","sequence":"additional","affiliation":[{"name":"Department of Software Engineering, The Islamia University of Bahawalpur, Bahawalpur 63100, 
Pakistan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4698-6461","authenticated-orcid":false,"given":"Sardar","family":"Usman","sequence":"additional","affiliation":[{"name":"School of Informatics and Robotics, Institute of Arts and Culture, Lahore 54000, Pakistan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0241-7902","authenticated-orcid":false,"given":"Ildar","family":"Batyrshin","sequence":"additional","affiliation":[{"name":"Centro de Investigaci\u00f3n en Computaci\u00f3n, Instituto Polit\u00e9cnico Nacional (CIC-PN), Mexico City 07738, Mexico"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3901-3522","authenticated-orcid":false,"given":"Grigori","family":"Sidorov","sequence":"additional","affiliation":[{"name":"Centro de Investigaci\u00f3n en Computaci\u00f3n, Instituto Polit\u00e9cnico Nacional (CIC-PN), Mexico City 07738, Mexico"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2025,6,18]]},"reference":[{"key":"ref_1","unstructured":"Ullah, F., Zamir, M.T., Ahmad, M., Sidorov, G., and Gelbukh, A. (2024, January 24). Hope: A multilingual approach to identifying positive communication in social media. Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2024), co-located with the 40th Conference of the Spanish Society for Natural Language Processing (SEPLN 2024), Valladolid, Spain."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"31","DOI":"10.17323\/jle.2024.22443","article-title":"Hope speech detection using social media discourse (Posi-Vox-2024): A transfer learning approach","volume":"10","author":"Ahmad","year":"2024","journal-title":"J. Lang. Educ."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Ahmad, M., Ameer, I., Sharif, W., Usman, S., Muzamil, M., Hamza, A., and Sidorov, G. (2025). 
Multilingual hope speech detection from tweets using transfer learning models. Sci. Rep., 15.","DOI":"10.1038\/s41598-025-88687-w"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"529","DOI":"10.13053\/cys-28-2-4556","article-title":"Deep Learning-Based Text Classification to Improve Web Service Discovery","volume":"28","author":"Meghazi","year":"2024","journal-title":"Comput. Y Sistemas"},{"key":"ref_5","first-page":"5290","article-title":"Enhancing text classification using BERT: A transfer learning approach","volume":"28","author":"Naeem","year":"2024","journal-title":"Comput. Y Sistemas"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"88364","DOI":"10.1109\/ACCESS.2021.3089515","article-title":"Advances in Machine Learning Algorithms for Hate Speech Detection in Social Media: A Review","volume":"9","author":"Mullah","year":"2021","journal-title":"IEEE Access"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"101690","DOI":"10.1016\/j.csl.2024.101690","article-title":"Generalizing hate speech detection using multi-task learning: A case study of political public figures","volume":"89","author":"Yuan","year":"2025","journal-title":"Comput. Speech Lang."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"681","DOI":"10.13053\/cys-28-2-4130","article-title":"Comparing pre-trained language models for Arabic hate speech detection","volume":"28","author":"Daouadi","year":"2024","journal-title":"Comput. Y Sistemas"},{"key":"ref_9","first-page":"2115","article-title":"Cyberbullying-related hate speech detection using shallow-to-deep learning","volume":"75","author":"Sultan","year":"2023","journal-title":"Comput. Mater. 
Contin."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"59474","DOI":"10.1109\/ACCESS.2024.3393295","article-title":"Social media forensics: An adaptive cyberbullying-related hate speech detection approach based on neural networks with uncertainty","volume":"12","author":"Ibrahim","year":"2024","journal-title":"IEEE Access"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"103454","DOI":"10.1016\/j.ipm.2023.103454","article-title":"Cyberbullying detection for low-resource languages and dialects: Review of the state of the art","volume":"60","author":"Mahmud","year":"2023","journal-title":"Inf. Process. Manag."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Raj, C., Agarwal, A., Bharathy, G., Narayan, B., and Prasad, M. (2021). Cyberbullying detection: Hybrid models based on machine learning and natural language processing techniques. Electronics, 10.","DOI":"10.3390\/electronics10222810"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Singh, V.K., Ghosh, S., and Jose, C. (2017, January 6\u201311). Toward multimodal cyberbullying detection. Proceedings of the 2017 CHI Conference Extended Abstracts on Human Factors in Computing Systems, Denver, CO, USA.","DOI":"10.1145\/3027063.3053169"},{"key":"ref_14","first-page":"4794227","article-title":"Building towards automated cyberbullying detection: A comparative analysis","volume":"2022","author":"Moradpoor","year":"2022","journal-title":"Comput. Intell. Neurosci."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"783","DOI":"10.25046\/aj060187","article-title":"Text mining techniques for cyberbullying detection: State of the art","volume":"6","author":"Bayari","year":"2021","journal-title":"Adv. Sci. Technol. Eng. Syst. 
J."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"66","DOI":"10.1186\/s40537-022-00619-x","article-title":"Detection of fake news and hate speech for Ethiopian languages: A systematic review of the approaches","volume":"9","author":"Demilie","year":"2022","journal-title":"J. Big Data"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"232","DOI":"10.1016\/j.procs.2021.05.086","article-title":"Aracovid19-mfh: Arabic covid-19 multi-label fake news & hate speech detection dataset","volume":"189","author":"Ameur","year":"2021","journal-title":"Procedia Comput. Sci."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Liu, M., Liu, Y., Fu, R., Wen, Z., Tao, J., Liu, X., and Li, G. (2024, January 7\u201310). Exploring the role of audio in multimodal misinformation detection. Proceedings of the 2024 IEEE 14th International Symposium on Chinese Spoken Language Processing (ISCSLP), Beijing, China.","DOI":"10.1109\/ISCSLP63861.2024.10800162"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3697349","article-title":"Multi-modal misinformation detection: Approaches, challenges and opportunities","volume":"57","author":"Abdali","year":"2024","journal-title":"ACM Comput. Surv."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Cinelli, M., Pelicon, A., Mozeti\u010d, I., Quattrociocchi, W., Novak, P.K., and Zollo, F. (2021). Dynamics of online hate and misinformation. Sci. Rep., 11.","DOI":"10.1038\/s41598-021-01487-w"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Shang, L., Kou, Z., Zhang, Y., and Wang, D. (2021, January 15\u201318). A multimodal misinformation detector for covid-19 short videos on tiktok. Proceedings of the 2021 IEEE International Conference on Big Data (Big Data), Orlando, FL, USA.","DOI":"10.1109\/BigData52589.2021.9671928"},{"key":"ref_22","unstructured":"Bade, G., Kolesnikova, O., Sidorov, G., and Oropeza, J. (2024, January 22). 
Social media hate and offensive speech detection using machine learning method. Proceedings of the Fourth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages, St. Julian\u2019s, Malta."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Nghiem, H., and Daum\u00e9, H. (2024). HateCOT: An Explanation-Enhanced Dataset for Generalizable Offensive Speech Detection via Large Language Models. arXiv.","DOI":"10.18653\/v1\/2024.findings-emnlp.343"},{"key":"ref_24","first-page":"16","article-title":"Abusive Speech Detection Method for Ukrainian Language Used Recurrent Neural Network","volume":"3","author":"Krak","year":"2024","journal-title":"COLINS"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Sekkate, S., Chebbi, S., Adib, A., and Jebara, S.B. (2024, January 21\u201323). A deep learning framework for offensive speech detection. Proceedings of the 2024 IEEE 12th International Symposium on Signal, Image, Video and Communications (ISIVC), Marrakech, Morocco.","DOI":"10.1109\/ISIVC61350.2024.10577928"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"186","DOI":"10.1504\/IJSNET.2024.142516","article-title":"Unsupervised offensive speech detection for multimedia based on multilingual BERT","volume":"46","author":"Liu","year":"2024","journal-title":"Int. J. Sens. Netw."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"51009","DOI":"10.1007\/s11042-023-17470-8","article-title":"A comprehensive survey on machine learning approaches for fake news detection","volume":"83","author":"Alghamdi","year":"2024","journal-title":"Multimed. 
Tools Appl."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"136","DOI":"10.1016\/j.gr.2024.03.013","article-title":"Promoting sustainability in developing Countries: A Machine Learning-based approach to understanding the relationship between green investment and environmental degradation","volume":"132","author":"Khan","year":"2024","journal-title":"Gondwana Res."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"1058","DOI":"10.1016\/j.procs.2024.04.100","article-title":"An efficient sarcasm detection using linguistic features and ensemble machine learning","volume":"235","author":"Pradhan","year":"2024","journal-title":"Procedia Comput. Sci."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Ganganwar, V., Manvainder Singh, M., Patil, P., and Joshi, S. (2024). Sarcasm and Humor Detection in Code-Mixed Hindi Data: A Survey. International Conference on Computing and Machine Learning, Springer Nature.","DOI":"10.1007\/978-981-97-6588-1_34"},{"key":"ref_31","unstructured":"Khazeni, M., Heydari, M., and Albadvi, A. (2024). Persian Slang Text Conversion to Formal and Deep Learning of Persian Short Texts on Social Media for Sentiment Classification. arXiv."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Sundaram, A., Subramaniam, H., Ab Hamid, S.H., and Nor, A.M. (2024, January 22\u201323). A three-step procedural paradigm for domain-specific social media slang analytics. Proceedings of the 2024 International Conference on Trends in Quantum Computing and Emerging Business Technologies, Pune, India.","DOI":"10.1109\/TQCEBT59414.2024.10545286"},{"key":"ref_33","unstructured":"Embassy of Sri Lanka, Saudi Arabia (2025, April 12). Arabic, Spoken by over 450 Million People and Holding Official Status in Nearly 25 Countries, Is a Global Language with Immense Cultural Significance. 
Available online: https:\/\/slemb.org.sa\/2024\/arabic-spoken-by-over-450-million-people-and-holding-official-status-in-nearly-25-countries-is-a-global-language-with-immense-cultural-significance\/."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"60","DOI":"10.1007\/s42452-024-06425-9","article-title":"Cost-effective time-efficient subnational-level surveillance using Twitter: Kingdom of Saudi Arabia case study","volume":"7","author":"Elteir","year":"2025","journal-title":"Discov. Appl. Sci."},{"key":"ref_35","unstructured":"Ethnologue (2025, April 12). List of Languages by Total Number of Speakers. Available online: https:\/\/en.wikipedia.org\/wiki\/List_of_languages_by_total_number_of_speakers."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Alshaabi, T., Dewhurst, D.R., Minot, J.R., Arnold, M.V., Adams, J.L., Danforth, C.M., and Dodds, P.S. (2020). The growing amplification of social media: Measuring temporal and social contagion dynamics for over 150 languages on Twitter for 2009\u20132020. arXiv.","DOI":"10.1140\/epjds\/s13688-021-00271-0"},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"3704","DOI":"10.1021\/acssensors.5c00630","article-title":"Robust odor detection in electronic nose using transfer-learning powered Scentformer model","volume":"10","author":"Ni","year":"2025","journal-title":"ACS Sens."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Alkomah, F., and Ma, X. (2022). A literature review of textual hate speech detection methods and datasets. 
Information, 13.","DOI":"10.14569\/IJACSA.2022.01308100"},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"20871","DOI":"10.1109\/ACCESS.2025.3532397","article-title":"Hate Speech Detection using Large Language Models: A Comprehensive Review","volume":"13","author":"Albladi","year":"2025","journal-title":"IEEE Access"},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"125843","DOI":"10.1016\/j.eswa.2024.125843","article-title":"Self-supervised hate speech detection in norwegian texts with lexical and semantic augmentations","volume":"264","author":"Hashmi","year":"2025","journal-title":"Expert Syst. Appl."},{"key":"ref_41","unstructured":"Chavinda, K., and Thayasivam, U. (2025, January 19). A Dual Contrastive Learning Framework for Enhanced Hate Speech Detection in Low-Resource Languages. Proceedings of the First Workshop on Challenges in Processing South Asian Languages (CHiPSAL 2025), Abu Dhabi, United Arab Emirates."},{"key":"ref_42","unstructured":"Thapa, S., Rauniyar, K., Jafri, F.A., Adhikari, S., Sarveswaran, K., Bal, B.K., and Naseem, U. (2025, January 19). Natural language understanding of devanagari script languages: Language identification, hate speech and its target detection. Proceedings of the First Workshop on Challenges in Processing South Asian Languages (CHiPSAL 2025), Abu Dhabi, United Arab Emirates."},{"key":"ref_43","unstructured":"Khadka, P., Bk, A., Acharya, A., Kc, B., Shrestha, S., and Thapa, R. (2025, January 19). Nepali Transformers@ NLU of Devanagari Script Languages 2025: Detection of Language, Hate Speech and Targets. Proceedings of the First Workshop on Challenges in Processing South Asian Languages (CHiPSAL 2025), Abu Dhabi, United Arab Emirates."},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Finch, W.H., and French, B.F. (2018). 
Educational and Psychological Measurement, Routledge.","DOI":"10.4324\/9781315650951"},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"463","DOI":"10.1007\/s11135-014-0003-1","article-title":"Fleiss\u2019 kappa statistic without paradoxes","volume":"49","author":"Falotico","year":"2015","journal-title":"Qual. Quant."}],"container-title":["Computers"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2073-431X\/14\/6\/239\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T17:54:26Z","timestamp":1760032466000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2073-431X\/14\/6\/239"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,18]]},"references-count":45,"journal-issue":{"issue":"6","published-online":{"date-parts":[[2025,6]]}},"alternative-id":["computers14060239"],"URL":"https:\/\/doi.org\/10.3390\/computers14060239","relation":{},"ISSN":["2073-431X"],"issn-type":[{"value":"2073-431X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,6,18]]}}}