{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,6]],"date-time":"2026-06-06T07:21:15Z","timestamp":1780730475656,"version":"3.54.1"},"reference-count":41,"publisher":"MDPI AG","issue":"7","license":[{"start":{"date-parts":[[2025,7,15]],"date-time":"2025-07-15T00:00:00Z","timestamp":1752537600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Computers"],"abstract":"<jats:p>The proliferation of hate speech on social media platforms poses significant threats to digital safety, social cohesion, and freedom of expression. Detecting such content\u2014especially across diverse languages\u2014remains a challenging task due to linguistic complexity, cultural context, and resource limitations. To address these challenges, this study introduces a comprehensive approach for multilingual hate speech detection. To facilitate robust hate speech detection across diverse languages, this study makes several key contributions. First, we created a novel trilingual hate speech dataset consisting of 10,193 manually annotated tweets in English, Spanish, and Urdu. Second, we applied two innovative techniques\u2014joint multilingual and translation-based approaches\u2014for cross-lingual hate speech detection that have not been previously explored for these languages. Third, we developed detailed hate speech annotation guidelines tailored specifically to all three languages to ensure consistent and high-quality labeling. Finally, we conducted 41 experiments employing machine learning models with TF\u2013IDF features, deep learning models utilizing FastText and GloVe embeddings, and transformer-based models leveraging advanced contextual embeddings to comprehensively evaluate our approach. 
Additionally, we employed a large language model with advanced contextual embeddings to identify the best solution for the hate speech detection task. The experimental results showed that our GPT-3.5-turbo model significantly outperforms strong baselines, achieving up to an 8% improvement over XLM-R in Urdu hate speech detection and an average gain of 4% across all three languages. This research not only contributes a high-quality multilingual dataset but also offers a scalable and inclusive framework for hate speech detection in underrepresented languages.<\/jats:p>","DOI":"10.3390\/computers14070279","type":"journal-article","created":{"date-parts":[[2025,7,15]],"date-time":"2025-07-15T15:01:05Z","timestamp":1752591665000},"page":"279","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":6,"title":["A Large Language Model-Based Approach for Multilingual Hate Speech Detection on Social Media"],"prefix":"10.3390","volume":"14","author":[{"given":"Muhammad","family":"Usman","sequence":"first","affiliation":[{"name":"Centro de Investigaci\u00f3n en Computaci\u00f3n, Instituto Polit\u00e9cnico Nacional (CIC-IPN), Mexico City 07320, Mexico"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Muhammad","family":"Ahmad","sequence":"additional","affiliation":[{"name":"Centro de Investigaci\u00f3n en Computaci\u00f3n, Instituto Polit\u00e9cnico Nacional (CIC-IPN), Mexico City 07320, Mexico"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3901-3522","authenticated-orcid":false,"given":"Grigori","family":"Sidorov","sequence":"additional","affiliation":[{"name":"Centro de Investigaci\u00f3n en Computaci\u00f3n, Instituto Polit\u00e9cnico Nacional (CIC-IPN), Mexico City 07320, 
Mexico"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7295-5752","authenticated-orcid":false,"given":"Irina","family":"Gelbukh","sequence":"additional","affiliation":[{"name":"Centro de Investigaci\u00f3n en Computaci\u00f3n, Instituto Polit\u00e9cnico Nacional (CIC-IPN), Mexico City 07320, Mexico"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4454-8791","authenticated-orcid":false,"given":"Rolando Quintero","family":"Tellez","sequence":"additional","affiliation":[{"name":"Centro de Investigaci\u00f3n en Computaci\u00f3n, Instituto Polit\u00e9cnico Nacional (CIC-IPN), Mexico City 07320, Mexico"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2025,7,15]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"5489","DOI":"10.24294\/jipd.v8i8.5489","article-title":"The role of social media in shaping public opinion among Jordanian university students","volume":"8","author":"AlKhudari","year":"2024","journal-title":"J. Infrastruct. Policy Dev."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"620","DOI":"10.59613\/fm1dpm66","article-title":"The Role of Social Media in Shaping Public Opinion: A Comparative Analysis of Traditional vs. Digital Media Platforms","volume":"1","author":"Swastiningsih","year":"2024","journal-title":"J. Acad. Sci."},{"key":"ref_3","unstructured":"Tash, M.S., Ramos, L., Ahani, Z., Monroy, R., Calvo, H., and Sidorov, G. (2025). Online Social Support Detection in Spanish Social Media Texts. arXiv."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"31","DOI":"10.17323\/jle.2024.22443","article-title":"Hope Speech Detection Using Social Media Discourse (Posi-Vox-2024): A Transfer Learning Approach","volume":"10","author":"Ahmad","year":"2024","journal-title":"J. Lang. 
Educ."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Ahmad, M., Ameer, I., Sharif, W., Usman, S., Muzamil, M., Hamza, A., Jalal, M., Batyrshin, I., and Sidorov, G. (2025). Multilingual hope speech detection from tweets using transfer learning models. Sci. Rep., 15.","DOI":"10.1038\/s41598-025-88687-w"},{"key":"ref_6","unstructured":"Ullah, F., Zamir, M.T., Ahmad, M., Sidorov, G., and Gelbukh, A. (2024, January 24). Hope: A multilingual approach to identifying positive communication in social media. Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2024), Co-Located with the 40th Conference of the Spanish Society for Natural Language Processing (SEPLN 2024), CEUR-WS.org, Valladolid, Spain."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Arif, M., Shahiki Tash, M., Jamshidi, A., Ullah, F., Ameer, I., Kalita, J., Gelbukh, A., and Balouchzahi, F. (2024). Analyzing hope speech from psycholinguistic and emotional perspectives. Sci. Rep., 14.","DOI":"10.1038\/s41598-024-74630-y"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Ahmad, M., Waqas, M., Hamza, A., Usman, S., Batyrshin, I., and Sidorov, G. (2025). UA-HSD-2025: Multi-Lingual Hate Speech Detection from Tweets Using Pre-Trained Transformers. Computers, 14.","DOI":"10.3390\/computers14060239"},{"key":"ref_9","unstructured":"Zamir, M., Tash, M., Ahani, Z., Gelbukh, A., and Sidorov, G. (2024, January 22). Lidoma@dravidianlangtech 2024: Identifying hate speech in telugu code-mixed: A bert multilingual. Proceedings of the Fourth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages, St. Julian\u2019s, Malta."},{"key":"ref_10","unstructured":"Ahani, Z., Tash, M.S., Tash, M., Gelbukh, A., and Gelbukh, I. (2024, January 24). Multiclass hope speech detection through transformer methods. 
Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2024), Co-Located with the 40th Conference of the Spanish Society for Natural Language Processing (SEPLN 2024), CEUR-WS.org, Valladolid, Spain."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Pavlopoulos, J., Sorensen, J., Laugier, L., and Androutsopoulos, I. (2021, January 5\u20136). SemEval-2021 task 5: Toxic spans detection. Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021), Bangkok, Thailand.","DOI":"10.18653\/v1\/2021.semeval-1.6"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Ali, M., Muhammad, A., Asad, M., Sajawal, M., Alexopoulos, C., and Charalabidis, Y. (2022, January 25\u201327). Towards perso-arabic urdu language hate detection using machine learning: A comparative study based on a large dataset and time-complexity. Proceedings of the 26th Pan-Hellenic Conference on Informatics, Athens, Greece.","DOI":"10.1145\/3575879.3576011"},{"key":"ref_13","unstructured":"Perera, S.S., and Sumanathilaka, D.K. (2025, January 20). Machine Translation and Transliteration for Indo-Aryan Languages: A Systematic Review. Proceedings of the First Workshop on Natural Language Processing for Indo-Aryan and Dravidian Languages, Abu Dhabi, United Arab Emirates."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3232676","article-title":"A survey on automatic detection of hate speech in text","volume":"51","author":"Fortuna","year":"2018","journal-title":"ACM Comput. Surv."},{"key":"ref_15","unstructured":"Devlin, J., Chang, M.W., Lee, K., and Toutanova, K. (2019, January 2\u20137). Bert: Pre-training of deep bidirectional transformers for language understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA. 
(long and short papers)."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Kolesnikova, O., Tash, M.S., Ahani, Z., Agrawal, A., Monroy, R., and Sidorov, G. (2025). Advanced Machine Learning Techniques for Social Support Detection on Social Media. arXiv.","DOI":"10.1016\/j.heliyon.2025.e43437"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"87","DOI":"10.1007\/s13278-022-00920-w","article-title":"Fighting hate speech from bilingual hinglish speaker\u2019s perspective, a transformer-and translation-based approach","volume":"12","author":"Biradar","year":"2022","journal-title":"Soc. Netw. Anal. Min."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Ahmad, M., Sidorov, G., Amjad, M., Ameer, I., and Batyrshin, I. (2025). Opioid Crisis Detection in Social Media Discourse Using Deep Learning Approach. Information, 16.","DOI":"10.3390\/info16070545"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"e70525","DOI":"10.2196\/70525","article-title":"Sentiment Analysis Using a Large Language Model\u2013Based Approach to Detect Opioids Mixed With Other Substances Via Social Media: Method Development and Validation","volume":"5","author":"Ahmad","year":"2025","journal-title":"JMIR Infodemiol."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"111175","DOI":"10.1109\/ACCESS.2025.3579289","article-title":"UE-NER-2025: A GPT-Based Approach to Multi-Lingual Named Entity Recognition on Urdu and English","volume":"13","author":"Ahmad","year":"2025","journal-title":"IEEE Access"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"110245","DOI":"10.1109\/ACCESS.2023.3322101","article-title":"Bert-based sentiment analysis for low-resourced languages: A case study of urdu language","volume":"11","author":"Ashraf","year":"2023","journal-title":"IEEE Access"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"108","DOI":"10.1016\/j.avb.2018.05.003","article-title":"Hate speech review in the context of online social 
networks","volume":"40","author":"Chetty","year":"2018","journal-title":"Aggress. Violent Behav."},{"key":"ref_23","unstructured":"Aluru, S.S., Mathew, B., Saha, P., and Mukherjee, A. (2020). Deep learning models for multilingual hate speech detection. arXiv."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"143177","DOI":"10.1109\/ACCESS.2024.3470901","article-title":"Fine-grained multilingual Hate speech detection using Explainable AI and Transformers","volume":"12","author":"Siddiqui","year":"2024","journal-title":"IEEE Access"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"777","DOI":"10.1007\/s10579-016-9367-2","article-title":"COUNTER: Corpus of Urdu news text reuse","volume":"51","author":"Sharjeel","year":"2017","journal-title":"Lang. Resour. Eval."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Mehmood, A., Farooq, M.S., Naseem, A., Rustam, F., Villar, M.G., Rodr\u00edguez, C.L., and Ashraf, I. (2022). Threatening URDU language detection from tweets using machine learning. Appl. Sci., 12.","DOI":"10.3390\/app122010342"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"2050008","DOI":"10.1142\/S2424922X20500084","article-title":"Roman Urdu headline news text classification using RNN, LSTM and CNN","volume":"12","author":"Kandhro","year":"2020","journal-title":"Adv. Data Sci. Adapt. 
Anal."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"121133","DOI":"10.1109\/ACCESS.2022.3216375","article-title":"Context-aware deep learning model for detection of roman urdu hate speech on social media platform","volume":"10","author":"Bilal","year":"2022","journal-title":"IEEE Access"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"27225","DOI":"10.1109\/ACCESS.2024.3367281","article-title":"Enhancing Hate Speech Detection in the Digital Age: A Novel Model Fusion Approach Leveraging a Comprehensive Dataset","volume":"12","author":"Sharif","year":"2024","journal-title":"IEEE Access"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"101119","DOI":"10.1016\/j.csl.2020.101119","article-title":"Emotion recognition in low-resource settings: An evaluation of automatic feature selection methods","volume":"65","author":"Haider","year":"2021","journal-title":"Comput. Speech Lang."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Azhar, N., and Latif, S. (2022, January 28\u201329). Roman urdu sentiment analysis using pre-trained distilbert and xlnet. Proceedings of the 2022 Fifth International Conference of Women in Data Science at Prince Sultan University (WiDS PSU), Riyadh, Saudi Arabia.","DOI":"10.1109\/WiDS-PSU54548.2022.00027"},{"key":"ref_32","unstructured":"Bahdanau, D., Cho, K., and Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate. arXiv."},{"key":"ref_33","unstructured":"Gillioz, A., Casas, J., Mugellini, E., and Abou Khaled, O. (2020, January 6\u20139). Overview of the Transformer-based Models for NLP Tasks. Proceedings of the 2020 15th Conference on Computer Science and Information Systems (FedCSIS), Sofia, Bulgaria."},{"key":"ref_34","unstructured":"Wong, J.T., Zhang, C., Cao, X., Gimenes, P., Constantinides, G.A., Luk, W., and Zhao, Y. (2025). A3: An Analytical Low-Rank Approximation Framework for Attention. 
arXiv."},{"key":"ref_35","unstructured":"Fu, D.Y., Dao, T., Saab, K.K., Thomas, A.W., Rudra, A., and R\u00e9, C. (2022). Hungry hungry hippos: Towards language modeling with state space models. arXiv."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Alrehili, A. (2019, January 3\u20137). Automatic hate speech detection on social media: A brief survey. Proceedings of the 2019 IEEE\/ACS 16th International Conference on Computer Systems and Applications (AICCSA), Abu Dhabi, United Arab Emirates.","DOI":"10.1109\/AICCSA47632.2019.9035228"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Ranasinghe, T., and Zampieri, M. (2020). Multilingual offensive language identification with cross-lingual embeddings. arXiv.","DOI":"10.18653\/v1\/2020.emnlp-main.470"},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"1515","DOI":"10.1007\/s10579-023-09637-4","article-title":"Label modification and bootstrapping for zero-shot cross-lingual hate speech detection","volume":"57","author":"Bigoulaeva","year":"2023","journal-title":"Lang. Resour. Eval."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"131","DOI":"10.1016\/j.procs.2019.01.202","article-title":"Deep learning-based sentiment analysis for roman urdu text","volume":"147","author":"Ghulam","year":"2019","journal-title":"Procedia Comput. Sci."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Vidgen, B., and Derczynski, L. (2020). Directions in abusive language training data, a systematic review: Garbage in, garbage out. PLoS ONE, 15.","DOI":"10.1371\/journal.pone.0243300"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Pereira-Kohatsu, J.C., Quijano-S\u00e1nchez, L., Liberatore, F., and Camacho-Collados, M. (2019). Detecting and monitoring hate speech in Twitter. 
Sensors, 19.","DOI":"10.3390\/s19214654"}],"container-title":["Computers"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2073-431X\/14\/7\/279\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T18:10:09Z","timestamp":1760033409000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2073-431X\/14\/7\/279"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,7,15]]},"references-count":41,"journal-issue":{"issue":"7","published-online":{"date-parts":[[2025,7]]}},"alternative-id":["computers14070279"],"URL":"https:\/\/doi.org\/10.3390\/computers14070279","relation":{},"ISSN":["2073-431X"],"issn-type":[{"value":"2073-431X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,7,15]]}}}