{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,21]],"date-time":"2026-05-21T16:14:20Z","timestamp":1779380060306,"version":"3.53.1"},"reference-count":56,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2024,2,20]],"date-time":"2024-02-20T00:00:00Z","timestamp":1708387200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100010450","name":"Ministry of Higher Education and Scientific Research","doi-asserted-by":"publisher","award":["ICT-Ict\/1\/2021."],"award-info":[{"award-number":["ICT-Ict\/1\/2021."]}],"id":[{"id":"10.13039\/100010450","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Artif. Intell."],"abstract":"<jats:p>Hate Speech Detection in Arabic presents a multifaceted challenge due to the broad and diverse linguistic terrain. With its multiple dialects and rich cultural subtleties, Arabic requires particular measures to address hate speech online successfully. To address this issue, academics and developers have used natural language processing (NLP) methods and machine learning algorithms adapted to the complexities of Arabic text. However, many proposed methods were hampered by a lack of a comprehensive dataset\/corpus of Arabic hate speech. In this research, we propose a novel multi-class public Arabic dataset comprised of 403,688 annotated tweets categorized as extremely positive, positive, neutral, or negative based on the presence of hate speech. Using our developed dataset, we additionally characterize the performance of multiple machine learning models for Hate speech identification in Arabic Jordanian dialect tweets. Specifically, the Word2Vec, TF-IDF, and AraBert text representation models have been applied to produce word vectors. 
With the help of these models, we can provide classification models with vectors representing text. After that, seven machine learning classifiers have been evaluated: Support Vector Machine (SVM), Logistic Regression (LR), Naive Bayes (NB), Random Forest (RF), AdaBoost (Ada), XGBoost (XGB), and CatBoost (CatB). In light of this, the experimental evaluation revealed that, in this challenging and unstructured setting, our gathered and annotated datasets were rather efficient and generated encouraging assessment outcomes. This will enable academics to delve further into this crucial field of study.<\/jats:p>","DOI":"10.3389\/frai.2024.1345445","type":"journal-article","created":{"date-parts":[[2024,2,20]],"date-time":"2024-02-20T05:27:41Z","timestamp":1708406861000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":22,"title":["Hate speech detection in the Arabic language: corpus design, construction, and evaluation"],"prefix":"10.3389","volume":"7","author":[{"given":"Ashraf","family":"Ahmad","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Mohammad","family":"Azzeh","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Eman","family":"Alnagi","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Qasem","family":"Abu 
Al-Haija","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Dana","family":"Halabi","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Abdullah","family":"Aref","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Yousef","family":"AbuHour","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1965","published-online":{"date-parts":[[2024,2,20]]},"reference":[{"key":"B1","first-page":"109","article-title":"\u201cQuick and simple approach for detecting hate speech in arabic tweets,\u201d","author":"Abuzayed","year":"2020","journal-title":"Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools, With a Shared Task on Offensive Language Detection"},{"key":"B2","doi-asserted-by":"publisher","first-page":"170","DOI":"10.1109\/ESOLEC54569.2022.10009167","article-title":"\u201cFine-tuning arabic pre-trained transformer models for egyptian-arabic dialect offensive language and hate speech detection and classification,\u201d","author":"Ahmed","year":"2022","journal-title":"2022 20th International Conference on Language Engineering (ESOLEC)"},{"key":"B3","doi-asserted-by":"publisher","first-page":"351","DOI":"10.26438\/ijcse\/v5i10.351354","article-title":"A study on positive and negative effects of social media on society","volume":"5","author":"Akram","year":"2017","journal-title":"Int. J. Comput. Sci. 
Eng"},{"key":"B4","doi-asserted-by":"publisher","first-page":"114","DOI":"10.1109\/ASAR.2017.8067771","article-title":"\u201cArabic language sentiment analysis on health services,\u201d","author":"Alayba","year":"2017","journal-title":"2017 1st International Workshop on Arabic Script Analysis and Recognition (ASAR)"},{"key":"B5","doi-asserted-by":"publisher","first-page":"69","DOI":"10.1109\/ASONAM.2018.8508247","article-title":"\u201cAre they our brothers? Analysis and detection of religious hate speech in the arabic twittersphere,\u201d","author":"Albadi","year":"2018","journal-title":"2018 IEEE\/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM)"},{"key":"B6","first-page":"69","article-title":"\u201cArabic offensive and hate speech detection using a cross-corpora multi-task learning model,\u201d","volume-title":"Informatics","author":"Aldjanabi","year":"2021"},{"key":"B7","first-page":"13","article-title":"The effect of social media usage on students\u2019 e-learning acceptance in higher education: a case study from the United Arab Emirates","volume":"3","author":"Alghizzawi","year":"2019","journal-title":"Int. J. Inf. Technol. Lang. Stud"},{"key":"B8","doi-asserted-by":"publisher","first-page":"83","DOI":"10.5121\/csit.2019.90208","article-title":"Detection of hate speech in social networks: a survey on multilingual corpus","volume":"9","author":"Al-Hassan","year":"2019","journal-title":"Comput. Sci. Inform. 
Technol"},{"key":"B9","doi-asserted-by":"publisher","first-page":"273","DOI":"10.3390\/info13060273","article-title":"A literature review of textual hate speech detection methods and datasets","volume":"13","author":"Alkomah","year":"2022","journal-title":"Information"},{"key":"B10","doi-asserted-by":"publisher","first-page":"863","DOI":"10.1109\/SMC52423.2021.9659134","article-title":"\u201cSemi-supervised self-learning for Arabic hate speech detection,\u201d","author":"Alsafari","year":"2021","journal-title":"2021 IEEE International Conference on Systems, Man, and Cybernetics (SMC)"},{"key":"B11","doi-asserted-by":"publisher","first-page":"526","DOI":"10.1109\/ICTAI50040.2020.00087","article-title":"\u201cDeep learning ensembles for hate speech detection,\u201d","author":"Alsafari","year":"","journal-title":"2020 IEEE 32nd International Conference on Tools with Artificial Intelligence (ICTAI)"},{"key":"B12","doi-asserted-by":"publisher","first-page":"100096","DOI":"10.1016\/j.osnem.2020.100096","article-title":"Hate and offensive speech detection on Arabic social media","volume":"19","author":"Alsafari","year":"","journal-title":"Online Soc. Netw. Media"},{"key":"B13","first-page":"12","article-title":"\u201cHate speech detection in Saudi Twitter sphere: a deep learning approach,\u201d","author":"Alshaalan","year":"2020","journal-title":"Proceedings of the Fifth Arabic Natural Language Processing Workshop"},{"key":"B14","doi-asserted-by":"publisher","first-page":"8614","DOI":"10.3390\/app10238614","article-title":"A deep learning approach for automatic hate speech detection in the Saudi Twitter sphere","volume":"10","author":"Alshalan","year":"2020","journal-title":"Appl. 
Sci"},{"key":"B15","doi-asserted-by":"publisher","first-page":"109","DOI":"10.14569\/IJACSA.2022.01305109","article-title":"Bert-based approach to Arabic hate speech and offensive language detection in Twitter: exploiting emojis and sentiment analysis","volume":"13","author":"Althobaiti","year":"2022","journal-title":"Int. J. Adv. Comput. Sci. Applic"},{"key":"B16","author":"Al-Twairesh","year":"2016","journal-title":"Sentiment analysis of Twitter: a study on the Saudi community"},{"key":"B17","doi-asserted-by":"publisher","first-page":"232","DOI":"10.1016\/j.procs.2021.05.086","article-title":"Aracovid19-mfh: Arabic covid-19 multi-label fake news &hate speech detection dataset","volume":"189","author":"Ameur","year":"2021","journal-title":"Procedia Comput. Sci"},{"key":"B18","doi-asserted-by":"publisher","first-page":"6010","DOI":"10.3390\/app12126010","article-title":"Arabic hate speech detection using deep recurrent neural networks","volume":"12","author":"Anezi","year":"2022","journal-title":"Appl. Sci"},{"key":"B19","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s40561-020-00118-7","article-title":"Exploring the role of social media in collaborative learning the new domain of learning","volume":"7","author":"Ansari","year":"2020","journal-title":"Smart Lear. 
Environ"},{"key":"B20","doi-asserted-by":"publisher","first-page":"81","DOI":"10.5121\/csit.2020.100507","article-title":"\u201cHate speech detection of Arabic short text,\u201d","author":"Aref","year":"2020","journal-title":"CS IT Conference Proceedings"},{"key":"B21","doi-asserted-by":"crossref","first-page":"701","DOI":"10.1007\/978-3-030-75762-5_55","article-title":"\u201cAngrybert: joint learning target and emotion for hate speech detection,\u201d","volume-title":"Pacific-Asia Conference on Knowledge Discovery and Data Mining","author":"Awal","year":"2021"},{"key":"B22","first-page":"36","article-title":"\u201cRobust sentiment detection on Twitter from biased and noisy data,\u201d","author":"Barbosa","year":"2010","journal-title":"Coling 2010: Posters"},{"key":"B23","first-page":"4177","article-title":"\u201cA Turkish hate speech dataset and detection system,\u201d","author":"Beyhan","year":"2022","journal-title":"Proceedings of the Thirteenth Language Resources and Evaluation Conference"},{"key":"B24","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3522598.3522601","article-title":"Nipping in the bud: detection, diffusion and mitigation of hate speech on social media","volume":"2022","author":"Chakraborty","year":"2022","journal-title":"ACM SIGWEB Newslett"},{"key":"B25","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3580393","article-title":"Detection and cross-domain evaluation of cyberbullying in Facebook activity contents for Turkish","volume":"22","author":"Coban","year":"2023","journal-title":"ACM Trans. Asian Low-Resour. Lang. Infor. Proc"},{"key":"B26","doi-asserted-by":"publisher","first-page":"4001","DOI":"10.1007\/s13369-021-05383-3","article-title":"A deep learning framework for automatic detection of hate speech embedded in Arabic tweets. Arabian J","volume":"46","author":"Duwairi","year":"2021","journal-title":"Sci. 
Eng"},{"key":"B27","doi-asserted-by":"publisher","first-page":"453","DOI":"10.5220\/0008954004530460","article-title":"\u201cHate speech detection using word embedding and deep learning in the Arabic language context,\u201d","author":"Faris","year":"2020","journal-title":"ICPRAM"},{"key":"B28","first-page":"6786","article-title":"\u201cToxic, hateful, offensive or abusive? What are we really classifying? An empirical analysis of hate speech datasets,\u201d","author":"Fortuna","year":"2020","journal-title":"Proceedings of the 12th Language Resources and Evaluation Conference"},{"key":"B29","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1007\/s11115-012-0202-y","article-title":"Harassment at the workplace: a practical review of the laws in the United Kingdom and the United States of America","volume":"14","author":"Gilani","year":"2014","journal-title":"Public Organiz. Rev"},{"key":"B30","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1402.3722","article-title":"word2vec explained: deriving mikolov et al.'s negative-sampling word-embedding method","author":"Goldberg","year":"2014","journal-title":"arXiv preprint arXiv:1402.3722"},{"key":"B31","first-page":"76","article-title":"\u201cArabic offensive language detection with attention-based deep neural networks,\u201d","author":"Haddad","year":"2020","journal-title":"Proceedings of the 4th workshop on Open-Source Arabic Corpora and Processing Tools, With A Shared Task on Offensive Language Detection"},{"key":"B32","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.eacl-demos.14","article-title":"\u201cASAD: Arabic social media analytics and understanding,\u201d","author":"Hassan","year":"2021","journal-title":"Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations"},{"key":"B33","article-title":"Arabic offensive language detection using machine learning and ensemble machine learning 
approaches","author":"Husain","year":"2020","journal-title":"arXiv preprint arXiv:2005.08946"},{"key":"B34","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3421504","article-title":"A survey of offensive language detection for the Arabic language","volume":"20","author":"Husain","year":"2021","journal-title":"ACM Trans. Asian Low-Resour. Lang. Infor. Proc"},{"key":"B35","doi-asserted-by":"publisher","first-page":"126232","DOI":"10.1016\/j.neucom.2023.126232","article-title":"A systematic review of hate speech automatic detection using natural language processing","volume":"546","author":"Jahan","year":"2023","journal-title":"Neurocomputing"},{"key":"B36","doi-asserted-by":"publisher","first-page":"531","DOI":"10.1007\/s10796-017-9810-y","article-title":"Advances in social media research: past, present and future","volume":"20","author":"Kapoor","year":"2018","journal-title":"Inform. Syst. Front"},{"key":"B37","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1007\/s43926-023-00030-9","article-title":"AR hate detector: detection of hate speech from standard and dialectal Arabic tweets","volume":"3","author":"Khezzar","year":"2023","journal-title":"Discov. Internet Things"},{"key":"B38","doi-asserted-by":"publisher","first-page":"159","DOI":"10.2307\/2529310","article-title":"The measurement of observer agreement for categorical data","volume":"33","author":"Landis","year":"1977","journal-title":"Biometrics"},{"key":"B39","doi-asserted-by":"publisher","first-page":"4663","DOI":"10.1007\/s40747-021-00608-2","article-title":"Ethos: a multi-label hate speech detection dataset","volume":"8","author":"Mollas","year":"2022","journal-title":"Complex Intell. 
Syst"},{"key":"B40","doi-asserted-by":"crossref","first-page":"928","DOI":"10.1007\/978-3-030-36687-2_77","article-title":"\u201cA bert-based transfer learning approach for hate speech detection in online social media,\u201d","volume-title":"Complex Networks and Their Applications VIII: Volume 1 Proceedings of the Eighth International Conference on Complex Networks and Their Applications COMPLEX NETWORKS","author":"Mozafari","year":"2020"},{"key":"B41","doi-asserted-by":"publisher","first-page":"52","DOI":"10.18653\/v1\/W17-3008","article-title":"\u201cAbusive language detection on Arabic social media,\u201d","author":"Mubarak","year":"2017","journal-title":"Proceedings of the First Workshop on Abusive Language"},{"key":"B42","doi-asserted-by":"publisher","first-page":"72526","DOI":"10.1109\/ACCESS.2022.3188688","article-title":"Detecting Islamic radicalism Arabic tweets using natural language processing","volume":"10","author":"Mursi","year":"2022","journal-title":"IEEE Access"},{"key":"B43","doi-asserted-by":"publisher","first-page":"33","DOI":"10.1016\/j.ijinfomgt.2014.09.004","article-title":"Social media research: theories, constructs, and conceptual frameworks","volume":"35","author":"Ngai","year":"2015","journal-title":"Int. J. Inform. 
Manage"},{"key":"B44","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1007\/s43681-023-00281-w","article-title":"Merging public health and automated approaches to address online hate speech","volume":"12","author":"Nguyen","year":"2023","journal-title":"AI Ethics"},{"key":"B45","doi-asserted-by":"crossref","first-page":"247","DOI":"10.1007\/978-3-030-44289-7_24","article-title":"\u201cComparative performance of machine learning and deep learning algorithms for Arabic hate speech detection in OSNS,\u201d","volume-title":"Proceedings of the International Conference on Artificial Intelligence and Computer Vision (AICV2020)","author":"Omar","year":"2020"},{"key":"B46","first-page":"29","article-title":"\u201cUsing TF-IDF to determine word relevance in document queries,\u201d","volume-title":"Proceedings of the First Instructional Conference on Machine Learning","author":"Ramos","year":"2003"},{"key":"B47","first-page":"2268","article-title":"\u201cAn Arabic twitter corpus for subjectivity and sentiment analysis,\u201d","author":"Refaee","year":"2014","journal-title":"LREC"},{"key":"B48","doi-asserted-by":"crossref","first-page":"457","DOI":"10.1007\/978-981-16-0586-4_37","article-title":"\u201cHate speech detection in the Bengali language: A dataset and its baseline evaluation,\u201d","volume-title":"Proceedings of International Joint Conference on Advances in Computational Intelligence: IJCACI 2020","author":"Romim","year":"2021"},{"key":"B49","first-page":"253","article-title":"\u201cHate speech detection in social media for the Kurdish language,\u201d","volume-title":"The International Conference on Innovations in Computing Research","author":"Saeed","year":"2022"},{"key":"B50","doi-asserted-by":"publisher","first-page":"208","DOI":"10.1109\/ICCICC57084.2022.10101577","article-title":"\u201cArabic hate speech detection system based on Arabert,\u201d","author":"Salomon","year":"2022","journal-title":"2022 IEEE 21st International Conference on Cognitive 
Informatics &Cognitive Computing (ICCI* CC)"},{"key":"B51","doi-asserted-by":"publisher","first-page":"1","DOI":"10.18653\/v1\/W17-1101","article-title":"\u201cA survey on hate speech detection using natural language processing,\u201d","author":"Schmidt","year":"2017","journal-title":"Proceedings of the Fifth International Workshop on Natural Language Processing for Social Media"},{"key":"B52","doi-asserted-by":"publisher","first-page":"71","DOI":"10.7753\/IJCATR0502.1006","article-title":"Social media its impact with positive and negative aspects","volume":"5","author":"Siddiqui","year":"2016","journal-title":"Int. J. Comput. Appl. Technol. Res"},{"key":"B53","doi-asserted-by":"publisher","first-page":"296","DOI":"10.1016\/j.chb.2016.01.002","article-title":"To use or not to use? Social media in higher education in developing countries","volume":"58","author":"Sobaih","year":"2016","journal-title":"Comput. Hum. Behav"},{"key":"B54","doi-asserted-by":"crossref","first-page":"329","DOI":"10.7827\/TurkishStudies.54730","article-title":"Instances of hate discourse in Turkish and English","volume":"17","author":"Yal\u00e7\u0131nkaya","year":"2022","journal-title":"Turkish Stud. Lang. Liter"},{"key":"B55","doi-asserted-by":"publisher","first-page":"100250","DOI":"10.1016\/j.osnem.2023.100250","article-title":"Session-based cyberbullying detection in social media: a survey","volume":"36","author":"Yi","year":"2023","journal-title":"Online Soc. Netw. 
Media"},{"key":"B56","doi-asserted-by":"publisher","first-page":"201","DOI":"10.1186\/s40359-023-01243-x","article-title":"Pros cons: impacts of social media on mental health","volume":"11","author":"Zsila","year":"2023","journal-title":"BMC Psychol"}],"container-title":["Frontiers in Artificial Intelligence"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frai.2024.1345445\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,2,20]],"date-time":"2024-02-20T05:27:58Z","timestamp":1708406878000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frai.2024.1345445\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,2,20]]},"references-count":56,"alternative-id":["10.3389\/frai.2024.1345445"],"URL":"https:\/\/doi.org\/10.3389\/frai.2024.1345445","relation":{},"ISSN":["2624-8212"],"issn-type":[{"value":"2624-8212","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,2,20]]},"article-number":"1345445"}}