{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,27]],"date-time":"2026-05-27T18:12:45Z","timestamp":1779905565286,"version":"3.53.1"},"reference-count":47,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2026,2,3]],"date-time":"2026-02-03T00:00:00Z","timestamp":1770076800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"European Union-NextGenerationEU, through the National Recovery and Resilience Plan of the Republic of Bulgaria","award":["BG-RRP-2.013-0001"],"award-info":[{"award-number":["BG-RRP-2.013-0001"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["BDCC"],"abstract":"<jats:p>Sentiment analysis of Arabic text, particularly on social media platforms, presents a formidable set of unique challenges that stem from the language\u2019s complex morphology, its numerous dialectal variations, and the frequent and nuanced use of emojis to convey emotional context. This paper presents SiAraSent, a hybrid framework that integrates traditional text representations, emoji-aware features, and deep contextual embeddings based on Arabic transformers. Starting from a strong and fully interpretable baseline built on Term Frequency\u2013Inverse Document Frequency (TF\u2013IDF)-weighted character and word N-grams combined with emoji embeddings, we progressively incorporate SinaTools for linguistically informed preprocessing and AraBERT for contextualized encodings. The framework is evaluated on a large-scale dataset of 58,751 Arabic tweets labeled for sentiment polarity. 
Our design works within four experimental configurations: (1) a baseline traditional machine learning architecture that employs TF-IDF, N-grams, and emoji features with a Support Vector Machine (SVM) classifier; (2) a Large Language Model (LLM) feature extraction approach that leverages deep contextual embeddings from the pre-trained AraBERT model; (3) a novel hybrid fusion model that concatenates traditional morphological features, AraBERT embeddings, and emoji-based features into a high-dimensional vector; and (4) a fully fine-tuned AraBERT model specifically adapted for the sentiment classification task. Our experiments demonstrate the remarkable efficacy of our proposed framework, with the fine-tuned AraBERT architecture achieving an accuracy of 93.45%, a significant 10.89% improvement over the best traditional baseline.<\/jats:p>","DOI":"10.3390\/bdcc10020049","type":"journal-article","created":{"date-parts":[[2026,2,3]],"date-time":"2026-02-03T16:46:28Z","timestamp":1770137188000},"page":"49","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["SiAraSent: From Features to Deep Transformers for Large-Scale Arabic Sentiment Analysis"],"prefix":"10.3390","volume":"10","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3600-2764","authenticated-orcid":false,"given":"Omar","family":"Almousa","sequence":"first","affiliation":[{"name":"Department of Computer Science, Jordan University of Science and Technology, Irbid 22110, Jordan"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4248-8732","authenticated-orcid":false,"given":"Yahya","family":"Tashtoush","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Jordan University of Science and Technology, 
Jordan"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1506-7924","authenticated-orcid":false,"given":"Anas","family":"AlSobeh","sequence":"additional","affiliation":[{"name":"Department of Information Systems & Technology, Utah Valley University, Orem, UT 84058, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8538-6277","authenticated-orcid":false,"given":"Plamen","family":"Zahariev","sequence":"additional","affiliation":[{"name":"Department of Telecommunications, University of Ruse \u201cAngel Kanchev\u201d, 7017 Ruse, Bulgaria"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8346-7148","authenticated-orcid":false,"given":"Omar","family":"Darwish","sequence":"additional","affiliation":[{"name":"Information Security and Applied Computing, Eastern Michigan University, Ypsilanti, MI 48197, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2026,2,3]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Ashbaugh, L., and Zhang, Y. (2024). A Comparative Study of Sentiment Analysis on Customer Reviews Using Machine Learning and Deep Learning. Computers, 13.","DOI":"10.20944\/preprints202411.0741.v1"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Alotaibi, A., and Nadeem, F. (2024). Leveraging Social Media and Deep Learning for Sentiment Analysis for Smart Governance: A Case Study of Public Reactions to Educational Reforms in Saudi Arabia. Computers, 13.","DOI":"10.3390\/computers13110280"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"5731","DOI":"10.1007\/s10462-022-10144-1","article-title":"A survey on sentiment analysis methods, applications, and challenges","volume":"55","author":"Wankhade","year":"2022","journal-title":"Artif. Intell. 
Rev."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Dubey, P., Dubey, P., and Bokoro, P.N. (2025). Unpacking Sarcasm: A Contextual and Transformer-Based Approach for Improved Detection. Computers, 14.","DOI":"10.3390\/computers14030095"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"6266","DOI":"10.1016\/j.eswa.2013.05.057","article-title":"Twitter brand sentiment analysis: A hybrid system using n-gram analysis and dynamic artificial neural network","volume":"40","author":"Ghiassi","year":"2021","journal-title":"Expert Syst. Appl."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"526","DOI":"10.1007\/s10791-008-9070-z","article-title":"A machine learning approach to sentiment analysis in multilingual Web texts","volume":"12","author":"Boiy","year":"2009","journal-title":"Inf. Retr."},{"key":"ref_7","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, \u0141., and Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, NIPS Foundation."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Thakur, N., Cui, S., Khanna, K., Knieling, V., Duggal, Y.N., and Shao, M. (2023). Investigation of the Gender-Specific Discourse about Online Learning during COVID-19 on Twitter Using Sentiment Analysis, Subjectivity Analysis, and Toxicity Analysis. Computers, 12.","DOI":"10.20944\/preprints202310.0157.v1"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"e2644","DOI":"10.7717\/peerj-cs.2644","article-title":"Transformer based ensemble model for dialectal Arabic sentiment classification","volume":"11","author":"Mansour","year":"2025","journal-title":"PeerJ Comput. 
Sci."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"14","DOI":"10.1016\/j.knosys.2015.06.015","article-title":"A survey on opinion mining and sentiment analysis: Tasks, approaches and applications","volume":"89","author":"Kumar","year":"2015","journal-title":"Knowl.-Based Syst."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"285","DOI":"10.1007\/s10590-011-9087-8","article-title":"Nizar Y. Habash, Introduction to Arabic natural language processing (Synthesis lectures on human language technologies)","volume":"24","author":"Shaalan","year":"2010","journal-title":"Mach. Transl."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/1644879.1644881","article-title":"Arabic Natural Language Processing: Challenges and Solutions","volume":"8","author":"Farghaly","year":"2009","journal-title":"ACM Trans. Asian Lang. Inf. Process."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Fang, Y., Xu, C., Guan, S., Yan, N., and Mei, Y. (2024, January 15\u201316). Advancing Arabic sentiment analysis: ArSen benchmark and the improved fuzzy deep hybrid network. Proceedings of the 28th Conference on Computational Natural Language Learning, Miami, FL, USA.","DOI":"10.18653\/v1\/2024.conll-1.39"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Al-Twairesh, N., Al-Khalifa, H., and Al-Salman, A. (2014, January 10\u201313). Subjectivity and sentiment analysis of Arabic: Trends and challenges. Proceedings of the 2014 IEEE\/ACS 11th International Conference on Computer Systems and Applications (AICCSA), Doha, Qatar.","DOI":"10.1109\/AICCSA.2014.7073192"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"239","DOI":"10.1561\/1500000031","article-title":"Arabic information retrieval","volume":"7","author":"Darwish","year":"2014","journal-title":"Found. Trends Inf. 
Retr."},{"key":"ref_16","first-page":"2.1","article-title":"Customizing Sentiment Classifiers to New Domains: A Case Study","volume":"Volume 1","author":"Aue","year":"2005","journal-title":"Proceedings of the Recent Advances in Natural Language Processing (RANLP)"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"5268","DOI":"10.1016\/j.eswa.2010.10.031","article-title":"Comparing machine learning classifiers in potential distribution modelling","volume":"38","author":"Lorena","year":"2011","journal-title":"Expert Syst. Appl."},{"key":"ref_18","unstructured":"Sidorov, G. (2013). Non-Linear Construction of N-Grams in Computational Linguistics, Sociedad Mexicana de Inteligencia Artificial."},{"key":"ref_19","unstructured":"Lajili, I., Ladhari, T., and Babai, Z. (2016, January 1\u20134). Adaptive machine learning classifiers for the class imbalance problem in ABC inventory classification. Proceedings of the 6th International Conference on Information Systems, Logistics and Supply Chain (ILS), Bordeaux, France."},{"key":"ref_20","unstructured":"Ladicky, L., and Torr, P.H. (July, January 28). Locally linear support vector machines. Proceedings of the 2011 28th International Conference on Machine Learning, Bellevue, WA, USA. Available online: https:\/\/icml.cc\/Conferences\/2011\/papers\/508_icmlpaper.pdf."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"88","DOI":"10.3991\/ijim.v14i07.10600","article-title":"Sentiment analysis of impact of technology on employment from text on Twitter","volume":"14","author":"Qaiser","year":"2020","journal-title":"Int. J. Interact. Mob. Technol."},{"key":"ref_22","unstructured":"Devlin, J., Chang, M.W., Lee, K., and Toutanova, K. (2019, January 2\u20137). Bert: Pre-training of deep bidirectional transformers for language understanding. 
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, MN, USA."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Abdul-Mageed, M., El-Haj, M., and Nagoudi, E.M.B. (2021, January 7\u201311). MARBERT: A Deep Bidirectional Transformer for Arabic Dialect Identification. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Punta Cana, Dominican Republic.","DOI":"10.18653\/v1\/2021.acl-long.551"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"AlSobeh, A., Gwarzo, Z., and Shatnawi, A. (2025, January 1\u20134). ShadowPlay: Engineering Defenses Against Role-Based Prompt Injection and Dependency Hallucination in LLM-Powered Development. Proceedings of the 2025 International Conference on Cybersecurity and AI-Based Systems (Cyber-AI), Varna, Bulgaria.","DOI":"10.1109\/Cyber-AI66431.2025.11233258"},{"key":"ref_25","unstructured":"Hammouda, T., Jarrar, M., and Khalilia, M. (2024, January 21\u201324). SinaTools: Open Source Toolkit for Arabic Natural Language Understanding. Proceedings of the 2024 AI in Computational Linguistics (ACLing 2024), Dubai, United Arab Emirates."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"AlSobeh, A., Shatnawi, A., and Magableh, A. (2025). AspectFL: Aspect-Oriented Programming for Trustworthy and Compliant Federated Learning Systems. Information, 16.","DOI":"10.3390\/info16121048"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Al-Shawakfa, E.M., Alsobeh, A.M., Omari, S., and Shatnawi, A. (2025). RADAR#: An ensemble approach for radicalization detection in Arabic social media using hybrid deep learning and transformer models. Information, 16.","DOI":"10.3390\/info16070522"},{"key":"ref_28","unstructured":"Saad, M. (2025, November 27). Arabic Sentiment Twitter Corpus. 
Available online: https:\/\/www.kaggle.com\/datasets\/mksaad\/arabic-sentiment-twitter-corpus."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Hussien, W.A., Tashtoush, Y.M., Al-Ayyoub, M., and Al-Kabi, M.N. (2016, January 13\u201314). Are emoticons good enough to train emotion classifiers of Arabic tweets?. Proceedings of the 2016 7th International Conference on Computer Science and Information Technology (CSIT), Amman, Jordan.","DOI":"10.1109\/CSIT.2016.7549459"},{"key":"ref_30","first-page":"8921","article-title":"Arabic sentiment analysis of COVID 19 tweets: Dataset collection, pre processing and benchmarking","volume":"13","author":"Alrasheed","year":"2023","journal-title":"Appl. Sci."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"224","DOI":"10.3390\/make1010014","article-title":"Analysis of machine learning algorithms for opinion mining in different domains","volume":"1","author":"Gamal","year":"2019","journal-title":"Mach. Learn. Knowl. Extr."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"81793","DOI":"10.1109\/ACCESS.2024.3382836","article-title":"Emo-SL framework: Emoji sentiment lexicon using text-based features and machine learning for sentiment analysis","volume":"12","author":"Alfreihat","year":"2024","journal-title":"IEEE Access"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"72","DOI":"10.1016\/j.joi.2008.11.005","article-title":"A machine learning approach for Arabic text classification using N-gram frequency statistics","volume":"3","author":"Khreisat","year":"2009","journal-title":"J. Informetr."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Roul, R.K., Sahoo, J.K., and Arora, K. (2017, January 15\u201317). Modified TF-IDF term weighting strategies for text categorization. 
Proceedings of the 2017 14th IEEE India Council International Conference (INDICON), Roorkee, India.","DOI":"10.1109\/INDICON.2017.8487593"},{"key":"ref_35","unstructured":"O\u2019Keefe, T., and Koprinska, I. (2009, January 4). Feature selection and weighting methods in sentiment analysis. Proceedings of the 14th Australasian Document Computing Symposium (ADCS), Sydney, Australia."},{"key":"ref_36","unstructured":"Van Zaanen, M., and Kanters, P. (2010, January 9\u201313). Automatic mood classification using TFIDF based on lyrics. Proceedings of the 11th International Society for Music Information Retrieval Conference (ISMIR), Utrecht, The Netherlands."},{"key":"ref_37","first-page":"1","article-title":"Sentiment analysis of Arabic tweets: A survey","volume":"17","author":"Heikal","year":"2018","journal-title":"ACM Trans. Asian Low-Resour. Lang. Inf. Process. (TALLIP)"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Khamaiseh, S.Y., Chiacchira, S., Alsobeh, A., and Aljadayah, A. (2025, January 8\u201311). M u AE: A Mutation Testing Framework for Evaluating Autoencoders. Proceedings of the 2025 IEEE 49th Annual Computers, Software, and Applications Conference (COMPSAC), Toronto, ON, Canada.","DOI":"10.1109\/COMPSAC65507.2025.00131"},{"key":"ref_39","unstructured":"Antoun, W., Baly, F., and Hajj, H. (2020). AraBERT: Transformer-based Model for Arabic Language Understanding. arXiv."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Izimi, A., and Battou, A. (2024, January 23\u201324). Transformer-Based Models for Arabic Text Sentiment Analysis: A Systematic Literature Review. Proceedings of the 2024 Sixth International Conference on Intelligent Computing in Data Sciences (ICDS), Marrakech, Morocco.","DOI":"10.1109\/ICDS62089.2024.10756375"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Shatnawi, A., AlSobeh, A., Alsmadi, I., and Al-Ahmad, B. (2025, January 19\u201322). 
Tailored large language models for spam detection: From model customization to benchmarking effectiveness. Proceedings of the 2025 5th Intelligent Cybersecurity Conference (ICSC), Tampa, FL, USA.","DOI":"10.1109\/ICSC65596.2025.11140025"},{"key":"ref_42","unstructured":"Pandey, P. (2025, November 27). Simplifying Sentiment Analysis Using VADER in Python (on Social Media Text). Medium, Available online: https:\/\/medium.com\/analytics-vidhya\/simplifying-social-media-sentiment-analysis-using-vader-in-python-f9e6ec6fc52f."},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"1311","DOI":"10.32604\/iasc.2022.025861","article-title":"Automatic Annotation Performance of TextBlob and VADER on COVID Vaccination Dataset","volume":"34","author":"Alenzi","year":"2022","journal-title":"Intell. Autom. Soft Comput."},{"key":"ref_44","first-page":"387","article-title":"SAMAR: A system for subjectivity and sentiment analysis of Arabic social media","volume":"6","author":"Salameh","year":"2015","journal-title":"IEEE Trans. Affect. Comput."},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"171","DOI":"10.1007\/s13042-014-0264-y","article-title":"Assisting cluster coherency via n-grams and clustering as a tool to deal with the new user problem","volume":"7","author":"Bouras","year":"2016","journal-title":"Int. J. Mach. Learn. Cybern."},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Al-Azani, S., and El-Alfy, E.S.M. (2018, January 3\u20135). Combining emojis with Arabic textual features for sentiment classification. Proceedings of the 2018 9th International Conference on Information and Communication Systems (ICICS), Irbid, Jordan.","DOI":"10.1109\/IACS.2018.8355456"},{"key":"ref_47","first-page":"2825","article-title":"Scikit-learn: Machine learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J. Mach. Learn. 
Res."}],"container-title":["Big Data and Cognitive Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-2289\/10\/2\/49\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,2,3]],"date-time":"2026-02-03T17:03:21Z","timestamp":1770138201000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-2289\/10\/2\/49"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,2,3]]},"references-count":47,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2026,2]]}},"alternative-id":["bdcc10020049"],"URL":"https:\/\/doi.org\/10.3390\/bdcc10020049","relation":{},"ISSN":["2504-2289"],"issn-type":[{"value":"2504-2289","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,2,3]]}}}