{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,21]],"date-time":"2026-05-21T11:06:12Z","timestamp":1779361572773,"version":"3.51.4"},"reference-count":47,"publisher":"MDPI AG","issue":"5","license":[{"start":{"date-parts":[[2026,5,21]],"date-time":"2026-05-21T00:00:00Z","timestamp":1779321600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["MAKE"],"abstract":"<jats:p>Multimodal Arabic sentiment analysis has gained increasing attention due to the growing volume of user-generated multimedia content. However, effectively integrating textual, acoustic, and visual modalities remains challenging because of modality imbalance and weak cross-modal alignment. This study proposes a Directional Semantic Enhancement approach with Gated Fusion to address these limitations. The objective is to explicitly model similarity-guided semantic transfer between modalities while adaptively regulating information flow during fusion. The proposed architecture consists of four main stages: modality encoding, directional semantic enhancement, gated fusion, and classification. Directional semantic interactions enable structured cross-modal knowledge exchange, while adaptive gating mechanisms balance original and enhanced representations to mitigate modality-specific noise. Extensive experiments are conducted on the Ar-MuSA benchmark dataset, which contains 8700 multimodal samples. The proposed approach achieves 89.89% accuracy and an F1-score of 0.8989 with a latent dimension of 1024, outperforming early fusion, late fusion, and recent state-of-the-art methods. The study highlights the importance of controlled cross-modal alignment and provides a scalable approach for robust multimodal sentiment understanding in Arabic multimedia environments.<\/jats:p>","DOI":"10.3390\/make8050139","type":"journal-article","created":{"date-parts":[[2026,5,21]],"date-time":"2026-05-21T09:29:58Z","timestamp":1779355798000},"page":"139","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["A Directional Semantic Enhancement Approach with Gated Fusion for Multimodal Arabic Sentiment Analysis"],"prefix":"10.3390","volume":"8","author":[{"ORCID":"https:\/\/orcid.org\/0009-0005-4245-5921","authenticated-orcid":false,"given":"Ayoub Ben","family":"Cheikhi","sequence":"first","affiliation":[{"name":"L3IA Laboratory, Faculty of Sciences Dhar El Mahraz, Sidi Mohamed Ben Abdellah University, Fez 30000, Morocco"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5816-0897","authenticated-orcid":false,"given":"El Habib","family":"Nfaoui","sequence":"additional","affiliation":[{"name":"L3IA Laboratory, Faculty of Sciences Dhar El Mahraz, Sidi Mohamed Ben Abdellah University, Fez 30000, Morocco"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-7020-8175","authenticated-orcid":false,"given":"Oumayma","family":"Elbiach","sequence":"additional","affiliation":[{"name":"L3IA Laboratory, Faculty of Sciences Dhar El Mahraz, Sidi Mohamed Ben Abdellah University, Fez 30000, Morocco"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2026,5,21]]},"reference":[{"key":"ref_1","unstructured":"El-Idrissi, R., Makhlouf, E., Sahrane, M.R., Elohounkpon, D.R., Cristancho, J.A.S., Reig, J.S., Agoun, J., Girard, J.P., de Prado, G., and Loudcher, S. (2025, January 4\u20135). Data, Archives and Archaeological Texts: Creation and Exploitation of a Semantic Data Lake for Archaeology in Catalonia. The DataLAC Project. Proceedings of the International Symposium Modeling the Past to Anticipate the Future, Paris, France."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Abibulaiev, A., Pukach, P., and Vovk, M. (2026). Context-Aware ML\/NLP Pipeline for Real-Time Anomaly Detection and Risk Assessment in Cloud API Traffic. Mach. Learn. Knowl. Extr., 8.","DOI":"10.3390\/make8010025"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Ben Cheikhi, A., and Nfaoui, E.H. (2025). Mathematics Problem Classification Based on Pretrained Language Models. Proceedings of the 2025 11th International Conference on Optimization and Applications (ICOA), IEEE.","DOI":"10.1109\/ICOA66896.2025.11236881"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"107362","DOI":"10.1016\/j.rineng.2025.107362","article-title":"Benchmarking large language models for adverse drug reaction extraction in social media and clinical texts","volume":"28","author":"Elbiach","year":"2025","journal-title":"Results Eng."},{"key":"ref_5","first-page":"100793","article-title":"BCLSA: Advancing Bangla Sentiment Analysis with Concept-Level Reasoning and Efficiency","volume":"22","author":"Ullah","year":"2025","journal-title":"Mach. Learn. Appl."},{"key":"ref_6","first-page":"100749","article-title":"Longitudinal abuse and sentiment analysis of Hollywood movie dialogues using language models","volume":"22","author":"Chandra","year":"2025","journal-title":"Mach. Learn. Appl."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Mironela, P., Iustin, P., Mihaela, P.C., Claudiu, P., and Daniel, G.L. (2024). Analysis of Youtube video comments with NLP methods. Proceedings of the 2024 16th International Conference on Electronics, Computers and Artificial Intelligence (ECAI), IEEE.","DOI":"10.1109\/ECAI61503.2024.10607575"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"130107","DOI":"10.1016\/j.neucom.2025.130107","article-title":"Multimodal sentiment analysis method based on image-text quantum transformer","volume":"637","author":"Wu","year":"2025","journal-title":"Neurocomputing"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"64556","DOI":"10.1109\/ACCESS.2025.3554665","article-title":"Multimodal sentiment analysis-a comprehensive survey from a fusion methods perspective","volume":"13","author":"Zhao","year":"2025","journal-title":"IEEE Access"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"111206","DOI":"10.1016\/j.asoc.2023.111206","article-title":"Progress, achievements, and challenges in multimodal sentiment analysis using deep learning: A survey","volume":"152","author":"Pandey","year":"2024","journal-title":"Appl. Soft Comput."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"306","DOI":"10.1016\/j.inffus.2023.02.028","article-title":"Multimodal sentiment analysis based on fusion methods: A survey","volume":"95","author":"Zhu","year":"2023","journal-title":"Inf. Fusion"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"69","DOI":"10.1080\/00051144.2025.2598897","article-title":"Enhanced sentiment analysis on social media using BERT and multimodal attention-based fusion","volume":"67","author":"Amirthasaravanan","year":"2026","journal-title":"Automatika"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Yakut, S., Tuten, Y.T., Caglar, E., and Aktas, M.S. (2026). A Multimodal Transformer-Based Framework for Emotion Analysis in Multilingual Video Content. Computers, 15.","DOI":"10.3390\/computers15020077"},{"key":"ref_14","unstructured":"Haouhat, A., Bellaouar, S., Nehar, A., Cherroun, H., and Abdelali, A. (2025). Arabic multimodal machine learning: Datasets, applications, approaches, and challenges. arXiv."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"1479","DOI":"10.1007\/s10462-023-10555-8","article-title":"A comprehensive survey on deep learning-based approaches for multimodal sentiment analysis","volume":"56","author":"Ghorbanali","year":"2023","journal-title":"Artif. Intell. Rev."},{"key":"ref_16","first-page":"896","article-title":"A Transformer-Based Approach for Multimodal Arabic Sentiment Analysis","volume":"17","author":"Nfaoui","year":"2026","journal-title":"Int. J. Adv. Comput. Sci. Appl."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"103682","DOI":"10.1016\/j.asej.2025.103682","article-title":"UniTextFusion: A low-resource framework for Arabic multimodal sentiment analysis using early fusion and LoRA-tuned language models","volume":"16","author":"Khaled","year":"2025","journal-title":"Ain Shams Eng. J."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Nguyen, C.D., Nguyen, T., Vu, D., and Tuan, L.A. (2023). Improving multimodal sentiment analysis: Supervised angular margin-based contrastive learning. Proceedings of the Findings of ACL: EMNLP, Association for Computational Linguistics.","DOI":"10.18653\/v1\/2023.findings-emnlp.980"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"101973","DOI":"10.1016\/j.inffus.2023.101973","article-title":"Modality translation-based multimodal sentiment analysis under uncertain missing modalities","volume":"101","author":"Liu","year":"2024","journal-title":"Inf. Fusion"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Paraskevopoulos, G., Georgiou, E., and Potamianos, A. (2022). MMLatch: Bottom-up top-down fusion for multimodal sentiment analysis. Proceedings of the ICASSP, IEEE.","DOI":"10.1109\/ICASSP43922.2022.9746418"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Christ, L., Amiriparian, S., Baird, A., Kathan, A., M\u00fcller, N., Klug, S., Gagne, C., Tzirakis, P., Stappen, L., and Me\u00dfner, E.M. (2023). The MuSe 2023 Multimodal Sentiment Analysis Challenge. Proceedings of the MuSe Workshop, Association for Computing Machinery.","DOI":"10.1145\/3551876.3554817"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Cheikhi, A.B., and Nfaoui, E.H. (2025). Multimodal Arabic Sarcasm Detection Using CNN and BiLSTM. Proceedings of the International Conference on Intelligent Systems, IEEE.","DOI":"10.1109\/SITA67914.2025.11273462"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"24713","DOI":"10.1007\/s00521-023-08366-7","article-title":"Multimodal sentiment system and method based on CRNN-SVM","volume":"35","author":"Zhao","year":"2023","journal-title":"Neural Comput. Appl."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Zadeh, A., Chen, M., Poria, S., Cambria, E., and Morency, L.P. (2017). Tensor Fusion Network for Multimodal Sentiment Analysis. arXiv.","DOI":"10.18653\/v1\/D17-1115"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Hori, C., Hori, T., Lee, T.-Y., Zhang, Z., Harsham, B., Hershey, J.R., Marks, T.K., and Sumi, K. (2017). Attention-based multimodal fusion for video description. Proceedings of the ICCV, IEEE.","DOI":"10.1109\/ICCV.2017.450"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"15311","DOI":"10.1007\/s10489-022-04266-w","article-title":"A mutual attention based multimodal fusion for fake news detection","volume":"53","author":"Guo","year":"2023","journal-title":"Appl. Intell."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Zerkouk, M., Mihoubi, M., and Chikhaoui, B. (2025). Contextual Attention-Based Multimodal Fusion of LLM and CNN for Sentiment Analysis. arXiv.","DOI":"10.21428\/594757db.9343ac26"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"356","DOI":"10.62762\/CJIF.2025.537775","article-title":"VBCSNet: A Hybrid Attention-Based Multimodal Framework with Structured Self-Attention for Sentiment Classification","volume":"2","author":"Liu","year":"2025","journal-title":"Chin. J. Inf. Fusion"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"30733","DOI":"10.1109\/ACCESS.2026.3667314","article-title":"GCMA-Net: A Gated Cross-Modal Attention Network for Arabic Multimodal Sentiment Analysis","volume":"14","author":"Cheikhi","year":"2026","journal-title":"IEEE Access"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"39","DOI":"10.18178\/joig.6.1.39-43","article-title":"Multimodal sentiment analysis of Arabic videos","volume":"6","author":"Najadat","year":"2018","journal-title":"J. Image Graph."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Agarwal, A., Yadav, A., and Vishwakarma, D.K. (2019). Multimodal Sentiment Analysis via RNN variants. Proceedings of the IEEE\/ACIS International Conference on Big Data, Cloud Computing, and Data Science, IEEE.","DOI":"10.1109\/BCD.2019.8885108"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"136843","DOI":"10.1109\/ACCESS.2020.3011977","article-title":"Enhanced video analytics for sentiment analysis based on fusing textual, auditory and visual information","volume":"8","year":"2020","journal-title":"IEEE Access"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Sherif, A., and Sabty, C. (2024). Sentiment analysis for Egyptian Arabic-English code-switched data. Proceedings of the International Conference on Speech and Computer, IEEE.","DOI":"10.1007\/978-3-031-78014-1_5"},{"key":"ref_34","first-page":"30","article-title":"Ar-MuSA: A Multimodal Benchmark Dataset and Evaluation Framework for Arabic Sentiment Analysis","volume":"18","author":"Khaled","year":"2025","journal-title":"Int. J. Intell. Eng. Syst."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"132767","DOI":"10.1016\/j.neucom.2026.132767","article-title":"Bi-directional similarity enhancement and adjustment hashing for unsupervised cross-modal retrieval","volume":"672","author":"Yao","year":"2026","journal-title":"Neurocomputing"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"2008","DOI":"10.1109\/TIP.2018.2882225","article-title":"Bi-directional spatial-semantic attention networks for image-text matching","volume":"28","author":"Huang","year":"2018","journal-title":"IEEE Trans. Image Process."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Di, L., Zhang, B., Wang, Y., and Zhang, W. (2025). Frequency Meets Semantics: Text-Visual Fusion with Directional Spectral Enhancement for Salient Object Detection in Optical Remote Sensing Images. Proceedings of the 33rd ACM International Conference on Multimedia, Association for Computing Machinery.","DOI":"10.1145\/3746027.3755562"},{"key":"ref_38","first-page":"4704014","article-title":"Direction-oriented visual\u2013semantic embedding model for remote sensing image\u2013text retrieval","volume":"62","author":"Ma","year":"2024","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"3983","DOI":"10.1109\/TCSVT.2024.3521646","article-title":"Bi-direction label-guided semantic enhancement for cross-modal hashing","volume":"35","author":"Zhu","year":"2024","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"4705714","DOI":"10.1109\/TGRS.2025.3587426","article-title":"Directional Semantic Enhanced Visual Grounding for Remote Sensing Images","volume":"63","author":"Guo","year":"2025","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_41","unstructured":"Hendrycks, D. (2016). Gaussian Error Linear Units (Gelus). arXiv."},{"key":"ref_42","unstructured":"Ioffe, S., and Szegedy, C. (2015, January 7\u20139). Batch normalization: Accelerating deep network training by reducing internal covariate shift. Proceedings of the International Conference on Machine Learning. PMLR 37, Lille, France."},{"key":"ref_43","first-page":"1929","article-title":"Dropout: A simple way to prevent neural networks from overfitting","volume":"15","author":"Srivastava","year":"2014","journal-title":"J. Mach. Learn. Res."},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Hu, J., Shen, L., and Sun, G. (2018). Squeeze-and-excitation networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, IEEE.","DOI":"10.1109\/CVPR.2018.00745"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Abdul-Mageed, M., Elmadany, A., and Nagoudi, E.M.B. (2021). ARBERT & MARBERT: Deep Bidirectional Transformers for Arabic. arXiv.","DOI":"10.18653\/v1\/2021.acl-long.551"},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Hsu, W.N., Bolte, B., Tsai, Y.H.H., Lakhotia, K., Salakhutdinov, R., and Mohamed, A. (2021). HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units. arXiv.","DOI":"10.1109\/TASLP.2021.3122291"},{"key":"ref_47","unstructured":"Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., and Gelly, S. (2021). An Image is Worth 16 \u00d7 16 Words: Transformers for Image Recognition at Scale. arXiv."}],"container-title":["Machine Learning and Knowledge Extraction"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-4990\/8\/5\/139\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,5,21]],"date-time":"2026-05-21T10:22:52Z","timestamp":1779358972000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-4990\/8\/5\/139"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,5,21]]},"references-count":47,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2026,5]]}},"alternative-id":["make8050139"],"URL":"https:\/\/doi.org\/10.3390\/make8050139","relation":{},"ISSN":["2504-4990"],"issn-type":[{"value":"2504-4990","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,5,21]]}}}