{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,3]],"date-time":"2025-12-03T08:20:04Z","timestamp":1764750004437,"version":"3.46.0"},"reference-count":41,"publisher":"PeerJ","license":[{"start":{"date-parts":[[2025,12,3]],"date-time":"2025-12-03T00:00:00Z","timestamp":1764720000000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"abstract":"<jats:p>Multi-modal sentiment analysis lies at the intersection of natural-language processing and multimedia analysis, aiming to unravel complex emotional expressions in multimedia content. This article presents a novel approach to Urdu multi-modal sentiment analysis, focusing on the integration of textual, acoustic, and visual cues to predict sentiment. Our methodology involves a systematic approach to feature extraction from each modality, followed by individual and fused modality sentiment classification. We employ a convolutional neural network (CNN) model integrated with 300-dimensional Embedding FastText to capture meaningful text representations in the textual modality. The acoustic modality utilizes the Librosa library for audio feature extraction, encompassing Mel-frequency cepstral coefficients (MFCCs), intensity, pitch, and loudness. We utilize three-dimensional convolutional neural networks (3D-CNNs) to extract spatial and temporal features from videos for the visual modality. We explore feature- and decision-level fusion techniques to combine the strengths of individual modalities. The results highlight the effectiveness of the fused approach, achieving an accuracy of 91.18%. 
Our findings underscore the importance of leveraging multiple modalities for comprehensive sentiment analysis, opening avenues for applications in social media sentiment assessment, content recommendation, and market sentiment evaluation. The proposed framework not only contributes to the advancement of Urdu sentiment analysis but also serves as a stepping stone for further research in multilingual and cross-modal sentiment analysis, thereby enriching our understanding of emotions expressed in multimedia content.<\/jats:p>","DOI":"10.7717\/peerj-cs.3369","type":"journal-article","created":{"date-parts":[[2025,12,3]],"date-time":"2025-12-03T08:15:21Z","timestamp":1764749721000},"page":"e3369","source":"Crossref","is-referenced-by-count":0,"title":["Multi-modal sentiment analysis framework for Urdu language opinion videos"],"prefix":"10.7717","volume":"11","author":[{"given":"Ghulam-Rabbani","family":"Butt","sequence":"first","affiliation":[{"name":"Department of Software Engineering, University of Engineering and Technology Taxila, Taxila, Pakistan"},{"name":"Department of Software Engineering, University of Azad Jammu and Kashmir, Muzaffarabad, A.K, Pakistan"}]},{"given":"Huma","family":"Qayyum","sequence":"additional","affiliation":[{"name":"Department of Software Engineering, University of Engineering and Technology Taxila, Taxila, Pakistan"}]},{"given":"Muhammad","family":"Majid","sequence":"additional","affiliation":[{"name":"Department of Computer Engineering, University of Engineering and Technology Taxila, Taxila, Pakistan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8242-0716","authenticated-orcid":true,"given":"Ikram","family":"Syed","sequence":"additional","affiliation":[{"name":"Department of Information and Communication Engineering, Hankuk University of Foreign Studies, Yongin, Gyeonggi-do, Republic of Korea"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5406-0389","authenticated-orcid":true,"given":"Syed","family":"Sajid 
Ullah","sequence":"additional","affiliation":[{"name":"Information and Communication Technology, University of Agder, Grimstad, Norway"}]}],"member":"4443","published-online":{"date-parts":[[2025,12,3]]},"reference":[{"key":"10.7717\/peerj-cs.3369\/ref-1","first-page":"19","article-title":"Multimodal sentiment analysis via RNN variants","author":"Agarwal","year":"2019"},{"key":"10.7717\/peerj-cs.3369\/ref-2","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1080\/0144929X.2022.2156387","article-title":"Machine learning techniques for emotion detection and sentiment analysis: current state, challenges, and future directions","volume":"43","author":"Alslaity","year":"2022","journal-title":"Behaviour & Information Technology"},{"key":"10.7717\/peerj-cs.3369\/ref-3","doi-asserted-by":"publisher","first-page":"110494","DOI":"10.1016\/j.asoc.2023.110494","article-title":"Attention-based multimodal sentiment analysis and emotion recognition using deep neural networks","volume":"144","author":"Aslam","year":"2023","journal-title":"Applied Soft Computing"},{"key":"10.7717\/peerj-cs.3369\/ref-4","first-page":"421","article-title":"Speech emotion recognition system with librosa","author":"Babu","year":"2021"},{"key":"10.7717\/peerj-cs.3369\/ref-5","first-page":"108","article-title":"Sentic blending: scalable multimodal fusion for the continuous interpretation of semantics and sentics","author":"Cambria","year":"2013"},{"issue":"1","key":"10.7717\/peerj-cs.3369\/ref-6","doi-asserted-by":"publisher","first-page":"706","DOI":"10.1109\/COMST.2023.3308717","article-title":"Networking architecture and key supporting technologies for human digital twin in personalized healthcare: a comprehensive survey","volume":"26","author":"Chen","year":"2023","journal-title":"IEEE Communications Surveys & Tutorials"},{"issue":"3","key":"10.7717\/peerj-cs.3369\/ref-7","doi-asserted-by":"publisher","first-page":"377","DOI":"10.1016\/j.neucom.2021.02.020","article-title":"A novel 
context-aware multimodal framework for persian sentiment analysis","volume":"457","author":"Dashtipour","year":"2021","journal-title":"Neurocomputing"},{"key":"10.7717\/peerj-cs.3369\/ref-8","doi-asserted-by":"publisher","first-page":"411","DOI":"10.1002\/9781118910566.ch16","article-title":"Sematic audiovisual data fusion for automatic emotion recognition","volume-title":"Emotion Recognition: A Pattern Analysis Approach","author":"Datcu","year":"2014"},{"key":"10.7717\/peerj-cs.3369\/ref-9","first-page":"397","article-title":"Facial emotion recognition using multi-modal information","volume":"1","author":"De Silva","year":"1997"},{"issue":"2","key":"10.7717\/peerj-cs.3369\/ref-10","first-page":"140","article-title":"Universal facial expressions in emotion","volume":"15","author":"Ekman","year":"1973","journal-title":"Studia Psychologica"},{"issue":"5","key":"10.7717\/peerj-cs.3369\/ref-11","doi-asserted-by":"publisher","first-page":"378","DOI":"10.1037\/h0031619","article-title":"Measuring nominal scale agreement among many raters","volume":"76","author":"Fleiss","year":"1971","journal-title":"Psychological Bulletin"},{"key":"10.7717\/peerj-cs.3369\/ref-12","doi-asserted-by":"publisher","first-page":"424","DOI":"10.1016\/j.inffus.2022.09.025","article-title":"Multimodal sentiment analysis: a systematic review of history, datasets, multimodal fusion methods, applications, challenges and future directions","volume":"91","author":"Gandhi","year":"2023","journal-title":"Information Fusion"},{"issue":"2","key":"10.7717\/peerj-cs.3369\/ref-13","doi-asserted-by":"publisher","first-page":"102946","DOI":"10.1016\/j.bspc.2021.102946","article-title":"Sentiment analysis in non-fixed length audios using a fully convolutional neural network","volume":"69","author":"Garc\u00eda-Ord\u00e1s","year":"2021","journal-title":"Biomedical Signal Processing and 
Control"},{"issue":"8","key":"10.7717\/peerj-cs.3369\/ref-14","doi-asserted-by":"publisher","first-page":"131","DOI":"10.1016\/j.procs.2019.01.202","article-title":"Deep learning-based sentiment analysis for Roman Urdu text","volume":"147","author":"Ghulam","year":"2019","journal-title":"Procedia Computer Science"},{"key":"10.7717\/peerj-cs.3369\/ref-15","first-page":"1","article-title":"Deep learning driven multimodal fusion for automated deception detection","author":"Gogate","year":"2017"},{"key":"10.7717\/peerj-cs.3369\/ref-16","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1607.01759","article-title":"Bag of tricks for efficient text classification","author":"Joulin","year":"2016"},{"issue":"1","key":"10.7717\/peerj-cs.3369\/ref-17","doi-asserted-by":"publisher","first-page":"5436","DOI":"10.1038\/s41598-022-09381-9","article-title":"Multi-class sentiment analysis of Urdu text using multilingual BERT","volume":"12","author":"Khan","year":"2022","journal-title":"Scientific Reports"},{"key":"10.7717\/peerj-cs.3369\/ref-18","doi-asserted-by":"publisher","first-page":"97803","DOI":"10.1109\/access.2021.3093078","article-title":"Urdu sentiment analysis with deep learning methods","volume":"9","author":"Khan","year":"2021","journal-title":"IEEE Access"},{"key":"10.7717\/peerj-cs.3369\/ref-19","first-page":"1","article-title":"Urdu sentiment corpus (v1. 
0): linguistic exploration and visualization of labeled dataset for Urdu sentiment analysis","author":"Khan","year":"2020"},{"issue":"1","key":"10.7717\/peerj-cs.3369\/ref-20","doi-asserted-by":"publisher","first-page":"53","DOI":"10.1016\/j.eij.2020.04.003","article-title":"A survey on sentiment analysis in Urdu: a resource-poor language","volume":"22","author":"Khattak","year":"2021","journal-title":"Egyptian Informatics Journal"},{"issue":"11","key":"10.7717\/peerj-cs.3369\/ref-21","doi-asserted-by":"publisher","first-page":"2347","DOI":"10.3390\/app9112347","article-title":"Sentiment classification using convolutional neural networks","volume":"9","author":"Kim","year":"2019","journal-title":"Applied Sciences"},{"key":"10.7717\/peerj-cs.3369\/ref-22","doi-asserted-by":"publisher","first-page":"24","DOI":"10.1016\/j.dss.2018.09.002","article-title":"Deep learning for affective computing: text-based emotion recognition in decision support","volume":"115","author":"Kratzwald","year":"2018","journal-title":"Decision Support Systems"},{"issue":"20","key":"10.7717\/peerj-cs.3369\/ref-23","doi-asserted-by":"publisher","first-page":"10344","DOI":"10.3390\/app122010344","article-title":"Roman Urdu sentiment analysis using transfer learning","volume":"12","author":"Li","year":"2022","journal-title":"Applied Sciences"},{"issue":"02","key":"10.7717\/peerj-cs.3369\/ref-24","doi-asserted-by":"publisher","first-page":"e1414","DOI":"10.7717\/peerj-cs.1414","article-title":"Emotion recognition of social media users based on deep learning","volume":"9","author":"Li","year":"2023","journal-title":"PeerJ Computer Science"},{"issue":"6","key":"10.7717\/peerj-cs.3369\/ref-25","doi-asserted-by":"publisher","first-page":"e1032","DOI":"10.7717\/peerj-cs.1032","article-title":"Sentiment analysis techniques, challenges, and opportunities: Urdu language-based analytical study","volume":"8","author":"Liaqat","year":"2022","journal-title":"PeerJ Computer 
Science"},{"key":"10.7717\/peerj-cs.3369\/ref-26","first-page":"627","article-title":"Sentiment analysis and subjectivity","volume-title":"Handbook of Natural Language Processing","author":"Liu","year":"2010","edition":"2"},{"key":"10.7717\/peerj-cs.3369\/ref-27","first-page":"80","article-title":"Audio sentiment analysis by heterogeneous signal features learned from utterance-based parallel neural network","author":"Luo","year":"2019"},{"issue":"4","key":"10.7717\/peerj-cs.3369\/ref-28","doi-asserted-by":"publisher","first-page":"102233","DOI":"10.1016\/j.ipm.2020.102233","article-title":"Deep sentiments in roman Urdu text using recurrent convolutional neural network model","volume":"57","author":"Mahmood","year":"2020","journal-title":"Information Processing & Management"},{"key":"10.7717\/peerj-cs.3369\/ref-29","doi-asserted-by":"crossref","first-page":"163","DOI":"10.21015\/vtse.v10i2.981","article-title":"Urdu sentiment analysis: feature extraction, taxonomy, and challenges","volume":"10","author":"Mashooq","year":"2022","journal-title":"VFAST Transactions on Software Engineering"},{"key":"10.7717\/peerj-cs.3369\/ref-30","doi-asserted-by":"publisher","first-page":"18","DOI":"10.25080\/Majora-7b98e3ed-003","article-title":"librosa: audio and music signal analysis in Python","volume-title":"Proceedings of the 14th Python in Science Conference","author":"McFee","year":"2015"},{"issue":"6","key":"10.7717\/peerj-cs.3369\/ref-31","doi-asserted-by":"publisher","first-page":"102368","DOI":"10.1016\/j.ipm.2020.102368","article-title":"An unsupervised lexical normalization for Roman Hindi and Urdu sentiment analysis","volume":"57","author":"Mehmood","year":"2020","journal-title":"Information Processing & Management"},{"issue":"5","key":"10.7717\/peerj-cs.3369\/ref-32","doi-asserted-by":"publisher","first-page":"1027","DOI":"10.3390\/sym15051027","article-title":"Innovations in urdu sentiment analysis using machine and deep learning techniques for two-class 
classification of symmetric datasets","volume":"15","author":"Muhammad","year":"2023","journal-title":"Symmetry"},{"issue":"8","key":"10.7717\/peerj-cs.3369\/ref-33","doi-asserted-by":"publisher","first-page":"2173","DOI":"10.1016\/j.tele.2018.08.003","article-title":"Lexicon-based approach outperforms supervised machine learning approach for Urdu sentiment analysis in multiple domains","volume":"35","author":"Mukhtar","year":"2018","journal-title":"Telematics and Informatics"},{"issue":"6","key":"10.7717\/peerj-cs.3369\/ref-34","doi-asserted-by":"publisher","first-page":"102383","DOI":"10.1016\/j.ipm.2020.102383","article-title":"Extractive text summarization models for Urdu language","volume":"57","author":"Nawaz","year":"2020","journal-title":"Information Processing & Management"},{"issue":"1","key":"10.7717\/peerj-cs.3369\/ref-35","doi-asserted-by":"publisher","first-page":"50","DOI":"10.1016\/j.neucom.2015.01.095","article-title":"Fusing audio, visual and textual clues for sentiment analysis from multimodal content","volume":"174","author":"Poria","year":"2016","journal-title":"Neurocomputing"},{"key":"10.7717\/peerj-cs.3369\/ref-36","first-page":"973","article-title":"Utterance-level multimodal sentiment analysis","author":"P\u00e9rez-Rosas","year":"2013"},{"key":"10.7717\/peerj-cs.3369\/ref-37","doi-asserted-by":"publisher","first-page":"153072\u2013153082","DOI":"10.1109\/access.2021.3122025","article-title":"Urdu sentiment analysis via multimodal data mining based on deep learning algorithms","volume":"9","author":"Sehar","year":"2021","journal-title":"IEEE Access"},{"issue":"4","key":"10.7717\/peerj-cs.3369\/ref-38","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1017\/s1351324921000425","article-title":"UNLT: Urdu natural language toolkit","volume":"29","author":"Shafi","year":"2022","journal-title":"Natural Language Engineering"},{"key":"10.7717\/peerj-cs.3369\/ref-39","first-page":"1","article-title":"Beyond facial expressions: learning human 
emotion from body gestures","author":"Shan","year":"2007"},{"issue":"7","key":"10.7717\/peerj-cs.3369\/ref-40","doi-asserted-by":"publisher","first-page":"5731","DOI":"10.1007\/s10462-022-10144-1","article-title":"A survey on sentiment analysis methods, applications, and challenges","volume":"55","author":"Wankhade","year":"2022","journal-title":"Artificial Intelligence Review"},{"issue":"3","key":"10.7717\/peerj-cs.3369\/ref-41","doi-asserted-by":"publisher","first-page":"162","DOI":"10.3390\/digital1030012","article-title":"Extracting information on affective computing research from data analysis of known digital platforms: Research into emotional artificial intelligence","volume":"1","author":"Yusupova","year":"2021","journal-title":"Digital"}],"container-title":["PeerJ Computer Science"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/peerj.com\/articles\/cs-3369.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/peerj.com\/articles\/cs-3369.xml","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/peerj.com\/articles\/cs-3369.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/peerj.com\/articles\/cs-3369.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,12,3]],"date-time":"2025-12-03T08:15:39Z","timestamp":1764749739000},"score":1,"resource":{"primary":{"URL":"https:\/\/peerj.com\/articles\/cs-3369"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,12,3]]},"references-count":41,"alternative-id":["10.7717\/peerj-cs.3369"],"URL":"https:\/\/doi.org\/10.7717\/peerj-cs.3369","archive":["CLOCKSS","LOCKSS","Portico"],"relation":{},"ISSN":["2376-5992"],"issn-type":[{"value":"2376-5992","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,12,3]]},"articl
e-number":"e3369"}}