{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,14]],"date-time":"2026-01-14T20:34:45Z","timestamp":1768422885215,"version":"3.49.0"},"reference-count":97,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2026,1,14]],"date-time":"2026-01-14T00:00:00Z","timestamp":1768348800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61762085"],"award-info":[{"award-number":["61762085"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Informatics"],"abstract":"<jats:p>With the rapid development of the Internet, online public opinion monitoring has emerged as a crucial task in the information era. Multimodal sentiment analysis, through the integration of multiple modalities such as text, images, and audio, combined with technologies including natural language processing and computer vision, offers novel technical means for online public opinion monitoring. Nevertheless, current research still faces many challenges, such as the scarcity of high-quality datasets, limited model generalization ability, and difficulties with cross-modal feature fusion. This paper reviews the current research progress of multimodal sentiment analysis in online public opinion monitoring, including its development history, key technologies, and application scenarios. Existing problems are analyzed and future research directions are discussed. In particular, we emphasize a fusion-architecture-centric comparison under online public opinion monitoring, and discuss cross-lingual differences that affect multimodal alignment and evaluation.<\/jats:p>","DOI":"10.3390\/informatics13010010","type":"journal-article","created":{"date-parts":[[2026,1,14]],"date-time":"2026-01-14T11:01:14Z","timestamp":1768388474000},"page":"10","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["A Review of Multimodal Sentiment Analysis in Online Public Opinion Monitoring"],"prefix":"10.3390","volume":"13","author":[{"given":"Shuxian","family":"Liu","sequence":"first","affiliation":[{"name":"College of Computer Science and Technology, Xinjiang University, Urumqi 830017, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-4035-7517","authenticated-orcid":false,"given":"Tianyi","family":"Li","sequence":"additional","affiliation":[{"name":"College of Computer Science and Technology, Xinjiang University, Urumqi 830017, China"}]}],"member":"1968","published-online":{"date-parts":[[2026,1,14]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"15092","DOI":"10.1109\/TNNLS.2023.3294810","article-title":"Sentiment Analysis: Comprehensive Reviews, Recent Advances, and Open Challenges","volume":"35","author":"Lu","year":"2024","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"424","DOI":"10.1016\/j.inffus.2022.09.025","article-title":"Multimodal Sentiment Analysis: A Systematic Review of History, Datasets, Multimodal Fusion Methods, Applications, Challenges and Future Directions","volume":"91","author":"Gandhi","year":"2023","journal-title":"Inf. Fusion"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"12113","DOI":"10.1109\/TPAMI.2023.3275156","article-title":"Multimodal Learning with Transformers: A Survey","volume":"45","author":"Xu","year":"2023","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_4","first-page":"1","article-title":"Survey of Research on Multimodal Fusion Technology for Deep Learning","volume":"46","author":"He","year":"2020","journal-title":"Comput. Eng."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"12039","DOI":"10.1109\/ACCESS.2024.3354844","article-title":"A Survey on Multimodal Aspect-Based Sentiment Analysis","volume":"12","author":"Zhao","year":"2024","journal-title":"IEEE Access"},{"key":"ref_6","first-page":"778","article-title":"A Survey of Large Language Models in the Domain of Cybersecurity","volume":"24","author":"Zhang","year":"2024","journal-title":"Netinfo Secur."},{"key":"ref_7","first-page":"1","article-title":"Survey of Sentiment Analysis Algorithms Based on Multimodal Fusion","volume":"60","author":"Guo","year":"2024","journal-title":"Comput. Eng. Appl."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Liu, X., Wei, F., Jiang, W., Zheng, Q., Qiao, Y., Liu, J., Niu, L., Chen, Z., and Dong, H. (2023). MTR-SAM: Visual Multimodal Text Recognition and Sentiment Analysis in Public Opinion Analysis on the Internet. Appl. Sci., 13.","DOI":"10.3390\/app13127307"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Hu, M., and Liu, B. (2004, January 22\u201325). Mining and Summarizing Customer Reviews. Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD\u201904, Seattle, WA, USA.","DOI":"10.1145\/1014052.1014073"},{"key":"ref_10","unstructured":"Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., and Potts, C. (2011, January 19\u201324). Learning Word Vectors for Sentiment Analysis. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Portland, OR, USA."},{"key":"ref_11","unstructured":"Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013, January 2\u20134). Efficient Estimation of Word Representations in Vector Space. Proceedings of the International Conference on Learning Representations, ICLR 2013, Scottsdale, AZ, USA."},{"key":"ref_12","first-page":"773","article-title":"A Deep Learning Model Enhanced with Emotion Semantics for Microblog Sentiment Analysis","volume":"40","author":"He","year":"2017","journal-title":"Chin. J. Comput."},{"key":"ref_13","first-page":"281","article-title":"Research on Offensive Language Detection in Social Networks Based on Emotion-Assisted Multi-Task Learning","volume":"25","author":"Jin","year":"2025","journal-title":"Netinfo Secur."},{"key":"ref_14","first-page":"3075","article-title":"Text sentiment analysis based on feature fusion of convolution neural network and bidirectional long short-term memory network","volume":"38","author":"LI","year":"2018","journal-title":"J. Comput. Appl."},{"key":"ref_15","first-page":"75","article-title":"Micro-blog sentiment analysis based on emotional fusion and multi-dimensional self-attention mechanism","volume":"39","author":"Han","year":"2019","journal-title":"J. Comput. Appl."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"460","DOI":"10.1109\/TSMC.1978.4309999","article-title":"Textural Features Corresponding to Visual Perception","volume":"8","author":"Tamura","year":"1978","journal-title":"IEEE Trans. Syst. Man Cybern."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"38","DOI":"10.1109\/93.790610","article-title":"Semantics in Visual Information Retrieval","volume":"6","author":"Colombo","year":"1999","journal-title":"IEEE MultiMed."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Machajdik, J., and Hanbury, A. (2010, January 25\u201329). Affective Image Classification Using Features Inspired by Psychology and Art Theory. Proceedings of the 18th ACM International Conference on Multimedia, MM\u201910, Firenze, Italy.","DOI":"10.1145\/1873951.1873965"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Borth, D., Ji, R., Chen, T., Breuel, T., and Chang, S.F. (2013, January 21\u201325). Large-Scale Visual Sentiment Ontology and Detectors Using Adjective Noun Pairs. Proceedings of the 21st ACM International Conference on Multimedia, MM\u201913, Barcelona, Spain.","DOI":"10.1145\/2502081.2502282"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Yang, J., Sun, M., and Sun, X. (2017). Learning Visual Sentiment Distributions via Augmented Conditional Probability Neural Network. Proceedings of the AAAI Conference on Artificial Intelligence, AAAI Press.","DOI":"10.1609\/aaai.v31i1.10485"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Zhu, J., Park, T., Isola, P., and Efros, A.A. (2017, January 22\u201329). Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks. Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.244"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"997","DOI":"10.1109\/TMM.2017.2757769","article-title":"Predicting Microblog Sentiments via Weakly Supervised Multimodal Deep Learning","volume":"20","author":"Chen","year":"2018","journal-title":"IEEE Trans. Multimed."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"2077","DOI":"10.1007\/s11063-019-10035-7","article-title":"Deep Transfer Learning for Image Emotion Analysis: Reducing Marginal and Joint Distribution Discrepancies Together","volume":"51","author":"He","year":"2020","journal-title":"Neural Process. Lett."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Zhao, Z., and Liu, Q. (2021, January 20\u201324). Former-DFER: Dynamic Facial Expression Recognition Transformer. Proceedings of the 29th ACM International Conference on Multimedia, MM\u201921, Virtual Event.","DOI":"10.1145\/3474085.3475292"},{"key":"ref_25","unstructured":"Lin, Y., and Wei, G. (2005, January 18\u201321). Speech Emotion Recognition Based on HMM and SVM. Proceedings of the 2005 International Conference on Machine Learning and Cybernetics, Guangzhou, China."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"10","DOI":"10.1109\/T-AFFC.2010.16","article-title":"Emotion Recognition of Affective Speech Based on Multiple Classifiers Using Acoustic-Prosodic Information and Semantic Labels","volume":"2","author":"Wu","year":"2011","journal-title":"IEEE Trans. Affect. Comput."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"162","DOI":"10.1109\/T-AFFC.2011.14","article-title":"Interdependencies among Voice Source Parameters in Emotional Speech","volume":"2","author":"Sundberg","year":"2011","journal-title":"IEEE Trans. Affect. Comput."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Jin, Q., Li, C., Chen, S., and Wu, H. (2015, January 19\u201324). Speech Emotion Recognition with Acoustic and Lexical Features. Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), South Brisbane, QLD, Australia.","DOI":"10.1109\/ICASSP.2015.7178872"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"314","DOI":"10.1109\/TAFFC.2016.2531664","article-title":"Continuous Estimation of Emotions in Speech by Dynamic Cooperative Speaker Models","volume":"8","author":"Mencattini","year":"2017","journal-title":"IEEE Trans. Affect. Comput."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Eskimez, S.E., Duan, Z., and Heinzelman, W. (2018, January 15\u201320). Unsupervised Learning Approach to Feature Analysis for Automatic Speech Emotion Recognition. Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada.","DOI":"10.1109\/ICASSP.2018.8462685"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"103205","DOI":"10.1016\/j.dsp.2021.103205","article-title":"Semi-Supervised Parallel Shared Encoders for Speech Emotion Recognition","volume":"118","author":"Pourebrahim","year":"2021","journal-title":"Digit. Signal Process."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"39","DOI":"10.1109\/TPAMI.2008.52","article-title":"A Survey of Affect Recognition Methods: Audio, Visual, and Spontaneous Expressions","volume":"31","author":"Zeng","year":"2009","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_33","first-page":"49","article-title":"Survey of Multimodal Data Fusion","volume":"57","author":"Ren","year":"2021","journal-title":"Comput. Eng. Appl."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"102217","DOI":"10.1016\/j.inffus.2023.102217","article-title":"A Survey of Multimodal Hybrid Deep Learning for Computer Vision: Architectures, Applications, Trends, and Challenges","volume":"105","author":"Bayoudh","year":"2024","journal-title":"Inf. Fusion"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Lueangwitchajaroen, P., Watcharapinchai, S., Tepsan, W., and Sooksatra, S. (2024). Multi-Level Feature Fusion in CNN-Based Human Action Recognition: A Case Study on EfficientNet-B7. J. Imaging, 10.","DOI":"10.3390\/jimaging10120320"},{"key":"ref_36","first-page":"438","article-title":"Multi-View Representations for Fake News Detection","volume":"24","author":"Zhang","year":"2024","journal-title":"Netinfo Secur."},{"key":"ref_37","first-page":"77","article-title":"Detection and Identification Model of Gambling Websites Based on Multi-Modal Data","volume":"23","author":"Zhao","year":"2023","journal-title":"Netinfo Secur."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"4416212","DOI":"10.1109\/TGRS.2022.3225843","article-title":"Category-Wise Fusion and Enhancement Learning for Multimodal Remote Sensing Image Semantic Segmentation","volume":"60","author":"Zheng","year":"2022","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_39","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., and Polosukhin, I. (2023). Attention Is All You Need. arXiv."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Shvetsova, N., Chen, B., Rouditchenko, A., Thomas, S., Kingsbury, B., Feris, R., Harwath, D., Glass, J., and Kuehne, H. (2022, January 18\u201324). Everything at Once\u2014Multi-modal Fusion Transformer for Video Retrieval. Proceedings of the 2022 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01939"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Xu, H., Yan, M., Li, C., Bi, B., Huang, S., Xiao, W., and Huang, F. (2021, January 1\u20136). E2E-VLP: End-to-End Vision-Language Pre-training Enhanced by Visual Learning. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Online.","DOI":"10.18653\/v1\/2021.acl-long.42"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Girdhar, R., Singh, M., Ravi, N., van der Maaten, L., Joulin, A., and Misra, I. (2022, January 18\u201324). Omnivore: A Single Model for Many Visual Modalities. Proceedings of the 2022 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01563"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Tschannen, M., Mustafa, B., and Houlsby, N. (2023, January 17\u201324). CLIPPO: Image-and-Language Understanding from Pixels Only. Proceedings of the 2023 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada.","DOI":"10.1109\/CVPR52729.2023.01059"},{"key":"ref_44","unstructured":"Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., and Clark, J. (2021, January 18\u201324). Learning Transferable Visual Models From Natural Language Supervision. Proceedings of the 38th International Conference on Machine Learning, Virtual Event."},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"5753","DOI":"10.1109\/TMM.2023.3338769","article-title":"UniMF: A Unified Multimodal Framework for Multimodal Sentiment Analysis in Missing Modalities and Unaligned Multimodal Sequences","volume":"26","author":"Huan","year":"2024","journal-title":"IEEE Trans. Multimed."},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"0081","DOI":"10.34133\/icomputing.0081","article-title":"A Two-Stage Stacked Transformer Framework for Multimodal Sentiment Analysis","volume":"3","author":"Yi","year":"2024","journal-title":"Intell. Comput."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"664","DOI":"10.26599\/TST.2021.9010055","article-title":"Cross-Modal Complementary Network with Hierarchical Fusion for Multimodal Sentiment Classification","volume":"27","author":"Peng","year":"2022","journal-title":"Tsinghua Sci. Technol."},{"key":"ref_48","first-page":"242","article-title":"Text-Image Gated Fusion Mechanism for Multimodal Aspect-based Sentiment Analysis","volume":"51","author":"Zhang","year":"2024","journal-title":"Comput. Sci."},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"3838","DOI":"10.11834\/jig.221015","article-title":"Aspect-level multimodal co-attention graph convolutional sentiment analysis model","volume":"28","author":"Wang","year":"2023","journal-title":"J. Image Graph."},{"key":"ref_50","first-page":"136","article-title":"Target-Oriented Interaction Graph Neural Networks for Multimodal Aspect-Level Sentiment Analysis","volume":"60","author":"Zhang","year":"2024","journal-title":"Comput. Eng. Appl."},{"key":"ref_51","doi-asserted-by":"crossref","first-page":"3122","DOI":"10.1109\/TAFFC.2025.3590246","article-title":"CAETFN: Context Adaptively Enhanced Text-Guided Fusion Network for Multimodal Sentiment Analysis","volume":"16","author":"Li","year":"2025","journal-title":"IEEE Trans. Affect. Comput."},{"key":"ref_52","first-page":"7","article-title":"A Multimodal Sentiment Recognition Method Based on Multitask Learning","volume":"57","author":"Lin","year":"2021","journal-title":"Acta Sci. Nat. Univ. Pekin."},{"key":"ref_53","doi-asserted-by":"crossref","first-page":"9298","DOI":"10.1109\/TNNLS.2024.3415028","article-title":"Dual Causes Generation Assisted Model for Multimodal Aspect-Based Sentiment Classification","volume":"36","author":"Fan","year":"2024","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"ref_54","doi-asserted-by":"crossref","first-page":"1607","DOI":"10.11834\/jig.240017","article-title":"Development of multimodal sentiment recognition and understanding","volume":"29","author":"Tao","year":"2024","journal-title":"J. Image Graph."},{"key":"ref_55","unstructured":"Bai, J., Bai, S., Chu, Y., Cui, Z., Dang, K., Deng, X., Fan, Y., Ge, W., Han, Y., and Huang, F. (2023). Qwen Technical Report. arXiv."},{"key":"ref_56","unstructured":"DeepSeek-AI, Guo, D., Yang, D., Zhang, H., Song, J., Zhang, R., Xu, R., Zhu, Q., Ma, S., and Wang, P. (2025). DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning. arXiv."},{"key":"ref_57","doi-asserted-by":"crossref","unstructured":"Pang, N., Wu, W., Hu, Y., Xu, K., Yin, Q., and Qin, L. (2024, January 15\u201319). Enhancing Multimodal Sentiment Analysis via Learning from Large Language Model. Proceedings of the 2024 IEEE International Conference on Multimedia and Expo (ICME), Niagara Falls, ON, Canada.","DOI":"10.1109\/ICME57554.2024.10688334"},{"key":"ref_58","unstructured":"Gu, Q., and Wang, X. (2020). Network Public Opinion Analysis: Theory, Technology and Application, Tsinghua University Press."},{"key":"ref_59","first-page":"120","article-title":"An Integrated Analysis of Topical and Emotional Evolution of Microblog Public Opinions on Public Emergencies","volume":"61","author":"An","year":"2017","journal-title":"Library Inf. Serv."},{"key":"ref_60","doi-asserted-by":"crossref","unstructured":"Wang, Z., Guo, Y., and Fu, J. (2023, January 15\u201318). CLIP-PubOp: A CLIP-based Multimodal Representation Fusion Method for Public Opinion. Proceedings of the 2023 IEEE International Conference on Big Data (BigData), Sorrento, Italy.","DOI":"10.1109\/BigData59044.2023.10386439"},{"key":"ref_61","unstructured":"Chen, J. (2022). Research on Sentiment Analysis of Netizens Based on Fusion of Multi-modal Hierarchical Features. [Master\u2019s Thesis, Nanjing University of Aeronautics And Astronautics]."},{"key":"ref_62","doi-asserted-by":"crossref","unstructured":"Yang, X., Feng, S., Zhang, Y., and Wang, D. (2021, January 1\u20136). Multimodal Sentiment Detection Based on Multi-channel Graph Neural Networks. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Online.","DOI":"10.18653\/v1\/2021.acl-long.28"},{"key":"ref_63","doi-asserted-by":"crossref","first-page":"82","DOI":"10.1109\/MIS.2016.94","article-title":"Multimodal Sentiment Intensity Analysis in Videos: Facial Gestures and Verbal Messages","volume":"31","author":"Zadeh","year":"2016","journal-title":"IEEE Intell. Syst."},{"key":"ref_64","doi-asserted-by":"crossref","unstructured":"Bagher Zadeh, A., Liang, P.P., Poria, S., Cambria, E., and Morency, L.P. (2018, January 15\u201320). Multimodal Language Analysis in the Wild: CMU-MOSEI Dataset and Interpretable Dynamic Fusion Graph. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Melbourne, Australia.","DOI":"10.18653\/v1\/P18-1208"},{"key":"ref_65","doi-asserted-by":"crossref","first-page":"47","DOI":"10.1016\/j.neucom.2021.05.040","article-title":"MASAD: A Large-Scale Dataset for Multimodal Aspect-Based Sentiment Analysis","volume":"455","author":"Zhou","year":"2021","journal-title":"Neurocomputing"},{"key":"ref_66","doi-asserted-by":"crossref","unstructured":"Xiang, Y., Cai, Y., and Guo, J. (2023). MSFNet: Modality Smoothing Fusion Network for Multimodal Aspect-Based Sentiment Analysis. Front. Phys., 11.","DOI":"10.3389\/fphy.2023.1187503"},{"key":"ref_67","doi-asserted-by":"crossref","unstructured":"Yang, D., Li, X., Li, Z., Zhou, C., Wang, X., and Chen, F. (2024, January 15\u201319). Prompt Fusion Interaction Transformer For Aspect-Based Multimodal Sentiment Analysis. Proceedings of the 2024 IEEE International Conference on Multimedia and Expo (ICME), Niagara Falls, ON, Canada.","DOI":"10.1109\/ICME57554.2024.10687885"},{"key":"ref_68","doi-asserted-by":"crossref","first-page":"9814","DOI":"10.1109\/TII.2024.3388670","article-title":"Multichannel Cross-Modal Fusion Network for Multimodal Sentiment Analysis Considering Language Information Enhancement","volume":"20","author":"Hu","year":"2024","journal-title":"IEEE Trans. Ind. Inform."},{"key":"ref_69","doi-asserted-by":"crossref","first-page":"7657","DOI":"10.1109\/TCSVT.2024.3376564","article-title":"Trustworthy Multimodal Fusion for Sentiment Analysis in Ordinal Sentiment Space","volume":"34","author":"Xie","year":"2024","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"ref_70","doi-asserted-by":"crossref","first-page":"28","DOI":"10.1109\/TFUZZ.2024.3419140","article-title":"Exploring Multimodal Multiscale Features for Sentiment Analysis Using Fuzzy-Deep Neural Network Learning","volume":"33","author":"Wang","year":"2025","journal-title":"IEEE Trans. Fuzzy Syst."},{"key":"ref_71","unstructured":"Du, P. (2023). Research and Application of MultiModal Sentiment Analysis Methods in Chinese. [Master\u2019s Thesis, University of Electronic Science And Technology of China]."},{"key":"ref_72","unstructured":"Ni, N. (2019). Research on Cross-Media Topic Detection and Opinion Analysis. [Master\u2019s Thesis, Beijing University of Posts and Telecommunications]."},{"key":"ref_73","doi-asserted-by":"crossref","first-page":"371","DOI":"10.1609\/aaai.v33i01.3301371","article-title":"Multi-Interactive Memory Network for Aspect Based Multimodal Sentiment Analysis","volume":"Volume 33","author":"Xu","year":"2019","journal-title":"Proceedings of the AAAI Conference on Artificial Intelligence"},{"key":"ref_74","first-page":"5105","article-title":"Multi-Level Attention Map Network for Multimodal Sentiment Analysis","volume":"35","author":"Xue","year":"2023","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"ref_75","doi-asserted-by":"crossref","first-page":"533","DOI":"10.1038\/s41586-023-06185-3","article-title":"Accurate Medium-Range Global Weather Forecasting with 3D Neural Networks","volume":"619","author":"Bi","year":"2023","journal-title":"Nature"},{"key":"ref_76","first-page":"418","article-title":"Public Opinion Analysis Based on EEMD-Transformer Model: Taking COVID-19 Public Opinion as an Example","volume":"66","author":"Xu","year":"2020","journal-title":"J. Wuhan Univ. (Nat. Sci. Ed.)"},{"key":"ref_77","doi-asserted-by":"crossref","unstructured":"Mao, H., Yuan, Z., Xu, H., Yu, W., Liu, Y., and Gao, K. (2022, January 22\u201327). M-SENA: An Integrated Platform for Multimodal Sentiment Analysis. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, Dublin, Ireland.","DOI":"10.18653\/v1\/2022.acl-demo.20"},{"key":"ref_78","doi-asserted-by":"crossref","unstructured":"Yu, W., Xu, H., Meng, F., Zhu, Y., Ma, Y., Wu, J., Zou, J., and Yang, K. (2020, January 5\u201310). CH-SIMS: A Chinese Multimodal Sentiment Analysis Dataset with Fine-grained Annotation of Modality. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online.","DOI":"10.18653\/v1\/2020.acl-main.343"},{"key":"ref_79","doi-asserted-by":"crossref","unstructured":"Liu, Y., Yuan, Z., Mao, H., Liang, Z., Yang, W., Qiu, Y., Cheng, T., Li, X., Xu, H., and Gao, K. (2022, January 7\u201311). Make Acoustic and Visual Cues Matter: CH-SIMS v2.0 Dataset and AV-Mixup Consistent Module. Proceedings of the 2022 International Conference on Multimodal Interaction, ICMI\u201922, Bengaluru, India.","DOI":"10.1145\/3536221.3556630"},{"key":"ref_80","doi-asserted-by":"crossref","unstructured":"Zadeh, A., Chen, M., Poria, S., Cambria, E., and Morency, L.P. (2017, January 9\u201311). Tensor Fusion Network for Multimodal Sentiment Analysis. Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark.","DOI":"10.18653\/v1\/D17-1115"},{"key":"ref_81","doi-asserted-by":"crossref","first-page":"5634","DOI":"10.1609\/aaai.v32i1.12021","article-title":"Memory Fusion Network for Multi-View Sequential Learning","volume":"Volume 32","author":"Zadeh","year":"2018","journal-title":"Proceedings of the AAAI Conference on Artificial Intelligence"},{"key":"ref_82","doi-asserted-by":"crossref","unstructured":"Hazarika, D., Zimmermann, R., and Poria, S. (2020). MISA: Modality-Invariant and -Specific Representations for Multimodal Sentiment Analysis. MM \u201920: Proceedings of the 28th ACM International Conference on Multimedia, ACM.","DOI":"10.1145\/3394171.3413678"},{"key":"ref_83","doi-asserted-by":"crossref","unstructured":"Socher, R., Perelygin, A., Wu, J., Chuang, J., Manning, C.D., Ng, A., and Potts, C. (2013, January 18\u201321). Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank. Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Seattle, WA, USA.","DOI":"10.18653\/v1\/D13-1170"},{"key":"ref_84","doi-asserted-by":"crossref","unstructured":"Williams, J., Kleinegesse, S., Comanescu, R., and Radu, O. (2018, January 20). Recognizing Emotions in Video Using Multimodal DNN Feature Fusion. Proceedings of the Grand Challenge and Workshop on Human Multimodal Language (Challenge-HML), Melbourne, Australia.","DOI":"10.18653\/v1\/W18-3302"},{"key":"ref_85","doi-asserted-by":"crossref","first-page":"10790","DOI":"10.1609\/aaai.v35i12.17289","article-title":"Learning Modality-Specific Representations with Self-Supervised Multi-Task Learning for Multimodal Sentiment Analysis","volume":"Volume 35","author":"Yu","year":"2021","journal-title":"Proceedings of the AAAI Conference on Artificial Intelligence"},{"key":"ref_86","doi-asserted-by":"crossref","unstructured":"Han, W., Chen, H., and Poria, S. (2021, January 7\u201311). Improving Multimodal Fusion with Hierarchical Mutual Information Maximization for Multimodal Sentiment Analysis. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Online and Punta Cana, Dominican Republic.","DOI":"10.18653\/v1\/2021.emnlp-main.723"},{"key":"ref_87","unstructured":"Tsai, Y.H.H., Liang, P.P., Zadeh, A., Morency, L.P., and Salakhutdinov, R. (May, January 30). Learning Factorized Multimodal Representations. Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada."},{"key":"ref_88","doi-asserted-by":"crossref","first-page":"9511","DOI":"10.1038\/s41598-023-36201-5","article-title":"Experiments on Real-Life Emotions Challenge Ekman\u2019s Model","volume":"13","author":"Coppini","year":"2023","journal-title":"Sci. Rep."},{"key":"ref_89","doi-asserted-by":"crossref","unstructured":"Goyal, Y., Khot, T., Summers-Stay, D., Batra, D., and Parikh, D. (2017, January 21\u201326). Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.670"},{"key":"ref_90","doi-asserted-by":"crossref","first-page":"5674","DOI":"10.1609\/aaai.v32i1.11962","article-title":"Adaptive Co-Attention Network for Named Entity Recognition in Tweets","volume":"Volume 32","author":"Zhang","year":"2018","journal-title":"Proceedings of the AAAI Conference on Artificial Intelligence"},{"key":"ref_91","unstructured":"Inui, K., Jiang, J., Ng, V., and Wan, X. (2019, January 3\u20137). UR-FUNNY: A Multimodal Language Dataset for Understanding Humor. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China."},{"key":"ref_92","unstructured":"Lin, J., Men, R., Yang, A., Zhou, C., Ding, M., Zhang, Y., Wang, P., Wang, A., Jiang, L., and Jia, X. (2021). M6: A Chinese Multimodal Pretrainer. arXiv."},{"key":"ref_93","first-page":"26418","article-title":"Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark","volume":"Volume 35","author":"Gu","year":"2022","journal-title":"Advances in Neural Information Processing Systems"},{"key":"ref_94","doi-asserted-by":"crossref","unstructured":"Cheng, N., Guan, C., Gao, J., Wang, W., Li, Y., Meng, F., Zhou, J., Fang, B., Xu, J., and Han, W. (2024). Touch100k: A Large-Scale Touch-Language-Vision Dataset for Touch-Centric Multimodal Representation. arXiv.","DOI":"10.1016\/j.inffus.2025.103305"},{"key":"ref_95","doi-asserted-by":"crossref","unstructured":"Luo, M., Fei, H., Li, B., Wu, S., Liu, Q., Poria, S., Cambria, E., Lee, M.L., and Hsu, W. (November, January 28). PanoSent: A Panoptic Sextuple Extraction Benchmark for Multimodal Conversational Aspect-based Sentiment Analysis. Proceedings of the 32nd ACM International Conference on Multimedia, MM\u20192024, Melbourne, VIC, Australia.","DOI":"10.1145\/3664647.3680705"},{"key":"ref_96","doi-asserted-by":"crossref","first-page":"969","DOI":"10.1109\/TAFFC.2024.3485057","article-title":"SEED-VII: A Multimodal Dataset of Six Basic Emotions with Continuous Labels for Emotion Recognition","volume":"16","author":"Jiang","year":"2025","journal-title":"IEEE Trans. Affect. Comput."},{"key":"ref_97","doi-asserted-by":"crossref","first-page":"109615","DOI":"10.1016\/j.fss.2025.109615","article-title":"Joint Uncertainty Model and Metric for Robust Feature Selection: A Bi-Level Distribution Consideration and Feature Evaluation Approach","volume":"523","author":"Wan","year":"2026","journal-title":"Fuzzy Sets Syst."}],"container-title":["Informatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2227-9709\/13\/1\/10\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,1,14]],"date-time":"2026-01-14T11:33:28Z","timestamp":1768390408000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2227-9709\/13\/1\/10"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,1,14]]},"references-count":97,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2026,1]]}},"alternative-id":["informatics13010010"],"URL":"https:\/\/doi.org\/10.3390\/informatics13010010","relation":{},"ISSN":["2227-9709"],"issn-type":[{"value":"2227-9709","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,1,14]]}}}