{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,13]],"date-time":"2026-01-13T06:09:04Z","timestamp":1768284544563,"version":"3.49.0"},"reference-count":65,"publisher":"Association for Computing Machinery (ACM)","issue":"12","funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62276170, 62576216, 62306061, 62206180"],"award-info":[{"award-number":["62276170, 62576216, 62306061, 62206180"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Science and Technology Project of Guangdong Province","award":["2023A1515010688"],"award-info":[{"award-number":["2023A1515010688"]}]},{"name":"Open Research Fund from Guangdong Laboratory of Artificial Intelligence and Digital Economy","award":["GML-KF-24-11"],"award-info":[{"award-number":["GML-KF-24-11"]}]},{"name":"Guangdong Provincial Key Laboratory","award":["2023B1212060076"],"award-info":[{"award-number":["2023B1212060076"]}]},{"name":"XJTLU Research Development Fund","award":["RDF-23-01-053"],"award-info":[{"award-number":["RDF-23-01-053"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2025,12,31]]},"abstract":"<jats:p>\n                    For Multimodal Sentiment Analysis (MSA), previous methods concentrate on designing sophisticated fusion strategies and performing representation learning across heterogeneous modalities, aiming to leverage multimodal signals to detect human sentiment. However, these approaches fail to address the long-standing issue of corrupted modal details in videos, which may cause an excessive loss of emotionally relevant semantics resulting from the degradation of detailed information. 
In this work, we aim to improve the robustness of MSA against corruption by introducing a Hierarchical Frequency Restoration and Adaptive Modality Enforcement (HFR-AME) approach. HFR-AME progressively recovers blurred detailed cues in each modality while enhancing the discriminative power of modal representations. Specifically, to reconstruct distinct frequency band features, we equip the HFR module with a key component called the Frequency Multimodal UNet (FM-UNet), which utilizes complementary modal features as conditions. This meticulous restoration process, performed from low to high frequency, facilitates the comprehensive recovery of intricate details. Meanwhile, to adaptively integrate these diverse frequency features, we introduce the AME module to enhance the beneficial modal frequencies while suppressing irrelevant ones, with the goal of strengthening the restored modal representations. Extensive experiments show that HFR-AME outperforms state-of-the-art methods on the CMU-MOSI and CMU-MOSEI datasets, improving 7-class accuracy by 0.5% and 0.6%, respectively. Further analysis also confirms its cross-lingual generalization and competitive computational efficiency. 
Our code is made available at\n                    <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"https:\/\/github.com\/nianhua20\/HFR-AME\">https:\/\/github.com\/nianhua20\/HFR-AME<\/jats:ext-link>\n                    .\n                  <\/jats:p>","DOI":"10.1145\/3767746","type":"journal-article","created":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T13:50:57Z","timestamp":1760104257000},"page":"1-24","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Frequency Restoration and Modality Enforcement towards Resisting-corruption Multimodal Sentiment Analysis"],"prefix":"10.1145","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8946-7472","authenticated-orcid":false,"given":"Weicheng","family":"Xie","sequence":"first","affiliation":[{"name":"School of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China, Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ), Shenzhen, China and Guangdong Provincial Key Laboratory of Intelligent Information Processing, Shenzhen University, Shenzhen, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-5453-2426","authenticated-orcid":false,"given":"Haijian","family":"Liang","sequence":"additional","affiliation":[{"name":"School of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-5461-6954","authenticated-orcid":false,"given":"Zenghao","family":"Niu","sequence":"additional","affiliation":[{"name":"School of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8728-2842","authenticated-orcid":false,"given":"Xianxu","family":"Hou","sequence":"additional","affiliation":[{"name":"Xi\u2019an Jiaotong-Liverpool University, Suzhou, 
China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2339-5685","authenticated-orcid":false,"given":"Siyang","family":"Song","sequence":"additional","affiliation":[{"name":"School of Computer Science, University of Exeter, Exeter, United Kingdom of Great Britain and Northern Ireland"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0422-6616","authenticated-orcid":false,"given":"Zitong","family":"Yu","sequence":"additional","affiliation":[{"name":"Great Bay University, Dongguan, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1420-0815","authenticated-orcid":false,"given":"Linlin","family":"Shen","sequence":"additional","affiliation":[{"name":"School of Artificial Intelligence, Shenzhen University, Shenzhen, China and Department of Computer Science, University of Nottingham Ningbo China, Ningbo, China"}]}],"member":"320","published-online":{"date-parts":[[2025,11,21]]},"reference":[{"key":"e_1_3_3_2_2","doi-asserted-by":"publisher","DOI":"10.1109\/WACV.2016.7477553"},{"key":"e_1_3_3_3_2","volume-title":"Digital Image Processing: Principles and Applications","author":"Baxes Gregory A.","year":"1994","unstructured":"Gregory A. Baxes. 1994. Digital Image Processing: Principles and Applications. John Wiley & Sons, Inc."},{"key":"e_1_3_3_4_2","first-page":"1","volume-title":"Proceedings of the 42nd International Conference on Machine Learning","volume":"267","author":"Chaudhuri Abhra","year":"2025","unstructured":"Abhra Chaudhuri, Anjan Dutta, Tu Bui, and Serban Georgescu. 2025. A closer look at multimodal representation collapse. In Proceedings of the 42nd International Conference on Machine Learning, Vol. 
267, PMLR, 1\u201323."},{"key":"e_1_3_3_5_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2014.2347551"},{"key":"e_1_3_3_6_2","doi-asserted-by":"publisher","DOI":"10.1145\/3586075"},{"key":"e_1_3_3_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2014.6853739"},{"key":"e_1_3_3_8_2","first-page":"4171","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics, 4171\u20134186."},{"key":"e_1_3_3_9_2","first-page":"8780","article-title":"Diffusion models beat GANs on image synthesis","volume":"34","author":"Dhariwal Prafulla","year":"2021","unstructured":"Prafulla Dhariwal and Alexander Nichol. 2021. Diffusion models beat GANs on image synthesis. In Advances in Neural Information Processing Systems, Vol. 
34, 8780\u20138794.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_3_10_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2022.3163445"},{"key":"e_1_3_3_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2024.3494239"},{"key":"e_1_3_3_12_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2023.3299324"},{"key":"e_1_3_3_13_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v39i16.33867"},{"key":"e_1_3_3_14_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.emnlp-main.723"},{"key":"e_1_3_3_15_2","doi-asserted-by":"publisher","DOI":"10.1145\/3394171.3413678"},{"key":"e_1_3_3_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2023.3276075"},{"key":"e_1_3_3_17_2","first-page":"1","volume-title":"Proceedings of the 9th International Conference on Learning Representations (ICLR \u201921)","author":"He Pengcheng","year":"2021","unstructured":"Pengcheng He, Xiaodong Liu, Jianfeng Gao, and Weizhu Chen. 2021. DeBERTa: Decoding-enhanced BERT with disentangled attention. In Proceedings of the 9th International Conference on Learning Representations (ICLR \u201921), 1\u201321."},{"key":"e_1_3_3_18_2","first-page":"6840","article-title":"Denoising diffusion probabilistic models","volume":"33","author":"Ho Jonathan","year":"2020","unstructured":"Jonathan Ho, Ajay Jain, and Pieter Abbeel. 2020. Denoising diffusion probabilistic models. In Advances in Neural Information Processing Systems, Vol. 
33, 6840\u20136851.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_3_19_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.emnlp-main.534"},{"key":"e_1_3_3_20_2","doi-asserted-by":"publisher","DOI":"10.1145\/3388861"},{"key":"e_1_3_3_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00682"},{"key":"e_1_3_3_22_2","doi-asserted-by":"publisher","DOI":"10.4208\/cicp.OA-2020-0085"},{"key":"e_1_3_3_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/TAFFC.2024.3430045"},{"key":"e_1_3_3_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2023.3293772"},{"key":"e_1_3_3_25_2","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00628"},{"key":"e_1_3_3_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2023.3284038"},{"key":"e_1_3_3_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2022.3147032"},{"key":"e_1_3_3_28_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P18-1209"},{"key":"e_1_3_3_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00258"},{"key":"e_1_3_3_30_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1046"},{"key":"e_1_3_3_31_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i01.5347"},{"key":"e_1_3_3_32_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2021.3082398"},{"key":"e_1_3_3_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2022.3171679"},{"key":"e_1_3_3_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2020.3014889"},{"key":"e_1_3_3_35_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33016892"},{"key":"e_1_3_3_36_2","doi-asserted-by":"publisher","DOI":"10.5555\/556016"},{"key":"e_1_3_3_37_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-71249-9_47"},{"key":"e_1_3_3_38_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.214"},{"key":"e_1_3_3_39_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01042"},{"key":"e_1_3_3_40_2","doi-asserted-by":"publisher","DOI":"10.1007\/978
-3-319-24574-4_28"},{"key":"e_1_3_3_41_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00453"},{"key":"e_1_3_3_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/TAFFC.2023.3274829"},{"key":"e_1_3_3_43_2","first-page":"1","volume-title":"International Conference on Learning Representations (ICLR \u201919)","author":"Tsai Yao-Hung Hubert","year":"2019","unstructured":"Yao-Hung Hubert Tsai, Paul Pu Liang, Amir Zadeh, Louis-Philippe Morency, and Ruslan Salakhutdinov. 2019. Learning factorized multimodal representations. In Proceedings of International Conference on Learning Representations (ICLR \u201919), 1\u201320."},{"key":"e_1_3_3_44_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1656"},{"issue":"86","key":"e_1_3_3_45_2","first-page":"2579","article-title":"Visualizing data using t-SNE","volume":"9","author":"van der Maaten Laurens","year":"2008","unstructured":"Laurens van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-SNE. Journal of Machine Learning Research 9, 86 (2008), 2579\u20132605.","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_3_3_46_2","first-page":"1","article-title":"Attention is all you need","volume":"30","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems, Vol. 30, 1\u201311.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_3_47_2","doi-asserted-by":"publisher","DOI":"10.1145\/3572915"},{"key":"e_1_3_3_48_2","first-page":"6550","volume-title":"Proceedings of International Joint Conference on Artificial Intelligence","author":"Wu Zhuojia","year":"2024","unstructured":"Zhuojia Wu, Qi Zhang, Duoqian Miao, Kun Yi, Wei Fan, and Liang Hu. 2024. 
HyDiscGAN: A hybrid distributed cGAN for audio-visual privacy preservation in multimodal sentiment analysis. In Proceedings of International Joint Conference on Artificial Intelligence, 6550\u20136558."},{"key":"e_1_3_3_49_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-36708-4_22"},{"key":"e_1_3_3_50_2","doi-asserted-by":"publisher","DOI":"10.1145\/3517139"},{"key":"e_1_3_3_51_2","doi-asserted-by":"publisher","DOI":"10.1145\/3503161.3547754"},{"key":"e_1_3_3_52_2","doi-asserted-by":"publisher","DOI":"10.1145\/3503161.3547755"},{"key":"e_1_3_3_53_2","volume-title":"A Fourier Perspective on Model Robustness in Computer Vision","author":"Yin Dong","year":"2019","unstructured":"Dong Yin, Raphael Gontijo Lopes, Jonathon Shlens, Ekin D. Cubuk, and Justin Gilmer. 2019. A Fourier Perspective on Model Robustness in Computer Vision. Curran Associates Inc., Red Hook, NY."},{"key":"e_1_3_3_54_2","doi-asserted-by":"publisher","DOI":"10.1109\/MCI.2018.2840738"},{"key":"e_1_3_3_55_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.343"},{"key":"e_1_3_3_56_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v35i12.17289"},{"key":"e_1_3_3_57_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D17-1115"},{"key":"e_1_3_3_58_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v32i1.12021"},{"key":"e_1_3_3_59_2","doi-asserted-by":"publisher","DOI":"10.1109\/MIS.2016.94"},{"key":"e_1_3_3_60_2","first-page":"2236","volume-title":"Proceedings of Annual Meeting of the Association for Computational Linguistics","author":"Zadeh AmirAli Bagher","year":"2018","unstructured":"AmirAli Bagher Zadeh, Paul Pu Liang, Soujanya Poria, Erik Cambria, and Louis-Philippe Morency. 2018. Multimodal language analysis in the wild: CMU-MOSEI dataset and interpretable dynamic fusion graph. 
In Proceedings of Annual Meeting of the Association for Computational Linguistics, 2236\u20132246."},{"key":"e_1_3_3_61_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2023.3312858"},{"issue":"3","key":"e_1_3_3_62_2","first-page":"32","article-title":"Affective computing for large-scale heterogeneous multimedia data: A survey","volume":"15","author":"Zhao Sicheng","year":"2019","unstructured":"Sicheng Zhao, Shangfei Wang, Mohammad Soleymani, Dhiraj Joshi, and Qiang Ji. 2019. Affective computing for large-scale heterogeneous multimedia data: A survey. ACM Transactions on Multimedia Computing, Communications, and Applications 15, 3s, Article 93 (Dec. 2019), 32 pages.","journal-title":"ACM Transactions on Multimedia Computing, Communications, and Applications"},{"key":"e_1_3_3_63_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.inffus.2023.02.028"},{"key":"e_1_3_3_64_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00398"},{"key":"e_1_3_3_65_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipm.2022.103223"},{"key":"e_1_3_3_66_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2022.3179926"}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and 
Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3767746","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,11,22]],"date-time":"2025-11-22T06:59:56Z","timestamp":1763794796000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3767746"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,11,21]]},"references-count":65,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2025,12,31]]}},"alternative-id":["10.1145\/3767746"],"URL":"https:\/\/doi.org\/10.1145\/3767746","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"value":"1551-6857","type":"print"},{"value":"1551-6865","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,11,21]]},"assertion":[{"value":"2025-03-16","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-08-23","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-11-21","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}