{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,15]],"date-time":"2026-06-15T15:52:54Z","timestamp":1781538774325,"version":"3.54.5"},"publisher-location":"New York, NY, USA","reference-count":44,"publisher":"ACM","license":[{"start":{"date-parts":[[2026,6,15]],"date-time":"2026-06-15T00:00:00Z","timestamp":1781481600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/legalcode"}],"funder":[{"name":"the National Natural Science Foundation of China","award":["62233018, 62136002 and 62221005"],"award-info":[{"award-number":["62233018, 62136002 and 62221005"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2026,6,16]]},"DOI":"10.1145\/3805622.3810771","type":"proceedings-article","created":{"date-parts":[[2026,6,15]],"date-time":"2026-06-15T14:42:57Z","timestamp":1781534577000},"page":"487-496","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Evidential Uncertainty Modulated Adaptive Predictive Contrastive Learning for Multimodal Fusion"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8090-6600","authenticated-orcid":false,"given":"Qiuyu","family":"Mei","sequence":"first","affiliation":[{"name":"Chongqing University of Posts and Telecommunications, Chongqing, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0667-8413","authenticated-orcid":false,"given":"Hong","family":"Yu","sequence":"additional","affiliation":[{"name":"Chongqing University of Posts and Telecommunications, Chongqing, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-9872-7467","authenticated-orcid":false,"given":"Shijie","family":"Yu","sequence":"additional","affiliation":[{"name":"Chongqing University of Posts and Telecommunications, Chongqing, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8648-9692","authenticated-orcid":false,"given":"Yan","family":"Yang","sequence":"additional","affiliation":[{"name":"Chongqing University of Posts and Telecommunications, Chongqing, China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2026,6,15]]},"reference":[{"key":"e_1_3_3_1_2_2","doi-asserted-by":"publisher","unstructured":"Fukui Akira Dong\u00a0Huk Park Yang Daylen Rohrbach Anna Darrell Trevor and Rohrbach Marcus. 2016. Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding. Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (2016). 10.18653\/v1\/d16-1044","DOI":"10.18653\/v1\/d16-1044"},{"key":"e_1_3_3_1_3_2","first-page":"5783","volume-title":"Proceedings of the 29th International Conference on Computational Linguistics","author":"An Chenxin","year":"2022","unstructured":"Chenxin An, Ming Zhong, Zhiyong Wu, Qin Zhu, Xuanjing Huang, and Xipeng Qiu. 2022. CoLo: A Contrastive Learning Based Re-ranking Framework for One-Stage Summarization. In Proceedings of the 29th International Conference on Computational Linguistics, Nicoletta Calzolari, Chu-Ren Huang, Hansaem Kim, James Pustejovsky, Leo Wanner, Key-Sun Choi, Pum-Mo Ryu, Hsin-Hsi Chen, Lucia Donatelli, Heng Ji, Sadao Kurohashi, Patrizia Paggio, Nianwen Xue, Seokhwan Kim, Younggyun Hahm, Zhong He, Tony\u00a0Kyungil Lee, Enrico Santus, Francis Bond, and Seung-Hoon Na (Eds.). International Committee on Computational Linguistics, Gyeongju, Republic of Korea, 5783\u20135793. https:\/\/aclanthology.org\/2022.coling-1.508\/"},{"key":"e_1_3_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1109\/WACV56688.2023.00273"},{"key":"e_1_3_3_1_5_2","series-title":"Proceedings of Machine Learning Research","first-page":"7694","volume-title":"Proceedings of the 40th International Conference on Machine Learning","volume":"202","author":"Desai Karan","year":"2023","unstructured":"Karan Desai, Maximilian Nickel, Tanmay Rajpurohit, Justin Johnson, and Shanmukha\u00a0Ramakrishna Vedantam. 2023. Hyperbolic Image-text Representations. In Proceedings of the 40th International Conference on Machine Learning(Proceedings of Machine Learning Research, Vol.\u00a0202), Andreas Krause, Emma Brunskill, Kyunghyun Cho, Barbara Engelhardt, Sivan Sabato, and Jonathan Scarlett (Eds.). PMLR, 7694\u20137731."},{"key":"e_1_3_3_1_6_2","first-page":"4171","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Jill Burstein, Christy Doran, and Thamar Solorio (Eds.). Association for Computational Linguistics, Minneapolis, Minnesota, 4171\u20134186."},{"key":"e_1_3_3_1_7_2","volume-title":"9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021","author":"Dosovitskiy Alexey","year":"2021","unstructured":"Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. 2021. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. In 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net."},{"key":"e_1_3_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.02538"},{"key":"e_1_3_3_1_9_2","first-page":"6704","volume-title":"Advances in Neural Information Processing Systems","volume":"35","author":"Goel Shashank","year":"2022","unstructured":"Shashank Goel, Hritik Bansal, Sumit Bhatia, Ryan Rossi, Vishwa Vinay, and Aditya Grover. 2022. CyCLIP: Cyclic Contrastive Language-Image Pretraining. In Advances in Neural Information Processing Systems , S.\u00a0Koyejo, S.\u00a0Mohamed, A.\u00a0Agarwal, D.\u00a0Belgrave, K.\u00a0Cho, and A.\u00a0Oh (Eds.), Vol.\u00a035. Curran Associates, Inc., 6704\u20136719. https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2022\/file\/2cd36d327f33d47b372d4711edd08de0-Paper-Conference.pdf"},{"key":"e_1_3_3_1_10_2","volume-title":"The Thirty-ninth Annual Conference on Neural Information Processing Systems","author":"Gong Baoquan","year":"2025","unstructured":"Baoquan Gong, Xiyuan Gao, Pengfei Zhu, Qinghua Hu, and Bing Cao. 2025. Multimodal Negative Learning. In The Thirty-ninth Annual Conference on Neural Information Processing Systems. https:\/\/openreview.net\/forum?id=OeXukLC6SK"},{"key":"e_1_3_3_1_11_2","doi-asserted-by":"publisher","unstructured":"Zongbo Han Changqing Zhang Huazhu Fu and Joey\u00a0Tianyi Zhou. 2023. Trusted Multi-View Classification With Dynamic Evidential Fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45 2 (2023) 2551\u20132566. 10.1109\/TPAMI.2022.3171983","DOI":"10.1109\/TPAMI.2022.3171983"},{"key":"e_1_3_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-42337-1"},{"key":"e_1_3_3_1_13_2","first-page":"42191","volume-title":"Advances in Neural Information Processing Systems","volume":"36","author":"Jung Myong\u00a0Chol","year":"2023","unstructured":"Myong\u00a0Chol Jung, He Zhao, Joanna Dipnall, and Lan Du. 2023. Beyond Unimodal: Generalising Neural Processes for Multimodal Uncertainty Estimation. In Advances in Neural Information Processing Systems , A.\u00a0Oh, T.\u00a0Naumann, A.\u00a0Globerson, K.\u00a0Saenko, M.\u00a0Hardt, and S.\u00a0Levine (Eds.), Vol.\u00a036. Curran Associates, Inc., 42191\u201342216. https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2023\/file\/839e23e5b1c52cfd1268f4023a3af0d6-Paper-Conference.pdf"},{"key":"e_1_3_3_1_14_2","volume-title":"3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings","author":"Kingma Diederik\u00a0P.","year":"2015","unstructured":"Diederik\u00a0P. Kingma and Jimmy Ba. 2015. Adam: A Method for Stochastic Optimization. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, Yoshua Bengio and Yann LeCun (Eds.). http:\/\/arxiv.org\/abs\/1412.6980"},{"key":"e_1_3_3_1_15_2","series-title":"Proceedings of Machine Learning Research","first-page":"19730","volume-title":"Proceedings of the 40th International Conference on Machine Learning","volume":"202","author":"Li Junnan","year":"2023","unstructured":"Junnan Li, Dongxu Li, Silvio Savarese, and Steven Hoi. 2023. BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models. In Proceedings of the 40th International Conference on Machine Learning(Proceedings of Machine Learning Research, Vol.\u00a0202), Andreas Krause, Emma Brunskill, Kyunghyun Cho, Barbara Engelhardt, Sivan Sabato, and Jonathan Scarlett (Eds.). PMLR, 19730\u201319742. https:\/\/proceedings.mlr.press\/v202\/li23q.html"},{"key":"e_1_3_3_1_16_2","first-page":"9694","volume-title":"Advances in Neural Information Processing Systems","author":"Li Junnan","year":"2021","unstructured":"Junnan Li, Ramprasaath Selvaraju, Akhilesh Gotmare, Shafiq Joty, Caiming Xiong, and Steven Chu\u00a0Hong Hoi. 2021. Align before Fuse: Vision and Language Representation Learning with Momentum Distillation. In Advances in Neural Information Processing Systems , M.\u00a0Ranzato, A.\u00a0Beygelzimer, Y.\u00a0Dauphin, P.S. Liang, and J.\u00a0Wortman Vaughan (Eds.), Vol.\u00a034. Curran Associates, Inc., 9694\u20139705. https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2021\/file\/505259756244493872b7709a8a01b536-Paper.pdf"},{"key":"e_1_3_3_1_17_2","doi-asserted-by":"crossref","unstructured":"Wuchao Liu Wengen Li Yu-Ping Ruan Yulou Shu Juntao Chen Yina Li Caili Yu Yichao Zhang Jihong Guan and Shuigeng Zhou. 2024. Weakly Correlated Multimodal Sentiment Analysis: New Dataset and Topic-Oriented Model. IEEE Transactions on Affective Computing 15 4 (2024) 2070\u20132082.","DOI":"10.1109\/TAFFC.2024.3396144"},{"key":"e_1_3_3_1_18_2","doi-asserted-by":"publisher","unstructured":"Huisheng Mao Baozheng Zhang Hua Xu Ziqi Yuan and Yihe Liu. 2024. Robust-MSA: Understanding the Impact of Modality Noise on Multimodal Sentiment Analysis. Proceedings of the AAAI Conference on Artificial Intelligence 37 13 (Jul. 2024) 16458\u201316460. 10.1609\/aaai.v37i13.27078","DOI":"10.1609\/aaai.v37i13.27078"},{"key":"e_1_3_3_1_19_2","first-page":"14200","volume-title":"Advances in Neural Information Processing Systems","author":"Nagrani Arsha","year":"2021","unstructured":"Arsha Nagrani, Shan Yang, Anurag Arnab, Aren Jansen, Cordelia Schmid, and Chen Sun. 2021. Attention Bottlenecks for Multimodal Fusion. In Advances in Neural Information Processing Systems , M.\u00a0Ranzato, A.\u00a0Beygelzimer, Y.\u00a0Dauphin, P.S. Liang, and J.\u00a0Wortman Vaughan (Eds.), Vol.\u00a034. Curran Associates, Inc., 14200\u201314213. https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2021\/file\/76ba9f564ebbc35b1014ac498fafadd0-Paper.pdf"},{"key":"e_1_3_3_1_20_2","first-page":"6149","volume-title":"Proceedings of the Twelfth Language Resources and Evaluation Conference","author":"Nakamura Kai","year":"2020","unstructured":"Kai Nakamura, Sharon Levy, and William\u00a0Yang Wang. 2020. Fakeddit: A New Multimodal Benchmark Dataset for Fine-grained Fake News Detection. In Proceedings of the Twelfth Language Resources and Evaluation Conference, Nicoletta Calzolari, Fr\u00e9d\u00e9ric B\u00e9chet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, H\u00e9l\u00e8ne Mazo, Asuncion Moreno, Jan Odijk, and Stelios Piperidis (Eds.). European Language Resources Association, Marseille, France, 6149\u20136157."},{"key":"e_1_3_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICME57554.2024.10687694"},{"key":"e_1_3_3_1_22_2","series-title":"Proceedings of Machine Learning Research","first-page":"8748","volume-title":"Proceedings of the 38th International Conference on Machine Learning","volume":"139","author":"Radford Alec","year":"2021","unstructured":"Alec Radford, Jong\u00a0Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. 2021. Learning Transferable Visual Models From Natural Language Supervision. In Proceedings of the 38th International Conference on Machine Learning(Proceedings of Machine Learning Research, Vol.\u00a0139), Marina Meila and Tong Zhang (Eds.). PMLR, 8748\u20138763."},{"key":"e_1_3_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP49357.2023.10097207"},{"key":"e_1_3_3_1_24_2","volume-title":"Advances in Neural Information Processing Systems","author":"Sensoy Murat","year":"2018","unstructured":"Murat Sensoy, Lance Kaplan, and Melih Kandemir. 2018. Evidential Deep Learning to Quantify Classification Uncertainty. In Advances in Neural Information Processing Systems , S.\u00a0Bengio, H.\u00a0Wallach, H.\u00a0Larochelle, K.\u00a0Grauman, N.\u00a0Cesa-Bianchi, and R.\u00a0Garnett (Eds.), Vol.\u00a031. Curran Associates, Inc.https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2018\/file\/a981f2b708044d6fb4a71a1463242520-Paper.pdf"},{"key":"e_1_3_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1656"},{"key":"e_1_3_3_1_26_2","volume-title":"Advances in Neural Information Processing Systems","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan\u00a0N Gomez, \u0141\u00a0ukasz Kaiser, and Illia Polosukhin. 2017. Attention is All you Need. In Advances in Neural Information Processing Systems , I.\u00a0Guyon, U.\u00a0Von Luxburg, S.\u00a0Bengio, H.\u00a0Wallach, R.\u00a0Fergus, S.\u00a0Vishwanathan, and R.\u00a0Garnett (Eds.), Vol.\u00a030. Curran Associates, Inc."},{"key":"e_1_3_3_1_27_2","doi-asserted-by":"crossref","unstructured":"Hongbin Wang Qifei Du and Yan Xiang. 2025. Image\u2013text sentiment analysis based on hierarchical interaction fusion and contrast learning enhanced. Engineering Applications of Artificial Intelligence 146 (2025) 110262.","DOI":"10.1016\/j.engappai.2025.110262"},{"key":"e_1_3_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.00277"},{"key":"e_1_3_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.63317\/5c2xbka7swxq"},{"key":"e_1_3_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51701.2025.02124"},{"key":"e_1_3_3_1_31_2","doi-asserted-by":"publisher","unstructured":"Peng Xu Xiatian Zhu and David\u00a0A. Clifton. 2023. Multimodal Learning With Transformers: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence 45 10 (2023) 12113\u201312132. 10.1109\/TPAMI.2023.3275156","DOI":"10.1109\/TPAMI.2023.3275156"},{"key":"e_1_3_3_1_32_2","doi-asserted-by":"publisher","unstructured":"Xing Xu Tan Wang Yang Yang Lin Zuo Fumin Shen and Heng\u00a0Tao Shen. 2020. Cross-Modal Attention With Semantic Consistence for Image\u2013Text Matching. IEEE Transactions on Neural Networks and Learning Systems 31 12 (2020) 5412\u20135425. 10.1109\/TNNLS.2020.2967597","DOI":"10.1109\/TNNLS.2020.2967597"},{"key":"e_1_3_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW59228.2023.00256"},{"key":"e_1_3_3_1_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01857"},{"key":"e_1_3_3_1_35_2","doi-asserted-by":"crossref","unstructured":"Xiaocui Yang Shi Feng Daling Wang and Yifei Zhang. 2021. Image-Text Multimodal Emotion Classification via Multi-View Attentional Network. IEEE Transactions on Multimedia 23 (2021) 4014\u20134026.","DOI":"10.1109\/TMM.2020.3035277"},{"key":"e_1_3_3_1_36_2","first-page":"62108","volume-title":"Advances in Neural Information Processing Systems","author":"Yang Yang","year":"2024","unstructured":"Yang Yang, Fengqiang Wan, Qing-Yuan Jiang, and Yi Xu. 2024. Facilitating Multimodal Classification via Dynamically Learning Modality Gap. In Advances in Neural Information Processing Systems , A.\u00a0Globerson, L.\u00a0Mackey, D.\u00a0Belgrave, A.\u00a0Fan, U.\u00a0Paquet, J.\u00a0Tomczak, and C.\u00a0Zhang (Eds.), Vol.\u00a037. Curran Associates, Inc., 62108\u201362122."},{"key":"e_1_3_3_1_37_2","doi-asserted-by":"publisher","unstructured":"Yuan Yuan Zhaojian Li and Bin Zhao. 2025. A Survey of Multimodal Learning: Methods Applications and Future. ACM Comput. Surv. 57 7 Article 167 (Feb. 2025) 34\u00a0pages. 10.1145\/3713070","DOI":"10.1145\/3713070"},{"key":"e_1_3_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D17-1115"},{"key":"e_1_3_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.01100"},{"key":"e_1_3_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00089"},{"key":"e_1_3_3_1_41_2","doi-asserted-by":"publisher","unstructured":"Fei Zhao Chengcui Zhang and Baocheng Geng. 2024. Deep Multimodal Data Fusion. ACM Comput. Surv. 56 9 Article 216 (April 2024) 36\u00a0pages. 10.1145\/3649447","DOI":"10.1145\/3649447"},{"key":"e_1_3_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1145\/3591106.3592271"},{"key":"e_1_3_3_1_43_2","first-page":"2664","volume-title":"Advances in Neural Information Processing Systems","volume":"35","author":"Zhu Jinguo","year":"2022","unstructured":"Jinguo Zhu, Xizhou Zhu, Wenhai Wang, Xiaohua Wang, Hongsheng Li, Xiaogang Wang, and Jifeng Dai. 2022. Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs. In Advances in Neural Information Processing Systems , S.\u00a0Koyejo, S.\u00a0Mohamed, A.\u00a0Agarwal, D.\u00a0Belgrave, K.\u00a0Cho, and A.\u00a0Oh (Eds.), Vol.\u00a035. Curran Associates, Inc., 2664\u20132678. https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2022\/file\/11fc8c98b46d4cbdfe8157267228f7d7-Paper-Conference.pdf"},{"key":"e_1_3_3_1_44_2","doi-asserted-by":"crossref","unstructured":"Tong Zhu Leida Li Jufeng Yang Sicheng Zhao Hantao Liu and Jiansheng Qian. 2023. Multimodal Sentiment Analysis With Image-Text Interaction Network. IEEE Transactions on Multimedia 25 (2023) 3375\u20133385.","DOI":"10.1109\/TMM.2022.3160060"},{"key":"e_1_3_3_1_45_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.findings-acl.41"}],"event":{"name":"ICMR '26: International Conference on Multimedia Retrieval","location":"Amsterdam The Netherlands","acronym":"ICMR '26","sponsor":["SIGMM ACM Special Interest Group on Multimedia"]},"container-title":["Proceedings of the 2026 International Conference on Multimedia Retrieval"],"original-title":[],"deposited":{"date-parts":[[2026,6,15]],"date-time":"2026-06-15T14:53:55Z","timestamp":1781535235000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3805622.3810771"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,6,15]]},"references-count":44,"alternative-id":["10.1145\/3805622.3810771","10.1145\/3805622"],"URL":"https:\/\/doi.org\/10.1145\/3805622.3810771","relation":{},"subject":[],"published":{"date-parts":[[2026,6,15]]},"assertion":[{"value":"2026-06-15","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}