{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,9]],"date-time":"2025-12-09T19:50:45Z","timestamp":1765309845970,"version":"3.46.0"},"publisher-location":"New York, NY, USA","reference-count":37,"publisher":"ACM","funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62376193, 61925602"],"award-info":[{"award-number":["62376193, 61925602"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2025,10,27]]},"DOI":"10.1145\/3746027.3754820","type":"proceedings-article","created":{"date-parts":[[2025,10,25]],"date-time":"2025-10-25T06:56:44Z","timestamp":1761375404000},"page":"2429-2436","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["A Theoretical Proof of Dynamic Multimodal Fusion Exacerbates Modality Greedy"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0009-0008-4224-6021","authenticated-orcid":false,"given":"Xiaorui","family":"Ding","sequence":"first","affiliation":[{"name":"Tianjin University, Tianjin, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-4448-9897","authenticated-orcid":false,"given":"Huan","family":"Ma","sequence":"additional","affiliation":[{"name":"Tianjin University, Tianjin, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1410-6650","authenticated-orcid":false,"given":"Changqing","family":"Zhang","sequence":"additional","affiliation":[{"name":"Tianjin University, Tianjin, China"}]}],"member":"320","published-online":{"date-parts":[[2025,10,27]]},"reference":[{"key":"e_1_3_2_2_1_1","volume-title":"Multimodal machine learning: A survey and taxonomy","author":"Baltru\u0161aitis Tadas","year":"2018","unstructured":"Tadas Baltru\u0161aitis, Chaitanya Ahuja, and Louis-Philippe Morency. 2018. Multimodal machine learning: A survey and taxonomy. IEEE transactions on pattern analysis and machine intelligence, Vol. 41 (2018), 423-443."},{"key":"e_1_3_2_2_2_1","volume-title":"International conference on machine learning. PMLR, 1059-1071","author":"Brock Andy","year":"2021","unstructured":"Andy Brock, Soham De, Samuel L Smith, and Karen Simonyan. 2021. High-performance large-scale image recognition without normalization. In International conference on machine learning. PMLR, 1059-1071."},{"key":"e_1_3_2_2_3_1","unstructured":"David S Broomhead and David Lowe. 1988. Radial basis functions multi-variable functional interpolation and adaptive networks. Technical Report."},{"key":"e_1_3_2_2_4_1","volume-title":"Predictive Dynamic Fusion. arXiv preprint arXiv:2406.04802","author":"Cao Bing","year":"2024","unstructured":"Bing Cao, Yinan Xia, Yi Ding, Changqing Zhang, and Qinghua Hu. 2024. Predictive Dynamic Fusion. arXiv preprint arXiv:2406.04802 (2024)."},{"key":"e_1_3_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-20077-9_9"},{"key":"e_1_3_2_2_6_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ins.2022.12.014"},{"key":"e_1_3_2_2_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00160"},{"key":"e_1_3_2_2_8_1","volume-title":"International Conference on Machine Learning. PMLR, 8632-8656","author":"Du Chenzhuang","year":"2023","unstructured":"Chenzhuang Du, Jiaye Teng, Tingle Li, Yichen Liu, Tianyuan Yuan, Yue Wang, Yang Yuan, and Hang Zhao. 2023. On uni-modal feature learning in supervised multi-modal learning. In International Conference on Machine Learning. PMLR, 8632-8656."},{"key":"e_1_3_2_2_9_1","volume-title":"Your classifier is secretly an energy based model and you should treat it like one. arXiv preprint arXiv:1912.03263","author":"Grathwohl Will","year":"2019","unstructured":"Will Grathwohl, Kuan-Chieh Wang, J\u00f6rn-Henrik Jacobsen, David Duvenaud, Mohammad Norouzi, and Kevin Swersky. 2019. Your classifier is secretly an energy based model and you should treat it like one. arXiv preprint arXiv:1912.03263 (2019)."},{"key":"e_1_3_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.inffus.2018.11.017"},{"key":"e_1_3_2_2_11_1","volume-title":"International conference on machine learning. PMLR, 1321-1330","author":"Guo Chuan","year":"2017","unstructured":"Chuan Guo, Geoff Pleiss, Yu Sun, and Kilian Q Weinberger. 2017. On calibration of modern neural networks. In International conference on machine learning. PMLR, 1321-1330."},{"key":"e_1_3_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.02005"},{"key":"e_1_3_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2022.3171983"},{"key":"e_1_3_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/3394171.3413678"},{"key":"e_1_3_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.inffus.2024.102648"},{"key":"e_1_3_2_2_16_1","volume-title":"International conference on machine learning. PMLR, 9226-9259","author":"Huang Yu","year":"2022","unstructured":"Yu Huang, Junyang Lin, Chang Zhou, Hongxia Yang, and Longbo Huang. 2022. Modality competition: What makes joint training of multi-modal network fail in deep learning?(provably). In International conference on machine learning. PMLR, 9226-9259."},{"key":"e_1_3_2_2_17_1","first-page":"29406","article-title":"Learning with noisy correspondence for cross-modal matching","volume":"34","author":"Huang Zhenyu","year":"2021","unstructured":"Zhenyu Huang, Guocheng Niu, Xiao Liu, Wenbiao Ding, Xinyan Xiao, Hua Wu, and Xi Peng. 2021. Learning with noisy correspondence for cross-modal matching. Advances in Neural Information Processing Systems, Vol. 34 (2021), 29406-29419.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_2_18_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v36i1.19988"},{"key":"e_1_3_2_2_19_1","volume-title":"Marc Aurelio Ranzato, and Fu Jie Huang","author":"Lecun Yann","year":"2006","unstructured":"Yann Lecun, Sumit Chopra, Raia Hadsell, Marc Aurelio Ranzato, and Fu Jie Huang. 2006. A tutorial on energy-based learning. MIT Press."},{"key":"e_1_3_2_2_20_1","first-page":"734","volume-title":"Contrastive Multimodal Fusion with TupleInfoNCE. 2021 IEEE\/CVF International Conference on Computer Vision (ICCV)","author":"Liu Yunze","year":"2021","unstructured":"Yunze Liu, Qingnan Fan, Shanghang Zhang, Hao Dong, Thomas A. Funkhouser, and Li Yi. 2021. Contrastive Multimodal Fusion with TupleInfoNCE. 2021 IEEE\/CVF International Conference on Computer Vision (ICCV) (2021), 734-743."},{"key":"e_1_3_2_2_21_1","first-page":"6881","article-title":"Trustworthy multimodal regression with mixture of normal-inverse gamma distributions","volume":"34","author":"Ma Huan","year":"2021","unstructured":"Huan Ma, Zongbo Han, Changqing Zhang, Huazhu Fu, Joey Tianyi Zhou, and Qinghua Hu. 2021a. Trustworthy multimodal regression with mixture of normal-inverse gamma distributions. Advances in Neural Information Processing Systems, Vol. 34 (2021), 6881-6893.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_2_22_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v35i3.16330"},{"key":"e_1_3_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW.2015.7301342"},{"key":"e_1_3_2_2_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.00806"},{"key":"e_1_3_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.2106235118"},{"key":"e_1_3_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-33715-4_54"},{"key":"e_1_3_2_2_27_1","volume-title":"The development of embodied cognition: Six lessons from babies. Artificial life","author":"Smith Linda","year":"2005","unstructured":"Linda Smith and Michael Gasser. 2005. The development of embodied cognition: Six lessons from babies. Artificial life, Vol. 11 (2005), 13-29."},{"key":"e_1_3_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298655"},{"key":"e_1_3_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01271"},{"key":"e_1_3_2_2_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.02581"},{"key":"e_1_3_2_2_31_1","volume-title":"International Conference on Machine Learning. PMLR, 24043-24055","author":"Wu Nan","year":"2022","unstructured":"Nan Wu, Stanislaw Jastrzebski, Kyunghyun Cho, and Krzysztof J Geras. 2022. Characterizing and overcoming the greedy nature of learning in multi-modal deep neural networks. In International Conference on Machine Learning. PMLR, 24043-24055."},{"key":"e_1_3_2_2_32_1","volume-title":"2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2014)","author":"Wu Zhirong","year":"2014","unstructured":"Zhirong Wu, Shuran Song, Aditya Khosla, Fisher Yu, Linguang Zhang, Xiaoou Tang, and Jianxiong Xiao. 2014. 3D ShapeNets: A deep representation for volumetric shapes. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2014), 1912-1920. https:\/\/api.semanticscholar.org\/CorpusID:206592833"},{"key":"e_1_3_2_2_33_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v38i14.29546"},{"key":"e_1_3_2_2_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW59228.2023.00256"},{"key":"e_1_3_2_2_35_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v35i4.16428"},{"key":"e_1_3_2_2_36_1","volume-title":"Provable Dynamic Fusion for Low-Quality Multimodal Data. In International Conference on Machine Learning.","author":"Zhang Qingyang","year":"2023","unstructured":"Qingyang Zhang, Haitao Wu, Changqing Zhang, Qinghua Hu, Huazhu Fu, Joey Tianyi Zhou, and Xi Peng. 2023. Provable Dynamic Fusion for Low-Quality Multimodal Data. In International Conference on Machine Learning."},{"key":"e_1_3_2_2_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/3581783.3612468"}],"event":{"name":"MM '25: The 33rd ACM International Conference on Multimedia","sponsor":["SIGMM ACM Special Interest Group on Multimedia"],"location":"Dublin Ireland","acronym":"MM '25"},"container-title":["Proceedings of the 33rd ACM International Conference on Multimedia"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3746027.3754820","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,12,9]],"date-time":"2025-12-09T19:46:38Z","timestamp":1765309598000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3746027.3754820"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,10,27]]},"references-count":37,"alternative-id":["10.1145\/3746027.3754820","10.1145\/3746027"],"URL":"https:\/\/doi.org\/10.1145\/3746027.3754820","relation":{},"subject":[],"published":{"date-parts":[[2025,10,27]]},"assertion":[{"value":"2025-10-27","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}