{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,14]],"date-time":"2026-03-14T19:37:44Z","timestamp":1773517064370,"version":"3.50.1"},"reference-count":54,"publisher":"Association for Computing Machinery (ACM)","issue":"4","funder":[{"DOI":"10.13039\/501100012226","name":"Fundamental Research Funds for the Central Universities","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100012226","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2026,4,30]]},"abstract":"<jats:p>\n                    Deep hashing has proven remarkable effectiveness for large-scale cross-modal retrieval, yet its performance is highly vulnerable to supervisory noise, such as mismatched cross-modal correspondences and incorrect category labels. Such noise is prevalent in real-world scenarios, where correspondence mismatches and label inaccuracies often coexist, posing significant challenges for learning accurate multimodal representations. Existing methods typically address only a single type of noise in isolation and neglect the potential value of noisy data, resulting in limited performance gains. To address these challenges, we propose Noise-Robust Generative Hashing (NRGH), a unified framework designed to accommodate various forms of noise inherent in cross-modal retrieval. Specifically, NRGH introduces a hash-driven noise estimation module that computes the confidence score for each multimodal sample by combining frozen auxiliary hash functions with a Gaussian mixture model. Guided by these confidence scores, NRGH performs data correction through two stages: generative text refinement and multi-label probability calibration. The former leverages a pre-trained vision-language model to generate descriptive captions that refine noisy textual information, while the latter corrects noisy labels using confidence-aware soft labels. Furthermore, a dynamic margin contrastive loss adaptively modulates the data contribution of each sample based on its confidence, enabling sample-level adaptive learning. Extensive experiments on benchmark datasets demonstrate that NRGH significantly exceeds state-of-the-art baselines in various noisy scenarios, delivering superior robustness and accuracy. Our source codes and datasets are available at\n                    <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"https:\/\/github.com\/xiaolaohuuu\/NRGH\">https:\/\/github.com\/xiaolaohuuu\/NRGH<\/jats:ext-link>\n                    .\n                  <\/jats:p>","DOI":"10.1145\/3777477","type":"journal-article","created":{"date-parts":[[2025,11,19]],"date-time":"2025-11-19T16:05:03Z","timestamp":1763568303000},"page":"1-22","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Noise-Robust Generative Hashing for Cross-Modal Retrieval"],"prefix":"10.1145","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0009-0005-0037-3098","authenticated-orcid":false,"given":"Zequn","family":"Wang","sequence":"first","affiliation":[{"name":"Shandong Normal University, Jinan, China and Tongji University, Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8013-5188","authenticated-orcid":false,"given":"Tianshi","family":"Wang","sequence":"additional","affiliation":[{"name":"Tongji University, Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3432-6215","authenticated-orcid":false,"given":"Fengling","family":"Li","sequence":"additional","affiliation":[{"name":"University of Technology Sydney, Sydney, Australia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5504-2529","authenticated-orcid":false,"given":"Jingjing","family":"Li","sequence":"additional","affiliation":[{"name":"University of Electronic Science and Technology of China, Chengdu, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2993-7142","authenticated-orcid":false,"given":"Lei","family":"Zhu","sequence":"additional","affiliation":[{"name":"Tongji University, Shanghai, China"}]}],"member":"320","published-online":{"date-parts":[[2026,3,9]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"publisher","DOI":"10.1145\/1646396.1646452"},{"key":"e_1_3_1_3_2","first-page":"9397","volume-title":"Proceedings of the ACM International Conference on Multimedia","author":"Duan Yue","year":"2024","unstructured":"Yue Duan, Zhangxuan Gu, Zhenzhe Ying, Lei Qi, Changhua Meng, and Yinghuan Shi. 2024. PC2: Pseudo-classification based pseudo-captioning for noisy correspondence learning in cross-modal retrieval. In Proceedings of the ACM International Conference on Multimedia, 9397\u20139406."},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v39i3.32290"},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2017.2729019"},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.00726"},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2025.3558854"},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2023.3247939"},{"key":"e_1_3_1_9_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00536"},{"issue":"3","key":"e_1_3_1_10_2","first-page":"3877","article-title":"Unsupervised contrastive cross-modal hashing","volume":"45","author":"Hu Peng","year":"2023","unstructured":"Peng Hu, Hongyuan Zhu, Jie Lin, Dezhong Peng, Yin-Ping Zhao, and Xi Peng. 2023. Unsupervised contrastive cross-modal hashing. IEEE Transactions on Pattern Analysis and Machine Intelligence 45, 3 (2023), 3877\u20133889.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2024.3392763"},{"key":"e_1_3_1_12_2","first-page":"29406","article-title":"Learning with noisy correspondence for cross-modal matching","volume":"34","author":"Huang Zhenyu","year":"2021","unstructured":"Zhenyu Huang, Guocheng Niu, Xiao Liu, Wenbiao Ding, Xinyan Xiao, Hua Wu, and Xi Peng. 2021. Learning with noisy correspondence for cross-modal matching. Advances in Neural Information Processing Systems 34 (2021), 29406\u201329419.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1145\/1460096.1460104"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2023.3285266"},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.348"},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2025.3586504"},{"key":"e_1_3_1_17_2","first-page":"1","article-title":"Generative augmentation hashing for few-shot cross-modal retrieval","author":"Li Fengling","year":"2025","unstructured":"Fengling Li, Zequn Wang, Tianshi Wang, Lei Zhu, and Xiaojun Chang. 2025. Generative augmentation hashing for few-shot cross-modal retrieval. IEEE Transactions on Circuits and Systems for Video Technology (2025), 1\u20131.","journal-title":"IEEE Transactions on Circuits and Systems for Video Technology"},{"key":"e_1_3_1_18_2","first-page":"12888","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Li Junnan","year":"2022","unstructured":"Junnan Li, Dongxu Li, Caiming Xiong, and Steven Hoi. 2022. BLIP: Bootstrapping language-image pre-training for unified vision-language understanding and generation. In Proceedings of the International Conference on Machine Learning, 12888\u201312900."},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2024.3350695"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v35i3.16285"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11432-022-3783-3"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1145\/3587251"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2019.2940446"},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2024.3374221"},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2023.3293104"},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2005.10.028"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2025.3535306"},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1145\/3503161.3547922"},{"key":"e_1_3_1_30_2","first-page":"8748","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Radford Alec","year":"2021","unstructured":"Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. 2021. Learning transferable visual models from natural language supervision. In Proceedings of the International Conference on Machine Learning, 8748\u20138763."},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298598"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2018.2789887"},{"key":"e_1_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2022.3172716"},{"key":"e_1_3_1_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2022.3152527"},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00312"},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.1145\/3477495.3531947"},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2022.3177901"},{"issue":"2","key":"e_1_3_1_38_2","first-page":"26","article-title":"Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude","volume":"4","author":"Tieleman Tijmen","year":"2012","unstructured":"Tijmen Tieleman. 2012. Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning 4, 2 (2012), 26.","journal-title":"COURSERA: Neural Networks for Machine Learning"},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2023.3251395"},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1145\/3123266.3123326"},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2025.3581813"},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2015.2455415"},{"key":"e_1_3_1_43_2","doi-asserted-by":"publisher","DOI":"10.1145\/3664647.3680564"},{"key":"e_1_3_1_44_2","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2024.3525147"},{"key":"e_1_3_1_45_2","doi-asserted-by":"publisher","DOI":"10.1145\/3650205"},{"key":"e_1_3_1_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2024.3381347"},{"key":"e_1_3_1_47_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2020.3017344"},{"key":"e_1_3_1_48_2","doi-asserted-by":"publisher","DOI":"10.1145\/3503161.3548066"},{"key":"e_1_3_1_49_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.00740"},{"key":"e_1_3_1_50_2","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2019.2932666"},{"key":"e_1_3_1_51_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2024.3403994"},{"key":"e_1_3_1_52_2","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2019.2932742"},{"key":"e_1_3_1_53_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v35i5.16592"},{"key":"e_1_3_1_54_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.02585"},{"key":"e_1_3_1_55_2","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2022.3218656"}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3777477","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,14]],"date-time":"2026-03-14T13:24:45Z","timestamp":1773494685000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3777477"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,3,9]]},"references-count":54,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2026,4,30]]}},"alternative-id":["10.1145\/3777477"],"URL":"https:\/\/doi.org\/10.1145\/3777477","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"value":"1551-6857","type":"print"},{"value":"1551-6865","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,3,9]]},"assertion":[{"value":"2025-08-31","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-11-05","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2026-03-09","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}