{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,15]],"date-time":"2026-06-15T15:55:40Z","timestamp":1781538940881,"version":"3.54.5"},"publisher-location":"New York, NY, USA","reference-count":41,"publisher":"ACM","license":[{"start":{"date-parts":[[2026,6,15]],"date-time":"2026-06-15T00:00:00Z","timestamp":1781481600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/legalcode"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2026,6,16]]},"DOI":"10.1145\/3805622.3810829","type":"proceedings-article","created":{"date-parts":[[2026,6,15]],"date-time":"2026-06-15T14:42:57Z","timestamp":1781534577000},"page":"1653-1661","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Audit-and-Repair Indexing: Probabilistic Caption Indexes for Robust Multimodal Retrieval"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0009-0009-2372-0880","authenticated-orcid":false,"given":"Yangyang","family":"Liu","sequence":"first","affiliation":[{"name":"Independent Researcher, Beijing, Beijing, China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2026,6,15]]},"reference":[{"key":"e_1_3_3_1_2_2","doi-asserted-by":"crossref","unstructured":"Carlo Acerbi and Dirk Tasche. 2002. Expected shortfall: a natural coherent alternative to value at risk. Economic notes 31 2 (2002) 379\u2013388.","DOI":"10.1111\/1468-0300.00091"},{"key":"e_1_3_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00636"},{"key":"e_1_3_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.02080"},{"key":"e_1_3_3_1_5_2","unstructured":"Anton Baumann Rui Li Marcus Klasson Santeri Mentu Shyamgopal Karthik Zeynep Akata Arno Solin and Martin Trapp. 2024. 
Post-hoc probabilistic vision-language models. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2412.06014 (2024)."},{"key":"e_1_3_3_1_6_2","doi-asserted-by":"crossref","unstructured":"Wenliang Dai Junnan Li Dongxu Li Anthony Tiong Junqi Zhao Weisheng Wang Boyang Li Pascale\u00a0N Fung and Steven Hoi. 2023. Instructblip: Towards general-purpose vision-language models with instruction tuning. Advances in neural information processing systems 36 (2023) 49250\u201349267.","DOI":"10.52202\/075280-2142"},{"key":"e_1_3_3_1_7_2","doi-asserted-by":"crossref","unstructured":"Ronald Fagin Ravi Kumar and Dakshinamurthi Sivakumar. 2003. Comparing top k lists. SIAM Journal on discrete mathematics 17 1 134\u2013160.","DOI":"10.1137\/S0895480102412856"},{"key":"e_1_3_3_1_8_2","doi-asserted-by":"crossref","unstructured":"Han Fang Xianghao Zang Chao Ban Zerun Feng Lanxiang Zhou Zhongjiang He Yongxiang Li and Hao Sun. 2024. Prota: Probabilistic token aggregation for text-video retrieval. 2024 IEEE International Conference on Multimedia and Expo (ICME) (2024) 1\u20136.","DOI":"10.1109\/ICME57554.2024.10687550"},{"key":"e_1_3_3_1_9_2","doi-asserted-by":"crossref","unstructured":"Beno\u00eet Fr\u00e9nay and Michel Verleysen. 2013. Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25 5 (2013) 845\u2013869.","DOI":"10.1109\/TNNLS.2013.2292894"},{"key":"e_1_3_3_1_10_2","first-page":"3887","volume-title":"International Conference on Machine Learning","author":"Guo Ruiqi","year":"2020","unstructured":"Ruiqi Guo, Philip Sun, Erik Lindgren, Quan Geng, David Simcha, Felix Chern, and Sanjiv Kumar. 2020. Accelerating large-scale inference with anisotropic vector quantization. In International Conference on Machine Learning. 
PMLR, 3887\u20133896."},{"key":"e_1_3_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46493-0_1"},{"key":"e_1_3_3_1_12_2","unstructured":"Ari Holtzman Jan Buys Li Du Maxwell Forbes and Yejin Choi. 2019. The curious case of neural text degeneration. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/1904.09751."},{"key":"e_1_3_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00686"},{"key":"e_1_3_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.02228"},{"key":"e_1_3_3_1_15_2","doi-asserted-by":"crossref","unstructured":"Jeff Johnson Matthijs Douze and Herv\u00e9 J\u00e9gou. 2019. Billion-scale similarity search with GPUs. IEEE Transactions on Big Data 7 3 535\u2013547.","DOI":"10.1109\/TBDATA.2019.2921572"},{"key":"e_1_3_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.215"},{"key":"e_1_3_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.naacl-main.254"},{"key":"e_1_3_3_1_18_2","doi-asserted-by":"crossref","unstructured":"Brian Kulis et\u00a0al. 2013. Metric learning: A survey. Foundations and Trends\u00ae in Machine Learning 5 4 287\u2013364.","DOI":"10.1561\/2200000019"},{"key":"e_1_3_3_1_19_2","unstructured":"Patrick Lewis Ethan Perez Aleksandra Piktus Fabio Petroni Vladimir Karpukhin Naman Goyal Heinrich K\u00fcttler Mike Lewis Wen-tau Yih Tim Rockt\u00e4schel et\u00a0al. 2020. Retrieval-augmented generation for knowledge-intensive nlp tasks. Advances in neural information processing systems 33 9459\u20139474."},{"key":"e_1_3_3_1_20_2","first-page":"19730","volume-title":"International conference on machine learning","author":"Li Junnan","year":"2023","unstructured":"Junnan Li, Dongxu Li, Silvio Savarese, and Steven Hoi. 2023. Blip-2: Bootstrapping language-image pre-training with frozen image encoders and large language models. In International conference on machine learning. 
PMLR, 19730\u201319742."},{"key":"e_1_3_3_1_21_2","first-page":"12888","volume-title":"International conference on machine learning","author":"Li Junnan","year":"2022","unstructured":"Junnan Li, Dongxu Li, Caiming Xiong, and Steven Hoi. 2022. Blip: Bootstrapping language-image pre-training for unified vision-language understanding and generation. In International conference on machine learning. PMLR, 12888\u201312900."},{"key":"e_1_3_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.01303"},{"key":"e_1_3_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"e_1_3_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.02484"},{"key":"e_1_3_3_1_25_2","doi-asserted-by":"crossref","unstructured":"Haotian Liu Chunyuan Li Qingyang Wu and Yong\u00a0Jae Lee. 2023. Visual instruction tuning. Advances in neural information processing systems 36 (2023) 34892\u201334916.","DOI":"10.52202\/075280-1516"},{"key":"e_1_3_3_1_26_2","unstructured":"Huaishao Luo Lei Ji Ming Zhong Yang Chen Wen Lei Nan Duan and Tianrui Li. 2021. Clip4clip: An empirical study of clip for end to end video clip retrieval. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2104.08860 (2021)."},{"key":"e_1_3_3_1_27_2","doi-asserted-by":"crossref","unstructured":"Yiwei Ma Guohai Xu Xiaoshuai Sun Ming Yan Ji Zhang and Rongrong Ji. 2022. X-clip: End-to-end multi-grained contrastive learning for video-text retrieval. Proceedings of the 30th ACM international conference on multimedia (2022) 638\u2013647.","DOI":"10.1145\/3503161.3547910"},{"key":"e_1_3_3_1_28_2","unstructured":"Prasanta\u00a0Chandra Mahalanobis. 2018. On the generalized distance in statistics. Sankhy\u0101: The Indian Journal of Statistics Series A (2008-) 80 (2018) S1\u2013S7."},{"key":"e_1_3_3_1_29_2","doi-asserted-by":"crossref","unstructured":"Yu\u00a0A Malkov and Dmitry\u00a0A Yashunin. 2018. 
Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs. IEEE transactions on pattern analysis and machine intelligence 42 4 824\u2013836.","DOI":"10.1109\/TPAMI.2018.2889473"},{"key":"e_1_3_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-demos.16"},{"key":"e_1_3_3_1_31_2","volume-title":"Machine learning: a probabilistic perspective","author":"Murphy Kevin\u00a0P","year":"2012","unstructured":"Kevin\u00a0P Murphy. 2012. Machine learning: a probabilistic perspective. MIT press."},{"key":"e_1_3_3_1_32_2","first-page":"8748","volume-title":"International conference on machine learning","author":"Radford Alec","year":"2021","unstructured":"Alec Radford, Jong\u00a0Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et\u00a0al. 2021. Learning transferable visual models from natural language supervision. In International conference on machine learning. PMLR, 8748\u20138763."},{"key":"e_1_3_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.14778\/3157794.3157797"},{"key":"e_1_3_3_1_34_2","doi-asserted-by":"crossref","unstructured":"Marco\u00a0Tulio Ribeiro Tongshuang Wu Carlos Guestrin and Sameer Singh. 2020. Beyond accuracy: Behavioral testing of NLP models with CheckList. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2005.04118.","DOI":"10.18653\/v1\/2020.acl-main.442"},{"key":"e_1_3_3_1_35_2","doi-asserted-by":"crossref","unstructured":"R\u00a0Tyrrell Rockafellar Stanislav Uryasev et\u00a0al. 2000. Optimization of conditional value-at-risk. Journal of risk 2 (2000) 21\u201342.","DOI":"10.21314\/JOR.2000.038"},{"key":"e_1_3_3_1_36_2","doi-asserted-by":"crossref","unstructured":"Anna Rohrbach Lisa\u00a0Anne Hendricks Kaylee Burns Trevor Darrell and Kate Saenko. 2018. Object hallucination in image captioning. 
arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/1809.02156.","DOI":"10.18653\/v1\/D18-1437"},{"key":"e_1_3_3_1_37_2","unstructured":"Michael Tschannen Alexey Gritsenko Xiao Wang Muhammad\u00a0Ferjad Naeem Ibrahim Alabdulmohsin Nikhil Parthasarathy Talfan Evans Lucas Beyer Ye Xia Basil Mustafa et\u00a0al. 2025. Siglip 2: Multilingual vision-language encoders with improved semantic understanding localization and dense features. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2502.14786 (2025)."},{"key":"e_1_3_3_1_38_2","unstructured":"Aishwarya Venkataramanan Paul Bodesheim and Joachim Denzler. 2025. Probabilistic Embeddings for Frozen Vision-Language Models: Uncertainty Quantification with Gaussian Process Latent Variable Models. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2505.05163 (2025)."},{"key":"e_1_3_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.01115"},{"key":"e_1_3_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.571"},{"key":"e_1_3_3_1_41_2","doi-asserted-by":"crossref","unstructured":"Peter Young Alice Lai Micah Hodosh and Julia Hockenmaier. 2014. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions. 
Transactions of the association for computational linguistics 2 67\u201378.","DOI":"10.1162\/tacl_a_00166"},{"key":"e_1_3_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.01100"}],"event":{"name":"ICMR '26: International Conference on Multimedia Retrieval","location":"Amsterdam The Netherlands","acronym":"ICMR '26","sponsor":["SIGMM ACM Special Interest Group on Multimedia"]},"container-title":["Proceedings of the 2026 International Conference on Multimedia Retrieval"],"original-title":[],"deposited":{"date-parts":[[2026,6,15]],"date-time":"2026-06-15T15:24:08Z","timestamp":1781537048000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3805622.3810829"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,6,15]]},"references-count":41,"alternative-id":["10.1145\/3805622.3810829","10.1145\/3805622"],"URL":"https:\/\/doi.org\/10.1145\/3805622.3810829","relation":{},"subject":[],"published":{"date-parts":[[2026,6,15]]},"assertion":[{"value":"2026-06-15","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}