{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:14:55Z","timestamp":1750220095537,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":65,"publisher":"ACM","license":[{"start":{"date-parts":[[2022,8,23]],"date-time":"2022-08-23T00:00:00Z","timestamp":1661212800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,8,23]]},"DOI":"10.1145\/3539813.3545148","type":"proceedings-article","created":{"date-parts":[[2022,8,25]],"date-time":"2022-08-25T22:18:32Z","timestamp":1661465912000},"page":"193-203","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":6,"title":["U-BERT for Fast and Scalable Text-Image Retrieval"],"prefix":"10.1145","author":[{"given":"Tan","family":"Yu","sequence":"first","affiliation":[{"name":"Baidu Research, Bellevue, WA, USA"}]},{"given":"Hongliang","family":"Fei","sequence":"additional","affiliation":[{"name":"Baidu Research, Bellevue, WA, USA"}]},{"given":"Ping","family":"Li","sequence":"additional","affiliation":[{"name":"Baidu Research, Bellevue, WA, USA"}]}],"member":"320","published-online":{"date-parts":[[2022,8,25]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00636"},{"volume-title":"Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD)","author":"Cao Yue","key":"e_1_3_2_1_2_1","unstructured":"Yue Cao , Mingsheng Long , Jianmin Wang , Qiang Yang , and Philip S. Yu . 2016. Deep Visual-Semantic Hashing for Cross-Modal Retrieval . In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD) . San Francisco, CA, 1445--1454. Yue Cao, Mingsheng Long, Jianmin Wang, Qiang Yang, and Philip S. Yu. 2016. Deep Visual-Semantic Hashing for Cross-Modal Retrieval. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). San Francisco, CA, 1445--1454."},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01267"},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58577-8_7"},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-main.707"},{"key":"e_1_3_2_1_6_1","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT)","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin , Ming-Wei Chang , Kenton Lee , and Kristina Toutanova . 2019 . BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding . In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT) . Minneapolis, MN, 4171--4186. Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT). Minneapolis, MN, 4171--4186."},{"key":"e_1_3_2_1_7_1","volume-title":"Jamie Ryan Kiros, and Sanja Fidler","author":"Faghri Fartash","year":"2018","unstructured":"Fartash Faghri , David J. Fleet , Jamie Ryan Kiros, and Sanja Fidler . 2018 . VSE Fartash Faghri, David J. Fleet, Jamie Ryan Kiros, and Sanja Fidler. 2018. VSE"},{"volume-title":"Proceedings of the British Machine Vision Conference (BMVC)","key":"e_1_3_2_1_8_1","unstructured":": Improving Visual-Semantic Embeddings with Hard Negatives . In Proceedings of the British Machine Vision Conference (BMVC) . Newcastle, UK. : Improving Visual-Semantic Embeddings with Hard Negatives. In Proceedings of the British Machine Vision Conference (BMVC). Newcastle, UK."},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.naacl-main.285"},{"volume-title":"Advances in Neural Information Processing Systems (NIPS).","author":"Frome Andrea","key":"e_1_3_2_1_10_1","unstructured":"Andrea Frome , Gregory S. Corrado , Jonathon Shlens , Samy Bengio , Jeffrey Dean , Marc'Aurelio Ranzato , and Tom\u00e1 s Mikolov . 2013. DeViSE: A Deep Visual-Semantic Embedding Model . In Advances in Neural Information Processing Systems (NIPS). Lake Tahoe, NV , 2121--2129. Andrea Frome, Gregory S. Corrado, Jonathon Shlens, Samy Bengio, Jeffrey Dean, Marc'Aurelio Ranzato, and Tom\u00e1 s Mikolov. 2013. DeViSE: A Deep Visual-Semantic Embedding Model. In Advances in Neural Information Processing Systems (NIPS). Lake Tahoe, NV, 2121--2129."},{"key":"e_1_3_2_1_11_1","unstructured":"Zhe Gan Yen-Chun Chen Linjie Li Chen Zhu Yu Cheng and Jingjing Liu. 2020. Large-Scale Adversarial Training for Vision-and-Language Representation Learning. In Advances in Neural Information Processing Systems (NeurIPS). virtual. Zhe Gan Yen-Chun Chen Linjie Li Chen Zhu Yu Cheng and Jingjing Liu. 2020. Large-Scale Adversarial Training for Vision-and-Language Representation Learning. In Advances in Neural Information Processing Systems (NeurIPS). virtual."},{"key":"e_1_3_2_1_12_1","volume-title":"Retrieve Fast","author":"Geigle Gregor","year":"1920","unstructured":"Gregor Geigle , Jonas Pfeiffer , Nils Reimers , Ivan Vuli\u0107 , and Iryna Gurevych . 2021. Retrieve Fast , Rerank Smart : Cooperative and Joint Approaches for Improved Cross-Modal Retrieval . arXiv preprint arXiv:2103.1 1920 (2021). Gregor Geigle, Jonas Pfeiffer, Nils Reimers, Ivan Vuli\u0107 , and Iryna Gurevych. 2021. Retrieve Fast, Rerank Smart: Cooperative and Joint Approaches for Improved Cross-Modal Retrieval. arXiv preprint arXiv:2103.11920 (2021)."},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-013-0658-4"},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1162\/0899766042321814"},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33018489"},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00587"},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.767"},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00645"},{"key":"e_1_3_2_1_19_1","volume-title":"Proceedings of the 38th International Conference on Machine Learning (ICML). Virtual Event, 4904--4916","author":"Jia Chao","year":"2021","unstructured":"Chao Jia , Yinfei Yang , Ye Xia , Yi-Ting Chen , Zarana Parekh , Hieu Pham , Quoc V. Le , Yun-Hsuan Sung , Zhen Li , and Tom Duerig . 2021 . Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision . In Proceedings of the 38th International Conference on Machine Learning (ICML). Virtual Event, 4904--4916 . Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, Yun-Hsuan Sung, Zhen Li, and Tom Duerig. 2021. Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision. In Proceedings of the 38th International Conference on Machine Learning (ICML). Virtual Event, 4904--4916."},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/3394486.3403324"},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/3397271.3402432"},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298932"},{"key":"e_1_3_2_1_23_1","volume-title":"Proceedings of the 38th International Conference on Machine Learning (ICML) . Virtual Event, 5583--5594","author":"Kim Wonjae","year":"2021","unstructured":"Wonjae Kim , Bokyung Son , and Ildoo Kim . 2021 . ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision . In Proceedings of the 38th International Conference on Machine Learning (ICML) . Virtual Event, 5583--5594 . Wonjae Kim, Bokyung Son, and Ildoo Kim. 2021. ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision. In Proceedings of the 38th International Conference on Machine Learning (ICML) . Virtual Event, 5583--5594."},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-016-0981-7"},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01225-0_13"},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i07.6795"},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00475"},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"crossref","unstructured":"Wei Li Can Gao Guocheng Niu Xinyan Xiao Hao Liu Jiachen Liu Hua Wu and Haifeng Wang. 2021. UNIMO: Towards Unified-Modal Understanding and Generation via Cross-Modal Contrastive Learning. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing ACL\/IJCNLP 2021 Chengqing Zong Fei Xia Wenjie Li and Roberto Navigli (Eds.). Association for Computational Linguistics. Wei Li Can Gao Guocheng Niu Xinyan Xiao Hao Liu Jiachen Liu Hua Wu and Haifeng Wang. 2021. UNIMO: Towards Unified-Modal Understanding and Generation via Cross-Modal Contrastive Learning. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing ACL\/IJCNLP 2021 Chengqing Zong Fei Xia Wenjie Li and Roberto Navigli (Eds.). Association for Computational Linguistics.","DOI":"10.18653\/v1\/2021.acl-long.202"},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58577-8_8"},{"volume-title":"Proceedings of the 13th European Conference on Computer Vision (ECCV), Part V","author":"Lin Tsung-Yi","key":"e_1_3_2_1_30_1","unstructured":"Tsung-Yi Lin , Michael Maire , Serge J. Belongie , James Hays , Pietro Perona , Deva Ramanan , Piotr Doll\u00e1 r, and C. Lawrence Zitnick . 2014. Microsoft COCO: Common Objects in Context . In Proceedings of the 13th European Conference on Computer Vision (ECCV), Part V . Zurich, Switzerland, 740--755. Tsung-Yi Lin, Michael Maire, Serge J. Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Doll\u00e1 r, and C. Lawrence Zitnick. 2014. Microsoft COCO: Common Objects in Context. In Proceedings of the 13th European Conference on Computer Vision (ECCV), Part V. Zurich, Switzerland, 740--755."},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.emnlp-main.772"},{"key":"e_1_3_2_1_32_1","unstructured":"Jiaheng Liu Tan Yu Hanyu Peng Mingming Sun and Ping Li. 2022. Cross-Lingual Cross-Modal Consolidation for Effective Multilingual Video Corpus Moment Retrieval. In Findings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT) . Jiaheng Liu Tan Yu Hanyu Peng Mingming Sun and Ping Li. 2022. Cross-Lingual Cross-Modal Consolidation for Effective Multilingual Video Corpus Moment Retrieval. In Findings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT) ."},{"volume-title":"Advances in Neural Information Processing Systems (NeurIPS).","author":"Lu Jiasen","key":"e_1_3_2_1_33_1","unstructured":"Jiasen Lu , Dhruv Batra , Devi Parikh , and Stefan Lee . 2019. ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks . In Advances in Neural Information Processing Systems (NeurIPS). Vancouver, Canada , 13--23. Jiasen Lu, Dhruv Batra, Devi Parikh, and Stefan Lee. 2019. ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks. In Advances in Neural Information Processing Systems (NeurIPS). Vancouver, Canada, 13--23."},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01045"},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"crossref","unstructured":"Xiaopeng Lu Tiancheng Zhao and Kyusong Lee. 2021. VisualSparta: An Embarrassingly Simple Approach to Large-scale Text-to-Image Search with Weighted Bag-of-words. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL\/IJCNLP) . Virtual Event 5020--5029. Xiaopeng Lu Tiancheng Zhao and Kyusong Lee. 2021. VisualSparta: An Embarrassingly Simple Approach to Large-scale Text-to-Image Search with Weighted Bag-of-words. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL\/IJCNLP) . Virtual Event 5020--5029.","DOI":"10.18653\/v1\/2021.acl-long.389"},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/3447548.3467126"},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.232"},{"key":"e_1_3_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00397"},{"key":"e_1_3_2_1_39_1","volume-title":"MLP Architectures for Vision-and-Language Modeling: An Empirical Study. arXiv preprint arXiv:2112.04453","author":"Nie Yixin","year":"2021","unstructured":"Yixin Nie , Linjie Li , Zhe Gan , Shuohang Wang , Chenguang Zhu , Michael Zeng , Zicheng Liu , Mohit Bansal , and Lijuan Wang . 2021. MLP Architectures for Vision-and-Language Modeling: An Empirical Study. arXiv preprint arXiv:2112.04453 ( 2021 ). Yixin Nie, Linjie Li, Zhe Gan, Shuohang Wang, Chenguang Zhu, Michael Zeng, Zicheng Liu, Mohit Bansal, and Lijuan Wang. 2021. MLP Architectures for Vision-and-Language Modeling: An Empirical Study. arXiv preprint arXiv:2112.04453 (2021)."},{"key":"e_1_3_2_1_40_1","volume-title":"Berg","author":"Ordonez Vicente","year":"2011","unstructured":"Vicente Ordonez , Girish Kulkarni , and Tamara L . Berg . 2011 . Im2Text: Describing Images Using 1 Million Captioned Photographs. In Advances in Neural Information Processing Systems (NIPS). Granada, Spain , 1143--1151. Vicente Ordonez, Girish Kulkarni, and Tamara L. Berg. 2011. Im2Text: Describing Images Using 1 Million Captioned Photographs. In Advances in Neural Information Processing Systems (NIPS). Granada, Spain, 1143--1151."},{"key":"e_1_3_2_1_41_1","volume-title":"Proceedings of the 38th International Conference on Machine Learning (ICML) . Virtual Event, 8748--8763","author":"Radford Alec","year":"2021","unstructured":"Alec Radford , Jong Wook Kim , Chris Hallacy , Aditya Ramesh , Gabriel Goh , Sandhini Agarwal , Girish Sastry , Amanda Askell , Pamela Mishkin , Jack Clark , Gretchen Krueger , and Ilya Sutskever . 2021 . Learning Transferable Visual Models From Natural Language Supervision . In Proceedings of the 38th International Conference on Machine Learning (ICML) . Virtual Event, 8748--8763 . Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. 2021. Learning Transferable Visual Models From Natural Language Supervision. In Proceedings of the 38th International Conference on Machine Learning (ICML) . Virtual Event, 8748--8763."},{"key":"e_1_3_2_1_42_1","unstructured":"Shaoqing Ren Kaiming He Ross B. Girshick and Jian Sun. 2015. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. In Advances in Neural Information Processing Systems (NIPS). Montreal Canada 91--99. Shaoqing Ren Kaiming He Ross B. Girshick and Jian Sun. 2015. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. In Advances in Neural Information Processing Systems (NIPS). Montreal Canada 91--99."},{"volume-title":"Proceedings of the 2019 IEEE\/CVF International Conference on Computer Vision (ICCV)","author":"Sarafianos Nikolaos","key":"e_1_3_2_1_43_1","unstructured":"Nikolaos Sarafianos , Xiang Xu , and Ioannis A. Kakadiaris . 2019. Adversarial Representation Learning for Text-to-Image Matching . In Proceedings of the 2019 IEEE\/CVF International Conference on Computer Vision (ICCV) . Seoul, Korea, 5813--5823. Nikolaos Sarafianos, Xiang Xu, and Ioannis A. Kakadiaris. 2019. Adversarial Representation Learning for Text-to-Image Matching. In Proceedings of the 2019 IEEE\/CVF International Conference on Computer Vision (ICCV) . Seoul, Korea, 5813--5823."},{"key":"e_1_3_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P18-1238"},{"key":"e_1_3_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00208"},{"key":"e_1_3_2_1_46_1","volume-title":"Proceedings of the 8th International Conference on Learning Representations (ICLR) . Addis Ababa, Ethiopia.","author":"Su Weijie","year":"2020","unstructured":"Weijie Su , Xizhou Zhu , Yue Cao , Bin Li , Lewei Lu , Furu Wei , and Jifeng Dai . 2020 . VL-BERT: Pre-training of Generic Visual-Linguistic Representations . In Proceedings of the 8th International Conference on Learning Representations (ICLR) . Addis Ababa, Ethiopia. Weijie Su, Xizhou Zhu, Yue Cao, Bin Li, Lewei Lu, Furu Wei, and Jifeng Dai. 2020. VL-BERT: Pre-training of Generic Visual-Linguistic Representations. In Proceedings of the 8th International Conference on Learning Representations (ICLR) . Addis Ababa, Ethiopia."},{"key":"e_1_3_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00756"},{"key":"e_1_3_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.naacl-main.77"},{"key":"e_1_3_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1514"},{"volume-title":"Advances in Neural Information Processing Systems (NIPS).","author":"Vaswani Ashish","key":"e_1_3_2_1_50_1","unstructured":"Ashish Vaswani , Noam Shazeer , Niki Parmar , Jakob Uszkoreit , Llion Jones , Aidan N. Gomez , Lukasz Kaiser , and Illia Polosukhin . 2017. Attention is All you Need . In Advances in Neural Information Processing Systems (NIPS). Long Beach, CA , 5998--6008. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is All you Need. In Advances in Neural Information Processing Systems (NIPS). Long Beach, CA, 5998--6008."},{"key":"e_1_3_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58586-0_2"},{"key":"e_1_3_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.14778\/2732296.2732301"},{"key":"e_1_3_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-015-0391-4"},{"key":"e_1_3_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01095"},{"key":"e_1_3_2_1_55_1","volume-title":"Proceedings of the 32nd International Conference on Machine Learning (ICML)","author":"Xu Kelvin","year":"2015","unstructured":"Kelvin Xu , Jimmy Ba , Ryan Kiros , Kyunghyun Cho , Aaron C. Courville , Ruslan Salakhutdinov , Richard S. Zemel , and Yoshua Bengio . 2015 . Show, Attend and Tell: Neural Image Caption Generation with Visual Attention . In Proceedings of the 32nd International Conference on Machine Learning (ICML) . Lille, France , 2048--2057. Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron C. Courville, Ruslan Salakhutdinov, Richard S. Zemel, and Yoshua Bengio. 2015. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. In Proceedings of the 32nd International Conference on Machine Learning (ICML) . Lille, France, 2048--2057."},{"key":"e_1_3_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00166"},{"key":"e_1_3_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v35i4.16431"},{"key":"e_1_3_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1145\/3477495.3531826"},{"key":"e_1_3_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE51399.2021.00225"},{"key":"e_1_3_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1145\/3459637.3482233"},{"key":"e_1_3_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1145\/3394486.3403297"},{"key":"e_1_3_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.1145\/3404835.3462924"},{"key":"e_1_3_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1145\/3459637.3481937"},{"key":"e_1_3_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.1145\/3394171.3413977"},{"key":"e_1_3_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00553"}],"event":{"name":"ICTIR '22: The 2022 ACM SIGIR International Conference on the Theory of Information Retrieval","sponsor":["SIGIR ACM Special Interest Group on Information Retrieval"],"location":"Madrid Spain","acronym":"ICTIR '22"},"container-title":["Proceedings of the 2022 ACM SIGIR International Conference on Theory of Information Retrieval"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3539813.3545148","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3539813.3545148","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T18:10:03Z","timestamp":1750183803000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3539813.3545148"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,8,23]]},"references-count":65,"alternative-id":["10.1145\/3539813.3545148","10.1145\/3539813"],"URL":"https:\/\/doi.org\/10.1145\/3539813.3545148","relation":{},"subject":[],"published":{"date-parts":[[2022,8,23]]},"assertion":[{"value":"2022-08-25","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}