{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,17]],"date-time":"2026-03-17T18:29:08Z","timestamp":1773772148547,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":40,"publisher":"ACM","license":[{"start":{"date-parts":[[2022,6,27]],"date-time":"2022-06-27T00:00:00Z","timestamp":1656288000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"ONR","award":["N00014-19-1-2119"],"award-info":[{"award-number":["N00014-19-1-2119"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,6,27]]},"DOI":"10.1145\/3512527.3531381","type":"proceedings-article","created":{"date-parts":[[2022,6,23]],"date-time":"2022-06-23T22:23:32Z","timestamp":1656023012000},"page":"158-166","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":31,"title":["CLIP4Hashing: Unsupervised Deep Hashing for Cross-Modal Video-Text Retrieval"],"prefix":"10.1145","author":[{"given":"Yaoxin","family":"Zhuo","sequence":"first","affiliation":[{"name":"Arizona State University, Tempe, AZ, USA"}]},{"given":"Yikang","family":"Li","sequence":"additional","affiliation":[{"name":"OPPO US Research Center, Palo Alto, CA, USA"}]},{"given":"Jenhao","family":"Hsiao","sequence":"additional","affiliation":[{"name":"OPPO US Research Center, Palo Alto, CA, USA"}]},{"given":"Chiuman","family":"Ho","sequence":"additional","affiliation":[{"name":"OPPO US Research Center, Palo Alto, CA, USA"}]},{"given":"Baoxin","family":"Li","sequence":"additional","affiliation":[{"name":"Arizona State University, Tempe, AZ, 
USA"}]}],"member":"320","published-online":{"date-parts":[[2022,6,27]]},"reference":[{"key":"e_1_3_2_2_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.618"},{"key":"e_1_3_2_2_2_1","doi-asserted-by":"publisher","DOI":"10.5555\/2002472.2002497"},{"key":"e_1_3_2_2_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2019.2911359"},{"key":"e_1_3_2_2_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2018.2821921"},{"key":"e_1_3_2_2_5_1","volume-title":"Mean local group average precision (mLGAP): a new performance metric for hashing-based retrieval. arXiv preprint arXiv:1811.09763","author":"Kevin Ding Pak Lun","year":"2018","unstructured":"Pak Lun Kevin Ding, Yikang Li, and Baoxin Li. 2018. Mean local group average precision (mLGAP): a new performance metric for hashing-based retrieval. arXiv preprint arXiv:1811.09763 (2018)."},{"key":"e_1_3_2_2_6_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2019.02.004"},{"key":"e_1_3_2_2_7_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58548-8_13"},{"key":"e_1_3_2_2_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.01062"},{"key":"e_1_3_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00319"},{"key":"e_1_3_2_2_10_1","volume-title":"International Conference on Machine Learning. PMLR, 4904--4916","author":"Jia Chao","year":"2021","unstructured":"Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc Le, Yun-Hsuan Sung, Zhen Li, and Tom Duerig. 2021. 
Scaling up visual and vision-language representation learning with noisy text supervision. In International Conference on Machine Learning. PMLR, 4904--4916."},{"key":"e_1_3_2_2_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2020.2997020"},{"key":"e_1_3_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00725"},{"key":"e_1_3_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.3301176"},{"key":"e_1_3_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v35i3.16296"},{"key":"e_1_3_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.227"},{"key":"e_1_3_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.01170"},{"key":"e_1_3_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3397271.3401086"},{"key":"e_1_3_2_2_18_1","doi-asserted-by":"publisher","DOI":"10.5555\/2354409.2355047"},{"key":"e_1_3_2_2_19_1","volume-title":"Proceedings of the British Machine Vision Conference (BMVC), Kirill Sidorov and Yulia Hicks (Eds.). BMVA Press, Article 73","author":"Liu Yang","year":"2019","unstructured":"Yang Liu, Samuel Albanie, Arsha Nagrani, and Andrew Zisserman. 2019. Use What You Have: Video retrieval using representations from collaborative experts. In Proceedings of the British Machine Vision Conference (BMVC), Kirill Sidorov and Yulia Hicks (Eds.). BMVA Press, Article 73, 14 pages. https:\/\/doi.org\/10.5244\/C.33.73"},{"key":"e_1_3_2_2_20_1","volume-title":"SLIP: Self-supervision meets Language-Image Pre-training. 
arXiv preprint arXiv:2112.12750","author":"Mu Norman","year":"2021","unstructured":"Norman Mu, Alexander Kirillov, David Wagner, and Saining Xie. 2021. SLIP: Self-supervision meets Language-Image Pre-training. arXiv preprint arXiv:2112.12750 (2021)."},{"key":"e_1_3_2_2_21_1","volume-title":"et al","author":"Paszke Adam","year":"2019","unstructured":"Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. 2019. Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems 32 (2019), 8026--8037."},{"key":"e_1_3_2_2_22_1","volume-title":"International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=EqoXe2zmhrh","author":"Patrick Mandela","year":"2021","unstructured":"Mandela Patrick, Po-Yao Huang, Yuki Asano, Florian Metze, Alexander G Hauptmann, Joao F. Henriques, and Andrea Vedaldi. 2021. Support-set bottlenecks for video-text representation learning. In International Conference on Learning Representations. 
https:\/\/openreview.net\/forum?id=EqoXe2zmhrh"},{"key":"e_1_3_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2020.3048680"},{"key":"e_1_3_2_2_24_1","volume-title":"International Conference on Machine Learning. PMLR, 8748--8763","author":"Radford Alec","year":"2021","unstructured":"Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. 2021. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning. PMLR, 8748--8763."},{"key":"e_1_3_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/1873951.1873987"},{"key":"e_1_3_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2018.2789887"},{"key":"e_1_3_2_2_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/3356316"},{"key":"e_1_3_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00289"},{"key":"e_1_3_2_2_29_1","volume-title":"FLAVA: A Foundational Language And Vision Alignment Model. arXiv preprint arXiv:2112.04482","author":"Singh Amanpreet","year":"2021","unstructured":"Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, Guillaume Couairon, Wojciech Galuba, Marcus Rohrbach, and Douwe Kiela. 2021. FLAVA: A Foundational Language And Vision Alignment Model. 
arXiv preprint arXiv:2112.04482 (2021)."},{"key":"e_1_3_2_2_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2018.2814344"},{"key":"e_1_3_2_2_31_1","volume-title":"Garnett (Eds.)","volume":"31","author":"Su Shupeng","year":"2018","unstructured":"Shupeng Su, Chao Zhang, Kai Han, and Yonghong Tian. 2018. Greedy Hash: Towards Fast Optimization for Accurate Hash Coding in CNN. In Advances in Neural Information Processing Systems, S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (Eds.), Vol. 31. Curran Associates, Inc. https:\/\/proceedings.neurips.cc\/paper\/2018\/file\/13f3cf8c531952d72e5847c4183e6910-Paper.pdf"},{"key":"e_1_3_2_2_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00312"},{"key":"e_1_3_2_2_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/WACV45572.2020.9093468"},{"key":"e_1_3_2_2_34_1","doi-asserted-by":"crossref","unstructured":"Gengshen Wu, Zijia Lin, Jungong Han, Li Liu, Guiguang Ding, Baochang Zhang, and Jialie Shen. 2018. Unsupervised Deep Hashing via Binary Latent Factor Models for Large-scale Cross-modal Retrieval. In IJCAI. 
2854--2860.","DOI":"10.24963\/ijcai.2018\/396"},{"key":"e_1_3_2_2_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2020.2963957"},{"key":"e_1_3_2_2_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.571"},{"key":"e_1_3_2_2_37_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v35i5.16592"},{"key":"e_1_3_2_2_38_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01234-2_29"},{"key":"e_1_3_2_2_39_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01261-8_23"},{"key":"e_1_3_2_2_40_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11280-020-00859-y"}],"event":{"name":"ICMR '22: International Conference on Multimedia Retrieval","location":"Newark NJ USA","acronym":"ICMR '22","sponsor":["SIGMM ACM Special Interest Group on Multimedia"]},"container-title":["Proceedings of the 2022 International Conference on Multimedia Retrieval"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3512527.3531381","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3512527.3531381","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3512527.3531381","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:30:12Z","timestamp":1750188612000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3512527.3531381"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,6,27]]},"references-count":40,"alternative-id":["10.1145\/3512527.3531381","10.1145\/3512527"],"URL":"https:\/\/doi.org\/10.1145\/3512527.3531381","relation":{},"subject":[],"published":{"date-parts":[[2022,6,27]]},"assertion":[{"value":"2022-06-27","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication 
History"}}]}}