{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,9]],"date-time":"2025-12-09T19:27:22Z","timestamp":1765308442640,"version":"3.46.0"},"publisher-location":"New York, NY, USA","reference-count":72,"publisher":"ACM","funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62172420"],"award-info":[{"award-number":["62172420"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2025,10,27]]},"DOI":"10.1145\/3746027.3755476","type":"proceedings-article","created":{"date-parts":[[2025,10,25]],"date-time":"2025-10-25T05:50:47Z","timestamp":1761371447000},"page":"6173-6182","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Learning Partially-Decorrelated Common Spaces for Ad-hoc Video Search"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5371-7780","authenticated-orcid":false,"given":"Fan","family":"Hu","sequence":"first","affiliation":[{"name":"Renmin University of China, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9220-8735","authenticated-orcid":false,"given":"Zijie","family":"Xin","sequence":"additional","affiliation":[{"name":"Renmin University of China, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0220-8310","authenticated-orcid":false,"given":"Xirong","family":"Li","sequence":"additional","affiliation":[{"name":"Renmin University of China, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,10,27]]},"reference":[{"key":"e_1_3_2_1_1_1","volume-title":"TRECVID 2016: 
Evaluating Video Search, Video Event Detection, Localization, and Hyperlinking. In TRECVID.","author":"Awad George","year":"2016","unstructured":"George Awad, Fiscus Jonathan, Joy David, Michel Martial, Smeaton Alan, Kraaij Wessel, Quenot Georges, Eskevich Maria, Aly Robin, Ordelman Roeland, Jones Gareth, Huet Benoit, and Larson Martha. 2016. TRECVID 2016: Evaluating Video Search, Video Event Detection, Localization, and Hyperlinking. In TRECVID."},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"crossref","unstructured":"Chen Jiang Hong Liu Xuzheng Yu Qing Wang Yuan Cheng Jia Xu Zhongyi Liu Qingpei Guo Wei Chu Ming Yang et al. 2023. Dual-Modal Attention-Enhanced Text-Video Retrieval with Triplet Partial Margin Contrastive Learning. In ACM MM.","DOI":"10.1145\/3581783.3612006"},{"key":"e_1_3_2_1_3_1","unstructured":"Yiwei Ma Guohai Xu Xiaoshuai Sun Ming Yan Ji Zhang and Rongrong Ji. 2022. X-CLIP: End-to-End Multi-grained Contrastive Learning for Video-Text Retrieval. In ACM MM."},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2022.07.028"},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"crossref","unstructured":"Yuqi Liu Pengfei Xiong Luhui Xu Shengming Cao and Qin Jin. 2022. TS2-Net: Token shift and selection transformer for text-video retrieval. In ECCV.","DOI":"10.1007\/978-3-031-19781-9_19"},{"key":"e_1_3_2_1_6_1","unstructured":"George Awad Asad Butt Jonathan Fiscus David Joy Andrew Delgado Willie McClinton Martial Michel Alan Smeaton Yvette Graham Wessel Kraaij Georges Quenot Maria Eskevich Roeland Ordelman Gareth Jones and Benoit Huet. 2018a. Trecvid 2017: Evaluating ad-hoc and instance video search events detection video captioning and hyperlinking. In TRECVID."},{"key":"e_1_3_2_1_7_1","volume-title":"TRECVID 2018: Benchmarking Video Activity Detection, Video Captioning and Matching, Video Storytelling Linking and Video Search. 
In TRECVID.","author":"Awad George","year":"2018","unstructured":"George Awad, Asad Gov, Asad Butt, Keith Curtis, Yooyoung Lee, yooyoung@nist Gov, Jonathan Fiscus, David Joy, Andrew Delgado, Alan Smeaton, Yvette Graham, Wessel Kraaij, Georges Quenot, Joao Magalhaes, and Saverio Blasi. 2018b. TRECVID 2018: Benchmarking Video Activity Detection, Video Captioning and Matching, Video Storytelling Linking and Video Search. In TRECVID."},{"key":"e_1_3_2_1_8_1","volume-title":"TRECVID 2019: An evaluation campaign to benchmark Video Activity Detection, Video Captioning and Matching, and Video Search and retrieval. In TRECVID.","author":"Awad George","year":"2019","unstructured":"George Awad, Asad Butt, Keith Curtis, Yooyoung Lee, Jonathan Fiscus, Godil Afzal, Andrew Delgado, Zhang Jesse, Eliot Godard, Lukas Diduch, Alan F. Smeaton, Yvette Graham, Wessel Kraaij, and Georges Quenot. 2019. TRECVID 2019: An evaluation campaign to benchmark Video Activity Detection, Video Captioning and Matching, and Video Search and retrieval. In TRECVID."},{"key":"e_1_3_2_1_9_1","volume-title":"TRECVID 2020: A comprehensive campaign for evaluating video retrieval tasks across multiple application domains. In TRECVID.","author":"Awad George","year":"2020","unstructured":"George Awad, Asad A Butt, Keith Curtis, Jonathan Fiscus, Afzal Godil, Yooyoung Lee, Andrew Delgado, Jesse Zhang, Eliot Godard, Baptiste Chocot, et al., 2020. TRECVID 2020: A comprehensive campaign for evaluating video retrieval tasks across multiple application domains. In TRECVID."},{"key":"e_1_3_2_1_10_1","unstructured":"George Awad Asad A Butt Keith Curtis Jonathan Fiscus Afzal Godil Yooyoung Lee Andrew Delgado Jesse Zhang Eliot Godard Baptiste Chocot et al. 2021. Evaluating multiple video understanding and retrieval tasks at trecvid 2021. 
In TRECVID."},{"key":"e_1_3_2_1_11_1","unstructured":"George Awad Keith Curtis Asad Butt Jonathan Fiscus Afzal Godil Yooyoung Lee Andrew Delgado Eliot Godard Lukas Diduch Jeffrey Liu et al. 2022. An overview on the evaluated video retrieval tasks at TRECVID 2022. In TRECVID."},{"key":"e_1_3_2_1_12_1","volume-title":"TRECVID 2023 - A series of evaluation tracks in video understanding. In TRECVID.","author":"Awad George","year":"2023","unstructured":"George Awad, Keith Curtis, Asad Butt, Jonathan Fiscus, Afzal Godil, Yooyoung Lee, Eliot Godard, Lukas Diduch, Deepak Gupta, Dina Demner Fushman, Yvette Graham, Georges Qu\u00e9not, et al., 2023. TRECVID 2023 - A series of evaluation tracks in video understanding. In TRECVID."},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"crossref","unstructured":"Jun Xu Tao Mei Ting Yao and Yong Rui. 2016. MSR-VTT: A Large Video Description Dataset for Bridging Video and Language. In CVPR.","DOI":"10.1109\/CVPR.2016.571"},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"crossref","unstructured":"Fabian Berns Luca Rossetto Klaus Schoeffmann Christian Beecks and George Awad. 2019. V3C1 Dataset: An Evaluation of Content Characteristics. In ICMR.","DOI":"10.1145\/3323873.3325051"},{"key":"e_1_3_2_1_15_1","volume-title":"Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al.","author":"Radford Alec","year":"2021","unstructured":"Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al., 2021. Learning transferable visual models from natural language supervision. In ICML."},{"key":"e_1_3_2_1_16_1","unstructured":"Hongwei Xue Yuchong Sun Bei Liu Jianlong Fu Ruihua Song Houqiang Li and Jiebo Luo. 2023. CLIP-ViP: Adapting Pre-trained Image-Text Model to Video-Language Alignment. In ICLR."},{"key":"e_1_3_2_1_17_1","unstructured":"Wenhao Wu Haipeng Luo Bo Fang Jingdong Wang and Wanli Ouyang. 
2023. Cap4video: What can auxiliary captions do for text-video retrieval?. In CVPR."},{"key":"e_1_3_2_1_18_1","volume-title":"Diffusionret: Generative text-video retrieval with diffusion model. In ICCV.","author":"Jin Peng","year":"2023","unstructured":"Peng Jin, Hao Li, Zesen Cheng, Kehan Li, Xiangyang Ji, Chang Liu, Li Yuan, and Jie Chen. 2023. Diffusionret: Generative text-video retrieval with diffusion model. In ICCV."},{"key":"e_1_3_2_1_19_1","unstructured":"Xirong Li Jinde Ye Chaoxi Xu Shanjinwen Yun Leimin Zhang Xun Wang Rui Qian and Jianfeng Dong. 2019. Renmin University of China and Zhejiang Gongshang University at TRECVID 2019: Learn to Search and Describe Videos. In TRECVID."},{"key":"e_1_3_2_1_20_1","unstructured":"Xirong Li Fangming Zhou and Aozhu Chen. 2020. Renmin University of China at TRECVID 2020: Sentence Encoder Assembly for Ad-hoc Video Search. In TRECVID."},{"key":"e_1_3_2_1_21_1","volume-title":"Phuong Anh Nguyen, and Chong-Wah Ngo","author":"Wu Jiaxin","year":"2020","unstructured":"Jiaxin Wu, Phuong Anh Nguyen, and Chong-Wah Ngo. 2020. VIREO@ TRECVID 2020 Ad-hoc Video Search. In TRECVID."},{"key":"e_1_3_2_1_22_1","volume-title":"Phuong Anh Nguyen, and Chong-Wah Ngo","author":"Wu Jiaxin","year":"2021","unstructured":"Jiaxin Wu, Phuong Anh Nguyen, and Chong-Wah Ngo. 2021. VIREO@ TRECVID 2021 ad-hoc video search. In TRECVID."},{"key":"e_1_3_2_1_23_1","unstructured":"Fangming Zhou Yihui Shi Changqiao Wu Xiaofeng Guo Haofan Wang Jincan Deng and Debing Zhang. 2021. Kuaishou at TRECVID 2021: Two-stage Ranking Strategy for Ad-hoc Video Search. In TRECVID."},{"key":"e_1_3_2_1_24_1","unstructured":"Xirong Li Aozhu Chen Fan Hu Xinru Chen Chengbo Dong and Gang Yang. 2021. Renmin University of China at TRECVID 2021: Searching and Describing Video. 
In TRECVID."},{"key":"e_1_3_2_1_25_1","volume-title":"Waseda meisei softbank at TRECVID","author":"Ueki Kazuya","year":"2022","unstructured":"Kazuya Ueki, Yuma Suzuki, Hiroki Takushima, Hideaki Okamoto, Hayato Tanoue, and Takayuki Hori. 2022. Waseda meisei softbank at TRECVID 2022. In TRECVID."},{"key":"e_1_3_2_1_26_1","unstructured":"Xirong Li Aozhu Chen Ziyue Wang Fan Hu Kaibin Tian Xinru Chen and Chengbo Dong. 2022. Renmin University of China at TRECVID 2022: Improving Video Search by Feature Fusion and Negation Understanding. In TRECVID."},{"key":"e_1_3_2_1_27_1","volume-title":"ITI-CERTH participation in ActEV and AVS tracks of TRECVID","author":"Gkountakos Konstantinos","year":"2022","unstructured":"Konstantinos Gkountakos, Damianos Galanopoulos, Despoina Touska, Konstantinos Ioannidis, Stefanos Vrochidis, Vasileios Mezaris, and Ioannis Kompatsiaris. 2022. ITI-CERTH participation in ActEV and AVS tracks of TRECVID 2022. In TRECVID."},{"key":"e_1_3_2_1_28_1","volume-title":"Waseda Meisei SoftBank at TRECVID","author":"Ueki Kazuya","year":"2023","unstructured":"Kazuya Ueki, Yuma Suzuki, Hiroki Takushima, Haruki Sato, Takumi Takada, Hideaki Okamoto, Hayato Tanoue, Takayuki Hori, and Aiswariya Manoj Kumar. 2023. Waseda Meisei SoftBank at TRECVID 2023. In TRECVID."},{"key":"e_1_3_2_1_29_1","unstructured":"Xirong Li Fan Hu Ruixiang Zhao Ziyuan Wang Jingyu Liu Jiazhen Liu Bangxiang Lan Wenguan Kou Yuhan Fu and Zhanhui Kang. 2023. Renmin University of China and Tencent at TRECVID 2023: Harnessing Pre-trained Models for Ad-hoc Video Search. In TRECVID."},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"crossref","unstructured":"Jianfeng Dong Xirong Li Chaoxi Xu Shouling Ji Yuan He Gang Yang and Xun Wang. 2019. Dual encoding for zero-example video retrieval. In CVPR.","DOI":"10.1109\/CVPR.2019.00957"},{"key":"e_1_3_2_1_31_1","unstructured":"Xirong Li Jianfeng Dong Chaoxi Xu Jing Cao Xun Wang and Gang Yang. 2018. 
Renmin University of China and Zhejiang Gongshang University at TRECVID 2018: Deep Cross-Modal Embeddings for Video-Text Retrieval. In TRECVID."},{"key":"e_1_3_2_1_32_1","unstructured":"Yida Zhao Yuqing Song Shizhe Chen and Qin Jin. 2020. RUC_AIM3 at TRECVID 2020: Ad-hoc Video Search & Video to Text Description. In TRECVID."},{"key":"e_1_3_2_1_33_1","unstructured":"Xirong Li Chaoxi Xu Gang Yang Zhineng Chen and Jianfeng Dong. 2019. W2VV: Fully deep learning for ad-hoc video search. In ACM MM."},{"key":"e_1_3_2_1_34_1","unstructured":"Jiaxin Wu and Chong-Wah Ngo. 2020. Interpretable embedding for ad-hoc video search. In ACM MM."},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"crossref","unstructured":"Shizhe Chen Yida Zhao Qin Jin and Qi Wu. 2020. Fine-Grained Video-Text Retrieval With Hierarchical Graph Reasoning. In CVPR.","DOI":"10.1109\/CVPR42600.2020.01065"},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2020.3042067"},{"key":"e_1_3_2_1_37_1","volume-title":"Combining Textual and Visual Features with Multiple Space Learning for Text-Based Video Retrieval. In ECCV Workshops.","author":"Galanopoulos Damianos","year":"2022","unstructured":"Damianos Galanopoulos and Vasileios Mezaris. 2022. Are All Combinations Equal? Combining Textual and Visual Features with Multiple Space Learning for Text-Based Video Retrieval. In ECCV Workshops."},{"key":"e_1_3_2_1_38_1","unstructured":"Yang Liu Samuel Albanie Arsha Nagrani and Andrew Zisserman. 2019. Use What You Have: Video retrieval using representations from collaborative experts. In BMVC."},{"key":"e_1_3_2_1_39_1","doi-asserted-by":"crossref","unstructured":"Fan Hu Aozhu Chen Ziyue Wang Fangming Zhou Jianfeng Dong and Xirong Li. 2022. Lightweight Attentional Feature Fusion: A New Baseline for Text-to-Video Retrieval. In ECCV.","DOI":"10.1007\/978-3-031-19781-9_26"},{"key":"e_1_3_2_1_40_1","unstructured":"Chengzhi Lin Ancong Wu Junwei Liang Jun Zhang Wenhang Ge Wei-Shi Zheng and Chunhua Shen. 2022. 
Text-adaptive multiple visual prototype matching for video-text retrieval. NeurIPS."},{"key":"e_1_3_2_1_41_1","doi-asserted-by":"crossref","unstructured":"Valentin Gabeur Chen Sun Karteek Alahari and Cordelia Schmid. 2020. Multi-modal transformer for video retrieval. In ECCV.","DOI":"10.1007\/978-3-030-58548-8_13"},{"key":"e_1_3_2_1_42_1","unstructured":"Jiaxin Wu Chong-Wah Ngo and Wing-Kwong Chan. 2024. Improving Interpretable Embeddings for Ad-hoc Video Search with Generative Captions and Multi-word Concept Bank. In ICMR."},{"key":"e_1_3_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1561\/1500000014"},{"key":"e_1_3_2_1_44_1","article-title":"Large-Scale Concept Ontology for Multimedia","volume":"13","author":"Naphade Milind","year":"2006","unstructured":"Milind Naphade, J.R. Smith, Jelena Tesic, S. Chang, Winston Hsu, Lyndon Kennedy, Alexander Hauptmann, and Jon Curtis. 2006. Large-Scale Concept Ontology for Multimedia. IEEE Transactions on Multimedia, Vol. 13 (08 2006), 86-91.","journal-title":"IEEE Transactions on Multimedia"},{"key":"e_1_3_2_1_45_1","doi-asserted-by":"crossref","DOI":"10.1109\/TMM.2009.2036235","article-title":"Representations of Keypoint-Based Semantic Concept Detection: A Comprehensive Study","volume":"12","author":"Jiang Yu-Gang","year":"2010","unstructured":"Yu-Gang Jiang, Jun Yang, Chong-Wah Ngo, and Alexander Hauptmann. 2010. Representations of Keypoint-Based Semantic Concept Detection: A Comprehensive Study. IEEE Transactions on Multimedia, Vol. 12 (02 2010), 42-53.","journal-title":"IEEE Transactions on Multimedia"},{"key":"e_1_3_2_1_46_1","volume-title":"Smeulders","author":"Snoek Cees G. M.","year":"2006","unstructured":"Cees G. M. Snoek, Marcel Worring, Jan C. van Gemert, Jan-Mark Geusebroek, and Arnold W. M. Smeulders. 2006. The Challenge Problem for Automated Detection of 101 Semantic Concepts in Multimedia. 
In ACM MM."},{"key":"e_1_3_2_1_47_1","volume-title":"Minh-Triet Tran, Duc Anh Duong, and Shinichi Satoh.","author":"Nguyen Vinh-Tiep","year":"2016","unstructured":"Vinh-Tiep Nguyen, Duy-Dinh Le, Benjamin Renoust, Thanh Duc Ngo, Minh-Triet Tran, Duc Anh Duong, and Shinichi Satoh. 2016. NII-HITACHI-UIT at TRECVID 2016 Ad-hoc Video Search: Enriching Semantic Features using Multiple Neural Networks. In TRECVID."},{"key":"e_1_3_2_1_48_1","unstructured":"Kazuya Ueki Yu Nakagome Koji Hirakawa Kotaro Kikuchi Yoshihiko Hayashi Tetsuji Ogawa and Tetsunori Kobayashi. 2018. Waseda Meisei at TRECVID 2018:Ad-hoc Video Search. In TRECVID."},{"key":"e_1_3_2_1_49_1","unstructured":"Po-Yao Huang Junwei Liang Vaibhav Xiaojun Chang and Alexander Hauptmann. 2018. Informedia@TRECVID 2018:Ad-hoc Video Search with Discrete and Continuous Representations. In TRECVID."},{"key":"e_1_3_2_1_50_1","volume-title":"Enhanced VIREO KIS at VBS","author":"Nguyen Phuong Anh","year":"2018","unstructured":"Phuong Anh Nguyen, Yi-Jie Lu, Hao Zhang, and Chong-Wah Ngo. 2018. Enhanced VIREO KIS at VBS 2018. In MMM."},{"key":"e_1_3_2_1_51_1","volume-title":"Kazuya an Hori and Tetsunori Kobayashi","author":"Ueki Takayuki","year":"2019","unstructured":"Takayuki Ueki, Kazuya an Hori and Tetsunori Kobayashi. 2019. Waseda_Meisei_SoftBank at TRECVID 2019: Ad-hoc Video Search. In TRECVID."},{"key":"e_1_3_2_1_52_1","volume-title":"Snoek","author":"Habibian Amirhossein","year":"2014","unstructured":"Amirhossein Habibian, Thomas Mensink, and Cees G. M. Snoek. 2014. VideoStory: A New Multimedia Embedding for Few-Example Recognition and Translation of Events. In ACM MM."},{"key":"e_1_3_2_1_53_1","volume-title":"Jamie Ryan Kiros, and Sanja Fidler","author":"Faghri Fartash","year":"2018","unstructured":"Fartash Faghri, David J. Fleet, Jamie Ryan Kiros, and Sanja Fidler. 2018. VSE: Improving Visual-Semantic Embeddings with Hard Negatives. 
In BMVC."},{"key":"e_1_3_2_1_54_1","unstructured":"Jiangshan He Ruizhe Li Jiahao Guo Hong Zhang Mingxi Li Zhengqian Wu Zhongyuan Wang Bo Du and Chao Liang. 2023. WHU-NERCMS at TRECVID 2023: Ad-hoc Video Search (AVS) and Deep Video Understanding (DVU) Tasks. In TRECVID."},{"key":"e_1_3_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.cviu.2024.104035"},{"key":"e_1_3_2_1_56_1","unstructured":"Mandela Patrick Po-Yao Huang Yuki Asano Florian Metze Alexander Hauptmann Joao Henriques and Andrea Vedaldi. 2021. Support-set bottlenecks for video-text representation learning. In ICLR."},{"key":"e_1_3_2_1_57_1","volume-title":"Carbonell and Jade Goldstein","author":"Jaime","year":"1998","unstructured":"Jaime G. Carbonell and Jade Goldstein. 1998. The Use of MMR, Diversity-Based Reranking for Reordering Documents and Producing Summaries. In SIGIR."},{"key":"e_1_3_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1145\/2492189.2492205"},{"key":"e_1_3_2_1_59_1","volume-title":"Jamie Ryan Kiros, and Sanja Fidler","author":"Faghri Fartash","year":"2018","unstructured":"Fartash Faghri, David J. Fleet, Jamie Ryan Kiros, and Sanja Fidler. 2018. VSE: Improving Visual-Semantic Embeddings with Hard Negatives. In BMVC."},{"key":"e_1_3_2_1_60_1","volume-title":"Roy-Chowdhury","author":"Mithun Niluthpol Chowdhury","year":"2018","unstructured":"Niluthpol Chowdhury Mithun, Juncheng Li, Florian Metze, and Amit K. Roy-Chowdhury. 2018. Learning Joint Embedding with Multimodal Cues for Cross-Modal Video-Text Retrieval. In ICMR."},{"key":"e_1_3_2_1_61_1","doi-asserted-by":"crossref","unstructured":"Gareth James Daniela Witten Trevor Hastie Robert Tibshirani et al. 2013. An introduction to statistical learning. Springer.","DOI":"10.1007\/978-1-4614-7138-7"},{"volume-title":"Deep learning","author":"Goodfellow Ian","key":"e_1_3_2_1_62_1","unstructured":"Ian Goodfellow, Yoshua Bengio, Aaron Courville, and Yoshua Bengio. 2016. Deep learning. Vol. 1. 
MIT press Cambridge."},{"key":"e_1_3_2_1_63_1","doi-asserted-by":"crossref","unstructured":"Xu Ma Pengjie Wang Hui Zhao Shaoguo Liu Chuhan Zhao Wei Lin Kuang-Chih Lee Jian Xu and Bo Zheng. 2021. Towards a Better Tradeoff between Effectiveness and Efficiency in Pre-Ranking: A Learnable Feature Selection based Approach. In SIGIR.","DOI":"10.1145\/3404835.3462979"},{"key":"e_1_3_2_1_64_1","doi-asserted-by":"crossref","unstructured":"He Wei Yuekui Yang Haiyang Wu Yangyang Tang Meixi Liu and Jianfeng Li. 2023. Automatic Feature Selection By One-Shot Neural Architecture Search In Recommendation Systems. In WWW.","DOI":"10.1145\/3543507.3583444"},{"key":"e_1_3_2_1_65_1","unstructured":"Youngjae Yu Jongseok Kim and Gunhee Kim. 2018. A joint sequence fusion model for video question answering and retrieval. In ECCV."},{"key":"e_1_3_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2018.2796248"},{"key":"e_1_3_2_1_67_1","doi-asserted-by":"crossref","unstructured":"Dhruv Mahajan Ross B. Girshick Vignesh Ramanathan Kaiming He Manohar Paluri Yixuan Li Ashwin Bharambe and Laurens van der Maaten. 2018. Exploring the Limits of Weakly Supervised Pretraining. In ECCV.","DOI":"10.1007\/978-3-030-01216-8_12"},{"key":"e_1_3_2_1_68_1","doi-asserted-by":"crossref","unstructured":"Deepti Ghadiyaram Du Tran and Dhruv Mahajan. 2019. Large-Scale Weakly-Supervised Pre-training for Video Action Recognition. In CVPR.","DOI":"10.1109\/CVPR.2019.01232"},{"key":"e_1_3_2_1_69_1","unstructured":"Hangbo Bao Li Dong Songhao Piao and Furu Wei. 2022. BEiT: BERT Pre-Training of Image Transformers. In ICLR."},{"key":"e_1_3_2_1_70_1","volume-title":"Hoi","author":"Li Junnan","year":"2022","unstructured":"Junnan Li, Dongxu Li, Caiming Xiong, and Steven C. H. Hoi. 2022. BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation. In ICML."},{"key":"e_1_3_2_1_71_1","volume-title":"DGL: Dynamic Global-Local Prompt Tuning for Text-Video Retrieval. 
In AAAI.","author":"Yang Xiangpeng","year":"2024","unstructured":"Xiangpeng Yang, Linchao Zhu, Xiaohan Wang, and Yi Yang. 2024. DGL: Dynamic Global-Local Prompt Tuning for Text-Video Retrieval. In AAAI."},{"key":"e_1_3_2_1_72_1","doi-asserted-by":"crossref","unstructured":"Kaibin Tian Ruixiang Zhao Zijie Xin Bangxiang Lan and Xirong Li. 2024. Holistic Features are almost Sufficient for Text-to-Video Retrieval. In CVPR.","DOI":"10.1109\/CVPR52733.2024.01622"}],"event":{"name":"MM '25: The 33rd ACM International Conference on Multimedia","sponsor":["SIGMM ACM Special Interest Group on Multimedia"],"location":"Dublin Ireland","acronym":"MM '25"},"container-title":["Proceedings of the 33rd ACM International Conference on Multimedia"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3746027.3755476","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,12,9]],"date-time":"2025-12-09T19:24:41Z","timestamp":1765308281000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3746027.3755476"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,10,27]]},"references-count":72,"alternative-id":["10.1145\/3746027.3755476","10.1145\/3746027"],"URL":"https:\/\/doi.org\/10.1145\/3746027.3755476","relation":{},"subject":[],"published":{"date-parts":[[2025,10,27]]},"assertion":[{"value":"2025-10-27","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}