{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,10]],"date-time":"2025-12-10T08:56:20Z","timestamp":1765356980901,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":43,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,8,24]],"date-time":"2021-08-24T00:00:00Z","timestamp":1629763200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"National Key Research and Development Program of China","award":["2019YFC0 850202","2018YFC0825102"],"award-info":[{"award-number":["2019YFC0 850202","2018YFC0825102"]}]},{"name":"National Natural Science Foundation of China under Grants","award":["62006221"],"award-info":[{"award-number":["62006221"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,8,24]]},"DOI":"10.1145\/3460426.3463608","type":"proceedings-article","created":{"date-parts":[[2021,9,1]],"date-time":"2021-09-01T22:50:28Z","timestamp":1630536628000},"page":"135-143","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":11,"title":["Multi-Feature Graph Attention Network for Cross-Modal Video-Text Retrieval"],"prefix":"10.1145","author":[{"given":"Xiaoshuai","family":"Hao","sequence":"first","affiliation":[{"name":"Institute of Information Engineering, Chinese Academy of Sciences & University of Chinese Academy of Sciences, Beijing, China"}]},{"given":"Yucan","family":"Zhou","sequence":"additional","affiliation":[{"name":"Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China"}]},{"given":"Dayan","family":"Wu","sequence":"additional","affiliation":[{"name":"Institute of Information Engineering, Chinese Academy of Sciences, Beijing, 
China"}]},{"given":"Wanqian","family":"Zhang","sequence":"additional","affiliation":[{"name":"Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China"}]},{"given":"Bo","family":"Li","sequence":"additional","affiliation":[{"name":"Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China"}]},{"given":"Weiping","family":"Wang","sequence":"additional","affiliation":[{"name":"Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China"}]}],"member":"320","published-online":{"date-parts":[[2021,9]]},"reference":[{"key":"e_1_3_2_1_1_1","volume-title":"The End-of-End-to-End: A Video Understanding Pentathlon Challenge","author":"Albanie Samuel","year":"2020","unstructured":"Samuel Albanie , Yang Liu , Arsha Nagrani , Antoine Miech , Ernesto Coto , Ivan Laptev , Rahul Sukthankar , Bernard Ghanem , Andrew Zisserman , Valentin Gabeur , Chen Sun , Karteek Alahari , Cordelia Schmid , Shizhe Chen , Yida Zhao , Qin Jin , Kaixu Cui , Hui Liu , Chen Wang , Yudong Jiang , and Xiaoshuai Hao . 2020. The End-of-End-to-End: A Video Understanding Pentathlon Challenge ( 2020 ). CoRR , Vol . abs\/2008.00744 (2020). Samuel Albanie, Yang Liu, Arsha Nagrani, Antoine Miech, Ernesto Coto, Ivan Laptev, Rahul Sukthankar, Bernard Ghanem, Andrew Zisserman, Valentin Gabeur, Chen Sun, Karteek Alahari, Cordelia Schmid, Shizhe Chen, Yida Zhao, Qin Jin, Kaixu Cui, Hui Liu, Chen Wang, Yudong Jiang, and Xiaoshuai Hao. 2020. The End-of-End-to-End: A Video Understanding Pentathlon Challenge (2020). CoRR, Vol. abs\/2008.00744 (2020)."},{"key":"e_1_3_2_1_2_1","volume-title":"NetVLAD: CNN Architecture for Weakly Supervised Place Recognition. In IEEE Conference on Computer Vision and Pattern Recognition. 5297--5307","author":"Arandjelovic Relja","year":"2016","unstructured":"Relja Arandjelovic , Petr Gronat , Akihiko Torii , Tomas Pajdla , and Josef Sivic . 2016 . NetVLAD: CNN Architecture for Weakly Supervised Place Recognition. 
In IEEE Conference on Computer Vision and Pattern Recognition. 5297--5307 . Relja Arandjelovic, Petr Gronat, Akihiko Torii, Tomas Pajdla, and Josef Sivic. 2016. NetVLAD: CNN Architecture for Weakly Supervised Place Recognition. In IEEE Conference on Computer Vision and Pattern Recognition. 5297--5307."},{"key":"e_1_3_2_1_3_1","volume-title":"Collecting Highly Parallel Data for Paraphrase Evaluation. In Annual Meeting of the Association for Computational Linguistics. 190--200","author":"Chen David L","year":"2011","unstructured":"David L Chen and William B Dolan . 2011 . Collecting Highly Parallel Data for Paraphrase Evaluation. In Annual Meeting of the Association for Computational Linguistics. 190--200 . David L Chen and William B Dolan. 2011. Collecting Highly Parallel Data for Paraphrase Evaluation. In Annual Meeting of the Association for Computational Linguistics. 190--200."},{"key":"e_1_3_2_1_4_1","volume-title":"Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning. In IEEE Conference on Computer Vision and Pattern Recognition.","author":"Chen Shizhe","year":"2020","unstructured":"Shizhe Chen , Yida Zhao , Qin Jin , and Qi Wu . 2020 . Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning. In IEEE Conference on Computer Vision and Pattern Recognition. Shizhe Chen, Yida Zhao, Qin Jin, and Qi Wu. 2020. Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning. In IEEE Conference on Computer Vision and Pattern Recognition."},{"key":"e_1_3_2_1_5_1","volume-title":"Word2VisualVec: Image and Video to Sentence Matching by Visual Feature Prediction. arXiv:1604.06838","author":"Dong Jianfeng","year":"2016","unstructured":"Jianfeng Dong , Xirong Li , and Cees G M Snoek . 2016. Word2VisualVec: Image and Video to Sentence Matching by Visual Feature Prediction. arXiv:1604.06838 ( 2016 ). Jianfeng Dong, Xirong Li, and Cees G M Snoek. 2016. Word2VisualVec: Image and Video to Sentence Matching by Visual Feature Prediction. 
arXiv:1604.06838 (2016)."},{"key":"e_1_3_2_1_6_1","volume-title":"Snoek","author":"Dong Jianfeng","year":"2018","unstructured":"Jianfeng Dong , Xirong Li , and Cees G. M . Snoek . 2018 . Predicting Visual Features From Text for Image and Video Caption Retrieval. IEEE Transactions on Multimedia ( 2018), 3377--3388. Jianfeng Dong, Xirong Li, and Cees G. M. Snoek. 2018. Predicting Visual Features From Text for Image and Video Caption Retrieval. IEEE Transactions on Multimedia (2018), 3377--3388."},{"key":"e_1_3_2_1_7_1","volume-title":"Dual Encoding for Zero-Example Video Retrieval. In IEEE Conference on Computer Vision and Pattern Recognition. 9346--9355","author":"Dong Jianfeng","year":"2019","unstructured":"Jianfeng Dong , Xirong Li , Chaoxi Xu , Shouling Ji , Yuan He , Gang Yang , and Xun Wang . 2019 . Dual Encoding for Zero-Example Video Retrieval. In IEEE Conference on Computer Vision and Pattern Recognition. 9346--9355 . Jianfeng Dong, Xirong Li, Chaoxi Xu, Shouling Ji, Yuan He, Gang Yang, and Xun Wang. 2019. Dual Encoding for Zero-Example Video Retrieval. In IEEE Conference on Computer Vision and Pattern Recognition. 9346--9355."},{"key":"e_1_3_2_1_8_1","volume-title":"British Machine Vision Conference.","author":"Faghri Fartash","year":"2018","unstructured":"Fartash Faghri , David J Fleet , Jamie Ryan Kiros , and Sanja Fidler . 2018 . VSE+: Improving Visual-Semantic Embeddings with Hard Negatives . In British Machine Vision Conference. Fartash Faghri, David J Fleet, Jamie Ryan Kiros, and Sanja Fidler. 2018. VSE+: Improving Visual-Semantic Embeddings with Hard Negatives. In British Machine Vision Conference."},{"key":"e_1_3_2_1_9_1","unstructured":"Andrea Frome Greg S Corrado Jon Shlens Samy Bengio Jeffrey Dean Marcaurelio Ranzato and Tomas Mikolov. 2013. DeViSE: A Deep Visual-Semantic Embedding Model. In Neural Information Processing Systems. 2121--2129.  Andrea Frome Greg S Corrado Jon Shlens Samy Bengio Jeffrey Dean Marcaurelio Ranzato and Tomas Mikolov. 
2013. DeViSE: A Deep Visual-Semantic Embedding Model. In Neural Information Processing Systems. 2121--2129."},{"key":"e_1_3_2_1_10_1","volume-title":"Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval with Generative Models. In IEEE Conference on Computer Vision and Pattern Recognition. 7181--7189","author":"Gu Jiuxiang","year":"2018","unstructured":"Jiuxiang Gu , Jianfei Cai , Shafiq Joty , Li Niu , and Gang Wang . 2018 . Look , Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval with Generative Models. In IEEE Conference on Computer Vision and Pattern Recognition. 7181--7189 . Jiuxiang Gu, Jianfei Cai, Shafiq Joty, Li Niu, and Gang Wang. 2018. Look, Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval with Generative Models. In IEEE Conference on Computer Vision and Pattern Recognition. 7181--7189."},{"key":"e_1_3_2_1_11_1","volume-title":"What Matters: Attentive and Relational Feature Aggreggation Network for Video-Text Retrieval. In IEEE International Conference on Multimedia and Expo.","author":"Hao Xiaoshuai","year":"2021","unstructured":"Xiaoshuai Hao , Yucan Zhou , Dayan Wu , Wanqian Zhang , Bo Li , Weiping Wang , and Dan Meng . 2021 . What Matters: Attentive and Relational Feature Aggreggation Network for Video-Text Retrieval. In IEEE International Conference on Multimedia and Expo. Xiaoshuai Hao, Yucan Zhou, Dayan Wu, Wanqian Zhang, Bo Li, Weiping Wang, and Dan Meng. 2021. What Matters: Attentive and Relational Feature Aggreggation Network for Video-Text Retrieval. In IEEE International Conference on Multimedia and Expo."},{"key":"e_1_3_2_1_12_1","volume-title":"Learning Semantic Concepts and Order for Image and Sentence Matching. In IEEE Conference on Computer Vision and Pattern Recognition. 6163--6171","author":"Huang Yan","year":"2018","unstructured":"Yan Huang , Qi Wu , Chunfeng Song , and Liang Wang . 2018 . Learning Semantic Concepts and Order for Image and Sentence Matching. 
In IEEE Conference on Computer Vision and Pattern Recognition. 6163--6171 . Yan Huang, Qi Wu, Chunfeng Song, and Liang Wang. 2018. Learning Semantic Concepts and Order for Image and Sentence Matching. In IEEE Conference on Computer Vision and Pattern Recognition. 6163--6171."},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/3366423.3380043"},{"key":"e_1_3_2_1_14_1","unstructured":"Ryan Kiros Ruslan Salakhutdinov and Zemel. 2014. Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models. In Neural Information Processing Systems.  Ryan Kiros Ruslan Salakhutdinov and Zemel. 2014. Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models. In Neural Information Processing Systems."},{"key":"e_1_3_2_1_15_1","volume-title":"International Conference on Acoustics, Speech and Signal Processing.","author":"Kobayashi Tetsunori","year":"2016","unstructured":"Tetsunori Kobayashi . 2016 . Improving semantic video indexing: Efforts in Waseda TRECVID 2015 SIN system . In International Conference on Acoustics, Speech and Signal Processing. Tetsunori Kobayashi. 2016. Improving semantic video indexing: Efforts in Waseda TRECVID 2015 SIN system. In International Conference on Acoustics, Speech and Signal Processing."},{"key":"e_1_3_2_1_16_1","volume-title":"Adversarial Multimodal Representation Learning for Click-Through Rate Prediction. In International World Wide Web Conferences. 827--836","author":"Li Xiang","year":"2020","unstructured":"Xiang Li , Chao Wang , Jiwei Tan , Xiaoyi Zeng , Dan Ou , and Bo Zheng . 2020 . Adversarial Multimodal Representation Learning for Click-Through Rate Prediction. In International World Wide Web Conferences. 827--836 . Xiang Li, Chao Wang, Jiwei Tan, Xiaoyi Zeng, Dan Ou, and Bo Zheng. 2020. Adversarial Multimodal Representation Learning for Click-Through Rate Prediction. In International World Wide Web Conferences. 
827--836."},{"key":"e_1_3_2_1_17_1","unstructured":"Xirong Li Chaoxi Xu Gang Yang Zhineng Chen and Jianfeng Dong. 2019. W2VV+: Fully Deep Learning for Ad-hoc Video Search. In ACM Multimedia. 1786--1794.  Xirong Li Chaoxi Xu Gang Yang Zhineng Chen and Jianfeng Dong. 2019. W2VV+: Fully Deep Learning for Ad-hoc Video Search. In ACM Multimedia. 1786--1794."},{"key":"e_1_3_2_1_18_1","volume-title":"Learning Semantic Concepts and Order for Image and Sentence Matching","author":"Liang Wang","year":"2017","unstructured":"Wang Liang . 2017. Learning Semantic Concepts and Order for Image and Sentence Matching . IEEE Transactions on Pattern Analysis and Machine Intelligence ( 2017 ). Wang Liang. 2017. Learning Semantic Concepts and Order for Image and Sentence Matching. IEEE Transactions on Pattern Analysis and Machine Intelligence (2017)."},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/3366423.3380010"},{"volume-title":"British Machine Vision Conference.","author":"Liu Y.","key":"e_1_3_2_1_20_1","unstructured":"Y. Liu , S. Albanie , A. Nagrani , and A. Zisserman . 2019. Use What You Have: Video retrieval using representations from collaborative experts . In British Machine Vision Conference. Y. Liu, S. Albanie, A. Nagrani, and A. Zisserman. 2019. Use What You Have: Video retrieval using representations from collaborative experts. In British Machine Vision Conference."},{"key":"e_1_3_2_1_21_1","volume-title":"Query and Keyframe Representations for Ad-hoc Video Search. In ACM International Conference on Multimedia Retrieval. 407--411","author":"Markatopoulou Foteini","year":"2017","unstructured":"Foteini Markatopoulou , Damianos Galanopoulos , Vasileios Mezaris , and Ioannis Patras . 2017 . Query and Keyframe Representations for Ad-hoc Video Search. In ACM International Conference on Multimedia Retrieval. 407--411 . Foteini Markatopoulou, Damianos Galanopoulos, Vasileios Mezaris, and Ioannis Patras. 2017. 
Query and Keyframe Representations for Ad-hoc Video Search. In ACM International Conference on Multimedia Retrieval. 407--411."},{"key":"e_1_3_2_1_22_1","volume-title":"IEEE Conference on Computer Vision and Pattern Recognition Workshops.","author":"Miech Antoine","year":"2017","unstructured":"Antoine Miech , Ivan Laptev , and Josef Sivic . 2017 . Learnable pooling with Context Gating for video classification . In IEEE Conference on Computer Vision and Pattern Recognition Workshops. Antoine Miech, Ivan Laptev, and Josef Sivic. 2017. Learnable pooling with Context Gating for video classification. In IEEE Conference on Computer Vision and Pattern Recognition Workshops."},{"key":"e_1_3_2_1_23_1","volume-title":"Learning a Text-Video Embedding from Incomplete and Heterogeneous Data. arXiv:1804.02516","author":"Miech Antoine","year":"2018","unstructured":"Antoine Miech , Ivan Laptev , and Josef Sivic . 2018. Learning a Text-Video Embedding from Incomplete and Heterogeneous Data. arXiv:1804.02516 ( 2018 ). Antoine Miech, Ivan Laptev, and Josef Sivic. 2018. Learning a Text-Video Embedding from Incomplete and Heterogeneous Data. arXiv:1804.02516 (2018)."},{"key":"e_1_3_2_1_24_1","volume-title":"Efficient Estimation of Word Representations in Vector Space. In International Conference on Learning Representations.","author":"Mikolov Tomas","year":"2013","unstructured":"Tomas Mikolov , Kai Chen , Greg Corrado , and Jeffrey Dean . 2013 . Efficient Estimation of Word Representations in Vector Space. In International Conference on Learning Representations. Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient Estimation of Word Representations in Vector Space. In International Conference on Learning Representations."},{"key":"e_1_3_2_1_25_1","volume-title":"Learning Joint Embedding with Multimodal Cues for Cross-Modal Video-Text Retrieval. In ACM International Conference on Multimedia Retrieval. 
19--27","author":"Mithun Niluthpol Chowdhury","year":"2018","unstructured":"Niluthpol Chowdhury Mithun , Juncheng Li , Florian Metze , and Amit K Roychowdhury . 2018 . Learning Joint Embedding with Multimodal Cues for Cross-Modal Video-Text Retrieval. In ACM International Conference on Multimedia Retrieval. 19--27 . Niluthpol Chowdhury Mithun, Juncheng Li, Florian Metze, and Amit K Roychowdhury. 2018. Learning Joint Embedding with Multimodal Cues for Cross-Modal Video-Text Retrieval. In ACM International Conference on Multimedia Retrieval. 19--27."},{"key":"e_1_3_2_1_26_1","volume-title":"Dual Attention Networks for Multimodal Reasoning and Matching. In IEEE Conference on Computer Vision and Pattern Recognition. 2156--2164","author":"Nam Hyeonseob","year":"2017","unstructured":"Hyeonseob Nam , Jungwoo Ha , and Jeonghee Kim . 2017 . Dual Attention Networks for Multimodal Reasoning and Matching. In IEEE Conference on Computer Vision and Pattern Recognition. 2156--2164 . Hyeonseob Nam, Jungwoo Ha, and Jeonghee Kim. 2017. Dual Attention Networks for Multimodal Reasoning and Matching. In IEEE Conference on Computer Vision and Pattern Recognition. 2156--2164."},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298682"},{"key":"e_1_3_2_1_28_1","volume-title":"Polysemous Visual-Semantic Embedding for Cross-Modal Retrieval. In IEEE Conference on Computer Vision and Pattern Recognition. 1979--1988","author":"Song Yale","year":"2019","unstructured":"Yale Song and Mohammad Soleymani . 2019 . Polysemous Visual-Semantic Embedding for Cross-Modal Retrieval. In IEEE Conference on Computer Vision and Pattern Recognition. 1979--1988 . Yale Song and Mohammad Soleymani. 2019. Polysemous Visual-Semantic Embedding for Cross-Modal Retrieval. In IEEE Conference on Computer Vision and Pattern Recognition. 1979--1988."},{"key":"e_1_3_2_1_29_1","volume-title":"Order-Embeddings of Images and Language. 
In International Conference on Learning Representations.","author":"Vendrov Ivan","year":"2017","unstructured":"Ivan Vendrov , Ryan Kiros , Sanja Fidler , and Raquel Urtasun . 2017 . Order-Embeddings of Images and Language. In International Conference on Learning Representations. Ivan Vendrov, Ryan Kiros, Sanja Fidler, and Raquel Urtasun. 2017. Order-Embeddings of Images and Language. In International Conference on Learning Representations."},{"key":"e_1_3_2_1_30_1","volume-title":"Learning Deep Structure-Preserving Image-Text Embeddings. In IEEE Conference on Computer Vision and Pattern Recognition. 5005--5013","author":"Wang Liwei","year":"2016","unstructured":"Liwei Wang , Yin Li , and Svetlana Lazebnik . 2016 . Learning Deep Structure-Preserving Image-Text Embeddings. In IEEE Conference on Computer Vision and Pattern Recognition. 5005--5013 . Liwei Wang, Yin Li, and Svetlana Lazebnik. 2016. Learning Deep Structure-Preserving Image-Text Embeddings. In IEEE Conference on Computer Vision and Pattern Recognition. 5005--5013."},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00928"},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/3078971.3078989"},{"key":"e_1_3_2_1_33_1","volume-title":"Zero-Shot Event Detection Using Multi-modal Fusion of Weakly Supervised Concepts. In IEEE Conference on Computer Vision and Pattern Recognition. 2665--2672","author":"Wu Shuang","year":"2014","unstructured":"Shuang Wu , Sravanthi Bondugula , Florian Luisier , Xiaodan Zhuang , and Pradeep Natarajan . 2014 . Zero-Shot Event Detection Using Multi-modal Fusion of Weakly Supervised Concepts. In IEEE Conference on Computer Vision and Pattern Recognition. 2665--2672 . Shuang Wu, Sravanthi Bondugula, Florian Luisier, Xiaodan Zhuang, and Pradeep Natarajan. 2014. Zero-Shot Event Detection Using Multi-modal Fusion of Weakly Supervised Concepts. In IEEE Conference on Computer Vision and Pattern Recognition. 
2665--2672."},{"key":"e_1_3_2_1_34_1","volume-title":"MSR-VTT: A Large Video Description Dataset for Bridging Video and Language. In IEEE Conference on Computer Vision and Pattern Recognition. 5288--5296","author":"Xu Jun","year":"2016","unstructured":"Jun Xu , Tao Mei , Ting Yao , and Yong Rui . 2016 . MSR-VTT: A Large Video Description Dataset for Bridging Video and Language. In IEEE Conference on Computer Vision and Pattern Recognition. 5288--5296 . Jun Xu, Tao Mei, Ting Yao, and Yong Rui. 2016. MSR-VTT: A Large Video Description Dataset for Bridging Video and Language. In IEEE Conference on Computer Vision and Pattern Recognition. 5288--5296."},{"key":"e_1_3_2_1_35_1","volume":"201","author":"Xu Ran","unstructured":"Ran Xu , Caiming Xiong , Wei Chen , and Jason J Corso. 201 5. Jointly modeling deep video and compositional text to bridge vision and language in a unified framework. In Association for the Advancement of Artificial Intelligence. 2346--2352. Ran Xu, Caiming Xiong, Wei Chen, and Jason J Corso. 2015. Jointly modeling deep video and compositional text to bridge vision and language in a unified framework. In Association for the Advancement of Artificial Intelligence. 2346--2352.","journal-title":"Jason J Corso."},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/3372278.3390673"},{"key":"e_1_3_2_1_37_1","volume-title":"Learning Joint Representations of Videos and Sentences with Web Image Search. In European Conference on Computer Vision.","author":"Yokoya Naokazu","year":"2016","unstructured":"Naokazu Yokoya . 2016 . Learning Joint Representations of Videos and Sentences with Web Image Search. In European Conference on Computer Vision. Naokazu Yokoya. 2016. Learning Joint Representations of Videos and Sentences with Web Image Search. In European Conference on Computer Vision."},{"key":"e_1_3_2_1_38_1","doi-asserted-by":"crossref","unstructured":"Jin Yuan Zhengjun Zha Yaotao Zheng Meng Wang Xiangdong Zhou and Tatseng Chua. 2011a. 
Learning concept bundles for video search with complex queries. In ACM Multimedia. 453--462.  Jin Yuan Zhengjun Zha Yaotao Zheng Meng Wang Xiangdong Zhou and Tatseng Chua. 2011a. Learning concept bundles for video search with complex queries. In ACM Multimedia. 453--462.","DOI":"10.1145\/2072298.2072357"},{"key":"e_1_3_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2011.2168813"},{"key":"e_1_3_2_1_40_1","volume-title":"Fast and Multilevel Semantic-Preserving Discrete Hashing. In British Machine Vision Conference. 157","author":"Zhang Wanqian","year":"2019","unstructured":"Wanqian Zhang , Dayan Wu , Jing Liu , Bo Li , Xiaoyan Gu , Weiping Wang , and Dan Meng . 2019 . Fast and Multilevel Semantic-Preserving Discrete Hashing. In British Machine Vision Conference. 157 . Wanqian Zhang, Dayan Wu, Jing Liu, Bo Li, Xiaoyan Gu, Weiping Wang, and Dan Meng. 2019. Fast and Multilevel Semantic-Preserving Discrete Hashing. In British Machine Vision Conference. 157."},{"key":"e_1_3_2_1_41_1","volume-title":"Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval .","author":"Zhang Wanqian","year":"2020","unstructured":"Wanqian Zhang , Dayan Wu , Yu Zhou , Bo Li , Weiping Wang , and Dan Meng . 2020 a. Binary Neural Network Hashing for Image Retrieval . In Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval . Wanqian Zhang, Dayan Wu, Yu Zhou, Bo Li, Weiping Wang, and Dan Meng. 2020 a. Binary Neural Network Hashing for Image Retrieval. In Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval ."},{"key":"e_1_3_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/3394171.3414028"},{"key":"e_1_3_2_1_43_1","volume-title":"Asymmetric Deep Hashing for Efficient Hash Code Compression. In ACM International Conference on Multimedia. 
763--771","author":"Zhao Shu","year":"2020","unstructured":"Shu Zhao , Dayan Wu , Wanqian Zhang , Yu Zhou , Bo Li , and Weiping Wang . 2020 . Asymmetric Deep Hashing for Efficient Hash Code Compression. In ACM International Conference on Multimedia. 763--771 . Shu Zhao, Dayan Wu, Wanqian Zhang, Yu Zhou, Bo Li, and Weiping Wang. 2020. Asymmetric Deep Hashing for Efficient Hash Code Compression. In ACM International Conference on Multimedia. 763--771."}],"event":{"name":"ICMR '21: International Conference on Multimedia Retrieval","sponsor":["SIGMM ACM Special Interest Group on Multimedia"],"location":"Taipei Taiwan","acronym":"ICMR '21"},"container-title":["Proceedings of the 2021 International Conference on Multimedia Retrieval"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3460426.3463608","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3460426.3463608","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:17:03Z","timestamp":1750191423000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3460426.3463608"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,8,24]]},"references-count":43,"alternative-id":["10.1145\/3460426.3463608","10.1145\/3460426"],"URL":"https:\/\/doi.org\/10.1145\/3460426.3463608","relation":{},"subject":[],"published":{"date-parts":[[2021,8,24]]},"assertion":[{"value":"2021-09-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}