{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:25:18Z","timestamp":1750220718111,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":27,"publisher":"ACM","license":[{"start":{"date-parts":[[2020,6,8]],"date-time":"2020-06-08T00:00:00Z","timestamp":1591574400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"EU Horizon 2020 research and innovation programme","award":["H2020-780656 ReTV"],"award-info":[{"award-number":["H2020-780656 ReTV"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2020,6,8]]},"DOI":"10.1145\/3372278.3390737","type":"proceedings-article","created":{"date-parts":[[2020,6,2]],"date-time":"2020-06-02T04:35:27Z","timestamp":1591072527000},"page":"336-340","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":20,"title":["Attention Mechanisms, Signal Encodings and Fusion Strategies for Improved Ad-hoc Video Search with Dual Encoding Networks"],"prefix":"10.1145","author":[{"given":"Damianos","family":"Galanopoulos","sequence":"first","affiliation":[{"name":"CERTH-ITI, Thermi-Thessaloniki, Greece"}]},{"given":"Vasileios","family":"Mezaris","sequence":"additional","affiliation":[{"name":"CERTH-ITI, Thermi-Thessaloniki, Greece"}]}],"member":"320","published-online":{"date-parts":[[2020,6,8]]},"reference":[{"key":"e_1_3_2_1_1_1","volume-title":"TRECVID 2019: An evaluation campaign to benchmark Video Activity Detection, Video Captioning and Matching, and Video Search & retrieval. 
In TRECVID 2019 Workshop","author":"Awad George","year":"2019","unstructured":"George Awad , Asad Butt , Keith Curtis , Yooyoung Lee , Jonathan Fiscus , Afzal Godil , Andrew Delgado , Alan F Smeaton , Yvette Graham , Wessel Kraaij , and Georges Qu\u00e9not . 2019 . TRECVID 2019: An evaluation campaign to benchmark Video Activity Detection, Video Captioning and Matching, and Video Search & retrieval. In TRECVID 2019 Workshop . Gaithersburg, MD, USA. NIST, USA. George Awad, Asad Butt, Keith Curtis, Yooyoung Lee, Jonathan Fiscus, Afzal Godil, Andrew Delgado, Alan F Smeaton, Yvette Graham, Wessel Kraaij, and Georges Qu\u00e9not. 2019. TRECVID 2019: An evaluation campaign to benchmark Video Activity Detection, Video Captioning and Matching, and Video Search & retrieval. In TRECVID 2019 Workshop. Gaithersburg, MD, USA. NIST, USA."},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/K18-1030"},{"key":"e_1_3_2_1_3_1","volume-title":"TRECVID 2018 Workshop","author":"Bastan Muhammet","year":"2018","unstructured":"Muhammet Bastan , Xiangxi Shi , Jiuxiang Gu , Zhao Heng , Chen Zhuo , Dennis Sng , and Alex Kot . 2018 . NTU ROSE Lab at TRECVID 2018: Ad-hoc Video Search and Video to Text . In TRECVID 2018 Workshop . Gaithersburg, MD, USA. NIST, USA. Muhammet Bastan, Xiangxi Shi, Jiuxiang Gu, Zhao Heng, Chen Zhuo, Dennis Sng, and Alex Kot. 2018. NTU ROSE Lab at TRECVID 2018: Ad-hoc Video Search and Video to Text. In TRECVID 2018 Workshop. Gaithersburg, MD, USA. NIST, USA."},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1179"},{"key":"e_1_3_2_1_5_1","volume-title":"BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv preprint arXiv:1810.04805","author":"Devlin Jacob","year":"2018","unstructured":"Jacob Devlin , Ming-Wei Chang , Kenton Lee , and Kristina Toutanova . 2018 . BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv preprint arXiv:1810.04805 (2018). 
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv preprint arXiv:1810.04805 (2018)."},{"key":"e_1_3_2_1_6_1","volume-title":"Snoek","author":"Dong Jianfeng","year":"2016","unstructured":"Jianfeng Dong , Xirong Li , and Cees G. M. Snoek . 2016 . Word2VisualVec: Image and video to sentence matching by visual feature prediction. arXiv preprint arXiv:1604.06838 (2016). Jianfeng Dong, Xirong Li, and Cees G. M. Snoek. 2016. Word2VisualVec: Image and video to sentence matching by visual feature prediction. arXiv preprint arXiv:1604.06838 (2016)."},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2018.2832602"},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00957"},{"key":"e_1_3_2_1_9_1","volume-title":"Proceedings of the British Machine Vision Conference (BMVC) .","author":"Faghri Fartash","year":"2018","unstructured":"Fartash Faghri , David J Fleet , Jamie Ryan Kiros , and Sanja Fidler . 2018 . VSE++: Improving Visual-Semantic Embeddings with Hard Negatives . In Proceedings of the British Machine Vision Conference (BMVC) . Fartash Faghri, David J Fleet, Jamie Ryan Kiros, and Sanja Fidler. 2018. VSE++: Improving Visual-Semantic Embeddings with Hard Negatives. 
In Proceedings of the British Machine Vision Conference (BMVC) ."},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCVW.2019.00233"},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/2578726.2578746"},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2016.2627563"},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_1_14_1","volume-title":"TRECVID 2016 Workshop","author":"Le Duy-Dinh","year":"2016","unstructured":"Duy-Dinh Le , Sang Phan , Vinh-Tiep Nguyen , Benjamin Renoust , Tuan A Nguyen , Van-Nam Hoang , Thanh Duc Ngo , Minh-Triet Tran , Yuki Watanabe , Martin Klinkigt , et al. 2016 . NII-HITACHI-UIT at TRECVID 2016 . In TRECVID 2016 Workshop . Gaithersburg, MD, USA . Duy-Dinh Le, Sang Phan, Vinh-Tiep Nguyen, Benjamin Renoust, Tuan A Nguyen, Van-Nam Hoang, Thanh Duc Ngo, Minh-Triet Tran, Yuki Watanabe, Martin Klinkigt, et al. 2016. NII-HITACHI-UIT at TRECVID 2016. In TRECVID 2016 Workshop. Gaithersburg, MD, USA ."},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/3343031.3350906"},{"key":"e_1_3_2_1_16_1","volume-title":"TRECVID 2019 Workshop","author":"Li Xirong","year":"2019","unstructured":"Xirong Li , Jinde Ye , Chaoxi Xu , Shanjinwen Yun , Leimin Zhang , Xun Wang , Rui Qian , and Jianfeng Dong . 2019b. Renmin University of China and Zhejiang Gongshang University at TRECVID 2019: Learn to Search and Describe Videos . In TRECVID 2019 Workshop . Gaithersburg, MD, USA . Xirong Li, Jinde Ye, Chaoxi Xu, Shanjinwen Yun, Leimin Zhang, Xun Wang, Rui Qian, and Jianfeng Dong. 2019b. Renmin University of China and Zhejiang Gongshang University at TRECVID 2019: Learn to Search and Describe Videos. In TRECVID 2019 Workshop. 
Gaithersburg, MD, USA ."},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.502"},{"key":"e_1_3_2_1_18_1","volume-title":"Proceedings of the 5th International Conference on Learning Representations .","author":"Lin Zhouhan","year":"2017","unstructured":"Zhouhan Lin , Minwei Feng , Cicero Nogueira dos Santos , Mo Yu , Bing Xiang , Bowen Zhou , and Yoshua Bengio . 2017 . A structured self-attentive sentence embedding . In Proceedings of the 5th International Conference on Learning Representations . Zhouhan Lin, Minwei Feng, Cicero Nogueira dos Santos, Mo Yu, Bing Xiang, Bowen Zhou, and Yoshua Bengio. 2017. A structured self-attentive sentence embedding. In Proceedings of the 5th International Conference on Learning Representations ."},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/3078971.3079041"},{"key":"e_1_3_2_1_20_1","volume-title":"1st International Conference on Learning Representations, Workshop Track Proceedings (ICLR '13)","author":"Mikolov Tomas","year":"2013","unstructured":"Tomas Mikolov , G. S. Corrado, Kai Chen , and Jeffrey Dean . 2013 . Efficient estimation of word representations in vector space . In 1st International Conference on Learning Representations, Workshop Track Proceedings (ICLR '13) . Tomas Mikolov, G. S. Corrado, Kai Chen, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. In 1st International Conference on Learning Representations, Workshop Track Proceedings (ICLR '13)."},{"volume-title":"Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval (ICMR '18)","author":"Mithun Niluthpol Chowdhury","key":"e_1_3_2_1_21_1","unstructured":"Niluthpol Chowdhury Mithun , Juncheng Li , Florian Metze , and Amit K. Roy-Chowdhury. 2018. Learning joint embedding with multimodal cues for cross-modal video-text retrieval . In Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval (ICMR '18) . ACM, 19--27. 
Niluthpol Chowdhury Mithun, Juncheng Li, Florian Metze, and Amit K. Roy-Chowdhury. 2018. Learning joint embedding with multimodal cues for cross-modal video-text retrieval. In Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval (ICMR '18). ACM, 19--27."},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.01186"},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/1178677.1178722"},{"volume-title":"Detecting Events and Describing Video. In TRECVID 2017 Workshop","author":"Snoek Cees GM","key":"e_1_3_2_1_24_1","unstructured":"Cees GM Snoek , Xirong Li , Chaoxi Xu , and C. Dennis Koelma . 2017. University of Amsterdam and Renmin University at TRECVID 2017: Searching Video, Detecting Events and Describing Video. In TRECVID 2017 Workshop . Gaithersburg, MD, USA . Cees GM Snoek, Xirong Li, Chaoxi Xu, and C. Dennis Koelma. 2017. University of Amsterdam and Renmin University at TRECVID 2017: Searching Video, Detecting Events and Describing Video. In TRECVID 2017 Workshop. Gaithersburg, MD, USA ."},{"key":"e_1_3_2_1_25_1","volume-title":"TRECVID 2017 Workshop","author":"Ueki Kazuya","year":"2017","unstructured":"Kazuya Ueki , Yu Nakagome , Koji Hirakawa , Kotaro Kikuchi , Tetsuji Ogawa , and Tetsunori Kobayashi . 2017 . Waseda Meisei at TRECVID 2017: Ad-hoc video search . In TRECVID 2017 Workshop . Gaithersburg, MD, USA . Kazuya Ueki, Yu Nakagome, Koji Hirakawa, Kotaro Kikuchi, Tetsuji Ogawa, and Tetsunori Kobayashi. 2017. Waseda Meisei at TRECVID 2017: Ad-hoc video search. In TRECVID 2017 Workshop. 
Gaithersburg, MD, USA ."},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.634"},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.571"}],"event":{"name":"ICMR '20: International Conference on Multimedia Retrieval","sponsor":["SIGMM ACM Special Interest Group on Multimedia"],"location":"Dublin Ireland","acronym":"ICMR '20"},"container-title":["Proceedings of the 2020 International Conference on Multimedia Retrieval"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3372278.3390737","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3372278.3390737","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T22:33:25Z","timestamp":1750199605000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3372278.3390737"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,6,8]]},"references-count":27,"alternative-id":["10.1145\/3372278.3390737","10.1145\/3372278"],"URL":"https:\/\/doi.org\/10.1145\/3372278.3390737","relation":{},"subject":[],"published":{"date-parts":[[2020,6,8]]},"assertion":[{"value":"2020-06-08","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}