{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:14:57Z","timestamp":1750220097838,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":23,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,10,19]],"date-time":"2021-10-19T00:00:00Z","timestamp":1634601600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"the National Natural Science Foundation of China","award":["61371196"],"award-info":[{"award-number":["61371196"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,10,19]]},"DOI":"10.1145\/3487075.3487167","type":"proceedings-article","created":{"date-parts":[[2021,12,7]],"date-time":"2021-12-07T20:35:15Z","timestamp":1638909315000},"page":"1-7","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Super Visual Semantic Embedding for Cross-Modal Image-Text Retrieval"],"prefix":"10.1145","author":[{"given":"Zhixian","family":"Zeng","sequence":"first","affiliation":[{"name":"Department of The Sixty-third Research Institute, University of National University of Defense Technology, China"}]},{"given":"Jianjun","family":"Cao","sequence":"additional","affiliation":[{"name":"Department of The Sixty-third Research Institute, University of National University of Defense Technology, China"}]},{"given":"Guoquan","family":"Jiang","sequence":"additional","affiliation":[{"name":"Department of The Sixty-third Research Institute, University of National University of Defense Technology, China"}]},{"given":"Nianfeng","family":"Weng","sequence":"additional","affiliation":[{"name":"Department of The Sixty-third Research Institute, University of National University of Defense Technology, China"}]},{"given":"Yuxin","family":"Xu","sequence":"additional","affiliation":[{"name":"School of Computer &amp; Software, Nanjing University of Information Science &amp; Technology, China"}]},{"given":"Zibo","family":"Nie","sequence":"additional","affiliation":[{"name":"Department of The Sixty-third Research Institute, University of National University of Defense Technology, China"}]}],"member":"320","published-online":{"date-parts":[[2021,12,7]]},"reference":[{"issue":"2","key":"e_1_3_2_1_1_1","first-page":"327","article-title":"Survey on multimodal visual language representation learning","volume":"32","author":"Du Pengfei","year":"2021","unstructured":"Pengfei Du , Xiaoyong Li and Yali Gao ( 2021 ). Survey on multimodal visual language representation learning . Journal of Software , 32 ( 2 ), 327 - 348 . Pengfei Du, Xiaoyong Li and Yali Gao (2021). Survey on multimodal visual language representation learning. Journal of Software, 32(2), 327-348.","journal-title":"Journal of Software"},{"key":"e_1_3_2_1_2_1","volume-title":"Relations between two sets of variates. Breakthroughs in statistics. 23(7): 162-190","author":"Harold Hotelling","year":"1992","unstructured":"Hotelling Harold ( 1992 ). Relations between two sets of variates. Breakthroughs in statistics. 23(7): 162-190 . Hotelling Harold (1992). Relations between two sets of variates. Breakthroughs in statistics. 23(7): 162-190."},{"key":"e_1_3_2_1_3_1","volume-title":"Ruslan Salakhutdinov and Richard S. Zemel","author":"Kiros Ryan","year":"2014","unstructured":"Ryan Kiros , Ruslan Salakhutdinov and Richard S. Zemel ( 2014 ). Unifying visual-semantic embeddings with multimodal neural language Models. Computer Science . Ryan Kiros, Ruslan Salakhutdinov and Richard S. Zemel (2014). Unifying visual-semantic embeddings with multimodal neural language Models. Computer Science."},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01225-0_13"},{"key":"e_1_3_2_1_5_1","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. IEEE, Virtual.","author":"Jiacheng Chen","year":"2021","unstructured":"Chen Jiacheng , Hu Hexiang , Wu Hao , ( 2021 ). Learning the best pooling strategy for visual semantic embedding . Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. IEEE, Virtual. Chen Jiacheng, Hu Hexiang, Wu Hao, (2021). Learning the best pooling strategy for visual semantic embedding. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. IEEE, Virtual."},{"key":"e_1_3_2_1_6_1","volume-title":"2019 IEEE\/CVF International Conference on Computer Vision (ICCV). IEEE.","author":"Kunpeng Li","year":"2019","unstructured":"Li Kunpeng , Zhang, Yulun, Li, Ka , ( 2019 ). Visual semantic reasoning for image-text matching . 2019 IEEE\/CVF International Conference on Computer Vision (ICCV). IEEE. Li Kunpeng, Zhang, Yulun, Li, Ka, (2019). Visual semantic reasoning for image-text matching. 2019 IEEE\/CVF International Conference on Computer Vision (ICCV). IEEE."},{"key":"e_1_3_2_1_7_1","volume-title":"British Machine Vision Conference 2018, BMVC 2018","author":"Lee KuangHuei","year":"2018","unstructured":"KuangHuei Lee , Xi Chen , Gang Hua , ( 2018 ). Vse++: improving visual-semantic embeddings with hard negatives . in British Machine Vision Conference 2018, BMVC 2018 , Newcastle, UK. KuangHuei Lee, Xi Chen, Gang Hua, (2018). Vse++: improving visual-semantic embeddings with hard negatives. in British Machine Vision Conference 2018, BMVC 2018, Newcastle, UK."},{"key":"e_1_3_2_1_8_1","volume-title":"Graph Attention Networks. International Conference on Learning Representations.","author":"Petar Veli\u010dkovi\u0107","year":"2018","unstructured":"Veli\u010dkovi\u0107 Petar , Cucurull Guillem , Casanova Arantxa , ( 2018 ). Graph Attention Networks. International Conference on Learning Representations. Veli\u010dkovi\u0107 Petar, Cucurull Guillem, Casanova Arantxa, (2018). Graph Attention Networks. International Conference on Learning Representations."},{"key":"e_1_3_2_1_9_1","volume-title":"Deep Residual Learning for Image Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition. IEEE","author":"He Kaiming","year":"2016","unstructured":"Kaiming He , Xiangyu Zhang , Shaoqing Ren , ( 2016 ). Deep Residual Learning for Image Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition. IEEE , Las Vegas, NV, USA. Kaiming He, Xiangyu Zhang, Shaoqing Ren, (2016). Deep Residual Learning for Image Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, Las Vegas, NV, USA."},{"key":"e_1_3_2_1_10_1","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. IEEE, Virtual.","author":"Kalliatakis Grigorios Ronald","year":"2021","unstructured":"Stergiou, Alexandros, Poppe, Ronald and Kalliatakis Grigorios ( 2021 ). Refining activation downsampling with SoftPool . Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. IEEE, Virtual. Stergiou, Alexandros, Poppe, Ronald and Kalliatakis Grigorios (2021). Refining activation downsampling with SoftPool. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. IEEE, Virtual."},{"key":"e_1_3_2_1_11_1","volume-title":"Semi-Supervised Classification with Graph Convolutional Networks. 5th International Conference on Learning Representations, ICLR","author":"Thomas","year":"2017","unstructured":"Thomas N. Kipf and Max Welling (2017) . Semi-Supervised Classification with Graph Convolutional Networks. 5th International Conference on Learning Representations, ICLR 2017 . OpenReview.net, Toulon, France. Thomas N. Kipf and Max Welling (2017). Semi-Supervised Classification with Graph Convolutional Networks. 5th International Conference on Learning Representations, ICLR 2017. OpenReview.net, Toulon, France."},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P16-2034"},{"issue":"4","key":"e_1_3_2_1_13_1","doi-asserted-by":"crossref","first-page":"2728","DOI":"10.1109\/TIP.2019.2952085","article-title":"MAVA: Multi-level adaptive visual-textual alignment by cross-media bi-attention mechanism","volume":"29","author":"Yuxin Peng","year":"2020","unstructured":"Peng Yuxin , Qi Jinwei and Zhuo Yunkan ( 2020 ). MAVA: Multi-level adaptive visual-textual alignment by cross-media bi-attention mechanism . IEEE Transactions on Image Processing , 29 ( 4 ): 2728 - 2741 . Peng Yuxin, Qi Jinwei and Zhuo Yunkan (2020). MAVA: Multi-level adaptive visual-textual alignment by cross-media bi-attention mechanism. IEEE Transactions on Image Processing, 29(4): 2728-2741.","journal-title":"IEEE Transactions on Image Processing"},{"key":"e_1_3_2_1_14_1","volume-title":"Imram: Iterative matching with recurrent attention memory for cross-modal image-text retrieval. 2020 Conference on Computer Vision and Pattern Recognition","author":"Chen Hui","year":"2020","unstructured":"Hui Chen , Guiguang Ding , Xudong Li , ( 2020 ). Imram: Iterative matching with recurrent attention memory for cross-modal image-text retrieval. 2020 Conference on Computer Vision and Pattern Recognition . IEEE , Seattle, WA, USA . Hui Chen, Guiguang Ding, Xudong Li, (2020). Imram: Iterative matching with recurrent attention memory for cross-modal image-text retrieval. 2020 Conference on Computer Vision and Pattern Recognition. IEEE, Seattle, WA, USA."},{"key":"e_1_3_2_1_15_1","volume-title":"Adaptive offline quintuplet loss for image-text matching. Computer Vision-ECCV 2020-16th European Conference","author":"Chen Tianlang","year":"2020","unstructured":"Tianlang Chen , Jiajun Deng , and Jiebo Luo ( 2020 ). Adaptive offline quintuplet loss for image-text matching. Computer Vision-ECCV 2020-16th European Conference . Springer , Glasgow, UK . Tianlang Chen, Jiajun Deng, and Jiebo Luo (2020). Adaptive offline quintuplet loss for image-text matching. Computer Vision-ECCV 2020-16th European Conference. Springer, Glasgow, UK."},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i07.6823"},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2016.2577031"},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-016-0981-7"},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00636"},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"e_1_3_2_1_21_1","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin , Mingwei Chang , Kenton Lee , ( 2019 ). BERT: Pre-training of deep bidirectional transformers for language understanding . Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics , Minneapolis, MN, USA. Jacob Devlin, Mingwei Chang, Kenton Lee, (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, Minneapolis, MN, USA."},{"key":"e_1_3_2_1_22_1","volume-title":"Microsoft COCO: Common objects in context. Computer Vision-ECCV 2014-13th European Conference","author":"Lin Tsung-Yi","year":"2014","unstructured":"Tsung-Yi Lin , Michael Maire , Serge J. Belongi , ( 2014 ). Microsoft COCO: Common objects in context. Computer Vision-ECCV 2014-13th European Conference . Association for Springer , Zurich ,Switzerland. Tsung-Yi Lin, Michael Maire, Serge J. Belongi, (2014). Microsoft COCO: Common objects in context. Computer Vision-ECCV 2014-13th European Conference. Association for Springer, Zurich,Switzerland."},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00166"}],"event":{"name":"CSAE 2021: The 5th International Conference on Computer Science and Application Engineering","acronym":"CSAE 2021","location":"Sanya China"},"container-title":["Proceedings of the 5th International Conference on Computer Science and Application Engineering"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3487075.3487167","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3487075.3487167","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T18:10:11Z","timestamp":1750183811000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3487075.3487167"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,10,19]]},"references-count":23,"alternative-id":["10.1145\/3487075.3487167","10.1145\/3487075"],"URL":"https:\/\/doi.org\/10.1145\/3487075.3487167","relation":{},"subject":[],"published":{"date-parts":[[2021,10,19]]},"assertion":[{"value":"2021-12-07","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}