{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:25:18Z","timestamp":1750220718132,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":44,"publisher":"ACM","license":[{"start":{"date-parts":[[2020,6,8]],"date-time":"2020-06-08T00:00:00Z","timestamp":1591574400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2020,6,8]]},"DOI":"10.1145\/3372278.3390709","type":"proceedings-article","created":{"date-parts":[[2020,6,2]],"date-time":"2020-06-02T04:35:27Z","timestamp":1591072527000},"page":"9-15","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":14,"title":["Visual Relations Augmented Cross-modal Retrieval"],"prefix":"10.1145","author":[{"given":"Yutian","family":"Guo","sequence":"first","affiliation":[{"name":"Fudan University, Shanghai, China"}]},{"given":"Jingjing","family":"Chen","sequence":"additional","affiliation":[{"name":"Fudan University, Shanghai, China"}]},{"given":"Hao","family":"Zhang","sequence":"additional","affiliation":[{"name":"City University of Hong Kong, Hong Kong, China"}]},{"given":"Yu-Gang","family":"Jiang","sequence":"additional","affiliation":[{"name":"Fudan University, Shanghai, China"}]}],"member":"320","published-online":{"date-parts":[[2020,6,8]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/3240508.3240627"},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.5555\/3288251.3288318"},{"key":"e_1_3_2_1_3_1","volume-title":"Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio.","author":"Cho Kyunghyun","year":"2014","unstructured":"Kyunghyun Cho , Bart Van Merri\u00ebnboer , Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014 . Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078 (2014). Kyunghyun Cho, Bart Van Merri\u00ebnboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078 (2014)."},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.352"},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.201"},{"key":"e_1_3_2_1_6_1","volume-title":"Jamie Ryan Kiros, and Sanja Fidler","author":"Faghri Fartash","year":"2018","unstructured":"Fartash Faghri , David J Fleet , Jamie Ryan Kiros, and Sanja Fidler . 2018 . VSE+: Improving Visual-Semantic Embeddings with Hard Negatives . (2018). Fartash Faghri, David J Fleet, Jamie Ryan Kiros, and Sanja Fidler. 2018. VSE+: Improving Visual-Semantic Embeddings with Hard Negatives. (2018)."},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/2647868.2654902"},{"key":"e_1_3_2_1_8_1","volume-title":"Devise: A deep visual-semantic embedding model. In Advances in neural information processing systems. 2121--2129.","author":"Frome Andrea","year":"2013","unstructured":"Andrea Frome , Greg S Corrado , Jon Shlens , Samy Bengio , Jeff Dean , Marc'Aurelio Ranzato , and Tomas Mikolov . 2013 . Devise: A deep visual-semantic embedding model. In Advances in neural information processing systems. 2121--2129. Andrea Frome, Greg S Corrado, Jon Shlens, Samy Bengio, Jeff Dean, Marc'Aurelio Ranzato, and Tomas Mikolov. 2013. Devise: A deep visual-semantic embedding model. In Advances in neural information processing systems. 2121--2129."},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00750"},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00645"},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00133"},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298990"},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298932"},{"key":"e_1_3_2_1_15_1","unstructured":"Andrej Karpathy Armand Joulin and Li F Fei-Fei. 2014. Deep fragment embeddings for bidirectional image sentence mapping. In Advances in neural information processing systems. 1889--1897.  Andrej Karpathy Armand Joulin and Li F Fei-Fei. 2014. Deep fragment embeddings for bidirectional image sentence mapping. In Advances in neural information processing systems. 1889--1897."},{"volume-title":"Semi-Supervised Classification with Graph Convolutional Networks. In International Conference on Learning Representations (ICLR) .","author":"Thomas","key":"e_1_3_2_1_16_1","unstructured":"Thomas N. Kipf and Max Welling. 2017 . Semi-Supervised Classification with Graph Convolutional Networks. In International Conference on Learning Representations (ICLR) . Thomas N. Kipf and Max Welling. 2017. Semi-Supervised Classification with Graph Convolutional Networks. In International Conference on Learning Representations (ICLR) ."},{"key":"e_1_3_2_1_17_1","volume-title":"Unifying visual-semantic embeddings with multimodal neural language models. arXiv preprint arXiv:1411.2539","author":"Kiros Ryan","year":"2014","unstructured":"Ryan Kiros , Ruslan Salakhutdinov , and Richard S Zemel . 2014. Unifying visual-semantic embeddings with multimodal neural language models. arXiv preprint arXiv:1411.2539 ( 2014 ). Ryan Kiros, Ruslan Salakhutdinov, and Richard S Zemel. 2014. Unifying visual-semantic embeddings with multimodal neural language models. arXiv preprint arXiv:1411.2539 (2014)."},{"key":"e_1_3_2_1_18_1","volume-title":"Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations. https:\/\/arxiv.org\/abs\/1602.07332","author":"Krishna Ranjay","year":"2016","unstructured":"Ranjay Krishna , Yuke Zhu , Oliver Groth , Justin Johnson , Kenji Hata , Joshua Kravitz , Stephanie Chen , Yannis Kalantidis , Li-Jia Li , David A Shamma , Michael Bernstein , and Li Fei-Fei . 2016 . Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations. https:\/\/arxiv.org\/abs\/1602.07332 Ranjay Krishna, Yuke Zhu, Oliver Groth, Justin Johnson, Kenji Hata, Joshua Kravitz, Stephanie Chen, Yannis Kalantidis, Li-Jia Li, David A Shamma, Michael Bernstein, and Li Fei-Fei. 2016. Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations. https:\/\/arxiv.org\/abs\/1602.07332"},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01225-0_13"},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46475-6_17"},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.442"},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.301"},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.232"},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1162"},{"key":"e_1_3_2_1_26_1","volume-title":"On the Convergence of Adam and Beyond. In International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=ryQu7f-RZ","author":"Reddi Sashank J.","year":"2018","unstructured":"Sashank J. Reddi , Satyen Kale , and Sanjiv Kumar . 2018 . On the Convergence of Adam and Beyond. In International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=ryQu7f-RZ Sashank J. Reddi, Satyen Kale, and Sanjiv Kumar. 2018. On the Convergence of Adam and Beyond. In International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=ryQu7f-RZ"},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"crossref","unstructured":"Olga Russakovsky Jia Deng Hao Su Jonathan Krause Sanjeev Satheesh Sean Ma Zhiheng Huang Andrej Karpathy Aditya Khosla Michael Bernstein etal 2015. Imagenet large scale visual recognition challenge. International journal of computer vision Vol. 115 3 (2015) 211--252.  Olga Russakovsky Jia Deng Hao Su Jonathan Krause Sanjeev Satheesh Sean Ma Zhiheng Huang Andrej Karpathy Aditya Khosla Michael Bernstein et al. 2015. Imagenet large scale visual recognition challenge. International journal of computer vision Vol. 115 3 (2015) 211--252.","DOI":"10.1007\/s11263-015-0816-y"},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"crossref","unstructured":"Yale Song and Mohammad Soleymani. 2019. Polysemous Visual-Semantic Embedding for Cross-Modal Retrieval. In CVPR .  Yale Song and Mohammad Soleymani. 2019. Polysemous Visual-Semantic Embedding for Cross-Modal Retrieval. In CVPR .","DOI":"10.1109\/CVPR.2019.00208"},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.386"},{"key":"e_1_3_2_1_30_1","volume-title":"Order-embeddings of images and language. arXiv preprint arXiv:1511.06361","author":"Vendrov Ivan","year":"2015","unstructured":"Ivan Vendrov , Ryan Kiros , Sanja Fidler , and Raquel Urtasun . 2015. Order-embeddings of images and language. arXiv preprint arXiv:1511.06361 ( 2015 ). Ivan Vendrov, Ryan Kiros, Sanja Fidler, and Raquel Urtasun. 2015. Order-embeddings of images and language. arXiv preprint arXiv:1511.06361 (2015)."},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/3123266.3123326"},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.541"},{"key":"e_1_3_2_1_33_1","volume-title":"Cross-modal Scene Graph Matching for Relationship-aware Image-Text Retrieval. In The IEEE Winter Conference on Applications of Computer Vision. 1508--1517","author":"Wang Sijin","year":"2020","unstructured":"Sijin Wang , Ruiping Wang , Ziwei Yao , Shiguang Shan , and Xilin Chen . 2020 . Cross-modal Scene Graph Matching for Relationship-aware Image-Text Retrieval. In The IEEE Winter Conference on Applications of Computer Vision. 1508--1517 . Sijin Wang, Ruiping Wang, Ziwei Yao, Shiguang Shan, and Xilin Chen. 2020. Cross-modal Scene Graph Matching for Relationship-aware Image-Text Retrieval. In The IEEE Winter Conference on Applications of Computer Vision. 1508--1517."},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/3343031.3350875"},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01228-1_25"},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00805"},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.330"},{"key":"e_1_3_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01246-5_41"},{"key":"e_1_3_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00166"},{"key":"e_1_3_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00611"},{"key":"e_1_3_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.01180"},{"key":"e_1_3_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01246-5_42"},{"key":"e_1_3_2_1_43_1","volume-title":"Dual-path convolutional image-text embedding with instance loss. arXiv preprint arXiv:1711.05535","author":"Zheng Zhedong","year":"2017","unstructured":"Zhedong Zheng , Liang Zheng , Michael Garrett , Yi Yang , and Yi-Dong Shen . 2017. Dual-path convolutional image-text embedding with instance loss. arXiv preprint arXiv:1711.05535 ( 2017 ). Zhedong Zheng, Liang Zheng, Michael Garrett, Yi Yang, and Yi-Dong Shen. 2017. Dual-path convolutional image-text embedding with instance loss. arXiv preprint arXiv:1711.05535 (2017)."},{"key":"e_1_3_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.01174"}],"event":{"name":"ICMR '20: International Conference on Multimedia Retrieval","sponsor":["SIGMM ACM Special Interest Group on Multimedia"],"location":"Dublin Ireland","acronym":"ICMR '20"},"container-title":["Proceedings of the 2020 International Conference on Multimedia Retrieval"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3372278.3390709","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3372278.3390709","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T22:33:24Z","timestamp":1750199604000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3372278.3390709"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,6,8]]},"references-count":44,"alternative-id":["10.1145\/3372278.3390709","10.1145\/3372278"],"URL":"https:\/\/doi.org\/10.1145\/3372278.3390709","relation":{},"subject":[],"published":{"date-parts":[[2020,6,8]]},"assertion":[{"value":"2020-06-08","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}