{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:27:03Z","timestamp":1750220823016,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":54,"publisher":"ACM","license":[{"start":{"date-parts":[[2019,10,15]],"date-time":"2019-10-15T00:00:00Z","timestamp":1571097600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"National Key R&D Program of China","award":["Grant No.2016YFC0801003, Grant U1536203, Grant 61572493, Grant 61876177"],"award-info":[{"award-number":["Grant No.2016YFC0801003, Grant U1536203, Grant 61572493, Grant 61876177"]}]},{"name":"Central Universities","award":["N.A"],"award-info":[{"award-number":["N.A"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2019,10,15]]},"DOI":"10.1145\/3343031.3350907","type":"proceedings-article","created":{"date-parts":[[2019,10,21]],"date-time":"2019-10-21T16:32:26Z","timestamp":1571675546000},"page":"1220-1229","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Finding Images by Dialoguing with Image"],"prefix":"10.1145","author":[{"given":"Lejian","family":"Ren","sequence":"first","affiliation":[{"name":"Chinese Academy of Science, Beijing, China"}]},{"given":"Si","family":"Liu","sequence":"additional","affiliation":[{"name":"Beihang University & Guangdong Provincial Key Laboratory of Computer Vision and Virtual Reality Technology, SIAT, CAS, Beijing, China"}]},{"given":"Han","family":"Huang","sequence":"additional","affiliation":[{"name":"Beihang University, Beijing, China"}]},{"given":"Jizhong","family":"Han","sequence":"additional","affiliation":[{"name":"Chinese Academy of Science, Beijing, 
China"}]},{"given":"Shuicheng","family":"Yan","sequence":"additional","affiliation":[{"name":"YITU Tech, Beijing, China"}]},{"given":"Bo","family":"Li","sequence":"additional","affiliation":[{"name":"Beihang University, Beijing, China"}]}],"member":"320","published-online":{"date-parts":[[2019,10,15]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-7908-2604-3_16"},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00632"},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/W14-4012"},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2013.12"},{"key":"e_1_3_2_1_5_1","volume-title":"Detecting Visual Relationships with Deep Relational Networks. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","author":"Dai Bo","year":"2017","unstructured":"Bo Dai, Yuqi Zhang, and Dahua Lin. 2017. Detecting Visual Relationships with Deep Relational Networks. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017), 3298--3308."},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2005.177"},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298878"},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206772"},{"key":"e_1_3_2_1_9_1","unstructured":"Vittorio Ferrari and Andrew Zisserman. 2008. Learning visual attributes. In Advances in neural information processing systems. 
433--440."},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2008.4587799"},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46466-4_15"},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-017-1016-8"},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.560"},{"key":"e_1_3_2_1_14_1","unstructured":"Xiaoxiao Guo Hui Wu Yu Cheng Steven Rennie Gerald Tesauro and Rogerio Feris. 2018. Dialog-based interactive image retrieval. In Advances in Neural Information Processing Systems. 678--688."},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.322"},{"key":"e_1_3_2_1_16_1","unstructured":"Roei Herzig Moshiko Raboh Gal Chechik Jonathan Berant and Amir Globerson. 2018. Mapping images to scene graphs with permutation-invariant structured prediction. In Advances in Neural Information Processing Systems. 7211--7221."},{"key":"e_1_3_2_1_17_1","volume-title":"Long short-term memory. Neural computation","author":"Hochreiter Sepp","year":"1997","unstructured":"Sepp Hochreiter and J\u00fcrgen Schmidhuber. 1997. Long short-term memory. Neural computation, Vol. 9, 8 (1997), 1735--1780."},
{"key":"e_1_3_2_1_18_1","volume-title":"ACM Transactions on Information Systems (TOIS)","volume":"20","author":"Jaana Kalervo","year":"2002","unstructured":"Kalervo J\u00e4rvelin and Jaana Kek\u00e4l\u00e4inen. 2002. Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems (TOIS), Vol. 20, 4 (2002), 422--446."},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298990"},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46604-0_48"},{"key":"e_1_3_2_1_21_1","volume-title":"Kipf and Max Welling","author":"Thomas","year":"2017","unstructured":"Thomas N. Kipf and Max Welling. 2017. Semi-Supervised Classification with Graph Convolutional Networks. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24--26, 2017, Conference Track Proceedings."},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-016-0981-7"},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206594"},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2013.140"},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01246-5_21"},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.142"},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.142"},
{"volume-title":"Gated Graph Sequence Neural Networks. In 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2--4, 2016, Conference Track Proceedings .","author":"Li Yujia","key":"e_1_3_2_1_28_1","unstructured":"Yujia Li, Daniel Tarlow, Marc Brockschmidt, and Richard S. Zemel. 2016. Gated Graph Sequence Neural Networks. In 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2--4, 2016, Conference Track Proceedings."},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/3240508.3241399"},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.324"},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46448-0_51"},{"key":"e_1_3_2_1_32_1","volume-title":"A statistical approach to mechanized encoding and searching of literary information. IBM Journal of research and development","author":"Luhn Hans Peter","year":"1957","unstructured":"Hans Peter Luhn. 1957. A statistical approach to mechanized encoding and searching of literary information. IBM Journal of research and development, Vol. 1, 4 (1957), 309--317."},{"key":"e_1_3_2_1_33_1","first-page":"2579","article-title":"Visualizing data using t-SNE","volume":"9","author":"van der Maaten Laurens","year":"2008","unstructured":"Laurens van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-SNE. Journal of machine learning research, Vol. 9, Nov (2008), 2579--2605.","journal-title":"Journal of machine learning research"},{"key":"e_1_3_2_1_34_1","volume-title":"Deep captioning with multimodal recurrent neural networks (m-rnn). 
arXiv preprint arXiv:1412.6632","author":"Mao Junhua","year":"2014","unstructured":"Junhua Mao, Wei Xu, Yi Yang, Jiang Wang, Zhiheng Huang, and Alan Yuille. 2014. Deep captioning with multimodal recurrent neural networks (m-rnn). arXiv preprint arXiv:1412.6632 (2014)."},{"key":"e_1_3_2_1_35_1","unstructured":"Tomas Mikolov Ilya Sutskever Kai Chen Greg S Corrado and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems. 3111--3119."},{"key":"e_1_3_2_1_36_1","unstructured":"Alejandro Newell and Jia Deng. 2017. Pixels to Graphs by Associative Embedding. In NIPS."},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2011.6126281"},{"key":"e_1_3_2_1_38_1","volume-title":"Attentive Relational Networks for Mapping Images to Scene Graphs. arXiv preprint arXiv:1811.10696","author":"Qi Mengshi","year":"2018","unstructured":"Mengshi Qi, Weijian Li, Zhengyuan Yang, Yunhong Wang, and Jiebo Luo. 2018. Attentive Relational Networks for Mapping Images to Scene Graphs. arXiv preprint arXiv:1811.10696 (2018)."},{"key":"e_1_3_2_1_39_1","volume-title":"European conference on computer vision. 
Springer, 3--20","author":"Filip Radenovi\u0107","year":"2016","unstructured":"Filip Radenovi\u0107, Giorgos Tolias, and Ond\u0159ej Chum. 2016. CNN image retrieval learns from BoW: Unsupervised fine-tuning with hard examples. In European conference on computer vision. Springer, 3--20."},{"key":"e_1_3_2_1_40_1","volume-title":"Fine-tuning CNN image retrieval with no human annotation","author":"Filip Radenovi\u0107","year":"2018","unstructured":"Filip Radenovi\u0107, Giorgos Tolias, and Ondrej Chum. 2018. Fine-tuning CNN image retrieval with no human annotation. IEEE transactions on pattern analysis and machine intelligence (2018)."},{"key":"e_1_3_2_1_41_1","unstructured":"Shaoqing Ren Kaiming He Ross Girshick and Jian Sun. 2015. Faster r-cnn: Towards real-time object detection with region proposal networks. In Advances in neural information processing systems. 91--99."},{"key":"e_1_3_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W15-2812"},{"key":"e_1_3_2_1_43_1","volume-title":"Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556","author":"Simonyan Karen","year":"2014","unstructured":"Karen Simonyan and Andrew Zisserman. 2014. 
Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)."},{"key":"e_1_3_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1108\/eb026526"},{"key":"e_1_3_2_1_45_1","volume-title":"Graph Attention Networks. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings .","author":"Velickovic Petar","year":"2018","unstructured":"Petar Velickovic, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Li\u00f2, and Yoshua Bengio. 2018. Graph Attention Networks. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings."},{"key":"e_1_3_2_1_46_1","volume-title":"Composing Text and Image for Image Retrieval - An Empirical Odyssey. arXiv preprint arXiv:1812.07119","author":"Vo Nam S.","year":"2018","unstructured":"Nam S. Vo, Lu Jiang, Chen Sun, Kevin Murphy, Li-Jia Li, Li Fei-Fei, and James Hays. 2018. Composing Text and Image for Image Retrieval - An Empirical Odyssey. arXiv preprint arXiv:1812.07119 (2018)."},{"key":"e_1_3_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.541"},{"key":"e_1_3_2_1_48_1","volume-title":"Linknet: Relational embedding for scene graph. In Advances in Neural Information Processing Systems. 
560--570.","author":"Woo Sanghyun","year":"2018","unstructured":"Sanghyun Woo, Dahun Kim, Donghyeon Cho, and In So Kweon. 2018. Linknet: Relational embedding for scene graph. In Advances in Neural Information Processing Systems. 560--570."},{"key":"e_1_3_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.634"},{"key":"e_1_3_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.330"},{"key":"e_1_3_2_1_51_1","doi-asserted-by":"crossref","unstructured":"Jianwei Yang Jiasen Lu Stefan Lee Dhruv Batra and Devi Parikh. 2018. Graph R-CNN for Scene Graph Generation. In ECCV.","DOI":"10.1007\/978-3-030-01246-5_41"},{"key":"e_1_3_2_1_52_1","volume-title":"Neural Motifs: Scene Graph Parsing with Global Context. 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Zellers Rowan","year":"2018","unstructured":"Rowan Zellers, Mark Yatskar, Sam Thomson, and Yejin Choi. 2018. Neural Motifs: Scene Graph Parsing with Global Context. 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (2018), 5831--5840."},
{"key":"e_1_3_2_1_53_1","volume-title":"Visual Translation Embedding Network for Visual Relation Detection. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","author":"Zhang Hanwang","year":"2017","unstructured":"Hanwang Zhang, Zawlin Kyaw, Shih-Fu Chang, and Tat-Seng Chua. 2017. Visual Translation Embedding Network for Visual Relation Detection. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017), 3107--3115."},{"key":"e_1_3_2_1_54_1","volume-title":"Towards Context-Aware Interaction Recognition for Visual Relationship Detection. 2017 IEEE International Conference on Computer Vision (ICCV)","author":"Zhuang Bohan","year":"2017","unstructured":"Bohan Zhuang, Lingqiao Liu, Chunhua Shen, and Ian D. Reid. 2017. Towards Context-Aware Interaction Recognition for Visual Relationship Detection. 2017 IEEE International Conference on Computer Vision (ICCV) (2017), 589--598."}],"event":{"name":"MM '19: The 27th ACM International Conference on Multimedia","sponsor":["SIGMM ACM Special Interest Group on Multimedia"],"location":"Nice France","acronym":"MM '19"},"container-title":["Proceedings of the 27th ACM International Conference on 
Multimedia"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3343031.3350907","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3343031.3350907","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T23:13:17Z","timestamp":1750201997000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3343031.3350907"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,10,15]]},"references-count":54,"alternative-id":["10.1145\/3343031.3350907","10.1145\/3343031"],"URL":"https:\/\/doi.org\/10.1145\/3343031.3350907","relation":{},"subject":[],"published":{"date-parts":[[2019,10,15]]},"assertion":[{"value":"2019-10-15","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}