{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,28]],"date-time":"2025-10-28T15:01:47Z","timestamp":1761663707449,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":47,"publisher":"ACM","license":[{"start":{"date-parts":[[2017,10,19]],"date-time":"2017-10-19T00:00:00Z","timestamp":1508371200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2017,10,19]]},"DOI":"10.1145\/3123266.3123439","type":"proceedings-article","created":{"date-parts":[[2017,10,20]],"date-time":"2017-10-20T13:04:26Z","timestamp":1508504666000},"page":"181-189","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":19,"title":["Deep Attribute-preserving Metric Learning for Natural Language Object Retrieval"],"prefix":"10.1145","author":[{"given":"Jianan","family":"Li","sequence":"first","affiliation":[{"name":"Beijing Institute of Technology, Beijing, China"}]},{"given":"Yunchao","family":"Wei","sequence":"additional","affiliation":[{"name":"National University of Singapore, Singapore, Singapore"}]},{"given":"Xiaodan","family":"Liang","sequence":"additional","affiliation":[{"name":"Carnegie Mellon University, Pittsburgh, PA, USA"}]},{"given":"Fang","family":"Zhao","sequence":"additional","affiliation":[{"name":"National University of Singapore, Singapore, Singapore"}]},{"given":"Jianshu","family":"Li","sequence":"additional","affiliation":[{"name":"National University of Singapore, Singapore, Singapore"}]},{"given":"Tingfa","family":"Xu","sequence":"additional","affiliation":[{"name":"Beijing Institute of Technology, Beijing, China"}]},{"given":"Jiashi","family":"Feng","sequence":"additional","affiliation":[{"name":"National University of Singapore, Singapore, Singapore"}]}],"member":"320","published-online":{"date-parts":[[2017,10,19]]},"reference":[{"key":"e_1_3_2_1_1_1","volume-title":"Proceedings of the 30th International Conference on Machine Learning (ICML-13)","author":"Andrew Galen","year":"2013","unstructured":"Galen Andrew , Raman Arora , Jeff A Bilmes , and Karen Livescu . 2013 . Deep canonical correlation analysis . In Proceedings of the 30th International Conference on Machine Learning (ICML-13) . 1247--1255. Galen Andrew, Raman Arora, Jeff A Bilmes, and Karen Livescu. 2013. Deep canonical correlation analysis. In Proceedings of the 30th International Conference on Machine Learning (ICML-13). 1247--1255."},{"key":"e_1_3_2_1_2_1","volume-title":"Semantic image segmentation with deep convolutional nets and fully connected crfs. arXiv preprint arXiv:1412.7062","author":"Chen Liang-Chieh","year":"2014","unstructured":"Liang-Chieh Chen , George Papandreou , Iasonas Kokkinos , Kevin Murphy , and Alan L Yuille . 2014. Semantic image segmentation with deep convolutional nets and fully connected crfs. arXiv preprint arXiv:1412.7062 ( 2014 ). Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, and Alan L Yuille. 2014. Semantic image segmentation with deep convolutional nets and fully connected crfs. arXiv preprint arXiv:1412.7062 (2014)."},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/1348246.1348248"},{"key":"e_1_3_2_1_4_1","volume-title":"Imagenet: A large-scale hierarchical image database Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 248--255.","author":"Deng Jia","year":"2009","unstructured":"Jia Deng , Wei Dong , Richard Socher , Li-Jia Li , Kai Li , and Li Fei-Fei . 2009 . Imagenet: A large-scale hierarchical image database Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 248--255. Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. 2009. Imagenet: A large-scale hierarchical image database Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 248--255."},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipm.2004.04.009"},{"key":"e_1_3_2_1_6_1","volume-title":"Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Kate Saenko, and Trevor Darrell.","author":"Donahue Jeffrey","year":"2015","unstructured":"Jeffrey Donahue , Lisa Anne Hendricks , Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Kate Saenko, and Trevor Darrell. 2015 . Long-term recurrent convolutional networks for visual recognition and description Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition . 2625--2634. Jeffrey Donahue, Lisa Anne Hendricks, Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Kate Saenko, and Trevor Darrell. 2015. Long-term recurrent convolutional networks for visual recognition and description Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2625--2634."},{"key":"e_1_3_2_1_7_1","volume-title":"arXiv preprint arXiv:1602.06291","author":"Ghosh Shalini","year":"2016","unstructured":"Shalini Ghosh , Oriol Vinyals , Brian Strope , Scott Roy , Tom Dean , and Larry Heck . 2016. Contextual LSTM (CLSTM) models for Large scale NLP tasks. arXiv preprint arXiv:1602.06291 ( 2016 ). Shalini Ghosh, Oriol Vinyals, Brian Strope, Scott Roy, Tom Dean, and Larry Heck. 2016. Contextual LSTM (CLSTM) models for Large scale NLP tasks. arXiv preprint arXiv:1602.06291 (2016)."},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.169"},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-013-0658-4"},{"volume-title":"The IAPR TC-12 Benchmark: A New Evaluation Resource for Visual Information Systems OntoImage 2006 Workshop on Language Resources for Content-based Image Retrieval during LREC 2006 Final Programme.","author":"Grubinger Michael","key":"e_1_3_2_1_10_1","unstructured":"Michael Grubinger , Paul Clough , Henning M\u00fcller , and Thomas Deselaers . {n. d.}. The IAPR TC-12 Benchmark: A New Evaluation Resource for Visual Information Systems OntoImage 2006 Workshop on Language Resources for Content-based Image Retrieval during LREC 2006 Final Programme. Michael Grubinger, Paul Clough, Henning M\u00fcller, and Thomas Deselaers. {n. d.}. The IAPR TC-12 Benchmark: A New Evaluation Resource for Visual Information Systems OntoImage 2006 Workshop on Language Resources for Content-based Image Retrieval during LREC 2006 Final Programme."},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.15607\/RSS.2014.X.041"},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2015.2465908"},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.493"},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/2647868.2654889"},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"crossref","unstructured":"Andrej Karpathy and Li Fei-Fei. 2015. Deep visual-semantic alignments for generating image descriptions Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3128--3137. Andrej Karpathy and Li Fei-Fei. 2015. Deep visual-semantic alignments for generating image descriptions Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3128--3137.","DOI":"10.1109\/CVPR.2015.7298932"},{"key":"e_1_3_2_1_17_1","unstructured":"Andrej Karpathy Armand Joulin and Fei Fei F Li. 2014. Deep fragment embeddings for bidirectional image sentence mapping Advances in Neural Information Processing Systems. 1889--1897. Andrej Karpathy Armand Joulin and Fei Fei F Li. 2014. Deep fragment embeddings for bidirectional image sentence mapping Advances in Neural Information Processing Systems. 1889--1897."},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"crossref","unstructured":"Sahar Kazemzadeh Vicente Ordonez Mark Matten and Tamara L Berg. 2014. ReferItGame: Referring to Objects in Photographs of Natural Scenes The Conference on Empirical Methods in Natural Language Processing (EMNLP). 787--798. Sahar Kazemzadeh Vicente Ordonez Mark Matten and Tamara L Berg. 2014. ReferItGame: Referring to Objects in Photographs of Natural Scenes The Conference on Empirical Methods in Natural Language Processing (EMNLP). 787--798.","DOI":"10.3115\/v1\/D14-1086"},{"key":"e_1_3_2_1_19_1","volume-title":"Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980","author":"Kingma Diederik","year":"2014","unstructured":"Diederik Kingma and Jimmy Ba . 2014 . Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014). Diederik Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)."},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.455"},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/957013.957143"},{"key":"e_1_3_2_1_22_1","volume-title":"2017 a. Recurrent Topic-Transition GAN for Visual Paragraph Generation. arXiv preprint arXiv:1703.07022","author":"Liang Xiaodan","year":"2017","unstructured":"Xiaodan Liang , Zhiting Hu , Hao Zhang , Chuang Gan , and Eric P Xing . 2017 a. Recurrent Topic-Transition GAN for Visual Paragraph Generation. arXiv preprint arXiv:1703.07022 ( 2017 ). Xiaodan Liang, Zhiting Hu, Hao Zhang, Chuang Gan, and Eric P Xing. 2017 a. Recurrent Topic-Transition GAN for Visual Paragraph Generation. arXiv preprint arXiv:1703.07022 (2017)."},{"key":"e_1_3_2_1_23_1","volume-title":"2017 b. Deep variation-structured reinforcement learning for visual relationship and attribute detection. arXiv preprint arXiv:1703.03054","author":"Liang Xiaodan","year":"2017","unstructured":"Xiaodan Liang , Lisa Lee , and Eric P Xing . 2017 b. Deep variation-structured reinforcement learning for visual relationship and attribute detection. arXiv preprint arXiv:1703.03054 ( 2017 ). Xiaodan Liang, Lisa Lee, and Eric P Xing. 2017 b. Deep variation-structured reinforcement learning for visual relationship and attribute detection. arXiv preprint arXiv:1703.03054 (2017)."},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"e_1_3_2_1_26_1","unstructured":"Vijay Mahadevan Chi W Wong Jose C Pereira Tom Liu Nuno Vasconcelos and Lawrence K Saul. 2011. Maximum covariance unfolding: Manifold learning for bimodal data Advances in Neural Information Processing Systems. 918--926. Vijay Mahadevan Chi W Wong Jose C Pereira Tom Liu Nuno Vasconcelos and Lawrence K Saul. 2011. Maximum covariance unfolding: Manifold learning for bimodal data Advances in Neural Information Processing Systems. 918--926."},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"crossref","unstructured":"Junhua Mao Jonathan Huang Alexander Toshev Oana Camburu Alan L Yuille and Kevin Murphy. 2016. Generation and comprehension of unambiguous object descriptions Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 11--20. Junhua Mao Jonathan Huang Alexander Toshev Oana Camburu Alan L Yuille and Kevin Murphy. 2016. Generation and comprehension of unambiguous object descriptions Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 11--20.","DOI":"10.1109\/CVPR.2016.9"},{"volume-title":"Text Information Retrieval Systems","author":"Meadow Charles T","key":"e_1_3_2_1_28_1","unstructured":"Charles T Meadow , Bert R Boyce , Donald H Kraft , and Carol Barry . 2007. Text Information Retrieval Systems . Academic Press . Charles T Meadow, Bert R Boyce, Donald H Kraft, and Carol Barry. 2007. Text Information Retrieval Systems.Academic Press."},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2013.142"},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.303"},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/1963405.1963449"},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/1873951.1873987"},{"key":"e_1_3_2_1_33_1","unstructured":"Shaoqing Ren Kaiming He Ross Girshick and Jian Sun. 2015. Faster r-cnn: Towards real-time object detection with region proposal networks Advances in Neural Information Processing Systems. 91--99. Shaoqing Ren Kaiming He Ross Girshick and Jian Sun. 2015. Faster r-cnn: Towards real-time object detection with region proposal networks Advances in Neural Information Processing Systems. 91--99."},{"volume-title":"Grounding of textual phrases in images by reconstruction European Conference on Computer Vision","author":"Rohrbach Anna","key":"e_1_3_2_1_34_1","unstructured":"Anna Rohrbach , Marcus Rohrbach , Ronghang Hu , Trevor Darrell , and Bernt Schiele . 2016. Grounding of textual phrases in images by reconstruction European Conference on Computer Vision . Springer , 817--834. Anna Rohrbach, Marcus Rohrbach, Ronghang Hu, Trevor Darrell, and Bernt Schiele. 2016. Grounding of textual phrases in images by reconstruction European Conference on Computer Vision. Springer, 817--834."},{"key":"e_1_3_2_1_35_1","volume-title":"Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556","author":"Simonyan Karen","year":"2014","unstructured":"Karen Simonyan and Andrew Zisserman . 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 ( 2014 ). Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)."},{"key":"e_1_3_2_1_36_1","volume-title":"Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)","volume":"4","author":"Slaney Malcolm","year":"2002","unstructured":"Malcolm Slaney . 2002 . Semantic-audio retrieval . In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) , Vol. Vol. 4 . IV--4108. Malcolm Slaney. 2002. Semantic-audio retrieval. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Vol. Vol. 4. IV--4108."},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/34.895972"},{"key":"e_1_3_2_1_38_1","doi-asserted-by":"crossref","unstructured":"Martin Sundermeyer Ralf Schl\u00fcter and Hermann Ney. 2012. LSTM Neural Networks for Language Modeling. In Interspeech. 194--197. Martin Sundermeyer Ralf Schl\u00fcter and Hermann Ney. 2012. LSTM Neural Networks for Language Modeling. In Interspeech. 194--197.","DOI":"10.21437\/Interspeech.2012-65"},{"key":"e_1_3_2_1_39_1","unstructured":"Ilya Sutskever Oriol Vinyals and Quoc V Le. 2014. Sequence to sequence learning with neural networks Advances in Neural Information Processing Systems. 3104--3112. Ilya Sutskever Oriol Vinyals and Quoc V Le. 2014. Sequence to sequence learning with neural networks Advances in Neural Information Processing Systems. 3104--3112."},{"volume-title":"Inferring a semantic representation of text via cross-language correlation analysis Advances in Neural Information Processing Systems","author":"Vinokourov Alexei","key":"e_1_3_2_1_40_1","unstructured":"Alexei Vinokourov , John Shawe-Taylor , and Nello Cristianini . 2002. Inferring a semantic representation of text via cross-language correlation analysis Advances in Neural Information Processing Systems , Vol. Vol. 1 . 4. Alexei Vinokourov, John Shawe-Taylor, and Nello Cristianini. 2002. Inferring a semantic representation of text via cross-language correlation analysis Advances in Neural Information Processing Systems, Vol. Vol. 1. 4."},{"key":"e_1_3_2_1_41_1","volume-title":"An End-to-End Approach to Natural Language Object Retrieval via Context-Aware Deep Reinforcement Learning. arXiv preprint arXiv:1703.07579","author":"Wu Fan","year":"2017","unstructured":"Fan Wu , Zhongwen Xu , and Yi Yang . 2017. An End-to-End Approach to Natural Language Object Retrieval via Context-Aware Deep Reinforcement Learning. arXiv preprint arXiv:1703.07579 ( 2017 ). Fan Wu, Zhongwen Xu, and Yi Yang. 2017. An End-to-End Approach to Natural Language Object Retrieval via Context-Aware Deep Reinforcement Learning. arXiv preprint arXiv:1703.07579 (2017)."},{"key":"e_1_3_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/1631272.1631298"},{"key":"e_1_3_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46475-6_5"},{"key":"e_1_3_2_1_44_1","volume-title":"2016 b. A Joint Speaker-Listener-Reinforcer Model for Referring Expressions. arXiv preprint arXiv:1612.09542","author":"Yu Licheng","year":"2016","unstructured":"Licheng Yu , Hao Tan , Mohit Bansal , and Tamara L Berg . 2016 b. A Joint Speaker-Listener-Reinforcer Model for Referring Expressions. arXiv preprint arXiv:1612.09542 ( 2016 ). Licheng Yu, Hao Tan, Mohit Bansal, and Tamara L Berg. 2016 b. A Joint Speaker-Listener-Reinforcer Model for Referring Expressions. arXiv preprint arXiv:1612.09542 (2016)."},{"key":"e_1_3_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/1291233.1291290"},{"key":"e_1_3_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2007.911822"},{"key":"e_1_3_2_1_47_1","volume-title":"European Conference on Computer Vision. Springer, 391--405","author":"Lawrence Zitnick C","year":"2014","unstructured":"C Lawrence Zitnick and Piotr Doll\u00e1r . 2014 . Edge boxes: Locating object proposals from edges . European Conference on Computer Vision. Springer, 391--405 . C Lawrence Zitnick and Piotr Doll\u00e1r. 2014. Edge boxes: Locating object proposals from edges. European Conference on Computer Vision. Springer, 391--405."}],"event":{"name":"MM '17: ACM Multimedia Conference","sponsor":["SIGMM ACM Special Interest Group on Multimedia"],"location":"Mountain View California USA","acronym":"MM '17"},"container-title":["Proceedings of the 25th ACM international conference on Multimedia"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3123266.3123439","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3123266.3123439","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,26]],"date-time":"2025-06-26T16:33:15Z","timestamp":1750955595000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3123266.3123439"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017,10,19]]},"references-count":47,"alternative-id":["10.1145\/3123266.3123439","10.1145\/3123266"],"URL":"https:\/\/doi.org\/10.1145\/3123266.3123439","relation":{},"subject":[],"published":{"date-parts":[[2017,10,19]]},"assertion":[{"value":"2017-10-19","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}