{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,3]],"date-time":"2026-04-03T21:55:34Z","timestamp":1775253334849,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":43,"publisher":"ACM","license":[{"start":{"date-parts":[[2018,10,15]],"date-time":"2018-10-15T00:00:00Z","timestamp":1539561600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2018,10,15]]},"DOI":"10.1145\/3240508.3240640","type":"proceedings-article","created":{"date-parts":[[2018,10,18]],"date-time":"2018-10-18T13:52:08Z","timestamp":1539870728000},"page":"1029-1037","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":44,"title":["Decoupled Novel Object Captioner"],"prefix":"10.1145","author":[{"given":"Yu","family":"Wu","sequence":"first","affiliation":[{"name":"University of Technology Sydney, Sydney, Australia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Linchao","family":"Zhu","sequence":"additional","affiliation":[{"name":"University of Technology Sydney, Sydney, Australia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Lu","family":"Jiang","sequence":"additional","affiliation":[{"name":"Google Inc., San Francisco, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yi","family":"Yang","sequence":"additional","affiliation":[{"name":"University of Technology Sydney & Chinese Academy of Sciences, Sydney, Australia"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2018,10,15]]},"reference":[{"key":"e_1_3_2_1_1_1","first-page":"265","article-title":"TensorFlow: A System for Large-Scale Machine Learning","volume":"16","author":"Abadi Martín","year":"2016","journal-title":"OSDI"},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"crossref","unstructured":"Peter Anderson Basura Fernando Mark Johnson and Stephen Gould. 2017. Guided open vocabulary image captioning with constrained beam search. In EMNLP.","DOI":"10.18653\/v1\/D17-1098"},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"crossref","unstructured":"Lisa Anne Hendricks Subhashini Venugopalan Marcus Rohrbach Raymond Mooney Kate Saenko and Trevor Darrell. 2016. Deep compositional captioning: Describing novel object categories without paired training data. In CVPR.","DOI":"10.1109\/CVPR.2016.8"},{"key":"e_1_3_2_1_4_1","volume-title":"METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In ACL-W. 65--72.","author":"Banerjee Satanjeev","year":"2005"},{"key":"e_1_3_2_1_5_1","unstructured":"Samy Bengio Oriol Vinyals Navdeep Jaitly and Noam Shazeer. 2015. Scheduled sampling for sequence prediction with recurrent neural networks. In NIPS. 1171--1179."},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"crossref","unstructured":"Jeffrey Donahue Lisa Anne Hendricks Sergio Guadarrama Marcus Rohrbach Subhashini Venugopalan Kate Saenko and Trevor Darrell. 2015. Long-term recurrent convolutional networks for visual recognition and description. In CVPR. 2625--2634.","DOI":"10.1109\/CVPR.2015.7298878"},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/3240508.3240527"},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"crossref","unstructured":"Ali Farhadi Mohsen Hejrati Mohammad Amin Sadeghi Peter Young Cyrus Rashtchian Julia Hockenmaier and David Forsyth. 2010. Every picture tells a story: Generating sentences from images. In ECCV. 15--29.","DOI":"10.1007\/978-3-642-15561-1_2"},{"key":"e_1_3_2_1_9_1","unstructured":"Chelsea Finn Pieter Abbeel and Sergey Levine. 2017. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. In ICML. 1126--1135."},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"crossref","unstructured":"Jonathan Huang Vivek Rathod Chen Sun Menglong Zhu Anoop Korattikara Alireza Fathi Ian Fischer Zbigniew Wojna Yang Song Sergio Guadarrama et al. 2017. Speed\/accuracy trade-offs for modern convolutional object detectors. In CVPR.","DOI":"10.1109\/CVPR.2017.351"},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/2671188.2749399"},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/2733373.2806237"},{"key":"e_1_3_2_1_14_1","volume-title":"DenseCap: Fully convolutional localization networks for dense captioning. In CVPR. 4565--4574.","author":"Johnson Justin","year":"2016"},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"crossref","unstructured":"Andrej Karpathy and Li Fei-Fei. 2015. Deep visual-semantic alignments for generating image descriptions. In CVPR. 3128--3137.","DOI":"10.1109\/CVPR.2015.7298932"},{"key":"e_1_3_2_1_16_1","volume-title":"Adam: A method for stochastic optimization. In ICLR.","author":"Kingma Diederik P","year":"2015"},{"key":"e_1_3_2_1_17_1","unstructured":"Ryan Kiros Ruslan Salakhutdinov and Rich Zemel. 2014. Multimodal neural language models. In ICML. 595--603."},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2012.162"},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2013.140"},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"crossref","unstructured":"Tsung-Yi Lin Michael Maire Serge Belongie James Hays Pietro Perona Deva Ramanan Piotr Doll\u00e1r and C Lawrence Zitnick. 2014. Microsoft COCO: Common objects in context. In ECCV. 740--755.","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"crossref","unstructured":"Jiasen Lu Jianwei Yang Dhruv Batra and Devi Parikh. 2018. Neural Baby Talk. In CVPR. 7219--7228.","DOI":"10.1109\/CVPR.2018.00754"},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.291"},{"key":"e_1_3_2_1_23_1","unstructured":"Junhua Mao Wei Xu Yi Yang Jiang Wang Zhiheng Huang and Alan Yuille. 2015b. Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN). ICLR (2015)."},{"key":"e_1_3_2_1_24_1","volume-title":"International journal of lexicography","author":"Miller George A","year":"1990"},{"key":"e_1_3_2_1_25_1","volume-title":"Midge: Generating Image Descriptions From Computer Vision Detections. In EACL. 747--756.","author":"Mitchell Margaret","year":"2012"},{"key":"e_1_3_2_1_26_1","unstructured":"Vicente Ordonez Girish Kulkarni and Tamara L Berg. 2011. Im2Text: Describing images using 1 million captioned photographs. In NIPS. 1143--1151."},{"key":"e_1_3_2_1_27_1","unstructured":"Marc'Aurelio Ranzato Sumit Chopra Michael Auli and Wojciech Zaremba. 2016. Sequence level training with recurrent neural networks. In ICLR."},{"key":"e_1_3_2_1_28_1","unstructured":"Shaoqing Ren Kaiming He Ross Girshick and Jian Sun. 2015. Faster R-CNN: Towards real-time object detection with region proposal networks. In NIPS. 91--99."},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2011.5995627"},{"key":"e_1_3_2_1_30_1","unstructured":"Adam Santoro Sergey Bartunov Matthew Botvinick Daan Wierstra and Timothy Lillicrap. 2016. One-shot learning with memory-augmented neural networks. NIPS-W (2016)."},{"key":"e_1_3_2_1_31_1","unstructured":"Karen Simonyan and Andrew Zisserman. 2015. Very deep convolutional networks for large-scale image recognition. In ICLR."},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"crossref","unstructured":"Christian Szegedy Sergey Ioffe Vincent Vanhoucke and Alexander A Alemi. 2017. Inception-v4 Inception-ResNet and the impact of residual connections on learning. In AAAI.","DOI":"10.1609\/aaai.v31i1.11231"},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"crossref","unstructured":"Hamed R Tavakoli Rakshith Shetty Ali Borji and Jorma Laaksonen. 2017. Paying Attention to Descriptions Generated by Image Captioning Models. In ICCV. 2506--2515.","DOI":"10.1109\/ICCV.2017.272"},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"crossref","unstructured":"Subhashini Venugopalan Lisa Anne Hendricks Marcus Rohrbach Raymond Mooney Trevor Darrell and Kate Saenko. 2017. Captioning Images with Diverse Objects. In CVPR.","DOI":"10.1109\/CVPR.2017.130"},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.515"},{"key":"e_1_3_2_1_36_1","unstructured":"Oriol Vinyals Charles Blundell Tim Lillicrap Daan Wierstra et al. 2016. Matching networks for one shot learning. In NIPS. 3630--3638."},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"crossref","unstructured":"Oriol Vinyals Alexander Toshev Samy Bengio and Dumitru Erhan. 2015. Show and tell: A neural image caption generator. In CVPR. 3156--3164.","DOI":"10.1109\/CVPR.2015.7298935"},{"key":"e_1_3_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2016.2587640"},{"key":"e_1_3_2_1_39_1","unstructured":"Y. Xian C. H. Lampert B. Schiele and Z. Akata. 2018. Zero-Shot Learning - A Comprehensive Evaluation of the Good the Bad and the Ugly. IEEE Transactions on Pattern Analysis and Machine Intelligence (2018) 1--1."},{"key":"e_1_3_2_1_40_1","unstructured":"Kelvin Xu Jimmy Ba Ryan Kiros Kyunghyun Cho Aaron Courville Ruslan Salakhutdinov Rich Zemel and Yoshua Bengio. 2015. Show attend and tell: Neural image caption generation with visual attention. In ICML. 2048--2057."},{"key":"e_1_3_2_1_41_1","doi-asserted-by":"crossref","unstructured":"Ting Yao Yingwei Pan Yehao Li and Tao Mei. 2017. Incorporating copying mechanism in image captioning for learning novel objects. In CVPR. 5263--5271.","DOI":"10.1109\/CVPR.2017.559"},{"key":"e_1_3_2_1_42_1","doi-asserted-by":"crossref","unstructured":"Quanzeng You Hailin Jin Zhaowen Wang Chen Fang and Jiebo Luo. 2016. Image captioning with semantic attention. In CVPR. 4651--4659.","DOI":"10.1109\/CVPR.2016.503"},{"key":"e_1_3_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-017-1033-7"}],"event":{"name":"MM '18: ACM Multimedia Conference","location":"Seoul Republic of Korea","acronym":"MM '18","sponsor":["SIGMM ACM Special Interest Group on Multimedia"]},"container-title":["Proceedings of the 26th ACM international conference on Multimedia"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3240508.3240640","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3240508.3240640","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,3]],"date-time":"2026-04-03T20:40:43Z","timestamp":1775248843000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3240508.3240640"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,10,15]]},"references-count":43,"alternative-id":["10.1145\/3240508.3240640","10.1145\/3240508"],"URL":"https:\/\/doi.org\/10.1145\/3240508.3240640","relation":{},"subject":[],"published":{"date-parts":[[2018,10,15]]},"assertion":[{"value":"2018-10-15","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}