{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,15]],"date-time":"2025-10-15T10:19:34Z","timestamp":1760523574723,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":41,"publisher":"ACM","license":[{"start":{"date-parts":[[2017,10,19]],"date-time":"2017-10-19T00:00:00Z","timestamp":1508371200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"the Nature Science Foundation of China","award":["61422210, 61373076,61402388, 61572410"],"award-info":[{"award-number":["61422210, 61373076,61402388, 61572410"]}]},{"name":"National Key R&D Program","award":["2016YFB1001503"],"award-info":[{"award-number":["2016YFB1001503"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2017,10,19]]},"DOI":"10.1145\/3123266.3123275","type":"proceedings-article","created":{"date-parts":[[2017,10,20]],"date-time":"2017-10-20T13:04:26Z","timestamp":1508504666000},"page":"46-54","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":26,"title":["StructCap"],"prefix":"10.1145","author":[{"given":"Fuhai","family":"Chen","sequence":"first","affiliation":[{"name":"Xiamen University, Xiamen, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Rongrong","family":"Ji","sequence":"additional","affiliation":[{"name":"Xiamen University, Xiamen, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jinsong","family":"Su","sequence":"additional","affiliation":[{"name":"Xiamen University, Xiamen, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yongjian","family":"Wu","sequence":"additional","affiliation":[{"name":"Tencent YouTu Lab, Shanghai, 
China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yunsheng","family":"Wu","sequence":"additional","affiliation":[{"name":"Tencent YouTu Lab, Shanghai, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2017,10,19]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298856"},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298935"},{"key":"e_1_3_2_1_3_1","volume-title":"Deep captioning with multimodal recurrent neural networks (m-rnn). arXiv preprint arXiv:1412.6632","author":"Mao Junhua","year":"2014","unstructured":"Junhua Mao, Wei Xu, Yi Yang, Jiang Wang, Zhiheng Huang, and Alan Yuille. Deep captioning with multimodal recurrent neural networks (m-rnn). arXiv preprint arXiv:1412.6632, 2014."},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298932"},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298878"},{"key":"e_1_3_2_1_6_1","first-page":"77","volume-title":"ICML","volume":"14","author":"Xu Kelvin","year":"2015","unstructured":"Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron C Courville, Ruslan Salakhutdinov, Richard S Zemel, and Yoshua Bengio. Show, attend and tell: Neural image caption generation with visual attention. 
In ICML, volume 14, pages 77--81, 2015."},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.291"},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.503"},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.667"},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-015-0816-y"},{"key":"e_1_3_2_1_12_1","first-page":"91","volume-title":"NIPS","author":"Ren Shaoqing","year":"2015","unstructured":"Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster r-cnn: Towards real-time object detection with region proposal networks. In NIPS, pages 91--99, 2015."},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46478-7_22"},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.650"},{"key":"e_1_3_2_1_15_1","volume-title":"Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473","author":"Bahdanau Dzmitry","year":"2014","unstructured":"Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473, 2014."},{"key":"e_1_3_2_1_16_1","first-page":"3104","volume-title":"NIPS","author":"Sutskever Ilya","year":"2014","unstructured":"Ilya Sutskever, Oriol Vinyals, and Quoc V Le. 
Sequence to sequence learning with neural networks. In NIPS, pages 3104--3112, 2014."},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/2964284.2964299"},{"key":"e_1_3_2_1_18_1","volume-title":"Reference based lstm for image captioning","author":"Chen Minghai","year":"2017","unstructured":"Minghai Chen, Guiguang Ding, Sicheng Zhao, Hui Chen, Jungong Han, and Qiang Liu. Reference based lstm for image captioning. 2017."},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.277"},{"key":"e_1_3_2_1_20_1","article-title":"Aligning where to see and what to tell: image caption with region-based attention and scene factorization","author":"Jin Junqi","year":"2016","unstructured":"Junqi Jin, Kun Fu, Runpeng Cui, Fei Sha, and Changshui Zhang. Aligning where to see and what to tell: image caption with region-based attention and scene factorization. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2016.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)"},{"key":"e_1_3_2_1_21_1","volume-title":"Text-guided attention model for image captioning","author":"Mun Jonghwan","year":"2017","unstructured":"Jonghwan Mun, Minsu Cho, and Bohyung Han. Text-guided attention model for image captioning. 
2017."},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.5555\/946247.946677"},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2009.5459175"},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298959"},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.479"},{"key":"e_1_3_2_1_26_1","first-page":"18","volume-title":"ICCV","author":"Tu Zhuowen","year":"2003","unstructured":"Zhuowen Tu, Xiangrong Chen, et al. Image parsing: Unifying segmentation, detection, and recognition. In ICCV, pages 18--25. IEEE, 2003."},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2008.65"},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2011.160"},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.179"},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298965"},{"key":"e_1_3_2_1_31_1","first-page":"129","volume-title":"ICML","author":"Socher Richard","year":"2011","unstructured":"Richard Socher, Cliff C Lin, Chris Manning, and Andrew Y Ng. Parsing natural scenes and natural language with recursive neural networks. In ICML, pages 129--136, 2011."},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298651"},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.250"},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/1526709.1526757"},{"key":"e_1_3_2_1_35_1","volume-title":"On estimation of a probability density function and mode. 
The annals of mathematical statistics, 33(3):1065--1076","author":"Parzen Emanuel","year":"1962","unstructured":"Emanuel Parzen. On estimation of a probability density function and mode. The annals of mathematical statistics, 33(3):1065--1076, 1962."},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.3115\/1073083.1073135"},{"key":"e_1_3_2_1_37_1","first-page":"595","volume-title":"ICML","volume":"14","author":"Kiros Ryan","year":"2014","unstructured":"Ryan Kiros, Ruslan Salakhutdinov, and Richard S Zemel. Multimodal neural language models. In ICML, volume 14, pages 595--603, 2014."},{"key":"e_1_3_2_1_38_1","volume-title":"Microsoft coco captions: Data collection and evaluation server. arXiv preprint arXiv:1504.00325","author":"Chen Xinlei","year":"2015","unstructured":"Xinlei Chen, Hao Fang, Tsung-Yi Lin, Ramakrishna Vedantam, Saurabh Gupta, Piotr Doll\u00e1r, and C Lawrence Zitnick. Microsoft coco captions: Data collection and evaluation server. arXiv preprint arXiv:1504.00325, 2015."},{"key":"e_1_3_2_1_39_1","volume-title":"ICLR","author":"Kingma Diederik","year":"2015","unstructured":"Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. 
In ICLR, 2015."},{"key":"e_1_3_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.3115\/1225403.1225421"}],"event":{"name":"MM '17: ACM Multimedia Conference","sponsor":["SIGMM ACM Special Interest Group on Multimedia"],"location":"Mountain View California USA","acronym":"MM '17"},"container-title":["Proceedings of the 25th ACM international conference on Multimedia"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3123266.3123275","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3123266.3123275","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T03:39:28Z","timestamp":1750217968000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3123266.3123275"}},"subtitle":["Structured Semantic Embedding for Image Captioning"],"short-title":[],"issued":{"date-parts":[[2017,10,19]]},"references-count":41,"alternative-id":["10.1145\/3123266.3123275","10.1145\/3123266"],"URL":"https:\/\/doi.org\/10.1145\/3123266.3123275","relation":{},"subject":[],"published":{"date-parts":[[2017,10,19]]},"assertion":[{"value":"2017-10-19","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}