{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,27]],"date-time":"2025-09-27T08:23:39Z","timestamp":1758961419750,"version":"3.41.0"},"reference-count":46,"publisher":"Association for Computing Machinery (ACM)","issue":"2s","license":[{"start":{"date-parts":[[2019,4,30]],"date-time":"2019-04-30T00:00:00Z","timestamp":1556582400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2019,4,30]]},"abstract":"<jats:p>Despite the promising progress made in visual captioning and paragraphing, visual storytelling is still largely unexplored. This task is more challenging due to the difficulty in modeling an ordered photo sequence and in generating a relevant paragraph with expressive language style for storytelling. To deal with these challenges, we propose an Attribute-based Hierarchical Generative model with Reinforcement Learning and adversarial training (AHGRL). First, to model the ordered photo sequence and the complex story structure, we propose an attribute-based hierarchical generator. The generator incorporates semantic attributes to create more accurate and relevant descriptions. The hierarchical framework enables the generator to learn from the complex paragraph structure. Second, to generate story-style paragraphs, we design a language-style discriminator, which provides word-level rewards to optimize the generator by policy gradient. Third, we further consider the story generator and the reward critic as adversaries. The generator aims to create paragraphs indistinguishable from human-level stories, whereas the critic aims to distinguish them and further improve the generator. 
Extensive experiments on a widely used dataset demonstrate the advantages of the proposed method over state-of-the-art methods.<\/jats:p>","DOI":"10.1145\/3291925","type":"journal-article","created":{"date-parts":[[2019,7,3]],"date-time":"2019-07-03T13:47:53Z","timestamp":1562161673000},"page":"1-20","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":8,"title":["Show, Reward, and Tell"],"prefix":"10.1145","volume":"15","member":"320","published-online":{"date-parts":[[2019,7,3]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","unstructured":"Samy Bengio Oriol Vinyals Navdeep Jaitly and Noam Shazeer. 2015. Scheduled sampling for sequence prediction with recurrent neural networks. In Advances in Neural Information Processing Systems.","DOI":"10.5555\/2969239.2969370"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.667"},{"key":"e_1_2_1_3_1","unstructured":"Xinlei Chen Hao Fang Tsung-Yi Lin Ramakrishna Vedantam Saurabh Gupta Piotr Doll\u00e1r and C. Lawrence Zitnick. 2015. Microsoft COCO captions: Data collection and evaluation server. CoRR abs\/1504.00325."},{"volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.","author":"Chen Xinlei","key":"e_1_2_1_4_1","unstructured":"Xinlei Chen and C. Lawrence Zitnick. 2015. Mind\u2019s eye: A recurrent visual representation for image caption generation. 
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.323"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298878"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.5555\/1888089.1888092"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.5555\/2566972.2566993"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2017.2710635"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/2043612.2043613"},{"volume-title":"Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.","author":"Huang Ting-Hao","key":"e_1_2_1_11_1","unstructured":"Ting-Hao Huang, Francis Ferraro, Nasrin Mostafazadeh, Ishan Misra, Aishwarya Agrawal, Jacob Devlin, Ross Girshick, Xiaodong He, Pushmeet Kohli, Dhruv Batra, C. Lawrence Zitnick, et al. 2016. Visual storytelling. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.5555\/3045118.3045167"},{"volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.","author":"Isola Phillip","key":"e_1_2_1_13_1","unstructured":"Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A. Efros. 2017. Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.494"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","unstructured":"Andrej Karpathy Armand Joulin and Fei Fei F. Li. 2014. Deep fragment embeddings for bidirectional image sentence mapping. 
In Advances in Neural Information Processing Systems.","DOI":"10.5555\/2969033.2969038"},{"key":"e_1_2_1_16_1","unstructured":"Ivan Krasin Tom Duerig Neil Alldrin Vittorio Ferrari Sami Abu-El-Haija Alina Kuznetsova Hassan Rom Jasper Uijlings Stefan Popov Shahab Kamali Matteo Malloci Jordi Pont-Tuset Andreas Veit Serge Belongie Victor Gomes Abhinav Gupta Chen Sun Gal Chechik David Cai Zheyun Feng Dhyanesh Narayanan and Kevin Murphy. 2017. OpenImages: A public dataset for large-scale multi-label and multi-class image classification. Dataset retrieved from https:\/\/storage.googleapis.com\/openimages\/web\/index.html."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.356"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2011.5995466"},{"key":"e_1_2_1_19_1","volume-title":"Proceedings of the 9th Workshop on Statistical Machine Translation.","author":"Alon Lavie Michael Denkowski","year":"2014","unstructured":"Michael Denkowski Alon Lavie. 2014. Meteor universal: Language specific translation evaluation for any target language. In Proceedings of the 9th Workshop on Statistical Machine Translation."},{"key":"e_1_2_1_20_1","article-title":"Deep Collaborative Embedding for Social Image Understanding","author":"Li Zechao","year":"2018","unstructured":"Zechao Li, Jinhui Tang, and Tao Mei. 2018. Deep Collaborative Embedding for Social Image Understanding. IEEE Trans. Pattern Anal. Mach. Intell.","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"volume-title":"Proceedings of the IEEE International Conference on Computer Vision.","author":"Liang Xiaodan","key":"e_1_2_1_21_1","unstructured":"Xiaodan Liang, Zhiting Hu, Hao Zhang, Chuang Gan, and Eric P. Xing. 2017. Recurrent topic-transition GAN for visual paragraph generation. 
In Proceedings of the IEEE International Conference on Computer Vision."},{"key":"e_1_2_1_22_1","doi-asserted-by":"crossref","unstructured":"Siqi Liu Zhenhai Zhu Ning Ye Sergio Guadarrama and Kevin Murphy. 2017. Improved image captioning via policy gradient optimization of SPIDEr. ICCV. 873--881.","DOI":"10.1109\/ICCV.2017.100"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.5555\/3298239.3298450"},{"key":"e_1_2_1_24_1","volume-title":"Proceedings of the International Conference on Learning Representations.","author":"Mao Junhua","year":"2015","unstructured":"Junhua Mao, Wei Xu, Yi Yang, Jiang Wang, Zhiheng Huang, and Alan Yuille. 2015. Deep captioning with multimodal recurrent neural networks (m-rnn). In Proceedings of the International Conference on Learning Representations."},{"key":"e_1_2_1_25_1","unstructured":"Mehdi Mirza and Simon Osindero. 2014. Conditional generative adversarial nets. CoRR abs\/1411.1784."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.3115\/1073083.1073135"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.5555\/2969239.2969248"},{"key":"e_1_2_1_28_1","unstructured":"Marc\u2019Aurelio Ranzato Sumit Chopra Michael Auli and Wojciech Zaremba. 2016. Sequence level training with recurrent neural networks. ICLR."},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.131"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.548"},{"key":"e_1_2_1_31_1","volume-title":"Proceedings of the International Conference on Learning Representations.","author":"Simonyan Karen","year":"2015","unstructured":"Karen Simonyan and Andrew Zisserman. 2015. Very deep convolutional networks for large-scale image recognition. 
In Proceedings of the International Conference on Learning Representations."},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/2998574"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2016.2608882"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7299087"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.515"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298935"},{"key":"e_1_2_1_37_1","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence.","author":"Wang Jing","year":"2018","unstructured":"Jing Wang, Jianlong Fu, Jinhui Tang, Zechao Li, and Tao Mei. 2018. Show, reward and tell: Automatic generation of narrative paragraph from photo stream by adversarial training. In Proceedings of the AAAI Conference on Artificial Intelligence."},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF00992696"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.5555\/3045118.3045336"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.5555\/3157096.3157361"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.512"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.524"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.503"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.496"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.5555\/3298483.3298649"},{"key":"e_1_2_1_46_1","unstructured":"Wojciech Zaremba and Ilya Sutskever. 2015. Reinforcement learning neural turing machines. CoRR abs\/1505.00521. 
http:\/\/arxiv.org\/abs\/1505.00521."}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3291925","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3291925","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T01:02:02Z","timestamp":1750208522000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3291925"}},"subtitle":["Adversarial Visual Story Generation"],"editor":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9008-222X","authenticated-orcid":false,"given":"Jinhui","family":"Tang","sequence":"first","affiliation":[]},{"given":"Jing","family":"Wang","sequence":"additional","affiliation":[]},{"given":"Zechao","family":"Li","sequence":"additional","affiliation":[]},{"given":"Jianlong","family":"Fu","sequence":"additional","affiliation":[]},{"given":"Tao","family":"Mei","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2019,4,30]]},"references-count":46,"journal-issue":{"issue":"2s","published-print":{"date-parts":[[2019,4,30]]}},"alternative-id":["10.1145\/3291925"],"URL":"https:\/\/doi.org\/10.1145\/3291925","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"type":"print","value":"1551-6857"},{"type":"electronic","value":"1551-6865"}],"subject":[],"published":{"date-parts":[[2019,4,30]]},"assertion":[{"value":"2018-06-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2018-11-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2019-07-03","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}