{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,1]],"date-time":"2025-10-01T17:58:55Z","timestamp":1759341535343,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":35,"publisher":"ACM","license":[{"start":{"date-parts":[[2019,6,5]],"date-time":"2019-06-05T00:00:00Z","timestamp":1559692800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"National Natural Science Foundation of China","award":["61472202"],"award-info":[{"award-number":["61472202"]}]},{"name":"National Key R&D Program of China","award":["2018YFB0505400"],"award-info":[{"award-number":["2018YFB0505400"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2019,6,5]]},"DOI":"10.1145\/3323873.3325050","type":"proceedings-article","created":{"date-parts":[[2019,6,10]],"date-time":"2019-06-10T12:10:58Z","timestamp":1560168658000},"page":"297-305","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":12,"title":["Emotion Reinforced Visual Storytelling"],"prefix":"10.1145","author":[{"given":"Nanxing","family":"Li","sequence":"first","affiliation":[{"name":"Tsinghua University & Beijing National Research Center for Information Science and Technology (BNRist), Beijing, China"}]},{"given":"Bei","family":"Liu","sequence":"additional","affiliation":[{"name":"Microsoft Research Asia, Beijing, China"}]},{"given":"Zhizhong","family":"Han","sequence":"additional","affiliation":[{"name":"University of Maryland, College Park, MD, USA"}]},{"given":"Yu-Shen","family":"Liu","sequence":"additional","affiliation":[{"name":"Tsinghua University & Beijing National Research Center for Information Science and Technology (BNRist), Beijing, China"}]},{"given":"Jianlong","family":"Fu","sequence":"additional","affiliation":[{"name":"Microsoft Research Asia, Beijing, China"}]}],"member":"320","published-online":{"date-parts":[[2019,6,5]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"crossref","unstructured":"Peter Anderson, Xiaodong He, Chris Buehler, Damien Teney, Mark Johnson, Stephen Gould, and Lei Zhang. 2018. Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering. In CVPR.","DOI":"10.1109\/CVPR.2018.00636"},{"key":"e_1_3_2_1_2_1","volume-title":"Schwing","author":"Aneja Jyoti","year":"2018","unstructured":"Jyoti Aneja, Aditya Deshpande, and Alexander G. Schwing. 2018. Convolutional Image Captioning. In CVPR."},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"crossref","unstructured":"Damian Borth, Rongrong Ji, Tao Chen, Thomas Breuel, and Shih-Fu Chang. 2013. Large-scale visual sentiment ontology and detectors using adjective noun pairs. In ACM MM. 223--232.","DOI":"10.1145\/2502081.2502282"},{"key":"e_1_3_2_1_4_1","volume-title":"Large scale GAN training for high fidelity natural image synthesis. arXiv preprint arXiv:1809.11096","author":"Brock Andrew","year":"2018","unstructured":"Andrew Brock, Jeff Donahue, and Karen Simonyan. 2018. Large scale GAN training for high fidelity natural image synthesis. arXiv preprint arXiv:1809.11096 (2018)."},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"crossref","unstructured":"Fuhai Chen, Rongrong Ji, Xiaoshuai Sun, Yongjian Wu, and Jinsong Su. 2018. GroupCap: Group-Based Image Captioning With Structured Relevance and Diversity Constraints. In CVPR.","DOI":"10.1109\/CVPR.2018.00146"},{"key":"e_1_3_2_1_6_1","unstructured":"Tseng-Hung Chen, Yuan-Hong Liao, Ching-Yao Chuang, Wan-Ting Hsu, Jianlong Fu, and Min Sun. 2017. Show, Adapt and Tell: Adversarial Training of Cross-domain Image Captioner. In ICCV. 521--530."},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"crossref","unstructured":"Xinlei Chen and C Lawrence Zitnick. 2015. Mind's eye: A recurrent visual representation for image caption generation. In CVPR. 2422--2431.","DOI":"10.1109\/CVPR.2015.7298856"},{"key":"e_1_3_2_1_8_1","volume-title":"Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio.","author":"Cho Kyunghyun","year":"2014","unstructured":"Kyunghyun Cho, Bart Van Merri\u00ebnboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In EMNLP. 1724--1734."},{"key":"e_1_3_2_1_9_1","volume-title":"Peter Young, Cyrus Rashtchian, Julia Hockenmaier, and David Forsyth.","author":"Farhadi Ali","year":"2010","unstructured":"Ali Farhadi, Mohsen Hejrati, Mohammad Amin Sadeghi, Peter Young, Cyrus Rashtchian, Julia Hockenmaier, and David Forsyth. 2010. Every picture tells a story: Generating sentences from images. In ECCV. 15--29."},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"crossref","unstructured":"Bjarke Felbo, Alan Mislove, Anders S\u00f8gaard, Iyad Rahwan, and Sune Lehmann. 2017. Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm. In EMNLP.","DOI":"10.18653\/v1\/D17-1169"},{"key":"e_1_3_2_1_11_1","unstructured":"Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative adversarial nets. In NIPS. 2672--2680."},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"crossref","unstructured":"Qiuyuan Huang, Zhe Gan, Asli Celikyilmaz, Dapeng Wu, Jianfeng Wang, and Xiaodong He. 2019. Hierarchically Structured Reinforcement Learning for Topically Coherent Visual Story Generation. In AAAI.","DOI":"10.1609\/aaai.v33i01.33018465"},{"key":"e_1_3_2_1_13_1","volume-title":"et al.","author":"Kenneth Huang Ting-Hao","year":"2016","unstructured":"Ting-Hao Kenneth Huang, Francis Ferraro, Nasrin Mostafazadeh, Ishan Misra, Aishwarya Agrawal, Jacob Devlin, Ross Girshick, Xiaodong He, Pushmeet Kohli, Dhruv Batra, et al. 2016. Visual storytelling. In NAACL HLT. 1233--1239."},{"key":"e_1_3_2_1_14_1","unstructured":"Andrej Karpathy, Armand Joulin, and Fei Fei F Li. 2014. Deep fragment embeddings for bidirectional image sentence mapping. In NIPS. 1889--1897."},{"key":"e_1_3_2_1_15_1","volume-title":"A style-based generator architecture for generative adversarial networks. arXiv preprint arXiv:1812.04948","author":"Karras Tero","year":"2018","unstructured":"Tero Karras, Samuli Laine, and Timo Aila. 2018. A style-based generator architecture for generative adversarial networks. arXiv preprint arXiv:1812.04948 (2018)."},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"crossref","unstructured":"Jonathan Krause, Justin Johnson, Ranjay Krishna, and Li Fei-Fei. 2017. A hierarchical approach for generating descriptive image paragraphs. In CVPR. 3337--3345.","DOI":"10.1109\/CVPR.2017.356"},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2011.5995466"},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"crossref","unstructured":"Bei Liu, Jianlong Fu, Makoto P Kato, and Masatoshi Yoshikawa. 2018. Beyond Narrative Description: Generating Poetry from Images by Multi-Adversarial Training. In ACM MM. 783--791.","DOI":"10.1145\/3240508.3240587"},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"crossref","unstructured":"Yu Liu, Jianlong Fu, Tao Mei, and Chang Wen Chen. 2017. Let Your Photos Talk: Generating Narrative Paragraph for Photo Stream via Bidirectional Attention Recurrent Neural Networks. In AAAI. 1445--1452.","DOI":"10.1609\/aaai.v31i1.10760"},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"crossref","unstructured":"Alexander Mathews, Lexing Xie, and Xuming He. 2018. SemStyle: Learning to Generate Stylised Image Captions Using Unaligned Text. In CVPR.","DOI":"10.1109\/CVPR.2018.00896"},{"key":"e_1_3_2_1_21_1","volume-title":"Conditional generative adversarial nets. Computer Science","author":"Mirza Mehdi","year":"2014","unstructured":"Mehdi Mirza and Simon Osindero. 2014. Conditional generative adversarial nets. Computer Science (2014), 2672--2680."},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.3115\/1073083.1073135"},{"key":"e_1_3_2_1_23_1","unstructured":"Cesc C Park and Gunhee Kim. 2015. Expressing an image stream with a sequence of natural sentences. In NIPS. 73--81."},{"key":"e_1_3_2_1_24_1","unstructured":"Kihyuk Sohn, Honglak Lee, and Xinchen Yan. 2015. Learning structured output representation using deep conditional generative models. In NIPS. 3483--3491."},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"crossref","unstructured":"Oriol Vinyals, Alexander Toshev, Samy Bengio, and Dumitru Erhan. 2015. Show and tell: A neural image caption generator. In CVPR. 3156--3164.","DOI":"10.1109\/CVPR.2015.7298935"},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"crossref","unstructured":"Jing Wang, Jianlong Fu, Jinhui Tang, Zechao Li, and Tao Mei. 2018b. Show, reward and tell: Automatic generation of narrative paragraph from photo stream by adversarial training. In AAAI.","DOI":"10.1609\/aaai.v32i1.12318"},{"key":"e_1_3_2_1_27_1","unstructured":"Jingwen Wang, Jianlong Fu, Yong Xu, and Tao Mei. 2016. Beyond Object Recognition: Visual Sentiment Analysis with Deep Coupled Adjective and Noun Neural Networks. In IJCAI. 3484--3490."},{"key":"e_1_3_2_1_28_1","unstructured":"Xin Wang, Wenhu Chen, Yuan-Fang Wang, and William Yang Wang. 2018a. No metrics are perfect: Adversarial reward learning for visual storytelling. In ACL. 899--909."},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF00992696"},{"key":"e_1_3_2_1_30_1","unstructured":"Jingjing Xu, Yi Zhang, Qi Zeng, Xuancheng Ren, Xiaoyan Cai, and Xu Sun. 2018. A skeleton-based model for promoting coherence among sentences in narrative story generation. In EMNLP. 4306--4315."},{"key":"e_1_3_2_1_31_1","unstructured":"Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhudinov, Rich Zemel, and Yoshua Bengio. 2015. Show, attend and tell: Neural image caption generation with visual attention. In ICML. 2048--2057."},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"crossref","unstructured":"Ting Yao, Yingwei Pan, Yehao Li, Zhaofan Qiu, and Tao Mei. 2017. Boosting image captioning with attributes. In ICCV. 22--29.","DOI":"10.1109\/ICCV.2017.524"},{"key":"e_1_3_2_1_33_1","unstructured":"Quanzeng You, Hailin Jin, Zhaowen Wang, Chen Fang, and Jiebo Luo. 2016a. Image captioning with semantic attention. In CVPR. 4651--4659."},{"key":"e_1_3_2_1_34_1","unstructured":"Quanzeng You, Hailin Jin, Zhaowen Wang, Chen Fang, and Jiebo Luo. 2016b. Image Captioning With Semantic Attention. In CVPR."},{"key":"e_1_3_2_1_35_1","unstructured":"Lantao Yu, Weinan Zhang, Jun Wang, and Yong Yu. 2017. SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient. In AAAI. 2852--2858."}],"event":{"name":"ICMR '19: International Conference on Multimedia Retrieval","sponsor":["SIGMM ACM Special Interest Group on Multimedia"],"location":"Ottawa ON Canada","acronym":"ICMR '19"},"container-title":["Proceedings of the 2019 on International Conference on Multimedia Retrieval"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3323873.3325050","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3323873.3325050","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T23:54:12Z","timestamp":1750204452000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3323873.3325050"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,6,5]]},"references-count":35,"alternative-id":["10.1145\/3323873.3325050","10.1145\/3323873"],"URL":"https:\/\/doi.org\/10.1145\/3323873.3325050","relation":{},"subject":[],"published":{"date-parts":[[2019,6,5]]},"assertion":[{"value":"2019-06-05","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}