{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,21]],"date-time":"2025-10-21T15:39:12Z","timestamp":1761061152922,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":54,"publisher":"ACM","license":[{"start":{"date-parts":[[2020,10,12]],"date-time":"2020-10-12T00:00:00Z","timestamp":1602460800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"National Research Foundation Singapore"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2020,10,12]]},"DOI":"10.1145\/3394171.3413589","type":"proceedings-article","created":{"date-parts":[[2020,10,12]],"date-time":"2020-10-12T12:27:35Z","timestamp":1602505655000},"page":"4013-4022","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["Who You Are Decides How You Tell"],"prefix":"10.1145","author":[{"given":"Shuang","family":"Wu","sequence":"first","affiliation":[{"name":"National University of Singapore, Singapore, Singapore"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shaojing","family":"Fan","sequence":"additional","affiliation":[{"name":"National University of Singapore, Singapore, Singapore"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zhiqi","family":"Shen","sequence":"additional","affiliation":[{"name":"National University of Singapore, Singapore, Singapore"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mohan","family":"Kankanhalli","sequence":"additional","affiliation":[{"name":"National University of Singapore, Singapore, Singapore"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Anthony K.H.","family":"Tung","sequence":"additional","affiliation":[{"name":"National University of Singapore, Singapore, 
Singapore"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2020,10,12]]},"reference":[{"key":"e_1_3_2_2_1_1","doi-asserted-by":"publisher","DOI":"10.1038\/nrn1056"},{"key":"e_1_3_2_2_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00636"},{"key":"e_1_3_2_2_3_1","doi-asserted-by":"publisher","DOI":"10.4324\/9780203075319"},{"volume-title":"Design of comparative experiments","author":"Bailey Rosemary A","key":"e_1_3_2_2_4_1","unstructured":"Rosemary A Bailey. 2008. Design of comparative experiments. Vol. 25. Cambridge University Press."},{"key":"e_1_3_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2008.10.080"},{"key":"e_1_3_2_2_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.01275"},{"key":"e_1_3_2_2_7_1","volume-title":"Proceedings of the European Conference on Computer Vision (ECCV). 519--535","author":"Chen Tianlang","year":"2018","unstructured":"Tianlang Chen, Zhongping Zhang, Quanzeng You, Chen Fang, Zhaowen Wang, Hailin Jin, and Jiebo Luo. 2018. 'Factual' or 'Emotional': Stylized Image Captioning with Adaptive Learning and Attention. In Proceedings of the European Conference on Computer Vision (ECCV). 519--535."},{"key":"e_1_3_2_2_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.681"},{"key":"e_1_3_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00608"},{"key":"e_1_3_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/W14-3348"},{"key":"e_1_3_2_2_11_1","volume-title":"Bert: Pre-training of deep bidirectional transformers for language understanding. 
arXiv preprint arXiv:1810.04805","author":"Devlin Jacob","year":"2018","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)."},{"key":"e_1_3_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/P14-2074"},{"key":"e_1_3_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298754"},{"key":"e_1_3_2_2_14_1","volume-title":"Automatic caption generation for news images","author":"Feng Yansong","year":"2012","unstructured":"Yansong Feng and Mirella Lapata. 2012. Automatic caption generation for news images. IEEE transactions on pattern analysis and machine intelligence 35, 4 (2012), 797--812."},{"key":"e_1_3_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.108"},{"key":"e_1_3_2_2_16_1","volume-title":"Measuring intrinsic motivation in everyday life. Leisure studies 2, 2","author":"Graef Ronald","year":"1983","unstructured":"Ronald Graef, Mihaly Csikszentmihalyi, and Susan McManama Gianinno. 1983. Measuring intrinsic motivation in everyday life. 
Leisure studies 2, 2 (1983), 155--168."},{"key":"e_1_3_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00433"},{"key":"e_1_3_2_2_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2013.205"},{"key":"e_1_3_2_2_19_1","volume-title":"Proceedings of the ICCS\/CogSci-2006 long symposium: Toward social mechanisms of android science. Citeseer, 39--42","author":"Hanson David","year":"2006","unstructured":"David Hanson. 2006. Exploring the aesthetic range for humanoid robots. In Proceedings of the ICCS\/CogSci-2006 long symposium: Toward social mechanisms of android science. Citeseer, 39--42."},{"key":"e_1_3_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_2_21_1","volume-title":"Long short-term memory. Neural computation 9, 8","author":"Hochreiter Sepp","year":"1997","unstructured":"Sepp Hochreiter and J\u00fcrgen Schmidhuber. 1997. Long short-term memory. Neural computation 9, 8 (1997), 1735--1780."},{"volume-title":"Teaching and learning science: Towards a personalized approach","author":"Hodson Derek","key":"e_1_3_2_2_22_1","unstructured":"Derek Hodson. 1998. Teaching and learning science: Towards a personalized approach. McGraw-Hill Education (UK)."},{"key":"e_1_3_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.1037\/0022-3514.79.6.995"},{"key":"e_1_3_2_2_24_1","volume-title":"Re-evaluating automatic metrics for image captioning. 
arXiv preprint arXiv:1612.07600","author":"Kilickaya Mert","year":"2016","unstructured":"Mert Kilickaya, Aykut Erdem, Nazli Ikizler-Cinbis, and Erkut Erdem. 2016. Re-evaluating automatic metrics for image captioning. arXiv preprint arXiv:1612.07600 (2016)."},{"key":"e_1_3_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2012.162"},{"key":"e_1_3_2_2_26_1","volume-title":"Proceedings of the Fifteenth Conference on Computational Natural Language Learning. Association for Computational Linguistics, 220--228","author":"Li Siming","year":"2011","unstructured":"Siming Li, Girish Kulkarni, Tamara L Berg, Alexander C Berg, and Yejin Choi. 2011. Composing simple image descriptions using web-scale n-grams. In Proceedings of the Fifteenth Conference on Computational Natural Language Learning. Association for Computational Linguistics, 220--228."},{"key":"e_1_3_2_2_27_1","volume-title":"ROUGE: A Package for Automatic Evaluation of Summaries. In Text Summarization Branches Out","author":"Lin Chin-Yew","year":"2004","unstructured":"Chin-Yew Lin. 2004. ROUGE: A Package for Automatic Evaluation of Summaries. In Text Summarization Branches Out. Association for Computational Linguistics, Barcelona, Spain, 74--81. 
https:\/\/www.aclweb.org\/anthology\/W04-1013"},{"key":"e_1_3_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"e_1_3_2_2_29_1","volume-title":"Entity-aware image caption generation. arXiv preprint arXiv:1804.07889","author":"Lu Di","year":"2018","unstructured":"Di Lu, Spencer Whitehead, Lifu Huang, Heng Ji, and Shih-Fu Chang. 2018. Entity-aware image caption generation. arXiv preprint arXiv:1804.07889 (2018)."},{"key":"e_1_3_2_2_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.345"},{"key":"e_1_3_2_2_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00896"},{"key":"e_1_3_2_2_32_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v30i1.10475"},{"key":"e_1_3_2_2_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.11"},{"key":"e_1_3_2_2_34_1","volume-title":"Running experiments on amazon mechanical turk. Judgment and Decision making 5, 5","author":"Paolacci Gabriele","year":"2010","unstructured":"Gabriele Paolacci, Jesse Chandler, and Panagiotis G Ipeirotis. 2010. Running experiments on amazon mechanical turk. Judgment and Decision making 5, 5 (2010), 411--419."},{"key":"e_1_3_2_2_35_1","volume-title":"Proceedings of the 40th annual meeting on association for computational linguistics. Association for Computational Linguistics, 311--318","author":"Papineni Kishore","year":"2002","unstructured":"
Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting on association for computational linguistics. Association for Computational Linguistics, 311--318."},{"key":"e_1_3_2_2_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.303"},{"key":"e_1_3_2_2_37_1","volume-title":"Breakingnews: Article annotation by image and text processing","author":"Ramisa Arnau","year":"2017","unstructured":"Arnau Ramisa, Fei Yan, Francesc Moreno-Noguer, and Krystian Mikolajczyk. 2017. Breakingnews: Article annotation by image and text processing. IEEE transactions on pattern analysis and machine intelligence 40, 5 (2017), 1072--1085."},{"key":"e_1_3_2_2_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.445"},{"key":"e_1_3_2_2_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.01280"},{"key":"e_1_3_2_2_40_1","doi-asserted-by":"publisher","DOI":"10.1038\/s41559-016-0065"},{"key":"e_1_3_2_2_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW.2016.61"},{"key":"e_1_3_2_2_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7299087"},{"key":"e_1_3_2_2_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.515"},{"key":"e_1_3_2_2_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298935"},{"key":"e_1_3_2_2_45_1","volume-title":"Motivating forces of human actions: Neuroimaging reward and social interaction. Brain research bulletin 67, 5","author":"Walter Henrik","year":"2005","unstructured":"Henrik Walter, Birgit Abler, Angela Ciaramidaro, and Susanne Erk. 2005. Motivating forces of human actions: Neuroimaging reward and social interaction. 
Brain research bulletin 67, 5 (2005), 368--381."},{"key":"e_1_3_2_2_46_1","volume-title":"Thirty-Second AAAI Conference on Artificial Intelligence.","author":"Fu Jianlong","year":"2018","unstructured":"Jing Wang, Jianlong Fu, Jinhui Tang, Zechao Li, and Tao Mei. 2018. Show, reward and tell: Automatic generation of narrative paragraph from photo stream by adversarial training. In Thirty-Second AAAI Conference on Artificial Intelligence."},{"key":"e_1_3_2_2_47_1","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 4195--4203","author":"Chan QingzhongWang","year":"2019","unstructured":"Qingzhong Wang and Antoni B Chan. 2019. Describing like humans: on diversity in image captioning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 4195--4203."},{"key":"e_1_3_2_2_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.500"},{"key":"e_1_3_2_2_49_1","volume-title":"International conference on machine learning. 2048--2057","author":"Xu Kelvin","year":"2015","unstructured":"
Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhudinov, Rich Zemel, and Yoshua Bengio. 2015. Show, attend and tell: Neural image caption generation with visual attention. In International conference on machine learning. 2048--2057."},{"key":"e_1_3_2_2_50_1","unstructured":"Zhilin Yang, Ye Yuan, Yuexin Wu, William W Cohen, and Russ R Salakhutdinov. 2016. Review networks for caption generation. In Advances in neural information processing systems. 2361--2369."},{"key":"e_1_3_2_2_51_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.512"},{"key":"e_1_3_2_2_52_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.524"},{"key":"e_1_3_2_2_53_1","volume-title":"Image caption generation with text-conditional semantic attention. arXiv preprint arXiv:1606.04621 2","author":"Zhou Luowei","year":"2016","unstructured":"Luowei Zhou, Chenliang Xu, Parker Koch, and Jason J Corso. 2016. Image caption generation with text-conditional semantic attention. 
arXiv preprint arXiv:1606.04621 2 (2016)."},{"key":"e_1_3_2_2_54_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.147"}],"event":{"name":"MM '20: The 28th ACM International Conference on Multimedia","sponsor":["SIGMM ACM Special Interest Group on Multimedia"],"location":"Seattle WA USA","acronym":"MM '20"},"container-title":["Proceedings of the 28th ACM International Conference on Multimedia"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3394171.3413589","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3394171.3413589","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:47:14Z","timestamp":1750193234000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3394171.3413589"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,10,12]]},"references-count":54,"alternative-id":["10.1145\/3394171.3413589","10.1145\/3394171"],"URL":"https:\/\/doi.org\/10.1145\/3394171.3413589","relation":{},"subject":[],"published":{"date-parts":[[2020,10,12]]},"assertion":[{"value":"2020-10-12","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}