{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,27]],"date-time":"2025-08-27T16:09:52Z","timestamp":1756310992935,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":26,"publisher":"ACM","license":[{"start":{"date-parts":[[2019,8,26]],"date-time":"2019-08-26T00:00:00Z","timestamp":1566777600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2019,8,26]]},"DOI":"10.1145\/3337722.3341870","type":"proceedings-article","created":{"date-parts":[[2019,9,3]],"date-time":"2019-09-03T12:32:59Z","timestamp":1567513979000},"page":"1-7","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":7,"title":["End-to-end let's play commentary generation using multi-modal video representations"],"prefix":"10.1145","author":[{"given":"Chengxi","family":"Li","sequence":"first","affiliation":[{"name":"University of Kentucky"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sagar","family":"Gandhi","sequence":"additional","affiliation":[{"name":"University of Kentucky and Google"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Brent","family":"Harrison","sequence":"additional","affiliation":[{"name":"University of Kentucky"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2019,8,26]]},"reference":[{"key":"e_1_3_2_1_1_1","volume-title":"Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473","author":"Bahdanau Dzmitry","year":"2014","unstructured":"Dzmitry Bahdanau , Kyunghyun Cho , and Yoshua Bengio . 2014. Neural machine translation by jointly learning to align and translate. 
arXiv preprint arXiv:1409.0473 (2014)."},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/S17-2126"},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/1027933.1027968"},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298856"},{"key":"e_1_3_2_1_5_1","volume-title":"Deep convolutional filter banks for texture recognition and segmentation. arXiv preprint arXiv:1411.6836","author":"Cimpoi Mircea","year":"2014","unstructured":"Mircea Cimpoi, Subhransu Maji, and Andrea Vedaldi. 2014. Deep convolutional filter banks for texture recognition and segmentation. arXiv preprint arXiv:1411.6836 (2014)."},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.5555\/1763974.1764031"},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.81"},{"key":"e_1_3_2_1_8_1","volume-title":"Towards Automated Let's Play Commentary. arXiv preprint arXiv:1809.09424","author":"Guzdial Matthew","year":"2018","unstructured":"Matthew Guzdial, Shukan Shah, and Mark Riedl. 2018. Towards Automated Let's Play Commentary. arXiv preprint arXiv:1809.09424 (2018)."},{"key":"e_1_3_2_1_9_1","unstructured":"Wei-Ning Hsu Yu Zhang Ron J Weiss Heiga Zen Yonghui Wu Yuxuan Wang Yuan Cao Ye Jia Zhifeng Chen Jonathan Shen et al. 2018. Hierarchical Generative Modeling for Controllable Speech Synthesis. arXiv preprint arXiv:1810.07217 (2018).
"},{"key":"e_1_3_2_1_10_1","volume-title":"Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980","author":"Kingma Diederik P","year":"2014","unstructured":"Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)."},{"key":"e_1_3_2_1_11_1","volume-title":"Effective approaches to attention-based neural machine translation. arXiv preprint arXiv:1508.04025","author":"Luong Minh-Thang","year":"2015","unstructured":"Minh-Thang Luong, Hieu Pham, and Christopher D Manning. 2015. Effective approaches to attention-based neural machine translation. arXiv preprint arXiv:1508.04025 (2015)."},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.25080\/Majora-7b98e3ed-003"},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.117"},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/2964284.2984062"},{"volume-title":"Advances in Neural Information Processing Systems 27","author":"Simonyan Karen","key":"e_1_3_2_1_15_1","unstructured":"Karen Simonyan and Andrew Zisserman. 2014. Two-Stream Convolutional Networks for Action Recognition in Videos. In Advances in Neural Information Processing Systems 27, Z. Ghahramani, M. Welling, C. Cortes, N.D. Lawrence, and K.Q. Weinberger (Eds.). Curran Associates, Inc., 568--576. 
http:\/\/papers.nips.cc\/paper\/5353-two-stream-convolutional-networks-for-action-recognition-in-videos.pdf"},{"key":"e_1_3_2_1_16_1","volume-title":"Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556","author":"Simonyan Karen","year":"2014","unstructured":"Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)."},{"key":"e_1_3_2_1_17_1","volume-title":"Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron. arXiv preprint arXiv:1803.09047","author":"Skerry-Ryan RJ","year":"2018","unstructured":"RJ Skerry-Ryan, Eric Battenberg, Ying Xiao, Yuxuan Wang, Daisy Stanton, Joel Shor, Ron J Weiss, Rob Clark, and Rif A Saurous. 2018. Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron. arXiv preprint arXiv:1803.09047 (2018)."},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1121\/1.1915893"},{"key":"e_1_3_2_1_19_1","unstructured":"Ilya Sutskever, Oriol Vinyals, and Quoc V Le. 2014. Sequence to sequence learning with neural networks. 
In Advances in neural information processing systems. 3104--3112."},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.515"},{"key":"e_1_3_2_1_21_1","volume-title":"Translating videos to natural language using deep recurrent neural networks. arXiv preprint arXiv:1412.4729","author":"Venugopalan Subhashini","year":"2014","unstructured":"Subhashini Venugopalan, Huijuan Xu, Jeff Donahue, Marcus Rohrbach, Raymond Mooney, and Kate Saenko. 2014. Translating videos to natural language using deep recurrent neural networks. arXiv preprint arXiv:1412.4729 (2014)."},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298935"},{"key":"e_1_3_2_1_23_1","volume-title":"International conference on machine learning. 2048--2057","author":"Xu Kelvin","year":"2015","unstructured":"Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhudinov, Rich Zemel, and Yoshua Bengio. 2015. Show, attend and tell: Neural image caption generation with visual attention. In International conference on machine learning. 
2048--2057."},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICPR.2016.7900036"},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.512"},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.503"}],"event":{"name":"FDG '19: The Fourteenth International Conference on the Foundations of Digital Games","acronym":"FDG '19","location":"San Luis Obispo California USA"},"container-title":["Proceedings of the 14th International Conference on the Foundations of Digital Games"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3337722.3341870","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3337722.3341870","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T23:54:25Z","timestamp":1750204465000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3337722.3341870"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,8,26]]},"references-count":26,"alternative-id":["10.1145\/3337722.3341870","10.1145\/3337722"],"URL":"https:\/\/doi.org\/10.1145\/3337722.3341870","relation":{},"subject":[],"published":{"date-parts":[[2019,8,26]]},"assertion":[{"value":"2019-08-26","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}