{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,12]],"date-time":"2025-11-12T13:20:58Z","timestamp":1762953658593,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":42,"publisher":"ACM","license":[{"start":{"date-parts":[[2017,10,23]],"date-time":"2017-10-23T00:00:00Z","timestamp":1508716800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"JST CREST","award":["JPMJCR1686"],"award-info":[{"award-number":["JPMJCR1686"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2017,10,23]]},"DOI":"10.1145\/3123266.3127898","type":"proceedings-article","created":{"date-parts":[[2017,10,20]],"date-time":"2017-10-20T13:04:26Z","timestamp":1508504666000},"page":"1889-1894","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["MANet"],"prefix":"10.1145","author":[{"given":"Sang","family":"Phan","sequence":"first","affiliation":[{"name":"National Institute of Informatics, Tokyo, Japan"}]},{"given":"Yusuke","family":"Miyao","sequence":"additional","affiliation":[{"name":"National Institute of Informatics, Tokyo, Japan"}]},{"given":"Shin'ichi","family":"Satoh","sequence":"additional","affiliation":[{"name":"National Institute of Informatics, Tokyo, Japan"}]}],"member":"320","published-online":{"date-parts":[[2017,10,23]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"Dzmitry Bahdanau Kyunghyun Cho and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate ICLR.  Dzmitry Bahdanau Kyunghyun Cho and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate ICLR."},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/2964284.2967286"},{"key":"e_1_3_2_1_3_1","volume-title":"Microsoft COCO captions: Data collection and evaluation server. arXiv preprint arXiv:1504.00325","author":"Chen Xinlei","year":"2015","unstructured":"Xinlei Chen , Hao Fang , Tsung-Yi Lin , Ramakrishna Vedantam , Saurabh Gupta , Piotr Doll\u00e1r , and C Lawrence Zitnick . 2015. Microsoft COCO captions: Data collection and evaluation server. arXiv preprint arXiv:1504.00325 ( 2015 ). Xinlei Chen, Hao Fang, Tsung-Yi Lin, Ramakrishna Vedantam, Saurabh Gupta, Piotr Doll\u00e1r, and C Lawrence Zitnick. 2015. Microsoft COCO captions: Data collection and evaluation server. arXiv preprint arXiv:1504.00325 (2015)."},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASSP.1980.1163420"},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/2964284.2984064"},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/1386352.1386373"},{"key":"e_1_3_2_1_7_1","volume-title":"Danilo Jimenez Rezende, and Daan Wierstra","author":"Gregor Karol","year":"2015","unstructured":"Karol Gregor , Ivo Danihelka , Alex Graves , Danilo Jimenez Rezende, and Daan Wierstra . 2015 . DRAW : A recurrent neural network for image generation. arXiv preprint arXiv:1502.04623 (2015). Karol Gregor, Ivo Danihelka, Alex Graves, Danilo Jimenez Rezende, and Daan Wierstra. 2015. DRAW: A recurrent neural network for image generation. arXiv preprint arXiv:1502.04623 (2015)."},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.123"},{"key":"e_1_3_2_1_9_1","unstructured":"Kaiming He Xiangyu Zhang Shaoqing Ren and Jian Sun. 2016. Deep residual learning for image recognition. In CVPR.  Kaiming He Xiangyu Zhang Shaoqing Ren and Jian Sun. 2016. Deep residual learning for image recognition. In CVPR."},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"e_1_3_2_1_11_1","volume-title":"Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167","author":"Ioffe Sergey","year":"2015","unstructured":"Sergey Ioffe and Christian Szegedy . 2015. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167 ( 2015 ). Sergey Ioffe and Christian Szegedy. 2015. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167 (2015)."},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2011.235"},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/2964284.2984065"},{"key":"e_1_3_2_1_14_1","volume-title":"Adam: A method for stochastic optimization. In ICLR.","author":"Kingma Diederik","year":"2015","unstructured":"Diederik Kingma and Jimmy Ba . 2015 . Adam: A method for stochastic optimization. In ICLR. Diederik Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In ICLR."},{"key":"e_1_3_2_1_15_1","unstructured":"Alex Krizhevsky Ilya Sutskever and Geoffrey E Hinton. 2012. Imagenet classification with deep convolutional neural networks NIPS.   Alex Krizhevsky Ilya Sutskever and Geoffrey E Hinton. 2012. Imagenet classification with deep convolutional neural networks NIPS."},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11042-013-1391-2"},{"key":"e_1_3_2_1_17_1","volume-title":"Meteor universal: language specific translation evaluation for any target language. ACL","author":"Alon Lavie Michael Denkowski","year":"2014","unstructured":"Michael Denkowski Alon Lavie . 2014. Meteor universal: language specific translation evaluation for any target language. ACL ( 2014 ). Michael Denkowski Alon Lavie. 2014. Meteor universal: language specific translation evaluation for any target language. ACL (2014)."},{"key":"e_1_3_2_1_18_1","volume-title":"Rouge: A package for automatic evaluation of summaries ACL Workshop.","author":"Lin Chin-Yew","year":"2004","unstructured":"Chin-Yew Lin . 2004 . Rouge: A package for automatic evaluation of summaries ACL Workshop. Chin-Yew Lin. 2004. Rouge: A package for automatic evaluation of summaries ACL Workshop."},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"crossref","unstructured":"Pradeep Natarajan Shuang Wu Shiv Vitaladevuni Xiaodan Zhuang Stavros Tsakalidis Unsang Park Rohit Prasad and Premkumar Natarajan. 2012. Multimodal feature fusion for robust event detection in web videos CVPR.   Pradeep Natarajan Shuang Wu Shiv Vitaladevuni Xiaodan Zhuang Stavros Tsakalidis Unsang Park Rohit Prasad and Premkumar Natarajan. 2012. Multimodal feature fusion for robust event detection in web videos CVPR.","DOI":"10.1109\/CVPR.2012.6247814"},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00138-013-0525-x"},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2013.228"},{"key":"e_1_3_2_1_22_1","unstructured":"Yingwei Pan Tao Mei Ting Yao Houqiang Li and Yong Rui. 2016. Jointly Modeling Embedding and Translation to Bridge Video and Language CVPR.  Yingwei Pan Tao Mei Ting Yao Houqiang Li and Yong Rui. 2016. Jointly Modeling Embedding and Translation to Bridge Video and Language CVPR."},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.3115\/1073083.1073135"},{"key":"e_1_3_2_1_24_1","volume-title":"Glove: Global Vectors for Word Representation.. In EMNLP.","author":"Pennington Jeffrey","year":"2014","unstructured":"Jeffrey Pennington , Richard Socher , and Christopher D Manning . 2014 . Glove: Global Vectors for Word Representation.. In EMNLP. Jeffrey Pennington, Richard Socher, and Christopher D Manning. 2014. Glove: Global Vectors for Word Representation.. In EMNLP."},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/2964284.2984066"},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/2964284.2984062"},{"key":"e_1_3_2_1_27_1","unstructured":"Karen Simonyan and Andrew Zisserman. 2015. Very deep convolutional networks for large-scale image recognition ICLR.  Karen Simonyan and Andrew Zisserman. 2015. Very deep convolutional networks for large-scale image recognition ICLR."},{"key":"e_1_3_2_1_28_1","unstructured":"Ilya Sutskever Oriol Vinyals and Quoc V Le. 2014. Sequence to sequence learning with neural networks NIPS.   Ilya Sutskever Oriol Vinyals and Quoc V Le. 2014. Sequence to sequence learning with neural networks NIPS."},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.510"},{"key":"e_1_3_2_1_30_1","volume-title":"Cider: Consensus-based image description evaluation CVPR.","author":"Vedantam Ramakrishna","year":"2015","unstructured":"Ramakrishna Vedantam , C Lawrence Zitnick , and Devi Parikh . 2015 . Cider: Consensus-based image description evaluation CVPR. Ramakrishna Vedantam, C Lawrence Zitnick, and Devi Parikh. 2015. Cider: Consensus-based image description evaluation CVPR."},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.515"},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"crossref","unstructured":"Subhashini Venugopalan Huijuan Xu Jeff Donahue Marcus Rohrbach Raymond Mooney and Kate Saenko. 2015 b. Translating Videos to Natural Language Using Deep Recurrent Neural Networks NAACL-HLT.  Subhashini Venugopalan Huijuan Xu Jeff Donahue Marcus Rohrbach Raymond Mooney and Kate Saenko. 2015 b. Translating Videos to Natural Language Using Deep Recurrent Neural Networks NAACL-HLT.","DOI":"10.3115\/v1\/N15-1173"},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2011.5995407"},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"crossref","unstructured":"Limin Wang Yuanjun Xiong Zhe Wang Yu Qiao Dahua Lin Xiaoou Tang and Luc Van Gool. 2016. Temporal segment networks: towards good practices for deep action recognition ECCV.  Limin Wang Yuanjun Xiong Zhe Wang Yu Qiao Dahua Lin Xiaoou Tang and Luc Van Gool. 2016. Temporal segment networks: towards good practices for deep action recognition ECCV.","DOI":"10.1007\/978-3-319-46484-8_2"},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"crossref","unstructured":"Jun Xu Tao Mei Ting Yao and Yong Rui. 2016. MSR-VTT: A Large Video Description Dataset for Bridging Video and Language CVPR.  Jun Xu Tao Mei Ting Yao and Yong Rui. 2016. MSR-VTT: A Large Video Description Dataset for Bridging Video and Language CVPR.","DOI":"10.1109\/CVPR.2016.571"},{"key":"e_1_3_2_1_36_1","unstructured":"Kelvin Xu Jimmy Ba Ryan Kiros Kyunghyun Cho Aaron Courville Ruslan Salakhudinov Rich Zemel and Yoshua Bengio. 2015. Show Attend and Tell: Neural Image Caption Generation with Visual Attention ICML.   Kelvin Xu Jimmy Ba Ryan Kiros Kyunghyun Cho Aaron Courville Ruslan Salakhudinov Rich Zemel and Yoshua Bengio. 2015. Show Attend and Tell: Neural Image Caption Generation with Visual Attention ICML."},{"key":"e_1_3_2_1_37_1","volume-title":"THUMOS Workshop","author":"Xu Zhongwen","year":"2015","unstructured":"Zhongwen Xu , Linchao Zhu , Yi Yang , and Alexander G Hauptmann . 2015 . Uts-cmu at thumos 2015 . THUMOS Workshop (2015). Zhongwen Xu, Linchao Zhu, Yi Yang, and Alexander G Hauptmann. 2015. Uts-cmu at thumos 2015. THUMOS Workshop (2015)."},{"key":"e_1_3_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.512"},{"key":"e_1_3_2_1_39_1","unstructured":"Quanzeng You Hailin Jin Zhaowen Wang Chen Fang and Jiebo Luo. 2016. Image captioning with semantic attention. In CVPR.  Quanzeng You Hailin Jin Zhaowen Wang Chen Fang and Jiebo Luo. 2016. Image captioning with semantic attention. In CVPR."},{"key":"e_1_3_2_1_40_1","doi-asserted-by":"crossref","unstructured":"Mihai Zanfir Elisabeta Marinoiu and Cristian Sminchisescu. 2016. Spatio-Temporal Attention Models for Grounded Video Captioning ACCV.  Mihai Zanfir Elisabeta Marinoiu and Cristian Sminchisescu. 2016. Spatio-Temporal Attention Models for Grounded Video Captioning ACCV.","DOI":"10.1007\/978-3-319-54190-7_7"},{"key":"e_1_3_2_1_41_1","unstructured":"Wojciech Zaremba and Ilya Sutskever. 2015. Learning to execute ICLR.  Wojciech Zaremba and Ilya Sutskever. 2015. Learning to execute ICLR."},{"key":"e_1_3_2_1_42_1","volume-title":"Recurrent neural network regularization. arXiv preprint arXiv:1409.2329","author":"Zaremba Wojciech","year":"2014","unstructured":"Wojciech Zaremba , Ilya Sutskever , and Oriol Vinyals . 2014. Recurrent neural network regularization. arXiv preprint arXiv:1409.2329 ( 2014 ). Wojciech Zaremba, Ilya Sutskever, and Oriol Vinyals. 2014. Recurrent neural network regularization. arXiv preprint arXiv:1409.2329 (2014)."}],"event":{"name":"MM '17: ACM Multimedia Conference","sponsor":["SIGMM ACM Special Interest Group on Multimedia"],"location":"Mountain View California USA","acronym":"MM '17"},"container-title":["Proceedings of the 25th ACM international conference on Multimedia"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3123266.3127898","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3123266.3127898","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T02:13:35Z","timestamp":1750212815000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3123266.3127898"}},"subtitle":["A Modal Attention Network for Describing Videos"],"short-title":[],"issued":{"date-parts":[[2017,10,23]]},"references-count":42,"alternative-id":["10.1145\/3123266.3127898","10.1145\/3123266"],"URL":"https:\/\/doi.org\/10.1145\/3123266.3127898","relation":{},"subject":[],"published":{"date-parts":[[2017,10,23]]},"assertion":[{"value":"2017-10-23","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}