{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,7]],"date-time":"2026-03-07T14:01:40Z","timestamp":1772892100153,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":48,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,10,17]],"date-time":"2021-10-17T00:00:00Z","timestamp":1634428800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"JSPS KAKENHI Grant","award":["JP21J20250"],"award-info":[{"award-number":["JP21J20250"]}]},{"name":"JSPS KAKENHI Grant","award":["JP20H04210"],"award-info":[{"award-number":["JP20H04210"]}]},{"name":"JSPS KAKENHI Grant","award":["JP21H04910"],"award-info":[{"award-number":["JP21H04910"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,10,17]]},"DOI":"10.1145\/3474085.3475322","type":"proceedings-article","created":{"date-parts":[[2021,10,18]],"date-time":"2021-10-18T05:04:15Z","timestamp":1634533455000},"page":"1766-1774","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":15,"title":["State-aware Video Procedural Captioning"],"prefix":"10.1145","author":[{"given":"Taichi","family":"Nishimura","sequence":"first","affiliation":[{"name":"Kyoto University, Kyoto, Japan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Atsushi","family":"Hashimoto","sequence":"additional","affiliation":[{"name":"OMRON SINIC X Corporation, Tokyo, Japan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yoshitaka","family":"Ushiku","sequence":"additional","affiliation":[{"name":"OMRON SINIC X Corporation, Tokyo, Japan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hirotaka","family":"Kameko","sequence":"additional","affiliation":[{"name":"Kyoto University, Kyoto, Japan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shinsuke","family":"Mori","sequence":"additional","affiliation":[{"name":"Kyoto University, Kyoto, Japan"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2021,10,17]]},"reference":[{"key":"e_1_3_2_2_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.495"},{"key":"e_1_3_2_2_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.234"},{"key":"e_1_3_2_2_3_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/K19-1041"},{"key":"e_1_3_2_2_4_1","volume-title":"Proc. ACL Workshop IEEMMTS. 65--72","author":"Banerjee Satanjeev","year":"2005","unstructured":"Satanjeev Banerjee and Alon Lavie . 2005 . METEOR: an automatic metric for MT evaluation with improved correlation with human judgments . In Proc. ACL Workshop IEEMMTS. 65--72 . Satanjeev Banerjee and Alon Lavie. 2005. METEOR: an automatic metric for MT evaluation with improved correlation with human judgments. In Proc. ACL Workshop IEEMMTS. 65--72."},{"key":"e_1_3_2_2_5_1","volume-title":"Proc. ICLR.","author":"Bosselut Antoine","year":"2018","unstructured":"Antoine Bosselut , Omer Levy , Ari Holtzman , Corin Ennis , Dieter Fox , and Yejin Choi . 2018 . Simulating action dynamics with neural process networks . In Proc. ICLR. Antoine Bosselut, Omer Levy, Ari Holtzman, Corin Ennis, Dieter Fox, and Yejin Choi. 2018. Simulating action dynamics with neural process networks. In Proc. ICLR."},{"key":"e_1_3_2_2_6_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58452-8_13"},{"key":"e_1_3_2_2_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/2964284.2964315"},{"key":"e_1_3_2_2_8_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1285"},{"key":"e_1_3_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N18-1144"},{"key":"e_1_3_2_2_10_1","volume-title":"Proc. NAACL. 4171--4186","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin , Ming-Wei Chang , Kenton Lee , and Kristina Toutanova . 2019 . BERT: pre-training of deep bidirectional transformers for language understanding . In Proc. NAACL. 4171--4186 . Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: pre-training of deep bidirectional transformers for language understanding. In Proc. NAACL. 4171--4186."},{"key":"e_1_3_2_2_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298878"},{"key":"e_1_3_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46487-9_47"},{"key":"e_1_3_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W19-1502"},{"key":"e_1_3_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.5555\/3045118.3045167"},{"key":"e_1_3_2_2_16_1","volume-title":"Proc. ICLR.","author":"Jang Eric","year":"2017","unstructured":"Eric Jang , Shixiang Gu , and Ben Poole . 2017 . Categorical reparametrization with gumble-softmax . In Proc. ICLR. Eric Jang, Shixiang Gu, and Ben Poole. 2017. Categorical reparametrization with gumble-softmax. In Proc. ICLR."},{"key":"e_1_3_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D15-1090"},{"key":"e_1_3_2_2_18_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D15-1114"},{"key":"e_1_3_2_2_19_1","volume-title":"Proc. ICLR. USA.","author":"Diederik","unstructured":"Diederik P. Kingma and Jimmy Ba. [n.d.]. Adam: A method for stochastic optimization . In Proc. ICLR. USA. Diederik P. Kingma and Jimmy Ba. [n.d.]. Adam: A method for stochastic optimization. In Proc. ICLR. USA."},{"key":"e_1_3_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.233"},{"key":"e_1_3_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.3115\/1218955.1219032"},{"key":"e_1_3_2_2_22_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W15-2206"},{"key":"e_1_3_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00990"},{"key":"e_1_3_2_2_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00272"},{"key":"e_1_3_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.5555\/2999792.2999959"},{"key":"e_1_3_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.5555\/1690219.1690287"},{"key":"e_1_3_2_2_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2020.3043452"},{"key":"e_1_3_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/3394171.3413765"},{"key":"e_1_3_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.3115\/1073083.1073135"},{"key":"e_1_3_2_2_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00676"},{"key":"e_1_3_2_2_31_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1162"},{"key":"e_1_3_2_2_32_1","unstructured":"Alec Radford Luke Metz and Soumith Chintala. 2015. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv.  Alec Radford Luke Metz and Soumith Chintala. 2015. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv."},{"key":"e_1_3_2_2_33_1","doi-asserted-by":"publisher","DOI":"10.5555\/2969239.2969250"},{"key":"e_1_3_2_2_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.327"},{"key":"e_1_3_2_2_35_1","doi-asserted-by":"publisher","DOI":"10.5555\/3327757.3327832"},{"key":"e_1_3_2_2_36_1","volume-title":"Manning","author":"Liu Peter J.","year":"2017","unstructured":"Abigail See, Peter J. Liu , and Christopher D . Manning . 2017 . Get to the point: summarization with pointer-generator networks. In Proc. ACL. 1073--1083. Abigail See, Peter J. Liu, and Christopher D. Manning. 2017. Get to the point: summarization with pointer-generator networks. In Proc. ACL. 1073--1083."},{"key":"e_1_3_2_2_37_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1641"},{"key":"e_1_3_2_2_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/3394171.3413498"},{"key":"e_1_3_2_2_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00756"},{"key":"e_1_3_2_2_40_1","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2020\/104"},{"key":"e_1_3_2_2_41_1","first-page":"2579","article-title":"Visualizing data using t-SNE","volume":"9","author":"van der Maaten Laurens","year":"2008","unstructured":"Laurens van der Maaten and Geoffrey Hinton . 2008 . Visualizing data using t-SNE . Journal of Machine Learning Research 9 (2008), 2579 -- 2605 . Laurens van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-SNE. Journal of Machine Learning Research 9 (2008), 2579--2605.","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_3_2_2_42_1","doi-asserted-by":"publisher","DOI":"10.5555\/3295222.3295349"},{"key":"e_1_3_2_2_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7299087"},{"key":"e_1_3_2_2_44_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01252-6_29"},{"key":"e_1_3_2_2_45_1","unstructured":"Nadav Zamir Asaf Noy Itamar Friedman Matan Protter and Lihi Zelnik-Manor. 2020. Asymmetric loss for multi-label classification. arXiv.  Nadav Zamir Asaf Noy Itamar Friedman Matan Protter and Lihi Zelnik-Manor. 2020. Asymmetric loss for multi-label classification. arXiv."},{"key":"e_1_3_2_2_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00674"},{"key":"e_1_3_2_2_47_1","volume-title":"Corso","author":"Zhou Luowei","year":"2018","unstructured":"Luowei Zhou , Chenliang Xu , and Jason J . Corso . 2018 . Towards automatic learning of procedures from web instructional videos. In Proc. AAAI. 7590--7598. Luowei Zhou, Chenliang Xu, and Jason J. Corso. 2018. Towards automatic learning of procedures from web instructional videos. In Proc. AAAI. 7590--7598."},{"key":"e_1_3_2_2_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00911"}],"event":{"name":"MM '21: ACM Multimedia Conference","location":"Virtual Event China","acronym":"MM '21","sponsor":["SIGMM ACM Special Interest Group on Multimedia"]},"container-title":["Proceedings of the 29th ACM International Conference on Multimedia"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3474085.3475322","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3474085.3475322","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:49:18Z","timestamp":1750193358000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3474085.3475322"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,10,17]]},"references-count":48,"alternative-id":["10.1145\/3474085.3475322","10.1145\/3474085"],"URL":"https:\/\/doi.org\/10.1145\/3474085.3475322","relation":{},"subject":[],"published":{"date-parts":[[2021,10,17]]},"assertion":[{"value":"2021-10-17","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}