{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:28:49Z","timestamp":1750220929608,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":24,"publisher":"ACM","license":[{"start":{"date-parts":[[2019,6,5]],"date-time":"2019-06-05T00:00:00Z","timestamp":1559692800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"JSPS Grants-in-Aid for Scientific Research","award":["18K11425"],"award-info":[{"award-number":["18K11425"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2019,6,5]]},"DOI":"10.1145\/3326458.3326928","type":"proceedings-article","created":{"date-parts":[[2019,6,7]],"date-time":"2019-06-07T21:02:18Z","timestamp":1559941338000},"page":"9-16","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Frame Selection for Producing Recipe with Pictures from an Execution Video of a Recipe"],"prefix":"10.1145","author":[{"given":"Taichi","family":"Nishimura","sequence":"first","affiliation":[{"name":"Kyoto University, Kyoto, Japan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Atsushi","family":"Hashimoto","sequence":"additional","affiliation":[{"name":"Omron Sinic X Corporation, Tokyo, Japan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yoko","family":"Yamakata","sequence":"additional","affiliation":[{"name":"University of Tokyo, Tokyo, Japan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shinsuke","family":"Mori","sequence":"additional","affiliation":[{"name":"Kyoto University, Kyoto, Japan"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2019,6,5]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.507"},{"key":"e_1_3_2_1_2_1","volume-title":"GILT: Generating images from long text. arXiv preprint arXiv:1901.02404","author":"El Ori Bar","year":"2019","unstructured":"Ori Bar El , Ori Licht , and Netanel Yosephian . 2019 . GILT: Generating images from long text. arXiv preprint arXiv:1901.02404 (2019). Ori Bar El, Ori Licht, and Netanel Yosephian. 2019. GILT: Generating images from long text. arXiv preprint arXiv:1901.02404 (2019)."},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/2964284.2967242"},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/3077136.3080686"},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICMEW.2016.7574771"},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/2638728.2641338"},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W15-2206"},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/N15-1015"},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/W14-2407"},{"volume-title":"Proceedings of the International Conference on Language Resources and Evaluation","year":"2014","key":"e_1_3_2_1_11_1","unstructured":"ShinsukeMori,HirokuniMaeta,YokoYamakata,andTetsuroSasada.2014. Flow graph corpus from recipe texts . In Proceedings of the International Conference on Language Resources and Evaluation ( 2014 ), 2370--2377. ShinsukeMori,HirokuniMaeta,YokoYamakata,andTetsuroSasada.2014.Flow graph corpus from recipe texts. In Proceedings of the International Conference on Language Resources and Evaluation (2014), 2370--2377."},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/2638728.2641328"},{"key":"e_1_3_2_1_13_1","volume-title":"Grounding action descriptions in videos. Transactions of the Association for Computational Linguistics","author":"Regneri Michaela","year":"2013","unstructured":"Michaela Regneri , Marcus Rohrbach , Dominikus Wetzel , Stefan Thater , Bernt Schiele , and Manfred Pinkal . 2013. Grounding action descriptions in videos. Transactions of the Association for Computational Linguistics ( 2013 ), 25--36. Michaela Regneri, Marcus Rohrbach, Dominikus Wetzel, Stefan Thater, Bernt Schiele, and Manfred Pinkal. 2013. Grounding action descriptions in videos. Transactions of the Association for Computational Linguistics (2013), 25--36."},{"key":"e_1_3_2_1_14_1","volume-title":"Advances in Neural Information Processing Systems","author":"Ren Shaoqing","year":"2015","unstructured":"Shaoqing Ren , Kaiming He , Ross Girshick , and Jian Sun . 2015. Faster r-cnn: Towards real-time object detection with region proposal networks . In Advances in Neural Information Processing Systems ( 2015 ), 91--99. Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. 2015. Faster r-cnn: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems (2015), 91--99."},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-11752-2_15"},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.5555\/2354409.2354909"},{"key":"e_1_3_2_1_17_1","volume-title":"Proceedings of the Conference on Computer Vision and Pattern Recognition","author":"Salvador Amaia","year":"2017","unstructured":"Amaia Salvador , Nicholas Hynes , Yusuf Aytar , Javier Marin , Ferda Ofli , Ingmar Weber , and Antonio Torralba . 2017 . Learning cross-Modal embeddings for cook- ing recipes and food images . In Proceedings of the Conference on Computer Vision and Pattern Recognition (2017), 3020--3028. Amaia Salvador, Nicholas Hynes, Yusuf Aytar, Javier Marin, Ferda Ofli, Ingmar Weber, and Antonio Torralba. 2017. Learning cross-Modal embeddings for cook- ing recipes and food images. In Proceedings of the Conference on Computer Vision and Pattern Recognition (2017), 3020--3028."},{"key":"e_1_3_2_1_18_1","volume-title":"Proceedings of the Conference of the Pacific Association for Computational Linguistics","author":"Sasada Tetsuro","year":"2015","unstructured":"Tetsuro Sasada , Shinsuke Mori , Tatsuya Kawahara , and Yoko Yamakata . 2015 . Named entity recognizer trainable from partially annotated data . In Proceedings of the Conference of the Pacific Association for Computational Linguistics (2015), 148--160. Tetsuro Sasada, Shinsuke Mori, Tatsuya Kawahara, and Yoko Yamakata. 2015. Named entity recognizer trainable from partially annotated data. In Proceedings of the Conference of the Pacific Association for Computational Linguistics (2015), 148--160."},{"key":"e_1_3_2_1_19_1","volume-title":"Proceedings of the International Joint Conference on Natural Language Processing","author":"Ushiku Atsushi","year":"2017","unstructured":"Atsushi Ushiku , Hayato Hashimoto , Atsushi Hashimoto , and Shinsuke Mori . 2017 . Procedural text generation from an execution video . In Proceedings of the International Joint Conference on Natural Language Processing (2017), 326--335. Atsushi Ushiku, Hayato Hashimoto, Atsushi Hashimoto, and Shinsuke Mori. 2017. Procedural text generation from an execution video. In Proceedings of the International Joint Conference on Natural Language Processing (2017), 326--335."},{"volume-title":"Proceedings of the Conference on Computer Vision and Pattern Recognition","year":"2015","key":"e_1_3_2_1_20_1","unstructured":"OriolVinyals,AlexanderToshev,SamyBengio,andDumitruErhan.2015. Show and tell: A neural image caption generator . In Proceedings of the Conference on Computer Vision and Pattern Recognition ( 2015 ), 3156--3164. OriolVinyals,AlexanderToshev,SamyBengio,andDumitruErhan.2015.Show and tell: A neural image caption generator. In Proceedings of the Conference on Computer Vision and Pattern Recognition (2015), 3156--3164."},{"key":"e_1_3_2_1_21_1","volume-title":"Proceedings of the Conference on Computer Vision and Pattern Recognition","author":"Wang Liwei","year":"2016","unstructured":"Liwei Wang , Yin Li , Jing Huang , and Svetlana Lazebnik . 2016 . Learning two- branch neural networks for image-Text matching tasks . In Proceedings of the Conference on Computer Vision and Pattern Recognition (2016), 5005--5013. Liwei Wang, Yin Li, Jing Huang, and Svetlana Lazebnik. 2016. Learning two- branch neural networks for image-Text matching tasks. In Proceedings of the Conference on Computer Vision and Pattern Recognition (2016), 5005--5013."},{"key":"e_1_3_2_1_22_1","volume-title":"Realistic image synthesis with stacked generative adversarial networks. arXiv preprint arXiv:1710.10916","author":"Zhang Han","year":"2017","unstructured":"Han Zhang , Tao Xu , Hongsheng Li , Shaoting Zhang , Xiaogang Wang , Xiaolei Huang , and Dimitris Metaxas . 2017. Stackgan++ : Realistic image synthesis with stacked generative adversarial networks. arXiv preprint arXiv:1710.10916 ( 2017 ). Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang, and Dimitris Metaxas. 2017. Stackgan++: Realistic image synthesis with stacked generative adversarial networks. arXiv preprint arXiv:1710.10916 (2017)."},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01246-5_42"},{"key":"e_1_3_2_1_24_1","volume":"201","author":"Zhou Luowei","unstructured":"Luowei Zhou , Chenliang Xu , and Jason J. Corso. 201 8. Towards automatic learn- ing of procedures from web instructional videos. In Proceedings of the Advance- ment of Artificial Intelligence (2018), 7590--7598. Luowei Zhou, Chenliang Xu, and Jason J. Corso. 2018. Towards automatic learn- ing of procedures from web instructional videos. In Proceedings of the Advance- ment of Artificial Intelligence (2018), 7590--7598.","journal-title":"Jason J. Corso."}],"event":{"name":"ICMR '19: International Conference on Multimedia Retrieval","sponsor":["SIGMM ACM Special Interest Group on Multimedia"],"location":"Ottawa ON Canada","acronym":"ICMR '19"},"container-title":["Proceedings of the 11th Workshop on Multimedia for Cooking and Eating Activities"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3326458.3326928","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3326458.3326928","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T23:53:18Z","timestamp":1750204398000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3326458.3326928"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,6,5]]},"references-count":24,"alternative-id":["10.1145\/3326458.3326928","10.1145\/3326458"],"URL":"https:\/\/doi.org\/10.1145\/3326458.3326928","relation":{},"subject":[],"published":{"date-parts":[[2019,6,5]]},"assertion":[{"value":"2019-06-05","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}