{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,13]],"date-time":"2026-04-13T21:13:05Z","timestamp":1776114785610,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":54,"publisher":"ACM","license":[{"start":{"date-parts":[[2025,3,24]],"date-time":"2025-03-24T00:00:00Z","timestamp":1742774400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2025,3,24]]},"DOI":"10.1145\/3708359.3712144","type":"proceedings-article","created":{"date-parts":[[2025,3,19]],"date-time":"2025-03-19T12:50:34Z","timestamp":1742388634000},"page":"1564-1580","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":7,"title":["VideoMix: Aggregating How-To Videos for Task-Oriented Learning"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1776-4712","authenticated-orcid":false,"given":"Saelyne","family":"Yang","sequence":"first","affiliation":[{"name":"School of Computing, KAIST, Daejeon, Republic of Korea,"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-5409-7287","authenticated-orcid":false,"given":"Anh","family":"Truong","sequence":"additional","affiliation":[{"name":"Adobe Research, New York, New York, USA,"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6348-4127","authenticated-orcid":false,"given":"Juho","family":"Kim","sequence":"additional","affiliation":[{"name":"School of Computing, KAIST, Daejeon, Republic of Korea,"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4222-8105","authenticated-orcid":false,"given":"Dingzeyu","family":"Li","sequence":"additional","affiliation":[{"name":"Adobe Research, Seattle, Washington, 
USA,"}]}],"member":"320","published-online":{"date-parts":[[2025,3,24]]},"reference":[{"key":"e_1_3_3_3_2_2","volume-title":"Thirty-seventh Conference on Neural Information Processing Systems Datasets and Benchmarks Track","author":"Afouras Triantafyllos","year":"2023","unstructured":"Triantafyllos Afouras, Effrosyni Mavroudi, Tushar Nagarajan, Huiyu Wang, and Lorenzo Torresani. 2023. HT-Step: Aligning Instructional Articles with How-To Videos. In Thirty-seventh Conference on Neural Information Processing Systems Datasets and Benchmarks Track."},{"key":"e_1_3_3_3_3_2","volume-title":"A Taxonomy for Learning, Teaching, and Assessing: A Revision of Bloom\u2019s Taxonomy of Educational Objectives, Complete Edition","author":"Anderson Lorin\u00a0W.","year":"2001","unstructured":"Lorin\u00a0W. Anderson, David\u00a0R. Krathwohl, Peter\u00a0W. Airasian, Kathleen\u00a0A. Cruikshank, Richard\u00a0E. Mayer, Paul\u00a0R. Pintrich, James Raths, and Merlin\u00a0C. Wittrock. 2001. A Taxonomy for Learning, Teaching, and Assessing: A Revision of Bloom\u2019s Taxonomy of Educational Objectives, Complete Edition. Longman."},{"key":"e_1_3_3_3_4_2","doi-asserted-by":"crossref","unstructured":"Kumar Ashutosh Zihui Xue Tushar Nagarajan and Kristen Grauman. 2024. Detours for Navigating Instructional Videos. arxiv:https:\/\/arXiv.org\/abs\/2401.01823\u00a0[cs.CV]","DOI":"10.1109\/CVPR52733.2024.01779"},{"key":"e_1_3_3_3_5_2","doi-asserted-by":"crossref","unstructured":"Max Bain Jaesung Huh Tengda Han and Andrew Zisserman. 2023. WhisperX: Time-Accurate Speech Transcription of Long-Form Audio. 
INTERSPEECH 2023 (2023).","DOI":"10.21437\/Interspeech.2023-78"},{"key":"e_1_3_3_3_6_2","doi-asserted-by":"publisher","DOI":"10.1145\/3173574.3174025"},{"key":"e_1_3_3_3_7_2","doi-asserted-by":"publisher","DOI":"10.1145\/3411764.3445131"},{"key":"e_1_3_3_3_8_2","series-title":"(GI 2020)","first-page":"114 \u2013 124","volume-title":"Proceedings of Graphics Interface 2020","author":"Chang Minsuk","year":"2020","unstructured":"Minsuk Chang, Ben Lafreniere, Juho Kim, George Fitzmaurice, and Tovi Grossman. 2020. Workflow Graphs: A Computational Model of Collective Task Strategies for 3D Design Software. In Proceedings of Graphics Interface 2020 (University of Toronto) (GI 2020). Canadian Human-Computer Communications Society \/ Soci\u00e9t\u00e9 canadienne du dialogue humain-machine, 114 \u2013 124. https:\/\/doi.org\/10.20380\/GI2020.13"},{"key":"e_1_3_3_3_9_2","doi-asserted-by":"publisher","DOI":"10.1145\/3290605.3300931"},{"key":"e_1_3_3_3_10_2","doi-asserted-by":"publisher","DOI":"10.1145\/3613904.3642443"},{"key":"e_1_3_3_3_11_2","doi-asserted-by":"publisher","DOI":"10.1145\/3472749.3474778"},{"key":"e_1_3_3_3_12_2","doi-asserted-by":"publisher","DOI":"10.1145\/2380116.2380130"},{"key":"e_1_3_3_3_13_2","doi-asserted-by":"crossref","unstructured":"Bogeum Choi Sarah Casteel Jaime Arguello and Robert Capra. 2023. Better Understanding Procedural Search Tasks: Perceptions Behaviors and Challenges. ACM Trans. Inf. Syst. 42 3 Article 65 (Dec. 2023) 32\u00a0pages. 
https:\/\/doi.org\/10.1145\/3630004","DOI":"10.1145\/3630004"},{"key":"e_1_3_3_3_14_2","doi-asserted-by":"publisher","DOI":"10.1145\/3613904.3641969"},{"key":"e_1_3_3_3_15_2","doi-asserted-by":"publisher","DOI":"10.1145\/3313831.3376437"},{"key":"e_1_3_3_3_16_2","doi-asserted-by":"publisher","DOI":"10.1145\/3379337.3415592"},{"key":"e_1_3_3_3_17_2","doi-asserted-by":"publisher","DOI":"10.1145\/3290605.3300527"},{"key":"e_1_3_3_3_18_2","doi-asserted-by":"publisher","DOI":"10.1145\/3491102.3502052"},{"key":"e_1_3_3_3_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.00292"},{"key":"e_1_3_3_3_20_2","doi-asserted-by":"crossref","unstructured":"Sandra\u00a0G Hart and Lowell\u00a0E Staveland. 1988. Development of NASA-TLX (Task Load Index): Results of empirical and theoretical research. Human mental workload 1 3 (1988) 139\u2013183.","DOI":"10.1016\/S0166-4115(08)62386-9"},{"key":"e_1_3_3_3_21_2","doi-asserted-by":"publisher","DOI":"10.1145\/3544548.3580772"},{"key":"e_1_3_3_3_22_2","doi-asserted-by":"publisher","DOI":"10.1145\/2556288.2556986"},{"key":"e_1_3_3_3_23_2","doi-asserted-by":"publisher","DOI":"10.1145\/2207676.2208549"},{"key":"e_1_3_3_3_24_2","doi-asserted-by":"publisher","DOI":"10.1145\/2470654.2466235"},{"key":"e_1_3_3_3_25_2","doi-asserted-by":"publisher","DOI":"10.1145\/3635636.3656192"},{"key":"e_1_3_3_3_26_2","doi-asserted-by":"publisher","DOI":"10.1145\/3544548.3581006"},{"key":"e_1_3_3_3_27_2","volume-title":"The British journal of educational psychology","author":"List Alexandra","year":"2021","unstructured":"Alexandra List, Gala\u00a0S Campos\u00a0Oaxaca, Eunseo Lee, Hongcui Du, and Hye\u00a0Yeon Lee. 2021. Examining perceptions, selections, and products in undergraduates\u2019 learning from multiple resources. 
In The British journal of educational psychology."},{"key":"e_1_3_3_3_28_2","doi-asserted-by":"publisher","DOI":"10.1145\/2642918.2647366"},{"key":"e_1_3_3_3_29_2","first-page":"353","volume-title":"Learning from multiple texts","author":"McCrudden Matthew T.","year":"2022","unstructured":"Matthew T. McCrudden, Ivar Br\u00e5ten, and Ladislao Salmer\u00f3n. 2022. Learning from multiple texts. Elsevier, Netherlands, 353\u2013363. https:\/\/doi.org\/10.1016\/B978-0-12-818630-5.14046-1"},{"key":"e_1_3_3_3_30_2","volume-title":"Instructional technology: foundations","author":"Merrill Paul\u00a0F","year":"1987","unstructured":"Paul\u00a0F Merrill. 1987. Job and Task Analysis. In Instructional technology: foundations."},{"key":"e_1_3_3_3_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00272"},{"key":"e_1_3_3_3_32_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.01773"},{"key":"e_1_3_3_3_33_2","series-title":"(GI\u201919)","volume-title":"Proceedings of the 45th Graphics Interface Conference on Proceedings of Graphics Interface 2019","author":"Nawhal Megha","year":"2019","unstructured":"Megha Nawhal, Jacqueline\u00a0B. Lang, Greg Mori, and Parmit\u00a0K. Chilana. 2019. VideoWhiz: Non-Linear Interactive Overviews for Recipe Videos. In Proceedings of the 45th Graphics Interface Conference on Proceedings of Graphics Interface 2019 (Kingston, Canada) (GI\u201919). Canadian Human-Computer Communications Society, Waterloo, CAN, Article 15, 8\u00a0pages. https:\/\/doi.org\/10.20380\/GI2019.15"},{"key":"e_1_3_3_3_34_2","unstructured":"OpenAI. 2024. Function Calling. https:\/\/platform.openai.com\/docs\/guides\/function-calling. Accessed: Oct 9 2024."},{"key":"e_1_3_3_3_35_2","unstructured":"OpenAI. 2024. GPT-4o-2024-05-13. https:\/\/platform.openai.com\/docs\/models\/gpt-4. 
Accessed: Oct 9 2024."},{"key":"e_1_3_3_3_36_2","doi-asserted-by":"publisher","DOI":"10.1145\/2807442.2807502"},{"key":"e_1_3_3_3_37_2","doi-asserted-by":"publisher","DOI":"10.1145\/2642918.2647400"},{"key":"e_1_3_3_3_38_2","doi-asserted-by":"publisher","DOI":"10.1145\/3643834.3661511"},{"key":"e_1_3_3_3_39_2","doi-asserted-by":"publisher","DOI":"10.1145\/2047196.2047213"},{"key":"e_1_3_3_3_40_2","unstructured":"Prolific. 2024. Prolific. https:\/\/www.prolific.co\/. Accessed: Oct 9 2024."},{"key":"e_1_3_3_3_41_2","doi-asserted-by":"publisher","DOI":"10.1145\/3498366.3505816"},{"key":"e_1_3_3_3_42_2","doi-asserted-by":"publisher","DOI":"10.1145\/3411764.3445721"},{"key":"e_1_3_3_3_43_2","doi-asserted-by":"publisher","DOI":"10.1109\/WACV56688.2023.00230"},{"key":"e_1_3_3_3_44_2","doi-asserted-by":"publisher","DOI":"10.1145\/3173574.3173859"},{"key":"e_1_3_3_3_45_2","unstructured":"WikiHow. 2024. WikiHow. https:\/\/www.wikihow.com\/. Accessed: Oct 9 2024."},{"key":"e_1_3_3_3_46_2","doi-asserted-by":"publisher","DOI":"10.1145\/3672539.3686711"},{"key":"e_1_3_3_3_47_2","first-page":"531","volume-title":"Proceedings of HCI Korea 2020","author":"Yang Saelyne","year":"2020","unstructured":"Saelyne Yang and Juho Kim. 2020. What Makes It Hard for Users to Follow Software Tutorial Videos?. In Proceedings of HCI Korea 2020. The HCI Society of KOREA, South Korea, 531\u2013536."},{"key":"e_1_3_3_3_48_2","doi-asserted-by":"publisher","DOI":"10.1145\/3544548.3581126"},{"key":"e_1_3_3_3_49_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v38i17.29906"},{"key":"e_1_3_3_3_50_2","doi-asserted-by":"publisher","DOI":"10.1145\/3490099.3511106"},{"key":"e_1_3_3_3_51_2","volume-title":"youtube-dl","author":"dl youtube","year":"2022","unstructured":"youtube dl. 2022 (accessed Sep 14, 2022). youtube-dl. 
https:\/\/www.npmjs.com\/package\/youtube-dl"},{"key":"e_1_3_3_3_52_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.02208"},{"key":"e_1_3_3_3_53_2","doi-asserted-by":"crossref","unstructured":"Lei Zhang Qian-Kun Xu Lei-Zheng Nie and Hua Huang. 2014. VideoGraph: a non-linear video representation for efficient exploration. Vis. Comput. 30 10 (Oct. 2014) 1123\u20131132. https:\/\/doi.org\/10.1007\/s00371-013-0882-5","DOI":"10.1007\/s00371-013-0882-5"},{"key":"e_1_3_3_3_54_2","series-title":"(CHI \u201922)","volume-title":"Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems","author":"Zhao Yaxi","year":"2022","unstructured":"Yaxi Zhao, Razan Jaber, Donald McMillan, and Cosmin Munteanu. 2022. \u201cRewind to the Jiggling Meat Part\u201d: Understanding Voice Control of Instructional Videos in Everyday Tasks. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems (New Orleans, LA, USA) (CHI \u201922). Association for Computing Machinery, New York, NY, USA, Article 58, 11\u00a0pages. https:\/\/doi.org\/10.1145\/3491102.3502036"},{"key":"e_1_3_3_3_55_2","doi-asserted-by":"crossref","unstructured":"Dimitri Zhukov Jean-Baptiste Alayrac Ramazan\u00a0Gokberk Cinbis David Fouhey Ivan Laptev and Josef Sivic. 2019. 
Cross-task weakly supervised learning from instructional videos.","DOI":"10.1109\/CVPR.2019.00365"}],"event":{"name":"IUI '25: 30th International Conference on Intelligent User Interfaces","location":"Cagliari Italy","acronym":"IUI '25","sponsor":["SIGAI ACM Special Interest Group on Artificial Intelligence","SIGCHI ACM Special Interest Group on Computer-Human Interaction"]},"container-title":["Proceedings of the 30th International Conference on Intelligent User Interfaces"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3708359.3712144","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3708359.3712144","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:57:06Z","timestamp":1750298226000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3708359.3712144"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,3,24]]},"references-count":54,"alternative-id":["10.1145\/3708359.3712144","10.1145\/3708359"],"URL":"https:\/\/doi.org\/10.1145\/3708359.3712144","relation":{},"subject":[],"published":{"date-parts":[[2025,3,24]]},"assertion":[{"value":"2025-03-24","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}