{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,20]],"date-time":"2026-05-20T16:47:46Z","timestamp":1779295666671,"version":"3.51.4"},"reference-count":61,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2023,7,26]],"date-time":"2023-07-26T00:00:00Z","timestamp":1690329600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Graph."],"published-print":{"date-parts":[[2023,8]]},"abstract":"<jats:p>We present an approach to generate virtual activity snippets, which comprise sequenced keyframes of multi-character, multi-object interaction scenarios in 3D environments, by learning from recordings of human-scene interactions. The generation consists of two stages. First, we use a sequential deep graph generative model with a temporal module to iteratively generate keyframe descriptions, which represent abstract interactions using graphs, while preserving spatial-temporal relations through the activities. Second, we devise an optimization framework to instantiate the activity snippets in virtual 3D environments guided by the generated keyframe descriptions. Our approach optimizes the poses of character and object instances encoded by the graph nodes to satisfy the relations and constraints encoded by the graph edges. The instantiation process includes a coarse 2D optimization followed by a fine 3D optimization to effectively explore the complex solution space for placing and posing the instances. Through experiments and a perceptual study, we applied our approach to generate plausible activity snippets under different settings.<\/jats:p>","DOI":"10.1145\/3592096","type":"journal-article","created":{"date-parts":[[2023,7,26]],"date-time":"2023-07-26T15:47:45Z","timestamp":1690386465000},"page":"1-15","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":9,"title":["Generating Activity Snippets by Learning Human-Scene Interactions"],"prefix":"10.1145","volume":"42","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0074-3889","authenticated-orcid":false,"given":"Changyang","family":"Li","sequence":"first","affiliation":[{"name":"George Mason University, Fairfax, United States of America"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2656-5654","authenticated-orcid":false,"given":"Lap-Fai","family":"Yu","sequence":"additional","affiliation":[{"name":"George Mason University, Fairfax, United States of America"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2023,7,26]]},"reference":[{"key":"e_1_2_2_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/3372923.3404791"},{"key":"e_1_2_2_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/2897824.2925893"},{"key":"e_1_2_2_3_1","volume-title":"TEACH: Temporal Action Composition for 3D Humans. arXiv preprint arXiv:2209.04066","author":"Athanasiou Nikos","year":"2022","unstructured":"Nikos Athanasiou, Mathis Petrovich, Michael J Black, and G\u00fcl Varol. 2022. TEACH: Temporal Action Composition for 3D Humans. arXiv preprint arXiv:2209.04066 (2022)."},{"key":"e_1_2_2_4_1","first-page":"1","article-title":"Synthesis of concurrent object manipulation tasks","volume":"31","author":"Bai Yunfei","year":"2012","unstructured":"Yunfei Bai, Kristin Siu, and C Karen Liu. 2012. Synthesis of concurrent object manipulation tasks. ACM Transactions on Graphics 31, 6 (2012), 1--9.","journal-title":"ACM Transactions on Graphics"},{"key":"e_1_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298698"},{"key":"e_1_2_2_6_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58452-8_23"},{"key":"e_1_2_2_7_1","volume-title":"Designing Interactive Systems Conference","author":"Chidambaram Subramanian","year":"2021","unstructured":"Subramanian Chidambaram, Hank Huang, Fengming He, Xun Qian, Ana M Villanueva, Thomas S Redick, Wolfgang Stuerzlinger, and Karthik Ramani. 2021. Processar: An augmented reality-based tool to create in-situ procedural 2d\/3d ar instructions. In Designing Interactive Systems Conference 2021. 234--249."},{"key":"e_1_2_2_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/2366145.2366154"},{"key":"e_1_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/2816795.2818057"},{"key":"e_1_2_2_10_1","first-page":"1","article-title":"Adaptive synthesis of indoor scenes via activity-associated object relation graphs","volume":"36","author":"Fu Qiang","year":"2017","unstructured":"Qiang Fu, Xiaowu Chen, Xiaotian Wang, Sijia Wen, Bin Zhou, and Hongbo Fu. 2017. Adaptive synthesis of indoor scenes via activity-associated object relation graphs. ACM Transactions on Graphics (TOG) 36, 6 (2017), 1--13.","journal-title":"ACM Transactions on Graphics (TOG)"},{"key":"e_1_2_2_11_1","unstructured":"Ghost Town Games. 2016. Overcooked. https:\/\/store.steampowered.com\/app\/448510\/Overcooked\/."},{"key":"e_1_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00143"},{"key":"e_1_2_2_13_1","volume-title":"International conference on machine learning. PMLR, 1263--1272","author":"Gilmer Justin","year":"2017","unstructured":"Justin Gilmer, Samuel S Schoenholz, Patrick F Riley, Oriol Vinyals, and George E Dahl. 2017. Neural message passing for quantum chemistry. In International conference on machine learning. PMLR, 1263--1272."},{"key":"e_1_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/3394171.3413635"},{"key":"e_1_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2022.3214832"},{"key":"e_1_2_2_16_1","doi-asserted-by":"crossref","unstructured":"Mohamed Hassan Duygu Ceylan Ruben Villegas Jun Saito Jimei Yang Yi Zhou and Michael J Black. 2021a. Stochastic scene-aware motion prediction. In ICCV. 11374--11384.","DOI":"10.1109\/ICCV48922.2021.01118"},{"key":"e_1_2_2_17_1","doi-asserted-by":"crossref","unstructured":"Mohamed Hassan Partha Ghosh Joachim Tesch Dimitrios Tzionas and Michael J Black. 2021b. Populating 3D Scenes by Learning Human-Scene Interaction. In CVPR. 14708--14718.","DOI":"10.1109\/CVPR46437.2021.01447"},{"key":"e_1_2_2_18_1","doi-asserted-by":"crossref","unstructured":"W Keith Hastings. 1970. Monte Carlo sampling methods using Markov chains and their applications. (1970).","DOI":"10.2307\/2334940"},{"key":"e_1_2_2_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/3386569.3392391"},{"key":"e_1_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.cviu.2016.10.018"},{"key":"e_1_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/3491102.3517696"},{"key":"e_1_2_2_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01025"},{"key":"e_1_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58574-7_46"},{"key":"e_1_2_2_24_1","volume-title":"International conference on machine learning. PMLR, 2323--2332","author":"Jin Wengong","year":"2018","unstructured":"Wengong Jin, Regina Barzilay, and Tommi Jaakkola. 2018. Junction tree variational autoencoder for molecular graph generation. In International conference on machine learning. PMLR, 2323--2332."},{"key":"e_1_2_2_25_1","volume-title":"ASAP: Auto-generating Story-board And Previz with Virtual Humans. In 2021 IEEE International Symposium on Mixed and Augmented Reality Adjunct (ISMAR-Adjunct). IEEE, 316--320","author":"Kim Hanseob","year":"2021","unstructured":"Hanseob Kim, Ghazanfar Ali, and Jae-In Hwang. 2021. ASAP: Auto-generating Story-board And Previz with Virtual Humans. In 2021 IEEE International Symposium on Mixed and Augmented Reality Adjunct (ISMAR-Adjunct). IEEE, 316--320."},{"key":"e_1_2_2_26_1","first-page":"1","article-title":"Shape2pose: Human-centric shape analysis","volume":"33","author":"Kim Vladimir G","year":"2014","unstructured":"Vladimir G Kim, Siddhartha Chaudhuri, Leonidas Guibas, and Thomas Funkhouser. 2014. Shape2pose: Human-centric shape analysis. ACM Transactions on Graphics 33, 4 (2014), 1--12.","journal-title":"ACM Transactions on Graphics"},{"key":"e_1_2_2_27_1","volume-title":"Optimization by simulated annealing. Science 220, 4598","author":"Kirkpatrick Scott","year":"1983","unstructured":"Scott Kirkpatrick, C Daniel Gelatt, and Mario P Vecchi. 1983. Optimization by simulated annealing. Science 220, 4598 (1983), 671--680."},{"key":"e_1_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/1179352.1141972"},{"key":"e_1_2_2_29_1","volume-title":"Deep Compliant Control. In ACM SIGGRAPH 2022 Conference Proceedings. Article 23","author":"Lee Seunghwan","year":"2022","unstructured":"Seunghwan Lee, Phil Sik Chang, and Jehee Lee. 2022. Deep Compliant Control. In ACM SIGGRAPH 2022 Conference Proceedings. Article 23, 9 pages."},{"key":"e_1_2_2_30_1","volume-title":"The ava-kinetics localized human actions video dataset. arXiv preprint arXiv:2005.00214","author":"Li Ang","year":"2020","unstructured":"Ang Li, Meghana Thotakuri, David A Ross, Jo\u00e3o Carreira, Alexander Vostrikov, and Andrew Zisserman. 2020. The ava-kinetics localized human actions video dataset. arXiv preprint arXiv:2005.00214 (2020)."},{"key":"e_1_2_2_31_1","first-page":"1","article-title":"Interactive augmented reality storytelling guided by scene semantics","volume":"41","author":"Li Changyang","year":"2022","unstructured":"Changyang Li, Wanwan Li, Haikun Huang, and Lap-Fai Yu. 2022. Interactive augmented reality storytelling guided by scene semantics. ACM Transactions on Graphics (TOG) 41, 4 (2022), 1--15.","journal-title":"ACM Transactions on Graphics (TOG)"},{"key":"e_1_2_2_32_1","volume-title":"Learning deep generative models of graphs. arXiv preprint arXiv:1803.03324","author":"Li Yujia","year":"2018","unstructured":"Yujia Li, Oriol Vinyals, Chris Dyer, Razvan Pascanu, and Peter Battaglia. 2018. Learning deep generative models of graphs. arXiv preprint arXiv:1803.03324 (2018)."},{"key":"e_1_2_2_33_1","volume-title":"MOMA-LRG: Language-Refined Graphs for Multi-Object Multi-Actor Activity Parsing. In Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track.","author":"Luo Zelun","year":"2022","unstructured":"Zelun Luo, Zane Durante, Linden Li, Wanze Xie, Ruochen Liu, Emily Jin, Zhuoyi Huang, Lun Yu Li, Jiajun Wu, Juan Carlos Niebles, et al. 2022. MOMA-LRG: Language-Refined Graphs for Multi-Object Multi-Actor Activity Parsing. In Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track."},{"key":"e_1_2_2_34_1","first-page":"17939","article-title":"MOMA: Multi-Object Multi-Actor Activity Parsing","volume":"34","author":"Luo Zelun","year":"2021","unstructured":"Zelun Luo, Wanze Xie, Siddharth Kapoor, Yiyun Liang, Michael Cooper, Juan Carlos Niebles, Ehsan Adeli, and Fei-Fei Li. 2021. MOMA: Multi-Object Multi-Actor Activity Parsing. Advances in Neural Information Processing Systems 34 (2021), 17939--17955.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_2_2_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/2980179.2980223"},{"key":"e_1_2_2_36_1","volume-title":"Interactive furniture layout using interior design guidelines. ACM transactions on graphics (TOG) 30, 4","author":"Merrell Paul","year":"2011","unstructured":"Paul Merrell, Eric Schkufza, Zeyang Li, Maneesh Agrawala, and Vladlen Koltun. 2011. Interactive furniture layout using interior design guidelines. ACM transactions on graphics (TOG) 30, 4 (2011), 1--10."},{"key":"e_1_2_2_37_1","doi-asserted-by":"publisher","DOI":"10.1063\/1.1699114"},{"key":"e_1_2_2_38_1","unstructured":"Nawmal. 2019. Nawmal. https:\/\/www.nawmal.com\/"},{"key":"e_1_2_2_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.01080"},{"key":"e_1_2_2_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/3072959.3083725"},{"key":"e_1_2_2_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/2661229.2661230"},{"key":"e_1_2_2_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/2897824.2925867"},{"key":"e_1_2_2_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00269"},{"key":"e_1_2_2_44_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01418-6_41"},{"key":"e_1_2_2_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/3355089.3356505"},{"key":"e_1_2_2_46_1","first-page":"1","article-title":"WallPlan: synthesizing floorplans by learning to generate wall graphs","volume":"41","author":"Sun Jiahui","year":"2022","unstructured":"Jiahui Sun, Wenming Wu, Ligang Liu, Wenjie Min, Gaofeng Zhang, and Liping Zheng. 2022. WallPlan: synthesizing floorplans by learning to generate wall graphs. ACM Transactions on Graphics (TOG) 41, 4 (2022), 1--14.","journal-title":"ACM Transactions on Graphics (TOG)"},{"key":"e_1_2_2_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISMAR-Adjunct51615.2020.00072"},{"key":"e_1_2_2_48_1","volume-title":"Attention is all you need. Advances in neural information processing systems 30","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Advances in neural information processing systems 30 (2017)."},{"key":"e_1_2_2_49_1","volume-title":"Computer Graphics Forum","author":"Wang He","unstructured":"He Wang, S\u00f6ren Pirk, Ersin Yumer, Vladimir G Kim, Ozan Sener, Srinath Sridhar, and Leonidas J Guibas. 2019b. Learning a Generative Model for Multi-Step Human-Object Interactions from Videos. In Computer Graphics Forum, Vol. 38. Wiley Online Library, 367--378."},{"key":"e_1_2_2_50_1","doi-asserted-by":"crossref","unstructured":"Jiashun Wang Huazhe Xu Jingwei Xu Sifei Liu and Xiaolong Wang. 2021a. Synthesizing long-term 3d human motion and interaction in 3d scenes. In CVPR. 9401--9411.","DOI":"10.1109\/CVPR46437.2021.00928"},{"key":"e_1_2_2_51_1","doi-asserted-by":"crossref","unstructured":"Jingbo Wang Sijie Yan Bo Dai and Dahua Lin. 2021b. Scene-aware generative network for human motion synthesis. In CVPR. 12206--12215.","DOI":"10.1109\/CVPR46437.2021.01203"},{"key":"e_1_2_2_52_1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3306346.3322941","article-title":"Planit: Planning and instantiating indoor scenes with relation graph and spatial prior networks","volume":"38","author":"Wang Kai","year":"2019","unstructured":"Kai Wang, Yu-An Lin, Ben Weissmann, Manolis Savva, Angel X Chang, and Daniel Ritchie. 2019a. Planit: Planning and instantiating indoor scenes with relation graph and spatial prior networks. ACM Transactions on Graphics (TOG) 38, 4 (2019), 1--15.","journal-title":"ACM Transactions on Graphics (TOG)"},{"key":"e_1_2_2_53_1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3197517.3201362","article-title":"Deep convolutional priors for indoor scene synthesis","volume":"37","author":"Wang Kai","year":"2018","unstructured":"Kai Wang, Manolis Savva, Angel X Chang, and Daniel Ritchie. 2018. Deep convolutional priors for indoor scene synthesis. ACM Transactions on Graphics (TOG) 37, 4 (2018), 1--14.","journal-title":"ACM Transactions on Graphics (TOG)"},{"key":"e_1_2_2_54_1","volume-title":"Humanise: Language-conditioned human motion generation in 3d scenes. arXiv preprint arXiv:2210.09729","author":"Wang Zan","year":"2022","unstructured":"Zan Wang, Yixin Chen, Tengyu Liu, Yixin Zhu, Wei Liang, and Siyuan Huang. 2022. Humanise: Language-conditioned human motion generation in 3d scenes. arXiv preprint arXiv:2210.09729 (2022)."},{"key":"e_1_2_2_55_1","doi-asserted-by":"publisher","DOI":"10.1145\/3550469.3555426"},{"key":"e_1_2_2_56_1","volume-title":"International conference on machine learning. PMLR, 5708--5717","author":"You Jiaxuan","year":"2018","unstructured":"Jiaxuan You, Rex Ying, Xiang Ren, William Hamilton, and Jure Leskovec. 2018. Graphrnn: Generating realistic graphs with deep auto-regressive models. In International conference on machine learning. PMLR, 5708--5717."},{"key":"e_1_2_2_57_1","doi-asserted-by":"publisher","DOI":"10.1145\/2010324.1964981"},{"key":"e_1_2_2_58_1","volume-title":"Computer Graphics Forum","author":"Zhang Jia-Qi","unstructured":"Jia-Qi Zhang, Xiang Xu, Zhi-Meng Shen, Ze-Huan Huang, Yang Zhao, Yan-Pei Cao, Pengfei Wan, and Miao Wang. 2021. Write-An-Animation: High-level Text-based Animation Editing with Character-Scene Interaction. In Computer Graphics Forum, Vol. 40. Wiley Online Library, 217--228."},{"key":"e_1_2_2_59_1","doi-asserted-by":"publisher","DOI":"10.1109\/3DV50981.2020.00074"},{"key":"e_1_2_2_60_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00623"},{"key":"e_1_2_2_61_1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3381866","article-title":"Deep generative modeling for scene synthesis via hybrid representations","volume":"39","author":"Zhang Zaiwei","year":"2020","unstructured":"Zaiwei Zhang, Zhenpei Yang, Chongyang Ma, Linjie Luo, Alexander Huth, Etienne Vouga, and Qixing Huang. 2020b. Deep generative modeling for scene synthesis via hybrid representations. ACM Transactions on Graphics (TOG) 39, 2 (2020), 1--21.","journal-title":"ACM Transactions on Graphics (TOG)"}],"container-title":["ACM Transactions on Graphics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3592096","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3592096","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:37:45Z","timestamp":1750178265000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3592096"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,7,26]]},"references-count":61,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2023,8]]}},"alternative-id":["10.1145\/3592096"],"URL":"https:\/\/doi.org\/10.1145\/3592096","relation":{},"ISSN":["0730-0301","1557-7368"],"issn-type":[{"value":"0730-0301","type":"print"},{"value":"1557-7368","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,7,26]]},"assertion":[{"value":"2023-07-26","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}