{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,2]],"date-time":"2026-06-02T09:27:31Z","timestamp":1780392451191,"version":"3.54.1"},"reference-count":46,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2023,8,16]],"date-time":"2023-08-16T00:00:00Z","timestamp":1692144000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Comput. Graph. Interact. Tech."],"published-print":{"date-parts":[[2023,8,16]]},"abstract":"<jats:p>Humans perform everyday tasks using a combination of locomotion and manipulation skills. Building a system that can handle both skills is essential to creating virtual humans. We present a physically-simulated human capable of solving box rearrangement tasks, which requires a combination of both skills. We propose a hierarchical control architecture, where each level solves the task at a different level of abstraction, and the result is a physics-based simulated virtual human capable of rearranging boxes in a cluttered environment. The control architecture integrates a planner, diffusion models, and physics-based motion imitation of sparse motion clips using deep reinforcement learning. Boxes can vary in size, weight, shape, and placement height. Code and trained control policies are provided.<\/jats:p>","DOI":"10.1145\/3606931","type":"journal-article","created":{"date-parts":[[2023,8,24]],"date-time":"2023-08-24T10:05:30Z","timestamp":1692871530000},"page":"1-18","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":25,"title":["Hierarchical Planning and Control for Box Loco-Manipulation"],"prefix":"10.1145","volume":"6","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1948-1767","authenticated-orcid":false,"given":"Zhaoming","family":"Xie","sequence":"first","affiliation":[{"name":"Stanford University, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8535-8324","authenticated-orcid":false,"given":"Jonathan","family":"Tseng","sequence":"additional","affiliation":[{"name":"Stanford University, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4519-4326","authenticated-orcid":false,"given":"Sebastian","family":"Starke","sequence":"additional","affiliation":[{"name":"Meta, United Kingdom"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9123-3672","authenticated-orcid":false,"given":"Michiel","family":"van de Panne","sequence":"additional","affiliation":[{"name":"University of British Columbia, Canada"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5926-0905","authenticated-orcid":false,"given":"C. Karen","family":"Liu","sequence":"additional","affiliation":[{"name":"Stanford University, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2023,8,24]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/2897824.2925893"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2008.4651195"},{"key":"e_1_2_1_3_1","first-page":"1","article-title":"Synthesis of concurrent object manipulation tasks","volume":"31","author":"Bai Yunfei","year":"2012","unstructured":"Yunfei Bai, Kristin Siu, and C Karen Liu. 2012. Synthesis of concurrent object manipulation tasks. ACM Transactions on Graphics (TOG) 31, 6 (2012), 1--9.","journal-title":"ACM Transactions on Graphics (TOG)"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/3355089.3356536"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v35i7.16736"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/1778765.1781156"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/ROBOT.2005.1570360"},{"key":"e_1_2_1_8_1","volume-title":"Synthesizing Physical Character-Scene Interactions. arXiv preprint arXiv:2302.00883","author":"Hassan Mohamed","year":"2023","unstructured":"Mohamed Hassan, Yunrong Guo, Tingwu Wang, Michael Black, Sanja Fidler, and Xue Bin Peng. 2023. Synthesizing Physical Character-Scene Interactions. arXiv preprint arXiv:2302.00883 (2023)."},{"key":"e_1_2_1_9_1","first-page":"6840","article-title":"Denoising diffusion probabilistic models","volume":"33","author":"Ho Jonathan","year":"2020","unstructured":"Jonathan Ho, Ajay Jain, and Pieter Abbeel. 2020. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems 33 (2020), 6840--6851.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/3386569.3392440"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/3072959.3073663"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2018.2792536"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/3475946.3480950"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/1401132.1401202"},{"key":"e_1_2_1_15_1","volume-title":"Computer Graphics Forum","author":"Kwiatkowski Ariel","unstructured":"Ariel Kwiatkowski, Eduardo Alvarado, Vicky Kalogeiton, C Karen Liu, Julien Pettr\u00e9, Michiel van de Panne, and Marie-Paule Cani. 2022. A survey on reinforcement learning methods in character animation. In Computer Graphics Forum, Vol. 41. Wiley Online Library, 613--639."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/3272127.3275016"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3386569.3392422"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/1576246.1531365"},{"key":"e_1_2_1_19_1","first-page":"1","article-title":"Learning basketball dribbling skills using trajectory optimization and deep reinforcement learning","volume":"37","author":"Liu Libin","year":"2018","unstructured":"Libin Liu and Jessica Hodgins. 2018. Learning basketball dribbling skills using trajectory optimization and deep reinforcement learning. ACM Transactions on Graphics (TOG) 37, 4 (2018), 1--14.","journal-title":"ACM Transactions on Graphics (TOG)"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/3386569.3392474"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/2185520.2185539"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/3355089.3356501"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/3197517.3201311"},{"key":"e_1_2_1_24_1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3072959.3073602","article-title":"Deeploco: Dynamic locomotion skills using hierarchical deep reinforcement learning","volume":"36","author":"Peng Xue Bin","year":"2017","unstructured":"Xue Bin Peng, Glen Berseth, KangKang Yin, and Michiel van de Panne. 2017. Deeploco: Dynamic locomotion skills using hierarchical deep reinforcement learning. ACM Transactions on Graphics (TOG) 36, 4 (2017), 1--13.","journal-title":"ACM Transactions on Graphics (TOG)"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/3528223.3530110"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v32i1.11671"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.01129"},{"key":"e_1_2_1_28_1","volume-title":"Drop Prevention Control for Humanoid Robots Carrying Stacked Boxes. In 2021 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 4118--4125","author":"Sato Shimpei","year":"2021","unstructured":"Shimpei Sato, Yuta Kojio, Kunio Kojima, Fumihito Sugai, Yohei Kakiuchi, Kei Okada, and Masayuki Inaba. 2021. Drop Prevention Control for Humanoid Robots Carrying Stacked Boxes. In 2021 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 4118--4125."},{"key":"e_1_2_1_29_1","volume-title":"Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347","author":"Schulman John","year":"2017","unstructured":"John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. 2017. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347 (2017)."},{"key":"e_1_2_1_30_1","volume-title":"International Conference on Machine Learning. PMLR, 2256--2265","author":"Sohl-Dickstein Jascha","year":"2015","unstructured":"Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. 2015. Deep unsupervised learning using nonequilibrium thermodynamics. In International Conference on Machine Learning. PMLR, 2256--2265."},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/3355089.3356505"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01291"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/MCG.2011.30"},{"key":"e_1_2_1_34_1","volume-title":"Human motion diffusion model. arXiv preprint arXiv:2209.14916","author":"Tevet Guy","year":"2022","unstructured":"Guy Tevet, Sigal Raab, Brian Gordon, Yonatan Shafir, Daniel Cohen-Or, and Amit H Bermano. 2022. Human motion diffusion model. arXiv preprint arXiv:2209.14916 (2022)."},{"key":"e_1_2_1_35_1","volume-title":"EDGE: Editable Dance Generation From Music. arXiv preprint arXiv:2211.10658","author":"Tseng Jonathan","year":"2022","unstructured":"Jonathan Tseng, Rodrigo Castellon, and C Karen Liu. 2022. EDGE: Editable Dance Generation From Music. arXiv preprint arXiv:2211.10658 (2022)."},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/3386569.3392381"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/3528223.3530067"},{"key":"e_1_2_1_38_1","volume-title":"Hung Yu Ling, and Michiel van de Panne","author":"Xie Zhaoming","year":"2022","unstructured":"Zhaoming Xie, Sebastian Starke, Hung Yu Ling, and Michiel van de Panne. 2022. Learning Soccer Juggling Skills with Layer-wise Mixture-of-Experts. (2022)."},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/3550454.3555437"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/3550454.3555434"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/1276377.1276509"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/3450626.3459817"},{"key":"e_1_2_1_43_1","volume-title":"PhysDiff: Physics-Guided Human Motion Diffusion Model. arXiv preprint arXiv:2212.02500","author":"Yuan Ye","year":"2022","unstructured":"Ye Yuan, Jiaming Song, Umar Iqbal, Arash Vahdat, and Jan Kautz. 2022. PhysDiff: Physics-Guided Human Motion Diffusion Model. arXiv preprint arXiv:2212.02500 (2022)."},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/3478513.3480500"},{"key":"e_1_2_1_45_1","volume-title":"Proceedings, Part V. Springer, 518--535","author":"Zhang Xiaohan","year":"2022","unstructured":"Xiaohan Zhang, Bharat Lal Bhatnagar, Sebastian Starke, Vladimir Guzov, and Gerard Pons-Moll. 2022. Couch: towards controllable human-chair interactions. In Computer Vision--ECCV 2022:17th European Conference, Tel Aviv, Israel, October 23--27, 2022, Proceedings, Part V. Springer, 518--535."},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00589"}],"container-title":["Proceedings of the ACM on Computer Graphics and Interactive Techniques"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3606931","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3606931","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T17:48:52Z","timestamp":1750182532000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3606931"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,8,16]]},"references-count":46,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2023,8,16]]}},"alternative-id":["10.1145\/3606931"],"URL":"https:\/\/doi.org\/10.1145\/3606931","relation":{},"ISSN":["2577-6193"],"issn-type":[{"value":"2577-6193","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,8,16]]},"assertion":[{"value":"2023-08-24","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}