{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,28]],"date-time":"2026-03-28T03:16:38Z","timestamp":1774667798937,"version":"3.50.1"},"reference-count":60,"publisher":"Association for Computing Machinery (ACM)","issue":"6","license":[{"start":{"date-parts":[[2024,11,19]],"date-time":"2024-11-19T00:00:00Z","timestamp":1731974400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Graph."],"published-print":{"date-parts":[[2024,12,19]]},"abstract":"<jats:p>Crafting a single, versatile physics-based controller that can breathe life into interactive characters across a wide spectrum of scenarios represents an exciting frontier in character animation. An ideal controller should support diverse control modalities, such as sparse target keyframes, text instructions, and scene information. While previous works have proposed physically simulated, scene-aware control models, these systems have predominantly focused on developing controllers that each specializes in a narrow set of tasks and control modalities. This work presents MaskedMimic, a novel approach that formulates physics-based character control as a general motion inpainting problem. Our key insight is to train a single unified model to synthesize motions from partial (masked) motion descriptions, such as masked keyframes, objects, text descriptions, or any combination thereof. This is achieved by leveraging motion tracking data and designing a scalable training method that can effectively utilize diverse motion descriptions to produce coherent animations. Through this process, our approach learns a physics-based controller that provides an intuitive control interface without requiring tedious reward engineering for all behaviors of interest. 
The resulting controller supports a wide range of control modalities and enables seamless transitions between disparate tasks. By unifying character control through motion inpainting, MaskedMimic creates versatile virtual characters. These characters can dynamically adapt to complex scenes and compose diverse motions on demand, enabling more interactive and immersive experiences.<\/jats:p>","DOI":"10.1145\/3687951","type":"journal-article","created":{"date-parts":[[2024,11,19]],"date-time":"2024-11-19T15:46:04Z","timestamp":1732031164000},"page":"1-21","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":28,"title":["MaskedMimic: Unified Physics-Based Character Control Through Masked Motion Inpainting"],"prefix":"10.1145","volume":"43","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6447-9864","authenticated-orcid":false,"given":"Chen","family":"Tessler","sequence":"first","affiliation":[{"name":"NVIDIA Research, Tel Aviv, Israel"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7468-6162","authenticated-orcid":false,"given":"Yunrong","family":"Guo","sequence":"additional","affiliation":[{"name":"NVIDIA, Santa Clara, United States of America"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-1435-3399","authenticated-orcid":false,"given":"Ofir","family":"Nabati","sequence":"additional","affiliation":[{"name":"NVIDIA Research, Tel Aviv, Israel"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9164-5303","authenticated-orcid":false,"given":"Gal","family":"Chechik","sequence":"additional","affiliation":[{"name":"NVIDIA Research, Tel Aviv, Israel"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3677-5655","authenticated-orcid":false,"given":"Xue Bin","family":"Peng","sequence":"additional","affiliation":[{"name":"NVIDIA, Vancouver, 
Canada"}]}],"member":"320","published-online":{"date-parts":[[2024,11,19]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/1778765.1781157"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.01148"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/3610548.3618205"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.00054"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/2508363.2508399"},{"key":"e_1_2_1_7_1","doi-asserted-by":"crossref","unstructured":"Deepak Gopinath Hanbyul Joo and Jungdam Won. 2022. Motion In-betweening for Physically Simulated Characters. In SIGGRAPH Asia 2022 Posters. 1--2.","DOI":"10.1145\/3550082.3564186"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1080\/10867651.1998.10487493"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.00509"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.01118"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/3588432.3591525"},{"key":"e_1_2_1_12_1","volume-title":"International conference on learning representations.","author":"Higgins Irina","year":"2016","unstructured":"Irina Higgins, Loic Matthey, Arka Pal, Christopher Burgess, Xavier Glorot, Matthew Botvinick, Shakir Mohamed, and Alexander Lerchner. 2016. beta-vae: Learning basic visual concepts with a constrained variational framework. In International conference on learning representations."},{"key":"e_1_2_1_13_1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3272127.3275108","article-title":"Deep inertial poser: Learning to reconstruct human pose from sparse inertial measurements in real time","volume":"37","author":"Huang Yinghao","year":"2018","unstructured":"Yinghao Huang, Manuel Kaufmann, Emre Aksan, Michael J Black, Otmar Hilliges, and Gerard Pons-Moll. 2018. 
Deep inertial poser: Learning to reconstruct human pose from sparse inertial measurements in real time. ACM Transactions on Graphics (TOG) 37, 6 (2018), 1--15.","journal-title":"ACM Transactions on Graphics (TOG)"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/3550469.3555391"},{"key":"e_1_2_1_15_1","volume-title":"SuperPADL: Scaling Language-Directed Physics-Based Control with Progressive Supervised Distillation. In ACM SIGGRAPH 2024 Conference Papers. 1--11","author":"Juravsky Jordan","year":"2024","unstructured":"Jordan Juravsky, Yunrong Guo, Sanja Fidler, and Xue Bin Peng. 2024. SuperPADL: Scaling Language-Directed Physics-Based Control with Progressive Supervised Distillation. In ACM SIGGRAPH 2024 Conference Papers. 1--11."},{"key":"e_1_2_1_16_1","volume-title":"Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980","author":"Kingma Diederik P","year":"2014","unstructured":"Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3588432.3591504"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/1778765.1781155"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/1778765.1781155"},{"key":"e_1_2_1_20_1","volume-title":"Sampling-based Contact-rich Motion Control. ACM Transactions on Graphics 29, 4","author":"Liu Libin","year":"2010","unstructured":"Libin Liu, KangKang Yin, Michiel van de Panne, Tianjia Shao, and Weiwei Xu. 2010. Sampling-based Contact-rich Motion Control. ACM Transactions on Graphics 29, 4 (2010), Article 128."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.01000"},{"key":"e_1_2_1_22_1","volume-title":"Universal Humanoid Motion Representations for Physics-Based Control. In The Twelfth International Conference on Learning Representations. 
https:\/\/openreview.net\/forum?id=OrOd8PxOO2","author":"Luo Zhengyi","year":"2024","unstructured":"Zhengyi Luo, Jinkun Cao, Josh Merel, Alexander Winkler, Jing Huang, Kris M. Kitani, and Weipeng Xu. 2024. Universal Humanoid Motion Representations for Physics-Based Control. In The Twelfth International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=OrOd8PxOO2"},{"key":"e_1_2_1_23_1","first-page":"25019","article-title":"Dynamics-regulated kinematic policy for egocentric pose estimation","volume":"34","author":"Luo Zhengyi","year":"2021","unstructured":"Zhengyi Luo, Ryo Hachiuma, Ye Yuan, and Kris Kitani. 2021. Dynamics-regulated kinematic policy for egocentric pose estimation. Advances in Neural Information Processing Systems 34 (2021), 25019--25032.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_2_1_24_1","first-page":"6815","article-title":"Embodied scene-aware human pose estimation","volume":"35","author":"Luo Zhengyi","year":"2022","unstructured":"Zhengyi Luo, Shun Iwase, Ye Yuan, and Kris Kitani. 2022a. Embodied scene-aware human pose estimation. Advances in Neural Information Processing Systems 35 (2022), 6815--6828.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_2_1_25_1","volume-title":"From Universal Humanoid Control to Automatic Physically Valid Character Creation. arXiv preprint arXiv:2206.09286","author":"Luo Zhengyi","year":"2022","unstructured":"Zhengyi Luo, Ye Yuan, and Kris M Kitani. 2022b. From Universal Humanoid Control to Automatic Physically Valid Character Creation. arXiv preprint arXiv:2206.09286 (2022)."},{"key":"e_1_2_1_26_1","volume-title":"The Twelfth International Conference on Learning Representations.","author":"Ma Yecheng Jason","year":"2023","unstructured":"Yecheng Jason Ma, William Liang, Guanzhi Wang, De-An Huang, Osbert Bastani, Dinesh Jayaraman, Yuke Zhu, Linxi Fan, and Anima Anandkumar. 2023. 
Eureka: Human-Level Reward Design via Coding Large Language Models. In The Twelfth International Conference on Learning Representations."},{"key":"e_1_2_1_27_1","volume-title":"International Conference on Computer Vision. 5442--5451","author":"Mahmood Naureen","unstructured":"Naureen Mahmood, Nima Ghorbani, Nikolaus F. Troje, Gerard Pons-Moll, and Michael J. Black. 2019. AMASS: Archive of Motion Capture as Surface Shapes. In International Conference on Computer Vision. 5442--5451."},{"key":"e_1_2_1_28_1","unstructured":"Viktor Makoviychuk Lukasz Wawrzyniak Yunrong Guo Michelle Lu Kier Storey Miles Macklin David Hoeller Nikita Rudin Arthur Allshire Ankur Handa and Gavriel State. 2021. Isaac Gym: High Performance GPU Based Physics Simulation For Robot Learning. In Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2). https:\/\/openreview.net\/forum?id=fgFBtYgJQX_"},{"key":"e_1_2_1_29_1","volume-title":"International conference on machine learning. PMLR","author":"Mnih Volodymyr","year":"2016","unstructured":"Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. 2016. Asynchronous methods for deep reinforcement learning. In International conference on machine learning. PMLR, 1928--1937."},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-19772-7_1"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/3DV62453.2024.00149"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/3197517.3201311"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/3528223.3530110"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/3450626.3459670"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.01080"},{"key":"e_1_2_1_36_1","volume-title":"Proceedings IEEE\/CVF Conf. on Computer Vision and Pattern Recognition (CVPR). 
722--731","author":"Punnakkal Abhinanda R.","unstructured":"Abhinanda R. Punnakkal, Arjun Chandrasekaran, Nikos Athanasiou, Alejandra Quiros-Ramirez, and Michael J. Black. 2021. BABEL: Bodies, Action and Behavior with English Labels. In Proceedings IEEE\/CVF Conf. on Computer Vision and Pattern Recognition (CVPR). 722--731."},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.01322"},{"key":"e_1_2_1_38_1","volume-title":"Proceedings of the fourteenth international conference on artificial intelligence and statistics. JMLR Workshop and Conference Proceedings, 627--635","author":"Ross St\u00e9phane","year":"2011","unstructured":"St\u00e9phane Ross, Geoffrey Gordon, and Drew Bagnell. 2011. A reduction of imitation learning and structured prediction to no-regret online learning. In Proceedings of the fourteenth international conference on artificial intelligence and statistics. JMLR Workshop and Conference Proceedings, 627--635."},{"key":"e_1_2_1_39_1","volume-title":"High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:1506.02438","author":"Schulman John","year":"2015","unstructured":"John Schulman, Philipp Moritz, Sergey Levine, Michael Jordan, and Pieter Abbeel. 2015. High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:1506.02438 (2015)."},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1111\/j.1467-8659.2008.01134.x"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/3588432.3591541"},{"key":"e_1_2_1_42_1","volume-title":"Human Motion Diffusion Model. In The Eleventh International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=SJ1kSyO2jwu","author":"Tevet Guy","year":"2023","unstructured":"Guy Tevet, Sigal Raab, Brian Gordon, Yoni Shafir, Daniel Cohen-or, and Amit Haim Bermano. 2023a. Human Motion Diffusion Model. In The Eleventh International Conference on Learning Representations. 
https:\/\/openreview.net\/forum?id=SJ1kSyO2jwu"},{"key":"e_1_2_1_43_1","volume-title":"Human Motion Diffusion Model. In The Eleventh International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=SJ1kSyO2jwu","author":"Tevet Guy","year":"2023","unstructured":"Guy Tevet, Sigal Raab, Brian Gordon, Yoni Shafir, Daniel Cohen-or, and Amit Haim Bermano. 2023b. Human Motion Diffusion Model. In The Eleventh International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=SJ1kSyO2jwu"},{"key":"e_1_2_1_44_1","volume-title":"Voyager: An Open-Ended Embodied Agent with Large Language Models. Transactions on Machine Learning Research","author":"Wang Guanzhi","year":"2024","unstructured":"Guanzhi Wang, Yuqi Xie, Yunfan Jiang, Ajay Mandlekar, Chaowei Xiao, Yuke Zhu, Linxi Fan, and Anima Anandkumar. 2024b. Voyager: An Open-Ended Embodied Agent with Large Language Models. Transactions on Machine Learning Research (2024). https:\/\/openreview.net\/forum?id=ehfRiF0R3a"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00075"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01981"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.01203"},{"key":"e_1_2_1_48_1","volume-title":"Unicon: Universal neural controller for physics-based character motion. arXiv preprint arXiv:2011.15119","author":"Wang Tingwu","year":"2020","unstructured":"Tingwu Wang, Yunrong Guo, Maria Shugrina, and Sanja Fidler. 2020. Unicon: Universal neural controller for physics-based character motion. arXiv preprint arXiv:2011.15119 (2020)."},{"key":"e_1_2_1_49_1","volume-title":"Physhoi: Physics-based imitation of dynamic human-object interaction. arXiv preprint arXiv:2312.04393","author":"Wang Yinhuai","year":"2023","unstructured":"Yinhuai Wang, Jing Lin, Ailing Zeng, Zhengyi Luo, Jian Zhang, and Lei Zhang. 2023. 
Physhoi: Physics-based imitation of dynamic human-object interaction. arXiv preprint arXiv:2312.04393 (2023)."},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/3550469.3555411"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1145\/3528223.3530067"},{"key":"e_1_2_1_52_1","volume-title":"The Twelfth International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=1vCnDyQkjg","author":"Xiao Zeqi","year":"2024","unstructured":"Zeqi Xiao, Tai Wang, Jingbo Wang, Jinkun Cao, Wenwei Zhang, Bo Dai, Dahua Lin, and Jiangmiao Pang. 2024. Unified Human-Scene Interaction via Prompted Chain-of-Contacts. In The Twelfth International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=1vCnDyQkjg"},{"key":"e_1_2_1_53_1","volume-title":"The Twelfth International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=gd0lAEtWso","author":"Xie Yiming","year":"2024","unstructured":"Yiming Xie, Varun Jampani, Lei Zhong, Deqing Sun, and Huaizu Jiang. 2024. Omni-Control: Control Any Joint at Any Time for Human Motion Generation. In The Twelfth International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=gd0lAEtWso"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.01371"},{"key":"e_1_2_1_55_1","volume-title":"Lobstr: Real-time lower-body pose prediction from sparse upper-body tracking signals. In Computer Graphics Forum","author":"Yang Dongseok","year":"2021","unstructured":"Dongseok Yang, Doyeon Kim, and Sung-Hee Lee. 2021. Lobstr: Real-time lower-body pose prediction from sparse upper-body tracking signals. In Computer Graphics Forum, Vol. 40. 
Wiley Online Library, 265--275."},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1145\/3550454.3555434"},{"key":"e_1_2_1_57_1","first-page":"21763","article-title":"Residual force control for agile human behavior imitation and extended motion synthesis","volume":"33","author":"Yuan Ye","year":"2020","unstructured":"Ye Yuan and Kris Kitani. 2020. Residual force control for agile human behavior imitation and extended motion synthesis. Advances in Neural Information Processing Systems 33 (2020), 21763--21774.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1145\/3618342"},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.01354"},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.01349"},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1145\/3618397"}],"container-title":["ACM Transactions on Graphics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3687951","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3687951","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:09:57Z","timestamp":1750295397000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3687951"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,11,19]]},"references-count":60,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2024,12,19]]}},"alternative-id":["10.1145\/3687951"],"URL":"https:\/\/doi.org\/10.1145\/3687951","relation":{},"ISSN":["0730-0301","1557-7368"],"issn-type":[{"value":"0730-0301","type":"print"},{"value":"1557-7368","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,11,19]]},"assertion":[{"value":"2024-11-
19","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}