{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,7,4]],"date-time":"2026-07-04T08:26:15Z","timestamp":1783153575141,"version":"3.54.6"},"reference-count":85,"publisher":"Association for Computing Machinery (ACM)","issue":"6","license":[{"start":{"date-parts":[[2021,12,1]],"date-time":"2021-12-01T00:00:00Z","timestamp":1638316800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Graph."],"published-print":{"date-parts":[[2021,12]]},"abstract":"<jats:p>We have recently seen great progress in building photorealistic animatable full-body codec avatars, but generating high-fidelity animation of clothing is still difficult. To address these difficulties, we propose a method to build an animatable clothed body avatar with an explicit representation of the clothing on the upper body from multi-view captured videos. We use a two-layer mesh representation to register each 3D scan separately with the body and clothing templates. In order to improve the photometric correspondence across different frames, texture alignment is then performed through inverse rendering of the clothing geometry and texture predicted by a variational autoencoder. We then train a new two-layer codec avatar with separate modeling of the upper clothing and the inner body layer. To learn the interaction between the body dynamics and clothing states, we use a temporal convolution network to predict the clothing latent code based on a sequence of input skeletal poses. We show photorealistic animation output for three different actors, and demonstrate the advantage of our clothed-body avatars over the single-layer avatars used in previous work. 
We also show the benefit of an explicit clothing model that allows the clothing texture to be edited in the animation output.<\/jats:p>","DOI":"10.1145\/3478513.3480545","type":"journal-article","created":{"date-parts":[[2021,12,10]],"date-time":"2021-12-10T18:29:20Z","timestamp":1639160960000},"page":"1-15","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":70,"title":["Modeling clothing as a separate layer for an animatable human avatar"],"prefix":"10.1145","volume":"40","author":[{"given":"Donglai","family":"Xiang","sequence":"first","affiliation":[{"name":"Carnegie Mellon University"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Fabian","family":"Prada","sequence":"additional","affiliation":[{"name":"Facebook Reality Labs Research"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Timur","family":"Bagautdinov","sequence":"additional","affiliation":[{"name":"Facebook Reality Labs Research"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Weipeng","family":"Xu","sequence":"additional","affiliation":[{"name":"Facebook Reality Labs Research"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Yuan","family":"Dong","sequence":"additional","affiliation":[{"name":"Facebook Reality Labs Research"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"He","family":"Wen","sequence":"additional","affiliation":[{"name":"Facebook Reality Labs Research"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jessica","family":"Hodgins","sequence":"additional","affiliation":[{"name":"Carnegie Mellon University"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Chenglei","family":"Wu","sequence":"additional","affiliation":[{"name":"Facebook Reality Labs 
Research"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2021,12,10]]},"reference":[{"key":"e_1_2_2_1_1","volume-title":"Computer Graphics Forum","author":"Aberman Kfir","unstructured":"Kfir Aberman, Mingyi Shi, Jing Liao, Dani Lischinski, Baoquan Chen, and Daniel Cohen-Or. 2019. Deep video-based performance cloning. In Computer Graphics Forum, Vol. 38. Wiley Online Library, 219--233."},{"key":"e_1_2_2_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/1186822.1073207"},{"key":"e_1_2_2_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/3450626.3459850"},{"key":"e_1_2_2_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/280814.280821"},{"key":"e_1_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58565-5_21"},{"key":"e_1_2_2_6_1","volume-title":"PBNS: Physically Based Neural Simulator for Unsupervised Garment Pose Space Deformation. arXiv preprint arXiv:2012.11310","author":"Bertiche Hugo","year":"2020","unstructured":"Hugo Bertiche, Meysam Madadi, and Sergio Escalera. 2020b. PBNS: Physically Based Neural Simulator for Unsupervised Garment Pose Space Deformation. arXiv preprint arXiv:2012.11310 (2020)."},{"key":"e_1_2_2_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.265"},{"key":"e_1_2_2_8_1","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).","author":"Bogo Federica","unstructured":"Federica Bogo, Javier Romero, Gerard Pons-Moll, and Michael J. Black. 2017. Dynamic FAUST: Registering Human Bodies in Motion. 
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)."},{"key":"e_1_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/1399504.1360698"},{"key":"e_1_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.1007\/11744047_49"},{"key":"e_1_2_2_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2009.32"},{"key":"e_1_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/3306346.3323010"},{"key":"e_1_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/882262.882309"},{"key":"e_1_2_2_14_1","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV).","author":"Chan Caroline","unstructured":"Caroline Chan, Shiry Ginosar, Tinghui Zhou, and Alexei A. Efros. 2019. Everybody Dance Now. In Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV)."},{"key":"e_1_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/2816795.2818059"},{"key":"e_1_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.1111\/cgf.14107"},{"key":"e_1_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/1399504.1360697"},{"key":"e_1_2_2_18_1","volume-title":"European Conference on Computer Vision Workshops. Springer, 409--425","author":"Esser Patrick","year":"2018","unstructured":"Patrick Esser, Johannes Haux, Timo Milbich, et al. 2018. Towards Learning a Realistic Rendering of Human Behavior. 
In European Conference on Computer Vision Workshops. Springer, 409--425."},{"key":"e_1_2_2_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206755"},{"key":"e_1_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.106"},{"key":"e_1_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/2508363.2508380"},{"key":"e_1_2_2_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/2786784.2786789"},{"key":"e_1_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/1276377.1276438"},{"key":"e_1_2_2_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00883"},{"key":"e_1_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.353"},{"key":"e_1_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/3450626.3459749"},{"key":"e_1_2_2_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/3311970"},{"key":"e_1_2_2_28_1","volume-title":"IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR).","author":"Habermann Marc","year":"2020","unstructured":"Marc Habermann, Weipeng Xu, Michael Zollhofer, Gerard Pons-Moll, and Christian Theobalt. 2020. DeepCap: Monocular Human Performance Capture Using Weak Supervision. In IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)."},{"key":"e_1_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2013.141"},{"key":"e_1_2_2_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/3DV.2017.00055"},{"key":"e_1_2_2_31_1","volume-title":"Computer Graphics Forum","author":"Jacobson Alec","unstructured":"Alec Jacobson, Elif Tosun, Olga Sorkine, and Denis Zorin. 2010. Mixed finite elements for variational surface modeling. In Computer Graphics Forum, Vol. 
29. Wiley Online Library, 1565--1574."},{"key":"e_1_2_2_32_1","doi-asserted-by":"publisher","DOI":"10.1111\/cgf.14108"},{"key":"e_1_2_2_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/1409625.1409627"},{"key":"e_1_2_2_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/2010324.1964988"},{"key":"e_1_2_2_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/1053427.1053429"},{"key":"e_1_2_2_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/2461912.2462020"},{"key":"e_1_2_2_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/258734.258801"},{"key":"e_1_2_2_38_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01225-0_41"},{"key":"e_1_2_2_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/344779.344862"},{"key":"e_1_2_2_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/1661412.1618521"},{"key":"e_1_2_2_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/3333002"},{"key":"e_1_2_2_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00780"},{"key":"e_1_2_2_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00600"},{"key":"e_1_2_2_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2011.5995424"},{"key":"e_1_2_2_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/3197517.3201401"},{"key":"e_1_2_2_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/2816795.2818013"},{"key":"e_1_2_2_47_1","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR).","author":"Ma Qianli","unstructured":"
Qianli Ma, Shunsuke Saito, Jinlong Yang, Siyu Tang, and Michael J. Black. 2021. SCALE: Modeling Clothed Humans with a Surface Codec of Articulated Local Elements. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)."},{"key":"e_1_2_2_48_1","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR).","author":"Ma Qianli","unstructured":"Qianli Ma, Jinlong Yang, Anurag Ranjan, Sergi Pujades, Gerard Pons-Moll, Siyu Tang, and Michael J. Black. 2020. Learning to Dress 3D People in Generative Clothing. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)."},{"key":"e_1_2_2_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/344779.344951"},{"key":"e_1_2_2_50_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.109"},{"key":"e_1_2_2_51_1","doi-asserted-by":"publisher","DOI":"10.1145\/2366145.2366171"},{"key":"e_1_2_2_52_1","volume-title":"Proceedings of the European Conference on Computer Vision (ECCV).","author":"Osman Ahmed A A","year":"2020","unstructured":"Ahmed A A Osman, Timo Bolkart, and Michael J. Black. 2020. STAR: A Sparse Trained Articulated Human Body Regressor. In Proceedings of the European Conference on Computer Vision (ECCV). Springer, 598--613."},{"key":"e_1_2_2_53_1","volume-title":"Deformable Neural Radiance Fields. 
arXiv preprint arXiv:2011.12948","author":"Park Keunhong","year":"2020","unstructured":"Keunhong Park, Utkarsh Sinha, Jonathan T Barron, Sofien Bouaziz, Dan B Goldman, Steven M Seitz, and Ricardo-Martin Brualla. 2020. Deformable Neural Radiance Fields. arXiv preprint arXiv:2011.12948 (2020)."},{"key":"e_1_2_2_54_1","volume-title":"IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR).","author":"Patel Chaitanya","year":"2020","unstructured":"Chaitanya Patel, Zhouyingcheng Liao, and Gerard Pons-Moll. 2020. TailorNet: Predicting Clothing in 3D as a Function of Human Pose, Shape and Garment Style. 
In IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)."},{"key":"e_1_2_2_55_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00894"},{"key":"e_1_2_2_56_1","doi-asserted-by":"publisher","DOI":"10.1145\/3072959.3073711"},{"key":"e_1_2_2_57_1","doi-asserted-by":"publisher","DOI":"10.1109\/WACV48630.2021.00185"},{"key":"e_1_2_2_58_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00899"},{"key":"e_1_2_2_59_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.01018"},{"key":"e_1_2_2_60_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00372"},{"key":"e_1_2_2_61_1","doi-asserted-by":"publisher","DOI":"10.1145\/3130800.3130883"},{"key":"e_1_2_2_62_1","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR).","author":"Saito Shunsuke","unstructured":"Shunsuke Saito, Jinlong Yang, Qianli Ma, and Michael J. Black. 2021. SCANimate: Weakly Supervised Learning of Skinned Clothed Avatar Networks. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)."},{"key":"e_1_2_2_63_1","volume-title":"Computer Graphics Forum","author":"Santesteban Igor","unstructured":"Igor Santesteban, Miguel A Otaduy, and Dan Casas. 2019. Learning-based animation of clothing for virtual try-on. In Computer Graphics Forum, Vol. 38. 
Wiley Online Library, 355--366."},{"key":"e_1_2_2_64_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.01159"},{"key":"e_1_2_2_65_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58621-8_35"},{"key":"e_1_2_2_66_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00020"},{"key":"e_1_2_2_67_1","doi-asserted-by":"publisher","DOI":"10.1109\/MCG.2007.68"},{"key":"e_1_2_2_68_1","doi-asserted-by":"publisher","DOI":"10.1145\/1866158.1866161"},{"key":"e_1_2_2_69_1","doi-asserted-by":"publisher","DOI":"10.1111\/cgf.14109"},{"key":"e_1_2_2_70_1","doi-asserted-by":"publisher","DOI":"10.1145\/1399504.1360696"},{"key":"e_1_2_2_71_1","doi-asserted-by":"publisher","DOI":"10.1109\/3DV.2017.00015"},{"key":"e_1_2_2_72_1","doi-asserted-by":"publisher","DOI":"10.1145\/1833349.1778844"},{"key":"e_1_2_2_73_1","doi-asserted-by":"publisher","DOI":"10.5555\/3326943.3327049"},{"key":"e_1_2_2_74_1","doi-asserted-by":"publisher","DOI":"10.1145\/3355089.3356512"},{"key":"e_1_2_2_75_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00565"},{"key":"e_1_2_2_76_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00371-005-0346-7"},{"key":"e_1_2_2_77_1","doi-asserted-by":"publisher","DOI":"10.1145\/2508363.2508418"},{"key":"e_1_2_2_78_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-33765-9_54"},{"key":"e_1_2_2_79_1","doi-asserted-by":"publisher","DOI":"10.1109\/3DV50981.2020.00042"},{"key":"e_1_2_2_80_1","doi-asserted-by":"publisher","DOI":"10.1145\/3181973"},{"key":"e_1_2_2_81_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00565"},{"key":"e_1_2_2_82_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.582"},{"key":"e_1_2_2_83_1","volume-title":"Computer Graphics Forum","author":"Zhang Meng","unstructured":"
Meng Zhang, Tuanfeng Wang, Duygu Ceylan, and Niloy J Mitra. 2021. Deep detail enhancement for any garment. In Computer Graphics Forum, Vol. 40. Wiley Online Library, 399--411."},{"key":"e_1_2_2_84_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.92"},{"key":"e_1_2_2_85_1","volume-title":"Computer Graphics Forum","author":"Zhou Bin","unstructured":"Bin Zhou, Xiaowu Chen, Qiang Fu, Kan Guo, and Ping Tan. 2013. Garment modeling from a single image. In Computer Graphics Forum, Vol. 32. Wiley Online Library, 85--91."}],"container-title":["ACM Transactions on Graphics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3478513.3480545","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3478513.3480545","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:11:40Z","timestamp":1750191100000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3478513.3480545"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,12]]},"references-count":85,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2021,12]]}},"alternative-id":["10.1145\/3478513.3480545"],"URL":"https:\/\/doi.org\/10.1145\/3478513.3480545","relation":{},"ISSN":["0730-0301","1557-7368"],"issn-type":[{"value":"0730-0301","type":"print"},{"value":"1557-7368","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,12]]},"assertion":[{"value":"2021-12-10","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}