{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,10]],"date-time":"2026-02-10T19:34:57Z","timestamp":1770752097900,"version":"3.50.0"},"reference-count":72,"publisher":"Association for Computing Machinery (ACM)","issue":"2","funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62441617"],"award-info":[{"award-number":["62441617"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100012226","name":"Fundamental Research Funds for the Central Universities","doi-asserted-by":"crossref","award":["226-2025-00080"],"award-info":[{"award-number":["226-2025-00080"]}],"id":[{"id":"10.13039\/501100012226","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2026,2,28]]},"abstract":"<jats:p>\n                    This article presents SEEAvatar, a novel approach to generate photorealistic 3D avatars from text descriptions. Despite the fact that recent text-to-3D avatar generation methods have shown promising results, their joint representation and optimization of geometry and appearance often yield coarse results and limit practical applications. Our method introduces novel constraints for decoupled geometry and appearance. First, we constrain geometric optimization using a template avatar, which evolves periodically to enable flexible shape generation while maintaining decent human shape. The detailed geometry features in faces and hands are also preserved from static human priors. Second, we leverage diffusion models to guide a physically based rendering pipeline for texture generation, incorporating a lightness constraint on albedo textures to suppress incorrect lighting effects. 
Experimental results demonstrate that our method significantly outperforms existing methods in both global and local geometry quality as well as appearance fidelity. The high-quality meshes and textures produced by our approach are directly compatible with traditional graphics pipelines, enabling immediate practical applications. Project page at:\n                    <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"https:\/\/yoxu515.github.io\/SEEAvatar\/\">https:\/\/yoxu515.github.io\/SEEAvatar\/<\/jats:ext-link>\n                    .\n                  <\/jats:p>","DOI":"10.1145\/3774422","type":"journal-article","created":{"date-parts":[[2025,11,26]],"date-time":"2025-11-26T15:05:52Z","timestamp":1764169552000},"page":"1-22","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Photorealistic Text-to-3D Avatar Generation with Constraints for Decoupled Geometry and Appearance"],"prefix":"10.1145","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6800-2465","authenticated-orcid":false,"given":"Yuanyou","family":"Xu","sequence":"first","affiliation":[{"name":"CCAI, Computer Science, Zhejiang University, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8783-8313","authenticated-orcid":false,"given":"Zongxin","family":"Yang","sequence":"additional","affiliation":[{"name":"DBMI, Harvard Medical School, Boston, Massachusetts, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0512-880X","authenticated-orcid":false,"given":"Yi","family":"Yang","sequence":"additional","affiliation":[{"name":"CCAI, Computer Science, Zhejiang University, Hangzhou, 
China"}]}],"member":"320","published-online":{"date-parts":[[2026,2,10]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00541"},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.1145\/3182179"},{"key":"e_1_3_2_4_2","unstructured":"Yukang Cao Yan-Pei Cao Kai Han Ying Shan and Kwan-Yee K. Wong. 2023. DreamAvatar: Text-and-shape guided 3D human avatar generation via diffusion models. arXiv:2304.00916. Retrieved from https:\/\/arxiv.org\/abs\/2304.00916"},{"key":"e_1_3_2_5_2","doi-asserted-by":"crossref","unstructured":"Dave Zhenyu Chen Yawar Siddiqui Hsin-Ying Lee Sergey Tulyakov and Matthias Nie\u00dfner. 2023. Text2tex: Text-driven texture synthesis via diffusion models. arXiv:2303.11396. Retrieved from https:\/\/arxiv.org\/abs\/2303.11396","DOI":"10.1109\/ICCV51070.2023.01701"},{"key":"e_1_3_2_6_2","doi-asserted-by":"crossref","unstructured":"Rui Chen Yongwei Chen Ningxin Jiao and Kui Jia. 2023. Fantasia3D: Disentangling geometry and appearance for high-quality text-to-3D content creation. arXiv:2303.13873. Retrieved from https:\/\/arxiv.org\/abs\/2303.13873","DOI":"10.1109\/ICCV51070.2023.02033"},{"key":"e_1_3_2_7_2","doi-asserted-by":"publisher","DOI":"10.1145\/3635717"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1145\/3700770"},{"key":"e_1_3_2_9_2","first-page":"9936","article-title":"Learning deformable tetrahedral meshes for 3D reconstruction","volume":"33","author":"Gao Jun","year":"2020","unstructured":"Jun Gao, Wenzheng Chen, Tommy Xiang, Alec Jacobson, Morgan McGuire, and Sanja Fidler. 2020. Learning deformable tetrahedral meshes for 3D reconstruction. In Advances in Neural Information Processing Systems 33 (2020), 9936\u20139947.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_10_2","unstructured":"Xiao Han Yukang Cao Kai Han Xiatian Zhu Jiankang Deng Yi-Zhe Song Tao Xiang and Kwan-Yee K. Wong. 2023. 
HeadSculpt: Crafting 3D head avatars with text. arXiv:2306.03038. Retrieved from https:\/\/arxiv.org\/abs\/2306.03038"},{"key":"e_1_3_2_11_2","first-page":"22856","article-title":"Shape, light, and material decomposition from images using Monte Carlo rendering and denoising","volume":"35","author":"Hasselgren Jon","year":"2022","unstructured":"Jon Hasselgren, Nikolai Hofmann, and Jacob Munkberg. 2022. Shape, light, and material decomposition from images using Monte Carlo rendering and denoising. In Advances in Neural Information Processing Systems, Vol. 35, 22856\u201322869.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_12_2","article-title":"GANs trained by a two time-scale update rule converge to a local Nash equilibrium","volume":"30","author":"Heusel Martin","year":"2017","unstructured":"Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. 2017. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In Advances in Neural Information Processing Systems, Vol. 30.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_13_2","first-page":"6840","article-title":"Denoising diffusion probabilistic models","volume":"33","author":"Ho Jonathan","year":"2020","unstructured":"Jonathan Ho, Ajay Jain, and Pieter Abbeel. 2020. Denoising diffusion probabilistic models. In Advances in Neural Information Processing Systems, Vol. 33, 6840\u20136851.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_14_2","unstructured":"Fangzhou Hong Mingyuan Zhang Liang Pan Zhongang Cai Lei Yang and Ziwei Liu. 2022. AvatarCLIP: Zero-shot text-driven generation and animation of 3D avatars. arXiv:2205.08535. Retrieved from https:\/\/arxiv.org\/abs\/2205.08535"},{"key":"e_1_3_2_15_2","doi-asserted-by":"crossref","unstructured":"Shuo Huang Zongxin Yang Liangting Li Yi Yang and Jia Jia. 2023. 
AvatarFusion: Zero-shot generation of clothing-decoupled 3D avatars using 2D diffusion. arXiv:2307.06526. Retrieved from https:\/\/arxiv.org\/abs\/2307.06526","DOI":"10.1145\/3581783.3612022"},{"key":"e_1_3_2_16_2","unstructured":"Yukun Huang Jianan Wang Ailing Zeng He Cao Xianbiao Qi Yukai Shi Zheng-Jun Zha and Lei Zhang. 2023. DreamWaltz: Make a scene with complex 3D animatable avatars. arXiv:2305.12529. Retrieved from https:\/\/arxiv.org\/abs\/2305.12529"},{"key":"e_1_3_2_17_2","unstructured":"Yangyi Huang Hongwei Yi Yuliang Xiu Tingting Liao Jiaxiang Tang Deng Cai and Justus Thies. 2023. TeCH: Text-guided reconstruction of lifelike clothed humans. arXiv:2308.08545. Retrieved from https:\/\/arxiv.org\/abs\/2308.08545"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.00094"},{"key":"e_1_3_2_19_2","doi-asserted-by":"crossref","unstructured":"Ruixiang Jiang Can Wang Jingbo Zhang Menglei Chai Mingming He Dongdong Chen and Jing Liao. 2023. AvatarCraft: Transforming text into neural human avatars with parameterized shape and pose control. arXiv:2303.17606. Retrieved from https:\/\/arxiv.org\/abs\/2303.17606","DOI":"10.1109\/ICCV51070.2023.01322"},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1145\/15922.15902"},{"issue":"4","key":"e_1_3_2_21_2","first-page":"139","article-title":"3D Gaussian splatting for real-time radiance field rendering","volume":"42","author":"Kerbl Bernhard","year":"2023","unstructured":"Bernhard Kerbl, Georgios Kopanas, Thomas Leimk\u00fchler, and George Drettakis. 2023. 3D Gaussian splatting for real-time radiance field rendering. ACM Trans. Graph. 42, 4 (2023), 139:1\u2013139:14.","journal-title":"ACM Trans. 
Graph"},{"key":"e_1_3_2_22_2","first-page":"36652","article-title":"Pick-a-Pic: An open dataset of user preferences for text-to-image generation","volume":"36","author":"Kirstain Yuval","year":"2023","unstructured":"Yuval Kirstain, Adam Polyak, Uriel Singer, Shahbuland Matiana, Joe Penna, and Omer Levy. 2023. Pick-a-Pic: An open dataset of user preferences for text-to-image generation. In Advances in Neural Information Processing Systems, Vol. 36, 36652\u201336663.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_23_2","unstructured":"Nikos Kolotouros Thiemo Alldieck Andrei Zanfir Eduard Gabriel Bazavan Mihai Fieraru and Cristian Sminchisescu. 2023. DreamHuman: Animatable 3D avatars from text. arXiv:2306.09329. Retrieved from https:\/\/arxiv.org\/abs\/2306.09329"},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1145\/3130800.3130813"},{"key":"e_1_3_2_25_2","unstructured":"Tingting Liao Hongwei Yi Yuliang Xiu Jiaxiang Tang Yangyi Huang Justus Thies and Michael J. Black. 2023. TADA! Text to animatable digital avatars. arXiv:2308.10899. Retrieved from https:\/\/arxiv.org\/abs\/2308.10899"},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.00037"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00635"},{"key":"e_1_3_2_28_2","first-page":"851","volume-title":"Proceedings of the Seminal Graphics Papers: Pushing the Boundaries","volume":"2","author":"Loper Matthew","year":"2023","unstructured":"Matthew Loper, Naureen Mahmood, Javier Romero, Gerard Pons-Moll, and Michael J. Black. 2023. SMPL: A skinned multi-person linear model. In Proceedings of the Seminal Graphics Papers: Pushing the Boundaries, Vol. 
2, 851\u2013866."},{"key":"e_1_3_2_29_2","doi-asserted-by":"crossref","first-page":"347","DOI":"10.1145\/280811.281026","volume-title":"Proceedings of the Seminal Graphics: Pioneering Efforts that Shaped the Field","author":"Lorensen William E.","year":"1998","unstructured":"William E. Lorensen and Harvey E. Cline. 1998. Marching cubes: A high resolution 3D surface construction algorithm. In Proceedings of the Seminal Graphics: Pioneering Efforts that Shaped the Field, 347\u2013353."},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1145\/3687475"},{"issue":"1","key":"e_1_3_2_31_2","doi-asserted-by":"crossref","first-page":"99","DOI":"10.1145\/3503250","article-title":"NeRF: Representing scenes as neural radiance fields for view synthesis","volume":"65","author":"Mildenhall Ben","year":"2021","unstructured":"Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. 2021. NeRF: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM 65, 1 (2021), 99\u2013106.","journal-title":"Communications of the ACM"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1145\/3550469.3555392"},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.1145\/3528223.3530127"},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.00810"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.01123"},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.1145\/280811.280980"},{"key":"e_1_3_2_37_2","unstructured":"Ben Poole Ajay Jain Jonathan T. Barron and Ben Mildenhall. 2022. DreamFusion: Text-to-3D using 2D diffusion. arXiv:2209.14988. Retrieved from https:\/\/arxiv.org\/abs\/2209.14988"},{"key":"e_1_3_2_38_2","unstructured":"Jiaxiang Tang Jiawei Ren Hang Zhou Ziwei Liu and Gang Zeng. 2023. DreamGaussian: Generative Gaussian splatting for efficient 3D content creation. arXiv:2309.16653. 
Retrieved from https:\/\/arxiv.org\/abs\/2309.16653"},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00946"},{"key":"e_1_3_2_40_2","first-page":"8748","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Radford Alec","year":"2021","unstructured":"Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. 2021. Learning transferable visual models from natural language supervision. In Proceedings of the International Conference on Machine Learning, 8748\u20138763."},{"key":"e_1_3_2_41_2","first-page":"8821","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Ramesh Aditya","year":"2021","unstructured":"Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, and Ilya Sutskever. 2021. Zero-shot text-to-image generation. In Proceedings of the International Conference on Machine Learning, 8821\u20138831."},{"key":"e_1_3_2_42_2","doi-asserted-by":"crossref","unstructured":"Elad Richardson Gal Metzer Yuval Alaluf Raja Giryes and Daniel Cohen-Or. 2023. Texture: Text-guided texturing of 3D shapes. arXiv:2302.01721. Retrieved from https:\/\/arxiv.org\/abs\/2302.01721","DOI":"10.1145\/3588432.3591503"},{"key":"e_1_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01042"},{"key":"e_1_3_2_44_2","unstructured":"Javier Romero Dimitrios Tzionas and Michael J. Black. 2022. Embodied hands: Modeling and capturing hands and bodies together. arXiv:2201.02610. 
Retrieved from https:\/\/arxiv.org\/abs\/2201.02610"},{"issue":"3","key":"e_1_3_2_45_2","doi-asserted-by":"crossref","first-page":"309","DOI":"10.1145\/1015706.1015720","article-title":"\u201cGrabCut\u201d interactive foreground extraction using iterated graph cuts","volume":"23","author":"Rother Carsten","year":"2004","unstructured":"Carsten Rother, Vladimir Kolmogorov, and Andrew Blake. 2004. \u201cGrabCut\u201d interactive foreground extraction using iterated graph cuts. ACM Trans. Graph. 23, 3 (2004), 309\u2013314.","journal-title":"ACM Trans. Graph"},{"key":"e_1_3_2_46_2","first-page":"36479","article-title":"Photorealistic text-to-image diffusion models with deep language understanding","volume":"35","author":"Saharia Chitwan","year":"2022","unstructured":"Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily L. Denton, Kamyar Ghasemipour, Raphael Gontijo Lopes, Burcu Karagol Ayan, Tim Salimans, et al. 2022. Photorealistic text-to-image diffusion models with deep language understanding. In Advances in Neural Information Processing Systems, Vol. 35, 36479\u201336494.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_47_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01805"},{"key":"e_1_3_2_48_2","first-page":"6087","article-title":"Deep marching tetrahedra: A hybrid representation for high-resolution 3D shape synthesis","volume":"34","author":"Shen Tianchang","year":"2021","unstructured":"Tianchang Shen, Jun Gao, Kangxue Yin, Ming-Yu Liu, and Sanja Fidler. 2021. Deep marching tetrahedra: A hybrid representation for high-resolution 3D shape synthesis. In Advances in Neural Information Processing Systems, Vol. 
34, 6087\u20136101.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_49_2","first-page":"4811","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","volume":"38","author":"Shen Xiaolong","year":"2024","unstructured":"Xiaolong Shen, Jianxin Ma, Chang Zhou, and Zongxin Yang. 2024. Controllable 3D face generation with conditional style code diffusion. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 38, 4811\u20134819."},{"key":"e_1_3_2_50_2","first-page":"72","volume-title":"Proceedings of the European Conference on Computer Vision","author":"Siddiqui Yawar","year":"2022","unstructured":"Yawar Siddiqui, Justus Thies, Fangchang Ma, Qi Shan, Matthias Nie\u00dfner, and Angela Dai. 2022. Texturify: Generating textures on 3D shape surfaces. In Proceedings of the European Conference on Computer Vision. Springer, 72\u201388."},{"key":"e_1_3_2_51_2","doi-asserted-by":"crossref","first-page":"835","DOI":"10.1145\/1179352.1141964","volume-title":"Proceedings of the ACM SIGGRAPH 2006 Papers","author":"Snavely Noah","year":"2006","unstructured":"Noah Snavely, Steven M. Seitz, and Richard Szeliski. 2006. Photo tourism: Exploring photo collections in 3D. In Proceedings of the ACM SIGGRAPH 2006 Papers, 835\u2013846."},{"key":"e_1_3_2_52_2","unstructured":"Jiaming Song Chenlin Meng and Stefano Ermon. 2020. Denoising diffusion implicit models. arXiv:2010.02502. Retrieved from https:\/\/arxiv.org\/abs\/2010.02502"},{"key":"e_1_3_2_53_2","doi-asserted-by":"publisher","DOI":"10.1145\/3673901"},{"key":"e_1_3_2_54_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.01214"},{"key":"e_1_3_2_55_2","unstructured":"Peng Wang Lingjie Liu Yuan Liu Christian Theobalt Taku Komura and Wenping Wang. 2021. NeuS: Learning neural implicit surfaces by volume rendering for multi-view reconstruction. arXiv:2106.10689. 
Retrieved from https:\/\/arxiv.org\/abs\/2106.10689"},{"issue":"1","key":"e_1_3_2_56_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1631\/FITEE.2400250","article-title":"Visual knowledge in the big model era: Retrospect and prospect","volume":"26","author":"Wang Wenguan","year":"2025","unstructured":"Wenguan Wang, Yi Yang, and Yunhe Pan. 2025. Visual knowledge in the big model era: Retrospect and prospect. Front. Inform. Tech. El. Eng. 26, 1 (2025), 1\u201319.","journal-title":"Front. Inform. Tech. El. Eng"},{"key":"e_1_3_2_57_2","unstructured":"Zhengyi Wang Cheng Lu Yikai Wang Fan Bao Chongxuan Li Hang Su and Jun Zhu. 2023. ProlificDreamer: High-fidelity and diverse text-to-3D generation with variational score distillation. arXiv:2305.16213. Retrieved from https:\/\/arxiv.org\/abs\/2305.16213"},{"key":"e_1_3_2_58_2","first-page":"314","volume-title":"Proceedings of the Computer Vision and Pattern Recognition Conference","author":"Xu Yuanyou","year":"2025","unstructured":"Yuanyou Xu, Zongxin Yang, and Yi Yang. 2025. SKDream: Controllable multi-view and 3D generation with arbitrary skeletons. In Proceedings of the Computer Vision and Pattern Recognition Conference, 314\u2013325."},{"key":"e_1_3_2_59_2","first-page":"162","volume-title":"Proceedings of the European Conference on Computer Vision","author":"Yang Haibo","year":"2024","unstructured":"Haibo Yang, Yang Chen, Yingwei Pan, Ting Yao, Zhineng Chen, Zuxuan Wu, Yu-Gang Jiang, and Tao Mei. 2024. DreamMesh: Jointly manipulating and texturing triangle meshes for text-to-3D generation. In Proceedings of the European Conference on Computer Vision. 
Springer, 162\u2013178."},{"issue":"12","key":"e_1_3_2_60_2","doi-asserted-by":"crossref","first-page":"1551","DOI":"10.1631\/FITEE.2100463","article-title":"Multiple knowledge representation for big data artificial intelligence: Framework, applications, and case studies","volume":"22","author":"Yang Yi","year":"2021","unstructured":"Yi Yang, Yueting Zhuang, and Yunhe Pan. 2021. Multiple knowledge representation for big data artificial intelligence: Framework, applications, and case studies. Front. Inform. Tech. El. Eng. 22, 12 (2021), 1551\u20131558.","journal-title":"Front. Inform. Tech. El. Eng"},{"issue":"9","key":"e_1_3_2_61_2","doi-asserted-by":"crossref","first-page":"6247","DOI":"10.1109\/TPAMI.2024.3383592","article-title":"Scalable video object segmentation with identification mechanism","volume":"46","author":"Yang Zongxin","year":"2024","unstructured":"Zongxin Yang, Jiaxu Miao, Yunchao Wei, Wenguan Wang, Xiaohan Wang, and Yi Yang. 2024. Scalable video object segmentation with identification mechanism. IEEE Trans. Pattern Anal. Mach. Intell. 46, 9 (2024), 6247\u20136262.","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"e_1_3_2_62_2","doi-asserted-by":"publisher","DOI":"10.1145\/3687909"},{"key":"e_1_3_2_63_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00407"},{"key":"e_1_3_2_64_2","unstructured":"Yifei Zeng Yuanxun Lu Xinya Ji Yao Yao Hao Zhu and Xun Cao. 2023. AvatarBooth: High-quality and customizable 3D human avatar generation. arXiv:2306.09864. Retrieved from https:\/\/arxiv.org\/abs\/2306.09864"},{"key":"e_1_3_2_65_2","doi-asserted-by":"crossref","unstructured":"Huichao Zhang Bowen Chen Hao Yang Liao Qu Xu Wang Li Chen Chao Long Feida Zhu Kang Du and Min Zheng. 2023. AvatarVerse: High-quality & stable 3D avatar creation from text and pose. arXiv:2308.03610. 
Retrieved from https:\/\/arxiv.org\/abs\/2308.03610","DOI":"10.1609\/aaai.v38i7.28540"},{"key":"e_1_3_2_66_2","unstructured":"Hao Zhang Yao Feng Peter Kulits Yandong Wen Justus Thies and Michael J. Black. 2023. Text-guided generation and editing of compositional 3D avatars. arXiv:2309.07125. Retrieved from https:\/\/arxiv.org\/abs\/2309.07125"},{"key":"e_1_3_2_67_2","doi-asserted-by":"crossref","unstructured":"Longwen Zhang Qiwei Qiu Hongyang Lin Qixuan Zhang Cheng Shi Wei Yang Ye Shi Sibei Yang Lan Xu and Jingyi Yu. 2023. DreamFace: Progressive generation of animatable 3D faces under text guidance. arXiv:2304.03117. Retrieved from https:\/\/arxiv.org\/abs\/2304.03117","DOI":"10.1145\/3592094"},{"key":"e_1_3_2_68_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00355"},{"key":"e_1_3_2_69_2","first-page":"7818","article-title":"Global-correlated 3D-decoupling transformer for clothed avatar reconstruction","volume":"36","author":"Zhang Zechuan","year":"2023","unstructured":"Zechuan Zhang, Li Sun, Zongxin Yang, Ling Chen, and Yi Yang. 2023. Global-correlated 3D-decoupling transformer for clothed avatar reconstruction. In Advances in Neural Information Processing Systems, Vol. 36, 7818\u20137830.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_70_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00948"},{"key":"e_1_3_2_71_2","doi-asserted-by":"publisher","unstructured":"Dewei Zhou You Li Fan Ma Zongxin Yang and Yi Yang. 2025. MIGC++: Advanced multi-instance generation controller for image synthesis. IEEE Trans. Pattern Anal. Mach. Intell. 47 3 (March 2025) 1714\u20131728. DOI: 10.1109\/TPAMI.2024.3510752","DOI":"10.1109\/TPAMI.2024.3510752"},{"key":"e_1_3_2_72_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Zhou Dewei","year":"2025","unstructured":"Dewei Zhou, Ji Xie, Zongxin Yang, and Yi Yang. 2025. 
3DIS: Depth-driven decoupled instance synthesis for text-to-image generation. In Proceedings of the International Conference on Learning Representations."},{"key":"e_1_3_2_73_2","first-page":"145","volume-title":"Proceedings of the European Conference on Computer Vision","author":"Zhou Zhenglin","year":"2024","unstructured":"Zhenglin Zhou, Fan Ma, Hehe Fan, Zongxin Yang, and Yi Yang. 2024. HeadStudio: Text to animatable head avatars with 3D Gaussian splatting. In Proceedings of the European Conference on Computer Vision. Springer, 145\u2013163."}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3774422","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,2,10]],"date-time":"2026-02-10T12:14:34Z","timestamp":1770725674000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3774422"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,2,10]]},"references-count":72,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2026,2,28]]}},"alternative-id":["10.1145\/3774422"],"URL":"https:\/\/doi.org\/10.1145\/3774422","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"value":"1551-6857","type":"print"},{"value":"1551-6865","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,2,10]]},"assertion":[{"value":"2024-12-29","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-09-23","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2026-02-10","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}