{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,2]],"date-time":"2026-05-02T15:02:21Z","timestamp":1777734141636,"version":"3.51.4"},"reference-count":74,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2023,7,26]],"date-time":"2023-07-26T00:00:00Z","timestamp":1690329600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100012166","name":"National Key R&D Program of China","doi-asserted-by":"crossref","award":["2022YFF0902301"],"award-info":[{"award-number":["2022YFF0902301"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"crossref"}]},{"name":"NSFC programs","award":["61976138"],"award-info":[{"award-number":["61976138"]}]},{"name":"NSFC programs","award":["61977047"],"award-info":[{"award-number":["61977047"]}]},{"DOI":"10.13039\/501100003399","name":"STCSM","doi-asserted-by":"crossref","award":["2015F0203-000-06"],"award-info":[{"award-number":["2015F0203-000-06"]}],"id":[{"id":"10.13039\/501100003399","id-type":"DOI","asserted-by":"crossref"}]},{"name":"SHMEC","award":["2019-01-07-00-01-E00003"],"award-info":[{"award-number":["2019-01-07-00-01-E00003"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Graph."],"published-print":{"date-parts":[[2023,8]]},"abstract":"<jats:p>Emerging Metaverse applications demand accessible, accurate and easy-to-use tools for 3D digital human creations in order to depict different cultures and societies as if in the physical world. Recent large-scale vision-language advances pave the way for novices to conveniently customize 3D content. However, the generated CG-friendly assets still cannot represent the desired facial traits for human characteristics. 
In this paper, we present DreamFace, a progressive scheme to generate personalized 3D faces under text guidance. It enables layman users to naturally customize 3D facial assets that are compatible with CG pipelines, with desired shapes, textures and fine-grained animation capabilities. From a text input to describe the facial traits, we first introduce a coarse-to-fine scheme to generate the neutral facial geometry with a unified topology. We employ a selection strategy in the CLIP embedding space to generate coarse geometry, and subsequently optimize both the detailed displacements and normals using Score Distillation Sampling (SDS) from the generic Latent Diffusion Model (LDM). Then, for neutral appearance generation, we introduce a dual-path mechanism, which combines the generic LDM with a novel texture LDM to ensure both the diversity and textural specification in the UV space. We also employ a two-stage optimization to perform SDS in both the latent and image spaces to significantly provide compact priors for fine-grained synthesis. It also enables learning the mapping from the compact latent space into physically-based textures (diffuse albedo, specular intensity, normal maps, etc.). Our generated neutral assets naturally support blendshapes-based facial animations, thanks to the unified geometric topology. We further improve the animation ability with personalized deformation characteristics. To this end, we learn the universal expression prior in a latent space with neutral asset conditioning using the cross-identity hypernetwork; we subsequently train a neural facial tracker from video input space into the pre-trained expression space for personalized fine-grained animation. Extensive qualitative and quantitative experiments validate the effectiveness and generalizability of DreamFace. 
Notably, DreamFace can generate realistic 3D facial assets with physically-based rendering quality and rich animation ability from video footage, even for fashion icons or exotic characters in cartoons and fiction movies.<\/jats:p>","DOI":"10.1145\/3592094","type":"journal-article","created":{"date-parts":[[2023,7,26]],"date-time":"2023-07-26T15:47:45Z","timestamp":1690386465000},"page":"1-16","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":60,"title":["DreamFace: Progressive Generation of Animatable 3D Faces under Text Guidance"],"prefix":"10.1145","volume":"42","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8508-3359","authenticated-orcid":false,"given":"Longwen","family":"Zhang","sequence":"first","affiliation":[{"name":"ShanghaiTech University, Shanghai, China"},{"name":"Deemos Technology, Shanghai, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-0213-4744","authenticated-orcid":false,"given":"Qiwei","family":"Qiu","sequence":"additional","affiliation":[{"name":"ShanghaiTech University, Shanghai, China"},{"name":"Deemos Technology, Shanghai, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6581-7033","authenticated-orcid":false,"given":"Hongyang","family":"Lin","sequence":"additional","affiliation":[{"name":"ShanghaiTech University, Shanghai, China"},{"name":"Deemos Technology, Shanghai, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4837-7152","authenticated-orcid":false,"given":"Qixuan","family":"Zhang","sequence":"additional","affiliation":[{"name":"ShanghaiTech University, Shanghai, China"},{"name":"Deemos Technology, Shanghai, 
China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6942-8481","authenticated-orcid":false,"given":"Cheng","family":"Shi","sequence":"additional","affiliation":[{"name":"ShanghaiTech University, Shanghai, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1189-1254","authenticated-orcid":false,"given":"Wei","family":"Yang","sequence":"additional","affiliation":[{"name":"Huazhong University of Science and Technology, Wuhan, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2302-4980","authenticated-orcid":false,"given":"Ye","family":"Shi","sequence":"additional","affiliation":[{"name":"ShanghaiTech University, Shanghai, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8144-7351","authenticated-orcid":false,"given":"Sibei","family":"Yang","sequence":"additional","affiliation":[{"name":"ShanghaiTech University, Shanghai, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8807-7787","authenticated-orcid":false,"given":"Lan","family":"Xu","sequence":"additional","affiliation":[{"name":"ShanghaiTech University, Shanghai, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9198-6853","authenticated-orcid":false,"given":"Jingyi","family":"Yu","sequence":"additional","affiliation":[{"name":"ShanghaiTech University, Shanghai, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2023,7,26]]},"reference":[{"key":"e_1_2_2_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00951"},{"key":"e_1_2_2_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/1667239.1667251"},{"key":"e_1_2_2_3_1","doi-asserted-by":"crossref","unstructured":"Shivangi Aneja Justus Thies Angela Dai and Matthias Nie\u00dfner. 2022. 
ClipFace: Text-guided Editing of Textured 3D Morphable Models. In ArXiv preprint arXiv:2212.01406.","DOI":"10.1145\/3588432.3591566"},{"key":"e_1_2_2_4_1","unstructured":"Apple. 2023. ARKit - Face Tracking. https:\/\/developer.apple.com\/documentation\/arkit\/arfaceanchor."},{"key":"e_1_2_2_5_1","unstructured":"AUTOMATIC1111. 2022. stable-diffusion-webui. https:\/\/github.com\/AUTOMATIC1111\/stable-diffusion-webui."},{"key":"e_1_2_2_6_1","volume-title":"Blended Latent Diffusion. arXiv preprint arXiv:2206.02779","author":"Avrahami Omri","year":"2022","unstructured":"Omri Avrahami, Ohad Fried, and Dani Lischinski. 2022. Blended Latent Diffusion. arXiv preprint arXiv:2206.02779 (2022)."},{"key":"e_1_2_2_7_1","volume-title":"High-Fidelity 3D Digital Human Head Creation from RGB-D Selfies. ACM Transactions on Graphics","author":"Bao Linchao","year":"2021","unstructured":"Linchao Bao, Xiangkai Lin, Yajing Chen, Haoxian Zhang, Sheng Wang, Xuefei Zhe, Di Kang, Haozhi Huang, Xinwei Jiang, Jue Wang, Dong Yu, and Zhengyou Zhang. 2021. High-Fidelity 3D Digital Human Head Creation from RGB-D Selfies. ACM Transactions on Graphics (2021)."},{"key":"e_1_2_2_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/311535.311556"},{"key":"e_1_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/3528223.3530143"},{"key":"e_1_2_2_10_1","doi-asserted-by":"crossref","first-page":"413","DOI":"10.1109\/TVCG.2013.249","article-title":"FaceWarehouse: A 3d facial expression database for visual computing","volume":"20","author":"Cao Chen","year":"2013","unstructured":"Chen Cao, Yanlin Weng, Shun Zhou, Yiying Tong, and Kun Zhou. 2013. FaceWarehouse: A 3d facial expression database for visual computing. 
IEEE Transactions on Visualization and Computer Graphics 20, 3 (2013), 413--425.","journal-title":"IEEE Transactions on Visualization and Computer Graphics"},{"key":"e_1_2_2_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01565"},{"key":"e_1_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/3DV50981.2020.00044"},{"key":"e_1_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.dsp.2009.10.008"},{"key":"e_1_2_2_14_1","volume-title":"TANGO: Text-driven Photorealistic and Robust 3D Stylization via Lighting Decomposition. In Advances in Neural Information Processing Systems (NeurIPS).","author":"Chen Yongwei","year":"2022","unstructured":"Yongwei Chen, Rui Chen, Jiabao Lei, Yabin Zhang, and Kui Jia. 2022. TANGO: Text-driven Photorealistic and Robust 3D Stylization via Lighting Decomposition. In Advances in Neural Information Processing Systems (NeurIPS)."},{"key":"e_1_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-19836-6_6"},{"key":"e_1_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/344779.344855"},{"key":"e_1_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01041"},{"key":"e_1_2_2_18_1","doi-asserted-by":"publisher","unstructured":"Yao Feng Haiwen Feng Michael J. Black and Timo Bolkart. 2021. Learning an Animatable Detailed 3D Face Model from In-The-Wild Images. ACM Transactions on Graphics (Proc. SIGGRAPH) 40 8. 10.1145\/3450626.3459936","DOI":"10.1145\/3450626.3459936"},{"key":"e_1_2_2_19_1","volume-title":"Generation High resolution 3D model from natural language by Generative Adversarial Network. arXiv preprint arXiv:1901.07165","author":"Fukamizu Kentaro","year":"2019","unstructured":"Kentaro Fukamizu, Masaaki Kondo, and Ryuichi Sakamoto. 2019. Generation High resolution 3D model from natural language by Generative Adversarial Network. 
arXiv preprint arXiv:1901.07165 (2019)."},{"key":"e_1_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/2638549"},{"key":"e_1_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/3528223.3530164"},{"key":"e_1_2_2_22_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58526-6_25"},{"key":"e_1_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00125"},{"key":"e_1_2_2_24_1","volume-title":"Fast-GANFIT: Generative Adversarial Network for High Fidelity 3D Face Reconstruction","author":"Gecer Baris","year":"2021","unstructured":"Baris Gecer, Stylianos Ploumpis, Irene Kotsia, and Stefanos P Zafeiriou. 2021. Fast-GANFIT: Generative Adversarial Network for High Fidelity 3D Face Reconstruction. IEEE Transactions on Pattern Analysis and Machine Intelligence (2021)."},{"key":"e_1_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/2024156.2024163"},{"key":"e_1_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/3422622"},{"key":"e_1_2_2_27_1","first-page":"6840","article-title":"Denoising diffusion probabilistic models","volume":"33","author":"Ho Jonathan","year":"2020","unstructured":"Jonathan Ho, Ajay Jain, and Pieter Abbeel. 2020. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems 33 (2020), 6840--6851.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/3528223.3530094"},{"key":"e_1_2_2_29_1","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 20374--20384","author":"Hong Yang","year":"2022","unstructured":"Yang Hong, Bo Peng, Haiyao Xiao, Ligang Liu, and Juyong Zhang. 2022a. Headnerf: A real-time nerf-based parametric head model. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 20374--20384."},{"key":"e_1_2_2_30_1","volume-title":"Image-to-Image Translation with Conditional Adversarial Networks. 
CVPR","author":"Isola Phillip","year":"2017","unstructured":"Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A Efros. 2017. Image-to-Image Translation with Conditional Adversarial Networks. CVPR (2017)."},{"key":"e_1_2_2_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.00094"},{"key":"e_1_2_2_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00813"},{"key":"e_1_2_2_33_1","volume-title":"CLIP-Mesh: Generating textured meshes from text using pretrained image-text models. (December","author":"Khalid Nasir Mohammad","year":"2022","unstructured":"Nasir Mohammad Khalid, Tianhao Xie, Eugene Belilovsky, and Popa Tiberiu. 2022. CLIP-Mesh: Generating textured meshes from text using pretrained image-text models. (December 2022)."},{"key":"e_1_2_2_34_1","volume-title":"Auto-Encoding Variational Bayes. In 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, April 14--16, 2014, Conference Track Proceedings.","author":"Diederik","unstructured":"Diederik P. Kingma and Max Welling. 2014. Auto-Encoding Variational Bayes. In 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, April 14--16, 2014, Conference Track Proceedings."},{"key":"e_1_2_2_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/3414685.3417861"},{"key":"e_1_2_2_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/3099564.3099581"},{"key":"e_1_2_2_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00084"},{"key":"e_1_2_2_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2021.3125598"},{"key":"e_1_2_2_39_1","volume-title":"Styleuv: Diverse and high-fidelity uv map generative model. arXiv preprint arXiv:2011.12893","author":"Lee Myunggi","year":"2020","unstructured":"Myunggi Lee, Wonwoong Cho, Moonheum Kim, David Inouye, and Nojun Kwak. 2020. Styleuv: Diverse and high-fidelity uv map generative model. 
arXiv preprint arXiv:2011.12893 (2020)."},{"key":"e_1_2_2_40_1","doi-asserted-by":"crossref","unstructured":"J. Li Z. Kuang Y. Zhao M. He and H. Li. 2020b. Dynamic Facial Asset and Rig Generation from a Single Scan. ACM Transactions on Graphics (TOG) (2020).","DOI":"10.1145\/3414685.3417817"},{"key":"e_1_2_2_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00347"},{"key":"e_1_2_2_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/3130800.3130813"},{"key":"e_1_2_2_43_1","volume-title":"Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition. 5871--5880","author":"Liao Yiyi","year":"2020","unstructured":"Yiyi Liao, Katja Schwarz, Lars Mescheder, and Andreas Geiger. 2020. Towards unsupervised learning of generative models for 3d controllable image synthesis. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition. 5871--5880."},{"key":"e_1_2_2_44_1","volume-title":"Magic3D: High-Resolution Text-to-3D Content Creation. arXiv preprint arXiv:2211.10440","author":"Lin Chen-Hsuan","year":"2022","unstructured":"Chen-Hsuan Lin, Jun Gao, Luming Tang, Towaki Takikawa, Xiaohui Zeng, Xun Huang, Karsten Kreis, Sanja Fidler, Ming-Yu Liu, and Tsung-Yi Lin. 2022. Magic3D: High-Resolution Text-to-3D Content Creation. arXiv preprint arXiv:2211.10440 (2022)."},{"key":"e_1_2_2_45_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-19784-0_7"},{"key":"e_1_2_2_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/3197517.3201401"},{"key":"e_1_2_2_47_1","first-page":"10","article-title":"Rapid Acquisition of Specular and Diffuse Normal Maps from Polarized Spherical Gradient Illumination","volume":"2007","author":"Ma Wan-Chun","year":"2007","unstructured":"Wan-Chun Ma, Tim Hawkins, Pieter Peers, Charles-Felix Chabert, Malte Weiss, Paul E Debevec, et al. 2007. Rapid Acquisition of Specular and Diffuse Normal Maps from Polarized Spherical Gradient Illumination. 
Rendering Techniques 2007, 9 (2007), 10.","journal-title":"Rendering Techniques"},{"key":"e_1_2_2_48_1","unstructured":"Elman Mansimov Emilio Parisotto Jimmy Ba and Ruslan Salakhutdinov. 2016. Generating Images from Captions with Attention. In ICLR."},{"key":"e_1_2_2_49_1","volume-title":"Latent-NeRF for Shape-Guided Generation of 3D Shapes and Textures. arXiv e-prints","author":"Metzer Gal","year":"2022","unstructured":"Gal Metzer, Elad Richardson, Or Patashnik, Raja Giryes, and Daniel Cohen-Or. 2022. Latent-NeRF for Shape-Guided Generation of 3D Shapes and Textures. arXiv e-prints (2022), arXiv-2211."},{"key":"e_1_2_2_50_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01313"},{"key":"e_1_2_2_51_1","doi-asserted-by":"publisher","DOI":"10.1145\/3503250"},{"key":"e_1_2_2_52_1","doi-asserted-by":"publisher","DOI":"10.1145\/3478513.3480515"},{"key":"e_1_2_2_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/3528223.3530127"},{"key":"e_1_2_2_54_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.00810"},{"key":"e_1_2_2_55_1","volume-title":"Glide: Towards photorealistic image generation and editing with text-guided diffusion models. arXiv preprint arXiv:2112.10741","author":"Nichol Alex","year":"2021","unstructured":"Alex Nichol, Prafulla Dhariwal, Aditya Ramesh, Pranav Shyam, Pamela Mishkin, Bob McGrew, Ilya Sutskever, and Mark Chen. 2021. Glide: Towards photorealistic image generation and editing with text-guided diffusion models. arXiv preprint arXiv:2112.10741 (2021)."},{"key":"e_1_2_2_56_1","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 13503--13513","author":"Or-El Roy","year":"2022","unstructured":"Roy Or-El, Xuan Luo, Mengyi Shan, Eli Shechtman, Jeong Joon Park, and Ira Kemelmacher-Shlizerman. 2022. StyleSDF: High-Resolution 3D-Consistent Image and Geometry Generation. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 
13503--13513."},{"key":"e_1_2_2_57_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00209"},{"key":"e_1_2_2_58_1","volume-title":"Dreamfusion: Text-to-3d using 2d diffusion. arXiv preprint arXiv:2209.14988","author":"Poole Ben","year":"2022","unstructured":"Ben Poole, Ajay Jain, Jonathan T Barron, and Ben Mildenhall. 2022. Dreamfusion: Text-to-3d using 2d diffusion. arXiv preprint arXiv:2209.14988 (2022)."},{"key":"e_1_2_2_59_1","volume-title":"Proceedings of the 38th International Conference on Machine Learning (Proceedings of Machine Learning Research","volume":"8763","author":"Radford Alec","year":"2021","unstructured":"Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. 2021. Learning Transferable Visual Models From Natural Language Supervision. In Proceedings of the 38th International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 139), Marina Meila and Tong Zhang (Eds.). PMLR, 8748--8763. https:\/\/proceedings.mlr.press\/v139\/radford21a.html"},{"key":"e_1_2_2_60_1","volume-title":"Hierarchical text-conditional image generation with clip latents. arXiv preprint arXiv:2204.06125","author":"Ramesh Aditya","year":"2022","unstructured":"Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, and Mark Chen. 2022. Hierarchical text-conditional image generation with clip latents. arXiv preprint arXiv:2204.06125 (2022)."},{"key":"e_1_2_2_61_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01219-9_43"},{"key":"e_1_2_2_62_1","volume-title":"International conference on machine learning. PMLR, 1060--1069","author":"Reed Scott","year":"2016","unstructured":"Scott Reed, Zeynep Akata, Xinchen Yan, Lajanugen Logeswaran, Bernt Schiele, and Honglak Lee. 2016. Generative adversarial text to image synthesis. In International conference on machine learning. 
PMLR, 1060--1069."},{"key":"e_1_2_2_63_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01042"},{"key":"e_1_2_2_64_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2205.11487"},{"key":"e_1_2_2_65_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01805"},{"key":"e_1_2_2_66_1","volume-title":"International Conference on Machine Learning. PMLR, 2256--2265","author":"Sohl-Dickstein Jascha","year":"2015","unstructured":"Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. 2015. Deep unsupervised learning using nonequilibrium thermodynamics. In International Conference on Machine Learning. PMLR, 2256--2265."},{"key":"e_1_2_2_67_1","doi-asserted-by":"publisher","DOI":"10.1109\/CSITSS54238.2021.9683460"},{"key":"e_1_2_2_68_1","volume-title":"Stable-dreamfusion: Text-to-3D with Stable-diffusion. https:\/\/github.com\/ashawkey\/stable-dreamfusion.","author":"Tang Jiaxiang","year":"2022","unstructured":"Jiaxiang Tang. 2022. Stable-dreamfusion: Text-to-3D with Stable-diffusion. https:\/\/github.com\/ashawkey\/stable-dreamfusion."},{"key":"e_1_2_2_69_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00618"},{"key":"e_1_2_2_70_1","unstructured":"Kevin Turner. 2022. Decoding Latents to RGB Without Upscaling. https:\/\/discuss.huggingface.co\/t\/decoding-latents-to-rgb-without-upscaling\/23204\/2."},{"key":"e_1_2_2_71_1","first-page":"11287","article-title":"Score-based generative modeling in latent space","volume":"34","author":"Vahdat Arash","year":"2021","unstructured":"Arash Vahdat, Karsten Kreis, and Jan Kautz. 2021. Score-based generative modeling in latent space. 
Advances in Neural Information Processing Systems 34 (2021), 11287--11302.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_2_2_72_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00917"},{"key":"e_1_2_2_73_1","doi-asserted-by":"publisher","DOI":"10.1145\/3550454.3555445"},{"key":"e_1_2_2_74_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-20062-5_16"}],"container-title":["ACM Transactions on Graphics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3592094","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3592094","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:37:45Z","timestamp":1750178265000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3592094"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,7,26]]},"references-count":74,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2023,8]]}},"alternative-id":["10.1145\/3592094"],"URL":"https:\/\/doi.org\/10.1145\/3592094","relation":{},"ISSN":["0730-0301","1557-7368"],"issn-type":[{"value":"0730-0301","type":"print"},{"value":"1557-7368","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,7,26]]},"assertion":[{"value":"2023-07-26","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}