{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,27]],"date-time":"2025-12-27T21:11:51Z","timestamp":1766869911889,"version":"3.44.0"},"reference-count":69,"publisher":"Association for Computing Machinery (ACM)","issue":"6","license":[{"start":{"date-parts":[[2023,12,5]],"date-time":"2023-12-05T00:00:00Z","timestamp":1701734400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100000781","name":"European Research Council","doi-asserted-by":"publisher","award":["770784"],"award-info":[{"award-number":["770784"]}],"id":[{"id":"10.13039\/501100000781","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Graph."],"published-print":{"date-parts":[[2023,12,5]]},"abstract":"<jats:p>Capturing and editing full-head performances enables the creation of virtual characters with various applications such as extended reality and media production. The past few years witnessed a steep rise in the photorealism of human head avatars. Such avatars can be controlled through different input data modalities, including RGB, audio, depth, IMUs, and others. While these data modalities provide effective means of control, they mostly focus on editing the head movements such as the facial expressions, head pose, and\/or camera viewpoint. In this paper, we propose AvatarStudio, a text-based method for editing the appearance of a dynamic full head avatar. Our approach builds on existing work to capture dynamic performances of human heads using Neural Radiance Field (NeRF) and edits this representation with a text-to-image diffusion model. Specifically, we introduce an optimization strategy for incorporating multiple keyframes representing different camera viewpoints and time stamps of a video performance into a single diffusion model. 
Using this personalized diffusion model, we edit the dynamic NeRF by introducing view-and-time-aware Score Distillation Sampling (VT-SDS) following a model-based guidance approach. Our method edits the full head in a canonical space and then propagates these edits to the remaining time steps via a pre-trained deformation network. We evaluate our method visually and numerically via a user study, and results show that our method outperforms existing approaches. Our experiments validate the design choices of our method and highlight that our edits are genuine, personalized, as well as 3D- and time-consistent.<\/jats:p>","DOI":"10.1145\/3618368","type":"journal-article","created":{"date-parts":[[2023,12,5]],"date-time":"2023-12-05T10:20:48Z","timestamp":1701771648000},"page":"1-18","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":21,"title":["AvatarStudio: Text-Driven Editing of 3D Dynamic Human Head Avatars"],"prefix":"10.1145","volume":"42","author":[{"given":"Mohit","family":"Mendiratta","sequence":"first","affiliation":[{"name":"Max Planck Institute for Informatics and Saarland University, Germany"}]},{"given":"Xingang","family":"Pan","sequence":"additional","affiliation":[{"name":"Max Planck Institute for Informatics and SIC, Germany"}]},{"given":"Mohamed","family":"Elgharib","sequence":"additional","affiliation":[{"name":"Max Planck Institute for Informatics and SIC, Germany"}]},{"given":"Kartik","family":"Teotia","sequence":"additional","affiliation":[{"name":"Max Planck Institute for Informatics and Saarland University, Germany"}]},{"given":"Mallikarjun B","family":"R","sequence":"additional","affiliation":[{"name":"Max Planck Institute for Informatics and Saarland University, Germany"}]},{"given":"Ayush","family":"Tewari","sequence":"additional","affiliation":[{"name":"MIT CSAIL, United States of America"}]},{"given":"Vladislav","family":"Golyanik","sequence":"additional","affiliation":[{"name":"Max 
Planck Institute for Informatics and SIC, Germany"}]},{"given":"Adam","family":"Kortylewski","sequence":"additional","affiliation":[{"name":"University of Freiburg, Max Planck Institute for Informatics, and SIC, Germany"}]},{"given":"Christian","family":"Theobalt","sequence":"additional","affiliation":[{"name":"Max Planck Institute for Informatics and SIC, Germany"}]}],"member":"320","published-online":{"date-parts":[[2023,12,5]]},"reference":[{"key":"e_1_2_2_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00453"},{"key":"e_1_2_2_2_1","first-page":"1","article-title":"Metashape python reference","volume":"1","author":"Agisoft LLC","year":"2020","unstructured":"LLC Agisoft. 2020. Metashape python reference. Release 1, 0 (2020), 1--199.","journal-title":"Release"},{"key":"e_1_2_2_3_1","doi-asserted-by":"crossref","unstructured":"Shivangi Aneja Justus Thies Angela Dai and Matthias Nie\u00dfner. 2022. ClipFace: Text-guided Editing of Textured 3D Morphable Models. In ArXiv preprint arXiv:2212.01406.","DOI":"10.1145\/3588432.3591566"},{"key":"e_1_2_2_4_1","doi-asserted-by":"crossref","unstructured":"ShahRukh Athar Zexiang Xu Kalyan Sunkavalli Eli Shechtman and Zhixin Shu. 2022. RigNeRF: Fully Controllable Neural 3D Portraits. In Computer Vision and Pattern Recognition (CVPR).","DOI":"10.1109\/CVPR52688.2022.01972"},{"key":"e_1_2_2_5_1","unstructured":"Ziqian Bai Feitong Tan Zeng Huang Kripasindhu Sarkar Danhang Tang Di Qiu Abhimitra Meka Ruofei Du Mingsong Dou Sergio Orts-Escolano Rohit Pandey Ping Tan Thabo Beeler Sean Fanello and Yinda Zhang. 2023. Learning Personalized High Quality Volumetric Head Avatars from Monocular RGB Videos. arXiv:2304.01436 [cs.CV]"},{"key":"e_1_2_2_6_1","doi-asserted-by":"crossref","unstructured":"Aayush Bansal Shugao Ma Deva Ramanan and Yaser Sheikh. 2018. Recycle-GAN: Unsupervised Video Retargeting. 
In ECCV.","DOI":"10.1007\/978-3-030-01228-1_8"},{"key":"e_1_2_2_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/311535.311556"},{"key":"e_1_2_2_8_1","volume-title":"Efros","author":"Brooks Tim","year":"2023","unstructured":"Tim Brooks, Aleksander Holynski, and Alexei A. Efros. 2023. InstructPix2Pix: Learning to Follow Image Editing Instructions. In CVPR."},{"key":"e_1_2_2_9_1","volume-title":"Lin (Eds.)","volume":"33","author":"Brown Tom","year":"2020","unstructured":"Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel Ziegler, Jeffrey Wu, Clemens Winter, Chris Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. 2020. Language Models are Few-Shot Learners. In Advances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin (Eds.), Vol. 33. Curran Associates, Inc., 1877--1901. https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2020\/file\/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf"},{"key":"e_1_2_2_10_1","unstructured":"CompVis. 2022. Stable Diffusion. https:\/\/github.com\/CompVis\/stable-diffusion"},{"key":"e_1_2_2_11_1","volume-title":"Wortman Vaughan (Eds.)","volume":"34","author":"Dhariwal Prafulla","year":"2021","unstructured":"Prafulla Dhariwal and Alexander Nichol. 2021. Diffusion Models Beat GANs on Image Synthesis. In Advances in Neural Information Processing Systems, M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, and J. Wortman Vaughan (Eds.), Vol. 34. Curran Associates, Inc., 8780--8794. 
https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2021\/file\/49ad23d1ec9fa4bd8d77d02681df5cfa-Paper.pdf"},{"key":"e_1_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/3395208"},{"key":"e_1_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/3072959.3073660"},{"key":"e_1_2_2_14_1","doi-asserted-by":"crossref","unstructured":"Guy Gafni Justus Thies Michael Zollh\u00f6fer and Matthias Nie\u00dfner. 2021. Dynamic Neural Radiance Fields for Monocular 4D Facial Avatar Reconstruction. (2021).","DOI":"10.1109\/CVPR46437.2021.00854"},{"key":"e_1_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/3550454.3555501"},{"key":"e_1_2_2_16_1","volume-title":"Neural Head Avatars from Monocular RGB Videos. arXiv preprint arXiv:2112.01554","author":"Grassal Philip-William","year":"2021","unstructured":"Philip-William Grassal, Malte Prinzler, Titus Leistner, Carsten Rother, Matthias Nie\u00dfner, and Justus Thies. 2021. Neural Head Avatars from Monocular RGB Videos. arXiv preprint arXiv:2112.01554 (2021)."},{"key":"e_1_2_2_17_1","doi-asserted-by":"crossref","unstructured":"Ayaan Haque Matthew Tancik Alexei Efros Aleksander Holynski and Angjoo Kanazawa. 2023. Instruct-NeRF2NeRF: Editing 3D Scenes with Instructions. (2023).","DOI":"10.1109\/ICCV51070.2023.01808"},{"key":"e_1_2_2_18_1","first-page":"6840","article-title":"Denoising diffusion probabilistic models","volume":"33","author":"Ho Jonathan","year":"2020","unstructured":"Jonathan Ho, Ajay Jain, and Pieter Abbeel. 2020. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems 33 (2020), 6840--6851.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_2_2_19_1","volume-title":"Classifier-free diffusion guidance. arXiv preprint arXiv:2207.12598","author":"Ho Jonathan","year":"2022","unstructured":"Jonathan Ho and Tim Salimans. 2022. Classifier-free diffusion guidance. 
arXiv preprint arXiv:2207.12598 (2022)."},{"key":"e_1_2_2_20_1","doi-asserted-by":"crossref","unstructured":"Ajay Jain Ben Mildenhall Jonathan T. Barron Pieter Abbeel and Ben Poole. 2022. Zero-Shot Text-Guided Object Generation with Dream Fields. (2022).","DOI":"10.1109\/CVPR52688.2022.00094"},{"key":"e_1_2_2_21_1","volume-title":"A Style-Based Generator Architecture for Generative Adversarial Networks. CoRR abs\/1812.04948","author":"Karras Tero","year":"2018","unstructured":"Tero Karras, Samuli Laine, and Timo Aila. 2018. A Style-Based Generator Architecture for Generative Adversarial Networks. CoRR abs\/1812.04948 (2018). arXiv:1812.04948 http:\/\/arxiv.org\/abs\/1812.04948"},{"key":"e_1_2_2_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/3197517.3201283"},{"key":"e_1_2_2_23_1","volume-title":"Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980","author":"Kingma Diederik P","year":"2014","unstructured":"Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)."},{"key":"e_1_2_2_24_1","doi-asserted-by":"publisher","unstructured":"Tobias Kirschstein Shenhan Qian Simon Giebenhain Tim Walter and Matthias Nie\u00dfner. 2023. NeRSemble: Multi-view Radiance Field Reconstruction of Human Heads. arXiv:2305.03027 [cs.CV] 10.48550\/arXiv.2305.03027","DOI":"10.48550\/arXiv.2305.03027"},{"key":"e_1_2_2_25_1","volume-title":"Countering language drift via visual grounding. arXiv preprint arXiv:1909.04499","author":"Lee Jason","year":"2019","unstructured":"Jason Lee, Kyunghyun Cho, and Douwe Kiela. 2019. Countering language drift via visual grounding. arXiv preprint arXiv:1909.04499 (2019)."},{"key":"e_1_2_2_26_1","volume-title":"Facial Performance Sensing Head-Mounted Display. 
ACM Transactions on Graphics (Proceedings SIGGRAPH 2015)","author":"Li Hao","year":"2015","unstructured":"Hao Li, Laura Trutoiu, Kyle Olszewski, Lingyu Wei, Tristan Trutna, Pei-Lun Hsieh, Aaron Nicholls, and Chongyang Ma. 2015. Facial Performance Sensing Head-Mounted Display. ACM Transactions on Graphics (Proceedings SIGGRAPH 2015) 34, 4 (July 2015)."},{"key":"e_1_2_2_27_1","volume-title":"Magic3D: High-Resolution Text-to-3D Content Creation. arXiv preprint arXiv:2211.10440","author":"Lin Chen-Hsuan","year":"2022","unstructured":"Chen-Hsuan Lin, Jun Gao, Luming Tang, Towaki Takikawa, Xiaohui Zeng, Xun Huang, Karsten Kreis, Sanja Fidler, Ming-Yu Liu, and Tsung-Yi Lin. 2022. Magic3D: High-Resolution Text-to-3D Content Creation. arXiv preprint arXiv:2211.10440 (2022)."},{"key":"e_1_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00865"},{"key":"e_1_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/3197517.3201401"},{"key":"e_1_2_2_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/3450626.3459863"},{"key":"e_1_2_2_31_1","volume-title":"International Conference on Machine Learning. PMLR, 6437--6447","author":"Lu Yuchen","year":"2020","unstructured":"Yuchen Lu, Soumye Singhal, Florian Strub, Aaron Courville, and Olivier Pietquin. 2020. Countering language drift with seeded iterated learning. In International Conference on Machine Learning. PMLR, 6437--6447."},{"key":"e_1_2_2_32_1","first-page":"1","article-title":"PhotoApp: Photorealistic Appearance Editing of Head Portraits","volume":"40","author":"Mallikarjun B R","year":"2021","unstructured":"B R Mallikarjun, Ayush Tewari, Abdallah Dib, Tim Weyrich, Bernd Bickel, Hans-Peter Seidel, Hanspeter Pfister, Wojciech Matusik, Louis Chevallier, Mohamed Elgharib, et al. 2021. PhotoApp: Photorealistic Appearance Editing of Head Portraits. 
ACM Transactions on Graphics 40, 4 (2021), 1--16.","journal-title":"ACM Transactions on Graphics"},{"key":"e_1_2_2_33_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-19784-0_11"},{"key":"e_1_2_2_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/3503250"},{"key":"e_1_2_2_35_1","volume-title":"Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784","author":"Mirza Mehdi","year":"2014","unstructured":"Mehdi Mirza and Simon Osindero. 2014. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784 (2014)."},{"key":"e_1_2_2_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/3528223.3530127"},{"key":"e_1_2_2_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/3272127.3275075"},{"key":"e_1_2_2_38_1","doi-asserted-by":"publisher","DOI":"10.5244\/C.29.41"},{"key":"e_1_2_2_39_1","volume-title":"Dreamfusion: Text-to-3d using 2d diffusion. arXiv preprint arXiv:2209.14988","author":"Poole Ben","year":"2022","unstructured":"Ben Poole, Ajay Jain, Jonathan T Barron, and Ben Mildenhall. 2022. Dreamfusion: Text-to-3d using 2d diffusion. arXiv preprint arXiv:2209.14988 (2022)."},{"key":"e_1_2_2_40_1","volume-title":"Proceedings of the 38th International Conference on Machine Learning (Proceedings of Machine Learning Research","volume":"8763","author":"Radford Alec","year":"2021","unstructured":"Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. 2021. Learning Transferable Visual Models From Natural Language Supervision. In Proceedings of the 38th International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 139), Marina Meila and Tong Zhang (Eds.). PMLR, 8748--8763. https:\/\/proceedings.mlr.press\/v139\/radford21a.html"},{"key":"e_1_2_2_41_1","unstructured":"Alec Radford and Karthik Narasimhan. 2018. 
Improving Language Understanding by Generative Pre-Training."},{"key":"e_1_2_2_42_1","volume-title":"PVA: Pixel-aligned Volumetric Avatars. arXiv:2101.02697 [cs.CV]","author":"Raj Amit","year":"2021","unstructured":"Amit Raj, Michael Zollhoefer, Tomas Simon, Jason Saragih, Shunsuke Saito, James Hays, and Stephen Lombardi. 2021. PVA: Pixel-aligned Volumetric Avatars. arXiv:2101.02697 [cs.CV]"},{"key":"e_1_2_2_43_1","unstructured":"Aditya Ramesh Prafulla Dhariwal Alex Nichol Casey Chu and Mark Chen. 2022. Hierarchical Text-Conditional Image Generation with CLIP Latents. arXiv:2204.06125 [cs.CV]"},{"key":"e_1_2_2_44_1","unstructured":"Aditya Ramesh Mikhail Pavlov Gabriel Goh Scott Gray Chelsea Voss Alec Radford Mark Chen and Ilya Sutskever. 2021. Zero-Shot Text-to-Image Generation. arXiv:2102.12092 [cs.CV]"},{"key":"e_1_2_2_45_1","volume-title":"Jen-Hao Rick Chang, and Oncel Tuzel.","author":"Ranjan Anurag","year":"2023","unstructured":"Anurag Ranjan, Kwang Moo Yi, Jen-Hao Rick Chang, and Oncel Tuzel. 2023. FaceLit: Neural 3D Relightable Faces. In CVPR. https:\/\/arxiv.org\/abs\/2303.15437"},{"key":"e_1_2_2_46_1","unstructured":"Pramod Rao Mallikarjun B R Gereon Fox Tim Weyrich Bernd Bickel Hans-Peter Seidel Hanspeter Pfister Wojciech Matusik Ayush Tewari Christian Theobalt and Mohamed Elgharib. 2022. VoRF: Volumetric Relightable Faces. (2022)."},{"key":"e_1_2_2_47_1","doi-asserted-by":"crossref","unstructured":"Robin Rombach Andreas Blattmann Dominik Lorenz Patrick Esser and Bj\u00f6rn Ommer. 2022. High-Resolution Image Synthesis with Latent Diffusion Models. In Computer Vision and Pattern Recognition (CVPR).","DOI":"10.1109\/CVPR52688.2022.01042"},{"key":"e_1_2_2_48_1","volume-title":"Dreambooth: Fine tuning text-to-image diffusion models for subject-driven generation. arXiv preprint arXiv:2208.12242","author":"Ruiz Nataniel","year":"2022","unstructured":"Nataniel Ruiz, Yuanzhen Li, Varun Jampani, Yael Pritch, Michael Rubinstein, and Kfir Aberman. 2022. 
Dreambooth: Fine tuning text-to-image diffusion models for subject-driven generation. arXiv preprint arXiv:2208.12242 (2022)."},{"key":"e_1_2_2_49_1","unstructured":"Christoph Schuhmann Romain Beaumont Richard Vencu Cade Gordon Ross Wightman Mehdi Cherti Theo Coombes Aarush Katta Clayton Mullis Mitchell Wortsman et al. 2022. Laion-5b: An open large-scale dataset for training next generation image-text models. arXiv preprint arXiv:2210.08402 (2022)."},{"key":"e_1_2_2_50_1","doi-asserted-by":"crossref","unstructured":"Ahmed Selim Mohamed Elgharib and Linda Doyle. 2016. Painting Style Transfer for Head Portraits using Convolutional Neural Networks. (2016) 129:1--129:18.","DOI":"10.1145\/2897824.2925968"},{"key":"e_1_2_2_51_1","volume-title":"First Order Motion Model for Image Animation. In Conference on Neural Information Processing Systems (NeurIPS).","author":"Siarohin Aliaksandr","year":"2019","unstructured":"Aliaksandr Siarohin, St\u00e9phane Lathuili\u00e8re, Sergey Tulyakov, Elisa Ricci, and Nicu Sebe. 2019. First Order Motion Model for Image Animation. In Conference on Neural Information Processing Systems (NeurIPS)."},{"key":"e_1_2_2_52_1","doi-asserted-by":"publisher","DOI":"10.1145\/3306346.3323008"},{"key":"e_1_2_2_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/3072959.3073640"},{"key":"e_1_2_2_54_1","unstructured":"Feitong Tan Sean Fanello Abhimitra Meka Sergio Orts-Escolano Danhang Tang Rohit Pandey Jonathan Taylor Ping Tan and Yinda Zhang. 2022. VoLux-GAN: A Generative Model for 3D Face Synthesis with HDRI Relighting. arXiv:2201.04873 [cs.CV]"},{"key":"e_1_2_2_55_1","doi-asserted-by":"crossref","unstructured":"Kartik Teotia Xingang Pan Hyeongwoo Kim Pablo Garrido Mohamed Elgharib Christian Theobalt et al. 2023. HQ3DAvatar: High Quality Controllable 3D Head Avatar. 
arXiv preprint arXiv:2303.14471 (2023).","DOI":"10.1145\/3649889"},{"key":"e_1_2_2_56_1","doi-asserted-by":"publisher","DOI":"10.1145\/3414685.3417803"},{"key":"e_1_2_2_57_1","volume-title":"Neural Voice Puppetry: Audio-driven Facial Reenactment. ECCV 2020","author":"Thies Justus","year":"2020","unstructured":"Justus Thies, Mohamed Elgharib, Ayush Tewari, Christian Theobalt, and Matthias Nie\u00dfner. 2020. Neural Voice Puppetry: Audio-driven Facial Reenactment. ECCV 2020 (2020)."},{"key":"e_1_2_2_58_1","doi-asserted-by":"publisher","DOI":"10.1145\/3306346.3323035"},{"key":"e_1_2_2_59_1","unstructured":"Aaron Van Den Oord Oriol Vinyals et al. 2017. Neural discrete representation learning. Advances in neural information processing systems 30 (2017)."},{"key":"e_1_2_2_60_1","volume-title":"\u0141ukasz Kaiser, and Illia Polosukhin","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is All you Need. In Advances in Neural Information Processing Systems, I. Guyon, U. Von Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.), Vol. 30. Curran Associates, Inc. https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2017\/file\/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf"},{"key":"e_1_2_2_61_1","volume-title":"CLIP-NeRF: Text-and-Image Driven Manipulation of Neural Radiance Fields. arXiv preprint arXiv:2112.05139","author":"Wang Can","year":"2021","unstructured":"Can Wang, Menglei Chai, Mingming He, Dongdong Chen, and Jing Liao. 2021b. CLIP-NeRF: Text-and-Image Driven Manipulation of Neural Radiance Fields. arXiv preprint arXiv:2112.05139 (2021)."},{"key":"e_1_2_2_62_1","volume-title":"NeRF-Art: Text-Driven Neural Radiance Fields Stylization. 
arXiv preprint arXiv:2212.08070","author":"Wang Can","year":"2022","unstructured":"Can Wang, Ruixiang Jiang, Menglei Chai, Mingming He, Dongdong Chen, and Jing Liao. 2022. NeRF-Art: Text-Driven Neural Radiance Fields Stylization. arXiv preprint arXiv:2212.08070 (2022)."},{"key":"e_1_2_2_63_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00565"},{"key":"e_1_2_2_64_1","doi-asserted-by":"publisher","DOI":"10.1145\/3306346.3323030"},{"key":"e_1_2_2_65_1","doi-asserted-by":"publisher","DOI":"10.1145\/3550454.3555437"},{"key":"e_1_2_2_66_1","doi-asserted-by":"crossref","unstructured":"Egor Zakharov Aliaksandra Shysheya Egor Burkov and Victor Lempitsky. 2019. Few-Shot Adversarial Learning of Realistic Neural Talking Head Models. arXiv:1905.08233 [cs.CV]","DOI":"10.1109\/ICCV.2019.00955"},{"key":"e_1_2_2_67_1","volume-title":"SINE: SINgle Image Editing with Text-to-Image Diffusion Models. arXiv preprint arXiv:2212.04489","author":"Zhang Zhixing","year":"2022","unstructured":"Zhixing Zhang, Ligong Han, Arnab Ghosh, Dimitris Metaxas, and Jian Ren. 2022. SINE: SINgle Image Editing with Text-to-Image Diffusion Models. 
arXiv preprint arXiv:2212.04489 (2022)."},{"key":"e_1_2_2_68_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01318"},{"key":"e_1_2_2_69_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00729"}],"container-title":["ACM Transactions on Graphics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3618368","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3618368","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,21]],"date-time":"2025-08-21T10:45:50Z","timestamp":1755773150000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3618368"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,12,5]]},"references-count":69,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2023,12,5]]}},"alternative-id":["10.1145\/3618368"],"URL":"https:\/\/doi.org\/10.1145\/3618368","relation":{},"ISSN":["0730-0301","1557-7368"],"issn-type":[{"type":"print","value":"0730-0301"},{"type":"electronic","value":"1557-7368"}],"subject":[],"published":{"date-parts":[[2023,12,5]]},"assertion":[{"value":"2023-12-05","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}