{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,31]],"date-time":"2026-03-31T13:49:03Z","timestamp":1774964943887,"version":"3.50.1"},"reference-count":85,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2023,7,26]],"date-time":"2023-07-26T00:00:00Z","timestamp":1690329600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"BSF","award":["2020280"],"award-info":[{"award-number":["2020280"]}]},{"name":"ISF","award":["2492\/20"],"award-info":[{"award-number":["2492\/20"]}]},{"name":"ISF","award":["3441\/21"],"award-info":[{"award-number":["3441\/21"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Graph."],"published-print":{"date-parts":[[2023,8]]},"abstract":"<jats:p>\n            Text-to-image personalization aims to teach a pre-trained diffusion model to reason about novel, user provided concepts, embedding them into new scenes guided by natural language prompts. However, current personalization approaches struggle with lengthy training times, high storage requirements or loss of identity. To overcome these limitations, we propose an encoder-based\n            <jats:italic>domain-tuning<\/jats:italic>\n            approach. Our key insight is that by\n            <jats:italic>underfitting<\/jats:italic>\n            on a large set of concepts from a given domain, we can improve generalization and create a model that is more amenable to quickly adding novel concepts from the same domain. Specifically, we employ two components: First, an encoder that takes as an input a single image of a target concept from a given domain,\n            <jats:italic>e.g.<\/jats:italic>\n            a specific face, and learns to map it into a word-embedding representing the concept. 
Second, a set of regularized weight-offsets for the text-to-image model that learn how to effectively ingest additional concepts. Together, these components are used to guide the learning of unseen concepts, allowing us to personalize a model using only a single image and as few as 5 training steps --- accelerating personalization from dozens of minutes to\n            <jats:italic>seconds<\/jats:italic>\n            , while preserving quality.\n          <\/jats:p>\n          <jats:p>Code and trained encoders will be available at our project page.<\/jats:p>","DOI":"10.1145\/3592133","type":"journal-article","created":{"date-parts":[[2023,7,26]],"date-time":"2023-07-26T15:47:45Z","timestamp":1690386465000},"page":"1-13","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":145,"title":["Encoder-based Domain Tuning for Fast Personalization of Text-to-Image Models"],"prefix":"10.1145","volume":"42","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4875-965X","authenticated-orcid":false,"given":"Rinon","family":"Gal","sequence":"first","affiliation":[{"name":"Tel Aviv University, Tel Aviv, Israel"},{"name":"NVIDIA Research, Tel Aviv, Israel"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8423-3538","authenticated-orcid":false,"given":"Moab","family":"Arar","sequence":"additional","affiliation":[{"name":"Tel Aviv University, Tel Aviv, Israel"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3817-3698","authenticated-orcid":false,"given":"Yuval","family":"Atzmon","sequence":"additional","affiliation":[{"name":"NVIDIA Research, Tel Aviv, Israel"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3592-1112","authenticated-orcid":false,"given":"Amit H.","family":"Bermano","sequence":"additional","affiliation":[{"name":"Tel Aviv University, Tel Aviv, Israel"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9164-5303","authenticated-orcid":false,"given":"Gal","family":"Chechik","sequence":"additional","affiliation":[{"name":"NVIDIA 
Research, Tel Aviv, Israel"},{"name":"Bar-Ilan University, Tel Aviv, Israel"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6777-7445","authenticated-orcid":false,"given":"Daniel","family":"Cohen-Or","sequence":"additional","affiliation":[{"name":"Tel Aviv University, Tel Aviv, Israel"}]}],"member":"320","published-online":{"date-parts":[[2023,7,26]]},"reference":[{"key":"e_1_2_2_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00453"},{"key":"e_1_2_2_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00832"},{"key":"e_1_2_2_3_1","volume-title":"ReStyle: A Residual-Based StyleGAN Encoder via Iterative Refinement. arXiv preprint arXiv:2104.02699","author":"Alaluf Yuval","year":"2021","unstructured":"Yuval Alaluf, Or Patashnik, and Daniel Cohen-Or. 2021a. ReStyle: A Residual-Based StyleGAN Encoder via Iterative Refinement. arXiv preprint arXiv:2104.02699 (2021)."},{"key":"e_1_2_2_4_1","volume-title":"Bermano","author":"Alaluf Yuval","year":"2021","unstructured":"Yuval Alaluf, Omer Tov, Ron Mokady, Rinon Gal, and Amit H. Bermano. 2021b. HyperStyle: StyleGAN Inversion with HyperNetworks for Real Image Editing. arXiv:2111.15666 [cs.CV]"},{"key":"e_1_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/3240323.3241729"},{"key":"e_1_2_2_6_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-19784-0_3"},{"key":"e_1_2_2_7_1","unstructured":"Yogesh Balaji Seungjun Nah Xun Huang Arash Vahdat Jiaming Song Karsten Kreis Miika Aittala Timo Aila Samuli Laine Bryan Catanzaro et al. 2022. ediffi: Text-to-image diffusion models with an ensemble of expert denoisers. arXiv preprint arXiv:2211.01324 (2022)."},{"key":"e_1_2_2_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/3306346.3323023"},{"key":"e_1_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10639-016-9504-y"},{"key":"e_1_2_2_10_1","volume-title":"Large Scale GAN Training for High Fidelity Natural Image Synthesis. 
In 7th International Conference on Learning Representations, ICLR 2019","author":"Brock Andrew","year":"2019","unstructured":"Andrew Brock, Jeff Donahue, and Karen Simonyan. 2019. Large Scale GAN Training for High Fidelity Natural Image Synthesis. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6--9, 2019. OpenReview.net. https:\/\/openreview.net\/forum?id=B1xsqj09Fm"},{"key":"e_1_2_2_11_1","volume-title":"Efros","author":"Brooks Tim","year":"2023","unstructured":"Tim Brooks, Aleksander Holynski, and Alexei A. Efros. 2023. InstructPix2Pix: Learning to Follow Image Editing Instructions. In CVPR."},{"key":"e_1_2_2_12_1","volume-title":"Gabe Schwartz, Michael Zollhoefer, Shunsuke Saito, Stephen Lombardi, Shih-en Wei, Danielle Belko, Shoou-i Yu, Yaser Sheikh, and Jason Saragih.","author":"Cao Chen","year":"2022","unstructured":"Chen Cao, Tomas Simon, Jin Kyu Kim, Gabe Schwartz, Michael Zollhoefer, Shunsuke Saito, Stephen Lombardi, Shih-en Wei, Danielle Belko, Shoou-i Yu, Yaser Sheikh, and Jason Saragih. 2022. Authentic Volumetric Avatars From a Phone Scan. ACM Trans. Graph. (2022)."},{"key":"e_1_2_2_13_1","volume-title":"Jae Kyeong Kim, and Soung Hie Kim","author":"Cho Yoon Ho","year":"2002","unstructured":"Yoon Ho Cho, Jae Kyeong Kim, and Soung Hie Kim. 2002. A personalized recommender system based on web usage mining and decision tree induction. Expert systems with Applications 23, 3 (2002), 329--342."},{"key":"e_1_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-20044-1_32"},{"key":"e_1_2_2_15_1","unstructured":"Katherine Crowson. 2021. VQGAN + CLIP. https:\/\/colab.research.google.com\/drive\/1L8oL-vLJXVcRzCFbPwOoMkPKJ8-aYdPN."},{"key":"e_1_2_2_16_1","first-page":"8780","article-title":"Diffusion models beat gans on image synthesis","volume":"34","author":"Dhariwal Prafulla","year":"2021","unstructured":"Prafulla Dhariwal and Alexander Nichol. 2021. Diffusion models beat gans on image synthesis. 
Advances in Neural Information Processing Systems 34 (2021), 8780--8794.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01110"},{"key":"e_1_2_2_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.01268"},{"key":"e_1_2_2_19_1","volume-title":"Personalized federated learning: A meta-learning approach. arXiv preprint arXiv:2002.07948","author":"Fallah Alireza","year":"2020","unstructured":"Alireza Fallah, Aryan Mokhtari, and Asuman Ozdaglar. 2020. Personalized federated learning: A meta-learning approach. arXiv preprint arXiv:2002.07948 (2020)."},{"key":"e_1_2_2_20_1","volume-title":"International conference on machine learning. PMLR, 1126--1135","author":"Finn Chelsea","year":"2017","unstructured":"Chelsea Finn, Pieter Abbeel, and Sergey Levine. 2017. Model-agnostic meta-learning for fast adaptation of deep networks. In International conference on machine learning. PMLR, 1126--1135."},{"key":"e_1_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-19784-0_6"},{"key":"e_1_2_2_22_1","volume-title":"An image is worth one word: Personalizing text-to-image generation using textual inversion. arXiv preprint arXiv:2208.01618","author":"Gal Rinon","year":"2022","unstructured":"Rinon Gal, Yuval Alaluf, Yuval Atzmon, Or Patashnik, Amit H Bermano, Gal Chechik, and Daniel Cohen-Or. 2022. An image is worth one word: Personalizing text-to-image generation using textual inversion. arXiv preprint arXiv:2208.01618 (2022)."},{"key":"e_1_2_2_23_1","volume-title":"Stylegan-nada: Clip-guided domain adaptation of image generators. arXiv preprint arXiv:2108.00946","author":"Gal Rinon","year":"2021","unstructured":"Rinon Gal, Or Patashnik, Haggai Maron, Gal Chechik, and Daniel Cohen-Or. 2021. Stylegan-nada: Clip-guided domain adaptation of image generators. 
arXiv preprint arXiv:2108.00946 (2021)."},{"key":"e_1_2_2_24_1","volume-title":"Generative adversarial nets. Advances in neural information processing systems 27","author":"Goodfellow Ian","year":"2014","unstructured":"Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative adversarial nets. Advances in neural information processing systems 27 (2014)."},{"key":"e_1_2_2_25_1","unstructured":"Jinjin Gu Yujun Shen and Bolei Zhou. 2020. Image Processing Using Multi-Code GAN Prior. arXiv:1912.07116 [cs.CV]"},{"key":"e_1_2_2_26_1","volume-title":"arXiv preprint arXiv:1609.09106","author":"Ha David","year":"2016","unstructured":"David Ha, Andrew Dai, and Quoc V Le. 2016. Hypernetworks. arXiv preprint arXiv:1609.09106 (2016)."},{"key":"e_1_2_2_27_1","unstructured":"Amir Hertz Ron Mokady Jay Tenenbaum Kfir Aberman Yael Pritch and Daniel Cohen-Or. 2022. Prompt-to-prompt image editing with cross attention control. (2022)."},{"key":"e_1_2_2_28_1","volume-title":"Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017","author":"Heusel Martin","year":"2017","unstructured":"Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. 2017. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4--9, 2017, Long Beach, CA, USA, Isabelle Guyon, Ulrike von Luxburg, Samy Bengio, Hanna M. Wallach, Rob Fergus, S. V. N. Vishwanathan, and Roman Garnett (Eds.). 6626--6637. 
https:\/\/proceedings.neurips.cc\/paper\/2017\/hash\/8a1d694707eb0fefe65871369074926d-Abstract.html"},{"key":"e_1_2_2_29_1","first-page":"6840","article-title":"Denoising diffusion probabilistic models","volume":"33","author":"Ho Jonathan","year":"2020","unstructured":"Jonathan Ho, Ajay Jain, and Pieter Abbeel. 2020. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems 33 (2020), 6840--6851.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_2_2_30_1","volume-title":"Workshop on faces in 'Real-Life' Images: detection, alignment, and recognition.","author":"Huang Gary B","year":"2008","unstructured":"Gary B Huang, Marwan Mattar, Tamara Berg, and Eric Learned-Miller. 2008. Labeled faces in the wild: A database for studying face recognition in unconstrained environments. In Workshop on faces in 'Real-Life' Images: detection, alignment, and recognition."},{"key":"e_1_2_2_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00594"},{"key":"e_1_2_2_32_1","doi-asserted-by":"publisher","unstructured":"Gabriel Ilharco Mitchell Wortsman Ross Wightman Cade Gordon Nicholas Carlini Rohan Taori Achal Dave Vaishaal Shankar Hongseok Namkoong John Miller Hannaneh Hajishirzi Ali Farhadi and Ludwig Schmidt. 2021. OpenCLIP. If you use this software please cite it as below.. 10.5281\/zenodo.5143773","DOI":"10.5281\/zenodo.5143773"},{"key":"e_1_2_2_33_1","volume-title":"Improving federated learning personalization via model agnostic meta learning. arXiv preprint arXiv:1909.12488","author":"Jiang Yihan","year":"2019","unstructured":"Yihan Jiang, Jakub Kone\u010dn\u1ef3, Keith Rush, and Sreeram Kannan. 2019. Improving federated learning personalization via model agnostic meta learning. arXiv preprint arXiv:1909.12488 (2019)."},{"key":"e_1_2_2_34_1","volume-title":"Progressive growing of gans for improved quality, stability, and variation. 
arXiv preprint arXiv:1710.10196","author":"Karras Tero","year":"2017","unstructured":"Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen. 2017. Progressive growing of gans for improved quality, stability, and variation. arXiv preprint arXiv:1710.10196 (2017)."},{"key":"e_1_2_2_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00453"},{"key":"e_1_2_2_36_1","volume-title":"Imagic: Text-Based Real Image Editing with Diffusion Models. arXiv preprint arXiv:2210.09276","author":"Kawar Bahjat","year":"2022","unstructured":"Bahjat Kawar, Shiran Zada, Oran Lang, Omer Tov, Huiwen Chang, Tali Dekel, Inbar Mosseri, and Michal Irani. 2022. Imagic: Text-Based Real Image Editing with Diffusion Models. arXiv preprint arXiv:2210.09276 (2022)."},{"key":"e_1_2_2_37_1","volume-title":"Multi-Concept Customization of Text-to-Image Diffusion. arXiv preprint arXiv:2212.04488","author":"Kumari Nupur","year":"2022","unstructured":"Nupur Kumari, Bingliang Zhang, Richard Zhang, Eli Shechtman, and Jun-Yan Zhu. 2022. Multi-Concept Customization of Text-to-Image Diffusion. arXiv preprint arXiv:2212.04488 (2022)."},{"key":"e_1_2_2_38_1","volume-title":"Three approaches for personalization with applications to federated learning. arXiv preprint arXiv:2002.10619","author":"Mansour Yishay","year":"2020","unstructured":"Yishay Mansour, Mehryar Mohri, Jae Ro, and Ananda Theertha Suresh. 2020. Three approaches for personalization with applications to federated learning. arXiv preprint arXiv:2002.10619 (2020)."},{"key":"e_1_2_2_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCE.2009.4814447"},{"key":"e_1_2_2_40_1","volume-title":"SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations. In International Conference on Learning Representations.","author":"Meng Chenlin","year":"2022","unstructured":"Chenlin Meng, Yutong He, Yang Song, Jiaming Song, Jiajun Wu, Jun-Yan Zhu, and Stefano Ermon. 2022. 
SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations. In International Conference on Learning Representations."},{"key":"e_1_2_2_41_1","volume-title":"Null-text Inversion for Editing Real Images using Guided Diffusion Models. arXiv preprint arXiv:2211.09794","author":"Mokady Ron","year":"2022","unstructured":"Ron Mokady, Amir Hertz, Kfir Aberman, Yael Pritch, and Daniel Cohen-Or. 2022. Null-text Inversion for Editing Real Images using Guided Diffusion Models. arXiv preprint arXiv:2211.09794 (2022)."},{"key":"e_1_2_2_42_1","volume-title":"Glide: Towards photorealistic image generation and editing with text-guided diffusion models. arXiv preprint arXiv:2112.10741","author":"Nichol Alex","year":"2021","unstructured":"Alex Nichol, Prafulla Dhariwal, Aditya Ramesh, Pranav Shyam, Pamela Mishkin, Bob McGrew, Ilya Sutskever, and Mark Chen. 2021. Glide: Towards photorealistic image generation and editing with text-guided diffusion models. arXiv preprint arXiv:2112.10741 (2021)."},{"key":"e_1_2_2_43_1","volume-title":"Reptile: a scalable metalearning algorithm. arXiv preprint arXiv:1803.02999 2, 3","author":"Nichol Alex","year":"2018","unstructured":"Alex Nichol and John Schulman. 2018. Reptile: a scalable metalearning algorithm. arXiv preprint arXiv:1803.02999 2, 3 (2018), 4."},{"key":"e_1_2_2_44_1","volume-title":"Proceedings of the 38th International Conference on Machine Learning, ICML 2021, 18--24","volume":"8171","author":"Nichol Alexander Quinn","year":"2021","unstructured":"Alexander Quinn Nichol and Prafulla Dhariwal. 2021. Improved Denoising Diffusion Probabilistic Models. In Proceedings of the 38th International Conference on Machine Learning, ICML 2021, 18--24 July 2021, Virtual Event (Proceedings of Machine Learning Research, Vol. 139), Marina Meila and Tong Zhang (Eds.). PMLR, 8162--8171."},{"key":"e_1_2_2_45_1","volume-title":"MyStyle: A Personalized Generative Prior. 
arXiv preprint arXiv:2203.17272","author":"Nitzan Yotam","year":"2022","unstructured":"Yotam Nitzan, Kfir Aberman, Qiurui He, Orly Liba, Michal Yarom, Yossi Gandelsman, Inbar Mosseri, Yael Pritch, and Daniel Cohen-Or. 2022. MyStyle: A Personalized Generative Prior. arXiv preprint arXiv:2203.17272 (2022)."},{"key":"e_1_2_2_46_1","volume-title":"LARGE: Latent-Based Regression through GAN Semantics. arXiv:2107.11186 [cs.CV]","author":"Nitzan Yotam","year":"2021","unstructured":"Yotam Nitzan, Rinon Gal, Ofir Brenner, and Daniel Cohen-Or. 2021. LARGE: Latent-Based Regression through GAN Semantics. arXiv:2107.11186 [cs.CV]"},{"key":"e_1_2_2_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01111"},{"key":"e_1_2_2_48_1","unstructured":"Apolin\u00e1rio Passos and Omar Sanseviero. 2022. HuggingFace concept library. https:\/\/huggingface.co\/sd-dreambooth-library."},{"key":"e_1_2_2_49_1","volume-title":"StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery. arXiv preprint arXiv:2103.17249","author":"Patashnik Or","year":"2021","unstructured":"Or Patashnik, Zongze Wu, Eli Shechtman, Daniel Cohen-Or, and Dani Lischinski. 2021. StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery. arXiv preprint arXiv:2103.17249 (2021)."},{"key":"e_1_2_2_50_1","unstructured":"Suraj Patil and Pedro Cuenca. 2022. HuggingFace DreamBooth Implementation. https:\/\/huggingface.co\/docs\/diffusers\/training\/dreambooth."},{"key":"e_1_2_2_51_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01411"},{"key":"e_1_2_2_52_1","volume-title":"Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al.","author":"Radford Alec","year":"2021","unstructured":"Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. 2021. Learning transferable visual models from natural language supervision. 
arXiv preprint arXiv:2103.00020 (2021)."},{"key":"e_1_2_2_53_1","volume-title":"International Conference on Machine Learning. PMLR, 5301--5310","author":"Rahaman Nasim","year":"2019","unstructured":"Nasim Rahaman, Aristide Baratin, Devansh Arpit, Felix Draxler, Min Lin, Fred Hamprecht, Yoshua Bengio, and Aaron Courville. 2019. On the spectral bias of neural networks. In International Conference on Machine Learning. PMLR, 5301--5310."},{"key":"e_1_2_2_54_1","volume-title":"Hierarchical text-conditional image generation with clip latents. arXiv preprint arXiv:2204.06125","author":"Ramesh Aditya","year":"2022","unstructured":"Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, and Mark Chen. 2022. Hierarchical text-conditional image generation with clip latents. arXiv preprint arXiv:2204.06125 (2022)."},{"key":"e_1_2_2_55_1","volume-title":"International Conference on Machine Learning. PMLR, 8821--8831","author":"Ramesh Aditya","year":"2021","unstructured":"Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, and Ilya Sutskever. 2021. Zero-shot text-to-image generation. In International Conference on Machine Learning. PMLR, 8821--8831."},{"key":"e_1_2_2_56_1","volume-title":"Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation. arXiv preprint arXiv:2008.00951","author":"Richardson Elad","year":"2020","unstructured":"Elad Richardson, Yuval Alaluf, Or Patashnik, Yotam Nitzan, Yaniv Azar, Stav Shapiro, and Daniel Cohen-Or. 2020. Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation. arXiv preprint arXiv:2008.00951 (2020)."},{"key":"e_1_2_2_57_1","volume-title":"Pivotal tuning for latent-based editing of real images. arXiv preprint arXiv:2106.05744","author":"Roich Daniel","year":"2021","unstructured":"Daniel Roich, Ron Mokady, Amit H Bermano, and Daniel Cohen-Or. 2021. Pivotal tuning for latent-based editing of real images. 
arXiv preprint arXiv:2106.05744 (2021)."},{"key":"e_1_2_2_58_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01042"},{"key":"e_1_2_2_59_1","doi-asserted-by":"crossref","unstructured":"Nataniel Ruiz Yuanzhen Li Varun Jampani Yael Pritch Michael Rubinstein and Kfir Aberman. 2022. DreamBooth: Fine Tuning Text-to-image Diffusion Models for Subject-Driven Generation. (2022).","DOI":"10.1109\/CVPR52729.2023.02155"},{"key":"e_1_2_2_60_1","volume-title":"Burcu Karagol Ayan, S Sara Mahdavi, Rapha Gontijo Lopes, et al.","author":"Saharia Chitwan","year":"2022","unstructured":"Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily Denton, Seyed Kamyar Seyed Ghasemipour, Burcu Karagol Ayan, S Sara Mahdavi, Rapha Gontijo Lopes, et al. 2022. Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding. arXiv preprint arXiv:2205.11487 (2022)."},{"key":"e_1_2_2_61_1","volume-title":"Large-scale classification of fine-art paintings: Learning the right metric on the right feature. arXiv preprint arXiv:1505.00855","author":"Saleh Babak","year":"2015","unstructured":"Babak Saleh and Ahmed Elgammal. 2015. Large-scale classification of fine-art paintings: Learning the right metric on the right feature. arXiv preprint arXiv:1505.00855 (2015)."},{"key":"e_1_2_2_62_1","volume-title":"International Conference on Machine Learning. PMLR, 9489--9502","author":"Shamsian Aviv","year":"2021","unstructured":"Aviv Shamsian, Aviv Navon, Ethan Fetaya, and Gal Chechik. 2021. Personalized federated learning using hypernetworks. In International Conference on Machine Learning. PMLR, 9489--9502."},{"key":"e_1_2_2_63_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00926"},{"key":"e_1_2_2_64_1","volume-title":"Closed-Form Factorization of Latent Semantics in GANs. 2021 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","author":"Shen Yujun","year":"2020","unstructured":"Yujun Shen and Bolei Zhou. 2020. 
Closed-Form Factorization of Latent Semantics in GANs. 2021 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020), 1532--1540."},{"key":"e_1_2_2_65_1","volume-title":"Denoising Diffusion Implicit Models. In International Conference on Learning Representations.","author":"Song Jiaming","year":"2020","unstructured":"Jiaming Song, Chenlin Meng, and Stefano Ermon. 2020. Denoising Diffusion Implicit Models. In International Conference on Learning Representations."},{"key":"e_1_2_2_66_1","volume-title":"Df-gan: Deep fusion generative adversarial networks for text-to-image synthesis. arXiv preprint arXiv:2008.05865","author":"Tao Ming","year":"2020","unstructured":"Ming Tao, Hao Tang, Songsong Wu, Nicu Sebe, Xiao-Yuan Jing, Fei Wu, and Bingkun Bao. 2020. Df-gan: Deep fusion generative adversarial networks for text-to-image synthesis. arXiv preprint arXiv:2008.05865 (2020)."},{"key":"e_1_2_2_67_1","volume-title":"Designing an Encoder for StyleGAN Image Manipulation. arXiv preprint arXiv:2102.02766","author":"Tov Omer","year":"2021","unstructured":"Omer Tov, Yuval Alaluf, Yotam Nitzan, Or Patashnik, and Daniel Cohen-Or. 2021. Designing an Encoder for StyleGAN Image Manipulation. arXiv preprint arXiv:2102.02766 (2021)."},{"key":"e_1_2_2_68_1","volume-title":"Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation. arXiv preprint arXiv:2211.12572","author":"Tumanyan Narek","year":"2022","unstructured":"Narek Tumanyan, Michal Geyer, Shai Bagon, and Tali Dekel. 2022. Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation. arXiv preprint arXiv:2211.12572 (2022)."},{"key":"e_1_2_2_69_1","volume-title":"Proceedings of the IEEE conference on computer vision and pattern recognition. 9446--9454","author":"Ulyanov Dmitry","year":"2018","unstructured":"Dmitry Ulyanov, Andrea Vedaldi, and Victor Lempitsky. 2018. Deep image prior. In Proceedings of the IEEE conference on computer vision and pattern recognition. 
9446--9454."},{"key":"e_1_2_2_70_1","volume-title":"Neural Discrete Representation Learning. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017","author":"van den Oord A\u00e4ron","year":"2017","unstructured":"A\u00e4ron van den Oord, Oriol Vinyals, and Koray Kavukcuoglu. 2017. Neural Discrete Representation Learning. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4--9, 2017, Long Beach, CA, USA, Isabelle Guyon, Ulrike von Luxburg, Samy Bengio, Hanna M. Wallach, Rob Fergus, S. V. N. Vishwanathan, and Roman Garnett (Eds.). 6306--6315. https:\/\/proceedings.neurips.cc\/paper\/2017\/hash\/7a98af17e63a0ac09ce2e96d03992fbc-Abstract.html"},{"key":"e_1_2_2_71_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01109"},{"key":"e_1_2_2_72_1","volume-title":"StyleSpace Analysis: Disentangled Controls for StyleGAN Image Generation. 2021 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","author":"Wu Zongze","year":"2020","unstructured":"Zongze Wu, Dani Lischinski, and Eli Shechtman. 2020. StyleSpace Analysis: Disentangled Controls for StyleGAN Image Generation. 2021 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020), 12858--12867."},{"key":"e_1_2_2_73_1","unstructured":"Zongze Wu Yotam Nitzan Eli Shechtman and Dani Lischinski. 2021. StyleAlign: Analysis and Applications of Aligned StyleGAN Models. arXiv:2110.11323 [cs.CV]"},{"key":"e_1_2_2_74_1","unstructured":"Weihao Xia Yulun Zhang Yujiu Yang Jing-Hao Xue Bolei Zhou and Ming-Hsuan Yang. 2021. GAN Inversion: A Survey. arXiv:2101.05278 [cs.CV]"},{"key":"e_1_2_2_75_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00143"},{"key":"e_1_2_2_76_1","unstructured":"Yinghao Xu Yujun Shen Jiapeng Zhu Ceyuan Yang and Bolei Zhou. 2021. Generative Hierarchical Features from Synthesizing Images. 
In CVPR."},{"key":"e_1_2_2_77_1","volume-title":"Improving text-to-image synthesis using contrastive learning. arXiv preprint arXiv:2107.02423","author":"Ye Hui","year":"2021","unstructured":"Hui Ye, Xiulong Yang, Martin Takac, Rajshekhar Sunderraman, and Shihao Ji. 2021. Improving text-to-image synthesis using contrastive learning. arXiv preprint arXiv:2107.02423 (2021)."},{"key":"e_1_2_2_78_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01261-8_20"},{"key":"e_1_2_2_79_1","volume-title":"Lsun: Construction of a large-scale image dataset using deep learning with humans in the loop. arXiv preprint arXiv:1506.03365","author":"Yu Fisher","year":"2015","unstructured":"Fisher Yu, Ari Seff, Yinda Zhang, Shuran Song, Thomas Funkhouser, and Jianxiong Xiao. 2015. Lsun: Construction of a large-scale image dataset using deep learning with humans in the loop. arXiv preprint arXiv:1506.03365 (2015)."},{"key":"e_1_2_2_80_1","volume-title":"Thang Luong, Gunjan Baid, Zirui Wang, Vijay Vasudevan, Alexander Ku, Yinfei Yang, Burcu Karagol Ayan, et al.","author":"Yu Jiahui","year":"2022","unstructured":"Jiahui Yu, Yuanzhong Xu, Jing Yu Koh, Thang Luong, Gunjan Baid, Zirui Wang, Vijay Vasudevan, Alexander Ku, Yinfei Yang, Burcu Karagol Ayan, et al. 2022. Scaling Autoregressive Models for Content-Rich Text-to-Image Generation. arXiv preprint arXiv:2206.10789 (2022)."},{"key":"e_1_2_2_81_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00089"},{"key":"e_1_2_2_82_1","volume-title":"In-domain gan inversion for real image editing. arXiv preprint arXiv:2004.00049","author":"Zhu Jiapeng","year":"2020","unstructured":"Jiapeng Zhu, Yujun Shen, Deli Zhao, and Bolei Zhou. 2020b. In-domain gan inversion for real image editing. 
arXiv preprint arXiv:2004.00049 (2020)."},{"key":"e_1_2_2_83_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46454-1_36"},{"key":"e_1_2_2_84_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00595"},{"key":"e_1_2_2_85_1","unstructured":"Peihao Zhu Rameen Abdal Yipeng Qin and Peter Wonka. 2020a. Improved StyleGAN Embedding: Where are the Good Latents? arXiv:2012.09036 [cs.CV]"}],"container-title":["ACM Transactions on Graphics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3592133","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3592133","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:37:46Z","timestamp":1750178266000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3592133"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,7,26]]},"references-count":85,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2023,8]]}},"alternative-id":["10.1145\/3592133"],"URL":"https:\/\/doi.org\/10.1145\/3592133","relation":{},"ISSN":["0730-0301","1557-7368"],"issn-type":[{"value":"0730-0301","type":"print"},{"value":"1557-7368","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,7,26]]},"assertion":[{"value":"2023-07-26","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}