{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,22]],"date-time":"2026-04-22T19:45:12Z","timestamp":1776887112154,"version":"3.51.2"},"publisher-location":"New York, NY, USA","reference-count":50,"publisher":"ACM","funder":[{"DOI":"10.13039\/501100004375","name":"Tel Aviv University","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100004375","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2025,8,10]]},"DOI":"10.1145\/3721238.3730624","type":"proceedings-article","created":{"date-parts":[[2025,7,23]],"date-time":"2025-07-23T08:42:43Z","timestamp":1753260163000},"page":"1-11","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["IP-Composer: Semantic Composition of Visual Concepts"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0009-0006-6400-9700","authenticated-orcid":false,"given":"Sara","family":"Dorfman","sequence":"first","affiliation":[{"name":"Tel Aviv University, Tel Aviv, Israel"}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-9464-8273","authenticated-orcid":false,"given":"Dana","family":"Cohen-Bar","sequence":"additional","affiliation":[{"name":"Tel Aviv University, Tel Aviv, Israel"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4875-965X","authenticated-orcid":false,"given":"Rinon","family":"Gal","sequence":"additional","affiliation":[{"name":"NVIDIA Research, Tel Aviv, Israel"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6777-7445","authenticated-orcid":false,"given":"Daniel","family":"Cohen-Or","sequence":"additional","affiliation":[{"name":"Tel Aviv University, Tel Aviv, Israel"}]}],"member":"320","published-online":{"date-parts":[[2025,7,27]]},"reference":[{"key":"e_1_3_3_2_2_1","doi-asserted-by":"crossref","unstructured":"Rameen Abdal Peihao Zhu John Femiani Niloy\u00a0J Mitra and Peter Wonka. 2021. Clip2stylegan: Unsupervised extraction of stylegan edit directions. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2112.05219 (2021).","DOI":"10.1145\/3528233.3530747"},{"key":"e_1_3_3_2_3_1","unstructured":"Pranav Aggarwal Hareesh Ravi Naveen Marri Sachin Kelkar Fengbin Chen Vinh Khuc Midhun Harikumar Ritiz Tambi Sudharshan\u00a0Reddy Kakumanu Purvak Lapsiya Alvin Ghouas Sarah Saber Malavika Ramprasad Baldo Faieta and Ajinkya Kale. 2023. Controlled and Conditional Text to Image Generation with Diffusion Prior. arxiv:https:\/\/arXiv.org\/abs\/2302.11710\u00a0[cs.CV] https:\/\/arxiv.org\/abs\/2302.11710"},{"key":"e_1_3_3_2_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/3610548.3618173"},{"key":"e_1_3_3_2_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.01762"},{"key":"e_1_3_3_2_6_1","unstructured":"Yogesh Balaji Seungjun Nah Xun Huang Arash Vahdat Jiaming Song Karsten Kreis Miika Aittala Timo Aila Samuli Laine Bryan Catanzaro Tero Karras and Ming-Yu Liu. 2022. eDiff-I: Text-to-Image Diffusion Models with Ensemble of Expert Denoisers. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2211.01324 (2022)."},{"key":"e_1_3_3_2_7_1","unstructured":"Stefan\u00a0Andreas Baumann Felix Krause Michael Neumayr Nick Stracke Vincent\u00a0Tao Hu and Bj\u00f6rn Ommer. 2024. Continuous Subject-Specific Attribute Control in T2I Models by Identifying Semantic Directions. arxiv:https:\/\/arXiv.org\/abs\/2403.17064\u00a0[cs.CV] https:\/\/arxiv.org\/abs\/2403.17064"},{"key":"e_1_3_3_2_8_1","unstructured":"Shariq\u00a0Farooq Bhat Niloy\u00a0J. Mitra and Peter Wonka. 2023. LooseControl: Lifting ControlNet for Generalized Depth Conditioning. arxiv:https:\/\/arXiv.org\/abs\/2312.03079\u00a0[cs.CV] https:\/\/arxiv.org\/abs\/2312.03079"},{"key":"e_1_3_3_2_9_1","unstructured":"Hila Chefer Oran Lang Mor Geva Volodymyr Polosukhin Assaf Shocher Michal Irani Inbar Mosseri and Lior Wolf. 2023. The hidden language of diffusion models. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2306.00966 (2023)."},{"key":"e_1_3_3_2_10_1","unstructured":"Guillaume Couairon Jakob Verbeek Holger Schwenk and Matthieu Cord. 2022. Diffedit: Diffusion-based semantic image editing with mask guidance. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2210.11427 (2022)."},{"key":"e_1_3_3_2_11_1","unstructured":"Omer Dahary Or Patashnik Kfir Aberman and Daniel Cohen-Or. 2024. Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation. arxiv:https:\/\/arXiv.org\/abs\/2403.16990\u00a0[cs.CV] https:\/\/arxiv.org\/abs\/2403.16990"},{"key":"e_1_3_3_2_12_1","unstructured":"Rinon Gal Yuval Alaluf Yuval Atzmon Or Patashnik Amit\u00a0H. Bermano Gal Chechik and Daniel Cohen-Or. 2022. An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion. https:\/\/doi.org\/10.48550\/ARXIV.2208.01618"},{"key":"e_1_3_3_2_13_1","doi-asserted-by":"crossref","unstructured":"Rinon Gal Moab Arar Yuval Atzmon Amit\u00a0H Bermano Gal Chechik and Daniel Cohen-Or. 2023. Encoder-based domain tuning for fast personalization of text-to-image models. ACM Transactions on Graphics (TOG) 42 4 (2023) 1\u201313.","DOI":"10.1145\/3592133"},{"key":"e_1_3_3_2_14_1","unstructured":"Rinon Gal Or Patashnik Haggai Maron Gal Chechik and Daniel Cohen-Or. 2021. StyleGAN-NADA: CLIP-Guided Domain Adaptation of Image Generators. arxiv:https:\/\/arXiv.org\/abs\/2108.00946\u00a0[cs.CV] https:\/\/arxiv.org\/abs\/2108.00946"},{"key":"e_1_3_3_2_15_1","unstructured":"Yossi Gandelsman Alexei\u00a0A. Efros and Jacob Steinhardt. 2024. Interpreting CLIP\u2019s Image Representation via Text-Based Decomposition. arxiv:https:\/\/arXiv.org\/abs\/2310.05916\u00a0[cs.CV] https:\/\/arxiv.org\/abs\/2310.05916"},{"key":"e_1_3_3_2_16_1","unstructured":"Ian Goodfellow Jean Pouget-Abadie Mehdi Mirza Bing Xu David Warde-Farley Sherjil Ozair Aaron Courville and Yoshua Bengio. 2014. Generative adversarial nets. Advances in neural information processing systems 27 (2014)."},{"key":"e_1_3_3_2_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3641519.3657444"},{"key":"e_1_3_3_2_18_1","unstructured":"Erik H\u00e4rk\u00f6nen Aaron Hertzmann Jaakko Lehtinen and Sylvain Paris. 2020. GANSpace: Discovering Interpretable GAN Controls. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2004.02546 (2020)."},{"key":"e_1_3_3_2_19_1","unstructured":"Jonathan Ho Ajay Jain and Pieter Abbeel. 2020. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems 33 (2020) 6840\u20136851."},{"key":"e_1_3_3_2_20_1","unstructured":"Lianghua Huang Di Chen Yu Liu Yujun Shen Deli Zhao and Jingren Zhou. 2023. Composer: Creative and Controllable Image Synthesis with Composable Conditions. arxiv:https:\/\/arXiv.org\/abs\/2302.09778\u00a0[cs.CV] https:\/\/arxiv.org\/abs\/2302.09778"},{"key":"e_1_3_3_2_21_1","volume-title":"OpenCLIP","author":"Ilharco Gabriel","year":"2021","unstructured":"Gabriel Ilharco, Mitchell Wortsman, Ross Wightman, Cade Gordon, Nicholas Carlini, Rohan Taori, Achal Dave, Vaishaal Shankar, Hongseok Namkoong, John Miller, Hannaneh Hajishirzi, Ali Farhadi, and Ludwig Schmidt. 2021. OpenCLIP. https:\/\/doi.org\/10.5281\/zenodo.5143773 If you use this software, please cite it as below.."},{"key":"e_1_3_3_2_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00453"},{"key":"e_1_3_3_2_23_1","unstructured":"Sharon Lee Yunzhi Zhang Shangzhe Wu and Jiajun Wu. 2024. Language-Informed Visual Concept Learning. arxiv:https:\/\/arXiv.org\/abs\/2312.03587\u00a0[cs.CV] https:\/\/arxiv.org\/abs\/2312.03587"},{"key":"e_1_3_3_2_24_1","unstructured":"Yuheng Li Haotian Liu Qingyang Wu Fangzhou Mu Jianwei Yang Jianfeng Gao Chunyuan Li and Yong\u00a0Jae Lee. 2023. GLIGEN: Open-Set Grounded Text-to-Image Generation. arxiv:https:\/\/arXiv.org\/abs\/2301.07093\u00a0[cs.CV] https:\/\/arxiv.org\/abs\/2301.07093"},{"key":"e_1_3_3_2_25_1","unstructured":"Haotian Liu Chunyuan Li Yuheng Li and Yong\u00a0Jae Lee. 2023b. Improved Baselines with Visual Instruction Tuning."},{"key":"e_1_3_3_2_26_1","unstructured":"Nan Liu Shuang Li Yilun Du Antonio Torralba and Joshua\u00a0B. Tenenbaum. 2023a. Compositional Visual Generation with Composable Diffusion Models. arxiv:https:\/\/arXiv.org\/abs\/2206.01714\u00a0[cs.CV] https:\/\/arxiv.org\/abs\/2206.01714"},{"key":"e_1_3_3_2_27_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v38i5.28226"},{"key":"e_1_3_3_2_28_1","unstructured":"Alex Nichol Prafulla Dhariwal Aditya Ramesh Pranav Shyam Pamela Mishkin Bob McGrew Ilya Sutskever and Mark Chen. 2021. Glide: Towards photorealistic image generation and editing with text-guided diffusion models. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2112.10741 (2021)."},{"key":"e_1_3_3_2_29_1","unstructured":"OpenAI. 2022. ChatGPT. https:\/\/chat.openai.com\/. Accessed: 2023-10-15."},{"key":"e_1_3_3_2_30_1","doi-asserted-by":"crossref","unstructured":"Or Patashnik Zongze Wu Eli Shechtman Daniel Cohen-Or and Dani Lischinski. 2021. StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2103.17249 (2021).","DOI":"10.1109\/ICCV48922.2021.00209"},{"key":"e_1_3_3_2_31_1","volume-title":"The Twelfth International Conference on Learning Representations","author":"Podell Dustin","year":"2024","unstructured":"Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas M\u00fcller, Joe Penna, and Robin Rombach. 2024. SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis. In The Twelfth International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=di52zR8xgf"},{"key":"e_1_3_3_2_32_1","first-page":"8748","volume-title":"International Conference on Machine Learning","author":"Radford Alec","year":"2021","unstructured":"Alec Radford, Jong\u00a0Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et\u00a0al. 2021. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning. PMLR, 8748\u20138763."},{"key":"e_1_3_3_2_33_1","unstructured":"Aditya Ramesh Prafulla Dhariwal Alex Nichol Casey Chu and Mark Chen. 2022. Hierarchical text-conditional image generation with clip latents. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2204.06125 (2022)."},{"key":"e_1_3_3_2_34_1","unstructured":"Elad Richardson Yuval Alaluf Ali Mahdavi-Amiri and Daniel Cohen-Or. 2024. pOps: Photo-Inspired Diffusion Operators. arxiv:https:\/\/arXiv.org\/abs\/2406.01300\u00a0[cs.CV] https:\/\/arxiv.org\/abs\/2406.01300"},{"key":"e_1_3_3_2_35_1","unstructured":"Robin Rombach Andreas Blattmann Dominik Lorenz Patrick Esser and Bj\u00f6rn Ommer. 2021. High-Resolution Image Synthesis with Latent Diffusion Models. arxiv:https:\/\/arXiv.org\/abs\/2112.10752\u00a0[cs.CV]"},{"key":"e_1_3_3_2_36_1","doi-asserted-by":"crossref","unstructured":"Nataniel Ruiz Yuanzhen Li Varun Jampani Yael Pritch Michael Rubinstein and Kfir Aberman. 2022. DreamBooth: Fine Tuning Text-to-image Diffusion Models for Subject-Driven Generation. (2022).","DOI":"10.1109\/CVPR52729.2023.02155"},{"key":"e_1_3_3_2_37_1","doi-asserted-by":"crossref","unstructured":"Nataniel Ruiz Yuanzhen Li Varun Jampani Wei Wei Tingbo Hou Yael Pritch Neal Wadhwa Michael Rubinstein and Kfir Aberman. 2023. HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models. arxiv:https:\/\/arXiv.org\/abs\/2307.06949\u00a0[cs.CV]","DOI":"10.1109\/CVPR52733.2024.00624"},{"key":"e_1_3_3_2_38_1","unstructured":"Chitwan Saharia William Chan Saurabh Saxena Lala Li Jay Whang Emily Denton Seyed Kamyar\u00a0Seyed Ghasemipour Burcu\u00a0Karagol Ayan S\u00a0Sara Mahdavi Rapha\u00a0Gontijo Lopes et\u00a0al. 2022. Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2205.11487 (2022)."},{"key":"e_1_3_3_2_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00926"},{"key":"e_1_3_3_2_40_1","unstructured":"Yujun Shen and Bolei Zhou. 2021. Closed-Form Factorization of Latent Semantics in GANs. arxiv:https:\/\/arXiv.org\/abs\/2007.06600\u00a0[cs.CV] https:\/\/arxiv.org\/abs\/2007.06600"},{"key":"e_1_3_3_2_41_1","doi-asserted-by":"crossref","unstructured":"Yael Vinker Andrey Voynov Daniel Cohen-Or and Ariel Shamir. 2023. Concept decomposition for visual exploration and inspiration. ACM Transactions on Graphics (TOG) 42 6 (2023) 1\u201313.","DOI":"10.1145\/3618315"},{"key":"e_1_3_3_2_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/3588432.3591560"},{"key":"e_1_3_3_2_43_1","first-page":"9786","volume-title":"International conference on machine learning","author":"Voynov Andrey","year":"2020","unstructured":"Andrey Voynov and Artem Babenko. 2020. Unsupervised discovery of interpretable directions in the gan latent space. In International conference on machine learning. PMLR, 9786\u20139796."},{"key":"e_1_3_3_2_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.01461"},{"key":"e_1_3_3_2_45_1","unstructured":"Hu Ye Jun Zhang Sibo Liu Xiao Han and Wei Yang. 2023. Ip-adapter: Text compatible image prompt adapter for text-to-image diffusion models. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2308.06721 (2023)."},{"key":"e_1_3_3_2_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00355"},{"key":"e_1_3_3_2_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00355"},{"key":"e_1_3_3_2_48_1","unstructured":"Yuxin Zhang Weiming Dong Fan Tang Nisha Huang Haibin Huang Chongyang Ma Tong-Yee Lee Oliver Deussen and Changsheng Xu. 2023a. ProSpect: Prompt Spectrum for Attribute-Aware Personalization of Diffusion Models. arxiv:https:\/\/arXiv.org\/abs\/2305.16225\u00a0[cs.GR] https:\/\/arxiv.org\/abs\/2305.16225"},{"key":"e_1_3_3_2_49_1","unstructured":"Shihao Zhao Dongdong Chen Yen-Chun Chen Jianmin Bao Shaozhe Hao Lu Yuan and Kwan-Yee\u00a0K. Wong. 2023. Uni-ControlNet: All-in-One Control to Text-to-Image Diffusion Models. arxiv:https:\/\/arXiv.org\/abs\/2305.16322\u00a0[cs.CV] https:\/\/arxiv.org\/abs\/2305.16322"},{"key":"e_1_3_3_2_50_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.02154"},{"key":"e_1_3_3_2_51_1","unstructured":"Chenyi Zhuang Ying Hu and Pan Gao. 2024. Magnet: We Never Know How Text-to-Image Diffusion Models Work Until We Learn How Vision-Language Models Function. arxiv:https:\/\/arXiv.org\/abs\/2409.19967\u00a0[cs.CV] https:\/\/arxiv.org\/abs\/2409.19967"}],"event":{"name":"SIGGRAPH Conference Papers '25: Special Interest Group on Computer Graphics and Interactive Techniques Conference Conference Papers","location":"Vancouver BC Canada","acronym":"SIGGRAPH Conference Papers '25","sponsor":["SIGGRAPH ACM Special Interest Group on Computer Graphics and Interactive Techniques"]},"container-title":["Proceedings of the Special Interest Group on Computer Graphics and Interactive Techniques Conference Conference Papers"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3721238.3730624","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,20]],"date-time":"2026-03-20T15:00:46Z","timestamp":1774018846000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3721238.3730624"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,7,27]]},"references-count":50,"alternative-id":["10.1145\/3721238.3730624","10.1145\/3721238"],"URL":"https:\/\/doi.org\/10.1145\/3721238.3730624","relation":{},"subject":[],"published":{"date-parts":[[2025,7,27]]},"assertion":[{"value":"2025-07-27","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}