{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,30]],"date-time":"2026-04-30T16:43:44Z","timestamp":1777567424354,"version":"3.51.4"},"publisher-location":"New York, NY, USA","reference-count":55,"publisher":"ACM","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2025,8,10]]},"DOI":"10.1145\/3721238.3730747","type":"proceedings-article","created":{"date-parts":[[2025,7,23]],"date-time":"2025-07-23T08:42:43Z","timestamp":1753260163000},"page":"1-11","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["PartEdit: Fine-Grained Image Editing using Pre-Trained Diffusion Models"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0009-0005-4414-4457","authenticated-orcid":false,"given":"Aleksandar","family":"Cvejic","sequence":"first","affiliation":[{"name":"King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3292-7153","authenticated-orcid":false,"given":"Abdelrahman","family":"Eldesokey","sequence":"additional","affiliation":[{"name":"King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0627-9746","authenticated-orcid":false,"given":"Peter","family":"Wonka","sequence":"additional","affiliation":[{"name":"King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,7,27]]},"reference":[{"key":"e_1_3_3_3_2_1","unstructured":"Alex Andonian Sabrina Osmany Audrey Cui YeonHwan Park Ali Jahanian Antonio Torralba and David Bau. 2023. Paint by Word. arxiv:https:\/\/arXiv.org\/abs\/2103.10951\u00a0[cs.CV] https:\/\/arxiv.org\/abs\/2103.10951"},{"key":"e_1_3_3_3_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/3610548.3618154"},{"key":"e_1_3_3_3_4_1","doi-asserted-by":"crossref","unstructured":"Omri Avrahami Ohad Fried and Dani Lischinski. 2023b. Blended Latent Diffusion. ACM Trans. Graph. 42 4 Article 149 (jul 2023) 11\u00a0pages. https:\/\/doi.org\/10.1145\/3592450","DOI":"10.1145\/3592450"},{"key":"e_1_3_3_3_5_1","unstructured":"Omri Avrahami Or Patashnik Ohad Fried Egor Nemchinov Kfir Aberman Dani Lischinski and Daniel Cohen-Or. 2024. Stable Flow: Vital Layers for Training-Free Image Editing. arxiv:https:\/\/arXiv.org\/abs\/2411.14430\u00a0[cs.CV] https:\/\/arxiv.org\/abs\/2411.14430"},{"key":"e_1_3_3_3_6_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-19784-0_41"},{"key":"e_1_3_3_3_7_1","unstructured":"Black Forest Labs. 2024. Flux. https:\/\/github.com\/black-forest-labs\/flux."},{"key":"e_1_3_3_3_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00846"},{"key":"e_1_3_3_3_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.01764"},{"key":"e_1_3_3_3_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.02062"},{"key":"e_1_3_3_3_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00630"},{"key":"e_1_3_3_3_12_1","unstructured":"Xi Chen Zhifei Zhang He Zhang Yuqian Zhou Soo\u00a0Ye Kim Qing Liu Yijun Li Jianming Zhang Nanxuan Zhao Yilin Wang et\u00a0al. 2024b. UniReal: Universal Image Generation and Editing via Learning Real-world Dynamics. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2412.07774 (2024)."},{"key":"e_1_3_3_3_13_1","doi-asserted-by":"crossref","unstructured":"Gilad Deutch Rinon Gal Daniel Garibi Or Patashnik and Daniel Cohen-Or. 2024. TurboEdit: Text-Based Image Editing Using Few-Step Diffusion Models. arxiv:https:\/\/arXiv.org\/abs\/2408.00735\u00a0[cs.CV] https:\/\/arxiv.org\/abs\/2408.00735","DOI":"10.1145\/3680528.3687612"},{"key":"e_1_3_3_3_14_1","doi-asserted-by":"crossref","unstructured":"Ivan Donadello and Luciano Serafini. 2016. Integration of numeric and symbolic information for semantic image interpretation. Intelligenza Artificiale 10 1 (2016) 33\u201347.","DOI":"10.3233\/IA-160093"},{"key":"e_1_3_3_3_15_1","doi-asserted-by":"crossref","unstructured":"Dave Epstein Allan Jabri Ben Poole Alexei Efros and Aleksander Holynski. 2023. Diffusion self-guidance for controllable image generation. Advances in Neural Information Processing Systems 36 (2023) 16222\u201316239.","DOI":"10.52202\/075280-0714"},{"key":"e_1_3_3_3_16_1","volume-title":"Forty-first International Conference on Machine Learning","author":"Esser Patrick","year":"2024","unstructured":"Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas M\u00fcller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, et\u00a0al. 2024. Scaling rectified flow transformers for high-resolution image synthesis. In Forty-first International Conference on Machine Learning."},{"key":"e_1_3_3_3_17_1","unstructured":"Rinon Gal Yuval Alaluf Yuval Atzmon Or Patashnik Amit\u00a0H. Bermano Gal Chechik and Daniel Cohen-Or. 2022. An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion. https:\/\/doi.org\/10.48550\/ARXIV.2208.01618"},{"key":"e_1_3_3_3_18_1","first-page":"395","volume-title":"European Conference on Computer Vision","author":"Garibi Daniel","year":"2024","unstructured":"Daniel Garibi, Or Patashnik, Andrey Voynov, Hadar Averbuch-Elor, and Daniel Cohen-Or. 2024. Renoise: Real image inversion through iterative noising. In European Conference on Computer Vision. Springer, 395\u2013413."},{"key":"e_1_3_3_3_19_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-20074-8_8"},{"key":"e_1_3_3_3_20_1","unstructured":"Eric Hedlin Gopal Sharma Shweta Mahajan Xingzhe He Hossam Isack Abhishek Kar\u00a0Helge Rhodin Andrea Tagliasacchi and Kwang\u00a0Moo Yi. 2023. Unsupervised Keypoints from Pretrained Diffusion Models. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2312.00065 (2023)."},{"key":"e_1_3_3_3_21_1","unstructured":"Amir Hertz Ron Mokady Jay Tenenbaum Kfir Aberman Yael Pritch and Daniel Cohen-Or. 2022. Prompt-to-prompt image editing with cross attention control. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2208.01626 (2022)."},{"key":"e_1_3_3_3_22_1","unstructured":"Amir Hertz Andrey Voynov Shlomi Fruchter and Daniel Cohen-Or. 2023. Style aligned image generation via shared attention. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2312.02133 (2023)."},{"key":"e_1_3_3_3_23_1","unstructured":"Yi Huang Jiancheng Huang Yifan Liu Mingfu Yan Jiaxi Lv Jianzhuang Liu Wei Xiong He Zhang Shifeng Chen and Liangliang Cao. 2024. Diffusion model-based image editing: A survey. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2402.17525 (2024)."},{"key":"e_1_3_3_3_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.01185"},{"key":"e_1_3_3_3_25_1","first-page":"150","volume-title":"European Conference on Computer Vision","author":"Ju Xuan","year":"2024","unstructured":"Xuan Ju, Xian Liu, Xintao Wang, Yuxuan Bian, Ying Shan, and Qiang Xu. 2024a. Brushnet: A plug-and-play image inpainting model with decomposed dual-branch diffusion. In European Conference on Computer Vision. Springer, 150\u2013168."},{"key":"e_1_3_3_3_26_1","unstructured":"Xuan Ju Ailing Zeng Yuxuan Bian Shaoteng Liu and Qiang Xu. 2024b. PnP Inversion: Boosting Diffusion-based Editing with 3 Lines of Code. International Conference on Learning Representations (ICLR) (2024)."},{"key":"e_1_3_3_3_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.00582"},{"key":"e_1_3_3_3_28_1","volume-title":"The Twelfth International Conference on Learning Representations","author":"Khani Aliasghar","year":"2024","unstructured":"Aliasghar Khani, Saeid Asgari, Aditya Sanghi, Ali\u00a0Mahdavi Amiri, and Ghassan Hamarneh. 2024. SLiMe: Segment Like Me. In The Twelfth International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=7FeIRqCedv"},{"key":"e_1_3_3_3_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00371"},{"key":"e_1_3_3_3_30_1","first-page":"19730","volume-title":"International conference on machine learning","author":"Li Junnan","year":"2023","unstructured":"Junnan Li, Dongxu Li, Silvio Savarese, and Steven Hoi. 2023. Blip-2: Bootstrapping language-image pre-training with frozen image encoders and large language models. In International conference on machine learning. PMLR, 19730\u201319742."},{"key":"e_1_3_3_3_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00825"},{"key":"e_1_3_3_3_32_1","unstructured":"Weifeng Lin Xinyu Wei Renrui Zhang Le Zhuo Shitian Zhao Siyuan Huang Huan Teng Junlin Xie Yu Qiao Peng Gao et\u00a0al. 2024. Pixwizard: Versatile image-to-image visual assistant with open-language instructions. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2409.15278 (2024)."},{"key":"e_1_3_3_3_33_1","unstructured":"Yuanze Lin Yi-Wen Chen Yi-Hsuan Tsai Lu Jiang and Ming-Hsuan Yang. 2023. Text-Driven Image Editing via Learnable Regions. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2311.16432 (2023)."},{"key":"e_1_3_3_3_34_1","unstructured":"Shilong Liu Zhaoyang Zeng Tianhe Ren Feng Li Hao Zhang Jie Yang Chunyuan Li Jianwei Yang Hang Su Jun Zhu et\u00a0al. 2023. Grounding dino: Marrying dino with grounded pre-training for open-set object detection. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2303.05499 (2023)."},{"key":"e_1_3_3_3_35_1","doi-asserted-by":"crossref","unstructured":"Pablo Marcos-Manch\u00f3n Roberto Alcover-Couso Juan\u00a0C SanMiguel and Jose\u00a0M Mart\u00ednez. 2024. Open-Vocabulary Attention Maps with Token Optimization for Semantic Segmentation in Diffusion Models. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2403.14291 (2024).","DOI":"10.1109\/CVPR52733.2024.00883"},{"key":"e_1_3_3_3_36_1","unstructured":"Chenlin Meng Yutong He Yang Song Jiaming Song Jiajun Wu Jun-Yan Zhu and Stefano Ermon. 2021. Sdedit: Guided image synthesis and editing with stochastic differential equations. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2108.01073 (2021)."},{"key":"e_1_3_3_3_37_1","unstructured":"Alex Nichol Prafulla Dhariwal Aditya Ramesh Pranav Shyam Pamela Mishkin Bob McGrew Ilya Sutskever and Mark Chen. 2022. GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models. arxiv:https:\/\/arXiv.org\/abs\/2112.10741\u00a0[cs.CV] https:\/\/arxiv.org\/abs\/2112.10741"},{"key":"e_1_3_3_3_38_1","doi-asserted-by":"crossref","unstructured":"Nobuyuki Otsu et\u00a0al. 1975. A threshold selection method from gray-level histograms. Automatica 11 285-296 (1975) 23\u201327.","DOI":"10.1016\/0005-1098(75)90044-8"},{"key":"e_1_3_3_3_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/3588432.3591513"},{"key":"e_1_3_3_3_40_1","volume-title":"The Twelfth International Conference on Learning Representations","author":"Podell Dustin","year":"2024","unstructured":"Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas M\u00fcller, Joe Penna, and Robin Rombach. 2024. SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis. In The Twelfth International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=di52zR8xgf"},{"key":"e_1_3_3_3_41_1","first-page":"8748","volume-title":"International conference on machine learning","author":"Radford Alec","year":"2021","unstructured":"Alec Radford, Jong\u00a0Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et\u00a0al. 2021. Learning transferable visual models from natural language supervision. In International conference on machine learning. PMLR, 8748\u20138763."},{"key":"e_1_3_3_3_42_1","unstructured":"Aditya Ramesh Prafulla Dhariwal Alex Nichol Casey Chu and Mark Chen. 2022. Hierarchical text-conditional image generation with clip latents. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2204.06125 1 2 (2022) 3."},{"key":"e_1_3_3_3_43_1","unstructured":"Tianhe Ren Shilong Liu Ailing Zeng Jing Lin Kunchang Li He Cao Jiayu Chen Xinyu Huang Yukang Chen Feng Yan Zhaoyang Zeng Hao Zhang Feng Li Jie Yang Hongyang Li Qing Jiang and Lei Zhang. 2024. Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks. arxiv:https:\/\/arXiv.org\/abs\/2401.14159\u00a0[cs.CV] https:\/\/arxiv.org\/abs\/2401.14159"},{"key":"e_1_3_3_3_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01042"},{"key":"e_1_3_3_3_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00661"},{"key":"e_1_3_3_3_46_1","doi-asserted-by":"crossref","unstructured":"Chitwan Saharia William Chan Saurabh Saxena Lala Li Jay Whang Emily\u00a0L Denton Kamyar Ghasemipour Raphael Gontijo\u00a0Lopes Burcu Karagol\u00a0Ayan Tim Salimans et\u00a0al. 2022. Photorealistic text-to-image diffusion models with deep language understanding. Advances in Neural Information Processing Systems 35 (2022) 36479\u201336494.","DOI":"10.52202\/068431-2643"},{"key":"e_1_3_3_3_47_1","doi-asserted-by":"crossref","unstructured":"Christoph Schuhmann Romain Beaumont Richard Vencu Cade Gordon Ross Wightman Mehdi Cherti Theo Coombes Aarush Katta Clayton Mullis Mitchell Wortsman et\u00a0al. 2022. Laion-5b: An open large-scale dataset for training next generation image-text models. Advances in Neural Information Processing Systems 35 (2022) 25278\u201325294.","DOI":"10.52202\/068431-1833"},{"key":"e_1_3_3_3_48_1","unstructured":"Jiaming Song Chenlin Meng and Stefano Ermon. 2020. Denoising diffusion implicit models. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2010.02502 (2020)."},{"key":"e_1_3_3_3_49_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.01237"},{"key":"e_1_3_3_3_50_1","doi-asserted-by":"crossref","unstructured":"Luming Tang Nataniel Ruiz Qinghao Chu Yuanzhen Li Aleksander Holynski David\u00a0E Jacobs Bharath Hariharan Yael Pritch Neal Wadhwa Kfir Aberman et\u00a0al. 2024. Realfill: Reference-driven generation for authentic image completion. ACM Transactions on Graphics (TOG) 43 4 (2024) 1\u201312.","DOI":"10.1145\/3658237"},{"key":"e_1_3_3_3_51_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.acl-long.310"},{"key":"e_1_3_3_3_52_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.00191"},{"key":"e_1_3_3_3_53_1","doi-asserted-by":"crossref","unstructured":"Dani Valevski Matan Kalman Eyal Molad Eyal Segalis Yossi Matias and Yaniv Leviathan. 2023. Unitune: Text-driven image editing by fine tuning a diffusion model on a single image. ACM Transactions on Graphics (TOG) 42 4 (2023) 1\u201310.","DOI":"10.1145\/3592451"},{"key":"e_1_3_3_3_54_1","doi-asserted-by":"crossref","unstructured":"Thomas Wolf Lysandre Debut Victor Sanh Julien Chaumond Clement Delangue Anthony Moi Pierric Cistac Tim Rault R\u00e9mi Louf Morgan Funtowicz Joe Davison Sam Shleifer Patrick von Platen Clara Ma Yacine Jernite Julien Plu Canwen Xu Teven\u00a0Le Scao Sylvain Gugger Mariama Drame Quentin Lhoest and Alexander\u00a0M. Rush. 2020. HuggingFace\u2019s Transformers: State-of-the-art Natural Language Processing. arxiv:https:\/\/arXiv.org\/abs\/1910.03771\u00a0[cs.CL] https:\/\/arxiv.org\/abs\/1910.03771","DOI":"10.18653\/v1\/2020.emnlp-demos.6"},{"key":"e_1_3_3_3_55_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00903"},{"key":"e_1_3_3_3_56_1","doi-asserted-by":"crossref","unstructured":"Kaiyang Zhou Jingkang Yang Chen\u00a0Change Loy and Ziwei Liu. 2022. Learning to prompt for vision-language models. International Journal of Computer Vision 130 9 (2022) 2337\u20132348.","DOI":"10.1007\/s11263-022-01653-1"}],"event":{"name":"SIGGRAPH Conference Papers '25: Special Interest Group on Computer Graphics and Interactive Techniques Conference Conference Papers","location":"Vancouver BC Canada","acronym":"SIGGRAPH Conference Papers '25","sponsor":["SIGGRAPH ACM Special Interest Group on Computer Graphics and Interactive Techniques"]},"container-title":["Proceedings of the Special Interest Group on Computer Graphics and Interactive Techniques Conference Conference Papers"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3721238.3730747","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,20]],"date-time":"2026-03-20T15:01:37Z","timestamp":1774018897000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3721238.3730747"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,7,27]]},"references-count":55,"alternative-id":["10.1145\/3721238.3730747","10.1145\/3721238"],"URL":"https:\/\/doi.org\/10.1145\/3721238.3730747","relation":{},"subject":[],"published":{"date-parts":[[2025,7,27]]},"assertion":[{"value":"2025-07-27","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}