{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,20]],"date-time":"2026-03-20T16:00:59Z","timestamp":1774022459998,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":56,"publisher":"ACM","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2025,8,10]]},"DOI":"10.1145\/3721238.3730631","type":"proceedings-article","created":{"date-parts":[[2025,7,23]],"date-time":"2025-07-23T08:42:43Z","timestamp":1753260163000},"page":"1-12","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Be Decisive: Noise-Induced Layouts for Multi-Subject Generation"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0448-9301","authenticated-orcid":false,"given":"Omer","family":"Dahary","sequence":"first","affiliation":[{"name":"Tel Aviv University, Tel Aviv, Israel and Snap Research, Tel Aviv, Israel"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-7919-4301","authenticated-orcid":false,"given":"Yehonathan","family":"Cohen","sequence":"additional","affiliation":[{"name":"Tel Aviv University, Tel Aviv, Israel"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7757-6137","authenticated-orcid":false,"given":"Or","family":"Patashnik","sequence":"additional","affiliation":[{"name":"Tel Aviv University, Tel Aviv, Israel and Snap Research, Tel Aviv, Israel"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4958-601X","authenticated-orcid":false,"given":"Kfir","family":"Aberman","sequence":"additional","affiliation":[{"name":"Snap Research, Palo Alto, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6777-7445","authenticated-orcid":false,"given":"Daniel","family":"Cohen-Or","sequence":"additional","affiliation":[{"name":"Tel Aviv University, Tel Aviv, Israel and Snap Research, Tel Aviv, Israel"}]}],"member":"320","published-online":{"date-parts":[[2025,7,27]]},"reference":[{"key":"e_1_3_3_2_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00217"},{"key":"e_1_3_3_2_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.01762"},{"key":"e_1_3_3_2_4_1","unstructured":"Yuanhao Ban Ruochen Wang Tianyi Zhou Boqing Gong Cho-Jui Hsieh and Minhao Cheng. 2024. The Crystal Ball Hypothesis in diffusion models: Anticipating object positions from initial noise. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2406.01970 (2024)."},{"key":"e_1_3_3_2_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/3641519.3657527"},{"key":"e_1_3_3_2_6_1","unstructured":"Omer Bar-Tal Lior Yariv Yaron Lipman and Tali Dekel. 2023. Multidiffusion: Fusing diffusion paths for controlled image generation. (2023)."},{"key":"e_1_3_3_2_7_1","unstructured":"Lital Binyamin Yoad Tewel Hilit Segev Eran Hirsch Royi Rassin and Gal Chechik. 2024. Make It Count: Text-to-Image Generation with an Accurate Number of Objects. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2406.10210 (2024)."},{"key":"e_1_3_3_2_8_1","doi-asserted-by":"crossref","unstructured":"Hila Chefer Yuval Alaluf Yael Vinker Lior Wolf and Daniel Cohen-Or. 2023. Attend-and-excite: Attention-based semantic guidance for text-to-image diffusion models. ACM Transactions on Graphics (TOG) 42 4 (2023) 1\u201310.","DOI":"10.1145\/3592116"},{"key":"e_1_3_3_2_9_1","unstructured":"Minghao Chen Iro Laina and Andrea Vedaldi. 2023a. Training-free layout control with cross-attention guidance. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2304.03373 (2023)."},{"key":"e_1_3_3_2_10_1","unstructured":"Xiaohui Chen Yongfei Liu Yingxiang Yang Jianbo Yuan Quanzeng You Li-Ping Liu and Hongxia Yang. 2023b. Reason out your layout: Evoking the layout master from large language models for text-to-image synthesis. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2311.17126 (2023)."},{"key":"e_1_3_3_2_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00207"},{"key":"e_1_3_3_2_12_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-72630-9_25"},{"key":"e_1_3_3_2_13_1","unstructured":"Prafulla Dhariwal and Alexander Nichol. 2021. Diffusion models beat gans on image synthesis. Advances in neural information processing systems 34 (2021) 8780\u20138794."},{"key":"e_1_3_3_2_14_1","doi-asserted-by":"crossref","unstructured":"Dave Epstein Allan Jabri Ben Poole Alexei Efros and Aleksander Holynski. 2023. Diffusion self-guidance for controllable image generation. Advances in Neural Information Processing Systems 36 (2023) 16222\u201316239.","DOI":"10.52202\/075280-0714"},{"key":"e_1_3_3_2_15_1","unstructured":"Weixi Feng Xuehai He Tsu-Jui Fu Varun Jampani Arjun Akula Pradyumna Narayana Sugato Basu Xin\u00a0Eric Wang and William\u00a0Yang Wang. 2022. Training-free structured diffusion guidance for compositional text-to-image synthesis. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2212.05032 (2022)."},{"key":"e_1_3_3_2_16_1","unstructured":"Weixi Feng Wanrong Zhu Tsu-jui Fu Varun Jampani Arjun Akula Xuehai He Sugato Basu Xin\u00a0Eric Wang and William\u00a0Yang Wang. 2024b. Layoutgpt: Compositional visual planning and generation with large language models. Advances in Neural Information Processing Systems 36 (2024)."},{"key":"e_1_3_3_2_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00454"},{"key":"e_1_3_3_2_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00694"},{"key":"e_1_3_3_2_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00896"},{"key":"e_1_3_3_2_20_1","unstructured":"Amir Hertz Ron Mokady Jay Tenenbaum Kfir Aberman Yael Pritch and Daniel Cohen-Or. 2022. Prompt-to-prompt image editing with cross attention control. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2208.01626 (2022)."},{"key":"e_1_3_3_2_21_1","doi-asserted-by":"crossref","unstructured":"Kaiyi Huang Kaiyue Sun Enze Xie Zhenguo Li and Xihui Liu. 2023. T2i-compbench: A comprehensive benchmark for open-world compositional text-to-image generation. Advances in Neural Information Processing Systems 36 (2023) 78723\u201378747.","DOI":"10.52202\/075280-3443"},{"key":"e_1_3_3_2_22_1","unstructured":"Wonjun Kang Kevin Galim and Hyung\u00a0Il Koo. 2023. Counting guidance for high fidelity text-to-image synthesis. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2306.17567 (2023)."},{"key":"e_1_3_3_2_23_1","unstructured":"Gyeongnyeon Kim Wooseok Jang Gyuseong Lee Susung Hong Junyoung Seo and Seungryong Kim. 2022. Dag: Depth-aware guidance with denoising diffusion probabilistic models. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2212.08861 (2022)."},{"key":"e_1_3_3_2_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00708"},{"key":"e_1_3_3_2_25_1","unstructured":"Yumeng Li Margret Keuper Dan Zhang and Anna Khoreva. 2023a. Divide & bind your attention for improved generative semantic nursing. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2307.10864 (2023)."},{"key":"e_1_3_3_2_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.02156"},{"key":"e_1_3_3_2_27_1","unstructured":"Long Lian Boyi Li Adam Yala and Trevor Darrell. 2023. Llm-grounded diffusion: Enhancing prompt understanding of text-to-image diffusion models with large language models. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2305.13655 (2023)."},{"key":"e_1_3_3_2_28_1","doi-asserted-by":"crossref","unstructured":"Zhiheng Liu Yifei Zhang Yujun Shen Kecheng Zheng Kai Zhu Ruili Feng Yu Liu Deli Zhao Jingren Zhou and Yang Cao. 2023. Customizable image synthesis with multiple subjects. Advances in neural information processing systems 36 (2023) 57500\u201357519.","DOI":"10.52202\/075280-2508"},{"key":"e_1_3_3_2_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00785"},{"key":"e_1_3_3_2_30_1","first-page":"9005","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Meral Tuna Han\u00a0Salih","year":"2024","unstructured":"Tuna Han\u00a0Salih Meral, Enis Simsar, Federico Tombari, and Pinar Yanardag. 2024. Conform: Contrast is all you need for high-fidelity text-to-image diffusion models. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 9005\u20139014."},{"key":"e_1_3_3_2_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/3DV.2016.79"},{"key":"e_1_3_3_2_32_1","unstructured":"Weili Nie Sifei Liu Morteza Mardani Chao Liu Benjamin Eckart and Arash Vahdat. 2024. Compositional Text-to-Image Generation with Dense Blob Representations. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2405.08246 (2024)."},{"key":"e_1_3_3_2_33_1","doi-asserted-by":"crossref","unstructured":"Or Patashnik Rinon Gal Daniil Ostashev Sergey Tulyakov Kfir Aberman and Daniel Cohen-Or. 2025. Nested Attention: Semantic-aware Attention Values for Concept Personalization. arxiv:https:\/\/arXiv.org\/abs\/2501.01407\u00a0[cs.CV]","DOI":"10.1145\/3721238.3730634"},{"key":"e_1_3_3_2_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.02107"},{"key":"e_1_3_3_2_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00758"},{"key":"e_1_3_3_2_36_1","unstructured":"Dustin Podell Zion English Kyle Lacey Andreas Blattmann Tim Dockhorn Jonas M\u00fcller Joe Penna and Robin Rombach. 2023. Sdxl: Improving latent diffusion models for high-resolution image synthesis. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2307.01952 (2023)."},{"key":"e_1_3_3_2_37_1","unstructured":"Leigang Qu Shengqiong Wu Hao Fei Liqiang Nie and Tat-Seng Chua. 2023. LayoutLLM-T2I: Eliciting Layout Guidance from LLM for Text-to-Image Generation. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2308.05095 (2023)."},{"key":"e_1_3_3_2_38_1","unstructured":"Aditya Ramesh Prafulla Dhariwal Alex Nichol Casey Chu and Mark Chen. 2022. Hierarchical text-conditional image generation with clip latents. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2204.06125 1 2 (2022) 3."},{"key":"e_1_3_3_2_39_1","unstructured":"Royi Rassin Eran Hirsch Daniel Glickman Shauli Ravfogel Yoav Goldberg and Gal Chechik. 2023. Linguistic Binding in Diffusion Models: Enhancing Attribute Correspondence through Attention Map Alignment. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2306.08877 (2023)."},{"key":"e_1_3_3_2_40_1","unstructured":"Tianhe Ren Shilong Liu Ailing Zeng Jing Lin Kunchang Li He Cao Jiayu Chen Xinyu Huang Yukang Chen Feng Yan et\u00a0al. 2024a. Grounded sam: Assembling open-world models for diverse visual tasks. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2401.14159 (2024)."},{"key":"e_1_3_3_2_41_1","unstructured":"Tianhe Ren Shilong Liu Ailing Zeng Jing Lin Kunchang Li He Cao Jiayu Chen Xinyu Huang Yukang Chen Feng Yan Zhaoyang Zeng Hao Zhang Feng Li Jie Yang Hongyang Li Qing Jiang and Lei Zhang. 2024b. Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks. ArXiv abs\/2401.14159 (2024). https:\/\/api.semanticscholar.org\/CorpusID:267212047"},{"key":"e_1_3_3_2_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01042"},{"key":"e_1_3_3_2_43_1","doi-asserted-by":"crossref","unstructured":"Chitwan Saharia William Chan Saurabh Saxena Lala Li Jay Whang Emily\u00a0L Denton Kamyar Ghasemipour Raphael Gontijo\u00a0Lopes Burcu Karagol\u00a0Ayan Tim Salimans et\u00a0al. 2022. Photorealistic text-to-image diffusion models with deep language understanding. Advances in neural information processing systems 35 (2022) 36479\u201336494.","DOI":"10.52202\/068431-2643"},{"key":"e_1_3_3_2_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298682"},{"key":"e_1_3_3_2_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.00191"},{"key":"e_1_3_3_2_46_1","unstructured":"Hazarapet Tunanyan Dejia Xu Shant Navasardyan Zhangyang Wang and Humphrey Shi. 2023. Multi-Concept T2I-Zero: Tweaking Only The Text Embeddings and Nothing Else. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2310.07419 (2023)."},{"key":"e_1_3_3_2_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/3588432.3591560"},{"key":"e_1_3_3_2_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00596"},{"key":"e_1_3_3_2_49_1","unstructured":"Xierui Wang Siming Fu Qihan Huang Wanggui He and Hao Jiang. 2024b. Ms-diffusion: Multi-subject zero-shot image personalization with layout guidance. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2406.07209 (2024)."},{"key":"e_1_3_3_2_50_1","unstructured":"Zirui Wang Zhizhou Sha Zheng Ding Yilin Wang and Zhuowen Tu. 2023. Tokencompose: Grounding diffusion with token-level supervision. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2312.03626 (2023)."},{"key":"e_1_3_3_2_51_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00685"},{"key":"e_1_3_3_2_52_1","volume-title":"Forty-first International Conference on Machine Learning","author":"Yang Ling","year":"2024","unstructured":"Ling Yang, Zhaochen Yu, Chenlin Meng, Minkai Xu, Stefano Ermon, and CUI Bin. 2024. Mastering text-to-image diffusion: Recaptioning, planning, and generating with multimodal llms. In Forty-first International Conference on Machine Learning."},{"key":"e_1_3_3_2_53_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.01369"},{"key":"e_1_3_3_2_54_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00355"},{"key":"e_1_3_3_2_55_1","volume-title":"R0-FoMo: Robustness of Few-shot and Zero-shot Learning in Large Foundation Models","author":"Zhang Ruisu","year":"2023","unstructured":"Ruisu Zhang, Yicong Chen, and Kangwook Lee. 2023a. Zero-shot Improvement of Object Counting with CLIP. In R0-FoMo: Robustness of Few-shot and Zero-shot Learning in Large Foundation Models."},{"key":"e_1_3_3_2_56_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.02154"},{"key":"e_1_3_3_2_57_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00651"}],"event":{"name":"SIGGRAPH Conference Papers '25: Special Interest Group on Computer Graphics and Interactive Techniques Conference Conference Papers","location":"Vancouver BC Canada","acronym":"SIGGRAPH Conference Papers '25","sponsor":["SIGGRAPH ACM Special Interest Group on Computer Graphics and Interactive Techniques"]},"container-title":["Proceedings of the Special Interest Group on Computer Graphics and Interactive Techniques Conference Conference Papers"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3721238.3730631","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,20]],"date-time":"2026-03-20T15:03:32Z","timestamp":1774019012000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3721238.3730631"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,7,27]]},"references-count":56,"alternative-id":["10.1145\/3721238.3730631","10.1145\/3721238"],"URL":"https:\/\/doi.org\/10.1145\/3721238.3730631","relation":{},"subject":[],"published":{"date-parts":[[2025,7,27]]},"assertion":[{"value":"2025-07-27","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}