{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,23]],"date-time":"2025-12-23T00:29:59Z","timestamp":1766449799267,"version":"3.44.0"},"publisher-location":"New York, NY, USA","reference-count":61,"publisher":"ACM","license":[{"start":{"date-parts":[[2024,12,2]],"date-time":"2024-12-02T00:00:00Z","timestamp":1733097600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc-nd\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100006374","name":"National Science Foundation","doi-asserted-by":"publisher","award":["CNS-2241303, CNS-1949650"],"award-info":[{"award-number":["CNS-2241303, CNS-1949650"]}],"id":[{"id":"10.13039\/501100006374","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2024,12,2]]},"DOI":"10.1145\/3658644.3690205","type":"proceedings-article","created":{"date-parts":[[2024,12,9]],"date-time":"2024-12-09T12:19:20Z","timestamp":1733746760000},"page":"1211-1225","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["Understanding Implosion in Text-to-Image Generative Models"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0009-0009-1561-3524","authenticated-orcid":false,"given":"Wenxin","family":"Ding","sequence":"first","affiliation":[{"name":"University of Chicago, Chicago, IL, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-1070-1997","authenticated-orcid":false,"given":"Cathy Y.","family":"Li","sequence":"additional","affiliation":[{"name":"University of Chicago, Chicago, IL, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-4324-7817","authenticated-orcid":false,"given":"Shawn","family":"Shan","sequence":"additional","affiliation":[{"name":"University 
of Chicago, Chicago, IL, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-8909-0494","authenticated-orcid":false,"given":"Ben Y.","family":"Zhao","sequence":"additional","affiliation":[{"name":"University of Chicago, Chicago, IL, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5918-2940","authenticated-orcid":false,"given":"Haitao","family":"Zheng","sequence":"additional","affiliation":[{"name":"University of Chicago, Chicago, IL, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2024,12,9]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"Philipp Benz Chaoning Zhang Adil Karjauv and In So Kweon. 2021. Robustness may be at odds with fairness: An empirical study on class-wise accuracy. In NeurIPS pre-registration workshop."},{"volume-title":"Natural language processing with Python: analyzing text with the natural language toolkit. \"O'Reilly Media","author":"Bird Steven","key":"e_1_3_2_1_2_1","unstructured":"Steven Bird, Ewan Klein, and Edward Loper. 2009. Natural language processing with Python: analyzing text with the natural language toolkit. \"O'Reilly Media, Inc.\"."},{"volume-title":"Proc. of IEEE S&P.","author":"Nicholas","key":"e_1_3_2_1_3_1","unstructured":"Nicholas Carlini et al. 2024. Poisoning web-scale training datasets is practical. In Proc. of IEEE S&P."},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00356"},{"volume-title":"Proc. of ICML.","author":"Liqun","key":"e_1_3_2_1_5_1","unstructured":"Liqun Chen et al. 2020. Graph optimal transport for cross-domain alignment. In Proc. of ICML."},{"key":"e_1_3_2_1_6_1","volume-title":"Training-free layout control with cross-attention guidance. arXiv:2304.03373","author":"Chen Minghao","year":"2023","unstructured":"Minghao Chen, Iro Laina, and Andrea Vedaldi. 2023. 
Training-free layout control with cross-attention guidance. arXiv:2304.03373 (2023)."},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.00393"},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.00391"},{"volume-title":"Proc. of CVPR.","author":"Jia","key":"e_1_3_2_1_9_1","unstructured":"Jia Deng et al. 2009. Imagenet: A large-scale hierarchical image database. In Proc. of CVPR."},{"volume-title":"Proc. of NeurIPS.","author":"Ming","key":"e_1_3_2_1_10_1","unstructured":"Ming Ding et al. 2021. Cogview: Mastering text-to-image generation via transformers. In Proc. of NeurIPS."},{"key":"e_1_3_2_1_11_1","volume-title":"Understanding Implosion in Text-to-Image Generative Models. arXiv:2409.12314","author":"Ding Wenxin","year":"2024","unstructured":"Wenxin Ding, Cathy Y Li, Shawn Shan, Ben Y Zhao, and Haitao Zheng. 2024. Understanding Implosion in Text-to-Image Generative Models. arXiv:2409.12314 (2024)."},{"key":"e_1_3_2_1_12_1","unstructured":"Patrick Esser et al. 2024. Scaling rectified flow transformers for high-resolution image synthesis. arXiv:2403.03206 (2024)."},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00230"},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/WACV57701.2024.00503"},{"volume-title":"Proc. of IEEE TPAMI.","author":"Micah","key":"e_1_3_2_1_15_1","unstructured":"Micah Goldblum et al. 2023. Dataset security for machine learning: data poisoning, backdoor attacks, and defenses. In Proc. of IEEE TPAMI."},{"key":"e_1_3_2_1_16_1","volume-title":"Proc. of NeurIPS.","author":"Heng Alvin","year":"2023","unstructured":"Alvin Heng and Harold Soh. 2023. Selective amnesia: A continual learning approach to forgetting in deep generative models. In Proc. of NeurIPS."},{"key":"e_1_3_2_1_17_1","volume-title":"CLIP knows image aesthetics. 
Frontiers in Artificial Intelligence","author":"Hentschel Simon","year":"2022","unstructured":"Simon Hentschel, Konstantin Kobs, and Andreas Hotho. 2022. CLIP knows image aesthetics. Frontiers in Artificial Intelligence (2022)."},{"key":"e_1_3_2_1_18_1","unstructured":"Amir Hertz et al. 2022. Prompt-to-prompt image editing with cross attention control. arXiv:2208.01626 (2022)."},{"key":"e_1_3_2_1_19_1","unstructured":"Sadeep Jayasumana et al. 2023. Rethinking FID: Towards a better evaluation metric for image generation. arXiv:2401.09603 (2023)."},{"key":"e_1_3_2_1_20_1","volume-title":"Text-image alignment for diffusion-based perception. arXiv:2310.00031","author":"Kondapaneni Neehar","year":"2024","unstructured":"Neehar Kondapaneni, Markus Marks, Manuel Knott, Rogerio Guimaraes, and Pietro Perona. 2024. Text-image alignment for diffusion-based perception. arXiv:2310.00031 (2024)."},{"key":"e_1_3_2_1_21_1","volume-title":"Data redaction from conditional generative models. arXiv:2305.11351","author":"Kong Zhifeng","year":"2023","unstructured":"Zhifeng Kong and Kamalika Chaudhuri. 2023. Data redaction from conditional generative models. arXiv:2305.11351 (2023)."},{"key":"e_1_3_2_1_22_1","unstructured":"Alex Krizhevsky and Geoffrey Hinton. 2009. Learning multiple layers of features from tiny images. Technical Report. University of Toronto."},{"key":"e_1_3_2_1_23_1","volume-title":"The role of ImageNet classes in Fr\u00e9chet inception distance. arXiv:2203.06026","author":"Kynk\u00e4\u00e4nniemi Tuomas","year":"2022","unstructured":"Tuomas Kynk\u00e4\u00e4nniemi, Tero Karras, Miika Aittala, Timo Aila, and Jaakko Lehtinen. 2022. The role of ImageNet classes in Fr\u00e9chet inception distance. arXiv:2203.06026 (2022)."},{"volume-title":"Proc. of USENIX Security.","author":"Zheng","key":"e_1_3_2_1_24_1","unstructured":"Zheng Li et al. 2023. UnGANable: Defending against GAN-based face manipulation. In Proc. of USENIX Security."},{"volume-title":"Proc. 
of ICML.","author":"Chumeng","key":"e_1_3_2_1_25_1","unstructured":"Chumeng Liang et al. 2023. Adversarial example does good: Preventing painting imitation from diffusion models via adversarial examples. In Proc. of ICML."},{"key":"e_1_3_2_1_26_1","volume-title":"Towards understanding cross and self-attention in stable diffusion for text-guided image editing. arXiv:2403.03431","author":"Liu Bingyan","year":"2024","unstructured":"Bingyan Liu, Chengyu Wang, Tingfeng Cao, Kui Jia, and Jun Huang. 2024. Towards understanding cross and self-attention in stable diffusion for text-guided image editing. arXiv:2403.03431 (2024)."},{"key":"e_1_3_2_1_27_1","volume-title":"The graph matching problem. Pattern Analysis and Applications","author":"Livi Lorenzo","year":"2013","unstructured":"Lorenzo Livi and Antonello Rizzi. 2013. The graph matching problem. Pattern Analysis and Applications (2013)."},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/SaTML59370.2024.00023"},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/WACV56688.2023.00253"},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00649"},{"key":"e_1_3_2_1_31_1","volume-title":"Proc. of NeurIPS.","author":"Park Dong Huk","year":"2021","unstructured":"Dong Huk Park, Samaneh Azadi, Xihui Liu, Trevor Darrell, and Anna Rohrbach. 2021. Benchmark for compositional text-to-image synthesis. In Proc. of NeurIPS."},{"key":"e_1_3_2_1_32_1","volume-title":"SDXL: Improving latent diffusion models for high-resolution image synthesis. arXiv:2307.01952","author":"Dustin Podell","year":"2023","unstructured":"Dustin Podell et al. 2023. SDXL: Improving latent diffusion models for high-resolution image synthesis. arXiv:2307.01952 (2023)."},{"volume-title":"Proc. of ICML.","author":"Alec","key":"e_1_3_2_1_33_1","unstructured":"Alec Radford et al. 2021. Learning transferable visual models from natural language supervision. In Proc. 
of ICML."},{"key":"e_1_3_2_1_34_1","unstructured":"Aditya Ramesh et al. 2022. Hierarchical text-conditional image generation with CLIP latents. arXiv:2204.06125 (2022)."},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01042"},{"key":"e_1_3_2_1_36_1","unstructured":"Alberto Romero. 2022. Stable Diffusion 2 Is Not What Users Expected-Or Wanted. https:\/\/www.thealgorithmicbridge.com\/p\/stable-diffusion-2-is-not-what-users."},{"volume-title":"Proc. of CVPR.","author":"Nataniel","key":"e_1_3_2_1_37_1","unstructured":"Nataniel Ruiz et al. 2023. Dreambooth: Fine tuning text-to-image diffusion models for subject-driven generation. In Proc. of CVPR."},{"key":"e_1_3_2_1_38_1","unstructured":"Christoph Schuhmann. 2022. LAION-AESTHETICS. https:\/\/laion.ai\/blog\/laion-aesthetics\/."},{"key":"e_1_3_2_1_39_1","unstructured":"Christoph Schuhmann et al. 2021. Laion-400M: Open dataset of CLIP-filtered 400 million image-text pairs. arXiv:2111.02114 (2021)."},{"key":"e_1_3_2_1_40_1","unstructured":"Christoph Schuhmann et al. 2022. LAION-5B: An open large-scale dataset for training next generation image-text models. arXiv:2210.08402 (2022)."},{"volume-title":"Proc. of USENIX Security.","author":"Shawn","key":"e_1_3_2_1_41_1","unstructured":"Shawn Shan et al. 2023. Glaze: Protecting artists from style mimicry by text-to-image models. In Proc. of USENIX Security."},{"volume-title":"Proc. of IEEE S&P.","author":"Shawn","key":"e_1_3_2_1_42_1","unstructured":"Shawn Shan et al. 2024. Nightshade: Prompt-specific poisoning attacks on text-to-image generative models. In Proc. of IEEE S&P."},{"volume-title":"Proc. of NeurIPS.","author":"Manli","key":"e_1_3_2_1_43_1","unstructured":"Manli Shu et al. 2023. On the exploitability of instruction tuning. In Proc. of NeurIPS."},{"key":"e_1_3_2_1_44_1","unstructured":"Ilia Shumailov et al. 2024. The curse of recursion: Training on generated data makes models forget. 
arXiv:2305.17493 (2024)."},{"key":"e_1_3_2_1_45_1","unstructured":"Stability AI. 2022. Stable Diffusion 2.0. https:\/\/stability.ai\/news\/stable-diffusion-v2-release."},{"key":"e_1_3_2_1_46_1","unstructured":"Stability AI. 2022. Stable Diffusion 2.1. https:\/\/stability.ai\/news\/stablediffusion2--1-release7-dec-2022."},{"key":"e_1_3_2_1_47_1","unstructured":"Stability AI. 2024. Stable Diffusion 3. https:\/\/stability.ai\/news\/stable-diffusion-3."},{"key":"e_1_3_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/3447548.3467403"},{"key":"e_1_3_2_1_49_1","volume-title":"Proc. of ICML.","author":"Titouan Vayer","year":"2019","unstructured":"Vayer Titouan, Nicolas Courty, Romain Tavenard, and R\u00e9mi Flamary. 2019. Optimal transport for structured data with application on graphs. In Proc. of ICML."},{"volume-title":"Proc. of ICCV.","author":"Van Thanh","key":"e_1_3_2_1_50_1","unstructured":"Thanh Van Le et al. 2023. Anti-DreamBooth: Protecting users from personalized text-to-image synthesis. In Proc. of ICCV."},{"key":"e_1_3_2_1_51_1","unstructured":"Timothy Alexis Vass. 2023. Explaining the SDXL latent space. https:\/\/huggingface.co\/blog\/TimothyAlexisVass\/explaining-the-sdxl-latent-space."},{"key":"e_1_3_2_1_52_1","volume-title":"Proc. of ICML.","author":"Wan Alexander","year":"2023","unstructured":"Alexander Wan, Eric Wallace, Sheng Shen, and Dan Klein. 2023. Poisoning language models during instruction tuning. In Proc. of ICML."},{"key":"e_1_3_2_1_53_1","volume-title":"The stronger the diffusion model, the easier the backdoor: Data poisoning to induce copyright breaches without adjusting finetuning pipeline. arXiv:2401.04136","author":"Wang Haonan","year":"2024","unstructured":"Haonan Wang, Qianli Shen, Yao Tong, Yang Zhang, and Kenji Kawaguchi. 2024. The stronger the diffusion model, the easier the backdoor: Data poisoning to induce copyright breaches without adjusting finetuning pipeline. 
arXiv:2401.04136 (2024)."},{"key":"e_1_3_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v37i2.25353"},{"key":"e_1_3_2_1_55_1","unstructured":"Jingyao Xu et al. 2024. Perturbing attention gives you more bang for the buck: Subtle imaging perturbations that efficiently fool customized diffusion models. arXiv:2404.15081 (2024)."},{"key":"e_1_3_2_1_56_1","unstructured":"Liwu Xu et al. 2023. CLIP brings better features to visual aesthetics learners. arXiv:2307.15640 (2023)."},{"key":"e_1_3_2_1_57_1","volume-title":"Shadowcast: Stealthy data poisoning attacks against vision-language models. arXiv:2402.06659","author":"Yuancheng Xu","year":"2024","unstructured":"Yuancheng Xu et al. 2024. Shadowcast: Stealthy data poisoning attacks against vision-language models. arXiv:2402.06659 (2024)."},{"key":"e_1_3_2_1_58_1","volume-title":"Diffusion model with cross attention as an inductive bias for disentanglement. arXiv:2402.09712","author":"Yang Tao","year":"2024","unstructured":"Tao Yang, Cuiling Lan, Yan Lu, and Nanning Zheng. 2024. Diffusion model with cross attention as an inductive bias for disentanglement. arXiv:2402.09712 (2024)."},{"volume-title":"Proc. of ICML.","author":"Ziqing","key":"e_1_3_2_1_59_1","unstructured":"Ziqing Yang et al. 2023. Data poisoning attacks against multimodal encoders. In Proc. of ICML."},{"key":"e_1_3_2_1_60_1","unstructured":"Shengfang Zhai et al. 2023. Text-to-image diffusion models can be easily backdoored through multimodal data poisoning. arXiv:2305.04175 (2023)."},{"volume-title":"Proc. of ICCV.","author":"Wenliang","key":"e_1_3_2_1_61_1","unstructured":"Wenliang Zhao et al. 2023. Unleashing text-to-image diffusion models for visual perception. In Proc. 
of ICCV."}],"event":{"name":"CCS '24: ACM SIGSAC Conference on Computer and Communications Security","sponsor":["SIGSAC ACM Special Interest Group on Security, Audit, and Control"],"location":"Salt Lake City UT USA","acronym":"CCS '24"},"container-title":["Proceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3658644.3690205","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3658644.3690205","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,22]],"date-time":"2025-08-22T06:15:56Z","timestamp":1755843356000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3658644.3690205"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,12,2]]},"references-count":61,"alternative-id":["10.1145\/3658644.3690205","10.1145\/3658644"],"URL":"https:\/\/doi.org\/10.1145\/3658644.3690205","relation":{},"subject":[],"published":{"date-parts":[[2024,12,2]]},"assertion":[{"value":"2024-12-09","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}