{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,1]],"date-time":"2026-05-01T17:26:59Z","timestamp":1777656419410,"version":"3.51.4"},"reference-count":263,"publisher":"Association for Computing Machinery (ACM)","issue":"9","funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62506317"],"award-info":[{"award-number":["62506317"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Comput. Surv."],"published-print":{"date-parts":[[2026,7,31]]},"abstract":"<jats:p>Diffusion models have emerged as the leading paradigm in generative modeling, excelling in various applications. Despite their success, these models often fail to align with human intentions and can generate results with undesired properties or even harmful content. Inspired by the success and popularity of alignment in tuning large language models, recent studies have investigated aligning diffusion models with human expectations and preferences. This work primarily reviews the alignment of diffusion models, covering advances in the fundamentals of alignment, alignment techniques for diffusion models, preference benchmarks, and evaluation of diffusion models. Moreover, we discuss key perspectives on current challenges and promising future directions for solving the remaining challenges in the alignment of diffusion models. 
To the best of our knowledge, our work is the first comprehensive review paper for researchers and engineers to comprehend, practice, and research alignment of diffusion models.<\/jats:p>","DOI":"10.1145\/3796982","type":"journal-article","created":{"date-parts":[[2026,2,10]],"date-time":"2026-02-10T21:06:16Z","timestamp":1770757576000},"page":"1-37","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["Alignment of Diffusion Models: Fundamentals, Challenges, and Future"],"prefix":"10.1145","volume":"58","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3562-9234","authenticated-orcid":false,"given":"Buhua","family":"Liu","sequence":"first","affiliation":[{"name":"The Hong Kong University of Science and Technology - Guangzhou Campus","place":["Guangzhou, China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4689-6140","authenticated-orcid":false,"given":"Shitong","family":"Shao","sequence":"additional","affiliation":[{"name":"The Hong Kong University of Science and Technology - Guangzhou Campus","place":["Guangzhou, China"]}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-9183-5539","authenticated-orcid":false,"given":"Bao","family":"Li","sequence":"additional","affiliation":[{"name":"Chinese Academy of Sciences Institute of Automation","place":["Beijing, China"]}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-9067-3448","authenticated-orcid":false,"given":"Lichen","family":"Bai","sequence":"additional","affiliation":[{"name":"The Hong Kong University of Science and Technology - Guangzhou Campus","place":["Guangzhou, China"]},{"name":"Tsinghua University","place":["Guangzhou, China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5693-8933","authenticated-orcid":false,"given":"Zhiqiang","family":"Xu","sequence":"additional","affiliation":[{"name":"Mohamed bin Zayed University of Artificial Intelligence","place":["Masdar City, United Arab 
Emirates"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5451-3253","authenticated-orcid":false,"given":"Haoyi","family":"Xiong","sequence":"additional","affiliation":[{"name":"Baidu Inc","place":["Beijing, China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4828-8248","authenticated-orcid":false,"given":"James T.","family":"Kwok","sequence":"additional","affiliation":[{"name":"The Hong Kong University of Science and Technology","place":["Hong Kong, Hong Kong"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5451-4398","authenticated-orcid":false,"given":"Sumi","family":"Helal","sequence":"additional","affiliation":[{"name":"University of Bologna","place":["Bologna, Italy"]},{"name":"CISE, University of Florida","place":["Bologna, Italy"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4766-435X","authenticated-orcid":false,"given":"Zeke","family":"Xie","sequence":"additional","affiliation":[{"name":"The Hong Kong University of Science and Technology - Guangzhou Campus","place":["Guangzhou, China"]}]}],"member":"320","published-online":{"date-parts":[[2026,3,10]]},"reference":[{"key":"e_1_3_2_2_2","unstructured":"Alekh Agarwal Sham M. Kakade Jason D. Lee and Gaurav Mahajan. 2021. On the theory of policy gradient methods: Optimality approximation and distribution shift. Journal of Machine Learning Research 22 98 (2021) 1\u201376."},{"key":"e_1_3_2_3_2","doi-asserted-by":"crossref","unstructured":"Arash Ahmadian Chris Cremer Matthias Gall\u00e9 Marzieh Fadaee Julia Kreutzer Olivier Pietquin Ahmet \u00dcst\u00fcn and Sara Hooker. 2024. Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs. arxiv:2402.14740","DOI":"10.18653\/v1\/2024.acl-long.662"},{"key":"e_1_3_2_4_2","doi-asserted-by":"crossref","unstructured":"Nuha Aldausari Arcot Sowmya Nadine Marcus and Gelareh Mohammadi. 2022. Video generative adversarial networks: A review. ACM Comput. Surv. 
55 2 Article 30 (Jan. 2022) 25 pages.","DOI":"10.1145\/3487891"},{"key":"e_1_3_2_5_2","unstructured":"C\u00e9dric Archambeau Manfred Opper Yuan Shen Dan Cornford and John Shawe-Taylor. 2007. Variational inference for diffusion processes. Advances in Neural Information Processing Systems 20 (2007) 17\u201324."},{"key":"e_1_3_2_6_2","volume-title":"The Thirteenth International Conference on Learning Representations","author":"Bai Lichen","year":"2025","unstructured":"Lichen Bai, Shitong Shao, Zikai Zhou, Zipeng Qi, Zhiqiang Xu, Haoyi Xiong, and Zeke Xie. 2025. Zigzag diffusion sampling: Diffusion models can self-improve via self-reflection. In The Thirteenth International Conference on Learning Representations."},{"key":"e_1_3_2_7_2","unstructured":"Yuntao Bai Saurav Kadavath Sandipan Kundu Amanda Askell Jackson Kernion Andy Jones Anna Chen Anna Goldie Azalia Mirhoseini Cameron McKinnon et\u00a0al. 2022. Constitutional AI: Harmlessness from AI feedback. arxiv:2212.08073 (2022)."},{"key":"e_1_3_2_8_2","unstructured":"James Betker Gabriel Goh Li Jing Tim Brooks Jianfeng Wang Linjie Li Long Ouyang Juntang Zhuang Joyce Lee Yufei Guo et\u00a0al. 2023. Improving image generation with better captions. Computer Science 2 3 (2023) 8. Retrieved from https:\/\/cdn.openai.com\/papers\/dall-e-3.pdf"},{"key":"e_1_3_2_9_2","volume-title":"The Twelfth International Conference on Learning Representations","author":"Black Kevin","year":"2023","unstructured":"Kevin Black, Michael Janner, Yilun Du, Ilya Kostrikov, and Sergey Levine. 2023. Training diffusion models with reinforcement learning. In The Twelfth International Conference on Learning Representations."},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.02161"},{"key":"e_1_3_2_11_2","doi-asserted-by":"crossref","unstructured":"Salomon Bochner. 1949. Diffusion equation and stochastic processes. 
Proceedings of the National Academy of Sciences 35 7 (1949) 368\u2013370.","DOI":"10.1073\/pnas.35.7.368"},{"key":"e_1_3_2_12_2","doi-asserted-by":"crossref","unstructured":"Ralph Allan Bradley and Milton E. Terry. 1952. Rank analysis of incomplete block designs: I. The method of paired comparisons. Biometrika 39 3\/4 (1952) 324\u2013345.","DOI":"10.1093\/biomet\/39.3-4.324"},{"key":"e_1_3_2_13_2","first-page":"4971","volume-title":"Proceedings of the 41st International Conference on Machine Learning","volume":"235","author":"Burns Collin","year":"2024","unstructured":"Collin Burns, Pavel Izmailov, Jan Hendrik Kirchner, Bowen Baker, Leo Gao, Leopold Aschenbrenner, Yining Chen, Adrien Ecoffet, Manas Joglekar, Jan Leike, et\u00a0al. 2024. Weak-to-strong generalization: Eliciting strong capabilities with weak supervision. In Proceedings of the 41st International Conference on Machine Learning, Vol. 235. PMLR, 4971\u20135012."},{"key":"e_1_3_2_14_2","doi-asserted-by":"crossref","unstructured":"Zoya Bylinskii Tilke Judd Aude Oliva Antonio Torralba and Fr\u00e9do Durand. 2018. What do different evaluation metrics tell us about saliency models? IEEE Transactions on Pattern Analysis and Machine Intelligence 41 3 (2018) 740\u2013757.","DOI":"10.1109\/TPAMI.2018.2815601"},{"key":"e_1_3_2_15_2","doi-asserted-by":"crossref","unstructured":"Zhipeng Cai Zuobin Xiong Honghui Xu Peng Wang Wei Li and Yi Pan. 2021. Generative adversarial networks: A survey toward private and secure applications. ACM Comput. Surv. 54 6 Article 132 (Jul. 2021) 38 pages.","DOI":"10.1145\/3459992"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.5555\/3692070.3692281"},{"key":"e_1_3_2_17_2","doi-asserted-by":"crossref","unstructured":"Hanqun Cao Cheng Tan Zhangyang Gao Yilun Xu Guangyong Chen Pheng-Ann Heng and Stan Z. Li. 2024. A survey on generative diffusion models. 
IEEE Transactions on Knowledge and Data Engineering 36 7 (2024) 2814\u20132830.","DOI":"10.1109\/TKDE.2024.3361474"},{"key":"e_1_3_2_18_2","unstructured":"Pu Cao Feng Zhou Qing Song and Lu Yang. 2025. Controllable generation with text-to-image diffusion models: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence (2025) 1\u201320."},{"key":"e_1_3_2_19_2","volume-title":"Proceedings of the 41st International Conference on Machine Learning (Proceedings of Machine Learning Research)","author":"Carroll Micah","year":"2024","unstructured":"Micah Carroll, Davis Foote, Anand Siththaranjan, Stuart Russell, and Anca Dragan. 2024. AI alignment with changing and influenceable reward functions. In Proceedings of the 41st International Conference on Machine Learning (Proceedings of Machine Learning Research). PMLR."},{"key":"e_1_3_2_20_2","first-page":"6116","volume-title":"Proceedings of the 41st International Conference on Machine Learning","volume":"235","author":"Chakraborty Souradip","year":"2024","unstructured":"Souradip Chakraborty, Jiahao Qiu, Hui Yuan, Alec Koppel, Dinesh Manocha, Furong Huang, Amrit Bedi, and Mengdi Wang. 2024. MaxMin-RLHF: Alignment with diverse human preferences. In Proceedings of the 41st International Conference on Machine Learning, Vol. 235. PMLR, 6116\u20136135."},{"key":"e_1_3_2_21_2","volume-title":"Forty-First International Conference on Machine Learning","author":"Chakraborty Souradip","year":"2024","unstructured":"Souradip Chakraborty, Jiahao Qiu, Hui Yuan, Alec Koppel, Dinesh Manocha, Furong Huang, Amrit Bedi, and Mengdi Wang. 2024. MaxMin-RLHF: Alignment with diverse human preferences. In Forty-First International Conference on Machine Learning."},{"key":"e_1_3_2_22_2","volume-title":"Forty-First International Conference on Machine Learning","author":"Chan Alex James","year":"2024","unstructured":"Alex James Chan, Hao Sun, Samuel Holt, and Mihaela van der Schaar. 2024. 
Dense reward for free in reinforcement learning from human feedback. In Forty-First International Conference on Machine Learning."},{"key":"e_1_3_2_23_2","doi-asserted-by":"crossref","unstructured":"Hila Chefer Yuval Alaluf Yael Vinker Lior Wolf and Daniel Cohen-Or. 2023. Attend-and-excite: Attention-based semantic guidance for text-to-image diffusion models. ACM Transactions on Graphics 42 4 (2023) 1\u201310.","DOI":"10.1145\/3592116"},{"key":"e_1_3_2_24_2","volume-title":"The Thirteenth International Conference on Learning Representations","author":"Chen Daiwei","year":"2025","unstructured":"Daiwei Chen, Yi Chen, Aniket Rege, Zhi Wang, and Ramya Korlakai Vinayak. 2025. PAL: Sample-efficient personalized reward modeling for pluralistic alignment. In The Thirteenth International Conference on Learning Representations."},{"key":"e_1_3_2_25_2","volume-title":"Forty-First International Conference on Machine Learning","author":"Chen Huanran","year":"2024","unstructured":"Huanran Chen, Yinpeng Dong, Zhengyi Wang, Xiao Yang, Chengqi Duan, Hang Su, and Jun Zhu. 2024. Robust classification via a single diffusion model. In Forty-First International Conference on Machine Learning."},{"key":"e_1_3_2_26_2","unstructured":"Jiaming Chen Yujia Li and Yu Tian. 2024. Fine-tuning of continuous-time diffusion models as entropy-regularized control. arxiv:2402.15194 (2024)."},{"key":"e_1_3_2_27_2","volume-title":"The Twelfth International Conference on Learning Representations","author":"Chen Junsong","year":"2024","unstructured":"Junsong Chen, Jincheng Yu, Chongjian Ge, Lewei Yao, Enze Xie, Zhongdao Wang, James Kwok, Ping Luo, Huchuan Lu, and Zhenguo Li. 2024. PixArt-\\(\\alpha\\): Fast training of diffusion transformer for photorealistic text-to-image synthesis. In The Twelfth International Conference on Learning Representations."},{"key":"e_1_3_2_28_2","unstructured":"Tianqi Chen Bing Xu Chiyuan Zhang and Carlos Guestrin. 2016. Training deep nets with sublinear memory cost. 
arxiv:1604.06174 (2016)."},{"key":"e_1_3_2_29_2","unstructured":"Xinlei Chen Hao Fang Tsung-Yi Lin Ramakrishna Vedantam Saurabh Gupta Piotr Doll\u00e1r and C. Lawrence Zitnick. 2015. Microsoft COCO captions: Data collection and evaluation server. arxiv:1504.00325 (2015)."},{"key":"e_1_3_2_30_2","unstructured":"Chuan Cheng Man Tang and Guan Zhang. 2024. Aligning few-step diffusion models with dense reward difference learning. arxiv:2411.11727 (2024)."},{"key":"e_1_3_2_31_2","first-page":"1810","volume-title":"International Conference on Machine Learning","author":"Cheng Xiang","year":"2020","unstructured":"Xiang Cheng, Dong Yin, Peter Bartlett, and Michael Jordan. 2020. Stochastic gradient and Langevin processes. In International Conference on Machine Learning. PMLR, 1810\u20131819."},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00283"},{"key":"e_1_3_2_33_2","doi-asserted-by":"crossref","unstructured":"Jaemin Cho Abhay Zala and Mohit Bansal. 2023. Visual programming for step-by-step text-to-image generation and evaluation. Advances in Neural Information Processing Systems 36 (2023) 6048\u20136069.","DOI":"10.52202\/075280-0265"},{"key":"e_1_3_2_34_2","unstructured":"Paul F. Christiano Jan Leike Tom Brown Miljan Martic Shane Legg and Dario Amodei. 2017. Deep reinforcement learning from human preferences. Advances in Neural Information Processing Systems 30 (2017) 4299\u20134307."},{"key":"e_1_3_2_35_2","doi-asserted-by":"crossref","unstructured":"Kevin Clark and Priyank Jaini. 2024. Text-to-image diffusion models are zero-shot classifiers. Advances in Neural Information Processing Systems 36 (2024) 58921\u201358937.","DOI":"10.52202\/075280-2571"},{"key":"e_1_3_2_36_2","volume-title":"The Twelfth International Conference on Learning Representations","author":"Clark Kevin","year":"2024","unstructured":"Kevin Clark, Paul Vicol, Kevin Swersky, and David J. Fleet. 2024. Directly fine-tuning diffusion models on differentiable rewards. 
In The Twelfth International Conference on Learning Representations."},{"key":"e_1_3_2_37_2","unstructured":"Gheorghe Comanici Eric Bieber Mike Schaekermann Ice Pasupat Noveen Sachdeva Inderjit Dhillon Marcel Blistein Ori Ram Dan Zhang Evan Rosen et\u00a0al. 2025. Gemini 2.5: Pushing the frontier with advanced reasoning multimodality long context and next generation agentic capabilities. arxiv:2507.06261 (2025)."},{"key":"e_1_3_2_38_2","first-page":"9346","volume-title":"Proceedings of the 41st International Conference on Machine Learning","volume":"235","author":"Conitzer Vincent","year":"2024","unstructured":"Vincent Conitzer, Rachel Freedman, Jobst Heitzig, Wesley H. Holliday, Bob M. Jacobs, Nathan Lambert, Milan Moss\u00e9, Eric Pacuit, Stuart Russell, Hailey Schoelkopf, et\u00a0al. 2024. Position: Social choice should guide AI alignment in dealing with diverse human feedback. In Proceedings of the 41st International Conference on Machine Learning, Vol. 235. PMLR, 9346\u20139360."},{"key":"e_1_3_2_39_2","series-title":"ICML\u201924","volume-title":"Proceedings of the 41st International Conference on Machine Learning","author":"Conitzer Vincent","year":"2024","unstructured":"Vincent Conitzer, Rachel Freedman, Jobst Heitzig, Wesley H. Holliday, Bob M. Jacobs, Nathan Lambert, Milan Moss\u00e9, Eric Pacuit, Stuart Russell, Hailey Schoelkopf, et\u00a0al. 2024. Position: Social choice should guide AI alignment in dealing with diverse human feedback. In Proceedings of the 41st International Conference on Machine Learning (Vienna, Austria) (ICML\u201924). 
JMLR.org, Article 371, 15 pages."},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52734.2025.00269"},{"key":"e_1_3_2_41_2","first-page":"9722","volume-title":"Proceedings of the 41st International Conference on Machine Learning","volume":"235","author":"Cui Ganqu","year":"2024","unstructured":"Ganqu Cui, Lifan Yuan, Ning Ding, Guanming Yao, Bingxiang He, Wei Zhu, Yuan Ni, Guotong Xie, Ruobing Xie, Yankai Lin, Zhiyuan Liu, and Maosong Sun. 2024. UltraFeedback: Boosting language models with scaled AI feedback. In Proceedings of the 41st International Conference on Machine Learning, Vol. 235. PMLR, 9722\u20139744."},{"key":"e_1_3_2_42_2","doi-asserted-by":"crossref","unstructured":"Adyasha Dash and Kathleen Agres. 2024. AI-based affective music generation systems: A review of methods and challenges. ACM Comput. Surv. 56 11 Article 287 (Jul. 2024) 34 pages.","DOI":"10.1145\/3672554"},{"key":"e_1_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00709"},{"key":"e_1_3_2_44_2","unstructured":"Prafulla Dhariwal and Alexander Nichol. 2021. Diffusion models beat GANs on image synthesis. Advances in Neural Information Processing Systems 34 (2021) 8780\u20138794."},{"key":"e_1_3_2_45_2","unstructured":"Hanze Dong Wei Xiong Deepanshu Goyal Yihan Zhang Winnie Chow Rui Pan Shizhe Diao Jipeng Zhang KaShun Shum and Tong Zhang. 2023. RAFT: Reward rAnked finetuning for generative foundation model alignment. Transactions on Machine Learning Research (2023)."},{"key":"e_1_3_2_46_2","unstructured":"Hanze Dong Wei Xiong Bo Pang Haoxiang Wang Han Zhao Yingbo Zhou Nan Jiang Doyen Sahoo Caiming Xiong and Tong Zhang. 2024. RLHF workflow: From reward modeling to online RLHF. 
arxiv:2405.07863 (2024)."},{"key":"e_1_3_2_47_2","volume-title":"The Twelfth International Conference on Learning Representations","author":"Dong Zibin","year":"2024","unstructured":"Zibin Dong, Yifu Yuan, Jianye Hao, Fei Ni, Yao Mu, Yan Zheng, Yujing Hu, Tangjie Lv, Changjie Fan, and Zhipeng Hu. 2024. AlignDiff: Aligning diverse human preferences via behavior-customisable diffusion model. In The Twelfth International Conference on Learning Representations."},{"key":"e_1_3_2_48_2","unstructured":"Kaiwen Duan Hongwei Yao Yufei Chen Ziyun Li Tong Qiao Zhan Qin and Cong Wang. 2025. BadReward: Clean-label poisoning of reward models in text-to-image RLHF. arxiv:2506.03234 (2025)."},{"key":"e_1_3_2_49_2","unstructured":"Abhimanyu Dubey Abhinav Jauhri Abhinav Pandey Abhishek Kadian Ahmad Al-Dahle Aiesha Letman Akhil Mathur Alan Schelten Amy Yang Angela Fan et\u00a0al. 2024. The Llama 3 herd of models. arxiv:2407.21783 (2024)."},{"key":"e_1_3_2_50_2","doi-asserted-by":"crossref","unstructured":"Yann Dubois Chen Xuechen Li Rohan Taori Tianyi Zhang Ishaan Gulrajani Jimmy Ba Carlos Guestrin Percy S. Liang and Tatsunori B. Hashimoto. 2023. AlpacaFarm: A simulation framework for methods that learn from human feedback. Advances in Neural Information Processing Systems 36 (2023) 30039\u201330069.","DOI":"10.52202\/075280-1308"},{"key":"e_1_3_2_51_2","unstructured":"Arpad E. Elo and Sam Sloan. 1978. The rating of chessplayers: Past and present. Arco Publishing, New York (1978)."},{"key":"e_1_3_2_52_2","volume-title":"International Conference on Learning Representations","author":"Engstrom Logan","year":"2020","unstructured":"Logan Engstrom, Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Firdaus Janoos, Larry Rudolph, and Aleksander Madry. 2020. Implementation matters in deep RL: A case study on PPO and TRPO. 
In International Conference on Learning Representations."},{"key":"e_1_3_2_53_2","volume-title":"Forty-First International Conference on Machine Learning","author":"Esser Patrick","year":"2024","unstructured":"Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas M\u00fcller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, et\u00a0al. 2024. Scaling rectified flow transformers for high-resolution image synthesis. In Forty-First International Conference on Machine Learning."},{"key":"e_1_3_2_54_2","first-page":"12634","volume-title":"Proceedings of the 41st International Conference on Machine Learning","volume":"235","author":"Ethayarajh Kawin","year":"2024","unstructured":"Kawin Ethayarajh, Winnie Xu, Niklas Muennighoff, Dan Jurafsky, and Douwe Kiela. 2024. Model alignment as prospect theoretic optimization. In Proceedings of the 41st International Conference on Machine Learning, Vol. 235. PMLR, 12634\u201312651."},{"key":"e_1_3_2_55_2","doi-asserted-by":"crossref","unstructured":"Luca Eyring Shyamgopal Karthik Karsten Roth Alexey Dosovitskiy and Zeynep Akata. 2024. ReNO: Enhancing one-step text-to-image models through reward-based noise optimization. Advances in Neural Information Processing Systems 37 (2024) 125487\u2013125519.","DOI":"10.52202\/079017-3987"},{"key":"e_1_3_2_56_2","doi-asserted-by":"crossref","unstructured":"Ying Fan Olivia Watkins Yuqing Du Hao Liu Moonkyung Ryu Craig Boutilier Pieter Abbeel Mohammad Ghavamzadeh Kangwook Lee and Kimin Lee. 2023. Reinforcement learning for fine-tuning text-to-image diffusion models. 
Advances in Neural Information Processing Systems 36 (2023) 79858\u201379885.","DOI":"10.52202\/075280-3497"},{"key":"e_1_3_2_57_2","volume-title":"The Eleventh International Conference on Learning Representations","author":"Feng Weixi","year":"2023","unstructured":"Weixi Feng, Xuehai He, Tsu-Jui Fu, Varun Jampani, Arjun Reddy Akula, Pradyumna Narayana, Sugato Basu, Xin Eric Wang, and William Yang Wang. 2023. Training-free structured diffusion guidance for compositional text-to-image synthesis. In The Eleventh International Conference on Learning Representations."},{"key":"e_1_3_2_58_2","doi-asserted-by":"crossref","unstructured":"Giorgio Franceschelli and Mirco Musolesi. 2024. Creativity and machine learning: A survey. ACM Comput. Surv. 56 11 Article 283 (Jun. 2024) 41 pages.","DOI":"10.1145\/3664595"},{"key":"e_1_3_2_59_2","volume-title":"International Conference on Machine Learning","author":"Fu Minghao","year":"2025","unstructured":"Minghao Fu, Guo-Hua Wang, Liangfu Cao, Qing-Guo Chen, Zhao Xu, Weihua Luo, and Kaifu Zhang. 2025. CHATS: Combining human-aligned optimization and test-time sampling for text-to-image generation. In International Conference on Machine Learning."},{"key":"e_1_3_2_60_2","first-page":"1050","volume-title":"International Conference on Machine Learning","author":"Gal Yarin","year":"2016","unstructured":"Yarin Gal and Zoubin Ghahramani. 2016. Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In International Conference on Machine Learning. PMLR, 1050\u20131059."},{"key":"e_1_3_2_61_2","first-page":"10835","volume-title":"International Conference on Machine Learning","author":"Gao Leo","year":"2023","unstructured":"Leo Gao, John Schulman, and Jacob Hilton. 2023. Scaling laws for reward model overoptimization. In International Conference on Machine Learning. 
PMLR, 10835\u201310866."},{"key":"e_1_3_2_62_2","volume-title":"Forty-First International Conference on Machine Learning","author":"Gao Songyang","year":"2024","unstructured":"Songyang Gao, Qiming Ge, Wei Shen, Shihan Dou, Junjie Ye, Xiao Wang, Rui Zheng, Yicheng Zou, Zhi Chen, Hang Yan, Qi Zhang, and Dahua Lin. 2024. Linear alignment: A closed-form solution for aligning human preferences without tuning and feedback. In Forty-First International Conference on Machine Learning."},{"key":"e_1_3_2_63_2","series-title":"Proceedings of Machine Learning Research","first-page":"4447","volume-title":"Proceedings of The 27th International Conference on Artificial Intelligence and Statistics","volume":"238","author":"Azar Mohammad Gheshlaghi","year":"2024","unstructured":"Mohammad Gheshlaghi Azar, Zhaohan Daniel Guo, Bilal Piot, Remi Munos, Mark Rowland, Michal Valko, and Daniele Calandriello. 2024. A general theoretical paradigm to understand learning from human preferences. In Proceedings of The 27th International Conference on Artificial Intelligence and Statistics (Proceedings of Machine Learning Research, Vol. 238), Sanjoy Dasgupta, Stephan Mandt, and Yingzhen Li (Eds.). PMLR, 4447\u20134455."},{"key":"e_1_3_2_64_2","doi-asserted-by":"crossref","unstructured":"Dhruba Ghosh Hannaneh Hajishirzi and Ludwig Schmidt. 2023. GenEval: An object-focused framework for evaluating text-to-image alignment. Advances in Neural Information Processing Systems 36 (2023) 52132\u201352152.","DOI":"10.52202\/075280-2270"},{"key":"e_1_3_2_65_2","doi-asserted-by":"crossref","unstructured":"Daniel T. Gillespie. 2000. The chemical Langevin equation. The Journal of Chemical Physics 113 1 (2000) 297\u2013306.","DOI":"10.1063\/1.481811"},{"key":"e_1_3_2_66_2","doi-asserted-by":"crossref","unstructured":"Ian Goodfellow Jean Pouget-Abadie Mehdi Mirza Bing Xu David Warde-Farley Sherjil Ozair Aaron Courville and Yoshua Bengio. 2020. Generative adversarial networks. Commun. 
ACM 63 11 (2020) 139\u2013144.","DOI":"10.1145\/3422622"},{"key":"e_1_3_2_67_2","volume-title":"Advances in Neural Information Processing Systems","author":"Goyal An-Rok","year":"2023","unstructured":"An-Rok Goyal, Beomsu Kim, and Gyeong-Moon Kim. 2023. VPGen: Visual-programming-guided generation for text-to-image diffusion models. In Advances in Neural Information Processing Systems, Vol. 36."},{"key":"e_1_3_2_68_2","unstructured":"Siyi Gu Minkai Xu Alexander Powers Weili Nie Tomas Geffner Karsten Kreis Jure Leskovec Arash Vahdat and Stefano Ermon. 2024. Aligning target-aware molecule diffusion models with exact energy optimization. arxiv:2407.01648 (2024)."},{"key":"e_1_3_2_69_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00896"},{"key":"e_1_3_2_70_2","unstructured":"Ziyu Guo Renrui Zhang Chengzhuo Tong Zhizheng Zhao Peng Gao Hongsheng Li and Pheng-Ann Heng. 2025. Can we generate images with CoT? Let\u2019s verify and reinforce image generation step by step. arxiv:2501.13926 (2025)."},{"key":"e_1_3_2_71_2","doi-asserted-by":"crossref","unstructured":"Yaru Hao Zewen Chi Li Dong and Furu Wei. 2023. Optimizing prompts for text-to-image generation. Advances in Neural Information Processing Systems 36 (2023) 66923\u201366939.","DOI":"10.52202\/075280-2923"},{"key":"e_1_3_2_72_2","unstructured":"Sebastian Hartwig Dominik Engel Leon Sick Hannah Kniesel Tristan Payer Poonam Poonam Michael Gl\u00f6ckler Alex B\u00e4uerle and Timo Ropinski. 2024. Evaluating text-to-image synthesis: Survey and taxonomy of image quality metrics. CoRR (2024)."},{"key":"e_1_3_2_73_2","volume-title":"The Eleventh International Conference on Learning Representations","author":"Hertz Amir","year":"2023","unstructured":"Amir Hertz, Ron Mokady, Jay Tenenbaum, Kfir Aberman, Yael Pritch, and Daniel Cohen-Or. 2023. Prompt-to-prompt image editing with cross-attention control. 
In The Eleventh International Conference on Learning Representations."},{"key":"e_1_3_2_74_2","unstructured":"Martin Heusel Hubert Ramsauer Thomas Unterthiner Bernhard Nessler and Sepp Hochreiter. 2017. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. Advances in Neural Information Processing Systems 30 (2017) 6626\u20136637."},{"key":"e_1_3_2_75_2","unstructured":"Jonathan Ho William Chan Chitwan Saharia Jay Whang Ruiqi Gao Alexey Gritsenko Diederik P. Kingma Ben Poole Mohammad Norouzi David J. Fleet and Tim Salimans. 2022. Imagen Video: High definition video generation with diffusion models. arxiv:2210.02303 (2022)."},{"key":"e_1_3_2_76_2","unstructured":"Jonathan Ho Ajay Jain and Pieter Abbeel. 2020. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems 33 (2020) 6840\u20136851."},{"key":"e_1_3_2_77_2","doi-asserted-by":"crossref","unstructured":"Jiwoo Hong Noah Lee and James Thorne. 2024. ORPO: Monolithic preference optimization without reference model. arxiv:2403.07691 (2024).","DOI":"10.18653\/v1\/2024.emnlp-main.626"},{"key":"e_1_3_2_78_2","volume-title":"First Workshop on Scalable Optimization for Efficient and Adaptive Foundation Models","author":"Hong Jiwoo","year":"2024","unstructured":"Jiwoo Hong, Sayak Paul, Noah Lee, Kashif Rasul, James Thorne, and Jongheon Jeong. 2024. Margin-aware preference optimization for aligning diffusion models without reference. In First Workshop on Scalable Optimization for Efficient and Adaptive Foundation Models."},{"key":"e_1_3_2_79_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00686"},{"key":"e_1_3_2_80_2","volume-title":"International Conference on Machine Learning","author":"Hong Yuzhong","year":"2025","unstructured":"Yuzhong Hong, Hanshan Zhang, Junwei Bao, Hongfei Jiang, and Yang Song. 2025. Energy-based preference model offers better offline alignment than the Bradley-Terry preference model. 
In International Conference on Machine Learning."},{"key":"e_1_3_2_81_2","unstructured":"Zhenyu Hou Pengfan Du Yilin Niu Zhengxiao Du Aohan Zeng Xiao Liu Minlie Huang Hongning Wang Jie Tang and Yuxiao Dong. 2024. Does RLHF scale? Exploring the impacts from data model and method. arxiv:2412.06000 (2024)."},{"key":"e_1_3_2_82_2","volume-title":"International Conference on Learning Representations","author":"Hu Edward J.","year":"2021","unstructured":"Edward J. Hu, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. 2021. LoRA: Low-rank adaptation of large language models. In International Conference on Learning Representations."},{"key":"e_1_3_2_83_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52734.2025.02198"},{"key":"e_1_3_2_84_2","volume-title":"International Conference on Machine Learning","author":"Hu Zijing","year":"2025","unstructured":"Zijing Hu, Fengda Zhang, and Kun Kuang. 2025. D-Fusion: Direct preference optimization for aligning diffusion models with visually consistent samples. In International Conference on Machine Learning."},{"key":"e_1_3_2_85_2","unstructured":"Lijun Huang Ka-Chun Wong and Jun-Cheng Chen. 2025. Flow-DPO: Improving video generation with human feedback. arxiv:2501.13918 (2025)."},{"key":"e_1_3_2_86_2","first-page":"13916","volume-title":"International Conference on Machine Learning","author":"Huang Rongjie","year":"2023","unstructured":"Rongjie Huang, Jiawei Huang, Dongchao Yang, Yi Ren, Luping Liu, Mingze Li, Zhenhui Ye, Jinglin Liu, Xiang Yin, and Zhou Zhao. 2023. Make-An-Audio: Text-to-audio generation with prompt-enhanced diffusion models. In International Conference on Machine Learning. PMLR, 13916\u201313932."},{"key":"e_1_3_2_87_2","unstructured":"Zihan Huang Zekun Zhang Yifei Liu Yujia Li and Yu Tian. 2025. ADT: Tuning diffusion models with adversarial supervision. 
arxiv:2504.11423 (2025)."},{"key":"e_1_3_2_88_2","volume-title":"The Thirteenth International Conference on Learning Representations","author":"Ignatyev Savva Victorovich","year":"2025","unstructured":"Savva Victorovich Ignatyev, Nina Konovalova, Daniil Selikhanovych, Oleg Voynov, Nikolay Patakin, Ilya Olkov, Dmitry Senushkin, Alexey Artemov, Anton Konushin, Alexander Filippov, et\u00a0al. 2025. A3D: Does diffusion dream about 3D alignment? In The Thirteenth International Conference on Learning Representations."},{"key":"e_1_3_2_89_2","first-page":"20983","volume-title":"Proceedings of the 41st International Conference on Machine Learning","volume":"235","author":"Im Shawn","year":"2024","unstructured":"Shawn Im and Yixuan Li. 2024. Understanding the learning dynamics of alignment with human feedback. In Proceedings of the 41st International Conference on Machine Learning, Vol. 235. PMLR, 20983\u201321006."},{"key":"e_1_3_2_90_2","unstructured":"Shawn Im and Yixuan Li. 2024. Understanding the learning dynamics of alignment with human feedback. arxiv:2403.18742 (2024)."},{"key":"e_1_3_2_91_2","volume-title":"The Twelfth International Conference on Learning Representations","author":"Isajanyan Arman","year":"2024","unstructured":"Arman Isajanyan, Artur Shatveryan, David Kocharian, Zhangyang Wang, and Humphrey Shi. 2024. Social reward: Evaluating and enhancing generative AI through million-user feedback from an online creative community. In The Twelfth International Conference on Learning Representations."},{"key":"e_1_3_2_92_2","doi-asserted-by":"crossref","unstructured":"Abdul Jabbar Xi Li and Bourahla Omar. 2021. A survey on generative adversarial networks: Variants applications and training. ACM Comput. Surv. 
54 8 Article 157 (oct2021) 49 pages.","DOI":"10.1145\/3463475"},{"key":"e_1_3_2_93_2","first-page":"21648","volume-title":"Proceedings of the 41st International Conference on Machine Learning","volume":"235","author":"Ji Haozhe","year":"2024","unstructured":"Haozhe Ji, Cheng Lu, Yilin Niu, Pei Ke, Hongning Wang, Jun Zhu, Jie Tang, and Minlie Huang. 2024. Towards efficient exact optimization of language model alignment. In Proceedings of the 41st International Conference on Machine Learning, Vol. 235. PMLR, 21648\u201321671."},{"key":"e_1_3_2_94_2","doi-asserted-by":"crossref","unstructured":"Zhen Jia Zhang Zhang Liang Wang and Tieniu Tan. 2024. Human image generation: A comprehensive survey. ACM Comput. Surv. 56 11 Article 279 (jun2024) 39 pages.","DOI":"10.1145\/3665869"},{"key":"e_1_3_2_95_2","doi-asserted-by":"crossref","unstructured":"Lifan Jiang Boxi Wu Jiahui Zhang Xiaotong Guan and Shuang Chen. 2025. HuViDPO: Enhancing video generation through direct preference optimization for human-centric alignment. arxiv:2502.01690 (2025).","DOI":"10.2139\/ssrn.5358518"},{"key":"e_1_3_2_96_2","volume-title":"The Thirteenth International Conference on Learning Representations","author":"Kawata Ryotaro","year":"2025","unstructured":"Ryotaro Kawata, Kazusato Oko, Atsushi Nitanda, and Taiji Suzuki. 2025. Direct distributional optimization for provable alignment of diffusion models. In The Thirteenth International Conference on Learning Representations."},{"key":"e_1_3_2_97_2","volume-title":"Advances in Neural Information Processing Systems","author":"Kendall Alex","year":"2017","unstructured":"Alex Kendall and Yarin Gal. 2017. What uncertainties do we need in bayesian deep learning for computer vision?. In Advances in Neural Information Processing Systems, Vol. 30. Curran Associates, Inc."},{"key":"e_1_3_2_98_2","unstructured":"Sanghyun Kim Moonseok Choi Jinwoo Shin and Juho Lee. 2024. 
Safety alignment backfires: Preventing the re-emergence of suppressed concepts in fine-tuned text-to-image diffusion models. arxiv:2412.00357 (2024)."},{"key":"e_1_3_2_99_2","volume-title":"The Thirteenth International Conference on Learning Representations","author":"Kim Sunwoo","year":"2025","unstructured":"Sunwoo Kim, Minkyu Kim, and Dongmin Park. 2025. Test-time alignment of diffusion models without reward over-optimization. In The Thirteenth International Conference on Learning Representations."},{"key":"e_1_3_2_100_2","unstructured":"Diederik P. Kingma and Max Welling. 2013. Auto-encoding variational bayes. arxiv:1312.6114 (2013)."},{"key":"e_1_3_2_101_2","doi-asserted-by":"crossref","unstructured":"Yuval Kirstain Adam Polyak Uriel Singer Shahbuland Matiana Joe Penna and Omer Levy. 2023. Pick-a-pic: An open dataset of user preferences for text-to-image generation. Advances in Neural Information Processing Systems 36 (2023) 36652\u201336663.","DOI":"10.52202\/075280-1594"},{"key":"e_1_3_2_102_2","first-page":"1885","volume-title":"International Conference on Machine Learning","author":"Koh Pang Wei","year":"2017","unstructured":"Pang Wei Koh and Percy Liang. 2017. Understanding black-box predictions via influence functions. In International Conference on Machine Learning. PMLR, 1885\u20131894."},{"key":"e_1_3_2_103_2","volume-title":"International Conference on Learning Representations","author":"Kong Zhifeng","year":"2021","unstructured":"Zhifeng Kong, Wei Ping, Jiaji Huang, Kexin Zhao, and Bryan Catanzaro. 2021. DiffWave: A versatile diffusion model for audio synthesis. In International Conference on Learning Representations."},{"key":"e_1_3_2_104_2","unstructured":"Daeun Lee Jaehong Yoon Jaemin Cho and Mohit Bansal. 2024. VideoRepair: Improving text-to-video generation via misalignment evaluation and localized refinement. 
arxiv:2411.15115 (2024)."},{"key":"e_1_3_2_105_2","volume-title":"Forty-First International Conference on Machine Learning","author":"Lee Harrison","year":"2024","unstructured":"Harrison Lee, Samrat Phatale, Hassan Mansoor, Thomas Mesnard, Johan Ferret, Kellie Ren Lu, Colton Bishop, Ethan Hall, Victor Carbune, Abhinav Rastogi, and Sushant Prakash. 2024. RLAIF vs. RLHF: Scaling reinforcement learning from human feedback with AI feedback. In Forty-First International Conference on Machine Learning."},{"key":"e_1_3_2_106_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52734.2025.01721"},{"key":"e_1_3_2_107_2","unstructured":"Kimin Lee Hao Liu Moonkyung Ryu Olivia Watkins Yuqing Du Craig Boutilier Pieter Abbeel Mohammad Ghavamzadeh and Shixiang Shane Gu. 2023. Aligning text-to-image models using human feedback. arxiv:2302.12192 (2023)."},{"key":"e_1_3_2_108_2","doi-asserted-by":"crossref","unstructured":"Tony Lee Michihiro Yasunaga Chenlin Meng Yifan Mai Joon Sung Park Agrim Gupta Yunzhi Zhang Deepak Narayanan Hannah Teufel Marco Bellagente et\u00a0al. 2023. Holistic evaluation of text-to-image models. Advances in Neural Information Processing Systems 36 (2023) 69981\u201370011.","DOI":"10.52202\/075280-3067"},{"key":"e_1_3_2_109_2","unstructured":"Sergey Levine Aviral Kumar George Tucker and Justin Fu. 2020. Offline reinforcement learning: Tutorial review and perspectives on open problems. arxiv:2005.01643 (2020)."},{"key":"e_1_3_2_110_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00210"},{"key":"e_1_3_2_111_2","first-page":"12888","volume-title":"International Conference on Machine Learning","author":"Li Junnan","year":"2022","unstructured":"Junnan Li, Dongxu Li, Caiming Xiong, and Steven Hoi. 2022. Blip: Bootstrapping language-image pre-training for unified vision-language understanding and generation. In International Conference on Machine Learning. 
PMLR, 12888\u201312900."},{"key":"e_1_3_2_112_2","unstructured":"Shufan Li Konstantinos Kallidromitis Akash Gokul Yusuke Kato and Kazuki Kozuka. 2024. Aligning diffusion models by optimizing human utility. arxiv:2404.04465 (2024)."},{"key":"e_1_3_2_113_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52734.2025.02201"},{"key":"e_1_3_2_114_2","volume-title":"The Twelfth International Conference on Learning Representations","author":"Li Xian","year":"2024","unstructured":"Xian Li, Ping Yu, Chunting Zhou, Timo Schick, Omer Levy, Luke Zettlemoyer, Jason E. Weston, and Mike Lewis. 2024. Self-alignment with instruction backtranslation. In The Twelfth International Conference on Learning Representations."},{"key":"e_1_3_2_115_2","unstructured":"Xiner Li Yulai Zhao Chenyu Wang Gabriele Scalia Gokcen Eraslan Surag Nair Tommaso Biancalani Shuiwang Ji Aviv Regev Sergey Levine et\u00a0al. 2024. Derivative-free guidance in continuous and discrete diffusion models with soft value-based decoding. arxiv:2408.08252 (2024)."},{"key":"e_1_3_2_116_2","unstructured":"Yafu Li Xuyang Hu Xiaoye Qu Linjie Li and Yu Cheng. 2025. Test-time preference optimization: On-the-fly alignment via iterative textual feedback. arxiv:2501.12895 (2025)."},{"key":"e_1_3_2_117_2","volume-title":"34th British Machine Vision Conference 2023, BMVC 2023","author":"Li Yumeng","year":"2023","unstructured":"Yumeng Li, Margret Keuper, Dan Zhang, and Anna Khoreva. 2023. Divide & bind your attention for improved generative semantic nursing. In 34th British Machine Vision Conference 2023, BMVC 2023."},{"key":"e_1_3_2_118_2","volume-title":"Advances in Neural Information Processing Systems","author":"Li Yiyuan","year":"2024","unstructured":"Yiyuan Li, Weizhen Zhou, Sijia Song, and Han Liu. 2024. CoMat: Aligning text-to-image diffusion model with image-to-text concept matching. 
In Advances in Neural Information Processing Systems."},{"key":"e_1_3_2_119_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.01835"},{"key":"e_1_3_2_120_2","unstructured":"Zhanhao Liang Yuhui Yuan Shuyang Gu Bohan Chen Tiankai Hang Ji Li and Liang Zheng. 2024. Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step. arxiv:2406.04314 (2024)."},{"key":"e_1_3_2_121_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"e_1_3_2_122_2","first-page":"366","volume-title":"European Conference on Computer Vision","author":"Lin Zhiqiu","year":"2024","unstructured":"Zhiqiu Lin, Deepak Pathak, Baiqi Li, Jiayao Li, Xide Xia, Graham Neubig, Pengchuan Zhang, and Deva Ramanan. 2024. Evaluating text-to-visual generation with image-to-text generation. In European Conference on Computer Vision. Springer, 366\u2013384."},{"key":"e_1_3_2_123_2","unstructured":"Jiashuo Liu Zheyan Shen Yue He Xingxuan Zhang Renzhe Xu Han Yu and Peng Cui. 2021. Towards out-of-distribution generalization: A survey. arxiv:2108.13624 (2021)."},{"key":"e_1_3_2_124_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52734.2025.00750"},{"key":"e_1_3_2_125_2","unstructured":"Tianqi Liu Zhen Qin Junru Wu Jiaming Shen Misha Khalman Rishabh Joshi Yao Zhao Mohammad Saleh Simon Baumgartner Jialu Liu et\u00a0al. 2024. Lipo: Listwise preference optimization through learning-to-rank. arxiv:2402.01878 (2024)."},{"key":"e_1_3_2_126_2","volume-title":"The Twelfth International Conference on Learning Representations","author":"Liu Tianqi","year":"2024","unstructured":"Tianqi Liu, Yao Zhao, Rishabh Joshi, Misha Khalman, Mohammad Saleh, Peter J. Liu, and Jialu Liu. 2024. Statistical rejection sampling improves preference optimization. 
In The Twelfth International Conference on Learning Representations."},{"key":"e_1_3_2_127_2","doi-asserted-by":"publisher","DOI":"10.1145\/3491102.3501825"},{"key":"e_1_3_2_128_2","volume-title":"The Twelfth International Conference on Learning Representations","author":"Liu Wei","year":"2024","unstructured":"Wei Liu, Weihao Zeng, Keqing He, Yong Jiang, and Junxian He. 2024. What makes good data for alignment? a comprehensive study of automatic data selection in instruction tuning. In The Twelfth International Conference on Learning Representations."},{"key":"e_1_3_2_129_2","unstructured":"Aaron Lou Chenlin Meng and Stefano Ermon. 2024. Discrete diffusion modeling by estimating the ratios of the data distribution. stat (2024)."},{"key":"e_1_3_2_130_2","volume-title":"International Conference on Machine Learning","author":"Lu Yunhong","year":"2025","unstructured":"Yunhong Lu, Qichao Wang, Hengyuan Cao, Xiaoyin Xu, and Min Zhang. 2025. Smoothed preference optimization via ReNoise inversion for aligning diffusion models with varied human preferences. In International Conference on Machine Learning."},{"key":"e_1_3_2_131_2","doi-asserted-by":"crossref","unstructured":"Yujie Lu Xianjun Yang Xiujun Li Xin Eric Wang and William Yang Wang. 2023. Llmscore: Unveiling the power of large language models in text-to-image synthesis evaluation. Advances in Neural Information Processing Systems 36 (2023) 23075\u201323093.","DOI":"10.52202\/075280-1001"},{"key":"e_1_3_2_132_2","volume-title":"Individual Choice Behavior","author":"Luce R. Duncan","year":"1959","unstructured":"R. Duncan Luce. 1959. Individual Choice Behavior. Vol. 4. Wiley New York."},{"key":"e_1_3_2_133_2","unstructured":"Nanye Ma Shangyuan Tong Haolin Jia Hexiang Hu Yu-Chuan Su Mingda Zhang Xuan Yang Yandong Li Tommi Jaakkola Xuhui Jia et\u00a0al. 2025. Inference-time scaling for diffusion models beyond scaling denoising steps. 
arxiv:2501.09732 (2025)."},{"key":"e_1_3_2_134_2","doi-asserted-by":"publisher","DOI":"10.1145\/3664647.3681688"},{"key":"e_1_3_2_135_2","unstructured":"Oscar Ma\u00f1as Pietro Astolfi Melissa Hall Candace Ross Jack Urbanek Adina Williams Aishwarya Agrawal Adriana Romero-Soriano and Michal Drozdzal. 2024. Improving text-to-image consistency via automatic prompt optimization. arxiv:2403.17804 (2024)."},{"key":"e_1_3_2_136_2","unstructured":"Sourab Mangrulkar Sylvain Gugger Lysandre Debut Younes Belkada Sayak Paul and Benjamin Bossan. 2022. PEFT: State-of-the-art Parameter-Efficient Fine-Tuning methods. Retrieved from https:\/\/github.com\/huggingface\/peft"},{"key":"e_1_3_2_137_2","unstructured":"Yu Meng Mengzhou Xia and Danqi Chen. 2024. Simpo: Simple preference optimization with a reference-free reward. arxiv:2405.14734 (2024)."},{"key":"e_1_3_2_138_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.02514"},{"key":"e_1_3_2_139_2","unstructured":"Shakir Mohamed Mihaela Rosca Michael Figurnov and Andriy Mnih. 2020. Monte carlo gradient estimation in machine learning. Journal of Machine Learning Research 21 132 (2020) 1\u201362."},{"key":"e_1_3_2_140_2","volume-title":"Forty-First International Conference on Machine Learning","author":"Munos Remi","year":"2024","unstructured":"Remi Munos, Michal Valko, Daniele Calandriello, Mohammad Gheshlaghi Azar, Mark Rowland, Zhaohan Daniel Guo, Yunhao Tang, Matthieu Geist, Thomas Mesnard, C\u00f4me Fiegel, et\u00a0al. 2024. Nash learning from human feedback. In Forty-First International Conference on Machine Learning."},{"key":"e_1_3_2_141_2","unstructured":"Ashvin Nair Abhishek Gupta Murtaza Dalal and Sergey Levine. 2020. Awac: Accelerating online reinforcement learning with offline datasets. 
arxiv:2006.09359 (2020)."},{"key":"e_1_3_2_142_2","volume-title":"The Twelfth International Conference on Learning Representations","author":"Ngo Richard","year":"2024","unstructured":"Richard Ngo, Lawrence Chan, and S\u00f6ren Mindermann. 2024. The alignment problem from a deep learning perspective. In The Twelfth International Conference on Learning Representations."},{"key":"e_1_3_2_143_2","series-title":"Proceedings of Machine Learning Research","first-page":"16784","volume-title":"Proceedings of the 39th International Conference on Machine Learning","volume":"162","author":"Nichol Alexander Quinn","year":"2022","unstructured":"Alexander Quinn Nichol, Prafulla Dhariwal, Aditya Ramesh, Pranav Shyam, Pamela Mishkin, Bob McGrew, Ilya Sutskever, and Mark Chen. 2022. GLIDE: Towards photorealistic image generation and editing with text-guided diffusion models. In Proceedings of the 39th International Conference on Machine Learning(Proceedings of Machine Learning Research, Vol. 162). PMLR, 16784\u201316804."},{"key":"e_1_3_2_144_2","first-page":"38145","volume-title":"Proceedings of the 41st International Conference on Machine Learning","volume":"235","author":"Nika Andi","year":"2024","unstructured":"Andi Nika, Debmalya Mandal, Parameswaran Kamalaruban, Georgios Tzannetos, Goran Radanovic, and Adish Singla. 2024. Reward model learning vs. direct policy optimization: A comparative analysis of learning from human preferences. In Proceedings of the 41st International Conference on Machine Learning, Vol. 235. PMLR, 38145\u201338186."},{"key":"e_1_3_2_145_2","unstructured":"OpenAI. 2023. Gpt-4 technical report. arxiv:2303.08774 (2023)."},{"key":"e_1_3_2_146_2","doi-asserted-by":"crossref","unstructured":"Jonas Oppenlaender. 2023. A taxonomy of prompt modifiers for text-to-image generation. 
Behaviour & Information Technology 43 15 (2023) 1\u201314.","DOI":"10.1080\/0144929X.2023.2286532"},{"key":"e_1_3_2_147_2","unstructured":"Yuta Oshima Masahiro Suzuki Yutaka Matsuo and Hiroki Furuta. 2025. Inference-time text-to-video alignment with diffusion latent beam search. arxiv:2501.19252 (2025)."},{"key":"e_1_3_2_148_2","doi-asserted-by":"crossref","unstructured":"Long Ouyang Jeffrey Wu Xu Jiang Diogo Almeida Carroll Wainwright Pamela Mishkin Chong Zhang Sandhini Agarwal et\u00a0al. 2022. Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems 35 (2022) 27730\u201327744.","DOI":"10.52202\/068431-2011"},{"key":"e_1_3_2_149_2","first-page":"39416","volume-title":"Proceedings of the 41st International Conference on Machine Learning","volume":"235","author":"Pang Xianghe","year":"2024","unstructured":"Xianghe Pang, Shuo Tang, Rui Ye, Yuxin Xiong, Bolun Zhang, Yanfeng Wang, and Siheng Chen. 2024. Self-alignment of large language models via monopolylogue-based social scene simulation. In Proceedings of the 41st International Conference on Machine Learning, Vol. 235. PMLR, 39416\u201339447."},{"key":"e_1_3_2_150_2","volume-title":"The Thirteenth International Conference on Learning Representations","author":"Peng Yuang","year":"2025","unstructured":"Yuang Peng, Yuxin Cui, Haomiao Tang, Zekun Qi, Runpei Dong, Jing Bai, Chunrui Han, Zheng Ge, Xiangyu Zhang, and Shu-Tao Xia. 2025. DreamBench++: A human-aligned benchmark for personalized image generation. In The Thirteenth International Conference on Learning Representations."},{"key":"e_1_3_2_151_2","unstructured":"Vitali Petsiuk Alexander E. Siemenn Saisamrit Surbehera Zad Chin Keith Tyser Gregory Hunter Arvind Raghavan Yann Hicke Bryan A. Plummer Ori Kerret et\u00a0al. 2022. Human evaluation of text-to-image models on a multi-task benchmark. arxiv:2211.12112 (2022)."},{"key":"e_1_3_2_152_2","doi-asserted-by":"crossref","unstructured":"Robin L. Plackett. 
1975. The analysis of permutations. Journal of the Royal Statistical Society Series C: Applied Statistics 24 2 (1975) 193\u2013202.","DOI":"10.2307\/2346567"},{"key":"e_1_3_2_153_2","volume-title":"The Twelfth International Conference on Learning Representations","author":"Podell Dustin","year":"2024","unstructured":"Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas M\u00fcller, Joe Penna, and Robin Rombach. 2024. SDXL: Improving latent diffusion models for high-resolution image synthesis. In The Twelfth International Conference on Learning Representations."},{"key":"e_1_3_2_154_2","unstructured":"Mihir Prabhudesai Anirudh Goyal Deepak Pathak and Katerina Fragkiadaki. 2023. Aligning text-to-image diffusion models with reward backpropagation. arxiv:2310.03739 (2023)."},{"key":"e_1_3_2_155_2","unstructured":"Mihir Prabhudesai Russell Mendonca Zheyang Qin Katerina Fragkiadaki and Deepak Pathak. 2024. Video diffusion alignment via reward gradients. arxiv:2407.08737 (2024)."},{"key":"e_1_3_2_156_2","volume-title":"The Twelfth International Conference on Learning Representations","author":"Qi Xiangyu","year":"2024","unstructured":"Xiangyu Qi, Yi Zeng, Tinghao Xie, Pin-Yu Chen, Ruoxi Jia, Prateek Mittal, and Peter Henderson. 2024. Fine-tuning aligned language models compromises safety, even when users do not intend To!. In The Twelfth International Conference on Learning Representations."},{"key":"e_1_3_2_157_2","unstructured":"Zipeng Qi Lichen Bai Haoyi Xiong and Zeke Xie. 2024. Not all noises are created equally: Diffusion noise selection and optimization. arxiv:2407.14041 (2024)."},{"key":"e_1_3_2_158_2","unstructured":"Zipeng Qi Buhua Liu Shiyan Zhang Bao Li Zhiqiang Xu Haoyi Xiong and Zeke Xie. 2024. A simple and efficient baseline for zero-shot generative classification. 
arxiv:2412.12594 (2024)."},{"key":"e_1_3_2_159_2","first-page":"8748","volume-title":"International Conference on Machine Learning","author":"Radford Alec","year":"2021","unstructured":"Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et\u00a0al. 2021. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning. PMLR, 8748\u20138763."},{"key":"e_1_3_2_160_2","doi-asserted-by":"crossref","unstructured":"Rafael Rafailov Archit Sharma Eric Mitchell Christopher D Manning Stefano Ermon and Chelsea Finn. 2023. Direct preference optimization: Your language model is secretly a reward model. Advances in Neural Information Processing Systems 36 (2023) 53728\u201353741.","DOI":"10.52202\/075280-2338"},{"key":"e_1_3_2_161_2","doi-asserted-by":"crossref","unstructured":"Alexandre Rame Guillaume Couairon Corentin Dancette Jean-Baptiste Gaya Mustafa Shukor Laure Soulier and Matthieu Cord. 2023. Rewarded soups: Towards pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards. Advances in Neural Information Processing Systems 36 (2023) 71095\u201371134.","DOI":"10.52202\/075280-3114"},{"key":"e_1_3_2_162_2","doi-asserted-by":"crossref","unstructured":"Royi Rassin Eran Hirsch Daniel Glickman Shauli Ravfogel Yoav Goldberg and Gal Chechik. 2024. Linguistic binding in diffusion models: Enhancing attribute correspondence through attention map alignment. Advances in Neural Information Processing Systems 36 (2024) 3536\u20133559.","DOI":"10.52202\/075280-0157"},{"key":"e_1_3_2_163_2","first-page":"42258","volume-title":"Proceedings of the 41st International Conference on Machine Learning","volume":"235","author":"Chowdhury Sayak Ray","year":"2024","unstructured":"Sayak Ray Chowdhury, Anush Kini, and Nagarajan Natarajan. 2024. Provably robust DPO: Aligning language models with noisy feedback. 
In Proceedings of the 41st International Conference on Machine Learning, Vol. 235. PMLR, 42258\u201342274."},{"key":"e_1_3_2_164_2","unstructured":"Jie Ren Yuhang Zhang Dongrui Liu Xiaopeng Zhang and Qi Tian. 2025. Refining alignment framework for diffusion models with intermediate-step preference ranking. arxiv:2502.01667 (2025)."},{"key":"e_1_3_2_165_2","doi-asserted-by":"publisher","DOI":"10.1145\/3411763.3451760"},{"key":"e_1_3_2_166_2","unstructured":"Pierre Harvey Richemond Yunhao Tang Daniel Guo Daniele Calandriello Mohammad Gheshlaghi Azar Rafael Rafailov et\u00a0al. 2024. Offline regularised reinforcement learning for large language models alignment. arxiv:2405.19107 (2024)."},{"key":"e_1_3_2_167_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01042"},{"key":"e_1_3_2_168_2","doi-asserted-by":"crossref","unstructured":"Chitwan Saharia William Chan Saurabh Saxena Lala Li Jay Whang Emily L. Denton Kamyar Ghasemipour Raphael Gontijo Lopes Burcu Karagol Ayan Tim Salimans et\u00a0al. 2022. Photorealistic text-to-image diffusion models with deep language understanding. Advances in Neural Information Processing Systems 35 (2022) 36479\u201336494.","DOI":"10.52202\/068431-2643"},{"key":"e_1_3_2_169_2","unstructured":"Tim Salimans Ian Goodfellow Wojciech Zaremba Vicki Cheung Alec Radford and Xi Chen. 2016. Improved techniques for training gans. Advances in Neural Information Processing Systems 29 (2016) 2234\u20132242."},{"key":"e_1_3_2_170_2","doi-asserted-by":"crossref","unstructured":"Axel Sauer Frederic Boesel Tim Dockhorn Andreas Blattmann Patrick Esser and Robin Rombach. 2024. Fast high-resolution image synthesis with latent adversarial diffusion distillation. arxiv:2403.12015 (2024).","DOI":"10.1145\/3680528.3687625"},{"key":"e_1_3_2_171_2","doi-asserted-by":"crossref","unstructured":"Divya Saxena and Jiannong Cao. 2021. Generative Adversarial Networks (GANs): Challenges solutions and future directions. ACM Comput. Surv. 
54 3 Article 63 (may2021) 42 pages.","DOI":"10.1145\/3446374"},{"key":"e_1_3_2_172_2","doi-asserted-by":"crossref","unstructured":"Christoph Schuhmann Romain Beaumont Richard Vencu Cade Gordon Ross Wightman et\u00a0al. 2022. Laion-5b: An open large-scale dataset for training next generation image-text models. Advances in Neural Information Processing Systems 35 (2022) 25278\u201325294.","DOI":"10.52202\/068431-1833"},{"key":"e_1_3_2_173_2","unstructured":"John Schulman Filip Wolski Prafulla Dhariwal Alec Radford and Oleg Klimov. 2017. Proximal policy optimization algorithms. arxiv:1707.06347 (2017)."},{"key":"e_1_3_2_174_2","doi-asserted-by":"publisher","DOI":"10.1145\/3501714.3501755"},{"key":"e_1_3_2_175_2","doi-asserted-by":"crossref","unstructured":"Ken Sekimoto. 1998. Langevin equation and thermodynamics. Progress of Theoretical Physics Supplement 130 (1998) 17\u201327.","DOI":"10.1143\/PTPS.130.17"},{"key":"e_1_3_2_176_2","unstructured":"Pier Giuseppe Sessa Robert Dadashi L\u00e9onard Hussenot Johan Ferret Nino Vieillard Alexandre Ram\u00e9 Bobak Shariari Sarah Perrin Abe Friesen Geoffrey Cideron et\u00a0al. 2024. Bond: Aligning llms with best-of-n distillation. arxiv:2407.14622 (2024)."},{"key":"e_1_3_2_177_2","volume-title":"International Conference on Learning Representations","author":"Shen Lingkai","year":"2025","unstructured":"Lingkai Shen, Yifei Liu, Yujia Li, and Yu Tian. 2025. Efficient diversity-preserving diffusion alignment via gradient-informed GFlowNets. In International Conference on Learning Representations."},{"key":"e_1_3_2_178_2","volume-title":"The Twelfth International Conference on Learning Representations","author":"Shen Xudong","year":"2024","unstructured":"Xudong Shen, Chao Du, Tianyu Pang, Min Lin, Yongkang Wong, and Mohan Kankanhalli. 2024. Finetuning text-to-image diffusion models for fairness. 
In The Twelfth International Conference on Learning Representations."},{"key":"e_1_3_2_179_2","unstructured":"Dingyuan Shi Yong Wang Hangyu Li and Xiangxiang Chu. 2024. Preference Alignment for Diffusion Model via Explicit Denoised Distribution Estimation. arxiv:2411.14871"},{"key":"e_1_3_2_180_2","doi-asserted-by":"crossref","unstructured":"Ilia Shumailov Zakhar Shumaylov Yiren Zhao Nicolas Papernot Ross Anderson and Yarin Gal. 2024. AI models collapse when trained on recursively generated data. Nature 631 8022 (2024) 755\u2013759.","DOI":"10.1038\/s41586-024-07566-y"},{"key":"e_1_3_2_181_2","unstructured":"Raghav Singhal Zachary Horvitz Ryan Teehan Mengye Ren Zhou Yu Kathleen McKeown and Rajesh Ranganath. 2025. A general framework for inference-time scaling and steering of diffusion models. arxiv:2501.06848 (2025)."},{"key":"e_1_3_2_182_2","doi-asserted-by":"crossref","unstructured":"Joar Skalse Nikolaus Howe Dmitrii Krasheninnikov and David Krueger. 2022. Defining and characterizing reward gaming. Advances in Neural Information Processing Systems 35 (2022) 9460\u20139471.","DOI":"10.52202\/068431-0687"},{"key":"e_1_3_2_183_2","unstructured":"Charlie Snell Jaehoon Lee Kelvin Xu and Aviral Kumar. 2024. Scaling llm test-time compute optimally can be more effective than scaling model parameters. arxiv:2408.03314 (2024)."},{"key":"e_1_3_2_184_2","volume-title":"The Eleventh International Conference on Learning Representations","author":"Snell Charlie Victor","year":"2023","unstructured":"Charlie Victor Snell, Ilya Kostrikov, Yi Su, Sherry Yang, and Sergey Levine. 2023. Offline RL for natural language generation with implicit language Q learning. In The Eleventh International Conference on Learning Representations."},{"key":"e_1_3_2_185_2","first-page":"2256","volume-title":"International Conference on Machine Learning","author":"Sohl-Dickstein Jascha","year":"2015","unstructured":"Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. 2015. 
Deep unsupervised learning using nonequilibrium thermodynamics. In International Conference on Machine Learning. PMLR, 2256\u20132265."},{"key":"e_1_3_2_186_2","doi-asserted-by":"crossref","unstructured":"Gowthami Somepalli Anubhav Gupta Kamal Gupta Shramay Palta Micah Goldblum Jonas Geiping Abhinav Shrivastava and Tom Goldstein. 2024. Measuring style similarity in diffusion models. arxiv:2404.01292 (2024).","DOI":"10.1007\/978-3-031-72848-8_9"},{"key":"e_1_3_2_187_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v38i17.29865"},{"key":"e_1_3_2_188_2","volume-title":"International Conference on Learning Representations","author":"Song Jiaming","year":"2021","unstructured":"Jiaming Song, Chenlin Meng, and Stefano Ermon. 2021. Denoising diffusion implicit models. In International Conference on Learning Representations."},{"key":"e_1_3_2_189_2","volume-title":"International Conference on Learning Representations","author":"Song Yang","year":"2021","unstructured":"Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. 2021. Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations."},{"key":"e_1_3_2_190_2","first-page":"46280","volume-title":"Proceedings of the 41st International Conference on Machine Learning","volume":"235","author":"Sorensen Taylor","year":"2024","unstructured":"Taylor Sorensen, Jared Moore, Jillian Fisher, Mitchell L. Gordon, Niloofar Mireshghallah, Christopher Michael Rytting, Andre Ye, Liwei Jiang, Ximing Lu, Nouha Dziri, et\u00a0al. 2024. Position: A roadmap to pluralistic alignment. In Proceedings of the 41st International Conference on Machine Learning, Vol. 235. 
PMLR, 46280\u201346302."},{"key":"e_1_3_2_191_2","first-page":"46625","volume-title":"Proceedings of the 41st International Conference on Machine Learning","volume":"235","author":"Stephan Moritz Pascal","year":"2024","unstructured":"Moritz Pascal Stephan, Alexander Khazatsky, Eric Mitchell, Annie S Chen, Sheryl Hsu, Archit Sharma, and Chelsea Finn. 2024. RLVF: Learning from verbal feedback without overgeneralization. In Proceedings of the 41st International Conference on Machine Learning, Vol. 235. PMLR, 46625\u201346656."},{"key":"e_1_3_2_192_2","unstructured":"Nisan Stiennon Long Ouyang Jeffrey Wu Daniel Ziegler Ryan Lowe Chelsea Voss Alec Radford Dario Amodei and Paul F. Christiano. 2020. Learning to summarize with human feedback. Advances in Neural Information Processing Systems 33 (2020) 3008\u20133021."},{"key":"e_1_3_2_193_2","unstructured":"Zhiqing Sun Yikang Shen Qinhong Zhou Hongxin Zhang Zhenfang Chen David Cox Yiming Yang and Chuang Gan. 2023. Principle-driven self-alignment of language models from scratch with minimal human supervision. Advances in Neural Information Processing Systems 36 (2023)."},{"key":"e_1_3_2_194_2","volume-title":"Reinforcement Learning: An Introduction","author":"Sutton Richard S.","year":"2020","unstructured":"Richard S. Sutton and Andrew G. Barto. 2020. Reinforcement Learning: An Introduction. MIT press."},{"key":"e_1_3_2_195_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.308"},{"key":"e_1_3_2_196_2","unstructured":"Xiaofeng Tan Hongsong Wang Xin Geng and Pan Zhou. 2024. SoPo: Text-to-motion generation using semi-online preference optimization. arxiv:2412.05095 (2024)."},{"key":"e_1_3_2_197_2","unstructured":"Yunhao Tang Daniel Zhaohan Guo Zeyu Zheng Daniele Calandriello Yuan Cao Eugene Tarassov et\u00a0al. 2024. Understanding the performance gap between online and offline alignment algorithms. 
arxiv:2405.08448 (2024)."},{"key":"e_1_3_2_198_2","first-page":"47725","volume-title":"Proceedings of the 41st International Conference on Machine Learning","volume":"235","author":"Tang Yunhao","year":"2024","unstructured":"Yunhao Tang, Zhaohan Daniel Guo, Zeyu Zheng, Daniele Calandriello, Remi Munos, Mark Rowland, Pierre Harvey Richemond, Michal Valko, Bernardo Avila Pires, and Bilal Piot. 2024. Generalized preference optimization: A unified approach to offline alignment. In Proceedings of the 41st International Conference on Machine Learning, Vol. 235. PMLR, 47725\u201347742."},{"key":"e_1_3_2_199_2","unstructured":"Zhiwei Tang Jiangweizhi Peng Jiasheng Tang Mingyi Hong Fan Wang and Tsung-Hui Chang. 2024. Inference-time alignment of diffusion models with direct noise optimization. arxiv:2405.18881 (2024)."},{"key":"e_1_3_2_200_2","unstructured":"Hugo Touvron Louis Martin Kevin Stone Peter Albert Amjad Almahairi Yasmine Babaei Nikolay Bashlykov Soumya Batra Prajjwal Bhargava Shruti Bhosale et\u00a0al. 2023. Llama 2: Open foundation and fine-tuned chat models. arxiv:2307.09288 (2023)."},{"key":"e_1_3_2_201_2","unstructured":"Masatoshi Uehara Yifei Liu Yujia Li and Yu Tian. 2024. Understanding reinforcement learning-based fine-tuning of diffusion models: A tutorial and review. arxiv:2407.13734 (2024)."},{"key":"e_1_3_2_202_2","volume-title":"International Conference on Machine Learning","author":"Uehara Masatoshi","year":"2024","unstructured":"Masatoshi Uehara, Yifei Wu, Yujia Li, and Yu Tian. 2024. Feedback efficient online fine-tuning of diffusion models. In International Conference on Machine Learning."},{"key":"e_1_3_2_203_2","unstructured":"Masatoshi Uehara Yulai Zhao Chenyu Wang Xiner Li Aviv Regev Sergey Levine and Tommaso Biancalani. 2025. Inference-time alignment in diffusion models with reward-guided generation: Tutorial and review. 
arxiv:2501.09685 (2025)."},{"key":"e_1_3_2_204_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00786"},{"key":"e_1_3_2_205_2","unstructured":"Chaoqi Wang Yibo Jiang Chenghao Yang Han Liu and Yuxin Chen. 2023. Beyond reverse kl: Generalizing direct preference optimization with diverse divergence constraints. arxiv:2309.16240 (2023)."},{"key":"e_1_3_2_206_2","doi-asserted-by":"crossref","unstructured":"Liyuan Wang Xingxing Zhang Hang Su and Jun Zhu. 2024. A comprehensive survey of continual learning: Theory method and application. IEEE Transactions on Pattern Analysis and Machine Intelligence 46 8 (2024) 5362\u20135383.","DOI":"10.1109\/TPAMI.2024.3367329"},{"key":"e_1_3_2_207_2","doi-asserted-by":"crossref","unstructured":"Shiyu Wang Yuanqi Du Xiaojie Guo Bo Pan Zhaohui Qin and Liang Zhao. 2024. Controllable data generation by deep learning: A review. ACM Comput. Surv. 56 9 Article 228 (apr2024) 38 pages.","DOI":"10.1145\/3648609"},{"key":"e_1_3_2_208_2","unstructured":"Tongzhou Wang Jun-Yan Zhu Antonio Torralba and Alexei A. Efros. 2018. Dataset distillation. arxiv:1811.10959 (2018)."},{"key":"e_1_3_2_209_2","doi-asserted-by":"publisher","DOI":"10.1145\/3544548.3581402"},{"key":"e_1_3_2_210_2","unstructured":"Yibin Wang Zhiyu Tan Junyan Wang Xiaomeng Yang Cheng Jin and Hao Li. 2024. Lift: Leveraging human feedback for text-to-video model alignment. arxiv:2412.04814 (2024)."},{"key":"e_1_3_2_211_2","unstructured":"Yufei Wang Wanjun Zhong Liangyou Li Fei Mi Xingshan Zeng Wenyong Huang Lifeng Shang Xin Jiang and Qun Liu. 2023. Aligning large language models with human: A survey. arxiv:2307.12966 (2023)."},{"key":"e_1_3_2_212_2","doi-asserted-by":"crossref","unstructured":"Zhengwei Wang Qi She and Tom\u00e1s E. Ward. 2021. Generative adversarial networks in computer vision: A survey and taxonomy. ACM Comput. Surv. 
54 2 Article 37 (feb2021) 38 pages.","DOI":"10.1145\/3439723"},{"key":"e_1_3_2_213_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.acl-long.51"},{"key":"e_1_3_2_214_2","first-page":"52588","volume-title":"Proceedings of the 41st International Conference on Machine Learning","volume":"235","author":"Wei Boyi","year":"2024","unstructured":"Boyi Wei, Kaixuan Huang, Yangsibo Huang, Tinghao Xie, Xiangyu Qi, Mengzhou Xia, Prateek Mittal, Mengdi Wang, and Peter Henderson. 2024. Assessing the brittleness of safety alignment via pruning and low-rank modifications. In Proceedings of the 41st International Conference on Machine Learning, Vol. 235. PMLR, 52588\u201352610."},{"key":"e_1_3_2_215_2","doi-asserted-by":"crossref","unstructured":"Jason Wei Xuezhi Wang Dale Schuurmans Maarten Bosma Fei Xia Ed Chi Quoc V. Le Denny Zhou et\u00a0al. 2022. Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems 35 (2022) 24824\u201324837.","DOI":"10.52202\/068431-1800"},{"key":"e_1_3_2_216_2","doi-asserted-by":"crossref","unstructured":"Ronald J Williams. 1992. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning 8 3 (1992) 229\u2013256.","DOI":"10.1023\/A:1022672621406"},{"key":"e_1_3_2_217_2","first-page":"53079","volume-title":"Proceedings of the 41st International Conference on Machine Learning","volume":"235","author":"Wolf Yotam","year":"2024","unstructured":"Yotam Wolf, Noam Wies, Oshri Avnery, Yoav Levine, and Amnon Shashua. 2024. Fundamental limitations of alignment in large language models. In Proceedings of the 41st International Conference on Machine Learning, Vol. 235. PMLR, 53079\u201353112."},{"key":"e_1_3_2_218_2","unstructured":"Xiaoshi Wu Yiming Hao Keqiang Sun Yixiong Chen Feng Zhu Rui Zhao and Hongsheng Li. 2023. Human preference score v2: A solid benchmark for evaluating human preferences of text-to-image synthesis. 
arxiv:2306.09341 (2023)."},{"key":"e_1_3_2_219_2","doi-asserted-by":"crossref","unstructured":"Xun Wu Shaohan Huang Guolong Wang Jing Xiong and Furu Wei. 2024. Multimodal large language models make text-to-image generative models align better. Advances in Neural Information Processing Systems 37 (2024) 81287\u201381323.","DOI":"10.52202\/079017-2584"},{"key":"e_1_3_2_220_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00200"},{"key":"e_1_3_2_221_2","unstructured":"Yihang Wu Xiao Cao Kaixin Li Zitan Chen Haonan Wang Lei Meng and Zhiyong Huang. 2024. Towards better text-to-image generation alignment via attention modulation. arxiv:2404.13899 (2024)."},{"key":"e_1_3_2_222_2","volume-title":"European Conference on Computer Vision","author":"Wu Yekun","year":"2024","unstructured":"Yekun Wu, Hui Chen, Zhaofeng Zheng, Yufan Zhang, Jie Zhang, and Baining Wang. 2024. Deep reward supervisions for tuning text-to-image diffusion models. In European Conference on Computer Vision."},{"key":"e_1_3_2_223_2","doi-asserted-by":"publisher","DOI":"10.1145\/1390156.1390306"},{"key":"e_1_3_2_224_2","doi-asserted-by":"crossref","unstructured":"Weihao Xia and Jing-Hao Xue. 2023. A survey on deep generative 3D-aware image synthesis. ACM Comput. Surv. 56 4 (2023) 1\u201334.","DOI":"10.1145\/3626193"},{"key":"e_1_3_2_225_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52734.2025.01234"},{"key":"e_1_3_2_226_2","volume-title":"International Conference on Learning Representations","author":"Xie Zeke","year":"2021","unstructured":"Zeke Xie, Issei Sato, and Masashi Sugiyama. 2021. A diffusion theory for deep learning dynamics: Stochastic gradient descent exponentially favors flat minima. 
In International Conference on Learning Representations."},{"key":"e_1_3_2_227_2","first-page":"54715","volume-title":"Proceedings of the 41st International Conference on Machine Learning","volume":"235","author":"Xiong Wei","year":"2024","unstructured":"Wei Xiong, Hanze Dong, Chenlu Ye, Ziqi Wang, Han Zhong, Heng Ji, Nan Jiang, and Tong Zhang. 2024. Iterative preference learning from human feedback: Bridging theory and practice for RLHF under KL-constraint. In Proceedings of the 41st International Conference on Machine Learning, Vol. 235. PMLR, 54715\u201354754."},{"key":"e_1_3_2_228_2","unstructured":"Jiazheng Xu Yu Huang Jiale Cheng Yuanming Yang Jiajun Xu Yuan Wang Wenbo Duan Shen Yang Qunlin Jin Shurun Li et\u00a0al. 2024. Visionreward: Fine-grained multi-dimensional human preference learning for image and video generation. arxiv:2412.21059 (2024)."},{"key":"e_1_3_2_229_2","doi-asserted-by":"crossref","unstructured":"Jiazheng Xu Xiao Liu Yuchen Wu Yuxuan Tong Qinkai Li Ming Ding Jie Tang and Yuxiao Dong. 2023. Imagereward: Learning and evaluating human preferences for text-to-image generation. Advances in Neural Information Processing Systems 36 (2023) 15903\u201315935.","DOI":"10.52202\/075280-0700"},{"key":"e_1_3_2_230_2","doi-asserted-by":"publisher","DOI":"10.1109\/WACV61041.2025.00299"},{"key":"e_1_3_2_231_2","volume-title":"International Conference on Learning Representations","author":"Xu Minkai","year":"2022","unstructured":"Minkai Xu, Lantao Yu, Yang Song, Chence Shi, Stefano Ermon, and Jian Tang. 2022. GeoDiff: A geometric diffusion model for molecular conformation generation. In International Conference on Learning Representations."},{"key":"e_1_3_2_232_2","unstructured":"Pan Xu Jinghui Chen Difan Zou and Quanquan Gu. 2018. Global convergence of Langevin dynamics based algorithms for nonconvex optimization. 
Advances in Neural Information Processing Systems 31 (2018) 3122\u20133133."},{"key":"e_1_3_2_233_2","first-page":"54983","volume-title":"Proceedings of the 41st International Conference on Machine Learning","volume":"235","author":"Xu Shusheng","year":"2024","unstructured":"Shusheng Xu, Wei Fu, Jiaxuan Gao, Wenjie Ye, Weilin Liu, Zhiyu Mei, Guangju Wang, Chao Yu, and Yi Wu. 2024. Is DPO superior to PPO for LLM alignment? a comprehensive study. In Proceedings of the 41st International Conference on Machine Learning, Vol. 235. PMLR, 54983\u201354998."},{"key":"e_1_3_2_234_2","unstructured":"An Yang Anfeng Li Baosong Yang Beichen Zhang Binyuan Hui Bo Zheng Bowen Yu Chang Gao Chengen Huang Chenxu Lv et\u00a0al. 2025. Qwen3 technical report. arxiv:2505.09388 (2025)."},{"key":"e_1_3_2_235_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00854"},{"key":"e_1_3_2_236_2","volume-title":"Forty-First International Conference on Machine Learning","author":"Yang Ling","year":"2024","unstructured":"Ling Yang, Zhaochen Yu, Chenlin Meng, Minkai Xu, Stefano Ermon, and CUI Bin. 2024. Mastering text-to-image diffusion: Recaptioning, planning, and generating with multimodal llms. In Forty-First International Conference on Machine Learning."},{"key":"e_1_3_2_237_2","doi-asserted-by":"crossref","unstructured":"Ling Yang Zhilong Zhang Yang Song Shenda Hong Runsheng Xu Yue Zhao Wentao Zhang Bin Cui and Ming-Hsuan Yang. 2023. Diffusion models: A comprehensive survey of methods and applications. ACM Comput. Surv. 56 4 Article 105 (nov2023) 39 pages.","DOI":"10.1145\/3626235"},{"key":"e_1_3_2_238_2","volume-title":"Proceedings of the 41st International Conference on Machine Learning","author":"Yang Rui","year":"2024","unstructured":"Rui Yang, Xiaoman Pan, Feng Luo, Shuang Qiu, Han Zhong, Dong Yu, and Jianshu Chen. 2024. Rewards-in-context: Multi-objective alignment of foundation models with dynamic preference adjustment. 
In Proceedings of the 41st International Conference on Machine Learning. PMLR."},{"key":"e_1_3_2_239_2","first-page":"56276","volume-title":"International Conference on Machine Learning","author":"Yang Rui","year":"2024","unstructured":"Rui Yang, Xiaoman Pan, Feng Luo, Shuang Qiu, Han Zhong, Dong Yu, and Jianshu Chen. 2024. Rewards-in-context: Multi-objective alignment of foundation models with dynamic preference adjustment. In International Conference on Machine Learning. PMLR, 56276\u201356297."},{"key":"e_1_3_2_240_2","volume-title":"Forty-First International Conference on Machine Learning","author":"Yang Shentao","year":"2024","unstructured":"Shentao Yang, Tianqi Chen, and Mingyuan Zhou. 2024. A dense reward view on aligning text-to-image diffusion with preference. In Forty-First International Conference on Machine Learning."},{"key":"e_1_3_2_241_2","volume-title":"The Eleventh International Conference on Learning Representations","author":"Yang Shuo","year":"2024","unstructured":"Shuo Yang, Zeke Xie, Hanyu Peng, Min Xu, Mingming Sun, and Ping Li. 2024. Dataset pruning: Reducing training data by examining generalization influence. In The Eleventh International Conference on Learning Representations."},{"key":"e_1_3_2_242_2","unstructured":"Junliang Ye Fangfu Liu Qixiu Li Zhengyi Wang Yikai Wang Xinzhou Wang Yueqi Duan and Jun Zhu. 2024. Dreamreward: Text-to-3d generation with human preference. arxiv:2403.14613 (2024)."},{"key":"e_1_3_2_243_2","volume-title":"The Thirteenth International Conference on Learning Representations","author":"Yeh Po-Hung","year":"2025","unstructured":"Po-Hung Yeh, Kuang-Huei Lee, and Jun cheng Chen. 2025. Training-free diffusion model alignment with sampling demons. In The Thirteenth International Conference on Learning Representations."},{"key":"e_1_3_2_244_2","unstructured":"Sibo Yi Yule Liu Zhen Sun Tianshuo Cong Xinlei He Jiaxing Song Ke Xu and Qi Li. 2024. Jailbreak attacks and defenses against large language models: A survey. 
arxiv:2407.04295 (2024)."},{"key":"e_1_3_2_245_2","unstructured":"Ruonan Yu Songhua Liu and Xinchao Wang. 2023. Dataset distillation: A comprehensive review. IEEE Transactions on Pattern Analysis and Machine Intelligence (2023)."},{"key":"e_1_3_2_246_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00618"},{"key":"e_1_3_2_247_2","volume-title":"Forty-First International Conference on Machine Learning","author":"Yuan Weizhe","year":"2024","unstructured":"Weizhe Yuan, Richard Yuanzhe Pang, Kyunghyun Cho, Xian Li, Sainbayar Sukhbaatar, Jing Xu, and Jason E. Weston. 2024. Self-rewarding language models. In Forty-First International Conference on Machine Learning."},{"key":"e_1_3_2_248_2","unstructured":"Zheng Yuan Hongyi Yuan Chuanqi Tan Wei Wang Songfang Huang and Fei Huang. 2023. Rrhf: Rank responses to align language models with human feedback without tears. arxiv:2304.05302 (2023)."},{"key":"e_1_3_2_249_2","first-page":"58348","volume-title":"Proceedings of the 41st International Conference on Machine Learning","volume":"235","author":"Zeng Yongcheng","year":"2024","unstructured":"Yongcheng Zeng, Guoqing Liu, Weiyu Ma, Ning Yang, Haifeng Zhang, and Jun Wang. 2024. Token-level direct preference optimization. In Proceedings of the 41st International Conference on Machine Learning, Vol. 235. PMLR, 58348\u201358365."},{"key":"e_1_3_2_250_2","unstructured":"Jiacheng Zhang Jie Wu Weifeng Chen Yatai Ji Xuefeng Xiao Weilin Huang and Kai Han. 2024. Onlinevpo: Align video diffusion model with online video-centric preference optimization. arxiv:2412.15159 (2024)."},{"key":"e_1_3_2_251_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00766"},{"key":"e_1_3_2_252_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00862"},{"key":"e_1_3_2_253_2","unstructured":"Zekun Zhang Yifei Liu Yujia Li and Yu Tian. 2024. Improving long-text alignment for text-to-image diffusion models. 
arxiv:2410.11817 (2024)."},{"key":"e_1_3_2_254_2","volume-title":"Forty-First International Conference on Machine Learning","author":"Zhang Ziyi","year":"2024","unstructured":"Ziyi Zhang, Sen Zhang, Yibing Zhan, Yong Luo, Yonggang Wen, and Dacheng Tao. 2024. Confronting reward overoptimization for diffusion models: A perspective of inductive and primacy biases. In Forty-First International Conference on Machine Learning."},{"key":"e_1_3_2_255_2","unstructured":"Zicong Zhang Yang Zhang Yang Song and Tao Chen. 2024. Information theoretic text-to-image alignment. arxiv:2405.20759 (2024)."},{"key":"e_1_3_2_256_2","volume-title":"Forty-First International Conference on Machine Learning","author":"Zhao Dora","year":"2024","unstructured":"Dora Zhao, Jerone Andrews, Orestis Papakyriakopoulos, and Alice Xiang. 2024. Position: Measure dataset diversity, don\u2019t just claim it. In Forty-First International Conference on Machine Learning."},{"key":"e_1_3_2_257_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.02154"},{"key":"e_1_3_2_258_2","unstructured":"Jiacheng Zheng Yifei Wu Yujia Li and Yu Tian. 2024. Reward fine-tuning two-step diffusion models via learning differentiable latent-space surrogate reward. arxiv:2411.15247 (2024)."},{"key":"e_1_3_2_259_2","unstructured":"Yutong Zhong Yifei Liu Yujia Li and Yu Tian. 2025. Focus-N-Fix: Region-aware fine-tuning for text-to-image generation. arxiv:2501.06481 (2025)."},{"key":"e_1_3_2_260_2","doi-asserted-by":"crossref","unstructured":"Chunting Zhou Pengfei Liu Puxin Xu Srinivasan Iyer Jiao Sun Yuning Mao Xuezhe Ma Avia Efrat Ping Yu LILI YU et\u00a0al. 2023. Lima: Less is more for alignment. 
Advances in Neural Information Processing Systems 36 (2023) 55006\u201355021.","DOI":"10.52202\/075280-2400"},{"key":"e_1_3_2_261_2","volume-title":"The Eleventh International Conference on Learning Representations","author":"Zhou Yongchao","year":"2023","unstructured":"Yongchao Zhou, Andrei Ioan Muresanu, Ziwen Han, Keiran Paster, Silviu Pitis, Harris Chan, and Jimmy Ba. 2023. Large language models are human-level prompt engineers. In The Eleventh International Conference on Learning Representations."},{"key":"e_1_3_2_262_2","volume-title":"International Conference on Computer Vision","author":"Zhou Zikai","year":"2025","unstructured":"Zikai Zhou, Shitong Shao, Lichen Bai, Shufei Zhang, Zhiqiang Xu, Bo Han, and Zeke Xie. 2025. Golden noise for diffusion models: A learning framework. In International Conference on Computer Vision."},{"key":"e_1_3_2_263_2","volume-title":"Forty-Second International Conference on Machine Learning","author":"Zhou Zhenglin","year":"2025","unstructured":"Zhenglin Zhou, Xiaobo Xia, Fan Ma, Hehe Fan, Yi Yang, and Tat-Seng Chua. 2025. DreamDPO: Aligning text-to-3D generation with human preferences via direct preference optimization. In Forty-Second International Conference on Machine Learning."},{"key":"e_1_3_2_264_2","volume-title":"The Thirteenth International Conference on Learning Representations","author":"Zhu Huaisheng","year":"2025","unstructured":"Huaisheng Zhu, Teng Xiao, and Vasant G. Honavar. 2025. DSPO: Direct score preference optimization for diffusion model alignment. 
In The Thirteenth International Conference on Learning Representations."}],"container-title":["ACM Computing Surveys"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3796982","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,15]],"date-time":"2026-03-15T09:11:16Z","timestamp":1773565876000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3796982"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,3,10]]},"references-count":263,"journal-issue":{"issue":"9","published-print":{"date-parts":[[2026,7,31]]}},"alternative-id":["10.1145\/3796982"],"URL":"https:\/\/doi.org\/10.1145\/3796982","relation":{},"ISSN":["0360-0300","1557-7341"],"issn-type":[{"value":"0360-0300","type":"print"},{"value":"1557-7341","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,3,10]]},"assertion":[{"value":"2024-09-15","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2026-02-04","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2026-03-10","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}