{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,8]],"date-time":"2025-10-08T03:40:22Z","timestamp":1759894822676,"version":"build-2065373602"},"publisher-location":"New York, NY, USA","reference-count":49,"publisher":"ACM","license":[{"start":{"date-parts":[[2025,5,8]],"date-time":"2025-05-08T00:00:00Z","timestamp":1746662400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"National Research Foundation, Singapore, under its Competitive Research Programme","award":["NRF-CRP23-2019-0006"],"award-info":[{"award-number":["NRF-CRP23-2019-0006"]}]},{"name":"General Research Fund (GRF) project of the Hong Kong University Grants Committee","award":["14200720"],"award-info":[{"award-number":["14200720"]}]},{"name":"National Natural Science Foundation of China (NSFC) Project","award":["62073273"],"award-info":[{"award-number":["62073273"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2025,5,8]]},"DOI":"10.1145\/3701716.3715226","type":"proceedings-article","created":{"date-parts":[[2025,5,23]],"date-time":"2025-05-23T16:20:01Z","timestamp":1748017201000},"page":"315-324","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["GAS: Generative Auto-bidding with Post-training Search"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0009-0008-0073-123X","authenticated-orcid":false,"given":"Yewen","family":"Li","sequence":"first","affiliation":[{"name":"Nanyang Technological University, Singapore, Singapore"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-8430-5871","authenticated-orcid":false,"given":"Shuai","family":"Mao","sequence":"additional","affiliation":[{"name":"The Chinese University of Hong Kong, Hong Kong, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4470-5972","authenticated-orcid":false,"given":"Jingtong","family":"Gao","sequence":"additional","affiliation":[{"name":"City University of Hong Kong, Hong Kong, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-1573-1487","authenticated-orcid":false,"given":"Nan","family":"Jiang","sequence":"additional","affiliation":[{"name":"Kuaishou Technology, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0110-8032","authenticated-orcid":false,"given":"Yunjian","family":"Xu","sequence":"additional","affiliation":[{"name":"The Chinese University of Hong Kong, Hong Kong, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6451-9299","authenticated-orcid":false,"given":"Qingpeng","family":"Cai","sequence":"additional","affiliation":[{"name":"Kuaishou Technology, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-1739-0868","authenticated-orcid":false,"given":"Fei","family":"Pan","sequence":"additional","affiliation":[{"name":"Kuaishou Technology, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9266-0780","authenticated-orcid":false,"given":"Peng","family":"Jiang","sequence":"additional","affiliation":[{"name":"Kuaishou Technology, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7064-7438","authenticated-orcid":false,"given":"Bo","family":"An","sequence":"additional","affiliation":[{"name":"Nanyang Technological University, Singapore, Singapore"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,5,23]]},"reference":[{"key":"e_1_3_2_2_1_1","unstructured":"Anurag Ajay Yilun Du Abhi Gupta Joshua B. Tenenbaum Tommi S. Jaakkola and Pulkit Agrawal. 2023. Is Conditional Generative Modeling all you need for Decision Making?. In ICLR."},{"key":"e_1_3_2_2_2_1","unstructured":"Alibaba. 2021. Alimama Super Diamond. https:\/\/zuanshi.taobao.com\/."},{"key":"e_1_3_2_2_3_1","volume-title":"International Conference on Artificial Intelligence and Statistics,.","author":"Azar Mohammad Gheshlaghi","year":"2024","unstructured":"Mohammad Gheshlaghi Azar, Zhaohan Daniel Guo, Bilal Piot, R\u00e9mi Munos, Mark Rowland, Michal Valko, and Daniele Calandriello. 2024. A General Theoretical Paradigm to Understand Learning from Human Preferences. In International Conference on Artificial Intelligence and Statistics,."},{"key":"e_1_3_2_2_4_1","unstructured":"Santiago R. Balseiro Yuan Deng Jieming Mao Vahab S. Mirrokni and Song Zuo. 2021. Robust Auction Design in the Auto-bidding World. In NeurIPS."},{"key":"e_1_3_2_2_5_1","first-page":"181","article-title":"Majority systems and the Condorcet jury theorem","volume":"38","author":"Boland Philip J","year":"1989","unstructured":"Philip J Boland. 1989. Majority systems and the Condorcet jury theorem. Journal of the Royal Statistical Society Series D: The Statistician, Vol. 38, 3 (1989), 181--189.","journal-title":"Journal of the Royal Statistical Society Series D: The Statistician"},{"key":"e_1_3_2_2_6_1","volume-title":"Bowen Baker, Leo Gao, Leopold Aschenbrenner, Yining Chen, Adrien Ecoffet, Manas Joglekar, Jan Leike, Ilya Sutskever, and Jeffrey Wu.","author":"Burns Collin","year":"2024","unstructured":"Collin Burns, Pavel Izmailov, Jan Hendrik Kirchner, Bowen Baker, Leo Gao, Leopold Aschenbrenner, Yining Chen, Adrien Ecoffet, Manas Joglekar, Jan Leike, Ilya Sutskever, and Jeffrey Wu. 2024. Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision. In ICML."},{"key":"e_1_3_2_2_7_1","unstructured":"Lili Chen Kevin Lu Aravind Rajeswaran Kimin Lee Aditya Grover Michael Laskin Pieter Abbeel Aravind Srinivas and Igor Mordatch. 2021. Decision transformer: reinforcement learning via sequence modeling. In NeurIPS."},{"key":"e_1_3_2_2_8_1","doi-asserted-by":"crossref","unstructured":"Ye Chen Pavel Berkhin Bo Anderson and Nikhil R Devanur. 2011. Real-time bidding algorithms for performance-based display ad allocation. In KDD.","DOI":"10.1145\/2020408.2020604"},{"key":"e_1_3_2_2_9_1","doi-asserted-by":"crossref","unstructured":"Yuan Deng Jieming Mao Vahab S. Mirrokni and Song Zuo. 2021. Towards Efficient Auctions in an Auto-bidding World. In WWW.","DOI":"10.1145\/3442381.3450052"},{"key":"e_1_3_2_2_10_1","volume-title":"Tutorial on variational autoencoders. arXiv preprint arXiv:1606.05908","author":"Doersch Carl","year":"2016","unstructured":"Carl Doersch. 2016. Tutorial on variational autoencoders. arXiv preprint arXiv:1606.05908 (2016)."},{"key":"e_1_3_2_2_11_1","unstructured":"Kawin Ethayarajh Winnie Xu Niklas Muennighoff Dan Jurafsky and Douwe Kiela. 2024. Model Alignment as Prospect Theoretic Optimization. In ICML."},{"key":"e_1_3_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1257\/jep.23.3.37"},{"key":"e_1_3_2_2_13_1","unstructured":"Facebook. 2021. Advertising on Facebook. https:\/\/www.facebook.com\/business\/ads."},{"key":"e_1_3_2_2_14_1","unstructured":"Scott Fujimoto David Meger and Doina Precup. 2019. Off-policy deep reinforcement learning without exploration. In ICML."},{"key":"e_1_3_2_2_15_1","volume-title":"Future-Conditioned Recommendations with Multi-Objective Controllable Decision Transformer. arXiv preprint arXiv:2501.07212","author":"Gao Chongming","year":"2025","unstructured":"Chongming Gao, Kexin Huang, Ziang Fei, Jiaju Chen, Jiawei Chen, Jianshan Sun, Shuchang Liu, Qingpeng Cai, and Peng Jiang. 2025. Future-Conditioned Recommendations with Multi-Objective Controllable Decision Transformer. arXiv preprint arXiv:2501.07212 (2025)."},{"key":"e_1_3_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/3422622"},{"key":"e_1_3_2_2_17_1","unstructured":"Google. 2021. Google Ads. https:\/\/ads.google.com\/."},{"key":"e_1_3_2_2_18_1","volume-title":"AIGB: Generative Auto-bidding via Diffusion Modeling. In KDD.","author":"Guo Jiayan","year":"2024","unstructured":"Jiayan Guo, Yusen Huo, Zhilin Zhang, Tianyu Wang, Chuan Yu, Jian Xu, Yan Zhang, and Bo Zheng. 2024. AIGB: Generative Auto-bidding via Diffusion Modeling. In KDD."},{"key":"e_1_3_2_2_19_1","doi-asserted-by":"crossref","unstructured":"Yue He Xiujun Chen Di Wu Junwei Pan Qing Tan Chuan Yu Jian Xu and Xiaoqiang Zhu. 2021. A Unified Solution to Constrained Bidding in Online Display Advertising. In KDD.","DOI":"10.1145\/3447548.3467199"},{"key":"e_1_3_2_2_20_1","volume-title":"NeurIPS","author":"Ho Jonathan","year":"2020","unstructured":"Jonathan Ho, Ajay Jain, and Pieter Abbeel. 2020. Denoising diffusion probabilistic models. In NeurIPS 2020."},{"key":"e_1_3_2_2_21_1","volume-title":"ARGS: Alignment as Reward-Guided Search. In ICLR.","author":"Khanov Maxim","year":"2024","unstructured":"Maxim Khanov, Jirayu Burapacheep, and Yixuan Li. 2024. ARGS: Alignment as Reward-Guided Search. In ICLR."},{"key":"e_1_3_2_2_22_1","doi-asserted-by":"crossref","unstructured":"Levente Kocsis and Csaba Szepesv\u00e1ri. 2006. Bandit Based Monte-Carlo Planning. In ECML.","DOI":"10.1007\/11871842_29"},{"key":"e_1_3_2_2_23_1","unstructured":"Ilya Kostrikov Ashvin Nair and Sergey Levine. 2022. Offline Reinforcement Learning with Implicit Q-Learning. In ICLR."},{"key":"e_1_3_2_2_24_1","unstructured":"Aviral Kumar Aurick Zhou George Tucker and Sergey Levine. 2020. Conservative Q-learning for offline reinforcement learning. In NeurIPS."},{"key":"e_1_3_2_2_25_1","volume-title":"Cascade reward sampling for efficient decoding-time alignment. arXiv preprint arXiv:2406.16306","author":"Li Bolian","year":"2024","unstructured":"Bolian Li, Yifan Wang, Ananth Grama, and Ruqi Zhang. 2024. Cascade reward sampling for efficient decoding-time alignment. arXiv preprint arXiv:2406.16306 (2024)."},{"key":"e_1_3_2_2_26_1","volume-title":"Auto-bidding Equilibrium in ROI-Constrained Online Advertising Markets. CoRR","author":"Li Juncheng","year":"2022","unstructured":"Juncheng Li and Pingzhong Tang. 2022. Auto-bidding Equilibrium in ROI-Constrained Online Advertising Markets. CoRR, Vol. abs\/2210.06107 (2022)."},{"key":"e_1_3_2_2_27_1","unstructured":"Zuxin Liu Zijian Guo Yihang Yao Zhepeng Cen Wenhao Yu Tingnan Zhang and Ding Zhao. 2023. Constrained decision transformer for offline safe reinforcement learning. In ICML."},{"key":"e_1_3_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/3626772.3657829"},{"key":"e_1_3_2_2_29_1","unstructured":"Ilya Loshchilov and Frank Hutter. 2019. Decoupled Weight Decay Regularization. In ICLR."},{"key":"e_1_3_2_2_30_1","volume-title":"SimPO: Simple Preference Optimization with a Reference-Free Reward. CoRR","author":"Meng Yu","year":"2024","unstructured":"Yu Meng, Mengzhou Xia, and Danqi Chen. 2024. SimPO: Simple Preference Optimization with a Reference-Free Reward. CoRR, Vol. abs\/2405.14734 (2024)."},{"key":"e_1_3_2_2_31_1","unstructured":"Zhiyu Mou Yusen Huo Rongquan Bai Mingzhou Xie Chuan Yu Jian Xu and Bo Zheng. 2022. Sustainable online reinforcement learning for auto-bidding. In NeurIPS."},{"key":"e_1_3_2_2_32_1","article-title":"Biases in Large Language Models: Origins, Inventory, and Discussion","volume":"15","author":"Navigli Roberto","year":"2023","unstructured":"Roberto Navigli, Simone Conia, and Bj\u00f6rn Ross. 2023. Biases in Large Language Models: Origins, Inventory, and Discussion. ACM J. Data Inf. Qual., Vol. 15, 2 (2023), 10:1--10:21.","journal-title":"ACM J. Data Inf. Qual."},{"key":"e_1_3_2_2_33_1","unstructured":"OpenAI. 2024. ChatGPT. https:\/\/chatgpt.com\/."},{"key":"e_1_3_2_2_34_1","unstructured":"Weitong Ou Bo Chen Yingxuan Yang Xinyi Dai Weiwen Liu Weinan Zhang Ruiming Tang and Yong Yu. 2023. Deep Landscape Forecasting in Multi-Slot Real-Time Bidding. In KDD."},{"key":"e_1_3_2_2_35_1","unstructured":"Long Ouyang Jeffrey Wu Xu Jiang Diogo Almeida Carroll L. Wainwright Pamela Mishkin Chong Zhang Sandhini Agarwal Katarina Slama Alex Ray John Schulman Jacob Hilton Fraser Kelton Luke Miller Maddie Simens Amanda Askell Peter Welinder Paul F. Christiano Jan Leike and Ryan Lowe. 2022a. Training language models to follow instructions with human feedback. In NeurIPS."},{"key":"e_1_3_2_2_36_1","unstructured":"Long Ouyang Jeffrey Wu Xu Jiang Diogo Almeida Carroll L. Wainwright Pamela Mishkin Chong Zhang Sandhini Agarwal Katarina Slama Alex Ray John Schulman Jacob Hilton Fraser Kelton Luke Miller Maddie Simens Amanda Askell Peter Welinder Paul F. Christiano Jan Leike and Ryan Lowe. 2022b. Training language models to follow instructions with human feedback. In NeurIPS."},{"key":"e_1_3_2_2_37_1","volume-title":"Softmax deep double deterministic policy gradients. Advances in neural information processing systems","author":"Pan Ling","year":"2020","unstructured":"Ling Pan, Qingpeng Cai, and Longbo Huang. 2020. Softmax deep double deterministic policy gradients. Advances in neural information processing systems, Vol. 33 (2020), 11767--11777."},{"key":"e_1_3_2_2_38_1","unstructured":"Ling Pan Nikolay Malkin Dinghuai Zhang and Yoshua Bengio. 2023. Better Training of GFlowNets with Local Credit and Incomplete Trajectories. In ICML."},{"key":"e_1_3_2_2_39_1","doi-asserted-by":"crossref","unstructured":"Ryan Park Rafael Rafailov Stefano Ermon and Chelsea Finn. 2024. Disentangling Length from Quality in Direct Preference Optimization. In ACL.","DOI":"10.18653\/v1\/2024.findings-acl.297"},{"key":"e_1_3_2_2_40_1","unstructured":"Rafael Rafailov Archit Sharma Eric Mitchell Christopher D. Manning Stefano Ermon and Chelsea Finn. 2023. Direct Preference Optimization: Your Language Model is Secretly a Reward Model. In NeurIPS."},{"key":"e_1_3_2_2_41_1","doi-asserted-by":"crossref","unstructured":"Robin Rombach Andreas Blattmann Dominik Lorenz Patrick Esser and Bj\u00f6rn Ommer. 2022. High-Resolution Image Synthesis with Latent Diffusion Models. In CVPR.","DOI":"10.1109\/CVPR52688.2022.01042"},{"key":"e_1_3_2_2_42_1","volume-title":"Proximal Policy Optimization Algorithms. CoRR","author":"Schulman John","year":"2017","unstructured":"John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. 2017. Proximal Policy Optimization Algorithms. CoRR, Vol. abs\/1707.06347 (2017)."},{"key":"e_1_3_2_2_43_1","volume-title":"AuctionNet: A Novel Benchmark for Decision-Making in Large-Scale Games. In The Thirty-eight Conference on Neural Information Processing Systems Datasets and Benchmarks Track.","author":"Su Kefan","year":"2024","unstructured":"Kefan Su, Yusen Huo, Zhilin Zhang, Shuai Dou, Chuan Yu, Jian Xu, Zongqing Lu, and Bo Zheng. 2024. AuctionNet: A Novel Benchmark for Decision-Making in Large-Scale Games. In The Thirty-eight Conference on Neural Information Processing Systems Datasets and Benchmarks Track."},{"key":"e_1_3_2_2_44_1","volume-title":"Reinforcement learning: An introduction. A Bradford Book","author":"Sutton Richard S","year":"2018","unstructured":"Richard S Sutton. 2018. Reinforcement learning: An introduction. A Bradford Book (2018)."},{"key":"e_1_3_2_2_45_1","doi-asserted-by":"crossref","unstructured":"Jun Wang and Shuai Yuan. 2015. Real-Time Bidding: A New Frontier of Computational Advertising Research. In WSDM.","DOI":"10.1145\/2684822.2697041"},{"key":"e_1_3_2_2_46_1","doi-asserted-by":"crossref","unstructured":"Chao Wen Miao Xu Zhilin Zhang Zhenzhe Zheng Yuhui Wang Xiangyu Liu Yu Rong Dong Xie Xiaoyang Tan Chuan Yu et al. 2022. A cooperative-competitive multi-agent framework for auto-bidding in online advertising. In WSDM.","DOI":"10.1145\/3488560.3498373"},{"key":"e_1_3_2_2_47_1","volume-title":"Kenton Murray, and Young Jin Kim.","author":"Xu Haoran","year":"2024","unstructured":"Haoran Xu, Amr Sharaf, Yunmo Chen, Weiting Tan, Lingfeng Shen, Benjamin Van Durme, Kenton Murray, and Young Jin Kim. 2024. Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation. In ICML."},{"key":"e_1_3_2_2_48_1","unstructured":"Jian Xu Zhilin Zhang Zongqing Lu Xiaotie Deng Michael P Wellman Chuan Yu Shuai Dou Yusen Huo Zhiwei Xu Zhijian Duan et al. [n. d.]. Auto-Bidding in Large-Scale Auctions: Learning Decision-Making in Uncertain and Competitive Games. In NeurIPS 2024 Competition Track."},{"key":"e_1_3_2_2_49_1","unstructured":"Hao Yu Michael J Neely and Xiaohan Wei. 2017. Online convex optimization with stochastic constraints. In NeurIPS."}],"event":{"name":"WWW '25: The ACM Web Conference 2025","sponsor":["SIGWEB ACM Special Interest Group on Hypertext, Hypermedia, and Web"],"location":"Sydney NSW Australia","acronym":"WWW '25"},"container-title":["Companion Proceedings of the ACM on Web Conference 2025"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3701716.3715226","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3701716.3715226","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,8]],"date-time":"2025-10-08T03:06:01Z","timestamp":1759892761000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3701716.3715226"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,5,8]]},"references-count":49,"alternative-id":["10.1145\/3701716.3715226","10.1145\/3701716"],"URL":"https:\/\/doi.org\/10.1145\/3701716.3715226","relation":{},"subject":[],"published":{"date-parts":[[2025,5,8]]},"assertion":[{"value":"2025-05-23","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}