{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,9]],"date-time":"2025-07-09T22:44:56Z","timestamp":1752101096871,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":44,"publisher":"ACM","license":[{"start":{"date-parts":[[2023,9,14]],"date-time":"2023-09-14T00:00:00Z","timestamp":1694649600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2023,9,14]]},"DOI":"10.1145\/3604915.3608854","type":"proceedings-article","created":{"date-parts":[[2023,9,14]],"date-time":"2023-09-14T22:40:23Z","timestamp":1694731223000},"page":"955-962","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["Optimizing Long-term Value for Auction-Based Recommender Systems via On-Policy Reinforcement Learning"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4973-8458","authenticated-orcid":false,"given":"Ruiyang","family":"Xu","sequence":"first","affiliation":[{"name":"Applied Reinforcement Learning, Meta AI, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7115-8986","authenticated-orcid":false,"given":"Jalaj","family":"Bhandari","sequence":"additional","affiliation":[{"name":"Applied Reinforcement Learning, Meta AI, USA"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-5748-9571","authenticated-orcid":false,"given":"Dmytro","family":"Korenkevych","sequence":"additional","affiliation":[{"name":"Applied Reinforcement Learning, Meta AI, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5167-285X","authenticated-orcid":false,"given":"Fan","family":"Liu","sequence":"additional","affiliation":[{"name":"Meta, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3153-5177","authenticated-orcid":false,"given":"Yuchen","family":"He","sequence":"additional","affiliation":[{"name":"Meta, USA"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-2335-289X","authenticated-orcid":false,"given":"Alex","family":"Nikulkov","sequence":"additional","affiliation":[{"name":"Applied Reinforcement Learning, Meta AI, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1162-106X","authenticated-orcid":false,"given":"Zheqing","family":"Zhu","sequence":"additional","affiliation":[{"name":"Applied Reinforcement Learning, Meta AI, USA"}]}],"member":"320","published-online":{"date-parts":[[2023,9,14]]},"reference":[{"volume-title":"Dynamic programming and optimal control","author":"Bertsekas Dimitri","key":"e_1_3_2_1_1_1","unstructured":"Dimitri Bertsekas. 2012. Dynamic programming and optimal control: Volume I. Vol.\u00a01. Athena scientific."},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/3289600.3290999"},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/3523227.3546758"},{"key":"e_1_3_2_1_4_1","volume-title":"International Conference on Machine Learning. PMLR, 1052\u20131061","author":"Chen Xinshi","year":"2019","unstructured":"Xinshi Chen, Shuang Li, Hui Li, Shaohua Jiang, Yuan Qi, and Le Song. 2019. Generative adversarial user model for reinforcement learning based recommendation system. In International Conference on Machine Learning. PMLR, 1052\u20131061."},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/2988450.2988454"},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/2959100.2959190"},{"key":"e_1_3_2_1_7_1","volume-title":"Internet advertising and the generalized second-price auction: Selling billions of dollars worth of keywords. American economic review 97, 1","author":"Edelman Benjamin","year":"2007","unstructured":"Benjamin Edelman, Michael Ostrovsky, and Michael Schwarz. 2007. Internet advertising and the generalized second-price auction: Selling billions of dollars worth of keywords. American economic review 97, 1 (2007), 242\u2013259."},{"key":"e_1_3_2_1_8_1","volume-title":"Collaborative filtering recommender systems. Foundations and Trends\u00ae in Human\u2013Computer Interaction 4, 2","author":"Ekstrand D","year":"2011","unstructured":"Michael\u00a0D Ekstrand, John\u00a0T Riedl, Joseph\u00a0A Konstan, 2011. Collaborative filtering recommender systems. Foundations and Trends\u00ae in Human\u2013Computer Interaction 4, 2 (2011), 81\u2013173."},{"key":"e_1_3_2_1_9_1","volume-title":"The economics of the online advertising industry. Review of network economics 7, 3","author":"Evans S","year":"2008","unstructured":"David\u00a0S Evans. 2008. The economics of the online advertising industry. Review of network economics 7, 3 (2008)."},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/2843948"},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2017.7989385"},{"key":"e_1_3_2_1_12_1","volume-title":"Contextual markov decision processes. arXiv preprint arXiv:1502.02259","author":"Hallak Assaf","year":"2015","unstructured":"Assaf Hallak, Dotan Di\u00a0Castro, and Shie Mannor. 2015. Contextual markov decision processes. arXiv preprint arXiv:1502.02259 (2015)."},{"key":"e_1_3_2_1_13_1","volume-title":"Session-based recommendations with recurrent neural networks. arXiv preprint arXiv:1511.06939","author":"Hidasi Bal\u00e1zs","year":"2015","unstructured":"Bal\u00e1zs Hidasi, Alexandros Karatzoglou, Linas Baltrunas, and Domonkos Tikk. 2015. Session-based recommendations with recurrent neural networks. arXiv preprint arXiv:1511.06939 (2015)."},{"key":"e_1_3_2_1_14_1","volume-title":"Universal algorithmic intelligence: A mathematical top\u2192 down approach. Artificial general intelligence","author":"Hutter Marcus","year":"2007","unstructured":"Marcus Hutter. 2007. Universal algorithmic intelligence: A mathematical top\u2192 down approach. Artificial general intelligence (2007), 227\u2013290."},{"key":"e_1_3_2_1_15_1","unstructured":"Eugene Ie Vihan Jain Jing Wang Sanmit Narvekar Ritesh Agarwal Rui Wu Heng-Tze Cheng Tushar Chandra and Craig Boutilier. 2019. SlateQ: A tractable decomposition for reinforcement learning with recommendation sets. (2019)."},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/MC.2009.263"},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/1772690.1772758"},{"key":"e_1_3_2_1_18_1","volume-title":"Content-based recommender systems: State of the art and trends. Recommender systems handbook","author":"Lops Pasquale","year":"2011","unstructured":"Pasquale Lops, Marco De\u00a0Gemmis, and Giovanni Semeraro. 2011. Content-based recommender systems: State of the art and trends. Recommender systems handbook (2011), 73\u2013105."},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.dss.2015.03.008"},{"key":"e_1_3_2_1_20_1","volume-title":"Reinforcement learning, bit by bit. arXiv preprint arXiv:2103.04047","author":"Lu Xiuyuan","year":"2021","unstructured":"Xiuyuan Lu, Benjamin Van\u00a0Roy, Vikranth Dwaracherla, Morteza Ibrahimi, Ian Osband, and Zheng Wen. 2021. Reinforcement learning, bit by bit. arXiv preprint arXiv:2103.04047 (2021)."},{"key":"e_1_3_2_1_21_1","volume-title":"Optimizing Audio Recommendations for the Long-Term: A Reinforcement Learning Perspective. arXiv preprint arXiv:2302.03561","author":"Maystre Lucas","year":"2023","unstructured":"Lucas Maystre, Daniel Russo, and Yu Zhao. 2023. Optimizing Audio Recommendations for the Long-Term: A Reinforcement Learning Perspective. arXiv preprint arXiv:2302.03561 (2023)."},{"key":"e_1_3_2_1_22_1","volume-title":"Human-level control through deep reinforcement learning. nature 518, 7540","author":"Mnih Volodymyr","year":"2015","unstructured":"Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei\u00a0A Rusu, Joel Veness, Marc\u00a0G Bellemare, Alex Graves, Martin Riedmiller, Andreas\u00a0K Fidjeland, Georg Ostrovski, 2015. Human-level control through deep reinforcement learning. nature 518, 7540 (2015), 529\u2013533."},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/3097983.3098108"},{"key":"e_1_3_2_1_24_1","volume-title":"Content-based recommendation systems. The adaptive web: methods and strategies of web personalization","author":"Pazzani J","year":"2007","unstructured":"Michael\u00a0J Pazzani and Daniel Billsus. 2007. Content-based recommendation systems. The adaptive web: methods and strategies of web personalization (2007), 325\u2013341."},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1137\/1.9781611973440.53"},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/3190616"},{"key":"e_1_3_2_1_27_1","volume-title":"Collaborative filtering recommender systems. The adaptive web: methods and strategies of web personalization","author":"Schafer J\u00a0Ben","year":"2007","unstructured":"J\u00a0Ben Schafer, Dan Frankowski, Jon Herlocker, and Shilad Sen. 2007. Collaborative filtering recommender systems. The adaptive web: methods and strategies of web personalization (2007), 291\u2013324."},{"key":"e_1_3_2_1_28_1","volume-title":"International conference on machine learning. PMLR","author":"Schulman John","year":"2015","unstructured":"John Schulman, Sergey Levine, Pieter Abbeel, Michael Jordan, and Philipp Moritz. 2015. Trust region policy optimization. In International conference on machine learning. PMLR, 1889\u20131897."},{"key":"e_1_3_2_1_29_1","volume-title":"Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347","author":"Schulman John","year":"2017","unstructured":"John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. 2017. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347 (2017)."},{"key":"e_1_3_2_1_30_1","volume-title":"An MDP-based recommender system.Journal of Machine Learning Research 6, 9","author":"Shani Guy","year":"2005","unstructured":"Guy Shani, David Heckerman, Ronen\u00a0I Brafman, and Craig Boutilier. 2005. An MDP-based recommender system.Journal of Machine Learning Research 6, 9 (2005)."},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/2556270"},{"key":"e_1_3_2_1_32_1","volume-title":"Two decades of recommender systems at Amazon. com. Ieee internet computing 21, 3","author":"Smith Brent","year":"2017","unstructured":"Brent Smith and Greg Linden. 2017. Two decades of recommender systems at Amazon. com. Ieee internet computing 21, 3 (2017), 12\u201318."},{"volume-title":"Reinforcement learning: An introduction","author":"Sutton S","key":"e_1_3_2_1_33_1","unstructured":"Richard\u00a0S Sutton and Andrew\u00a0G Barto. 2018. Reinforcement learning: An introduction. MIT press."},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2017.8202134"},{"key":"e_1_3_2_1_35_1","volume-title":"Position auctions. international Journal of industrial Organization 25, 6","author":"Varian R","year":"2007","unstructured":"Hal\u00a0R Varian. 2007. Position auctions. international Journal of industrial Organization 25, 6 (2007), 1163\u20131178."},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1257\/aer.104.5.442"},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v31i1.10936"},{"key":"e_1_3_2_1_38_1","volume-title":"Sequential recommender systems: challenges, progress and prospects. arXiv preprint arXiv:2001.04830","author":"Wang Shoujin","year":"2019","unstructured":"Shoujin Wang, Liang Hu, Yan Wang, Longbing Cao, Quan\u00a0Z Sheng, and Mehmet Orgun. 2019. Sequential recommender systems: challenges, progress and prospects. arXiv preprint arXiv:2001.04830 (2019)."},{"key":"e_1_3_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/3397271.3401147"},{"key":"e_1_3_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/2939672.2939878"},{"key":"e_1_3_2_1_41_1","volume-title":"Deep learning based recommender system: A survey and new perspectives. ACM computing surveys (CSUR) 52, 1","author":"Zhang Shuai","year":"2019","unstructured":"Shuai Zhang, Lina Yao, Aixin Sun, and Yi Tay. 2019. Deep learning based recommender system: A survey and new perspectives. ACM computing surveys (CSUR) 52, 1 (2019), 1\u201338."},{"key":"e_1_3_2_1_42_1","volume-title":"Long Xia, Jiliang Tang, and Dawei Yin with Martin Vesely as coordinator. ACM Sigweb NewsletterSpring","author":"Zhao Xiangyu","year":"2019","unstructured":"Xiangyu Zhao, Long Xia, Jiliang Tang, and Dawei Yin. 2019. \" Deep reinforcement learning for search, recommendation, and online advertising: a survey\" by Xiangyu Zhao, Long Xia, Jiliang Tang, and Dawei Yin with Martin Vesely as coordinator. ACM Sigweb NewsletterSpring (2019), 1\u201315."},{"key":"e_1_3_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/3178876.3185994"},{"key":"e_1_3_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/3292500.3330668"}],"event":{"name":"RecSys '23: Seventeenth ACM Conference on Recommender Systems","sponsor":["SIGWEB ACM Special Interest Group on Hypertext, Hypermedia, and Web","SIGAI ACM Special Interest Group on Artificial Intelligence","SIGKDD ACM Special Interest Group on Knowledge Discovery in Data","SIGIR ACM Special Interest Group on Information Retrieval","SIGCHI ACM Special Interest Group on Computer-Human Interaction","SIGecom Special Interest Group on Economics and Computation"],"location":"Singapore Singapore","acronym":"RecSys '23"},"container-title":["Proceedings of the 17th ACM Conference on Recommender Systems"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3604915.3608854","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3604915.3608854","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:46:34Z","timestamp":1750178794000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3604915.3608854"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,9,14]]},"references-count":44,"alternative-id":["10.1145\/3604915.3608854","10.1145\/3604915"],"URL":"https:\/\/doi.org\/10.1145\/3604915.3608854","relation":{},"subject":[],"published":{"date-parts":[[2023,9,14]]},"assertion":[{"value":"2023-09-14","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}