{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,17]],"date-time":"2026-04-17T23:59:34Z","timestamp":1776470374214,"version":"3.51.2"},"publisher-location":"California","reference-count":0,"publisher":"International Joint Conferences on Artificial Intelligence Organization","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2019,8]]},"abstract":"<jats:p>Reinforcement learning methods for recommender systems optimize recommendations for long-term user engagement. However, since users are often presented with slates of multiple items---which may have interacting effects on user choice---methods are required to deal with the combinatorics of the RL action space. We develop SlateQ, a decomposition of value-based temporal-difference and Q-learning that renders RL tractable with slates. Under mild assumptions on user choice behavior, we show that the long-term value (LTV) of a slate can be decomposed into a tractable function of its component item-wise LTVs. We demonstrate our methods in simulation, and validate the scalability and effectiveness of decomposed TD-learning on YouTube.<\/jats:p>","DOI":"10.24963\/ijcai.2019\/360","type":"proceedings-article","created":{"date-parts":[[2019,7,28]],"date-time":"2019-07-28T03:46:05Z","timestamp":1564285565000},"page":"2592-2599","source":"Crossref","is-referenced-by-count":73,"title":["SlateQ: A Tractable Decomposition for Reinforcement Learning with Recommendation Sets"],"prefix":"10.24963","author":[{"given":"Eugene","family":"Ie","sequence":"first","affiliation":[{"name":"Google Research"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Vihan","family":"Jain","sequence":"additional","affiliation":[{"name":"Google Research"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jing","family":"Wang","sequence":"additional","affiliation":[{"name":"Google Research"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sanmit","family":"Narvekar","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Texas at Austin"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ritesh","family":"Agarwal","sequence":"additional","affiliation":[{"name":"Google Research"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Rui","family":"Wu","sequence":"additional","affiliation":[{"name":"Google Research"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Heng-Tze","family":"Cheng","sequence":"additional","affiliation":[{"name":"Google Research"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Tushar","family":"Chandra","sequence":"additional","affiliation":[{"name":"Google Research"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Craig","family":"Boutilier","sequence":"additional","affiliation":[{"name":"Google Research"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"10584","event":{"name":"Twenty-Eighth International Joint Conference on Artificial Intelligence {IJCAI-19}","theme":"Artificial Intelligence","location":"Macao, China","acronym":"IJCAI-2019","number":"28","sponsor":["International Joint Conferences on Artificial Intelligence Organization (IJCAI)"],"start":{"date-parts":[[2019,8,10]]},"end":{"date-parts":[[2019,8,16]]}},"container-title":["Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence"],"original-title":[],"deposited":{"date-parts":[[2019,7,28]],"date-time":"2019-07-28T03:48:47Z","timestamp":1564285727000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.ijcai.org\/proceedings\/2019\/360"}},"subtitle":[],"proceedings-subject":"Artificial Intelligence Research Articles","short-title":[],"issued":{"date-parts":[[2019,8]]},"references-count":0,"URL":"https:\/\/doi.org\/10.24963\/ijcai.2019\/360","relation":{},"subject":[],"published":{"date-parts":[[2019,8]]}}}