{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,27]],"date-time":"2026-03-27T08:25:29Z","timestamp":1774599929495,"version":"3.50.1"},"reference-count":28,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2024,8,13]],"date-time":"2024-08-13T00:00:00Z","timestamp":1723507200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Department of Education of the Basque Government through the Consolidated Research Group MATHMODE","award":["IT1456-22"],"award-info":[{"award-number":["IT1456-22"]}]},{"DOI":"10.13039\/501100001665","name":"French \u201cAgence Nationale de la Recherche (ANR)\u201d","doi-asserted-by":"crossref","award":["ANR-22-CE25-0013-02 (ANR EPLER)"],"award-info":[{"award-number":["ANR-22-CE25-0013-02 (ANR EPLER)"]}],"id":[{"id":"10.13039\/501100001665","id-type":"DOI","asserted-by":"crossref"}]},{"name":"S. S. Bhatnagar Fellowship from Council of Scientific and Industrial Research, Government of India and Google Research"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Model. Perform. Eval. Comput. Syst."],"published-print":{"date-parts":[[2024,9,30]]},"abstract":"<jats:p>\n            The Whittle index policy is a heuristic that has shown remarkably good performance (with guaranteed asymptotic optimality) when applied to the class of problems known as Restless Multi-Armed Bandit Problems (RMABPs). In this article, we present QWI and QWINN, two reinforcement learning algorithms, respectively tabular and deep, to learn the Whittle index for the total discounted criterion. The key feature is the use of two time-scales, a faster one to update the state-action\n            <jats:italic>Q<\/jats:italic>\n            -values, and a relatively slower one to update the Whittle indices. In our main theoretical result, we show that QWI, which is a tabular implementation, converges to the real Whittle indices. We then present QWINN, an adaptation of QWI algorithm using neural networks to compute the\n            <jats:italic>Q<\/jats:italic>\n            -values on the faster time-scale, which is able to extrapolate information from one state to another and scales naturally to large state-space environments. For QWINN, we show that all local minima of the Bellman error are locally stable equilibria, which is the first result of its kind for DQN-based schemes. Numerical computations show that QWI and QWINN converge faster than the standard\n            <jats:italic>Q<\/jats:italic>\n            -learning algorithm, neural-network based approximate Q-learning, and other state-of-the-art algorithms.\n          <\/jats:p>","DOI":"10.1145\/3670686","type":"journal-article","created":{"date-parts":[[2024,6,3]],"date-time":"2024-06-03T11:09:55Z","timestamp":1717412995000},"page":"1-21","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":6,"title":["Tabular and Deep Learning for the Whittle Index"],"prefix":"10.1145","volume":"9","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1040-1513","authenticated-orcid":false,"given":"Francisco","family":"Robledo Rela\u00f1o","sequence":"first","affiliation":[{"name":"UPV\/EHU, Bilbao, Spain and UPPA, Pau France"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0756-5402","authenticated-orcid":false,"given":"Vivek","family":"Borkar","sequence":"additional","affiliation":[{"name":"Indian Institute of Technology, Mumbai, India"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1761-2313","authenticated-orcid":false,"given":"Urtzi","family":"Ayesta","sequence":"additional","affiliation":[{"name":"Institut de Recherche en Informatique de Toulouse, Toulouse, France, UPV\/EHU, Donostia Spain and Ikerbasque, Bilbao, Spain"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8124-8272","authenticated-orcid":false,"given":"Konstantin","family":"Avrachenkov","sequence":"additional","affiliation":[{"name":"Inria, Sophia Antipolis, France"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2024,8,13]]},"reference":[{"key":"e_1_3_3_2_2","doi-asserted-by":"publisher","DOI":"10.1137\/S0363012999361974"},{"key":"e_1_3_3_3_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.automatica.2022.110186"},{"key":"e_1_3_3_4_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-93-86279-38-5"},{"key":"e_1_3_3_5_2","doi-asserted-by":"publisher","DOI":"10.1016\/B978-1-55860-377-6.50034-7"},{"key":"e_1_3_3_6_2","doi-asserted-by":"publisher","DOI":"10.1109\/ANZCC47194.2019.8945748"},{"key":"e_1_3_3_7_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-92511-6_10"},{"key":"e_1_3_3_8_2","doi-asserted-by":"publisher","DOI":"10.1002\/9780470980033"},{"key":"e_1_3_3_9_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISIT.2018.8437712"},{"key":"e_1_3_3_10_2","doi-asserted-by":"publisher","DOI":"10.1145\/3447548.3467370"},{"key":"e_1_3_3_11_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.automatica.2016.12.014"},{"key":"e_1_3_3_12_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIT.2010.2068950"},{"key":"e_1_3_3_13_2","doi-asserted-by":"publisher","DOI":"10.1038\/nature14236"},{"key":"e_1_3_3_14_2","first-page":"828","volume-title":"Proceedings of the Advances in Neural Information Processing Systems","author":"Nakhleh Khaled","year":"2021","unstructured":"Khaled Nakhleh, Santosh Ganji, Ping-Chun Hsieh, I.-Hong Hou, and Srinivas Shakkottai. 2021. NeurWIN: Neural whittle index network for restless bandits via deep RL. In Proceedings of the Advances in Neural Information Processing Systems. Curran Associates, Inc., 828\u2013839. Retrieved from https:\/\/proceedings.neurips.cc\/paper\/2021\/hash\/0768281a05da9f27df178b5c39a51263-Abstract.html"},{"key":"e_1_3_3_15_2","doi-asserted-by":"publisher","DOI":"10.1287\/moor.2019.0998"},{"key":"e_1_3_3_16_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-34106-9_19"},{"key":"e_1_3_3_17_2","first-page":"235","volume-title":"Proceedings of the Learning for Dynamics and Control Conference","author":"Pagare Tejas","year":"2023","unstructured":"Tejas Pagare, Vivek Borkar, and Konstantin Avrachenkov. 2023. Full gradient deep reinforcement learning for average-reward criterion. In Proceedings of the Learning for Dynamics and Control Conference. PMLR, 235\u2013247."},{"key":"e_1_3_3_18_2","doi-asserted-by":"publisher","DOI":"10.1287\/moor.24.2.293"},{"key":"e_1_3_3_19_2","volume-title":"Markov Decision Processes: Discrete Stochastic Dynamic Programming","author":"Puterman Martin L.","year":"2014","unstructured":"Martin L. Puterman. 2014. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons. Google-Books-ID: VvBjBAAAQBAJ."},{"key":"e_1_3_3_20_2","volume-title":"Reinforcement Learning: An Introduction","author":"Sutton Richard S.","year":"2018","unstructured":"Richard S. Sutton and Andrew G. Barto. 2018. Reinforcement Learning: An Introduction. MIT press."},{"key":"e_1_3_3_21_2","volume-title":"Proceedings of the Advances in Neural Information Processing Systems","author":"Hasselt Hado Van","year":"2010","unstructured":"Hado Van Hasselt. 2010. Double Q-learning. In Proceedings of the Advances in Neural Information Processing Systems. Curran Associates, Inc. Retrieved from https:\/\/proceedings.neurips.cc\/paper\/2010\/hash\/091d584fced301b442654dd8c23b3fc9-Abstract.html"},{"key":"e_1_3_3_22_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v30i1.10295"},{"key":"e_1_3_3_23_2","doi-asserted-by":"publisher","DOI":"10.1214\/15-AAP1137"},{"key":"e_1_3_3_24_2","first-page":"11878","volume-title":"Proceedings of the Advances in Neural Information Processing Systems","author":"Wang Siwei","year":"2020","unstructured":"Siwei Wang, Longbo Huang, and John C. S. Lui. 2020. Restless-UCB, an efficient and low-complexity algorithm for online restless bandits. In Proceedings of the Advances in Neural Information Processing Systems. Curran Associates, Inc., 11878\u201311889. Retrieved from https:\/\/proceedings.neurips.cc\/paper\/2020\/hash\/89ae0fe22c47d374bc9350ef99e01685-Abstract.html"},{"key":"e_1_3_3_25_2","doi-asserted-by":"publisher","DOI":"10.1007\/BF00992698"},{"key":"e_1_3_3_26_2","doi-asserted-by":"publisher","DOI":"10.2307\/3214547"},{"key":"e_1_3_3_27_2","doi-asserted-by":"publisher","DOI":"10.2307\/3214163"},{"key":"e_1_3_3_28_2","volume-title":"Proceedings of the 27th Conference on Neural Information Processing Systems","author":"Xiong Guojun","year":"2023","unstructured":"Guojun Xiong and Jian Li. 2023. Finite-time analysis of whittle index based Q-learning for restless multi-armed bandits with neural network function approximation. In Proceedings of the 27th Conference on Neural Information Processing Systems."},{"key":"e_1_3_3_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/TAC.2018.2807924"}],"container-title":["ACM Transactions on Modeling and Performance Evaluation of Computing Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3670686","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3670686","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T00:58:20Z","timestamp":1750294700000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3670686"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,8,13]]},"references-count":28,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2024,9,30]]}},"alternative-id":["10.1145\/3670686"],"URL":"https:\/\/doi.org\/10.1145\/3670686","relation":{},"ISSN":["2376-3639","2376-3647"],"issn-type":[{"value":"2376-3639","type":"print"},{"value":"2376-3647","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,8,13]]},"assertion":[{"value":"2022-07-29","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-05-28","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-08-13","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}