{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,19]],"date-time":"2026-03-19T03:33:27Z","timestamp":1773891207457,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":31,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,8,14]],"date-time":"2021-08-14T00:00:00Z","timestamp":1628899200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Harvard Center for Research on Computation and Society"},{"name":"National Science Foundation Graduate Research Fellowship","award":["DGE1745303"],"award-info":[{"award-number":["DGE1745303"]}]},{"name":"Army Research Office Multidisciplinary University Research Initiative","award":["W911NF1810208"],"award-info":[{"award-number":["W911NF1810208"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,8,14]]},"DOI":"10.1145\/3447548.3467370","type":"proceedings-article","created":{"date-parts":[[2021,8,12]],"date-time":"2021-08-12T06:12:09Z","timestamp":1628748729000},"page":"871-881","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":12,"title":["Q-Learning Lagrange Policies for Multi-Action Restless Bandits"],"prefix":"10.1145","author":[{"given":"Jackson A.","family":"Killian","sequence":"first","affiliation":[{"name":"Harvard University, Cambridge, MA, USA"}]},{"given":"Arpita","family":"Biswas","sequence":"additional","affiliation":[{"name":"Harvard University, Cambridge, MA, USA"}]},{"given":"Sanket","family":"Shah","sequence":"additional","affiliation":[{"name":"Harvard University, Cambridge, MA, USA"}]},{"given":"Milind","family":"Tambe","sequence":"additional","affiliation":[{"name":"Harvard University, Cambridge, MA, 
USA"}]}],"member":"320","published-online":{"date-parts":[[2021,8,14]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1137\/S0363012999361974"},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1287\/opre.1070.0445"},{"key":"e_1_3_2_1_3_1","volume-title":"Whittle index based q-learning for restless bandits with average reward. arXiv preprint arXiv:2004.14427","author":"Avrachenkov Konstantin","year":"2020","unstructured":"Konstantin Avrachenkov and Vivek S Borkar. 2020. Whittle index based q-learning for restless bandits with average reward. arXiv preprint arXiv:2004.14427 (2020)."},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/3209811.3209865"},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2021\/556"},{"key":"e_1_3_2_1_6_1","volume-title":"Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems. 1467--1468","author":"Biswas Arpita","year":"2021","unstructured":"Arpita Biswas, Gaurav Aggarwal, Pradeep Varakantham, and Milind Tambe. 2021 b. Learning Index Policies for Restless Bandits with Application to Maternal Healthcare. In Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems. 1467--1468."},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.2196\/formative.9707"},{"key":"e_1_3_2_1_8_1","volume-title":"Towards Q-learning the Whittle Index for Restless Bandits. In 2019 Australian & New Zealand Control Conference (ANZCC). 
IEEE, 249--254","author":"Fu Jing","year":"2019","unstructured":"Jing Fu, Yoni Nazarathy, Sarat Moka, and Peter G Taylor. 2019. Towards Q-learning the Whittle Index for Restless Bandits. In 2019 Australian & New Zealand Control Conference (ANZCC). IEEE, 249--254."},{"key":"e_1_3_2_1_9_1","volume-title":"Learning in restless multi-armed bandits via adaptive arm sequencing rules","author":"Gafni Tomer","year":"2020","unstructured":"Tomer Gafni and Kobi Cohen. 2020. Learning in restless multi-armed bandits via adaptive arm sequencing rules. IEEE Trans. Automat. Control (2020)."},{"key":"e_1_3_2_1_10_1","unstructured":"Robert Gardner. 2017. Convex Functions. https:\/\/faculty.etsu.edu\/gardnerr\/5210\/Beamer-Proofs\/Proofs-6-6-print.pdf. Accessed: 2021-01-15."},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"crossref","unstructured":"Kevin D Glazebrook, David J Hodge, and Chris Kirkbride. 2011. General notions of indexability for queueing control and asset management. Ann. Appl. Probab. (2011) 876--907.","DOI":"10.1214\/10-AAP705"},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1239\/aap\/1158684996"},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.cor.2011.11.017"},{"key":"e_1_3_2_1_14_1","unstructured":"LLC Gurobi Optimization. 2021. Gurobi Optimizer Reference Manual. 
http:\/\/www.gurobi.com"},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1239\/aap\/1444308876"},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/CISS.2012.6310816"},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.5555\/3463952.3464038"},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/3292500.3330777"},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.automatica.2016.12.014"},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1287\/msom.2017.0697"},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIT.2010.2068950"},{"key":"e_1_3_2_1_23_1","unstructured":"Aditya Mate, Jackson A Killian, Haifeng Xu, Andrew Perrault, and Milind Tambe. 2020. Collapsing Bandits and Their Application to Public Health Interventions. In Neural Information Processing Systems NeurIPS."},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/SCT.1994.315792"},{"key":"e_1_3_2_1_25_1","volume-title":"Markov Decision Processes: Discrete Stochastic Dynamic Programming","author":"Puterman Martin L","unstructured":"Martin L Puterman. 2014. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons."},{"key":"e_1_3_2_1_26_1","volume-title":"Proceedings of the 2016 International Conference on Autonomous Agents & Multiagent Systems. 123--131","author":"Qian Yundi","year":"2016","unstructured":"Yundi Qian, Chao Zhang, Bhaskar Krishnamachari, and Milind Tambe. 2016. 
Restless poachers: Handling exploration-exploitation tradeoffs in security domains. In Proceedings of the 2016 International Conference on Autonomous Agents & Multiagent Systems. 123--131."},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.cor.2020.104927"},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/COMSNETS48256.2020.9027444"},{"key":"e_1_3_2_1_29_1","volume-title":"Machine learning","author":"Watkins Christopher JCH","year":"1992","unstructured":"Christopher JCH Watkins and Peter Dayan. 1992. Q-learning. Machine learning, Vol. 8, 3--4 (1992), 279--292."},{"key":"e_1_3_2_1_30_1","unstructured":"Christopher John Cornish Hellaby Watkins. 1989. Learning from delayed rewards. 
(1989)."},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.2307\/3214547"},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.2307\/3214163"}],"event":{"name":"KDD '21: The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining","location":"Virtual Event Singapore","acronym":"KDD '21","sponsor":["SIGMOD ACM Special Interest Group on Management of Data","SIGKDD ACM Special Interest Group on Knowledge Discovery in Data"]},"container-title":["Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3447548.3467370","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3447548.3467370","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:18:23Z","timestamp":1750191503000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3447548.3467370"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,8,14]]},"references-count":31,"alternative-id":["10.1145\/3447548.3467370","10.1145\/3447548"],"URL":"https:\/\/doi.org\/10.1145\/3447548.3467370","relation":{},"subject":[],"published":{"date-parts":[[2021,8,14]]},"assertion":[{"value":"2021-08-14","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}