{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,10]],"date-time":"2026-03-10T15:25:31Z","timestamp":1773156331259,"version":"3.50.1"},"reference-count":35,"publisher":"Institute for Operations Research and the Management Sciences (INFORMS)","issue":"12","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Management Science"],"published-print":{"date-parts":[[2025,12]]},"abstract":"<jats:p>This paper addresses an important class of restless multiarmed bandit (RMAB) problems that finds broad application in operations research, stochastic optimization, and reinforcement learning. There are N independent Markov processes that may be operated, observed and offer rewards. Due to the resource constraint, we can only choose a subset of [Formula: see text] processes to operate and accrue reward determined by the states of selected processes. We formulate the problem as a partially observable RMAB with an infinite state space and design an algorithm that achieves a near-optimal performance with low complexity. Our algorithm is based on a generalization of Whittle\u2019s original idea of indexability. Referred to as the relaxed indexability, the extended definition leads to the efficient online verifications and computations of the approximate Whittle index under the proposed algorithmic framework.<\/jats:p>\n                  <jats:p>This paper was accepted by Chung Piaw Teo, optimization.<\/jats:p>\n                  <jats:p>Supplemental Material: The online appendix and data files are available at https:\/\/doi.org\/10.1287\/mnsc.2022.02831 .<\/jats:p>","DOI":"10.1287\/mnsc.2022.02831","type":"journal-article","created":{"date-parts":[[2025,4,16]],"date-time":"2025-04-16T12:57:01Z","timestamp":1744808221000},"page":"10106-10121","source":"Crossref","is-referenced-by-count":1,"title":["Relaxed Indexability and Index Policy for Partially Observable Restless Bandits"],"prefix":"10.1287","volume":"71","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4783-346X","authenticated-orcid":false,"given":"Keqin","family":"Liu","sequence":"first","affiliation":[{"name":"School of Mathematics and Physics, Xi\u2019an Jiaotong-Liverpool University, Suzhou 215123, China; and Jiangsu National Center for Applied Mathematics, Nanjing 210093, China"}]}],"member":"109","reference":[{"key":"B1","volume-title":"Dynamic Programming: Deterministic and Stochastic Models","author":"Bertsekas DP","year":"1987"},{"key":"B2","doi-asserted-by":"publisher","DOI":"10.1287\/moor.21.2.257"},{"key":"B3","doi-asserted-by":"publisher","DOI":"10.1287\/opre.48.1.80.12444"},{"key":"B4","doi-asserted-by":"publisher","DOI":"10.1287\/mnsc.2019.3342"},{"key":"B5","doi-asserted-by":"crossref","unstructured":"Elmaghraby HM, Liu K, Ding Z (2018) Femtocell scheduling as a restless multiarmed bandit problem using partial channel state observation.\n                      Proc. IEEE Internat. Conf. Comm. (ICC)\n                      (IEEE, Piscataway, NJ), 1\u20136.","DOI":"10.1109\/ICC.2018.8422619"},{"key":"B6","doi-asserted-by":"publisher","DOI":"10.1007\/s10479-013-1523-0"},{"key":"B7","unstructured":"Gast N, Gaujal B, Yan C (2021) (Close to) Optimal policies for finite horizon restless bandits. Working paper. https:\/\/hal.inria.fr\/hal-03262307\/file\/LP_paper.pdf."},{"key":"B9","doi-asserted-by":"publisher","DOI":"10.1111\/j.2517-6161.1979.tb01068.x"},{"key":"B10","doi-asserted-by":"publisher","DOI":"10.1002\/9780470980033"},{"key":"B12","doi-asserted-by":"publisher","DOI":"10.1287\/opre.1080.0632"},{"key":"B13","doi-asserted-by":"publisher","DOI":"10.1007\/s11134-011-9217-2"},{"key":"B14","unstructured":"Hu W, Frazier PI (2017) An asymptotically optimal index policy for finite-horizon restless bandits. Preprint, submitted July 1, https:\/\/arxiv.org\/abs\/1707.00205."},{"key":"B15","doi-asserted-by":"crossref","unstructured":"Lapiccirella FE, Liu K, Ding Z (2011) Multi-channel opportunistic access based on primary ARQ messages overhearing.\n                      Proc. IEEE Internat. Conf. Comm. (ICC)\n                      (IEEE, Piscataway, NJ), 1\u20135.","DOI":"10.1109\/icc.2011.5963326"},{"key":"B16","doi-asserted-by":"crossref","unstructured":"Le Ny J, Dahleh M, Feron E (2008) Multi-UAV dynamic routing with partial observations using restless bandit allocation indices.\n                      Proc. Amer. Control Conf.\n                      (IEEE, Piscataway, NJ), 4220\u20134225.","DOI":"10.1109\/ACC.2008.4587156"},{"issue":"4","key":"B17","first-page":"372","volume":"42","author":"Liu K","year":"2020","journal-title":"Numer. Math."},{"key":"B18","doi-asserted-by":"publisher","DOI":"10.1007\/s00186-024-00868-x"},{"key":"B19","doi-asserted-by":"crossref","unstructured":"Liu K, Weber RR, Zhao Q (2011) Indexability and Whittle index for restless bandit problems involving reset processes.\n                      Proc. 50th IEEE Conf. Decision Control\n                      (IEEE, Piscataway, NJ), 7690\u20137696.","DOI":"10.1109\/CDC.2011.6160533"},{"key":"B20","doi-asserted-by":"crossref","unstructured":"Liu K, Zhao Q (2008) A restless bandit formulation of opportunistic access: Indexability and index Policy.\n                      Proc. IEEE Workshop Networking Technol. Software Defined Radio (SDR) Networks\n                      (IEEE, Piscataway, NJ), 1\u20135.","DOI":"10.1109\/SAHCNW.2008.12"},{"key":"B21","doi-asserted-by":"publisher","DOI":"10.1109\/TIT.2010.2068950"},{"key":"B22","doi-asserted-by":"crossref","unstructured":"Liu K, Zhao Q (2012) Dynamic intrusion detection in resource-constrained cyber networks.\n                      IEEE Internat. Symposium Inform. Theory Proc.\n                      (IEEE, Piscataway, NJ), 970\u2013974.","DOI":"10.1109\/ISIT.2012.6284708"},{"key":"B23","doi-asserted-by":"publisher","DOI":"10.1109\/TSP.2010.2041600"},{"key":"B25","doi-asserted-by":"publisher","DOI":"10.1017\/S0001867800010648"},{"key":"B26","doi-asserted-by":"publisher","DOI":"10.1007\/s11750-007-0025-0"},{"key":"B27","doi-asserted-by":"publisher","DOI":"10.1287\/moor.24.2.293"},{"key":"B28","doi-asserted-by":"publisher","DOI":"10.1287\/opre.26.2.282"},{"key":"B29","doi-asserted-by":"publisher","DOI":"10.2307\/2332286"},{"key":"B30","doi-asserted-by":"publisher","DOI":"10.1214\/15-AAP1137"},{"key":"B31","doi-asserted-by":"publisher","DOI":"10.1109\/TVT.2013.2285713"},{"key":"B32","first-page":"1024","volume":"2","author":"Weber RR","year":"1992","journal-title":"Ann. Probab."},{"key":"B33","doi-asserted-by":"publisher","DOI":"10.2307\/3214547"},{"key":"B34","doi-asserted-by":"publisher","DOI":"10.2307\/1427757"},{"key":"B35","doi-asserted-by":"publisher","DOI":"10.1111\/j.2517-6161.1980.tb01111.x"},{"key":"B36","doi-asserted-by":"publisher","DOI":"10.2307\/3214163"},{"key":"B37","doi-asserted-by":"publisher","DOI":"10.1017\/apr.2019.29"},{"key":"B38","volume-title":"Multi-Armed Bandits: Theory and Applications to Online Learning in Networks","author":"Zhao Q","year":"2019"}],"container-title":["Management Science"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/pubsonline.informs.org\/doi\/pdf\/10.1287\/mnsc.2022.02831","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,12,5]],"date-time":"2025-12-05T09:43:06Z","timestamp":1764927786000},"score":1,"resource":{"primary":{"URL":"https:\/\/pubsonline.informs.org\/doi\/10.1287\/mnsc.2022.02831"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,12]]},"references-count":35,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2025,12]]}},"alternative-id":["10.1287\/mnsc.2022.02831"],"URL":"https:\/\/doi.org\/10.1287\/mnsc.2022.02831","relation":{},"ISSN":["0025-1909","1526-5501"],"issn-type":[{"value":"0025-1909","type":"print"},{"value":"1526-5501","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,12]]}}}