{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2021,5,12]],"date-time":"2021-05-12T02:39:16Z","timestamp":1620787156125},"reference-count":55,"publisher":"Association for Computing Machinery (ACM)","issue":"1","funder":[{"DOI":"10.13039\/100000144","name":"Division of Computer and Network Systems","doi-asserted-by":"publisher","award":["CNS-0540347CCF-0745761"]},{"DOI":"10.13039\/100000143","name":"Division of Computing and Communication Foundations","doi-asserted-by":"publisher","award":["CNS-0540347CCF-0745761"]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["J. ACM"],"published-print":{"date-parts":[[2010,12]]},"abstract":"\n The\n restless bandit<\/jats:italic>\n problem is one of the most well-studied generalizations of the celebrated stochastic multi-armed bandit (MAB) problem in decision theory. In its ultimate generality, the restless bandit problem is known to be PSPACE-Hard to approximate to any nontrivial factor, and little progress has been made on this problem despite its significance in modeling activity allocation under uncertainty.\n <\/jats:p>\n \n In this article, we consider the Feedback MAB problem, where the reward obtained by playing each of\n n<\/jats:italic>\n independent arms varies according to an underlying on\/off Markov process whose exact state is only revealed when the arm is played. The goal is to design a policy for playing the arms in order to maximize the infinite horizon time average expected reward. This problem is also an instance of a Partially Observable Markov Decision Process (POMDP), and is widely studied in wireless scheduling and unmanned aerial vehicle (UAV) routing. Unlike the stochastic MAB problem, the Feedback MAB problem does not admit to greedy index-based optimal policies.\n <\/jats:p>\n \n We develop a novel duality-based algorithmic technique that yields a surprisingly simple and intuitive (2+\u03f5)-approximate greedy policy to this problem. We show that both in terms of approximation factor and computational efficiency, our policy is closely related to the\n Whittle index<\/jats:italic>\n , which is widely used for its simplicity and efficiency of computation. Subsequently we define a multi-state generalization, that we term Monotone bandits, which remains subclass of the restless bandit problem. We show that our policy remains a 2-approximation in this setting, and further, our technique is robust enough to incorporate various side-constraints such as blocking plays, switching costs, and even models where determining the state of an arm is a separate operation from playing it.\n <\/jats:p>\n Our technique is also of independent interest for other restless bandit problems, and we provide an example in nonpreemptive machine replenishment. Interestingly, in this case, our policy provides a constant factor guarantee, whereas the Whittle index is provably polynomially worse.<\/jats:p>\n By presenting the first O(1) approximations for nontrivial instances of restless bandits as well as of POMDPs, our work initiates the study of approximation algorithms in both these contexts.<\/jats:p>","DOI":"10.1145\/1870103.1870106","type":"journal-article","created":{"date-parts":[[2010,12,22]],"date-time":"2010-12-22T14:41:31Z","timestamp":1293028891000},"page":"1-50","source":"Crossref","is-referenced-by-count":42,"title":["Approximation algorithms for restless bandit problems"],"prefix":"10.1145","volume":"58","author":[{"given":"Sudipto","family":"Guha","sequence":"first","affiliation":[{"name":"University of Pennsylvania, Philadelphia, PA"}]},{"given":"Kamesh","family":"Munagala","sequence":"additional","affiliation":[{"name":"Duke University, Durham, NC"}]},{"given":"Peng","family":"Shi","sequence":"additional","affiliation":[{"name":"Duke University, Durham, NC"}]}],"member":"320","reference":[{"key":"e_1_2_1_1_1","volume-title":"Proceedings of the Annual Conference on Learning Theory (COLT). 263--274","author":"Abernethy J."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1287\/opre.1070.0445"},{"key":"e_1_2_1_3_1","unstructured":"Ahmad S. H. A. Liu M. Javidi T. Zhao Q. and Krishnamachari B. 2008. Optimality of myopic sensing in multi-channel opportunistic access. CoRR~abs\/0811.0637. Ahmad S. H. A. Liu M. Javidi T. Zhao Q. and Krishnamachari B. 2008. Optimality of myopic sensing in multi-channel opportunistic access. CoRR~abs\/0811.0637."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1007\/s001860200257"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.2307\/1905525"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/9.486316"},{"key":"e_1_2_1_7_1","volume-title":"Proceedings of the Annual Conference on Learning Theorys (COLT).","author":"Audibert J.-Y."},{"key":"e_1_2_1_8_1","first-page":"397","article-title":"Using confidence bounds for exploitation-exploration trade-offs","volume":"3","author":"Auer P.","year":"2002","journal-title":"J. Mach. Learn. Res."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1013689704352"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1137\/S0097539701398375"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.2307\/2951664"},{"key":"e_1_2_1_12_1","volume-title":"Procedings of the ACM-SIAM Symposium on Discrete Algorithms. 11--20","author":"Bar-Noy A."},{"key":"e_1_2_1_13_1","unstructured":"Bertsekas D. 2001. Dynamic Programming and Optimal Control 2nd Ed. Athena Scientific. Bertsekas D. 2001. Dynamic Programming and Optimal Control 2nd Ed. Athena Scientific."},{"key":"e_1_2_1_14_1","first-page":"1384","article-title":"Performance of multiclass Markovian queueing networks via piecewise linear Lyapunov functions","volume":"11","author":"Bertsimas D.","year":"2002","journal-title":"Ann. of Appl. Prob."},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1287\/moor.21.2.257"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1287\/opre.48.1.80.12444"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0165-1889(01)00028-8"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/258128.258179"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/1183907.1183911"},{"key":"e_1_2_1_20_1","doi-asserted-by":"crossref","unstructured":"Faigle U. Kern W. and Still G. 2002. Algorithmic Principles of Mathematical Programming. Kluwer Dordrecht. Faigle U. Kern W. and Still G. 2002. Algorithmic Principles of Mathematical Programming. Kluwer Dordrecht.","DOI":"10.1007\/978-94-015-9896-5"},{"key":"e_1_2_1_21_1","volume-title":"Proceedings of the 16th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA'05)","author":"Flaxman A. D.","year":"2005"},{"key":"e_1_2_1_22_1","volume-title":"Progress in Statistics (European Meeting of Statisticians).","author":"Gittins J. C."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1002\/nav.10036"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ejor.2004.01.036"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1239\/aap\/1158684996"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/1142351.1142380"},{"key":"e_1_2_1_27_1","doi-asserted-by":"crossref","unstructured":"Goseva-Popstojanova K. and Trivedi K. S. 2000. Stochastic modeling formalisms for dependability performance and performability. In Performance Evaluation: Origins and Directions. 403--422. Goseva-Popstojanova K. and Trivedi K. S. 2000. Stochastic modeling formalisms for dependability performance and performability. In Performance Evaluation: Origins and Directions. 403--422.","DOI":"10.1007\/3-540-46506-5_17"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/1250790.1250807"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/FOCS.2007.12"},{"key":"e_1_2_1_30_1","volume-title":"Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA'07)","author":"Guha S."},{"key":"e_1_2_1_31_1","unstructured":"Guha S. Munagala K. and Sarkar S. 2008. Information acquisition and exploitation in multichannel wireless networks. CoRR abs\/0804.1724. Guha S. Munagala K. and Sarkar S. 2008. Information acquisition and exploitation in multichannel wireless networks. CoRR abs\/0804.1724."},{"key":"e_1_2_1_32_1","volume-title":"Proceedings of the 20th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA'09)","author":"Guha S."},{"key":"e_1_2_1_33_1","unstructured":"Hawkins J. T. 2003. A Lagrangian decomposition approach to weakly coupled dynamic optimization problems and its applications. Ph.D. dissertation Operations Research Center Massachusetts Institute of Technology. Hawkins J. T. 2003. A Lagrangian decomposition approach to weakly coupled dynamic optimization problems and its applications. Ph.D. dissertation Operations Research Center Massachusetts Institute of Technology."},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0004-3702(98)00023-X"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1007\/11503415_41"},{"key":"e_1_2_1_36_1","volume-title":"Proceedings of INFOCOM. 53--61","author":"Kodialam M. S."},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1016\/0196-8858(85)90002-8"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1006\/inco.1994.1009"},{"key":"e_1_2_1_39_1","doi-asserted-by":"crossref","unstructured":"Liu K. and Zhao Q. 2008. A restless bandit formulation of multi-channel opportunistic access: Indexablity and index policy. Computing Research Repository arXiv:0810.4658. Liu K. and Zhao Q. 2008. A restless bandit formulation of multi-channel opportunistic access: Indexablity and index policy. Computing Research Repository arXiv:0810.4658.","DOI":"10.1109\/SAHCNW.2008.12"},{"key":"e_1_2_1_40_1","volume-title":"Proceedings of the 20th International Colloguium on Automata, Languages and Programming (ICALP). 40--51","author":"Lund C."},{"key":"e_1_2_1_41_1","volume-title":"Proceedings of the 13th International Conference on Integer Programming and Combinatorial Optimization (IPCO'08)","author":"Munagala K."},{"key":"e_1_2_1_42_1","volume-title":"Proceedings of the American Control Conference.","author":"Ny J. L."},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1287\/moor.24.2.293"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1090\/S0002-9904-1952-09620-8"},{"key":"e_1_2_1_45_1","unstructured":"Slivkins A. and Upfal E. 2008. Adapting to a changing environment: The Brownian restless bandits. In (COLT). 343--354. Slivkins A. and Upfal E. 2008. Adapting to a changing environment: The Brownian restless bandits. In (COLT). 343--354."},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1287\/opre.21.5.1071"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1287\/opre.26.2.282"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1214\/aoap\/1177005207"},{"key":"e_1_2_1_49_1","unstructured":"Wald A. 1947. Sequential Analysis. Wiley New York. Wald A. 1947. Sequential Analysis. Wiley New York."},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.2307\/3214547"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1017\/S0269964800000826"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1214\/aop\/1176994469"},{"key":"e_1_2_1_53_1","doi-asserted-by":"crossref","unstructured":"Whittle P. 1988. Restless bandits: Activity allocation in a changing world. Appl. Prob. 25 A 287--298. Whittle P. 1988. Restless bandits: Activity allocation in a changing world. Appl. Prob. 25 A 287--298.","DOI":"10.1017\/S0021900200040420"},{"key":"e_1_2_1_54_1","unstructured":"Zhao Q. Krishnamachari B. and Liu K. 2007. On myopic sensing for multi-channel opportunistic access. CoRR abs\/0712.0035. Zhao Q. Krishnamachari B. and Liu K. 2007. On myopic sensing for multi-channel opportunistic access. CoRR abs\/0712.0035."},{"key":"e_1_2_1_55_1","volume-title":"Proceedings of the International Conference on Machine Learning (ICML). 928--936","author":"Zinkevich M.","year":"2003"}],"container-title":["Journal of the ACM"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1870103.1870106","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,2,21]],"date-time":"2021-02-21T17:18:25Z","timestamp":1613927905000},"score":1,"subtitle":[],"short-title":[],"issued":{"date-parts":[[2010,12]]},"references-count":55,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2010,12]]}},"alternative-id":["10.1145\/1870103.1870106"],"URL":"http:\/\/dx.doi.org\/10.1145\/1870103.1870106","relation":{},"ISSN":["0004-5411","1557-735X"],"issn-type":[{"value":"0004-5411","type":"print"},{"value":"1557-735X","type":"electronic"}],"subject":["Artificial Intelligence","Hardware and Architecture","Information Systems","Control and Systems Engineering","Software"],"published":{"date-parts":[[2010,12]]}}}