{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,26]],"date-time":"2026-04-26T09:55:52Z","timestamp":1777197352095,"version":"3.51.4"},"reference-count":68,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2022,3,31]],"date-time":"2022-03-31T00:00:00Z","timestamp":1648684800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"National Science Foundation","award":["CNS-1524317 and CNS-1907905"],"award-info":[{"award-number":["CNS-1524317 and CNS-1907905"]}]},{"DOI":"10.13039\/100000006","name":"Office of Naval Research","doi-asserted-by":"crossref","award":["N00014-20-1-2119"],"award-info":[{"award-number":["N00014-20-1-2119"]}],"id":[{"id":"10.13039\/100000006","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Model. Perform. Eval. Comput. Syst."],"published-print":{"date-parts":[[2022,3,31]]},"abstract":"<jats:p>With the rapid advance of information technology, network systems have become increasingly complex and hence the underlying system dynamics are often unknown or difficult to characterize. Finding a good network control policy is of significant importance to achieve desirable network performance (e.g., high throughput or low delay). In this work, we consider using model-based reinforcement learning (RL) to learn the optimal control policy for queueing networks so that the average job delay (or equivalently the average queue backlog) is minimized. Traditional approaches in RL, however, cannot handle the unbounded state spaces of the network control problem. To overcome this difficulty, we propose a new algorithm, called RL for Queueing Networks (RL-QN), which applies model-based RL methods over a finite subset of the state space while applying a known stabilizing policy for the rest of the states. We establish that the average queue backlog under RL-QN with an appropriately constructed subset can be arbitrarily close to the optimal result. We evaluate RL-QN in dynamic server allocation, routing, and switching problems. Simulation results show that RL-QN minimizes the average queue backlog effectively.<\/jats:p>","DOI":"10.1145\/3529375","type":"journal-article","created":{"date-parts":[[2022,4,21]],"date-time":"2022-04-21T12:50:12Z","timestamp":1650545412000},"page":"1-35","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":14,"title":["RL-QN: A Reinforcement Learning Framework for Optimal Control of Queueing Systems"],"prefix":"10.1145","volume":"7","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3950-4965","authenticated-orcid":false,"given":"Bai","family":"Liu","sequence":"first","affiliation":[{"name":"Massachusetts Institute of Technology, Cambridge, MA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2834-6866","authenticated-orcid":false,"given":"Qiaomin","family":"Xie","sequence":"additional","affiliation":[{"name":"University of Wisconsin-Madison, Madison, WI"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8238-8130","authenticated-orcid":false,"given":"Eytan","family":"Modiano","sequence":"additional","affiliation":[{"name":"Massachusetts Institute of Technology, Cambridge, MA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2022,8,17]]},"reference":[{"key":"e_1_3_2_2_2","first-page":"3692","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Abbasi-Yadkori Yasin","year":"2019","unstructured":"Yasin Abbasi-Yadkori, Peter Bartlett, Kush Bhatia, Nevena Lazic, Csaba Szepesvari, and Gell\u00e9rt Weisz. 2019. Politex: Regret bounds for policy iteration using expert prediction. In Proceedings of the International Conference on Machine Learning. PMLR, 3692\u20133702."},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/SPAWC.2008.4641641"},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.1109\/SFCS.1993.366841"},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1016\/B978-1-55860-377-6.50013-X"},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.1016\/0167-6911(85)90037-4"},{"key":"e_1_3_2_7_2","article-title":"REGAL: A regularization based algorithm for reinforcement learning in weakly communicating MDPs","author":"Bartlett Peter L.","year":"2012","unstructured":"Peter L. Bartlett and Ambuj Tewari. 2012. REGAL: A regularization based algorithm for reinforcement learning in weakly communicating MDPs. arXiv:1205.2661. Retrieved from https:\/\/arxiv.org\/abs\/1205.2661.","journal-title":"arXiv:1205.2661"},{"key":"e_1_3_2_8_2","volume-title":"Dynamic Programming and Optimal Control","author":"Bertsekas Dimitri","year":"2017","unstructured":"Dimitri Bertsekas. 2017. Dynamic Programming and Optimal Control. Vol. 1. Athena scientific Belmont, MA."},{"key":"e_1_3_2_9_2","unstructured":"Dimitris Bertsimas David Gamarnik John N. Tsitsiklis et\u00a0al. 1998. Geometric bounds for stationary distributions of infinite markov chains via lyapunov functions. https:\/\/www.researchgate.net\/profile\/Dimitris-Bertsimas\/publication\/37594830_Geometric_Bounds_for_Stationary_Distributions_of_Infinite_Markov_Chains_Via_Lyapunov_Functions\/links\/53f389350cf2155be34f3316\/Geometric-Bounds-for-Stationary-Distributions-of-Infinite-Markov-Chains-Via-Lyapunov-Functions.pdf."},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1214\/aoap\/1015345407"},{"key":"e_1_3_2_11_2","first-page":"43","article-title":"Optimization of multiclass queueing networks: Polyhedral and nonlinear characterizations of achievable performance","author":"Bertsimas Dimitris","year":"1994","unstructured":"Dimitris Bertsimas, Ioannis C. H. Paschalidis, and John N. Tsitsiklis. 1994. Optimization of multiclass queueing networks: Polyhedral and nonlinear characterizations of achievable performance. The Annals of Applied Probability (1994), 43\u201375.","journal-title":"The Annals of Applied Probability"},{"key":"e_1_3_2_12_2","first-page":"671","volume-title":"Proceedings of the Advances in Neural Information Processing Systems","author":"Boyan Justin A.","year":"1994","unstructured":"Justin A. Boyan and Michael L. Littman. 1994. Packet routing in dynamically changing networks: A reinforcement learning approach. In Proceedings of the Advances in Neural Information Processing Systems. Citeseer, 671\u2013678."},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4757-3124-8_5"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.5555\/2980539.2980713"},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/JSAC.2016.2525518"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.2307\/1427064"},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1023\/A:1019182903300"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1364\/OFC.2018.W4F.2"},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIT.2011.2178150"},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1287\/opre.1040.0170"},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2018.2800101"},{"key":"e_1_3_2_22_2","first-page":"1578","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Fruit Ronan","year":"2018","unstructured":"Ronan Fruit, Matteo Pirotta, Alessandro Lazaric, and Ronald Ortner. 2018. Efficient bias-span-constrained exploration-exploitation in reinforcement learning. In Proceedings of the International Conference on Machine Learning. PMLR, 1578\u20131586."},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIT.2006.890695"},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICC40277.2020.9148923"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1613\/jair.1000"},{"key":"e_1_3_2_26_2","volume-title":"Homogeneous Denumerable Markov Processes","author":"Hou Zhenting","year":"2012","unstructured":"Zhenting Hou and Qingfeng Guo. 2012. Homogeneous Denumerable Markov Processes. Springer Science & Business Media."},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1287\/moor.22.4.921"},{"issue":"51","key":"e_1_3_2_28_2","first-page":"1563","article-title":"Near-optimal regret bounds for reinforcement learning","volume":"11","author":"Jaksch Thomas","year":"2010","unstructured":"Thomas Jaksch, Ronald Ortner, and Peter Auer. 2010. Near-optimal regret bounds for reinforcement learning. Journal of Machine Learning Research 11, 51 Apr (2010), 1563\u20131600.","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_3_2_29_2","first-page":"2137","volume-title":"Proceedings of the Conference on Learning Theory","author":"Jin Chi","year":"2020","unstructured":"Chi Jin, Zhuoran Yang, Zhaoran Wang, and Michael I. Jordan. 2020. Provably efficient reinforcement learning with linear function approximation. In Proceedings of the Conference on Learning Theory. PMLR, 2137\u20132143."},{"key":"e_1_3_2_30_2","first-page":"593","volume-title":"Proceedings of the Annual Allerton Conference on Communication Control and Computing","volume":"39","author":"Keslassy Isaac","year":"2001","unstructured":"Isaac Keslassy and Nick McKeown. 2001. Analysis of scheduling algorithms that provide 100% throughput in input-queued switches. In Proceedings of the Annual Allerton Conference on Communication Control and Computing, Vol. 39. Citeseer, 593\u2013602."},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1007\/BF02894735"},{"key":"e_1_3_2_32_2","unstructured":"Daphne Koller and Ron Parr. 2000. Policy iteration for factored MDPs. In Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence ."},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/9.341782"},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/9.310033"},{"key":"e_1_3_2_35_2","doi-asserted-by":"crossref","unstructured":"Qingkai Liang and Eytan Modiano. 2019. Optimal network control in partially-controllable networks. In IEEE INFOCOM 2019-IEEE Conference on Computer Communications . IEEE.","DOI":"10.1109\/INFOCOM.2019.8737528"},{"key":"e_1_3_2_36_2","unstructured":"Michael L. Littman Thomas L. Dean and Leslie Pack Kaelbling. 1985. On the complexity of solving markov decision problems. In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence ."},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.1109\/JSTSP.2020.2967566"},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.1287\/15-SSY193"},{"key":"e_1_3_2_39_2","article-title":"Selecting the state-representation in reinforcement learning","author":"Maillard Odalric-Ambrym","year":"2013","unstructured":"Odalric-Ambrym Maillard, R\u00e9mi Munos, and Daniil Ryabko. 2013. Selecting the state-representation in reinforcement learning. arXiv:1302.2552. Retrieved from https:\/\/arxiv.org\/abs\/1302.2552.","journal-title":"arXiv:1302.2552"},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.1109\/26.780463"},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.1109\/INFCOM.1998.665102"},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.2307\/3213440"},{"key":"e_1_3_2_43_2","volume-title":"Markov Chains and Stochastic Stability","author":"Meyn Sean P.","year":"2012","unstructured":"Sean P. Meyn and Richard L. Tweedie. 2012. Markov Chains and Stochastic Stability. Springer Science & Business Media."},{"key":"e_1_3_2_44_2","doi-asserted-by":"publisher","DOI":"10.1109\/GLOCOM.2017.8254101"},{"key":"e_1_3_2_45_2","doi-asserted-by":"publisher","DOI":"10.1109\/CDC.2010.5717235"},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNET.2007.900405"},{"key":"e_1_3_2_47_2","doi-asserted-by":"publisher","DOI":"10.1109\/INFCOM.2003.1208724"},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.1613\/jair.1.11316"},{"key":"e_1_3_2_49_2","first-page":"3003","volume-title":"Proceedings of the Advances in Neural Information Processing Systems","author":"Osband Ian","year":"2013","unstructured":"Ian Osband, Daniel Russo, and Benjamin Van Roy. 2013. (More) efficient reinforcement learning via posterior sampling. In Proceedings of the Advances in Neural Information Processing Systems. 3003\u20133011."},{"key":"e_1_3_2_50_2","unstructured":"Yi Ouyang Mukul Gagrani Ashutosh Nayyar and Rahul Jain. 2017. Learning unknown markov decision processes: A thompson sampling approach. Advances in Neural Information Processing Systems 30 (2017)."},{"key":"e_1_3_2_51_2","doi-asserted-by":"publisher","DOI":"10.1109\/IJCNN.2002.1007796"},{"key":"e_1_3_2_52_2","unstructured":"Guannan Qu Yiheng Lin Adam Wierman and Na Li. 2020. Scalable multi-agent reinforcement learning for networked systems with average reward. Advances in Neural Information Processing Systems 33 (2020) 2074\u20132086."},{"key":"e_1_3_2_53_2","doi-asserted-by":"publisher","DOI":"10.1239\/jap\/1032374646"},{"key":"e_1_3_2_54_2","doi-asserted-by":"publisher","DOI":"10.1214\/11-AAP759"},{"issue":"4","key":"e_1_3_2_55_2","doi-asserted-by":"crossref","first-page":"305","DOI":"10.1002\/9781118909690.ch16","article-title":"Overlay networks: An akamai perspective","volume":"51","author":"Sitaraman Ramesh K.","year":"2014","unstructured":"Ramesh K. Sitaraman, Mangesh Kasbekar, Woody Lichtenstein, and Manish Jain. 2014. Overlay networks: An akamai perspective. Advanced Content Delivery, Streaming, and Cloud Services 51, 4 (2014), 305\u2013328.","journal-title":"Advanced Content Delivery, Streaming, and Cloud Services"},{"key":"e_1_3_2_56_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-25280-8_5"},{"key":"e_1_3_2_57_2","first-page":"1057","volume-title":"Proceedings of the Advances in Neural Information Processing Systems","author":"Sutton Richard S.","year":"2000","unstructured":"Richard S. Sutton, David A. McAllester, Satinder P. Singh, and Yishay Mansour. 2000. Policy gradient methods for reinforcement learning with function approximation. In Proceedings of the Advances in Neural Information Processing Systems. 1057\u20131063."},{"key":"e_1_3_2_58_2","doi-asserted-by":"publisher","DOI":"10.1109\/CDC.1990.204000"},{"key":"e_1_3_2_59_2","doi-asserted-by":"publisher","DOI":"10.1109\/18.212277"},{"key":"e_1_3_2_60_2","article-title":"Posterior sampling for large scale reinforcement learning","author":"Theocharous Georgios","year":"2017","unstructured":"Georgios Theocharous, Zheng Wen, Yasin Abbasi-Yadkori, and Nikos Vlassis. 2017. Posterior sampling for large scale reinforcement learning. arXiv:1711.07979. Retrieved from https:\/\/arxiv.org\/abs\/1711.07979.","journal-title":"arXiv:1711.07979"},{"key":"e_1_3_2_61_2","volume-title":"Approximate Linear Programming for Networks: Average Cost Bounds","author":"Veatch Michael H.","year":"2010","unstructured":"Michael H. Veatch. 2010. Approximate Linear Programming for Networks: Average Cost Bounds. Technical Report. Working paper, Gordon College."},{"key":"e_1_3_2_62_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCCN.2018.2809722"},{"key":"e_1_3_2_63_2","doi-asserted-by":"publisher","DOI":"10.1007\/BF00992698"},{"key":"e_1_3_2_64_2","first-page":"10170","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Wei Chen-Yu","year":"2020","unstructured":"Chen-Yu Wei, Mehdi Jafarnia Jahromi, Haipeng Luo, Hiteshi Sharma, and Rahul Jain. 2020. Model-free reinforcement learning in infinite-horizon average-reward markov decision processes. In Proceedings of the International Conference on Machine Learning. PMLR, 10170\u201310180."},{"key":"e_1_3_2_65_2","unstructured":"Tsachy Weissman Erik Ordentlich Gadiel Seroussi Sergio Verdu and Marcelo J. Weinberger. 2003. Inequalities for the L1 deviation of the empirical distribution. Hewlett-Packard Labs Technical Report HPL-2003-97R1."},{"key":"e_1_3_2_66_2","volume-title":"Multiaccess and Fading in Communication Networks","author":"Yeh Edmund Meng","year":"2001","unstructured":"Edmund Meng Yeh. 2001. Multiaccess and Fading in Communication Networks. Ph.D. Dissertation. Massachusetts Institute of Technology."},{"key":"e_1_3_2_67_2","unstructured":"Edmund M. Yeh. 2003. Throughput and delay optimal resource allocation in multiple access fading channels. In Proceedings of the IEEE International Symposium on Information Theory ."},{"key":"e_1_3_2_68_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2018.2877686"},{"key":"e_1_3_2_69_2","unstructured":"Zihan Zhang and Xiangyang Ji. 2019. Regret minimization for reinforcement learning by evaluating the optimal bias function. Advances in Neural Information Processing Systems 32 (2019)."}],"container-title":["ACM Transactions on Modeling and Performance Evaluation of Computing Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3529375","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3529375","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:30:39Z","timestamp":1750188639000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3529375"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,3,31]]},"references-count":68,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2022,3,31]]}},"alternative-id":["10.1145\/3529375"],"URL":"https:\/\/doi.org\/10.1145\/3529375","relation":{},"ISSN":["2376-3639","2376-3647"],"issn-type":[{"value":"2376-3639","type":"print"},{"value":"2376-3647","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,3,31]]},"assertion":[{"value":"2020-09-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-03-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-08-17","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}