{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,10]],"date-time":"2026-04-10T13:05:17Z","timestamp":1775826317070,"version":"3.50.1"},"reference-count":77,"publisher":"Association for Computing Machinery (ACM)","issue":"6","license":[{"start":{"date-parts":[[2021,7,13]],"date-time":"2021-07-13T00:00:00Z","timestamp":1626134400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Comput. Surv."],"published-print":{"date-parts":[[2022,7,31]]},"abstract":"<jats:p>Reinforcement learning (RL) algorithms find applications in inventory control, recommender systems, vehicular traffic management, cloud computing, and robotics. The real-world complications arising in these domains makes them difficult to solve with the basic assumptions underlying classical RL algorithms. RL agents in these applications often need to react and adapt to changing operating conditions. A significant part of research on single-agent RL techniques focuses on developing algorithms when the underlying assumption of stationary environment model is relaxed. This article provides a survey of RL methods developed for handling dynamically varying environment models. The goal of methods not limited by the stationarity assumption is to help autonomous agents adapt to varying operating conditions. This is possible either by minimizing the rewards lost during learning by RL agent or by finding a suitable policy for the RL agent that leads to efficient operation of the underlying system. A representative collection of these algorithms is discussed in detail in this work along with their categorization and their relative merits and demerits. Additionally, we also review works that are tailored to application domains. 
Finally, we discuss future enhancements for this field.<\/jats:p>","DOI":"10.1145\/3459991","type":"journal-article","created":{"date-parts":[[2021,7,13]],"date-time":"2021-07-13T16:48:08Z","timestamp":1626194888000},"page":"1-25","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":167,"title":["A Survey of Reinforcement Learning Algorithms for Dynamically Varying Environments"],"prefix":"10.1145","volume":"54","author":[{"given":"Sindhu","family":"Padakandla","sequence":"first","affiliation":[{"name":"Department of Computer Science and Automation, Indian Institute of Science, Bangalore, Karnataka, India"}]}],"member":"320","published-online":{"date-parts":[[2021,7,13]]},"reference":[{"key":"e_1_2_1_1_1","first-page":"1","article-title":"Addressing environment non-stationarity by repeating Q-learning updates","volume":"17","author":"Abdallah Sherief","year":"2016","journal-title":"J. Mach. Learn. Res."},{"key":"e_1_2_1_2_1","volume-title":"Proceedings of the International Conference on Learning Representations.","author":"Al-Shedivat Maruan","year":"2018"},{"key":"e_1_2_1_3_1","volume-title":"Proceedings of the IEEE International Conference on Robotics and Automation (ICRA\u201914)","author":"Allamaraju R."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10846-020-01162-8"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISIT44484.2020.9174054"},{"key":"e_1_2_1_6_1","volume-title":"Proceedings of the American Control Conference (ACC\u201917)","author":"Banerjee Taposh"},{"key":"e_1_2_1_7_1","volume-title":"Proceedings of the 21st International Conference on Information Fusion (FUSION\u201918). 1940","author":"Banerjee T.","year":"1947"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1038\/nn.4401"},{"key":"e_1_2_1_9_1","volume-title":"Dynamic Programming and Optimal Control (4 ed.)","author":"Bertsekas D. 
P."},{"key":"e_1_2_1_10_1","volume-title":"Tsitsiklis","author":"Bertsekas Dimitri P.","year":"1996"},{"key":"e_1_2_1_11_1","volume-title":"Stochastic Approximation: A Dynamical Systems Viewpoint.","author":"Borkar Vivek S.","year":"2009"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSMCC.2007.913919"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/3219819.3220122"},{"key":"e_1_2_1_14_1","unstructured":"Samuel P. M. Choi Dit-Yan Yeung and Nevin Lianwen Zhang. 2000. An environment model for nonstationary reinforcement learning. In Advances in Neural Information Processing Systems. 987--993.  Samuel P. M. Choi Dit-Yan Yeung and Nevin Lianwen Zhang. 2000. An environment model for nonstationary reinforcement learning. In Advances in Neural Information Processing Systems. 987--993."},{"key":"e_1_2_1_15_1","volume-title":"Zhang","author":"Choi Samuel P. M.","year":"2000"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1002\/itl2.2"},{"key":"e_1_2_1_17_1","volume-title":"Value function-based reinforcement learning in changing Markovian environments. J. Mach. Learn. Res. 9 (June","author":"Cs\u00e1ji Bal\u00e1zs Csan\u00e1d","year":"2008"},{"key":"e_1_2_1_18_1","volume-title":"Proceedings of the 23rd International Conference on Machine Learning. 217--224","author":"da Silva Bruno C."},{"key":"e_1_2_1_19_1","volume-title":"Proceedings of the International Conference on Machine Learning. 512--520","author":"Dick Travis","year":"2014"},{"key":"e_1_2_1_20_1","unstructured":"Gabriel Dulac-Arnold Daniel J. Mankowitz and Todd Hester. 2019. Challenges of real-world reinforcement learning. Retrieved from https:\/\/abs\/1904.12901.  Gabriel Dulac-Arnold Daniel J. Mankowitz and Todd Hester. 2019. Challenges of real-world reinforcement learning. Retrieved from https:\/\/abs\/1904.12901."},{"key":"e_1_2_1_21_1","volume-title":"Proceedings of the 34th International Conference on Machine Learning (ICML\u20197). 
JMLR.org, 1126--1135","author":"Finn Chelsea","year":"2017"},{"key":"e_1_2_1_22_1","doi-asserted-by":"crossref","volume-title":"An Introduction to Deep Reinforcement Learning","author":"Francois-Lavet Vincent","DOI":"10.1561\/9781680835397"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.5555\/3327144.3327222"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.5555\/2789272.2886795"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/2843948"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1134\/S0005117918060036"},{"key":"e_1_2_1_27_1","volume-title":"Advances in Neural Information Processing Systems","author":"Gupta Abhishek"},{"key":"e_1_2_1_28_1","volume-title":"Learning over Multiple Contexts.","author":"Hadoux Emmanuel"},{"key":"e_1_2_1_29_1","volume-title":"Proceedings of the 36th International Conference on Machine Learning","volume":"97","author":"Hafner Danijar","year":"2019"},{"key":"e_1_2_1_30_1","volume-title":"Proceedings of the 12th European Workshop on Reinforcement Learning (EWRL\u201915)","author":"Hallak Assaf","year":"2015"},{"key":"e_1_2_1_31_1","volume-title":"Proceedings of the 18th IEEE International Conference On Machine Learning and Applications (ICMLA\u201919)","author":"Jaafra Y.","year":"2019"},{"key":"e_1_2_1_32_1","unstructured":"Thomas Jaksch, Ronald Ortner, and Peter Auer. 2010. Near-optimal regret bounds for reinforcement learning. J. Mach. Learn. Res. 11 (Apr. 2010), 1563--1600."},{"key":"e_1_2_1_33_1","volume-title":"Proceedings of the 35th International Conference on Machine Learning","volume":"80","author":"Kaplanis Christos","year":"2018"},{"key":"e_1_2_1_34_1","volume-title":"Proceedings of the 36th International Conference on Machine Learning","volume":"97","author":"Kaplanis Christos","year":"2019"},{"key":"e_1_2_1_35_1","volume-title":"Reinforcement Learning in Robotics: A Survey","author":"Kober Jens"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1137\/S0363012901385691"},{"key":"e_1_2_1_37_1","volume-title":"Advances in Neural Information Processing Systems","author":"Lecarpentier Erwan"},{"key":"e_1_2_1_38_1","volume-title":"Proceedings of the American Control Conference (ACC\u201919)","author":"Li Y.","year":"2019"},{"key":"e_1_2_1_39_1","volume-title":"Proceedings of the 35th International Conference on Machine Learning","volume":"80","author":"Liang Eric","year":"2018"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.5555\/3237383.3237846"},{"key":"e_1_2_1_41_1","volume-title":"Proceedings of the International Conference on Robotics and Automation (ICRA\u201919)","author":"Lin X."},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2018.2814606"},{"key":"e_1_2_1_43_1","volume-title":"Proceedings of the 36th International Conference on Machine Learning","volume":"97","author":"Liu Hao","year":"2019"},{"key":"e_1_2_1_44_1","volume-title":"Proceedings of the IEEE-RAS 19th International Conference on Humanoid Robots (Humanoids\u201919)","author":"Mesesan G."},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1049\/iet-cps.2017.0048"},{"key":"e_1_2_1_46_1","volume-title":"Georg Ostrovski et\u00a0al","author":"Mnih Volodymyr","year":"2015"},{"key":"e_1_2_1_47_1","volume-title":"Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation (OSDI\u201918)","author":"Moritz Philipp","year":"2018"},{"key":"e_1_2_1_48_1","volume-title":"Proceedings of the IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS\u201920)","author":"Okada Masashi","year":"2020"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1002\/jmv.25830"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1109\/SURV.2013.112813.00168"},{"key":"e_1_2_1_51_1","volume-title":"Proceedings of the 35th Conference on Uncertainty in Artificial Intelligence.","author":"Ortner Ronald","year":"2019"},{"key":"e_1_2_1_52_1","volume-title":"Reinforcement learning algorithm for non-stationary environments. Appl. Intell. 50, 11 (01","author":"Padakandla Sindhu","year":"2020"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1093\/biomet\/41.1-2.100"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.neunet.2019.01.012"},{"key":"e_1_2_1_55_1","unstructured":"K. J. Prabuchandran, Nitin Singh, Pankaj Dayama, and Vinayaka Pandit. 2019. Change point detection for compositional multivariate data. Retrieved from https:\/\/arxiv.org\/abs\/1901.04935."},{"key":"e_1_2_1_56_1","volume-title":"Proceedings of the 17th International IEEE Conference on Intelligent Transportation Systems (ITSC\u201914)","author":"Prabuchandran K. J."},{"key":"e_1_2_1_57_1","volume-title":"Markov Decision Processes: Discrete Stochastic Dynamic Programming","author":"Puterman Martin L.","edition":"2"},{"key":"e_1_2_1_58_1","first-page":"2","article-title":"Survey on reinforcement learning applications in communication networks","volume":"4","author":"Qian Y.","year":"2019","journal-title":"J. Commun. Info. 
Netw."},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.omega.2013.10.004"},{"key":"e_1_2_1_60_1","volume-title":"CHILD: A first step towards continual learning. In Learning to Learn","author":"Ring Mark B.","year":"1998"},{"key":"e_1_2_1_61_1","volume-title":"Proceedings of the 13th International IEEE Conference on Intelligent Transportation Systems. 531--538","author":"Salkham A."},{"key":"e_1_2_1_62_1","volume-title":"Reinforcement learning methods for operations research applications: The order release problem","author":"Schneckenreither Manuel"},{"key":"e_1_2_1_63_1","volume-title":"Proceedings of the International Conference on Machine Learning. 1889--1897","author":"Schulman John","year":"2015"},{"key":"e_1_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.1561\/2200000018"},{"key":"e_1_2_1_65_1","unstructured":"Kun Shao Zhentao Tang Yuanheng Zhu Nannan Li and Dongbin Zhao. 2019. A Survey of Deep Reinforcement Learning in Video Games. Retrieved from https:\/\/arXiv:1912.10944.  Kun Shao Zhentao Tang Yuanheng Zhu Nannan Li and Dongbin Zhao. 2019. A Survey of Deep Reinforcement Learning in Video Games. Retrieved from https:\/\/arXiv:1912.10944."},{"key":"e_1_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.1137\/1108002"},{"key":"e_1_2_1_67_1","volume-title":"Luh","author":"Sun Tao","year":"2009"},{"key":"e_1_2_1_68_1","volume-title":"Barto","author":"Sutton Richard S.","year":"2018"},{"key":"e_1_2_1_69_1","doi-asserted-by":"publisher","DOI":"10.5555\/3009657.3009806"},{"key":"e_1_2_1_70_1","volume-title":"Proceedings of the 31st AAAI Conference on Artificial Intelligence (AAAI\u201917)","author":"Thomas Philip S.","year":"2017"},{"key":"e_1_2_1_71_1","volume-title":"Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA\u201920)","author":"Turchetta M."},{"key":"e_1_2_1_72_1","doi-asserted-by":"crossref","unstructured":"Joaquin Vanschoren. 2019. Meta-Learning. Springer International Publishing 35--61. 
DOI:https:\/\/doi.org\/10.1007\/978-3-030-05318-5_2  Joaquin Vanschoren. 2019. Meta-Learning. Springer International Publishing 35--61. DOI:https:\/\/doi.org\/10.1007\/978-3-030-05318-5_2","DOI":"10.1007\/978-3-030-05318-5_2"},{"key":"e_1_2_1_73_1","doi-asserted-by":"crossref","volume-title":"Theoretical Foundations of Artificial General Intelligence","author":"Wang Pei","DOI":"10.2991\/978-94-91216-62-6"},{"key":"e_1_2_1_74_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF00992698"},{"key":"e_1_2_1_75_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jpdc.2011.10.003"},{"key":"e_1_2_1_76_1","first-page":"3","article-title":"A Survey on reinforcement learning models and algorithms for traffic signal control","volume":"50","author":"Alvin Yau Kok-Lim","year":"2017","journal-title":"ACM Comput. Surv."},{"key":"e_1_2_1_77_1","volume-title":"Proceedings of the 48th IEEE Conference on Decision and Control Conference (CDC\u201909)","author":"Yu J. Y."}],"container-title":["ACM Computing Surveys"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3459991","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3459991","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T21:28:46Z","timestamp":1750195726000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3459991"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,7,13]]},"references-count":77,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2022,7,31]]}},"alternative-id":["10.1145\/3459991"],"URL":"https:\/\/doi.org\/10.1145\/3459991","relation":{},"ISSN":["0360-0300","1557-7341"],"issn-type":[{"value":"0360-0300","type":"print"},{"value":"1557-7341","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,7,13]]},"assertion":
[{"value":"2020-07-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-04-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-07-13","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}