{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,4]],"date-time":"2026-06-04T06:40:44Z","timestamp":1780555244944,"version":"3.54.1"},"reference-count":162,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2022,3,19]],"date-time":"2022-03-19T00:00:00Z","timestamp":1647648000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["MAKE"],"abstract":"<jats:p>Reinforcement learning (RL) has become a highly successful framework for learning in Markov decision processes (MDP). Due to the adoption of RL in realistic and complex environments, solution robustness becomes an increasingly important aspect of RL deployment. Nevertheless, current RL algorithms struggle with robustness to uncertainty, disturbances, or structural changes in the environment. We survey the literature on robust approaches to reinforcement learning and categorize these methods in four different ways: (i) Transition robust designs account for uncertainties in the system dynamics by manipulating the transition probabilities between states; (ii) Disturbance robust designs leverage external forces to model uncertainty in the system behavior; (iii) Action robust designs redirect transitions of the system by corrupting an agent\u2019s output; (iv) Observation robust designs exploit or distort the perceived system state of the policy. Each of these robust designs alters a different aspect of the MDP. Additionally, we address the connection of robustness to the risk-based and entropy-regularized RL formulations. 
The resulting survey covers all fundamental concepts underlying the approaches to robust reinforcement learning and their recent advances.<\/jats:p>","DOI":"10.3390\/make4010013","type":"journal-article","created":{"date-parts":[[2022,3,20]],"date-time":"2022-03-20T21:30:14Z","timestamp":1647811814000},"page":"276-315","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":108,"title":["Robust Reinforcement Learning: A Review of Foundations and Recent Advances"],"prefix":"10.3390","volume":"4","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2484-3830","authenticated-orcid":false,"given":"Janosch","family":"Moos","sequence":"first","affiliation":[{"name":"Institute for Mechatronic Systems in Mechanical Engineering, Technical University of Darmstadt, 64287 Darmstadt, Germany"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8448-4510","authenticated-orcid":false,"given":"Kay","family":"Hansel","sequence":"additional","affiliation":[{"name":"Intelligent Autonomous Systems in Computer Science, Technical University of Darmstadt, 64289 Darmstadt, Germany"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Hany","family":"Abdulsamad","sequence":"additional","affiliation":[{"name":"Intelligent Autonomous Systems in Computer Science, Technical University of Darmstadt, 64289 Darmstadt, Germany"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Svenja","family":"Stark","sequence":"additional","affiliation":[{"name":"Intelligent Autonomous Systems in Computer Science, Technical University of Darmstadt, 64289 Darmstadt, Germany"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Debora","family":"Clever","sequence":"additional","affiliation":[{"name":"Institute for Mechatronic Systems in Mechanical Engineering, Technical University of Darmstadt, 64287 Darmstadt, Germany"},{"name":"ABB AG, 68309 Mannheim, 
Germany"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jan","family":"Peters","sequence":"additional","affiliation":[{"name":"Intelligent Autonomous Systems in Computer Science, Technical University of Darmstadt, 64289 Darmstadt, Germany"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2022,3,19]]},"reference":[{"key":"ref_1","unstructured":"Sutton, R.S., and Barto, A.G. (2018). Reinforcement Learning: An Introduction, MIT Press. [2nd ed.]."},{"key":"ref_2","unstructured":"Puterman, M.L. (2014). Markov Decision Processes: Discrete Stochastic Dynamic Programming, John Wiley & Sons."},{"key":"ref_3","unstructured":"Franklin, G.F., Powell, J.D., Emami-Naeini, A., and Powell, J.D. (1994). Feedback Control of Dynamic Systems, Addison-Wesley."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"17","DOI":"10.1109\/37.506394","article-title":"A brief history of automatic control","volume":"16","author":"Bennett","year":"1996","journal-title":"IEEE Control Syst. Mag."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"26","DOI":"10.1109\/37.506395","article-title":"Optimal control-1950 to 1985","volume":"16","author":"Bryson","year":"1996","journal-title":"IEEE Control Syst. Mag."},{"key":"ref_6","unstructured":"Kirk, D.E. (2012). Optimal Control Theory: An Introduction, Courier Corporation."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Leen, T.K., Dietterich, T.G., and Tresp, V. (2001). Robust Reinforcement Learning. Advances in Neural Information Processing Systems 13, MIT Press.","DOI":"10.7551\/mitpress\/1120.001.0001"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Pinto, L., Davidson, J., and Gupta, A. (June, January 29). Supervision via Competition: Robot Adversaries for Learning Tasks. 
Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore.","DOI":"10.1109\/ICRA.2017.7989190"},{"key":"ref_9","unstructured":"Pinto, L., Davidson, J., Sukthankar, R., and Gupta, A. (2017, January 6\u201311). Robust Adversarial Reinforcement Learning. Proceedings of the 34th International Conference on Machine Learning (ICML), Sydney, Australia."},{"key":"ref_10","unstructured":"Tamar, A., Xu, H., and Mannor, S. (2013). Scaling up robust MDPs by reinforcement learning. arXiv."},{"key":"ref_11","unstructured":"Tessler, C., Efroni, Y., and Mannor, S. (2019, January 9\u201315). Action Robust Reinforcement Learning and Applications in Continuous Control. Proceedings of the 36th International Conference on Machine Learning (ICML), PMLR, Long Beach, CA, USA."},{"key":"ref_12","unstructured":"Zhou, K., and Doyle, J.C. (1998). Essentials of Robust Control, Prentice Hall."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Ben-Tal, A., El Ghaoui, L., and Nemirovski, A. (2009). Robust Optimization, Princeton University Press.","DOI":"10.1515\/9781400831050"},{"key":"ref_14","unstructured":"Hansen, L.P., and Sargent, T.J. (2016). Robustness, Princeton University Press."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"769","DOI":"10.1287\/moor.23.4.769","article-title":"Robust convex optimization","volume":"23","author":"Nemirovski","year":"1998","journal-title":"Math. Oper. Res."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"453","DOI":"10.1007\/s101070100286","article-title":"Robust optimization\u2013methodology and applications","volume":"92","author":"Nemirovski","year":"2002","journal-title":"Math. Program."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"173","DOI":"10.1016\/j.arcontrol.2012.09.001","article-title":"Origins of robust control: Early history and future speculations","volume":"26","author":"Safonov","year":"2012","journal-title":"Annu. Rev. 
Control"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"301","DOI":"10.1109\/TAC.1981.1102603","article-title":"Feedback and optimal sensitivity: Model reference transformations, multiplicative seminorms, and approximate inverses","volume":"26","author":"Zames","year":"1981","journal-title":"IEEE Trans. Autom. Control"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Doyle, J. (1982). Analysis of feedback systems with structured uncertainties. IEE Proceedings D-Control Theory and Applications, IET.","DOI":"10.1049\/ip-d.1982.0053"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"585","DOI":"10.1109\/TAC.1983.1103275","article-title":"Feedback, minimax sensitivity, and optimal robustness","volume":"28","author":"Zames","year":"1983","journal-title":"IEEE Trans. Autom. Control"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Doyle, J.C., Glover, K., Khargonekar, P.P., and Francis, B.A. (1989). State-space solutions to standard H2 and H\u221e control problems. IEEE Trans. Autom. Control, 1691\u20131696.","DOI":"10.23919\/ACC.1988.4789992"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"770","DOI":"10.1109\/9.256331","article-title":"L 2-gain analysis of nonlinear systems and nonlinear state feedback H\u221e control","volume":"37","year":"1992","journal-title":"IEEE Trans. Autom. Control"},{"key":"ref_23","unstructured":"Bagnell, J.A., Ng, A.Y., and Schneider, J.G. (2001). Solving Uncertain Markov Decision Processes, Carnegie Mellon University, the Robotics Institute."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"780","DOI":"10.1287\/opre.1050.0216","article-title":"Robust control of Markov decision processes with uncertain transition matrices","volume":"53","author":"Nilim","year":"2005","journal-title":"Oper. 
Res."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"257","DOI":"10.1287\/moor.1040.0129","article-title":"Robust dynamic programming","volume":"30","author":"Iyengar","year":"2005","journal-title":"Math. Oper. Res."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"167","DOI":"10.1016\/0167-6911(88)90055-2","article-title":"State-space formulae for all stabilizing controllers that satisfy an H(infinity)-norm bound and relations to risk sensitivity","volume":"11","author":"Glover","year":"1988","journal-title":"Syst. Control Lett."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Basar, T., and Bernhard, P. (2008). H\u221e-Optimal Control and Related Minimax Design Problems: A Dynamic Game Approach, Birkh\u00e4user.","DOI":"10.1007\/978-0-8176-4757-5"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"262","DOI":"10.1137\/0330017","article-title":"A Game Theoretic Approach to H\u221e Control for Time-varying Systems","volume":"30","author":"Limebeer","year":"1992","journal-title":"SIAM J. Control Optim."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"138","DOI":"10.1007\/BF01210205","article-title":"Robust control and differential games on a finite time horizon","volume":"8","author":"McEneaney","year":"1995","journal-title":"Math. Control Signals Syst."},{"key":"ref_30","unstructured":"Isaacs, R. (1954). Differential Games I: Introduction, Technical Report; Rand Corp."},{"key":"ref_31","unstructured":"Owen, G. (1982). Game Theory, Academic Press."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"385","DOI":"10.1109\/TAC.1965.1098197","article-title":"Differential games and optimal pursuit-evasion strategies","volume":"10","author":"Ho","year":"1965","journal-title":"IEEE Trans. Autom. Control"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"184","DOI":"10.1007\/BF00929443","article-title":"Nonzero-sum differential games","volume":"3","author":"Starr","year":"1969","journal-title":"J. 
Optim. Theory Appl."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"55","DOI":"10.1016\/S1389-0417(01)00015-8","article-title":"Value-function reinforcement learning in Markov games","volume":"2","author":"Littman","year":"2001","journal-title":"Cogn. Syst. Res."},{"key":"ref_35","unstructured":"Uther, W., and Veloso, M. (1997). Adversarial Reinforcement Learning, Carnegie Mellon University."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Shoham, Y., and Leyton-Brown, K. (2008). Multiagent systems: Algorithmic, Game-Theoretic, and Logical Foundations, Cambridge University Press.","DOI":"10.1017\/CBO9780511811654"},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"430","DOI":"10.1007\/s004539910020","article-title":"Robot Motion Planning: A Game-Theoretic Foundation","volume":"26","author":"LaValle","year":"2000","journal-title":"Algorithmica"},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"647","DOI":"10.1109\/TAC.2007.894517","article-title":"Stochastic uncertain systems subject to relative entropy constraints: Induced norms and monotonicity properties of minimax games","volume":"52","author":"Charalambous","year":"2007","journal-title":"IEEE Trans. Autom. Control"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Xu, H., and Mannor, S. (2006). The robustness-performance tradeoff in Markov decision processes. Advances in Neural Information Processing Systems (NIPS), MIT Press.","DOI":"10.7551\/mitpress\/7503.003.0197"},{"key":"ref_40","unstructured":"Xu, H., and Mannor, S. (2010). Distributionally robust Markov decision processes. Advances in Neural Information Processing Systems (NIPS), Curran Associates, Inc."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"203","DOI":"10.1287\/opre.1080.0685","article-title":"Percentile optimization for Markov decision processes with parameter uncertainty","volume":"58","author":"Delage","year":"2010","journal-title":"Oper. 
Res."},{"key":"ref_42","unstructured":"Mannor, S., Mebel, O., and Xu, H. (2012). Lightning does not strike twice: Robust MDPs with coupled uncertainty. arXiv."},{"key":"ref_43","unstructured":"Hu, Z., and Hong, L.J. (2022, March 12). Kullback-Leibler Divergence Constrained Distributionally Robust Optimization. Available at Optimization Online. Available online: https:\/\/asset-pdf.scinapse.io\/prod\/2562747313\/2562747313.pdf."},{"key":"ref_44","unstructured":"Lim, S.H., Xu, H., and Mannor, S. (2013). Reinforcement learning in robust markov decision processes. Advances in Neural Information Processing Systems (NIPS), Curran Associates, Inc."},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"153","DOI":"10.1287\/moor.1120.0566","article-title":"Robust Markov Decision Processes","volume":"38","author":"Wiesemann","year":"2013","journal-title":"Math. Oper. Res."},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"2538","DOI":"10.1109\/TAC.2015.2495174","article-title":"Distributionally robust counterpart in Markov decision processes","volume":"61","author":"Yu","year":"2015","journal-title":"IEEE Trans. Autom. Control"},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"1484","DOI":"10.1287\/moor.2016.0786","article-title":"Robust MDPs with k-rectangular uncertainty","volume":"41","author":"Mannor","year":"2016","journal-title":"Math. Oper. Res."},{"key":"ref_48","unstructured":"Goyal, V., and Grand-Clement, J. (2018). Robust Markov Decision Process: Beyond Rectangularity. arXiv."},{"key":"ref_49","unstructured":"Smirnova, E., Dohmatob, E., and Mary, J. (2019). Distributionally robust reinforcement learning. arXiv."},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Coulson, J., Lygeros, J., and D\u00f6rfler, F. (2019, January 11\u201313). Regularized and Distributionally Robust Data-Enabled Predictive Control. 
Proceedings of the 2019 IEEE 58th Conference on Decision and Control (CDC), Nice, France.","DOI":"10.1109\/CDC40024.2019.9028943"},{"key":"ref_51","unstructured":"Derman, E., and Mannor, S. (2020). Distributional robustness and regularization in reinforcement learning. arXiv."},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Turchetta, M., Krause, A., and Trimpe, S. (August, January 31). Robust model-free reinforcement learning with multi-objective Bayesian optimization. Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA), Paris, France.","DOI":"10.1109\/ICRA40945.2020.9197000"},{"key":"ref_53","unstructured":"Abdulsamad, H., Dorau, T., Belousov, B., Zhu, J.J., and Peters, J. (2021). Distributionally Robust Trajectory Optimization Under Uncertain Dynamics via Relative-Entropy Trust Regions. arXiv."},{"key":"ref_54","doi-asserted-by":"crossref","first-page":"3863","DOI":"10.1109\/TAC.2020.3030884","article-title":"Wasserstein Distributionally Robust Stochastic Control: A Data-Driven Approach","volume":"66","author":"Yang","year":"2021","journal-title":"IEEE Trans. Autom. Control"},{"key":"ref_55","unstructured":"Klima, R., Bloembergen, D., Kaisers, M., and Tuyls, K. (2019, January 13\u201317). Robust temporal difference learning for critical domains. Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS). International Foundation for Autonomous Agents and Multiagent Systems, Montreal, QC, Canada."},{"key":"ref_56","doi-asserted-by":"crossref","unstructured":"Pan, X., Seita, D., Gao, Y., and Canny, J. (2019). Risk Averse Robust Adversarial Reinforcement Learning. arXiv.","DOI":"10.1109\/ICRA.2019.8794293"},{"key":"ref_57","doi-asserted-by":"crossref","unstructured":"Tan, K.L., Esfandiari, Y., Lee, X.Y., and Sarkar, S. (2020, January 1\u20133). Robustifying reinforcement learning agents via action space adversarial training. 
Proceedings of the 2020 American control conference (ACC), Denver, CO, USA.","DOI":"10.23919\/ACC45564.2020.9147846"},{"key":"ref_58","unstructured":"Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N.D., and Weinberger, K.Q. (2014). Generative Adversarial Nets. Advances in Neural Information Processing Systems 27 (NIPS), Curran Associates, Inc."},{"key":"ref_59","unstructured":"Goodfellow, I.J., Shlens, J., and Szegedy, C. (2014). Explaining and harnessing adversarial examples. arXiv."},{"key":"ref_60","unstructured":"Huang, S., Papernot, N., Goodfellow, I., Duan, Y., and Abbeel, P. (2017). Adversarial Attacks on Neural Network Policies. arXiv."},{"key":"ref_61","doi-asserted-by":"crossref","unstructured":"Mandlekar, A., Zhu, Y., Garg, A., Fei-Fei, L., and Savarese, S. (2017, January 24\u201328). Adversarially robust policy learning: Active construction of physically-plausible perturbations. Proceedings of the 2017 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, BC, Canada.","DOI":"10.1109\/IROS.2017.8206245"},{"key":"ref_62","unstructured":"Pattanaik, A., Tang, Z., Liu, S., Bommannan, G., and Chowdhary, G. (2018, January 10\u201315). Robust Deep Reinforcement Learning with Adversarial Attacks. Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS). International Foundation for Autonomous Agents and Multiagent Systems, Stockholm, Sweden."},{"key":"ref_63","unstructured":"Gleave, A., Dennis, M., Kant, N., Wild, C., Levine, S., and Russell, S. (2019). Adversarial Policies: Attacking Deep Reinforcement Learning. arXiv."},{"key":"ref_64","unstructured":"Zhang, H., Chen, H., Xiao, C., Li, B., Liu, M., Boning, D., and Hsieh, C.J. (2020). Robust Deep Reinforcement Learning against Adversarial Perturbations on State Observations. arXiv."},{"key":"ref_65","unstructured":"L\u00fctjens, B., Everett, M., and How, J.P. (November, January 30). 
Certified adversarial robustness for deep reinforcement learning. Proceedings of the Conference on Robot Learning (CoRL), Osaka, Japan."},{"key":"ref_66","first-page":"1367","article-title":"Game theory, maximum entropy, minimum discrepancy and robust Bayesian decision theory","volume":"32","author":"Dawid","year":"2004","journal-title":"Ann. Stat."},{"key":"ref_67","unstructured":"Osogami, T. (2012). Robustness and risk-sensitivity in Markov decision processes. Advances in Neural Information Processing Systems (NIPS), Curran Associates, Inc."},{"key":"ref_68","unstructured":"Eysenbach, B., and Levine, S. (2019). If MaxEnt RL is the Answer, What is the Question?. arXiv."},{"key":"ref_69","unstructured":"Eysenbach, B., and Levine, S. (2021). Maximum entropy rl (provably) solves some robust rl problems. arXiv."},{"key":"ref_70","doi-asserted-by":"crossref","unstructured":"Boyd, S., and Vandenberghe, L. (2004). Convex Optimization, Cambridge University Press.","DOI":"10.1017\/CBO9780511804441"},{"key":"ref_71","unstructured":"Papageorgiou, M., Leibold, M., and Buss, M. (1991). Optimierung, Springer."},{"key":"ref_72","unstructured":"Kall, P., Wallace, S.W., and Kall, P. (1994). Stochastic Programming, Springer."},{"key":"ref_73","doi-asserted-by":"crossref","first-page":"3190","DOI":"10.1016\/j.cma.2007.03.003","article-title":"Robust optimization\u2013A comprehensive survey","volume":"196","author":"Beyer","year":"2007","journal-title":"Comput. Methods Appl. Mech. Eng."},{"key":"ref_74","doi-asserted-by":"crossref","first-page":"95","DOI":"10.1287\/moor.1110.0531","article-title":"A distributional interpretation of robust optimization","volume":"37","author":"Xu","year":"2012","journal-title":"Math. Oper. Res."},{"key":"ref_75","doi-asserted-by":"crossref","first-page":"1358","DOI":"10.1287\/opre.2014.1314","article-title":"Distributionally robust convex optimization","volume":"62","author":"Wiesemann","year":"2014","journal-title":"Oper. 
Res."},{"key":"ref_76","doi-asserted-by":"crossref","unstructured":"Heger, M. (1994). Consideration of risk in reinforcement learning. Machine Learning Proceedings 1994, Elsevier.","DOI":"10.1016\/B978-1-55860-335-6.50021-0"},{"key":"ref_77","unstructured":"Scarf, H.E. (1957). A Min-Max Solution of an Inventory Problem, Technical Report; Rand Corp."},{"key":"ref_78","unstructured":"Bolza, O. (1909). Vorlesungen \u00fcber Variationsrechnung, BG Teubner. Available online: https:\/\/diglib.uibk.ac.at\/ulbtirol\/content\/titleinfo\/372088."},{"key":"ref_79","doi-asserted-by":"crossref","first-page":"809","DOI":"10.2307\/2371626","article-title":"On multipliers for Lagrange problems","volume":"61","author":"McShane","year":"1939","journal-title":"Am. J. Math."},{"key":"ref_80","unstructured":"Bliss, G.A. (1946). Lectures on the Calculus of Variations, University of Chicago Press."},{"key":"ref_81","unstructured":"Cicala, P. (1957). An Engineering Approach to the Calculus of Variations, Libreria Editrice Universitaria Levrotto & Bella."},{"key":"ref_82","doi-asserted-by":"crossref","unstructured":"Pontryagin, L.S. (2018). Mathematical Theory of Optimal Processes, Routledge.","DOI":"10.1201\/9780203749319"},{"key":"ref_83","doi-asserted-by":"crossref","first-page":"503","DOI":"10.1090\/S0002-9904-1954-09848-8","article-title":"The theory of dynamic programming","volume":"60","author":"Bellman","year":"1954","journal-title":"Bull. Am. Math. Soc."},{"key":"ref_84","doi-asserted-by":"crossref","first-page":"219","DOI":"10.1162\/089976600300015961","article-title":"Reinforcement Learning in Continuous Time and Space","volume":"12","author":"Doya","year":"2000","journal-title":"Neural Comput."},{"key":"ref_85","first-page":"679","article-title":"A Markovian decision process","volume":"6","author":"Bellman","year":"1957","journal-title":"J. Math. 
Mech."},{"key":"ref_86","first-page":"102","article-title":"Contributions to the theory of optimal control","volume":"5","author":"Kalman","year":"1960","journal-title":"Bol. Soc. Mat. Mex."},{"key":"ref_87","doi-asserted-by":"crossref","first-page":"371","DOI":"10.1115\/1.3662604","article-title":"Control system analysis and design via the \u201csecond method\u201d of Lyapunov: I\u2014Continuous-time systems","volume":"82","author":"Kalman","year":"1960","journal-title":"J. Basic Eng."},{"key":"ref_88","doi-asserted-by":"crossref","first-page":"35","DOI":"10.1115\/1.3662552","article-title":"A new approach to linear filtering and prediction problems","volume":"82","author":"Kalman","year":"1960","journal-title":"J. Basic Eng."},{"key":"ref_89","unstructured":"Von Neumann, J., and Morgenstern, O. (1944). Theory of Games and Economic Behavior, Princeton University Press."},{"key":"ref_90","unstructured":"Isaacs, R. (1999). Differential Games: A Mathematical Theory with Applications to Warfare and Pursuit, Control and Optimization, Dover Publications."},{"key":"ref_91","doi-asserted-by":"crossref","first-page":"286","DOI":"10.2307\/1969529","article-title":"Non-cooperative games","volume":"54","author":"Nash","year":"1951","journal-title":"Ann. Math."},{"key":"ref_92","unstructured":"Awheda, M. (2017). On Multi-Agent Reinforcement Learning in Matrix, Stochastic and Differential Games. [Ph.D. Thesis, Carleton University]."},{"key":"ref_93","unstructured":"Bowling, M.H., and Veloso, M.M. (2022, March 12). An Analysis of Stochastic Game Theory for Multiagent Reinforcement Learning. Available online: https:\/\/apps.dtic.mil\/sti\/citations\/ADA385122."},{"key":"ref_94","unstructured":"Howard, R.A. (2022, March 12). Dynamic Programming and Markov Processes. Available online: https:\/\/psycnet.apa.org\/record\/1961-01474-000."},{"key":"ref_95","doi-asserted-by":"crossref","unstructured":"Littman, M.L. (1994). 
Markov games as a framework for multi-agent reinforcement learning. Machine Learning Proceedings 1994, Elsevier.","DOI":"10.1016\/B978-1-55860-335-6.50027-1"},{"key":"ref_96","doi-asserted-by":"crossref","first-page":"818","DOI":"10.1016\/j.asoc.2006.02.005","article-title":"A robust Markov game controller for nonlinear systems","volume":"7","author":"Sharma","year":"2007","journal-title":"Appl. Soft Comput."},{"key":"ref_97","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1287\/mnsc.28.1.1","article-title":"State of the art\u2014A survey of partially observable Markov decision processes: Theory, models, and algorithms","volume":"28","author":"Monahan","year":"1982","journal-title":"Manag. Sci."},{"key":"ref_98","doi-asserted-by":"crossref","first-page":"99","DOI":"10.1016\/S0004-3702(98)00023-X","article-title":"Planning and acting in partially observable stochastic domains","volume":"101","author":"Kaelbling","year":"1998","journal-title":"Artif. Intell."},{"key":"ref_99","unstructured":"Sutton, R.S., McAllester, D., Singh, S., and Mansour, Y. (1999). Policy gradient methods for reinforcement learning with function approximation. Adv. Neural Inf. Process. Syst., 12."},{"key":"ref_100","unstructured":"Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., and Riedmiller, M. (2013). Playing atari with deep reinforcement learning. arXiv."},{"key":"ref_101","unstructured":"Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., and Wierstra, D. (2015). Continuous control with deep reinforcement learning. arXiv."},{"key":"ref_102","unstructured":"Haarnoja, T., Zhou, A., Abbeel, P., and Levine, S. (2018, January 10\u201315). Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. 
Proceedings of the 35th International Conference on Machine Learning PMLR, Stockholm, Sweden."},{"key":"ref_103","doi-asserted-by":"crossref","first-page":"229","DOI":"10.1007\/BF00992696","article-title":"Simple statistical gradient-following algorithms for connectionist reinforcement learning","volume":"8","author":"Williams","year":"1992","journal-title":"Mach. Learn."},{"key":"ref_104","unstructured":"Kakade, S.M. (2001). A natural policy gradient. Advances in Neural Information Processing Systems, NIPS."},{"key":"ref_105","doi-asserted-by":"crossref","first-page":"1180","DOI":"10.1016\/j.neucom.2007.11.026","article-title":"Natural actor-critic","volume":"71","author":"Peters","year":"2008","journal-title":"Neurocomputing"},{"key":"ref_106","doi-asserted-by":"crossref","unstructured":"Peters, J., Mulling, K., and Altun, Y. (2010, January 11\u201315). Relative entropy policy search. Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, Atlanta, GA, USA.","DOI":"10.1609\/aaai.v24i1.7727"},{"key":"ref_107","unstructured":"Schulman, J., Levine, S., Abbeel, P., Jordan, M., and Moritz, P. (2015, January 7\u20139). Trust region policy optimization. Proceedings of the International Conference on Machine Learning, PMLR, Lille, France."},{"key":"ref_108","unstructured":"Schulman, J., Wolski, F., Dhariwal, P., Radford, A., and Klimov, O. (2017). Proximal policy optimization algorithms. arXiv."},{"key":"ref_109","unstructured":"Lanctot, M., Zambaldi, V., Gruslys, A., Lazaridou, A., Tuyls, K., Perolat, J., Silver, D., and Graepel, T. (2017). A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning. Advances in Neural Information Processing Systems, NIPS."},{"key":"ref_110","first-page":"55","article-title":"The world of independent learners is not Markovian","volume":"15","author":"Laurent","year":"2011","journal-title":"Int. J.-Knowl.-Based Intell. Eng. Syst."},{"key":"ref_111","unstructured":"Claus, C., and Boutilier, C. 
(1998, January 26\u201330). The Dynamics of Reinforcement Learning in Cooperative Multiagent Systems. Proceedings of the Fifteenth National\/Tenth Conference on Artificial Intelligence\/Innovative Applications of Artificial Intelligence. American Association for Artificial Intelligence, Madison, WI, USA."},{"key":"ref_112","doi-asserted-by":"crossref","first-page":"1095","DOI":"10.1073\/pnas.39.10.1095","article-title":"Stochastic games","volume":"39","author":"Shapley","year":"1953","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"ref_113","doi-asserted-by":"crossref","unstructured":"Bu\u015foniu, L., Babu\u0161ka, R., and Schutter, B.D. (2010). Multi-agent reinforcement learning: An overview. Innovations in Multi-Agent Systems and Applications-1, Springer.","DOI":"10.1007\/978-3-642-14435-6_7"},{"key":"ref_114","doi-asserted-by":"crossref","first-page":"895","DOI":"10.1007\/s10462-021-09996-w","article-title":"Multi-agent deep reinforcement learning: A survey","volume":"55","author":"Gronauer","year":"2021","journal-title":"Artif. Intell. Rev."},{"key":"ref_115","first-page":"310","article-title":"A generalized reinforcement-learning model: Convergence and applications","volume":"96","author":"Littman","year":"1996","journal-title":"ICML"},{"key":"ref_116","doi-asserted-by":"crossref","first-page":"2017","DOI":"10.1162\/089976699300016070","article-title":"A unified analysis of value-function-based reinforcement-learning algorithms","volume":"11","author":"Littman","year":"1999","journal-title":"Neural Comput."},{"key":"ref_117","doi-asserted-by":"crossref","first-page":"484","DOI":"10.1038\/nature16961","article-title":"Mastering the game of Go with deep neural networks and tree search","volume":"529","author":"Silver","year":"2016","journal-title":"Nature"},{"key":"ref_118","unstructured":"Foerster, J., Assael, I.A., De Freitas, N., and Whiteson, S. (2016). Learning to communicate with deep multi-agent reinforcement learning. Adv. Neural Inf. Process. 
Syst., 29."},{"key":"ref_119","unstructured":"Lowe, R., Wu, Y.I., Tamar, A., Harb, J., Pieter Abbeel, O., and Mordatch, I. (2017). Multi-agent actor-critic for mixed cooperative-competitive environments. Advances in Neural Information Processing Systems, NIPS."},{"key":"ref_120","doi-asserted-by":"crossref","unstructured":"Foerster, J., Farquhar, G., Afouras, T., Nardelli, N., and Whiteson, S. (2018, January 2\u20137). Counterfactual multi-agent policy gradients. Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA.","DOI":"10.1609\/aaai.v32i1.11794"},{"key":"ref_121","unstructured":"Rashid, T., Samvelyan, M., Schroeder, C., Farquhar, G., Foerster, J., and Whiteson, S. (2018, January 10\u201315). Qmix: Monotonic value function factorisation for deep multi-agent reinforcement learning. Proceedings of the International Conference on Machine Learning (ICML), PMLR, Stockholm, Sweden."},{"key":"ref_122","unstructured":"Mahajan, A., Rashid, T., Samvelyan, M., and Whiteson, S. (2019). Maven: Multi-agent variational exploration. Advances in Neural Information Processing Systems (NIPS), Curran Associates, Inc."},{"key":"ref_123","unstructured":"Son, K., Kim, D., Kang, W.J., Hostallero, D.E., and Yi, Y. (2019, January 9\u201315). Qtran: Learning to factorize with transformation for cooperative multi-agent reinforcement learning. Proceedings of the 36th International Conference on Machine Learning (ICML), PMLR, Long Beach, CA, USA."},{"key":"ref_124","first-page":"10199","article-title":"Weighted qmix: Expanding monotonic value function factorisation for deep multi-agent reinforcement learning","volume":"33","author":"Rashid","year":"2020","journal-title":"Adv. Neural Inf. Process. 
Syst."},{"key":"ref_125","doi-asserted-by":"crossref","first-page":"1228","DOI":"10.1109\/TNNLS.2020.3041469","article-title":"Online minimax Q network learning for two-player zero-sum Markov games","volume":"33","author":"Zhu","year":"2020","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"ref_126","unstructured":"Yu, C., Velu, A., Vinitsky, E., Wang, Y., Bayen, A., and Wu, Y. (2021). The Surprising Effectiveness of PPO in Cooperative, Multi-Agent Games. arXiv."},{"key":"ref_127","doi-asserted-by":"crossref","first-page":"391","DOI":"10.1007\/s10994-011-5268-1","article-title":"Robustness and generalization","volume":"86","author":"Xu","year":"2012","journal-title":"Mach. Learn."},{"key":"ref_128","doi-asserted-by":"crossref","first-page":"728","DOI":"10.1287\/opre.21.3.728","article-title":"Markovian decision processes with uncertain transition probabilities","volume":"21","author":"Satia","year":"1973","journal-title":"Oper. Res."},{"key":"ref_129","doi-asserted-by":"crossref","unstructured":"Xiao, C., Li, B., Zhu, J.Y., He, W., Liu, M., and Song, D. (2018). Generating adversarial examples with adversarial networks. arXiv.","DOI":"10.24963\/ijcai.2018\/543"},{"key":"ref_130","doi-asserted-by":"crossref","first-page":"739","DOI":"10.1287\/opre.42.4.739","article-title":"Markov decision processes with imprecise transition probabilities","volume":"42","author":"White","year":"1994","journal-title":"Oper. Res."},{"key":"ref_131","doi-asserted-by":"crossref","unstructured":"Givan, R., Leach, S., and Dean, T. (1997). Bounded parameter Markov decision processes. European Conference on Planning, Springer.","DOI":"10.1007\/3-540-63912-8_89"},{"key":"ref_132","doi-asserted-by":"crossref","unstructured":"Littman, M.L. (1994). Memoryless policies: Theoretical limitations and practical results. 
From Animals to Animats 3: Proceedings of the Third International Conference on Simulation of Adaptive Behavior, MIT Press.","DOI":"10.7551\/mitpress\/3117.003.0041"},{"key":"ref_133","doi-asserted-by":"crossref","unstructured":"Zhu, J.J., Jitkrittum, W., Diehl, M., and Sch\u00f6lkopf, B. (2020, January 14\u201318). Worst-Case Risk Quantification under Distributional Ambiguity using Kernel Mean Embedding in Moment Problem. Proceedings of the 2020 59th IEEE Conference on Decision and Control (CDC), Jeju, Korea.","DOI":"10.1109\/CDC42340.2020.9303938"},{"key":"ref_134","doi-asserted-by":"crossref","first-page":"4242","DOI":"10.1287\/mnsc.2018.3140","article-title":"Near-optimal Bayesian ambiguity sets for distributionally robust optimization","volume":"65","author":"Gupta","year":"2019","journal-title":"Manag. Sci."},{"key":"ref_135","unstructured":"Rahimian, H., and Mehrotra, S. (2019). Distributionally robust optimization: A review. arXiv."},{"key":"ref_136","unstructured":"Badrinath, K.P., and Kalathil, D. (2021, January 18\u201324). Robust Reinforcement Learning using Least Squares Policy Iteration with Provable Performance Guarantees. Proceedings of the International Conference on Machine Learning PMLR, Virtual."},{"key":"ref_137","unstructured":"Abdullah, M.A., Ren, H., Ammar, H.B., Milenkovic, V., Luo, R., Zhang, M., and Wang, J. (2019). Wasserstein robust reinforcement learning. arXiv."},{"key":"ref_138","doi-asserted-by":"crossref","unstructured":"Todorov, E., Erez, T., and Tassa, Y. (2012, January 7\u201312). Mujoco: A physics engine for model-based control. Proceedings of the 2012 IEEE\/RSJ International Conference on Intelligent Robots and Systems, Vilamoura-Algarve, Portugal.","DOI":"10.1109\/IROS.2012.6386109"},{"key":"ref_139","unstructured":"O\u2019Donoghue, B., Osband, I., Munos, R., and Mnih, V. (2017, January 6\u201311). The uncertainty bellman equation and exploration. 
Proceedings of the International Conference on Machine Learning, Sydney, Australia."},{"key":"ref_140","unstructured":"Derman, E., Mankowitz, D., Mann, T., and Mannor, S. (2019, January 22\u201325). A Bayesian Approach to Robust Reinforcement Learning. Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI), PMLR, Tel Aviv, Israel."},{"key":"ref_141","unstructured":"Rajeswaran, A., Ghotra, S., Ravindran, B., and Levine, S. (2016). Epopt: Learning robust neural network policies using model ensembles. arXiv."},{"key":"ref_142","doi-asserted-by":"crossref","unstructured":"Tamar, A., Glassner, Y., and Mannor, S. (2015, January 25\u201330). Optimizing the CVaR via sampling. Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, Austin, TX, USA.","DOI":"10.1609\/aaai.v29i1.9561"},{"key":"ref_143","unstructured":"Brockman, G., Cheung, V., Pettersson, L., Schneider, J., Schulman, J., Tang, J., and Zaremba, W. (2016). OpenAI Gym. arXiv."},{"key":"ref_144","unstructured":"Mankowitz, D.J., Levine, N., Jeong, R., Shi, Y., Kay, J., Abdolmaleki, A., Springenberg, J.T., Mann, T., Hester, T., and Riedmiller, M. (2019). Robust reinforcement learning for continuous control with model misspecification. arXiv."},{"key":"ref_145","doi-asserted-by":"crossref","unstructured":"Lutter, M., Mannor, S., Peters, J., Fox, D., and Garg, A. (2021). Robust Value Iteration for Continuous Control Tasks. arXiv.","DOI":"10.15607\/RSS.2021.XVII.007"},{"key":"ref_146","doi-asserted-by":"crossref","unstructured":"Papernot, N., McDaniel, P., Jha, S., Fredrikson, M., Celik, Z.B., and Swami, A. (2016, January 21\u201324). The limitations of deep learning in adversarial settings. Proceedings of the 2016 IEEE European Symposium on Security and Privacy (EuroS&P), Saarbruecken, Germany.","DOI":"10.1109\/EuroSP.2016.36"},{"key":"ref_147","unstructured":"Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., and Fergus, R. (2013). 
Intriguing properties of neural networks. arXiv."},{"key":"ref_148","doi-asserted-by":"crossref","unstructured":"Van Hasselt, H., Guez, A., and Silver, D. (2016, January 12\u201317). Deep reinforcement learning with double q-learning. Proceedings of the AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA.","DOI":"10.1609\/aaai.v30i1.10295"},{"key":"ref_149","unstructured":"Welling, M., and Teh, Y.W. (July, January 28). Bayesian learning via stochastic gradient Langevin dynamics. Proceedings of the 28th international conference on machine learning (ICML-11), Bellevue, WA, USA."},{"key":"ref_150","unstructured":"Salman, H., Yang, G., Zhang, H., Hsieh, C.J., and Zhang, P. (2019). A convex relaxation barrier to tight robustness verification of neural networks. arXiv."},{"key":"ref_151","unstructured":"Ziebart, B.D. (2010). Modeling Purposeful Adaptive Behavior with the Principle of Maximum Causal Entropy, Carnegie Mellon University."},{"key":"ref_152","doi-asserted-by":"crossref","first-page":"356","DOI":"10.1287\/mnsc.18.7.356","article-title":"Risk-sensitive Markov decision processes","volume":"18","author":"Howard","year":"1972","journal-title":"Manag. Sci."},{"key":"ref_153","doi-asserted-by":"crossref","first-page":"43","DOI":"10.1287\/mnsc.23.1.43","article-title":"A utility criterion for Markov decision processes","volume":"23","author":"Jaquette","year":"1976","journal-title":"Manag. Sci."},{"key":"ref_154","doi-asserted-by":"crossref","first-page":"228","DOI":"10.1007\/BF01582110","article-title":"Optimal stopping, exponential utility, and linear programming","volume":"16","author":"Denardo","year":"1979","journal-title":"Math. 
Program."},{"key":"ref_155","doi-asserted-by":"crossref","first-page":"1379","DOI":"10.1016\/S0005-1098(01)00084-X","article-title":"On terminating Markov decision processes with a risk-averse objective function","volume":"37","author":"Patek","year":"2001","journal-title":"Automatica"},{"key":"ref_156","unstructured":"Osogami, T. (2012). Iterated risk measures for risk-sensitive Markov decision processes with discounted cost. arXiv."},{"key":"ref_157","doi-asserted-by":"crossref","first-page":"764","DOI":"10.2307\/1426972","article-title":"Risk-sensitive linear quadratic Gaussian control","volume":"13","author":"Whittle","year":"1981","journal-title":"Adv. Appl. Probab."},{"key":"ref_158","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1017\/S1365100502027025","article-title":"Risk sensitivity, a strangely pervasive concept","volume":"6","author":"Whittle","year":"2002","journal-title":"Macroecon. Dyn."},{"key":"ref_159","doi-asserted-by":"crossref","unstructured":"Nass, D., Belousov, B., and Peters, J. (2019, January 3\u20138). Entropic Risk Measure in Policy Search. Proceedings of the 2019 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China.","DOI":"10.1109\/IROS40897.2019.8967699"},{"key":"ref_160","doi-asserted-by":"crossref","first-page":"398","DOI":"10.1109\/9.847720","article-title":"Minimax optimal control of stochastic uncertain systems with relative entropy constraints","volume":"45","author":"Petersen","year":"2000","journal-title":"IEEE Trans. Autom. Control"},{"key":"ref_161","unstructured":"Madry, A., Makelov, A., Schmidt, L., Tsipras, D., and Vladu, A. (2017). Towards deep learning models resistant to adversarial attacks. arXiv."},{"key":"ref_162","doi-asserted-by":"crossref","unstructured":"Papernot, N., McDaniel, P., Goodfellow, I., Jha, S., Celik, Z.B., and Swami, A. (2017, January 2\u20136). Practical black-box attacks against machine learning. 
Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, Abu Dhabi, United Arab Emirates.","DOI":"10.1145\/3052973.3053009"}],"container-title":["Machine Learning and Knowledge Extraction"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-4990\/4\/1\/13\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T22:39:33Z","timestamp":1760135973000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-4990\/4\/1\/13"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,3,19]]},"references-count":162,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2022,3]]}},"alternative-id":["make4010013"],"URL":"https:\/\/doi.org\/10.3390\/make4010013","relation":{},"ISSN":["2504-4990"],"issn-type":[{"value":"2504-4990","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,3,19]]}}}