{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,3]],"date-time":"2026-05-03T03:05:42Z","timestamp":1777777542948,"version":"3.51.4"},"reference-count":224,"publisher":"Emerald","issue":"1-2","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,1,1]]},"abstract":"<jats:p>This monograph provides an exposition of recently developed reinforcement learning-based techniques for decision and control in human-engineered cognitive systems. The developed methods learn the solution to optimal control, zero-sum, non zero-sum, and graphical game problems completely online by using measured data along the system trajectories and have proved stability, optimality, and robustness. It is true that games have been shown to be important in robust control for disturbance rejection, and in coordinating activities among multiple agents in networked teams. We also consider cases with intermittent (an analogous to triggered control) instead of continuous learning and apply those techniques for optimal regulation and optimal tracking. We also introduce a bounded rational model to quantify the cognitive skills of a reinforcement learning agent. In order to do that, we leverage ideas from behavioral psychology to formulate differential games where the interacting learning agents have different intelligence skills, and we introduce an iterative method of optimal responses that determine the policy of an agent in adversarial environments. 
Finally, we present applications of reinforcement learning to motion planning and collaborative target tracking of bounded rational unmanned aerial vehicles.<\/jats:p>","DOI":"10.1561\/2600000022","type":"journal-article","created":{"date-parts":[[2020,11,12]],"date-time":"2020-11-12T10:11:13Z","timestamp":1605175873000},"page":"1-175","source":"Crossref","is-referenced-by-count":20,"title":["Synchronous Reinforcement Learning-Based Control for Cognitive Autonomy"],"prefix":"10.1108","volume":"8","author":[{"given":"Kyriakos G.","family":"Vamvoudakis","sequence":"first","affiliation":[{"name":"Georgia Institute of Technology,","place":["USA"]}]},{"given":"Nick-Marios T.","family":"Kokolakis","sequence":"additional","affiliation":[{"name":"Georgia Institute of Technology,","place":["USA"]}]}],"member":"140","published-online":{"date-parts":[[2021,1,1]]},"reference":[{"issue":"12","key":"2026033114240019200_ref001","doi-asserted-by":"crossref","first-page":"3038","DOI":"10.1016\/j.automatica.2014.10.047","article-title":"Multi-agent discrete-time graphical games and reinforcement learning solutions","volume":"50","author":"Abouheaf","year":"2014","journal-title":"Automatica"},{"issue":"12","key":"2026033114240019200_ref002","doi-asserted-by":"crossref","first-page":"1222","DOI":"10.1109\/TAC.1985.1103886","article-title":"Analytical solution for an open-loop Stackelberg game","volume":"30","author":"H.","year":"1985","journal-title":"IEEE Transactions on Automatic Control"},{"issue":"5","key":"2026033114240019200_ref003","doi-asserted-by":"crossref","first-page":"779","DOI":"10.1016\/j.automatica.2004.11.034","article-title":"Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB 
approach","volume":"41","author":"Abu-Khalaf","year":"2005","journal-title":"Automatica"},{"key":"2026033114240019200_ref004","first-page":"105","author":"Abu-Khalaf","year":"2006"},{"key":"2026033114240019200_ref005","first-page":"1252","author":"Abuzainab","year":"2016"},{"key":"2026033114240019200_ref006","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511760778","volume-title":"Network Security: A Decision and Game-Theoretic Approach","author":"Alpcan","year":"2010"},{"issue":"4","key":"2026033114240019200_ref007","doi-asserted-by":"crossref","first-page":"943","DOI":"10.1109\/TSMCB.2008.926614","article-title":"Discrete-time nonlinear HJB solution using approximate dynamic programming: Convergence proof","volume":"38","author":"Al-Tamimi","year":"2008","journal-title":"IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics)"},{"issue":"4","key":"2026033114240019200_ref008","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1109\/37.88585","article-title":"An introduction to autonomous control systems","volume":"11","author":"Antsaklis","year":"1991","journal-title":"IEEE Control Systems Magazine"},{"key":"2026033114240019200_ref009","doi-asserted-by":"crossref","DOI":"10.1201\/9780429496639","volume-title":"The Economy as an Evolving Complex System II","author":"Arthur","year":"2018"},{"key":"2026033114240019200_ref010","volume-title":"Distributed Austonomous Robotic Systems 2","author":"Asama","year":"2013"},{"issue":"3","key":"2026033114240019200_ref011","doi-asserted-by":"crossref","first-page":"443","DOI":"10.1007\/BF00934911","article-title":"Stackelberg strategies in linear-quadratic stochastic differential games","volume":"35","author":"Bagchi","year":"1981","journal-title":"Journal of Optimization Theory and Applications"},{"key":"2026033114240019200_ref012","volume-title":"Reinforcement Learning: An Introduction","author":"Barto","year":"1998"},{"key":"2026033114240019200_ref013","volume-title":"Dynamic Noncooperative Game 
Theory","author":"Ba\u015far","year":"1999"},{"key":"2026033114240019200_ref014","doi-asserted-by":"crossref","DOI":"10.1007\/978-0-8176-4757-5","volume-title":"H-Infinity Optimal Control and Related Minimax Design Problems: A Dynamic Game Approach","author":"Ba\u015far","year":"2008"},{"issue":"12","key":"2026033114240019200_ref015","doi-asserted-by":"crossref","first-page":"1203","DOI":"10.1016\/j.sysconle.2012.08.014","article-title":"Error bounds for constant step-size Q-learning","volume":"61","author":"Beck","year":"2012","journal-title":"Systems & Control Letters"},{"issue":"5","key":"2026033114240019200_ref016","doi-asserted-by":"crossref","first-page":"753","DOI":"10.1002\/acs.2862","article-title":"Model-based vs data-driven adaptive control: An overview","volume":"32","author":"Benosman","year":"2018","journal-title":"International Journal of Adaptive Control and Signal Processing"},{"key":"2026033114240019200_ref017","volume-title":"Reinforcement Learning and Optimal Control","author":"Bertsekas","year":"2019"},{"key":"2026033114240019200_ref018","first-page":"560","author":"Bertsekas","year":"1995"},{"key":"2026033114240019200_ref019","volume-title":"Neuro-Dynamic Program-ming","author":"Bertsekas","year":"1996"},{"issue":"1","key":"2026033114240019200_ref020","doi-asserted-by":"crossref","first-page":"89","DOI":"10.1016\/j.automatica.2012.09.019","article-title":"A novel actor-critic-identifier architecture for approximate optimal control of uncertain nonlinear systems","volume":"49","author":"Bhasin","year":"2013","journal-title":"Automatica"},{"key":"2026033114240019200_ref021","volume-title":"Game Theory and National Security","author":"Brams","year":"1988"},{"key":"2026033114240019200_ref022","doi-asserted-by":"crossref","DOI":"10.1201\/9781315137667","volume-title":"Applied Optimal Control: Optimization, Estima-tion and 
Control","author":"Bryson","year":"2018"},{"issue":"2","key":"2026033114240019200_ref023","doi-asserted-by":"crossref","first-page":"156","DOI":"10.1109\/TSMCC.2007.913919","article-title":"A comprehensive survey of multiagent reinforcement learning","volume":"38","author":"Busoniu","year":"2008","journal-title":"IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews)"},{"key":"2026033114240019200_ref024","volume-title":"Reinforcement Learning and Dynamic Programming Using Function Approximators","author":"Busoniu","year":"2010"},{"key":"2026033114240019200_ref025","volume-title":"Behavioral Game Theory: Experiments in Strategic Interaction","author":"Camerer","year":"2011"},{"issue":"3","key":"2026033114240019200_ref026","doi-asserted-by":"crossref","first-page":"861","DOI":"10.1162\/0033553041502225","article-title":"A cognitive hierarchy model of games","volume":"119","author":"Camerer","year":"2004","journal-title":"The Quarterly Journal of Economics"},{"issue":"7","key":"2026033114240019200_ref027","doi-asserted-by":"crossref","first-page":"1122","DOI":"10.1093\/nsr\/nwaa046","article-title":"Merging game theory and control theory in the era of AI and autonomy","volume":"7","author":"Cao","year":"2020","journal-title":"National Science Review"},{"key":"2026033114240019200_ref028","doi-asserted-by":"crossref","DOI":"10.1007\/978-0-387-69082-7","volume-title":"Stochastic Learning and Optimization","author":"Cao","year":"2007"},{"issue":"6","key":"2026033114240019200_ref1128","doi-asserted-by":"crossref","first-page":"791","DOI":"10.1109\/TAC.1972.1100179","article-title":"Stackelburg solution for two-person games with biased information patterns","volume":"17","author":"Chen","year":"1972","journal-title":"IEEE Transactions on Automatic Control"},{"issue":"6","key":"2026033114240019200_ref029","doi-asserted-by":"crossref","first-page":"1317","DOI":"10.1109\/TCSI.2007.895383","article-title":"Pinning complex networks by a single 
controller","volume":"54","author":"Chen","year":"2007","journal-title":"IEEE Transactions on Circuits and Systems I: Regular Papers"},{"key":"2026033114240019200_ref030","author":"Chong","year":"2016"},{"key":"2026033114240019200_ref031","first-page":"3674","author":"Chowdhary","year":"2010"},{"key":"2026033114240019200_ref032","author":"Crawford","year":"2007"},{"issue":"1","key":"2026033114240019200_ref033","doi-asserted-by":"crossref","first-page":"195","DOI":"10.1137\/070699652","article-title":"The complexity of computing a Nash equilibrium","volume":"39","author":"Daskalakis","year":"2009","journal-title":"SIAM Journal on Computing"},{"key":"2026033114240019200_ref034","first-page":"10","author":"Devraj","year":"2019"},{"key":"2026033114240019200_ref035","first-page":"3048","author":"Dierks","year":"2010"},{"issue":"1","key":"2026033114240019200_ref036","doi-asserted-by":"crossref","first-page":"581","DOI":"10.1109\/TPWRS.2016.2537984","article-title":"An event-triggered approach for load frequency control with supplementary ADP","volume":"32","author":"Dong","year":"2016","journal-title":"IEEE Transactions on Power Systems"},{"issue":"8","key":"2026033114240019200_ref037","doi-asserted-by":"crossref","first-page":"1941","DOI":"10.1109\/TNNLS.2016.2586303","article-title":"Event-triggered adaptive dynamic programming for continuous-time systems with control constraints","volume":"28","author":"Dong","year":"2016","journal-title":"IEEE Transactions on Neural Networks and Learning Systems"},{"key":"2026033114240019200_ref038","author":"Donkers","year":"2012"},{"issue":"1","key":"2026033114240019200_ref039","doi-asserted-by":"crossref","first-page":"219","DOI":"10.1162\/089976600300015961","article-title":"Reinforcement learning in continuous time and space","volume":"12","author":"Doya","year":"2000","journal-title":"Neural Computation"},{"key":"2026033114240019200_ref040","volume-title":"LQ Dynamic Optimization and Differential 
Games","author":"Engwerda","year":"2005"},{"issue":"4","key":"2026033114240019200_ref041","first-page":"848","article-title":"Predicting how people play games: Reinforcement learning in experimental games with unique, mixed strategy equilibria","volume":"88","author":"Erev","year":"1998","journal-title":"American Economic Review"},{"key":"2026033114240019200_ref042","volume-title":"The Logical Approach to Chess","author":"Euwe","year":"1982"},{"issue":"9","key":"2026033114240019200_ref043","doi-asserted-by":"crossref","first-page":"1465","DOI":"10.1109\/TAC.2004.834433","article-title":"Information flow and cooperative control of vehicle formations","volume":"49","author":"Fax","year":"2004","journal-title":"IEEE Transactions on Automatic Control"},{"issue":"4\u20135","key":"2026033114240019200_ref044","doi-asserted-by":"crossref","first-page":"301","DOI":"10.1016\/0191-2615(84)90013-4","article-title":"Game theory and transportation systems modelling","volume":"18","author":"Fisk","year":"1984","journal-title":"Transportation Research Part B: Methodological"},{"issue":"3","key":"2026033114240019200_ref045","doi-asserted-by":"crossref","first-page":"683","DOI":"10.1137\/S0363012998332433","article-title":"A max-plus-based algorithm for a Hamilton\u2013Jacobi\u2013Bellman equation of nonlinear filtering","volume":"38","author":"Fleming","year":"2000","journal-title":"SIAM Journal on Control and Optimization"},{"issue":"2","key":"2026033114240019200_ref046","doi-asserted-by":"crossref","first-page":"264","DOI":"10.1109\/9.481532","article-title":"On global existence of solutions to coupled matrix Riccati equations in closed-loop Nash games","volume":"41","author":"Freiling","year":"1996","journal-title":"IEEE Transactions on Automatic Control"},{"issue":"3","key":"2026033114240019200_ref047","doi-asserted-by":"crossref","first-page":"515","DOI":"10.1023\/A:1017532210579","article-title":"Existence and uniqueness of open-loop Stackelberg equilibria in linear-quadratic 
differential games","volume":"110","author":"Freiling","year":"2001","journal-title":"Journal of Optimization Theory and Applications"},{"key":"2026033114240019200_ref048","first-page":"963","author":"Freiling","year":"2003"},{"key":"2026033114240019200_ref049","volume-title":"The Theory of Learning in Games","author":"Fudenberg","year":"1998"},{"key":"2026033114240019200_ref050","author":"Gajic","year":"1988"},{"key":"2026033114240019200_ref051","author":"Gao","year":"2016"},{"issue":"2","key":"2026033114240019200_ref052","doi-asserted-by":"crossref","first-page":"422","DOI":"10.1109\/TAC.2012.2211411","article-title":"Model-based event-triggered control for systems with quantization and time-varying network delays","volume":"58","author":"Garcia","year":"2013","journal-title":"IEEE Transactions on Automatic Control"},{"key":"2026033114240019200_ref053","first-page":"2429","author":"Goretkin"},{"issue":"6","key":"2026033114240019200_ref054","doi-asserted-by":"crossref","first-page":"1291","DOI":"10.1109\/TSMCC.2012.2218595","article-title":"A sur-vey of actor-critic reinforcement learning: Standard and natural policy gradients","volume":"42","author":"Grondman","year":"2012","journal-title":"IEEE Transactions on Systems, Man, and Cyber-netics, Part C (Applications and Reviews)"},{"key":"2026033114240019200_ref055","doi-asserted-by":"crossref","DOI":"10.1515\/9781400865246","volume-title":"Impulsive and Hybrid Dynamical Systems: Stability, Dissipativity, and Control","author":"Haddad","year":"2006"},{"key":"2026033114240019200_ref056","volume-title":"Neural Networks and Learning Machines","author":"Haykin","year":"2009"},{"issue":"4","key":"2026033114240019200_ref057","doi-asserted-by":"crossref","first-page":"385","DOI":"10.1007\/s11518-007-5058-2","article-title":"A survey of Stackelberg differential game models in supply and marketing channels","volume":"16","author":"He","year":"2007","journal-title":"Journal of Systems Science and Systems 
Engineering"},{"issue":"13","key":"2026033114240019200_ref058","doi-asserted-by":"crossref","first-page":"3429","DOI":"10.1109\/TSP.2016.2548987","article-title":"Faster learning and adaptation in security games by exploiting information asymmetry","volume":"64","author":"He","year":"2016","journal-title":"IEEE Transactions on Signal Processing"},{"issue":"4","key":"2026033114240019200_ref059","doi-asserted-by":"crossref","first-page":"571","DOI":"10.1080\/00207170701506919","article-title":"Analysis of event-driven controllers for linear systems","volume":"81","author":"Heemels","year":"2008","journal-title":"International Journal of Control"},{"key":"2026033114240019200_ref060","volume-title":"Noncooperative Game Theory: An Introduction for Engineers and Computer Scientists","author":"Hespanha","year":"2017"},{"issue":"11","key":"2026033114240019200_ref061","doi-asserted-by":"crossref","first-page":"2735","DOI":"10.1016\/j.automatica.2008.03.021","article-title":"Lyapunov conditions for input-to-state stability of impulsive systems","volume":"44","author":"Hespanha","year":"2008","journal-title":"Automatica"},{"key":"2026033114240019200_ref062","author":"Ho","year":"2010"},{"issue":"5","key":"2026033114240019200_ref063","doi-asserted-by":"crossref","first-page":"359","DOI":"10.1016\/0893-6080(89)90020-8","article-title":"Multilayer feedfor-ward networks are universal approximators","volume":"2","author":"Hornik","year":"1989","journal-title":"Neural Networks"},{"issue":"5","key":"2026033114240019200_ref064","doi-asserted-by":"crossref","first-page":"551","DOI":"10.1016\/0893-6080(90)90005-6","article-title":"Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks","volume":"3","author":"Hornik","year":"1990","journal-title":"Neural Networks"},{"key":"2026033114240019200_ref065","doi-asserted-by":"crossref","DOI":"10.1137\/1.9780898719376","volume-title":"L1 Adaptive Control Theory: Guar-anteed Robustness with Fast 
Adaptation","author":"Hovakimyan","year":"2010"},{"issue":"6","key":"2026033114240019200_ref066","doi-asserted-by":"crossref","first-page":"1083","DOI":"10.1016\/0005-1098(92)90053-I","article-title":"Neural networks for control systems\u2014A survey","volume":"28","author":"Hunt","year":"1992","journal-title":"Automatica"},{"key":"2026033114240019200_ref067","author":"Ioannou","year":"2006"},{"key":"2026033114240019200_ref068","volume-title":"Robust Adaptive Control","author":"Ioannou","year":"2012"},{"issue":"6","key":"2026033114240019200_ref069","doi-asserted-by":"crossref","first-page":"988","DOI":"10.1109\/TAC.2003.812781","article-title":"Coordination of groups of mobile autonomous agents using nearest neighbor rules","volume":"48","author":"Jadbabaie","year":"2003","journal-title":"IEEE Transactions on Automatic Control"},{"issue":"10","key":"2026033114240019200_ref070","doi-asserted-by":"crossref","first-page":"2699","DOI":"10.1016\/j.automatica.2012.06.096","article-title":"Computational adaptive optimal control for continuous-time linear systems with completely unknown dynamics","volume":"48","author":"Jiang","year":"2012","journal-title":"Automatica"},{"issue":"5","key":"2026033114240019200_ref071","doi-asserted-by":"crossref","first-page":"417","DOI":"10.1016\/j.ejcon.2013.05.017","article-title":"Robust adaptive dynamic program-ming for linear and nonlinear systems: An overview","volume":"19","author":"Jiang","year":"2013","journal-title":"European Journal of Control"},{"key":"2026033114240019200_ref072","first-page":"6686","author":"Johnson","year":"2010"},{"issue":"8","key":"2026033114240019200_ref073","doi-asserted-by":"crossref","first-page":"1645","DOI":"10.1109\/TNNLS.2014.2350835","article-title":"Approximate N-player nonzero-sum game solution for an uncertain continuous nonlinear system","volume":"26","author":"Johnson","year":"2015","journal-title":"IEEE Transactions on Neural Networks and Learning 
Systems"},{"issue":"1","key":"2026033114240019200_ref074","doi-asserted-by":"crossref","first-page":"237","DOI":"10.1613\/jair.301","article-title":"Reinforce-ment learning: A survey","volume":"4","author":"Kaelbling","year":"1996","journal-title":"Journal of Artificial Intelligence Research"},{"key":"2026033114240019200_ref075","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-319-78384-0","volume-title":"Reinforcement Learning for Optimal Feedback Control","author":"Kamalapurkar","year":"2018"},{"key":"2026033114240019200_ref076","author":"Kanellopoulos","year":"2019"},{"key":"2026033114240019200_ref077","author":"Kearns","year":"2007"},{"issue":"4","key":"2026033114240019200_ref078","doi-asserted-by":"crossref","first-page":"1167","DOI":"10.1016\/j.automatica.2014.02.015","article-title":"Reinforcement Q-learning for optimal tracking control of linear discrete-time systems with unknown dynamics","volume":"50","author":"Kiumarsi","year":"2014","journal-title":"Automatica"},{"key":"2026033114240019200_ref079","author":"Kiumarsi","year":"2017"},{"issue":"1","key":"2026033114240019200_ref080","doi-asserted-by":"crossref","first-page":"114","DOI":"10.1109\/TAC.1968.1098829","article-title":"On an iterative technique for Riccati equation computations","volume":"13","author":"Kleinman","year":"1968","journal-title":"IEEE Transactions on Automatic Control"},{"key":"2026033114240019200_ref081","first-page":"473","author":"Kokolakis","year":"2018"},{"key":"2026033114240019200_ref082","first-page":"2508","author":"Kokolakis","year":"2020"},{"issue":"12","key":"2026033114240019200_ref083","doi-asserted-by":"crossref","first-page":"3803","DOI":"10.1109\/TNNLS.2019.2899311","article-title":"Kinodynamic motion planning with continuous-time Q-learning: An online, model-free, and safe navigation framework","volume":"30","author":"Kontoudis","year":"2019","journal-title":"IEEE Transactions on Neural Networks and Learning 
Systems"},{"key":"2026033114240019200_ref084","volume-title":"Nonlinear and Adaptive Control Design. Adaptive and Learning Systems for Signal Processing, Communication and Control","author":"Krsti\u0107","year":"1995"},{"key":"2026033114240019200_ref085","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.arcontrol.2017.04.001","article-title":"Systems and control for the future of humanity, research agenda: Current and future roles, impact and grand challenges","volume":"43","author":"Lamnabhi-Lagarrigue","year":"2017","journal-title":"Annual Reviews in Control"},{"issue":"6","key":"2026033114240019200_ref086","doi-asserted-by":"crossref","first-page":"912","DOI":"10.1109\/70.736775","article-title":"Optimal motion planning for multiple robots having independent goals","volume":"14","author":"LaValle","year":"1998","journal-title":"IEEE Transactions on Robotics and Automation"},{"issue":"11","key":"2026033114240019200_ref087","doi-asserted-by":"crossref","first-page":"2850","DOI":"10.1016\/j.automatica.2012.06.008","article-title":"Integral Q-learning and explorized policy iteration for adaptive optimal control of continuous-time linear systems","volume":"48","author":"Lee","year":"2012","journal-title":"Automatica"},{"key":"2026033114240019200_ref088","first-page":"293","author":"Lemmon","year":"2010"},{"issue":"1","key":"2026033114240019200_ref089","doi-asserted-by":"crossref","first-page":"14","DOI":"10.1109\/TSMCB.2010.2043839","article-title":"Reinforcement learning for partially observable dynamic processes: Adaptive dynamic programming using measured output data","volume":"41","author":"Lewis","year":"2010","journal-title":"IEEE Transactions on Systems, Man, and Cybernetics, Part B 
(Cybernetics)"},{"key":"2026033114240019200_ref090","author":"Lewis","year":"2013"},{"key":"2026033114240019200_ref091","author":"Lewis","year":"1998"},{"key":"2026033114240019200_ref092","author":"Lewis","year":"2012"},{"issue":"6","key":"2026033114240019200_ref093","doi-asserted-by":"crossref","first-page":"76","DOI":"10.1109\/MCS.2012.2214134","article-title":"Reinforcement learning and feedback control: Using natural decision methods to design optimal adaptive controllers","volume":"32","author":"Lewis","year":"2012","journal-title":"Control Systems, IEEE"},{"key":"2026033114240019200_ref094","doi-asserted-by":"crossref","DOI":"10.1007\/978-1-4471-5574-4","volume-title":"Cooperative Control of Multi-Agent Systems-Optimal and Adaptive Design Approaches","author":"Lewis-Movric","year":"2014"},{"issue":"5","key":"2026033114240019200_ref095","doi-asserted-by":"crossref","first-page":"1308","DOI":"10.1109\/TNNLS.2018.2861945","article-title":"Off-policy interleaved Q-learning: Optimal control for affine nonlinear discrete-time systems","volume":"30","author":"Li","year":"2018","journal-title":"IEEE Transactions on Neural Networks and Learning Systems"},{"key":"2026033114240019200_ref096","first-page":"727","author":"Li","year":"2016"},{"key":"2026033114240019200_ref097","author":"Li","year":"1997"},{"issue":"3","key":"2026033114240019200_ref098","doi-asserted-by":"crossref","first-page":"683","DOI":"10.1109\/TSMCB.2009.2034976","article-title":"Studying bio-inspired coalition formation of robots for detecting intrusions using game theory","volume":"40","author":"Liang","year":"2009","journal-title":"IEEE Trans-actions on Systems, Man, and Cybernetics, Part B (Cybernetics)"},{"issue":"1","key":"2026033114240019200_ref099","doi-asserted-by":"crossref","first-page":"55","DOI":"10.1016\/S1389-0417(01)00015-8","article-title":"Value-function reinforcement learning in Markov games","volume":"2","author":"Littman","year":"2001","journal-title":"Cognitive Systems 
Research"},{"key":"2026033114240019200_ref100","author":"Liu","year":"2013"},{"issue":"8","key":"2026033114240019200_ref101","doi-asserted-by":"crossref","first-page":"1015","DOI":"10.1109\/TSMC.2013.2295351","article-title":"Online synchronous approximate optimal learning algorithm for multi-player non-zero-sum games with unknown dynamics","volume":"44","author":"Liu","year":"2014","journal-title":"IEEE Transactions on Systems, Man, and Cybernetics: Systems"},{"key":"2026033114240019200_ref102","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-319-50815-3","volume-title":"Adaptive Dynamic Programming with Application in Optimal Control","author":"Liu","year":"2017"},{"issue":"1","key":"2026033114240019200_ref103","doi-asserted-by":"crossref","first-page":"65","DOI":"10.1109\/TCYB.2014.2319577","article-title":"Off-policy reinforcement learning for H\u221e control design","volume":"45","author":"Luo","year":"2014","journal-title":"IEEE Transactions on Cybernetics"},{"key":"2026033114240019200_ref104","first-page":"541","author":"Lyshevski","year":"1996"},{"key":"2026033114240019200_ref105","author":"Lyshevski","year":"1998"},{"issue":"1","key":"2026033114240019200_ref106","doi-asserted-by":"crossref","first-page":"67","DOI":"10.1109\/TCST.2011.2174059","article-title":"Decentralized charging control of large populations of plug-in electric vehicles","volume":"21","author":"Ma","year":"2011","journal-title":"IEEE Transactions on Control Systems Technology"},{"key":"2026033114240019200_ref107","author":"MacKenzie","year":"2001"},{"issue":"7","key":"2026033114240019200_ref108","doi-asserted-by":"crossref","first-page":"417","DOI":"10.1016\/j.artint.2007.01.003","article-title":"Multi-agent learning for engineers","volume":"171","author":"Mannor","year":"2007","journal-title":"Artificial 
Intelligence"},{"key":"2026033114240019200_ref109","author":"Marden","year":"2015"},{"key":"2026033114240019200_ref110","author":"Marden","year":"2018"},{"key":"2026033114240019200_ref111","author":"Marden","year":"2018"},{"key":"2026033114240019200_ref112","volume-title":"Max-Plus Methods for Nonlinear Control and Estimation","author":"McEneaney","year":"2006"},{"issue":"1","key":"2026033114240019200_ref113","doi-asserted-by":"crossref","first-page":"6","DOI":"10.1006\/game.1995.1023","article-title":"Quantal response equilibria for normal form games","volume":"10","author":"McKelvey","year":"1995","journal-title":"Games and Economic Behavior"},{"key":"2026033114240019200_ref114","first-page":"3598","author":"Mehta","year":"2009"},{"key":"2026033114240019200_ref115","first-page":"664","author":"Melo","year":"2008"},{"issue":"7","key":"2026033114240019200_ref116","doi-asserted-by":"crossref","first-page":"1780","DOI":"10.1016\/j.automatica.2014.05.011","article-title":"Optimal tracking control of non-linear partially-unknown constrained-input systems using integral reinforcement learning","volume":"50","author":"Modares","year":"2014","journal-title":"Automatica"},{"issue":"10","key":"2026033114240019200_ref117","doi-asserted-by":"crossref","first-page":"1513","DOI":"10.1109\/TNNLS.2013.2276571","article-title":"Adaptive optimal control of unknown constrained-input systems using pol-icy iteration and neural Networks","volume":"24","author":"Modares","year":"2013","journal-title":"IEEE Transactions on Neural Networks and Learning Systems"},{"issue":"1","key":"2026033114240019200_ref118","doi-asserted-by":"crossref","first-page":"193","DOI":"10.1016\/j.automatica.2013.09.043","article-title":"Integral reinforcement learning and experience replay for adaptive optimal control of partially-unknown constrained-input continuous-time 
systems","volume":"50","author":"Modares","year":"2014","journal-title":"Automatica"},{"issue":"10","key":"2026033114240019200_ref119","doi-asserted-by":"crossref","first-page":"2550","DOI":"10.1109\/TNNLS.2015.2441749","article-title":"H\u221e tracking control of completely unknown continuous-time systems via off-policy rein-forcement learning","volume":"26","author":"Modares","year":"2015","journal-title":"IEEE Transactions on Neural Networks and Learning Systems"},{"issue":"2","key":"2026033114240019200_ref120","doi-asserted-by":"crossref","first-page":"470","DOI":"10.1109\/TAC.2012.2206719","article-title":"On the optimality of certainty equivalence for event-triggered control systems","volume":"58","author":"Molin","year":"2013","journal-title":"IEEE Transactions on Automatic Control"},{"key":"2026033114240019200_ref121","volume-title":"Theory of Games and Economic Behavior","author":"Morgenstern","year":"1953"},{"key":"2026033114240019200_ref122","author":"Mu","year":"2020"},{"key":"2026033114240019200_ref123","author":"Myerson","year":"2013"},{"issue":"2","key":"2026033114240019200_ref124","doi-asserted-by":"crossref","first-page":"286","DOI":"10.2307\/1969529","article-title":"Non-cooperative games","volume":"54","author":"Nash","journal-title":"Annals of Mathematics"},{"issue":"9","key":"2026033114240019200_ref125","doi-asserted-by":"crossref","first-page":"1520","DOI":"10.1109\/TAC.2004.834113","article-title":"Consensus problems in networks of agents with switching topology and time-delays","volume":"49","author":"Olfati-Saber","year":"2004","journal-title":"IEEE Transactions on Automatic Control"},{"issue":"1","key":"2026033114240019200_ref126","doi-asserted-by":"crossref","first-page":"215","DOI":"10.1109\/JPROC.2006.887293","article-title":"Consensus and cooperation in networked multi-agent systems","volume":"95","author":"Olfati-Saber","year":"2007","journal-title":"Proceedings of the 
IEEE"},{"key":"2026033114240019200_ref127","first-page":"6123","author":"Paccagnan","year":"2016"},{"key":"2026033114240019200_ref128","first-page":"196","author":"Paccagnan","year":"2016"},{"issue":"4","key":"2026033114240019200_ref129","doi-asserted-by":"crossref","first-page":"1373","DOI":"10.1109\/TAC.2018.2849946","article-title":"Nash and Wardrop equilibria in aggregative games with coupling constraints","volume":"64","author":"Paccagnan","year":"2019","journal-title":"IEEE Transactions on Automatic Control"},{"issue":"2","key":"2026033114240019200_ref130","doi-asserted-by":"crossref","first-page":"165","DOI":"10.1109\/TCYB.2014.2322116","article-title":"Continuous-time Q-learning for infinite-horizon discounted cost linear quadratic regulator problems","volume":"45","author":"Palanisamy","year":"2014","journal-title":"IEEE Transactions on Cybernetics"},{"issue":"5","key":"2026033114240019200_ref131","doi-asserted-by":"crossref","first-page":"848","DOI":"10.1109\/TAC.2006.875009","article-title":"A noncooperative game approach to OSNR optimization in optical networks","volume":"51","author":"Pavel","year":"2006","journal-title":"IEEE Transactions on Automatic Control"},{"key":"2026033114240019200_ref132","volume-title":"Lectures on Conditioned Reflexes: Twenty-Five Years of Objective Study of the Higher Nervous Activity (Behaviour) of Animals","author":"Pavlov","year":"1928"},{"key":"2026033114240019200_ref133","first-page":"2537","author":"Perez","year":"2012"},{"key":"2026033114240019200_ref134","first-page":"125","author":"Pita","year":"2008"},{"key":"2026033114240019200_ref135","author":"Polycarpou","year":"2003"},{"issue":"2","key":"2026033114240019200_ref136","doi-asserted-by":"crossref","first-page":"228","DOI":"10.1002\/acs.2866","article-title":"Hybrid online learning control in networked multiagent systems: A survey","volume":"33","author":"Poveda","year":"2019","journal-title":"Inter-national Journal of Adaptive Control and Signal 
Processing"},{"key":"2026033114240019200_ref137","volume-title":"Cooperative Control of Dynamical Systems: Applications to Autonomous Vehicles","author":"Qu","year":"2009"},{"issue":"39","key":"2026033114240019200_ref138","first-page":"147","article-title":"Effective computability of winning strategies","volume":"3","author":"Rabin","year":"1957","journal-title":"Contributions to the Theory of Games"},{"key":"2026033114240019200_ref139","volume-title":"Using Game Theory for Distributed Control Engineering","author":"Rantzer","year":"2008"},{"key":"2026033114240019200_ref140","doi-asserted-by":"crossref","first-page":"253","DOI":"10.1146\/annurev-control-053018-023825","article-title":"A tour of reinforcement learning: The view from continuous control","volume":"2","author":"Recht","year":"2019","journal-title":"Annual Review of Control, Robotics, and Autonomous Systems"},{"issue":"5","key":"2026033114240019200_ref141","doi-asserted-by":"crossref","first-page":"655","DOI":"10.1109\/TAC.2005.846556","article-title":"Consensus seeking in multiagent systems under dynamically changing interaction topologies","volume":"50","author":"Ren","year":"2005","journal-title":"IEEE Transactions on Automatic Control"},{"key":"2026033114240019200_ref142","doi-asserted-by":"crossref","DOI":"10.1007\/978-1-84800-015-5","volume-title":"Distributed Consensus in Multi-Vehicle Cooperative Control","author":"Ren","year":"2008"},{"key":"2026033114240019200_ref143","author":"Ren","year":"2005"},{"issue":"1","key":"2026033114240019200_ref144","doi-asserted-by":"crossref","first-page":"164","DOI":"10.1016\/S0899-8256(05)80020-X","article-title":"Learning in extensive-form games: Experimental data and simple dynamic models in the intermediate term","volume":"8","author":"Roth","year":"1995","journal-title":"Games and Economic Behavior"},{"key":"2026033114240019200_ref145","first-page":"1","author":"Roy","year":"2010"},{"key":"2026033114240019200_ref146","volume-title":"Principles of 
Mathematical Analysis","author":"Rudin","year":"1964"},{"issue":"5","key":"2026033114240019200_ref147","doi-asserted-by":"crossref","first-page":"86","DOI":"10.1109\/MSP.2012.2186410","article-title":"Game-theoretic methods for the smart grid: An overview of microgrid systems, demand-side management, and smart grid communications","volume":"29","author":"Saad","year":"2012","journal-title":"IEEE Signal Processing Magazine"},{"issue":"3","key":"2026033114240019200_ref148","doi-asserted-by":"crossref","first-page":"749","DOI":"10.1109\/LCSYS.2020.2979572","article-title":"On\u2013off adversarially robust Q-learning","volume":"4","author":"Sahoo","year":"2020","journal-title":"IEEE Control Systems Letters"},{"key":"2026033114240019200_ref149","author":"Saleheen","year":"2019"},{"issue":"6","key":"2026033114240019200_ref150","doi-asserted-by":"crossref","first-page":"770","DOI":"10.1109\/9.256331","article-title":"L2-gain analysis of nonlinear systems and nonlinear state-feedback H\u221e control","volume":"37","author":"Schaft van der","year":"1992","journal-title":"IEEE Transactions on Automatic Control"},{"issue":"1","key":"2026033114240019200_ref151","doi-asserted-by":"crossref","first-page":"123","DOI":"10.1006\/game.2000.0794","article-title":"Zermelo and the early history of game theory","volume":"34","author":"Schwalbe","year":"2001","journal-title":"Games and Economic Behavior"},{"key":"2026033114240019200_ref152","author":"Semsar-Kazerooni","year":"2009"},{"issue":"5","key":"2026033114240019200_ref153","doi-asserted-by":"crossref","first-page":"533","DOI":"10.1007\/BF00935665","article-title":"On the Stackelberg strategy in nonzero-sum games","volume":"11","author":"Simaan","year":"1973","journal-title":"Journal of Optimization Theory and Applications"},{"issue":"6","key":"2026033114240019200_ref154","doi-asserted-by":"crossref","first-page":"613","DOI":"10.1007\/BF00935561","article-title":"Additional aspects of the Stackelberg strategy in nonzero-sum 
games","volume":"11","author":"Simaan","year":"1973","journal-title":"Journal of Optimization Theory and Applications"},{"key":"2026033114240019200_ref155","volume-title":"Models of Bounded Rationality, Volume 1: Economic Analysis and Public Policy","author":"Simon","year":"1984"},{"key":"2026033114240019200_ref156","author":"Solowjow","year":"2020"},{"key":"2026033114240019200_ref157","first-page":"339","author":"Sontag","year":"1993","journal-title":"Essays on Control: Perspectives in the Theory and Its Applications"},{"key":"2026033114240019200_ref158","article-title":"Mathematical theory of neural networks","author":"Sontag","year":"1997","journal-title":"Tech. rep."},{"key":"2026033114240019200_ref159","doi-asserted-by":"crossref","first-page":"108","DOI":"10.1016\/j.jebo.2014.09.002","article-title":"Depth of reasoning and higher order beliefs","volume":"108","author":"Strzalecki","year":"2014","journal-title":"Journal of Economic Behavior & Organization"},{"key":"2026033114240019200_ref160","volume-title":"Reinforcement Learning: An Introduction","author":"Sutton","year":"2018"},{"issue":"2","key":"2026033114240019200_ref161","doi-asserted-by":"crossref","first-page":"19","DOI":"10.1109\/37.126844","article-title":"Reinforcement learning is direct adaptive optimal control","volume":"12","author":"Sutton","year":"1992","journal-title":"Control Systems, IEEE"},{"issue":"9","key":"2026033114240019200_ref162","doi-asserted-by":"crossref","first-page":"1680","DOI":"10.1109\/TAC.2007.904277","article-title":"Event-triggered real-time scheduling of stabilizing control tasks","volume":"52","author":"Tabuada","year":"2007","journal-title":"IEEE Transactions on Automatic Control"},{"key":"2026033114240019200_ref163","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511973031","volume-title":"Security and Game Theory: Algorithms, Deployed Systems, Lessons 
Learned","author":"Tambe","year":"2011"},{"key":"2026033114240019200_ref164","author":"Tao","year":"2003"},{"key":"2026033114240019200_ref165","doi-asserted-by":"crossref","DOI":"10.1007\/978-93-86279-17-0","volume-title":"Introduction to Game Theory","author":"Tijs","year":"2003"},{"key":"2026033114240019200_ref166","author":"Tsai","year":"2009"},{"key":"2026033114240019200_ref167","article-title":"Problems in Decentralized Decision Making and Computation","author":"Tsitsiklis","year":"1984","journal-title":"Ph.D. Thesis"},{"issue":"3","key":"2026033114240019200_ref168","doi-asserted-by":"crossref","first-page":"185","DOI":"10.1023\/A:1022689125041","article-title":"Asynchronous stochastic approximation and Q-learning","volume":"16","author":"Tsitsiklis","year":"1994","journal-title":"Machine Learning"},{"key":"2026033114240019200_ref169","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-319-75151-1","volume-title":"Pareto\u2013Nash\u2013Stackelberg Game and Control Theory","author":"Ungureanu","year":"2018"},{"issue":"3","key":"2026033114240019200_ref170","doi-asserted-by":"crossref","first-page":"282","DOI":"10.1109\/JAS.2014.7004686","article-title":"Event-triggered optimal adaptive control algorithm for continuous-time nonlinear systems","volume":"1","author":"Vamvoudakis","year":"2014","journal-title":"IEEE\/CAA Journal of Automatica Sinica"},{"key":"2026033114240019200_ref171","first-page":"1","author":"Vamvoudakis","year":"2014"},{"key":"2026033114240019200_ref172","author":"Vamvoudakis","year":"2015"},{"key":"2026033114240019200_ref173","author":"Vamvoudakis","year":"2017"},{"issue":"5","key":"2026033114240019200_ref174","doi-asserted-by":"crossref","first-page":"878","DOI":"10.1016\/j.automatica.2010.02.018","article-title":"Online actor\u2013critic algorithm to solve the continuous-time infinite horizon optimal control 
problem","volume":"46","author":"Vamvoudakis","year":"2010","journal-title":"Automatica"},{"issue":"8","key":"2026033114240019200_ref175","doi-asserted-by":"crossref","first-page":"1556","DOI":"10.1016\/j.automatica.2011.03.005","article-title":"Multi-player non-zero-sum games: Online adaptive learning solution of coupled Hamilton\u2013Jacobi equations","volume":"47","author":"Vamvoudakis","year":"2011","journal-title":"Automatica"},{"issue":"13","key":"2026033114240019200_ref176","doi-asserted-by":"crossref","first-page":"1460","DOI":"10.1002\/rnc.1760","article-title":"Online solution of nonlinear two-player zero-sum games using synchronous policy iteration","volume":"22","author":"Vamvoudakis","year":"2012","journal-title":"International Journal of Robust and Nonlinear Control"},{"key":"2026033114240019200_ref177","author":"Vamvoudakis","year":"2018"},{"issue":"4","key":"2026033114240019200_ref178","doi-asserted-by":"crossref","first-page":"1018","DOI":"10.1109\/TAC.2017.2734840","article-title":"Cooperative Q-learning for rejection of persistent adversarial inputs in networked linear quadratic systems","volume":"63","author":"Vamvoudakis","year":"2018","journal-title":"IEEE Transactions on Automatic Control"},{"key":"2026033114240019200_ref179","first-page":"1","author":"Vamvoudakis","year":"2019"},{"issue":"4","key":"2026033114240019200_ref180","first-page":"315","article-title":"Online learning algorithm for zero-sum games with integral reinforcement learning","volume":"1","author":"Vamvoudakis","year":"2011","journal-title":"Journal of Artificial Intelligence and Soft Computing Research"},{"issue":"8","key":"2026033114240019200_ref181","doi-asserted-by":"crossref","first-page":"1598","DOI":"10.1016\/j.automatica.2012.05.074","article-title":"Multi-agent differential graphical games: Online adaptive learning solution for synchronization with 
optimality","volume":"48","author":"Vamvoudakis","year":"2012","journal-title":"Automatica"},{"key":"2026033114240019200_ref182","author":"Vamvoudakis","year":"2014"},{"key":"2026033114240019200_ref183","first-page":"5062","author":"Vamvoudakis","year":"2015"},{"issue":"11","key":"2026033114240019200_ref184","doi-asserted-by":"crossref","first-page":"2386","DOI":"10.1109\/TNNLS.2015.2487972","article-title":"Asymptotically stable adaptive\u2013optimal control algorithm with saturating actuators and relaxed persistence of excitation","volume":"27","author":"Vamvoudakis","year":"2016","journal-title":"IEEE Transactions on Neural Networks and Learning Systems"},{"issue":"1","key":"2026033114240019200_ref185","doi-asserted-by":"crossref","first-page":"33","DOI":"10.1109\/MCS.2016.2621461","article-title":"Game theory-based control system algorithms with real-time reinforcement learning: How to solve multiplayer games online","volume":"37","author":"Vamvoudakis","year":"2017","journal-title":"IEEE Control Systems Magazine"},{"issue":"4","key":"2026033114240019200_ref186","doi-asserted-by":"crossref","first-page":"598","DOI":"10.1002\/rnc.3587","article-title":"Event-triggered optimal tracking control of nonlinear systems","volume":"27","author":"Vamvoudakis","year":"2017","journal-title":"International Journal of Robust and Nonlinear Control"},{"issue":"2","key":"2026033114240019200_ref187","doi-asserted-by":"crossref","first-page":"285","DOI":"10.1002\/acs.2831","article-title":"Open-loop Stackelberg learning solution for hierarchical control problems","volume":"33","author":"Vamvoudakis","year":"2019","journal-title":"International Journal of Adaptive Control and Signal Processing"},{"issue":"3","key":"2026033114240019200_ref188","doi-asserted-by":"crossref","first-page":"547","DOI":"10.1111\/j.1467-6419.2010.00668.x","article-title":"Non-linear dynamics, complexity and randomness: Algorithmic 
foundations","volume":"25","author":"Velupillai","year":"2011","journal-title":"Journal of Economic Surveys"},{"key":"2026033114240019200_ref189","volume-title":"Market Structure and Equilibrium","author":"Von Stackelberg","year":"2010"},{"issue":"2","key":"2026033114240019200_ref190","doi-asserted-by":"crossref","first-page":"477","DOI":"10.1016\/j.automatica.2008.08.017","article-title":"Adaptive optimal control for continuous-time linear systems based on policy iteration","volume":"45","author":"Vrabie","year":"2009","journal-title":"Automatica"},{"key":"2026033114240019200_ref191","volume-title":"Optimal Adaptive Control and Differential Games by Reinforcement Learning Principles","author":"Vrabie","year":"2013"},{"key":"2026033114240019200_ref192","volume-title":"Deterministic Learning Theory for Identification, Recognition, and Control","author":"Wang","year":"2009"},{"issue":"7","key":"2026033114240019200_ref193","doi-asserted-by":"crossref","first-page":"1358","DOI":"10.1109\/TSMC.2016.2592682","article-title":"Event-driven adaptive robust control of nonlinear systems with uncertainties through NDP strategy","volume":"47","author":"Wang","year":"2016","journal-title":"IEEE Transactions on Systems, Man, and Cybernetics: Systems"},{"key":"2026033114240019200_ref194","author":"Wang","year":"2017"},{"issue":"10","key":"2026033114240019200_ref195","doi-asserted-by":"crossref","first-page":"3417","DOI":"10.1109\/TCYB.2017.2653800","article-title":"Improving the critic learning for event-based nonlinear H\u221e control design","volume":"47","author":"Wang","year":"2017","journal-title":"IEEE Transactions on Cybernetics"},{"issue":"10","key":"2026033114240019200_ref196","doi-asserted-by":"crossref","first-page":"8177","DOI":"10.1109\/TIE.2017.2698377","article-title":"Event-driven nonlinear discounted optimal regulation involving a power system application","volume":"64","author":"Wang","year":"2017","journal-title":"IEEE Transactions on Industrial 
Electronics"},{"issue":"4","key":"2026033114240019200_ref197","doi-asserted-by":"crossref","first-page":"993","DOI":"10.1109\/TNNLS.2016.2642128","article-title":"On mixed data and event driven design for adaptive-critic-based nonlinear H\u221e control","volume":"29","author":"Wang","year":"2017","journal-title":"IEEE Transactions on Neural Networks and Learning Systems"},{"issue":"3","key":"2026033114240019200_ref198","doi-asserted-by":"crossref","first-page":"586","DOI":"10.1109\/TAC.2010.2057951","article-title":"Event-triggering in distributed networked control systems","volume":"56","author":"Wang","year":"2011","journal-title":"IEEE Transactions on Automatic Control"},{"key":"2026033114240019200_ref199","author":"Wang","year":"2002"},{"key":"2026033114240019200_ref200","volume-title":"Learning from Delayed Rewards","author":"Watkins","year":"1989"},{"key":"2026033114240019200_ref201","author":"Watkins","year":"1992"},{"key":"2026033114240019200_ref202","first-page":"5054","author":"Webb","year":"2013"},{"issue":"2","key":"2026033114240019200_ref203","doi-asserted-by":"crossref","first-page":"444","DOI":"10.1109\/TNNLS.2015.2464080","article-title":"Data-driven zero-sum neuro-optimal control for a class of continuous-time unknown nonlinear systems with disturbance using ADP","volume":"27","author":"Wei","year":"2015","journal-title":"IEEE Transactions on Neural Networks and Learning Systems"},{"key":"2026033114240019200_ref204","author":"Werbos","year":"1992"},{"key":"2026033114240019200_ref205","first-page":"109","author":"Werbos","year":"2007"},{"issue":"3","key":"2026033114240019200_ref206","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-642-27645-3","article-title":"Reinforcement learning","volume":"12","author":"Wiering","year":"2012","journal-title":"Adaptation, Learning, and Optimization"},{"issue":"12","key":"2026033114240019200_ref207","doi-asserted-by":"crossref","first-page":"1884","DOI":"10.1109\/TNNLS.2012.2217349","article-title":"Neural 
network based online simultaneous policy update algorithm for solving the HJI equation in nonlinear H\u221e control","volume":"23","author":"Wu","year":"2012","journal-title":"IEEE Transactions on Neural Networks and Learning Systems"},{"issue":"6","key":"2026033114240019200_ref208","doi-asserted-by":"crossref","first-page":"1017","DOI":"10.1016\/j.automatica.2012.03.007","article-title":"Stochastic optimal control of unknown linear networked control system in the presence of random delays and packet losses","volume":"48","author":"Xu","year":"2012","journal-title":"Automatica"},{"issue":"3","key":"2026033114240019200_ref209","doi-asserted-by":"crossref","first-page":"226","DOI":"10.1080\/23307706.2014.926624","article-title":"Optimal regulation of uncertain dynamic systems using adaptive dynamic programming","volume":"1","author":"Xu","year":"2014","journal-title":"Journal of Control and Decision"},{"issue":"6","key":"2026033114240019200_ref210","doi-asserted-by":"crossref","first-page":"2255","DOI":"10.1109\/TCYB.2018.2823199","article-title":"Adaptive critic designs for event-triggered robust control of nonlinear systems with unknown dynamics","volume":"49","author":"Yang","year":"2018","journal-title":"IEEE Transactions on Cybernetics"},{"issue":"9","key":"2026033114240019200_ref211","doi-asserted-by":"crossref","first-page":"3706","DOI":"10.1002\/rnc.4962","article-title":"Safe reinforcement learning for dynamical games","volume":"30","author":"Yang","year":"2020","journal-title":"International Journal of Robust and Nonlinear Control"},{"key":"2026033114240019200_ref212","author":"Yang","year":"2020"},{"issue":"9","key":"2026033114240019200_ref213","doi-asserted-by":"crossref","first-page":"4811","DOI":"10.1109\/TAC.2017.2688452","article-title":"Distributed Nash equilibrium seeking by a consensus-based approach","volume":"62","author":"Ye","year":"2017","journal-title":"IEEE Transactions on Automatic 
Control"},{"issue":"5","key":"2026033114240019200_ref214","doi-asserted-by":"crossref","first-page":"585","DOI":"10.1109\/TAC.1983.1103275","article-title":"Feedback, minimax sensitivity, and optimal robustness","volume":"28","author":"Zames","year":"1983","journal-title":"IEEE Transactions on Automatic Control"},{"issue":"1","key":"2026033114240019200_ref215","doi-asserted-by":"crossref","first-page":"206","DOI":"10.1109\/TSMCB.2012.2203336","article-title":"Near-optimal control for nonzero-sum differential games of continuous-time nonlinear systems using single-network ADP","volume":"43","author":"Zhang","year":"2012","journal-title":"IEEE Transactions on Cybernetics"},{"issue":"7","key":"2026033114240019200_ref216","doi-asserted-by":"crossref","first-page":"1315","DOI":"10.1109\/TCYB.2014.2350511","article-title":"Distributed cooperative optimal control for multiagent systems on directed graphs: An inverse optimal approach","volume":"45","author":"Zhang","year":"2014","journal-title":"IEEE Transactions on Cybernetics"},{"issue":"12","key":"2026033114240019200_ref217","doi-asserted-by":"crossref","first-page":"2706","DOI":"10.1109\/TCYB.2014.2313915","article-title":"Online adaptive policy learning algorithm for H\u221e state feedback control of unknown affine nonlinear discrete-time systems","volume":"44","author":"Zhang","year":"2014","journal-title":"IEEE Transactions on Cybernetics"},{"issue":"1","key":"2026033114240019200_ref218","doi-asserted-by":"crossref","first-page":"152","DOI":"10.1109\/TFUZZ.2014.2310238","article-title":"Leader-based optimal coordination control for the consensus problem of multiagent differential games via fuzzy adaptive dynamic programming","volume":"23","author":"Zhang","year":"2014","journal-title":"IEEE Transactions on Fuzzy Systems"},{"issue":"1","key":"2026033114240019200_ref219","doi-asserted-by":"crossref","first-page":"37","DOI":"10.1109\/TNNLS.2016.2614002","article-title":"Event-based robust control for uncertain 
nonlinear systems using adaptive dynamic programming","volume":"29","author":"Zhang","year":"2016","journal-title":"IEEE Transactions on Neural Networks and Learning Systems"},{"issue":"7","key":"2026033114240019200_ref220","doi-asserted-by":"crossref","first-page":"1071","DOI":"10.1109\/TSMC.2016.2531680","article-title":"Event-triggered H\u221e control for continuous-time nonlinear system via concurrent learning","volume":"47","author":"Zhang","year":"2016","journal-title":"IEEE Transactions on Systems, Man, and Cybernetics: Systems"},{"key":"2026033114240019200_ref221","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-030-33384-3","volume-title":"Deep Reinforcement Learning with Guaranteed Performance","author":"Zhang","year":"2020"},{"issue":"3","key":"2026033114240019200_ref222","doi-asserted-by":"crossref","first-page":"683","DOI":"10.1109\/TCYB.2016.2523878","article-title":"An event-triggered ADP control approach for continuous-time system with unknown internal states","volume":"47","author":"Zhong","year":"2016","journal-title":"IEEE Transactions on Cybernetics"},{"issue":"5","key":"2026033114240019200_ref223","doi-asserted-by":"crossref","first-page":"4101","DOI":"10.1109\/TIE.2016.2597763","article-title":"Event-triggered optimal control for partially unknown constrained-input systems via adaptive dynamic programming","volume":"64","author":"Zhu","year":"2016","journal-title":"IEEE Transactions on Industrial Electronics"}],"container-title":["Foundations and Trends\u00ae in Systems and 
Control"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.emerald.com\/ftsys\/article-pdf\/8\/1-2\/1\/11150041\/2600000022en.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/www.emerald.com\/ftsys\/article-pdf\/8\/1-2\/1\/11150041\/2600000022en.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T19:00:45Z","timestamp":1777489245000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.emerald.com\/ftsys\/article\/8\/1-2\/1\/1332195\/Synchronous-Reinforcement-Learning-Based-Control"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,1,1]]},"references-count":224,"journal-issue":{"issue":"1-2","published-print":{"date-parts":[[2021,1,1]]}},"URL":"https:\/\/doi.org\/10.1561\/2600000022","relation":{},"ISSN":["2325-6818","2325-6826"],"issn-type":[{"value":"2325-6818","type":"print"},{"value":"2325-6826","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,1,1]]}}}