{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,20]],"date-time":"2026-01-20T12:26:42Z","timestamp":1768912002819,"version":"3.49.0"},"reference-count":53,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2025,4,18]],"date-time":"2025-04-18T00:00:00Z","timestamp":1744934400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100005329","name":"Natural Science Foundation of Guizhou Province","doi-asserted-by":"publisher","award":["MS[2025]047"],"award-info":[{"award-number":["MS[2025]047"]}],"id":[{"id":"10.13039\/501100005329","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Construction of Engineering Research Centers in Higher Education Institutions in Guizhou Province","award":["Qian Education and Technology [2023]041"],"award-info":[{"award-number":["Qian Education and Technology [2023]041"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Axioms"],"abstract":"<jats:p>Since high complexity and uncertainty is inherent in real-world environments that can influence the strategies choices of agents, we introduce a stochastic perturbation term to characterize the interference caused by uncertain factors on multi-agent systems (MASs). Firstly, the stochastic Q learning is designed by introducing stochastic perturbation term into Q learning, and the corresponding replicator dynamic equations of stochastic Q learning are derived. Secondly, we focus on two-agent games with two and three action scenarios, analyzing the impact of learning parameters on agents\u2019 strategy selection and demonstrating how the learning process converges to its Nash equilibria. Finally, we also conduct a sensitivity analysis on exploration parameters, demonstrating how exploration rates affect the convergence process in potential games. The analysis and numerical experiments offer insights into the effectiveness of different exploration parameters in scenarios involving uncertainty.<\/jats:p>","DOI":"10.3390\/axioms14040311","type":"journal-article","created":{"date-parts":[[2025,4,18]],"date-time":"2025-04-18T02:25:31Z","timestamp":1744943131000},"page":"311","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Evolutionary Dynamics of Stochastic Q Learning in Multi-Agent Systems"],"prefix":"10.3390","volume":"14","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0656-781X","authenticated-orcid":false,"given":"Luping","family":"Liu","sequence":"first","affiliation":[{"name":"Computer and Information Engineering College, Guizhou University of Commerce, Guiyang 550014, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7795-2953","authenticated-orcid":false,"given":"Gang","family":"Sun","sequence":"additional","affiliation":[{"name":"Computer and Information Engineering College, Guizhou University of Commerce, Guiyang 550014, China"}]}],"member":"1968","published-online":{"date-parts":[[2025,4,18]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Masri, H., P\u00e9rez-Gladish, B., and Zopounidis, C. (2018). A behavioral and rational investor modeling to explain subprime crisis: Multi-agent systems simulation in artificial financial markets. Financial Decision Aid Using Multiple Criteria. Multiple Criteria Decision Making, Springer.","DOI":"10.1007\/978-3-319-68876-3"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"65","DOI":"10.1006\/jeth.1998.2419","article-title":"A learning approach to auctions","volume":"82","author":"Monderer","year":"1998","journal-title":"J. Econ. Theory"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Achbarou, O., Kiram, M.A.E., Bourkoukou, O., and Elbouanani, S. (2018). A Multi-Agent System Based Distributed Intrusion Detection System for a Cloud Computing, International Conference on Model and Data Engineering, Springer.","DOI":"10.1007\/978-3-030-02852-7_9"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"319","DOI":"10.1109\/JAS.2021.1003820","article-title":"Physical safety and cyber security analysis of multi-agent systems: A survey of recent advances","volume":"8","author":"Zhang","year":"2021","journal-title":"IEEE\/CAA J. Autom. Sin."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Yong, B.X., and Brintrup, A. (2019). Multi-Agent System for Machine Learning Under Uncertainty in Cyber Physical Manufacturing System, International Workshop on Service Orientation in Holonic and Multi-Agent Manufacturing, Springer.","DOI":"10.1007\/978-3-030-27477-1_19"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"697","DOI":"10.1109\/TCST.2018.2884226","article-title":"Distributed formation control using artificial potentials and neural network for constrained multiagent systems","volume":"28","author":"Liu","year":"2018","journal-title":"IEEE Trans. Control Syst. Technol."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"350","DOI":"10.1038\/s41586-019-1724-z","article-title":"Grandmaster level in StarCraft II using multi-agent reinforcement learning","volume":"575","author":"Vinyals","year":"2019","journal-title":"Nature"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"4091","DOI":"10.1109\/TIE.2016.2542134","article-title":"Data-driven optimal consensus control for discrete-time multi-agent systems with unknown dynamics using reinforcement learning method","volume":"64","author":"Zhang","year":"2016","journal-title":"IEEE Trans. Ind. Electron."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Sutton, R., and Barto, A. (1998). Reinforcement Learning: An Introduction, MIT Press.","DOI":"10.1109\/TNN.1998.712192"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"11216","DOI":"10.1109\/TPAMI.2024.3457538","article-title":"A review of safe reinforcement learning: Methods, theories and applications","volume":"46","author":"Gu","year":"2024","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_11","unstructured":"Albrecht, S.V., Christianos, F., and Sch\u00e4fer, L. (2024). Multi-Agent Reinforcement Learning: Foundations and Modern Approaches, MIT Press."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"1086","DOI":"10.1109\/TITS.2019.2901791","article-title":"Multi-agent deep reinforcement learning for large-scale traffic signal control","volume":"21","author":"Chu","year":"2019","journal-title":"IEEE Trans. Intell. Transp. Syst."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"3133","DOI":"10.1109\/COMST.2019.2916583","article-title":"Applications of deep reinforcement learning in communications and networking: A survey","volume":"21","author":"Nguyen","year":"2019","journal-title":"IEEE Commun. Surv. Tutor."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"128182","DOI":"10.1016\/j.energy.2023.128182","article-title":"Collaborative optimization of multi-microgrids system with shared energy storage based on multi-agent stochastic game and reinforcement learning","volume":"280","author":"Wang","year":"2023","journal-title":"Energy"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"1072","DOI":"10.1016\/j.apenergy.2018.11.002","article-title":"Reinforcement learning for demand response: A review of algorithms and modeling techniques","volume":"235","author":"Nagy","year":"2019","journal-title":"Appl. Energy"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"156","DOI":"10.1109\/TSMCC.2007.913919","article-title":"A comprehensive survey of multiagent reinforcement learning","volume":"38","author":"Busoniu","year":"2008","journal-title":"IEEE Trans. Syst. Man Cybern. Part C Appl. Rev."},{"key":"ref_17","first-page":"41","article-title":"Multiagent learning: Basics, challenges, and prospects","volume":"33","author":"Tuyls","year":"2012","journal-title":"AI Mag."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"1297","DOI":"10.1287\/moor.2016.0778","article-title":"Learning in games via reinforcement and regularization","volume":"41","author":"Mertikopoulos","year":"2016","journal-title":"Math. Oper. Res."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"4902","DOI":"10.1038\/s41598-018-22013-5","article-title":"The prevalence of chaotic dynamics in games with many players","volume":"8","author":"Sanders","year":"2018","journal-title":"Sci. Rep."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Tuyls, K., Verbeeck, K., and Lenaerts, T. (2003, January 14\u201318). A selection-mutation model for Q learning in multi-agent systems. Proceedings of the 2th International Joint Conference on Autonomous Agents and Multiagent Systems, Melbourne, Australia.","DOI":"10.1145\/860575.860687"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"1401","DOI":"10.1111\/j.1468-0262.2005.00625.x","article-title":"Adaptive heuristics","volume":"73","author":"Hart","year":"2004","journal-title":"Econometrica"},{"key":"ref_22","unstructured":"Gomes, E.R., and Kowalczyk, R. (2009, January 14\u201318). Dynamic analysis of multiagent Q learning with \u03b5-greedy exploration. Proceedings of the 26th Annual International Conference on Machine Learning, Montreal, Canada."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"043305","DOI":"10.1103\/PhysRevE.99.043305","article-title":"Deterministic limit of temporal difference reinforcement learning for stochastic games","volume":"99","author":"Barfuss","year":"2019","journal-title":"Phys. Rev. E"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"185","DOI":"10.1007\/BF00993306","article-title":"Asynchronous stochastic approximation and Q learning","volume":"16","author":"Tsitsiklis","year":"1994","journal-title":"Mach. Learn."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"538","DOI":"10.1007\/s13235-020-00372-x","article-title":"Stochastic evolutionary game analysis between special committees and CEO: Incentive and Supervision","volume":"11","author":"Liu","year":"2021","journal-title":"Dyn. Games Appl."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"015206","DOI":"10.1103\/PhysRevE.67.015206","article-title":"Coupled replicator equations for the dynamics of learning in multiagent systems","volume":"67","author":"Sato","year":"2003","journal-title":"Phys. Rev. E"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Blum, A., and Monsour, Y. (2007). Learning, regret minimization and equilibria. Algorithmic Game Theory, Cambridge University Press.","DOI":"10.1017\/CBO9780511800481.006"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Klos, T., van Ahee, G.J., and Tuyls, K. (2010). Evolutionary Dynamics of Regret Minimization, Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Springer.","DOI":"10.1007\/978-3-642-15883-4_6"},{"key":"ref_29","first-page":"534","article-title":"Modelling the dynamics of regret minimization in large agent populations: A master equation approach","volume":"22","author":"Wang","year":"2022","journal-title":"IJCAI"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"2954","DOI":"10.1038\/s41467-018-05259-5","article-title":"Exploiting a cognitive bias promotes cooperation in social dilemma experiments","volume":"9","author":"Wang","year":"2018","journal-title":"Nat. Commun."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"e1601444","DOI":"10.1126\/sciadv.1601444","article-title":"Onymity promotes cooperation in social dilemma experiments","volume":"3","author":"Wang","year":"2017","journal-title":"Sci. Adv."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"17650","DOI":"10.1073\/pnas.1922345117","article-title":"Communicating sentiment and outlook reverses inaction against collective risks","volume":"117","author":"Wang","year":"2020","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"4","DOI":"10.1109\/TEVC.2010.2059031","article-title":"Differential evolution: A survey of the state-of-the-art","volume":"15","author":"Das","year":"2010","journal-title":"IEEE Trans. Evol. Comput."},{"key":"ref_34","first-page":"659","article-title":"Evolutionary dynamics of multi-agent learning: A survey","volume":"53","author":"Bloembergen","year":"2015","journal-title":"Artif. Intell."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1006\/jeth.1997.2319","article-title":"Learning through reinforcement and replicator dynamics","volume":"77","author":"Sarin","year":"1997","journal-title":"J. Econ. Theory"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"115","DOI":"10.1007\/s10458-005-3783-9","article-title":"An evolutionary dynamical analysis of multi-agent learning in iterated games","volume":"12","author":"Tuyls","year":"2006","journal-title":"Auton. Agents Multi-Agent Syst."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"406","DOI":"10.1016\/j.artint.2007.01.004","article-title":"What evolutionary game theory tells us about multiagent learning","volume":"171","author":"Tuyls","year":"2007","journal-title":"Artif. Intell."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"387","DOI":"10.1007\/s10458-005-2631-2","article-title":"Cooperative multi-agent learning: The state of the art","volume":"11","author":"Panait","year":"2005","journal-title":"Auton. Agents Multi-Agent Syst."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"215","DOI":"10.1016\/S0004-3702(02)00121-2","article-title":"Multi-agent learning using a variable learning rate","volume":"136","author":"Bowling","year":"2002","journal-title":"Artif. Intell."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"041145","DOI":"10.1103\/PhysRevE.85.041145","article-title":"Dynamics of boltzmann Q learning in two-player two-action games","volume":"85","author":"Kianercy","year":"2012","journal-title":"Phys. Rev. E"},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"103653","DOI":"10.1016\/j.artint.2021.103653","article-title":"Exploration-exploitation in multi-agent learning: Catastrophe theory meets game theory","volume":"304","author":"Leonardos","year":"2022","journal-title":"Artif. Intell."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"4956","DOI":"10.1109\/TIE.2017.2674625","article-title":"Nash equilibrium topology of multi-agent systems with competitive groups","volume":"64","author":"Ma","year":"2017","journal-title":"IEEE Trans. Ind. Electron."},{"key":"ref_43","unstructured":"Sandholm, W.H. (2010). Population Games and Evolutionary Dynamics, MIT Press."},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"97","DOI":"10.1016\/j.physrep.2007.04.004","article-title":"Evolutionary games on graphs","volume":"446","author":"Gabor","year":"2007","journal-title":"Phys. Rep."},{"key":"ref_45","unstructured":"Martin, A.N. (2006). Evolutionary Dynamics: Exploring the Equations of Life, Harvard University Press."},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Wiering, M.A., and van Otterlo, M. (2012). Reinforcement Learning: State-of-the-Art, Springer.","DOI":"10.1007\/978-3-642-27645-3"},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"1171","DOI":"10.1016\/S0893-9659(03)90113-3","article-title":"Approximate solution for some stochastic differential equations involving both Gaussian and Poissonian white noises","volume":"16","author":"Jumarie","year":"2003","journal-title":"Appl. Math. Lett."},{"key":"ref_48","first-page":"1371","article-title":"Evolutionary selection in normal-form games","volume":"63","author":"Ritzberger","year":"1995","journal-title":"Econom. J. Econom. Soc."},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"116","DOI":"10.2307\/2227641","article-title":"Studies in the economics of transportation","volume":"67","author":"Smeed","year":"1957","journal-title":"Econ. J."},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"6","DOI":"10.1006\/game.1995.1023","article-title":"Quantal response equilibria for normal form games","volume":"10","author":"McKelvey","year":"1995","journal-title":"Games Econ. Behav."},{"key":"ref_51","doi-asserted-by":"crossref","first-page":"611","DOI":"10.1287\/moor.2014.0687","article-title":"Penalty-regulated dynamics and robust learning procedures in games","volume":"40","author":"Coucheney","year":"2015","journal-title":"Math. Oper. Res."},{"key":"ref_52","first-page":"369","article-title":"Super-convergence: Very fast training of neural networks using large learning rates","volume":"11006","author":"Smith","year":"2019","journal-title":"Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications"},{"key":"ref_53","unstructured":"Li, H., Xu, Z., Taylor, G., Studer, C., and Goldstein, T. (2018, January 3\u20138). Visualizing the loss landscape of neural nets. Proceedings of the 32nd International Conference on Neural Information Processing Systems, Montreal, Canada."}],"container-title":["Axioms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2075-1680\/14\/4\/311\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T17:16:56Z","timestamp":1760030216000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2075-1680\/14\/4\/311"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,4,18]]},"references-count":53,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2025,4]]}},"alternative-id":["axioms14040311"],"URL":"https:\/\/doi.org\/10.3390\/axioms14040311","relation":{},"ISSN":["2075-1680"],"issn-type":[{"value":"2075-1680","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,4,18]]}}}