{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,16]],"date-time":"2025-10-16T06:59:42Z","timestamp":1760597982991,"version":"build-2065373602"},"reference-count":24,"publisher":"MDPI AG","issue":"9","license":[{"start":{"date-parts":[[2020,8,30]],"date-time":"2020-08-30T00:00:00Z","timestamp":1598745600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],"abstract":"<jats:p>In a general Markov decision process system, only one agent\u2019s learning evolution is considered. However, considering the learning evolution of a single agent has limitations in many problems, and more and more applications involve multiple agents. Multi-agent environments are of two types: cooperation and game. Therefore, this paper introduces a Cooperation Markov Decision Process (CMDP) system with two agents, which is suitable for the learning evolution of cooperative decisions between two agents. It is further found that the value function in the CMDP system also converges, and the convergence value is independent of the choice of the initial value function. This paper presents an algorithm for finding the optimal strategy pair (\u03c0k0,\u03c0k1) in the CMDP system, whose fundamental task is to find an optimal strategy pair and form an evolutionary system CMDP(\u03c0k0,\u03c0k1). 
Finally, an example is given to support the theoretical results.<\/jats:p>","DOI":"10.3390\/e22090955","type":"journal-article","created":{"date-parts":[[2020,8,30]],"date-time":"2020-08-30T06:06:22Z","timestamp":1598767582000},"page":"955","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["The Convergence of a Cooperation Markov Decision Process System"],"prefix":"10.3390","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0070-1061","authenticated-orcid":false,"given":"Xiaoling","family":"Mo","sequence":"first","affiliation":[{"name":"College of Computer Science and Technology, Guizhou University, Guiyang 550025, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Daoyun","family":"Xu","sequence":"additional","affiliation":[{"name":"College of Computer Science and Technology, Guizhou University, Guiyang 550025, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6605-2370","authenticated-orcid":false,"given":"Zufeng","family":"Fu","sequence":"additional","affiliation":[{"name":"College of Computer Science and Technology, Guizhou University, Guiyang 550025, China"},{"name":"Department of Electronics and Information Engineering, Anshun University, Anshun 561000, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2020,8,30]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Shalev-Shwartz, S. (2014). Understanding Machine Learning: From Theory to Algorithms, Cambridge University Press.","DOI":"10.1017\/CBO9781107298019"},{"key":"ref_2","first-page":"550","article-title":"A reputation-oriented reinforcement learning approach for agents in electronic marketplaces","volume":"18","author":"Tran","year":"2002","journal-title":"Am. Assoc. Artif. Intell."},{"key":"ref_3","unstructured":"Jonathan, B., Andrew, T., and Lex, W. (2001). 
Reinforcement learning and chess. Machines that Learn to Play Games, Nova Science Publishers, Inc."},{"key":"ref_4","unstructured":"Liebman, E., and Stone, P. (2015, January 4\u20138). DJ-MC: A Reinforcement-Learning Agent for Music Playlist Recommendation. Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS, Istanbul, Turkey."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Van Hassel, H. (2012). Reinforcement Learning in Continuous State and Action Spaces. Reinforcement Learning, Springer.","DOI":"10.1007\/978-3-642-27645-3_7"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"829","DOI":"10.1109\/TNN.2006.875990","article-title":"A statistical property of multiagent learning based on Markov decision process","volume":"17","author":"Iwata","year":"2006","journal-title":"IEEE Trans. Neural Netw."},{"key":"ref_7","unstructured":"Julien, P., Bilal, P., Matthieu, G., Bruno, S., and Olivier, P. (2016, January 19\u201324). Softened approximate policy iteration for Markov games. Proceedings of the 33rd International Conference on Machine Learning (ICML), New York, NY, USA."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"445","DOI":"10.1038\/nature14540","article-title":"Reinforcement learning improves behaviour from evaluative feedback","volume":"521","author":"Littman","year":"2015","journal-title":"Nature"},{"key":"ref_9","unstructured":"Leibo, J.Z., Zambaldi, V., Lanctot, M., Marecki, J., and Graepel, T. (2017, January 8\u201312). Multi-agent Reinforcement Learning in Sequential Social Dilemmas. 
Proceedings of the 16th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2017), Sao Paulo, Brazil."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"43","DOI":"10.1016\/j.jenvman.2004.02.007","article-title":"A design and application of a multi-agent system for simulation of multi-actor spatial planning","volume":"72","author":"Ligtenberg","year":"2004","journal-title":"J. Environ. Manag."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"659","DOI":"10.1613\/jair.4818","article-title":"Evolutionary Dynamics of Multi-Agent Learning: A Survey","volume":"53","author":"Bloembergen","year":"2015","journal-title":"J. Artif. Intell. Res."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Littman, M.L. (1994, January 10\u201313). Markov games as a framework for multi-agent reinforcement learning. Proceedings of the Eleventh International Conference on International Conference on Machine Learning, San Francisco, CA, USA.","DOI":"10.1016\/B978-1-55860-335-6.50027-1"},{"key":"ref_13","unstructured":"Littman, M.L. (2001, January 26). Friend-or-foe Q-learning in general-sum games. Proceedings of the Eighteenth International Conference on Machine Learning, San Francisco, CA, USA."},{"key":"ref_14","unstructured":"Puppala, S.N., and Gordin, S.M. (1998, January 4\u20139). Shared memory based cooperative coevolution. Proceedings of the 1998 IEEE International Conference on Evolutionary Computation Proceedings. IEEE World Congress on Computational Intelligence (Cat. No.98TH8360), Anchorage, AK, USA."},{"key":"ref_15","unstructured":"Mahmoud, S., Miles, S., and Luck, M. (2016, January 9\u201313). Cooperation emergence under resource-constrained peer punishment. 
Proceedings of the 2016 International Conference on Autonomous Agents Multiagent Systems, Richland, SC, Singapore."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"55","DOI":"10.1016\/S1389-0417(01)00015-8","article-title":"Value-function reinforcement learning in Markov games","volume":"1","author":"Littman","year":"2001","journal-title":"Cogn. Syst. Res."},{"key":"ref_17","first-page":"1","article-title":"An Approach to Intelligent Traffic Management System Using a Multi-agent System","volume":"16","author":"Hamidi","year":"2018","journal-title":"Int. J. Intell. Transp. Syst. Res."},{"key":"ref_18","unstructured":"Cruz, D.L., and Yu, W. (2014, January 5\u20138). Multi-agent path planning in unknown environment with reinforcement learning and neural network. Proceedings of the 2014 IEEE International Conference on Systems, Man, and Cybernetics (SMC), San Diego, CA, USA."},{"key":"ref_19","first-page":"53","article-title":"An algorithm of cooperative multiple satellites mission planning based on multi-agent reinforcement learning","volume":"33","author":"Wang","year":"2011","journal-title":"J. Natl. Univ. Def. Technol."},{"key":"ref_20","first-page":"1","article-title":"Multi-agent reinforcement learning based on local communication","volume":"22","author":"Zhang","year":"2018","journal-title":"Clust. Comput."},{"key":"ref_21","unstructured":"Zawadzki, E., Lipson, A., and Leyton-Brown, K. (2014). Empirically evaluating multiagent learning algorithms. arXiv."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.artint.2016.03.005","article-title":"Algorithms for computing strategies in two-player simultaneous move games","volume":"237","author":"Lanctot","year":"2016","journal-title":"Artif. Intell."},{"key":"ref_23","unstructured":"Heris, S., Kalami, M., Mohammad-Bagher, S., and Naser, P. (2009). Using Control Theory for Analysis of Reinforcement Learning and Optimal Policy Properties in Grid-World Problems. 
International Conference on Intelligent Computing (ICIC\u201909), Springer."},{"key":"ref_24","unstructured":"Gheorghe, C., and Doina, P. (2010, January 10\u201314). Optimal policy switching algorithms for reinforcement learning. Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2010), Toronto, ON, Canada."}],"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1099-4300\/22\/9\/955\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T10:04:44Z","timestamp":1760177084000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1099-4300\/22\/9\/955"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,8,30]]},"references-count":24,"journal-issue":{"issue":"9","published-online":{"date-parts":[[2020,9]]}},"alternative-id":["e22090955"],"URL":"https:\/\/doi.org\/10.3390\/e22090955","relation":{},"ISSN":["1099-4300"],"issn-type":[{"type":"electronic","value":"1099-4300"}],"subject":[],"published":{"date-parts":[[2020,8,30]]}}}