{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,19]],"date-time":"2025-11-19T07:06:05Z","timestamp":1763535965463,"version":"3.37.3"},"reference-count":24,"publisher":"Oxford University Press (OUP)","issue":"7","license":[{"start":{"date-parts":[[2022,4,18]],"date-time":"2022-04-18T00:00:00Z","timestamp":1650240000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61806221"],"award-info":[{"award-number":["61806221"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023,7,13]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>To address the incalculability caused by inconsistent objective functions in multi-agent deep reinforcement learning, the concept of Nash equilibrium is introduced. However, a Markov game may have multiple equilibria, and how to select a stable and optimal one is worth studying. Besides the solution concept, balancing exploration and exploitation is another key issue in reinforcement learning. Building on methods that can converge to a Nash equilibrium, this paper makes improvements through Pareto optimization. To alleviate the overfitting caused by Pareto optimization and the non-convergence caused by strategy changes, we use stratified sampling in place of random sampling. Moreover, our methods are trained through fictitious self-play to make full use of self-learning experiences. 
Experiments on the MAgent platform show that the proposed methods not only far outperform traditional methods but also match or even surpass state-of-the-art MADRL methods.<\/jats:p>","DOI":"10.1093\/comjnl\/bxac027","type":"journal-article","created":{"date-parts":[[2022,4,14]],"date-time":"2022-04-14T11:13:14Z","timestamp":1649934794000},"page":"1573-1585","source":"Crossref","is-referenced-by-count":1,"title":["Improvement of MADRL Equilibrium Based on Pareto Optimization"],"prefix":"10.1093","volume":"66","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6952-5337","authenticated-orcid":false,"given":"Zhiruo","family":"Zhao","sequence":"first","affiliation":[{"name":"Command & Control Engineering College , , Nanjing, CO 210000 China"},{"name":"Army Engineering University of PLA , , Nanjing, CO 210000 China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6997-8504","authenticated-orcid":false,"given":"Lei","family":"Cao","sequence":"additional","affiliation":[{"name":"Command & Control Engineering College , , Nanjing, CO 210000 China"},{"name":"Army Engineering University of PLA , , Nanjing, CO 210000 China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5198-0932","authenticated-orcid":false,"given":"Xiliang","family":"Chen","sequence":"additional","affiliation":[{"name":"Command & Control Engineering College , , Nanjing, CO 210000 China"},{"name":"Army Engineering University of PLA , , Nanjing, CO 210000 China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3608-4985","authenticated-orcid":false,"given":"Jun","family":"Lai","sequence":"additional","affiliation":[{"name":"Command & Control Engineering College , , Nanjing, CO 210000 China"},{"name":"Army Engineering University of PLA , , Nanjing, CO 210000 
China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2768-7044","authenticated-orcid":false,"given":"Legui","family":"Zhang","sequence":"additional","affiliation":[{"name":"Command & Control Engineering College , , Nanjing, CO 210000 China"},{"name":"Army Engineering University of PLA , , Nanjing, CO 210000 China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2022,4,18]]},"reference":[{"key":"2023071709111566700_ref1","doi-asserted-by":"crossref","DOI":"10.1016\/B978-1-55860-335-6.50027-1","volume-title":"Markov Games as a Framework for Multi Agent Reinforcement Learning","author":"Littman","year":"1994"},{"volume-title":"The 15th International Conference on Machine Learning","year":"1998","author":"Hu","key":"2023071709111566700_ref2"},{"key":"2023071709111566700_ref3","article-title":"Equilibrium in a stochastic N-person game","author":"Fink","year":"1964","journal-title":"Journal of Science of the Hiroshima University"},{"volume-title":"The 32nd International Conference on Machine Learning","year":"2015","author":"Heinrich","key":"2023071709111566700_ref4"},{"key":"2023071709111566700_ref5","first-page":"572","article-title":"Brown's original fictitious play","volume":"135","author":"Berger","year":"2005","journal-title":"Game Theory Inf., 2005"},{"volume-title":"The 35th International Conference on Machine Learning","year":"2018","author":"Yang","key":"2023071709111566700_ref6"},{"key":"2023071709111566700_ref7","doi-asserted-by":"crossref","first-page":"387","DOI":"10.1006\/game.1993.1023","article-title":"The statistical mechanics of strategic interaction","volume":"5","author":"Blume","year":"1993","journal-title":"Games Econ. Behav."},{"key":"2023071709111566700_ref8","first-page":"927","article-title":"Phase transitions and critical phenomena","volume":"40","author":"Stanley","year":"1979","journal-title":"Am. J. Phys. 
1979"},{"volume-title":"The 32nd AAAI Conference on Artificial Intelligence","year":"2018","author":"Zheng","key":"2023071709111566700_ref9"},{"key":"2023071709111566700_ref10","first-page":"1039","article-title":"Nash Q-learning for general-sum stochastic games","volume":"4","author":"Hu","year":"2003","journal-title":"J. Mach. Learn. Res."},{"volume-title":"The 18th International Conference on Machine Learning","year":"2001","author":"Littman","key":"2023071709111566700_ref11"},{"volume-title":"The 6th International Conference on Learning Representations","year":"2018","author":"Plappert","key":"2023071709111566700_ref12"},{"volume-title":"The 6th International Conference on Learning Representations","year":"2018","author":"Fortunato","key":"2023071709111566700_ref13"},{"volume-title":"The 7th International Conference on Learning Representations","year":"2019","author":"Choi","key":"2023071709111566700_ref14"},{"volume-title":"The 34th International Conference on Machine Learning","year":"2017","author":"Ostrovski","key":"2023071709111566700_ref15"},{"key":"2023071709111566700_ref16","doi-asserted-by":"publisher","DOI":"10.1109\/ISAP.2005.1599245","volume-title":"Proceedings of the 13th International Conference on. 
IEEE","author":"Ngatchou","year":"2005"},{"volume-title":"Proceedings of the 13th ACM Conference on Recommender Systems, RecSys 2019","year":"2019","author":"Lin","key":"2023071709111566700_ref17"},{"volume-title":"Games and Information: An Introduction to Game Theory","year":"2001","author":"Rasmusen","key":"2023071709111566700_ref18"},{"key":"2023071709111566700_ref19","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.95.17.9724","article-title":"Cooperation and self-interest: Pareto-inefficiency of Nash equilibria in finite random games","author":"Cohen","year":"1998","journal-title":"Proceedings of the National Academy of Sciences of the United States of America"},{"key":"2023071709111566700_ref20","doi-asserted-by":"crossref","DOI":"10.1007\/s00224-012-9433-0","article-title":"On the complexity of Pareto-optimal Nash and strong equilibria","author":"Hoefer","year":"2013","journal-title":"Theory Comput. Syst., 2013"},{"key":"2023071709111566700_ref21","doi-asserted-by":"crossref","first-page":"529","DOI":"10.1038\/nature14236","article-title":"Human-level control through deep reinforcement learning","volume":"518","author":"Mnih","year":"2015","journal-title":"Nature"},{"volume-title":"Multi Agent Actor-Critic Reinforcement Learning for Cooperative Tasks","year":"2014","author":"Bayiz","key":"2023071709111566700_ref22"},{"volume-title":"The 31st Conference on Neural Information Processing Systems","year":"2017","author":"Lowe","key":"2023071709111566700_ref23"},{"key":"2023071709111566700_ref24","article-title":"The surprising effectiveness of MAPPO in cooperative","author":"Yu","year":"2021","journal-title":"Multi Agent Games"}],"container-title":["The Computer 
Journal"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/comjnl\/article-pdf\/66\/7\/1573\/50876291\/bxac027.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/comjnl\/article-pdf\/66\/7\/1573\/50876291\/bxac027.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,7,17]],"date-time":"2023-07-17T09:16:30Z","timestamp":1689585390000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/comjnl\/article\/66\/7\/1573\/6567701"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,4,18]]},"references-count":24,"journal-issue":{"issue":"7","published-online":{"date-parts":[[2022,4,18]]},"published-print":{"date-parts":[[2023,7,13]]}},"URL":"https:\/\/doi.org\/10.1093\/comjnl\/bxac027","relation":{},"ISSN":["0010-4620","1460-2067"],"issn-type":[{"type":"print","value":"0010-4620"},{"type":"electronic","value":"1460-2067"}],"subject":[],"published-other":{"date-parts":[[2023,7]]},"published":{"date-parts":[[2022,4,18]]}}}