{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,28]],"date-time":"2026-05-28T09:06:43Z","timestamp":1779959203661,"version":"3.53.1"},"reference-count":36,"publisher":"Wiley","issue":"3","license":[{"start":{"date-parts":[[2026,5,28]],"date-time":"2026-05-28T00:00:00Z","timestamp":1779926400000},"content-version":"vor","delay-in-days":27,"URL":"http:\/\/onlinelibrary.wiley.com\/termsAndConditions#vor"},{"start":{"date-parts":[[2026,5,1]],"date-time":"2026-05-01T00:00:00Z","timestamp":1777593600000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/doi.wiley.com\/10.1002\/tdm_license_1.1"}],"content-domain":{"domain":["onlinelibrary.wiley.com"],"crossmark-restriction":true},"short-container-title":["Computer Animation & Virtual"],"published-print":{"date-parts":[[2026,5]]},"abstract":"<jats:title>ABSTRACT<\/jats:title>\n                  <jats:p>Multi\u2010agent reinforcement learning (MARL) often suffers from low sample efficiency and limited behavioral diversity, leading to policy homogenization, insufficient exploration, and reduced robustness. To address these challenges, we propose MADECM, a curiosity\u2010augmented evolutionary framework built upon MADDPG that integrates curiosity\u2010driven updates with evolutionary quality\u2010diversity optimization. MADECM employs random network distillation (RND) to estimate the novelty of each agent's local observations and uses the resulting novelty signal to dynamically allocate additional update frequencies, thereby emphasizing exploration\u2010relevant experience during training. In addition, MADECM combines population\u2010based diversification with a quality\u2010diversity (QD) archive through a staged optimization procedure, enabling the joint improvement of task return and policy diversity. 
We evaluate MADECM on the multi\u2010agent particle environment (MPE), including Spread and Reference, which capture cooperative and partially observable dynamics, and on Google Research Football (GRF), which emphasizes long\u2010horizon sequential decision\u2010making. Results show that MADECM consistently outperforms strong MADDPG\u2010based baselines. The modular design of MADECM, consisting of RND\u2010based novelty estimation and staged QD optimization, further supports consistent generalization across these structurally distinct environments without task\u2010specific hyperparameter tuning.<\/jats:p>","DOI":"10.1002\/cav.70121","type":"journal-article","created":{"date-parts":[[2026,5,28]],"date-time":"2026-05-28T08:37:48Z","timestamp":1779957468000},"update-policy":"https:\/\/doi.org\/10.1002\/crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["MADECM: A Curiosity\u2010Augmented Evolutionary Algorithm for Multi\u2010Agent Policy Diversity Optimization"],"prefix":"10.1002","volume":"37","author":[{"given":"Jianyang","family":"Wu","sequence":"first","affiliation":[{"name":"Dalian University  Dalian China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Yv","family":"Fu","sequence":"additional","affiliation":[{"name":"Key Laboratory of Social Computing and Cognitive Intelligence, Ministry of Education Dalian University of Technology  Dalian China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-5643-3892","authenticated-orcid":false,"given":"Xinning","family":"Wang","sequence":"additional","affiliation":[{"name":"Key Laboratory of Social Computing and Cognitive Intelligence, Ministry of Education Dalian University of Technology  Dalian China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Xin","family":"Yang","sequence":"additional","affiliation":[{"name":"Key Laboratory of Social Computing and Cognitive Intelligence, Ministry of Education Dalian University of 
Technology  Dalian China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"311","published-online":{"date-parts":[[2026,5,28]]},"reference":[{"issue":"11","key":"e_1_2_9_2_1","doi-asserted-by":"crossref","DOI":"10.3390\/app11114948","article-title":"Multi\u2010Agent Reinforcement Learning: A Review of Challenges and Applications","volume":"11","author":"Canese L.","year":"2021","journal-title":"Applied Sciences"},{"key":"e_1_2_9_3_1","doi-asserted-by":"crossref","first-page":"12353","DOI":"10.1109\/ICRA57147.2024.10611322","volume-title":"2024 IEEE International Conference on Robotics and Automation (ICRA)","author":"Wang W.","year":"2024"},{"key":"e_1_2_9_4_1","doi-asserted-by":"crossref","first-page":"8765","DOI":"10.1109\/ICRA46639.2022.9811626","volume-title":"2022 International Conference on Robotics and Automation (ICRA)","author":"Han S.","year":"2022"},{"key":"e_1_2_9_5_1","unstructured":"H.Jia Y.Hu Y.Chen et al. \u201cFever Basketball: A Complex Flexible and Asynchronized Sports Game Environment for Multi\u2010Agent Reinforcement Learning\u201d arXiv preprint arXiv:2012.03204."},{"key":"e_1_2_9_6_1","unstructured":"T.Wang H.Dong V.Lesser andC.Zhang \u201cRoma: Multi\u2010agent reinforcement learning with emergent roles\u201d arXiv preprint arXiv:2003.08039."},{"key":"e_1_2_9_7_1","first-page":"16829","volume-title":"International Conference on Machine Learning","author":"Kim W.","year":"2023"},{"key":"e_1_2_9_8_1","first-page":"3991","article-title":"Celebrating Diversity in Shared Multi\u2010Agent Reinforcement Learning","volume":"34","author":"Li C.","year":"2021","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_2_9_9_1","first-page":"1","volume-title":"2024 International Joint Conference on Neural Networks (IJCNN)","author":"Yang K.","year":"2024"},{"key":"e_1_2_9_10_1","unstructured":"M.Jaderberg V.Dalibard S.Osindero et al. 
\u201cPopulation Based Training of Neural Networks\u201d arXiv preprint arXiv:1711.09846."},{"key":"e_1_2_9_11_1","doi-asserted-by":"crossref","first-page":"866","DOI":"10.1145\/3449639.3459304","volume-title":"Proceedings of the Genetic and Evolutionary Computation Conference","author":"Nilsson O.","year":"2021"},{"key":"e_1_2_9_12_1","unstructured":"T.Wang T.Gupta A.Mahajan B.Peng S.Whiteson andC.Zhang \u201cRode: Learning Roles to Decompose Multi\u2010Agent Tasks\u201d arXiv preprint arXiv:2010.01523."},{"key":"e_1_2_9_13_1","unstructured":"W.Wang T.Yang Y.Liu et al. \u201cAction Semantics Network: Considering the Effects of Actions in Multiagent Systems\u201d arXiv preprint arXiv:1907.11461."},{"key":"e_1_2_9_14_1","unstructured":"Y.Yang J.Hao B.Liao et al. \u201cQatten: A General Framework for Cooperative Multiagent Reinforcement Learning\u201d arXiv preprint arXiv:2002.03939."},{"key":"e_1_2_9_15_1","first-page":"8398","article-title":"Learning to Play With Intrinsically\u2010Motivated, Self\u2010Aware Agents","volume":"31","author":"Haber N.","year":"2018","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_2_9_16_1","unstructured":"B.Eysenbach A.Gupta J.Ibarz andS.Levine \u201cDiversity is All You Need: Learning Skills Without a Reward Function\u201d arXiv preprint arXiv:1802.06070."},{"key":"e_1_2_9_17_1","first-page":"7611","article-title":"Maven: Multi\u2010Agent Variational Exploration","volume":"32","author":"Mahajan A.","year":"2019","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_2_9_18_1","first-page":"1","volume-title":"2024 International Joint Conference on Neural Networks (IJCNN)","author":"Tao J.","year":"2024"},{"key":"e_1_2_9_19_1","first-page":"15930","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","author":"Chen Y.","year":"2025"},{"key":"e_1_2_9_20_1","unstructured":"F.Chalumeau R.Boige B.Lim et al. 
\u201cNeuroevolution is a Competitive Alternative to Reinforcement Learning for Skill Discovery\u201d arXiv preprint arXiv:2210.03516."},{"key":"e_1_2_9_21_1","unstructured":"K.Yang J.Tao J.Lyu andX.Li \u201cExploration and Anti\u2010Exploration With Distributional Random Network Distillation\u201d arXiv preprint arXiv:2401.09750."},{"key":"e_1_2_9_22_1","unstructured":"Y.Burda H.Edwards A.Storkey andO.Klimov \u201cExploration By Random Network Distillation\u201d arXiv preprint arXiv:1810.12894."},{"key":"e_1_2_9_23_1","first-page":"1928","volume-title":"Proceedings of the 33rd International Conference on Machine Learning","author":"Mnih V.","year":"2016"},{"key":"e_1_2_9_24_1","first-page":"1861","volume-title":"International Conference on Machine Learning","author":"Haarnoja T.","year":"2018"},{"key":"e_1_2_9_25_1","first-page":"10510","article-title":"Diversity\u2010Driven Exploration Strategy for Deep Reinforcement Learning","volume":"31","author":"Hong Z.\u2010W.","year":"2018","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_2_9_26_1","volume-title":"8th International Conference on Learning Representations","author":"Jung W.","year":"2020"},{"key":"e_1_2_9_27_1","first-page":"18050","article-title":"Effective Diversity in Population Based Reinforcement Learning","volume":"33","author":"Parker\u2010Holder J.","year":"2020","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_2_9_28_1","volume-title":"International Conference on Learning Representations","author":"Wang Y.","year":"2021"},{"key":"e_1_2_9_29_1","unstructured":"J.\u2010B.MouretandJ.Clune \u201cIlluminating Search Spaces by Mapping Elites\u201d arXiv preprint arXiv:1504.04909."},{"key":"e_1_2_9_30_1","first-page":"1582","volume-title":"Proceedings of the 35th International Conference on Machine Learning","author":"Fujimoto 
S.","year":"2018"},{"key":"e_1_2_9_31_1","doi-asserted-by":"crossref","first-page":"1075","DOI":"10.1145\/3512290.3528845","volume-title":"Proceedings of the Genetic and Evolutionary Computation Conference","author":"Pierrot T.","year":"2022"},{"key":"e_1_2_9_32_1","first-page":"6382","article-title":"Multi\u2010Agent Actor\u2010Critic for Mixed Cooperative\u2010Competitive Environments","volume":"30","author":"Lowe R.","year":"2017","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_2_9_33_1","first-page":"7445","volume-title":"Proceedings of the 37th International Conference on Machine Learning","author":"Pacchiano A.","year":"2020"},{"issue":"3","key":"e_1_2_9_34_1","doi-asserted-by":"crossref","first-page":"285","DOI":"10.1093\/biomet\/25.3-4.285","article-title":"On the Likelihood That One Unknown Probability Exceeds Another in View of the Evidence of Two Samples","volume":"25","author":"Thompson W. R.","year":"1933","journal-title":"Biometrika"},{"issue":"2","key":"e_1_2_9_35_1","doi-asserted-by":"crossref","first-page":"235","DOI":"10.1023\/A:1013689704352","article-title":"Finite\u2010Time Analysis of the Multiarmed Bandit Problem","volume":"47","author":"Auer P.","year":"2002","journal-title":"Machine Learning"},{"key":"e_1_2_9_36_1","unstructured":"M.Arjovsky S.Chintala andL.Bottou \u201cWasserstein GAN\u201d arXiv preprint arXiv:1701.07875."},{"issue":"32","key":"e_1_2_9_37_1","first-page":"1","article-title":"Heterogeneous\u2010Agent Reinforcement Learning","volume":"25","author":"Zhong Y.","year":"2024","journal-title":"Journal of Machine Learning Research"}],"container-title":["Computer Animation and Virtual 
Worlds"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/pdf\/10.1002\/cav.70121","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/full-xml\/10.1002\/cav.70121","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/pdf\/10.1002\/cav.70121","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,5,28]],"date-time":"2026-05-28T08:37:57Z","timestamp":1779957477000},"score":1,"resource":{"primary":{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/10.1002\/cav.70121"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,5]]},"references-count":36,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2026,5]]}},"alternative-id":["10.1002\/cav.70121"],"URL":"https:\/\/doi.org\/10.1002\/cav.70121","archive":["Portico"],"relation":{},"ISSN":["1546-4261","1546-427X"],"issn-type":[{"value":"1546-4261","type":"print"},{"value":"1546-427X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,5]]},"assertion":[{"value":"2026-04-15","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2026-05-11","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2026-05-28","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}],"article-number":"e70121"}}