{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,7,3]],"date-time":"2026-07-03T02:40:19Z","timestamp":1783046419226,"version":"3.54.6"},"reference-count":42,"publisher":"Association for Computing Machinery (ACM)","issue":"4","funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62372243"],"award-info":[{"award-number":["62372243"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Intell. Syst. Technol."],"published-print":{"date-parts":[[2025,8,31]]},"abstract":"<jats:p>Achieving joint goals efficiently in complex real-world tasks demands effective collaboration among multiple agents. Multi-Agent Reinforcement Learning (MARL) faces two interrelated challenges: limited exploration leads to early convergence on suboptimal behaviors, which in turn exacerbates non-stationarity under partial observability. To address these issues, we propose a novel framework, Spatio-Temporal Multi-agent Population Evolution (STPE-MARL). By integrating Evolutionary Algorithms (EAs) with MARL, our method enhances exploration diversity and facilitates global policy optimization. We further incorporate Graph Neural Networks (GNNs) to mitigate partial observability by encoding permutation symmetry through graph-based message passing. Two GNN-based training modes, Graph Relation and Graph Decomposition, are introduced to extend agents\u2019 receptive fields and capture spatio-temporal dependencies through time-series trajectory sampling. We evaluate STPE-MARL in two complex environments: micromanagement tasks in StarCraft II and large-scale traffic simulations in SUMO (Simulation of Urban MObility). 
Experimental results demonstrate that STPE-MARL significantly improves policy convergence and outperforms baseline methods, highlighting the complementary roles of EAs in exploration and GNNs in addressing observation limitations.<\/jats:p>","DOI":"10.1145\/3742479","type":"journal-article","created":{"date-parts":[[2025,6,2]],"date-time":"2025-06-02T11:49:26Z","timestamp":1748864966000},"page":"1-24","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["STPE-MARL: Spatio-Temporal Multi-Agent Population Evolution Reinforcement Learning"],"prefix":"10.1145","volume":"16","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2712-9088","authenticated-orcid":false,"given":"Kexing","family":"Peng","sequence":"first","affiliation":[{"name":"School of Computer Science, Nanjing University of Information Science and Technology, Nanjing, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-6757-1018","authenticated-orcid":false,"given":"Shihao","family":"Zhu","sequence":"additional","affiliation":[{"name":"School of Software, Nanjing University of Information Science and Technology, Nanjing, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2320-1692","authenticated-orcid":false,"given":"Tinghuai","family":"Ma","sequence":"additional","affiliation":[{"name":"School of Computer Engineering, Jiangsu Ocean University, Lianyungang, China and School of Software, Nanjing University of Information Science and Technology, Nanjing, China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2025,7,22]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i04.5728"},{"key":"e_1_3_1_3_2","volume-title":"Advances in Neural Information Processing Systems (NIPS)","volume":"32","author":"Du Yali","year":"2019","unstructured":"Yali Du, Lei 
Han, Meng Fang, Ji Liu, Tianhong Dai, and Dacheng Tao. 2019. LIIR: Learning individual intrinsic reward in multi-agent reinforcement learning. In Advances in Neural Information Processing Systems (NIPS), Vol. 32."},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1109\/IJCNN54540.2023.10191313"},{"issue":"102411","key":"e_1_3_1_5_2","first-page":"1","article-title":"Large-scale group hierarchical DEMATEL method with automatic consensus reaching","volume":"108","author":"Du Yuan-Wei","year":"2024","unstructured":"Yuan-Wei Du and Xin-Lu Shen. 2024. Large-scale group hierarchical DEMATEL method with automatic consensus reaching. Information Fusion 108 (2024), 102411, 1\u201326.","journal-title":"Information Fusion"},{"key":"e_1_3_1_6_2","article-title":"Inferring latent temporal sparse coordination graph for multi-agent reinforcement learning","author":"Duan Wei","year":"2024","unstructured":"Wei Duan, Jie Lu, and Junyu Xuan. 2024. Inferring latent temporal sparse coordination graph for multi-agent reinforcement learning. IEEE Transactions on Neural Networks and Learning Systems (2024). DOI: 10.1109\/TNNLS.2024.3513402","journal-title":"IEEE Transactions on Neural Networks and Learning Systems"},{"key":"e_1_3_1_7_2","first-page":"37567","article-title":"SMACv2: An improved benchmark for cooperative multi-agent reinforcement learning","volume":"36","author":"Ellis Benjamin","year":"2024","unstructured":"Benjamin Ellis, Jonathan Cook, Skander Moalla, Mikayel Samvelyan, Mingfei Sun, Anuj Mahajan, Jakob Foerster, and Shimon Whiteson. 2024. SMACv2: An improved benchmark for cooperative multi-agent reinforcement learning. In\u00a0Advances in Neural Information Processing Systems (NIPS), Vol. 
36, 37567\u201337593.","journal-title":"Advances in Neural Information Processing Systems (NIPS)"},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/CDC49753.2023.10384223"},{"key":"e_1_3_1_9_2","doi-asserted-by":"publisher","DOI":"10.1109\/TITS.2024.3407760"},{"key":"e_1_3_1_10_2","first-page":"12790","volume-title":"40th International Conference on Machine Learning (ICML), Vol","volume":"202","author":"He Jiafan","year":"2023","unstructured":"Jiafan He, Heyang Zhao, Dongruo Zhou, and Quanquan Gu. 2023. Nearly minimax optimal reinforcement learning for linear Markov decision processes. In 40th International Conference on Machine Learning (ICML), Vol. 202, 12790\u201312822."},{"key":"e_1_3_1_11_2","volume-title":"11th International Conference on Learning Representations (ICLR)","author":"Hao Jianye","year":"2023","unstructured":"Jianye Hao, Xiaotian Hao, Hangyu Mao, Weixun Wang, Yaodong Yang, Dong Li, Yan Zheng, and Zhen Wang. 2023. Boosting multiagent reinforcement learning via permutation invariant and permutation equivariant networks. In 11th International Conference on Learning Representations (ICLR)."},{"key":"e_1_3_1_12_2","volume-title":"11th International Conference on Learning Representations (ICLR)","author":"Hao Jianye","year":"2023","unstructured":"Jianye Hao, Pengyi Li, Hongyao Tang, Yan Zheng, Xian Fu, and Zhaopeng Meng. 2023. ERL-Re2: Efficient evolutionary reinforcement learning with shared state representation and individual policy representation. In 11th International Conference on Learning Representations (ICLR)."},{"key":"e_1_3_1_13_2","doi-asserted-by":"crossref","unstructured":"Bin Li Ziping Wei Jingjing Wu Shuai Yu Tian Zhang Chunli Zhu Dezhi Zheng Weisi Guo Chenglin Zhao and Jun Zhang. 2023. Machine learning-enabled globally guaranteed evolutionary computation. 
Nature Machine Intelligence 5 4 (2023) 457\u2013467.","DOI":"10.1038\/s42256-023-00642-4"},{"key":"e_1_3_1_14_2","first-page":"19490","volume-title":"40th International Conference on Machine Learning (ICML)","volume":"202","author":"Li Pengyi","year":"2023","unstructured":"Pengyi Li, Jianye Hao, Hongyao Tang, Yan Zheng, and Xian Fu. 2023. RACE: Improve multi-agent reinforcement learning with representation asymmetry and collaborative evolution. In 40th International Conference on Machine Learning (ICML), Vol. 202, 19490\u201319503."},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.5555\/3635637.3662972"},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2024.112124"},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i05.6211"},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/ITSC.2018.8569938"},{"key":"e_1_3_1_19_2","first-page":"1","article-title":"Multi-agent actor-critic for mixed cooperative-competitive environments","volume":"30","author":"Lowe Ryan","year":"2017","unstructured":"Ryan Lowe, Yi Wu, Aviv Tamar, Jean Harb, Pieter Abbeel, and Igor Mordatch. 2017. Multi-agent actor-critic for mixed cooperative-competitive environments. In Advances in Neural Information Processing Systems (NIPS), Vol.\u00a030, 1\u201312.","journal-title":"Advances in Neural Information Processing Systems (NIPS)"},{"key":"e_1_3_1_20_2","doi-asserted-by":"crossref","unstructured":"Amjad Yousef Majid Serge Saaybi Vincent Francois-Lavet R. Venkatesha Prasad and Chris Verhoeven. 2023. Deep reinforcement learning versus evolution strategies: A comparative survey. 
IEEE Transactions on Neural Networks and Learning Systems 35 9 (2023) 11939\u201311957.","DOI":"10.1109\/TNNLS.2023.3264540"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.5555\/3463952.3464065"},{"key":"e_1_3_1_22_2","first-page":"2681","volume-title":"International Conference on Machine Learning","author":"Omidshafiei Shayegan","year":"2017","unstructured":"Shayegan Omidshafiei, Jason Pazis, Christopher Amato, Jonathan P. How, and John Vian. 2017. Deep decentralized multi-task multi-agent reinforcement learning under partial observability. In International Conference on Machine Learning. PMLR, 2681\u20132690."},{"key":"e_1_3_1_23_2","first-page":"12208","volume-title":"Advances in Neural Information Processing Systems (NIPS)","volume":"34","author":"Peng Bei","year":"2021","unstructured":"Bei Peng, Tabish Rashid, Christian Schroeder de Witt, Pierre-Alexandre Kamienny, Philip Torr, Wendelin B\u00f6hmer, and Shimon Whiteson. 2021. FACMAC: Factored multi-agent centralised policy gradients. In Advances in Neural Information Processing Systems (NIPS), Vol. 34, 12208\u201312221."},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCYB.2024.3453892"},{"key":"e_1_3_1_25_2","volume-title":"7th International Conference on Learning Representations (ICLR)","author":"Pourchot Alois","year":"2019","unstructured":"Alois Pourchot and Olivier Sigaud. 2019. CEM-RL: Combining evolutionary and gradient-based methods for policy search. In 7th International Conference on Learning Representations (ICLR)."},{"key":"e_1_3_1_26_2","doi-asserted-by":"crossref","unstructured":"Rafael Figueiredo Prudencio Marcos R. O. A. Maximo and Esther Luna Colombini. 2023. A survey on offline reinforcement learning: Taxonomy review and open problems. 
IEEE Transactions on Neural Networks and Learning Systems 35 8 (2023) 10237\u201310257.","DOI":"10.1109\/TNNLS.2023.3250269"},{"key":"e_1_3_1_27_2","unstructured":"Tabish Rashid Mikayel Samvelyan Christian Schroeder De Witt Gregory Farquhar Jakob Foerster and Shimon Whiteson. 2020. Monotonic value function factorisation for deep multi-agent reinforcement learning. Journal of Machine Learning Research 21 178 (2020) 1\u201351."},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41598-023-42257-0"},{"key":"e_1_3_1_29_2","first-page":"5887","volume-title":"36th International Conference on Machine Learning (ICML)","volume":"97","author":"Son Kyunghwan","year":"2019","unstructured":"Kyunghwan Son, Daewoo Kim, Wan Ju Kang, David Earl Hostallero, and Yung Yi. 2019. QTRAN: Learning to factorize with transformation for cooperative multi-agent reinforcement learning. In 36th International Conference on Machine Learning (ICML), Vol. 97, 5887\u20135896."},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1038\/s43588-023-00509-z"},{"key":"e_1_3_1_31_2","volume-title":"10th International Conference on Learning Representations (ICLR)","author":"van der Pol Elise","year":"2022","unstructured":"Elise van der Pol, Herke van Hoof, Frans A. Oliehoek, and Max Welling. 2022. Multi-agent MDP homomorphic networks. In 10th International Conference on Learning Representations (ICLR)."},{"key":"e_1_3_1_32_2","first-page":"4199","article-title":"MDP homomorphic networks: Group symmetries in reinforcement learning","volume":"33","author":"Van der Pol Elise","year":"2020","unstructured":"Elise Van der Pol, Daniel Worrall, Herke van Hoof, Frans Oliehoek, and Max Welling. 2020. MDP homomorphic networks: Group symmetries in reinforcement learning. In Advances in Neural Information Processing Systems (NIPS), Vol. 
33, 4199\u20134210.","journal-title":"Advances in Neural Information Processing Systems (NIPS)"},{"key":"e_1_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v35i11.17203"},{"key":"e_1_3_1_34_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41586-019-1724-z"},{"key":"e_1_3_1_35_2","first-page":"9876","volume-title":"37th International Conference on Machine Learning (ICML)","volume":"119","author":"Wang Tonghan","year":"2020","unstructured":"Tonghan Wang, Heng Dong, Victor Lesser, and Chongjie Zhang. 2020. ROMA: Multi-agent reinforcement learning with emergent roles. In 37th International Conference on Machine Learning (ICML), Vol. 119, 9876\u20139886."},{"key":"e_1_3_1_36_2","first-page":"1","volume-title":"9th International Conference on Learning Representations (ICLR)","author":"Wang T.","year":"2021","unstructured":"T. Wang, T. Gupta, B. Peng, A. Mahajan, S. Whiteson, and C. Zhang. 2021. RODE: Learning roles to decompose multi-agent tasks. In 9th International Conference on Learning Representations (ICLR), 1\u201324."},{"key":"e_1_3_1_37_2","first-page":"427","volume-title":"22nd International Conference on Autonomous Agents and Multiagent Systems (AAMAS)","author":"Wang Xuefeng","year":"2023","unstructured":"Xuefeng Wang, Xinran Li, Jiawei Shao, and Zhang Jun. 2023. AC2C: Adaptively controlled two-hop communication for multi-agent reinforcement learning. In 22nd International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 427\u2013435."},{"key":"e_1_3_1_38_2","first-page":"1","volume-title":"8th International Conference on Learning Representations (ICLR)","author":"Wang Yihan","year":"2020","unstructured":"Yihan Wang, Beining Han, Tonghan Wang, Heng Dong, and Chongjie Zhang. 2020. DOP: Off-policy multi-agent decomposed policy gradients. 
In 8th International Conference on Learning Representations (ICLR), 1\u201324."},{"key":"e_1_3_1_39_2","first-page":"35084","article-title":"Demystifying oversmoothing in attention-based graph neural networks","volume":"36","author":"Wu Xinyi","year":"2024","unstructured":"Xinyi Wu, Amir Ajorlou, Zihui Wu, and Ali Jadbabaie. 2024. Demystifying oversmoothing in attention-based graph neural networks. In Advances in Neural Information Processing Systems (NIPS), Vol. 36, 35084\u201335106.","journal-title":"Advances in Neural Information Processing Systems (NIPS)"},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v32i1.12328"},{"key":"e_1_3_1_41_2","first-page":"1751","article-title":"Selective experience replay compression using coresets for lifelong deep reinforcement learning in medical imaging","volume":"227","author":"Zheng Guangyao","year":"2024","unstructured":"Guangyao Zheng, Samson Zhou, Vladimir Braverman, Michael A. Jacobs, and Vishwa Sanjay Parekh. 2024. Selective experience replay compression using coresets for lifelong deep reinforcement learning in medical imaging. Medical Imaging with Deep Learning 227 (2024), 1751\u20131764.","journal-title":"Medical Imaging with Deep Learning"},{"key":"e_1_3_1_42_2","unstructured":"Ming Zhou Ziyu Wan Hanjing Wang Muning Wen Runzhe Wu Ying Wen Yaodong Yang Yong Yu Jun Wang and Weinan Zhang. 2023. MALib: A parallel framework for population-based multi-agent reinforcement learning. 
Journal of Machine Learning Research 24 150 (2023) 1\u201312."},{"key":"e_1_3_1_43_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2023.126628"}],"container-title":["ACM Transactions on Intelligent Systems and Technology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3742479","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,22]],"date-time":"2025-07-22T23:23:47Z","timestamp":1753226627000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3742479"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,7,22]]},"references-count":42,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2025,8,31]]}},"alternative-id":["10.1145\/3742479"],"URL":"https:\/\/doi.org\/10.1145\/3742479","relation":{},"ISSN":["2157-6904","2157-6912"],"issn-type":[{"value":"2157-6904","type":"print"},{"value":"2157-6912","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,7,22]]},"assertion":[{"value":"2025-01-07","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-04-24","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-07-22","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}