{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,26]],"date-time":"2025-11-26T11:02:50Z","timestamp":1764154970704,"version":"3.46.0"},"reference-count":60,"publisher":"Oxford University Press (OUP)","issue":"11","license":[{"start":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T00:00:00Z","timestamp":1750291200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/pages\/standard-publication-reuse-rights"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,11,13]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Multi-agent reinforcement learning has been widely applied in solving various sequential decision-making problems in recent years. However, a key challenge is sparse rewards, where agents receive meaningful reward signals only upon task completion. This issue leads to inefficient exploration and slow learning progress. To address this problem, we propose an excellent-student learning method supported by a decentralized distributed learning paradigm, drawing inspiration from academically diverse classrooms. Specifically, we design a novel excellent-student learning model, which suggests that agents mimic the learning behaviors of excellent students. This model requires agents to share knowledge with other agents and engage in individual exploration driven by curiosity. Next, to foster collaborative team learning behaviors, similarity measurement techniques are integrated to enhance knowledge sharing among agents. An intrinsic reward function is designed, combining individual exploration with whole-class sharing, providing additional motivation for discovering new actions and states. This reward function is seamlessly incorporated into the policy learning process. 
Finally, experiments conducted in various multi-agent particle environments demonstrate significant improvements in training efficiency and stability.<\/jats:p>","DOI":"10.1093\/comjnl\/bxaf076","type":"journal-article","created":{"date-parts":[[2025,5,23]],"date-time":"2025-05-23T09:41:48Z","timestamp":1747993308000},"page":"1813-1826","source":"Crossref","is-referenced-by-count":0,"title":["Excellent-student learning method for decentralized MARL with networked agents system"],"prefix":"10.1093","volume":"68","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3414-3328","authenticated-orcid":false,"given":"Yang","family":"Chen","sequence":"first","affiliation":[{"name":"School of Computer Science, Peking University, No. 5 Yiheyuan Road, Haidian District, Beijing 100871,","place":["China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8112-371X","authenticated-orcid":false,"given":"Dianxi","family":"Shi","sequence":"additional","affiliation":[{"name":"Tianjin Artificial Intelligence Innovation Center, No. 19 Xinhuan West Road, Binhai New Area, Tianjin 300457,","place":["China"]},{"name":"Intelligent Game and Decision Lab, Academy of Military Sciences, No. 53 Fengtai East Street Compound, Fengtai District, Beijing 100071,","place":["China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9632-2543","authenticated-orcid":false,"given":"Huanhuan","family":"Yang","sequence":"additional","affiliation":[{"name":"The College of Computer, National University of Defense Technology, No. 109 Deya Road, Yuelu District, Changsha 410073,","place":["China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3975-6852","authenticated-orcid":false,"given":"Tongyue","family":"Li","sequence":"additional","affiliation":[{"name":"Intelligent Game and Decision Lab, Academy of Military Sciences, No. 
53 Fengtai East Street Compound, Fengtai District, Beijing 100071,","place":["China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3291-0768","authenticated-orcid":false,"given":"Zhen","family":"Wang","sequence":"additional","affiliation":[{"name":"Intelligent Game and Decision Lab, Academy of Military Sciences, No. 53 Fengtai East Street Compound, Fengtai District, Beijing 100071,","place":["China"]}]}],"member":"286","published-online":{"date-parts":[[2025,6,19]]},"reference":[{"key":"2025112605421558600_ref1","doi-asserted-by":"publisher","first-page":"487","DOI":"10.1016\/j.jmsy.2021.07.015","article-title":"Optimizing task scheduling in human-robot collaboration with deep multi-agent reinforcement learning","volume":"60","author":"Yu","year":"2021","journal-title":"J Manuf Syst"},{"key":"2025112605421558600_ref2","doi-asserted-by":"crossref","DOI":"10.1609\/icaps.v33i1.27214","article-title":"Binary branching multi-objective conflict-based search for multi-agent path finding","volume-title":"Proceedings of the Thirty-Third International Conference on Automated Planning and Scheduling","author":"Ren","year":"2023"},{"article-title":"Multi-agent actor-critic for mixed cooperative-competitive environments","volume-title":"Advances in Neural Information Processing Systems 30 (NeurIPS 2017)","author":"Lowe","key":"2025112605421558600_ref3"},{"key":"2025112605421558600_ref4","doi-asserted-by":"publisher","first-page":"750","DOI":"10.1007\/s10458-019-09421-1","article-title":"A survey and critique of multiagent deep reinforcement learning","volume":"33","author":"Hernandez-Leal","year":"2019","journal-title":"Auton Agents Multi-Agent Syst"},{"article-title":"Value-decomposition networks for cooperative multi-agent learning","volume-title":"Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS 
'18)","author":"Sunehag","key":"2025112605421558600_ref5"},{"key":"2025112605421558600_ref6","doi-asserted-by":"publisher","first-page":"463","DOI":"10.1093\/comjnl\/bxac187","article-title":"Finding influencers in complex networks: an effective deep reinforcement learning approach","volume":"67","author":"Liu","year":"2024","journal-title":"Comput J"},{"article-title":"The surprising effectiveness of ppo in cooperative multi-agent games","volume-title":"Advances in Neural Information Processing Systems 35 (NeurIPS 2022)","author":"Yu","key":"2025112605421558600_ref7"},{"key":"2025112605421558600_ref8","doi-asserted-by":"publisher","first-page":"973","DOI":"10.1093\/comjnl\/bxab040","article-title":"A virtual network embedding algorithm based on double-layer reinforcement learning","volume":"64","author":"Li","year":"2021","journal-title":"Comput J"},{"key":"2025112605421558600_ref9","doi-asserted-by":"publisher","first-page":"650","DOI":"10.1177\/0278364913483345","article-title":"Decentralized multi-robot cooperation with auctioned pomdps","volume":"32","author":"Capitan","year":"2013","journal-title":"Int J Rob Res"},{"article-title":"Multi-Type Mean Field Reinforcement Learning","volume-title":"Proceedings of the 19th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS 2020)","author":"Subramanian","key":"2025112605421558600_ref10"},{"journal-title":"Proceedings of the 19th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS 2020)","article-title":"A new framework for multi-agent reinforcement learning\u2014centralized training and exploration with decentralized execution via policy distillation","author":"Chen","key":"2025112605421558600_ref11"},{"key":"2025112605421558600_ref12","doi-asserted-by":"publisher","first-page":"2317","DOI":"10.1093\/comjnl\/bxae008","article-title":"An intelligent security system using enhanced anomaly-based detection 
scheme","volume":"67","author":"Louati","year":"2024","journal-title":"Comput J"},{"key":"2025112605421558600_ref13","doi-asserted-by":"publisher","first-page":"1575","DOI":"10.1093\/comjnl\/bxab100","article-title":"A collaborative learning-based algorithm for task offloading in UAV-aided wireless sensor networks","volume":"64","author":"Al-Share","year":"2021","journal-title":"Comput J"},{"key":"2025112605421558600_ref14","first-page":"2778","article-title":"Curiosity-driven exploration by self-supervised prediction","volume-title":"International Conference on Machine Learning","author":"Pathak","year":"2017"},{"key":"2025112605421558600_ref15","article-title":"Mean-field multi-agent reinforcement learning: a decentralized network approach","volume":"50","author":"Gu","journal-title":"Mathematics of Operations Research"},{"key":"2025112605421558600_ref16","doi-asserted-by":"publisher","first-page":"5925","DOI":"10.1109\/TAC.2021.3049345","article-title":"Finite-sample analysis for decentralized batch multiagent reinforcement learning with networked agents","volume":"66","author":"Zhang","year":"2021","journal-title":"IEEE Trans Automat Contr"},{"key":"2025112605421558600_ref17","doi-asserted-by":"publisher","first-page":"472","DOI":"10.1016\/j.trc.2018.05.011","article-title":"Dynamic operations and pricing of electric unmanned aerial vehicle systems and power networks","volume":"92","author":"Zhang","year":"2018","journal-title":"Transp Res Part C: Emerging Technol"},{"key":"2025112605421558600_ref18","doi-asserted-by":"publisher","first-page":"802","DOI":"10.1631\/FITEE.1900661","article-title":"Decentralized multi-agent reinforcement learning with networked agents: Recent advances","volume":"22","author":"Zhang","year":"2021","journal-title":"Front Inf Technol Electron Eng"},{"volume-title":"Large-Scale Study of Curiosity-Driven 
Learning","author":"Burda","key":"2025112605421558600_ref19"},{"key":"2025112605421558600_ref20","doi-asserted-by":"publisher","first-page":"110032","DOI":"10.1016\/j.comnet.2023.110032","article-title":"Decentralized piggybacking-based dissemination of cooperative awareness messages in vehicular ad-hoc networks","volume":"236","author":"Xiao","year":"2023","journal-title":"Comput Netw"},{"key":"2025112605421558600_ref21","doi-asserted-by":"publisher","first-page":"109624","DOI":"10.1016\/j.comnet.2023.109624","article-title":"A decentralized adaptation of model-free q-learning for thermal-aware energy-efficient virtual machine placement in cloud data centers","volume":"224","author":"Aghasi","year":"2023","journal-title":"Comput Netw"},{"journal-title":"Advances in Neural Information Processing Systems (NeurIPS 2022)","article-title":"Exploration-guided reward shaping for reinforcement learning under sparse rewards","author":"","key":"2025112605421558600_ref22"},{"key":"2025112605421558600_ref23","doi-asserted-by":"publisher","DOI":"10.1145\/3206157.3206174","article-title":"Overview on deepmind and its AlphaGo Zero AI","volume-title":"Proceedings of the 2018 International Conference on Big Data and Education (ICBDE '18)","author":"Holcomb"},{"key":"2025112605421558600_ref24","doi-asserted-by":"publisher","first-page":"501","DOI":"10.1016\/j.jmsy.2020.07.001","article-title":"Utilization of a reinforcement learning algorithm for the accurate alignment of a robotic arm in a complete soft fabric shoe tongues automation process","volume":"56","author":"Tsai","year":"2020","journal-title":"J Manuf Syst"},{"volume-title":"How to Differentiate Instruction in Mixed-Ability Classrooms","author":"Tomlinson","key":"2025112605421558600_ref25"},{"key":"2025112605421558600_ref26","doi-asserted-by":"publisher","DOI":"10.4324\/9780203181522","volume-title":"Visible Learning for Teachers: Maximizing Impact on 
Learning","author":"Hattie","year":"2012"},{"key":"2025112605421558600_ref27","doi-asserted-by":"crossref","DOI":"10.1609\/icaps.v33i1.27191","article-title":"Model checking for adversarial multi-agent reinforcement learning with reactive defense methods","volume-title":"Proceedings of the 33rd International Conference on Automated Planning and Scheduling (ICAPS 2023)","author":"Gross","year":"2023"},{"key":"2025112605421558600_ref28","doi-asserted-by":"publisher","first-page":"402","DOI":"10.1609\/icaps.v33i1.27219","article-title":"Multi agent path finding under obstacle uncertainty","volume":"33","author":"Shofer","year":"2023","journal-title":"Proceedings of the International Conference on Automated Planning and Scheduling"},{"key":"2025112605421558600_ref29","doi-asserted-by":"publisher","first-page":"110019","DOI":"10.1016\/j.comnet.2023.110019","article-title":"CRLM: a cooperative model based on reinforcement learning and metaheuristic algorithms of routing protocols in wireless sensor networks","volume":"236","author":"Wang","year":"2023","journal-title":"Comput Netw"},{"key":"2025112605421558600_ref30","doi-asserted-by":"publisher","first-page":"1848","DOI":"10.1109\/TSP.2013.2241057","article-title":"QD-Learning: a collaborative distributed strategy for multiagent reinforcement learning through consensus + innovations","volume":"61","author":"Kar","year":"2013","journal-title":"IEEE Trans Signal Process"},{"article-title":"MANSA: learning fast and slow in multi-agent systems","volume-title":"Proceedings of the 40th International Conference on Machine Learning (ICML 2023)","author":"Mguni","key":"2025112605421558600_ref31"},{"key":"2025112605421558600_ref32","doi-asserted-by":"crossref","DOI":"10.1609\/aaai.v37i9.26288","article-title":"DeCOM: decomposed policy for constrained cooperative multi-agent reinforcement learning","volume-title":"Proceedings of the AAAI Conference on Artificial 
Intelligence","author":"Yang","year":"2023"},{"key":"2025112605421558600_ref33","doi-asserted-by":"publisher","first-page":"566","DOI":"10.1109\/TNET.2023.3289172","article-title":"Ensuring threshold AoI for UAV-assisted mobile crowdsensing by multi-agent deep reinforcement learning with transformer","volume":"32","author":"Wang","year":"2024","journal-title":"IEEE\/ACM Trans Netw"},{"key":"2025112605421558600_ref34","doi-asserted-by":"publisher","first-page":"3663","DOI":"10.1109\/TAC.2019.2953089","article-title":"Renewal Monte Carlo: Renewal theory-based reinforcement learning","volume":"65","author":"Subramanian","year":"2019","journal-title":"IEEE Trans Automat Contr"},{"article-title":"Actor-attention-critic for multi-agent reinforcement learning","volume-title":"Proceedings of the 36th International Conference on Machine Learning (ICML 2019)","author":"Iqbal","key":"2025112605421558600_ref35"},{"article-title":"Decentralized online convex optimization in networked systems","volume-title":"Proceedings of the 39th International Conference on Machine Learning (ICML 2022)","author":"Lin","key":"2025112605421558600_ref36"},{"key":"2025112605421558600_ref37","first-page":"238","article-title":"Decentralized gossip-based stochastic bilevel optimization over communication networks","volume":"35","author":"Yang","year":"2022","journal-title":"Advances in Neural Information Processing Systems"},{"article-title":"Mean field multi-agent reinforcement learning","volume-title":"Proceedings of the 35th International Conference on Machine Learning (ICML 2018)","author":"Yang","key":"2025112605421558600_ref38"},{"key":"2025112605421558600_ref39","first-page":"1","article-title":"On the approximation of cooperative heterogeneous multi-agent reinforcement learning (MARL) using mean field control (mfc)","volume":"23","author":"Mondal","year":"2022","journal-title":"J Mach Learn Res"},{"article-title":"Breaking the curse of many agents: provable mean embedding q-iteration for 
mean-field reinforcement learning","volume-title":"Proceedings of the 37th International Conference on Machine Learning (ICML 2020)","author":"Wang","key":"2025112605421558600_ref40"},{"key":"2025112605421558600_ref41","doi-asserted-by":"publisher","first-page":"54","DOI":"10.1006\/ceps.1999.1020","article-title":"Intrinsic and extrinsic motivations: classic definitions and new directions","volume":"25","author":"Ryan","year":"2000","journal-title":"Contemporary educational psychology"},{"key":"2025112605421558600_ref42","doi-asserted-by":"publisher","first-page":"313","DOI":"10.3389\/fpsyg.2013.00313","article-title":"PowerPlay: training an increasingly general problem solver by continually searching for the simplest still unsolvable problem","volume":"4","author":"Schmidhuber","year":"2013","journal-title":"Front Psychol"},{"article-title":"Unifying Count-Based Exploration and Intrinsic Motivation","volume-title":"Advances in Neural Information Processing Systems (NeurIPS 2016)","author":"Bellemare","key":"2025112605421558600_ref43"},{"key":"2025112605421558600_ref44","doi-asserted-by":"crossref","DOI":"10.1609\/aaai.v34i04.5955","article-title":"Count-based exploration with the successor representation","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","author":"Machado"},{"key":"2025112605421558600_ref45","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1155\/2022\/1238571","article-title":"Count-based exploration via embedded state space for deep reinforcement learning","volume":"2022","author":"Liu","year":"2022","journal-title":"Wireless Commun Mobile Comput"},{"key":"2025112605421558600_ref46","first-page":"3757","article-title":"Episodic multi-agent reinforcement learning with curiosity-driven exploration","volume":"34","author":"Zheng","year":"2021","journal-title":"Advances in Neural Information Processing Systems"},{"journal-title":"CoRR","article-title":"Intrinsic motivation for encouraging synergistic 
behavior","author":"Chitnis","key":"2025112605421558600_ref47"},{"key":"2025112605421558600_ref48","first-page":"15774","article-title":"Promoting coordination through policy regularization in multi-agent deep reinforcement learning","volume":"33","author":"Roy","year":"2020","journal-title":"Advances in Neural Information Processing Systems"},{"key":"2025112605421558600_ref49","doi-asserted-by":"crossref","DOI":"10.1109\/ICASSP40776.2020.9054546","article-title":"Attention-based curiosity-driven exploration in deep reinforcement learning","volume-title":"Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2020)","author":"Reizinger"},{"author":"Haarnoja","key":"2025112605421558600_ref50","article-title":"Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor"},{"key":"2025112605421558600_ref51","first-page":"8304","article-title":"ELIGN: expectation alignment as a multi-agent intrinsic reward","volume":"35","author":"Ma","year":"2022","journal-title":"Advances in Neural Information Processing Systems"},{"key":"2025112605421558600_ref52","doi-asserted-by":"publisher","first-page":"715","DOI":"10.1038\/445715a","article-title":"Collective minds","volume":"445","author":"Couzin","year":"2007","journal-title":"Nature"},{"article-title":"ROMA: multi-agent reinforcement learning with emergent roles","volume-title":"Proceedings of the 37th International Conference on Machine Learning (ICML 2020)","author":"Wang","key":"2025112605421558600_ref53"},{"key":"2025112605421558600_ref54","doi-asserted-by":"crossref","DOI":"10.1609\/aaai.v35i13.17357","article-title":"Coordination between individual agents in multi-agent reinforcement learning","volume-title":"Proceedings of the AAAI Conference on Artificial 
Intelligence","author":"Zhang","year":"2021"},{"key":"2025112605421558600_ref55","doi-asserted-by":"publisher","first-page":"895","DOI":"10.1007\/s10462-021-09996-w","article-title":"Multi-agent deep reinforcement learning: a survey","volume":"55","author":"Gronauer","year":"2022","journal-title":"Artif Intell Rev"},{"key":"2025112605421558600_ref56","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-60801-3_27","article-title":"Multilayer perceptron (MLP)","volume-title":"Geomatic Approaches for Modeling Land Change Scenarios","author":"Taud"},{"article-title":"Model-based offline reinforcement learning with count-based conservatism","volume-title":"Proceedings of the 40th International Conference on Machine Learning (ICML 2023)","author":"Kim","key":"2025112605421558600_ref57"},{"key":"2025112605421558600_ref58","doi-asserted-by":"publisher","first-page":"575","DOI":"10.1146\/annurev-ento-020117-043357","article-title":"Correlates and consequences of worker polymorphism in ants","volume":"63","author":"Wills","year":"2018","journal-title":"Annu Rev Entomol"},{"key":"2025112605421558600_ref59","doi-asserted-by":"publisher","first-page":"345","DOI":"10.1146\/annurev.en.08.010163.002021","article-title":"The social biology of ants","volume":"8","author":"Wilson","year":"1963","journal-title":"Annu Rev Entomol"},{"key":"2025112605421558600_ref60","doi-asserted-by":"publisher","first-page":"10980","DOI":"10.1109\/TNNLS.2022.3172168","article-title":"Multiagent soft actor-critic based hybrid motion planner for mobile robots","volume":"34","author":"He","year":"2023","journal-title":"IEEE Trans Neural Networks Learn Syst"}],"container-title":["The Computer 
Journal"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/comjnl\/article-pdf\/68\/11\/1813\/63527904\/bxaf076.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/comjnl\/article-pdf\/68\/11\/1813\/63527904\/bxaf076.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,11,26]],"date-time":"2025-11-26T10:42:28Z","timestamp":1764153748000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/comjnl\/article\/68\/11\/1813\/8169301"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,19]]},"references-count":60,"journal-issue":{"issue":"11","published-online":{"date-parts":[[2025,6,19]]},"published-print":{"date-parts":[[2025,11,13]]}},"URL":"https:\/\/doi.org\/10.1093\/comjnl\/bxaf076","relation":{},"ISSN":["0010-4620","1460-2067"],"issn-type":[{"type":"print","value":"0010-4620"},{"type":"electronic","value":"1460-2067"}],"subject":[],"published-other":{"date-parts":[[2025,11]]},"published":{"date-parts":[[2025,6,19]]}}}