{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,1]],"date-time":"2026-04-01T18:00:12Z","timestamp":1775066412531,"version":"3.50.1"},"reference-count":31,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2022,3,28]],"date-time":"2022-03-28T00:00:00Z","timestamp":1648425600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],"abstract":"<jats:p>With the development and appliance of multi-agent systems, multi-agent cooperation is becoming an important problem in artificial intelligence. Multi-agent reinforcement learning (MARL) is one of the most effective methods for solving multi-agent cooperative tasks. However, the huge sample complexity of traditional reinforcement learning methods results in two kinds of training waste in MARL for cooperative tasks: all homogeneous agents are trained independently and repetitively, and multi-agent systems need training from scratch when adding a new teammate. To tackle these two problems, we propose the knowledge reuse methods of MARL. On the one hand, this paper proposes sharing experience and policy within agents to mitigate training waste. On the other hand, this paper proposes reusing the policies learned by original teams to avoid knowledge waste when adding a new agent. Experimentally, the Pursuit task demonstrates how sharing experience and policy can accelerate the training speed and enhance the performance simultaneously. 
Additionally, transferring the policies learned by the N-agent team enables the (N+1)-agent team to perform cooperative tasks successfully right away, and only a small amount of additional training allows the enlarged team to reach the same optimal performance as training from scratch.<\/jats:p>","DOI":"10.3390\/e24040470","type":"journal-article","created":{"date-parts":[[2022,3,29]],"date-time":"2022-03-29T21:44:52Z","timestamp":1648590292000},"page":"470","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":6,"title":["Knowledge Reuse of Multi-Agent Reinforcement Learning in Cooperative Tasks"],"prefix":"10.3390","volume":"24","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8327-555X","authenticated-orcid":false,"given":"Daming","family":"Shi","sequence":"first","affiliation":[{"name":"Department of Automation, Tsinghua University, Beijing 100084, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7433-0788","authenticated-orcid":false,"given":"Junbo","family":"Tong","sequence":"additional","affiliation":[{"name":"Department of Automation, Tsinghua University, Beijing 100084, China"}]},{"given":"Yi","family":"Liu","sequence":"additional","affiliation":[{"name":"Department of Automation, Tsinghua University, Beijing 100084, China"}]},{"given":"Wenhui","family":"Fan","sequence":"additional","affiliation":[{"name":"Department of Automation, Tsinghua University, Beijing 100084, China"}]}],"member":"1968","published-online":{"date-parts":[[2022,3,28]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"215","DOI":"10.1109\/JPROC.2006.887293","article-title":"Consensus and Cooperation in Networked Multi-Agent Systems","volume":"95","author":"Fax","year":"2007","journal-title":"Proc. 
IEEE"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"563","DOI":"10.1109\/TSMCC.2006.874022","article-title":"Agent-based distributed manufacturing process planning and scheduling: A state-of-the-art survey","volume":"36","author":"Shen","year":"2006","journal-title":"IEEE Trans. Syst. Man Cybern. Part C"},{"key":"ref_3","first-page":"1","article-title":"Analysis and design of steel-making complex logistics system based on multi-Agent","volume":"36","author":"Zhao","year":"2012","journal-title":"Metall. Ind. Autom."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"485","DOI":"10.1109\/TITS.2010.2048313","article-title":"A Review of the Applications of Agent Technology in Traffic and Transportation Systems","volume":"11","author":"Chen","year":"2010","journal-title":"IEEE Trans. Intell. Transp. Syst."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"345","DOI":"10.1023\/A:1008942012299","article-title":"Multiagent Systems: A Survey from a Machine Learning Perspective","volume":"8","author":"Stone","year":"2000","journal-title":"Auton. Robot."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"156","DOI":"10.1109\/TSMCC.2007.913919","article-title":"A comprehensive survey of multi-agent reinforcement learning","volume":"38","author":"Busoniu","year":"2008","journal-title":"IEEE Trans. Syst. Man Cybern. Part C Appl. Rev."},{"key":"ref_7","unstructured":"Tan, M. (1993, January 27\u201329). Multi-agent reinforcement learning: Independent vs. cooperative agents. Proceedings of the 10th International Conference on Machine Learning, Amerhest, MA, USA."},{"key":"ref_8","unstructured":"Whitehead, S.D. (1991). A Complexity Analysis of Cooperative Mechanisms in Reinforcement Learning, AAAI."},{"key":"ref_9","unstructured":"Torrey, L., and Taylor, M. (2013, January 6\u201310). Teaching on a budget: Agents advising agents in reinforcement learning. Proceedings of the 12th Conference on Autonomous Agents and MultiAgent Systems. IFAAMAS, St. 
Paul, MN, USA."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Da Silva, F.L., Glatt, R., and Costa, A.H.R. (2017, May 8\u201312). Simultaneously learning and advising in multiagent reinforcement learning. Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems. IFAAMAS, S\u00e3o Paulo, Brazil.","DOI":"10.1609\/aaai.v31i1.11086"},{"key":"ref_11","unstructured":"Souza, L.O., Ramos, G.D.O., and Ralha, C.G. (2019, November 4\u20136). Experience Sharing Between Cooperative Reinforcement Learning Agents. Proceedings of the 2019 IEEE 31st International Conference on Tools with Artificial Intelligence (ICTAI), Portland, OR, USA."},{"key":"ref_12","first-page":"1633","article-title":"Transfer Learning for Reinforcement Learning Domains: A Survey","volume":"10","author":"Taylor","year":"2009","journal-title":"J. Mach. Learn. Res."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Glatt, R., Silva, F.L.D., and Costa, A.H.R. (2016, October 9\u201312). Towards Knowledge Transfer in Deep Reinforcement Learning. Proceedings of the Brazilian Conference on Intelligent Systems (BRACIS), Recife, Brazil.","DOI":"10.1109\/BRACIS.2016.027"},{"key":"ref_14","unstructured":"Omidshafiei, S., Pazis, J., Amato, C., How, J.P., and Vian, J. (2017, August 6\u201311). Deep Decentralized Multi-task Multi-Agent Reinforcement Learning under Partial Observability. Proceedings of the 34th International Conference on Machine Learning (ICML), Sydney, Australia."},{"key":"ref_15","unstructured":"Wright, M.A., and Horowitz, R. (2019). Attentional Policies for Cross-Context Multi-Agent Reinforcement Learning. arXiv."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"2604","DOI":"10.1109\/LCOMM.2021.3078442","article-title":"A Cooperative Spectrum Sensing with Multi-Agent Reinforcement Learning Approach in Cognitive Radio Networks","volume":"25","author":"Gao","year":"2021","journal-title":"IEEE Commun. 
Lett."},{"key":"ref_17","unstructured":"Pinheiro, F.L., and Santos, F.P. (2018). Local Wealth Redistribution Promotes Cooperation in Multiagent Systems. arXiv."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"189","DOI":"10.2478\/jaiscr-2020-0013","article-title":"Multi Agent Deep Learning with Cooperative Communication","volume":"10","author":"Simes","year":"2020","journal-title":"J. Artif. Intell. Soft Comput. Res."},{"key":"ref_19","first-page":"1","article-title":"A Q-values Sharing Framework for Multi-agent Reinforcement Learning under Budget Constraint","volume":"15","author":"Zhu","year":"2021","journal-title":"ACM Trans. Auton. Adapt. Syst. (TAAS)"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"362","DOI":"10.1007\/s11768-020-00007-x","article-title":"Distributed policy evaluation via inexact ADMM in multi-agent reinforcement learning","volume":"18","author":"Zhao","year":"2020","journal-title":"Control. Theory Technol."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"10","DOI":"10.1007\/s10458-020-09489-0","article-title":"Enabling scalable and fault-tolerant multi-agent systems by utilizing cloud-native computing","volume":"35","author":"Dhling","year":"2021","journal-title":"Auton. Agents Multi-Agent Syst."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Liu, L., Liu, Q., Song, Y., Pang, X., Yuan, X., and Xu, Q. (2021). A Collaborative Control Method of Dual-Arm Robots Based on Deep Reinforcement Learning. Appl. Sci., 11.","DOI":"10.3390\/app11041816"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"1901","DOI":"10.1109\/LGRS.2020.3009823","article-title":"An Online Distributed Satellite Cooperative Observation Scheduling Algorithm Based on Multiagent Deep Reinforcement Learning","volume":"18","author":"Dalin","year":"2020","journal-title":"IEEE Geosci. Remote Sens. 
Lett."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"5717","DOI":"10.1109\/TCYB.2019.2958912","article-title":"Model Learning and Knowledge Sharing for Cooperative Multiagent Systems in Stochastic Environment","volume":"51","author":"Jiang","year":"2021","journal-title":"IEEE Trans. Cybern."},{"key":"ref_25","unstructured":"Souza, L.O., Ramos, G., and Ralha, C.G. (2019). Experience Sharing Between Cooperative Reinforcement Learning Agents. arXiv."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Wang, W., Yang, T., Liu, Y., Hao, J., Hao, X., Hu, Y., Chen, Y., Fan, C., and Gao, Y. (2020, January 2\u20139). From Few to More: Large-Scale Dynamic Multiagent Curriculum Learning. Proceedings of the AAAI Conference on Artificial Intelligence, Online.","DOI":"10.1609\/aaai.v34i05.6221"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"55","DOI":"10.1016\/S1389-0417(01)00015-8","article-title":"Value-function reinforcement learning in Markov games","volume":"2","author":"Littman","year":"2001","journal-title":"J. Cogn. Syst. Res."},{"key":"ref_28","unstructured":"Lauer, M., and Riedmiller, M. (July, January 29). An algorithm for distributed reinforcement learning in cooperative multi-agent systems. Proceedings of the Seventeenth International Conference on Machine Learning (ICML-00), Stanford University, Stanford, CA, USA."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"529","DOI":"10.1038\/nature14236","article-title":"Human-level control through deep reinforcement learning","volume":"518","author":"Mnih","year":"2015","journal-title":"Nature"},{"key":"ref_30","unstructured":"Benda, M., Jagannathan, V., and Dodhiawala, R. (1986). On Optimal Cooperation of Knowledge Sources\u2014An Empirical Investigation, Boeing Advanced Technology Center, Boeing Computing Services. Technical Report BCS\u2013G2010\u201328;."},{"key":"ref_31","unstructured":"Barrett, S., Stone, P., and Kraus, S. (2011, January 2\u20136). 
Empirical evaluation of ad hoc teamwork in the pursuit domain. Proceedings of the International Conference on Autonomous Agents & Multiagent Systems, Taipei, Taiwan."}],"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1099-4300\/24\/4\/470\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T22:44:58Z","timestamp":1760136298000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1099-4300\/24\/4\/470"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,3,28]]},"references-count":31,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2022,4]]}},"alternative-id":["e24040470"],"URL":"https:\/\/doi.org\/10.3390\/e24040470","relation":{},"ISSN":["1099-4300"],"issn-type":[{"value":"1099-4300","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,3,28]]}}}