{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,31]],"date-time":"2026-03-31T09:27:57Z","timestamp":1774949277645,"version":"3.50.1"},"reference-count":29,"publisher":"MDPI AG","issue":"8","license":[{"start":{"date-parts":[[2021,8,13]],"date-time":"2021-08-13T00:00:00Z","timestamp":1628812800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],"abstract":"<jats:p>Recently, deep reinforcement learning (RL) algorithms have achieved significant progress in the multi-agent domain. However, training for increasingly complex tasks would be time-consuming and resource intensive. To alleviate this problem, efficient leveraging of historical experience is essential, which is under-explored in previous studies because most existing methods fail to achieve this goal in a continuously dynamic system owing to their complicated design. In this paper, we propose a method for knowledge reuse called \u201cKnowRU\u201d, which can be easily deployed in the majority of multi-agent reinforcement learning (MARL) algorithms without requiring complicated hand-coded design. We employ the knowledge distillation paradigm to transfer knowledge among agents to shorten the training phase for new tasks while improving the asymptotic performance of agents. To empirically demonstrate the robustness and effectiveness of KnowRU, we perform extensive experiments on state-of-the-art MARL algorithms in collaborative and competitive scenarios. The results show that KnowRU outperforms recently reported methods and not only successfully accelerates the training phase, but also improves the training performance, emphasizing the importance of the proposed knowledge reuse for MARL.<\/jats:p>","DOI":"10.3390\/e23081043","type":"journal-article","created":{"date-parts":[[2021,8,13]],"date-time":"2021-08-13T09:22:38Z","timestamp":1628846558000},"page":"1043","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":16,"title":["KnowRU: Knowledge Reuse via Knowledge Distillation in Multi-Agent Reinforcement Learning"],"prefix":"10.3390","volume":"23","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-5151-3381","authenticated-orcid":false,"given":"Zijian","family":"Gao","sequence":"first","affiliation":[{"name":"College of Computer, National University of Defense Technology, Changsha 410000, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5997-5169","authenticated-orcid":false,"given":"Kele","family":"Xu","sequence":"additional","affiliation":[{"name":"College of Computer, National University of Defense Technology, Changsha 410000, China"}]},{"given":"Bo","family":"Ding","sequence":"additional","affiliation":[{"name":"College of Computer, National University of Defense Technology, Changsha 410000, China"}]},{"given":"Huaimin","family":"Wang","sequence":"additional","affiliation":[{"name":"College of Computer, National University of Defense Technology, Changsha 410000, China"}]}],"member":"1968","published-online":{"date-parts":[[2021,8,13]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"529","DOI":"10.1038\/nature14236","article-title":"Human-level control through deep reinforcement 
learning","volume":"518","author":"Mnih","year":"2015","journal-title":"Nature"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"58","DOI":"10.1145\/203330.203343","article-title":"Temporal difference learning and TD-Gammon","volume":"38","author":"Tesauro","year":"1995","journal-title":"Commun. ACM"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"350","DOI":"10.1038\/s41586-019-1724-z","article-title":"Grandmaster level in StarCraft II using multi-agent reinforcement learning","volume":"575","author":"Vinyals","year":"2019","journal-title":"Nature"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Zhou, X., Wang, H., and Ding, B. (2018, January 21\u201326). How many robots are enough: A multi-objective genetic algorithm for the single-objective time-limited complete coverage problem. Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia.","DOI":"10.1109\/ICRA.2018.8461028"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"645","DOI":"10.1613\/jair.1.11396","article-title":"A survey on transfer learning for multiagent reinforcement learning systems","volume":"64","author":"Costa","year":"2019","journal-title":"J. Artif. Intell. Res."},{"key":"ref_6","unstructured":"Crandall, J.W. (2012, January 4). Just add Pepper: Extending learning algorithms for repeated matrix games to repeated markov games. Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems, Valencia, Spain."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Hernandez-Leal, P., and Kaisers, M. (2017). Towards a fast detection of opponents in repeated stochastic games. International Conference on Autonomous Agents and Multiagent Systems, Springer.","DOI":"10.1007\/978-3-319-71682-4_15"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Kelly, S., and Heywood, M.I. (2015, January 11\u201315). Knowledge transfer from keepaway soccer to half-field offense through program symbiosis: Building simple programs for a complex task. Proceedings of the 2015 Annual Conference on Genetic and Evolutionary Computation, Madrid, Spain.","DOI":"10.1145\/2739480.2754798"},{"key":"ref_9","unstructured":"Hinton, G., Vinyals, O., and Dean, J. (2015). Distilling the knowledge in a neural network. arXiv."},{"key":"ref_10","unstructured":"Bowling, M., and Veloso, M. (2000). An Analysis of Stochastic Game Theory for Multiagent Reinforcement Learning, Carnegie-Mellon Univ Pittsburgh Pa School of Computer Science. Technical Report."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"279","DOI":"10.1007\/BF00992698","article-title":"Q-learning","volume":"8","author":"Watkins","year":"1992","journal-title":"Mach. Learn."},{"key":"ref_12","unstructured":"Silver, D., Lever, G., Heess, N., Degris, T., Wierstra, D., and Riedmiller, M. (2014). Deterministic Policy Gradient Algorithms, PMLR. International Conference on Machine Learning."},{"key":"ref_13","unstructured":"Konda, V.R., and Tsitsiklis, J.N. (2000). Actor-critic algorithms. Advances in Neural Information Processing Systems, MIT Press."},{"key":"ref_14","unstructured":"Lowe, R., Wu, Y.I., Tamar, A., Harb, J., Abbeel, O.P., and Mordatch, I. (2017). Multi-agent actor-critic for mixed cooperative-competitive environments. Advances in Neural Information Processing Systems, MIT Press."},{"key":"ref_15","unstructured":"Iqbal, S., and Sha, F. (2019, January 9\u201315). Actor-attention-critic for multi-agent reinforcement learning. 
Proceedings of the International Conference on Machine Learning (PMLR), Long Beach, CA, USA."},{"key":"ref_16","first-page":"1633","article-title":"Transfer learning for reinforcement learning domains: A survey","volume":"10","author":"Taylor","year":"2009","journal-title":"J. Mach. Learn. Res."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Tan, M. (1993, January 27\u201329). Multi-agent reinforcement learning: Independent vs. cooperative agents. Proceedings of the Tenth International Conference on Machine Learning, Amherst, MA, USA.","DOI":"10.1016\/B978-1-55860-307-3.50049-6"},{"key":"ref_18","unstructured":"Han, J., and Hu, R. (2020, January 20\u201324). Deep fictitious play for finding Markovian Nash equilibrium in multi-agent games. Proceedings of the Mathematical and Scientific Machine Learning (PMLR), Princeton, NJ, USA."},{"key":"ref_19","unstructured":"Sunehag, P., Lever, G., Gruslys, A., Czarnecki, W.M., Zambaldi, V., Jaderberg, M., Lanctot, M., Sonnerat, N., Leibo, J.Z., and Tuyls, K. (2017). Value-decomposition networks for cooperative multi-agent learning. arXiv."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Zhou, M., Chen, Y., Wen, Y., Yang, Y., Su, Y., Zhang, W., Zhang, D., and Wang, J. (2019, January 13\u201315). Factorized q-learning for large-scale multi-agent systems. Proceedings of the First International Conference on Distributed Artificial Intelligence, Beijing, China.","DOI":"10.1145\/3356464.3357707"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"77","DOI":"10.1109\/TCYB.2014.2319733","article-title":"Stochastic abstract policies: Generalizing knowledge to improve reinforcement learning","volume":"45","author":"Koga","year":"2014","journal-title":"IEEE Trans. Cybern."},{"key":"ref_22","unstructured":"Didi, S., and Nitschke, G. (April, January 30). Multi-agent behavior-based policy transfer. Proceedings of the European Conference on the Applications of Evolutionary Computation, Porto, Portugal."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Bucilu\u01ce, C., Caruana, R., and Niculescu-Mizil, A. (2006, January 20\u201323). Model compression. Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery And Data Mining, Philadelphia, PA, USA.","DOI":"10.1145\/1150402.1150464"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Lai, K.H., Zha, D., Li, Y., and Hu, X. (2020). Dual policy distillation. arXiv.","DOI":"10.24963\/ijcai.2020\/435"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Wadhwania, S., Kim, D.K., Omidshafiei, S., and How, J.P. (2019). Policy distillation and value matching in multiagent reinforcement learning. arXiv.","DOI":"10.1109\/IROS40897.2019.8967849"},{"key":"ref_26","unstructured":"Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., and Wierstra, D. (2015). Continuous control with deep reinforcement learning. arXiv."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"1345","DOI":"10.1109\/TKDE.2009.191","article-title":"A survey on transfer learning","volume":"22","author":"Pan","year":"2009","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"9","DOI":"10.1186\/s40537-016-0043-6","article-title":"A survey of transfer learning","volume":"3","author":"Weiss","year":"2016","journal-title":"J. Big Data"},{"key":"ref_29","unstructured":"Kuka\u010dka, J., Golkov, V., and Cremers, D. (2017). Regularization for deep learning: A taxonomy. 
arXiv."}],"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1099-4300\/23\/8\/1043\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T06:45:38Z","timestamp":1760165138000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1099-4300\/23\/8\/1043"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,8,13]]},"references-count":29,"journal-issue":{"issue":"8","published-online":{"date-parts":[[2021,8]]}},"alternative-id":["e23081043"],"URL":"https:\/\/doi.org\/10.3390\/e23081043","relation":{},"ISSN":["1099-4300"],"issn-type":[{"value":"1099-4300","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,8,13]]}}}
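
The record above is a Crossref REST API work envelope: a "status" field, a "message-type" of "work", and a "message" object holding the bibliographic fields. As a minimal sketch of how such a record is retrieved and unpacked, assuming only the third-party `requests` package, the same work can be fetched by DOI from api.crossref.org; every field name used below ("message", "title", "DOI", "issued", "author") appears in the record above.

```python
# Minimal sketch: fetch the Crossref work record shown above by its DOI
# and print a few of its fields. Assumes the `requests` package is installed.
import requests

resp = requests.get("https://api.crossref.org/works/10.3390/e23081043", timeout=30)
resp.raise_for_status()
work = resp.json()["message"]          # the envelope's "message" holds the work

print(work["title"][0])                # "KnowRU: Knowledge Reuse via ..."
print(work["DOI"], work["issued"]["date-parts"][0])
for author in work["author"]:          # given/family names as deposited
    print(author["given"], author["family"])
```

For sustained use, Crossref asks clients to identify themselves, for example by putting a mailto: contact in the User-Agent header; that etiquette detail is omitted from the sketch.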
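The abstract's central mechanism is the knowledge distillation paradigm of Hinton et al. (ref_9), used to transfer knowledge among agents. The record does not spell out KnowRU's exact objective, so the following is only an illustrative PyTorch sketch of that paradigm: a student network is pulled toward a teacher's temperature-softened output distribution. The temperature T, the logit shapes, and the teacher/student roles are assumptions for illustration, not values from the paper.

```python
# Illustrative soft-target distillation loss (Hinton et al., ref_9);
# not KnowRU's exact objective, which this record does not specify.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=2.0):
    # Soften both output distributions with temperature T.
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    # KL(teacher || student), rescaled by T^2 as in the original paper.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)

# Toy usage: 4 samples, 5 discrete actions. In a KnowRU-like setting the
# teacher would be an agent trained on a previous task (an assumption here).
student = torch.randn(4, 5, requires_grad=True)
teacher = torch.randn(4, 5)
distillation_loss(student, teacher).backward()
```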