{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T04:26:03Z","timestamp":1773807963810,"version":"3.50.1"},"reference-count":29,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2024,5,7]],"date-time":"2024-05-07T00:00:00Z","timestamp":1715040000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Neurorobot."],"abstract":"<jats:p>Multiagent Reinforcement Learning (MARL) has been well adopted due to its exceptional ability to solve multiagent decision-making problems. To further enhance learning efficiency, knowledge transfer algorithms have been developed, among which experience-sharing-based and action-advising-based transfer strategies share the mainstream. However, it is notable that, although there exist many successful applications of both strategies, they are not flawless. For the long-developed action-advising-based methods (namely KT-AA, short for knowledge transfer based on action advising), their data efficiency and scalability are not satisfactory. As for the newly proposed experience-sharing-based knowledge transfer methods (KT-ES), although the shortcomings of KT-AA have been partially overcome, they are incompetent to correct specific bad decisions in the later learning stage. To leverage the superiority of both KT-AA and KT-ES, this study proposes KT-Hybrid, a hybrid knowledge transfer approach. In the early learning phase, KT-ES methods are employed, expecting better data efficiency from KT-ES to enhance the policy to a basic level as soon as possible. Later, we focus on correcting specific errors made by the basic policy, trying to use KT-AA methods to further improve the performance. 
Simulations demonstrate that the proposed KT-Hybrid outperforms well-received action-advising- and experience-sharing-based methods.<\/jats:p>","DOI":"10.3389\/fnbot.2024.1364587","type":"journal-article","created":{"date-parts":[[2024,5,7]],"date-time":"2024-05-07T10:16:27Z","timestamp":1715076987000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["Hybrid knowledge transfer for MARL based on action advising and experience sharing"],"prefix":"10.3389","volume":"18","author":[{"given":"Feng","family":"Liu","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Dongqi","family":"Li","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jian","family":"Gao","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1965","published-online":{"date-parts":[[2024,5,7]]},"reference":[{"key":"B1","first-page":"804","article-title":"\u201cInteractive teaching strategies for agent training,\u201d","volume-title":"Proceedings of the 25th International Joint Conference on Artificial Intelligence","author":"Amir","year":"2016"},{"key":"B2","volume-title":"Learning and Sequential Decision Making","author":"Barto","year":"1989"},{"key":"B3","doi-asserted-by":"publisher","first-page":"2935","DOI":"10.1109\/TSG.2022.3154718","article-title":"Reinforcement learning for selective key applications in power systems: recent advances and future challenges","volume":"13","author":"Chen","year":"2022","journal-title":"IEEE Trans. Smart Grid"},{"key":"B4","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1613\/jair.2584","article-title":"Interactive policy learning through confidence-based autonomy","volume":"34","author":"Chernova","year":"2009","journal-title":"J. Artif. Intell. 
Res"},{"key":"B5","doi-asserted-by":"publisher","first-page":"414","DOI":"10.1038\/s41586-021-04301-9","article-title":"Magnetic control of tokamak plasmas through deep reinforcement learning","volume":"602","author":"Degrave","year":"2022","journal-title":"Nature"},{"key":"B6","first-page":"1879","article-title":"\u201cStabilising experience replay for deep multi-agent reinforcement learning,\u201d","volume-title":"Proceedings of the 34th International Conference on Machine Learning","author":"Foerster","year":"2017"},{"key":"B7","article-title":"\u201cHalf field offense: an environment for multiagent learning and ad hoc teamwork,\u201d","author":"Hausknecht","year":"2016","journal-title":"Proceedings of AAMAS Adaptive Learning Agents (ALA) Workshop"},{"key":"B8","doi-asserted-by":"publisher","first-page":"601","DOI":"10.1109\/TEVC.2017.2664665","article-title":"An evolutionary transfer reinforcement learning framework for multiagent systems","volume":"21","author":"Hou","year":"2017","journal-title":"IEEE Trans. Evolut. Comput"},{"key":"B9","doi-asserted-by":"publisher","first-page":"5962","DOI":"10.1109\/TSMC.2019.2958846","article-title":"Evolutionary multiagent transfer learning with model-based opponent behavior prediction","volume":"51","author":"Hou","year":"2021","journal-title":"IEEE Trans. Syst. Man Cyber"},{"key":"B10","doi-asserted-by":"publisher","first-page":"698","DOI":"10.1177\/0278364920987859","article-title":"How to train your robot with deep reinforcement learning: lessons we have learned","volume":"40","author":"Ibarz","year":"2021","journal-title":"Int. J. Robot. 
Res"},{"key":"B11","first-page":"629","article-title":"\u201cAction advising with advice imitation in deep reinforcement learning,\u201d","volume-title":"Proceedings of the 20th International Joint Conference on Autonomous Agents and Multiagent Systems","author":"Ilhan","year":"2021"},{"key":"B12","article-title":"Playing atari with deep reinforcement learning","author":"Mnih","year":"2013","journal-title":"arXiv preprint arXiv:1312.5602"},{"key":"B13","doi-asserted-by":"publisher","first-page":"529","DOI":"10.1038\/nature14236","article-title":"Human-level control through deep reinforcement learning","volume":"518","author":"Mnih","year":"2015","journal-title":"Nature"},{"key":"B14","first-page":"6128","article-title":"\u201cLearning to teach in cooperative multiagent reinforcement learning,\u201d","volume-title":"Proceedings of the 33rd AAAI Conference on Artificial Intelligence (AAAI)","author":"Omidshafiei","year":"2019"},{"key":"B15","first-page":"443","article-title":"\u201cLenient multi-agent deep reinforcement learning,\u201d","volume-title":"Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems","author":"Palmer","year":"2018"},{"key":"B16","doi-asserted-by":"publisher","first-page":"1095","DOI":"10.1073\/pnas.39.10.1095","article-title":"Stochastic games","volume":"39","author":"Shapley","year":"1953","journal-title":"Proc. Natl. Acad. Sci"},{"key":"B17","doi-asserted-by":"publisher","first-page":"645","DOI":"10.1613\/jair.1.11396","article-title":"A survey on transfer learning for multiagent reinforcement learning systems","volume":"64","author":"Silva","year":"2019","journal-title":"J. Artif. Intell. 
Res"},{"key":"B18","first-page":"1100","article-title":"\u201cSimultaneously learning and advising in multiagent reinforcement learning,\u201d","volume-title":"Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems","author":"Silva","year":"2017"},{"key":"B19","first-page":"5792","article-title":"\u201cUncertainty-aware action advising for deep reinforcement learning agents,\u201d","volume-title":"Proceedings of the 34th AAAI Conference on Artificial Intelligence","author":"Silva","year":"2020"},{"key":"B20","doi-asserted-by":"publisher","first-page":"1140","DOI":"10.1126\/science.aar6404","article-title":"A general reinforcement learning algorithm that masters chess, shogi, and go through self-play","volume":"362","author":"Silver","year":"2018","journal-title":"Science"},{"key":"B21","volume-title":"Reinforcement Learning: An Introduction","author":"Sutton","year":"2018"},{"key":"B22","doi-asserted-by":"publisher","first-page":"e0172395","DOI":"10.1371\/journal.pone.0172395","article-title":"Multiagent cooperation and competition with deep reinforcement learning","volume":"12","author":"Tampuu","year":"2017","journal-title":"PLoS ONE"},{"key":"B23","first-page":"330","article-title":"\u201cMulti-agent reinforcement learning: Independent vs. 
cooperative agents,\u201d","volume-title":"Proceedings of the 10th International Conference on Machine Learning","author":"Tan","year":"1993"},{"key":"B24","first-page":"1053","article-title":"\u201cTeaching on a budget: agents advising agents in reinforcement learning,\u201d","volume-title":"Proceedings of the 12th International Conference on Autonomous Agents and Multiagent Systems","author":"Torrey","year":"2013"},{"key":"B25","doi-asserted-by":"publisher","first-page":"3","DOI":"10.1007\/s12293-021-00339-4","article-title":"Experience sharing based memetic transfer learning for multiagent reinforcement learning","volume":"14","author":"Wang","year":"2022","journal-title":"Memetic Comput"},{"key":"B26","doi-asserted-by":"publisher","first-page":"101475","DOI":"10.1016\/j.swevo.2024.101475","article-title":"Automated design of action advising trigger conditions for multiagent reinforcement learning: a genetic programming-based approach","volume":"85","author":"Wang","year":"2024","journal-title":"Swarm Evolut. Comput"},{"key":"B27","doi-asserted-by":"publisher","first-page":"2735","DOI":"10.1007\/s40747-021-00423-9","article-title":"S2es: a stationary and scalable knowledge transfer approach for multiagent reinforcement learning","volume":"7","author":"Wang","year":"2021","journal-title":"Complex Intell. Syst"},{"key":"B28","doi-asserted-by":"publisher","first-page":"279","DOI":"10.1007\/BF00992698","article-title":"Q-learning","volume":"8","author":"Watkins","year":"1992","journal-title":"Mach. 
Learn"},{"key":"B29","first-page":"6672","article-title":"\u201cMastering complex control in moba games with deep reinforcement learning,\u201d","volume-title":"Proceedings of the 34th AAAI Conference on Artificial Intelligence","author":"Ye","year":"2020"}],"container-title":["Frontiers in Neurorobotics"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fnbot.2024.1364587\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,5,7]],"date-time":"2024-05-07T10:16:38Z","timestamp":1715076998000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fnbot.2024.1364587\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,5,7]]},"references-count":29,"alternative-id":["10.3389\/fnbot.2024.1364587"],"URL":"https:\/\/doi.org\/10.3389\/fnbot.2024.1364587","relation":{},"ISSN":["1662-5218"],"issn-type":[{"value":"1662-5218","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,5,7]]},"article-number":"1364587"}}