{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T15:24:55Z","timestamp":1772119495134,"version":"3.50.1"},"reference-count":56,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2026,1,10]],"date-time":"2026-01-10T00:00:00Z","timestamp":1768003200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2026,1,10]],"date-time":"2026-01-10T00:00:00Z","timestamp":1768003200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001777","name":"The University of Wollongong","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100001777","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Auton Agent Multi-Agent Syst"],"published-print":{"date-parts":[[2026,6]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Due to its exceptional learning ability, multi-agent deep reinforcement learning (MADRL) has garnered widespread research interest. However, since the learning is data-driven and involves sampling from millions of steps, training a large number of agents is inherently challenging and inefficient. Inspired by the human learning process, we aim to transfer knowledge from humans to avoid starting from scratch. Given the growing emphasis on the Human-on-the-Loop concept, this study focuses on addressing the challenges of large-population learning by incorporating suboptimal human knowledge into the cooperative multi-agent environment. To leverage human experience, we integrate human knowledge into the training process of MADRL, representing it in natural language rather than specific action-state pairs. Compared to previous works, we further consider the attributes of transferred knowledge to assess its impact on algorithm scalability. Additionally, we examine several features of knowledge mapping to effectively convert human knowledge to the action space where agent learning occurs. In reaction to the disparity in knowledge construction between humans and agents, our approach allows agents to decide freely which portions of the state space to leverage human knowledge. From the challenging domains of the StarCraft Multi-agent Challenge, our method successfully alleviates the scalability issue in MADRL. Furthermore, we find that, despite individual-type knowledge significantly accelerating the training process, cooperative-type knowledge is more desirable for addressing a large agent population. We hope this study provides valuable insights into applying and mapping human knowledge, ultimately enhancing the interpretability of agent behavior.<\/jats:p>","DOI":"10.1007\/s10458-025-09729-1","type":"journal-article","created":{"date-parts":[[2026,1,10]],"date-time":"2026-01-10T07:36:25Z","timestamp":1768030585000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Improving scalability of multi-agent deep reinforcement learning with suboptimal human knowledge"],"prefix":"10.1007","volume":"40","author":[{"given":"Dingbang","family":"Liu","sequence":"first","affiliation":[]},{"given":"Fenghui","family":"Ren","sequence":"additional","affiliation":[]},{"given":"Jun","family":"Yan","sequence":"additional","affiliation":[]},{"given":"Guoxin","family":"Su","sequence":"additional","affiliation":[]},{"given":"Wen","family":"Gu","sequence":"additional","affiliation":[]},{"given":"Shohei","family":"Kato","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2026,1,10]]},"reference":[{"key":"9729_CR1","doi-asserted-by":"publisher","unstructured":"Barwise, J. (1977). An introduction to first-order logic. In Studies in Logic and the Foundations of Mathematics (vol.\u00a090, pp. 5\u201346). Elsevier. https:\/\/doi.org\/10.1016\/S0049-237X(08)71097-8","DOI":"10.1016\/S0049-237X(08)71097-8"},{"key":"9729_CR2","doi-asserted-by":"publisher","first-page":"100914","DOI":"10.1016\/J.SWEVO.2021.100914","volume":"65","author":"H Chen","year":"2021","unstructured":"Chen, H., Wang, C., Huang, J., & Gong, J. (2021). Efficient use of heuristics for accelerating xcs-based policy learning in Markov games. Swarm and Evolutionary Computation, 65, 100914. https:\/\/doi.org\/10.1016\/J.SWEVO.2021.100914","journal-title":"Swarm and Evolutionary Computation"},{"key":"9729_CR3","unstructured":"Christianos, F., Papoudakis, G., Rahman, M. A., & Albrecht, S. V. (2021). Scaling multi-agent reinforcement learning with selective parameter sharing. In M. Meila, T. Zhang (Eds.) Proceedings of the 38th International conference on machine learning (vol. 139, pp. 1989\u20131998). PMLR. http:\/\/proceedings.mlr.press\/v139\/christianos21a.html"},{"issue":"3","key":"9729_CR4","doi-asserted-by":"publisher","first-page":"1086","DOI":"10.1109\/TITS.2019.2901791","volume":"21","author":"T Chu","year":"2019","unstructured":"Chu, T., Wang, J., Codec\u00e0, L., & Li, Z. (2019). Multi-agent deep reinforcement learning for large-scale traffic signal control. IEEE Transactions on Intelligent Transportation Systems, 21(3), 1086\u20131095. https:\/\/doi.org\/10.1109\/TITS.2019.2901791","journal-title":"IEEE Transactions on Intelligent Transportation Systems"},{"key":"9729_CR5","doi-asserted-by":"publisher","unstructured":"Cui, K., Tahir, A., Ekinci, G., Elshamanhory, A., Eich, Y., Li, M., & Koeppl, H. (2022). A survey on large-population systems and scalable multi-agent reinforcement learning. arXiv:2209.03859. https:\/\/doi.org\/10.48550\/arXiv.2209.03859.","DOI":"10.48550\/arXiv.2209.03859"},{"key":"9729_CR6","doi-asserted-by":"publisher","first-page":"645","DOI":"10.1613\/JAIR.1.11396","volume":"64","author":"FL Da Silva","year":"2019","unstructured":"Da Silva, F. L., & Costa, A. H. R. (2019). A survey on transfer learning for multiagent reinforcement learning systems. Journal of Artificial Intelligence Research, 64, 645\u2013703. https:\/\/doi.org\/10.1613\/JAIR.1.11396","journal-title":"Journal of Artificial Intelligence Research"},{"key":"9729_CR7","doi-asserted-by":"publisher","unstructured":"Deka A, Sycara K (2021) Natural emergence of heterogeneous strategies in artificially intelligent competitive teams. In Y. Tan, & Y. Shi (Eds.) Proceedings of the swarm intelligence: 12th International conference (pp. 13\u201325). Springer. https:\/\/doi.org\/10.1007\/978-3-030-78743-1_2","DOI":"10.1007\/978-3-030-78743-1_2"},{"issue":"5","key":"9729_CR8","doi-asserted-by":"publisher","first-page":"3215","DOI":"10.1007\/S10462-020-09938-Y","volume":"54","author":"W Du","year":"2021","unstructured":"Du, W., & Ding, S. (2021). A survey on multi-agent deep reinforcement learning: From the perspective of challenges and applications. Artificial Intelligence Review, 54(5), 3215\u20133238. https:\/\/doi.org\/10.1007\/S10462-020-09938-Y","journal-title":"Artificial Intelligence Review"},{"issue":"5","key":"9729_CR9","doi-asserted-by":"publisher","first-page":"297","DOI":"10.1049\/iet-its.2019.0317","volume":"14","author":"J Duan","year":"2020","unstructured":"Duan, J., Eben Li, S., Guan, Y., Sun, Q., & Cheng, B. (2020). Hierarchical reinforcement learning for self-driving decision-making without reliance on labelled driving data. IET Intelligent Transport Systems, 14(5), 297\u2013305. https:\/\/doi.org\/10.1049\/iet-its.2019.0317","journal-title":"IET Intelligent Transport Systems"},{"key":"9729_CR10","doi-asserted-by":"publisher","unstructured":"Foerster, J., Farquhar, G., Afouras, T., Nardelli, N., & Whiteson, S. (2018). Counterfactual multi-agent policy gradients. In S. A. McIlraith, & K. Q. Weinberger (Eds.) Proceedings of the 32nd AAAI conference on artificial intelligence (pp. 2974\u20132982). AAAI Press. https:\/\/doi.org\/10.1609\/AAAI.V32I1.11794","DOI":"10.1609\/AAAI.V32I1.11794"},{"key":"9729_CR11","unstructured":"Fu, Q., Ai, X., Yi, J., Qiu, T., Yuan, W., & Pu, Z. (2022). Learning heterogeneous agent cooperation via multiagent league training. arXiv:2211.11616. https:\/\/arxiv.org\/abs\/2211.11616"},{"key":"9729_CR12","unstructured":"Glanois, C., Weng, P., Zimmer, M., Li, D., Yang, T., Hao, J., & Liu, W. (2021). A survey on interpretable reinforcement learning. arXiv:2112.13112. https:\/\/arxiv.org\/abs\/2112.13112"},{"issue":"2","key":"9729_CR13","doi-asserted-by":"publisher","first-page":"895","DOI":"10.1007\/S10462-021-09996-W","volume":"55","author":"S Gronauer","year":"2022","unstructured":"Gronauer, S., & Diepold, K. (2022). Multi-agent deep reinforcement learning: A survey. Artificial Intelligence Review, 55(2), 895\u2013943. https:\/\/doi.org\/10.1007\/S10462-021-09996-W","journal-title":"Artificial Intelligence Review"},{"key":"9729_CR14","doi-asserted-by":"publisher","unstructured":"Grupen, N. A., Lee, D. D., & Selman, B. (2022). Multi-agent curricula and emergent implicit signaling. In P. Faliszewski, V. Mascardi, C. Pelachaud, & M. E. Taylor (Eds.) Proceedings of the 21st International conference on autonomous agents and multiagent systems. International Foundation for Autonomous Agents and Multiagent Systems, (pp. 553\u2013561). https:\/\/doi.org\/10.5555\/3535850.3535913","DOI":"10.5555\/3535850.3535913"},{"key":"9729_CR15","doi-asserted-by":"publisher","unstructured":"Gupta, J. K., Egorov, M., & Kochenderfer, M. (2017). Cooperative multi-agent control using deep reinforcement learning. In G. Sukthankar, & J. A. Rodr\u00edguez-Aguilar (Eds.) Proceedings of the autonomous agents and multiagent systems (vol. 10642, pp. 66\u201383). Springer. https:\/\/doi.org\/10.1007\/978-3-319-71682-4_5","DOI":"10.1007\/978-3-319-71682-4_5"},{"key":"9729_CR16","doi-asserted-by":"publisher","unstructured":"Han, X., Tang, H., Li, Y., Kou, G., & Liu, L. (2020). Improving multi-agent reinforcement learning with imperfect human knowledge. In I. Farkas, P. Masulli, & S. Wermter (Eds.) Proceedings of the 29th International conference on artificial neural networks (pp. 369\u2013380). Springer. https:\/\/doi.org\/10.1007\/978-3-030-61616-8_30","DOI":"10.1007\/978-3-030-61616-8_30"},{"key":"9729_CR17","doi-asserted-by":"publisher","unstructured":"Hao, J., & Varakantham, P. (2022). Hierarchical value decomposition for effective on-demand ride-pooling. In P. Faliszewski, V. Mascardi, C. Pelachaud, & M. E. Taylor (Eds.) Proceedings of the 21st International conference on autonomous agents and multiagent systems. International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS), pp 580\u2013587. https:\/\/doi.org\/10.5555\/3535850.3535916","DOI":"10.5555\/3535850.3535916"},{"issue":"4","key":"9729_CR18","doi-asserted-by":"publisher","first-page":"631","DOI":"10.3390\/SYM12040631","volume":"12","author":"C Hu","year":"2020","unstructured":"Hu, C. (2020). A confrontation decision-making method with deep reinforcement learning and knowledge transfer for multi-agent system. Symmetry, 12(4), 631. https:\/\/doi.org\/10.3390\/SYM12040631","journal-title":"Symmetry"},{"issue":"2","key":"9729_CR19","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3054912","volume":"50","author":"A Hussein","year":"2017","unstructured":"Hussein, A., Gaber, M. M., Elyan, E., & Jayne, C. (2017). Imitation learning: A survey of learning methods. ACM Computing Surveys (CSUR), 50(2), 1\u201335. https:\/\/doi.org\/10.1145\/3054912","journal-title":"ACM Computing Surveys (CSUR)"},{"key":"9729_CR20","unstructured":"Jiang, J., & Lu, Z. (2018). Learning attentional communication for multi-agent cooperation. In S. Bengio, H. M. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, & R. Garnett (Eds.) Proceedings of the neural information processing systems (vol. 31, pp. 7265\u20137275). https:\/\/proceedings.neurips.cc\/paper\/2018\/hash\/6a8018b3a00b69c008601b8becae392b-Abstract.html"},{"key":"9729_CR21","unstructured":"Jiang, Z., & Luo, S. (2019). Neural logic reinforcement learning. In K. Chaudhuri, & R. Salakhutdinov (Eds.) Proceedings of the 36th International conference on machine learning (pp. 3110\u20133119). PMLR. http:\/\/proceedings.mlr.press\/v97\/jiang19a.html"},{"key":"9729_CR22","unstructured":"Le, H., Jiang, N., Agarwal, A., Dud\u00edk, M., Yue, Y., & Daum\u00e9 III, H. (2018). Hierarchical imitation and reinforcement learning. In J. G. Dy, & A. Krause (Eds.) Proceedings of the 35th International conference on machine learning (pp. 2917\u20132926). PMLR. http:\/\/proceedings.mlr.press\/v80\/le18a.html"},{"key":"9729_CR23","unstructured":"Le, H. M., Yue, Y., Carr, P., & Lucey, P. (2017). Coordinated multi-agent imitation learning. In D. Precup, & Y. W. Teh (Eds.) Proceedings of the International Conference on Machine Learning (pp. 1995\u20132003). PMLR. http:\/\/proceedings.mlr.press\/v70\/le17a.html"},{"key":"9729_CR24","unstructured":"Li, A. C., Florensa, C., Clavera, I., & Abbeel, P. (2020). Sub-policy adaptation for hierarchical reinforcement learning. In Proceedings of the International conference on learning representations. OpenReview.net. https:\/\/openreview.net\/forum?id=ByeWogStDS"},{"key":"9729_CR25","doi-asserted-by":"publisher","unstructured":"Li, S., Gupta, J. K., Morales, P., Allen, R., & Kochenderfer, M. J. (2021). Deep implicit coordination graphs for multi-agent reinforcement learning. In F. Dignum, A. Lomuscio, U. Endriss, & A. Now\u00e9 (Eds.) Proceedings of the 20th International conference on autonomous agents and multiagent systems (pp. 764\u2013772). ACM. https:\/\/doi.org\/10.5555\/3463952.3464044","DOI":"10.5555\/3463952.3464044"},{"key":"9729_CR26","doi-asserted-by":"publisher","first-page":"163","DOI":"10.1016\/J.NEUCOM.2022.06.091","volume":"504","author":"W Liang","year":"2022","unstructured":"Liang, W., Wang, J., Bao, W., Zhu, X., Wu, G., Zhang, D., & Niu, L. (2022). Qauxi: Cooperative multi-agent reinforcement learning with knowledge transferred from auxiliary task. Neurocomputing, 504, 163\u2013173. https:\/\/doi.org\/10.1016\/J.NEUCOM.2022.06.091","journal-title":"Neurocomputing"},{"key":"9729_CR27","doi-asserted-by":"crossref","unstructured":"Liu, X., Yu, J., Feng, Z., & Gao, Y. (2020). Multi-agent reinforcement learning for resource allocation in iot networks with edge computing. China Communications, 17(9), 220\u2013236. http:\/\/www.cic-chinacommunications.cn\/EN\/Y2020\/V17\/I9\/220","DOI":"10.23919\/JCC.2020.09.017"},{"key":"9729_CR28","unstructured":"Long, Q., Zhou, Z., Gupta, A., Fang, F., Wu, Y., & Wang, X. (2020). Evolutionary population curriculum for scaling multi-agent reinforcement learning. In Proceedings of the 8th International conference on learning representations. OpenReview.net. https:\/\/openreview.net\/forum?id=SJxbHkrKDH"},{"key":"9729_CR29","doi-asserted-by":"publisher","unstructured":"Mandel, T., Liu, Y.-E., Brunskill, E., & Popovi\u0107, Z. (2017). Where to add actions in human-in-the-loop reinforcement learning. In S. Singh, & S. Markovitch (Eds.) Proceedings of the 31st AAAI conference on artificial intelligence (pp. 2322\u20132328). AAAI Press. https:\/\/doi.org\/10.1609\/AAAI.V31I1.10945","DOI":"10.1609\/AAAI.V31I1.10945"},{"issue":"1","key":"9729_CR30","doi-asserted-by":"publisher","first-page":"10","DOI":"10.1109\/MSMC.2016.2623867","volume":"3","author":"S Nahavandi","year":"2017","unstructured":"Nahavandi, S. (2017). Trusted autonomy between humans and robots: Toward human-on-the-loop in robotics and autonomous systems. IEEE Systems, Man, and Cybernetics Magazine, 3(1), 10\u201317. https:\/\/doi.org\/10.1109\/MSMC.2016.2623867","journal-title":"IEEE Systems, Man, and Cybernetics Magazine"},{"key":"9729_CR31","unstructured":"Narvekar, S., Sinapov, J., Leonetti, M., & Stone, P. (2016). Source task creation for curriculum learning. In C. M. Jonker, S. Marsella, J. Thangarajah, & K. Tuyls (Eds.) Proceedings of the 2016 International conference on autonomous agents multiagent systems (pp. 566\u2013574). ACM. http:\/\/dl.acm.org\/citation.cfm?id=2937007"},{"issue":"9","key":"9729_CR32","doi-asserted-by":"publisher","first-page":"3826","DOI":"10.1109\/TCYB.2020.2977374","volume":"50","author":"TT Nguyen","year":"2020","unstructured":"Nguyen, T. T., Nguyen, N. D., & Nahavandi, S. (2020). Deep reinforcement learning for multiagent systems: A review of challenges, solutions, and applications. IEEE Transactions on Cybernetics, 50(9), 3826\u20133839. https:\/\/doi.org\/10.1109\/TCYB.2020.2977374","journal-title":"IEEE Transactions on Cybernetics"},{"key":"9729_CR33","doi-asserted-by":"publisher","unstructured":"Niu, Y., Paleja, R., & Gombolay, M. (2021). Multi-agent graph-attention communication and teaming. In F. Dignum, A. Lomuscio, U. Endriss, & A. Now\u00e9 (Eds.) Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems (pp. 964\u2013973). ACM. https:\/\/doi.org\/10.5555\/3463952.3464065","DOI":"10.5555\/3463952.3464065"},{"key":"9729_CR34","doi-asserted-by":"publisher","unstructured":"Oliehoek, F. A., & Amato, C. (2016). A concise introduction to decentralized POMDPs. Springer International Publishing. https:\/\/doi.org\/10.1007\/978-3-319-28929-8","DOI":"10.1007\/978-3-319-28929-8"},{"key":"9729_CR35","unstructured":"Rashid, T., Samvelyan, M., De Witt, C. S., Farquhar, G., Foerster, J., & Whiteson, S. (2020). Monotonic value function factorisation for deep multi-agent reinforcement learning. The Journal of Machine Learning Research, 21(1), 7234\u20137284. http:\/\/jmlr.org\/papers\/v21\/20-081.html"},{"key":"9729_CR36","unstructured":"Samvelyan, M., Rashid, T., De Witt, C. S., Farquhar, G., Nardelli, N., Rudner, T. G. J., Hung, C.-M., Torr, P. H. S., Foerster, J., & Whiteson, S. (2019). The starcraft multi-agent challenge. In E. Elkind, M. Veloso, N. Agmon, & M. E. Taylor (Eds.) Proceedings of the 18th International conference on autonomous agents and multiagent systems. International Foundation for Autonomous Agents and Multiagent Systems, (pp. 2186\u20132188). http:\/\/dl.acm.org\/citation.cfm?id=3332052"},{"key":"9729_CR37","unstructured":"Schaal, S. (1996). Learning from demonstration. Proceedings of the neural information processing systems (vol. 9, pp. 1040\u20131046). http:\/\/papers.nips.cc\/paper\/1224-learning-from-demonstration"},{"issue":"1","key":"9729_CR38","doi-asserted-by":"publisher","first-page":"73","DOI":"10.1109\/TETCI.2018.2823329","volume":"3","author":"K Shao","year":"2018","unstructured":"Shao, K., Zhu, Y., & Zhao, D. (2018). Starcraft micromanagement with reinforcement learning and curriculum transfer learning. IEEE Transactions on Emerging Topics in Computational Intelligence, 3(1), 73\u201384. https:\/\/doi.org\/10.1109\/TETCI.2018.2823329","journal-title":"IEEE Transactions on Emerging Topics in Computational Intelligence"},{"issue":"3","key":"9729_CR39","doi-asserted-by":"publisher","first-page":"1699","DOI":"10.1109\/TCYB.2021.3108237","volume":"53","author":"H Shi","year":"2023","unstructured":"Shi, H., Li, J., Mao, J., & Hwang, K. S. (2023). Lateral transfer learning for multiagent reinforcement learning. IEEE Transactions on Cybernetics, 53(3), 1699\u20131711. https:\/\/doi.org\/10.1109\/TCYB.2021.3108237","journal-title":"IEEE Transactions on Cybernetics"},{"key":"9729_CR40","unstructured":"Song, J., Ren, H., Sadigh, D., & Ermon, S. (2018). Multi-agent generative adversarial imitation learning. In S. Bengio, H. M. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, & R. Garnett (Eds.) Proceedings of the neural information processing systems (vol. 31, pp. 7472\u20137483). https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2018\/file\/240c945bb72980130446fc2b40fbb8e0-Paper.pdf"},{"key":"9729_CR41","unstructured":"Suay, H. B., Brys, T., Taylor, M. E., & Chernova, S. (2016). Learning from demonstration for shaping through inverse reinforcement learning. In C. M. Jonker, S. Marsella, J. Thangarajah, & K. Tuyls (Eds.) Proceedings of the 15st International conference on autonomous agents and multiagent systems (pp. 429\u2013437). ACM. http:\/\/dl.acm.org\/citation.cfm?id=2936988"},{"key":"9729_CR42","unstructured":"Terry, J. K., Grammel, N., Son, S., & Black, B. (2022). Parameter sharing for heterogeneous agents in multi-agent reinforcement learning. arXiv:2005.13625v7. https:\/\/arxiv.org\/abs\/2005.13625v7"},{"key":"9729_CR43","doi-asserted-by":"publisher","unstructured":"Troullinos, D., Chalkiadakis, G., Papamichail, I., & Papageorgiou, M. (2021). Collaborative multiagent decision making for lane-free autonomous driving. In F. Dignum, A. Lomuscio, U. Endriss, & A. Now\u00e9 (Eds.) Proceedings of the 20th International conference on autonomous agents and multiagent systems (pp. 1335\u20131343). ACM. https:\/\/doi.org\/10.5555\/3463952.3464106","DOI":"10.5555\/3463952.3464106"},{"key":"9729_CR44","unstructured":"Wang, T., Dong, H., Lesser, V., & Zhang, C. (2020). ROMA: multi-agent reinforcement learning with emergent roles. In Proceedings of the 37th International conference on machine learning (vol. 119, pp. 9876\u20139886). PMLR. http:\/\/proceedings.mlr.press\/v119\/wang20f.html"},{"key":"9729_CR45","doi-asserted-by":"publisher","unstructured":"Wang, T., Dong, H., Lesser, V., & Zhang, C. (2020). From few to more: Large-scale dynamic multiagent curriculum learning. In Proceedings of the 34th AAAI conference on artificial intelligence (pp. 7293\u20137300). AAAI Press. https:\/\/doi.org\/10.1609\/AAAI.V34I05.6221","DOI":"10.1609\/AAAI.V34I05.6221"},{"key":"9729_CR46","doi-asserted-by":"publisher","unstructured":"Wang, X., Ke, L., Zhang, G., & Zhu, D. (2022). Attention based large scale multi-agent reinforcement learning. In Proceedings of the 5th International conference on artificial intelligence and big data (pp. 112\u2013117). https:\/\/doi.org\/10.1109\/ICAIBD55127.2022.9820093","DOI":"10.1109\/ICAIBD55127.2022.9820093"},{"key":"9729_CR47","doi-asserted-by":"publisher","unstructured":"Wang, Y., & Sartoretti, G. (2022). Fcmnet: Full communication memory net for team-level cooperation in multi-agent systems. In P. Faliszewski, V. Mascardi, C. Pelachaud, & M. E. Taylor (Eds.) Proceedings of the 21st International conference on autonomous agents and multiagent systems (pp. 1355\u20131363). International Foundation for Autonomous Agents and Multiagent Systems. https:\/\/doi.org\/10.5555\/3535850.3536001","DOI":"10.5555\/3535850.3536001"},{"key":"9729_CR48","doi-asserted-by":"publisher","unstructured":"Yang, B., Ma, C., & Xia, X. (2021). Drone formation control via belief-correlated imitation learning. In: Dignum F, Lomuscio A, Endriss U,& A. Now\u00e9 (Eds.) Proceedings of the 20th International conference on autonomous agents and multiagent systems (pp. 1407\u20131415). ACM. https:\/\/doi.org\/10.5555\/3463952.3464114","DOI":"10.5555\/3463952.3464114"},{"key":"9729_CR49","doi-asserted-by":"publisher","unstructured":"Yang, J., Borovikov, I., & Zha, H. (2020). Hierarchical cooperative multi-agent reinforcement learning with skill discovery. In A. E. F. Seghrouchni, G. Sukthankar, B. An, & N. Yorke-Smith (Eds.) Proceedings of the 19th International conference on autonomous agents and multiagent systems (pp. 1566\u20131574). International Foundation for Autonomous Agents and Multiagent Systems. https:\/\/doi.org\/10.5555\/3398761.3398941","DOI":"10.5555\/3398761.3398941"},{"key":"9729_CR50","doi-asserted-by":"publisher","unstructured":"Yang, N., Ding, B., Shi, P., & Feng, D. (2022). Improving scalability of multi-agent reinforcement learning with parameters sharing. In Proceedings of the 2022 IEEE International Conference on Joint Cloud Computing (pp 37\u201342). https:\/\/doi.org\/10.1109\/JCC56315.2022.00013","DOI":"10.1109\/JCC56315.2022.00013"},{"key":"9729_CR51","unstructured":"Yang, Y., Hao, J., Liao, B., Shao, K., Chen, G., Liu, W., & Tang, H. (2020). Qatten: A general framework for cooperative multiagent reinforcement learning. https:\/\/arxiv.org\/abs\/2002.03939"},{"key":"9729_CR52","doi-asserted-by":"publisher","unstructured":"Zhang, K., Yang, Z., & Ba\u015far, T. (2021). Multi-agent reinforcement learning: A selective overview of theories and algorithms. Handbook of reinforcement learning and control (pp. 321\u2013384). https:\/\/doi.org\/10.1007\/978-3-030-60990-0_12","DOI":"10.1007\/978-3-030-60990-0_12"},{"key":"9729_CR53","doi-asserted-by":"publisher","unstructured":"Zhang, P., Hao, J., Wang, W., Tang, H., Ma, Y., Duan, Y., & Zheng, Y. (2021). Kogun: Accelerating deep reinforcement learning via integrating human suboptimal knowledge. In C. Bessiere, (Ed.) Proceedings of the 29th International Joint Conference on Artificial Intelligence. ijcai.org, (pp. 2263\u20132269). https:\/\/doi.org\/10.24963\/IJCAI.2020\/317","DOI":"10.24963\/IJCAI.2020\/317"},{"key":"9729_CR54","doi-asserted-by":"publisher","unstructured":"Zhang, R., Torabi, F., Guan, L., Ballard, D. H., & Stone, P. (2019). Leveraging human guidance for deep reinforcement learning tasks. In S. Kraus (Ed.) Proceedings of the 28th International Joint Conference on Artificial Intelligence. ijcai.org, (pp. 6339\u20136346). https:\/\/doi.org\/10.24963\/IJCAI.2019\/884","DOI":"10.24963\/IJCAI.2019\/884"},{"issue":"3","key":"9729_CR55","doi-asserted-by":"publisher","first-page":"293","DOI":"10.1108\/17563781211255862","volume":"5","author":"S Zhifei","year":"2012","unstructured":"Zhifei, S., & Meng Joo, E. (2012). A survey of inverse reinforcement learning techniques. International Journal of Intelligent Computing and Cybernetics, 5(3), 293\u2013311. https:\/\/doi.org\/10.1108\/17563781211255862","journal-title":"International Journal of Intelligent Computing and Cybernetics"},{"key":"9729_CR56","doi-asserted-by":"publisher","unstructured":"Zhou, M., Chen, Y., Wen, Y., Yang, Y., Su, Y., Zhang, W., Zhang, D., & Wang, J. (2019). Factorized q-learning for large-scale multi-agent systems. In Proceedings of the First International conference on distributed artificial intelligence, (pp. 1\u20137). https:\/\/doi.org\/10.1145\/3356464.3357707","DOI":"10.1145\/3356464.3357707"}],"container-title":["Autonomous Agents and Multi-Agent Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10458-025-09729-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10458-025-09729-1","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10458-025-09729-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,1,10]],"date-time":"2026-01-10T07:36:32Z","timestamp":1768030592000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10458-025-09729-1"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,1,10]]},"references-count":56,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2026,6]]}},"alternative-id":["9729"],"URL":"https:\/\/doi.org\/10.1007\/s10458-025-09729-1","relation":{"has-preprint":[{"id-type":"doi","id":"10.21203\/rs.3.rs-3907706\/v1","asserted-by":"object"}]},"ISSN":["1387-2532","1573-7454"],"issn-type":[{"value":"1387-2532","type":"print"},{"value":"1573-7454","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,1,10]]},"assertion":[{"value":"29 January 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"19 December 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"10 January 2026","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no conflict of interest. And they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflicts of Interest"}},{"value":"Not applicable","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval"}},{"value":"Not applicable","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent to participate"}},{"value":"Not applicable","order":5,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare no competing interests.","order":6,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"2"}}