{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T15:23:47Z","timestamp":1772119427274,"version":"3.50.1"},"reference-count":73,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2025,5,16]],"date-time":"2025-05-16T00:00:00Z","timestamp":1747353600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,5,16]],"date-time":"2025-05-16T00:00:00Z","timestamp":1747353600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"Honda Research Institute Europe GmbH, Germany"},{"name":"German Federal Ministry of Education and Research"},{"DOI":"10.13039\/501100002341","name":"Research Council of Finland","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100002341","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100005714","name":"Technische Universit\u00e4t Darmstadt","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100005714","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Auton Agent Multi-Agent Syst"],"published-print":{"date-parts":[[2025,6]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Research on multi-agent interaction involving humans is still in its infancy. Most approaches have focused on environments with collaborative human behavior or a small, defined set of situations. When robots are deployed in human-inhabited environments in the future, the diversity of interactions will surpass the capabilities of pre-trained collaboration models. \u201cCoexistence\u201d environments, characterized by agents with varying or partially aligned objectives, present a unique challenge for robotic collaboration. 
Traditional reinforcement learning methods fall short in these settings. These approaches lack the flexibility to adapt to changing agent counts or task requirements without undergoing retraining. Moreover, existing models do not adequately support scenarios where robots should exhibit helpful behavior toward others without compromising their primary goals. To tackle this issue, we introduce a novel framework that decomposes interaction and task-solving into separate learning problems and blends the resulting policies at inference time using a goal inference model for task estimation. We create impact-aware agents whose training cost scales linearly with the number of agents and available tasks. To this end, a weighting function blending action distributions for individual interactions with the original task action distribution is proposed. To support our claims, we demonstrate that our framework scales in task and agent count across several environments and considers collaboration opportunities when present. 
The new learning paradigm opens the path to more complex multi-robot, multi-human interactions.<\/jats:p>","DOI":"10.1007\/s10458-025-09707-7","type":"journal-article","created":{"date-parts":[[2025,5,16]],"date-time":"2025-05-16T01:55:58Z","timestamp":1747360558000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Entropy based blending of policies for multi-agent coexistence"],"prefix":"10.1007","volume":"39","author":[{"given":"David","family":"Rother","sequence":"first","affiliation":[]},{"given":"Franziska","family":"Herbert","sequence":"additional","affiliation":[]},{"given":"Fabian","family":"Kalter","sequence":"additional","affiliation":[]},{"given":"Dorothea","family":"Koert","sequence":"additional","affiliation":[]},{"given":"Joni","family":"Pajarinen","sequence":"additional","affiliation":[]},{"given":"Jan","family":"Peters","sequence":"additional","affiliation":[]},{"given":"Thomas H.","family":"Weisswange","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,5,16]]},"reference":[{"issue":"3","key":"9707_CR1","doi-asserted-by":"publisher","first-page":"203","DOI":"10.1561\/1100000005","volume":"1","author":"MA Goodrich","year":"2008","unstructured":"Goodrich, M. A., & Schultz, A. C. (2008). Human-robot interaction: A survey. Foundations and Trends in Human-Computer Interaction, 1(3), 203\u2013275.","journal-title":"Foundations and Trends in Human-Computer Interaction"},{"issue":"4","key":"9707_CR2","doi-asserted-by":"publisher","first-page":"525","DOI":"10.1177\/0018720816644364","volume":"58","author":"TB Sheridan","year":"2016","unstructured":"Sheridan, T. B. (2016). Human-robot interaction: Status and challenges. Human Factors, 58(4), 525\u2013532.","journal-title":"Human Factors"},{"key":"9707_CR3","doi-asserted-by":"crossref","unstructured":"Akgun, B., Cakmak, M., Yoo, J. W., & Thomaz, A. L. (2012). 
Trajectories and keyframes for kinesthetic teaching: A human-robot interaction perspective. In Proceedings of the seventh annual ACM\/IEEE international conference on Human-Robot Interaction (pp. 391\u2013398).","DOI":"10.1145\/2157689.2157815"},{"issue":"12","key":"9707_CR4","doi-asserted-by":"publisher","first-page":"9571","DOI":"10.1109\/TIE.2018.2823667","volume":"65","author":"G Du","year":"2018","unstructured":"Du, G., Chen, M., Liu, C., Zhang, B., & Zhang, P. (2018). Online robot teaching with natural human-robot interaction. IEEE Transactions on Industrial Electronics, 65(12), 9571\u20139581.","journal-title":"IEEE Transactions on Industrial Electronics"},{"key":"9707_CR5","doi-asserted-by":"crossref","unstructured":"Carros, F., Meurer, J., L\u00f6ffler, D., Unbehaun, D., Matthies, S., Koch, I., Wieching, R., Randall, D., Hassenzahl, M., & Wulf, V. (2020). Exploring human-robot interaction with the elderly: Results from a ten-week case study in a care home. In Proceedings of the 2020 CHI conference on human factors in computing systems (pp. 1\u201312).","DOI":"10.1145\/3313831.3376402"},{"key":"9707_CR6","doi-asserted-by":"crossref","unstructured":"Mast, M., Burmester, M., Graf, B., Weisshardt, F., Arbeiter, G., \u0160pan\u011bl, M., Materna, Z., Smr\u017e, P., & Kronreif, G. (2015). Design of the human-robot interaction for a semi-autonomous service robot to assist elderly people. In Ambient assisted living (pp. 15\u201329). Springer.","DOI":"10.1007\/978-3-319-11866-6_2"},{"issue":"2","key":"9707_CR7","doi-asserted-by":"publisher","first-page":"4063","DOI":"10.1109\/LRA.2022.3150013","volume":"7","author":"J Zhu","year":"2022","unstructured":"Zhu, J., Gienger, M., & Kober, J. (2022). Learning task-parameterized skills from few demonstrations. 
IEEE Robotics and Automation Letters, 7(2), 4063\u20134070.","journal-title":"IEEE Robotics and Automation Letters"},{"key":"9707_CR8","doi-asserted-by":"crossref","unstructured":"Aaltonen, I., Arvola, A., Heikkil\u00e4, P., & Lammi, H. (2017). Hello Pepper, may I tickle you? Children\u2019s and adults\u2019 responses to an entertainment robot at a shopping mall. In Proceedings of the companion of the 2017 ACM\/IEEE international conference on human-robot interaction (pp. 53\u201354).","DOI":"10.1145\/3029798.3038362"},{"issue":"1","key":"9707_CR9","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1007\/s00291-020-00607-8","volume":"43","author":"N Boysen","year":"2021","unstructured":"Boysen, N., Fedtke, S., & Schwerdfeger, S. (2021). Last-mile delivery concepts: A survey from an operational research perspective. OR Spectrum, 43(1), 1\u201358.","journal-title":"OR Spectrum"},{"issue":"22","key":"9707_CR10","doi-asserted-by":"publisher","first-page":"10702","DOI":"10.3390\/app112210702","volume":"11","author":"JA Gonzalez-Aguirre","year":"2021","unstructured":"Gonzalez-Aguirre, J. A., Osorio-Oliveros, R., Rodr\u00edguez-Hern\u00e1ndez, K. L., Liz\u00e1rraga-Iturralde, J., Morales Menendez, R., Ram\u00edrez-Mendoza, R. A., Ram\u00edrez-Moreno, M. A., & de Jes\u00fas Lozoya-Santos, J. (2021). Service robots: Trends and technology. Applied Sciences, 11(22), 10702.","journal-title":"Applied Sciences"},{"issue":"2","key":"9707_CR11","doi-asserted-by":"publisher","first-page":"88","DOI":"10.1080\/10400435.2014.978916","volume":"27","author":"S Bedaf","year":"2015","unstructured":"Bedaf, S., Gelderblom, G. J., & De Witte, L. (2015). Overview and categorization of robots supporting independent living of elderly people: What activities do they support and how far have they developed. 
Assistive Technology, 27(2), 88\u2013100.","journal-title":"Assistive Technology"},{"key":"9707_CR12","doi-asserted-by":"crossref","unstructured":"Asama, H., Ozaki, K., Itakura, H., Matsumoto, A., Ishida, Y., & Endo, I. (1991). Collision avoidance among multiple mobile robots based on rules and communication. In IROS (Vol.\u00a091, pp. 1215\u20131220).","DOI":"10.1109\/IROS.1991.174665"},{"key":"9707_CR13","doi-asserted-by":"crossref","unstructured":"Buehler, M. C., & Weisswange, T. H. (2020). Theory of mind based communication for human agent cooperation. In 2020 IEEE International Conference on Human-Machine Systems (ICHMS) (pp. 1\u20136). IEEE.","DOI":"10.1109\/ICHMS49158.2020.9209472"},{"key":"9707_CR14","doi-asserted-by":"crossref","unstructured":"Sendhoff, B., & Wersing, H. (2020). Cooperative intelligence-a humane perspective. In 2020 IEEE International Conference on Human-Machine Systems (ICHMS) (pp. 1\u20136). IEEE.","DOI":"10.1109\/ICHMS49158.2020.9209387"},{"key":"9707_CR15","unstructured":"Street, C., Lacerda, B., Staniaszek, M., M\u00fchlig, M., & Hawes, N. (2022). Context-aware modelling for multi-robot systems under uncertainty. In 20th International Conference on Autonomous Agents and Multiagent Systems (AAMAS-2022) (pp. 1228\u20131236). International Foundation for Autonomous Agents and Multiagent Systems."},{"key":"9707_CR16","doi-asserted-by":"crossref","unstructured":"Grosz, B. J., & Kraus, S. (1999). The evolution of SharedPlans. In Foundations of rational agency (pp. 227\u2013262). Springer.","DOI":"10.1007\/978-94-015-9204-8_10"},{"key":"9707_CR17","doi-asserted-by":"crossref","unstructured":"Mirsky, R., Carlucho, I., Rahman, A., Fosong, E., Macke, W., Sridharan, M., Stone, P., & Albrecht, S. V. (2022). A survey of Ad Hoc teamwork research. In European Conference on Multi-Agent Systems (EUMAS) (pp. 275\u2013293). 
Springer International Publishing.","DOI":"10.1007\/978-3-031-20614-6_16"},{"key":"9707_CR18","doi-asserted-by":"crossref","unstructured":"Rother, D., Weisswange, T., & Peters, J. (2023). Disentangling interaction using maximum entropy reinforcement learning in multi-agent systems. 26th European Conference on Artificial Intelligence (ECAI 2023).","DOI":"10.3233\/FAIA230491"},{"key":"9707_CR19","doi-asserted-by":"publisher","first-page":"49","DOI":"10.1613\/jair.1579","volume":"24","author":"PJ Gmytrasiewicz","year":"2005","unstructured":"Gmytrasiewicz, P. J., & Doshi, P. (2005). A framework for sequential planning in multi-agent settings. Journal of Artificial Intelligence Research, 24, 49\u201379.","journal-title":"Journal of Artificial Intelligence Research"},{"key":"9707_CR20","unstructured":"Doshi, P., Qu, X., Goodie, A., & Young, D. (2010). Modeling recursive reasoning by humans using empirically informed interactive POMDPs. In Proceedings of the 9th international conference on autonomous agents and multiagent systems: volume 1-volume 1 (pp. 1223\u20131230)."},{"key":"9707_CR21","unstructured":"Hoang, T. N., & Low, K. H. (2013). Interactive POMDP Lite: Towards practical planning to predict and exploit intentions for interacting with self-interested agents. In Proceedings of the 23rd international joint conference on Artificial Intelligence (IJCAI 2013) (pp. 2298\u20132305)."},{"issue":"11","key":"9707_CR22","doi-asserted-by":"publisher","first-page":"13677","DOI":"10.1007\/s10489-022-04105-y","volume":"53","author":"A Oroojlooy","year":"2023","unstructured":"Oroojlooy, A., & Hajinezhad, D. (2023). A review of cooperative multi-agent deep reinforcement learning. Applied Intelligence, 53(11), 13677\u201313722.","journal-title":"Applied Intelligence"},{"key":"9707_CR23","unstructured":"Christianos, F., Papoudakis, G., & Albrecht, S. V. (2023). Pareto actor-critic for equilibrium selection in multi-agent reinforcement learning. 
Transactions on Machine Learning Research."},{"key":"9707_CR24","unstructured":"Yang, Y., & Wang, J. (2020). An overview of multi-agent reinforcement learning from game theoretical perspective. arXiv:2011.00583"},{"key":"9707_CR25","unstructured":"Rashid, T., Samvelyan, M., Schroeder, C., Farquhar, G., Foerster, J., & Whiteson, S. (2018). Qmix: Monotonic value function factorisation for deep multi-agent reinforcement learning. In International conference on machine learning (pp. 4295\u20134304). PMLR."},{"key":"9707_CR26","unstructured":"Sunehag, P., Lever, G., Gruslys, A., Czarnecki, W. M., Zambaldi, V., Jaderberg, M., Lanctot, M., Sonnerat, N., Leibo, J. Z., Tuyls, K., & Graepel, T. (2018). Value-decomposition networks for cooperative multi-agent learning. In Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS \u201918) (pp. 2085\u20132087)."},{"key":"9707_CR27","unstructured":"Zhang, T., Li, Y., Wang, C., Xie, G., & Lu, Z. (2021). Fop: Factorizing optimal joint policy of maximum-entropy multi-agent reinforcement learning. In International conference on machine learning (pp. 12491\u201312500). PMLR."},{"key":"9707_CR28","unstructured":"Yang, J., Nakhaei, A., Isele, D., Fujimura, K., & Zha, H. (2019). CM3: Cooperative multi-goal multi-stage multi-agent reinforcement learning. In International conference on learning representations."},{"key":"9707_CR29","unstructured":"Omidshafiei, S., Pazis, J., Amato, C., How, J. P., & Vian, J. (2017). Deep decentralized multi-task multi-agent reinforcement learning under partial observability. In International conference on machine learning (pp. 2681\u20132690). PMLR."},{"key":"9707_CR30","unstructured":"Albrecht, S. V., & Stone, P. (2017). Reasoning about hypothetical agent behaviours and their parameters. In Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems (AAMAS \u201917) (pp. 
547\u2013555)."},{"key":"9707_CR31","doi-asserted-by":"publisher","first-page":"132","DOI":"10.1016\/j.artint.2016.10.005","volume":"242","author":"S Barrett","year":"2017","unstructured":"Barrett, S., Rosenfeld, A., Kraus, S., & Stone, P. (2017). Making friends on the fly: Cooperating with new teammates. Artificial Intelligence, 242, 132\u2013171.","journal-title":"Artificial Intelligence"},{"key":"9707_CR32","unstructured":"Barrett, S., Stone, P., Kraus, S., & Rosenfeld, A. (2012). Learning teammate models for ad hoc teamwork. In AAMAS Adaptive Learning Agents (ALA) Workshop (pp. 57\u201363). Citeseer."},{"key":"9707_CR33","doi-asserted-by":"crossref","unstructured":"Barrett, S., Stone, P., Kraus, S., & Rosenfeld, A. (2013). Teamwork with limited knowledge of teammates. In Proceedings of the AAAI conference on artificial intelligence (Vol.\u00a027, pp. 102\u2013108).","DOI":"10.1609\/aaai.v27i1.8659"},{"issue":"2","key":"9707_CR34","doi-asserted-by":"publisher","first-page":"175","DOI":"10.1007\/s10458-015-9280-x","volume":"30","author":"FS Melo","year":"2016","unstructured":"Melo, F. S., & Sardinha, A. (2016). Ad hoc teamwork by learning teammates\u2019 task. Autonomous Agents and Multi-Agent Systems, 30(2), 175\u2013219.","journal-title":"Autonomous Agents and Multi-Agent Systems"},{"issue":"4","key":"9707_CR35","doi-asserted-by":"publisher","first-page":"508","DOI":"10.1002\/aaai.12131","volume":"44","author":"A Eck","year":"2023","unstructured":"Eck, A., Soh, L. K., & Doshi, P. (2023). Decision making in open agent systems. AI Magazine, 44(4), 508\u2013523.","journal-title":"AI Magazine"},{"key":"9707_CR36","unstructured":"Rahman, M. A., Hopner, N., Christianos, F., & Albrecht, S. V. (2021). Towards open ad hoc teamwork using graph-based policy learning. In International conference on machine learning (pp. 8776\u20138786). 
PMLR."},{"key":"9707_CR37","doi-asserted-by":"publisher","first-page":"594","DOI":"10.3390\/e17020594","volume":"17","author":"G Wilmers","year":"2015","unstructured":"Wilmers, G. (2015). A foundational approach to generalising the maximum entropy inference process to the multi-agent context. Entropy, 17, 594\u2013645. https:\/\/doi.org\/10.3390\/e17020594","journal-title":"Entropy"},{"key":"9707_CR38","unstructured":"Haarnoja, T., Tang, H., Abbeel, P., & Levine, S. (2017). Reinforcement learning with deep energy-based policies. In International conference on machine learning (pp. 1352\u20131361). PMLR."},{"key":"9707_CR39","doi-asserted-by":"crossref","unstructured":"Wang, Z., Zhang, Y., Yin, C., & Huang, Z. (2021). Multi-agent deep reinforcement learning based on maximum entropy. In 2021 IEEE 4th Advanced Information Management, Communicates, Electronic and Automation Control Conference (IMCEC) (Vol. 4, pp. 1402\u20131406). IEEE.","DOI":"10.1109\/IMCEC51613.2021.9482235"},{"key":"9707_CR40","doi-asserted-by":"crossref","unstructured":"Haarnoja, T., Pong, V., Zhou, A., Dalal, M., Abbeel, P., & Levine, S. (2018). Composable deep reinforcement learning for robotic manipulation. In 2018 IEEE international conference on robotics and automation (ICRA) (pp. 6244\u20136251). IEEE.","DOI":"10.1109\/ICRA.2018.8460756"},{"issue":"8","key":"9707_CR41","doi-asserted-by":"publisher","first-page":"1771","DOI":"10.1162\/089976602760128018","volume":"14","author":"GE Hinton","year":"2002","unstructured":"Hinton, G. E. (2002). Training products of experts by minimizing contrastive divergence. Neural Computation, 14(8), 1771\u20131800.","journal-title":"Neural Computation"},{"key":"9707_CR42","unstructured":"Le, A. T., Hansel, K., Peters, J., & Chalvatzaki, G. (2022). Hierarchical policy blending as optimal transport. In 5th annual learning for dynamics and control conference (pp. 211:797\u2013211:812). 
PMLR."},{"key":"9707_CR43","doi-asserted-by":"crossref","unstructured":"Hansel, K., Urain, J., Peters, J., & Chalvatzaki, G. (2023). Hierarchical policy blending as inference for reactive robot control. In 2023 IEEE International Conference on Robotics and Automation (ICRA) (pp. 10181\u201310188). IEEE.","DOI":"10.1109\/ICRA48891.2023.10161374"},{"key":"9707_CR44","doi-asserted-by":"crossref","unstructured":"Tange, Y., Kiryu, S., & Matsui, T. (2020). Mild action blending policy on deep reinforcement learning with discretized actions for process control. In 2020 59th annual conference of the Society of Instrument and Control Engineers of Japan (SICE) (pp. 587\u2013592). IEEE.","DOI":"10.23919\/SICE48898.2020.9240311"},{"key":"9707_CR45","doi-asserted-by":"crossref","unstructured":"Singh, S., & Heard, J. (2023). Probabilistic policy blending for shared autonomy using deep reinforcement learning. In 2023 32nd IEEE International conference on robot and human interactive communication (RO-MAN) (pp. 1537\u20131544). IEEE.","DOI":"10.1109\/RO-MAN57019.2023.10309604"},{"issue":"7","key":"9707_CR46","doi-asserted-by":"publisher","first-page":"790","DOI":"10.1177\/0278364913490324","volume":"32","author":"AD Dragan","year":"2013","unstructured":"Dragan, A. D., & Srinivasa, S. S. (2013). A policy-blending formalism for shared control. The International Journal of Robotics Research, 32(7), 790\u2013805.","journal-title":"The International Journal of Robotics Research"},{"key":"9707_CR47","unstructured":"Hiatt, L. M., Harrison, A. M., & Trafton, J. G. (2011). Accommodating human variability in human-robot teams through theory of mind. In Twenty-second international joint conference on artificial intelligence (pp. 2066\u20132071)."},{"issue":"12","key":"9707_CR48","doi-asserted-by":"publisher","first-page":"493","DOI":"10.1016\/S1364-6613(98)01262-5","volume":"2","author":"V Gallese","year":"1998","unstructured":"Gallese, V., & Goldman, A. (1998). 
Mirror neurons and the simulation theory of mind-reading. Trends in Cognitive Sciences, 2(12), 493\u2013501.","journal-title":"Trends in Cognitive Sciences"},{"key":"9707_CR49","doi-asserted-by":"crossref","unstructured":"Nguyen, T., & Gonzalez, C. (2021). Theory of mind from observation in cognitive models and humans. Topics in Cognitive Science.","DOI":"10.1111\/tops.12553"},{"key":"9707_CR50","doi-asserted-by":"crossref","unstructured":"Gray, J., Breazeal, C., Berlin, M., Brooks, A., & Lieberman, J. (2005). Action parsing and goal inference using self as simulator. In ROMAN 2005. IEEE international workshop on robot and human interactive communication, 2005 (pp. 202\u2013209). IEEE.","DOI":"10.1109\/ROMAN.2005.1513780"},{"issue":"5","key":"9707_CR51","doi-asserted-by":"publisher","first-page":"656","DOI":"10.1177\/0278364909102796","volume":"28","author":"C Breazeal","year":"2009","unstructured":"Breazeal, C., Gray, J., & Berlin, M. (2009). An embodied cognition approach to mindreading skills for socially intelligent robots. The International Journal of Robotics Research, 28(5), 656\u2013680.","journal-title":"The International Journal of Robotics Research"},{"issue":"4","key":"9707_CR52","doi-asserted-by":"publisher","first-page":"460","DOI":"10.1109\/TSMCA.2005.850592","volume":"35","author":"JG Trafton","year":"2005","unstructured":"Trafton, J. G., Cassimatis, N. L., Bugajska, M. D., Brock, D. P., Mintz, F. E., & Schultz, A. C. (2005). Enabling effective human-robot interaction using perspective-taking in robots. IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans, 35(4), 460\u2013470.","journal-title":"IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans"},{"key":"9707_CR53","unstructured":"Berlin, M., Gray, J., Thomaz, A. L., Breazeal, C. (2006). Perspective taking: An organizing principle for learning in human-robot interaction. 
In Association for the advancement of artificial intelligence (Vol.\u00a02, pp. 1444\u20131450)."},{"key":"9707_CR54","doi-asserted-by":"crossref","unstructured":"Talamadupula, K., Briggs, G., Chakraborti, T., Scheutz, M., & Kambhampati, S. (2014). Coordination in human-robot teams using mental modeling and plan recognition. In 2014 IEEE\/RSJ international conference on intelligent robots and systems (pp. 2957\u20132962). IEEE.","DOI":"10.1109\/IROS.2014.6942970"},{"key":"9707_CR55","doi-asserted-by":"publisher","first-page":"105","DOI":"10.1016\/j.cobeha.2019.04.010","volume":"29","author":"J Jara-Ettinger","year":"2019","unstructured":"Jara-Ettinger, J. (2019). Theory of mind as inverse reinforcement learning. Current Opinion in Behavioral Sciences, 29, 105\u2013110.","journal-title":"Current Opinion in Behavioral Sciences"},{"key":"9707_CR56","doi-asserted-by":"crossref","unstructured":"Choudhury, R., Swamy, G., Hadfield-Menell, D., & Dragan, A. D. (2019). On the utility of model learning in hri. In 2019 14th ACM\/IEEE international conference on Human-Robot Interaction (HRI) (pp. 317\u2013325). IEEE.","DOI":"10.1109\/HRI.2019.8673256"},{"key":"9707_CR57","doi-asserted-by":"crossref","unstructured":"Javdani, S., Srinivasa, S. S., & Bagnell, J. A. (2015). Shared autonomy via hindsight optimization. Robotics science and systems: online proceedings.","DOI":"10.15607\/RSS.2015.XI.032"},{"key":"9707_CR58","unstructured":"Baker, C., Saxe, R., & Tenenbaum, J. (2011). Bayesian theory of mind: Modeling joint belief-desire attribution. In Proceedings of the annual meeting of the cognitive science society (Vol.\u00a033, pp. 2069\u20132074)."},{"issue":"2","key":"9707_CR59","doi-asserted-by":"publisher","first-page":"414","DOI":"10.1111\/tops.12525","volume":"13","author":"SA Wu","year":"2021","unstructured":"Wu, S. A., Wang, R. E., Evans, J. A., Tenenbaum, J. B., Parkes, D. C., & Kleiman-Weiner, M. (2021). 
Too many cooks: Bayesian inference for coordinating multi-agent collaboration. Topics in Cognitive Science, 13(2), 414\u2013432.","journal-title":"Topics in Cognitive Science"},{"key":"9707_CR60","unstructured":"Yuan, L., Fu, Z., Zhou, L., Yang, K., & Zhu, S. C. (2019). Emergence of theory of mind collaboration in multiagent systems. Emergent Communication Workshop, 33rd Conference on Neural Information Processing Systems (NeurIPS)."},{"key":"9707_CR61","unstructured":"Rabinowitz, N., Perbet, F., Song, F., Zhang, C., Eslami, S. A., & Botvinick, M. (2018). Machine theory of mind. In International conference on machine learning (pp. 4218\u20134227). PMLR."},{"key":"9707_CR62","doi-asserted-by":"crossref","unstructured":"Oguntola, I., Hughes, D., & Sycara, K. (2021). Deep interpretable models of theory of mind. In 2021 30th IEEE international conference on robot & human interactive communication (RO-MAN) (pp. 657\u2013664). IEEE.","DOI":"10.1109\/RO-MAN50785.2021.9515505"},{"key":"9707_CR63","unstructured":"Zhu, H., Neubig, G., & Bisk, Y. (2021). Few-shot language coordination by modeling theory of mind. In International conference on machine learning (pp. 12901\u201312911). PMLR."},{"key":"9707_CR64","unstructured":"Ziebart, B. D. (2010). Modeling purposeful adaptive behavior with the principle of maximum causal entropy. Carnegie Mellon University."},{"key":"9707_CR65","unstructured":"Schulman, J., Chen, X., & Abbeel, P. (2017). Equivalence between policy gradients and soft q-learning. arXiv:1704.06440"},{"issue":"2","key":"9707_CR66","doi-asserted-by":"publisher","first-page":"221","DOI":"10.3390\/e22020221","volume":"22","author":"F Nielsen","year":"2020","unstructured":"Nielsen, F. (2020). On a generalization of the Jensen-Shannon divergence and the Jensen-Shannon centroid. Entropy, 22(2), 221.","journal-title":"Entropy"},{"key":"9707_CR67","unstructured":"Kingma, D. P., & Ba, J. (2015). Adam: A method for stochastic optimization. 
In 3rd international conference for learning representations, San Diego."},{"key":"9707_CR68","unstructured":"Lowe, R., Wu, Y. I., Tamar, A., Harb, J., Pieter\u00a0Abbeel, O., & Mordatch, I. (2017). Multi-agent actor-critic for mixed cooperative-competitive environments. Advances in Neural Information Processing Systems, 30."},{"key":"9707_CR69","unstructured":"Albrecht, S. V., & Ramamoorthy, S. (2013). A game-theoretic model and best-response learning method for ad hoc coordination in multiagent systems. In Proceedings of the 2013 international conference on Autonomous agents and multi-agent systems (AAMAS \u201913) (pp. 1155\u20131156)."},{"key":"9707_CR70","first-page":"11853","volume":"33","author":"M Zhou","year":"2020","unstructured":"Zhou, M., Liu, Z., Sui, P., Li, Y., & Chung, Y. Y. (2020). Learning implicit credit assignment for cooperative multi-agent reinforcement learning. Advances in Neural Information Processing Systems, 33, 11853\u201311864.","journal-title":"Advances in Neural Information Processing Systems"},{"issue":"14","key":"9707_CR71","doi-asserted-by":"publisher","first-page":"6938","DOI":"10.3390\/app12146938","volume":"12","author":"L Feng","year":"2022","unstructured":"Feng, L., Xie, Y., Liu, B., & Wang, S. (2022). Multi-level credit assignment for cooperative multi-agent reinforcement learning. Applied Sciences, 12(14), 6938.","journal-title":"Applied Sciences"},{"key":"9707_CR72","unstructured":"Zhou, T., Zhang, F., Shao, K., Li, K., Huang, W., Luo, J., Wang, W., Yang, Y., Mao, H., Wang, B., Li, D., Liu, W., & Hao, J. (2021). Cooperative multi-agent transfer learning with level-adaptive credit assignment. arXiv:2106.00517"},{"key":"9707_CR73","doi-asserted-by":"crossref","unstructured":"Nguyen, D. T., Kumar, A., & Lau, H. C. (2018). Credit assignment for collective multiagent RL with global rewards. 
Advances in Neural Information Processing Systems, 31.","DOI":"10.1609\/aaai.v31i1.10708"}],"container-title":["Autonomous Agents and Multi-Agent Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10458-025-09707-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10458-025-09707-7\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10458-025-09707-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,6]],"date-time":"2025-07-06T23:54:18Z","timestamp":1751846058000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10458-025-09707-7"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,5,16]]},"references-count":73,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2025,6]]}},"alternative-id":["9707"],"URL":"https:\/\/doi.org\/10.1007\/s10458-025-09707-7","relation":{"has-preprint":[{"id-type":"doi","id":"10.21203\/rs.3.rs-4562541\/v1","asserted-by":"object"}]},"ISSN":["1387-2532","1573-7454"],"issn-type":[{"value":"1387-2532","type":"print"},{"value":"1573-7454","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,5,16]]},"assertion":[{"value":"28 April 2025","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"16 May 2025","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"This work was supported by the Honda Research Institute Europe, Germany. Dorothea Koert was funded by the German Federal Ministry of 
Education and Research (project IKIDA 01IS20045). Joni Pajarinen was supported by the Research Council of Finland (formerly Academy of Finland) (decision 345521). Thomas H. Weisswange is an employee of the Honda Research Institute Europe GmbH.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"27"}}