{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,17]],"date-time":"2026-03-17T19:16:26Z","timestamp":1773774986800,"version":"3.50.1"},"reference-count":56,"publisher":"Springer Science and Business Media LLC","issue":"6","license":[{"start":{"date-parts":[[2023,4,26]],"date-time":"2023-04-26T00:00:00Z","timestamp":1682467200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,4,26]],"date-time":"2023-04-26T00:00:00Z","timestamp":1682467200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"National Project of Foreign Experts","award":["No.G2022033007L"],"award-info":[{"award-number":["No.G2022033007L"]}]},{"DOI":"10.13039\/501100018593","name":"Bagui Scholars Program of Guangxi Zhuang Autonomous Region","doi-asserted-by":"publisher","award":["No.2019A08"],"award-info":[{"award-number":["No.2019A08"]}],"id":[{"id":"10.13039\/501100018593","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Postgraduate Research & Practice Innovation Program of Jiangsu Province","award":["KYCX22_3504"],"award-info":[{"award-number":["KYCX22_3504"]}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Complex Intell. Syst."],"published-print":{"date-parts":[[2023,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Effective multi-agent teamwork can be facilitated by using roles to decompose goals into lower-level team subtasks through a shared understanding of multi-agent tasks. However, traditional methods for role discovery and assignment are not scalable and fail to adapt to dynamic changes in the environment. To solve this problem, we propose a new framework for learning dynamic role discovery and assignment. 
We first introduce an action encoder that constructs a vector representation of each action from its characteristics, and we define and classify roles from a more comprehensive perspective that considers both action differences and action contributions. To assign roles to agents rationally, we propose a representation-based role selection policy that accounts for role differences and reward horizons, dynamically assigning agents with similar abilities to the same role. Agents playing the same role share their learning of that role, and different roles correspond to different action spaces. We also introduce regularizers that increase the differences between roles and stabilize training by preventing agents from changing roles too frequently. Role selection and the role policies integrate action representations and role differences in a restricted action space, improving learning efficiency. Experiments on the SMAC benchmark show that our method enables effective role discovery and assignment, outperforming the baseline on four of the six scenarios with an average win-rate improvement of 20<jats:inline-formula><jats:alternatives><jats:tex-math>$$\\%$$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\"><mml:mo>%<\/mml:mo><\/mml:math><\/jats:alternatives><\/jats:inline-formula>, and remains effective on hard and super hard maps. 
We also conduct ablation experiments to demonstrate the importance of each component in our approach.<\/jats:p>","DOI":"10.1007\/s40747-023-01071-x","type":"journal-article","created":{"date-parts":[[2023,4,26]],"date-time":"2023-04-26T12:04:25Z","timestamp":1682510665000},"page":"6211-6222","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":9,"title":["Dynamic role discovery and assignment in multi-agent task decomposition"],"prefix":"10.1007","volume":"9","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9116-8674","authenticated-orcid":false,"given":"Yu","family":"Xia","sequence":"first","affiliation":[]},{"given":"Junwu","family":"Zhu","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5260-295X","authenticated-orcid":false,"given":"Liucun","family":"Zhu","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,4,26]]},"reference":[{"key":"1071_CR1","unstructured":"Claus C, Boutilier C (1998) The dynamics of reinforcement learning in cooperative multiagent systems. In: AAAI\/IAAI, vol 2, pp 746\u2013752"},{"key":"1071_CR2","doi-asserted-by":"crossref","unstructured":"Tan M (1993) Multi-agent reinforcement learning: independent vs. cooperative agents. In: Proceedings of the tenth international conference on machine learning, pp 330\u2013337","DOI":"10.1016\/B978-1-55860-307-3.50049-6"},{"key":"1071_CR3","unstructured":"Foerster J, Assael IA, De Freitas N, Whiteson S (2016) Learning to communicate with deep multi-agent reinforcement learning. In: Advances in neural information processing systems, vol 29"},{"key":"1071_CR4","doi-asserted-by":"crossref","unstructured":"Gupta JK, Egorov M, Kochenderfer M (2017) Cooperative multi-agent control using deep reinforcement learning. In: International conference on autonomous agents and multiagent systems. 
Springer, pp 66\u201383","DOI":"10.1007\/978-3-319-71682-4_5"},{"key":"1071_CR5","unstructured":"Rashid T, Samvelyan M, Schroeder C, Farquhar G, Foerster J, Whiteson S (2018) Qmix: monotonic value function factorisation for deep multi-agent reinforcement learning. In: International conference on machine learning, PMLR, pp 4295\u20134304"},{"key":"1071_CR6","unstructured":"Mahajan A, Rashid T, Samvelyan M, Whiteson S (2019) Maven: multi-agent variational exploration. In: Advances in neural information processing systems, vol 32"},{"key":"1071_CR7","unstructured":"Das A, Gervet T, Romoff J, Batra D, Parikh D, Rabbat M, Pineau J (2019) Tarmac: targeted multi-agent communication. In: International conference on machine learning. PMLR, pp 1538\u20131546"},{"key":"1071_CR8","unstructured":"Wang J, Ren Z, Han B, Ye J, Zhang C (2020) Towards understanding linear value decomposition in cooperative multi-agent q-learning"},{"key":"1071_CR9","volume-title":"The condensed wealth of nations","author":"E Butler","year":"2012","unstructured":"Butler E (2012) The condensed wealth of nations. Centre for Independent Studies, NSW"},{"key":"1071_CR10","volume-title":"Adelfe 2.0. Handbook on agent-oriented design processes","author":"N Bonjean","year":"2014","unstructured":"Bonjean N, Mefteh W, Gleizes M-P, Maurel C, Migeon F (2014) Adelfe 2.0. Handbook on agent-oriented design processes. Springer, Berlin"},{"key":"1071_CR11","unstructured":"Wang T, Dong H, Lesser V, Zhang C (2020) Roma: multi-agent reinforcement learning with emergent roles. arXiv preprint arXiv:2003.08039"},{"key":"1071_CR12","unstructured":"Samvelyan M, Rashid T, De\u00a0Witt CS, Farquhar G, Nardelli N, Rudner TG, Hung C-M, Torr PH, Foerster J, Whiteson S (2019) The starcraft multi-agent challenge. 
arXiv preprint arXiv:1902.04043"},{"key":"1071_CR13","doi-asserted-by":"publisher","first-page":"82","DOI":"10.1016\/j.neucom.2016.01.031","volume":"190","author":"L Kraemer","year":"2016","unstructured":"Kraemer L, Banerjee B (2016) Multi-agent reinforcement learning as a rehearsal for decentralized planning. Neurocomputing 190:82\u201394","journal-title":"Neurocomputing"},{"key":"1071_CR14","doi-asserted-by":"publisher","first-page":"289","DOI":"10.1613\/jair.2447","volume":"32","author":"FA Oliehoek","year":"2008","unstructured":"Oliehoek FA, Spaan MT, Vlassis N (2008) Optimal and approximate q-value functions for decentralized pomdps. J Artif Intell Res 32:289\u2013353","journal-title":"J Artif Intell Res"},{"key":"1071_CR15","unstructured":"Son K, Kim D, Kang WJ, Hostallero DE, Yi Y (2019) Qtran: learning to factorize with transformation for cooperative multi-agent reinforcement learning. In: International conference on machine learning. PMLR, pp 5887\u20135896"},{"key":"1071_CR16","unstructured":"Yang Y, Hao J, Liao B, Shao K, Chen G, Liu W, Tang H (2020) Qatten: a general framework for cooperative multiagent reinforcement learning. arXiv preprint arXiv:2002.03939"},{"key":"1071_CR17","unstructured":"Wang J, Ren Z, Liu T, Yu Y, Zhang C (2020) Qplex: duplex dueling multi-agent q-learning. arXiv preprint arXiv:2008.01062"},{"key":"1071_CR18","unstructured":"Lowe R, Wu YI, Tamar A, Harb J, Pieter Abbeel O, Mordatch I (2017) Multi-agent actor-critic for mixed cooperative-competitive environments. In: Advances in neural information processing systems, vol 30"},{"key":"1071_CR19","unstructured":"Yang Y, Luo R, Li M, Zhou M, Zhang W, Wang J (2018) Mean field multi-agent reinforcement learning. In: International conference on machine learning. PMLR, pp 5571\u20135580"},{"key":"1071_CR20","doi-asserted-by":"crossref","unstructured":"Guo F, Wu Z (2022) Learning maximum entropy policies with qmix in cooperative marl. 
In: 2022 IEEE 2nd international conference on electronic technology, communication and information (ICETCI). IEEE, pp 357\u2013361","DOI":"10.1109\/ICETCI55101.2022.9832186"},{"key":"1071_CR21","unstructured":"Hu J, Jiang S, Harding SA, Wu H, Liao S-W (2021) Rethinking the implementation tricks and monotonicity constraint in cooperative multi-agent reinforcement learning. arXiv e-prints, 2021"},{"key":"1071_CR22","unstructured":"Botelho SC, Alami R (1999) M+: a scheme for multi-robot cooperation through negotiated task allocation and achievement. In: Proceedings 1999 IEEE international conference on robotics and automation (Cat. No. 99CH36288C), vol 2. IEEE, pp 1234\u20131239"},{"key":"1071_CR23","doi-asserted-by":"crossref","unstructured":"Chen J, Yang Y, Wei L (2010) Research on the approach of task decomposition in soccer robot system. In: 2010 International conference on digital manufacturing & automation, vol 2. IEEE, pp 284\u2013289","DOI":"10.1109\/ICDMA.2010.33"},{"key":"1071_CR24","unstructured":"Zlot R, Stentz A (2003) Multirobot control using task abstraction in a market framework. In: Collaborative technology alliances conference"},{"issue":"7","key":"1071_CR25","doi-asserted-by":"publisher","first-page":"1257","DOI":"10.1109\/JPROC.2006.876939","volume":"94","author":"MB Dias","year":"2006","unstructured":"Dias MB, Zlot R, Kalra N, Stentz A (2006) Market-based multirobot coordination: a survey and analysis. Proc IEEE 94(7):1257\u20131270","journal-title":"Proc IEEE"},{"issue":"4","key":"1071_CR26","doi-asserted-by":"publisher","first-page":"60","DOI":"10.1109\/MRA.2013.2252996","volume":"20","author":"M Dorigo","year":"2013","unstructured":"Dorigo M, Floreano D, Gambardella LM, Mondada F, Nolfi S, Baaboura T, Birattari M, Bonani M, Brambilla M, Brutschy A et al (2013) Swarmanoid: a novel concept for the study of heterogeneous robotic swarms. 
IEEE Robot Autom Mag 20(4):60\u201371","journal-title":"IEEE Robot Autom Mag"},{"issue":"7","key":"1071_CR27","doi-asserted-by":"publisher","first-page":"921","DOI":"10.1016\/j.robot.2010.03.013","volume":"58","author":"J Kiener","year":"2010","unstructured":"Kiener J, Von Stryk O (2010) Towards cooperation of heterogeneous, autonomous robots: a case study of humanoid and wheeled robots. Robot Auton Syst 58(7):921\u2013929","journal-title":"Robot Auton Syst"},{"key":"1071_CR28","doi-asserted-by":"crossref","unstructured":"Li X, Dang S, Li K, Liu Q (2010) Multi-agent-based battlefield reconnaissance simulation by novel task decompositionand allocation. In: 2010 5th international conference on computer science & education. IEEE, pp 1410\u20131414","DOI":"10.1109\/ICCSE.2010.5593757"},{"key":"1071_CR29","volume-title":"Automatic task decomposition and state abstraction from demonstration","author":"LC Cobo","year":"2012","unstructured":"Cobo LC, Isbell CL Jr, Thomaz AL (2012) Automatic task decomposition and state abstraction from demonstration. Georgia Institute of Technology, Atlanta"},{"key":"1071_CR30","doi-asserted-by":"crossref","unstructured":"Hu T, Messelodi S, Lanz O (2014) Dynamic task decomposition for probabilistic tracking in complex scenes. In: 2014 22nd International conference on pattern recognition. IEEE, pp 4134\u20134139","DOI":"10.1109\/ICPR.2014.708"},{"issue":"1","key":"1071_CR31","doi-asserted-by":"publisher","first-page":"73","DOI":"10.1177\/0278364906061160","volume":"25","author":"R Zlot","year":"2006","unstructured":"Zlot R, Stentz A (2006) Market-based multirobot coordination for complex tasks. Int J Robot Res 25(1):73\u2013101","journal-title":"Int J Robot Res"},{"key":"1071_CR32","doi-asserted-by":"crossref","unstructured":"Evertsz R, Thangarajah J (2020) A framework for engineering human\/agent teaming systems. 
In: Proceedings of the AAAI conference on artificial intelligence, vol 34, pp 2477\u20132484","DOI":"10.1609\/aaai.v34i03.5629"},{"issue":"3","key":"1071_CR33","doi-asserted-by":"publisher","first-page":"285","DOI":"10.1023\/A:1010071910869","volume":"3","author":"M Wooldridge","year":"2000","unstructured":"Wooldridge M, Jennings NR, Kinny D (2000) The gaia methodology for agent-oriented analysis and design. Auton Agent Multi-Agent Syst 3(3):285\u2013312","journal-title":"Auton Agent Multi-Agent Syst"},{"key":"1071_CR34","first-page":"185","volume-title":"International workshop on agent-oriented software engineering","author":"A Omicini","year":"2000","unstructured":"Omicini A (2000) Soda: societies and infrastructures in the analysis and design of agent-based systems. International workshop on agent-oriented software engineering. Springer, Berlin, pp 185\u2013193"},{"key":"1071_CR35","first-page":"174","volume-title":"International workshop on agent-oriented software engineering","author":"L Padgham","year":"2002","unstructured":"Padgham L, Winikoff M (2002) Prometheus: a methodology for developing intelligent agents. International workshop on agent-oriented software engineering. Springer, Berlin, pp 174\u2013185"},{"key":"1071_CR36","first-page":"183","volume-title":"International central and eastern European conference on multi-agent systems","author":"M Cossentino","year":"2005","unstructured":"Cossentino M, Gaglio S, Sabatucci L, Seidita V (2005) The passi and agile passi mas meta-models compared with a unifying proposal. International central and eastern European conference on multi-agent systems. Springer, Berlin, pp 183\u2013192"},{"key":"1071_CR37","doi-asserted-by":"publisher","first-page":"254","DOI":"10.4018\/978-1-59904-510-8.ch012","volume-title":"Personalized information retrieval and access: concepts, methods and practices","author":"H Zhu","year":"2008","unstructured":"Zhu H, Zhou M (2008) Role-based multi-agent systems. 
Personalized information retrieval and access: concepts, methods and practices. IGI Global, USA, pp 254\u2013285"},{"key":"1071_CR38","first-page":"106","volume-title":"International workshop on agent-oriented software engineering","author":"N Spanoudakis","year":"2010","unstructured":"Spanoudakis N, Moraitis P (2010) Using aseme methodology for model-driven agent systems development. International workshop on agent-oriented software engineering. Springer, New York, pp 106\u2013127"},{"issue":"3","key":"1071_CR39","doi-asserted-by":"publisher","first-page":"244","DOI":"10.1504\/IJAOSE.2010.036984","volume":"4","author":"SA DeLoach","year":"2010","unstructured":"DeLoach SA, Garcia-Ojeda JC (2010) O-mase: a customisable approach to designing and building complex, adaptive multi-agent systems. Int J Agent-Oriented Softw Eng 4(3):244\u2013280","journal-title":"Int J Agent-Oriented Softw Eng"},{"issue":"5","key":"1071_CR40","doi-asserted-by":"publisher","first-page":"2054","DOI":"10.1109\/TNNLS.2020.2996209","volume":"32","author":"C Sun","year":"2020","unstructured":"Sun C, Liu W, Dong L (2020) Reinforcement learning with task decomposition for cooperative multiagent systems. IEEE Trans Neural Netw Learn Syst 32(5):2054\u20132065","journal-title":"IEEE Trans Neural Netw Learn Syst"},{"issue":"01","key":"1071_CR41","doi-asserted-by":"publisher","first-page":"79","DOI":"10.1142\/S0218194018500043","volume":"28","author":"KM Lhaksmana","year":"2018","unstructured":"Lhaksmana KM, Murakami Y, Ishida T (2018) Role-based modeling for designing agent behavior in self-organizing multi-agent systems. 
Int J Softw Eng Knowl Eng 28(01):79\u201396","journal-title":"Int J Softw Eng Knowl Eng"},{"issue":"3","key":"1071_CR42","doi-asserted-by":"publisher","first-page":"203","DOI":"10.1023\/B:AGNT.0000018806.20944.ef","volume":"8","author":"P Bresciani","year":"2004","unstructured":"Bresciani P, Perini A, Giorgini P, Giunchiglia F, Mylopoulos J (2004) Tropos: an agent-oriented software development methodology. Auton Agent Multi-Agent Syst 8(3):203\u2013236","journal-title":"Auton Agent Multi-Agent Syst"},{"key":"1071_CR43","doi-asserted-by":"crossref","unstructured":"Wilson A, Fern A, Tadepalli P (2010) Bayesian policy search for multi-agent role discovery. In: Twenty-fourth AAAI conference on artificial intelligence","DOI":"10.1609\/aaai.v24i1.7679"},{"key":"1071_CR44","unstructured":"Christianos F, Papoudakis G, Rahman MA, Albrecht SV (2021) Scaling multi-agent reinforcement learning with selective parameter sharing. In: International conference on machine learning. PMLR, pp 1989\u20131998"},{"key":"1071_CR45","unstructured":"Le HM, Yue Y, Carr P, Lucey P (2017) Coordinated multi-agent imitation learning. In: International conference on machine learning. PMLR, pp 1995\u20132003"},{"key":"1071_CR46","unstructured":"Wang T, Gupta T, Mahajan A, Peng B, Whiteson S, Zhang C (2020) Rode: learning roles to decompose multi-agent tasks. arXiv preprint arXiv:2010.01523"},{"key":"1071_CR47","unstructured":"Nguyen D, Nguyen P, Venkatesh S, Tran T (2022) Learning to transfer role assignment across team sizes. arXiv preprint arXiv:2204.12937"},{"key":"1071_CR48","unstructured":"Hu S, Xie C, Liang X, Chang X (2022) Policy diagnosis via measuring role diversity in cooperative multi-agent rl. In: International conference on machine learning. PMLR, pp 9041\u20139071"},{"key":"1071_CR49","unstructured":"Kulkarni TD, Narasimhan K, Saeedi A, Tenenbaum J (2016) Hierarchical deep reinforcement learning: Integrating temporal abstraction and intrinsic motivation. 
In: Advances in neural information processing systems, vol 29"},{"key":"1071_CR50","first-page":"4","volume":"5","author":"SC Ong","year":"2009","unstructured":"Ong SC, Png SW, Hsu D, Lee WS (2009) Pomdps for robotic tasks with mixed observability. Robot Sci Syst 5:4","journal-title":"Robot Sci Syst"},{"key":"1071_CR51","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-28929-8","volume-title":"A concise introduction to decentralized POMDPs","author":"FA Oliehoek","year":"2016","unstructured":"Oliehoek FA, Amato C (2016) A concise introduction to decentralized POMDPs. Springer, Berlin"},{"key":"1071_CR52","doi-asserted-by":"crossref","unstructured":"Kurach K, Raichuk A, Sta\u0144czyk P, Zajac M, Bachem O, Espeholt L, Riquelme C, Vincent D, Michalski M, Bousquet O, et\u00a0al (2020) Google research football: a novel reinforcement learning environment. In: Proceedings of the AAAI conference on artificial intelligence, vol 34, pp 4501\u20134510","DOI":"10.1609\/aaai.v34i04.5878"},{"key":"1071_CR53","unstructured":"Hausknecht M, Stone P (2015) Deep recurrent q-learning for partially observable mdps. In: 2015 Aaai fall symposium series"},{"key":"1071_CR54","unstructured":"Ha D, Dai A, Le QV (2016) Hypernetworks. arXiv preprint arXiv:1609.09106"},{"issue":"1","key":"1071_CR55","first-page":"7234","volume":"21","author":"T Rashid","year":"2020","unstructured":"Rashid T, Samvelyan M, De Witt CS, Farquhar G, Foerster J, Whiteson S (2020) Monotonic value function factorisation for deep multi-agent reinforcement learning. J Mach Learn Res 21(1):7234\u20137284","journal-title":"J Mach Learn Res"},{"key":"1071_CR56","unstructured":"Sunehag P, Lever G, Gruslys A, Czarnecki WM, Zambaldi V, Jaderberg M, Lanctot M, Sonnerat N, Leibo JZ, Tuyls K, et al (2017) Value-decomposition networks for cooperative multi-agent learning. 
arXiv preprint arXiv:1706.05296"}],"container-title":["Complex &amp; Intelligent Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-023-01071-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s40747-023-01071-x\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-023-01071-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,10,27]],"date-time":"2023-10-27T19:13:22Z","timestamp":1698434002000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s40747-023-01071-x"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,4,26]]},"references-count":56,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2023,12]]}},"alternative-id":["1071"],"URL":"https:\/\/doi.org\/10.1007\/s40747-023-01071-x","relation":{},"ISSN":["2199-4536","2198-6053"],"issn-type":[{"value":"2199-4536","type":"print"},{"value":"2198-6053","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,4,26]]},"assertion":[{"value":"20 November 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"1 April 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"26 April 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}