{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,30]],"date-time":"2025-08-30T17:01:01Z","timestamp":1756573261799},"reference-count":23,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2023,10,18]],"date-time":"2023-10-18T00:00:00Z","timestamp":1697587200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,10,18]],"date-time":"2023-10-18T00:00:00Z","timestamp":1697587200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"National Natural Science Foundation of China Youth Science Foundation","award":["No. 61902425","No. 62102444"],"award-info":[{"award-number":["No. 61902425","No. 62102444"]}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Complex Intell. Syst."],"published-print":{"date-parts":[[2024,4]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Great progress has been made in the domain of multi-agent reinforcement learning in recent years. Most work concentrates on solving a single task by learning the cooperative behaviors of agents. However, many real-world problems are normally composed of a set of subtasks in which the execution order follows a certain procedure. Cooperative behaviors should be learned on the premise that agents are first allocated to those subtasks. In this paper, we propose a hierarchical framework for learning the dynamic allocation of agents among subtasks, as well as cooperative behaviors. We design networks corresponding to agents and subtasks, respectively, which together constitute the whole hierarchical network. For the upper layer, a novel allocation learning mechanism is devised to map an agent network to a subtask network. 
Each agent network can be assigned to only one subtask network at each time step. For the lower layer, an action learning module is designed to compute appropriate actions for each agent based on the allocation result. The agent networks together with the subtask networks are updated by a common reward obtained from the environment. To evaluate the effectiveness of our framework, we conduct experiments in two challenging environments, i.e., Google Research Football and SAVETHECITY. Empirical results show that our framework achieves much better performance than other recent methods.<\/jats:p>","DOI":"10.1007\/s40747-023-01255-5","type":"journal-article","created":{"date-parts":[[2023,10,18]],"date-time":"2023-10-18T09:01:38Z","timestamp":1697619698000},"page":"1985-1995","update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":5,"title":["A hierarchical multi-agent allocation-action learning framework for multi-subtask games"],"prefix":"10.1007","volume":"10","author":[{"given":"Xianglong","family":"Li","sequence":"first","affiliation":[]},{"given":"Yuan","family":"Li","sequence":"additional","affiliation":[]},{"given":"Jieyuan","family":"Zhang","sequence":"additional","affiliation":[]},{"given":"Xinhai","family":"Xu","sequence":"additional","affiliation":[]},{"given":"Donghong","family":"Liu","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,10,18]]},"reference":[{"key":"1255_CR1","unstructured":"Sunehag P, Lever G, Gruslys A, Czarnecki WM, Zambaldi V, Jaderberg M, Lanctot M, Sonnerat N, Leibo JZ, Tuyls K et al (2017) Value-decomposition networks for cooperative multi-agent learning. arXiv preprint arXiv:1706.05296"},{"key":"1255_CR2","unstructured":"Wang T, Gupta T, Mahajan A, Peng B, Whiteson S, Zhang C (2020) Rode: learning roles to decompose multi-agent tasks. 
arXiv preprint arXiv:2010.01523"},{"key":"1255_CR3","unstructured":"Son K, Kim D, Kang WJ, Hostallero DE, Yi Y (2019) Qtran: learning to factorize with transformation for cooperative multi-agent reinforcement learning. In: International conference on machine learning. PMLR, pp 5887\u20135896"},{"key":"1255_CR4","unstructured":"Iqbal S, Costales R, Sha F (2022) Alma: hierarchical learning for composite multi-agent tasks. arXiv preprint arXiv:2205.14205"},{"key":"1255_CR5","doi-asserted-by":"publisher","first-page":"13890","DOI":"10.1109\/TITS.2021.3096226","volume":"23","author":"C Liu","year":"2021","unstructured":"Liu C, Chen C-X, Chen C (2021) Meta: a city-wide taxi repositioning framework based on multi-agent reinforcement learning. IEEE Trans Intell Transp Syst 23:13890\u201313895","journal-title":"IEEE Trans Intell Transp Syst"},{"issue":"13","key":"1255_CR6","doi-asserted-by":"publisher","first-page":"10843","DOI":"10.1109\/JIOT.2021.3050804","volume":"8","author":"X Chen","year":"2021","unstructured":"Chen X, Liu G (2021) Energy-efficient task offloading and resource allocation via deep reinforcement learning for augmented reality in mobile edge networks. IEEE Internet Things J 8(13):10843\u201310856","journal-title":"IEEE Internet Things J"},{"key":"1255_CR7","unstructured":"Rashid T, Samvelyan M, Schroeder C, Farquhar G, Foerster J, Whiteson S (2018) Qmix: monotonic value function factorisation for deep multi-agent reinforcement learning. In: International conference on machine learning. PMLR, pp 4295\u20134304"},{"key":"1255_CR8","doi-asserted-by":"crossref","unstructured":"Gerkey BP, Matari\u0107 MJ (2004) A formal analysis and taxonomy of task allocation in multi-robot systems. Int J Robot Res 23:939\u2013954","DOI":"10.1177\/0278364904045564"},{"key":"1255_CR9","unstructured":"Proper S, Tadepalli P (2009) Solving multiagent assignment Markov decision processes. 
In: Proceedings of the 8th international conference on autonomous agents and multiagent systems, vol 1. pp 681\u2013688"},{"key":"1255_CR10","doi-asserted-by":"publisher","first-page":"841","DOI":"10.1007\/s10846-018-0783-y","volume":"94","author":"W Dai","year":"2019","unstructured":"Dai W, Lu H, Xiao J, Zheng Z (2019) Task allocation without communication based on incomplete information game theory for multi-robot systems. J Intell Robot Syst 94:841\u2013856","journal-title":"J Intell Robot Syst"},{"issue":"2","key":"1255_CR11","doi-asserted-by":"publisher","first-page":"101","DOI":"10.3390\/info11020101","volume":"11","author":"J Han","year":"2020","unstructured":"Han J, Zhang Z, Wu X (2020) A real-world-oriented multi-task allocation approach based on multi-agent reinforcement learning in mobile crowd sensing. Information 11(2):101","journal-title":"Information"},{"key":"1255_CR12","unstructured":"Pathan S, Shrivastava V (2021) Reinforcement learning for assignment problem with time constraints. arXiv preprint arXiv:2106.02856"},{"key":"1255_CR13","unstructured":"Ma Q, Ge S, He D, Thaker D, Drori I (2019) Combinatorial optimization by graph pointer networks and hierarchical reinforcement learning. arXiv preprint arXiv:1911.04936"},{"issue":"3","key":"1255_CR14","doi-asserted-by":"publisher","first-page":"1529","DOI":"10.1109\/TETC.2019.2902661","volume":"9","author":"J Wang","year":"2019","unstructured":"Wang J, Zhao L, Liu J, Kato N (2019) Smart resource allocation for mobile edge computing: a deep reinforcement learning approach. IEEE Trans Emerg Top Comput 9(3):1529\u20131541","journal-title":"IEEE Trans Emerg Top Comput"},{"key":"1255_CR15","doi-asserted-by":"crossref","unstructured":"Liu Z, Zhang H, Rao B, Wang L (2018) A reinforcement learning based resource management approach for time-critical workloads in distributed computing environment. In: 2018 IEEE international conference on big data (Big Data). 
IEEE, pp 252\u2013261","DOI":"10.1109\/BigData.2018.8622393"},{"key":"1255_CR16","unstructured":"Hameed MSA, Schwung A (2020) Reinforcement learning on job shop scheduling problems using graph networks. arXiv preprint arXiv:2009.03836"},{"key":"1255_CR17","doi-asserted-by":"crossref","unstructured":"Bitsakos C, Konstantinou I, Koziris N (2018) Derp: a deep reinforcement learning cloud system for elastic resource provisioning. In: 2018 IEEE international conference on cloud computing technology and science (CloudCom). IEEE, pp 21\u201329","DOI":"10.1109\/CloudCom2018.2018.00020"},{"key":"1255_CR18","unstructured":"Liu B, Liu Q, Stone P, Garg A, Zhu Y, Anandkumar A (2021) Coach-player multi-agent reinforcement learning for dynamic team composition. In: International conference on machine learning. PMLR, pp 6860\u20136870"},{"key":"1255_CR19","doi-asserted-by":"publisher","DOI":"10.1016\/j.robot.2022.104096","volume":"154","author":"M Iovino","year":"2022","unstructured":"Iovino M, Scukins E, Styrud J, \u00d6gren P, Smith C (2022) A survey of behavior trees in robotics and AI. Robot Auton Syst 154:104096","journal-title":"Robot Auton Syst"},{"key":"1255_CR20","unstructured":"Yang Y, Hao J, Liao B, Shao K, Chen G, Liu W, Tang H (2020) Qatten: a general framework for cooperative multiagent reinforcement learning. arXiv preprint arXiv:2002.03939"},{"key":"1255_CR21","doi-asserted-by":"crossref","unstructured":"Tokic M, Palm G (2011) Value-difference based exploration: adaptive control between epsilon-greedy and softmax. In: Annual conference on artificial intelligence. Springer, pp 335\u2013346","DOI":"10.1007\/978-3-642-24455-1_33"},{"key":"1255_CR22","doi-asserted-by":"crossref","unstructured":"Kurach K, Raichuk A, Sta\u0144czyk P, Zaj\u0105c M, Bachem O, Espeholt L, Riquelme C, Vincent D, Michalski M, Bousquet O et\u00a0al (2020) Google research football: a novel reinforcement learning environment. 
In: Proceedings of the AAAI conference on artificial intelligence, vol 34. pp 4501\u20134510","DOI":"10.1609\/aaai.v34i04.5878"},{"key":"1255_CR23","unstructured":"Oliehoek FA, Spaan MT, Vlassis N, Whiteson S (2008) Exploiting locality of interaction in factored dec-pomdps. In: International joint conference on autonomous agents and multi-agent systems, pp 517\u2013524"}],"container-title":["Complex &amp; Intelligent Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-023-01255-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s40747-023-01255-5\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-023-01255-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,3,30]],"date-time":"2024-03-30T15:21:21Z","timestamp":1711812081000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s40747-023-01255-5"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,10,18]]},"references-count":23,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2024,4]]}},"alternative-id":["1255"],"URL":"https:\/\/doi.org\/10.1007\/s40747-023-01255-5","relation":{},"ISSN":["2199-4536","2198-6053"],"issn-type":[{"value":"2199-4536","type":"print"},{"value":"2198-6053","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,10,18]]},"assertion":[{"value":"22 May 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"18 September 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"18 October 
2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors have no relevant financial or non-financial interests to disclose.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}