{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,15]],"date-time":"2026-01-15T23:58:12Z","timestamp":1768521492298,"version":"3.49.0"},"reference-count":31,"publisher":"Springer Science and Business Media LLC","issue":"3","license":[{"start":{"date-parts":[[2024,1,25]],"date-time":"2024-01-25T00:00:00Z","timestamp":1706140800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,1,25]],"date-time":"2024-01-25T00:00:00Z","timestamp":1706140800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"National Natural Science Foundation of China Youth Science Foundation","award":["62102446"],"award-info":[{"award-number":["62102446"]}]},{"name":"National Natural Science Foundation of China Youth Science Foundation","award":["62102444"],"award-info":[{"award-number":["62102444"]}]},{"name":"National Natural Science Foundation of China Youth Science Foundation","award":["61902425"],"award-info":[{"award-number":["61902425"]}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Complex Intell. Syst."],"published-print":{"date-parts":[[2024,6]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Behavior trees have attracted great interest in computer games and robotic applications. However, it lacks the learning ability for dynamic environments. Previous works combining behavior trees with reinforcement learning either need to construct an independent sub-scenario or train the learning method over the whole game, which is not suited for complex multi-agent games. In this paper, a framework is proposed, named as MARL-BT, that embeds multi-agent reinforcement learning methods into behavior trees. Following the running mechanism of behavior trees, we design the way of collecting samples and the training procedure. Further, we point out a special phenomenon in MARL-BT, i.e., the unexpected interruption, and present an action masking technique to remove its harmful effect on learning performance. Finally, we make extensive experiments on the 11 versus 11 full game in Google Research Football. The introduced MARL-BT framework could get an 11.507% improvement compared to pure BT for certain scenarios. 
The action masking technique greatly improves the performance of the learning method, i.e., the final reward is improved by around 100% for a sub-task.<\/jats:p>","DOI":"10.1007\/s40747-023-01326-7","type":"journal-article","created":{"date-parts":[[2024,1,25]],"date-time":"2024-01-25T18:02:44Z","timestamp":1706205764000},"page":"3273-3282","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":5,"title":["Embedding multi-agent reinforcement learning into behavior trees with unexpected interruptions"],"prefix":"10.1007","volume":"10","author":[{"given":"Xianglong","family":"Li","sequence":"first","affiliation":[]},{"given":"Yuan","family":"Li","sequence":"additional","affiliation":[]},{"given":"Jieyuan","family":"Zhang","sequence":"additional","affiliation":[]},{"given":"Xinhai","family":"Xu","sequence":"additional","affiliation":[]},{"given":"Donghong","family":"Liu","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,1,25]]},"reference":[{"key":"1326_CR1","unstructured":"Weber BG, Mateas M, Jhala A (2011) Building human-level AI for real-time strategy games. In: 2011 AAAI Fall symposium series"},{"key":"1326_CR2","doi-asserted-by":"publisher","unstructured":"Robertson G, Watson I (2015) Building behavior trees from observations in real-time strategy games. In: 2015 International symposium on innovations in intelligent systems and applications (INISTA), pp 1\u20137. https:\/\/doi.org\/10.1109\/INISTA.2015.7276774","DOI":"10.1109\/INISTA.2015.7276774"},{"key":"1326_CR3","doi-asserted-by":"crossref","unstructured":"Goudarzi H, Hine D, Richards A (2019) Mission automation for drone inspection in congested environments. In: 2019 Workshop on research, education and development of unmanned aerial systems (RED UAS). IEEE, pp 305\u2013314","DOI":"10.1109\/REDUAS47371.2019.8999719"},{"key":"1326_CR4","unstructured":"Olsson M (2016) Behavior trees for decision-making in autonomous driving. https:\/\/api.semanticscholar.org\/CorpusID:112621565"},{"key":"1326_CR5","doi-asserted-by":"crossref","unstructured":"Kuckling J, Ligot A, Bozhinoski D, Birattari M (2018) Behavior trees as a control architecture in the automatic modular design of robot swarms. In: International conference on swarm intelligence. Springer, pp 30\u201343","DOI":"10.1007\/978-3-030-00533-7_3"},{"key":"1326_CR6","doi-asserted-by":"crossref","unstructured":"Sprague CI, \u00d6zkahraman \u00d6, Munaf\u00f2 A, Marlow R, Phillips AB, \u00d6gren P (2018) Improving the modularity of AUV control systems using behaviour trees. In: 2018 IEEE\/OES autonomous underwater vehicle workshop (AUV), pp 1\u20136","DOI":"10.1109\/AUV.2018.8729810"},{"key":"1326_CR7","doi-asserted-by":"crossref","unstructured":"Macenski S, Mart\u2019in FJP, White R, Clavero JG (2020) The marathon 2: a navigation system. In: 2020 IEEE\/RSJ international conference on intelligent robots and systems (IROS), pp 2718\u20132725","DOI":"10.1109\/IROS45743.2020.9341207"},{"key":"1326_CR8","doi-asserted-by":"crossref","unstructured":"Zhang Q, Xu K, Jiao P, Yin Q (2018) Behavior modeling for autonomous agents based on modified evolving behavior trees. In: 2018 IEEE 7th data driven control and learning systems conference (DDCLS). 
IEEE, pp 1140\u20131145","DOI":"10.1109\/DDCLS.2018.8515939"},{"issue":"1","key":"1326_CR9","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1109\/TG.2017.2771831","volume":"11","author":"I Sagredo-Olivenza","year":"2017","unstructured":"Sagredo-Olivenza I, G\u00f3mez-Mart\u00edn PP, G\u00f3mez-Mart\u00edn MA, Gonz\u00e1lez-Calero PA (2017) Trained behavior trees: programming by demonstration to support AI game designers. IEEE Trans Games 11(1):5\u201314","journal-title":"IEEE Trans Games"},{"key":"1326_CR10","doi-asserted-by":"crossref","unstructured":"Fu Y, Qin L, Yin Q (2016) A reinforcement learning behavior tree framework for game AI. In: 2016 International conference on economics, social science, arts, education and management engineering. Atlantis Press, pp 573\u2013579","DOI":"10.2991\/essaeme-16.2016.120"},{"key":"1326_CR11","doi-asserted-by":"crossref","unstructured":"Dey R, Child C (2013) QL-BT: enhancing behaviour tree design and implementation with q-learning. In: 2013 IEEE conference on computational intelligence in games (CIG). IEEE, pp 1\u20138","DOI":"10.1109\/CIG.2013.6633623"},{"key":"1326_CR12","unstructured":"Pereira RdP, Engel PM (2015) A framework for constrained and adaptive behavior-based agents. arXiv preprint arXiv:1506.02312"},{"key":"1326_CR13","unstructured":"Kartasev M (2019) Integrating reinforcement learning into behavior trees by hierarchical composition"},{"key":"1326_CR14","doi-asserted-by":"crossref","unstructured":"Zhang Q, Sun L, Jiao P, Yin Q (2017) Combining behavior trees with maxq learning to facilitate cgfs behavior modeling. In: 2017 4th International conference on systems and informatics (ICSAI). IEEE, pp 525\u2013531","DOI":"10.1109\/ICSAI.2017.8248348"},{"key":"1326_CR15","first-page":"271","volume":"30","author":"R Lowe","year":"2017","unstructured":"Lowe R, Wu YI, Tamar A, Harb J, Pieter Abbeel O, Mordatch I (2017) Multi-agent actor-critic for mixed cooperative-competitive environments. Adv Neural Inf Process Syst 30:271","journal-title":"Adv Neural Inf Process Syst"},{"key":"1326_CR16","unstructured":"Rashid T, Samvelyan M, Schroeder C, Farquhar G, Foerster J, Whiteson S (2018) Qmix: Monotonic value function factorisation for deep multi-agent reinforcement learning. In: International conference on machine learning. PMLR, pp 4295\u20134304"},{"key":"1326_CR17","unstructured":"Yu C, Velu A, Vinitsky E, Wang Y, Bayen AM, Wu Y (2021) The surprising effectiveness of MAPPO in cooperative, multi-agent games. CoRR. arXiv:2103.01955"},{"key":"1326_CR18","doi-asserted-by":"crossref","unstructured":"Zhao J, Zhao Y, Wang W, Yang M, Hu X, Zhou W, Hao J, Li H (2022) Coach-assisted multi-agent reinforcement learning framework for unexpected crashed agents. arXiv preprint arXiv:2203.08454","DOI":"10.1631\/FITEE.2100594"},{"key":"1326_CR19","unstructured":"Wen M, Kuba JG, Lin R, Zhang W, Wen Y, Wang J, Yang Y (2022) Multi-agent reinforcement learning is a sequence modeling problem. arXiv preprint arXiv:2205.14953"},{"key":"1326_CR20","first-page":"113","volume":"1","author":"L Li","year":"2021","unstructured":"Li L, Wang L, Li Y, Sheng J (2021) Mixed deep reinforcement learning-behavior tree for intelligent agents design. ICAART 1:113\u2013124","journal-title":"ICAART"},{"key":"1326_CR21","doi-asserted-by":"crossref","unstructured":"Isla D (2005) GDC 2005 proceeding: handling complexity in the halo 2 AI. 
Retrieved Oct 21, 2009","DOI":"10.1016\/S0885-064X(04)00090-1"},{"key":"1326_CR22","doi-asserted-by":"publisher","DOI":"10.1016\/j.robot.2022.104096","volume":"154","author":"M Iovino","year":"2022","unstructured":"Iovino M, Scukins E, Styrud J, \u00d6gren P, Smith C (2022) A survey of behavior trees in robotics and AI. Robot Auton Syst 154:104096","journal-title":"Robot Auton Syst"},{"key":"1326_CR23","unstructured":"Yang Y, Hao J, Liao B, Shao K, Chen G, Liu W, Tang H (2020) Qatten: a general framework for cooperative multiagent reinforcement learning. arXiv preprint arXiv:2002.03939"},{"key":"1326_CR24","unstructured":"Sunehag P, Lever G, Gruslys A, Czarnecki WM, Zambaldi V, Jaderberg M, Lanctot M, Sonnerat N, Leibo JZ, Tuyls K et al (2017) Value-decomposition networks for cooperative multi-agent learning. arXiv preprint arXiv:1706.05296"},{"key":"1326_CR25","unstructured":"Son K, Kim D, Kang WJ, Hostallero DE, Yi Y (2019) QTRAN: learning to factorize with transformation for cooperative multi-agent reinforcement learning. In: International conference on machine learning. PMLR, pp 5887\u20135896"},{"key":"1326_CR26","unstructured":"Tomai E, Salazar R, Flores R (2013) Simulating aggregate player behavior with learning behavior trees. In: Proceedings of the 22nd annual conference on behavior representation in modeling and simulation"},{"issue":"5","key":"1326_CR27","doi-asserted-by":"publisher","first-page":"6071","DOI":"10.3233\/JIFS-179190","volume":"37","author":"X Zhu","year":"2019","unstructured":"Zhu X (2019) Behavior tree design of intelligent behavior of non-player character (NPC) based on unity3d. J Intell Fuzzy Syst 37(5):6071\u20136079","journal-title":"J Intell Fuzzy Syst"},{"key":"1326_CR28","unstructured":"Mnih V, Kavukcuoglu K, Silver D, Graves A, Antonoglou I, Wierstra D, Riedmiller M (2013) Playing atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602"},{"key":"1326_CR29","doi-asserted-by":"crossref","unstructured":"Kurach K, Raichuk A, Sta\u0144czyk P, Zaj\u0105c M, Bachem O, Espeholt L, Riquelme C, Vincent D, Michalski M, Bousquet O et al (2020) Google research football: a novel reinforcement learning environment. In: Proceedings of the AAAI conference on artificial intelligence, vol. 34, pp 4501\u20134510","DOI":"10.1609\/aaai.v34i04.5878"},{"key":"1326_CR30","unstructured":"Google Research (2020). 
https:\/\/www.kaggle.com\/competitions\/google-football\/code"},{"key":"1326_CR31","unstructured":"Shen S, Ma C, Li C, Liu W, Fu Y, Mei S, Liu X, Wang C (2023) RiskQ: risk-sensitive multi-agent reinforcement learning value factorization"}],"container-title":["Complex &amp; Intelligent Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-023-01326-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s40747-023-01326-7\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-023-01326-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,5,16]],"date-time":"2024-05-16T18:10:51Z","timestamp":1715883051000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s40747-023-01326-7"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,1,25]]},"references-count":31,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2024,6]]}},"alternative-id":["1326"],"URL":"https:\/\/doi.org\/10.1007\/s40747-023-01326-7","relation":{},"ISSN":["2199-4536","2198-6053"],"issn-type":[{"value":"2199-4536","type":"print"},{"value":"2198-6053","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,1,25]]},"assertion":[{"value":"2 September 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"2 December 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"25 January 2024","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors have no relevant financial or non-financial interests to disclose.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}