{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,4]],"date-time":"2026-03-04T14:46:23Z","timestamp":1772635583177,"version":"3.50.1"},"reference-count":41,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2024,2,22]],"date-time":"2024-02-22T00:00:00Z","timestamp":1708560000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Hong Kong Research Grant Council","award":["GRF 11218621"],"award-info":[{"award-number":["GRF 11218621"]}]},{"name":"RIF","award":["R5060-19"],"award-info":[{"award-number":["R5060-19"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Intell. Syst. Technol."],"published-print":{"date-parts":[[2024,4,30]]},"abstract":"<jats:p>Multi-agent reinforcement learning (MARL) has proven effective in training multi-robot confrontation, such as StarCraft and robot soccer games. However, the current joint action policies utilized in MARL have been unsuccessful in recognizing and preventing actions that often lead to failures on our side. This exacerbates the cooperation dilemma, ultimately resulting in our agents acting independently and being defeated individually by their opponents. To tackle this challenge, we propose a novel joint action policy, referred to as the consensus action policy (CAP). Specifically, CAP records the number of times each joint action has caused our side to fail in the past and computes a cooperation tendency, which is integrated with each agent\u2019s <jats:italic>Q<\/jats:italic>-value and the Nash bargaining solution to determine a joint action. The cooperation tendency promotes team cooperation by selecting joint actions that have a high tendency of cooperation and avoiding actions that may lead to team failure. 
Moreover, the proposed CAP policy can be extended to partially observable scenarios by combining it with Deep <jats:italic>Q<\/jats:italic>-network or actor-critic\u2013based methods. We conducted extensive experiments to compare the proposed method with seven existing joint action policies, including four commonly used methods and three state-of-the-art methods, in terms of episode rewards, winning rates, and other metrics. Our results demonstrate that this approach holds great promise for multi-robot confrontation scenarios.<\/jats:p>","DOI":"10.1145\/3639371","type":"journal-article","created":{"date-parts":[[2023,12,29]],"date-time":"2023-12-29T22:04:43Z","timestamp":1703887483000},"page":"1-27","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":4,"title":["Strengthening Cooperative Consensus in Multi-Robot Confrontation"],"prefix":"10.1145","volume":"15","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4857-5439","authenticated-orcid":false,"given":"Meng","family":"Xu","sequence":"first","affiliation":[{"name":"Department of Computer Science, City University of Hong Kong, Hong Kong, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8563-148X","authenticated-orcid":false,"given":"Xinhong","family":"Chen","sequence":"additional","affiliation":[{"name":"Department of Computer Science, City University of Hong Kong, Hong Kong, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6951-2392","authenticated-orcid":false,"given":"Yechao","family":"She","sequence":"additional","affiliation":[{"name":"Department of Computer Science, City University of Hong Kong, Hong Kong, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5337-487X","authenticated-orcid":false,"given":"Yang","family":"Jin","sequence":"additional","affiliation":[{"name":"Department of Computer Science, City University of Hong Kong, Hong Kong, 
China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-5459-1911","authenticated-orcid":false,"given":"Guanyi","family":"Zhao","sequence":"additional","affiliation":[{"name":"Department of Computer Science, City University of Hong Kong, Hong Kong, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9318-1482","authenticated-orcid":false,"given":"Jianping","family":"Wang","sequence":"additional","affiliation":[{"name":"Department of Computer Science, City University of Hong Kong, Hong Kong, China"}]}],"member":"320","published-online":{"date-parts":[[2024,2,22]]},"reference":[{"key":"e_1_3_2_2_2","first-page":"1","article-title":"Solving navigation and obstacle avoidance in soccer robot using case-base reasoning technique","author":"Azarkasb Seyed Omid","year":"2021","unstructured":"Seyed Omid Azarkasb and Seyed Hossein Khasteh. 2021. Solving navigation and obstacle avoidance in soccer robot using case-base reasoning technique. In Proceedings of the 26th International Computer Conference, Computer Society of Iran (CSICC\u201921), 1\u20136.","journal-title":"Proceedings of the 26th International Computer Conference, Computer Society of Iran (CSICC\u201921)"},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2022.3189021"},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.1080\/1350486X.2022.2136727"},{"key":"e_1_3_2_5_2","unstructured":"Baiming Chen Mengdi Xu Zuxin Liu Liang Li and Ding Zhao. 2020. Delay-aware multi-agent reinforcement learning for cooperative and competitive environments. Retrieved from https:\/\/arxiv.org\/abs\/2005.05441"},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.1007\/s12652-019-01635-1"},{"key":"e_1_3_2_7_2","article-title":"Finite-sample guarantees for Nash Q-learning with linear function approximation","author":"Cisneros-Velarde Pedro","year":"2023","unstructured":"Pedro Cisneros-Velarde and Sanmi Koyejo. 2023. Finite-sample guarantees for Nash Q-learning with linear function approximation. 
arXiv:2303.00177. Retrieved from https:\/\/arxiv.org\/abs\/2303.00177","journal-title":"arXiv:2303.00177"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10846-018-0783-y"},{"key":"e_1_3_2_9_2","first-page":"269","volume-title":"Anais do VIII Simp\u00f3sio Brasileiro de Rob\u00f3tica e XVII Simp\u00f3sio Latino Americano de Rob\u00f3tica","author":"Medeiros Thiago Felipe de","year":"2020","unstructured":"Thiago Felipe de Medeiros, Marcos M\u00e1ximo, and Takashi Yoneyama. 2020. Deep reinforcement learning applied to IEEE very small size soccer strategy. In Anais do VIII Simp\u00f3sio Brasileiro de Rob\u00f3tica e XVII Simp\u00f3sio Latino Americano de Rob\u00f3tica. SBC, 269\u2013274."},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1145\/3394486.3403135"},{"key":"e_1_3_2_11_2","article-title":"SMACv2: An improved benchmark for cooperative multi-agent reinforcement learning","author":"Ellis Benjamin","year":"2022","unstructured":"Benjamin Ellis, Skander Moalla, Mikayel Samvelyan, Mingfei Sun, Anuj Mahajan, Jakob N. Foerster, and Shimon Whiteson. 2022. SMACv2: An improved benchmark for cooperative multi-agent reinforcement learning. arXiv:2212.07489. Retrieved from https:\/\/arxiv.org\/abs\/2212.07489","journal-title":"arXiv:2212.07489"},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","unstructured":"J. Foerster G. Farquhar T. Afouras N. Nardelli and S. Whiteson. 2018. Counterfactual multi-agent policy gradients. In Proceedings of the AAAI Conference on Artificial Intelligence 32 1 (2018). DOI:10.1609\/aaai.v32i1.11794","DOI":"10.1609\/aaai.v32i1.11794"},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.3390\/e23081043"},{"key":"e_1_3_2_14_2","doi-asserted-by":"crossref","unstructured":"Tuomas Haarnoja Ben Moran Guy Lever Sandy H. Huang Dhruva Tirumala Markus Wulfmeier Jan Humplik Saran Tunyasuvunakool Noah Y. Siegel Roland Hafner et\u00a0al. 2023. 
Learning agile soccer skills for a bipedal robot with deep reinforcement learning. Retrieved from https:\/\/arxiv.org\/abs\/2304.13653","DOI":"10.1126\/scirobotics.adi8022"},{"key":"e_1_3_2_15_2","first-page":"243","volume-title":"Advances in Artificial Intelligence: Proceedings of the 33rd Australasian Joint Conference (AI\u201920)","author":"Hao Daniel","year":"2020","unstructured":"Daniel Hao, Penny Sweetser, and Matthew Aitchison. 2020. Designing curriculum for deep reinforcement learning in StarCraft II. In Advances in Artificial Intelligence: Proceedings of the 33rd Australasian Joint Conference (AI\u201920). Springer, 243\u2013255."},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-662-49470-7"},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-47096-2_12"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.5555\/945365.964288"},{"key":"e_1_3_2_19_2","article-title":"A unified game-theoretic approach to multiagent reinforcement learning","volume":"30","author":"Lanctot Marc","year":"2017","unstructured":"Marc Lanctot, Vinicius Zambaldi, Audrunas Gruslys, Angeliki Lazaridou, Karl Tuyls, Julien P\u00e9rolat, David Silver, and Thore Graepel. 2017. A unified game-theoretic approach to multiagent reinforcement learning. Adv. Neural Inf. Process. Syst. 30 (2017).","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"e_1_3_2_20_2","article-title":"Multi-agent actor-critic for mixed cooperative-competitive environments","volume":"30","author":"Lowe Ryan","year":"2017","unstructured":"Ryan Lowe, Yi I Wu, Aviv Tamar, Jean Harb, OpenAI Pieter Abbeel, and Igor Mordatch. 2017. Multi-agent actor-critic for mixed cooperative-competitive environments. Adv. Neural Inf. Process. Syst. 30 (2017).","journal-title":"Adv. Neural Inf. Process. 
Syst."},{"key":"e_1_3_2_21_2","article-title":"Partial-information Q-learning for general two-player stochastic games","author":"Medhin Negash","year":"2023","unstructured":"Negash Medhin, Andrew Papanicolaou, and Marwen Zrida. 2023. Partial-information Q-learning for general two-player stochastic games. arXiv:2302.10830. Retrieved from https:\/\/arxiv.org\/abs\/2302.10830","journal-title":"arXiv:2302.10830"},{"key":"e_1_3_2_22_2","first-page":"1365","article-title":"Regularized softmax deep multi-agent Q-learning","volume":"34","author":"Pan Ling","year":"2021","unstructured":"Ling Pan, Tabish Rashid, Bei Peng, Longbo Huang, and Shimon Whiteson. 2021. Regularized softmax deep multi-agent Q-learning. Adv. Neural Inf. Process. Syst. 34 (2021), 1365\u20131377.","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/WSC.2006.323178"},{"key":"e_1_3_2_24_2","article-title":"The StarCraft multi-agent challenge","author":"Samvelyan Mikayel","year":"2019","unstructured":"Mikayel Samvelyan, Tabish Rashid, Christian Schroeder De Witt, Gregory Farquhar, Nantas Nardelli, Tim G. J. Rudner, Chia-Man Hung, Philip H. S. Torr, Jakob Foerster, and Shimon Whiteson. 2019. The StarCraft multi-agent challenge. arXiv:1902.04043. 
Retrieved from https:\/\/arxiv.org\/abs\/1902.04043","journal-title":"arXiv:1902.04043"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1002\/9781118884614"},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/TETCI.2018.2823329"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11704-015-4210-7"},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10458-023-09603-y"},{"key":"e_1_3_2_29_2","article-title":"Value-decomposition networks for cooperative multi-agent learning","author":"Sunehag Peter","year":"2017","unstructured":"Peter Sunehag, Guy Lever, Audrunas Gruslys, Wojciech Marian Czarnecki, Vinicius Zambaldi, Max Jaderberg, Marc Lanctot, Nicolas Sonnerat, Joel Z Leibo, Karl Tuyls, et\u00a0al. 2017. Value-decomposition networks for cooperative multi-agent learning. arXiv:1706.05296. Retrieved from https:\/\/arxiv.org\/abs\/1706.05296","journal-title":"arXiv:1706.05296"},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICMLC.2006.258352"},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCYB.2020.2990722"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1109\/CEI57409.2022.9950106"},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/DSC50466.2020.00049"},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1145\/3623405"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICMA.2019.8816420"},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2022.109448"},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.1145\/3579829"},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-72062-9_35"},{"key":"e_1_3_2_39_2","article-title":"Hierarchical reinforcement learning in StarCraft II with human expertise in subgoals selection","author":"Xu Xinyi","year":"2020","unstructured":"Xinyi Xu, Tiancheng Huang, Pengfei Wei, Akshay Narayan, and Tze-Yun Leong. 
2020. Hierarchical reinforcement learning in StarCraft II with human expertise in subgoals selection. arXiv:2008.03444. Retrieved from https:\/\/arxiv.org\/abs\/2008.03444","journal-title":"arXiv:2008.03444"},{"key":"e_1_3_2_40_2","first-page":"24611","article-title":"The surprising effectiveness of PPO in cooperative multi-agent games","volume":"35","author":"Yu Chao","year":"2022","unstructured":"Chao Yu, Akash Velu, Eugene Vinitsky, Jiaxuan Gao, Yu Wang, Alexandre Bayen, and Yi Wu. 2022. The surprising effectiveness of PPO in cooperative multi-agent games. Adv. Neural Inf. Process. Syst. 35 (2022), 24611\u201324624.","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"e_1_3_2_41_2","first-page":"140","volume-title":"Robot World Cup","author":"Zare Nader","year":"2021","unstructured":"Nader Zare, Mahtab Sarvmaili, Aref Sayareh, Omid Amini, Stan Matwin, and Amilcar Soares. 2021. Engineering features to improve pass prediction in soccer simulation 2D games. In Robot World Cup. Springer, 140\u2013152."},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-60990-0_12"}],"container-title":["ACM Transactions on Intelligent Systems and 
Technology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3639371","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3639371","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T22:54:11Z","timestamp":1750287251000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3639371"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,2,22]]},"references-count":41,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2024,4,30]]}},"alternative-id":["10.1145\/3639371"],"URL":"https:\/\/doi.org\/10.1145\/3639371","relation":{},"ISSN":["2157-6904","2157-6912"],"issn-type":[{"value":"2157-6904","type":"print"},{"value":"2157-6912","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,2,22]]},"assertion":[{"value":"2023-04-09","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-12-26","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-02-22","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}