{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,25]],"date-time":"2026-03-25T10:53:47Z","timestamp":1774436027495,"version":"3.50.1"},"reference-count":73,"publisher":"Springer Science and Business Media LLC","issue":"4","license":[{"start":{"date-parts":[[2024,4,23]],"date-time":"2024-04-23T00:00:00Z","timestamp":1713830400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,4,23]],"date-time":"2024-04-23T00:00:00Z","timestamp":1713830400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Complex Intell. Syst."],"published-print":{"date-parts":[[2024,8]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Previous deep multi-agent reinforcement learning (MARL) algorithms have achieved impressive results, typically in symmetric and homogeneous scenarios. However, asymmetric heterogeneous scenarios are prevalent and usually harder to solve. In this paper, the main focus is the cooperative heterogeneous MARL problem in asymmetric heterogeneous maps of the Starcraft Multi-Agent Challenges (SMAC) environment. Recent mainstream approaches use policy-based actor-critic algorithms to solve the heterogeneous MARL problem with various individual agent policies. However, these approaches lack a formal definition and further analysis of the heterogeneity problem. Therefore, a formal definition of the Local Transition Heterogeneity (LTH) problem is first given, so that the LTH problem in the SMAC environment can be studied. To comprehensively reveal and study the LTH problem, some new asymmetric heterogeneous maps in SMAC are designed. It has been observed that baseline algorithms fail to perform well in the new maps. 
Then, the authors propose the Grouped Individual-Global-Max (GIGM) consistency and a novel MARL algorithm, Grouped Hybrid Q-Learning (GHQ). GHQ separates agents into several groups and keeps individual parameters for each group. To enhance cooperation between groups, GHQ maximizes the mutual information between trajectories of different groups. A novel hybrid structure for value factorization in GHQ is also proposed. Finally, experiments on the original and the new maps show the superior performance of GHQ compared to other state-of-the-art algorithms.<\/jats:p>","DOI":"10.1007\/s40747-024-01415-1","type":"journal-article","created":{"date-parts":[[2024,4,23]],"date-time":"2024-04-23T17:02:24Z","timestamp":1713891744000},"page":"5261-5280","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":6,"title":["GHQ: grouped hybrid Q-learning for cooperative heterogeneous multi-agent reinforcement learning"],"prefix":"10.1007","volume":"10","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6533-5176","authenticated-orcid":false,"given":"Xiaoyang","family":"Yu","sequence":"first","affiliation":[]},{"given":"Youfang","family":"Lin","sequence":"additional","affiliation":[]},{"given":"Xiangsen","family":"Wang","sequence":"additional","affiliation":[]},{"given":"Sheng","family":"Han","sequence":"additional","affiliation":[]},{"given":"Kai","family":"Lv","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,4,23]]},"reference":[{"issue":"11","key":"1415_CR1","doi-asserted-by":"publisher","first-page":"13677","DOI":"10.1007\/s10489-022-04105-y","volume":"53","author":"A Oroojlooy","year":"2022","unstructured":"Oroojlooy A, Hajinezhad D (2022) A review of cooperative multi-agent deep reinforcement learning. 
Appl Intell 53(11):13677\u2013722","journal-title":"Appl Intell"},{"issue":"2","key":"1415_CR2","doi-asserted-by":"publisher","first-page":"153","DOI":"10.1007\/s10462-016-9520-8","volume":"49","author":"A Sujil","year":"2018","unstructured":"Sujil A, Verma J, Kumar R (2018) Multi agent system: concepts, platforms and applications in power systems. Artif Intell Rev 49(2):153\u2013182","journal-title":"Artif Intell Rev"},{"issue":"3","key":"1415_CR3","doi-asserted-by":"publisher","first-page":"692","DOI":"10.1109\/TPDS.2020.3030920","volume":"32","author":"X Gao","year":"2020","unstructured":"Gao X, Liu R, Kaushik A (2020) Hierarchical multi-agent optimization for resource allocation in cloud computing. IEEE Trans Parallel Distrib Syst 32(3):692\u2013707","journal-title":"IEEE Trans Parallel Distrib Syst"},{"key":"1415_CR4","doi-asserted-by":"publisher","first-page":"10","DOI":"10.1016\/j.neucom.2022.07.007","volume":"505","author":"F Li","year":"2022","unstructured":"Li F, Liu Z, Zhang X et al (2022) Dynamic power allocation in iiot based on multi-agent deep reinforcement learning. Neurocomputing 505:10\u201318","journal-title":"Neurocomputing"},{"key":"1415_CR5","doi-asserted-by":"publisher","first-page":"4195","DOI":"10.1007\/s10489-020-01755-8","volume":"50","author":"H Chen","year":"2020","unstructured":"Chen H, Liu Y, Zhou Z et al (2020) Gama: graph attention multi-agent reinforcement learning algorithm for cooperation. Appl Intell 50:4195\u20134205","journal-title":"Appl Intell"},{"issue":"12","key":"1415_CR6","doi-asserted-by":"publisher","first-page":"14819","DOI":"10.1007\/s10489-022-04225-5","volume":"53","author":"Q Sun","year":"2022","unstructured":"Sun Q, Yao Y, Yi P et al (2022) Learning controlled and targeted communication with the centralized critic for the multi-agent system. 
Appl Intell 53(12):14819\u201337","journal-title":"Appl Intell"},{"key":"1415_CR7","doi-asserted-by":"publisher","first-page":"3691","DOI":"10.1007\/s10489-021-02554-5","volume":"52","author":"Z Ye","year":"2022","unstructured":"Ye Z, Chen Y, Jiang X et al (2022) Improving sample efficiency in multi-agent actor-critic methods. Appl Intell 52:3691\u20133704","journal-title":"Appl Intell"},{"issue":"4","key":"1415_CR8","doi-asserted-by":"publisher","first-page":"4063","DOI":"10.1007\/s10489-022-03605-1","volume":"53","author":"T Kravaris","year":"2023","unstructured":"Kravaris T, Lentzos K, Santipantakis G et al (2023) Explaining deep reinforcement learning decisions in complex multiagent settings: towards enabling automation in air traffic flow management. Appl Intell 53(4):4063\u20134098","journal-title":"Appl Intell"},{"issue":"4","key":"1415_CR9","doi-asserted-by":"publisher","first-page":"4483","DOI":"10.1007\/s10489-022-03643-9","volume":"53","author":"Z Qiao","year":"2023","unstructured":"Qiao Z, Ke L, Wang X (2023) Traffic signal control using a cooperative ewma-based multi-agent reinforcement learning. Appl Intell 53(4):4483\u20134498","journal-title":"Appl Intell"},{"key":"1415_CR10","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2020.106708","volume":"213","author":"S Yang","year":"2021","unstructured":"Yang S, Yang B (2021) A semi-decentralized feudal multi-agent learned-goal algorithm for multi-intersection traffic signal control. Knowl-Based Syst 213:106708","journal-title":"Knowl-Based Syst"},{"key":"1415_CR11","doi-asserted-by":"publisher","first-page":"390","DOI":"10.1016\/j.neucom.2021.11.106","volume":"490","author":"B Liu","year":"2022","unstructured":"Liu B, Ding Z (2022) A distributed deep reinforcement learning method for traffic light control. 
Neurocomputing 490:390\u2013399","journal-title":"Neurocomputing"},{"issue":"6","key":"1415_CR12","doi-asserted-by":"publisher","first-page":"3461","DOI":"10.1109\/TSMC.2022.3225381","volume":"53","author":"Z Zhuang","year":"2023","unstructured":"Zhuang Z, Tao H, Chen Y et al (2023) An optimal iterative learning control approach for linear systems with nonuniform trial lengths under input constraints. IEEE Trans Syst, Man, Cybern Syst 53(6):3461\u20133473. https:\/\/doi.org\/10.1109\/TSMC.2022.3225381","journal-title":"IEEE Trans Syst, Man, Cybern Syst"},{"key":"1415_CR13","doi-asserted-by":"publisher","first-page":"2533","DOI":"10.1007\/s00521-018-3937-8","volume":"32","author":"S Malakar","year":"2020","unstructured":"Malakar S, Ghosh M, Bhowmik S et al (2020) A ga based hierarchical feature selection approach for handwritten word recognition. Neural Comput Appl 32:2533\u20132552","journal-title":"Neural Comput Appl"},{"issue":"4","key":"1415_CR14","doi-asserted-by":"publisher","DOI":"10.1088\/1361-6501\/acb075","volume":"34","author":"L Shen","year":"2023","unstructured":"Shen L, Tao H, Ni Y et al (2023) Improved yolov3 model with feature map cropping for multi-scale road object detection. Meas Sci Technol 34(4):045406. https:\/\/doi.org\/10.1088\/1361-6501\/acb075","journal-title":"Meas Sci Technol"},{"issue":"2","key":"1415_CR15","doi-asserted-by":"publisher","first-page":"1454","DOI":"10.1016\/j.jfranklin.2022.11.004","volume":"360","author":"H Tao","year":"2023","unstructured":"Tao H, Qiu J, Chen Y et al (2023) Unsupervised cross-domain rolling bearing fault diagnosis based on time-frequency information fusion. 
J Frankl Inst 360(2):1454\u20131477","journal-title":"J Frankl Inst"},{"issue":"21","key":"1415_CR16","doi-asserted-by":"publisher","first-page":"2705","DOI":"10.3390\/math9212705","volume":"9","author":"N Bacanin","year":"2021","unstructured":"Bacanin N, Stoean R, Zivkovic M et al (2021) Performance of a novel chaotic firefly algorithm with enhanced exploration for tackling global optimization problems: application for dropout regularization. Mathematics 9(21):2705","journal-title":"Mathematics"},{"issue":"6","key":"1415_CR17","doi-asserted-by":"publisher","first-page":"750","DOI":"10.1007\/s10458-019-09421-1","volume":"33","author":"P Hernandez-Leal","year":"2019","unstructured":"Hernandez-Leal P, Kartal B, Taylor ME (2019) A survey and critique of multiagent deep reinforcement learning. Auton Agents Multi-Agent Syst 33(6):750\u2013797","journal-title":"Auton Agents Multi-Agent Syst"},{"issue":"2","key":"1415_CR18","doi-asserted-by":"publisher","first-page":"895","DOI":"10.1007\/s10462-021-09996-w","volume":"55","author":"S Gronauer","year":"2022","unstructured":"Gronauer S, Diepold K (2022) Multi-agent deep reinforcement learning: a survey. Artif Intell Rev 55(2):895\u2013943","journal-title":"Artif Intell Rev"},{"key":"1415_CR19","unstructured":"Samvelyan M, Rashid T, De\u00a0Witt CS et\u00a0al (2019) The starcraft multi-agent challenge. arXiv preprint arXiv:1902.04043"},{"key":"1415_CR20","unstructured":"Rashid T, Samvelyan M, Schroeder C et\u00a0al (2018) Qmix: Monotonic value function factorisation for deep multi-agent reinforcement learning. In: International conference on machine learning"},{"key":"1415_CR21","unstructured":"Hu J, Jiang S, Harding SA et\u00a0al (2021) Rethinking the implementation tricks and monotonicity constraint in cooperative multi-agent reinforcement learning. 
arXiv e-prints pp arXiv\u20132102"},{"key":"1415_CR22","unstructured":"Yu C, Velu A, Vinitsky E et\u00a0al (2021) The surprising effectiveness of ppo in cooperative, multi-agent games. arXiv preprint arXiv:2103.01955"},{"issue":"4","key":"1415_CR23","doi-asserted-by":"publisher","first-page":"661","DOI":"10.1137\/070710111","volume":"51","author":"A Clauset","year":"2009","unstructured":"Clauset A, Shalizi CR, Newman ME (2009) Power-law distributions in empirical data. SIAM Rev 51(4):661\u2013703","journal-title":"SIAM Rev"},{"key":"1415_CR24","doi-asserted-by":"crossref","unstructured":"Wang S, Wu Z, Hu X et\u00a0al (2024) What effects the generalization in visual reinforcement learning: policy consistency with truncated return prediction. In: Proceedings of the AAAI conference on artificial intelligence, vol. 38, no. 1","DOI":"10.1609\/aaai.v38i6.28369"},{"key":"1415_CR25","doi-asserted-by":"crossref","unstructured":"Wang S, Wu Z, Hu X et\u00a0al (2023) Skill-based hierarchical reinforcement learning for target visual navigation. In: IEEE Transactions on Multimedia","DOI":"10.1109\/TMM.2023.3243618"},{"key":"1415_CR26","doi-asserted-by":"crossref","unstructured":"Lv K, Wang S, Han S et\u00a0al (2023) Spatially-regularized features for vehicle re-identification: An explanation of where deep models should focus. In: IEEE Transactions on Intelligent Transportation Systems","DOI":"10.1109\/TITS.2023.3308138"},{"key":"1415_CR27","doi-asserted-by":"publisher","first-page":"5163","DOI":"10.1109\/TIP.2020.2980130","volume":"29","author":"K Lv","year":"2020","unstructured":"Lv K, Sheng H, Xiong Z et al (2020) Pose-based view synthesis for vehicles: a perspective aware method. 
IEEE Trans Image Process 29:5163\u20135174","journal-title":"IEEE Trans Image Process"},{"issue":"10","key":"1415_CR28","doi-asserted-by":"publisher","first-page":"3718","DOI":"10.1109\/TMC.2021.3057826","volume":"21","author":"Y Yu","year":"2021","unstructured":"Yu Y, Liew SC, Wang T (2021) Multi-agent deep reinforcement learning multiple access for heterogeneous wireless networks with imperfect channels. IEEE Trans Mob Comput 21(10):3718\u201330","journal-title":"IEEE Trans Mob Comput"},{"issue":"5","key":"1415_CR29","doi-asserted-by":"publisher","first-page":"3123","DOI":"10.1109\/TCYB.2020.3022952","volume":"52","author":"S Ivi\u0107","year":"2020","unstructured":"Ivi\u0107 S (2020) Motion control for autonomous heterogeneous multiagent area search in uncertain conditions. IEEE Trans Cybern 52(5):3123\u201335","journal-title":"IEEE Trans Cybern"},{"key":"1415_CR30","doi-asserted-by":"crossref","unstructured":"Yoon HJ, Chen H, Long K et\u00a0al (2019) Learning to communicate: A machine learning framework for heterogeneous multi-agent robotic systems. In: AIAA Scitech 2019 Forum, p 1456","DOI":"10.2514\/6.2019-1456"},{"key":"1415_CR31","unstructured":"Zhong Y, Kuba JG, Hu S et\u00a0al (2023) Heterogeneous-agent reinforcement learning. arXiv preprint arXiv:2304.09870"},{"key":"1415_CR32","unstructured":"Kuba JG, Chen R, Wen M et\u00a0al (2021) Trust region policy optimisation in multi-agent reinforcement learning. arXiv preprint arXiv:2109.11251"},{"key":"1415_CR33","doi-asserted-by":"crossref","unstructured":"Bono G, Dibangoye JS, Matignon L et\u00a0al (2018) Cooperative multi-agent policy gradient. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp 459\u2013476","DOI":"10.1007\/978-3-030-10925-7_28"},{"key":"1415_CR34","unstructured":"Bettini M, Shankar A, Prorok A (2023) Heterogeneous multi-robot reinforcement learning. 
In: Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems, pp 1485\u20131494"},{"key":"1415_CR35","unstructured":"Dong H, Wang T, Liu J et\u00a0al (2021) Birds of a feather flock together: A close look at cooperation emergence via multi-agent rl. arXiv preprint arXiv:2104.11455"},{"key":"1415_CR36","unstructured":"Son K, Kim D, Kang WJ et\u00a0al (2019) Qtran: Learning to factorize with transformation for cooperative multi-agent reinforcement learning. In: International conference on machine learning, pp 5887\u20135896"},{"key":"1415_CR37","unstructured":"Foerster J, Assael IA, De\u00a0Freitas N et\u00a0al (2016) Learning to communicate with deep multi-agent reinforcement learning. Advances in neural information processing systems 29"},{"key":"1415_CR38","doi-asserted-by":"publisher","first-page":"82","DOI":"10.1016\/j.neucom.2016.01.031","volume":"190","author":"L Kraemer","year":"2016","unstructured":"Kraemer L, Banerjee B (2016) Multi-agent reinforcement learning as a rehearsal for decentralized planning. Neurocomputing 190:82\u201394","journal-title":"Neurocomputing"},{"key":"1415_CR39","doi-asserted-by":"crossref","unstructured":"Gupta JK, Egorov M, Kochenderfer M (2017) Cooperative multi-agent control using deep reinforcement learning. In: International conference on autonomous agents and multiagent systems, pp 66\u201383","DOI":"10.1007\/978-3-319-71682-4_5"},{"key":"1415_CR40","unstructured":"Sunehag P, Lever G, Gruslys A et\u00a0al (2017) Value-decomposition networks for cooperative multi-agent learning. arXiv preprint arXiv:1706.05296"},{"key":"1415_CR41","first-page":"10199","volume":"33","author":"T Rashid","year":"2020","unstructured":"Rashid T, Farquhar G, Peng B et al (2020) Weighted qmix: expanding monotonic value function factorisation for deep multi-agent reinforcement learning. 
Adv Neural Inf Process Syst 33:10199\u201310210","journal-title":"Adv Neural Inf Process Syst"},{"key":"1415_CR42","unstructured":"Yang Y, Hao J, Liao B et\u00a0al (2020) Qatten: A general framework for cooperative multiagent reinforcement learning. arXiv preprint arXiv:2002.03939"},{"key":"1415_CR43","unstructured":"Wang J, Ren Z, Liu T et\u00a0al (2020) Qplex: Duplex dueling multi-agent q-learning. arXiv preprint arXiv:2008.01062"},{"key":"1415_CR44","doi-asserted-by":"publisher","first-page":"163","DOI":"10.1016\/j.neucom.2022.06.091","volume":"504","author":"W Liang","year":"2022","unstructured":"Liang W, Wang J, Bao W et al (2022) Qauxi: cooperative multi-agent reinforcement learning with knowledge transferred from auxiliary task. Neurocomputing 504:163\u2013173","journal-title":"Neurocomputing"},{"key":"1415_CR45","doi-asserted-by":"publisher","first-page":"9701","DOI":"10.1007\/s10489-021-02873-7","volume":"52","author":"H Ge","year":"2022","unstructured":"Ge H, Ge Z, Sun L et al (2022) Enhancing cooperation by cognition differences and consistent representation in multi-agent reinforcement learning. Appl Intell 52:9701\u20139716","journal-title":"Appl Intell"},{"issue":"8","key":"1415_CR46","doi-asserted-by":"publisher","first-page":"9261","DOI":"10.1007\/s10489-022-03924-3","volume":"53","author":"H Wang","year":"2022","unstructured":"Wang H, Xie X, Zhou L (2022) Transform networks for cooperative multi-agent deep reinforcement learning. Appl Intell 53(8):9261\u20139","journal-title":"Appl Intell"},{"issue":"16","key":"1415_CR47","doi-asserted-by":"publisher","first-page":"19044","DOI":"10.1007\/s10489-022-04426-y","volume":"53","author":"X He","year":"2023","unstructured":"He X, Ge H, Sun L et al (2023) Brgr: multi-agent cooperative reinforcement learning with bidirectional real-time gain representation. 
Appl Intell 53(16):19044\u201359","journal-title":"Appl Intell"},{"key":"1415_CR48","doi-asserted-by":"crossref","unstructured":"Yang Q, Parasuraman R (2021) How can robots trust each other for better cooperation? a relative needs entropy based robot-robot trust assessment model. In: 2021 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp 2656\u20132663","DOI":"10.1109\/SMC52423.2021.9659187"},{"key":"1415_CR49","unstructured":"Hartmann VN, Orthey A, Driess D et\u00a0al (2021) Long-horizon multi-robot rearrangement planning for construction assembly. arXiv preprint arXiv:2106.02489"},{"key":"1415_CR50","doi-asserted-by":"publisher","first-page":"5793","DOI":"10.1007\/s10489-020-02065-9","volume":"51","author":"H Jiang","year":"2021","unstructured":"Jiang H, Shi D, Xue C et al (2021) Multi-agent deep reinforcement learning with type-based hierarchical group communication. Appl Intell 51:5793\u20135808","journal-title":"Appl Intell"},{"key":"1415_CR51","doi-asserted-by":"publisher","first-page":"27","DOI":"10.1016\/j.neucom.2020.07.020","volume":"414","author":"Y Liu","year":"2020","unstructured":"Liu Y, Shen J, He H (2020) Multi-attention deep reinforcement learning and re-ranking for vehicle re-identification. Neurocomputing 414:27\u201335","journal-title":"Neurocomputing"},{"key":"1415_CR52","doi-asserted-by":"publisher","first-page":"285","DOI":"10.1016\/j.neucom.2020.09.007","volume":"421","author":"X Li","year":"2021","unstructured":"Li X, Wang L, Jiang Q et al (2021) Differential evolution algorithm with multi-population cooperation and multi-strategy integration. Neurocomputing 421:285\u2013302","journal-title":"Neurocomputing"},{"key":"1415_CR53","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2021.106937","volume":"220","author":"Z Cheng","year":"2021","unstructured":"Cheng Z, Song H, Wang J et al (2021) Hybrid firefly algorithm with grouping attraction for constrained optimization problem. 
Knowl-Based Syst 220:106937","journal-title":"Knowl-Based Syst"},{"key":"1415_CR54","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2021.107556","volume":"234","author":"Y Li","year":"2021","unstructured":"Li Y, Li J, Zhang M (2021) Deep transformer modeling via grouping skip connection for neural machine translation. Knowl-Based Syst 234:107556","journal-title":"Knowl-Based Syst"},{"key":"1415_CR55","doi-asserted-by":"crossref","unstructured":"Rotman D, Yaroker Y, Amrani E et\u00a0al (2020) Learnable optimal sequential grouping for video scene detection. In: Proceedings of the 28th ACM International Conference on Multimedia, pp 1958\u20131966","DOI":"10.1145\/3394171.3413612"},{"key":"1415_CR56","unstructured":"Ling Z, Yue Z, Xia J et\u00a0al (2022) Fedentropy: efficient device grouping for federated learning using maximum entropy judgment. arXiv preprint arXiv:2205.12038"},{"key":"1415_CR57","doi-asserted-by":"crossref","unstructured":"Hou J, Zhou X, Gan Z et\u00a0al (2022) Enhanced decentralized autonomous aerial swarm with group planning. arXiv preprint arXiv:2203.01069","DOI":"10.1109\/LRA.2022.3191037"},{"key":"1415_CR58","doi-asserted-by":"crossref","unstructured":"Al Faiya B, Athanasiadis D, Chen M et al (2021) A self-organizing multi-agent system for distributed voltage regulation. IEEE Trans Smart Grid 12(5):4102\u20134112","DOI":"10.1109\/TSG.2021.3070783"},{"key":"1415_CR59","unstructured":"Mahajan A, Rashid T, Samvelyan M et\u00a0al (2019) Maven: multi-agent variational exploration. Advances in Neural Information Processing Systems 32"},{"key":"1415_CR60","unstructured":"Wang T, Dong H, Lesser V et\u00a0al (2020) Roma: Multi-agent reinforcement learning with emergent roles. arXiv preprint arXiv:2003.08039"},{"key":"1415_CR61","doi-asserted-by":"crossref","unstructured":"Li P, Tang H, Yang T et\u00a0al (2022) Pmic: improving multi-agent reinforcement learning with progressive mutual information collaboration. 
arXiv preprint arXiv:2203.08553","DOI":"10.1109\/JCC56315.2022.00013"},{"key":"1415_CR62","doi-asserted-by":"crossref","unstructured":"Yuan L, Wang J, Zhang F et\u00a0al (2022) Multi-agent incentive communication via decentralized teammate modeling. Association for the Advancement of Artificial Intelligence","DOI":"10.1609\/aaai.v36i9.21179"},{"key":"1415_CR63","first-page":"3991","volume":"34","author":"C Li","year":"2021","unstructured":"Li C, Wang T, Wu C et al (2021) Celebrating diversity in shared multi-agent reinforcement learning. Adv Neural Inf Process Syst 34:3991\u20134002","journal-title":"Adv Neural Inf Process Syst"},{"key":"1415_CR64","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-28929-8","volume-title":"A concise introduction to decentralized POMDPs","author":"FA Oliehoek","year":"2016","unstructured":"Oliehoek FA, Amato C (2016) A concise introduction to decentralized POMDPs. Springer"},{"key":"1415_CR65","doi-asserted-by":"crossref","unstructured":"Cho K, van Merrienboer B, G\u00fcl\u00e7ehre \u00c7 et\u00a0al (2014) Learning phrase representations using rnn encoder-decoder for statistical machine translation. EMNLP","DOI":"10.3115\/v1\/D14-1179"},{"issue":"1","key":"1415_CR66","first-page":"7234","volume":"21","author":"T Rashid","year":"2020","unstructured":"Rashid T, Samvelyan M, De Witt CS et al (2020) Monotonic value function factorisation for deep multi-agent reinforcement learning. J Mach Learn Res 21(1):7234\u20137284","journal-title":"J Mach Learn Res"},{"key":"1415_CR67","doi-asserted-by":"crossref","unstructured":"Tan M (1993) Multi-agent reinforcement learning: Independent vs. cooperative agents. In: Proceedings of the tenth international conference on machine learning, pp 330\u2013337","DOI":"10.1016\/B978-1-55860-307-3.50049-6"},{"key":"1415_CR68","unstructured":"Guestrin C, Koller D, Parr R (2001) Multiagent planning with factored mdps. 
Advances in neural information processing systems 14"},{"key":"1415_CR69","unstructured":"Foerster J, Nardelli N, Farquhar G et\u00a0al (2017) Stabilising experience replay for deep multi-agent reinforcement learning. In: International conference on machine learning, pp 1146\u20131155"},{"key":"1415_CR70","unstructured":"Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980"},{"key":"1415_CR71","unstructured":"Wang T, Gupta T, Mahajan A et\u00a0al (2020) Rode: learning roles to decompose multi-agent tasks. arXiv preprint arXiv:2010.01523"},{"key":"1415_CR72","doi-asserted-by":"crossref","unstructured":"Foerster J, Farquhar G, Afouras T et\u00a0al (2018) Counterfactual multi-agent policy gradients. In: Proceedings of the AAAI conference on artificial intelligence, vol. 32, no. 1","DOI":"10.1609\/aaai.v32i1.11794"},{"issue":"1","key":"1415_CR73","doi-asserted-by":"publisher","first-page":"3","DOI":"10.1016\/j.swevo.2011.02.002","volume":"1","author":"J Derrac","year":"2011","unstructured":"Derrac J, Garc\u00eda S, Molina D et al (2011) A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms. 
Swarm Evol Comput 1(1):3\u201318","journal-title":"Swarm Evol Comput"}],"container-title":["Complex &amp; Intelligent Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-024-01415-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s40747-024-01415-1\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-024-01415-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,7,17]],"date-time":"2024-07-17T17:22:12Z","timestamp":1721236932000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s40747-024-01415-1"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,4,23]]},"references-count":73,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2024,8]]}},"alternative-id":["1415"],"URL":"https:\/\/doi.org\/10.1007\/s40747-024-01415-1","relation":{},"ISSN":["2199-4536","2198-6053"],"issn-type":[{"value":"2199-4536","type":"print"},{"value":"2198-6053","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,4,23]]},"assertion":[{"value":"10 July 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"9 March 2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"23 April 2024","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors have no conflicts of interest to declare that are relevant to the content of this 
article.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}},{"value":"This article does not involve any ethical problem that needs approval.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethical approval"}},{"value":"All authors have seen and approved the final version of the manuscript being submitted.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent to participate"}},{"value":"All authors warrant that the article is our original work, has not received prior publication, and is not under consideration for publication elsewhere. A preprint version of our manuscript has been submitted to arXiv, and the page is . The journal version improves the overall structure of the article and enhances it with more definitions, demonstrations, and experiments.","order":5,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}}]}}