{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,19]],"date-time":"2025-11-19T06:26:09Z","timestamp":1763533569860,"version":"3.45.0"},"reference-count":36,"publisher":"Oxford University Press (OUP)","issue":"11","license":[{"start":{"date-parts":[[2025,6,24]],"date-time":"2025-06-24T00:00:00Z","timestamp":1750723200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/pages\/standard-publication-reuse-rights"}],"funder":[{"name":"National Defense Key Laboratory of Science and Technology Foundation of China","award":["WDZC20225250403"],"award-info":[{"award-number":["WDZC20225250403"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,11,13]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Multi-agent credit assignment is a research hotspot in the field of cooperative multi-agent reinforcement learning, and its key is how to accurately measure the individual contribution of each agent in the system to promote multi-agent cooperation. Existing solutions mainly use value function factorization or intrinsic reward mechanism, each of which has its own limitations, and both of them utilize global state information, which is not consistent with the information conditions in the actual confrontation. Therefore, this paper proposes a novel value factorization method for multi-agent deep reinforcement learning, which can solve the problem of credit assignment without using global state information. Our method establishes an explicit individual contribution evaluation mechanism for each agent, which portrays the role of each agent in the system by comparing the differences of joint value functions under different information conditions, so that more important agents get more attention, so as to improve the cooperative ability of agents. 
Experimental results show that our method outperforms all baselines in terms of learning efficiency and stability in multiple scenarios of StarCraft II, and its performance is comparable to that of the method based on global state information in easy scenarios.<\/jats:p>","DOI":"10.1093\/comjnl\/bxaf063","type":"journal-article","created":{"date-parts":[[2025,6,5]],"date-time":"2025-06-05T07:51:57Z","timestamp":1749109917000},"page":"1628-1640","source":"Crossref","is-referenced-by-count":0,"title":["Improving value factorization for multi-agent deep reinforcement learning via individual contribution"],"prefix":"10.1093","volume":"68","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-5071-8844","authenticated-orcid":false,"given":"Liqin","family":"Xiong","sequence":"first","affiliation":[{"name":"Command and Control Engineering Institute, Army Engineering University of PLA , Nanjing, 210007 ,","place":["China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1137-6778","authenticated-orcid":false,"given":"Lei","family":"Cao","sequence":"additional","affiliation":[{"name":"Command and Control Engineering Institute, Army Engineering University of PLA , Nanjing, 210007 ,","place":["China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5198-0932","authenticated-orcid":false,"given":"Xiliang","family":"Chen","sequence":"additional","affiliation":[{"name":"Command and Control Engineering Institute, Army Engineering University of PLA , Nanjing, 210007 ,","place":["China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3608-4985","authenticated-orcid":false,"given":"Jun","family":"Lai","sequence":"additional","affiliation":[{"name":"Command and Control Engineering Institute, Army Engineering University of PLA , Nanjing, 210007 
,","place":["China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4143-4521","authenticated-orcid":false,"given":"Xijian","family":"Luo","sequence":"additional","affiliation":[{"name":"Command and Control Engineering Institute, Army Engineering University of PLA , Nanjing, 210007 ,","place":["China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xiaoyan","family":"Wang","sequence":"additional","affiliation":[{"name":"Command and Control Engineering Institute, Army Engineering University of PLA , Nanjing, 210007 ,","place":["China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Legui","family":"Zhang","sequence":"additional","affiliation":[{"name":"Command and Control Engineering Institute, Army Engineering University of PLA , Nanjing, 210007 ,","place":["China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Haoyang","family":"Dong","sequence":"additional","affiliation":[{"name":"Command and Control Engineering Institute, Army Engineering University of PLA , Nanjing, 210007 ,","place":["China"]}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2025,6,24]]},"reference":[{"key":"2025111901220939900_ref1","doi-asserted-by":"crossref","first-page":"5649","DOI":"10.1007\/s00521-021-06702-3","article-title":"Tactical UAV path optimization under radar threat using deep reinforcement learning","volume":"34","author":"Alpdemir","year":"2022","journal-title":"Neural Comput Applic"},{"key":"2025111901220939900_ref2","first-page":"9980","article-title":"MAPDP: cooperative multi-agent reinforcement learning to solve pickup and delivery problems","volume-title":"Proceedings of the 36th AAAI Conference on Artificial Intelligence","author":"Zong","year":"2022"},{"key":"2025111901220939900_ref3","first-page":"149","article-title":"Study on intelligent recommendation method of dueling network reinforcement learning based on 
regret exploration","volume":"49","author":"Hong","year":"2022","journal-title":"Comput Sci"},{"key":"2025111901220939900_ref4","doi-asserted-by":"crossref","first-page":"2317","DOI":"10.1093\/comjnl\/bxae008","article-title":"An intelligent security system using enhanced anomaly-based detection scheme","volume":"67","author":"Louati","year":"2024","journal-title":"Comput J"},{"key":"2025111901220939900_ref5","doi-asserted-by":"crossref","first-page":"4816","DOI":"10.1109\/TAC.2022.3159453","article-title":"A generalized minimax Q-learning algorithm for two-player zero-sum stochastic games","volume":"67","author":"Diddigi","year":"2022","journal-title":"IEEE Trans Autom Control"},{"key":"2025111901220939900_ref6","doi-asserted-by":"crossref","first-page":"1573","DOI":"10.1093\/comjnl\/bxac027","article-title":"Improvement of MADRL equilibrium based on pareto optimization","volume":"66","author":"Zhao","year":"2022","journal-title":"Comput J"},{"key":"2025111901220939900_ref7","first-page":"199","article-title":"Gplight: grouped multi-agent reinforcement learning for large-scale traffic signal control","volume-title":"Proceedings of the 32nd International Joint Conference on Artificial Intelligence (IJCAI)","author":"Liu","year":"2023"},{"key":"2025111901220939900_ref8","doi-asserted-by":"crossref","first-page":"24","DOI":"10.1109\/MWC.003.2100690","article-title":"Decentralized computation offloading with cooperative UAVs: multi-agent deep reinforcement learning perspective","volume":"29","author":"Hwang","year":"2022","journal-title":"IEEE Wirel Commun"},{"key":"2025111901220939900_ref9","doi-asserted-by":"crossref","first-page":"28","DOI":"10.1109\/MCOM.001.2000634","article-title":"Artificial intelligence empowered power allocation for smart railway","volume":"59","author":"Xu","year":"2021","journal-title":"IEEE Commun 
Mag"},{"key":"2025111901220939900_ref10","doi-asserted-by":"crossref","first-page":"7227","DOI":"10.1007\/s00521-021-06855-1","article-title":"Distributed localization for IoT with multi-agent reinforcement learning","volume":"34","author":"Jia","year":"2022","journal-title":"Neural Comput Applic"},{"key":"2025111901220939900_ref11","first-page":"1","article-title":"Generalize learned heuristics to solve large-scale vehicle routing problems in real-time","volume-title":"Proceedings of the 11th International Conference on Learning Representations (ICLR)","author":"Hou","year":"2023"},{"key":"2025111901220939900_ref12","first-page":"1741","article-title":"Learning transferable cooperative behavior in multi-agent team","volume-title":"Proceedings of the 19th International Conference on Autonomous Agents and Multiagent Systems (AAMAS)","author":"Agarwal","year":"2020"},{"key":"2025111901220939900_ref13","first-page":"13","article-title":"Overview of multi-agent deep reinforcement learning","volume":"56","author":"Sun","year":"2020","journal-title":"Comput Eng Appl"},{"key":"2025111901220939900_ref14","doi-asserted-by":"crossref","first-page":"895","DOI":"10.1007\/s10462-021-09996-w","article-title":"Multi-agent deep reinforcement learning: a survey","volume":"55","author":"Gronauer","year":"2022","journal-title":"Artif Intell Rev"},{"key":"2025111901220939900_ref15","first-page":"2085","article-title":"Value-decomposition networks for cooperative multi-agent learning based on team reward","volume-title":"Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS)","author":"Sunehag","year":"2018"},{"key":"2025111901220939900_ref16","first-page":"1","article-title":"Monotonic value function factorisation for deep multi-agent reinforcement learning","volume":"21","author":"Rashid","year":"2020","journal-title":"J Mach Learn Res"},{"key":"2025111901220939900_ref17","first-page":"1","article-title":"Qatten: a general framework for 
cooperative multiagent reinforcement learning.","author":"Yang","year":"2020"},{"key":"2025111901220939900_ref18","first-page":"4403","article-title":"LIIR: learning individual intrinsic reward in multi-agent reinforcement learning","volume-title":"Proceedings of the 33rd International Conference on Neural Information Processing Systems (NIPS)","author":"Du","year":"2019"},{"key":"2025111901220939900_ref19","doi-asserted-by":"crossref","first-page":"17298814211044946","DOI":"10.1177\/17298814211044946","article-title":"Generating individual intrinsic reward for cooperative multiagent reinforcement learning","volume":"18","author":"Wu","year":"2021","journal-title":"Int J Adv Robot Syst"},{"key":"2025111901220939900_ref20","doi-asserted-by":"crossref","first-page":"270","DOI":"10.1109\/TG.2023.3263013","article-title":"Attention-based intrinsic reward mixing network for credit assignment in multiagent reinforcement learning","volume":"16","author":"Li","year":"2024","journal-title":"IEEE Trans Games"},{"key":"2025111901220939900_ref21","first-page":"9278","article-title":"Locality matters: a scalable value decomposition approach for cooperative multi-agent reinforcement learning","volume-title":"Proceedings of the 36th AAAI Conference on Artificial Intelligence","author":"Zohar","year":"2022"},{"key":"2025111901220939900_ref22","doi-asserted-by":"crossref","first-page":"111422","DOI":"10.1016\/j.knosys.2024.111422","article-title":"DVF: multi-agent Q-learning with difference value factorization","volume":"286","author":"Huang","year":"2024","journal-title":"Knowl-Based Syst"},{"key":"2025111901220939900_ref23","first-page":"11853","article-title":"Learning implicit credit assignment for cooperative multi-agent reinforcement learning","volume-title":"Proceedings of the 34th International Conference on Neural Information Processing Systems (NIPS)","author":"Zhou","year":"2020"},{"key":"2025111901220939900_ref24","first-page":"10199","article-title":"Weighted qmix: 
expanding monotonic value function factorisation for deep multi-agent reinforcement learning","volume":"33","author":"Rashid","year":"2022","journal-title":"Adv Neural Inform Process Syst"},{"key":"2025111901220939900_ref25","doi-asserted-by":"crossref","first-page":"2782","DOI":"10.1093\/comjnl\/bxac121","article-title":"Character-based value factorization for MADRL","volume":"66","author":"Liqin","year":"2022","journal-title":"Comput J"},{"key":"2025111901220939900_ref26","first-page":"7810","article-title":"AVD-net: Attention value decomposition network for deep multi-agent reinforcement learning","volume-title":"Proceedings of the 25th International Conference on Pattern Recognition (ICPR)","author":"Zhang","year":"2021"},{"key":"2025111901220939900_ref27","first-page":"5887","article-title":"QTRAN: learning to factorize with transformation for cooperative multi-agent reinforcement learning","volume-title":"Proceedings of the 36th International Conference on Machine Learning (ICML)","author":"Son","year":"2019"},{"key":"2025111901220939900_ref28","first-page":"2974","article-title":"Counterfactual multi-agent policy gradients","volume-title":"Proceedings of the 32nd AAAI Conference on Artificial Intelligence","author":"Foerster","year":"2018"},{"key":"2025111901220939900_ref29","first-page":"1","article-title":"Cooperative multi-agent deep reinforcement learning with counterfactual reward","volume-title":"Proceedings of the International Joint Conference on Neural Networks (IJCNN)","author":"Shao","year":"2020"},{"key":"2025111901220939900_ref30","first-page":"8113","article-title":"Credit assignment for collective multiagent RL with global rewards","volume-title":"Proceedings of the 32nd International Conference on Neural Information Processing Systems (NIPS)","author":"Nguyen","year":"2018"},{"key":"2025111901220939900_ref31","doi-asserted-by":"crossref","first-page":"681","DOI":"10.1016\/j.neunet.2023.05.021","article-title":"Credit assignment with 
predictive contribution measurement in multi-agent reinforcement learning","volume":"164","author":"Chen","year":"2023","journal-title":"Neural Netw"},{"key":"2025111901220939900_ref32","first-page":"3926","article-title":"Group-aware coordination graph for multi-agent reinforcement learning","volume-title":"Proceedings of the 33rd International Joint Conference on Artificial Intelligence (IJCAI)","author":"Duan","year":"2024"},{"key":"2025111901220939900_ref33","doi-asserted-by":"crossref","first-page":"289","DOI":"10.1613\/jair.2447","article-title":"Optimal and approximate Q-value functions for decentralized POMDPs","volume":"32","author":"Oliehoek","year":"2008","journal-title":"J Artif Intell Res"},{"key":"2025111901220939900_ref34","doi-asserted-by":"crossref","first-page":"e0172395","DOI":"10.1371\/journal.pone.0172395","article-title":"Multiagent cooperation and competition with deep reinforcement learning","volume":"12","author":"Tampuu","year":"2017","journal-title":"PloS One"},{"key":"2025111901220939900_ref35","doi-asserted-by":"crossref","first-page":"183","DOI":"10.1007\/978-3-642-14435-6_7","article-title":"Multi-agent reinforcement learning: an overview","volume":"310","author":"Bu\u015foniu","year":"2010","journal-title":"Innov Multi-agent System Appl-1"},{"key":"2025111901220939900_ref36","first-page":"29","article-title":"Deep recurrent Q-learning for partially observable MDPs","author":"Hausknecht","year":"2015","journal-title":"Proc AAAI Fall Symp Series"}],"container-title":["The Computer 
Journal"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/comjnl\/article-pdf\/68\/11\/1628\/63565791\/bxaf063.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/comjnl\/article-pdf\/68\/11\/1628\/63565791\/bxaf063.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,11,19]],"date-time":"2025-11-19T06:22:19Z","timestamp":1763533339000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/comjnl\/article\/68\/11\/1628\/8172477"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,24]]},"references-count":36,"journal-issue":{"issue":"11","published-online":{"date-parts":[[2025,6,24]]},"published-print":{"date-parts":[[2025,11,13]]}},"URL":"https:\/\/doi.org\/10.1093\/comjnl\/bxaf063","relation":{},"ISSN":["0010-4620","1460-2067"],"issn-type":[{"type":"print","value":"0010-4620"},{"type":"electronic","value":"1460-2067"}],"subject":[],"published-other":{"date-parts":[[2025,11]]},"published":{"date-parts":[[2025,6,24]]}}}