{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,4]],"date-time":"2026-05-04T16:05:43Z","timestamp":1777910743748,"version":"3.51.4"},"reference-count":37,"publisher":"SAGE Publications","issue":"12","license":[{"start":{"date-parts":[[2022,3,29]],"date-time":"2022-03-29T00:00:00Z","timestamp":1648512000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"funder":[{"DOI":"10.13039\/501100004608","name":"Natural Science Foundation of Jiangsu Province","doi-asserted-by":"publisher","award":["BK2011124"],"award-info":[{"award-number":["BK2011124"]}],"id":[{"id":"10.13039\/501100004608","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61203192"],"award-info":[{"award-number":["61203192"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["Transactions of the Institute of Measurement and Control"],"published-print":{"date-parts":[[2022,8]]},"abstract":"<jats:p>Although multi-agent deep deterministic policy gradient is a classic deep reinforcement learning algorithm for multi-agent systems, it has critical problems such as poor training stability and low policy robustness, which significantly limit the capability and application of the algorithm. This article therefore proposes an improved algorithm, called friend-or-foe multi-agent deep deterministic policy gradient, to solve these problems. 
The main innovations are as follows: (1) inspired by the concept of friend-or-foe game theory, we modify the framework of the original multi-agent deep deterministic policy gradient by using two identical training networks that take the agents\u2019 optimal and worst actions as input, which improves the robustness of the trained policies, and (2) we propose an action perturbation technique based on gradient descent to expand the selection range of actions, thereby improving the training stability of the proposed algorithm. Finally, we conducted multiple sets of comparative experiments between our friend-or-foe multi-agent deep deterministic policy gradient and the original algorithm in four authoritative mixed cooperative\u2013competitive scenarios. The results show that our improved algorithm simultaneously improves training stability and the robustness of the agents\u2019 generated policies in different complex environments.<\/jats:p>","DOI":"10.1177\/01423312221077755","type":"journal-article","created":{"date-parts":[[2022,3,29]],"date-time":"2022-03-29T08:38:52Z","timestamp":1648543132000},"page":"2378-2395","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":3,"title":["A Friend-or-Foe framework for multi-agent reinforcement learning policy generation in mixing cooperative\u2013competitive scenarios"],"prefix":"10.1177","volume":"44","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-8218-3909","authenticated-orcid":false,"given":"Yu","family":"Sun","sequence":"first","affiliation":[{"name":"Naval Command College, China"},{"name":"The PLA Unit, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jun","family":"Lai","sequence":"additional","affiliation":[{"name":"Command & Control Engineering College, Army Engineering University of PLA, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Lei","family":"Cao","sequence":"additional","affiliation":[{"name":"Command & Control 
Engineering College, Army Engineering University of PLA, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xiliang","family":"Chen","sequence":"additional","affiliation":[{"name":"Command & Control Engineering College, Army Engineering University of PLA, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zhixiong","family":"Xu","sequence":"additional","affiliation":[{"name":"Army Academy of Border and Coastal Defense, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zhen","family":"Lian","sequence":"additional","affiliation":[{"name":"Naval Command College, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Huijin","family":"Fan","sequence":"additional","affiliation":[{"name":"Naval Command College, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"179","published-online":{"date-parts":[[2022,3,29]]},"reference":[{"key":"bibr1-01423312221077755","author":"Ackermann J","year":"2019","journal-title":"arXiv preprint arXiv:1910.01465"},{"key":"bibr2-01423312221077755","doi-asserted-by":"publisher","DOI":"10.1016\/S0968-090X(02)00030-X"},{"key":"bibr3-01423312221077755","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-14435-6_7"},{"key":"bibr4-01423312221077755","unstructured":"Calvo J, Dusparic I (2018) Heterogeneous multi-agent deep reinforcement learning for traffic lights control. In: AICS, pp. 2\u201313. 
Available at: http:\/\/ceur-ws.org\/Vol-2259\/aics_2.pdf"},{"key":"bibr5-01423312221077755","doi-asserted-by":"publisher","DOI":"10.1145\/1143844.1143865"},{"key":"bibr6-01423312221077755","author":"Chu X","year":"2017","journal-title":"arXiv preprint arXiv:1710.00336"},{"key":"bibr7-01423312221077755","doi-asserted-by":"publisher","DOI":"10.1109\/TWC.2019.2935201"},{"key":"bibr8-01423312221077755","author":"Foerster J","year":"2017","journal-title":"arXiv preprint arXiv:1709.04326"},{"key":"bibr9-01423312221077755","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v32i1.11794"},{"key":"bibr10-01423312221077755","author":"Foerster N","year":"2016","journal-title":"arXiv preprint arXiv:1605.06676"},{"key":"bibr11-01423312221077755","first-page":"242","volume-title":"ICML","volume":"3","author":"Greenwald A","year":"2003"},{"key":"bibr12-01423312221077755","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v32i1.11694"},{"key":"bibr13-01423312221077755","doi-asserted-by":"publisher","DOI":"10.1007\/s10458-019-09421-1"},{"key":"bibr14-01423312221077755","first-page":"1039","volume":"4","author":"Hu J","year":"2003","journal-title":"Journal of Machine Learning Research"},{"issue":"2","key":"bibr15-01423312221077755","first-page":"105","volume":"2","author":"K\u00f6n\u00f6nen V","year":"2004","journal-title":"Web Intelligence and Agent Systems: An International Journal"},{"key":"bibr16-01423312221077755","author":"Leibo J","year":"2017","journal-title":"arXiv preprint arXiv:1702.03037"},{"key":"bibr17-01423312221077755","unstructured":"Littman M (2001) Friend-or-foe Q-learning in general-sum games. In: ICML, vol. 1, pp. 322\u2013328. 
Available at: https:\/\/citeseerx.ist.psu.edu\/viewdoc\/download?doi=10.1.1.589.8571&rep=rep1&type=pdf"},{"key":"bibr18-01423312221077755","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2016.12.038"},{"key":"bibr19-01423312221077755","author":"Lowe R","year":"2017","journal-title":"arXiv preprint arXiv:1706.02275"},{"key":"bibr20-01423312221077755","doi-asserted-by":"publisher","DOI":"10.1145\/1160633.1160772"},{"key":"bibr21-01423312221077755","author":"Mao H","year":"2018","journal-title":"arXiv preprint arXiv:1811.07029"},{"key":"bibr22-01423312221077755","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4757-6451-2_4"},{"key":"bibr23-01423312221077755","doi-asserted-by":"publisher","DOI":"10.1038\/nature14236"},{"key":"bibr24-01423312221077755","doi-asserted-by":"publisher","DOI":"10.1109\/JSAC.2019.2933973"},{"key":"bibr25-01423312221077755","doi-asserted-by":"publisher","DOI":"10.1109\/TCYB.2020.2977374"},{"key":"bibr26-01423312221077755","doi-asserted-by":"publisher","DOI":"10.1613\/jair.2447"},{"key":"bibr27-01423312221077755","author":"Peng P","year":"2017","journal-title":"arXiv preprint arXiv:1703.10069"},{"key":"bibr28-01423312221077755","doi-asserted-by":"publisher","DOI":"10.1109\/PSCE.2009.4840087"},{"key":"bibr29-01423312221077755","unstructured":"Rashid T, Samvelyan M, Schroeder C, et al. (2018) QMIX: Monotonic value function factorisation for deep multi-agent reinforcement learning. In: International conference on machine learning, pp. 4295\u20134304. PMLR. 
Available at: http:\/\/proceedings.mlr.press\/v80\/rashid18a\/rashid18a.pdf"},{"key":"bibr30-01423312221077755","author":"Ruder S","year":"2016","journal-title":"arXiv preprint arXiv:1609.04747"},{"key":"bibr31-01423312221077755","author":"Ryu H","year":"2018","journal-title":"arXiv preprint arXiv:1810.09206"},{"key":"bibr32-01423312221077755","first-page":"2244","volume":"29","author":"Sukhbaatar S","year":"2016","journal-title":"Advances in Neural Information Processing Systems"},{"key":"bibr33-01423312221077755","author":"Sunehag P","year":"2017","journal-title":"arXiv preprint arXiv:1706.05296"},{"key":"bibr34-01423312221077755","author":"Wang R","year":"2020","journal-title":"arXiv preprint arXiv:2002.06684"},{"key":"bibr35-01423312221077755","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2006.281729"},{"key":"bibr36-01423312221077755","first-page":"506","volume-title":"Proceedings of the third international joint conference on autonomous agents and multiagent systems","volume":"2","author":"Weinberg M","year":"2004"},{"key":"bibr37-01423312221077755","unstructured":"Yang Y, Luo R, Li M, et al. (2018) Mean field multi-agent reinforcement learning. In: International conference on machine learning, pp. 5571\u20135580. PMLR. 
Available at: https:\/\/proceedings.mlr.press\/v80\/yang18d\/yang18d.pdf"}],"container-title":["Transactions of the Institute of Measurement and Control"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/01423312221077755","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/01423312221077755","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/01423312221077755","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,5,1]],"date-time":"2026-05-01T15:05:21Z","timestamp":1777647921000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/01423312221077755"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,3,29]]},"references-count":37,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2022,8]]}},"alternative-id":["10.1177\/01423312221077755"],"URL":"https:\/\/doi.org\/10.1177\/01423312221077755","relation":{},"ISSN":["0142-3312","1477-0369"],"issn-type":[{"value":"0142-3312","type":"print"},{"value":"1477-0369","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,3,29]]}}}