{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,19]],"date-time":"2026-03-19T19:57:24Z","timestamp":1773950244415,"version":"3.50.1"},"reference-count":40,"publisher":"Springer Science and Business Media LLC","issue":"23","license":[{"start":{"date-parts":[[2025,5,14]],"date-time":"2025-05-14T00:00:00Z","timestamp":1747180800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,5,14]],"date-time":"2025-05-14T00:00:00Z","timestamp":1747180800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Neural Comput &amp; Applic"],"published-print":{"date-parts":[[2025,8]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>In multi-agent deep reinforcement learning (MADRL), agents can learn to communicate to broaden their view and understanding of the environment and their teammates. Previous works on communication in MADRL mainly rely on centralized or independent value functions for learning communication, which cannot differentiate how communicating agents individually contribute to the overall learning process. Moreover, continuous environments that incorporate continuous state\/action spaces have received limited attention in previous research. In this paper, we propose a novel architecture for communicating agents and apply centralized but factorized value functions to differentiate how each agent contributes to learning during communication, along with gradient backpropagation. Additionally, to address the complexity introduced by communication, we investigate the use of an attention mechanism that aggregates messages, enabling policies to maintain a fixed input length. 
We then present a new policy gradient method termed communication with factorized policy gradients (CFPG), featuring full backpropagation from factorized value functions to communicating agents\u2019 architecture. We demonstrate that CFPG can enhance performance and accelerate learning in continuous predator\u2013prey scenarios and multi-agent MuJoCo, when compared to other learning communication methods.<\/jats:p>","DOI":"10.1007\/s00521-025-11272-9","type":"journal-article","created":{"date-parts":[[2025,5,14]],"date-time":"2025-05-14T13:57:09Z","timestamp":1747231029000},"page":"18933-18956","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Communication with factorized policy gradients in multi-agent deep reinforcement learning"],"prefix":"10.1007","volume":"37","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2910-5506","authenticated-orcid":false,"given":"Changxi","family":"Zhu","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mehdi","family":"Dastani","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shihan","family":"Wang","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2025,5,14]]},"reference":[{"key":"11272_CR1","unstructured":"Shalev-Shwartz S, Shammah S, Shashua A (2016) Safe, multi-agent, reinforcement learning for autonomous driving. CoRR arXiv:1610.03295"},{"issue":"3","key":"11272_CR2","doi-asserted-by":"publisher","first-page":"455","DOI":"10.1093\/comjnl\/bxq018","volume":"54","author":"M Vinyals","year":"2011","unstructured":"Vinyals M, Rodr\u00edguez-Aguilar JA, Cerquides J (2011) A survey on sensor networks from a multiagent perspective. Comput. J. 54(3):455\u2013470. https:\/\/doi.org\/10.1093\/comjnl\/bxq018","journal-title":"Comput. 
J."},{"issue":"11","key":"11272_CR3","doi-asserted-by":"publisher","first-page":"1238","DOI":"10.1177\/0278364913495721","volume":"32","author":"J Kober","year":"2013","unstructured":"Kober J, Bagnell JA, Peters J (2013) Reinforcement learning in robotics: A survey. Int J Robot Res 32(11):1238\u20131274. https:\/\/doi.org\/10.1177\/0278364913495721","journal-title":"Int J Robot Res"},{"issue":"7676","key":"11272_CR4","doi-asserted-by":"publisher","first-page":"354","DOI":"10.1038\/nature24270","volume":"550","author":"D Silver","year":"2017","unstructured":"Silver D, Schrittwieser J, Simonyan K, Antonoglou I, Huang A, Guez A, Hubert T, Baker L, Lai M, Bolton A, Chen Y, Lillicrap TP, Hui F, Sifre L, Driessche G, Graepel T, Hassabis D (2017) Mastering the game of go without human knowledge. Nat 550(7676):354\u2013359. https:\/\/doi.org\/10.1038\/nature24270","journal-title":"Nat"},{"issue":"6456","key":"11272_CR5","doi-asserted-by":"publisher","first-page":"885","DOI":"10.1126\/science.aay2400","volume":"365","author":"N Brown","year":"2019","unstructured":"Brown N, Sandholm T (2019) Superhuman ai for multiplayer poker. Science 365(6456):885\u2013890","journal-title":"Science"},{"issue":"NIPS","key":"11272_CR6","first-page":"2137","volume":"29","author":"JN Foerster","year":"2016","unstructured":"Foerster JN, Assael YM, Freitas N, Whiteson S (2016) Learning to communicate with deep multi-agent reinforcement learning. Adv Neural Inf Proc Syst 29(NIPS):2137\u20132145","journal-title":"Adv Neural Inf Proc Syst"},{"key":"11272_CR7","first-page":"2244","volume":"29","author":"S Sukhbaatar","year":"2016","unstructured":"Sukhbaatar S, Szlam A, Fergus R (2016) Learning multiagent communication with backpropagation. 
Adv Neural Inf Proc Syst 29:2244\u20132252","journal-title":"Adv Neural Inf Proc Syst"},{"key":"11272_CR8","first-page":"7265","volume":"31","author":"J Jiang","year":"2018","unstructured":"Jiang J, Lu Z (2018) Learning attentional communication for multi-agent cooperation. Adv Neural Inf Process Syst (NIPS) 31:7265\u20137275","journal-title":"Adv Neural Inf Process Syst (NIPS)"},{"key":"11272_CR9","unstructured":"Das A, Gervet T, Romoff J, Batra D, Parikh D, Rabbat M, Pineau J (2019) Tarmac: Targeted multi-agent communication. In: Proceedings of the 36th International Conference on Machine Learning (ICML), pp. 1538\u20131546"},{"key":"11272_CR10","unstructured":"Zhu C, Dastani M, Wang S (2022) A survey of multi-agent reinforcement learning with communication. CoRR arXiv:2203.08975"},{"key":"11272_CR11","unstructured":"Lowe R, Wu Y, Tamar A, Harb J, Abbeel P, Mordatch I (2017) Multi-agent actor-critic for mixed cooperative-competitive environments. In: Guyon, I., Luxburg, U., Bengio, S., Wallach, H.M., Fergus, R., Vishwanathan, S.V.N., Garnett, R. (eds.) Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017. Long Beach, CA, USA, pp. 6379\u20136390. https:\/\/proceedings.neurips.cc\/paper\/2017\/hash\/68a9750337a418a86fe06c1991a1d64c-Abstract.html"},{"key":"11272_CR12","doi-asserted-by":"crossref","unstructured":"Foerster JN, Farquhar G, Afouras T, Nardelli N, Whiteson S (2018) Counterfactual multi-agent policy gradients. In: McIlraith, S.A., Weinberger, K.Q. (eds.) Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, (AAAI-18), the 30th Innovative Applications of Artificial Intelligence (IAAI-18), and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI-18), New Orleans, Louisiana, USA, February 2-7, 2018, pp. 
2974\u20132982","DOI":"10.1609\/aaai.v32i1.11794"},{"key":"11272_CR13","unstructured":"Rashid T, Samvelyan M, Witt CS, Farquhar G, Foerster JN, Whiteson S (2018) QMIX: monotonic value function factorisation for deep multi-agent reinforcement learning. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsm\u00e4ssan, Stockholm, Sweden, July 10-15, 2018. Proceedings of Machine Learning Research, vol. 80, pp. 4292\u20134301"},{"key":"11272_CR14","unstructured":"Sunehag P, Lever G, Gruslys A, Czarnecki WM, Zambaldi VF, Jaderberg M, Lanctot M, Sonnerat N, Leibo JZ, Tuyls K, Graepel T (2018) Value-decomposition networks for cooperative multi-agent learning based on team reward. In: Andr\u00e9, E., Koenig, S., Dastani, M., Sukthankar, G. (eds.) Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, AAMAS 2018, Stockholm, Sweden, July 10-15, 2018, pp. 2085\u20132087"},{"key":"11272_CR15","unstructured":"Peng B, Rashid T, Witt CS, Kamienny P, Torr PHS, Boehmer W, Whiteson S (2021) FACMAC: factored multi-agent centralised policy gradients. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 12208\u201312221. https:\/\/proceedings.neurips.cc\/paper\/2021\/hash\/65b9eea6e1cc6bb9f0cd2a47751a186f-Abstract.html"},{"key":"11272_CR16","unstructured":"Samvelyan M, Rashid T, Witt CS, Farquhar G, Nardelli N, Rudner TGJ, Hung C, Torr PHS, Foerster JN, Whiteson S (2019) The starcraft multi-agent challenge. In: Elkind, E., Veloso, M., Agmon, N., Taylor, M.E. (eds.) Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, AAMAS \u201919, Montreal, QC, Canada, May 13-17, 2019, pp. 
2186\u20132188"},{"key":"11272_CR17","unstructured":"Niu Y, Paleja RR, Gombolay MC (2021) Multi-agent graph-attention communication and teaming. In: 20th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), pp. 964\u2013973"},{"key":"11272_CR18","unstructured":"Guan C, Chen F, Yuan L, Zhang Z, Yu Y (2023) Efficient communication via self-supervised information aggregation for online and offline multi-agent reinforcement learning. CoRR arXiv:2302.09605"},{"key":"11272_CR19","unstructured":"Zhang SQ, Zhang Q, Lin J (2019) Efficient communication in multi-agent reinforcement learning via variance based control. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d\u2019Alch\u00e9-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32 (NeurIPS), pp. 3230\u20133239. https:\/\/proceedings.neurips.cc\/paper\/2019\/hash\/14cfdb59b5bda1fc245aadae15b1984a-Abstract.html"},{"key":"11272_CR20","unstructured":"Zhang SQ, Zhang Q, Lin J (2020) Succinct and robust multi-agent communication with temporal message control. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M., Lin, H. (eds.) Advances in Neural Information Processing Systems 33 (NIPS). https:\/\/proceedings.neurips.cc\/paper\/2020\/hash\/c82b013313066e0702d58dc70db033ca-Abstract.html"},{"key":"11272_CR21","unstructured":"Wang T, Wang J, Zheng C, Zhang C (2020) Learning nearly decomposable value functions via communication minimization. In: 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020"},{"key":"11272_CR22","doi-asserted-by":"crossref","unstructured":"Yuan L, Wang J, Zhang F, Wang C, Zhang Z, Yu Y, Zhang C (2022) Multi-agent incentive communication via decentralized teammate modeling. 
In: Thirty-Sixth AAAI Conference on Artificial Intelligence, AAAI 2022, Thirty-Fourth Conference on Innovative Applications of Artificial Intelligence, IAAI 2022, The Twelfth Symposium on Educational Advances in Artificial Intelligence, EAAI 2022 Virtual Event, February 22 - March 1, 2022, pp. 9466\u20139474","DOI":"10.1609\/aaai.v36i9.21179"},{"key":"11272_CR23","unstructured":"Singh A, Jain T, Sukhbaatar S (2019) Learning when to communicate at scale in multiagent cooperative and competitive tasks. In: 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. https:\/\/openreview.net\/forum?id=rye7knCqK7"},{"key":"11272_CR24","doi-asserted-by":"crossref","unstructured":"Mao H, Zhang Z, Xiao Z, Gong Z, Ni Y (2020) Learning agent communication under limited bandwidth by message pruning. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, pp. 5142\u20135149","DOI":"10.1609\/aaai.v34i04.5957"},{"key":"11272_CR25","unstructured":"Ding Z, Huang T, Lu Z (2020) Learning individually inferred communication for multi-agent cooperation. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M., Lin, H. (eds.) Advances in Neural Information Processing Systems 33 (NeurIPS). https:\/\/proceedings.neurips.cc\/paper\/2020\/hash\/fb2fcd534b0ff3bbed73cc51df620323-Abstract.html"},{"key":"11272_CR26","doi-asserted-by":"publisher","unstructured":"Guo X, Shi D, Fan W (2023) Scalable communication for multi-agent reinforcement learning via transformer-based email mechanism. In: Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, IJCAI 2023, 19th-25th August 2023, Macao, SAR, China, pp. 126\u2013134. ijcai.org. https:\/\/doi.org\/10.24963\/IJCAI.2023\/15","DOI":"10.24963\/IJCAI.2023\/15"},{"key":"11272_CR27","unstructured":"Hu G, Zhu Y, Zhao D, Zhao M, Hao J (2020) Event-triggered multi-agent reinforcement learning with communication under limited-bandwidth constraint. 
CoRR arXiv:2010.04978"},{"key":"11272_CR28","unstructured":"Wang R, He X, Yu R, Qiu W, An B, Rabinovich Z (2020) Learning efficient multi-agent communication: An information bottleneck approach. In: Proceedings of the 37th International Conference on Machine Learning (ICML). Proceedings of Machine Learning Research, vol. 119, pp. 9908\u20139918"},{"key":"11272_CR29","doi-asserted-by":"crossref","unstructured":"Chen J, Lan T, Joe-Wong C (2024) Rgmcomm: Return gap minimization via discrete communications in multi-agent reinforcement learning. In: Thirty-Eighth AAAI Conference on Artificial Intelligence, AAAI 2024, Thirty-Sixth Conference on Innovative Applications of Artificial Intelligence, IAAI 2024, Fourteenth Symposium on Educational Advances in Artificial Intelligence, EAAI 2024, February 20-27, 2024, Vancouver, Canada, pp. 17327\u201317336","DOI":"10.1609\/aaai.v38i16.29680"},{"key":"11272_CR30","unstructured":"Li Q, Kumar A, Kostrikov I, Levine S (2025) Learning to communicate using a communication critic and counterfactual reasoning. In: Neural Computing and Applications"},{"key":"11272_CR31","unstructured":"Sun C, He P, Wang R, Zheng C (2025) Revisiting communication efficiency in multi-agent reinforcement learning from the dimensional analysis perspective. In: AAMAS \u201925: 24th International Conference on Autonomous Agents and Multiagent Systems"},{"key":"11272_CR32","doi-asserted-by":"crossref","unstructured":"Liu Y, Wang W, Hu Y, Hao J, Chen X, Gao Y (2020) Multi-agent game abstraction via graph attention neural network. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI), pp. 7211\u20137218","DOI":"10.1609\/aaai.v34i05.6211"},{"key":"11272_CR33","unstructured":"Kim D, Moon S, Hostallero D, Kang WJ, Lee T, Son K, Yi Y (2019) Learning to schedule communication in multi-agent reinforcement learning. 
In: 7th International Conference on Learning Representations (ICLR)"},{"key":"11272_CR34","unstructured":"Son K, Kim D, Kang WJ, Hostallero D, Yi Y (2019) QTRAN: learning to factorize with transformation for cooperative multi-agent reinforcement learning. In: Chaudhuri, K., Salakhutdinov, R. (eds.) Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA. Proceedings of Machine Learning Research, vol. 97, pp. 5887\u20135896"},{"key":"11272_CR35","unstructured":"Wang Y, Han B, Wang T, Dong H, Zhang C (2021) DOP: off-policy multi-agent decomposed policy gradients. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021"},{"key":"11272_CR36","doi-asserted-by":"crossref","unstructured":"Oliehoek FA, Amato C (2016) A Concise Introduction to Decentralized POMDPs. Springer Briefs in Intelligent Systems","DOI":"10.1007\/978-3-319-28929-8"},{"key":"11272_CR37","unstructured":"Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I (2017) Attention is all you need. In: Guyon, I., Luxburg, U., Bengio, S., Wallach, H.M., Fergus, R., Vishwanathan, S.V.N., Garnett, R. (eds.) Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA, pp. 5998\u20136008. https:\/\/proceedings.neurips.cc\/paper\/2017\/hash\/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html"},{"key":"11272_CR38","unstructured":"Li Q, Kumar A, Kostrikov I, Levine S (2023) Efficient deep reinforcement learning requires regulating overfitting. 
In: The Eleventh International Conference on Learning Representations (ICLR), Kigali, Rwanda, May 1-5, 2023"},{"issue":"8","key":"11272_CR39","doi-asserted-by":"publisher","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","volume":"9","author":"S Hochreiter","year":"1997","unstructured":"Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput. 9(8):1735\u20131780","journal-title":"Neural Comput."},{"key":"11272_CR40","doi-asserted-by":"crossref","unstructured":"Wang Y, Sartoretti G (2022) Fcmnet: Full communication memory net for team-level cooperation in multi-agent systems. CoRR arXiv:2201.11994","DOI":"10.21203\/rs.3.rs-2563058\/v1"}],"container-title":["Neural Computing and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00521-025-11272-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s00521-025-11272-9\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00521-025-11272-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,6]],"date-time":"2025-09-06T14:47:37Z","timestamp":1757170057000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s00521-025-11272-9"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,5,14]]},"references-count":40,"journal-issue":{"issue":"23","published-print":{"date-parts":[[2025,8]]}},"alternative-id":["11272"],"URL":"https:\/\/doi.org\/10.1007\/s00521-025-11272-9","relation":{},"ISSN":["0941-0643","1433-3058"],"issn-type":[{"value":"0941-0643","type":"print"},{"value":"1433-3058","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,5,14]]},"assertion":[{"value":"16 November 
2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"9 April 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"14 May 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors have no conflict of interest to declare that are relevant to the content of this article.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}