{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,4]],"date-time":"2026-03-04T17:15:19Z","timestamp":1772644519199,"version":"3.50.1"},"reference-count":40,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2024,12,20]],"date-time":"2024-12-20T00:00:00Z","timestamp":1734652800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Robot. AI"],"abstract":"<jats:p>This work considers the problem of learning cooperative policies in multi-agent settings with partially observable and non-stationary environments without a communication channel. We focus on improving information sharing between agents and propose a new multi-agent actor-critic method called <jats:italic>Multi-Agent Cooperative Recurrent Proximal Policy Optimization<\/jats:italic> (MACRPO). We propose two novel ways of integrating information across agents and time in MACRPO: First, we use a recurrent layer in the critic\u2019s network architecture and propose a new framework to use the proposed meta-trajectory to train the recurrent layer. This allows the network to learn the cooperation and dynamics of interactions between agents, and also handle partial observability. Second, we propose a new advantage function that incorporates other agents\u2019 rewards and value functions by controlling the level of cooperation between agents using a parameter. The use of this control parameter is suitable for environments in which the agents are unable to fully cooperate with each other. We evaluate our algorithm on three challenging multi-agent environments with continuous and discrete action spaces, Deepdrive-Zero, Multi-Walker, and Particle environment. We compare the results with several ablations and state-of-the-art multi-agent algorithms such as MAGIC, IC3Net, CommNet, GA-Comm, QMIX, MADDPG, and RMAPPO, and also single-agent methods with shared parameters between agents such as IMPALA and APEX. The results show superior performance against other algorithms. The code is available online at <jats:ext-link>https:\/\/github.com\/kargarisaac\/macrpo<\/jats:ext-link>.<\/jats:p>","DOI":"10.3389\/frobt.2024.1394209","type":"journal-article","created":{"date-parts":[[2024,12,20]],"date-time":"2024-12-20T08:53:16Z","timestamp":1734684796000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["MACRPO: Multi-agent cooperative recurrent policy optimization"],"prefix":"10.3389","volume":"11","author":[{"given":"Eshagh","family":"Kargar","sequence":"first","affiliation":[]},{"given":"Ville","family":"Kyrki","sequence":"additional","affiliation":[]}],"member":"1965","published-online":{"date-parts":[[2024,12,20]]},"reference":[{"key":"B1","volume-title":"Emergent complexity via multi-agent competition","author":"Bansal","year":"2017"},{"key":"B2","volume-title":"Dota 2 with large scale deep reinforcement learning","author":"Berner","year":"2019"},{"key":"B3","doi-asserted-by":"publisher","first-page":"645","DOI":"10.1613\/jair.1.11396","article-title":"A survey on transfer learning for multiagent reinforcement learning systems","volume":"64","author":"Da Silva","year":"2019","journal-title":"J. Artif. Intell. Res."},{"key":"B4","first-page":"1","article-title":"Carla: an open urban driving simulator","author":"Dosovitskiy","year":"2017"},{"key":"B5","doi-asserted-by":"publisher","first-page":"591","DOI":"10.1613\/jair.2502","article-title":"A multiagent approach to autonomous intersection management","volume":"31","author":"Dresner","year":"2008","journal-title":"J. Artif. Intell. Res."},{"key":"B6","doi-asserted-by":"publisher","first-page":"2505","DOI":"10.24963\/ijcai.2020\/347","article-title":"Balancing individual preferences and shared objectives in multiagent reinforcement learning","author":"Durugkar","year":"2020","journal-title":"Good Systems-Published Res."},{"key":"B7","doi-asserted-by":"publisher","first-page":"407","DOI":"10.1613\/jair.1735","article-title":"Cooperative information sharing to improve distributed learning in multi-agent systems","volume":"24","author":"Dutta","year":"2005","journal-title":"J. Artif. Intell. Res."},{"key":"B8","first-page":"2137","article-title":"Learning to communicate with deep multi-agent reinforcement learning","volume-title":"Advances in neural information processing systems","author":"Foerster","year":"2016"},{"key":"B9","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v32i1.11794","article-title":"Counterfactual multi-agent policy gradients","volume":"32","author":"Foerster","year":"2018","journal-title":"Proc. AAAI Conf. Artif. Intell."},{"key":"B10","doi-asserted-by":"publisher","first-page":"49","DOI":"10.1613\/jair.1579","article-title":"A framework for sequential planning in multi-agent settings","volume":"24","author":"Gmytrasiewicz","year":"2005","journal-title":"J. Artif. Intell. Res."},{"key":"B11","first-page":"66","article-title":"Cooperative multi-agent control using deep reinforcement learning","author":"Gupta","year":"2017"},{"key":"B12","doi-asserted-by":"publisher","first-page":"709","DOI":"10.5555\/1597148.1597262","article-title":"Dynamic programming for partially observable stochastic games","volume":"4","author":"Hansen","year":"2004","journal-title":"AAAI"},{"key":"B13","doi-asserted-by":"publisher","first-page":"750","DOI":"10.1007\/s10458-019-09421-1","article-title":"A survey and critique of multiagent deep reinforcement learning","volume":"33","author":"Hernandez-Leal","year":"2019","journal-title":"Aut. Agents Multi-Agent Syst."},{"key":"B14","doi-asserted-by":"publisher","first-page":"237","DOI":"10.1613\/jair.301","article-title":"Reinforcement learning: a survey","volume":"4","author":"Kaelbling","year":"1996","journal-title":"J. Artif. Intell. Res."},{"key":"B15","doi-asserted-by":"publisher","first-page":"571","DOI":"10.1613\/jair.3213","article-title":"Multiagent learning in large anonymous games","volume":"40","author":"Kash","year":"2011","journal-title":"J. Artif. Intell. Res."},{"key":"B16","doi-asserted-by":"publisher","first-page":"82","DOI":"10.1016\/j.neucom.2016.01.031","article-title":"Multi-agent reinforcement learning as a rehearsal for decentralized planning","volume":"190","author":"Kraemer","year":"2016","journal-title":"Neurocomputing"},{"key":"B17","doi-asserted-by":"publisher","first-page":"1421","DOI":"10.1613\/jair.1.12412","article-title":"Deep reinforcement learning: a state-of-the-art walkthrough","volume":"69","author":"Lazaridis","year":"2020","journal-title":"J. Artif. Intell. Res."},{"key":"B18","doi-asserted-by":"crossref","first-page":"157","DOI":"10.1016\/B978-1-55860-335-6.50027-1","article-title":"Markov games as a framework for multi-agent reinforcement learning","volume-title":"Machine learning proceedings 1994","author":"Littman","year":"1994"},{"key":"B19","first-page":"7211","article-title":"Multi-agent game abstraction via graph attention neural network","author":"Liu","year":"2020"},{"key":"B20","first-page":"6379","article-title":"Multi-agent actor-critic for mixed cooperative-competitive environments","volume-title":"Advances in neural information processing systems","author":"Lowe","year":"2017"},{"key":"B21","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1017\/s0269888912000057","article-title":"Independent reinforcement learners in cooperative markov games: a survey regarding coordination problems","volume":"27","author":"Matignon","year":"2012","journal-title":"Knowl. Eng. Rev."},{"key":"B22","doi-asserted-by":"crossref","DOI":"10.1609\/aaai.v32i1.11492","article-title":"Emergence of grounded compositional language in multi-agent populations","volume-title":"Thirty-second AAAI conference on artificial intelligence","author":"Mordatch","year":"2018"},{"key":"B23","doi-asserted-by":"publisher","first-page":"3826","DOI":"10.1109\/tcyb.2020.2977374","article-title":"Deep reinforcement learning for multiagent systems: a review of challenges, solutions, and applications","volume":"50","author":"Nguyen","year":"2020","journal-title":"IEEE Trans. Cybern."},{"key":"B24","first-page":"964","article-title":"Multi-agent graph-attention communication and teaming","author":"Niu","year":"2021"},{"key":"B25","volume-title":"Deepdrive zero","author":"Quiter","year":"2020"},{"key":"B26","volume-title":"Qmix: monotonic value function factorisation for deep multi-agent reinforcement learning","author":"Rashid","year":"2018"},{"key":"B27","doi-asserted-by":"publisher","first-page":"1517","DOI":"10.1613\/jair.1.12531","article-title":"Madras: multi agent driving simulator","volume":"70","author":"Santara","year":"2021","journal-title":"J. Artif. Intell. Res."},{"key":"B28","volume-title":"Proximal policy optimization algorithms","author":"Schulman","year":"2017"},{"key":"B29","volume-title":"Safe, multi-agent, reinforcement learning for autonomous driving","author":"Shalev-Shwartz","year":"2016"},{"key":"B30","article-title":"Learning when to communicate at scale in multiagent cooperative and competitive tasks","volume-title":"7th international Conference on learning representations, ICLR 2019","author":"Singh","year":"2019"},{"key":"B31","first-page":"2244","article-title":"Learning multiagent communication with backpropagation","volume-title":"Advances in neural information processing systems 29: annual conference on neural information processing systems 2016, december 5-10, 2016","author":"Sukhbaatar","year":"2016"},{"key":"B32","first-page":"2085","article-title":"Value-decomposition networks for cooperative multi-agent learning based on team reward","author":"Sunehag","year":"2018","journal-title":"Aamas"},{"key":"B33","first-page":"330","article-title":"Multi-agent reinforcement learning: independent vs. cooperative agents","author":"Tan","year":"1993"},{"key":"B34","volume-title":"Parameter sharing is surprisingly useful for multi-agent deep reinforcement learning","author":"Terry","year":"2020"},{"key":"B35","doi-asserted-by":"publisher","first-page":"350","DOI":"10.1038\/s41586-019-1724-z","article-title":"Grandmaster level in starcraft ii using multi-agent reinforcement learning","volume":"575","author":"Vinyals","year":"2019","journal-title":"Nature"},{"key":"B36","volume-title":"R-maddpg for partially observable environments and limited communication","author":"Wang","year":"2020"},{"key":"B37","doi-asserted-by":"crossref","first-page":"155","DOI":"10.1109\/MRS50823.2021.9620607","article-title":"Local advantage actor-critic for robust multi-agent deep reinforcement learning","volume-title":"2021 international symposium on multi-robot and multi-agent systems (MRS)","author":"Xiao","year":"2021"},{"key":"B38","doi-asserted-by":"publisher","first-page":"431","DOI":"10.1016\/j.eswa.2005.04.039","article-title":"Multi-agent framework for third party logistics in e-commerce","volume":"29","author":"Ying","year":"2005","journal-title":"Expert Syst. Appl."},{"key":"B39","article-title":"The surprising effectiveness of PPO in cooperative multi-agent games","author":"Yu","year":"2022"},{"key":"B40","doi-asserted-by":"publisher","first-page":"4546","DOI":"10.3390\/s20164546","article-title":"Research on the multiagent joint proximal policy optimization algorithm controlling cooperative fixed-wing uav obstacle avoidance","volume":"20","author":"Zhao","year":"2020","journal-title":"Sensors"}],"container-title":["Frontiers in Robotics and AI"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2024.1394209\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,12,20]],"date-time":"2024-12-20T08:53:24Z","timestamp":1734684804000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2024.1394209\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,12,20]]},"references-count":40,"alternative-id":["10.3389\/frobt.2024.1394209"],"URL":"https:\/\/doi.org\/10.3389\/frobt.2024.1394209","relation":{},"ISSN":["2296-9144"],"issn-type":[{"value":"2296-9144","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,12,20]]},"article-number":"1394209"}}