{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,20]],"date-time":"2026-05-20T16:36:41Z","timestamp":1779295001899,"version":"3.51.4"},"reference-count":88,"publisher":"SAGE Publications","issue":"8","license":[{"start":{"date-parts":[[2025,2,6]],"date-time":"2025-02-06T00:00:00Z","timestamp":1738800000000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"},{"start":{"date-parts":[[2025,2,6]],"date-time":"2025-02-06T00:00:00Z","timestamp":1738800000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"funder":[{"DOI":"10.13039\/100000183","name":"Army Research Office","doi-asserted-by":"publisher","award":["W911NF20-1-0265"],"award-info":[{"award-number":["W911NF20-1-0265"]}],"id":[{"id":"10.13039\/100000183","id-type":"DOI","asserted-by":"publisher"}]},{"name":"NSF CAREER","award":["2044993"],"award-info":[{"award-number":["2044993"]}]},{"DOI":"10.13039\/100000006","name":"U.S. Office of Naval Research","doi-asserted-by":"crossref","award":["N00014-19-1-2131"],"award-info":[{"award-number":["N00014-19-1-2131"]}],"id":[{"id":"10.13039\/100000006","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["The International Journal of Robotics Research"],"published-print":{"date-parts":[[2025,7]]},"abstract":"<jats:p>\n                    The state-of-the-art multi-agent reinforcement learning (MARL) methods provide promising solutions to a variety of complex problems. Yet, these methods all assume that agents perform primitive actions in a synchronized manner, making them impractical for long-horizon real-world multi-robot tasks that inherently require robots to asynchronously reason about action selection at varying time durations. 
To solve this problem, we first propose a group of value-based cooperative MARL approaches for asynchronous execution using temporally extended\n                    <jats:italic toggle=\"yes\">macro-actions<\/jats:italic>\n                    . Here, agents perform asynchronous learning and decision-making with macro-action-value functions in three paradigms: decentralized learning and control, centralized learning and control, and centralized training for decentralized execution (CTDE). Building on the above work, we formulate a set of macro-action-based policy gradient algorithms under the three training paradigms, where agents directly optimize their parameterized policies in an asynchronous manner. We evaluate our methods both in simulation and on real robots over a variety of realistic domains. Empirical results demonstrate the effectiveness of our algorithms for learning high-quality and asynchronous solutions with macro-actions in large multi-agent problems that were previously unsolvable via primitive-action-based approaches. 
The proposed approaches represent the first general MARL methods for temporally extended actions and serve as the foundation for future methods in the area.\n                  <\/jats:p>","DOI":"10.1177\/02783649241306124","type":"journal-article","created":{"date-parts":[[2025,2,6]],"date-time":"2025-02-06T20:39:31Z","timestamp":1738874371000},"page":"1257-1286","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":7,"title":["Asynchronous multi-agent deep reinforcement learning under partial observability"],"prefix":"10.1177","volume":"44","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1038-1882","authenticated-orcid":false,"given":"Yuchen","family":"Xiao","sequence":"first","affiliation":[{"name":"Northeastern University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Weihao","family":"Tan","sequence":"additional","affiliation":[{"name":"Northeastern University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Joshua","family":"Hoffman","sequence":"additional","affiliation":[{"name":"Northeastern University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Tian","family":"Xia","sequence":"additional","affiliation":[{"name":"Northeastern University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Christopher","family":"Amato","sequence":"additional","affiliation":[{"name":"Northeastern University"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"179","published-online":{"date-parts":[[2025,2,6]]},"reference":[{"key":"e_1_3_6_2_1","unstructured":"Ahilan S Dayan P (2019) Feudal multi-agent hierarchies for cooperative reinforcement learning. arXiv preprint abs\/1901.08492."},{"key":"e_1_3_6_3_1","unstructured":"Ahn M Brohan A Brown N et al. 
(2022) Do as I can, not as I say: grounding language in robotic affordances."},{"key":"e_1_3_6_4_1","unstructured":"Amato C Konidaris GD Kaelbling LP (2014) Planning with macro-actions in decentralized POMDPs. In: Proceedings of the international conference on autonomous agents and multiagent systems. Paris France 5\u20139 May 2014."},{"key":"e_1_3_6_5_1","doi-asserted-by":"crossref","unstructured":"Amato C Konidaris GD Anders A et al. (2015a) Policy search for multi-robot coordination under uncertainty. In: Proceedings of the robotics: science and systems conference. Rome Italy 13\u201317 July 2015.","DOI":"10.15607\/RSS.2015.XI.007"},{"key":"e_1_3_6_6_1","doi-asserted-by":"crossref","unstructured":"Amato C Konidaris GD Cruz G et al. (2015b) Planning for decentralized control of multiple robots under uncertainty. In: Proceedings of the international conference on robotics and automation Seattle WA USA 26\u201330 May 2015 pp. 1241\u20131248.","DOI":"10.1109\/ICRA.2015.7139350"},{"key":"e_1_3_6_7_1","doi-asserted-by":"publisher","DOI":"10.1613\/jair.1.11418"},{"key":"e_1_3_6_8_1","doi-asserted-by":"crossref","unstructured":"Bacon P Harb J Precup D (2017) The option-critic architecture. In: Proceedings of the AAAI Conference on Artificial Intelligence San Francisco CA USA 4\u20139 Feb 2017 pp. 1726\u20131734.","DOI":"10.1609\/aaai.v31i1.10916"},{"key":"e_1_3_6_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/MRA.2015.2448951"},{"key":"e_1_3_6_10_1","unstructured":"Chakravorty J Ward PN Roy J et al. (2019) Option-critic in cooperative multi-agent systems. arXiv preprint arXiv:1911.12825."},{"key":"e_1_3_6_11_1","doi-asserted-by":"crossref","unstructured":"Cho K van Merrienboer B G\u00fcl\u00e7ehre \u00c7 et al. (2014) Learning phrase representations using RNN encoder-decoder for statistical machine translation. 
Empirical Methods in Natural Language Processing (EMNLP) 1724\u20131734.","DOI":"10.3115\/v1\/D14-1179"},{"key":"e_1_3_6_12_1","unstructured":"Dalal M Pathak D Salakhutdinov R (2021) Accelerating robotic reinforcement learning via parameterized action primitives. In: Proceedings of the conference on neural information processing systems."},{"key":"e_1_3_6_13_1","unstructured":"de Witt CS Foerster J Farquhar G et al. (2019) Multi-agent common knowledge reinforcement learning. In: Proceedings of the conference on neural information processing systems. Vancouver BC Canada 8\u201314 Dec 2019."},{"key":"e_1_3_6_14_1","unstructured":"Du Y Han L Fang M et al. (2019) Liir: learning individual intrinsic reward in multi-agent reinforcement learning. In: Proceedings of the conference on neural information processing systems. Vancouver BC Canada 8\u201314 Dec 2019."},{"key":"e_1_3_6_15_1","doi-asserted-by":"crossref","unstructured":"Foerster J Farquhar G Afouras T et al. (2018) Counterfactual multi-agent policy gradients. In: Proceedings of the AAAI conference on artificial intelligence.","DOI":"10.1609\/aaai.v32i1.11794"},{"key":"e_1_3_6_16_1","unstructured":"Fulda N Ventura D (2007) Predicting and preventing coordination problems in cooperative q-learning systems. In: Proceedings of the international joint conference on artificial intelligence. Hyderabad India 6\u201312 Jan 2007 pp. 780\u2013785."},{"key":"e_1_3_6_17_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-29911-8_7"},{"key":"e_1_3_6_18_1","unstructured":"Hasselt HV (2010) Double q-learning. In: Proceedings of the conference on neural information processing systems. Vancouver BC Canada 6\u201311 Dec 2010 pp. 2613\u20132621."},{"key":"e_1_3_6_19_1","unstructured":"Hasselt HV Guez A Silver D (2016) Deep reinforcement learning with double q-learning. In: Proceedings of the AAAI conference on artificial intelligence. pp. 
2094\u20132100."},{"key":"e_1_3_6_20_1","unstructured":"Hausknecht M Stone P (2015) Deep recurrent q-learning for partially observable mdps. In: AAAI fall symposium on sequential decision making for intelligent agents (AAAI-SDMIA15)."},{"key":"e_1_3_6_21_1","unstructured":"He R Bachrach A Roy N (2010) Efficient planning under uncertainty for a target-tracking micro-aerial vehicle. In: Proceedings of the international conference on robotics and automation Anchorage AK USA 03\u201307 May 2010."},{"key":"e_1_3_6_22_1","doi-asserted-by":"crossref","unstructured":"Hoang TN Xiao Y Sivakumar K et al. (2018) Near-optimal adversarial policy switching for decentralized asynchronous multi-agent systems. In: Proceedings of the international conference on robotics and automation Brisbane QLD Australia 21\u201325 May 2018.","DOI":"10.1109\/ICRA.2018.8460485"},{"key":"e_1_3_6_23_1","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"e_1_3_6_24_1","doi-asserted-by":"crossref","unstructured":"Hsiao K Kaelbling LP Lozano-Perez T (2010) Task-driven tactile exploration. In: Proceedings of the robotics: science and systems conference. Zaragoza Spain 27\u201330 June 2010.","DOI":"10.15607\/RSS.2010.VI.029"},{"key":"e_1_3_6_25_1","first-page":"2961","article-title":"Actor-attention-critic for multi-agent reinforcement learning","volume":"97","author":"Iqbal S","year":"2019","unstructured":"Iqbal S, Sha F (2019) Actor-attention-critic for multi-agent reinforcement learning. Proceedings of the International Conference on Machine Learning 97: 2961\u20132970.","journal-title":"Proceedings of the International Conference on Machine Learning"},{"key":"e_1_3_6_26_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.robot.2006.12.011"},{"key":"e_1_3_6_27_1","unstructured":"Konda VR Tsitsiklis JN (2000) Actor-critic algorithms. In: Proceedings of the conference on neural information processing systems. pp. 
1008\u20131014."},{"key":"e_1_3_6_28_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v25i1.7982"},{"key":"e_1_3_6_29_1","doi-asserted-by":"publisher","DOI":"10.1613\/jair.5575"},{"key":"e_1_3_6_30_1","doi-asserted-by":"crossref","unstructured":"Koubaa A Sriti MF Javed Y et al. (2016) Turtlebot at office: a service-oriented software architecture for personal assistant robots using ROS. 2016 international conference on autonomous robot systems and competitions (ICARSC) Braganca 04\u201306 May 2016 pp. 270\u2013276.","DOI":"10.1109\/ICARSC.2016.66"},{"key":"e_1_3_6_31_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2016.01.031"},{"key":"e_1_3_6_32_1","doi-asserted-by":"crossref","unstructured":"Lee Y Cai P Hsu D (2021) MAGIC: learning macro-actions for online POMDP planning. In: Proceedings of the robotics: science and systems conference. 12\u201316 July 2021.","DOI":"10.15607\/RSS.2021.XVII.041"},{"key":"e_1_3_6_33_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF00992699"},{"key":"e_1_3_6_34_1","doi-asserted-by":"crossref","unstructured":"Liu X Chen S Aditya S et al. (2018) Robust fruit counting: combining deep learning tracking and structure from motion. In: Proceedings of IEEE\/RSJ international conference on intelligent robots and systems Madrid Spain 01\u201305 October 2018 pp. 1045\u20131052.","DOI":"10.1109\/IROS.2018.8594239"},{"key":"e_1_3_6_35_1","unstructured":"Liu S Lever G Wang Z et al. (2021) From motor control to team play in simulated humanoid football abs\/2105.12196."},{"key":"e_1_3_6_36_1","unstructured":"Lowe R Wu Y Tamar A et al. (2017) Multi-agent actor-critic for mixed cooperative-competitive environments. Proceedings of the conference on neural information processing systems. Long Beach CA USA 4\u20139 Dec 2017."},{"key":"e_1_3_6_37_1","unstructured":"Lyu X Xiao Y Daley B et al. (2021) Contrasting centralized and decentralized critics in multi-agent reinforcement learning. 
In: Proceedings of the international conference on autonomous agents and multiagent systems. 3\u20137 May 2021."},{"key":"e_1_3_6_38_1","unstructured":"Mahajan A Rashid T Samvelyan M et al. (2019) Maven: multi-agent variational exploration. In: Proceedings of the conference on neural information processing systems. Vancouver BC Canada 8\u201314 Dec 2019 pp. 7611\u20137622."},{"key":"e_1_3_6_39_1","doi-asserted-by":"crossref","unstructured":"Matignon L Laurent GJ Fort-Piat NL (2007) Hysteretic q-learning: an algorithm for decentralized reinforcement learning in cooperative multi-agent teams. In: Proceedings of IEEE\/RSJ international conference on intelligent robots and systems San Diego CA USA 29 October 2007\u201302 November 2007 pp. 64\u201369.","DOI":"10.1109\/IROS.2007.4399095"},{"key":"e_1_3_6_40_1","unstructured":"Maxime CB Julien R (2020) Teamgrid. https:\/\/github.com\/mila-iqia\/teamgrid."},{"key":"e_1_3_6_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/TITS.2018.2848264"},{"key":"e_1_3_6_42_1","doi-asserted-by":"publisher","DOI":"10.1038\/nature14236"},{"key":"e_1_3_6_43_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.trc.2019.11.003"},{"key":"e_1_3_6_44_1","unstructured":"Nachum O Ahn M Ponte H et al. (2019) Multi-agent manipulation via locomotion using hierarchical sim2real. In: Proceedings of the conference on robot learning. Osaka Japan 30 Oct \u2013 1 Nov 2019."},{"key":"e_1_3_6_45_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-28929-8"},{"key":"e_1_3_6_46_1","doi-asserted-by":"publisher","DOI":"10.1613\/jair.2447"},{"key":"e_1_3_6_47_1","doi-asserted-by":"crossref","unstructured":"Omidshafiei S Agha-mohammadi A Amato C et al. (2016) Graph-based cross entropy method for solving multi-robot decentralized POMDPs. 
In: Proceedings of the international conference on robotics and automation Stockholm Sweden 16\u201321 May 2016.","DOI":"10.1109\/ICRA.2016.7487751"},{"key":"e_1_3_6_48_1","doi-asserted-by":"publisher","DOI":"10.1177\/0278364917692864"},{"key":"e_1_3_6_49_1","unstructured":"Omidshafiei S Pazis J Amato C et al. (2017b) Deep decentralized multi-task multi-agent reinforcement learning under partial observability. In: Proceedings of the international conference on machine learning Sydney Australia 6\u201311 Aug 2017 pp. 2681\u20132690."},{"key":"e_1_3_6_50_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2020.3030190"},{"key":"e_1_3_6_51_1","unstructured":"Rashid T Samvelyan M de Witt CS et al. (2018) Qmix: monotonic value function factorisation for deep multi-agent reinforcement learning. In: Proceedings of the International Conference on Machine Learning. Stockholm Sweden 10\u201315 July 2018."},{"key":"e_1_3_6_52_1","unstructured":"Rashid T Farquhar G Peng B et al. (2020) Weighted qmix: expanding monotonic value function factorisation. In: Proceedings of the conference on neural information processing systems. 6\u201312 Dec 2020."},{"key":"e_1_3_6_53_1","doi-asserted-by":"crossref","unstructured":"Rosenband DL (2017) Inside waymo\u2019s self-driving car: my favorite transistors. 2017 symposium on VLSI circuits Kyoto Japan 05\u201308 June 2017 pp. C20\u2013C22.","DOI":"10.23919\/VLSIC.2017.8008500"},{"key":"e_1_3_6_54_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-65340-2_11"},{"key":"e_1_3_6_55_1","unstructured":"Son K Kim D Kang WJ et al. (2019) QTRAN: learning to factorize with transformation for cooperative multi-agent reinforcement learning. In: Proceedings of the international conference on machine learning. Long Beach CA 10\u201315 June 2019."},{"key":"e_1_3_6_56_1","doi-asserted-by":"crossref","unstructured":"Stulp F Schaal S (2011) Hierarchical reinforcement learning with movement primitives. 
In: 11th IEEE-RAS international conference on humanoid robots Bled Slovenia 26\u201328 October 2011.","DOI":"10.1109\/Humanoids.2011.6100841"},{"key":"e_1_3_6_57_1","doi-asserted-by":"crossref","unstructured":"Su J Adams S Beling PA (2021) Value-decomposition multi-agent actor-critics. In: Proceedings of the AAAI conference on artificial intelligence.","DOI":"10.1609\/aaai.v35i13.17353"},{"key":"e_1_3_6_58_1","unstructured":"Sunehag P Lever G Gruslys A et al. (2018) Value-decomposition networks for cooperative multi-agent learning based on team reward. In: Proceedings of the international conference on autonomous agents and multiagent systems. Stockholm Sweden 10\u201315 July 2018 pp. 2085\u20132087."},{"key":"e_1_3_6_59_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF00115009"},{"key":"e_1_3_6_60_1","unstructured":"Sutton RS Precup D Singh S (1998) Intra-option learning about temporally abstract actions. In: Proceedings of the International Conference on Machine Learning. Madison Wisconsin USA 24\u201327 July 1998."},{"key":"e_1_3_6_61_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0004-3702(99)00052-1"},{"key":"e_1_3_6_62_1","first-page":"1057","volume-title":"Advances in Neural Information Processing Systems","author":"Sutton RS","year":"2000","unstructured":"Sutton RS, McAllester DA, Singh SP, et al. (2000) Policy gradient methods for reinforcement learning with function approximation. In: Advances in Neural Information Processing Systems, 1057\u20131063."},{"key":"e_1_3_6_63_1","doi-asserted-by":"crossref","unstructured":"Tan M (1993) Multi-agent reinforcement learning: independent vs. cooperative agents. In: Proceedings of the international conference on machine learning. Amherst MA USA 27\u201329 June 1993 pp. 330\u2013337.","DOI":"10.1016\/B978-1-55860-307-3.50049-6"},{"key":"e_1_3_6_64_1","doi-asserted-by":"crossref","unstructured":"Tang YC (2019) Towards learning multi-agent negotiations via self-play. 
In: Autonomous driving workshop IEEE international conference on computer vision Seoul Republic of Korea 27\u201328 October 2019.","DOI":"10.1109\/ICCVW.2019.00297"},{"key":"e_1_3_6_65_1","volume-title":"Advances in Neural Information Processing Systems","author":"Theocharous G","year":"2004","unstructured":"Theocharous G, Kaelbling L (2004) Approximate planning in pomdps with macro-actions. In: Advances in Neural Information Processing Systems."},{"key":"e_1_3_6_66_1","unstructured":"Vezhnevets AS Wu Y Leblond R et al. (2020) Options as responses: grounding behavioural hierarchies in multi-agent rl. In: Proceedings of the international conference on machine learning. 12\u201318 July 2020."},{"key":"e_1_3_6_67_1","unstructured":"Wang J Ren Z Liu T et al. (2021c) Qplex: duplex dueling multi-agent q-learning. In: Proceedings of the international conference on learning representations. 3\u20137 May 2021."},{"key":"e_1_3_6_68_1","doi-asserted-by":"crossref","unstructured":"Wang J Zhang Y Kim TK et al. (2020a) Shapley q-value: a local reward approach to solve global reward games. In: Proceedings of the AAAI conference on artificial intelligence.","DOI":"10.1609\/aaai.v34i05.6220"},{"key":"e_1_3_6_69_1","unstructured":"Wang T Dong H Lesser V Zhang C (2020c) Roma: multi-agent reinforcement learning with emergent roles. In: Proceedings of the international conference on machine learning. 12\u201318 July 2020."},{"key":"e_1_3_6_70_1","unstructured":"Wang T Gupta T Mahajan A et al. (2021a) Rode: learning roles to decompose multi-agent tasks. In: Proceedings of the international conference on learning representations. 3\u20137 May 2021."},{"key":"e_1_3_6_71_1","unstructured":"Wang Y Han B Wang T et al. (2021b) DOP: off-policy multi-agent decomposed policy gradients. In: Proceedings of the international conference on learning representations. 3\u20137 May 2021."},{"key":"e_1_3_6_72_1","unstructured":"Wang RE Kew JC Lee D et al. 
(2020b) Model-based reinforcement learning for decentralized multiagent rendezvous. In: Proceedings of the conference on robot learning. 16\u201318 Nov 2020."},{"key":"e_1_3_6_73_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF00992698"},{"key":"e_1_3_6_74_1","unstructured":"Weaver L Tao N (2001) The optimal reward baseline for gradient-based reinforcement learning. In: Proceedings of the conference on uncertainty in artificial intelligence. Morgan Kaufmann pp. 538\u2013545."},{"key":"e_1_3_6_75_1","unstructured":"Wise M Ferguson M King D et al. (2016) Fetch & freight: standard platforms for service robot applications. In: Workshop on autonomous mobile service robots international joint conference on artificial intelligence."},{"key":"e_1_3_6_76_1","doi-asserted-by":"crossref","unstructured":"Wu J Sun X Zeng A et al. (2020) Spatial action maps for mobile manipulation. In: Proceedings of the robotics: science and systems conference. 12\u201316 July 2020.","DOI":"10.15607\/RSS.2020.XVI.035"},{"key":"e_1_3_6_77_1","doi-asserted-by":"crossref","unstructured":"Wu J Sun X Zeng A et al. (2021a) Spatial intention maps for multi-agent mobile manipulation. In: Proceedings of the international conference on robotics and automation. Xi'an China 30 May \u2013 5 June 2021.","DOI":"10.1109\/ICRA48506.2021.9561359"},{"key":"e_1_3_6_78_1","doi-asserted-by":"publisher","DOI":"10.1111\/tops.12525"},{"key":"e_1_3_6_79_1","unstructured":"Xiao Y Hoffman J Amato C (2019a) Macro-action-based deep multi-agent reinforcement learning. In: Proceedings of the conference on robot learning. Osaka Japan 30 Oct \u2013 1 Nov 2019."},{"key":"e_1_3_6_80_1","doi-asserted-by":"crossref","unstructured":"Xiao Y Katt S ten Pas A et al. (2019b) Online planning for target object search in clutter under partial observability. 
In: Proceedings of the international conference on robotics and automation Montreal QC Canada 20\u201324 May 2019.","DOI":"10.1109\/ICRA.2019.8793494"},{"key":"e_1_3_6_81_1","first-page":"28","author":"Xiao Y","year":"2022","unstructured":"Xiao Y, Tan W, Amato C (2022) - 9 28. (accessed Nov).","journal-title":"- 9"},{"key":"e_1_3_6_82_1","unstructured":"Xu Z Bai Y Zhang B et al. (2021) HAVEN: hierarchical cooperative multi-agent reinforcement learning with dual coordination mechanism. arXiv preprint abs\/2110.07246."},{"key":"e_1_3_6_83_1","unstructured":"Yang J Borovikov I Zha H (2020a) Hierarchical cooperative multi-agent reinforcement learning with skill discovery. In: Proceedings of the international conference on autonomous agents and multiagent systems. 9\u201313 May 2020."},{"key":"e_1_3_6_84_1","unstructured":"Yang J Nakhaei A Isele D et al. (2020b) Cm3: cooperative multi-goal multi-stage multi-agent reinforcement learning. In: Proceedings of The International Conference On Learning Representations. Addis Ababa Ethiopia 26\u201330 April 2020."},{"key":"e_1_3_6_85_1","unstructured":"Yang T Wang W Tang H et al. (2021) An efficient transfer learning framework for multiagent reinforcement learning. In: Proceedings of the conference on neural information processing systems. 6\u201314 Dec 2021."},{"key":"e_1_3_6_86_1","first-page":"28","author":"Yu C","year":"2022","unstructured":"Yu C, Velu A, Vinitsky E, et al. (2022) - 9 28. (accessed Nov).","journal-title":"- 9"},{"key":"e_1_3_6_87_1","unstructured":"Yu C Yang X Gao J et al. (2023) Asynchronous multi-agent reinforcement learning for efficient real-time multi-robot cooperative exploration. In: Proceedings of the international conference on autonomous agents and multiagent systems. London UK 29 May \u2013 2 June 2023."},{"key":"e_1_3_6_88_1","doi-asserted-by":"crossref","unstructured":"Zheng Y Meng Z Hao J et al. 
(2018) Weighted double deep multiagent reinforcement learning in stochastic cooperative environments. In: Pacific rim international conference on artificial intelligence. pp. 421\u2013429.","DOI":"10.1007\/978-3-319-97310-4_48"},{"key":"e_1_3_6_89_1","unstructured":"Zhou M Liu Z Sui P et al. (2020) Learning implicit credit assignment for cooperative multi-agent reinforcement learning. In: Proceedings of the conference on neural information processing systems. 6\u201312 Dec 2020."}],"container-title":["The International Journal of Robotics Research"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/02783649241306124","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/02783649241306124","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/02783649241306124","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/02783649241306124","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T10:17:39Z","timestamp":1777457859000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/02783649241306124"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,2,6]]},"references-count":88,"journal-issue":{"issue":"8","published-print":{"date-parts":[[2025,7]]}},"alternative-id":["10.1177\/02783649241306124"],"URL":"https:\/\/doi.org\/10.1177\/02783649241306124","relation":{},"ISSN":["0278-3649","1741-3176"],"issn-type":[{"value":"0278-3649","type":"print"},{"value":"1741-3176","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,2,6]]}}}