{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,6]],"date-time":"2026-06-06T19:37:36Z","timestamp":1780774656475,"version":"3.54.1"},"reference-count":256,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2021,4,15]],"date-time":"2021-04-15T00:00:00Z","timestamp":1618444800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,4,15]],"date-time":"2021-04-15T00:00:00Z","timestamp":1618444800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100005713","name":"Technische Universit\u00e4t M\u00fcnchen","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100005713","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Artif Intell Rev"],"published-print":{"date-parts":[[2022,2]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>The advances in reinforcement learning have recorded sublime success in various domains. Although the multi-agent domain has been overshadowed by its single-agent counterpart during this progress, multi-agent reinforcement learning gains rapid traction, and the latest accomplishments address problems with real-world complexity. This article provides an overview of the current developments in the field of multi-agent deep reinforcement learning. We focus primarily on literature from recent years that combines deep reinforcement learning methods with a multi-agent scenario. To survey the works that constitute the contemporary landscape, the main\u00a0contents are divided into three parts. First, we analyze the structure of training schemes that are applied to train multiple agents. 
Second, we consider the emergent patterns of agent behavior in cooperative, competitive and mixed scenarios. Third, we systematically enumerate challenges that exclusively arise in the multi-agent domain and review methods that are leveraged to cope with these challenges. To conclude this survey, we discuss advances, identify trends, and outline possible directions for future work in this research area.<\/jats:p>","DOI":"10.1007\/s10462-021-09996-w","type":"journal-article","created":{"date-parts":[[2021,4,15]],"date-time":"2021-04-15T21:15:14Z","timestamp":1618521314000},"page":"895-943","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":739,"title":["Multi-agent deep reinforcement learning: a survey"],"prefix":"10.1007","volume":"55","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0047-5116","authenticated-orcid":false,"given":"Sven","family":"Gronauer","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Klaus","family":"Diepold","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2021,4,15]]},"reference":[{"key":"9996_CR1","unstructured":"Ahilan S, Dayan P (2019) Feudal multi-agent hierarchies for cooperative reinforcement learning. CoRR arxiv: abs\/1901.08492"},{"key":"9996_CR2","unstructured":"Al-Shedivat M, Bansal T, Burda Y, Sutskever I, Mordatch I, Abbeel P (2018) Continuous adaptation via meta-learning in nonstationary and competitive environments. In: International conference on learning representations. https:\/\/openreview.net\/forum?id=Sk2u1g-0-"},{"key":"9996_CR3","doi-asserted-by":"publisher","unstructured":"Albrecht SV, Stone P (2018) Autonomous agents modelling other agents: a comprehensive survey and open problems. Artif Intell 258:66\u201395. https:\/\/doi.org\/10.1016\/j.artint.2018.01.002. 
http:\/\/www.sciencedirect.com\/science\/article\/pii\/S0004370218300249","DOI":"10.1016\/j.artint.2018.01.002"},{"key":"9996_CR4","doi-asserted-by":"publisher","unstructured":"Amato C, Konidaris G, Cruz G, Maynor CA, How JP, Kaelbling LP (2015) Planning for decentralized control of multiple robots under uncertainty. In: 2015 IEEE international conference on robotics and automation (ICRA), pp 1241\u20131248. https:\/\/doi.org\/10.1109\/ICRA.2015.7139350","DOI":"10.1109\/ICRA.2015.7139350"},{"key":"9996_CR5","unstructured":"Amodei D, Olah C, Steinhardt J, Christiano PF, Schulman J, Man\u00e9 D (2016) Concrete problems in AI safety. CoRR. arxiv: abs\/1606.06565,"},{"key":"9996_CR6","doi-asserted-by":"crossref","unstructured":"Antol S, Agrawal A, Lu J, Mitchell M, Batra D, Lawrence\u00a0Zitnick C, Parikh D (2015) Vqa: Visual question answering. In: The IEEE international conference on computer vision (ICCV)","DOI":"10.1109\/ICCV.2015.279"},{"issue":"6","key":"9996_CR7","doi-asserted-by":"publisher","first-page":"26","DOI":"10.1109\/MSP.2017.2743240","volume":"34","author":"K Arulkumaran","year":"2017","unstructured":"Arulkumaran K, Deisenroth MP, Brundage M, Bharath AA (2017) Deep reinforcement learning: a brief survey. IEEE Signal Process Mag 34(6):26\u201338. https:\/\/doi.org\/10.1109\/MSP.2017.2743240","journal-title":"IEEE Signal Process Mag"},{"key":"9996_CR8","unstructured":"Aubret A, Matignon L, Hassas S (2019) A survey on intrinsic motivation in reinforcement learning. arXiv e-prints arXiv:1908.06976,"},{"key":"9996_CR9","unstructured":"Baker B, Kanitscheider I, Markov T, Wu Y, Powell G, McGrew B, Mordatch I (2020) Emergent tool use from multi-agent autocurricula. In: International conference on learning representations. https:\/\/openreview.net\/forum?id=SkxpxJBKwS"},{"key":"9996_CR10","unstructured":"Bansal T, Pachocki J, Sidor S, Sutskever I, Mordatch I (2018) Emergent complexity via multi-agent competition. 
In: International conference on learning representations. https:\/\/openreview.net\/forum?id=Sy0GnUxCb"},{"key":"9996_CR11","unstructured":"Barde P, Roy J, Harvey FG, Nowrouzezahrai D, Pal C (2019) Promoting coordination through policy regularization in multi-agent reinforcement learning. arXiv e-prints arXiv:1908.02269,"},{"key":"9996_CR12","doi-asserted-by":"publisher","first-page":"132","DOI":"10.1016\/j.artint.2016.10.005","volume":"242","author":"S Barrett","year":"2017","unstructured":"Barrett S, Rosenfeld A, Kraus S, Stone P (2017) Making friends on the fly: cooperating with new teammates. Artif Intell 242:132\u2013171","journal-title":"Artif Intell"},{"key":"9996_CR13","unstructured":"Beattie C, Leibo JZ, Teplyashin D, Ward T, Wainwright M, K\u00fcttler H, Lefrancq A, Green S, Vald\u00e9s V, Sadik A, Schrittwieser J, Anderson K, York S, Cant M, Cain A, Bolton A, Gaffney S, King H, Hassabis D, Legg S, Petersen S (2016) Deepmind lab. CoRR. arxiv: abs\/1612.03801"},{"key":"9996_CR14","doi-asserted-by":"publisher","first-page":"423","DOI":"10.1613\/jair.1497","volume":"22","author":"R Becker","year":"2004","unstructured":"Becker R, Zilberstein S, Lesser V, Goldman CV (2004) Solving transition independent decentralized Markov decision processes. J Artif Intell Res 22:423\u2013455","journal-title":"J Artif Intell Res"},{"key":"9996_CR15","unstructured":"Bellemare M, Srinivasan S, Ostrovski G, Schaul T, Saxton D, Munos R (2016) Unifying count-based exploration and intrinsic motivation. In: Lee DD, Sugiyama M, Luxburg UV, Guyon I, Garnett R (eds) Advances in neural information processing systems 29, Curran Associates, Inc., pp 1471\u20131479. http:\/\/papers.nips.cc\/paper\/6383-unifying-count-based-exploration-and-intrinsic-motivation.pdf"},{"key":"9996_CR16","doi-asserted-by":"crossref","unstructured":"Bellman R (1957) A Markovian decision process. J Math Mechanics 6(5):679\u2013684. 
http:\/\/www.jstor.org\/stable\/24900506","DOI":"10.1512\/iumj.1957.6.56038"},{"key":"9996_CR17","doi-asserted-by":"publisher","unstructured":"Bengio Y, Louradour J, Collobert R, Weston J (2009) Curriculum learning. In: Proceedings of the 26th annual international conference on machine learning, ACM, New York, NY, USA, ICML \u201909, pp 41\u201348. https:\/\/doi.org\/10.1145\/1553374.1553380,","DOI":"10.1145\/1553374.1553380"},{"key":"9996_CR18","unstructured":"Berner C, Brockman G, Chan B, Cheung V, Debiak P, Dennison C, Farhi D, Fischer Q, Hashme S, Hesse C, J\u00f3zefowicz R, Gray S, Olsson C, Pachocki JW, Petrov M, de\u00a0Oliveira\u00a0Pinto HP, Raiman J, Salimans T, Schlatter J, Schneider J, Sidor S, Sutskever I, Tang J, Wolski F, Zhang S (2019) Dota 2 with large scale deep reinforcement learning. ArXiv arxiv: abs\/1912.06680"},{"issue":"4","key":"9996_CR19","doi-asserted-by":"publisher","first-page":"819","DOI":"10.1287\/moor.27.4.819.297","volume":"27","author":"DS Bernstein","year":"2002","unstructured":"Bernstein DS, Givan R, Immerman N, Zilberstein S (2002) The complexity of decentralized control of Markov decision processes. Math Oper Res 27(4):819\u2013840. https:\/\/doi.org\/10.1287\/moor.27.4.819.297","journal-title":"Math Oper Res"},{"key":"9996_CR20","volume-title":"Dynamic programming and optimal control","author":"DP Bertsekas","year":"2012","unstructured":"Bertsekas DP (2012) Dynamic programming and optimal control, vol 2, 4th edn. Athena Scientific, Belmont","edition":"4"},{"key":"9996_CR21","volume-title":"Dynamic programming and optimal control","author":"DP Bertsekas","year":"2017","unstructured":"Bertsekas DP (2017) Dynamic programming and optimal control, vol 1, 4th edn. 
Athena Scientific, Belmont","edition":"4"},{"key":"9996_CR22","doi-asserted-by":"publisher","first-page":"659","DOI":"10.1613\/jair.4818","volume":"53","author":"D Bloembergen","year":"2015","unstructured":"Bloembergen D, Tuyls K, Hennes D, Kaisers M (2015) Evolutionary dynamics of multi-agent learning: a survey. J Artif Intell Res 53:659\u2013697","journal-title":"J Artif Intell Res"},{"key":"9996_CR23","doi-asserted-by":"publisher","first-page":"459","DOI":"10.1007\/978-3-030-10925-7_28","volume-title":"Machine learning and knowledge discovery in databases","author":"G Bono","year":"2019","unstructured":"Bono G, Dibangoye JS, Matignon L, Pereyron F, Simonin O (2019) Cooperative multi-agent policy gradient. In: Berlingerio M, Bonchi F, G\u00e4rtner T, Hurley N, Ifrim G (eds) Machine learning and knowledge discovery in databases. Springer International Publishing, Cham, pp 459\u2013476"},{"key":"9996_CR24","doi-asserted-by":"publisher","first-page":"249","DOI":"10.1007\/978-3-642-29946-9_25","volume-title":"Recent advances in reinforcement learning","author":"G Boutsioukis","year":"2012","unstructured":"Boutsioukis G, Partalas I, Vlahavas I (2012) Transfer learning in multi-agent reinforcement learning domains. In: Sanner S, Hutter M (eds) Recent advances in reinforcement learning. Springer, Berlin, pp 249\u2013260"},{"issue":"2","key":"9996_CR25","doi-asserted-by":"publisher","first-page":"215","DOI":"10.1016\/S0004-3702(02)00121-2","volume":"136","author":"M Bowling","year":"2002","unstructured":"Bowling M, Veloso M (2002) Multiagent learning using a variable learning rate. Artif Intell 136(2):215\u2013250","journal-title":"Artif Intell"},{"key":"9996_CR26","unstructured":"Brockman G, Cheung V, Pettersson L, Schneider J, Schulman J, Tang J, Zaremba W (2016) Openai gym. 
arXiv:1606.01540"},{"issue":"2","key":"9996_CR27","doi-asserted-by":"publisher","first-page":"156","DOI":"10.1109\/TSMCC.2007.913919","volume":"38","author":"L Busoniu","year":"2008","unstructured":"Busoniu L, Babuska R, De Schutter B (2008) A comprehensive survey of multiagent reinforcement learning. IEEE Trans Syst Man Cybern Part C (Appl Rev) 38(2):156\u2013172. https:\/\/doi.org\/10.1109\/TSMCC.2007.913919","journal-title":"IEEE Trans Syst Man Cybern Part C (Appl Rev)"},{"key":"9996_CR28","doi-asserted-by":"publisher","unstructured":"Cai Y, Yang SX, Xu X (2013) A combined hierarchical reinforcement learning based approach for multi-robot cooperative target searching in complex unknown environments. In: 2013 IEEE symposium on adaptive dynamic programming and reinforcement learning (ADPRL), pp 52\u201359. https:\/\/doi.org\/10.1109\/ADPRL.2013.6614989","DOI":"10.1109\/ADPRL.2013.6614989"},{"key":"9996_CR29","unstructured":"Cao K, Lazaridou A, Lanctot M, Leibo JZ, Tuyls K, Clark S (2018) Emergent communication through negotiation. In: International conference on learning representations. https:\/\/openreview.net\/forum?id=Hk6WhagRW"},{"issue":"1","key":"9996_CR30","doi-asserted-by":"publisher","first-page":"427","DOI":"10.1109\/TII.2012.2219061","volume":"9","author":"Y Cao","year":"2013","unstructured":"Cao Y, Yu W, Ren W, Chen G (2013) An overview of recent progress in the study of distributed multi-agent coordination. IEEE Trans Industr Inf 9(1):427\u2013438. https:\/\/doi.org\/10.1109\/TII.2012.2219061","journal-title":"IEEE Trans Industr Inf"},{"key":"9996_CR31","unstructured":"Castellini J, Oliehoek FA, Savani R, Whiteson S (2019) The representational capacity of action-value networks for multi-agent reinforcement learning. In: Proceedings of the 18th international conference on autonomous agents and multiagent systems, international foundation for autonomous agents and multiagent systems, Richland, SC, AAMAS \u201919, pp 1862\u20131864. 
http:\/\/dl.acm.org\/citation.cfm?id=3306127.3331944"},{"key":"9996_CR32","doi-asserted-by":"crossref","unstructured":"Celikyilmaz A, Bosselut A, He X, Choi Y (2018) Deep communicating agents for abstractive summarization. CoRR arxiv: abs\/1803.10357,","DOI":"10.18653\/v1\/N18-1150"},{"key":"9996_CR33","unstructured":"Chang Y, Ho T, Kaelbling LP (2004) All learning is local: Multi-agent learning in global reward games. In: Thrun S, Saul LK, Sch\u00f6lkopf B (eds) Advances in neural information processing systems 16, MIT Press, pp 807\u2013814. http:\/\/papers.nips.cc\/paper\/2476-all-learning-is-local-multi-agent-learning-in-global-reward-games.pdf"},{"key":"9996_CR34","doi-asserted-by":"crossref","unstructured":"Chen Y, Zhou M, Wen Y, Yang Y, Su Y, Zhang W, Zhang D, Wang J, Liu H (2018) Factorized q-learning for large-scale multi-agent systems. CoRR arxiv: abs\/1809.03738","DOI":"10.1145\/3356464.3357707"},{"key":"9996_CR35","doi-asserted-by":"crossref","unstructured":"Chen YF, Liu M, Everett M, How JP (2016) Decentralized non-communicating multiagent collision avoidance with deep reinforcement learning. CoRR. arxiv: abs\/1609.07845,","DOI":"10.1109\/ICRA.2017.7989037"},{"key":"9996_CR36","unstructured":"Chentanez N, Barto AG, Singh SP (2005) Intrinsically motivated reinforcement learning. In: Saul LK, Weiss Y, Bottou L (eds) Advances in neural information processing systems 17, MIT Press, pp 1281\u20131288. http:\/\/papers.nips.cc\/paper\/2552-intrinsically-motivated-reinforcement-learning.pdf"},{"key":"9996_CR37","unstructured":"Choi E, Lazaridou A, de\u00a0Freitas N (2018) Multi-agent compositional communication learning from raw visual input. In: International conference on learning representations. https:\/\/openreview.net\/forum?id=rknt2Be0-"},{"key":"9996_CR38","unstructured":"Chu T, Chinchali S, Katti S (2020) Multi-agent reinforcement learning for networked system control. In: International conference on learning representations. 
https:\/\/openreview.net\/forum?id=Syx7A3NFvH"},{"issue":"3","key":"9996_CR39","doi-asserted-by":"publisher","first-page":"1086","DOI":"10.1109\/TITS.2019.2901791","volume":"21","author":"T Chu","year":"2020","unstructured":"Chu T, Wang J, Codec\u00e0 L, Li Z (2020) Multi-agent deep reinforcement learning for large-scale traffic signal control. IEEE Trans Intell Transp Syst 21(3):1086\u20131095","journal-title":"IEEE Trans Intell Transp Syst"},{"key":"9996_CR40","unstructured":"Chu X, Ye H (2017) Parameter sharing deep deterministic policy gradient for cooperative multi-agent reinforcement learning. CoRR arxiv: abs\/1710.00336"},{"key":"9996_CR41","unstructured":"Claus C, Boutilier C (1998) The dynamics of reinforcement learning in cooperative multiagent systems. In: Proceedings of the fifteenth national conference on artificial intelligence and tenth innovative applications of artificial intelligence conference, AAAI 98, IAAI 98, July 26\u201330, 1998, Madison, Wisconsin, USA, pp 746\u2013752. http:\/\/www.aaai.org\/Library\/AAAI\/1998\/aaai98-106.php"},{"issue":"3","key":"9996_CR42","doi-asserted-by":"publisher","first-page":"281","DOI":"10.1007\/s10994-010-5192-9","volume":"82","author":"JW Crandall","year":"2011","unstructured":"Crandall JW, Goodrich MA (2011) Learning to compete, coordinate, and cooperate in repeated games using reinforcement learning. Mach Learn 82(3):281\u2013314. https:\/\/doi.org\/10.1007\/s10994-010-5192-9","journal-title":"Mach Learn"},{"key":"9996_CR43","doi-asserted-by":"crossref","unstructured":"Da\u00a0Silva FL, Costa AHR (2017) Accelerating multiagent reinforcement learning through transfer learning. In: Proceedings of the thirty-first AAAI conference on artificial intelligence, AAAI Press, AAAI\u201917, pp 5034\u20135035. 
http:\/\/dl.acm.org\/citation.cfm?id=3297863.3297988","DOI":"10.1609\/aaai.v31i1.10518"},{"issue":"1","key":"9996_CR44","doi-asserted-by":"publisher","first-page":"645","DOI":"10.1613\/jair.1.11396","volume":"64","author":"FL Da Silva","year":"2019","unstructured":"Da Silva FL, Costa AHR (2019) A survey on transfer learning for multiagent reinforcement learning systems. J Artif Int Res 64(1):645\u2013703. https:\/\/doi.org\/10.1613\/jair.1.11396","journal-title":"J Artif Int Res"},{"key":"9996_CR45","unstructured":"Da\u00a0Silva FL, Glatt R, Costa AHR (2017) Simultaneously learning and advising in multiagent reinforcement learning. In: Proceedings of the 16th conference on autonomous agents and multiagent systems, international foundation for autonomous agents and multiagent systems, Richland, SC, AAMAS \u201917, pp 1100\u20131108. http:\/\/dl.acm.org\/citation.cfm?id=3091210.3091280"},{"issue":"1","key":"9996_CR46","doi-asserted-by":"publisher","first-page":"9","DOI":"10.1007\/s10458-019-09430-0","volume":"34","author":"FL Da Silva","year":"2019","unstructured":"Da Silva FL, Warnell G, Costa AHR, Stone P (2019) Agents teaching agents: a survey on inter-agent transfer learning. Auton Agent Multi-Agent Syst 34(1):9. https:\/\/doi.org\/10.1007\/s10458-019-09430-0","journal-title":"Auton Agent Multi-Agent Syst"},{"key":"9996_CR47","doi-asserted-by":"crossref","unstructured":"Das A, Kottur S, Moura JMF, Lee S, Batra D (2017) Learning cooperative visual dialog agents with deep reinforcement learning. In: The IEEE international conference on computer vision (ICCV)","DOI":"10.1109\/ICCV.2017.321"},{"key":"9996_CR48","unstructured":"Das A, Gervet T, Romoff J, Batra D, Parikh D, Rabbat M, Pineau J (2019) TarMAC: Targeted multi-agent communication. In: Chaudhuri K, Salakhutdinov R (eds) Proceedings of the 36th international conference on machine learning, PMLR, Long Beach, California, USA, Proceedings of machine learning research, vol\u00a097, pp 1538\u20131546. 
http:\/\/proceedings.mlr.press\/v97\/das19a.html"},{"key":"9996_CR49","unstructured":"Dayan P, Hinton GE (1993) Feudal reinforcement learning. In: Hanson SJ, Cowan JD, Giles CL (eds) Advances in neural information processing systems 5, Morgan-Kaufmann, pp 271\u2013278. http:\/\/papers.nips.cc\/paper\/714-feudal-reinforcement-learning.pdf"},{"key":"9996_CR50","doi-asserted-by":"publisher","unstructured":"De\u00a0Cote EM, Lazaric A, Restelli M (2006) Learning to cooperate in multi-agent social dilemmas. In: Proceedings of the fifth international joint conference on autonomous agents and multiagent systems, ACM, New York, NY, USA, AAMAS \u201906, pp 783\u2013785. https:\/\/doi.org\/10.1145\/1160633.1160770","DOI":"10.1145\/1160633.1160770"},{"key":"9996_CR51","doi-asserted-by":"publisher","unstructured":"Diallo EAO, Sugiyama A, Sugawara T (2017) Learning to coordinate with deep reinforcement learning in doubles pong game. In: 2017 16th IEEE international conference on machine learning and applications (ICMLA), pp 14\u201319. https:\/\/doi.org\/10.1109\/ICMLA.2017.0-184","DOI":"10.1109\/ICMLA.2017.0-184"},{"key":"9996_CR52","unstructured":"Dibangoye J, Buffet O (2018) Learning to act in decentralized partially observable MDPs. In: Dy J, Krause A (eds) Proceedings of the 35th international conference on machine learning, PMLR, Stockholmsm\u00e4ssan, Stockholm Sweden, Proceedings of Machine Learning Research, vol\u00a080, pp 1233\u20131242. http:\/\/proceedings.mlr.press\/v80\/dibangoye18a.html"},{"key":"9996_CR53","unstructured":"Dobbe R, Fridovich-Keil D, Tomlin C (2017) Fully decentralized policies for multi-agent systems: an information theoretic approach. In: Guyon I, Luxburg UV, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R (eds) Advances in neural information processing systems 30, Curran Associates, Inc., pp 2941\u20132950. 
http:\/\/papers.nips.cc\/paper\/6887-fully-decentralized-policies-for-multi-agent-systems-an-information-theoretic-approach.pdf"},{"key":"9996_CR54","unstructured":"Duan Y, Schulman J, Chen X, Bartlett PL, Sutskever I, Abbeel P (2016) $$\\text{RL}^2$$: fast reinforcement learning via slow reinforcement learning. CoRR arxiv: abs\/1611.02779"},{"key":"9996_CR55","unstructured":"Eccles T, Bachrach Y, Lever G, Lazaridou A, Graepel T (2019) Biases for emergent communication in multi-agent reinforcement learning. In: Wallach H, Larochelle H, Beygelzimer A, Alche-Buc F, Fox E, Garnett R (eds) Advances in neural information processing systems 32, Curran Associates, Inc., pp 13111\u201313121. http:\/\/papers.nips.cc\/paper\/9470-biases-for-emergent-communication-in-multi-agent-reinforcement-learning.pdf"},{"key":"9996_CR56","unstructured":"Everett R, Roberts S (2018) Learning against non-stationary agents with opponent modelling and deep reinforcement learning. In: 2018 AAAI Spring symposium series"},{"key":"9996_CR57","unstructured":"Evtimova K, Drozdov A, Kiela D, Cho K (2018) Emergent communication in a multi-modal, multi-step referential game. In: International conference on learning representations. https:\/\/openreview.net\/forum?id=rJGZq6g0-"},{"key":"9996_CR58","unstructured":"Finn C, Levine S (2018) Meta-learning and universality: deep representations and gradient descent can approximate any learning algorithm. In: International conference on learning representations. https:\/\/openreview.net\/forum?id=HyjC5yWCW"},{"key":"9996_CR59","unstructured":"Foerster J, Assael IA, de\u00a0Freitas N, Whiteson S (2016) Learning to communicate with deep multi-agent reinforcement learning. In: Lee DD, Sugiyama M, Luxburg UV, Guyon I, Garnett R (eds) Advances in neural information processing systems 29, Curran Associates, Inc., pp 2137\u20132145. 
http:\/\/papers.nips.cc\/paper\/6042-learning-to-communicate-with-deep-multi-agent-reinforcement-learning.pdf"},{"key":"9996_CR60","unstructured":"Foerster J, Nardelli N, Farquhar G, Afouras T, Torr PHS, Kohli P, Whiteson S (2017) Stabilising experience replay for deep multi-agent reinforcement learning. In: Precup D, Teh YW (eds) Proceedings of the 34th international conference on machine learning, PMLR, International Convention Centre, Sydney, Australia, Proceedings of Machine Learning Research, vol\u00a070, pp 1146\u20131155. http:\/\/proceedings.mlr.press\/v70\/foerster17b.html"},{"key":"9996_CR61","unstructured":"Foerster J, Chen RY, Al-Shedivat M, Whiteson S, Abbeel P, Mordatch I (2018a) Learning with opponent-learning awareness. In: Proceedings of the 17th international conference on autonomous agents and multiagent systems, International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC, AAMAS \u201918, pp 122\u2013130. http:\/\/dl.acm.org\/citation.cfm?id=3237383.3237408"},{"key":"9996_CR62","doi-asserted-by":"crossref","unstructured":"Foerster J, Farquhar G, Afouras T, Nardelli N, Whiteson S (2018b) Counterfactual multi-agent policy gradients. https:\/\/aaai.org\/ocs\/index.php\/AAAI\/AAAI18\/paper\/view\/17193","DOI":"10.1609\/aaai.v32i1.11794"},{"key":"9996_CR63","unstructured":"Foerster J, Song F, Hughes E, Burch N, Dunning I, Whiteson S, Botvinick M, Bowling M (2019) Bayesian action decoder for deep multi-agent reinforcement learning. In: Chaudhuri K, Salakhutdinov R (eds) Proceedings of the 36th international conference on machine learning, PMLR, Long Beach, California, USA, Proceedings of Machine Learning Research, vol\u00a097, pp 1942\u20131951. http:\/\/proceedings.mlr.press\/v97\/foerster19a.html"},{"key":"9996_CR64","unstructured":"Fulda N, Ventura D (2007) Predicting and preventing coordination problems in cooperative q-learning systems. 
In: Proceedings of the 20th international joint conference on artificial intelligence, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, IJCAI\u201907, pp 780\u2013785"},{"key":"9996_CR65","unstructured":"Garc\u00eda J, Fern\u00e1ndez F (2015) A comprehensive survey on safe reinforcement learning. J Mach Learn Res 16(42):1437\u20131480. http:\/\/jmlr.org\/papers\/v16\/garcia15a.html"},{"key":"9996_CR66","doi-asserted-by":"publisher","DOI":"10.1007\/s10458-006-7035-4","author":"M Ghavamzadeh","year":"2006","unstructured":"Ghavamzadeh M, Mahadevan S, Makar R (2006) Hierarchical multi-agent reinforcement learning. Auton Agent Multi-Agent Syst. https:\/\/doi.org\/10.1007\/s10458-006-7035-4","journal-title":"Auton Agent Multi-Agent Syst"},{"key":"9996_CR67","unstructured":"Gleave A, Dennis M, Wild C, Kant N, Levine S, Russell S (2020) Adversarial policies: Attacking deep reinforcement learning. In: International conference on learning representations. https:\/\/openreview.net\/forum?id=HJgEMpVFwB"},{"key":"9996_CR68","doi-asserted-by":"crossref","unstructured":"Goldman CV, Zilberstein S (2004) Decentralized control of cooperative systems: categorization and complexity analysis. J Artif Int Res 22(1):143\u2013174. http:\/\/dl.acm.org\/citation.cfm?id=1622487.1622493","DOI":"10.1613\/jair.1427"},{"key":"9996_CR69","unstructured":"Grover A, Al-Shedivat M, Gupta J, Burda Y, Edwards H (2018) Learning policy representations in multiagent systems. In: Dy J, Krause A (eds) Proceedings of the 35th international conference on machine learning, PMLR, Stockholmsm\u00e4ssan, Stockholm Sweden, Proceedings of Machine Learning Research, vol\u00a080, pp 1802\u20131811. http:\/\/proceedings.mlr.press\/v80\/grover18a.html"},{"key":"9996_CR70","unstructured":"Guestrin C, Koller D, Parr R (2002) Multiagent planning with factored mdps. In: Dietterich TG, Becker S, Ghahramani Z (eds) Advances in neural information processing systems 14, MIT Press, pp 1523\u20131530. 
http:\/\/papers.nips.cc\/paper\/1941-multiagent-planning-with-factored-mdps.pdf"},{"key":"9996_CR71","doi-asserted-by":"publisher","first-page":"66","DOI":"10.1007\/978-3-319-71682-4_5","volume-title":"Autonomous Agents and Multiagent Systems","author":"JK Gupta","year":"2017","unstructured":"Gupta JK, Egorov M, Kochenderfer M (2017) Cooperative multi-agent control using deep reinforcement learning. In: Sukthankar G, Rodriguez-Aguilar JA (eds) autonomous agents and multiagent systems. Springer, Cham, pp 66\u201383"},{"key":"9996_CR72","unstructured":"Hadfield-Menell D, Milli S, Abbeel P, Russell SJ, Dragan A (2017) Inverse reward design. In: Guyon I, Luxburg UV, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R (eds) Advances in neural information processing systems 30, Curran Associates, Inc., pp 6765\u20136774. http:\/\/papers.nips.cc\/paper\/7253-inverse-reward-design.pdf"},{"key":"9996_CR73","doi-asserted-by":"crossref","unstructured":"Han D, Boehmer W, Wooldridge M, Rogers A (2019) Multi-agent hierarchical reinforcement learning with dynamic termination. In: Proceedings of the 18th international conference on autonomous agents and multiagent systems, International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC, AAMAS \u201919, pp 2006\u20132008. http:\/\/dl.acm.org\/citation.cfm?id=3306127.3331992","DOI":"10.1007\/978-3-030-29911-8_7"},{"key":"9996_CR74","unstructured":"Hansen EA, Bernstein D, Zilberstein S (2004) Dynamic programming for partially observable stochastic games. In: AAAI"},{"issue":"3859","key":"9996_CR75","doi-asserted-by":"publisher","first-page":"1243","DOI":"10.1126\/science.162.3859.1243","volume":"162","author":"G Hardin","year":"1968","unstructured":"Hardin G (1968) The tragedy of the commons. Science 162(3859):1243\u20131248","journal-title":"Science"},{"key":"9996_CR76","unstructured":"Hausknecht M, Stone P (2015) Deep recurrent q-learning for partially observable mdps. 
https:\/\/www.aaai.org\/ocs\/index.php\/FSS\/FSS15\/paper\/view\/11673"},{"key":"9996_CR77","unstructured":"Havrylov S, Titov I (2017) Emergence of language with multi-agent games: Learning to communicate with sequences of symbols. In: Guyon I, Luxburg UV, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R (eds) Advances in neural information processing systems 30, Curran Associates, Inc., pp 2149\u20132159. http:\/\/papers.nips.cc\/paper\/6810-emergence-of-language-with-multi-agent-games-learning-to-communicate-with-sequences-of-symbols.pdf"},{"key":"9996_CR78","unstructured":"He H, Boyd-Graber J, Kwok K, III HD (2016) Opponent modeling in deep reinforcement learning. In: Balcan MF, Weinberger KQ (eds) Proceedings of The 33rd international conference on machine learning, PMLR, New York, New York, USA, Proceedings of Machine Learning Research, vol\u00a048, pp 1804\u20131813. http:\/\/proceedings.mlr.press\/v48\/he16.html"},{"key":"9996_CR79","doi-asserted-by":"crossref","unstructured":"He H, Chen D, Balakrishnan A, Liang P (2018) Decoupling strategy and generation in negotiation dialogues. CoRR arxiv: abs\/1808.09637,","DOI":"10.18653\/v1\/D18-1256"},{"key":"9996_CR80","unstructured":"Heinrich J, Silver D (2016) Deep reinforcement learning from self-play in imperfect-information games. CoRR arxiv: abs\/1603.01121,"},{"key":"9996_CR81","doi-asserted-by":"crossref","unstructured":"Henderson P, Islam R, Bachman P, Pineau J, Precup D, Meger D (2018) Deep reinforcement learning that matters. https:\/\/www.aaai.org\/ocs\/index.php\/AAAI\/AAAI18\/paper\/view\/16669","DOI":"10.1609\/aaai.v32i1.11694"},{"key":"9996_CR82","unstructured":"Hernandez-Leal P, Kaisers M, Baarslag T, de\u00a0Cote EM (2017) A survey of learning in multiagent environments: dealing with non-stationarity. CoRR arxiv: abs\/1707.09183,"},{"key":"9996_CR83","unstructured":"Hernandez-Leal P, Kartal B, Taylor ME (2019) Agent modeling as auxiliary task for deep reinforcement learning. 
CoRR arxiv: abs\/1907.09597,"},{"issue":"6","key":"9996_CR84","doi-asserted-by":"publisher","first-page":"750","DOI":"10.1007\/s10458-019-09421-1","volume":"33","author":"P Hernandez-Leal","year":"2019","unstructured":"Hernandez-Leal P, Kartal B, Taylor ME (2019) A survey and critique of multiagent deep reinforcement learning. Auton Agent Multi-Agent Syst 33(6):750\u2013797. https:\/\/doi.org\/10.1007\/s10458-019-09421-1","journal-title":"Auton Agent Multi-Agent Syst"},{"issue":"8","key":"9996_CR85","doi-asserted-by":"publisher","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","volume":"9","author":"S Hochreiter","year":"1997","unstructured":"Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735\u20131780. https:\/\/doi.org\/10.1162\/neco.1997.9.8.1735","journal-title":"Neural Comput"},{"key":"9996_CR86","unstructured":"Hong Z, Su S, Shann T, Chang Y, Lee C (2017) A deep policy inference q-network for multi-agent systems. CoRR arxiv: abs\/1712.07893,"},{"key":"9996_CR87","unstructured":"Hoshen Y (2017) Vain: Attentional multi-agent predictive modeling. In: Proceedings of the 31st international conference on neural information processing systems, Curran Associates Inc., USA, NIPS\u201917, pp 2698\u20132708. http:\/\/dl.acm.org\/citation.cfm?id=3294996.3295030"},{"key":"9996_CR88","unstructured":"Houthooft R, Chen X, Chen X, Duan Y, Schulman J, De\u00a0Turck F, Abbeel P (2016) Vime: variational information maximizing exploration. In: Lee DD, Sugiyama M, Luxburg UV, Guyon I, Garnett R (eds) Advances in Neural Information Processing Systems 29, Curran Associates, Inc., pp 1109\u20131117. http:\/\/papers.nips.cc\/paper\/6591-vime-variational-information-maximizing-exploration.pdf"},{"key":"9996_CR89","unstructured":"Hu J, Wellman MP (1998) Multiagent reinforcement learning: theoretical framework and an algorithm. 
In: Proceedings of the Fifteenth International Conference on Machine Learning, Morgan Kaufmann Publishers Inc., San Francisco, CA, ICML \u201998, pp 242\u2013250. http:\/\/dl.acm.org\/citation.cfm?id=645527.657296"},{"key":"9996_CR90","first-page":"1039","volume":"4","author":"J Hu","year":"2003","unstructured":"Hu J, Wellman MP (2003) Nash q-learning for general-sum stochastic games. J Mach Learn Res 4:1039\u20131069","journal-title":"J Mach Learn Res"},{"key":"9996_CR91","unstructured":"Hughes E, Leibo JZ, Phillips M, Tuyls K, Due\u00f1ez Guzman E, Garc\u00eda Casta\u00f1eda A, Dunning I, Zhu T, McKee K, Koster R, Roff H, Graepel T (2018) Inequity aversion improves cooperation in intertemporal social dilemmas. In: Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R (eds) Advances in neural information processing systems 31, Curran Associates, Inc., pp 3326\u20133336. http:\/\/papers.nips.cc\/paper\/7593-inequity-aversion-improves-cooperation-in-intertemporal-social-dilemmas.pdf"},{"key":"9996_CR92","unstructured":"Iqbal S, Sha F (2019) Actor-attention-critic for multi-agent reinforcement learning. In: Chaudhuri K, Salakhutdinov R (eds) Proceedings of the 36th international conference on machine learning, PMLR, Long Beach, California, USA, Proceedings of machine learning research, vol\u00a097, pp 2961\u20132970. http:\/\/proceedings.mlr.press\/v97\/iqbal19a.html"},{"key":"9996_CR93","unstructured":"Islam R, Henderson P, Gomrokchi M, Precup D (2017) Reproducibility of benchmarked deep reinforcement learning tasks for continuous control. 
CoRR arxiv: abs\/1708.04133,"},{"issue":"6443","key":"9996_CR94","doi-asserted-by":"publisher","first-page":"859","DOI":"10.1126\/science.aau6249","volume":"364","author":"M Jaderberg","year":"2019","unstructured":"Jaderberg M, Czarnecki WM, Dunning I, Marris L, Lever G, Casta\u00f1eda AG, Beattie C, Rabinowitz NC, Morcos AS, Ruderman A, Sonnerat N, Green T, Deason L, Leibo JZ, Silver D, Hassabis D, Kavukcuoglu K, Graepel T (2019) Human-level performance in 3d multiplayer games with population-based reinforcement learning. Science 364(6443):859\u2013865","journal-title":"Science"},{"key":"9996_CR95","doi-asserted-by":"crossref","unstructured":"Jain U, Weihs L, Kolve E, Rastegari M, Lazebnik S, Farhadi A, Schwing AG, Kembhavi A (2019) Two body problem: Collaborative visual task completion. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (CVPR)","DOI":"10.1109\/CVPR.2019.00685"},{"key":"9996_CR96","unstructured":"Jaques N, Lazaridou A, Hughes E, G\u00fcl\u00e7ehre \u00c7, Ortega PA, Strouse D, Leibo JZ, de\u00a0Freitas N (2018) Intrinsic social motivation via causal influence in multi-agent RL. CoRR arxiv: abs\/1810.08647,"},{"key":"9996_CR97","unstructured":"Jaques N, Lazaridou A, Hughes E, Gulcehre C, Ortega P, Strouse D, Leibo JZ, De\u00a0Freitas N (2019) Social influence as intrinsic motivation for multi-agent deep reinforcement learning. In: International conference on machine learning, pp 3040\u20133049"},{"key":"9996_CR98","unstructured":"Jiang J, Lu Z (2018) Learning attentional communication for multi-agent cooperation. In: Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R (eds) Advances in neural information processing systems 31, Curran Associates, Inc., pp 7254\u20137264. 
http:\/\/papers.nips.cc\/paper\/7956-learning-attentional-communication-for-multi-agent-cooperation.pdf"},{"key":"9996_CR99","unstructured":"Johnson M, Hofmann K, Hutton T, Bignell D (2016) The malmo platform for artificial intelligence experimentation. In: Proceedings of the twenty-fifth international joint conference on artificial intelligence, AAAI Press, IJCAI\u201916, pp 4246\u20134247. http:\/\/dl.acm.org\/citation.cfm?id=3061053.3061259"},{"key":"9996_CR100","unstructured":"Jorge E, K\u00e5geb\u00e4ck M, Gustavsson E (2016) Learning to play guess who? and inventing a grounded language as a consequence. CoRR arxiv: abs\/1611.03218,"},{"key":"9996_CR101","unstructured":"Juliani A, Berges V, Vckay E, Gao Y, Henry H, Mattar M, Lange D (2018) Unity: a general platform for intelligent agents. CoRR arxiv: abs\/1809.02627,"},{"key":"9996_CR102","doi-asserted-by":"crossref","unstructured":"Kaelbling LP, Littman ML, Moore AW (1996) Reinforcement learning: a survey. J Artif Intell Res 4(1):237\u2013285. http:\/\/dl.acm.org\/citation.cfm?id=1622737.1622748","DOI":"10.1613\/jair.301"},{"key":"9996_CR103","doi-asserted-by":"crossref","unstructured":"Kasai T, Tenmoto H, Kamiya A (2008) Learning of communication codes in multi-agent reinforcement learning problem. In: 2008 IEEE conference on soft computing in industrial applications, pp 1\u20136","DOI":"10.1109\/SMCIA.2008.5045926"},{"key":"9996_CR104","doi-asserted-by":"crossref","unstructured":"Kim W, Cho M, Sung Y (2019) Message-dropout: An efficient training method for multi-agent deep reinforcement learning. In: Proceedings of the AAAI conference on artificial intelligence 33(01):6079\u20136086","DOI":"10.1609\/aaai.v33i01.33016079"},{"issue":"2","key":"9996_CR105","doi-asserted-by":"publisher","first-page":"185","DOI":"10.1162\/106454602320184248","volume":"8","author":"S Kirby","year":"2002","unstructured":"Kirby S (2002) Natural language from artificial life. Artif Life 8(2):185\u2013215. 
https:\/\/doi.org\/10.1162\/106454602320184248","journal-title":"Artif Life"},{"key":"9996_CR106","unstructured":"Kok JR, Vlassis N (2006) Collaborative multiagent reinforcement learning by payoff propagation. J Mach Learn Res 7:1789\u20131828. http:\/\/dl.acm.org\/citation.cfm?id=1248547.1248612"},{"issue":"1","key":"9996_CR107","doi-asserted-by":"publisher","first-page":"183","DOI":"10.1146\/annurev.soc.24.1.183","volume":"24","author":"P Kollock","year":"1998","unstructured":"Kollock P (1998) Social dilemmas: the anatomy of cooperation. Annu Rev Sociol 24(1):183\u2013214. https:\/\/doi.org\/10.1146\/annurev.soc.24.1.183","journal-title":"Annu Rev Sociol"},{"key":"9996_CR108","unstructured":"Kong X, Xin B, Liu F, Wang Y (2017) Revisiting the master-slave architecture in multi-agent deep reinforcement learning. CoRR arxiv: abs\/1712.07305,"},{"key":"9996_CR109","doi-asserted-by":"publisher","first-page":"82","DOI":"10.1016\/j.neucom.2016.01.031","volume":"190","author":"L Kraemer","year":"2016","unstructured":"Kraemer L, Banerjee B (2016) Multi-agent reinforcement learning as a rehearsal for decentralized planning. Neurocomputing 190:82\u201394","journal-title":"Neurocomputing"},{"key":"9996_CR110","unstructured":"Kumar S, Shah P, Hakkani-T\u00fcr D, Heck LP (2017) Federated control with hierarchical multi-agent deep reinforcement learning. CoRR arxiv: abs\/1712.08266,"},{"key":"9996_CR111","unstructured":"Lanctot M, Zambaldi V, Gruslys A, Lazaridou A, Tuyls K, Perolat J, Silver D, Graepel T (2017) A unified game-theoretic approach to multiagent reinforcement learning. In: Guyon I, Luxburg UV, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R (eds) Advances in neural information processing systems 30, Curran Associates, Inc., pp 4190\u20134203. 
http:\/\/papers.nips.cc\/paper\/7007-a-unified-game-theoretic-approach-to-multiagent-reinforcement-learning.pdf"},{"issue":"2","key":"9996_CR112","doi-asserted-by":"publisher","first-page":"125","DOI":"10.1016\/j.obhdp.2012.11.003","volume":"120","author":"PAV Lange","year":"2013","unstructured":"Lange PAV, Joireman J, Parks CD, Dijk EV (2013) The psychology of social dilemmas: a review. Organ Behav Hum Decis Process 120(2):125\u2013141","journal-title":"Organ Behav Hum Decis Process"},{"key":"9996_CR113","unstructured":"Lauer M, Riedmiller M (2000) An algorithm for distributed reinforcement learning in cooperative multi-agent systems. In: Proceedings of the Seventeenth International Conference on Machine Learning, Morgan Kaufmann, pp 535\u2013542"},{"key":"9996_CR114","doi-asserted-by":"crossref","unstructured":"Laurent GJ, Matignon L, Fort-Piat NL (2011) The world of independent learners is not markovian. Int J Knowl-Based Intell Eng Syst 15(1):55\u201364. http:\/\/dl.acm.org\/citation.cfm?id=1971886.1971887","DOI":"10.3233\/KES-2010-0206"},{"key":"9996_CR115","unstructured":"Lazaridou A, Baroni M (2020) Emergent multi-agent communication in the deep learning era. ArXiv arxiv: abs\/2006.02419"},{"key":"9996_CR116","unstructured":"Lazaridou A, Peysakhovich A, Baroni M (2017) Multi-agent cooperation and the emergence of (natural) language. In: 5th international conference on learning representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. https:\/\/openreview.net\/forum?id=Hk8N3Sclg"},{"key":"9996_CR117","unstructured":"Lazaridou A, Hermann KM, Tuyls K, Clark S (2018) Emergence of linguistic communication from referential games with symbolic and pixel input. In: International conference on learning representations. https:\/\/openreview.net\/forum?id=HJGv1Z-AW"},{"key":"9996_CR118","unstructured":"Le HM, Yue Y, Carr P, Lucey P (2017) Coordinated multi-agent imitation learning. 
In: Precup D, Teh YW (eds) Proceedings of the 34th international conference on machine learning, PMLR, International Convention Centre, Sydney, Australia, Proceedings of Machine Learning Research, vol\u00a070, pp 1995\u20132003. http:\/\/proceedings.mlr.press\/v70\/le17a.html"},{"key":"9996_CR119","unstructured":"Lee J, Cho K, Weston J, Kiela D (2017) Emergent translation in multi-agent communication. CoRR arxiv: abs\/1710.06922,"},{"key":"9996_CR120","unstructured":"Lee Y, Yang J, Lim JJ (2020) Learning to coordinate manipulation skills via skill behavior diversification. In: International conference on learning representations. https:\/\/openreview.net\/forum?id=ryxB2lBtvH"},{"key":"9996_CR121","unstructured":"Leibo JZ, Zambaldi V, Lanctot M, Marecki J, Graepel T (2017) Multi-agent reinforcement learning in sequential social dilemmas. In: Proceedings of the 16th conference on autonomous agents and multiagent systems, International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC, AAMAS \u201917, pp 464\u2013473. http:\/\/dl.acm.org\/citation.cfm?id=3091125.3091194"},{"key":"9996_CR122","unstructured":"Leibo JZ, Hughes E, Lanctot M, Graepel T (2019) Autocurricula and the emergence of innovation from social interaction: a manifesto for multi-agent intelligence research. CoRR arxiv: abs\/1903.00742,"},{"key":"9996_CR123","unstructured":"Lerer A, Peysakhovich A (2017) Maintaining cooperation in complex social dilemmas using deep reinforcement learning. CoRR arxiv: abs\/1707.01068,"},{"key":"9996_CR124","unstructured":"Letcher A, Foerster J, Balduzzi D, Rockt\u00e4schel T, Whiteson S (2019) Stable opponent shaping in differentiable games. In: International conference on learning representations. https:\/\/openreview.net\/forum?id=SyGjjsC5tQ"},{"key":"9996_CR125","unstructured":"Levine S, Finn C, Darrell T, Abbeel P (2016) End-to-end training of deep visuomotor policies. Journal of Machine Learning Research 17(1):1334\u20131373. 
http:\/\/dl.acm.org\/citation.cfm?id=2946645.2946684"},{"key":"9996_CR126","doi-asserted-by":"crossref","unstructured":"Lewis M, Yarats D, Dauphin YN, Parikh D, Batra D (2017) Deal or no deal? end-to-end learning for negotiation dialogues. CoRR arxiv: abs\/1706.05125,","DOI":"10.18653\/v1\/D17-1259"},{"key":"9996_CR127","unstructured":"Li F, Bowling M (2019) Ease-of-teaching and language structure from emergent communication. In: Wallach H, Larochelle H, Beygelzimer A, Alche-Buc F, Fox E, Garnett R (eds) Advances in neural information processing systems 32, Curran Associates, Inc., pp 15851\u201315861. http:\/\/papers.nips.cc\/paper\/9714-ease-of-teaching-and-language-structure-from-emergent-communication.pdf"},{"issue":"01","key":"9996_CR128","first-page":"4213","volume":"33","author":"S Li","year":"2019","unstructured":"Li S, Wu Y, Cui X, Dong H, Fang F, Russell S (2019a) Robust multi-agent reinforcement learning via minimax deep deterministic policy gradient. Proc AAAI Conf Artif Intell 33(01):4213\u20134220","journal-title":"Proc AAAI Conf Artif Intell"},{"issue":"01","key":"9996_CR129","first-page":"6096","volume":"33","author":"X Li","year":"2019","unstructured":"Li X, Sun M, Li P (2019b) Multi-agent discussion mechanism for natural language generation. Proc AAAI Conf Artif Intell 33(01):6096\u20136103","journal-title":"Proc AAAI Conf Artif Intell"},{"key":"9996_CR130","unstructured":"Li Y (2018) Deep reinforcement learning. CoRR arxiv: abs\/1810.06339,"},{"key":"9996_CR131","unstructured":"Lillicrap TP, Hunt JJ, Pritzel A, Heess N, Erez T, Tassa Y, Silver D, Wierstra D (2016) Continuous control with deep reinforcement learning. In: ICLR (Poster). http:\/\/arxiv.org\/arxiv: abs\/1509.02971"},{"key":"9996_CR132","doi-asserted-by":"publisher","unstructured":"Lin K, Zhao R, Xu Z, Zhou J (2018) Efficient large-scale fleet management via multi-agent deep reinforcement learning. 
In: Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining, ACM, New York, NY, USA, KDD \u201918, pp 1774\u20131783. https:\/\/doi.org\/10.1145\/3219819.3219993,","DOI":"10.1145\/3219819.3219993"},{"issue":"1","key":"9996_CR133","doi-asserted-by":"publisher","first-page":"56","DOI":"10.1109\/TCIAIG.2017.2679115","volume":"10","author":"X Lin","year":"2018","unstructured":"Lin X, Beling PA, Cogill R (2018) Multiagent inverse reinforcement learning for two-person zero-sum games. IEEE Trans Games 10(1):56\u201368. https:\/\/doi.org\/10.1109\/TCIAIG.2017.2679115","journal-title":"IEEE Trans Games"},{"key":"9996_CR134","doi-asserted-by":"publisher","first-page":"55","DOI":"10.1016\/S1389-0417(01)00015-8","volume":"2","author":"M Littman","year":"2001","unstructured":"Littman M (2001) Value-function reinforcement learning in markov games. Cogn Syst Res 2:55\u201366","journal-title":"Cogn Syst Res"},{"key":"9996_CR135","doi-asserted-by":"crossref","unstructured":"Littman ML (1994) Markov games as a framework for multi-agent reinforcement learning. In: Proceedings of the eleventh international conference on international conference on machine learning, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, ICML\u201994, pp 157\u2013163. http:\/\/dl.acm.org\/citation.cfm?id=3091574.3091594","DOI":"10.1016\/B978-1-55860-335-6.50027-1"},{"key":"9996_CR136","unstructured":"Liu IJ, Yeh RA, Schwing AG (2020) Pic: Permutation invariant critic for multi-agent deep reinforcement learning. In: PMLR, proceedings of machine learning research, vol 100, pp 590\u2013602. http:\/\/proceedings.mlr.press\/v100\/liu20a.html"},{"key":"9996_CR137","unstructured":"Liu S, Lever G, Heess N, Merel J, Tunyasuvunakool S, Graepel T (2019) Emergent coordination through competition. In: International conference on learning representations. 
https:\/\/openreview.net\/forum?id=BkG8sjR5Km"},{"key":"9996_CR138","unstructured":"Long Q, Zhou Z, Gupta A, Fang F, Wu Y, Wang X (2020) Evolutionary population curriculum for scaling multi-agent reinforcement learning. In: International conference on learning representations. https:\/\/openreview.net\/forum?id=SJxbHkrKDH"},{"key":"9996_CR139","unstructured":"Lowe R, WU Y, Tamar A, Harb J, Pieter\u00a0Abbeel O, Mordatch I (2017) Multi-agent actor-critic for mixed cooperative-competitive environments. In: Guyon I, Luxburg UV, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R (eds) Advances in neural information processing systems 30, Curran Associates, Inc., pp 6379\u20136390. http:\/\/papers.nips.cc\/paper\/7217-multi-agent-actor-critic-for-mixed-cooperative-competitive-environments.pdf"},{"key":"9996_CR140","unstructured":"Lowe R, Foerster JN, Boureau Y, Pineau J, Dauphin YN (2019) On the pitfalls of measuring emergent communication. CoRR arxiv: abs\/1903.05168,"},{"key":"9996_CR141","doi-asserted-by":"crossref","unstructured":"Luketina J, Nardelli N, Farquhar G, Foerster JN, Andreas J, Grefenstette E, Whiteson S, Rockt\u00e4schel T (2019) A survey of reinforcement learning informed by natural language. CoRR arxiv: abs\/1906.03926,","DOI":"10.24963\/ijcai.2019\/880"},{"key":"9996_CR142","doi-asserted-by":"publisher","unstructured":"Luong NC, Hoang DT, Gong S, Niyato D, Wang P, Liang Y, Kim DI (2019) Applications of deep reinforcement learning in communications and networking: a survey. IEEE Communications Surveys Tutorials pp 1\u20131. https:\/\/doi.org\/10.1109\/COMST.2019.2916583","DOI":"10.1109\/COMST.2019.2916583"},{"issue":"6719","key":"9996_CR143","doi-asserted-by":"publisher","first-page":"498","DOI":"10.1038\/17290","volume":"397","author":"T Lux","year":"1999","unstructured":"Lux T, Marchesi M (1999) Scaling and criticality in a stochastic multi-agent model of a financial market. Nature 397(6719):498\u2013500. 
https:\/\/doi.org\/10.1038\/17290","journal-title":"Nature"},{"key":"9996_CR144","unstructured":"Lyu X, Amato C (2020) Likelihood quantile networks for coordinating multi-agent reinforcement learning. In: Proceedings of the 19th international conference on autonomous agents and multiagent systems, pp 798\u2013806"},{"key":"9996_CR145","unstructured":"Ma J, Wu F (2020) Feudal multi-agent deep reinforcement learning for traffic signal control. In: Seghrouchni AEF, Sukthankar G, An B, Yorke-Smith N (eds) Proceedings of the 19th international conference on autonomous agents and multiagent systems, AAMAS \u201920, Auckland, New Zealand, May 9-13, 2020, International Foundation for Autonomous Agents and Multiagent Systems, pp 816\u2013824. https:\/\/dl.acm.org\/doi\/arxiv: abs\/10.5555\/3398761.3398858"},{"key":"9996_CR146","doi-asserted-by":"publisher","unstructured":"Makar R, Mahadevan S, Ghavamzadeh M (2001) Hierarchical multi-agent reinforcement learning. In: Proceedings of the fifth international conference on autonomous agents, ACM, New York, NY, USA, AGENTS \u201901, pp 246\u2013253. https:\/\/doi.org\/10.1145\/375735.376302,","DOI":"10.1145\/375735.376302"},{"key":"9996_CR147","doi-asserted-by":"crossref","unstructured":"Matignon L, Laurent GJ, Le Fort-Piat N (2007) Hysteretic q-learning : an algorithm for decentralized reinforcement learning in cooperative multi-agent teams. In: 2007 IEEE\/RSJ international conference on intelligent robots and systems, pp 64\u201369","DOI":"10.1109\/IROS.2007.4399095"},{"key":"9996_CR148","unstructured":"Matignon L, Jeanpierre L, Mouaddib AI (2012a) Coordinated multi-robot exploration under communication constraints using decentralized markov decision processes. 
https:\/\/www.aaai.org\/ocs\/index.php\/AAAI\/AAAI12\/paper\/view\/5038"},{"issue":"1","key":"9996_CR149","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1017\/S0269888912000057","volume":"27","author":"L Matignon","year":"2012","unstructured":"Matignon L, Laurent GJ, Le Fort-Piat N (2012b) Review: independent reinforcement learners in cooperative markov games: a survey regarding coordination problems. Knowl Eng Rev 27(1):1\u201331. https:\/\/doi.org\/10.1017\/S0269888912000057","journal-title":"Knowl Eng Rev"},{"key":"9996_CR150","doi-asserted-by":"publisher","unstructured":"Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, Graves A, Riedmiller M, Fidjeland AK, Ostrovski G, Petersen S, Beattie C, Sadik A, Antonoglou I, King H, Kumaran D, Wierstra D, Legg S, Hassabis D (2015) Human-level control through deep reinforcement learning. Nature 518:529. https:\/\/doi.org\/10.1038\/nature14236","DOI":"10.1038\/nature14236"},{"key":"9996_CR151","unstructured":"Mnih V, Badia AP, Mirza M, Graves A, Lillicrap T, Harley T, Silver D, Kavukcuoglu K (2016) Asynchronous methods for deep reinforcement learning. In: Balcan MF, Weinberger KQ (eds) Proceedings of The 33rd international conference on machine learning, PMLR, New York, New York, USA, Proceedings of machine learning research, vol\u00a048, pp 1928\u20131937. http:\/\/proceedings.mlr.press\/v48\/mniha16.html"},{"issue":"2","key":"9996_CR152","doi-asserted-by":"publisher","first-page":"443","DOI":"10.1007\/s10994-017-5666-0","volume":"107","author":"TM Moerland","year":"2018","unstructured":"Moerland TM, Broekens J, Jonker CM (2018) Emotion in reinforcement learning agents and robots: a survey. Mach Learn 107(2):443\u2013480. https:\/\/doi.org\/10.1007\/s10994-017-5666-0","journal-title":"Mach Learn"},{"key":"9996_CR153","doi-asserted-by":"crossref","unstructured":"Mordatch I, Abbeel P (2018) Emergence of grounded compositional language in multi-agent populations. 
https:\/\/aaai.org\/ocs\/index.php\/AAAI\/AAAI18\/paper\/view\/17007","DOI":"10.1609\/aaai.v32i1.11492"},{"key":"9996_CR154","unstructured":"Nair R, Tambe M, Yokoo M, Pynadath D, Marsella S (2003) Taming decentralized pomdps: towards efficient policy computation for multiagent settings. In: Proceedings of the 18th international joint conference on artificial intelligence, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, IJCAI\u201903, pp 705\u2013711. http:\/\/dl.acm.org\/citation.cfm?id=1630659.1630762"},{"key":"9996_CR155","unstructured":"Narvekar S, Sinapov J, Leonetti M, Stone P (2016) Source task creation for curriculum learning. In: Proceedings of the 2016 international conference on autonomous agents & multiagent systems, international foundation for autonomous agents and multiagent systems, Richland, SC, AAMAS \u201916, pp 566\u2013574. http:\/\/dl.acm.org\/citation.cfm?id=2936924.2937007"},{"issue":"1","key":"9996_CR156","doi-asserted-by":"publisher","first-page":"48","DOI":"10.1109\/TAC.2008.2009515","volume":"54","author":"A Nedic","year":"2009","unstructured":"Nedic A, Ozdaglar A (2009) Distributed subgradient methods for multi-agent optimization. IEEE Trans Autom Control 54(1):48\u201361","journal-title":"IEEE Trans Autom Control"},{"key":"9996_CR157","unstructured":"Ng AY, Russell SJ (2000) Algorithms for inverse reinforcement learning. In: Proceedings of the seventeenth international conference on machine learning, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, ICML \u201900, pp 663\u2013670. http:\/\/dl.acm.org\/citation.cfm?id=645529.657801"},{"key":"9996_CR158","unstructured":"Ng AY, Harada D, Russell S (1999) Policy invariance under reward transformations: theory and application to reward shaping. 
In: Proceedings of the sixteenth international conference on machine learning, Morgan Kaufmann, pp 278\u2013287"},{"key":"9996_CR159","doi-asserted-by":"crossref","unstructured":"Nguyen DT, Kumar A, Lau HC (2017a) Collective multiagent sequential decision making under uncertainty. https:\/\/aaai.org\/ocs\/index.php\/AAAI\/AAAI17\/paper\/view\/14891","DOI":"10.1609\/aaai.v31i1.10708"},{"key":"9996_CR160","unstructured":"Nguyen DT, Kumar A, Lau HC (2017b) Policy gradient with value function approximation for collective multiagent planning. In: Guyon I, Luxburg UV, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R (eds) Advances in neural information processing systems 30, Curran Associates, Inc., pp 4319\u20134329. http:\/\/papers.nips.cc\/paper\/7019-policy-gradient-with-value-function-approximation-for-collective-multiagent-planning.pdf"},{"key":"9996_CR161","unstructured":"Nguyen DT, Kumar A, Lau HC (2018) Credit assignment for collective multiagent rl with global rewards. In: Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R (eds) Advances in neural information processing systems 31, Curran Associates, Inc., pp 8102\u20138113. http:\/\/papers.nips.cc\/paper\/8033-credit-assignment-for-collective-multiagent-rl-with-global-rewards.pdf"},{"issue":"9","key":"9996_CR162","doi-asserted-by":"publisher","first-page":"3826","DOI":"10.1109\/TCYB.2020.2977374","volume":"50","author":"TT Nguyen","year":"2020","unstructured":"Nguyen TT, Nguyen ND, Nahavandi S (2020) Deep reinforcement learning for multiagent systems: a review of challenges, solutions, and applications. IEEE Trans Cybern 50(9):3826\u20133839","journal-title":"IEEE Trans Cybern"},{"key":"9996_CR163","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-28929-8","volume-title":"A Concise Introduction to Decentralized POMDPs","author":"FA Oliehoek","year":"2016","unstructured":"Oliehoek FA, Amato C (2016) A Concise Introduction to Decentralized POMDPs, 1st edn. 
Springer Publishing Company, Berlin","edition":"1"},{"key":"9996_CR164","doi-asserted-by":"crossref","unstructured":"Oliehoek FA, Spaan MTJ, Vlassis N (2008) Optimal and approximate q-value functions for decentralized pomdps. J Artif Int Res 32(1):289\u2013353. http:\/\/dl.acm.org\/citation.cfm?id=1622673.1622680","DOI":"10.1613\/jair.2447"},{"key":"9996_CR165","unstructured":"Omidshafiei S, Pazis J, Amato C, How JP, Vian J (2017) Deep decentralized multi-task multi-agent reinforcement learning under partial observability. In: Precup D, Teh YW (eds) Proceedings of the 34th international conference on machine learning, PMLR, International Convention Centre, Sydney, Australia, Proceedings of machine learning research, vol\u00a070, pp 2681\u20132690. http:\/\/proceedings.mlr.press\/v70\/omidshafiei17a.html"},{"issue":"01","key":"9996_CR166","first-page":"6128","volume":"33","author":"S Omidshafiei","year":"2019","unstructured":"Omidshafiei S, Kim DK, Liu M, Tesauro G, Riemer M, Amato C, Campbell M, How JP (2019) Learning to teach in cooperative multiagent reinforcement learning. Proc AAAI Conf Artif Intelli 33(01):6128\u20136136","journal-title":"Proc AAAI Conf Artif Intelli"},{"key":"9996_CR167","unstructured":"Oroojlooyjadid A, Hajinezhad D (2019) A review of cooperative multi-agent deep reinforcement learning. ArXiv arxiv: abs\/1908.03963"},{"key":"9996_CR168","doi-asserted-by":"publisher","first-page":"6","DOI":"10.3389\/neuro.12.006.2007","volume":"1","author":"PY Oudeyer","year":"2007","unstructured":"Oudeyer PY, Kaplan F (2007) What is intrinsic motivation? A typology of computational approaches. Front Neurorobotics 1:6\u20136","journal-title":"Front Neurorobotics"},{"key":"9996_CR169","unstructured":"Palmer G, Tuyls K, Bloembergen D, Savani R (2018) Lenient multi-agent deep reinforcement learning. 
In: Proceedings of the 17th international conference on autonomous agents and multiagent systems, International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC, AAMAS \u201918, pp 443\u2013451. http:\/\/dl.acm.org\/citation.cfm?id=3237383.3237451"},{"key":"9996_CR170","unstructured":"Palmer G, Savani R, Tuyls K (2019) Negative update intervals in deep multi-agent reinforcement learning. In: Proceedings of the 18th international conference on autonomous agents and multiagent systems, pp 43\u201351"},{"issue":"3","key":"9996_CR171","doi-asserted-by":"publisher","first-page":"387","DOI":"10.1007\/s10458-005-2631-2","volume":"11","author":"L Panait","year":"2005","unstructured":"Panait L, Luke S (2005) Cooperative multi-agent learning: the state of the art. Auton Agent Multi-Agent Syst 11(3):387\u2013434. https:\/\/doi.org\/10.1007\/s10458-005-2631-2","journal-title":"Auton Agent Multi-Agent Syst"},{"key":"9996_CR172","doi-asserted-by":"publisher","unstructured":"Panait L, Sullivan K, Luke S (2006) Lenient learners in cooperative multiagent systems. In: Proceedings of the fifth international joint conference on autonomous agents and multiagent systems, association for computing machinery, New York, NY, USA, AAMAS \u201906, pp 801\u2013803. https:\/\/doi.org\/10.1145\/1160633.1160776,","DOI":"10.1145\/1160633.1160776"},{"key":"9996_CR173","unstructured":"Papoudakis G, Christianos F, Rahman A, Albrecht SV (2019) Dealing with non-stationarity in multi-agent deep reinforcement learning. CoRR arxiv: abs\/1906.04737,"},{"key":"9996_CR174","unstructured":"Pathak D, Agrawal P, Efros AA, Darrell T (2017) Curiosity-driven exploration by self-supervised prediction. In: Precup D, Teh YW (eds) Proceedings of the 34th international conference on machine learning, PMLR, International Convention Centre, Sydney, Australia, Proceedings of Machine Learning Research, vol\u00a070, pp 2778\u20132787. 
http:\/\/proceedings.mlr.press\/v70\/pathak17a.html"},{"key":"9996_CR175","unstructured":"Peng P, Yuan Q, Wen Y, Yang Y, Tang Z, Long H, Wang J (2017) Multiagent bidirectionally-coordinated nets for learning to play starcraft combat games. CoRR arxiv: abs\/1703.10069,"},{"key":"9996_CR176","unstructured":"P\u00e9rolat J, Leibo JZ, Zambaldi V, Beattie C, Tuyls K, Graepel T (2017) A multi-agent reinforcement learning model of common-pool resource appropriation. In: Guyon I, Luxburg UV, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R (eds) Advances in neural information processing systems 30, Curran Associates, Inc., pp 3643\u20133652. http:\/\/papers.nips.cc\/paper\/6955-a-multi-agent-reinforcement-learning-model-of-common-pool-resource-appropriation.pdf"},{"key":"9996_CR177","unstructured":"Peysakhovich A, Lerer A (2018) Prosocial learning agents solve generalized stag hunts better than selfish ones. In: Proceedings of the 17th international conference on autonomous agents and multiagent systems, international Foundation for Autonomous Agents and Multiagent Systems, Richland, SC, AAMAS \u201918, pp 2043\u20132044. http:\/\/dl.acm.org\/citation.cfm?id=3237383.3238065"},{"key":"9996_CR178","unstructured":"Pinto L, Davidson J, Sukthankar R, Gupta A (2017) Robust adversarial reinforcement learning. In: Precup D, Teh YW (eds) Proceedings of the 34th international conference on machine learning, PMLR, International Convention Centre, Sydney, Australia, Proceedings of machine learning research, vol\u00a070, pp 2817\u20132826. http:\/\/proceedings.mlr.press\/v70\/pinto17a.html"},{"issue":"1","key":"9996_CR179","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1007\/s10462-011-9277-z","volume":"40","author":"I Pinyol","year":"2013","unstructured":"Pinyol I, Sabater-Mir J (2013) Computational trust and reputation models for open multi-agent systems: a review. Artif Intell Rev 40(1):1\u201325. 
https:\/\/doi.org\/10.1007\/s10462-011-9277-z","journal-title":"Artif Intell Rev"},{"key":"9996_CR180","doi-asserted-by":"publisher","first-page":"249","DOI":"10.1007\/3-540-58484-6_269","volume-title":"Parallel problem solving from nature - PPSN III","author":"MA Potter","year":"1994","unstructured":"Potter MA, De Jong KA (1994) A cooperative coevolutionary approach to function optimization. In: Davidor Y, Schwefel HP, M\u00e4nner R (eds) Parallel problem solving from nature - PPSN III. Springer, Berlin, pp 249\u2013257"},{"key":"9996_CR181","unstructured":"Qu G, Wierman A, Li N (2020) Scalable reinforcement learning of localized policies for multi-agent networked systems. PMLR, The Cloud, Proceedings of machine learning research, vol 120, pp 256\u2013266. http:\/\/proceedings.mlr.press\/v120\/qu20a.html"},{"key":"9996_CR182","unstructured":"Rabinowitz N, Perbet F, Song F, Zhang C, Eslami SMA, Botvinick M (2018) Machine theory of mind. In: Dy J, Krause A (eds) Proceedings of the 35th international conference on machine learning, PMLR, Stockholmsm\u00e4ssan, Stockholm Sweden, Proceedings of machine learning research, vol\u00a080, pp 4218\u20134227. http:\/\/proceedings.mlr.press\/v80\/rabinowitz18a.html"},{"key":"9996_CR183","unstructured":"Raghu M, Irpan A, Andreas J, Kleinberg B, Le Q, Kleinberg J (2018) Can deep reinforcement learning solve Erdos-Selfridge-Spencer games? In: Dy J, Krause A (eds) Proceedings of the 35th international conference on machine learning, PMLR, Stockholmsm\u00e4ssan, Stockholm Sweden, Proceedings of machine learning research, vol\u00a080, pp 4238\u20134246. http:\/\/proceedings.mlr.press\/v80\/raghu18a.html"},{"key":"9996_CR184","unstructured":"Raileanu R, Denton E, Szlam A, Fergus R (2018) Modeling others using oneself in multi-agent reinforcement learning. 
In: Dy J, Krause A (eds) Proceedings of the 35th international conference on machine learning, PMLR, Stockholmsm\u00e4ssan, Stockholm Sweden, Proceedings of machine learning research, vol\u00a080, pp 4257\u20134266. http:\/\/proceedings.mlr.press\/v80\/raileanu18a.html"},{"issue":"1","key":"9996_CR185","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1017\/S0269888904000116","volume":"19","author":"SD Ramchurn","year":"2004","unstructured":"Ramchurn SD, Huynh D, Jennings NR (2004) Trust in multi-agent systems. Knowl Eng Rev 19(1):1\u201325. https:\/\/doi.org\/10.1017\/S0269888904000116","journal-title":"Knowl Eng Rev"},{"key":"9996_CR186","unstructured":"Rashid T, Samvelyan M, Schroeder C, Farquhar G, Foerster J, Whiteson S (2018) QMIX: Monotonic value function factorisation for deep multi-agent reinforcement learning. In: Dy J, Krause A (eds) Proceedings of the 35th international conference on machine learning, PMLR, Stockholmsm\u00e4ssan, Stockholm Sweden, Proceedings of machine learning research, vol\u00a080, pp 4295\u20134304. http:\/\/proceedings.mlr.press\/v80\/rashid18a.html"},{"key":"9996_CR187","unstructured":"Russell S, Zimdars AL (2003) Q-decomposition for reinforcement learning agents. In: Proceedings of the twentieth international conference on international conference on machine learning, AAAI Press, ICML\u201903, pp 656\u2013663. http:\/\/dl.acm.org\/citation.cfm?id=3041838.3041921"},{"key":"9996_CR188","unstructured":"Schaul T, Horgan D, Gregor K, Silver D (2015) Universal value function approximators. In: Proceedings of the 32nd international conference on international conference on machine learning - volume 37, JMLR.org, ICML\u201915, pp 1312\u20131320"},{"issue":"3","key":"9996_CR189","doi-asserted-by":"publisher","first-page":"230","DOI":"10.1109\/TAMD.2010.2056368","volume":"2","author":"J Schmidhuber","year":"2010","unstructured":"Schmidhuber J (2010) Formal theory of creativity, fun, and intrinsic motivation (1990\u20132010). 
IEEE Trans Auton Ment Dev 2(3):230\u2013247. https:\/\/doi.org\/10.1109\/TAMD.2010.2056368","journal-title":"IEEE Trans Auton Ment Dev"},{"key":"9996_CR190","unstructured":"Schmidhuber J, Zhao J, Wiering M (1996) Simple principles of metalearning. Tech. rep"},{"key":"9996_CR191","unstructured":"Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O (2017) Proximal policy optimization algorithms. CoRR arxiv: abs\/1707.06347,"},{"key":"9996_CR192","unstructured":"Sen S, Weiss G (1999) Multiagent systems. MIT Press, Cambridge, MA, USA. http:\/\/dl.acm.org\/citation.cfm?id=305606.305612"},{"key":"9996_CR193","doi-asserted-by":"publisher","unstructured":"Sequeira P, Melo FS, Prada R, Paiva A (2011) Emerging social awareness: exploring intrinsic motivation in multiagent learning. In: 2011 IEEE international conference on development and learning (ICDL), vol\u00a02, pp 1\u20136. https:\/\/doi.org\/10.1109\/DEVLRN.2011.6037325","DOI":"10.1109\/DEVLRN.2011.6037325"},{"key":"9996_CR194","unstructured":"Shalev-Shwartz S, Shammah S, Shashua A (2016) Safe, multi-agent, reinforcement learning for autonomous driving. CoRR arxiv: abs\/1610.03295,"},{"issue":"10","key":"9996_CR195","doi-asserted-by":"publisher","first-page":"1095","DOI":"10.1073\/pnas.39.10.1953","volume":"39","author":"LS Shapley","year":"1953","unstructured":"Shapley LS (1953) Stochastic games. Proc Nat Acad Sci 39(10):1095\u20131100","journal-title":"Proc Nat Acad Sci"},{"key":"9996_CR196","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9780511811654","volume-title":"Multiagent systems: algorithmic, game-theoretic, and logical foundations","author":"Y Shoham","year":"2008","unstructured":"Shoham Y, Leyton-Brown K (2008) Multiagent systems: algorithmic, game-theoretic, and logical foundations. Cambridge University Press, USA"},{"key":"9996_CR197","unstructured":"Shoham Y, Powers R, Grenager T (2003) Multi-agent reinforcement learning: a critical survey. Tech. 
rep"},{"key":"9996_CR198","doi-asserted-by":"publisher","unstructured":"Silva FLD, Taylor ME, Costa AHR (2018) Autonomously reusing knowledge in multiagent reinforcement learning. In: Proceedings of the twenty-seventh international joint conference on artificial intelligence, IJCAI-18, International joint conferences on artificial intelligence organization, pp 5487\u20135493. https:\/\/doi.org\/10.24963\/ijcai.2018\/774,","DOI":"10.24963\/ijcai.2018\/774"},{"key":"9996_CR199","doi-asserted-by":"publisher","unstructured":"Silver D, Huang A, Maddison CJ, Guez A, Sifre L, van\u00a0den Driessche G, Schrittwieser J, Antonoglou I, Panneershelvam V, Lanctot M, Dieleman S, Grewe D, Nham J, Kalchbrenner N, Sutskever I, Lillicrap T, Leach M, Kavukcuoglu K, Graepel T, Hassabis D (2016) Mastering the game of go with deep neural networks and tree search. Nature 529:484 EP \u2013. https:\/\/doi.org\/10.1038\/nature16961","DOI":"10.1038\/nature16961"},{"issue":"6419","key":"9996_CR200","doi-asserted-by":"publisher","first-page":"1140","DOI":"10.1126\/science.aar6404","volume":"362","author":"D Silver","year":"2018","unstructured":"Silver D, Hubert T, Schrittwieser J, Antonoglou I, Lai M, Guez A, Lanctot M, Sifre L, Kumaran D, Graepel T, Lillicrap T, Simonyan K, Hassabis D (2018) A general reinforcement learning algorithm that masters chess, shogi, and go through self-play. Science 362(6419):1140\u20131144","journal-title":"Science"},{"key":"9996_CR201","unstructured":"Singh A, Jain T, Sukhbaatar S (2019) Learning when to communicate at scale in multiagent cooperative and competitive tasks. In: International conference on learning representations. https:\/\/openreview.net\/forum?id=rye7knCqK7"},{"key":"9996_CR202","unstructured":"Son K, Kim D, Kang WJ, Hostallero DE, Yi Y (2019) Qtran: Learning to factorize with transformation for cooperative multi-agent reinforcement learning. 
In: International conference on machine learning, pp 5887\u20135896"},{"key":"9996_CR203","unstructured":"Song J, Ren H, Sadigh D, Ermon S (2018) Multi-agent generative adversarial imitation learning. In: Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R (eds) Advances in neural information processing systems, Curran Associates, Inc., vol\u00a031, pp 7461\u20137472. https:\/\/proceedings.neurips.cc\/paper\/2018\/file\/240c945bb72980130446fc2b40fbb8e0-Paper.pdf"},{"key":"9996_CR204","unstructured":"Song Y, Wang J, Lukasiewicz T, Xu Z, Xu M, Ding Z, Wu L (2019) Arena: A general evaluation platform and building toolkit for multi-agent intelligence. CoRR arxiv: abs\/1905.08085,"},{"key":"9996_CR205","doi-asserted-by":"crossref","unstructured":"Spooner T, Savani R (2020) Robust market making via adversarial reinforcement learning. In: Proceedings of the 19th international conference on autonomous agents and multiagent systems, pp 2014\u20132016","DOI":"10.24963\/ijcai.2020\/633"},{"key":"9996_CR206","unstructured":"Srinivasan S, Lanctot M, Zambaldi V, Perolat J, Tuyls K, Munos R, Bowling M (2018) Actor-critic policy optimization in partially observable multiagent environments. In: Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R (eds) Advances in neural information processing systems 31, Curran Associates, Inc., pp 3422\u20133435. http:\/\/papers.nips.cc\/paper\/7602-actor-critic-policy-optimization-in-partially-observable-multiagent-environments.pdf"},{"issue":"3","key":"9996_CR207","doi-asserted-by":"publisher","first-page":"345","DOI":"10.1023\/A:1008942012299","volume":"8","author":"P Stone","year":"2000","unstructured":"Stone P, Veloso M (2000) Multiagent systems: a survey from a machine learning perspective. Auton Robots 8(3):345\u2013383. 
https:\/\/doi.org\/10.1023\/A:1008942012299","journal-title":"Auton Robots"},{"key":"9996_CR208","unstructured":"Strouse D, Kleiman-Weiner M, Tenenbaum J, Botvinick M, Schwab DJ (2018) Learning to share and hide intentions using information regularization. In: Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R (eds) Advances in neural information processing systems 31, Curran Associates, Inc., pp 10249\u201310259. http:\/\/papers.nips.cc\/paper\/8227-learning-to-share-and-hide-intentions-using-information-regularization.pdf"},{"key":"9996_CR209","unstructured":"Sukhbaatar S, szlam a, Fergus R (2016) Learning multiagent communication with backpropagation. In: Lee DD, Sugiyama M, Luxburg UV, Guyon I, Garnett R (eds) Advances in neural information processing systems 29, Curran Associates, Inc., pp 2244\u20132252. http:\/\/papers.nips.cc\/paper\/6398-learning-multiagent-communication-with-backpropagation.pdf"},{"key":"9996_CR210","unstructured":"Sukhbaatar S, Kostrikov I, Szlam A, Fergus R (2017) Intrinsic motivation and automatic curricula via asymmetric self-play. CoRR arxiv: abs\/1703.05407,"},{"key":"9996_CR211","unstructured":"Sunehag P, Lever G, Gruslys A, Czarnecki WM, Zambaldi V, Jaderberg M, Lanctot M, Sonnerat N, Leibo JZ, Tuyls K, Graepel T (2018) Value-decomposition networks for cooperative multi-agent learning based on team reward. In: Proceedings of the 17th international conference on autonomous agents and multiagent systems, International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC, AAMAS \u201918, pp 2085\u20132087. http:\/\/dl.acm.org\/citation.cfm?id=3237383.3238080"},{"key":"9996_CR212","unstructured":"Sutton RS, Barto AG (1998) Reinforcement learning: an introduction. Adaptive computation and machine learning, MIT Press. 
http:\/\/www.worldcat.org\/oclc\/37293240"},{"issue":"1","key":"9996_CR213","doi-asserted-by":"publisher","first-page":"181","DOI":"10.1016\/S0004-3702(99)00052-1","volume":"112","author":"RS Sutton","year":"1999","unstructured":"Sutton RS, Precup D, Singh S (1999) Between mdps and semi-mdps: a framework for temporal abstraction in reinforcement learning. Artif Intell 112(1):181\u2013211","journal-title":"Artif Intell"},{"key":"9996_CR214","doi-asserted-by":"crossref","unstructured":"Svetlik M, Leonetti M, Sinapov J, Shah R, Walker N, Stone P (2017) Automatic curriculum graph generation for reinforcement learning agents. https:\/\/aaai.org\/ocs\/index.php\/AAAI\/AAAI17\/paper\/view\/14961","DOI":"10.1609\/aaai.v31i1.10933"},{"key":"9996_CR215","unstructured":"Tacchetti A, Song HF, Mediano PAM, Zambaldi V, Kram\u00e1r J, Rabinowitz NC, Graepel T, Botvinick M, Battaglia PW (2019) Relational forward models for multi-agent learning. In: International conference on learning representations. https:\/\/openreview.net\/forum?id=rJlEojAqFm"},{"issue":"4","key":"9996_CR216","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1371\/journal.pone.0172395","volume":"12","author":"A Tampuu","year":"2017","unstructured":"Tampuu A, Matiisen T, Kodelja D, Kuzovkin I, Korjus K, Aru J, Aru J, Vicente R (2017) Multiagent cooperation and competition with deep reinforcement learning. PLoS ONE 12(4):1\u201315. https:\/\/doi.org\/10.1371\/journal.pone.0172395","journal-title":"PLoS ONE"},{"key":"9996_CR217","doi-asserted-by":"crossref","unstructured":"Tan M (1993) Multi-agent reinforcement learning: Independent vs. cooperative agents. In: In Proceedings of the tenth international conference on machine learning, Morgan Kaufmann, pp 330\u2013337","DOI":"10.1016\/B978-1-55860-307-3.50049-6"},{"key":"9996_CR218","unstructured":"Tang H, Hao J, Lv T, Chen Y, Zhang Z, Jia H, Ren C, Zheng Y, Fan C, Wang L (2018) Hierarchical deep multiagent reinforcement learning. 
CoRR arxiv: abs\/1809.09332,"},{"key":"9996_CR219","unstructured":"Taylor A, Dusparic I, Cahill V (2013) Transfer learning in multi-agent systems through parallel transfer. In: in Workshop on theoretically grounded transfer learning at the 30th international conference on machine learning (Poster"},{"key":"9996_CR220","unstructured":"Taylor ME, Stone P (2009) Transfer learning for reinforcement learning domains: a survey. J Mach Learn Res 10:1633\u20131685. http:\/\/dl.acm.org\/citation.cfm?id=1577069.1755839"},{"key":"9996_CR221","unstructured":"Tesauro G (2004) Extending q-learning to general adaptive multi-agent systems. In: Thrun S, Saul LK, Sch\u00f6lkopf B (eds) Advances in neural information processing systems 16, MIT Press, pp 871\u2013878. http:\/\/papers.nips.cc\/paper\/2503-extending-q-learning-to-general-adaptive-multi-agent-systems.pdf"},{"key":"9996_CR222","doi-asserted-by":"crossref","unstructured":"Tumer K, Wolpert DH (2004) Collectives and the design of complex systems. Springer, Berlin","DOI":"10.1007\/978-1-4419-8909-3"},{"issue":"3","key":"9996_CR223","first-page":"41","volume":"33","author":"K Tuyls","year":"2012","unstructured":"Tuyls K, Weiss G (2012) Multiagent learning: basics, challenges, and prospects. AI Mag 33(3):41","journal-title":"AI Mag"},{"key":"9996_CR224","unstructured":"Vezhnevets AS, Osindero S, Schaul T, Heess N, Jaderberg M, Silver D, Kavukcuoglu K (2017) FeUdal networks for hierarchical reinforcement learning. In: Precup D, Teh YW (eds) Proceedings of the 34th international conference on machine learning, PMLR, International Convention Centre, Sydney, Australia, Proceedings of Machine Learning Research, vol\u00a070, pp 3540\u20133549. http:\/\/proceedings.mlr.press\/v70\/vezhnevets17a.html"},{"key":"9996_CR225","unstructured":"Vezhnevets AS, Wu Y, Leblond R, Leibo JZ (2019) Options as responses: grounding behavioural hierarchies in multi-agent RL. 
CoRR arxiv: abs\/1906.01470,"},{"key":"9996_CR226","unstructured":"Vinyals O, Ewalds T, Bartunov S, Georgiev P, Vezhnevets AS, Yeo M, Makhzani A, K\u00fcttler H, Agapiou J, Schrittwieser J, Quan J, Gaffney S, Petersen S, Simonyan K, Schaul T, van Hasselt H, Silver D, Lillicrap TP, Calderone K, Keet P, Brunasso A, Lawrence D, Ekermo A, Repp J, Tsing R (2017) Starcraft II: a new challenge for reinforcement learning. CoRR arxiv: abs\/1708.04782,"},{"issue":"7782","key":"9996_CR227","doi-asserted-by":"publisher","first-page":"350","DOI":"10.1038\/s41586-019-1724-z","volume":"575","author":"O Vinyals","year":"2019","unstructured":"Vinyals O, Babuschkin I, Czarnecki WM, Mathieu M, Dudzik A, Chung J, Choi DH, Powell R, Ewalds T, Georgiev P, Oh J, Horgan D, Kroiss M, Danihelka I, Huang A, Sifre L, Cai T, Agapiou JP, Jaderberg M, Vezhnevets AS, Leblond R, Pohlen T, Dalibard V, Budden D, Sulsky Y, Molloy J, Paine TL, Gulcehre C, Wang Z, Pfaff T, Wu Y, Ring R, Yogatama D, W\u00fcnsch D, McKinney K, Smith O, Schaul T, Lillicrap T, Kavukcuoglu K, Hassabis D, Apps C, Silver D (2019) Grandmaster level in starcraft ii using multi-agent reinforcement learning. Nature 575(7782):350\u2013354. https:\/\/doi.org\/10.1038\/s41586-019-1724-z","journal-title":"Nature"},{"key":"9996_CR228","unstructured":"Wang JX, Kurth-Nelson Z, Tirumala D, Soyer H, Leibo JZ, Munos R, Blundell C, Kumaran D, Botvinick M (2016a) Learning to reinforcement learn. CoRR arxiv: abs\/1611.05763,"},{"key":"9996_CR229","unstructured":"Wang JX, Hughes E, Fernando C, Czarnecki WM, Du\u00e9\u00f1ez Guzm\u00e1n EA, Leibo JZ (2019) Evolving intrinsic motivations for altruistic behavior. In: Proceedings of the 18th international conference on autonomous agents and multiagent systems, International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC, AAMAS \u201919, pp 683\u2013692. 
http:\/\/dl.acm.org\/citation.cfm?id=3306127.3331756"},{"key":"9996_CR230","doi-asserted-by":"publisher","unstructured":"Wang S, Wan J, Zhang D, Li D, Zhang C (2016b) Towards smart factory for industry 4.0: a self-organized multi-agent system with big data based feedback and coordination. Comput Netw 101:158\u2013168. https:\/\/doi.org\/10.1016\/j.comnet.2015.12.017. http:\/\/www.sciencedirect.com\/science\/article\/pii\/S1389128615005046, industrial Technologies and Applications for the Internet of Things","DOI":"10.1016\/j.comnet.2015.12.017"},{"key":"9996_CR231","unstructured":"Wang T, Dong H, Lesser VR, Zhang C (2020a) ROMA: multi-agent reinforcement learning with emergent roles. CoRR arxiv: abs\/2003.08039"},{"key":"9996_CR232","unstructured":"Wang T, Wang J, Wu Y, Zhang C (2020b) Influence-based multi-agent exploration. In: International conference on learning representations. https:\/\/openreview.net\/forum?id=BJgy96EYvr"},{"key":"9996_CR233","unstructured":"Wang T, Wang J, Zheng C, Zhang C (2020c) Learning nearly decomposable value functions via communication minimization. In: International conference on learning representations. https:\/\/openreview.net\/forum?id=HJx-3grYDB"},{"key":"9996_CR234","unstructured":"Wei E, Luke S (2016) Lenient learning in independent-learner stochastic cooperative games. J Mach Learn Res 17(84):1\u201342. http:\/\/jmlr.org\/papers\/v17\/15-417.html"},{"key":"9996_CR235","unstructured":"Wei E, Wicke D, Freelan D, Luke S (2018) Multiagent soft q-learning. https:\/\/www.aaai.org\/ocs\/index.php\/SSS\/SSS18\/paper\/view\/17508"},{"key":"9996_CR236","doi-asserted-by":"publisher","unstructured":"Wei Ren, Beard RW, Atkins EM (2005) A survey of consensus problems in multi-agent coordination. In: Proceedings of the 2005, American control conference, 2005., pp 1859\u20131864 vol. 3. 
https:\/\/doi.org\/10.1109\/ACC.2005.1470239","DOI":"10.1109\/ACC.2005.1470239"},{"key":"9996_CR237","doi-asserted-by":"publisher","first-page":"415","DOI":"10.1007\/978-3-642-79629-6_18","volume-title":"The biology and technology of intelligent autonomous agents","author":"G Wei\u00df","year":"1995","unstructured":"Wei\u00df G (1995) Distributed reinforcement learning. In: Steels L (ed) The biology and technology of intelligent autonomous agents. Springer, Berlin, pp 415\u2013428"},{"key":"9996_CR238","volume-title":"Multiagent systems: a modern approach to distributed artificial intelligence","year":"1999","unstructured":"Weiss G (ed) (1999) Multiagent systems: a modern approach to distributed artificial intelligence. MIT Press, Cambridge"},{"key":"9996_CR239","unstructured":"Wiegand RP (2004) An analysis of cooperative coevolutionary algorithms. PhD thesis, USA, aAI3108645"},{"key":"9996_CR240","unstructured":"Wolpert DH, Tumer K (1999) An introduction to collective intelligence. CoRR cs.LG\/9908014. http:\/\/arxiv.org\/arxiv: abs\/cs.LG\/9908014"},{"key":"9996_CR241","unstructured":"Wu C, Rajeswaran A, Duan Y, Kumar V, Bayen AM, Kakade S, Mordatch I, Abbeel P (2018) Variance reduction for policy gradient with action-dependent factorized baselines. In: International conference on learning representations. https:\/\/openreview.net\/forum?id=H1tSsb-AW"},{"key":"9996_CR242","unstructured":"Yang E, Gu D (2004) Multiagent reinforcement learning for multi-robot systems: a survey. Tech. rep"},{"key":"9996_CR243","unstructured":"Yang J, Nakhaei A, Isele D, Fujimura K, Zha H (2020) Cm3: Cooperative multi-goal multi-stage multi-agent reinforcement learning. In: International conference on learning representations. https:\/\/openreview.net\/forum?id=S1lEX04tPr"},{"key":"9996_CR244","doi-asserted-by":"crossref","unstructured":"Yang T, Meng Z, Hao J, Zhang C, Zheng Y (2018a) Bayes-tomop: a fast detection and best response algorithm towards sophisticated opponents. 
CoRR arxiv: abs\/1809.04240,","DOI":"10.24963\/ijcai.2019\/88"},{"key":"9996_CR245","unstructured":"Yang Y, Luo R, Li M, Zhou M, Zhang W, Wang J (2018b) Mean field multi-agent reinforcement learning. In: Dy J, Krause A (eds) Proceedings of the 35th international conference on machine learning, PMLR, Stockholmsm\u00e4ssan, Stockholm Sweden, Proceedings of machine learning research, vol\u00a080, pp 5571\u20135580. http:\/\/proceedings.mlr.press\/v80\/yang18d.html"},{"key":"9996_CR246","doi-asserted-by":"publisher","first-page":"372","DOI":"10.1007\/978-3-642-44927-7_25","volume-title":"PRIMA 2013: principles and practice of multi-agent systems","author":"C Yu","year":"2013","unstructured":"Yu C, Zhang M, Ren F (2013) Emotional multiagent reinforcement learning in social dilemmas. In: Boella G, Elkind E, Savarimuthu BTR, Dignum F, Purvis MK (eds) PRIMA 2013: principles and practice of multi-agent systems. Springer, Berlin, pp 372\u2013387"},{"key":"9996_CR247","doi-asserted-by":"publisher","first-page":"35","DOI":"10.1109\/ACCESS.2013.2259892","volume":"1","author":"H Yu","year":"2013","unstructured":"Yu H, Shen Z, Leung C, Miao C, Lesser VR (2013) A survey of multi-agent trust management systems. IEEE Access 1:35\u201350. https:\/\/doi.org\/10.1109\/ACCESS.2013.2259892","journal-title":"IEEE Access"},{"key":"9996_CR248","unstructured":"Yu L, Song J, Ermon S (2019) Multi-agent adversarial inverse reinforcement learning. In: Chaudhuri K, Salakhutdinov R (eds) Proceedings of the 36th international conference on machine learning, PMLR, Long Beach, California, USA, Proceedings of machine learning research, vol\u00a097, pp 7194\u20137201. http:\/\/proceedings.mlr.press\/v97\/yu19e.html"},{"key":"9996_CR249","doi-asserted-by":"crossref","unstructured":"Zhang K, Yang Z, Basar T (2018) Networked multi-agent reinforcement learning in continuous spaces. 
In: 2018 IEEE conference on decision and control (CDC), pp 2771\u20132776","DOI":"10.1109\/CDC.2018.8619581"},{"key":"9996_CR250","unstructured":"Zhang K, Yang Z, Liu H, Zhang T, Basar T (2018) Fully decentralized multi-agent reinforcement learning with networked agents. In: Dy J, Krause A (eds) Proceedings of the 35th international conference on machine learning, PMLR, Stockholmsm\u00e4ssan, Stockholm Sweden, Proceedings of machine learning research, vol\u00a080, pp 5872\u20135881. http:\/\/proceedings.mlr.press\/v80\/zhang18n.html"},{"key":"9996_CR251","unstructured":"Zhang K, Yang Z, Ba\u015far T (2019) Multi-agent reinforcement learning: a selective overview of theories and algorithms. ArXiv arxiv: abs\/1911.10635"},{"key":"9996_CR252","unstructured":"Zhang W, Bastani O (2019) Mamps: Safe multi-agent reinforcement learning via model predictive shielding. ArXiv arxiv: abs\/1910.12639"},{"key":"9996_CR253","doi-asserted-by":"publisher","first-page":"421","DOI":"10.1007\/978-3-319-97310-4_48","volume-title":"PRICAI 2018: trends in artificial intelligence","author":"Y Zheng","year":"2018","unstructured":"Zheng Y, Meng Z, Hao J, Zhang Z (2018a) Weighted double deep multiagent reinforcement learning in stochastic cooperative environments. In: Geng X, Kang BH (eds) PRICAI 2018: trends in artificial intelligence. Springer International Publishing, Cham, pp 421\u2013429"},{"key":"9996_CR254","unstructured":"Zheng Y, Meng Z, Hao J, Zhang Z, Yang T, Fan C (2018b) A deep bayesian policy reuse approach against non-stationary agents. In: Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R (eds) Advances in neural information processing systems 31, Curran Associates, Inc., pp 954\u2013964. 
http:\/\/papers.nips.cc\/paper\/7374-a-deep-bayesian-policy-reuse-approach-against-non-stationary-agents.pdf"},{"key":"9996_CR255","doi-asserted-by":"publisher","unstructured":"Zhu H, Kirley M (2019) Deep multi-agent reinforcement learning in a common-pool resource system. In: 2019 IEEE congress on evolutionary computation (CEC), pp 142\u2013149. https:\/\/doi.org\/10.1109\/CEC.2019.8790001","DOI":"10.1109\/CEC.2019.8790001"},{"key":"9996_CR256","doi-asserted-by":"crossref","unstructured":"Zhu Z, Biyik E, Sadigh D (2020) Multi-agent safe planning with gaussian processes. ArXiv arxiv: abs\/2008.04452","DOI":"10.1109\/IROS45743.2020.9341169"}],"container-title":["Artificial Intelligence Review"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10462-021-09996-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10462-021-09996-w\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10462-021-09996-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,24]],"date-time":"2022-12-24T12:31:19Z","timestamp":1671885079000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10462-021-09996-w"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,4,15]]},"references-count":256,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2022,2]]}},"alternative-id":["9996"],"URL":"https:\/\/doi.org\/10.1007\/s10462-021-09996-w","relation":{},"ISSN":["0269-2821","1573-7462"],"issn-type":[{"value":"0269-2821","type":"print"},{"value":"1573-7462","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,4,15]]},"assertion":[{"value":"15 April 
2021","order":1,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}