{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,30]],"date-time":"2026-04-30T14:40:08Z","timestamp":1777560008590,"version":"3.51.4"},"reference-count":71,"publisher":"SAGE Publications","issue":"4","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["AIC"],"published-print":{"date-parts":[[2022,9,20]]},"abstract":"<jats:p>The Game Theory &amp; Multi-Agent team at DeepMind studies several aspects of multi-agent learning ranging from computing approximations to fundamental concepts in game theory to simulating social dilemmas in rich spatial environments and training 3-d humanoids in difficult team coordination tasks. A signature aim of our group is to use the resources and expertise made available to us at DeepMind in deep reinforcement learning to explore multi-agent systems in complex environments and use these benchmarks to advance our understanding. Here, we summarise the recent work of our team and present a taxonomy that we feel highlights many important open challenges in multi-agent research.<\/jats:p>","DOI":"10.3233\/aic-220113","type":"journal-article","created":{"date-parts":[[2022,9,6]],"date-time":"2022-09-06T11:24:40Z","timestamp":1662463480000},"page":"271-284","source":"Crossref","is-referenced-by-count":1,"title":["Developing, evaluating and scaling learning agents in multi-agent environments"],"prefix":"10.1177","volume":"35","author":[{"given":"Ian","family":"Gemp","sequence":"first","affiliation":[{"name":"Game Theory & Multi-Agent Team, DeepMind, London, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Thomas","family":"Anthony","sequence":"additional","affiliation":[{"name":"Game Theory & Multi-Agent Team, DeepMind, London, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yoram","family":"Bachrach","sequence":"additional","affiliation":[{"name":"Game Theory & Multi-Agent Team, DeepMind, London, 
UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Avishkar","family":"Bhoopchand","sequence":"additional","affiliation":[{"name":"Game Theory & Multi-Agent Team, DeepMind, London, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Kalesha","family":"Bullard","sequence":"additional","affiliation":[{"name":"Game Theory & Multi-Agent Team, DeepMind, London, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jerome","family":"Connor","sequence":"additional","affiliation":[{"name":"Game Theory & Multi-Agent Team, DeepMind, London, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Vibhavari","family":"Dasagi","sequence":"additional","affiliation":[{"name":"Game Theory & Multi-Agent Team, DeepMind, London, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Bart","family":"De Vylder","sequence":"additional","affiliation":[{"name":"Game Theory & Multi-Agent Team, DeepMind, London, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Edgar\u00a0A.","family":"Du\u00e9\u00f1ez-Guzm\u00e1n","sequence":"additional","affiliation":[{"name":"Game Theory & Multi-Agent Team, DeepMind, London, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Romuald","family":"Elie","sequence":"additional","affiliation":[{"name":"Game Theory & Multi-Agent Team, DeepMind, London, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Richard","family":"Everett","sequence":"additional","affiliation":[{"name":"Game Theory & Multi-Agent Team, DeepMind, London, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Daniel","family":"Hennes","sequence":"additional","affiliation":[{"name":"Game Theory & Multi-Agent Team, DeepMind, London, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Edward","family":"Hughes","sequence":"additional","affiliation":[{"name":"Game Theory & Multi-Agent Team, DeepMind, London, 
UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mina","family":"Khan","sequence":"additional","affiliation":[{"name":"Game Theory & Multi-Agent Team, DeepMind, London, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Marc","family":"Lanctot","sequence":"additional","affiliation":[{"name":"Game Theory & Multi-Agent Team, DeepMind, London, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Kate","family":"Larson","sequence":"additional","affiliation":[{"name":"Game Theory & Multi-Agent Team, DeepMind, London, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Guy","family":"Lever","sequence":"additional","affiliation":[{"name":"Game Theory & Multi-Agent Team, DeepMind, London, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Siqi","family":"Liu","sequence":"additional","affiliation":[{"name":"Game Theory & Multi-Agent Team, DeepMind, London, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Luke","family":"Marris","sequence":"additional","affiliation":[{"name":"Game Theory & Multi-Agent Team, DeepMind, London, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Kevin R.","family":"McKee","sequence":"additional","affiliation":[{"name":"Game Theory & Multi-Agent Team, DeepMind, London, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Paul","family":"Muller","sequence":"additional","affiliation":[{"name":"Game Theory & Multi-Agent Team, DeepMind, London, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Julien","family":"P\u00e9rolat","sequence":"additional","affiliation":[{"name":"Game Theory & Multi-Agent Team, DeepMind, London, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Florian","family":"Strub","sequence":"additional","affiliation":[{"name":"Game Theory & Multi-Agent Team, DeepMind, London, 
UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Andrea","family":"Tacchetti","sequence":"additional","affiliation":[{"name":"Game Theory & Multi-Agent Team, DeepMind, London, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Eugene","family":"Tarassov","sequence":"additional","affiliation":[{"name":"Game Theory & Multi-Agent Team, DeepMind, London, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zhe","family":"Wang","sequence":"additional","affiliation":[{"name":"Game Theory & Multi-Agent Team, DeepMind, London, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Karl","family":"Tuyls","sequence":"additional","affiliation":[{"name":"Game Theory & Multi-Agent Team, DeepMind, London, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"179","reference":[{"key":"10.3233\/AIC-220113_ref1","first-page":"17987","article-title":"Learning to play no-press diplomacy with best response policy iteration","volume":"33","author":"Anthony","year":"2020","journal-title":"Advances in Neural Information Processing Systems"},{"key":"10.3233\/AIC-220113_ref3","first-page":"187","article-title":"Some mathematical models of race discrimination in the labor market","author":"Arrow","year":"1972","journal-title":"Racial discrimination in economic life"},{"key":"10.3233\/AIC-220113_ref4","doi-asserted-by":"publisher","DOI":"10.1016\/j.artint.2020.103356"},{"key":"10.3233\/AIC-220113_ref5","doi-asserted-by":"crossref","unstructured":"Y. Bachrach, I. Gemp, M. Garnelo, J. Kramar, T. Eccles, D. Rosenbaum and T. Graepel, A neural network auction for group decision making over a continuous space, in: Proceedings of the 30th International Joint Conference on Artificial Intelligence (IJCAI) Demonstrations Track, 2021.","DOI":"10.24963\/ijcai.2021\/706"},{"key":"10.3233\/AIC-220113_ref6","unstructured":"A. Bakhtin, D. Wu, A. Lerer and N. 
Brown, No-press diplomacy from scratch, Advances in Neural Information Processing Systems 34 (2021)."},{"key":"10.3233\/AIC-220113_ref8","unstructured":"J. Balaguer, R. K\u00f6ster, C. Summerfield and A. Tacchetti, The good shepherd: An oracle agent for mechanism design, in: ICLR Workshop on Gamification and Multiagent Solutions, 2022."},{"key":"10.3233\/AIC-220113_ref9","unstructured":"J. Balaguer, R. K\u00f6ster, A. Weinstein, L. Campbell-Gillingham, C. Summerfield, M. Botvinick and A. Tacchetti, HCMD-zero: Learning value aligned mechanisms from data, in: ICLR Workshop on Gamification and Multiagent Solutions, 2022."},{"key":"10.3233\/AIC-220113_ref10","unstructured":"D. Balduzzi, K. Tuyls, J. Perolat and T. Graepel, Re-evaluating evaluation, Advances in Neural Information Processing Systems 31 (2018)."},{"key":"10.3233\/AIC-220113_ref11","doi-asserted-by":"publisher","first-page":"659","DOI":"10.1613\/jair.4818","article-title":"Evolutionary dynamics of multi-agent learning: A survey","volume":"53","author":"Bloembergen","year":"2015","journal-title":"J. Artif. Intell. Res."},{"issue":"1","key":"10.3233\/AIC-220113_ref12","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1006\/jeth.1997.2319","article-title":"Learning through reinforcement and replicator dynamics","volume":"77","author":"B\u00f6rgers","year":"1997","journal-title":"Journal of Economic Theory"},{"issue":"1","key":"10.3233\/AIC-220113_ref13","first-page":"374","article-title":"Iterative solution of games by fictitious play","volume":"13","author":"Brown","year":"1951","journal-title":"Activity Analysis of Production and Allocation"},{"issue":"4","key":"10.3233\/AIC-220113_ref16","doi-asserted-by":"publisher","first-page":"911","DOI":"10.4310\/CMS.2015.v13.n4.a4","article-title":"Mean field games and systemic risk","volume":"13","author":"Carmona","year":"2015","journal-title":"Communications in Mathematical Sciences"},{"key":"10.3233\/AIC-220113_ref17","unstructured":"M. Carroll, R. 
Shah, M.K. Ho, T. Griffiths, S. Seshia, P. Abbeel and A. Dragan, On the utility of learning about humans for human-ai coordination, Advances in neural information processing systems 32 (2019)."},{"key":"10.3233\/AIC-220113_ref18","unstructured":"R. Chaabouni, F. Strub, F. Altch\u00e9, E. Tarassov, C. Tallec, E. Davoodi, K.W. Mathewson, O. Tieleman, A. Lazaridou and B. Piot, Emergent Communication at Scale, International Conference on Learning Representations, 2021."},{"key":"10.3233\/AIC-220113_ref19","first-page":"17443","article-title":"Real world games look like spinning tops","volume":"33","author":"Czarnecki","year":"2020","journal-title":"Advances in Neural Information Processing Systems"},{"key":"10.3233\/AIC-220113_ref20","doi-asserted-by":"crossref","unstructured":"A. Dafoe, Y. Bachrach, G. Hadfield, E. Horvitz, K. Larson and T. Graepel, Cooperative AI: Machines Must Learn to Find Common Ground, Nature Publishing Group, 2021.","DOI":"10.1038\/d41586-021-01170-0"},{"key":"10.3233\/AIC-220113_ref22","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206848"},{"issue":"9\u201310","key":"10.3233\/AIC-220113_ref23","doi-asserted-by":"publisher","first-page":"1506","DOI":"10.1016\/j.mcm.2010.06.012","article-title":"Modeling crowd dynamics by the mean-field limit approach","volume":"52","author":"Dogb\u00e9","year":"2010","journal-title":"Mathematical and Computer Modelling"},{"key":"10.3233\/AIC-220113_ref24","unstructured":"A.M. Donati, G. Quispe, C. Ollion, S.L. Corff, F. Strub and O. Pietquin, Learning natural language generation from scratch, in: Conference of the North American Chapter of the Association for Computational Linguistics, 2022."},{"key":"10.3233\/AIC-220113_ref26","unstructured":"T. Eccles, Y. Bachrach, G. Lever, A. Lazaridou and T. 
Graepel, Biases for emergent communication in multi-agent reinforcement learning, Advances in neural information processing systems 32 (2019)."},{"issue":"1","key":"10.3233\/AIC-220113_ref27","doi-asserted-by":"publisher","first-page":"399","DOI":"10.1111\/mafi.12291","article-title":"Mean\u2013field moral hazard for optimal energy demand response management","volume":"31","author":"\u00c9lie","year":"2021","journal-title":"Mathematical Finance"},{"key":"10.3233\/AIC-220113_ref28","doi-asserted-by":"publisher","DOI":"10.1051\/mmnp\/2020022"},{"key":"10.3233\/AIC-220113_ref29","doi-asserted-by":"crossref","unstructured":"R. Elie, J. Perolat, M. Lauri\u00e8re, M. Geist and O. Pietquin, On the convergence of model free learning in mean field games, in: Proc. of AAAI, 2020.","DOI":"10.1609\/aaai.v34i05.6203"},{"key":"10.3233\/AIC-220113_ref30","unstructured":"M. Geist, J. P\u00e9rolat, M. Lauri\u00e8re, R. Elie, S. Perrin, O. Bachem, R. Munos and O. Pietquin, Concave utility reinforcement learning: The mean-field game viewpoint, in: Proc. of AAMAS, 2022."},{"key":"10.3233\/AIC-220113_ref34","unstructured":"I. Gemp, R. Savani, M. Lanctot, Y. Bachrach, T. Anthony, R. Everett, A. Tacchetti, T. Eccles and J. Kram\u00e1r, Sample-based approximation of Nash in large many-player games via gradient descent, in: Proceedings of the 21st International Conference on Autonomous Agents and MultiAgent Systems, AAMAS \u201922, International Foundation for Autonomous Agents and Multiagent Systems, 2022."},{"key":"10.3233\/AIC-220113_ref35","doi-asserted-by":"publisher","DOI":"10.1098\/rstb.2019.0766"},{"key":"10.3233\/AIC-220113_ref37","unstructured":"A. Gruslys, M. Lanctot, R. Munos, F. Timbers, M. Schmid, J. Perolat, D. Morrill, V. Zambaldi, J.-B. Lespiau, J. Schultz, M.G. Azar, M. Bowling and K. Tuyls, The Advantage Regret-Matching Actor-Critic, 2020."},{"key":"10.3233\/AIC-220113_ref38","unstructured":"J. Heinrich, M. Lanctot and D. 
Silver, Fictitious self-play in extensive-form games, in: Proceedings of the 32nd International Conference on Machine Learning (ICML 2015), 2015."},{"key":"10.3233\/AIC-220113_ref40","unstructured":"D. Hennes, D. Morrill, S. Omidshafiei, R. Munos, J. Perolat, M. Lanctot, A. Gruslys, J.-B. Lespiau, P. Parmas, E. Duenez-Guzman and K. Tuyls, Neural replicator dynamics, in: Proceedings of the International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2020."},{"key":"10.3233\/AIC-220113_ref41","doi-asserted-by":"crossref","unstructured":"M. Huang, R.P. Malham\u00e9 and P.E. Caines, Large population stochastic dynamic games: Closed-loop McKean-Vlasov systems and the Nash certainty equivalence principle, Communications in Information & Systems 6 (2006).","DOI":"10.4310\/CIS.2006.v6.n3.a5"},{"key":"10.3233\/AIC-220113_ref43","unstructured":"E. Hughes, J.Z. Leibo, M. Phillips, K. Tuyls, E. Due\u00f1ez-Guzman, A. Garc\u00eda Casta\u00f1eda, I. Dunning, T. Zhu, K. McKee, R. Koster et al., Inequity aversion improves cooperation in intertemporal social dilemmas, Advances in neural information processing systems 31 (2018)."},{"key":"10.3233\/AIC-220113_ref44","unstructured":"N. Jaques, A. Lazaridou, E. Hughes, C. Gulcehre, P. Ortega, D. Strouse, J.Z. Leibo and N. De Freitas, Social influence as intrinsic motivation for multi-agent deep reinforcement learning, in: International Conference on Machine Learning, PMLR, 2019, pp. 3040\u20133049."},{"key":"10.3233\/AIC-220113_ref45","unstructured":"A. Kalinowska, E. Davoodi, F. Strub, K. Mathewson, T. Murphey and P. Pilarski, Situated Communication: A Solution to over-Communication Between Artificial Agents, Emergent Communication Workshop at ICLR 2022, 2022."},{"key":"10.3233\/AIC-220113_ref47","doi-asserted-by":"crossref","unstructured":"R. K\u00f6ster, D. Hadfield-Menell, R. Everett, L. Weidinger, G.K. Hadfield and J.Z. 
Leibo, Spurious normativity enhances learning of compliance and enforcement behavior in artificial agents, Proceedings of the National Academy of Sciences 119(3) (2022).","DOI":"10.1073\/pnas.2106028118"},{"key":"10.3233\/AIC-220113_ref48","unstructured":"A. Krizhevsky, I. Sutskever and G.E. Hinton, Imagenet classification with deep convolutional neural networks, Advances in neural information processing systems 25 (2012)."},{"key":"10.3233\/AIC-220113_ref49","first-page":"1107","article-title":"Least-squares policy iteration","volume":"4","author":"Lagoudakis","year":"2003","journal-title":"The Journal of Machine Learning Research"},{"key":"10.3233\/AIC-220113_ref51","unstructured":"M. Lanctot, V. Zambaldi, A. Gruslys, A. Lazaridou, K. Tuyls, J. Perolat, D. Silver and T. Graepel, A unified game-theoretic approach to multiagent reinforcement learning, in: Neural Information Processing Systems (NIPS), 2017."},{"key":"10.3233\/AIC-220113_ref52","unstructured":"M. Lanctot, V. Zambaldi, A. Gruslys, A. Lazaridou, K. Tuyls, J. Perolat, D. Silver and T. Graepel, A unified game-theoretic approach to multiagent reinforcement learning, in: Advances in Neural Information Processing Systems, 2017."},{"key":"10.3233\/AIC-220113_ref53","doi-asserted-by":"publisher","DOI":"10.1007\/s11537-007-0657-8"},{"key":"10.3233\/AIC-220113_ref55","unstructured":"J.Z. Leibo, E.A. Due\u00f1ez-Guzman, A. Vezhnevets, J.P. Agapiou, P. Sunehag, R. Koster, J. Matyas, C. Beattie, I. Mordatch and T. Graepel, Scalable evaluation of multi-agent reinforcement learning with melting pot, in: International Conference on Machine Learning, PMLR, 2021, pp. 6187\u20136199."},{"key":"10.3233\/AIC-220113_ref57","unstructured":"J.Z. Leibo, J. Perolat, E. Hughes, S. Wheelwright, A.H. Marblestone, E. Du\u00e9\u00f1ez-Guzm\u00e1n, P. Sunehag, I. Dunning and T. Graepel, Malthusian reinforcement learning, in: Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, 2019, pp. 
1099\u20131107."},{"key":"10.3233\/AIC-220113_ref59","unstructured":"G. Lever, J. Merel, N. Heess, S. Tunyasuvunakool, S. Liu and T. Graepel, Emergent Coordination Through Competition, 2019."},{"key":"10.3233\/AIC-220113_ref61","unstructured":"S.\u00a0Liu, L.\u00a0Marris, D.\u00a0Hennes, J.\u00a0Merel, N.\u00a0Heess and T.\u00a0Graepel, NeuPL: Neural population learning, in: International Conference on Learning Representations, 2022, https:\/\/openreview.net\/forum?id=MIX3fJkl_1."},{"key":"10.3233\/AIC-220113_ref62","doi-asserted-by":"crossref","unstructured":"E. Lockhart, M. Lanctot, J. P\u00e9rolat, J.-B. Lespiau, D. Morrill, F. Timbers and K. Tuyls, Computing approximate equilibria in sequential adversarial games by exploitability descent, in: Proceedings of the 28th International Joint Conference on Artificial Intelligence (IJCAI), 2019.","DOI":"10.24963\/ijcai.2019\/66"},{"key":"10.3233\/AIC-220113_ref63","unstructured":"L. Marris, P. Muller, M. Lanctot, K. Tuyls and T. Graepel, Multi-agent training beyond zero-sum with correlated equilibrium meta-solvers, in: Proceedings of the 38th International Conference on Machine Learning, M. Meila and T. Zhang, eds, Proceedings of Machine Learning Research, Vol. 139, PMLR, 2021, pp. 7480\u20137491, http:\/\/proceedings.mlr.press\/v139\/marris21a.html."},{"key":"10.3233\/AIC-220113_ref65","unstructured":"H.B. McMahan, G.J. Gordon and A. Blum, Planning in the presence of cost functions controlled by an adversary, in: Proceedings of the 20th International Conference on Machine Learning (ICML-03), 2003, pp. 536\u2013543."},{"key":"10.3233\/AIC-220113_ref66","doi-asserted-by":"crossref","unstructured":"P.R. Milgrom, Putting Auction Theory to Work, Cambridge University Press, 2004.","DOI":"10.1017\/CBO9780511813825"},{"key":"10.3233\/AIC-220113_ref67","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2102.02274"},{"key":"10.3233\/AIC-220113_ref68","unstructured":"P. Muller, S. Omidshafiei, M. Rowland, K. 
Tuyls, J. Perolat, S. Liu, D. Hennes, L. Marris, M. Lanctot, E. Hughes, Z. Wang, G. Lever, N. Heess, T. Graepel and R. Munos, A generalized training approach for multiagent learning, in: Proceedings of the Eighth International Conference on Learning Representations (ICLR), 2020."},{"key":"10.3233\/AIC-220113_ref69","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2111.08350"},{"key":"10.3233\/AIC-220113_ref70","unstructured":"R. Munos, J. Perolat, J.-B. Lespiau, M. Rowland, B.D. Vylder, M. Lanctot, F. Timbers, D. Hennes, S. Omidshafiei, A. Gruslys, M.G. Azar, E. Lockhart and K. Tuyls, Fast computation of Nash equilibria in imperfect information games, in: Proceedings of the International Conference on Machine Learning (ICML), 2020."},{"key":"10.3233\/AIC-220113_ref71","doi-asserted-by":"publisher","DOI":"10.1038\/s41598-019-45619-9"},{"key":"10.3233\/AIC-220113_ref72","doi-asserted-by":"publisher","DOI":"10.3390\/e20100782"},{"key":"10.3233\/AIC-220113_ref73","unstructured":"P. Paquette, Y. Lu, S.S. Bocco, M. Smith, S. Ortiz-Gagn\u00e9, J.K. Kummerfeld, J. Pineau, S. Singh and A.C. Courville, No-press diplomacy: Modeling multi-agent gameplay, Advances in Neural Information Processing Systems 32 (2019)."},{"key":"10.3233\/AIC-220113_ref74","doi-asserted-by":"crossref","unstructured":"R. Patel, M. Garnelo, I. Gemp, C. Dyer and Y. Bachrach, Game-theoretic vocabulary selection via the Shapley value and Banzhaf index, in: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021, pp. 2789\u20132798.","DOI":"10.18653\/v1\/2021.naacl-main.223"},{"key":"10.3233\/AIC-220113_ref75","unstructured":"J. Perolat, J.Z. Leibo, V. Zambaldi, C. Beattie, K. Tuyls and T. Graepel, A multi-agent reinforcement learning model of common-pool resource appropriation, Advances in Neural Information Processing Systems 30 (2017)."},{"key":"10.3233\/AIC-220113_ref76","unstructured":"J. Perolat, R. 
Munos, J.-B. Lespiau, S. Omidshafiei, M. Rowland, P. Ortega, N. Burch, T. Anthony, D. Balduzzi, B.D. Vylder, G. Piliouras, M. Lanctot and K. Tuyls, From Poincar\u00e9 recurrence to convergence in imperfect information games: Finding equilibrium via regularization, in: Proceedings of the Thirty-Eighth International Conference on Machine Learning (ICML), 2021."},{"key":"10.3233\/AIC-220113_ref80","unstructured":"S. Perrin, J. P\u00e9rolat, M. Lauri\u00e8re, M. Geist, R. Elie and O. Pietquin, Fictitious play for mean field games: Continuous time analysis and applications, in: Proc. of NeurIPS, 2020."},{"key":"10.3233\/AIC-220113_ref82","unstructured":"M. Rita, F. Strub, J.-B. Grill, O. Pietquin and E. Dupoux, On the role of population heterogeneity in emergent communication, in: International Conference on Learning Representations, 2021."},{"issue":"7676","key":"10.3233\/AIC-220113_ref83","doi-asserted-by":"publisher","first-page":"354","DOI":"10.1038\/nature24270","article-title":"Mastering the game of Go without human knowledge","volume":"550","author":"Silver","year":"2017","journal-title":"Nat."},{"key":"10.3233\/AIC-220113_ref84","unstructured":"S. Srinivasan, M. Lanctot, V. Zambaldi, J. P\u00e9rolat, K. Tuyls, R. Munos and M. Bowling, Actor-critic policy optimization in partially observable multiagent environments, in: Advances in Neural Information Processing Systems (NeurIPS), 2018."},{"key":"10.3233\/AIC-220113_ref85","unstructured":"D. Strouse, K. McKee, M. Botvinick, E. Hughes and R. Everett, Collaborating with humans without human data, Advances in Neural Information Processing Systems 34 (2021)."},{"key":"10.3233\/AIC-220113_ref86","doi-asserted-by":"publisher","DOI":"10.1162\/isal_a_00148"},{"key":"10.3233\/AIC-220113_ref87","unstructured":"E. Szathm\u00e1ry and J.M. Smith, The Major Transitions in Evolution, WH Freeman Spektrum, Oxford, UK, 1995."},{"key":"10.3233\/AIC-220113_ref88","unstructured":"A. Tacchetti, D. Strouse, M. Garnelo, T. Graepel and Y. 
Bachrach, Learning truthful, efficient, and welfare maximizing auction rules, in: ICLR Workshop on Gamification and Multiagent Solutions, 2022."},{"key":"10.3233\/AIC-220113_ref90","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-39857-8_38"},{"key":"10.3233\/AIC-220113_ref91","doi-asserted-by":"publisher","first-page":"41","DOI":"10.1613\/jair.1.12505","article-title":"Game plan: What AI can do for football, and what football can do for AI","volume":"71","author":"Tuyls","year":"2021","journal-title":"Journal of Artificial Intelligence Research"},{"key":"10.3233\/AIC-220113_ref92","doi-asserted-by":"publisher","DOI":"10.1145\/860575.860687"},{"key":"10.3233\/AIC-220113_ref93","unstructured":"A. Vezhnevets, Y. Wu, M. Eckstein, R. Leblond and J.Z. Leibo, Options as responses: Grounding behavioural hierarchies in multi-agent reinforcement learning, in: International Conference on Machine Learning, PMLR, 2020, pp. 9733\u20139742."},{"issue":"7782","key":"10.3233\/AIC-220113_ref95","doi-asserted-by":"publisher","first-page":"350","DOI":"10.1038\/s41586-019-1724-z","article-title":"Grandmaster level in StarCraft II using multi-agent reinforcement learning","volume":"575","author":"Vinyals","year":"2019","journal-title":"Nat."},{"key":"10.3233\/AIC-220113_ref97","first-page":"15208","article-title":"Learning to incentivize other learning agents","volume":"33","author":"Yang","year":"2020","journal-title":"Advances in Neural Information Processing Systems"}],"container-title":["AI Communications"],"original-title":[],"link":[{"URL":"https:\/\/content.iospress.com\/download?id=10.3233\/AIC-220113","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,28]],"date-time":"2026-04-28T18:28:04Z","timestamp":1777400884000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/full\/10.3233\/AIC-220113"}},"subtitle":[],"editor":[{"given":"Stefano 
V.","family":"Albrecht","sequence":"additional","affiliation":[],"role":[{"role":"editor","vocabulary":"crossref"}]},{"given":"Michael","family":"Wooldridge","sequence":"additional","affiliation":[],"role":[{"role":"editor","vocabulary":"crossref"}]}],"short-title":[],"issued":{"date-parts":[[2022,9,20]]},"references-count":71,"journal-issue":{"issue":"4"},"URL":"https:\/\/doi.org\/10.3233\/aic-220113","relation":{},"ISSN":["1875-8452","0921-7126"],"issn-type":[{"value":"1875-8452","type":"electronic"},{"value":"0921-7126","type":"print"}],"subject":[],"published":{"date-parts":[[2022,9,20]]}}}