{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,6]],"date-time":"2026-04-06T19:32:17Z","timestamp":1775503937414,"version":"3.50.1"},"reference-count":55,"publisher":"Springer Science and Business Media LLC","issue":"23","license":[{"start":{"date-parts":[[2025,1,7]],"date-time":"2025-01-07T00:00:00Z","timestamp":1736208000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,1,7]],"date-time":"2025-01-07T00:00:00Z","timestamp":1736208000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001779","name":"Monash University","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100001779","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Neural Comput &amp; Applic"],"published-print":{"date-parts":[[2025,8]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>The emergence of cooperation in decentralized multi-agent systems is challenging; naive implementations of learning algorithms typically fail to converge or converge to equilibria without cooperation. Opponent modeling techniques, combined with reinforcement learning, have been successful in promoting cooperation, but face challenges when other agents are plentiful or anonymous. We envision environments in which agents face a sequence of interactions with different and heterogeneous agents. Inspired by models of evolutionary game theory, we introduce RL agents that forgo explicit modeling of others. Instead, they augment their reward signal by considering how to best respond to others assumed to be rational against their own strategy. 
This technique not only scales well in environments with many agents, but can also outperform opponent modeling techniques across a range of cooperation games. Agents that use the algorithm we propose can successfully maintain and establish cooperation when playing against an ensemble of diverse agents. This finding is robust across different kinds of games and can also be shown not to disadvantage agents in purely competitive interactions. While cooperation in pairwise settings is foundational, interactions across large groups of diverse agents are likely to be the norm in future applications where cooperation is an emergent property of agent design, rather than a design goal at the system level. The algorithm we propose here is a simple and scalable step in this direction.<\/jats:p>","DOI":"10.1007\/s00521-024-10511-9","type":"journal-article","created":{"date-parts":[[2025,1,6]],"date-time":"2025-01-06T23:41:01Z","timestamp":1736206861000},"page":"18835-18849","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["Learning to cooperate against ensembles of diverse opponents"],"prefix":"10.1007","volume":"37","author":[{"ORCID":"https:\/\/orcid.org\/0009-0000-3085-9121","authenticated-orcid":false,"given":"Isuri","family":"Perera","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Frits","family":"de Nijs","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Julian","family":"Garc\u00eda","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2025,1,7]]},"reference":[{"key":"10511_CR1","doi-asserted-by":"publisher","unstructured":"Santos FP, Pacheco JM, Paiva A et al (2019) Evolution of Collective Fairness in Hybrid Populations of Humans and Agents. 
In: Proceedings of the AAAI conference on artificial intelligence vol 33, pp 6146\u2013615. https:\/\/doi.org\/10.1609\/aaai.v33i01.33016146","DOI":"10.1609\/aaai.v33i01.33016146"},{"key":"10511_CR2","doi-asserted-by":"crossref","unstructured":"Dafoe A, Bachrach Y, Hadfield G et\u00a0al (2021) Cooperative AI: machines must learn to find common ground","DOI":"10.1038\/d41586-021-01170-0"},{"key":"10511_CR3","unstructured":"Dafoe A, Hughes E, Bachrach Y et\u00a0al (2020) Open problems in cooperative AI. arXiv preprint arXiv:2012.08630"},{"key":"10511_CR4","unstructured":"Hu H, Lerer A, Peysakhovich A et\u00a0al (2020) \u201cother-play\u201d for zero-shot coordination. In: International conference on machine learning, PMLR, pp 4399\u20134410"},{"issue":"4","key":"10511_CR5","doi-asserted-by":"publisher","first-page":"264","DOI":"10.1177\/1059712311410896","volume":"19","author":"HT Anh","year":"2011","unstructured":"Anh HT, Moniz Pereira L, Santos FC (2011) Intention recognition promotes the emergence of cooperation. Adapt Behav 19(4):264\u2013279. https:\/\/doi.org\/10.1177\/1059712311410896","journal-title":"Adapt Behav"},{"key":"10511_CR6","unstructured":"Foerster J, Chen RY, Al-Shedivat M et\u00a0al (2018a) Learning with opponent-learning awareness. In: Proceedings of the 17th international conference on autonomous agents and multiagent systems, pp 122\u2013130"},{"issue":"7695","key":"10511_CR7","doi-asserted-by":"publisher","first-page":"242","DOI":"10.1038\/nature25763","volume":"555","author":"FP Santos","year":"2018","unstructured":"Santos FP, Santos FC, Pacheco JM (2018) Social norm complexity and past reputations in the evolution of cooperation. Nature 555(7695):242. https:\/\/doi.org\/10.1038\/nature25763","journal-title":"Nature"},{"issue":"8","key":"10511_CR8","doi-asserted-by":"publisher","first-page":"413","DOI":"10.1016\/j.tics.2013.06.003","volume":"17","author":"DG Rand","year":"2013","unstructured":"Rand DG, Nowak MA (2013) Human cooperation. 
Trends Cogn Sci 17(8):413\u201342. https:\/\/doi.org\/10.1016\/j.tics.2013.06.003","journal-title":"Trends Cogn Sci"},{"key":"10511_CR9","unstructured":"Peysakhovich A, Lerer A (2018) Prosocial learning agents solve generalized stag hunts better than selfish ones. In: Proceedings of the 17th international conference on autonomous agents and multiagent systems, pp 2043\u20132044"},{"issue":"Nov","key":"10511_CR10","first-page":"1039","volume":"4","author":"J Hu","year":"2003","unstructured":"Hu J, Wellman MP (2003) Nash q-learning for general-sum stochastic games. J Mach Learn Res 4(Nov):1039\u20131069","journal-title":"J Mach Learn Res"},{"key":"10511_CR11","unstructured":"Lowe R, Wu Y, Tamar A et\u00a0al (2017) Multi-agent actor-critic for mixed cooperative-competitive environments. In: Proceedings of the 31st international conference on neural information processing systems, pp 6382\u20136393"},{"key":"10511_CR12","unstructured":"Claus C, Boutilier C (1998) The dynamics of reinforcement learning in cooperative multiagent systems. AAAI\/IAAI 1998(746\u2013752):2"},{"issue":"1\u20132","key":"10511_CR13","doi-asserted-by":"publisher","first-page":"23","DOI":"10.1007\/s10994-006-0143-1","volume":"67","author":"V Conitzer","year":"2007","unstructured":"Conitzer V, Sandholm T (2007) Awesome: a general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents. Mach Learn 67(1\u20132):23\u201343","journal-title":"Mach Learn"},{"issue":"4489","key":"10511_CR14","doi-asserted-by":"publisher","first-page":"1390","DOI":"10.1126\/science.7466396","volume":"211","author":"R Axelrod","year":"1981","unstructured":"Axelrod R, Hamilton WD (1981) The evolution of cooperation. 
Science 211(4489):1390\u20131396","journal-title":"Science"},{"key":"10511_CR15","doi-asserted-by":"publisher","first-page":"102","DOI":"10.3389\/frobt.2018.00102","volume":"5","author":"J Garc\u00eda","year":"2018","unstructured":"Garc\u00eda J, van Veelen M (2018) No strategy can win in the repeated prisoner\u2019s dilemma: linking game theory and computer simulations. Front Robot AI 5:102","journal-title":"Front Robot AI"},{"key":"10511_CR16","unstructured":"Foerster J, Farquhar G, Al-Shedivat M et\u00a0al (2018c) Dice: the infinitely differentiable Monte Carlo estimator. In: International conference on machine learning, PMLR, pp 1529\u20131538"},{"issue":"7713","key":"10511_CR17","doi-asserted-by":"publisher","first-page":"246","DOI":"10.1038\/s41586-018-0277-x","volume":"559","author":"C Hilbe","year":"2018","unstructured":"Hilbe C, \u0160imsa \u0160, Chatterjee K et al (2018) Evolution of cooperation in stochastic games. Nature 559(7713):246\u2013249","journal-title":"Nature"},{"key":"10511_CR18","unstructured":"Lerer A, Peysakhovich A (2017) Maintaining cooperation in complex social dilemmas using deep reinforcement learning. arXiv preprint arXiv:1707.01068"},{"key":"10511_CR19","doi-asserted-by":"crossref","unstructured":"Das A, Kottur S, Moura JM et\u00a0al (2017) Learning cooperative visual dialog agents with deep reinforcement learning. In: Proceedings of the IEEE international conference on computer vision, pp 2951\u20132960","DOI":"10.1109\/ICCV.2017.321"},{"key":"10511_CR20","unstructured":"Foerster JN, Assael YM, De\u00a0Freitas N et\u00a0al (2016) Learning to communicate with deep multi-agent reinforcement learning. arXiv preprint arXiv:1605.06676"},{"key":"10511_CR21","unstructured":"Lazaridou A, Peysakhovich A, Baroni M (2016) Multi-agent cooperation and the emergence of (natural) language. 
arXiv preprint arXiv:1612.07182"},{"key":"10511_CR22","doi-asserted-by":"publisher","DOI":"10.1016\/j.socec.2020.101535","volume":"86","author":"V Capraro","year":"2020","unstructured":"Capraro V, Rodriguez-Lara I, Ruiz-Martos MJ (2020) Preferences for efficiency, rather than preferences for morality, drive cooperation in the one-shot stag-hunt game. J Behav Exp Econ 86:101535","journal-title":"J Behav Exp Econ"},{"issue":"2","key":"10511_CR23","doi-asserted-by":"publisher","first-page":"156","DOI":"10.1093\/comjnl\/bxh164","volume":"49","author":"J Pitt","year":"2006","unstructured":"Pitt J, Kamara L, Sergot M et al (2006) Voting in multi-agent systems. Comput J 49(2):156\u201317. https:\/\/doi.org\/10.1093\/comjnl\/bxh164","journal-title":"Comput J"},{"key":"10511_CR24","doi-asserted-by":"crossref","unstructured":"Chevaleyre Y, Endriss U, Lang J et\u00a0al (2007) A short introduction to computational social choice. In: International conference on current trends in theory and practice of computer science, Springer, Berlin, pp 51\u201369","DOI":"10.1007\/978-3-540-69507-3_4"},{"key":"10511_CR25","doi-asserted-by":"crossref","unstructured":"Foerster J, Farquhar G, Afouras T et\u00a0al (2018b) Counterfactual multi-agent policy gradients. In: Proceedings of the AAAI Conference on Artificial Intelligence","DOI":"10.1609\/aaai.v32i1.11794"},{"key":"10511_CR26","unstructured":"Foerster J, Nardelli N, Farquhar G et\u00a0al (2017) Stabilising experience replay for deep multi-agent reinforcement learning. In: International conference on machine learning, PMLR, pp 1146\u20131155"},{"key":"10511_CR27","doi-asserted-by":"crossref","unstructured":"Wen C, Yao X, Wang Y et\u00a0al (2020) Smix ($$\\lambda$$): enhancing centralized value functions for cooperative multi-agent reinforcement learning. 
In: Proceedings of the AAAI conference on artificial intelligence, pp 7301\u20137308","DOI":"10.1609\/aaai.v34i05.6223"},{"issue":"156","key":"10511_CR28","doi-asserted-by":"publisher","first-page":"20190127","DOI":"10.1098\/rsif.2019.0127","volume":"16","author":"J Garc\u00eda","year":"2019","unstructured":"Garc\u00eda J, Traulsen A (2019) Evolution of coordinated punishment to enforce cooperation from an unbiased strategy space. J R Soc Interface 16(156):20190127","journal-title":"J R Soc Interface"},{"key":"10511_CR29","doi-asserted-by":"crossref","unstructured":"Smith JM (1982) Evolution and the theory of games. In: Did Darwin get it right? Essays on games, sex and evolution. Springer, Berlin, pp 202\u2013215","DOI":"10.1007\/978-1-4684-7862-4_22"},{"key":"10511_CR30","volume-title":"Population games and evolutionary dynamics","author":"WH Sandholm","year":"2010","unstructured":"Sandholm WH (2010) Population games and evolutionary dynamics. MIT Press, Cambridge"},{"issue":"7782","key":"10511_CR31","doi-asserted-by":"publisher","first-page":"350","DOI":"10.1038\/s41586-019-1724-z","volume":"575","author":"O Vinyals","year":"2019","unstructured":"Vinyals O, Babuschkin I, Czarnecki WM et al (2019) Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575(7782):350\u2013354","journal-title":"Nature"},{"issue":"6337","key":"10511_CR32","doi-asserted-by":"publisher","first-page":"508","DOI":"10.1126\/science.aam6960","volume":"356","author":"M Morav\u010d\u00edk","year":"2017","unstructured":"Morav\u010d\u00edk M, Schmid M, Burch N et al (2017) Deepstack: expert-level artificial intelligence in heads-up no-limit poker. 
Science 356(6337):508\u2013513","journal-title":"Science"},{"issue":"7587","key":"10511_CR33","doi-asserted-by":"publisher","first-page":"484","DOI":"10.1038\/nature16961","volume":"529","author":"D Silver","year":"2016","unstructured":"Silver D, Huang A, Maddison CJ et al (2016) Mastering the game of go with deep neural networks and tree search. Nature 529(7587):484\u2013489","journal-title":"Nature"},{"issue":"1","key":"10511_CR34","doi-asserted-by":"publisher","first-page":"63","DOI":"10.1017\/S026988890500041X","volume":"20","author":"K Tuyls","year":"2005","unstructured":"Tuyls K, Now\u00e9 A (2005) Evolutionary game theory and multi-agent reinforcement learning. Knowl Eng Rev 20(1):63\u20139. https:\/\/doi.org\/10.1017\/S026988890500041X","journal-title":"Knowl Eng Rev"},{"key":"10511_CR35","unstructured":"Lu C, Willi T, de\u00a0Witt CS et\u00a0al (2022) Model-free opponent shaping. In: ICLR 2022 workshop on gamification and multiagent solutions"},{"key":"10511_CR36","unstructured":"Badjatiya P, Sarkar M, Sinha A et\u00a0al (2020) Inducing cooperative behaviour in sequential-social dilemmas through multi-agent reinforcement learning using status-quo loss. arXiv preprint arXiv:2001.05458"},{"key":"10511_CR37","unstructured":"Eccles T, Hughes E, Kram\u00e1r J et\u00a0al (2019) The imitation game: learned reciprocity in Markov games. In: AAMAS, pp 1934\u20131936"},{"issue":"2","key":"10511_CR38","doi-asserted-by":"publisher","first-page":"215","DOI":"10.1016\/S0004-3702(02)00121-2","volume":"136","author":"M Bowling","year":"2002","unstructured":"Bowling M, Veloso M (2002) Multiagent learning using a variable learning rate. Artif Intell 136(2):215\u2013250","journal-title":"Artif Intell"},{"key":"10511_CR39","unstructured":"Lanctot M, Zambaldi V, Gruslys A et\u00a0al (2017) A unified game-theoretic approach to multiagent reinforcement learning. 
In: Proceedings of the 31st international conference on neural information processing systems, pp 4193\u20134206"},{"issue":"2","key":"10511_CR40","doi-asserted-by":"publisher","first-page":"452","DOI":"10.1287\/mnsc.1120.1645","volume":"59","author":"TH Ho","year":"2013","unstructured":"Ho TH, Su X (2013) A dynamic level-k model in sequential games. Manag Sci 59(2):452\u2013469","journal-title":"Manag Sci"},{"key":"10511_CR41","doi-asserted-by":"publisher","first-page":"49","DOI":"10.1613\/jair.1579","volume":"24","author":"PJ Gmytrasiewicz","year":"2005","unstructured":"Gmytrasiewicz PJ, Doshi P (2005) A framework for sequential planning in multi-agent settings. J Artif Intell Res 24:49\u201379","journal-title":"J Artif Intell Res"},{"key":"10511_CR42","doi-asserted-by":"publisher","DOI":"10.1016\/j.artint.2019.103202","volume":"279","author":"P Doshi","year":"2020","unstructured":"Doshi P, Gmytrasiewicz P, Durfee E (2020) Recursively modeling other agents for decision making: a research perspective. Artif Intell 279:103202","journal-title":"Artif Intell"},{"key":"10511_CR43","unstructured":"Woodward MP, Wood RJ (2012) Learning from humans as an I-POMDP. arXiv:1204.0274"},{"key":"10511_CR44","unstructured":"Hoang TN, Low KH (2013) Interactive POMDP lite: towards practical planning to predict and exploit intentions for interacting with self-interested agents. In: Proceedings of the twenty-third international joint conference on artificial intelligence, pp 2298\u20132305"},{"issue":"1","key":"10511_CR45","first-page":"374","volume":"13","author":"GW Brown","year":"1951","unstructured":"Brown GW (1951) Iterative solution of games by fictitious play. 
Act Anal Prod Alloc 13(1):374\u2013376","journal-title":"Act Anal Prod Alloc"},{"issue":"1","key":"10511_CR46","doi-asserted-by":"publisher","first-page":"11","DOI":"10.1109\/TCIAIG.2015.2491611","volume":"9","author":"R Mealing","year":"2015","unstructured":"Mealing R, Shapiro JL (2015) Opponent modeling by expectation\u2013maximization and sequence prediction in simplified poker. IEEE Trans Comput Intell AI Games 9(1):11\u201324","journal-title":"IEEE Trans Comput Intell AI Games"},{"key":"10511_CR47","unstructured":"Hernandez-Leal P, Kaisers M, Baarslag T et\u00a0al (2017) A survey of learning in multiagent environments: dealing with non-stationarity. arXiv preprint arXiv:1707.09183"},{"issue":"5","key":"10511_CR48","first-page":"679","volume":"6","author":"R Bellman","year":"1957","unstructured":"Bellman R (1957) A Markovian decision process. J Math Mech 6(5):679\u2013684","journal-title":"J Math Mech"},{"key":"10511_CR49","volume-title":"Reinforcement learning: an introduction","author":"RS Sutton","year":"2018","unstructured":"Sutton RS, Barto AG (2018) Reinforcement learning: an introduction. MIT Press, Cambridge"},{"key":"10511_CR50","unstructured":"Balaji N, Kiefer S, Novotn\u1ef3 P et\u00a0al (2018) On the complexity of value iteration. arXiv preprint arXiv:1807.04920"},{"key":"10511_CR51","unstructured":"Roger BM et al (1991) Game theory: analysis of conflict. The president and fellows of Harvard College, USA, 66"},{"issue":"26","key":"10511_CR52","doi-asserted-by":"publisher","first-page":"10409","DOI":"10.1073\/pnas.1206569109","volume":"109","author":"WH Press","year":"2012","unstructured":"Press WH, Dyson FJ (2012) Iterated prisoner\u2019s dilemma contains strategies that dominate any evolutionary opponent. Proc Natl Acad Sci 109(26):10409\u201310413","journal-title":"Proc Natl Acad Sci"},{"key":"10511_CR53","unstructured":"Pytorch implementation of LOLA using DiCE (2018). 
https:\/\/github.com\/alexis-jacq\/LOLA_DiCE Accessed 25 Oct 2022"},{"key":"10511_CR54","unstructured":"Raileanu R, Denton E, Szlam A et\u00a0al (2018) Modeling others using oneself in multi-agent reinforcement learning. In: International conference on machine learning, PMLR, pp 4257\u20134266"},{"key":"10511_CR55","first-page":"28208","volume":"35","author":"X Yu","year":"2022","unstructured":"Yu X, Jiang J, Zhang W et al (2022) Model-based opponent modeling. Adv Neural Inf Process Syst 35:28208\u201328221","journal-title":"Adv Neural Inf Process Syst"}],"container-title":["Neural Computing and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00521-024-10511-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s00521-024-10511-9\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00521-024-10511-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,6]],"date-time":"2025-09-06T02:43:10Z","timestamp":1757126590000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s00521-024-10511-9"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,1,7]]},"references-count":55,"journal-issue":{"issue":"23","published-print":{"date-parts":[[2025,8]]}},"alternative-id":["10511"],"URL":"https:\/\/doi.org\/10.1007\/s00521-024-10511-9","relation":{},"ISSN":["0941-0643","1433-3058"],"issn-type":[{"value":"0941-0643","type":"print"},{"value":"1433-3058","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,1,7]]},"assertion":[{"value":"19 November 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article 
History"}},{"value":"30 September 2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"7 January 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"All authors have no conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}