{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,12]],"date-time":"2026-05-12T09:34:37Z","timestamp":1778578477646,"version":"3.51.4"},"update-to":[{"DOI":"10.1371\/journal.pcbi.1013302","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2025,9,16]],"date-time":"2025-09-16T00:00:00Z","timestamp":1757980800000}}],"reference-count":57,"publisher":"Public Library of Science (PLoS)","issue":"8","license":[{"start":{"date-parts":[[2025,8,26]],"date-time":"2025-08-26T00:00:00Z","timestamp":1756166400000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["www.ploscompbiol.org"],"crossmark-restriction":false},"short-container-title":["PLoS Comput Biol"],"abstract":"<jats:p>The coevolution of signalling is a complex problem within animal behaviour, and is also central to communication between artificial agents. The Sir Philip Sidney game was designed to model this dyadic interaction from an evolutionary biology perspective, and was formulated to demonstrate the emergence of honest signalling. We use Multi-Agent Reinforcement Learning (MARL) to show that in the majority of cases, the resulting behaviour adopted by agents is not that shown in the original derivation of the model. This paper demonstrates that MARL can be a powerful tool to study evolutionary dynamics and understand the underlying mechanisms of learning over generations; particularly advantageous is the interpretability of this type of approach, as well as the fact that it allows us to study emergent behaviour without the need to constrain the strategy space from the outset. Although it originally set out to exemplify honest signalling, we show that the game provides no incentive for such behaviour. In the majority of cases, the optimal outcome is one that does not require a signal for the resource to be given. 
This type of interaction is observed within animal behaviour and is sometimes referred to as proactive prosociality. High learning and low discount rates of the reinforcement learning model are shown to be optimal in order to achieve the outcome that maximises both agents\u2019 reward, and proximity to the given threshold leads to suboptimal learning.<\/jats:p>","DOI":"10.1371\/journal.pcbi.1013302","type":"journal-article","created":{"date-parts":[[2025,8,26]],"date-time":"2025-08-26T17:49:53Z","timestamp":1756230593000},"page":"e1013302","update-policy":"https:\/\/doi.org\/10.1371\/journal.pcbi.corrections_policy","source":"Crossref","is-referenced-by-count":1,"title":["Maynard Smith revisited: A multi-agent reinforcement learning approach to the coevolution of signalling behaviour"],"prefix":"10.1371","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6330-4704","authenticated-orcid":true,"given":"Olivia","family":"Macmillan-Scott","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mirco","family":"Musolesi","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"340","published-online":{"date-parts":[[2025,8,26]]},"reference":[{"key":"pcbi.1013302.ref001","doi-asserted-by":"crossref","first-page":"10873","DOI":"10.1073\/pnas.1400838111","article-title":"Some dynamics of signaling games","author":"S Huttegger","year":"2014","journal-title":"Proc Natl Acad Sci U S A."},{"key":"pcbi.1013302.ref002","doi-asserted-by":"crossref","unstructured":"d\u2019Ettorre P, Hughes DP. Sociobiology of communication: an interdisciplinary perspective. 
Oxford University Press; 2008.","DOI":"10.1093\/acprof:oso\/9780199216840.001.0001"},{"issue":"1","key":"pcbi.1013302.ref003","doi-asserted-by":"crossref","first-page":"205","DOI":"10.1016\/0022-5193(75)90111-3","article-title":"Mate selection-a selection for a handicap","volume":"53","author":"A Zahavi","year":"1975","journal-title":"J Theor Biol."},{"issue":"3","key":"pcbi.1013302.ref004","doi-asserted-by":"crossref","first-page":"379","DOI":"10.1016\/S0022-5193(05)80082-7","article-title":"Efficiency in evolutionary games: Darwin, Nash and the secret handshake","volume":"144","author":"AJ Robson","year":"1990","journal-title":"J Theor Biol."},{"issue":"2","key":"pcbi.1013302.ref005","doi-asserted-by":"crossref","first-page":"273","DOI":"10.1006\/anbe.2001.1763","article-title":"Begging and parent\u2013offspring conflict in grey seals","volume":"62","author":"PT Smiseth","year":"2001","journal-title":"Animal Behaviour."},{"issue":"2","key":"pcbi.1013302.ref006","doi-asserted-by":"crossref","first-page":"215","DOI":"10.1016\/S0022-5193(05)80674-5","article-title":"The continuous Sir Philip Sidney game: a simple model of biological signalling","volume":"156","author":"RA Johnstone","year":"1992","journal-title":"J Theor Biol."},{"key":"pcbi.1013302.ref007","doi-asserted-by":"crossref","first-page":"181","DOI":"10.1038\/308181a0","article-title":"Reciprocal food sharing in the vampire bat","volume":"308","author":"G Wilkinson","year":"1984","journal-title":"Nat."},{"key":"pcbi.1013302.ref008"},{"issue":"3","key":"pcbi.1013302.ref009","first-page":"603","article-title":"The cost of honesty (further remarks on the handicap principle)","volume":"67","author":"A Zahavi","year":"1977","journal-title":"J Theor Biol."},{"issue":"4","key":"pcbi.1013302.ref010","doi-asserted-by":"crossref","first-page":"517","DOI":"10.1016\/S0022-5193(05)80088-8","article-title":"Biological signals as handicaps","volume":"144","author":"A Grafen","year":"1990","journal-title":"J Theor 
Biol."},{"issue":"1774","key":"pcbi.1013302.ref011","first-page":"20132457","article-title":"Why not lie? Costs enforce honesty in an experimental signalling game","volume":"281","author":"TJ Polnaszek","year":"2013","journal-title":"Proc Biol Sci."},{"issue":"6","key":"pcbi.1013302.ref012","doi-asserted-by":"crossref","first-page":"1034","DOI":"10.1016\/S0003-3472(05)80161-7","article-title":"Honest signalling: the Philip Sidney game","volume":"42","author":"JM Smith","year":"1991","journal-title":"Animal Behaviour."},{"issue":"2","key":"pcbi.1013302.ref013","doi-asserted-by":"crossref","first-page":"161","DOI":"10.1080\/09540091.2014.885303","article-title":"The limits and robustness of reinforcement learning in Lewis signalling games","volume":"26","author":"D Catteeuw","year":"2014","journal-title":"Connection Science."},{"key":"pcbi.1013302.ref014","unstructured":"Lewis DK. Convention: A Philosophical Study. Cambridge: Harvard University Press; 1969."},{"issue":"1838","key":"pcbi.1013302.ref015","doi-asserted-by":"crossref","first-page":"20200291","DOI":"10.1098\/rstb.2020.0291","article-title":"The complexity of human cooperation under indirect reciprocity","volume":"376","author":"FP Santos","year":"2021","journal-title":"Philos Trans R Soc Lond B Biol Sci."},{"key":"pcbi.1013302.ref016","unstructured":"Smit J, Santos FP. Fairness and cooperation between independent reinforcement learners through indirect reciprocity. In: Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems. AAMAS\u201924; 2024. p. 2468\u201370."},{"key":"pcbi.1013302.ref017","unstructured":"Anastassacos N, Garc\u00eda J, Hailes S, Musolesi M. Cooperation and reputation dynamics with reinforcement learning. In: Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems. AAMAS\u201921; 2021. p. 
115\u201323."},{"key":"pcbi.1013302.ref018","doi-asserted-by":"crossref","first-page":"83","DOI":"10.1613\/jair.433","article-title":"Towards flexible teamwork","volume":"7","author":"M Tambe","year":"1997","journal-title":"The Journal of Artificial Intelligence Research."},{"key":"pcbi.1013302.ref019","doi-asserted-by":"crossref","unstructured":"Barrett S, Agmon N, Hazon N, Kraus S, Stone P. Communicating with unknown teammates. In: Proceedings of the 2014 International Conference on Autonomous Agents and Multi-Agent Systems. AAMAS\u201914; 2014. p. 1433\u20134.","DOI":"10.3233\/978-1-61499-419-0-45"},{"key":"pcbi.1013302.ref020","unstructured":"Dafoe A, Hughes E, Bachrach Y, Collins T, McKee KR, Leibo JZ, et al. Open Problems in Cooperative AI; 2020."},{"key":"pcbi.1013302.ref021","unstructured":"Clark HH, Brennan SE. Grounding in communication. In: Resnick L, B L, John M, Teasley S, D D, editors. Perspectives on socially shared cognition. American Psychological Association; 1991. p. 13\u20131991."},{"key":"pcbi.1013302.ref022","doi-asserted-by":"crossref","unstructured":"Camera G, Casari M, Bigoni M. Communication, commitment, and deception in social dilemmas: experimental evidence. 
Dipartimento Scienze Economiche, Universita\u2019 di Bologna; 2011.","DOI":"10.2139\/ssrn.1854132"},{"issue":"1","key":"pcbi.1013302.ref023","doi-asserted-by":"crossref","first-page":"131","DOI":"10.1002\/aaai.12143","article-title":"Prosocial dynamics in multiagent systems","volume":"45","author":"FP Santos","year":"2024","journal-title":"AI Magazine."},{"issue":"4","key":"pcbi.1013302.ref024","doi-asserted-by":"crossref","first-page":"327","DOI":"10.3233\/AIC-220104","article-title":"Emergent behaviours in multi-agent systems with evolutionary game theory","volume":"35","author":"TA Han","year":"2022","journal-title":"AIC."},{"key":"pcbi.1013302.ref025","doi-asserted-by":"crossref","first-page":"94","DOI":"10.1016\/j.beproc.2018.01.008","article-title":"Enriching behavioral ecology with reinforcement learning methods","volume":"161","author":"WE Frankenhuis","year":"2019","journal-title":"Behav Processes."},{"issue":"2","key":"pcbi.1013302.ref026","doi-asserted-by":"crossref","first-page":"147","DOI":"10.1016\/j.tree.2015.11.009","article-title":"How can evolution learn?","volume":"31","author":"RA Watson","year":"2016","journal-title":"Trends in Ecology & Evolution."},{"issue":"1876","key":"pcbi.1013302.ref027","doi-asserted-by":"crossref","first-page":"20210508","DOI":"10.1098\/rstb.2021.0508","article-title":"The future of theoretical evolutionary game theory","volume":"378","author":"A Traulsen","year":"2023","journal-title":"Philos Trans R Soc Lond B Biol Sci."},{"issue":"1","key":"pcbi.1013302.ref028","doi-asserted-by":"crossref","first-page":"63","DOI":"10.1017\/S026988890500041X","article-title":"Evolutionary game theory and multi-agent reinforcement learning","volume":"20","author":"K Tuyls","year":"2005","journal-title":"The Knowledge Engineering Review."},{"issue":"2","key":"pcbi.1013302.ref029","doi-asserted-by":"crossref","first-page":"163","DOI":"10.3390\/g4020163","article-title":"The dynamics of costly signaling","volume":"4","author":"E 
Wagner","year":"2013","journal-title":"Games."},{"issue":"1689","key":"pcbi.1013302.ref030","first-page":"1915","article-title":"Dynamic stability and basins of attraction in the Sir Philip Sidney game","volume":"277","author":"SM Huttegger","year":"2010","journal-title":"Proc Biol Sci."},{"key":"pcbi.1013302.ref031","doi-asserted-by":"crossref","first-page":"114565","DOI":"10.1016\/j.chaos.2024.114565","article-title":"On the number of equilibria of the replicator-mutator dynamics for noisy social dilemmas","volume":"180","author":"L Chen","year":"2024","journal-title":"Chaos, Solitons & Fractals."},{"issue":"3","key":"pcbi.1013302.ref032","doi-asserted-by":"crossref","first-page":"641","DOI":"10.1007\/s13235-019-00338-8","article-title":"On equilibrium properties of the replicator\u2013mutator equation in deterministic and random games","volume":"10","author":"MH Duong","year":"2019","journal-title":"Dyn Games Appl."},{"issue":"2","key":"pcbi.1013302.ref033","doi-asserted-by":"crossref","first-page":"227","DOI":"10.1016\/j.jtbi.2004.05.004","article-title":"Replicator-mutator equation, universality property and population dynamics of learning","volume":"230","author":"NL Komarova","year":"2004","journal-title":"J Theor Biol."},{"key":"pcbi.1013302.ref034","doi-asserted-by":"crossref","unstructured":"Catteeuw D, Manderick B, Han TA. Evolutionary stability of honest signaling in finite populations. In: 2013 IEEE Congress on Evolutionary Computation, 2013. p. 2864\u201370. https:\/\/doi.org\/10.1109\/cec.2013.6557917","DOI":"10.1109\/CEC.2013.6557917"},{"key":"pcbi.1013302.ref035","doi-asserted-by":"crossref","unstructured":"Catteeuw D, Han TA, Manderick B. Evolution of honest signaling by social punishment. In: Proceedings of the 2014 Annual Conference on Genetic and Evolutionary Computation. 2014. p. 153\u201360. 
https:\/\/doi.org\/10.1145\/2576768.2598312","DOI":"10.1145\/2576768.2598312"},{"issue":"1","key":"pcbi.1013302.ref036","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1006\/jeth.1997.2319","article-title":"Learning through reinforcement and replicator dynamics","volume":"77","author":"T B\u00f6rgers","year":"1997","journal-title":"Journal of Economic Theory."},{"key":"pcbi.1013302.ref037","unstructured":"Tuyls K, Lenaerts T, Verbeeck K, Maes S, Manderick B. Towards a relation between learning agents and evolutionary dynamics. In: Proceedings of the Fourteenth Belgium-Netherlands Conference on Artificial Intelligence; 2002. p. 315\u201322."},{"key":"pcbi.1013302.ref038","doi-asserted-by":"crossref","unstructured":"Tuyls K, Verbeeck K, Lenaerts T. A selection-mutation model for q-learning in multi-agent systems. In: Proceedings of the Second International Joint Conference on Autonomous Agents and Multiagent Systems. 2003. p. 693\u2013700.","DOI":"10.1145\/860575.860687"},{"issue":"2231","key":"pcbi.1013302.ref039","first-page":"20190355","article-title":"The stabilization of equilibria in evolutionary game dynamics through mutation: mutation limits in evolutionary games","volume":"475","author":"J Bauer","year":"2019","journal-title":"Proc Math Phys Eng Sci."},{"issue":"5","key":"pcbi.1013302.ref040","doi-asserted-by":"crossref","DOI":"10.1371\/journal.pcbi.1006122","article-title":"Identification of animal behavioral strategies by inverse reinforcement learning","volume":"14","author":"S Yamaguchi","year":"2018","journal-title":"PLoS Comput Biol."},{"issue":"11","key":"pcbi.1013302.ref041","doi-asserted-by":"crossref","first-page":"160734","DOI":"10.1098\/rsos.160734","article-title":"The power of associative learning and the ontogeny of optimal behaviour","volume":"3","author":"M Enquist","year":"2016","journal-title":"R Soc Open 
Sci."},{"key":"pcbi.1013302.ref042","doi-asserted-by":"crossref","first-page":"87","DOI":"10.1016\/j.anbehav.2015.01.037","article-title":"A model for the evolution of reinforcement learning in fluctuating games","volume":"104","author":"S Dridi","year":"2015","journal-title":"Animal Behaviour."},{"key":"pcbi.1013302.ref043","doi-asserted-by":"crossref","unstructured":"Tsutsui K, Takeda K, Fujii K. Emergence of collaborative hunting via multi-agent deep reinforcement learning. In: Rousseau JJ, Kapralos B, editors. Pattern recognition, computer vision, and image processing. Cham: Springer; 2023. p. 210\u201324.","DOI":"10.1007\/978-3-031-37660-3_15"},{"key":"pcbi.1013302.ref044","doi-asserted-by":"crossref","unstructured":"Yamada J, Shawe-Taylor J, Fountas Z. Evolution of a complex predator-prey ecosystem on large-scale multi-agent deep reinforcement learning. 2020.","DOI":"10.1109\/IJCNN48605.2020.9206765"},{"key":"pcbi.1013302.ref045","doi-asserted-by":"crossref","first-page":"012601","DOI":"10.1103\/PhysRevE.102.012601","article-title":"Learning to flock through reinforcement","volume":"102","author":"M Durve","year":"2020","journal-title":"Phys Rev E."},{"key":"pcbi.1013302.ref046","doi-asserted-by":"crossref","unstructured":"Hahn C, Phan T, Gabor T, Belzner L, Linnhoff-Popien C. Emergent escape-based flocking behavior using multi-agent reinforcement learning. In: Artificial Life Conference Proceedings. 2019. p. 598\u2013605.","DOI":"10.1162\/isal_a_00226.xml"},{"issue":"4","key":"pcbi.1013302.ref047","doi-asserted-by":"crossref","first-page":"1152","DOI":"10.1016\/S0003-3472(85)80175-5","article-title":"Communication during aggressive interactions with particular reference to variation in choice of behaviour","volume":"33","author":"M Enquist","year":"1985","journal-title":"Animal Behaviour."},{"issue":"1353","key":"pcbi.1013302.ref048","doi-asserted-by":"crossref","first-page":"609","DOI":"10.1098\/rstb.1997.0041","article-title":"Signalling among relatives. 
I. Is costly signalling too costly?","volume":"352","author":"CT Bergstrom","year":"1997","journal-title":"Phil Trans R Soc Lond B."},{"key":"pcbi.1013302.ref049","doi-asserted-by":"crossref","first-page":"176","DOI":"10.1016\/j.tpb.2009.02.002","article-title":"When will evolution lead to deceptive signaling in the Sir Philip Sidney game?","volume":"75","author":"S Hamblin","year":"2009","journal-title":"Theor Popul Biol."},{"key":"pcbi.1013302.ref050","doi-asserted-by":"crossref","first-page":"110513","DOI":"10.1016\/j.jtbi.2020.110513","article-title":"Strategic inattention in the Sir Philip Sidney Game","volume":"509","author":"M Whitmeyer","year":"2021","journal-title":"J Theor Biol."},{"key":"pcbi.1013302.ref051","doi-asserted-by":"crossref","unstructured":"Maynard Smith J, Harper D. Animal signals. Oxford University Press; 2003.","DOI":"10.1093\/oso\/9780198526841.001.0001"},{"issue":"25","key":"pcbi.1013302.ref052","doi-asserted-by":"crossref","first-page":"14637","DOI":"10.1073\/pnas.93.25.14637","article-title":"The evolution of begging: signaling and sibling competition","volume":"93","author":"MA Rodr\u00edguez-Giron\u00e9s","year":"1996","journal-title":"Proc Natl Acad Sci U S A."},{"issue":"3","key":"pcbi.1013302.ref053","doi-asserted-by":"crossref","first-page":"581","DOI":"10.1007\/s10071-020-01363-6","article-title":"Marmoset prosociality is intentional","volume":"23","author":"JM Burkart","year":"2020","journal-title":"Anim Cogn."},{"issue":"10","key":"pcbi.1013302.ref054","doi-asserted-by":"crossref","first-page":"20160649","DOI":"10.1098\/rsbl.2016.0649","article-title":"Proactive prosociality in a cooperatively breeding corvid, the azure-winged magpie (Cyanopica cyana)","volume":"12","author":"L Horn","year":"2016","journal-title":"Biol Lett."},{"key":"pcbi.1013302.ref055","doi-asserted-by":"crossref","first-page":"4747","DOI":"10.1038\/ncomms5747","article-title":"The evolutionary origin of human hyper-cooperation","volume":"5","author":"JM 
Burkart","year":"2014","journal-title":"Nat Commun."},{"issue":"3","key":"pcbi.1013302.ref056","first-page":"279","article-title":"Q-learning","volume":"8","author":"CJCH Watkins","year":"1992","journal-title":"Machine Learning."},{"key":"pcbi.1013302.ref057","doi-asserted-by":"crossref","unstructured":"Anastassacos N, Hailes S, Musolesi M. Partner selection for the emergence of cooperation in multi-agent systems using reinforcement learning. In: Proceedings of the AAAI Conference on Artificial Intelligence. 2020. p. 7047\u201354.","DOI":"10.1609\/aaai.v34i05.6190"}],"updated-by":[{"DOI":"10.1371\/journal.pcbi.1013302","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2025,9,16]],"date-time":"2025-09-16T00:00:00Z","timestamp":1757980800000}}],"container-title":["PLOS Computational Biology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1013302","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,16]],"date-time":"2025-09-16T17:47:23Z","timestamp":1758044843000},"score":1,"resource":{"primary":{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1013302"}},"subtitle":[],"editor":[{"given":"Feng","family":"Fu","sequence":"first","affiliation":[],"role":[{"role":"editor","vocabulary":"crossref"}]}],"short-title":[],"issued":{"date-parts":[[2025,8,26]]},"references-count":57,"journal-issue":{"issue":"8","published-online":{"date-parts":[[2025,8,26]]}},"URL":"https:\/\/doi.org\/10.1371\/journal.pcbi.1013302","relation":{"new_version":[{"id-type":"doi","id":"10.1371\/journal.pcbi.1013302","asserted-by":"object"}]},"ISSN":["1553-7358"],"issn-type":[{"value":"1553-7358","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,8,26]]}}}