{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,30]],"date-time":"2026-04-30T04:17:18Z","timestamp":1777522638722,"version":"3.51.4"},"reference-count":46,"publisher":"SAGE Publications","issue":"4","license":[{"start":{"date-parts":[[2012,6,25]],"date-time":"2012-06-25T00:00:00Z","timestamp":1340582400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["Adaptive Behavior"],"published-print":{"date-parts":[[2012,8]]},"abstract":"<jats:p>We describe a class of stateful games, which we call \u2018medium-access games\u2019, as a model for human and machine communication and demonstrate how to use the Nash equilibria of those games as played by pairs of agents with stationary policies to predict turn-taking behaviour in Q-learning agents based on the agents\u2019 reward function. We identify which fixed policies exhibit turn-taking behaviour in medium-access games and show how to compute the Nash equilibria of such games by using Markov chain methods to calculate the agents\u2019 expected rewards for different stationary policies. We present simulation results for an extensive range of reward functions for pairs of Q-learners playing medium-access games and we use our analysis for stationary agents to develop predictors for the emergence of turn-taking. We explain how to use our predictors to design reward functions for pairs of Q-learning agents that are conducive (or prohibitive) to the emergence of turn-taking in medium-access games. We focus on designing multi-agent reinforcement learning systems that deliberately produce coordinated turn-taking but we also intend our results to be useful for analysing emergent turn-taking behaviour. 
Based on our turn-taking related results, we suggest ways to use our methodology to design rewards for quantifiable behaviours besides turn-taking.<\/jats:p>","DOI":"10.1177\/1059712312449547","type":"journal-article","created":{"date-parts":[[2012,6,27]],"date-time":"2012-06-27T03:43:48Z","timestamp":1340768628000},"page":"304-318","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":0,"title":["Rewards for pairs of Q-learning agents conducive to turn-taking in medium-access games"],"prefix":"10.1177","volume":"20","author":[{"given":"Peter A","family":"Raffensperger","sequence":"first","affiliation":[{"name":"Department of Electrical and Computer Engineering, University of Canterbury, New Zealand"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Philip J","family":"Bones","sequence":"additional","affiliation":[{"name":"Department of Electrical and Computer Engineering, University of Canterbury, New Zealand"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Allan I","family":"McInnes","sequence":"additional","affiliation":[{"name":"Department of Electrical and Computer Engineering, University of Canterbury, New Zealand"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Russell Y","family":"Webb","sequence":"additional","affiliation":[{"name":"Apple Computer Inc., Cupertino, California, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"179","published-online":{"date-parts":[[2012,6,25]]},"reference":[{"key":"bibr1-1059712312449547","doi-asserted-by":"publisher","DOI":"10.1016\/S0004-3702(02)00121-2"},{"key":"bibr2-1059712312449547","doi-asserted-by":"publisher","DOI":"10.1109\/TSMCC.2007.913919"},{"key":"bibr3-1059712312449547","first-page":"132","volume-title":"AAAI fall symposium on dialog with robots","author":"Chao 
C.","year":"2010"},{"key":"bibr4-1059712312449547","doi-asserted-by":"publisher","DOI":"10.1109\/TNET.2010.2041066"},{"issue":"6","key":"bibr5-1059712312449547","first-page":"949","volume":"11","author":"Colman A. M.","year":"2009","journal-title":"Evolutionary Ecology Research"},{"key":"bibr6-1059712312449547","doi-asserted-by":"publisher","DOI":"10.1007\/s10994-010-5192-9"},{"key":"bibr7-1059712312449547","doi-asserted-by":"publisher","DOI":"10.1177\/105971230000800103"},{"key":"bibr8-1059712312449547","doi-asserted-by":"publisher","DOI":"10.1016\/j.patrec.2005.10.010"},{"key":"bibr9-1059712312449547","doi-asserted-by":"publisher","DOI":"10.1142\/S0219198902000756"},{"key":"bibr10-1059712312449547","doi-asserted-by":"publisher","DOI":"10.1287\/mnsc.5.1.5"},{"key":"bibr11-1059712312449547","doi-asserted-by":"publisher","DOI":"10.1007\/s10458-006-0008-9"},{"key":"bibr12-1059712312449547","doi-asserted-by":"publisher","DOI":"10.1016\/j.artint.2006.12.006"},{"key":"bibr13-1059712312449547","first-page":"242","volume-title":"20th International Conference on Machine Learning","author":"Greenwald A.","year":"2003"},{"key":"bibr14-1059712312449547","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9780511613586"},{"key":"bibr15-1059712312449547","doi-asserted-by":"publisher","DOI":"10.1016\/j.cub.2009.11.045"},{"key":"bibr16-1059712312449547","doi-asserted-by":"publisher","DOI":"10.1126\/science.162.3859.1243"},{"key":"bibr17-1059712312449547","doi-asserted-by":"publisher","DOI":"10.1142\/S0219525905000361"},{"key":"bibr18-1059712312449547","first-page":"1039","volume":"4","author":"Hu J.","year":"2003","journal-title":"The Journal of Machine Learning 
Research"},{"key":"bibr19-1059712312449547","doi-asserted-by":"publisher","DOI":"10.1162\/1064546041766442"},{"key":"bibr20-1059712312449547","doi-asserted-by":"publisher","DOI":"10.1145\/267658.267738"},{"key":"bibr21-1059712312449547","doi-asserted-by":"publisher","DOI":"10.1109\/ROMAN.2008.4600690"},{"key":"bibr22-1059712312449547","doi-asserted-by":"publisher","DOI":"10.1109\/MSP.2009.933370"},{"key":"bibr23-1059712312449547","doi-asserted-by":"publisher","DOI":"10.1109\/MSP.2009.933372"},{"key":"bibr24-1059712312449547","doi-asserted-by":"publisher","DOI":"10.1016\/j.anbehav.2003.05.009"},{"key":"bibr25-1059712312449547","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.36.1.48"},{"key":"bibr26-1059712312449547","doi-asserted-by":"publisher","DOI":"10.1006\/jtbi.2001.2337"},{"key":"bibr27-1059712312449547","doi-asserted-by":"publisher","DOI":"10.1145\/846241.846270"},{"key":"bibr28-1059712312449547","doi-asserted-by":"publisher","DOI":"10.1080\/09540090500177554"},{"key":"bibr29-1059712312449547","doi-asserted-by":"publisher","DOI":"10.1002\/9780470316887"},{"key":"bibr30-1059712312449547","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-44811-X_38"},{"key":"bibr31-1059712312449547","doi-asserted-by":"publisher","DOI":"10.1177\/1059712311421831"},{"key":"bibr32-1059712312449547","doi-asserted-by":"publisher","DOI":"10.1080\/08927014.1989.9525516"},{"key":"bibr33-1059712312449547","doi-asserted-by":"publisher","DOI":"10.3115\/1622064.1622066"},{"key":"bibr34-1059712312449547","doi-asserted-by":"publisher","DOI":"10.1353\/lan.1974.0010"},{"key":"bibr35-1059712312449547","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.0903616106"},{"key":"bibr36-1059712312449547","volume-title":"Reinforcement Learning","author":"Sutton R. S.","year":"1998"},{"key":"bibr37-1059712312449547","volume-title":"Computer Networks","author":"Tanenbaum A. 
S.","year":"2002"},{"key":"bibr38-1059712312449547","doi-asserted-by":"publisher","DOI":"10.2307\/1939266"},{"key":"bibr39-1059712312449547","first-page":"1057","volume-title":"9th international conference on spoken language processing","author":"Turunen M.","year":"2006"},{"key":"bibr40-1059712312449547","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9780511800481.005"},{"key":"bibr41-1059712312449547","doi-asserted-by":"publisher","DOI":"10.1177\/10597123030111003"},{"key":"bibr42-1059712312449547","doi-asserted-by":"publisher","DOI":"10.1109\/ADPRL.2007.368173"},{"key":"bibr43-1059712312449547","unstructured":"Watkins C. (1989). Learning from delayed rewards. PhD Thesis, Cambridge University, UK."},{"key":"bibr44-1059712312449547","doi-asserted-by":"publisher","DOI":"10.1145\/1514095.1514149"},{"key":"bibr45-1059712312449547","doi-asserted-by":"publisher","DOI":"10.1109\/MSP.2009.933185"},{"key":"bibr46-1059712312449547","first-page":"1641","volume-title":"Advances in Neural Information Processing Systems","volume":"18","author":"Zinkevich M.","year":"2006"}],"container-title":["Adaptive 
Behavior"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1059712312449547","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/1059712312449547","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1059712312449547","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,28]],"date-time":"2026-04-28T16:18:16Z","timestamp":1777393096000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/1059712312449547"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,6,25]]},"references-count":46,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2012,8]]}},"alternative-id":["10.1177\/1059712312449547"],"URL":"https:\/\/doi.org\/10.1177\/1059712312449547","relation":{},"ISSN":["1059-7123","1741-2633"],"issn-type":[{"value":"1059-7123","type":"print"},{"value":"1741-2633","type":"electronic"}],"subject":[],"published":{"date-parts":[[2012,6,25]]}}}