{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,15]],"date-time":"2026-01-15T00:01:25Z","timestamp":1768435285638,"version":"3.49.0"},"reference-count":52,"publisher":"Springer Science and Business Media LLC","issue":"3","license":[{"start":{"date-parts":[[2010,5,20]],"date-time":"2010-05-20T00:00:00Z","timestamp":1274313600000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/www.springer.com\/tdm"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Mach Learn"],"published-print":{"date-parts":[[2011,3]]},"DOI":"10.1007\/s10994-010-5192-9","type":"journal-article","created":{"date-parts":[[2010,5,19]],"date-time":"2010-05-19T12:51:32Z","timestamp":1274273492000},"page":"281-314","source":"Crossref","is-referenced-by-count":33,"title":["Learning to compete, coordinate, and cooperate in\u00a0repeated games using reinforcement learning"],"prefix":"10.1007","volume":"82","author":[{"given":"Jacob W.","family":"Crandall","sequence":"first","affiliation":[]},{"given":"Michael A.","family":"Goodrich","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2010,5,20]]},"reference":[{"key":"5192_CR1","doi-asserted-by":"crossref","first-page":"521","DOI":"10.1613\/jair.2628","volume":"33","author":"S. Abdallah","year":"2008","unstructured":"Abdallah, S., & Lesser, V. (2008). A multi-agent learning algorithm with non-linear dynamics. Journal of Artificial Intelligence Research, 33, 521\u2013549.","journal-title":"Journal of Artificial Intelligence Research"},{"key":"5192_CR2","first-page":"322","volume-title":"Proceedings of the 36th annual symposium on foundations of computer science","author":"P. Auer","year":"1995","unstructured":"Auer, P., Cesa-Bianchi, N., Freund, Y., & Schapire, R. E. (1995). Gambling in a rigged casino: the adversarial multi-armed bandit problem. 
In Proceedings of the 36th annual symposium on foundations of computer science (pp. 322\u2013331). Los Alamitos: IEEE Comput. Soc."},{"key":"5192_CR3","doi-asserted-by":"crossref","first-page":"67","DOI":"10.1016\/0304-4068(74)90037-8","volume":"1","author":"R. J. Aumann","year":"1974","unstructured":"Aumann, R. J. (1974). Subjectivity and correlation in randomized strategies. Journal of Mathematical Economics, 1, 67\u201396.","journal-title":"Journal of Mathematical Economics"},{"key":"5192_CR4","volume-title":"The evolution of cooperation","author":"R. Axelrod","year":"1984","unstructured":"Axelrod, R. (1984). The evolution of cooperation. New York: Basic Books."},{"key":"5192_CR5","doi-asserted-by":"crossref","first-page":"91","DOI":"10.1007\/s10458-007-0020-8","volume":"15","author":"D. Banerjee","year":"2007","unstructured":"Banerjee, D., & Sen, S. (2007). Reaching pareto-optimality in prisoner\u2019s dilemma using conditional joint action learning. Autonomous Agents and Multi-Agent Systems, 15, 91\u2013108.","journal-title":"Autonomous Agents and Multi-Agent Systems"},{"key":"5192_CR6","first-page":"365","volume-title":"STOC\u201908: proceedings of the 40th annual symposium on theory of computing","author":"C. Borgs","year":"2008","unstructured":"Borgs, C., Chayes, J., Immorlica, N., Kalai, A. T., Mirrokni, V., & Papadimitriou, C. (2008). The myth of the folk theorem. In STOC\u201908: proceedings of the 40th annual symposium on theory of computing (pp. 365\u2013372). New York: ACM."},{"key":"5192_CR7","unstructured":"Bowling, M. (2000). Convergence problems of general-sum multiagent reinforcement learning. In Proceedings of the 17th international conference on machine learning (pp.\u00a089\u201394)."},{"key":"5192_CR8","unstructured":"Bowling, M. (2005). Convergence and no-regret in multiagent learning. 
In Advances in neural information processing systems (Vol.\u00a017, pp.\u00a0209\u2013216)."},{"issue":"2","key":"5192_CR9","doi-asserted-by":"crossref","first-page":"215","DOI":"10.1016\/S0004-3702(02)00121-2","volume":"136","author":"M. Bowling","year":"2002","unstructured":"Bowling, M., & Veloso, M. (2002). Multiagent learning using a variable learning rate. Artificial Intelligence, 136(2), 215\u2013250.","journal-title":"Artificial Intelligence"},{"key":"5192_CR10","doi-asserted-by":"crossref","first-page":"213","DOI":"10.1162\/153244303765208377","volume":"3","author":"R. I. Brafman","year":"2003","unstructured":"Brafman, R. I., & Tennenholtz, M. (2003). R-max\u2014a general polynomial time algorithm for near-optimal reinforcement learning. The Journal of Machine Learning Research, 3, 213\u2013231.","journal-title":"The Journal of Machine Learning Research"},{"issue":"7","key":"5192_CR11","doi-asserted-by":"crossref","first-page":"434","DOI":"10.1016\/j.artint.2006.12.007","volume":"171","author":"Y.-H. Chang","year":"2007","unstructured":"Chang, Y.-H. (2007). No regrets about no-regret. Artificial Intelligence, 171(7), 434\u2013439.","journal-title":"Artificial Intelligence"},{"key":"5192_CR12","unstructured":"Chang, Y.-H., & Kaelbling, L. P. (2005). Hedge learning: regret-minimization with learning experts. In Proceedings of the 22nd international conference on machine learning (pp.\u00a0121\u2013128)."},{"key":"5192_CR13","volume-title":"Linear programming","author":"V. Chvatal","year":"1983","unstructured":"Chvatal, V. (1983). Linear programming. New York: Freeman."},{"key":"5192_CR14","unstructured":"Conitzer, V., & Sandholm, T. (2003). AWESOME: a general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents. In Proceedings of the 20th international conference on machine learning (pp.\u00a083\u201390)."},{"key":"5192_CR15","unstructured":"Crandall, J. W. (2005). 
Learning successful strategies in repeated general-sum games. Ph.D. thesis, Brigham Young University, Provo, UT."},{"key":"5192_CR16","doi-asserted-by":"crossref","unstructured":"Crandall, J. W., & Goodrich, M. A. (2005). Learning to compete, compromise, and cooperate in repeated general-sum games. In Proceedings of the 22nd international conference on machine learning, Bonn, Germany (pp.\u00a0161\u2013168).","DOI":"10.1145\/1102351.1102372"},{"key":"5192_CR17","doi-asserted-by":"crossref","unstructured":"de Cote, E. M., Lazaric, A., & Restelli, M. (2006). Learning to cooperate in multi-agent social dilemmas. In Proc. of the 5th int. joint conf. on autonomous agents and multiagent systems (pp.\u00a0783\u2013785).","DOI":"10.1145\/1160633.1160770"},{"key":"5192_CR18","unstructured":"de Farias, D., & Megiddo, N. (2004). How to combine expert (or novice) advice when actions impact the environment. In Advances in neural information processing systems (Vol.\u00a016)."},{"key":"5192_CR19","first-page":"409","volume":"17","author":"D. Farias de","year":"2005","unstructured":"de Farias, D., & Megiddo, N. (2005). Exploration\u2013exploitation tradeoffs for expert algorithms in reactive environments. Advances in Neural Information Processing Systems, 17, 409\u2013416.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"5192_CR20","doi-asserted-by":"crossref","first-page":"7","DOI":"10.1006\/game.1999.0740","volume":"29","author":"D. P. Foster","year":"1999","unstructured":"Foster, D. P., & Vohra, R. (1999). Regret in the on-line decision problem. Games and Economic Behavior, 29, 7\u201335.","journal-title":"Games and Economic Behavior"},{"key":"5192_CR21","volume-title":"Passions within reason: the strategic role of the emotions","author":"R. H. Frank","year":"1988","unstructured":"Frank, R. H. (1988). Passions within reason: the strategic role of the emotions. 
New York: Norton."},{"key":"5192_CR22","doi-asserted-by":"crossref","first-page":"23","DOI":"10.1007\/3-540-59119-2_166","volume-title":"Proceedings of the 2nd European conference on computational learning theory","author":"Y. Freund","year":"1995","unstructured":"Freund, Y., & Schapire, R. E. (1995). A decision-theoretic generalization of on-line learning and an application to boosting. In Proceedings of the 2nd European conference on computational learning theory (pp. 23\u201337). Los Alamitos: IEEE Comput. Soc."},{"key":"5192_CR23","volume-title":"The theory of learning in games","author":"D. Fudenberg","year":"1998","unstructured":"Fudenberg, D., & Levine, D. K. (1998). The theory of learning in games. Cambridge: The MIT Press."},{"key":"5192_CR24","unstructured":"Fulda, N., & Ventura, D. (2007). Predicting and preventing coordination problems in cooperative Q-learning systems. In Proceedings of the 20th joint international conference on artificial intelligence."},{"key":"5192_CR25","volume-title":"Game theory evolving: a problem-centered introduction to modeling strategic behavior","author":"H. Gintis","year":"2000","unstructured":"Gintis, H. (2000). Game theory evolving: a problem-centered introduction to modeling strategic behavior. Princeton: Princeton University Press."},{"key":"5192_CR26","unstructured":"Greenwald, A., & Hall, K. (2003). Correlated Q-learning. In Proceedings of the 20th international conference on machine learning (pp.\u00a0242\u2013249)."},{"key":"5192_CR27","unstructured":"Hu, J., & Wellman, M. P. (1998). Multiagent reinforcement learning: theoretical framework and an algorithm. In Proceedings of the 15th international conference on machine learning (pp.\u00a0242\u2013250)."},{"key":"5192_CR28","doi-asserted-by":"crossref","first-page":"237","DOI":"10.1613\/jair.301","volume":"4","author":"L. P. Kaelbling","year":"1996","unstructured":"Kaelbling, L. P., Littman, M. L., & Moore, A. W. (1996). Reinforcement learning: a survey. 
Journal of Artificial Intelligence Research, 4, 237\u2013277.","journal-title":"Journal of Artificial Intelligence Research"},{"key":"5192_CR29","doi-asserted-by":"crossref","first-page":"292","DOI":"10.1006\/jeth.1997.2379","volume":"80","author":"R. Karandikar","year":"1998","unstructured":"Karandikar, R., Mookherjee, D., Ray, D., & Vega-Redondo, F. (1998). Evolving aspirations and cooperation. Journal of Economic Theory, 80, 292\u2013331.","journal-title":"Journal of Economic Theory"},{"key":"5192_CR30","doi-asserted-by":"crossref","unstructured":"Littman, M. L. (1994). Markov games as a framework for multi-agent reinforcement learning. In Proceedings of the 11th international conference on machine learning (pp.\u00a0157\u2013163).","DOI":"10.1016\/B978-1-55860-335-6.50027-1"},{"key":"5192_CR31","unstructured":"Littman, M. L. (2001). Friend-or-foe: Q-learning in general-sum games. In Proceedings of the 18th international conference on machine learning (pp.\u00a0322\u2013328)."},{"key":"5192_CR32","unstructured":"Littman, M. L., & Stone, P. (2001). Leading best-response strategies in repeated games. In IJCAI workshop on economic agents, models, and mechanisms, Seattle, WA."},{"key":"5192_CR33","doi-asserted-by":"crossref","first-page":"55","DOI":"10.1016\/j.dss.2004.08.007","volume":"39","author":"M. L. Littman","year":"2005","unstructured":"Littman, M. L., & Stone, P. (2005). A polynomial-time Nash equilibrium algorithm for repeated games. Decision Support Systems, 39, 55\u201366.","journal-title":"Decision Support Systems"},{"key":"5192_CR34","unstructured":"Moody, J., Liu, Y., Saffell, M., & Youn, K. (2004). Stochastic direct reinforcement. In AAAI spring symposium on artificial multiagent learning, Washington, DC."},{"key":"5192_CR35","doi-asserted-by":"crossref","first-page":"155","DOI":"10.2307\/1907266","volume":"28","author":"J. F. Nash","year":"1950","unstructured":"Nash, J. F. (1950). The bargaining problem. 
Econometrica, 28, 155\u2013162.","journal-title":"Econometrica"},{"key":"5192_CR36","unstructured":"Powers, R., & Shoham, Y. (2005). Learning against opponents with bounded memory. In Proceedings of the 19th international joint conference on artificial intelligence (pp.\u00a0817\u2013822)."},{"key":"5192_CR37","doi-asserted-by":"crossref","unstructured":"Qiao, H., Rozenblit, J., Szidarovszky, F., & Yang, L. (2006). Multi-agent learning model with bargaining. In Proceedings of the 2006 winter simulation conference (pp.\u00a0934\u2013940).","DOI":"10.1109\/WSC.2006.323178"},{"key":"5192_CR38","unstructured":"Rummery, G. A., & Niranjan, M. (1994). On-line Q-learning using connectionist systems (Technical Report CUED\/F-INFENG-TR 166). Cambridge University, UK."},{"key":"5192_CR39","doi-asserted-by":"crossref","first-page":"147","DOI":"10.1016\/0303-2647(95)01551-5","volume":"37","author":"T. W. Sandholm","year":"1996","unstructured":"Sandholm, T. W., & Crites, R. H. (1996). Multiagent reinforcement learning in the iterated prisoner\u2019s dilemma. Biosystems, 37, 147\u2013166.","journal-title":"Biosystems"},{"key":"5192_CR40","doi-asserted-by":"crossref","first-page":"1095","DOI":"10.1073\/pnas.39.10.1095","volume":"39","author":"L. S. Shapley","year":"1953","unstructured":"Shapley, L. S. (1953). Stochastic games. Proceedings of National Academy of Science, 39, 1095\u20131100.","journal-title":"Proceedings of National Academy of Science"},{"issue":"7","key":"5192_CR41","doi-asserted-by":"crossref","first-page":"365","DOI":"10.1016\/j.artint.2006.02.006","volume":"171","author":"Y. Shoham","year":"2007","unstructured":"Shoham, Y., Powers, R., & Grenager, T. (2007). If multi-agent learning is the answer, what is the question? Artificial Intelligence, 171(7), 365\u2013377.","journal-title":"Artificial Intelligence"},{"key":"5192_CR42","volume-title":"The stag hunt and the evolution of social structure","author":"B. Skyrms","year":"2004","unstructured":"Skyrms, B. (2004). 
The stag hunt and the evolution of social structure. Cambridge: Cambridge University Press."},{"key":"5192_CR43","unstructured":"Stimpson, J. R., & Goodrich, M. A. (2003). Learning to cooperate in a social dilemma: a satisficing approach to bargaining. In Proceedings of the 20th international conference on machine learning (pp.\u00a0728\u2013735)."},{"key":"5192_CR44","unstructured":"Stimpson, J. R., Goodrich, M. A., & Walters, L. C. (2001). Satisficing and learning cooperation in the prisoner\u2019s dilemma. In Proceedings of the 17th international joint conference on artificial intelligence (pp.\u00a0535\u2013544)."},{"key":"5192_CR45","volume-title":"Reinforcement learning: an introduction","author":"R. S. Sutton","year":"1998","unstructured":"Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: an introduction. Cambridge: MIT Press."},{"key":"5192_CR46","doi-asserted-by":"crossref","first-page":"145","DOI":"10.1016\/0025-5564(78)90077-9","volume":"40","author":"P. D. Taylor","year":"1978","unstructured":"Taylor, P. D., & Jonker, L. (1978). Evolutionarily stable strategies and game dynamics. Mathematical Biosciences, 40, 145\u2013156.","journal-title":"Mathematical Biosciences"},{"key":"5192_CR47","volume-title":"Advances in neural information processing systems","author":"G. Tesauro","year":"2004","unstructured":"Tesauro, G. (2004). Extending Q-learning to general adaptive multi-agent systems. In Advances in neural information processing systems (Vol.\u00a016). Cambridge: MIT Press."},{"key":"5192_CR48","first-page":"1571","volume":"15","author":"X. Wang","year":"2003","unstructured":"Wang, X., & Sandholm, T. (2003). Reinforcement learning to play an optimal Nash equilibrium in team Markov games. Advances in Neural Information Processing Systems, 15, 1571\u20131578.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"5192_CR49","first-page":"279","volume":"8","author":"C. J. Watkins","year":"1992","unstructured":"Watkins, C. 
J., & Dayan, P. (1992). Q-learning. Machine Learning, 8, 279\u2013292.","journal-title":"Machine Learning"},{"key":"5192_CR50","first-page":"506","volume-title":"Proceedings of the third international joint conference on autonomous agents and multiagent systems","author":"M. Weinberg","year":"2004","unstructured":"Weinberg, M., & Rosenschein, J. S. (2004). Best-response multiagent learning in non-stationary environments. In Proceedings of the third international joint conference on autonomous agents and multiagent systems, Washington, DC, USA (pp. 506\u2013513). Los Alamitos: IEEE Comput. Soc."},{"issue":"1","key":"5192_CR51","doi-asserted-by":"crossref","first-page":"57","DOI":"10.2307\/2951778","volume":"61","author":"H. P. Young","year":"1993","unstructured":"Young, H. P. (1993). The evolution of conventions. Econometrica, 61(1), 57\u201384.","journal-title":"Econometrica"},{"key":"5192_CR52","unstructured":"Zinkevich, M. (2003). Online convex programming and generalized infinitesimal gradient ascent. In 20th inter. conf. 
on machine learning (pp.\u00a0228\u2013236)."}],"container-title":["Machine Learning"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/s10994-010-5192-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1007\/s10994-010-5192-9\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/s10994-010-5192-9","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2019,5,31]],"date-time":"2019-05-31T21:40:29Z","timestamp":1559338829000},"score":1,"resource":{"primary":{"URL":"http:\/\/link.springer.com\/10.1007\/s10994-010-5192-9"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2010,5,20]]},"references-count":52,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2011,3]]}},"alternative-id":["5192"],"URL":"https:\/\/doi.org\/10.1007\/s10994-010-5192-9","relation":{},"ISSN":["0885-6125","1573-0565"],"issn-type":[{"value":"0885-6125","type":"print"},{"value":"1573-0565","type":"electronic"}],"subject":[],"published":{"date-parts":[[2010,5,20]]}}}