{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,21]],"date-time":"2026-05-21T17:41:58Z","timestamp":1779385318413,"version":"3.53.1"},"reference-count":62,"publisher":"Maximum Academic Press","issue":"1","license":[{"start":{"date-parts":[[2012,2,22]],"date-time":"2012-02-22T00:00:00Z","timestamp":1329868800000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/www.cambridge.org\/core\/terms"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["The Knowledge Engineering Review"],"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>In the framework of fully cooperative multi-agent systems, independent (non-communicative) agents that learn by reinforcement must overcome several difficulties to manage to coordinate. This paper identifies several challenges responsible for the non-coordination of independent agents: Pareto-selection, non-stationarity, stochasticity, alter-exploration and shadowed equilibria. A selection of multi-agent domains is classified according to those challenges: matrix games, Boutilier's coordination game, predators pursuit domains and a special multi-state game. Moreover, the performance of a range of algorithms for independent reinforcement learners is evaluated empirically. Those algorithms are Q-learning variants: decentralized Q-learning, distributed Q-learning, hysteretic Q-learning, recursive frequency maximum Q-value and win-or-learn fast policy hill climbing. An overview of the learning algorithms\u2019 strengths and weaknesses against each challenge concludes the paper and can serve as a basis for choosing the appropriate algorithm for a new domain. 
Furthermore, the distilled challenges may assist in the design of new learning algorithms that overcome these problems and achieve higher performance in multi-agent applications.<\/jats:p>","DOI":"10.1017\/s0269888912000057","type":"journal-article","created":{"date-parts":[[2012,2,22]],"date-time":"2012-02-22T09:45:01Z","timestamp":1329903901000},"page":"1-31","source":"Crossref","is-referenced-by-count":289,"title":["Independent reinforcement learners in cooperative Markov games: a survey regarding coordination problems"],"prefix":"10.48130","volume":"27","author":[{"given":"Laetitia","family":"Matignon","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Guillaume J.","family":"Laurent","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Nadine","family":"Le Fort-Piat","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"27968","published-online":{"date-parts":[[2012,2,22]]},"reference":[{"key":"S0269888912000057_ref62","volume-title":"Multiagent Reinforcement Learning for Multi-Robot Systems: A Survey","author":"Yang","year":"2004"},{"key":"S0269888912000057_ref58","doi-asserted-by":"publisher","DOI":"10.1007\/BF00992698"},{"key":"S0269888912000057_ref54","doi-asserted-by":"publisher","DOI":"10.1017\/S026988890500041X"},{"key":"S0269888912000057_ref53","doi-asserted-by":"crossref","unstructured":"Tumer K. , Agogino A. 2007. Distributed agent-based air traffic flow management In AAMAS \u201807: Proceedings of the 6th International Joint Conference on Autonomous Agents and Multiagent Systems, 1\u20138. ACM.","DOI":"10.1145\/1329125.1329434"},{"key":"S0269888912000057_ref59","unstructured":"Wolpert D. H. , Tumer K. 1999. An Introduction to Collective Intelligence. Technical Report NASA-ARC-IC-99-63, NASA Ames Research Center."},{"key":"S0269888912000057_ref44","unstructured":"Peshkin L. , Kim K.-E. , Meuleau N. 
, Kaelbling L. P. 2000. Learning to cooperate via policy search. In 16th Conference on Uncertainty in Artificial Intelligence, 307\u2013314. Morgan Kaufmann."},{"key":"S0269888912000057_ref43","first-page":"423","article-title":"Theoretical advantages of lenient learners: an evolutionary game theoretic perspective","volume":"9","author":"Panait","year":"2008","journal-title":"Journal of Machine Learning Research"},{"key":"S0269888912000057_ref40","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.36.1.48"},{"key":"S0269888912000057_ref39","doi-asserted-by":"crossref","unstructured":"Melo F. S. , Lopes M. C. 2007. Convergence of independent adaptive learners. In Progress in Artificial Intelligence: 13th Portuguese Conference on Artificial Intelligence, Lecture Notes in Artificial Intelligence, 4874, 555\u2013567. Springer-Verlag.","DOI":"10.1007\/978-3-540-77002-2_47"},{"key":"S0269888912000057_ref38","first-page":"58","article-title":"Learning to cooperate in multi-agent systems by combining q-learning and evolutionary strategy","volume":"1","author":"McGlohon","year":"2005","journal-title":"International Journal on Lateral Computing"},{"key":"S0269888912000057_ref34","doi-asserted-by":"publisher","DOI":"10.1007\/s10846-010-9396-9"},{"key":"S0269888912000057_ref37","unstructured":"Matignon L. , Laurent G. J. , Le Fort-Piat N. 2008. A study of FMQ heuristic in cooperative multi-agent games. In Proceedings of the 7th International Conference on Autonomous Agents and Multiagent Systems. Workshop 10: Multi-Agent Sequential Decision Making in Uncertain Multi-Agent Domains (AAMAS 08), Estoril, Portugal."},{"key":"S0269888912000057_ref35","doi-asserted-by":"crossref","unstructured":"Matignon L. , Laurent G. J. , Le Fort-Piat N. 2006. Reward function and initial values: better choices for accelerated goal-directed reinforcement learning. 
In Proceedings of the 16th International Conference on Artificial Neural Networks (ICANN'06), Lecture Notes in Computer Science, 4131, 840\u2013849. Springer.","DOI":"10.1007\/11840817_87"},{"key":"S0269888912000057_ref33","doi-asserted-by":"publisher","DOI":"10.1080\/095281398146806"},{"key":"S0269888912000057_ref32","doi-asserted-by":"publisher","DOI":"10.1177\/02783640122067543"},{"key":"S0269888912000057_ref60","doi-asserted-by":"publisher","DOI":"10.1142\/S0219525901000188"},{"key":"S0269888912000057_ref28","unstructured":"Lauer M. , Riedmiller M. 2000. An algorithm for distributed reinforcement learning in cooperative multi-agent systems. In Proceedings of the 17th International Conference on Machine Learning, 535\u2013542. Morgan Kaufmann."},{"key":"S0269888912000057_ref25","unstructured":"Kapetanakis S. , Kudenko D. 2004. Reinforcement learning of coordination in heterogeneous cooperative multi-agent systems. In AAMAS \u201804: Proceedings of the 3rd International Joint Conference on Autonomous Agents and Multiagent Systems, 1258\u20131259. IEEE Computer Society."},{"key":"S0269888912000057_ref24","doi-asserted-by":"crossref","unstructured":"Kapetanakis S. , Kudenko D. 2002. Reinforcement learning of coordination in cooperative multi-agent systems. In Proceedings of the 9th NCAI, Dechter, R., Kearns, M. & Sutton, R. (eds.). Edmonton, Alberta, Canada.","DOI":"10.1007\/3-540-44826-8_2"},{"key":"S0269888912000057_ref21","doi-asserted-by":"crossref","unstructured":"Gomes E. R. , Kowalczyk R. 2009. Dynamic analysis of multiagent-learning with \u03b5-greedy exploration. In ICML'09: Proceedings of the 26th International Conference on Machine Learning, 47. 
ACM.","DOI":"10.1145\/1553374.1553422"},{"key":"S0269888912000057_ref20","first-page":"32","article-title":"Multi-agent case-based reasoning for cooperative reinforcement learners","author":"Gabel","year":"2006","journal-title":"Proceedings of the ECCBR"},{"key":"S0269888912000057_ref19","unstructured":"Fulda N. , Ventura D. 2007. Predicting and preventing coordination problems in cooperative q-learning systems. In Proceedings of the International Joint Conference on Artificial Intelligence. Morgan Kaufmann Publishers Inc."},{"key":"S0269888912000057_ref15","doi-asserted-by":"crossref","unstructured":"Busoniu L. , Babuska R. , De Schutter B. 2006. Decentralized reinforcement learning control of a robotic manipulator. In Proceedings of the 9th International Conference on Control, Automation, Robotics and Vision (ICARCV 2006), 1347\u20131352. Singapore.","DOI":"10.1109\/ICARCV.2006.345351"},{"key":"S0269888912000057_ref26","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-32274-0_7"},{"key":"S0269888912000057_ref13","doi-asserted-by":"crossref","first-page":"11","DOI":"10.1613\/jair.1154","article-title":"Learning to coordinate efficiently: a model-based approach","volume":"19","author":"Brafman","year":"2003","journal-title":"Journal of Artificial Intelligence Research"},{"key":"S0269888912000057_ref12","doi-asserted-by":"publisher","DOI":"10.1016\/S0004-3702(02)00121-2"},{"key":"S0269888912000057_ref50","volume-title":"Reinforcement Learning: An Introduction","author":"Sutton","year":"1998"},{"key":"S0269888912000057_ref9","first-page":"478","article-title":"Sequential optimality and coordination in multiagent systems","author":"Boutilier","year":"1999","journal-title":"IJCAI"},{"key":"S0269888912000057_ref18","doi-asserted-by":"publisher","DOI":"10.1017\/S0269888906000956"},{"key":"S0269888912000057_ref8","first-page":"195","article-title":"Planning, learning and coordination in multiagent decision 
processes","author":"Boutilier","year":"1996","journal-title":"Theoretical Aspects of Rationality and Knowledge"},{"key":"S0269888912000057_ref7","volume-title":"On Optimal Cooperation of Knowledge Sources \u2013 an Experimental Investigation","author":"Benda","year":"1986"},{"key":"S0269888912000057_ref6","doi-asserted-by":"publisher","DOI":"10.1080\/09528130412331297956"},{"key":"S0269888912000057_ref3","first-page":"2635","article-title":"Multi-agent reinforcement learning in common interest and fixed sum stochastic games: an experimental study","volume":"9","author":"Bab","year":"2008","journal-title":"Journal of Machine Learning Research"},{"key":"S0269888912000057_ref2","doi-asserted-by":"crossref","unstructured":"Agogino A. , Tumer K. 2005. Multi-agent reward analysis for learning in noisy domains. In Proceedings of the 4th International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS'05, 81\u201388. ACM.","DOI":"10.1145\/1082473.1082486"},{"key":"S0269888912000057_ref47","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.39.10.1953"},{"key":"S0269888912000057_ref16","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-32274-0_4"},{"key":"S0269888912000057_ref1","doi-asserted-by":"crossref","first-page":"521","DOI":"10.1613\/jair.2628","article-title":"A multiagent reinforcement learning algorithm with non-linear dynamics","volume":"33","author":"Abdallah","year":"2008","journal-title":"Journal of Artificial Intelligence Research"},{"key":"S0269888912000057_ref10","first-page":"209","volume-title":"Advances in Neural Information Processing Systems","author":"Bowling","year":"2005"},{"key":"S0269888912000057_ref48","doi-asserted-by":"publisher","DOI":"10.1023\/A:1007678930559"},{"key":"S0269888912000057_ref30","article-title":"The world of independent learners is not Markovian","volume":"15","author":"Laurent","year":"2010","journal-title":"Innovation in Knowledge-Based and Intelligent Engineering 
Systems"},{"key":"S0269888912000057_ref61","unstructured":"Wunder M. , Littman M. L. , Babes M. 2010. Classes of multiagent q-learning dynamics with epsilon-greedy exploration. In ICML'10: Proceedings of the 27th International Conference on Machine Learning, 1167\u20131174. Omnipress."},{"key":"S0269888912000057_ref45","doi-asserted-by":"publisher","DOI":"10.1080\/095281398146798"},{"key":"S0269888912000057_ref41","volume-title":"A Course in Game Theory","author":"Osborne","year":"1994"},{"key":"S0269888912000057_ref56","first-page":"3694","article-title":"Multi-robot box-pushing: single-agent q-learning vs. team q-learning","author":"Wang","year":"2006","journal-title":"Proceedings of the IROS"},{"key":"S0269888912000057_ref55","doi-asserted-by":"publisher","DOI":"10.1007\/s10458-006-9007-0"},{"key":"S0269888912000057_ref51","doi-asserted-by":"crossref","unstructured":"Tan M. 1993. Multiagent reinforcement learning: independent vs. cooperative agents. In Proceedings of the 10th International Conference on Machine Learning, 330\u2013337. Morgan Kaufmann.","DOI":"10.1016\/B978-1-55860-307-3.50049-6"},{"key":"S0269888912000057_ref42","doi-asserted-by":"crossref","unstructured":"Panait L. , Sullivan K. , Luke S. 2006. Lenient learners in cooperative multiagent systems. In AAMAS '06: Proceedings of the 5th International Joint Conference on Autonomous Agents and Multiagent Systems, 801\u2013803. ACM Press.","DOI":"10.1145\/1160633.1160776"},{"key":"S0269888912000057_ref36","doi-asserted-by":"crossref","unstructured":"Matignon L. , Laurent G. J. , Le Fort-Piat N. 2007. Hysteretic q-learning: an algorithm for decentralized reinforcement learning in cooperative multi-agent teams. In Proceedings of IEEE\/RSJ International Conference on Intelligent Robots and Systems IROS 2007, 64\u201369.","DOI":"10.1109\/IROS.2007.4399095"},{"key":"S0269888912000057_ref46","unstructured":"Sen S. , Sekaran M. , Hale J. 1994. Learning to coordinate without sharing information. 
In Proceedings of the 12th National Conference on Artificial Intelligence, 426\u2013431, Seattle, WA."},{"key":"S0269888912000057_ref23","doi-asserted-by":"crossref","first-page":"237","DOI":"10.1613\/jair.301","article-title":"Reinforcement learning: a survey","volume":"4","author":"Kaelbling","year":"1996","journal-title":"Journal of Artificial Intelligence Research"},{"key":"S0269888912000057_ref27","doi-asserted-by":"crossref","unstructured":"Kuyer L. , Whiteson S. , Bakker B. , Vlassis N. 2008. Multiagent reinforcement learning for urban traffic control using coordination graphs. In ECML PKDD '08: Proceedings of the 2008 European Conference on Machine Learning and Knowledge Discovery in Databases \u2013 Part I, Lecture Notes in Computer Science, 5211, 656\u2013671. Springer.","DOI":"10.1007\/978-3-540-87479-9_61"},{"key":"S0269888912000057_ref14","doi-asserted-by":"publisher","DOI":"10.1109\/TSMCC.2007.913919"},{"key":"S0269888912000057_ref57","doi-asserted-by":"publisher","DOI":"10.1016\/j.engappai.2007.05.006"},{"key":"S0269888912000057_ref11","unstructured":"Bowling M. , Veloso M. 2000. An Analysis of Stochastic Game Theory for Multiagent Reinforcement Learning. Technical report, Computer Science Department, Carnegie Mellon University."},{"key":"S0269888912000057_ref22","first-page":"1039","article-title":"Nash q-learning for general-sum stochastic games","volume":"4","author":"Hu","year":"2003","journal-title":"Journal of Machine Learning Research"},{"key":"S0269888912000057_ref17","unstructured":"Claus C. , Boutilier C. 1998. The dynamics of reinforcement learning in cooperative multiagent systems. In Proceedings of the 15th National Conference on Artificial Intelligence, 746\u2013752, American Association for Artificial Intelligence."},{"key":"S0269888912000057_ref49","doi-asserted-by":"publisher","DOI":"10.1023\/A:1008942012299"},{"key":"S0269888912000057_ref5","doi-asserted-by":"crossref","unstructured":"Banerjee B. , Peng J. 2003. 
Adaptive policy gradient in multiagent learning. In AAMAS '03: Proceedings of the 2nd International Joint Conference on Autonomous Agents and Multiagent Systems, 686\u2013692. ACM.","DOI":"10.1145\/860575.860686"},{"key":"S0269888912000057_ref31","doi-asserted-by":"publisher","DOI":"10.1016\/S1389-0417(01)00015-8"},{"key":"S0269888912000057_ref52","first-page":"1","article-title":"A multiagent approach to managing air traffic flow","volume":"24","author":"Tumer","year":"2010","journal-title":"Journal of Autonomous Agents and Multi-Agent Systems"},{"key":"S0269888912000057_ref29","first-page":"1516","article-title":"Reinforcement learning for stochastic cooperative multi-agent systems","volume":"03","author":"Lauer","year":"2004","journal-title":"Autonomous Agents and Multi-Agent Systems"},{"key":"S0269888912000057_ref4","doi-asserted-by":"publisher","DOI":"10.1007\/BF00735341"}],"container-title":["The Knowledge Engineering Review"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.cambridge.org\/core\/services\/aop-cambridge-core\/content\/view\/S0269888912000057","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,1,5]],"date-time":"2026-01-05T14:44:04Z","timestamp":1767624244000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.cambridge.org\/core\/product\/identifier\/S0269888912000057\/type\/journal_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,2,22]]},"references-count":62,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2012,2,22]]}},"alternative-id":["S0269888912000057"],"URL":"https:\/\/doi.org\/10.1017\/s0269888912000057","relation":{},"ISSN":["0269-8889","1469-8005"],"issn-type":[{"value":"0269-8889","type":"print"},{"value":"1469-8005","type":"electronic"}],"subject":[],"published":{"date-parts":[[2012,2,22]]}}}