{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2023,10,3]],"date-time":"2023-10-03T17:33:41Z","timestamp":1696354421836},"reference-count":55,"publisher":"Springer Science and Business Media LLC","issue":"3","license":[{"start":{"date-parts":[[2009,8,20]],"date-time":"2009-08-20T00:00:00Z","timestamp":1250726400000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/www.springer.com\/tdm"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Auton Agent Multi-Agent Syst"],"published-print":{"date-parts":[[2010,11]]},"DOI":"10.1007\/s10458-009-9104-y","type":"journal-article","created":{"date-parts":[[2009,8,19]],"date-time":"2009-08-19T05:40:34Z","timestamp":1250660434000},"page":"321-367","source":"Crossref","is-referenced-by-count":8,"title":["Coordinated learning in multiagent MDPs with infinite state-space"],"prefix":"10.1007","volume":"21","author":[{"given":"Francisco S.","family":"Melo","sequence":"first","affiliation":[]},{"given":"M. Isabel","family":"Ribeiro","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2009,8,20]]},"reference":[{"issue":"4","key":"9104_CR1","doi-asserted-by":"crossref","first-page":"819","DOI":"10.1287\/moor.27.4.819.297","volume":"27","author":"D. S. Bernstein","year":"2002","unstructured":"Bernstein D. S., Zilberstein S., Immerman N. (2002) The complexity of decentralized control of Markov decision processes. Mathematics of Operations Research 27(4): 819\u2013840","journal-title":"Mathematics of Operations Research"},{"key":"9104_CR2","volume-title":"Neuro-dynamic programming optimization and neural computation series","author":"D. P. Bertsekas","year":"1996","unstructured":"Bertsekas D. P., Tsitsiklis J. N. (1996) Neuro-dynamic programming optimization and neural computation series. Athena Scientific, Belmont, MA"},{"key":"9104_CR3","unstructured":"Boutilier, C. (1999). 
Sequential optimality and coordination in multiagent systems. In Proceedings of the 16th international joint conference on artificial intelligence (IJCAI\u201999) (pp. 478\u2013485)."},{"key":"9104_CR4","unstructured":"Boutilier, C. (1996). Planning, learning and coordination in multiagent decision processes. In Proceedings of the 6th conference on theoretical aspects of rationality and knowledge (TARK-96) (pp. 195\u2013210)"},{"key":"9104_CR5","unstructured":"Bowling, M. (2000). Convergence problems of general-sum multiagent reinforcement learning. In Proceedings of the 17th international conference on machine learning (ICML\u201900) (pp 89\u201394). Morgan Kaufman."},{"key":"9104_CR6","unstructured":"Bowling, M., & Veloso, M. (2000a). An analysis of stochastic game theory for multiagent reinforcement learning. Technical Report CMU-CS-00-165, School of Computer Science, Carnegie Mellon University."},{"key":"9104_CR7","unstructured":"Bowling, M., & Veloso, M. (2000b). Scalable learning in stochastic games. In Proceedings of the AAAI workshop on game theoretic and decision theoretic agents (GTDT\u201902) (pp. 11\u201318). The AAAI Press, Published as AAAI Technical Report WS-02-06."},{"key":"9104_CR8","unstructured":"Bowling, M., & Veloso, M. (2001). Rational and convergent learning in stochastic games. In Proceedings of the 17th international joint conference on artificial intelligence (IJCAI\u201901) (pp. 1021\u20131026)."},{"key":"9104_CR9","doi-asserted-by":"crossref","first-page":"215","DOI":"10.1016\/S0004-3702(02)00121-2","volume":"136","author":"M. Bowling","year":"2002","unstructured":"Bowling M., Veloso M. (2002) Multi-agent learning using a variable learning rate. Artificial Intelligence 136: 215\u2013250","journal-title":"Artificial Intelligence"},{"key":"9104_CR10","volume-title":"Some notes on computation of games solutions. Research Memoranda RM-125-PR","author":"G. W. Brown","year":"1949","unstructured":"Brown G. W. 
(1949) Some notes on computation of games solutions. Research Memoranda RM-125-PR. RAND Corporation, Santa Monica"},{"key":"9104_CR11","unstructured":"Claus, C., & Boutilier, C. (1998). The dynamics of reinforcement learning in cooperative multiagent systems. In Proceedings of the 15th national conference on artificial intelligence (AAAI\u201998) (pp. 746\u2013752)."},{"issue":"2\u20133","key":"9104_CR12","doi-asserted-by":"crossref","first-page":"235","DOI":"10.1023\/A:1007518724497","volume":"33","author":"R. H. Crites","year":"1998","unstructured":"Crites R. H., Barto A. G. (1998) Elevator group control using multiple reinforcement learning agents. Machine Learning 33(2\u20133): 235\u2013262","journal-title":"Machine Learning"},{"key":"9104_CR13","unstructured":"Duflo, M. (1997). Random iterative models. In Applications of Mathematics (Vol. 34). Springer."},{"issue":"11","key":"9104_CR14","doi-asserted-by":"crossref","first-page":"1275","DOI":"10.1109\/TC.1987.5009468","volume":"36","author":"E. H. Durfee","year":"1987","unstructured":"Durfee E. H., Lesser V. R., Corkill D. D. (1987) Coherent cooperation among communicating problem solvers. IEEE Transactions on Computers 36(11): 1275\u20131291","journal-title":"IEEE Transactions on Computers"},{"key":"9104_CR15","first-page":"1","volume":"5","author":"E. Even-Dar","year":"2003","unstructured":"Even-Dar E., Mansour Y. (2003) Learning rates for Q-learning. Journal of Machine Learning Research 5: 1\u201325","journal-title":"Journal of Machine Learning Research"},{"key":"9104_CR16","doi-asserted-by":"crossref","first-page":"49","DOI":"10.1613\/jair.1579","volume":"24","author":"P. Gmytrasiewicz","year":"2005","unstructured":"Gmytrasiewicz P., Doshi P. (2005) A framework for sequential planning in multiagent settings. Journal of Artificial Intelligence Research 24: 49\u201379","journal-title":"Journal of Artificial Intelligence Research"},{"key":"9104_CR17","doi-asserted-by":"crossref","unstructured":"Gordon, G. J. 
(1995). Stable function approximation in dynamic programming. Technical Report CMU-CS-95-103, School of Computer Science, Carnegie Mellon University.","DOI":"10.1016\/B978-1-55860-377-6.50040-2"},{"key":"9104_CR18","unstructured":"Guestrin, C., Lagoudakis, M. G., & Parr, R. (2002). Coordinated reinforcement learning. In Proceedings of the 19th international conference on machine learning (ICML\u201902) (pp. 227\u2013234)."},{"key":"9104_CR19","first-page":"1039","volume":"4","author":"J. Hu","year":"2003","unstructured":"Hu J., Wellman M. P. (2003) Nash Q-learning for general sum stochastic games. Journal of Machine Learning Research 4: 1039\u20131069","journal-title":"Journal of Machine Learning Research"},{"key":"9104_CR20","unstructured":"Kearns, M., & Singh, S. (1999). Finite-sample convergence rates for Q-learning and indirect algorithms. In M. J. Kearns, S. A. Solla, & D. A. Cohn, (Eds.), Advances in neural information processing systems (Vol. 11, pp. 996\u20131002). Cambridge, MA: MIT Press."},{"key":"9104_CR21","unstructured":"Kok J. R., Spaan, M. T. J., & Vlassis, N. (2002). An approach to noncommunicative multiagent coordination in continuous domains. In: M. Wiering, (Ed.), Benelearn 2002: Proceedings of the 12th Belgian\u2013Dutch conference on machine learning (pp. 46\u201352). Utrecht, The Netherlands."},{"key":"9104_CR22","doi-asserted-by":"crossref","first-page":"285","DOI":"10.1016\/j.geb.2005.08.005","volume":"56","author":"D. S. Leslie","year":"2006","unstructured":"Leslie D. S., Collins E. J. (2006) Generalised weakened fictitious play. Games and Economic Behavior 56: 285\u2013298","journal-title":"Games and Economic Behavior"},{"key":"9104_CR23","doi-asserted-by":"crossref","unstructured":"Littman, M. L. (1994). Markov games as a framework for multi-agent reinforcement learning. In R. L\u00f3pez de M\u00e1ntaras, & D. Poole (Eds.), Proceedings of the 11th international conference on machine learning (ICML\u201994) (pp. 157\u2013163). 
San Francisco, CA: Morgan Kaufmann.","DOI":"10.1016\/B978-1-55860-335-6.50027-1"},{"issue":"1","key":"9104_CR24","doi-asserted-by":"crossref","first-page":"55","DOI":"10.1016\/S1389-0417(01)00015-8","volume":"2","author":"M. L. Littman","year":"2001","unstructured":"Littman M. L. (2001) Value-function reinforcement learning in Markov games. Journal of Cognitive Systems Research 2(1): 55\u201366","journal-title":"Journal of Cognitive Systems Research"},{"key":"9104_CR25","unstructured":"Littman, M. L. (2001b). Friend-or-foe Q-learning in general-sum games. In Proceedings of the 18th international conference on machine learning (ICML\u201901) (pp. 322\u2013328). San Francisco, CA: Morgan Kaufmann."},{"key":"9104_CR26","unstructured":"Melo, F. S., & Ribeiro, M. I. (2007a). Rational and convergent model-free adaptive learning for team Markov games. Technical Report RT-601-07, Institute for Systems and Robotics, February."},{"key":"9104_CR27","doi-asserted-by":"crossref","unstructured":"Melo, F. S., & Ribeiro, M. I. (2007b). Learning to coordinate in topological navigation tasks. In Proceedings of the 6th IFAC symposium on intelligent autonomous vehicles (IAV\u201907) (to appear), September.","DOI":"10.3182\/20070903-3-FR-2921.00009"},{"key":"9104_CR28","unstructured":"Melo, F. S., & Ribeiro, M. I. (2008). Emerging coordination in infinite team Markov games. In Proceedings of the 7th international conference on autonomous agents and multiagent systems (AAMAS\u201908) (pp. 355\u2013362)."},{"key":"9104_CR29","unstructured":"Melo, F. S., & Veloso, M. (2009). Learning of coordination: Exploiting sparse interactions in multiagent systems. In Proceedings of the 8th international conference on autonomous agents and multiagent systems (AAMAS\u201908) (pp. 773\u2013780)."},{"key":"9104_CR30","doi-asserted-by":"crossref","unstructured":"Melo, F. S., Meyn, S. P., & Ribeiro, M. I. (2008). An analysis of reinforcement learning with function approximation. 
In Proceedings of the 25th international conference on machine learning (ICML\u201908) (pp. 664\u2013671).","DOI":"10.1145\/1390156.1390240"},{"key":"9104_CR31","doi-asserted-by":"crossref","unstructured":"Meyn, S. P., & Tweedie, R. L. (1993). Markov chains and stochastic stability. Communications and Control Engineering Series. New York: Springer.","DOI":"10.1007\/978-1-4471-3267-7"},{"key":"9104_CR32","doi-asserted-by":"crossref","first-page":"48","DOI":"10.1073\/pnas.36.1.48","volume":"36","author":"J. F. Nash","year":"1950","unstructured":"Nash J. F. (1950) Equilibrium points in n-person games. Proceedings of the National Academy of Sciences 36: 48\u201349","journal-title":"Proceedings of the National Academy of Sciences"},{"key":"9104_CR33","doi-asserted-by":"crossref","first-page":"161","DOI":"10.1023\/A:1017928328829","volume":"49","author":"D. Ormoneit","year":"2002","unstructured":"Ormoneit D., Sen \u015a. (2002) Kernel-based reinforcement learning. Machine Learning 49: 161\u2013178","journal-title":"Machine Learning"},{"key":"9104_CR34","doi-asserted-by":"crossref","first-page":"217","DOI":"10.1016\/S0304-4149(98)00029-5","volume":"78","author":"M. Pelletier","year":"1998","unstructured":"Pelletier M. (1998) On the almost sure asymptotic behaviour of stochastic algorithms. Stochastic Processes and their Applications 78: 217\u2013244","journal-title":"Stochastic Processes and their Applications"},{"key":"9104_CR35","first-page":"1595","volume-title":"Advances in neural information processing systems","author":"T. J. Perkins","year":"2003","unstructured":"Perkins T. J., Precup D. (2003) A convergent form of approximate policy iteration. In: Thrun S., Becker S., Obermayer K. (eds) Advances in neural information processing systems. MIT Press, Cambridge, MA, pp 1595\u20131602"},{"key":"9104_CR36","doi-asserted-by":"crossref","first-page":"296","DOI":"10.2307\/1969530","volume":"54","author":"J. Robinson","year":"1951","unstructured":"Robinson J. 
(1951) An iterative method of solving a game. Annals of Mathematics 54: 296\u2013301","journal-title":"Annals of Mathematics"},{"key":"9104_CR37","doi-asserted-by":"crossref","unstructured":"Samuel, A. L. (1959). Some studies in machine learning using the game of checkers. IBM Journal of Research and Development, 3(3), 210\u2013229. Reprinted in IBM Journal of Research and Development, 44(1\/2), 206\u2013226, 2000.","DOI":"10.1147\/rd.33.0210"},{"key":"9104_CR38","doi-asserted-by":"crossref","first-page":"601","DOI":"10.1147\/rd.116.0601","volume":"11","author":"A. L. Samuel","year":"1967","unstructured":"Samuel A. L. (1967) Some studies in machine learning using the game of checkers II: Recent progress. IBM Journal of Research and Development 11: 601\u2013617","journal-title":"IBM Journal of Research and Development"},{"key":"9104_CR39","first-page":"259","volume-title":"Learning in multiagent systems, chapter 6","author":"S. Sen","year":"1999","unstructured":"Sen S., Wei\u00df G. (1999) Learning in multiagent systems, chapter 6. MIT Press, Cambridge, MA, pp 259\u2013298"},{"key":"9104_CR40","unstructured":"Singh, S. P., Jaakkola, T., & Jordan, M. I. (1994). Reinforcement learning with soft state aggregation. In Advances in neural information processing systems (Vol. 7, pp. 361\u2013368). Cambridge, MA: MIT Press."},{"key":"9104_CR41","unstructured":"Singh, S. P., Kearns, M., & Mansour, Y. (2000). Nash convergence of gradient dynamics in general-sum games. In Proceedings of the 16th conference on uncertainty in artificial intelligence (UAI\u201900) (pp. 541\u2013548)."},{"key":"9104_CR42","volume-title":"Reinforcement learning: An introduction. Adaptive computation and machine learning series","author":"R. S. Sutton","year":"1998","unstructured":"Sutton R. S., Barto A. G. (1998) Reinforcement learning: An introduction. Adaptive computation and machine learning series (3rd ed.). 
MIT Press, Cambridge, MA","edition":"3"},{"key":"9104_CR43","first-page":"1064","volume":"10","author":"C. Szepesv\u00e1ri","year":"1997","unstructured":"Szepesv\u00e1ri C. (1997) The asymptotic convergence rates for Q-learning. Proceedings of Neural Information Processing Systems (NIPS\u201997) 10: 1064\u20131070","journal-title":"Proceedings of Neural Information Processing Systems (NIPS\u201997)"},{"issue":"8","key":"9104_CR44","doi-asserted-by":"crossref","first-page":"2017","DOI":"10.1162\/089976699300016070","volume":"11","author":"C. Szepesv\u00e1ri","year":"1999","unstructured":"Szepesv\u00e1ri C., Littman M. L. (1999) A unified analysis of value-function-based reinforcement learning algorithms. Neural Computation 11(8): 2017\u20132059","journal-title":"Neural Computation"},{"key":"9104_CR45","unstructured":"Szepesv\u00e1ri, C., & Smart, W. D. (2004). Interpolation-based Q-learning. In Proceedings of the 21st international conference on machine learning (ICML\u201904) (pp. 100\u2013107). New York, USA: ACM Press, July."},{"issue":"2","key":"9104_CR46","doi-asserted-by":"crossref","first-page":"215","DOI":"10.1162\/neco.1994.6.2.215","volume":"6","author":"G. Tesauro","year":"1994","unstructured":"Tesauro G. (1994) TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation 6(2): 215\u2013219","journal-title":"Neural Computation"},{"issue":"3","key":"9104_CR47","doi-asserted-by":"crossref","first-page":"58","DOI":"10.1145\/203330.203343","volume":"38","author":"G. Tesauro","year":"1995","unstructured":"Tesauro G. (1995) Temporal difference learning and TD-Gammon. Communications of the ACM 38(3): 58\u201368","journal-title":"Communications of the ACM"},{"issue":"2\u20133","key":"9104_CR48","first-page":"111","volume":"49","author":"H. Tong","year":"2000","unstructured":"Tong H., Brown T. X. (2000) Reinforcement learning for call admission control and routing under quality of service constraints in multimedia networks. 
Machine Learning 49(2\u20133): 111\u2013139","journal-title":"Machine Learning"},{"issue":"5","key":"9104_CR49","doi-asserted-by":"crossref","first-page":"440","DOI":"10.1109\/TAC.1985.1103988","volume":"30","author":"J. N. Tsitsiklis","year":"1985","unstructured":"Tsitsiklis J. N., Athans M. (1985) On the complexity of decentralized decision making and detection problems. IEEE Transactions on Automatic Control AC 30(5): 440\u2013446","journal-title":"IEEE Transactions on Automatic Control AC"},{"key":"9104_CR50","first-page":"59","volume":"22","author":"J. N. Tsitsiklis","year":"1996","unstructured":"Tsitsiklis J. N., Van Roy B. (1996) Feature-based methods for large scale dynamic programming. Machine Learning 22: 59\u201394","journal-title":"Machine Learning"},{"issue":"5","key":"9104_CR51","doi-asserted-by":"crossref","first-page":"674","DOI":"10.1109\/9.580874","volume":"42","author":"J. N. Tsitsiklis","year":"1996","unstructured":"Tsitsiklis J. N., Van Roy B. (1996) An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control 42(5): 674\u2013690","journal-title":"IEEE Transactions on Automatic Control"},{"key":"9104_CR52","unstructured":"Uther, W., & Veloso, M. (2003). Adversarial reinforcement learning. Technical Report CMU-CS-03-107, School of Computer Science, Carnegie Mellon University, January."},{"key":"9104_CR53","first-page":"1571","volume-title":"Advances in neural information processing systems","author":"X. Wang","year":"2003","unstructured":"Wang X., Sandholm T. (2003) Reinforcement learning to play an optimal Nash equilibrium in team Markov games. In: Becker S., Thrun S., Obermayer K. (eds) Advances in neural information processing systems. MIT Press, Cambridge, MA, pp 1571\u20131578"},{"key":"9104_CR54","unstructured":"Watkins, C. J. C. H. (1989). Learning from delayed rewards. 
PhD thesis, King\u2019s College, University of Cambridge, May."},{"issue":"1","key":"9104_CR55","doi-asserted-by":"crossref","first-page":"57","DOI":"10.2307\/2951778","volume":"61","author":"H. P. Young","year":"1993","unstructured":"Young H. P. (1993) The evolution of conventions. Econometrica 61(1): 57\u201384","journal-title":"Econometrica"}],"container-title":["Autonomous Agents and Multi-Agent Systems"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/s10458-009-9104-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1007\/s10458-009-9104-y\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/s10458-009-9104-y","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2019,5,29]],"date-time":"2019-05-29T17:28:25Z","timestamp":1559150905000},"score":1,"resource":{"primary":{"URL":"http:\/\/link.springer.com\/10.1007\/s10458-009-9104-y"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2009,8,20]]},"references-count":55,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2010,11]]}},"alternative-id":["9104"],"URL":"https:\/\/doi.org\/10.1007\/s10458-009-9104-y","relation":{},"ISSN":["1387-2532","1573-7454"],"issn-type":[{"value":"1387-2532","type":"print"},{"value":"1573-7454","type":"electronic"}],"subject":[],"published":{"date-parts":[[2009,8,20]]}}}