{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,30]],"date-time":"2026-01-30T08:16:52Z","timestamp":1769761012938,"version":"3.49.0"},"reference-count":52,"publisher":"SAGE Publications","issue":"3","license":[{"start":{"date-parts":[[2005,9,1]],"date-time":"2005-09-01T00:00:00Z","timestamp":1125532800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Adaptive Behavior"],"published-print":{"date-parts":[[2005,9]]},"abstract":"<jats:p> RoboCup simulated soccer presents many challenges to reinforcement learning methods, including a large state space, hidden and uncertain state, multiple independent agents learning simultaneously, and long and variable delays in the effects of actions. We describe our application of episodic SMDP Sarsa(\u03bb) with linear tile-coding function approximation and variable \u03bb to learning higher-level decisions in a keepaway subtask of RoboCup soccer. In keepaway, one team, \u201cthe keepers,\u201d tries to keep control of the ball for as long as possible despite the efforts of \u201cthe takers.\u201d The keepers learn individually when to hold the ball and when to pass to a teammate. Our agents learned policies that significantly outperform a range of benchmark policies. We demonstrate the generality of our approach by applying it to a number of task variations including different field sizes and different numbers of players on each team. <\/jats:p>","DOI":"10.1177\/105971230501300301","type":"journal-article","created":{"date-parts":[[2005,9,23]],"date-time":"2005-09-23T09:57:54Z","timestamp":1127469474000},"page":"165-188","source":"Crossref","is-referenced-by-count":227,"title":["Reinforcement Learning for RoboCup Soccer Keepaway"],"prefix":"10.1177","volume":"13","author":[{"given":"Peter","family":"Stone","sequence":"first","affiliation":[{"name":"Department of Computer Sciences, The University of Texas at Austin,"}]},{"given":"Richard S.","family":"Sutton","sequence":"additional","affiliation":[{"name":"Department of Computing Science, University of Alberta,"}]},{"given":"Gregory","family":"Kuhlmann","sequence":"additional","affiliation":[{"name":"Department of Computer Sciences, The University of Texas at Austin,"}]}],"member":"179","published-online":{"date-parts":[[2005,9,1]]},"reference":[{"key":"atypb1","volume-title":"Brains, behavior, and robotics","author":"Albus, J. S.","year":"1981"},{"key":"atypb2","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-64473-3_74"},{"key":"atypb3","first-page":"1019","volume-title":"Advances in neural information processing systems","author":"Andre, D.","year":"2001"},{"key":"atypb4","first-page":"119","volume-title":"Proceedings of the 18th National Conference on Artificial IntelligenceMento Park","author":"Andre, D."},{"key":"atypb5","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-48422-1_28"},{"key":"atypb6","first-page":"1615","volume-title":"International Conference on Robotics and Automation","author":"Bagnell, J. A."},{"key":"atypb7","first-page":"968","volume-title":"Advances in neural information processing systems","author":"Baird, L. C.","year":"1999"},{"key":"atypb8","volume-title":"Teambots","author":"Balch, T.","year":"2000"},{"key":"atypb9","volume-title":"Teambots domain: Soccerbots","author":"Balch, T.","year":"2000"},{"key":"atypb10","doi-asserted-by":"publisher","DOI":"10.1613\/jair.575"},{"key":"atypb11","first-page":"393","volume-title":"Advances in neural information processing systems","author":"Bradtke, S. J.","year":"1995"},{"key":"atypb12","volume-title":"Users manual: RoboCup soccer server manual for soccer server version 7.07 and later","author":"Chen, M.","year":"2003"},{"key":"atypb13","first-page":"1017","volume-title":"Advances in neural information processing systems","author":"Crites, R. H.","year":"1996"},{"key":"atypb14","first-page":"67","volume-title":"Machine learning methods for planning and scheduling","author":"Dean, T.","year":"1992"},{"key":"atypb15","doi-asserted-by":"publisher","DOI":"10.1613\/jair.639"},{"key":"atypb16","first-page":"1040","volume-title":"Advances in neural information processing systems","author":"Gordon, G.","year":"2001"},{"key":"atypb17","first-page":"1523","volume-title":"Advances in neural information processing systems","author":"Guestrin, C.","year":"2002"},{"key":"atypb18","first-page":"764","volume-title":"Genetic and Evolutionary Computation Conference (New York)","author":"Hsu, W. H."},{"key":"atypb19","first-page":"24","volume-title":"Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence","author":"Kitano, H."},{"key":"atypb20","first-page":"1332","volume-title":"Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence (IJCAI-99)","author":"Koller, D."},{"key":"atypb21","first-page":"530","volume-title":"IEEE Transactions on Neural Networks","author":"Lin, C.-S.","year":"1991"},{"key":"atypb22","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-64473-3_76"},{"key":"atypb23","doi-asserted-by":"publisher","DOI":"10.1023\/A:1010027016147"},{"key":"atypb24","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-45324-5_35"},{"key":"atypb25","first-page":"570","volume-title":"PRICAI\u201996: Topics in Artificial Intelligence (Proceedings of the Fourth Pacific Rim International Conference on Artificial Intelligence)","author":"Noda, I."},{"key":"atypb26","doi-asserted-by":"publisher","DOI":"10.1080\/088395198117848"},{"key":"atypb27","first-page":"1595","volume-title":"Advances in neural information processing systems","author":"Perkins, T. J.","year":"2003"},{"key":"atypb28","first-page":"1065","volume-title":"GECCO 2002: Proceedings of the Genetic and Evolutionary Computation Conference","author":"Pietro, A. D."},{"key":"atypb29","doi-asserted-by":"publisher","DOI":"10.1002\/9780470316887"},{"key":"atypb30","volume-title":"C4.5: Programs for machine learning","author":"Quinlan, J. R.","year":"1993"},{"key":"atypb31","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-45324-5_40"},{"key":"atypb32","volume-title":"RoboCup-2002: Robot soccer world cup VI","author":"Riedmiller, M.","year":"2003"},{"key":"atypb33","volume-title":"On-line Q-learning using connectionist systems","author":"Rummery, G. A.","year":"1994"},{"key":"atypb34","doi-asserted-by":"publisher","DOI":"10.7551\/mitpress\/4151.001.0001"},{"key":"atypb35","first-page":"316","volume-title":"Proceedings of the Fifth International Conference on Autonomous Agents","author":"Stone, P."},{"key":"atypb36","first-page":"537","volume-title":"Proceedings of the Eighteenth International Conference on Machine Learning","author":"Stone, P."},{"key":"atypb37","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-45603-1_22"},{"key":"atypb38","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-45324-5_23"},{"key":"atypb39","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-48422-1_21"},{"key":"atypb40","first-page":"1038","volume-title":"Advances in neural information processing systems","author":"Sutton, R. S.","year":"1996"},{"key":"atypb41","volume-title":"Reinforcement learning: An introduction","author":"Sutton, R. S.","year":"1998"},{"key":"atypb42","first-page":"1057","volume-title":"Advances in neural information processing systems","author":"Sutton, R.","year":"2000"},{"key":"atypb43","doi-asserted-by":"publisher","DOI":"10.1016\/S0004-3702(99)00052-1"},{"key":"atypb44","first-page":"330","volume-title":"Proceedings of the Tenth International Conference on Machine Learning","author":"Tan, M."},{"key":"atypb45","first-page":"53","volume-title":"The Fourth International Joint Conference on Autonomous Agents and Multiagent Systems","author":"Taylor, M. E."},{"key":"atypb46","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1994.6.2.215"},{"key":"atypb47","doi-asserted-by":"publisher","DOI":"10.1109\/9.580874"},{"key":"atypb48","unstructured":"Uchibe, E. (1999). Cooperative behavior acquisition by learning and                     evolution in a multi-agent environment for mobile robots. Ph.D. thesis,                 Osaka University."},{"key":"atypb49","first-page":"1122","volume-title":"Proceedings of the Genetic and Evolutionary Computation Conference","author":"Uchibe, E."},{"key":"atypb50","volume-title":"Proceedings of SPIE Sensor Fusion and Decentralized Control in Robotic Systems II","author":"Veloso, M."},{"key":"atypb51","unstructured":"Watkins, C. J. C. H. (1989). Learning from delayed rewards. Ph.D.                 thesis, King\u2019s College, Cambridge."},{"key":"atypb52","first-page":"193","volume-title":"Second International Joint Conference on Autonomous Agents and Multiagent Systems","author":"Whiteson, S."}],"container-title":["Adaptive Behavior"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/105971230501300301","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/105971230501300301","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,1,28]],"date-time":"2025-01-28T02:21:03Z","timestamp":1738030863000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/105971230501300301"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2005,9]]},"references-count":52,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2005,9]]}},"alternative-id":["10.1177\/105971230501300301"],"URL":"https:\/\/doi.org\/10.1177\/105971230501300301","relation":{},"ISSN":["1059-7123","1741-2633"],"issn-type":[{"value":"1059-7123","type":"print"},{"value":"1741-2633","type":"electronic"}],"subject":[],"published":{"date-parts":[[2005,9]]}}}