{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,30]],"date-time":"2026-04-30T04:04:13Z","timestamp":1777521853419,"version":"3.51.4"},"reference-count":60,"publisher":"SAGE Publications","issue":"1","license":[{"start":{"date-parts":[[2007,3,1]],"date-time":"2007-03-01T00:00:00Z","timestamp":1172707200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Adaptive Behavior"],"published-print":{"date-parts":[[2007,3]]},"abstract":"<jats:p>To excel in challenging tasks, intelligent agents need sophisticated mechanisms for action selection: they need policies that dictate what action to take in each situation. Reinforcement learning (RL) algorithms are designed to learn such policies given only positive and negative rewards. Two contrasting approaches to RL that are currently in popular use are temporal difference (TD) methods, which learn value functions, and evolutionary methods, which optimize populations of candidate policies. Both approaches have had practical successes but few studies have directly compared them. Hence, there are no general guidelines describing their relative strengths and weaknesses. In addition, there has been little cross-collaboration, with few attempts to make them work together or to apply ideas from one to the other. In this article we aim to address these shortcomings via three empirical studies that compare these methods and investigate new ways of making them work together.<\/jats:p>\n                  <jats:p>First, we compare the two approaches in a benchmark task and identify variations of the task that isolate factors critical to the performance of each method. 
Second, we investigate ways to make evolutionary algorithms excel at on-line tasks by borrowing exploratory mechanisms traditionally used by TD methods. We present empirical results demonstrating a dramatic performance improvement. Third, we explore a novel way of making evolutionary and TD methods work together by using evolution to automatically discover good representations for TD function approximators. We present results demonstrating that this novel approach can outperform both TD and evolutionary methods alone.<\/jats:p>","DOI":"10.1177\/1059712306076253","type":"journal-article","created":{"date-parts":[[2007,2,27]],"date-time":"2007-02-27T09:20:35Z","timestamp":1172568035000},"page":"33-50","source":"Crossref","is-referenced-by-count":34,"title":["Empirical Studies in Action Selection with Reinforcement Learning"],"prefix":"10.1177","volume":"15","author":[{"given":"Shimon","family":"Whiteson","sequence":"first","affiliation":[{"name":"Department of Computer Sciences, University of Texas, Austin, USA,"}]},{"given":"Matthew E.","family":"Taylor","sequence":"additional","affiliation":[{"name":"Department of Computer Sciences, University of Texas, Austin, USA"}]},{"given":"Peter","family":"Stone","sequence":"additional","affiliation":[{"name":"Department of Computer Sciences, University of Texas, Austin, USA"}]}],"member":"179","published-online":{"date-parts":[[2007,3,1]]},"reference":[{"key":"atypb1","first-page":"487","volume":"10","author":"Ackley, D.","year":"1991","journal-title":"Artificial Life II, SFI Studies in the Sciences of Complexity"},{"key":"atypb2","doi-asserted-by":"publisher","DOI":"10.1023\/A:1013689704352"},{"key":"atypb3","volume-title":"Advances in neural information processing systems 11","author":"Baird, L.","year":"1999"},{"key":"atypb4","doi-asserted-by":"publisher","DOI":"10.1086\/276408"},{"key":"atypb5","first-page":"777","volume-title":"Proceedings of the 2002 World Congress on Evolutionary Computation","author":"Beielstein, 
T."},{"key":"atypb6","first-page":"221","volume":"16","author":"Bellman, R. E.","year":"1956","journal-title":"Sankhya"},{"key":"atypb7","volume-title":"Evolving artificial neural networks using the \u201cBaldwin Effect\u201d","author":"Boers, E.","year":"1995"},{"key":"atypb8","volume-title":"Advances in neural information processing systems 7","author":"Boyan, J. A.","year":"1995"},{"key":"atypb9","doi-asserted-by":"publisher","DOI":"10.1023\/A:1007518724497"},{"key":"atypb10","doi-asserted-by":"publisher","DOI":"10.1023\/A:1011953410319"},{"key":"atypb11","first-page":"36","volume-title":"Proceedings of the Symposium on Computational Intelligence and Learning (CoIL-2000)","author":"Giraud-Carrier, C."},{"key":"atypb12","volume-title":"Robust nonlinear control through neuroevolution","author":"Gomez, F.","year":"2002"},{"key":"atypb13","first-page":"491","volume-title":"Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2005)","author":"Gomez, F."},{"key":"atypb14","first-page":"495","volume":"1","author":"Hinton, G. E.","year":"1987","journal-title":"Complex Systems"},{"key":"atypb15","volume-title":"Adaptation in natural and artificial systems: An introductory analysis with applications to biology, control and artificial intelligence","author":"Holland, J. H.","year":"1975"},{"key":"atypb16","first-page":"764","volume-title":"Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2002)","author":"Hsu, W. H."},{"key":"atypb17","doi-asserted-by":"publisher","DOI":"10.7551\/mitpress\/4168.001.0001"},{"key":"atypb18","doi-asserted-by":"publisher","DOI":"10.1109\/MC.2003.1160055"},{"key":"atypb19","first-page":"611","volume-title":"Proceedings of the 19th National Conference on Artificial Intelligence","author":"Kohl, N."},{"key":"atypb20","first-page":"1008","volume-title":"Advances in neural information processing systems 11","author":"Konda, V. 
R.","year":"1999"},{"key":"atypb21","volume-title":"Proceedings of the IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS 2001)","author":"Kostiadis, K."},{"key":"atypb22","volume-title":"Proceedings of the International Conference on Neural Networks","author":"Kretchmar, R. M."},{"key":"atypb23","doi-asserted-by":"publisher","DOI":"10.1007\/BF00992699"},{"key":"atypb24","doi-asserted-by":"publisher","DOI":"10.1109\/4235.728210"},{"key":"atypb25","volume-title":"Proceedings of the 20th National Conference on Artificial Intelligence","author":"Mahadevan, S."},{"key":"atypb26","first-page":"760","volume-title":"Proceedings of the 7th International Conference on Genetic Algorithms","author":"McQuesten, P."},{"key":"atypb27","doi-asserted-by":"publisher","DOI":"10.1007\/BF00114722"},{"key":"atypb28","doi-asserted-by":"publisher","DOI":"10.1613\/jair.613"},{"key":"atypb29","first-page":"406","volume-title":"Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence","author":"Ng, A. Y."},{"key":"atypb30","doi-asserted-by":"publisher","DOI":"10.1177\/105971239400300102"},{"key":"atypb31","first-page":"92","volume-title":"Artificial Life V: Proceedings of the 5th International Workshop on the Synthesis and Simulation of Living Systems","author":"Pollack, J. B.","year":"1997"},{"key":"atypb32","first-page":"143","volume-title":"Algorithms for Approximation","author":"Powell, M. J. D.","year":"1987"},{"key":"atypb33","author":"Prescott, T. J.","journal-title":"Philosophical Transactions of the Royal Society B: Biological Sciences"},{"key":"atypb34","first-page":"70","volume-title":"Proceedings of the 3rd International Symposium on Adaptive Systems: Evolutionary Computation and Probabilistic Graphical Models","author":"Pyeatt, L. 
D."},{"key":"atypb35","first-page":"317","volume-title":"Proceedings of the 16th European Conference on Machine Learning","author":"Riedmiller, M."},{"key":"atypb36","first-page":"632","volume-title":"Proceedings of the 20th International Conference on Machine Learning","author":"Rivest, F."},{"key":"atypb37","volume-title":"On-line Q-learning using connectionist systems","author":"Rummery, G. A.","year":"1994"},{"key":"atypb38","doi-asserted-by":"publisher","DOI":"10.1109\/TEVC.2005.856212"},{"key":"atypb39","first-page":"194","volume-title":"Proceedings of the Symposium on Abstraction, Reformulation and Approximation (SARA 2005), Lecture Notes in Artificial Intelligence","author":"Sherstov, A. A."},{"key":"atypb40","doi-asserted-by":"publisher","DOI":"10.1007\/BF00114726"},{"key":"atypb41","first-page":"903","volume-title":"Proceedings of the 17th International Conference on Machine Learning","author":"Smart, W. D."},{"key":"atypb42","doi-asserted-by":"publisher","DOI":"10.1007\/BFb0056862"},{"key":"atypb43","first-page":"2557","volume-title":"Proceedings of the 2003 Congress on Evolutionary Computation (CEC 2003)","author":"Stanley, K. O."},{"key":"atypb44","doi-asserted-by":"publisher","DOI":"10.1162\/106365602320169811"},{"key":"atypb45","doi-asserted-by":"publisher","DOI":"10.1613\/jair.1338"},{"key":"atypb46","volume-title":"Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2004)","author":"Stanley, K. O."},{"key":"atypb47","doi-asserted-by":"publisher","DOI":"10.1177\/105971230501300301"},{"key":"atypb48","doi-asserted-by":"publisher","DOI":"10.1007\/11780519_9"},{"key":"atypb49","doi-asserted-by":"publisher","DOI":"10.1007\/BF00115009"},{"key":"atypb50","first-page":"1057","volume-title":"Advances in neural information processing systems","author":"Sutton, R.","year":"2000"},{"key":"atypb51","first-page":"1038","volume-title":"Advances in neural information processing systems 8","author":"Sutton, R. 
S.","year":"1996"},{"key":"atypb52","volume-title":"Reinforcement Learning: An Introduction","author":"Sutton, R. S.","year":"1998"},{"key":"atypb53","first-page":"1321","volume-title":"Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2006)","author":"Taylor, M. E."},{"key":"atypb54","first-page":"70","volume-title":"Proceedings of the International Conference on Autonomic Computing","author":"Walsh, W. E."},{"key":"atypb55","unstructured":"Watkins, C. (1989). Learning from Delayed Rewards. Ph.D. Thesis, King\u2019s College, Cambridge."},{"key":"atypb56","first-page":"877","volume":"7","author":"Whiteson, S.","year":"2006","journal-title":"Journal of Machine Learning Research"},{"key":"atypb57","first-page":"518","volume-title":"Proceedings of the 21st National Conference on Artificial Intelligence (AAAI 2006)","author":"Whiteson, S."},{"key":"atypb58","doi-asserted-by":"publisher","DOI":"10.1007\/s10994-005-0460-9"},{"key":"atypb59","doi-asserted-by":"publisher","DOI":"10.1023\/A:1022674030396"},{"key":"atypb60","doi-asserted-by":"publisher","DOI":"10.1109\/5.784219"}],"container-title":["Adaptive 
Behavior"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1059712306076253","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1059712306076253","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,28]],"date-time":"2026-04-28T16:15:34Z","timestamp":1777392934000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/1059712306076253"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2007,3]]},"references-count":60,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2007,3]]}},"alternative-id":["10.1177\/1059712306076253"],"URL":"https:\/\/doi.org\/10.1177\/1059712306076253","relation":{},"ISSN":["1059-7123","1741-2633"],"issn-type":[{"value":"1059-7123","type":"print"},{"value":"1741-2633","type":"electronic"}],"subject":[],"published":{"date-parts":[[2007,3]]}}}