{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,30]],"date-time":"2026-04-30T04:05:00Z","timestamp":1777521900246,"version":"3.51.4"},"reference-count":26,"publisher":"SAGE Publications","issue":"6","license":[{"start":{"date-parts":[[2008,12,1]],"date-time":"2008-12-01T00:00:00Z","timestamp":1228089600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Adaptive Behavior"],"published-print":{"date-parts":[[2008,12]]},"abstract":"<jats:p>In this article, we explore an evolutionary approach to the optimization of potential-based shaping rewards and meta-parameters in reinforcement learning. Shaping rewards is a frequently used approach to increase the learning performance of reinforcement learning, with regards to both initial performance and convergence speed. Shaping rewards provide additional knowledge to the agent in the form of richer reward signals, which guide learning to high-rewarding states. Reinforcement learning depends critically on a few meta-parameters that modulate the learning updates or the exploration of the environment, such as the learning rate \u03b1, the discount factor of future rewards \u03b3, and the temperature \u03c4 that controls the trade-off between exploration and exploitation in softmax action selection. We validate the proposed approach in simulation using the mountain-car task. 
We also transfer shaping rewards and meta-parameters, evolutionarily obtained in simulation, to hardware, using a robotic foraging task.<\/jats:p>","DOI":"10.1177\/1059712308092835","type":"journal-article","created":{"date-parts":[[2008,11,13]],"date-time":"2008-11-13T08:10:00Z","timestamp":1226563800000},"page":"400-412","source":"Crossref","is-referenced-by-count":22,"title":["Co-evolution of Shaping Rewards and Meta-Parameters in Reinforcement Learning"],"prefix":"10.1177","volume":"16","author":[{"given":"Stefan","family":"Elfwing","sequence":"first","affiliation":[{"name":"Centre for Autonomous Systems, Numerical Analysis and Computer Science, KTH, Sweden, Neural Computation Unit, Okinawa Institute of Science and Technology, Japan,"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Eiji","family":"Uchibe","sequence":"additional","affiliation":[{"name":"Neural Computation Unit, Okinawa Institute of Science and Technology, Japan,"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Kenji","family":"Doya","sequence":"additional","affiliation":[{"name":"Neural Computation Unit, Okinawa Institute of Science and Technology, Japan,"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Henrik I.","family":"Christensen","sequence":"additional","affiliation":[{"name":"Centre for Autonomous Systems, Numerical Analysis and Computer Science, KTH, Sweden,"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"179","published-online":{"date-parts":[[2008,12,1]]},"reference":[{"key":"atypb1","unstructured":"Ackley, D.H. & Littman, M.L. (1991). Interactions between learning and evolution. In C. G. Langton, C. Taylor, C. D. Farmer, & S. Rasmussen (Eds.), Artificial Life II: Santa Fe Institute Studies in the Sciences of Complexity (Vol. 10, pp. 487-509). 
Redwood City, CA: Addison-Wesley."},{"key":"atypb2","volume-title":"Theories of learning","author":"Bower, G.H.","year":"1981","edition":"5"},{"key":"atypb3","doi-asserted-by":"publisher","DOI":"10.1177\/105971230501300206"},{"key":"atypb4","doi-asserted-by":"publisher","DOI":"10.1109\/CEC.2005.1554969"},{"key":"atypb5","volume-title":"Darwinian embodied evolution of the learning ability for survival","author":"Elfwing, S.","year":"2007"},{"key":"atypb6","doi-asserted-by":"publisher","DOI":"10.1109\/TEVC.2006.890270"},{"key":"atypb7","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2003.1250664"},{"key":"atypb8","doi-asserted-by":"publisher","DOI":"10.1145\/1143844.1143906"},{"key":"atypb9","unstructured":"Laud, A. & DeJong, G. (2002). Reinforcement learning and shaping: Encouraging intended behaviors. In Proceedings of the International Conference on Machine Learning, ICML2002 (pp. 355-362). San Francisco, CA: Morgan Kaufmann."},{"key":"atypb10","unstructured":"Laud, A. & DeJong, G. (2003). The influence of reward on the speed of reinforcement learning: An analysis of shaping. In Proceedings of the International Conference on Machine Learning, ICML2003 (pp. 440-447). San Francisco, CA: Morgan Kaufmann."},{"key":"atypb11","doi-asserted-by":"publisher","DOI":"10.1145\/1273496.1273572"},{"key":"atypb12","doi-asserted-by":"publisher","DOI":"10.1016\/B978-1-55860-335-6.50030-1"},{"key":"atypb13","doi-asserted-by":"publisher","DOI":"10.1023\/A:1008819414322"},{"key":"atypb14","doi-asserted-by":"publisher","DOI":"10.1613\/jair.613"},{"key":"atypb15","unstructured":"Ng, A.Y., Harada, D. & Russell, S.J. (1999). Policy invariance under reward transformations: Theory and application to reward shaping. In Proceedings of the International Conference on Machine Learning, ICML1999 (pp. 278-287). San Francisco, CA: Morgan Kaufmann."},{"key":"atypb16","doi-asserted-by":"crossref","DOI":"10.7551\/mitpress\/2889.001.0001","volume-title":"Evolutionary robotics. 
The biology, intelligence, and technology of self-organizing machines","author":"Nolfi, S.","year":"2000"},{"key":"atypb17","volume-title":"Learning to drive a bicycle using reinforcement learning and shaping","author":"Randl\u00f8v, J.","year":"1998"},{"key":"atypb18","volume-title":"On-line Q-learning using connectionist systems. Technical Report CUED\/F-INFENG\/TR 166","author":"Rummery, G.A.","year":"1994"},{"key":"atypb19","doi-asserted-by":"publisher","DOI":"10.1007\/BF00114726"},{"key":"atypb20","volume-title":"The behavior of organisms: An experimental analysis","author":"Skinner, B.F.","year":"1938"},{"key":"atypb21","unstructured":"Stanley, K.O. & Miikkulainen, R. (2002). Efficient reinforcement learning through evolving neural network topologies. In Proceedings of the Genetic and Evolutionary Computation Conference, GECCO2002 (pp. 569-577). San Francisco, CA: Morgan Kaufmann."},{"key":"atypb22","unstructured":"Sutton, R.S. (1996). Generalization in reinforcement learning: Successful examples using sparse coarse coding. In D. S. Touretzky, M. C. Mozer, & M. E. Hasselmo (Eds.), Advances in Neural Information Processing Systems 8 (pp. 1038-1044). 
Cambridge, MA: MIT Press."},{"key":"atypb23","volume-title":"Reinforcement learning: An introduction","author":"Sutton, R.S.","year":"1998"},{"key":"atypb24","doi-asserted-by":"publisher","DOI":"10.1016\/S0921-8890(02)00170-7"},{"key":"atypb25","first-page":"877","volume":"7","author":"Whiteson, S.","year":"2006","journal-title":"Journal of Machine Learning Research"},{"key":"atypb26","doi-asserted-by":"publisher","DOI":"10.1613\/jair.1190"}],"container-title":["Adaptive Behavior"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1059712308092835","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1059712308092835","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,28]],"date-time":"2026-04-28T16:15:43Z","timestamp":1777392943000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/1059712308092835"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2008,12]]},"references-count":26,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2008,12]]}},"alternative-id":["10.1177\/1059712308092835"],"URL":"https:\/\/doi.org\/10.1177\/1059712308092835","relation":{},"ISSN":["1059-7123","1741-2633"],"issn-type":[{"value":"1059-7123","type":"print"},{"value":"1741-2633","type":"electronic"}],"subject":[],"published":{"date-parts":[[2008,12]]}}}