{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,2]],"date-time":"2026-05-02T10:08:50Z","timestamp":1777716530748,"version":"3.51.4"},"reference-count":26,"publisher":"SAGE Publications","issue":"2","license":[{"start":{"date-parts":[[2014,12,16]],"date-time":"2014-12-16T00:00:00Z","timestamp":1418688000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["The International Journal of Robotics Research"],"published-print":{"date-parts":[[2015,2]]},"abstract":"<jats:p>This paper examines temporal difference reinforcement learning with adaptive and directed exploration for resource-limited missions. The scenario considered is that of an unpowered aerial glider learning to perform energy-gaining flight trajectories in a thermal updraft. The presented algorithm, eGP-SARSA(\u03bb), uses a Gaussian process regression model to estimate the value function in a reinforcement learning framework. The Gaussian process also provides a variance on these estimates that is used to measure the contribution of future observations to the Gaussian process value function model in terms of information gain. To avoid myopic exploration we developed a resource-weighted objective function that combines an estimate of the future information gain using an action rollout with the estimated value function to generate directed explorative action sequences. A number of modifications and computational speed-ups to the algorithm are presented along with a standard GP-SARSA(\u03bb) implementation with \u03b5-greedy exploration to compare the respective learning performances. 
The results show that under this objective function, the learning agent is able to continue exploring for better state-action trajectories when platform energy is high and follow conservative energy-gaining trajectories when platform energy is low.<\/jats:p>","DOI":"10.1177\/0278364914553683","type":"journal-article","created":{"date-parts":[[2014,12,16]],"date-time":"2014-12-16T23:19:38Z","timestamp":1418771978000},"page":"158-172","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":30,"title":["Learning to soar: Resource-constrained exploration in reinforcement learning"],"prefix":"10.1177","volume":"34","author":[{"given":"Jen Jen","family":"Chung","sequence":"first","affiliation":[{"name":"Australian Centre for Field Robotics, School of Aerospace, Mechanical and Mechatronic Engineering, The University of Sydney, Australia"}]},{"given":"Nicholas R.J.","family":"Lawrance","sequence":"additional","affiliation":[{"name":"Australian Centre for Field Robotics, School of Aerospace, Mechanical and Mechatronic Engineering, The University of Sydney, Australia"}]},{"given":"Salah","family":"Sukkarieh","sequence":"additional","affiliation":[{"name":"Australian Centre for Field Robotics, School of Aerospace, Mechanical and Mechatronic Engineering, The University of Sydney, Australia"}]}],"member":"179","published-online":{"date-parts":[[2014,12,16]]},"reference":[{"key":"bibr1-0278364914553683","doi-asserted-by":"publisher","DOI":"10.1016\/j.robot.2009.11.005"},{"key":"bibr2-0278364914553683","doi-asserted-by":"publisher","DOI":"10.1016\/j.paerosci.2013.03.001"},{"key":"bibr3-0278364914553683","doi-asserted-by":"publisher","DOI":"10.1177\/0278364913488427"},{"key":"bibr4-0278364914553683","volume-title":"A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning","author":"Brochu 
E","year":"2010"},{"key":"bibr5-0278364914553683","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2013.6630938"},{"key":"bibr6-0278364914553683","unstructured":"Csat\u00f3 L (2002) Gaussian processes: Iterative sparse approximations. PhD Thesis, Aston University, UK."},{"key":"bibr7-0278364914553683","doi-asserted-by":"publisher","DOI":"10.1162\/089976602317250933"},{"key":"bibr8-0278364914553683","first-page":"761","author":"Dearden R","year":"1998","journal-title":"Proceedings of the national conference on artificial intelligence"},{"key":"bibr9-0278364914553683","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2008.12.019"},{"key":"bibr10-0278364914553683","first-page":"154","author":"Engel Y","year":"2003","journal-title":"Proceedings of the 20th international conference on machine learning"},{"key":"bibr11-0278364914553683","doi-asserted-by":"publisher","DOI":"10.1145\/1102351.1102377"},{"key":"bibr12-0278364914553683","doi-asserted-by":"publisher","DOI":"10.1177\/0278364912467485"},{"key":"bibr13-0278364914553683","first-page":"3077","author":"Kim D","year":"2012","journal-title":"Advances in neural information processing systems"},{"key":"bibr14-0278364914553683","doi-asserted-by":"publisher","DOI":"10.1177\/0278364907087426"},{"key":"bibr15-0278364914553683","unstructured":"Lawrance NRJ (2011) Autonomous soaring flight for unmanned aerial vehicles. 
PhD Thesis, The University of Sydney, Australia."},{"key":"bibr16-0278364914553683","doi-asserted-by":"publisher","DOI":"10.2514\/1.52236"},{"key":"bibr17-0278364914553683","doi-asserted-by":"publisher","DOI":"10.2514\/6.2010-3360"},{"key":"bibr18-0278364914553683","doi-asserted-by":"publisher","DOI":"10.1007\/BF00114731"},{"key":"bibr19-0278364914553683","doi-asserted-by":"publisher","DOI":"10.7551\/mitpress\/3206.001.0001"},{"key":"bibr20-0278364914553683","volume-title":"On-line Q-learning using connectionist systems","author":"Rummery GA","year":"1994"},{"key":"bibr21-0278364914553683","doi-asserted-by":"publisher","DOI":"10.1613\/jair.2674"},{"key":"bibr22-0278364914553683","doi-asserted-by":"publisher","DOI":"10.15607\/RSS.2005.I.009"},{"key":"bibr23-0278364914553683","doi-asserted-by":"publisher","DOI":"10.1007\/s12064-011-0142-z"},{"key":"bibr24-0278364914553683","volume-title":"Reinforcement Learning: An Introduction","author":"Sutton RS","year":"1998"},{"key":"bibr25-0278364914553683","first-page":"91","author":"Sutton RS","year":"1994","journal-title":"The proceedings of the eighth Yale workshop on adaptive and learning systems"},{"key":"bibr26-0278364914553683","unstructured":"Watkins CJCH (1989) Learning from delayed rewards. 
PhD Thesis, King\u2019s College London, UK."}],"container-title":["The International Journal of Robotics Research"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/0278364914553683","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/0278364914553683","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/0278364914553683","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T10:18:44Z","timestamp":1777457924000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/0278364914553683"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2014,12,16]]},"references-count":26,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2015,2]]}},"alternative-id":["10.1177\/0278364914553683"],"URL":"https:\/\/doi.org\/10.1177\/0278364914553683","relation":{},"ISSN":["0278-3649","1741-3176"],"issn-type":[{"value":"0278-3649","type":"print"},{"value":"1741-3176","type":"electronic"}],"subject":[],"published":{"date-parts":[[2014,12,16]]}}}