{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,24]],"date-time":"2026-01-24T16:12:25Z","timestamp":1769271145745,"version":"3.49.0"},"reference-count":21,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2024,1,8]],"date-time":"2024-01-08T00:00:00Z","timestamp":1704672000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,1,8]],"date-time":"2024-01-08T00:00:00Z","timestamp":1704672000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Int J Comput Intell Syst"],"abstract":"<jats:title>Abstract<\/jats:title><jats:p>The application of reinforcement learning (RL) to the field of autonomous robotics has high requirements about sample efficiency, since the agent expends for interaction with the environment. One method for sample efficiency is to extract knowledge from existing samples and used to exploration. Typical RL algorithms achieve exploration using task-specific knowledge or adding exploration noise. These methods are limited to current policy improvement level and lack of long-term planning. We propose a novel active exploration deep RL algorithm for the continuous action space problem named active exploration deep reinforcement learning (AEDRL). Our method uses the Gaussian process to model dynamic model, enabling the probability description of prediction sample. Action selection is formulated as the solution of the optimization problem. Thus, the optimization objective is specifically designed for selecting samples that can minimize the uncertainty of the dynamic model. Active exploration is achieved through long-term optimized action selection. This long-term considered action exploration method is more guidance for learning. Enable intelligent agents to explore more interesting action spaces. The proposed AEDRL algorithm is evaluated on several robotic control task including classic pendulum problem and five complex articulated robots. The AEDRL can learn a controller using fewer episodes and demonstrates performance and sample efficiency.<\/jats:p>","DOI":"10.1007\/s44196-023-00389-1","type":"journal-article","created":{"date-parts":[[2024,1,8]],"date-time":"2024-01-08T13:02:51Z","timestamp":1704718971000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":4,"title":["Active Exploration Deep Reinforcement Learning for Continuous Action Space with Forward Prediction"],"prefix":"10.1007","volume":"17","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4863-6414","authenticated-orcid":false,"given":"Dongfang","family":"Zhao","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xu","family":"Huanshi","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zhang","family":"Xun","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2024,1,8]]},"reference":[{"key":"389_CR1","doi-asserted-by":"crossref","unstructured":"Silver, D., Singh, S., Precup, D., Sutton, R.S.: Reward is enough. Artif. Intell. 299, 103535 (2021)","DOI":"10.1016\/j.artint.2021.103535"},{"key":"389_CR2","unstructured":"Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., Riedmiller, M.: Playing atari with deep reinforcement learning (2013). arXiv preprint. arXiv:1312.5602"},{"key":"389_CR3","doi-asserted-by":"crossref","unstructured":"Van\u00a0Hasselt, H., Guez, A., Silver, D.: Deep reinforcement learning with double q-learning. In: Thirtieth AAAI conference on artificial intelligence (2016)","DOI":"10.1609\/aaai.v30i1.10295"},{"key":"389_CR4","unstructured":"Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning (2015). arXiv preprint. arXiv:1509.02971. Accessed on 9 Sep 2015"},{"key":"389_CR5","unstructured":"Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International conference on machine learning, pp. 1889\u20131897 (2015)"},{"key":"389_CR6","unstructured":"Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.: Proximal policy optimization algorithms (2017). arXiv preprint. arXiv:1707.06347. Accessed on 20 Jul 2017"},{"key":"389_CR7","unstructured":"Heess, N., TB, D., Sriram, S., Lemmon, J., Merel, J., Wayne, G.: Emergence of locomotion behaviours in rich environments (2017). arXiv:1707.02286. Accessed on 7 Jul 2017"},{"key":"389_CR8","unstructured":"Daniel, C., Neumann, G., Peters, J.R.: Hierarchical relative entropy policy search. In: International conference on artificial intelligence and statistics, pp. 273\u2013281 (2012)"},{"key":"389_CR9","doi-asserted-by":"crossref","unstructured":"Pathak, D., Agrawal, P., Efros, A.A., Darrell, T.: Curiosity-driven exploration by self-supervised prediction. In: International conference on machine learning, pp. 2778\u20132787 (2017). PMLR","DOI":"10.1109\/CVPRW.2017.70"},{"key":"389_CR10","unstructured":"Ryu, M., Chow, Y., Anderson, R., Tjandraatmadja, C., Boutilier, C.: CAQL: continuous action q-learning. In: International conference on learning representations (2020)"},{"key":"389_CR11","unstructured":"Chow, Y., Nachum, O., Faust, A., Duenez-Guzman, E., Ghavamzadeh, M.: Lyapunov-based safe policy optimization for continuous control (2019). arXiv preprint. arXiv:1901.10031. Accessed on 28 Jan 2019"},{"key":"389_CR12","unstructured":"Xiao, T., Jang, E., Kalashnikov, D., Levine, S., Ibarz, J., Hausman, K., Herzog, A.: Thinking while moving: deep reinforcement learning with concurrent control (2020). arXiv preprint. arXiv:2004.06089. Accessed on 13 Apr 2020"},{"key":"389_CR13","doi-asserted-by":"crossref","unstructured":"Brunke, L., Greeff, M., Hall, A.W., Yuan, Z., Zhou, S., Panerati, J., Schoellig, A.P.: Safe learning in robotics: from learning-based control to safe reinforcement learning. Annual review of control, robotics, and autonomous systems, 5:411\u2013444 (2021)","DOI":"10.1146\/annurev-control-042920-020211"},{"issue":"13","key":"389_CR14","doi-asserted-by":"publisher","first-page":"1608","DOI":"10.1177\/0278364910371999","volume":"29","author":"P Abbeel","year":"2010","unstructured":"Abbeel, P., Coates, A., Ng, A.Y.: Autonomous helicopter aerobatics through apprenticeship learning. Int. J. Robot. Res. 29(13), 1608\u20131639 (2010)","journal-title":"Int. J. Robot. Res."},{"key":"389_CR15","doi-asserted-by":"crossref","unstructured":"Abbeel, P., Coates, A., Quigley, M., Ng, A.Y.: An application of reinforcement learning to aerobatic helicopter flight. In: Advances in neural information processing systems, 19, pp. 1\u20138 (2007)","DOI":"10.7551\/mitpress\/7503.003.0006"},{"issue":"7","key":"389_CR16","doi-asserted-by":"publisher","first-page":"97","DOI":"10.1145\/1538788.1538812","volume":"52","author":"A Coates","year":"2009","unstructured":"Coates, A., Abbeel, P., Ng, A.Y.: Apprenticeship learning for helicopter control. Commun. ACM 52(7), 97\u2013105 (2009)","journal-title":"Commun. ACM"},{"key":"389_CR17","unstructured":"Ebert, F., Finn, C., Lee, A.X., Levine, S.: Self-supervised visual planning with temporal skip connections (2017). arXiv preprint. arXiv:1710.05268"},{"key":"389_CR18","unstructured":"Bellemare, M.G., Dabney, W., Munos, R.: A distributional perspective on reinforcement learning. In: Proceedings of the 34th international conference on machine learning, vol. 70, pp. 449\u2013458 (2017). JMLR.org"},{"key":"389_CR19","unstructured":"Sutton, R.S., McAllester, D.A., Singh, S.P., Mansour, Y.: Policy gradient methods for reinforcement learning with function approximation. In: Advances in neural information processing systems, 12, pp. 1057\u20131063 (2000)"},{"issue":"3","key":"389_CR20","doi-asserted-by":"publisher","first-page":"688","DOI":"10.1109\/18.30996","volume":"35","author":"NA Ahmed","year":"1989","unstructured":"Ahmed, N.A., Gokhale, D.: Entropy expressions and their estimators for multivariate distributions. IEEE Trans. Inf. Theory 35(3), 688\u2013692 (1989)","journal-title":"IEEE Trans. Inf. Theory"},{"key":"389_CR21","unstructured":"Silver, D., Lever, G., Heess, N., Degris, T., Wierstra, D., Riedmiller, M.: Deterministic policy gradient algorithms. In: International conference on machine learning, pp. 387\u2013395 (2014). PMLR"}],"container-title":["International Journal of Computational Intelligence Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s44196-023-00389-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s44196-023-00389-1\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s44196-023-00389-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,1,8]],"date-time":"2024-01-08T13:20:14Z","timestamp":1704720014000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s44196-023-00389-1"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,1,8]]},"references-count":21,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2024,12]]}},"alternative-id":["389"],"URL":"https:\/\/doi.org\/10.1007\/s44196-023-00389-1","relation":{},"ISSN":["1875-6883"],"issn-type":[{"value":"1875-6883","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,1,8]]},"assertion":[{"value":"18 May 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"5 December 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"8 January 2024","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"This article have not been published in other journals or conferences.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of Interest"}}],"article-number":"6"}}