{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,16]],"date-time":"2025-10-16T07:01:02Z","timestamp":1760598062286,"version":"3.41.0"},"reference-count":57,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2021,6,3]],"date-time":"2021-06-03T00:00:00Z","timestamp":1622678400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100002858","name":"China Postdoctoral Science Foundation","doi-asserted-by":"crossref","award":["2014M562555"],"award-info":[{"award-number":["2014M562555"]}],"id":[{"id":"10.13039\/501100002858","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["61202338"],"award-info":[{"award-number":["61202338"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"National Defense Science and Technology Foundation Enhancement Plan","award":["2019-JCJQ-JJ-042"],"award-info":[{"award-number":["2019-JCJQ-JJ-042"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Intell. Syst. Technol."],"published-print":{"date-parts":[[2021,6,30]]},"abstract":"<jats:p>Efficient and stable exploration remains a key challenge for deep reinforcement learning (DRL) operating in high-dimensional action and state spaces. Recently, a more promising approach by combining the exploration in the action space with the exploration in the parameters space has been proposed to get the best of both methods. In this article, we propose a new iterative and close-loop framework by combining the evolutionary algorithm (EA), which does explorations in a gradient-free manner directly in the parameters space with an actor-critic, and the deep deterministic policy gradient (DDPG) reinforcement learning algorithm, which does explorations in a gradient-based manner in the action space to make these two methods cooperate in a more balanced and efficient way. In our framework, the policies represented by the EA population (the parametric perturbation part) can evolve in a guided manner by utilizing the gradient information provided by the DDPG and the policy gradient part (DDPG) is used only as a fine-tuning tool for the best individual in the EA population to improve the sample efficiency. In particular, we propose a criterion to determine the training steps required for the DDPG to ensure that useful gradient information can be generated from the EA generated samples and the DDPG and EA part can work together in a more balanced way during each generation. Furthermore, within the DDPG part, our algorithm can flexibly switch between fine-tuning the same previous RL-Actor and fine-tuning a new one generated by the EA according to different situations to further improve the efficiency. Experiments on a range of challenging continuous control benchmarks demonstrate that our algorithm outperforms related works and offers a satisfactory trade-off between stability and sample efficiency.<\/jats:p>","DOI":"10.1145\/3452008","type":"journal-article","created":{"date-parts":[[2021,6,3]],"date-time":"2021-06-03T18:50:56Z","timestamp":1622746256000},"page":"1-21","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["PP-PG: Combining Parameter Perturbation with Policy Gradient Methods for Effective and Efficient Explorations in Deep Reinforcement Learning"],"prefix":"10.1145","volume":"12","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4870-881X","authenticated-orcid":false,"given":"Shilei","family":"Li","sequence":"first","affiliation":[{"name":"Department of Information Security, Naval University of Engineering, Wuhan, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Meng","family":"Li","sequence":"additional","affiliation":[{"name":"Army Academy of Artillery and Air Defense, Hefei, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jiongming","family":"Su","sequence":"additional","affiliation":[{"name":"College of Intelligence Science and Technology, National University of Defense Technology, Changsha, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shaofei","family":"Chen","sequence":"additional","affiliation":[{"name":"College of Intelligence Science and Technology, National University of Defense Technology, Changsha, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zhimin","family":"Yuan","sequence":"additional","affiliation":[{"name":"Department of Information Security, Naval University of Engineering, Wuhan, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Qing","family":"Ye","sequence":"additional","affiliation":[{"name":"Department of Information Security, Naval University of Engineering, Wuhan, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2021,6,3]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Human-level control through deep reinforcement learning. Nature 518, 7540","author":"Mnih Volodymyr","year":"2015","unstructured":"Volodymyr Mnih , Kavukcuoglu Koray , Silver David , Rusu Andrei A. Veness , Joel Bellemare , Marc G. Graves , Alex Riedmiller , Martin Fidjeland , Andreas K. Ostrovski , Georg 2015. Human-level control through deep reinforcement learning. Nature 518, 7540 ( 2015 ), 529\u2013533. Volodymyr Mnih, Kavukcuoglu Koray, Silver David, Rusu Andrei A. Veness, Joel Bellemare, Marc G. Graves, Alex Riedmiller, Martin Fidjeland, Andreas K. Ostrovski, Georg 2015. Human-level control through deep reinforcement learning. Nature 518, 7540 (2015), 529\u2013533."},{"key":"e_1_2_1_2_1","volume-title":"Mastering the game of go with deep neural networks and tree search. Nature 529, 7587","author":"Silver David","year":"2016","unstructured":"David Silver , Aja Huang , Chris J. Maddison , Arthur Guez , Laurent Sifre , George van den Driessche , Julian Schrittwieser , Ioannis Antonoglou , Veda Panneershelvam , Marc Lanctot , Sander Dieleman , Dominik Grewe , John Nham , Nal Kalchbrenner , Ilya Sutskever , Timothy Lillicrap , Madeleine Leach , Koray Kavukcuoglu , Thore Graepel , and Demis Hassabis . 2016. Mastering the game of go with deep neural networks and tree search. Nature 529, 7587 ( 2016 ), 484--489. David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, and Demis Hassabis. 2016. Mastering the game of go with deep neural networks and tree search. Nature 529, 7587 (2016), 484--489."},{"key":"e_1_2_1_3_1","unstructured":"Timothy P. Lillicrap Jonathan J. Hunt Alexander Pritzel Nicolas Heess Tom Erez Yuval Tassa David Silver and Daan Wierstra. 2015. Continuous control with deep reinforcement learning. Retrieved from https:\/\/arXiv:1509.02971.  Timothy P. Lillicrap Jonathan J. Hunt Alexander Pritzel Nicolas Heess Tom Erez Yuval Tassa David Silver and Daan Wierstra. 2015. Continuous control with deep reinforcement learning. Retrieved from https:\/\/arXiv:1509.02971."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.5555\/3295222.3295258"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.5555\/3045390.3045594"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.5555\/3045118.3045319"},{"key":"e_1_2_1_7_1","unstructured":"John Schulman Filip Wolski Prafulla Dhariwal Alec Radford and Oleg Klimov. 2017. Proximal policy optimization algorithms. Retrieved from https:\/\/arXiv:1707.06347.  John Schulman Filip Wolski Prafulla Dhariwal Alec Radford and Oleg Klimov. 2017. Proximal policy optimization algorithms. Retrieved from https:\/\/arXiv:1707.06347."},{"key":"e_1_2_1_8_1","unstructured":"Tuomas Haarnoja Aurick Zhou Pieter Abbeel and Sergey Levine. 2018. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. Retrieved from https:\/\/arXiv:1801.01290.  Tuomas Haarnoja Aurick Zhou Pieter Abbeel and Sergey Levine. 2018. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. Retrieved from https:\/\/arXiv:1801.01290."},{"key":"e_1_2_1_9_1","unstructured":"Scott Fujimoto Herke van Hoof and David Meger 2018. Addressing function approximation error in actor-critic methods. Retrieved from https:\/\/arXiv:1802.09477.  Scott Fujimoto Herke van Hoof and David Meger 2018. Addressing function approximation error in actor-critic methods. Retrieved from https:\/\/arXiv:1802.09477."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.5555\/3305890.3305962"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.5555\/3294996.3295035"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.5555\/3157096.3157262"},{"key":"e_1_2_1_13_1","unstructured":"Sainbayar Sukhbaatar Ilya Kostrikov Arthur Szlam and Rob Fergus. 2017. Intrinsic motivation and automatic curricula via asymmetric self-play. Retrieved from https:\/\/arXiv:1703.05407.  Sainbayar Sukhbaatar Ilya Kostrikov Arthur Szlam and Rob Fergus. 2017. Intrinsic motivation and automatic curricula via asymmetric self-play. Retrieved from https:\/\/arXiv:1703.05407."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.5555\/3327757.3327931"},{"key":"e_1_2_1_15_1","doi-asserted-by":"crossref","unstructured":"Ildefons Magrans de Abril Ryota Kanai. 2018. Curiosity-driven reinforcement learning with homeostatic regulation. Retrieved from https:\/\/abs\/1801.07440.  Ildefons Magrans de Abril Ryota Kanai. 2018. Curiosity-driven reinforcement learning with homeostatic regulation. Retrieved from https:\/\/abs\/1801.07440.","DOI":"10.1109\/IJCNN.2018.8489075"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.5555\/3305890.3305968"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.5555\/3157096.3157221"},{"key":"e_1_2_1_18_1","unstructured":"Matthias Plappert Rein Houthooft Prafulla Dhariwal Szymon Sidor Richard Y. Chen Xi Chen Tamim Asfour Pieter Abbeel and Marcin Andrychowicz. 2017. Parameter space noise for exploration. Retrieved from https:\/\/arXiv:1706.01905.  Matthias Plappert Rein Houthooft Prafulla Dhariwal Szymon Sidor Richard Y. Chen Xi Chen Tamim Asfour Pieter Abbeel and Marcin Andrychowicz. 2017. Parameter space noise for exploration. Retrieved from https:\/\/arXiv:1706.01905."},{"key":"e_1_2_1_19_1","volume-title":"Bilal Piot, Jacob Menick, Ian Osband, Alex Graves, Vlad Mnih, Remi Munos, Demis Hassabis, Olivier Pietquin, et al.","author":"Fortunato Meire","year":"2017","unstructured":"Meire Fortunato , Mohammad Gheshlaghi Azar , Bilal Piot, Jacob Menick, Ian Osband, Alex Graves, Vlad Mnih, Remi Munos, Demis Hassabis, Olivier Pietquin, et al. 2017 . Noisy networks for exploration. Retrieved from https:\/\/arXiv:1706.10295. Meire Fortunato, Mohammad Gheshlaghi Azar, Bilal Piot, Jacob Menick, Ian Osband, Alex Graves, Vlad Mnih, Remi Munos, Demis Hassabis, Olivier Pietquin, et al. 2017. Noisy networks for exploration. Retrieved from https:\/\/arXiv:1706.10295."},{"key":"e_1_2_1_20_1","first-page":"1","article-title":"Robot skill learning: From reinforcement learning to evolution strategies","volume":"4","author":"Sigaud Olivier Freek","year":"2013","unstructured":"Stulp, Freek and Sigaud Olivier . 2013 . Robot skill learning: From reinforcement learning to evolution strategies . Paladyn J. Behav. Robot. 4 , 1 (Aug. 2013), 49\u201361. doi: 10.2478\/pjbr-2013-0003. Stulp, Freek and Sigaud Olivier. 2013. Robot skill learning: From reinforcement learning to evolution strategies. Paladyn J. Behav. Robot. 4, 1 (Aug. 2013), 49\u201361. doi: 10.2478\/pjbr-2013-0003.","journal-title":"Paladyn J. Behav. Robot."},{"key":"e_1_2_1_21_1","unstructured":"C\u00e9dric Colas Olivier Sigaud and Pierre-Yves Oudeyer. 2018. GEP-PG: Decoupling exploration and exploitation in deep reinforcement learning algorithms. Retrieved from https:\/\/arXiv:1802.05054.  C\u00e9dric Colas Olivier Sigaud and Pierre-Yves Oudeyer. 2018. GEP-PG: Decoupling exploration and exploitation in deep reinforcement learning algorithms. Retrieved from https:\/\/arXiv:1802.05054."},{"key":"e_1_2_1_22_1","volume-title":"CEM-RL: Combining evolutionary and gradient-based methods for policy search.","author":"Pourchot Alois","year":"1810","unstructured":"Alois Pourchot , Olivier Sigaud . 2018. CEM-RL: Combining evolutionary and gradient-based methods for policy search. Retrieved from https:\/\/arXiv: 1810 .01222. Alois Pourchot, Olivier Sigaud. 2018. CEM-RL: Combining evolutionary and gradient-based methods for policy search. Retrieved from https:\/\/arXiv: 1810.01222."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.5555\/89851.89891"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.5555\/3042817.3042937"},{"key":"e_1_2_1_25_1","volume-title":"Davide Del Testa","author":"Bojarski Mariusz","year":"2016","unstructured":"Mariusz Bojarski , Davide Del Testa , Daniel Dworakowski, Bernhard Firner , Beat Flepp, Prasoon Goyal, Lawrence D. Jackel, Mathew Monfort, Urs Muller, Jiakai Zhang, Xin Zhang, Jake Zhao, and Karol Zieba. 2016 . End to end learning for self-driving cars. Retrieved from https:\/\/arXiv:1604.07316. Mariusz Bojarski, Davide Del Testa, Daniel Dworakowski, Bernhard Firner, Beat Flepp, Prasoon Goyal, Lawrence D. Jackel, Mathew Monfort, Urs Muller, Jiakai Zhang, Xin Zhang, Jake Zhao, and Karol Zieba. 2016. End to end learning for self-driving cars. Retrieved from https:\/\/arXiv:1604.07316."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.5555\/3157382.3157608"},{"key":"e_1_2_1_27_1","unstructured":"Matej Ve\u010der\u00edk Todd Hester Jonathan Scholz Fumin Wang Olivier Pietquin Bilal Piot Nicolas Heess Thomas Roth\u00f6rl Thomas Lampe and Martin Riedmiller. 2017. Leveraging demonstrations for deep reinforcement learning on robotics problems with sparse rewards. Retrieved from https:\/\/arxiv:1707.08817.  Matej Ve\u010der\u00edk Todd Hester Jonathan Scholz Fumin Wang Olivier Pietquin Bilal Piot Nicolas Heess Thomas Roth\u00f6rl Thomas Lampe and Martin Riedmiller. 2017. Leveraging demonstrations for deep reinforcement learning on robotics problems with sparse rewards. Retrieved from https:\/\/arxiv:1707.08817."},{"key":"e_1_2_1_28_1","doi-asserted-by":"crossref","unstructured":"Todd Hester Matej Vecerik Olivier Pietquin Marc Lanctot Tom Schaul Bilal Piot Dan Horgan John Quan Andrew Sendonaris Gabriel Dulac-Arnold Ian Osband John Agapiou Joel Z. Leibo and Audrunas Gruslys. 2017. Learning from demonstrations for real world reinforcement learning. Retrieved from https:\/\/rxiv:1704.03732.  Todd Hester Matej Vecerik Olivier Pietquin Marc Lanctot Tom Schaul Bilal Piot Dan Horgan John Quan Andrew Sendonaris Gabriel Dulac-Arnold Ian Osband John Agapiou Joel Z. Leibo and Audrunas Gruslys. 2017. Learning from demonstrations for real world reinforcement learning. Retrieved from https:\/\/rxiv:1704.03732.","DOI":"10.1609\/aaai.v32i1.11757"},{"key":"e_1_2_1_29_1","doi-asserted-by":"crossref","unstructured":"Nair Ashvin McGrew Bob Andrychowicz Marcin Zaremba Wojciech and Abbeel Pieter. 2017. Overcoming exploration in reinforcement learning with demonstrations. Retrieved from https:\/\/arXiv:1709.10089.  Nair Ashvin McGrew Bob Andrychowicz Marcin Zaremba Wojciech and Abbeel Pieter. 2017. Overcoming exploration in reinforcement learning with demonstrations. Retrieved from https:\/\/arXiv:1709.10089.","DOI":"10.1109\/ICRA.2018.8463162"},{"key":"e_1_2_1_30_1","doi-asserted-by":"crossref","unstructured":"Thomas R\u00fcckstie\u00df Martin Felder and J\u00fcrgen Schmidhuber. 2008. State-dependent exploration for policy gradient methods. Mach. Learn. Knowl. Discov. Databases 234\u2013249.  Thomas R\u00fcckstie\u00df Martin Felder and J\u00fcrgen Schmidhuber. 2008. State-dependent exploration for policy gradient methods. Mach. Learn. Knowl. Discov. Databases 234\u2013249.","DOI":"10.1007\/978-3-540-87481-2_16"},{"key":"e_1_2_1_31_1","volume-title":"Evolutionsstrategie: Optimierung technischer systeme nach prinzipien der biologishen evolution. Frommann-Holzboog","author":"Rechenberg Ingo","year":"1973","unstructured":"Ingo Rechenberg and Manfred Eigen . 1973 . Evolutionsstrategie: Optimierung technischer systeme nach prinzipien der biologishen evolution. Frommann-Holzboog , Stuttgart . Ingo Rechenberg and Manfred Eigen. 1973. Evolutionsstrategie: Optimierung technischer systeme nach prinzipien der biologishen evolution. Frommann-Holzboog, Stuttgart."},{"volume-title":"Numerische optimierung von computermodellen mittels der evolutionsstrategie","author":"Schwefel Hans-Paul","key":"e_1_2_1_32_1","unstructured":"Hans-Paul Schwefel . 1977. Numerische optimierung von computermodellen mittels der evolutionsstrategie , vol. 1 . Birkh\u00e4user , Basel, Switzerland . Hans-Paul Schwefel. 1977. Numerische optimierung von computermodellen mittels der evolutionsstrategie, vol. 1. Birkh\u00e4user, Basel, Switzerland."},{"key":"e_1_2_1_33_1","unstructured":"Salimans Tim Ho Jonathan Chen Xi and Sutskever Ilya. 2017. Evolution strategies as a scalable alternative to reinforcement learning. Retrieved from https:\/\/arXiv:1703.03864.  Salimans Tim Ho Jonathan Chen Xi and Sutskever Ilya. 2017. Evolution strategies as a scalable alternative to reinforcement learning. Retrieved from https:\/\/arXiv:1703.03864."},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.5555\/3327345.3327410"},{"key":"e_1_2_1_35_1","unstructured":"Felipe Petroski Such Vashisht Madhavan Edoardo Conti Joel Lehman Kenneth O. Stanley and Jeff Clune. 2017. Deep neuroevolution: Genetic algorithms are a competitive alternative for training deep neural networks for reinforcement learning. Retrieved from https:\/\/arXiv:1712.06567.  Felipe Petroski Such Vashisht Madhavan Edoardo Conti Joel Lehman Kenneth O. Stanley and Jeff Clune. 2017. Deep neuroevolution: Genetic algorithms are a competitive alternative for training deep neural networks for reinforcement learning. Retrieved from https:\/\/arXiv:1712.06567."},{"key":"e_1_2_1_36_1","unstructured":"Joshua Achiam and Shankar Sastry. 2017. Surprise-based intrinsic motivation for deep reinforcement learning. Retrieved from https:\/\/arXiv:1703.01732.  Joshua Achiam and Shankar Sastry. 2017. Surprise-based intrinsic motivation for deep reinforcement learning. Retrieved from https:\/\/arXiv:1703.01732."},{"key":"e_1_2_1_37_1","unstructured":"Bradly C. Stadie Sergey Levine and Pieter Abbeel. 2015. Incentivizing exploration in reinforcement learning with deep predictive models. Retrieved from https:\/\/arXiv:1507.00814.  Bradly C. Stadie Sergey Levine and Pieter Abbeel. 2015. Incentivizing exploration in reinforcement learning with deep predictive models. Retrieved from https:\/\/arXiv:1507.00814."},{"key":"e_1_2_1_38_1","unstructured":"Niru Maheswaranathan Luke Metz George Tucker Dami Choi and Jascha Sohl-Dickstein. 2018. Guided evolutionary strategies: Escaping the curse of dimensionality in random search. Retrieved from https:\/\/arXiv:1806.10230.  Niru Maheswaranathan Luke Metz George Tucker Dami Choi and Jascha Sohl-Dickstein. 2018. Guided evolutionary strategies: Escaping the curse of dimensionality in random search. Retrieved from https:\/\/arXiv:1806.10230."},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/2739480.2754664"},{"key":"e_1_2_1_40_1","doi-asserted-by":"crossref","first-page":"245","DOI":"10.1109\/TEVC.2017.2704781","article-title":"Quality and diversity optimization: A unifying modular framework","volume":"22","author":"Antoine Cully","year":"2017","unstructured":"Cully Antoine and Demiris Yiannis . 2017 . Quality and diversity optimization: A unifying modular framework . IEEE Trans. Evolution. Comput. 22 , 2 (2017), 245 \u2013 259 . Cully Antoine and Demiris Yiannis. 2017. Quality and diversity optimization: A unifying modular framework. IEEE Trans. Evolution. Comput. 22, 2 (2017), 245\u2013259.","journal-title":"IEEE Trans. Evolution. Comput."},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.robot.2012.05.008"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2016.7759584"},{"key":"e_1_2_1_43_1","unstructured":"S\u00e9bastien Forestier R\u00e9my Portelas Yoan Mollard and Pierre-Yves Oudeyer. 2017. Intrinsically motivated goal exploration processes with automatic curriculum learning. Retrieved from https:\/\/arXiv:1708.02190.  S\u00e9bastien Forestier R\u00e9my Portelas Yoan Mollard and Pierre-Yves Oudeyer. 2017. Intrinsically motivated goal exploration processes with automatic curriculum learning. Retrieved from https:\/\/arXiv:1708.02190."},{"key":"e_1_2_1_44_1","volume-title":"Retrieved","author":"Weng Lilian","year":"2020","unstructured":"Lilian Weng . Exploration Strategies in Deep Reinforcement Learning . Retrieved July 10, 2020 from https:\/\/lilianweng.github.io\/lil-log\/2020\/06\/07\/exploration-strategies-in-deep-reinforcement-learning.html. Lilian Weng. Exploration Strategies in Deep Reinforcement Learning. Retrieved July 10, 2020 from https:\/\/lilianweng.github.io\/lil-log\/2020\/06\/07\/exploration-strategies-in-deep-reinforcement-learning.html."},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-56602-3_163"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.5555\/554345"},{"key":"e_1_2_1_47_1","unstructured":"Tanmay Gangwani and Jian Peng. 2017. Genetic policy optimization. Retrieved from https:\/\/arXiv:1711.01012.  Tanmay Gangwani and Jian Peng. 2017. Genetic policy optimization. Retrieved from https:\/\/arXiv:1711.01012."},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.swevo.2018.03.011"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1007\/s12065-007-0002-4"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-55849-3_57"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1162\/106365602320169811"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.5555\/1248547.1248578"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.5555\/3326943.3327053"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCIAIG.2015.2494596"},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1038\/s42256-018-0006-z"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.5555\/1014902"},{"key":"e_1_2_1_57_1","unstructured":"Greg Brockman Vicki Cheung Ludwig Pettersson Jonas Schneider John Schulman Jie Tang and Wojciech Zaremba. 2016. Openai gym. Retrieved from https:\/\/arXiv:1606.01540.  Greg Brockman Vicki Cheung Ludwig Pettersson Jonas Schneider John Schulman Jie Tang and Wojciech Zaremba. 2016. Openai gym. Retrieved from https:\/\/arXiv:1606.01540."}],"container-title":["ACM Transactions on Intelligent Systems and Technology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3452008","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3452008","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T22:03:00Z","timestamp":1750197780000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3452008"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,6,3]]},"references-count":57,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2021,6,30]]}},"alternative-id":["10.1145\/3452008"],"URL":"https:\/\/doi.org\/10.1145\/3452008","relation":{},"ISSN":["2157-6904","2157-6912"],"issn-type":[{"type":"print","value":"2157-6904"},{"type":"electronic","value":"2157-6912"}],"subject":[],"published":{"date-parts":[[2021,6,3]]},"assertion":[{"value":"2020-07-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-02-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-06-03","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}