{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,2]],"date-time":"2026-06-02T13:49:59Z","timestamp":1780408199828,"version":"3.54.1"},"publisher-location":"New York, NY, USA","reference-count":51,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,6,26]],"date-time":"2021-06-26T00:00:00Z","timestamp":1624665600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100000266","name":"Engineering and Physical Sciences Research Council","doi-asserted-by":"publisher","award":["EP\/V006673\/1"],"award-info":[{"award-number":["EP\/V006673\/1"]}],"id":[{"id":"10.13039\/501100000266","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,6,26]]},"DOI":"10.1145\/3449639.3459304","type":"proceedings-article","created":{"date-parts":[[2021,6,21]],"date-time":"2021-06-21T17:50:43Z","timestamp":1624297843000},"page":"866-875","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":60,"title":["Policy gradient assisted MAP-Elites"],"prefix":"10.1145","author":[{"given":"Olle","family":"Nilsson","sequence":"first","affiliation":[{"name":"Imperial College London, London, United Kingdom"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Antoine","family":"Cully","sequence":"additional","affiliation":[{"name":"Imperial College London, London, United Kingdom"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2021,6,26]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"Alberto Alvarez Steve Dahlskog Jose Font and Julian Togelius. 2020. Interactive Constrained MAP-Elites Analysis and Evaluation of the Expressiveness of the Feature Dimensions. 
arXiv:2003.03377 [cs.AI]  Alberto Alvarez Steve Dahlskog Jose Font and Julian Togelius. 2020. Interactive Constrained MAP-Elites Analysis and Evaluation of the Expressiveness of the Feature Dimensions. arXiv:2003.03377 [cs.AI]"},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/CIG.2019.8848022"},{"key":"e_1_3_2_1_3_1","volume-title":"Advances in Neural Information Processing Systems","volume":"5059","author":"Andrychowicz Marcin","year":"2017","unstructured":"Marcin Andrychowicz , Filip Wolski , Alex Ray , Jonas Schneider , Rachel Fong , Peter Welinder , Bob McGrew , Josh Tobin , Pieter Abbeel , and Wojciech Zaremba . 2017 . Hindsight experience replay . In Advances in Neural Information Processing Systems , Vol. 2017-Decem. Neural information processing systems foundation, 5049-- 5059 . arXiv:1707.01495 Marcin Andrychowicz, Filip Wolski, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob McGrew, Josh Tobin, Pieter Abbeel, and Wojciech Zaremba. 2017. Hindsight experience replay. In Advances in Neural Information Processing Systems, Vol. 2017-Decem. Neural information processing systems foundation, 5049--5059. arXiv:1707.01495"},{"key":"e_1_3_2_1_4_1","volume-title":"The theory of dynamic programming. Bull. Amer. Math. Soc. 60, 6 (11","author":"Bellman Richard","year":"1954","unstructured":"Richard Bellman . 1954. The theory of dynamic programming. Bull. Amer. Math. Soc. 60, 6 (11 1954 ), 503--515. https:\/\/projecteuclid.org:443\/euclid.bams\/1183519147 Richard Bellman. 1954. The theory of dynamic programming. Bull. Amer. Math. Soc. 60, 6 (11 1954), 503--515. https:\/\/projecteuclid.org:443\/euclid.bams\/1183519147"},{"key":"e_1_3_2_1_5_1","volume-title":"CoRR abs\/1606.01540","author":"Brockman Greg","year":"2016","unstructured":"Greg Brockman , Vicki Cheung , Ludwig Pettersson , Jonas Schneider , John Schulman , Jie Tang , and Wojciech Zaremba . 2016. Open AI Gym . CoRR abs\/1606.01540 ( 2016 ). 
arXiv:1606.01540 http:\/\/arxiv.org\/abs\/1606.01540 Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba. 2016. OpenAI Gym. CoRR abs\/1606.01540 (2016). arXiv:1606.01540 http:\/\/arxiv.org\/abs\/1606.01540"},{"key":"e_1_3_2_1_6_1","unstructured":"Geoffrey Cideron Thomas Pierrot Nicolas Perrin Karim Beguir and Olivier Sigaud. 2020. QD-RL: Efficient Mixing of Quality and Diversity in Reinforcement Learning. arXiv:2006.08505 [cs.AI]  Geoffrey Cideron Thomas Pierrot Nicolas Perrin Karim Beguir and Olivier Sigaud. 2020. QD-RL: Efficient Mixing of Quality and Diversity in Reinforcement Learning. arXiv:2006.08505 [cs.AI]"},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/3377930.3390217"},{"key":"e_1_3_2_1_8_1","volume-title":"35th International Conference on Machine Learning, ICML 2018","volume":"3","author":"Colas Cedric","year":"2018","unstructured":"Cedric Colas , Olivier Sigaud , and Pierre Yves Oudeyer . 2018 . GEP-PG: Decoupling exploration and exploitation in deep reinforcement learning algorithms . In 35th International Conference on Machine Learning, ICML 2018 , Vol. 3 . International Machine Learning Society (IMLS), 1682--1691. arXiv :1802.05054 Cedric Colas, Olivier Sigaud, and Pierre Yves Oudeyer. 2018. GEP-PG: Decoupling exploration and exploitation in deep reinforcement learning algorithms. In 35th International Conference on Machine Learning, ICML 2018, Vol. 3. International Machine Learning Society (IMLS), 1682--1691. arXiv:1802.05054"},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.5555\/3327345.3327410"},{"key":"e_1_3_2_1_10_1","unstructured":"Erwin Coumans and Yunfei Bai. 2016--2019. PyBullet a Python module for physics simulation for games robotics and machine learning. http:\/\/pybullet.org. Implementation: https:\/\/github.com\/bulletphysics\/bullet3\/blob\/master\/examples\/pybullet\/gym\/pybullet_envs\/gym_locomotion_envs.py.  Erwin Coumans and Yunfei Bai. 
2016--2019. PyBullet a Python module for physics simulation for games robotics and machine learning. http:\/\/pybullet.org. Implementation: https:\/\/github.com\/bulletphysics\/bullet3\/blob\/master\/examples\/pybullet\/gym\/pybullet_envs\/gym_locomotion_envs.py."},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/3321707.3321804"},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"crossref","unstructured":"Antoine Cully. 2020. Multi-Emitter MAP-Elites: Improving quality diversity and convergence speed with heterogeneous sets of emitters. arXiv:2007.05352 [cs.NE]  Antoine Cully. 2020. Multi-Emitter MAP-Elites: Improving quality diversity and convergence speed with heterogeneous sets of emitters. arXiv:2007.05352 [cs.NE]","DOI":"10.1145\/3449639.3459326"},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1038\/nature14422"},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/3205455.3205571"},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/TEVC.2017.2704781"},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/2463372.2463399"},{"key":"e_1_3_2_1_17_1","volume-title":"First return then explore. (apr","author":"Ecoffet Adrien","year":"2020","unstructured":"Adrien Ecoffet , Joost Huizinga , Joel Lehman , Kenneth O. Stanley , and Jeff Clune . 2020. First return then explore. (apr 2020 ). arXiv:2004.12919 http:\/\/arxiv.org\/abs\/2004.12919 Adrien Ecoffet, Joost Huizinga, Joel Lehman, Kenneth O. Stanley, and Jeff Clune. 2020. First return then explore. (apr 2020). arXiv:2004.12919 http:\/\/arxiv.org\/abs\/2004.12919"},{"key":"e_1_3_2_1_18_1","volume-title":"7th International Conference on Learning Representations, ICLR 2019. International Conference on Learning Representations, ICLR. arXiv:1802","author":"Eysenbach Benjamin","year":"2019","unstructured":"Benjamin Eysenbach , Julian Ibarz , Abhishek Gupta , and Sergey Levine . 2019 . 
Diversity is all you need: Learning skills without a reward function . In 7th International Conference on Learning Representations, ICLR 2019. International Conference on Learning Representations, ICLR. arXiv:1802 .06070 Benjamin Eysenbach, Julian Ibarz, Abhishek Gupta, and Sergey Levine. 2019. Diversity is all you need: Learning skills without a reward function. In 7th International Conference on Learning Representations, ICLR 2019. International Conference on Learning Representations, ICLR. arXiv:1802.06070"},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1162\/isal_a_00316"},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/3377930.3390232"},{"key":"e_1_3_2_1_21_1","volume-title":"Intrinsically Motivated Goal Exploration Processes with Automatic Curriculum Learning. (aug","author":"Forestier S\u00e9bastien","year":"2017","unstructured":"S\u00e9bastien Forestier , Yoan Mollard , and Pierre-Yves Oudeyer . 2017. Intrinsically Motivated Goal Exploration Processes with Automatic Curriculum Learning. (aug 2017 ). arXiv:1708.02190 http:\/\/arxiv.org\/abs\/1708.02190 S\u00e9bastien Forestier, Yoan Mollard, and Pierre-Yves Oudeyer. 2017. Intrinsically Motivated Goal Exploration Processes with Automatic Curriculum Learning. (aug 2017). arXiv:1708.02190 http:\/\/arxiv.org\/abs\/1708.02190"},{"key":"e_1_3_2_1_22_1","volume-title":"Proceedings of the 35th International Conference on Machine Learning (Proceedings of Machine Learning Research","volume":"1596","author":"Fujimoto Scott","year":"2018","unstructured":"Scott Fujimoto , Herke van Hoof , and David Meger . 2018 . Addressing Function Approximation Error in Actor-Critic Methods . In Proceedings of the 35th International Conference on Machine Learning (Proceedings of Machine Learning Research , Vol. 80), Jennifer Dy and Andreas Krause (Eds.). PMLR, Stockholmsm\u00e4ssan, Stockholm Sweden, 1587-- 1596 . 
http:\/\/proceedings.mlr.press\/v80\/fujimoto18a.html Implementation : https:\/\/github.com\/sfujim\/TD3. Scott Fujimoto, Herke van Hoof, and David Meger. 2018. Addressing Function Approximation Error in Actor-Critic Methods. In Proceedings of the 35th International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 80), Jennifer Dy and Andreas Krause (Eds.). PMLR, Stockholmsm\u00e4ssan, Stockholm Sweden, 1587--1596. http:\/\/proceedings.mlr.press\/v80\/fujimoto18a.html Implementation: https:\/\/github.com\/sfujim\/TD3."},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.2514\/6.2017-3330"},{"key":"e_1_3_2_1_24_1","unstructured":"Tanmay Gangwani Jian Peng and Yuan Zhou. 2020. Harnessing Distribution Ratio Estimators for Learning Agents with Quality and Diversity. arXiv:2011.02614 [cs.LG]  Tanmay Gangwani Jian Peng and Yuan Zhou. 2020. Harnessing Distribution Ratio Estimators for Learning Agents with Quality and Diversity. arXiv:2011.02614 [cs.LG]"},{"key":"e_1_3_2_1_25_1","volume-title":"Sancho Moura Oliveira, and Anders Lyhne Christensen","author":"Gomes Jorge","year":"2018","unstructured":"Jorge Gomes , Sancho Moura Oliveira, and Anders Lyhne Christensen . 2018 . An approach to evolve and exploit repertoires of general robot behaviours. Swarm and Evolutionary Computation ( 2018). Jorge Gomes, Sancho Moura Oliveira, and Anders Lyhne Christensen. 2018. An approach to evolve and exploit repertoires of general robot behaviours. Swarm and Evolutionary Computation (2018)."},{"key":"e_1_3_2_1_26_1","volume-title":"Pieter Abbeel, and Sergey Levine.","author":"Gupta Abhishek","year":"2018","unstructured":"Abhishek Gupta , Russell Mendonca , Yu Xuan Liu , Pieter Abbeel, and Sergey Levine. 2018 . Meta-reinforcement learning of structured exploration strategies. In Advances in Neural Information Processing Systems , Vol. 2018-Decem. Neural information processing systems foundation, 5302-- 5311 . 
arXiv:1802.07245 Abhishek Gupta, Russell Mendonca, Yu Xuan Liu, Pieter Abbeel, and Sergey Levine. 2018. Meta-reinforcement learning of structured exploration strategies. In Advances in Neural Information Processing Systems, Vol. 2018-Decem. Neural information processing systems foundation, 5302--5311. arXiv:1802.07245"},{"key":"e_1_3_2_1_27_1","volume-title":"Advances in Neural Information Processing Systems","volume":"5409","author":"Houthooft Rein","year":"2018","unstructured":"Rein Houthooft , Richard Y. Chen , Phillip Isola , Bradly C. Stadie , Filip Wolski , Jonathan Ho , and Pieter Abbeel . 2018 . Evolved policy gradients . In Advances in Neural Information Processing Systems , Vol. 2018-Decem. Neural information processing systems foundation, 5400-- 5409 . arXiv:1802.04821 Rein Houthooft, Richard Y. Chen, Phillip Isola, Bradly C. Stadie, Filip Wolski, Jonathan Ho, and Pieter Abbeel. 2018. Evolved policy gradients. In Advances in Neural Information Processing Systems, Vol. 2018-Decem. Neural information processing systems foundation, 5400--5409. arXiv:1802.04821"},{"key":"e_1_3_2_1_28_1","volume-title":"Advances in Neural Information Processing Systems","volume":"1200","author":"Khadka Shauharda","year":"2018","unstructured":"Shauharda Khadka and Kagan Tumer . 2018 . Evolution-guided policy gradient in reinforcement learning . In Advances in Neural Information Processing Systems , Vol. 2018-Decem. Neural information processing systems foundation, 1188-- 1200 . arXiv:1805.07917 Shauharda Khadka and Kagan Tumer. 2018. Evolution-guided policy gradient in reinforcement learning. In Advances in Neural Information Processing Systems, Vol. 2018-Decem. Neural information processing systems foundation, 1188--1200. arXiv:1805.07917"},{"key":"e_1_3_2_1_29_1","volume-title":"Kingma and Jimmy Ba","author":"Diederik","year":"2014","unstructured":"Diederik P. Kingma and Jimmy Ba . 2014 . Adam : A Method for Stochastic Optimization . arXiv:1412.6980 [cs.LG] Diederik P. 
Kingma and Jimmy Ba. 2014. Adam: A Method for Stochastic Optimization. arXiv:1412.6980 [cs.LG]"},{"key":"e_1_3_2_1_30_1","volume-title":"Map-based Multi-Policy Reinforcement Learning: Enhancing Adaptability of Robots by Deep Reinforcement Learning. (oct","author":"Kume Ayaka","year":"2017","unstructured":"Ayaka Kume , Eiichi Matsumoto , Kuniyuki Takahashi , Wilson Ko , and Jethro Tan . 2017. Map-based Multi-Policy Reinforcement Learning: Enhancing Adaptability of Robots by Deep Reinforcement Learning. (oct 2017 ). arXiv:1710.06117 http:\/\/arxiv.org\/abs\/1710.06117 Ayaka Kume, Eiichi Matsumoto, Kuniyuki Takahashi, Wilson Ko, and Jethro Tan. 2017. Map-based Multi-Policy Reinforcement Learning: Enhancing Adaptability of Robots by Deep Reinforcement Learning. (oct 2017). arXiv:1710.06117 http:\/\/arxiv.org\/abs\/1710.06117"},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1162\/EVCO_a_00025"},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/2001576.2001606"},{"key":"e_1_3_2_1_33_1","volume-title":"4th International Conference on Learning Representations, ICLR 2016 - Conference Track Proceedings. International Conference on Learning Representations, ICLR. arXiv:1509","author":"Lillicrap Timothy P.","year":"2016","unstructured":"Timothy P. Lillicrap , Jonathan J. Hunt , Alexander Pritzel , Nicolas Heess , Tom Erez , Yuval Tassa , David Silver , and Daan Wierstra . 2016 . Continuous control with deep reinforcement learning . In 4th International Conference on Learning Representations, ICLR 2016 - Conference Track Proceedings. International Conference on Learning Representations, ICLR. arXiv:1509 .02971 Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. 2016. Continuous control with deep reinforcement learning. In 4th International Conference on Learning Representations, ICLR 2016 - Conference Track Proceedings. 
International Conference on Learning Representations, ICLR. arXiv:1509.02971"},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF00992699"},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF01931367"},{"key":"e_1_3_2_1_36_1","volume-title":"Guided evolutionary strategies: escaping the curse of dimensionality in random search. arXiv","author":"Maheswaranathan Niru","year":"2018","unstructured":"Niru Maheswaranathan , Luke Metz , George Tucker , and Jascha Sohl-Dickstein . 2018. Guided evolutionary strategies: escaping the curse of dimensionality in random search. arXiv ( 2018 ), 1--16. https:\/\/doi.org\/arXiv:1806.10230v2 arXiv:1806.10230 Niru Maheswaranathan, Luke Metz, George Tucker, and Jascha Sohl-Dickstein. 2018. Guided evolutionary strategies: escaping the curse of dimensionality in random search. arXiv (2018), 1--16. https:\/\/doi.org\/arXiv:1806.10230v2 arXiv:1806.10230"},{"key":"e_1_3_2_1_37_1","volume-title":"33rd International Conference on Machine Learning, ICML","volume":"4","author":"Mnih Volodymyr","year":"2016","unstructured":"Volodymyr Mnih , Adria Puigdomenech Badia , Mehdi Mirza , Alex Graves , Tim Harley , Timothy P. Lillicrap , David Silver , and Koray Kavukcuoglu . 2016 . Asynchronous methods for deep reinforcement learning . In 33rd International Conference on Machine Learning, ICML 2016, Vol. 4 . International Machine Learning Society (IMLS), 2850--2869. arXiv:1602.01783 Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Tim Harley, Timothy P. Lillicrap, David Silver, and Koray Kavukcuoglu. 2016. Asynchronous methods for deep reinforcement learning. In 33rd International Conference on Machine Learning, ICML 2016, Vol. 4. International Machine Learning Society (IMLS), 2850--2869. arXiv:1602.01783"},{"key":"e_1_3_2_1_38_1","volume-title":"Playing Atari with Deep Reinforcement Learning. 
(dec","author":"Mnih Volodymyr","year":"2013","unstructured":"Volodymyr Mnih , Koray Kavukcuoglu , David Silver , Alex Graves , Ioannis Antonoglou , Daan Wierstra , and Martin Riedmiller . 2013. Playing Atari with Deep Reinforcement Learning. (dec 2013 ). arXiv:1312.5602 http:\/\/arxiv.org\/abs\/1312.5602 Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. 2013. Playing Atari with Deep Reinforcement Learning. (dec 2013). arXiv:1312.5602 http:\/\/arxiv.org\/abs\/1312.5602"},{"key":"e_1_3_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1038\/nature14236"},{"key":"e_1_3_2_1_40_1","volume-title":"Illuminating search spaces by mapping elites. (apr","author":"Mouret Jean-Baptiste","year":"2015","unstructured":"Jean-Baptiste Mouret and Jeff Clune . 2015. Illuminating search spaces by mapping elites. (apr 2015 ). arXiv:1504.04909 http:\/\/arxiv.org\/abs\/1504.04909 Jean-Baptiste Mouret and Jeff Clune. 2015. Illuminating search spaces by mapping elites. (apr 2015). arXiv:1504.04909 http:\/\/arxiv.org\/abs\/1504.04909"},{"key":"e_1_3_2_1_41_1","volume-title":"6th International Conference on Learning Representations, ICLR 2018 - Conference Track Proceedings. International Conference on Learning Representations, ICLR. arXiv:1706","author":"Plappert Matthias","year":"2018","unstructured":"Matthias Plappert , Rein Houthooft , Prafulla Dhariwal , Szymon Sidor , Richard Y. Chen , Xi Chen , Tamim Asfour , Pieter Abbeel , and Marcin Andrychowicz . 2018 . Parameter space noise for exploration . In 6th International Conference on Learning Representations, ICLR 2018 - Conference Track Proceedings. International Conference on Learning Representations, ICLR. arXiv:1706 .01905 Matthias Plappert, Rein Houthooft, Prafulla Dhariwal, Szymon Sidor, Richard Y. Chen, Xi Chen, Tamim Asfour, Pieter Abbeel, and Marcin Andrychowicz. 2018. Parameter space noise for exploration. 
In 6th International Conference on Learning Representations, ICLR 2018 - Conference Track Proceedings. International Conference on Learning Representations, ICLR. arXiv:1706.01905"},{"key":"e_1_3_2_1_42_1","volume-title":"7th International Conference on Learning Representations, ICLR 2019. International Conference on Learning Representations, ICLR. arXiv:1810","author":"Pourchot Alo\u00efs","year":"2019","unstructured":"Alo\u00efs Pourchot and Olivier Sigaud . 2019 . CEM-RL: Combining evolutionary and gradient-based methods for policy search . In 7th International Conference on Learning Representations, ICLR 2019. International Conference on Learning Representations, ICLR. arXiv:1810 .01222 Alo\u00efs Pourchot and Olivier Sigaud. 2019. CEM-RL: Combining evolutionary and gradient-based methods for policy search. In 7th International Conference on Learning Representations, ICLR 2019. International Conference on Learning Representations, ICLR. arXiv:1810.01222"},{"key":"e_1_3_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.3389\/frobt.2016.00040"},{"key":"e_1_3_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/2739480.2754664"},{"key":"e_1_3_2_1_45_1","volume-title":"Evolution Strategies as a Scalable Alternative to Reinforcement Learning. (mar","author":"Salimans Tim","year":"2017","unstructured":"Tim Salimans , Jonathan Ho , Xi Chen , Szymon Sidor , and Ilya Sutskever . 2017. Evolution Strategies as a Scalable Alternative to Reinforcement Learning. (mar 2017 ). arXiv:1703.03864 http:\/\/arxiv.org\/abs\/1703.03864 Tim Salimans, Jonathan Ho, Xi Chen, Szymon Sidor, and Ilya Sutskever. 2017. Evolution Strategies as a Scalable Alternative to Reinforcement Learning. (mar 2017). 
arXiv:1703.03864 http:\/\/arxiv.org\/abs\/1703.03864"},{"key":"e_1_3_2_1_46_1","volume-title":"31st International Conference on Machine Learning, ICML","author":"Silver David","year":"2014","unstructured":"David Silver , Guy Lever , Nicolas Heess , Thomas Degris , Daan Wierstra , and Martin Riedmiller . 2014 . Deterministic policy gradient algorithms . In 31st International Conference on Machine Learning, ICML 2014. David Silver, Guy Lever, Nicolas Heess, Thomas Degris, Daan Wierstra, and Martin Riedmiller. 2014. Deterministic policy gradient algorithms. In 31st International Conference on Machine Learning, ICML 2014."},{"key":"e_1_3_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/2908812.2908875"},{"key":"e_1_3_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/TEVC.2017.2735550"},{"key":"e_1_3_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/3205455.3205602"},{"key":"e_1_3_2_1_50_1","volume-title":"Learning to reinforcement learn. (nov","author":"Wang Jane X","year":"2016","unstructured":"Jane X Wang , Zeb Kurth-Nelson , Dhruva Tirumala , Hubert Soyer , Joel Z Leibo , Remi Munos , Charles Blundell , Dharshan Kumaran , and Matt Botvinick . 2016. Learning to reinforcement learn. (nov 2016 ). arXiv:1611.05763 http:\/\/arxiv.org\/abs\/1611.05763 Jane X Wang, Zeb Kurth-Nelson, Dhruva Tirumala, Hubert Soyer, Joel Z Leibo, Remi Munos, Charles Blundell, Dharshan Kumaran, and Matt Botvinick. 2016. Learning to reinforcement learn. (nov 2016). 
arXiv:1611.05763 http:\/\/arxiv.org\/abs\/1611.05763"},{"key":"e_1_3_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1007\/bf00992698"}],"event":{"name":"GECCO '21: Genetic and Evolutionary Computation Conference","location":"Lille France","acronym":"GECCO '21","sponsor":["SIGEVO ACM Special Interest Group on Genetic and Evolutionary Computation"]},"container-title":["Proceedings of the Genetic and Evolutionary Computation Conference"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3449639.3459304","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3449639.3459304","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T21:28:08Z","timestamp":1750195688000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3449639.3459304"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,6,26]]},"references-count":51,"alternative-id":["10.1145\/3449639.3459304","10.1145\/3449639"],"URL":"https:\/\/doi.org\/10.1145\/3449639.3459304","relation":{},"subject":[],"published":{"date-parts":[[2021,6,26]]},"assertion":[{"value":"2021-06-26","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}