{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,24]],"date-time":"2026-03-24T19:45:43Z","timestamp":1774381543987,"version":"3.50.1"},"reference-count":48,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2025,3,19]],"date-time":"2025-03-19T00:00:00Z","timestamp":1742342400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Evol. Learn. Optim."],"published-print":{"date-parts":[[2025,3,31]]},"abstract":"<jats:p>A hallmark of intelligence is the ability to exhibit a wide range of effective behaviors. Inspired by this principle, Quality-Diversity algorithms, such as MAP-Elites, are evolutionary methods designed to generate a set of diverse and high-fitness solutions. However, as a genetic algorithm, MAP-Elites relies on random mutations, which can become inefficient in high-dimensional search spaces, thus limiting its scalability to more complex domains, such as learning to control agents directly from high-dimensional inputs. To address this limitation, advanced methods like PGA-MAP-Elites and DCG-MAP-Elites have been developed, which combine actor-critic techniques from Reinforcement Learning with MAP-Elites, significantly enhancing the performance and efficiency of Quality-Diversity algorithms in complex, high-dimensional tasks. While these methods have successfully leveraged the trained critic to guide more effective mutations, the potential of the trained actor remains underutilized in improving both the quality and diversity of the evolved population. 
In this work, we introduce DCRL-MAP-Elites, an extension of DCG-MAP-Elites that utilizes the descriptor-conditioned actor as a generative model to produce diverse solutions, which are then injected into the offspring batch at each generation. Additionally, we present an empirical analysis of the fitness and descriptor reproducibility of the solutions discovered by each algorithm. Finally, we present a second empirical analysis shedding light on the synergies between the different variation operators and explaining the performance improvement from PGA-MAP-Elites to DCRL-MAP-Elites.<\/jats:p>","DOI":"10.1145\/3696426","type":"journal-article","created":{"date-parts":[[2024,9,23]],"date-time":"2024-09-23T16:26:33Z","timestamp":1727108793000},"page":"1-35","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":4,"title":["Synergizing Quality-Diversity with Descriptor-Conditioned Reinforcement Learning"],"prefix":"10.1145","volume":"5","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4743-9494","authenticated-orcid":false,"given":"Maxence","family":"Faldor","sequence":"first","affiliation":[{"name":"Imperial College London, London, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9476-2900","authenticated-orcid":false,"given":"F\u00e9lix","family":"Chalumeau","sequence":"additional","affiliation":[{"name":"InstaDeep, Cape Town, South Africa"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4601-2176","authenticated-orcid":false,"given":"Manon","family":"Flageat","sequence":"additional","affiliation":[{"name":"Imperial College London, London, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3190-7073","authenticated-orcid":false,"given":"Antoine","family":"Cully","sequence":"additional","affiliation":[{"name":"Imperial College London, London, 
UK"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,3,19]]},"reference":[{"key":"e_1_3_1_2_1","volume-title":"Proceedings of the 31st International Conference on Neural Information Processing Systems","volume":"30","author":"Andrychowicz Marcin","year":"2017","unstructured":"Marcin Andrychowicz, Filip Wolski, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob McGrew, Josh Tobin, OpenAI Pieter Abbeel, and Wojciech Zaremba. 2017. Hindsight Experience Replay. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Vol. 30, Curran Associates, Inc. Retrieved from https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2017\/hash\/453fadbd8a1a3af50a9df4df899537b5-Abstract.html"},{"key":"e_1_3_1_3_1","unstructured":"Felix Chalumeau Raphael Boige Bryan Lim Valentin Mac\u00e9 Maxime Allard Arthur Flajolet Antoine Cully and Thomas Pierrot. 2022. Neuroevolution Is a Competitive Alternative to Reinforcement Learning for Skill Discovery. Retrieved from https:\/\/openreview.net\/forum?id=6BHlZgyPOZY"},{"key":"e_1_3_1_4_1","unstructured":"Felix Chalumeau Bryan Lim Raphael Boige Maxime Allard Luca Grillotti Manon Flageat Valentin Mac\u00e9 Arthur Flajolet Thomas Pierrot and Antoine Cully. 2023. QDax: A library for quality-diversity and population-based algorithms with hardware acceleration. arXiv:2308.03665. 
Retrieved from https:\/\/arxiv.org\/abs\/2308.03665"},{"key":"e_1_3_1_5_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-66515-9_4"},{"key":"e_1_3_1_6_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.robot.2017.11.010"},{"key":"e_1_3_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/3377930.3390217"},{"key":"e_1_3_1_8_1","doi-asserted-by":"publisher","DOI":"10.1038\/nature14422"},{"key":"e_1_3_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/TEVC.2017.2704781"},{"key":"e_1_3_1_10_1","doi-asserted-by":"publisher","DOI":"10.1038\/s41586-020-03157-9"},{"key":"e_1_3_1_11_1","unstructured":"Benjamin Eysenbach Abhishek Gupta Julian Ibarz and Sergey Levine. 2018. Diversity is all you need: Learning skills without a reward function. arXiv:1802.06070. Retrieved from https:\/\/arxiv.org\/abs\/1802.06070"},{"key":"e_1_3_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/3583131.3590503"},{"key":"e_1_3_1_13_1","doi-asserted-by":"crossref","unstructured":"Maxence Faldor and Antoine Cully. 2024. Toward artificial open-ended evolution within lenia using quality-diversity. arXiv:2406.04235. Retrieved from https:\/\/arxiv.org\/abs\/2406.04235.","DOI":"10.1162\/isal_a_00827"},{"key":"e_1_3_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/3577203"},{"issue":"4","key":"e_1_3_1_15_1","doi-asserted-by":"crossref","first-page":"891","DOI":"10.1109\/TEVC.2023.3273560","article-title":"Uncertain Quality-Diversity: Evaluation Methodology and New Methods for Quality-Diversity in Uncertain Domains","volume":"28","author":"Flageat Manon","year":"2023","unstructured":"Manon Flageat and Antoine Cully. 2023. Uncertain Quality-Diversity: Evaluation Methodology and New Methods for Quality-Diversity in Uncertain Domains. IEEE Transactions on Evolutionary Computation 28, 4 (2023), 891\u2013902.","journal-title":"IEEE Transactions on Evolutionary Computation"},{"key":"e_1_3_1_16_1","unstructured":"Manon Flageat Bryan Lim Luca Grillotti Maxime Allard Sim\u00f3n C. 
Smith and Antoine Cully. 2022. Benchmarking quality-diversity algorithms on neuroevolution for reinforcement learning. arXiv:2211.02193. Retrieved from https:\/\/arxiv.org\/abs\/2211.02193"},{"key":"e_1_3_1_17_1","first-page":"10040","volume-title":"Proceedings of the 35th International Conference on Neural Information Processing Systems, Curran Associates, Inc.","volume":"34","author":"Fontaine Matthew","year":"2021","unstructured":"Matthew Fontaine and Stefanos Nikolaidis. 2021. Differentiable Quality Diversity. In Proceedings of the 35th International Conference on Neural Information Processing Systems, Vol. 34, Curran Associates, Inc., 10040\u201310052. Retrieved from https:\/\/proceedings.neurips.cc\/paper\/2021\/hash\/532923f11ac97d3e7cb0130315b067dc-Abstract.html"},{"key":"e_1_3_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/3583131.3590389"},{"key":"e_1_3_1_19_1","unstructured":"C. Daniel Freeman Erik Frey Anton Raichuk Sertan Girgin Igor Mordatch and Olivier Bachem. 2021. Brax \u2013 A Differentiable Physics Engine for Large Scale Rigid Body Simulation. Retrieved from http:\/\/github.com\/google\/brax"},{"key":"e_1_3_1_20_1","first-page":"1587","volume-title":"Proceedings of the 35th International Conference on Machine Learning","author":"Fujimoto Scott","year":"2018","unstructured":"Scott Fujimoto, Herke van Hoof, and David Meger. 2018. Addressing Function Approximation Error in Actor-Critic Methods. In Proceedings of the 35th International Conference on Machine Learning. PMLR, 1587\u20131596. Retrieved from https:\/\/proceedings.mlr.press\/v80\/fujimoto18a.html"},{"key":"e_1_3_1_21_1","unstructured":"Karol Gregor Danilo Jimenez Rezende and Daan Wierstra. 2016. Variational intrinsic control. arXiv:1611.07507. 
Retrieved from https:\/\/arxiv.org\/abs\/1611.07507"},{"key":"e_1_3_1_22_1","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Grillotti Luca","year":"2024","unstructured":"Luca Grillotti, Maxence Faldor, Borja Gonz\u00e1lez Le\u00f3n, and Antoine Cully. 2024. Quality-Diversity Actor-Critic: Learning High-Performing and Diverse Behaviors via Value and Successor Features Critics. In Proceedings of the International Conference on Machine Learning. PMLR."},{"key":"e_1_3_1_23_1","first-page":"1861","volume-title":"Proceedings of the 35th International Conference on Machine Learning","author":"Haarnoja Tuomas","year":"2018","unstructured":"Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine. 2018. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. In Proceedings of the 35th International Conference on Machine Learning. PMLR, 1861\u20131870. Retrieved from https:\/\/proceedings.mlr.press\/v80\/haarnoja18b.html"},{"key":"e_1_3_1_24_1","unstructured":"Tuomas Haarnoja Aurick Zhou Kristian Hartikainen George Tucker Sehoon Ha Jie Tan Vikash Kumar Henry Zhu Abhishek Gupta Pieter Abbeel and Sergey Levine. 2019. Soft actor-critic algorithms and applications. arXiv:1812.05905. Retrieved from https:\/\/arxiv.org\/abs\/1812.05905"},{"key":"e_1_3_1_25_1","unstructured":"Nikolaus Hansen. 2023. The CMA evolution strategy: A tutorial. arXiv:1604.00772. Retrieved from https:\/\/arxiv.org\/abs\/1604.00772"},{"key":"e_1_3_1_26_1","unstructured":"Nicolas Heess Dhruva TB Srinivasan Sriram Jay Lemmon Josh Merel Greg Wayne Yuval Tassa Tom Erez Ziyu Wang Ali Eslami Martin Riedmiller and David Silver. 2017. Emergence of Locomotion Behaviours in Rich Environments. arXiv:1707.02286. 
Retrieved from https:\/\/arxiv.org\/abs\/1707.02286"},{"key":"e_1_3_1_27_1","doi-asserted-by":"publisher","DOI":"10.1016\/0893-6080(89)90020-8"},{"key":"e_1_3_1_28_1","unstructured":"Max Jaderberg Valentin Dalibard Simon Osindero Wojciech M. Czarnecki Jeff Donahue Ali Razavi Oriol Vinyals Tim Green Iain Dunning Karen Simonyan Chrisantha Fernando and Koray Kavukcuoglu. 2017. Population based training of neural networks. arXiv:1711.09846. Retrieved from https:\/\/arxiv.org\/abs\/1711.09846"},{"key":"e_1_3_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDL-EpiRob44920.2019"},{"key":"e_1_3_1_30_1","first-page":"8198","volume-title":"Proceedings of the 34th International Conference on Neural Information Processing Systems, Curran Associates, Inc.","volume":"33","author":"Kumar Saurabh","year":"2020","unstructured":"Saurabh Kumar, Aviral Kumar, Sergey Levine, and Chelsea Finn. 2020. One Solution is Not All You Need: Few-Shot Extrapolation via Structured MaxEnt RL. In Proceedings of the 34th International Conference on Neural Information Processing Systems, Vol. 33, Curran Associates, Inc., 8198\u20138210. Retrieved from https:\/\/proceedings.neurips.cc\/paper\/2020\/hash\/5d151d1059a6281335a10732fc49620e-Abstract.html"},{"key":"e_1_3_1_31_1","volume-title":"Proceedings of the 4th International Conference on Learning Representations (ICLR \u201916)","author":"Lillicrap Timothy P.","year":"2016","unstructured":"Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. 2016. Continuous Control with Deep Reinforcement Learning. In Proceedings of the 4th International Conference on Learning Representations (ICLR \u201916). Yoshua Bengio and Yann LeCun (Eds.). Retrieved from http:\/\/arxiv.org\/abs\/1509.02971"},{"key":"e_1_3_1_32_1","unstructured":"Volodymyr Mnih Koray Kavukcuoglu David Silver Alex Graves Ioannis Antonoglou Daan Wierstra and Martin Riedmiller. 2013. 
Playing atari with deep reinforcement learning. arXiv:1312.5602. Retrieved from https:\/\/arxiv.org\/abs\/1312.5602"},{"key":"e_1_3_1_33_1","doi-asserted-by":"publisher","DOI":"10.1038\/nature14236"},{"key":"e_1_3_1_34_1","unstructured":"Jean-Baptiste Mouret and Jeff Clune. 2015. Illuminating search spaces by mapping elites. arXiv:1504.04909. Retrieved from https:\/\/arxiv.org\/abs\/1504.04909"},{"key":"e_1_3_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/3449639.3459304"},{"key":"e_1_3_1_36_1","unstructured":"OpenAI Ilge Akkaya Marcin Andrychowicz Maciek Chociej Mateusz Litwin Bob McGrew Arthur Petron Alex Paino Matthias Plappert Glenn Powell Raphael Ribas Jonas Schneider Nikolas Tezak Jerry Tworek Peter Welinder Lilian Weng Qiming Yuan Wojciech Zaremba and Lei Zhang. 2019. Solving rubik\u2019s cube with a robot hand. arXiv:1910.07113. Retrieved from https:\/\/arxiv.org\/abs\/1910.07113"},{"key":"e_1_3_1_37_1","unstructured":"Georg Ostrovski Pablo Samuel Castro and Will Dabney. 2021. The difficulty of passive learning in deep reinforcement learning. arXiv:2110.14020. Retrieved from http:\/\/arxiv.org\/abs\/2110.14020"},{"key":"e_1_3_1_38_1","unstructured":"Thomas Pierrot and Arthur Flajolet. 2023. Evolving populations of diverse RL agents with MAP-elites. arXiv:2303.12803. Retrieved from https:\/\/arxiv.org\/abs\/2303.12803"},{"key":"e_1_3_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/3512290.3528845"},{"key":"e_1_3_1_40_1","doi-asserted-by":"publisher","DOI":"10.3389\/frobt.2016.00040"},{"key":"e_1_3_1_41_1","unstructured":"Tim Salimans Jonathan Ho Xi Chen Szymon Sidor and Ilya Sutskever. 2017. Evolution strategies as a scalable alternative to reinforcement learning. arXiv:1703.03864. 
Retrieved from https:\/\/arxiv.org\/abs\/1703.03864"},{"key":"e_1_3_1_42_1","first-page":"1312","volume-title":"Proceedings of the 32nd International Conference on Machine Learning","author":"Schaul Tom","year":"2015","unstructured":"Tom Schaul, Daniel Horgan, Karol Gregor, and David Silver. 2015. Universal Value Function Approximators. In Proceedings of the 32nd International Conference on Machine Learning. PMLR, 1312\u20131320. Retrieved from https:\/\/proceedings.mlr.press\/v37\/schaul15.html"},{"key":"e_1_3_1_43_1","unstructured":"Archit Sharma Shixiang Gu Sergey Levine Vikash Kumar and Karol Hausman. 2019. Dynamics-Aware Unsupervised Discovery of Skills. Retrieved from https:\/\/openreview.net\/forum?id=HJgLZR4KvH"},{"key":"e_1_3_1_44_1","doi-asserted-by":"publisher","DOI":"10.1038\/nature16961"},{"key":"e_1_3_1_45_1","first-page":"387","volume-title":"Proceedings of the 31st International Conference on Machine Learning","author":"Silver David","year":"2014","unstructured":"David Silver, Guy Lever, Nicolas Heess, Thomas Degris, Daan Wierstra, and Martin Riedmiller. 2014. Deterministic Policy Gradient Algorithms. In Proceedings of the 31st International Conference on Machine Learning. PMLR, 387\u2013395. Retrieved from https:\/\/proceedings.mlr.press\/v32\/silver14.html"},{"key":"e_1_3_1_46_1","volume-title":"Reinforcement Learning: An Introduction","author":"Sutton Richard S.","year":"2018","unstructured":"Richard S. Sutton and Andrew G. Barto. 2018. Reinforcement Learning: An Introduction (2nd ed.). The MIT Press, Cambridge, MA.","edition":"2"},{"key":"e_1_3_1_47_1","doi-asserted-by":"crossref","unstructured":"Bryon Tjanaka Matthew C. Fontaine David H. Lee Aniruddha Kalkar and Stefanos Nikolaidis. 2023. Training diverse high-dimensional controllers by scaling covariance matrix adaptation MAP-annealing. arXiv:2210.02622. 
Retrieved from https:\/\/arxiv.org\/abs\/2210.02622","DOI":"10.1109\/LRA.2023.3313012"},{"key":"e_1_3_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/3512290.3528705"},{"key":"e_1_3_1_49_1","doi-asserted-by":"publisher","DOI":"10.1109\/TEVC.2017.2735550"}],"container-title":["ACM Transactions on Evolutionary Learning and Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3696426","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3696426","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:18:55Z","timestamp":1750295935000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3696426"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,3,19]]},"references-count":48,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2025,3,31]]}},"alternative-id":["10.1145\/3696426"],"URL":"https:\/\/doi.org\/10.1145\/3696426","relation":{},"ISSN":["2688-3007"],"issn-type":[{"value":"2688-3007","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,3,19]]},"assertion":[{"value":"2023-12-04","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-09-01","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-03-19","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}