{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,1]],"date-time":"2026-04-01T17:27:52Z","timestamp":1775064472331,"version":"3.50.1"},"reference-count":200,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2024,12,10]],"date-time":"2024-12-10T00:00:00Z","timestamp":1733788800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["U21A20518, 62025208, and 62421002"],"award-info":[{"award-number":["U21A20518, 62025208, and 62421002"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Comput. Surv."],"published-print":{"date-parts":[[2025,4,30]]},"abstract":"<jats:p>Deep reinforcement learning has led to dramatic breakthroughs in the field of artificial intelligence for the past few years. As the amount of rollout experience data and the size of neural networks for deep reinforcement learning have grown continuously, handling the training process and reducing the time consumption using parallel and distributed computing is becoming an urgent and essential desire. In this article, we perform a broad and thorough investigation on training acceleration methodologies for deep reinforcement learning based on parallel and distributed computing, providing a comprehensive survey in this field with state-of-the-art methods and pointers to core references. In particular, a taxonomy of literature is provided, along with a discussion of emerging topics and open issues. 
This incorporates learning system architectures, simulation parallelism, computing parallelism, distributed synchronization mechanisms, and deep evolutionary reinforcement learning. Furthermore, we compare 16 current open-source libraries and platforms with criteria of facilitating rapid development. Finally, we extrapolate future directions that deserve further research.<\/jats:p>","DOI":"10.1145\/3703453","type":"journal-article","created":{"date-parts":[[2024,11,14]],"date-time":"2024-11-14T10:04:39Z","timestamp":1731578679000},"page":"1-35","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":13,"title":["Acceleration for Deep Reinforcement Learning using Parallel and Distributed Computing: A Survey"],"prefix":"10.1145","volume":"57","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3830-7182","authenticated-orcid":false,"given":"Zhihong","family":"Liu","sequence":"first","affiliation":[{"name":"National University of Defense Technology, Changsha, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3238-745X","authenticated-orcid":false,"given":"Xin","family":"Xu","sequence":"additional","affiliation":[{"name":"National University of Defense Technology, Changsha, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6752-7892","authenticated-orcid":false,"given":"Peng","family":"Qiao","sequence":"additional","affiliation":[{"name":"National University of Defense Technology, Changsha, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9743-2034","authenticated-orcid":false,"given":"Dongsheng","family":"Li","sequence":"additional","affiliation":[{"name":"National University of Defense Technology, Changsha, 
China"}]}],"member":"320","published-online":{"date-parts":[[2024,12,10]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.1038\/nature14236"},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.1038\/nature16961"},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.1126\/science.aar6404"},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41586-019-1724-z"},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41586-023-06419-4"},{"key":"e_1_3_2_7_2","doi-asserted-by":"publisher","DOI":"10.1177\/0278364919887447"},{"key":"e_1_3_2_8_2","first-page":"767","volume-title":"Proceedings of the Conference on Robot Learning","author":"Fan Linxi","year":"2018","unstructured":"Linxi Fan, Yuke Zhu, Jiren Zhu, Zihua Liu, Orien Zeng, Anchit Gupta, Joan Creus-Costa, Silvio Savarese, and Li Fei-Fei. 2018. Surreal: Open-source reinforcement learning framework and robot manipulation benchmark. In Proceedings of the Conference on Robot Learning. PMLR, 767\u2013782."},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41467-021-25874-z"},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1126\/sciadv.aap7885"},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41591-021-01599-w"},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41591-018-0253-x"},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v32i1.11694"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/MSP.2017.2743240"},{"key":"e_1_3_2_15_2","unstructured":"Yuxi Li. 2017. Deep reinforcement learning: An overview. arXiv:1701.07274. 
Retrieved from https:\/\/arxiv.org\/abs\/1701.07274"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2022.3207346"},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1561\/2200000086"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/TITS.2021.3054625"},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2021.3121870"},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/COMST.2019.2916583"},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/TITS.2020.3008612"},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCYB.2020.2977374"},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/COMST.2021.3073036"},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1145\/3377454"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1007\/s13748-012-0035-5"},{"issue":"6","key":"e_1_3_2_26_2","first-page":"2574","article-title":"A survey on large-scale machine learning","volume":"34","author":"Wang Meng","year":"2020","unstructured":"Meng Wang, Weijie Fu, Xiangnan He, Shijie Hao, and Xindong Wu. 2020. A survey on large-scale machine learning. IEEE Transactions on Knowledge and Data Engineering 34, 6 (2020), 2574\u20132594.","journal-title":"IEEE Transactions on Knowledge and Data Engineering"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1145\/3363554"},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1145\/3320060"},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jpdc.2019.10.004"},{"key":"e_1_3_2_30_2","unstructured":"Mohammad Reza Samsami and Hossein Alimadad. 2020. Distributed deep reinforcement learning: An overview. arXiv:2011.11012. 
Retrieved from https:\/\/arxiv.org\/abs\/2011.11012"},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11633-023-1454-4"},{"key":"e_1_3_2_32_2","first-page":"617","volume-title":"Proceedings of the Conference on Robot Learning","author":"Clavera Ignasi","year":"2018","unstructured":"Ignasi Clavera, Jonas Rothfuss, John Schulman, Yasuhiro Fujita, Tamim Asfour, and Pieter Abbeel. 2018. Model-based reinforcement learning via meta-policy optimization. In Proceedings of the Conference on Robot Learning. PMLR, 617\u2013629."},{"key":"e_1_3_2_33_2","unstructured":"Thanard Kurutach Ignasi Clavera Yan Duan Aviv Tamar and Pieter Abbeel. 2018. Model-ensemble trust-region policy optimization. arXiv:1802.10592. Retrieved from https:\/\/arxiv.org\/abs\/1802.10592"},{"key":"e_1_3_2_34_2","article-title":"Reinforcement learning: An introduction","author":"Sutton Richard S.","year":"2018","unstructured":"Richard S. Sutton. 2018. Reinforcement learning: An introduction. A Bradford Book (2018).","journal-title":"A Bradford Book"},{"key":"e_1_3_2_35_2","unstructured":"John Schulman Filip Wolski Prafulla Dhariwal Alec Radford and Oleg Klimov. 2017. Proximal policy optimization algorithms. arXiv:1707.06347. Retrieved from https:\/\/arxiv.org\/abs\/1707.06347"},{"key":"e_1_3_2_36_2","first-page":"1861","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Haarnoja Tuomas","year":"2018","unstructured":"Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine. 2018. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In Proceedings of the International Conference on Machine Learning. PMLR, 1861\u20131870."},{"key":"e_1_3_2_37_2","first-page":"1587","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Fujimoto Scott","year":"2018","unstructured":"Scott Fujimoto, Herke Hoof, and David Meger. 2018. 
Addressing function approximation error in actor-critic methods. In Proceedings of the International Conference on Machine Learning. PMLR, 1587\u20131596."},{"key":"e_1_3_2_38_2","first-page":"2021","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Fu Justin","year":"2019","unstructured":"Justin Fu, Aviral Kumar, Matthew Soh, and Sergey Levine. 2019. Diagnosing bottlenecks in deep q-learning algorithms. In Proceedings of the International Conference on Machine Learning. PMLR, 2021\u20132030."},{"key":"e_1_3_2_39_2","volume-title":"Proceedings of the NeurIPS","author":"Kumar Aviral","year":"2019","unstructured":"Aviral Kumar, Justin Fu, Matthew Soh, George Tucker, and Sergey Levine. 2019. Stabilizing off-policy q-learning via bootstrapping error reduction. In Proceedings of the NeurIPS."},{"key":"e_1_3_2_40_2","volume-title":"Proceedings of the NeurIPS","author":"Chu Cheng-Tao","year":"2006","unstructured":"Cheng-Tao Chu, Sang Kim, Yi-An Lin, YuanYuan Yu, Gary Bradski, Kunle Olukotun, and Andrew Ng. 2006. Map-reduce for machine learning on multicore. In Proceedings of the NeurIPS."},{"key":"e_1_3_2_41_2","volume-title":"Proceedings of the NeurIPS","author":"Zinkevich Martin","year":"2010","unstructured":"Martin Zinkevich, Markus Weimer, Lihong Li, and Alex Smola. 2010. Parallelized stochastic gradient descent. In Proceedings of the NeurIPS."},{"key":"e_1_3_2_42_2","volume-title":"Proceedings of the NeurIPS","volume":"27","author":"Li Mu","year":"2014","unstructured":"Mu Li, David G. Andersen, Alexander J. Smola, and Kai Yu. 2014. Communication efficient distributed machine learning with the parameter server. In Proceedings of the NeurIPS. Z. Ghahramani, M. Welling, C. Cortes, N. Lawrence, and K. Q. Weinberger (Eds.), Vol. 
27, Curran Associates, Inc."},{"key":"e_1_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.1145\/3035918.3035933"},{"key":"e_1_3_2_44_2","first-page":"906","volume-title":"Proceedings of the NeurIPS","author":"Zhang Hao","year":"2020","unstructured":"Hao Zhang, Yuan Li, Zhijie Deng, Xiaodan Liang, Lawrence Carin, and Eric Xing. 2020. Autosync: Learning to synchronize for data-parallel distributed deep learning. In Proceedings of the NeurIPS. 906\u2013917."},{"key":"e_1_3_2_45_2","volume-title":"Proceedings of the NeurIPS","author":"Dean Jeffrey","year":"2012","unstructured":"Jeffrey Dean, Greg Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Mark Mao, Marc\u2019aurelio Ranzato, Andrew Senior, Paul Tucker, Ke Yang, et\u00a0al. 2012. Large scale distributed deep networks. In Proceedings of the NeurIPS."},{"key":"e_1_3_2_46_2","volume-title":"Proceedings of the NeurIPS","author":"Lee Seunghak","year":"2014","unstructured":"Seunghak Lee, Jin Kyu Kim, Xun Zheng, Qirong Ho, Garth A. Gibson, and Eric P. Xing. 2014. On model parallelization and scheduling strategies for distributed machine learning. In Proceedings of the NeurIPS."},{"key":"e_1_3_2_47_2","doi-asserted-by":"publisher","DOI":"10.1109\/SC41405.2020.00049"},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.1145\/3458817.3476209"},{"key":"e_1_3_2_49_2","unstructured":"Zhenheng Tang Shaohuai Shi Xiaowen Chu Wei Wang and Bo Li. 2020. Communication-efficient distributed deep learning: A comprehensive survey. arXiv:2003.06307. Retrieved from https:\/\/arxiv.org\/abs\/2003.06307"},{"key":"e_1_3_2_50_2","volume-title":"Proceedings of the NeurIPS","author":"Huang Yanping","year":"2019","unstructured":"Yanping Huang, Youlong Cheng, Ankur Bapna, Orhan Firat, Mia Xu Chen, Dehao Chen, Hyouk Joong Lee, Jiquan Ngiam, Quoc V. Le, Yonghui Wu, and Zhifeng Chen. 2019. GPipe: Efficient training of giant neural networks using pipeline parallelism. 
In Proceedings of the NeurIPS."},{"key":"e_1_3_2_51_2","doi-asserted-by":"publisher","DOI":"10.1145\/3341301.3359646"},{"key":"e_1_3_2_52_2","first-page":"7937","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Narayanan Deepak","year":"2021","unstructured":"Deepak Narayanan, Amar Phanishayee, Kaiyu Shi, Xie Chen, and Matei Zaharia. 2021. Memory-efficient pipeline-parallel dnn training. In Proceedings of the International Conference on Machine Learning. PMLR, 7937\u20137947."},{"key":"e_1_3_2_53_2","doi-asserted-by":"crossref","unstructured":"Ziyue Luo Xiaodong Yi Guoping Long Shiqing Fan Chuan Wu Jun Yang and Wei Lin. 2022. Efficient pipeline planning for expedited distributed DNN training. arXiv:2204.10562. Retrieved from https:\/\/arxiv.org\/abs\/2204.10562","DOI":"10.1109\/INFOCOM48880.2022.9796787"},{"key":"e_1_3_2_54_2","unstructured":"Alex Krizhevsky. 2014. One weird trick for parallelizing convolutional neural networks. arXiv:1404.5997. Retrieved from https:\/\/arxiv.org\/abs\/1404.5997"},{"key":"e_1_3_2_55_2","unstructured":"Yonghui Wu Mike Schuster Zhifeng Chen Quoc V. Le Mohammad Norouzi Wolfgang Macherey Maxim Krikun Yuan Cao Qin Gao Klaus Macherey Jeff Klingner Apurva Shah Melvin Johnson Xiaobing Liu \u0141ukasz Kaiser Stephan Gouws Yoshikiyo Kato Taku Kudo Hideto Kazawa Keith Stevens George Kurian Nishant Patil Wei Wang Cliff Young Jason Smith Jason Riesa Alex Rudnick Oriol Vinyals Greg Corrado Macduff Hughes and Jeffrey Dean. 2016. Google\u2019s neural machine translation system: Bridging the gap between human and machine translation. arXiv:1609.08144. 
Retrieved from https:\/\/arxiv.org\/abs\/1609.08144"},{"key":"e_1_3_2_56_2","volume-title":"Proceedings of the NeurIPS","volume":"31","author":"Shazeer Noam","year":"2018","unstructured":"Noam Shazeer, Youlong Cheng, Niki Parmar, Dustin Tran, Ashish Vaswani, Penporn Koanantakool, Peter Hawkins, HyoukJoong Lee, Mingsheng Hong, Cliff Young, Ryan Sepassi, and Blake Hechtman. 2018. Mesh-TensorFlow: Deep learning for supercomputers. In Proceedings of the NeurIPS. S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (Eds.), Vol. 31, Curran Associates, Inc."},{"key":"e_1_3_2_57_2","unstructured":"Zhihao Jia Matei Zaharia and Alex Aiken. 2019. Beyond data and model parallelism for deep neural networks. arXiv:1807.05358. Retrieved from https:\/\/arxiv.org\/abs\/1807.05358"},{"key":"e_1_3_2_58_2","doi-asserted-by":"publisher","DOI":"10.1109\/SC41405.2020.00024"},{"key":"e_1_3_2_59_2","first-page":"307","volume-title":"Proceedings of the 2020 USENIX Annual Technical Conference (USENIX ATC\u201920)","author":"Park Jay H.","year":"2020","unstructured":"Jay H. Park, Gyeongchan Yun, M. Yi Chang, Nguyen T. Nguyen, Seungmin Lee, Jaesik Choi, Sam H. Noh, and Young-ri Choi. 2020. HetPipe: Enabling large DNN training on (Whimpy) heterogeneous GPU clusters through integration of pipelined model parallelism and data parallelism. In Proceedings of the 2020 USENIX Annual Technical Conference (USENIX ATC\u201920). 307\u2013321."},{"key":"e_1_3_2_60_2","unstructured":"Zhengda Bian Hongxin Liu Boxiang Wang Haichen Huang Yongbin Li Chuanrui Wang Fan Cui and Yang You. 2021. Colossal-AI: A unified deep learning system for large-scale parallel training. arXiv:2110.14883. 
Retrieved from https:\/\/arxiv.org\/abs\/2110.14883"},{"key":"e_1_3_2_61_2","first-page":"7608","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Petrenko Aleksei","year":"2020","unstructured":"Aleksei Petrenko, Zhehui Huang, Tushar Kumar, Gaurav Sukhatme, and Vladlen Koltun. 2020. Sample factory: Egocentric 3D control from pixels at 100000 FPS with asynchronous reinforcement learning. In Proceedings of the International Conference on Machine Learning. PMLR, 7608\u20137618."},{"key":"e_1_3_2_62_2","first-page":"2263","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Espeholt Lasse","year":"2018","unstructured":"Lasse Espeholt, Hubert Soyer, Remi Munos, Karen Simonyan, Volodymyr Mnih, Tom Ward, Boron Yotam, Firoiu Vlad, Harley Tim, Iain Dunning, Shane Legg, and Koray Kavukcuoglu. 2018. IMPALA: Scalable distributed deep-RL with importance weighted actor-learner architectures. In Proceedings of the International Conference on Machine Learning. PMLR, 2263\u20132284."},{"key":"e_1_3_2_63_2","unstructured":"Viktor Makoviychuk Lukasz Wawrzyniak Yunrong Guo Michelle Lu Kier Storey Miles Macklin David Hoeller Nikita Rudin Arthur Allshire Ankur Handa and Gavriel State. 2021. Isaac gym: High performance GPU-based physics simulation for robot learning. arXiv:2108.10470. Retrieved from https:\/\/arxiv.org\/abs\/2108.10470"},{"key":"e_1_3_2_64_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Baker Bowen","year":"2020","unstructured":"Bowen Baker, Ingmar Kanitscheider, Todor Markov, Yi Wu, Glenn Powell, Bob McGrew, and Igor Mordatch. 2020. Emergent tool use from multi-agent autocurricula. 
In Proceedings of the International Conference on Learning Representations."},{"key":"e_1_3_2_65_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.micpro.2022.104441"},{"key":"e_1_3_2_66_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCNS.2021.3078100"},{"key":"e_1_3_2_67_2","unstructured":"Felipe Petroski Such Vashisht Madhavan Edoardo Conti Joel Lehman Kenneth O. Stanley and Jeff Clune. 2017. Deep neuroevolution: Genetic algorithms are a competitive alternative for training deep neural networks for reinforcement learning. arXiv:1712.06567. Retrieved from https:\/\/arxiv.org\/abs\/1712.06567"},{"key":"e_1_3_2_68_2","first-page":"1","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Horgan Dan","year":"2018","unstructured":"Dan Horgan, John Quan, David Budden, Gabriel Barth-Maron, Matteo Hessel, Hado Van Hasselt, and David Silver. 2018. Distributed prioritized experience replay. In Proceedings of the International Conference on Learning Representations. 1\u201319."},{"key":"e_1_3_2_69_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Kapturowski Steven","year":"2018","unstructured":"Steven Kapturowski, Georg Ostrovski, John Quan, Remi Munos, and Will Dabney. 2018. Recurrent experience replay in distributed reinforcement learning. In Proceedings of the International Conference on Learning Representations."},{"key":"e_1_3_2_70_2","unstructured":"Arun Nair Praveen Srinivasan Sam Blackwell Cagdas Alcicek Rory Fearon Alessandro De Maria Vedavyas Panneershelvam Mustafa Suleyman Charles Beattie Stig Petersen Shane Legg Volodymyr Mnih Koray Kavukcuoglu and David Silver. 2015. Massively parallel methods for deep reinforcement learning. arXiv:1507.04296. 
Retrieved from https:\/\/arxiv.org\/abs\/1507.04296"},{"key":"e_1_3_2_71_2","first-page":"1928","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Mnih Volodymyr","year":"2016","unstructured":"Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. 2016. Asynchronous methods for deep reinforcement learning. In Proceedings of the International Conference on Machine Learning. PMLR, 1928\u20131937."},{"key":"e_1_3_2_72_2","unstructured":"Adam Stooke and Pieter Abbeel. rlpyt : A research code base for deep reinforcement learning in PyTorch. arXiv:1909.01500v2. Retrieved from https:\/\/arxiv.org\/abs\/1909.01500v2"},{"key":"e_1_3_2_73_2","first-page":"1","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Wijmans Erik","year":"2020","unstructured":"Erik Wijmans, Abhishek Kadian, Ari Morcos, Stefan Lee, Irfan Essa, Devi Parikh, Manolis Savva, and Dhruv Batra. 2020. DD-PPO: Learning near-perfect PointGoal navigators from 2.5 billion frames. In Proceedings of the International Conference on Learning Representations. 1\u201321."},{"key":"e_1_3_2_74_2","first-page":"3053","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Liang Eric","year":"2018","unstructured":"Eric Liang, Richard Liaw, Robert Nishihara, Philipp Moritz, Roy Fox, Ken Goldberg, Joseph Gonzalez, Michael Jordan, and Ion Stoica. 2018. RLlib: Abstractions for distributed reinforcement learning. In Proceedings of the International Conference on Machine Learning. PMLR, 3053\u20133062."},{"key":"e_1_3_2_75_2","doi-asserted-by":"publisher","DOI":"10.1145\/3307650.3322259"},{"key":"e_1_3_2_76_2","unstructured":"Adam Stooke and Pieter Abbeel. 2018. Accelerated methods for deep reinforcement learning. arXiv:1803.02811. 
Retrieved from https:\/\/arxiv.org\/abs\/1803.02811"},{"key":"e_1_3_2_77_2","unstructured":"Amir Yazdanbakhsh Junchao Chen and Yu Zheng. 2020. Menger: Massively large-scale distributed reinforcement learning. NeurIPS Beyond Backpropagation Workshop 3 (2020). https:\/\/research.google\/pubs\/pub49803\/"},{"key":"e_1_3_2_78_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Espeholt Lasse","year":"2020","unstructured":"Lasse Espeholt, Rapha\u00ebl Marinier, Piotr Stanczyk, Ke Wang, and Marcin Michalski. 2020. Seed rl: Scalable and efficient deep-rl with accelerated central inference. In Proceedings of the International Conference on Learning Representations."},{"key":"e_1_3_2_79_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Mei Zhiyu","year":"2024","unstructured":"Zhiyu Mei, Wei Fu, Guangju Wang, Huanchen Zhang, and Yi Wu. 2024. SRL: Scaling distributed reinforcement learning to over ten thousand cores. In Proceedings of the International Conference on Learning Representations."},{"key":"e_1_3_2_80_2","first-page":"19440","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Li Zechu","year":"2023","unstructured":"Zechu Li, Tao Chen, Zhang-Wei Hong, Anurag Ajay, and Pulkit Agrawal. 2023. Parallel Q-Learning: Scaling Off-policy reinforcement learning under massively parallel simulation. In Proceedings of the International Conference on Machine Learning. PMLR, 19440\u201319459."},{"key":"e_1_3_2_81_2","first-page":"1338","volume-title":"Proceedings of the Conference on Robot Learning","author":"Zhang Yunzhi","year":"2020","unstructured":"Yunzhi Zhang, Ignasi Clavera, Boren Tsai, and Pieter Abbeel. 2020. Asynchronous methods for model-based reinforcement learning. In Proceedings of the Conference on Robot Learning. 
PMLR, 1338\u20131347."},{"key":"e_1_3_2_82_2","volume-title":"Proceedings of the NeurIPS","author":"Assran Mahmoud","year":"2019","unstructured":"Mahmoud Assran, Joshua Romoff, Nicolas Ballas, Joelle Pineau, and Michael Rabbat. 2019. Gossip-based actor-learner architectures for deep reinforcement learning. In Proceedings of the NeurIPS."},{"key":"e_1_3_2_83_2","doi-asserted-by":"publisher","DOI":"10.1109\/TAC.2016.2585302"},{"key":"e_1_3_2_84_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.automatica.2021.110092"},{"key":"e_1_3_2_85_2","doi-asserted-by":"crossref","unstructured":"Matteo Hessel Joseph Modayil Hado Van Hasselt Tom Schaul Georg Ostrovski Will Dabney Dan Horgan Bilal Piot Mohammad Azar and David Silver. 2018. Rainbow: Combining improvements in deep reinforcement learning. arXiv:1710.02298. Retrieved from https:\/\/arxiv.org\/abs\/1710.02298","DOI":"10.1609\/aaai.v32i1.11796"},{"key":"e_1_3_2_86_2","first-page":"1","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Gruslys Audr\u016bnas","year":"2018","unstructured":"Audr\u016bnas Gruslys, Will Dabney, Mohammad Gheshlaghi Azar, Bilal Piot, Marc G. Bellemare, and R\u00e9mi Munos. 2018. The reactor: A fast and sample-efficient actor-critic agent for reinforcement learning. In Proceedings of the International Conference on Learning Representations. 1\u201318."},{"key":"e_1_3_2_87_2","first-page":"1889","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Schulman John","year":"2015","unstructured":"John Schulman, Sergey Levine, Pieter Abbeel, Michael Jordan, and Philipp Moritz. 2015. Trust region policy optimization. In Proceedings of the International Conference on Machine Learning. PMLR, 1889\u20131897."},{"key":"e_1_3_2_88_2","unstructured":"Timothy P. Lillicrap Jonathan J. Hunt Alexander Pritzel Nicolas Heess Tom Erez Yuval Tassa David Silver and Daan Wierstra. 2015. 
Continuous control with deep reinforcement learning. arXiv:1509.02971. Retrieved from https:\/\/arxiv.org\/abs\/1509.02971"},{"key":"e_1_3_2_89_2","unstructured":"John Schulman Filip Wolski Prafulla Dhariwal Alec Radford and Oleg Klimov. 2017. Proximal policy optimization algorithms. arXiv:1707.06347. Retrieved from https:\/\/arxiv.org\/abs\/1707.06347"},{"key":"e_1_3_2_90_2","unstructured":"Tuomas Haarnoja Aurick Zhou Kristian Hartikainen George Tucker Sehoon Ha Jie Tan Vikash Kumar Henry Zhu Abhishek Gupta Pieter Abbeel and Sergey Levine. 2018. Soft actor-critic algorithms and applications. arXiv:1812.05905. Retrieved from https:\/\/arxiv.org\/abs\/1812.05905"},{"key":"e_1_3_2_91_2","first-page":"2660","volume-title":"Proceedings of the NeurIPS","author":"Tian Yuandong","year":"2017","unstructured":"Yuandong Tian, Qucheng Gong, Wenling Shang, Yuxin Wu, and C. Lawrence Zitnick. 2017. ELF: An extensive, lightweight and flexible research platform for real-time strategy games. In Proceedings of the NeurIPS. 2660\u20132670."},{"key":"e_1_3_2_92_2","unstructured":"C. Daniel Freeman Erik Frey Anton Raichuk Sertan Girgin Igor Mordatch and Olivier Bachem. 2021. Brax\u2013a differentiable physics engine for large scale rigid body simulation. arXiv:2106.13281. Retrieved from https:\/\/arxiv.org\/abs\/2106.13281"},{"key":"e_1_3_2_93_2","doi-asserted-by":"publisher","DOI":"10.1145\/2829988.2787510"},{"key":"e_1_3_2_94_2","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2016.2532860"},{"key":"e_1_3_2_95_2","volume-title":"Proceedings of the NeurIPS","author":"Dalton Steven","year":"2020","unstructured":"Steven Dalton and Iuri Frosio. 2020. Accelerating reinforcement learning through GPU Atari emulation. In Proceedings of the NeurIPS."},{"key":"e_1_3_2_96_2","doi-asserted-by":"publisher","DOI":"10.1145\/3326285.3329074"},{"key":"e_1_3_2_97_2","unstructured":"Jacky Liang Viktor Makoviychuk Ankur Handa Nuttapong Chentanez Miles Macklin and Dieter Fox. 2018. 
GPU-accelerated robotic simulation for distributed reinforcement learning. arXiv:1810.05762. Retrieved from https:\/\/arxiv.org\/abs\/1810.05762"},{"key":"e_1_3_2_98_2","unstructured":"Tim Salimans Jonathan Ho Xi Chen Szymon Sidor and Ilya Sutskever. 2017. Evolution strategies as a scalable alternative to reinforcement learning. arXiv:1703.03864. Retrieved from https:\/\/arxiv.org\/abs\/1703.03864"},{"key":"e_1_3_2_99_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Shacklett Brennan","year":"2021","unstructured":"Brennan Shacklett, Erik Wijmans, Aleksei Petrenko, Manolis Savva, Dhruv Batra, Vladlen Koltun, and Kayvon Fatahalian. 2021. Large batch simulation for deep reinforcement learning. In Proceedings of the International Conference on Learning Representations."},{"key":"e_1_3_2_100_2","doi-asserted-by":"publisher","DOI":"10.1145\/3450626.3459670"},{"key":"e_1_3_2_101_2","unstructured":"C. Daniel Freeman Erik Frey Anton Raichuk Sertan Girgin Igor Mordatch and Olivier Bachem. 2021. Brax \u2013 a differentiable physics engine for large scale rigid body simulation. arXiv:2106.13281. Retrieved from https:\/\/arxiv.org\/abs\/2106.13281"},{"key":"e_1_3_2_102_2","series-title":"Proceedings of Machine Learning Research","first-page":"91","volume-title":"Proceedings of the 5th Conference on Robot Learning.","volume":"164","author":"Rudin Nikita","year":"2022","unstructured":"Nikita Rudin, David Hoeller, Philipp Reist, and Marco Hutter. 2022. Learning to walk in minutes using massively parallel deep reinforcement learning. In Proceedings of the 5th Conference on Robot Learning.Aleksandra Faust, David Hsu, and Gerhard Neumann (Eds.), Proceedings of Machine Learning Research, Vol. 
164, PMLR, 91\u2013100."},{"key":"e_1_3_2_103_2","doi-asserted-by":"publisher","DOI":"10.1126\/scirobotics.abg5810"},{"key":"e_1_3_2_104_2","doi-asserted-by":"publisher","DOI":"10.1145\/1327452.1327492"},{"key":"e_1_3_2_105_2","first-page":"309","volume-title":"Proceedings of the European Workshop on Reinforcement Learning","author":"Li Yuxi","year":"2011","unstructured":"Yuxi Li and Dale Schuurmans. 2011. Mapreduce for parallel reinforcement learning. In Proceedings of the European Workshop on Reinforcement Learning. Springer, 309\u2013320."},{"key":"e_1_3_2_106_2","unstructured":"Eric Liang Richard Liaw Robert Nishihara Philipp Moritz Roy Fox Joseph Gonzalez Ken Goldberg and Ion Stoica. 2017. Ray rllib: A composable and scalable reinforcement learning library. arXiv:1712.09381. Retrieved from https:\/\/arxiv.org\/abs\/1712.09381"},{"key":"e_1_3_2_107_2","unstructured":"Horia Mania Aurelia Guy and Benjamin Recht. 2018. Simple random search provides a competitive approach to reinforcement learning. arXiv:1803.07055. Retrieved from https:\/\/arxiv.org\/abs\/1803.07055"},{"key":"e_1_3_2_108_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-92040-5_19"},{"key":"e_1_3_2_109_2","unstructured":"Nicolas Heess Dhruva T. B. Srinivasan Sriram Jay Lemmon Josh Merel Greg Wayne Yuval Tassa Tom Erez Ziyu Wang S. M. Ali Eslami Martin Riedmiller and David Silver. 2017. Emergence of locomotion behaviours in rich environments. arXiv:1707.02286. Retrieved from https:\/\/arxiv.org\/abs\/1707.02286"},{"key":"e_1_3_2_110_2","first-page":"1","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Barth-Maron Gabriel","unstructured":"Gabriel Barth-Maron, Matthew W. Hoffman, David Budden, Will Dabney, Dan Horgan, Dhruva Tb, Alistair Muldal, Nicolas Heess, and Timothy Lillicrap. 2018. Distributed distributional deterministic policy gradients. In Proceedings of the International Conference on Learning Representations. 
1\u201316."},{"key":"e_1_3_2_111_2","first-page":"1","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Babaeizadeh Mohammad","year":"2017","unstructured":"Mohammad Babaeizadeh, Iuri Frosio, Stephen Tyree, Jason Clemons, and Jan Kautz. 2017. Reinforcement learning through asynchronous advantage actor-critic on a GPU. In Proceedings of the International Conference on Learning Representations. 1\u201312."},{"key":"e_1_3_2_112_2","unstructured":"Alfredo V. Clemente Humberto N. Castej\u00f3n and Arjun Chandra. 2017. Efficient parallel methods for deep reinforcement learning. arXiv:1705.04862. Retrieved from https:\/\/arxiv.org\/abs\/1705.04862"},{"key":"e_1_3_2_113_2","first-page":"977","volume-title":"Proceedings of the 2023 USENIX Annual Technical Conference (USENIX ATC\u201923)","author":"Zhu Huanzhou","year":"2023","unstructured":"Huanzhou Zhu, Bo Zhao, Gang Chen, Weifeng Chen, Yijie Chen, Liang Shi, Yaodong Yang, Peter Pietzuch, and Lei Chen. 2023. MSRL: Distributed reinforcement learning with dataflow fragments. In Proceedings of the 2023 USENIX Annual Technical Conference (USENIX ATC\u201923). 977\u2013993."},{"key":"e_1_3_2_114_2","unstructured":"Christopher Berner Greg Brockman Brooke Chan Vicki Cheung Przemys\u0142aw D\u0119biak Christy Dennison David Farhi Quirin Fischer Shariq Hashme Chris Hesse et\u00a0al. 2019. Dota 2 with large scale deep reinforcement learning. arXiv:1912.06680. Retrieved from https:\/\/arxiv.org\/abs\/1912.06680"},{"key":"e_1_3_2_115_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Mei Yixuan","year":"2023","unstructured":"Yixuan Mei, Jiaxuan Gao, Weirui Ye, Shaohuai Liu, Yang Gao, and Yi Wu. 2023. Speedyzero: Mastering atari with limited data and time. 
In Proceedings of the International Conference on Learning Representations."},{"key":"e_1_3_2_116_2","doi-asserted-by":"publisher","DOI":"10.1145\/3039902.3039915"},{"key":"e_1_3_2_117_2","doi-asserted-by":"publisher","DOI":"10.23919\/FPL.2017.8056789"},{"key":"e_1_3_2_118_2","doi-asserted-by":"publisher","DOI":"10.1109\/ASAP.2019.00-24"},{"key":"e_1_3_2_119_2","doi-asserted-by":"publisher","DOI":"10.1145\/3297858.3304058"},{"key":"e_1_3_2_120_2","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM48280.2020.00012"},{"key":"e_1_3_2_121_2","doi-asserted-by":"publisher","unstructured":"Yuan Meng Chi Zhang and Viktor Prasanna. 2022. FPGA acceleration of deep reinforcement learning using on-chip replay management. In Proceedings of the 19th ACM International Conference on Computing Frontiers. (2022) 40\u201348. DOI:10.1145\/3528416.3530227","DOI":"10.1145\/3528416.3530227"},{"key":"e_1_3_2_122_2","unstructured":"Scott Reed Konrad Zolna Emilio Parisotto Sergio Gomez Colmenarejo Alexander Novikov Gabriel Barth-Maron Mai Gimenez Yury Sulsky Jackie Kay Jost Tobias Springenberg Tom Eccles Jake Bruce Ali Razavi Ashley Edwards Nicolas Heess Yutian Chen Raia Hadsell Oriol Vinyals Mahyar Bordbar and Nando de Freitas. 2022. A generalist agent. arXiv:2205.06175. Retrieved from https:\/\/arxiv.org\/abs\/2205.06175"},{"key":"e_1_3_2_123_2","unstructured":"Jiale Zhi Rui Wang Jeff Clune and Kenneth O. Stanley. 2020. Fiber: A platform for efficient development and distributed training for reinforcement learning and population-based methods. arXiv:2003.11164. 
Retrieved from https:\/\/arxiv.org\/abs\/2003.11164"},{"key":"e_1_3_2_124_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2018.2885950"},{"key":"e_1_3_2_125_2","doi-asserted-by":"publisher","DOI":"10.1145\/3373087.3375359"},{"key":"e_1_3_2_126_2","volume-title":"OSDI\u201918: Proceedings of the 13th USENIX conference on Operating Systems Design and Implementation","author":"Moritz Philipp","year":"2018","unstructured":"Philipp Moritz, Robert Nishihara, Stephanie Wang, Alexey Tumanov, Richard Liaw, Eric Liang, Melih Elibol, Zongheng Yang, William Paul, Michael I. Jordan, and Ion Stoica. 2018. Ray: A distributed framework for emerging AI applications. In OSDI\u201918: Proceedings of the 13th USENIX conference on Operating Systems Design and Implementation."},{"key":"e_1_3_2_127_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v38i15.29592"},{"key":"e_1_3_2_128_2","doi-asserted-by":"publisher","DOI":"10.1145\/3649153.3649193"},{"key":"e_1_3_2_129_2","unstructured":"Albert Bou and Gianni De Fabritiis. 2020. PyTorchRL: Modular and distributed reinforcement learning in PyTorch. arXiv:2007.02622. Retrieved from https:\/\/arxiv.org\/abs\/2007.02622"},{"key":"e_1_3_2_130_2","unstructured":"Xinghao Pan Jianmin Chen Rajat Monga Samy Bengio and Rafal Jozefowicz. 2017. Revisiting distributed synchronous SGD. arXiv:1702.05800. Retrieved from https:\/\/arxiv.org\/abs\/1702.05800"},{"key":"e_1_3_2_131_2","first-page":"4745","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Lian Xiangru","year":"2018","unstructured":"Xiangru Lian, Wei Zhang, Ce Zhang, and Ji Liu. 2018. Asynchronous decentralized parallel stochastic gradient descent. In Proceedings of the International Conference on Machine Learning. 
PMLR, 4745\u20134767."},{"key":"e_1_3_2_132_2","first-page":"1","volume-title":"Proceedings of the NeurIPS","author":"Ho Qirong","year":"2013","unstructured":"Qirong Ho, James Cipar, Henggang Cui, Jin Kyu Kim, Seunghak Lee, Phillip B. Gibbons, Garth A. Gibson, Gregory R. Ganger, and Eric P. Xing. 2013. More effective distributed ML via a stale synchronous parallel parameter server. In Proceedings of the NeurIPS. 1\u20139."},{"key":"e_1_3_2_133_2","first-page":"8045","volume-title":"Proceedings of the NeurIPS","author":"Li Youjie","year":"2018","unstructured":"Youjie Li, Mingchao Yu, Songze Li, Salman Avestimehr, Nam Sung Kim, and Alexander Schwing. 2018. PIPE-SGD: A decentralized pipelined SGD framework for distributed deep net training. In Proceedings of the NeurIPS. 8045\u20138056."},{"key":"e_1_3_2_134_2","unstructured":"Alfredo V. Clemente Humberto N. Castej\u00f3n and Arjun Chandra. 2017. Efficient parallel methods for deep reinforcement learning. arXiv:1705.04862. Retrieved from https:\/\/arxiv.org\/abs\/1705.04862"},{"key":"e_1_3_2_135_2","doi-asserted-by":"publisher","DOI":"10.1145\/2391229.2391236"},{"key":"e_1_3_2_136_2","first-page":"17070","article-title":"High-throughput synchronous deep rl","volume":"33","author":"Liu Iou-Jen","year":"2020","unstructured":"Iou-Jen Liu, Raymond Yeh, and Alexander Schwing. 2020. High-throughput synchronous deep rl. Advances in Neural Information Processing Systems 33 (2020), 17070\u201317080.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_137_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDCS.2019.00150"},{"key":"e_1_3_2_138_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCC.2021.3062398"},{"key":"e_1_3_2_139_2","first-page":"8545","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Schmitt Simon","year":"2020","unstructured":"Simon Schmitt, Matteo Hessel, and Karen Simonyan. 2020. Off-policy actor-critic with shared experience replay. 
In Proceedings of the International Conference on Machine Learning. PMLR, 8545\u20138554."},{"key":"e_1_3_2_140_2","unstructured":"Alexandre Borges and Arlindo Oliveira. 2021. Combining off and on-policy training in model-based reinforcement learning. arXiv:2102.12194. Retrieved from https:\/\/arxiv.org\/abs\/2102.12194"},{"key":"e_1_3_2_141_2","doi-asserted-by":"publisher","DOI":"10.5555\/3463952.3464104"},{"key":"e_1_3_2_142_2","unstructured":"Edoardo Conti Joel Lehman and Kenneth O. Stanley. 2018. Improving exploration in evolution strategies for deep reinforcement learning via a population of novelty-seeking agents. In Advances in Neural Information Processing Systems 31 (2018)."},{"key":"e_1_3_2_143_2","doi-asserted-by":"publisher","DOI":"10.1038\/s42256-018-0006-z"},{"key":"e_1_3_2_144_2","unstructured":"Shauharda Khadka and Kagan Tumer. 2018. Evolution-guided policy gradient in reinforcement learning. In Proceedings of the International Conference on Advances in Neural Information Processing Systems. 1188\u20131200."},{"key":"e_1_3_2_145_2","doi-asserted-by":"publisher","DOI":"10.5555\/2627435.2638566"},{"key":"e_1_3_2_146_2","doi-asserted-by":"publisher","DOI":"10.1145\/1569901.1570064"},{"key":"e_1_3_2_147_2","unstructured":"Manoj Kumar Mohammad Husain Naveen Upreti Deepti Gupta et\u00a0al. 2010. Genetic algorithm: Review and application. International Journal of Information Technology and Knowledge Management 2 2 (2010) 451\u2013454."},{"key":"e_1_3_2_148_2","doi-asserted-by":"publisher","DOI":"10.1162\/106365602320169811"},{"key":"e_1_3_2_149_2","doi-asserted-by":"publisher","DOI":"10.5555\/2955491.2955578"},{"key":"e_1_3_2_150_2","doi-asserted-by":"publisher","DOI":"10.1162\/artl.2009.15.2.15202"},{"key":"e_1_3_2_151_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Zoph Barret","year":"2017","unstructured":"Barret Zoph and Quoc V. Le. 2017. Neural architecture search with reinforcement learning. 
In Proceedings of the International Conference on Learning Representations."},{"key":"e_1_3_2_152_2","first-page":"2902","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Real Esteban","year":"2017","unstructured":"Esteban Real, Sherry Moore, Andrew Selle, Saurabh Saxena, Yutaka Leon Suematsu, Jie Tan, Quoc V. Le, and Alexey Kurakin. 2017. Large-scale evolution of image classifiers. In Proceedings of the International Conference on Machine Learning. PMLR, 2902\u20132911."},{"key":"e_1_3_2_153_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33014780"},{"key":"e_1_3_2_154_2","doi-asserted-by":"publisher","DOI":"10.1109\/TEVC.2021.3088631"},{"key":"e_1_3_2_155_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2017.12.049"},{"key":"e_1_3_2_156_2","doi-asserted-by":"publisher","DOI":"10.1109\/TEVC.2021.3060833"},{"key":"e_1_3_2_157_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Gangwani Tanmay","year":"2018","unstructured":"Tanmay Gangwani and Jian Peng. 2018. Policy optimization by genetic distillation. In Proceedings of the International Conference on Learning Representations."},{"key":"e_1_3_2_158_2","unstructured":"Prafulla Dhariwal Christopher Hesse Oleg Klimov Alex Nichol Matthias Plappert Alec Radford John Schulman Szymon Sidor Yuhuai Wu and Peter Zhokhov. 2017. OpenAI Baselines. https:\/\/github.com\/openai\/baselines. (2017)."},{"key":"e_1_3_2_159_2","doi-asserted-by":"publisher","DOI":"10.1038\/nature14422"},{"key":"e_1_3_2_160_2","doi-asserted-by":"publisher","DOI":"10.3389\/frobt.2016.00040"},{"key":"e_1_3_2_161_2","volume-title":"Proceedings of the NeurIPS","author":"Conti Edoardo","year":"2018","unstructured":"Edoardo Conti, Vashisht Madhavan, Felipe Petroski Such, Joel Lehman, Kenneth Stanley, and Jeff Clune. 2018. Improving exploration in evolution strategies for deep reinforcement learning via a population of novelty-seeking agents. 
In Proceedings of the NeurIPS."},{"key":"e_1_3_2_162_2","doi-asserted-by":"publisher","DOI":"10.1145\/3319619.3321956"},{"key":"e_1_3_2_163_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Plappert Matthias","year":"2018","unstructured":"Matthias Plappert, Rein Houthooft, Prafulla Dhariwal, Szymon Sidor, Richard Y. Chen, Xi Chen, Tamim Asfour, Pieter Abbeel, and Marcin Andrychowicz. 2018. Parameter space noise for exploration. In Proceedings of the International Conference on Learning Representations."},{"key":"e_1_3_2_164_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Fortunato Meire","year":"2018","unstructured":"Meire Fortunato, Mohammad Gheshlaghi Azar, Bilal Piot, Jacob Menick, Ian Osband, Alex Graves, Vlad Mnih, Remi Munos, Demis Hassabis, Olivier Pietquin, et\u00a0al. 2018. Noisy networks for exploration. In Proceedings of the International Conference on Learning Representations."},{"key":"e_1_3_2_165_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.ins.2020.11.009"},{"key":"e_1_3_2_166_2","doi-asserted-by":"publisher","DOI":"10.1109\/TEVC.2021.3088631"},{"key":"e_1_3_2_167_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-44094-7_3"},{"key":"e_1_3_2_168_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.swevo.2019.05.010"},{"key":"e_1_3_2_169_2","doi-asserted-by":"publisher","DOI":"10.1145\/3520304.3528919"},{"key":"e_1_3_2_170_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10462-019-09719-2"},{"key":"e_1_3_2_171_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Liu Hanxiao","year":"2018","unstructured":"Hanxiao Liu, Karen Simonyan, and Yiming Yang. 2018. DARTS: Differentiable architecture search. 
In Proceedings of the International Conference on Learning Representations."},{"key":"e_1_3_2_172_2","first-page":"367","volume-title":"Proceedings of the Uncertainty in Artificial Intelligence","author":"Li Liam","year":"2020","unstructured":"Liam Li and Ameet Talwalkar. 2020. Random search and reproducibility for neural architecture search. In Proceedings of the Uncertainty in Artificial Intelligence. PMLR, 367\u2013377."},{"key":"e_1_3_2_173_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00783"},{"key":"e_1_3_2_174_2","volume-title":"Proceedings of the NeurIPS","author":"Kandasamy Kirthevasan","year":"2018","unstructured":"Kirthevasan Kandasamy, Willie Neiswanger, Jeff Schneider, Barnabas Poczos, and Eric P. Xing. 2018. Neural architecture search with bayesian optimisation and optimal transport. In Proceedings of the NeurIPS."},{"key":"e_1_3_2_175_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01246-5_2"},{"key":"e_1_3_2_176_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Simonyan Karen","year":"2015","unstructured":"Karen Simonyan and Andrew Zisserman. 2015. Very deep convolutional networks for large-scale image recognition. In Proceedings of the International Conference on Learning Representations."},{"key":"e_1_3_2_177_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2020.2988928"},{"key":"e_1_3_2_178_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2019.2929059"},{"key":"e_1_3_2_179_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2020.2969483"},{"key":"e_1_3_2_180_2","unstructured":"Yonghui Wu Mike Schuster Zhifeng Chen Quoc V. Le Mohammad Norouzi Wolfgang Macherey Maxim Krikun Yuan Cao Qin Gao Klaus Macherey et\u00a0al. 2016. Google\u2019s neural machine translation system: Bridging the gap between human and machine translation. arXiv:1609.08144. 
Retrieved from https:\/\/arxiv.org\/abs\/1609.08144"},{"key":"e_1_3_2_181_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2019.2957276"},{"key":"e_1_3_2_182_2","doi-asserted-by":"publisher","DOI":"10.1145\/3205455.3205489"},{"key":"e_1_3_2_183_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2021.02.050"},{"key":"e_1_3_2_184_2","doi-asserted-by":"publisher","DOI":"10.1145\/3425500"},{"key":"e_1_3_2_185_2","unstructured":"Sergio Guadarrama Anoop Korattikara Oscar Ramirez Pablo Castro Ethan Holly Sam Fishman Ke Wang Ekaterina Gonina Neal Wu Efi Kokiopoulou Luciano Sbaiz Jamie Smith G\u00e1bor Bart\u00f3k Jesse Berent Chris Harris Vincent Vanhoucke and Eugene Brevdo. 2018. TF-Agents: A Library for Reinforcement Learning in TensorFlow. Retrieved from https:\/\/github.com\/tensorflow\/agents [Online; accessed 25-June-2019]."},{"key":"e_1_3_2_186_2","unstructured":"Jason Gauci Edoardo Conti Yitao Liang Kittipat Virochsiri Yuchen He Zachary Kaden Vivek Narayanan Xiaohui Ye Zhengxing Chen and Scott Fujimoto. 2019. Horizon: Facebook\u2019s open source applied reinforcement learning platform. arXiv:1811.00260v5. Retrieved from https:\/\/arxiv.org\/abs\/1811.00260v5"},{"key":"e_1_3_2_187_2","unstructured":"Baidu. 2022. PARL: A flexible distributed and object-oriented programming reinforcement learning framework. Retrieved from https:\/\/github.com\/PaddlePaddle\/PARL"},{"key":"e_1_3_2_188_2","unstructured":"Matt Hoffman Bobak Shahriari John Aslanides Gabriel Barth-Maron Feryal Behbahani Tamara Norman Abbas Abdolmaleki Albin Cassirer Fan Yang Kate Baumli Sarah Henderson Alex Novikov Sergio G\u00f3mez Colmenarejo Serkan Cabi Caglar Gulcehre Tom Le Paine Andrew Cowie Ziyu Wang Bilal Piot and Nando de Freitas. 2020. Acme: A research framework for distributed reinforcement learning. arXiv:2006.00979. 
Retrieved from https:\/\/arxiv.org\/abs\/2006.00979"},{"issue":"77","key":"e_1_3_2_189_2","first-page":"1","article-title":"ChainerRL: A deep reinforcement learning library","volume":"22","author":"Fujita Yasuhiro","year":"2021","unstructured":"Yasuhiro Fujita, Prabhat Nagarajan, Toshiki Kataoka, and Takahiro Ishikawa. 2021. ChainerRL: A deep reinforcement learning library. Journal of Machine Learning Research 22, 77 (2021), 1\u201314. Retrieved from http:\/\/jmlr.org\/papers\/v22\/20-376.html","journal-title":"Journal of Machine Learning Research"},{"issue":"267","key":"e_1_3_2_190_2","first-page":"1","article-title":"Tianshou: A highly modularized deep reinforcement learning library","volume":"23","author":"Weng Jiayi","year":"2022","unstructured":"Jiayi Weng, Huayu Chen, Dong Yan, Kaichao You, Alexis Duburcq, Minghao Zhang, Yi Su, Hang Su, and Jun Zhu. 2022. Tianshou: A highly modularized deep reinforcement learning library. Journal of Machine Learning Research 23, 267 (2022), 1\u20136. Retrieved from http:\/\/jmlr.org\/papers\/v23\/21-1127.html","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_3_2_191_2","unstructured":"Albert Bou Matteo Bettini Sebastian Dittert Vikash Kumar Shagun Sodhani Xiaomeng Yang Gianni De Fabritiis and Vincent Moens. 2023. TorchRL: A Data-driven Decision-making Library for PyTorch. arXiv:2306.00577. Retrieved from https:\/\/arxiv.org\/abs\/2306.00577"},{"key":"e_1_3_2_192_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41565-020-0655-z"},{"key":"e_1_3_2_193_2","doi-asserted-by":"crossref","unstructured":"Tianqi Wang Tong Geng Ang Li Xi Jin and Martin Herbordt. 2020. FPDeep: Scalable acceleration of CNN training on deeply-pipelined FPGA clusters. 
IEEE Transactions on Computers 69 8 (2020) 1143\u20131158.","DOI":"10.1109\/TC.2020.3000118"},{"key":"e_1_3_2_194_2","doi-asserted-by":"publisher","DOI":"10.1038\/s42256-022-00480-w"},{"issue":"11","key":"e_1_3_2_195_2","first-page":"13985","article-title":"Asynchronous curriculum experience replay: A deep reinforcement learning approach for UAV autonomous motion control in unknown dynamic environments","volume":"72","author":"Hu Zijian","year":"2023","unstructured":"Zijian Hu, Xiaoguang Gao, Kaifang Wan, Qianglong Wang, and Yiwei Zhai. 2023. Asynchronous curriculum experience replay: A deep reinforcement learning approach for UAV autonomous motion control in unknown dynamic environments. IEEE Transactions on Vehicular Technology 72, 11 (2023), 13985\u201314001.","journal-title":"IEEE Transactions on Vehicular Technology"},{"key":"e_1_3_2_196_2","first-page":"17604","article-title":"Regret minimization experience replay in off-policy reinforcement learning","volume":"34","author":"Liu Xu-Hui","year":"2021","unstructured":"Xu-Hui Liu, Zhenghai Xue, Jingcheng Pang, Shengyi Jiang, Feng Xu, and Yang Yu. 2021. Regret minimization experience replay in off-policy reinforcement learning. Advances in Neural Information Processing Systems 34 (2021), 17604\u201317615.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_197_2","volume-title":"Proceedings of the 7th Annual Conference on Robot Learning","author":"Schwarke Clemens","year":"2023","unstructured":"Clemens Schwarke, Victor Klemm, Matthijs van der Boon, Marko Bjelonic, and Marco Hutter. 2023. Curiosity-driven learning for joint locomotion and manipulation tasks. In Proceedings of the 7th Annual Conference on Robot Learning."},{"key":"e_1_3_2_198_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Wu Shuang","year":"2022","unstructured":"Shuang Wu, Jian Yao, Haobo Fu, Ye Tian, Chao Qian, Yaodong Yang, Qiang Fu, and Yang Wei. 2022. 
Quality-similar diversity via population based reinforcement learning. In Proceedings of the International Conference on Learning Representations."},{"key":"e_1_3_2_199_2","unstructured":"Yuntao Bai Andy Jones Kamal Ndousse Amanda Askell Anna Chen Nova DasSarma Dawn Drain Stanislav Fort Deep Ganguli Tom Henighan et\u00a0al. 2022. Training a helpful and harmless assistant with reinforcement learning from human feedback. arXiv:2204.05862. Retrieved from https:\/\/arxiv.org\/abs\/2204.05862"},{"key":"e_1_3_2_200_2","unstructured":"Kun Chu Xufeng Zhao Cornelius Weber Mengdi Li and Stefan Wermter. 2023. Accelerating reinforcement learning of robotic manipulations via feedback from large language models. arXiv:2311.02379. Retrieved from https:\/\/arxiv.org\/abs\/2311.02379"},{"key":"e_1_3_2_201_2","doi-asserted-by":"crossref","unstructured":"Yuji Cao Huan Zhao Yuheng Cheng Ting Shu Guolong Liu Gaoqi Liang Junhua Zhao and Yun Li. 2024. Survey on large language model-enhanced reinforcement learning: Concept taxonomy and methods. arXiv:2404.00282. 
Retrieved from https:\/\/arxiv.org\/abs\/2404.00282","DOI":"10.1109\/TNNLS.2024.3497992"}],"container-title":["ACM Computing Surveys"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3703453","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3703453","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:18:08Z","timestamp":1750295888000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3703453"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,12,10]]},"references-count":200,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2025,4,30]]}},"alternative-id":["10.1145\/3703453"],"URL":"https:\/\/doi.org\/10.1145\/3703453","relation":{},"ISSN":["0360-0300","1557-7341"],"issn-type":[{"value":"0360-0300","type":"print"},{"value":"1557-7341","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,12,10]]},"assertion":[{"value":"2023-11-11","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-10-28","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-12-10","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}