{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,27]],"date-time":"2026-03-27T17:05:30Z","timestamp":1774631130334,"version":"3.50.1"},"reference-count":78,"publisher":"Association for Computing Machinery (ACM)","issue":"1","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2024,9]]},"abstract":"<jats:p>Deep reinforcement learning (DRL) has demonstrated significant potential in various applications, including gaming AI, robotics, and system scheduling. DRL algorithms produce, sample, and learn from training data online through a trial-and-error process, demanding considerable time and computational resources. To address this, distributed DRL algorithms and paradigms have been developed to expedite training using extensive resources. Through carefully designed experiments, we are the first to observe that strategically increasing the actor-environment interactions by spawning more concurrent actors at certain training rounds within ephemeral time frames can significantly enhance training efficiency. Yet, current distributed DRL solutions, which are predominantly server-based (or serverful), fail to capitalize on these opportunities due to their long startup times, limited adaptability, and cumbersome scalability.<\/jats:p>\n          <jats:p>\n            This paper proposes\n            <jats:italic>Nitro<\/jats:italic>\n            , a generic training engine for distributed DRL algorithms that enforces timely and effective boosting with concurrent actors instantaneously spawned by serverless computing. 
With serverless functions,\n            <jats:italic>Nitro<\/jats:italic>\n            adjusts data sampling strategies dynamically according to the DRL training demands.\n            <jats:italic>Nitro<\/jats:italic>\n            seizes the opportunity of real-time boosting by accurately and swiftly detecting an empirical metric. To achieve cost efficiency, we design a heuristic actor scaling algorithm to guide\n            <jats:italic>Nitro<\/jats:italic>\n            for cost-aware boosting budget allocation. We integrate\n            <jats:italic>Nitro<\/jats:italic>\n            with state-of-the-art DRL algorithms and frameworks and evaluate them on AWS EC2 and Lambda. Experiments with MuJoCo and Atari benchmarks show that\n            <jats:italic>Nitro<\/jats:italic>\n            improves the final rewards (\n            <jats:italic>i.e.<\/jats:italic>\n            , training quality) by up to 6\u00d7 and reduces training costs by up to 42%.\n          <\/jats:p>","DOI":"10.14778\/3696435.3696441","type":"journal-article","created":{"date-parts":[[2025,2,11]],"date-time":"2025-02-11T20:41:46Z","timestamp":1739306506000},"page":"66-79","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["Nitro: Boosting Distributed Reinforcement Learning with Serverless Computing"],"prefix":"10.14778","volume":"18","author":[{"given":"Hanfei","family":"Yu","sequence":"first","affiliation":[{"name":"Stevens Institute of Technology"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jacob","family":"Carter","sequence":"additional","affiliation":[{"name":"Louisiana State University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hao","family":"Wang","sequence":"additional","affiliation":[{"name":"Stevens Institute of Technology"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Devesh","family":"Tiwari","sequence":"additional","affiliation":[{"name":"Northeastern 
University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jian","family":"Li","sequence":"additional","affiliation":[{"name":"Stony Brook University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Seung-Jong","family":"Park","sequence":"additional","affiliation":[{"name":"Missouri University of Science and Technology"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,2,11]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"Joshua Achiam. 2018. Spinning Up in Deep Reinforcement Learning. https:\/\/spinningup.openai.com."},{"key":"e_1_2_1_2_1","volume-title":"Constrained Policy Optimization. In International Conference on Machine Learning (ICML).","author":"Achiam Joshua","year":"2017","unstructured":"Joshua Achiam, David Held, Aviv Tamar, and Pieter Abbeel. 2017. Constrained Policy Optimization. In International Conference on Machine Learning (ICML)."},{"key":"e_1_2_1_3_1","volume-title":"Natural Gradient Works Efficiently in Learning. Neural Computation","author":"Amari Shun-Ichi","year":"1998","unstructured":"Shun-Ichi Amari. 1998. Natural Gradient Works Efficiently in Learning. Neural Computation (1998)."},{"key":"e_1_2_1_4_1","unstructured":"AWS. 2006. AWS EC2: Secure and Resizable Compute Capacity in the Cloud. https:\/\/aws.amazon.com\/ec2\/."},{"key":"e_1_2_1_5_1","unstructured":"AWS. 2014. AWS Lambda: Serverless Compute. https:\/\/aws.amazon.com\/lambda\/."},{"key":"e_1_2_1_6_1","unstructured":"AWS. 2015. Amazon Elastic Container Registry. https:\/\/aws.amazon.com\/ecr\/."},{"key":"e_1_2_1_7_1","unstructured":"AWS. 2015. AWS SDK for Python (Boto3). https:\/\/aws.amazon.com\/sdk-for-python\/."},{"key":"e_1_2_1_8_1","unstructured":"AWS. 2018. AWS Lambda: Serverless Compute. https:\/\/aws.amazon.com\/lambda\/."},{"key":"e_1_2_1_9_1","unstructured":"AWS. 2019. AWS Lambda: Configuring Provisioned Concurrency. 
https:\/\/docs.aws.amazon.com\/lambda\/latest\/dg\/provisioned-concurrency.html."},{"key":"e_1_2_1_10_1","unstructured":"Christopher Berner Greg Brockman Brooke Chan Vicki Cheung Przemys\u0142aw D\u0119biak Christy Dennison David Farhi Quirin Fischer Shariq Hashme Chris Hesse et al. 2019. Dota 2 with Large Scale Deep Reinforcement Learning. arXiv preprint arXiv:1912.06680 (2019)."},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/3357223.3362711"},{"key":"e_1_2_1_12_1","volume-title":"Auto: Scaling Deep Reinforcement Learning for Datacenter-scale Automatic Traffic Optimization. In 2018 conference of the ACM special interest group on data communication (SIGCOMM).","author":"Chen Li","year":"2018","unstructured":"Li Chen, Justinas Lingys, Kai Chen, and Feng Liu. 2018. Auto: Scaling Deep Reinforcement Learning for Datacenter-scale Automatic Traffic Optimization. In 2018 conference of the ACM special interest group on data communication (SIGCOMM)."},{"key":"e_1_2_1_13_1","volume-title":"Transferable Active Grasping and Real Embodied Dataset. In 2020 IEEE International Conference on Robotics and Automation (ICRA).","author":"Chen Xiangyu","year":"2020","unstructured":"Xiangyu Chen, Zelin Ye, Jiankai Sun, Yuda Fan, Fang Hu, Chenxi Wang, and Cewu Lu. 2020. Transferable Active Grasping and Real Embodied Dataset. In 2020 IEEE International Conference on Robotics and Automation (ICRA)."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.5555\/2988128"},{"key":"e_1_2_1_15_1","volume-title":"Sharp Minima Can Generalize for Deep Nets. In International Conference on Machine Learning (ICML).","author":"Dinh Laurent","year":"2017","unstructured":"Laurent Dinh, Razvan Pascanu, Samy Bengio, and Yoshua Bengio. 2017. Sharp Minima Can Generalize for Deep Nets. In International Conference on Machine Learning (ICML)."},{"key":"e_1_2_1_16_1","volume-title":"Seed RL: Scalable and Efficient Deep-RL with Accelerated Central Inference. 
In International Conference on Learning Representations (ICLR).","author":"Espeholt Lasse","year":"2020","unstructured":"Lasse Espeholt, Rapha\u00ebl Marinier, Piotr Stanczyk, Ke Wang, and Marcin Michalski. 2020. Seed RL: Scalable and Efficient Deep-RL with Accelerated Central Inference. In International Conference on Learning Representations (ICLR)."},{"key":"e_1_2_1_17_1","volume-title":"IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures. In International Conference on Machine Learning (ICML).","author":"Espeholt Lasse","year":"2018","unstructured":"Lasse Espeholt, Hubert Soyer, Remi Munos, Karen Simonyan, Vlad Mnih, Tom Ward, Yotam Doron, Vlad Firoiu, Tim Harley, Iain Dunning, et al. 2018. IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures. In International Conference on Machine Learning (ICML)."},{"key":"e_1_2_1_18_1","volume-title":"Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning. Advances in Neural Information Processing Systems (NIPS)","author":"Gu Shixiang Shane","year":"2017","unstructured":"Shixiang Shane Gu, Timothy Lillicrap, Richard E Turner, Zoubin Ghahramani, Bernhard Sch\u00f6lkopf, and Sergey Levine. 2017. Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning. Advances in Neural Information Processing Systems (NIPS) (2017)."},{"key":"e_1_2_1_19_1","volume-title":"Proceedings of Machine Learning and Systems (MLSys)","author":"Guo Runsheng","year":"2022","unstructured":"Runsheng Guo, Victor Guo, Antonio Kim, Josh Hildred, and Khuzaima Daudjee. 2022. Hydrozoa: Dynamic Hybrid-Parallel DNN Training on Serverless Containers. Proceedings of Machine Learning and Systems (MLSys) (2022)."},{"key":"e_1_2_1_20_1","volume-title":"Proc. 
the IEEE Conference on Computer Communications (INFOCOM).","author":"Hao Wang","year":"2019","unstructured":"Wang Hao, Niu Di, and Li Baochun. 2019. Distributed Machine Learning with a Serverless Architecture. In Proc. the IEEE Conference on Computer Communications (INFOCOM)."},{"key":"e_1_2_1_21_1","volume-title":"Rainbow: Combining Improvements in Deep Reinforcement Learning. In Thirty-second AAAI conference on artificial intelligence (AAAI).","author":"Hessel Matteo","year":"2018","unstructured":"Matteo Hessel, Joseph Modayil, Hado Van Hasselt, Tom Schaul, Georg Ostrovski, Will Dabney, Dan Horgan, Bilal Piot, Mohammad Azar, and David Silver. 2018. Rainbow: Combining Improvements in Deep Reinforcement Learning. In Thirty-second AAAI conference on artificial intelligence (AAAI)."},{"key":"e_1_2_1_22_1","volume-title":"Acme: A Research Framework for Distributed Reinforcement Learning. arXiv preprint arXiv:2006.00979","author":"Hoffman Matthew W","year":"2020","unstructured":"Matthew W Hoffman, Bobak Shahriari, John Aslanides, Gabriel Barth-Maron, Nikola Momchev, Danila Sinopalnikov, Piotr Sta\u0144czyk, Sabela Ramos, Anton Raichuk, Damien Vincent, et al. 2020. Acme: A Research Framework for Distributed Reinforcement Learning. arXiv preprint arXiv:2006.00979 (2020)."},{"key":"e_1_2_1_23_1","volume-title":"Distributed Prioritized Experience Replay. In International Conference on Learning Representations (ICLR).","author":"Horgan Dan","year":"2018","unstructured":"Dan Horgan, John Quan, David Budden, Gabriel Barth-Maron, Matteo Hessel, Hado Van Hasselt, and David Silver. 2018. Distributed Prioritized Experience Replay. In International Conference on Learning Representations (ICLR)."},{"key":"e_1_2_1_24_1","volume-title":"Momentum-Based Policy Gradient Methods. In International Conference on Machine Learning (ICML).","author":"Huang Feihu","year":"2020","unstructured":"Feihu Huang, Shangqian Gao, Jian Pei, and Heng Huang. 2020. Momentum-Based Policy Gradient Methods. 
In International Conference on Machine Learning (ICML)."},{"key":"e_1_2_1_25_1","volume-title":"Chang Ye, Jeff Braga, Dipam Chakraborty, Kinal Mehta, and Jo\u00e3o GM Ara\u00fajo.","author":"Huang Shengyi","year":"2022","unstructured":"Shengyi Huang, Rousslan Fernand Julien Dossa, Chang Ye, Jeff Braga, Dipam Chakraborty, Kinal Mehta, and Jo\u00e3o GM Ara\u00fajo. 2022. CleanRL: High-quality Single-file Implementations of Deep Reinforcement Learning Algorithms. The Journal of Machine Learning Research (2022)."},{"key":"e_1_2_1_26_1","volume-title":"Huawei Technologies Co","year":"2022","unstructured":"Ltd. Huawei Technologies Co. 2022. Huawei MindSpore AI Development Framework. In Artificial Intelligence Technology."},{"key":"e_1_2_1_27_1","volume-title":"A Systematic Evaluation of Machine Learning on Serverless Infrastructure. The VLDB Journal","author":"Jiang Jiawei","year":"2023","unstructured":"Jiawei Jiang, Shaoduo Gan, Bo Du, Gustavo Alonso, Ana Klimovic, Ankit Singla, Wentao Wu, Sheng Wang, and Ce Zhang. 2023. A Systematic Evaluation of Machine Learning on Serverless Infrastructure. The VLDB Journal (2023)."},{"key":"e_1_2_1_28_1","volume-title":"Towards Demystifying Serverless Machine Learning Training. In 2021 International Conference on Management of Data (SIGMOD).","author":"Jiang Jiawei","year":"2021","unstructured":"Jiawei Jiang, Shaoduo Gan, Yue Liu, Fanlin Wang, Gustavo Alonso, Ana Klimovic, Ankit Singla, Wentao Wu, and Ce Zhang. 2021. Towards Demystifying Serverless Machine Learning Training. In 2021 International Conference on Management of Data (SIGMOD)."},{"key":"e_1_2_1_29_1","volume-title":"International Conference on Learning Representations (ICLR).","author":"Kakade Sham","year":"2020","unstructured":"Sham Kakade and John Langford. 2020. A Closer Look at Deep Policy Gradients. In International Conference on Learning Representations (ICLR)."},{"key":"e_1_2_1_30_1","volume-title":"A Natural Policy Gradient. 
Advances in Neural Information Processing Systems (NIPS)","author":"Kakade Sham M","year":"2001","unstructured":"Sham M Kakade. 2001. A Natural Policy Gradient. Advances in Neural Information Processing Systems (NIPS) (2001)."},{"key":"e_1_2_1_31_1","volume-title":"Recurrent Experience Replay in Distributed Reinforcement Learning. In International Conference on Learning Representations (ICLR).","author":"Kapturowski Steven","year":"2018","unstructured":"Steven Kapturowski, Georg Ostrovski, John Quan, Remi Munos, and Will Dabney. 2018. Recurrent Experience Replay in Distributed Reinforcement Learning. In International Conference on Learning Representations (ICLR)."},{"key":"e_1_2_1_32_1","volume-title":"On Large-batch Training for Deep Learning: Generalization Gap and Sharp Minima. In International Conference on Learning Representations (ICLR).","author":"Keskar Nitish Shirish","year":"2017","unstructured":"Nitish Shirish Keskar, Dheevatsa Mudigere, Jorge Nocedal, Mikhail Smelyanskiy, and Ping Tak Peter Tang. 2017. On Large-batch Training for Deep Learning: Generalization Gap and Sharp Minima. In International Conference on Learning Representations (ICLR)."},{"key":"e_1_2_1_33_1","volume-title":"Adam: A Method for Stochastic Optimization. arXiv preprint arXiv:1412.6980","author":"Kingma Diederik P","year":"2014","unstructured":"Diederik P Kingma and Jimmy Ba. 2014. Adam: A Method for Stochastic Optimization. arXiv preprint arXiv:1412.6980 (2014)."},{"key":"e_1_2_1_34_1","volume-title":"ActorQ: Quantization for Actor-learner Distributed Reinforcement Learning. In Hardware Aware Efficient Training Workshop at ICLR.","author":"Lam Maximilian","year":"2021","unstructured":"Maximilian Lam, Sharad Chitlangia, Srivatsan Krishnan, Zishen Wan, Gabriel Barth-Maron, Aleksandra Faust, and Vijay Janapa Reddi. 2021. ActorQ: Quantization for Actor-learner Distributed Reinforcement Learning. 
In Hardware Aware Efficient Training Workshop at ICLR."},{"key":"e_1_2_1_35_1","volume-title":"Playing FPS Games with Deep Reinforcement Learning. In Thirty-First AAAI Conference on Artificial Intelligence (AAAI).","author":"Lample Guillaume","year":"2017","unstructured":"Guillaume Lample and Devendra Singh Chaplot. 2017. Playing FPS Games with Deep Reinforcement Learning. In Thirty-First AAAI Conference on Artificial Intelligence (AAAI)."},{"key":"e_1_2_1_36_1","volume-title":"Visualizing the Loss Landscape of Neural Nets. Advances in neural information processing systems (NIPS)","author":"Li Hao","year":"2018","unstructured":"Hao Li, Zheng Xu, Gavin Taylor, Christoph Studer, and Tom Goldstein. 2018. Visualizing the Loss Landscape of Neural Nets. Advances in neural information processing systems (NIPS) (2018)."},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/3472883.3486971"},{"key":"e_1_2_1_38_1","volume-title":"International Conference on Machine Learning (ICML).","author":"Li Zechu","year":"2023","unstructured":"Zechu Li, Tao Chen, Zhang-Wei Hong, Anurag Ajay, and Pulkit Agrawal. 2023. Parallel Q-Learning: Scaling Off-policy Reinforcement Learning under Massively Parallel Simulation. In International Conference on Machine Learning (ICML)."},{"key":"e_1_2_1_39_1","volume-title":"RLlib: Abstractions for Distributed Reinforcement Learning. In International Conference on Machine Learning (ICML).","author":"Liang Eric","year":"2018","unstructured":"Eric Liang, Richard Liaw, Robert Nishihara, Philipp Moritz, Roy Fox, Ken Goldberg, Joseph Gonzalez, Michael Jordan, and Ion Stoica. 2018. RLlib: Abstractions for Distributed Reinforcement Learning. In International Conference on Machine Learning (ICML)."},{"key":"e_1_2_1_40_1","volume-title":"PTR-PPO: Proximal Policy Optimization with Prioritized Trajectory Replay. 
arXiv preprint arXiv:2112.03798","author":"Liang Xingxing","year":"2021","unstructured":"Xingxing Liang, Yang Ma, Yanghe Feng, and Zhong Liu. 2021. PTR-PPO: Proximal Policy Optimization with Prioritized Trajectory Replay. arXiv preprint arXiv:2112.03798 (2021)."},{"key":"e_1_2_1_41_1","volume-title":"Continuous Control with Deep Reinforcement Learning. arXiv preprint arXiv:1509.02971","author":"Lillicrap Timothy P","year":"2015","unstructured":"Timothy P Lillicrap, Jonathan J Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. 2015. Continuous Control with Deep Reinforcement Learning. arXiv preprint arXiv:1509.02971 (2015)."},{"key":"e_1_2_1_42_1","volume-title":"Constrained Variational Policy Optimization for Safe Reinforcement Learning. In International Conference on Machine Learning (ICML).","author":"Liu Zuxin","year":"2022","unstructured":"Zuxin Liu, Zhepeng Cen, Vladislav Isenbaev, Wei Liu, Steven Wu, Bo Li, and Ding Zhao. 2022. Constrained Variational Policy Optimization for Safe Reinforcement Learning. In International Conference on Machine Learning (ICML)."},{"key":"e_1_2_1_43_1","volume-title":"IMPACT: Importance Weighted Asynchronous Architectures with Clipped Target Networks. In International Conference on Learning Representations (ICLR).","author":"Luo Michael","year":"2020","unstructured":"Michael Luo, Jiahao Yao, Richard Liaw, Eric Liang, and Ion Stoica. 2020. IMPACT: Importance Weighted Asynchronous Architectures with Clipped Target Networks. In International Conference on Learning Representations (ICLR)."},{"key":"e_1_2_1_44_1","volume-title":"KungFu: Making Training in Distributed Machine Learning Adaptive. In 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI).","author":"Mai Luo","year":"2020","unstructured":"Luo Mai, Guo Li, Marcel Wagenl\u00e4nder, Konstantinos Fertakis, Andrei-Octavian Brabete, and Peter Pietzuch. 2020. 
KungFu: Making Training in Distributed Machine Learning Adaptive. In 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI)."},{"key":"e_1_2_1_45_1","volume-title":"Zili Meng, and Mohammad Alizadeh.","author":"Mao Hongzi","year":"2019","unstructured":"Hongzi Mao, Malte Schwarzkopf, Shaileshh Bojja Venkatakrishnan, Zili Meng, and Mohammad Alizadeh. 2019. Learning Scheduling Algorithms for Data Processing Clusters. In ACM Special Interest Group on Data Communication (SIGCOMM)."},{"key":"e_1_2_1_46_1","volume-title":"New Insights and Perspectives on the Natural Gradient Method. Journal of Machine Learning Research (JMLR)","author":"Martens James","year":"2020","unstructured":"James Martens. 2020. New Insights and Perspectives on the Natural Gradient Method. Journal of Machine Learning Research (JMLR) (2020)."},{"key":"e_1_2_1_47_1","volume-title":"International Conference on Learning Representations (ICLR)","author":"Mei Zhiyu","year":"2024","unstructured":"Zhiyu Mei, Wei Fu, Guangju Wang, Huanchen Zhang, and Yi Wu. 2024. SRL: Scaling Distributed Reinforcement Learning to Over Ten Thousand Cores. International Conference on Learning Representations (ICLR) (2024)."},{"key":"e_1_2_1_48_1","volume-title":"Docker: Lightweight Linux Containers for Consistent Development and Deployment. Linux Journal","author":"Dirk Merkel","year":"2014","unstructured":"Dirk Merkel et al. 2014. Docker: Lightweight Linux Containers for Consistent Development and Deployment. Linux Journal (2014)."},{"key":"e_1_2_1_49_1","volume-title":"Device Placement Optimization with Reinforcement Learning. In International Conference on Machine Learning (ICML).","author":"Mirhoseini Azalia","year":"2017","unstructured":"Azalia Mirhoseini, Hieu Pham, Quoc V Le, Benoit Steiner, Rasmus Larsen, Yuefeng Zhou, Naveen Kumar, Mohammad Norouzi, Samy Bengio, and Jeff Dean. 2017. Device Placement Optimization with Reinforcement Learning. 
In International Conference on Machine Learning (ICML)."},{"key":"e_1_2_1_50_1","volume-title":"Asynchronous Methods for Deep Reinforcement Learning. In International Conference on Machine Learning (ICML).","author":"Mnih Volodymyr","year":"2016","unstructured":"Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. 2016. Asynchronous Methods for Deep Reinforcement Learning. In International Conference on Machine Learning (ICML)."},{"key":"e_1_2_1_51_1","volume-title":"Vedavyas Panneershelvam, Mustafa Suleyman, Charles Beattie, Stig Petersen, et al.","author":"Nair Arun","year":"2015","unstructured":"Arun Nair, Praveen Srinivasan, Sam Blackwell, Cagdas Alcicek, Rory Fearon, Alessandro De Maria, Vedavyas Panneershelvam, Mustafa Suleyman, Charles Beattie, Stig Petersen, et al. 2015. Massively Parallel Methods for Deep Reinforcement Learning. arXiv preprint arXiv:1507.04296 (2015)."},{"key":"e_1_2_1_52_1","volume-title":"Ridge rider: Finding Diverse Solutions by Following Eigenvectors of the Hessian. Advances in Neural Information Processing Systems (NIPS)","author":"Parker-Holder Jack","year":"2020","unstructured":"Jack Parker-Holder, Luke Metz, Cinjon Resnick, Hengyuan Hu, Adam Lerer, Alistair Letcher, Alexander Peysakhovich, Aldo Pacchiano, and Jakob Foerster. 2020. Ridge rider: Finding Diverse Solutions by Following Eigenvectors of the Hessian. Advances in Neural Information Processing Systems (NIPS) (2020)."},{"key":"e_1_2_1_53_1","unstructured":"Adam Paszke Sam Gross Francisco Massa Adam Lerer James Bradbury Gregory Chanan Trevor Killeen Zeming Lin Natalia Gimelshein Luca Antiga et al. 2019. PyTorch: An Imperative Style High-Performance Deep Learning Library. Advances in Neural Information Processing Systems (NIPS) (2019)."},{"key":"e_1_2_1_54_1","unstructured":"Python. 2008. Pickle --- Python Object Serialization. 
https:\/\/docs.python.org\/3\/library\/pickle.html."},{"key":"e_1_2_1_55_1","volume-title":"14th USENIX Symposium on Operating Systems Design and Implementation (OSDI).","author":"Qiu Haoran","year":"2020","unstructured":"Haoran Qiu, Subho S Banerjee, Saurabh Jha, Zbigniew T Kalbarczyk, and Ravishankar K Iyer. 2020. FIRM: An Intelligent Fine-grained Resource Management Framework for SLO-Oriented Microservices. In 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI)."},{"key":"e_1_2_1_56_1","unstructured":"Alec Radford Jeffrey Wu Rewon Child David Luan Dario Amodei Ilya Sutskever et al. 2019. Language Models are Unsupervised Multitask Learners. OpenAI blog (2019)."},{"key":"e_1_2_1_57_1","volume-title":"Stable-baselines3: Reliable Reinforcement Learning Implementations. The Journal of Machine Learning Research","author":"Raffin Antonin","year":"2021","unstructured":"Antonin Raffin, Ashley Hill, Adam Gleave, Anssi Kanervisto, Maximilian Ernestus, and Noah Dormann. 2021. Stable-baselines3: Reliable Reinforcement Learning Implementations. The Journal of Machine Learning Research (2021)."},{"key":"e_1_2_1_58_1","unstructured":"Redis. 2009. Redis Official Website. http:\/\/redis.io\/."},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1145\/3472883.3486972"},{"key":"e_1_2_1_60_1","volume":"198","author":"Rumelhart David E","unstructured":"David E Rumelhart, Geoffrey E Hinton, and Ronald J Williams. 1986. Learning Representations by Back-propagating Errors. Nature (1986).","journal-title":"Ronald J Williams."},{"key":"e_1_2_1_61_1","volume-title":"Trust Region Policy Optimization. In International Conference on Machine Learning (ICML).","author":"Schulman John","year":"2015","unstructured":"John Schulman, Sergey Levine, Pieter Abbeel, Michael Jordan, and Philipp Moritz. 2015. Trust Region Policy Optimization. 
In International Conference on Machine Learning (ICML)."},{"key":"e_1_2_1_62_1","volume-title":"High-Dimensional Continuous Control Using Generalized Advantage Estimation. In International Conference on Learning Representations (ICLR).","author":"Schulman John","year":"2016","unstructured":"John Schulman, Philipp Moritz, Sergey Levine, Michael Jordan, and Pieter Abbeel. 2016. High-Dimensional Continuous Control Using Generalized Advantage Estimation. In International Conference on Learning Representations (ICLR)."},{"key":"e_1_2_1_63_1","volume-title":"Proximal Policy Optimization Algorithms. arXiv preprint arXiv:1707.06347","author":"Schulman John","year":"2017","unstructured":"John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. 2017. Proximal Policy Optimization Algorithms. arXiv preprint arXiv:1707.06347 (2017)."},{"key":"e_1_2_1_64_1","volume-title":"Hessian Aided Policy Gradient. In International Conference on Machine Learning. 5729--5738","author":"Shen Zebang","year":"2019","unstructured":"Zebang Shen, Alejandro Ribeiro, Hamed Hassani, Hui Qian, and Chao Mi. 2019. Hessian Aided Policy Gradient. In International Conference on Machine Learning. 5729--5738."},{"key":"e_1_2_1_65_1","volume-title":"Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al.","author":"Silver David","year":"2016","unstructured":"David Silver, Aja Huang, Chris J Maddison, Arthur Guez, Laurent Sifre, George Van Den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al. 2016. Mastering the Game of Go with Deep Neural Networks and Tree Search. Nature (2016)."},{"key":"e_1_2_1_66_1","volume-title":"Cliff Diving: Exploring Reward Surfaces in Reinforcement Learning Environments. In Nineteenth International Conference on Machine Learning (ICML).","author":"Sullivan Ryan","year":"2022","unstructured":"Ryan Sullivan, Justin K Terry, Benjamin Black, and John P Dickerson. 2022. 
Cliff Diving: Exploring Reward Surfaces in Reinforcement Learning Environments. In Nineteenth International Conference on Machine Learning (ICML)."},{"key":"e_1_2_1_67_1","volume-title":"Policy Gradient Methods for Reinforcement Learning with Function Approximation. Advances in Neural Information Processing Systems (NIPS)","author":"Sutton Richard S","year":"1999","unstructured":"Richard S Sutton, David McAllester, Satinder Singh, and Yishay Mansour. 1999. Policy Gradient Methods for Reinforcement Learning with Function Approximation. Advances in Neural Information Processing Systems (NIPS) (1999)."},{"key":"e_1_2_1_68_1","volume-title":"MuJoCo: A Physics Engine for Model-based Control. In IEEE\/RSJ International Conference on Intelligent Robots and Systems.","author":"Todorov Emanuel","year":"2012","unstructured":"Emanuel Todorov, Tom Erez, and Yuval Tassa. 2012. MuJoCo: A Physics Engine for Model-based Control. In IEEE\/RSJ International Conference on Intelligent Robots and Systems."},{"key":"e_1_2_1_69_1","volume-title":"Predicting Neural Network Accuracy from Weights. arXiv preprint arXiv:2002.11448","author":"Unterthiner Thomas","year":"2020","unstructured":"Thomas Unterthiner, Daniel Keysers, Sylvain Gelly, Olivier Bousquet, and Ilya Tolstikhin. 2020. Predicting Neural Network Accuracy from Weights. arXiv preprint arXiv:2002.11448 (2020)."},{"key":"e_1_2_1_70_1","doi-asserted-by":"crossref","unstructured":"Oriol Vinyals Igor Babuschkin Wojciech M Czarnecki Micha\u00ebl Mathieu Andrew Dudzik Junyoung Chung David H Choi Richard Powell Timo Ewalds Petko Georgiev et al. 2019. Grandmaster Level in StarCraft II Using Multi-agent Reinforcement Learning. Nature (2019).","DOI":"10.1038\/s41586-019-1724-z"},{"key":"e_1_2_1_71_1","doi-asserted-by":"publisher","DOI":"10.1109\/INFOCOM.2019.8737391"},{"key":"e_1_2_1_72_1","volume-title":"DD-PPO: Learning Near-Perfect Point-Goal Navigators from 2.5 Billion Frames. 
arXiv preprint arXiv:1911.00357","author":"Wijmans Erik","year":"2019","unstructured":"Erik Wijmans, Abhishek Kadian, Ari Morcos, Stefan Lee, Irfan Essa, Devi Parikh, Manolis Savva, and Dhruv Batra. 2019. DD-PPO: Learning Near-Perfect Point-Goal Navigators from 2.5 Billion Frames. arXiv preprint arXiv:1911.00357 (2019)."},{"key":"e_1_2_1_73_1","volume-title":"Pyhessian: Neural Networks Through the Lens of the Hessian. In 2020 IEEE international conference on big data (Big data).","author":"Yao Zhewei","year":"2020","unstructured":"Zhewei Yao, Amir Gholami, Kurt Keutzer, and Michael W Mahoney. 2020. Pyhessian: Neural Networks Through the Lens of the Hessian. In 2020 IEEE international conference on big data (Big data)."},{"key":"e_1_2_1_74_1","volume-title":"Cheaper and Faster: Distributed Deep Reinforcement Learning with Serverless Computing. In Thirty-Eighth AAAI Conference on Artificial Intelligence (AAAI).","author":"Yu Hanfei","year":"2024","unstructured":"Hanfei Yu, Jian Li, Yang Hua, Xu Yuan, and Hao Wang. 2024. Cheaper and Faster: Distributed Deep Reinforcement Learning with Serverless Computing. In Thirty-Eighth AAAI Conference on Artificial Intelligence (AAAI)."},{"key":"e_1_2_1_75_1","volume-title":"Stellaris: Staleness-Aware Distributed Reinforcement Learning with Serverless Computing. In International Conference for High Performance Computing, Networking, Storage and Analysis (SC).","author":"Yu Hanfei","year":"2024","unstructured":"Hanfei Yu, Hao Wang, Devesh Tiwari, Jian Li, and Seung-Jong Park. 2024. Stellaris: Staleness-Aware Distributed Reinforcement Learning with Serverless Computing. In International Conference for High Performance Computing, Networking, Storage and Analysis (SC)."},{"key":"e_1_2_1_76_1","doi-asserted-by":"publisher","DOI":"10.1145\/3127479.3127490"},{"key":"e_1_2_1_77_1","volume-title":"MSRL: Distributed Reinforcement Learning with Dataflow Fragments. 
In 2023 USENIX Annual Technical Conference (ATC).","author":"Zhu Huanzhou","year":"2023","unstructured":"Huanzhou Zhu, Bo Zhao, Gang Chen, Weifeng Chen, Yijie Chen, Liang Shi, Yaodong Yang, Peter Pietzuch, and Lei Chen. 2023. MSRL: Distributed Reinforcement Learning with Dataflow Fragments. In 2023 USENIX Annual Technical Conference (ATC)."},{"key":"e_1_2_1_78_1","volume-title":"Target-driven Visual Navigation in Indoor Scenes Using Deep Reinforcement Learning. In IEEE International Conference on Robotics and Automation (ICRA","author":"Zhu Yuke","year":"2017","unstructured":"Yuke Zhu, Roozbeh Mottaghi, Eric Kolve, Joseph J Lim, Abhinav Gupta, Li Fei-Fei, and Ali Farhadi. 2017. Target-driven Visual Navigation in Indoor Scenes Using Deep Reinforcement Learning. In IEEE International Conference on Robotics and Automation (ICRA 2017)."}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3696435.3696441","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,2,11]],"date-time":"2025-02-11T20:42:36Z","timestamp":1739306556000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3696435.3696441"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,9]]},"references-count":78,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2024,9]]}},"alternative-id":["10.14778\/3696435.3696441"],"URL":"https:\/\/doi.org\/10.14778\/3696435.3696441","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2024,9]]},"assertion":[{"value":"2025-02-11","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}