{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:27:22Z","timestamp":1750220842224,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":24,"publisher":"ACM","license":[{"start":{"date-parts":[[2019,10,13]],"date-time":"2019-10-13T00:00:00Z","timestamp":1570924800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2019,10,13]]},"DOI":"10.1145\/3356464.3357704","type":"proceedings-article","created":{"date-parts":[[2019,10,31]],"date-time":"2019-10-31T12:20:52Z","timestamp":1572524452000},"page":"1-7","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["An efficient reinforcement learning algorithm for learning deterministic policies in continuous domains"],"prefix":"10.1145","author":[{"given":"Matthieu","family":"Zimmer","sequence":"first","affiliation":[{"name":"Shanghai Jiao Tong University, Shanghai, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Paul","family":"Weng","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University, Shanghai, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2019,10,13]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"Mart\u00edn Abadi Ashish Agarwal Paul Barham Eugene Brevdo Zhifeng Chen Craig Citro Greg S. Corrado Andy Davis Jeffrey Dean Matthieu Devin Sanjay Ghemawat Ian Goodfellow Andrew Harp Geoffrey Irving Michael Isard Yangqing Jia Rafal Jozefowicz Lukasz Kaiser Manjunath Kudlur Josh Levenberg Dan Man\u00e9 Rajat Monga Sherry Moore Derek Murray Chris Olah Mike Schuster Jonathon Shlens Benoit Steiner Ilya Sutskever Kunal Talwar Paul Tucker Vincent Vanhoucke Vijay Vasudevan Fernanda Vi\u00e9gas Oriol Vinyals Pete Warden Martin Wattenberg Martin Wicke Yuan Yu and Xiaoqiang Zheng. 2015. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. http:\/\/tensorflow.org\/ Software available from tensorflow.org.  Mart\u00edn Abadi Ashish Agarwal Paul Barham Eugene Brevdo Zhifeng Chen Craig Citro Greg S. Corrado Andy Davis Jeffrey Dean Matthieu Devin Sanjay Ghemawat Ian Goodfellow Andrew Harp Geoffrey Irving Michael Isard Yangqing Jia Rafal Jozefowicz Lukasz Kaiser Manjunath Kudlur Josh Levenberg Dan Man\u00e9 Rajat Monga Sherry Moore Derek Murray Chris Olah Mike Schuster Jonathon Shlens Benoit Steiner Ilya Sutskever Kunal Talwar Paul Tucker Vincent Vanhoucke Vijay Vasudevan Fernanda Vi\u00e9gas Oriol Vinyals Pete Warden Martin Wattenberg Martin Wicke Yuan Yu and Xiaoqiang Zheng. 2015. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. http:\/\/tensorflow.org\/ Software available from tensorflow.org."},{"key":"e_1_3_2_1_2_1","unstructured":"Prafulla Dhariwal Christopher Hesse Oleg Klimov Alex Nichol Matthias Plappert Alec Radford John Schulman Szymon Sidor Yuhuai Wu and Peter Zhokhov. 2017. OpenAI Baselines. https:\/\/github.com\/openai\/baselines.  Prafulla Dhariwal Christopher Hesse Oleg Klimov Alex Nichol Matthias Plappert Alec Radford John Schulman Szymon Sidor Yuhuai Wu and Peter Zhokhov. 2017. OpenAI Baselines. https:\/\/github.com\/openai\/baselines."},{"key":"e_1_3_2_1_3_1","volume-title":"Impala: Scalable distributed deep-rl with importance weighted actor-learner architectures. arXiv preprint arXiv:1802.01561","author":"Espeholt Lasse","year":"2018","unstructured":"Lasse Espeholt , Hubert Soyer , Remi Munos , Karen Simonyan , Volodymir Mnih , Tom Ward , Yotam Doron , Vlad Firoiu , Tim Harley , Iain Dunning , 2018 . Impala: Scalable distributed deep-rl with importance weighted actor-learner architectures. arXiv preprint arXiv:1802.01561 (2018). Lasse Espeholt, Hubert Soyer, Remi Munos, Karen Simonyan, Volodymir Mnih, Tom Ward, Yotam Doron, Vlad Firoiu, Tim Harley, Iain Dunning, et al. 2018. Impala: Scalable distributed deep-rl with importance weighted actor-learner architectures. arXiv preprint arXiv:1802.01561 (2018)."},{"key":"e_1_3_2_1_4_1","volume-title":"Addressing function approximation error in actor-critic methods. arXiv preprint arXiv:1802.09477","author":"Fujimoto Scott","year":"2018","unstructured":"Scott Fujimoto , Herke van Hoof , and David Meger . 2018. Addressing function approximation error in actor-critic methods. arXiv preprint arXiv:1802.09477 ( 2018 ). Scott Fujimoto, Herke van Hoof, and David Meger. 2018. Addressing function approximation error in actor-critic methods. arXiv preprint arXiv:1802.09477 (2018)."},{"key":"e_1_3_2_1_5_1","volume-title":"Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic. arXiv preprint arXiv:1611.02247","author":"Gu Shixiang","year":"2016","unstructured":"Shixiang Gu , Timothy Lillicrap , Zoubin Ghahramani , Richard E. Turner , and Sergey Levine . 2016. Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic. arXiv preprint arXiv:1611.02247 ( 2016 ). Shixiang Gu, Timothy Lillicrap, Zoubin Ghahramani, Richard E. Turner, and Sergey Levine. 2016. Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic. arXiv preprint arXiv:1611.02247 (2016)."},{"key":"e_1_3_2_1_6_1","volume-title":"Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning. arXiv preprint arXiv:1706.00387","author":"Gu Shixiang","year":"2017","unstructured":"Shixiang Gu , Timothy Lillicrap , Zoubin Ghahramani , Richard E. Turner , Bernhard Sch\u00f6lkopf , and Sergey Levine . 2017. Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning. arXiv preprint arXiv:1706.00387 ( 2017 ). Shixiang Gu, Timothy Lillicrap, Zoubin Ghahramani, Richard E. Turner, Bernhard Sch\u00f6lkopf, and Sergey Levine. 2017. Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning. arXiv preprint arXiv:1706.00387 (2017)."},{"key":"e_1_3_2_1_7_1","volume-title":"Caffe: Convolutional Architecture for Fast Feature Embedding. arXiv preprint arXiv:1408.5093","author":"Jia Yangqing","year":"2014","unstructured":"Yangqing Jia , Evan Shelhamer , Jeff Donahue , Sergey Karayev , Jonathan Long , Ross Girshick , Sergio Guadarrama , and Trevor Darrell . 2014 . Caffe: Convolutional Architecture for Fast Feature Embedding. arXiv preprint arXiv:1408.5093 (2014). Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell. 2014. Caffe: Convolutional Architecture for Fast Feature Embedding. arXiv preprint arXiv:1408.5093 (2014)."},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1137\/S0363012901385691"},{"key":"e_1_3_2_1_9_1","volume-title":"Continuous control with deep reinforcement learning. ICLR","author":"Lillicrap Timothy P.","year":"2016","unstructured":"Timothy P. Lillicrap , Jonathan J. Hunt , Alexander Pritzel , Nicolas Heess , Tom Erez , Yuval Tassa , David Silver , and Daan Wierstra . 2016. Continuous control with deep reinforcement learning. ICLR ( 2016 ). arXiv:1509.02971 Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. 2016. Continuous control with deep reinforcement learning. ICLR (2016). arXiv:1509.02971"},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1038\/nature14236"},{"key":"e_1_3_2_1_11_1","volume-title":"Bellemare","author":"Munos Remi","year":"2016","unstructured":"Remi Munos , Tom Stepleton , Anna Harutyunyan , and Marc G . Bellemare . 2016 . Safe and Efficient Off-Policy Reinforcement Learning . arXiv preprint arXiv:1606.02647 (2016). Remi Munos, Tom Stepleton, Anna Harutyunyan, and Marc G. Bellemare. 2016. Safe and Efficient Off-Policy Reinforcement Learning. arXiv preprint arXiv:1606.02647 (2016)."},{"key":"e_1_3_2_1_12_1","unstructured":"Art B. Owen. 2013. Monte Carlo theory methods and examples.  Art B. Owen. 2013. Monte Carlo theory methods and examples."},{"key":"e_1_3_2_1_13_1","volume-title":"Eligibility traces for off-policy policy evaluation","author":"Precup Doina","year":"2000","unstructured":"Doina Precup . 2000. Eligibility traces for off-policy policy evaluation . Computer Science Department Faculty Publication Series ( 2000 ), 80. Doina Precup. 2000. Eligibility traces for off-policy policy evaluation. Computer Science Department Faculty Publication Series (2000), 80."},{"key":"e_1_3_2_1_14_1","volume-title":"High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:1506.02438","author":"Schulman John","year":"2015","unstructured":"John Schulman , Philipp Moritz , Sergey Levine , Michael Jordan , and Pieter Abbeel . 2015. High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:1506.02438 ( 2015 ). John Schulman, Philipp Moritz, Sergey Levine, Michael Jordan, and Pieter Abbeel. 2015. High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:1506.02438 (2015)."},{"key":"e_1_3_2_1_15_1","volume-title":"Proximal Policy Optimization Algorithms. CoRR abs\/1707.06347","author":"Schulman John","year":"2017","unstructured":"John Schulman , Filip Wolski , Prafulla Dhariwal , Alec Radford , and Oleg Klimov . 2017. Proximal Policy Optimization Algorithms. CoRR abs\/1707.06347 ( 2017 ). arXiv:1707.06347 http:\/\/arxiv.org\/abs\/1707.06347 John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. 2017. Proximal Policy Optimization Algorithms. CoRR abs\/1707.06347 (2017). arXiv:1707.06347 http:\/\/arxiv.org\/abs\/1707.06347"},{"key":"e_1_3_2_1_16_1","volume-title":"Proceedings of the 31st International Conference on Machine Learning","author":"Silver David","year":"2014","unstructured":"David Silver , Guy Lever , Nicolas Heess , Thomas Degris , Daan Wierstra , and Martin Riedmiller . 2014 . Deterministic Policy Gradient Algorithms . Proceedings of the 31st International Conference on Machine Learning (2014), 387--395. David Silver, Guy Lever, Nicolas Heess, Thomas Degris, Daan Wierstra, and Martin Riedmiller. 2014. Deterministic Policy Gradient Algorithms. Proceedings of the 31st International Conference on Machine Learning (2014), 387--395."},{"key":"e_1_3_2_1_17_1","volume-title":"Barto","author":"Sutton Richard S.","year":"1998","unstructured":"Richard S. Sutton and Andrew G . Barto . 1998 . Reinforcement Learning : An Introduction (Adaptive Computation and Machine Learning). A Bradford Book . Richard S. Sutton and Andrew G. Barto. 1998. Reinforcement Learning: An Introduction (Adaptive Computation and Machine Learning). A Bradford Book."},{"key":"e_1_3_2_1_18_1","first-page":"1057","article-title":"Policy Gradient Methods for Reinforcement Learning with Function Approximation","volume":"12","author":"Sutton Richard S.","year":"1999","unstructured":"Richard S. Sutton , David Mcallester , Satinder Singh , and Yishay Mansour . 1999 . Policy Gradient Methods for Reinforcement Learning with Function Approximation . In Advances in Neural Information Processing Systems 12 (1999), 1057 -- 1063 . https:\/\/doi.org\/10.1.1.37.9714 Richard S. Sutton, David Mcallester, Satinder Singh, and Yishay Mansour. 1999. Policy Gradient Methods for Reinforcement Learning with Function Approximation. In Advances in Neural Information Processing Systems 12 (1999), 1057--1063. https:\/\/doi.org\/10.1.1.37.9714","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/ADPRL.2007.368199"},{"key":"e_1_3_2_1_20_1","volume-title":"Sample Efficient Actor-Critic with Experience Replay. arXiv preprint arXiv:1611.01224","author":"Wang Ziyu","year":"2016","unstructured":"Ziyu Wang , Victor Bapst , Nicolas Heess , Volodymyr Mnih , Remi Munos , Koray Kavukcuoglu , and Nando DE Freitas . 2016. Sample Efficient Actor-Critic with Experience Replay. arXiv preprint arXiv:1611.01224 ( 2016 ). Ziyu Wang, Victor Bapst, Nicolas Heess, Volodymyr Mnih, Remi Munos, Koray Kavukcuoglu, and Nando DE Freitas. 2016. Sample Efficient Actor-Critic with Experience Replay. arXiv preprint arXiv:1611.01224 (2016)."},{"key":"e_1_3_2_1_21_1","volume-title":"Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine learning 8, 3-4","author":"Williams Ronald J.","year":"1992","unstructured":"Ronald J. Williams . 1992. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine learning 8, 3-4 ( 1992 ), 229--256. Ronald J. Williams. 1992. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine learning 8, 3-4 (1992), 229--256."},{"key":"e_1_3_2_1_22_1","volume-title":"Neural Fitted Actor-Critic. In European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning.","author":"Zimmer Matthieu","year":"2016","unstructured":"Matthieu Zimmer , Yann Boniface , and Alain Dutech . 2016 . Neural Fitted Actor-Critic. In European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning. Matthieu Zimmer, Yann Boniface, and Alain Dutech. 2016. Neural Fitted Actor-Critic. In European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning."},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/DEVLRN.2018.8761021"},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2019\/625"}],"event":{"name":"DAI '19: First International Conference on Distributed Artificial Intelligence","acronym":"DAI '19","location":"Beijing China"},"container-title":["Proceedings of the First International Conference on Distributed Artificial Intelligence"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3356464.3357704","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3356464.3357704","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T23:22:54Z","timestamp":1750202574000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3356464.3357704"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,10,13]]},"references-count":24,"alternative-id":["10.1145\/3356464.3357704","10.1145\/3356464"],"URL":"https:\/\/doi.org\/10.1145\/3356464.3357704","relation":{},"subject":[],"published":{"date-parts":[[2019,10,13]]},"assertion":[{"value":"2019-10-13","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}