{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,27]],"date-time":"2026-02-27T03:47:54Z","timestamp":1772164074790,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":24,"publisher":"ACM","license":[{"start":{"date-parts":[[2022,6,6]],"date-time":"2022-06-06T00:00:00Z","timestamp":1654473600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,6,6]]},"DOI":"10.1145\/3489048.3522648","type":"proceedings-article","created":{"date-parts":[[2022,6,2]],"date-time":"2022-06-02T10:30:55Z","timestamp":1654165855000},"page":"77-78","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["Differentially Private Reinforcement Learning with Linear Function Approximation"],"prefix":"10.1145","author":[{"given":"Xingyu","family":"Zhou","sequence":"first","affiliation":[{"name":"Wayne State University, Detroit, MI, USA"}]}],"member":"320","published-online":{"date-parts":[[2022,6,6]]},"reference":[{"key":"e_1_3_2_1_1_1","volume-title":"Differential privacy for multi-armed bandits: What is it and what is its cost? arXiv preprint arXiv:1905.12298","author":"Basu Debabrota","year":"2019","unstructured":"Debabrota Basu, Christos Dimitrakakis, and Aristide Tossou. Differential privacy for multi-armed bandits: What is it and what is its cost? 
arXiv preprint arXiv:1905.12298, 2019."},{"key":"e_1_3_2_1_2_1","first-page":"1283","volume-title":"International Conference on Machine Learning","author":"Cai Qi","year":"2020","unstructured":"Qi Cai, Zhuoran Yang, Chi Jin, and Zhaoran Wang. Provably efficient exploration in policy optimization. In International Conference on Machine Learning, pages 1283--1294. PMLR, 2020."},{"key":"e_1_3_2_1_3_1","volume-title":"Differentially private regret minimization in episodic markov decision processes. arXiv preprint arXiv:2112.10599","author":"Chowdhury Sayak Ray","year":"2021","unstructured":"Sayak Ray Chowdhury and Xingyu Zhou. Differentially private regret minimization in episodic markov decision processes. arXiv preprint arXiv:2112.10599, 2021."},{"key":"e_1_3_2_1_4_1","first-page":"1329","volume-title":"International conference on machine learning","author":"Duan Yan","year":"2016","unstructured":"Yan Duan, Xi Chen, Rein Houthooft, John Schulman, and Pieter Abbeel. Benchmarking deep reinforcement learning for continuous control. In International conference on machine learning, pages 1329--1338. PMLR, 2016."},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.5555\/1791834.1791836"},{"key":"e_1_3_2_1_6_1","volume-title":"Optimistic policy optimization with bandit feedback. 
arXiv preprint arXiv:2002.08243","author":"Efroni Yonathan","year":"2020","unstructured":"Yonathan Efroni, Lior Shani, Aviv Rosenberg, and Shie Mannor. Optimistic policy optimization with bandit feedback. arXiv preprint arXiv:2002.08243, 2020."},{"key":"e_1_3_2_1_7_1","volume-title":"Local differentially private regret minimization in reinforcement learning. arXiv preprint arXiv:2010.07778","author":"Garcelon Evrard","year":"2020","unstructured":"Evrard Garcelon, Vianney Perchet, Ciara Pike-Burke, and Matteo Pirotta. Local differentially private regret minimization in reinforcement learning. arXiv preprint arXiv:2010.07778, 2020."},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v30i1.9914"},{"key":"e_1_3_2_1_9_1","volume-title":"A natural policy gradient. Advances in neural information processing systems, 14","author":"Kakade Sham M","year":"2001","unstructured":"Sham M Kakade. A natural policy gradient. Advances in neural information processing systems, 14, 2001."},{"key":"e_1_3_2_1_10_1","first-page":"1008","volume-title":"Advances in neural information processing systems","author":"Konda Vijay R","year":"2000","unstructured":"Vijay R Konda and John N Tsitsiklis. Actor-critic algorithms. In Advances in neural information processing systems, pages 1008--1014. 
Citeseer, 2000."},{"key":"e_1_3_2_1_11_1","volume-title":"Neural proximal\/trust region policy optimization attains globally optimal policy. arXiv preprint arXiv:1906.10306","author":"Liu Boyi","year":"2019","unstructured":"Boyi Liu, Qi Cai, Zhuoran Yang, and Zhaoran Wang. Neural proximal\/trust region policy optimization attains globally optimal policy. arXiv preprint arXiv:1906.10306, 2019."},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.5555\/3020847.3020909"},{"key":"e_1_3_2_1_13_1","first-page":"1889","volume-title":"International conference on machine learning","author":"Schulman John","year":"2015","unstructured":"John Schulman, Sergey Levine, Pieter Abbeel, Michael Jordan, and Philipp Moritz. Trust region policy optimization. In International conference on machine learning, pages 1889--1897. PMLR, 2015."},{"key":"e_1_3_2_1_14_1","volume-title":"Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347","author":"Schulman John","year":"2017","unstructured":"John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017."},{"key":"e_1_3_2_1_15_1","volume-title":"Differentially private contextual linear bandits. arXiv preprint arXiv:1810.00068","author":"Shariff Roshan","year":"2018","unstructured":"Roshan Shariff and Or Sheffet. Differentially private contextual linear bandits. 
arXiv preprint arXiv:1810.00068, 2018."},{"key":"e_1_3_2_1_16_1","volume-title":"Mastering the game of go without human knowledge. nature, 550 (7676): 354--359","author":"Silver David","year":"2017","unstructured":"David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, et al. Mastering the game of go without human knowledge. nature, 550 (7676): 354--359, 2017."},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v30i1.10212"},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v31i1.10896"},{"key":"e_1_3_2_1_19_1","first-page":"9754","volume-title":"International Conference on Machine Learning","author":"Vietri Giuseppe","year":"2020","unstructured":"Giuseppe Vietri, Borja Balle, Akshay Krishnamurthy, and Steven Wu. Private reinforcement learning with pac and regret guarantees. In International Conference on Machine Learning, pages 9754--9764. PMLR, 2020."},{"key":"e_1_3_2_1_20_1","volume-title":"Neural policy gradient methods: Global optimality and rates of convergence. arXiv preprint arXiv:1909.01150","author":"Wang Lingxiao","year":"2019","unstructured":"
Lingxiao Wang, Qi Cai, Zhuoran Yang, and Zhaoran Wang. Neural policy gradient methods: Global optimality and rates of convergence. arXiv preprint arXiv:1909.01150, 2019."},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P18-5007"},{"key":"e_1_3_2_1_22_1","volume-title":"Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine learning, 8 (3--4): 229--256","author":"Williams Ronald J","year":"1992","unstructured":"Ronald J Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine learning, 8 (3--4): 229--256, 1992."},{"key":"e_1_3_2_1_23_1","volume-title":"Differentially private reinforcement learning with linear function approximation. arXiv preprint arXiv:2201.07052","author":"Zhou Xingyu","year":"2022","unstructured":"Xingyu Zhou. Differentially private reinforcement learning with linear function approximation. arXiv preprint arXiv:2201.07052, 2022."},{"key":"e_1_3_2_1_24_1","volume-title":"Local differential privacy for bayesian optimization. arXiv preprint arXiv:2010.06709","author":"Zhou Xingyu","year":"2020","unstructured":"Xingyu Zhou and Jian Tan. Local differential privacy for bayesian optimization. 
arXiv preprint arXiv:2010.06709, 2020."}],"event":{"name":"SIGMETRICS\/PERFORMANCE '22: ACM SIGMETRICS\/IFIP PERFORMANCE Joint International Conference on Measurement and Modeling of Computer Systems","location":"Mumbai India","acronym":"SIGMETRICS\/PERFORMANCE '22","sponsor":["SIGMETRICS ACM Special Interest Group on Measurement and Evaluation"]},"container-title":["Abstract Proceedings of the 2022 ACM SIGMETRICS\/IFIP PERFORMANCE Joint International Conference on Measurement and Modeling of Computer Systems"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3489048.3522648","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3489048.3522648","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:49:00Z","timestamp":1750178940000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3489048.3522648"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,6,6]]},"references-count":24,"alternative-id":["10.1145\/3489048.3522648","10.1145\/3489048"],"URL":"https:\/\/doi.org\/10.1145\/3489048.3522648","relation":{"is-identical-to":[{"id-type":"doi","id":"10.1145\/3547353.3522648","asserted-by":"object"}]},"subject":[],"published":{"date-parts":[[2022,6,6]]},"assertion":[{"value":"2022-06-06","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}