{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,12]],"date-time":"2026-05-12T16:15:20Z","timestamp":1778602520524,"version":"3.51.4"},"publisher-location":"New York, NY, USA","reference-count":65,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,5,24]],"date-time":"2021-05-24T00:00:00Z","timestamp":1621814400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"NRF Investigatorship","award":["NRFI06-2020-0022"],"award-info":[{"award-number":["NRFI06-2020-0022"]}]},{"name":"National Research Foundation, Singapore under its AI Singapore Programme","award":["AISG2-RP-2020-019"],"award-info":[{"award-number":["AISG2-RP-2020-019"]}]},{"name":"Singapore National Research Foundation, under its National Cybersecurity R&D Program","award":["NRF2018NCR-NCR005-0001"],"award-info":[{"award-number":["NRF2018NCR-NCR005-0001"]}]},{"name":"Singapore National Research Foundation under NCR","award":["RF2018NCR-NSOE003-0001"],"award-info":[{"award-number":["RF2018NCR-NSOE003-0001"]}]},{"name":"The Singapore Ministry of Education (MOE) Academic Research Fund (AcRF) Tier 2 grant","award":["MOE-T2EP20120-0004"],"award-info":[{"award-number":["MOE-T2EP20120-0004"]}]},{"name":"Singapore Ministry of Education AcRF Tier 1","award":["RG108\/19 (S) and RS02\/19"],"award-info":[{"award-number":["RG108\/19 (S) and RS02\/19"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,5,24]]},"DOI":"10.1145\/3433210.3453090","type":"proceedings-article","created":{"date-parts":[[2021,6,4]],"date-time":"2021-06-04T15:26:39Z","timestamp":1622820399000},"page":"307-319","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":26,"title":["Stealing Deep Reinforcement Learning Models for Fun and 
Profit"],"prefix":"10.1145","author":[{"given":"Kangjie","family":"Chen","sequence":"first","affiliation":[{"name":"Nanyang Technological University, Singapore, Singapore"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shangwei","family":"Guo","sequence":"additional","affiliation":[{"name":"Chongqing University, Chongqing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Tianwei","family":"Zhang","sequence":"additional","affiliation":[{"name":"Nanyang Technological University, Singapore, Singapore"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xiaofei","family":"Xie","sequence":"additional","affiliation":[{"name":"Nanyang Technological University, Singapore, Singapore"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yang","family":"Liu","sequence":"additional","affiliation":[{"name":"Nanyang Technological University, Singapore, Singapore"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2021,6,4]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"2020. Safe multi-agent reinforcement learning for autonomous driving. https:\/\/www.mobileye.com\/our-technology\/driving-policy\/. Accessed: 2020-12-05.  2020. Safe multi-agent reinforcement learning for autonomous driving. https:\/\/www.mobileye.com\/our-technology\/driving-policy\/. Accessed: 2020-12-05."},{"key":"e_1_3_2_1_2_1","unstructured":"JUNE 2018. Learning to drive in a day. https:\/\/wayve.ai\/blog\/learning-to-drive-in-a-day-with-reinforcement-learning. Accessed: 2020-12-05.  JUNE 2018. Learning to drive in a day. https:\/\/wayve.ai\/blog\/learning-to-drive-in-a-day-with-reinforcement-learning. Accessed: 2020-12-05."},{"key":"e_1_3_2_1_3_1","volume-title":"USENIX Security Symposium. 1615--1631","author":"Adi Yossi","year":"2018","unstructured":"Yossi Adi , Carsten Baum , Moustapha Cisse , Benny Pinkas , and Joseph Keshet . 2018 . 
Turning your weakness into a strength: Watermarking deep neural networks by backdooring . In USENIX Security Symposium. 1615--1631 . Yossi Adi, Carsten Baum, Moustapha Cisse, Benny Pinkas, and Joseph Keshet. 2018. Turning your weakness into a strength: Watermarking deep neural networks by backdooring. In USENIX Security Symposium. 1615--1631."},{"key":"e_1_3_2_1_4_1","volume-title":"CSI NN: Reverse Engineering of Neural Network Architectures Through Electromagnetic Side Channel. In USENIX Security Symposium. 515--532","author":"Batina Lejla","year":"2019","unstructured":"Lejla Batina , Shivam Bhasin , Dirmanto Jap , and Stjepan Picek . 2019 . CSI NN: Reverse Engineering of Neural Network Architectures Through Electromagnetic Side Channel. In USENIX Security Symposium. 515--532 . Lejla Batina, Shivam Bhasin, Dirmanto Jap, and Stjepan Picek. 2019. CSI NN: Reverse Engineering of Neural Network Architectures Through Electromagnetic Side Channel. In USENIX Security Symposium. 515--532."},{"key":"e_1_3_2_1_5_1","volume-title":"Sequential triggers for watermarking of deep reinforcement learning policies. arXiv preprint arXiv:1906.01126","author":"Behzadan Vahid","year":"2019","unstructured":"Vahid Behzadan and William Hsu . 2019. Sequential triggers for watermarking of deep reinforcement learning policies. arXiv preprint arXiv:1906.01126 ( 2019 ). Vahid Behzadan and William Hsu. 2019. Sequential triggers for watermarking of deep reinforcement learning policies. arXiv preprint arXiv:1906.01126 (2019)."},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-62416-7_19"},{"key":"e_1_3_2_1_7_1","volume-title":"Cryptanalytic Extraction of Neural Network Models. arXiv preprint arXiv:2003.04884","author":"Carlini Nicholas","year":"2020","unstructured":"Nicholas Carlini , Matthew Jagielski , and Ilya Mironov . 2020. Cryptanalytic Extraction of Neural Network Models. arXiv preprint arXiv:2003.04884 ( 2020 ). 
Nicholas Carlini, Matthew Jagielski, and Ilya Mironov. 2020. Cryptanalytic Extraction of Neural Network Models. arXiv preprint arXiv:2003.04884 (2020)."},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/SP.2017.49"},{"key":"e_1_3_2_1_9_1","volume-title":"Working notes of AAAI 1998 Fall Symposium on Planning with Partially Observable Markov Decision Processes","author":"Cassandra Anthony R","unstructured":"Anthony R Cassandra . 1998. A survey of POMDP applications . In Working notes of AAAI 1998 Fall Symposium on Planning with Partially Observable Markov Decision Processes , Vol. 1724 . Anthony R Cassandra. 1998. A survey of POMDP applications. In Working notes of AAAI 1998 Fall Symposium on Planning with Partially Observable Markov Decision Processes, Vol. 1724."},{"key":"e_1_3_2_1_10_1","volume-title":"Exploring connections between active learning and model extraction. arXiv preprint arXiv:1811.02054","author":"Chandrasekaran Varun","year":"2018","unstructured":"Varun Chandrasekaran , Kamalika Chaudhuri , Irene Giacomelli , Somesh Jha , and Songbai Yan . 2018. Exploring connections between active learning and model extraction. arXiv preprint arXiv:1811.02054 ( 2018 ). Varun Chandrasekaran, Kamalika Chaudhuri, Irene Giacomelli, Somesh Jha, and Songbai Yan. 2018. Exploring connections between active learning and model extraction. arXiv preprint arXiv:1811.02054 (2018)."},{"key":"e_1_3_2_1_11_1","unstructured":"Kangjie Chen Shangwei Guo Tianwei Zhang Shuxin Li and Yang Liu. 2021. Temporal Watermarks for Deep Reinforcement Learning Models. (2021).  Kangjie Chen Shangwei Guo Tianwei Zhang Shuxin Li and Yang Liu. 2021. Temporal Watermarks for Deep Reinforcement Learning Models. 
(2021)."},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/IJCNN.2018.8489592"},{"key":"e_1_3_2_1_13_1","unstructured":"Prafulla Dhariwal Christopher Hesse Oleg Klimov Alex Nichol Matthias Plappert Alec Radford John Schulman Szymon Sidor Yuhuai Wu and Peter Zhokhov. 2017. OpenAI Baselines. https:\/\/github.com\/openai\/baselines.  Prafulla Dhariwal Christopher Hesse Oleg Klimov Alex Nichol Matthias Plappert Alec Radford John Schulman Szymon Sidor Yuhuai Wu and Peter Zhokhov. 2017. OpenAI Baselines. https:\/\/github.com\/openai\/baselines."},{"key":"e_1_3_2_1_14_1","volume-title":"Stealing neural networks via timing side channels. arXiv preprint arXiv:1812.11720","author":"Duddu Vasisht","year":"2018","unstructured":"Vasisht Duddu , Debasis Samanta , D Vijay Rao , and Valentina E Balas . 2018. Stealing neural networks via timing side channels. arXiv preprint arXiv:1812.11720 ( 2018 ). Vasisht Duddu, Debasis Samanta, D Vijay Rao, and Valentina E Balas. 2018. Stealing neural networks via timing side channels. arXiv preprint arXiv:1812.11720 (2018)."},{"key":"e_1_3_2_1_15_1","volume-title":"International Conference on Machine Learning. 49--58","author":"Finn Chelsea","year":"2016","unstructured":"Chelsea Finn , Sergey Levine , and Pieter Abbeel . 2016 . Guided cost learning: Deep inverse optimal control via policy optimization . In International Conference on Machine Learning. 49--58 . Chelsea Finn, Sergey Levine, and Pieter Abbeel. 2016. Guided cost learning: Deep inverse optimal control via policy optimization. In International Conference on Machine Learning. 49--58."},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/3243734.3243834"},{"key":"e_1_3_2_1_17_1","volume-title":"Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572","author":"Goodfellow Ian J","year":"2014","unstructured":"Ian J Goodfellow , Jonathon Shlens , and Christian Szegedy . 2014. Explaining and harnessing adversarial examples. 
arXiv preprint arXiv:1412.6572 ( 2014 ). Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. 2014. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572 (2014)."},{"key":"e_1_3_2_1_18_1","volume-title":"Lin (Eds.)","volume":"33","author":"Guo Qing","year":"2020","unstructured":"Qing Guo , Felix Juefei-Xu , Xiaofei Xie , Lei Ma , Jian Wang , Bing Yu , Wei Feng , and Yang Liu . 2020 a. Watch out! Motion is Blurring the Vision of Your Deep Neural Networks. In Advances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H . Lin (Eds.) , Vol. 33 . Curran Associates, Inc., 975--985. https:\/\/proceedings.neurips.cc\/paper\/ 2020\/file\/0a73de68f10e15626eb98701ecf03adb-Paper.pdf Qing Guo, Felix Juefei-Xu, Xiaofei Xie, Lei Ma, Jian Wang, Bing Yu, Wei Feng, and Yang Liu. 2020 a. Watch out! Motion is Blurring the Vision of Your Deep Neural Networks. In Advances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin (Eds.), Vol. 33. Curran Associates, Inc., 975--985. https:\/\/proceedings.neurips.cc\/paper\/2020\/file\/0a73de68f10e15626eb98701ecf03adb-Paper.pdf"},{"key":"e_1_3_2_1_19_1","volume-title":"2020 b. The Hidden Vulnerability of Watermarking for Deep Neural Networks. arXiv preprint arXiv:2009.08697","author":"Guo Shangwei","year":"2020","unstructured":"Shangwei Guo , Tianwei Zhang , Han Qiu , Yi Zeng , Tao Xiang , and Yang Liu . 2020 b. The Hidden Vulnerability of Watermarking for Deep Neural Networks. arXiv preprint arXiv:2009.08697 ( 2020 ). Shangwei Guo, Tianwei Zhang, Han Qiu, Yi Zeng, Tao Xiang, and Yang Liu. 2020 b. The Hidden Vulnerability of Watermarking for Deep Neural Networks. 
arXiv preprint arXiv:2009.08697 (2020)."},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/3359789.3359824"},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/JIOT.2020.3022358"},{"key":"e_1_3_2_1_22_1","volume-title":"AAAI Conference on Artificial Intelligence.","author":"Hester Todd","year":"2018","unstructured":"Todd Hester , Matej Vecerik , Olivier Pietquin , Marc Lanctot , Tom Schaul , Bilal Piot , Dan Horgan , John Quan , Andrew Sendonaris , Ian Osband , 2018 . Deep q-learning from demonstrations . In AAAI Conference on Artificial Intelligence. Todd Hester, Matej Vecerik, Olivier Pietquin, Marc Lanctot, Tom Schaul, Bilal Piot, Dan Horgan, John Quan, Andrew Sendonaris, Ian Osband, et al. 2018. Deep q-learning from demonstrations. In AAAI Conference on Artificial Intelligence."},{"key":"e_1_3_2_1_23_1","unstructured":"Jonathan Ho and Stefano Ermon. 2016. Generative adversarial imitation learning. In Advances in Neural Information Processing Systems. 4565--4573.  Jonathan Ho and Stefano Ermon. 2016. Generative adversarial imitation learning. In Advances in Neural Information Processing Systems. 4565--4573."},{"key":"e_1_3_2_1_24_1","volume-title":"Long short-term memory. Neural computation","author":"Hochreiter Sepp","year":"1997","unstructured":"Sepp Hochreiter and J\u00fcrgen Schmidhuber . 1997. Long short-term memory. Neural computation , Vol. 9 , 8 ( 1997 ), 1735--1780. Sepp Hochreiter and J\u00fcrgen Schmidhuber. 1997. Long short-term memory. Neural computation, Vol. 9, 8 (1997), 1735--1780."},{"key":"e_1_3_2_1_25_1","volume-title":"Ian Rackow, Kevin Kulda, Dana Dachman-Soled, and Tudor Dumitra\u0219.","author":"Hong Sanghyun","year":"2018","unstructured":"Sanghyun Hong , Michael Davinroy , Yi\u011fitcan Kaya , Stuart Nevans Locke , Ian Rackow, Kevin Kulda, Dana Dachman-Soled, and Tudor Dumitra\u0219. 2018 . Security analysis of deep neural networks operating in the presence of cache side-channel attacks. 
arXiv preprint arXiv:1810.03487 (2018). Sanghyun Hong, Michael Davinroy, Yi\u011fitcan Kaya, Stuart Nevans Locke, Ian Rackow, Kevin Kulda, Dana Dachman-Soled, and Tudor Dumitra\u0219. 2018. Security analysis of deep neural networks operating in the presence of cache side-channel attacks. arXiv preprint arXiv:1810.03487 (2018)."},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/3373376.3378460"},{"key":"e_1_3_2_1_27_1","volume-title":"ACM\/ESDA\/IEEE Design Automation Conference. 1--6.","author":"Hua Weizhe","year":"2018","unstructured":"Weizhe Hua , Zhiru Zhang , and G Edward Suh . 2018 . Reverse engineering convolutional neural networks through side-channel information leaks . In ACM\/ESDA\/IEEE Design Automation Conference. 1--6. Weizhe Hua, Zhiru Zhang, and G Edward Suh. 2018. Reverse engineering convolutional neural networks through side-channel information leaks. In ACM\/ESDA\/IEEE Design Automation Conference. 1--6."},{"key":"e_1_3_2_1_28_1","volume-title":"Adversarial attacks on neural network policies. arXiv preprint arXiv:1702.02284","author":"Huang Sandy","year":"2017","unstructured":"Sandy Huang , Nicolas Papernot , Ian Goodfellow , Yan Duan , and Pieter Abbeel . 2017. Adversarial attacks on neural network policies. arXiv preprint arXiv:1702.02284 ( 2017 ). Sandy Huang, Nicolas Papernot, Ian Goodfellow, Yan Duan, and Pieter Abbeel. 2017. Adversarial attacks on neural network policies. arXiv preprint arXiv:1702.02284 (2017)."},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.cosrev.2020.100270"},{"key":"e_1_3_2_1_30_1","volume-title":"Eyad Elyan, and Chrisina Jayne.","author":"Hussein Ahmed","year":"2017","unstructured":"Ahmed Hussein , Mohamed Medhat Gaber , Eyad Elyan, and Chrisina Jayne. 2017 . Imitation learning: A survey of learning methods. Comput. Surveys ( 2017). Ahmed Hussein, Mohamed Medhat Gaber, Eyad Elyan, and Chrisina Jayne. 2017. Imitation learning: A survey of learning methods. Comput. 
Surveys (2017)."},{"key":"e_1_3_2_1_31_1","volume-title":"USENIX Security Symposium.","author":"Jagielski Matthew","year":"2020","unstructured":"Matthew Jagielski , Nicholas Carlini , David Berthelot , Alex Kurakin , and Nicolas Papernot . 2020 . High accuracy and high fidelity extraction of neural networks . In USENIX Security Symposium. Matthew Jagielski, Nicholas Carlini, David Berthelot, Alex Kurakin, and Nicolas Papernot. 2020. High accuracy and high fidelity extraction of neural networks. In USENIX Security Symposium."},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/EuroSP.2019.00044"},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/3274694.3274740"},{"key":"e_1_3_2_1_35_1","volume-title":"Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971","author":"Lillicrap Timothy P","year":"2015","unstructured":"Timothy P Lillicrap , Jonathan J Hunt , Alexander Pritzel , Nicolas Heess , Tom Erez , Yuval Tassa , David Silver , and Daan Wierstra . 2015. Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 ( 2015 ). Timothy P Lillicrap, Jonathan J Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. 2015. Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015)."},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/18.61115"},{"key":"e_1_3_2_1_37_1","volume-title":"When NAS Meets Watermarking: Ownership Verification of DNN Models via Cache Side Channels. arXiv preprint arXiv:2102.03523","author":"Lou Xiaoxuan","year":"2021","unstructured":"Xiaoxuan Lou , Shangwei Guo , Tianwei Zhang , Yinqian Zhang , and Yang Liu . 2021. When NAS Meets Watermarking: Ownership Verification of DNN Models via Cache Side Channels. arXiv preprint arXiv:2102.03523 ( 2021 ). Xiaoxuan Lou, Shangwei Guo, Tianwei Zhang, Yinqian Zhang, and Yang Liu. 2021. 
When NAS Meets Watermarking: Ownership Verification of DNN Models via Cache Side Channels. arXiv preprint arXiv:2102.03523 (2021)."},{"key":"e_1_3_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/3287560.3287562"},{"key":"e_1_3_2_1_39_1","volume-title":"International Conference on Machine Learning. 1928--1937","author":"Mnih Volodymyr","year":"2016","unstructured":"Volodymyr Mnih , Adria Puigdomenech Badia , Mehdi Mirza , Alex Graves , Timothy Lillicrap , Tim Harley , David Silver , and Koray Kavukcuoglu . 2016 . Asynchronous methods for deep reinforcement learning . In International Conference on Machine Learning. 1928--1937 . Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. 2016. Asynchronous methods for deep reinforcement learning. In International Conference on Machine Learning. 1928--1937."},{"key":"e_1_3_2_1_40_1","volume-title":"Nature","volume":"518","author":"Mnih Volodymyr","year":"2015","unstructured":"Volodymyr Mnih , Koray Kavukcuoglu , David Silver , Andrei A Rusu , Joel Veness , Marc G Bellemare , Alex Graves , Martin Riedmiller , Andreas K Fidjeland , Georg Ostrovski , 2015 . Human-level control through deep reinforcement learning . Nature , Vol. 518 , 7540 (2015), 529--533. Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, et al. 2015. Human-level control through deep reinforcement learning. Nature, Vol. 518, 7540 (2015), 529--533."},{"key":"e_1_3_2_1_41_1","doi-asserted-by":"crossref","unstructured":"Andrew Y Ng Adam Coates Mark Diel Varun Ganapathi Jamie Schulte Ben Tse Eric Berger and Eric Liang. 2006. Autonomous inverted helicopter flight via reinforcement learning. In Experimental Robotics IX. 363--372.  Andrew Y Ng Adam Coates Mark Diel Varun Ganapathi Jamie Schulte Ben Tse Eric Berger and Eric Liang. 2006. 
Autonomous inverted helicopter flight via reinforcement learning. In Experimental Robotics IX. 363--372.","DOI":"10.1007\/11552246_35"},{"key":"e_1_3_2_1_42_1","doi-asserted-by":"crossref","unstructured":"Seong Joon Oh Bernt Schiele and Mario Fritz. 2019. Towards reverse-engineering black-box neural networks. In Explainable AI: Interpreting Explaining and Visualizing Deep Learning. 121--144.  Seong Joon Oh Bernt Schiele and Mario Fritz. 2019. Towards reverse-engineering black-box neural networks. In Explainable AI: Interpreting Explaining and Visualizing Deep Learning. 121--144.","DOI":"10.1007\/978-3-030-28954-6_7"},{"key":"e_1_3_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00509"},{"key":"e_1_3_2_1_44_1","volume-title":"A framework for the extraction of deep neural networks by leveraging public data. arXiv preprint arXiv:1905.09165","author":"Pal Soham","year":"2019","unstructured":"Soham Pal , Yash Gupta , Aditya Shukla , Aditya Kanade , Shirish Shevade , and Vinod Ganapathy . 2019. A framework for the extraction of deep neural networks by leveraging public data. arXiv preprint arXiv:1905.09165 ( 2019 ). Soham Pal, Yash Gupta, Aditya Shukla, Aditya Kanade, Shirish Shevade, and Vinod Ganapathy. 2019. A framework for the extraction of deep neural networks by leveraging public data. arXiv preprint arXiv:1905.09165 (2019)."},{"key":"e_1_3_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/EuroSP.2016.36"},{"key":"e_1_3_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/2939672.2939778"},{"key":"e_1_3_2_1_47_1","volume-title":"Annual Conference on Computational Learning Theory. 101--103","author":"Russell Stuart","year":"1998","unstructured":"Stuart Russell . 1998 . Learning agents for uncertain environments . In Annual Conference on Computational Learning Theory. 101--103 . Stuart Russell. 1998. Learning agents for uncertain environments. In Annual Conference on Computational Learning Theory. 
101--103."},{"key":"e_1_3_2_1_48_1","volume-title":"Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347","author":"Schulman John","year":"2017","unstructured":"John Schulman , Filip Wolski , Prafulla Dhariwal , Alec Radford , and Oleg Klimov . 2017. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347 ( 2017 ). John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. 2017. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347 (2017)."},{"key":"e_1_3_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2020\/466"},{"key":"e_1_3_2_1_50_1","volume-title":"Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al.","author":"Silver David","year":"2016","unstructured":"David Silver , Aja Huang , Chris J Maddison , Arthur Guez , Laurent Sifre , George Van Den Driessche , Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al. 2016 . Mastering the game of Go with deep neural networks and tree search. Nature , Vol. 529 , 7587 (2016), 484--489. David Silver, Aja Huang, Chris J Maddison, Arthur Guez, Laurent Sifre, George Van Den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al. 2016. Mastering the game of Go with deep neural networks and tree search. Nature, Vol. 529, 7587 (2016), 484--489."},{"key":"e_1_3_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i04.6047"},{"key":"e_1_3_2_1_52_1","volume-title":"Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199","author":"Szegedy Christian","year":"2013","unstructured":"Christian Szegedy , Wojciech Zaremba , Ilya Sutskever , Joan Bruna , Dumitru Erhan , Ian Goodfellow , and Rob Fergus . 2013. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199 ( 2013 ). Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. 2013. 
Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199 (2013)."},{"key":"e_1_3_2_1_53_1","volume-title":"USENIX Security Symposium. 601--618","author":"Tram\u00e8r Florian","year":"2016","unstructured":"Florian Tram\u00e8r , Fan Zhang , Ari Juels , Michael K Reiter , and Thomas Ristenpart . 2016 . Stealing machine learning models via prediction apis . In USENIX Security Symposium. 601--618 . Florian Tram\u00e8r, Fan Zhang, Ari Juels, Michael K Reiter, and Thomas Ristenpart. 2016. Stealing machine learning models via prediction apis. In USENIX Security Symposium. 601--618."},{"key":"e_1_3_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1145\/3078971.3078974"},{"key":"e_1_3_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1109\/SP.2018.00038"},{"key":"e_1_3_2_1_56_1","volume-title":"Sample efficient actor-critic with experience replay. arXiv preprint arXiv:1611.01224","author":"Wang Ziyu","year":"2016","unstructured":"Ziyu Wang , Victor Bapst , Nicolas Heess , Volodymyr Mnih , Remi Munos , Koray Kavukcuoglu , and Nando de Freitas . 2016. Sample efficient actor-critic with experience replay. arXiv preprint arXiv:1611.01224 ( 2016 ). Ziyu Wang, Victor Bapst, Nicolas Heess, Volodymyr Mnih, Remi Munos, Koray Kavukcuoglu, and Nando de Freitas. 2016. Sample efficient actor-critic with experience replay. arXiv preprint arXiv:1611.01224 (2016)."},{"key":"e_1_3_2_1_57_1","volume-title":"Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine learning","author":"Williams Ronald J","year":"1992","unstructured":"Ronald J Williams . 1992. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine learning , Vol. 8 , 3--4 ( 1992 ), 229--256. Ronald J Williams. 1992. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine learning, Vol. 
8, 3--4 (1992), 229--256."},{"key":"e_1_3_2_1_58_1","unstructured":"Yuhuai Wu Elman Mansimov Roger B Grosse Shun Liao and Jimmy Ba. 2017. Scalable trust-region method for deep reinforcement learning using kronecker-factored approximation. In Advances in Neural Information Processing Systems. 5279--5288.  Yuhuai Wu Elman Mansimov Roger B Grosse Shun Liao and Jimmy Ba. 2017. Scalable trust-region method for deep reinforcement learning using kronecker-factored approximation. In Advances in Neural Information Processing Systems. 5279--5288."},{"key":"e_1_3_2_1_59_1","volume-title":"Characterizing attacks on deep reinforcement learning. arXiv preprint arXiv:1907.09470","author":"Xiao Chaowei","year":"2019","unstructured":"Chaowei Xiao , Xinlei Pan , Warren He , Jian Peng , Mingjie Sun , Jinfeng Yi , Mingyan Liu , Bo Li , and Dawn Song . 2019. Characterizing attacks on deep reinforcement learning. arXiv preprint arXiv:1907.09470 ( 2019 ). Chaowei Xiao, Xinlei Pan, Warren He, Jian Peng, Mingjie Sun, Jinfeng Yi, Mingyan Liu, Bo Li, and Dawn Song. 2019. Characterizing attacks on deep reinforcement learning. 
arXiv preprint arXiv:1907.09470 (2019)."},{"key":"e_1_3_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00284"},{"key":"e_1_3_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1145\/3293882.3330579"},{"key":"e_1_3_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.14722\/ndss.2020.24178"},{"key":"e_1_3_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1145\/3196494.3196550"},{"key":"e_1_3_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.1145\/3377811.3380368"},{"key":"e_1_3_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.1109\/ASE.2019.00077"},{"key":"e_1_3_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2017.7989381"}],"event":{"name":"ASIA CCS '21: ACM Asia Conference on Computer and Communications Security","location":"Virtual Event Hong Kong","acronym":"ASIA CCS '21","sponsor":["SIGSAC ACM Special Interest Group on Security, Audit, and Control"]},"container-title":["Proceedings of the 2021 ACM Asia Conference on Computer and Communications Security"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3433210.3453090","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3433210.3453090","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:48:12Z","timestamp":1750193292000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3433210.3453090"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,5,24]]},"references-count":65,"alternative-id":["10.1145\/3433210.3453090","10.1145\/3433210"],"URL":"https:\/\/doi.org\/10.1145\/3433210.3453090","relation":{},"subject":[],"published":{"date-parts":[[2021,5,24]]},"assertion":[{"value":"2021-06-04","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}