{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,6]],"date-time":"2026-05-06T21:56:33Z","timestamp":1778104593724,"version":"3.51.4"},"publisher-location":"New York, NY, USA","reference-count":43,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,11,15]],"date-time":"2021-11-15T00:00:00Z","timestamp":1636934400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Business Finland \/ S2ERC","award":["663\/31\/2020"],"award-info":[{"award-number":["663\/31\/2020"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,11,15]]},"DOI":"10.1145\/3474369.3486877","type":"proceedings-article","created":{"date-parts":[[2021,10,28]],"date-time":"2021-10-28T11:13:28Z","timestamp":1635419608000},"page":"157-168","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":18,"title":["Automating Privilege Escalation with Deep Reinforcement Learning"],"prefix":"10.1145","author":[{"given":"Kalle","family":"Kujanp\u00e4\u00e4","sequence":"first","affiliation":[{"name":"Aalto University, Espoo, Finland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Willie","family":"Victor","sequence":"additional","affiliation":[{"name":"F-Secure, Johannesburg, South Africa"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Alexander","family":"Ilin","sequence":"additional","affiliation":[{"name":"Aalto University, Espoo, Finland"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2021,11,15]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jnca.2019.102479"},{"key":"e_1_3_2_1_2_1","volume-title":"Learning to Evade Static PE Machine Learning Malware Models via Reinforcement Learning. 
arXiv preprint arXiv:1801.08917","author":"Anderson Hyrum S","year":"2018","unstructured":"Hyrum S Anderson , Anant Kharkar , Bobby Filar , David Evans , and Phil Roth . 2018. Learning to Evade Static PE Machine Learning Malware Models via Reinforcement Learning. arXiv preprint arXiv:1801.08917 ( 2018 ). Hyrum S Anderson, Anant Kharkar, Bobby Filar, David Evans, and Phil Roth. 2018. Learning to Evade Static PE Machine Learning Malware Models via Reinforcement Learning. arXiv preprint arXiv:1801.08917 (2018)."},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/2991079.2991111"},{"key":"e_1_3_2_1_4_1","volume-title":"On the Effectiveness of Machine and Deep Learning for Cyber Security. In 2018 10th International Conference on Cyber Conflict (CyCon). IEEE, 371--390","author":"Apruzzese Giovanni","year":"2018","unstructured":"Giovanni Apruzzese , Michele Colajanni , Luca Ferretti , Alessandro Guido , and Mirco Marchetti . 2018 . On the Effectiveness of Machine and Deep Learning for Cyber Security. In 2018 10th International Conference on Cyber Conflict (CyCon). IEEE, 371--390 . Giovanni Apruzzese, Michele Colajanni, Luca Ferretti, Alessandro Guido, and Mirco Marchetti. 2018. On the Effectiveness of Machine and Deep Learning for Cyber Security. In 2018 10th International Conference on Cyber Conflict (CyCon). IEEE, 371--390."},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.cose.2020.101738"},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.cose.2021.102204"},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10489-018-01408-x"},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/COMPSAC.2019.10211"},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1186\/s42400-019-0027-x"},{"key":"e_1_3_2_1_10_1","volume-title":"Autonomous Security Analysis and Penetration Testing. In 2020 16th International Conference on Mobility, Sensing and Networking (MSN). 
IEEE, 508--515","author":"Chowdhary Ankur","year":"2020","unstructured":"Ankur Chowdhary , Dijiang Huang , Jayasurya Sevalur Mahendran , Daniel Romo , Yuli Deng , and Abdulhakim Sabur . 2020 . Autonomous Security Analysis and Penetration Testing. In 2020 16th International Conference on Mobility, Sensing and Networking (MSN). IEEE, 508--515 . Ankur Chowdhary, Dijiang Huang, Jayasurya Sevalur Mahendran, Daniel Romo, Yuli Deng, and Abdulhakim Sabur. 2020. Autonomous Security Analysis and Penetration Testing. In 2020 16th International Conference on Mobility, Sensing and Networking (MSN). IEEE, 508--515."},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/TII.2018.2822680"},{"key":"e_1_3_2_1_12_1","volume-title":"Albert S Thie, Madalina M Drugan, and Marco Wiering.","author":"Elderman Richard","year":"2017","unstructured":"Richard Elderman , Leon JJ Pater , Albert S Thie, Madalina M Drugan, and Marco Wiering. 2017 . Adversarial Reinforcement Learning in a Cyber Security Simulation.. In ICAART ( 2). 559--566. Richard Elderman, Leon JJ Pater, Albert S Thie, Madalina M Drugan, and Marco Wiering. 2017. Adversarial Reinforcement Learning in a Cyber Security Simulation.. In ICAART (2). 559--566."},{"key":"e_1_3_2_1_13_1","volume-title":"International Journal of Information Security","author":"Fabio Massimo Zennaro L\u00e1szl\u00f3","year":"2021","unstructured":"L\u00e1szl\u00f3 Erd\u0151di and Fabio Massimo Zennaro . 2021. The Agent Web Model: modeling web hacking for reinforcement learning . International Journal of Information Security ( 2021 ), 1--17. L\u00e1szl\u00f3 Erd\u0151di and Fabio Massimo Zennaro. 2021. The Agent Web Model: modeling web hacking for reinforcement learning. 
International Journal of Information Security (2021), 1--17."},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2019.2957429"},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2019.2908033"},{"key":"e_1_3_2_1_16_1","volume-title":"Robust Deep Reinforcement Learning for Security and Safety in Autonomous Vehicle Systems. In 2018 21st International Conference on Intelligent Transportation Systems (ITSC). IEEE, 307--312","author":"Ferdowsi Aidin","year":"2018","unstructured":"Aidin Ferdowsi , Ursula Challita , Walid Saad , and Narayan B Mandayam . 2018 . Robust Deep Reinforcement Learning for Security and Safety in Autonomous Vehicle Systems. In 2018 21st International Conference on Intelligent Transportation Systems (ITSC). IEEE, 307--312 . Aidin Ferdowsi, Ursula Challita, Walid Saad, and Narayan B Mandayam. 2018. Robust Deep Reinforcement Learning for Security and Safety in Autonomous Vehicle Systems. In 2018 21st International Conference on Intelligent Transportation Systems (ITSC). IEEE, 307--312."},{"key":"e_1_3_2_1_17_1","volume-title":"Paramiko: A Python implementation of SSHv2","author":"Forcier Jeff","year":"2021","unstructured":"Jeff Forcier . 2021 . Paramiko: A Python implementation of SSHv2 . http:\/\/www.paramiko.org\/index.html Jeff Forcier. 2021. Paramiko: A Python implementation of SSHv2. http:\/\/www.paramiko.org\/index.html"},{"key":"e_1_3_2_1_18_1","volume-title":"Reinforcement Learning for Intelligent Penetration Testing. In 2018 Second World Conference on Smart Trends in Systems, Security and Sustainability (WorldS4). IEEE, 185--192","author":"Ghanem Mohamed C","year":"2018","unstructured":"Mohamed C Ghanem and Thomas M Chen . 2018 . Reinforcement Learning for Intelligent Penetration Testing. In 2018 Second World Conference on Smart Trends in Systems, Security and Sustainability (WorldS4). IEEE, 185--192 . Mohamed C Ghanem and Thomas M Chen. 2018. 
Reinforcement Learning for Intelligent Penetration Testing. In 2018 Second World Conference on Smart Trends in Systems, Security and Sustainability (WorldS4). IEEE, 185--192."},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.3390\/info11010006"},{"key":"e_1_3_2_1_20_1","volume-title":"Wired","volume":"22","author":"Greenberg Andy","year":"2018","unstructured":"Andy Greenberg . 2018 . The Untold Story of NotPetya, the Most Devastating Cyberattack in History . Wired , August , Vol. 22 (2018). Andy Greenberg. 2018. The Untold Story of NotPetya, the Most Devastating Cyberattack in History. Wired, August, Vol. 22 (2018)."},{"key":"e_1_3_2_1_21_1","volume-title":"Two-Dimensional Anti-Jamming Communication Based on Deep Reinforcement Learning. In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE","author":"Han Guoan","year":"2017","unstructured":"Guoan Han , Liang Xiao , and H Vincent Poor . 2017 . Two-Dimensional Anti-Jamming Communication Based on Deep Reinforcement Learning. In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE , 2087--2091. Guoan Han, Liang Xiao, and H Vincent Poor. 2017. Two-Dimensional Anti-Jamming Communication Based on Deep Reinforcement Learning. In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2087--2091."},{"key":"e_1_3_2_1_22_1","volume-title":"Reinforcement Learning for Autonomous Defence in Software-Defined Networking. In International Conference on Decision and Game Theory for Security. Springer, 145--165","author":"Han Yi","year":"2018","unstructured":"Yi Han , Benjamin IP Rubinstein , Tamas Abraham , Tansu Alpcan , Olivier De Vel , Sarah Erfani , David Hubczenko , Christopher Leckie , and Paul Montague . 2018 . Reinforcement Learning for Autonomous Defence in Software-Defined Networking. In International Conference on Decision and Game Theory for Security. Springer, 145--165 . 
Yi Han, Benjamin IP Rubinstein, Tamas Abraham, Tansu Alpcan, Olivier De Vel, Sarah Erfani, David Hubczenko, Christopher Leckie, and Paul Montague. 2018. Reinforcement Learning for Autonomous Defence in Software-Defined Networking. In International Conference on Decision and Game Theory for Security. Springer, 145--165."},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSP.2016.2548987"},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/3196494.3196511"},{"key":"e_1_3_2_1_25_1","volume-title":"Breakthroughs in Statistics","author":"Huber Peter J","unstructured":"Peter J Huber . 1992. Robust Estimation of a Location Parameter . In Breakthroughs in Statistics . Springer , 492--518. Peter J Huber. 1992. Robust Estimation of a Location Parameter. In Breakthroughs in Statistics. Springer, 492--518."},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIFS.2018.2866319"},{"key":"e_1_3_2_1_27_1","volume-title":"Adam: A Method for Stochastic Optimization. arXiv preprint arXiv:1412.6980","author":"Kingma Diederik P","year":"2014","unstructured":"Diederik P Kingma and Jimmy Ba . 2014 . Adam: A Method for Stochastic Optimization. arXiv preprint arXiv:1412.6980 (2014). Diederik P Kingma and Jimmy Ba. 2014. Adam: A Method for Stochastic Optimization. arXiv preprint arXiv:1412.6980 (2014)."},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.cose.2020.102108"},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.compeleceng.2019.02.022"},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.compeleceng.2017.02.013"},{"key":"e_1_3_2_1_31_1","volume-title":"Asynchronous Methods for Deep Reinforcement Learning. In International conference on machine learning. PMLR","author":"Mnih Volodymyr","year":"2016","unstructured":"Volodymyr Mnih , Adria Puigdomenech Badia , Mehdi Mirza , Alex Graves , Timothy Lillicrap , Tim Harley , David Silver , and Koray Kavukcuoglu . 
2016 . Asynchronous Methods for Deep Reinforcement Learning. In International conference on machine learning. PMLR , 1928--1937. Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. 2016. Asynchronous Methods for Deep Reinforcement Learning. In International conference on machine learning. PMLR, 1928--1937."},{"key":"e_1_3_2_1_32_1","volume-title":"Deep Reinforcement Learning for Cyber Security. arXiv preprint arXiv:1906.05799","author":"Nguyen Thanh Thi","year":"2019","unstructured":"Thanh Thi Nguyen and Vijay Janapa Reddi . 2019. Deep Reinforcement Learning for Cyber Security. arXiv preprint arXiv:1906.05799 ( 2019 ). Thanh Thi Nguyen and Vijay Janapa Reddi. 2019. Deep Reinforcement Learning for Cyber Security. arXiv preprint arXiv:1906.05799 (2019)."},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2018.2885530"},{"key":"e_1_3_2_1_34_1","volume-title":"PyTorch: An Imperative Style","author":"Paszke Adam","unstructured":"Adam Paszke , Sam Gross , Francisco Massa , Adam Lerer , James Bradbury , Gregory Chanan , Trevor Killeen , Zeming Lin , Natalia Gimelshein , Luca Antiga , Alban Desmaison , Andreas Kopf , Edward Yang , Zachary DeVito , Martin Raison , Alykhan Tejani , Sasank Chilamkurthy , Benoit Steiner , Lu Fang , Junjie Bai , and Soumith Chintala . 2019. PyTorch: An Imperative Style , High-Performance Deep Learning Library . In Advances in Neural Information Processing Systems 32, H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alch\u00e9-Buc, E. Fox, and R. Garnett (Eds.). Curran Associates, Inc., 8024--8035. 
http:\/\/papers.neurips.cc\/paper\/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. 2019. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Advances in Neural Information Processing Systems 32, H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alch\u00e9-Buc, E. Fox, and R. Garnett (Eds.). Curran Associates, Inc., 8024--8035. http:\/\/papers.neurips.cc\/paper\/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf"},{"key":"e_1_3_2_1_35_1","volume-title":"https:\/\/docs.rapid7.com\/metasploit\/msf-overview\/","author":"Framework Metasploit","year":"2021","unstructured":"Rapid7. 2021. Metasploit Framework . ( 2021 ). https:\/\/docs.rapid7.com\/metasploit\/msf-overview\/ Rapid7. 2021. Metasploit Framework. (2021). https:\/\/docs.rapid7.com\/metasploit\/msf-overview\/"},{"key":"e_1_3_2_1_36_1","volume-title":"Reinforcement Learning: An Introduction","author":"Sutton Richard S","year":"2018","unstructured":"Richard S Sutton and Andrew G Barto . 2018 . Reinforcement Learning: An Introduction . MIT press . Richard S Sutton and Andrew G Barto. 2018. Reinforcement Learning: An Introduction. MIT press."},{"key":"e_1_3_2_1_37_1","unstructured":"Isao Takaesu. 2018. Deep Exploit. https:\/\/github.com\/13o-bbr-bbq\/machine_learning_security\/tree\/master\/DeepExploit  Isao Takaesu. 2018. Deep Exploit. https:\/\/github.com\/13o-bbr-bbq\/machine_learning_security\/tree\/master\/DeepExploit"},{"key":"e_1_3_2_1_38_1","unstructured":"Microsoft 365 Defender Research Team. 2021. Gamifying machine learning for stronger security and AI models. 
https:\/\/www.microsoft.com\/security\/blog\/2021\/04\/08\/gamifying-machine-learning-for-stronger-security-and-ai-models\/  Microsoft 365 Defender Research Team. 2021. Gamifying machine learning for stronger security and AI models. https:\/\/www.microsoft.com\/security\/blog\/2021\/04\/08\/gamifying-machine-learning-for-stronger-security-and-ai-models\/"},{"key":"e_1_3_2_1_39_1","volume-title":"Reinforcement Learning Based Mobile Offloading for Cloud-Based Malware Detection. In GLOBECOM 2017--2017 IEEE Global Communications Conference. IEEE, 1--6.","author":"Wan Xiaoyue","year":"2017","unstructured":"Xiaoyue Wan , Geyi Sheng , Yanda Li , Liang Xiao , and Xiaojiang Du . 2017 . Reinforcement Learning Based Mobile Offloading for Cloud-Based Malware Detection. In GLOBECOM 2017--2017 IEEE Global Communications Conference. IEEE, 1--6. Xiaoyue Wan, Geyi Sheng, Yanda Li, Liang Xiao, and Xiaojiang Du. 2017. Reinforcement Learning Based Mobile Offloading for Cloud-Based Malware Detection. In GLOBECOM 2017--2017 IEEE Global Communications Conference. IEEE, 1--6."},{"key":"e_1_3_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33011401"},{"key":"e_1_3_2_1_41_1","volume-title":"Spoofing Detection with Reinforcement Learning in Wireless Networks. In 2015 IEEE Global Communications Conference (GLOBECOM). IEEE, 1--5.","author":"Xiao Liang","year":"2015","unstructured":"Liang Xiao , Yan Li , Guolong Liu , Qiangda Li , and Weihua Zhuang . 2015 . Spoofing Detection with Reinforcement Learning in Wireless Networks. In 2015 IEEE Global Communications Conference (GLOBECOM). IEEE, 1--5. Liang Xiao, Yan Li, Guolong Liu, Qiangda Li, and Weihua Zhuang. 2015. Spoofing Detection with Reinforcement Learning in Wireless Networks. In 2015 IEEE Global Communications Conference (GLOBECOM). 
IEEE, 1--5."},{"key":"e_1_3_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/MWC.2018.1700291"},{"key":"e_1_3_2_1_43_1","volume-title":"Modeling Penetration Testing with Reinforcement Learning Using Capture-the-Flag Challenges: Trade-offs between Model-free Learning and A Priori Knowledge. arXiv preprint arXiv:2005.12632","author":"Zennaro Fabio Massimo","year":"2021","unstructured":"Fabio Massimo Zennaro and Laszlo Erdodi . 2021. Modeling Penetration Testing with Reinforcement Learning Using Capture-the-Flag Challenges: Trade-offs between Model-free Learning and A Priori Knowledge. arXiv preprint arXiv:2005.12632 ( 2021 ). Fabio Massimo Zennaro and Laszlo Erdodi. 2021. Modeling Penetration Testing with Reinforcement Learning Using Capture-the-Flag Challenges: Trade-offs between Model-free Learning and A Priori Knowledge. arXiv preprint arXiv:2005.12632 (2021)."}],"event":{"name":"CCS '21: 2021 ACM SIGSAC Conference on Computer and Communications Security","location":"Virtual Event Republic of Korea","acronym":"CCS '21","sponsor":["SIGSAC ACM Special Interest Group on Security, Audit, and Control"]},"container-title":["Proceedings of the 14th ACM Workshop on Artificial Intelligence and 
Security"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3474369.3486877","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3474369.3486877","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:30:26Z","timestamp":1750188626000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3474369.3486877"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,11,15]]},"references-count":43,"alternative-id":["10.1145\/3474369.3486877","10.1145\/3474369"],"URL":"https:\/\/doi.org\/10.1145\/3474369.3486877","relation":{},"subject":[],"published":{"date-parts":[[2021,11,15]]},"assertion":[{"value":"2021-11-15","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}