{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,9]],"date-time":"2026-06-09T15:47:20Z","timestamp":1781020040614,"version":"3.54.1"},"reference-count":58,"publisher":"Association for Computing Machinery (ACM)","issue":"1","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Cyber-Phys. Syst."],"published-print":{"date-parts":[[2026,1,31]]},"abstract":"<jats:p>Cyber-Physical Systems (CPSs) are the backbone of many critical infrastructures. However, they have introduced an uncharted territory of security vulnerabilities and attack vectors, mainly due to the deeply integrated physical and cyber spaces. Moreover, in industrial CPS settings, network openness exposes the system to the outside world and renders it vulnerable to cyber threats. The security of industrial CPS significantly relies on the cyber incident detection and response systems which are fundamental to ensure the continuous and proper operation of cyber-physical processes. Among the key configuration parameters of these defense systems is the detection threshold. However, finding the optimal threshold that strikes the right balance between missed detection and false-positive rates remains a challenging problem. In this article, we propose a novel approach that leverages a Hierarchical Reinforcement Learning (HRL) architecture to autonomously detect the dynamic instability in an industrial CPS network and respond by adapting the cyber incident detection and response threshold range to minimize the effects of possible incidents. We developed and tested four HRL algorithmic variants, each offering potential avenues for optimization with its own strengths and limitations. Our agents dynamically select these ranges by assessing the expected risk and potential damage over time. 
In addition, the agent\u2019s selection process aims to minimize false positives and reduce the cost associated with changing the selected range. All four algorithmic adaptations show the effectiveness of HRL for designing adaptive cyber-physical defense compared to static approaches. Our experimental results indicate that our proposed technique is effective for building autonomous cyber incident detection systems in industrial CPS.<\/jats:p>","DOI":"10.1145\/3765622","type":"journal-article","created":{"date-parts":[[2025,9,2]],"date-time":"2025-09-02T15:22:24Z","timestamp":1756826544000},"page":"1-27","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["Autonomous and Adaptive Cyber Incident Detection and Response in Industrial Cyber-Physical Systems Using Hierarchical Reinforcement Learning"],"prefix":"10.1145","volume":"10","author":[{"ORCID":"https:\/\/orcid.org\/0009-0006-5309-4436","authenticated-orcid":false,"given":"Ayesha","family":"Babar","sequence":"first","affiliation":[{"name":"School of Computing, Queen\u2019s University, Kingston, Ontario, Canada"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1922-5803","authenticated-orcid":false,"given":"Talal","family":"Halabi","sequence":"additional","affiliation":[{"name":"Universite Laval, Quebec, Quebec, Canada"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1697-4101","authenticated-orcid":false,"given":"Mohammad","family":"Zulkernine","sequence":"additional","affiliation":[{"name":"School of Computing, Queen\u2019s University, Kingston, Ontario, 
Canada"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2026,1,21]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.5555\/3463952.3463970"},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.3390\/electronics11233934"},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2020.3022862"},{"key":"e_1_3_2_5_2","volume-title":"ICML Workshop on New Frontiers in Learning, Control, and Dynamical Systems","author":"Anca Mihai","year":"2023","unstructured":"Mihai Anca, Mark F. Hansen, and Matthew Studley. 2023. Modular hierarchical reinforcement learning for robotics: Improving scalability and generalizability. In ICML Workshop on New Frontiers in Learning, Control, and Dynamical Systems."},{"key":"e_1_3_2_6_2","unstructured":"Richard S. Sutton and Andrew G. Barto. 1998. Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA."},{"key":"e_1_3_2_7_2","volume-title":"the AAAI Conference on Artificial Intelligence","volume":"31","author":"Bacon Pierre-Luc","year":"2017","unstructured":"Pierre-Luc Bacon, Jean Harb, and Doina Precup. 2017. The option-critic architecture. In the AAAI Conference on Artificial Intelligence, Vol. 31."},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.23919\/ACC.2017.7962986"},{"key":"e_1_3_2_9_2","doi-asserted-by":"crossref","first-page":"17","DOI":"10.1007\/978-3-642-32375-1_2","article-title":"Intrinsic motivation and reinforcement learning","author":"Barto Andrew G.","year":"2013","unstructured":"Andrew G. Barto. 2013. Intrinsic motivation and reinforcement learning. In Intrinsically Motivated Learning in Natural and Artificial Systems. G. Baldassarre and M. 
Mirolli (Eds.), Springer, 17\u201347.","journal-title":"Intrinsically Motivated Learning in Natural and Artificial Systems"},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1214\/aoms\/1177697196"},{"key":"e_1_3_2_11_2","first-page":"449","volume-title":"International Conference on Machine Learning","author":"Bellemare Marc G.","year":"2017","unstructured":"Marc G. Bellemare, Will Dabney, and R\u00e9mi Munos. 2017. A distributional perspective on reinforcement learning. In International Conference on Machine Learning. PMLR, 449\u2013458."},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.5555\/560669"},{"key":"e_1_3_2_13_2","first-page":"833","volume-title":"2012 ACM Conference on Computer and Communications Security","author":"Bilge Leyla","year":"2012","unstructured":"Leyla Bilge and Tudor Dumitra\u015f. 2012. Before we knew it: An empirical study of zero-day attacks in the real world. In 2012 ACM Conference on Computer and Communications Security, 833\u2013844."},{"key":"e_1_3_2_14_2","volume-title":"Stochastic Approximation: A Dynamical Systems Viewpoint","author":"Borkar Vivek S.","year":"2009","unstructured":"Vivek S. Borkar. 2009. Stochastic Approximation: A Dynamical Systems Viewpoint, Vol. 48. Springer."},{"key":"e_1_3_2_15_2","unstructured":"Chuck Brooks. 2021. Cybersecurity Threats: The Daunting Challenge of Securing the Internet of Things. Retrieved from https:\/\/www.forbes.com\/sites\/chuckbrooks\/2021\/02\/07\/cybersecurity-threats-the-daunting-challenge-of-securing-the-internet-of-things"},{"key":"e_1_3_2_16_2","volume-title":"European Conference on Artificial Intelligence","author":"Canonaco Giuseppe","year":"2020","unstructured":"Giuseppe Canonaco, Marcello Restelli, and Manuel Roveri. 2020. Model-free non-stationarity detection and adaptation in reinforcement learning. In European Conference on Artificial Intelligence. 
Retrieved from https:\/\/api.semanticscholar.org\/CorpusID:221714335"},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1145\/3600160.3600176"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1186\/s42400-019-0027-x"},{"key":"e_1_3_2_19_2","unstructured":"Joseph Clark. 2023. DOD Releases AI Adoption Strategy. Retrieved February 21, 2024 from https:\/\/www.defense.gov\/News\/News-Stories\/Article\/Article\/3578219\/dod-releases-ai-adoption-strategy\/"},{"key":"e_1_3_2_20_2","unstructured":"Austin Cooper and Sean P. Meyn. 2024. Reinforcement learning design for quickest change detection. arXiv:2403.14109. Retrieved from https:\/\/api.semanticscholar.org\/CorpusID:268553633"},{"key":"e_1_3_2_21_2","article-title":"Feudal reinforcement learning","volume":"5","author":"Dayan Peter","year":"1992","unstructured":"Peter Dayan and Geoffrey E. Hinton. 1992. Feudal reinforcement learning. In Advances in Neural Information Processing Systems, Vol. 5.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.1613\/jair.639"},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10994-021-05961-4"},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2022.3165809"},{"key":"e_1_3_2_25_2","unstructured":"FireEye Technical Report. 2020. Zero-Day Exploitation Increasingly Demonstrates Access to Money Rather Than Skill\u2014Intelligence for Vulnerability Management. Retrieved February 19, 2024 from https:\/\/www.fireeye.com\/blog\/threat-research\/2020\/04\/zero-day-exploitation-demonstrates-access-to-money-not-skill.html"},{"key":"e_1_3_2_26_2","unstructured":"M. Fortunato, M. G. Azar, B. Piot, J. Menick, I. Osband, A. Graves, V. Mnih, R. Munos, D. Hassabis, O. Pietquin, et al. 2017. Noisy networks for exploration. arXiv:1706.10295. 
Retrieved from https:\/\/arxiv.org\/abs\/1706.10295"},{"key":"e_1_3_2_27_2","first-page":"415","volume-title":"7th International Conference on Decision and Game Theory for Security (GameSec \u201916)","author":"Ghafouri Amin","year":"2016","unstructured":"Amin Ghafouri, Waseem Abbas, Aron Laszka, Yevgeniy Vorobeychik, and Xenofon Koutsoukos. 2016. Optimal thresholds for anomaly-based intrusion detection in dynamical environments. In 7th International Conference on Decision and Game Theory for Security (GameSec \u201916). Springer, 415\u2013434."},{"key":"e_1_3_2_28_2","unstructured":"Freepik, Gravisio, and Witdhawaty. 2024. Icons Created by Flaticon. Retrieved February 22, 2024 from https:\/\/www.flaticon.com\/"},{"key":"e_1_3_2_29_2","unstructured":"Abhishek Gupta, Vikash Kumar, Corey Lynch, Sergey Levine, and Karol Hausman. 2019. Relay policy learning: Solving long-horizon tasks via imitation and reinforcement learning. arXiv:1910.11956. Retrieved from https:\/\/arxiv.org\/abs\/1910.11956"},{"key":"e_1_3_2_30_2","first-page":"495","article-title":"Hierarchical reinforcement learning","author":"Hengst Bernhard","year":"2011","unstructured":"Bernhard Hengst. 2011. Hierarchical reinforcement learning. In Encyclopedia of Machine Learning. C. Sammut and G. I. Webb (Eds.), Springer, 495\u2013502.","journal-title":"Encyclopedia of Machine Learning"},{"key":"e_1_3_2_31_2","doi-asserted-by":"crossref","unstructured":"M. Hessel, J. Modayil, H. van Hasselt, T. Schaul, G. Ostrovski, W. Dabney, and D. Silver. 2017. Rainbow: Combining improvements in deep reinforcement learning. arXiv:1710.02298. 
Retrieved from https:\/\/arxiv.org\/abs\/1710.02298","DOI":"10.1609\/aaai.v32i1.11796"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10489-023-05022-4"},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.1145\/3305218.3305239"},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.arcontrol.2022.01.001"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.3390\/make4010009"},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.1145\/3301273"},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.32604\/cmc.2024.052447"},{"key":"e_1_3_2_38_2","article-title":"Hierarchical deep reinforcement learning: Integrating temporal abstraction and intrinsic motivation","volume":"29","author":"Kulkarni Tejas D.","year":"2016","unstructured":"Tejas D. Kulkarni, Karthik Narasimhan, Ardavan Saeedi, and Josh Tenenbaum. 2016. Hierarchical deep reinforcement learning: Integrating temporal abstraction and intrinsic motivation. In Advances in Neural Information Processing Systems, Vol. 29.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.1145\/2898375.2898399"},{"key":"e_1_3_2_40_2","unstructured":"Andrew Levy, George Konidaris, Robert Platt, and Kate Saenko. 2017. Learning multi-level hierarchies with hindsight. arXiv:1712.00948. Retrieved from https:\/\/arxiv.org\/abs\/1712.00948"},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.23919\/ACC.2019.8815000"},{"key":"e_1_3_2_42_2","unstructured":"Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. 2013. Playing Atari with deep reinforcement learning. arXiv:1312.5602. 
Retrieved from https:\/\/arxiv.org\/abs\/1312.5602"},{"key":"e_1_3_2_43_2","article-title":"Data-efficient hierarchical reinforcement learning","volume":"31","author":"Nachum Ofir","year":"2018","unstructured":"Ofir Nachum, Shixiang Shane Gu, Honglak Lee, and Sergey Levine. 2018. Data-efficient hierarchical reinforcement learning. In Advances in Neural Information Processing Systems, Vol. 31.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_44_2","first-page":"3779","volume-title":"IEEE Transactions on Neural Networks and Learning Systems","volume":"34","author":"Nguyen Thanh Thi","year":"2021","unstructured":"Thanh Thi Nguyen and Vijay Janapa Reddi. 2021. Deep reinforcement learning for cyber security. IEEE Transactions on Neural Networks and Learning Systems 34, 8 (2021), 3779\u20133795."},{"key":"e_1_3_2_45_2","doi-asserted-by":"publisher","DOI":"10.1145\/3459991"},{"key":"e_1_3_2_46_2","article-title":"Reinforcement learning with hierarchies of machines","volume":"10","author":"Parr Ronald","year":"1997","unstructured":"Ronald Parr and Stuart Russell. 1997. Reinforcement learning with hierarchies of machines. In Advances in Neural Information Processing Systems, Vol. 10.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_47_2","doi-asserted-by":"publisher","DOI":"10.1145\/3453160"},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.cose.2023.103546"},{"issue":"1","key":"e_1_3_2_49_2","article-title":"Patching zero-day vulnerabilities: An empirical analysis","volume":"7","author":"Roumani Yaman","year":"2021","unstructured":"Yaman Roumani. 2021. Patching zero-day vulnerabilities: An empirical analysis. Journal of Cybersecurity 7, 1 (2021), tyab023.","journal-title":"Journal of Cybersecurity"},{"key":"e_1_3_2_50_2","doi-asserted-by":"publisher","DOI":"10.1145\/3578366"},{"key":"e_1_3_2_51_2","unstructured":"Tom Schaul, John Quan, Ioannis Antonoglou, and David Silver. 
2015. Prioritized experience replay. arXiv:1511.05952. Retrieved from https:\/\/arxiv.org\/abs\/1511.05952"},{"key":"e_1_3_2_52_2","first-page":"51","volume-title":"International Conference on Secure Knowledge Management in Artificial Intelligence Era","author":"Sewak Mohit","year":"2021","unstructured":"Mohit Sewak, Sanjay K. Sahay, and Hemant Rathore. 2021. Deep reinforcement learning for cybersecurity threat detection and protection: A review. In International Conference on Secure Knowledge Management in Artificial Intelligence Era. Springer, 51\u201372."},{"key":"e_1_3_2_53_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10796-022-10333-x"},{"key":"e_1_3_2_54_2","doi-asserted-by":"publisher","DOI":"10.1016\/S0004-3702(99)00052-1"},{"key":"e_1_3_2_55_2","article-title":"Universal option models","volume":"27","author":"Szepesvari Csaba","year":"2014","unstructured":"Csaba Szepesvari, Richard S. Sutton, Joseph Modayil, Shalabh Bhatnagar, Doina Precup, and David Silver. 2014. Universal option models. In Advances in Neural Information Processing Systems, Vol. 27.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_56_2","volume-title":"the AAAI Conference on Artificial Intelligence","volume":"31","author":"Tessler Chen","year":"2017","unstructured":"Chen Tessler, Shahar Givony, Tom Zahavy, Daniel Mankowitz, and Shie Mannor. 2017. A deep hierarchical approach to lifelong learning in Minecraft. In the AAAI Conference on Artificial Intelligence, Vol. 31."},{"key":"e_1_3_2_57_2","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2012.6386109"},{"key":"e_1_3_2_58_2","first-page":"1995","volume-title":"International Conference on Machine Learning","author":"Wang Ziyu","year":"2016","unstructured":"Ziyu Wang, Tom Schaul, Matteo Hessel, Hado van Hasselt, Marc Lanctot, and Nando de Freitas. 2016. Dueling network architectures for deep reinforcement learning. In International Conference on Machine Learning. 
PMLR, 1995\u20132003."},{"key":"e_1_3_2_59_2","unstructured":"M. Zolanvari, M. A. Teixeira, L. Gupta, K. M. Khan, and R. Jain. 2021. WUSTL-IIOT-2021 Dataset for IIoT Cybersecurity Research. Washington University in St. Louis. Retrieved from http:\/\/www.cse.wustl.edu\/~jain\/iiot2\/index.html"}],"container-title":["ACM Transactions on Cyber-Physical Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3765622","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,1,21]],"date-time":"2026-01-21T09:08:44Z","timestamp":1768986524000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3765622"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,1,21]]},"references-count":58,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2026,1,31]]}},"alternative-id":["10.1145\/3765622"],"URL":"https:\/\/doi.org\/10.1145\/3765622","relation":{},"ISSN":["2378-962X","2378-9638"],"issn-type":[{"value":"2378-962X","type":"print"},{"value":"2378-9638","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,1,21]]},"assertion":[{"value":"2024-11-15","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-07-17","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2026-01-21","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}