{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,22]],"date-time":"2026-04-22T07:47:49Z","timestamp":1776844069687,"version":"3.51.2"},"reference-count":60,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2026,4,21]],"date-time":"2026-04-21T00:00:00Z","timestamp":1776729600000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"NSF","award":["CNS-2442914 and CNS-2333980"],"award-info":[{"award-number":["CNS-2442914 and CNS-2333980"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Cyber-Phys. Syst."],"published-print":{"date-parts":[[2026,4,30]]},"abstract":"<jats:p>Safe Reinforcement Learning (RL) has been applied to synthesize control policies that maximize task rewards while adhering to safety constraints within simulated secure cyber-physical systems. However, the vulnerability of safe RL to adversarial attacks remains largely unexplored. We argue that understanding the safety vulnerabilities of learned control policies is crucial for ensuring true safety in real-world scenarios. To address this gap, we first formally define the safe RL problem with formal language (signal temporal logic) and demonstrate that even optimal policies are susceptible to observation perturbations. We then introduce novel safety violation attacks that exploit adversarial models trained with reversed safety constraints to induce unsafe behaviors. Lastly, through both theoretical analysis and experimental results, we demonstrate that our approach is more effective at violating safety constraints than existing adversarial RL methods, which primarily focus on reducing task rewards rather than compromising safety.<\/jats:p>","DOI":"10.1145\/3788281","type":"journal-article","created":{"date-parts":[[2026,1,19]],"date-time":"2026-01-19T10:05:02Z","timestamp":1768817102000},"page":"1-23","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Vulnerability Analysis for Safe Reinforcement Learning in Cyber-Physical Systems"],"prefix":"10.1145","volume":"10","author":[{"ORCID":"https:\/\/orcid.org\/0009-0004-9137-2359","authenticated-orcid":false,"given":"Shixiong","family":"Jiang","sequence":"first","affiliation":[{"name":"Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, Indiana, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3532-9506","authenticated-orcid":false,"given":"Mengyu","family":"Liu","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, Indiana, USA and Washington State University Tri-Cities, Richland, Washington, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6488-3488","authenticated-orcid":false,"given":"Fanxin","family":"Kong","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, Indiana, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2026,4,21]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"publisher","DOI":"10.1145\/2465787.2465797"},{"key":"e_1_3_1_3_2","first-page":"22","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Achiam Joshua","year":"2017","unstructured":"Joshua Achiam, David Held, Aviv Tamar, and Pieter Abbeel. 2017. Constrained policy optimization. In Proceedings of the International Conference on Machine Learning. PMLR, 22\u201331."},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSTW.2018.00052"},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-95582-7_27"},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1146\/annurev-control-042920-020211"},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2019\/840"},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1145\/3319535.3339815"},{"key":"e_1_3_1_9_2","doi-asserted-by":"publisher","DOI":"10.1145\/3575870.3587119"},{"key":"e_1_3_1_10_2","unstructured":"Yize Chen Yuanyuan Shi and Baosen Zhang. 2018. Optimal control via neural networks: A convex approach. arXiv:1805.11835. Retrieved from https:\/\/arxiv.org\/abs\/1805.11835"},{"key":"e_1_3_1_11_2","article-title":"A Lyapunov-based approach to safe reinforcement learning","volume":"31","author":"Chow Yinlam","year":"2018","unstructured":"Yinlam Chow, Ofir Nachum, Edgar Duenez-Guzman, and Mohammad Ghavamzadeh. 2018. A Lyapunov-based approach to safe reinforcement learning. In Proceedings of the Advances in Neural Information Processing Systems, Vol. 31.","journal-title":"Proceedings of the Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_12_2","unstructured":"Yinlam Chow Ofir Nachum Aleksandra Faust Edgar Duenez-Guzman and Mohammad Ghavamzadeh. 2019. Lyapunov-based safe policy optimization for continuous control. arXiv:1901.10031. Retrieved from https:\/\/arxiv.org\/abs\/1901.10031"},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-15297-9_9"},{"key":"e_1_3_1_14_2","unstructured":"Scott Fujimoto Edoardo Conti Mohammad Ghavamzadeh and Joelle Pineau. 2019. Benchmarking batch deep reinforcement learning algorithms. arXiv:1910.01708. Retrieved from https:\/\/arxiv.org\/abs\/1910.01708"},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v32i1.12107"},{"key":"e_1_3_1_16_2","doi-asserted-by":"crossref","unstructured":"Amin Ghafouri Yevgeniy Vorobeychik and Xenofon Koutsoukos. 2018. Adversarial regression for detecting attacks in cyber-physical systems. arXiv:1804.11022. Retrieved from https:\/\/arxiv.org\/abs\/1804.11022","DOI":"10.24963\/ijcai.2018\/524"},{"key":"e_1_3_1_17_2","unstructured":"Ian J. Goodfellow Jonathon Shlens and Christian Szegedy. 2014. Explaining and harnessing adversarial examples. arXiv:1412.6572. Retrieved from https:\/\/arxiv.org\/abs\/1412.6572"},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.23919\/ECC.2019.8796117"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-17108-6_12"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1145\/3394486.3403089"},{"key":"e_1_3_1_21_2","unstructured":"Sandy Huang Nicolas Papernot Ian Goodfellow Yan Duan and Pieter Abbeel. 2017. Adversarial attacks on neural network policies. arXiv:1702.02284. Retrieved from https:\/\/arxiv.org\/abs\/1702.02284"},{"key":"e_1_3_1_22_2","article-title":"Safety gymnasium: A unified safe reinforcement learning benchmark","volume":"36","author":"Ji Jiaming","year":"2023","unstructured":"Jiaming Ji, Borong Zhang, Jiayi Zhou, Xuehai Pan, Weidong Huang, Ruiyang Sun, Yiran Geng, Yifan Zhong, Josef Dai, and Yaodong Yang. 2023. Safety gymnasium: A unified safe reinforcement learning benchmark. In Proceedings of the Advances in Neural Information Processing Systems, Vol. 36.","journal-title":"Proceedings of the Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCPS54341.2022.00030"},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/DAC18072.2020.9218663"},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCPS.2018.00011"},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/IVS.2015.7225830"},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.23919\/ACC.2018.8431181"},{"key":"e_1_3_1_28_2","volume-title":"Proceedings of the IEEE\/RSJ International Conference on Intelligent Robots and Systems","author":"Li Xiao","year":"2017","unstructured":"Xiao Li, Cristian-Ioan Vasile, and Calin Belta. 2017. Reinforcement learning with temporal logic rewards. In Proceedings of the IEEE\/RSJ International Conference on Intelligent Robots and Systems. IEEE."},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2024.3443916"},{"key":"e_1_3_1_30_2","unstructured":"Qingkai Liang Fanyu Que and Eytan Modiano. 2018. Accelerated primal-dual policy optimization for safe reinforcement learning. arXiv:1802.06480. Retrieved from https:\/\/arxiv.org\/abs\/1802.06480"},{"key":"e_1_3_1_31_2","unstructured":"Mengyu Liu Pengyuan Lu Xin Chen Fanxin Kong Oleg Sokolsky and Insup Lee. 2023. Fulfilling formal specifications ASAP by model-free reinforcement learning. arXiv:2304.12508. Retrieved from https:\/\/arxiv.org\/abs\/2304.12508"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1109\/RTSS55097.2022.00029"},{"key":"e_1_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/RTSS59052.2023.00017"},{"key":"e_1_3_1_34_2","first-page":"13644","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Liu Zuxin","year":"2022","unstructured":"Zuxin Liu, Zhepeng Cen, Vladislav Isenbaev, Wei Liu, Steven Wu, Bo Li, and Ding Zhao. 2022. Constrained variational policy optimization for safe reinforcement learning. In Proceedings of the International Conference on Machine Learning. PMLR, 13644\u201313668."},{"key":"e_1_3_1_35_2","unstructured":"Zuxin Liu Zijian Guo Zhepeng Cen Huan Zhang Jie Tan Bo Li and Ding Zhao. 2022. On the robustness of safe reinforcement learning under observational perturbations. arXiv:2205.14691. Retrieved from https:\/\/arxiv.org\/abs\/2205.14691"},{"key":"e_1_3_1_36_2","first-page":"21586","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Liu Zuxin","year":"2023","unstructured":"Zuxin Liu, Zijian Guo, Zhepeng Cen, Huan Zhang, Yihang Yao, Hanjiang Hu, and Ding Zhao. 2023. Towards robust and safe reinforcement learning with benign off-policy data. In Proceedings of the International Conference on Machine Learning. PMLR, 21586\u201321610."},{"key":"e_1_3_1_37_2","unstructured":"Aleksander Madry Aleksandar Makelov Ludwig Schmidt Dimitris Tsipras and Adrian Vladu. 2017. Towards deep learning models resistant to adversarial attacks. arXiv:1706.06083. Retrieved from https:\/\/arxiv.org\/abs\/1706.06083"},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-30206-3_12"},{"key":"e_1_3_1_39_2","unstructured":"Anay Pattanaik Zhenyi Tang Shuijing Liu Gautham Bommannan and Girish Chowdhary. 2017. Robust deep reinforcement learning with adversarial attacks. arXiv:1712.03632. Retrieved from https:\/\/arxiv.org\/abs\/1712.03632"},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jmsy.2020.11.017"},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2021.3092676"},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v37i8.26139"},{"key":"e_1_3_1_43_2","unstructured":"Alex Ray Joshua Achiam and Dario Amodei. 2019. Benchmarking safe exploration in deep reinforcement learning. arXiv:1910.01708. Retrieved from https:\/\/arxiv.org\/abs\/1910.01708"},{"key":"e_1_3_1_44_2","unstructured":"John Schulman Filip Wolski Prafulla Dhariwal Alec Radford and Oleg Klimov. 2017. Proximal policy optimization algorithms. arXiv:1707.06347. Retrieved from https:\/\/arxiv.org\/abs\/1707.06347"},{"key":"e_1_3_1_45_2","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","author":"Kumar Singh Nikhil","year":"2023","unstructured":"Nikhil Kumar Singh and Indranil Saha. 2023. STL-based synthesis of feedback controllers using reinforcement learning. In Proceedings of the AAAI Conference on Artificial Intelligence."},{"key":"e_1_3_1_46_2","first-page":"9133","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Stooke Adam","year":"2020","unstructured":"Adam Stooke, Joshua Achiam, and Pieter Abbeel. 2020. Responsive safety in reinforcement learning by PID Lagrangian methods. In Proceedings of the International Conference on Machine Learning. PMLR, 9133\u20139143."},{"key":"e_1_3_1_47_2","doi-asserted-by":"publisher","DOI":"10.1109\/TEVC.2019.2890858"},{"key":"e_1_3_1_48_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i04.6047"},{"key":"e_1_3_1_49_2","unstructured":"Yanchao Sun Ruijie Zheng Yongyuan Liang and Furong Huang. 2021. Who is the strongest enemy? Towards optimal and efficient evasion attacks in deep RL. arXiv:2106.05087. Retrieved from https:\/\/arxiv.org\/abs\/2106.05087"},{"key":"e_1_3_1_50_2","unstructured":"Mark Towers Ariel Kwiatkowski Jordan Terry John U. Balis Gianluca De Cola Tristan Deleu Manuel Goul\u00e3o Andreas Kallinteris Markus Krimmel Arjun Kg et\u00a0al. 2024. Gymnasium: A standard interface for reinforcement learning environments. arXiv:2407.17032. Retrieved from https:\/\/arxiv.org\/abs\/2407.17032"},{"key":"e_1_3_1_51_2","doi-asserted-by":"publisher","DOI":"10.1145\/3576841.3585919"},{"key":"e_1_3_1_52_2","first-page":"36593","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Wang Yixuan","year":"2023","unstructured":"Yixuan Wang, Simon Sinong Zhan, Ruochen Jiao, Zhilu Wang, Wanxin Jin, Zhuoran Yang, Zhaoran Wang, Chao Huang, and Qi Zhu. 2023. Enforcing hard constraints with soft barriers: Safe reinforcement learning in unknown stochastic environments. In Proceedings of the International Conference on Machine Learning. PMLR, 36593\u201336604."},{"key":"e_1_3_1_53_2","article-title":"Constraint-conditioned policy optimization for versatile safe reinforcement learning","volume":"36","author":"Yao Yihang","year":"2024","unstructured":"Yihang Yao, Zuxin Liu, Zhepeng Cen, Jiacheng Zhu, Wenhao Yu, Tingnan Zhang, and Ding Zhao. 2024. Constraint-conditioned policy optimization for versatile safe reinforcement learning. In Proceedings of the Advances in Neural Information Processing Systems, Vol. 36.","journal-title":"Proceedings of the Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_54_2","first-page":"2608","article-title":"Towards safe reinforcement learning with a safety editor policy","volume":"35","author":"Yu Haonan","year":"2022","unstructured":"Haonan Yu, Wei Xu, and Haichao Zhang. 2022. Towards safe reinforcement learning with a safety editor policy. In Proceedings of the Advances in Neural Information Processing Systems, Vol. 35, 2608\u20132621.","journal-title":"Proceedings of the Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_55_2","unstructured":"Simon Sinong Zhan Yixuan Wang Qingyuan Wu Ruochen Jiao Chao Huang and Qi Zhu. 2023. State-wise safe reinforcement learning with pixel observations. arXiv:2311.02227. Retrieved from https:\/\/arxiv.org\/abs\/2311.02227"},{"key":"e_1_3_1_56_2","unstructured":"Huan Zhang Hongge Chen Duane Boning and Cho-Jui Hsieh. 2021. Robust reinforcement learning on state observations with learned optimal adversary. arXiv:2101.08452. Retrieved from https:\/\/arxiv.org\/abs\/2101.08452"},{"key":"e_1_3_1_57_2","article-title":"Robust deep reinforcement learning against adversarial perturbations on state observations","author":"Zhang Huan","year":"2020","unstructured":"Huan Zhang, Hongge Chen, Chaowei Xiao, Bo Li, Mingyan Liu, Duane Boning, and Cho-Jui Hsieh. 2020. Robust deep reinforcement learning against adversarial perturbations on state observations. In Proceedings of the Advances in Neural Information Processing Systems.","journal-title":"Proceedings of the Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_58_2","doi-asserted-by":"publisher","DOI":"10.1109\/RTSS49844.2020.00028"},{"key":"e_1_3_1_59_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-42637-7_6"},{"key":"e_1_3_1_60_2","first-page":"15338","article-title":"First order constrained optimization in policy space","volume":"33","author":"Zhang Yiming","year":"2020","unstructured":"Yiming Zhang, Quan Vuong, and Keith Ross. 2020. First order constrained optimization in policy space. In Proceedings of the Advances in Neural Information Processing Systems, Vol. 33, 15338\u201315349.","journal-title":"Proceedings of the Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_61_2","doi-asserted-by":"publisher","DOI":"10.1145\/2656045.2656061"}],"container-title":["ACM Transactions on Cyber-Physical Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3788281","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3788281","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,22]],"date-time":"2026-04-22T06:34:49Z","timestamp":1776839689000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3788281"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,4,21]]},"references-count":60,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2026,4,30]]}},"alternative-id":["10.1145\/3788281"],"URL":"https:\/\/doi.org\/10.1145\/3788281","relation":{},"ISSN":["2378-962X","2378-9638"],"issn-type":[{"value":"2378-962X","type":"print"},{"value":"2378-9638","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,4,21]]},"assertion":[{"value":"2024-09-15","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-07-17","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2026-04-21","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}