{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,9,7]],"date-time":"2024-09-07T08:43:25Z","timestamp":1725698605129},"publisher-location":"New York, NY, USA","reference-count":35,"publisher":"ACM","funder":[{"DOI":"10.13039\/https:\/\/doi.org\/10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["2105007"],"id":[{"id":"10.13039\/https:\/\/doi.org\/10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2023,11,30]]},"DOI":"10.1145\/3605764.3623913","type":"proceedings-article","created":{"date-parts":[[2023,11,21]],"date-time":"2023-11-21T17:12:17Z","timestamp":1700586737000},"page":"139-148","update-policy":"http:\/\/dx.doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Task-Agnostic Safety for Reinforcement Learning"],"prefix":"10.1145","author":[{"ORCID":"http:\/\/orcid.org\/0000-0001-9020-2789","authenticated-orcid":false,"given":"Md Asifur","family":"Rahman","sequence":"first","affiliation":[{"name":"Wake Forest University, Winston-Salem, NC, USA"}]},{"ORCID":"http:\/\/orcid.org\/0000-0003-4572-7150","authenticated-orcid":false,"given":"Sarra","family":"Alqahtani","sequence":"additional","affiliation":[{"name":"Wake Forest University, Winston-Salem, NC, USA"}]}],"member":"320","published-online":{"date-parts":[[2023,11,26]]},"reference":[{"volume-title":"International conference on machine learning. PMLR, 22-- 31","year":"2017","author":"Achiam Joshua","key":"e_1_3_2_1_1_1","unstructured":"Joshua Achiam , David Held , Aviv Tamar , and Pieter Abbeel . 2017 . Constrained policy optimization . In International conference on machine learning. PMLR, 22-- 31 . Joshua Achiam, David Held, Aviv Tamar, and Pieter Abbeel. 2017. Constrained policy optimization. In International conference on machine learning. PMLR, 22-- 31."},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v32i1.11797"},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1007\/978--3--319--44482--6_10"},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/CDC.2017.8263977"},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.23919\/ACC50511.2021.9483182"},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1186\/s42400-019-0027-x"},{"volume-title":"Safe multi-agent reinforcement learning via shielding. arXiv preprint arXiv:2101.11196","year":"2021","author":"ElSayed-Aly Ingy","key":"e_1_3_2_1_7_1","unstructured":"Ingy ElSayed-Aly , Suda Bharadwaj , Christopher Amato , R\u00fcdiger Ehlers , Ufuk Topcu , and Lu Feng . 2021. Safe multi-agent reinforcement learning via shielding. arXiv preprint arXiv:2101.11196 ( 2021 ). Ingy ElSayed-Aly, Suda Bharadwaj, Christopher Amato, R\u00fcdiger Ehlers, Ufuk Topcu, and Lu Feng. 2021. Safe multi-agent reinforcement learning via shielding. arXiv preprint arXiv:2101.11196 (2021)."},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACSOS55765.2022.00023"},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2019.8794107"},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1007\/11871842_63"},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1613\/jair.1666"},{"key":"e_1_3_2_1_12_1","unstructured":"Seyed Kamyar Seyed Ghasemipour Shane Gu and Richard Zemel. 2019. Understanding the relation between maximum-entropy inverse reinforcement learning and behaviour cloning. (2019). Seyed Kamyar Seyed Ghasemipour Shane Gu and Richard Zemel. 2019. Understanding the relation between maximum-entropy inverse reinforcement learning and behaviour cloning. (2019)."},{"volume-title":"Know Your Boundaries: The Necessity of Explicit Behavioral Cloning in Offline RL. arXiv preprint arXiv:2206.00695","year":"2022","author":"Goo Wonjoon","key":"e_1_3_2_1_13_1","unstructured":"Wonjoon Goo and Scott Niekum . 2022. Know Your Boundaries: The Necessity of Explicit Behavioral Cloning in Offline RL. arXiv preprint arXiv:2206.00695 ( 2022 ). Wonjoon Goo and Scott Niekum. 2022. Know Your Boundaries: The Necessity of Explicit Behavioral Cloning in Offline RL. arXiv preprint arXiv:2206.00695 (2022)."},{"volume-title":"IDES: Self-adaptive Software with Online Policy Evolution Extended from Rainbow.","year":"2012","author":"Gu Xiaodong","key":"e_1_3_2_1_14_1","unstructured":"Xiaodong Gu . 2012 . IDES: Self-adaptive Software with Online Policy Evolution Extended from Rainbow. Xiaodong Gu. 2012. IDES: Self-adaptive Software with Online Policy Evolution Extended from Rainbow."},{"volume-title":"International conference on machine learning. PMLR","year":"2018","author":"Haarnoja Tuomas","key":"e_1_3_2_1_15_1","unstructured":"Tuomas Haarnoja , Aurick Zhou , Pieter Abbeel , and Sergey Levine . 2018 . Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor . In International conference on machine learning. PMLR , 1861-- 1870. Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine. 2018. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In International conference on machine learning. PMLR, 1861-- 1870."},{"key":"e_1_3_2_1_16_1","unstructured":"Tuomas Haarnoja Aurick Zhou Kristian Hartikainen George Tucker Sehoon Ha Jie Tan Vikash Kumar Henry Zhu Abhishek Gupta Pieter Abbeel etal 2018. Soft actor-critic algorithms and applications. arXiv preprint arXiv:1812.05905 (2018). Tuomas Haarnoja Aurick Zhou Kristian Hartikainen George Tucker Sehoon Ha Jie Tan Vikash Kumar Henry Zhu Abhishek Gupta Pieter Abbeel et al. 2018. Soft actor-critic algorithms and applications. arXiv preprint arXiv:1812.05905 (2018)."},{"volume-title":"Anton Maximilian Sch\u00e4fer, and Steffen Udluft","year":"2008","author":"Hans Alexander","key":"e_1_3_2_1_17_1","unstructured":"Alexander Hans , Daniel Schneega\u00df , Anton Maximilian Sch\u00e4fer, and Steffen Udluft . 2008 . Safe exploration for reinforcement learning.. In ESANN. Citeseer , 143--148. Alexander Hans, Daniel Schneega\u00df, Anton Maximilian Sch\u00e4fer, and Steffen Udluft. 2008. Safe exploration for reinforcement learning.. In ESANN. Citeseer, 143--148."},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-66723-8_19"},{"volume-title":"International Workshop on the Foundations of Trustworthy AI Integrating Learning, Optimization and Reasoning. Springer, 123--139","year":"2020","author":"Kim Youngmin","key":"e_1_3_2_1_19_1","unstructured":"Youngmin Kim , Richard Allmendinger , and Manuel L\u00f3pez-Ib\u00e1\u00f1ez . 2020 . Safe learning and optimization techniques: Towards a survey of the state of the art . In International Workshop on the Foundations of Trustworthy AI Integrating Learning, Optimization and Reasoning. Springer, 123--139 . Youngmin Kim, Richard Allmendinger, and Manuel L\u00f3pez-Ib\u00e1\u00f1ez. 2020. Safe learning and optimization techniques: Towards a survey of the state of the art. In International Workshop on the Foundations of Trustworthy AI Integrating Learning, Optimization and Reasoning. Springer, 123--139."},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i04.5887"},{"volume-title":"Reinforcement learning and control as probabilistic inference: Tutorial and review. arXiv preprint arXiv:1805.00909","year":"2018","author":"Levine Sergey","key":"e_1_3_2_1_21_1","unstructured":"Sergey Levine . 2018. Reinforcement learning and control as probabilistic inference: Tutorial and review. arXiv preprint arXiv:1805.00909 ( 2018 ). Sergey Levine. 2018. Reinforcement learning and control as probabilistic inference: Tutorial and review. arXiv preprint arXiv:1805.00909 (2018)."},{"volume-title":"Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection. The International journal of robotics research 37, 4--5","year":"2018","author":"Levine Sergey","key":"e_1_3_2_1_22_1","unstructured":"Sergey Levine , Peter Pastor , Alex Krizhevsky , Julian Ibarz , and Deirdre Quillen . 2018. Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection. The International journal of robotics research 37, 4--5 ( 2018 ), 421--436. Sergey Levine, Peter Pastor, Alex Krizhevsky, Julian Ibarz, and Deirdre Quillen. 2018. Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection. The International journal of robotics research 37, 4--5 (2018), 421--436."},{"volume-title":"Realizing self-adaptive systems via online reinforcement learning and feature-model-guided exploration. Computing (03","year":"2022","author":"Metzger Andreas","key":"e_1_3_2_1_23_1","unstructured":"Andreas Metzger , Cl\u00e9ment Quinton , Zoltan Mann , Luciano Baresi , and Klaus Pohl . 2022. Realizing self-adaptive systems via online reinforcement learning and feature-model-guided exploration. Computing (03 2022 ). https:\/\/doi.org\/10. 1007\/s00607-022-01052-x Andreas Metzger, Cl\u00e9ment Quinton, Zoltan Mann, Luciano Baresi, and Klaus Pohl. 2022. Realizing self-adaptive systems via online reinforcement learning and feature-model-guided exploration. Computing (03 2022). https:\/\/doi.org\/10. 1007\/s00607-022-01052-x"},{"volume-title":"Risk-sensitive reinforcement learning. Machine learning 49, 2","year":"2002","author":"Mihatsch Oliver","key":"e_1_3_2_1_24_1","unstructured":"Oliver Mihatsch and Ralph Neuneier . 2002. Risk-sensitive reinforcement learning. Machine learning 49, 2 ( 2002 ), 267--290. Oliver Mihatsch and Ralph Neuneier. 2002. Risk-sensitive reinforcement learning. Machine learning 49, 2 (2002), 267--290."},{"volume-title":"Markov decision processes: discrete stochastic dynamic programming","author":"Puterman Martin L","key":"e_1_3_2_1_25_1","unstructured":"Martin L Puterman . 2014. Markov decision processes: discrete stochastic dynamic programming . John Wiley & Sons . Martin L Puterman. 2014. Markov decision processes: discrete stochastic dynamic programming. John Wiley & Sons."},{"key":"e_1_3_2_1_26_1","unstructured":"Alex Ray Joshua Achiam and Dario Amodei. 2019. Benchmarking Safe Exploration in Deep Reinforcement Learning. (2019). Alex Ray Joshua Achiam and Dario Amodei. 2019. Benchmarking Safe Exploration in Deep Reinforcement Learning. (2019)."},{"volume-title":"International conference on machine learning. PMLR","year":"2015","author":"Schulman John","key":"e_1_3_2_1_27_1","unstructured":"John Schulman , Sergey Levine , Pieter Abbeel , Michael Jordan , and Philipp Moritz . 2015 . Trust region policy optimization . In International conference on machine learning. PMLR , 1889--1897. John Schulman, Sergey Levine, Pieter Abbeel, Michael Jordan, and Philipp Moritz. 2015. Trust region policy optimization. In International conference on machine learning. PMLR, 1889--1897."},{"volume-title":"International conference on machine learning. PMLR, 387--395","year":"2014","author":"Silver David","key":"e_1_3_2_1_28_1","unstructured":"David Silver , Guy Lever , Nicolas Heess , Thomas Degris , Daan Wierstra , and Martin Riedmiller . 2014 . Deterministic policy gradient algorithms . In International conference on machine learning. PMLR, 387--395 . David Silver, Guy Lever, Nicolas Heess, Thomas Degris, Daan Wierstra, and Martin Riedmiller. 2014. Deterministic policy gradient algorithms. In International conference on machine learning. PMLR, 387--395."},{"volume-title":"Learning to be safe: Deep rl with a safety critic. arXiv preprint arXiv:2010.14603","year":"2020","author":"Srinivasan Krishnan","key":"e_1_3_2_1_29_1","unstructured":"Krishnan Srinivasan , Benjamin Eysenbach , Sehoon Ha , Jie Tan , and Chelsea Finn . 2020. Learning to be safe: Deep rl with a safety critic. arXiv preprint arXiv:2010.14603 ( 2020 ). Krishnan Srinivasan, Benjamin Eysenbach, Sehoon Ha, Jie Tan, and Chelsea Finn. 2020. Learning to be safe: Deep rl with a safety critic. arXiv preprint arXiv:2010.14603 (2020)."},{"volume-title":"Barto","year":"2018","author":"Sutton Richard S.","key":"e_1_3_2_1_30_1","unstructured":"Richard S. Sutton and Andrew G . Barto . 2018 . Reinforcement Learning : An Introduction. A Bradford Book, Cambridge, MA, USA. Richard S. Sutton and Andrew G. Barto. 2018. Reinforcement Learning: An Introduction. A Bradford Book, Cambridge, MA, USA."},{"volume-title":"International Conference on Machine Learning. PMLR, 6215--6224","year":"2019","author":"Tessler Chen","key":"e_1_3_2_1_31_1","unstructured":"Chen Tessler , Yonathan Efroni , and Shie Mannor . 2019 . Action robust reinforcement learning and applications in continuous control . In International Conference on Machine Learning. PMLR, 6215--6224 . Chen Tessler, Yonathan Efroni, and Shie Mannor. 2019. Action robust reinforcement learning and applications in continuous control. In International Conference on Machine Learning. PMLR, 6215--6224."},{"volume-title":"Reward constrained policy optimization. arXiv preprint arXiv:1805.11074","year":"2018","author":"Tessler Chen","key":"e_1_3_2_1_32_1","unstructured":"Chen Tessler , Daniel J Mankowitz , and Shie Mannor . 2018. Reward constrained policy optimization. arXiv preprint arXiv:1805.11074 ( 2018 ). Chen Tessler, Daniel J Mankowitz, and Shie Mannor. 2018. Reward constrained policy optimization. arXiv preprint arXiv:1805.11074 (2018)."},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2021.3070252"},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2020.2976272"},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1007\/978--3-031-"}],"event":{"name":"CCS '23: ACM SIGSAC Conference on Computer and Communications Security","sponsor":["SIGSAC ACM Special Interest Group on Security, Audit, and Control"],"location":"Copenhagen Denmark","acronym":"CCS '23"},"container-title":["Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3605764.3623913","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,1,11]],"date-time":"2024-01-11T11:11:54Z","timestamp":1704971514000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3605764.3623913"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,11,26]]},"references-count":35,"alternative-id":["10.1145\/3605764.3623913","10.1145\/3605764"],"URL":"http:\/\/dx.doi.org\/10.1145\/3605764.3623913","relation":{},"subject":[],"published":{"date-parts":[[2023,11,26]]},"assertion":[{"value":"2023-11-26","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}