{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T13:05:12Z","timestamp":1760101512725,"version":"3.40.3"},"publisher-location":"Cham","reference-count":52,"publisher":"Springer Nature Switzerland","isbn-type":[{"type":"print","value":"9783031656323"},{"type":"electronic","value":"9783031656330"}],"license":[{"start":{"date-parts":[[2024,1,1]],"date-time":"2024-01-01T00:00:00Z","timestamp":1704067200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,7,26]],"date-time":"2024-07-26T00:00:00Z","timestamp":1721952000000},"content-version":"vor","delay-in-days":207,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>We introduce VELM, a reinforcement learning (RL) framework grounded in verification principles for safe exploration in unknown environments. VELM ensures that an RL agent systematically explores its environment, adhering to safety properties throughout the learning process. VELM learns environment models as symbolic formulas and conducts formal reachability analysis over the learned models for safety verification. An online shielding layer is then constructed to confine the RL agent\u2019s exploration solely within a state space verified as safe in the learned model, thereby bolstering the overall safety profile of the RL system. Our experimental results demonstrate the efficacy of VELM across diverse RL environments, highlighting its capacity to significantly reduce safety violations in comparison to existing safe learning techniques, all without compromising the RL agent\u2019s reward performance.<\/jats:p>","DOI":"10.1007\/978-3-031-65633-0_11","type":"book-chapter","created":{"date-parts":[[2024,7,25]],"date-time":"2024-07-25T07:03:08Z","timestamp":1721890988000},"page":"232-255","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Safe Exploration in\u00a0Reinforcement Learning by\u00a0Reachability Analysis over\u00a0Learned Models"],"prefix":"10.1007","author":[{"ORCID":"https:\/\/orcid.org\/0009-0000-4317-9758","authenticated-orcid":false,"given":"Yuning","family":"Wang","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9606-150X","authenticated-orcid":false,"given":"He","family":"Zhu","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,7,26]]},"reference":[{"key":"11_CR1","unstructured":"Achiam, J., Held, D., Tamar, A., Abbeel, P.: Constrained policy optimization. In: Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6\u201311 August 2017. Proceedings of Machine Learning Research, vol.\u00a070 (2017)"},{"key":"11_CR2","doi-asserted-by":"crossref","unstructured":"Akametalu, A.K., Kaynama, S., Fisac, J.F., Zeilinger, M.N., Gillula, J.H., Tomlin, C.J.: Reachability-based safe learning with gaussian processes. In: 53rd IEEE Conference on Decision and Control, CDC 2014, Los Angeles, CA, USA, 15\u201317 December, 2014 (2014)","DOI":"10.1109\/CDC.2014.7039601"},{"key":"11_CR3","doi-asserted-by":"crossref","unstructured":"Alshiekh, M., Bloem, R., Ehlers, R., K\u00f6nighofer, B., Niekum, S., Topcu, U.: Safe reinforcement learning via shielding. In: Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, (AAAI-18), New Orleans, Louisiana, USA, 2\u20137 February, 2018 (2018)","DOI":"10.1609\/aaai.v32i1.11797"},{"key":"11_CR4","unstructured":"Anderson, G., Chaudhuri, S., Dillig, I.: Guiding safe exploration with weakest preconditions. In: The Eleventh International Conference on Learning Representations (2023)"},{"key":"11_CR5","unstructured":"Anderson, G., Verma, A., Dillig, I., Chaudhuri, S.: Neurosymbolic reinforcement learning with formally verified exploration. In: Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual (2020)"},{"key":"11_CR6","unstructured":"Berkenkamp, F., Turchetta, M., Schoellig, A.P., Krause, A.: Safe model-based reinforcement learning with stability guarantees. In: Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4\u20139 December, 2017, Long Beach, CA, USA (2017)"},{"key":"11_CR7","unstructured":"Bharadhwaj, H., Kumar, A., Rhinehart, N., Levine, S., Shkurti, F., Garg, A.: Conservative safety critics for exploration. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, 3\u20137 May, 2021 (2021)"},{"key":"11_CR8","unstructured":"Brockman, G., et al.: Openai gym (2016)"},{"key":"11_CR9","doi-asserted-by":"crossref","unstructured":"Burlacu, B., Kronberger, G., Kommenda, M.: Operon c++: an efficient genetic programming framework for symbolic regression. In: Proceedings of the 2020 Genetic and Evolutionary Computation Conference Companion, GECCO 2020 (2020)","DOI":"10.1145\/3377929.3398099"},{"key":"11_CR10","unstructured":"Cava, W.G.L., et al.: Contemporary symbolic regression methods and their relative performance. In: Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, NeurIPS Datasets and Benchmarks 2021, Virtual, December 2021"},{"key":"11_CR11","series-title":"Lecture Notes in Computer Science","doi-asserted-by":"publisher","first-page":"258","DOI":"10.1007\/978-3-642-39799-8_18","volume-title":"Computer Aided Verification","author":"X Chen","year":"2013","unstructured":"Chen, X., \u00c1brah\u00e1m, E., Sankaranarayanan, S.: Flow*: an analyzer for non-linear hybrid systems. In: Sharygina, N., Veith, H. (eds.) CAV 2013. LNCS, vol. 8044, pp. 258\u2013263. Springer, Heidelberg (2013). https:\/\/doi.org\/10.1007\/978-3-642-39799-8_18"},{"key":"11_CR12","unstructured":"Cheng, R., Orosz, G., Murray, R.M., Burdick, J.W.: End-to-end safe reinforcement learning through barrier functions for safety-critical continuous control tasks. In: The Thirty-Third AAAI Conference on Artificial Intelligence, AAAI 2019, Honolulu, Hawaii, USA, 27 January\u20131 February, 2019 (2019)"},{"key":"11_CR13","unstructured":"Chow, Y., Nachum, O., Du\u00e9\u00f1ez-Guzm\u00e1n, E.A., Ghavamzadeh, M.: A lyapunov-based approach to safe reinforcement learning. In: Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, December 3-8, 2018, Montr\u00e9al, Canada (2018)"},{"key":"11_CR14","unstructured":"Chow, Y., Nachum, O., Faust, A., Du\u00e9\u00f1ez-Guzm\u00e1n, E.A., Ghavamzadeh, M.: Safe policy learning for continuous control. In: 4th Conference on Robot Learning, CoRL 2020, 16\u201318 November 2020, Virtual Event\/Cambridge, MA, USA. Proceedings of Machine Learning Research, vol.\u00a0155 (2020)"},{"key":"11_CR15","unstructured":"Dalal, G., Dvijotham, K., Vecer\u00edk, M., Hester, T., Paduraru, C., Tassa, Y.: Safe exploration in continuous action spaces. CoRR abs\/1801.08757 (2018)"},{"key":"11_CR16","unstructured":"Dijkstra, E.W.: A Discipline of Programming. Prentice-Hall (1976)"},{"key":"11_CR17","unstructured":"Donti, P.L., Roderick, M., Fazlyab, M., Kolter, J.Z.: Enforcing robust control guarantees within neural network policies. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, 3\u20137 May, 2021 (2021)"},{"key":"11_CR18","doi-asserted-by":"crossref","unstructured":"Fisac, J.F., Akametalu, A.K., Zeilinger, M.N., Kaynama, S., Gillula, J.H., Tomlin, C.J.: A general safety framework for learning-based control in uncertain robotic systems. IEEE Trans. Autom. Control. 64(7) (2019)","DOI":"10.1109\/TAC.2018.2876389"},{"key":"11_CR19","doi-asserted-by":"crossref","unstructured":"Fran\u00e7ois-Lavet, V., Henderson, P., Islam, R., Bellemare, M.G., Pineau, J.: An introduction to deep reinforcement learning. Found. Trends. Mach. Learn. 11(3-4) (2018)","DOI":"10.1561\/2200000071"},{"key":"11_CR20","doi-asserted-by":"crossref","unstructured":"Fulton, N., Platzer, A.: Safe reinforcement learning via formal methods: toward safe control through proof and learning. In: Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, (AAAI-18), New Orleans, Louisiana, USA, 2\u20137 February, 2018 (2018)","DOI":"10.1609\/aaai.v32i1.12107"},{"key":"11_CR21","series-title":"Lecture Notes in Computer Science","doi-asserted-by":"publisher","first-page":"413","DOI":"10.1007\/978-3-030-17462-0_28","volume-title":"Tools and Algorithms for the Construction and Analysis of Systems","author":"N Fulton","year":"2019","unstructured":"Fulton, N., Platzer, A.: Verifiably safe off-model reinforcement learning. In: Vojnar, T., Zhang, L. (eds.) TACAS 2019. LNCS, vol. 11427, pp. 413\u2013430. Springer, Cham (2019). https:\/\/doi.org\/10.1007\/978-3-030-17462-0_28"},{"key":"11_CR22","doi-asserted-by":"crossref","unstructured":"Gillula, J.H., Tomlin, C.J.: Guaranteed safe online learning via reachability: tracking a ground target using a quadrotor. In: IEEE International Conference on Robotics and Automation, ICRA 2012, 14\u201318 May, 2012, St. Paul, Minnesota, USA (2012)","DOI":"10.1109\/ICRA.2012.6225136"},{"key":"11_CR23","unstructured":"Haarnoja, T., Zhou, A., Abbeel, P., Levine, S.: Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor. In: Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsm\u00e4ssan, Stockholm, Sweden, July 10-15, 2018. Proceedings of Machine Learning Research, vol.\u00a080 (2018)"},{"key":"11_CR24","doi-asserted-by":"crossref","unstructured":"Hunt, N., Fulton, N., Magliacane, S., Hoang, T.N., Das, S., Solar-Lezama, A.: Verifiably safe exploration for end-to-end reinforcement learning. In: HSCC \u201921: 24th ACM International Conference on Hybrid Systems: Computation and Control, Nashville, Tennessee, 19\u201321 May, 2021 (2021)","DOI":"10.1145\/3447928.3456653"},{"key":"11_CR25","unstructured":"Janner, M., Fu, J., Zhang, M., Levine, S.: When to trust your model: Model-based policy optimization. In: Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8\u201314 December, 2019, Vancouver, BC, Canada (2019)"},{"key":"11_CR26","unstructured":"Jayant, A.K., Bhatnagar, S.: Model-based safe deep reinforcement learning via a constrained proximal policy optimization algorithm. In: NeurIPS (2022)"},{"key":"11_CR27","unstructured":"Johnson, T.T., et al.: ARCH-COMP21 category report: artificial intelligence and neural network control systems (AINNCS) for continuous and hybrid systems plants. In: 8th International Workshop on Applied Verification of Continuous and Hybrid Systems (ARCH21), Brussels, Belgium, July 9, 2021. EPiC Series in Computing, vol.\u00a080 (2021)"},{"key":"11_CR28","unstructured":"Kamienny, P., d\u2019Ascoli, S., Lample, G., Charton, F.: End-to-end symbolic regression with transformers. In: NeurIPS (2022)"},{"key":"11_CR29","doi-asserted-by":"crossref","unstructured":"Koller, T., Berkenkamp, F., Turchetta, M., Krause, A.: Learning-based model predictive control for safe exploration. In: 57th IEEE Conference on Decision and Control, CDC 2018, Miami, FL, USA, 17\u201319 December, 2018 (2018)","DOI":"10.1109\/CDC.2018.8619572"},{"key":"11_CR30","doi-asserted-by":"crossref","unstructured":"Kronberger, G., de\u00a0Fran\u00e7a, F.O., Burlacu, B., Haider, C., Kommenda, M.: Shape-constrained symbolic regression - improving extrapolation with prior knowledge. Evol. Comput. 30(1) (2022)","DOI":"10.1162\/evco_a_00294"},{"key":"11_CR31","doi-asserted-by":"crossref","unstructured":"Li, S., Bastani, O.: Robust model predictive shielding for safe reinforcement learning with stochastic dynamics. In: 2020 IEEE International Conference on Robotics and Automation, ICRA 2020, Paris, France, May 31 - August 31, 2020 (2020)","DOI":"10.1109\/ICRA40945.2020.9196867"},{"key":"11_CR32","unstructured":"Li, Y., Li, N., Tseng, H.E., Girard, A., Filev, D.P., Kolmanovsky, I.V.: Safe reinforcement learning using robust action governor. In: Proceedings of the 3rd Annual Conference on Learning for Dynamics and Control, L4DC 2021, 7\u20138 June 2021, Virtual Event, Switzerland. Proceedings of Machine Learning Research, vol.\u00a0144 (2021)"},{"key":"11_CR33","unstructured":"Liu, Z., Zhou, H., Chen, B., Zhong, S., Hebert, M., Zhao, D.: Safe model-based reinforcement learning with robust cross-entropy method. CoRR abs\/2010.07968 (2020)"},{"key":"11_CR34","unstructured":"Luo, Y., Ma, T.: Learning barrier certificates: towards safe reinforcement learning with zero training-time violations. In: Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, 6\u201314 December, 2021, virtual (2021)"},{"key":"11_CR35","unstructured":"Ma, Y.J., Shen, A., Bastani, O., Jayaraman, D.: Conservative and adaptive penalty for model-based safe reinforcement learning. In: Thirty-Sixth AAAI Conference on Artificial Intelligence, AAAI 2022, Virtual Event, February 22 - March 1, 2022 (2022)"},{"key":"11_CR36","unstructured":"Mania, H., Guy, A., Recht, B.: Simple random search of static linear policies is competitive for reinforcement learning. In: Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, 3-8 December 2018, Montr\u00e9al, Canada (2018)"},{"key":"11_CR37","unstructured":"Moldovan, T.M., Abbeel, P.: Safe exploration in Markov decision processes. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012, Edinburgh, Scotland, UK, June 26\u2013July 1, 2012 (2012)"},{"key":"11_CR38","unstructured":"Sikchi, H., Zhou, W., Held, D.: Lyapunov barrier policy optimization. CoRR abs\/2103.09230 (2021)"},{"key":"11_CR39","unstructured":"Srinivasan, K., Eysenbach, B., Ha, S., Tan, J., Finn, C.: Learning to be safe: Deep RL with a safety critic. CoRR abs\/2010.14603 (2020)"},{"key":"11_CR40","unstructured":"Stooke, A., Achiam, J., Abbeel, P.: Responsive safety in reinforcement learning by PID Lagrangian methods. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13\u201318 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol.\u00a0119 (2020)"},{"key":"11_CR41","unstructured":"Tessler, C., Mankowitz, D.J., Mannor, S.: Reward constrained policy optimization. In: 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019 (2019)"},{"key":"11_CR42","doi-asserted-by":"crossref","unstructured":"Thananjeyan, B., et al.: Safe reinforcement learning with learned recovery zones. IEEE Robot. Autom. Lett. 6(3) (2021)","DOI":"10.1109\/LRA.2021.3070252"},{"key":"11_CR43","unstructured":"Turchetta, M., Berkenkamp, F., Krause, A.: Safe exploration in finite Markov decision processes with gaussian processes. In: Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 5\u201310 December, 2016, Barcelona, Spain (2016)"},{"key":"11_CR44","unstructured":"Verma, A., Le, H.M., Yue, Y., Chaudhuri, S.: Imitation-projected programmatic reinforcement learning. In: Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8\u201314 December, 2019, Vancouver, BC, Canada (2019)"},{"key":"11_CR45","unstructured":"Wang, Y., Zhu, H.: Verification-guided programmatic controller synthesis. In: Tools and Algorithms for the Construction and Analysis of Systems - 29th International Conference, TACAS 2023, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2022, Paris, France, 22\u201327 April, 2023, Proceedings, Part II. LNCS, vol. 13994 (2023)"},{"key":"11_CR46","unstructured":"Yang, Q., Sim\u00e3o, T.D., Tindemans, S.H., Spaan, M.T.J.: WCSAC: worst-case soft actor critic for safety-constrained reinforcement learning. In: Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Virtual Event, 2\u20139 February, 2021 (2021)"},{"key":"11_CR47","unstructured":"Yang, T., Rosca, J., Narasimhan, K., Ramadge, P.J.: Projection-based constrained policy optimization. In: 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, 26\u201330 April, 2020 (2020)"},{"key":"11_CR48","doi-asserted-by":"crossref","unstructured":"Yang, Z., Zhang, L., Zeng, X., Tang, X., Peng, C., Zeng, Z.: Hybrid controller synthesis for nonlinear systems subject to reach-avoid constraints. In: Computer Aided Verification: 35th International Conference, CAV 2023, Paris, France, July 17-22, 2023, Proceedings, Part I (2023)","DOI":"10.1007\/978-3-031-37706-8_16"},{"key":"11_CR49","unstructured":"Yu, D., Ma, H., Li, S., Chen, J.: Reachability constrained reinforcement learning. In: International Conference on Machine Learning, ICML 2022, 17\u201323 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol.\u00a0162 (2022)"},{"key":"11_CR50","doi-asserted-by":"crossref","unstructured":"Zanger, M.A., Daaboul, K., Z\u00f6llner, J.M.: Safe continuous control with constrained model-based policy optimization. In: IEEE\/RSJ International Conference on Intelligent Robots and Systems, IROS 2021, Prague, Czech Republic, September 27 - October 1, 2021 (2021)","DOI":"10.1109\/IROS51168.2021.9635984"},{"key":"11_CR51","unstructured":"Zhang, Y., Vuong, Q., Ross, K.W.: First order constrained optimization in policy space. In: Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, 6\u201312 December, 2020, virtual (2020)"},{"key":"11_CR52","doi-asserted-by":"crossref","unstructured":"Zhu, H., Xiong, Z., Magill, S., Jagannathan, S.: An inductive synthesis framework for verifiable reinforcement learning. In: Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2019, Phoenix, AZ, USA, 22\u201326 June, 2019 (2019)","DOI":"10.1145\/3314221.3314638"}],"container-title":["Lecture Notes in Computer Science","Computer Aided Verification"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/978-3-031-65633-0_11","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,7,25]],"date-time":"2024-07-25T07:05:47Z","timestamp":1721891147000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/978-3-031-65633-0_11"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024]]},"ISBN":["9783031656323","9783031656330"],"references-count":52,"URL":"https:\/\/doi.org\/10.1007\/978-3-031-65633-0_11","relation":{},"ISSN":["0302-9743","1611-3349"],"issn-type":[{"type":"print","value":"0302-9743"},{"type":"electronic","value":"1611-3349"}],"subject":[],"published":{"date-parts":[[2024]]},"assertion":[{"value":"26 July 2024","order":1,"name":"first_online","label":"First Online","group":{"name":"ChapterHistory","label":"Chapter History"}},{"value":"CAV","order":1,"name":"conference_acronym","label":"Conference Acronym","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"International Conference on Computer Aided Verification","order":2,"name":"conference_name","label":"Conference Name","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Montreal, QC","order":3,"name":"conference_city","label":"Conference City","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Canada","order":4,"name":"conference_country","label":"Conference Country","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"2024","order":5,"name":"conference_year","label":"Conference Year","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"24 July 2024","order":7,"name":"conference_start_date","label":"Conference Start Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"27 July 2024","order":8,"name":"conference_end_date","label":"Conference End Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"36","order":9,"name":"conference_number","label":"Conference Number","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"cav2024","order":10,"name":"conference_id","label":"Conference ID","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"http:\/\/i-cav.org\/2024\/","order":11,"name":"conference_url","label":"Conference URL","group":{"name":"ConferenceInfo","label":"Conference Information"}}]}}