{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,21]],"date-time":"2026-02-21T01:22:23Z","timestamp":1771636943842,"version":"3.50.1"},"publisher-location":"Cham","reference-count":41,"publisher":"Springer International Publishing","isbn-type":[{"value":"9783030816841","type":"print"},{"value":"9783030816858","type":"electronic"}],"license":[{"start":{"date-parts":[[2021,1,1]],"date-time":"2021-01-01T00:00:00Z","timestamp":1609459200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,7,15]],"date-time":"2021-07-15T00:00:00Z","timestamp":1626307200000},"content-version":"vor","delay-in-days":195,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>In this paper, we propose a safe reinforcement learning approach to synthesize deep neural network (DNN) controllers for nonlinear systems subject to safety constraints. The proposed approach employs an iterative scheme where a <jats:italic>learner<\/jats:italic> and a <jats:italic>verifier<\/jats:italic> interact to synthesize safe DNN controllers. The <jats:italic>learner<\/jats:italic> trains a DNN controller via deep reinforcement learning, and the <jats:italic>verifier<\/jats:italic> certifies the learned controller through computing a maximal safe initial region and its corresponding barrier certificate, based on polynomial abstraction and bilinear matrix inequalities solving. Compared with the existing verification-in-the-loop synthesis methods, our iterative framework is a sequential synthesis scheme of controllers and barrier certificates, which can learn safe controllers with adaptive barrier certificates rather than user-defined ones. 
We implement the tool SRLBC and evaluate its performance over a set of benchmark examples. The experimental results demonstrate that our approach efficiently synthesizes safe DNN controllers even for a nonlinear system with dimension up\u00a0to 12.<\/jats:p>","DOI":"10.1007\/978-3-030-81685-8_22","type":"book-chapter","created":{"date-parts":[[2021,7,17]],"date-time":"2021-07-17T00:02:35Z","timestamp":1626480155000},"page":"467-490","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":9,"title":["An Iterative Scheme of Safe Reinforcement Learning for Nonlinear Systems via Barrier Certificate Generation"],"prefix":"10.1007","author":[{"given":"Zhengfeng","family":"Yang","sequence":"first","affiliation":[]},{"given":"Yidan","family":"Zhang","sequence":"additional","affiliation":[]},{"given":"Wang","family":"Lin","sequence":"additional","affiliation":[]},{"given":"Xia","family":"Zeng","sequence":"additional","affiliation":[]},{"given":"Xiaochao","family":"Tang","sequence":"additional","affiliation":[]},{"given":"Zhenbing","family":"Zeng","sequence":"additional","affiliation":[]},{"given":"Zhiming","family":"Liu","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2021,7,15]]},"reference":[{"key":"22_CR1","doi-asserted-by":"crossref","unstructured":"Ahmadi, M., Singletary, A., Burdick, J.W., Ames, A.D.: Safe policy synthesis in multi-agent POMDPs via discrete-time barrier functions. In: Proceedings of the IEEE 58th Conference on Decision and Control (CDC), pp. 4797\u20134803. IEEE (2019)","DOI":"10.1109\/CDC40024.2019.9030241"},{"key":"22_CR2","doi-asserted-by":"crossref","unstructured":"Ames, A.D., Coogan, S., Egerstedt, M., Notomista, G., Sreenath, K., Tabuada, P.: Control barrier functions: theory and applications. In: Proceedings of the 17th European Control Conference, (ECC), pp. 
3420\u20133431 (2019)","DOI":"10.23919\/ECC.2019.8796030"},{"issue":"5","key":"22_CR3","doi-asserted-by":"publisher","first-page":"834","DOI":"10.1109\/TSMC.1983.6313077","volume":"13","author":"AG Barto","year":"1983","unstructured":"Barto, A.G., Sutton, R.S., Anderson, C.W.: Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Trans. Syst. Man Cybern. 13(5), 834\u2013846 (1983)","journal-title":"IEEE Trans. Syst. Man Cybern."},{"key":"22_CR4","doi-asserted-by":"crossref","unstructured":"Bouissou, O., Chapoutot, A., Djaballah, A., Kieffer, M.: Computation of parametric barrier functions for dynamical systems using interval analysis. In: Proceedings of the 53rd IEEE Conference on Decision and Control (CDC), pp. 753\u2013758. IEEE (2014)","DOI":"10.1109\/CDC.2014.7039472"},{"key":"22_CR5","unstructured":"Chang, Y.C., Roohi, N., Gao, S.: Neural Lyapunov control. In: Proceedings of the Annual Conference on Advances in Neural Information Processing Systems (NeurIPS), pp. 3245\u20133254 (2019)"},{"issue":"10","key":"22_CR6","doi-asserted-by":"publisher","first-page":"1846","DOI":"10.1109\/TAC.2004.835589","volume":"49","author":"G Chesi","year":"2004","unstructured":"Chesi, G.: Computing output feedback controllers to enlarge the domain of attraction in polynomial systems. IEEE Trans. Autom. Control 49(10), 1846\u20131853 (2004)","journal-title":"IEEE Trans. Autom. Control"},{"key":"22_CR7","unstructured":"Davis, P.J.: Interpolation and Approximation. Dover Books on Mathematics. Dover Publications, New York (1975)"},{"key":"22_CR8","doi-asserted-by":"crossref","unstructured":"Deshmukh, J.V., Kapinski, J., Yamaguchi, T., Prokhorov, D.: Learning deep neural network controllers for dynamical systems with safety guarantees: Invited paper. In: Proceedings of the IEEE\/ACM International Conference on Computer-Aided Design (ICCAD), pp. 
1\u20137 (2019)","DOI":"10.1109\/ICCAD45719.2019.8942130"},{"issue":"1","key":"22_CR9","first-page":"99","volume":"49","author":"M Ducho\u0148","year":"2011","unstructured":"Ducho\u0148, M.: A generalized bernstein approximation theorem. Tatra Mt. Math. Publ. 49(1), 99\u2013109 (2011)","journal-title":"Tatra Mt. Math. Publ."},{"key":"22_CR10","doi-asserted-by":"crossref","unstructured":"Dutta, S., Chen, X., Jha, S., Sankaranarayanan, S., Tiwari, A.: Sherlock - a tool for verification of neural network feedback systems: demo abstract. In: Proceedings of the 22nd ACM International Conference on Hybrid Systems: Computation and Control (HSCC), pp. 262\u2013263 (2019)","DOI":"10.1145\/3302504.3313351"},{"key":"22_CR11","doi-asserted-by":"crossref","unstructured":"Dutta, S., Chen, X., Sankaranarayanan, S.: Reachability analysis for neural feedback systems using regressive polynomial rule inference. In: Proceedings of the 22nd ACM International Conference on Hybrid Systems: Computation and Control (HSCC), pp. 157\u2013168 (2019)","DOI":"10.1145\/3302504.3311807"},{"issue":"16","key":"22_CR12","doi-asserted-by":"publisher","first-page":"151","DOI":"10.1016\/j.ifacol.2018.08.026","volume":"51","author":"S Dutta","year":"2018","unstructured":"Dutta, S., Jha, S., Sankaranarayanan, S., Tiwari, A.: Learning and verification of feedback control systems using feedforward neural networks. IFAC-PapersOnLine 51(16), 151\u2013156 (2018)","journal-title":"IFAC-PapersOnLine"},{"key":"22_CR13","series-title":"Lecture Notes in Computer Science","doi-asserted-by":"publisher","first-page":"121","DOI":"10.1007\/978-3-319-77935-5_9","volume-title":"NASA Formal Methods","author":"S Dutta","year":"2018","unstructured":"Dutta, S., Jha, S., Sankaranarayanan, S., Tiwari, A.: Output range analysis for deep feedforward neural networks. In: Dutle, A., Mu\u00f1oz, C., Narkawicz, A. (eds.) NFM 2018. LNCS, vol. 10811, pp. 121\u2013138. Springer, Cham (2018). 
https:\/\/doi.org\/10.1007\/978-3-319-77935-5_9"},{"key":"22_CR14","unstructured":"Fazlyab, M., Robey, A., Hassani, H., Morari, M., Pappas, G.J.: Efficient and accurate estimation of lipschitz constants for deep neural networks. arXiv preprint arXiv:1906.04893 (2019)"},{"key":"22_CR15","doi-asserted-by":"crossref","unstructured":"Fulton, N., Platzer, A.: Safe reinforcement learning via formal methods: toward safe control through proof and learning. In: Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI), pp. 6485\u20136492 (2018)","DOI":"10.1609\/aaai.v32i1.12107"},{"key":"22_CR16","unstructured":"Gao, S.: Quadcopter model. https:\/\/github.com\/dreal\/benchmarks"},{"issue":"42","key":"22_CR17","first-page":"1437","volume":"16","author":"J Garc\u00eda","year":"2015","unstructured":"Garc\u00eda, J., o Fern\u00e1ndez, F., et al.: A comprehensive survey on safe reinforcement learning. J. Mach. Learn. Res. 16(42), 1437\u20131480 (2015)","journal-title":"J. Mach. Learn. Res."},{"issue":"5s","key":"22_CR18","doi-asserted-by":"publisher","first-page":"106:1","DOI":"10.1145\/3358228","volume":"18","author":"C Huang","year":"2019","unstructured":"Huang, C., Fan, J., Li, W., Chen, X., Zhu, Q.: ReachNN: reachability analysis of neural-network controlled systems. ACM Trans. Embedded Comput. Syst. 18(5s), 106:1-106:22 (2019)","journal-title":"ACM Trans. Embedded Comput. Syst."},{"key":"22_CR19","doi-asserted-by":"crossref","unstructured":"Ivanov, R., Weimer, J., Alur, R., Pappas, G.J., Lee, I.: Verisig: verifying safety properties of hybrid systems with neural network controllers. In: Proceedings of the 22nd ACM International Conference on Hybrid Systems: Computation and Control (HSCC), pp. 169\u2013178 (2019)","DOI":"10.1145\/3302504.3311806"},{"key":"22_CR20","unstructured":"Jarvis-Wloszek, Z.: Lyapunov based analysis and controller synthesis for polynomial systems using sum-of-squares optimization. Ph.D. 
thesis, University of California (2003)"},{"key":"22_CR21","doi-asserted-by":"crossref","unstructured":"Klipp, E., Herwig, R., Kowald, A., Wierling, C., Lehrach, H.: Systems Biology in Practice: Concepts. Implementation and Application, Wiley-Blackwell (2005)","DOI":"10.1002\/3527603603"},{"key":"22_CR22","unstructured":"Ko\u010dvara, M., Stingl, M.: PENBMI user\u2019s guide (version 2.0) (2005). http:\/\/www.penopt.com"},{"key":"22_CR23","unstructured":"Lillicrap, T.P., et al.: Continuous control with deep reinforcement learning. In: Proceedings of the 4th International Conference on Learning Representations (ICLR) (2016)"},{"key":"22_CR24","unstructured":"Liu, W., Mehdipour, N., Belta, C.: Recurrent neural network controllers for signal temporal logic specifications subject to safety constraints (2020). https:\/\/arxiv.org\/abs\/2009.11468"},{"key":"22_CR25","unstructured":"Mittal, M., Gallieri, M., Quaglino, A., Salehian, S.S.M., Koutn\u00edk, J.: Neural Lyapunov model predictive control (2020). https:\/\/arxiv.org\/abs\/2002.10451"},{"issue":"8","key":"22_CR26","doi-asserted-by":"publisher","first-page":"1415","DOI":"10.1109\/TAC.2007.902736","volume":"52","author":"S Prajna","year":"2007","unstructured":"Prajna, S., Jadbabaie, A., Pappas, G.J.: A framework for worst-case and stochastic safety verification using barrier certificates. IEEE Trans. Autom. Control 52(8), 1415\u20131429 (2007)","journal-title":"IEEE Trans. Autom. Control"},{"issue":"2","key":"22_CR27","doi-asserted-by":"publisher","first-page":"310","DOI":"10.1109\/TAC.2003.823000","volume":"49","author":"S Prajna","year":"2004","unstructured":"Prajna, S., Parrilo, P.A., Rantzer, A.: Nonlinear control synthesis by convex optimization. IEEE Trans. Autom. Control 49(2), 310\u2013314 (2004)","journal-title":"IEEE Trans. Autom. 
Control"},{"key":"22_CR28","doi-asserted-by":"crossref","unstructured":"Pylorof, D., Bakolas, E.: Analysis and synthesis of nonlinear controllers for input constrained systems using semidefinite programming optimization. In: Proceedings of the 2016 American Control Conference (ACC), pp. 6959\u20136964 (2016)","DOI":"10.1109\/ACC.2016.7526769"},{"issue":"2","key":"22_CR29","doi-asserted-by":"publisher","first-page":"275","DOI":"10.1007\/s10514-018-9791-9","volume":"43","author":"H Ravanbakhsh","year":"2019","unstructured":"Ravanbakhsh, H., Sankaranarayanan, S.: Learning control Lyapunov functions from counterexamples and demonstrations. Auton. Rob. 43(2), 275\u2013307 (2019)","journal-title":"Auton. Rob."},{"key":"22_CR30","unstructured":"Richards, S.M., Berkenkamp, F., Krause, A.: The Lyapunov neural network: adaptive stability certification for safe learning of dynamic systems (2018). http:\/\/arxiv.org\/abs\/1808.00924"},{"key":"22_CR31","doi-asserted-by":"crossref","unstructured":"Ruan, W., Huang, X., Kwiatkowska, M.: Reachability analysis of deep neural networks with provable guarantees. In: Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI), pp. 2651\u20132659 (2018)","DOI":"10.24963\/ijcai.2018\/368"},{"key":"22_CR32","unstructured":"Sassi, M.A.B., Sankaranarayanan, S.: Stabilization of polynomial dynamical systems using linear programming based on bernstein polynomials (2015). arXiv preprint arXiv:1501.04578"},{"key":"22_CR33","doi-asserted-by":"crossref","unstructured":"Squires, E., Pierpaoli, P., Egerstedt, M.: Constructive barrier certificates with applications to fixed-wing aircraft collision avoidance. In: Proceedings of the IEEE Conference on Control Technology and Applications (CCTA), pp. 1656\u20131661 (2018)","DOI":"10.1109\/CCTA.2018.8511342"},{"key":"22_CR34","unstructured":"Szegedy, C., et al.: Intriguing properties of neural networks. 
In: Proceedings of the 2nd International Conference on Learning Representations (ICLR) (2014)"},{"key":"22_CR35","doi-asserted-by":"crossref","unstructured":"Tuncali, C.E., Kapinski, J., Ito, H., Deshmukh, J.V.: Reasoning about safety of learning-enabled components in autonomous cyber-physical systems. In: Proceedings of the 55th Annual Design Automation Conference (DAC), pp. 30:1\u201330:6 (2018)","DOI":"10.1145\/3195970.3199852"},{"key":"22_CR36","unstructured":"Turchetta, M., Kolobov, A., Shah, S., Krause, A., Agarwal, A.: Safe reinforcement learning via curriculum induction. In: Proceedings of the Annual Conference on Advances in Neural Information Processing Systems (NeurIPS), pp. 12151\u201312162 (2020)"},{"key":"22_CR37","doi-asserted-by":"crossref","unstructured":"Xiang, W., Tran, H.D., Rosenfeld, J.A., Johnson, T.T.: Reachable set estimation and safety verification for piecewise linear systems with neural network controllers. In: Proceedings of the Annual American Control Conference (ACC), pp. 1574\u20131579 (2018)","DOI":"10.23919\/ACC.2018.8431048"},{"key":"22_CR38","doi-asserted-by":"crossref","unstructured":"Zeng, X., Lin, W., Yang, Z., Chen, X., Wang, L.: Darboux-type barrier certificates for safety verification of nonlinear hybrid systems. In: Proceedings of the 2016 International Conference on Embedded Software (EMSOFT), pp. 1\u201310 (2016)","DOI":"10.1145\/2968478.2968484"},{"key":"22_CR39","doi-asserted-by":"crossref","unstructured":"Zhao, H., Zeng, X., Chen, T., Liu, Z., Woodcock, J.: Learning safe neural network controllers with barrier certificates. In: Proceedings of the International Symposium on the Dependable Software Engineering. Theories, Tools, and Applications (SETTA), pp. 177\u2013185 (2020)","DOI":"10.1007\/978-3-030-62822-2_11"},{"key":"22_CR40","doi-asserted-by":"publisher","unstructured":"Zhao, H., Zeng, X., Chen, T., Liu, Z., Woodcock, J.: Learning safe neural network controllers with barrier certificates. 
Formal Aspects Comput., 1\u201319 (2021). https:\/\/doi.org\/10.1007\/s00165-021-00544-5","DOI":"10.1007\/s00165-021-00544-5"},{"key":"22_CR41","doi-asserted-by":"crossref","unstructured":"Zhu, H., Xiong, Z., Magill, S., Jagannathan, S.: An inductive synthesis framework for verifiable reinforcement learning. In: Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pp. 686\u2013701 (2019)","DOI":"10.1145\/3314221.3314638"}],"container-title":["Lecture Notes in Computer Science","Computer Aided Verification"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/978-3-030-81685-8_22","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,4]],"date-time":"2023-01-04T18:43:18Z","timestamp":1672857798000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/978-3-030-81685-8_22"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021]]},"ISBN":["9783030816841","9783030816858"],"references-count":41,"URL":"https:\/\/doi.org\/10.1007\/978-3-030-81685-8_22","relation":{},"ISSN":["0302-9743","1611-3349"],"issn-type":[{"value":"0302-9743","type":"print"},{"value":"1611-3349","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021]]},"assertion":[{"value":"15 July 2021","order":1,"name":"first_online","label":"First Online","group":{"name":"ChapterHistory","label":"Chapter History"}},{"value":"CAV","order":1,"name":"conference_acronym","label":"Conference Acronym","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"International Conference on Computer Aided Verification","order":2,"name":"conference_name","label":"Conference Name","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"2021","order":5,"name":"conference_year","label":"Conference Year","group":{"name":"ConferenceInfo","label":"Conference 
Information"}},{"value":"20 July 2021","order":7,"name":"conference_start_date","label":"Conference Start Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"23 July 2021","order":8,"name":"conference_end_date","label":"Conference End Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"33","order":9,"name":"conference_number","label":"Conference Number","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"cav2021","order":10,"name":"conference_id","label":"Conference ID","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"http:\/\/i-cav.org\/2021\/","order":11,"name":"conference_url","label":"Conference URL","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Double-blind","order":1,"name":"type","label":"Type","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"EasyChair","order":2,"name":"conference_management_system","label":"Conference Management System","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"290","order":3,"name":"number_of_submissions_sent_for_review","label":"Number of Submissions Sent for Review","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"63","order":4,"name":"number_of_full_papers_accepted","label":"Number of Full Papers Accepted","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"0","order":5,"name":"number_of_short_papers_accepted","label":"Number of Short Papers Accepted","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"22% - The value is computed by the equation \"Number of Full 
Papers Accepted \/ Number of Submissions Sent for Review * 100\" and then rounded to a whole number.","order":6,"name":"acceptance_rate_of_full_papers","label":"Acceptance Rate of Full Papers","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"3","order":7,"name":"average_number_of_reviews_per_paper","label":"Average Number of Reviews per Paper","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"12","order":8,"name":"average_number_of_papers_per_reviewer","label":"Average Number of Papers per Reviewer","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"Yes","order":9,"name":"external_reviewers_involved","label":"External Reviewers Involved","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"16 tool papers and 5 invited papers are also included.","order":10,"name":"additional_info_on_review_process","label":"Additional Info on Review Process","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}}]}}