{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,5]],"date-time":"2026-02-05T06:06:58Z","timestamp":1770271618486,"version":"3.49.0"},"reference-count":65,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2023,9,30]],"date-time":"2023-09-30T00:00:00Z","timestamp":1696032000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,9,30]],"date-time":"2023-09-30T00:00:00Z","timestamp":1696032000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100002865","name":"Chongqing Science and Technology Commission","doi-asserted-by":"publisher","award":["cstc2021jscx-jbgsX0001"],"award-info":[{"award-number":["cstc2021jscx-jbgsX0001"]}],"id":[{"id":"10.13039\/501100002865","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Complex Intell. Syst."],"published-print":{"date-parts":[[2024,4]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Trajectory tracking is a key technology for controlling the autonomous vehicles effectively and stably to track the reference trajectory. How to handle the various constraints in trajectory tracking is very challenging. The recently proposed generalized exterior point method (GEP) shows high computational efficiency and closed-loop performance in solving the constrained trajectory tracking problem. However, the neural networks used in the GEP may suffer from the ill-conditioning issue during model training, which result in a slow or even non-converging training convergence process and the control output of the policy network being suboptimal or even severely constraint-violating. 
To deal effectively with the large-scale nonlinear state-wise constraints and avoid the ill-conditioning issue, we propose a model-based reinforcement learning (RL) method called the actor-critic objective penalty function method (ACOPFM) for trajectory tracking in autonomous driving. We adopt an integrated decision and control (IDC)-based planning and control scheme to transform the trajectory tracking problem into MPC-based nonlinear programming problems and embed the objective penalty function method into an actor-critic solution framework. The nonlinear programming problem is transformed into an unconstrained optimization problem and employed as the loss function for updating the policy network, and the ill-conditioning issue is avoided by alternately performing gradient descent and adaptively adjusting the penalty parameter. The convergence of ACOPFM is proved. The simulation results demonstrate that ACOPFM converges quickly and steadily to the optimal control strategy and performs well in the multi-lane test scenario.<\/jats:p>","DOI":"10.1007\/s40747-023-01238-6","type":"journal-article","created":{"date-parts":[[2023,9,30]],"date-time":"2023-09-30T06:02:06Z","timestamp":1696053726000},"page":"1715-1732","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":5,"title":["Actor-critic objective penalty function method: an adaptive strategy for trajectory tracking in autonomous driving"],"prefix":"10.1007","volume":"10","author":[{"given":"Bo","family":"Wang","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0514-1331","authenticated-orcid":false,"given":"Fusheng","family":"Bai","sequence":"additional","affiliation":[]},{"given":"Ke","family":"Zhang","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,9,30]]},"reference":[{"key":"1238_CR1","doi-asserted-by":"crossref","unstructured":"Badue Claudine, Guidolini 
R\u00e2nik, Carneiro Raphael\u00a0Vivacqua, Azevedo Pedro, Cardoso Vinicius\u00a0B, Forechi Avelino, Jesus Luan, Berriel Rodrigo, Paixao Thiago\u00a0M, Mutz Filipe, et\u00a0al (2021) Self-driving cars: A survey. Expert Systems with Applications, 165:113816","DOI":"10.1016\/j.eswa.2020.113816"},{"issue":"4","key":"1238_CR2","doi-asserted-by":"publisher","first-page":"1135","DOI":"10.1109\/TITS.2015.2498841","volume":"17","author":"D Gonz\u00e1lez","year":"2015","unstructured":"Gonz\u00e1lez D, P\u00e9rez J, Milan\u00e9s V, Nashashibi F (2015) A review of motion planning techniques for automated vehicles. IEEE Trans Intell Transp Syst 17(4):1135\u20131145","journal-title":"IEEE Trans Intell Transp Syst"},{"issue":"21","key":"1238_CR3","doi-asserted-by":"publisher","first-page":"7165","DOI":"10.3390\/s21217165","volume":"21","author":"Z Huang","year":"2021","unstructured":"Huang Z, Li H, Li W, Liu J, Huang C, Yang Z, Fang W (2021) A new trajectory tracking algorithm for autonomous vehicles based on model predictive control. Sensors 21(21):7165","journal-title":"Sensors"},{"issue":"4","key":"1238_CR4","doi-asserted-by":"publisher","first-page":"559","DOI":"10.1109\/TIV.2018.2874529","volume":"3","author":"C Chatzikomis","year":"2018","unstructured":"Chatzikomis C, Sorniotti A, Gruber P, Zanchetta M, Willans D, Balcombe B (2018) Comparison of path tracking and torque-vectoring controllers for autonomous electric vehicles. IEEE Transactions on Intelligent Vehicles 3(4):559\u2013570","journal-title":"IEEE Transactions on Intelligent Vehicles"},{"issue":"1","key":"1238_CR5","doi-asserted-by":"publisher","first-page":"419","DOI":"10.5194\/ms-12-419-2021","volume":"12","author":"L Li","year":"2021","unstructured":"Li L, Li J, Zhang S (2021) Review article: State-of-the-art trajectory tracking of autonomous vehicles. 
Mechanical Sciences 12(1):419\u2013432","journal-title":"Mechanical Sciences"},{"key":"1238_CR6","doi-asserted-by":"crossref","unstructured":"Shtessel Yuri, Edwards Christopher, Fridman Leonid, Levant Arie, et\u00a0al (2014) Sliding mode control and observation, volume\u00a010. Springer","DOI":"10.1007\/978-0-8176-4893-0"},{"issue":"11","key":"1238_CR7","doi-asserted-by":"publisher","first-page":"1163","DOI":"10.1016\/S0967-0661(01)00062-4","volume":"9","author":"Karl Johan \u00c5str\u00f6m and Tore H\u00e4gglund","year":"2001","unstructured":"Karl Johan \u00c5str\u00f6m and Tore H\u00e4gglund (2001) The future of pid control. Control Eng Pract 9(11):1163\u20131175","journal-title":"Control Eng Pract"},{"key":"1238_CR8","doi-asserted-by":"crossref","unstructured":"Gr\u00fcne Lars, Pannek J\u00fcrgen, Gr\u00fcne Lars, Pannek, J\u00fcrgen (2017) Nonlinear model predictive control. Springer","DOI":"10.1007\/978-3-319-46024-6"},{"issue":"3","key":"1238_CR9","first-page":"407","volume":"23","author":"J-K Liu","year":"2007","unstructured":"Liu J-K, Sun F-C (2007) Research and development on theory and algorithms of sliding mode control. Kongzhi Lilun yu Yingyong\/ Control Theory & Applications 23(3):407\u2013418","journal-title":"Kongzhi Lilun yu Yingyong\/ Control Theory & Applications"},{"issue":"7","key":"1238_CR10","doi-asserted-by":"publisher","first-page":"1063","DOI":"10.1109\/9.508917","volume":"41","author":"P Kachroo","year":"1996","unstructured":"Kachroo P, Tomizuka M (1996) Chattering reduction and error convergence in the sliding-mode control of a class of nonlinear systems. 
IEEE Trans Autom Control 41(7):1063\u20131068","journal-title":"IEEE Trans Autom Control"},{"key":"1238_CR11","doi-asserted-by":"publisher","first-page":"80","DOI":"10.1016\/j.oceaneng.2019.02.043","volume":"178","author":"B Huang","year":"2019","unstructured":"Huang B, Yang Q (2019) Double-loop sliding mode controller with a novel switching term for the trajectory tracking of work-class rovs. Ocean Eng 178:80\u201394","journal-title":"Ocean Eng"},{"key":"1238_CR12","doi-asserted-by":"publisher","first-page":"1079","DOI":"10.1007\/s11071-015-2551-x","volume":"84","author":"T Elmokadem","year":"2016","unstructured":"Elmokadem T, Zribi M, Youcef-Toumi K (2016) Trajectory tracking sliding mode control of underactuated auvs. Nonlinear Dyn 84:1079\u20131091","journal-title":"Nonlinear Dyn"},{"key":"1238_CR13","doi-asserted-by":"publisher","DOI":"10.1016\/j.ast.2019.105306","volume":"93","author":"M Labbadi","year":"2019","unstructured":"Labbadi M, Cherkaoui M (2019) Robust adaptive backstepping fast terminal sliding mode controller for uncertain quadrotor uav. Aerosp Sci Technol 93:105306","journal-title":"Aerosp Sci Technol"},{"key":"1238_CR14","doi-asserted-by":"crossref","unstructured":"Ge Q, Sun Q, Li SE, Zheng S, Wu W, Chen X (2021) Numerically stable dynamic bicycle model for discrete-time control. In: 2021 IEEE Intelligent Vehicles Symposium Workshops (IV Workshops), pp 128\u2013134. IEEE","DOI":"10.1109\/IVWorkshops54471.2021.9669260"},{"key":"1238_CR15","doi-asserted-by":"crossref","unstructured":"Mohan Tiwari Pyare, Janardhanan S, un Nabi Mashuq (2015) Rigid spacecraft attitude control using adaptive non-singular fast terminal sliding mode. 
Journal of Control, Automation and Electrical Systems 26:115\u2013124","DOI":"10.1007\/s40313-014-0164-0"},{"key":"1238_CR16","doi-asserted-by":"publisher","first-page":"619","DOI":"10.1007\/s40435-020-00666-3","volume":"9","author":"H Hassani","year":"2021","unstructured":"Hassani H, Mansouri A, Ahaitouf A (2021) Robust autonomous flight for quadrotor uav based on adaptive nonsingular fast terminal sliding mode control. Int J Dyn Control 9:619\u2013635","journal-title":"Int J Dyn Control"},{"key":"1238_CR17","doi-asserted-by":"crossref","unstructured":"Rupp Astrid, Stolz Michael (2017) Survey on control schemes for automated driving on highways. In Automated driving, pages 43\u201369. Springer","DOI":"10.1007\/978-3-319-31895-0_4"},{"issue":"6","key":"1238_CR18","doi-asserted-by":"publisher","first-page":"485","DOI":"10.1049\/iet-its.2016.0293","volume":"12","author":"L Nie","year":"2018","unstructured":"Nie L, Guan J, Chihua L, Zheng H, Yin Z (2018) Longitudinal speed control of autonomous vehicle based on a self-adaptive pid of radial basis function neural network. IET Intel Transp Syst 12(6):485\u2013494","journal-title":"IET Intel Transp Syst"},{"issue":"2","key":"1238_CR19","doi-asserted-by":"publisher","first-page":"141","DOI":"10.1177\/0278364906075328","volume":"26","author":"M Howard Thomas","year":"2007","unstructured":"Howard Thomas M, Alonzo K (2007) Optimal rough terrain trajectory generation for wheeled mobile robots. Int J Robot Res 26(2):141\u2013166","journal-title":"Int J Robot Res"},{"issue":"3","key":"1238_CR20","doi-asserted-by":"publisher","first-page":"556","DOI":"10.1109\/TCST.2010.2049203","volume":"19","author":"S Li","year":"2010","unstructured":"Li S, Li K, Rajamani R, Wang J (2010) Model predictive multi-objective vehicular adaptive cruise control. 
IEEE Trans Control Syst Technol 19(3):556\u2013566","journal-title":"IEEE Trans Control Syst Technol"},{"key":"1238_CR21","unstructured":"Sutton Richard\u00a0S, Barto Andrew\u00a0G (2018) Reinforcement learning: an introduction. MIT press"},{"key":"1238_CR22","doi-asserted-by":"crossref","unstructured":"Pal Constantin-Valentin, Leon Florin (2020) Brief survey of model-based reinforcement learning techniques. In 2020 24th International Conference on System Theory, Control and Computing (ICSTCC), pages 92\u201397. IEEE","DOI":"10.1109\/ICSTCC50638.2020.9259716"},{"issue":"3731","key":"1238_CR23","doi-asserted-by":"publisher","first-page":"34","DOI":"10.1126\/science.153.3731.34","volume":"153","author":"R Bellman","year":"1966","unstructured":"Bellman R (1966) Dynamic programming. Science 153(3731):34\u201337","journal-title":"Science"},{"issue":"6","key":"1238_CR24","first-page":"5046","volume":"68","author":"Z Xingwei","year":"2020","unstructured":"Xingwei Z, Bo Tao L, Qian HD (2020) Model-based actor-critic learning for optimal tracking control of robots with input saturation. IEEE Trans Industr Electron 68(6):5046\u20135056","journal-title":"IEEE Trans Industr Electron"},{"key":"1238_CR25","doi-asserted-by":"crossref","unstructured":"Yu Lingli, Shao Xuanya, Yan Xiaoxin (2017) Autonomous overtaking decision making of driverless bus based on deep q-learning method. In 2017 IEEE International Conference on Robotics and Biomimetics (ROBIO), pages 2267\u20132272. IEEE","DOI":"10.1109\/ROBIO.2017.8324756"},{"issue":"5","key":"1238_CR26","doi-asserted-by":"publisher","first-page":"4706","DOI":"10.1109\/TVT.2022.3151651","volume":"71","author":"X Tang","year":"2022","unstructured":"Tang X, Huang B, Liu T, Lin X (2022) Highway decision-making and motion planning for autonomous driving via soft actor-critic. 
IEEE Trans Veh Technol 71(5):4706\u20134717","journal-title":"IEEE Trans Veh Technol"},{"issue":"8","key":"1238_CR27","doi-asserted-by":"publisher","first-page":"3638","DOI":"10.1109\/TAC.2020.3024161","volume":"66","author":"M Zanon","year":"2020","unstructured":"Zanon M, Gros S (2020) Safe reinforcement learning using robust mpc. IEEE Trans Autom Control 66(8):3638\u20133652","journal-title":"IEEE Trans Autom Control"},{"issue":"2","key":"1238_CR28","doi-asserted-by":"publisher","first-page":"636","DOI":"10.1109\/TAC.2019.2913768","volume":"65","author":"S Gros","year":"2019","unstructured":"Gros S, Zanon M (2019) Data-driven economic nmpc using reinforcement learning. IEEE Trans Autom Control 65(2):636\u2013648","journal-title":"IEEE Trans Autom Control"},{"key":"1238_CR29","doi-asserted-by":"crossref","unstructured":"Gros S\u00e9bastien, Zanon Mario (2021) Reinforcement learning based on MPC and the stochastic policy gradient method. In 2021 American Control Conference (ACC), pages 1947\u20131952. IEEE","DOI":"10.23919\/ACC50511.2021.9482765"},{"issue":"2","key":"1238_CR30","first-page":"859","volume":"53","author":"G Yang","year":"2022","unstructured":"Yang G, Yangang R, Qi S, Eben LS, Haitong M, Jingliang D, Yifan D, Bo C (2022) Integrated decision and control: toward interpretable and computationally efficient driving intelligence. IEEE transactions on cybernetics 53(2):859\u2013873","journal-title":"IEEE transactions on cybernetics"},{"issue":"1","key":"1238_CR31","doi-asserted-by":"publisher","first-page":"33","DOI":"10.1109\/TIV.2016.2578706","volume":"1","author":"P Brian","year":"2016","unstructured":"Brian P, Michal \u010c, Zheng YS, Dmitry Y, Emilio F (2016) A survey of motion planning and control techniques for self-driving urban vehicles. 
IEEE Transactions on intelligent vehicles 1(1):33\u201355","journal-title":"IEEE Transactions on intelligent vehicles"},{"issue":"6","key":"1238_CR32","first-page":"4909","volume":"23","author":"KB Ravi","year":"2021","unstructured":"Ravi KB, Ibrahim S, Victor T, Patrick M, Al Sallab Ahmad A, Senthil Y, Patrick P (2021) Deep reinforcement learning for autonomous driving: A survey. IEEE Trans Intell Transp Syst 23(6):4909\u20134926","journal-title":"IEEE Trans Intell Transp Syst"},{"key":"1238_CR33","unstructured":"Fletcher R_ (1981) Practical methods of optimization: Vol. 2: Constrained optimization. JOHN WILEY & SONS, INC., ONE WILEY DR., SOMERSET, N. J. 08873, 1981, 224"},{"key":"1238_CR34","doi-asserted-by":"crossref","unstructured":"Charalambous Christakis (1980) A method to overcome the ill-conditioning problem of differentiable penalty functions. Operations Research, 28(3-part-ii):650\u2013667","DOI":"10.1287\/opre.28.3.650"},{"key":"1238_CR35","doi-asserted-by":"crossref","unstructured":"Fletcher Roger (1983) Penalty functions. Mathematical Programming The State of the Art, pages 87\u2013114","DOI":"10.1007\/978-3-642-68874-4_5"},{"issue":"1","key":"1238_CR36","doi-asserted-by":"publisher","first-page":"296","DOI":"10.1137\/0732012","volume":"32","author":"J-P Dussault","year":"1995","unstructured":"Dussault J-P (1995) Numerical stability and efficiency of penalty algorithms. SIAM J Numer Anal 32(1):296\u2013317","journal-title":"SIAM J Numer Anal"},{"issue":"3","key":"1238_CR37","doi-asserted-by":"publisher","first-page":"693","DOI":"10.1137\/0914044","volume":"14","author":"S Saarinen","year":"1993","unstructured":"Saarinen S, Bramley R, Cybenko G (1993) Ill-conditioning in neural network training problems. SIAM J Sci Comput 14(3):693\u2013714","journal-title":"SIAM J Sci Comput"},{"key":"1238_CR38","doi-asserted-by":"crossref","unstructured":"Zhang Yongke, Zhang Yongjun, Ye Wei (1995) Local-sparse connection multilayer networks. 
In Proceedings of ICNN\u201995-International Conference on Neural Networks, volume\u00a03, pages 1254\u20131257. IEEE","DOI":"10.1109\/ICNN.1995.487335"},{"key":"1238_CR39","unstructured":"Der\u00a0Smagt Patrick Van, Hirzinger Gerd (2002) Solving the ill-conditioning in neural network learning. In Neural networks: tricks of the trade, pages 193\u2013206. Springer"},{"issue":"1","key":"1238_CR40","doi-asserted-by":"publisher","first-page":"39","DOI":"10.1007\/s10107-010-0408-0","volume":"133","author":"H Byrd Richard","year":"2012","unstructured":"Byrd Richard H, Gabriel L-C, Jorge N (2012) A line search exact penalty method using steering rules. Math Program 133(1):39\u201373","journal-title":"Math Program"},{"issue":"133","key":"1238_CR41","doi-asserted-by":"publisher","first-page":"104","DOI":"10.1090\/S0025-5718-1976-0400702-1","volume":"30","author":"C Rheinboldt Werner","year":"1976","unstructured":"Rheinboldt Werner C (1976) On measures of ill-conditioning for nonlinear equations. Math Comput 30(133):104\u2013111","journal-title":"Math Comput"},{"issue":"3","key":"1238_CR42","doi-asserted-by":"publisher","first-page":"339","DOI":"10.1137\/1021052","volume":"21","author":"G Peters","year":"1979","unstructured":"Peters G, Wilkinson James H (1979) Inverse iteration, ill-conditioned equations and newton\u2019s method. SIAM Rev 21(3):339\u2013360","journal-title":"SIAM Rev"},{"issue":"5","key":"1238_CR43","doi-asserted-by":"publisher","first-page":"554","DOI":"10.1109\/31.1783","volume":"35","author":"KM Peter","year":"1988","unstructured":"Peter KM, Chua Leon O (1988) Neural networks for nonlinear programming. 
IEEE Transactions on Circuits and Systems 35(5):554\u2013562","journal-title":"IEEE Transactions on Circuits and Systems"},{"issue":"1","key":"1238_CR44","doi-asserted-by":"publisher","first-page":"277","DOI":"10.1016\/j.ejor.2017.07.025","volume":"265","author":"L Jie","year":"2018","unstructured":"Jie L, Gupte A, Huang Y (2018) A mean-risk mixed integer nonlinear program for transportation network protection. Eur J Oper Res 265(1):277\u2013289","journal-title":"Eur J Oper Res"},{"key":"1238_CR45","unstructured":"Nocedal Jorge, Wright Stephen\u00a0J (2006) Numerical Optimization, 2nd edition. Springer"},{"key":"1238_CR46","doi-asserted-by":"crossref","unstructured":"Luenberger David G, Ye Yinyu (2021) Linear and Nonlinear Programming, 5th edition. Springer Nature Switzerland AG","DOI":"10.1007\/978-3-030-85450-8_6"},{"key":"1238_CR47","unstructured":"Murray W (1967) Ill-conditioning in barrier and penalty functions arising in constrained nonlinear programming. In Proceedings of the Sixth International Symposium on Mathematical Programming"},{"issue":"5","key":"1238_CR48","doi-asserted-by":"publisher","first-page":"344","DOI":"10.1287\/mnsc.13.5.344","volume":"13","author":"I Zangwill Willard","year":"1967","unstructured":"Zangwill Willard I (1967) Non-linear programming via penalty functions. Manage Sci 13(5):344\u2013358","journal-title":"Manage Sci"},{"issue":"1","key":"1238_CR49","doi-asserted-by":"publisher","first-page":"178","DOI":"10.1007\/BF01581639","volume":"19","author":"F Coleman Thomas","year":"1980","unstructured":"Coleman Thomas F, Conn Andrew R (1980) Second-order conditions for an exact penalty function. 
Math Program 19(1):178\u2013185","journal-title":"Math Program"},{"issue":"3","key":"1238_CR50","doi-asserted-by":"publisher","first-page":"404","DOI":"10.1016\/0377-2217(90)90017-6","volume":"46","author":"F K\u00f6rner","year":"1990","unstructured":"K\u00f6rner F (1990) On the numerical realization of the exact penalty method for quadratic programming algorithms. Eur J Oper Res 46(3):404\u2013408","journal-title":"Eur J Oper Res"},{"issue":"3","key":"1238_CR51","doi-asserted-by":"publisher","first-page":"686","DOI":"10.1016\/0377-2217(93)E0339-Y","volume":"83","author":"M Mongeau","year":"1995","unstructured":"Mongeau M, Sartenaer A (1995) Automatic decrease of the penalty parameter in exact penalty function methods. Eur J Oper Res 83(3):686\u2013699","journal-title":"Eur J Oper Res"},{"issue":"1","key":"1238_CR52","doi-asserted-by":"publisher","first-page":"83","DOI":"10.1137\/0705006","volume":"5","author":"D Morrison David","year":"1968","unstructured":"Morrison David D (1968) Optimization by least squares. SIAM J Numer Anal 5(1):83\u201388","journal-title":"SIAM J Numer Anal"},{"key":"1238_CR53","doi-asserted-by":"crossref","unstructured":"Meng Z, Qiying H, Dang C, Yang X (2004) An objective penalty function method for nonlinear programming. Appl Math Lett 17(6):683\u2013689","DOI":"10.1016\/S0893-9659(04)90105-X"},{"key":"1238_CR54","doi-asserted-by":"crossref","unstructured":"Meng Z, Qiying H, Dang C (2009) A penalty function algorithm with objective parameters for nonlinear mathematical programming. Journal of Industrial & Management Optimization 5(3):585","DOI":"10.3934\/jimo.2009.5.585"},{"issue":"2","key":"1238_CR55","doi-asserted-by":"publisher","first-page":"691","DOI":"10.1007\/s10898-012-9900-9","volume":"56","author":"Z Meng","year":"2013","unstructured":"Meng Z, Dang C, Jiang M, Xinsheng X, Shen R (2013) Exactness and algorithm of an objective penalty function. 
J Global Optim 56(2):691\u2013711","journal-title":"J Global Optim"},{"key":"1238_CR56","doi-asserted-by":"publisher","first-page":"559","DOI":"10.1016\/j.neucom.2019.09.119","volume":"458","author":"J Min","year":"2021","unstructured":"Min J, Meng Z, Zhou G, Shen R (2021) On the smoothing of the norm objective penalty function for two-cardinality sparse constrained optimization problems. Neurocomputing 458:559\u2013565","journal-title":"Neurocomputing"},{"issue":"5","key":"1238_CR57","doi-asserted-by":"publisher","first-page":"1216","DOI":"10.1016\/j.automatica.2013.02.003","volume":"49","author":"A Anil","year":"2013","unstructured":"Anil A, Humberto G, Shankar SS, Claire T (2013) Provably safe and robust learning-based model predictive control. Automatica 49(5):1216\u20131226","journal-title":"Automatica"},{"key":"1238_CR58","doi-asserted-by":"crossref","unstructured":"Koller Torsten, Berkenkamp Felix, Turchetta Matteo, Krause Andreas (2018) Learning-based model predictive control for safe exploration. In 2018 IEEE conference on decision and control (CDC), pages 6059\u20136066. IEEE","DOI":"10.1109\/CDC.2018.8619572"},{"key":"1238_CR59","doi-asserted-by":"crossref","unstructured":"Zanon Mario, Gros S\u00e9bastien, Bemporad Alberto (2019) Practical reinforcement learning of stabilizing economic mpc. In 2019 18th European Control Conference (ECC), pages 2258\u20132263. IEEE","DOI":"10.23919\/ECC.2019.8795816"},{"key":"1238_CR60","doi-asserted-by":"publisher","DOI":"10.1016\/j.apenergy.2021.118346","volume":"309","author":"J Arroyo","year":"2022","unstructured":"Arroyo J, Manna C, Spiessens F, Helsen L (2022) Reinforced model predictive control (rl-mpc) for building energy management. 
Appl Energy 309:118346","journal-title":"Appl Energy"},{"issue":"3","key":"1238_CR61","doi-asserted-by":"publisher","first-page":"335","DOI":"10.1016\/0005-1098(89)90002-2","volume":"25","author":"E Garcia Carlos","year":"1989","unstructured":"Garcia Carlos E, Prett David M, Manfred M (1989) Model predictive control: Theory and practice-a survey. Automatica 25(3):335\u2013348","journal-title":"Automatica"},{"issue":"9","key":"1238_CR62","doi-asserted-by":"publisher","first-page":"3866","DOI":"10.1109\/TCYB.2020.2999556","volume":"50","author":"B Karg","year":"2020","unstructured":"Karg B, Lucia S (2020) Efficient representation and approximation of model predictive control laws via deep learning. IEEE Transactions on Cybernetics 50(9):3866\u20133878","journal-title":"IEEE Transactions on Cybernetics"},{"key":"1238_CR63","doi-asserted-by":"crossref","unstructured":"Chen Jianyu, Li Shengbo\u00a0Eben, Tomizuka Masayoshi (2021) Interpretable end-to-end urban autonomous driving with latent deep reinforcement learning. IEEE Transactions on Intelligent Transportation Systems","DOI":"10.1109\/TITS.2020.3046646"},{"key":"1238_CR64","doi-asserted-by":"crossref","unstructured":"Ren Yangang, Duan Jingliang, Li Shengbo\u00a0Eben, Guan Yang, Sun Qi (2020) Improving generalization of reinforcement learning with minimax distributional soft actor-critic. In 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC), pages 1\u20136. IEEE","DOI":"10.1109\/ITSC45102.2020.9294300"},{"key":"1238_CR65","doi-asserted-by":"crossref","unstructured":"Ma Haitong, Chen Jianyu, Eben Shengbo, Lin Ziyu, Guan Yang, Ren Yangang, Zheng Sifa (2021) Model-based constrained reinforcement learning using generalized control barrier function. In 2021 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 4552\u20134559. 
IEEE","DOI":"10.1109\/IROS51168.2021.9636468"}],"container-title":["Complex &amp; Intelligent Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-023-01238-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s40747-023-01238-6\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-023-01238-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,10,29]],"date-time":"2024-10-29T18:10:41Z","timestamp":1730225441000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s40747-023-01238-6"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,9,30]]},"references-count":65,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2024,4]]}},"alternative-id":["1238"],"URL":"https:\/\/doi.org\/10.1007\/s40747-023-01238-6","relation":{},"ISSN":["2199-4536","2198-6053"],"issn-type":[{"value":"2199-4536","type":"print"},{"value":"2198-6053","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,9,30]]},"assertion":[{"value":"19 March 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"3 September 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"30 September 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that there are no conflicts of interest regarding the publication of this 
paper.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflicts of interest"}}]}}