{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,21]],"date-time":"2025-08-21T18:20:45Z","timestamp":1755800445185,"version":"3.44.0"},"reference-count":62,"publisher":"Oxford University Press (OUP)","issue":"2","license":[{"start":{"date-parts":[[2025,5,10]],"date-time":"2025-05-10T00:00:00Z","timestamp":1746835200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/pages\/standard-publication-reuse-rights"}],"funder":[{"DOI":"10.13039\/501100000038","name":"Natural Sciences and Engineering Research Council of Canada","doi-asserted-by":"publisher","award":["RGPIN-2022-04140"],"award-info":[{"award-number":["RGPIN-2022-04140"]}],"id":[{"id":"10.13039\/501100000038","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Discovery Launch Supplement","award":["DGECR-2022-00098"],"award-info":[{"award-number":["DGECR-2022-00098"]}]},{"name":"Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto to E.A."},{"name":"Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,5,10]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>We consider a risk-aware optimal control problem, where the objective is the conditional value-at-risk of a maximum of stagewise and terminal costs along a finite time-horizon. Previous techniques for this problem rely on dynamic programming (DP), which is notorious for scalability issues. Since approximate DP (ADP) in risk-neutral settings can alleviate such issues, we study an ADP method for the aforementioned risk-aware setting that relies on empirical sampling and function approximation in a reproducing kernel Hilbert space. Our contribution is the derivation of sup-norm approximation error bounds that are pointwise functions of the sampling process, using techniques from functional analysis, probability theory and a relaxed Lipschitz condition. The performance of the method is evaluated using a single-stage reservoir management problem, and the effect of different algorithm parameters on the error is illustrated.<\/jats:p>","DOI":"10.1093\/imamci\/dnaf014","type":"journal-article","created":{"date-parts":[[2025,5,27]],"date-time":"2025-05-27T00:39:55Z","timestamp":1748306395000},"source":"Crossref","is-referenced-by-count":0,"title":["Error analysis for approximate CVaR-optimal control with a maximum cost"],"prefix":"10.1093","volume":"42","author":[{"given":"Evan","family":"Arsenault","sequence":"first","affiliation":[{"name":"Toronto Hydro , 500 Commissioners St., M4M 3N7 Toronto ,","place":["Canada"]}]},{"given":"Margaret P","family":"Chapman","sequence":"additional","affiliation":[{"name":"Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto , 35 St George St., M5S 1A4 Toronto ,","place":["Canada"]}]}],"member":"286","published-online":{"date-parts":[[2025,5,27]]},"reference":[{"key":"2025082006364874900_ref1","doi-asserted-by":"crossref","first-page":"2724","DOI":"10.1016\/j.automatica.2008.03.027","article-title":"Probabilistic reachability and safety for controlled discrete time stochastic hybrid systems","volume":"44","author":"Abate","year":"2008","journal-title":"Autom. J. IFAC"},{"key":"2025082006364874900_ref2","first-page":"4357","article-title":"A distributional analysis of sampling-based reinforcement learning algorithms","volume-title":"International Conference on Artificial Intelligence and Statistics","author":"Amortila","year":"2020"},{"volume-title":"Approximate CVaR Optimal Control with a Maximum Cost.","year":"2023","author":"Arsenault","key":"2025082006364874900_ref3"},{"key":"2025082006364874900_ref4","doi-asserted-by":"crossref","first-page":"203","DOI":"10.1111\/1467-9965.00068","article-title":"Coherent measures of risk","volume":"9","author":"Artzner","year":"1999","journal-title":"Math. Finance"},{"volume-title":"Probability and Measure Theory","year":"2000","author":"Ash","key":"2025082006364874900_ref5"},{"key":"2025082006364874900_ref6","doi-asserted-by":"crossref","first-page":"880","DOI":"10.1287\/moor.1080.0324","article-title":"A learning algorithm for risk-sensitive cost","volume":"33","author":"Basu","year":"2008","journal-title":"Math. Oper. Res."},{"key":"2025082006364874900_ref7","doi-asserted-by":"crossref","first-page":"35","DOI":"10.1007\/s00186-021-00746-w","article-title":"Minimizing spectral risk measures applied to Markov decision processes","volume":"94","author":"B\u00e4uerle","year":"2021","journal-title":"Math. Methods Oper. Res."},{"key":"2025082006364874900_ref8","doi-asserted-by":"crossref","first-page":"361","DOI":"10.1007\/s00186-011-0367-0","article-title":"Markov decision processes with average-value-at-risk criteria","volume":"74","author":"B\u00e4uerle","year":"2011","journal-title":"Math. Methods Oper. Res."},{"key":"2025082006364874900_ref9","doi-asserted-by":"crossref","first-page":"105","DOI":"10.1287\/moor.2013.0601","article-title":"More risk-sensitive Markov decision processes","volume":"39","author":"B\u00e4uerle","year":"2014","journal-title":"Math. Oper. Res."},{"volume-title":"Dynamic Programming","year":"1957","author":"Bellman","key":"2025082006364874900_ref10"},{"key":"2025082006364874900_ref11","doi-asserted-by":"crossref","first-page":"247","DOI":"10.1090\/S0025-5718-1959-0107376-8","article-title":"Functional approximations and dynamic programming","volume":"13","author":"Bellman","year":"1959","journal-title":"Math. Comp."},{"volume-title":"Stochastic Optimal Control: The Discrete-Time Case","year":"1996","author":"Bertsekas","key":"2025082006364874900_ref12"},{"key":"2025082006364874900_ref13","doi-asserted-by":"crossref","first-page":"339","DOI":"10.1016\/S0167-6911(01)00152-9","article-title":"A sensitivity formula for risk-sensitive cost and the actor\u2013critic algorithm","volume":"44","author":"Borkar","year":"2001","journal-title":"Syst. Control Lett."},{"key":"2025082006364874900_ref14","doi-asserted-by":"crossref","first-page":"294","DOI":"10.1287\/moor.27.2.294.324","article-title":"Q-learning for risk-sensitive control","volume":"27","author":"Borkar","year":"2002","journal-title":"Math. Oper. Res."},{"key":"2025082006364874900_ref15","doi-asserted-by":"crossref","first-page":"447","DOI":"10.1137\/S0363012997331639","article-title":"The O.D.E. method for convergence of stochastic approximation and reinforcement learning","volume":"38","author":"Borkar","year":"2000","journal-title":"SIAM J. Control Optim."},{"key":"2025082006364874900_ref16","doi-asserted-by":"crossref","first-page":"335","DOI":"10.1109\/ICRA.2016.7487152","article-title":"Risk aversion in finite Markov decision processes using total cost criteria and average value at risk","volume-title":"2016 IEEE International Conference on Robotics and Automation (ICRA)","author":"Carpin","year":"2016"},{"key":"2025082006364874900_ref17","doi-asserted-by":"crossref","first-page":"3935","DOI":"10.1137\/13093902X","article-title":"Risk-averse control of undiscounted transient Markov models","volume":"52","author":"\u00c7avu\u015f","year":"2014","journal-title":"SIAM J. Control Optim."},{"key":"2025082006364874900_ref18","doi-asserted-by":"crossref","first-page":"6521","DOI":"10.1109\/TAC.2021.3131149","article-title":"Risk-sensitive safety analysis using conditional value-at-risk","volume":"67","author":"Chapman","year":"2022","journal-title":"IEEE Trans. Autom. Control"},{"key":"2025082006364874900_ref19","doi-asserted-by":"crossref","first-page":"3720","DOI":"10.1109\/TAC.2022.3195381","article-title":"On optimizing the conditional value-at-risk of a maximum cost for risk-averse safety analysis","volume":"68","author":"Chapman","year":"2023","journal-title":"IEEE Trans. Autom. Control"},{"volume-title":"Infinite Dimensional Analysis: A Hitchhiker\u2019s Guide","year":"2006","author":"Charalambos","key":"2025082006364874900_ref20"},{"key":"2025082006364874900_ref21","doi-asserted-by":"crossref","first-page":"333","DOI":"10.1146\/annurev-control-060117-104941","article-title":"Hamilton\u2013Jacobi reachability: some recent theoretical advances and applications in unmanned airspace management","volume":"1","author":"Chen","year":"2018","journal-title":"Annu. Rev. Control Robot. Auton. Syst."},{"key":"2025082006364874900_ref22","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511618796","volume-title":"Learning Theory: An Approximation Theory Viewpoint","author":"Cucker","year":"2007"},{"key":"2025082006364874900_ref23","doi-asserted-by":"crossref","first-page":"273","DOI":"10.1080\/17442508.2014.939979","article-title":"Approximation of average cost Markov decision processes using empirical distributions and concentration inequalities","volume":"87","author":"Dufour","year":"2015","journal-title":"Stochastics"},{"key":"2025082006364874900_ref24","article-title":"Error propagation for approximate policy and value iteration","volume":"23","author":"Farahmand","year":"2010","journal-title":"Adv. Neural Inform. Process. Syst."},{"key":"2025082006364874900_ref25","doi-asserted-by":"crossref","first-page":"38","DOI":"10.1002\/nav.21743","article-title":"On the reduction of total-cost and average-cost MDPs to discounted MDPs","volume":"66","author":"Feinberg","year":"2019","journal-title":"Naval Res. Logist. (NRL)"},{"volume-title":"Real Analysis: Modern Techniques and Their Applications","year":"1999","author":"Folland","key":"2025082006364874900_ref26"},{"key":"2025082006364874900_ref27","doi-asserted-by":"crossref","first-page":"1141","DOI":"10.1137\/21M1389808","article-title":"Convergence of recursive stochastic algorithms using Wasserstein divergence","volume":"3","author":"Gupta","year":"2021","journal-title":"SIAM J. Math. Data Sci."},{"key":"2025082006364874900_ref28","doi-asserted-by":"crossref","DOI":"10.1109\/TAC.2024.3362686","article-title":"Probabilistic contraction analysis of iterated random operators","volume-title":"IEEE Trans. Autom. Control","author":"Gupta","year":"2024"},{"key":"2025082006364874900_ref29","doi-asserted-by":"crossref","DOI":"10.1007\/0-387-34471-3","volume-title":"Extreme Value Theory: An Introduction","author":"de Haan","year":"2006"},{"key":"2025082006364874900_ref30","doi-asserted-by":"crossref","first-page":"1569","DOI":"10.1137\/140969221","article-title":"A convex analytic approach to risk-aware Markov decision processes","volume":"53","author":"Haskell","year":"2015","journal-title":"SIAM J. Control Optim."},{"key":"2025082006364874900_ref31","doi-asserted-by":"crossref","first-page":"402","DOI":"10.1287\/moor.2015.0733","article-title":"Empirical dynamic programming","volume":"41","author":"Haskell","year":"2016","journal-title":"Math. Oper. Res."},{"key":"2025082006364874900_ref32","doi-asserted-by":"crossref","first-page":"115","DOI":"10.1109\/TAC.2019.2907414","article-title":"A universal empirical dynamic programming algorithm for continuous state MDPs","volume":"65","author":"Haskell","year":"2019","journal-title":"IEEE Trans. Autom. Control"},{"key":"2025082006364874900_ref33","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1007\/s00186-005-0438-1","article-title":"Lipschitz continuity of value functions in Markovian decision processes","volume":"62","author":"Hinderer","year":"2005","journal-title":"Math. Methods Oper. Res."},{"key":"2025082006364874900_ref34","doi-asserted-by":"crossref","first-page":"1314","DOI":"10.1109\/TAC.2020.2989702","article-title":"Stochastic approximation for risk-aware Markov decision processes","volume":"66","author":"Huang","year":"2021","journal-title":"IEEE Trans. Autom. Control"},{"key":"2025082006364874900_ref35","doi-asserted-by":"crossref","first-page":"554","DOI":"10.1287\/moor.2017.0872","article-title":"Risk-averse approximate dynamic programming with quantile-based risk measures","volume":"43","author":"Jiang","year":"2018","journal-title":"Math. Oper. Res."},{"key":"2025082006364874900_ref36","doi-asserted-by":"crossref","first-page":"1144","DOI":"10.1137\/18M1208058","article-title":"Robustness to incorrect system models in stochastic control","volume":"58","author":"Kara","year":"2020","journal-title":"SIAM J. Control Optim."},{"key":"2025082006364874900_ref37","article-title":"Robustness to incorrect models and data-driven learning in average-cost optimal stochastic control","volume":"139","author":"Kara","year":"2022","journal-title":"Automatica, art. no. 110179"},{"key":"2025082006364874900_ref38","doi-asserted-by":"crossref","first-page":"245","DOI":"10.1007\/s10287-005-0007-3","article-title":"Integrated chance constraints: reduced forms and an algorithm","volume":"3","author":"Klein Haneveld","year":"2006","journal-title":"Comput. Manage. Sci."},{"key":"2025082006364874900_ref39","first-page":"1800","article-title":"Risk-averse learning by temporal difference methods with Markov risk measures","volume":"22","author":"K\u00f6se","year":"2021","journal-title":"J. Mach. Learn. Res."},{"volume-title":"Introduction to Topological Manifolds","year":"2010","author":"Lee","key":"2025082006364874900_ref40"},{"key":"2025082006364874900_ref41","doi-asserted-by":"crossref","first-page":"1310","DOI":"10.1109\/LCSYS.2021.3092196","article-title":"Fitted value iteration in continuous MDPs with state dependent action sets","volume":"6","author":"Li","year":"2021","journal-title":"IEEE Control Syst. Lett."},{"key":"2025082006364874900_ref42","first-page":"664","article-title":"An analysis of reinforcement learning with function approximation","author":"Melo","year":"2008"},{"key":"2025082006364874900_ref43","first-page":"815","article-title":"Finite-time bounds for fitted value iteration","volume":"1","author":"Munos","year":"2008","journal-title":"J. Mach. Learn. Res."},{"key":"2025082006364874900_ref44","first-page":"430","article-title":"Distributionally robust control of constrained stochastic systems","volume":"61","author":"van Parys","year":"2015","journal-title":"IEEE Trans. Autom. Control"},{"key":"2025082006364874900_ref45","doi-asserted-by":"crossref","first-page":"255","DOI":"10.1007\/s10994-015-5484-1","article-title":"Policy gradient in Lipschitz Markov decision processes","volume":"100","author":"Pirotta","year":"2015","journal-title":"Mach. Learn."},{"key":"2025082006364874900_ref46","doi-asserted-by":"crossref","first-page":"234","DOI":"10.1287\/moor.1060.0188","article-title":"Performance loss bounds for approximate value iteration with state aggregation","volume":"31","author":"van Roy","year":"2006","journal-title":"Math. Oper. Res."},{"key":"2025082006364874900_ref47","doi-asserted-by":"crossref","first-page":"235","DOI":"10.1007\/s10107-010-0393-3","article-title":"Risk-averse dynamic programming for Markov decision processes","volume":"125","author":"Ruszczy\u0144ski","year":"2010","journal-title":"Math. Program."},{"key":"2025082006364874900_ref48","doi-asserted-by":"crossref","DOI":"10.1137\/1.9780898718751","volume-title":"Lectures on Stochastic Programming: Modeling and Theory","author":"Shapiro","year":"2009"},{"key":"2025082006364874900_ref49","doi-asserted-by":"crossref","first-page":"3652","DOI":"10.1137\/120899005","article-title":"Risk-sensitive Markov control processes","volume":"51","author":"Shen","year":"2013","journal-title":"SIAM J. Control Optim."},{"key":"2025082006364874900_ref50","doi-asserted-by":"crossref","first-page":"153","DOI":"10.1007\/s00365-006-0659-y","article-title":"Learning theory estimates via integral operators and their approximations","volume":"26","author":"Smale","year":"2007","journal-title":"Constr. Approx."},{"key":"2025082006364874900_ref51","doi-asserted-by":"crossref","first-page":"2555","DOI":"10.1109\/TCST.2023.3274843","article-title":"On exponential utility and conditional value-at-risk as risk-averse performance criteria","volume":"31","author":"Smith","year":"2023","journal-title":"IEEE Trans. Control Syst. Technol."},{"key":"2025082006364874900_ref52","doi-asserted-by":"crossref","DOI":"10.1007\/978-0-387-77242-4","volume-title":"Support Vector Machines","author":"Steinwart","year":"2008"},{"key":"2025082006364874900_ref53","first-page":"1057","article-title":"Policy gradient methods for reinforcement learning with function approximation","volume":"12","author":"Sutton","year":"1999","journal-title":"Adv. Neural Inform. Process. Syst."},{"key":"2025082006364874900_ref54","article-title":"State-based confidence bounds for data-driven stochastic reachability using Hilbert space embeddings","volume":"138","author":"Thorpe","year":"2022","journal-title":"Automatica, art. no. 110146"},{"key":"2025082006364874900_ref55","doi-asserted-by":"crossref","first-page":"185","DOI":"10.1023\/A:1022689125041","article-title":"Asynchronous stochastic approximation and Q-learning","volume":"16","author":"Tsitsiklis","year":"1994","journal-title":"Mach. Learn."},{"key":"2025082006364874900_ref56","doi-asserted-by":"crossref","first-page":"59","DOI":"10.1023\/A:1018008221616","article-title":"Feature-based methods for large scale dynamic programming","volume":"22","author":"Tsitsiklis","year":"1996","journal-title":"Mach. Learn."},{"key":"2025082006364874900_ref57","doi-asserted-by":"crossref","DOI":"10.1017\/9781108231596","volume-title":"High-Dimensional Probability: An Introduction with Applications in Data Science","author":"Vershynin","year":"2018"},{"key":"2025082006364874900_ref58","article-title":"Risk-averse autonomous systems: a brief history and recent developments from the perspective of optimal control","volume":"311","author":"Wang","year":"2022","journal-title":"Artif. Intell., art. no. 103743"},{"key":"2025082006364874900_ref59","first-page":"279","article-title":"Q-learning","volume":"8","author":"Watkins","year":"1992","journal-title":"Mach. Learn."},{"key":"2025082006364874900_ref60","doi-asserted-by":"crossref","first-page":"2863","DOI":"10.23919\/ACC53348.2022.9867285","article-title":"CVaR-based safety analysis in the infinite time horizon setting","volume-title":"2022 American Control Conference (ACC)","author":"Wei","year":"2022"},{"key":"2025082006364874900_ref61","doi-asserted-by":"crossref","first-page":"764","DOI":"10.2307\/1426972","article-title":"Risk-sensitive linear\/quadratic\/gaussian control","volume":"13","author":"Whittle","year":"1981","journal-title":"Adv. Appl. Probab."},{"key":"2025082006364874900_ref62","doi-asserted-by":"crossref","first-page":"3135","DOI":"10.1109\/TAC.2018.2790261","article-title":"Approximate value iteration for risk-aware Markov decision processes","volume":"63","author":"Yu","year":"2018","journal-title":"IEEE Trans. Autom. Control"}],"container-title":["IMA Journal of Mathematical Control and Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/imamci\/article-pdf\/42\/2\/dnaf014\/63362813\/dnaf014.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/imamci\/article-pdf\/42\/2\/dnaf014\/63362813\/dnaf014.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,20]],"date-time":"2025-08-20T10:37:02Z","timestamp":1755686222000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/imamci\/article\/doi\/10.1093\/imamci\/dnaf014\/8150967"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,5,10]]},"references-count":62,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2025,5,10]]}},"URL":"https:\/\/doi.org\/10.1093\/imamci\/dnaf014","relation":{},"ISSN":["0265-0754","1471-6887"],"issn-type":[{"type":"print","value":"0265-0754"},{"type":"electronic","value":"1471-6887"}],"subject":[],"published-other":{"date-parts":[[2025,6]]},"published":{"date-parts":[[2025,5,10]]},"article-number":"dnaf014"}}