{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,12]],"date-time":"2026-02-12T10:55:54Z","timestamp":1770893754005,"version":"3.50.1"},"reference-count":28,"publisher":"MDPI AG","issue":"7","license":[{"start":{"date-parts":[[2023,6,30]],"date-time":"2023-06-30T00:00:00Z","timestamp":1688083200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Thibaut Th\u00e9ate"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>Classical reinforcement learning (RL) techniques are generally concerned with the design of decision-making policies driven by the maximisation of the expected outcome. Nevertheless, this approach does not take into consideration the potential risk associated with the actions taken, which may be critical in certain applications. To address that issue, the present research work introduces a novel methodology based on distributional RL to derive sequential decision-making policies that are sensitive to the risk, the latter being modelled by the tail of the return probability distribution. The core idea is to replace the Q function generally standing at the core of learning schemes in RL by another function, taking into account both the expected return and the risk. Named the risk-based utility function U, it can be extracted from the random return distribution Z naturally learnt by any distributional RL algorithm. This enables the spanning of the complete potential trade-off between risk minimisation and expected return maximisation, in contrast to fully risk-averse methodologies. Fundamentally, this research yields a truly practical and accessible solution for learning risk-sensitive policies with minimal modification to the distributional RL algorithm, with an emphasis on the interpretability of the resulting decision-making process.<\/jats:p>","DOI":"10.3390\/a16070325","type":"journal-article","created":{"date-parts":[[2023,7,3]],"date-time":"2023-07-03T00:42:46Z","timestamp":1688344966000},"page":"325","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":11,"title":["Risk-Sensitive Policy with Distributional Reinforcement Learning"],"prefix":"10.3390","volume":"16","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8218-5309","authenticated-orcid":false,"given":"Thibaut","family":"Th\u00e9ate","sequence":"first","affiliation":[{"name":"Department of Electrical Engineering and Computer Science, University of Li\u00e8ge, 4031 Li\u00e8ge, Belgium"}]},{"given":"Damien","family":"Ernst","sequence":"additional","affiliation":[{"name":"Department of Electrical Engineering and Computer Science, University of Li\u00e8ge, 4031 Li\u00e8ge, Belgium"},{"name":"Information Processing and Communications Laboratory, Institut Polytechnique de Paris, 91120 Paris, France"}]}],"member":"1968","published-online":{"date-parts":[[2023,6,30]]},"reference":[{"key":"ref_1","unstructured":"Sutton, R.S., and Barto, A.G. (2018). Reinforcement Learning: An Introduction, MIT Press."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"279","DOI":"10.1007\/BF00992698","article-title":"Technical Note: Q-Learning","volume":"8","author":"Watkins","year":"1992","journal-title":"Mach. Learn."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"2419","DOI":"10.1007\/s10994-021-05961-4","article-title":"Challenges of real-world reinforcement learning: Definitions, benchmarks and analysis","volume":"110","author":"Levine","year":"2021","journal-title":"Mach. Learn."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"16","DOI":"10.1038\/s41591-018-0310-5","article-title":"Guidelines for reinforcement learning in healthcare","volume":"25","author":"Gottesman","year":"2019","journal-title":"Nat. Med."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"114632","DOI":"10.1016\/j.eswa.2021.114632","article-title":"An application of deep reinforcement learning to algorithmic trading","volume":"173","author":"Ernst","year":"2021","journal-title":"Expert Syst. Appl."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"4915","DOI":"10.1109\/LRA.2021.3070252","article-title":"Recovery RL: Safe Reinforcement Learning with Learned Recovery Zones","volume":"6","author":"Thananjeyan","year":"2021","journal-title":"IEEE Robot. Autom. Lett."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"14043","DOI":"10.1109\/TITS.2021.3134702","article-title":"A Survey of Deep RL and IL for Autonomous Driving Policy Learning","volume":"23","author":"Zhu","year":"2022","journal-title":"IEEE Trans. Intell. Transp. Syst."},{"key":"ref_8","unstructured":"Bellemare, M.G., Dabney, W., and Munos, R. (2017, January 6\u201311). A Distributional Perspective on Reinforcement Learning. Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia."},{"key":"ref_9","first-page":"1437","article-title":"A comprehensive survey on safe reinforcement learning","volume":"16","year":"2015","journal-title":"J. Mach. Learn. Res."},{"key":"ref_10","unstructured":"Castro, D.D., Tamar, A., and Mannor, S. (July, January 26). Policy Gradients with Variance Related Risk Criteria. Proceedings of the 29th International Conference on Machine Learning, ICML 2012, Edinburgh, UK."},{"key":"ref_11","unstructured":"La, P., and Ghavamzadeh, M. (2013, January 5\u20138). Actor-Critic Algorithms for Risk-Sensitive MDPs. Proceedings of the Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013, Lake Tahoe, NV, USA."},{"key":"ref_12","unstructured":"Zhang, S., Liu, B., and Whiteson, S. (2021, January 2\u20139). Mean-Variance Policy Iteration for Risk-Averse Reinforcement Learning. Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Thirty-Third Conference on Innovative Applications of Artificial Intelligence, IAAI 2021, The Eleventh Symposium on Educational Advances in Artificial Intelligence, EAAI 2021, Virtual Event."},{"key":"ref_13","first-page":"1443","article-title":"Conditional Value-at-Risk for General Loss Distributions","volume":"7","author":"Rockafellar","year":"2001","journal-title":"Corp. Financ. Organ. J."},{"key":"ref_14","unstructured":"Chow, Y., Tamar, A., Mannor, S., and Pavone, M. (2015, January 7\u201312). Risk-Sensitive and Robust Decision-Making: A CVaR Optimization Approach. Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, Montreal, QC, Canada."},{"key":"ref_15","first-page":"167:1","article-title":"Risk-Constrained Reinforcement Learning with Percentile Risk Criteria","volume":"18","author":"Chow","year":"2017","journal-title":"J. Mach. Learn. Res."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Tamar, A., Glassner, Y., and Mannor, S. (2015, January 25\u201330). Optimizing the CVaR via Sampling. Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, Austin, TX, USA.","DOI":"10.1609\/aaai.v29i1.9561"},{"key":"ref_17","unstructured":"Rajeswaran, A., Ghotra, S., Ravindran, B., and Levine, S. (2017, January 24\u201326). EPOpt: Learning Robust Neural Network Policies Using Model Ensembles. Proceedings of the 5th International Conference on Learning Representations, ICLR 2017, Toulon, France."},{"key":"ref_18","unstructured":"Hiraoka, T., Imagawa, T., Mori, T., Onishi, T., and Tsuruoka, Y. (2019, January 8\u201314). Learning Robust Options by Conditional Value at Risk Optimization. Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, Vancouver, BC, Canada."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"1298","DOI":"10.1162\/NECO_a_00600","article-title":"Risk-Sensitive Reinforcement Learning","volume":"26","author":"Shen","year":"2014","journal-title":"Neural Comput."},{"key":"ref_20","unstructured":"Dabney, W., Ostrovski, G., Silver, D., and Munos, R. (2018, January 10\u201315). Implicit Quantile Networks for Distributional Reinforcement Learning. Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsm\u00e4ssan, Stockholm, Sweden."},{"key":"ref_21","unstructured":"Tang, Y.C., Zhang, J., and Salakhutdinov, R. (November, January 30). Worst Cases Policy Gradients. Proceedings of the 3rd Annual Conference on Robot Learning, CoRL 2019, Osaka, Japan."},{"key":"ref_22","unstructured":"Urp\u00ed, N.A., Curi, S., and Krause, A. (2021, January 3\u20137). Risk-Averse Offline Reinforcement Learning. Proceedings of the 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"859","DOI":"10.1007\/s10994-022-06187-8","article-title":"Safety-constrained reinforcement learning with a distributional safety critic","volume":"112","author":"Yang","year":"2022","journal-title":"Mach. Learn."},{"key":"ref_24","unstructured":"Pinto, L., Davidson, J., Sukthankar, R., and Gupta, A. (2017, January 6\u201311). Robust Adversarial Reinforcement Learning. Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia."},{"key":"ref_25","unstructured":"Qiu, W., Wang, X., Yu, R., Wang, R., He, X., An, B., Obraztsova, S., and Rabinovich, Z. (2021, January 6\u201314). RMIX: Learning Risk-Sensitive Policies for Cooperative Reinforcement Learning Agents. Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, Virtual."},{"key":"ref_26","unstructured":"Bellman, R. (1957). Dynamic Programming, Princeton University Press."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"199","DOI":"10.1016\/j.neucom.2023.02.049","article-title":"Distributional Reinforcement Learning with Unconstrained Monotonic Neural Networks","volume":"534","author":"Wehenkel","year":"2023","journal-title":"Neurocomputing"},{"key":"ref_28","unstructured":"Wehenkel, A., and Louppe, G. (2019, January 8\u201314). Unconstrained Monotonic Neural Networks. Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, Vancouver, BC, Canada."}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/16\/7\/325\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T20:03:37Z","timestamp":1760126617000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/16\/7\/325"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,6,30]]},"references-count":28,"journal-issue":{"issue":"7","published-online":{"date-parts":[[2023,7]]}},"alternative-id":["a16070325"],"URL":"https:\/\/doi.org\/10.3390\/a16070325","relation":{},"ISSN":["1999-4893"],"issn-type":[{"value":"1999-4893","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,6,30]]}}}