{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,10]],"date-time":"2026-04-10T19:45:30Z","timestamp":1775850330713,"version":"3.50.1"},"reference-count":82,"publisher":"Wiley","issue":"2","license":[{"start":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T00:00:00Z","timestamp":1750291200000},"content-version":"vor","delay-in-days":18,"URL":"http:\/\/onlinelibrary.wiley.com\/termsAndConditions#vor"}],"content-domain":{"domain":["onlinelibrary.wiley.com"],"crossmark-restriction":true},"short-container-title":["Intelligent Sys in Account"],"published-print":{"date-parts":[[2025,6]]},"abstract":"<jats:title>ABSTRACT<\/jats:title><jats:p>Financial portfolio management focuses on the maximization of several objectives in a trading period related not only to the risk and performance of the portfolio but also to other objectives such as the environment, social, and governance (ESG) score of the portfolio. Regrettably, classic methods such as the Markowitz model do not take into account ESG scores but only the risk and performance of the portfolio. Moreover, the assumptions made by this model about the financial returns make it unfeasible to be applicable to markets with high volatility such as the technological sector. This paper investigates the application of deep reinforcement learning (DRL) for ESG financial portfolio management. DRL agents circumvent the issue of classic models in the sense that they do not make assumptions like the financial returns being normally distributed and are able to deal with any information like the ESG score if they are configured to gain a reward that makes an objective better. However, the performance of DRL agents has high variability, and it is very sensible to the value of their hyperparameters. 
Bayesian optimization is a class of methods suited to the optimization of black\u2010box functions, that is, functions whose analytical expression is unknown and whose evaluations are noisy and expensive. The hyperparameter tuning problem of DRL algorithms fits this scenario perfectly. As training an agent for even a single objective is a very expensive process, requiring millions of timesteps, instead of optimizing a single objective that mixes a risk\u2010performance metric with an ESG metric, we choose to keep the objectives separate and solve the multi\u2010objective problem, obtaining an optimal Pareto set of portfolios that represents the best trade\u2010off between the Sharpe ratio and the mean ESG score of the portfolio and leaving the choice of the final portfolio to the investor. We conduct our experiments using OpenAI Gym environments adapted from the FinRL platform. The experiments are carried out on the Dow Jones Industrial Average (DJIA) and NASDAQ markets, evaluated in terms of the Sharpe ratio achieved by the agent and the mean ESG score of the portfolio. We compare the performance of the obtained Pareto sets in terms of hypervolume, illustrating how the portfolios represent the best trade\u2010off between the Sharpe ratio and the mean ESG score. 
Finally, we show the usefulness of our proposed methodology by comparing the obtained hypervolume with that achieved by a random search over the DRL hyperparameter space.<\/jats:p>","DOI":"10.1002\/isaf.70008","type":"journal-article","created":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T07:08:49Z","timestamp":1750316929000},"update-policy":"https:\/\/doi.org\/10.1002\/crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["Multi\u2010Objective Bayesian Optimization of Deep Reinforcement Learning for Environmental, Social, and Governance (ESG) Financial Portfolio Management"],"prefix":"10.1002","volume":"32","author":[{"given":"Eduardo\u00a0C.","family":"Garrido\u2010Merch\u00e1n","sequence":"first","affiliation":[{"name":"Faculty of Economics and Business (ICADE) Universidad Pontificia Comillas  Madrid Spain"},{"name":"Institute for Research in Technology (IIT) Universidad Pontificia Comillas  Madrid Spain"}]},{"given":"Sol","family":"Mora\u2010Figueroa","sequence":"additional","affiliation":[{"name":"Faculty of Economics and Business (ICADE) Universidad Pontificia Comillas  Madrid Spain"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0315-3753","authenticated-orcid":false,"given":"Mar\u00eda","family":"Coronado\u2010Vaca","sequence":"additional","affiliation":[{"name":"Faculty of Economics and Business (ICADE) Universidad Pontificia Comillas  Madrid Spain"}]}],"member":"311","published-online":{"date-parts":[[2025,6,19]]},"reference":[{"key":"e_1_2_9_2_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jfs.2021.100869"},{"key":"e_1_2_9_3_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-16564-1_18"},{"key":"e_1_2_9_4_1","doi-asserted-by":"publisher","DOI":"10.1080\/20430795.2022.2106934"},{"key":"e_1_2_9_5_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ejor.2011.07.011"},{"key":"e_1_2_9_6_1","doi-asserted-by":"crossref","unstructured":"Bansal M. A.Krizhevsky andA.Ogale.2018. 
\u201cChauffeurnet: Learning to Drive by Imitating the Best and Synthesizing the Worst.\u201d arXiv preprint arXiv:1812.03079.","DOI":"10.15607\/RSS.2019.XV.031"},{"key":"e_1_2_9_7_1","doi-asserted-by":"publisher","DOI":"10.1093\/rof\/rfac033"},{"key":"e_1_2_9_8_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ins.2011.12.001"},{"key":"e_1_2_9_9_1","doi-asserted-by":"publisher","DOI":"10.2469\/faj.v48.n5.28"},{"key":"e_1_2_9_10_1","unstructured":"Brockman G. V.Cheung L.Pettersson et\u00a0al.2016. \u201cOpenai Gym.\u201darXiv preprint arXiv:1606.01540."},{"key":"e_1_2_9_11_1","doi-asserted-by":"crossref","unstructured":"Buehler H. L.Gonon J.Teichmann B.Wood B.Mohan andJ.Kochems.2020. \u201cDeep Hedging: Hedging Derivatives Under Generic Market Frictions Using Reinforcement Learning.\u201d Available atSSRN.","DOI":"10.2139\/ssrn.3355706"},{"key":"e_1_2_9_12_1","doi-asserted-by":"publisher","DOI":"10.1177\/0007650315570701"},{"key":"e_1_2_9_13_1","doi-asserted-by":"publisher","DOI":"10.3905\/jfds.2020.1.052"},{"key":"e_1_2_9_14_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.insmatheco.2021.03.017"},{"key":"e_1_2_9_15_1","first-page":"230","volume-title":"Deep Reinforcement Learning for Algorithmic Trading","author":"Cartea \u00c1.","year":"2023"},{"key":"e_1_2_9_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICSESS47205.2019.9040728"},{"key":"e_1_2_9_17_1","first-page":"1","article-title":"Pseudo\u2010Model\u2010Free Hedging for Variable Annuities via Deep Reinforcement Learning","author":"Chong W. 
F.","year":"2021","journal-title":"Annals of Actuarial Science"},{"key":"e_1_2_9_18_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10203-021-00364-5"},{"key":"e_1_2_9_19_1","doi-asserted-by":"publisher","DOI":"10.2469\/faj.v61.n2.2716"},{"key":"e_1_2_9_20_1","doi-asserted-by":"publisher","DOI":"10.1111\/j.1540\u20106261.1992.tb04398.x"},{"key":"e_1_2_9_21_1","doi-asserted-by":"publisher","DOI":"10.1016\/0304-405X(93)90023-5"},{"key":"e_1_2_9_22_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jfineco.2014.10.010"},{"key":"e_1_2_9_23_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.iref.2020.07.013"},{"key":"e_1_2_9_24_1","doi-asserted-by":"publisher","DOI":"10.1561\/2200000071"},{"key":"e_1_2_9_25_1","unstructured":"Ganesh P.andP.Rakheja.2018. \u201cDeep Reinforcement Learning in High\u2010Frequency Trading.\u201d arXiv preprint arXiv:1809.01506."},{"issue":"1","key":"e_1_2_9_26_1","first-page":"1437","article-title":"A Comprehensive Survey on Safe Reinforcement Learning","volume":"16","author":"Garc\u00eda J.","year":"2015","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_2_9_27_1","doi-asserted-by":"publisher","DOI":"10.1088\/2515-7620\/acd0f8"},{"key":"e_1_2_9_28_1","unstructured":"Garrido Merch\u00e1nEduardo Cesar2021Advanced Methods for Bayesian Optimization in Complex Scenarios. Doctoral Thesis. Universidad Aut\u00f3noma de Madrid (UAM) Higher Polytechnique School Computer Science Department"},{"key":"e_1_2_9_29_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ejor.2016.10.043"},{"key":"e_1_2_9_30_1","doi-asserted-by":"publisher","DOI":"10.3905\/jpm.2019.45.5.069"},{"key":"e_1_2_9_31_1","doi-asserted-by":"crossref","unstructured":"Gimeno R.andC. I.Gonz\u00e1lez.2022. \u201cThe Role of a Green Factor in Stock Prices. When Fama & French Go Green.\u201dBanco de Espa\u00f1a Working PaperNo. 
2207 Available atSSRN.","DOI":"10.2139\/ssrn.4064848"},{"key":"e_1_2_9_32_1","unstructured":"G\u00f6rgenMaximilianWilkensMarcoOhlsenHenrikCARIMA\u2013 A Capital Market\u2010Based Approach to Quantifying and Managing Transition Risks (UA & VfU)2020"},{"key":"e_1_2_9_33_1","doi-asserted-by":"publisher","DOI":"10.1080\/1350486X.2020.1714455"},{"key":"e_1_2_9_34_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.rfe.2015.03.004"},{"key":"e_1_2_9_35_1","doi-asserted-by":"publisher","DOI":"10.1111\/mafi.12382"},{"key":"e_1_2_9_36_1","doi-asserted-by":"publisher","DOI":"10.3390\/jrfm16030201"},{"key":"e_1_2_9_37_1","doi-asserted-by":"publisher","DOI":"10.3905\/jpm.2019.45.4.067"},{"key":"e_1_2_9_38_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2023.119556"},{"key":"e_1_2_9_39_1","unstructured":"Jiang Z. D.Xu andJ.Liang.2017. \u201cA Deep Reinforcement Learning Framework for the Financial Portfolio Management Problem.\u201d arXiv preprint arXiv:1706.10059."},{"key":"e_1_2_9_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/IntelliSys.2017.8324237"},{"key":"e_1_2_9_41_1","doi-asserted-by":"publisher","DOI":"10.1108\/JRF\u201005\u20102019\u20100075"},{"key":"e_1_2_9_42_1","doi-asserted-by":"publisher","DOI":"10.1038\/s41591\u2010018\u20100213\u20105"},{"key":"e_1_2_9_43_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jestch.2021.01.007"},{"key":"e_1_2_9_44_1","doi-asserted-by":"publisher","DOI":"10.1038\/nature14539"},{"key":"e_1_2_9_45_1","doi-asserted-by":"crossref","unstructured":"Li J. R.Rao andJ.Shi.2018. \u201cLearning to Trade with Deep Actor\u2013Critic Methods.\u201d In 2018 11th International Symposium on Computational Intelligence and Design (ISCID) 2 66\u201371. IEEE.","DOI":"10.1109\/ISCID.2018.10116"},{"key":"e_1_2_9_46_1","unstructured":"Li X. Y.Li Y.Zhan andX.\u2010Y.Liu.2019. 
\u201cOptimistic Bull or Pessimistic Bear: Adaptive Deep Reinforcement Learning for Stock Portfolio Allocation.\u201d arXiv preprint arXiv:1907.01503."},{"key":"e_1_2_9_47_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10479\u2010020\u201003554\u20103"},{"key":"e_1_2_9_48_1","doi-asserted-by":"publisher","DOI":"10.2307\/1926735"},{"key":"e_1_2_9_49_1","unstructured":"Liu X.\u2010Y. Z.Xiong S.Zhong H.Yang andA.Walid.2018. \u201cPractical Deep Reinforcement Learning Approach for Stock Trading.\u201d arXiv preprint arXiv:1811.07522."},{"key":"e_1_2_9_50_1","doi-asserted-by":"crossref","unstructured":"Liu X.\u2010Y. H.Yang Q.Chen et\u00a0al.2020. \u201cFinRL: A Deep Reinforcement Learning Library for Automated Stock Trading in Quantitative Finance.\u201d arXiv preprint arXiv:2011.09607.","DOI":"10.2139\/ssrn.3737257"},{"key":"e_1_2_9_51_1","doi-asserted-by":"crossref","unstructured":"Liu X.\u2010Y. H.Yang J.Gao andC. D.Wang.2021. \u201cFinRL: Deep Reinforcement Learning Framework to Automate Trading in Quantitative Finance.\u201d InProceedings of the Second ACM International Conference on AI in Finance 1\u20139.","DOI":"10.1145\/3490354.3494366"},{"key":"e_1_2_9_52_1","doi-asserted-by":"crossref","unstructured":"Mao G.andX.\u2010Y.2021. \u201cExplainable Deep Reinforcement Learning for Portfolio Management: An Empirical Approach.\u201d In Proceedings of the Second ACM International Conference on AI in Finance. 1\u20139.","DOI":"10.1145\/3490354.3494415"},{"key":"e_1_2_9_53_1","doi-asserted-by":"publisher","DOI":"10.1109\/CIFEr52523.2022.9776048"},{"issue":"1","key":"e_1_2_9_54_1","first-page":"77","article-title":"Portfolio Selection","volume":"7","author":"Markovitz H. 
M.","year":"1952","journal-title":"Journal of Finance"},{"key":"e_1_2_9_55_1","doi-asserted-by":"publisher","DOI":"10.1016\/B978-1-78548-201-4.50015-5"},{"key":"e_1_2_9_56_1","doi-asserted-by":"publisher","DOI":"10.2469\/faj.v45.n1.31"},{"key":"e_1_2_9_57_1","doi-asserted-by":"publisher","DOI":"10.1038\/nature14236"},{"key":"e_1_2_9_58_1","doi-asserted-by":"publisher","DOI":"10.2307\/1910098"},{"key":"e_1_2_9_59_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-12423-5_13"},{"key":"e_1_2_9_60_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.bar.2017.10.003"},{"key":"e_1_2_9_61_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jfineco.2020.11.001"},{"key":"e_1_2_9_62_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10479-018-2921-0"},{"key":"e_1_2_9_63_1","doi-asserted-by":"crossref","unstructured":"Roncalli T. T.Le Guenedal F.Lepetit T.Roncalli andT.Sekine.2020. \u201cMeasuring and Managing Carbon Risk in Investment Portfolios. arXiv preprint arXiv:2008.13198.","DOI":"10.2139\/ssrn.3681266"},{"key":"e_1_2_9_64_1","doi-asserted-by":"publisher","DOI":"10.1111\/1540\u20106261.00453"},{"key":"e_1_2_9_65_1","unstructured":"Sadighian J.2019. \u201cDeep Reinforcement Learning in Cryptocurrency Market Making.\u201darXiv preprint arXiv:1911.08647."},{"key":"e_1_2_9_66_1","doi-asserted-by":"publisher","DOI":"10.3390\/app10041506"},{"key":"e_1_2_9_67_1","unstructured":"Schulman J. F.Wolski P.Dhariwal A.Radford andO.Klimov.2017. \u201cProximal Policy Optimization Algorithms.\u201d arXiv preprint arXiv:1707.06347."},{"key":"e_1_2_9_68_1","doi-asserted-by":"publisher","DOI":"10.1080\/0015198X.2020.1723390"},{"key":"e_1_2_9_69_1","doi-asserted-by":"publisher","DOI":"10.1111\/j.1540\u20106261.1964.tb02865.x"},{"key":"e_1_2_9_70_1","doi-asserted-by":"publisher","DOI":"10.3905\/jpm.1994.409501"},{"key":"e_1_2_9_71_1","volume-title":"Managing Downside Risk in Financial Markets: Theory, Practice, and Implementation","author":"Sortino F. 
A.","year":"2001"},{"key":"e_1_2_9_72_1","volume-title":"Reinforcement Learning: An Introduction","author":"Sutton R. S.","year":"2018"},{"key":"e_1_2_9_73_1","first-page":"1057","article-title":"Policy Gradient Methods for Reinforcement Learning with Function Approximation","volume":"12","author":"Sutton R. S.","year":"1999","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_2_9_74_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.cie.2014.07.005"},{"key":"e_1_2_9_75_1","doi-asserted-by":"crossref","unstructured":"Treynor J. L.1961. \u201cMarket Value Time and Risk.\u201d Time and Risk (August 8 1961). Available at SSRN.","DOI":"10.2139\/ssrn.2600356"},{"key":"e_1_2_9_76_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.dss.2019.113097"},{"key":"e_1_2_9_77_1","unstructured":"Whelan T. U.Atz T.Van Holt andC.Clark.2021. \u201cESG and Financial Performance: Uncovering the Relationship by Aggregating Evidence From 1 000 Plus Studies Published Between 2015\u20132020.\u201dNYU Stern Center for Sustainable Business and Rockefeller Asset Management 1:2015\u20132020."},{"key":"e_1_2_9_78_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0377-2217(03)00172-3"},{"key":"e_1_2_9_79_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.techfore.2022.121977"},{"issue":"1","key":"e_1_2_9_80_1","first-page":"40","article-title":"Calmar Ratio: A Smoother Tool","volume":"20","author":"Young T. W.","year":"1991","journal-title":"Futures"},{"key":"e_1_2_9_81_1","doi-asserted-by":"publisher","DOI":"10.1093\/rof\/rfac045"},{"key":"e_1_2_9_82_1","doi-asserted-by":"publisher","DOI":"10.3905\/jfds.2020.1.030"},{"key":"e_1_2_9_83_1","unstructured":"Zhu S. I.Ng andZ.Chen.2019. 
\u201cCausal Discovery with Reinforcement Learning.\u201darXiv preprint arXiv:1906.04477."}],"container-title":["Intelligent Systems in Accounting, Finance and Management"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/pdf\/10.1002\/isaf.70008","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,26]],"date-time":"2025-06-26T08:24:15Z","timestamp":1750926255000},"score":1,"resource":{"primary":{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/10.1002\/isaf.70008"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6]]},"references-count":82,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2025,6]]}},"alternative-id":["10.1002\/isaf.70008"],"URL":"https:\/\/doi.org\/10.1002\/isaf.70008","archive":["Portico"],"relation":{},"ISSN":["1550-1949","2160-0074"],"issn-type":[{"value":"1550-1949","type":"print"},{"value":"2160-0074","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,6]]},"assertion":[{"value":"2023-07-31","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-05-12","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-06-19","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}],"article-number":"e70008"}}