{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T01:19:30Z","timestamp":1760145570226,"version":"build-2065373602"},"reference-count":52,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2024,7,30]],"date-time":"2024-07-30T00:00:00Z","timestamp":1722297600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"European Union\u2019s Horizon 2020 research and innovation program","award":["848011"],"award-info":[{"award-number":["848011"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["MAKE"],"abstract":"<jats:p>In medicine, dynamic treatment regimes (DTRs) have emerged to guide personalized treatment decisions for patients, accounting for their unique characteristics. However, existing methods for determining optimal DTRs face limitations, often due to reliance on linear models unsuitable for complex disease analysis and a focus on outcome prediction over treatment effect estimation. To overcome these challenges, decision tree-based reinforcement learning approaches have been proposed. Our study aims to evaluate the performance and feasibility of such algorithms: tree-based reinforcement learning (T-RL), DTR-Causal Tree (DTR-CT), DTR-Causal Forest (DTR-CF), stochastic tree-based reinforcement learning (SL-RL), and Q-learning with Random Forest. Using real-world clinical data, we conducted experiments to compare algorithm performances. Evaluation metrics included the proportion of correctly assigned patients to recommended treatments and the empirical mean with standard deviation of expected counterfactual outcomes based on estimated optimal treatment strategies. 
This research not only highlights the potential of decision tree-based reinforcement learning for dynamic treatment regimes but also contributes to advancing personalized medicine by offering nuanced and effective treatment recommendations.<\/jats:p>","DOI":"10.3390\/make6030088","type":"journal-article","created":{"date-parts":[[2024,8,1]],"date-time":"2024-08-01T18:04:59Z","timestamp":1722535499000},"page":"1798-1817","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Learning Optimal Dynamic Treatment Regime from Observational Clinical Data through Reinforcement Learning"],"prefix":"10.3390","volume":"6","author":[{"given":"Seyum","family":"Abebe","sequence":"first","affiliation":[{"name":"European Centre for Living Technology, Ca\u2019 Foscari University of Venice, 30123 Venice, Italy"}]},{"given":"Irene","family":"Poli","sequence":"additional","affiliation":[{"name":"European Centre for Living Technology, Ca\u2019 Foscari University of Venice, 30123 Venice, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8491-5421","authenticated-orcid":false,"given":"Roger D.","family":"Jones","sequence":"additional","affiliation":[{"name":"European Centre for Living Technology, Ca\u2019 Foscari University of Venice, 30123 Venice, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4204-1009","authenticated-orcid":false,"given":"Debora","family":"Slanzi","sequence":"additional","affiliation":[{"name":"European Centre for Living Technology, Ca\u2019 Foscari University of Venice, 30123 Venice, Italy"}]}],"member":"1968","published-online":{"date-parts":[[2024,7,30]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"1127","DOI":"10.1016\/j.numecd.2019.07.017","article-title":"Diabetic kidney disease: New clinical and therapeutic issues. 
Joint position statement of the Italian Diabetes Society and the Italian Society of Nephrology on \u201cThe natural history of diabetic kidney disease and treatment of hyperglycemia in patients with type 2 diabetes and impaired renal function\u201d","volume":"29","author":"Pugliese","year":"2019","journal-title":"Nutr. Metab. Cardiovasc. Dis."},{"key":"ref_2","unstructured":"Koller, D., and Friedman, N. (2009). Probabilistic Graphical Models: Principles and Techniques, MIT Press."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"1700391","DOI":"10.1183\/13993003.00391-2017","article-title":"What is precision medicine?","volume":"50","author":"Fuchs","year":"2017","journal-title":"Eur. Respir. J."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"694","DOI":"10.1377\/hlthaff.2017.1624","article-title":"Precision medicine: From science to value","volume":"37","author":"Ginsburg","year":"2018","journal-title":"Health Aff."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"1393","DOI":"10.1016\/0270-0255(86)90088-6","article-title":"A new approach to causal inference in mortality studies with a sustained exposure period\u2014Application to control of the healthy worker survivor effect","volume":"7","author":"Robins","year":"1986","journal-title":"Math. Model."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"2379","DOI":"10.1080\/03610929408831393","article-title":"Correcting for non-compliance in randomized trials using structural nested mean models","volume":"23","author":"Robins","year":"1994","journal-title":"Commun. Stat. Theory Methods"},{"key":"ref_7","unstructured":"Robins, J.M. Causal inference from complex longitudinal data. 
Proceedings of the Latent Variable Modeling and Applications to Causality, Lecture Notes in Statistics."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"331","DOI":"10.1111\/1467-9868.00389","article-title":"Optimal dynamic treatment regimes","volume":"65","author":"Murphy","year":"2003","journal-title":"J. R. Stat. Soc. Ser. B (Stat. Methodol.)"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Chakraborty, B., and Moodie, E.E. (2013). Statistical Methods for Dynamic Treatment Regimes, Springer.","DOI":"10.1007\/978-1-4614-7428-9"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"447","DOI":"10.1146\/annurev-statistics-022513-115553","article-title":"Dynamic treatment regimes","volume":"1","author":"Chakraborty","year":"2014","journal-title":"Annu. Rev. Stat. Its Appl."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"64","DOI":"10.1377\/hlthaff.20.6.64","article-title":"Improving chronic illness care: Translating evidence into action","volume":"20","author":"Wagner","year":"2001","journal-title":"Health Aff."},{"key":"ref_12","unstructured":"Robins, J.M. Optimal structural nested models for optimal sequential decisions. Proceedings of the Second Seattle Symposium in Biostatistics: Analysis of Correlated Data."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"1410","DOI":"10.1198\/016214501753382327","article-title":"Marginal mean models for dynamic regimes","volume":"96","author":"Murphy","year":"2001","journal-title":"J. Am. Stat. Assoc."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"629","DOI":"10.1002\/cjs.11162","article-title":"Q-learning for estimating optimal dynamic treatment rules from observational data","volume":"40","author":"Moodie","year":"2012","journal-title":"Can. J. 
Stat."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"1","DOI":"10.18637\/jss.v080.i02","article-title":"Dynamic treatment regimen estimation via regression-based techniques: Introducing R package DTRreg","volume":"80","author":"Wallace","year":"2017","journal-title":"J. Stat. Softw."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Tsiatis, A.A., Davidian, M., Holloway, S.T., and Laber, E.B. (2019). Dynamic Treatment Regimes: Statistical Methods for Precision Medicine, CRC Press.","DOI":"10.1201\/9780429192692"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"van der Laan, M.J., Petersen, M.L., and Joffe, M.M. (2005). History-adjusted marginal structural models and statically-optimal dynamic treatment regimens. Int. J. Biostat., 1.","DOI":"10.2202\/1557-4679.1003"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"279","DOI":"10.1007\/BF00992698","article-title":"Q-learning","volume":"8","author":"Watkins","year":"1992","journal-title":"Mach. Learn."},{"key":"ref_19","unstructured":"Murphy, S.A. (2024, May 27). A Generalization Error for Q-Learning. Available online: https:\/\/www.jmlr.org\/papers\/volume6\/murphy05a\/murphy05a.pdf."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Mahar, R.K., McGuinness, M.B., Chakraborty, B., Carlin, J.B., IJzerman, M.J., and Simpson, J.A. (2021). A scoping review of studies using observational data to optimise dynamic treatment regimens. BMC Med. Res. Methodol., 21.","DOI":"10.1186\/s12874-021-01211-2"},{"key":"ref_21","unstructured":"Blumlein, T., Persson, J., and Feuerriegel, S. (2022, January 5\u20136). Learning optimal dynamic treatment regimes using causal tree methods in medicine. Proceedings of the Machine Learning for Healthcare Conference. 
PMLR, Durham, NC, USA."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"145","DOI":"10.1111\/biom.12539","article-title":"Adaptive contrast weighted learning for multi-stage multi-treatment decision-making","volume":"73","author":"Tao","year":"2017","journal-title":"Biometrics"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"501","DOI":"10.1093\/biomet\/asv028","article-title":"Tree-based methods for individualized treatment regimes","volume":"102","author":"Laber","year":"2015","journal-title":"Biometrika"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"895","DOI":"10.1111\/biom.12354","article-title":"Using decision lists to construct interpretable and parsimonious treatment regimes","volume":"71","author":"Zhang","year":"2015","journal-title":"Biometrics"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"1541","DOI":"10.1080\/01621459.2017.1345743","article-title":"Interpretable dynamic treatment regimes","volume":"113","author":"Zhang","year":"2018","journal-title":"J. Am. Stat. Assoc."},{"key":"ref_26","unstructured":"Lakkaraju, H., and Rudin, C. (2017, January 20\u201322). Learning cost-effective and interpretable treatment regimes. Proceedings of the Artificial Intelligence and Statistics. PMLR, Fort Lauderdale, FL, USA."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"229","DOI":"10.1007\/BF00058680","article-title":"Learning decision lists","volume":"2","author":"Rivest","year":"1987","journal-title":"Mach. Learn."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"1914","DOI":"10.1214\/18-AOAS1137","article-title":"Tree-based reinforcement learning for estimating optimal dynamic treatment regimes","volume":"12","author":"Tao","year":"2018","journal-title":"Ann. Appl. 
Stat."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"421","DOI":"10.1080\/01621459.2020.1819294","article-title":"Stochastic tree search for estimating optimal dynamic treatment regimes","volume":"116","author":"Sun","year":"2021","journal-title":"J. Am. Stat. Assoc."},{"key":"ref_30","unstructured":"Min, J., and Elliott, L.T. (2022). Q-learning with online random forests. arXiv."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Alyass, A., Turcotte, M., and Meyre, D. (2015). From big data analysis to personalized medicine for all: Challenges and opportunities. BMC Med. Genom., 8.","DOI":"10.1186\/s12920-015-0108-y"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"3","DOI":"10.3892\/br.2017.922","article-title":"Personalized medicine could transform healthcare","volume":"7","author":"Mathur","year":"2017","journal-title":"Biomed. Rep."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"S31","DOI":"10.1093\/ibd\/izz078","article-title":"Challenges in IBD research: Precision medicine","volume":"25","author":"Denson","year":"2019","journal-title":"Inflamm. Bowel Dis."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"459","DOI":"10.1016\/S0196-0644(97)70217-8","article-title":"Risk stratification of patients with syncope","volume":"29","author":"Martin","year":"1997","journal-title":"Ann. Emerg. 
Med."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"1484","DOI":"10.1001\/jamaoncol.2018.1940","article-title":"Implementation challenges for risk-stratified screening in the era of precision medicine","volume":"4","author":"Roberts","year":"2018","journal-title":"JAMA Oncol."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"41","DOI":"10.1093\/biomet\/70.1.41","article-title":"The central role of the propensity score in observational studies for causal effects","volume":"70","author":"Rosenbaum","year":"1983","journal-title":"Biometrika"},{"key":"ref_37","first-page":"599","article-title":"Estimation of the causal effects of time-varying exposures","volume":"553","author":"Robins","year":"2009","journal-title":"Longitud. Data Anal."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1038\/s41584-020-00538-2","article-title":"Machine learning in precision medicine: Lessons to learn","volume":"17","author":"Plant","year":"2021","journal-title":"Nat. Rev. Rheumatol."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"805","DOI":"10.1002\/bimj.202100077","article-title":"Optimal dynamic treatment regime estimation using information extraction from unstructured clinical text","volume":"64","author":"Zhou","year":"2022","journal-title":"Biom. J."},{"key":"ref_40","unstructured":"Breiman, L., Friedman, J.H., Olshen, R.A., and Stone, C.J. (1984). Classification and Regression Trees, Routledge."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"106","DOI":"10.1080\/01621459.1995.10476493","article-title":"Analysis of semiparametric regression models for repeated outcomes in the presence of missing data","volume":"90","author":"Robins","year":"1995","journal-title":"J. Am. Stat. Assoc."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"935","DOI":"10.1080\/01621459.1998.10473750","article-title":"Bayesian CART model search","volume":"93","author":"Chipman","year":"1998","journal-title":"J. Am. Stat. 
Assoc."},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"44","DOI":"10.1198\/106186007X180426","article-title":"Bayesian CART: Prior specification and posterior simulation","volume":"16","author":"Wu","year":"2007","journal-title":"J. Comput. Graph. Stat."},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"7353","DOI":"10.1073\/pnas.1510489113","article-title":"Recursive partitioning for heterogeneous causal effects","volume":"113","author":"Athey","year":"2016","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"1482","DOI":"10.1093\/ndt\/gfw193.01","article-title":"Baseline Data from the Multinational Prospective Cohort Study for Validation of Biomarkers (Provalid)","volume":"31","author":"Mayer","year":"2016","journal-title":"Nephrol. Dial. Transplant."},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"181","DOI":"10.1159\/000487500","article-title":"A prospective cohort study in patients with type 2 diabetes mellitus for validation of biomarkers (PROVALID)\u2014Study design and baseline characteristics","volume":"43","author":"Eder","year":"2018","journal-title":"Kidney Blood Press. Res."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"19","DOI":"10.1186\/s41512-021-00107-5","article-title":"A prediction model for the decline in renal function in people with type 2 diabetes mellitus: Study protocol","volume":"5","author":"Gregorich","year":"2021","journal-title":"Diagn. Progn. Res."},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Scutari, M., and Denis, J.B. (2021). Bayesian Networks: With Examples in R, Chapman and Hall\/CRC.","DOI":"10.1201\/9780429347436"},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"15236","DOI":"10.1038\/s41598-017-15293-w","article-title":"Bayesian networks analysis of malocclusion data","volume":"7","author":"Scutari","year":"2017","journal-title":"Sci. 
Rep."},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"439","DOI":"10.1016\/j.jval.2019.01.006","article-title":"Bayesian networks for risk prediction using real-world data: A tool for precision medicine","volume":"22","author":"Arora","year":"2019","journal-title":"Value Health"},{"key":"ref_51","doi-asserted-by":"crossref","first-page":"116547","DOI":"10.1016\/j.eswa.2022.116547","article-title":"Decision support analysis for risk identification and control of patients affected by COVID-19 based on Bayesian Networks","volume":"196","author":"Shen","year":"2022","journal-title":"Expert Syst. Appl."},{"key":"ref_52","doi-asserted-by":"crossref","first-page":"8721","DOI":"10.1007\/s10462-022-10351-w","article-title":"A survey of Bayesian Network structure learning","volume":"56","author":"Kitson","year":"2023","journal-title":"Artif. Intell. Rev."}],"container-title":["Machine Learning and Knowledge Extraction"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-4990\/6\/3\/88\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T15:26:32Z","timestamp":1760109992000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-4990\/6\/3\/88"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,7,30]]},"references-count":52,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2024,9]]}},"alternative-id":["make6030088"],"URL":"https:\/\/doi.org\/10.3390\/make6030088","relation":{},"ISSN":["2504-4990"],"issn-type":[{"type":"electronic","value":"2504-4990"}],"subject":[],"published":{"date-parts":[[2024,7,30]]}}}