{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,11]],"date-time":"2025-07-11T10:33:36Z","timestamp":1752230016272,"version":"3.37.3"},"reference-count":41,"publisher":"Springer Science and Business Media LLC","issue":"3","license":[{"start":{"date-parts":[[2020,3,14]],"date-time":"2020-03-14T00:00:00Z","timestamp":1584144000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2020,3,14]],"date-time":"2020-03-14T00:00:00Z","timestamp":1584144000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001659","name":"Deutsche Forschungsgemeinschaft","doi-asserted-by":"crossref","award":["SCHM 2966\/2-1"],"award-info":[{"award-number":["SCHM 2966\/2-1"]}],"id":[{"id":"10.13039\/501100001659","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Data Min Knowl Disc"],"published-print":{"date-parts":[[2020,5]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Random survival forests (RSF) are a powerful nonparametric method for building prediction models with a time-to-event outcome. RSF do not rely on the proportional hazards assumption and can be readily applied to both low- and higher-dimensional data. A remaining limitation of RSF, however, arises from the fact that the method is almost entirely focussed on continuously measured event times. This issue may become problematic in studies where time is measured on a discrete scale<jats:inline-formula><jats:alternatives><jats:tex-math>$$t = 1, 2, ...$$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\"><mml:mrow><mml:mi>t<\/mml:mi><mml:mo>=<\/mml:mo><mml:mn>1<\/mml:mn><mml:mo>,<\/mml:mo><mml:mn>2<\/mml:mn><mml:mo>,<\/mml:mo><mml:mo>.<\/mml:mo><mml:mo>.<\/mml:mo><mml:mo>.<\/mml:mo><\/mml:mrow><\/mml:math><\/jats:alternatives><\/jats:inline-formula>, referring to time intervals<jats:inline-formula><jats:alternatives><jats:tex-math>$$[0,a_1), [a_1,a_2), \\ldots $$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\"><mml:mrow><mml:mrow><mml:mo>[<\/mml:mo><mml:mn>0<\/mml:mn><mml:mo>,<\/mml:mo><mml:msub><mml:mi>a<\/mml:mi><mml:mn>1<\/mml:mn><\/mml:msub><mml:mo>)<\/mml:mo><\/mml:mrow><mml:mo>,<\/mml:mo><mml:mrow><mml:mo>[<\/mml:mo><mml:msub><mml:mi>a<\/mml:mi><mml:mn>1<\/mml:mn><\/mml:msub><mml:mo>,<\/mml:mo><mml:msub><mml:mi>a<\/mml:mi><mml:mn>2<\/mml:mn><\/mml:msub><mml:mo>)<\/mml:mo><\/mml:mrow><mml:mo>,<\/mml:mo><mml:mo>\u2026<\/mml:mo><\/mml:mrow><\/mml:math><\/jats:alternatives><\/jats:inline-formula>. In this situation, the application of methods designed for continuous time-to-event data may lead to biased estimators and inaccurate predictions if discreteness is ignored. To address this issue, we develop a RSF algorithm that is specifically designed for the analysis of (possibly right-censored) discrete event times. The algorithm is based on an ensemble of discrete-time survival trees that operate on transformed versions of the original time-to-event data using tree methods for binary classification. As the outcome variable in these trees is typically highly imbalanced, our algorithm implements a node splitting strategy based on Hellinger\u2019s distance, which is a skew-insensitive alternative to classical split criteria such as the Gini impurity. The new algorithm thus provides flexible nonparametric predictions of individual-specific discrete hazard and survival functions. Our numerical results suggest that node splitting by Hellinger\u2019s distance improves predictive performance when compared to the Gini impurity. Furthermore, discrete-time RSF improve prediction accuracy when compared to RSF approaches treating discrete event times as continuous in situations where the number of time intervals is small.<\/jats:p>","DOI":"10.1007\/s10618-020-00682-z","type":"journal-article","created":{"date-parts":[[2020,3,14]],"date-time":"2020-03-14T14:02:21Z","timestamp":1584194541000},"page":"812-832","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":10,"title":["Discrete-time survival forests with Hellinger distance decision trees"],"prefix":"10.1007","volume":"34","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0788-0317","authenticated-orcid":false,"given":"Matthias","family":"Schmid","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Thomas","family":"Welchowski","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Marvin N.","family":"Wright","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Moritz","family":"Berger","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2020,3,14]]},"reference":[{"key":"682_CR1","doi-asserted-by":"publisher","first-page":"2720","DOI":"10.1210\/jc.2018-00511","volume":"103","author":"M Banerjee","year":"2018","unstructured":"Banerjee M, Reyes-Gastelum D, Haymart MR (2018) Treatment-free survival in patients with differentiated thyroid cancer. J Clin Endocrinol Metab 103:2720\u20132727","journal-title":"J Clin Endocrinol Metab"},{"key":"682_CR2","doi-asserted-by":"publisher","first-page":"322","DOI":"10.1177\/1471082X17748084","volume":"18","author":"M Berger","year":"2018","unstructured":"Berger M, Schmid M (2018) Semiparametric regression for discrete time-to-event data. Stat Model 18:322\u2013345","journal-title":"Stat Model"},{"key":"682_CR3","doi-asserted-by":"publisher","DOI":"10.1093\/biostatistics\/kxy069","author":"M Berger","year":"2018","unstructured":"Berger M, Schmid M, Welchowski T, Schmitz-Valckenberg S, Beyersmann J (2018) Subdistribution hazard models for competing risks in discrete time. Biostatistics. https:\/\/doi.org\/10.1093\/biostatistics\/kxy069","journal-title":"Biostatistics"},{"key":"682_CR4","doi-asserted-by":"publisher","DOI":"10.1201\/9781315116945","volume-title":"Survival analysis with interval-censored data: a practical approach with examples in R, SAS, and BUGS","author":"K Bogaerts","year":"2017","unstructured":"Bogaerts K, Komarek A, Lesaffre E (2017) Survival analysis with interval-censored data: a practical approach with examples in R, SAS, and BUGS. Chapman & Hall\/CRC, New York"},{"key":"682_CR5","doi-asserted-by":"publisher","first-page":"17","DOI":"10.1002\/cjs.10007","volume":"37","author":"I Bou-Hamad","year":"2009","unstructured":"Bou-Hamad I, Larocque D, Ben-Hameur H, M\u00e2sse LC, Vitaro F, Tremblay RE (2009) Discrete-time survival trees. Can J Stat 37:17\u201332","journal-title":"Can J Stat"},{"key":"682_CR6","doi-asserted-by":"publisher","first-page":"429","DOI":"10.1177\/1471082X1001100503","volume":"11","author":"I Bou-Hamad","year":"2011","unstructured":"Bou-Hamad I, Larocque D, Ben-Ameur H (2011) Discrete-time survival trees and forests with time-varying covariates: application to bankruptcy data. Stat Model 11:429\u2013446","journal-title":"Stat Model"},{"key":"682_CR7","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1023\/A:1010933404324","volume":"45","author":"L Breiman","year":"2001","unstructured":"Breiman L (2001) Random forests. Mach Learn 45:5\u201332","journal-title":"Mach Learn"},{"key":"682_CR8","volume-title":"Classification and regression trees","author":"L Breiman","year":"1984","unstructured":"Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classification and regression trees. Wadsworth, Belmont"},{"key":"682_CR9","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9780511811241","volume-title":"Microeconometrics: methods and applications","author":"AC Cameron","year":"2005","unstructured":"Cameron AC, Trivedi PK (2005) Microeconometrics: methods and applications. Cambridge University Press, Cambridge"},{"key":"682_CR10","doi-asserted-by":"publisher","first-page":"241","DOI":"10.1007\/978-3-540-87479-9_34","volume-title":"Proceedings of the joint conference on machine learning and knowledge discovery in databases: ECML PKDD 2008, Antwerp, Belgium","author":"DA Cieslak","year":"2008","unstructured":"Cieslak DA, Chawla NV (2008) Learning decision trees for unbalanced data. In: Daelemans W, Goethals B, Morik K (eds) Proceedings of the joint conference on machine learning and knowledge discovery in databases: ECML PKDD 2008, Antwerp, Belgium. Springer, Berlin, pp 241\u2013256"},{"key":"682_CR11","doi-asserted-by":"publisher","first-page":"136","DOI":"10.1007\/s10618-011-0222-1","volume":"24","author":"DA Cieslak","year":"2012","unstructured":"Cieslak DA, Hoens TR, Chawla NV, Kegelmeyer WP (2012) Hellinger distance decision trees are robust and skew-insensitive. Data Min Knowl Discov 24:136\u2013158","journal-title":"Data Min Knowl Discov"},{"key":"682_CR12","unstructured":"Croissant Y (2016) Ecdat: data sets for econometrics. R package version 0.3-1. http:\/\/cran.r-project.org\/web\/packages\/Ecdat. Accessed 16 Nov 2019"},{"key":"682_CR13","unstructured":"Dal Pozzolo A, Caelen O, Bontempi G (2015) Unbalanced: racing for unbalanced methods selection. R package version 2.0. http:\/\/cran.r-project.org\/web\/packages\/unbalanced. Accessed 16 Nov 2019"},{"key":"682_CR14","doi-asserted-by":"publisher","first-page":"29","DOI":"10.1007\/s11009-008-9078-2","volume":"11","author":"D Fantazzini","year":"2009","unstructured":"Fantazzini D, Figini S (2009) Random survival forests models for SME credit risk measurement. Methodol Comput Appl Probab 11:29\u201345","journal-title":"Methodol Comput Appl Probab"},{"key":"682_CR15","doi-asserted-by":"publisher","first-page":"24","DOI":"10.1016\/j.contraception.2012.10.010","volume":"88","author":"R Fehring","year":"2013","unstructured":"Fehring R, Schneider M, Raviele K, Rodriguez D, Pruszynski J (2013) Randomized comparison of two internet-supported fertility-awareness-based methods of family planning. Contraception 88:24\u201330","journal-title":"Contraception"},{"key":"682_CR16","doi-asserted-by":"publisher","first-page":"863","DOI":"10.1613\/jair.1.11192","volume":"61","author":"A Fernandez","year":"2018","unstructured":"Fernandez A, Garcia S, Herrera F, Chawla NV (2018) SMOTE for learning from imbalanced data: progress and challenges, marking the 15-year anniversary. J Artif Intell Res 61:863\u2013905","journal-title":"J Artif Intell Res"},{"key":"682_CR17","unstructured":"Friedman J, Hastie T, Tibshirani R, Narasimhan B, Simon N (2019) glmnet: lasso and elastic-net regularized generalized linear models. R package version 3.0. http:\/\/cran.r-project.org\/web\/packages\/glmnet. Accessed 16 Nov 2019"},{"key":"682_CR18","doi-asserted-by":"publisher","first-page":"359","DOI":"10.1198\/016214506000001437","volume":"102","author":"T Gneiting","year":"2007","unstructured":"Gneiting T, Raftery A (2007) Strictly proper scoring rules, prediction, and estimation. J Am Stat Assoc 102:359\u2013378","journal-title":"J Am Stat Assoc"},{"key":"682_CR19","doi-asserted-by":"crossref","first-page":"77","DOI":"10.20377\/jfr-235","volume":"23","author":"J Huinink","year":"2011","unstructured":"Huinink J, Br\u00fcderl J, Nauck B, Walper S, Castiglioni L, Feldhaus M (2011) Panel analysis of intimate relationships and family dynamics (pairfam): conceptual framework and design. J Fam Res 23:77\u2013101","journal-title":"J Fam Res"},{"key":"682_CR20","doi-asserted-by":"publisher","first-page":"769","DOI":"10.2967\/jnumed.117.200758","volume":"59","author":"M Ingrisch","year":"2018","unstructured":"Ingrisch M, Sch\u00f6ppe F, Paprottka K, Fabritius M, Strobl FF, Toni END, Ilhan H, Todica A, Michl M, Paprottka PM (2018) Prediction of 90Y radioembolization outcome from pretherapeutic factors with random survival forests. J Nucl Med 59:769\u2013773","journal-title":"J Nucl Med"},{"key":"682_CR21","doi-asserted-by":"publisher","first-page":"841","DOI":"10.1214\/08-AOAS169","volume":"2","author":"H Ishwaran","year":"2008","unstructured":"Ishwaran H, Kogalur UB, Blackstone EH, Lauer MS (2008) Random survival forests. Ann Appl Stat 2:841\u2013860","journal-title":"Ann Appl Stat"},{"key":"682_CR22","doi-asserted-by":"publisher","first-page":"205","DOI":"10.1198\/jasa.2009.tm08622","volume":"105","author":"H Ishwaran","year":"2010","unstructured":"Ishwaran H, Kogalur UB, Gorodeski EZ, Minn AJ, Lauer MS (2010) High-dimensional variable selection for survival data. J Am Stat Assoc 105:205\u2013217","journal-title":"J Am Stat Assoc"},{"key":"682_CR23","doi-asserted-by":"publisher","first-page":"115","DOI":"10.1002\/sam.10103","volume":"4","author":"H Ishwaran","year":"2011","unstructured":"Ishwaran H, Kogalur UB, Chen X, Minn AJ (2011) Random survival forests for high-dimensional data. Stat Anal Data Min 4:115\u2013132","journal-title":"Stat Anal Data Min"},{"key":"682_CR24","doi-asserted-by":"publisher","DOI":"10.1177\/0962280219862586","author":"N Korepanova","year":"2019","unstructured":"Korepanova N, Seibold H, Steffen V, Hothorn T (2019) Survival forests under test: impact of the proportional hazards assumption on prognostic and predictive forests for amyotrophic lateral sclerosis survival. Stat Methods Med Res. https:\/\/doi.org\/10.1177\/0962280219862586","journal-title":"Stat Methods Med Res"},{"key":"682_CR25","doi-asserted-by":"publisher","first-page":"457","DOI":"10.1080\/01621459.1993.10476296","volume":"88","author":"M LeBlanc","year":"1993","unstructured":"LeBlanc M, Crowley J (1993) Survival trees by goodness of split. J Am Stat Assoc 88:457\u2013467","journal-title":"J Am Stat Assoc"},{"key":"682_CR26","doi-asserted-by":"publisher","first-page":"647","DOI":"10.2307\/2171865","volume":"64","author":"BP McCall","year":"1996","unstructured":"McCall BP (1996) Unemployment insurance rules, joblessness, and part-time work. Econometrica 64:647\u2013682","journal-title":"Econometrica"},{"key":"682_CR27","doi-asserted-by":"publisher","first-page":"92","DOI":"10.1007\/s10618-012-0295-5","volume":"28","author":"G Menardi","year":"2014","unstructured":"Menardi G, Torelli N (2014) Training and assessing classification rules with imbalanced data. Data Min Knowl Discov 28:92\u2013122","journal-title":"Data Min Knowl Discov"},{"key":"682_CR28","doi-asserted-by":"publisher","first-page":"671","DOI":"10.1007\/s10985-016-9372-1","volume":"23","author":"H Moradian","year":"2017","unstructured":"Moradian H, Larocque D, Bellavance F (2017) $${L}_1$$ splitting rules in survival forests. Lifetime Data Anal 23:671\u2013691","journal-title":"Lifetime Data Anal"},{"key":"682_CR29","first-page":"6724","volume":"14","author":"Y Pan","year":"2017","unstructured":"Pan Y, Zhang H, Zhang M, Zhu J, Yu J, Wang B, Qiu J, Zhang J (2017) A five-gene based risk score with high prognostic value in colorectal cancer. Oncol Lett 14:6724\u20136734","journal-title":"Oncol Lett"},{"key":"682_CR30","doi-asserted-by":"publisher","first-page":"199","DOI":"10.1023\/A:1024099825458","volume":"52","author":"F Provost","year":"2003","unstructured":"Provost F, Domingos P (2003) Tree induction for probability-based ranking. Mach Learn 52:199\u2013215","journal-title":"Mach Learn"},{"key":"682_CR31","doi-asserted-by":"publisher","first-page":"127","DOI":"10.1191\/0962280206sm435oa","volume":"15","author":"TH Scheike","year":"2006","unstructured":"Scheike TH, Keiding N (2006) Design and analysis of time-to-pregnancy. Stat Methods Med Res 15:127\u2013140","journal-title":"Stat Methods Med Res"},{"key":"682_CR32","doi-asserted-by":"publisher","first-page":"734","DOI":"10.1002\/sim.6729","volume":"35","author":"M Schmid","year":"2016","unstructured":"Schmid M, K\u00fcchenhoff H, Hoerauf A, Tutz G (2016a) A survival tree method for the analysis of discrete event times in clinical and epidemiological studies. Stat Med 35:734\u2013751","journal-title":"Stat Med"},{"key":"682_CR33","doi-asserted-by":"publisher","first-page":"450","DOI":"10.1016\/j.eswa.2016.07.018","volume":"63","author":"M Schmid","year":"2016","unstructured":"Schmid M, Wright MN, Ziegler A (2016b) On the use of Harrell\u2019s C for clinical risk prediction via random survival forests. Expert Syst Appl 63:450\u2013459","journal-title":"Expert Syst Appl"},{"key":"682_CR34","first-page":"153","volume":"7","author":"M Schmid","year":"2018","unstructured":"Schmid M, Tutz G, Welchowski T (2018) Discrimination measures for discrete time-to-event predictions. Econom Stat 7:153\u2013164","journal-title":"Econom Stat"},{"key":"682_CR35","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-28158-2","volume-title":"Modeling discrete time-to-event data","author":"G Tutz","year":"2016","unstructured":"Tutz G, Schmid M (2016) Modeling discrete time-to-event data. Springer, New York"},{"key":"682_CR36","doi-asserted-by":"publisher","first-page":"33","DOI":"10.1186\/s12898-018-0187-7","volume":"18","author":"TA Verschut","year":"2018","unstructured":"Verschut TA, Hamb\u00e4ck PA (2018) A random survival forest illustrates the importance of natural enemies compared to host plant quality on leaf beetle survival rates. BMC Ecol 18:33","journal-title":"BMC Ecol"},{"key":"682_CR37","unstructured":"Welchowski T, Schmid M (2019) discSurv: discrete time survival analysis. R package version 1.4.0. http:\/\/cran.r-project.org\/web\/packages\/discSurv. Accessed 16 Nov 2019"},{"key":"682_CR38","doi-asserted-by":"publisher","first-page":"1","DOI":"10.18637\/jss.v077.i01","volume":"77","author":"MN Wright","year":"2017","unstructured":"Wright MN, Ziegler A (2017) Ranger: a fast implementation of random forests for high dimensional data in C++ and R. J Stat Softw 77:1\u201317","journal-title":"J Stat Softw"},{"key":"682_CR39","doi-asserted-by":"publisher","first-page":"1272","DOI":"10.1002\/sim.7212","volume":"36","author":"MN Wright","year":"2017","unstructured":"Wright MN, Dankowski T, Ziegler A (2017) Unbiased split variable selection for random survival forests using maximally selected rank statistics. Stat Med 36:1272\u20131284","journal-title":"Stat Med"},{"key":"682_CR40","doi-asserted-by":"publisher","DOI":"10.1093\/biostatistics\/kxz025","author":"W Yao","year":"2019","unstructured":"Yao W, Frydman H, Simonoff JS (2019a) An ensemble method for interval-censored time-to-event data. Biostatistics. https:\/\/doi.org\/10.1093\/biostatistics\/kxz025","journal-title":"Biostatistics"},{"key":"682_CR41","doi-asserted-by":"crossref","unstructured":"Yao W, Frydman H, Simonoff JS (2019b) ICcforest: an ensemble method for interval-censored survival data. R package version 0.5.0. http:\/\/cran.r-project.org\/web\/packages\/ICcforest. Accessed 16 Nov 2019","DOI":"10.32614\/CRAN.package.ICcforest"}],"container-title":["Data Mining and Knowledge Discovery"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/s10618-020-00682-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1007\/s10618-020-00682-z\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/s10618-020-00682-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,8,2]],"date-time":"2024-08-02T05:31:00Z","timestamp":1722576660000},"score":1,"resource":{"primary":{"URL":"http:\/\/link.springer.com\/10.1007\/s10618-020-00682-z"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,3,14]]},"references-count":41,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2020,5]]}},"alternative-id":["682"],"URL":"https:\/\/doi.org\/10.1007\/s10618-020-00682-z","relation":{},"ISSN":["1384-5810","1573-756X"],"issn-type":[{"type":"print","value":"1384-5810"},{"type":"electronic","value":"1573-756X"}],"subject":[],"published":{"date-parts":[[2020,3,14]]},"assertion":[{"value":"17 June 2019","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"6 March 2020","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"14 March 2020","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}