{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,1]],"date-time":"2026-04-01T12:43:05Z","timestamp":1775047385216,"version":"3.50.1"},"reference-count":28,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2023,3,16]],"date-time":"2023-03-16T00:00:00Z","timestamp":1678924800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,3,16]],"date-time":"2023-03-16T00:00:00Z","timestamp":1678924800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001659","name":"Deutsche Forschungsgemeinschaft","doi-asserted-by":"publisher","award":["FOR 2488"],"award-info":[{"award-number":["FOR 2488"]}],"id":[{"id":"10.13039\/501100001659","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Adv Data Anal Classif"],"published-print":{"date-parts":[[2024,6]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>\n                    In life sciences, random forests are often used to train predictive models. However, gaining any explanatory insight into the mechanics leading to a specific outcome is rather complex, which impedes the implementation of random forests into clinical practice. By simplifying a complex ensemble of decision trees to a single most representative tree, it is assumed to be possible to observe common tree structures, the importance of specific features and variable interactions. 
Thus, representative trees could also help to understand interactions between genetic variants. Intuitively, representative trees are those with the minimal distance to all other trees, which requires a proper definition of the distance between two trees. Thus, we developed a new tree-based distance measure, which incorporates more of the underlying tree structure than other metrics. We compared our new method with the existing metrics in an extensive simulation study and applied it to predict the age at onset based on a set of genetic risk factors in a clinical data set. In our simulation study we were able to show the advantages of our weighted splitting variable approach. Our real data application revealed that representative trees are not only able to replicate the results from a recent genome-wide association study, but also can give additional explanations of the genetic mechanisms. Finally, we implemented all compared distance measures in R and made them publicly available in the R package timbR (\n                    <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"https:\/\/github.com\/imbs-hl\/timbR\">https:\/\/github.com\/imbs-hl\/timbR<\/jats:ext-link>\n                    ).\n                  <\/jats:p>","DOI":"10.1007\/s11634-023-00537-7","type":"journal-article","created":{"date-parts":[[2023,3,26]],"date-time":"2023-03-26T19:48:51Z","timestamp":1679860131000},"page":"363-380","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":13,"title":["Identification of representative trees in random forests based on a new tree-based distance measure"],"prefix":"10.1007","volume":"18","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9265-5738","authenticated-orcid":false,"given":"Bj\u00f6rn-Hergen","family":"Laabs","sequence":"first","affiliation":[]},{"given":"Ana","family":"Westenberger","sequence":"additional","affiliation":[]},{"given":"Inke 
R.","family":"K\u00f6nig","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,3,16]]},"reference":[{"issue":"6","key":"537_CR1","doi-asserted-by":"publisher","first-page":"557","DOI":"10.3414\/ME16-01-0055","volume":"55","author":"W Adler","year":"2016","unstructured":"Adler W, Gefeller O, Gul A, Horn FK, Kahn Z, Lausen B (2016) Ensemble pruning for glaucoma detection in an unbalanced data set. Methods Inf Med 55(6):557\u2013563. https:\/\/doi.org\/10.3414\/ME16-01-0055","journal-title":"Methods Inf Med"},{"key":"537_CR2","doi-asserted-by":"publisher","unstructured":"Aneichyk T, Hendriks WT, Yadav R, Shin D, Gao D, Vaine CA, Collins RL, Domingo A, Currall B, Stortchevoi A, Multhaupt-Buell T, Penney EB, Cruz L, Dhakal J, Brand H, Hanscom C, Antolik C, Dy M, Ragavendra A, Underwood J, Cantsilieris S, Munson KM, Eichler EE, Acu\u00f1a P, Go C, Jamora RDG, Rosales RL, Church DM, Williams SR, Garcia S, Klein C, M\u00fcller U, Wilhelmsen KC, Timmers HTM, Sapir Y, Wainger BJ, Henderson D, Ito N, Weisenfeld N, Jaffe D, Sharma N, Breakefield XO, Ozelius LJ, Bragg DC, and Talkowski ME (2018) Dissecting the Causal Mechanism of X-Linked Dystonia-Parkinsonism by Integrating Genome and Transcriptome Assembly. Cell 172(5):897\u2013909. https:\/\/doi.org\/10.1016\/j.cell.2018.02.011","DOI":"10.1016\/j.cell.2018.02.011"},{"issue":"15","key":"537_CR3","doi-asserted-by":"publisher","first-page":"1601","DOI":"10.1002\/sim.4492","volume":"31","author":"M Banerjee","year":"2012","unstructured":"Banerjee M, Ding Y, Noone A-M (2012) Identifying representative trees from ensembles. Stat Med 31(15):1601\u20131616. 
https:\/\/doi.org\/10.1002\/sim.4492","journal-title":"Stat Med"},{"key":"537_CR4","doi-asserted-by":"publisher","unstructured":"Bragg DC, Mangkalaphiban K, Vaine CA, Kulkarni NJ, Shin D, Yadav R, Dhakal J, Ton ML, Cheng A, Russo CT, Ang M, Acu\u00f1a P, Go C, Franceour TN, Multhaupt-Buell T, Ito N, M\u00fcller U, Hendriks WT, Breakefield XO, Sharma N and Ozelius LJ (2017) Disease onset in X-linked dystonia-parkinsonism correlates with expansion of a hexameric repeat within an SVA retrotransposon in TAF1. Proceedings of the National Academy of Sciences of the United States of America 114(51):E11020\u2013E11028. https:\/\/doi.org\/10.1073\/pnas.1712526114","DOI":"10.1073\/pnas.1712526114"},{"issue":"2","key":"537_CR5","doi-asserted-by":"publisher","first-page":"123","DOI":"10.1007\/BF00058655","volume":"24","author":"L Breiman","year":"1996","unstructured":"Breiman L (1996) Bagging predictors. Mach Learn 24(2):123\u2013140. https:\/\/doi.org\/10.1007\/BF00058655","journal-title":"Mach Learn"},{"issue":"1","key":"537_CR6","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1023\/A:1010933404324","volume":"45","author":"L Breiman","year":"2001","unstructured":"Breiman L (2001) Random forests. Mach Learn 45(1):5\u201332. https:\/\/doi.org\/10.1023\/A:1010933404324","journal-title":"Mach Learn"},{"key":"537_CR7","volume-title":"Classification and regression trees","author":"L Breiman","year":"1984","unstructured":"Breiman L, Friedman J, Stone CJ, Olshen RA (1984) Classification and regression trees. Chapman & Hall\/CRC, Boca Raton"},{"key":"537_CR8","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pgen.1007274","author":"MJ Chao","year":"2018","unstructured":"Chao MJ, Kim K-H, Shin JW, Lucente D, Wheeler VC, Li H, Roach JC, Hood L, Wexler NS, Jardim LB, Holmans P, Jones L, Orth M, Kwak S, MacDonald ME, Gusella JF, Lee J-M (2018) Population-specific genetic modification of Huntington\u2019s disease in Venezuela. PLOS Genet. 
https:\/\/doi.org\/10.1371\/journal.pgen.1007274","journal-title":"PLOS Genet"},{"issue":"3","key":"537_CR9","doi-asserted-by":"publisher","first-page":"431","DOI":"10.1002\/bimj.201700067","volume":"60","author":"G Heinze","year":"2018","unstructured":"Heinze G, Wallisch C, Dunkler D (2018) Variable selection\u2014A review and recommendations for the practicing statistician. Biom J 60(3):431\u2013449. https:\/\/doi.org\/10.1002\/bimj.201700067","journal-title":"Biom J"},{"issue":"3","key":"537_CR10","doi-asserted-by":"publisher","first-page":"651","DOI":"10.1198\/106186006X133933","volume":"15","author":"T Hothorn","year":"2006","unstructured":"Hothorn T, Hornik K, Zeileis A (2006) Unbiased recursive partitioning: a conditional inference framework. J Comput Graph Stat 15(3):651\u2013674. https:\/\/doi.org\/10.1198\/106186006X133933","journal-title":"J Comput Graph Stat"},{"issue":"3","key":"537_CR11","doi-asserted-by":"publisher","first-page":"841","DOI":"10.1214\/08-AOAS169","volume":"2","author":"H Ishwaran","year":"2008","unstructured":"Ishwaran H, Kogalur UB, Blackstone EH, Lauer MS (2008) Random survival forests. Ann Appl Stat 2(3):841\u2013860. https:\/\/doi.org\/10.1214\/08-AOAS169","journal-title":"Ann Appl Stat"},{"key":"537_CR12","doi-asserted-by":"publisher","first-page":"97","DOI":"10.1007\/s11634-019-00364-9","volume":"14","author":"Z Kahn","year":"2019","unstructured":"Kahn Z, Gul A, Perperoglou A, Miftahuddin M, Mahmoud O, Adler W, Lausen B (2019) Ensemble of optimal trees, random forest and random projection ensemble classification. Adv Data Anal Classif 14:97\u2013116. 
https:\/\/doi.org\/10.1007\/s11634-019-00364-9","journal-title":"Adv Data Anal Classif"},{"key":"537_CR13","doi-asserted-by":"publisher","first-page":"28591","DOI":"10.1109\/ACCESS.2021.3055992","volume":"9","author":"Z Kahn","year":"2021","unstructured":"Kahn Z, Gul N, Faiz N, Gul A, Adler W, Lausen B (2021) Optimal trees selection for classification via out-of-bag assessment and sub-bagging. IEEE Access 9:28591\u201328607. https:\/\/doi.org\/10.1109\/ACCESS.2021.3055992","journal-title":"IEEE Access"},{"key":"537_CR14","doi-asserted-by":"publisher","unstructured":"K\u00f6nig G, Molnar C, Bischl B, and Grosse-Wentrup M (2020). Relative feature importance. Proceedings of the 2020 25th international conference on pattern recognition, 9318\u20139325 https:\/\/doi.org\/10.1007\/978-3-030-68787-8","DOI":"10.1007\/978-3-030-68787-8"},{"key":"537_CR15","doi-asserted-by":"publisher","first-page":"3216","DOI":"10.1038\/s41467-021-23491-4","volume":"12","author":"B-H Laabs","year":"2021","unstructured":"Laabs B-H, Klein C, Pozojevic J, Domingo A, Br\u00fcggemann N, Gr\u00fctz K, Rosales RL, Jamora RD, Saranza G, Diesta CCE, Schaake S, Dulovic-Mahlow M, Quismundo J, Otto P, Acuna P, Go C, Sharma N, Multhaupt-Buell T, M\u00fceller U, Hanssen H, Kilpert F, Rolfs A, Bauer P, Dobricic V, Lohmann K, Ozelius LJ, Kaiser FJ, K\u00f6nig IR, Westenberger A (2021) Identifying novel genetic modifiers of age-associated penetrance in X-linked dystonia-parkinsonism. Nat Commun 12:3216","journal-title":"Nat Commun"},{"key":"537_CR16","doi-asserted-by":"publisher","unstructured":"Lee LV, Rivera C, Teleg RA, Dantes MB, Pasco PMD, Jamora RDG, Arancillo J, Villareal-Jordan RF, Rosales RL, Demaisip C, Maranon E, Peralta O, Borres R, Tolentino C, Monding MJ, Sarcia S (2011) The unique phenomenology of sex-linked dystonia parkinsonism (XDP, DYT3, \u2019Lubag\u2019). Int J Neurosci 121(1):3\u201311. 
https:\/\/doi.org\/10.3109\/00207454.2010.526728","DOI":"10.3109\/00207454.2010.526728"},{"issue":"7","key":"537_CR17","doi-asserted-by":"publisher","first-page":"921","DOI":"10.1002\/mds.25791","volume":"29","author":"K Lohmann","year":"2014","unstructured":"Lohmann K, Schmidt A, Schillert A, Winkler S, Albanese A, Baas F, Bentivoglio AR, Borngr\u00e4ber F, Br\u00fcggemann N, Defazio G, Del Sorbo F, Deuschl G, Edwards MJ, Gasser T, G\u00f3mez-Garre P, Graf J, Groen JL, Gr\u00fcnewald A, Hagenah J, Hemmelmann C, Jabusch HC, Kaji R, Kasten M, Kawakami H, Kostic VS, Ligouri M, Mir P, M\u00fcnchau A, Ricchiuti F, Schreiber S, Siegesmund K, Svetel M, Tijssen MA, Valente EM, Westenberger A, Zeuner KE, Zittel KE, Altenm\u00fcller E, Ziegler A, Klein C (2014) Genome-wide association study in musician\u2019s dystonia: a risk variant at the arylsulfatase G locus? Move Disord 29(7):921\u2013927. https:\/\/doi.org\/10.1002\/mds.25791","journal-title":"Move Disord"},{"issue":"2","key":"537_CR18","doi-asserted-by":"publisher","first-page":"245","DOI":"10.1002\/mds.25732","volume":"29","author":"KY Mok","year":"2014","unstructured":"Mok KY, Schneider SA, Trabzuni D, Stamelou M, Edwards M, Kasperaviciute D, Pickering-Brown S, Silverdale M, Hardy J, Bhatia KP (2014) Genomewide association study in cervical dystonia demonstrates possible association with sodium leak channel. Mov Disord 29(2):245\u2013251. https:\/\/doi.org\/10.1002\/mds.25732","journal-title":"Mov Disord"},{"key":"537_CR19","doi-asserted-by":"publisher","unstructured":"Molnar C, Freiesleben T, K\u00f6nig G, Casalicchio G, Wright MN, and Bischl B (2021) Relating the Partial Dependence Plot and Permutation Feature Importance to the Data Generating Process. 
https:\/\/doi.org\/10.48550\/arXiv.2109.01433","DOI":"10.48550\/arXiv.2109.01433"},{"issue":"9","key":"537_CR20","doi-asserted-by":"publisher","first-page":"701","DOI":"10.1016\/S1474-4422(17)30161-8","volume":"16","author":"DJH Moss","year":"2017","unstructured":"Moss DJH, Langbehn D, Leavitt BR, Roos R, Durr A, Mead S, Holmans P, Jones L, Tabrizi SJ, TRACK-HD investigators, REGISTRY investigators (2017) Identification of genetic variants associated with Huntington\u2019s disease progression: a genome wide association study. Lancet Neurol 16(9):701\u2013711. https:\/\/doi.org\/10.1016\/S1474-4422(17)30161-8","journal-title":"Lancet Neurol"},{"key":"537_CR21","doi-asserted-by":"crossref","unstructured":"Nalls MA, Blauwendraat C, Vallerga CL, Heilbron K, Bandres-Ciga S, Chang D, Tan M, Kia DA, Noyce AJ, Xue A, Bras J, Young E, von Coelln R, Sim\u00f3n-S\u00e1nchez J, Schulte C, Sharma M, Krohn L, Pihlstrom L, Siitonen A, Iwaki H, Leonard H, Faghri F, Gibbs JR, Hernandez DG, Scholz SW, Botia JA, Martinez M, Corvol JC, Lesage S, Jankovic J, Shulman LM, Sutherland M, Tienari P, Majamaa K, Toft M, Andreassen OA, Bangale T, Brice A, Yang J, Gan-Or Z, Gasser T, Heutink P, Shulman JM, Wood NW, Hinds DA, Hardy JA, Morris HR, Gratten J, Visscher PM, Graham RR, Singleton AB; 23andMe Research Team; System Genomics of Parkinson\u2019s Disease Consortium; International Parkinson\u2019s Disease Genomics Consortium (2019) Identification of novel risk loci, causal insights, and heritable risk for Parkinson\u2019s disease: a meta-analysis of genome-wide association studies. Lancet Neurol 18(12):1091\u20131102. https:\/\/doi.org\/10.1016\/S1474-4422(19)30320-5","DOI":"10.1016\/S1474-4422(19)30320-5"},{"issue":"21","key":"537_CR22","doi-asserted-by":"publisher","first-page":"3711","DOI":"10.1093\/bioinformatics\/bty373","volume":"34","author":"S Nembrini","year":"2018","unstructured":"Nembrini S, K\u00f6nig IR, Wright MN (2018) The revival of the Gini importance? 
Bioinformatics 34(21):3711\u20133718. https:\/\/doi.org\/10.1093\/bioinformatics\/bty373","journal-title":"Bioinformatics"},{"issue":"7","key":"537_CR23","doi-asserted-by":"publisher","first-page":"1108","DOI":"10.1002\/mds.27441","volume":"33","author":"A Rakovic","year":"2018","unstructured":"Rakovic A, Domingo A, Gr\u00fctz K, Kulikovskaja L, Capetian P, Cowley SA, Lenz I, Br\u00fcggemann N, Rosales R, Jamora D, Rolfs A, Seibler P, Westenberger A, K\u00f6nig I, Klein C (2018) Genome editing in induced pluripotent stem cells rescues TAF1 levels in X-linked dystonia-parkinsonism. Move Disord 33(7):1108\u20131118. https:\/\/doi.org\/10.1002\/mds.27441","journal-title":"Move Disord"},{"key":"537_CR24","doi-asserted-by":"publisher","first-page":"379","DOI":"10.1007\/s11634-020-00409-4","volume":"15","author":"T Salles","year":"2020","unstructured":"Salles T, Rocha L, Goncalves M (2020) A bias-variance analysis of state-of-the-art random forest text classifiers. Adv Data Anal Classif 15:379\u2013405. https:\/\/doi.org\/10.1007\/s11634-020-00409-4","journal-title":"Adv Data Anal Classif"},{"issue":"25","key":"537_CR25","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/1471-2105-8-25","volume":"8","author":"C Strobl","year":"2007","unstructured":"Strobl C, Boulesteix A-L, Zeileis A, Hothorn T (2007) Bias in random forest variable importance measures: illustrations, sources and a solution. BMC Bioinform 8(25):1\u201321. 
https:\/\/doi.org\/10.1186\/1471-2105-8-25","journal-title":"BMC Bioinform"},{"key":"537_CR26","doi-asserted-by":"publisher","first-page":"812","DOI":"10.1002\/ana.25488","volume":"85","author":"A Westenberger","year":"2019","unstructured":"Westenberger A, Reyes CJ, Saranza G, Dobricic V, Hanssen H, Domingo A, Laabs B-H, Schaake S, Pozojevic J, Rakovic A, Gr\u00fctz K, Begemann K, Walter U, Dressler D, Bauer P, Rolfs A, M\u00fcnchau A, Kaiser FJ, Ozelius LJ, Jamora RD, Rosales RL, Diesta CCE, Lohmann K, K\u00f6nig IR, Br\u00fcggemann N, Klein C (2019) A hexanucleotide repeat modifies expressivity of X-linked dystonia parkinsonism. Ann Neurol 85:812\u2013822. https:\/\/doi.org\/10.1002\/ana.25488","journal-title":"Ann Neurol"},{"key":"537_CR27","doi-asserted-by":"publisher","first-page":"1","DOI":"10.18637\/jss.v077.i01","volume":"77","author":"MN Wright","year":"2017","unstructured":"Wright MN, Ziegler A (2017) Ranger: a fast implementation of random forests for high dimensional data in C++ and R. J Stat Softw 77:1\u201317. https:\/\/doi.org\/10.18637\/jss.v077.i01","journal-title":"J Stat Softw"},{"key":"537_CR28","doi-asserted-by":"publisher","first-page":"1539","DOI":"10.1136\/bmj.311.7019.1539","volume":"311","author":"JC Wyatt","year":"1995","unstructured":"Wyatt JC, Altman DG (1995) Prognostic models: clinically useful or quickly forgotten? BMJ 311:1539\u20131541. 
https:\/\/doi.org\/10.1136\/bmj.311.7019.1539","journal-title":"BMJ"}],"container-title":["Advances in Data Analysis and Classification"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11634-023-00537-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11634-023-00537-7\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11634-023-00537-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,6,19]],"date-time":"2024-06-19T04:19:45Z","timestamp":1718770785000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11634-023-00537-7"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,3,16]]},"references-count":28,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2024,6]]}},"alternative-id":["537"],"URL":"https:\/\/doi.org\/10.1007\/s11634-023-00537-7","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2022.05.15.492004","asserted-by":"object"}]},"ISSN":["1862-5347","1862-5355"],"issn-type":[{"value":"1862-5347","type":"print"},{"value":"1862-5355","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,3,16]]},"assertion":[{"value":"11 April 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"1 December 2022","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"21 February 2023","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"16 March 2023","order":4,"name":"first_online","label":"First 
Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors have declared no conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}