{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,31]],"date-time":"2026-03-31T01:55:11Z","timestamp":1774922111298,"version":"3.50.1"},"reference-count":21,"publisher":"Oxford University Press (OUP)","issue":"19","license":[{"start":{"date-parts":[[2019,3,2]],"date-time":"2019-03-02T00:00:00Z","timestamp":1551484800000},"content-version":"vor","delay-in-days":1,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"name":"German Federal Ministry of Education and Research"},{"DOI":"10.13039\/501100002347","name":"BMBF","doi-asserted-by":"publisher","award":["01Zx1510"],"award-info":[{"award-number":["01Zx1510"]}],"id":[{"id":"10.13039\/501100002347","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2019,10,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Motivation<\/jats:title><jats:p>It has been shown that the machine learning approach random forest can be successfully applied to omics data, such as gene expression data, for classification or regression and to select variables that are important for prediction. However, the complex relationships between predictor variables, in particular between causal predictor variables, make the interpretation of currently applied variable selection techniques difficult.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>Here we propose a new variable selection approach called surrogate minimal depth (SMD) that incorporates surrogate variables into the concept of minimal depth (MD) variable importance. Applying SMD, we show that simulated correlation patterns can be reconstructed and that the increased consideration of variable relationships improves variable selection. 
When compared with existing state-of-the-art methods and MD, SMD has higher empirical power to identify causal variables while the resulting variable lists are equally stable. In conclusion, SMD is a promising approach to get more insight into the complex interplay of predictor variables and outcome in a high-dimensional data setting.<\/jats:p><\/jats:sec><jats:sec><jats:title>Availability and implementation<\/jats:title><jats:p>https:\/\/github.com\/StephanSeifert\/SurrogateMinimalDepth.<\/jats:p><\/jats:sec><jats:sec><jats:title>Supplementary information<\/jats:title><jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p><\/jats:sec>","DOI":"10.1093\/bioinformatics\/btz149","type":"journal-article","created":{"date-parts":[[2019,2,27]],"date-time":"2019-02-27T00:59:51Z","timestamp":1551229191000},"page":"3663-3671","source":"Crossref","is-referenced-by-count":36,"title":["Surrogate minimal depth as an importance measure for variables in random forests"],"prefix":"10.1093","volume":"35","author":[{"given":"Stephan","family":"Seifert","sequence":"first","affiliation":[{"name":"Institute of Medical Informatics and Statistics, Kiel University, University Hospital Schleswig-Holstein, Kiel, Germany"}]},{"given":"Sven","family":"Gundlach","sequence":"additional","affiliation":[{"name":"Institute of Medical Informatics and Statistics, Kiel University, University Hospital Schleswig-Holstein, Kiel, Germany"}]},{"given":"Silke","family":"Szymczak","sequence":"additional","affiliation":[{"name":"Institute of Medical Informatics and Statistics, Kiel University, University Hospital Schleswig-Holstein, Kiel, Germany"}]}],"member":"286","published-online":{"date-parts":[[2019,3,1]]},"reference":[{"key":"2023013108125564700_btz149-B1","doi-asserted-by":"crossref","first-page":"377","DOI":"10.1515\/hmbci-2012-0025","article-title":"Co-expression of genes with estrogen receptor-\u03b1 and progesterone receptor in human breast carcinoma 
tissue","volume":"12","author":"Andres","year":"2012","journal-title":"Horm. Mol. Biol. Clin. Investig"},{"key":"2023013108125564700_btz149-B2","first-page":"140","volume-title":"Classification and Regression Trees","author":"Breiman","year":"1984"},{"key":"2023013108125564700_btz149-B3","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Mach. Learn"},{"key":"2023013108125564700_btz149-B4","article-title":"Evaluation of variable selection methods for random forests and omics data sets","author":"Degenhardt","year":"2017","journal-title":"Brief. Bioinform"},{"key":"2023013108125564700_btz149-B5","doi-asserted-by":"crossref","first-page":"215","DOI":"10.1016\/j.compbiolchem.2010.07.002","article-title":"Stable feature selection for biomarker discovery","volume":"34","author":"He","year":"2010","journal-title":"Comput. Biol. Chem"},{"key":"2023013108125564700_btz149-B6","doi-asserted-by":"crossref","first-page":"719","DOI":"10.1586\/14737159.2016.1164601","article-title":"Omics for personalized medicine: defining the current we swim in","volume":"16","author":"Ibrahim","year":"2016","journal-title":"Expert Rev. Mol. Diagn"},{"key":"2023013108125564700_btz149-B7","doi-asserted-by":"crossref","first-page":"519","DOI":"10.1214\/07-EJS039","article-title":"Variable importance in binary regression trees and forests","volume":"1","author":"Ishwaran","year":"2007","journal-title":"Electron. J. Stat"},{"key":"2023013108125564700_btz149-B8","doi-asserted-by":"crossref","first-page":"205","DOI":"10.1198\/jasa.2009.tm08622","article-title":"High-dimensional variable selection for survival data","volume":"105","author":"Ishwaran","year":"2010","journal-title":"J. Am. Stat. 
Assoc"},{"key":"2023013108125564700_btz149-B9","doi-asserted-by":"crossref","first-page":"115","DOI":"10.1002\/sam.10103","article-title":"Random survival forests for high-dimensional data","volume":"4","author":"Ishwaran","year":"2011","journal-title":"Stat. Anal. Data Min"},{"key":"2023013108125564700_btz149-B10","doi-asserted-by":"crossref","first-page":"885","DOI":"10.1007\/s11634-016-0276-4","article-title":"A computationally fast variable importance test for random forests for high-dimensional data","volume":"4","author":"Janitza","year":"2018","journal-title":"Adv. Data Anal Classif"},{"key":"2023013108125564700_btz149-B11","doi-asserted-by":"crossref","first-page":"4237","DOI":"10.1098\/rsta.2009.0159","article-title":"Statistical challenges of high-dimensional data","volume":"367","author":"Johnstone","year":"2009","journal-title":"Philos. Trans. Royal Soc. A"},{"key":"2023013108125564700_btz149-B12","doi-asserted-by":"crossref","first-page":"1","DOI":"10.18637\/jss.v036.i11","article-title":"Feature selection with the Boruta package","volume":"36","author":"Kursa","year":"2010","journal-title":"J. 
Stat Softw"},{"key":"2023013108125564700_btz149-B13","doi-asserted-by":"crossref","first-page":"559","DOI":"10.1186\/1471-2105-9-559","article-title":"WGCNA: an R package for weighted correlation network analysis","volume":"9","author":"Langfelder","year":"2008","journal-title":"BMC Bioinformatics"},{"key":"2023013108125564700_btz149-B14","doi-asserted-by":"crossref","DOI":"10.1093\/bioinformatics\/bty373","article-title":"The revival of the Gini importance?","author":"Nembrini","year":"2018","journal-title":"Bioinformatics"},{"key":"2023013108125564700_btz149-B15","doi-asserted-by":"crossref","first-page":"61","DOI":"10.1038\/nature11412","article-title":"Comprehensive molecular portraits of human breast tumours","volume":"490","author":"Network","year":"2012","journal-title":"Nature"},{"key":"2023013108125564700_btz149-B16","doi-asserted-by":"crossref","first-page":"110","DOI":"10.1186\/1471-2105-11-110","article-title":"The behaviour of random forest permutation-based variable importance measures under predictor correlation","volume":"11","author":"Nicodemus","year":"2010","journal-title":"BMC Bioinformatics"},{"key":"2023013108125564700_btz149-B17","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/1471-2105-8-25","article-title":"Bias in random forest variable importance measures: illustrations, sources and a solution","volume":"8","author":"Strobl","year":"2007","journal-title":"BMC Bioinformatics"},{"key":"2023013108125564700_btz149-B18","doi-asserted-by":"crossref","first-page":"307","DOI":"10.1186\/1471-2105-9-307","article-title":"Conditional variable importance for random forests","volume":"9","author":"Strobl","year":"2008","journal-title":"BMC Bioinformatics"},{"key":"2023013108125564700_btz149-B19","doi-asserted-by":"crossref","first-page":"323","DOI":"10.1037\/a0016973","article-title":"An introduction to recursive partitioning: rationale, application, and characteristics of classification and regression trees, bagging, and random 
forests","volume":"14","author":"Strobl","year":"2009","journal-title":"Psychol. Methods"},{"key":"2023013108125564700_btz149-B20","doi-asserted-by":"crossref","first-page":"1","DOI":"10.18637\/jss.v077.i01","article-title":"ranger: A Fast Implementation of Random forests for high dimensional data in C++ and R","volume":"77","author":"Wright","year":"2017","journal-title":"J Stat Softw"},{"key":"2023013108125564700_btz149-B21","first-page":"44","article-title":"Simulating gene expression data to estimate sample size for class and biomarker discovery","volume":"4","author":"Zhang","year":"2012","journal-title":"Int. J. Adv. Life Sci"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btz149\/28964000\/btz149.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/35\/19\/3663\/48975887\/bioinformatics_35_19_3663.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/35\/19\/3663\/48975887\/bioinformatics_35_19_3663.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,7,15]],"date-time":"2024-07-15T04:32:28Z","timestamp":1721017948000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/35\/19\/3663\/5368013"}},"subtitle":[],"editor":[{"given":"Janet","family":"Kelso","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2019,3,1]]},"references-count":21,"journal-issue":{"issue":"19","published-print":{"date-parts":[[2019,10,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btz149","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803
","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2019,10,1]]},"published":{"date-parts":[[2019,3,1]]}}}