{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,4,15]],"date-time":"2025-04-15T16:26:10Z","timestamp":1744734370525},"reference-count":24,"publisher":"Oxford University Press (OUP)","issue":"5","license":[{"start":{"date-parts":[[2019,10,17]],"date-time":"2019-10-17T00:00:00Z","timestamp":1571270400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2020,3,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Matched case\u2013control analysis is widely used in biomedical studies to identify exposure variables associated with health conditions. The matching is used to improve the efficiency. Existing variable selection methods for matched case\u2013control studies are challenged in high-dimensional settings where interactions among variables are also important. We describe a quite different method for high-dimensional matched case\u2013control data, based on the potential outcome model, which is not only flexible regarding the number of matching and exposure variables but also able to detect interaction effects.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>We present Matched Forest (MF), an algorithm for variable selection in matched case\u2013control data. The method preserves the case and control values in each instance but transforms the matched case\u2013control data with added counterfactuals. A modified variable importance score from a supervised learner is used to detect important variables. The method is conceptually simple and can be applied with widely available software tools. Simulation studies show the effectiveness of MF in identifying important variables. MF is also applied to data from the biomedical domain and its performance is compared with alternative approaches.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>R code for implementing MF is available at https:\/\/github.com\/NooshinSh\/Matched_Forest.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btz785","type":"journal-article","created":{"date-parts":[[2019,10,15]],"date-time":"2019-10-15T22:27:58Z","timestamp":1571178478000},"page":"1570-1576","source":"Crossref","is-referenced-by-count":3,"title":["Matched Forest: supervised learning for high-dimensional matched case\u2013control studies"],"prefix":"10.1093","volume":"36","author":[{"given":"Nooshin","family":"Shomal Zadeh","sequence":"first","affiliation":[{"name":"School of Computing , Informatics, and Decision Systems Engineering, Arizona State University, Tempe, AZ 85281, USA"}]},{"given":"Sangdi","family":"Lin","sequence":"additional","affiliation":[{"name":"Zillow Group , Seattle, WA 98101, USA"}]},{"given":"George C","family":"Runger","sequence":"additional","affiliation":[{"name":"School of Computing , Informatics, and Decision Systems Engineering, Arizona State University, Tempe, AZ 85281, USA"}]}],"member":"286","published-online":{"date-parts":[[2019,10,17]]},"reference":[{"key":"2023060910381781500_btz785-B1","doi-asserted-by":"crossref","first-page":"140","DOI":"10.1198\/jcgs.2009.07118","article-title":"Boosting for correlated binary classification","volume":"19","author":"Adewale","year":"2010","journal-title":"J. Comput. Graph. Stat"},{"key":"2023060910381781500_btz785-B2","doi-asserted-by":"crossref","DOI":"10.1515\/ijb-2016-0043","article-title":"Bayesian variable selection methods for matched case-control studies","volume":"13","author":"Asafu-Adjei","year":"2017","journal-title":"Int. J. Biostat"},{"key":"2023060910381781500_btz785-B3","doi-asserted-by":"crossref","first-page":"639","DOI":"10.1111\/rssc.12056","article-title":"Variable importance in matched case\u2013control studies in settings of high dimensional data","volume":"63","author":"Balasubramanian","year":"2014","journal-title":"J. R. Stat. Soc"},{"key":"2023060910381781500_btz785-B4","doi-asserted-by":"crossref","first-page":"711","DOI":"10.1182\/blood-2006-02-002824","article-title":"Biologic pathways associated with relapse in childhood acute lymphoblastic leukemia: a children\u2019s oncology group study","volume":"108","author":"Bhojwani","year":"2006","journal-title":"Blood"},{"key":"2023060910381781500_btz785-B5","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Mach. Learn"},{"key":"2023060910381781500_btz785-B6","volume-title":"UCI Machine Learning Repository","author":"Dua","year":"2019"},{"key":"2023060910381781500_btz785-B7","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-319-41259-7","volume-title":"Statistical Causal Inferences and Their Applications in Public Health Research","author":"He","year":"2016"},{"key":"2023060910381781500_btz785-B8","doi-asserted-by":"crossref","first-page":"904","DOI":"10.1093\/bioinformatics\/btn650","article-title":"Matching methods for observational microarray studies","volume":"25","author":"Heller","year":"2009","journal-title":"Bioinformatics"},{"key":"2023060910381781500_btz785-B9","doi-asserted-by":"crossref","first-page":"199","DOI":"10.1093\/pan\/mpl013","article-title":"Matching as nonparametric preprocessing for reducing model dependence in parametric causal inference","volume":"15","author":"Ho","year":"2007","journal-title":"Polit. Anal"},{"key":"2023060910381781500_btz785-B10","doi-asserted-by":"crossref","DOI":"10.1002\/0471722146","volume-title":"Applied Logistic Regression","author":"Hosmer","year":"2000"},{"key":"2023060910381781500_btz785-B11","author":"Keogh","year":"2017"},{"key":"2023060910381781500_btz785-B85","first-page":"18","article-title":"Classification and regression by random forest","volume":"2","author":"Liaw","year":"2002","journal-title":"R News"},{"key":"2023060910381781500_btz785-B12","doi-asserted-by":"crossref","first-page":"397","DOI":"10.1016\/j.trstmh.2003.10.009","article-title":"Severe malaria attack is associated with high prevalence of Ascaris lumbricoides infection among children in rural Senegal","volume":"98","author":"Le Hesran","year":"2004","journal-title":"Trans. R. Soc. Trop. Med. Hyg"},{"key":"2023060910381781500_btz785-B14","first-page":"465","article-title":"On the application of probability theory to agricultural experiments. Essay on principles. Section 9","volume":"5","author":"Neyman","year":"1923","journal-title":"Stat. Sci"},{"key":"2023060910381781500_btz785-B15","doi-asserted-by":"crossref","first-page":"1307","DOI":"10.1086\/514340","article-title":"Risk factors, clinical characteristics, and outcome of Nocardia infection in organ transplant recipients: a matched case-control study","volume":"44","author":"Peleg","year":"2007","journal-title":"Clin. Infect. Dis"},{"key":"2023060910381781500_btz785-B16","doi-asserted-by":"crossref","first-page":"153","DOI":"10.1111\/biom.12113","article-title":"Variable selection and prediction using a nested, matched case-control study: application to hospital acquired pneumonia in stroke patients","volume":"70","author":"Qian","year":"2014","journal-title":"Biometrics"},{"key":"2023060910381781500_btz785-B17","doi-asserted-by":"crossref","first-page":"1.","DOI":"10.2202\/1557-4679.1127","article-title":"Why match? Investigating matched case-control study designs with causal effect estimations","volume":"5","author":"Rose","year":"2009","journal-title":"Int. J. Biostat"},{"key":"2023060910381781500_btz785-B18","volume-title":"Modern Epidemiology","author":"Rothman","year":"2008"},{"key":"2023060910381781500_btz785-B19","doi-asserted-by":"crossref","first-page":"1","DOI":"10.3102\/10769986002001001","article-title":"Assignment to treatment group on the basis of a covariate","volume":"2","author":"Rubin","year":"1977","journal-title":"J. Educ. Stat"},{"key":"2023060910381781500_btz785-B20","author":"Strobl","year":"2008"},{"key":"2023060910381781500_btz785-B21","doi-asserted-by":"crossref","first-page":"224","DOI":"10.2478\/v10001-006-0032-7","article-title":"Use of generalized linear mixed models to examine the association between air pollution and health outcomes","volume":"19","author":"Szyszkowicz","year":"2006","journal-title":"Int. J. Occup. Med. Environ. Health"},{"key":"2023060910381781500_btz785-B22","doi-asserted-by":"crossref","first-page":"213","DOI":"10.1177\/117693510700300025","article-title":"Feature selection for predicting tumor metastases in microarray experiments using paired design","volume":"3","author":"Tan","year":"2007","journal-title":"Cancer Inform"},{"key":"2023060910381781500_btz785-B23","doi-asserted-by":"crossref","first-page":"70.","DOI":"10.1186\/1476-4598-6-70","article-title":"Identification of a panel of sensitive and specific DNA methylation markers for lung adenocarcinoma","volume":"6","author":"Tsou","year":"2007","journal-title":"Mol. Cancer"},{"key":"2023060910381781500_btz785-B24","volume-title":"Proceedings of the 24th Annual SAS User\u2019s Group International Conference","author":"Vierkant","year":"1999"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btz785\/30700201\/btz785.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/36\/5\/1570\/50553041\/bioinformatics_36_5_1570.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/36\/5\/1570\/50553041\/bioinformatics_36_5_1570.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,6,9]],"date-time":"2023-06-09T10:39:13Z","timestamp":1686307153000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/36\/5\/1570\/5588842"}},"subtitle":[],"editor":[{"given":"Jonathan","family":"Wren","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2019,10,17]]},"references-count":24,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2020,3,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btz785","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2020,3]]},"published":{"date-parts":[[2019,10,17]]}}}