{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T08:15:48Z","timestamp":1760170548012},"reference-count":32,"publisher":"Oxford University Press (OUP)","issue":"6","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2010,3,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: High-dimensional data are frequently generated in genome-wide association studies (GWAS) and other studies. It is important to identify features such as single nucleotide polymorphisms (SNPs) in GWAS that are associated with a disease. Random forests represent a very useful approach for this purpose, using a variable importance score. This importance score has several shortcomings. We propose an alternative importance measure to overcome those shortcomings.<\/jats:p>\n               <jats:p>Results: We characterized the effect of multiple SNPs under various models using our proposed importance measure in random forests, which uses maximal conditional chi-square (MCC) as a measure of association between a SNP and the trait conditional on other SNPs. Based on this importance measure, we employed a permutation test to estimate empirical P-values of SNPs. Our method was compared to a univariate test and the permutation test using the Gini and permutation importance. In simulation, the proposed method performed consistently superior to the other methods in identifying risk SNPs. In a GWAS of age-related macular degeneration, the proposed method confirmed two significant SNPs (at the genome-wide adjusted level of 0.05). Further analysis showed that these two SNPs conformed with a heterogeneity model. Compared with the existing importance measures, the MCC importance measure is more sensitive to complex effects of risk SNPs by utilizing conditional information on different SNPs. 
The permutation test with the MCC importance measure provides an efficient way to identify candidate SNPs in GWAS and facilitates the understanding of the etiology between genetic variants and complex diseases.<\/jats:p>\n               <jats:p>Contact: \u00a0heping.zhang@yale.edu<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btq038","type":"journal-article","created":{"date-parts":[[2010,2,4]],"date-time":"2010-02-04T01:55:22Z","timestamp":1265248522000},"page":"831-837","source":"Crossref","is-referenced-by-count":28,"title":["Maximal conditional chi-square importance in random forests"],"prefix":"10.1093","volume":"26","author":[{"given":"Minghui","family":"Wang","sequence":"first","affiliation":[{"name":"Department of Epidemiology and Public Health, Yale University School of Medicine, New Haven, CT 06520-8034, USA"}]},{"given":"Xiang","family":"Chen","sequence":"additional","affiliation":[{"name":"Department of Epidemiology and Public Health, Yale University School of Medicine, New Haven, CT 06520-8034, USA"}]},{"given":"Heping","family":"Zhang","sequence":"additional","affiliation":[{"name":"Department of Epidemiology and Public Health, Yale University School of Medicine, New Haven, CT 06520-8034, USA"}]}],"member":"286","published-online":{"date-parts":[[2010,2,3]]},"reference":[{"key":"2023012508002369900_B1","doi-asserted-by":"crossref","first-page":"2010","DOI":"10.1093\/bioinformatics\/btn356","article-title":"Enriched random forests","volume":"24","author":"Amaratunga","year":"2008","journal-title":"Bioinformatics"},{"key":"2023012508002369900_B2","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Machine Learn."},{"key":"2023012508002369900_B3","author":"Breiman","year":"2002","journal-title":"Manual On 
Setting Up, Using, And Understanding Random Forests V3.1."},{"key":"2023012508002369900_B4","volume-title":"Classification and Regression Trees.","author":"Breiman","year":"1984"},{"key":"2023012508002369900_B5","doi-asserted-by":"crossref","first-page":"171","DOI":"10.1002\/gepi.20041","article-title":"Identifying SNPs predictive of phenotype using random forests","volume":"28","author":"Bureau","year":"2005","journal-title":"Genet. Epidemiol."},{"key":"2023012508002369900_B6","doi-asserted-by":"crossref","first-page":"19199","DOI":"10.1073\/pnas.0709868104","article-title":"A forest-based approach to identifying gene and gene-gene interactions","volume":"104","author":"Chen","year":"2007","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023012508002369900_B7","doi-asserted-by":"crossref","first-page":"362","DOI":"10.1126\/science.1111655","article-title":"Genetics. Was the Human Genome Project worth the effort?","volume":"308","author":"Daiger","year":"2005","journal-title":"Science"},{"key":"2023012508002369900_B8","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1186\/1471-2105-7-3","article-title":"Gene selection and classification of microarray data using random forest","volume":"7","author":"Diaz-Uriarte","year":"2006","journal-title":"BMC Bioinformatics"},{"key":"2023012508002369900_B9","doi-asserted-by":"crossref","first-page":"421","DOI":"10.1126\/science.1110189","article-title":"Complement factor H polymorphism and age-related macular degeneration","volume":"308","author":"Edwards","year":"2005","journal-title":"Science"},{"key":"2023012508002369900_B10","doi-asserted-by":"crossref","first-page":"1189","DOI":"10.1214\/aos\/1013203451","article-title":"Greedy function approximation: a gradient boosting machine","volume":"29","author":"Friedman","year":"2001","journal-title":"Ann. 
Stat."},{"key":"2023012508002369900_B11","doi-asserted-by":"crossref","first-page":"419","DOI":"10.1126\/science.1110359","article-title":"Complement factor H variant increases the risk of age-related macular degeneration","volume":"308","author":"Haines","year":"2005","journal-title":"Science"},{"key":"2023012508002369900_B12","doi-asserted-by":"crossref","first-page":"1491","DOI":"10.1126\/science.1142842","article-title":"A common variant on chromosome 9p21 affects the risk of myocardial infarction","volume":"316","author":"Helgadottir","year":"2007","journal-title":"Science"},{"issue":"Suppl. 1","key":"2023012508002369900_B13","doi-asserted-by":"crossref","first-page":"S65","DOI":"10.1186\/1471-2105-10-S1-S65","article-title":"A random forest approach to the detection of epistatic interactions in case-control studies","volume":"10","author":"Jiang","year":"2009","journal-title":"BMC Bioinformatics"},{"key":"2023012508002369900_B14","doi-asserted-by":"crossref","first-page":"385","DOI":"10.1126\/science.1109557","article-title":"Complement factor H polymorphism in age-related macular degeneration","volume":"308","author":"Klein","year":"2005","journal-title":"Science"},{"key":"2023012508002369900_B15","doi-asserted-by":"crossref","first-page":"1049","DOI":"10.1038\/ng1871","article-title":"CFH haplotypes without the Y402H coding variant show strong association with susceptibility to age-related macular degeneration","volume":"38","author":"Li","year":"2006","journal-title":"Nat. 
Genet."},{"key":"2023012508002369900_B16","doi-asserted-by":"crossref","first-page":"32","DOI":"10.1186\/1471-2156-5-32","article-title":"Screening large-scale association study data: exploiting interactions using random forests","volume":"5","author":"Lunetta","year":"2004","journal-title":"BMC Genet."},{"key":"2023012508002369900_B17","doi-asserted-by":"crossref","first-page":"405","DOI":"10.1126\/science.314.5798.405a","article-title":"Gene offers insight into macular degeneration","volume":"314","author":"Marx","year":"2006","journal-title":"Science"},{"key":"2023012508002369900_B18","doi-asserted-by":"crossref","first-page":"265","DOI":"10.1007\/s00439-009-0667-0","article-title":"The influence of carnosinase gene polymorphisms on diabetic nephropathy risk in African-Americans","volume":"126","author":"McDonough","year":"2009","journal-title":"Hum. Genet."},{"key":"2023012508002369900_B19","doi-asserted-by":"crossref","first-page":"1488","DOI":"10.1126\/science.1142447","article-title":"A common allele on chromosome 9 associated with coronary heart disease","volume":"316","author":"McPherson","year":"2007","journal-title":"Science"},{"key":"2023012508002369900_B20","doi-asserted-by":"crossref","first-page":"78","DOI":"10.1186\/1471-2105-10-78","article-title":"Performance of random forest when SNPs are in linkage disequilibrium","volume":"10","author":"Meng","year":"2009","journal-title":"BMC Bioinformatics"},{"key":"2023012508002369900_B21","doi-asserted-by":"crossref","first-page":"3312","DOI":"10.1167\/iovs.07-1517","article-title":"Multiple gene polymorphisms in the complement factor h gene are associated with exudative age-related macular degeneration in Chinese","volume":"49","author":"Ng","year":"2008","journal-title":"Invest. Ophthalmol. Vis. Sci."},{"key":"2023012508002369900_B22","first-page":"222","article-title":"Linkage strategies for genetically complex traits. I. 
Multilocus models","volume":"46","author":"Risch","year":"1990","journal-title":"Am. J. Hum. Genet."},{"key":"2023012508002369900_B23","first-page":"229","article-title":"Linkage strategies for genetically complex traits. II. The power of affected relative pairs","volume":"46","author":"Risch","year":"1990","journal-title":"Am. J. Hum. Genet."},{"key":"2023012508002369900_B24","doi-asserted-by":"crossref","first-page":"78","DOI":"10.1152\/physiolgenomics.00167.2007","article-title":"A framework to identify physiological responses in microarray-based gene expression studies: selection and interpretation of biologically relevant genes","volume":"33","author":"Rodenburg","year":"2008","journal-title":"Physiol. Genomics"},{"key":"2023012508002369900_B25","doi-asserted-by":"crossref","first-page":"443","DOI":"10.1056\/NEJMoa072366","article-title":"Genomewide association analysis of coronary artery disease","volume":"357","author":"Samani","year":"2007","journal-title":"N. Engl. J. Med."},{"key":"2023012508002369900_B26","doi-asserted-by":"crossref","first-page":"336","DOI":"10.1186\/1471-2105-10-336","article-title":"A permutation-based multiple testing method for time-course microarray experiments","volume":"10","author":"Sohn","year":"2009","journal-title":"BMC Bioinformatics"},{"key":"2023012508002369900_B27","doi-asserted-by":"crossref","first-page":"S69","DOI":"10.1186\/1753-6561-3-S7-S69","article-title":"Detecting significant SNPs in rheumatoid arthritis study with random forests","volume":"3","author":"Wang","year":"2009","journal-title":"BMC Proc."},{"issue":"Suppl. 
1","key":"2023012508002369900_B28","doi-asserted-by":"crossref","first-page":"S135","DOI":"10.1186\/1471-2156-6-S1-S135","article-title":"A genome-wide tree- and forest-based association analysis of comorbidity of alcoholism and smoking","volume":"6","author":"Ye","year":"2005","journal-title":"BMC Genet."},{"key":"2023012508002369900_B29","doi-asserted-by":"crossref","first-page":"323","DOI":"10.1002\/1098-2272(200012)19:4<323::AID-GEPI4>3.0.CO;2-5","article-title":"Use of classification trees for association studies","volume":"19","author":"Zhang","year":"2000","journal-title":"Genet. Epidemiol."},{"key":"2023012508002369900_B30","doi-asserted-by":"crossref","first-page":"169","DOI":"10.4310\/SII.2008.v1.n1.a14","article-title":"A tree-based method for modeling a multivariate ordinal response","volume":"1","author":"Zhang","year":"2008","journal-title":"Stat. Interface"},{"key":"2023012508002369900_B31","doi-asserted-by":"crossref","first-page":"4168","DOI":"10.1073\/pnas.0230559100","article-title":"Cell and tumor classification using gene expression data: construction of forests","volume":"100","author":"Zhang","year":"2003","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023012508002369900_B32","doi-asserted-by":"crossref","first-page":"51","DOI":"10.1186\/1471-2350-9-51","article-title":"The NEI\/NCBI dbGAP database: genotypes and haplotypes that may specifically predispose to risk of neovascular age-related macular degeneration","volume":"9","author":"Zhang","year":"2008","journal-title":"BMC Med. 
Genet."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/26\/6\/831\/48853458\/bioinformatics_26_6_831.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/26\/6\/831\/48853458\/bioinformatics_26_6_831.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,25]],"date-time":"2023-01-25T08:01:04Z","timestamp":1674633664000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/26\/6\/831\/245050"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2010,2,3]]},"references-count":32,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2010,3,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btq038","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2010,3,15]]},"published":{"date-parts":[[2010,2,3]]}}}