{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,20]],"date-time":"2025-10-20T10:10:52Z","timestamp":1760955052343,"version":"3.34.0"},"reference-count":35,"publisher":"Oxford University Press (OUP)","issue":"14","license":[{"start":{"date-parts":[[2016,10,1]],"date-time":"2016-10-01T00:00:00Z","timestamp":1475280000000},"content-version":"vor","delay-in-days":3055,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/2.0\/uk\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2008,7,15]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Motivation: We developed an EM-random forest (EMRF) for Haseman\u2013Elston quantitative trait linkage analysis that accounts for marker ambiguity and weighs each sib-pair according to the posterior identical by descent (IBD) distribution. The usual random forest (RF) variable importance (VI) index used to rank markers for variable selection is not optimal when applied to linkage data because of correlation between markers. We define new VI indices that borrow information from linked markers using the correlation structure inherent in IBD linkage data.<\/jats:p><jats:p>Results: Using simulations, we find that the new VI indices in EMRF performed better than the original RF VI index and performed similarly or better than EM-Haseman\u2013Elston regression LOD score for various genetic models. Moreover, tree size and markers subset size evaluated at each node are important considerations in RFs.<\/jats:p><jats:p>Availability: The source code for EMRF written in C is available at www.infornomics.utoronto.ca\/downloads\/EMRF<\/jats:p><jats:p>Contact: \u00a0bull@mshri.on.ca<\/jats:p><jats:p>Supplementary information: \u00a0Supplementary data are available at www.infornomics.utoronto.ca\/downloads\/EMRF<\/jats:p>","DOI":"10.1093\/bioinformatics\/btn239","type":"journal-article","created":{"date-parts":[[2008,5,23]],"date-time":"2008-05-23T00:24:59Z","timestamp":1211502299000},"page":"1603-1610","source":"Crossref","is-referenced-by-count":16,"title":["EM-random forest and new measures of variable importance for multi-locus quantitative trait linkage analysis"],"prefix":"10.1093","volume":"24","author":[{"given":"Sophia S. F.","family":"Lee","sequence":"first","affiliation":[{"name":"1 Department of Public Health Sciences, University of Toronto, Toronto M5T 3M7, 2Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto M5G 1X5 and 3Genetics and Genomic Biology, The Hospital for Sick Children Research Institute, Toronto M5G 1L7, Canada"},{"name":"1 Department of Public Health Sciences, University of Toronto, Toronto M5T 3M7, 2Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto M5G 1X5 and 3Genetics and Genomic Biology, The Hospital for Sick Children Research Institute, Toronto M5G 1L7, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Lei","family":"Sun","sequence":"additional","affiliation":[{"name":"1 Department of Public Health Sciences, University of Toronto, Toronto M5T 3M7, 2Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto M5G 1X5 and 3Genetics and Genomic Biology, The Hospital for Sick Children Research Institute, Toronto M5G 1L7, Canada"},{"name":"1 Department of Public Health Sciences, University of Toronto, Toronto M5T 3M7, 2Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto M5G 1X5 and 3Genetics and Genomic Biology, The Hospital for Sick Children Research Institute, Toronto M5G 1L7, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Rafal","family":"Kustra","sequence":"additional","affiliation":[{"name":"1 Department of Public Health Sciences, University of Toronto, Toronto M5T 3M7, 2Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto M5G 1X5 and 3Genetics and Genomic Biology, The Hospital for Sick Children Research Institute, Toronto M5G 1L7, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shelley B.","family":"Bull","sequence":"additional","affiliation":[{"name":"1 Department of Public Health Sciences, University of Toronto, Toronto M5T 3M7, 2Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto M5G 1X5 and 3Genetics and Genomic Biology, The Hospital for Sick Children Research Institute, Toronto M5G 1L7, Canada"},{"name":"1 Department of Public Health Sciences, University of Toronto, Toronto M5T 3M7, 2Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto M5G 1X5 and 3Genetics and Genomic Biology, The Hospital for Sick Children Research Institute, Toronto M5G 1L7, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2008,5,21]]},"reference":[{"key":"2023020210483271800_B1","doi-asserted-by":"crossref","first-page":"2350","DOI":"10.1214\/aos\/1032181158","article-title":"Heuristics of instability and stabilization in model selection","volume":"24","author":"Breiman","year":"1996","journal-title":"Ann. Stat"},{"key":"2023020210483271800_B2","doi-asserted-by":"crossref","first-page":"123","DOI":"10.1007\/BF00058655","article-title":"Bagging predictors","volume":"24","author":"Breiman","year":"1996","journal-title":"Mach. Learn"},{"key":"2023020210483271800_B3","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Mach. Learn"},{"key":"2023020210483271800_B4","doi-asserted-by":"crossref","first-page":"S19","DOI":"10.1186\/1471-2156-4-S1-S19","article-title":"Multilevel modeling for the analysis of longitudinal blood pressure data in the Framingham heart study pedigrees","volume":"4","author":"Briollais","year":"2003","journal-title":"BMC Genet"},{"key":"2023020210483271800_B5","doi-asserted-by":"crossref","first-page":"S64","DOI":"10.1186\/1471-2156-4-S1-S64","article-title":"Mapping complex traits using random forests","volume":"4","author":"Bureau","year":"2003","journal-title":"BMC Genet"},{"key":"2023020210483271800_B6","doi-asserted-by":"crossref","first-page":"171","DOI":"10.1002\/gepi.20041","article-title":"Identifying SNPs predictive of phenotype using random forests","volume":"28","author":"Bureau","year":"2005","journal-title":"Genet. Epidemiol"},{"key":"2023020210483271800_B7","doi-asserted-by":"crossref","first-page":"265","DOI":"10.1002\/gepi.10315","article-title":"Quantitative trait linkage analysis by generalized estimating equations: unification of variance components and Haseman-Elston regression","volume":"26","author":"Chen","year":"2004","journal-title":"Genet. Epid"},{"key":"2023020210483271800_B8","doi-asserted-by":"crossref","first-page":"963","DOI":"10.1093\/genetics\/138.3.963","article-title":"Empirical threshold values for quantitative trait mapping","volume":"138","author":"Churchill","year":"1994","journal-title":"Genetics"},{"key":"2023020210483271800_B9","doi-asserted-by":"crossref","first-page":"279","DOI":"10.2105\/AJPH.41.3.279","article-title":"Epidemiological approaches to heart disease: the Framingham study","volume":"41","author":"Dawber","year":"1951","journal-title":"Am. J. Public Health"},{"key":"2023020210483271800_B10","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1111\/j.2517-6161.1977.tb01600.x","article-title":"Maximum likelihood from incomplete data via the EM algorithm","volume":"39","author":"Dempster","year":"1977","journal-title":"J. R. Stat. Soc"},{"key":"2023020210483271800_B11","doi-asserted-by":"crossref","first-page":"268","DOI":"10.1086\/302189","article-title":"A simulation study of the effects of assignment of prior identity-by-descent probabilities to unselected sib pairs, in covariance-structure modeling of a quantitative-trait locus","volume":"64","author":"Dolan","year":"1999","journal-title":"Am. J. Hum. Genet"},{"key":"2023020210483271800_B12","doi-asserted-by":"crossref","first-page":"163","DOI":"10.1023\/A:1021687817609","article-title":"A note on the power provided by sibships of sizes 2, 3, and 4 in genetic covariance modeling of a codominant QTL","volume":"29","author":"Dolan","year":"1999","journal-title":"Behav. Genet"},{"key":"2023020210483271800_B13","doi-asserted-by":"crossref","first-page":"523","DOI":"10.1159\/000152448","article-title":"A general model for the genetic analysis of pedigree data","volume":"21","author":"Elston","year":"1971","journal-title":"Hum. Hered"},{"key":"2023020210483271800_B14","volume-title":"Introduction to Quantitative Genetics","author":"Falconer","year":"1989","edition":"3rd edn"},{"key":"2023020210483271800_B15","doi-asserted-by":"crossref","first-page":"1189","DOI":"10.1214\/aos\/1013203451","article-title":"Greedy function approximation: a gradient boosting machine","volume":"29","author":"Friedman","year":"2001","journal-title":"Ann. Stat"},{"key":"2023020210483271800_B16","doi-asserted-by":"crossref","first-page":"58","DOI":"10.1006\/tpbi.1996.0003","article-title":"Epistasis and pleiotropy as natural properties of transcriptional regulation","volume":"49","author":"Gibson","year":"1996","journal-title":"Theor. Popul. Biol"},{"key":"2023020210483271800_B17","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1007\/BF01066731","article-title":"The investigation of linkage between a quantitative trait and a marker locus","volume":"2","author":"Haseman","year":"1972","journal-title":"Behav. Genet"},{"key":"2023020210483271800_B18","doi-asserted-by":"crossref","first-page":"154","DOI":"10.1196\/annals.1310.015","article-title":"Application of the random forest classification algorithm to a SELDI-TOF proteomics study in the setting of a cancer prevention trial","volume":"1020","author":"Izmirlian","year":"2004","journal-title":"Ann. N. Y. Acad. Sci"},{"key":"2023020210483271800_B19","first-page":"439","article-title":"Complete multipoint sib-pair analysis of qualitative and quantitative traits","volume":"57","author":"Kruglyak","year":"1995","journal-title":"Am. J. Hum. Genet"},{"key":"2023020210483271800_B20","first-page":"1347","article-title":"Parametric and nonparametric linkage analysis: a unified multipoint approach","volume":"58","author":"Kruglyak","year":"1996","journal-title":"Am. J. Hum. Genet"},{"key":"2023020210483271800_B21","doi-asserted-by":"crossref","first-page":"2363","DOI":"10.1073\/pnas.84.8.2363","article-title":"Construction of multilocus genetic linkage maps in humans","volume":"84","author":"Lander","year":"1987","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023020210483271800_B22","doi-asserted-by":"crossref","first-page":"477","DOI":"10.1161\/01.HYP.36.4.477","article-title":"Evidence for a gene influencing blood pressure on chromosome 17. Genome scan linkage results for longitudinal blood pressure phenotypes in subjects from the Framingham Heart Study","volume":"36","author":"Levy","year":"2000","journal-title":"Hypertension"},{"key":"2023020210483271800_B23","first-page":"18","article-title":"Classification and regression by randomForest","volume":"2","author":"Liaw","year":"2002","journal-title":"R News"},{"key":"2023020210483271800_B24","doi-asserted-by":"crossref","first-page":"32","DOI":"10.1186\/1471-2156-5-32","article-title":"Screening large-scale association study data: exploiting interactions using random forests","volume":"10","author":"Lunetta","year":"2004","journal-title":"BMC Genet"},{"key":"2023020210483271800_B25","doi-asserted-by":"crossref","first-page":"73","DOI":"10.1159\/000073735","article-title":"The ubiquitous nature of epistasis in determining susceptibility to common human diseases","volume":"56","author":"Moore","year":"2003","journal-title":"Hum. Hered"},{"key":"2023020210483271800_B26","doi-asserted-by":"crossref","DOI":"10.56021\/9780801861406","volume-title":"Analysis of Human Genetic Linkage","author":"Ott","year":"1999","edition":"3rd edn"},{"article-title":"R: A language and environment for statistical computing","year":"2008","author":"R Development Core Team","key":"2023020210483271800_B27"},{"key":"2023020210483271800_B28","first-page":"1306","article-title":"Extended multipoint identity-by-descent analysis of human quantitative traits: efficiency, power, and modeling considerations","volume":"53","author":"Schork","year":"1993","journal-title":"Am. J. Hum. Genet"},{"key":"2023020210483271800_B29","doi-asserted-by":"crossref","first-page":"2","DOI":"10.2202\/1544-6115.1031","article-title":"Relating HIV-1 sequence variation to replication capacity via trees and forests","volume":"3","author":"Segal","year":"2004","journal-title":"Stat. Appl. Genet. Mol. Biol"},{"key":"2023020210483271800_B30","doi-asserted-by":"crossref","first-page":"547","DOI":"10.1038\/modpathol.3800322","article-title":"Tumor classification by tissue microarray profiling: random forest clustering applied to renal cell carcinoma","volume":"18","author":"Shi","year":"2005","journal-title":"Mod. Pathol"},{"key":"2023020210483271800_B31","doi-asserted-by":"crossref","first-page":"3940","DOI":"10.1093\/bioinformatics\/bti623","article-title":"ROCR: visualizing classifier performance in R","volume":"21","author":"Sing","year":"2005","journal-title":"Bioinformatics"},{"key":"2023020210483271800_B32","doi-asserted-by":"crossref","first-page":"12","DOI":"10.1002\/gepi.20075","article-title":"Two-level Haseman-Elston regression for general pedigree data analysis","volume":"29","author":"Wang","year":"2005","journal-title":"Genet. Epidemiol"},{"key":"2023020210483271800_B33","doi-asserted-by":"crossref","first-page":"545","DOI":"10.1046\/j.1469-1809.1999.6360545.x","article-title":"Power of variance component linkage analysis to detect quantitative trait loci","volume":"63","author":"Williams","year":"1999","journal-title":"Ann. Hum. Genet"},{"key":"2023020210483271800_B34","doi-asserted-by":"crossref","first-page":"1065","DOI":"10.1002\/(SICI)1098-2272(1997)14:6<1065::AID-GEPI84>3.0.CO;2-F","article-title":"Statistical properties of a variance components method for quantitative trait linkage analysis in nuclear families and extended pedigrees","volume":"14","author":"Williams","year":"1997","journal-title":"Genet. Epidemiol"},{"key":"2023020210483271800_B35","doi-asserted-by":"crossref","first-page":"84","DOI":"10.1159\/000096096","article-title":"Locus-specific heritability estimation via the bootstrap in linkage scans for quantitative trait loci","volume":"62","author":"Wu","year":"2006","journal-title":"Hum. Hered"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/24\/14\/1603\/49048623\/bioinformatics_24_14_1603.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/24\/14\/1603\/49048623\/bioinformatics_24_14_1603.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,1,30]],"date-time":"2025-01-30T11:22:17Z","timestamp":1738236137000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/24\/14\/1603\/181817"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2008,5,21]]},"references-count":35,"journal-issue":{"issue":"14","published-print":{"date-parts":[[2008,7,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btn239","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"type":"electronic","value":"1367-4811"},{"type":"print","value":"1367-4803"}],"subject":[],"published-other":{"date-parts":[[2008,7,15]]},"published":{"date-parts":[[2008,5,21]]}}}