{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,25]],"date-time":"2025-11-25T04:53:36Z","timestamp":1764046416126},"reference-count":70,"publisher":"Springer Science and Business Media LLC","issue":"1","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2013,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Background<\/jats:title>\n            <jats:p>Population stratification is a systematic difference in allele frequencies between subpopulations. This can lead to spurious association findings in the case-control genome wide association studies (GWASs) used to identify single nucleotide polymorphisms (SNPs) associated with disease-linked phenotypes. Methods such as self-declared ancestry, ancestry informative markers, genomic control, structured association, and principal component analysis are used to assess and correct population stratification but each has limitations. We provide an alternative technique to address population stratification.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results<\/jats:title>\n            <jats:p>We propose a novel machine learning method, ETHNOPRED, which uses the genotype and ethnicity data from the HapMap project to learn ensembles of disjoint decision trees, capable of accurately predicting an individual\u2019s continental and sub-continental ancestry. To predict an individual\u2019s continental ancestry, ETHNOPRED produced an ensemble of 3 decision trees involving a total of 10 SNPs, with 10-fold cross validation accuracy of 100% using HapMap II dataset. We extended this model to involve 29 disjoint decision trees over 149 SNPs, and showed that this ensemble has an accuracy of\u2009\u2265\u200999.9%, even if some of those 149 SNP values were missing. 
On an independent dataset, predominantly of Caucasian origin, our continental classifier showed 96.8% accuracy and improved genomic control\u2019s \u03bb from 1.22 to 1.11. We next used the HapMap III dataset to learn classifiers to distinguish European subpopulations (North-Western vs. Southern), East Asian subpopulations (Chinese vs. Japanese), African subpopulations (Eastern vs. Western), North American subpopulations (European vs. Chinese vs. African vs. Mexican vs. Indian), and Kenyan subpopulations (Luhya vs. Maasai). In these cases, ETHNOPRED produced ensembles of 3, 39, 21, 11, and 25 disjoint decision trees, respectively involving 31, 502, 526, 242 and 271 SNPs, with 10-fold cross validation accuracy of 86.5%\u2009\u00b1\u20092.4%, 95.6%\u2009\u00b1\u20093.9%, 95.6%\u2009\u00b1\u20092.1%, 98.3%\u2009\u00b1\u20092.0%, and 95.9%\u2009\u00b1\u20091.5%. However, ETHNOPRED was unable to produce a classifier that can accurately distinguish Chinese in Beijing vs. Chinese in Denver.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Conclusions<\/jats:title>\n            <jats:p>ETHNOPRED is a novel technique for producing classifiers that can identify an individual\u2019s continental and sub-continental heritage, based on a small number of SNPs. 
We show that its learned classifiers are simple, cost-efficient, accurate, transparent, flexible, fast, applicable to large scale GWASs, and robust to missing values.<\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/1471-2105-14-61","type":"journal-article","created":{"date-parts":[[2013,2,22]],"date-time":"2013-02-22T07:14:18Z","timestamp":1361517258000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":11,"title":["ETHNOPRED: a novel machine learning method for accurate continental and sub-continental ancestry identification and population stratification correction"],"prefix":"10.1186","volume":"14","author":[{"given":"Mohsen","family":"Hajiloo","sequence":"first","affiliation":[]},{"given":"Yadav","family":"Sapkota","sequence":"additional","affiliation":[]},{"given":"John R","family":"Mackey","sequence":"additional","affiliation":[]},{"given":"Paula","family":"Robson","sequence":"additional","affiliation":[]},{"given":"Russell","family":"Greiner","sequence":"additional","affiliation":[]},{"given":"Sambasivarao","family":"Damaraju","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2013,2,22]]},"reference":[{"key":"5763_CR1","volume-title":"Human Evolutionary Genetics: Origins, Peoples and Disease","author":"MA Jobling","year":"2004","unstructured":"Jobling MA, Hurles ME, Tyler-Smith C: Human Evolutionary Genetics: Origins, Peoples and Disease. New York: Garland Science; 2004."},{"issue":"1","key":"5763_CR2","doi-asserted-by":"publisher","first-page":"308","DOI":"10.1093\/nar\/29.1.308","volume":"29","author":"ST Sherry","year":"2001","unstructured":"Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K: dbSNP: the NCBI database of genetic variation. Nucleic Acids Res 2001,29(1):308-311. 
10.1093\/nar\/29.1.308","journal-title":"Nucleic Acids Res"},{"issue":"Database Issue","key":"5763_CR3","doi-asserted-by":"publisher","first-page":"D610","DOI":"10.1093\/nar\/gkl996","volume":"35","author":"TJ Hubbard","year":"2007","unstructured":"Hubbard TJ, Aken BL, Beal K, Ballester B, Caccamo M, Chen Y, Clarke L, Coates G, Cunningham F, Cutts T, Down T, Dyer SC, Fitzgerald S, Fernandez-Banet J, Graf S, Haider S, Hammond M, Herrero J, Holland R, Howe K, Johnson N, Kahari A, Keefe D, Kokocinski F, Kulesha E, Lawson D, Longden I, Melsopp C, Megy K, Meidl P: Ensembl 2007. Nucleic Acids Res 2007,35(Database Issue):D610-D617.","journal-title":"Nucleic Acids Res"},{"key":"5763_CR4","doi-asserted-by":"publisher","first-page":"2037","DOI":"10.1126\/science.8091226","volume":"265","author":"ES Lander","year":"1994","unstructured":"Lander ES, Schork NJ: Genetic dissection of complex traits. Science 1994, 265: 2037-2048. 10.1126\/science.8091226","journal-title":"Science"},{"key":"5763_CR5","doi-asserted-by":"publisher","first-page":"95","DOI":"10.1038\/nrg1521","volume":"6","author":"JN Hirschhorn","year":"2005","unstructured":"Hirschhorn JN, Daly MJ: Genome-wide association studies for common diseases and complex traits. Nat Rev Genet 2005, 6: 95-108.","journal-title":"Nat Rev Genet"},{"key":"5763_CR6","doi-asserted-by":"publisher","first-page":"388","DOI":"10.1038\/ng1333","volume":"36","author":"M Freedman","year":"2004","unstructured":"Freedman M: Assessing the impact of population stratification on genetic association studies. Nat Genet 2004, 36: 388-393. 10.1038\/ng1333","journal-title":"Nat Genet"},{"key":"5763_CR7","doi-asserted-by":"publisher","first-page":"512","DOI":"10.1038\/ng1337","volume":"36","author":"J Marchini","year":"2004","unstructured":"Marchini J: The effects of human population structure on large genetic association studies. Nat Genet 2004, 36: 512-517. 
10.1038\/ng1337","journal-title":"Nat Genet"},{"issue":"10","key":"5763_CR8","doi-asserted-by":"publisher","first-page":"1181","DOI":"10.1038\/ng1007-1181","volume":"39","author":"MD Mailman","year":"2007","unstructured":"Mailman MD, Feolo M, Jin Y, Kimura M, Tryka K, Bagoutdinov R, Hao L, Kiang A, Paschall J, Phan L, Popova N, Pretel S, Ziyabari L, Lee M, Shao Y, Wang ZY, Sirotkin K, Ward M, Kholodov M, Zbicz K, Beck J, Kimelman M, Shevelev S, Preuss D, Yaschenko E, Graeff A, Ostell J, Sherry ST: The NCBI dbGaP database of genotypes and phenotypes. Nat Genet 2007,39(10):1181-1186.","journal-title":"Nat Genet"},{"issue":"23","key":"5763_CR9","doi-asserted-by":"publisher","first-page":"9362","DOI":"10.1073\/pnas.0903103106","volume":"106","author":"LA Hindorff","year":"2009","unstructured":"Hindorff LA, Sethupathy P, Junkins HA, Ramos EM, Mehta JP, Collins FS, Manolio TA: Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. P Natl Acad Sci 2009,106(23):9362-9367. 10.1073\/pnas.0903103106","journal-title":"P Natl Acad Sci"},{"key":"5763_CR10","doi-asserted-by":"publisher","first-page":"598","DOI":"10.1016\/S0140-6736(03)12520-2","volume":"361","author":"LR Cardon","year":"2003","unstructured":"Cardon LR, Palmer LJ: Population stratification and spurious allelic association. Lancet 2003, 361: 598-604. 10.1016\/S0140-6736(03)12520-2","journal-title":"Lancet"},{"issue":"3","key":"5763_CR11","doi-asserted-by":"publisher","first-page":"418","DOI":"10.1111\/j.1469-1809.2010.00639.x","volume":"75","author":"C Wu","year":"2011","unstructured":"Wu C, DeWan A, Hoh J, Wang Z: A comparison of association methods correcting for population stratification in case-control studies. Ann Hum Genet 2011,75(3):418-427. 
10.1111\/j.1469-1809.2010.00639.x","journal-title":"Ann Hum Genet"},{"issue":"4 Suppl","key":"5763_CR12","doi-asserted-by":"publisher","first-page":"19","DOI":"10.1177\/1359786806066041","volume":"20","author":"MA Enoch","year":"2006","unstructured":"Enoch MA, Shen PH, Xu K, Hodgkinson C, Goldman D: Using ancestry-informative markers to define populations and detect population stratification. J Psychopharmacol 2006,20(4 Suppl):19-26. 10.1177\/1359786806066041","journal-title":"J Psychopharmacol"},{"issue":"1","key":"5763_CR13","doi-asserted-by":"publisher","first-page":"69","DOI":"10.1002\/humu.20822","volume":"30","author":"R Kosoy","year":"2009","unstructured":"Kosoy R, Nassir R, Tian C, White PA, Butler LM, Silva G, Kittles R, Alarcon-Riquelme ME, Gregersen PK, Belmont JW: Ancestry informative marker sets for determining continental origin and admixture proportions in common populations in America. Hum Mutat 2009,30(1):69-78. 10.1002\/humu.20822","journal-title":"Hum Mutat"},{"key":"5763_CR14","doi-asserted-by":"publisher","first-page":"39","DOI":"10.1186\/1471-2156-10-39","volume":"10","author":"R Nassir","year":"2009","unstructured":"Nassir R, Kosoy R, Tian C, White PA, Butler LM, Silva G, Kittles R, Alarcon-Riquelme ME, Gregersen PK, Belmont JW, De La Vega FM, Seldin MF: An ancestry informative marker set for determining continental origin: validation and extension using human genome diversity panels. BMC Genet 2009, 10: 39.","journal-title":"BMC Genet"},{"issue":"3-4","key":"5763_CR15","doi-asserted-by":"publisher","first-page":"273","DOI":"10.1016\/j.fsigen.2007.06.008","volume":"1","author":"C Phillips","year":"2007","unstructured":"Phillips C, Salas A, Sanchez JJ, Fondevila M, Gomez-Tato A, Alvarez-Dios J, Calaza M, de Cal MC, Ballard D, Lareu MV: Inferring ancestral origin using a single multiplex assay of ancestry-informative marker SNPs. 
Forensic Sci Int Genet 2007,1(3-4):273-280.","journal-title":"Forensic Sci Int Genet"},{"issue":"5","key":"5763_CR16","doi-asserted-by":"publisher","first-page":"648","DOI":"10.1002\/humu.20695","volume":"29","author":"I Halder","year":"2008","unstructured":"Halder I, Shriver M, Thomas M, Fernandez JR, Frudakis T: A panel of ancestry informative markers for estimating individual biogeographical ancestry and admixture from four continents: utility and applications. Hum Mutat 2008,29(5):648-658. 10.1002\/humu.20695","journal-title":"Hum Mutat"},{"issue":"8","key":"5763_CR17","doi-asserted-by":"publisher","first-page":"868","DOI":"10.1038\/ng1607","volume":"37","author":"CD Campbell","year":"2005","unstructured":"Campbell CD, Ogburn EL, Lunetta KL, Lyon HN, Freedman ML, Groop LC, Altshuler D, Ardlie KG, Hirschhorn JN: Demonstrating stratification in a European American population. Nat Genet 2005,37(8):868-872. 10.1038\/ng1607","journal-title":"Nat Genet"},{"issue":"9","key":"5763_CR18","doi-asserted-by":"publisher","first-page":"e143","DOI":"10.1371\/journal.pgen.0020143","volume":"2","author":"MF Seldin","year":"2006","unstructured":"Seldin MF, Shigeta R, Villoslada P, Selmi C, Tuomilehto J, Silva G, Belmont JW, Klareskog L, Gregersen PK: European population substructure: clustering of northern and southern populations. PLoS Genet 2006,2(9):e143. 10.1371\/journal.pgen.0020143","journal-title":"PLoS Genet"},{"issue":"1","key":"5763_CR19","doi-asserted-by":"crossref","first-page":"90","DOI":"10.1038\/ng1492","volume":"37","author":"A Helgason","year":"2005","unstructured":"Helgason A, Yngvadottir B, Hrafnkelsson B, Gulcher J, Stefansson K: An Icelandic example of the impact of population structure on association studies. 
Nat Genet 2005,37(1):90-95.","journal-title":"Nat Genet"},{"issue":"1","key":"5763_CR20","doi-asserted-by":"publisher","first-page":"e5","DOI":"10.1371\/journal.pgen.0040005","volume":"4","author":"MF Seldin","year":"2008","unstructured":"Seldin MF, Price AL: Application of ancestry informative markers to association studies in European Americans. PLoS Genet 2008,4(1):e5. 10.1371\/journal.pgen.0040005","journal-title":"PLoS Genet"},{"issue":"1","key":"5763_CR21","doi-asserted-by":"publisher","first-page":"e4","DOI":"10.1371\/journal.pgen.0040004","volume":"4","author":"C Tian","year":"2008","unstructured":"Tian C, Plenge RM, Ransom M, Lee A, Villoslada P, Selmi C, Klareskog L, Pulver AE, Qi L, Gregersen PK: Analysis and application of European genetic substructure using 300\u00a0K SNP information. PLoS Genet 2008,4(1):e4. 10.1371\/journal.pgen.0040004","journal-title":"PLoS Genet"},{"issue":"12","key":"5763_CR22","doi-asserted-by":"publisher","first-page":"e3862","DOI":"10.1371\/journal.pone.0003862","volume":"3","author":"C Tian","year":"2008","unstructured":"Tian C, Kosoy R, Lee A, Ransom M, Belmont JW, Gregersen PK, Seldin MF: Analysis of East Asia genetic substructure using genome-wide SNP arrays. PLoS One 2008,3(12):e3862. 10.1371\/journal.pone.0003862","journal-title":"PLoS One"},{"key":"5763_CR23","doi-asserted-by":"publisher","first-page":"786","DOI":"10.1073\/pnas.0909559107","volume":"107","author":"K Bryc","year":"2010","unstructured":"Bryc K, Auton A, Nelson MR, Oksenberg JR, Hauser SL, Williams S, Froment A, Bodo JM, Wambebe C, Tishkoff SA, Bustamante CD: Genomewide patterns of population structure and admixture in West Africans and African Americans. PNAS 2010, 107: 786-791. 
10.1073\/pnas.0909559107","journal-title":"PNAS"},{"issue":"6","key":"5763_CR24","doi-asserted-by":"publisher","first-page":"1014","DOI":"10.1086\/513522","volume":"80","author":"C Tian","year":"2007","unstructured":"Tian C, Hinds DA, Shigeta R, Adler SG, Lee A, Pahl MV, Silva G, Belmont JW, Hanson RL, Knowler WC: A genomewide single-nucleotide-polymorphism panel for Mexican American admixture mapping. Am J Hum Genet 2007,80(6):1014-1023. 10.1086\/513522","journal-title":"Am J Hum Genet"},{"issue":"5","key":"5763_CR25","doi-asserted-by":"publisher","first-page":"948","DOI":"10.1086\/513477","volume":"80","author":"M Bauchet","year":"2007","unstructured":"Bauchet M, McEvoy B, Pearson LN, Quillen EE, Sarkisian T, Hovhannesyan K, Deka R, Bradley DG, Shriver MD: Measuring European population stratification with microarray genotype data. Am J Hum Genet 2007,80(5):948-956. 10.1086\/513477","journal-title":"Am J Hum Genet"},{"key":"5763_CR26","doi-asserted-by":"publisher","first-page":"997","DOI":"10.1111\/j.0006-341X.1999.00997.x","volume":"55","author":"B Devlin","year":"1999","unstructured":"Devlin B, Roeder K: Genomic control for association studies. Biometrics 1999, 55: 997-1004. 10.1111\/j.0006-341X.1999.00997.x","journal-title":"Biometrics"},{"key":"5763_CR27","doi-asserted-by":"publisher","first-page":"4","DOI":"10.1002\/1098-2272(200101)20:1<4::AID-GEPI2>3.0.CO;2-T","volume":"20","author":"D Reich","year":"2001","unstructured":"Reich D, Goldstein D: Detecting association in a case-control study while allowing for population stratification. Genet Epidemiol 2001, 20: 4-16. 10.1002\/1098-2272(200101)20:1<4::AID-GEPI2>3.0.CO;2-T","journal-title":"Genet Epidemiol"},{"key":"5763_CR28","doi-asserted-by":"publisher","first-page":"1129","DOI":"10.1038\/ng1104-1129","volume":"36","author":"B Devlin","year":"2004","unstructured":"Devlin B: Genomic control to the extreme. Nat Genet 2004, 36: 1129-1130. 
10.1038\/ng1104-1129","journal-title":"Nat Genet"},{"key":"5763_CR29","doi-asserted-by":"publisher","first-page":"1243","DOI":"10.1038\/ng1653","volume":"37","author":"DG Clayton","year":"2005","unstructured":"Clayton DG: Population structure, differential bias and genomic control in a large-scale, case-control association study. Nat Genet 2005, 37: 1243-1246. 10.1038\/ng1653","journal-title":"Nat Genet"},{"key":"5763_CR30","doi-asserted-by":"publisher","first-page":"170","DOI":"10.1086\/302959","volume":"67","author":"JK Pritchard","year":"2000","unstructured":"Pritchard JK, Stephens M, Rosenberg NA, Donnelly P: Association mapping in structured populations. Am J Hum Genet 2000, 67: 170-181. 10.1086\/302959","journal-title":"Am J Hum Genet"},{"key":"5763_CR31","doi-asserted-by":"publisher","first-page":"466","DOI":"10.1086\/318195","volume":"68","author":"G Satten","year":"2001","unstructured":"Satten G: Accounting for unmeasured population substructure in case-control studies of genetic association using a novel latent-class model. Am J Hum Genet 2001, 68: 466-477. 10.1086\/318195","journal-title":"Am J Hum Genet"},{"key":"5763_CR32","doi-asserted-by":"crossref","first-page":"945","DOI":"10.1093\/genetics\/155.2.945","volume":"155","author":"JK Pritchard","year":"2000","unstructured":"Pritchard JK: Inference of population structure using multilocus genotype data. Genetics 2000, 155: 945-959.","journal-title":"Genetics"},{"key":"5763_CR33","doi-asserted-by":"publisher","first-page":"2381","DOI":"10.1126\/science.1078311","volume":"298","author":"NA Rosenberg","year":"2002","unstructured":"Rosenberg NA: Genetic structure of human populations. Science 2002, 298: 2381-2385. 
10.1126\/science.1078311","journal-title":"Science"},{"key":"5763_CR34","doi-asserted-by":"publisher","first-page":"904","DOI":"10.1038\/ng1847","volume":"38","author":"AL Price","year":"2006","unstructured":"Price AL: Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet 2006, 38: 904-909. 10.1038\/ng1847","journal-title":"Nat Genet"},{"key":"5763_CR35","doi-asserted-by":"publisher","first-page":"e190","DOI":"10.1371\/journal.pgen.0020190","volume":"2","author":"N Patterson","year":"2006","unstructured":"Patterson N, Price AL, Reich D: Population structure and eigenanalysis. PLoS Genet 2006, 2: e190. 10.1371\/journal.pgen.0020190","journal-title":"PLoS Genet"},{"key":"5763_CR36","doi-asserted-by":"publisher","first-page":"646","DOI":"10.1038\/ng.139","volume":"40","author":"J Novembre","year":"2008","unstructured":"Novembre J, Stephens M: Interpreting principal component analyses of spatial population genetic variation. Nat Genet 2008, 40: 646-649. 10.1038\/ng.139","journal-title":"Nat Genet"},{"key":"5763_CR37","doi-asserted-by":"publisher","first-page":"356","DOI":"10.1038\/nrg2344","volume":"9","author":"MI McCarthy","year":"2008","unstructured":"McCarthy MI, Abecasis GR, Cardon LR, Goldstein DB, Little J, Ioannidis JP: Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nat Rev Genet 2008, 9: 356-369. 10.1038\/nrg2344","journal-title":"Nat Rev Genet"},{"key":"5763_CR38","doi-asserted-by":"crossref","unstructured":"Ahn K, Gordon D, Finch SJ: Increase of rejection rate in case-control studies with the differential genotyping error rates. 
Stat Appl Genet Mol Biol 2009,8(1): Article25.","DOI":"10.2202\/1544-6115.1429"},{"issue":"11","key":"5763_CR39","doi-asserted-by":"publisher","first-page":"1243","DOI":"10.1038\/ng1653","volume":"37","author":"DG Clayton","year":"2005","unstructured":"Clayton DG, Walker NM, Smyth DJ, Pask R, Cooper JD, Maier LM, Smink LJ, Lam AC, Ovington NR, Stevens HE: Population structure, differential bias and genomic control in a large-scale, case-control association study. Nat Genet 2005,37(11):1243-1246. 10.1038\/ng1653","journal-title":"Nat Genet"},{"issue":"3-4","key":"5763_CR40","doi-asserted-by":"publisher","first-page":"139","DOI":"10.1159\/000083540","volume":"58","author":"SJ Kang","year":"2004","unstructured":"Kang SJ, Finch SJ, Haynes C, Gordon D: Quantifying the percent increase in minimum sample size for SNP genotyping errors in genetic model-based association studies. Hum Hered 2004,58(3-4):139-144.","journal-title":"Hum Hered"},{"issue":"2","key":"5763_CR41","doi-asserted-by":"publisher","first-page":"102","DOI":"10.1159\/000314470","volume":"70","author":"D Londono","year":"2010","unstructured":"Londono D, Haynes C, De La Vega FM, Finch SJ, Gordon D: A cost-effective statistical method to correct for differential genotype misclassification when performing case-control genetic association. Hum Hered 2010,70(2):102-108. 10.1159\/000314470","journal-title":"Hum Hered"},{"issue":"1","key":"5763_CR42","doi-asserted-by":"publisher","first-page":"55","DOI":"10.1159\/000092553","volume":"61","author":"V Moskvina","year":"2006","unstructured":"Moskvina V, Craddock N, Holmans P, Owen MJ, O'Donovan MC: Effects of differential genotyping error rate on the type I error probability of case-control studies. Hum Hered 2006,61(1):55-64. 
10.1159\/000092553","journal-title":"Hum Hered"},{"issue":"5","key":"5763_CR43","doi-asserted-by":"publisher","first-page":"e74","DOI":"10.1371\/journal.pgen.0030074","volume":"3","author":"V Plagnol","year":"2007","unstructured":"Plagnol V, Cooper JD, Todd JA, Clayton DG: A method to address differential bias in genotyping in large-scale association studies. PLoS Genet 2007,3(5):e74. 10.1371\/journal.pgen.0030074","journal-title":"PLoS Genet"},{"issue":"Pt 2","key":"5763_CR44","doi-asserted-by":"publisher","first-page":"165","DOI":"10.1046\/j.1469-1809.2003.00020.x","volume":"67","author":"KM Rice","year":"2003","unstructured":"Rice KM, Holmans P: Allowing for genotyping error in analysis of unmatched case-control studies. Ann Hum Genet 2003,67(Pt 2):165-174.","journal-title":"Ann Hum Genet"},{"issue":"6","key":"5763_CR45","doi-asserted-by":"publisher","first-page":"e5825","DOI":"10.1371\/journal.pone.0005825","volume":"4","author":"CS Rakovski","year":"2009","unstructured":"Rakovski CS, Stram DO: A kinship-based modification of the armitage trend test to address hidden population structure and small differential genotyping errors. PLoS One 2009,4(6):e5825. 10.1371\/journal.pone.0005825","journal-title":"PLoS One"},{"key":"5763_CR46","doi-asserted-by":"crossref","first-page":"789","DOI":"10.1038\/nature02168","volume":"426","author":"The International HapMap Consortium","year":"2003","unstructured":"The International HapMap Consortium: The International HapMap Project. Nature 2003, 426: 789-796.","journal-title":"Nature"},{"issue":"4","key":"5763_CR47","doi-asserted-by":"publisher","first-page":"529","DOI":"10.1007\/s00439-011-0973-1","volume":"130","author":"B Sehrawat","year":"2011","unstructured":"Sehrawat B, Sridharan M, Ghosh S, Robson P, Cass CE, Mackey J, Greiner R, Damaraju S: Potential novel candidate polymorphisms identified in genome-wide association study for breast cancer susceptibility. Hum Genet 2011,130(4):529-37. 
10.1007\/s00439-011-0973-1","journal-title":"Hum Genet"},{"key":"5763_CR48","doi-asserted-by":"crossref","unstructured":"Pearson K: Mathematical contributions to the theory of evolution. XI. On the influence of natural selection on the variability and correlation of organs. Philos Trans R Soc Lond 1903, Ser A 200(321-330):1-66.","DOI":"10.1098\/rsta.1903.0001"},{"key":"5763_CR49","volume-title":"Machine Learning","author":"T Mitchell","year":"1997","unstructured":"Mitchell T: Machine Learning. New York: McGraw Hill; 1997."},{"key":"5763_CR50","volume-title":"Pattern classification","author":"RO Duda","year":"2001","unstructured":"Duda RO, Hart PE, Stork DG: Pattern classification. 2nd edition. New York: Wiley; 2001.","edition":"2"},{"key":"5763_CR51","doi-asserted-by":"publisher","DOI":"10.1007\/978-0-387-84858-7","volume-title":"The Elements of Statistical Learning: Data Mining, Inference, and Prediction","author":"T Hastie","year":"2009","unstructured":"Hastie T, Tibshirani R, Friedman J: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd edition. New York: Springer; 2009.","edition":"2"},{"key":"5763_CR52","volume-title":"Bioinformatics: The Machine Learning Approach","author":"P Baldi","year":"2001","unstructured":"Baldi P, Brunak S: Bioinformatics: The Machine Learning Approach. 2nd edition. Cambridge, Massachusetts: The MIT Press; 2001.","edition":"2"},{"issue":"1","key":"5763_CR53","doi-asserted-by":"publisher","first-page":"86","DOI":"10.1093\/bib\/bbk007","volume":"7","author":"P Larranaga","year":"2006","unstructured":"Larranaga P, Calvo B, Santana R, Bielza C, Galdiano J, Inza I, Lozano JA, Armananzas R, Santafe G, Perez A, Robles A: Machine learning in bioinformatics. Brief Bioinform 2006,7(1):86-112. 
10.1093\/bib\/bbk007","journal-title":"Brief Bioinform"},{"issue":"6","key":"5763_CR54","doi-asserted-by":"publisher","first-page":"e116","DOI":"10.1371\/journal.pcbi.0030116","volume":"3","author":"AL Tarca","year":"2007","unstructured":"Tarca AL, Carey VJ, Chen XW, Romero R, Draghici S: Machine learning and its applications to biology. PLoS Comput Biol 2007,3(6):e116. 10.1371\/journal.pcbi.0030116","journal-title":"PLoS Comput Biol"},{"key":"5763_CR55","doi-asserted-by":"publisher","first-page":"4103","DOI":"10.1093\/nar\/gkf543","volume":"30","author":"C Math\u00e9","year":"2002","unstructured":"Math\u00e9 C, Sagot M-F, Schiex T, Rouz\u00e9 P: Current methods of gene prediction, their strengths and weaknesses. Nucleic Acids Res 2002, 30: 4103-4117. 10.1093\/nar\/gkf543","journal-title":"Nucleic Acids Res"},{"issue":"18","key":"5763_CR56","doi-asserted-by":"publisher","first-page":"3613","DOI":"10.1093\/bioinformatics\/bth454","volume":"20","author":"K Won","year":"2004","unstructured":"Won K, Prugel-Bennett A, Krogh A: Training HMM structure with genetic algorithm for biological sequence analysis. Bioinformatics 2004,20(18):3613-3619. 10.1093\/bioinformatics\/bth454","journal-title":"Bioinformatics"},{"key":"5763_CR57","doi-asserted-by":"publisher","first-page":"1117","DOI":"10.1006\/jmbi.1993.1464","volume":"232","author":"TM Yi","year":"1993","unstructured":"Yi TM, Lander ES: Protein secondary structure prediction using nearest-neighbor methods. J Mol Biology 1993, 232: 1117-1129. 10.1006\/jmbi.1993.1464","journal-title":"J Mol Biology"},{"issue":"Suppl 1","key":"5763_CR58","doi-asserted-by":"publisher","first-page":"S13","DOI":"10.1186\/1471-2164-9-S1-S13","volume":"9","author":"M Pirooznia","year":"2008","unstructured":"Pirooznia M, Yang JY, Yang MQ, Deng Y: A comparative study of different machine learning methods on microarray gene expression data. BMC Genomics 2008,9(Suppl 1):S13. 
10.1186\/1471-2164-9-S1-S13","journal-title":"BMC Genomics"},{"issue":"Suppl 1","key":"5763_CR59","doi-asserted-by":"publisher","first-page":"I232","DOI":"10.1093\/bioinformatics\/bth923","volume":"20","author":"M Middendorf","year":"2004","unstructured":"Middendorf M, Kundaje A, Wiggins C, Freund Y, Leslie C: Predicting genetic regulatory response using classification. Bioinformatics 2004,20(Suppl 1):I232-I240. 10.1093\/bioinformatics\/bth923","journal-title":"Bioinformatics"},{"issue":"Suppl 1","key":"5763_CR60","doi-asserted-by":"publisher","first-page":"S7","DOI":"10.1186\/1471-2105-6-S1-S7","volume":"6","author":"GD Zhou","year":"2005","unstructured":"Zhou GD, Shen D, Zhang J, Su J, Tan SH: Recognition of protein\/gene names from text using an ensemble of classifiers. BMC Bioinformatics 2005,6(Suppl 1):S7. 10.1186\/1471-2105-6-S1-S7","journal-title":"BMC Bioinformatics"},{"key":"5763_CR61","first-page":"81","volume":"1","author":"JR Quinlan","year":"1986","unstructured":"Quinlan JR: Induction of decision trees. Mach Learn 1986, 1: 81-106.","journal-title":"Mach Learn"},{"key":"5763_CR62","volume-title":"Classification and Regression Trees","author":"L Breiman","year":"1984","unstructured":"Breiman L, Friedman JH, Olshen RA, Stone CJ: Classification and Regression Trees. New York: Chapman & Hall (Wadsworth, Inc.); 1984."},{"key":"5763_CR63","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1007\/3-540-45014-9_1","volume":"1857","author":"TG Dietterich","year":"2000","unstructured":"Dietterich TG: Ensemble methods in machine learning. Lect Notes Comput Sc 2000, 1857: 1-15. 10.1007\/3-540-45014-9_1","journal-title":"Lect Notes Comput Sc"},{"issue":"2","key":"5763_CR64","doi-asserted-by":"publisher","first-page":"181","DOI":"10.1023\/A:1022859003006","volume":"51","author":"LI Kuncheva","year":"2003","unstructured":"Kuncheva LI, Whitaker CJ: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. 
J Mach Learn 2003,51(2):181-207. 10.1023\/A:1022859003006","journal-title":"J Mach Learn"},{"key":"5763_CR65","first-page":"S75","volume":"2","author":"AC Tan","year":"2003","unstructured":"Tan AC, Gilbert D: Ensemble machine learning on gene expression data for cancer classification. Appl Bioinformatics 2003, 2: S75-S83.","journal-title":"Appl Bioinformatics"},{"issue":"6","key":"5763_CR66","doi-asserted-by":"publisher","first-page":"553","DOI":"10.1016\/j.compbiomed.2005.04.001","volume":"36","author":"Y Peng","year":"2006","unstructured":"Peng Y: A novel ensemble machine learning for robust microarray data classification. Comput Biol Med 2006,36(6):553-573. 10.1016\/j.compbiomed.2005.04.001","journal-title":"Comput Biol Med"},{"issue":"3","key":"5763_CR67","doi-asserted-by":"publisher","first-page":"21","DOI":"10.1109\/MCAS.2006.1688199","volume":"6","author":"R Polikar","year":"2006","unstructured":"Polikar R: Ensemble based systems in decision making. IEEE Circuits Syst Mag 2006,6(3):21-45.","journal-title":"IEEE Circuits Syst Mag"},{"issue":"457","key":"5763_CR68","doi-asserted-by":"publisher","first-page":"77","DOI":"10.1198\/016214502753479248","volume":"97","author":"S Dudoit","year":"2002","unstructured":"Dudoit S, Fridlyand J, Speed TP: Comparison of discrimination methods for the classification of tumors using gene expression data. J Am Stat Assoc 2002,97(457):77-87. 10.1198\/016214502753479248","journal-title":"J Am Stat Assoc"},{"issue":"2","key":"5763_CR69","doi-asserted-by":"publisher","first-page":"444","DOI":"10.1016\/j.ajhg.2007.11.004","volume":"82","author":"DY Lin","year":"2008","unstructured":"Lin DY, Hu Y, Huang BE: Simple and efficient analysis of disease association with missing genotype data. Am J Hum Genet 2008,82(2):444-452. 
10.1016\/j.ajhg.2007.11.004","journal-title":"Am J Hum Genet"},{"key":"5763_CR70","doi-asserted-by":"crossref","first-page":"77","DOI":"10.4137\/CIN.S408","volume":"6","author":"AL Boulesteix","year":"2008","unstructured":"Boulesteix AL, Strobl C, Augustin T, Daumer M: Evaluating microarray-based classifiers: an overview. Cancer Informatics 2008, 6: 77-97.","journal-title":"Cancer Informatics"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-14-61.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,1]],"date-time":"2021-09-01T22:50:27Z","timestamp":1630536627000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-14-61"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2013,2,22]]},"references-count":70,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2013,12]]}},"alternative-id":["5763"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-14-61","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2013,2,22]]},"assertion":[{"value":"28 January 2013","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"14 February 2013","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"22 February 2013","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"61"}}