{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,11]],"date-time":"2026-03-11T04:45:24Z","timestamp":1773204324515,"version":"3.50.1"},"reference-count":26,"publisher":"Oxford University Press (OUP)","issue":"6","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2010,3,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: The rapid development of genotyping technology and extensive cataloguing of single nucleotide polymorphisms (SNPs) across the human genome have made genetic association studies the mainstream approach for gene mapping of complex human diseases. For many diseases, the most practical approach is the population-based design with unrelated individuals. Although this design has the advantages of easier sample collection and greater power than family-based designs, unrecognized population stratification in the study samples can lead to both false-positive and false-negative findings and might obscure the true association signals if not appropriately corrected.<\/jats:p>\n               <jats:p>Methods: We report PHYLOSTRAT, a new method that corrects for population stratification by combining phylogeny constructed from SNP genotypes and principal coordinates from multi-dimensional scaling (MDS) analysis. 
This hybrid approach efficiently captures both discrete and admixed population structures.<\/jats:p>\n               <jats:p>Results: Through extensive simulations, the analysis of a synthetic genome-wide association dataset created using data from the Human Genome Diversity Project, and the analysis of a lactase-height dataset, we show that our method can correct for population stratification more efficiently than several existing population stratification correction methods, including EIGENSTRAT, a hybrid approach based on MDS and clustering, and STRATSCORE, in terms of requiring fewer random SNPs for inference of population structure. By combining the flexibility and hierarchical nature of phylogenetic trees with the advantage of representing admixture using MDS, our hybrid approach can effectively capture the complex population structures in human populations.<\/jats:p>\n               <jats:p>Software Availability: Code can be downloaded from http:\/\/people.pcbi.upenn.edu\/\u223clswang\/phylostrat\/<\/jats:p>\n               <jats:p>Contact: \u00a0mingyao@upenn.edu; iswang@upenn.edu.<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btq025","type":"journal-article","created":{"date-parts":[[2010,1,24]],"date-time":"2010-01-24T01:24:07Z","timestamp":1264296247000},"page":"798-806","source":"Crossref","is-referenced-by-count":36,"title":["Correcting population stratification in genetic association studies using a phylogenetic approach"],"prefix":"10.1093","volume":"26","author":[{"given":"Mingyao","family":"Li","sequence":"first","affiliation":[{"name":"1 Department of Biostatistics and Epidemiology, 2 Cardiovascular Institute, 3 Department of Pathology and Laboratory Medicine, 4 Penn Center for Bioinformatics and 5 Institute on Aging, University of Pennsylvania School of Medicine, Philadelphia, PA 19104, USA"}]},{"given":"Muredach 
P.","family":"Reilly","sequence":"additional","affiliation":[{"name":"1 Department of Biostatistics and Epidemiology, 2 Cardiovascular Institute, 3 Department of Pathology and Laboratory Medicine, 4 Penn Center for Bioinformatics and 5 Institute on Aging, University of Pennsylvania School of Medicine, Philadelphia, PA 19104, USA"}]},{"given":"Daniel J.","family":"Rader","sequence":"additional","affiliation":[{"name":"1 Department of Biostatistics and Epidemiology, 2 Cardiovascular Institute, 3 Department of Pathology and Laboratory Medicine, 4 Penn Center for Bioinformatics and 5 Institute on Aging, University of Pennsylvania School of Medicine, Philadelphia, PA 19104, USA"}]},{"given":"Li-San","family":"Wang","sequence":"additional","affiliation":[{"name":"1 Department of Biostatistics and Epidemiology, 2 Cardiovascular Institute, 3 Department of Pathology and Laboratory Medicine, 4 Penn Center for Bioinformatics and 5 Institute on Aging, University of Pennsylvania School of Medicine, Philadelphia, PA 19104, USA"},{"name":"1 Department of Biostatistics and Epidemiology, 2 Cardiovascular Institute, 3 Department of Pathology and Laboratory Medicine, 4 Penn Center for Bioinformatics and 5 Institute on Aging, University of Pennsylvania School of Medicine, Philadelphia, PA 19104, USA"},{"name":"1 Department of Biostatistics and Epidemiology, 2 Cardiovascular Institute, 3 Department of Pathology and Laboratory Medicine, 4 Penn Center for Bioinformatics and 5 Institute on Aging, University of Pennsylvania School of Medicine, Philadelphia, PA 19104, USA"}]}],"member":"286","published-online":{"date-parts":[[2010,1,22]]},"reference":[{"key":"2023012508002544900_B1","doi-asserted-by":"crossref","first-page":"375","DOI":"10.2307\/3001775","article-title":"Tests for linear trends in proportions and 
frequencies","volume":"11","author":"Armitage","year":"1955","journal-title":"Biometrics"},{"key":"2023012508002544900_B2","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1007\/BF01441146","article-title":"A method for quantifying differentiation between populations at multi-allelic loci and its implications for investigating identity and paternity","volume":"96","author":"Balding","year":"1995","journal-title":"Genetica"},{"key":"2023012508002544900_B3","doi-asserted-by":"crossref","first-page":"868","DOI":"10.1038\/ng1607","article-title":"Demonstrating stratification in a European American population","volume":"37","author":"Campbell","year":"2005","journal-title":"Nat. Genet."},{"key":"2023012508002544900_B4","first-page":"233","article-title":"Phylogenetic analysis: models and estimation procedures","volume":"19","author":"Cavalli-Sforza","year":"1967","journal-title":"Am. J. Hum. Genet."},{"key":"2023012508002544900_B5","doi-asserted-by":"crossref","first-page":"687","DOI":"10.1089\/106652702761034136","article-title":"Fast and accurate phylogeny reconstruction algorithms based on the minimum-evolution principle","volume":"19","author":"Desper","year":"2002","journal-title":"J. Comput. Biol."},{"key":"2023012508002544900_B6","doi-asserted-by":"crossref","first-page":"997","DOI":"10.1111\/j.0006-341X.1999.00997.x","article-title":"Genomic control for association studies","volume":"55","author":"Devlin","year":"1999","journal-title":"Biometrics"},{"key":"2023012508002544900_B7","doi-asserted-by":"crossref","first-page":"921","DOI":"10.1086\/516842","article-title":"A simple and improved correction for population stratification in case-control studies","volume":"80","author":"Epstein","year":"2007","journal-title":"Am. J. Hum. 
Genet."},{"key":"2023012508002544900_B8","doi-asserted-by":"crossref","first-page":"783","DOI":"10.2307\/2408678","article-title":"Confidence limits on phylogenies: an approach using the bootstrap","volume":"39","author":"Felsenstein","year":"1985","journal-title":"Evolution"},{"key":"2023012508002544900_B9","first-page":"190","volume-title":"Computers and Intractability: A Guide to the Theory of NP-Completeness.","author":"Garey","year":"1979"},{"key":"2023012508002544900_B10","doi-asserted-by":"crossref","first-page":"463","DOI":"10.1093\/genetics\/139.1.463","article-title":"An evaluation of genetic distances for use with microsatellite loci","volume":"139","author":"Goldstein","year":"1995","journal-title":"Genetics"},{"key":"2023012508002544900_B11","doi-asserted-by":"crossref","first-page":"998","DOI":"10.1038\/nature06742","article-title":"Genotype, haplotype and copy-number variation in worldwide human populations","volume":"451","author":"Jakobsson","year":"2008","journal-title":"Nature"},{"key":"2023012508002544900_B12","doi-asserted-by":"crossref","first-page":"e3583","DOI":"10.1371\/journal.pone.0003583","article-title":"Concept, design and implementation of a cardiovascular gene-centric 50K SNP array for large-scale genomic association studies","volume":"3","author":"Keating","year":"2008","journal-title":"PLoS ONE"},{"key":"2023012508002544900_B13","doi-asserted-by":"crossref","first-page":"895","DOI":"10.1086\/521372","article-title":"A randomization test for controlling population stratification in whole-genome association studies","volume":"81","author":"Kimmel","year":"2007","journal-title":"Am. J. Hum. Genet."},{"key":"2023012508002544900_B14","doi-asserted-by":"crossref","first-page":"215","DOI":"10.1002\/gepi.20296","article-title":"Improved correction for population stratification in genome-wide association studies by identifying hidden population structures","volume":"32","author":"Li","year":"2008","journal-title":"Genet. 
Epid."},{"key":"2023012508002544900_B15","doi-asserted-by":"crossref","first-page":"1100","DOI":"10.1126\/science.1153717","article-title":"Worldwide human relationships inferred from genome-wide patterns of variation","volume":"319","author":"Li","year":"2008","journal-title":"Science"},{"key":"2023012508002544900_B16","doi-asserted-by":"crossref","first-page":"453","DOI":"10.1016\/j.ajhg.2007.11.003","article-title":"On the use of general control samples for genome-wide association studies: genetic matching highlights causal variants","volume":"82","author":"Luca","year":"2008","journal-title":"Am. J. Hum. Genet."},{"key":"2023012508002544900_B17","doi-asserted-by":"crossref","first-page":"512","DOI":"10.1038\/ng1337","article-title":"The effects of human population structure on large genetic association studies","volume":"36","author":"Marchini","year":"2004","journal-title":"Nat. Genet."},{"key":"2023012508002544900_B18","doi-asserted-by":"crossref","first-page":"283","DOI":"10.1086\/282771","article-title":"Genetic distance between populations","volume":"106","author":"Nei","year":"1972","journal-title":"Am. Naturalist"},{"key":"2023012508002544900_B19","doi-asserted-by":"crossref","first-page":"904","DOI":"10.1038\/ng1847","article-title":"Principal components analysis corrects for stratification in genome-wide association studies","volume":"38","author":"Price","year":"2006","journal-title":"Nat. Genet."},{"key":"2023012508002544900_B20","doi-asserted-by":"crossref","first-page":"220","DOI":"10.1086\/302449","article-title":"Use of unlinked genetic markers to detect population stratification in association studies","volume":"65","author":"Pritchard","year":"1999","journal-title":"Am. J. Hum. Genet."},{"key":"2023012508002544900_B21","doi-asserted-by":"crossref","first-page":"170","DOI":"10.1086\/302959","article-title":"Association mapping in structured populations","volume":"67","author":"Pritchard","year":"2000","journal-title":"Am. J. Hum. 
Genet."},{"key":"2023012508002544900_B22","doi-asserted-by":"crossref","first-page":"2381","DOI":"10.1126\/science.1078311","article-title":"Genetic structure of human populations","volume":"298","author":"Rosenberg","year":"2002","journal-title":"Science"},{"key":"2023012508002544900_B23","first-page":"406","article-title":"The neighbor-joining method: a new method for reconstructing phylogenetic trees","volume":"4","author":"Saitou","year":"1987","journal-title":"Mol. Biol. Evol."},{"key":"2023012508002544900_B24","doi-asserted-by":"crossref","first-page":"e1382","DOI":"10.1371\/journal.pone.0001382","article-title":"Correction of population stratification in large multi-ethnic association studies","volume":"1","author":"Serre","year":"2008","journal-title":"PLoS ONE"},{"key":"2023012508002544900_B25","first-page":"729","article-title":"A note on the neighbor-joining algorithm of Saitou and Nei","volume":"5","author":"Studier","year":"1988","journal-title":"Mol. Biol. Evol."},{"key":"2023012508002544900_B26","doi-asserted-by":"crossref","first-page":"411","DOI":"10.1111\/1467-9868.00293","article-title":"Estimating the number of clusters in a data set via the gap statistic","volume":"63","author":"Tibshirani","year":"2001","journal-title":"J. R. Stat. Soc. 
B"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/26\/6\/798\/48853771\/bioinformatics_26_6_798.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/26\/6\/798\/48853771\/bioinformatics_26_6_798.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,25]],"date-time":"2023-01-25T08:01:04Z","timestamp":1674633664000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/26\/6\/798\/244373"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2010,1,22]]},"references-count":26,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2010,3,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btq025","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2010,3,15]]},"published":{"date-parts":[[2010,1,22]]}}}