{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,29]],"date-time":"2025-10-29T13:08:18Z","timestamp":1761743298457},"reference-count":43,"publisher":"Springer Science and Business Media LLC","issue":"1","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2009,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Background<\/jats:title>\n            <jats:p>Non-random patterns of genetic variation exist among individuals in a population owing to a variety of evolutionary factors. Therefore, populations are structured into genetically distinct subpopulations. As genotypic datasets become ever larger, it is increasingly difficult to correctly estimate the number of subpopulations and assign individuals to them. The computationally efficient non-parametric, chiefly Principal Components Analysis (PCA)-based methods are thus becoming increasingly relied upon for population structure analysis. Current PCA-based methods can accurately detect structure; however, the accuracy in resolving subpopulations and assigning individuals to them is wanting. When subpopulations are closely related to one another, they overlap in PCA space and appear as a conglomerate. This problem is exacerbated when some subpopulations in the dataset are genetically far removed from others. We propose a novel PCA-based framework which addresses this shortcoming.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results<\/jats:title>\n            <jats:p>A novel population structure analysis algorithm called iterative pruning PCA (ipPCA) was developed which assigns individuals to subpopulations and infers the total number of subpopulations present. Genotypic data from simulated and real population datasets with different degrees of structure were analyzed. 
For datasets with simple structures, the subpopulation assignments of individuals made by ipPCA were largely consistent with the STRUCTURE, BAPS and AWclust algorithms. On the other hand, highly structured populations containing many closely related subpopulations could be accurately resolved only by ipPCA, and not by other methods.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Conclusion<\/jats:title>\n            <jats:p>The algorithm is computationally efficient and not constrained by the dataset complexity. This systematic subpopulation assignment approach removes the need for prior population labels, which could be advantageous when cryptic stratification is encountered in datasets containing individuals otherwise assumed to belong to a homogenous population.<\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/1471-2105-10-382","type":"journal-article","created":{"date-parts":[[2009,11,24]],"date-time":"2009-11-24T07:15:24Z","timestamp":1259046924000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":29,"title":["Iterative pruning PCA improves resolution of highly structured populations"],"prefix":"10.1186","volume":"10","author":[{"given":"Apichart","family":"Intarapanich","sequence":"first","affiliation":[]},{"given":"Philip 
J","family":"Shaw","sequence":"additional","affiliation":[]},{"given":"Anunchai","family":"Assawamakin","sequence":"additional","affiliation":[]},{"given":"Pongsakorn","family":"Wangkumhang","sequence":"additional","affiliation":[]},{"given":"Chumpol","family":"Ngamphiw","sequence":"additional","affiliation":[]},{"given":"Kridsadakorn","family":"Chaichoompu","sequence":"additional","affiliation":[]},{"given":"Jittima","family":"Piriyapongsa","sequence":"additional","affiliation":[]},{"given":"Sissades","family":"Tongsima","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2009,11,23]]},"reference":[{"issue":"9357","key":"3112_CR1","doi-asserted-by":"publisher","first-page":"598","DOI":"10.1016\/S0140-6736(03)12520-2","volume":"361","author":"LR Cardon","year":"2003","unstructured":"Cardon LR, Palmer LJ: Population stratification and spurious allelic association. Lancet 2003, 361(9357):598\u2013604. 10.1016\/S0140-6736(03)12520-2","journal-title":"Lancet"},{"issue":"2","key":"3112_CR2","doi-asserted-by":"crossref","first-page":"945","DOI":"10.1093\/genetics\/155.2.945","volume":"155","author":"JK Pritchard","year":"2000","unstructured":"Pritchard JK, Stephens M, Donnelly P: Inference of population structure using multilocus genotype data. Genetics 2000, 155(2):945\u2013959.","journal-title":"Genetics"},{"issue":"7063","key":"3112_CR3","doi-asserted-by":"publisher","first-page":"1299","DOI":"10.1038\/nature04226","volume":"437","author":"Consortium IH","year":"2005","unstructured":"Consortium IH: A haplotype map of the human genome. Nature 2005, 437(7063):1299\u20131320. 
10.1038\/nature04226","journal-title":"Nature"},{"issue":"4","key":"3112_CR4","doi-asserted-by":"crossref","first-page":"1567","DOI":"10.1093\/genetics\/164.4.1567","volume":"164","author":"D Falush","year":"2003","unstructured":"Falush D, Stephens M, Pritchard JK: Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies. Genetics 2003, 164(4):1567\u20131587.","journal-title":"Genetics"},{"issue":"2","key":"3112_CR5","doi-asserted-by":"publisher","first-page":"93","DOI":"10.1159\/000083030","volume":"58","author":"S Purcell","year":"2004","unstructured":"Purcell S, Sham P: Properties of structured association approaches to detecting population stratification. Human heredity 2004, 58(2):93\u2013107. 10.1159\/000083030","journal-title":"Human heredity"},{"key":"3112_CR6","doi-asserted-by":"publisher","first-page":"317","DOI":"10.1186\/1471-2105-7-317","volume":"7","author":"B Wu","year":"2006","unstructured":"Wu B, Liu N, Zhao H: PSMIX: an R package for population structure inference via maximum likelihood method. BMC bioinformatics 2006, 7: 317. 10.1186\/1471-2105-7-317","journal-title":"BMC bioinformatics"},{"issue":"4","key":"3112_CR7","doi-asserted-by":"publisher","first-page":"289","DOI":"10.1002\/gepi.20064","volume":"28","author":"H Tang","year":"2005","unstructured":"Tang H, Peng J, Wang P, Risch NJ: Estimation of individual admixture: analytical and study design considerations. Genetic epidemiology 2005, 28(4):289\u2013301. 10.1002\/gepi.20064","journal-title":"Genetic epidemiology"},{"issue":"10","key":"3112_CR8","doi-asserted-by":"publisher","first-page":"2833","DOI":"10.1111\/j.1365-294X.2006.02994.x","volume":"15","author":"J Corander","year":"2006","unstructured":"Corander J, Marttinen P: Bayesian identification of admixture events using multilocus molecular markers. 
Molecular ecology 2006, 15(10):2833\u20132843.","journal-title":"Molecular ecology"},{"key":"3112_CR9","doi-asserted-by":"publisher","first-page":"539","DOI":"10.1186\/1471-2105-9-539","volume":"9","author":"J Corander","year":"2008","unstructured":"Corander J, Marttinen P, Siren J, Tang J: Enhanced Bayesian modelling in BAPS software for learning genetic structures of populations. BMC bioinformatics 2008, 9: 539. 10.1186\/1471-2105-9-539","journal-title":"BMC bioinformatics"},{"issue":"5","key":"3112_CR10","doi-asserted-by":"publisher","first-page":"747","DOI":"10.1111\/j.1471-8286.2007.01769.x","volume":"7","author":"C Chen","year":"2007","unstructured":"Chen C, Durand E, Forbes F, Fran\u00e7ois O: Bayesian clustering algorithms ascertaining spatial population structure: A new computer program and a comparison study. Molecular Ecology Notes 2007, 7(5):747\u2013756. 10.1111\/j.1471-8286.2007.01769.x","journal-title":"Molecular Ecology Notes"},{"issue":"5","key":"3112_CR11","doi-asserted-by":"publisher","first-page":"948","DOI":"10.1086\/513477","volume":"80","author":"M Bauchet","year":"2007","unstructured":"Bauchet M, McEvoy B, Pearson LN, Quillen EE, Sarkisian T, Hovhannesyan K, Deka R, Bradley DG, Shriver MD: Measuring European population stratification with microarray genotype data. American journal of human genetics 2007, 80(5):948\u2013956. 10.1086\/513477","journal-title":"American journal of human genetics"},{"key":"3112_CR12","doi-asserted-by":"crossref","unstructured":"Reeves PA, Richards CM: Accurate Inference of Subtle Population Structure (and Other Genetic Discontinuities) Using Principal Coordinates. PLoS ONE 2009., 4(1): 10.1371\/journal.pone.0004269","DOI":"10.1371\/journal.pone.0004269"},{"issue":"12","key":"3112_CR13","doi-asserted-by":"publisher","first-page":"e190","DOI":"10.1371\/journal.pgen.0020190","volume":"2","author":"N Patterson","year":"2006","unstructured":"Patterson N, Price AL, Reich D: Population structure and eigenanalysis. 
PLoS genetics 2006, 2(12):e190. 10.1371\/journal.pgen.0020190","journal-title":"PLoS genetics"},{"issue":"8","key":"3112_CR14","doi-asserted-by":"publisher","first-page":"904","DOI":"10.1038\/ng1847","volume":"38","author":"AL Price","year":"2006","unstructured":"Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D: Principal components analysis corrects for stratification in genome-wide association studies. Nature genetics 2006, 38(8):904\u2013909. 10.1038\/ng1847","journal-title":"Nature genetics"},{"issue":"5","key":"3112_CR15","doi-asserted-by":"publisher","first-page":"e1000074","DOI":"10.1371\/journal.pgen.1000074","volume":"4","author":"J Han","year":"2008","unstructured":"Han J, Kraft P, Nan H, Guo Q, Chen C, Qureshi A, Hankinson SE, Hu FB, Duffy DL, Zhao ZZ, et al.: A genome-wide association study identifies novel alleles associated with hair color and skin pigmentation. PLoS genetics 2008, 4(5):e1000074. 10.1371\/journal.pgen.1000074","journal-title":"PLoS genetics"},{"issue":"3","key":"3112_CR16","doi-asserted-by":"publisher","first-page":"e1000041","DOI":"10.1371\/journal.pgen.1000041","volume":"4","author":"Y Liu","year":"2008","unstructured":"Liu Y, Helms C, Liao W, Zaba LC, Duan S, Gardner J, Wise C, Miner A, Malloy MJ, Pullinger CR, et al.: A genome-wide association study of psoriasis and psoriatic arthritis identifies new disease loci. PLoS genetics 2008, 4(3):e1000041. 10.1371\/journal.pgen.1000041","journal-title":"PLoS genetics"},{"issue":"6","key":"3112_CR17","doi-asserted-by":"publisher","first-page":"1119","DOI":"10.1086\/522235","volume":"81","author":"RP Stokowski","year":"2007","unstructured":"Stokowski RP, Pant PV, Dadd T, Fereday A, Hinds DA, Jarman C, Filsell W, Ginger RS, Green MR, Ouderaa FJ, et al.: A genomewide association study of skin pigmentation in a South Asian population. American journal of human genetics 2007, 81(6):1119\u20131132. 
10.1086\/522235","journal-title":"American journal of human genetics"},{"issue":"1","key":"3112_CR18","doi-asserted-by":"publisher","first-page":"15","DOI":"10.1145\/1007730.1007731","volume":"6","author":"L Parsons","year":"2004","unstructured":"Parsons L, Haque E, Liu H: Subspace Clustering for high dimensional data: A review. Sigkdd Explorations 2004, 6(1):15. 10.1145\/1007730.1007731","journal-title":"Sigkdd Explorations"},{"key":"3112_CR19","doi-asserted-by":"publisher","first-page":"77","DOI":"10.1186\/1471-2105-9-77","volume":"9","author":"X Gao","year":"2008","unstructured":"Gao X, Starmer JD: AWclust: point-and-click software for non-parametric population structure analysis. BMC bioinformatics 2008, 9: 77. 10.1186\/1471-2105-9-77","journal-title":"BMC bioinformatics"},{"issue":"Suppl 1","key":"3112_CR20","doi-asserted-by":"publisher","first-page":"S73","DOI":"10.1186\/1471-2105-10-S1-S73","volume":"10","author":"C Lee","year":"2009","unstructured":"Lee C, Abdool A, Huang CH: PCA-based population structure inference with generic clustering algorithms. BMC bioinformatics 2009, 10(Suppl 1):S73. 10.1186\/1471-2105-10-S1-S73","journal-title":"BMC bioinformatics"},{"issue":"6","key":"3112_CR21","doi-asserted-by":"publisher","first-page":"353","DOI":"10.1186\/1479-7364-2-6-353","volume":"2","author":"N Liu","year":"2006","unstructured":"Liu N, Zhao H: A non-parametric approach to population structure inference using multilocus genotypes. Human genomics 2006, 2(6):353\u2013364.","journal-title":"Human genomics"},{"issue":"2","key":"3112_CR22","doi-asserted-by":"publisher","first-page":"94","DOI":"10.1145\/276305.276314","volume":"27","author":"R Agrawal","year":"1998","unstructured":"Agrawal R, Gehrke J, Gunopulos D, Raghavan P: Automatic Subspace Clustering of High Dimensional Data for data mining applications. 
SIGMOD Record ACM Special Interest Group on Management of Data 1998, 27(2):94\u2013105.","journal-title":"SIGMOD Record ACM Special Interest Group on Management of Data"},{"key":"3112_CR23","volume-title":"matrix computations","author":"GH Golub","year":"1996","unstructured":"Golub GH, Van Loan FC: matrix computations. 3rd edition. Baltimore: The Johns Hopkins University Press; 1996.","edition":"3"},{"issue":"1","key":"3112_CR24","doi-asserted-by":"publisher","first-page":"e4","DOI":"10.1371\/journal.pgen.0040004","volume":"4","author":"C Tian","year":"2008","unstructured":"Tian C, Plenge RM, Ransom M, Lee A, Villoslada P, Selmi C, Klareskog L, Pulver AE, Qi L, Gregersen PK, et al.: Analysis and application of European genetic substructure using 300 K SNP information. PLoS genetics 2008, 4(1):e4. 10.1371\/journal.pgen.0040004","journal-title":"PLoS genetics"},{"issue":"2","key":"3112_CR25","doi-asserted-by":"publisher","first-page":"453","DOI":"10.1016\/j.ajhg.2007.11.003","volume":"82","author":"D Luca","year":"2008","unstructured":"Luca D, Ringquist S, Klei L, Lee AB, Gieger C, Wichmann HE, Schreiber S, Krawczak M, Lu Y, Styche A, et al.: On the use of general control samples for genome-wide association studies: genetic matching highlights causal variants. American journal of human genetics 2008, 82(2):453\u2013463. 10.1016\/j.ajhg.2007.11.003","journal-title":"American journal of human genetics"},{"key":"3112_CR26","doi-asserted-by":"publisher","first-page":"411","DOI":"10.1111\/1467-9868.00293","volume":"63","author":"RWG Tibshirani","year":"2001","unstructured":"Tibshirani RWG, Hastie T: Estimating the number of clusters in a dataset via the gap statistic. Journal Royal Statistical Soc B 2001, 63: 411\u2013423. 
10.1111\/1467-9868.00293","journal-title":"Journal Royal Statistical Soc B"},{"key":"3112_CR27","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4757-0450-1","volume-title":"Pattern Recognition with Fuzzy Objective Function Algorithms","author":"JC Bezdec","year":"1981","unstructured":"Bezdec JC: Pattern Recognition with Fuzzy Objective Function Algorithms. New York: Plenum Press; 1981."},{"key":"3112_CR28","unstructured":"Download Structure 2.2[http:\/\/pritch.bsd.uchicago.edu\/software\/structure2_2.html]"},{"key":"3112_CR29","unstructured":"Installing BAPS to XP\/Windows 2000 systems[http:\/\/web.abo.fi\/fak\/mnf\/mate\/jc\/software\/baps_xp.html]"},{"key":"3112_CR30","unstructured":"AWclust[http:\/\/awclust.sourceforge.net\/]"},{"issue":"12","key":"3112_CR31","doi-asserted-by":"publisher","first-page":"1565","DOI":"10.1093\/bioinformatics\/btm138","volume":"23","author":"L Liang","year":"2007","unstructured":"Liang L, Zollner S, Abecasis GR: GENOME: a rapid coalescent-based whole genome simulator. Bioinformatics (Oxford, England) 2007, 23(12):1565\u20131567. 10.1093\/bioinformatics\/btm138","journal-title":"Bioinformatics (Oxford, England)"},{"key":"3112_CR32","volume-title":"Mathematical Population Genetics","author":"WJ Ewens","year":"1979","unstructured":"Ewens WJ: Mathematical Population Genetics. 
Berlin: Springer; 1979."},{"key":"3112_CR33","unstructured":"International HapMap Project[http:\/\/hapmap.org]"},{"key":"3112_CR34","unstructured":"FTP site for downloading bovine SNPs[ftp:\/\/ftp.hgsc.bcm.tmc.edu\/pub\/data\/Btaurus\/snp\/Btau20040927]"},{"key":"3112_CR35","unstructured":"Bovine Genome Project[http:\/\/www.hgsc.bcm.tmc.edu\/projects\/bovine\/index.html]"},{"issue":"2","key":"3112_CR36","doi-asserted-by":"crossref","first-page":"81","DOI":"10.1186\/1479-7364-2-2-81","volume":"2","author":"MD Shriver","year":"2005","unstructured":"Shriver MD, Mei R, Parra EJ, Sonpar V, Halder I, Tishkoff SA, Schurr TG, Zhadanov SI, Osipova LP, Brutsaert TD, et al.: Large-scale SNP analysis reveals clustered and continuous patterns of human genetic variation. Human genomics 2005, 2(2):81\u201389.","journal-title":"Human genomics"},{"key":"3112_CR37","unstructured":"Breeds of Livestock, Cattle: (Bos)[http:\/\/www.ansi.okstate.edu\/breeds\/cattle\/]"},{"issue":"5","key":"3112_CR38","doi-asserted-by":"publisher","first-page":"491","DOI":"10.1038\/ng0508-491","volume":"40","author":"D Reich","year":"2008","unstructured":"Reich D, Price AL, Patterson N: Principal component analysis of genetic data. Nature genetics 2008, 40(5):491\u2013492. 10.1038\/ng0508-491","journal-title":"Nature genetics"},{"issue":"9","key":"3112_CR39","doi-asserted-by":"publisher","first-page":"1672","DOI":"10.1371\/journal.pgen.0030160","volume":"3","author":"P Paschou","year":"2007","unstructured":"Paschou P, Ziv E, Burchard EG, Choudhry S, Rodriguez-Cintron W, Mahoney MW, Drineas P: PCA-correlated SNPs for structure identification in worldwide human populations. PLoS genetics 2007, 3(9):1672\u20131686. 10.1371\/journal.pgen.0030160","journal-title":"PLoS genetics"},{"issue":"6","key":"3112_CR40","doi-asserted-by":"publisher","first-page":"1419","DOI":"10.1111\/j.1365-294X.2006.02890.x","volume":"15","author":"RS Waples","year":"2006","unstructured":"Waples RS, Gaggiotti O: What is a population? 
An empirical evaluation of some genetic methods for identifying the number of gene pools and their degree of connectivity. Molecular ecology 2006, 15(6):1419\u20131439. 10.1111\/j.1365-294X.2006.02890.x","journal-title":"Molecular ecology"},{"issue":"5866","key":"3112_CR41","doi-asserted-by":"publisher","first-page":"1100","DOI":"10.1126\/science.1153717","volume":"319","author":"JZ Li","year":"2008","unstructured":"Li JZ, Absher DM, Tang H, Southwick AM, Casto AM, Ramachandran S, Cann HM, Barsh GS, Feldman M, Cavalli-Sforza LL, et al.: Worldwide human relationships inferred from genome-wide patterns of variation. Science (New York, NY) 2008, 319(5866):1100\u20131104.","journal-title":"Science (New York, NY)"},{"key":"3112_CR42","volume-title":"Data Clustering: Theory, Algorithms, and Applications","author":"CM Guojun Gan","year":"2007","unstructured":"Guojun Gan CM, Jianhong Wu: Data Clustering: Theory, Algorithms, and Applications. SIAM (Society for Industrial and Applied Mathematics), Philadelphia; 2007."},{"issue":"3","key":"3112_CR43","doi-asserted-by":"publisher","first-page":"626","DOI":"10.1086\/520769","volume":"81","author":"H Tang","year":"2007","unstructured":"Tang H, Choudhry S, Mei R, Morgan M, Rodriguez-Cintron W, Burchard EG, Risch NJ: Recent genetic selection in the ancestral admixture of Puerto Ricans. American journal of human genetics 2007, 81(3):626\u2013633. 
10.1086\/520769","journal-title":"American journal of human genetics"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-10-382.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,8,31]],"date-time":"2021-08-31T21:34:16Z","timestamp":1630445656000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-10-382"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2009,11,23]]},"references-count":43,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2009,12]]}},"alternative-id":["3112"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-10-382","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2009,11,23]]},"assertion":[{"value":"30 April 2009","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"23 November 2009","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"23 November 2009","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"382"}}