{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,28]],"date-time":"2026-03-28T00:27:29Z","timestamp":1774657649686,"version":"3.50.1"},"update-to":[{"DOI":"10.1371\/journal.pcbi.1010301","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2022,9,7]],"date-time":"2022-09-07T00:00:00Z","timestamp":1662508800000}}],"reference-count":17,"publisher":"Public Library of Science (PLoS)","issue":"8","license":[{"start":{"date-parts":[[2022,8,25]],"date-time":"2022-08-25T00:00:00Z","timestamp":1661385600000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100014989","name":"Chan Zuckerberg Initiative","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100014989","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Royal Academy of Engineering Leaders Scholarship"}],"content-domain":{"domain":["www.ploscompbiol.org"],"crossmark-restriction":false},"short-container-title":["PLoS Comput Biol"],"abstract":"<jats:p>The estimation of genetic clusters using genomic data has application from genome-wide association studies (GWAS) to demographic history to polygenic risk scores (PRS) and is expected to play an important role in the analyses of increasingly diverse, large-scale cohorts. However, existing methods are computationally-intensive, prohibitively so in the case of nationwide biobanks. Here we explore Archetypal Analysis as an efficient, unsupervised approach for identifying genetic clusters and for associating individuals with them. Such unsupervised approaches help avoid conflating socially constructed ethnic labels with genetic clusters by eliminating the need for exogenous training labels. We show that Archetypal Analysis yields similar cluster structure to existing unsupervised methods such as ADMIXTURE and provides interpretative advantages. More importantly, we show that since Archetypal Analysis can be used with lower-dimensional representations of genetic data, significant reductions in computational time and memory requirements are possible. When Archetypal Analysis is run in such a fashion, it takes several orders of magnitude less compute time than the current standard, ADMIXTURE. Finally, we demonstrate uses ranging across datasets from humans to canids.<\/jats:p>","DOI":"10.1371\/journal.pcbi.1010301","type":"journal-article","created":{"date-parts":[[2022,8,25]],"date-time":"2022-08-25T13:58:09Z","timestamp":1661435889000},"page":"e1010301","update-policy":"https:\/\/doi.org\/10.1371\/journal.pcbi.corrections_policy","source":"Crossref","is-referenced-by-count":16,"title":["Archetypal Analysis for population genetics"],"prefix":"10.1371","volume":"18","author":[{"given":"Julia","family":"Gimbernat-Mayol","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6224-0750","authenticated-orcid":true,"given":"Albert","family":"Dominguez Mantes","sequence":"additional","affiliation":[]},{"given":"Carlos D.","family":"Bustamante","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7946-7724","authenticated-orcid":true,"given":"Daniel","family":"Mas Montserrat","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4735-7803","authenticated-orcid":true,"given":"Alexander G.","family":"Ioannidis","sequence":"additional","affiliation":[]}],"member":"340","published-online":{"date-parts":[[2022,8,25]]},"reference":[{"issue":"2","key":"pcbi.1010301.ref001","doi-asserted-by":"crossref","first-page":"945","DOI":"10.1093\/genetics\/155.2.945","article-title":"Inference of population structure using multilocus genotype data","volume":"155","author":"JK Pritchard","year":"2000","journal-title":"Genetics"},{"issue":"4","key":"pcbi.1010301.ref002","doi-asserted-by":"crossref","first-page":"289","DOI":"10.1002\/gepi.20064","article-title":"Estimation of individual admixture: analytical and study design considerations","volume":"28","author":"H Tang","year":"2005","journal-title":"Genetic epidemiology"},{"key":"pcbi.1010301.ref003","doi-asserted-by":"crossref","DOI":"10.1186\/1471-2105-12-246","article-title":"Enhancements to the ADMIXTURE algorithm for individual ancestry estimation","volume":"12","author":"DH Alexander","year":"2011","journal-title":"BMC Bioinformatics"},{"key":"pcbi.1010301.ref004","doi-asserted-by":"crossref","DOI":"10.1038\/ng0508-491","article-title":"Principal component analysis of genetic data","volume":"40","author":"D Reich","year":"2008","journal-title":"Nature Genetics"},{"issue":"11","key":"pcbi.1010301.ref005","doi-asserted-by":"crossref","first-page":"e1008432","DOI":"10.1371\/journal.pgen.1008432","article-title":"UMAP reveals cryptic population structure and phenotype heterogeneity in large genomic cohorts","volume":"15","author":"A Diaz-Papkovich","year":"2019","journal-title":"PLoS genetics"},{"key":"pcbi.1010301.ref006","doi-asserted-by":"crossref","DOI":"10.1371\/journal.pgen.1000686","article-title":"A genealogical interpretation of principal components analysis","volume":"5","author":"G McVean","year":"2009","journal-title":"PLoS Genetics"},{"issue":"7218","key":"pcbi.1010301.ref007","doi-asserted-by":"crossref","first-page":"98","DOI":"10.1038\/nature07331","article-title":"Genes mirror geography within Europe","volume":"456","author":"J Novembre","year":"2008","journal-title":"Nature"},{"key":"pcbi.1010301.ref008","doi-asserted-by":"crossref","DOI":"10.1080\/00401706.1994.10485840","article-title":"Archetypal analysis","volume":"36","author":"A Cutler","year":"1994","journal-title":"Technometrics"},{"key":"pcbi.1010301.ref009","unstructured":"Motevalli Soumehsaraei A Benyamin; Barnard. Archetypal Analysis Package. v1. CSIRO. Software Collection.; 2019."},{"key":"pcbi.1010301.ref010","article-title":"From Spider-Man to Hero\u2014Archetypal Analysis in R","volume":"30","author":"MJA Eugster","year":"2009","journal-title":"Journal of Statistical Software"},{"key":"pcbi.1010301.ref011","doi-asserted-by":"crossref","first-page":"54","DOI":"10.1016\/j.neucom.2011.06.033","article-title":"Archetypal analysis for machine learning and data mining","volume":"80","author":"M M\u00f8rup","year":"2012","journal-title":"Neurocomputing"},{"key":"pcbi.1010301.ref012","doi-asserted-by":"crossref","DOI":"10.1126\/science.aay5012","article-title":"Insights into human genetic variation and population history from 929 diverse genomes","volume":"367","author":"A Bergstr\u00f6m","year":"2020","journal-title":"Science"},{"issue":"7624","key":"pcbi.1010301.ref013","doi-asserted-by":"crossref","first-page":"201","DOI":"10.1038\/nature18964","article-title":"The Simons genome diversity project: 300 genomes from 142 diverse populations","volume":"538","author":"S Mallick","year":"2016","journal-title":"Nature"},{"issue":"7571","key":"pcbi.1010301.ref014","doi-asserted-by":"crossref","first-page":"68","DOI":"10.1038\/nature15393","article-title":"A global reference for human genetic variation","volume":"526","author":"GP Consortium","year":"2015","journal-title":"Nature"},{"key":"pcbi.1010301.ref015","doi-asserted-by":"crossref","DOI":"10.1016\/j.celrep.2017.03.079","article-title":"Genomic Analyses Reveal the Influence of Geographic Origin, Migration, and Hybridization on Modern Dog Breed Development","volume":"19","author":"HG Parker","year":"2017","journal-title":"Cell Reports"},{"issue":"18","key":"pcbi.1010301.ref016","doi-asserted-by":"crossref","first-page":"2817","DOI":"10.1093\/bioinformatics\/btw327","article-title":"pong: fast analysis and visualization of latent clusters in population genetic data","volume":"32","author":"AA Behr","year":"2016","journal-title":"Bioinformatics"},{"key":"pcbi.1010301.ref017","doi-asserted-by":"crossref","DOI":"10.1038\/nature08365","article-title":"Reconstructing Indian population history","volume":"461","author":"D Reich","year":"2009","journal-title":"Nature"}],"updated-by":[{"DOI":"10.1371\/journal.pcbi.1010301","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2022,9,7]],"date-time":"2022-09-07T00:00:00Z","timestamp":1662508800000}}],"container-title":["PLOS Computational Biology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1010301","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,9,7]],"date-time":"2022-09-07T13:37:07Z","timestamp":1662557827000},"score":1,"resource":{"primary":{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1010301"}},"subtitle":[],"editor":[{"given":"Heather E.","family":"Wheeler","sequence":"first","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2022,8,25]]},"references-count":17,"journal-issue":{"issue":"8","published-online":{"date-parts":[[2022,8,25]]}},"URL":"https:\/\/doi.org\/10.1371\/journal.pcbi.1010301","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2021.11.28.470296","asserted-by":"object"}]},"ISSN":["1553-7358"],"issn-type":[{"value":"1553-7358","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,8,25]]}}}