{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,21]],"date-time":"2025-02-21T10:52:54Z","timestamp":1740135174186,"version":"3.37.3"},"reference-count":16,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2023,5,2]],"date-time":"2023-05-02T00:00:00Z","timestamp":1682985600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,5,2]],"date-time":"2023-05-02T00:00:00Z","timestamp":1682985600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["R01 AG076901"],"award-info":[{"award-number":["R01 AG076901"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"abstract":"<jats:title>Abstract<\/jats:title><jats:sec>\n                <jats:title>Background<\/jats:title>\n                <jats:p>Large-scale multi-ethnic DNA sequencing data is increasingly available owing to decreasing cost of modern sequencing technologies. Inference of the population structure with such sequencing data is fundamentally important. However, the ultra-dimensionality and complicated linkage disequilibrium patterns across the whole genome make it challenging to infer population structure using traditional principal component analysis based methods and software.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Results<\/jats:title>\n                <jats:p>We present the ERStruct Python Package, which enables the inference of population structure using whole-genome sequencing data. By leveraging parallel computing and GPU acceleration, our package achieves significant improvements in the speed of matrix operations for large-scale data. Additionally, our package features adaptive data splitting capabilities to facilitate computation on GPUs with limited memory.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Conclusion<\/jats:title>\n                <jats:p>Our Python package ERStruct is an efficient and user-friendly tool for estimating the number of top informative principal components that capture population structure from whole genome sequencing data.<\/jats:p>\n              <\/jats:sec>","DOI":"10.1186\/s12859-023-05305-0","type":"journal-article","created":{"date-parts":[[2023,5,2]],"date-time":"2023-05-02T15:03:04Z","timestamp":1683039784000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["ERStruct: a fast Python package for inferring the number of top principal components from whole genome sequencing data"],"prefix":"10.1186","volume":"24","author":[{"given":"Jinghan","family":"Yang","sequence":"first","affiliation":[]},{"given":"Yuyang","family":"Xu","sequence":"additional","affiliation":[]},{"given":"Minhao","family":"Yao","sequence":"additional","affiliation":[]},{"given":"Gao","family":"Wang","sequence":"additional","affiliation":[]},{"given":"Zhonghua","family":"Liu","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,5,2]]},"reference":[{"issue":"8","key":"5305_CR1","doi-asserted-by":"publisher","first-page":"904","DOI":"10.1038\/ng1847","volume":"38","author":"AL Price","year":"2006","unstructured":"Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet. 2006;38(8):904.","journal-title":"Nat Genet"},{"issue":"3","key":"5305_CR2","doi-asserted-by":"publisher","first-page":"243","DOI":"10.1038\/ng.1074","volume":"44","author":"I Mathieson","year":"2012","unstructured":"Mathieson I, McVean G. Differential confounding of rare and common variants in spatially structured populations. Nat Genet. 2012;44(3):243\u20136.","journal-title":"Nat Genet"},{"issue":"4","key":"5305_CR3","doi-asserted-by":"publisher","first-page":"409","DOI":"10.1038\/ng.2924","volume":"46","author":"C Wang","year":"2014","unstructured":"Wang C, Zhan X, Bragg-Gresham J, Kang HM, Stambolian D, Chew EY, et al. Ancestry estimation and control of population stratification for sequence-based association studies. Nat Genet. 2014;46(4):409\u201315.","journal-title":"Nat Genet"},{"key":"5305_CR4","doi-asserted-by":"publisher","first-page":"786","DOI":"10.1126\/science.356262","volume":"201","author":"P Menozzi","year":"1978","unstructured":"Menozzi P, Piazza A, Cavalli-Sforza L. Synthetic maps of human gene frequencies in Europeans. Science. 1978;201:786\u201392.","journal-title":"Science"},{"key":"5305_CR5","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pgen.0020190","volume":"2","author":"N Patterson","year":"2006","unstructured":"Patterson N, Price AL, Reich D. Population structure and eigenanalysis. PLoS Genet. 2006;2: e190.","journal-title":"PLoS Genet"},{"key":"5305_CR6","doi-asserted-by":"publisher","first-page":"491","DOI":"10.1038\/ng0508-491","volume":"40","author":"D Reich","year":"2008","unstructured":"Reich D, Price AL, Patterson N. Principal component analysis of genetic data. Nat Genet. 2008;40:491\u20132.","journal-title":"Nat Genet"},{"issue":"2","key":"5305_CR7","doi-asserted-by":"publisher","first-page":"295","DOI":"10.1214\/aos\/1009210544","volume":"29","author":"IM Johnstone","year":"2001","unstructured":"Johnstone IM. On the distribution of the largest eigenvalue in principal components analysis. Ann Stat. 2001;29(2):295\u2013327.","journal-title":"Ann Stat"},{"key":"5305_CR8","doi-asserted-by":"publisher","DOI":"10.1111\/biom.13691","author":"Y Xu","year":"2022","unstructured":"Xu Y, Liu Z, Yao J. An eigenvalue ratio approach to inferring population structure from whole genome sequencing data. Biometrics. 2022. https:\/\/doi.org\/10.1111\/biom.13691.","journal-title":"Biometrics"},{"key":"5305_CR9","doi-asserted-by":"publisher","first-page":"52","DOI":"10.1038\/nature09298","volume":"467","author":"The International HapMap 3 Consortium","year":"2010","unstructured":"The International HapMap 3 Consortium. Integrating common and rare genetic variation in diverse human populations. Nature. 2010;467:52\u20138.","journal-title":"Nature"},{"issue":"7571","key":"5305_CR10","doi-asserted-by":"publisher","first-page":"68","DOI":"10.1038\/nature15393","volume":"526","author":"The 1000 Genomes Project Consortium","year":"2015","unstructured":"The 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature. 2015;526(7571):68\u201374.","journal-title":"Nature"},{"issue":"1","key":"5305_CR11","doi-asserted-by":"publisher","first-page":"494","DOI":"10.1016\/j.aim.2011.02.007","volume":"227","author":"F Benaych-Georges","year":"2011","unstructured":"Benaych-Georges F, Nadakuditi RR. The eigenvalues and eigenvectors of finite, low rank perturbations of large random matrices. Adv Math. 2011;227(1):494\u2013521.","journal-title":"Adv Math"},{"issue":"60","key":"5305_CR12","first-page":"1621","volume":"16","author":"F Benaych-Georges","year":"2011","unstructured":"Benaych-Georges F, Guionnet A, Maida M. Fluctuations of the extreme eigenvalues of finite rank deformations of random matrices. Electron J Probab. 2011;16(60):1621\u201362.","journal-title":"Electron J Probab"},{"issue":"1","key":"5305_CR13","doi-asserted-by":"publisher","first-page":"257","DOI":"10.1214\/16-AOS1452","volume":"45","author":"Z Li","year":"2017","unstructured":"Li Z, Wang Q, Yao J. Identifying the number of factors from singular values of a large sample auto-covariance matrix. Ann Stat. 2017;45(1):257\u201388.","journal-title":"Ann Stat"},{"issue":"2","key":"5305_CR14","doi-asserted-by":"publisher","first-page":"325","DOI":"10.2307\/1970008","volume":"67","author":"EP Wigner","year":"1958","unstructured":"Wigner EP. On the distribution of the roots of certain symmetric matrices. Ann Math. 1958;67(2):325\u20137.","journal-title":"Ann Math"},{"issue":"3","key":"5305_CR15","first-page":"191","volume":"19","author":"L Arnold","year":"1971","unstructured":"Arnold L. On Wigner\u2019s semicircle law for the eigenvalues of random matrices. Probab Theory Relat Fields. 1971;19(3):191\u20138.","journal-title":"Probab Theory Relat Fields"},{"key":"5305_CR16","doi-asserted-by":"publisher","first-page":"25","DOI":"10.1016\/j.jmva.2013.12.015","volume":"126","author":"L Wang","year":"2014","unstructured":"Wang L, Paul D. Limiting spectral distribution of renormalized separable sample covariance matrices when p\/n$$\\rightarrow$$0. J Multivar Anal. 2014;126:25\u201352.","journal-title":"J Multivar Anal"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-023-05305-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s12859-023-05305-0\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-023-05305-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,5,3]],"date-time":"2023-05-03T03:25:04Z","timestamp":1683084304000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/s12859-023-05305-0"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,5,2]]},"references-count":16,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2023,12]]}},"alternative-id":["5305"],"URL":"https:\/\/doi.org\/10.1186\/s12859-023-05305-0","relation":{},"ISSN":["1471-2105"],"issn-type":[{"type":"electronic","value":"1471-2105"}],"subject":[],"published":{"date-parts":[[2023,5,2]]},"assertion":[{"value":"7 November 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"25 April 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"2 May 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare that they have no competing interests.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"180"}}