{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,22]],"date-time":"2025-02-22T00:45:13Z","timestamp":1740185113716,"version":"3.37.3"},"reference-count":35,"publisher":"Oxford University Press (OUP)","issue":"16","license":[{"start":{"date-parts":[[2018,12,24]],"date-time":"2018-12-24T00:00:00Z","timestamp":1545609600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/100000002","name":"NIH","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Science Foundation of China","doi-asserted-by":"publisher","award":["NSFC 11671375","11271346"],"award-info":[{"award-number":["NSFC 11671375","11271346"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2019,8,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Estimating haplotype frequencies from genotype data plays an important role in genetic analysis. In silico methods are usually computationally involved since phase information is not available. Due to tight linkage disequilibrium and low recombination rates, the number of haplotypes observed in human populations is far less than all the possibilities. This motivates us to solve the estimation problem by maximizing the sparsity of existing haplotypes. Here, we propose a new algorithm by applying the compressive sensing (CS) theory in the field of signal processing, compressive sensing haplotype inference (CSHAP), to solve the sparse representation of haplotype frequencies based on allele frequencies and between-allele co-variances.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>Our proposed approach can handle both individual genotype data and pooled DNA data with hundreds of loci. The CSHAP exhibits the same accuracy compared with the state-of-the-art methods, but runs several orders of magnitude faster. CSHAP can also handle with missing genotype data imputations efficiently.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>The CSHAP is implemented in R, the source code and the testing datasets are available at http:\/\/home.ustc.edu.cn\/\u223czhouys\/CSHAP\/.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/bty1040","type":"journal-article","created":{"date-parts":[[2018,12,20]],"date-time":"2018-12-20T20:19:01Z","timestamp":1545337141000},"page":"2827-2833","source":"Crossref","is-referenced-by-count":1,"title":["CSHAP: efficient haplotype frequency estimation based on sparse representation"],"prefix":"10.1093","volume":"35","author":[{"given":"Yinsheng","family":"Zhou","sequence":"first","affiliation":[{"name":"Department of Statistics and Finance, University of Science and Technology of China, Hefei, Anhui, China"}]},{"given":"Han","family":"Zhang","sequence":"additional","affiliation":[{"name":"Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, USA"}]},{"given":"Yaning","family":"Yang","sequence":"additional","affiliation":[{"name":"Department of Statistics and Finance, University of Science and Technology of China, Hefei, Anhui, China"}]}],"member":"286","published-online":{"date-parts":[[2018,12,24]]},"reference":[{"volume-title":"Theory of Multivariate Statistics","year":"2008","author":"Bilodeau","key":"2023062708552453300_bty1040-B1"},{"key":"2023062708552453300_bty1040-B2","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511804441","volume-title":"Convex Optimization","author":"Boyd","year":"2004"},{"key":"2023062708552453300_bty1040-B3","doi-asserted-by":"crossref","first-page":"703","DOI":"10.1038\/nrg3054","article-title":"Haplotype phasing: existing methods and new developments","volume":"12","author":"Browning","year":"2011","journal-title":"Nat. Rev. Genet"},{"key":"2023062708552453300_bty1040-B4","first-page":"111","article-title":"Inference of haplotypes from pcr-amplified samples of diploid populations","volume":"7","author":"Clark","year":"1990","journal-title":"Mol. Biol. Evol"},{"key":"2023062708552453300_bty1040-B5","doi-asserted-by":"crossref","first-page":"229","DOI":"10.1038\/ng1001-229","article-title":"High-resolution haplotype structure in the human genome","volume":"29","author":"Daly","year":"2001","journal-title":"Nat. Genet"},{"key":"2023062708552453300_bty1040-B6","doi-asserted-by":"crossref","first-page":"540.","DOI":"10.1186\/1471-2105-9-540","article-title":"Shape-IT: new rapid and accurate algorithm for haplotype inference","volume":"9","author":"Delaneau","year":"2008","journal-title":"BMC Bioinformatics"},{"key":"2023062708552453300_bty1040-B7","doi-asserted-by":"crossref","first-page":"179","DOI":"10.1038\/nmeth.1785","article-title":"A linear complexity phasing method for thousands of genomes","volume":"9","author":"Delaneau","year":"2011","journal-title":"Nat. Methods"},{"key":"2023062708552453300_bty1040-B8","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1038\/nmeth.2307","article-title":"Improved whole-chromosome phasing for disease and population genetic studies","volume":"10","author":"Delaneau","year":"2013","journal-title":"Nat. Methods"},{"key":"2023062708552453300_bty1040-B9","doi-asserted-by":"crossref","first-page":"3934.","DOI":"10.1038\/ncomms4934","article-title":"Integrating sequence and array data to create an improved 1000 genomes project haplotype reference panel","volume":"5","author":"Delaneau","year":"2014","journal-title":"Nat. Commun"},{"key":"2023062708552453300_bty1040-B10","first-page":"921","article-title":"Maximum-likelihood estimation of molecular haplotype frequencies in a diploid population","volume":"12","author":"Excoffier","year":"1995","journal-title":"Mol. Biol. Evol"},{"key":"2023062708552453300_bty1040-B11","doi-asserted-by":"crossref","first-page":"305","DOI":"10.1089\/10665270152530863","article-title":"Inference of haplotypes from samples of diploid populations: complexity and algorithms","volume":"8","author":"Gusfield","year":"2001","journal-title":"J. Comput. Biol"},{"key":"2023062708552453300_bty1040-B12","doi-asserted-by":"crossref","first-page":"144","DOI":"10.1007\/3-540-44888-8_11","volume-title":"Proceedings of the 14th Annual Conference on Combinatorial Pattern Matching, CPM\u201903","author":"Gusfield","year":"2003"},{"volume-title":"Haplotype Inference. CRC Handbook on Bioinformatics, Chapter 1","year":"2005","author":"Gusfield","key":"2023062708552453300_bty1040-B13"},{"key":"2023062708552453300_bty1040-B14","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-662-06409-2","volume-title":"Convex Analysis and Minimization Algorithms II","author":"Hiriart-Urruty","year":"1993"},{"key":"2023062708552453300_bty1040-B15","doi-asserted-by":"crossref","first-page":"457","DOI":"10.1534\/g3.111.001198","article-title":"Genotype imputation with thousands of genomes","volume":"1","author":"Howie","year":"2011","journal-title":"G3"},{"key":"2023062708552453300_bty1040-B16","doi-asserted-by":"crossref","first-page":"2013","DOI":"10.1109\/TSP.2011.2179542","article-title":"Maximum-parsimony haplotype inference based on sparse representations of genotypes","volume":"60","author":"Jajamovich","year":"2012","journal-title":"IEEE Trans. Sig. Process"},{"key":"2023062708552453300_bty1040-B17","doi-asserted-by":"crossref","first-page":"379","DOI":"10.1093\/bioinformatics\/btn623","article-title":"Computationally feasible estimation of haplotype frequencies from pooled DNA with and without Hardy-Weinberg equilibrium","volume":"25","author":"Kuk","year":"2009","journal-title":"Bioinformatics"},{"key":"2023062708552453300_bty1040-B18","doi-asserted-by":"crossref","first-page":"1129","DOI":"10.1086\/344347","article-title":"Haplotype inference in random population samples","volume":"71","author":"Lin","year":"2002","journal-title":"Am. J. Hum. Genet"},{"key":"2023062708552453300_bty1040-B19","doi-asserted-by":"crossref","first-page":"335","DOI":"10.1016\/S0065-2660(07)00414-2","article-title":"Haplotype-association analysis","volume":"60","author":"Liu","year":"2008","journal-title":"Adv. Genet"},{"key":"2023062708552453300_bty1040-B20","doi-asserted-by":"crossref","first-page":"157","DOI":"10.1086\/338446","article-title":"Bayesian haplotype inference for multiple linked single-nucleotide polymorphisms","volume":"70","author":"Niu","year":"2002","journal-title":"Am. J. Hum. Genet"},{"key":"2023062708552453300_bty1040-B21","doi-asserted-by":"crossref","first-page":"1719","DOI":"10.1126\/science.1065573","article-title":"Blocks of limited haplotype diversity revealed by high-resolution scanning of human chromosome 21","volume":"294","author":"Patil","year":"2001","journal-title":"Science"},{"key":"2023062708552453300_bty1040-B22","doi-asserted-by":"crossref","first-page":"1242","DOI":"10.1086\/344207","article-title":"Partition-ligation\u2013expectation-maximization algorithm for haplotype inference with single-nucleotide polymorphisms","volume":"71","author":"Qin","year":"2002","journal-title":"Am. J. Hum. Genet"},{"key":"2023062708552453300_bty1040-B23","doi-asserted-by":"crossref","first-page":"471","DOI":"10.1137\/070697835","article-title":"Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization","volume":"52","author":"Recht","year":"2010","journal-title":"SIAM Rev"},{"key":"2023062708552453300_bty1040-B24","doi-asserted-by":"crossref","first-page":"59.","DOI":"10.1038\/8760","article-title":"Sequence variation in the human angiotensin converting enzyme","volume":"22","author":"Rieder","year":"1999","journal-title":"Nat. Genet"},{"key":"2023062708552453300_bty1040-B25","doi-asserted-by":"crossref","first-page":"832","DOI":"10.1038\/nature01140","article-title":"Detecting recent positive selection in the human genome from haplotype structure","volume":"419","author":"Sabeti","year":"2002","journal-title":"Nature"},{"key":"2023062708552453300_bty1040-B26","doi-asserted-by":"crossref","first-page":"629","DOI":"10.1086\/502802","article-title":"A fast and flexible statistical model for large-scale population genotype data: applications to inferring missing genotypes and haplotypic phase","volume":"78","author":"Scheet","year":"2006","journal-title":"Am. J. Hum. Genet"},{"key":"2023062708552453300_bty1040-B27","doi-asserted-by":"crossref","first-page":"862","DOI":"10.1038\/nrg930","article-title":"Dna pooling: a tool for large-scale association studies","volume":"3","author":"Sham","year":"2002","journal-title":"Nat. Rev. Genet"},{"key":"2023062708552453300_bty1040-B28","doi-asserted-by":"crossref","first-page":"1162","DOI":"10.1086\/379378","article-title":"A comparison of bayesian methods for haplotype reconstruction from population genotype data","volume":"73","author":"Stephens","year":"2003","journal-title":"Am. J. Hum. Genet"},{"key":"2023062708552453300_bty1040-B29","doi-asserted-by":"crossref","first-page":"449","DOI":"10.1086\/428594","article-title":"Accounting for decay of linkage disequilibrium in haplotype inference and missing-data imputation","volume":"76","author":"Stephens","year":"2005","journal-title":"Am. J. Hum. Genet"},{"key":"2023062708552453300_bty1040-B30","doi-asserted-by":"crossref","first-page":"978","DOI":"10.1086\/319501","article-title":"A new statistical method for haplotype reconstruction from population data","volume":"68","author":"Stephens","year":"2001","journal-title":"Am. J. Hum. Genet"},{"key":"2023062708552453300_bty1040-B31","doi-asserted-by":"crossref","first-page":"267","DOI":"10.1089\/cmb.2006.0102","article-title":"Bayesian haplotype inference via the Dirichlet process","volume":"14","author":"Xing","year":"2007","journal-title":"J. Comput. Biol"},{"key":"2023062708552453300_bty1040-B32","doi-asserted-by":"crossref","first-page":"7225","DOI":"10.1073\/pnas.1237858100","article-title":"Efficiency of single-nucleotide polymorphism haplotype estimation from pooled dna","volume":"100","author":"Yang","year":"2003","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023062708552453300_bty1040-B33","doi-asserted-by":"crossref","first-page":"70","DOI":"10.1002\/gepi.20040","article-title":"Estimating haplotype-disease associations with pooled genotype data","volume":"28","author":"Zeng","year":"2005","journal-title":"Genet. Epidemiol"},{"key":"2023062708552453300_bty1040-B34","doi-asserted-by":"crossref","first-page":"1942","DOI":"10.1093\/bioinformatics\/btn324","article-title":"PoooL: an efficient method for estimating haplotype frequencies from large DNA pools","volume":"24","author":"Zhang","year":"2008","journal-title":"Bioinformatics"},{"key":"2023062708552453300_bty1040-B35","doi-asserted-by":"crossref","first-page":"313","DOI":"10.1086\/506276","article-title":"A coalescence-guided hierarchical bayesian method for haplotype inference","volume":"79","author":"Zhang","year":"2006","journal-title":"Am. J. Hum. Genet"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/35\/16\/2827\/50719304\/bty1040.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/35\/16\/2827\/50719304\/bty1040.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,6,27]],"date-time":"2023-06-27T09:00:53Z","timestamp":1687856453000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/35\/16\/2827\/5259188"}},"subtitle":[],"editor":[{"given":"Jonathan","family":"Wren","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2018,12,24]]},"references-count":35,"journal-issue":{"issue":"16","published-print":{"date-parts":[[2019,8,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/bty1040","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"type":"print","value":"1367-4803"},{"type":"electronic","value":"1367-4811"}],"subject":[],"published-other":{"date-parts":[[2019,8,15]]},"published":{"date-parts":[[2018,12,24]]}}}