{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,7]],"date-time":"2025-11-07T08:58:48Z","timestamp":1762505928959},"reference-count":33,"publisher":"Oxford University Press (OUP)","issue":"15","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2012,8,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: A promising class of methods for large-scale population genomic inference use the conditional sampling distribution (CSD), which approximates the probability of sampling an individual with a particular DNA sequence, given that a collection of sequences from the population has already been observed. The CSD has a wide range of applications, including imputing missing sequence data, estimating recombination rates, inferring human colonization history and identifying tracts of distinct ancestry in admixed populations. Most well-used CSDs are based on hidden Markov models (HMMs). Although computationally efficient in principle, methods resulting from the common implementation of the relevant HMM techniques remain intractable for large genomic datasets.<\/jats:p>\n               <jats:p>Results: To address this issue, a set of algorithmic improvements for performing the exact HMM computation is introduced here, by exploiting the particular structure of the CSD and typical characteristics of genomic data. It is empirically demonstrated that these improvements result in a speedup of several orders of magnitude for large datasets and that the speedup continues to increase with the number of sequences. The optimized algorithms can be adopted in methods for various applications, including the ones mentioned above and make previously impracticable analyses possible.<\/jats:p>\n               <jats:p>Availability: Software available upon request.<\/jats:p>\n               <jats:p>Supplementary Information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <jats:p>Contact: \u00a0yss@eecs.berkeley.edu<\/jats:p>","DOI":"10.1093\/bioinformatics\/bts314","type":"journal-article","created":{"date-parts":[[2012,5,29]],"date-time":"2012-05-29T00:55:25Z","timestamp":1338252925000},"page":"2008-2015","source":"Crossref","is-referenced-by-count":19,"title":["Blockwise HMM computation for large-scale population genomic inference"],"prefix":"10.1093","volume":"28","author":[{"given":"Joshua S.","family":"Paul","sequence":"first","affiliation":[{"name":"1 Computer Science Division and 2Department of Statistics, University of California, Berkeley, CA 94720, USA"}]},{"given":"Yun S.","family":"Song","sequence":"additional","affiliation":[{"name":"1 Computer Science Division and 2Department of Statistics, University of California, Berkeley, CA 94720, USA"},{"name":"1 Computer Science Division and 2Department of Statistics, University of California, Berkeley, CA 94720, USA"}]}],"member":"286","published-online":{"date-parts":[[2012,5,28]]},"reference":[{"key":"2023012512454449900_B1","doi-asserted-by":"crossref","first-page":"1084","DOI":"10.1086\/521987","article-title":"Rapid and accurate haplotype phasing and missing data inference for whole genome association studies using localized haplotype clustering","volume":"81","author":"Browning","year":"2007","journal-title":"Am. J. Hum. Genet."},{"key":"2023012512454449900_B2","doi-asserted-by":"crossref","first-page":"700","DOI":"10.1038\/ng1376","article-title":"Evidence for substantial fine-scale variation in recombination rates across the human genome","volume":"36","author":"Crawford","year":"2004","journal-title":"Nat. Genet."},{"key":"2023012512454449900_B3","doi-asserted-by":"crossref","first-page":"331","DOI":"10.1016\/j.tpb.2009.04.001","article-title":"An approximate likelihood for genetic data under a model with recombination and population splitting","volume":"75","author":"Davison","year":"2009","journal-title":"Theor. Popul. Biol."},{"key":"2023012512454449900_B4","doi-asserted-by":"crossref","first-page":"417","DOI":"10.1239\/aap\/1086957579","article-title":"Importance sampling on coalescent histories. I","volume":"36","author":"De Iorio","year":"2004","journal-title":"Adv. Appl. Prob."},{"key":"2023012512454449900_B5","doi-asserted-by":"crossref","first-page":"434","DOI":"10.1239\/aap\/1086957580","article-title":"Importance sampling on coalescent histories. II: Subdivided population models","volume":"36","author":"De Iorio","year":"2004","journal-title":"Adv. Appl. Prob."},{"key":"2023012512454449900_B6","doi-asserted-by":"crossref","first-page":"179","DOI":"10.1038\/nmeth.1785","article-title":"A linear complexity phasing method for thousands of genomes","volume":"9","author":"Delaneau","year":"2012","journal-title":"Nat. Methods"},{"key":"2023012512454449900_B7","doi-asserted-by":"crossref","first-page":"259","DOI":"10.1534\/genetics.109.103010","article-title":"Ancestral population genomics: the coalescent hidden markov model approach","volume":"183","author":"Dutheil","year":"2009","journal-title":"Genetics"},{"key":"2023012512454449900_B8","doi-asserted-by":"crossref","first-page":"1299","DOI":"10.1093\/genetics\/159.3.1299","article-title":"Estimating recombination rates from population genetic data","volume":"159","author":"Fearnhead","year":"2001","journal-title":"Genetics"},{"key":"2023012512454449900_B9","doi-asserted-by":"crossref","first-page":"657","DOI":"10.1111\/1467-9868.00355","article-title":"Approximate likelihood methods for estimating local recombination rates","volume":"64","author":"Fearnhead","year":"2002","journal-title":"J. Royal Stat. Soc. B"},{"key":"2023012512454449900_B10","doi-asserted-by":"crossref","first-page":"781","DOI":"10.1086\/497579","article-title":"A novel method with improved power to detect recombination hotspots from polymorphism data reveals multiple hotspots in human genes","volume":"77","author":"Fearnhead","year":"2005","journal-title":"Am. J. Hum. Genet."},{"key":"2023012512454449900_B11","doi-asserted-by":"crossref","first-page":"881","DOI":"10.1534\/genetics.107.078907","article-title":"Estimating meiotic gene conversion rates from population genetic data","volume":"177","author":"Gay","year":"2007","journal-title":"Genetics"},{"key":"2023012512454449900_B12","doi-asserted-by":"crossref","first-page":"473","DOI":"10.1239\/aap\/1214950213","article-title":"Importance sampling and the two-locus model with subdivided population structure","volume":"40","author":"Griffiths","year":"2008","journal-title":"Adv. Appl. Probab."},{"key":"2023012512454449900_B13","doi-asserted-by":"crossref","first-page":"e1000078","DOI":"10.1371\/journal.pgen.1000078","article-title":"Inferring human colonization history using a copying model","volume":"4","author":"Hellenthal","year":"2008","journal-title":"PLoS Genet."},{"key":"2023012512454449900_B14","doi-asserted-by":"crossref","first-page":"e7","DOI":"10.1371\/journal.pgen.0030007","article-title":"Genomic relationships and speciation times of human, chimpanzee, and gorilla inferred from a coalescent hidden markov model","volume":"3","author":"Hobolth","year":"2007","journal-title":"PLoS Genet"},{"key":"2023012512454449900_B15","doi-asserted-by":"crossref","first-page":"e1000529","DOI":"10.1371\/journal.pgen.1000529","article-title":"A flexible and accurate genotype imputation method for the next generation of genome-wide association studies","volume":"5","author":"Howie","year":"2009","journal-title":"PLoS Genet."},{"key":"2023012512454449900_B16","doi-asserted-by":"crossref","first-page":"1805","DOI":"10.1093\/genetics\/159.4.1805","article-title":"Two-locus sampling distributions and their application","volume":"159","author":"Hudson","year":"2001","journal-title":"Genetics"},{"key":"2023012512454449900_B17","doi-asserted-by":"crossref","first-page":"e1002453","DOI":"10.1371\/journal.pgen.1002453","article-title":"Inference of population structure using dense haplotype data","volume":"8","author":"Lawson","year":"2012","journal-title":"PLoS Genet."},{"key":"2023012512454449900_B18","doi-asserted-by":"crossref","first-page":"493","DOI":"10.1038\/nature10231","article-title":"Inference of human population history from individual whole-genome sequences","volume":"475","author":"Li","year":"2011","journal-title":"Nature"},{"key":"2023012512454449900_B19","doi-asserted-by":"crossref","first-page":"2213","DOI":"10.1093\/genetics\/165.4.2213","article-title":"Modelling linkage disequilibrium, and identifying recombination hotspots using SNP data","volume":"165","author":"Li","year":"2003","journal-title":"Genetics"},{"key":"2023012512454449900_B20","doi-asserted-by":"crossref","first-page":"816","DOI":"10.1002\/gepi.20533","article-title":"Mach: Using sequence and genotype data to estimate haplotypes and unobserved genotypes","volume":"34","author":"Li","year":"2010","journal-title":"Genet. Epidemiol."},{"key":"2023012512454449900_B21","doi-asserted-by":"crossref","first-page":"906","DOI":"10.1038\/ng2088","article-title":"A new multipoint method for genome-wide association studies by imputation of genotypes","volume":"39","author":"Marchini","year":"2007","journal-title":"Nat. Genet."},{"key":"2023012512454449900_B22","doi-asserted-by":"crossref","first-page":"16","DOI":"10.1186\/1471-2156-7-16","article-title":"Fast \u201ccoalescent\u201d simulation","volume":"7","author":"Marjoram","year":"2006","journal-title":"BMC Genet."},{"key":"2023012512454449900_B23","doi-asserted-by":"crossref","first-page":"581","DOI":"10.1126\/science.1092500","article-title":"The fine-scale structure of recombination rate variation in the human genome","volume":"304","author":"McVean","year":"2004","journal-title":"Science"},{"key":"2023012512454449900_B24","doi-asserted-by":"crossref","first-page":"1387","DOI":"10.1098\/rstb.2005.1673","article-title":"Approximating the coalescent with recombination","volume":"360","author":"McVean","year":"2005","journal-title":"Philos. Trans. R. Soc. Lond. B Biol. Sci."},{"key":"2023012512454449900_B25","doi-asserted-by":"crossref","first-page":"321","DOI":"10.1534\/genetics.110.117986","article-title":"A principled approach to deriving approximate conditional sampling distributions in population genetics models with recombination","volume":"186","author":"Paul","year":"2010","journal-title":"Genetics"},{"key":"2023012512454449900_B26","doi-asserted-by":"crossref","first-page":"1115","DOI":"10.1534\/genetics.110.125534","article-title":"An accurate sequentially markov conditional sampling distribution for the coalescent with recombination","volume":"187","author":"Paul","year":"2011","journal-title":"Genetics"},{"key":"2023012512454449900_B27","doi-asserted-by":"crossref","first-page":"e1000519","DOI":"10.1371\/journal.pgen.1000519","article-title":"Sensitive detection of chromosomal segments of distinct ancestry in admixed populations","volume":"5","author":"Price","year":"2009","journal-title":"PLoS Genet."},{"key":"2023012512454449900_B28","doi-asserted-by":"crossref","first-page":"629","DOI":"10.1086\/502802","article-title":"A fast and flexible method for large-scale population genotype data: applications to inferring missing genotypes and haplotypic phase","volume":"78","author":"Scheet","year":"2006","journal-title":"Am. J. Hum. Genet."},{"key":"2023012512454449900_B29","doi-asserted-by":"crossref","first-page":"605","DOI":"10.1111\/1467-9868.00254","article-title":"Inference in molecular population genetics","volume":"62","author":"Stephens","year":"2000","journal-title":"J. R. Stat. Soc. Ser. B Stat. Methodol."},{"key":"2023012512454449900_B30","doi-asserted-by":"crossref","first-page":"449","DOI":"10.1086\/428594","article-title":"Accounting for decay of linkage disequilibrium in haplotype inference and missing-data imputation","volume":"76","author":"Stephens","year":"2005","journal-title":"Am. J. Hum. Genet."},{"key":"2023012512454449900_B31","doi-asserted-by":"crossref","first-page":"676","DOI":"10.1101\/gr.072850.107","article-title":"Effect of genetic divergence in identifying ancestral origin using HAPAA","volume":"18","author":"Sundquist","year":"2008","journal-title":"Genome Res."},{"key":"2023012512454449900_B32","doi-asserted-by":"crossref","first-page":"847","DOI":"10.1038\/ng.894","article-title":"Recombination rates in admixed individuals identified by ancestry-based inference","volume":"43","author":"Wegmann","year":"2011","journal-title":"Nat. Genet."},{"key":"2023012512454449900_B33","doi-asserted-by":"crossref","first-page":"i231","DOI":"10.1093\/bioinformatics\/btp229","article-title":"Joint estimation of gene conversion rates and mean conversion tract lengths from population SNP data","volume":"25","author":"Yin","year":"2009","journal-title":"Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/28\/15\/2008\/48876185\/bioinformatics_28_15_2008.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/28\/15\/2008\/48876185\/bioinformatics_28_15_2008.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,25]],"date-time":"2023-01-25T17:22:40Z","timestamp":1674667360000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/28\/15\/2008\/236900"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,5,28]]},"references-count":33,"journal-issue":{"issue":"15","published-print":{"date-parts":[[2012,8,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/bts314","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2012,8,1]]},"published":{"date-parts":[[2012,5,28]]}}}