{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,9]],"date-time":"2026-01-09T23:55:33Z","timestamp":1768002933086,"version":"3.49.0"},"reference-count":28,"publisher":"Oxford University Press (OUP)","issue":"18","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2013,9,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Haplotypes, defined as the sequence of alleles on one chromosome, are crucial for many genetic analyses. As experimental determination of haplotypes is extremely expensive, haplotypes are traditionally inferred using computational approaches from genotype data, i.e. the mixture of the genetic information from both haplotypes. Best performing approaches for haplotype inference rely on Hidden Markov Models, with the underlying assumption that the haplotypes of a given individual can be represented as a mosaic of segments from other haplotypes in the same population. Such algorithms use this model to predict the most likely haplotypes that explain the observed genotype data conditional on reference panel of haplotypes. With rapid advances in short read sequencing technologies, sequencing is quickly establishing as a powerful approach for collecting genetic variation information. As opposed to traditional genotyping-array technologies that independently call genotypes at polymorphic sites, short read sequencing often collects haplotypic information; a read spanning more than one polymorphic locus (multi-single nucleotide polymorphic read) contains information on the haplotype from which the read originates. However, this information is generally ignored in existing approaches for haplotype phasing and genotype-calling from short read data.<\/jats:p>\n               <jats:p>Results: In this article, we propose a novel framework for haplotype inference from short read sequencing that leverages multi-single nucleotide polymorphic reads together with a reference panel of haplotypes. The basis of our approach is a new probabilistic model that finds the most likely haplotype segments from the reference panel to explain the short read sequencing data for a given individual. We devised an efficient sampling method within a probabilistic model to achieve superior performance than existing methods. Using simulated sequencing reads from real individual genotypes in the HapMap data and the 1000 Genomes projects, we show that our method is highly accurate and computationally efficient. Our haplotype predictions improve accuracy over the basic haplotype copying model by \u223c20% with comparable computational time, and over another recently proposed approach Hap-SeqX by \u223c10% with significantly reduced computational time and memory usage.<\/jats:p>\n               <jats:p>Availability: Publicly available software is available at http:\/\/genetics.cs.ucla.edu\/harsh<\/jats:p>\n               <jats:p>Contact: \u00a0bpasaniuc@mednet.ucla.edu or eeskin@cs.ucla.edu<\/jats:p>","DOI":"10.1093\/bioinformatics\/btt386","type":"journal-article","created":{"date-parts":[[2013,7,4]],"date-time":"2013-07-04T07:00:34Z","timestamp":1372921234000},"page":"2245-2252","source":"Crossref","is-referenced-by-count":23,"title":["Leveraging reads that span multiple single nucleotide polymorphisms for haplotype inference from sequencing data"],"prefix":"10.1093","volume":"29","author":[{"given":"Wen-Yun","family":"Yang","sequence":"first","affiliation":[{"name":"1 Department of Computer Science and 2Inter-Departmental Program in Bioinformatics, University of California, Los Angeles, CA 90095, USA, 3IBM T.J. Watson Research, Yorktown Heights, NY 10598, USA, 4Department of Pathology and Laboratory Medicine, 5Jonsson Comprehensive Cancer Center and 6Department of Human Genetics, University of California, Los Angeles, CA 90095, USA"},{"name":"1 Department of Computer Science and 2Inter-Departmental Program in Bioinformatics, University of California, Los Angeles, CA 90095, USA, 3IBM T.J. Watson Research, Yorktown Heights, NY 10598, USA, 4Department of Pathology and Laboratory Medicine, 5Jonsson Comprehensive Cancer Center and 6Department of Human Genetics, University of California, Los Angeles, CA 90095, USA"}]},{"given":"Farhad","family":"Hormozdiari","sequence":"additional","affiliation":[{"name":"1 Department of Computer Science and 2Inter-Departmental Program in Bioinformatics, University of California, Los Angeles, CA 90095, USA, 3IBM T.J. Watson Research, Yorktown Heights, NY 10598, USA, 4Department of Pathology and Laboratory Medicine, 5Jonsson Comprehensive Cancer Center and 6Department of Human Genetics, University of California, Los Angeles, CA 90095, USA"}]},{"given":"Zhanyong","family":"Wang","sequence":"additional","affiliation":[{"name":"1 Department of Computer Science and 2Inter-Departmental Program in Bioinformatics, University of California, Los Angeles, CA 90095, USA, 3IBM T.J. Watson Research, Yorktown Heights, NY 10598, USA, 4Department of Pathology and Laboratory Medicine, 5Jonsson Comprehensive Cancer Center and 6Department of Human Genetics, University of California, Los Angeles, CA 90095, USA"}]},{"given":"Dan","family":"He","sequence":"additional","affiliation":[{"name":"1 Department of Computer Science and 2Inter-Departmental Program in Bioinformatics, University of California, Los Angeles, CA 90095, USA, 3IBM T.J. Watson Research, Yorktown Heights, NY 10598, USA, 4Department of Pathology and Laboratory Medicine, 5Jonsson Comprehensive Cancer Center and 6Department of Human Genetics, University of California, Los Angeles, CA 90095, USA"}]},{"given":"Bogdan","family":"Pasaniuc","sequence":"additional","affiliation":[{"name":"1 Department of Computer Science and 2Inter-Departmental Program in Bioinformatics, University of California, Los Angeles, CA 90095, USA, 3IBM T.J. Watson Research, Yorktown Heights, NY 10598, USA, 4Department of Pathology and Laboratory Medicine, 5Jonsson Comprehensive Cancer Center and 6Department of Human Genetics, University of California, Los Angeles, CA 90095, USA"},{"name":"1 Department of Computer Science and 2Inter-Departmental Program in Bioinformatics, University of California, Los Angeles, CA 90095, USA, 3IBM T.J. Watson Research, Yorktown Heights, NY 10598, USA, 4Department of Pathology and Laboratory Medicine, 5Jonsson Comprehensive Cancer Center and 6Department of Human Genetics, University of California, Los Angeles, CA 90095, USA"},{"name":"1 Department of Computer Science and 2Inter-Departmental Program in Bioinformatics, University of California, Los Angeles, CA 90095, USA, 3IBM T.J. Watson Research, Yorktown Heights, NY 10598, USA, 4Department of Pathology and Laboratory Medicine, 5Jonsson Comprehensive Cancer Center and 6Department of Human Genetics, University of California, Los Angeles, CA 90095, USA"}]},{"given":"Eleazar","family":"Eskin","sequence":"additional","affiliation":[{"name":"1 Department of Computer Science and 2Inter-Departmental Program in Bioinformatics, University of California, Los Angeles, CA 90095, USA, 3IBM T.J. Watson Research, Yorktown Heights, NY 10598, USA, 4Department of Pathology and Laboratory Medicine, 5Jonsson Comprehensive Cancer Center and 6Department of Human Genetics, University of California, Los Angeles, CA 90095, USA"},{"name":"1 Department of Computer Science and 2Inter-Departmental Program in Bioinformatics, University of California, Los Angeles, CA 90095, USA, 3IBM T.J. Watson Research, Yorktown Heights, NY 10598, USA, 4Department of Pathology and Laboratory Medicine, 5Jonsson Comprehensive Cancer Center and 6Department of Human Genetics, University of California, Los Angeles, CA 90095, USA"},{"name":"1 Department of Computer Science and 2Inter-Departmental Program in Bioinformatics, University of California, Los Angeles, CA 90095, USA, 3IBM T.J. Watson Research, Yorktown Heights, NY 10598, USA, 4Department of Pathology and Laboratory Medicine, 5Jonsson Comprehensive Cancer Center and 6Department of Human Genetics, University of California, Los Angeles, CA 90095, USA"}]}],"member":"286","published-online":{"date-parts":[[2013,7,3]]},"reference":[{"key":"2023070311325079100_btt386-B1","doi-asserted-by":"crossref","first-page":"577","DOI":"10.1089\/cmb.2012.0084","article-title":"HapCompass: a fast cycle basis algorithm for accurate haplotype assembly of sequence data","volume":"19","author":"Aguiar","year":"2012","journal-title":"J. Comput. Biol."},{"key":"2023070311325079100_btt386-B2","doi-asserted-by":"crossref","first-page":"i153","DOI":"10.1093\/bioinformatics\/btn298","article-title":"HapCUT: an efficient and accurate algorithm for the haplotype assembly problem","volume":"24","author":"Bansal","year":"2008","journal-title":"Bioinformatics"},{"key":"2023070311325079100_btt386-B3","doi-asserted-by":"crossref","first-page":"1336","DOI":"10.1101\/gr.077065.108","article-title":"An MCMC algorithm for haplotype assembly from whole-genome sequence data","volume":"18","author":"Bansal","year":"2008","journal-title":"Genome Res."},{"key":"2023070311325079100_btt386-B4","doi-asserted-by":"crossref","first-page":"210","DOI":"10.1016\/j.ajhg.2009.01.005","article-title":"A unified approach to genotype imputation and haplotype-phase inference for large data sets of trios and unrelated individuals","volume":"84","author":"Browning","year":"2009","journal-title":"Am. J. Hum. Genet."},{"key":"2023070311325079100_btt386-B6","doi-asserted-by":"crossref","first-page":"160","DOI":"10.1145\/1854776.1854802","article-title":"Refhap: a reliable and fast algorithm for single individual haplotyping","volume-title":"Proceedings of the First ACM International Conference on Bioinformatics and Computational Biology","author":"Duitama","year":"2010"},{"key":"2023070311325079100_btt386-B7","doi-asserted-by":"crossref","first-page":"2041","DOI":"10.1093\/nar\/gkr1042","article-title":"Fosmid-based whole genome haplotyping of a hapmap trio child: evaluation of single individual haplotyping techniques","volume":"40","author":"Duitama","year":"2012","journal-title":"Nucleic Acids Res."},{"key":"2023070311325079100_btt386-B8","doi-asserted-by":"crossref","first-page":"1061","DOI":"10.1038\/nature09534","article-title":"A map of human genome variation from population-scale sequencing","volume":"467","author":"Durbin","year":"2010","journal-title":"Nature"},{"key":"2023070311325079100_btt386-B9","doi-asserted-by":"crossref","first-page":"1299","DOI":"10.1093\/genetics\/159.3.1299","article-title":"Estimating recombination rates from population genetic data","volume":"159","author":"Fearnhead","year":"2001","journal-title":"Genetics"},{"key":"2023070311325079100_btt386-B10","doi-asserted-by":"crossref","first-page":"721","DOI":"10.1109\/TPAMI.1984.4767596","article-title":"Stochastic relaxation, gibbs distributions, and the bayesian restoration of images","volume":"6","author":"Geman","year":"1984","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"2023070311325079100_btt386-B11","doi-asserted-by":"crossref","first-page":"2","DOI":"10.1016\/j.gene.2012.11.093","article-title":"Hap-seqX: expedite algorithm for haplotype phasing with imputation using sequence data","volume":"518","author":"He","year":"2013","journal-title":"Gene"},{"key":"2023070311325079100_btt386-B12","doi-asserted-by":"crossref","first-page":"i183","DOI":"10.1093\/bioinformatics\/btq215","article-title":"Optimal algorithms for haplotype assembly from whole-genome sequence data","volume":"26","author":"He","year":"2010","journal-title":"Bioinformatics"},{"key":"2023070311325079100_btt386-B13","first-page":"64","article-title":"Hap-seq: an optimal algorithm for haplotype phasing with imputation using sequencing data","volume-title":"Proceedings of the 16th Annual International Conference on Research in Computational Molecular Biology (RECOMB)","author":"He","year":"2012"},{"key":"2023070311325079100_btt386-B14","doi-asserted-by":"crossref","first-page":"457","DOI":"10.1534\/g3.111.001198","article-title":"Genotype imputation with thousands of genomes","volume":"1","author":"Howie","year":"2011","journal-title":"G3 (Bethesda)"},{"key":"2023070311325079100_btt386-B15","doi-asserted-by":"crossref","first-page":"e1000529","DOI":"10.1371\/journal.pgen.1000529","article-title":"A flexible and accurate genotype imputation method for the next generation of genome-wide association studies","volume":"5","author":"Howie","year":"2009","journal-title":"PLoS Genet."},{"key":"2023070311325079100_btt386-B16","doi-asserted-by":"crossref","first-page":"599","DOI":"10.1038\/35079107","article-title":"Association of nod2 leucine-rich repeat variants with susceptibility to crohn\u2019s disease","volume":"411","author":"Hugot","year":"2001","journal-title":"Nature"},{"key":"2023070311325079100_btt386-B17","doi-asserted-by":"crossref","first-page":"1299","DOI":"10.1038\/nature04226","article-title":"A haplotype map of the human genome","volume":"437","author":"International HapMap Consortium","year":"2005","journal-title":"Nature"},{"key":"2023070311325079100_btt386-B18","doi-asserted-by":"crossref","first-page":"547","DOI":"10.1089\/cmb.2009.0199","article-title":"EMINIM: an adaptive and memory-efficient algorithm for genotype imputation","volume":"17","author":"Kang","year":"2010","journal-title":"J. Comput. Biol."},{"key":"2023070311325079100_btt386-B19","doi-asserted-by":"crossref","first-page":"59","DOI":"10.1038\/nbt.1740","article-title":"Haplotype-resolved genome sequencing of a gujarati indian individual","volume":"29","author":"Kitzman","year":"2011","journal-title":"Nat. Biotechnol."},{"key":"2023070311325079100_btt386-B20","doi-asserted-by":"crossref","first-page":"57","DOI":"10.1177\/096228020101000104","article-title":"A chronology of fine-scale gene mapping by linkage disequilibrium","volume":"10","author":"Lazzeroni","year":"2001","journal-title":"Stat. Methods Med. Res."},{"key":"2023070311325079100_btt386-B21","doi-asserted-by":"crossref","first-page":"2213","DOI":"10.1093\/genetics\/165.4.2213","article-title":"Modeling linkage disequilibrium and identifying recombination hotspots using single-nucleotide polymorphism data","volume":"165","author":"Li","year":"2003","journal-title":"Genetics"},{"key":"2023070311325079100_btt386-B22","doi-asserted-by":"crossref","first-page":"816","DOI":"10.1002\/gepi.20533","article-title":"MaCH: using sequence and genotype data to estimate haplotypes and unobserved genotypes","volume":"34","author":"Li","year":"2010","journal-title":"Genet. Epidemiol."},{"key":"2023070311325079100_btt386-B23","volume-title":"Monte Carlo Strategies in Scientific Computing","author":"Liu","year":"2008"},{"key":"2023070311325079100_btt386-B24","doi-asserted-by":"crossref","first-page":"2436","DOI":"10.1093\/bioinformatics\/btp412","article-title":"HI: haplotype improver using paired-end short reads","volume":"25","author":"Long","year":"2009","journal-title":"Bioinformatics"},{"key":"2023070311325079100_btt386-B25","doi-asserted-by":"crossref","first-page":"375","DOI":"10.1093\/genetics\/163.1.375","article-title":"Bounds on the minimum number of recombination events in a sample history","volume":"163","author":"Myers","year":"2003","journal-title":"Genetics"},{"key":"2023070311325079100_btt386-B26","doi-asserted-by":"crossref","first-page":"631","DOI":"10.1038\/ng.2283","article-title":"Extremely low-coverage sequencing and imputation increases power for genome-wide association studies","volume":"44","author":"Pasaniuc","year":"2012","journal-title":"Nat. Genet."},{"key":"2023070311325079100_btt386-B27","doi-asserted-by":"crossref","first-page":"223","DOI":"10.1038\/ng1001-223","article-title":"Genetic variation in the 5q31 cytokine gene cluster confers susceptibility to Crohn disease","volume":"29","author":"Rioux","year":"2001","journal-title":"Nat. Genet."},{"key":"2023070311325079100_btt386-B28","doi-asserted-by":"crossref","first-page":"832","DOI":"10.1038\/nature01140","article-title":"Detecting recent positive selection in the human genome from haplotype structure","volume":"419","author":"Sabeti","year":"2002","journal-title":"Nature"},{"key":"2023070311325079100_btt386-B29","doi-asserted-by":"crossref","first-page":"S8","DOI":"10.1186\/1752-0509-6-S2-S8","article-title":"A fast and accurate algorithm for single individual haplotyping","volume":"6","author":"Xie","year":"2012","journal-title":"BMC Syst. Biol."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/29\/18\/2245\/50782800\/bioinformatics_29_18_2245.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/29\/18\/2245\/50782800\/bioinformatics_29_18_2245.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,7,3]],"date-time":"2023-07-03T11:35:19Z","timestamp":1688384119000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/29\/18\/2245\/239997"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2013,7,3]]},"references-count":28,"journal-issue":{"issue":"18","published-print":{"date-parts":[[2013,9,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btt386","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2013,9,15]]},"published":{"date-parts":[[2013,7,3]]}}}