{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,8]],"date-time":"2026-03-08T22:12:40Z","timestamp":1773007960817,"version":"3.50.1"},"reference-count":23,"publisher":"Oxford University Press (OUP)","issue":"12","license":[{"start":{"date-parts":[[2016,10,2]],"date-time":"2016-10-02T00:00:00Z","timestamp":1475366400000},"content-version":"vor","delay-in-days":2315,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/2.0\/uk\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2010,6,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Motivation: Haplotype inference is an important step for many types of analyses of genetic variation in the human genome. Traditional approaches for obtaining haplotypes involve collecting genotype information from a population of individuals and then applying a haplotype inference algorithm. The development of high-throughput sequencing technologies allows for an alternative strategy to obtain haplotypes by combining sequence fragments. The problem of \u2018haplotype assembly\u2019 is the problem of assembling the two haplotypes for a chromosome given the collection of such fragments, or reads, and their locations in the haplotypes, which are pre-determined by mapping the reads to a reference genome. Errors in reads significantly increase the difficulty of the problem and it has been shown that the problem is NP-hard even for reads of length 2. Existing greedy and stochastic algorithms are not guaranteed to find the optimal solutions for the haplotype assembly problem.<\/jats:p>\n                  <jats:p>Results: In this article, we proposed a dynamic programming algorithm that is able to assemble the haplotypes optimally with time complexity O(m \u00d7 2k \u00d7 n), where m is the number of reads, k is the length of the longest read and n is the total number of SNPs in the haplotypes. We also reduce the haplotype assembly problem into the maximum satisfiability problem that can often be solved optimally even when k is large. Taking advantage of the efficiency of our algorithm, we perform simulation experiments demonstrating that the assembly of haplotypes using reads of length typical of the current sequencing technologies is not practical. However, we demonstrate that the combination of this approach and the traditional haplotype phasing approaches allow us to practically construct haplotypes containing both common and rare variants.<\/jats:p>\n                  <jats:p>Contact: \u00a0danhe@cs.ucla.edu<\/jats:p>","DOI":"10.1093\/bioinformatics\/btq215","type":"journal-article","created":{"date-parts":[[2010,6,7]],"date-time":"2010-06-07T03:28:13Z","timestamp":1275881293000},"page":"i183-i190","source":"Crossref","is-referenced-by-count":105,"title":["Optimal algorithms for haplotype assembly from whole-genome sequence data"],"prefix":"10.1093","volume":"26","author":[{"given":"Dan","family":"He","sequence":"first","affiliation":[{"name":"Department of Computer Science, University of California Los Angeles, Los Angeles, CA 90095, USA"}]},{"given":"Arthur","family":"Choi","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of California Los Angeles, Los Angeles, CA 90095, USA"}]},{"given":"Knot","family":"Pipatsrisawat","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of California Los Angeles, Los Angeles, CA 90095, USA"}]},{"given":"Adnan","family":"Darwiche","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of California Los Angeles, Los Angeles, CA 90095, USA"}]},{"given":"Eleazar","family":"Eskin","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of California Los Angeles, Los Angeles, CA 90095, USA"}]}],"member":"286","published-online":{"date-parts":[[2010,6,1]]},"reference":[{"key":"2023012508041562800_B1","author":"1000 Genomes Project","year":"2010","journal-title":"A deep catalog of human genetic variation."},{"key":"2023012508041562800_B2","doi-asserted-by":"crossref","first-page":"i153","DOI":"10.1093\/bioinformatics\/btn298","article-title":"HapCUT: an efficient and accurate algorithm for the haplotype assembly problem","volume":"24","author":"Bansal","year":"2008","journal-title":"Bioinformatics"},{"key":"2023012508041562800_B3","doi-asserted-by":"crossref","first-page":"1336","DOI":"10.1101\/gr.077065.108","article-title":"An MCMC algorithm for haplotype assembly from whole-genome sequence data","volume":"18","author":"Bansal","year":"2008","journal-title":"Genome Res."},{"key":"2023012508041562800_B4","article-title":"Handbook of Satisfiability","volume-title":"Frontiers in Artificial Intelligence and Applications.","author":"Biere","year":"2009"},{"key":"2023012508041562800_B5","doi-asserted-by":"crossref","first-page":"273","DOI":"10.1007\/s00439-008-0472-1","article-title":"Haplotypic analysis of Wellcome Trust Case Control Consortium data","volume":"123","author":"Browning","year":"2008","journal-title":"Hum. genet."},{"key":"2023012508041562800_B6","doi-asserted-by":"crossref","first-page":"135","DOI":"10.1007\/978-3-540-87361-7_12","article-title":"Efficient genome wide tagging by reduction to SAT","volume-title":"Proceedings of the 8th International Workshop on Algorithms in Bioinformatics","author":"Choi","year":"2008"},{"key":"2023012508041562800_B7","doi-asserted-by":"crossref","first-page":"128","DOI":"10.1007\/11557067_11","article-title":"On the complexity of several haplotyping problems","volume-title":"Proceedings of the 5th International Workshop on Algorithms in Bioinformatics","author":"Cilibrasi","year":"2005"},{"key":"2023012508041562800_B8","doi-asserted-by":"crossref","first-page":"223","DOI":"10.1086\/283157","article-title":"On the problem of discovering the most parsimonious tree","volume":"111","author":"Fitch","year":"1977","journal-title":"Am. Nat."},{"key":"2023012508041562800_B9","doi-asserted-by":"crossref","first-page":"144","DOI":"10.1007\/3-540-44888-8_11","article-title":"Haplotype inference by pure parsimony","volume-title":"Proceedings of the Combinatorial Pattern Matching Conference","author":"Gusfield","year":"2003"},{"key":"2023012508041562800_B10","doi-asserted-by":"crossref","first-page":"1842","DOI":"10.1093\/bioinformatics\/bth149","article-title":"Haplotype reconstruction from genotype data using imperfect phylogeny","volume":"20","author":"Halperin","year":"2004","journal-title":"Bioinformatics"},{"key":"2023012508041562800_B11","doi-asserted-by":"crossref","first-page":"851","DOI":"10.1038\/nature06258","article-title":"A second generation human haplotype map of over 3.1 million SNPS","volume":"449","author":"International HapMap Consortium","year":"2007","journal-title":"Nature"},{"key":"2023012508041562800_B12","first-page":"182","article-title":"SNPs problems, complexity, and algorithms","volume-title":"Proceedings of the 9th Annual European Symposium on Algorithms. Lecture Notes in Computer Science","author":"Lancia","year":"2001"},{"key":"2023012508041562800_B13","doi-asserted-by":"crossref","first-page":"e254","DOI":"10.1371\/journal.pbio.0050254","article-title":"The diploid genome sequence of an individual human","volume":"5","author":"Levy","year":"2007","journal-title":"PLoS Biol."},{"key":"2023012508041562800_B14","first-page":"613","article-title":"MaxSAT, hard and soft constraints","volume":"185","author":"Li","year":"2009","journal-title":"Handbook of Satisfiability"},{"key":"2023012508041562800_B15","doi-asserted-by":"crossref","first-page":"1851","DOI":"10.1101\/gr.078212.108","article-title":"Mapping short DNA sequencing reads and calling variants using mapping quality scores","volume":"18","author":"Li","year":"2008","journal-title":"Genome Res."},{"key":"2023012508041562800_B16","doi-asserted-by":"crossref","first-page":"23","DOI":"10.1093\/bib\/3.1.23","article-title":"Algorithmic strategies for the single nucleotide polymorphism haplotype assembly problem","volume":"3","author":"Lippert","year":"2002","journal-title":"Brief. Bioinform."},{"key":"2023012508041562800_B17","first-page":"495","article-title":"Algorithms for weighted Boolean optimization","volume-title":"Proceedings of the 12th International Conference on Theory and Applications of Satisfiability Testing","author":"Manquinho","year":"2009"},{"key":"2023012508041562800_B18","doi-asserted-by":"crossref","first-page":"906","DOI":"10.1038\/ng2088","article-title":"A new multipoint method for genome-wide association studies by imputation of genotypes","volume":"39","author":"Marchini","year":"2007","journal-title":"Nat. Genet."},{"key":"2023012508041562800_B19","first-page":"266","article-title":"Fast hare: a fast heuristic for single individual SNP haplotype reconstruction","volume-title":"Lecture Notes in Computer Science","author":"Panconesi","year":"2004"},{"key":"2023012508041562800_B20","doi-asserted-by":"crossref","first-page":"191","DOI":"10.3233\/SAT190044","article-title":"Solving weighted Max-SAT problems in a reduced search space: a performance analysis","volume":"4","author":"Pipatsrisawat","year":"2008","journal-title":"Journal on Satisfiability, Boolean Modeling and Computation"},{"key":"2023012508041562800_B21","doi-asserted-by":"crossref","first-page":"978","DOI":"10.1086\/319501","article-title":"A new statistical method for haplotype reconstruction from population data","volume":"68","author":"Stephens","year":"2001","journal-title":"Am. J. Hum. Gene."},{"key":"2023012508041562800_B22","doi-asserted-by":"crossref","first-page":"2456","DOI":"10.1093\/bioinformatics\/bti352","article-title":"Haplotype reconstruction from SNP fragments by minimum error correction","volume":"21","author":"Wang","year":"2005","journal-title":"Bioinformatics"},{"key":"2023012508041562800_B23","doi-asserted-by":"crossref","first-page":"872","DOI":"10.1038\/nature06884","article-title":"The complete genome of an individual by massively parallel DNA sequencing","volume":"452","author":"Wheeler","year":"2008","journal-title":"Nature"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/26\/12\/i183\/48855486\/bioinformatics_26_12_i183.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/26\/12\/i183\/48855486\/bioinformatics_26_12_i183.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,25]],"date-time":"2023-01-25T03:07:56Z","timestamp":1674616076000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/26\/12\/i183\/286031"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2010,6,1]]},"references-count":23,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2010,6,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btq215","relation":{"has-review":[{"id-type":"doi","id":"10.3410\/f.13339986.14707085","asserted-by":"object"}]},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2010,6,15]]},"published":{"date-parts":[[2010,6,1]]}}}