{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,22]],"date-time":"2025-02-22T00:45:02Z","timestamp":1740185102136,"version":"3.37.3"},"reference-count":24,"publisher":"Oxford University Press (OUP)","issue":"1","license":[{"start":{"date-parts":[[2017,7,11]],"date-time":"2017-07-11T00:00:00Z","timestamp":1499731200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/about_us\/legal\/notices"}],"funder":[{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["7R21HG007430"],"award-info":[{"award-number":["7R21HG007430"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2018,1,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>The short read lengths of current high-throughput sequencing technologies limit the ability to recover long-range haplotype information. Dilution pool methods for preparing DNA sequencing libraries from high molecular weight DNA fragments enable the recovery of long DNA fragments from short sequence reads. These approaches require computational methods for identifying the DNA fragments using aligned sequence reads and assembling the fragments into long haplotypes. Although a number of computational methods have been developed for haplotype assembly, the problem of identifying DNA fragments from dilution pool sequence data has not received much attention.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>We formulate the problem of detecting DNA fragments from dilution pool sequencing experiments as a genome segmentation problem and develop an algorithm that uses dynamic programming to optimize a likelihood function derived from a generative model for the sequence reads. This algorithm uses an iterative approach to automatically infer the mean background read depth and the number of fragments in each pool. Using simulated data, we demonstrate that our method, FragmentCut, has 25\u201330% greater sensitivity compared with an HMM based method for fragment detection and can also detect overlapping fragments. On a whole-genome human fosmid pool dataset, the haplotypes assembled using the fragments identified by FragmentCut had greater N50 length, 16.2% lower switch error rate and 35.8% lower mismatch error rate compared with two existing methods. We further demonstrate the greater accuracy of our method using two additional dilution pool datasets.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>FragmentCut is available from https:\/\/bansal-lab.github.io\/software\/FragmentCut<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btx436","type":"journal-article","created":{"date-parts":[[2017,7,4]],"date-time":"2017-07-04T19:11:43Z","timestamp":1499195503000},"page":"155-162","source":"Crossref","is-referenced-by-count":0,"title":["An accurate algorithm for the detection of DNA fragments from dilution pool sequencing experiments"],"prefix":"10.1093","volume":"34","author":[{"given":"Vikas","family":"Bansal","sequence":"first","affiliation":[{"name":"Department of Pediatrics, School of Medicine, University of California, San Diego, La Jolla, CA, USA"}]}],"member":"286","published-online":{"date-parts":[[2017,7,11]]},"reference":[{"key":"2023020208401986900_btx436-B1","doi-asserted-by":"crossref","first-page":"1061","DOI":"10.1038\/nature09534","article-title":"A map of human genome variation from population-scale sequencing","volume":"467","author":"Abecasis","year":"2010","journal-title":"Nature"},{"key":"2023020208401986900_btx436-B2","doi-asserted-by":"crossref","first-page":"68","DOI":"10.1038\/nature15393","article-title":"A global reference for human genetic variation","volume":"526","author":"Auton","year":"2015","journal-title":"Nature"},{"key":"2023020208401986900_btx436-B3","doi-asserted-by":"crossref","first-page":"i153","DOI":"10.1093\/bioinformatics\/btn298","article-title":"Hapcut: an efficient and accurate algorithm for the haplotype assembly problem","volume":"24","author":"Bansal","year":"2008","journal-title":"Bioinformatics"},{"key":"2023020208401986900_btx436-B4","doi-asserted-by":"crossref","first-page":"1570","DOI":"10.1101\/gr.191189.115","article-title":"Read clouds uncover variation in complex regions of the human genome","volume":"25","author":"Bishara","year":"2015","journal-title":"Genome Res"},{"key":"2023020208401986900_btx436-B5","doi-asserted-by":"crossref","first-page":"703","DOI":"10.1038\/nrg3054","article-title":"Haplotype phasing: existing methods and new developments","volume":"12","author":"Browning","year":"2011","journal-title":"Nat. Rev. Genet"},{"year":"2010","author":"Duitama","key":"2023020208401986900_btx436-B6"},{"key":"2023020208401986900_btx436-B7","doi-asserted-by":"crossref","first-page":"2041","DOI":"10.1093\/nar\/gkr1042","article-title":"Fosmid-based whole genome haplotyping of a hapmap trio child: evaluation of single individual haplotyping techniques","volume":"40","author":"Duitama","year":"2011","journal-title":"Nucleic Acids Res"},{"key":"2023020208401986900_btx436-B8","doi-asserted-by":"crossref","first-page":"157","DOI":"10.1101\/gr.210500.116","article-title":"A reference data set of 5.4 million phased human variants validated by genetic inheritance from sequencing a three-generation 17-member pedigree","volume":"27","author":"Eberle","year":"2017","journal-title":"Genome Res"},{"key":"2023020208401986900_btx436-B9","doi-asserted-by":"crossref","first-page":"801","DOI":"10.1101\/gr.213462.116","article-title":"Hapcut2: robust and accurate haplotype assembly for diverse sequencing technologies","volume":"27","author":"Edge","year":"2017","journal-title":"Genome Res"},{"key":"2023020208401986900_btx436-B10","doi-asserted-by":"crossref","first-page":"i183","DOI":"10.1093\/bioinformatics\/btq215","article-title":"Optimal algorithms for haplotype assembly from whole-genome sequence data","volume":"26","author":"He","year":"2010","journal-title":"Bioinformatics"},{"key":"2023020208401986900_btx436-B11","doi-asserted-by":"crossref","first-page":"5552","DOI":"10.1073\/pnas.1218696110","article-title":"Whole-genome haplotyping by dilution, amplification, and sequencing","volume":"110","author":"Kaper","year":"2013","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023020208401986900_btx436-B12","doi-asserted-by":"crossref","first-page":"1590","DOI":"10.1080\/01621459.2012.737745","article-title":"Optimal detection of changepoints with a linear computational cost","volume":"107","author":"Killick","year":"2012","journal-title":"J. Am. Stat. Assoc"},{"key":"2023020208401986900_btx436-B13","doi-asserted-by":"crossref","first-page":"59","DOI":"10.1038\/nbt.1740","article-title":"Haplotype-resolved genome sequencing of a Gujarati Indian individual","volume":"29","author":"Kitzman","year":"2011","journal-title":"Nat. Biotechnol"},{"key":"2023020208401986900_btx436-B14","doi-asserted-by":"crossref","first-page":"137ra76","DOI":"10.1126\/scitranslmed.3004323","article-title":"Noninvasive whole-genome sequencing of a human fetus","volume":"4","author":"Kitzman","year":"2012","journal-title":"Sci. Transl. Med"},{"key":"2023020208401986900_btx436-B15","doi-asserted-by":"crossref","first-page":"i379","DOI":"10.1093\/bioinformatics\/btu484","article-title":"Probabilistic single-individual haplotyping","volume":"30","author":"Kuleshov","year":"2014","journal-title":"Bioinformatics"},{"key":"2023020208401986900_btx436-B16","doi-asserted-by":"crossref","first-page":"261","DOI":"10.1038\/nbt.2833","article-title":"Whole-genome haplotyping using long reads and statistical methods","volume":"32","author":"Kuleshov","year":"2014","journal-title":"Nat. Biotechnol"},{"key":"2023020208401986900_btx436-B17","doi-asserted-by":"crossref","first-page":"e254.","DOI":"10.1371\/journal.pbio.0050254","article-title":"The diploid genome sequence of an individual human","volume":"5","author":"Levy","year":"2007","journal-title":"PLoS Biol"},{"key":"2023020208401986900_btx436-B18","doi-asserted-by":"crossref","first-page":"2078","DOI":"10.1093\/bioinformatics\/btp352","article-title":"The sequence alignment\/map format and samtools","volume":"25","author":"Li","year":"2009","journal-title":"Bioinformatics"},{"key":"2023020208401986900_btx436-B19","doi-asserted-by":"crossref","first-page":"42.","DOI":"10.1186\/s13742-016-0148-z","article-title":"The whole genome sequences and experimentally phased haplotypes of over 100 personal genomes","volume":"5","author":"Mao","year":"2016","journal-title":"GigaScience"},{"key":"2023020208401986900_btx436-B20","doi-asserted-by":"crossref","first-page":"780","DOI":"10.1038\/nmeth.3454","article-title":"Assembly and diploid architecture of an individual human genome via single-molecule technologies","volume":"12","author":"Pendleton","year":"2015","journal-title":"Nat. Methods"},{"key":"2023020208401986900_btx436-B21","doi-asserted-by":"crossref","first-page":"190","DOI":"10.1038\/nature11236","article-title":"Accurate whole-genome sequencing and haplotyping from 10 to 20 human cells","volume":"487","author":"Peters","year":"2012","journal-title":"Nature"},{"key":"2023020208401986900_btx436-B22","doi-asserted-by":"crossref","first-page":"43","DOI":"10.1038\/nature12886","article-title":"The complete genome sequence of a neanderthal from the altai mountains","volume":"505","author":"Pr\u00fcfer","year":"2014","journal-title":"Nature"},{"key":"2023020208401986900_btx436-B23","doi-asserted-by":"crossref","first-page":"344","DOI":"10.1038\/nrg3903","article-title":"Haplotype-resolved genome sequencing: experimental methods and applications","volume":"16","author":"Snyder","year":"2015","journal-title":"Nat. Rev. Genet"},{"key":"2023020208401986900_btx436-B24","doi-asserted-by":"crossref","first-page":"215","DOI":"10.1038\/nrg2950","article-title":"The importance of phase information for human genomics","volume":"12","author":"Tewhey","year":"2011","journal-title":"Nat. Rev. Genet"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/34\/1\/155\/49043420\/bioinformatics_34_1_155.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/34\/1\/155\/49043420\/bioinformatics_34_1_155.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,2]],"date-time":"2023-02-02T08:45:26Z","timestamp":1675327526000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/34\/1\/155\/3952670"}},"subtitle":[],"editor":[{"given":"Cenk","family":"Sahinalp","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2017,7,11]]},"references-count":24,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2018,1,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btx436","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"type":"print","value":"1367-4803"},{"type":"electronic","value":"1367-4811"}],"subject":[],"published-other":{"date-parts":[[2018,1,1]]},"published":{"date-parts":[[2017,7,11]]}}}