{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,12]],"date-time":"2026-04-12T06:46:50Z","timestamp":1775976410036,"version":"3.50.1"},"reference-count":36,"publisher":"Oxford University Press (OUP)","issue":"23","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2013,12,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Draft de novo genome assemblies are now available for many organisms. These assemblies are point estimates of the true genome sequences. Each is a specific hypothesis, drawn from among many alternative hypotheses, of the sequence of a genome. Assembly uncertainty, the inability to distinguish between multiple alternative assembly hypotheses, can be due to real variation between copies of the genome in the sample, errors and ambiguities in the sequenced data and assumptions and heuristics of the assemblers. Most assemblers select a single assembly according to ad hoc criteria, and do not yet report and quantify the uncertainty of their outputs. Those assemblers that do report uncertainty take different approaches to describing multiple assembly hypotheses and the support for each.<\/jats:p>\n               <jats:p>Results: Here we review and examine the problem of representing and measuring uncertainty in assemblies. A promising recent development is the implementation of assemblers that are built according to explicit statistical models. Some new assembly methods, for example, estimate and maximize assembly likelihood. These advances, combined with technical advances in the representation of alternative assembly hypotheses, will lead to a more complete and biologically relevant understanding of assembly uncertainty. This will in turn facilitate the interpretation of downstream analyses and tests of specific biological hypotheses.<\/jats:p>\n               <jats:p>Contact: \u00a0mhowison@brown.edu<\/jats:p>","DOI":"10.1093\/bioinformatics\/btt525","type":"journal-article","created":{"date-parts":[[2013,9,11]],"date-time":"2013-09-11T09:53:20Z","timestamp":1378893200000},"page":"2959-2963","source":"Crossref","is-referenced-by-count":22,"title":["Toward a statistically explicit understanding of <i>de novo<\/i> sequence assembly"],"prefix":"10.1093","volume":"29","author":[{"given":"Mark","family":"Howison","sequence":"first","affiliation":[{"name":"1 Center for Computation and Visualization and 2Department of Ecology and Evolutionary Biology, Brown University, Providence, RI 02912, USA"}]},{"given":"Felipe","family":"Zapata","sequence":"additional","affiliation":[{"name":"1 Center for Computation and Visualization and 2Department of Ecology and Evolutionary Biology, Brown University, Providence, RI 02912, USA"}]},{"given":"Casey W.","family":"Dunn","sequence":"additional","affiliation":[{"name":"1 Center for Computation and Visualization and 2Department of Ecology and Evolutionary Biology, Brown University, Providence, RI 02912, USA"}]}],"member":"286","published-online":{"date-parts":[[2013,9,10]]},"reference":[{"key":"2023012810480204000_btt525-B1","doi-asserted-by":"crossref","first-page":"61","DOI":"10.1038\/nmeth.1527","article-title":"Limitations of next-generation genome sequence assembly","volume":"8","author":"Alkan","year":"2011","journal-title":"Nat. Methods"},{"key":"2023012810480204000_btt525-B2","doi-asserted-by":"crossref","first-page":"1336","DOI":"10.1101\/gr.077065.108","article-title":"An MCMC algorithm for haplotype assembly from whole-genome sequence data","volume":"18","author":"Bansal","year":"2008","journal-title":"Genome Res."},{"key":"2023012810480204000_btt525-B3","doi-asserted-by":"crossref","first-page":"10","DOI":"10.1186\/2047-217X-2-10","article-title":"Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species","volume":"2","author":"Bradnam","year":"2013","journal-title":"Gigascience"},{"key":"2023012810480204000_btt525-B4","doi-asserted-by":"crossref","first-page":"703","DOI":"10.1038\/nrg3054","article-title":"Haplotype phasing: existing methods and new developments","volume":"12","author":"Browning","year":"2011","journal-title":"Nat. Rev. Genet."},{"key":"2023012810480204000_btt525-B5","doi-asserted-by":"crossref","first-page":"236","DOI":"10.1126\/science.1180614","article-title":"Genomics. Genome project standards in a new era of sequencing","volume":"326","author":"Chain","year":"2009","journal-title":"Science"},{"key":"2023012810480204000_btt525-B6","doi-asserted-by":"crossref","first-page":"S8","DOI":"10.1186\/1471-2164-12-S2-S8","article-title":"Evaluation of short read metagenomic assembly","volume":"12","author":"Charuvaka","year":"2011","journal-title":"BMC Genomics"},{"key":"2023012810480204000_btt525-B7","doi-asserted-by":"crossref","first-page":"435","DOI":"10.1093\/bioinformatics\/bts723","article-title":"ALE: a generic assembly likelihood evaluation framework for assessing the accuracy of genome and metagenome assemblies","volume":"29","author":"Clark","year":"2013","journal-title":"Bioinformatics"},{"key":"2023012810480204000_btt525-B8","doi-asserted-by":"crossref","first-page":"2224","DOI":"10.1101\/gr.126599.111","article-title":"Assemblathon 1: a competitive assessment of de novo short read assembly methods","volume":"21","author":"Earl","year":"2011","journal-title":"Genome Res."},{"key":"2023012810480204000_btt525-B9","doi-asserted-by":"crossref","first-page":"269","DOI":"10.1093\/bib\/bbr063","article-title":"Comparative analysis of algorithms for whole-genome assembly of pyrosequencing data","volume":"13","author":"Finotello","year":"2012","journal-title":"Brief. Bioinform."},{"key":"2023012810480204000_btt525-B10","doi-asserted-by":"crossref","first-page":"1451","DOI":"10.1101\/gr.4086505","article-title":"Galaxy: a platform for interactive large-scale genome analysis","volume":"15","author":"Giardine","year":"2005","journal-title":"Genome Res."},{"key":"2023012810480204000_btt525-B11","doi-asserted-by":"crossref","DOI":"10.1201\/b14835","volume-title":"Markov Chain Monte Carlo in Practice","author":"Gilks","year":"1995"},{"key":"2023012810480204000_btt525-B12","doi-asserted-by":"crossref","first-page":"1513","DOI":"10.1073\/pnas.1017351108","article-title":"High-quality draft assemblies of mammalian genomes from massively parallel sequence data","volume":"108","author":"Gnerre","year":"2011","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023012810480204000_btt525-B13","doi-asserted-by":"crossref","first-page":"814","DOI":"10.1080\/10635150802422308","article-title":"A justification for reporting the majority-rule consensus tree in Bayesian phylogenetics","volume":"57","author":"Holder","year":"2008","journal-title":"Syst. Biol."},{"key":"2023012810480204000_btt525-B14","article-title":"BioLite, a lightweight bioinformatics framework with automated tracking of diagnostics and provenance","volume-title":"Proceedings of the 4th USENIX Workshop on the Theory and Practice of Provenance (TaPP\u201912)","author":"Howison","year":"2012"},{"key":"2023012810480204000_btt525-B15","doi-asserted-by":"crossref","first-page":"R47","DOI":"10.1186\/gb-2013-14-5-r47","article-title":"REAPR: a universal tool for genome assembly evaluation","volume":"14","author":"Hunt","year":"2013","journal-title":"Genome Biol."},{"key":"2023012810480204000_btt525-B16","doi-asserted-by":"crossref","first-page":"226","DOI":"10.1038\/ng.1028","article-title":"De novo assembly and genotyping of variants using colored de Bruijn graphs","volume":"44","author":"Iqbal","year":"2012","journal-title":"Nat. Genet."},{"key":"2023012810480204000_btt525-B17","article-title":"The FASTG Format Specification (v1.00)","author":"Jaffe","year":"2012"},{"key":"2023012810480204000_btt525-B18","doi-asserted-by":"crossref","first-page":"271","DOI":"10.1186\/1479-7364-4-4-271","article-title":"State of the art de novo assembly of human genomes from massively parallel sequencing data","volume":"4","author":"Li","year":"2010","journal-title":"Hum. Genomics"},{"key":"2023012810480204000_btt525-B19","doi-asserted-by":"crossref","first-page":"1838","DOI":"10.1093\/bioinformatics\/bts280","article-title":"Exploring single-sample SNP and INDEL calling with whole-genome de novo assembly","volume":"28","author":"Li","year":"2012","journal-title":"Bioinformatics"},{"key":"2023012810480204000_btt525-B20","doi-asserted-by":"crossref","first-page":"669","DOI":"10.1101\/gr.032102","article-title":"What is finished, and why does it matter","volume":"12","author":"Mardis","year":"2002","journal-title":"Genome Res."},{"key":"2023012810480204000_btt525-B21","doi-asserted-by":"crossref","first-page":"1101","DOI":"10.1089\/cmb.2009.0047","article-title":"Maximum likelihood genome assembly","volume":"16","author":"Medvedev","year":"2009","journal-title":"J. Comput. Biol."},{"key":"2023012810480204000_btt525-B22","doi-asserted-by":"crossref","first-page":"315","DOI":"10.1016\/j.ygeno.2010.03.001","article-title":"Assembly algorithms for next-generation sequencing data","volume":"95","author":"Miller","year":"2010","journal-title":"Genomics"},{"key":"2023012810480204000_btt525-B23","doi-asserted-by":"crossref","first-page":"157","DOI":"10.1038\/nrg3367","article-title":"Sequence assembly demystified","volume":"14","author":"Nagarajan","year":"2013","journal-title":"Nat. Rev. Genet."},{"key":"2023012810480204000_btt525-B24","doi-asserted-by":"crossref","first-page":"443","DOI":"10.1038\/nrg2986","article-title":"Genotype and SNP calling from next-generation sequencing data","volume":"12","author":"Nielsen","year":"2011","journal-title":"Nat. Rev. Genet."},{"key":"2023012810480204000_btt525-B25","doi-asserted-by":"crossref","first-page":"457","DOI":"10.1093\/bib\/bbq020","article-title":"De novo assembly of short sequence reads","volume":"11","author":"Paszkiewicz","year":"2010","journal-title":"Brief. Bioinform."},{"key":"2023012810480204000_btt525-B26","doi-asserted-by":"crossref","first-page":"R55","DOI":"10.1186\/gb-2008-9-3-r55","article-title":"Genome assembly forensics: finding the elusive mis-assembly","volume":"9","author":"Phillippy","year":"2008","journal-title":"Genome Biol."},{"key":"2023012810480204000_btt525-B27","doi-asserted-by":"crossref","first-page":"R8","DOI":"10.1186\/gb-2013-14-1-r8","article-title":"CGAL: computing genome assembly likelihoods","volume":"14","author":"Rahman","year":"2013","journal-title":"Genome Biol."},{"key":"2023012810480204000_btt525-B28","doi-asserted-by":"crossref","first-page":"167","DOI":"10.1016\/j.ygeno.2012.06.009","article-title":"The limitations of draft assemblies for understanding prokaryotic adaptation and evolution","volume":"100","author":"Ricker","year":"2012","journal-title":"Genomics"},{"key":"2023012810480204000_btt525-B29","doi-asserted-by":"crossref","first-page":"4320","DOI":"10.1093\/bioinformatics\/bti769","article-title":"Beware of mis-assembled genomes","volume":"21","author":"Salzberg","year":"2005","journal-title":"Bioinformatics"},{"key":"2023012810480204000_btt525-B30","doi-asserted-by":"crossref","first-page":"557","DOI":"10.1101\/gr.131383.111","article-title":"GAGE: a critical evaluation of genome assemblies and assembly algorithms","volume":"22","author":"Salzberg","year":"2012","journal-title":"Genome Res."},{"key":"2023012810480204000_btt525-B31","doi-asserted-by":"crossref","first-page":"1165","DOI":"10.1101\/gr.101360.109","article-title":"Assembly of large genomes using second-generation sequencing","volume":"20","author":"Schatz","year":"2010","journal-title":"Genome Res."},{"key":"2023012810480204000_btt525-B32","doi-asserted-by":"crossref","first-page":"549","DOI":"10.1101\/gr.126953.111","article-title":"Efficient de novo assembly of large genomes using compressed data structures","volume":"22","author":"Simpson","year":"2012","journal-title":"Genome Res."},{"key":"2023012810480204000_btt525-B33","doi-asserted-by":"crossref","first-page":"1117","DOI":"10.1101\/gr.089532.108","article-title":"ABySS: a parallel assembler for short read sequence data","volume":"19","author":"Simpson","year":"2009","journal-title":"Genome Res."},{"key":"2023012810480204000_btt525-B34","first-page":"165","article-title":"An improved maximum likelihood formulation for accurate genome assembly","volume-title":"Proceedings of the 1st IEEE International Conference on Computational Advances in Bio and Medical Sciences (ICCABS)","author":"Varma","year":"2011"},{"key":"2023012810480204000_btt525-B35","doi-asserted-by":"crossref","first-page":"i363","DOI":"10.1093\/bioinformatics\/bts388","article-title":"Stitching gene fragments with a network matching algorithm improves gene assembly for metagenomics","volume":"28","author":"Wu","year":"2012","journal-title":"Bioinformatics"},{"key":"2023012810480204000_btt525-B36","doi-asserted-by":"crossref","first-page":"42","DOI":"10.1093\/bioinformatics\/btm542","article-title":"Assembly reconciliation","volume":"24","author":"Zimin","year":"2008","journal-title":"Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/29\/23\/2959\/48892100\/bioinformatics_29_23_2959.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/29\/23\/2959\/48892100\/bioinformatics_29_23_2959.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,28]],"date-time":"2023-01-28T12:45:54Z","timestamp":1674909954000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/29\/23\/2959\/246939"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2013,9,10]]},"references-count":36,"journal-issue":{"issue":"23","published-print":{"date-parts":[[2013,12,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btt525","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2013,12,1]]},"published":{"date-parts":[[2013,9,10]]}}}