{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,8]],"date-time":"2026-04-08T07:31:26Z","timestamp":1775633486644,"version":"3.50.1"},"reference-count":13,"publisher":"Oxford University Press (OUP)","issue":"21","license":[{"start":{"date-parts":[[2019,5,16]],"date-time":"2019-05-16T00:00:00Z","timestamp":1557964800000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Genome Canada and Genome BC","award":["243FOR"],"award-info":[{"award-number":["243FOR"]}]},{"name":"Genome Canada and Genome BC","award":["281ANV"],"award-info":[{"award-number":["281ANV"]}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["2R01HG007182-04A1"],"award-info":[{"award-number":["2R01HG007182-04A1"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2019,11,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>In the modern genomics era, genome sequence assemblies are routine practice. However, depending on the methodology, resulting drafts may contain considerable base errors. Although utilities exist for genome base polishing, they work best with high read coverage and do not scale well. We developed ntEdit, a Bloom filter-based genome sequence editing utility that scales to large mammalian and conifer genomes.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>We first tested ntEdit and the state-of-the-art assembly improvement tools GATK, Pilon and Racon on controlled Escherichia coli and Caenorhabditis elegans sequence data. 
Generally, ntEdit performs well at low sequence depths (&lt;20\u00d7), fixing the majority (&gt;97%) of base substitutions and indels, and its performance is largely constant with increased coverage. In all experiments conducted using a single CPU, the ntEdit pipeline executed in &lt;14\u2009s and &lt;3\u2009m, on average, on E.coli and C.elegans, respectively. We performed similar benchmarks on a sub-20\u00d7 coverage human genome sequence dataset, inspecting accuracy and resource usage in editing chromosomes 1 and 21, and whole genome. ntEdit scaled linearly, executing in 30\u201340\u2009m on those sequences. We show how ntEdit ran in &lt;2 h 20 m to improve upon long and linked read human genome assemblies of NA12878, using high-coverage (54\u00d7) Illumina sequence data from the same individual, fixing frame shifts in coding sequences. We also generated 17-fold coverage spruce sequence data from haploid sequence sources (seed megagametophyte), and used it to edit our pseudo haploid assemblies of the 20 Gb interior and white spruce genomes in &lt;4 and &lt;5\u2009h, respectively, making roughly 50M edits at a (substitution+indel) rate of 0.0024.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>https:\/\/github.com\/bcgsc\/ntedit<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btz400","type":"journal-article","created":{"date-parts":[[2019,5,7]],"date-time":"2019-05-07T19:26:09Z","timestamp":1557257169000},"page":"4430-4432","source":"Crossref","is-referenced-by-count":94,"title":["ntEdit: scalable genome sequence 
polishing"],"prefix":"10.1093","volume":"35","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9890-2293","authenticated-orcid":false,"given":"Ren\u00e9 L","family":"Warren","sequence":"first","affiliation":[{"name":"Genome Sciences Centre, BC Cancer , Vancouver, Canada"}]},{"given":"Lauren","family":"Coombe","sequence":"additional","affiliation":[{"name":"Genome Sciences Centre, BC Cancer , Vancouver, Canada"}]},{"given":"Hamid","family":"Mohamadi","sequence":"additional","affiliation":[{"name":"Genome Sciences Centre, BC Cancer , Vancouver, Canada"}]},{"given":"Jessica","family":"Zhang","sequence":"additional","affiliation":[{"name":"Genome Sciences Centre, BC Cancer , Vancouver, Canada"}]},{"given":"Barry","family":"Jaquish","sequence":"additional","affiliation":[{"name":"BC Ministry of Forests, Lands, and Natural Resource Operations , Victoria, Canada"}]},{"given":"Nathalie","family":"Isabel","sequence":"additional","affiliation":[{"name":"Laurentian Forestry Centre, Natural Resources Canada , Qu\u00e9bec, Canada"}]},{"given":"Steven J M","family":"Jones","sequence":"additional","affiliation":[{"name":"Genome Sciences Centre, BC Cancer , Vancouver, Canada"}]},{"given":"Jean","family":"Bousquet","sequence":"additional","affiliation":[{"name":"Canada Research Chair in Forest Genomics, Universit\u00e9 Laval , Qu\u00e9bec, Canada"}]},{"given":"Joerg","family":"Bohlmann","sequence":"additional","affiliation":[{"name":"Michael Smith Laboratories, University of British Columbia , Vancouver, Canada"}]},{"given":"Inan\u00e7","family":"Birol","sequence":"additional","affiliation":[{"name":"Genome Sciences Centre, BC Cancer , Vancouver, Canada"}]}],"member":"286","published-online":{"date-parts":[[2019,5,16]]},"reference":[{"key":"2023062712531269000_btz400-B1","doi-asserted-by":"crossref","first-page":"20","DOI":"10.1186\/s40246-016-0068-0","article-title":"A comparative study of k-spectrum-based error correction methods for next-generation sequencing data 
analysis","volume":"10","author":"Akogwu","year":"2016","journal-title":"Hum. Genomics"},{"key":"2023062712531269000_btz400-B2","doi-asserted-by":"crossref","first-page":"1492","DOI":"10.1093\/bioinformatics\/btt178","article-title":"Assembling the 20 Gb white spruce (Picea glauca) genome from whole-genome shotgun sequencing data","volume":"29","author":"Birol","year":"2013","journal-title":"Bioinformatics"},{"key":"2023062712531269000_btz400-B3","doi-asserted-by":"crossref","first-page":"338","DOI":"10.1038\/nbt.4060","article-title":"Nanopore sequencing and assembly of a human genome with ultra-long reads","volume":"36","author":"Jain","year":"2018","journal-title":"Nat. Biotechnol"},{"key":"2023062712531269000_btz400-B4","doi-asserted-by":"crossref","first-page":"127","DOI":"10.1038\/s41587-018-0005-y","article-title":"Reply to \u2018Errors in long-read assemblies can critically affect protein prediction\u2019","volume":"37","author":"Koren","year":"2019","journal-title":"Nat. Biotechnol"},{"key":"2023062712531269000_btz400-B5","doi-asserted-by":"crossref","first-page":"1297","DOI":"10.1101\/gr.107524.110","article-title":"The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data","volume":"20","author":"McKenna","year":"2010","journal-title":"Genome Res"},{"key":"2023062712531269000_btz400-B6","doi-asserted-by":"crossref","first-page":"i142","DOI":"10.1093\/bioinformatics\/bty266","article-title":"Versatile genome assembly evaluation with QUAST-LG","volume":"34","author":"Mikheenko","year":"2018","journal-title":"Bioinformatics"},{"key":"2023062712531269000_btz400-B7","doi-asserted-by":"crossref","first-page":"1324","DOI":"10.1093\/bioinformatics\/btw832","article-title":"ntCard: a streaming algorithm for cardinality estimation in genomics 
data","volume":"33","author":"Mohamadi","year":"2017","journal-title":"Bioinformatics"},{"key":"2023062712531269000_btz400-B8","doi-asserted-by":"crossref","first-page":"780","DOI":"10.1038\/nmeth.3454","article-title":"Assembly and diploid architecture of an individual human genome via single-molecule technologies","volume":"12","author":"Pendleton","year":"2015","journal-title":"Nat. Methods"},{"key":"2023062712531269000_btz400-B9","doi-asserted-by":"crossref","first-page":"3210","DOI":"10.1093\/bioinformatics\/btv351","article-title":"BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs","volume":"31","author":"Sim\u00e3o","year":"2015","journal-title":"Bioinformatics"},{"key":"2023062712531269000_btz400-B10","doi-asserted-by":"crossref","first-page":"737","DOI":"10.1101\/gr.214270.116","article-title":"Fast and accurate de novo genome assembly from long uncorrected reads","volume":"27","author":"Vaser","year":"2017","journal-title":"Genome Res"},{"key":"2023062712531269000_btz400-B11","doi-asserted-by":"crossref","first-page":"e112963.","DOI":"10.1371\/journal.pone.0112963","article-title":"Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement","volume":"9","author":"Walker","year":"2014","journal-title":"PLoS One"},{"key":"2023062712531269000_btz400-B12","doi-asserted-by":"crossref","first-page":"124","DOI":"10.1038\/s41587-018-0004-z","article-title":"Errors in long-read assemblies can critically affect protein prediction","volume":"37","author":"Watson","year":"2019","journal-title":"Nat. 
Biotechnol"},{"key":"2023062712531269000_btz400-B13","doi-asserted-by":"crossref","first-page":"189","DOI":"10.1111\/tpj.12886","article-title":"Improved white spruce (Picea glauca) genome assemblies and annotation of large gene families of conifer terpenoid and phenolic defense metabolism","volume":"83","author":"Warren","year":"2015","journal-title":"Plant J"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btz400\/28787884\/btz400.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/35\/21\/4430\/50721970\/bioinformatics_35_21_4430.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/35\/21\/4430\/50721970\/bioinformatics_35_21_4430.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,6,27]],"date-time":"2023-06-27T08:53:38Z","timestamp":1687856018000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/35\/21\/4430\/5490204"}},"subtitle":[],"editor":[{"given":"Bonnie","family":"Berger","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2019,5,16]]},"references-count":13,"journal-issue":{"issue":"21","published-print":{"date-parts":[[2019,11,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btz400","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/565374","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2019,11,1]]},"published":{"date-parts":[[2019,5,16]]}}}