{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,11]],"date-time":"2025-11-11T22:22:06Z","timestamp":1762899726282,"version":"3.37.3"},"reference-count":36,"publisher":"Oxford University Press (OUP)","issue":"21","license":[{"start":{"date-parts":[[2019,3,27]],"date-time":"2019-03-27T00:00:00Z","timestamp":1553644800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"name":"INCEPTION","award":["PIA\/ANR16CONV0005"],"award-info":[{"award-number":["PIA\/ANR16CONV0005"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2019,11,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Long-read genome assembly tools are expected to reconstruct bacterial genomes nearly perfectly; however, they still produce fragmented assemblies in some cases. It would be beneficial to understand whether these cases are intrinsically impossible to resolve, or if assemblers are at fault, implying that genomes could be refined or even finished with little to no additional experimental cost.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>We propose a set of computational techniques to assist inspection of fragmented bacterial genome assemblies, through careful analysis of assembly graphs. By finding paths of overlapping raw reads between pairs of contigs, we recover potential short-range connections between contigs that were lost during the assembly process. We show that our procedure recovers 45% of missing contig adjacencies in fragmented Canu assemblies, on samples from the NCTC bacterial sequencing project. 
We also observe that a simple procedure based on enumerating weighted Hamiltonian cycles can suggest likely contig orderings. In our tests, the correct contig order is ranked first in half of the cases and within the top-three predictions in nearly all evaluated cases, providing a direction for finishing fragmented long-read assemblies.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>https:\/\/gitlab.inria.fr\/pmarijon\/knot .<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btz219","type":"journal-article","created":{"date-parts":[[2019,3,26]],"date-time":"2019-03-26T15:13:43Z","timestamp":1553613223000},"page":"4239-4246","source":"Crossref","is-referenced-by-count":6,"title":["Graph analysis of fragmented long-read bacterial genome assemblies"],"prefix":"10.1093","volume":"35","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6694-6873","authenticated-orcid":false,"given":"Pierre","family":"Marijon","sequence":"first","affiliation":[{"name":"Inria, Universit\u00e9 de Lille, CNRS, Centrale Lille, UMR 9189 \u2013 CRIStAL , Lille F-59000, France"}]},{"given":"Rayan","family":"Chikhi","sequence":"additional","affiliation":[{"name":"Institut Pasteur, C3BI USR 3756 IP CNRS , Paris, France"}]},{"given":"Jean-St\u00e9phane","family":"Varr\u00e9","sequence":"additional","affiliation":[{"name":"Universit\u00e9 de Lille, CNRS, Centrale Lille , Inria, UMR 9189 \u2013 CRIStAL, Lille F-59000, 
France"}]}],"member":"286","published-online":{"date-parts":[[2019,3,27]]},"reference":[{"key":"2023062712454056000_btz219-B1","doi-asserted-by":"crossref","first-page":"93.","DOI":"10.1186\/s13059-017-1213-3","article-title":"A comparative evaluation of genome assembly reconciliation tools","volume":"18","author":"Alhakami","year":"2017","journal-title":"Genome Biol"},{"key":"2023062712454056000_btz219-B2","doi-asserted-by":"crossref","first-page":"1009","DOI":"10.1093\/bioinformatics\/btv688","article-title":"hybridSPAdes: an algorithm for hybrid assembly of short and long reads","volume":"32","author":"Antipov","year":"2016","journal-title":"Bioinformatics"},{"key":"2023062712454056000_btz219-B3","doi-asserted-by":"crossref","first-page":"455","DOI":"10.1089\/cmb.2012.0021","article-title":"SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing","volume":"19","author":"Bankevich","year":"2012","journal-title":"J. Comput. Biol"},{"key":"2023062712454056000_btz219-B4","doi-asserted-by":"crossref","first-page":"2443","DOI":"10.1093\/bioinformatics\/btv171","article-title":"MeDuSa: a multi-draft based scaffolder","volume":"31","author":"Bosi","year":"2015","journal-title":"Bioinformatics"},{"key":"2023062712454056000_btz219-B5","doi-asserted-by":"crossref","first-page":"S18","DOI":"10.1186\/1471-2105-14-S5-S18","article-title":"Optimal assembly for high throughput shotgun sequencing","volume":"14","author":"Bresler","year":"2013","journal-title":"BMC Bioinformatics"},{"key":"2023062712454056000_btz219-B6","doi-asserted-by":"crossref","first-page":"1050","DOI":"10.1038\/nmeth.4035","article-title":"Phased diploid genome assembly with single-molecule real-time sequencing","volume":"13","author":"Chin","year":"2016","journal-title":"Nat. 
Methods"},{"key":"2023062712454056000_btz219-B7","doi-asserted-by":"crossref","first-page":"435","DOI":"10.1093\/bioinformatics\/bts723","article-title":"ALE: a generic assembly likelihood evaluation framework for assessing the accuracy of genome and metagenome assemblies","volume":"29","author":"Clark","year":"2013","journal-title":"Bioinformatics"},{"key":"2023062712454056000_btz219-B8","doi-asserted-by":"crossref","first-page":"334.","DOI":"10.1186\/1756-0500-6-334","article-title":"De novo likelihood-based measures for comparing genome assemblies","volume":"6","author":"Ghodsi","year":"2013","journal-title":"BMC Res. Notes"},{"key":"2023062712454056000_btz219-B9","doi-asserted-by":"crossref","first-page":"R47.","DOI":"10.1186\/gb-2013-14-5-r47","article-title":"REAPR: a universal tool for genome assembly evaluation","volume":"14","author":"Hunt","year":"2013","journal-title":"Genome Biol"},{"key":"2023062712454056000_btz219-B10","doi-asserted-by":"crossref","first-page":"294","DOI":"10.1186\/s13059-015-0849-0","article-title":"Circlator: automated circularization of genome assemblies using long sequencing reads","volume":"16","author":"Hunt","year":"2015","journal-title":"Genome Biol"},{"key":"2023062712454056000_btz219-B11","doi-asserted-by":"crossref","first-page":"747","DOI":"10.1101\/gr.216465.116","article-title":"HINGE: long-read assembly achieves optimal repeat resolution","volume":"27","author":"Kamath","year":"2017","journal-title":"Genome Res"},{"key":"2023062712454056000_btz219-B12","doi-asserted-by":"crossref","first-page":"110","DOI":"10.1016\/j.mib.2014.11.014","article-title":"One chromosome, one contig: complete microbial genomes from long-read sequencing and assembly","volume":"23","author":"Koren","year":"2015","journal-title":"Curr. Opin. 
Microbiol"},{"key":"2023062712454056000_btz219-B13","doi-asserted-by":"crossref","first-page":"722","DOI":"10.1101\/gr.215087.116","article-title":"Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation","volume":"27","author":"Koren","year":"2017","journal-title":"Genome Res"},{"key":"2023062712454056000_btz219-B14","doi-asserted-by":"crossref","first-page":"2520","DOI":"10.1093\/bioinformatics\/bts480","article-title":"Snakemake\u2013a scalable bioinformatics workflow engine","volume":"28","author":"Koster","year":"2012","journal-title":"Bioinformatics"},{"key":"2023062712454056000_btz219-B15","doi-asserted-by":"crossref","first-page":"553","DOI":"10.1590\/1678-4685-gmb-2016-0230","article-title":"Approaches for in silico finishing of microbial genome sequences","volume":"40","author":"Kremer","year":"2017","journal-title":"Genet. Mol. Biol"},{"year":"2018","author":"Lariviere","key":"2023062712454056000_btz219-B16"},{"key":"2023062712454056000_btz219-B17","doi-asserted-by":"crossref","first-page":"3829","DOI":"10.1093\/bioinformatics\/btw602","article-title":"LongISLND: in silico sequencing of lengthy and noisy datatypes","volume":"32","author":"Lau","year":"2016","journal-title":"Bioinformatics"},{"year":"2013","author":"Li","key":"2023062712454056000_btz219-B18"},{"key":"2023062712454056000_btz219-B19","doi-asserted-by":"crossref","first-page":"2103","DOI":"10.1093\/bioinformatics\/btw152","article-title":"Minimap2 and Miniasm: fast mapping and de novo assembly for noisy long sequences","volume":"32","author":"Li","year":"2016","journal-title":"Bioinformatics"},{"key":"2023062712454056000_btz219-B20","doi-asserted-by":"crossref","first-page":"E8396","DOI":"10.1073\/pnas.1604560113","article-title":"Assembly of long error-prone reads using de Bruijn graphs","volume":"113","author":"Lin","year":"2016","journal-title":"Proc. Natl. Acad. Sci. 
USA"},{"key":"2023062712454056000_btz219-B21","doi-asserted-by":"crossref","first-page":"733","DOI":"10.1038\/nmeth.3444","article-title":"A complete bacterial genome assembled de novo using only nanopore sequencing data","volume":"12","author":"Loman","year":"2015","journal-title":"Nat. Methods"},{"key":"2023062712454056000_btz219-B22","doi-asserted-by":"crossref","first-page":"2818","DOI":"10.1093\/bioinformatics\/btn548","article-title":"Aggressive assembly of pyrosequencing reads with mates","volume":"24","author":"Miller","year":"2008","journal-title":"Bioinformatics"},{"key":"2023062712454056000_btz219-B23","doi-asserted-by":"crossref","first-page":"ii79","DOI":"10.1093\/bioinformatics\/bti1114","article-title":"The fragment assembly string graph","volume":"21","author":"Myers","year":"2005","journal-title":"Bioinformatics"},{"key":"2023062712454056000_btz219-B24","article-title":"Metagenomic assembly through the lens of validation: recent advances in assessing and improving the quality of genomes assembled from metagenomes","author":"Olson","year":"2017","journal-title":"Brief. 
Bioinform"},{"key":"2023062712454056000_btz219-B25","doi-asserted-by":"crossref","first-page":"1043","DOI":"10.1101\/gr.186072.114","article-title":"CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes","volume":"25","author":"Parks","year":"2015","journal-title":"Genome Res"},{"key":"2023062712454056000_btz219-B26","doi-asserted-by":"crossref","first-page":"R55.","DOI":"10.1186\/gb-2008-9-3-r55","article-title":"Genome assembly forensics: finding the elusive mis-assembly","volume":"9","author":"Phillippy","year":"2008","journal-title":"Genome Biol"},{"key":"2023062712454056000_btz219-B27","doi-asserted-by":"crossref","first-page":"R8.","DOI":"10.1186\/gb-2013-14-1-r8","article-title":"CGAL: computing genome assembly likelihoods","volume":"14","author":"Rahman","year":"2013","journal-title":"Genome Biol"},{"key":"2023062712454056000_btz219-B28","doi-asserted-by":"crossref","first-page":"3210","DOI":"10.1093\/bioinformatics\/btv351","article-title":"BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs","volume":"31","author":"Sim\u00e3o","year":"2015","journal-title":"Bioinformatics"},{"key":"2023062712454056000_btz219-B29","doi-asserted-by":"crossref","first-page":"539","DOI":"10.1111\/j.1574-6976.2009.00169.x","article-title":"Genesis, effects and fates of repeats in prokaryotic genomes","volume":"33","author":"Treangen","year":"2009","journal-title":"FEMS Microbiol. Rev"},{"key":"2023062712454056000_btz219-B30","doi-asserted-by":"crossref","first-page":"1272.","DOI":"10.3389\/fmicb.2017.01272","article-title":"A case study into microbial genome assembly gap sequences and finishing strategies","volume":"8","author":"Utturkar","year":"2017","journal-title":"Front. 
Microbiol"},{"key":"2023062712454056000_btz219-B31","doi-asserted-by":"crossref","first-page":"e52210.","DOI":"10.1371\/journal.pone.0052210","article-title":"Reevaluating assembly evaluations with feature response curves: GAGE and assemblathons","volume":"7","author":"Vezzi","year":"2012","journal-title":"PLoS One"},{"key":"2023062712454056000_btz219-B32","doi-asserted-by":"crossref","first-page":"e112963.","DOI":"10.1371\/journal.pone.0112963","article-title":"Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement","volume":"9","author":"Walker","year":"2014","journal-title":"PLoS One"},{"key":"2023062712454056000_btz219-B33","doi-asserted-by":"crossref","first-page":"3350","DOI":"10.1093\/bioinformatics\/btv383","article-title":"Bandage: interactive visualization of de novo genome assemblies","volume":"31","author":"Wick","year":"2015","journal-title":"Bioinformatics"},{"key":"2023062712454056000_btz219-B34","doi-asserted-by":"crossref","first-page":"e1005595.","DOI":"10.1371\/journal.pcbi.1005595","article-title":"Unicycler: resolving bacterial genome assemblies from short and long sequencing reads","volume":"13","author":"Wick","year":"2017","journal-title":"PLoS Comput. Biol"},{"key":"2023062712454056000_btz219-B35","article-title":"DBG2olc: efficient assembly of large genomes using long erroneous reads of the third generation sequencing technologies","volume":"6","author":"Ye","year":"2016","journal-title":"Sci. 
Rep"},{"key":"2023062712454056000_btz219-B36","doi-asserted-by":"crossref","first-page":"2669","DOI":"10.1093\/bioinformatics\/btt476","article-title":"The MaSuRCA genome assembler","volume":"29","author":"Zimin","year":"2013","journal-title":"Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btz219\/28535729\/btz219.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/35\/21\/4239\/50721764\/bioinformatics_35_21_4239.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/35\/21\/4239\/50721764\/bioinformatics_35_21_4239.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,6,27]],"date-time":"2023-06-27T12:46:21Z","timestamp":1687869981000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/35\/21\/4239\/5421164"}},"subtitle":[],"editor":[{"given":"John","family":"Hancock","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2019,3,27]]},"references-count":36,"journal-issue":{"issue":"21","published-print":{"date-parts":[[2019,11,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btz219","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"type":"print","value":"1367-4803"},{"type":"electronic","value":"1367-4811"}],"subject":[],"published-other":{"date-parts":[[2019,11,1]]},"published":{"date-parts":[[2019,3,27]]}}}