{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,12]],"date-time":"2026-03-12T00:30:43Z","timestamp":1773275443318,"version":"3.50.1"},"reference-count":57,"publisher":"Oxford University Press (OUP)","issue":"15","license":[{"start":{"date-parts":[[2018,3,15]],"date-time":"2018-03-15T00:00:00Z","timestamp":1521072000000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100004440","name":"Wellcome Trust","doi-asserted-by":"publisher","award":["090532\/Z\/09\/Z"],"award-info":[{"award-number":["090532\/Z\/09\/Z"]}],"id":[{"id":"10.13039\/100004440","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100004440","name":"Wellcome Trust","doi-asserted-by":"publisher","award":["100956\/Z\/13\/Z"],"award-info":[{"award-number":["100956\/Z\/13\/Z"]}],"id":[{"id":"10.13039\/100004440","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Wellcome Trust Research Studentship award","award":["097310\/Z\/11\/Z"],"award-info":[{"award-number":["097310\/Z\/11\/Z"]}]},{"DOI":"10.13039\/501100000268","name":"BBSRC","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100000268","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100004440","name":"Wellcome Trust","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100004440","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Royal Society Sir Henry Dale Fellowship","award":["102541\/Z\/13\/Z"],"award-info":[{"award-number":["102541\/Z\/13\/Z"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2018,8,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>The de Bruijn graph is a simple and efficient data structure that is used in many areas of sequence analysis including genome assembly, read error correction and variant calling. The data structure has a single parameter k, is straightforward to implement and is tractable for large genomes with high sequencing depth. It also enables representation of multiple samples simultaneously to facilitate comparison. However, unlike the string graph, a de Bruijn graph does not retain long range information that is inherent in the read data. For this reason, applications that rely on de Bruijn graphs can produce sub-optimal results given their input data.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>We present a novel assembly graph data structure: the Linked de Bruijn Graph (LdBG). Constructed by adding annotations on top of a de Bruijn graph, it stores long range connectivity information through the graph. We show that with error-free data it is possible to losslessly store and recover sequence from a Linked de Bruijn graph. With assembly simulations we demonstrate that the LdBG data structure outperforms both our de Bruijn graph and the String Graph Assembler (SGA). Finally we apply the LdBG to Klebsiella pneumoniae short read data to make large (12 kbp) variant calls, which we validate using PacBio sequencing data, and to characterize the genomic context of drug-resistance genes.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>Linked de Bruijn Graphs and associated algorithms are implemented as part of McCortex, which is available under the MIT license at https:\/\/github.com\/mcveanlab\/mccortex.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/bty157","type":"journal-article","created":{"date-parts":[[2018,3,26]],"date-time":"2018-03-26T02:23:11Z","timestamp":1522030991000},"page":"2556-2565","source":"Crossref","is-referenced-by-count":61,"title":["Integrating long-range connectivity information into de Bruijn graphs"],"prefix":"10.1093","volume":"34","author":[{"given":"Isaac","family":"Turner","sequence":"first","affiliation":[{"name":"Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK"}]},{"given":"Kiran V","family":"Garimella","sequence":"additional","affiliation":[{"name":"Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK"},{"name":"Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK"}]},{"given":"Zamin","family":"Iqbal","sequence":"additional","affiliation":[{"name":"Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK"},{"name":"European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK"}]},{"given":"Gil","family":"McVean","sequence":"additional","affiliation":[{"name":"Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK"},{"name":"Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK"}]}],"member":"286","published-online":{"date-parts":[[2018,3,15]]},"reference":[{"key":"2023012713051532100_bty157-B1","doi-asserted-by":"crossref","first-page":"204","DOI":"10.1038\/nrg2268","article-title":"Genome instability: a mechanistic view of its causes and consequences","volume":"9","author":"Aguilera","year":"2008","journal-title":"Nat. Rev. Genet"},{"key":"2023012713051532100_bty157-B2","doi-asserted-by":"crossref","first-page":"e00093","DOI":"10.7554\/eLife.00093","article-title":"Population structuring of multi-copy, antigen-encoding genes in Plasmodium falciparum","volume":"1","author":"Artzy-Randrup","year":"2012","journal-title":"eLife"},{"key":"2023012713051532100_bty157-B3","doi-asserted-by":"crossref","first-page":"455","DOI":"10.1089\/cmb.2012.0021","article-title":"SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing","volume":"19","author":"Bankevich","year":"2012","journal-title":"J. Comput. Biol. J. Comput. Mol. Cell Biol"},{"key":"2023012713051532100_bty157-B4","first-page":"499","author":"Bateman","year":"2016"},{"key":"2023012713051532100_bty157-B5","doi-asserted-by":"crossref","first-page":"288.","DOI":"10.1186\/s12859-015-0709-7","article-title":"Reference-free compression of high throughput sequencing data with a probabilistic de Bruijn graph","volume":"16","author":"Benoit","year":"2015","journal-title":"BMC Bioinformatics"},{"key":"2023012713051532100_bty157-B6","author":"Bolger","year":"2017"},{"key":"2023012713051532100_bty157-B7","doi-asserted-by":"crossref","first-page":"394","DOI":"10.1007\/s00453-016-0165-4","article-title":"An external-memory algorithm for string graph construction","volume":"78","author":"Bonizzoni","year":"2016","journal-title":"Algorithmica"},{"key":"2023012713051532100_bty157-B8","doi-asserted-by":"crossref","first-page":"225","DOI":"10.1007\/978-3-642-33122-0_18","volume-title":"Algorithms in Bioinformatics","author":"Bowe","year":"2012"},{"key":"2023012713051532100_bty157-B9","doi-asserted-by":"crossref","first-page":"10063","DOI":"10.1038\/ncomms10063","article-title":"Rapid antibiotic-resistance predictions from genome sequence data for Staphylococcus aureus and Mycobacterium tuberculosis","volume":"6","author":"Bradley","year":"2015","journal-title":"Nat. Commun."},{"key":"2023012713051532100_bty157-B10","doi-asserted-by":"crossref","first-page":"10","DOI":"10.1186\/2047-217X-2-10","article-title":"Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species","volume":"2","author":"Bradnam","year":"2013","journal-title":"GigaScience"},{"key":"2023012713051532100_bty157-B11","doi-asserted-by":"crossref","first-page":"39","DOI":"10.1007\/978-3-642-23038-7_4","volume-title":"Algorithms in Bioinformatics","author":"Chikhi","year":"2011"},{"key":"2023012713051532100_bty157-B12","doi-asserted-by":"crossref","first-page":"22","DOI":"10.1186\/1748-7188-8-22","article-title":"Space-efficient and exact de Bruijn graph representation based on a Bloom filter","volume":"8","author":"Chikhi","year":"2013","journal-title":"Algorithms Mol. Biol."},{"key":"2023012713051532100_bty157-B13","doi-asserted-by":"crossref","first-page":"336","DOI":"10.1089\/cmb.2014.0160","article-title":"On the representation of de Bruijn graphs","volume":"22","author":"Chikhi","year":"2015","journal-title":"J. Comput. Biol"},{"key":"2023012713051532100_bty157-B14","doi-asserted-by":"crossref","first-page":"479","DOI":"10.1093\/bioinformatics\/btq697","article-title":"Succinct data structures for assembling large genomes","volume":"27","author":"Conway","year":"2011","journal-title":"Bioinformatics"},{"key":"2023012713051532100_bty157-B15","first-page":"758","article-title":"A Combinatorial Problem","volume":"49","author":"de Bruijn","year":"1946","journal-title":"Koninklijke Nederlandsche Akademie Van Wetenschappen"},{"key":"2023012713051532100_bty157-B16","doi-asserted-by":"crossref","first-page":"469","DOI":"10.1084\/jem.20020851","article-title":"Evidence for replicative repair of DNA double-strand breaks leading to oncogenic translocation and gene amplification","volume":"196","author":"Difilippantonio","year":"2002","journal-title":"J. Exp. Med"},{"key":"2023012713051532100_bty157-B17","doi-asserted-by":"crossref","first-page":"682","DOI":"10.1038\/ng.3257","article-title":"Improved genome inference in the MHC using a population reference graph","volume":"47","author":"Dilthey","year":"2015","journal-title":"Nat. Genet"},{"key":"2023012713051532100_bty157-B18","first-page":"390","author":"Ferragina","year":"2000"},{"key":"2023012713051532100_bty157-B19","doi-asserted-by":"crossref","first-page":"1018","DOI":"10.1038\/35039531","article-title":"Frequent ectopic recombination of virulence factor genes in telomeric chromosome clusters of P. falciparum","volume":"407","author":"Freitas-Junior","year":"2000","journal-title":"Nature"},{"key":"2023012713051532100_bty157-B20","doi-asserted-by":"crossref","first-page":"333","DOI":"10.1038\/nrg.2016.49","article-title":"Coming of age: ten years of next-generation sequencing technologies","volume":"17","author":"Goodwin","year":"2016","journal-title":"Nat. Rev. Genet"},{"key":"2023012713051532100_bty157-B21","doi-asserted-by":"crossref","first-page":"1072","DOI":"10.1093\/bioinformatics\/btt086","article-title":"QUAST: quality assessment tool for genome assemblies","volume":"29","author":"Gurevich","year":"2013","journal-title":"Bioinformatics (Oxford, England)"},{"key":"2023012713051532100_bty157-B22","author":"Harris","year":"2007"},{"key":"2023012713051532100_bty157-B23","doi-asserted-by":"crossref","first-page":"50","DOI":"10.1007\/978-3-319-56970-3_4","volume-title":"Research in Computational Molecular Biology","author":"Holley","year":"2017"},{"key":"2023012713051532100_bty157-B24","doi-asserted-by":"crossref","first-page":"i361","DOI":"10.1093\/bioinformatics\/btt215","article-title":"Short read alignment with populations of genomes","volume":"29","author":"Huang","year":"2013","journal-title":"Bioinformatics"},{"key":"2023012713051532100_bty157-B25","doi-asserted-by":"crossref","first-page":"291","DOI":"10.1089\/cmb.1995.2.291","article-title":"A new algorithm for DNA sequence assembly","volume":"2","author":"Idury","year":"1995","journal-title":"J. Comput. Biol"},{"key":"2023012713051532100_bty157-B26","doi-asserted-by":"crossref","first-page":"226","DOI":"10.1038\/ng.1028","article-title":"De novo assembly and genotyping of variants using colored de Bruijn graphs","volume":"44","author":"Iqbal","year":"2012","journal-title":"Nat. Genet"},{"key":"2023012713051532100_bty157-B27","doi-asserted-by":"crossref","first-page":"275","DOI":"10.1093\/bioinformatics\/bts673","article-title":"High-throughput microbial population genomics using the Cortex variation assembler","volume":"29","author":"Iqbal","year":"2013","journal-title":"Bioinformatics"},{"key":"2023012713051532100_bty157-B28","doi-asserted-by":"crossref","first-page":"3416","DOI":"10.1073\/pnas.1117313109","article-title":"Antigenic diversity is generated by distinct evolutionary mechanisms in African trypanosome species","volume":"109","author":"Jackson","year":"2012","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023012713051532100_bty157-B29","doi-asserted-by":"crossref","first-page":"1785","DOI":"10.1073\/pnas.1220349110","article-title":"Reference-assisted chromosome assembly","volume":"110","author":"Kim","year":"2013","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023012713051532100_bty157-B30","doi-asserted-by":"crossref","first-page":"1920","DOI":"10.1093\/bioinformatics\/btv071","article-title":"Reference-based compression of short-read sequences using path encoding","volume":"31","author":"Kingsford","year":"2015","journal-title":"Bioinformatics"},{"key":"2023012713051532100_bty157-B31","doi-asserted-by":"crossref","first-page":"1674","DOI":"10.1093\/bioinformatics\/btv033","article-title":"MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph","volume":"31","author":"Li","year":"2015","journal-title":"Bioinformatics (Oxford, England)"},{"key":"2023012713051532100_bty157-B32","doi-asserted-by":"crossref","first-page":"1838","DOI":"10.1093\/bioinformatics\/bts280","article-title":"Exploring single-sample SNP and INDEL calling with whole-genome de novo assembly","volume":"28","author":"Li","year":"2012","journal-title":"Bioinformatics"},{"key":"2023012713051532100_bty157-B33","doi-asserted-by":"crossref","first-page":"3274","DOI":"10.1093\/bioinformatics\/btu541","article-title":"Fast construction of FM-index for long sequence reads","volume":"30","author":"Li","year":"2014","journal-title":"Bioinformatics"},{"key":"2023012713051532100_bty157-B34","doi-asserted-by":"crossref","first-page":"2843","DOI":"10.1093\/bioinformatics\/btu356","article-title":"Toward better understanding of artifacts in variant calling from high-coverage samples","volume":"30","author":"Li","year":"2014","journal-title":"Bioinformatics"},{"key":"2023012713051532100_bty157-B35","doi-asserted-by":"crossref","first-page":"2885","DOI":"10.1093\/bioinformatics\/btv290","article-title":"BFC: correcting Illumina sequencing errors","volume":"31","author":"Li","year":"2015","journal-title":"Bioinformatics"},{"key":"2023012713051532100_bty157-B36","doi-asserted-by":"crossref","first-page":"237","DOI":"10.1186\/s12859-016-1103-9","article-title":"Read mapping on de Bruijn graphs","volume":"17","author":"Limasset","year":"2016","journal-title":"BMC Bioinformatics"},{"key":"2023012713051532100_bty157-B37","doi-asserted-by":"crossref","first-page":"i302","DOI":"10.1093\/bioinformatics\/btu280","article-title":"Ragout\u2014a reference-assisted assembly tool for bacterial genomes","volume":"30","author":"Kolmogorov","year":"2014","journal-title":"Bioinformatics"},{"key":"2023012713051532100_bty157-B38","doi-asserted-by":"crossref","first-page":"1656","DOI":"10.1128\/AAC.04292-14","article-title":"Klebsiella pneumoniae carbapenemase (KPC) producing K. pneumoniae at a Single Institution: insights into Endemicity from Whole Genome Sequencing","volume":"59","author":"Mathers","year":"2015","journal-title":"Antimicrob. Agents Chemother"},{"key":"2023012713051532100_bty157-B39","doi-asserted-by":"crossref","first-page":"315","DOI":"10.1016\/j.ygeno.2010.03.001","article-title":"Assembly algorithms for next-generation sequencing data","volume":"95","author":"Miller","year":"2010","journal-title":"Genomics"},{"key":"2023012713051532100_bty157-B40","doi-asserted-by":"crossref","first-page":"3181","DOI":"10.1093\/bioinformatics\/btx067","article-title":"Succinct colored de Bruijn graphs","volume":"33","author":"Muggli","year":"2017","journal-title":"Bioinformatics"},{"key":"2023012713051532100_bty157-B41","doi-asserted-by":"crossref","first-page":"275","DOI":"10.1089\/cmb.1995.2.275","article-title":"Toward simplifying and accurately formulating fragment assembly","volume":"2","author":"Myers","year":"1995","journal-title":"J. Comput. Biol"},{"key":"2023012713051532100_bty157-B42","doi-asserted-by":"crossref","first-page":"ii79","DOI":"10.1093\/bioinformatics\/bti1114","article-title":"The fragment assembly string graph","volume":"21","author":"Myers","year":"2005","journal-title":"Bioinformatics"},{"key":"2023012713051532100_bty157-B43","first-page":"426","article-title":"IDBA \u2013 a practical iterative de Bruijn graph de novo assembler","volume":"6044","author":"Peng","year":"2010","journal-title":"RECOMB"},{"key":"2023012713051532100_bty157-B44","doi-asserted-by":"crossref","first-page":"63","DOI":"10.1080\/07391102.1989.10507752","article-title":"l-Tuple DNA sequencing: computer analysis","volume":"7","author":"Pevzner","year":"1989","journal-title":"J. Biomol. Struct. Dyn"},{"key":"2023012713051532100_bty157-B45","doi-asserted-by":"crossref","first-page":"1786","DOI":"10.1101\/gr.2395204","article-title":"De novo repeat classification and fragment assembly","volume":"14","author":"Pevzner","year":"2004","journal-title":"Genome Res"},{"key":"2023012713051532100_bty157-B46","doi-asserted-by":"crossref","first-page":"i293","DOI":"10.1093\/bioinformatics\/btu266","article-title":"ExSPAnder: a universal repeat resolver for DNA fragment assembly","volume":"30","author":"Prjibelski","year":"2014","journal-title":"Bioinformatics"},{"key":"2023012713051532100_bty157-B47","doi-asserted-by":"crossref","first-page":"209","DOI":"10.1038\/nature12221","article-title":"Pan genome of the phytoplankton Emiliania underpins its global distribution","volume":"499","author":"Read","year":"2013","journal-title":"Nature"},{"key":"2023012713051532100_bty157-B48","doi-asserted-by":"crossref","first-page":"382.","DOI":"10.1186\/s12859-015-0801-z","article-title":"An investigation of causes of false positive single nucleotide polymorphisms using simulated reads from a small eukaryote genome","volume":"16","author":"Ribeiro","year":"2015","journal-title":"BMC Bioinformatics"},{"key":"2023012713051532100_bty157-B49","doi-asserted-by":"crossref","first-page":"912","DOI":"10.1038\/ng.3036","article-title":"Integrating mapping-, assembly- and haplotype-based approaches for calling variants in clinical sequencing applications","volume":"46","author":"Rimmer","year":"2014","journal-title":"Nat. Genet"},{"key":"2023012713051532100_bty157-B50","first-page":"147","author":"Rozov","year":"2017"},{"key":"2023012713051532100_bty157-B51","doi-asserted-by":"crossref","first-page":"R98.","DOI":"10.1186\/gb-2009-10-9-r98","article-title":"Simultaneous alignment of short reads against multiple genomes","volume":"10","author":"Schneeberger","year":"2009","journal-title":"Genome Biol"},{"key":"2023012713051532100_bty157-B52","doi-asserted-by":"crossref","first-page":"3767","DOI":"10.1128\/AAC.00464-16","article-title":"Nested Russian doll-like genetic mobility drives rapid dissemination of the carbapenem resistance gene blaKPC","volume":"60","author":"Sheppard","year":"2016","journal-title":"Antimicrob. Agents Chemother"},{"key":"2023012713051532100_bty157-B53","doi-asserted-by":"crossref","first-page":"i367","DOI":"10.1093\/bioinformatics\/btq217","article-title":"Efficient construction of an assembly string graph using the FM-index","volume":"26","author":"Simpson","year":"2010","journal-title":"Bioinformatics"},{"key":"2023012713051532100_bty157-B54","doi-asserted-by":"crossref","first-page":"1117","DOI":"10.1101\/gr.089532.108","article-title":"ABySS: a parallel assembler for short read sequence data","volume":"19","author":"Simpson","year":"2009","journal-title":"Genome Res"},{"key":"2023012713051532100_bty157-B56","doi-asserted-by":"crossref","first-page":"1350","DOI":"10.1038\/ng.3121","article-title":"Comprehensive variation discovery in single human genomes","volume":"46","author":"Weisenfeld","year":"2014","journal-title":"Nat. Genet"},{"key":"2023012713051532100_bty157-B57","doi-asserted-by":"crossref","first-page":"11.5.1","DOI":"10.1002\/0471250953.bi1105s31","article-title":"Using the Velvet de novo assembler for short-read sequencing technologies","volume":"31","author":"Zerbino","year":"2010","journal-title":"Curr. Protoc. Bioinf"},{"key":"2023012713051532100_bty157-B58","doi-asserted-by":"crossref","first-page":"821","DOI":"10.1101\/gr.074492.107","article-title":"Velvet: algorithms for de novo short read assembly using de Bruijn graphs","volume":"18","author":"Zerbino","year":"2008","journal-title":"Genome Res"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/34\/15\/2556\/48935412\/bioinformatics_34_15_2556.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/34\/15\/2556\/48935412\/bioinformatics_34_15_2556.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,27]],"date-time":"2023-01-27T09:07:51Z","timestamp":1674810471000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/34\/15\/2556\/4938484"}},"subtitle":[],"editor":[{"given":"Bonnie","family":"Berger","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2018,3,15]]},"references-count":57,"journal-issue":{"issue":"15","published-print":{"date-parts":[[2018,8,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/bty157","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/147777","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2018,8,1]]},"published":{"date-parts":[[2018,3,15]]}}}