{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T20:34:39Z","timestamp":1772138079522,"version":"3.50.1"},"reference-count":23,"publisher":"Oxford University Press (OUP)","issue":"24","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2014,12,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Motivation: Genomics is expanding from a single reference per species paradigm into a more comprehensive pan-genome approach that analyzes multiple individuals together. A compressed de Bruijn graph is a sophisticated data structure for representing the genomes of entire populations. It robustly encodes shared segments, simple single-nucleotide polymorphisms and complex structural variations far beyond what can be represented in a collection of linear sequences alone.<\/jats:p>\n                  <jats:p>Results: We explore deep topological relationships between suffix trees and compressed de Bruijn graphs and introduce an algorithm, splitMEM, that directly constructs the compressed de Bruijn graph in time and space linear to the total number of genomes for a given maximum genome size. We introduce suffix skips to traverse several suffix links simultaneously and use them to efficiently decompose maximal exact matches into graph nodes. We demonstrate the utility of splitMEM by analyzing the nine-strain pan-genome of Bacillus anthracis and up to 62 strains of Escherichia coli, revealing their core-genome properties.<\/jats:p>\n                  <jats:p>Availability and implementation: Source code and documentation available open-source http:\/\/splitmem.sourceforge.net .<\/jats:p>\n                  <jats:p>Contact: \u00a0mschatz@cshl.edu<\/jats:p>\n                  <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btu756","type":"journal-article","created":{"date-parts":[[2014,11,14]],"date-time":"2014-11-14T23:10:48Z","timestamp":1416006648000},"page":"3476-3483","source":"Crossref","is-referenced-by-count":108,"title":["SplitMEM: a graphical algorithm for pan-genome analysis with suffix skips"],"prefix":"10.1093","volume":"30","author":[{"given":"Shoshana","family":"Marcus","sequence":"first","affiliation":[{"name":"1 Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA and 2 Department of Computer Science, Stony Brook University, Stony Brook, NY 11794, USA"}]},{"given":"Hayan","family":"Lee","sequence":"additional","affiliation":[{"name":"1 Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA and 2 Department of Computer Science, Stony Brook University, Stony Brook, NY 11794, USA"}]},{"given":"Michael C.","family":"Schatz","sequence":"additional","affiliation":[{"name":"1 Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA and 2 Department of Computer Science, Stony Brook University, Stony Brook, NY 11794, USA"}]}],"member":"286","published-online":{"date-parts":[[2014,11,13]]},"reference":[{"key":"2023012712055754800_btu756-B1","first-page":"225","article-title":"Succinct de Bruijn graphs","volume-title":"Proceedings of the 12th International Conference on Algorithms in Bioinformatics, Ljubljana, Slovenia","author":"Bowe","year":"2012"},{"key":"2023012712055754800_btu756-B2","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-319-07566-2_10","volume-title":"From indexing data structures to de Bruijn graphs","author":"Cazaux","year":"2014"},{"key":"2023012712055754800_btu756-B3","doi-asserted-by":"crossref","first-page":"22","DOI":"10.1186\/1748-7188-8-22","article-title":"Space-efficient and exact de Bruijn graph representation based on a Bloom filter","volume":"8","author":"Chikhi","year":"2013","journal-title":"Algorithms Mol. Biol."},{"key":"2023012712055754800_btu756-B4","first-page":"35","article-title":"On the representation of de Bruijn graphs","volume-title":"RECOMB","author":"Chikhi","year":"2014"},{"key":"2023012712055754800_btu756-B5","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511574931","volume-title":"Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology","author":"Gusfield","year":"1997"},{"key":"2023012712055754800_btu756-B6","doi-asserted-by":"crossref","first-page":"1341","DOI":"10.1093\/bioinformatics\/btt128","article-title":"HAL: a hierarchical format for storing and analyzing multiple genome alignments","volume":"29","author":"Hickey","year":"2013","journal-title":"Bioinformatics"},{"key":"2023012712055754800_btu756-B7","doi-asserted-by":"crossref","first-page":"226","DOI":"10.1038\/ng.1028","article-title":"De novo assembly and genotyping of variants using colored de Bruijn graphs","volume":"44","author":"Iqbal","year":"2012","journal-title":"Nat. Genet."},{"key":"2023012712055754800_btu756-B8","volume-title":"An Introduction to Parallel Algorithms","author":"Jaja","year":"1992"},{"key":"2023012712055754800_btu756-B9","first-page":"181","article-title":"Linear-time longest-common-prefix computation in suffix arrays and its applications","volume-title":"CPM","author":"Kasai","year":"2001"},{"key":"2023012712055754800_btu756-B10","doi-asserted-by":"crossref","first-page":"21","DOI":"10.1186\/1471-2105-11-21","article-title":"Assembly complexity of prokaryotic genomes using short reads","volume":"11","author":"Kingsford","year":"2010","journal-title":"BMC Bioinformatics"},{"key":"2023012712055754800_btu756-B11","doi-asserted-by":"crossref","first-page":"R12","DOI":"10.1186\/gb-2004-5-2-r12","article-title":"Versatile and open software for comparing large genomes","volume":"5","author":"Kurtz","year":"2004","journal-title":"Genome Biol."},{"key":"2023012712055754800_btu756-B12","doi-asserted-by":"crossref","first-page":"D332","DOI":"10.1093\/nar\/gkj145","article-title":"The genomes on line database (GOLD) v.2: a monitor of genome projects worldwide","volume":"34","author":"Liolios","year":"2006","journal-title":"Nucleic Acids Res."},{"key":"2023012712055754800_btu756-B13","doi-asserted-by":"crossref","first-page":"D986","DOI":"10.1093\/nar\/gkt958","article-title":"The database of genomic variants: a curated collection of structural variation in the human genome","volume":"42","author":"MacDonald","year":"2014","journal-title":"Nucleic Acids Res."},{"key":"2023012712055754800_btu756-B14","first-page":"215","article-title":"Sibelia: a scalable and comprehensive synteny block generation tool for closely related microbial genomes","volume-title":"WABI","author":"Minkin","year":"2013"},{"key":"2023012712055754800_btu756-B15","doi-asserted-by":"crossref","first-page":"2444","DOI":"10.1073\/pnas.85.8.2444","article-title":"Improved tools for biological sequence comparison","volume":"85","author":"Pearson","year":"1988","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023012712055754800_btu756-B16","doi-asserted-by":"crossref","first-page":"6881","DOI":"10.1128\/JB.00619-08","article-title":"The pangenome structure of Escherichia coli: comparative genomic analysis of E. coli commensal and pathogenic isolates","volume":"190","author":"Rasko","year":"2008","journal-title":"J. Bacteriol."},{"key":"2023012712055754800_btu756-B17","doi-asserted-by":"crossref","first-page":"5027","DOI":"10.1073\/pnas.1016657108","article-title":"Bacillus anthracis comparative genome analysis in support of the Amerithrax investigation","volume":"108","author":"Rasko","year":"2011","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023012712055754800_btu756-B18","doi-asserted-by":"crossref","first-page":"709","DOI":"10.1056\/NEJMoa1106920","article-title":"Origins of the E. coli strain causing an outbreak of hemolytic-uremic syndrome in Germany","volume":"365","author":"Rasko","year":"2011","journal-title":"New Engl. J. Med."},{"key":"2023012712055754800_btu756-B19","doi-asserted-by":"crossref","first-page":"405","DOI":"10.1186\/gb-2013-14-6-405","article-title":"The advantages of SMRT sequencing","volume":"14","author":"Roberts","year":"2013","journal-title":"Genome Biol."},{"key":"2023012712055754800_btu756-B20","doi-asserted-by":"crossref","first-page":"313","DOI":"10.1186\/1471-2105-14-313","article-title":"Compact representation of k-mer de Bruijn graphs for genome read assembly","volume":"14","author":"R\u00f8dland","year":"2013","journal-title":"BMC Bioinformatics"},{"key":"2023012712055754800_btu756-B21","doi-asserted-by":"crossref","first-page":"549","DOI":"10.1101\/gr.126953.111","article-title":"Efficient de novo assembly of large genomes using compressed data structures","volume":"22","author":"Simpson","year":"2012","journal-title":"Genome Res."},{"key":"2023012712055754800_btu756-B22","doi-asserted-by":"crossref","first-page":"13950","DOI":"10.1073\/pnas.0506758102","article-title":"Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: implications for the microbial pan-genome","volume":"102","author":"Tettelin","year":"2005","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023012712055754800_btu756-B23","doi-asserted-by":"crossref","first-page":"249","DOI":"10.1007\/BF01206331","article-title":"On-line construction of suffix trees","volume":"14","author":"Ukkonen","year":"1995","journal-title":"Algorithmica"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/30\/24\/3476\/48932387\/bioinformatics_30_24_3476.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/30\/24\/3476\/48932387\/bioinformatics_30_24_3476.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,27]],"date-time":"2023-01-27T08:04:46Z","timestamp":1674806686000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/30\/24\/3476\/2422268"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2014,11,13]]},"references-count":23,"journal-issue":{"issue":"24","published-print":{"date-parts":[[2014,12,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btu756","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/003954","asserted-by":"object"}]},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2014,12,15]]},"published":{"date-parts":[[2014,11,13]]}}}