{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,11]],"date-time":"2026-03-11T23:34:09Z","timestamp":1773272049762,"version":"3.50.1"},"reference-count":26,"publisher":"Oxford University Press (OUP)","issue":"24","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2014,12,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation : The throughput of genomic sequencing has increased to the point that is overrunning the rate of downstream analysis. This, along with the desire to revisit old data, has led to a situation where large quantities of raw, and nearly impenetrable, sequence data are rapidly filling the hard drives of modern biology labs. These datasets can be compressed via a multi-string variant of the Burrows\u2013Wheeler Transform (BWT), which provides the side benefit of searches for arbitrary k -mers within the raw data as well as the ability to reconstitute arbitrary reads as needed. We propose a method for merging such datasets for both increased compression and downstream analysis.<\/jats:p>\n               <jats:p>Results : We present a novel algorithm that merges multi-string BWTs in O(LCS\u00d7N) time where LCS is the length of their longest common substring between any of the inputs, and N is the total length of all inputs combined (number of symbols) using O(N\u00d7log2(F)) bits where F is the number of multi-string BWTs merged. This merged multi-string BWT is also shown to have a higher compressibility compared with the input multi-string BWTs separately. Additionally, we explore some uses of a merged multi-string BWT for bioinformatics applications.<\/jats:p>\n               <jats:p>Availability and implementation : The MSBWT package is available through PyPI with source code located at https:\/\/code.google.com\/p\/msbwt\/ .<\/jats:p>\n               <jats:p>Contact : holtjma@cs.unc.edu<\/jats:p>","DOI":"10.1093\/bioinformatics\/btu584","type":"journal-article","created":{"date-parts":[[2014,8,30]],"date-time":"2014-08-30T00:28:26Z","timestamp":1409358506000},"page":"3524-3531","source":"Crossref","is-referenced-by-count":38,"title":["Merging of multi-string BWTs with applications"],"prefix":"10.1093","volume":"30","author":[{"given":"James","family":"Holt","sequence":"first","affiliation":[{"name":"Department of Computer Science, 201 S. Columbia St. UNC-CH, Chapel Hill, NC 27599, USA"}]},{"given":"Leonard","family":"McMillan","sequence":"additional","affiliation":[{"name":"Department of Computer Science, 201 S. Columbia St. UNC-CH, Chapel Hill, NC 27599, USA"}]}],"member":"286","published-online":{"date-parts":[[2014,8,28]]},"reference":[{"key":"2023012712042879500_btu584-B1","doi-asserted-by":"crossref","first-page":"403","DOI":"10.1016\/S0022-2836(05)80360-2","article-title":"Basic local alignment search tool","volume":"215","author":"Altschul","year":"1990","journal-title":"J. Mol. Biol."},{"key":"2023012712042879500_btu584-B2","first-page":"219","article-title":"Lightweight BWT Construction for Very Large String Collections","volume":"6661","author":"Bauer","year":"2011","journal-title":"Comb. Pattern Matching"},{"key":"2023012712042879500_btu584-B3","doi-asserted-by":"crossref","first-page":"134","DOI":"10.1016\/j.tcs.2012.02.002","article-title":"Lightweight algorithms for constructing and inverting the BWT of string collections","volume":"483","author":"Bauer","year":"2013","journal-title":"Theor. Comput. Sci."},{"key":"2023012712042879500_btu584-B4","volume-title":"A Block-Sorting Lossless Data Compression Algorithm","author":"Burrows","year":"1994"},{"key":"2023012712042879500_btu584-B5","doi-asserted-by":"crossref","first-page":"810820","DOI":"10.1101\/gr.7337908","article-title":"ALLPATHS: de novo assembly of whole-genome shotgun microreads","volume":"18","author":"Butler","year":"2008","journal-title":"Genome Res."},{"key":"2023012712042879500_btu584-B6","doi-asserted-by":"crossref","first-page":"1415","DOI":"10.1093\/bioinformatics\/bts173","article-title":"Large-scale compression of genomic sequence databases with the burrows-wheeler transform","volume":"28","author":"Cox","year":"2012","journal-title":"Bioinformatics"},{"key":"2023012712042879500_btu584-B7","doi-asserted-by":"crossref","first-page":"214","DOI":"10.1007\/978-3-642-33122-0_17","article-title":"Comparing DNA sequence collections by direct comparison of compressed text indexes","volume-title":"Algorithms in Bioinformatics","author":"Cox","year":"2012"},{"key":"2023012712042879500_btu584-B8","doi-asserted-by":"crossref","first-page":"707","DOI":"10.1007\/s00453-011-9535-0","article-title":"Lightweight data indexing and compression in external memory","volume":"63","author":"Ferragina","year":"2012","journal-title":"Algorithmica"},{"key":"2023012712042879500_btu584-B9","first-page":"269","article-title":"An Experimental Study of an Opportunistic Index","volume-title":"Proceedings of the 12th Annual ACM-SIAM Symposium on Discrete Algorithms","author":"Ferragina","year":"2001"},{"key":"2023012712042879500_btu584-B10","doi-asserted-by":"crossref","first-page":"24","DOI":"10.1093\/bioinformatics\/btt257","article-title":"Adaptive reference-free compression of sequence quality scores","volume":"30","author":"Janin","year":"2014","journal-title":"Bioinformatics"},{"key":"2023012712042879500_btu584-B11","doi-asserted-by":"crossref","first-page":"728","DOI":"10.1126\/science.1197891","article-title":"On the future of genomic data","volume":"331","author":"Kahn","year":"2011","journal-title":"Science (Washington)"},{"key":"2023012712042879500_btu584-B12","first-page":"656","article-title":"BLAT-the BLAST-like alignment tool","volume":"12","author":"Kent","year":"2002","journal-title":"Genome Res."},{"key":"2023012712042879500_btu584-B13","first-page":"170","volume-title":"The Art of Computer Programming","author":"Knuth","year":"1973"},{"key":"2023012712042879500_btu584-B14","doi-asserted-by":"crossref","first-page":"R25","DOI":"10.1186\/gb-2009-10-3-r25","article-title":"Ultrafast and memory-efficient alignment of short DNA sequences to the human genome","volume":"10","author":"Langmead","year":"2009","journal-title":"Genome Biol."},{"key":"2023012712042879500_btu584-B15","doi-asserted-by":"crossref","first-page":"1754","DOI":"10.1093\/bioinformatics\/btp324","article-title":"Fast and accurate short read alignment with burrows-wheeler transform","volume":"25","author":"Li","year":"2009","journal-title":"Bioinformatics"},{"key":"2023012712042879500_btu584-B16","doi-asserted-by":"crossref","first-page":"627","DOI":"10.1038\/nbt.2241","article-title":"Compressive genomics","volume":"30","author":"Loh","year":"2012","journal-title":"Nat. Biotechnol."},{"key":"2023012712042879500_btu584-B17","doi-asserted-by":"crossref","first-page":"178","DOI":"10.1007\/11496656_16","article-title":"An extension of the burrows wheeler transform and applications to sequence comparison and data expression","volume":"3537","author":"Mantaci","year":"2005","journal-title":"Comb. Pattern Matching"},{"key":"2023012712042879500_btu584-B18","doi-asserted-by":"crossref","first-page":"9748","DOI":"10.1073\/pnas.171285098","article-title":"An Eulerian path approach to DNA fragment assembly","volume":"98","author":"Pevzner","year":"2001","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023012712042879500_btu584-B19","doi-asserted-by":"crossref","first-page":"6881","DOI":"10.1128\/JB.00619-08","article-title":"The pangenome structure of \n              Escherichia coli\n              : comparative genomic analysis of \n              E. coli\n               commensal and pathogenic isolates","volume":"190","author":"Rasko","year":"2008","journal-title":"J. Bacteriol."},{"key":"2023012712042879500_btu584-B20","doi-asserted-by":"crossref","first-page":"56","DOI":"10.1038\/nrg3655","article-title":"The role of replicates for error mitigation in next-generation sequencing","volume":"15","author":"Robasky","year":"2014","journal-title":"Nat. Rev. Genet."},{"key":"2023012712042879500_btu584-B21","doi-asserted-by":"crossref","first-page":"364","DOI":"10.1007\/978-3-642-40453-5_28","article-title":"Using cascading Bloom filters to improve the memory usage for de Brujin graphs","volume-title":"Algorithms in Bioinformatics","author":"Salikhov","year":"2013"},{"key":"2023012712042879500_btu584-B22","doi-asserted-by":"crossref","first-page":"i367","DOI":"10.1093\/bioinformatics\/btq217","article-title":"Efficient construction of an assembly string graph using the FM-index","volume":"26","author":"Simpson","year":"2010","journal-title":"Bioinformatics"},{"key":"2023012712042879500_btu584-B23","doi-asserted-by":"crossref","first-page":"549","DOI":"10.1101\/gr.126953.111","article-title":"Efficient de novo assembly of large genomes using compressed data structures","volume":"22","author":"Simpson","year":"2012","journal-title":"Genome Res."},{"key":"2023012712042879500_btu584-B24","doi-asserted-by":"crossref","first-page":"1117","DOI":"10.1101\/gr.089532.108","article-title":"ABySS: a parallel assembler for short read sequence data","volume":"19","author":"Simpson","year":"2009","journal-title":"Genome Res."},{"key":"2023012712042879500_btu584-B25","doi-asserted-by":"crossref","first-page":"63","DOI":"10.1007\/978-3-642-03784-9_7","article-title":"Compressed suffix arrays for massive data","volume-title":"String Processing and Information Retrieval","author":"Sir\u00e9n","year":"2009"},{"key":"2023012712042879500_btu584-B26","doi-asserted-by":"crossref","first-page":"821","DOI":"10.1101\/gr.074492.107","article-title":"Velvet: algorithms for de novo short read assembly using de bruijn graphs","volume":"18","author":"Zerbino","year":"2008","journal-title":"Genome Res."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/30\/24\/3524\/48931354\/bioinformatics_30_24_3524.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/30\/24\/3524\/48931354\/bioinformatics_30_24_3524.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,27]],"date-time":"2023-01-27T13:02:35Z","timestamp":1674824555000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/30\/24\/3524\/2422224"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2014,8,28]]},"references-count":26,"journal-issue":{"issue":"24","published-print":{"date-parts":[[2014,12,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btu584","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2014,12,15]]},"published":{"date-parts":[[2014,8,28]]}}}