{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,19]],"date-time":"2025-09-19T09:14:09Z","timestamp":1758273249956},"reference-count":35,"publisher":"Oxford University Press (OUP)","issue":"12","license":[{"start":{"date-parts":[[2016,10,28]],"date-time":"2016-10-28T00:00:00Z","timestamp":1477612800000},"content-version":"vor","delay-in-days":139,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2016,6,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation : With the development of high-throughput sequencing, the number of assembled genomes continues to rise. It is critical to well organize and index many assembled genomes to promote future genomics studies. Burrows\u2013Wheeler Transform (BWT) is an important data structure of genome indexing, which has many fundamental applications; however, it is still non-trivial to construct BWT for large collection of genomes, especially for highly similar or repetitive genomes. Moreover, the state-of-the-art approaches cannot well support scalable parallel computing owing to their incremental nature, which is a bottleneck to use modern computers to accelerate BWT construction.<\/jats:p>\n               <jats:p>Results : We propose de Bruijn branch-based BWT constructor (deBWT), a novel parallel BWT construction approach. DeBWT innovatively represents and organizes the suffixes of input sequence with a novel data structure, de Bruijn branch encoding. This data structure takes the advantage of de Bruijn graph to facilitate the comparison between the suffixes with long common prefix, which breaks the bottleneck of the BWT construction of repetitive genomic sequences. Meanwhile, deBWT also uses the structure of de Bruijn graph for reducing unnecessary comparisons between suffixes. The benchmarking suggests that, deBWT is efficient and scalable to construct BWT for large dataset by parallel computing. It is well-suited to index many genomes, such as a collection of individual human genomes, with multiple-core servers or clusters.<\/jats:p>\n               <jats:p>Availability and implementation : deBWT is implemented in C language, the source code is available at https:\/\/github.com\/hitbc\/deBWT or https:\/\/github.com\/DixianZhu\/deBWT<\/jats:p>\n               <jats:p>Contact: ydwang@hit.edu.cn<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btw266","type":"journal-article","created":{"date-parts":[[2016,6,15]],"date-time":"2016-06-15T15:43:52Z","timestamp":1466005432000},"page":"i174-i182","source":"Crossref","is-referenced-by-count":8,"title":["deBWT: parallel construction of Burrows\u2013Wheeler Transform for large collection of genomes with de Bruijn-branch encoding"],"prefix":"10.1093","volume":"32","author":[{"given":"Bo","family":"Liu","sequence":"first","affiliation":[{"name":"Center for Bioinformatics, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China"}]},{"given":"Dixian","family":"Zhu","sequence":"additional","affiliation":[{"name":"Center for Bioinformatics, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China"}]},{"given":"Yadong","family":"Wang","sequence":"additional","affiliation":[{"name":"Center for Bioinformatics, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China"}]}],"member":"286","published-online":{"date-parts":[[2016,6,11]]},"reference":[{"key":"2023020112312663000_btw266-B1","doi-asserted-by":"crossref","first-page":"134","DOI":"10.1016\/j.tcs.2012.02.002","article-title":"Lightweight algorithms for constructing and inverting the bwt of string collections","volume":"483","author":"Bauer","year":"2013","journal-title":"Theor. Comput. Sci"},{"key":"2023020112312663000_btw266-B2","author":"Bentley","year":"1997"},{"key":"2023020112312663000_btw266-B3","author":"Burrow","year":"1994"},{"key":"2023020112312663000_btw266-B4","author":"Cox","year":"2011"},{"key":"2023020112312663000_btw266-B5","doi-asserted-by":"crossref","first-page":"1415","DOI":"10.1093\/bioinformatics\/bts173","article-title":"Large-scale compression of genomic sequence databases with the Burrows-Wheeler transform","volume":"28","author":"Cox","year":"2012","journal-title":"Bioinformatics"},{"key":"2023020112312663000_btw266-B6","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1007\/s00453-001-0051-5","article-title":"A theoretical and experimental study on the construction of suffix arrays in external memory","volume":"32","author":"Crauser","year":"2008","journal-title":"Algorithmica"},{"key":"2023020112312663000_btw266-B7","doi-asserted-by":"crossref","first-page":"1569","DOI":"10.1093\/bioinformatics\/btv022","article-title":"KMC 2: fast and resource-frugal k-mer counting","volume":"31","author":"Deorowicz","year":"2015","journal-title":"Bioinformatics"},{"key":"2023020112312663000_btw266-B8","doi-asserted-by":"crossref","first-page":"390","DOI":"10.1109\/SFCS.2000.892127","article-title":"Opportunistic data structures with applications","author":"Ferragina","year":"2000","journal-title":"Proceedings of 41st Annual Symposium on Foundations of Computer Science (FOCS)"},{"key":"2023020112312663000_btw266-B9","doi-asserted-by":"crossref","first-page":"552","DOI":"10.1145\/1082036.1082039","article-title":"Indexing compressed text","volume":"52","author":"Ferragina","year":"2005","journal-title":"J ACM"},{"key":"2023020112312663000_btw266-B10","doi-asserted-by":"crossref","first-page":"707","DOI":"10.1007\/s00453-011-9535-0","article-title":"Lightweight data indexing and compression in external memory","volume":"63","author":"Ferragina","year":"2012","journal-title":"Algorithmica"},{"key":"2023020112312663000_btw266-B11","doi-asserted-by":"crossref","first-page":"1513","DOI":"10.1073\/pnas.1017351108","article-title":"High-quality draft assemblies of mammalian genomes from massively parallel sequence data","volume":"108","author":"Gnerre","year":"2011","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023020112312663000_btw266-B12","first-page":"31","article-title":"Practical aspects of compressed suffix arrays and FM-Index in searching DNA sequences","author":"Hon","year":"2004","journal-title":"Proceedings of the Sixth Workshop on Algorithm Engineering and Experiments and the First Workshop on Analytic Algorithmics and Combinatorics"},{"key":"2023020112312663000_btw266-B13","doi-asserted-by":"crossref","first-page":"23","DOI":"10.1007\/s00453-006-1228-8","article-title":"Constructing compressed suffix arrays with large alphabets","volume":"48","author":"Hon","year":"2007","journal-title":"Algorithmica"},{"key":"2023020112312663000_btw266-B14","doi-asserted-by":"crossref","first-page":"249","DOI":"10.1016\/j.tcs.2007.07.018","article-title":"Fast BWT in small space by blockwise suffix sorting","volume":"387","author":"Karkkainen","year":"2007","journal-title":"Theor. Comput. Sci"},{"key":"2023020112312663000_btw266-B15","doi-asserted-by":"crossref","first-page":"791","DOI":"10.1093\/bioinformatics\/btn032","article-title":"Compressed indexing and local alignment of DNA","volume":"24","author":"Lam","year":"2008","journal-title":"Bioinformatics"},{"key":"2023020112312663000_btw266-B16","doi-asserted-by":"crossref","first-page":"357","DOI":"10.1038\/nmeth.1923","article-title":"Fast gapped-read alignment with Bowtie 2","volume":"9","author":"Langmead","year":"2012","journal-title":"Nat. Methods"},{"key":"2023020112312663000_btw266-B17","doi-asserted-by":"crossref","first-page":"1754","DOI":"10.1093\/bioinformatics\/btp324","article-title":"Fast and accurate short read alignment with Burrows-Wheeler transform","volume":"25","author":"Li","year":"2009","journal-title":"Bioinformatics"},{"key":"2023020112312663000_btw266-B18","doi-asserted-by":"crossref","first-page":"2078","DOI":"10.1093\/bioinformatics\/btp352","article-title":"The sequence alignment\/map format and SAMtools","volume":"25","author":"Li","year":"2009","journal-title":"Bioinformatics"},{"key":"2023020112312663000_btw266-B19","doi-asserted-by":"crossref","first-page":"1838","DOI":"10.1093\/bioinformatics\/bts280","article-title":"Exploring single-sample snp and indel calling with whole-genome de novo assembly","volume":"28","author":"Li","year":"2012","journal-title":"Bioinformatics"},{"key":"2023020112312663000_btw266-B20","doi-asserted-by":"crossref","first-page":"3274","DOI":"10.1093\/bioinformatics\/btu541","article-title":"Fast construction of FM-index for long sequence reads","volume":"30","author":"Li","year":"2014","journal-title":"Bioinformatics"},{"key":"2023020112312663000_btw266-B21","author":"Liu","year":"2014"},{"key":"2023020112312663000_btw266-B22","article-title":"Parallel and space-efficient construction of Burrows-Wheeler transform and suffix array for big genome data","author":"Liu","year":"2014","journal-title":"IEEE\/ACM Trans. Comput. \n              Biol. Bioinform\n              ., in press"},{"key":"2023020112312663000_btw266-B23","doi-asserted-by":"crossref","first-page":"281","DOI":"10.1089\/cmb.2009.0169","article-title":"Storage and retrieval of highly repetitive sequence collections","volume":"17","author":"Makinen","year":"2010","journal-title":"J. Comp. Biol"},{"key":"2023020112312663000_btw266-B24","doi-asserted-by":"crossref","first-page":"764","DOI":"10.1093\/bioinformatics\/btr011","article-title":"A fast, lock-free approach for efficient parallel counting of occurrences of k-mers","volume":"27","author":"Mar\u00e7ais","year":"2011","journal-title":"Bioinformatics"},{"key":"2023020112312663000_btw266-B25","doi-asserted-by":"crossref","first-page":"3476","DOI":"10.1093\/bioinformatics\/btu756","article-title":"SplitMEM: a graphical algorithm for pan-genome analysis with suffix skips","volume":"30","author":"Marcus","year":"2014","journal-title":"Bioinformatics"},{"key":"2023020112312663000_btw266-B26","doi-asserted-by":"crossref","DOI":"10.1145\/2518175","article-title":"Suffix array construction in external memory using d-critical substrings","volume":"32","author":"Nong","year":"2014","journal-title":"ACM Trans. Inform. Syst"},{"key":"2023020112312663000_btw266-B27","doi-asserted-by":"crossref","first-page":"248","DOI":"10.1089\/cmb.2015.0199","article-title":"Computational performance assessment of k-mer counting algorithms","volume":"23","author":"Perez","year":"2016","journal-title":"J. Comp. Biol"},{"key":"2023020112312663000_btw266-B28","doi-asserted-by":"crossref","first-page":"549","DOI":"10.1101\/gr.126953.111","article-title":"Efficient de novo assembly of large genomes using compressed data structures","volume":"22","author":"Simpson","year":"2012","journal-title":"Genome Res"},{"key":"2023020112312663000_btw266-B29","article-title":"A taxonomy of suffix array construction algorithms","volume":"39","author":"Smyth","year":"2007","journal-title":"ACM Comput. Surv"},{"key":"2023020112312663000_btw266-B30","doi-asserted-by":"crossref","first-page":"68","DOI":"10.1038\/nature15393","article-title":"A global reference for human genetic variation","volume":"526","author":"The 1000 Genomes Project Consortium","year":"2015","journal-title":"Nature"},{"key":"2023020112312663000_btw266-B31","doi-asserted-by":"crossref","first-page":"82","DOI":"10.1038\/nature14962","article-title":"The UK10K project identifies rare variants in health and disease","volume":"526","author":"The UK10K Consortium","year":"2015","journal-title":"Nature"},{"key":"2023020112312663000_btw266-B32","author":"Tomescu","year":"2016"},{"key":"2023020112312663000_btw266-B33","doi-asserted-by":"crossref","first-page":"36","DOI":"10.1038\/nrg3117","article-title":"Repetitive DNA and next-generation sequencing: computational challenges and solutions","volume":"13","author":"Treangen","year":"2012","journal-title":"Nat. Rev. Genet"},{"key":"2023020112312663000_btw266-B34","doi-asserted-by":"crossref","first-page":"108","DOI":"10.1186\/gb4165","article-title":"Illuminating the future of DNA sequencing","volume":"15","author":"Watson","year":"2014","journal-title":"Genome Biol"},{"key":"2023020112312663000_btw266-B35","doi-asserted-by":"crossref","first-page":"2669","DOI":"10.1093\/bioinformatics\/btt476","article-title":"The MaSuRCA genome assembler","volume":"29","author":"Zimin","year":"2013","journal-title":"Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/32\/12\/i174\/49021168\/bioinformatics_32_12_i174.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/32\/12\/i174\/49021168\/bioinformatics_32_12_i174.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,1]],"date-time":"2023-02-01T22:37:32Z","timestamp":1675291052000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/32\/12\/i174\/2288834"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016,6,11]]},"references-count":35,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2016,6,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btw266","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2016,6,15]]},"published":{"date-parts":[[2016,6,11]]}}}