{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,6]],"date-time":"2025-10-06T09:27:10Z","timestamp":1759742830458},"reference-count":19,"publisher":"Oxford University Press (OUP)","issue":"18","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2016,9,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: A massive number of bioinformatics applications require counting of k-length substrings in genetically important long strings. A k-mer counter generates the frequencies of each k-length substring in genome sequences. Genome assembly, repeat detection, multiple sequence alignment, error detection and many other related applications use a k-mer counter as a building block. Very fast and efficient algorithms are necessary to count k-mers in large data sets to be useful in such applications.<\/jats:p>\n               <jats:p>Results: We propose a novel trie-based algorithm for this k-mer counting problem. We compare our devised algorithm k-mer Counter based on Multiple Burst Trees (KCMBT) with available all well-known algorithms. Our experimental results show that KCMBT is around 30% faster than the previous best-performing algorithm KMC2 for human genome dataset. As another example, our algorithm is around six times faster than Jellyfish2. Overall, KCMBT is 20\u201330% faster than KMC2 on five benchmark data sets when both the algorithms were run using multiple threads.<\/jats:p>\n               <jats:p>Availability and Implementation: KCMBT is freely available on GitHub: (https:\/\/github.com\/abdullah009\/kcmbt_mt).<\/jats:p>\n               <jats:p>Contact: \u00a0rajasek@engr.uconn.edu<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btw345","type":"journal-article","created":{"date-parts":[[2016,6,10]],"date-time":"2016-06-10T00:56:16Z","timestamp":1465520176000},"page":"2783-2790","source":"Crossref","is-referenced-by-count":24,"title":["KCMBT: a <i>k<\/i>-mer Counter based on Multiple Burst Trees"],"prefix":"10.1093","volume":"32","author":[{"given":"Abdullah-Al","family":"Mamun","sequence":"first","affiliation":[{"name":"Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269, USA"}]},{"given":"Soumitra","family":"Pal","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269, USA"}]},{"given":"Sanguthevar","family":"Rajasekaran","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269, USA"}]}],"member":"286","published-online":{"date-parts":[[2016,6,9]]},"reference":[{"key":"2023020113385497800_btw345-B1","doi-asserted-by":"crossref","first-page":"2070","DOI":"10.1093\/bioinformatics\/btu152","article-title":"KAnalyze: a fast versatile pipelined k-mer toolkit","volume":"30","author":"Audano","year":"2014","journal-title":"Bioinformatics"},{"key":"2023020113385497800_btw345-B2","doi-asserted-by":"crossref","first-page":"422","DOI":"10.1145\/362686.362692","article-title":"Space\/time trade-offs in hash coding with allowable errors","volume":"13","author":"Bloom","year":"1970","journal-title":"Commun. ACM"},{"key":"2023020113385497800_btw345-B3","doi-asserted-by":"crossref","first-page":"160","DOI":"10.1186\/1471-2105-14-160","article-title":"Disk-based k-mer counting on a PC","volume":"14","author":"Deorowicz","year":"2013","journal-title":"BMC Bioinformatics"},{"key":"2023020113385497800_btw345-B4","doi-asserted-by":"crossref","first-page":"1569","DOI":"10.1093\/bioinformatics\/btv022","article-title":"KMC 2: fast and resource-frugal k-mer counting","volume":"31","author":"Deorowicz","year":"2015","journal-title":"Bioinformatics"},{"key":"2023020113385497800_btw345-B5","doi-asserted-by":"crossref","first-page":"192","DOI":"10.1145\/506309.506312","article-title":"Burst tries: a fast, efficient data structure for string keys","volume":"20","author":"Heinz","year":"2002","journal-title":"ACM Trans. Inf. Syst. (TOIS)"},{"key":"2023020113385497800_btw345-B6","doi-asserted-by":"crossref","first-page":"91","DOI":"10.1101\/gr.828403","article-title":"Whole-genome sequence assembly for mammalian genomes: Arachne 2","volume":"13","author":"Jaffe","year":"2003","journal-title":"Genome Res"},{"key":"2023020113385497800_btw345-B7","doi-asserted-by":"crossref","first-page":"R116","DOI":"10.1186\/gb-2010-11-11-r116","article-title":"Quake: quality-aware detection and correction of sequencing errors","volume":"11","author":"Kelley","year":"2010","journal-title":"Genome Biol"},{"key":"2023020113385497800_btw345-B8","doi-asserted-by":"crossref","first-page":"517","DOI":"10.1186\/1471-2164-9-517","article-title":"A new method to compute k-mer frequencies and its application to annotate large repetitive plant genomes","volume":"9","author":"Kurtz","year":"2008","journal-title":"BMC Genomics"},{"key":"2023020113385497800_btw345-B9","author":"Li","year":"2015"},{"key":"2023020113385497800_btw345-B10","doi-asserted-by":"crossref","first-page":"308","DOI":"10.1093\/bioinformatics\/bts690","article-title":"Musket: a multistage k-mer spectrum-based error corrector for illumina sequence data","volume":"29","author":"Liu","year":"2013","journal-title":"Bioinformatics"},{"key":"2023020113385497800_btw345-B11","doi-asserted-by":"crossref","first-page":"764","DOI":"10.1093\/bioinformatics\/btr011","article-title":"A fast, lock-free approach for efficient parallel counting of occurrences of k-mers","volume":"27","author":"Mar\u00e7ais","year":"2011","journal-title":"Bioinformatics"},{"key":"2023020113385497800_btw345-B12","doi-asserted-by":"crossref","first-page":"i137","DOI":"10.1093\/bioinformatics\/btr208","article-title":"Error correction of high-throughput sequencing datasets with non-uniform coverage","volume":"27","author":"Medvedev","year":"2011","journal-title":"Bioinformatics"},{"key":"2023020113385497800_btw345-B13","doi-asserted-by":"crossref","first-page":"333","DOI":"10.1186\/1471-2105-12-333","article-title":"Efficient counting of k-mers in DNA sequences using a bloom filter","volume":"12","author":"Melsted","year":"2011","journal-title":"BMC Bioinformatics"},{"key":"2023020113385497800_btw345-B14","doi-asserted-by":"crossref","first-page":"2818","DOI":"10.1093\/bioinformatics\/btn548","article-title":"Aggressive assembly of pyrosequencing reads with mates","volume":"24","author":"Miller","year":"2008","journal-title":"Bioinformatics"},{"key":"2023020113385497800_btw345-B15","doi-asserted-by":"crossref","first-page":"9748","DOI":"10.1073\/pnas.171285098","article-title":"An Eulerian path approach to DNA fragment assembly","volume":"98","author":"Pevzner","year":"2001","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023020113385497800_btw345-B16","doi-asserted-by":"crossref","first-page":"652","DOI":"10.1093\/bioinformatics\/btt020","article-title":"DSK k-mer counting with very low memory usage","volume":"29","author":"Rizk","year":"2013","journal-title":"Bioinformatics"},{"key":"2023020113385497800_btw345-B17","doi-asserted-by":"crossref","first-page":"1950","DOI":"10.1093\/bioinformatics\/btu132","article-title":"Turtle: identifying frequent k-mers with cache-efficient algorithms","volume":"30","author":"Roy","year":"2014","journal-title":"Bioinformatics"},{"key":"2023020113385497800_btw345-B18","first-page":"1","article-title":"Cache-conscious sorting of large sets of strings with dynamic tries","volume":"9","author":"Sinha","year":"2004","journal-title":"J. Exp. Algorithmics (JEA)"},{"key":"2023020113385497800_btw345-B19","doi-asserted-by":"crossref","first-page":"821","DOI":"10.1101\/gr.074492.107","article-title":"Velvet: algorithms for de novo short read assembly using de Bruijn graphs","volume":"18","author":"Zerbino","year":"2008","journal-title":"Genome Res"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/32\/18\/2783\/49020598\/bioinformatics_32_18_2783.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/32\/18\/2783\/49020598\/bioinformatics_32_18_2783.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,1]],"date-time":"2023-02-01T23:43:31Z","timestamp":1675295011000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/32\/18\/2783\/1744340"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016,6,9]]},"references-count":19,"journal-issue":{"issue":"18","published-print":{"date-parts":[[2016,9,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btw345","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2016,9,15]]},"published":{"date-parts":[[2016,6,9]]}}}