{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,4]],"date-time":"2025-09-04T14:19:27Z","timestamp":1756995567583,"version":"3.37.3"},"reference-count":23,"publisher":"Oxford University Press (OUP)","issue":"23","license":[{"start":{"date-parts":[[2019,4,30]],"date-time":"2019-04-30T00:00:00Z","timestamp":1556582400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/501100001809","name":"Natural Science Foundation of China","doi-asserted-by":"publisher","award":["31501070","81671831","61762008","81702482"],"award-info":[{"award-number":["31501070","81671831","61762008","81702482"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Natural Science Foundation of Hubei","award":["2017CFB137","2016GXNSFCA380006","2018GXNSFAA281275"],"award-info":[{"award-number":["2017CFB137","2016GXNSFCA380006","2018GXNSFAA281275"]}]},{"name":"Scientific Research Found of Guangxi University","award":["XGZ150316"],"award-info":[{"award-number":["XGZ150316"]}]},{"name":"and Taihe Hospital","award":["2016JZ11"],"award-info":[{"award-number":["2016JZ11"]}]},{"name":"Kwan Im Thong Hood Cho Temple chair professorship"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2019,12,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>K-mers along with their frequency have served as an elementary building block for error correction, repeat detection, multiple sequence alignment, genome assembly, etc., attracting intensive studies in k-mer counting. However, the output of k-mer counters itself is large; very often, it is too large to fit into main memory, leading to highly narrowed usability.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>We introduce a novel idea of encoding k-mers as well as their frequency, achieving good memory saving and retrieval efficiency. Specifically, we propose a Bloom filter-like data structure to encode counted k-mers by coupled-bit arrays\u2014one for k-mer representation and the other for frequency encoding. Experiments on five real datasets show that the average memory-saving ratio on all 31-mers is as high as 13.81 as compared with raw input, with 7 hash functions. At the same time, the retrieval time complexity is well controlled (effectively constant), and the false-positive rate is decreased by two orders of magnitude.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>The source codes of our algorithm are available at github.com\/lzhLab\/kmcEx.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btz299","type":"journal-article","created":{"date-parts":[[2019,4,19]],"date-time":"2019-04-19T11:09:40Z","timestamp":1555672180000},"page":"4871-4878","source":"Crossref","is-referenced-by-count":4,"title":["kmcEx: memory-frugal and retrieval-efficient encoding of counted <i>k<\/i>-mers"],"prefix":"10.1093","volume":"35","author":[{"given":"Peng","family":"Jiang","sequence":"first","affiliation":[{"name":"Precision Medicine Research Center, Taihe Hospital, Hubei University of Medicine , Shiyan, Hubei, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jie","family":"Luo","sequence":"additional","affiliation":[{"name":"Precision Medicine Research Center, Taihe Hospital, Hubei University of Medicine , Shiyan, Hubei, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yiqi","family":"Wang","sequence":"additional","affiliation":[{"name":"Precision Medicine Research Center, Taihe Hospital, Hubei University of Medicine , Shiyan, Hubei, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Pingji","family":"Deng","sequence":"additional","affiliation":[{"name":"Precision Medicine Research Center, Taihe Hospital, Hubei University of Medicine , Shiyan, Hubei, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Bertil","family":"Schmidt","sequence":"additional","affiliation":[{"name":"Institute of Computer Science, Johannes Gutenberg University Mainz , Mainz Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xiangjun","family":"Tang","sequence":"additional","affiliation":[{"name":"Precision Medicine Research Center, Taihe Hospital, Hubei University of Medicine , Shiyan, Hubei, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ningjiang","family":"Chen","sequence":"additional","affiliation":[{"name":"School of Computing and Electronic Information, Guangxi University , Nanning, Guangxi, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Limsoon","family":"Wong","sequence":"additional","affiliation":[{"name":"School of Computing, National University of Singapore , Singapore, Singapore"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Liang","family":"Zhao","sequence":"additional","affiliation":[{"name":"Precision Medicine Research Center, Taihe Hospital, Hubei University of Medicine , Shiyan, Hubei, China"},{"name":"School of Computing and Electronic Information, Guangxi University , Nanning, Guangxi, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2019,4,30]]},"reference":[{"key":"2023013108304438800_btz299-B1","doi-asserted-by":"crossref","first-page":"422","DOI":"10.1145\/362686.362692","article-title":"Space\/time trade-offs in hash coding with allowable errors","volume":"13","author":"Bloom","year":"1970","journal-title":"Commun. ACM"},{"key":"2023013108304438800_btz299-B2","doi-asserted-by":"crossref","first-page":"225","DOI":"10.1007\/978-3-642-33122-0_18","volume-title":"Algorithms in Bioinformatics","author":"Bowe","year":"2012"},{"key":"2023013108304438800_btz299-B3","doi-asserted-by":"crossref","first-page":"22.","DOI":"10.1186\/1748-7188-8-22","article-title":"Space-efficient and exact de Bruijn graph representation based on a Bloom filter","volume":"8","author":"Chikhi","year":"2013","journal-title":"Algorithm. Mol. Biol"},{"volume-title":"Research in Computational Molecular Biology","year":"2014","author":"Chikhi","key":"2023013108304438800_btz299-B4"},{"key":"2023013108304438800_btz299-B5","doi-asserted-by":"crossref","first-page":"987","DOI":"10.1038\/nbt.2023","article-title":"How to apply de Bruijn graphs to genome assembly","volume":"29","author":"Compeau","year":"2011","journal-title":"Nat. Biotechnol"},{"key":"2023013108304438800_btz299-B6","doi-asserted-by":"crossref","first-page":"900.","DOI":"10.12688\/f1000research.6924.1","article-title":"The khmer software package: enabling efficient nucleotide sequence analysis","volume":"4","author":"Crusoe","year":"2015","journal-title":"F1000Research"},{"key":"2023013108304438800_btz299-B7","doi-asserted-by":"crossref","first-page":"56.","DOI":"10.3390\/info7040056","article-title":"A survey on data compression methods for biological sequences","volume":"7","author":"Hosseini","year":"2016","journal-title":"Information"},{"key":"2023013108304438800_btz299-B8","doi-asserted-by":"crossref","first-page":"e83","DOI":"10.1093\/nar\/gky315","article-title":"MeShClust: an intelligent tool for clustering DNA sequences","volume":"46","author":"James","year":"2018","journal-title":"Nucleic Acids Res"},{"key":"2023013108304438800_btz299-B9","doi-asserted-by":"crossref","first-page":"2759","DOI":"10.1093\/bioinformatics\/btx304","article-title":"KMC 3: counting and manipulating k-mer statistics","volume":"33","author":"Kokot","year":"2017","journal-title":"Bioinformatics"},{"key":"2023013108304438800_btz299-B10","doi-asserted-by":"crossref","first-page":"2783","DOI":"10.1093\/bioinformatics\/btw345","article-title":"KCMBT: a k-mer counter based on multiple burst trees","volume":"32","author":"Mamun","year":"2016","journal-title":"Bioinformatics"},{"key":"2023013108304438800_btz299-B11","doi-asserted-by":"crossref","first-page":"764","DOI":"10.1093\/bioinformatics\/btr011","article-title":"A fast, lock-free approach for efficient parallel counting of occurrences of k-mers","volume":"27","author":"Mar\u00e7ais","year":"2011","journal-title":"Bioinformatics"},{"key":"2023013108304438800_btz299-B12","doi-asserted-by":"crossref","first-page":"S2","DOI":"10.1038\/nmeth.f.268","article-title":"Next-generation gap","volume":"6","author":"McPherson","year":"2009","journal-title":"Nat. Methods"},{"key":"2023013108304438800_btz299-B13","doi-asserted-by":"crossref","first-page":"333","DOI":"10.1186\/1471-2105-12-333","article-title":"Efficient counting of k-mers in DNA sequences using a bloom filter","volume":"12","author":"Melsted","year":"2011","journal-title":"BMC Bioinformatics"},{"key":"2023013108304438800_btz299-B14","doi-asserted-by":"crossref","first-page":"i133","DOI":"10.1093\/bioinformatics\/btx261","article-title":"deBGR: an efficient and near-exact representation of the weighted de Bruijn graph","volume":"33","author":"Pandey","year":"2017","journal-title":"Bioinformatics"},{"key":"2023013108304438800_btz299-B15","doi-asserted-by":"crossref","first-page":"13272","DOI":"10.1073\/pnas.1121464109","article-title":"Scaling metagenome sequence assembly with probabilistic de Bruijn graphs","volume":"109","author":"Pell","year":"2012","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023013108304438800_btz299-B16","doi-asserted-by":"crossref","first-page":"547","DOI":"10.1089\/cmb.2016.0155","article-title":"Improving Bloom filter performance on sequence data using k-mer Bloom filters","volume":"24","author":"Pellow","year":"2017","journal-title":"J. Comput. Biol"},{"key":"2023013108304438800_btz299-B17","doi-asserted-by":"crossref","first-page":"652.","DOI":"10.1093\/bioinformatics\/btt020","article-title":"DSK: k-mer counting with very low memory usage","volume":"29","author":"Rizk","year":"2013","journal-title":"Bioinformatics"},{"key":"2023013108304438800_btz299-B18","doi-asserted-by":"crossref","first-page":"2.","DOI":"10.1186\/1748-7188-9-2","article-title":"Using cascading Bloom filters to improve the memory usage for de Bruijn graphs","volume":"9","author":"Salikhov","year":"2014","journal-title":"Algorithm. Mol. Biol"},{"key":"2023013108304438800_btz299-B19","doi-asserted-by":"crossref","first-page":"557","DOI":"10.1101\/gr.131383.111","article-title":"GAGE: a critical evaluation of genome assemblies and assembly algorithms","volume":"22","author":"Salzberg","year":"2012","journal-title":"Genome Res"},{"key":"2023013108304438800_btz299-B20","doi-asserted-by":"crossref","first-page":"491","DOI":"10.1038\/s41576-018-0016-z","article-title":"From genome-wide associations to candidate causal variants by statistical fine-mapping","volume":"19","author":"Schaid","year":"2018","journal-title":"Nat. Rev. Genet"},{"key":"2023013108304438800_btz299-B21","doi-asserted-by":"crossref","first-page":"952","DOI":"10.1021\/ci600526a","article-title":"Mathematical correction for fingerprint similarity measures to improve chemical retrieval","volume":"47","author":"Swamidass","year":"2007","journal-title":"J. Chem. Inform. Model"},{"key":"2023013108304438800_btz299-B22","doi-asserted-by":"crossref","first-page":"821","DOI":"10.1101\/gr.074492.107","article-title":"Velvet: algorithms for de novo short read assembly using de Bruijn graphs","volume":"18","author":"Zerbino","year":"2008","journal-title":"Genome Res"},{"key":"2023013108304438800_btz299-B23","doi-asserted-by":"crossref","first-page":"3844","DOI":"10.1093\/bioinformatics\/btx089","article-title":"MapReduce for accurate error correction of next-generation sequencing data","volume":"33","author":"Zhao","year":"2017","journal-title":"Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btz299\/28683277\/btz299.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/35\/23\/4871\/48978235\/bioinformatics_35_23_4871.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/35\/23\/4871\/48978235\/bioinformatics_35_23_4871.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,31]],"date-time":"2023-01-31T17:35:36Z","timestamp":1675186536000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/35\/23\/4871\/5481953"}},"subtitle":[],"editor":[{"given":"Bonnie","family":"Berger","sequence":"additional","affiliation":[],"role":[{"role":"editor","vocabulary":"crossref"}]}],"short-title":[],"issued":{"date-parts":[[2019,4,30]]},"references-count":23,"journal-issue":{"issue":"23","published-print":{"date-parts":[[2019,12,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btz299","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"type":"print","value":"1367-4803"},{"type":"electronic","value":"1367-4811"}],"subject":[],"published-other":{"date-parts":[[2019,12,1]]},"published":{"date-parts":[[2019,4,30]]}}}