{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,2]],"date-time":"2026-04-02T06:22:57Z","timestamp":1775110977127,"version":"3.50.1"},"reference-count":48,"publisher":"Oxford University Press (OUP)","issue":"4","license":[{"start":{"date-parts":[[2017,10,9]],"date-time":"2017-10-09T00:00:00Z","timestamp":1507507200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/about_us\/legal\/notices"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["BBSRC-NSF\/BIO-1564917, IIS-1247726, IIS-1251137, CNS-1408695, CCF-1439084, and CCF-1617618"],"award-info":[{"award-number":["BBSRC-NSF\/BIO-1564917, IIS-1247726, IIS-1251137, CNS-1408695, CCF-1439084, and CCF-1617618"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100006234","name":"Sandia National Laboratories","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100006234","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2018,2,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>k-mer-based algorithms have become increasingly popular in the processing of high-throughput sequencing data. These algorithms span the gamut of the analysis pipeline from k-mer counting (e.g. for estimating assembly parameters), to error correction, genome and transcriptome assembly, and even transcript quantification. Yet, these tasks often use very different k-mer representations and data structures. In this article, we show how to build a k-mer-counting and multiset-representation system using the counting quotient filter, a feature-rich approximate membership query data structure. We introduce the k-mer-counting\/querying system Squeakr (Simple Quotient filter-based Exact and Approximate Kmer Representation), which is based on the counting quotient filter. This off-the-shelf data structure turns out to be an efficient (approximate or exact) representation for sets or multisets of k-mers.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>Squeakr takes 2\u00d7\u20134.3\u00d7 less time than the state-of-the-art to count and perform a random-point-query workload. Squeakr is memory-efficient, consuming 1.5\u00d7\u20134.3\u00d7 less memory than the state-of-the-art. It offers competitive counting performance. In fact, it is faster for larger k-mers, and answers point queries (i.e. queries for the abundance of a particular k-mer) over an order-of-magnitude faster than other systems. The Squeakr representation of the k-mer multiset turns out to be immediately useful for downstream processing (e.g. de Bruijn graph traversal) because it supports fast queries and dynamic k-mer insertion, deletion, and modification.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>https:\/\/github.com\/splatlab\/squeakr available under BSD 3-Clause License.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btx636","type":"journal-article","created":{"date-parts":[[2017,10,6]],"date-time":"2017-10-06T15:12:01Z","timestamp":1507302721000},"page":"568-575","source":"Crossref","is-referenced-by-count":75,"title":["Squeakr: an exact and approximate\n                    <i>k<\/i>\n                    -mer counting system"],"prefix":"10.1093","volume":"34","author":[{"given":"Prashant","family":"Pandey","sequence":"first","affiliation":[{"name":"Department of Computer Science, Stony Brook University, Stony Brook, NY, USA"}]},{"given":"Michael A","family":"Bender","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Stony Brook University, Stony Brook, NY, USA"}]},{"given":"Rob","family":"Johnson","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Stony Brook University, Stony Brook, NY, USA"},{"name":"VMware Research, Palo Alto, CA, USA"}]},{"given":"Rob","family":"Patro","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Stony Brook University, Stony Brook, NY, USA"}]}],"member":"286","published-online":{"date-parts":[[2017,10,9]]},"reference":[{"key":"2023012712332304100_btx636-B1","doi-asserted-by":"crossref","first-page":"255","DOI":"10.1016\/j.ipl.2006.10.007","article-title":"Scalable Bloom filters","volume":"101","author":"Almeida","year":"2007","journal-title":"J. Inform. Proc. Lett"},{"key":"2023012712332304100_btx636-B2","author":"Appleby","year":"2016"},{"key":"2023012712332304100_btx636-B3","doi-asserted-by":"crossref","first-page":"455","DOI":"10.1089\/cmb.2012.0021","article-title":"SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing","volume":"19","author":"Bankevich","year":"2012","journal-title":"J. Comput. Biol"},{"key":"2023012712332304100_btx636-B4","doi-asserted-by":"crossref","first-page":"1627","DOI":"10.14778\/2350229.2350275","article-title":"Don\u2019t thrash: how to cache your hash on flash","volume":"5","author":"Bender","year":"2012","journal-title":"Proc. VLDB Endowment"},{"key":"2023012712332304100_btx636-B5","doi-asserted-by":"crossref","first-page":"623","DOI":"10.1038\/nbt.3238","article-title":"Assembling large genomes with single-molecule sequencing and locality-sensitive hashing","volume":"33","author":"Berlin","year":"2015","journal-title":"Nat. Biotechnol"},{"key":"2023012712332304100_btx636-B6","doi-asserted-by":"crossref","first-page":"422","DOI":"10.1145\/362686.362692","article-title":"Spacetime trade-offs in hash coding with allowable errors","volume":"13","author":"Bloom","year":"1970","journal-title":"Commun. ACM"},{"key":"2023012712332304100_btx636-B7","first-page":"684","volume-title":"14th Annual European Symposium on Algorithms, LNCS 4168","author":"Bonomi","year":"2006"},{"key":"2023012712332304100_btx636-B8","author":"Brown","year":"2012"},{"key":"2023012712332304100_btx636-B9","first-page":"1","volume-title":"Proceedings of the International Workshop on Accelerating Analytics and Data Management Systems Using Modern Processor and Storage Architectures (ADMS)","author":"Canim","year":"2010"},{"key":"2023012712332304100_btx636-B10","first-page":"1710","author":"Carvalho","year":"2016"},{"key":"2023012712332304100_btx636-B11","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/1748-7188-8-22","article-title":"Space-efficient and exact de Bruijn graph representation based on a Bloom filter","volume":"8","author":"Chikhi","year":"2013","journal-title":"Algorithms Mol. Biol"},{"key":"2023012712332304100_btx636-B12","doi-asserted-by":"crossref","first-page":"58","DOI":"10.1016\/j.jalgor.2003.12.001","article-title":"An improved data stream summary: the count-min sketch and its applications","volume":"55","author":"Cormode","year":"2005","journal-title":"J. Algorithms"},{"key":"2023012712332304100_btx636-B13","author":"Danek","year":"2016"},{"key":"2023012712332304100_btx636-B14","first-page":"635","volume-title":"Proceedings of the 31st International Conference on Distributed Computing Systems (ICDCS)","author":"Debnath","year":"2011"},{"key":"2023012712332304100_btx636-B15","doi-asserted-by":"crossref","first-page":"1569","DOI":"10.1093\/bioinformatics\/btv022","article-title":"Kmc 2: Fast and resource-frugal k-mer counting","volume":"31","author":"Deorowicz","year":"2015","journal-title":"Bioinformatics"},{"key":"2023012712332304100_btx636-B16","doi-asserted-by":"crossref","first-page":"281","DOI":"10.1109\/90.851975","article-title":"Summary cache: a scalable wide-area web cache sharing protocol","volume":"8","author":"Fan","year":"2000","journal-title":"IEEE\/ACM T. Netw"},{"key":"2023012712332304100_btx636-B17","doi-asserted-by":"crossref","first-page":"644","DOI":"10.1038\/nbt.1883","article-title":"Full-length transcriptome assembly from RNA-Seq data without a reference genome","volume":"29","author":"Grabherr","year":"2011","journal-title":"Nat. Biotechnol"},{"key":"2023012712332304100_btx636-B18","first-page":"1354","author":"Heo","year":"2014"},{"key":"2023012712332304100_btx636-B19","author":"Koren","year":"2016"},{"key":"2023012712332304100_btx636-B20","author":"Li","year":"2016"},{"key":"2023012712332304100_btx636-B21","doi-asserted-by":"crossref","first-page":"308","DOI":"10.1093\/bioinformatics\/bts690","article-title":"Musket: a multistage k-mer spectrum-based error corrector for illumina sequence data","volume":"29","author":"Liu","year":"2013","journal-title":"Bioinformatics"},{"key":"2023012712332304100_btx636-B22","first-page":"1","volume-title":"Proceedings of the 27th Symposium on Mass Storage Systems and Technologies (MSST)","author":"Lu","year":"2011"},{"key":"2023012712332304100_btx636-B23","doi-asserted-by":"crossref","first-page":"764","DOI":"10.1093\/bioinformatics\/btr011","article-title":"A fast, lock-free approach for efficient parallel counting of occurrences of k-mers","volume":"27","author":"Mar\u00e7ais","year":"2011","journal-title":"Bioinformatics"},{"key":"2023012712332304100_btx636-B24","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/1471-2105-12-333","article-title":"Efficient counting of k-mers in DNA sequences using a bloom filter","volume":"12","author":"Melsted","year":"2011","journal-title":"BMC Bioinformatics"},{"key":"2023012712332304100_btx636-B25","author":"Mohamadi","year":"2017"},{"key":"2023012712332304100_btx636-B26","first-page":"075481","author":"Murray","year":"2016"},{"key":"2023012712332304100_btx636-B27","doi-asserted-by":"crossref","first-page":"132","DOI":"10.1186\/s13059-016-0997-x","article-title":"Mash: fast genome and metagenome distance estimation using minhash","volume":"17","author":"Ondov","year":"2016","journal-title":"Genome Biol"},{"key":"2023012712332304100_btx636-B28","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s12864-015-1419-2","article-title":"Clark: fast and accurate classification of metagenomic and genomic sequences using discriminative k-mers","volume":"16","author":"Ounit","year":"2015","journal-title":"BMC Genomics"},{"key":"2023012712332304100_btx636-B29","first-page":"775","author":"Pandey","year":"2017"},{"key":"2023012712332304100_btx636-B30","doi-asserted-by":"crossref","first-page":"462","DOI":"10.1038\/nbt.2862","article-title":"Sailfish enables alignment-free isoform quantification from RNA-Seq reads using lightweight algorithms","volume":"32","author":"Patro","year":"2014","journal-title":"Nat. Biotechnol"},{"key":"2023012712332304100_btx636-B31","doi-asserted-by":"crossref","first-page":"13272","DOI":"10.1073\/pnas.1121464109","article-title":"Scaling metagenome sequence assembly with probabilistic de Bruijn graphs","volume":"109","author":"Pell","year":"2012","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023012712332304100_btx636-B32","doi-asserted-by":"crossref","first-page":"9748","DOI":"10.1073\/pnas.171285098","article-title":"An Eulerian path approach to DNA fragment assembly","volume":"98","author":"Pevzner","year":"2001","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023012712332304100_btx636-B33","first-page":"108","volume-title":"Proceedings 6th International Conference on Experimental Algorithms","author":"Putze","year":"2007"},{"key":"2023012712332304100_btx636-B34","doi-asserted-by":"crossref","first-page":"93","DOI":"10.1109\/TPDS.2013.46","article-title":"Fast Bloom filters and their generalization","volume":"25","author":"Qiao","year":"2014","journal-title":"IEEE Trans. Parallel Distributed Syst"},{"key":"2023012712332304100_btx636-B35","first-page":"652","author":"Rizk","year":"2013"},{"key":"2023012712332304100_btx636-B36","doi-asserted-by":"crossref","first-page":"3363","DOI":"10.1093\/bioinformatics\/bth408","article-title":"Reducing storage requirements for biological sequence comparison","volume":"20","author":"Roberts","year":"2004","journal-title":"Bioinformatics"},{"key":"2023012712332304100_btx636-B37","first-page":"1950","author":"Roy","year":"2014"},{"key":"2023012712332304100_btx636-B38","first-page":"3506","author":"Salmela","year":"2014"},{"key":"2023012712332304100_btx636-B39","first-page":"799","author":"Salmela","year":"2016"},{"key":"2023012712332304100_btx636-B40","doi-asserted-by":"crossref","first-page":"1086","DOI":"10.1093\/bioinformatics\/bts094","article-title":"Oases: robust de novo RNA-Seq assembly across the dynamic range of expression levels","volume":"28","author":"Schulz","year":"2012","journal-title":"Bioinformatics"},{"key":"2023012712332304100_btx636-B41","doi-asserted-by":"crossref","first-page":"1117","DOI":"10.1101\/gr.089532.108","article-title":"Abyss: a parallel assembler for short read sequence data","volume":"19","author":"Simpson","year":"2009","journal-title":"Genome Res"},{"key":"2023012712332304100_btx636-B42","first-page":"300","author":"Solomon","year":"2016"},{"key":"2023012712332304100_btx636-B43","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s13059-014-0509-9","article-title":"Lighter: fast and memory-efficient sequencing error correction without counting","volume":"15","author":"Song","year":"2014","journal-title":"Genome Biol"},{"key":"2023012712332304100_btx636-B44","doi-asserted-by":"crossref","first-page":"513","DOI":"10.1093\/bioinformatics\/btg005","article-title":"Alignment-free sequence comparisona review","volume":"19","author":"Vinga","year":"2003","journal-title":"Bioinformatics"},{"key":"2023012712332304100_btx636-B45","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/gb-2014-15-3-r46","article-title":"Kraken: ultrafast metagenomic sequence classification using exact alignments","volume":"15","author":"Wood","year":"2014","journal-title":"Genome Biol"},{"key":"2023012712332304100_btx636-B46","doi-asserted-by":"crossref","first-page":"821","DOI":"10.1101\/gr.074492.107","article-title":"Velvet: algorithms for de novo short read assembly using de Bruijn graphs","volume":"18","author":"Zerbino","year":"2008","journal-title":"Genome Res"},{"key":"2023012712332304100_btx636-B47","doi-asserted-by":"crossref","first-page":"e101271","DOI":"10.1371\/journal.pone.0101271","article-title":"These are not the k-mers you are looking for: efficient online k-mer counting using a probabilistic data structure","volume":"9","author":"Zhang","year":"2014","journal-title":"PLoS One"},{"key":"2023012712332304100_btx636-B48","doi-asserted-by":"crossref","first-page":"i283","DOI":"10.1093\/bioinformatics\/btu288","article-title":"RNA-Skim: a rapid method for RNA-Seq quantification at transcript level","volume":"30","author":"Zhang","year":"2014","journal-title":"Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/34\/4\/568\/48913860\/bioinformatics_34_4_568.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/34\/4\/568\/48913860\/bioinformatics_34_4_568.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,27]],"date-time":"2023-01-27T08:21:41Z","timestamp":1674807701000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/34\/4\/568\/4386917"}},"subtitle":[],"editor":[{"given":"Bonnie","family":"Berger","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2017,10,9]]},"references-count":48,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2018,2,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btx636","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/122077","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2018,2,15]]},"published":{"date-parts":[[2017,10,9]]}}}