{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,18]],"date-time":"2026-06-18T03:16:12Z","timestamp":1781752572942,"version":"3.54.5"},"reference-count":14,"publisher":"Oxford University Press (OUP)","issue":"14","license":[{"start":{"date-parts":[[2017,7,12]],"date-time":"2017-07-12T00:00:00Z","timestamp":1499817600000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100003977","name":"Israel Science Foundation","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100003977","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2017,7,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>The minimizers scheme is a method for selecting k-mers from sequences. It is used in many bioinformatics software tools to bin comparable sequences or to sample a sequence in a deterministic fashion at approximately regular intervals, in order to reduce memory consumption and processing time. Although very useful, the minimizers selection procedure has undesirable behaviors (e.g. too many k-mers are selected when processing certain sequences). Some of these problems were already known to the authors of the minimizers technique, and the natural lexicographic ordering of k-mers used by minimizers was recognized as their origin. Many software tools using minimizers employ ad hoc variations of the lexicographic order to alleviate those issues.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>We provide an in-depth analysis of the effect of k-mer ordering on the performance of the minimizers technique. By using small universal hitting sets (a recently defined concept), we show how to significantly improve the performance of minimizers and avoid some of its worse behaviors. Based on these results, we encourage bioinformatics software developers to use an ordering based on a universal hitting set or, if not possible, a randomized ordering, rather than the lexicographic order. This analysis also settles negatively a conjecture (by Schleimer et al.) on the expected density of minimizers in a random sequence.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and Implementation<\/jats:title>\n                    <jats:p>The software used for this analysis is available on GitHub: https:\/\/github.com\/gmarcais\/minimizers.git.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btx235","type":"journal-article","created":{"date-parts":[[2017,4,20]],"date-time":"2017-04-20T03:52:13Z","timestamp":1492660333000},"page":"i110-i117","source":"Crossref","is-referenced-by-count":79,"title":["Improving the performance of minimizers and winnowing schemes"],"prefix":"10.1093","volume":"33","author":[{"given":"Guillaume","family":"Mar\u00e7ais","sequence":"first","affiliation":[{"name":"Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"David","family":"Pellow","sequence":"additional","affiliation":[{"name":"Blavatnik School of Computer Science, Tel-Aviv University, Tel-Aviv, Israel"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Daniel","family":"Bork","sequence":"additional","affiliation":[{"name":"Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Yaron","family":"Orenstein","sequence":"additional","affiliation":[{"name":"Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Ron","family":"Shamir","sequence":"additional","affiliation":[{"name":"Blavatnik School of Computer Science, Tel-Aviv University, Tel-Aviv, Israel"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Carl","family":"Kingsford","sequence":"additional","affiliation":[{"name":"Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"286","published-online":{"date-parts":[[2017,7,12]]},"reference":[{"key":"2023051506472984000_btx235-B1","doi-asserted-by":"crossref","first-page":"336","DOI":"10.1089\/cmb.2014.0160","article-title":"On the representation of De Bruijn graphs","volume":"22","author":"Chikhi","year":"2015","journal-title":"J. Comput. Biol"},{"key":"2023051506472984000_btx235-B2","doi-asserted-by":"crossref","first-page":"i201","DOI":"10.1093\/bioinformatics\/btw279","article-title":"Compacting de Bruijn graphs from sequencing data quickly and in low memory","volume":"32","author":"Chikhi","year":"2016","journal-title":"Bioinformatics"},{"key":"2023051506472984000_btx235-B3","first-page":"758","article-title":"A combinatorial problem","volume":"49","author":"de Bruijn","year":"1946","journal-title":"Proceedings of the Section of Sciences of the Koninklijke Nederlandse Akademie Van Wetenschappen Te Amsterdam"},{"key":"2023051506472984000_btx235-B4","doi-asserted-by":"crossref","first-page":"1569","DOI":"10.1093\/bioinformatics\/btv022","article-title":"KMC 2: fast and resource-frugal k-mer counting","volume":"31","author":"Deorowicz","year":"2015","journal-title":"Bioinformatics"},{"key":"2023051506472984000_btx235-B5","volume-title":"String Processing and Information Retrieval: 22nd International Symposium","author":"Grabowski","year":"2015"},{"key":"2023051506472984000_btx235-B6","doi-asserted-by":"crossref","first-page":"2103","DOI":"10.1093\/bioinformatics\/btw152","article-title":"Minimap and miniasm: fast mapping and de novo assembly for noisy long sequences","volume":"32","author":"Li","year":"2016","journal-title":"Bioinformatics"},{"key":"2023051506472984000_btx235-B7","author":"Li","year":"2015"},{"key":"2023051506472984000_btx235-B8","doi-asserted-by":"crossref","first-page":"257","DOI":"10.1007\/978-3-319-43681-4_21","volume-title":"Algorithms in Bioinformatics","author":"Orenstein","year":"2016"},{"key":"2023051506472984000_btx235-B9","author":"Orenstein","year":"2016"},{"key":"2023051506472984000_btx235-B10","doi-asserted-by":"crossref","first-page":"734","DOI":"10.1089\/cmb.2004.11.734","article-title":"A preprocessor for shotgun assembly of large genomes","volume":"11","author":"Roberts","year":"2004","journal-title":"J. Comput. Biol"},{"key":"2023051506472984000_btx235-B11","doi-asserted-by":"crossref","first-page":"3363","DOI":"10.1093\/bioinformatics\/bth408","article-title":"Reducing storage requirements for biological sequence comparison","volume":"20","author":"Roberts","year":"2004","journal-title":"Bioinformatics"},{"key":"2023051506472984000_btx235-B12","author":"Schleimer","year":"2003"},{"key":"2023051506472984000_btx235-B13","doi-asserted-by":"crossref","first-page":"R46.","DOI":"10.1186\/gb-2014-15-3-r46","article-title":"Kraken: ultrafast metagenomic sequence classification using exact alignments","volume":"15","author":"Wood","year":"2014","journal-title":"Genome Biol"},{"key":"2023051506472984000_btx235-B14","doi-asserted-by":"crossref","first-page":"S1.","DOI":"10.1186\/1471-2105-13-S6-S1","article-title":"Exploiting sparseness in de novo genome assembly","volume":"13","author":"Ye","year":"2012","journal-title":"BMC Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/33\/14\/i110\/50314827\/bioinformatics_33_14_i110.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/33\/14\/i110\/50314827\/bioinformatics_33_14_i110.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,5,15]],"date-time":"2023-05-15T02:47:48Z","timestamp":1684118868000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/33\/14\/i110\/3953951"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017,7,12]]},"references-count":14,"journal-issue":{"issue":"14","published-print":{"date-parts":[[2017,7,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btx235","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/104075","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2017,7,15]]},"published":{"date-parts":[[2017,7,12]]}}}