{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,24]],"date-time":"2026-03-24T00:36:34Z","timestamp":1774312594629,"version":"3.50.1"},"reference-count":32,"publisher":"Oxford University Press (OUP)","issue":"Supplement_1","license":[{"start":{"date-parts":[[2025,7,15]],"date-time":"2025-07-15T00:00:00Z","timestamp":1752537600000},"content-version":"vor","delay-in-days":14,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100003977","name":"Israel Science Foundation","doi-asserted-by":"publisher","award":["810\/21"],"award-info":[{"award-number":["810\/21"]}],"id":[{"id":"10.13039\/501100003977","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Israel Data Science and AI Initiative"},{"DOI":"10.13039\/100010663","name":"ERC","doi-asserted-by":"publisher","award":["683064"],"award-info":[{"award-number":["683064"]}],"id":[{"id":"10.13039\/100010663","id-type":"DOI","asserted-by":"publisher"}]},{"name":"EU\u2019s Horizon 2020 Research and Innovation Programme"},{"name":"Center for Absorption in Science of the Ministry of Aliyah and Immigration"},{"name":"United States\u2013Israel Binational Science Foundation","award":["2020297"],"award-info":[{"award-number":["2020297"]}]},{"DOI":"10.13039\/501100003977","name":"Israel Science Foundation","doi-asserted-by":"publisher","award":["358\/21"],"award-info":[{"award-number":["358\/21"]}],"id":[{"id":"10.13039\/501100003977","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,7,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Minimizers are the most popular k-mer selection scheme in algorithms and data structures analyzing high-throughput sequencing (HTS) data. In a minimizer scheme, the smallest k-mer by some predefined order is selected as the representative of a sequence window containing w consecutive k-mers, which results in overlapping windows often selecting the same k-mer. Minimizers that achieve the lowest frequency of selected k-mers over a random DNA sequence, termed the expected density, are desired for improved performance of HTS analyses. Yet, no method to date exists to generate minimizers that achieve minimum expected density. Moreover, for k and w values used by common HTS algorithms and data structures, there is a gap between densities achieved by existing selection schemes and the theoretical lower bound.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>We developed GreedyMini, a toolkit of methods to generate minimizers with low expected or particular density, to improve minimizers, to extend minimizers to larger alphabets, k, and w, and to measure the expected density of a given minimizer efficiently. We demonstrate over various combinations of k and w values, including those of popular HTS methods, that GreedyMini can generate DNA minimizers that achieve expected densities very close to the lower bound, and both expected and particular densities much lower compared to existing selection schemes. Moreover, we show that GreedyMini\u2019s k-mer rank-retrieval time is comparable to common k-mer hash functions. We expect GreedyMini to improve the performance of many HTS algorithms and data structures and advance the research of k-mer selection schemes.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>The toolkit, its source code, and precomputed minimizers for a variety of (k,w) pairs are available via https:\/\/github.com\/OrensteinLab\/GreedyMini.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaf251","type":"journal-article","created":{"date-parts":[[2025,7,15]],"date-time":"2025-07-15T13:02:29Z","timestamp":1752584549000},"page":"i275-i284","source":"Crossref","is-referenced-by-count":5,"title":["GreedyMini: generating low-density DNA minimizers"],"prefix":"10.1093","volume":"41","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8357-2802","authenticated-orcid":false,"given":"Shay","family":"Golan","sequence":"first","affiliation":[{"name":"Department of Computer Science, University of Haifa , Haifa 3498838,","place":["Israel"]},{"name":"Efi Arazi School of Computer Science, Reichman University , Herzliya 4610101,","place":["Israel"]}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-5738-0882","authenticated-orcid":false,"given":"Ido","family":"Tziony","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Bar-Ilan University , Ramat Gan 5290002,","place":["Israel"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2989-1113","authenticated-orcid":false,"given":"Matan","family":"Kraus","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Bar-Ilan University , Ramat Gan 5290002,","place":["Israel"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3583-3112","authenticated-orcid":false,"given":"Yaron","family":"Orenstein","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Bar-Ilan University , Ramat Gan 5290002,","place":["Israel"]},{"name":"The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University , Ramat Gan 5290002,","place":["Israel"]}]},{"given":"Arseny","family":"Shur","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Bar-Ilan University , Ramat Gan 5290002,","place":["Israel"]}]}],"member":"286","published-online":{"date-parts":[[2025,7,15]]},"reference":[{"key":"2025071509022059200_btaf251-B1","doi-asserted-by":"publisher","first-page":"1052","DOI":"10.1089\/cmb.2021.0270","article-title":"Metaprob 2: metagenomic reads binning based on assembly using minimizers and k-mers statistics","volume":"28","author":"Andreace","year":"2021","journal-title":"J Comput Biol"},{"key":"2025071509022059200_btaf251-B2","doi-asserted-by":"crossref","first-page":"336","DOI":"10.1089\/cmb.2014.0160","article-title":"On the representation of de Bruijn graphs","volume":"22","author":"Chikhi","year":"2015","journal-title":"J Comput Biol"},{"key":"2025071509022059200_btaf251-B3","author":"DeBlasio","year":"2019"},{"key":"2025071509022059200_btaf251-B4","doi-asserted-by":"crossref","first-page":"1569","DOI":"10.1093\/bioinformatics\/btv022","article-title":"KMC 2: fast and resource-frugal k-mer counting","volume":"31","author":"Deorowicz","year":"2015","journal-title":"Bioinformatics"},{"key":"2025071509022059200_btaf251-B5","doi-asserted-by":"crossref","first-page":"e10805","DOI":"10.7717\/peerj.10805","article-title":"Syncmers are more sensitive than minimizers for selecting conserved k-mers in biological sequences","volume":"9","author":"Edgar","year":"2021","journal-title":"PeerJ"},{"key":"2025071509022059200_btaf251-B6","author":"Ekim","year":"2020"},{"key":"2025071509022059200_btaf251-B7","doi-asserted-by":"publisher","first-page":"207","DOI":"10.1016\/0012-365X(78)90002-X","article-title":"Necklaces of beads in k colors and k-ary de Bruijn sequences","volume":"23","author":"Fredricksen","year":"1978","journal-title":"Discret. Math"},{"key":"2025071509022059200_btaf251-B8","first-page":"1","author":"Groot Koerkamp","year":"2024"},{"key":"2025071509022059200_btaf251-B9","doi-asserted-by":"crossref","first-page":"1288","DOI":"10.1089\/cmb.2022.0275","article-title":"Differentiable learning of sequence-specific minimizer schemes with DeepMinimizer","volume":"29","author":"Hoang","year":"2022","journal-title":"J Comput Biol"},{"key":"2025071509022059200_btaf251-B10","doi-asserted-by":"publisher","first-page":"btae736","DOI":"10.1093\/bioinformatics\/btae736","article-title":"A near-tight lower bound on the density of forward sampling schemes","volume":"41","author":"Kille","year":"2024","journal-title":"Bioinformatics"},{"key":"2025071509022059200_btaf251-B11","doi-asserted-by":"publisher","author":"Koerkamp","DOI":"10.1186\/s13015-025-00270-0"},{"key":"2025071509022059200_btaf251-B12","doi-asserted-by":"publisher","first-page":"2759","DOI":"10.1093\/bioinformatics\/btx304","article-title":"KMC 3: counting and manipulating k-mer statistics","volume":"33","author":"Kokot","year":"2017","journal-title":"Bioinformatics"},{"key":"2025071509022059200_btaf251-B13","doi-asserted-by":"publisher","first-page":"2103","DOI":"10.1093\/bioinformatics\/btw152","article-title":"Minimap and miniasm: fast mapping and de novo assembly for noisy long sequences","volume":"32","author":"Li","year":"2016","journal-title":"Bioinformatics"},{"key":"2025071509022059200_btaf251-B14","doi-asserted-by":"crossref","first-page":"3094","DOI":"10.1093\/bioinformatics\/bty191","article-title":"Minimap2: pairwise alignment for nucleotide sequences","volume":"34","author":"Li","year":"2018","journal-title":"Bioinformatics"},{"key":"2025071509022059200_btaf251-B15","doi-asserted-by":"crossref","first-page":"i110","DOI":"10.1093\/bioinformatics\/btx235","article-title":"Improving the performance of minimizers and winnowing schemes","volume":"33","author":"Mar\u00e7ais","year":"2017","journal-title":"Bioinformatics"},{"key":"2025071509022059200_btaf251-B16","doi-asserted-by":"crossref","first-page":"i13","DOI":"10.1093\/bioinformatics\/bty258","article-title":"Asymptotically optimal minimizers schemes","volume":"34","author":"Mar\u00e7ais","year":"2018","journal-title":"Bioinformatics"},{"key":"2025071509022059200_btaf251-B17","doi-asserted-by":"publisher","first-page":"2289","DOI":"10.1016\/j.csbj.2024.05.025","article-title":"A survey of k-mer methods and applications in bioinformatics","volume":"23","author":"Moeckel","year":"2024","journal-title":"Comput Struct Biotechnol J"},{"key":"2025071509022059200_btaf251-B18","doi-asserted-by":"crossref","first-page":"270","DOI":"10.1186\/s13059-024-03414-4","article-title":"When less is more: sketching with minimizers in genomics","volume":"25","author":"Ndiaye","year":"2024","journal-title":"Genome Biol"},{"key":"2025071509022059200_btaf251-B19","author":"Orenstein","year":"2016"},{"key":"2025071509022059200_btaf251-B20","doi-asserted-by":"crossref","first-page":"e1005777","DOI":"10.1371\/journal.pcbi.1005777","article-title":"Designing small universal k-mer hitting sets for improved analysis of high-throughput sequencing","volume":"13","author":"Orenstein","year":"2017","journal-title":"PLoS Comput Biol"},{"key":"2025071509022059200_btaf251-B21","first-page":"1154","article-title":"Efficient minimizer orders for large values of k using minimum decycling sets","volume":"33","author":"Pellow","year":"2023","journal-title":"Genome Res"},{"key":"2025071509022059200_btaf251-B22","doi-asserted-by":"publisher","first-page":"i185","DOI":"10.1093\/bioinformatics\/btac245","article-title":"Sparse and skew hashing of k-mers","volume":"38","author":"Pibiri","year":"2022","journal-title":"Bioinformatics"},{"key":"2025071509022059200_btaf251-B23","doi-asserted-by":"publisher","first-page":"253","DOI":"10.1186\/s13059-020-02157-2","article-title":"GraphAligner: rapid and versatile sequence-to-graph alignment","volume":"21","author":"Rautiainen","year":"2020","journal-title":"Genome Biol"},{"key":"2025071509022059200_btaf251-B24","doi-asserted-by":"crossref","first-page":"3363","DOI":"10.1093\/bioinformatics\/bth408","article-title":"Reducing storage requirements for biological sequence comparison","volume":"20","author":"Roberts","year":"2004","journal-title":"Bioinformatics"},{"key":"2025071509022059200_btaf251-B25","doi-asserted-by":"crossref","first-page":"2080","DOI":"10.1101\/gr.275648.121","article-title":"Effective sequence similarity detection with strobemers","volume":"31","author":"Sahlin","year":"2021","journal-title":"Genome Res"},{"key":"2025071509022059200_btaf251-B26","doi-asserted-by":"publisher","first-page":"524","DOI":"10.1016\/J.DISC.2016.09.008","article-title":"A simple shift rule for k-ary de Bruijn sequences","volume":"340","author":"Sawada","year":"2017","journal-title":"Discret Math"},{"key":"2025071509022059200_btaf251-B27","doi-asserted-by":"publisher","first-page":"abg8871","DOI":"10.1126\/science.abg8871","article-title":"Pangenomics enables genotyping of known structural variants in 5202 diverse genomes","volume":"374","author":"Sir\u00e9n","year":"2021","journal-title":"Science"},{"key":"2025071509022059200_btaf251-B28","doi-asserted-by":"crossref","first-page":"R46","DOI":"10.1186\/gb-2014-15-3-r46","article-title":"Kraken: ultrafast metagenomic sequence classification using exact alignments","volume":"15","author":"Wood","year":"2014","journal-title":"Genome Biol"},{"key":"2025071509022059200_btaf251-B29","doi-asserted-by":"publisher","first-page":"257","DOI":"10.1186\/s13059-019-1891-0","article-title":"Improved metagenomic analysis with Kraken 2","volume":"20","author":"Wood","year":"2019","journal-title":"Genome Biol"},{"key":"2025071509022059200_btaf251-B30","doi-asserted-by":"crossref","first-page":"i119","DOI":"10.1093\/bioinformatics\/btaa472","article-title":"Improved design and analysis of practical minimizers","volume":"36","author":"Zheng","year":"2020","journal-title":"Bioinformatics"},{"key":"2025071509022059200_btaf251-B31","doi-asserted-by":"publisher","first-page":"i187","DOI":"10.1093\/bioinformatics\/btab313","article-title":"Sequence-specific minimizers via polar sets","volume":"37","author":"Zheng","year":"2021","journal-title":"Bioinformatics"},{"key":"2025071509022059200_btaf251-B32","doi-asserted-by":"crossref","first-page":"1251","DOI":"10.1089\/cmb.2023.0094","article-title":"Creating and using minimizer sketches in computational genomics","volume":"30","author":"Zheng","year":"2023","journal-title":"J Comput Biol"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/Supplement_1\/i275\/63745752\/btaf251.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/Supplement_1\/i275\/63745752\/btaf251.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,15]],"date-time":"2025-07-15T13:02:32Z","timestamp":1752584552000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/41\/Supplement_1\/i275\/8199415"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,7,1]]},"references-count":32,"journal-issue":{"issue":"Supplement_1","published-print":{"date-parts":[[2025,7,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaf251","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2025,7]]},"published":{"date-parts":[[2025,7,1]]}}}