{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,4]],"date-time":"2026-04-04T03:46:21Z","timestamp":1775274381932,"version":"3.50.1"},"reference-count":14,"publisher":"Oxford University Press (OUP)","issue":"Supplement_1","license":[{"start":{"date-parts":[[2020,7,13]],"date-time":"2020-07-13T00:00:00Z","timestamp":1594598400000},"content-version":"vor","delay-in-days":12,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"name":"Gordon and Betty Moore Foundation\u2019s Data-Driven Discovery Initiative","award":["GBMF4554"],"award-info":[{"award-number":["GBMF4554"]}]},{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["CCF-1256087"],"award-info":[{"award-number":["CCF-1256087"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["CCF-1319998"],"award-info":[{"award-number":["CCF-1319998"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["R01GM122935"],"award-info":[{"award-number":["R01GM122935"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2020,7,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Minimizers are methods to sample k-mers from a string, with the guarantee that similar set of k-mers will be chosen on similar strings. It is parameterized by the k-mer length k, a window length w and an order on the k-mers. Minimizers are used in a large number of softwares and pipelines to improve computation efficiency and decrease memory usage. Despite the method\u2019s popularity, many theoretical questions regarding its performance remain open. The core metric for measuring performance of a minimizer is the density, which measures the sparsity of sampled k-mers. The theoretical optimal density for a minimizer is 1\/w, provably not achievable in general. For given k and w, little is known about asymptotically optimal minimizers, that is minimizers with density O(1\/w).<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>We derive a necessary and sufficient condition for existence of asymptotically optimal minimizers. We also provide a randomized algorithm, called the Miniception, to design minimizers with the best theoretical guarantee to date on density in practical scenarios. Constructing and using the Miniception is as easy as constructing and using a random minimizer, which allows the design of efficient minimizers that scale to the values of k and w used in current bioinformatics software programs.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>Reference implementation of the Miniception and the codes for analysis can be found at https:\/\/github.com\/kingsford-group\/miniception.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaa472","type":"journal-article","created":{"date-parts":[[2020,6,9]],"date-time":"2020-06-09T15:11:13Z","timestamp":1591715473000},"page":"i119-i127","source":"Crossref","is-referenced-by-count":59,"title":["Improved design and analysis of practical minimizers"],"prefix":"10.1093","volume":"36","author":[{"given":"Hongyu","family":"Zheng","sequence":"first","affiliation":[{"name":"School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA Computational Biology Department,"}]},{"given":"Carl","family":"Kingsford","sequence":"additional","affiliation":[{"name":"School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA Computational Biology Department,"}]},{"given":"Guillaume","family":"Mar\u00e7ais","sequence":"additional","affiliation":[{"name":"School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA Computational Biology Department,"}]}],"member":"286","published-online":{"date-parts":[[2020,7,13]]},"reference":[{"key":"2024021913325188500_btaa472-B1","doi-asserted-by":"crossref","first-page":"i201","DOI":"10.1093\/bioinformatics\/btw279","article-title":"Compacting de Bruijn graphs from sequencing data quickly and in low memory","volume":"32","author":"Chikhi","year":"2016","journal-title":"Bioinformatics"},{"key":"2024021913325188500_btaa472-B2","first-page":"167","author":"DeBlasio","year":"2019"},{"key":"2024021913325188500_btaa472-B3","author":"Ekim","year":"2020"},{"key":"2024021913325188500_btaa472-B4","doi-asserted-by":"crossref","first-page":"3094","DOI":"10.1093\/bioinformatics\/bty191","article-title":"Minimap2: pairwise alignment for nucleotide sequences","volume":"34","author":"Li","year":"2018","journal-title":"Bioinformatics"},{"key":"2024021913325188500_btaa472-B5","doi-asserted-by":"crossref","first-page":"i110","DOI":"10.1093\/bioinformatics\/btx235","article-title":"Improving the performance of minimizers and winnowing schemes","volume":"33","author":"Mar\u00e7ais","year":"2017","journal-title":"Bioinformatics"},{"key":"2024021913325188500_btaa472-B6","doi-asserted-by":"crossref","first-page":"i13","DOI":"10.1093\/bioinformatics\/bty258","article-title":"Asymptotically optimal minimizers schemes","volume":"34","author":"Mar\u00e7ais","year":"2018","journal-title":"Bioinformatics"},{"key":"2024021913325188500_btaa472-B7","doi-asserted-by":"crossref","first-page":"93","DOI":"10.1146\/annurev-biodatasci-072018-021156","article-title":"Sketching and sublinear data structures in genomics","volume":"2","author":"Mar\u00e7ais","year":"2019","journal-title":"Annu. Rev. Biomed. Data Sci"},{"key":"2024021913325188500_btaa472-B8","doi-asserted-by":"crossref","first-page":"40","DOI":"10.1016\/0095-8956(72)90006-8","article-title":"A proof of Golomb\u2019s conjecture for the de Bruijn graph","volume":"13","author":"Mykkeltveit","year":"1972","journal-title":"J. Comb. Theory B"},{"key":"2024021913325188500_btaa472-B9","doi-asserted-by":"crossref","first-page":"257","DOI":"10.1007\/978-3-319-43681-4_21","volume-title":"Algorithms in Bioinformatics.","author":"Orenstein","year":"2016"},{"key":"2024021913325188500_btaa472-B10","doi-asserted-by":"crossref","first-page":"734","DOI":"10.1089\/cmb.2004.11.734","article-title":"A preprocessor for shotgun assembly of large genomes","volume":"11","author":"Roberts","year":"2004","journal-title":"J. Comput. Biol"},{"key":"2024021913325188500_btaa472-B11","doi-asserted-by":"crossref","first-page":"3363","DOI":"10.1093\/bioinformatics\/bth408","article-title":"Reducing storage requirements for biological sequence comparison","volume":"20","author":"Roberts","year":"2004","journal-title":"Bioinformatics"},{"key":"2024021913325188500_btaa472-B12","doi-asserted-by":"crossref","first-page":"199","DOI":"10.1186\/s13059-019-1809-x","article-title":"When the levee breaks: a practical guide to sketching algorithms for processing the flood of genomic data","volume":"20","author":"Rowe","year":"2019","journal-title":"Genome Biol"},{"key":"2024021913325188500_btaa472-B13","first-page":"76","author":"Schleimer","year":"2003"},{"key":"2024021913325188500_btaa472-B14","author":"Zheng","year":"2020"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/36\/Supplement_1\/i119\/56702411\/bioinformatics_36_supplement1_i119.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/36\/Supplement_1\/i119\/56702411\/bioinformatics_36_supplement1_i119.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,2,19]],"date-time":"2024-02-19T08:38:31Z","timestamp":1708331911000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/36\/Supplement_1\/i119\/5870484"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,7,1]]},"references-count":14,"journal-issue":{"issue":"Supplement_1","published-print":{"date-parts":[[2020,7,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaa472","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2020.02.07.939025","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2020,7]]},"published":{"date-parts":[[2020,7,1]]}}}