{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,30]],"date-time":"2026-05-30T01:39:09Z","timestamp":1780105149264,"version":"3.54.0"},"reference-count":20,"publisher":"Oxford University Press (OUP)","issue":"13","license":[{"start":{"date-parts":[[2018,6,27]],"date-time":"2018-06-27T00:00:00Z","timestamp":1530057600000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"name":"Gordon and Betty Moore Foundation\u2019s Data-Driven Discovery Initiative","award":["GBMF4554"],"award-info":[{"award-number":["GBMF4554"]}]},{"DOI":"10.13039\/100000001","name":"US National Science Foundation","doi-asserted-by":"crossref","award":["CCF-1256087"],"award-info":[{"award-number":["CCF-1256087"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/100000001","name":"US National Science Foundation","doi-asserted-by":"crossref","award":["CCF-1319998"],"award-info":[{"award-number":["CCF-1319998"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/100000002","name":"US National Institutes of Health","doi-asserted-by":"crossref","award":["R01HG007104"],"award-info":[{"award-number":["R01HG007104"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/100000002","name":"US National Institutes of Health","doi-asserted-by":"crossref","award":["R01GM122935"],"award-info":[{"award-number":["R01GM122935"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2018,7,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>The minimizers technique is a method to sample k-mers that is used in many bioinformatics software to reduce computation, memory usage and run time. The number of applications using minimizers keeps on growing steadily. Despite its many uses, the theoretical understanding of minimizers is still very limited. In many applications, selecting as few k-mers as possible (i.e. having a low density) is beneficial. The density is highly dependent on the choice of the order on the k-mers. Different applications use different orders, but none of these orders are optimal. A better understanding of minimizers schemes, and the related local and forward schemes, will allow designing schemes with lower density and thereby making existing and future bioinformatics tools even more efficient.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>From the analysis of the asymptotic behavior of minimizers, forward and local schemes, we show that the previously believed lower bound on minimizers schemes does not hold, and that schemes with density lower than thought possible actually exist. The proof is constructive and leads to an efficient algorithm to compare k-mers. These orders are the first known orders that are asymptotically optimal. Additionally, we give improved bounds on the density achievable by the three type of schemes.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/bty258","type":"journal-article","created":{"date-parts":[[2018,4,13]],"date-time":"2018-04-13T02:58:15Z","timestamp":1523588295000},"page":"i13-i22","source":"Crossref","is-referenced-by-count":66,"title":["Asymptotically optimal minimizers schemes"],"prefix":"10.1093","volume":"34","author":[{"given":"Guillaume","family":"Mar\u00e7ais","sequence":"first","affiliation":[{"name":"Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Dan","family":"DeBlasio","sequence":"additional","affiliation":[{"name":"Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Carl","family":"Kingsford","sequence":"additional","affiliation":[{"name":"Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"286","published-online":{"date-parts":[[2018,6,27]]},"reference":[{"key":"2023051604241177000_bty258-B1","first-page":"758","article-title":"A combinatorial problem","volume":"49","author":"de Bruijn","year":"1946","journal-title":"Proc. Section Sci. Koninklijke Nederlandse Akademie Van Wetenschappen Te Amsterdam"},{"key":"2023051604241177000_bty258-B2","doi-asserted-by":"crossref","first-page":"1569","DOI":"10.1093\/bioinformatics\/btv022","article-title":"KMC 2: fast and resource-frugal k-mer counting","volume":"31","author":"Deorowicz","year":"2015","journal-title":"Bioinformatics"},{"key":"2023051604241177000_bty258-B3","first-page":"287","volume-title":"String Processing and Information Retrieval, Number 9309 in Lecture Notes in Computer Science","author":"Grabowski","year":"2015"},{"key":"2023051604241177000_bty258-B4","doi-asserted-by":"crossref","first-page":"e0121453.","DOI":"10.1371\/journal.pone.0121453","article-title":"CoMeta: classification of Metagenomes using k-mers","volume":"10","author":"Kawulok","year":"2015","journal-title":"Plos One"},{"key":"2023051604241177000_bty258-B5","doi-asserted-by":"crossref","first-page":"1204","DOI":"10.1109\/T-C.1970.222859","article-title":"On a homomorphism of the de Bruijn graph and its applications to the design of feedback shift registers","volume":"C-19","author":"Lempel","year":"1970","journal-title":"IEEE Trans. Computers"},{"key":"2023051604241177000_bty258-B6","doi-asserted-by":"crossref","first-page":"2103","DOI":"10.1093\/bioinformatics\/btw152","article-title":"Minimap and miniasm: fast mapping and de novo assembly for noisy long sequences","volume":"32","author":"Li","year":"2016","journal-title":"Bioinformatics"},{"key":"2023051604241177000_bty258-B7","author":"Li","year":"2015"},{"key":"2023051604241177000_bty258-B8","author":"Li","year":"2013"},{"key":"2023051604241177000_bty258-B9","doi-asserted-by":"crossref","first-page":"1145","DOI":"10.1016\/j.disc.2005.10.032","article-title":"Independence number of de Bruijn graphs","volume":"306","author":"Lichiardopol","year":"2006","journal-title":"Discrete Math"},{"key":"2023051604241177000_bty258-B10","doi-asserted-by":"crossref","first-page":"i110","DOI":"10.1093\/bioinformatics\/btx235","article-title":"Improving the performance of minimizers and winnowing schemes","volume":"33","author":"Mar\u00e7ais","year":"2017","journal-title":"Bioinformatics"},{"key":"2023051604241177000_bty258-B11","doi-asserted-by":"crossref","first-page":"421","DOI":"10.1016\/0166-218X(92)90149-5","article-title":"Asymptotically-tight bounds on the number of cycles in generalized de Bruijn-Good graphs","volume":"37\u201338","author":"Maurer","year":"1992","journal-title":"Discrete Appl. Math"},{"key":"2023051604241177000_bty258-B12","doi-asserted-by":"crossref","first-page":"40","DOI":"10.1016\/0095-8956(72)90006-8","article-title":"A proof of Golomb\u2019s conjecture for the de Bruijn graph","volume":"13","author":"Mykkeltveit","year":"1972","journal-title":"J. Combinatorial Theory, Ser. B"},{"key":"2023051604241177000_bty258-B13","doi-asserted-by":"crossref","first-page":"132.","DOI":"10.1186\/s13059-016-0997-x","article-title":"Mash: fast genome and metagenome distance estimation using MinHash","volume":"17","author":"Ondov","year":"2016","journal-title":"Genome Biol"},{"key":"2023051604241177000_bty258-B14","doi-asserted-by":"crossref","first-page":"e1005777.","DOI":"10.1371\/journal.pcbi.1005777","article-title":"Designing small universal k-mer hitting sets for improved analysis of high-throughput sequencing","volume":"13","author":"Orenstein","year":"2017","journal-title":"PLOS Comput. Biol"},{"key":"2023051604241177000_bty258-B15","first-page":"25","volume-title":"Selected Areas in Cryptography\u2013SAC 2015, Lecture Notes in Computer Science","author":"Paindavoine","year":"2015"},{"key":"2023051604241177000_bty258-B16","doi-asserted-by":"crossref","first-page":"734","DOI":"10.1089\/cmb.2004.11.734","article-title":"A preprocessor for shotgun assembly of large genomes","volume":"11","author":"Roberts","year":"2004","journal-title":"J. Comput. Biol"},{"key":"2023051604241177000_bty258-B17","doi-asserted-by":"crossref","first-page":"3363","DOI":"10.1093\/bioinformatics\/bth408","article-title":"Reducing storage requirements for biological sequence comparison","volume":"20","author":"Roberts","year":"2004","journal-title":"Bioinformatics"},{"key":"2023051604241177000_bty258-B18","author":"Schleimer","year":"2003"},{"key":"2023051604241177000_bty258-B19","doi-asserted-by":"crossref","first-page":"R46.","DOI":"10.1186\/gb-2014-15-3-r46","article-title":"Kraken: ultrafast metagenomic sequence classification using exact alignments","volume":"15","author":"Wood","year":"2014","journal-title":"Genome Biol"},{"key":"2023051604241177000_bty258-B20","doi-asserted-by":"crossref","first-page":"S1.","DOI":"10.1186\/1471-2105-13-S6-S1","article-title":"Exploiting sparseness in de novo genome assembly","volume":"13","author":"Ye","year":"2012","journal-title":"BMC Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/34\/13\/i13\/50316024\/bioinformatics_34_13_i13.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/34\/13\/i13\/50316024\/bioinformatics_34_13_i13.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,5,16]],"date-time":"2023-05-16T00:26:35Z","timestamp":1684196795000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/34\/13\/i13\/5045769"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,6,27]]},"references-count":20,"journal-issue":{"issue":"13","published-print":{"date-parts":[[2018,7,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/bty258","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/256156","asserted-by":"object"}],"has-review":[{"id-type":"doi","id":"10.3410\/f.733534214.793551172","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2018,7,1]]},"published":{"date-parts":[[2018,6,27]]}}}