{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,17]],"date-time":"2026-02-17T15:37:20Z","timestamp":1771342640128,"version":"3.50.1"},"reference-count":17,"publisher":"Oxford University Press (OUP)","issue":"2","license":[{"start":{"date-parts":[[2024,1,25]],"date-time":"2024-01-25T00:00:00Z","timestamp":1706140800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100004543","name":"Chinese Scholarship Council","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100004543","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,2,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>The minimizer concept is a data structure for sequence sketching. The standard canonical minimizer selects a subset of k-mers from the given DNA sequence by comparing the forward and reverse k-mers in a window simultaneously according to a predefined selection scheme. It is widely employed by sequence analysis such as read mapping and assembly. k-mer density, k-mer repetitiveness (e.g. k-mer bias), and computational efficiency are three critical measurements for minimizer selection schemes. However, there exist trade-offs between kinds of minimizer variants. Generic, effective, and efficient are always the requirements for high-performance minimizer algorithms.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>We propose a simple minimizer operator as a refinement of the standard canonical minimizer. It takes only a few operations to compute. However, it can improve the k-mer repetitiveness, especially for the lexicographic order. It applies to other selection schemes of total orders (e.g. random orders). Moreover, it is computationally efficient and the density is close to that of the standard minimizer. The refined minimizer may benefit high-performance applications like binning and read mapping.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>The source code of the benchmark in this work is available at the github repository https:\/\/github.com\/xp3i4\/mini_benchmark<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btae045","type":"journal-article","created":{"date-parts":[[2024,1,25]],"date-time":"2024-01-25T09:47:19Z","timestamp":1706176039000},"source":"Crossref","is-referenced-by-count":5,"title":["A simple refined DNA minimizer operator enables 2-fold faster computation"],"prefix":"10.1093","volume":"40","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3713-0605","authenticated-orcid":false,"given":"Chenxu","family":"Pan","sequence":"first","affiliation":[{"name":"Department of Mathematics and Computer Science, Freie Universit\u00e4t Berlin , Takustra\u00dfe 9 , Berlin, 14195, Germany"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3078-8129","authenticated-orcid":false,"given":"Knut","family":"Reinert","sequence":"additional","affiliation":[{"name":"Department of Mathematics and Computer Science, Freie Universit\u00e4t Berlin , Takustra\u00dfe 9 , Berlin, 14195, Germany"},{"name":"Max Planck Institute for Molecular Genetics , Ihnestra\u00dfe 63-73 , Berlin, 14195, Germany"}]}],"member":"286","published-online":{"date-parts":[[2024,1,25]]},"reference":[{"key":"2024020805385330600_btae045-B1","doi-asserted-by":"crossref","DOI":"10.1093\/bioinformatics\/btad320","article-title":"Efficient short read mapping to a pangenome that is represented by a graph of ED strings","volume":"39","author":"B\u00fcchler","year":"2023","journal-title":"Bioinformatics"},{"key":"2024020805385330600_btae045-B2","first-page":"35","volume-title":"Research in Computational Molecular Biology, Lecture Notes in Computer Science","author":"Chikhi","year":"2014"},{"key":"2024020805385330600_btae045-B3","doi-asserted-by":"crossref","first-page":"i201","DOI":"10.1093\/bioinformatics\/btw279","article-title":"Compacting de bruijn graphs from sequencing data quickly and in low memory","volume":"32","author":"Chikhi","year":"2016","journal-title":"Bioinformatics"},{"key":"2024020805385330600_btae045-B4","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511809088","volume-title":"Introduction to Lattices and Order","author":"Davey","year":"2002","edition":"2nd edn"},{"key":"2024020805385330600_btae045-B5","doi-asserted-by":"crossref","first-page":"1569","DOI":"10.1093\/bioinformatics\/btv022","article-title":"KMC 2: fast and resource-frugal k-mer counting","volume":"31","author":"Deorowicz","year":"2015","journal-title":"Bioinformatics"},{"key":"2024020805385330600_btae045-B6","doi-asserted-by":"crossref","first-page":"e10805","DOI":"10.7717\/peerj.10805","article-title":"Syncmers are more sensitive than minimizers for selecting conserved k-mers in biological sequences","volume":"9","author":"Edgar","year":"2021","journal-title":"PeerJ"},{"key":"2024020805385330600_btae045-B7","doi-asserted-by":"crossref","first-page":"i111","DOI":"10.1093\/bioinformatics\/btaa435","article-title":"Weighted minimizer sampling improves long read mapping","volume":"36","author":"Jain","year":"2020","journal-title":"Bioinformatics"},{"key":"2024020805385330600_btae045-B8","doi-asserted-by":"crossref","first-page":"2103","DOI":"10.1093\/bioinformatics\/btw152","article-title":"Minimap and miniasm: fast mapping and de novo assembly for noisy long sequences","volume":"32","author":"Li","year":"2016","journal-title":"Bioinformatics"},{"key":"2024020805385330600_btae045-B9","doi-asserted-by":"crossref","first-page":"3094","DOI":"10.1093\/bioinformatics\/bty191","article-title":"Minimap2: pairwise alignment for nucleotide sequences","volume":"34","author":"Li","year":"2018","journal-title":"Bioinformatics"},{"key":"2024020805385330600_btae045-B10","doi-asserted-by":"crossref","first-page":"i110","DOI":"10.1093\/bioinformatics\/btx235","article-title":"Improving the performance of minimizers and winnowing schemes","volume":"33","author":"Mar\u00e7ais","year":"2017","journal-title":"Bioinformatics"},{"key":"2024020805385330600_btae045-B11","doi-asserted-by":"crossref","first-page":"3492","DOI":"10.1093\/bioinformatics\/btw397","article-title":"ntHash: recursive nucleotide hashing","volume":"32","author":"Mohamadi","year":"2016","journal-title":"Bioinformatics"},{"key":"2024020805385330600_btae045-B12","doi-asserted-by":"crossref","first-page":"257","DOI":"10.1007\/978-3-319-43681-4_21","volume-title":"Algorithms in Bioinformatics, Lecture Notes in Computer Science","author":"Orenstein","year":"2016"},{"key":"2024020805385330600_btae045-B13","doi-asserted-by":"crossref","first-page":"3363","DOI":"10.1093\/bioinformatics\/bth408","article-title":"Reducing storage requirements for biological sequence comparison","volume":"20","author":"Roberts","year":"2004","journal-title":"Bioinformatics"},{"key":"2024020805385330600_btae045-B14","doi-asserted-by":"crossref","first-page":"260","DOI":"10.1186\/s13059-022-02831-7","article-title":"Strobealign: flexible seed size enables ultra-fast and accurate read alignment","volume":"23","author":"Sahlin","year":"2022","journal-title":"Genome Biol"},{"key":"2024020805385330600_btae045-B15","first-page":"76","author":"Schleimer","year":"2003"},{"key":"2024020805385330600_btae045-B16","doi-asserted-by":"crossref","first-page":"R46","DOI":"10.1186\/gb-2014-15-3-r46","article-title":"Kraken: ultrafast metagenomic sequence classification using exact alignments","volume":"15","author":"Wood","year":"2014","journal-title":"Genome Biol"},{"key":"2024020805385330600_btae045-B17","doi-asserted-by":"crossref","first-page":"i187","DOI":"10.1093\/bioinformatics\/btab313","article-title":"Sequence-specific minimizers via polar sets","volume":"37","author":"Zheng","year":"2021","journal-title":"Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btae045\/56412562\/btae045.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/2\/btae045\/56619446\/btae045.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/2\/btae045\/56619446\/btae045.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,2,8]],"date-time":"2024-02-08T06:03:43Z","timestamp":1707372223000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btae045\/7588893"}},"subtitle":[],"editor":[{"given":"Alfonso","family":"Valencia","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2024,1,25]]},"references-count":17,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2024,2,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btae045","relation":{},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2024,2,1]]},"published":{"date-parts":[[2024,1,25]]},"article-number":"btae045"}}