{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,25]],"date-time":"2026-03-25T03:22:31Z","timestamp":1774408951886,"version":"3.50.1"},"reference-count":22,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2025,7,13]],"date-time":"2025-07-13T00:00:00Z","timestamp":1752364800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,7,13]],"date-time":"2025-07-13T00:00:00Z","timestamp":1752364800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100005690","name":"Universit\u00e4t des Saarlandes","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100005690","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Algorithms Mol Biol"],"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Motivation<\/jats:title>\n            <jats:p>Short DNA sequences of length\u00a0<jats:italic>k<\/jats:italic> that appear in a single location (e.g., at a single genomic position, in a single species from a larger set of species, etc.) are called <jats:italic>unique k-mers<\/jats:italic>. They are useful for placing sequenced DNA fragments at the correct location without computing alignments and without ambiguity. However, they are not necessarily robust: A single basepair change may turn a unique <jats:italic>k<\/jats:italic>-mer into a different one that may in fact be present at one or more different locations, which may give confusing or contradictory information when attempting to place a read by its <jats:italic>k<\/jats:italic>-mer content. A more robust concept are <jats:italic>strongly unique<\/jats:italic>\n              <jats:italic>k<\/jats:italic>-mers, i.e., unique <jats:italic>k<\/jats:italic>-mers for which no Hamming-distance-1 neighbor with conflicting information exists in all of the considered sequences. Given a set of <jats:italic>k<\/jats:italic>-mers, it is therefore of interest to have an efficient method that can distinguish <jats:italic>k<\/jats:italic>-mers with a Hamming-distance-1 neighbor in the collection from those that do not.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results<\/jats:title>\n            <jats:p>We present engineered algorithms to identify and mark within a set <jats:italic>K<\/jats:italic> of (canonical) <jats:italic>k<\/jats:italic>-mers all elements that have a Hamming-distance-1 neighbor in the same set. One algorithm is based on recursively running a 4-way comparison on sub-intervals of the sorted set. The other algorithm is based on bucketing and running a pairwise bit-parallel Hamming distance test on small buckets of the sorted set. Both methods consider canonical <jats:italic>k<\/jats:italic>-mers (i.e., taking reverse complements into account) and allow for efficient parallelization. The methods have been implemented and applied in practice to sets consisting of several billions of <jats:italic>k<\/jats:italic>-mers. An optimized combined approach running with 16 threads on a 16-core workstation yields wall times below 20 seconds on the 2.5 billion distinct 31-mers of the human telomere-to-telomere reference genome.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Availability<\/jats:title>\n            <jats:p>An implementation can be found at <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" xlink:href=\"https:\/\/gitlab.com\/rahmannlab\/strong-k-mers\" ext-link-type=\"uri\">https:\/\/gitlab.com\/rahmannlab\/strong-k-mers<\/jats:ext-link>.<\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/s13015-025-00286-6","type":"journal-article","created":{"date-parts":[[2025,7,13]],"date-time":"2025-07-13T14:27:23Z","timestamp":1752416843000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["Swiftly identifying strongly unique k-mers"],"prefix":"10.1186","volume":"20","author":[{"given":"Jens","family":"Zentgraf","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sven","family":"Rahmann","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2025,7,13]]},"reference":[{"issue":"4","key":"286_CR1","doi-asserted-by":"publisher","first-page":"278","DOI":"10.1007\/S40484-019-0181-X","volume":"7","author":"R Rizzi","year":"2019","unstructured":"Rizzi R, Beretta S, Patterson M, Pirola Y, Previtali M, Vedova GD, Bonizzoni P. Overlap graphs and de Bruijn graphs: data structures for de novo genome assembly in the big data era. Quant Biol. 2019;7(4):278\u201392. https:\/\/doi.org\/10.1007\/S40484-019-0181-X.","journal-title":"Quant Biol"},{"issue":"5","key":"286_CR2","doi-asserted-by":"publisher","first-page":"525","DOI":"10.1038\/nbt.3519","volume":"34","author":"NL Bray","year":"2016","unstructured":"Bray NL, Pimentel H, Melsted P, Pachter L. Near-optimal probabilistic RNA-seq quantification. Nat Biotechnol. 2016;34(5):525\u20137.","journal-title":"Nat Biotechnol"},{"issue":"1","key":"286_CR3","doi-asserted-by":"publisher","first-page":"2","DOI":"10.1186\/S13015-021-00181-W","volume":"16","author":"J Zentgraf","year":"2021","unstructured":"Zentgraf J, Rahmann S. Fast lightweight accurate xenograft sorting. Algorithms Mol Biol. 2021;16(1):2. https:\/\/doi.org\/10.1186\/S13015-021-00181-W.","journal-title":"Algorithms Mol Biol"},{"issue":"1","key":"286_CR4","doi-asserted-by":"publisher","first-page":"198","DOI":"10.1186\/s13059-018-1568-0","volume":"19","author":"FP Breitwieser","year":"2018","unstructured":"Breitwieser FP, Baker DN, Salzberg SL. KrakenUniq: confident and fast metagenomics classification using unique k-mer counts. Genome Biol. 2018;19(1):198.","journal-title":"Genome Biol"},{"key":"286_CR5","first-page":"364","volume":"10","author":"P Hirsch","year":"2024","unstructured":"Hirsch P, Molano L-AG, Engel A, Zentgraf J, Rahmann S, Hannig M, M\u00fcller R, Kern F, Keller A, Schmartz GP. Mibianto: ultra-efficient online microbiome analysis through $$k$$-mer based metagenomics. Nucl Acids Res. 2024;10:364.","journal-title":"Nucl Acids Res"},{"issue":"5","key":"286_CR6","doi-asserted-by":"publisher","first-page":"27","DOI":"10.21105\/joss.00027","volume":"1","author":"C Brown","year":"2016","unstructured":"Brown C, Irber L. sourmash: a library for minhash sketching of DNA. J Open Source Softw. 2016;1(5):27. https:\/\/doi.org\/10.21105\/joss.00027.","journal-title":"J Open Source Softw"},{"key":"286_CR7","doi-asserted-by":"publisher","unstructured":"Zentgraf J, Rahmann S. Swiftly identifying strongly unique $$k$$-mers. In: Pissis, S.P., Sung, W. (eds.) 24th International Workshop on Algorithms in Bioinformatics, WABI 2024, September 2-4, 2024, Royal Holloway, London, United Kingdom. LIPIcs, vol. 312, pp. 15\u201311515. Schloss Dagstuhl - Leibniz-Zentrum f\u00fcr Informatik, Germany 2024. https:\/\/doi.org\/10.4230\/LIPICS.WABI.2024.15","DOI":"10.4230\/LIPICS.WABI.2024.15"},{"issue":"10","key":"286_CR8","doi-asserted-by":"publisher","first-page":"1569","DOI":"10.1093\/bioinformatics\/btv022","volume":"31","author":"S Deorowicz","year":"2015","unstructured":"Deorowicz S, Kokot M, Grabowski S, Debudaj-Grabysz A. KMC 2: fast and resource-frugal $$k$$-mer counting. Bioinformatics. 2015;31(10):1569\u201376.","journal-title":"Bioinformatics"},{"issue":"17","key":"286_CR9","doi-asserted-by":"publisher","first-page":"2759","DOI":"10.1093\/bioinformatics\/btx304","volume":"33","author":"M Kokot","year":"2017","unstructured":"Kokot M, Dlugosz M, Deorowicz S. KMC 3: counting and manipulating $$k$$-mer statistics. Bioinformatics. 2017;33(17):2759\u201361.","journal-title":"Bioinformatics"},{"key":"286_CR10","doi-asserted-by":"publisher","first-page":"9","DOI":"10.1186\/s13015-017-0097-9","volume":"12","author":"M Erbert","year":"2017","unstructured":"Erbert M, Rechner S, M\u00fcller-Hannemann M. Gerbil: a fast and memory-efficient $$k$$-mer counter with GPU-support. Algorithms Mol Biol. 2017;12:9.","journal-title":"Algorithms Mol Biol"},{"issue":"6","key":"286_CR11","doi-asserted-by":"publisher","first-page":"764","DOI":"10.1093\/bioinformatics\/btr011","volume":"27","author":"G Marcais","year":"2011","unstructured":"Marcais G, Kingsford C. A fast, lock-free approach for efficient parallel counting of occurrences of $$k$$-mers. Bioinformatics. 2011;27(6):764\u201370.","journal-title":"Bioinformatics"},{"key":"286_CR12","doi-asserted-by":"publisher","unstructured":"Zentgraf J, Rahmann S. Fast gapped $$k$$-mer counting with subdivided multi-way bucketed cuckoo hash tables. In: Boucher, C., Rahmann, S. (eds.) 22nd International Workshop on Algorithms in Bioinformatics, WABI 2022, September 5-7, 2022, Potsdam, Germany. LIPIcs, vol. 242, pp. 12\u201311220. Schloss Dagstuhl - Leibniz-Zentrum f\u00fcr Informatik, Germany 2022. https:\/\/doi.org\/10.4230\/LIPICS.WABI.2022.12","DOI":"10.4230\/LIPICS.WABI.2022.12"},{"issue":"1","key":"286_CR13","doi-asserted-by":"publisher","first-page":"14","DOI":"10.1186\/s13015-024-00259-1","volume":"19","author":"D D\u00edaz-Dom\u00ednguez","year":"2024","unstructured":"D\u00edaz-Dom\u00ednguez D, Leinonen M, Salmela L. Space-efficient computation of $$k$$-mer dictionaries for large values of $$k$$. Algor Mol Biol. 2024;19(1):14.","journal-title":"Algor Mol Biol"},{"issue":"2","key":"286_CR14","doi-asserted-by":"publisher","first-page":"22","DOI":"10.1109\/MCSE.2011.37","volume":"13","author":"S Walt","year":"2011","unstructured":"Walt S, Colbert SC, Varoquaux G. The NumPy array: a structure for efficient numerical computation. Comput Sci Eng. 2011;13(2):22\u201330. https:\/\/doi.org\/10.1109\/MCSE.2011.37.","journal-title":"Comput Sci Eng"},{"key":"286_CR15","doi-asserted-by":"publisher","unstructured":"Lam SK, Pitrou A, Seibert S. Numba: a LLVM-based Python JIT compiler. In: Finkel, H. (ed.) Proceedings of the Second Workshop on the LLVM Compiler Infrastructure in HPC, LLVM 2015, Austin, Texas, USA, November 15, 2015, pp. 7\u2013176. ACM, New York, NY, USA 2015. https:\/\/doi.org\/10.1145\/2833157.2833162","DOI":"10.1145\/2833157.2833162"},{"issue":"7978","key":"286_CR16","doi-asserted-by":"publisher","first-page":"344","DOI":"10.1038\/s41586-023-06457-y","volume":"621","author":"A Rhie","year":"2023","unstructured":"Rhie A, Nurk S, Cechova M, Hoyt SJ, Taylor DJ, Altemose N, Hook PW, Koren S, Rautiainen M, Alexandrov IA, et al. The complete sequence of a human Y chromosome. Nature. 2023;621(7978):344\u201354.","journal-title":"Nature"},{"key":"286_CR17","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s12859-016-0976-y","volume":"17","author":"M Schirmer","year":"2016","unstructured":"Schirmer M, D\u2019Amore R, Ijaz UZ, Hall N, Quince C. Illumina error profiles: resolving fine-scale variation in metagenomic sequencing data. BMC Bioinform. 2016;17:1\u201315.","journal-title":"BMC Bioinform"},{"key":"286_CR18","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/1471-2105-14-S5-S1","volume":"14","author":"M Allhoff","year":"2013","unstructured":"Allhoff M, Sch\u00f6nhuth A, Martin M, Costa IG, Rahmann S, Marschall T. Discovering motifs that induce sequencing errors. BMC Bioinform. 2013;14:1.","journal-title":"BMC Bioinform"},{"issue":"13","key":"286_CR19","doi-asserted-by":"publisher","first-page":"137","DOI":"10.1093\/bioinformatics\/btr208","volume":"27","author":"P Medvedev","year":"2011","unstructured":"Medvedev P, Scott E, Kakaradov B, Pevzner P. Error correction of high-throughput sequencing datasets with non-uniform coverage. Bioinformatics. 2011;27(13):137\u201341.","journal-title":"Bioinformatics"},{"issue":"Suppl 1","key":"286_CR20","doi-asserted-by":"publisher","first-page":"7","DOI":"10.1186\/1471-2164-14-S1-S7","volume":"14","author":"SI Nikolenko","year":"2013","unstructured":"Nikolenko SI, Korobeynikov AI, Alekseyev MA. Bayeshammer: Bayesian clustering for error correction in single-cell sequencing. BMC Genom. 2013;14(Suppl 1):7.","journal-title":"BMC Genom"},{"issue":"1","key":"286_CR21","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1186\/S13015-023-00227-1","volume":"18","author":"SS Schmidt","year":"2023","unstructured":"Schmidt SS, Alanko JN. Eulertigs: minimum plain text representation of $$k$$-mer sets without repetitions in linear time. Algorithms Mol Biol. 2023;18(1):5. https:\/\/doi.org\/10.1186\/S13015-023-00227-1.","journal-title":"Algorithms Mol Biol"},{"key":"286_CR22","doi-asserted-by":"publisher","unstructured":"Renders L, Depuydt L, Rahmann S, Fostier J. Automated design of efficient search schemes for lossless approximate pattern matching. In: Ma, J. (ed.) Research in Computational Molecular Biology - 28th Annual International Conference, RECOMB 2024, Cambridge, MA, USA, April 29 - May 2, 2024, Proceedings. Lecture Notes in Computer Science, vol. 14758, pp. 164\u2013184. Springer, Berlin, Heidelberg 2024. https:\/\/doi.org\/10.1007\/978-1-0716-3989-4_11","DOI":"10.1007\/978-1-0716-3989-4_11"}],"container-title":["Algorithms for Molecular Biology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13015-025-00286-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s13015-025-00286-6\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13015-025-00286-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,7]],"date-time":"2025-09-07T06:59:07Z","timestamp":1757228347000},"score":1,"resource":{"primary":{"URL":"https:\/\/almob.biomedcentral.com\/articles\/10.1186\/s13015-025-00286-6"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,7,13]]},"references-count":22,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2025,12]]}},"alternative-id":["286"],"URL":"https:\/\/doi.org\/10.1186\/s13015-025-00286-6","relation":{},"ISSN":["1748-7188"],"issn-type":[{"value":"1748-7188","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,7,13]]},"assertion":[{"value":"6 February 2025","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"4 June 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"13 July 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no competing interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"13"}}