{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,29]],"date-time":"2026-03-29T01:13:33Z","timestamp":1774746813034,"version":"3.50.1"},"reference-count":5,"publisher":"Oxford University Press (OUP)","issue":"20","license":[{"start":{"date-parts":[[2022,8,24]],"date-time":"2022-08-24T00:00:00Z","timestamp":1661299200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["2R01HG007182-04A1"],"award-info":[{"award-number":["2R01HG007182-04A1"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,10,14]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Motivation<\/jats:title><jats:p>Spaced seeds are robust alternatives to k-mers in analyzing nucleotide sequences with high base mismatch rates. Hashing is also crucial for efficiently storing abundant sequence data. Here, we introduce ntHash2, a fast algorithm for spaced seed hashing that can be integrated into various bioinformatics tools for efficient sequence analysis with applications in genome research.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>ntHash2 is up to 2.1\u00d7 faster at hashing various spaced seeds than the previous version and 3.8\u00d7 faster than conventional hashing algorithms with na\u00efve adaptation. Additionally, we reduced the collision rate of ntHash for longer k-mer lengths and improved the uniformity of the hash distribution by modifying the canonical hashing mechanism.<\/jats:p><\/jats:sec><jats:sec><jats:title>Availability and implementation<\/jats:title><jats:p>ntHash2 is freely available online at github.com\/bcgsc\/ntHash under an MIT license.<\/jats:p><\/jats:sec><jats:sec><jats:title>Supplementary information<\/jats:title><jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p><\/jats:sec>","DOI":"10.1093\/bioinformatics\/btac564","type":"journal-article","created":{"date-parts":[[2022,8,24]],"date-time":"2022-08-24T13:33:29Z","timestamp":1661348009000},"page":"4812-4813","source":"Crossref","is-referenced-by-count":19,"title":["ntHash2: recursive spaced seed hashing for nucleotide sequences"],"prefix":"10.1093","volume":"38","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2126-5644","authenticated-orcid":false,"given":"Parham","family":"Kazemi","sequence":"first","affiliation":[{"name":"Canada\u2019s Michael Smith Genome Sciences Centre, British Columbia Cancer Agency , Vancouver, BC V5Z 4S6, Canada"},{"name":"Faculty of Science, University of British Columbia , Vancouver, BC V6T 1Z4, Canada"}]},{"given":"Johnathan","family":"Wong","sequence":"additional","affiliation":[{"name":"Canada\u2019s Michael Smith Genome Sciences Centre, British Columbia Cancer Agency , Vancouver, BC V5Z 4S6, Canada"}]},{"given":"Vladimir","family":"Nikoli\u0107","sequence":"additional","affiliation":[{"name":"Canada\u2019s Michael Smith Genome Sciences Centre, British Columbia Cancer Agency , Vancouver, BC V5Z 4S6, Canada"}]},{"given":"Hamid","family":"Mohamadi","sequence":"additional","affiliation":[{"name":"Amazon Web Services Inc. , Seattle, WA 98109, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9890-2293","authenticated-orcid":false,"given":"Ren\u00e9 L","family":"Warren","sequence":"additional","affiliation":[{"name":"Canada\u2019s Michael Smith Genome Sciences Centre, British Columbia Cancer Agency , Vancouver, BC V5Z 4S6, Canada"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0950-7839","authenticated-orcid":false,"given":"Inan\u00e7","family":"Birol","sequence":"additional","affiliation":[{"name":"Canada\u2019s Michael Smith Genome Sciences Centre, British Columbia Cancer Agency , Vancouver, BC V5Z 4S6, Canada"},{"name":"Department of Medical Genetics, University of British Columbia , Vancouver, BC V6T 1Z3, Canada"}]}],"member":"286","published-online":{"date-parts":[[2022,8,24]]},"reference":[{"key":"2022101415190062100_btac564-B1","volume-title":"Handbook of Methods of Applied Statistics","author":"Chakravarti","year":"1967"},{"key":"2022101415190062100_btac564-B2","doi-asserted-by":"crossref","first-page":"16961","DOI":"10.1073\/pnas.1903436117","article-title":"Mismatch-tolerant, alignment-free sequence classification using multiple spaced seeds and multiindex bloom filters","volume":"117","author":"Chu","year":"2020","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2022101415190062100_btac564-B4","doi-asserted-by":"crossref","first-page":"440","DOI":"10.1093\/bioinformatics\/18.3.440","article-title":"PatternHunter: faster and more sensitive homology search","volume":"18","author":"Ma","year":"2002","journal-title":"Bioinformatics"},{"key":"2022101415190062100_btac564-B5","doi-asserted-by":"crossref","first-page":"3492","DOI":"10.1093\/bioinformatics\/btw397","article-title":"ntHash: recursive nucleotide hashing","volume":"32","author":"Mohamadi","year":"2016","journal-title":"Bioinformatics"},{"key":"2022101415190062100_btac564-B6","doi-asserted-by":"crossref","first-page":"223","DOI":"10.1089\/cmb.2019.0298","article-title":"Iterative spaced seed hashing: closing the gap between spaced seed hashing and k-mer Hashing","volume":"27","author":"Petrucci","year":"2020","journal-title":"J. Comput. Biol"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btac564\/45641212\/btac564.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/20\/4812\/46535020\/btac564.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/20\/4812\/46535020\/btac564.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,15]],"date-time":"2023-02-15T20:17:42Z","timestamp":1676492262000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/38\/20\/4812\/6674501"}},"subtitle":[],"editor":[{"given":"Peter","family":"Robinson","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2022,8,24]]},"references-count":5,"journal-issue":{"issue":"20","published-online":{"date-parts":[[2022,8,24]]},"published-print":{"date-parts":[[2022,10,14]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btac564","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2022,10,15]]},"published":{"date-parts":[[2022,8,24]]}}}