{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,5]],"date-time":"2026-03-05T15:38:22Z","timestamp":1772725102560,"version":"3.50.1"},"reference-count":30,"publisher":"Oxford University Press (OUP)","issue":"10","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2015,5,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Calculating the edit-distance (i.e. minimum number of insertions, deletions and substitutions) between short DNA sequences is the primary task performed by seed-and-extend based mappers, which compare billions of sequences. In practice, only sequence pairs with a small edit-distance provide useful scientific data. However, the majority of sequence pairs analyzed by seed-and-extend based mappers differ by significantly more errors than what is typically allowed. Such error-abundant sequence pairs needlessly waste resources and severely hinder the performance of read mappers. Therefore, it is crucial to develop a fast and accurate filter that can rapidly and efficiently detect error-abundant string pairs and remove them from consideration before more computationally expensive methods are used.<\/jats:p>\n               <jats:p>Results: We present a simple and efficient algorithm, Shifted Hamming Distance (SHD), which accelerates the alignment verification procedure in read mapping, by quickly filtering out error-abundant sequence pairs using bit-parallel and SIMD-parallel operations. SHD only filters string pairs that contain more errors than a user-defined threshold, making it fully comprehensive. It also maintains high accuracy with moderate error threshold (up to 5% of the string length) while achieving a 3-fold speedup over the best previous algorithm (Gene Myers\u2019s bit-vector algorithm). SHD is compatible with all mappers that perform sequence alignment for verification.<\/jats:p>\n               <jats:p>Availability and implementation: We provide an implementation of SHD in C with Intel SSE instructions at: https:\/\/github.com\/CMU-SAFARI\/SHD.<\/jats:p>\n               <jats:p>Contact: hxin@cmu.edu, calkan@cs.bilkent.edu.tr or onur@cmu.edu<\/jats:p>\n               <jats:p>Supplementary information: Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btu856","type":"journal-article","created":{"date-parts":[[2015,1,11]],"date-time":"2015-01-11T01:27:33Z","timestamp":1420939653000},"page":"1553-1560","source":"Crossref","is-referenced-by-count":69,"title":["Shifted Hamming distance: a fast and accurate SIMD-friendly filter to accelerate alignment verification in read mapping"],"prefix":"10.1093","volume":"31","author":[{"given":"Hongyi","family":"Xin","sequence":"first","affiliation":[{"name":"1 Computer Science Department, 2Department of Electrical and Computer Engineering, 3Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA and 4Department of Computer Engineering, Bilkent University, Bilkent, Ankara 06800, Turkey"}]},{"given":"John","family":"Greth","sequence":"additional","affiliation":[{"name":"1 Computer Science Department, 2Department of Electrical and Computer Engineering, 3Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA and 4Department of Computer Engineering, Bilkent University, Bilkent, Ankara 06800, Turkey"}]},{"given":"John","family":"Emmons","sequence":"additional","affiliation":[{"name":"1 Computer Science Department, 2Department of Electrical and Computer Engineering, 3Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA and 4Department of Computer Engineering, Bilkent University, Bilkent, Ankara 06800, Turkey"}]},{"given":"Gennady","family":"Pekhimenko","sequence":"additional","affiliation":[{"name":"1 Computer Science Department, 2Department of Electrical and Computer Engineering, 3Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA and 4Department of Computer Engineering, Bilkent University, Bilkent, Ankara 06800, Turkey"}]},{"given":"Carl","family":"Kingsford","sequence":"additional","affiliation":[{"name":"1 Computer Science Department, 2Department of Electrical and Computer Engineering, 3Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA and 4Department of Computer Engineering, Bilkent University, Bilkent, Ankara 06800, Turkey"}]},{"given":"Can","family":"Alkan","sequence":"additional","affiliation":[{"name":"1 Computer Science Department, 2Department of Electrical and Computer Engineering, 3Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA and 4Department of Computer Engineering, Bilkent University, Bilkent, Ankara 06800, Turkey"}]},{"given":"Onur","family":"Mutlu","sequence":"additional","affiliation":[{"name":"1 Computer Science Department, 2Department of Electrical and Computer Engineering, 3Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA and 4Department of Computer Engineering, Bilkent University, Bilkent, Ankara 06800, Turkey"}]}],"member":"286","published-online":{"date-parts":[[2015,1,10]]},"reference":[{"key":"2023020115455739300_btu856-B1","doi-asserted-by":"crossref","first-page":"1061","DOI":"10.1038\/nature09534","article-title":"A map of human genome variation from population-scale sequencing","volume":"467","author":"1000 Genomes Project Consortium","year":"2010","journal-title":"Nature"},{"key":"2023020115455739300_btu856-B2","doi-asserted-by":"crossref","first-page":"56","DOI":"10.1038\/nature11632","article-title":"An integrated map of genetic variation from 1,092 human genomes","volume":"491","author":"1000 Genomes Project Consortium","year":"2012","journal-title":"Nature"},{"key":"2023020115455739300_btu856-B3","doi-asserted-by":"crossref","first-page":"e41","DOI":"10.1093\/nar\/gkr1246","article-title":"Hobbes: optimized gram-based methods for efficient read alignment","volume":"40","author":"Ahmadi","year":"2011","journal-title":"Nucleic Acids Res."},{"key":"2023020115455739300_btu856-B4","doi-asserted-by":"crossref","first-page":"1061","DOI":"10.1038\/ng.437","article-title":"Personalized copy number and segmental duplication maps using next-generation sequencing","volume":"41","author":"Alkan","year":"2009","journal-title":"Nat. Genet."},{"key":"2023020115455739300_btu856-B5","article-title":"A block-sorting lossless data compression algorithm","author":"Burrows","year":"1994","journal-title":"Technical Report 124, Digital Equipment Corporation"},{"key":"2023020115455739300_btu856-B6","doi-asserted-by":"crossref","first-page":"2369","DOI":"10.1093\/nar\/27.11.2369","article-title":"Alignment of whole genomes","volume":"27","author":"Delcher","year":"1999","journal-title":"Nucleic Acids Res"},{"key":"2023020115455739300_btu856-B7","doi-asserted-by":"crossref","first-page":"11","DOI":"10.1186\/1471-2105-9-11","article-title":"Seqan an efficient, generic c++ library for sequence analysis","volume":"9","author":"D\u00f6ring","year":"2008","journal-title":"BMC Bioinf."},{"key":"2023020115455739300_btu856-B8","doi-asserted-by":"crossref","first-page":"156","DOI":"10.1093\/bioinformatics\/btl582","article-title":"Striped Smith-Waterman speeds database searches six times over other SIMD implementations","volume":"23","author":"Farrar","year":"2007","journal-title":"Bioinformatics"},{"key":"2023020115455739300_btu856-B9","doi-asserted-by":"crossref","first-page":"357","DOI":"10.1038\/ng.2915","article-title":"Loss-of-function mutations in slc30a8 protect against type 2 diabetes","volume":"46","author":"Flannick","year":"2014","journal-title":"Nat. Genet."},{"key":"2023020115455739300_btu856-B10","doi-asserted-by":"crossref","first-page":"710","DOI":"10.1126\/science.1188021","article-title":"A draft sequence of the Neandertal genome","volume":"328","author":"Green","year":"2010","journal-title":"Science"},{"key":"2023020115455739300_btu856-B11","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-540-30577-4_44","article-title":"Fast bit-vector algorithms for approximate string matching under indel distance","author":"Hyyro","year":"2005"},{"key":"2023020115455739300_btu856-B12","author":"Intel","year":"2012"},{"key":"2023020115455739300_btu856-B13","doi-asserted-by":"crossref","first-page":"357","DOI":"10.1038\/nmeth.1923","article-title":"Fast gapped-read alignment with bowtie 2","volume":"9","author":"Langmead","year":"2012","journal-title":"Nat. Method"},{"key":"2023020115455739300_btu856-B14","doi-asserted-by":"crossref","first-page":"589","DOI":"10.1093\/bioinformatics\/btp698","article-title":"Fast and accurate long-read alignment with Burrows-Wheeler transform","volume":"26","author":"Li","year":"2010","journal-title":"Bioinformatics"},{"key":"2023020115455739300_btu856-B15","doi-asserted-by":"crossref","first-page":"1966","DOI":"10.1093\/bioinformatics\/btp336","article-title":"SOAP2: an improved ultrafast tool for short read alignment","volume":"25","author":"Li","year":"2009","journal-title":"Bioinformatics"},{"issue":"Suppl 2","key":"2023020115455739300_btu856-B16","doi-asserted-by":"crossref","first-page":"S10","DOI":"10.1186\/1471-2105-9-S2-S10","article-title":"CUDA compatible GPU cards as efficient hardware accelerators for Smith-Waterman sequence alignment","volume":"9","author":"Manavski","year":"2008","journal-title":"BMC Bioinf."},{"key":"2023020115455739300_btu856-B17","doi-asserted-by":"crossref","first-page":"877","DOI":"10.1038\/nature07744","article-title":"A burst of segmental duplications in the genome of the African great ape ancestor","volume":"457","author":"Marques-Bonet","year":"2009","journal-title":"Nature"},{"key":"2023020115455739300_btu856-B18","doi-asserted-by":"crossref","first-page":"222","DOI":"10.1126\/science.1224344","article-title":"A high-coverage genome sequence from an archaic denisovan individual","volume":"338","author":"Meyer","year":"2012","journal-title":"Science"},{"key":"2023020115455739300_btu856-B19","doi-asserted-by":"crossref","first-page":"395","DOI":"10.1145\/316542.316550","article-title":"A fast bit-vector algorithm for approximate string matching based on dynamic programming","volume":"46","author":"Myers","year":"1999","journal-title":"J. ACM"},{"key":"2023020115455739300_btu856-B20","doi-asserted-by":"crossref","first-page":"790","DOI":"10.1038\/ng.646","article-title":"Exome sequencing identifies MLL2 mutations as a cause of kabuki syndrome","volume":"42","author":"Ng","year":"2010","journal-title":"Nat. Genet."},{"key":"2023020115455739300_btu856-B21","doi-asserted-by":"crossref","first-page":"471","DOI":"10.1038\/nature12228","article-title":"Great ape genetic diversity and population history","volume":"499","author":"Prado-Martinez","year":"2013","journal-title":"Nature"},{"key":"2023020115455739300_btu856-B22","doi-asserted-by":"crossref","first-page":"1053","DOI":"10.1038\/nature09710","article-title":"Genetic history of an archaic hominin group from Denisova Cave in Siberia","volume":"468","author":"Reich","year":"2010","journal-title":"Nature"},{"key":"2023020115455739300_btu856-B23","doi-asserted-by":"crossref","first-page":"e1000386","DOI":"10.1371\/journal.pcbi.1000386","article-title":"Shrimp: accurate mapping of short color-space reads","volume":"5","author":"Rumble","year":"2009","journal-title":"PLoS Comput. Biol."},{"key":"2023020115455739300_btu856-B24","doi-asserted-by":"crossref","first-page":"169","DOI":"10.1038\/nature10842","article-title":"Insights into hominid evolution from the gorilla genome sequence","volume":"483","author":"Scally","year":"2012","journal-title":"Nature"},{"key":"2023020115455739300_btu856-B25","doi-asserted-by":"crossref","first-page":"195","DOI":"10.1016\/0022-2836(81)90087-5","article-title":"Identification of common molecular subsequences","volume":"147","author":"Smith","year":"1981","journal-title":"J. Mol. Biol."},{"key":"2023020115455739300_btu856-B26","doi-asserted-by":"crossref","first-page":"107+","DOI":"10.1186\/1756-0500-1-107","article-title":"SWPS3\u2014fast multi-threaded vectorized Smith-Waterman for IBM Cell\/B.e. and x86\/SSE2","volume":"1","author":"Szalkowski","year":"2008","journal-title":"BMC Res. Notes"},{"key":"2023020115455739300_btu856-B27","doi-asserted-by":"crossref","DOI":"10.1016\/0196-6774(85)90023-9","article-title":"Finding approximate patterns in strings","author":"Ukkonen","year":"1985","journal-title":"J. Algorithms."},{"key":"2023020115455739300_btu856-B28","doi-asserted-by":"crossref","first-page":"1640","DOI":"10.1101\/gr.124461.111","article-title":"Gorilla genome structural variation reveals evolutionary parallelisms with chimpanzee","volume":"21","author":"Ventura","year":"2011","journal-title":"Genome Res."},{"key":"2023020115455739300_btu856-B29","doi-asserted-by":"crossref","first-page":"2592","DOI":"10.1093\/bioinformatics\/bts505","article-title":"Razers 3: faster, fully sensitive read mapping","volume":"28","author":"Weese","year":"2012","journal-title":"Bioinformatics"},{"key":"2023020115455739300_btu856-B30","doi-asserted-by":"crossref","first-page":"S13","DOI":"10.1186\/1471-2164-14-S1-S13","article-title":"Accelerating read mapping with FastHASH","volume":"14","author":"Xin","year":"2013","journal-title":"BMC Genomics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/31\/10\/1553\/49013086\/bioinformatics_31_10_1553.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/31\/10\/1553\/49013086\/bioinformatics_31_10_1553.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,2]],"date-time":"2023-02-02T00:14:12Z","timestamp":1675296852000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/31\/10\/1553\/176591"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2015,1,10]]},"references-count":30,"journal-issue":{"issue":"10","published-print":{"date-parts":[[2015,5,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btu856","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2015,5,15]]},"published":{"date-parts":[[2015,1,10]]}}}