{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,7,2]],"date-time":"2026-07-02T05:07:09Z","timestamp":1782968829995,"version":"3.54.5"},"reference-count":56,"publisher":"Oxford University Press (OUP)","issue":"20","license":[{"start":{"date-parts":[[2018,5,3]],"date-time":"2018-05-03T00:00:00Z","timestamp":1525305600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"name":"Intel\u00ae Parallel Compute Center"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2018,10,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Pairwise sequence alignment is undoubtedly a central tool in many bioinformatics analyses. In this paper, we present a generically accelerated module for pairwise sequence alignments applicable for a broad range of applications. In our module, we unified the standard dynamic programming kernel used for pairwise sequence alignments and extended it with a generalized inter-sequence vectorization layout, such that many alignments can be computed simultaneously by exploiting SIMD (single instruction multiple data) instructions of modern processors. We then extended the module by adding two layers of thread-level parallelization, where we (a) distribute many independent alignments on multiple threads and (b) inherently parallelize a single alignment computation using a work stealing approach producing a dynamic wavefront progressing along the minor diagonal.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>We evaluated our alignment vectorization and parallelization on different processors, including the newest Intel\u00ae Xeon\u00ae (Skylake) and Intel\u00ae Xeon PhiTM (KNL) processors, and use cases. The instruction set AVX512-BW (Byte and Word), available on Skylake processors, can genuinely improve the performance of vectorized alignments. We could run single alignments 1600 times faster on the Xeon PhiTM and 1400 times faster on the Xeon\u00ae than executing them with our previous sequential alignment module.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>The module is programmed in C++\u2009using the SeqAn (Reinert et al., 2017) library and distributed with version 2.4 under the BSD license. We support SSE4, AVX2, AVX512 instructions and included UME: SIMD, a SIMD-instruction wrapper library, to extend our module for further instruction sets. We thoroughly test all alignment components with all major C++\u2009compilers on various platforms.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/bty380","type":"journal-article","created":{"date-parts":[[2018,5,2]],"date-time":"2018-05-02T19:29:52Z","timestamp":1525289392000},"page":"3437-3445","source":"Crossref","is-referenced-by-count":49,"title":["Generic accelerated sequence alignment in SeqAn using vectorization and multi-threading"],"prefix":"10.1093","volume":"34","author":[{"given":"Ren\u00e9","family":"Rahn","sequence":"first","affiliation":[{"name":"Department of Mathematics and Computer Science, Freie Universit\u00e4t Berlin, Berlin, Germany"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Stefan","family":"Budach","sequence":"additional","affiliation":[{"name":"Otto-Warburg-Laboratory, RNA Bioinformatics, Max Planck Institute for Molecular Genetics, Berlin, Germany"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Pascal","family":"Costanza","sequence":"additional","affiliation":[{"name":"ExaScience Lab, IMEC, Leuven, Belgium"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Marcel","family":"Ehrhardt","sequence":"additional","affiliation":[{"name":"Department of Mathematics and Computer Science, Freie Universit\u00e4t Berlin, Berlin, Germany"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jonny","family":"Hancox","sequence":"additional","affiliation":[{"name":"Health & Life Sciences, Intel Corporation, London, UK"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Knut","family":"Reinert","sequence":"additional","affiliation":[{"name":"Department of Mathematics and Computer Science, Freie Universit\u00e4t Berlin, Berlin, Germany"},{"name":"Otto-Warburg-Laboratory, RNA Bioinformatics, Max Planck Institute for Molecular Genetics, Berlin, Germany"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"286","published-online":{"date-parts":[[2018,5,3]]},"reference":[{"key":"2023012712415388300_bty380-B1","author":"Alpern","year":"1995"},{"key":"2023012712415388300_bty380-B2","author":"ARM","year":"2007"},{"key":"2023012712415388300_bty380-B3","doi-asserted-by":"crossref","first-page":"181.","DOI":"10.1186\/1471-2105-12-181","article-title":"Protein alignment algorithms with an efficient backtracking routine on multiple GPUs","volume":"12","author":"Blazewicz","year":"2011","journal-title":"BMC Bioinformatics"},{"key":"2023012712415388300_bty380-B4","doi-asserted-by":"crossref","first-page":"720","DOI":"10.1145\/324133.324234","article-title":"Scheduling multithreaded computations by work stealing","volume":"46","author":"Blumofe","year":"1999","journal-title":"J. ACM"},{"key":"2023012712415388300_bty380-B5","doi-asserted-by":"crossref","first-page":"59","DOI":"10.1038\/nmeth.3176","article-title":"Fast and sensitive protein alignment using DIAMOND","volume":"12","author":"Buchfink","year":"2015","journal-title":"Nat. Methods"},{"key":"2023012712415388300_bty380-B6","doi-asserted-by":"crossref","first-page":"481","DOI":"10.1093\/bioinformatics\/8.5.481","article-title":"Aligning two sequences within a specified diagonal band","volume":"8","author":"Chao","year":"1992","journal-title":"Bioinformatics"},{"key":"2023012712415388300_bty380-B7","doi-asserted-by":"crossref","first-page":"81.","DOI":"10.1186\/s12859-016-0930-z","article-title":"Parasail: SIMD C library for global, semi-global, and local pairwise sequence alignments","volume":"17","author":"Daily","year":"2016","journal-title":"BMC Bioinformatics"},{"key":"2023012712415388300_bty380-B8","doi-asserted-by":"crossref","first-page":"11.","DOI":"10.1186\/1471-2105-9-11","article-title":"SeqAn an efficient, generic C++ library for sequence analysis","volume":"9","author":"D\u00f6ring","year":"2008","journal-title":"BMC Bioinformatics"},{"key":"2023012712415388300_bty380-B9","doi-asserted-by":"crossref","first-page":"259","DOI":"10.1007\/BF02427852","article-title":"Parallel processing of biological sequence comparison algorithms","volume":"17","author":"Edmiston","year":"1988","journal-title":"Int. J. Parallel Program"},{"key":"2023012712415388300_bty380-B10","doi-asserted-by":"crossref","first-page":"619","DOI":"10.1093\/bioinformatics\/bts019","article-title":"Detecting genomic indel variants with exact breakpoints in single- and paired-end sequencing data using splazers","volume":"28","author":"Emde","year":"2012","journal-title":"Bioinformatics"},{"key":"2023012712415388300_bty380-B11","doi-asserted-by":"crossref","first-page":"156","DOI":"10.1093\/bioinformatics\/btl582","article-title":"Striped Smith-Waterman speeds database searches six times over other SIMD implementations","volume":"23","author":"Farrar","year":"2007","journal-title":"Bioinformatics"},{"key":"2023012712415388300_bty380-B12","author":"Freescale Semiconductor","year":"1999"},{"key":"2023012712415388300_bty380-B13","author":"Frielingsdorf","year":"2015"},{"key":"2023012712415388300_bty380-B14","doi-asserted-by":"crossref","first-page":"359","DOI":"10.1007\/BF02458577","article-title":"Optimal sequence alignment allowing for long gaps","volume":"52","author":"Gotoh","year":"1990","journal-title":"Bull. Math. Biol"},{"key":"2023012712415388300_bty380-B15","doi-asserted-by":"crossref","first-page":"i349","DOI":"10.1093\/bioinformatics\/btu439","article-title":"Lambda: the local aligner for massive biological data","volume":"30","author":"Hauswedell","year":"2014","journal-title":"Bioinformatics"},{"key":"2023012712415388300_bty380-B16","author":"Holtgrewe","year":"2010","journal-title":"Mason \u2013 a Read Simulator for Second Generation Sequencing Data"},{"key":"2023012712415388300_bty380-B17","doi-asserted-by":"crossref","first-page":"1904","DOI":"10.1093\/bioinformatics\/btv051","article-title":"Methods for the detection and assembly of novel sequence in high-throughput sequencing data","volume":"31","author":"Holtgrewe","year":"2015","journal-title":"Bioinformatics"},{"key":"2023012712415388300_bty380-B18","author":"Intel","year":"2016"},{"key":"2023012712415388300_bty380-B19","first-page":"662","volume-title":"Intel\u00ae Xeon PhiTM Processor High Performance Programming, Knights Landing Edition","author":"Jeffers","year":"2016"},{"key":"2023012712415388300_bty380-B20","first-page":"21","article-title":"A high-performance portable abstract interface for explicit SIMD vectorization","author":"Karpi\u0144ski","year":"2017","journal-title":"Proc. 8th Int. Work. Program. Model. Appl. Multicores Manycores - PMAM\u201917"},{"key":"2023012712415388300_bty380-B21","doi-asserted-by":"crossref","first-page":"S15.","DOI":"10.1186\/1471-2105-12-S9-S15","article-title":"STELLAR: fast and exact local alignments","volume":"12","author":"Kehr","year":"2011","journal-title":"BMC Bioinformatics"},{"key":"2023012712415388300_bty380-B22","doi-asserted-by":"crossref","first-page":"4247","DOI":"10.1016\/j.jcp.2010.02.009","article-title":"Acceleration of the Smith-Waterman algorithm using single and multiple graphics processors","volume":"229","author":"Khajeh-Saeed","year":"2010","journal-title":"J. Comput. Phys"},{"key":"2023012712415388300_bty380-B23","doi-asserted-by":"crossref","first-page":"2494","DOI":"10.1093\/bioinformatics\/btt410","article-title":"SW#-GPU-enabled exact alignments on genome scale","volume":"29","author":"Korpar","year":"2013","journal-title":"Bioinformatics"},{"key":"2023012712415388300_bty380-B24","doi-asserted-by":"crossref","first-page":"357","DOI":"10.1038\/nmeth.1923","article-title":"Fast gapped-read alignment with Bowtie 2","volume":"9","author":"Langmead","year":"2012","journal-title":"Nat. Methods"},{"key":"2023012712415388300_bty380-B25","author":"Li","year":"2013"},{"key":"2023012712415388300_bty380-B26","author":"Li","year":"2012"},{"key":"2023012712415388300_bty380-B27","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1093\/bfgp\/elr035","article-title":"Comparison of the two major classes of assembly algorithms: overlap-layout-consensus and de-bruijn-graph","volume":"11","author":"Li","year":"2012","journal-title":"Brief. Funct. Genomics"},{"key":"2023012712415388300_bty380-B28","first-page":"184","volume-title":"2014 IEEE 25th International Conference on Application-Specific Systems, Architectures and Processors (ASAP)","author":"Liu","year":"2014"},{"key":"2023012712415388300_bty380-B29","first-page":"311","article-title":"A multithreaded parallel implementation of a dynamic programming algorithm for sequence comparison","author":"Martins","year":"2000","journal-title":"Biocomputing"},{"key":"2023012712415388300_bty380-B30","doi-asserted-by":"crossref","first-page":"117.","DOI":"10.1186\/1471-2105-14-117","article-title":"CUDASW++ 3.0: accelerating Smith-Waterman protein database search by coupling CPU and GPU SIMD instructions","volume":"14","author":"Liu","year":"2013","journal-title":"BMC Bioinformatics"},{"key":"2023012712415388300_bty380-B31","author":"Liu","year":"2014"},{"key":"2023012712415388300_bty380-B32","doi-asserted-by":"crossref","first-page":"31","DOI":"10.1038\/nrg2626","article-title":"Sequencing technologies \u2013 the next generation","volume":"11","author":"Metzker","year":"2010","journal-title":"Nat. Rev. Genet"},{"key":"2023012712415388300_bty380-B33","doi-asserted-by":"crossref","first-page":"443","DOI":"10.1016\/0022-2836(70)90057-4","article-title":"A general method applicable to the search for similiarities in the amino acid sequence of two proteins","volume":"48","author":"Needleman","year":"1970","journal-title":"J. Mol. Biol"},{"key":"2023012712415388300_bty380-B34","doi-asserted-by":"crossref","first-page":"205","DOI":"10.1006\/jmbi.2000.4042","article-title":"T-coffee: a novel method for fast and accurate multiple sequence alignment","volume":"302","author":"Notredame","year":"2000","journal-title":"J. Mol. Biol"},{"key":"2023012712415388300_bty380-B35","doi-asserted-by":"crossref","first-page":"119","DOI":"10.1093\/bioinformatics\/bts649","article-title":"PBSIM: PacBio reads simulator\u2013toward accurate genome assembly","volume":"29","author":"Ono","year":"2013","journal-title":"Bioinformatics"},{"key":"2023012712415388300_bty380-B36","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1002\/0471250953.bi0305s43","article-title":"Selecting the right similarity-scoring matrix","volume":"43","author":"Pearson","year":"2013","journal-title":"Curr. Protoc. Bioinformatics"},{"key":"2023012712415388300_bty380-B37","doi-asserted-by":"crossref","first-page":"i187","DOI":"10.1093\/bioinformatics\/btn281","article-title":"Segment-based multiple sequence alignment","volume":"24","author":"Rausch","year":"2008","journal-title":"Bioinformatics"},{"key":"2023012712415388300_bty380-B38","doi-asserted-by":"crossref","first-page":"i333","DOI":"10.1093\/bioinformatics\/bts378","article-title":"DELLY: structural variant discovery by integrated paired-end and split-read analysis","volume":"28","author":"Rausch","year":"2012","journal-title":"Bioinformatics"},{"key":"2023012712415388300_bty380-B39","author":"Reinert","year":"2009","journal-title":"Biological Sequence Analysis using the SeqAn C++ Library"},{"key":"2023012712415388300_bty380-B40","doi-asserted-by":"crossref","first-page":"157","DOI":"10.1016\/j.jbiotec.2017.07.017","article-title":"The SeqAn C++ template library for efficient sequence analysis: a resource for programmers","volume":"261","author":"Reinert","year":"2017","journal-title":"J. Biotechnol"},{"key":"2023012712415388300_bty380-B41","doi-asserted-by":"crossref","first-page":"2941","DOI":"10.1093\/bioinformatics\/btx330","article-title":"Flexbar 3.0 \u2013 SIMD and multicore parallelization","volume":"33","author":"Roehr","year":"2017","journal-title":"Bioinformatics"},{"key":"2023012712415388300_bty380-B42","doi-asserted-by":"crossref","first-page":"221.","DOI":"10.1186\/1471-2105-12-221","article-title":"Faster Smith-Waterman database searches with inter-sequence SIMD parallelisation","volume":"12","author":"Rognes","year":"2011","journal-title":"BMC Bioinformatics"},{"key":"2023012712415388300_bty380-B43","doi-asserted-by":"crossref","first-page":"699","DOI":"10.1093\/bioinformatics\/16.8.699","article-title":"Six-fold speed-up of Smith-Waterman sequence database searches using parallel processing on common microprocessors","volume":"16","author":"Rognes","year":"2000","journal-title":"Bioinformatics"},{"key":"2023012712415388300_bty380-B44","author":"Rucci","year":"2017"},{"key":"2023012712415388300_bty380-B45","doi-asserted-by":"crossref","first-page":"1009","DOI":"10.1109\/TPDS.2012.194","article-title":"Retrieving smith-waterman alignments with optimizations for megabase biological sequences using GPU","volume":"24","author":"Sandes","year":"2013","journal-title":"IEEE Trans. Parallel Distrib. Syst"},{"key":"2023012712415388300_bty380-B46","author":"Sarje","year":"2008"},{"key":"2023012712415388300_bty380-B47","doi-asserted-by":"crossref","first-page":"e78.","DOI":"10.1093\/nar\/gkt005","article-title":"Fast and accurate read mapping with approximate seeds and multiple backtracking","volume":"41","author":"Siragusa","year":"2013","journal-title":"Nucleic Acids Res"},{"key":"2023012712415388300_bty380-B48","author":"\u0160o\u0161i\u0107","year":"2014"},{"key":"2023012712415388300_bty380-B49","doi-asserted-by":"crossref","first-page":"107.","DOI":"10.1186\/1756-0500-1-107","article-title":"SWPS3 \u2013 fast multi-threaded vectorized Smith-Waterman for IBM Cell\/B.E. and X86\/SSE2","volume":"1","author":"Szalkowski","year":"2008","journal-title":"BMC Res. Notes"},{"key":"2023012712415388300_bty380-B50","first-page":"1347","article-title":"Dynamic gap selector: a Smith Waterman sequence alignment algorithm with affine gap model optimisation","author":"Urgese","year":"2014","journal-title":"Proc. IWBBIO"},{"key":"2023012712415388300_bty380-B51","volume-title":"C++ Templates: The Complete Guide","author":"Vandevoorde","year":"2002"},{"key":"2023012712415388300_bty380-B52","doi-asserted-by":"crossref","first-page":"2592","DOI":"10.1093\/bioinformatics\/bts505","article-title":"RazerS 3: faster, fully sensitive read mapping","volume":"28","author":"Weese","year":"2012","journal-title":"Bioinformatics"},{"key":"2023012712415388300_bty380-B53","doi-asserted-by":"crossref","first-page":"145","DOI":"10.1093\/bioinformatics\/13.2.145","article-title":"Using video-oriented instructions to speed up sequence comparison","volume":"13","author":"Wozniak","year":"1997","journal-title":"Bioinformatics"},{"key":"2023012712415388300_bty380-B54","doi-asserted-by":"crossref","first-page":"159.","DOI":"10.1186\/1471-2105-12-159","article-title":"RAPSearch: a fast protein similarity search tool for short reads","volume":"12","author":"Ye","year":"2011","journal-title":"BMC Bioinformatics"},{"key":"2023012712415388300_bty380-B55","doi-asserted-by":"crossref","first-page":"e82138","DOI":"10.1371\/journal.pone.0082138","article-title":"SSW library: an SIMD Smith-Waterman C\/C++ library for use in genomic applications","volume":"8","author":"Zhao","year":"2013","journal-title":"PLoS One"},{"key":"2023012712415388300_bty380-B56","doi-asserted-by":"crossref","first-page":"160025","DOI":"10.1038\/sdata.2016.25","article-title":"Extensive sequencing of seven human genomes to characterize benchmark reference materials","volume":"3","author":"Zook","year":"2016","journal-title":"Sci. Data"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/34\/20\/3437\/48918958\/bioinformatics_34_20_3437.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/34\/20\/3437\/48918958\/bioinformatics_34_20_3437.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,27]],"date-time":"2023-01-27T13:34:49Z","timestamp":1674826489000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/34\/20\/3437\/4992147"}},"subtitle":[],"editor":[{"given":"John","family":"Hancock","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"editor"}]}],"short-title":[],"issued":{"date-parts":[[2018,5,3]]},"references-count":56,"journal-issue":{"issue":"20","published-print":{"date-parts":[[2018,10,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/bty380","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2018,10,15]]},"published":{"date-parts":[[2018,5,3]]}}}