{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T18:10:37Z","timestamp":1777486237419,"version":"3.51.4"},"reference-count":47,"publisher":"Oxford University Press (OUP)","issue":"21","license":[{"start":{"date-parts":[[2019,3,28]],"date-time":"2019-03-28T00:00:00Z","timestamp":1553731200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["HG006004"],"award-info":[{"award-number":["HG006004"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"name":"EMBO Installation","award":["IG-2521"],"award-info":[{"award-number":["IG-2521"]}]},{"name":"EMBO Installation","award":["TUBITAK-2215"],"award-info":[{"award-number":["TUBITAK-2215"]}]},{"DOI":"10.13039\/501100004410","name":"Scientific and Technological Research Council of Turkey","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100004410","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2019,11,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>The ability to generate massive amounts of sequencing data continues to overwhelm the processing capability of existing algorithms and compute infrastructures. In this work, we explore the use of hardware\/software co-design and hardware acceleration to significantly reduce the execution time of short sequence alignment, a crucial step in analyzing sequenced genomes. We introduce Shouji, a highly parallel and accurate pre-alignment filter that remarkably reduces the need for computationally-costly dynamic programming algorithms. The first key idea of our proposed pre-alignment filter is to provide high filtering accuracy by correctly detecting all common subsequences shared between two given sequences. The second key idea is to design a hardware accelerator that adopts modern field-programmable gate array (FPGA) architectures to further boost the performance of our algorithm.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>Shouji significantly improves the accuracy of pre-alignment filtering by up to two orders of magnitude compared to the state-of-the-art pre-alignment filters, GateKeeper and SHD. Our FPGA-based accelerator is up to three orders of magnitude faster than the equivalent CPU implementation of Shouji. Using a single FPGA chip, we benchmark the benefits of integrating Shouji with five state-of-the-art sequence aligners, designed for different computing platforms. The addition of Shouji as a pre-alignment step reduces the execution time of the five state-of-the-art sequence aligners by up to 18.8\u00d7. Shouji can be adapted for any bioinformatics pipeline that performs sequence alignment for verification. Unlike most existing methods that aim to accelerate sequence alignment, Shouji does not sacrifice any of the aligner capabilities, as it does not modify or replace the alignment step.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>https:\/\/github.com\/CMU-SAFARI\/Shouji.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btz234","type":"journal-article","created":{"date-parts":[[2019,3,27]],"date-time":"2019-03-27T21:26:30Z","timestamp":1553721990000},"page":"4255-4263","source":"Crossref","is-referenced-by-count":69,"title":["Shouji: a fast and efficient pre-alignment filter for sequence alignment"],"prefix":"10.1093","volume":"35","author":[{"given":"Mohammed","family":"Alser","sequence":"first","affiliation":[{"name":"Computer Science Department, ETH Z\u00fcrich, Z\u00fcrich , Switzerland"},{"name":"Chair for Processor Design, Center For Advancing Electronics Dresden, Institute of Computer Engineering, Technische Universit\u00e4t Dresden , Dresden, Germany"},{"name":"Computer Engineering Department, Bilkent University , Ankara, Turkey"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hasan","family":"Hassan","sequence":"additional","affiliation":[{"name":"Computer Science Department, ETH Z\u00fcrich, Z\u00fcrich , Switzerland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Akash","family":"Kumar","sequence":"additional","affiliation":[{"name":"Chair for Processor Design, Center For Advancing Electronics Dresden, Institute of Computer Engineering, Technische Universit\u00e4t Dresden , Dresden, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Onur","family":"Mutlu","sequence":"additional","affiliation":[{"name":"Computer Science Department, ETH Z\u00fcrich, Z\u00fcrich , Switzerland"},{"name":"Computer Engineering Department, Bilkent University , Ankara, Turkey"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Can","family":"Alkan","sequence":"additional","affiliation":[{"name":"Computer Engineering Department, Bilkent University , Ankara, Turkey"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2019,3,28]]},"reference":[{"key":"2023062712451457600_btz234-B1","doi-asserted-by":"crossref","first-page":"56","DOI":"10.1038\/nature11632","article-title":"An integrated map of genetic variation from 1,092 human genomes","volume":"491","year":"2012","journal-title":"Nature"},{"key":"2023062712451457600_btz234-B2","doi-asserted-by":"crossref","first-page":"e41","DOI":"10.1093\/nar\/gkr1246","article-title":"Hobbes: optimized gram-based methods for efficient read alignment","volume":"40","author":"Ahmadi","year":"2012","journal-title":"Nucleic Acids Res"},{"key":"2023062712451457600_btz234-B3","doi-asserted-by":"crossref","first-page":"1202","DOI":"10.1109\/TCBB.2016.2586070","article-title":"A Survey of Software and Hardware Approaches to Performing Read Alignment in Next Generation Sequencing","volume":"14","author":"Al Kawam","year":"2017","journal-title":"IEEE\/ACM Trans. Comput. Biol. Bioinform"},{"key":"2023062712451457600_btz234-B4","doi-asserted-by":"crossref","first-page":"1061","DOI":"10.1038\/ng.437","article-title":"Personalized copy number and segmental duplication maps using next-generation sequencing","volume":"41","author":"Alkan","year":"2009","journal-title":"Nat. Genet"},{"key":"2023062712451457600_btz234-B5","doi-asserted-by":"crossref","first-page":"3355","DOI":"10.1093\/bioinformatics\/btx342","article-title":"GateKeeper: a new hardware architecture for accelerating pre-alignment in DNA short read mapping","volume":"33","author":"Alser","year":"2017","journal-title":"Bioinformatics"},{"key":"2023062712451457600_btz234-B6","first-page":"33","article-title":"MAGNET: understanding and improving the accuracy of genome pre-alignment filtering","volume":"13","author":"Alser","year":"2017","journal-title":"TIR"},{"key":"2023062712451457600_btz234-B7","doi-asserted-by":"crossref","first-page":"19","DOI":"10.1109\/MDAT.2013.2293757","article-title":"A review of hardware acceleration for computational genomics","volume":"31","author":"Aluru","year":"2014","journal-title":"IEEE Des. Test"},{"key":"2023062712451457600_btz234-B8","first-page":"51","article-title":"Edit distance cannot be computed in strongly subquadratic time (unless SETH is false)","author":"Backurs","year":"2017","journal-title":"Proceedings of the forty-seventh annual ACM symposium on Theory of computing"},{"key":"2023062712451457600_btz234-B9","first-page":"02657","article-title":"ASAP: accelerated short-read alignment on programmable hardware","volume":"1803","author":"Banerjee","year":"2018","journal-title":"arXiv"},{"key":"2023062712451457600_btz234-B10","first-page":"141","article-title":"Additive distances and quasi-distances between words","volume":"8","author":"Calude","year":"2002","journal-title":"J. Univers. Comput. Sci"},{"key":"2023062712451457600_btz234-B11","doi-asserted-by":"crossref","first-page":"840","DOI":"10.1109\/TCBB.2014.2326876","article-title":"Accelerating the next generation long read mapping with the FPGA-based system","volume":"11","author":"Chen","year":"2014","journal-title":"IEEE\/ACM Trans. Comput. Biol. Bioinform"},{"key":"2023062712451457600_btz234-B12","doi-asserted-by":"crossref","first-page":"29","DOI":"10.1109\/FCCM.2016.18","volume-title":"2016 IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)","author":"Chen","year":"2016"},{"key":"2023062712451457600_btz234-B13","doi-asserted-by":"crossref","first-page":"81.","DOI":"10.1186\/s12859-016-0930-z","article-title":"Parasail: SIMD C library for global, semi-global, and local pairwise sequence alignments","volume":"17","author":"Daily","year":"2016","journal-title":"BMC Bioinformatics"},{"key":"2023062712451457600_btz234-B14","doi-asserted-by":"crossref","first-page":"176","DOI":"10.1007\/s12539-017-0225-8","article-title":"FPGASW: accelerating Large-Scale Smith\u2013Waterman Sequence Alignment Application with Backtracking on FPGA Linear Systolic Array","volume":"10","author":"Fei","year":"2018","journal-title":"Interdiscip. Sci"},{"key":"2023062712451457600_btz234-B15","first-page":"1000106","article-title":"Accuracy of next generation sequencing platforms","volume":"1","author":"Fox","year":"2014","journal-title":"Next Gener. Seq. Appl"},{"key":"2023062712451457600_btz234-B16","first-page":"561","author":"Georganas","year":"2015"},{"key":"2023062712451457600_btz234-B17","doi-asserted-by":"crossref","first-page":"184.","DOI":"10.1186\/1471-2105-14-184","article-title":"Benchmarking short sequence mapping tools","volume":"14","author":"Hatem","year":"2013","journal-title":"BMC Bioinformatics"},{"key":"2023062712451457600_btz234-B18","doi-asserted-by":"crossref","first-page":"10915","DOI":"10.1073\/pnas.89.22.10915","article-title":"Amino acid substitution matrices from protein blocks","volume":"89","author":"Henikoff","year":"1992","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023062712451457600_btz234-B19","doi-asserted-by":"crossref","first-page":"50.","DOI":"10.1109\/MC.2007.79","article-title":"Achieving high performance with FPGA-based computing","volume":"40","author":"Herbordt","year":"2007","journal-title":"Computer"},{"key":"2023062712451457600_btz234-B20","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/2815631","article-title":"RIFFA 2.1: a Reusable Integration Framework for FPGA Accelerators","volume":"8","author":"Jacobsen","year":"2015","journal-title":"ACM TRETS"},{"key":"2023062712451457600_btz234-B21","doi-asserted-by":"crossref","first-page":"89","DOI":"10.1186\/s12864-018-4460-0","article-title":"GRIM-Filter: fast seed location filtering in DNA read mapping using processing-in-memory technologies","volume":"19","author":"Kim","year":"2018","journal-title":"BMC Genomics"},{"key":"2023062712451457600_btz234-B22","doi-asserted-by":"crossref","first-page":"37","DOI":"10.1109\/MC.1982.1653825","article-title":"Why systolic architectures?","volume":"15","author":"Kung","year":"1982","journal-title":"IEEE Comput"},{"key":"2023062712451457600_btz234-B23","first-page":"707","article-title":"Binary codes capable of correcting deletions, insertions, and reversals","volume":"10","author":"Levenshtein","year":"1966","journal-title":"Sov. Phys. Dokl"},{"key":"2023062712451457600_btz234-B24","first-page":"3997","article-title":"Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM","volume":"1303","author":"Li","year":"2013","journal-title":"arXiv"},{"key":"2023062712451457600_btz234-B25","doi-asserted-by":"crossref","first-page":"917","DOI":"10.1093\/bioinformatics\/btw659","article-title":"HiLive: real-time mapping of illumina reads while sequencing","volume":"33","author":"Lindner","year":"2016","journal-title":"Bioinformatics"},{"key":"2023062712451457600_btz234-B26","doi-asserted-by":"crossref","first-page":"1435","DOI":"10.1126\/science.2983426","article-title":"Rapid and sensitive protein similarity searches","volume":"227","author":"Lipman","year":"1985","journal-title":"Science"},{"key":"2023062712451457600_btz234-B27","doi-asserted-by":"crossref","first-page":"958","DOI":"10.1002\/cpe.3371","article-title":"GSWABE: faster GPU-accelerated sequence alignment with optimal alignment retrieval for short DNA sequences","volume":"27","author":"Liu","year":"2015","journal-title":"Concurr. Comput"},{"key":"2023062712451457600_btz234-B28","doi-asserted-by":"crossref","first-page":"117","DOI":"10.1186\/1471-2105-14-117","article-title":"CUDASW++ 3.0: accelerating Smith-Waterman protein database search by coupling CPU and GPU SIMD instructions","volume":"14","author":"Liu","year":"2013","journal-title":"BMC Bioinformatics"},{"key":"2023062712451457600_btz234-B29","doi-asserted-by":"crossref","first-page":"18","DOI":"10.1016\/0022-0000(80)90002-1","article-title":"A faster algorithm computing string edit distances","volume":"20","author":"Masek","year":"1980","journal-title":"J. Comput. Syst. Sci"},{"key":"2023062712451457600_btz234-B30","doi-asserted-by":"crossref","first-page":"1527","DOI":"10.1101\/gr.091868.109","article-title":"Sequence and structural variation in a human genome uncovered by short-read, massively parallel ligation sequencing using two-base encoding","volume":"19","author":"McKernan","year":"2009","journal-title":"Genome Res"},{"key":"2023062712451457600_btz234-B31","doi-asserted-by":"crossref","first-page":"31","DOI":"10.1145\/375360.375365","article-title":"A guided tour to approximate string matching","volume":"33","author":"Navarro","year":"2001","journal-title":"ACM Comput. Surv"},{"key":"2023062712451457600_btz234-B32","doi-asserted-by":"crossref","first-page":"443","DOI":"10.1016\/0022-2836(70)90057-4","article-title":"A general method applicable to the search for similarities in the amino acid sequence of two proteins","volume":"48","author":"Needleman","year":"1970","journal-title":"J. Mol. Biol"},{"key":"2023062712451457600_btz234-B33","first-page":"1","author":"Ng","year":"2017"},{"key":"2023062712451457600_btz234-B34","first-page":"932","author":"Nishimura","year":"2017"},{"key":"2023062712451457600_btz234-B35","first-page":"178","author":"Salinas","year":"2017"},{"key":"2023062712451457600_btz234-B36","doi-asserted-by":"crossref","first-page":"1.","DOI":"10.1145\/2893488","article-title":"Parallel optimal pairwise biological sequence comparison: algorithms, platforms, and classification","volume":"48","author":"Sandes","year":"2016","journal-title":"ACM Comput. Surv"},{"key":"2023062712451457600_btz234-B37","article-title":"Nanopore sequencing technology and tools for genome assembly: computational analysis of the current state, bottlenecks and future directions","author":"Senol","year":"2018","journal-title":"Brief. Bioinform"},{"key":"2023062712451457600_btz234-B38","first-page":"273","author":"Seshadri","year":"2017"},{"key":"2023062712451457600_btz234-B39","doi-asserted-by":"crossref","first-page":"195","DOI":"10.1016\/0022-2836(81)90087-5","article-title":"Identification of common molecular subsequences","volume":"147","author":"Smith","year":"1981","journal-title":"J. Mol. Biol"},{"key":"2023062712451457600_btz234-B40","doi-asserted-by":"crossref","first-page":"1394","DOI":"10.1093\/bioinformatics\/btw753","article-title":"Edlib: a C\/C++ library for fast, exact sequence alignment using edit distance","volume":"33","author":"\u0160o\u0161i\u0107","year":"2017","journal-title":"Bioinformatics"},{"key":"2023062712451457600_btz234-B41","doi-asserted-by":"crossref","first-page":"318","DOI":"10.1109\/JPROC.2015.2392104","article-title":"Three ages of FPGAs: a retrospective on the first thirty years of FPGA technology","volume":"103","author":"Trimberger","year":"2015","journal-title":"Proc. IEEE"},{"key":"2023062712451457600_btz234-B42","doi-asserted-by":"crossref","first-page":"100","DOI":"10.1016\/S0019-9958(85)80046-2","article-title":"Algorithms for approximate string matching","volume":"64","author":"Ukkonen","year":"1985","journal-title":"Inform. Control"},{"key":"2023062712451457600_btz234-B43","first-page":"1","author":"Waidyasooriya","year":"2015"},{"key":"2023062712451457600_btz234-B44","doi-asserted-by":"crossref","first-page":"308","DOI":"10.1016\/j.compbiolchem.2011.07.006","article-title":"Comparison of linear gap penalties and profile-based variable gap penalties in profile\u2013profile alignments","volume":"35","author":"Wang","year":"2011","journal-title":"Comput. Biol. Chem"},{"key":"2023062712451457600_btz234-B45","year":"2014"},{"key":"2023062712451457600_btz234-B46","doi-asserted-by":"crossref","first-page":"S13.","DOI":"10.1186\/1471-2164-14-S1-S13","article-title":"Accelerating read mapping with FastHASH","volume":"14","author":"Xin","year":"2013","journal-title":"BMC Genomics"},{"key":"2023062712451457600_btz234-B47","doi-asserted-by":"crossref","first-page":"1553","DOI":"10.1093\/bioinformatics\/btu856","article-title":"Shifted Hamming distance: a fast and accurate SIMD-friendly filter to accelerate alignment verification in read mapping","volume":"31","author":"Xin","year":"2015","journal-title":"Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btz234\/28533771\/btz234.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/35\/21\/4255\/50721703\/bioinformatics_35_21_4255.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/35\/21\/4255\/50721703\/bioinformatics_35_21_4255.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,6,27]],"date-time":"2023-06-27T12:46:24Z","timestamp":1687869984000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/35\/21\/4255\/5421509"}},"subtitle":[],"editor":[{"given":"Inanc","family":"Birol","sequence":"additional","affiliation":[],"role":[{"role":"editor","vocabulary":"crossref"}]}],"short-title":[],"issued":{"date-parts":[[2019,3,28]]},"references-count":47,"journal-issue":{"issue":"21","published-print":{"date-parts":[[2019,11,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btz234","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2019,11,1]]},"published":{"date-parts":[[2019,3,28]]}}}