{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,9]],"date-time":"2026-01-09T18:41:16Z","timestamp":1767984076575,"version":"3.49.0"},"reference-count":36,"publisher":"Oxford University Press (OUP)","issue":"8","license":[{"start":{"date-parts":[[2022,2,9]],"date-time":"2022-02-09T00:00:00Z","timestamp":1644364800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["R01-HG007196"],"award-info":[{"award-number":["R01-HG007196"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["R01-HL129239"],"award-info":[{"award-number":["R01-HL129239"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Johns Hopkins Department of Physics and Astronomy"},{"DOI":"10.13039\/100015503","name":"Lieber Institute for Brain Development","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100015503","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Extreme Science and Engineering Discovery Environment"},{"name":"UCSD Expanse and Purdue Anvil, XSEDE","award":["CCR190056"],"award-info":[{"award-number":["CCR190056"]}]},{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["ACI-154856"],"award-info":[{"award-number":["ACI-154856"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,4,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Summary<\/jats:title><jats:p>Over the past decade, short-read sequence alignment has become a mature technology. Optimized algorithms, careful software engineering and high-speed hardware have contributed to greatly increased throughput and accuracy. With these improvements, many opportunities for performance optimization have emerged. In this review, we examine three general-purpose short-read alignment tools\u2014BWA-MEM, Bowtie 2 and Arioc\u2014with a focus on performance optimization. We analyze the performance-related behavior of the algorithms and heuristics each tool implements, with the goal of arriving at practical methods of improving processing speed and accuracy. We indicate where an aligner's default behavior may result in suboptimal performance, explore the effects of computational constraints such as end-to-end mapping and alignment scoring threshold, and discuss sources of imprecision in the computation of alignment scores and mapping quality. With this perspective, we describe an approach to tuning short-read aligner performance to meet specific data-analysis and throughput requirements while avoiding potential inaccuracies in subsequent analysis of alignment results. Finally, we illustrate how this approach avoids easily overlooked pitfalls and leads to verifiable improvements in alignment speed and accuracy.<\/jats:p><\/jats:sec><jats:sec><jats:title>Contact<\/jats:title><jats:p>richard.wilton@jhu.edu<\/jats:p><\/jats:sec><jats:sec><jats:title>Supplementary information<\/jats:title><jats:p>Appendices referenced in this article are available at Bioinformatics online.<\/jats:p><\/jats:sec>","DOI":"10.1093\/bioinformatics\/btac066","type":"journal-article","created":{"date-parts":[[2022,2,2]],"date-time":"2022-02-02T04:12:43Z","timestamp":1643775163000},"page":"2081-2087","source":"Crossref","is-referenced-by-count":15,"title":["Performance optimization in DNA short-read alignment"],"prefix":"10.1093","volume":"38","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1263-5532","authenticated-orcid":false,"given":"Richard","family":"Wilton","sequence":"first","affiliation":[{"name":"Department of Physics and Astronomy, Johns Hopkins University , Baltimore, MD 21218, USA"}]},{"given":"Alexander S","family":"Szalay","sequence":"additional","affiliation":[{"name":"Department of Physics and Astronomy, Johns Hopkins University , Baltimore, MD 21218, USA"},{"name":"Department of Computer Science, Johns Hopkins University , Baltimore, MD 21218, USA"}]}],"member":"286","published-online":{"date-parts":[[2022,2,9]]},"reference":[{"key":"2023033101030355200_","year":"2022"},{"key":"2023033101030355200_","year":"2022"},{"key":"2023033101030355200_","year":"2022"},{"key":"2023033101030355200_","year":"2017"},{"key":"2023033101030355200_","doi-asserted-by":"crossref","first-page":"2114","DOI":"10.1093\/bioinformatics\/btu170","article-title":"Trimmomatic: a flexible trimmer for Illumina sequence data","volume":"30","author":"Bolger","year":"2014","journal-title":"Bioinformatics"},{"key":"2023033101030355200_","doi-asserted-by":"crossref","first-page":"436","DOI":"10.1109\/JPROC.2015.2455551","article-title":"Short read mapping: an algorithmic tour","volume":"105","author":"Canzar","year":"2017","journal-title":"Proc. IEEE"},{"key":"2023033101030355200_","volume-title":"How to Write Parallel Programs: A First Course","author":"Carriero","year":"1990"},{"key":"2023033101030355200_","first-page":"216","author":"Chow","year":"1991"},{"key":"2023033101030355200_","doi-asserted-by":"crossref","first-page":"1767","DOI":"10.1093\/nar\/gkp1137","article-title":"The Sanger FASTQ file format for sequences with quality scores, and the Solexa\/Illumina FASTQ variants","volume":"38","author":"Cock","year":"2010","journal-title":"Nucleic Acids Res"},{"key":"2023033101030355200_","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1093\/gigascience\/giab008","article-title":"Twelve years of SAMtools and BCFtools","volume":"10","author":"Danecek","year":"2021","journal-title":"GigaScience"},{"key":"2023033101030355200_","doi-asserted-by":"crossref","first-page":"3712","DOI":"10.1093\/bioinformatics\/btaa265","article-title":"Vargas: heuristic-free alignment for assessing linear and graph read aligners","volume":"36","author":"Darby","year":"2020","journal-title":"Bioinformatics"},{"key":"2023033101030355200_","first-page":"390","author":"Ferragina","year":"2000"},{"key":"2023033101030355200_","doi-asserted-by":"crossref","first-page":"705","DOI":"10.1016\/0022-2836(82)90398-9","article-title":"An improved algorithm for matching biological sequences","volume":"162","author":"Gotoh","year":"1982","journal-title":"J. Mol. Biol"},{"key":"2023033101030355200_","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511574931","volume-title":"Algorithms on Strings, Trees, and Sequences","author":"Gusfield","year":"1997"},{"key":"2023033101030355200_","author":"Holtgrewe","year":"2010"},{"key":"2023033101030355200_","year":"2021"},{"key":"2023033101030355200_","doi-asserted-by":"crossref","first-page":"357","DOI":"10.1038\/nmeth.1923","article-title":"Fast gapped-read alignment with Bowtie 2","volume":"9","author":"Langmead","year":"2012","journal-title":"Nat. Methods"},{"key":"2023033101030355200_","doi-asserted-by":"crossref","first-page":"421","DOI":"10.1093\/bioinformatics\/bty648","article-title":"Scaling read aligners to hundreds of threads on general-purpose processors","volume":"35","author":"Langmead","year":"2019","journal-title":"Bioinformatics"},{"key":"2023033101030355200_","doi-asserted-by":"crossref","first-page":"1838","DOI":"10.1093\/bioinformatics\/bts280","article-title":"Exploring single-sample SNP and INDEL calling with whole-genome de novo assembly","volume":"28","author":"Li","year":"2012","journal-title":"Bioinformatics"},{"key":"2023033101030355200_","author":"Li","year":"2013"},{"key":"2023033101030355200_","doi-asserted-by":"crossref","first-page":"473","DOI":"10.1093\/bib\/bbq015","article-title":"A survey of sequence alignment algorithms for next-generation sequencing","volume":"11","author":"Li","year":"2010","journal-title":"Brief. Bioinf"},{"key":"2023033101030355200_","doi-asserted-by":"crossref","first-page":"2078","DOI":"10.1093\/bioinformatics\/btp352","article-title":"The sequence alignment\/map (SAM) format and SAMtools","volume":"25","author":"Li","year":"2009","journal-title":"Bioinformatics"},{"key":"2023033101030355200_","doi-asserted-by":"crossref","first-page":"2431","DOI":"10.1093\/bioinformatics\/btn416","article-title":"ZOOM! Zillions of oligos mapped","volume":"24","author":"Lin","year":"2008","journal-title":"Bioinformatics"},{"key":"2023033101030355200_","doi-asserted-by":"crossref","first-page":"117","DOI":"10.1186\/1471-2105-14-117","article-title":"CUDASW++ 3.0: accelerating Smith-Waterman protein database search by coupling CPU and GPU SIMD instructions","volume":"14","author":"Liu","year":"2013","journal-title":"BMC Bioinformatics"},{"key":"2023033101030355200_","first-page":"314","author":"Md","year":"2019"},{"key":"2023033101030355200_","year":"2021"},{"key":"2023033101030355200_","doi-asserted-by":"crossref","first-page":"133","DOI":"10.1146\/annurev-genom-090413-025358","article-title":"Alignment of next-generation sequencing reads","volume":"16","author":"Reinert","year":"2015","journal-title":"Annu. Rev. Genomics Hum. Genet"},{"key":"2023033101030355200_","year":"2021"},{"key":"2023033101030355200_","doi-asserted-by":"crossref","first-page":"195","DOI":"10.1016\/0022-2836(81)90087-5","article-title":"Identification of common molecular subsequences","volume":"147","author":"Smith","year":"1981","journal-title":"J. Mol. Biol"},{"key":"2023033101030355200_","doi-asserted-by":"crossref","first-page":"e1002195","DOI":"10.1371\/journal.pbio.1002195","article-title":"Big data: astronomical or genomical?","volume":"13","author":"Stephens","year":"2015","journal-title":"PLoS Biol"},{"key":"2023033101030355200_","doi-asserted-by":"crossref","first-page":"lqab019","DOI":"10.1093\/nargab\/lqab019","article-title":"Sequencing error profiles of Illumina sequencing instruments","volume":"3","author":"Stoler","year":"2021","journal-title":"NAR Genomics Bioinf"},{"key":"2023033101030355200_","doi-asserted-by":"crossref","first-page":"2032","DOI":"10.1093\/bioinformatics\/btv098","article-title":"Sambamba: fast processing of NGS alignment formats","volume":"31","author":"Tarasov","year":"2015","journal-title":"Bioinformatics"},{"key":"2023033101030355200_","doi-asserted-by":"crossref","first-page":"2150","DOI":"10.1002\/pro.3954","article-title":"Substitution scoring matrices for proteins \u2013 an overview","volume":"29","author":"Trivedi","year":"2020","journal-title":"Protein Sci"},{"key":"2023033101030355200_","year":"2022"},{"key":"2023033101030355200_","doi-asserted-by":"crossref","first-page":"e1008383","DOI":"10.1371\/journal.pcbi.1008383","article-title":"Arioc: high-concurrency short-read alignment on multiple GPUs","volume":"16","author":"Wilton","year":"2020","journal-title":"PLoS Comput. Biol"},{"key":"2023033101030355200_","doi-asserted-by":"crossref","first-page":"e808","DOI":"10.7717\/peerj.808","article-title":"Arioc: high-throughput read alignment with GPU-accelerated exploration of the seed-and-extend search space","volume":"3","author":"Wilton","year":"2015","journal-title":"PeerJ"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btac066\/42599082\/btac066.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/8\/2081\/49692956\/btac066.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/8\/2081\/49692956\/btac066.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,16]],"date-time":"2023-11-16T19:09:38Z","timestamp":1700161778000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/38\/8\/2081\/6525215"}},"subtitle":[],"editor":[{"given":"Jonathan","family":"Wren","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2022,2,9]]},"references-count":36,"journal-issue":{"issue":"8","published-print":{"date-parts":[[2022,4,12]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btac066","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2022,4,15]]},"published":{"date-parts":[[2022,2,9]]}}}