{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,15]],"date-time":"2026-04-15T18:44:46Z","timestamp":1776278686961,"version":"3.50.1"},"reference-count":32,"publisher":"Oxford University Press (OUP)","issue":"20","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2011,10,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: The advent of next-generation sequencing (NGS) techniques presents many novel opportunities for many applications in life sciences. The vast number of short reads produced by these techniques, however, pose significant computational challenges. The first step in many types of genomic analysis is the mapping of short reads to a reference genome, and several groups have developed dedicated algorithms and software packages to perform this function. As the developers of these packages optimize their algorithms with respect to various considerations, the relative merits of different software packages remain unclear. However, for scientists who generate and use NGS data for their specific research projects, an important consideration is choosing the software that is most suitable for their application.<\/jats:p>\n               <jats:p>Results: With a view to comparing existing short read alignment software, we develop a simulation and evaluation suite, Seal, which simulates NGS runs for different configurations of various factors, including sequencing error, indels and coverage. We also develop criteria to compare the performances of software with disparate output structure (e.g. some packages return a single alignment while some return multiple possible alignments). Using these criteria, we comprehensively evaluate the performances of Bowtie, BWA, mr- and mrsFAST, Novoalign, SHRiMP and SOAPv2, with regard to accuracy and runtime.<\/jats:p>\n               <jats:p>Conclusion: We expect that the results presented here will be useful to investigators in choosing the alignment software that is most suitable for their specific research aims. Our results also provide insights into the factors that should be considered to use alignment results effectively. Seal can also be used to evaluate the performance of algorithms that use deep sequencing data for various purposes (e.g. identification of genomic variants).<\/jats:p>\n               <jats:p>Availability: \u00a0Seal is available as open source at http:\/\/compbio.case.edu\/seal\/.<\/jats:p>\n               <jats:p>Contact: \u00a0matthew.ruffalo@case.edu<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btr477","type":"journal-article","created":{"date-parts":[[2011,8,20]],"date-time":"2011-08-20T01:14:59Z","timestamp":1313802899000},"page":"2790-2796","source":"Crossref","is-referenced-by-count":178,"title":["Comparative analysis of algorithms for next-generation sequencing read alignment"],"prefix":"10.1093","volume":"27","author":[{"given":"Matthew","family":"Ruffalo","sequence":"first","affiliation":[{"name":"1 Department of Electrical Engineering and Computer Science, 2Department of Genetics and 3Center for Proteomics and Bioinformatics, Case Western Reserve University, Cleveland, OH 44106, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Thomas","family":"LaFramboise","sequence":"additional","affiliation":[{"name":"1 Department of Electrical Engineering and Computer Science, 2Department of Genetics and 3Center for Proteomics and Bioinformatics, Case Western Reserve University, Cleveland, OH 44106, USA"},{"name":"1 Department of Electrical Engineering and Computer Science, 2Department of Genetics and 3Center for Proteomics and Bioinformatics, Case Western Reserve University, Cleveland, OH 44106, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mehmet","family":"Koyut\u00fcrk","sequence":"additional","affiliation":[{"name":"1 Department of Electrical Engineering and Computer Science, 2Department of Genetics and 3Center for Proteomics and Bioinformatics, Case Western Reserve University, Cleveland, OH 44106, USA"},{"name":"1 Department of Electrical Engineering and Computer Science, 2Department of Genetics and 3Center for Proteomics and Bioinformatics, Case Western Reserve University, Cleveland, OH 44106, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2011,8,19]]},"reference":[{"key":"2023012512013126000_B1","doi-asserted-by":"crossref","first-page":"1061","DOI":"10.1038\/ng.437","article-title":"Personalized copy number and segmental duplication maps using next-generation sequencing","volume":"41","author":"Alkan","year":"2009","journal-title":"Nat. Genet."},{"key":"2023012512013126000_B2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/gm2","article-title":"Systems medicine: the future of medical genomics and healthcare","volume":"1","author":"Auffray","year":"2009","journal-title":"Genome Med."},{"key":"2023012512013126000_B3","doi-asserted-by":"crossref","first-page":"1003","DOI":"10.1126\/science.1072047","article-title":"Recent segmental duplications in the human genome","volume":"297","author":"Bailey","year":"2002","journal-title":"Science"},{"key":"2023012512013126000_B4","volume-title":"A block-sorting lossless data compression algorithm.","author":"Burrows","year":"1994"},{"key":"2023012512013126000_B5","first-page":"56","article-title":"FLASH: a fast look-up algorithm for string homology","volume-title":"Proceedings of the 1st International Conference on Intelligent Systems for Molecular Biology","author":"Califano","year":"1993"},{"key":"2023012512013126000_B6","doi-asserted-by":"crossref","first-page":"R25","DOI":"10.1186\/gb-2003-4-4-r25","article-title":"Genome-wide detection of segmental duplications and potential assembly errors in the human genome sequence","volume":"4","author":"Cheung","year":"2003","journal-title":"Genome Biol."},{"key":"2023012512013126000_B7","doi-asserted-by":"crossref","first-page":"186","DOI":"10.1101\/gr.8.3.186","article-title":"Base-calling of automated sequencer traces using Phred. II. error probabilities","volume":"8","author":"Ewing","year":"1998","journal-title":"Genome Res."},{"key":"2023012512013126000_B8","first-page":"390","article-title":"Opportunistic data structures with applications","volume-title":"Proceedings of the 41st Symposium on Foundations of Computer Science (FOCS 2000)","author":"Ferragina","year":"2000"},{"key":"2023012512013126000_B9","doi-asserted-by":"crossref","first-page":"163","DOI":"10.1186\/1471-2164-10-163","article-title":"A transcriptional sketch of a primary human breast cancer by 454 deep sequencing","volume":"10","author":"Guffanti","year":"2009","journal-title":"BMC Genomics"},{"key":"2023012512013126000_B10","doi-asserted-by":"crossref","first-page":"576","DOI":"10.1038\/nmeth0810-576","article-title":"mrsFAST: a cache-oblivious algorithm for short-read mapping","volume":"7","author":"Hach","year":"2010","journal-title":"Nat. Methods"},{"key":"2023012512013126000_B11","doi-asserted-by":"crossref","first-page":"181","DOI":"10.1093\/bib\/bbp046","article-title":"Bioinformatics approaches for genomics and post genomics applications of next-generation sequencing","volume":"11","author":"Horner","year":"2010","journal-title":"Brief. Bioinformatics"},{"key":"2023012512013126000_B12","author":"Illumina","year":"2010","journal-title":"Quality scores data."},{"key":"2023012512013126000_B13","doi-asserted-by":"crossref","first-page":"860","DOI":"10.1038\/35057062","article-title":"Initial sequencing and analysis of the human genome","volume":"409","author":"International Human Genome Sequencing Consortium","year":"2001","journal-title":"Nature"},{"key":"2023012512013126000_B14","doi-asserted-by":"crossref","first-page":"R25","DOI":"10.1186\/gb-2009-10-3-r25","article-title":"Ultrafast and memory-efficient alignment of short DNA sequences to the human genome","volume":"10","author":"Langmead","year":"2009","journal-title":"Genome Biol."},{"key":"2023012512013126000_B15","doi-asserted-by":"crossref","first-page":"1754","DOI":"10.1093\/bioinformatics\/btp324","article-title":"Fast and accurate short read alignment with Burrows-Wheeler transform","volume":"25","author":"Li","year":"2009","journal-title":"Bioinformatics"},{"key":"2023012512013126000_B16","doi-asserted-by":"crossref","first-page":"589","DOI":"10.1093\/bioinformatics\/btp698","article-title":"Fast and accurate long-read alignment with Burrows-Wheeler transform","volume":"26","author":"Li","year":"2010","journal-title":"Bioinformatics"},{"key":"2023012512013126000_B17","doi-asserted-by":"crossref","first-page":"473","DOI":"10.1093\/bib\/bbq015","article-title":"A survey of sequence alignment algorithms for next-generation sequencing","volume":"11","author":"Li","year":"2010","journal-title":"Brief. Bioinformatics"},{"key":"2023012512013126000_B18","doi-asserted-by":"crossref","first-page":"1851","DOI":"10.1101\/gr.078212.108","article-title":"Mapping short DNA sequencing reads and calling variants using mapping quality scores","volume":"18","author":"Li","year":"2008","journal-title":"Genome Res."},{"key":"2023012512013126000_B19","doi-asserted-by":"crossref","first-page":"713","DOI":"10.1093\/bioinformatics\/btn025","article-title":"SOAP: short oligonucleotide alignment program","volume":"24","author":"Li","year":"2008","journal-title":"Bioinformatics"},{"key":"2023012512013126000_B20","doi-asserted-by":"crossref","first-page":"1966","DOI":"10.1093\/bioinformatics\/btp336","article-title":"SOAP2: an improved ultrafast tool for short read alignment","volume":"25","author":"Li","year":"2009","journal-title":"Bioinformatics"},{"key":"2023012512013126000_B21","doi-asserted-by":"crossref","first-page":"S13","DOI":"10.1038\/nmeth.1374","article-title":"Computational methods for discovering structural variation with next-generation sequencing","volume":"6","author":"Medvedev","year":"2009","journal-title":"Nat. Methods"},{"key":"2023012512013126000_B22","doi-asserted-by":"crossref","first-page":"315","DOI":"10.1016\/j.ygeno.2010.03.001","article-title":"Assembly algorithms for next-generation sequencing data","volume":"95","author":"Miller","year":"2010","journal-title":"Genomics"},{"key":"2023012512013126000_B23"},{"key":"2023012512013126000_B24","doi-asserted-by":"crossref","first-page":"59","DOI":"10.1038\/nature08821","article-title":"A human gut microbial gene catalogue established by metagenomic sequencing","volume":"464","author":"Qin","year":"2010","journal-title":"Nature"},{"key":"2023012512013126000_B25","doi-asserted-by":"crossref","first-page":"189","DOI":"10.1007\/11415770_15","article-title":"Efficient q-gram filters for finding all e-matches over a given length","volume-title":"Research in Computational Molecular Biology","author":"Rasmussen","year":"2005"},{"issue":"Suppl. 1","key":"2023012512013126000_B26","doi-asserted-by":"crossref","first-page":"D613","DOI":"10.1093\/nar\/gkp939","article-title":"The UCSC Genome Browser database: update 2010","volume":"38","author":"Rhead","year":"2010","journal-title":"Nucleic Acids Res."},{"key":"2023012512013126000_B27","doi-asserted-by":"crossref","first-page":"e1000386","DOI":"10.1371\/journal.pcbi.1000386","article-title":"Shrimp: accurate mapping of short color-space reads","volume":"5","author":"Rumble","year":"2009","journal-title":"PLoS Comput. Biol."},{"key":"2023012512013126000_B28","doi-asserted-by":"crossref","first-page":"16","DOI":"10.1038\/nmeth1156","article-title":"Next-generation sequencing transforms today's biology","volume":"5","author":"Schuster","year":"2007","journal-title":"Nat. Methods"},{"key":"2023012512013126000_B29","doi-asserted-by":"crossref","first-page":"195","DOI":"10.1016\/0022-2836(81)90087-5","article-title":"Identification of common molecular subsequences","volume":"147","author":"Smith","year":"1981","journal-title":"J. Mol. Biol."},{"key":"2023012512013126000_B30","doi-asserted-by":"crossref","first-page":"956","DOI":"10.1126\/science.1160342","article-title":"A global view of gene activity and alternative splicing by deep sequencing of the human transcriptome","volume":"321","author":"Sultan","year":"2008","journal-title":"Science"},{"key":"2023012512013126000_B31","doi-asserted-by":"crossref","first-page":"8511","DOI":"10.1158\/0008-5472.CAN-07-1016","article-title":"Ultradeep bisulfite sequencing analysis of DNA methylation patterns in multiple gene promoters by 454 sequencing","volume":"67","author":"Taylor","year":"2007","journal-title":"Cancer Res."},{"key":"2023012512013126000_B32","doi-asserted-by":"crossref","first-page":"247","DOI":"10.1038\/nmeth.1185","article-title":"SNP discovery and allele frequency estimation by deep sequencing of reduced representation libraries","volume":"5","author":"Van Tassell","year":"2008","journal-title":"Nat. Methods"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/27\/20\/2790\/48872412\/bioinformatics_27_20_2790.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/27\/20\/2790\/48872412\/bioinformatics_27_20_2790.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,25]],"date-time":"2023-01-25T14:08:36Z","timestamp":1674655716000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/27\/20\/2790\/201940"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2011,8,19]]},"references-count":32,"journal-issue":{"issue":"20","published-print":{"date-parts":[[2011,10,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btr477","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2011,10,15]]},"published":{"date-parts":[[2011,8,19]]}}}