{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,25]],"date-time":"2026-04-25T05:52:00Z","timestamp":1777096320625,"version":"3.51.4"},"reference-count":33,"publisher":"Oxford University Press (OUP)","issue":"7","license":[{"start":{"date-parts":[[2016,10,2]],"date-time":"2016-10-02T00:00:00Z","timestamp":1475366400000},"content-version":"vor","delay-in-days":2426,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/2.0\/uk\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2010,4,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Next-generation sequencing captures sequence differences in reads relative to a reference genome or transcriptome, including splicing events and complex variants involving multiple mismatches and long indels. We present computational methods for fast detection of complex variants and splicing in short reads, based on a successively constrained search process of merging and filtering position lists from a genomic index. Our methods are implemented in GSNAP (Genomic Short-read Nucleotide Alignment Program), which can align both single- and paired-end reads as short as 14 nt and of arbitrarily long length. It can detect short- and long-distance splicing, including interchromosomal splicing, in individual reads, using probabilistic models or a database of known splice sites. Our program also permits SNP-tolerant alignment to a reference space of all possible combinations of major and minor alleles, and can align reads from bisulfite-treated DNA for the study of methylation state.<\/jats:p>\n               <jats:p>Results: In comparison testing, GSNAP has speeds comparable to existing programs, especially in reads of \u226570 nt and is fastest in detecting complex variants with four or more mismatches or insertions of 1\u20139 nt and deletions of 1\u201330 nt. Although SNP tolerance does not increase alignment yield substantially, it affects alignment results in 7\u20138% of transcriptional reads, typically by revealing alternate genomic mappings for a read. Simulations of bisulfite-converted DNA show a decrease in identifying genomic positions uniquely in 6% of 36 nt reads and 3% of 70 nt reads.<\/jats:p>\n               <jats:p>Availability: Source code in C and utility programs in Perl are freely available for download as part of the GMAP package at http:\/\/share.gene.com\/gmap.<\/jats:p>\n               <jats:p>Contact: \u00a0twu@gene.com<\/jats:p>","DOI":"10.1093\/bioinformatics\/btq057","type":"journal-article","created":{"date-parts":[[2010,2,11]],"date-time":"2010-02-11T03:43:13Z","timestamp":1265859793000},"page":"873-881","source":"Crossref","is-referenced-by-count":1763,"title":["Fast and SNP-tolerant detection of complex variants and splicing in short reads"],"prefix":"10.1093","volume":"26","author":[{"given":"Thomas D.","family":"Wu","sequence":"first","affiliation":[{"name":"Department of Bioinformatics, Genentech, Inc., 1 DNA Way, South San Francisco, CA, USA"}]},{"given":"Serban","family":"Nacu","sequence":"additional","affiliation":[{"name":"Department of Bioinformatics, Genentech, Inc., 1 DNA Way, South San Francisco, CA, USA"}]}],"member":"286","published-online":{"date-parts":[[2010,2,10]]},"reference":[{"key":"2023012508033121200_B1","doi-asserted-by":"crossref","first-page":"59","DOI":"10.1093\/hmg\/ddi006","article-title":"Comprehensive identification and characterization of diallelic insertion-deletion polymorphisms in 330 human candidate genes","volume":"14","author":"Bhangale","year":"2005","journal-title":"Hum. Mol. Genet."},{"key":"2023012508033121200_B2","doi-asserted-by":"crossref","first-page":"i174","DOI":"10.1093\/bioinformatics\/btn300","article-title":"Optimal spliced alignments of short sequence reads","volume":"24","author":"Bona","year":"2008","journal-title":"Bioinformatics"},{"key":"2023012508033121200_B3","article-title":"A block-sorting lossless data compression algorithm","volume-title":"Technical Report 124.","author":"Burrows","year":"1994"},{"key":"2023012508033121200_B4","doi-asserted-by":"crossref","first-page":"1115","DOI":"10.1038\/nbt1236","article-title":"Evaluation of DNA microarray results with quantitative gene expression platforms","volume":"24","author":"Canales","year":"2006","journal-title":"Nat. Biotechnol."},{"issue":"suppl. 4","key":"2023012508033121200_B5","doi-asserted-by":"crossref","first-page":"16491","DOI":"10.1073\/pnas.162371599","article-title":"Locus-specific control of asymmetric and CpNpG methylation by the DRM and CMT3 methyltransferase genes","volume":"99","author":"Cao","year":"2002","journal-title":"Proc. Natl Acad. Sci."},{"key":"2023012508033121200_B6","doi-asserted-by":"crossref","first-page":"353","DOI":"10.1038\/nbt.1530","article-title":"Targeted bisulfite sequencing reveals changes in DNA methylation associated with nuclear reprogramming","volume":"27","author":"Deng","year":"2009","journal-title":"Nat. Biotechnol."},{"key":"2023012508033121200_B7","doi-asserted-by":"crossref","first-page":"215","DOI":"10.1016\/j.modgep.2003.08.006","article-title":"A conserved non-homeodomain Hoxa9 isoform interacting with CBP is co-expressed with the \u2018typical\u2019 Hoxa9 protein during embryogenesis","volume":"4","author":"Dintilhac","year":"2004","journal-title":"Gene Expression Patterns"},{"key":"2023012508033121200_B8","doi-asserted-by":"crossref","first-page":"481","DOI":"10.1146\/annurev.biochem.74.010904.153721","article-title":"Eukaryotic cytosine methyltransferases","volume":"74","author":"Goll","year":"2005","journal-title":"Annu. Rev. Biochem."},{"key":"2023012508033121200_B9","doi-asserted-by":"crossref","first-page":"167","DOI":"10.1101\/gr.080259.108","article-title":"A sequence-level map of chromosomal breakpoints in the MCF-7 breast cancer cell line yields insights into the evolution of a cancer genome","volume":"19","author":"Hampton","year":"2009","journal-title":"Genome Res."},{"key":"2023012508033121200_B10","doi-asserted-by":"crossref","first-page":"31","DOI":"10.1137\/0201004","article-title":"A simple algorithm for merging two disjoint linearly ordered sets","volume":"1","author":"Hwang","year":"1980","journal-title":"SIAM J. Comput."},{"key":"2023012508033121200_B11","doi-asserted-by":"crossref","first-page":"2395","DOI":"10.1093\/bioinformatics\/btn429","article-title":"SeqMap: mapping massive amount of oligonucleotides to the genome","volume":"24","author":"Jiang","year":"2008","journal-title":"Bioinformatics"},{"key":"2023012508033121200_B12","first-page":"656","article-title":"BLAT\u2014the BLAST-like alignment tool","volume":"12","author":"Kent","year":"2002","journal-title":"Genome Research"},{"key":"2023012508033121200_B13","volume-title":"The Art of Computer Programming: Sorting and Searching","author":"Knuth","year":"1973"},{"key":"2023012508033121200_B14","doi-asserted-by":"crossref","first-page":"R25","DOI":"10.1186\/gb-2009-10-3-r25","article-title":"Ultrafast and memory-efficient alignment of short dna sequences to the human genome","volume":"10","author":"Langmead","year":"2009","journal-title":"Genome Biology"},{"key":"2023012508033121200_B15","doi-asserted-by":"crossref","first-page":"1754","DOI":"10.1093\/bioinformatics\/btp324","article-title":"Fast and accurate short read alignment with Burrows-Wheeler Transform","volume":"25","author":"Li","year":"2009","journal-title":"Bioinformatics"},{"key":"2023012508033121200_B16","doi-asserted-by":"crossref","first-page":"1851","DOI":"10.1101\/gr.078212.108","article-title":"Mapping short DNA sequencing reads and calling variants using mapping quality scores","volume":"18","author":"Li","year":"2008","journal-title":"Genome Res."},{"key":"2023012508033121200_B17","doi-asserted-by":"crossref","first-page":"713","DOI":"10.1093\/bioinformatics\/btn025","article-title":"SOAP: short oligonucleotide alignment program","volume":"24","author":"Li","year":"2008","journal-title":"Bioinformatics"},{"key":"2023012508033121200_B18","doi-asserted-by":"crossref","first-page":"1966","DOI":"10.1093\/bioinformatics\/btp336","article-title":"SOAP2: an improved ultrafast tool for short read alignment","volume":"25","author":"Li","year":"2009","journal-title":"Bioinformatics"},{"key":"2023012508033121200_B19","doi-asserted-by":"crossref","first-page":"959","DOI":"10.1101\/gr.083451.108","article-title":"Finding the fifth base: Genome-wide sequencing of cytosine methylation","volume":"19","author":"Lister","year":"2009","journal-title":"Genome Research"},{"key":"2023012508033121200_B20","doi-asserted-by":"crossref","first-page":"935","DOI":"10.1137\/0222058","article-title":"Suffix arrays: a new method for on-line string searches","volume":"22","author":"Manber","year":"1993","journal-title":"SIAM J. Comput."},{"key":"2023012508033121200_B21","doi-asserted-by":"crossref","first-page":"2434","DOI":"10.1093\/bioinformatics\/btp403","article-title":"SNP-o-matic","volume":"25","author":"Manske","year":"2009","journal-title":"Bioinformatics"},{"key":"2023012508033121200_B22","doi-asserted-by":"crossref","first-page":"89","DOI":"10.1145\/1250734.1250746","article-title":"Valgrind: a framework for heavyweight dynamic binary instrumentation","volume-title":"Proceedings of the 2007 ACM SIGPLAN Conference on Programming Language Design and Implementation","author":"Nethercote","year":"2007"},{"key":"2023012508033121200_B23","doi-asserted-by":"crossref","first-page":"1725","DOI":"10.1101\/gr.194201","article-title":"SSAHA: a fast search method for large DNA databases","volume":"11","author":"Ning","year":"2001","journal-title":"Genome Res."},{"key":"2023012508033121200_B24","doi-asserted-by":"crossref","first-page":"296","DOI":"10.1089\/cmb.2006.13.296","article-title":"Efficient q-gram filters for finding all \u03b5-matches over a given length","volume":"13","author":"Rasmussen","year":"2006","journal-title":"J. Comput. Biol."},{"key":"2023012508033121200_B25","doi-asserted-by":"crossref","first-page":"e1000386","DOI":"10.1371\/journal.pcbi.1000386","article-title":"SHRiMP: accurate mapping of color-space reads","volume":"5","author":"Rumble","year":"2009","journal-title":"PLoS Comput. Biol."},{"key":"2023012508033121200_B26","doi-asserted-by":"crossref","first-page":"308","DOI":"10.1093\/nar\/29.1.308","article-title":"dbSNP: the NCBI database of genetic variation","volume":"29","author":"Sherry","year":"2001","journal-title":"Nucleic Acids Res."},{"key":"2023012508033121200_B27","doi-asserted-by":"crossref","first-page":"128","DOI":"10.1186\/1471-2105-9-128","article-title":"Using quality scores and longer reads improves accuracy of Solexa read mapping","volume":"9","author":"Smith","year":"2008","journal-title":"BMC Bioinformatics"},{"key":"2023012508033121200_B28","doi-asserted-by":"crossref","first-page":"1105","DOI":"10.1093\/bioinformatics\/btp120","article-title":"TopHat: discovering splice junctions with RNA-Seq","volume":"25","author":"Trapnell","year":"2009","journal-title":"Bioinformatics"},{"key":"2023012508033121200_B29","doi-asserted-by":"crossref","first-page":"470","DOI":"10.1038\/nature07509","article-title":"Alternative isoform regulation in human tissue transcriptomes","volume":"456","author":"Wang","year":"2008","journal-title":"Nature"},{"key":"2023012508033121200_B30","doi-asserted-by":"crossref","first-page":"854","DOI":"10.1086\/342727","article-title":"Human diallelic insertion\/deletion polymorphisms","volume":"71","author":"Weber","year":"2002","journal-title":"Am. J. Hum. Genet."},{"key":"2023012508033121200_B31","doi-asserted-by":"crossref","first-page":"1646","DOI":"10.1101\/gr.088823.108","article-title":"RazerS\u2014fast read mapping with sensitivity control","volume":"19","author":"Weese","year":"2009","journal-title":"Genome Res."},{"key":"2023012508033121200_B32","doi-asserted-by":"crossref","first-page":"1859","DOI":"10.1093\/bioinformatics\/bti310","article-title":"GMAP: a genomic mapping and alignment program for mRNA and EST sequences","volume":"21","author":"Wu","year":"2005","journal-title":"Bioinformatics"},{"key":"2023012508033121200_B33","doi-asserted-by":"crossref","first-page":"377","DOI":"10.1089\/1066527041410418","article-title":"Maximum entropy modeling of short sequence motifs with applications to RNA splicing signals","volume":"11","author":"Yeo","year":"2004","journal-title":"J. Comput. Biol."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/26\/7\/873\/48855440\/bioinformatics_26_7_873.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/26\/7\/873\/48855440\/bioinformatics_26_7_873.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,25]],"date-time":"2023-01-25T08:06:34Z","timestamp":1674633994000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/26\/7\/873\/212606"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2010,2,10]]},"references-count":33,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2010,4,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btq057","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2010,4,1]]},"published":{"date-parts":[[2010,2,10]]}}}