{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,16]],"date-time":"2026-04-16T21:54:26Z","timestamp":1776376466501,"version":"3.51.2"},"reference-count":20,"publisher":"Oxford University Press (OUP)","issue":"9","license":[{"start":{"date-parts":[[2016,10,2]],"date-time":"2016-10-02T00:00:00Z","timestamp":1475366400000},"content-version":"vor","delay-in-days":2757,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/2.0\/uk\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2009,5,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: A new protocol for sequencing the messenger RNA in a cell, known as RNA-Seq, generates millions of short sequence fragments in a single run. These fragments, or \u2018reads\u2019, can be used to measure levels of gene expression and to identify novel splice variants of genes. However, current software for aligning RNA-Seq data to a genome relies on known splice junctions and cannot identify novel ones. TopHat is an efficient read-mapping algorithm designed to align reads from an RNA-Seq experiment to a reference genome without relying on known splice sites.<\/jats:p>\n               <jats:p>Results: We mapped the RNA-Seq reads from a recent mammalian RNA-Seq experiment and recovered more than 72% of the splice junctions reported by the annotation-based software from that study, along with nearly 20 000 previously unreported junctions. The TopHat pipeline is much faster than previous systems, mapping nearly 2.2 million reads per CPU hour, which is sufficient to process an entire RNA-Seq experiment in less than a day on a standard desktop computer. 
We describe several challenges unique to ab initio splice site discovery from RNA-Seq reads that will require further algorithm development.<\/jats:p>\n               <jats:p>Availability: TopHat is free, open-source software available from http:\/\/tophat.cbcb.umd.edu<\/jats:p>\n               <jats:p>Contact: cole@cs.umd.edu<\/jats:p>\n               <jats:p>Supplementary information: Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btp120","type":"journal-article","created":{"date-parts":[[2009,3,17]],"date-time":"2009-03-17T00:34:13Z","timestamp":1237250053000},"page":"1105-1111","source":"Crossref","is-referenced-by-count":10467,"title":["TopHat: discovering splice junctions with RNA-Seq"],"prefix":"10.1093","volume":"25","author":[{"given":"Cole","family":"Trapnell","sequence":"first","affiliation":[{"name":"Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742 and Department of Mathematics, University of California, Berkeley, CA 94720, USA"}]},{"given":"Lior","family":"Pachter","sequence":"additional","affiliation":[{"name":"Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742 and Department of Mathematics, University of California, Berkeley, CA 94720, USA"}]},{"given":"Steven L.","family":"Salzberg","sequence":"additional","affiliation":[{"name":"Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742 and Department of Mathematics, University of California, Berkeley, CA 94720, USA"}]}],"member":"286","published-online":{"date-parts":[[2009,3,16]]},"reference":[{"key":"2023013110280858900_B1","doi-asserted-by":"crossref","first-page":"53","DOI":"10.1016\/S1570-8667(03)00065-0","article-title":"Replacing suffix trees with enhanced suffix arrays","volume":"2","author":"Abouelhoda","year":"2004","journal-title":"J.
Discrete Alg."},{"key":"2023013110280858900_B2","doi-asserted-by":"crossref","first-page":"373","DOI":"10.1038\/ng0893-373","article-title":"Rapid cDNA sequencing (expressed sequence tags) from a directionally cloned human infant brain cDNA library","volume":"4","author":"Adams","year":"1993","journal-title":"Nat. Genet."},{"key":"2023013110280858900_B3","article-title":"A block sorting lossless data compression algorithm","volume-title":"Technical Report 124.","author":"Burrows","year":"1994"},{"key":"2023013110280858900_B4","doi-asserted-by":"crossref","first-page":"613","DOI":"10.1038\/nmeth.1223","article-title":"Stem cell transcriptome profiling via massive-scale mRNA sequencing","volume":"5","author":"Cloonan","year":"2008","journal-title":"Nat. Meth."},{"key":"2023013110280858900_B5","doi-asserted-by":"crossref","first-page":"i174","DOI":"10.1093\/bioinformatics\/btn300","article-title":"Optimal spliced alignments of short sequence reads","volume":"24","author":"De Bona","year":"2008","journal-title":"Bioinformatics"},{"key":"2023013110280858900_B6","doi-asserted-by":"crossref","first-page":"11","DOI":"10.1186\/1471-2105-9-11","article-title":"SeqAn: an efficient, generic C++ library for sequence analysis","volume":"9","author":"D\u00f6ring","year":"2008","journal-title":"BMC Bioinformatics"},{"key":"2023013110280858900_B7","first-page":"269","article-title":"An experimental study of an opportunistic index","volume-title":"Proceedings of the Twelfth Annual ACM-SIAM Symposium on Discrete Algorithms.","author":"Ferragina","year":"2001"},{"key":"2023013110280858900_B8","doi-asserted-by":"crossref","first-page":"183","DOI":"10.1038\/nmeth.1179","article-title":"Whole-genome sequencing and variant discovery in C. elegans","volume":"5","author":"Hillier","year":"2008","journal-title":"Nat.
Meth."},{"key":"2023013110280858900_B9","first-page":"656","article-title":"BLAT\u2014the BLAST-like alignment tool","volume":"12","author":"Kent","year":"2002","journal-title":"Genome Res."},{"key":"2023013110280858900_B10","doi-asserted-by":"crossref","first-page":"R25","DOI":"10.1186\/gb-2009-10-3-r25","article-title":"Ultrafast and memory-efficient alignment of short DNA sequences to the human genome","volume":"10","author":"Langmead","year":"2009","journal-title":"Genome Biol."},{"key":"2023013110280858900_B11","doi-asserted-by":"crossref","first-page":"169","DOI":"10.1186\/1471-2105-7-169","article-title":"AltTrans: transcript pattern variants annotated for both alternative splicing and alternative polyadenylation","volume":"7","author":"Le Texier","year":"2006","journal-title":"BMC Bioinformatics"},{"key":"2023013110280858900_B12","doi-asserted-by":"crossref","first-page":"1851","DOI":"10.1101\/gr.078212.108","article-title":"Mapping short DNA sequencing reads and calling variants using mapping quality scores","volume":"18","author":"Li","year":"2008","journal-title":"Genome Res."},{"key":"2023013110280858900_B13","doi-asserted-by":"crossref","first-page":"1509","DOI":"10.1101\/gr.079558.108","article-title":"RNA-Seq: an assessment of technical reproducibility and comparison with gene expression arrays","volume":"18","author":"Marioni","year":"2008","journal-title":"Genome Res."},{"key":"2023013110280858900_B14","doi-asserted-by":"crossref","first-page":"12856","DOI":"10.1073\/pnas.93.23.12856","article-title":"Isolation of a brefeldin A-inhibited guanine nucleotide-exchange protein for ADP ribosylation factor (ARF) 1 and ARF3 that contains a Sec7-like domain","volume":"93","author":"Morinaga","year":"1996","journal-title":"Proc. Natl Acad. Sci.
USA"},{"key":"2023013110280858900_B15","doi-asserted-by":"crossref","first-page":"621","DOI":"10.1038\/nmeth.1226","article-title":"Mapping and quantifying mammalian transcriptomes by RNA-Seq","volume":"5","author":"Mortazavi","year":"2008","journal-title":"Nat. Meth."},{"key":"2023013110280858900_B16","doi-asserted-by":"crossref","first-page":"20","DOI":"10.1016\/j.tig.2006.10.003","article-title":"Intron size in mammals: complexity comes to terms with economy","volume":"23","author":"Pozzoli","year":"2007","journal-title":"Trends Genet."},{"key":"2023013110280858900_B17","doi-asserted-by":"crossref","first-page":"956","DOI":"10.1126\/science.1160342","article-title":"A global view of gene activity and alternative splicing by deep sequencing of the human transcriptome","volume":"321","author":"Sultan","year":"2008","journal-title":"Science"},{"key":"2023013110280858900_B18","doi-asserted-by":"crossref","first-page":"470","DOI":"10.1038\/nature07509","article-title":"Alternative isoform regulation in human tissue transcriptomes","volume":"456","author":"Wang","year":"2008","journal-title":"Nature"},{"key":"2023013110280858900_B19","doi-asserted-by":"crossref","first-page":"1859","DOI":"10.1093\/bioinformatics\/bti310","article-title":"GMAP: a genomic mapping and alignment program for mRNA and EST sequences","volume":"21","author":"Wu","year":"2005","journal-title":"Bioinformatics"},{"key":"2023013110280858900_B20","doi-asserted-by":"crossref","first-page":"821","DOI":"10.1101\/gr.074492.107","article-title":"Velvet: algorithms for de novo short read assembly using de Bruijn graphs","volume":"18","author":"Zerbino","year":"2008","journal-title":"Genome 
Res."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/25\/9\/1105\/48984105\/bioinformatics_25_9_1105.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/25\/9\/1105\/48984105\/bioinformatics_25_9_1105.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,31]],"date-time":"2023-01-31T20:34:25Z","timestamp":1675197265000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/25\/9\/1105\/203994"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2009,3,16]]},"references-count":20,"journal-issue":{"issue":"9","published-print":{"date-parts":[[2009,5,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btp120","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2009,5,1]]},"published":{"date-parts":[[2009,3,16]]}}}