{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,22]],"date-time":"2025-02-22T00:38:34Z","timestamp":1740184714131,"version":"3.37.3"},"reference-count":25,"publisher":"Oxford University Press (OUP)","issue":"8","license":[{"start":{"date-parts":[[2018,9,5]],"date-time":"2018-09-05T00:00:00Z","timestamp":1536105600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/100000054","name":"National Cancer Institute","doi-asserted-by":"publisher","award":["R00 CA168987"],"award-info":[{"award-number":["R00 CA168987"]}],"id":[{"id":"10.13039\/100000054","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000057","name":"National Institute of General Medical Sciences","doi-asserted-by":"publisher","award":["R01 GM116847"],"award-info":[{"award-number":["R01 GM116847"]}],"id":[{"id":"10.13039\/100000057","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["MCB-1552196"],"award-info":[{"award-number":["MCB-1552196"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000879","name":"Alfred P. 
Sloan Foundation","doi-asserted-by":"publisher","award":["R25 CA180993"],"award-info":[{"award-number":["R25 CA180993"]}],"id":[{"id":"10.13039\/100000879","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2019,4,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Identification of splice sites is critical to gene annotation and to determine which sequences control circRNA biogenesis. Full-length RNA transcripts could in principle complete annotations of introns and exons in genomes without external ontologies, i.e., ab initio. However, whether it is possible to reconstruct genomic positions where splicing occurs from full-length transcripts, even if sampled in the absence of noise, depends on the genome sequence composition. If it is not, there exist provable limits on the use of RNA-Seq to define splice locations (linear or circular) in the genome.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>We provide a formal definition of splice site ambiguity due to the genomic sequence by introducing equivalent junction, which is the set of local genomic positions resulting in the same RNA sequence when joined through RNA splicing. We show that equivalent junctions are prevalent in diverse eukaryotic genomes and occur in 88.64% and 78.64% of annotated human splice sites in linear and circRNA junctions, respectively. The observed fractions of equivalent junctions and the frequency of many individual motifs are statistically significant when compared against the null distribution computed via simulation or closed-form. The frequency of equivalent junctions establishes a fundamental limit on the possibility of ab initio reconstruction of RNA transcripts without appealing to the ontology of \u201cGT-AG\u201d boundaries defining introns. Said differently, completely ab initio is impossible in the vast majority of splice sites in annotated circRNAs and linear transcripts.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>Two python scripts generating an equivalent junction sequence per junction are available at: https:\/\/github.com\/salzmanlab\/Equivalent-Junctions.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/bty785","type":"journal-article","created":{"date-parts":[[2018,9,4]],"date-time":"2018-09-04T19:27:16Z","timestamp":1536089236000},"page":"1263-1268","source":"Crossref","is-referenced-by-count":8,"title":["Ambiguous splice sites distinguish circRNA and linear splicing in the human genome"],"prefix":"10.1093","volume":"35","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7413-3437","authenticated-orcid":false,"given":"Roozbeh","family":"Dehghannasiri","sequence":"first","affiliation":[{"name":"Department of Biochemistry, Stanford University, Stanford, CA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Linda","family":"Szabo","sequence":"additional","affiliation":[{"name":"Department of Biomedical Data Science, Stanford University, Stanford, CA, 
USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Julia","family":"Salzman","sequence":"additional","affiliation":[{"name":"Department of Biochemistry, Stanford University, Stanford, CA, USA"},{"name":"Department of Biomedical Data Science, Stanford University, Stanford, CA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2018,9,5]]},"reference":[{"key":"2023012810013390800_bty785-B1","doi-asserted-by":"crossref","first-page":"e1007114.","DOI":"10.1371\/journal.pgen.1007114","article-title":"ciRS-7 exonic sequence is embedded in a long non-coding RNA locus","volume":"13","author":"Barrett","year":"2017","journal-title":"PLoS Genet"},{"key":"2023012810013390800_bty785-B2","doi-asserted-by":"crossref","first-page":"78","DOI":"10.1006\/jmbi.1997.0951","article-title":"Prediction of complete gene structures in human genomic DNA","volume":"268","author":"Burge","year":"1997","journal-title":"J. Mol. Biol"},{"key":"2023012810013390800_bty785-B3","first-page":"525","volume-title":"Splicing of Precursors to mRNAs by the Spliceosomes","author":"Burge","year":"1999"},{"key":"2023012810013390800_bty785-B4","doi-asserted-by":"crossref","first-page":"346","DOI":"10.1016\/S0959-440X(98)80069-9","article-title":"Finding the genes in genomic DNA","volume":"8","author":"Burge","year":"1998","journal-title":"Current Opin. Struct. Biol"},{"key":"2023012810013390800_bty785-B5","doi-asserted-by":"crossref","first-page":"S2.","DOI":"10.1186\/1471-2105-16-S9-S2","article-title":"Alternative splicing detection workflow needs a careful combination of sample prep and bioinformatics analysis","volume":"16","author":"Carrara","year":"2015","journal-title":"BMC Bioinformatics"},{"key":"2023012810013390800_bty785-B6","first-page":"1.","article-title":"Uncovering the complexity of transcriptomes with RNA-Seq","volume":"2010","author":"Costa","year":"2010","journal-title":"BioMed. Res. 
Int"},{"key":"2023012810013390800_bty785-B7","doi-asserted-by":"crossref","first-page":"37","DOI":"10.1242\/dmm.000331","article-title":"Aberrant RNA splicing and its functional consequences in cancer cells","volume":"1","author":"Fackenthal","year":"2008","journal-title":"Disease Models Mech"},{"key":"2023012810013390800_bty785-B8","doi-asserted-by":"crossref","first-page":"679","DOI":"10.1007\/s00439-013-1411-3","article-title":"Genomics of alternative splicing: evolution, development and pathophysiology","volume":"133","author":"Gamazon","year":"2014","journal-title":"Hum. Genet"},{"key":"2023012810013390800_bty785-B9","doi-asserted-by":"crossref","first-page":"1666","DOI":"10.1261\/rna.043687.113","article-title":"circBase: a database for circular RNAs","volume":"20","author":"Gla\u017ear","year":"2014","journal-title":"RNA"},{"key":"2023012810013390800_bty785-B10","doi-asserted-by":"crossref","first-page":"644","DOI":"10.1038\/nbt.1883","article-title":"Full-length transcriptome assembly from RNA-Seq data without a reference genome","volume":"29","author":"Grabherr","year":"2011","journal-title":"Nat. Biotechnol"},{"key":"2023012810013390800_bty785-B11","doi-asserted-by":"crossref","first-page":"364.","DOI":"10.1186\/s12859-014-0364-4","article-title":"Comparisons of computational methods for differential alternative splicing detection using RNA-Seq in plant systems","volume":"15","author":"Liu","year":"2014","journal-title":"BMC Bioinformatics"},{"key":"2023012810013390800_bty785-B12","doi-asserted-by":"crossref","first-page":"265","DOI":"10.1016\/j.gpb.2016.05.004","article-title":"Oxford Nanopore MinION sequencing and genome assembly","volume":"14","author":"Lu","year":"2016","journal-title":"Genomics, Prot. 
Bioinformatics"},{"key":"2023012810013390800_bty785-B13","doi-asserted-by":"crossref","first-page":"1107","DOI":"10.1093\/nar\/26.4.1107","article-title":"GeneMark.hmm: new solutions for gene finding","volume":"26","author":"Lukashin","year":"1998","journal-title":"Nucleic Acids Res"},{"key":"2023012810013390800_bty785-B14","doi-asserted-by":"crossref","first-page":"4103","DOI":"10.1093\/nar\/gkf543","article-title":"Current methods of gene prediction, their strengths and weaknesses","volume":"30","author":"Math\u00e9","year":"2002","journal-title":"Nucleic Acids Res"},{"key":"2023012810013390800_bty785-B15","doi-asserted-by":"crossref","first-page":"1413","DOI":"10.1038\/ng.259","article-title":"Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing","volume":"40","author":"Pan","year":"2008","journal-title":"Nat. Genet"},{"key":"2023012810013390800_bty785-B16","doi-asserted-by":"crossref","first-page":"R95.","DOI":"10.1186\/gb-2013-14-9-r95","article-title":"Comprehensive evaluation of differential gene expression analysis methods for RNA-seq data","volume":"14","author":"Rapaport","year":"2013","journal-title":"Genome Biol"},{"key":"2023012810013390800_bty785-B17","doi-asserted-by":"crossref","first-page":"129","DOI":"10.1101\/gad.209759.112","article-title":"Pick one, but be quick: 5\u2019 splice sites and the problems of too many choices","volume":"27","author":"Roca","year":"2013","journal-title":"Genes Dev"},{"key":"2023012810013390800_bty785-B18","doi-asserted-by":"crossref","first-page":"698","DOI":"10.1016\/j.cell.2015.09.054","article-title":"Learning the sequence determinants of alternative splicing from millions of random sequences","volume":"163","author":"Rosenberg","year":"2015","journal-title":"Cell"},{"key":"2023012810013390800_bty785-B19","doi-asserted-by":"crossref","first-page":"e30733.","DOI":"10.1371\/journal.pone.0030733","article-title":"Circular RNAs are the predominant transcript isoform from hundreds of human genes in diverse cell types","volume":"7","author":"Salzman","year":"2012","journal-title":"PloS One"},{"key":"2023012810013390800_bty785-B20","doi-asserted-by":"crossref","first-page":"1124","DOI":"10.1016\/0022-2836(92)90320-J","article-title":"Features of spliceosome evolution and function inferred from an analysis of the information at human splice sites","volume":"228","author":"Stephens","year":"1992","journal-title":"J. Mol. Biol"},{"key":"2023012810013390800_bty785-B21","doi-asserted-by":"crossref","first-page":"2413.","DOI":"10.1038\/onc.2015.318","article-title":"Aberrant RNA splicing in cancer; expression changes and driver mutations of splicing factor genes","volume":"35","author":"Sveen","year":"2016","journal-title":"Oncogene"},{"key":"2023012810013390800_bty785-B22","doi-asserted-by":"crossref","first-page":"679","DOI":"10.1038\/nrg.2016.114","article-title":"Detecting circular RNAs: bioinformatic and experimental challenges","volume":"17","author":"Szabo","year":"2016","journal-title":"Nat. Rev. 
Genet"},{"key":"2023012810013390800_bty785-B23","doi-asserted-by":"crossref","first-page":"74.","DOI":"10.1186\/s13059-016-0940-1","article-title":"A benchmark for RNA-seq quantification pipelines","volume":"17","author":"Teng","year":"2016","journal-title":"Genome Biol"},{"key":"2023012810013390800_bty785-B24","doi-asserted-by":"crossref","first-page":"1105","DOI":"10.1093\/bioinformatics\/btp120","article-title":"Tophat: discovering splice junctions with RNA-Seq","volume":"25","author":"Trapnell","year":"2009","journal-title":"Bioinformatics"},{"key":"2023012810013390800_bty785-B25","doi-asserted-by":"crossref","first-page":"e90859.","DOI":"10.1371\/journal.pone.0090859","article-title":"Circular RNA is expressed across the eukaryotic tree of life","volume":"9","author":"Wang","year":"2014","journal-title":"PLoS One"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/35\/8\/1263\/48940770\/bioinformatics_35_8_1263.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/35\/8\/1263\/48940770\/bioinformatics_35_8_1263.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,28]],"date-time":"2023-01-28T10:08:43Z","timestamp":1674900523000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/35\/8\/1263\/5091181"}},"subtitle":[],"editor":[{"given":"Inanc","family":"Birol","sequence":"additional","affiliation":[],"role":[{"role":"editor","vocabulary":"crossref"}]}],"short-title":[],"issued":{"date-parts":[[2018,9,5]]},"references-count":25,"journal-issue":{"issue":"8","published-print":{"date-parts":[[2019,4,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/bty785","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"type":"print","value":"1367-4803"},{"type":"electronic","value":"1367-4811"}],"subject":[],"published-other":{"date-parts":[[2019,4,15]]},"published":{"date-parts":[[2018,9,5]]}}}