{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,22]],"date-time":"2025-02-22T00:45:10Z","timestamp":1740185110067,"version":"3.37.3"},"reference-count":15,"publisher":"Oxford University Press (OUP)","issue":"6","license":[{"start":{"date-parts":[[2022,1,3]],"date-time":"2022-01-03T00:00:00Z","timestamp":1641168000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["R01HG007834","P50 HG007735"],"award-info":[{"award-number":["R01HG007834","P50 HG007735"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100000289","name":"Cancer Research UK","doi-asserted-by":"publisher","award":["C355\/A26819"],"award-info":[{"award-number":["C355\/A26819"]}],"id":[{"id":"10.13039\/501100000289","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,3,4]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Isoform deconvolution is an NP-hard problem. The accuracy of the proposed solutions is far from perfect. At present, it is not known if gene structure and isoform concentration can be uniquely inferred given paired-end reads, and there is no objective method to select the fragment length to improve the number of identifiable genes. 
Different pieces of evidence suggest that the optimal fragment length is gene-dependent, stressing the need for a method that selects the fragment length according to a reasonable trade-off across all the genes in the whole genome.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>A gene is considered to be identifiable if it is possible to get both the structure and concentration of its transcripts univocally. Here, we present a method to state the identifiability of this deconvolution problem. Assuming a given transcriptome and that the coverage is sufficient to interrogate all junction reads of the transcripts, this method states whether or not a gene is identifiable given the read length and fragment length distribution. Applying this method to different read and fragment length combinations shows that the optimal average fragment length for the human transcriptome is around 400\u2013600\u00a0nt for coding genes and 150\u2013200\u00a0nt for long non-coding RNAs. The optimal read length is the largest one that fits in the fragment length. The potential benefit of combining several libraries to reconstruct the transcriptome is also discussed. 
Combining two libraries of very different fragment lengths results in a significant improvement in gene identifiability.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>Code is available in GitHub (https:\/\/github.com\/JFerrer-B\/transcriptome-identifiability).<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btab873","type":"journal-article","created":{"date-parts":[[2021,12,30]],"date-time":"2021-12-30T12:23:14Z","timestamp":1640866994000},"page":"1491-1496","source":"Crossref","is-referenced-by-count":0,"title":["On the identifiability of the isoform deconvolution problem: application to select the proper fragment length in an RNA-seq library"],"prefix":"10.1093","volume":"38","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1588-2195","authenticated-orcid":false,"given":"Juan A","family":"Ferrer-Bonsoms","sequence":"first","affiliation":[{"name":"Department of Biomedical Engineering and Sciences, TECNUN, University of Navarra , Pamplona, Spain"}]},{"given":"Xabier","family":"Morales","sequence":"additional","affiliation":[{"name":"Department of Biomedical Engineering and Sciences, TECNUN, University of Navarra , Pamplona, Spain"}]},{"given":"Pegah T","family":"Afshar","sequence":"additional","affiliation":[{"name":"Department of Statistics, Stanford University , Stanford, CA 94305-4020, USA"}]},{"given":"Wing H","family":"Wong","sequence":"additional","affiliation":[{"name":"Department of Statistics, Stanford University , Stanford, CA 94305-4020, 
USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3274-2450","authenticated-orcid":false,"given":"Angel","family":"Rubio","sequence":"additional","affiliation":[{"name":"Department of Biomedical Engineering and Sciences, TECNUN, University of Navarra , Pamplona, Spain"}]}],"member":"286","published-online":{"date-parts":[[2022,1,3]]},"reference":[{"key":"2023020108580062700_btab873-B1","doi-asserted-by":"crossref","first-page":"2447","DOI":"10.1093\/bioinformatics\/btu317","article-title":"Efficient RNA isoform identification and quantification from RNA-Seq data with network flows","volume":"30","author":"Bernard","year":"2014","journal-title":"Bioinformatics"},{"key":"2023020108580062700_btab873-B2","doi-asserted-by":"crossref","first-page":"341","DOI":"10.1038\/nbt.2850","article-title":"Genome-guided transcript assembly by integrative analysis of RNA sequence data","volume":"32","author":"Boley","year":"2014","journal-title":"Nat. Biotechnol"},{"key":"2023020108580062700_btab873-B3","doi-asserted-by":"crossref","first-page":"S181","DOI":"10.1093\/bioinformatics\/18.suppl_1.S181","article-title":"Splicing graphs and EST assembly problem","volume":"18","author":"Heber","year":"2002","journal-title":"Bioinformatics"},{"key":"2023020108580062700_btab873-B4","doi-asserted-by":"crossref","first-page":"1026","DOI":"10.1093\/bioinformatics\/btp113","article-title":"Statistical inferences for isoform expression in RNA-Seq","volume":"25","author":"Jiang","year":"2009","journal-title":"Bioinformatics"},{"key":"2023020108580062700_btab873-B5","doi-asserted-by":"crossref","first-page":"1731","DOI":"10.1038\/ng.3988","article-title":"High-throughput annotation of full-length long noncoding RNAs with capture long-read sequencing","volume":"49","author":"Lagarde","year":"2017","journal-title":"Nat. 
Genet"},{"key":"2023020108580062700_btab873-B6","doi-asserted-by":"crossref","first-page":"1693","DOI":"10.1089\/cmb.2011.0171","article-title":"IsoLasso: a LASSO regression approach to RNA-Seq based transcriptome assembly","volume":"18","author":"Li","year":"2011","journal-title":"J. Comput. Biol"},{"key":"2023020108580062700_btab873-B7","doi-asserted-by":"crossref","first-page":"2052","DOI":"10.1093\/bib\/bbz126","article-title":"Systematic evaluation of differential splicing tools for RNA-seq studies","volume":"21","author":"Mehmood","year":"2020","journal-title":"Brief. Bioinform"},{"key":"2023020108580062700_btab873-B8","doi-asserted-by":"crossref","first-page":"60","DOI":"10.3389\/fgene.2020.00606","article-title":"Methodologies for Transcript Profiling Using Long-Read Technologies","volume":"11","author":"Oikonomopoulos","year":"2020","journal-title":"Front. Genet"},{"key":"2023020108580062700_btab873-B9","doi-asserted-by":"crossref","first-page":"3328","DOI":"10.1038\/s41467-020-17009-7","article-title":"Comprehensive identification of mRNA isoforms reveals the diversity of neural cell-surface molecules with roles in retinal development and disease","volume":"11","author":"Ray","year":"2020","journal-title":"Nat. Commun"},{"key":"2023020108580062700_btab873-B10","doi-asserted-by":"crossref","first-page":"703","DOI":"10.1186\/s12864-018-5082-2","article-title":"Comparison of RNA-seq and microarray platforms for splice event detection using a cross-platform algorithm","volume":"19","author":"Romero","year":"2018","journal-title":"BMC Genomics"},{"key":"2023020108580062700_btab873-B11","doi-asserted-by":"crossref","first-page":"59","DOI":"10.1038\/s41467-017-00050-4","article-title":"Gaining comprehensive biological insight into the transcriptome by performing a broad-spectrum RNA-seq analysis","volume":"8","author":"Sahraeian","year":"2017","journal-title":"Nat. 
Commun"},{"key":"2023020108580062700_btab873-B12","doi-asserted-by":"crossref","first-page":"62","DOI":"10.1214\/10-STS343","article-title":"Statistical modeling of RNA-seq data","volume":"26","author":"Salzman","year":"2011","journal-title":"Stat. Sci"},{"key":"2023020108580062700_btab873-B13","doi-asserted-by":"crossref","first-page":"1177","DOI":"10.1038\/nmeth.2714","article-title":"Assessment of transcript reconstruction methods for RNA-seq","volume":"10","author":"Steijger","year":"2013","journal-title":"Nat. Methods"},{"key":"2023020108580062700_btab873-B14","doi-asserted-by":"crossref","first-page":"2856","DOI":"10.1038\/s41598-019-39076-7","article-title":"Long fragments achieve lower base quality in Illumina paired-end sequencing","volume":"9","author":"Tan","year":"2019","journal-title":"Sci. Rep"},{"key":"2023020108580062700_btab873-B15","doi-asserted-by":"crossref","first-page":"2216","DOI":"10.1093\/bioinformatics\/btx128","article-title":"RMATS-DVR: RMATS discovery of differential variants in 
RNA","volume":"33","author":"Wang","year":"2017","journal-title":"Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btab873\/42111969\/btab873.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/6\/1491\/49008437\/btab873.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/6\/1491\/49008437\/btab873.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,1]],"date-time":"2023-02-01T20:30:43Z","timestamp":1675283443000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/38\/6\/1491\/6493231"}},"subtitle":[],"editor":[{"given":"Jan","family":"Gorodkin","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2022,1,3]]},"references-count":15,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2022,3,4]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btab873","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"type":"print","value":"1367-4803"},{"type":"electronic","value":"1367-4811"}],"subject":[],"published-other":{"date-parts":[[2022,3,15]]},"published":{"date-parts":[[2022,1,3]]}}}