{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,30]],"date-time":"2025-09-30T04:09:38Z","timestamp":1759205378469,"version":"3.37.3"},"reference-count":19,"publisher":"Oxford University Press (OUP)","issue":"14","license":[{"start":{"date-parts":[[2017,7,12]],"date-time":"2017-07-12T00:00:00Z","timestamp":1499817600000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["BBSRC-NSF\/BIO-1564917"],"award-info":[{"award-number":["BBSRC-NSF\/BIO-1564917"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2017,7,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Many methods for transcript-level abundance estimation reduce the computational burden associated with the iterative algorithms they use by adopting an approximate factorization of the likelihood function they optimize. This leads to considerably faster convergence of the optimization procedure, since each round of e.g. the EM algorithm, can execute much more quickly. However, these approximate factorizations of the likelihood function simplify calculations at the expense of discarding certain information that can be useful for accurate transcript abundance estimation.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>We demonstrate that model simplifications (i.e. 
factorizations of the likelihood function) adopted by certain abundance estimation methods can lead to a diminished ability to accurately estimate the abundances of highly related transcripts. In particular, considering factorizations based on transcript-fragment compatibility alone can result in a loss of accuracy compared to the per-fragment, unsimplified model. However, we show that such shortcomings are not an inherent limitation of approximately factorizing the underlying likelihood function. By considering the appropriate conditional fragment probabilities, and adopting improved, data-driven factorizations of this likelihood, we demonstrate that such approaches can achieve accuracy nearly indistinguishable from methods that consider the complete (i.e. per-fragment) likelihood, while retaining the computational efficiency of the compatibility-based factorizations.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>Our data-driven factorizations are incorporated into a branch of the Salmon transcript quantification tool: https:\/\/github.com\/COMBINE-lab\/salmon\/tree\/factorizations.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btx262","type":"journal-article","created":{"date-parts":[[2017,4,20]],"date-time":"2017-04-20T07:52:13Z","timestamp":1492674733000},"page":"i142-i151","source":"Crossref","is-referenced-by-count":26,"title":["Improved data-driven likelihood factorizations for transcript abundance estimation"],"prefix":"10.1093","volume":"33","author":[{"given":"Mohsen","family":"Zakeri","sequence":"first","affiliation":[{"name":"Department of Computer Science, Stony 
Brook, NY, USA"}]},{"given":"Avi","family":"Srivastava","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Stony Brook University, Stony Brook, NY, USA"}]},{"given":"Fatemeh","family":"Almodaresi","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Stony Brook University, Stony Brook, NY, USA"}]},{"given":"Rob","family":"Patro","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Stony Brook University, Stony Brook, NY, USA"}]}],"member":"286","published-online":{"date-parts":[[2017,7,12]]},"reference":[{"key":"2023051506481759400_btx262-B1","doi-asserted-by":"crossref","first-page":"525","DOI":"10.1038\/nbt.3519","article-title":"Near-optimal probabilistic RNA-seq quantification","volume":"34","author":"Bray","year":"2016","journal-title":"Nat. Biotechnol"},{"key":"2023051506481759400_btx262-B2","doi-asserted-by":"crossref","first-page":"903","DOI":"10.1038\/nbt.2957","article-title":"A comprehensive assessment of rna-seq accuracy, reproducibility and information content by the sequencing quality control consortium","volume":"32","author":"Consortium","year":"2014","journal-title":"Nat. 
Biotechnol"},{"key":"2023051506481759400_btx262-B3","doi-asserted-by":"crossref","first-page":"2778","DOI":"10.1093\/bioinformatics\/btv272","article-title":"Polyester: simulating RNA-seq datasets with differential transcript expression","volume":"31","author":"Frazee","year":"2015","journal-title":"Bioinformatics"},{"key":"2023051506481759400_btx262-B4","doi-asserted-by":"crossref","first-page":"1721","DOI":"10.1093\/bioinformatics\/bts260","article-title":"Identifying differentially expressed transcripts from RNA-seq data with biological variation","volume":"28","author":"Glaus","year":"2012","journal-title":"Bioinformatics"},{"key":"2023051506481759400_btx262-B5","doi-asserted-by":"crossref","first-page":"3881","DOI":"10.1093\/bioinformatics\/btv483","article-title":"Fast and accurate approximate inference of transcript expression from RNA-seq data","volume":"31","author":"Hensman","year":"2015","journal-title":"Bioinformatics"},{"key":"2023051506481759400_btx262-B6","doi-asserted-by":"crossref","first-page":"506","DOI":"10.1038\/nature12531","article-title":"Transcriptome and genome sequencing uncovers functional variation in humans","volume":"501","author":"Lappalainen","year":"2013","journal-title":"Nature"},{"key":"2023051506481759400_btx262-B7","doi-asserted-by":"crossref","first-page":"1.","DOI":"10.1186\/1471-2105-12-323","article-title":"RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome","volume":"12","author":"Li","year":"2011","journal-title":"BMC Bioinformatics"},{"key":"2023051506481759400_btx262-B8","doi-asserted-by":"crossref","first-page":"493","DOI":"10.1093\/bioinformatics\/btp692","article-title":"RNA-Seq gene expression estimation with read mapping uncertainty","volume":"26","author":"Li","year":"2010","journal-title":"Bioinformatics"},{"key":"2023051506481759400_btx262-B9","first-page":"btt381.","article-title":"TIGAR: transcript isoform abundance estimation method with gapped alignment of RNA-Seq 
data by variational Bayesian inference","author":"Nariai","year":"2013","journal-title":"Bioinformatics"},{"key":"2023051506481759400_btx262-B10","doi-asserted-by":"crossref","first-page":"S5.","DOI":"10.1186\/1471-2164-15-S10-S5","article-title":"TIGAR2: sensitive and accurate estimation of transcript isoform expression with longer RNA-Seq reads","volume":"15","author":"Nariai","year":"2014","journal-title":"BMC Genomics"},{"key":"2023051506481759400_btx262-B11","doi-asserted-by":"crossref","first-page":"9.","DOI":"10.1186\/1748-7188-6-9","article-title":"Estimation of alternative splicing isoform frequencies from RNA-Seq data","volume":"6","author":"Nicolae","year":"2011","journal-title":"Algorithms Mol. Biol"},{"key":"2023051506481759400_btx262-B12","doi-asserted-by":"crossref","first-page":"462","DOI":"10.1038\/nbt.2862","article-title":"Sailfish enables alignment-free isoform quantification from RNA-seq reads using lightweight algorithms","volume":"32","author":"Patro","year":"2014","journal-title":"Nat. Biotechnol"},{"key":"2023051506481759400_btx262-B13","doi-asserted-by":"crossref","first-page":"417","DOI":"10.1038\/nmeth.4197","article-title":"Salmon provides fast and bias-aware quantification of transcript expression","volume":"14","author":"Patro","year":"2017","journal-title":"Nat. Methods"},{"key":"2023051506481759400_btx262-B14","doi-asserted-by":"crossref","first-page":"71","DOI":"10.1038\/nmeth.2251","article-title":"Streaming fragment assignment for real-time analysis of sequencing experiments","volume":"10","author":"Roberts","year":"2013","journal-title":"Nat. Methods"},{"key":"2023051506481759400_btx262-B15","first-page":"62","article-title":"Statistical modeling of RNA-Seq data","volume":"26","author":"Salzman","year":"2011","journal-title":"Stat. Sci. Rev. J. Inst. Math. 
Stat"},{"key":"2023051506481759400_btx262-B16","doi-asserted-by":"crossref","first-page":"i192","DOI":"10.1093\/bioinformatics\/btw277","article-title":"RapMap: a rapid, sensitive and accurate tool for mapping RNA-seq reads to transcriptomes","volume":"32","author":"Srivastava","year":"2016","journal-title":"Bioinformatics"},{"key":"2023051506481759400_btx262-B17","doi-asserted-by":"crossref","first-page":"511","DOI":"10.1038\/nbt.1621","article-title":"Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation","volume":"28","author":"Trapnell","year":"2010","journal-title":"Nat. Biotechnol"},{"key":"2023051506481759400_btx262-B18","doi-asserted-by":"crossref","first-page":"1.","DOI":"10.1186\/gb-2011-12-2-r13","article-title":"Haplotype and isoform specific expression estimation using multi-mapping RNA-seq reads","volume":"12","author":"Turro","year":"2011","journal-title":"Genome Biol"},{"key":"2023051506481759400_btx262-B19","first-page":"gkv1157.","article-title":"Ensembl 2016","author":"Yates","year":"2015","journal-title":"Nucleic Acids 
Res"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/33\/14\/i142\/50315087\/bioinformatics_33_14_i142.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/33\/14\/i142\/50315087\/bioinformatics_33_14_i142.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,5,15]],"date-time":"2023-05-15T06:48:43Z","timestamp":1684133323000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/33\/14\/i142\/3953977"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017,7,12]]},"references-count":19,"journal-issue":{"issue":"14","published-print":{"date-parts":[[2017,7,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btx262","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"type":"print","value":"1367-4803"},{"type":"electronic","value":"1367-4811"}],"subject":[],"published-other":{"date-parts":[[2017,7,15]]},"published":{"date-parts":[[2017,7,12]]}}}