{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,7,31]],"date-time":"2026-07-31T02:19:40Z","timestamp":1785464380314,"version":"3.56.0"},"reference-count":56,"publisher":"Oxford University Press (OUP)","issue":"Supplement_1","license":[{"start":{"date-parts":[[2025,7,15]],"date-time":"2025-07-15T00:00:00Z","timestamp":1752537600000},"content-version":"vor","delay-in-days":14,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["R01 HG009937"],"award-info":[{"award-number":["R01 HG009937"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["CCF-1750472"],"award-info":[{"award-number":["CCF-1750472"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["CNS-1763680"],"award-info":[{"award-number":["CNS-1763680"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["252586"],"award-info":[{"award-number":["252586"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["2024342821"],"award-info":[{"award-number":["2024342821"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Chan Zuckerberg Initiative DAF"},{"DOI":"10.13039\/100000923","name":"Silicon Valley Community Foundation","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000923","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Chan Zuckerberg Initiative Foundation"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,7,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Long-read sequencing technology is becoming an increasingly indispensable tool in genomic and transcriptomic analysis. In transcriptomics in particular, long reads offer the possibility of sequencing full-length isoforms, which can vastly simplify the identification of novel transcripts and transcript quantification. However, despite this promise, the focus of much long-read method development to date has been on transcript identification, with comparatively little attention paid to quantification. Yet, due to differences in the underlying protocols and technologies, lower throughput (i.e. fewer reads sequenced per sample compared to short-read technologies), as well as technical artifacts, long-read quantification remains a challenge, motivating the continued development and assessment of quantification methods tailored to this increasingly prevalent type of data.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>We introduce a new method and corresponding user-friendly software tool for long-read transcript quantification called oarfish. Our model incorporates a novel coverage score, which affects the conditional probability of fragment assignment in the underlying probabilistic model. We demonstrate, in both simulated and experimental data, that by accounting for this coverage information, oarfish is able to produce more accurate quantification estimates than existing long-read quantification tools.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and Implementation<\/jats:title>\n                  <jats:p>oarfish is implemented in the Rust programming language and is made available as free and open-source software under the BSD 3-clause license. The source code is available at https:\/\/www.github.com\/COMBINE-lab\/oarfish.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaf240","type":"journal-article","created":{"date-parts":[[2025,7,15]],"date-time":"2025-07-15T13:02:30Z","timestamp":1752584550000},"page":"i304-i313","source":"Crossref","is-referenced-by-count":27,"title":["<tt>Oarfish<\/tt>: enhanced probabilistic modeling leads to improved accuracy in long read transcriptome quantification"],"prefix":"10.1093","volume":"41","author":[{"ORCID":"https:\/\/orcid.org\/0009-0003-6877-4934","authenticated-orcid":false,"given":"Zahra","family":"Zare Jousheghani","sequence":"first","affiliation":[{"name":"Department of Electrical and Computer Engineering, University of Maryland , College Park, MD 20742,","place":["United States"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3721-2157","authenticated-orcid":false,"given":"Noor Pratap","family":"Singh","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Maryland , College Park, MD 20742,","place":["United States"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8463-1675","authenticated-orcid":false,"given":"Rob","family":"Patro","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Maryland , College Park, MD 20742,","place":["United States"]}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"286","published-online":{"date-parts":[[2025,7,15]]},"reference":[{"key":"2025071509022759400_btaf240-B1","doi-asserted-by":"publisher","first-page":"582","DOI":"10.1038\/s41587-023-01815-7","article-title":"High-throughput RNA isoform sequencing using programmed cDNA concatenation","volume":"42","author":"Al\u2019Khafaji","year":"2023","journal-title":"Nat Biotechnol"},{"key":"2025071509022759400_btaf240-B2","doi-asserted-by":"crossref","first-page":"30","DOI":"10.1186\/s13059-020-1935-5","article-title":"Opportunities and challenges in long-read sequencing data analysis","volume":"21","author":"Amarasinghe","year":"2020","journal-title":"Genome Biol"},{"key":"2025071509022759400_btaf240-B3","doi-asserted-by":"crossref","first-page":"246","DOI":"10.1186\/1471-2164-7-246","article-title":"Analysis of the prostate cancer cell line LNCaP transcriptome using a sequencing-by-synthesis approach","volume":"7","author":"Bainbridge","year":"2006","journal-title":"BMC Genomics"},{"key":"2025071509022759400_btaf240-B4","doi-asserted-by":"crossref","first-page":"e13","DOI":"10.1093\/nar\/gkad1167","article-title":"Dividing out quantification uncertainty allows efficient assessment of differential transcript expression with edgeR","volume":"52","author":"Baldoni","year":"2024","journal-title":"Nucleic Acids Res"},{"key":"2025071509022759400_btaf240-B5","doi-asserted-by":"publisher","first-page":"525","DOI":"10.1038\/nbt.3519","article-title":"Near-optimal probabilistic RNA-seq quantification","volume":"34","author":"Bray","year":"2016","journal-title":"Nat Biotechnol"},{"key":"2025071509022759400_btaf240-B6","doi-asserted-by":"crossref","first-page":"16027","DOI":"10.1038\/ncomms16027","article-title":"Nanopore long-read RNAseq reveals widespread transcriptional variation among the surface receptors of individual B cells","volume":"8","author":"Byrne","year":"2017","journal-title":"Nat Commun"},{"key":"2025071509022759400_btaf240-B7","doi-asserted-by":"publisher","first-page":"801","DOI":"10.1038\/s41592-025-02623-4","article-title":"A systematic benchmark of nanopore long-read RNA sequencing for transcript-level analysis in human cell lines","volume":"22","author":"Chen","year":"2025","journal-title":"Nat Methods"},{"key":"2025071509022759400_btaf240-B8","doi-asserted-by":"publisher","first-page":"1187","DOI":"10.1038\/s41592-023-01908-w","article-title":"Context-aware transcript quantification from long-read RNA-seq data with bambu","volume":"20","author":"Chen","year":"2023","journal-title":"Nat Methods"},{"key":"2025071509022759400_btaf240-B9","doi-asserted-by":"crossref","first-page":"giab008","DOI":"10.1093\/gigascience\/giab008","article-title":"Twelve years of SAMtools and BCFtools","volume":"10","author":"Danecek","year":"2021","journal-title":"Gigascience"},{"key":"2025071509022759400_btaf240-B10","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1111\/j.2517-6161.1977.tb01600.x","article-title":"Maximum likelihood from incomplete data via the EM algorithm","volume":"39","author":"Dempster","year":"1977","journal-title":"J R Stat Soc B (Methodol)"},{"key":"2025071509022759400_btaf240-B11","doi-asserted-by":"crossref","first-page":"eabq5072","DOI":"10.1126\/sciadv.abq5072","article-title":"ESPRESSO: robust discovery and quantification of transcript isoforms from error-prone long-read RNA-seq data","volume":"9","author":"Gao","year":"2023","journal-title":"Sci Adv"},{"key":"2025071509022759400_btaf240-B12","doi-asserted-by":"publisher","first-page":"1721","DOI":"10.1093\/bioinformatics\/bts260","article-title":"Identifying differentially expressed transcripts from RNA-seq data with biological variation","volume":"28","author":"Glaus","year":"2012","journal-title":"Bioinformatics"},{"key":"2025071509022759400_btaf240-B13","doi-asserted-by":"crossref","first-page":"e19","DOI":"10.1093\/nar\/gkab1129","article-title":"Accurate expression quantification from nanopore direct RNA sequencing with NanoCount","volume":"50","author":"Gleeson","year":"2022","journal-title":"Nucleic Acids Res"},{"key":"2025071509022759400_btaf240-B14","author":"Guhlin","year":"2024"},{"key":"2025071509022759400_btaf240-B15","doi-asserted-by":"publisher","first-page":"3881","DOI":"10.1093\/bioinformatics\/btv483","article-title":"Fast and accurate approximate inference of transcript expression from RNA-seq data","volume":"31","author":"Hensman","year":"2015","journal-title":"Bioinformatics"},{"key":"2025071509022759400_btaf240-B16","doi-asserted-by":"crossref","first-page":"107537","DOI":"10.1016\/j.biotechadv.2020.107537","article-title":"Library preparation for next generation sequencing: a review of automation strategies","volume":"41","author":"Hess","year":"2020","journal-title":"Biotechnol Adv"},{"key":"2025071509022759400_btaf240-B17","doi-asserted-by":"crossref","first-page":"801","DOI":"10.1016\/j.humimm.2021.02.012","article-title":"Next-generation sequencing technologies: an overview","volume":"82","author":"Hu","year":"2021","journal-title":"Hum Immunol"},{"key":"2025071509022759400_btaf240-B18","doi-asserted-by":"publisher","author":"Ji","year":"2024","DOI":"10.1101\/2024.04.13.589356"},{"key":"2025071509022759400_btaf240-B19","doi-asserted-by":"crossref","first-page":"btae051","DOI":"10.1093\/bioinformatics\/btae051","article-title":"TKSM: highly modular, user-customizable, and scalable transcriptomic sequencing long-read simulator","volume":"40","author":"Karao\u011flano\u011flu","year":"2024","journal-title":"Bioinformatics"},{"key":"2025071509022759400_btaf240-B20","doi-asserted-by":"crossref","first-page":"152","DOI":"10.1186\/s13059-017-1290-3","article-title":"A tandem simulation framework for predicting mapping quality","volume":"18","author":"Langmead","year":"2017","journal-title":"Genome Biol"},{"key":"2025071509022759400_btaf240-B21","volume-title":"Genes V","author":"Lewin","year":"1994"},{"key":"2025071509022759400_btaf240-B22","doi-asserted-by":"crossref","first-page":"3094","DOI":"10.1093\/bioinformatics\/bty191","article-title":"Minimap2: pairwise alignment for nucleotide sequences","volume":"34","author":"Li","year":"2018","journal-title":"Bioinformatics"},{"key":"2025071509022759400_btaf240-B23","doi-asserted-by":"crossref","first-page":"323","DOI":"10.1186\/1471-2105-12-323","article-title":"RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome","volume":"12","author":"Li","year":"2011","journal-title":"BMC Bioinformatics"},{"key":"2025071509022759400_btaf240-B24","doi-asserted-by":"crossref","first-page":"493","DOI":"10.1093\/bioinformatics\/btp692","article-title":"RNA-Seq gene expression estimation with read mapping uncertainty","volume":"26","author":"Li","year":"2010","journal-title":"Bioinformatics"},{"key":"2025071509022759400_btaf240-B25","doi-asserted-by":"publisher","first-page":"523","DOI":"10.1016\/j.cell.2008.03.029","article-title":"Highly integrated single-base resolution maps of the epigenome in arabidopsis","volume":"133","author":"Lister","year":"2008","journal-title":"Cell"},{"key":"2025071509022759400_btaf240-B26","doi-asserted-by":"publisher","first-page":"1124","DOI":"10.1101\/gr.199174.115","article-title":"Integrative analysis with ChIP-seq advances the limits of transcript quantification from RNA-seq","volume":"26","author":"Liu","year":"2016","journal-title":"Genome Res"},{"key":"2025071509022759400_btaf240-B27","doi-asserted-by":"publisher","author":"Loving","year":"2025","DOI":"10.1101\/2024.07.19.604364"},{"key":"2025071509022759400_btaf240-B28","doi-asserted-by":"publisher","first-page":"621","DOI":"10.1038\/nmeth.1226","article-title":"Mapping and quantifying mammalian transcriptomes by RNA-Seq","volume":"5","author":"Mortazavi","year":"2008","journal-title":"Nat Methods"},{"key":"2025071509022759400_btaf240-B29","doi-asserted-by":"publisher","first-page":"1344","DOI":"10.1126\/science.1158441","article-title":"The transcriptional landscape of the yeast genome defined by RNA sequencing","volume":"320","author":"Nagalakshmi","year":"2008","journal-title":"Science"},{"key":"2025071509022759400_btaf240-B30","doi-asserted-by":"publisher","first-page":"2292","DOI":"10.1093\/bioinformatics\/btt381","article-title":"TIGAR: transcript isoform abundance estimation method with gapped alignment of RNA-Seq data by variational bayesian inference","volume":"29","author":"Nariai","year":"2013","journal-title":"Bioinformatics"},{"key":"2025071509022759400_btaf240-B31","doi-asserted-by":"publisher","DOI":"10.1186\/1748-7188-6-9","article-title":"Estimation of alternative splicing isoform frequencies from RNA-Seq data","volume":"6","author":"Nicolae","year":"2011","journal-title":"Algorithms Mol Biol"},{"key":"2025071509022759400_btaf240-B32","doi-asserted-by":"publisher","first-page":"44","DOI":"10.1126\/science.abj6987","article-title":"The complete sequence of a human genome","volume":"376","author":"Nurk","year":"2022","journal-title":"Science"},{"key":"2025071509022759400_btaf240-B33","doi-asserted-by":"crossref","first-page":"D733","DOI":"10.1093\/nar\/gkv1189","article-title":"Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation","volume":"44","author":"O\u2019Leary","year":"2016","journal-title":"Nucleic Acids Res"},{"key":"2025071509022759400_btaf240-B34","author":"Oxford Nanopore Technologies (ONT)"},{"key":"2025071509022759400_btaf240-B35","author":"Pacific Biosciences of California, Inc. (PacBio)"},{"key":"2025071509022759400_btaf240-B36","author":"Pacific Biosciences"},{"key":"2025071509022759400_btaf240-B37","doi-asserted-by":"publisher","first-page":"417","DOI":"10.1038\/nmeth.4197","article-title":"Salmon provides fast and bias-aware quantification of transcript expression","volume":"14","author":"Patro","year":"2017","journal-title":"Nat Methods"},{"key":"2025071509022759400_btaf240-B38","doi-asserted-by":"publisher","first-page":"462","DOI":"10.1038\/nbt.2862","article-title":"Sailfish enables alignment-free isoform quantification from RNA-seq reads using lightweight algorithms","volume":"32","author":"Patro","year":"2014","journal-title":"Nat Biotechnol"},{"key":"2025071509022759400_btaf240-B39","doi-asserted-by":"publisher","first-page":"687","DOI":"10.1038\/nmeth.4324","article-title":"Differential analysis of RNA-seq incorporating quantification uncertainty","volume":"14","author":"Pimentel","year":"2017","journal-title":"Nat Methods"},{"key":"2025071509022759400_btaf240-B40","doi-asserted-by":"crossref","first-page":"915","DOI":"10.1038\/s41587-022-01565-y","article-title":"Accurate isoform discovery with IsoQuant using long reads","volume":"41","author":"Prjibelski","year":"2023","journal-title":"Nat Biotechnol"},{"key":"2025071509022759400_btaf240-B41","doi-asserted-by":"crossref","first-page":"278","DOI":"10.1016\/j.gpb.2015.08.002","article-title":"PacBio sequencing and its applications","volume":"13","author":"Rhoads","year":"2015","journal-title":"Genomics Proteomics Bioinformatics"},{"key":"2025071509022759400_btaf240-B42","doi-asserted-by":"publisher","first-page":"71","DOI":"10.1038\/nmeth.2251","article-title":"Streaming fragment assignment for real-time analysis of sequencing experiments","volume":"10","author":"Roberts","year":"2013","journal-title":"Nat Methods"},{"key":"2025071509022759400_btaf240-B43","doi-asserted-by":"crossref","first-page":"1438","DOI":"10.1038\/s41467-020-15171-6","article-title":"Full-length transcript characterization of SF3B1 mutation in chronic lymphocytic leukemia reveals downregulation of retained introns","volume":"11","author":"Tang","year":"2020","journal-title":"Nat Commun"},{"key":"2025071509022759400_btaf240-B44","doi-asserted-by":"crossref","first-page":"178","DOI":"10.1093\/bib\/bbs017","article-title":"Integrative genomics viewer (IGV): high-performance genomics data visualization and exploration","volume":"14","author":"Thorvaldsd\u00f3ttir","year":"2013","journal-title":"Brief Bioinform"},{"key":"2025071509022759400_btaf240-B45","doi-asserted-by":"publisher","first-page":"R13","DOI":"10.1186\/gb-2011-12-2-r13","article-title":"Haplotype and isoform specific expression estimation using multi-mapping RNA-seq reads","volume":"12","author":"Turro","year":"2011","journal-title":"Genome Biol"},{"key":"2025071509022759400_btaf240-B46","doi-asserted-by":"crossref","first-page":"4760","DOI":"10.1038\/s41467-023-40083-6","article-title":"TEQUILA-seq: a versatile and low-cost method for targeted long-read RNA sequencing","volume":"14","author":"Wang","year":"2023","journal-title":"Nat Commun"},{"key":"2025071509022759400_btaf240-B47","doi-asserted-by":"crossref","first-page":"1348","DOI":"10.1038\/s41587-021-01108-x","article-title":"Nanopore sequencing technology, bioinformatics and applications","volume":"39","author":"Wang","year":"2021","journal-title":"Nat Biotechnol"},{"key":"2025071509022759400_btaf240-B48","doi-asserted-by":"crossref","first-page":"1155","DOI":"10.1038\/s41587-019-0217-9","article-title":"Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome","volume":"37","author":"Wenger","year":"2019","journal-title":"Nat Biotechnol"},{"key":"2025071509022759400_btaf240-B49","doi-asserted-by":"publisher","author":"Wongsurawat","year":"2022","DOI":"10.1007\/978-1-0716-2257-5_5"},{"key":"2025071509022759400_btaf240-B50","doi-asserted-by":"crossref","first-page":"1297","DOI":"10.1038\/s41592-019-0617-2","article-title":"Nanopore native RNA sequencing of a human poly (A) transcriptome","volume":"16","author":"Workman","year":"2019","journal-title":"Nat Methods"},{"key":"2025071509022759400_btaf240-B51","doi-asserted-by":"publisher","author":"Wyman","year":"2019","DOI":"10.1101\/672931"},{"key":"2025071509022759400_btaf240-B52","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1093\/gigascience\/gix010","article-title":"NanoSim: nanopore sequence read simulator based on statistical characterization","volume":"6","author":"Yang","year":"2017","journal-title":"Gigascience"},{"key":"2025071509022759400_btaf240-B53","doi-asserted-by":"publisher","first-page":"i142","DOI":"10.1093\/bioinformatics\/btx262","article-title":"Improved data-driven likelihood factorizations for transcript abundance estimation","volume":"33","author":"Zakeri","year":"2017","journal-title":"Bioinformatics"},{"key":"2025071509022759400_btaf240-B54","doi-asserted-by":"publisher","first-page":"i283","DOI":"10.1093\/bioinformatics\/btu288","article-title":"RNA-Skim: a rapid method for RNA-Seq quantification at transcript level","volume":"30","author":"Zhang","year":"2014","journal-title":"Bioinformatics"},{"key":"2025071509022759400_btaf240-B55","doi-asserted-by":"publisher","author":"Zhang","year":"2024","DOI":"10.1101\/2024.08.19.608720"},{"key":"2025071509022759400_btaf240-B56","doi-asserted-by":"publisher","first-page":"e105","DOI":"10.1093\/nar\/gkz622","article-title":"Nonparametric expression analysis using inferential replicate counts","volume":"47","author":"Zhu","year":"2019","journal-title":"Nucleic Acids Res"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/Supplement_1\/i304\/63745713\/btaf240.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/Supplement_1\/i304\/63745713\/btaf240.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,15]],"date-time":"2025-07-15T13:02:40Z","timestamp":1752584560000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/41\/Supplement_1\/i304\/8199410"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,7,1]]},"references-count":56,"journal-issue":{"issue":"Supplement_1","published-print":{"date-parts":[[2025,7,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaf240","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2025,7]]},"published":{"date-parts":[[2025,7,1]]}}}