{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,30]],"date-time":"2026-03-30T19:18:56Z","timestamp":1774898336955,"version":"3.50.1"},"reference-count":21,"publisher":"Oxford University Press (OUP)","issue":"Supplement_1","license":[{"start":{"date-parts":[[2024,6,28]],"date-time":"2024-06-28T00:00:00Z","timestamp":1719532800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["DBI-2019797"],"award-info":[{"award-number":["DBI-2019797"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["DBI-2145171"],"award-info":[{"award-number":["DBI-2145171"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["R01HG011065"],"award-info":[{"award-number":["R01HG011065"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,6,28]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>High-throughput RNA sequencing has become indispensable for decoding gene activities, yet the challenge of reconstructing full-length transcripts persists. Traditional single-sample assemblers frequently produce fragmented transcripts, especially in single-cell RNA-seq data. 
While algorithms designed for assembling multiple samples exist, they encounter various limitations.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>We present Aletsch, a new assembler for multiple bulk or single-cell RNA-seq samples. Aletsch incorporates several algorithmic innovations, including a \u201cbridging\u201d system that can effectively integrate multiple samples to restore missed junctions in individual samples, and a new graph-decomposition algorithm that leverages \u201csupporting\u201d information across multiple samples to guide the decomposition of complex vertices. A standout feature of Aletsch is its application of a random forest model with 50 well-designed features for scoring transcripts. We demonstrate its robust adaptability across different chromosomes, datasets, and species. Our experiments, conducted on RNA-seq data from several protocols, firmly demonstrate Aletsch\u2019s significant outperformance over existing meta-assemblers. As an example, when measured with the partial area under the precision-recall curve (pAUC, constrained by precision), Aletsch surpasses the leading assemblers TransMeta by 22.9%\u201362.1% and PsiCLASS by 23.0%\u2013175.5% on human datasets.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>Aletsch is freely available at https:\/\/github.com\/Shao-Group\/aletsch. 
Scripts that reproduce the experimental results of this manuscript are available at https:\/\/github.com\/Shao-Group\/aletsch-test.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btae215","type":"journal-article","created":{"date-parts":[[2024,6,28]],"date-time":"2024-06-28T09:29:05Z","timestamp":1719566945000},"page":"i307-i317","source":"Crossref","is-referenced-by-count":4,"title":["Accurate assembly of multiple RNA-seq samples with Aletsch"],"prefix":"10.1093","volume":"40","author":[{"given":"Qian","family":"Shi","sequence":"first","affiliation":[{"name":"Department of Computer Science and Engineering, The Pennsylvania State University , University Park, PA 16802, United States"}]},{"given":"Qimin","family":"Zhang","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, The Pennsylvania State University , University Park, PA 16802, United States"}]},{"given":"Mingfu","family":"Shao","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, The Pennsylvania State University , University Park, PA 16802, United States"},{"name":"Huck Institutes of the Life Sciences, The Pennsylvania State University , University Park, PA 16802, United States"}]}],"member":"286","published-online":{"date-parts":[[2024,6,28]]},"reference":[{"key":"2024062809045717100_btae215-B1","doi-asserted-by":"crossref","first-page":"2529","DOI":"10.1093\/bioinformatics\/btt442","article-title":"MITIE: simultaneous RNA-Seq-based transcript identification and quantification in multiple samples","volume":"29","author":"Behr","year":"2013","journal-title":"Bioinformatics"},{"key":"2024062809045717100_btae215-B2","first-page":"230","author":"Dias","year":"2022"},{"key":"2024062809045717100_btae215-B3","doi-asserted-by":"crossref","first-page":"2778","DOI":"10.1093\/bioinformatics\/btv272","article-title":"Polyester: simulating RNA-seq datasets with differential transcript 
expression","volume":"31","author":"Frazee","year":"2015","journal-title":"Bioinformatics"},{"key":"2024062809045717100_btae215-B4","doi-asserted-by":"crossref","first-page":"708","DOI":"10.1038\/s41587-020-0497-0","article-title":"Single-cell RNA counting at allele and isoform resolution using smart-seq3","volume":"38","author":"Hagemann-Jensen","year":"2020","journal-title":"Nat Biotechnol"},{"key":"2024062809045717100_btae215-B5","doi-asserted-by":"crossref","first-page":"1452","DOI":"10.1038\/s41587-022-01311-4","article-title":"Scalable single-cell RNA sequencing from full transcripts with smart-seq3xpress","volume":"40","author":"Hagemann-Jensen","year":"2022","journal-title":"Nat Biotechnol"},{"key":"2024062809045717100_btae215-B6","first-page":"177","author":"Khan","year":"2022"},{"key":"2024062809045717100_btae215-B7","doi-asserted-by":"crossref","first-page":"278","DOI":"10.1186\/s13059-019-1910-1","article-title":"Transcriptome assembly from long-read RNA-seq alignments with StringTie2","volume":"20","author":"Kovaka","year":"2019","journal-title":"Genome Biol"},{"key":"2024062809045717100_btae215-B8","doi-asserted-by":"crossref","first-page":"213","DOI":"10.1186\/s13059-016-1074-1","article-title":"TransComb: genome-guided transcriptome assembly via combing junctions in splicing graphs","volume":"17","author":"Liu","year":"2016","journal-title":"Genome Biol"},{"key":"2024062809045717100_btae215-B9","doi-asserted-by":"crossref","first-page":"68","DOI":"10.1038\/nmeth.4078","article-title":"Taco produces robust multisample transcriptome assemblies from rna-seq","volume":"14","author":"Niknafs","year":"2017","journal-title":"Nat Methods"},{"key":"2024062809045717100_btae215-B10","author":"Pardo-Palacios","year":"2021"},{"key":"2024062809045717100_btae215-B11","doi-asserted-by":"crossref","first-page":"304","DOI":"10.12688\/f1000research.23297.1","article-title":"GFF utilities: gffread and 
gffcompare","volume":"9","author":"Pertea","year":"2020","journal-title":"F1000Res"},{"key":"2024062809045717100_btae215-B12","doi-asserted-by":"crossref","first-page":"290","DOI":"10.1038\/nbt.3122","article-title":"StringTie enables improved reconstruction of a transcriptome from RNA-seq reads","volume":"33","author":"Pertea","year":"2015","journal-title":"Nat Biotechnol"},{"key":"2024062809045717100_btae215-B13","author":"Shao"},{"key":"2024062809045717100_btae215-B14","doi-asserted-by":"crossref","first-page":"1167","DOI":"10.1038\/nbt.4020","article-title":"Accurate assembly of transcripts through phase-preserving graph decomposition","volume":"35","author":"Shao","year":"2017","journal-title":"Nat Biotechnol"},{"key":"2024062809045717100_btae215-B15","doi-asserted-by":"crossref","first-page":"e98","DOI":"10.1093\/nar\/gkw158","article-title":"CLASS2: accurate and efficient splice variant annotation from RNA-seq reads","volume":"44","author":"Song","year":"2016","journal-title":"Nucleic Acids Res"},{"key":"2024062809045717100_btae215-B16","doi-asserted-by":"crossref","first-page":"5000","DOI":"10.1038\/s41467-019-12990-0","article-title":"A multi-sample approach increases the accuracy of transcript assembly","volume":"10","author":"Song","year":"2019","journal-title":"Nat Commun"},{"key":"2024062809045717100_btae215-B17","doi-asserted-by":"crossref","first-page":"S15","DOI":"10.1186\/1471-2164-16-S2-S15","article-title":"Accurate inference of isoforms from multiple sample RNA-Seq data","volume":"16","author":"Tasnim","year":"2015","journal-title":"BMC Genomics"},{"key":"2024062809045717100_btae215-B18","doi-asserted-by":"crossref","first-page":"511","DOI":"10.1038\/nbt.1621","article-title":"Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation","volume":"28","author":"Trapnell","year":"2010","journal-title":"Nat 
Biotechnol"},{"key":"2024062809045717100_btae215-B19","doi-asserted-by":"crossref","first-page":"1398","DOI":"10.1101\/gr.276434.121","article-title":"Transmeta simultaneously assembles multisample RNA-seq reads","volume":"32","author":"Yu","year":"2022","journal-title":"Genome Res"},{"key":"2024062809045717100_btae215-B20","author":"Zahin","year":"2024"},{"key":"2024062809045717100_btae215-B21","doi-asserted-by":"crossref","first-page":"148","DOI":"10.1038\/s43588-022-00216-1","article-title":"Accurate assembly of multi-end RNA-seq data with Scallop2","volume":"2","author":"Zhang","year":"2022","journal-title":"Nat Comput Sci"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/Supplement_1\/i307\/58354938\/btae215.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/Supplement_1\/i307\/58354938\/btae215.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,6,28]],"date-time":"2024-06-28T09:29:18Z","timestamp":1719566958000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/40\/Supplement_1\/i307\/7700882"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,6,28]]},"references-count":21,"journal-issue":{"issue":"Supplement_1","published-print":{"date-parts":[[2024,6,28]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btae215","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2024,7]]},"published":{"date-parts":[[2024,6,28]]}}}