{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,22]],"date-time":"2025-02-22T00:45:09Z","timestamp":1740185109712,"version":"3.37.3"},"reference-count":30,"publisher":"Oxford University Press (OUP)","issue":"19","license":[{"start":{"date-parts":[[2018,5,9]],"date-time":"2018-05-09T00:00:00Z","timestamp":1525824000000},"content-version":"vor","delay-in-days":1,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/100000153","name":"NSF Division of Biological Infrastructure","doi-asserted-by":"crossref","award":["1564917"],"award-info":[{"award-number":["1564917"]}],"id":[{"id":"10.13039\/100000153","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2018,10,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>De novo transcriptome analysis using RNA-seq offers a promising means to study gene expression in non-model organisms. Yet, the difficulty of transcriptome assembly means that the contigs provided by the assembler often represent a fractured and incomplete view of the transcriptome, complicating downstream analysis. We introduce Grouper, a new method for clustering contigs from de novo assemblies that are likely to belong to the same transcripts and genes; these groups can subsequently be analyzed more robustly. When provided with access to the genome of a related organism, Grouper can transfer annotations to the de novo assembly, further improving the clustering.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>On de novo assemblies from four different species, we show that Grouper is able to accurately cluster a larger number of contigs than the existing state-of-the-art method. The Grouper pipeline is able to map greater than 10% more reads against the contigs, leading to accurate downstream differential expression analyses. The labeling module, in the presence of a closely related annotated genome, can efficiently transfer annotations to the contigs and use this information to further improve clustering. Overall, Grouper provides a complete and efficient pipeline for processing de novo transcriptomic assemblies.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>The Grouper software is freely available at https:\/\/github.com\/COMBINE-lab\/grouper under the 2-clause BSD license.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/bty378","type":"journal-article","created":{"date-parts":[[2018,5,3]],"date-time":"2018-05-03T23:48:46Z","timestamp":1525391326000},"page":"3265-3272","source":"Crossref","is-referenced-by-count":13,"title":["Grouper: graph-based clustering and annotation for improved <i>de novo<\/i> transcriptome analysis"],"prefix":"10.1093","volume":"34","author":[{"given":"Laraib","family":"Malik","sequence":"first","affiliation":[{"name":"Department of Computer Science, Stony Brook University, Stony Brook, NY, USA"}]},{"given":"Fatemeh","family":"Almodaresi","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Stony Brook University, Stony Brook, NY, USA"}]},{"given":"Rob","family":"Patro","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Stony Brook University, Stony Brook, NY, USA"}]}],"member":"286","published-online":{"date-parts":[[2018,5,8]]},"reference":[{"key":"2023012712493357300_bty378-B1","doi-asserted-by":"crossref","first-page":"403","DOI":"10.1016\/S0022-2836(05)80360-2","article-title":"Basic local alignment search tool","volume":"215","author":"Altschul","year":"1990","journal-title":"J. Mol. Biol"},{"year":"2008","author":"Baluja","key":"2023012712493357300_bty378-B2"},{"key":"2023012712493357300_bty378-B3","doi-asserted-by":"crossref","first-page":"e2988.","DOI":"10.7717\/peerj.2988","article-title":"Compacting and correcting trinity and oases rna-seq de novo assemblies","volume":"5","author":"Cabau","year":"2017","journal-title":"PeerJ"},{"key":"2023012712493357300_bty378-B4","first-page":"410.","article-title":"Corset: enabling differential gene expression analysis for de novo assembled transcriptomes","volume":"15","author":"Davidson","year":"2014","journal-title":"Genome Biol"},{"year":"2000","author":"Dongen","key":"2023012712493357300_bty378-B5"},{"key":"2023012712493357300_bty378-B6","doi-asserted-by":"crossref","first-page":"1670","DOI":"10.1093\/bioinformatics\/btw217","article-title":"Informed k mer selection for de novo transcriptome assembly","volume":"32","author":"Durai","year":"2016","journal-title":"Bioinformatics"},{"key":"2023012712493357300_bty378-B7","doi-asserted-by":"crossref","first-page":"1.","DOI":"10.1038\/hdy.2010.152","article-title":"Applications of next generation sequencing in molecular ecology of non-model organisms","volume":"107","author":"Ekblom","year":"2011","journal-title":"Heredity"},{"key":"2023012712493357300_bty378-B8","doi-asserted-by":"crossref","first-page":"469","DOI":"10.1038\/nmeth.1613","article-title":"Computational methods for transcriptome annotation and quantification using rna-seq","volume":"8","author":"Garber","year":"2011","journal-title":"Nat. Methods"},{"key":"2023012712493357300_bty378-B9","doi-asserted-by":"crossref","first-page":"644","DOI":"10.1038\/nbt.1883","article-title":"Full-length transcriptome assembly from rna-seq data without a reference genome","volume":"29","author":"Grabherr","year":"2011","journal-title":"Nat. Biotechnol"},{"key":"2023012712493357300_bty378-B10","doi-asserted-by":"crossref","first-page":"1494","DOI":"10.1038\/nprot.2013.084","article-title":"De novo transcript sequence reconstruction from rna-seq: reference generation and analysis with trinity","volume":"8","author":"Haas","year":"2013","journal-title":"Nat. Protoc"},{"key":"2023012712493357300_bty378-B11","doi-asserted-by":"crossref","first-page":"e35152.","DOI":"10.1371\/journal.pone.0035152","article-title":"Characterization of common carp transcriptome: sequencing, de novo assembly, annotation and comparative genomics","volume":"7","author":"Ji","year":"2012","journal-title":"PLoS One"},{"key":"2023012712493357300_bty378-B12","doi-asserted-by":"crossref","DOI":"10.1002\/0471250953.bi1107s32","article-title":"Aligning short sequencing reads with bowtie","author":"Langmead","year":"2010","journal-title":"Curr. Protoc. Bioinformatics"},{"key":"2023012712493357300_bty378-B13","doi-asserted-by":"crossref","first-page":"R29.","DOI":"10.1186\/gb-2014-15-2-r29","article-title":"Voom: precision weights unlock linear model analysis tools for rna-seq read counts","volume":"15","author":"Law","year":"2014","journal-title":"Genome Biol"},{"key":"2023012712493357300_bty378-B14","doi-asserted-by":"crossref","first-page":"323.","DOI":"10.1186\/1471-2105-12-323","article-title":"Rsem: accurate transcript quantification from rna-seq data with or without a reference genome","volume":"12","author":"Li","year":"2011","journal-title":"BMC Bioinform"},{"key":"2023012712493357300_bty378-B15","doi-asserted-by":"crossref","first-page":"544","DOI":"10.1101\/gr.184341.114","article-title":"Joint annotation of chromatin state and chromatin conformation reveals relationships among domain types and identifies domains of cell-type-specific expression","volume":"25","author":"Libbrecht","year":"2015","journal-title":"Genome Res"},{"key":"2023012712493357300_bty378-B17","doi-asserted-by":"crossref","first-page":"1509","DOI":"10.1101\/gr.079558.108","article-title":"Rna-seq: an assessment of technical reproducibility and comparison with gene expression arrays","volume":"18","author":"Marioni","year":"2008","journal-title":"Genome Res"},{"key":"2023012712493357300_bty378-B18","doi-asserted-by":"crossref","first-page":"671","DOI":"10.1038\/nrg3068","article-title":"Next-generation transcriptome assembly","volume":"12","author":"Martin","year":"2011","journal-title":"Nat. Rev. Genet"},{"key":"2023012712493357300_bty378-B19","doi-asserted-by":"crossref","first-page":"9.","DOI":"10.1186\/1748-7188-6-9","article-title":"Estimation of alternative splicing isoform frequencies from rna-seq data","volume":"6","author":"Nicolae","year":"2011","journal-title":"Algorithm. Mol. Biol"},{"key":"2023012712493357300_bty378-B20","doi-asserted-by":"crossref","first-page":"180.","DOI":"10.1186\/1471-2164-11-180","article-title":"Transcriptome sequencing in an ecologically important tree species: assembly, annotation, and marker discovery","volume":"11","author":"Parchman","year":"2010","journal-title":"BMC Genomics"},{"key":"2023012712493357300_bty378-B21","doi-asserted-by":"crossref","first-page":"417","DOI":"10.1038\/nmeth.4197","article-title":"Salmon provides fast and bias-aware quantification of transcript expression","volume":"14","author":"Patro","year":"2017","journal-title":"Nat. Methods"},{"key":"2023012712493357300_bty378-B22","doi-asserted-by":"crossref","first-page":"e0138006.","DOI":"10.1371\/journal.pone.0138006","article-title":"Semantic assembly and annotation of draft rnaseq transcripts without a reference genome","volume":"10","author":"Ptitsyn","year":"2015","journal-title":"PLoS One"},{"key":"2023012712493357300_bty378-B23","doi-asserted-by":"crossref","first-page":"909","DOI":"10.1038\/nmeth.1517","article-title":"De novo assembly and analysis of rna-seq data","volume":"7","author":"Robertson","year":"2010","journal-title":"Nat. Methods"},{"key":"2023012712493357300_bty378-B24","doi-asserted-by":"crossref","first-page":"62.","DOI":"10.1214\/10-STS343","article-title":"Statistical modeling of rna-seq data","volume":"26","author":"Salzman","year":"2011","journal-title":"Stat. Sci"},{"key":"2023012712493357300_bty378-B25","doi-asserted-by":"crossref","first-page":"1086","DOI":"10.1093\/bioinformatics\/bts094","article-title":"Oases: robust de novo rna-seq assembly across the dynamic range of expression levels","volume":"28","author":"Schulz","year":"2012","journal-title":"Bioinformatics"},{"key":"2023012712493357300_bty378-B16","doi-asserted-by":"crossref","DOI":"10.12688\/f1000research.7563.1","article-title":"Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences","volume":"4","author":"Soneson","year":"2015","journal-title":"F1000Res"},{"key":"2023012712493357300_bty378-B26","article-title":"Rna-seq de novo assembly reveals differential gene expression in glossina palpalis gambiensis infected with trypanosoma brucei gambiense vs. non-infected and self-cured flies","volume":"6","author":"Soumana","year":"2015","journal-title":"Front. Microbiol"},{"year":"2016","author":"Srivastava","key":"2023012712493357300_bty378-B27"},{"key":"2023012712493357300_bty378-B28","doi-asserted-by":"crossref","first-page":"385.","DOI":"10.1186\/1471-2164-15-385","article-title":"Differential expression of small rnas from burkholderia thailandensis in response to varying environmental and stress conditions","volume":"15","author":"Stubben","year":"2014","journal-title":"BMC Genomics"},{"year":"2010","author":"Talukdar","key":"2023012712493357300_bty378-B29"},{"key":"2023012712493357300_bty378-B30","doi-asserted-by":"crossref","first-page":"R13.","DOI":"10.1186\/gb-2011-12-2-r13","article-title":"Haplotype and isoform specific expression estimation using multi-mapping rna-seq reads","volume":"12","author":"Turro","year":"2011","journal-title":"Genome Biol"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/34\/19\/3265\/48919425\/bioinformatics_34_19_3265.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/34\/19\/3265\/48919425\/bioinformatics_34_19_3265.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,27]],"date-time":"2023-01-27T13:43:20Z","timestamp":1674827000000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/34\/19\/3265\/4994263"}},"subtitle":[],"editor":[{"given":"Inanc","family":"Birol","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2018,5,8]]},"references-count":30,"journal-issue":{"issue":"19","published-print":{"date-parts":[[2018,10,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/bty378","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"type":"print","value":"1367-4803"},{"type":"electronic","value":"1367-4811"}],"subject":[],"published-other":{"date-parts":[[2018,10,1]]},"published":{"date-parts":[[2018,5,8]]}}}