{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,1]],"date-time":"2025-11-01T14:49:24Z","timestamp":1762008564530,"version":"build-2065373602"},"reference-count":52,"publisher":"Oxford University Press (OUP)","issue":"13","license":[{"start":{"date-parts":[[2018,11,19]],"date-time":"2018-11-19T00:00:00Z","timestamp":1542585600000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"name":"Convergenomix project","award":["ANR-15-CE32-0005"],"award-info":[{"award-number":["ANR-15-CE32-0005"]}]},{"name":"Ecole Normale Sup\u00e9rieure of Lyon"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2019,7,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>RNA sequencing (RNA-Seq) is a widely used approach to obtain transcript sequences in non-model organisms, notably for performing comparative analyses. However, current bioinformatic pipelines do not take full advantage of pre-existing reference data in related species for improving RNA-Seq assembly, annotation and gene family reconstruction.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>We built an automated pipeline named CAARS to combine novel data from RNA-Seq experiments with existing multi-species gene family alignments. RNA-Seq reads are assembled into transcripts by both de novo and assisted assemblies. Then, CAARS incorporates transcripts into gene families, builds gene alignments and trees and uses phylogenetic information to classify the genes as orthologs and paralogs of existing genes. We used CAARS to assemble and annotate RNA-Seq data in rodents and fishes using distantly related genomes as reference, a difficult case for this kind of analysis. We showed CAARS assemblies are more complete and accurate than those assembled by a standard pipeline consisting of de novo assembly coupled with annotation by sequence similarity on a guide species. In addition to annotated transcripts, CAARS provides gene family alignments and trees, annotated with orthology relationships, directly usable for downstream comparative analyses.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>CAARS is implemented in Python and Ocaml and is freely available at https:\/\/github.com\/carinerey\/caars.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/bty903","type":"journal-article","created":{"date-parts":[[2018,11,17]],"date-time":"2018-11-17T02:12:14Z","timestamp":1542420734000},"page":"2199-2207","source":"Crossref","is-referenced-by-count":4,"title":["CAARS: comparative assembly and annotation of RNA-Seq data"],"prefix":"10.1093","volume":"35","author":[{"given":"Carine","family":"Rey","sequence":"first","affiliation":[{"name":"UnivLyon, Universit\u00e9 Claude Bernard Lyon 1, ENS de Lyon, CNRS UMR, INSERM U1210, LBMC, F-69007, Lyon, France"}]},{"given":"Philippe","family":"Veber","sequence":"additional","affiliation":[{"name":"UnivLyon, Universit\u00e9 Claude Bernard Lyon 1, CNRS, UMR, LBBE, F-69100, Villeurbanne, France"}]},{"given":"Bastien","family":"Boussau","sequence":"additional","affiliation":[{"name":"UnivLyon, Universit\u00e9 Claude Bernard Lyon 1, CNRS, UMR, LBBE, F-69100, Villeurbanne, France"}]},{"given":"Marie","family":"S\u00e9mon","sequence":"additional","affiliation":[{"name":"UnivLyon, Universit\u00e9 Claude Bernard Lyon 1, ENS de Lyon, CNRS UMR, INSERM U1210, LBMC, F-69007, Lyon, France"}]}],"member":"286","published-online":{"date-parts":[[2018,11,19]]},"reference":[{"key":"2023051612073035000_bty903-B1","doi-asserted-by":"crossref","first-page":"98","DOI":"10.1186\/s12859-015-0515-2","article-title":"aTRAM - automated target restricted assembly method: a fast method for assembling loci across divergent taxa from next-generation sequencing data","volume":"16","author":"Allen","year":"2015","journal-title":"BMC Bioinformatics"},{"key":"2023051612073035000_bty903-B2","doi-asserted-by":"crossref","first-page":"e1000262","DOI":"10.1371\/journal.pcbi.1000262","article-title":"Phylogenetic and functional assessment of orthologs inference projects and methods","volume":"5","author":"Altenhoff","year":"2009","journal-title":"PLoS Comput. Biol."},{"key":"2023051612073035000_bty903-B3","doi-asserted-by":"crossref","first-page":"1250","DOI":"10.1093\/bioinformatics\/btt127","article-title":"BRANCH: boosting RNA-Seq assemblies with partial or related genomic sequences","volume":"29","author":"Bao","year":"2013","journal-title":"Bioinformatics"},{"key":"2023051612073035000_bty903-B4","doi-asserted-by":"crossref","first-page":"54","DOI":"10.1186\/s12864-015-2349-8","article-title":"FRAMA: from RNA-seq data to annotated mRNA assemblies","volume":"17","author":"Bens","year":"2016","journal-title":"BMC Genomics"},{"key":"2023051612073035000_bty903-B5","doi-asserted-by":"crossref","first-page":"441","DOI":"10.1111\/ele.12423","article-title":"Fossil-based comparative analyses reveal ancient marine ancestry erased by extinction in ray-finned fishes","volume":"18","author":"Betancur-R","year":"2015","journal-title":"Ecol. Lett."},{"key":"2023051612073035000_bty903-B6","doi-asserted-by":"crossref","first-page":"323","DOI":"10.1101\/gr.141978.112","article-title":"Genome-scale coestimation of species and gene trees","volume":"23","author":"Boussau","year":"2013","journal-title":"Genome Res."},{"key":"2023051612073035000_bty903-B7","doi-asserted-by":"crossref","first-page":"525","DOI":"10.1038\/nbt.3519","article-title":"Near-optimal probabilistic RNA-seq quantification","volume":"34","author":"Bray","year":"2016","journal-title":"Nat. Biotechnol."},{"key":"2023051612073035000_bty903-B8","doi-asserted-by":"crossref","first-page":"421","DOI":"10.1186\/1471-2105-10-421","article-title":"BLAST+: architecture and applications","volume":"10","author":"Camacho","year":"2009","journal-title":"BMC Bioinformatics"},{"key":"2023051612073035000_bty903-B9","doi-asserted-by":"crossref","first-page":"e383","DOI":"10.1371\/journal.pone.0000383","article-title":"Assessing performance of orthology detection strategies applied to eukaryotic genomes","volume":"2","author":"Chen","year":"2007","journal-title":"PLoS One"},{"key":"2023051612073035000_bty903-B10","doi-asserted-by":"crossref","first-page":"e1000112","DOI":"10.1371\/journal.pbio.1000112","article-title":"Lineage-specific biology revealed by a finished genome assembly of the mouse","volume":"7","author":"Church","year":"2009","journal-title":"PLoS Biol."},{"key":"2023051612073035000_bty903-B11","doi-asserted-by":"crossref","first-page":"13","DOI":"10.1186\/s13059-016-0881-8","article-title":"A survey of best practices for RNA-seq data analysis","volume":"17","author":"Conesa","year":"2016","journal-title":"Genome Biol."},{"key":"2023051612073035000_bty903-B12","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/1471-2105-14-330","article-title":"Agalma: an automated phylogenomics workflow","volume":"14","author":"Dunn","year":"2013","journal-title":"BMC Bioinformatics"},{"key":"2023051612073035000_bty903-B13","doi-asserted-by":"crossref","first-page":"88","DOI":"10.1186\/1471-2148-12-88","article-title":"A glimpse on the pattern of rodent diversification: a phylogenetic approach","volume":"12","author":"Fabre","year":"2012","journal-title":"BMC Evol. Biol."},{"key":"2023051612073035000_bty903-B14","doi-asserted-by":"crossref","first-page":"397","DOI":"10.1074\/mcp.M113.035600","article-title":"Analysis of the human tissue-specific expression by genome-wide integration of transcriptomics and antibody-based proteomics","volume":"13","author":"Fagerberg","year":"2014","journal-title":"Mol. Cell. Proteomics"},{"key":"2023051612073035000_bty903-B15","doi-asserted-by":"crossref","first-page":"D222","DOI":"10.1093\/nar\/gkt1223","article-title":"Pfam: the protein families database","volume":"42","author":"Finn","year":"2014","journal-title":"Nucleic Acids Res."},{"key":"2023051612073035000_bty903-B16","doi-asserted-by":"crossref","first-page":"3150","DOI":"10.1093\/bioinformatics\/bts565","article-title":"CD-HIT: accelerated for clustering the next-generation sequencing data","volume":"28","author":"Fu","year":"2012","journal-title":"Bioinformatics"},{"key":"2023051612073035000_bty903-B17","doi-asserted-by":"crossref","first-page":"469","DOI":"10.1038\/nmeth.1613","article-title":"Computational methods for transcriptome annotation and quantification using RNA-seq","volume":"8","author":"Garber","year":"2011","journal-title":"Nat. Methods"},{"key":"2023051612073035000_bty903-B18","doi-asserted-by":"crossref","first-page":"644","DOI":"10.1038\/nbt.1883","article-title":"Full-length transcriptome assembly from RNA-Seq data without a reference genome","volume":"29","author":"Grabherr","year":"2011","journal-title":"Nat. Biotechnol."},{"key":"2023051612073035000_bty903-B19","doi-asserted-by":"crossref","first-page":"835","DOI":"10.1093\/molbev\/msv037","article-title":"Tree of life reveals clock-like speciation and diversification","volume":"32","author":"Hedges","year":"2015","journal-title":"Mol. Biol. Evol."},{"key":"2023051612073035000_bty903-B20","doi-asserted-by":"crossref","first-page":"bav096","DOI":"10.1093\/database\/bav096","article-title":"Ensembl comparative genomics resources","volume":"2016","author":"Herrero","year":"2016","journal-title":"Database (Oxford)"},{"key":"2023051612073035000_bty903-B21","doi-asserted-by":"crossref","first-page":"868","DOI":"10.1101\/gr.9.9.868","article-title":"CAP3: a DNA sequence assembly program","volume":"9","author":"Huang","year":"1999","journal-title":"Genome Res."},{"key":"2023051612073035000_bty903-B22","doi-asserted-by":"crossref","first-page":"D897","DOI":"10.1093\/nar\/gkt1177","article-title":"PhylomeDB v4: zooming into the plurality of evolutionary histories of a genome","volume":"42","author":"Huerta-Cepas","year":"2014","journal-title":"Nucleic Acids Res."},{"key":"2023051612073035000_bty903-B23","doi-asserted-by":"crossref","first-page":"evw142","DOI":"10.1093\/gbe\/evw142","article-title":"Different endosymbiotic interactions in two hydra species reflect the evolutionary history of endosymbiosis","volume":"8","author":"Ishikawa","year":"2016","journal-title":"Genome Biol. Evol."},{"key":"2023051612073035000_bty903-B24","doi-asserted-by":"crossref","first-page":"417","DOI":"10.1016\/j.ympev.2012.09.007","article-title":"Next-generation phylogenomics using a target restricted assembly method","volume":"66","author":"Johnson","year":"2013","journal-title":"Mol. Phylogenetics Evol."},{"key":"2023051612073035000_bty903-B25","doi-asserted-by":"crossref","first-page":"3059","DOI":"10.1093\/nar\/gkf436","article-title":"Mafft: a novel method for rapid multiple sequence alignment based on fast fourier transform","volume":"30","author":"Katoh","year":"2002","journal-title":"Nucleic Acids Res."},{"key":"2023051612073035000_bty903-B26","doi-asserted-by":"crossref","first-page":"381","DOI":"10.1111\/1755-0998.12186","article-title":"Accuracy of allele frequency estimation using pooled RNA-Seq","volume":"14","author":"Konczal","year":"2014","journal-title":"Mol. Ecol. Resour."},{"key":"2023051612073035000_bty903-B27","doi-asserted-by":"crossref","first-page":"379","DOI":"10.1093\/bib\/bbr030","article-title":"Computational methods for gene orthology inference","volume":"12","author":"Kristensen","year":"2011","journal-title":"Brief. Bioinform."},{"key":"2023051612073035000_bty903-B28","first-page":"821","article-title":"Kollector: transcript-informed, targeted de novo assembly of gene loci","volume":"18","author":"Kucuk","year":"2017","journal-title":"Bioinformatics"},{"key":"2023051612073035000_bty903-B29","doi-asserted-by":"crossref","first-page":"539","DOI":"10.1016\/j.tig.2008.08.009","article-title":"The quest for orthologs: finding the corresponding gene across genomes","volume":"24","author":"Kuzniar","year":"2008","journal-title":"Trends Genet."},{"key":"2023051612073035000_bty903-B30","first-page":"530","article-title":"A review of bioinformatic pipeline frameworks","volume":"18","author":"Leipzig","year":"2016","journal-title":"Brief. Bioinform."},{"key":"2023051612073035000_bty903-B31","doi-asserted-by":"crossref","first-page":"2699","DOI":"10.1111\/mec.12764","article-title":"Natural selection and the genetic basis of osmoregulation in heteromyid rodents as revealed by RNA-seq","volume":"23","author":"Marra","year":"2014","journal-title":"Mol. Ecol."},{"key":"2023051612073035000_bty903-B32","doi-asserted-by":"crossref","first-page":"446","DOI":"10.1111\/1755-0998.12465","article-title":"Optimization of next-generation sequencing transcriptome annotation for species lacking sequenced genomes","volume":"16","author":"Ockendon","year":"2016","journal-title":"Mol. Ecol. Resour."},{"key":"2023051612073035000_bty903-B33","doi-asserted-by":"crossref","first-page":"87","DOI":"10.1038\/nrg2934","article-title":"RNA sequencing: advances, challenges and opportunities","volume":"12","author":"Ozsolak","year":"2011","journal-title":"Nat. Rev. Genet."},{"key":"2023051612073035000_bty903-B34","doi-asserted-by":"crossref","first-page":"S3","DOI":"10.1186\/1471-2105-10-S6-S3","article-title":"Databases of homologous gene families for comparative genomics","volume":"10","author":"Penel","year":"2009","journal-title":"BMC Bioinformatics"},{"key":"2023051612073035000_bty903-B35","doi-asserted-by":"crossref","first-page":"1478","DOI":"10.1111\/mec.13579","article-title":"Transcriptome-wide patterns of divergence during allopatric evolution","volume":"25","author":"Pereira","year":"2016","journal-title":"Mol. Ecol."},{"key":"2023051612073035000_bty903-B36","doi-asserted-by":"crossref","first-page":"241","DOI":"10.1186\/1471-2148-7-241","article-title":"Orthomam: a database of orthologous genomic markers for placental mammal phylogenetics","volume":"7","author":"Ranwez","year":"2007","journal-title":"BMC Evol. Biol."},{"key":"2023051612073035000_bty903-B37","article-title":"apytram v1.1","volume-title":"Zenodo","author":"Rey","year":"2017"},{"key":"2023051612073035000_bty903-B38","doi-asserted-by":"crossref","first-page":"6239","DOI":"10.1073\/pnas.95.11.6239","article-title":"Genomic evidence for two functionally distinct gene classes","volume":"95","author":"Rivera","year":"1998","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023051612073035000_bty903-B39","doi-asserted-by":"crossref","first-page":"17","DOI":"10.4137\/GEI.S37925","article-title":"Inferring orthologs: open questions and perspectives","volume":"9","author":"Tekaia","year":"2016","journal-title":"Genomics Insights"},{"key":"2023051612073035000_bty903-B40","doi-asserted-by":"crossref","first-page":"2391","DOI":"10.1093\/molbev\/msw110","article-title":"Annual Killifish transcriptomics and candidate genes for metazoan diapause","volume":"33","author":"Thompson","year":"2016","journal-title":"Mol. Biol. Evol."},{"key":"2023051612073035000_bty903-B41","doi-asserted-by":"crossref","first-page":"1224","DOI":"10.1111\/mec.13526","article-title":"The power and promise of RNA-seq in ecology and evolution","volume":"25","author":"Todd","year":"2016","journal-title":"Mol. Ecol."},{"key":"2023051612073035000_bty903-B42","doi-asserted-by":"crossref","first-page":"392","DOI":"10.1186\/s12864-016-2646-x","article-title":"A robust (re-)annotation approach to generate unbiased mapping references for RNA-seq-based analyses of differential expression across closely related species","volume":"17","author":"Torres-Oliva","year":"2016","journal-title":"BMC Genomics"},{"key":"2023051612073035000_bty903-B43","doi-asserted-by":"crossref","first-page":"1105","DOI":"10.1093\/bioinformatics\/btp120","article-title":"TopHat: discovering splice junctions with RNA-Seq","volume":"25","author":"Trapnell","year":"2009","journal-title":"Bioinformatics"},{"key":"2023051612073035000_bty903-B44","doi-asserted-by":"crossref","first-page":"511","DOI":"10.1038\/nbt.1621","article-title":"Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation","volume":"28","author":"Trapnell","year":"2010","journal-title":"Nat. Biotechnol."},{"key":"2023051612073035000_bty903-B45","doi-asserted-by":"crossref","first-page":"1260419","DOI":"10.1126\/science.1260419","article-title":"Tissue-based map of the human proteome","volume":"347","author":"Uhlen","year":"2015","journal-title":"Science"},{"key":"2023051612073035000_bty903-B46","doi-asserted-by":"crossref","first-page":"969","DOI":"10.1093\/sysbio\/syv044","article-title":"Integrating sequence evolution into probabilistic orthology analysis","volume":"64","author":"Ullah","year":"2015","journal-title":"Syst. Biol."},{"key":"2023051612073035000_bty903-B47","doi-asserted-by":"crossref","first-page":"e0185020","DOI":"10.1371\/journal.pone.0185020","article-title":"Challenges and advances for transcriptome assembly in non-model species","volume":"12","author":"Ungaro","year":"2017","journal-title":"PLoS One"},{"key":"2023051612073035000_bty903-B48","article-title":"bistro v0.3.0","volume-title":"Zenodo","author":"Veber","year":"2017"},{"key":"2023051612073035000_bty903-B49","doi-asserted-by":"crossref","first-page":"620","DOI":"10.1111\/mec.12014","article-title":"Challenges and strategies in transcriptome assembly and differential gene expression quantification. A comprehensive in silico assessment of RNA-seq experiments","volume":"22","author":"Vijay","year":"2013","journal-title":"Mol. Ecol."},{"key":"2023051612073035000_bty903-B50","doi-asserted-by":"crossref","first-page":"57","DOI":"10.1038\/nrg2484","article-title":"RNA-Seq: a revolutionary tool for transcriptomics","volume":"10","author":"Wang","year":"2009","journal-title":"Nat. Rev. Genet."},{"key":"2023051612073035000_bty903-B51","doi-asserted-by":"crossref","first-page":"3081","DOI":"10.1093\/molbev\/msu245","article-title":"Orthology inference in nonmodel organisms using transcriptomes and low-coverage genomes: improving accuracy and matrix occupancy for phylogenomics","volume":"31","author":"Yang","year":"2014","journal-title":"Mol. Biol. Evol."},{"key":"2023051612073035000_bty903-B52","doi-asserted-by":"crossref","first-page":"D710","DOI":"10.1093\/nar\/gkv1157","article-title":"Ensembl 2016","volume":"44","author":"Yates","year":"2016","journal-title":"Nucleic Acids Res."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/35\/13\/2199\/50340440\/bioinformatics_35_13_2199.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/35\/13\/2199\/50340440\/bioinformatics_35_13_2199.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,5,16]],"date-time":"2023-05-16T12:09:16Z","timestamp":1684238956000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/35\/13\/2199\/5191702"}},"subtitle":[],"editor":[{"given":"Janet","family":"Kelso","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2018,11,19]]},"references-count":52,"journal-issue":{"issue":"13","published-print":{"date-parts":[[2019,7,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/bty903","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"type":"print","value":"1367-4803"},{"type":"electronic","value":"1367-4811"}],"subject":[],"published-other":{"date-parts":[[2019,7,1]]},"published":{"date-parts":[[2018,11,19]]}}}