{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,22]],"date-time":"2025-02-22T00:45:28Z","timestamp":1740185128969,"version":"3.37.3"},"reference-count":11,"publisher":"Oxford University Press (OUP)","issue":"15","license":[{"start":{"date-parts":[[2022,6,22]],"date-time":"2022-06-22T00:00:00Z","timestamp":1655856000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"name":"BASF Agricultural Solutions"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,8,2]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Motivation<\/jats:title><jats:p>Analysis of gene expression data can be crucial for elucidating biological relationships within living organisms. However, accurate quantification of gene expression relies directly upon the accuracy of the reference genome or transcriptome to which the expression data are mapped. Errors in gene annotation can lead to errors in the quantification of gene expression. One source of gene annotation error in eukaryotes arises from incorrect predictions of messenger RNA gene models within ribosomal DNA (rDNA) regions.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>Here, we provide examples of how the presence of false gene models in rDNA regions can result in a handful of genes appearing to contribute to &gt;50% of the total transcripts per million values of entire RNA-seq datasets. To this end, we have created riboCleaner, a bioinformatics pipeline designed to identify misannotated gene models in rDNA regions and quantify rRNA-derived reads in RNA-seq data. 
We also show the applicability of riboCleaner in several plant genome assemblies.<\/jats:p><\/jats:sec><jats:sec><jats:title>Availability and implementation<\/jats:title><jats:p>We have implemented riboCleaner as a containerized Snakemake workflow. The workflow, instructions for building the container and other documentation are available at https:\/\/github.com\/basf. The data underlying this article are available in GitHub at https:\/\/github.com\/basf\/riboCleaner. For convenience, a prebuilt Docker image containing riboCleaner is available at https:\/\/hub.docker.com\/u\/basfcontainers.<\/jats:p><\/jats:sec><jats:sec><jats:title>Supplementary information<\/jats:title><jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p><\/jats:sec>","DOI":"10.1093\/bioinformatics\/btac402","type":"journal-article","created":{"date-parts":[[2022,6,22]],"date-time":"2022-06-22T14:45:44Z","timestamp":1655909144000},"page":"3840-3843","source":"Crossref","is-referenced-by-count":0,"title":["riboCleaner: a pipeline to identify and quantify rRNA read contamination from RNA-seq data in plants"],"prefix":"10.1093","volume":"38","author":[{"given":"Pu","family":"Huang","sequence":"first","affiliation":[{"name":"Computational Biology, BASF Corporation, Research Triangle Park, NC 27709-3528, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2625-8398","authenticated-orcid":false,"given":"Erin","family":"Davis","sequence":"additional","affiliation":[{"name":"Computational Biology, BASF Corporation, Research Triangle Park, NC 27709-3528, USA"}]},{"given":"Xia","family":"Cao","sequence":"additional","affiliation":[{"name":"Computational Biology, BASF Corporation, Research Triangle Park, NC 27709-3528, USA"}]},{"given":"Hunter J","family":"Cameron","sequence":"additional","affiliation":[{"name":"Computational Biology, BASF Corporation, Research Triangle Park, NC 27709-3528, 
USA"}]}],"member":"286","published-online":{"date-parts":[[2022,6,22]]},"reference":[{"key":"2023041405352539100_","doi-asserted-by":"crossref","first-page":"403","DOI":"10.1016\/S0022-2836(05)80360-2","article-title":"Basic local alignment search tool","volume":"215","author":"Altschul","year":"1990","journal-title":"J. Mol. Biol"},{"key":"2023041405352539100_","doi-asserted-by":"crossref","first-page":"525","DOI":"10.1038\/nbt.3519","article-title":"Near-optimal probabilistic RNA-seq quantification","volume":"34","author":"Bray","year":"2016","journal-title":"Nat. Biotechnol"},{"key":"2023041405352539100_","doi-asserted-by":"crossref","first-page":"15","DOI":"10.1093\/bioinformatics\/bts635","article-title":"STAR: ultrafast universal RNA-seq aligner","volume":"29","author":"Dobin","year":"2013","journal-title":"Bioinformatics"},{"key":"2023041405352539100_","doi-asserted-by":"crossref","first-page":"D1178","DOI":"10.1093\/nar\/gkr944","article-title":"Phytozome: a comparative platform for green plant genomics","volume":"40","author":"Goodstein","year":"2012","journal-title":"Nucleic Acids Res"},{"key":"2023041405352539100_","doi-asserted-by":"crossref","first-page":"417","DOI":"10.1038\/nmeth.4197","article-title":"Salmon provides fast and bias-aware quantification of transcript expression","volume":"14","author":"Patro","year":"2017","journal-title":"Nat. 
Methods"},{"key":"2023041405352539100_","doi-asserted-by":"crossref","first-page":"433","DOI":"10.1093\/bioinformatics\/btr669","article-title":"Identification and removal of ribosomal RNA sequences from metatranscriptomes","volume":"28","author":"Schmieder","year":"2012","journal-title":"Bioinformatics"},{"key":"2023041405352539100_","doi-asserted-by":"crossref","first-page":"230","DOI":"10.1080\/21541264.2020.1794491","article-title":"Incomplete removal of ribosomal RNA can affect chromatin RNA-seq data analysis","volume":"11","author":"Tellier","year":"2020","journal-title":"Transcription"},{"key":"2023041405352539100_","doi-asserted-by":"crossref","first-page":"8792","DOI":"10.1093\/nar\/gkr576","article-title":"Misannotations of rRNA can now generate 90% false positive protein matches in metatranscriptomic studies","volume":"39","author":"Tripp","year":"2011","journal-title":"Nucleic Acids Res"},{"key":"2023041405352539100_","doi-asserted-by":"crossref","first-page":"368","DOI":"10.1089\/cmb.2016.0113","article-title":"rRNAFilter: a fast approach for ribosomal RNA read removal without a reference database","volume":"24","author":"Wang","year":"2017","journal-title":"J. Comput. 
Biol"},{"key":"2023041405352539100_","doi-asserted-by":"crossref","first-page":"177","DOI":"10.1186\/1471-2229-12-177","article-title":"Divergent patterns of endogenous small RNA populations from seed and vegetative tissues of Glycine max","volume":"12","author":"Zabala","year":"2012","journal-title":"BMC Plant Biol"},{"key":"2023041405352539100_","doi-asserted-by":"crossref","first-page":"243","DOI":"10.1534\/g3.113.009290","article-title":"Genomic characterization of the mouse ribosomal DNA locus","volume":"4","author":"Zentner","year":"2014","journal-title":"G3 (Bethesda)"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btac402\/44275914\/btac402.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/15\/3840\/49884188\/btac402.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/15\/3840\/49884188\/btac402.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,23]],"date-time":"2023-11-23T15:30:01Z","timestamp":1700753401000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/38\/15\/3840\/6613134"}},"subtitle":[],"editor":[{"given":"Valentina","family":"Boeva","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2022,6,22]]},"references-count":11,"journal-issue":{"issue":"15","published-print":{"date-parts":[[2022,8,2]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btac402","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"type":"print","value":"1367-4803"},{"type":"electronic","value":"1367-4811"}],"subject":[],"published-
other":{"date-parts":[[2022,8,1]]},"published":{"date-parts":[[2022,6,22]]}}}