{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,27]],"date-time":"2026-01-27T23:29:39Z","timestamp":1769556579091,"version":"3.49.0"},"reference-count":33,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2025,5,8]],"date-time":"2025-05-08T00:00:00Z","timestamp":1746662400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Bioinform."],"abstract":"<jats:p>In 16S-rRNA microbiome studies, cross-contamination and environmental contamination can obscure true biological signal. This contamination is particularly problematic in low-biomass studies, which are characterized by samples with a small amount of microbial DNA. Although multiple methods and packages for decontaminating microbiome data exist, there is no consensus on the most appropriate tool for decontamination based on the individual research study design and how to quantify the impact of removing identified contaminants to avoid over-filtering. To address these gaps, we introduce micRoclean, an open-source R package that contains two distinct microbiome decontamination pipelines with guidance on which to select based on the downstream goals of the research study and study design. This package integrates and expands on existing packages for microbiome decontamination and analysis for convenience of users. Furthermore, micRoclean also implements a filtering loss statistic to quantify the impact of decontamination on the overall covariance structure of the data. In this paper, we demonstrate the utility of micRoclean through implementation on example data, illustrating that micRoclean effectively and intuitively decontaminates microbiome data. Further, we demonstrate through a multi-batch simulated microbiome sample that micRoclean matches or outperforms tools with similar objectives. This package is freely available from GitHub repository rachelgriffard\/micRoclean.<\/jats:p>","DOI":"10.3389\/fbinf.2025.1556361","type":"journal-article","created":{"date-parts":[[2025,5,8]],"date-time":"2025-05-08T04:13:19Z","timestamp":1746677599000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["micRoclean: an R package for decontaminating low-biomass 16S-rRNA microbiome data"],"prefix":"10.3389","volume":"5","author":[{"given":"Rachel","family":"Griffard-Smith","sequence":"first","affiliation":[]},{"given":"Emily","family":"Schueddig","sequence":"additional","affiliation":[]},{"given":"Diane E.","family":"Mahoney","sequence":"additional","affiliation":[]},{"given":"Prabhakar","family":"Chalise","sequence":"additional","affiliation":[]},{"given":"Devin C.","family":"Koestler","sequence":"additional","affiliation":[]},{"given":"Dong","family":"Pei","sequence":"additional","affiliation":[]}],"member":"1965","published-online":{"date-parts":[[2025,5,8]]},"reference":[{"key":"B1","doi-asserted-by":"publisher","first-page":"1820","DOI":"10.1038\/s41587-023-01696-w","article-title":"Contamination source modeling with SCRuB improves cancer phenotype prediction from microbiome data","volume":"41","author":"Austin","year":"2023","journal-title":"Nat. Biotechnol."},{"key":"B2","doi-asserted-by":"publisher","first-page":"e1340","DOI":"10.7717\/peerj-cs.1340","article-title":"Deep learning and support vector machines for transcription start site identification","volume":"9","author":"Barbero-Aparicio","year":"2023","journal-title":"PeerJ Comput. Sci."},{"key":"B3","doi-asserted-by":"publisher","first-page":"852","DOI":"10.1038\/s41587-019-0209-9","article-title":"Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2","volume":"37","author":"Bolyen","year":"2019","journal-title":"Nat. Biotechnol."},{"key":"B4","doi-asserted-by":"publisher","first-page":"1581","DOI":"10.1016\/j.cell.2018.05.015","article-title":"Next-Generation machine learning for biological networks","volume":"173","author":"Camacho","year":"2018","journal-title":"Cell"},{"key":"B5","doi-asserted-by":"publisher","first-page":"226","DOI":"10.1186\/s40168-018-0605-2","article-title":"Simple statistical identification and removal of contaminant sequences in marker-gene and metagenomics data","volume":"6","author":"Davis","year":"2018","journal-title":"Microbiome"},{"key":"B6","doi-asserted-by":"publisher","first-page":"105","DOI":"10.1016\/j.tim.2018.11.003","article-title":"Contamination in low microbial biomass microbiome studies: issues and recommendations","volume":"27","author":"Eisenhofer","year":"2019","journal-title":"Trends Microbiol."},{"key":"B7","article-title":"irr: various coefficients of interrater reliability and agreement","author":"Gamer","year":"2019"},{"key":"B8","doi-asserted-by":"publisher","first-page":"269","DOI":"10.1186\/s12915-023-01737-5","article-title":"Benchmarking MicrobIEM \u2013 a user-friendly tool for decontamination of microbiome sequencing data","volume":"21","author":"H\u00fclp\u00fcsch","year":"2023","journal-title":"BMC Biol."},{"key":"B9","doi-asserted-by":"publisher","first-page":"1293","DOI":"10.1038\/s12276-024-01243-w","article-title":"Big data and deep learning for RNA biology","volume":"56","author":"Hyeonseo Hwang","year":"2024","journal-title":"Exp. and Mol. Med."},{"key":"B10","doi-asserted-by":"publisher","first-page":"1176","DOI":"10.1093\/bioinformatics\/btab754","article-title":"Mian: interactive web-based microbiome data table visualization and machine learning platform","volume":"38","author":"Jin","year":"2022","journal-title":"Bioinformatics"},{"key":"B11","doi-asserted-by":"publisher","first-page":"241","DOI":"10.1186\/s12934-022-01973-4","article-title":"Machine learning for data integration in human gut microbiome","volume":"21","author":"Li","year":"2022","journal-title":"Microb. Cell Fact."},{"key":"B12","doi-asserted-by":"publisher","first-page":"3514","DOI":"10.1038\/s41467-020-17041-7","article-title":"Analysis of compositions of microbiomes with bias correction","volume":"11","author":"Lin","year":"2020","journal-title":"Nat. Commun."},{"key":"B13","doi-asserted-by":"publisher","first-page":"1480","DOI":"10.1007\/s12094-023-03373-5","article-title":"Changes in the fecal microbiota of breast cancer patients based on 16S rRNA gene sequencing: a systematic review and meta-analysis","volume":"26","author":"Luan","year":"2024","journal-title":"Clin. and Transl. Oncol."},{"key":"B14","doi-asserted-by":"publisher","first-page":"14","DOI":"10.1002\/edn3.11","article-title":"microDecon: a highly accurate read-subtraction tool for the post-sequencing removal of contamination in metabarcoding studies","volume":"1","author":"McKnight","year":"2019","journal-title":"Environ. DNA"},{"key":"B15","doi-asserted-by":"publisher","first-page":"435","DOI":"10.1007\/s12223-018-00670-3","article-title":"Melanoma-related changes in skin microbiome","volume":"64","author":"Mr\u00e1zek","year":"2019","journal-title":"Folia Microbiol."},{"key":"B16","doi-asserted-by":"publisher","first-page":"206","DOI":"10.1007\/s12275-020-0066-8","article-title":"Machine learning methods for microbiome studies","volume":"58","author":"Namkung","year":"2020","journal-title":"J. Microbiol."},{"key":"B17","doi-asserted-by":"publisher","first-page":"giad017","DOI":"10.1093\/gigascience\/giad017","article-title":"Contamination detection and microbiome exploration with GRIMER","volume":"12","author":"Piro","year":"2023","journal-title":"GigaScience"},{"key":"B18","doi-asserted-by":"publisher","first-page":"1516667","DOI":"10.3389\/fmicb.2024.1516667","article-title":"Deep learning in microbiome analysis: a comprehensive review of neural network models","volume":"15","author":"Przymus","year":"2024","journal-title":"Front. Microbiol."},{"key":"B19","doi-asserted-by":"publisher","first-page":"D590","DOI":"10.1093\/nar\/gks1219","article-title":"The SILVA ribosomal RNA gene database project: improved data processing and web-based tools","volume":"41","author":"Quast","year":"2013","journal-title":"Nucleic Acids Res."},{"key":"B20","doi-asserted-by":"publisher","first-page":"D753","DOI":"10.1093\/nar\/gkac1080","article-title":"MGnify: the microbiome sequence data analysis resource in 2023","volume":"51","author":"Richardson","year":"2022","journal-title":"Nucleic Acids Res."},{"key":"B21","doi-asserted-by":"publisher","first-page":"795","DOI":"10.1016\/j.cell.2019.07.008","article-title":"Tumor microbiome diversity and composition influence pancreatic cancer outcomes","volume":"178","author":"Riquelme","year":"2019","journal-title":"Cell"},{"key":"B22","doi-asserted-by":"publisher","first-page":"87","DOI":"10.1186\/s12915-014-0087-z","article-title":"Reagent and laboratory contamination can critically impact sequence-based microbiome analyses","volume":"12","author":"Salter","year":"2014","journal-title":"BMC Biol."},{"key":"B23","doi-asserted-by":"publisher","first-page":"615","DOI":"10.1093\/biostatistics\/kxy020","article-title":"PERFect: PERmutation Filtering test for microbiome data","volume":"20","author":"Smirnova","year":"2019","journal-title":"Biostatistics"},{"key":"B24","doi-asserted-by":"publisher","first-page":"e0031423","DOI":"10.1128\/spectrum.00314-23","article-title":"Characterization of lung and oral microbiomes in lung cancer patients using culturomics and 16S rRNA gene sequencing","volume":"11","author":"Sun","year":"2023","journal-title":"Microbiol. Spectr."},{"key":"B25","doi-asserted-by":"publisher","first-page":"5039","DOI":"10.1128\/AEM.01235-16","article-title":"The microbiota of breast tissue and its association with breast cancer","volume":"82","author":"Urbaniak","year":"2016","journal-title":"Appl. Environ. Microbiol."},{"key":"B26","doi-asserted-by":"publisher","first-page":"637","DOI":"10.1039\/D4SC06864E","article-title":"3DSMILES-GPT: 3D molecular pocket-based generation with token-only large language model","volume":"16","author":"Wang","year":"2025","journal-title":"Chem. Sci."},{"key":"B27","doi-asserted-by":"publisher","first-page":"814520","DOI":"10.3389\/fendo.2022.814520","article-title":"Blood bacterial 16S rRNA gene alterations in women with polycystic ovary syndrome","volume":"13","author":"Wang","year":"2022","journal-title":"Front. Endocrinol."},{"key":"B28","doi-asserted-by":"publisher","first-page":"328","DOI":"10.1093\/bib\/5.4.328","article-title":"Biological applications of support vector machines","volume":"5","author":"Yang","year":"2004","journal-title":"Brief. Bioinform"},{"key":"B29","doi-asserted-by":"publisher","first-page":"13727","DOI":"10.1039\/D4SC03744H","article-title":"Unlocking comprehensive molecular design across all scenarios with large language model and unordered chemical language","volume":"15","author":"Yue","year":"2024","journal-title":"Chem. Sci."},{"key":"B30","doi-asserted-by":"publisher","first-page":"bbac384","DOI":"10.1093\/bib\/bbac384","article-title":"A geometric deep learning framework for drug repositioning over heterogeneous information networks","volume":"23","author":"Zhao","year":"2022","journal-title":"Briefings Bioinforma."},{"key":"B31","doi-asserted-by":"publisher","first-page":"2924","DOI":"10.1016\/j.csbj.2024.06.032","article-title":"A heterogeneous information network learning model with neighborhood-level structural representation for predicting lncRNA-miRNA interactions","volume":"23","author":"Zhao","year":"2024","journal-title":"Comput. Struct. Biotechnol. J."},{"key":"B32","doi-asserted-by":"publisher","first-page":"121360","DOI":"10.1016\/j.ins.2024.121360","article-title":"Regulation-aware graph learning for drug repositioning over heterogeneous biological network","volume":"686","author":"Zhao","year":"2025","journal-title":"Inf. Sci."},{"key":"B33","doi-asserted-by":"publisher","first-page":"187","DOI":"10.1186\/s13059-021-02401-3","article-title":"Detection of cell-free microbial DNA using a contaminant-controlled analysis framework","volume":"22","author":"Zozaya-Valdes","year":"2021","journal-title":"Gene Biol."}],"container-title":["Frontiers in Bioinformatics"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fbinf.2025.1556361\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,5,8]],"date-time":"2025-05-08T04:13:20Z","timestamp":1746677600000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fbinf.2025.1556361\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,5,8]]},"references-count":33,"alternative-id":["10.3389\/fbinf.2025.1556361"],"URL":"https:\/\/doi.org\/10.3389\/fbinf.2025.1556361","relation":{},"ISSN":["2673-7647"],"issn-type":[{"value":"2673-7647","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,5,8]]},"article-number":"1556361"}}