{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,15]],"date-time":"2026-04-15T01:51:51Z","timestamp":1776217911851,"version":"3.50.1"},"reference-count":44,"publisher":"Oxford University Press (OUP)","issue":"15","license":[{"start":{"date-parts":[[2022,6,20]],"date-time":"2022-06-20T00:00:00Z","timestamp":1655683200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"name":"National Institutes of Health awards","award":["R01GM116065"],"award-info":[{"award-number":["R01GM116065"]}]},{"name":"National Institutes of Health awards","award":["R01GM141074"],"award-info":[{"award-number":["R01GM141074"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,8,2]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>PERMANOVA is currently the most commonly used method for testing community-level hypotheses about microbiome associations with covariates of interest. PERMANOVA can test for associations that result from changes in which taxa are present or absent by using the Jaccard or unweighted UniFrac distance. However, such presence\u2013absence analyses face a unique challenge: confounding by library size (total sample read count), which occurs when library size is associated with covariates in the analysis. It is known that rarefaction (subsampling to a common library size) controls this bias but at the potential costs of information loss and the introduction of a stochastic component into the analysis.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>Here, we develop a non-stochastic approach to PERMANOVA presence\u2013absence analyses that aggregates information over all potential rarefaction replicates without actual resampling, when the Jaccard or unweighted UniFrac distance is used. We compare this new approach to three possible ways of aggregating PERMANOVA over multiple rarefactions obtained from resampling: averaging the distance matrix, averaging the (element-wise) squared distance matrix and averaging the F-statistic. Our simulations indicate that our non-stochastic approach is robust to confounding by library size and outperforms each of the stochastic resampling approaches. We also show that, when overdispersion is low, averaging the (element-wise) squared distance outperforms averaging the unsquared distance, currently implemented in the R package vegan. We illustrate our methods using an analysis of data on inflammatory bowel disease in which samples from case participants have systematically smaller library sizes than samples from control participants.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>We have implemented all the approaches described above, including the function for calculating the analytical average of the squared or unsquared distance matrix, in our R package LDM, which is available on GitHub at https:\/\/github.com\/yijuanhu\/LDM.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btac399","type":"journal-article","created":{"date-parts":[[2022,6,20]],"date-time":"2022-06-20T09:41:25Z","timestamp":1655718085000},"page":"3689-3697","source":"Crossref","is-referenced-by-count":18,"title":["A rarefaction-without-resampling extension of PERMANOVA for testing presence\u2013absence associations in the microbiome"],"prefix":"10.1093","volume":"38","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2171-9041","authenticated-orcid":false,"given":"Yi-Juan","family":"Hu","sequence":"first","affiliation":[{"name":"Department of Biostatistics and Bioinformatics, Emory University , Atlanta, GA 30322, USA"}]},{"given":"Glen A","family":"Satten","sequence":"additional","affiliation":[{"name":"Department of Gynecology and Obstetrics, Emory University School of Medicine , Atlanta, GA 30322, USA"}]}],"member":"286","published-online":{"date-parts":[[2022,6,20]]},"reference":[{"key":"2023041405343766200_","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s40168-018-0580-7","article-title":"An exploration of prevotella-rich microbiomes in HIV and men who have sex with men","volume":"6","author":"Armstrong","year":"2018","journal-title":"Microbiome"},{"key":"2023041405343766200_","doi-asserted-by":"crossref","first-page":"852","DOI":"10.1038\/s41587-019-0209-9","article-title":"Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2","volume":"37","author":"Bolyen","year":"2019","journal-title":"Nat. Biotechnol"},{"key":"2023041405343766200_","first-page":"1904.08937","author":"Brill","year":"2019"},{"key":"2023041405343766200_","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41598-021-01636-1","article-title":"Enhancing diversity analysis by repeatedly rarefying next generation sequencing data describing microbial communities","volume":"11","author":"Cameron","year":"2021","journal-title":"Sci. Rep"},{"key":"2023041405343766200_","doi-asserted-by":"crossref","first-page":"e15216","DOI":"10.1371\/journal.pone.0015216","article-title":"Disordered microbial communities in the upper respiratory tract of cigarette smokers","volume":"5","author":"Charlson","year":"2010","journal-title":"PLoS One"},{"key":"2023041405343766200_","doi-asserted-by":"crossref","first-page":"178","DOI":"10.1038\/nature11319","article-title":"Gut microbiota composition correlates with diet and health in the elderly","volume":"488","author":"Claesson","year":"2012","journal-title":"Nature"},{"key":"2023041405343766200_","doi-asserted-by":"crossref","first-page":"1359","DOI":"10.1038\/s41396-020-0613-7","article-title":"A phylogenetic model for the recruitment of species into microbial communities and application to studies of the human microbiome","volume":"14","author":"Darcy","year":"2020","journal-title":"ISME J"},{"key":"2023041405343766200_","doi-asserted-by":"crossref","first-page":"226","DOI":"10.1186\/s40168-018-0605-2","article-title":"Simple statistical identification and removal of contaminant sequences in marker-gene and metagenomics data","volume":"6","author":"Davis","year":"2018","journal-title":"Microbiome"},{"key":"2023041405343766200_","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s40168-018-0487-3","article-title":"Bacterial biogeography of adult airways in atopic asthma","volume":"6","author":"Durack","year":"2018","journal-title":"Microbiome"},{"key":"2023041405343766200_","doi-asserted-by":"crossref","first-page":"105","DOI":"10.1016\/j.tim.2018.11.003","article-title":"Contamination in low microbial biomass microbiome studies: issues and recommendations","volume":"27","author":"Eisenhofer","year":"2019","journal-title":"Trends Microbiol"},{"key":"2023041405343766200_","article-title":"The gut microbiome in autism: study-site effects and longitudinal analysis of behavior change","volume":"6, e00848\u201320","author":"Fouquier","year":"2021","journal-title":"Msystems"},{"key":"2023041405343766200_","doi-asserted-by":"crossref","first-page":"292","DOI":"10.1080\/07350015.1983.10509354","article-title":"A nonstochastic interpretation of reported significance levels","volume":"1","author":"Freedman","year":"1983","journal-title":"J. Bus. Econ. Stat"},{"key":"2023041405343766200_","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s40168-018-0502-8","article-title":"Increased richness and diversity of the vaginal microbiota and spontaneous preterm birth","volume":"6","author":"Freitas","year":"2018","journal-title":"Microbiome"},{"key":"2023041405343766200_","doi-asserted-by":"crossref","first-page":"382","DOI":"10.1016\/j.chom.2014.02.005","article-title":"The treatment-naive microbiome in new-onset Crohn\u2019s disease","volume":"15","author":"Gevers","year":"2014","journal-title":"Cell Host Microbe"},{"key":"2023041405343766200_","first-page":"283283","author":"Glassman","year":"2018"},{"key":"2023041405343766200_","doi-asserted-by":"crossref","first-page":"11994","DOI":"10.1073\/pnas.1811269115","article-title":"Decomposition responses to climate depend on microbial community composition","volume":"115","author":"Glassman","year":"2018","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023041405343766200_","doi-asserted-by":"crossref","first-page":"210","DOI":"10.1093\/bib\/bbx104","article-title":"A broken promise: microbiome differential abundance methods do not control the false discovery rate","volume":"20","author":"Hawinkel","year":"2019","journal-title":"Brief. Bioinform"},{"key":"2023041405343766200_","article-title":"Issues and current standards of controls in microbiome research","volume":"95, fiz045","author":"Hornung","year":"2019","journal-title":"FEMS Microbiol. Ecol"},{"key":"2023041405343766200_","doi-asserted-by":"crossref","first-page":"4106","DOI":"10.1093\/bioinformatics\/btaa260","article-title":"Testing hypotheses about the microbiome using the linear decomposition model (LDM)","volume":"36","author":"Hu","year":"2020","journal-title":"Bioinformatics"},{"key":"2023041405343766200_","doi-asserted-by":"crossref","first-page":"1652","DOI":"10.1093\/bioinformatics\/btab012","article-title":"A rarefaction-based extension of the LDM for testing presence\u2013absence associations in the microbiome","volume":"37","author":"Hu","year":"2021","journal-title":"Bioinformatics"},{"key":"2023041405343766200_","doi-asserted-by":"crossref","first-page":"292","DOI":"10.1016\/S0076-6879(05)97017-1","article-title":"The application of rarefaction techniques to molecular inventories of microbial diversity","volume":"397","author":"Hughes","year":"2005","journal-title":"Methods Enzymol"},{"key":"2023041405343766200_","doi-asserted-by":"crossref","first-page":"19","DOI":"10.1186\/s40168-015-0083-8","article-title":"Deriving accurate microbiota profiles from human samples with low bacterial content through post-sequencing processing of illumina MiSeq data","volume":"3","author":"Jervis-Bardy","year":"2015","journal-title":"Microbiome"},{"key":"2023041405343766200_","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511525575","volume-title":"Data Analysis in Community and Landscape Ecology","author":"Jongman","year":"1995"},{"key":"2023041405343766200_","article-title":"Controlling for contaminants in low-biomass 16s rRNA gene sequencing experiments","volume":"4, e00290\u201319","author":"Karstens","year":"2019","journal-title":"mSystems"},{"key":"2023041405343766200_","volume-title":"Applied Regression Analysis and Other Multivariable Methods","author":"Kleinbaum","year":"2007"},{"key":"2023041405343766200_","doi-asserted-by":"crossref","first-page":"2859","DOI":"10.1002\/sim.8940","article-title":"Meta-analysis methods for multiple related markers: applications to microbiome studies with the results on multiple \u03b1-diversity indices","volume":"40","author":"Koh","year":"2021","journal-title":"Stat. Med"},{"key":"2023041405343766200_","doi-asserted-by":"crossref","first-page":"8228","DOI":"10.1128\/AEM.71.12.8228-8235.2005","article-title":"UniFrac: a new phylogenetic method for comparing microbial communities","volume":"71","author":"Lozupone","year":"2005","journal-title":"Appl. Environ. Microbiol"},{"key":"2023041405343766200_","doi-asserted-by":"crossref","first-page":"1576","DOI":"10.1128\/AEM.01996-06","article-title":"Quantitative and qualitative \u03b2 diversity measures lead to different insights into factors that structure microbial communities","volume":"73","author":"Lozupone","year":"2007","journal-title":"Appl. Environ. Microbiol"},{"key":"2023041405343766200_","doi-asserted-by":"crossref","first-page":"290","DOI":"10.1890\/0012-9658(2001)082[0290:FMMTCD]2.0.CO;2","article-title":"Fitting multivariate models to community data: a comment on distance-based redundancy analysis","volume":"82","author":"McArdle","year":"2001","journal-title":"Ecology"},{"key":"2023041405343766200_","doi-asserted-by":"crossref","first-page":"e1003531","DOI":"10.1371\/journal.pcbi.1003531","article-title":"Waste not, want not: why rarefying microbiome data is inadmissible","volume":"10","author":"McMurdie","year":"2014","journal-title":"PLoS Comput. Biol"},{"key":"2023041405343766200_","doi-asserted-by":"crossref","DOI":"10.1128\/mSystems.00186-19","article-title":"Quantifying and understanding well-to-well contamination in microbiome research","volume":"4","author":"Minich","year":"2019","journal-title":"MSystems"},{"key":"2023041405343766200_","volume-title":"Regression and ANOVA: An Integrated Approach Using SAS Software","author":"Muller","year":"2012"},{"key":"2023041405343766200_","first-page":"371","author":"Navas-Molina","year":"2013"},{"key":"2023041405343766200_","doi-asserted-by":"crossref","first-page":"669776","DOI":"10.3389\/fmicb.2021.669776","article-title":"An economical and flexible dual barcoding, two-step PCR approach for highly multiplexed amplicon sequencing","volume":"12","author":"Pjevac","year":"2021","journal-title":"Front. Microbiol"},{"key":"2023041405343766200_","doi-asserted-by":"crossref","first-page":"3567","DOI":"10.1093\/bioinformatics\/btz120","article-title":"Pldist: ecological dissimilarities for paired and longitudinal microbiome association analysis","volume":"35","author":"Plantinga","year":"2019","journal-title":"Bioinformatics"},{"key":"2023041405343766200_","doi-asserted-by":"crossref","first-page":"210","DOI":"10.1038\/nature25973","article-title":"Environment dominates over host genetics in shaping human gut microbiota","volume":"555","author":"Rothschild","year":"2018","journal-title":"Nature"},{"key":"2023041405343766200_","doi-asserted-by":"crossref","first-page":"583","DOI":"10.1038\/s41586-018-0617-x","article-title":"Temporal development of the gut microbiome in early childhood from the teddy study","volume":"562","author":"Stewart","year":"2018","journal-title":"Nature"},{"key":"2023041405343766200_","doi-asserted-by":"crossref","first-page":"2618","DOI":"10.1093\/bioinformatics\/btw311","article-title":"PERMANOVA-S: association test for microbial community composition that accommodates confounders and multiple distances","volume":"32","author":"Tang","year":"2016","journal-title":"Bioinformatics"},{"key":"2023041405343766200_","doi-asserted-by":"crossref","first-page":"62","DOI":"10.1186\/s40168-016-0208-8","article-title":"Large-scale benchmarking reveals false discoveries and count transformation sensitivity in 16S rRNA gene amplicon data analysis methods used in microbiome studies","volume":"4","author":"Thorsen","year":"2016","journal-title":"Microbiome"},{"key":"2023041405343766200_","doi-asserted-by":"crossref","first-page":"27","DOI":"10.1186\/s40168-017-0237-y","article-title":"Normalization and microbial differential abundance strategies depend upon data characteristics","volume":"5","author":"Weiss","year":"2017","journal-title":"Microbiome"},{"key":"2023041405343766200_","doi-asserted-by":"crossref","first-page":"1875","DOI":"10.1093\/bioinformatics\/bty014","article-title":"A distance-based approach for testing the mediation effect of the human microbiome","volume":"34","author":"Zhang","year":"2018","journal-title":"Bioinformatics"},{"key":"2023041405343766200_","doi-asserted-by":"crossref","first-page":"797","DOI":"10.1016\/j.ajhg.2015.04.003","article-title":"Testing in microbiome-profiling studies with MiRKAT, the microbiome regression-based kernel association test","volume":"96","author":"Zhao","year":"2015","journal-title":"Am. J. Hum. Genet"},{"key":"2023041405343766200_","doi-asserted-by":"crossref","first-page":"2915","DOI":"10.1093\/bioinformatics\/btac181","article-title":"Integrative analysis of relative abundance data and presence-absence data of the microbiome using the LDM","volume":"38","author":"Zhu","year":"2022","journal-title":"Bioinformatics"},{"key":"2023041405343766200_","doi-asserted-by":"crossref","first-page":"62","DOI":"10.1186\/s40168-019-0678-6","article-title":"Towards precision quantification of contamination in metagenomic sequencing experiments","volume":"7","author":"Zinter","year":"2019","journal-title":"Microbiome"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btac399\/44329256\/btac399.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/15\/3689\/49883819\/btac399.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/15\/3689\/49883819\/btac399.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,9,27]],"date-time":"2024-09-27T03:11:09Z","timestamp":1727406669000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/38\/15\/3689\/6611716"}},"subtitle":[],"editor":[{"given":"Can","family":"Alkan","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2022,6,20]]},"references-count":44,"journal-issue":{"issue":"15","published-print":{"date-parts":[[2022,8,2]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btac399","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2021.04.06.438671","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2022,8,1]]},"published":{"date-parts":[[2022,6,20]]}}}