{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,8,4]],"date-time":"2024-08-04T12:37:23Z","timestamp":1722775043663},"reference-count":32,"publisher":"Oxford University Press (OUP)","issue":"10","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2015,5,15]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Motivation: Establishment of a statistical association between microbiome features and clinical outcomes is of growing interest because of the potential for yielding insights into biological mechanisms and pathogenesis. Extracting microbiome features that are relevant for a disease is challenging and existing variable selection methods are limited due to large number of risk factor variables from microbiome sequence data and their complex biological structure.<\/jats:p><jats:p>Results: We propose a tree-based scanning method, Selection of Models for the Analysis of Risk factor Trees (referred to as SMART-scan), for identifying taxonomic groups that are associated with a disease or trait. SMART-scan is a model selection technique that uses a predefined taxonomy to organize the large pool of possible predictors into optimized groups, and hierarchically searches and determines variable groups for association test. We investigate the statistical properties of SMART-scan through simulations, in comparison to a regular single-variable analysis and three commonly-used variable selection methods, stepwise regression, least absolute shrinkage and selection operator (LASSO) and classification and regression tree (CART). When there are taxonomic group effects in the data, SMART-scan can significantly increase power by using bacterial taxonomic information to split large numbers of variables into groups. Through an application to microbiome data from a vervet monkey diet experiment, we demonstrate that SMART-scan can identify important phenotype-associated taxonomic features missed by single-variable analysis, stepwise regression, LASSO and CART.<\/jats:p><jats:p>Availability and implementation: The SMART-scan approach is implemented in R and is available at https:\/\/dsgweb.wustl.edu\/qunyuan\/software\/smartscan\/<\/jats:p><jats:p>Contact: qunyuan@wustl.edu<\/jats:p><jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btu855","type":"journal-article","created":{"date-parts":[[2015,1,8]],"date-time":"2015-01-08T02:10:15Z","timestamp":1420683015000},"page":"1607-1613","source":"Crossref","is-referenced-by-count":12,"title":["Selection of models for the analysis of risk-factor trees: leveraging biological knowledge to mine large sets of risk factors with application to microbiome data"],"prefix":"10.1093","volume":"31","author":[{"given":"Qunyuan","family":"Zhang","sequence":"first","affiliation":[{"name":"1 Division of Statistical Genomics, Washington University School of Medicine, St. Louis, MO, USA, 2Department of Biology, Washington University, St. Louis, MO, USA, 3The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA and 4 Department of Pediatrics, Medical College of Wisconsin, Milwaukee, WI, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Haley","family":"Abel","sequence":"additional","affiliation":[{"name":"1 Division of Statistical Genomics, Washington University School of Medicine, St. Louis, MO, USA, 2Department of Biology, Washington University, St. Louis, MO, USA, 3The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA and 4 Department of Pediatrics, Medical College of Wisconsin, Milwaukee, WI, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Alan","family":"Wells","sequence":"additional","affiliation":[{"name":"1 Division of Statistical Genomics, Washington University School of Medicine, St. Louis, MO, USA, 2Department of Biology, Washington University, St. Louis, MO, USA, 3The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA and 4 Department of Pediatrics, Medical College of Wisconsin, Milwaukee, WI, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Petra","family":"Lenzini","sequence":"additional","affiliation":[{"name":"1 Division of Statistical Genomics, Washington University School of Medicine, St. Louis, MO, USA, 2Department of Biology, Washington University, St. Louis, MO, USA, 3The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA and 4 Department of Pediatrics, Medical College of Wisconsin, Milwaukee, WI, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Felicia","family":"Gomez","sequence":"additional","affiliation":[{"name":"1 Division of Statistical Genomics, Washington University School of Medicine, St. Louis, MO, USA, 2Department of Biology, Washington University, St. Louis, MO, USA, 3The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA and 4 Department of Pediatrics, Medical College of Wisconsin, Milwaukee, WI, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Michael A.","family":"Province","sequence":"additional","affiliation":[{"name":"1 Division of Statistical Genomics, Washington University School of Medicine, St. Louis, MO, USA, 2Department of Biology, Washington University, St. Louis, MO, USA, 3The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA and 4 Department of Pediatrics, Medical College of Wisconsin, Milwaukee, WI, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Alan A.","family":"Templeton","sequence":"additional","affiliation":[{"name":"1 Division of Statistical Genomics, Washington University School of Medicine, St. Louis, MO, USA, 2Department of Biology, Washington University, St. Louis, MO, USA, 3The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA and 4 Department of Pediatrics, Medical College of Wisconsin, Milwaukee, WI, USA"},{"name":"1 Division of Statistical Genomics, Washington University School of Medicine, St. Louis, MO, USA, 2Department of Biology, Washington University, St. Louis, MO, USA, 3The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA and 4 Department of Pediatrics, Medical College of Wisconsin, Milwaukee, WI, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"George M.","family":"Weinstock","sequence":"additional","affiliation":[{"name":"1 Division of Statistical Genomics, Washington University School of Medicine, St. Louis, MO, USA, 2Department of Biology, Washington University, St. Louis, MO, USA, 3The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA and 4 Department of Pediatrics, Medical College of Wisconsin, Milwaukee, WI, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Nita H.","family":"Salzman","sequence":"additional","affiliation":[{"name":"1 Division of Statistical Genomics, Washington University School of Medicine, St. Louis, MO, USA, 2Department of Biology, Washington University, St. Louis, MO, USA, 3The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA and 4 Department of Pediatrics, Medical College of Wisconsin, Milwaukee, WI, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ingrid B.","family":"Borecki","sequence":"additional","affiliation":[{"name":"1 Division of Statistical Genomics, Washington University School of Medicine, St. Louis, MO, USA, 2Department of Biology, Washington University, St. Louis, MO, USA, 3The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA and 4 Department of Pediatrics, Medical College of Wisconsin, Milwaukee, WI, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2015,1,6]]},"reference":[{"key":"2023020115462945500_btu855-B1","doi-asserted-by":"crossref","first-page":"1907","DOI":"10.1093\/jnci\/djt300","article-title":"Human gut microbiome and risk for colorectal cancer","volume":"105","author":"Ahn","year":"2013","journal-title":"J. Natl. Cancer Inst."},{"key":"2023020115462945500_btu855-B2","first-page":"32","article-title":"A new method for non-parametric multivariate analysis of variance","volume":"26","author":"Anderson","year":"2001","journal-title":"Austral Ecol."},{"key":"2023020115462945500_btu855-B3","doi-asserted-by":"crossref","first-page":"266","DOI":"10.1093\/bioinformatics\/btp636","article-title":"PyNAST: a flexible tool for aligning sequences to a template alignment","volume":"26","author":"Caporaso","year":"2010","journal-title":"Bioinformatics."},{"key":"2023020115462945500_btu855-B4","doi-asserted-by":"crossref","first-page":"207","DOI":"10.1186\/s12862-014-0207-y","article-title":"Functional phylogenomics analysis of bacteria and archaea using consistent genome annotation with UniFam","volume":"14","author":"Chai","year":"2014","journal-title":"BMC Evol. Biol."},{"key":"2023020115462945500_btu855-B5","doi-asserted-by":"crossref","first-page":"2106","DOI":"10.1093\/bioinformatics\/bts342","article-title":"Associating microbiome composition with environmental covariates using generalized UniFrac distances","volume":"28","author":"Chen","year":"2012","journal-title":"Bioinformatics"},{"key":"2023020115462945500_btu855-B6","doi-asserted-by":"crossref","first-page":"5069","DOI":"10.1128\/AEM.03006-05","article-title":"Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB","volume":"72","author":"DeSantis","year":"2006","journal-title":"Appl. Environ. Microbiol."},{"key":"2023020115462945500_btu855-B7","doi-asserted-by":"crossref","first-page":"131","DOI":"10.1186\/1471-230X-13-131","article-title":"Association of gut microbiota with post-operative clinical course in Crohn's disease","volume":"13","author":"Dey","year":"2013","journal-title":"BMC Gastroenterol."},{"key":"2023020115462945500_btu855-B8","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1086\/284325","article-title":"Phylogenies and the comparative method","volume":"125","author":"Felsenstein","year":"1985","journal-title":"Am. Nat."},{"key":"2023020115462945500_btu855-B9","volume-title":"PHYLIP (Phylogeny Inference Package) version 3.6. Distributed by the author. Department of Genome Sciences","author":"Felsenstein","year":"2005"},{"key":"2023020115462945500_btu855-B10","doi-asserted-by":"crossref","first-page":"713","DOI":"10.1086\/587525","article-title":"Comparative methods with sampling error and within-species variation: contrasts revisited and revised","volume":"171","author":"Felsenstein","year":"2008","journal-title":"Am. Nat."},{"key":"2023020115462945500_btu855-B11","doi-asserted-by":"crossref","first-page":"831","DOI":"10.1093\/bioinformatics\/btt608","article-title":"Identification of important regressor groups, subgroups and individuals via regularization methods: application to gut microbiome data","volume":"30","author":"Garcia","year":"2013","journal-title":"Bioinformatics"},{"key":"2023020115462945500_btu855-B12","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1111\/j.1574-6941.2003.tb01040.x","article-title":"Using ecological diversity measures with bacterial communities","volume":"43","author":"Hill","year":"2003","journal-title":"FEMS Microbiol. Ecol."},{"key":"2023020115462945500_btu855-B13","doi-asserted-by":"crossref","first-page":"e30126","DOI":"10.1371\/journal.pone.0030126","article-title":"Dirichlet multinomial mixtures: generative models for microbial metagenomics","volume":"7","author":"Holmes","year":"2012","journal-title":"PloS one"},{"key":"2023020115462945500_btu855-B14","doi-asserted-by":"crossref","first-page":"215","DOI":"10.1038\/nature11209","article-title":"A framework for human microbiome research","volume":"486","author":"Human Microbiome Project","year":"2012","journal-title":"Nature"},{"key":"2023020115462945500_btu855-B15","doi-asserted-by":"crossref","first-page":"207","DOI":"10.1038\/nature11234","article-title":"Structure, function and diversity of the healthy human microbiome","volume":"486","author":"Human Microbiome Project","year":"2012","journal-title":"Nature"},{"key":"2023020115462945500_btu855-B16","doi-asserted-by":"crossref","first-page":"122","DOI":"10.1093\/ilar\/ilt049","article-title":"Systems biology of the vervet monkey","volume":"54","author":"Jasinska","year":"2013","journal-title":"ILAR J."},{"key":"2023020115462945500_btu855-B17","first-page":"448","article-title":"Effects of a Western-type diet on plasma lipids and other cardiometabolic risk factors in African green monkeys (Chlorocebus aethiops sabaeus)","volume":"52","author":"Jorgensen","year":"2013","journal-title":"J. Am. Assoc. Lab. Anim. Sci."},{"key":"2023020115462945500_btu855-B18","doi-asserted-by":"crossref","first-page":"e52078","DOI":"10.1371\/journal.pone.0052078","article-title":"Hypothesis testing and power calculations for taxonomic-based human microbiome data","volume":"7","author":"La Rosa","year":"2012","journal-title":"PloS one"},{"key":"2023020115462945500_btu855-B19","doi-asserted-by":"crossref","first-page":"8228","DOI":"10.1128\/AEM.71.12.8228-8235.2005","article-title":"UniFrac: a new phylogenetic method for comparing microbial communities","volume":"71","author":"Lozupone","year":"2005","journal-title":"Appl. Environ. Microbiol."},{"key":"2023020115462945500_btu855-B20","doi-asserted-by":"crossref","first-page":"1363","DOI":"10.1093\/nar\/gkh293","article-title":"ARB: a software environment for sequence data","volume":"32","author":"Ludwig","year":"2004","journal-title":"Nucleic Acids Res."},{"key":"2023020115462945500_btu855-B21","first-page":"209","article-title":"The detection of disease clustering and a generalized regression approach","volume":"27","author":"Mantel","year":"1967","journal-title":"Cancer Res."},{"key":"2023020115462945500_btu855-B22","doi-asserted-by":"crossref","first-page":"7188","DOI":"10.1093\/nar\/gkm864","article-title":"SILVA: a comprehensive online resource for quality checked and aligned ribosomal RNA sequence data compatible with ARB","volume":"35","author":"Pruesse","year":"2007","journal-title":"Nucleic Acids Res."},{"key":"2023020115462945500_btu855-B23","doi-asserted-by":"crossref","first-page":"7537","DOI":"10.1128\/AEM.01541-09","article-title":"Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities","volume":"75","author":"Schloss","year":"2009","journal-title":"Appl. Environ. Microbiol."},{"key":"2023020115462945500_btu855-B24","doi-asserted-by":"crossref","first-page":"R42","DOI":"10.1186\/gb-2012-13-6-r42","article-title":"Composition of the adult digestive tract bacterial microbiome based on seven mouth surfaces, tonsils, throat and stool samples","volume":"13","author":"Segata","year":"2012","journal-title":"Genome Biol."},{"key":"2023020115462945500_btu855-B25","doi-asserted-by":"crossref","first-page":"e205","DOI":"10.1093\/nar\/gkq872","article-title":"Advanced computational algorithms for microbial community analysis using massive 16S rRNA sequence data","volume":"38","author":"Sun","year":"2010","journal-title":"Nucleic Acids Res."},{"key":"2023020115462945500_btu855-B26","doi-asserted-by":"crossref","first-page":"659","DOI":"10.1093\/genetics\/134.2.659","article-title":"A cladistic analysis of phenotypic associations with haplotypes inferred from restriction endonuclease mapping. IV. Nested analyses with cladogram uncertainty and recombination","volume":"134","author":"Templeton","year":"1993","journal-title":"Genetics"},{"key":"2023020115462945500_btu855-B27","doi-asserted-by":"crossref","first-page":"441","DOI":"10.1534\/genetics.104.030080","article-title":"Tree scanning: a method for using haplotype trees in phenotype\/genotype association studies","volume":"169","author":"Templeton","year":"2005","journal-title":"Genetics"},{"key":"2023020115462945500_btu855-B28","doi-asserted-by":"crossref","first-page":"267","DOI":"10.1111\/j.2517-6161.1996.tb02080.x","article-title":"Regression shrinkage and selection via the Lasso","volume":"58","author":"Tibshirani","year":"1996","journal-title":"J. Roy. Stat. Soc."},{"key":"2023020115462945500_btu855-B29","doi-asserted-by":"crossref","first-page":"491","DOI":"10.1002\/ajp.22125","article-title":"Significant genotype by diet (G x D) interaction effects on cardiometabolic responses to a pedigree-wide, dietary challenge in vervet monkeys (Chlorocebus aethiops sabaeus)","volume":"75","author":"Voruganti","year":"2013","journal-title":"Am. J. Primatol."},{"key":"2023020115462945500_btu855-B30","doi-asserted-by":"crossref","first-page":"5261","DOI":"10.1128\/AEM.00062-07","article-title":"Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy","volume":"73","author":"Wang","year":"2007","journal-title":"Appl. Environ. Microbiol."},{"key":"2023020115462945500_btu855-B31","doi-asserted-by":"crossref","first-page":"e1000352","DOI":"10.1371\/journal.pcbi.1000352","article-title":"Statistical methods for detecting differentially abundant features in clinical metagenomic samples","volume":"5","author":"White","year":"2009","journal-title":"PLoS Comput. Biol."},{"key":"2023020115462945500_btu855-B32","doi-asserted-by":"crossref","first-page":"e1000667","DOI":"10.1371\/journal.pcbi.1000667","article-title":"A primer on metagenomics","volume":"6","author":"Wooley","year":"2010","journal-title":"PLoS Comput. Biol."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/31\/10\/1607\/49013416\/bioinformatics_31_10_1607.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/31\/10\/1607\/49013416\/bioinformatics_31_10_1607.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,6,6]],"date-time":"2024-06-06T12:23:09Z","timestamp":1717676589000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/31\/10\/1607\/176542"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2015,1,6]]},"references-count":32,"journal-issue":{"issue":"10","published-print":{"date-parts":[[2015,5,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btu855","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2015,5,15]]},"published":{"date-parts":[[2015,1,6]]}}}