{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,31]],"date-time":"2025-10-31T07:32:37Z","timestamp":1761895957217},"reference-count":29,"publisher":"Oxford University Press (OUP)","issue":"2","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2015,1,15]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Motivation: With the advance of new sequencing technologies producing massive short reads data, metagenomics is rapidly growing, especially in the fields of environmental biology and medical science. The metagenomic data are not only high dimensional with large number of features and limited number of samples but also complex with a large number of zeros and skewed distribution. Efficient computational and statistical tools are needed to deal with these unique characteristics of metagenomic sequencing data. In metagenomic studies, one main objective is to assess whether and how multiple microbial communities differ under various environmental conditions.<\/jats:p><jats:p>Results: We propose a two-stage statistical procedure for selecting informative features and identifying differentially abundant features between two or more groups of microbial communities. In the functional analysis of metagenomes, the features may refer to the pathways, subsystems, functional roles and so on. In the first stage of the proposed procedure, the informative features are selected using elastic net as reducing the dimension of metagenomic data. In the second stage, the differentially abundant features are detected using generalized linear models with a negative binomial distribution. Compared with other available methods, the proposed approach demonstrates better performance for most of the comprehensive simulation studies. The new method is also applied to two real metagenomic datasets related to human health. 
Our findings are consistent with those in previous reports.<\/jats:p><jats:p>Availability: R code and two example datasets are available at http:\/\/cals.arizona.edu\/\u223canling\/software.htm<\/jats:p><jats:p>Contact: \u00a0anling@email.arizona.edu<\/jats:p><jats:p>Supplementary information: \u00a0Supplementary file is available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btu635","type":"journal-article","created":{"date-parts":[[2014,9,26]],"date-time":"2014-09-26T00:20:55Z","timestamp":1411690855000},"page":"158-165","source":"Crossref","is-referenced-by-count":14,"title":["A two-stage statistical procedure for feature selection and comparison in functional analysis of metagenomes"],"prefix":"10.1093","volume":"31","author":[{"given":"Naruekamol","family":"Pookhao","sequence":"first","affiliation":[{"name":"1 Department of Agricultural & Biosystems Engineering, 2 Interdisciplinary Program in Statistics, University of Arizona, Tucson, AZ, 85721 and 3 Department of Statistics, Northwestern University, Evanston, IL 60208, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Michael B.","family":"Sohn","sequence":"additional","affiliation":[{"name":"1 Department of Agricultural & Biosystems Engineering, 2 Interdisciplinary Program in Statistics, University of Arizona, Tucson, AZ, 85721 and 3 Department of Statistics, Northwestern University, Evanston, IL 60208, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Qike","family":"Li","sequence":"additional","affiliation":[{"name":"1 Department of Agricultural & Biosystems Engineering, 2 Interdisciplinary Program in Statistics, University of Arizona, Tucson, AZ, 85721 and 3 Department of Statistics, Northwestern University, Evanston, IL 60208, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Isaac","family":"Jenkins","sequence":"additional","affiliation":[{"name":"1 Department of Agricultural & Biosystems Engineering, 2 Interdisciplinary 
Program in Statistics, University of Arizona, Tucson, AZ, 85721 and 3 Department of Statistics, Northwestern University, Evanston, IL 60208, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ruofei","family":"Du","sequence":"additional","affiliation":[{"name":"1 Department of Agricultural & Biosystems Engineering, 2 Interdisciplinary Program in Statistics, University of Arizona, Tucson, AZ, 85721 and 3 Department of Statistics, Northwestern University, Evanston, IL 60208, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hongmei","family":"Jiang","sequence":"additional","affiliation":[{"name":"1 Department of Agricultural & Biosystems Engineering, 2 Interdisciplinary Program in Statistics, University of Arizona, Tucson, AZ, 85721 and 3 Department of Statistics, Northwestern University, Evanston, IL 60208, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Lingling","family":"An","sequence":"additional","affiliation":[{"name":"1 Department of Agricultural & Biosystems Engineering, 2 Interdisciplinary Program in Statistics, University of Arizona, Tucson, AZ, 85721 and 3 Department of Statistics, Northwestern University, Evanston, IL 60208, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2014,9,24]]},"reference":[{"key":"2023020116141625300_btu635-B2","doi-asserted-by":"crossref","first-page":"R106","DOI":"10.1186\/gb-2010-11-10-r106","article-title":"Differential expression analysis for sequence count data","volume":"11","author":"Anders","year":"2010","journal-title":"Genome 
Biol."},{"key":"2023020116141625300_btu635-B3","doi-asserted-by":"crossref","first-page":"289","DOI":"10.1111\/j.2517-6161.1995.tb02031.x","article-title":"Controlling the false discovery rate: a practical and powerful approach to multiple testing","volume":"57","author":"Benjamini","year":"1995","journal-title":"J. R. Stat. Soc. B"},{"key":"2023020116141625300_btu635-B4","doi-asserted-by":"crossref","first-page":"244","DOI":"10.1055\/s-0031-1295687","article-title":"Inflammatory bowel disease in the obese patient","volume":"24","author":"Boutros","year":"2011","journal-title":"Clin. Colon Rectal. Surg."},{"key":"2023020116141625300_btu635-B6","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511814365","volume-title":"Regression Analysis of Count Data","author":"Cameron","year":"1998"},{"key":"2023020116141625300_btu635-B8","doi-asserted-by":"crossref","first-page":"2998","DOI":"10.1158\/0008-5472.CAN-12-4402","article-title":"Adipocytes cause leukemia cell resistance to L-Asparaginase via release of glutamine","volume":"73","author":"Ehsanipour","year":"2013","journal-title":"Cancer Res."},{"key":"2023020116141625300_btu635-B11","first-page":"1","article-title":"Regularization paths for generalized linear models via coordinate descent","volume":"33","author":"Friedman","year":"2010","journal-title":"J. Stat. Softw. 
Jan."},{"key":"2023020116141625300_btu635-B12","doi-asserted-by":"crossref","first-page":"777","DOI":"10.1038\/ismej.2010.178","article-title":"The future of microbial metagenomics (or is ignorance bliss?)","volume":"5","author":"Gilbert","year":"2011","journal-title":"ISME J."},{"key":"2023020116141625300_btu635-B13","doi-asserted-by":"crossref","DOI":"10.1007\/978-0-387-84858-7","volume-title":"The Elements of Statistical Learning: Prediction, Inference and Data Mining","author":"Hastie","year":"2009","edition":"2nd edn"},{"key":"2023020116141625300_btu635-B14","doi-asserted-by":"crossref","first-page":"REVIEWS0003","DOI":"10.1186\/gb-2002-3-2-reviews0003","article-title":"Exploring prokaryotic diversity in the genomic era","volume":"3","author":"Hugenholtz","year":"2002","journal-title":"Genome Biol."},{"key":"2023020116141625300_btu635-B17","doi-asserted-by":"crossref","first-page":"S12","DOI":"10.1186\/1471-2105-10-S1-S12","article-title":"Methods for comparative metagenomics","volume":"10","author":"Huson","year":"2009","journal-title":"BMC Bioinformatics"},{"key":"2023020116141625300_btu635-B18","doi-asserted-by":"crossref","first-page":"1552","DOI":"10.1101\/gr.120618.111","article-title":"Integrative analysis of environmental sequences using MEGAN4","volume":"21","author":"Huson","year":"2011","journal-title":"Genome Res."},{"key":"2023020116141625300_btu635-B19","first-page":"1","article-title":"ppGpp: stringent response and survival","volume":"44","author":"Jain","year":"2006","journal-title":"J. 
Microbiol."},{"key":"2023020116141625300_btu635-B20","doi-asserted-by":"crossref","first-page":"2737","DOI":"10.1093\/bioinformatics\/btp508","article-title":"ShotgunFunctionalizeR: an R-package for functional comparison of metagenomes","volume":"25","author":"Kristiansson","year":"2009","journal-title":"Bioinformatics"},{"key":"2023020116141625300_btu635-B21","doi-asserted-by":"crossref","first-page":"557","DOI":"10.1128\/MMBR.00009-08","article-title":"A bioinformatician\u2019s guide to metagenomics","volume":"72","author":"Kunin","year":"2008","journal-title":"Microbiol. Mol. Biol. Rev."},{"key":"2023020116141625300_btu635-B23","doi-asserted-by":"crossref","first-page":"1200","DOI":"10.1038\/nmeth.2658","article-title":"Differential abundance analysis for microbial marker-gene surveys","volume":"10","author":"Paulson","year":"2013","journal-title":"Nat. Methods"},{"key":"2023020116141625300_btu635-B25","doi-asserted-by":"crossref","first-page":"59","DOI":"10.1038\/nature08821","article-title":"A human gut microbial gene catalogue established by metagenomic sequencing","volume":"464","author":"Qin","year":"2010","journal-title":"Nature"},{"key":"2023020116141625300_btu635-B26","doi-asserted-by":"crossref","first-page":"e00956","DOI":"10.1128\/mBio.00956-13","article-title":"Biogeochemical forces shape the composition and physiology of polymicrobial communities in the cystic fibrosis lung","volume":"5","author":"Quinn","year":"2014","journal-title":"mBio"},{"key":"2023020116141625300_btu635-B27","doi-asserted-by":"crossref","first-page":"R95","DOI":"10.1186\/gb-2013-14-9-r95","article-title":"Comprehensive evaluation of differential gene expression analysis methods for RNA-seq data","volume":"14","author":"Rapaport","year":"2013","journal-title":"Genome Biol."},{"key":"2023020116141625300_btu635-B28","doi-asserted-by":"crossref","first-page":"R25","DOI":"10.1186\/gb-2010-11-3-r25","article-title":"A scaling normalization method for differential expression analysis 
of RNA-seq data","volume":"11","author":"Robinson","year":"2010","journal-title":"Genome Biol."},{"key":"2023020116141625300_btu635-B29","doi-asserted-by":"crossref","first-page":"139","DOI":"10.1093\/bioinformatics\/btp616","article-title":"edgeR: a bioconductor package for differential expression analysis of digital gene expression data","volume":"26","author":"Robinson","year":"2010","journal-title":"Bioinformatics"},{"key":"2023020116141625300_btu635-B30","doi-asserted-by":"crossref","first-page":"162","DOI":"10.1186\/1471-2105-7-162","article-title":"An application of statistics to comparative metagenomics","volume":"7","author":"Rodriguez-Brito","year":"2006","journal-title":"BMC Bioinformatics"},{"key":"2023020116141625300_btu635-B31","doi-asserted-by":"crossref","first-page":"4636","DOI":"10.1073\/pnas.0611650104","article-title":"Regulation of the stringent response is the essential function of the conserved bacterial G protein CgtA in Vibrio cholerae","volume":"104","author":"Raskin","year":"2007","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023020116141625300_btu635-B32","doi-asserted-by":"crossref","first-page":"6773","DOI":"10.1128\/AEM.00474-06","article-title":"Introducing SONS, a tool for operational taxonomic unit-based comparisons of microbial community memberships and structures","volume":"72","author":"Schloss","year":"2006","journal-title":"Appl. Environ. 
Microbiol."},{"key":"2023020116141625300_btu635-B33","doi-asserted-by":"crossref","first-page":"804","DOI":"10.1038\/nature06244","article-title":"The human microbiome project","volume":"449","author":"Turnbaugh","year":"2007","journal-title":"Nature"},{"key":"2023020116141625300_btu635-B34","doi-asserted-by":"crossref","DOI":"10.1007\/978-0-387-21706-2","volume-title":"Modern Applied Statistics with S","author":"Venables","year":"2002","edition":"4th edn"},{"key":"2023020116141625300_btu635-B35","doi-asserted-by":"crossref","first-page":"e1000352","DOI":"10.1371\/journal.pcbi.1000352","article-title":"Statistical methods for detecting differentially abundant features in clinical metagenomic samples","volume":"5","author":"White","year":"2009","journal-title":"PLoS Comput. Biol."},{"key":"2023020116141625300_btu635-B36","doi-asserted-by":"crossref","first-page":"e7370","DOI":"10.1371\/journal.pone.0007370","article-title":"Metagenomic analysis of respiratory tract DNA viral communities in cystic fibrosis and non-cystic fibrosis individuals","volume":"4","author":"Willner","year":"2009","journal-title":"PLoS One"},{"key":"2023020116141625300_btu635-B37","doi-asserted-by":"crossref","first-page":"71","DOI":"10.1007\/s11390-010-9306-4","article-title":"Metagenomics: facts and artifacts, and computational challenges","volume":"25","author":"Wooley","year":"2010","journal-title":"J. Comp. Sci. Tech."},{"key":"2023020116141625300_btu635-B39","doi-asserted-by":"crossref","first-page":"301","DOI":"10.1111\/j.1467-9868.2005.00503.x","article-title":"Regularization and variable selection via the elastic net","volume":"67","author":"Zou","year":"2005","journal-title":"J. R. Statist. Soc. 
B"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/31\/2\/158\/49011248\/bioinformatics_31_2_158.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/31\/2\/158\/49011248\/bioinformatics_31_2_158.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,6,2]],"date-time":"2024-06-02T23:29:53Z","timestamp":1717370993000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/31\/2\/158\/2366136"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2014,9,24]]},"references-count":29,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2015,1,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btu635","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2015,1,15]]},"published":{"date-parts":[[2014,9,24]]}}}