{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,23]],"date-time":"2026-03-23T13:17:02Z","timestamp":1774271822807,"version":"3.50.1"},"reference-count":41,"publisher":"Oxford University Press (OUP)","issue":"4","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2014,2,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Next-generation sequencing coupled with metagenomics has led to the rapid growth of sequence databases and enabled a new branch of microbiology called comparative metagenomics. Comparative metagenomic analysis studies compositional patterns within and between different environments providing a deep insight into the structure and function of complex microbial communities. It is a fast growing field that requires the development of novel supervised learning techniques for addressing challenges associated with metagenomic data, e.g. sensitivity to the choice of sequence similarity cutoff used to define operational taxonomic units (OTUs), high dimensionality and sparsity of the data and so forth. On the other hand, the natural properties of microbial community data may provide useful information about the structure of the data. For example, similarity between species encoded by a phylogenetic tree captures the relationship between OTUs and may be useful for the analysis of complex microbial datasets where the diversity patterns comprise features at multiple taxonomic levels. 
Even though some of the challenges have been addressed by learning algorithms in the literature, none of the available methods take advantage of the inherent properties of metagenomic data.<\/jats:p>\n               <jats:p>Results: We proposed a novel supervised classification method for microbial community samples, where each sample is represented as a set of OTU frequencies, which takes advantage of the natural structure in microbial community data encoded by a phylogenetic tree. This model allows us to take advantage of environment-specific compositional patterns that may contain features at multiple granularity levels. Our method is based on the multinomial logistic regression model with a tree-guided penalty function. Additionally, we proposed a new simulation framework for generating 16S ribosomal RNA gene read counts that may be useful in comparative metagenomics research. Our experimental results on simulated and real data show that the phylogenetic information used in our method improves the classification accuracy.<\/jats:p>\n               <jats:p>Availability and implementation: \u00a0http:\/\/www.cs.ucr.edu\/~tanaseio\/metaphyl.htm.<\/jats:p>\n               <jats:p>Contact: \u00a0tanaseio@cs.ucr.edu or jiang@cs.ucr.edu<\/jats:p>\n               <jats:p>Supplementary Information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btt700","type":"journal-article","created":{"date-parts":[[2013,12,26]],"date-time":"2013-12-26T01:19:58Z","timestamp":1388020798000},"page":"449-456","source":"Crossref","is-referenced-by-count":30,"title":["Phylogeny-based classification of microbial communities"],"prefix":"10.1093","volume":"30","author":[{"given":"Olga","family":"Tanaseichuk","sequence":"first","affiliation":[{"name":"1 Department of Computer Science and Engineering, 2Department of Plant Pathology and Microbiology, University of California, Riverside, CA 92521 USA and 3School of Information 
Science and Technology, Tsinghua University, Beijing 100084, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"James","family":"Borneman","sequence":"additional","affiliation":[{"name":"1 Department of Computer Science and Engineering, 2Department of Plant Pathology and Microbiology, University of California, Riverside, CA 92521 USA and 3School of Information Science and Technology, Tsinghua University, Beijing 100084, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Tao","family":"Jiang","sequence":"additional","affiliation":[{"name":"1 Department of Computer Science and Engineering, 2Department of Plant Pathology and Microbiology, University of California, Riverside, CA 92521 USA and 3School of Information Science and Technology, Tsinghua University, Beijing 100084, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2013,12,24]]},"reference":[{"key":"2023012710423081200_btt700-B1","article-title":"MLPY: machine learning python. arXiv:1202.6548v2","author":"Albanese","year":"2012"},{"key":"2023012710423081200_btt700-B2","doi-asserted-by":"crossref","first-page":"143","DOI":"10.1128\/mr.59.1.143-169.1995","article-title":"Phylogenetic identification and in situ detection of individual microbial cells without cultivation","volume":"59","author":"Amann","year":"1995","journal-title":"Microbiol. 
Rev."},{"key":"2023012710423081200_btt700-B3","doi-asserted-by":"crossref","first-page":"174","DOI":"10.1038\/nature09944","article-title":"Enterotypes of the human gut microbiome","volume":"473","author":"Arumugam","year":"2011","journal-title":"Nature"},{"key":"2023012710423081200_btt700-B4","doi-asserted-by":"crossref","first-page":"55","DOI":"10.2174\/157489306775330615","article-title":"Gene expression profile classification: a review","volume":"1","author":"Asyali","year":"2006","journal-title":"Curr. Bioinform."},{"key":"2023012710423081200_btt700-B5","doi-asserted-by":"crossref","first-page":"e1000173","DOI":"10.1371\/journal.pcbi.1000173","article-title":"Support vector machines and kernels for computational biology","volume":"4","author":"Ben-Hur","year":"2008","journal-title":"PLoS Comput. Biol."},{"key":"2023012710423081200_btt700-B6","doi-asserted-by":"crossref","first-page":"493","DOI":"10.1002\/widm.1072","article-title":"Overview of random forest methodology and practical guidance with emphasis on computational biology and bioinformatics","volume":"2","author":"Boulesteix","year":"2012","journal-title":"Wiley Interdiscip. Rev. Data Min. Knowl. Discov."},{"key":"2023012710423081200_btt700-B7","doi-asserted-by":"crossref","first-page":"266","DOI":"10.1093\/bioinformatics\/btp636","article-title":"PyNAST: a flexible tool for aligning sequences to a template alignment","volume":"26","author":"Caporaso","year":"2010","journal-title":"Bioinformatics"},{"key":"2023012710423081200_btt700-B8","doi-asserted-by":"crossref","first-page":"335","DOI":"10.1038\/nmeth.f.303","article-title":"QIIME allows analysis of high-throughput community sequencing data","volume":"7","author":"Caporaso","year":"2010","journal-title":"Nat. 
Methods"},{"key":"2023012710423081200_btt700-B9","doi-asserted-by":"crossref","first-page":"118","DOI":"10.1186\/1471-2105-12-118","article-title":"Variance adjusted weighted UniFrac: a powerful beta diversity measure for comparing communities based on phylogeny","volume":"12","author":"Chang","year":"2011","journal-title":"BMC Bioinformatics"},{"key":"2023012710423081200_btt700-B10","doi-asserted-by":"crossref","first-page":"207","DOI":"10.1038\/nature11234","article-title":"Structure, function and diversity of the healthy human microbiome","volume":"486","author":"Human Microbiome Project Consortium","year":"2012","journal-title":"Nature"},{"key":"2023012710423081200_btt700-B11","doi-asserted-by":"crossref","first-page":"1694","DOI":"10.1126\/science.1177486","article-title":"Bacterial community variation in human body habitats across space and time","volume":"326","author":"Costello","year":"2009","journal-title":"Science"},{"key":"2023012710423081200_btt700-B12","doi-asserted-by":"crossref","first-page":"2460","DOI":"10.1093\/bioinformatics\/btq461","article-title":"Search and clustering orders of magnitude faster than BLAST","volume":"26","author":"Edgar","year":"2010","journal-title":"Bioinformatics"},{"key":"2023012710423081200_btt700-B13","first-page":"123","article-title":"Learning pathway-based decision rules to classify microarray cancer samples","volume-title":"German Conference on Bioinformatics 2010, of Lecture Notes in Informatics","author":"Glaab","year":"2010"},{"key":"2023012710423081200_btt700-B14","doi-asserted-by":"crossref","first-page":"433","DOI":"10.1145\/1553374.1553431","article-title":"Group lasso with overlap and graph lasso","volume-title":"ICML\u201909: Proceedings of the 26th Annual International Conference on Machine Learning","author":"Jacob","year":"2009"},{"key":"2023012710423081200_btt700-B15","doi-asserted-by":"crossref","first-page":"e23214","DOI":"10.1371\/journal.pone.0023214","article-title":"The phylogenetic diversity of 
metagenomes","volume":"6","author":"Kembel","year":"2011","journal-title":"PLoS One"},{"key":"2023012710423081200_btt700-B16","article-title":"Tree-guided group lasso for multi-task regression with structured sparsity","volume-title":"Proceedings of the 27th International Conference on Machine Learning","author":"Kim","year":"2010"},{"key":"2023012710423081200_btt700-B17","doi-asserted-by":"crossref","first-page":"292","DOI":"10.1016\/j.chom.2011.09.003","article-title":"Human-associated microbial signatures: examining their predictive value","volume":"10","author":"Knights","year":"2011","journal-title":"Cell Host Microbe"},{"key":"2023012710423081200_btt700-B18","doi-asserted-by":"crossref","first-page":"343","DOI":"10.1111\/j.1574-6976.2010.00251.x","article-title":"Supervised classification of human microbiota","volume":"35","author":"Knights","year":"2011","journal-title":"FEMS Microbiol. Rev."},{"key":"2023012710423081200_btt700-B19","doi-asserted-by":"crossref","first-page":"3242","DOI":"10.1093\/bioinformatics\/btr547","article-title":"Sparse distance-based learning for simultaneous multiclass classification and feature selection of metagenomic data","volume":"27","author":"Liu","year":"2011","journal-title":"Bioinformatics"},{"key":"2023012710423081200_btt700-B20","doi-asserted-by":"crossref","first-page":"8228","DOI":"10.1128\/AEM.71.12.8228-8235.2005","article-title":"UniFrac: a new phylogenetic method for comparing microbial communities","volume":"71","author":"Lozupone","year":"2005","journal-title":"Appl. Environ. Microbiol."},{"key":"2023012710423081200_btt700-B21","doi-asserted-by":"crossref","first-page":"11436","DOI":"10.1073\/pnas.0611525104","article-title":"Global patterns in bacterial diversity","volume":"104","author":"Lozupone","year":"2007","journal-title":"Proc. Natl Acad. Sci. 
USA"},{"key":"2023012710423081200_btt700-B22","doi-asserted-by":"crossref","first-page":"557","DOI":"10.1111\/j.1574-6976.2008.00111.x","article-title":"Species divergence and the measurement of microbial diversity","volume":"32","author":"Lozupone","year":"2008","journal-title":"FEMS Microbiol. Rev."},{"key":"2023012710423081200_btt700-B23","doi-asserted-by":"crossref","first-page":"287","DOI":"10.1038\/nrmicro2088","article-title":"Application of \u2018next-generation\u2019 sequencing technologies to microbial genetics","volume":"7","author":"MacLean","year":"2009","journal-title":"Nat. Rev. Microbiol."},{"key":"2023012710423081200_btt700-B24","first-page":"509","article-title":"Bayesian multinomial logistic regression for author identification","volume-title":"Maxent Conference","author":"Madigan","year":"2005"},{"key":"2023012710423081200_btt700-B25","doi-asserted-by":"crossref","first-page":"53","DOI":"10.1111\/j.1467-9868.2007.00627.x","article-title":"The group lasso for logistic regression","volume":"70","author":"Meier","year":"2008","journal-title":"J. R. Stat. Soc. B Stat. Methodol."},{"key":"2023012710423081200_btt700-B26","first-page":"2825","article-title":"Scikit-learn: Machine learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J. Mach. Learn. Res."},{"key":"2023012710423081200_btt700-B27","doi-asserted-by":"crossref","first-page":"e9490","DOI":"10.1371\/journal.pone.0009490","article-title":"FastTree 2 \u2013 Approximately maximum-likelihood trees for large alignments","volume":"5","author":"Price","year":"2010","journal-title":"PLoS One"},{"key":"2023012710423081200_btt700-B28","doi-asserted-by":"crossref","first-page":"7537","DOI":"10.1128\/AEM.01541-09","article-title":"Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities","volume":"75","author":"Schloss","year":"2009","journal-title":"Appl. Environ. 
Microbiol."},{"key":"2023012710423081200_btt700-B29","doi-asserted-by":"crossref","first-page":"2379","DOI":"10.1128\/AEM.72.4.2379-2384.2006","article-title":"Introducing TreeClimber, a test to compare microbial community structures","volume":"72","author":"Schloss","year":"2006","journal-title":"Appl. Environ. Microbiol."},{"key":"2023012710423081200_btt700-B30","first-page":"165","article-title":"Comparing bacterial communities inferred from 16S rRNA gene sequencing and shotgun metagenomics","volume-title":"Proceedings of the Pacific Symposium on Biocomputing","author":"Shah","year":"2011"},{"key":"2023012710423081200_btt700-B31","doi-asserted-by":"crossref","first-page":"2493","DOI":"10.1093\/bioinformatics\/bts470","article-title":"Meta-Storms: efficient search for similar microbial communities based on a novel indexing scheme and similarity score for metagenomic data","volume":"28","author":"Su","year":"2012","journal-title":"Bioinformatics"},{"key":"2023012710423081200_btt700-B32","doi-asserted-by":"crossref","first-page":"1027","DOI":"10.1038\/nature05414","article-title":"An obesity-associated gut microbiome with increased capacity for energy harvest","volume":"444","author":"Turnbaugh","year":"2006","journal-title":"Nature"},{"key":"2023012710423081200_btt700-B33","doi-asserted-by":"crossref","first-page":"804","DOI":"10.1038\/nature06244","article-title":"The Human Microbiome Project","volume":"449","author":"Turnbaugh","year":"2007","journal-title":"Nature"},{"key":"2023012710423081200_btt700-B34","doi-asserted-by":"crossref","first-page":"480","DOI":"10.1038\/nature07540","article-title":"A core gut microbiome in obese and lean twins","volume":"457","author":"Turnbaugh","year":"2009","journal-title":"Nature"},{"key":"2023012710423081200_btt700-B35","doi-asserted-by":"crossref","first-page":"e1000352","DOI":"10.1371\/journal.pcbi.1000352","article-title":"Statistical methods for detecting differentially abundant features in clinical metagenomic 
samples","volume":"5","author":"White","year":"2009","journal-title":"PLoS Comput. Biol."},{"key":"2023012710423081200_btt700-B36","doi-asserted-by":"crossref","first-page":"714","DOI":"10.1093\/bioinformatics\/btp041","article-title":"Genome-wide association analysis by lasso penalized logistic regression","volume":"25","author":"Wu","year":"2009","journal-title":"Bioinformatics"},{"key":"2023012710423081200_btt700-B37","doi-asserted-by":"crossref","first-page":"49","DOI":"10.1016\/j.mimet.2005.06.012","article-title":"An ecoinformatics tool for microbial community studies: supervised classification of amplicon length heterogeneity (ALH) profiles of 16S rRNA","volume":"65","author":"Yang","year":"2006","journal-title":"J. Microbiol. Methods"},{"key":"2023012710423081200_btt700-B38","first-page":"153","article-title":"Identification and quantification of abundant species from pyrosequences of 16S rRNA by consensus alignment","volume":"2010","author":"Ye","year":"2011","journal-title":"Proc. (IEEE Int. Conf. Bioinformatics Biomed.)"},{"key":"2023012710423081200_btt700-B39","doi-asserted-by":"crossref","first-page":"957","DOI":"10.1089\/cmb.2011.0044","article-title":"Supervised protein family classification and new family construction","volume":"19","author":"Yi","year":"2012","journal-title":"J. Comput. Biol."},{"key":"2023012710423081200_btt700-B40","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1011441423217","article-title":"Text categorization based on regularized linear classification methods","volume":"4","author":"Zhang","year":"2000","journal-title":"Inf. Retr."},{"key":"2023012710423081200_btt700-B41","doi-asserted-by":"crossref","first-page":"3468","DOI":"10.1214\/07-AOS584","article-title":"The composite absolute penalties family for grouped and hierarchical variable selection","volume":"37","author":"Zhao","year":"2009","journal-title":"Ann. 
Stat."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/30\/4\/449\/48917562\/bioinformatics_30_4_449.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/30\/4\/449\/48917562\/bioinformatics_30_4_449.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,27]],"date-time":"2023-01-27T10:59:10Z","timestamp":1674817150000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/30\/4\/449\/202339"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2013,12,24]]},"references-count":41,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2014,2,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btt700","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2014,2,15]]},"published":{"date-parts":[[2013,12,24]]}}}