{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,31]],"date-time":"2025-10-31T07:22:46Z","timestamp":1761895366833},"reference-count":30,"publisher":"Oxford University Press (OUP)","issue":"23","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2011,12,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Motivation: Direct sequencing of microbes in human ecosystems (the human microbiome) has complemented single genome cultivation and sequencing to understand and explore the impact of commensal microbes on human health. As sequencing technologies improve and costs decline, the sophistication of data has outgrown available computational methods. While several existing machine learning methods have been adapted for analyzing microbiome data recently, there is not yet an efficient and dedicated algorithm available for multiclass classification of human microbiota.<\/jats:p><jats:p>Results: By combining instance-based and model-based learning, we propose a novel sparse distance-based learning method for simultaneous class prediction and feature (variable or taxa, which is used interchangeably) selection from multiple treatment populations on the basis of 16S rRNA sequence count data. Our proposed method simultaneously minimizes the intraclass distance and maximizes the interclass distance with many fewer estimated parameters than other methods. It is very efficient for problems with small sample sizes and unbalanced classes, which are common in metagenomic studies. We implemented this method in a MATLAB toolbox called MetaDistance. We also propose several approaches for data normalization and variance stabilization transformation in MetaDistance. We validate this method on several real and simulated 16S rRNA datasets to show that it outperforms existing methods for classifying metagenomic data. 
This article is the first to address simultaneous multifeature selection and class prediction with metagenomic count data.<\/jats:p><jats:p>Availability: The MATLAB toolbox is freely available online at http:\/\/metadistance.igs.umaryland.edu\/.<\/jats:p><jats:p>Contact: \u00a0zliu@umm.edu<\/jats:p><jats:p>Supplementary Information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btr547","type":"journal-article","created":{"date-parts":[[2011,10,8]],"date-time":"2011-10-08T02:49:40Z","timestamp":1318042180000},"page":"3242-3249","source":"Crossref","is-referenced-by-count":46,"title":["Sparse distance-based learning for simultaneous multiclass classification and feature selection of metagenomic data"],"prefix":"10.1093","volume":"27","author":[{"given":"Zhenqiu","family":"Liu","sequence":"first","affiliation":[{"name":"1 Department of Epidemiology and Public Health, University of Maryland Greenebaum Cancer Center and 2Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA"},{"name":"1 Department of Epidemiology and Public Health, University of Maryland Greenebaum Cancer Center and 2Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA"}]},{"given":"William","family":"Hsiao","sequence":"additional","affiliation":[{"name":"1 Department of Epidemiology and Public Health, University of Maryland Greenebaum Cancer Center and 2Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA"}]},{"given":"Brandi L.","family":"Cantarel","sequence":"additional","affiliation":[{"name":"1 Department of Epidemiology and Public Health, University of Maryland Greenebaum Cancer Center and 2Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA"}]},{"given":"Elliott Franco","family":"Dr\u00e1bek","sequence":"additional","affiliation":[{"name":"1 Department of 
Epidemiology and Public Health, University of Maryland Greenebaum Cancer Center and 2Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA"}]},{"given":"Claire","family":"Fraser-Liggett","sequence":"additional","affiliation":[{"name":"1 Department of Epidemiology and Public Health, University of Maryland Greenebaum Cancer Center and 2Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA"}]}],"member":"286","published-online":{"date-parts":[[2011,10,7]]},"reference":[{"key":"2023012511050709500_B1","first-page":"113","article-title":"Reducing multiclass to binary: a unifying approach for margin classifiers","volume":"9","author":"Allwein","year":"2001","journal-title":"J. Mach. Learn. Res."},{"key":"2023012511050709500_B2","doi-asserted-by":"crossref","first-page":"335","DOI":"10.1038\/nmeth.f.303","article-title":"QIIME allows analysis of high-throughput community sequencing data","volume":"7","author":"Caporaso","year":"2010","journal-title":"Nat. Methods"},{"key":"2023012511050709500_B3","first-page":"265","article-title":"On the algorithmic implementation of multiclass kernel-based vector machines","volume":"2","author":"Crammer","year":"2001","journal-title":"J. Mach. Learn. Res."},{"key":"2023012511050709500_B4","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1007\/s10994-009-5127-5","article-title":"Combining instance-based learning and logistic regression for multilabel classification","volume":"76","author":"Cheng","year":"2009","journal-title":"Mach. 
Learn."},{"key":"2023012511050709500_B5","doi-asserted-by":"crossref","first-page":"1694","DOI":"10.1126\/science.1177486","article-title":"Bacterial community variation in human body habitats across space and time","volume":"326","author":"Costello","year":"2009","journal-title":"Science."},{"key":"2023012511050709500_B6","doi-asserted-by":"crossref","first-page":"2609","DOI":"10.1016\/j.sigpro.2009.04.035","article-title":"Clipped noisy images: heteroskedastic modeling and practical denoising","volume":"89","author":"Foi","year":"2009","journal-title":"Signal Process."},{"key":"2023012511050709500_B7","doi-asserted-by":"crossref","first-page":"607","DOI":"10.1214\/aoms\/1177729756","article-title":"Transformations related to the angular and the square root","volume":"21","author":"Freeman","year":"1950","journal-title":"Ann. Math. Stat."},{"key":"2023012511050709500_B8","doi-asserted-by":"crossref","DOI":"10.1007\/s15010-011-0161-1","article-title":"Outcomes in patients infected with carbapenem-resistant Acinetobacter baumannii and treated with tigecycline alone or in combination therapy","author":"Guner","year":"2011","journal-title":"Infection"},{"key":"2023012511050709500_B9","first-page":"35","article-title":"A survey of the nonlinear conjugate gradient methods","volume":"2","author":"Hager","year":"2006","journal-title":"Pac. J. 
Optim."},{"key":"2023012511050709500_B10","doi-asserted-by":"crossref","first-page":"377","DOI":"10.1101\/gr.5969107","article-title":"MEGAN analysis of metagenomic data","volume":"17","author":"Huson","year":"2007","journal-title":"Genome Res."},{"key":"2023012511050709500_B11","doi-asserted-by":"crossref","first-page":"S1","DOI":"10.1186\/1471-2105-10-S1-S12","article-title":"Methods for comparative metagenomics","volume":"10","author":"Huson","year":"2009","journal-title":"BMC Bioinformatics"},{"key":"2023012511050709500_B12","doi-asserted-by":"crossref","first-page":"165","DOI":"10.1111\/j.1440-1703.2003.00620.x","article-title":"Angular transformation - another effect of different sample sizes","volume":"19","author":"Kasuya","year":"2004","journal-title":"Ecol. Res."},{"key":"2023012511050709500_B13","doi-asserted-by":"crossref","first-page":"833","DOI":"10.1080\/01621459.1994.10476817","article-title":"Transform-both-sides approach for overdispersed binomial data when N is unobserved","volume":"89","author":"Kim","year":"1994","journal-title":"J. Am. Stat. Assoc."},{"key":"2023012511050709500_B14","article-title":"Supervised classification of human microbiota","author":"Knights","year":"2010","journal-title":"FEMS Microbiol Rev."},{"key":"2023012511050709500_B15","doi-asserted-by":"crossref","first-page":"143","DOI":"10.1080\/01621459.1961.10482100","article-title":"On stabilizing the binomial and negative binomial variances","volume":"56","author":"Laubscher","year":"1961","journal-title":"J. Am. Stat. Assoc."},{"key":"2023012511050709500_B16","doi-asserted-by":"crossref","first-page":"100","DOI":"10.1109\/TCBB.2008.17","article-title":"Sparse support vector machines with Lp penalty for biomarker identification","volume":"7","author":"Liu","year":"2010","journal-title":"IEEE\/ACM Trans. Comput. Biol. 
Bioinform."},{"key":"2023012511050709500_B17","doi-asserted-by":"crossref","first-page":"8228","DOI":"10.1128\/AEM.71.12.8228-8235.2005","article-title":"UniFrac: a new phylogenetic method for comparing microbial communities","volume":"71","author":"Lozupone","year":"2005","journal-title":"Appl. Environ. Microbiol."},{"key":"2023012511050709500_B18","doi-asserted-by":"crossref","first-page":"1849","DOI":"10.1093\/bioinformatics\/btp341","article-title":"Visual and statistical comparison of metagenomes","volume":"25","author":"Mitra","year":"2009","journal-title":"Bioinformatics"},{"key":"2023012511050709500_B19","doi-asserted-by":"crossref","first-page":"405","DOI":"10.1093\/biomet\/71.2.405","article-title":"On Bartlett's test for homogeneity of variances","volume":"71","author":"Nagarsenker","year":"1984","journal-title":"Biometrika"},{"key":"2023012511050709500_B20","doi-asserted-by":"crossref","first-page":"69","DOI":"10.21500\/20112084.846","article-title":"Positively skewed data: revisiting the Box-Cox transformation","volume":"3","author":"Olivier","year":"2010","journal-title":"Int. J. Psychol. Res."},{"key":"2023012511050709500_B21","doi-asserted-by":"crossref","first-page":"59","DOI":"10.1038\/nature08821","article-title":"A human gut microbial gene catalogue established by metagenomic sequencing","volume":"464","author":"Qin","year":"2010","journal-title":"Nature."},{"key":"2023012511050709500_B22","doi-asserted-by":"crossref","first-page":"7537","DOI":"10.1128\/AEM.01541-09","article-title":"Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities","volume":"75","author":"Schloss","year":"2009","journal-title":"Appl. Environ. 
Microbiol."},{"key":"2023012511050709500_B23","doi-asserted-by":"crossref","first-page":"804","DOI":"10.1038\/nature06244","article-title":"The human microbiome project","volume":"449","author":"Turnbaugh","year":"2007","journal-title":"Nature"},{"key":"2023012511050709500_B24","doi-asserted-by":"crossref","first-page":"185","DOI":"10.1007\/s10994-008-5077-3","article-title":"Decision trees for hierarchical multi-label classification","volume":"73","author":"Vens","year":"2008","journal-title":"Mach. Learn."},{"key":"2023012511050709500_B25","doi-asserted-by":"crossref","first-page":"5261","DOI":"10.1128\/AEM.00062-07","article-title":"Na\u00efve Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy","volume":"73","author":"Wang","year":"2007","journal-title":"Appl. Environ. Microbiol."},{"key":"2023012511050709500_B26","doi-asserted-by":"crossref","first-page":"e1000352","DOI":"10.1371\/journal.pcbi.1000352","article-title":"Statistical methods for detecting differentially abundant features in clinical metagenomic samples","volume":"5","author":"White","year":"2009","journal-title":"PLoS Comput. Biol."},{"key":"2023012511050709500_B27","doi-asserted-by":"crossref","first-page":"e1000667","DOI":"10.1371\/journal.pcbi.1000667","article-title":"A primer on metagenomics","volume":"6","author":"Wooley","year":"2010","journal-title":"PLoS Comput. Biol."},{"key":"2023012511050709500_B28","doi-asserted-by":"crossref","first-page":"1033","DOI":"10.1109\/TNN.2010.2047114","article-title":"Semi-supervised feature selection based on manifold regularization","volume":"21","author":"Xu","year":"2010","journal-title":"IEEE Trans. 
Neural Netw."},{"key":"2023012511050709500_B29","doi-asserted-by":"crossref","first-page":"2038","DOI":"10.1016\/j.patcog.2006.12.019","article-title":"Ml-knn: a lazy learning approach to multi-label learning","volume":"40","author":"Zhang","year":"2007","journal-title":"Pattern Recognit."},{"key":"2023012511050709500_B30","doi-asserted-by":"crossref","first-page":"2365","DOI":"10.1073\/pnas.0812600106","article-title":"Human gut microbiota in obesity and after gastric bypass","volume":"106","author":"Zhang","year":"2009","journal-title":"Proc. Natl Acad. Sci. USA"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/27\/23\/3242\/48862039\/bioinformatics_27_23_3242.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/27\/23\/3242\/48862039\/bioinformatics_27_23_3242.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,4,13]],"date-time":"2024-04-13T12:04:41Z","timestamp":1713009881000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/27\/23\/3242\/233777"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2011,10,7]]},"references-count":30,"journal-issue":{"issue":"23","published-print":{"date-parts":[[2011,12,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btr547","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2011,12,1]]},"published":{"date-parts":[[2011,10,7]]}}}