{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,25]],"date-time":"2026-03-25T01:53:31Z","timestamp":1774403611840,"version":"3.50.1"},"reference-count":44,"publisher":"Oxford University Press (OUP)","issue":"1","license":[{"start":{"date-parts":[[2024,1,10]],"date-time":"2024-01-10T00:00:00Z","timestamp":1704844800000},"content-version":"vor","delay-in-days":9,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Department of Biostatistics, Columbia University"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,1,2]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Motivation<\/jats:title><jats:p>Research on the human microbiome has suggested associations with human health, opening opportunities to predict health outcomes using the microbiome. Studies have also suggested that diverse forms of taxa, such as rare taxa that are evolutionarily related and abundant taxa that are evolutionarily unrelated, could be associated with or predictive of a health outcome. Although prediction models have been developed for microbiome data, no prediction models currently exist that use multiple forms of microbiome\u2013outcome associations.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>We developed MK-BMC, a Multi-Kernel framework with Boosted distance Metrics for Classification using microbiome data. We propose to first boost widely used distance metrics for microbiome data using taxon-level association signal strengths to up-weight taxa that are potentially associated with an outcome of interest. We then propose a multi-kernel prediction model with one kernel capturing one form of association between taxa and the outcome, where each kernel, transformed from a proposed boosted distance metric, measures similarities of microbiome compositions between pairs of samples. 
We demonstrated superior prediction performance of (i) boosted distance metrics for microbiome data over the original ones and (ii) MK-BMC over competing methods through extensive simulations. We applied MK-BMC to predict thyroid, obesity, and inflammatory bowel disease status using gut microbiome data from the American Gut Project and observed much-improved prediction performance over that of competing methods. The learned kernel weights help clarify the contributions of individual forms of microbiome signal.<\/jats:p><\/jats:sec><jats:sec><jats:title>Availability and implementation<\/jats:title><jats:p>Source code together with a sample input dataset is available at https:\/\/github.com\/HXu06\/MK-BMC<\/jats:p><\/jats:sec>","DOI":"10.1093\/bioinformatics\/btad757","type":"journal-article","created":{"date-parts":[[2024,1,11]],"date-time":"2024-01-11T04:29:20Z","timestamp":1704947360000},"source":"Crossref","is-referenced-by-count":3,"title":["MK-BMC: a Multi-Kernel framework with Boosted distance metrics for Microbiome data for Classification"],"prefix":"10.1093","volume":"40","author":[{"given":"Huang","family":"Xu","sequence":"first","affiliation":[{"name":"Department of Statistics and Finance, University of Science and Technology of China, Hefei 230026, China"}]},{"given":"Tian","family":"Wang","sequence":"additional","affiliation":[{"name":"Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY 10032, United States"}]},{"given":"Yuqi","family":"Miao","sequence":"additional","affiliation":[{"name":"Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY 10032, United States"}]},{"given":"Min","family":"Qian","sequence":"additional","affiliation":[{"name":"Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY 10032, United 
States"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2671-1519","authenticated-orcid":false,"given":"Yaning","family":"Yang","sequence":"additional","affiliation":[{"name":"Department of Statistics and Finance, University of Science and Technology of China, Hefei 230026, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1693-6888","authenticated-orcid":false,"given":"Shuang","family":"Wang","sequence":"additional","affiliation":[{"name":"Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY 10032, United States"}]}],"member":"286","published-online":{"date-parts":[[2024,1,10]]},"reference":[{"key":"2024011515225073000_btad757-B1","doi-asserted-by":"crossref","first-page":"325","DOI":"10.2307\/1942268","article-title":"An ordination of the upland forest communities of Southern Wisconsin","volume":"27","author":"Bray","year":"1957","journal-title":"Ecol Monogr"},{"key":"2024011515225073000_btad757-B2","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Mach Learn"},{"key":"2024011515225073000_btad757-B3","doi-asserted-by":"crossref","first-page":"335","DOI":"10.1038\/nmeth.f.303","article-title":"QIIME allows analysis of high-throughput community sequencing data","volume":"7","author":"Caporaso","year":"2010","journal-title":"Nat Methods"},{"key":"2024011515225073000_btad757-B4","doi-asserted-by":"crossref","first-page":"e15216","DOI":"10.1371\/journal.pone.0015216","article-title":"Disordered microbial communities in the upper respiratory tract of cigarette smokers","volume":"5","author":"Charlson","year":"2010","journal-title":"PLoS One"},{"key":"2024011515225073000_btad757-B5","doi-asserted-by":"crossref","first-page":"2106","DOI":"10.1093\/bioinformatics\/bts342","article-title":"Associating microbiome composition with environmental covariates using generalized UniFrac 
distances","volume":"28","author":"Chen","year":"2012","journal-title":"Bioinformatics"},{"key":"2024011515225073000_btad757-B6","doi-asserted-by":"crossref","first-page":"3991","DOI":"10.1093\/bioinformatics\/btv497","article-title":"Glmgraph: an R package for variable selection and predictive modeling of structured genomic data","volume":"31","author":"Chen","year":"2015","journal-title":"Bioinformatics"},{"key":"2024011515225073000_btad757-B7","doi-asserted-by":"crossref","first-page":"207","DOI":"10.1038\/nature11234","article-title":"Structure, function and diversity of the healthy human microbiome","volume":"486","author":"Consortium","year":"2012","journal-title":"Nature"},{"key":"2024011515225073000_btad757-B8","first-page":"213","volume-title":"Biocomputing 2012","author":"Fukuyama","year":"2012"},{"key":"2024011515225073000_btad757-B9","doi-asserted-by":"crossref","first-page":"e1010066","DOI":"10.1371\/journal.pcbi.1010066","article-title":"Host phenotype classification from human microbiome data is mainly driven by the presence of microbial taxa","volume":"18","author":"Giliberti","year":"2022","journal-title":"PLoS Comput Biol"},{"key":"2024011515225073000_btad757-B10","doi-asserted-by":"crossref","first-page":"e1010050","DOI":"10.1371\/journal.pcbi.1010050","article-title":"Microbiome-based disease prediction with multimodal variational information bottlenecks","volume":"18","author":"Grazioli","year":"2022","journal-title":"PLoS Comput Biol"},{"key":"2024011515225073000_btad757-B11","doi-asserted-by":"crossref","first-page":"99","DOI":"10.1038\/nature12198","article-title":"Gut metagenome in European women with normal, impaired and diabetic glucose control","volume":"498","author":"Karlsson","year":"2013","journal-title":"Nature"},{"key":"2024011515225073000_btad757-B12","doi-asserted-by":"crossref","first-page":"343","DOI":"10.1111\/j.1574-6976.2010.00251.x","article-title":"Supervised classification of human 
microbiota","volume":"35","author":"Knights","year":"2011","journal-title":"FEMS Microbiol Rev"},{"key":"2024011515225073000_btad757-B13","doi-asserted-by":"crossref","first-page":"45","DOI":"10.1186\/s40168-017-0262-x","article-title":"A powerful microbiome-based association test and a microbial taxa discovery framework for comprehensive association mapping","volume":"5","author":"Koh","year":"2017","journal-title":"Microbiome"},{"key":"2024011515225073000_btad757-B14","doi-asserted-by":"crossref","first-page":"813","DOI":"10.1038\/nmeth.1499","article-title":"Microbial community resemblance methods differ in their ability to detect biologically relevant patterns","volume":"7","author":"Kuczynski","year":"2010","journal-title":"Nat Methods"},{"key":"2024011515225073000_btad757-B15","doi-asserted-by":"crossref","first-page":"631","DOI":"10.1038\/nrmicro2857","article-title":"Genomic sequencing of uncultured microorganisms from single cells","volume":"10","author":"Lasken","year":"2012","journal-title":"Nat Rev Microbiol"},{"key":"2024011515225073000_btad757-B16","doi-asserted-by":"crossref","first-page":"8228","DOI":"10.1128\/AEM.71.12.8228-8235.2005","article-title":"UniFrac: a new phylogenetic method for comparing microbial communities","volume":"71","author":"Lozupone","year":"2005","journal-title":"Appl Environ Microbiol"},{"key":"2024011515225073000_btad757-B17","doi-asserted-by":"crossref","first-page":"1576","DOI":"10.1128\/AEM.01996-06","article-title":"Quantitative and qualitative \u03b2 diversity measures lead to different insights into factors that structure microbial communities","volume":"73","author":"Lozupone","year":"2007","journal-title":"Appl Environ Microbiol"},{"key":"2024011515225073000_btad757-B18","doi-asserted-by":"crossref","first-page":"3959","DOI":"10.1093\/bioinformatics\/btaa255","article-title":"A novel normalization and differential abundance test framework for microbiome 
data","volume":"36","author":"Ma","year":"2020","journal-title":"Bioinformatics"},{"key":"2024011515225073000_btad757-B19","doi-asserted-by":"crossref","first-page":"10","DOI":"10.1128\/mSystems.00031-18","article-title":"American Gut: an open platform for citizen science microbiome research","volume":"3","author":"McDonald","year":"2018","journal-title":"mSystems"},{"key":"2024011515225073000_btad757-B20","first-page":"2651","article-title":"Universal kernels","volume":"7","author":"Micchelli","year":"2006","journal-title":"J Mach Learn Res"},{"key":"2024011515225073000_btad757-B21","doi-asserted-by":"crossref","first-page":"R79","DOI":"10.1186\/gb-2012-13-9-r79","article-title":"Dysfunction of the intestinal microbiome in inflammatory bowel disease and treatment","volume":"13","author":"Morgan","year":"2012","journal-title":"Genome Biol"},{"key":"2024011515225073000_btad757-B22","doi-asserted-by":"crossref","first-page":"16004","DOI":"10.1038\/npjbiofilms.2016.4","article-title":"A perspective on 16S rRNA operational taxonomic unit clustering using sequence similarity","volume":"2","author":"Nguyen","year":"2016","journal-title":"NPJ Biofilms Microbiomes"},{"key":"2024011515225073000_btad757-B23","doi-asserted-by":"crossref","first-page":"e9490","DOI":"10.1371\/journal.pone.0009490","article-title":"FastTree 2\u2013approximately maximum-likelihood trees for large alignments","volume":"5","author":"Price","year":"2010","journal-title":"PLoS One"},{"key":"2024011515225073000_btad757-B24","doi-asserted-by":"crossref","first-page":"100","DOI":"10.1002\/cpmo.29","article-title":"Microbiota analysis using an Illumina MiSeq platform to sequence 16S rRNA genes","volume":"7","author":"Rapin","year":"2017","journal-title":"Curr Protoc Mouse Biol"},{"key":"2024011515225073000_btad757-B25","doi-asserted-by":"crossref","first-page":"2993","DOI":"10.1109\/JBHI.2020.2993761","article-title":"PopPhy-CNN: a phylogenetic tree embedded architecture for convolutional neural networks 
to predict host phenotype from metagenomic data","volume":"24","author":"Reiman","year":"2020","journal-title":"IEEE J Biomed Health Inform"},{"key":"2024011515225073000_btad757-B26","doi-asserted-by":"crossref","first-page":"3718","DOI":"10.1093\/bioinformatics\/btz124","article-title":"Using association signal annotations to boost similarity network fusion","volume":"35","author":"Ruan","year":"2019","journal-title":"Bioinformatics"},{"key":"2024011515225073000_btad757-B27","doi-asserted-by":"crossref","first-page":"435","DOI":"10.1038\/nmeth.3802","article-title":"Strain-level microbial epidemiology and population genomics from shotgun metagenomics","volume":"13","author":"Scholz","year":"2016","journal-title":"Nat Methods"},{"key":"2024011515225073000_btad757-B28","doi-asserted-by":"crossref","first-page":"4544","DOI":"10.1093\/bioinformatics\/btaa542","article-title":"TaxoNN: ensemble of neural networks on stratified microbiome data for disease prediction","volume":"36","author":"Sharma","year":"2020","journal-title":"Bioinformatics"},{"key":"2024011515225073000_btad757-B29","doi-asserted-by":"crossref","first-page":"449","DOI":"10.1093\/bioinformatics\/btt700","article-title":"Phylogeny-based classification of microbial communities","volume":"30","author":"Tanaseichuk","year":"2014","journal-title":"Bioinformatics"},{"key":"2024011515225073000_btad757-B30","doi-asserted-by":"crossref","first-page":"2618","DOI":"10.1093\/bioinformatics\/btw311","article-title":"PERMANOVA-S: association test for microbial community composition that accommodates confounders and multiple distances","volume":"32","author":"Tang","year":"2016","journal-title":"Bioinformatics"},{"key":"2024011515225073000_btad757-B31","doi-asserted-by":"crossref","first-page":"267","DOI":"10.1111\/j.2517-6161.1996.tb02080.x","article-title":"Regression shrinkage and selection via the lasso","volume":"58","author":"Tibshirani","year":"1996","journal-title":"J R Stat Soc Series B Stat 
Methodol"},{"key":"2024011515225073000_btad757-B32","doi-asserted-by":"crossref","first-page":"805","DOI":"10.1038\/nrg1709","article-title":"Metagenomics: DNA sequencing of environmental samples","volume":"6","author":"Tringe","year":"2005","journal-title":"Nat Rev Genet"},{"key":"2024011515225073000_btad757-B33","doi-asserted-by":"crossref","first-page":"902","DOI":"10.1038\/nmeth.3589","article-title":"MetaPhlAn2 for enhanced metagenomic taxonomic profiling","volume":"12","author":"Truong","year":"2015","journal-title":"Nat Methods"},{"key":"2024011515225073000_btad757-B34","doi-asserted-by":"crossref","first-page":"480","DOI":"10.1038\/nature07540","article-title":"A core gut microbiome in obese and lean twins","volume":"457","author":"Turnbaugh","year":"2009","journal-title":"Nature"},{"key":"2024011515225073000_btad757-B35","doi-asserted-by":"crossref","first-page":"419","DOI":"10.1093\/bioinformatics\/btab668","article-title":"Testing microbiome association using integrated quantile regression models","volume":"38","author":"Wang","year":"2022","journal-title":"Bioinformatics"},{"key":"2024011515225073000_btad757-B36","doi-asserted-by":"crossref","first-page":"e6","DOI":"10.1093\/nar\/gkz204","article-title":"Detection of epigenetic field defects using a weighted epigenetic distance-based method","volume":"47","author":"Wang","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"2024011515225073000_btad757-B37","article-title":"A novel deep learning method for predictive modeling of microbiome data","volume":"22","author":"Wang","year":"2021","journal-title":"Brief Bioinform"},{"key":"2024011515225073000_btad757-B38","doi-asserted-by":"crossref","first-page":"107050","DOI":"10.1016\/j.patcog.2019.107050","article-title":"Collaborative and geometric multi-kernel learning for multi-class classification","volume":"99","author":"Wang","year":"2020","journal-title":"Pattern 
Recognition"},{"key":"2024011515225073000_btad757-B39","first-page":"44","author":"Wassan","year":"2018"},{"key":"2024011515225073000_btad757-B40","doi-asserted-by":"crossref","first-page":"56","DOI":"10.1186\/s13073-016-0302-3","article-title":"An adaptive association test for microbiome data","volume":"8","author":"Wu","year":"2016","journal-title":"Genome Med"},{"key":"2024011515225073000_btad757-B41","doi-asserted-by":"crossref","first-page":"2435","DOI":"10.1038\/ismej.2016.37","article-title":"Cigarette smoking and the oral microbiome in a large study of American adults","volume":"10","author":"Wu","year":"2016","journal-title":"ISME J"},{"key":"2024011515225073000_btad757-B42","doi-asserted-by":"crossref","first-page":"1391","DOI":"10.3389\/fmicb.2018.01391","article-title":"Predictive modeling of microbiome data using a phylogeny-regularized generalized linear mixed model","volume":"9","author":"Xiao","year":"2018","journal-title":"Front Microbiol"},{"key":"2024011515225073000_btad757-B43","doi-asserted-by":"crossref","first-page":"1875","DOI":"10.1093\/bioinformatics\/bty014","article-title":"A distance-based approach for testing the mediation effect of the human microbiome","volume":"34","author":"Zhang","year":"2018","journal-title":"Bioinformatics"},{"key":"2024011515225073000_btad757-B44","doi-asserted-by":"crossref","first-page":"797","DOI":"10.1016\/j.ajhg.2015.04.003","article-title":"Testing in microbiome-profiling studies with MiRKAT, the microbiome regression-based kernel association test","volume":"96","author":"Zhao","year":"2015","journal-title":"Am J Hum 
Genet"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btad757\/55399604\/btad757.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/1\/btad757\/56056987\/btad757.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/1\/btad757\/56056987\/btad757.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,11,7]],"date-time":"2024-11-07T18:32:24Z","timestamp":1731004344000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btad757\/7515249"}},"subtitle":[],"editor":[{"given":"Christina","family":"Kendziorski","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2024,1,1]]},"references-count":44,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2024,1,2]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btad757","relation":{},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2024,1,1]]},"published":{"date-parts":[[2024,1,1]]},"article-number":"btad757"}}