{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T20:34:31Z","timestamp":1772138071718,"version":"3.50.1"},"reference-count":47,"publisher":"Oxford University Press (OUP)","issue":"2","license":[{"start":{"date-parts":[[2025,1,12]],"date-time":"2025-01-12T00:00:00Z","timestamp":1736640000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Canadian NSERC Discovery","award":["RGPIN-03270-2023"],"award-info":[{"award-number":["RGPIN-03270-2023"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,2,4]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Understanding the associations between traits and microbial composition is a fundamental objective in microbiome research. Recently, researchers have turned to machine learning (ML) models to achieve this goal with promising results. However, the effectiveness of advanced ML models is often limited by the unique characteristics of microbiome data, which are typically high-dimensional, compositional, and imbalanced. These characteristics can hinder the models\u2019 ability to fully explore the relationships among taxa in predictive analyses. To address this challenge, data augmentation has become crucial. It involves generating synthetic samples with artificial labels based on existing data and incorporating these samples into the training set to improve ML model performance.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>Here, we propose PhyloMix, a novel data augmentation method specifically designed for microbiome data to enhance predictive analyses. PhyloMix leverages the phylogenetic relationships among microbiome taxa as an informative prior to guide the generation of synthetic microbial samples. Leveraging phylogeny, PhyloMix creates new samples by removing a subtree from one sample and combining it with the corresponding subtree from another sample. Notably, PhyloMix is designed to address the compositional nature of microbiome data, effectively handling both raw counts and relative abundances. This approach introduces sufficient diversity into the augmented samples, leading to improved predictive performance. We empirically evaluated PhyloMix on six real microbiome datasets across five commonly used ML models. PhyloMix significantly outperforms distinct baseline methods including sample-mixing-based data augmentation techniques like vanilla mixup and compositional cutmix, as well as the phylogeny-based method TADA. We also demonstrated the wide applicability of PhyloMix in both supervised learning and contrastive representation learning.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>The Apache-licensed source code is available at (https:\/\/github.com\/batmen-lab\/phylomix).<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaf014","type":"journal-article","created":{"date-parts":[[2025,1,8]],"date-time":"2025-01-08T23:20:46Z","timestamp":1736378446000},"source":"Crossref","is-referenced-by-count":3,"title":["PhyloMix: enhancing microbiome-trait association prediction through phylogeny-mixing augmentation"],"prefix":"10.1093","volume":"41","author":[{"given":"Yifan","family":"Jiang","sequence":"first","affiliation":[{"name":"Cheriton School of Computer Science, University of Waterloo , Waterloo, ON, N2L 3G1,","place":["Canada"]}]},{"given":"Disen","family":"Liao","sequence":"additional","affiliation":[{"name":"Cheriton School of Computer Science, University of Waterloo , Waterloo, ON, N2L 3G1,","place":["Canada"]}]},{"given":"Qiyun","family":"Zhu","sequence":"additional","affiliation":[{"name":"School of Life Sciences, Arizona State University , Tempe, AZ, 85281,","place":["United States"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1686-5917","authenticated-orcid":false,"given":"Yang Young","family":"Lu","sequence":"additional","affiliation":[{"name":"Cheriton School of Computer Science, University of Waterloo , Waterloo, ON, N2L 3G1,","place":["Canada"]}]}],"member":"286","published-online":{"date-parts":[[2025,1,12]]},"reference":[{"key":"2025030422263964900_btaf014-B1","article-title":"Sanity checks for saliency maps","author":"Adebayo","year":"2018"},{"key":"2025030422263964900_btaf014-B2","doi-asserted-by":"crossref","first-page":"139","DOI":"10.1111\/j.2517-6161.1982.tb01195.x","article-title":"The statistical analysis of compositional data","volume":"44","author":"Aitchison","year":"1982","journal-title":"J R Stat Soc Ser B (Methodol)"},{"key":"2025030422263964900_btaf014-B3","doi-asserted-by":"crossref","first-page":"e1004186","DOI":"10.1371\/journal.pcbi.1004186","article-title":"Explaining diversity in metagenomic datasets by phylogenetic-based feature weighting","volume":"11","author":"Albanese","year":"2015","journal-title":"PLoS Comput Biol"},{"key":"2025030422263964900_btaf014-B4","doi-asserted-by":"crossref","first-page":"1798","DOI":"10.1109\/TPAMI.2013.50","article-title":"Representation learning: a review and new perspectives","volume":"35","author":"Bengio","year":"2013","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"2025030422263964900_btaf014-B5","doi-asserted-by":"crossref","first-page":"399","DOI":"10.1002\/mds.29300","article-title":"Integrated multi-cohort analysis of the Parkinson\u2019s disease gut metagenome","volume":"38","author":"Boktor","year":"2023","journal-title":"Mov Disord"},{"key":"2025030422263964900_btaf014-B6","first-page":"1","author":"Cao","year":"2024"},{"key":"2025030422263964900_btaf014-B7","first-page":"1","article-title":"On mixup regularization","volume":"23","author":"Carratino","year":"2022","journal-title":"J Mach Learn Res"},{"key":"2025030422263964900_btaf014-B8","doi-asserted-by":"crossref","first-page":"321","DOI":"10.1613\/jair.953","article-title":"SMOTE: synthetic minority over-sampling technique","volume":"16","author":"Chawla","year":"2002","journal-title":"JAIR"},{"key":"2025030422263964900_btaf014-B9","first-page":"1597","author":"Chen","year":"2020"},{"key":"2025030422263964900_btaf014-B10","doi-asserted-by":"crossref","first-page":"1246","DOI":"10.1080\/19490976.2020.1747329","article-title":"Altered gut microbial profile is associated with abnormal metabolism activity of autism spectrum disorder","volume":"11","author":"Dan","year":"2020","journal-title":"Gut Microbes"},{"key":"2025030422263964900_btaf014-B11","first-page":"233","author":"Davis","year":"2006"},{"key":"2025030422263964900_btaf014-B12","doi-asserted-by":"crossref","first-page":"363","DOI":"10.5056\/jnm19044","article-title":"Parkinson\u2019s disease: the emerging role of gut dysbiosis, antibiotics, probiotics, and fecal microbiota transplantation","volume":"25","author":"Dutta","year":"2019","journal-title":"J Neurogastroenterol Motil"},{"key":"2025030422263964900_btaf014-B13","doi-asserted-by":"crossref","first-page":"2224","DOI":"10.3389\/fmicb.2017.02224","article-title":"Microbiome datasets are compositional: and this is not optional","volume":"8","author":"Gloor","year":"2017","journal-title":"Front Microbiol"},{"key":"2025030422263964900_btaf014-B14","doi-asserted-by":"crossref","first-page":"796","DOI":"10.1038\/s41592-018-0141-9","article-title":"Qiita: rapid, web-enabled microbiome meta-analysis","volume":"15","author":"Gonzalez","year":"2018","journal-title":"Nat Methods"},{"key":"2025030422263964900_btaf014-B15","doi-asserted-by":"crossref","first-page":"133","DOI":"10.1186\/s40168-022-01331-x","article-title":"Location-specific signatures of Crohn\u2019s disease at a multi-omics scale","volume":"10","author":"Gonzalez","year":"2022","journal-title":"Microbiome"},{"key":"2025030422263964900_btaf014-B16","article-title":"Data augmentation for compositional data: advancing predictive models of the microbiome","author":"Gordon-Rodriguez"},{"key":"2025030422263964900_btaf014-B17","doi-asserted-by":"crossref","first-page":"135","DOI":"10.1186\/s40168-024-01822-z","article-title":"MIDASim: a fast and simple simulator for realistic microbiome data","volume":"12","author":"He","year":"2024","journal-title":"Microbiome"},{"key":"2025030422263964900_btaf014-B18","doi-asserted-by":"crossref","first-page":"276","DOI":"10.1016\/j.chom.2014.08.014","article-title":"The integrative human microbiome project: dynamic analysis of microbiome\u2013host omics profiles during periods of human health and disease","volume":"16","author":"Integrative HMP (iHMP) Research Network Consortium","year":"2014","journal-title":"Cell Host Microbe"},{"key":"2025030422263964900_btaf014-B19","doi-asserted-by":"publisher","author":"Jiang","year":"2023","DOI":"10.1101\/2023.11.04.565596"},{"key":"2025030422263964900_btaf014-B20","doi-asserted-by":"crossref","first-page":"292","DOI":"10.1016\/j.chom.2011.09.003","article-title":"Human-associated microbial signatures: examining their predictive value","volume":"10","author":"Knights","year":"2011","journal-title":"Cell Host Microbe"},{"key":"2025030422263964900_btaf014-B21","doi-asserted-by":"crossref","first-page":"792996","DOI":"10.3389\/fnins.2022.792996","article-title":"Signature of Alzheimer\u2019s disease in intestinal microbiome: results from the AlzBiom study","volume":"16","author":"Laske","year":"2022","journal-title":"Front Neurosci"},{"key":"2025030422263964900_btaf014-B22","doi-asserted-by":"crossref","first-page":"655","DOI":"10.1038\/s41586-019-1237-9","article-title":"Multi-omics of the gut microbial ecosystem in inflammatory bowel diseases","volume":"569","author":"Lloyd-Price","year":"2019","journal-title":"Nature"},{"key":"2025030422263964900_btaf014-B23","doi-asserted-by":"crossref","first-page":"8228","DOI":"10.1128\/AEM.71.12.8228-8235.2005","article-title":"UniFrac: a new phylogenetic method for comparing microbial communities","volume":"71","author":"Lozupone","year":"2005","journal-title":"Appl Environ Microbiol"},{"key":"2025030422263964900_btaf014-B24","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1080\/19490976.2021.1872323","article-title":"Harnessing machine learning for development of microbiome therapeutics","volume":"13","author":"McCoubrey","year":"2021","journal-title":"Gut Microbes"},{"key":"2025030422263964900_btaf014-B25","doi-asserted-by":"crossref","first-page":"98","DOI":"10.1038\/s43705-022-00182-9","article-title":"Machine learning and deep learning applications in microbiome research","volume":"2","author":"Medina","year":"2022","journal-title":"ISME Commun"},{"key":"2025030422263964900_btaf014-B26","doi-asserted-by":"crossref","first-page":"262","DOI":"10.1038\/s41564-021-01050-3","article-title":"Multi-omics analyses of the ulcerative colitis gut microbiome link bacteroides vulgatus proteases with disease severity","volume":"7","author":"Mills","year":"2022","journal-title":"Nat Microbiol"},{"key":"2025030422263964900_btaf014-B27","doi-asserted-by":"crossref","first-page":"100258","DOI":"10.1016\/j.array.2022.100258","article-title":"Data augmentation: a comprehensive survey of modern approaches","volume":"16","author":"Mumuni","year":"2022","journal-title":"Array"},{"key":"2025030422263964900_btaf014-B28","first-page":"2825","article-title":"Scikit-learn: machine learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J Mach Learn Res"},{"key":"2025030422263964900_btaf014-B29","doi-asserted-by":"crossref","first-page":"59","DOI":"10.1038\/nature08821","article-title":"A human gut microbial gene catalogue established by metagenomic sequencing","volume":"464","author":"Qin","year":"2010","journal-title":"Nature"},{"key":"2025030422263964900_btaf014-B30","doi-asserted-by":"crossref","first-page":"55","DOI":"10.1038\/nature11450","article-title":"A metagenome-wide association study of gut microbiota in type 2 diabetes","volume":"490","author":"Qin","year":"2012","journal-title":"Nature"},{"key":"2025030422263964900_btaf014-B31","doi-asserted-by":"crossref","first-page":"2993","DOI":"10.1109\/JBHI.2020.2993761","article-title":"PopPhy-CNN: a phylogenetic tree embedded architecture for convolutional neural networks to predict host phenotype from metagenomic data","volume":"24","author":"Reiman","year":"2020","journal-title":"IEEE J Biomed Health Inform"},{"key":"2025030422263964900_btaf014-B32","doi-asserted-by":"crossref","DOI":"10.1093\/gigascience\/giab005","article-title":"MB-GAN: microbiome simulation via generative adversarial network","volume":"10","author":"Rong","year":"2021","journal-title":"Gigascience"},{"key":"2025030422263964900_btaf014-B33","doi-asserted-by":"crossref","first-page":"i31","DOI":"10.1093\/bioinformatics\/btz394","article-title":"TADA: phylogenetic augmentation of microbiome samples enhances phenotype classification","volume":"35","author":"Sayyari","year":"2019","journal-title":"Bioinformatics"},{"key":"2025030422263964900_btaf014-B34","doi-asserted-by":"crossref","first-page":"859","DOI":"10.1152\/physrev.00045.2009","article-title":"Gut microbiota in health and disease","volume":"90","author":"Sekirov","year":"2010","journal-title":"Physiol Rev"},{"key":"2025030422263964900_btaf014-B35","doi-asserted-by":"crossref","first-page":"4544","DOI":"10.1093\/bioinformatics\/btaa542","article-title":"TaxoNN: ensemble of neural networks on stratified microbiome data for disease prediction","volume":"36","author":"Sharma","year":"2020","journal-title":"Bioinformatics"},{"key":"2025030422263964900_btaf014-B36","doi-asserted-by":"crossref","first-page":"btae161","DOI":"10.1093\/bioinformatics\/btae161","article-title":"phylaGAN: data augmentation through conditional GANs and autoencoders for improving disease prediction accuracy using microbiome data","volume":"40","author":"Sharma","year":"2024","journal-title":"Bioinformatics"},{"key":"2025030422263964900_btaf014-B37","doi-asserted-by":"crossref","first-page":"1027","DOI":"10.1038\/nature05414","article-title":"An obesity-associated gut microbiome with increased capacity for energy harvest","volume":"444","author":"Turnbaugh","year":"2006","journal-title":"Nature"},{"key":"2025030422263964900_btaf014-B38","doi-asserted-by":"crossref","first-page":"13537","DOI":"10.1038\/s41598-017-13601-y","article-title":"Gut microbiome alterations in Alzheimer\u2019s disease","volume":"7","author":"Vogt","year":"2017","journal-title":"Sci Rep"},{"key":"2025030422263964900_btaf014-B39","doi-asserted-by":"crossref","first-page":"652","DOI":"10.1038\/s41564-018-0156-0","article-title":"Methods for phylogenetic analysis of microbiome data","volume":"3","author":"Washburne","year":"2018","journal-title":"Nat Microbiol"},{"key":"2025030422263964900_btaf014-B40","doi-asserted-by":"crossref","first-page":"2742","DOI":"10.1016\/j.csbj.2021.04.054","article-title":"Towards multi-label classification: next step of machine learning for microbiome research","volume":"19","author":"Wu","year":"2021","journal-title":"Comput Struct Biotechnol J"},{"key":"2025030422263964900_btaf014-B41","doi-asserted-by":"crossref","first-page":"1391","DOI":"10.3389\/fmicb.2018.01391","article-title":"Predictive modeling of microbiome data using a phylogeny-regularized generalized linear mixed model","volume":"9","author":"Xiao","year":"2018","journal-title":"Front Microbiol"},{"key":"2025030422263964900_btaf014-B42","doi-asserted-by":"crossref","first-page":"5984","DOI":"10.1109\/TIP.2021.3089942","article-title":"Delving deep into label smoothing","volume":"30","author":"Zhang","year":"2021","journal-title":"IEEE Trans Image Process"},{"key":"2025030422263964900_btaf014-B43","author":"Zhang","year":"2018"},{"key":"2025030422263964900_btaf014-B44","author":"Zhang","year":"2022"},{"key":"2025030422263964900_btaf014-B45","doi-asserted-by":"crossref","first-page":"5477","DOI":"10.1038\/s41467-019-13443-4","article-title":"Phylogenomics of 10,575 genomes reveals evolutionary proximity between domains bacteria and archaea","volume":"10","author":"Zhu","year":"2019","journal-title":"Nat Commun"},{"key":"2025030422263964900_btaf014-B46","doi-asserted-by":"crossref","first-page":"3399","DOI":"10.1038\/s41396-021-01016-7","article-title":"Compositional and genetic alterations in Graves\u2019 disease gut microbiome reveal specific diagnostic biomarkers","volume":"15","author":"Zhu","year":"2021","journal-title":"ISME J"},{"key":"2025030422263964900_btaf014-B47","doi-asserted-by":"crossref","first-page":"e00167\u201322","DOI":"10.1128\/msystems.00167-22","article-title":"Phylogeny-aware analysis of metagenome community ecology based on matched reference genomes while bypassing taxonomy","volume":"7","author":"Zhu","year":"2022","journal-title":"mSystems"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaf014\/61416299\/btaf014.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/2\/btaf014\/61416299\/btaf014.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/2\/btaf014\/61416299\/btaf014.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,4]],"date-time":"2025-03-04T17:27:17Z","timestamp":1741109237000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btaf014\/7952016"}},"subtitle":[],"editor":[{"given":"Russell","family":"Schwartz","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2025,1,12]]},"references-count":47,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2025,2,4]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaf014","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2024.08.26.609661","asserted-by":"object"}]},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2025,2]]},"published":{"date-parts":[[2025,1,12]]},"article-number":"btaf014"}}