{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,5]],"date-time":"2026-06-05T16:23:26Z","timestamp":1780676606995,"version":"3.54.1"},"reference-count":64,"publisher":"Oxford University Press (OUP)","issue":"5","license":[{"start":{"date-parts":[[2022,8,26]],"date-time":"2022-08-26T00:00:00Z","timestamp":1661472000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/501100004410","name":"Scientific and Technological Research Council of Turkey","doi-asserted-by":"publisher","award":["1059B141601395"],"award-info":[{"award-number":["1059B141601395"]}],"id":[{"id":"10.13039\/501100004410","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,9,20]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Statistical and machine learning techniques based on relative abundances have been used to predict health conditions and to identify microbial biomarkers. However, high dimensionality, sparsity and the compositional nature of microbiome data represent statistical challenges. On the other hand, the taxon grouping allows summarizing microbiome abundance with a coarser resolution in a lower dimension, but it presents new challenges when correlating taxa with a disease. In this work, we present a novel approach that groups Operational Taxonomical Units (OTUs) based only on relative abundances as an alternative to taxon grouping. The proposed procedure acknowledges the compositional data making use of principal balances. The identified groups are called Principal Microbial Groups (PMGs). 
The procedure reduces the need for user-defined aggregation of OTUs and offers the possibility of working with coarse groups of OTUs, which are not present in a phylogenetic tree. PMGs can be used for two different goals: (1) as a dimensionality reduction method for compositional data, (2) as an aggregation procedure that provides an alternative to taxon grouping for construction of microbial balances afterward used for disease prediction. We illustrate the procedure with data from a cirrhosis study. PMGs provide a coherent data analysis for the search of biomarkers in human microbiota. The source code and demo data for PMGs are available at: https:\/\/github.com\/asliboyraz\/PMGs.<\/jats:p>","DOI":"10.1093\/bib\/bbac328","type":"journal-article","created":{"date-parts":[[2022,8,25]],"date-time":"2022-08-25T20:21:44Z","timestamp":1661458904000},"source":"Crossref","is-referenced-by-count":3,"title":["Principal microbial groups: compositional alternative to phylogenetic grouping of microbiome data"],"prefix":"10.1093","volume":"23","author":[{"given":"Asl\u0131","family":"Boyraz","sequence":"first","affiliation":[{"name":"Department of Computer Programming , Recep Tayyip Erdo\u011fan University, Arde\u015fen Vocational School, Rize, 53400 , Turkey"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Vera","family":"Pawlowsky-Glahn","sequence":"additional","affiliation":[{"name":"Department of Computer Sciences , Applied Mathematics and Statistics, University of Girona, Campus Montilivi, 17003 Girona , Spain"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Juan Jos\u00e9","family":"Egozcue","sequence":"additional","affiliation":[{"name":"Department of Civil and Environmental Engineering , Universitat Polit\u00e9cnica de Catalunya, Barcelona, 08034 , Spain"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Aybar Can","family":"Acar","sequence":"additional","affiliation":[{"name":"Department of Medical 
Informatics , Middle East Technical University, Ankara Turkey"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"286","published-online":{"date-parts":[[2022,8,26]]},"reference":[{"issue":"6","key":"2022092013220049900_ref1","doi-asserted-by":"crossref","first-page":"1258","DOI":"10.1016\/j.cell.2012.01.035","article-title":"The impact of the gut microbiota on human health: an integrative view","volume":"148","author":"Clemente","year":"2012","journal-title":"Cell"},{"issue":"1","key":"2022092013220049900_ref2","first-page":"e01154","article-title":"Data analysis strategies for microbiome studies in human populations-a systematic review of current practice","volume":"6","author":"Bardenhorst","year":"2021","journal-title":"Msystems"},{"issue":"5","key":"2022092013220049900_ref3","doi-asserted-by":"crossref","first-page":"271","DOI":"10.1038\/nrc.2017.13","article-title":"Microbiota: a key orchestrator of cancer therapy","volume":"17","author":"Roy","year":"2017","journal-title":"Nat Rev Cancer"},{"issue":"8","key":"2022092013220049900_ref4","doi-asserted-by":"crossref","first-page":"508","DOI":"10.1038\/nrmicro.2016.83","article-title":"Metagenome-wide association studies: fine-mining the microbiome","volume":"14","author":"Wang","year":"2016","journal-title":"Nat Rev Microbiol"},{"issue":"3","key":"2022092013220049900_ref5","doi-asserted-by":"crossref","first-page":"e00031","DOI":"10.1128\/mSystems.00031-18","article-title":"American gut: an open platform for citizen science microbiome research","volume":"3","author":"McDonald","year":"2018","journal-title":"Msystems"},{"issue":"6","key":"2022092013220049900_ref6","doi-asserted-by":"crossref","first-page":"760","DOI":"10.1016\/j.gpb.2020.11.001","article-title":"Microphenodb associates metagenomic data with pathogenic microbes, microbial core genes, and human disease phenotypes","volume":"18","author":"Yao","year":"2020","journal-title":"Genomics Proteomics 
Bioinformatics"},{"issue":"2","key":"2022092013220049900_ref7","doi-asserted-by":"crossref","first-page":"562","DOI":"10.1002\/hep.24423","article-title":"Characterization of fecal microbial communities in patients with liver cirrhosis","volume":"54","author":"Chen","year":"2011","journal-title":"Hepatology"},{"issue":"1","key":"2022092013220049900_ref8","first-page":"1","article-title":"Guild-based analysis for understanding gut microbiome in human health and diseases","volume":"13","author":"Guojun","year":"2021","journal-title":"Genome Med"},{"key":"2022092013220049900_ref9","doi-asserted-by":"crossref","DOI":"10.1093\/database\/baaa050","article-title":"maml: an automated machine learning pipeline with a microbiome repository for human disease classification","author":"Yang","year":"2020","journal-title":"Database"},{"issue":"3","key":"2022092013220049900_ref10","doi-asserted-by":"crossref","first-page":"e00434","DOI":"10.1128\/mBio.00434-20","article-title":"A framework for effective application of machine learning to microbiome-based classification problems","volume":"11","author":"Top\u00e7uo\u011flu","year":"2020","journal-title":"MBio"},{"issue":"1","key":"2022092013220049900_ref11","doi-asserted-by":"crossref","DOI":"10.1371\/journal.pone.0084689","article-title":"A taxonomic signature of obesity in the microbiome? 
Getting to the guts of the matter","volume":"9","author":"Finucane","year":"2014","journal-title":"PLoS One"},{"key":"2022092013220049900_ref12","doi-asserted-by":"crossref","first-page":"34","DOI":"10.1016\/j.mib.2018.07.003","article-title":"Low diversity gut microbiota dysbiosis: drivers, functional implications and recovery","volume":"44","author":"Kriss","year":"2018","journal-title":"Curr Opin Microbiol"},{"issue":"3","key":"2022092013220049900_ref13","doi-asserted-by":"crossref","first-page":"595","DOI":"10.1136\/gutjnl-2020-321747","article-title":"Gut microbiome stability and resilience: elucidating the response to perturbations in order to modulate gut health","volume":"70","author":"Fassarella","year":"2021","journal-title":"Gut"},{"key":"2022092013220049900_ref14","first-page":"324","article-title":"Dysbiosis of gut microbiota associated with clinical parameters in polycystic ovary syndrome","volume":"8","author":"Liu","year":"2017","journal-title":"Front Microbiol"},{"issue":"22","key":"2022092013220049900_ref15","doi-asserted-by":"crossref","first-page":"4131","DOI":"10.1016\/j.febslet.2014.02.037","article-title":"The dynamic microbiome","volume":"588","author":"Gerber","year":"2014","journal-title":"FEBS Lett"},{"issue":"1","key":"2022092013220049900_ref16","doi-asserted-by":"crossref","first-page":"14","DOI":"10.3390\/microorganisms7010014","article-title":"What is the healthy gut microbiota composition? 
a changing ecosystem across age, environment, diet, and diseases","volume":"7","author":"Rinninella","year":"2019","journal-title":"Microorganisms"},{"issue":"5","key":"2022092013220049900_ref17","doi-asserted-by":"crossref","first-page":"322","DOI":"10.1016\/j.annepidem.2016.03.003","article-title":"It\u2019s all relative: analyzing microbiome data as compositions","volume":"26","author":"Gloor","year":"2016","journal-title":"Ann Epidemiol"},{"key":"2022092013220049900_ref18","doi-asserted-by":"crossref","first-page":"2224","DOI":"10.3389\/fmicb.2017.02224","article-title":"Microbiome datasets are compositional: and this is not optional","volume":"8","author":"Gloor","year":"2017","journal-title":"Front Microbiol"},{"issue":"4","key":"2022092013220049900_ref19","doi-asserted-by":"crossref","first-page":"e00053","DOI":"10.1128\/mSystems.00053-18","article-title":"Balances: a new perspective for microbiome analysis","volume":"3","author":"Rivera-Pinto","year":"2018","journal-title":"MSystems"},{"issue":"1","key":"2022092013220049900_ref20","doi-asserted-by":"crossref","first-page":"540","DOI":"10.1214\/17-AOAS1102","article-title":"Kernel-penalized regression for analysis of microbiome data","volume":"12","author":"Randolph","year":"2018","journal-title":"Ann Applied Stat"},{"issue":"1","key":"2022092013220049900_ref21","doi-asserted-by":"crossref","first-page":"e00016","DOI":"10.1128\/mSystems.00016-19","article-title":"A novel sparse compositional technique reveals microbial perturbations","volume":"4","author":"Martino","year":"2019","journal-title":"MSystems"},{"key":"2022092013220049900_ref22","doi-asserted-by":"crossref","DOI":"10.7717\/peerj.2969","article-title":"Phylogenetic factorization of compositional data yields lineage-level associations in microbiome datasets","volume":"5","author":"Washburne","year":"2017","journal-title":"Peer J"},{"key":"2022092013220049900_ref23","article-title":"Multisample estimation of bacterial composition matrices in 
metagenomics data","volume-title":"Biometrika"},{"issue":"1","key":"2022092013220049900_ref24","article-title":"Analysis of composition of microbiomes: a novel method for studying microbial composition","volume":"26","author":"Mandal","year":"2015","journal-title":"Microbial Ecol Health Dis"},{"issue":"16","key":"2022092013220049900_ref25","doi-asserted-by":"crossref","first-page":"2870","DOI":"10.1093\/bioinformatics\/bty175","article-title":"Understanding sequencing data as compositions: an outlook and review","volume":"34","author":"Quinn","year":"2018","journal-title":"Bioinformatics"},{"issue":"4","key":"2022092013220049900_ref26","doi-asserted-by":"crossref","DOI":"10.1093\/nargab\/lqaa103","article-title":"Compositional data analysis and related methods applied to genomics","volume":"2","author":"Erb","year":"2020","journal-title":"NAR Genomics Bioinformatics"},{"issue":"8","key":"2022092013220049900_ref27","doi-asserted-by":"crossref","first-page":"692","DOI":"10.1139\/cjm-2015-0821","article-title":"Compositional analysis: a valid approach to analyze microbiome high-throughput sequencing data","volume":"62","author":"Gloor","year":"2016","journal-title":"Can J Microbiol"},{"issue":"2","key":"2022092013220049900_ref28","doi-asserted-by":"crossref","first-page":"159","DOI":"10.1177\/1471082X14535525","article-title":"Sparse principal balances","volume":"15","author":"Mert","year":"2015","journal-title":"Stat Modelling"},{"issue":"3","key":"2022092013220049900_ref29","doi-asserted-by":"crossref","DOI":"10.1371\/journal.pcbi.1004075","article-title":"Proportionality: a valid alternative to correlation for relative data","volume":"11","author":"Lovell","year":"2015","journal-title":"PLoS Comput Biol"},{"issue":"19","key":"2022092013220049900_ref30","doi-asserted-by":"crossref","first-page":"3172","DOI":"10.1093\/bioinformatics\/btv349","article-title":"Cclasso: correlation inference for compositional data through 
lasso","volume":"31","author":"Fang","year":"2015","journal-title":"Bioinformatics"},{"issue":"5","key":"2022092013220049900_ref31","doi-asserted-by":"crossref","DOI":"10.1371\/journal.pcbi.1004226","article-title":"Sparse and compositionally robust inference of microbial ecological networks","volume":"11","author":"Kurtz","year":"2015","journal-title":"PLoS Comput Biol"},{"key":"2022092013220049900_ref32","doi-asserted-by":"crossref","DOI":"10.1093\/bib\/bbab094","article-title":"Disbalance: a platform to automatically build balance-based disease prediction models and discover microbial biomarkers from microbiome data","volume":"22","author":"Yang","year":"2021","journal-title":"Brief Bioinform"},{"key":"2022092013220049900_ref33","doi-asserted-by":"crossref","DOI":"10.1093\/bib\/bbaa436","article-title":"Gutbalance: a server for the human gut microbiome-based disease prediction and biomarker discovery with compositionality addressed","volume":"22","author":"Yang","year":"2021","journal-title":"Brief Bioinform"},{"key":"2022092013220049900_ref34","doi-asserted-by":"crossref","DOI":"10.7554\/eLife.21887","article-title":"A phylogenetic transform enhances analysis of compositional microbiota data","volume":"6","author":"Silverman","year":"2017","journal-title":"Elife"},{"issue":"2","key":"2022092013220049900_ref35","doi-asserted-by":"crossref","first-page":"e00230","DOI":"10.1128\/mSystems.00230-19","article-title":"Interpretable log contrasts for the classification of health biomarkers: a new approach to balance selection","volume":"5","author":"Quinn","year":"2020","journal-title":"Msystems"},{"key":"2022092013220049900_ref36","doi-asserted-by":"crossref","first-page":"73","DOI":"10.1146\/annurev-statistics-010814-020351","article-title":"Microbiome, metagenomics, and high-dimensional compositional data analysis","volume":"2","author":"Li","year":"2015","journal-title":"Annu Rev Stat 
Appl"},{"issue":"9","key":"2022092013220049900_ref37","doi-asserted-by":"crossref","DOI":"10.1093\/gigascience\/giz107","article-title":"A field guide for the compositional analysis of any-omics data","volume":"8","author":"Quinn","year":"2019","journal-title":"GigaScience"},{"key":"2022092013220049900_ref38","doi-asserted-by":"crossref","first-page":"313","DOI":"10.3389\/fmicb.2021.634511","article-title":"Applications of machine learning in human microbiome studies: a review on feature selection, biomarker identification, disease prediction and treatment","volume":"12","author":"Marcos-Zambrano","year":"2021","journal-title":"Front Microbiol"},{"issue":"5","key":"2022092013220049900_ref39","doi-asserted-by":"crossref","first-page":"384","DOI":"10.1007\/s004770100077","article-title":"Geometric approach to statistical analysis on the simplex","volume":"15","author":"Pawlowsky-Glahn","year":"2001","journal-title":"Stoch Environ Res Risk Assess"},{"issue":"2","key":"2022092013220049900_ref40","doi-asserted-by":"crossref","DOI":"10.1093\/nargab\/lqaa029","article-title":"Variable selection in microbiome compositional data analysis","volume":"2","author":"Susin","year":"2020","journal-title":"NAR Genomics Bioinformatics"},{"key":"2022092013220049900_ref41","doi-asserted-by":"crossref","first-page":"277","DOI":"10.3389\/fmicb.2021.635781","article-title":"Statistical and machine learning techniques in human microbiome studies: contemporary challenges and solutions","volume":"12","author":"Moreno-Indias","year":"2021","journal-title":"Front Microbiol"},{"issue":"3","key":"2022092013220049900_ref42","doi-asserted-by":"crossref","first-page":"279","DOI":"10.1023\/A:1023818214614","article-title":"Isometric logratio transformations for compositional data analysis","volume":"35","author":"Egozcue","year":"2003","journal-title":"Math 
Geol"},{"issue":"1","key":"2022092013220049900_ref43","doi-asserted-by":"crossref","first-page":"57","DOI":"10.1093\/biomet\/70.1.57","article-title":"Principal component analysis of compositional data","volume":"70","author":"Aitchison","year":"1983","journal-title":"Biometrika"},{"key":"2022092013220049900_ref44","first-page":"31","volume-title":"Compositional Data Analysis: Theory and Applications","author":"Mateu-Figueras","year":"2011"},{"key":"2022092013220049900_ref45","doi-asserted-by":"crossref","DOI":"10.1002\/9781119003144","volume-title":"Modeling and Analysis of Compositional Data","author":"Pawlowsky-Glahn","year":"2015"},{"issue":"7","key":"2022092013220049900_ref46","doi-asserted-by":"crossref","first-page":"795","DOI":"10.1007\/s11004-005-7381-9","article-title":"Groups of parts and their balances in compositional data analysis","volume":"37","author":"Egozcue","year":"2005","journal-title":"Math Geol"},{"issue":"3","key":"2022092013220049900_ref47","doi-asserted-by":"crossref","first-page":"273","DOI":"10.1007\/s11004-017-9712-z","article-title":"Advances in principal balances for compositional data","volume":"50","author":"Mart\u00edn-Fern\u00e1ndez","year":"2018","journal-title":"Math Geosci"},{"issue":"3","key":"2022092013220049900_ref48","doi-asserted-by":"crossref","first-page":"599","DOI":"10.1007\/s11749-019-00670-6","article-title":"Compositional data: the sample space and its structure","volume":"28","author":"Egozcue","year":"2019","journal-title":"TEST"},{"issue":"8","key":"2022092013220049900_ref49","article-title":"Mixmc: a multivariate statistical framework to gain insight into microbial communities","volume":"11","author":"Cao","year":"2016","journal-title":"PLoS One"},{"issue":"1","key":"2022092013220049900_ref50","doi-asserted-by":"crossref","first-page":"e00162","DOI":"10.1128\/mSystems.00162-16","article-title":"Balance trees reveal microbial niche 
differentiation","volume":"2","author":"Morton","year":"2017","journal-title":"MSystems"},{"key":"2022092013220049900_ref51","doi-asserted-by":"crossref","DOI":"10.1093\/bioinformatics\/btab645","article-title":"Learning sparse log-ratios for high-throughput sequencing data","author":"Gordon-Rodriguez","year":"2021"},{"key":"2022092013220049900_ref52","doi-asserted-by":"crossref","DOI":"10.1016\/j.acags.2019.100017","article-title":"Amalgamations are valid in compositional data analysis, can be used in agglomerative clustering, and their logratios have an inverse transformation","volume":"5","author":"Greenacre","year":"2020","journal-title":"Appl Comput Geosci"},{"issue":"4","key":"2022092013220049900_ref53","doi-asserted-by":"crossref","DOI":"10.1093\/nargab\/lqaa076","article-title":"Amalgams: data-driven amalgamation for the dimensionality reduction of compositional data","volume":"2","author":"Quinn","year":"2020","journal-title":"NAR Genomics Bioinformatics"},{"issue":"1","key":"2022092013220049900_ref54","doi-asserted-by":"crossref","first-page":"3","DOI":"10.17713\/ajs.v47i1.689","article-title":"Linear association in compositional data analysis","volume":"47","author":"Egozcue","year":"2018","journal-title":"Aus J Stat"},{"key":"2022092013220049900_ref55","article-title":"Principal balances","volume-title":"Proceedings of the 4th International Workshop on CODA(2011)","author":"Pawlowsky-Glahn","year":"2011"},{"issue":"7516","key":"2022092013220049900_ref56","doi-asserted-by":"crossref","first-page":"59","DOI":"10.1038\/nature13568","article-title":"Alterations of the human gut microbiome in liver cirrhosis","volume":"513","author":"Qin","year":"2014","journal-title":"Nature"},{"issue":"7","key":"2022092013220049900_ref57","doi-asserted-by":"crossref","DOI":"10.1371\/journal.pcbi.1004977","article-title":"Machine learning meta-analysis of large metagenomic datasets: tools and biological 
insights","volume":"12","author":"Pasolli","year":"2016","journal-title":"PLoS Comput Biol"},{"issue":"5","key":"2022092013220049900_ref58","doi-asserted-by":"crossref","DOI":"10.1093\/gigascience\/giz042","article-title":"Microbiome learning repo (ml repo): A public repository of microbiome regression and classification tasks","volume":"8","author":"Vangay","year":"2019","journal-title":"Gigascience"},{"key":"2022092013220049900_ref59","doi-asserted-by":"crossref","first-page":"85","DOI":"10.1016\/j.chemolab.2015.02.019","article-title":"zcompositions-r package for multivariate imputation of left-censored data under a compositional approach","volume":"143","author":"Palarea-Albaladejo","year":"2015","journal-title":"Chemom Intel Lab Syst"},{"issue":"5","key":"2022092013220049900_ref60","article-title":"The caret package","volume":"28","author":"Kuhn","year":"2009","journal-title":"J Stat Softw"},{"issue":"5","key":"2022092013220049900_ref61","doi-asserted-by":"crossref","first-page":"940","DOI":"10.1016\/j.jhep.2013.12.019","article-title":"Altered profile of human gut microbiome is associated with cirrhosis and its complications","volume":"60","author":"Bajaj","year":"2014","journal-title":"J Hepatol"},{"issue":"1 & 2","key":"2022092013220049900_ref62","first-page":"103","article-title":"Exploring compositional data with the coda-dendrogram","volume":"40","author":"Pawlowsky-Glahn","year":"2011","journal-title":"Aus J Stat"},{"key":"2022092013220049900_ref63","doi-asserted-by":"crossref","first-page":"258","DOI":"10.1007\/978-3-642-36809-7","volume-title":"Analyzing Compositional Data with R","author":"Boogaart","year":"2013"},{"issue":"12","key":"2022092013220049900_ref64","doi-asserted-by":"crossref","first-page":"1682","DOI":"10.1016\/j.cageo.2007.06.011","article-title":"Balance-dendrogram. 
a new routine of codapack","volume":"34","author":"Thi\u00f3-Henestrosa","year":"2008","journal-title":"Comput Geosci"}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/23\/5\/bbac328\/45937185\/bbac328.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/23\/5\/bbac328\/45937185\/bbac328.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,26]],"date-time":"2023-11-26T04:22:50Z","timestamp":1700972570000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbac328\/6675749"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,8,26]]},"references-count":64,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2022,9,20]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbac328","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2022,9]]},"published":{"date-parts":[[2022,8,26]]},"article-number":"bbac328"}}