{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,20]],"date-time":"2026-02-20T19:19:18Z","timestamp":1771615158871,"version":"3.50.1"},"reference-count":44,"publisher":"Oxford University Press (OUP)","issue":"2","license":[{"start":{"date-parts":[[2025,1,28]],"date-time":"2025-01-28T00:00:00Z","timestamp":1738022400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100008028","name":"University of Wisconsin School of Medicine and Public Health","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100008028","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,2,4]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Clustering patients into subgroups based on their microbial compositions can greatly enhance our understanding of the role of microbes in human health and disease etiology. Distance-based clustering methods, such as partitioning around medoids (PAM), are popular due to their computational efficiency and absence of distributional assumptions. However, the performance of these methods can be suboptimal when true cluster memberships are driven by differences in the abundance of only a few microbes, a situation known as the sparse signal scenario.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>We demonstrate that classical multidimensional scaling (MDS), a widely used dimensionality reduction technique, effectively denoises microbiome data and enhances the clustering performance of distance-based methods. We propose a two-step procedure that first applies MDS to project high-dimensional microbiome data into a low-dimensional space, followed by distance-based clustering using the low-dimensional data. Our extensive simulations demonstrate that our procedure offers superior performance compared to directly conducting distance-based clustering under the sparse signal scenario. The advantage of our procedure is further showcased in several real data applications.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>The R package MDSMClust is available at https:\/\/github.com\/wxy929\/MDS-project.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaf042","type":"journal-article","created":{"date-parts":[[2025,1,28]],"date-time":"2025-01-28T20:01:37Z","timestamp":1738094497000},"source":"Crossref","is-referenced-by-count":6,"title":["Multidimensional scaling improves distance-based clustering for microbiome data"],"prefix":"10.1093","volume":"41","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9314-2037","authenticated-orcid":false,"given":"Guanhua","family":"Chen","sequence":"first","affiliation":[{"name":"Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison , Madison, WI 53726,","place":["United States"]}]},{"given":"Xinyue","family":"Wang","sequence":"additional","affiliation":[{"name":"Department of Statistics, Pennsylvania State University , University Park, PA 16802,","place":["United States"]}]},{"given":"Qiang","family":"Sun","sequence":"additional","affiliation":[{"name":"Department of Statistical Science, University of Toronto , Toronto, ON M5S 3G3,","place":["Canada"]}]},{"given":"Zheng-Zheng","family":"Tang","sequence":"additional","affiliation":[{"name":"Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison , Madison, WI 53726,","place":["United States"]}]}],"member":"286","published-online":{"date-parts":[[2025,1,28]]},"reference":[{"key":"2025030422261222500_btaf042-B1","doi-asserted-by":"crossref","first-page":"1203","DOI":"10.3982\/ECTA8968","article-title":"Eigenvalue ratio test for the number of factors","volume":"81","author":"Ahn","year":"2013","journal-title":"Econometrica"},{"key":"2025030422261222500_btaf042-B2","doi-asserted-by":"crossref","first-page":"821861","DOI":"10.3389\/fbinf.2022.821861","article-title":"Applications and comparison of dimensionality reduction methods for microbiome data","volume":"2","author":"Armstrong","year":"2022","journal-title":"Front Bioinform"},{"key":"2025030422261222500_btaf042-B3","doi-asserted-by":"crossref","first-page":"174","DOI":"10.1038\/nature09944","article-title":"Enterotypes of the human gut microbiome","volume":"473","author":"Arumugam","year":"2011","journal-title":"Nature"},{"key":"2025030422261222500_btaf042-B4"},{"key":"2025030422261222500_btaf042-B5"},{"key":"2025030422261222500_btaf042-B6","doi-asserted-by":"crossref","first-page":"852","DOI":"10.1038\/s41587-019-0209-9","article-title":"Reproducible, interactive, scalable and extensible microbiome data science using qiime 2","volume":"37","author":"Bolyen","year":"2019","journal-title":"Nat Biotechnol"},{"key":"2025030422261222500_btaf042-B7","volume-title":"Modern Multidimensional Scaling: Theory and Applications","author":"Borg","year":"2005"},{"key":"2025030422261222500_btaf042-B8","doi-asserted-by":"crossref","first-page":"325","DOI":"10.2307\/1942268","article-title":"An ordination of the upland forest communities of Southern Wisconsin","volume":"27","author":"Bray","year":"1957","journal-title":"Ecol Monogr"},{"key":"2025030422261222500_btaf042-B9","doi-asserted-by":"crossref","first-page":"581","DOI":"10.1038\/nmeth.3869","article-title":"Dada2: high-resolution sample inference from illumina amplicon data","volume":"13","author":"Callahan","year":"2016","journal-title":"Nat Methods"},{"key":"2025030422261222500_btaf042-B10","doi-asserted-by":"crossref","first-page":"e15216","DOI":"10.1371\/journal.pone.0015216","article-title":"Disordered microbial communities in the upper respiratory tract of cigarette smokers","volume":"5","author":"Charlson","year":"2010","journal-title":"PLoS One"},{"key":"2025030422261222500_btaf042-B11","doi-asserted-by":"crossref","first-page":"2106","DOI":"10.1093\/bioinformatics\/bts342","article-title":"Associating microbiome composition with environmental covariates using generalized unifrac distances","volume":"28","author":"Chen","year":"2012","journal-title":"Bioinformatics"},{"key":"2025030422261222500_btaf042-B12","doi-asserted-by":"crossref","first-page":"8","DOI":"10.1038\/s41564-017-0072-8","article-title":"Enterotypes in the landscape of gut microbial community composition","volume":"3","author":"Costea","year":"2018","journal-title":"Nat Microbiol"},{"key":"2025030422261222500_btaf042-B13","doi-asserted-by":"crossref","first-page":"357","DOI":"10.1038\/nature13178","article-title":"Dynamics and associations of microbial community types across the human body","volume":"509","author":"Ding","year":"2014","journal-title":"Nature"},{"key":"2025030422261222500_btaf042-B14","doi-asserted-by":"crossref","first-page":"1","DOI":"10.18637\/jss.v022.i04","article-title":"The ade4 package: implementing the duality diagram for ecologists","volume":"22","author":"Dray","year":"2007","journal-title":"J Stat Soft"},{"key":"2025030422261222500_btaf042-B15","doi-asserted-by":"crossref","first-page":"2224","DOI":"10.3389\/fmicb.2017.02224","article-title":"Microbiome datasets are compositional: and this is not optional","volume":"8","author":"Gloor","year":"2017","journal-title":"Front Microbiol"},{"key":"2025030422261222500_btaf042-B16","doi-asserted-by":"crossref","first-page":"e30126","DOI":"10.1371\/journal.pone.0030126","article-title":"Dirichlet multinomial mixtures: generative models for microbial metagenomics","volume":"7","author":"Holmes","year":"2012","journal-title":"PLoS One"},{"key":"2025030422261222500_btaf042-B17","doi-asserted-by":"crossref","first-page":"193","DOI":"10.1007\/BF01908075","article-title":"Comparing partitions","volume":"2","author":"Hubert","year":"1985","journal-title":"J Classif"},{"key":"2025030422261222500_btaf042-B18","first-page":"207\u221214","article-title":"Structure, function and diversity of the healthy human microbiome","volume":"486","author":"Huttenhower","year":"2012","journal-title":"Nature"},{"key":"2025030422261222500_btaf042-B19","doi-asserted-by":"crossref","first-page":"68","DOI":"10.1002\/9780470316801.ch2","article-title":"Partitioning around medoids (program pam)","volume":"344","author":"Kaufman","year":"1990","journal-title":"Finding Groups Data"},{"key":"2025030422261222500_btaf042-B20","doi-asserted-by":"crossref","first-page":"e1002863","DOI":"10.1371\/journal.pcbi.1002863","article-title":"A guide to enterotypes across the human body: meta-analysis of microbial community structures in human microbiome datasets","volume":"9","author":"Koren","year":"2013","journal-title":"PLoS Comput Biol"},{"key":"2025030422261222500_btaf042-B21","doi-asserted-by":"crossref","first-page":"73","DOI":"10.1146\/annurev-statistics-010814-020351","article-title":"Microbiome, metagenomics, and high-dimensional compositional data analysis","volume":"2","author":"Li","year":"2015","journal-title":"Annu Rev Stat Appl"},{"key":"2025030422261222500_btaf042-B22","doi-asserted-by":"crossref","first-page":"129","DOI":"10.1109\/TIT.1982.1056489","article-title":"Least squares quantization in PCM","volume":"28","author":"Lloyd","year":"1982","journal-title":"IEEE Trans Inform Theory"},{"key":"2025030422261222500_btaf042-B23","doi-asserted-by":"crossref","first-page":"8228","DOI":"10.1128\/AEM.71.12.8228-8235.2005","article-title":"Unifrac: a new phylogenetic method for comparing microbial communities","volume":"71","author":"Lozupone","year":"2005","journal-title":"Appl Environ Microbiol"},{"key":"2025030422261222500_btaf042-B24","doi-asserted-by":"crossref","first-page":"1576","DOI":"10.1128\/AEM.01996-06","article-title":"Quantitative and qualitative \u03b2 diversity measures lead to different insights into factors that structure microbial communities","volume":"73","author":"Lozupone","year":"2007","journal-title":"Appl Environ Microbiol"},{"key":"2025030422261222500_btaf042-B25","doi-asserted-by":"crossref","first-page":"527","DOI":"10.1016\/j.celrep.2015.03.049","article-title":"The gut microbiota of rural Papua New Guineans: composition, diversity patterns, and ecological processes","volume":"11","author":"Mart\u00ednez","year":"2015","journal-title":"Cell Rep"},{"key":"2025030422261222500_btaf042-B26","doi-asserted-by":"crossref","first-page":"10","DOI":"10.1128\/msystems.00016-19","article-title":"A novel sparse compositional technique reveals microbial perturbations","volume":"4","author":"Martino","year":"2019","journal-title":"MSystems"},{"key":"2025030422261222500_btaf042-B27"},{"key":"2025030422261222500_btaf042-B28","doi-asserted-by":"crossref","first-page":"e61217","DOI":"10.1371\/journal.pone.0061217","article-title":"phyloseq: an R package for reproducible interactive analysis and graphics of microbiome census data","volume":"8","author":"McMurdie","year":"2013","journal-title":"PLoS One"},{"key":"2025030422261222500_btaf042-B29","doi-asserted-by":"crossref","first-page":"721","DOI":"10.2307\/2528553","article-title":"A generalised logit-normal distribution","volume":"21","author":"Mead","year":"1965","journal-title":"Biometrics"},{"key":"2025030422261222500_btaf042-B30","author":"Oksanen"},{"key":"2025030422261222500_btaf042-B31","doi-asserted-by":"crossref","first-page":"53","DOI":"10.1016\/0377-0427(87)90125-7","article-title":"Silhouettes: a graphical aid to the interpretation and validation of cluster analysis","volume":"20","author":"Rousseeuw","year":"1987","journal-title":"J Comput Appl Math"},{"key":"2025030422261222500_btaf042-B32","doi-asserted-by":"crossref","first-page":"1023","DOI":"10.1093\/aje\/kwz006","article-title":"Hmp16sdata: efficient access to the human microbiome project through bioconductor","volume":"188","author":"Schiffer","year":"2019","journal-title":"Am J Epidemiol"},{"key":"2025030422261222500_btaf042-B33","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1186\/s40168-021-01199-3","article-title":"Performance determinants of unsupervised clustering methods for microbiome data","volume":"10","author":"Shi","year":"2022","journal-title":"Microbiome"},{"key":"2025030422261222500_btaf042-B34","doi-asserted-by":"crossref","first-page":"e21887","DOI":"10.7554\/eLife.21887","article-title":"A phylogenetic transform enhances analysis of compositional microbiota data","volume":"6","author":"Silverman","year":"2017","journal-title":"Elife"},{"key":"2025030422261222500_btaf042-B35","doi-asserted-by":"crossref","first-page":"802","DOI":"10.1126\/science.aan4834","article-title":"Seasonal cycling in the gut microbiome of the hadza hunter-gatherers of tanzania","volume":"357","author":"Smits","year":"2017","journal-title":"Science"},{"key":"2025030422261222500_btaf042-B36","doi-asserted-by":"crossref","first-page":"2618","DOI":"10.1093\/bioinformatics\/btw311","article-title":"PERMANOVA-S: association test for microbial community composition that accommodates confounders and multiple distances","volume":"32","author":"Tang","year":"2016","journal-title":"Bioinformatics"},{"key":"2025030422261222500_btaf042-B37","doi-asserted-by":"crossref","first-page":"511","DOI":"10.1198\/106186005X59243","article-title":"Cluster validation by prediction strength","volume":"14","author":"Tibshirani","year":"2005","journal-title":"J Comput Graph Stat"},{"key":"2025030422261222500_btaf042-B38","doi-asserted-by":"crossref","first-page":"411","DOI":"10.1111\/1467-9868.00293","article-title":"Estimating the number of clusters in a data set via the gap statistic","volume":"63","author":"Tibshirani","year":"2001","journal-title":"J R Stat Soc: Ser B (Stat Methodol)"},{"key":"2025030422261222500_btaf042-B39","doi-asserted-by":"crossref","first-page":"200","DOI":"10.1016\/j.tpb.2010.07.002","article-title":"Overdispersion in allelic counts and theta-correction in forensic genetics","volume":"78","author":"Tvedebrink","year":"2010","journal-title":"Theor Popul Biol"},{"key":"2025030422261222500_btaf042-B40","first-page":"2579","article-title":"Visualizing data using t-SNE","volume":"9","author":"Van der Maaten","year":"2008","journal-title":"J Mach Learn Res"},{"key":"2025030422261222500_btaf042-B41","doi-asserted-by":"crossref","first-page":"11","DOI":"10.1023\/A:1012485807823","article-title":"On a connection between kernel PCA and metric multidimensional scaling","volume":"46","author":"Williams","year":"2002","journal-title":"Mach Learn"},{"key":"2025030422261222500_btaf042-B42","doi-asserted-by":"crossref","first-page":"56","DOI":"10.1186\/s13073-016-0302-3","article-title":"An adaptive association test for microbiome data","volume":"8","author":"Wu","year":"2016","journal-title":"Genome Med"},{"key":"2025030422261222500_btaf042-B43","doi-asserted-by":"crossref","first-page":"105","DOI":"10.1126\/science.1208344","article-title":"Linking long-term dietary patterns with gut microbial enterotypes","volume":"334","author":"Wu","year":"2011","journal-title":"Science"},{"key":"2025030422261222500_btaf042-B44","doi-asserted-by":"crossref","first-page":"1612","DOI":"10.3390\/microorganisms8101612","article-title":"Clustering on human microbiome sequencing data: a distance-based unsupervised learning model","volume":"8","author":"Yang","year":"2020","journal-title":"Microorganisms"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaf042\/61664392\/btaf042.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/2\/btaf042\/61664392\/btaf042.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/2\/btaf042\/61664392\/btaf042.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,4]],"date-time":"2025-03-04T22:26:33Z","timestamp":1741127193000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btaf042\/7985707"}},"subtitle":[],"editor":[{"given":"Inanc","family":"Birol","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2025,1,28]]},"references-count":44,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2025,2,4]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaf042","relation":{},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2025,2]]},"published":{"date-parts":[[2025,1,28]]},"article-number":"btaf042"}}