{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,12]],"date-time":"2026-04-12T20:44:06Z","timestamp":1776026646452,"version":"3.50.1"},"reference-count":37,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2023,8,10]],"date-time":"2023-08-10T00:00:00Z","timestamp":1691625600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Bioinform."],"abstract":"<jats:p>\n                    Conventional dimensionality reduction methods like Multidimensional Scaling (MDS) are sensitive to the presence of orthogonal outliers, leading to significant defects in the embedding. We introduce a robust MDS method, called\n                    <jats:italic>DeCOr-MDS<\/jats:italic>\n                    (Detection and Correction of Orthogonal outliers using MDS), based on the geometry and statistics of simplices formed by data points, that allows to detect orthogonal outliers and subsequently reduce dimensionality. We validate our methods using synthetic datasets, and further show how it can be applied to a variety of large real biological datasets, including cancer image cell data, human microbiome project data and single cell RNA sequencing data, to address the task of data cleaning and visualization.\n                  <\/jats:p>","DOI":"10.3389\/fbinf.2023.1211819","type":"journal-article","created":{"date-parts":[[2023,8,10]],"date-time":"2023-08-10T07:53:27Z","timestamp":1691654007000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Orthogonal outlier detection and dimension estimation for improved MDS embedding of biological datasets"],"prefix":"10.3389","volume":"3","author":[{"given":"Wanxin","family":"Li","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jules","family":"Mirone","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ashok","family":"Prasad","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Nina","family":"Miolane","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Carine","family":"Legrand","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Khanh","family":"Dao Duc","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1965","published-online":{"date-parts":[[2023,8,10]]},"reference":[{"key":"B1","unstructured":"From graph centrality to data depth\n            AamariE.\n            Arias-CastroE.\n            BerenfeldC.\n          2021"},{"key":"B2","doi-asserted-by":"publisher","first-page":"e0217346","DOI":"10.1371\/journal.pone.0217346","article-title":"Tismorph: A tool to quantify texture, irregularity and spreading of single cells","volume":"14","author":"Alizadeh","year":"2019","journal-title":"PLoS One"},{"key":"B3","doi-asserted-by":"publisher","first-page":"7189","DOI":"10.1038\/s41467-021-27394-2","article-title":"Fine-scale population structure and demographic history of british pakistanis","volume":"12","author":"Arciero","year":"2021","journal-title":"Nat. Commun."},{"key":"B4","doi-asserted-by":"publisher","first-page":"451","DOI":"10.1214\/09-sts307","article-title":"Population structure and cryptic relatedness in genetic association studies","volume":"24","author":"Astle","year":"2009","journal-title":"Stat. Sci."},{"key":"B5","doi-asserted-by":"publisher","first-page":"231","DOI":"10.1002\/gepi.21789","article-title":"Accounting for population stratification in dna methylation studies","volume":"38","author":"Barfield","year":"2014","journal-title":"Genet. Epidemiol."},{"key":"B6","doi-asserted-by":"publisher","first-page":"2273","DOI":"10.1109\/tpami.2018.2851513","article-title":"Outlier detection for robust multi dimensional scaling","volume":"41","author":"Blouvshtein","year":"2019","journal-title":"IEEE Trans. Pattern Analysis Mach. Intell."},{"key":"B7","doi-asserted-by":"publisher","first-page":"e2006842","DOI":"10.1371\/journal.pbio.2006842","article-title":"Gut microbiota diversity across ethnicities in the United States","volume":"16","author":"Brooks","year":"2018","journal-title":"PLoS Biol."},{"key":"B8","first-page":"1","article-title":"Metric multidimensional scaling for large single-cell data sets using neural networks","author":"Canzar","year":"2021","journal-title":"bioRxiv"},{"key":"B9","first-page":"169","article-title":"Robust euclidean embedding","author":"Cayton","year":""},{"key":"B10","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s12864-017-4008-8","article-title":"Genetic influences on the human oral microbiome","volume":"18","author":"Demmitt","year":"2017","journal-title":"BMC Genomics"},{"key":"B11","doi-asserted-by":"publisher","first-page":"3181","DOI":"10.1038\/s41396-021-00993-z","article-title":"Genome wide association study reveals plant loci controlling heritability of the rhizosphere microbiome","volume":"15","author":"Deng","year":"2021","journal-title":"ISME J."},{"key":"B12","doi-asserted-by":"publisher","first-page":"4118","DOI":"10.1109\/tsp.2012.2197617","article-title":"Sparsity-exploiting robust multidimensional scaling","volume":"60","author":"Forero","year":"2012","journal-title":"IEEE Trans. Signal Process."},{"key":"B13","doi-asserted-by":"publisher","first-page":"1608","DOI":"10.1016\/j.neucom.2005.05.015","article-title":"From outliers to prototypes: Ordering data","volume":"69","author":"Harmeling","year":"2005","journal-title":"Neurocomputing"},{"key":"B14","article-title":"Exploring and controlling for underlying structure in genome and microbiome case-control association studies","author":"Legrand","year":"2017"},{"key":"B15","doi-asserted-by":"publisher","first-page":"405","DOI":"10.1214\/aos\/1176347507","article-title":"On a notion of data depth based on random simplices","volume":"18","author":"Liu","year":"1990","journal-title":"Ann. Statistics"},{"key":"B16","doi-asserted-by":"publisher","first-page":"e8746","DOI":"10.15252\/msb.20188746","article-title":"Current best practices in single-cell rna-seq analysis: A tutorial","volume":"15","author":"Luecken","year":"2019","journal-title":"Mol. Syst. Biol."},{"key":"B17","doi-asserted-by":"publisher","first-page":"919","DOI":"10.1109\/tsp.2016.2625265","article-title":"Robust multidimensional scaling using a maximum correntropy criterion","volume":"65","author":"Mandanas","year":"2017","journal-title":"IEEE Trans. Signal Process."},{"key":"B18","doi-asserted-by":"publisher","first-page":"1179","DOI":"10.1093\/bioinformatics\/btw777","article-title":"Scater: Pre-processing, quality control, normalization and visualization of single-cell rna-seq data in r","volume":"33","author":"McCarthy","year":"2017","journal-title":"Bioinformatics"},{"key":"B19","unstructured":"Umap: Uniform manifold approximation and projection for dimension reduction\n            McInnesL.\n            HealyJ.\n            MelvilleJ.\n          2018"},{"key":"B20","doi-asserted-by":"publisher","first-page":"1115","DOI":"10.1017\/apr.2021.14","article-title":"Sparse regular variation","volume":"53","author":"Meyer","year":"2021","journal-title":"Adv. Appl. Probab."},{"key":"B21","unstructured":"Iclr 2021 challenge for computational geometry & topology: Design and results\n            MiolaneN.\n            CaorsiM.\n            LupoU.\n            GuerardM.\n            GuiguiN.\n            MatheJ.\n          2021"},{"key":"B22","first-page":"1","article-title":"Geomstats: A python package for riemannian geometry in machine learning","volume":"21","author":"Miolane","year":"2020","journal-title":"J. Mach. Learn. Res."},{"key":"B24","first-page":"249","article-title":"Biological data outlier detection based on kullback-leibler divergence","author":"Oh","year":""},{"key":"B25","doi-asserted-by":"publisher","first-page":"88","DOI":"10.2307\/2684253","article-title":"The three sigma rule","volume":"48","author":"Pukelsheim","year":"1994","journal-title":"Am. Statistician"},{"key":"B26","doi-asserted-by":"publisher","first-page":"148","DOI":"10.1080\/01621459.1960.10482056","article-title":"Variance of the median of small samples from several special populations","volume":"55","author":"Rider","year":"1960","journal-title":"J. Am. Stat. Assoc."},{"key":"B27","doi-asserted-by":"publisher","first-page":"1569","DOI":"10.1038\/s41467-020-15194-z","article-title":"Dimensionality reduction reveals fine-scale structure in the Japanese population with consequences for polygenic risk prediction","volume":"11","author":"Sakaue","year":"2020","journal-title":"Nat. Commun."},{"key":"B28","doi-asserted-by":"publisher","first-page":"giz087","DOI":"10.1093\/gigascience\/giz087","article-title":"ascend: R package for analysis of single-cell rna-seq data","volume":"8","author":"Senabouth","year":"2019","journal-title":"GigaScience"},{"key":"B29","doi-asserted-by":"publisher","first-page":"1","DOI":"10.2202\/1544-6115.1426","article-title":"Detecting outlier samples in microarray data","volume":"8","author":"Shieh","year":"2009","journal-title":"Stat. Appl. Genet. Mol. Biol."},{"key":"B30","volume-title":"An introduction to the geometry of n dimensions","author":"Sommerville","year":"1929"},{"key":"B31","doi-asserted-by":"publisher","first-page":"631","DOI":"10.1109\/tkde.2007.1009","article-title":"Conditional anomaly detection","volume":"19","author":"Song","year":"2007","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"B32","doi-asserted-by":"publisher","first-page":"501","DOI":"10.1007\/bf02294632","article-title":"Robust multidimensional scaling","volume":"54","author":"Spence","year":"1989","journal-title":"Psychometrika"},{"key":"B33","doi-asserted-by":"publisher","first-page":"1415","DOI":"10.1109\/tpami.2010.184","article-title":"Shape analysis of elastic curves in euclidean spaces","volume":"33","author":"Srivastava","year":"2010","journal-title":"IEEE Trans. Pattern Analysis Mach. Intell."},{"key":"B34","doi-asserted-by":"publisher","first-page":"687","DOI":"10.1093\/biostatistics\/kxz060","article-title":"Sufficient dimension reduction for compositional data","volume":"22","author":"Tomassi","year":"2021","journal-title":"Biostatistics"},{"key":"B35","doi-asserted-by":"publisher","first-page":"e2117537119","DOI":"10.1073\/pnas.2117537119","article-title":"The gut microbiome influences host diet selection behavior","volume":"119","author":"Trevelline","year":"2022","journal-title":"Proc. Natl. Acad. Sci."},{"key":"B36","doi-asserted-by":"publisher","first-page":"804","DOI":"10.1038\/nature06244","article-title":"The human microbiome project","volume":"449","author":"Turnbaugh","year":"2007","journal-title":"Nature"},{"key":"B37","doi-asserted-by":"publisher","first-page":"15","DOI":"10.1186\/s13059-017-1382-0","article-title":"Scanpy: Large-scale single-cell gene expression data analysis","volume":"19","author":"Wolf","year":"2018","journal-title":"Genome Biol."},{"key":"B38","doi-asserted-by":"publisher","first-page":"82","DOI":"10.1186\/s40168-022-01271-6","article-title":"Gut microbiome mediates the protective effects of exercise after myocardial infarction","volume":"10","author":"Zhou","year":"2022","journal-title":"Microbiome"}],"container-title":["Frontiers in Bioinformatics"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fbinf.2023.1211819\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,8,10]],"date-time":"2023-08-10T07:53:43Z","timestamp":1691654023000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fbinf.2023.1211819\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,8,10]]},"references-count":37,"alternative-id":["10.3389\/fbinf.2023.1211819"],"URL":"https:\/\/doi.org\/10.3389\/fbinf.2023.1211819","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2023.02.13.528380","asserted-by":"object"}]},"ISSN":["2673-7647"],"issn-type":[{"value":"2673-7647","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,8,10]]},"article-number":"1211819"}}