{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,15]],"date-time":"2026-04-15T18:24:32Z","timestamp":1776277472695,"version":"3.50.1"},"reference-count":41,"publisher":"Public Library of Science (PLoS)","issue":"7","license":[{"start":{"date-parts":[[2022,7,13]],"date-time":"2022-07-13T00:00:00Z","timestamp":1657670400000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["11925103"],"award-info":[{"award-number":["11925103"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100003399","name":"Science and Technology Commission of Shanghai Municipality","doi-asserted-by":"publisher","award":["2021SHZDZX0103"],"award-info":[{"award-number":["2021SHZDZX0103"]}],"id":[{"id":"10.13039\/501100003399","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100012166","name":"National Key Research and Development Program","doi-asserted-by":"crossref","award":["2021YFC2701600"],"award-info":[{"award-number":["2021YFC2701600"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100012166","name":"National Key Research and Development Program","doi-asserted-by":"crossref","award":["2021YFC2701601"],"award-info":[{"award-number":["2021YFC2701601"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100003399","name":"Science and Technology Commission of Shanghai Municipality","doi-asserted-by":"publisher","award":["20ZR1407700"],"award-info":[{"award-number":["20ZR1407700"]}],"id":[{"id":"10.13039\/501100003399","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100014718","name":"Innovative Research Group Project of the National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61932008"],"award-info":[{"award-number":["61932008"]}],"id":[{"id":"10.13039\/100014718","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["www.ploscompbiol.org"],"crossmark-restriction":false},"short-container-title":["PLoS Comput Biol"],"abstract":"<jats:p>Confounding factors exist widely in various biological data owing to technical variations, population structures and experimental conditions. Such factors may mask the true signals and lead to spurious associations in the respective biological data, making it necessary to adjust confounding factors accordingly. However, existing confounder correction methods were mainly developed based on the original data or the pairwise Euclidean distance, either one of which is inadequate for analyzing different types of data, such as sequencing data.<\/jats:p>\n<jats:p>In this work, we proposed a method called Adjustment for Confounding factors using Principal Coordinate Analysis, or AC-PCoA, which reduces data dimension and extracts the information from different distance measures using principal coordinate analysis, and adjusts confounding factors across multiple datasets by minimizing the associations between lower-dimensional representations and confounding variables. Application of the proposed method was further extended to classification and prediction. We demonstrated the efficacy of AC-PCoA on three simulated datasets and five real datasets. Compared to the existing methods, AC-PCoA shows better results in visualization, statistical testing, clustering, and classification.<\/jats:p>","DOI":"10.1371\/journal.pcbi.1010184","type":"journal-article","created":{"date-parts":[[2022,7,13]],"date-time":"2022-07-13T17:31:32Z","timestamp":1657733492000},"page":"e1010184","update-policy":"https:\/\/doi.org\/10.1371\/journal.pcbi.corrections_policy","source":"Crossref","is-referenced-by-count":35,"title":["AC-PCoA: Adjustment for confounding factors using principal coordinate analysis"],"prefix":"10.1371","volume":"18","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-8070-116X","authenticated-orcid":true,"given":"Yu","family":"Wang","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8552-043X","authenticated-orcid":true,"given":"Fengzhu","family":"Sun","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1863-4306","authenticated-orcid":true,"given":"Wei","family":"Lin","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8223-844X","authenticated-orcid":true,"given":"Shuqin","family":"Zhang","sequence":"additional","affiliation":[]}],"member":"340","published-online":{"date-parts":[[2022,7,13]]},"reference":[{"issue":"1","key":"pcbi.1010184.ref001","doi-asserted-by":"crossref","first-page":"118","DOI":"10.1093\/biostatistics\/kxj037","article-title":"Adjusting batch effects in microarray expression data using empirical Bayes methods","volume":"8","author":"WE Johnson","year":"2007","journal-title":"Biostatistics"},{"issue":"6","key":"pcbi.1010184.ref002","doi-asserted-by":"crossref","first-page":"882","DOI":"10.1093\/bioinformatics\/bts034","article-title":"The sva package for removing batch effects and other unwanted variation in high-throughput experiments","volume":"28","author":"JT Leek","year":"2012","journal-title":"Bioinformatics"},{"issue":"9","key":"pcbi.1010184.ref003","first-page":"1724","article-title":"Capturing heterogeneity in gene expression studies by surrogate variable analysis","volume":"3","author":"JT Leek","year":"2007","journal-title":"PLoS Genet"},{"issue":"48","key":"pcbi.1010184.ref004","doi-asserted-by":"crossref","first-page":"18718","DOI":"10.1073\/pnas.0808709105","article-title":"A general framework for multiple testing dependence","volume":"105","author":"JT Leek","year":"2008","journal-title":"Proc Natl Acad Sci U S A"},{"key":"pcbi.1010184.ref005","first-page":"1","volume-title":"Removing unwanted variation from high dimensional data with negative controls","author":"JA Gagnon-Bartsch","year":"2013"},{"issue":"3","key":"pcbi.1010184.ref006","doi-asserted-by":"crossref","first-page":"539","DOI":"10.1093\/biostatistics\/kxr034","article-title":"Using control genes to correct for unwanted variation in microarray data","volume":"13","author":"JA Gagnon-Bartsch","year":"2012","journal-title":"Biostatistics"},{"issue":"1","key":"pcbi.1010184.ref007","doi-asserted-by":"crossref","first-page":"16","DOI":"10.1093\/biostatistics\/kxv026","article-title":"Correcting gene expression data when neither the unwanted variation nor the factor of interest are observed","volume":"17","author":"L Jacob","year":"2016","journal-title":"Biostatistics"},{"issue":"12","key":"pcbi.1010184.ref008","doi-asserted-by":"crossref","first-page":"6073","DOI":"10.1093\/nar\/gkz433","article-title":"A new normalization for Nanostring nCounter gene expression data","volume":"47","author":"R Molania","year":"2019","journal-title":"Nucleic Acids Res"},{"issue":"1","key":"pcbi.1010184.ref009","first-page":"1","article-title":"Controlling for confounding effects in single cell RNA sequencing studies using both control and target genes","volume":"7","author":"M Chen","year":"2017","journal-title":"Scientific Reports"},{"issue":"3","key":"pcbi.1010184.ref010","doi-asserted-by":"crossref","first-page":"296","DOI":"10.1093\/bioinformatics\/btaa715","article-title":"Efficient and effective control of confounding in eQTL mapping studies through joint differential expression and Mendelian randomization analyses","volume":"37","author":"Y Fan","year":"2021","journal-title":"Bioinformatics"},{"issue":"16","key":"pcbi.1010184.ref011","doi-asserted-by":"crossref","first-page":"e106","DOI":"10.1093\/nar\/gkv526","article-title":"Removing unwanted variation in a differential methylation analysis of Illumina HumanMethylation450 array data","volume":"43","author":"J Maksimovic","year":"2015","journal-title":"Nucleic Acids Research"},{"issue":"9","key":"pcbi.1010184.ref012","doi-asserted-by":"crossref","first-page":"896","DOI":"10.1038\/nbt.2931","article-title":"Normalization of RNA-seq data using factor analysis of control genes or samples","volume":"32","author":"D Risso","year":"2014","journal-title":"Nat Biotechnol"},{"issue":"27","key":"pcbi.1010184.ref013","doi-asserted-by":"crossref","first-page":"7391","DOI":"10.1073\/pnas.1511656113","article-title":"Modeling confounding by half-sibling regression","volume":"113","author":"B Sch\u00f6lkopf","year":"2016","journal-title":"Proc Natl Acad Sci U S A"},{"issue":"8","key":"pcbi.1010184.ref014","doi-asserted-by":"crossref","first-page":"1026","DOI":"10.1093\/bioinformatics\/btt075","article-title":"Accounting for non-genetic factors by low-rank representation and sparse regression for eQTL mapping","volume":"29","author":"C Yang","year":"2013","journal-title":"Bioinformatics"},{"issue":"51","key":"pcbi.1010184.ref015","doi-asserted-by":"crossref","first-page":"14662","DOI":"10.1073\/pnas.1617317113","article-title":"Simultaneous dimension reduction and adjustment for confounding variation","volume":"113","author":"Z Lin","year":"2016","journal-title":"Proc Natl Acad Sci U S A"},{"key":"pcbi.1010184.ref016","doi-asserted-by":"crossref","first-page":"W45","DOI":"10.1093\/nar\/gkh362","article-title":"CVTree: a phylogenetic tree reconstruction tool based on whole genomes","volume":"32","author":"J Qi","year":"2004","journal-title":"Nucleic Acids Res"},{"issue":"2","key":"pcbi.1010184.ref017","doi-asserted-by":"crossref","first-page":"64","DOI":"10.1089\/cmb.2012.0228","article-title":"Alignment-free sequence comparison based on next-generation sequencing reads","volume":"20","author":"K Song","year":"2013","journal-title":"J Comput Biol"},{"key":"pcbi.1010184.ref018","doi-asserted-by":"crossref","first-page":"109","DOI":"10.4324\/9780429501463-11","volume-title":"Computers and DNA","author":"DC Torney","year":"2018"},{"issue":"4","key":"pcbi.1010184.ref019","doi-asserted-by":"crossref","first-page":"325","DOI":"10.2307\/1942268","article-title":"An ordination of the upland forest communities of southern Wisconsin","volume":"27","author":"JR Bray","year":"1957","journal-title":"Ecological Monographs"},{"key":"pcbi.1010184.ref020","doi-asserted-by":"crossref","first-page":"104","DOI":"10.1007\/978-3-642-55721-7_12","volume-title":"Exploratory Data Analysis in Empirical Research","author":"K Jajuga","year":"2003"},{"key":"pcbi.1010184.ref021","doi-asserted-by":"crossref","unstructured":"Boriah S, Chandola V, Kumar V. Similarity measures for categorical data: A comparative evaluation. In: Proceedings of the 2008 SIAM international conference on data mining. SIAM; 2008. p. 243\u2013254.","DOI":"10.1137\/1.9781611972788.22"},{"key":"pcbi.1010184.ref022","doi-asserted-by":"crossref","unstructured":"Bojorque R, Hurtado R, Inga A. A comparative analysis of similarity metrics on sparse data for clustering in recommender systems. In: International Conference on Applied Human Factors and Ergonomics. Springer; 2018. p. 291\u2013299.","DOI":"10.1007\/978-3-319-94229-2_28"},{"key":"pcbi.1010184.ref023","doi-asserted-by":"crossref","first-page":"163","DOI":"10.1016\/j.knosys.2015.03.001","article-title":"A new similarity measure using Bhattacharyya coefficient for collaborative filtering in sparse data","volume":"82","author":"BK Patra","year":"2015","journal-title":"Knowledge-Based Systems"},{"key":"pcbi.1010184.ref024","unstructured":"Torgerson WS. Theory and methods of scaling. 1958;."},{"key":"pcbi.1010184.ref025","first-page":"588","article-title":"A Q-technique for the calculation of canonical variates","author":"JC Gower","year":"1966","journal-title":"Biometrika"},{"issue":"3-4","key":"pcbi.1010184.ref026","doi-asserted-by":"crossref","first-page":"483","DOI":"10.1016\/j.ecolmodel.2006.02.015","article-title":"Spatial modelling: a comprehensive framework for principal coordinate analysis of neighbour matrices (PCNM)","volume":"196","author":"S Dray","year":"2006","journal-title":"Ecological Modelling"},{"issue":"4","key":"pcbi.1010184.ref027","doi-asserted-by":"crossref","first-page":"789","DOI":"10.1016\/j.cell.2014.09.053","article-title":"Human genetics shape the gut microbiome","volume":"159","author":"JK Goodrich","year":"2014","journal-title":"Cell"},{"key":"pcbi.1010184.ref028","first-page":"259","article-title":"Principal coordinate analysis and non-metric multidimensional scaling","author":"AF Zuur","year":"2007","journal-title":"Analysing Ecological Data"},{"issue":"13","key":"pcbi.1010184.ref029","doi-asserted-by":"crossref","first-page":"4099","DOI":"10.1093\/bioinformatics\/btaa276","article-title":"aPCoA: covariate adjusted principal coordinates analysis","volume":"36","author":"Y Shi","year":"2020","journal-title":"Bioinformatics"},{"issue":"5","key":"pcbi.1010184.ref030","doi-asserted-by":"crossref","first-page":"657","DOI":"10.1007\/s11258-014-0406-z","article-title":"Principal component analysis with missing values: a comparative survey of methods","volume":"216","author":"S Dray","year":"2015","journal-title":"Plant Ecology"},{"key":"pcbi.1010184.ref031","doi-asserted-by":"crossref","unstructured":"Gower JC. Principal coordinates analysis. Wiley StatsRef: Statistics Reference Online. 2014;.","DOI":"10.1002\/9781118445112.stat05670"},{"key":"pcbi.1010184.ref032","volume-title":"Learning with kernels: support vector machines, regularization, optimization, and beyond","author":"B Sch\u00f6lkopf","year":"2002"},{"issue":"2","key":"pcbi.1010184.ref033","doi-asserted-by":"crossref","first-page":"189","DOI":"10.1109\/TNN.2008.2005601","article-title":"Normalized mutual information feature selection","volume":"20","author":"PA Est\u00e9vez","year":"2009","journal-title":"IEEE Trans Neural Netw"},{"key":"pcbi.1010184.ref034","unstructured":"Chen J, Zhang X, Zhou H. GUniFrac: Generalized UniFrac Distances, Distance-Based Multivariate Methods and Feature-Based Univariate Methods for Microbiome Data Analysis; 2021. Available from: https:\/\/CRAN.R-project.org\/package=GUniFrac."},{"issue":"1","key":"pcbi.1010184.ref035","doi-asserted-by":"crossref","first-page":"896","DOI":"10.1186\/s12864-018-5253-1","article-title":"Alignment-free genome comparison enables accurate geographic sourcing of white oak DNA","volume":"19","author":"K Tang","year":"2018","journal-title":"BMC Genomics"},{"key":"pcbi.1010184.ref036","doi-asserted-by":"crossref","first-page":"276","DOI":"10.1186\/s13059-015-0841-8","article-title":"The microbiome quality control project: baseline study design and future directions","volume":"16","author":"R Sinha","year":"2015","journal-title":"Genome Biol"},{"issue":"9","key":"pcbi.1010184.ref037","doi-asserted-by":"crossref","first-page":"903","DOI":"10.1038\/nbt.2957","article-title":"A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control Consortium","volume":"32","author":"Z Su","year":"2014","journal-title":"Nat Biotechnol"},{"issue":"12","key":"pcbi.1010184.ref038","doi-asserted-by":"crossref","first-page":"1289","DOI":"10.1038\/s41592-019-0619-0","article-title":"Fast, sensitive and accurate integration of single-cell data with Harmony","volume":"16","author":"I Korsunsky","year":"2019","journal-title":"Nat Methods"},{"issue":"7370","key":"pcbi.1010184.ref039","doi-asserted-by":"crossref","first-page":"483","DOI":"10.1038\/nature10523","article-title":"Spatio-temporal transcriptome of the human brain","volume":"478","author":"HJ Kang","year":"2011","journal-title":"Nature"},{"issue":"1","key":"pcbi.1010184.ref040","first-page":"1","article-title":"Partial cross mapping eliminates indirect causal influences","volume":"11","author":"SY Leng","year":"2020","journal-title":"Nat Comm"},{"key":"pcbi.1010184.ref041","doi-asserted-by":"crossref","first-page":"9870149","DOI":"10.34133\/2022\/9870149","article-title":"Continuity scaling: A rigorous framework for detecting and quantifying causality accurately","volume":"2022","author":"X Ying","year":"2022","journal-title":"Research"}],"container-title":["PLOS Computational Biology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1010184","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,7,13]],"date-time":"2022-07-13T17:32:23Z","timestamp":1657733543000},"score":1,"resource":{"primary":{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1010184"}},"subtitle":[],"editor":[{"given":"Maria D.","family":"Chikina","sequence":"first","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2022,7,13]]},"references-count":41,"journal-issue":{"issue":"7","published-online":{"date-parts":[[2022,7,13]]}},"URL":"https:\/\/doi.org\/10.1371\/journal.pcbi.1010184","relation":{},"ISSN":["1553-7358"],"issn-type":[{"value":"1553-7358","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,7,13]]}}}