{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,17]],"date-time":"2026-06-17T22:24:24Z","timestamp":1781735064037,"version":"3.54.5"},"reference-count":30,"publisher":"Springer Science and Business Media LLC","issue":"S1","license":[{"start":{"date-parts":[[2020,12,1]],"date-time":"2020-12-01T00:00:00Z","timestamp":1606780800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2020,12,9]],"date-time":"2020-12-09T00:00:00Z","timestamp":1607472000000},"content-version":"vor","delay-in-days":8,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2020,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Background<\/jats:title>\n                    <jats:p>Partial Least-Squares Discriminant Analysis (PLS-DA) is a popular machine learning tool that is gaining increasing attention as a useful feature selector and classifier. In an effort to understand its strengths and weaknesses, we performed a series of experiments with synthetic data and compared its performance to its close relative from which it was initially invented, namely Principal Component Analysis (PCA).<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>\n                      We demonstrate that even though PCA ignores the information regarding the class labels of the samples, this unsupervised tool can be remarkably effective as a feature selector. In some cases, it outperforms PLS-DA, which is made aware of the class labels in its input. 
Our experiments range from looking at the signal-to-noise ratio in the feature selection task, to considering many practical distributions and models encountered when analyzing bioinformatics and clinical data. Other methods were also evaluated. Finally, we analyzed an interesting data set from 396 vaginal microbiome samples where the ground truth for the feature selection was available. All the 3D figures shown in this paper as well as the supplementary ones can be viewed interactively at\n                      <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"http:\/\/biorg.cs.fiu.edu\/plsda\">http:\/\/biorg.cs.fiu.edu\/plsda<\/jats:ext-link>\n                    <\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Conclusions<\/jats:title>\n                    <jats:p>Our results highlighted the strengths and weaknesses of PLS-DA in comparison with PCA for different underlying data models.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1186\/s12859-019-3310-7","type":"journal-article","created":{"date-parts":[[2020,12,9]],"date-time":"2020-12-09T04:34:03Z","timestamp":1607488443000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":363,"title":["So you think you can 
PLS-DA?"],"prefix":"10.1186","volume":"21","author":[{"given":"Daniel","family":"Ruiz-Perez","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Haibin","family":"Guan","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Purnima","family":"Madhivanan","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Kalai","family":"Mathee","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Giri","family":"Narasimhan","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2020,12,9]]},"reference":[{"issue":"3","key":"3310_CR1","doi-asserted-by":"publisher","first-page":"185","DOI":"10.1002\/cem.1180010306","volume":"1","author":"L St\u00e5hle","year":"1987","unstructured":"St\u00e5hle L, Wold S. Partial least squares analysis with cross-validation for the two-class problem: A monte carlo study. J Chemometrics. 1987; 1(3):185\u201396.","journal-title":"J Chemometrics"},{"issue":"3","key":"3310_CR2","doi-asserted-by":"publisher","first-page":"166","DOI":"10.1002\/cem.785","volume":"17","author":"M Barker","year":"2003","unstructured":"Barker M, Rayens W. Partial least squares for discrimination. J Chemometrics. 2003; 17(3):166\u201373.","journal-title":"J Chemometrics"},{"issue":"2","key":"3310_CR3","doi-asserted-by":"publisher","first-page":"83","DOI":"10.1159\/000106926","volume":"6","author":"J Gottfries","year":"1995","unstructured":"Gottfries J, Blennow K, Wallin A, Gottfries C. Diagnosis of dementias using partial least squares discriminant analysis. Dementia Geriatric Cognit Disorders. 1995; 6(2):83\u20138.","journal-title":"Dementia Geriatric Cognit Disorders"},{"issue":"1","key":"3310_CR4","first-page":"92","volume":"1","author":"B Worley","year":"2013","unstructured":"Worley B, Powers R. 
Multivariate analysis in metabolomics. Curr Metabol. 2013; 1(1):92\u2013107.","journal-title":"Curr Metabol"},{"issue":"2","key":"3310_CR5","doi-asserted-by":"publisher","first-page":"102","DOI":"10.1016\/j.ab.2012.10.011","volume":"433","author":"B Worley","year":"2013","unstructured":"Worley B, Halouska S, Powers R. Utilities for quantifying separation in PCA\/PLS-DA scores plots. Anal Biochem. 2013; 433(2):102\u20134.","journal-title":"Anal Biochem"},{"issue":"11","key":"3310_CR6","doi-asserted-by":"publisher","first-page":"e1005752","DOI":"10.1371\/journal.pcbi.1005752","volume":"13","author":"F Rohart","year":"2017","unstructured":"Rohart F, Gautier B, Singh A, Le Cao K-A. mixOmics: An R package for \u2019omics feature selection and multiple data integration. PLoS computational biology. 2017; 13(11):e1005752.","journal-title":"PLoS computational biology"},{"issue":"3","key":"3310_CR7","doi-asserted-by":"publisher","first-page":"419","DOI":"10.1007\/s00216-004-2783-y","volume":"380","author":"L Eriksson","year":"2004","unstructured":"Eriksson L, Antti H, Gottfries J, Holmes E, Johansson E, Lindgren F, Long I, Lundstedt T, Trygg J, Wold S. Using chemometrics for navigating in the large data sets of genomics, proteomics, and metabonomics (gpm). Anal Bioanalyt Chem. 2004; 380(3):419\u201329.","journal-title":"Anal Bioanalyt Chem"},{"issue":"1","key":"3310_CR8","doi-asserted-by":"publisher","first-page":"263","DOI":"10.1074\/mcp.M112.022566","volume":"12","author":"C Christin","year":"2013","unstructured":"Christin C, Hoefsloot HC, Smilde AK, Hoekman B, Suits F, Bischoff R, Horvatovich P. A critical assessment of feature selection methods for biomarker discovery in clinical proteomics. Mol Cell Proteomics. 2013; 12(1):263\u201376.","journal-title":"Mol Cell Proteomics"},{"key":"3310_CR9","doi-asserted-by":"publisher","unstructured":"Nguyen DV, Rocke DM. Classification of Acute Leukemia Based on DNA Microarray Gene Expressions Using Partial Least Squares. Methods of Microarray Data Analysis. 
2002;:109\u201324. https:\/\/doi.org\/10.1007\/978-1-4615-0873-1_9.","DOI":"10.1007\/978-1-4615-0873-1_9"},{"issue":"3","key":"3310_CR10","doi-asserted-by":"publisher","first-page":"235","DOI":"10.1016\/j.compbiolchem.2004.05.002","volume":"28","author":"Y Tan","year":"2004","unstructured":"Tan Y, Shi L, Tong W, Hwang GG, Wang C. Multi-class tumor classification by discriminant partial least squares using microarray gene expression data and assessment of classification models. Comput Biol Chem. 2004; 28(3):235\u201343.","journal-title":"Comput Biol Chem"},{"issue":"1","key":"3310_CR11","doi-asserted-by":"publisher","first-page":"321","DOI":"10.1016\/j.talanta.2009.06.072","volume":"80","author":"C Botella","year":"2009","unstructured":"Botella C, Ferr\u00e9 J, Boqu\u00e9 R. Classification from microarray data using probabilistic discriminant partial least squares with reject option. Talanta. 2009; 80(1):321\u20138.","journal-title":"Talanta"},{"issue":"4","key":"3310_CR12","doi-asserted-by":"publisher","first-page":"213","DOI":"10.1002\/cem.2609","volume":"28","author":"RG Brereton","year":"2014","unstructured":"Brereton RG, Lloyd GR. Partial least squares discriminant analysis: taking the magic away. J Chemometrics. 2014; 28(4):213\u201325.","journal-title":"J Chemometrics"},{"issue":"1","key":"3310_CR13","doi-asserted-by":"publisher","first-page":"81","DOI":"10.1007\/s11306-007-0099-6","volume":"4","author":"JA Westerhuis","year":"2008","unstructured":"Westerhuis JA, Hoefsloot HC, Smit S, Vis DJ, Smilde AK, van Velzen EJ, van Duijnhoven JP, van Dorsten FA. Assessment of PLSDA cross validation. Metabolomics. 2008; 4(1):81\u20139.","journal-title":"Metabolomics"},{"issue":"7-8","key":"3310_CR14","doi-asserted-by":"publisher","first-page":"558","DOI":"10.1002\/cem.1346","volume":"24","author":"K Kjeldahl","year":"2010","unstructured":"Kjeldahl K, Bro R. Some common misunderstandings in chemometrics. J Chemometrics. 
2010; 24(7-8):558\u201364.","journal-title":"J Chemometrics"},{"issue":"1","key":"3310_CR15","doi-asserted-by":"publisher","first-page":"3","DOI":"10.1111\/j.1467-9868.2009.00723.x","volume":"72","author":"H Chun","year":"2010","unstructured":"Chun H, Kele\u015f S. Sparse partial least squares regression for simultaneous dimension reduction and variable selection. J Royal Stat Soc: Ser B (Stat Methodol). 2010; 72(1):3\u201325.","journal-title":"J Royal Stat Soc: Ser B (Stat Methodol)"},{"issue":"1","key":"3310_CR16","doi-asserted-by":"publisher","first-page":"253","DOI":"10.1186\/1471-2105-12-253","volume":"12","author":"K-A L\u00ea Cao","year":"2011","unstructured":"L\u00ea Cao K-A, Boitard S, Besse P. Sparse pls discriminant analysis: biologically relevant feature selection and graphical displays for multiclass problems. BMC Bioinformatics. 2011; 12(1):253.","journal-title":"BMC Bioinformatics"},{"key":"3310_CR17","doi-asserted-by":"crossref","unstructured":"Chung D, Keles S. Sparse partial least squares classification for high dimensional data. Stat Appl Genet Mole Biol. 2010; 9(1).","DOI":"10.2202\/1544-6115.1492"},{"key":"3310_CR18","unstructured":"Le Cao K-A, Rohart F, Gonzalez I, Dejean S, Gautier B, Bartolo F. mixOmics: Omics data integration project. R package, version. 2017."},{"issue":"8","key":"3310_CR19","doi-asserted-by":"publisher","first-page":"2379","DOI":"10.1021\/acs.jproteome.5b01029","volume":"15","author":"E Saccenti","year":"2016","unstructured":"Saccenti E, Timmerman ME. Approaches to sample size determination for multivariate data: Applications to pca and pls-da of omics data. J Proteome Res. 2016; 15(8):2379\u201393. https:\/\/doi.org\/10.1021\/acs.jproteome.5b01029. PMID: 27322847.","journal-title":"J Proteome Res"},{"issue":"10","key":"3310_CR20","doi-asserted-by":"publisher","first-page":"6562","DOI":"10.1073\/pnas.102102699","volume":"99","author":"C Ambroise","year":"2002","unstructured":"Ambroise C, McLachlan GJ. 
Selection bias in gene extraction on the basis of microarray gene-expression data. Proc Nat Acad Sci. 2002; 99(10):6562\u20136.","journal-title":"Proc Nat Acad Sci"},{"key":"3310_CR21","doi-asserted-by":"crossref","unstructured":"Hyv\u00e4rinen A, Karhunen J, Oja E. Independent Component Analysis. Wiley; 2001. ISBN 978-0-471-40540-5.","DOI":"10.1002\/0471221317"},{"issue":"6","key":"3310_CR22","doi-asserted-by":"publisher","first-page":"1015","DOI":"10.1016\/j.jmva.2007.06.007","volume":"99","author":"H Shen","year":"2008","unstructured":"Shen H, Huang JZ. Sparse principal component analysis via regularized low rank matrix approximation. J Multivariate Anal. 2008; 99(6):1015\u201334. https:\/\/doi.org\/10.1016\/j.jmva.2007.06.007.","journal-title":"J Multivariate Anal"},{"issue":"405","key":"3310_CR23","doi-asserted-by":"publisher","first-page":"165","DOI":"10.1080\/01621459.1989.10478752","volume":"84","author":"JH Friedman","year":"1989","unstructured":"Friedman JH. Regularized discriminant analysis. J Am Stat Assoc. 1989; 84(405):165\u201375. https:\/\/doi.org\/10.1080\/01621459.1989.10478752.","journal-title":"J Am Stat Assoc"},{"issue":"1","key":"3310_CR24","doi-asserted-by":"publisher","first-page":"114","DOI":"10.1097\/01.AOG.0000247627.84791.91","volume":"109","author":"JE Allsworth","year":"2007","unstructured":"Allsworth JE, Peipert JF. Prevalence of bacterial vaginosis: 2001\u20132004 national health and nutrition examination survey data. Obstetrics Gynecol. 2007; 109(1):114\u201320.","journal-title":"Obstetrics Gynecol"},{"issue":"3","key":"3310_CR25","doi-asserted-by":"publisher","first-page":"319","DOI":"10.1086\/375819","volume":"37","author":"TL Cherpes","year":"2003","unstructured":"Cherpes TL, Meyn LA, Krohn MA, Lurie JG, Hillier SL. Association between acquisition of herpes simplex virus type 2 in women and bacterial vaginosis. Clin Infect Dis. 
2003; 37(3):319\u201325.","journal-title":"Clin Infect Dis"},{"issue":"Supplement 1","key":"3310_CR26","doi-asserted-by":"publisher","first-page":"4680","DOI":"10.1073\/pnas.1002611107","volume":"108","author":"J Ravel","year":"2011","unstructured":"Ravel J, Gajer P, Abdo Z, Schneider GM, Koenig SSK, McCulle SL, Karlebach S, Gorle R, Russell J, Tacket CO, Brotman RM, Davis CC, Ault K, Peralta L, Forney LJ. Vaginal microbiome of reproductive-age women. Proc Nat Acad Sci. 2011; 108(Supplement 1):4680\u20137. https:\/\/doi.org\/10.1073\/pnas.1002611107.","journal-title":"Proc Nat Acad Sci"},{"issue":"132","key":"3310_CR27","doi-asserted-by":"publisher","first-page":"132","DOI":"10.1126\/scitranslmed.3003605","volume":"4","author":"P Gajer","year":"2012","unstructured":"Gajer P, Brotman RM, Bai G, Sakamoto J, Sch\u00fctte UME, Zhong X, Koenig SSK, Fu L, Ma ZS, Zhou X, Abdo Z, Forney LJ, Ravel J. Temporal dynamics of the human vaginal microbiota. Sci Trans Med. 2012; 4(132):132ra52. https:\/\/doi.org\/10.1126\/scitranslmed.3003605.","journal-title":"Sci Trans Med"},{"issue":"1","key":"3310_CR28","doi-asserted-by":"publisher","first-page":"371","DOI":"10.1146\/annurev-micro-092611-150157","volume":"66","author":"B Ma","year":"2012","unstructured":"Ma B, Forney LJ, Ravel J. Vaginal microbiome: Rethinking health and disease. Ann Rev Microbiol. 2012; 66(1):371\u201389. https:\/\/doi.org\/10.1146\/annurev-micro-092611-150157. PMID: 22746335.","journal-title":"Ann Rev Microbiol"},{"key":"3310_CR29","doi-asserted-by":"publisher","first-page":"3366","DOI":"10.7717\/peerj.3366","volume":"5","author":"ZS Ma","year":"2017","unstructured":"Ma ZS, Li L. Quantifying the human vaginal community state types (csts) with the species specificity index. PeerJ. 2017; 5:3366.","journal-title":"PeerJ"},{"key":"3310_CR30","doi-asserted-by":"crossref","unstructured":"Mariadassou M, Pichon S, Ebert D. 
Microbial ecosystems are dominated by specialist taxa. Ecol Lett. 2015; 18(9):974\u201382. https:\/\/doi.org\/10.1111\/ele.12478.","DOI":"10.1111\/ele.12478"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-019-3310-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1186\/s12859-019-3310-7\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-019-3310-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2020,12,9]],"date-time":"2020-12-09T04:39:16Z","timestamp":1607488756000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/s12859-019-3310-7"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,12]]},"references-count":30,"journal-issue":{"issue":"S1","published-print":{"date-parts":[[2020,12]]}},"alternative-id":["3310"],"URL":"https:\/\/doi.org\/10.1186\/s12859-019-3310-7","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/207225","asserted-by":"object"}]},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,12]]},"assertion":[{"value":"26 November 2019","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"9 December 2019","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"9 December 2020","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article 
History"}},{"value":"Not applicable.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare that they have no competing interests.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"2"}}