{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,20]],"date-time":"2026-05-20T02:41:59Z","timestamp":1779244919431,"version":"3.51.4"},"reference-count":41,"publisher":"Oxford University Press (OUP)","issue":"11","license":[{"start":{"date-parts":[[2020,3,16]],"date-time":"2020-03-16T00:00:00Z","timestamp":1584316800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"name":"Fonds de recherche du Qu\u00e9bec - Nature et technologies"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2020,6,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Statistical analyses of high-throughput sequencing data have re-shaped the biological sciences. In spite of myriad advances, recovering interpretable biological signal from data corrupted by technical noise remains a prevalent open problem. Several classes of procedures, among them classical dimensionality reduction techniques and others incorporating subject-matter knowledge, have provided effective advances. However, no procedure currently satisfies the dual objectives of recovering stable and relevant features simultaneously.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>Inspired by recent proposals for making use of control data in the removal of unwanted variation, we propose a variant of principal component analysis (PCA), sparse contrastive PCA that extracts sparse, stable, interpretable and relevant biological signal. The new methodology is compared to competing dimensionality reduction approaches through a simulation study and via analyses of several publicly available protein expression, microarray gene expression and single-cell transcriptome sequencing datasets.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>A free and open-source software implementation of the methodology, the scPCA R package, is made available via the Bioconductor Project. Code for all analyses presented in this article is also available via GitHub.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Contact<\/jats:title>\n                    <jats:p>philippe_boileau@berkeley.edu<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaa176","type":"journal-article","created":{"date-parts":[[2020,3,10]],"date-time":"2020-03-10T16:47:02Z","timestamp":1583858822000},"page":"3422-3430","source":"Crossref","is-referenced-by-count":47,"title":["Exploring high-dimensional biological data with sparse contrastive principal component analysis"],"prefix":"10.1093","volume":"36","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4850-2507","authenticated-orcid":false,"given":"Philippe","family":"Boileau","sequence":"first","affiliation":[{"name":"Graduate Group in Biostatistics"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7127-2789","authenticated-orcid":false,"given":"Nima S","family":"Hejazi","sequence":"additional","affiliation":[{"name":"Graduate Group in Biostatistics"},{"name":"Center for Computational Biology"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sandrine","family":"Dudoit","sequence":"additional","affiliation":[{"name":"Center for Computational Biology"},{"name":"Division of Epidemiology and Biostatistics , School of Public Health"},{"name":"Department of Statistics , University of California, Berkeley, CA 94720, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2020,3,16]]},"reference":[{"key":"2023062300081689000_btaa176-B1","doi-asserted-by":"crossref","first-page":"2134","DOI":"10.1038\/s41467-018-04608-8","article-title":"Exploring patterns enriched in a dataset with contrastive principal component analysis","volume":"9","author":"Abid","year":"2018","journal-title":"Nat. Commun"},{"key":"2023062300081689000_btaa176-B2","doi-asserted-by":"crossref","first-page":"545","DOI":"10.1038\/nbt.2594","article-title":"ViSNE enables visualization of high dimensional single-cell data and reveals phenotypic heterogeneity of leukemia","volume":"31","author":"Amir","year":"2013","journal-title":"Nat. Biotechnol"},{"key":"2023062300081689000_btaa176-B3","doi-asserted-by":"crossref","first-page":"38","DOI":"10.1038\/nbt.4314","article-title":"Dimensionality reduction for visualizing single-cell data using UMAP","volume":"37","author":"Becht","year":"2019","journal-title":"Nat. Biotechnol"},{"key":"2023062300081689000_btaa176-B4","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1155\/2014\/968262","article-title":"Evidence of HLA-DQB1 contribution to susceptibility of dengue serotype 3 in dengue patients in Southern Brazil","volume":"2014","author":"Cardozo","year":"2014","journal-title":"J. Trop. Med"},{"key":"2023062300081689000_btaa176-B5","doi-asserted-by":"crossref","first-page":"136","DOI":"10.1016\/j.virol.2016.07.014","article-title":"B cells naturally induced during dengue virus infection release soluble CD27, the plasma level of which is associated with severe forms of pediatric dengue","volume":"497","author":"Casta\u00f1eda","year":"2016","journal-title":"Virology"},{"key":"2023062300081689000_btaa176-B6","first-page":"131","volume-title":"The Interferon Inducible Gene","author":"Fitzgerald","year":"2011"},{"key":"2023062300081689000_btaa176-B7","doi-asserted-by":"crossref","first-page":"1303","DOI":"10.1128\/MCB.01101-09","article-title":"Mitochondrial p32 protein is a critical regulator of tumor metabolism via maintenance of oxidative phosphorylation","volume":"30","author":"Fogal","year":"2010","journal-title":"Mol. Cell. Biol"},{"key":"2023062300081689000_btaa176-B8","doi-asserted-by":"crossref","first-page":"45","DOI":"10.1109\/TVCG.2019.2934251","article-title":"Supporting analysis of dimensionality reduction results with contrastive learning","volume":"26","author":"Fujiwara","year":"2020","journal-title":"IEEE Trans. Vis. Comput. Graph"},{"key":"2023062300081689000_btaa176-B9","doi-asserted-by":"crossref","first-page":"539","DOI":"10.1093\/biostatistics\/kxr034","article-title":"Using control genes to correct for unwanted variation in microarray data","volume":"13","author":"Gagnon-Bartsch","year":"2012","journal-title":"Biostatistics"},{"key":"2023062300081689000_btaa176-B10","first-page":"1","author":"Gagnon-Bartsch","year":"2013"},{"key":"2023062300081689000_btaa176-B11","volume-title":"Bioinformatics and Computational Biology Solutions Using R and Bioconductor","author":"Gentleman","year":"2006"},{"key":"2023062300081689000_btaa176-B12","doi-asserted-by":"crossref","first-page":"R80","DOI":"10.1186\/gb-2004-5-10-r80","article-title":"Bioconductor: open software development for computational biology and bioinformatics","volume":"5","author":"Gentleman","year":"2004","journal-title":"Genome Biol"},{"key":"2023062300081689000_btaa176-B13","doi-asserted-by":"crossref","first-page":"533","DOI":"10.1056\/NEJMoa033513","article-title":"Gene-expression patterns in drug-resistant acute lymphoblastic leukemia cells and response to treatment","volume":"351","author":"Holleman","year":"2004","journal-title":"N. Engl. J. Med"},{"key":"2023062300081689000_btaa176-B14","doi-asserted-by":"crossref","first-page":"115","DOI":"10.1038\/nmeth.3252","article-title":"Orchestrating high-throughput genomic analysis with bioconductor","volume":"12","author":"Huber","year":"2015","journal-title":"Nat. Methods"},{"key":"2023062300081689000_btaa176-B15","doi-asserted-by":"crossref","first-page":"682","DOI":"10.1198\/jasa.2009.0121","article-title":"On consistency and sparsity for principal components analysis in high dimensions","volume":"104","author":"Johnstone","year":"2009","journal-title":"J. Am. Stat. Assoc"},{"key":"2023062300081689000_btaa176-B16","doi-asserted-by":"crossref","first-page":"1277","DOI":"10.1109\/JPROC.2018.2846730","article-title":"PCA in high dimensions: an orientation","volume":"106","author":"Johnstone","year":"2018","journal-title":"Proc. IEEE"},{"key":"2023062300081689000_btaa176-B17","author":"Kobak","year":"2019"},{"key":"2023062300081689000_btaa176-B18","doi-asserted-by":"crossref","first-page":"115","DOI":"10.1016\/j.chom.2014.06.001","article-title":"Dengue virus infection induces expansion of a CD14(+)CD16(+) monocyte population that stimulates plasmablast differentiation","volume":"16","author":"Kwissa","year":"2014","journal-title":"Cell Host Microbe"},{"key":"2023062300081689000_btaa176-B19","doi-asserted-by":"crossref","first-page":"e161","DOI":"10.1371\/journal.pgen.0030161","article-title":"Capturing heterogeneity in gene expression studies by surrogate variable analysis","volume":"3","author":"Leek","year":"2007","journal-title":"PLoS Genet"},{"key":"2023062300081689000_btaa176-B20","doi-asserted-by":"crossref","first-page":"129","DOI":"10.1109\/TIT.1982.1056489","article-title":"Least squares quantization in PCM","volume":"28","author":"Lloyd","year":"1982","journal-title":"IEEE Trans. Inform. Theory"},{"key":"2023062300081689000_btaa176-B21","doi-asserted-by":"crossref","first-page":"1053","DOI":"10.1038\/s41592-018-0229-2","article-title":"Deep generative modeling for single-cell transcriptomics","volume":"15","author":"Lopez","year":"2018","journal-title":"Nat. Methods"},{"key":"2023062300081689000_btaa176-B22","doi-asserted-by":"crossref","first-page":"660","DOI":"10.5483\/BMBRep.2014.47.12.020","article-title":"Stathmin 1 in normal and malignant hematopoiesis","volume":"47","author":"Machado-Neto","year":"2014","journal-title":"BMB Rep"},{"key":"2023062300081689000_btaa176-B23","author":"McInnes","year":"2018"},{"key":"2023062300081689000_btaa176-B24","first-page":"307","article-title":"Autoantibodies against carbonic anhydrase I and II in patients with acute myeloid leukemia TT","volume":"34","author":"Mente\u015fe","year":"2017","journal-title":"Turk. J. Haematol"},{"key":"2023062300081689000_btaa176-B25","doi-asserted-by":"crossref","first-page":"e1006907","DOI":"10.1371\/journal.pcbi.1006907","article-title":"Ten quick tips for effective dimensionality reduction","volume":"15","author":"Nguyen","year":"2019","journal-title":"PLoS Comput. Biol"},{"key":"2023062300081689000_btaa176-B26","year":"2019"},{"key":"2023062300081689000_btaa176-B27","doi-asserted-by":"crossref","first-page":"303","DOI":"10.1038\/nbt0308-303","article-title":"What is principal component analysis?","author":"Ringner","year":"2008","journal-title":"Nat. Biotechnol"},{"key":"2023062300081689000_btaa176-B28","doi-asserted-by":"crossref","first-page":"896","DOI":"10.1038\/nbt.2931","article-title":"Normalization of RNA-seq data using factor analysis of control genes or samples","volume":"32","author":"Risso","year":"2014","journal-title":"Nat. Biotechnol"},{"key":"2023062300081689000_btaa176-B29","article-title":"A general and flexible method for signal extraction from single-cell RNA-seq data","volume":"9, 284","author":"Risso","year":"2018","journal-title":"Nat. Commun"},{"key":"2023062300081689000_btaa176-B30","first-page":"4862","article-title":"Unsupervised learning with contrastive latent variable models","volume":"33","author":"Severson","year":"2019","journal-title":"Proc. AAAI Conf. Artif. Intell"},{"key":"2023062300081689000_btaa176-B31","doi-asserted-by":"crossref","first-page":"317","DOI":"10.1016\/j.jmva.2012.10.007","article-title":"Consistency of sparse PCA in high dimension, low sample size contexts","volume":"115","author":"Shen","year":"2013","journal-title":"J. Multivariate Anal"},{"key":"2023062300081689000_btaa176-B32","first-page":"2579","author":"van der Maaten","year":"2008"},{"key":"2023062300081689000_btaa176-B33","doi-asserted-by":"crossref","first-page":"414","DOI":"10.1038\/nmeth.4207","article-title":"Visualization and analysis of single-cell RNA-seq data by kernel-based similarity learning","volume":"14","author":"Wang","year":"2017","journal-title":"Nat. Methods"},{"key":"2023062300081689000_btaa176-B34","doi-asserted-by":"crossref","first-page":"1309","DOI":"10.1016\/j.cell.2014.07.048","article-title":"Cell-state-specific metabolic dependency in hematopoiesis and leukemogenesis","volume":"158","author":"Wang","year":"2014","journal-title":"Cell"},{"key":"2023062300081689000_btaa176-B35","doi-asserted-by":"crossref","first-page":"1484","DOI":"10.3150\/13-BEJSP14","article-title":"Stability","volume":"19","author":"Yu","year":"2013","journal-title":"Bernoulli"},{"key":"2023062300081689000_btaa176-B36","doi-asserted-by":"crossref","first-page":"174","DOI":"10.1186\/s13059-017-1305-0","article-title":"Splatter: simulation of single-cell RNA sequencing data","volume":"18","author":"Zappia","year":"2017","journal-title":"Genome Biol"},{"key":"2023062300081689000_btaa176-B37","doi-asserted-by":"crossref","first-page":"e103207","DOI":"10.1371\/journal.pone.0103207","article-title":"A comparative study of techniques for differential expression analysis on RNA-seq data","volume":"9","author":"Zhang","year":"2014","journal-title":"PLoS One"},{"key":"2023062300081689000_btaa176-B38","doi-asserted-by":"crossref","first-page":"14049","DOI":"10.1038\/ncomms14049","article-title":"Massively parallel digital transcriptional profiling of single cells","volume":"8","author":"Zheng","year":"2017","journal-title":"Nat. Commun"},{"key":"2023062300081689000_btaa176-B39","author":"Zou","year":"2018"},{"key":"2023062300081689000_btaa176-B40","doi-asserted-by":"crossref","first-page":"1311","DOI":"10.1109\/JPROC.2018.2846588","article-title":"A selective overview of sparse principal component analysis","volume":"106","author":"Zou","year":"2018","journal-title":"Proc. IEEE"},{"key":"2023062300081689000_btaa176-B41","doi-asserted-by":"crossref","first-page":"265","DOI":"10.1198\/106186006X113430","article-title":"Sparse principal component analysis","volume":"15","author":"Zou","year":"2006","journal-title":"J. Comput. Graph. Stat"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaa176\/33179838\/btaa176.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/36\/11\/3422\/50670663\/bioinformatics_36_11_3422.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/36\/11\/3422\/50670663\/bioinformatics_36_11_3422.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,6,24]],"date-time":"2023-06-24T14:16:54Z","timestamp":1687616214000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/36\/11\/3422\/5807607"}},"subtitle":[],"editor":[{"given":"Anthony","family":"Mathelier","sequence":"additional","affiliation":[],"role":[{"role":"editor","vocabulary":"crossref"}]}],"short-title":[],"issued":{"date-parts":[[2020,3,16]]},"references-count":41,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2020,6,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaa176","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/836650","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2020,6]]},"published":{"date-parts":[[2020,3,16]]}}}