{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,7]],"date-time":"2026-03-07T00:52:27Z","timestamp":1772844747646,"version":"3.50.1"},"reference-count":67,"publisher":"Oxford University Press (OUP)","issue":"6","license":[{"start":{"date-parts":[[2021,8,10]],"date-time":"2021-08-10T00:00:00Z","timestamp":1628553600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/501100001773","name":"University of New South Wales","doi-asserted-by":"publisher","award":["2019"],"award-info":[{"award-number":["2019"]}],"id":[{"id":"10.13039\/501100001773","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,11,5]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>A typical single-cell RNA sequencing (scRNA-seq) experiment will measure on the order of 20\u2009000 transcripts and thousands, if not millions, of cells. The high dimensionality of such data presents serious complications for traditional data analysis methods and, as such, methods to reduce dimensionality play an integral role in many analysis pipelines. However, few studies have benchmarked the performance of these methods on scRNA-seq data, with existing comparisons assessing performance via downstream analysis accuracy measures, which may confound the interpretation of their results. Here, we present the most comprehensive benchmark of dimensionality reduction methods in scRNA-seq data to date, utilizing over 300\u2009000 compute hours to assess the performance of over 25\u2009000 low-dimension embeddings across 33 dimensionality reduction methods and 55 scRNA-seq datasets. We employ a simple, yet novel, approach, which does not rely on the results of downstream analyses. Internal validation measures (IVMs), traditionally used as an unsupervised method to assess clustering performance, are repurposed to measure how well-formed biological clusters are after dimensionality reduction. Performance was further evaluated over nearly 200\u2009000\u2009000 iterations of DBSCAN, a density-based clustering algorithm, showing that hyperparameter optimization using IVMs as the objective function leads to near-optimal clustering. Methods were also assessed on the extent to which they preserve the global structure of the data, and on their computational memory and time requirements across a large range of sample sizes. Our comprehensive benchmarking analysis provides a valuable resource for researchers and aims to guide best practice for dimensionality reduction in scRNA-seq analyses, and we highlight Latent Dirichlet Allocation and Potential of Heat-diffusion for Affinity-based Transition Embedding as high-performing algorithms.<\/jats:p>","DOI":"10.1093\/bib\/bbab304","type":"journal-article","created":{"date-parts":[[2021,7,19]],"date-time":"2021-07-19T07:07:35Z","timestamp":1626678455000},"source":"Crossref","is-referenced-by-count":19,"title":["Supervised application of internal validation measures to benchmark dimensionality reduction methods in scRNA-seq data"],"prefix":"10.1093","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1890-795X","authenticated-orcid":false,"given":"Forrest C","family":"Koch","sequence":"first","affiliation":[{"name":"School of Biotechnology and Biomolecular Sciences, University of New South Wales (UNSW Sydney), Sydney, NSW, Australia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5266-8772","authenticated-orcid":false,"given":"Gavin J","family":"Sutton","sequence":"additional","affiliation":[{"name":"School of Biotechnology and Biomolecular Sciences, University of New South Wales (UNSW Sydney), Sydney, NSW, Australia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4162-3872","authenticated-orcid":false,"given":"Irina","family":"Voineagu","sequence":"additional","affiliation":[{"name":"School of Biotechnology and Biomolecular Sciences, University of New South Wales (UNSW Sydney), Sydney, NSW, Australia"},{"name":"UNSW Data Science Hub, University of New South Wales (UNSW Sydney), Sydney, NSW, Australia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7521-2417","authenticated-orcid":false,"given":"Fatemeh","family":"Vafaee","sequence":"additional","affiliation":[{"name":"School of Biotechnology and Biomolecular Sciences, University of New South Wales (UNSW Sydney), Sydney, NSW, Australia"},{"name":"UNSW Data Science Hub, University of New South Wales (UNSW Sydney), Sydney, NSW, Australia"}]}],"member":"286","published-online":{"date-parts":[[2021,8,10]]},"reference":[{"key":"2021110815090083600_ref1","doi-asserted-by":"crossref","DOI":"10.7554\/eLife.27041","article-title":"The Human Cell Atlas","volume":"6","author":"Regev","year":"2017","journal-title":"Elife"},{"key":"2021110815090083600_ref2","doi-asserted-by":"crossref","first-page":"1173","DOI":"10.1016\/j.cell.2013.02.022","article-title":"Repurposing CRISPR as an RNA-guided platform for sequence-specific control of gene expression","volume":"152","author":"Qi","year":"2013","journal-title":"Cell"},{"key":"2021110815090083600_ref3","doi-asserted-by":"crossref","first-page":"297","DOI":"10.1038\/nmeth.4177","article-title":"Pooled CRISPR screening with single-cell transcriptome readout","volume":"14","author":"Datlinger","year":"2017","journal-title":"Nat Methods"},{"key":"2021110815090083600_ref4","doi-asserted-by":"crossref","first-page":"494","DOI":"10.1038\/s41586-018-0414-6","article-title":"RNA velocity of single cells","volume":"560","author":"la Manno","year":"2018","journal-title":"Nature"},{"key":"2021110815090083600_ref5","doi-asserted-by":"crossref","DOI":"10.1371\/journal.pcbi.1006245","article-title":"Exploring the single-cell RNA-seq analysis landscape with the scRNA-tools database","volume":"14","author":"Zappia","year":"2018","journal-title":"PLoS Comput Biol"},{"key":"2021110815090083600_ref6","doi-asserted-by":"crossref","first-page":"36","DOI":"10.1186\/s13059-020-1949-z","article-title":"Robustness and applicability of transcription factor and pathway analysis tools on single-cell RNA-seq data","volume":"21","author":"Holland","year":"2020","journal-title":"Genome Biol"},{"key":"2021110815090083600_ref7","doi-asserted-by":"crossref","first-page":"411","DOI":"10.1038\/nbt.4096","article-title":"Integrating single-cell transcriptomic data across different conditions, technologies, and species","volume":"36","author":"Butler","year":"2018","journal-title":"Nat Biotechnol"},{"key":"2021110815090083600_ref8","doi-asserted-by":"crossref","first-page":"381","DOI":"10.1038\/nbt.2859","article-title":"The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells","volume":"32","author":"Trapnell","year":"2014","journal-title":"Nat Biotechnol"},{"key":"2021110815090083600_ref9","first-page":"2122","article-title":"A step-by-step workflow for low-level analysis of single-cell RNA-seq data with bioconductor","volume":"5","author":"Lun","year":"2016","journal-title":"F1000Res"},{"key":"2021110815090083600_ref10","first-page":"66","article-title":"Dimensionality reduction: a comparative review","volume":"10","author":"Van Der Maaten","year":"2009","journal-title":"J Mach Learn Res"},{"key":"2021110815090083600_ref11","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1186\/s12982-016-0047-x","article-title":"Dimension reduction and shrinkage methods for high dimensional disease risk scores in historical data","volume":"13","author":"Kumamaru","year":"2016","journal-title":"Emerg Themes Epidemiol"},{"key":"2021110815090083600_ref12","first-page":"83","volume-title":"Data Mining and Knowledge Discovery Handbook: A Complete Guide for Practitioners and Researchers","author":"Chizi","year":"2005"},{"key":"2021110815090083600_ref13","doi-asserted-by":"crossref","first-page":"269","DOI":"10.1186\/s13059-019-1898-6","article-title":"Accuracy, robustness and scalability of dimensionality reduction methods for single-cell RNA-seq analysis","volume":"20","author":"Sun","year":"2019","journal-title":"Genome Biol"},{"key":"2021110815090083600_ref14","doi-asserted-by":"crossref","first-page":"9","DOI":"10.1186\/s13059-019-1900-3","article-title":"Benchmarking principal component analysis for large-scale single-cell RNA-sequencing","volume":"21","author":"Tsuyuzaki","year":"2020","journal-title":"Genome Biol"},{"key":"2021110815090083600_ref15","doi-asserted-by":"crossref","DOI":"10.1016\/j.celrep.2020.107576","article-title":"A quantitative framework for evaluating single-cell data structure preservation by dimensionality reduction techniques","volume":"31","author":"Heiser","year":"2020","journal-title":"Cell Rep"},{"key":"2021110815090083600_ref16","doi-asserted-by":"crossref","first-page":"825","DOI":"10.1016\/S0165-1684(02)00475-9","article-title":"Cluster validation techniques for genome expression data","volume":"83","author":"Bolshakova","year":"2003","journal-title":"Signal Processing"},{"key":"2021110815090083600_ref17","first-page":"226","article-title":"A density-based algorithm for discovering clusters in large spatial databases with noise","volume-title":"kdd","author":"Ester","year":"1996"},{"key":"2021110815090083600_ref18","doi-asserted-by":"crossref","first-page":"125","DOI":"10.1186\/s13059-019-1738-8","article-title":"Essential guidelines for computational method benchmarking","volume":"20","author":"Weber","year":"2019","journal-title":"Genome Biol"},{"key":"2021110815090083600_ref19","doi-asserted-by":"publisher","first-page":"911","DOI":"10.1109\/ICDM.2010.35","article-title":"Understanding of internal clustering validation measures","volume-title":"2010 IEEE International Conference on Data Mining","author":"Liu","year":"2010"},{"key":"2021110815090083600_ref20","doi-asserted-by":"crossref","first-page":"53","DOI":"10.1016\/0377-0427(87)90125-7","article-title":"Silhouettes: a graphical aid to the interpretation and validation of cluster analysis","volume":"20","author":"Rousseeuw","year":"1987","journal-title":"J Comput Appl Math"},{"key":"2021110815090083600_ref21","first-page":"1","article-title":"A dendrite method for cluster analysis","volume":"3","author":"Cali\u0144ski","year":"1974","journal-title":"Commun Stat"},{"key":"2021110815090083600_ref22","first-page":"224","article-title":"A cluster separation measure","volume-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence PAMI-1","author":"Davies","year":"1979"},{"key":"2021110815090083600_ref23","doi-asserted-by":"crossref","first-page":"147","DOI":"10.1038\/s41587-019-0379-5","article-title":"Droplet scRNA-seq is not zero-inflated","volume":"38","author":"Svensson","year":"2020","journal-title":"Nat Biotechnol"},{"key":"2021110815090083600_ref24","doi-asserted-by":"crossref","DOI":"10.3389\/fgene.2020.00041","article-title":"Normalization methods on single-cell RNA-seq data: an empirical survey","volume":"11","author":"Lytal","year":"2020","journal-title":"Front Genet"},{"key":"2021110815090083600_ref25","doi-asserted-by":"crossref","first-page":"295","DOI":"10.1186\/s13059-019-1861-6","article-title":"Feature selection and dimension reduction for single-cell RNA-Seq based on a multinomial model","volume":"20","author":"Townes","year":"2019","journal-title":"Genome Biol"},{"key":"2021110815090083600_ref26","doi-asserted-by":"crossref","first-page":"1093","DOI":"10.1038\/nmeth.2645","article-title":"Accounting for technical noise in single-cell RNA-seq experiments","volume":"10","author":"Brennecke","year":"2013","journal-title":"Nat Methods"},{"key":"2021110815090083600_ref27","doi-asserted-by":"crossref","first-page":"156","DOI":"10.1038\/s41587-020-00809-z","article-title":"Initialization is critical for preserving global data structure in both t-SNE and UMAP","volume":"39","author":"Kobak","year":"2021","journal-title":"Nat Biotechnol"},{"key":"2021110815090083600_ref28","doi-asserted-by":"crossref","first-page":"284","DOI":"10.1038\/s41467-017-02554-5","article-title":"A general and flexible method for signal extraction from single-cell RNA-seq data","volume":"9","author":"Risso","year":"2018","journal-title":"Nat Commun"},{"key":"2021110815090083600_ref29","doi-asserted-by":"crossref","first-page":"2674","DOI":"10.1214\/18-AOAS1177","article-title":"Variational inference for probabilistic Poisson PCA","volume":"12","author":"Chiquet","year":"2018","journal-title":"Ann Appl Stat"},{"key":"2021110815090083600_ref30","article-title":"Probabilistic count matrix factorization for single cell expression data analysis","volume-title":"Bioinformatics","author":"Durif","year":"2017"},{"key":"2021110815090083600_ref31","doi-asserted-by":"crossref","first-page":"359","DOI":"10.1038\/nmeth.4644","article-title":"scmap: projection of single-cell RNA-seq data across data sets","volume":"15","author":"Kiselev","year":"2018","journal-title":"Nat Methods"},{"key":"2021110815090083600_ref32","doi-asserted-by":"crossref","first-page":"367","DOI":"10.1038\/s41586-018-0590-4","article-title":"Single-cell transcriptomics of 20 mouse organs creates a Tabula Muris","volume":"562","author":"Schaum","year":"2018","journal-title":"Nature"},{"key":"2021110815090083600_ref33","first-page":"2825","article-title":"Scikit-learn: machine learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J Mach Learn Res"},{"key":"2021110815090083600_ref34","doi-asserted-by":"crossref","first-page":"524","DOI":"10.1111\/2041-210X.12188","article-title":"Synchrony: quantifying variability in space and time","volume":"5","author":"Gouhier","year":"2014","journal-title":"Methods Ecol Evol"},{"key":"2021110815090083600_ref35","first-page":"285","article-title":"Random search for hyper-parameter optimization","volume-title":"J Mach Learn Res","author":"Bergstra","year":"2012"},{"key":"2021110815090083600_ref36","doi-asserted-by":"crossref","first-page":"540","DOI":"10.1007\/978-3-642-00599-2_68","volume-title":"Independent Component Analysis and Signal Separation","author":"Schmidt","year":"2009"},{"key":"2021110815090083600_ref37","first-page":"849","article-title":"NIMFA: a Python Library for Nonnegative Matrix Factorization","volume":"13","author":"\u017ditnik","year":"2012","journal-title":"J Mach Learn Res"},{"key":"2021110815090083600_ref38","doi-asserted-by":"publisher","DOI":"10.1037\/11491-006","volume-title":"\u2018General Intelligence\u2019 Objectively Determined and Measured","author":"Spearman","year":"1961"},{"key":"2021110815090083600_ref39","doi-asserted-by":"crossref","first-page":"411","DOI":"10.1016\/S0893-6080(00)00026-5","article-title":"Independent component analysis: algorithms and applications","volume":"13","author":"Hyv\u00e4rinen","year":"2000","journal-title":"Neural Netw"},{"key":"2021110815090083600_ref40","article-title":"Experiments with random projection","author":"Dasgupta","year":"2013","journal-title":"arXiv"},{"key":"2021110815090083600_ref41","doi-asserted-by":"crossref","first-page":"125","DOI":"10.1007\/s11263-007-0075-7","article-title":"Incremental learning for robust visual tracking","volume":"77","author":"Ross","year":"2008","journal-title":"Int J Comput Vis"},{"key":"2021110815090083600_ref42","doi-asserted-by":"crossref","first-page":"2319","DOI":"10.1126\/science.290.5500.2319","article-title":"A global geometric framework for nonlinear dimensionality reduction","volume":"290","author":"Tenenbaum","year":"2000","journal-title":"Science"},{"key":"2021110815090083600_ref43","doi-asserted-by":"crossref","first-page":"8914","DOI":"10.1038\/s41598-019-45301-0","article-title":"Structure-preserving visualisation of high dimensional single-cell datasets","volume":"9","author":"Szubert","year":"2019","journal-title":"Sci Rep"},{"key":"2021110815090083600_ref44","doi-asserted-by":"publisher","first-page":"583","DOI":"10.1007\/BFb0020217","volume-title":"Artificial Neural Networks \u2014 ICANN\u201997","author":"Sch\u00f6lkopf","year":"1997"},{"key":"2021110815090083600_ref45","first-page":"993","article-title":"Latent Dirichlet Allocation","volume":"3","author":"Blei","year":"2003","journal-title":"J Mach Learn Res"},{"key":"2021110815090083600_ref46","doi-asserted-by":"crossref","first-page":"2756","DOI":"10.1162\/neco.2007.19.10.2756","article-title":"Projected gradient methods for nonnegative matrix factorization","volume":"19","author":"Lin","year":"2007","journal-title":"Neural Comput"},{"key":"2021110815090083600_ref47","doi-asserted-by":"crossref","first-page":"2323","DOI":"10.1126\/science.290.5500.2323","article-title":"Nonlinear dimensionality reduction by locally linear embedding","volume":"290","author":"Roweis","year":"2000","journal-title":"Science"},{"key":"2021110815090083600_ref48","first-page":"708","article-title":"Fast local algorithms for large scale nonnegative matrix and tensor factorizations","volume-title":"IEICE TRANSACTIONS on Fundamentals of Electronics, Communications and Computer SciencesE92-A","author":"Cichocki","year":"2009"},{"key":"2021110815090083600_ref49","doi-asserted-by":"crossref","first-page":"788","DOI":"10.1038\/44565","article-title":"Learning the parts of objects by non-negative matrix factorization","volume":"401","author":"Lee","year":"1999","journal-title":"Nature"},{"key":"2021110815090083600_ref50","doi-asserted-by":"crossref","first-page":"403","DOI":"10.1109\/TPAMI.2006.60","article-title":"Nonsmooth nonnegative matrix factorization (nsNMF)","volume":"28","author":"Pascual-Montano","year":"2006","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"2021110815090083600_ref51","doi-asserted-by":"crossref","DOI":"10.1038\/s41587-019-0336-3","article-title":"Visualizing structure and transitions in high-dimensional biological data","volume":"37","author":"Moon","year":"2019","journal-title":"Nature biotechnology"},{"key":"2021110815090083600_ref52","doi-asserted-by":"crossref","first-page":"217","DOI":"10.1137\/090771806","article-title":"Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions","volume":"53","author":"Halko","year":"2011","journal-title":"SIAM Rev"},{"key":"2021110815090083600_ref53","doi-asserted-by":"crossref","first-page":"559","DOI":"10.1080\/14786440109462720","article-title":"LIII.\u00a0On lines and planes of closest fit to systems of points in space","volume":"2","author":"Pearson","year":"1901","journal-title":"London Edinburgh Dublin Philos Mag J Sci"},{"key":"2021110815090083600_ref54","doi-asserted-by":"crossref","DOI":"10.1155\/2008\/764206","article-title":"Theorems on positive data: on the uniqueness of NMF","volume":"2008","author":"Laurberg","year":"2008","journal-title":"Comput Intell Neurosci"},{"key":"2021110815090083600_ref55","article-title":"Probabilistic sparse matrix factorization","volume-title":"University of Toronto technical report PSI\u20132004\u201323","author":"Dueck","year":"2004"},{"key":"2021110815090083600_ref56","doi-asserted-by":"crossref","first-page":"1139","DOI":"10.1038\/s41592-019-0576-7","article-title":"Exploring single-cell data with deep multitasking neural networks","volume":"16","author":"Amodio","year":"2019","journal-title":"Nat Methods"},{"key":"2021110815090083600_ref57","doi-asserted-by":"crossref","first-page":"1495","DOI":"10.1093\/bioinformatics\/btm134","article-title":"Sparse non-negative matrix factorizations via alternating non-negativity-constrained least squares for microarray data analysis","volume":"23","author":"Kim","year":"2007","journal-title":"Bioinformatics"},{"key":"2021110815090083600_ref58","doi-asserted-by":"crossref","first-page":"265","DOI":"10.1198\/106186006X113430","article-title":"Sparse principal component analysis","volume":"15","author":"Zou","year":"2006","journal-title":"J Comput Graph Stat"},{"key":"2021110815090083600_ref59","doi-asserted-by":"publisher","first-page":"287","DOI":"10.1145\/1150402.1150436","article-title":"Very sparse random projections","volume-title":"Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","author":"Li","year":"2006"},{"key":"2021110815090083600_ref60","doi-asserted-by":"crossref","first-page":"585","DOI":"10.7551\/mitpress\/1120.003.0080","volume-title":"Advances in Neural Information Processing Systems 14","author":"Belkin","year":"2002"},{"key":"2021110815090083600_ref61","doi-asserted-by":"crossref","first-page":"1373","DOI":"10.1162\/089976603321780317","article-title":"Laplacian eigenmaps for dimensionality reduction and data representation","volume":"15","author":"Belkin","year":"2003","journal-title":"Neural Comput"},{"key":"2021110815090083600_ref62","first-page":"2579","article-title":"Visualizing data using t-SNE","volume":"9","author":"Maaten","year":"2008","journal-title":"J Mach Learn Res"},{"key":"2021110815090083600_ref63","article-title":"Multicore-TSNE","volume-title":"GitHub repository","author":"Ulyanov","year":"2016"},{"key":"2021110815090083600_ref64","article-title":"UMAP: Uniform Manifold Approximation and Projection for dimension reduction","author":"McInnes","year":"2018","journal-title":"arXiv"},{"key":"2021110815090083600_ref65","doi-asserted-by":"crossref","DOI":"10.1016\/j.gpb.2018.08.003","article-title":"VASC: dimension reduction and visualization of single-cell RNA-seq data by deep variational autoencoder","volume":"16","author":"Wang","year":"2018","journal-title":"Genomics, proteomics & bioinformatics"},{"key":"2021110815090083600_ref66","article-title":"VPAC: variational projection for accurate clustering of single-cell transcriptomic data","volume":"20","author":"Chen","year":"2019","journal-title":"BMC Bioinformatics"},{"key":"2021110815090083600_ref67","doi-asserted-by":"crossref","first-page":"241","DOI":"10.1186\/s13059-015-0805-z","article-title":"Dimensionality reduction for zero-inflated single-cell gene expression analysis","volume":"16","author":"Pierson","year":"2015","journal-title":"Genome Biol"}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/22\/6\/bbab304\/41090155\/bbab304.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/22\/6\/bbab304\/41090155\/bbab304.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,9,4]],"date-time":"2024-09-04T09:47:15Z","timestamp":1725443235000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbab304\/6347204"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,8,10]]},"references-count":67,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2021,11,5]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbab304","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2020.10.29.361451","asserted-by":"object"}]},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2021,11]]},"published":{"date-parts":[[2021,8,10]]},"article-number":"bbab304"}}