{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,26]],"date-time":"2026-04-26T05:06:51Z","timestamp":1777180011731,"version":"3.51.4"},"reference-count":60,"publisher":"Oxford University Press (OUP)","issue":"6","license":[{"start":{"date-parts":[[2018,8,22]],"date-time":"2018-08-22T00:00:00Z","timestamp":1534896000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"name":"Australian Research Council Discovery Early Career Researcher Award","award":["DE170100759"],"award-info":[{"award-number":["DE170100759"]}]},{"name":"Australian Research Council Discovery Projects","award":["DP170100654"],"award-info":[{"award-number":["DP170100654"]}]},{"DOI":"10.13039\/501100000925","name":"National Health and Medical Research Council Career Development Fellowships","doi-asserted-by":"publisher","award":["1111338"],"award-info":[{"award-number":["1111338"]}],"id":[{"id":"10.13039\/501100000925","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Judith and David Coffey Life Lab"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2019,11,27]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Advances in high-throughput sequencing on single-cell gene expressions [single-cell RNA sequencing (scRNA-seq)] have enabled transcriptome profiling on individual cells from complex samples. A common goal in scRNA-seq data analysis is to discover and characterise cell types, typically through clustering methods. The quality of the clustering therefore plays a critical role in biological discovery. While numerous clustering algorithms have been proposed for scRNA-seq data, fundamentally they all rely on a similarity metric for categorising individual cells. Although several studies have compared the performance of various clustering algorithms for scRNA-seq data, currently there is no benchmark of different similarity metrics and their influence on scRNA-seq data clustering. Here, we compared a panel of similarity metrics on clustering a collection of annotated scRNA-seq datasets. Within each dataset, a stratified subsampling procedure was applied and an array of evaluation measures was employed to assess the similarity metrics. This produced a highly reliable and reproducible consensus on their performance assessment. Overall, we found that correlation-based metrics (e.g. Pearson\u2019s correlation) outperformed distance-based metrics (e.g. Euclidean distance). To test if the use of correlation-based metrics can benefit the recently published clustering techniques for scRNA-seq data, we modified a state-of-the-art kernel-based clustering algorithm (SIMLR) using Pearson\u2019s correlation as a similarity measure and found significant performance improvement over Euclidean distance on scRNA-seq data clustering. These findings demonstrate the importance of similarity metrics in clustering scRNA-seq data and highlight Pearson\u2019s correlation as a favourable choice. Further comparison on different scRNA-seq library preparation protocols suggests that they may also affect clustering performance. Finally, the benchmarking framework is available at http:\/\/www.maths.usyd.edu.au\/u\/SMS\/bioinformatics\/software.html.<\/jats:p>","DOI":"10.1093\/bib\/bby076","type":"journal-article","created":{"date-parts":[[2018,8,6]],"date-time":"2018-08-06T12:03:31Z","timestamp":1533557011000},"page":"2316-2326","source":"Crossref","is-referenced-by-count":124,"title":["Impact of similarity metrics on single-cell RNA-seq data clustering"],"prefix":"10.1093","volume":"20","author":[{"given":"Taiyun","family":"Kim","sequence":"first","affiliation":[{"name":"School of Mathematics and Statistics, The University of Sydney, Sydney, NSW 2006, Australia"}]},{"given":"Irene Rui","family":"Chen","sequence":"additional","affiliation":[{"name":"School of Mathematics and Statistics, The University of Sydney, Sydney, NSW 2006, Australia"}]},{"given":"Yingxin","family":"Lin","sequence":"additional","affiliation":[{"name":"School of Mathematics and Statistics, The University of Sydney, Sydney, NSW 2006, Australia"}]},{"given":"Andy Yi-Yang","family":"Wang","sequence":"additional","affiliation":[{"name":"Department of Anaesthesia, The University of Sydney Northern Clinical School, The University of Sydney, Sydney, NSW 2006, Australia"}]},{"given":"Jean Yee Hwa","family":"Yang","sequence":"additional","affiliation":[{"name":"School of Mathematics and Statistics, The University of Sydney, Sydney, NSW 2006, Australia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1098-3138","authenticated-orcid":false,"given":"Pengyi","family":"Yang","sequence":"additional","affiliation":[{"name":"School of Mathematics and Statistics, The University of Sydney, Sydney, NSW 2006, Australia"}]}],"member":"286","published-online":{"date-parts":[[2018,8,22]]},"reference":[{"key":"2020011102342976000_ref1","doi-asserted-by":"crossref","first-page":"776","DOI":"10.1126\/science.1247651","article-title":"Massively parallel single-cell RNA-seq for marker-free decomposition of tissues into cell types","volume":"343","author":"Jaitin","year":"2014","journal-title":"Science"},{"key":"2020011102342976000_ref2","doi-asserted-by":"crossref","first-page":"610","DOI":"10.1016\/j.molcel.2015.04.005","article-title":"The technology and biology of single-cell RNA sequencing","volume":"58","author":"Kolodziejczyk","year":"2015","journal-title":"Mol Cell"},{"key":"2020011102342976000_ref3","doi-asserted-by":"crossref","first-page":"133","DOI":"10.1038\/nrg3833","article-title":"Computational and analytical challenges in single-cell transcriptomics","volume":"16","author":"Stegle","year":"2015","journal-title":"Nat Rev Genet"},{"key":"2020011102342976000_ref4","doi-asserted-by":"crossref","first-page":"1179","DOI":"10.1093\/bioinformatics\/btw777","article-title":"Scater: pre-processing, quality control, normalization and visualization of single-cell RNA-seq data in R","volume":"33","author":"McCarthy","year":"2017","journal-title":"Bioinformatics"},{"key":"2020011102342976000_ref5","doi-asserted-by":"crossref","DOI":"10.1186\/s13059-016-0927-y","article-title":"Design and computational analysis of single-cell RNA-sequencing experiments","volume":"17","author":"Bacher","year":"2016","journal-title":"Genome Biol"},{"key":"2020011102342976000_ref6","doi-asserted-by":"crossref","first-page":"251","DOI":"10.1038\/nature14966","article-title":"Single-cell messenger RNA sequencing reveals rare intestinal cell types","volume":"525","author":"Gr\u00fcn","year":"2015","journal-title":"Nature"},{"key":"2020011102342976000_ref7","doi-asserted-by":"crossref","first-page":"1138","DOI":"10.1126\/science.aaa1934","article-title":"Brain structure. Cell types in the mouse cortex and hippocampus revealed by single-cell RNA-seq","volume":"347","author":"Zeisel","year":"2015","journal-title":"Science"},{"key":"2020011102342976000_ref8","doi-asserted-by":"crossref","first-page":"1131","DOI":"10.1038\/nn.4366","article-title":"Disentangling neural cell diversity using single-cell transcriptomics","volume":"19","author":"Poulin","year":"2016","journal-title":"Nat Neurosci"},{"key":"2020011102342976000_ref9","doi-asserted-by":"crossref","first-page":"493","DOI":"10.1038\/nmeth.3863","article-title":"Automated mapping of phenotype space with single-cell data","volume":"13","author":"Samusik","year":"2016","journal-title":"Nat Methods"},{"key":"2020011102342976000_ref10","doi-asserted-by":"crossref","first-page":"335","DOI":"10.1038\/nn.4216","article-title":"Adult mouse cortical cell taxonomy revealed by single cell transcriptomics","volume":"19","author":"Tasic","year":"2016","journal-title":"Nat Neurosci"},{"key":"2020011102342976000_ref11","doi-asserted-by":"crossref","first-page":"708","DOI":"10.1038\/ng.3818","article-title":"Reference component analysis of single-cell transcriptomes elucidates cellular heterogeneity in human colorectal tumors","volume":"49","author":"Li","year":"2017","journal-title":"Nat Genet"},{"key":"2020011102342976000_ref12","doi-asserted-by":"crossref","DOI":"10.1371\/journal.pcbi.1006245","article-title":"Exploring the single-cell RNA-seq analysis landscape with the scRNA-tools database","volume":"14","author":"Zappia","year":"2018","journal-title":"PLoS Comput Biol"},{"key":"2020011102342976000_ref13","doi-asserted-by":"crossref","first-page":"155","DOI":"10.1038\/nbt.3102","article-title":"Computational analysis of cell-to-cell heterogeneity in single-cell RNA-sequencing data reveals hidden subpopulations of cells","volume":"33","author":"Buettner","year":"2015","journal-title":"Nat Biotechnol"},{"key":"2020011102342976000_ref14","doi-asserted-by":"crossref","first-page":"266","DOI":"10.1016\/j.stem.2016.05.010","article-title":"De novo prediction of stem cell identity using single-cell transcriptome data","volume":"19","author":"Gr\u00fcn","year":"2016","journal-title":"Cell Stem Cell"},{"key":"2020011102342976000_ref15","doi-asserted-by":"crossref","first-page":"483","DOI":"10.1038\/nmeth.4236","article-title":"SC3: consensus clustering of single-cell RNA-seq data","volume":"14","author":"Kiselev","year":"2017","journal-title":"Nat Methods"},{"key":"2020011102342976000_ref16","doi-asserted-by":"crossref","DOI":"10.1186\/s13059-017-1188-0","article-title":"CIDR: ultrafast and accurate clustering through imputation for single-cell RNA-seq data","volume":"18","author":"Lin","year":"2017","journal-title":"Genome Biol"},{"key":"2020011102342976000_ref17","article-title":"Visualizing the structure of RNA-seq expression data using grade of membership models","volume":"13","author":"Dey","year":"2017","journal-title":"PLoS Genet"},{"key":"2020011102342976000_ref18","doi-asserted-by":"crossref","first-page":"1202","DOI":"10.1016\/j.cell.2015.05.002","article-title":"Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets","volume":"161","author":"Macosko","year":"2015","journal-title":"Cell"},{"key":"2020011102342976000_ref19","doi-asserted-by":"crossref","first-page":"414","DOI":"10.1038\/nmeth.4207","article-title":"Visualization and analysis of single-cell RNA-seq data by kernel-based similarity learning","volume":"14","author":"Wang","year":"2017","journal-title":"Nat Methods"},{"key":"2020011102342976000_ref20","doi-asserted-by":"crossref","DOI":"10.1371\/journal.pone.0144059","article-title":"A comparison study on similarity and dissimilarity measures in clustering continuous data","volume":"10","author":"Shirkhorshidi","year":"2015","journal-title":"PLoS One"},{"key":"2020011102342976000_ref21","doi-asserted-by":"crossref","DOI":"10.1137\/1.9781611972788.22","article-title":"Similarity measures for categorical data: a comparative evaluation","author":"Boriah","year":"2008"},{"key":"2020011102342976000_ref22","doi-asserted-by":"crossref","DOI":"10.1109\/ICPR.2006.392","article-title":"Comparison of similarity measures for trajectory clustering in outdoor surveillance scenes","author":"Zhang","year":"2006"},{"key":"2020011102342976000_ref23","doi-asserted-by":"crossref","first-page":"70","DOI":"10.1109\/COMST.2014.2336610","article-title":"A survey of distance and similarity measures used within network intrusion anomaly detection","volume":"17","author":"Weller-Fahy","year":"2015","journal-title":"IEEE Commun Surv Tutor"},{"key":"2020011102342976000_ref24","first-page":"9","article-title":"Clustering techniques and the similarity measures used in clustering: a survey","volume":"134","author":"Irani","year":"2016","journal-title":"Int J Comput Appl"},{"key":"2020011102342976000_ref25","doi-asserted-by":"crossref","first-page":"240","DOI":"10.1093\/bfgp\/elx044","article-title":"Clustering single cells: a review of approaches on high- and low-depth single-cell RNA-seq data","volume":"17","author":"Menon","year":"2018","journal-title":"Brief Funct Genomics"},{"key":"2020011102342976000_ref26","doi-asserted-by":"crossref","first-page":"D991","DOI":"10.1093\/nar\/gks1193","article-title":"NCBI GEO: archive for functional genomics data sets\u2014update","volume":"41","author":"Barrett","year":"2013","journal-title":"Nucleic Acids Res"},{"key":"2020011102342976000_ref27","doi-asserted-by":"crossref","first-page":"D926","DOI":"10.1093\/nar\/gkt1270","article-title":"Expression Atlas update\u2014a database of gene and transcript expression from microarray- and sequencing-based functional genomics experiments","volume":"42","author":"Petryszak","year":"2014","journal-title":"Nucleic Acids Res"},{"key":"2020011102342976000_ref28","doi-asserted-by":"crossref","first-page":"193","DOI":"10.1126\/science.1245316","article-title":"Single-cell RNA-seq reveals dynamic, random monoallelic gene expression in mammalian cells","volume":"343","author":"Deng","year":"2014","journal-title":"Science"},{"key":"2020011102342976000_ref29","doi-asserted-by":"crossref","first-page":"1437","DOI":"10.1016\/j.cell.2015.05.015","article-title":"The transcriptome and DNA methylome landscapes of human primordial germ cells","volume":"161","author":"Guo","year":"2015","journal-title":"Cell"},{"key":"2020011102342976000_ref30","doi-asserted-by":"crossref","DOI":"10.1073\/pnas.1507125112","article-title":"A survey of human brain transcriptome diversity at the single cell level","volume":"112","author":"Darmanis","year":"2015","journal-title":"Proc Natl Acad Sci U S A"},{"key":"2020011102342976000_ref31","doi-asserted-by":"crossref","first-page":"1126","DOI":"10.1016\/j.celrep.2016.06.059","article-title":"Cellular taxonomy of the mouse striatum as revealed by single-cell RNA-seq","volume":"16","author":"Gokce","year":"2016","journal-title":"Cell Rep"},{"key":"2020011102342976000_ref32","doi-asserted-by":"crossref","first-page":"15672","DOI":"10.1073\/pnas.1520760112","article-title":"Human cerebral organoids recapitulate gene expression programs of fetal neocortex development","volume":"112","author":"Camp","year":"2015","journal-title":"Proc Natl Acad Sci U S A"},{"key":"2020011102342976000_ref33","doi-asserted-by":"crossref","first-page":"173","DOI":"10.1186\/s13059-016-1033-x","article-title":"Single-cell RNA-seq reveals novel regulators of human embryonic stem cell differentiation to definitive endoderm","volume":"17","author":"Chu","year":"2016","journal-title":"Genome Biol"},{"key":"2020011102342976000_ref34","doi-asserted-by":"crossref","first-page":"346","DOI":"10.1016\/j.cels.2016.08.011","article-title":"A single-cell transcriptomic map of the human and mouse pancreas reveals inter- and intra-cell population structure","volume":"3","author":"Baron","year":"2016","journal-title":"Cell Syst"},{"key":"2020011102342976000_ref35","doi-asserted-by":"crossref","first-page":"2861","DOI":"10.1084\/jem.20161135","article-title":"Human dendritic cells (DCs) are derived from distinct circulating precursors that are precommitted to become CD1c+ or CD141+ DCs","volume":"213","author":"Breton","year":"2016","journal-title":"J Exp Med"},{"key":"2020011102342976000_ref36","doi-asserted-by":"crossref","first-page":"eaah4573","DOI":"10.1126\/science.aah4573","article-title":"Single-cell RNA-seq reveals new types of human blood dendritic cells, monocytes, and progenitors","volume":"356","author":"Villani","year":"2017","journal-title":"Science"},{"key":"2020011102342976000_ref37","doi-asserted-by":"crossref","first-page":"4","DOI":"10.1038\/nature18633","article-title":"Resolving early mesoderm diversification through single-cell expression profiling","volume":"535","author":"Scialdone","year":"2016","journal-title":"Nature"},{"key":"2020011102342976000_ref38","doi-asserted-by":"crossref","first-page":"925","DOI":"10.1126\/science.aad7038","article-title":"Div-seq: single-nucleus RNA-seq reveals dynamics of rare adult newborn neurons","volume":"353","author":"Habib","year":"2016","journal-title":"Science"},{"key":"2020011102342976000_ref39","doi-asserted-by":"crossref","first-page":"1860","DOI":"10.1101\/gr.192237.115","article-title":"Single-cell RNA-seq reveals changes in cell cycle and differentiation programs upon aging of hematopoietic stem cells","volume":"25","author":"Kowalczyk","year":"2015","journal-title":"Genome Res"},{"key":"2020011102342976000_ref40","doi-asserted-by":"crossref","first-page":"1012","DOI":"10.1016\/j.cell.2016.03.023","article-title":"Single-cell RNA-seq reveals lineage and X chromosome dynamics in human preimplantation embryos","volume":"165","author":"Petropoulos","year":"2016","journal-title":"Cell"},{"key":"2020011102342976000_ref41","doi-asserted-by":"crossref","first-page":"1035","DOI":"10.1016\/j.neuron.2017.02.014","article-title":"Single-cell profiling of an in vitro model of human interneuron development reveals temporal dynamics of cell type production and maturation","volume":"93","author":"Close","year":"2017","journal-title":"Neuron"},{"key":"2020011102342976000_ref42","doi-asserted-by":"crossref","first-page":"858","DOI":"10.1016\/j.stem.2017.03.007","article-title":"Single-cell RNA-seq analysis maps development of human germline cells and gonadal niche interactions","volume":"20","author":"Li","year":"2017","journal-title":"Cell Stem Cell"},{"key":"2020011102342976000_ref43","doi-asserted-by":"crossref","first-page":"309","DOI":"10.1038\/nature20123","article-title":"Single-cell RNA-seq supports a developmental hierarchy in human oligodendroglioma","volume":"539","author":"Tirosh","year":"2016","journal-title":"Nature"},{"key":"2020011102342976000_ref44","doi-asserted-by":"crossref","first-page":"189","DOI":"10.1126\/science.aad0501","article-title":"Dissecting the multicellular ecosystem of metastatic melanoma by single-cell RNA-seq","volume":"352","author":"Tirosh","year":"2016","journal-title":"Science"},{"key":"2020011102342976000_ref45","doi-asserted-by":"crossref","first-page":"955","DOI":"10.1038\/nmeth.4407","article-title":"Massively parallel single-nucleus RNA-seq with DroNc-seq","volume":"14","author":"Habib","year":"2017","journal-title":"Nat Methods"},{"key":"2020011102342976000_ref46","doi-asserted-by":"crossref","first-page":"1308","DOI":"10.1016\/j.cell.2016.07.054","article-title":"Comprehensive classification of retinal bipolar neurons by single-cell transcriptomics","volume":"166","author":"Shekhar","year":"2016","journal-title":"Cell"},{"key":"2020011102342976000_ref47","doi-asserted-by":"crossref","first-page":"139","DOI":"10.1093\/bioinformatics\/btp616","article-title":"edgeR: a bioconductor package for differential expression analysis of digital gene expression data","volume":"26","author":"Robinson","year":"2010","journal-title":"Bioinformatics"},{"key":"2020011102342976000_ref48","doi-asserted-by":"crossref","first-page":"e179","DOI":"10.1093\/nar\/gkx828","article-title":"Linnorm: improved statistical analysis for single cell RNA-seq expression data","volume":"45","author":"Yip","year":"2017","journal-title":"Nucleic Acids Res"},{"key":"2020011102342976000_ref49","doi-asserted-by":"crossref","first-page":"539","DOI":"10.1038\/s41592-018-0033-z","article-title":"SAVER: gene expression recovery for single-cell RNA sequencing","volume":"15","author":"Huang","year":"2018","journal-title":"Nat Methods"},{"key":"2020011102342976000_ref50","doi-asserted-by":"crossref","first-page":"129","DOI":"10.1109\/TIT.1982.1056489","article-title":"Least squares quantization in PCM","volume":"28","author":"Lloyd","year":"1982","journal-title":"IEEE Trans Inf Theory"},{"key":"2020011102342976000_ref51","first-page":"1","article-title":"Comparing clusterings\u2014an overview","volume":"4769","author":"Wagner","year":"2007","journal-title":"Analysis"},{"key":"2020011102342976000_ref52","doi-asserted-by":"crossref","first-page":"997","DOI":"10.1038\/s41467-018-03405-7","article-title":"An accurate and robust imputation method scImpute for single-cell RNA-seq data","volume":"9","author":"Li","year":"2018","journal-title":"Nat Commun"},{"key":"2020011102342976000_ref53","doi-asserted-by":"crossref","first-page":"1096","DOI":"10.1038\/nmeth.2639","article-title":"Smart-seq2 for sensitive full-length transcriptome profiling in single cells","volume":"10","author":"Picelli","year":"2013","journal-title":"Nat Methods"},{"key":"2020011102342976000_ref54","doi-asserted-by":"crossref","first-page":"41","DOI":"10.1038\/nmeth.2694","article-title":"Quantitative assessment of single-cell RNA-sequencing methods","volume":"11","author":"Wu","year":"2014","journal-title":"Nat Methods"},{"key":"2020011102342976000_ref55","doi-asserted-by":"crossref","first-page":"1187","DOI":"10.1016\/j.cell.2015.04.044","article-title":"Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells","volume":"161","author":"Klein","year":"2015","journal-title":"Cell"},{"key":"2020011102342976000_ref56","doi-asserted-by":"crossref","first-page":"381","DOI":"10.1038\/nmeth.4220","article-title":"Power analysis of single-cell RNA-sequencing experiments","volume":"14","author":"Svensson","year":"2017","journal-title":"Nat Methods"},{"key":"2020011102342976000_ref57","doi-asserted-by":"crossref","first-page":"631","DOI":"10.1016\/j.molcel.2017.01.023","article-title":"Comparative analysis of single-cell RNA sequencing methods","volume":"65","author":"Ziegenhain","year":"2017","journal-title":"Mol Cell"},{"key":"2020011102342976000_ref58","doi-asserted-by":"crossref","first-page":"75","DOI":"10.1186\/s13059-016-0947-7","article-title":"Pooling across cells to normalize single-cell RNA sequencing data with many zero counts","volume":"17","author":"Lun","year":"2016","journal-title":"Genome Biol"},{"key":"2020011102342976000_ref59","doi-asserted-by":"crossref","first-page":"740","DOI":"10.1038\/nmeth.2967","article-title":"Bayesian approach to single-cell differential expression analysis","volume":"11","author":"Kharchenko","year":"2014","journal-title":"Nat Methods"},{"key":"2020011102342976000_ref60","doi-asserted-by":"crossref","first-page":"54","DOI":"10.1016\/j.ymeth.2015.06.021","article-title":"Computational assignment of cell-cycle stage from single-cell transcriptome data","volume":"85","author":"Scialdone","year":"2015","journal-title":"Methods"}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bib\/article-pdf\/20\/6\/2316\/31789357\/bby076.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"http:\/\/academic.oup.com\/bib\/article-pdf\/20\/6\/2316\/31789357\/bby076.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,9,4]],"date-time":"2023-09-04T05:21:27Z","timestamp":1693804887000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/20\/6\/2316\/5077112"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,8,22]]},"references-count":60,"journal-issue":{"issue":"6","published-online":{"date-parts":[[2018,8,22]]},"published-print":{"date-parts":[[2019,11,27]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bby076","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2019,11]]},"published":{"date-parts":[[2018,8,22]]}}}