{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,30]],"date-time":"2025-07-30T11:44:26Z","timestamp":1753875866114,"version":"3.41.2"},"reference-count":55,"publisher":"Oxford University Press (OUP)","issue":"6","license":[{"start":{"date-parts":[[2021,6,15]],"date-time":"2021-06-15T00:00:00Z","timestamp":1623715200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61972185","U1909208"],"award-info":[{"award-number":["61972185","U1909208"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100013314","name":"111 Project","doi-asserted-by":"publisher","award":["B18059"],"award-info":[{"award-number":["B18059"]}],"id":[{"id":"10.13039\/501100013314","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Hunan Provincial Science and Technology Program","award":["2018WK4001"],"award-info":[{"award-number":["2018WK4001"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,11,5]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>In single-cell RNA-seq (scRNA-seq) data analysis, a fundamental problem is to determine the number of cell clusters based on the gene expression profiles. However, the performance of current methods is still far from satisfactory, presumably due to their limitations in capturing the expression variability among cell clusters. Batch effects represent the undesired variability between data measured in different batches. They occur when data are obtained from different labs or protocols. Motivated by the practice of batch effect removal, we considered cell clusters as batches. 
We hypothesized that the number of cell clusters (i.e. batches) could be correctly determined if the variances among clusters (i.e. batch effects) were removed. We developed a new method, namely, removal of batch effect and testing (REBET), for determining the number of cell clusters. In this method, cells are first partitioned into k clusters. Second, the batch effects among these k clusters are removed. Third, the quality of batch effect removal is evaluated with the average range of normalized mutual information (ARNMI), which measures how uniformly the cells are mixed after batch effect removal. By testing a range of k values, the k value that corresponds to the lowest ARNMI is determined to be the optimal number of clusters. We compared REBET with state-of-the-art methods on 32 simulated datasets and 14 published scRNA-seq datasets. The results show that REBET can accurately and robustly estimate the number of cell clusters and outperforms existing methods. Contact: H.D.L. (hongdong@csu.edu.cn) or Q.S.X. (qsxu@csu.edu.cn)<\/jats:p>","DOI":"10.1093\/bib\/bbab204","type":"journal-article","created":{"date-parts":[[2021,5,10]],"date-time":"2021-05-10T20:26:12Z","timestamp":1620678372000},"source":"Crossref","is-referenced-by-count":2,"title":["REBET: a method to determine the number of cell clusters based on batch effect removal"],"prefix":"10.1093","volume":"22","author":[{"given":"Zhao-Yu","family":"Fang","sequence":"first","affiliation":[{"name":"School of Mathematics and Statistics, Central South University, Changsha, Hunan 410083, P.R. China"}]},{"given":"Cui-Xiang","family":"Lin","sequence":"additional","affiliation":[{"name":"Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, Hunan 410083, P.R. China"},{"name":"School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, P.R. 
China"}]},{"given":"Yun-Pei","family":"Xu","sequence":"additional","affiliation":[{"name":"Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, Hunan 410083, P.R. China"},{"name":"School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, P.R. China"}]},{"given":"Hong-Dong","family":"Li","sequence":"additional","affiliation":[{"name":"Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, Hunan 410083, P.R. China"},{"name":"School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, P.R. China"}]},{"given":"Qing-Song","family":"Xu","sequence":"additional","affiliation":[{"name":"School of Mathematics and Statistics, Central South University, Changsha, Hunan 410083, P.R. China"}]}],"member":"286","published-online":{"date-parts":[[2021,6,15]]},"reference":[{"issue":"6","key":"2021110815065735800_ref1","doi-asserted-by":"crossref","first-page":"1905","DOI":"10.1016\/j.celrep.2014.08.029","article-title":"Single-Cell RNA sequencing identifies extracellular matrix gene expression by pancreatic circulating tumor cells","volume":"8","author":"Ting","year":"2014","journal-title":"Cell Rep"},{"key":"2021110815065735800_ref2","doi-asserted-by":"crossref","first-page":"371","DOI":"10.3389\/fgene.2019.00371","article-title":"High-order correlation integration for single-cell or bulk RNA-seq data analysis","volume":"10","author":"Tang","year":"2019","journal-title":"Front Genet"},{"issue":"23","key":"2021110815065735800_ref3","doi-asserted-by":"crossref","first-page":"7285","DOI":"10.1073\/pnas.1507125112","article-title":"A survey of human brain transcriptome diversity at the single cell level","volume":"112","author":"Darmanis","year":"2015","journal-title":"Proc Natl Acad Sci U S A"},{"key":"2021110815065735800_ref4","doi-asserted-by":"crossref","DOI":"10.1038\/ncomms15081","article-title":"Single-cell RNA-seq enables comprehensive tumour and immune cell 
profiling in primary breast cancer","volume":"8","author":"Chung","year":"2017","journal-title":"Nat Commun"},{"issue":"8","key":"2021110815065735800_ref5","doi-asserted-by":"crossref","first-page":"777","DOI":"10.1038\/nbt.2282","article-title":"Full-length mRNA-Seq from single-cell levels of RNA and individual circulating tumor cells","volume":"30","author":"Ramsk\u00f6ld","year":"2012","journal-title":"Nat Biotechnol"},{"key":"2021110815065735800_ref6","first-page":"226","article-title":"A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of 2nd International Conference on Knowledge Discovery and Data Mining(KDD-96)","volume":"1996","author":"Ester","journal-title":"AAAI Press"},{"issue":"9","key":"2021110815065735800_ref7","doi-asserted-by":"crossref","first-page":"1464","DOI":"10.1109\/5.58325","article-title":"The self-organizing map","volume":"78","author":"Kohonen","year":"1990","journal-title":"Proc IEEE"},{"issue":"4","key":"2021110815065735800_ref8","doi-asserted-by":"crossref","first-page":"395","DOI":"10.1007\/s11222-007-9033-z","article-title":"A tutorial on spectral clustering","volume":"17","author":"Luxburg","year":"2007","journal-title":"Stat Comput"},{"key":"2021110815065735800_ref9","first-page":"478","article-title":"Unsupervised deep embedding for clustering analysis","volume-title":"33rd International Conference on Machine Learning, ICML 2016","author":"Xie","year":"2016"},{"key":"2021110815065735800_ref10","doi-asserted-by":"crossref","first-page":"91","DOI":"10.1023\/A:1023949509487","article-title":"Consensus clustering: a resampling-based method for class discovery and visualization of gene expression microarray data","volume":"52","author":"Monti","year":"2003","journal-title":"Mach Learn"},{"issue":"12","key":"2021110815065735800_ref11","doi-asserted-by":"crossref","first-page":"1974","DOI":"10.1093\/bioinformatics\/btv088","article-title":"Identification of cell types from 
single-cell transcriptomes using a novel clustering method","volume":"31","author":"Xu","year":"2015","journal-title":"Bioinformatics"},{"issue":"5","key":"2021110815065735800_ref12","doi-asserted-by":"crossref","first-page":"483","DOI":"10.1038\/nmeth.4236","article-title":"SC3: consensus clustering of single-cell RNA-seq data","volume":"14","author":"Kiselev","year":"2017","journal-title":"Nat Methods"},{"issue":"4","key":"2021110815065735800_ref13","doi-asserted-by":"crossref","first-page":"414","DOI":"10.1038\/nmeth.4207","article-title":"Visualization and analysis of single-cell RNA-seq data by kernel-based similarity learning","volume":"14","author":"Wang","year":"2017","journal-title":"Nat Methods"},{"issue":"5","key":"2021110815065735800_ref14","doi-asserted-by":"crossref","first-page":"411","DOI":"10.1038\/nbt.4096","article-title":"Integrating single-cell transcriptomic data across different conditions, technologies, and species","volume":"36","author":"Butler","year":"2018","journal-title":"Nat Biotechnol"},{"issue":"7","key":"2021110815065735800_ref15","doi-asserted-by":"crossref","first-page":"1888","DOI":"10.1016\/j.cell.2019.05.031","article-title":"Comprehensive integration of single-cell data","volume":"177","author":"Stuart","year":"2019","journal-title":"Cell"},{"issue":"5","key":"2021110815065735800_ref16","doi-asserted-by":"crossref","first-page":"273","DOI":"10.1038\/s41576-018-0088-9","article-title":"Challenges in unsupervised clustering of single-cell RNA-seq data","volume":"20","author":"Kiselev","year":"2019","journal-title":"Nat Rev Genet"},{"key":"2021110815065735800_ref17","doi-asserted-by":"crossref","first-page":"59","DOI":"10.1186\/s13059-017-1188-0","article-title":"Ultrafast and accurate clustering through imputation for single-cell RNA-seq data","volume":"18","author":"Lin","year":"2017","journal-title":"Genome 
Biol"},{"issue":"4","key":"2021110815065735800_ref18","doi-asserted-by":"crossref","first-page":"191","DOI":"10.1038\/s42256-019-0037-0","article-title":"Clustering single-cell RNA-seq data with a model-based deep learning approach","volume":"1","author":"Tian","year":"2019","journal-title":"Nat Mach Intell"},{"issue":"7","key":"2021110815065735800_ref19","doi-asserted-by":"crossref","DOI":"10.1186\/gb-2002-3-7-research0036","article-title":"A prediction-based resampling method for estimating the number of clusters in a dataset","volume":"3","author":"Dudoit","year":"2002","journal-title":"Genome Biol"},{"issue":"2","key":"2021110815065735800_ref20","doi-asserted-by":"crossref","first-page":"411","DOI":"10.1111\/1467-9868.00293","article-title":"Estimating the number of clusters in a data set via the gap statistic","volume":"63","author":"Tibshirani","year":"2001","journal-title":"J R STAT SOC B"},{"issue":"8","key":"2021110815065735800_ref21","first-page":"1269","article-title":"Deep-learning approach to identifying cancer subtypes using high-dimensional genomic data","volume":"35","author":"Chen","year":"2019","journal-title":"Bioinformatics"},{"issue":"2","key":"2021110815065735800_ref22","doi-asserted-by":"crossref","first-page":"289","DOI":"10.1007\/BF02099779","article-title":"Level spacing distributions and the bessel kernel","volume":"161","author":"Tracy","year":"1994","journal-title":"Commun Math Phys"},{"issue":"8","key":"2021110815065735800_ref23","doi-asserted-by":"crossref","first-page":"1269","DOI":"10.1093\/bioinformatics\/bty793","article-title":"SAFE-clustering: Single-cell aggregated (from ensemble) clustering for single-cell RNA-seq data","volume":"35","author":"Yang","year":"2018","journal-title":"Bioinformatics"},{"key":"2021110815065735800_ref24","doi-asserted-by":"crossref","first-page":"3155","DOI":"10.1038\/s41467-020-16904-3","article-title":"An entropy-based metric for assessing the purity of single cell 
populations","volume":"11","author":"Liu","year":"2020","journal-title":"Nat Commun"},{"issue":"11","key":"2021110815065735800_ref25","doi-asserted-by":"crossref","first-page":"4245","DOI":"10.1073\/pnas.1208949110","article-title":"Pattern discovery and cancer gene identification in integrated cancer genomic data","volume":"110","author":"Mo","year":"2013","journal-title":"Proc Natl Acad Sci U S A"},{"issue":"3","key":"2021110815065735800_ref26","doi-asserted-by":"crossref","first-page":"333","DOI":"10.1038\/nmeth.2810","article-title":"Similarity network fusion for aggregating data types on a genomic scale","volume":"11","author":"Wang","year":"2014","journal-title":"Nat Methods"},{"issue":"12","key":"2021110815065735800_ref27","doi-asserted-by":"crossref","first-page":"2025","DOI":"10.1101\/gr.215129.116","article-title":"A novel approach for data integration and disease subtyping","volume":"27","author":"Nguyen","year":"2017","journal-title":"Genome Res"},{"key":"2021110815065735800_ref28","doi-asserted-by":"crossref","first-page":"83","DOI":"10.3389\/fgene.2018.00083","article-title":"Adjusting for batch effects in DNA methylation microarray data, a lesson learned","volume":"9","author":"Price","year":"2018","journal-title":"Front Genet"},{"issue":"9","key":"2021110815065735800_ref29","doi-asserted-by":"crossref","first-page":"1724","DOI":"10.1371\/journal.pgen.0030161","article-title":"Capturing heterogeneity in gene expression studies by surrogate variable analysis","volume":"3","author":"Leek","year":"2007","journal-title":"PLoS Genet"},{"issue":"4","key":"2021110815065735800_ref30","doi-asserted-by":"crossref","first-page":"265","DOI":"10.1016\/S1046-2023(03)00155-5","article-title":"Normalization of cDNA microarray data","volume":"31","author":"Smyth","year":"2003","journal-title":"Methods"},{"issue":"7","key":"2021110815065735800_ref31","doi-asserted-by":"crossref","first-page":"e47","DOI":"10.1093\/nar\/gkv007","article-title":"Limma powers 
differential expression analyses for RNA-sequencing and microarray studies","volume":"43","author":"Ritchie","year":"2015","journal-title":"Nucleic Acids Res"},{"issue":"1","key":"2021110815065735800_ref32","doi-asserted-by":"crossref","first-page":"118","DOI":"10.1093\/biostatistics\/kxj037","article-title":"Adjusting batch effects in microarray expression data using empirical Bayes methods","volume":"8","author":"Johnson","year":"2007","journal-title":"Biostatistics"},{"key":"2021110815065735800_ref33","doi-asserted-by":"crossref","DOI":"10.1186\/s13059-019-1850-9","article-title":"A benchmark of batch-effect correction methods for single-cell RNA sequencing data","volume":"21","author":"Tran","year":"2020","journal-title":"Genome Biol"},{"issue":"6","key":"2021110815065735800_ref34","doi-asserted-by":"crossref","first-page":"882","DOI":"10.1093\/bioinformatics\/bts034","article-title":"The sva package for removing batch effects and other unwanted variation in high-throughput experiments","volume":"28","author":"Leek","year":"2012","journal-title":"Bioinformatics"},{"issue":"4","key":"2021110815065735800_ref35","doi-asserted-by":"crossref","first-page":"305","DOI":"10.1002\/widm.32","article-title":"Cluster ensembles","volume":"1","author":"Ghosh","year":"2011","journal-title":"WIREs Data Mining Knowl Discov"},{"key":"2021110815065735800_ref36","doi-asserted-by":"crossref","DOI":"10.1186\/s13059-017-1305-0","article-title":"Splatter: simulation of single-cell RNA sequencing data","volume":"18","author":"Zappia","year":"2017","journal-title":"Genome Biol"},{"key":"2021110815065735800_ref37","doi-asserted-by":"crossref","first-page":"157","DOI":"10.1109\/BIBM.2018.8621275","article-title":"BioRank: A similarity assessment method for single cell clustering","volume-title":"In: 2018 IEEE International Conference on Bioinformatics and Biomedicine 
(BIBM)","author":"Xu","year":"2018"},{"issue":"7604","key":"2021110815065735800_ref38","doi-asserted-by":"crossref","first-page":"487","DOI":"10.1038\/nature17997","article-title":"Tracing haematopoietic stem cell formation at single-cell resolution","volume":"533","author":"Zhou","year":"2016","journal-title":"Nature"},{"issue":"11","key":"2021110815065735800_ref39","doi-asserted-by":"crossref","first-page":"1787","DOI":"10.1101\/gr.177725.114","article-title":"Cell fate inclination within 2-cell and 4-cell mouse embryos revealed by single-cell RNA sequencing","volume":"24","author":"Biase","year":"2014","journal-title":"Genome Res"},{"issue":"9","key":"2021110815065735800_ref40","doi-asserted-by":"crossref","first-page":"1131","DOI":"10.1038\/nsmb.2660","article-title":"Single-cell RNA-Seq profiling of human preimplantation embryos and embryonic stem cells","volume":"20","author":"Yan","year":"2013","journal-title":"Nat Struct Mol Biol"},{"issue":"1","key":"2021110815065735800_ref41","doi-asserted-by":"crossref","first-page":"61","DOI":"10.1016\/j.cell.2016.01.047","article-title":"Heterogeneity in Oct4 and Sox2 targets Biases cell fate in 4-cell mouse embryos","volume":"165","author":"Goolam","year":"2016","journal-title":"Cell"},{"issue":"6167","key":"2021110815065735800_ref42","doi-asserted-by":"crossref","first-page":"193","DOI":"10.1126\/science.1245316","article-title":"Single-cell RNA-Seq reveals dynamic, random monoallelic gene expression in mammalian cells","volume":"343","author":"Deng","year":"2014","journal-title":"Science"},{"key":"2021110815065735800_ref43","doi-asserted-by":"crossref","DOI":"10.1038\/ncomms11075","article-title":"Single-cell RNA sequencing reveals molecular and functional platelet bias of aged haematopoietic stem cells","volume":"7","author":"Grover","year":"2016","journal-title":"Nat 
Commun"},{"issue":"6190","key":"2021110815065735800_ref44","doi-asserted-by":"crossref","first-page":"1396","DOI":"10.1126\/science.1254257","article-title":"Single-cell RNA-seq highlights intratumoral heterogeneity in primary glioblastoma","volume":"344","author":"Patel","year":"2014","journal-title":"Science"},{"issue":"7500","key":"2021110815065735800_ref45","doi-asserted-by":"crossref","first-page":"371","DOI":"10.1038\/nature13173","article-title":"Reconstructing lineage hierarchies of the distal lung epithelium using single-cell RNA-seq","volume":"509","author":"Treutlein","year":"2014","journal-title":"Nature"},{"issue":"6","key":"2021110815065735800_ref46","doi-asserted-by":"crossref","first-page":"728","DOI":"10.1038\/ni.3437","article-title":"Innate-like functions of natural killer T cell subsets result from highly divergent gene programs","volume":"17","author":"Engel","year":"2016","journal-title":"Nat Immunol"},{"issue":"1","key":"2021110815065735800_ref47","doi-asserted-by":"crossref","first-page":"148","DOI":"10.1016\/j.molcel.2017.06.003","article-title":"Single-cell alternative splicing analysis with expedition reveals splicing dynamics during neuron differentiation","volume":"67","author":"Song","year":"2017","journal-title":"Mol Cell"},{"issue":"10","key":"2021110815065735800_ref48","doi-asserted-by":"crossref","first-page":"947","DOI":"10.1038\/nmeth.3549","article-title":"Oscope identifies oscillatory genes in unsynchronized single-cell RNA-seq experiments","volume":"12","author":"Leng","year":"2015","journal-title":"Nat Methods"},{"key":"2021110815065735800_ref49","doi-asserted-by":"crossref","DOI":"10.1038\/s41467-018-06052-0","article-title":"Unravelling subclonal heterogeneity and aggressive disease states in TNBC through single-cell RNA-seq","volume":"9","author":"Karaayvaz","year":"2018","journal-title":"Nat 
Commun"},{"issue":"1","key":"2021110815065735800_ref50","doi-asserted-by":"crossref","first-page":"145","DOI":"10.1038\/nn.3881","article-title":"Unbiased classification of sensory neuron types by large-scale single-cell RNA sequencing","volume":"18","author":"Usoskin","year":"2015","journal-title":"Nat Neurosci"},{"issue":"12","key":"2021110815065735800_ref51","doi-asserted-by":"crossref","first-page":"1572","DOI":"10.1093\/bioinformatics\/btq170","article-title":"ConsensusClusterPlus: a class discovery tool with confidence assessments and item tracking","volume":"26","author":"Wilkerson","year":"2010","journal-title":"Bioinformatics"},{"key":"2021110815065735800_ref52","first-page":"1601","article-title":"Self-tuning spectral clustering","volume-title":"NIPS\u201904: Proceedings of the 17th International Conference on Neural Information Processing Systems","author":"Zelnik-Manor","year":"2004"},{"issue":"11","key":"2021110815065735800_ref53","first-page":"1","article-title":"A smart local moving algorithm for large-scale modularity-based community detection","volume":"86","author":"Ludo","year":"2013","journal-title":"Eur Phys J B"},{"key":"2021110815065735800_ref54","doi-asserted-by":"crossref","first-page":"193","DOI":"10.1007\/BF01908075","article-title":"Comparing partitions","volume":"2","author":"Hubert","year":"1985","journal-title":"J Classif"},{"issue":"13","key":"2021110815065735800_ref55","doi-asserted-by":"crossref","first-page":"i79","DOI":"10.1093\/bioinformatics\/bty260","article-title":"Random forest based similarity learning for single cell RNA sequencing data","volume":"34","author":"Baran","year":"2018","journal-title":"Bioinformatics"}],"container-title":["Briefings in 
Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/22\/6\/bbab204\/41089357\/bbab204.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/22\/6\/bbab204\/41089357\/bbab204.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,11,8]],"date-time":"2021-11-08T15:13:11Z","timestamp":1636384391000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbab204\/6299206"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,6,15]]},"references-count":55,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2021,11,5]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbab204","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"type":"print","value":"1467-5463"},{"type":"electronic","value":"1477-4054"}],"subject":[],"published-other":{"date-parts":[[2021,11]]},"published":{"date-parts":[[2021,6,15]]},"article-number":"bbab204"}}