{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,11]],"date-time":"2026-06-11T02:16:11Z","timestamp":1781144171236,"version":"3.54.1"},"reference-count":29,"publisher":"Oxford University Press (OUP)","issue":"3","license":[{"start":{"date-parts":[[2020,6,26]],"date-time":"2020-06-26T00:00:00Z","timestamp":1593129600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["R01 HL129132"],"award-info":[{"award-number":["R01 HL129132"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["R01 GM105785"],"award-info":[{"award-number":["R01 GM105785"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,5,20]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Batch effect correction has been recognized to be indispensable when integrating single-cell RNA sequencing (scRNA-seq) data from multiple batches. State-of-the-art methods ignore single-cell cluster label information, but such information can improve the effectiveness of batch effect correction, particularly under realistic scenarios where biological differences are not orthogonal to batch effects. To address this issue, we propose SMNN for batch effect correction of scRNA-seq data via supervised mutual nearest neighbor detection. Our extensive evaluations in simulated and real datasets show that SMNN provides improved merging within the corresponding cell types across batches, leading to reduced differentiation across batches over MNN, Seurat v3 and LIGER. Furthermore, SMNN retains more cell-type-specific features, partially manifested by differentially expressed genes identified between cell types after SMNN correction being biologically more relevant, with precision improving by up to 841.0%.<\/jats:p>","DOI":"10.1093\/bib\/bbaa097","type":"journal-article","created":{"date-parts":[[2020,5,1]],"date-time":"2020-05-01T15:08:51Z","timestamp":1588345731000},"source":"Crossref","is-referenced-by-count":24,"title":["SMNN: batch effect correction for single-cell RNA-seq data via supervised mutual nearest neighbor detection"],"prefix":"10.1093","volume":"22","author":[{"given":"Yuchen","family":"Yang","sequence":"first","affiliation":[{"name":"Department of Genetics at the University of North Carolina at Chapel Hill"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Gang","family":"Li","sequence":"additional","affiliation":[{"name":"Department of Statistics and Operations Research at the University of North Carolina at Chapel Hill"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8171-1378","authenticated-orcid":false,"given":"Huijun","family":"Qian","sequence":"additional","affiliation":[{"name":"Department of Statistics and Operations Research at the University of North Carolina at Chapel Hill"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Kirk C","family":"Wilhelmsen","sequence":"additional","affiliation":[{"name":"Department of Genetics at the University of North Carolina at Chapel Hill"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Yin","family":"Shen","sequence":"additional","affiliation":[{"name":"Institute for Human Genetics and Department of Neurology at the University of California San Francisco"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9275-4189","authenticated-orcid":false,"given":"Yun","family":"Li","sequence":"additional","affiliation":[{"name":"Departments of Genetics, Biostatistics and Computer Science at the University of North Carolina at Chapel Hill"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"286","published-online":{"date-parts":[[2020,6,26]]},"reference":[{"key":"2021073106395331900_ref1","doi-asserted-by":"crossref","first-page":"451","DOI":"10.1038\/550451a","article-title":"The human cell atlas: from vision to reality","volume":"550","author":"Rozenblatt-Rosen","year":"2017","journal-title":"Nat News"},{"key":"2021073106395331900_ref2","doi-asserted-by":"crossref","first-page":"133","DOI":"10.1038\/nrg3833","article-title":"Computational and analytical challenges in single-cell transcriptomics","volume":"16","author":"Stegle","year":"2015","journal-title":"Nat Rev Genet"},{"key":"2021073106395331900_ref3","doi-asserted-by":"crossref","first-page":"13587","DOI":"10.1038\/s41598-017-13665-w","article-title":"Controlling for confounding effects in single cell RNA sequencing studies using both control and target genes","volume":"7","author":"Chen","year":"2017","journal-title":"Sci Rep"},{"key":"2021073106395331900_ref4","doi-asserted-by":"crossref","first-page":"257","DOI":"10.1038\/s41576-019-0093-7","article-title":"Integrative single-cell analysis","volume":"20","author":"Stuart","year":"2019","journal-title":"Nat Rev Genet"},{"key":"2021073106395331900_ref5","doi-asserted-by":"crossref","first-page":"397","DOI":"10.1007\/0-387-29362-0_23","volume-title":"Limma: linear models for microarray data. Bioinformatics and computational biology solutions using R and Bioconductor","author":"Smyth","year":"2005"},{"key":"2021073106395331900_ref6","doi-asserted-by":"crossref","first-page":"118","DOI":"10.1093\/biostatistics\/kxj037","article-title":"Adjusting batch effects in microarray expression data using empirical Bayes methods","volume":"8","author":"Johnson","year":"2007","journal-title":"Biostatistics"},{"key":"2021073106395331900_ref7","doi-asserted-by":"crossref","first-page":"e161","DOI":"10.1093\/nar\/gku864","article-title":"Svaseq: removing batch effects and other unwanted noise from sequencing data","volume":"42","author":"Leek","year":"2014","journal-title":"Nucleic Acids Res"},{"key":"2021073106395331900_ref8","doi-asserted-by":"crossref","first-page":"421","DOI":"10.1038\/nbt.4091","article-title":"Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors","volume":"36","author":"Haghverdi","year":"2018","journal-title":"Nat Biotechnol"},{"key":"2021073106395331900_ref9","first-page":"3221","article-title":"Accelerating t-SNE using tree-based algorithms","volume":"15","author":"Van Der Maaten","year":"2014","journal-title":"J Mach Learn Res"},{"key":"2021073106395331900_ref10","doi-asserted-by":"crossref","first-page":"e20","DOI":"10.1182\/blood-2016-05-716480","article-title":"A single-cell resolution map of mouse hematopoietic stem and progenitor cell differentiation","volume":"128","author":"Nestorowa","year":"2016","journal-title":"Blood"},{"key":"2021073106395331900_ref11","doi-asserted-by":"crossref","first-page":"1259425","DOI":"10.1126\/science.1259425","article-title":"An interactive reference framework for modeling a dynamic immune system","volume":"349","author":"Spitzer","year":"2015","journal-title":"Science"},{"key":"2021073106395331900_ref12","doi-asserted-by":"crossref","first-page":"1888","DOI":"10.1016\/j.cell.2019.05.031","article-title":"Comprehensive integration of single-cell data","volume":"177","author":"Stuart","year":"2019","journal-title":"Cell"},{"key":"2021073106395331900_ref13","doi-asserted-by":"crossref","first-page":"1141","DOI":"10.12688\/f1000research.15666.2","article-title":"A systematic performance evaluation of clustering methods for single-cell RNA-seq data","volume":"7","author":"Du\u00f2","year":"2018","journal-title":"F1000Res"},{"key":"2021073106395331900_ref14","doi-asserted-by":"crossref","first-page":"273","DOI":"10.1038\/s41576-018-0088-9","article-title":"Challenges in unsupervised clustering of single-cell RNA-seq data","volume":"20","author":"Kiselev","year":"2019","journal-title":"Nat Rev Genet"},{"key":"2021073106395331900_ref15","doi-asserted-by":"crossref","first-page":"466","DOI":"10.1073\/pnas.1817715116","article-title":"Semisoft clustering of single-cell data","volume":"116","author":"Zhu","year":"2019","journal-title":"P Natl Acad Sci USA"},{"key":"2021073106395331900_ref16","doi-asserted-by":"crossref","first-page":"1649","DOI":"10.1038\/s41467-019-09639-3","article-title":"A Bayesian mixture model for clustering droplet-based single-cell transcriptomic data from population studies","volume":"10","author":"Sun","year":"2019","journal-title":"Nat Commun"},{"key":"2021073106395331900_ref17","doi-asserted-by":"crossref","first-page":"411","DOI":"10.1038\/nbt.4096","article-title":"Integrating single-cell transcriptomic data across different conditions, technologies, and species","volume":"36","author":"Butler","year":"2018","journal-title":"Nat Biotechnol"},{"key":"2021073106395331900_ref18","doi-asserted-by":"crossref","first-page":"1269","DOI":"10.1093\/bioinformatics\/bty793","article-title":"SAFE-clustering: single-cell aggregated (from ensemble) clustering for single-cell RNA-seq data","volume":"35","author":"Yang","year":"2019","journal-title":"Bioinformatics"},{"key":"2021073106395331900_ref19","doi-asserted-by":"crossref","first-page":"86","DOI":"10.1093\/nar\/gkz959","article-title":"SAME-clustering: single-cell aggregated clustering via mixture model ensemble","volume":"48","author":"Huh","year":"2020","journal-title":"Nucleic Acids Res"},{"key":"2021073106395331900_ref20","volume-title":"Matrix computations","author":"Van Loan","year":"1983"},{"key":"2021073106395331900_ref21","doi-asserted-by":"crossref","first-page":"3504","DOI":"10.4161\/cc.21802","article-title":"Impaired adult myeloid progenitor CMP and GMP cell function in conditional c-myb-knockout mice","volume":"11","author":"Lieu","year":"2012","journal-title":"Cell Cycle"},{"key":"2021073106395331900_ref22","doi-asserted-by":"crossref","first-page":"1873","DOI":"10.1016\/j.cell.2019.05.006","article-title":"Single-cell multi-omic integration compares and contrasts features of brain cell identity","volume":"177","author":"Welch","year":"2019","journal-title":"Cell"},{"key":"2021073106395331900_ref23","doi-asserted-by":"crossref","first-page":"1663","DOI":"10.1016\/j.cell.2015.11.013","article-title":"Transcriptional heterogeneity and lineage commitment in myeloid progenitors","volume":"163","author":"Paul","year":"2015","journal-title":"Cell"},{"key":"2021073106395331900_ref24","doi-asserted-by":"crossref","first-page":"38","DOI":"10.1038\/nbt.4314","article-title":"Dimensionality reduction for visualizing single-cell data using UMAP","volume":"37","author":"Becht","year":"2019","journal-title":"Nat Biotechnol"},{"key":"2021073106395331900_ref25","doi-asserted-by":"crossref","first-page":"284","DOI":"10.1089\/omi.2011.0118","article-title":"clusterProfiler: an R package for comparing biological themes among gene clusters","volume":"16","author":"Yu","year":"2012","journal-title":"OMICS"},{"key":"2021073106395331900_ref26","doi-asserted-by":"crossref","first-page":"266","DOI":"10.1016\/j.stem.2016.05.010","article-title":"De novo prediction of stem cell identity using single-cell transcriptome data","volume":"19","author":"Gr\u00fcn","year":"2016","journal-title":"Cell Stem Cell"},{"key":"2021073106395331900_ref27","doi-asserted-by":"crossref","first-page":"385","DOI":"10.1016\/j.cels.2016.09.002","article-title":"A single-cell transcriptome atlas of the human pancreas","volume":"3","author":"Muraro","year":"2016","journal-title":"Cell Syst"},{"key":"2021073106395331900_ref28","doi-asserted-by":"crossref","first-page":"14049","DOI":"10.1038\/ncomms14049","article-title":"Massively parallel digital transcriptional profiling of single cells","volume":"8","author":"Zheng","year":"2017","journal-title":"Nat Commun"},{"key":"2021073106395331900_ref29","doi-asserted-by":"crossref","first-page":"193","DOI":"10.1007\/BF01908075","article-title":"Comparing partitions","volume":"2","author":"Hubert","year":"1985","journal-title":"J Classif"}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bib\/article-pdf\/22\/3\/bbaa097\/39503724\/bbaa097.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"http:\/\/academic.oup.com\/bib\/article-pdf\/22\/3\/bbaa097\/39503724\/bbaa097.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,7,31]],"date-time":"2021-07-31T02:40:24Z","timestamp":1627699224000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbaa097\/5855265"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,6,26]]},"references-count":29,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2021,5,20]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbaa097","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/672261","asserted-by":"object"}]},"ISSN":["1477-4054"],"issn-type":[{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2021,5]]},"published":{"date-parts":[[2020,6,26]]},"article-number":"bbaa097"}}