{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,1]],"date-time":"2026-04-01T21:00:41Z","timestamp":1775077241486,"version":"3.50.1"},"reference-count":44,"publisher":"Oxford University Press (OUP)","issue":"4","license":[{"start":{"date-parts":[[2020,11,29]],"date-time":"2020-11-29T00:00:00Z","timestamp":1606608000000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000923","name":"Silicon Valley Community Foundation","doi-asserted-by":"publisher","award":["2018\u2013182730"],"award-info":[{"award-number":["2018\u2013182730"]}],"id":[{"id":"10.13039\/100000923","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100001440","name":"American Dental Association Foundation","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100001440","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100009810","name":"Brain Science Institute","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100009810","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100007540","name":"Jiangsu Agricultural Science and Technology Innovation Fund","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100007540","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,7,20]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Single cell\/nucleus RNA sequencing (scRNAseq) is emerging as an essential tool to unravel the phenotypic heterogeneity of cells in complex biological systems. While computational methods for scRNAseq cell type clustering have advanced, the ability to integrate datasets to identify common and novel cell types across experiments remains a challenge. Here, we introduce a cluster-to-cluster cell type matching method\u2014FR-Match\u2014that utilizes supervised feature selection for dimensionality reduction and incorporates shared information among cells to determine whether two cell type clusters share the same underlying multivariate gene expression distribution. FR-Match is benchmarked with existing cell-to-cell and cell-to-cluster cell type matching methods using both simulated and real scRNAseq data. FR-Match proved to be a stringent method that produced fewer erroneous matches of distinct cell subtypes and had the unique ability to identify novel cell phenotypes in new datasets. In silico validation demonstrated that the proposed workflow is the only self-contained algorithm that was robust to increasing numbers of true negatives (i.e. non-represented cell types). FR-Match was applied to two human brain scRNAseq datasets sampled from cortical layer 1 and full thickness middle temporal gyrus. When mapping cell types identified in specimens isolated from these overlapping human brain regions, FR-Match precisely recapitulated the laminar characteristics of matched cell type clusters, reflecting their distinct neuroanatomical distributions. An R package and Shiny application are provided at https:\/\/github.com\/JCVenterInstitute\/FRmatch for users to interactively explore and match scRNAseq cell type clusters with complementary visualization tools.<\/jats:p>","DOI":"10.1093\/bib\/bbaa339","type":"journal-article","created":{"date-parts":[[2020,10,28]],"date-time":"2020-10-28T20:12:02Z","timestamp":1603915922000},"source":"Crossref","is-referenced-by-count":13,"title":["FR-Match: robust matching of cell type clusters from single cell RNA sequencing data using the Friedman\u2013Rafsky non-parametric test"],"prefix":"10.1093","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2707-5881","authenticated-orcid":false,"given":"Yun","family":"Zhang","sequence":"first","affiliation":[{"name":"Staff Scientist and Biostatistician in the Informatics Department at the J. Craig Venter Institute"}]},{"given":"Brian D","family":"Aevermann","sequence":"additional","affiliation":[{"name":"Senior Bioinformatics Analyst in the Informatics Department at the J. Craig Venter Institute"}]},{"given":"Trygve E","family":"Bakken","sequence":"additional","affiliation":[{"name":"Allen Institute for Brain Science"}]},{"given":"Jeremy A","family":"Miller","sequence":"additional","affiliation":[{"name":"Allen Institute for Brain Science"}]},{"given":"Rebecca D","family":"Hodge","sequence":"additional","affiliation":[{"name":"Allen Institute for Brain Science"}]},{"given":"Ed S","family":"Lein","sequence":"additional","affiliation":[{"name":"Allen Institute for Brain Science"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1355-892X","authenticated-orcid":false,"given":"Richard H","family":"Scheuermann","sequence":"additional","affiliation":[{"name":"J. Craig Venter Institute La Jolla Campus, an Adjunct Professor of Pathology at the University of California San Diego, and an Adjunct Professor at the La Jolla Institute for Immunology"}]}],"member":"286","published-online":{"date-parts":[[2020,11,30]]},"reference":[{"key":"2021072117010998300_ref1","doi-asserted-by":"crossref","DOI":"10.7554\/eLife.27041","article-title":"The human cell atlas","volume":"6","author":"Regev","year":"2017","journal-title":"Elife"},{"issue":"11","key":"2021072117010998300_ref2","doi-asserted-by":"crossref","first-page":"839","DOI":"10.1038\/s41592-018-0210-0","volume":"15","author":"The impact of the NIH BRAIN Initiative","year":"2018","journal-title":"Nat Methods"},{"key":"2021072117010998300_ref3","article-title":"Production of a preliminary quality control pipeline for single nuclei Rna-Seq and its application in the analysis of cell type diversity of post-mortem human brain neocortex","volume":"22","author":"Aevermann","year":"2017","journal-title":"Pac Symp Biocomput"},{"key":"2021072117010998300_ref4","doi-asserted-by":"crossref","DOI":"10.1186\/s13059-016-0888-1","article-title":"Classification of low quality cells from single-cell RNA-seq data","volume":"17","author":"Ilicic","year":"2016","journal-title":"Genome Biol"},{"issue":"7","key":"2021072117010998300_ref5","doi-asserted-by":"crossref","first-page":"1160","DOI":"10.1101\/gr.110882.110","article-title":"Characterization of the single-cell transcriptional landscape by highly multiplex RNA-seq","volume":"21","author":"Islam","year":"2011","journal-title":"Genome Res"},{"issue":"8","key":"2021072117010998300_ref6","doi-asserted-by":"crossref","first-page":"907","DOI":"10.1038\/s41587-019-0201-4","article-title":"Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype","volume":"37","author":"Kim","year":"2019","journal-title":"Nat Biotechnol"},{"issue":"1","key":"2021072117010998300_ref7","doi-asserted-by":"crossref","first-page":"15","DOI":"10.1093\/bioinformatics\/bts635","article-title":"STAR: ultrafast universal RNA-seq aligner","volume":"29","author":"Dobin","year":"2013","journal-title":"Bioinformatics"},{"issue":"14","key":"2021072117010998300_ref8","doi-asserted-by":"crossref","first-page":"1754","DOI":"10.1093\/bioinformatics\/btp324","article-title":"Fast and accurate short read alignment with burrows-wheeler transform","volume":"25","author":"Li","year":"2009","journal-title":"Bioinformatics"},{"issue":"3","key":"2021072117010998300_ref9","doi-asserted-by":"crossref","first-page":"290","DOI":"10.1038\/nbt.3122","article-title":"StringTie enables improved reconstruction of a transcriptome from RNA-seq reads","volume":"33","author":"Pertea","year":"2015","journal-title":"Nat Biotechnol"},{"issue":"10","key":"2021072117010998300_ref10","doi-asserted-by":"crossref","DOI":"10.1088\/1742-5468\/2008\/10\/P10008","article-title":"Fast unfolding of communities in large networks","volume":"2008","author":"Blondel","year":"2008","journal-title":"J Stat Mech Theory Exp"},{"issue":"1","key":"2021072117010998300_ref11","doi-asserted-by":"crossref","first-page":"15","DOI":"10.1186\/s13059-017-1382-0","article-title":"SCANPY: large-scale single-cell gene expression data analysis","volume":"19","author":"Wolf","year":"2018","journal-title":"Genome Biol"},{"issue":"5","key":"2021072117010998300_ref12","doi-asserted-by":"crossref","first-page":"483","DOI":"10.1038\/nmeth.4236","article-title":"SC3: consensus clustering of single-cell RNA-seq data","volume":"14","author":"Kiselev","year":"2017","journal-title":"Nat Methods"},{"issue":"12","key":"2021072117010998300_ref13","doi-asserted-by":"crossref","DOI":"10.1371\/journal.pone.0209648","article-title":"Single-nucleus and single-cell transcriptomes compared in matched cortical cell types","volume":"13","author":"Bakken","year":"2018","journal-title":"PLoS One"},{"issue":"R1","key":"2021072117010998300_ref14","doi-asserted-by":"crossref","first-page":"R40","DOI":"10.1093\/hmg\/ddy100","article-title":"Cell type discovery using single-cell transcriptomics: implications for ontological representation","volume":"27","author":"Aevermann","year":"2018","journal-title":"Hum Mol Genet"},{"issue":"Suppl 17","key":"2021072117010998300_ref15","doi-asserted-by":"crossref","first-page":"559","DOI":"10.1186\/s12859-017-1977-1","article-title":"Cell type discovery and representation in the era of high-content single cell phenotyping","volume":"18","author":"Bakken","year":"2017","journal-title":"BMC Bioinformatics"},{"key":"2021072117010998300_ref16","article-title":"Pooling across cells to normalize single-cell RNA sequencing data with many zero counts","volume":"17","author":"Lun","year":"2016","journal-title":"Genome Biol"},{"issue":"6","key":"2021072117010998300_ref17","doi-asserted-by":"crossref","first-page":"584","DOI":"10.1038\/nmeth.4263","article-title":"SCnorm: robust normalization of single-cell RNA-seq data","volume":"14","author":"Bacher","year":"2017","journal-title":"Nat Methods"},{"issue":"5","key":"2021072117010998300_ref18","doi-asserted-by":"crossref","first-page":"421","DOI":"10.1038\/nbt.4091","article-title":"Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors","volume":"36","author":"Haghverdi","year":"2018","journal-title":"Nat Biotechnol"},{"issue":"3","key":"2021072117010998300_ref19","doi-asserted-by":"crossref","first-page":"964","DOI":"10.1093\/bioinformatics\/btz625","article-title":"BBKNN: fast batch alignment of single cell transcriptomes","volume":"36","author":"Polanski","year":"2020","journal-title":"Bioinformatics"},{"issue":"5","key":"2021072117010998300_ref20","doi-asserted-by":"crossref","first-page":"359","DOI":"10.1038\/nmeth.4644","article-title":"Scmap: projection of single-cell RNA-seq data across data sets","volume":"15","author":"Kiselev","year":"2018","journal-title":"Nat Methods"},{"key":"2021072117010998300_ref21","doi-asserted-by":"crossref","DOI":"10.1016\/j.cell.2019.05.031","article-title":"Comprehensive integration of single-cell data","volume":"177","author":"Stuart","year":"2019","journal-title":"Cell"},{"issue":"5","key":"2021072117010998300_ref22","doi-asserted-by":"crossref","first-page":"411","DOI":"10.1038\/nbt.4096","article-title":"Integrating single-cell transcriptomic data across different conditions, technologies, and species","volume":"36","author":"Butler","year":"2018","journal-title":"Nat Biotechnol"},{"issue":"6","key":"2021072117010998300_ref23","doi-asserted-by":"crossref","first-page":"685","DOI":"10.1038\/s41587-019-0113-3","article-title":"Efficient integration of heterogeneous single-cell transcriptomes using Scanorama","volume":"37","author":"Hie","year":"2019","journal-title":"Nat Biotechnol"},{"issue":"20","key":"2021072117010998300_ref24","doi-asserted-by":"crossref","first-page":"9775","DOI":"10.1073\/pnas.1820006116","article-title":"scMerge leverages factor analysis, stable expression, and pseudoreplication to merge multiple single-cell RNA-seq datasets","volume":"116","author":"Lin","year":"2019","journal-title":"Proc Natl Acad Sci U S A"},{"issue":"1","key":"2021072117010998300_ref25","doi-asserted-by":"crossref","first-page":"166","DOI":"10.1186\/s13059-019-1766-4","article-title":"scAlign: a tool for alignment, integration, and rare cell identification from scRNA-seq data","volume":"20","author":"Johansen","year":"2019","journal-title":"Genome Biol"},{"key":"2021072117010998300_ref26","doi-asserted-by":"crossref","first-page":"697","DOI":"10.1214\/aos\/1176344722","article-title":"Multivariate generalizations of the Wald-Wolfowitz and Smirnov two-sample tests","volume":"7","author":"Friedman","year":"1979","journal-title":"Ann Stat"},{"issue":"1","key":"2021072117010998300_ref27","doi-asserted-by":"crossref","first-page":"71","DOI":"10.1002\/cyto.a.22735","article-title":"Mapping cell populations in flow cytometry data for cross-sample comparison using the Friedman-Rafsky test statistic as a distance measure","volume":"89","author":"Hsiao","year":"2016","journal-title":"Cytometry A"},{"issue":"9","key":"2021072117010998300_ref28","doi-asserted-by":"crossref","first-page":"1185","DOI":"10.1038\/s41593-018-0205-2","article-title":"Transcriptomic and morphophysiological evidence for a specialized human cortical GABAergic cell type","volume":"21","author":"Boldog","year":"2018","journal-title":"Nat Neurosci"},{"issue":"7772","key":"2021072117010998300_ref29","doi-asserted-by":"crossref","first-page":"61","DOI":"10.1038\/s41586-019-1506-7","article-title":"Conserved cell types with divergent features in human versus mouse cortex","volume":"573","author":"Hodge","year":"2019","journal-title":"Nature"},{"key":"2021072117010998300_ref30","doi-asserted-by":"crossref","DOI":"10.1101\/2020.09.23.308932","article-title":"NS-Forest: a machine learning method for the objective identification of minimum marker gene combinations for cell type determination from single cell RNA sequencing","volume-title":"et\u00a0al","author":"Aevermann","year":"2020"},{"issue":"12","key":"2021072117010998300_ref31","doi-asserted-by":"crossref","first-page":"1289","DOI":"10.1038\/s41592-019-0619-0","article-title":"Fast, sensitive and accurate integration of single-cell data with harmony","volume":"16","author":"Korsunsky","year":"2019","journal-title":"Nat Methods"},{"issue":"7","key":"2021072117010998300_ref32","doi-asserted-by":"crossref","first-page":"1873","DOI":"10.1016\/j.cell.2019.05.006","article-title":"Single-cell multi-omic integration compares and contrasts features of brain cell identity","volume":"177","author":"Welch","year":"2019","journal-title":"Cell"},{"issue":"1","key":"2021072117010998300_ref33","doi-asserted-by":"crossref","DOI":"10.1186\/s13059-019-1850-9","article-title":"A benchmark of batch-effect correction methods for single-cell RNA sequencing data","volume":"21","author":"Tran","year":"2020","journal-title":"Genome Biol"},{"issue":"1","key":"2021072117010998300_ref34","doi-asserted-by":"crossref","first-page":"171","DOI":"10.1038\/nprot.2014.006","article-title":"Full-length RNA-seq from single cells using smart-seq2","volume":"9","author":"Picelli","year":"2014","journal-title":"Nat Protoc"},{"key":"2021072117010998300_ref35","doi-asserted-by":"crossref","DOI":"10.1109\/TCBB.2018.2848633","article-title":"Comparison of computational methods for imputing single-cell RNA-sequencing data","volume":"17","author":"Zhang","year":"2020","journal-title":"IEEE\/ACM Trans Comput Biol Bioinform"},{"key":"2021072117010998300_ref36","doi-asserted-by":"crossref","first-page":"14049","DOI":"10.1038\/ncomms14049","article-title":"Massively parallel digital transcriptional profiling of single cells","volume":"8","author":"Zheng","year":"2017","journal-title":"Nat Commun"},{"issue":"6","key":"2021072117010998300_ref37","doi-asserted-by":"crossref","first-page":"565","DOI":"10.1038\/nmeth.4292","article-title":"Normalizing single-cell RNA sequencing data: challenges and opportunities","volume":"14","author":"Vallejos","year":"2017","journal-title":"Nat Methods"},{"issue":"39","key":"2021072117010998300_ref38","doi-asserted-by":"crossref","first-page":"11046","DOI":"10.1073\/pnas.1612826113","article-title":"High-throughput single-cell gene-expression profiling with multiplexed error-robust fluorescence in situ hybridization","volume":"113","author":"Moffitt","year":"2016","journal-title":"Proc Natl Acad Sci U S A"},{"issue":"2","key":"2021072117010998300_ref39","doi-asserted-by":"crossref","first-page":"342","DOI":"10.1016\/j.neuron.2016.10.001","article-title":"In situ transcription profiling of single cells reveals spatial Organization of Cells in the mouse hippocampus","volume":"92","author":"Shah","year":"2016","journal-title":"Neuron"},{"issue":"7770","key":"2021072117010998300_ref40","doi-asserted-by":"crossref","first-page":"549","DOI":"10.1038\/d41586-019-02477-9","article-title":"Starfish enterprise: finding RNA patterns in single cells","volume":"572","author":"Perkel","year":"2019","journal-title":"Nature"},{"key":"2021072117010998300_ref41","volume-title":"Modern Statistics for Modern Biology","author":"Holmes","year":"2018"},{"issue":"4","key":"2021072117010998300_ref42","doi-asserted-by":"crossref","DOI":"10.1371\/journal.pcbi.1003531","article-title":"Waste not, want not: why rarefying microbiome data is inadmissible","volume":"10","author":"McMurdie","year":"2014","journal-title":"PLoS Comput Biol"},{"key":"2021072117010998300_ref43","doi-asserted-by":"crossref","first-page":"1165","DOI":"10.1214\/aos\/1013699998","article-title":"The control of the false discovery rate in multiple testing under dependency","volume":"29","author":"Benjamini","year":"2001","journal-title":"Ann Stat"},{"issue":"3","key":"2021072117010998300_ref44","doi-asserted-by":"crossref","first-page":"499","DOI":"10.1038\/nprot.2016.015","article-title":"Using single nuclei for RNA-seq to capture the transcriptome of postmortem neurons","volume":"11","author":"Krishnaswami","year":"2016","journal-title":"Nat Protoc"}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bib\/article-pdf\/22\/4\/bbaa339\/39136349\/bbaa339.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"http:\/\/academic.oup.com\/bib\/article-pdf\/22\/4\/bbaa339\/39136349\/bbaa339.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,8,16]],"date-time":"2024-08-16T15:38:14Z","timestamp":1723822694000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbaa339\/6009333"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,11,30]]},"references-count":44,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2021,7,20]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbaa339","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2021,7]]},"published":{"date-parts":[[2020,11,30]]},"article-number":"bbaa339"}}