{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,11]],"date-time":"2026-06-11T02:16:01Z","timestamp":1781144161365,"version":"3.54.1"},"reference-count":50,"publisher":"Oxford University Press (OUP)","issue":"13","license":[{"start":{"date-parts":[[2018,6,27]],"date-time":"2018-06-27T00:00:00Z","timestamp":1530057600000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["R01GM115836"],"award-info":[{"award-number":["R01GM115836"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"name":"University of Pittsburgh School of Medicine"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2018,7,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Genome-wide transcriptome sequencing applied to single cells (scRNA-seq) is rapidly becoming an assay of choice across many fields of biological and biomedical research. Scientific objectives often revolve around discovery or characterization of types or sub-types of cells, and therefore, obtaining accurate cell\u2013cell similarities from scRNA-seq data is a critical step in many studies. While rapid advances are being made in the development of tools for scRNA-seq data analysis, few approaches exist that explicitly address this task. Furthermore, abundance and type of noise present in scRNA-seq datasets suggest that application of generic methods, or of methods developed for bulk RNA-seq data, is likely suboptimal.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>Here, we present RAFSIL, a random forest based approach to learn cell\u2013cell similarities from scRNA-seq data. RAFSIL implements a two-step procedure, where feature construction geared towards scRNA-seq data is followed by similarity learning. It is designed to be adaptable and expandable, and RAFSIL similarities can be used for typical exploratory data analysis tasks like dimension reduction, visualization and clustering. We show that our approach compares favorably with current methods across a diverse collection of datasets, and that it can be used to detect and highlight unwanted technical variation in scRNA-seq datasets in situations where other methods fail. Overall, RAFSIL implements a flexible approach yielding a useful tool that improves the analysis of scRNA-seq data.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>The RAFSIL R package is available at www.kostkalab.net\/software.html<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/bty260","type":"journal-article","created":{"date-parts":[[2018,4,14]],"date-time":"2018-04-14T05:35:35Z","timestamp":1523684135000},"page":"i79-i88","source":"Crossref","is-referenced-by-count":46,"title":["Random forest based similarity learning for single cell RNA sequencing data"],"prefix":"10.1093","volume":"34","author":[{"given":"Maziyar Baran","family":"Pouyan","sequence":"first","affiliation":[{"name":"Department of Developmental Biology, University of Pittsburgh, Pittsburgh, PA, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Dennis","family":"Kostka","sequence":"additional","affiliation":[{"name":"Department of Developmental Biology, University of Pittsburgh, Pittsburgh, PA, USA"},{"name":"Department for Computational and Systems Biology, Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"286","published-online":{"date-parts":[[2018,6,27]]},"reference":[{"key":"2023051604253514800_bty260-B1","first-page":"3625","article-title":"Psychrophilic proteases dramatically reduce single cell RNA-seq artifacts: a molecular atlas of kidney development","volume":"144","author":"Adam","year":"2017","journal-title":"Development"},{"key":"2023051604253514800_bty260-B2","author":"Arthur","year":"2007"},{"key":"2023051604253514800_bty260-B3","author":"Borchers","year":"2017"},{"key":"2023051604253514800_bty260-B4","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Mach. Learn"},{"key":"2023051604253514800_bty260-B5","author":"Breiman","year":"2003"},{"key":"2023051604253514800_bty260-B6","doi-asserted-by":"crossref","first-page":"155","DOI":"10.1038\/nbt.3102","article-title":"Computational analysis of cell-to-cell heterogeneity in single-cell RNA-sequencing data reveals hidden subpopulations of cells","volume":"33","author":"Buettner","year":"2015","journal-title":"Nat. Biotechnol"},{"key":"2023051604253514800_bty260-B7","first-page":"1","article-title":"A dendrite method for cluster analysis","volume":"3","author":"Calinski","year":"1974","journal-title":"Commun. Stat"},{"key":"2023051604253514800_bty260-B8","doi-asserted-by":"crossref","first-page":"13.","DOI":"10.1186\/s13059-016-0881-8","article-title":"A survey of best practices for RNA-seq data analysis","volume":"17","author":"Conesa","year":"2016","journal-title":"Genome Biol"},{"key":"2023051604253514800_bty260-B9","doi-asserted-by":"crossref","first-page":"728","DOI":"10.1038\/ni.3437","article-title":"Innate-like functions of natural killer t cell subsets result from highly divergent gene programs","volume":"17","author":"Engel","year":"2016","journal-title":"Nat. Immunol"},{"key":"2023051604253514800_bty260-B10","doi-asserted-by":"crossref","first-page":"189","DOI":"10.1007\/0-387-29362-0_12","volume-title":"Bioinformatics and Computational Biology Solutions Using R and Bioconductor","author":"Gentleman","year":"2005"},{"key":"2023051604253514800_bty260-B11","doi-asserted-by":"crossref","first-page":"61","DOI":"10.1016\/j.cell.2016.01.047","article-title":"Heterogeneity in oct4 and sox2 targets biases cell fate in 4-cell mouse embryos","volume":"165","author":"Goolam","year":"2016","journal-title":"Cell"},{"key":"2023051604253514800_bty260-B12","doi-asserted-by":"crossref","first-page":"251.","DOI":"10.1038\/nature14966","article-title":"Single-cell messenger RNA sequencing reveals rare intestinal cell types","volume":"525","author":"Gr\u00fcn","year":"2015","journal-title":"Nature"},{"key":"2023051604253514800_bty260-B13","author":"Guo","year":"2017"},{"key":"2023051604253514800_bty260-B14","doi-asserted-by":"crossref","first-page":"e1004575.","DOI":"10.1371\/journal.pcbi.1004575","article-title":"SINCERA: a pipeline for Single-Cell RNA-Seq profiling analysis","volume":"11","author":"Guo","year":"2015","journal-title":"PLoS Comput. Biol"},{"key":"2023051604253514800_bty260-B15","volume-title":"Neural Network Design","author":"Hagan","year":"1996"},{"key":"2023051604253514800_bty260-B16","author":"Hennig","year":"2018"},{"key":"2023051604253514800_bty260-B17","doi-asserted-by":"crossref","first-page":"193","DOI":"10.1007\/BF01908075","article-title":"Comparing partitions","volume":"2","author":"Hubert","year":"1985","journal-title":"J. Classification"},{"key":"2023051604253514800_bty260-B18","volume-title":"pcaMethods: A collection of PCA methods","author":"Kiselev","year":"2017"},{"key":"2023051604253514800_bty260-B19","doi-asserted-by":"crossref","first-page":"483","DOI":"10.1038\/nmeth.4236","article-title":"SC3: consensus clustering of single-cell RNA-seq data","volume":"14","author":"Kiselev","year":"2017","journal-title":"Nat. Methods"},{"key":"2023051604253514800_bty260-B21","doi-asserted-by":"crossref","first-page":"471","DOI":"10.1016\/j.stem.2015.09.011","article-title":"Single cell RNA-sequencing of pluripotent states unlocks modular transcriptional variation","volume":"17","author":"Kolodziejczyk","year":"2015","journal-title":"Cell Stem Cell"},{"key":"2023051604253514800_bty260-B22","author":"Krijthe","year":"2015"},{"key":"2023051604253514800_bty260-B23","doi-asserted-by":"crossref","first-page":"17","DOI":"10.1242\/dev.133058","article-title":"Understanding development and stem cells using single cell-based analyses of gene expression","volume":"144","author":"Kumar","year":"2017","journal-title":"Development"},{"key":"2023051604253514800_bty260-B24","doi-asserted-by":"crossref","first-page":"2626","DOI":"10.1093\/bioinformatics\/bth294","article-title":"A statistical framework for genomic data fusion","volume":"20","author":"Lanckriet","year":"2004","journal-title":"Bioinformatics"},{"key":"2023051604253514800_bty260-B25","first-page":"1","article-title":"Oscope: a statistical pipeline for identifying oscillatory genes in unsynchronized single cell RNA-seq experiments","volume":"1","author":"Leng","year":"2015","journal-title":"gene"},{"key":"2023051604253514800_bty260-B26","author":"Liaw","year":"2017"},{"key":"2023051604253514800_bty260-B27","doi-asserted-by":"crossref","first-page":"e156","DOI":"10.1093\/nar\/gkx681","article-title":"Using neural networks for reducing the dimensions of single-cell RNA-seq data","volume":"45","author":"Lin","year":"2017","journal-title":"Nucleic Acids Res"},{"key":"2023051604253514800_bty260-B28","author":"Mouselimis","year":"2017"},{"key":"2023051604253514800_bty260-B29","doi-asserted-by":"crossref","first-page":"1396","DOI":"10.1126\/science.1254257","article-title":"Single-cell RNA-seq highlights intratumoral heterogeneity in primary glioblastoma","volume":"344","author":"Patel","year":"2014","journal-title":"Science"},{"key":"2023051604253514800_bty260-B30","doi-asserted-by":"crossref","first-page":"1053","DOI":"10.1038\/nbt.2967","article-title":"Low-coverage single-cell mRNA sequencing reveals cellular heterogeneity and activated signaling pathways in developing cerebral cortex","volume":"32","author":"Pollen","year":"2014","journal-title":"Nat. Biotechnol"},{"key":"2023051604253514800_bty260-B31","doi-asserted-by":"crossref","first-page":"1172","DOI":"10.1109\/JBHI.2016.2565561","article-title":"Clustering single-cell expression data using random forest graphs","volume":"21","author":"Pouyan","year":"2017","journal-title":"IEEE J. Biomed. Health Inform"},{"key":"2023051604253514800_bty260-B32","volume-title":"R: A Language and Environment for Statistical Computing.","author":"R Core Team","year":"2017"},{"key":"2023051604253514800_bty260-B33","doi-asserted-by":"crossref","first-page":"189","DOI":"10.1038\/s41564-017-0062-x","article-title":"Detecting macroecological patterns in bacterial communities across independent studies of global soils","volume":"3","author":"Ramirez","year":"2018","journal-title":"Nat. Microbiol"},{"key":"2023051604253514800_bty260-B34","doi-asserted-by":"crossref","first-page":"1262","DOI":"10.1038\/nature03672","article-title":"Global histone modification patterns predict risk of prostate cancer recurrence","volume":"435","author":"Seligson","year":"2005","journal-title":"Nature"},{"key":"2023051604253514800_bty260-B35","first-page":"118","article-title":"Unsupervised learning with random forest predictors","author":"Shi","year":"2006"},{"key":"2023051604253514800_bty260-B36","doi-asserted-by":"crossref","first-page":"1164","DOI":"10.1093\/bioinformatics\/btm069","article-title":"pcamethods\u2014a bioconductor package providing pca methods for incomplete data","volume":"23","author":"Stacklies","year":"2007","journal-title":"Bioinformatics"},{"key":"2023051604253514800_bty260-B37","first-page":"583","article-title":"Cluster ensembles\u2014a knowledge reuse framework for combining multiple partitions","volume":"3","author":"Strehl","year":"2002","journal-title":"J. Mach. Learn. Res"},{"key":"2023051604253514800_bty260-B38","doi-asserted-by":"crossref","first-page":"381","DOI":"10.1038\/nmeth.4220","article-title":"Power analysis of single-cell RNA-sequencing experiments","volume":"14","author":"Svensson","year":"2017","journal-title":"Nat. Methods"},{"key":"2023051604253514800_bty260-B39","doi-asserted-by":"crossref","first-page":"267","DOI":"10.1007\/BF02289263","article-title":"Who belongs in the family?","volume":"18","author":"Thorndike","year":"1953","journal-title":"Psychometrika"},{"key":"2023051604253514800_bty260-B40","doi-asserted-by":"crossref","first-page":"611","DOI":"10.1111\/1467-9868.00196","article-title":"Probabilistic principal component analysis","volume":"61","author":"Tipping","year":"1999","journal-title":"J. R. Stat. Soc. B"},{"key":"2023051604253514800_bty260-B41","doi-asserted-by":"crossref","first-page":"371","DOI":"10.1038\/nature13173","article-title":"Reconstructing lineage hierarchies of the distal lung epithelium using single-cell RNA-seq","volume":"509","author":"Treutlein","year":"2014","journal-title":"Nature"},{"key":"2023051604253514800_bty260-B42","doi-asserted-by":"crossref","first-page":"145","DOI":"10.1038\/nn.3881","article-title":"Unbiased classification of sensory neuron types by large-scale single-cell RNA sequencing","volume":"18","author":"Usoskin","year":"2015","journal-title":"Nat. Neurosci"},{"key":"2023051604253514800_bty260-B43","first-page":"66","article-title":"Dimensionality reduction: a comparative","volume":"10","author":"van der Maaten","year":"2009","journal-title":"J. Mach. Learn. Res"},{"key":"2023051604253514800_bty260-B44","first-page":"2579","article-title":"Visualizing data using t-SNE","volume":"9","author":"van der Maaten","year":"2008","journal-title":"JLMR"},{"key":"2023051604253514800_bty260-B45","first-page":"2837","article-title":"Information theoretic measures for clusterings comparison: variants, properties, normalization and correction for chance","volume":"11","author":"Vinh","year":"2010","journal-title":"J. Mach. Learn. Res"},{"key":"2023051604253514800_bty260-B46","author":"Wang","year":"2017"},{"key":"2023051604253514800_bty260-B47","doi-asserted-by":"crossref","first-page":"414","DOI":"10.1038\/nmeth.4207","article-title":"Visualization and analysis of single-cell RNA-seq data by kernel-based similarity learning","volume":"14","author":"Wang","year":"2017","journal-title":"Nat. Methods"},{"key":"2023051604253514800_bty260-B48","doi-asserted-by":"crossref","first-page":"178","DOI":"10.1016\/j.csda.2013.04.010","article-title":"Cluster forests","volume":"66","author":"Yan","year":"2013","journal-title":"Comput. Stat. Data Anal"},{"key":"2023051604253514800_bty260-B49","doi-asserted-by":"crossref","first-page":"84.","DOI":"10.1186\/s13059-017-1218-y","article-title":"Challenges and emerging directions in single-cell analysis","volume":"18","author":"Yuan","year":"2017","journal-title":"Genome Biol"},{"key":"2023051604253514800_bty260-B50","author":"\u017durauskien\u0117","year":"2015"},{"key":"2023051604253514800_bty260-B51","doi-asserted-by":"crossref","first-page":"140.","DOI":"10.1186\/s12859-016-0984-y","article-title":"pcareduce: hierarchical clustering of single cell transcriptional profiles","volume":"17","author":"\u017durauskien\u0117","year":"2016","journal-title":"BMC Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/34\/13\/i79\/50316368\/bioinformatics_34_13_i79.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/34\/13\/i79\/50316368\/bioinformatics_34_13_i79.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,5,16]],"date-time":"2023-05-16T00:29:26Z","timestamp":1684196966000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/34\/13\/i79\/5045788"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,6,27]]},"references-count":50,"journal-issue":{"issue":"13","published-print":{"date-parts":[[2018,7,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/bty260","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/258699","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2018,7,1]]},"published":{"date-parts":[[2018,6,27]]}}}