{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,31]],"date-time":"2026-03-31T21:46:37Z","timestamp":1774993597362,"version":"3.50.1"},"reference-count":19,"publisher":"Oxford University Press (OUP)","issue":"5","license":[{"start":{"date-parts":[[2025,3,27]],"date-time":"2025-03-27T00:00:00Z","timestamp":1743033600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100000038","name":"Natural Sciences and Engineering Research Council of Canada","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100000038","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,5,6]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Single-cell RNA sequencing (scRNA-seq) analysis relies heavily on effective clustering to facilitate numerous downstream applications. Although several machine learning methods have been developed to enhance single-cell clustering, most are fully unsupervised and overlook the rich repository of annotated datasets available from previous single-cell experiments. Since cells are inherently high-dimensional entities, unsupervised clustering can often result in clusters that lack biological relevance. Leveraging annotated scRNA-seq datasets as a reference can significantly enhance clustering performance, enabling the identification of biologically meaningful clusters in target datasets.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>In this article, we propose Single Cell MUlti-Source CLustering (scMUSCL), a novel transfer learning method designed to identify cell clusters in a target dataset by leveraging knowledge from multiple annotated reference datasets. scMUSCL employs a deep neural network to extract domain- and batch-invariant cell representations, effectively addressing discrepancies across various source datasets and between source and target datasets within the new representation space. Unlike existing methods, scMUSCL does not require prior knowledge of the number of clusters in the target dataset and eliminates the need for batch correction between source and target datasets. We conduct extensive experiments using 20 real-life datasets, demonstrating that scMUSCL consistently outperforms existing unsupervised and transfer learning-based methods. Furthermore, our experiments show that scMUSCL benefits from multiple source datasets as learning references and accurately estimates the number of clusters.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>The Python implementation of scMUSCL is available at https:\/\/github.com\/arashkhoeini\/scMUSCL.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaf137","type":"journal-article","created":{"date-parts":[[2025,3,30]],"date-time":"2025-03-30T18:33:27Z","timestamp":1743359607000},"source":"Crossref","is-referenced-by-count":1,"title":["scMUSCL: multi-source transfer learning for clustering scRNA-seq data"],"prefix":"10.1093","volume":"41","author":[{"ORCID":"https:\/\/orcid.org\/0009-0008-7021-6815","authenticated-orcid":false,"given":"Arash","family":"Khoeini","sequence":"first","affiliation":[{"name":"School of Computing Science, Simon Fraser University , Burnaby, British Columbia V5A 1S6,","place":["Canada"]}]},{"given":"Funda","family":"Sar","sequence":"additional","affiliation":[{"name":"Vancouver Prostate Centre , Vancouver, British Columbia V6H 3Z6,","place":["Canada"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4379-1231","authenticated-orcid":false,"given":"Yen-Yi","family":"Lin","sequence":"additional","affiliation":[{"name":"Vancouver Prostate Centre , Vancouver, British Columbia V6H 3Z6,","place":["Canada"]},{"name":"Department of Urologic Science, University of British Columbia , Vancouver, British Columbia V5Z 1M9,","place":["Canada"]}]},{"given":"Colin","family":"Collins","sequence":"additional","affiliation":[{"name":"Vancouver Prostate Centre , Vancouver, British Columbia V6H 3Z6,","place":["Canada"]},{"name":"Department of Urologic Science, University of British Columbia , Vancouver, British Columbia V5Z 1M9,","place":["Canada"]}]},{"given":"Martin","family":"Ester","sequence":"additional","affiliation":[{"name":"School of Computing Science, Simon Fraser University , Burnaby, British Columbia V5A 1S6,","place":["Canada"]},{"name":"Vancouver Prostate Centre , Vancouver, British Columbia V6H 3Z6,","place":["Canada"]}]}],"member":"286","published-online":{"date-parts":[[2025,3,27]]},"reference":[{"key":"2025051307513157200_btaf137-B1","doi-asserted-by":"crossref","first-page":"1200","DOI":"10.1038\/s41592-020-00979-3","article-title":"Mars: discovering novel cell types across heterogeneous single-cell experiments","volume":"17","author":"Brbi\u0107","year":"2020","journal-title":"Nat Methods"},{"key":"2025051307513157200_btaf137-B2","author":"Chen","year":"2020"},{"key":"2025051307513157200_btaf137-B3","doi-asserted-by":"crossref","first-page":"280","DOI":"10.1186\/s12859-021-04210-8","article-title":"Contrastive self-supervised clustering of scRNA-seq data","volume":"22","author":"Ciortan","year":"2021","journal-title":"BMC Bioinformatics"},{"key":"2025051307513157200_btaf137-B4","doi-asserted-by":"crossref","first-page":"2749","DOI":"10.1038\/s41596-021-00534-0","article-title":"Tutorial: guidelines for annotating single-cell transcriptomic maps using automated and manual methods","volume":"16","author":"Clarke","year":"2021","journal-title":"Nat Protoc"},{"key":"2025051307513157200_btaf137-B5","first-page":"632216","author":"Ding","year":"2019"},{"key":"2025051307513157200_btaf137-B6","doi-asserted-by":"crossref","first-page":"1196","DOI":"10.1038\/s41592-023-01933-9","article-title":"Significance analysis for clustering with single-cell RNA-sequencing data","volume":"20","author":"Grabski","year":"2023","journal-title":"Nat Methods"},{"key":"2025051307513157200_btaf137-B7","author":"Hinton","year":"2015"},{"key":"2025051307513157200_btaf137-B8","doi-asserted-by":"crossref","first-page":"121","DOI":"10.1038\/s41587-021-01001-7","article-title":"Mapping single-cell data to reference atlases by transfer learning","volume":"40","author":"Lotfollahi","year":"2022","journal-title":"Nat Biotechnol"},{"key":"2025051307513157200_btaf137-B9","doi-asserted-by":"crossref","first-page":"e8746","DOI":"10.15252\/msb.20188746","article-title":"Current best practices in single-cell RNA-seq analysis: a tutorial","volume":"15","author":"Luecken","year":"2019","journal-title":"Mol Syst Biol"},{"key":"2025051307513157200_btaf137-B10","first-page":"24791","article-title":"Domain adaptation with invariant representation learning: what transformations to learn?","volume":"34","author":"Stojanov","year":"2021","journal-title":"Adv Neural Inf Process Syst"},{"key":"2025051307513157200_btaf137-B11","doi-asserted-by":"crossref","first-page":"191","DOI":"10.1038\/s42256-019-0037-0","article-title":"Clustering single-cell RNA-seq data with a model-based deep learning approach","volume":"1","author":"Tian","year":"2019","journal-title":"Nat Mach Intell"},{"key":"2025051307513157200_btaf137-B12","author":"van den Oord","year":"2018"},{"key":"2025051307513157200_btaf137-B13","doi-asserted-by":"crossref","first-page":"1575","DOI":"10.1093\/bioinformatics\/btac011","article-title":"scname: neighborhood contrastive clustering with ancillary mask estimation for scRNA-seq data","volume":"38","author":"Wan","year":"2022","journal-title":"Bioinformatics"},{"key":"2025051307513157200_btaf137-B14","doi-asserted-by":"crossref","first-page":"1700232","DOI":"10.1002\/pmic.201700232","article-title":"SIMLR: a tool for large-scale genomic analyses by multi-kernel learning","volume":"18","author":"Wang","year":"2018","journal-title":"Proteomics"},{"key":"2025051307513157200_btaf137-B15","doi-asserted-by":"crossref","first-page":"40","DOI":"10.1186\/s12859-019-2599-6","article-title":"Comparative analysis of differential gene expression analysis tools for single-cell RNA sequencing data","volume":"20","author":"Wang","year":"2019","journal-title":"BMC Bioinformatics"},{"key":"2025051307513157200_btaf137-B16","doi-asserted-by":"crossref","first-page":"1084","DOI":"10.1002\/cyto.a.23030","article-title":"Comparison of clustering methods for high-dimensional single-cell flow and mass cytometry data","volume":"89","author":"Weber","year":"2016","journal-title":"Cytometry A"},{"key":"2025051307513157200_btaf137-B17","doi-asserted-by":"crossref","first-page":"3281","DOI":"10.1016\/j.cell.2021.04.028","article-title":"Charting human development using a multi-endodermal organ atlas and organoid models","volume":"184","author":"Yu","year":"2021","journal-title":"Cell"},{"key":"2025051307513157200_btaf137-B18","doi-asserted-by":"crossref","first-page":"517","DOI":"10.1261\/rna.078965.121","article-title":"Review of single-cell RNA-seq data clustering for cell-type identification and characterization","volume":"29","author":"Zhang","year":"2023","journal-title":"RNA"},{"key":"2025051307513157200_btaf137-B19","doi-asserted-by":"crossref","first-page":"43","DOI":"10.1109\/JPROC.2020.3004555","article-title":"A comprehensive survey on transfer learning","volume":"109","author":"Zhuang","year":"2021","journal-title":"Proc IEEE"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/5\/btaf137\/62786142\/btaf137.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/5\/btaf137\/62786142\/btaf137.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,5,13]],"date-time":"2025-05-13T07:51:43Z","timestamp":1747122703000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btaf137\/8098047"}},"subtitle":[],"editor":[{"given":"Christina","family":"Kendziorski","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2025,3,27]]},"references-count":19,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2025,5,6]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaf137","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2024.04.22.590645","asserted-by":"object"}]},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2025,5]]},"published":{"date-parts":[[2025,3,27]]},"article-number":"btaf137"}}