{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,7]],"date-time":"2026-04-07T16:40:45Z","timestamp":1775580045324,"version":"3.50.1"},"reference-count":54,"publisher":"Oxford University Press (OUP)","issue":"6","license":[{"start":{"date-parts":[[2022,1,6]],"date-time":"2022-01-06T00:00:00Z","timestamp":1641427200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,3,4]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Rapidly generated scRNA-seq datasets enable us to understand cellular differences and the function of each individual cell at single-cell resolution. Cell-type classification, which aims at characterizing and labeling groups of cells according to their gene expression, is one of the most important steps for single-cell analysis. To facilitate the manual curation process, supervised learning methods have been used to automatically classify cells. Most of the existing supervised learning approaches only utilize annotated cells in the training step while ignoring the more abundant unannotated cells. In this article, we proposed scPretrain, a multi-task self-supervised learning approach that jointly considers annotated and unannotated cells for cell-type classification. scPretrain consists of a pre-training step and a fine-tuning step. In the pre-training step, scPretrain uses a multi-task learning framework to train a feature extraction encoder based on each dataset\u2019s pseudo-labels, where only unannotated cells are used. In the fine-tuning step, scPretrain fine-tunes this feature extraction encoder using the limited annotated cells in a new dataset.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>We evaluated scPretrain on 60 diverse datasets from different technologies, species and organs, and obtained a significant improvement on both cell-type classification and cell clustering. Moreover, the representations obtained by scPretrain in the pre-training step also enhanced the performance of conventional classifiers, such as random forest, logistic regression and support-vector machines. scPretrain is able to effectively utilize the massive amount of unlabeled data and be applied to annotating increasingly generated scRNA-seq datasets.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>The data and code underlying this article are available in scPretrain: Multi-task self-supervised learning for cell type classification, at https:\/\/github.com\/ruiyi-zhang\/scPretrain and https:\/\/zenodo.org\/record\/5802306.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btac007","type":"journal-article","created":{"date-parts":[[2022,1,4]],"date-time":"2022-01-04T07:09:28Z","timestamp":1641280168000},"page":"1607-1614","source":"Crossref","is-referenced-by-count":16,"title":["scPretrain: multi-task self-supervised learning for cell-type classification"],"prefix":"10.1093","volume":"38","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4776-6762","authenticated-orcid":false,"given":"Ruiyi","family":"Zhang","sequence":"first","affiliation":[{"name":"School of EECS, Peking University , Beijing, China"}]},{"given":"Yunan","family":"Luo","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Illinois at Urbana-Champaign , Urbana, IL, USA"}]},{"given":"Jianzhu","family":"Ma","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Purdue University , West Lafayette, IN, USA"},{"name":"Department of Biochemistry, Purdue University , West Lafayette, IN, USA"}]},{"given":"Ming","family":"Zhang","sequence":"additional","affiliation":[{"name":"School of EECS, Peking University , Beijing, China"}]},{"given":"Sheng","family":"Wang","sequence":"additional","affiliation":[{"name":"Paul G. Allen School of Computer Science & Engineering, University of Washington , Seattle, WA, USA"}]}],"member":"286","published-online":{"date-parts":[[2022,1,6]]},"reference":[{"key":"2023020108585700400_btac007-B1","doi-asserted-by":"crossref","first-page":"194","DOI":"10.1186\/s13059-019-1795-z","article-title":"A comparison of automatic cell identification methods for single-cell RNA sequencing data","volume":"20","author":"Abdelaal","year":"2019","journal-title":"Genome Biol"},{"key":"2023020108585700400_btac007-B2","doi-asserted-by":"crossref","first-page":"590","DOI":"10.1038\/s41586-020-2496-1","article-title":"A single-cell transcriptomic atlas characterizes ageing tissues in the mouse","volume":"583","author":"Almanzar","year":"2020","journal-title":"Nature"},{"key":"2023020108585700400_btac007-B3","doi-asserted-by":"crossref","first-page":"346","DOI":"10.1016\/j.cels.2016.08.011","article-title":"A single-cell transcriptomic map of the human and mouse pancreas reveals inter- and intra-cell population structure","volume":"3","author":"Baron","year":"2016","journal-title":"Cell Syst"},{"key":"2023020108585700400_btac007-B4","doi-asserted-by":"crossref","first-page":"1200","DOI":"10.1038\/s41592-020-00979-3","article-title":"MARS: discovering novel cell types across heterogeneous single-cell experiments","volume":"17","author":"Brbi\u0107","year":"2020","journal-title":"Nat. Methods"},{"key":"2023020108585700400_btac007-B5","doi-asserted-by":"crossref","first-page":"411","DOI":"10.1038\/nbt.4096","article-title":"Integrating single-cell transcriptomic data across different conditions, technologies, and species","volume":"36","author":"Butler","year":"2018","journal-title":"Nat. Biotechnol"},{"key":"2023020108585700400_btac007-B6","doi-asserted-by":"crossref","first-page":"3458","DOI":"10.1038\/s41467-020-17281-7","article-title":"Searching large-scale scRNA-seq databases via unbiased cell embedding with Cell BLAST","volume":"11","author":"Cao","year":"2020","journal-title":"Nat. Commun"},{"key":"2023020108585700400_btac007-B7","article-title":"Deep clustering for unsupervised learning of visual features","author":"Caron","year":"2018","journal-title":"aECCV"},{"key":"2023020108585700400_btac007-B8","doi-asserted-by":"crossref","first-page":"775","DOI":"10.1093\/bioinformatics\/btaa908","article-title":"Single-cell RNA-seq data semi-supervised clustering and annotation via structural regularized domain adaptation","volume":"37","author":"Chen","year":"2021","journal-title":"Bioinformatics"},{"key":"2023020108585700400_btac007-B9","volume-title":"Proceedings of the 37th International Conference on Machine Learning","author":"Chen","year":"2020"},{"key":"2023020108585700400_btac007-B10","doi-asserted-by":"crossref","first-page":"37","DOI":"10.1177\/001316446002000104","article-title":"A coefficient of agreement for nominal scales","volume":"20","author":"Cohen","year":"1960","journal-title":"Educ. Psychol. Meas"},{"key":"2023020108585700400_btac007-B11","doi-asserted-by":"crossref","first-page":"982","DOI":"10.1016\/j.cell.2018.05.057","article-title":"A single-cell transcriptome atlas of the aging drosophila brain","volume":"174","author":"Davie","year":"2018","journal-title":"Cell"},{"key":"2023020108585700400_btac007-B12","article-title":"BERT: pre-training of deep bidirectional transformers for language understanding","author":"Devlin","year":"2018","journal-title":"NAACL 2019."},{"key":"2023020108585700400_btac007-B13","first-page":"03994","article-title":"Cell type identification from single-cell transcriptomic data via semi-supervised learning","volume":"2005","author":"Dong","year":"2020","journal-title":"arXiv"},{"key":"2023020108585700400_btac007-B14","first-page":"201","volume-title":"Proceedings of Machine Learning Research. JMLR Workshop and Conference Proceedings","author":"Erhan","year":"2010"},{"key":"2023020108585700400_btac007-B15","article-title":"Self-supervised video representation learning with odd-one-out networks","author":"Fernando","year":"2016","journal-title":"CVPR 2017."},{"key":"2023020108585700400_btac007-B16","first-page":"1349","volume-title":"Proceedings of the IEEE International Conference on Computer Vision, Venice","author":"Gebru","year":"2017"},{"key":"2023020108585700400_btac007-B17","doi-asserted-by":"crossref","first-page":"266","DOI":"10.1016\/j.stem.2016.05.010","article-title":"De novo prediction of stem cell identity using single-cell transcriptome data","volume":"19","author":"Gr\u00fcn","year":"2016","journal-title":"Cell Stem Cell"},{"key":"2023020108585700400_btac007-B18","doi-asserted-by":"crossref","first-page":"675","DOI":"10.1016\/j.devcel.2010.02.012","article-title":"Resolution of cell fate decisions revealed by single-cell gene expression analysis from zygote to blastocyst","volume":"18","author":"Guo","year":"2010","journal-title":"Dev. Cell"},{"key":"2023020108585700400_btac007-B19","doi-asserted-by":"crossref","first-page":"421","DOI":"10.1038\/nbt.4091","article-title":"Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors","volume":"36","author":"Haghverdi","year":"2018","journal-title":"Nat. Biotechnol"},{"key":"2023020108585700400_btac007-B20","doi-asserted-by":"crossref","first-page":"1307","DOI":"10.1016\/j.cell.2018.05.012","article-title":"Mapping the mouse cell atlas by microwell-seq","volume":"173","author":"Han","year":"2018","journal-title":"Cell"},{"key":"2023020108585700400_btac007-B21","doi-asserted-by":"crossref","first-page":"4688","DOI":"10.1093\/bioinformatics\/btz292","article-title":"scMatch: a single-cell gene expression profile annotation tool using reference datasets","volume":"35","author":"Hou","year":"2019","journal-title":"Bioinformatics"},{"key":"2023020108585700400_btac007-B22","doi-asserted-by":"crossref","first-page":"607","DOI":"10.1038\/s42256-020-00233-7","article-title":"Iterative transfer learning with neural network for clustering and cell type classification in single-cell RNA-seq analysis","volume":"2","author":"Hu","year":"2020","journal-title":"Nat. Mach. Intell"},{"key":"2023020108585700400_btac007-B23","article-title":"Strategies for pre-training graph neural networks","author":"Hu","year":"2019"},{"key":"2023020108585700400_btac007-B24","doi-asserted-by":"crossref","first-page":"2316","DOI":"10.1093\/bib\/bby076","article-title":"Impact of similarity metrics on single-cell RNA-seq data clustering","volume":"20","author":"Kim","year":"2019","journal-title":"Brief Bioinform"},{"key":"2023020108585700400_btac007-B25","doi-asserted-by":"crossref","DOI":"10.1101\/2020.06.04.132324","article-title":"scNym: semi-supervised adversarial neural networks for single cell classification","author":"Kimmel","year":"2020"},{"key":"2023020108585700400_btac007-B26","doi-asserted-by":"crossref","first-page":"1187","DOI":"10.1016\/j.cell.2015.04.044","article-title":"Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells","volume":"161","author":"Klein","year":"2015","journal-title":"Cell"},{"key":"2023020108585700400_btac007-B27","doi-asserted-by":"crossref","first-page":"2338","DOI":"10.1038\/s41467-020-15851-3","article-title":"Deep learning enables accurate clustering with batch effect removal in single-cell RNA-seq analysis","volume":"11","author":"Li","year":"2020","journal-title":"Nat. Commun"},{"key":"2023020108585700400_btac007-B28","first-page":"8738","article-title":"A bottom-up clustering approach to unsupervised person re-identification","volume":"33","author":"Lin","year":"2019","journal-title":"Proc. Conf. AAAI Artif. Intell"},{"key":"2023020108585700400_btac007-B29","doi-asserted-by":"crossref","first-page":"1053","DOI":"10.1038\/s41592-018-0229-2","article-title":"Deep generative modeling for single-cell transcriptomics","volume":"15","author":"Lopez","year":"2018","journal-title":"Nat. Methods"},{"key":"2023020108585700400_btac007-B30","article-title":"Query to reference single-cell integration with transfer learning","author":"Lotfollahi","year":"2020","journal-title":"Nat Biotechnol"},{"key":"2023020108585700400_btac007-B31","doi-asserted-by":"crossref","first-page":"533","DOI":"10.1093\/bioinformatics\/btz592","article-title":"ACTINN: automated identification of cell types in single cell RNA sequencing","volume":"36","author":"Ma","year":"2020","journal-title":"Bioinformatics"},{"key":"2023020108585700400_btac007-B32","article-title":"UMAP: Uniform Manifold Approximation and Projection for dimension reduction","author":"McInnes","year":"2018"},{"key":"2023020108585700400_btac007-B33","doi-asserted-by":"crossref","first-page":"385","DOI":"10.1016\/j.cels.2016.09.002","article-title":"A single-cell transcriptome atlas of the human pancreas","volume":"3","author":"Muraro","year":"2016","journal-title":"Cell Syst"},{"key":"2023020108585700400_btac007-B34","article-title":"Fast batch alignment of single cell transcriptomes unifies multiple mouse cell atlases into an integrated landscape","volume-title":"Bioinformatics","author":"Park","year":"2020"},{"key":"2023020108585700400_btac007-B35","article-title":"Multi-task domain adaptation for sequence tagging","author":"Peng","year":"2016"},{"key":"2023020108585700400_btac007-B36","doi-asserted-by":"crossref","first-page":"983","DOI":"10.1038\/s41592-019-0535-3","article-title":"Supervised classification enables rapid annotation of cell atlases","volume":"16","author":"Pliner","year":"2019","journal-title":"Nat. Methods"},{"key":"2023020108585700400_btac007-B37","doi-asserted-by":"crossref","first-page":"964","DOI":"10.1093\/bioinformatics\/btz625","article-title":"BBKNN: fast batch alignment of single cell transcriptomes","volume":"36","author":"Pola\u0144ski","year":"2020","journal-title":"Bioinformatics"},{"key":"2023020108585700400_btac007-B38","doi-asserted-by":"crossref","first-page":"846","DOI":"10.1080\/01621459.1971.10482356","article-title":"Objective criteria for the evaluation of clustering methods","volume":"66","author":"Rand","year":"1971","journal-title":"J. Am. Stat. Assoc"},{"key":"2023020108585700400_btac007-B39","first-page":"762","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Ren","year":"2018"},{"key":"2023020108585700400_btac007-B40","doi-asserted-by":"crossref","first-page":"2539","DOI":"10.1093\/bioinformatics\/btx196","article-title":"Removal of batch effects using distribution-matching residual networks","volume":"33","author":"Shaham","year":"2017","journal-title":"Bioinformatics"},{"key":"2023020108585700400_btac007-B41","doi-asserted-by":"crossref","first-page":"257","DOI":"10.1093\/ptj\/85.3.257","article-title":"The kappa statistic in reliability studies: use, interpretation, and sample size requirements","volume":"85","author":"Sim","year":"2005","journal-title":"Phys. Ther"},{"key":"2023020108585700400_btac007-B42","doi-asserted-by":"crossref","first-page":"1888","DOI":"10.1016\/j.cell.2019.05.031","article-title":"Comprehensive integration of single-cell data","volume":"177","author":"Stuart","year":"2019","journal-title":"Cell"},{"key":"2023020108585700400_btac007-B43","doi-asserted-by":"crossref","first-page":"367","DOI":"10.1038\/s41586-018-0590-4","article-title":"Single-cell transcriptomics of 20 mouse organs creates a Tabula Muris","volume":"562","year":"2018","journal-title":"Nature"},{"key":"2023020108585700400_btac007-B44","doi-asserted-by":"crossref","first-page":"207","DOI":"10.1016\/j.cels.2019.06.004","article-title":"SingleCellNet: a computational tool to classify single cell RNA-Seq data across platforms and across species","volume":"9","author":"Tan","year":"2019","journal-title":"Cell Syst"},{"key":"2023020108585700400_btac007-B45","doi-asserted-by":"crossref","first-page":"377","DOI":"10.1038\/nmeth.1315","article-title":"mRNA-Seq whole-transcriptome analysis of a single cell","volume":"6","author":"Tang","year":"2009","journal-title":"Nat. Methods"},{"key":"2023020108585700400_btac007-B46","doi-asserted-by":"crossref","first-page":"12","DOI":"10.1186\/s13059-019-1850-9","article-title":"A benchmark of batch-effect correction methods for single-cell RNA sequencing data","volume":"21","author":"Tran","year":"2020","journal-title":"Genome Biol"},{"key":"2023020108585700400_btac007-B47","author":"Venkateswara","year":"2020"},{"key":"2023020108585700400_btac007-B48","doi-asserted-by":"crossref","article-title":"Moana: a robust and scalable cell type classification framework for single-cell RNA-Seq data","author":"Wagner","DOI":"10.1101\/456129"},{"key":"2023020108585700400_btac007-B49","doi-asserted-by":"crossref","first-page":"135","DOI":"10.1016\/j.neucom.2018.05.083","article-title":"Deep visual domain adaptation: a survey","volume":"312","author":"Wang","year":"2018","journal-title":"Neurocomputing"},{"key":"2023020108585700400_btac007-B50","doi-asserted-by":"crossref","first-page":"165","DOI":"10.1186\/s13059-019-1764-6","article-title":"BERMUDA: a novel deep transfer learning method for single-cell RNA sequencing batch correction reveals hidden high-resolution cellular subtypes","volume":"20","author":"Wang","year":"2019","journal-title":"Genome Biol"},{"key":"2023020108585700400_btac007-B51","doi-asserted-by":"crossref","article-title":"Probabilistic harmonization and annotation of single-cell transcriptomics data with deep generative models","author":"Xu","DOI":"10.15252\/msb.20209620"},{"key":"2023020108585700400_btac007-B52","first-page":"1007","article-title":"Probabilistic cell-type assignment of single-cell RNA-seq for tumor microenvironment profiling","volume":"16","author":"Zhang","year":"2019","journal-title":"Nature"},{"key":"2023020108585700400_btac007-B53","doi-asserted-by":"crossref","first-page":"531","DOI":"10.3390\/genes10070531","article-title":"SCINA: a semi-supervised subtyping algorithm of single cells and bulk samples","volume":"10","author":"Zhang","year":"2019","journal-title":"Genes"},{"key":"2023020108585700400_btac007-B54","doi-asserted-by":"crossref","first-page":"14049","DOI":"10.1038\/ncomms14049","article-title":"Massively parallel digital transcriptional profiling of single cells","volume":"8","author":"Zheng","year":"2017","journal-title":"Nat. Commun"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btac007\/42423118\/btac007.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/6\/1607\/49008834\/btac007.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/6\/1607\/49008834\/btac007.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,1]],"date-time":"2023-02-01T15:34:18Z","timestamp":1675265658000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/38\/6\/1607\/6499287"}},"subtitle":[],"editor":[{"given":"Anthony","family":"Mathelier","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2022,1,6]]},"references-count":54,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2022,3,4]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btac007","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2020.11.18.386102","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2022,3,15]]},"published":{"date-parts":[[2022,1,6]]}}}