{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,5]],"date-time":"2026-03-05T08:01:45Z","timestamp":1772697705695,"version":"3.50.1"},"reference-count":24,"publisher":"Oxford University Press (OUP)","issue":"19","license":[{"start":{"date-parts":[[2021,4,8]],"date-time":"2021-04-08T00:00:00Z","timestamp":1617840000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"UCLouvain"},{"name":"Duve Institute"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,10,11]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Motivation<\/jats:title><jats:p>Identifying rare subpopulations of cells is a critical step in order to extract knowledge from single-cell expression data, especially when the available data is limited and rare subpopulations only contain a few cells. In this paper, we present a data mining method to identify small subpopulations of cells that present highly specific expression profiles. This objective is formalized as a constrained optimization problem that jointly identifies a small group of cells and a corresponding subset of specific genes. The proposed method extends the max-sum submatrix problem to yield genes that are, for instance, highly expressed inside a small number of cells, but have a low expression in the remaining ones.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>We show through controlled experiments on scRNA-seq data that the MicroCellClust method achieves a high F1 score to identify rare subpopulations of artificially planted human T cells. The effectiveness of MicroCellClust is confirmed as it reveals a subpopulation of CD4 T cells with a specific phenotype from breast cancer samples, and a subpopulation linked to a specific stage in the cell cycle from breast cancer samples as well. 
Finally, three rare subpopulations in mouse embryonic stem cells are also identified with MicroCellClust. These results illustrate that the proposed method outperforms typical alternatives at identifying small subsets of cells with highly specific expression profiles.<\/jats:p><\/jats:sec><jats:sec><jats:title>Availability and implementation<\/jats:title><jats:p>The R and Scala implementation of MicroCellClust is freely available on GitHub, at https:\/\/github.com\/agerniers\/MicroCellClust\/. The data underlying this article are available on Zenodo, at https:\/\/dx.doi.org\/10.5281\/zenodo.4580332.<\/jats:p><\/jats:sec><jats:sec><jats:title>Supplementary information<\/jats:title><jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p><\/jats:sec>","DOI":"10.1093\/bioinformatics\/btab239","type":"journal-article","created":{"date-parts":[[2021,4,7]],"date-time":"2021-04-07T11:29:26Z","timestamp":1617794966000},"page":"3220-3227","source":"Crossref","is-referenced-by-count":15,"title":["MicroCellClust: mining rare and highly specific subpopulations from single-cell expression data"],"prefix":"10.1093","volume":"37","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-7968-6978","authenticated-orcid":false,"given":"Alexander","family":"Gerniers","sequence":"first","affiliation":[{"name":"ICTEAM\/INGI\/Artificial Intelligence and Algorithms Group, UCLouvain, Louvain-la-Neuve 1348, Belgium"}]},{"given":"Orian","family":"Bricard","sequence":"additional","affiliation":[{"name":"de Duve Institute, UCLouvain, Brussels 1200, Belgium"}]},{"given":"Pierre","family":"Dupont","sequence":"additional","affiliation":[{"name":"ICTEAM\/INGI\/Artificial Intelligence and Algorithms Group, UCLouvain, Louvain-la-Neuve 1348, Belgium"}]}],"member":"286","published-online":{"date-parts":[[2021,4,8]]},"reference":[{"key":"2023051608273542400_btab239-B1","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1038\/75556","article-title":"Gene Ontology: tool for the 
unification of biology","volume":"25","author":"Ashburner","year":"2000","journal-title":"Nat. Genet"},{"key":"2023051608273542400_btab239-B2","doi-asserted-by":"crossref","first-page":"65","DOI":"10.1007\/978-3-319-78680-3_5","volume-title":"New Frontiers in Mining Complex Patterns","author":"Branders","year":"2018"},{"key":"2023051608273542400_btab239-B3","doi-asserted-by":"crossref","first-page":"625","DOI":"10.1186\/s12859-019-3289-0","article-title":"Identifying gene-specific subgroups: an alternative to biclustering","volume":"20","author":"Branders","year":"2019","journal-title":"BMC Bioinformatics"},{"key":"2023051608273542400_btab239-B4","doi-asserted-by":"crossref","first-page":"158","DOI":"10.1186\/s12859-020-3482-1","article-title":"Giniclust3: a fast and memory-efficient tool for rare cell type identification","volume":"21","author":"Dong","year":"2020","journal-title":"BMC Bioinformatics"},{"key":"2023051608273542400_btab239-B5","doi-asserted-by":"crossref","first-page":"251","DOI":"10.1038\/nature14966","article-title":"Single-cell messenger RNA sequencing reveals rare intestinal cell types","volume":"525","author":"Gr\u00fcn","year":"2015","journal-title":"Nature"},{"key":"2023051608273542400_btab239-B6","doi-asserted-by":"crossref","first-page":"379","DOI":"10.1038\/nmeth.4662","article-title":"FateID infers cell fate bias in multipotent progenitors from single-cell RNA-seq data","volume":"15","author":"Herman","year":"2018","journal-title":"Nat. 
Methods"},{"key":"2023051608273542400_btab239-B7","doi-asserted-by":"crossref","first-page":"144","DOI":"10.1186\/s13059-016-1010-4","article-title":"GiniClust: detecting rare cell types from single-cell gene expression data with Gini index","volume":"17","author":"Jiang","year":"2016","journal-title":"Genome Biol"},{"key":"2023051608273542400_btab239-B8","doi-asserted-by":"crossref","first-page":"4719","DOI":"10.1038\/s41467-018-07234-6","article-title":"Discovery of rare cells from voluminous single cell expression data","volume":"9","author":"Jindal","year":"2018","journal-title":"Nat. Commun"},{"key":"2023051608273542400_btab239-B9","doi-asserted-by":"crossref","first-page":"483","DOI":"10.1038\/nmeth.4236","article-title":"SC3: consensus clustering of single-cell RNA-seq data","volume":"14","author":"Kiselev","year":"2017","journal-title":"Nat. Methods"},{"key":"2023051608273542400_btab239-B10","doi-asserted-by":"crossref","first-page":"273","DOI":"10.1038\/s41576-018-0088-9","article-title":"Challenges in unsupervised clustering of single-cell rna-seq data","volume":"20","author":"Kiselev","year":"2019","journal-title":"Nat. Rev. Genet"},{"key":"2023051608273542400_btab239-B499351365","author":"Klein","year":"2015","journal-title":"Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells"},{"key":"2023051608273542400_btab239-B11","doi-asserted-by":"crossref","first-page":"1187","DOI":"10.1016\/j.cell.2015.04.044","article-title":"Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells","volume":"161","author":"Klein","year":"2015","journal-title":"Cell"},{"key":"2023051608273542400_btab239-B12","first-page":"2579","article-title":"Visualizing data using t-SNE","volume":"9","author":"Maaten","year":"2008","journal-title":"J. Mach. Learn. 
Res"},{"key":"2023051608273542400_btab239-B13","doi-asserted-by":"crossref","first-page":"57","DOI":"10.1038\/nature11244","article-title":"Embryonic stem cell potency fluctuates with endogenous retrovirus activity","volume":"487","author":"Macfarlan","year":"2012","journal-title":"Nature"},{"key":"2023051608273542400_btab239-B14","doi-asserted-by":"crossref","first-page":"171","DOI":"10.1038\/nprot.2014.006","article-title":"Full-length RNA-seq from single cells using Smart-seq2","volume":"9","author":"Picelli","year":"2014","journal-title":"Nat. Protoc"},{"key":"2023051608273542400_btab239-B15","doi-asserted-by":"crossref","first-page":"1122","DOI":"10.1016\/j.immuni.2016.10.032","article-title":"Regulatory T cells exhibit distinct features in human breast cancer","volume":"45","author":"Plitas","year":"2016","journal-title":"Immunity"},{"key":"2023051608273542400_btab239-B16","doi-asserted-by":"crossref","first-page":"D330","DOI":"10.1093\/nar\/gky1055","article-title":"The Gene Ontology resource: 20 years and still GOing strong","volume":"47","year":"2019","journal-title":"Nucleic Acids Research"},{"key":"2023051608273542400_btab239-B17","doi-asserted-by":"crossref","first-page":"1451","DOI":"10.1111\/febs.14613","article-title":"Computational approaches for high-throughput single-cell data analysis","volume":"286","author":"Todorov","year":"2019","journal-title":"FEBS J"},{"key":"2023051608273542400_btab239-B18","doi-asserted-by":"crossref","first-page":"37","DOI":"10.1016\/0169-7439(87)80084-9","article-title":"Principal component analysis","volume":"2","author":"Wold","year":"1987","journal-title":"Chemometr. Intell. Lab. Syst"},{"key":"2023051608273542400_btab239-B19","doi-asserted-by":"crossref","first-page":"1450","DOI":"10.1093\/bib\/bby014","article-title":"It is time to apply biclustering: a comprehensive review of biclustering applications in biological and biomedical data","volume":"20","author":"Xie","year":"2019","journal-title":"Brief. 
Bioinf"},{"key":"2023051608273542400_btab239-B20","doi-asserted-by":"crossref","first-page":"lqaa082","DOI":"10.1093\/nargab\/lqaa082","article-title":"scAIDE: clustering of large-scale single-cell RNA-seq data reveals putative and rare cell types","volume":"2","author":"Xie","year":"2020","journal-title":"NAR Genomics Bioinf"},{"key":"2023051608273542400_btab239-B0102566","author":"Zheng","year":"2017","journal-title":"50%:50% Jurkat:293T Cell Mixture"},{"key":"2023051608273542400_btab239-B21","doi-asserted-by":"crossref","first-page":"14049","DOI":"10.1038\/ncomms14049","article-title":"Massively parallel digital transcriptional profiling of single cells","volume":"8","author":"Zheng","year":"2017","journal-title":"Nat. Commun"},{"key":"2023051608273542400_btab239-B22","doi-asserted-by":"crossref","first-page":"631","DOI":"10.1016\/j.molcel.2017.01.023","article-title":"Comparative analysis of single-cell RNA sequencing methods","volume":"65","author":"Ziegenhain","year":"2017","journal-title":"Mol. 
Cell"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btab239\/38523471\/btab239.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/19\/3220\/50338367\/btab239.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/19\/3220\/50338367\/btab239.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,8,27]],"date-time":"2024-08-27T20:01:57Z","timestamp":1724788917000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/37\/19\/3220\/6217360"}},"subtitle":[],"editor":[{"given":"Anthony","family":"Mathelier","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2021,4,8]]},"references-count":24,"journal-issue":{"issue":"19","published-print":{"date-parts":[[2021,10,11]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btab239","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2021,10,1]]},"published":{"date-parts":[[2021,4,8]]}}}