{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,21]],"date-time":"2026-04-21T16:59:20Z","timestamp":1776790760605,"version":"3.51.2"},"reference-count":46,"publisher":"IOP Publishing","issue":"3","license":[{"start":{"date-parts":[[2023,7,27]],"date-time":"2023-07-27T00:00:00Z","timestamp":1690416000000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,7,27]],"date-time":"2023-07-27T00:00:00Z","timestamp":1690416000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/iopscience.iop.org\/info\/page\/text-and-data-mining"}],"funder":[{"DOI":"10.13039\/100008795","name":"Manitoba Medical Services Foundation","doi-asserted-by":"crossref","id":[{"id":"10.13039\/100008795","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100000038","name":"Natural Sciences and Engineering Research Council of Canada","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100000038","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Canada Research Chairs Tier II Program"}],"content-domain":{"domain":["iopscience.iop.org"],"crossmark-restriction":false},"short-container-title":["Mach. Learn.: Sci. Technol."],"published-print":{"date-parts":[[2023,9,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Cell type identification using single-cell RNA sequencing data is critical for understanding disease mechanisms and drug discovery. Cell clustering analysis has been widely studied in health research for rare tumor cell detection. In this study, we propose a Gaussian mixture model-based variational graph autoencoder on scRNA-seq data (scGMM-VGAE) that integrates a statistical clustering model to a deep learning algorithm to significantly improve the cell clustering performance. 
This model feeds a cell-cell graph adjacency matrix and a gene feature matrix into a graph variational autoencoder (VGAE) to generate latent data. These data are then used for cell clustering by the Gaussian mixture model (GMM) module. To optimize the algorithm, a designed loss function is derived by combining parameter estimates from the GMM and VGAE. We test the proposed method on four publicly available and three simulated datasets which contain many biological and technical zeros. The scGMM-VGAE outperforms four selected baseline methods on three evaluation metrics in cell clustering. By successfully incorporating GMM into deep learning VGAE on scRNA-seq data, the proposed method shows higher accuracy in cell clustering on scRNA-seq data. This improvement has a significant impact on detecting rare cell types in health research. All source codes used in this study can be found at <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"https:\/\/github.com\/ericlin1230\/scGMM-VGAE\" xlink:type=\"simple\">https:\/\/github.com\/ericlin1230\/scGMM-VGAE<\/jats:ext-link>.<\/jats:p>","DOI":"10.1088\/2632-2153\/acd7c3","type":"journal-article","created":{"date-parts":[[2023,5,22]],"date-time":"2023-05-22T22:41:08Z","timestamp":1684795268000},"page":"035013","update-policy":"https:\/\/doi.org\/10.1088\/crossmark-policy","source":"Crossref","is-referenced-by-count":12,"title":["scGMM-VGAE: a Gaussian mixture model-based variational graph autoencoder algorithm for clustering single-cell RNA-seq data"],"prefix":"10.1088","volume":"4","author":[{"given":"Eric","family":"Lin","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Boyuan","family":"Liu","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Leann","family":"Lac","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Daryl L 
X","family":"Fung","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7541-9127","authenticated-orcid":true,"given":"Carson K","family":"Leung","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9546-2245","authenticated-orcid":true,"given":"Pingzhao","family":"Hu","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"266","published-online":{"date-parts":[[2023,7,27]]},"reference":[{"key":"mlstacd7c3bib1","doi-asserted-by":"publisher","first-page":"346","DOI":"10.1016\/j.cels.2016.08.011","article-title":"A single-cell transcriptomic map of the human and mouse pancreas reveals inter- and intra-cell population structure","volume":"3","author":"Baron","year":"2016","journal-title":"Cell Syst."},{"key":"mlstacd7c3bib2","doi-asserted-by":"publisher","first-page":"1468","DOI":"10.1093\/bioinformatics\/btz752","article-title":"SPARSim single cell: a count data simulator for scRNA-seq data","volume":"36","author":"Baruzzo","year":"2020","journal-title":"Bioinformatics"},{"key":"mlstacd7c3bib3","doi-asserted-by":"publisher","first-page":"2223","DOI":"10.1093\/bioinformatics\/btab085","article-title":"Normalization of single-cell RNA-seq counts by log (x + 1) or log(1 + x)","volume":"37","author":"Booeshaghi","year":"2021","journal-title":"Bioinformatics"},{"key":"mlstacd7c3bib4","doi-asserted-by":"publisher","first-page":"1277","DOI":"10.1093\/bioinformatics\/btab804","article-title":"CellVGAE: an unsupervised scRNA-seq analysis workflow with graph attention networks","volume":"38","author":"Buterez","year":"2021","journal-title":"Bioinformatics"},{"key":"mlstacd7c3bib5","doi-asserted-by":"publisher","first-page":"411","DOI":"10.1038\/nbt.4096","article-title":"Integrating single-cell transcriptomic data across different conditions, technologies, and 
species","volume":"36","author":"Butler","year":"2018","journal-title":"Nat. Biotechnol."},{"key":"mlstacd7c3bib6","doi-asserted-by":"publisher","first-page":"173","DOI":"10.3389\/fcvm.2019.00173","article-title":"Single-cell RNA sequencing of the cardiovascular system: new looks for old diseases","volume":"6","author":"Chaudhry","year":"2019","journal-title":"Front. Cardiovasc. Med."},{"key":"mlstacd7c3bib7","doi-asserted-by":"publisher","first-page":"317","DOI":"10.3389\/fgene.2019.00317","article-title":"Single-cell RNA-seq technologies and related computational data analysis","volume":"10","author":"Chen","year":"2019","journal-title":"Front. Genet."},{"key":"mlstacd7c3bib8","doi-asserted-by":"publisher","first-page":"bbab236","DOI":"10.1093\/bib\/bbab236","article-title":"Consensus clustering of single-cell RNA-seq data by enhancing network affinity","volume":"22","author":"Cui","year":"2021","journal-title":"Brief. Bioinform."},{"key":"mlstacd7c3bib9","doi-asserted-by":"publisher","first-page":"7285","DOI":"10.1073\/pnas.1507125112","article-title":"A survey of human brain transcriptome diversity at the single cell level","volume":"112","author":"Darmanis","year":"2015","journal-title":"Proc. Natl Acad. Sci."},{"key":"mlstacd7c3bib10","doi-asserted-by":"publisher","first-page":"897","DOI":"10.1038\/nbt1406","article-title":"What is the expectation maximization algorithm?","volume":"26","author":"Do","year":"2008","journal-title":"Nat. Biotechnol."},{"key":"mlstacd7c3bib11","doi-asserted-by":"publisher","first-page":"206","DOI":"10.1016\/j.csda.2016.05.007","article-title":"A variational expectation-maximization algorithm for temporal data clustering","volume":"103","author":"El Assaad","year":"2016","journal-title":"Comput. Stat. 
Data Anal."},{"key":"mlstacd7c3bib12","doi-asserted-by":"publisher","first-page":"390","DOI":"10.1038\/s41467-018-07931-2","article-title":"Single-cell RNA-seq denoising using a deep count autoencoder","volume":"10","author":"Eraslan","year":"2019","journal-title":"Nat. Commun."},{"key":"mlstacd7c3bib13","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0151984","article-title":"Expectation-maximization binary clustering for behavioural annotation","volume":"11","author":"Garriga","year":"2016","journal-title":"PLoS One"},{"key":"mlstacd7c3bib14","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pcbi.1007509","article-title":"Clustering-independent analysis of genomic data using spectral simplicial theory","volume":"15","author":"Govek","year":"2019","journal-title":"PLoS Comput. Biol."},{"key":"mlstacd7c3bib15","doi-asserted-by":"publisher","first-page":"43992","DOI":"10.1109\/ACCESS.2020.2977671","article-title":"Variational autoencoder with optimizing Gaussian mixture model priors","volume":"8","author":"Guo","year":"2020","journal-title":"IEEE Access"},{"key":"mlstacd7c3bib16","doi-asserted-by":"publisher","first-page":"3573","DOI":"10.1016\/j.cell.2021.04.048","article-title":"Integrated analysis of multimodal single-cell data","volume":"184","author":"Hao","year":"2021","journal-title":"Cell"},{"key":"mlstacd7c3bib17","doi-asserted-by":"publisher","first-page":"75","DOI":"10.1186\/s13073-017-0467-4","article-title":"A practical guide to single-cell RNA-sequencing for biomedical research and clinical applications","volume":"9","author":"Haque","year":"2017","journal-title":"Genome Med."},{"key":"mlstacd7c3bib18","doi-asserted-by":"publisher","first-page":"4215","DOI":"10.1609\/aaai.v34i04.5843","article-title":"Collaborative graph convolutional networks: unsupervised learning meets semi-supervised learning","volume":"vol 
34","author":"Hui","year":"2020"},{"key":"mlstacd7c3bib19","doi-asserted-by":"crossref","DOI":"10.24963\/ijcai.2017\/273","article-title":"Variational deep embedding: an unsupervised and generative approach to clustering","author":"Jiang","year":"2017"},{"key":"mlstacd7c3bib20","article-title":"Auto-encoding variational Bayes","author":"Kingma","year":"2014"},{"key":"mlstacd7c3bib21","article-title":"Variational graph autoencoders","author":"Kipf","year":"2016"},{"key":"mlstacd7c3bib22","article-title":"Semi-supervised classification with graph convolutional networks","author":"Kipf","year":"2017"},{"key":"mlstacd7c3bib23","first-page":"101","article-title":"MIC: mutual information based hierarchical clustering","author":"Kraskov","year":"2009"},{"key":"mlstacd7c3bib24","doi-asserted-by":"publisher","first-page":"1253","DOI":"10.3389\/fgene.2019.01253","article-title":"Benchmark and parameter sensitivity analysis of single-cell RNA sequencing clustering methods","volume":"10","author":"Krzak","year":"2019","journal-title":"Front. Genet."},{"key":"mlstacd7c3bib25","doi-asserted-by":"publisher","DOI":"10.1142\/S0219720020400053","article-title":"Single-cell RNA-seq data clustering: a survey with performance comparison study","volume":"4","author":"Li","year":"2020","journal-title":"J. Bioinform. Comput. Biol."},{"key":"mlstacd7c3bib26","doi-asserted-by":"publisher","first-page":"1053","DOI":"10.1038\/s41592-018-0229-2","article-title":"Deep generative modeling for single-cell transcriptomics","volume":"15","author":"Lopez","year":"2018","journal-title":"Nat. Methods"},{"key":"mlstacd7c3bib27","author":"Malik","year":"2019","edition":"1st edn"},{"key":"mlstacd7c3bib28","doi-asserted-by":"publisher","first-page":"861","DOI":"10.21105\/joss.00861","article-title":"UMAP: uniform manifold approximation and projection","volume":"3","author":"McInnes","year":"2018","journal-title":"J. 
Open-source Softw."},{"key":"mlstacd7c3bib29","doi-asserted-by":"publisher","first-page":"355","DOI":"10.1146\/annurev-statistics-031017-100325","article-title":"Finite mixture models","volume":"6","author":"McLachlan","year":"2019","journal-title":"Annu. Rev. Stat. Appl."},{"key":"mlstacd7c3bib30","first-page":"827","article-title":"Gaussian mixture models","author":"Reynolds","year":"2015"},{"key":"mlstacd7c3bib31","doi-asserted-by":"publisher","DOI":"10.7717\/peerj.12087","article-title":"SC-JNMF: single-cell clustering integrating multiple quantification methods based on joint non-negative matrix factorization","volume":"9","author":"Shiga","year":"2021","journal-title":"PeerJ"},{"key":"mlstacd7c3bib32","doi-asserted-by":"publisher","first-page":"80716","DOI":"10.1109\/ACCESS.2020.2988796","article-title":"Unsupervised k-means clustering algorithm","volume":"8","author":"Sinaga","year":"2020","journal-title":"IEEE Access"},{"key":"mlstacd7c3bib33","doi-asserted-by":"publisher","first-page":"1888","DOI":"10.1016\/j.cell.2019.05.031","article-title":"Comprehensive integration of single-cell data","volume":"177","author":"Stuart","year":"2019","journal-title":"Cell"},{"key":"mlstacd7c3bib34","doi-asserted-by":"publisher","first-page":"bbab034","DOI":"10.1093\/bib\/bbab034","article-title":"Accurate feature selection improves single-cell RNA-seq cell clustering","volume":"22","author":"Su","year":"2021","journal-title":"Brief. Bioinform."},{"key":"mlstacd7c3bib35","doi-asserted-by":"publisher","first-page":"25","DOI":"10.1007\/s12626-021-00100-w","article-title":"Expectation-maximization (EM) clustering as a preprocessing method for clinical pathway mining","volume":"16","author":"Tsumoto","year":"2022","journal-title":"Rev. 
Socionetwork Strateg."},{"key":"mlstacd7c3bib36","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1109\/TNNLS.2021.3121224","article-title":"Fusion of centroid-based clustering with graph clustering: an expectation maximization-based hybrid clustering","author":"Uykan","year":"2021","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"mlstacd7c3bib37","doi-asserted-by":"publisher","first-page":"bbab345","DOI":"10.1093\/bib\/bbab345","article-title":"A comparison of deep learning-based pre-processing and clustering approaches for single-cell RNA sequencing data","volume":"23","author":"Wang","year":"2022","journal-title":"Brief. Bioinform."},{"key":"mlstacd7c3bib38","doi-asserted-by":"publisher","first-page":"2692","DOI":"10.1093\/bioinformatics\/btac168","article-title":"EDClust: an EM-MM hybrid method for cell clustering in multiple-subject single-cell RNA sequencing","volume":"38","author":"Wei","year":"2022","journal-title":"Bioinformatics"},{"key":"mlstacd7c3bib39","doi-asserted-by":"publisher","first-page":"12035","DOI":"10.1021\/acs.chemrev.0c01140","article-title":"Aptamer-Based Detection of Circulating Targets for Precision Medicine","volume":"121","author":"Wu","year":"2021","journal-title":"Chem Rev"},{"key":"mlstacd7c3bib40","doi-asserted-by":"publisher","first-page":"4576","DOI":"10.1038\/s41467-019-12630-7","article-title":"SCALE method for single-cell ATAC-seq analysis via latent feature extraction","volume":"10","author":"Xiong","year":"2019","journal-title":"Nat. 
Commun."},{"key":"mlstacd7c3bib41","doi-asserted-by":"publisher","first-page":"1387","DOI":"10.1002\/hep.29353","article-title":"A single-cell transcriptomic analysis reveals precise pathways and regulatory mechanisms underlying hepatoblast differentiation","volume":"66","author":"Yang","year":"2017","journal-title":"Hepatology"},{"key":"mlstacd7c3bib42","doi-asserted-by":"publisher","first-page":"763","DOI":"10.1093\/bioinformatics\/17.9.763","article-title":"Principal component analysis for clustering gene expression data","volume":"17","author":"Yeung","year":"2001","journal-title":"Bioinformatics"},{"key":"mlstacd7c3bib43","doi-asserted-by":"publisher","first-page":"bbaa316","DOI":"10.1093\/bib\/bbaa316","article-title":"ScGMAI: a Gaussian mixture model for clustering single-cell RNA-seq data based on deep autoencoder","volume":"22","author":"Yu","year":"2021","journal-title":"Brief. Bioinform."},{"key":"mlstacd7c3bib44","doi-asserted-by":"publisher","first-page":"747","DOI":"10.1016\/j.asoc.2017.08.032","article-title":"Two improved k-means algorithms","volume":"68","author":"Yu","year":"2018","journal-title":"Appl. Soft Comput."},{"key":"mlstacd7c3bib45","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1186\/s12575-018-0067-8","article-title":"Silhouette scores for arbitrary defined groups in gene expression data and insights into differential expression results","volume":"20","author":"Zhao","year":"2018","journal-title":"Biol. Proced. Online"},{"key":"mlstacd7c3bib46","doi-asserted-by":"publisher","DOI":"10.1038\/ncomms14049","article-title":"Massively parallel digital transcriptional profiling of single cells","volume":"8","author":"Zheng","year":"2017","journal-title":"Nat. 
Commun."}],"container-title":["Machine Learning: Science and Technology"],"original-title":[],"link":[{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/acd7c3","content-type":"text\/html","content-version":"am","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/acd7c3\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/acd7c3","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/acd7c3\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/acd7c3\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/acd7c3\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/acd7c3\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"similarity-checking"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/acd7c3\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,7,27]],"date-time":"2023-07-27T13:06:53Z","timestamp":1690463213000},"score":1,"resource":{"primary":{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/acd7c3"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,7,27]]},"references-count":46,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2023,7,27]]},"published-print":{"date-parts":[[2023,9,1]]}},"URL":"https:\/\/doi.org\/10.1088\/2632-2153\/acd7c3","relation":{},"ISSN
":["2632-2153"],"issn-type":[{"value":"2632-2153","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,7,27]]},"assertion":[{"value":"scGMM-VGAE: a Gaussian mixture model-based variational graph autoencoder algorithm for clustering single-cell RNA-seq data","name":"article_title","label":"Article Title"},{"value":"Machine Learning: Science and Technology","name":"journal_title","label":"Journal Title"},{"value":"paper","name":"article_type","label":"Article Type"},{"value":"\u00a9 2023 The Author(s). Published by IOP Publishing Ltd","name":"copyright_information","label":"Copyright Information"},{"value":"2022-11-28","name":"date_received","label":"Date Received","group":{"name":"publication_dates","label":"Publication dates"}},{"value":"2023-05-22","name":"date_accepted","label":"Date Accepted","group":{"name":"publication_dates","label":"Publication dates"}},{"value":"2023-07-27","name":"date_epub","label":"Online publication date","group":{"name":"publication_dates","label":"Publication dates"}}]}}