{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,14]],"date-time":"2026-01-14T06:08:44Z","timestamp":1768370924737,"version":"3.49.0"},"reference-count":34,"publisher":"Oxford University Press (OUP)","issue":"11","license":[{"start":{"date-parts":[[2022,4,22]],"date-time":"2022-04-22T00:00:00Z","timestamp":1650585600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/501100012226","name":"Fundamental Research Funds for the Central Universities","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100012226","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Research Grants Council of the Hong Kong Special Administrative Region [CityU","award":["11200218"],"award-info":[{"award-number":["11200218"]}]},{"name":"Health and Medical Research Fund, of the Food and Health Bureau"},{"name":"The Government of the Hong Kong Special Administrative Region","award":["07181426"],"award-info":[{"award-number":["07181426"]}]},{"name":"Hong Kong Institute for Data Science (HKIDS) at the City University of Hong Kong"},{"DOI":"10.13039\/100007567","name":"City University of Hong Kong","doi-asserted-by":"publisher","award":["CityU 11202219"],"award-info":[{"award-number":["CityU 11202219"]}],"id":[{"id":"10.13039\/100007567","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100007567","name":"City University of Hong Kong","doi-asserted-by":"publisher","award":["CityU 11203520"],"award-info":[{"award-number":["CityU 11203520"]}],"id":[{"id":"10.13039\/100007567","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,5,26]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Motivation<\/jats:title><jats:p>Thanks to the development of high-throughput sequencing technologies, massive amounts of various biomolecular data have been accumulated to revolutionize the study of genomics and molecular biology. One of the main challenges in analyzing this biomolecular data is to cluster their subtypes into subpopulations to facilitate subsequent downstream analysis. Recently, many clustering methods have been developed to address the biomolecular data. However, the computational methods often suffer from many limitations such as high dimensionality, data heterogeneity and noise.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>In our study, we develop a novel Graph-based Multiple Hierarchical Consensus Clustering (GMHCC) method with an unsupervised graph-based feature ranking (FR) and a graph-based linking method to explore the multiple hierarchical information of the underlying partitions of the consensus clustering for multiple types of biomolecular data. Indeed, we first propose to use a graph-based unsupervised FR model to measure each feature by building a graph over pairwise features and then providing each feature with a rank. Subsequently, to maintain the diversity and robustness of basic partitions (BPs), we propose multiple diverse feature subsets to generate several BPs and then explore the hierarchical structures of the multiple BPs by refining the global consensus function. Finally, we develop a new graph-based linking method, which explicitly considers the relationships between clusters to generate the final partition. Experiments on multiple types of biomolecular data including 35 cancer gene expression datasets and eight single-cell RNA-seq datasets validate the effectiveness of our method over several state-of-the-art consensus clustering approaches. Furthermore, differential gene analysis, gene ontology enrichment analysis and KEGG pathway analysis are conducted, providing novel insights into cell developmental lineages and characterization mechanisms.<\/jats:p><\/jats:sec><jats:sec><jats:title>Availability and implementation<\/jats:title><jats:p>The source code is available at GitHub: https:\/\/github.com\/yifuLu\/GMHCC. The software and the supporting data can be downloaded from: https:\/\/figshare.com\/articles\/software\/GMHCC\/17111291.<\/jats:p><\/jats:sec><jats:sec><jats:title>Supplementary information<\/jats:title><jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p><\/jats:sec>","DOI":"10.1093\/bioinformatics\/btac290","type":"journal-article","created":{"date-parts":[[2022,4,19]],"date-time":"2022-04-19T11:33:41Z","timestamp":1650368021000},"page":"3020-3028","source":"Crossref","is-referenced-by-count":4,"title":["GMHCC: high-throughput analysis of biomolecular data using graph-based multiple hierarchical consensus clustering"],"prefix":"10.1093","volume":"38","author":[{"given":"Yifu","family":"Lu","sequence":"first","affiliation":[{"name":"School of Artificial Intelligence, Jilin University , Changchun 130012, China"}]},{"given":"Zhuohan","family":"Yu","sequence":"additional","affiliation":[{"name":"School of Artificial Intelligence, Jilin University , Changchun 130012, China"}]},{"given":"Yunhe","family":"Wang","sequence":"additional","affiliation":[{"name":"School of Artificial Intelligence, Jilin University , Changchun 130012, China"}]},{"given":"Zhiqiang","family":"Ma","sequence":"additional","affiliation":[{"name":"School of Artificial Intelligence, Jilin University , Changchun 130012, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6062-733X","authenticated-orcid":false,"given":"Ka-Chun","family":"Wong","sequence":"additional","affiliation":[{"name":"Department of Computer Science, City University of Hong Kong ,\u00a0Hong Kong 999077, Hong Kong SAR"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8716-9823","authenticated-orcid":false,"given":"Xiangtao","family":"Li","sequence":"additional","affiliation":[{"name":"School of Artificial Intelligence, Jilin University , Changchun 130012, China"}]}],"member":"286","published-online":{"date-parts":[[2022,4,22]]},"reference":[{"key":"2023041403081653200_","first-page":"166","author":"Ayad","year":"2003"},{"key":"2023041403081653200_","doi-asserted-by":"crossref","first-page":"0","DOI":"10.1186\/s12859-019-2742-4","article-title":"VPAC: variational projection for accurate clustering of single-cell transcriptomic data","volume":"20","author":"Chen","year":"2019","journal-title":"BMC Bioinformatics"},{"key":"2023041403081653200_","first-page":"186","article-title":"Random projection for high dimensional data clustering: a cluster ensemble approach","author":"Fern","year":"2003"},{"key":"2023041403081653200_","doi-asserted-by":"crossref","first-page":"309","DOI":"10.1007\/3-540-48219-9_31","volume-title":"International Workshop on Multiple Classifier Systems","author":"Fred","year":"2001"},{"key":"2023041403081653200_","doi-asserted-by":"crossref","first-page":"835","DOI":"10.1109\/TPAMI.2005.113","article-title":"Combining multiple clusterings using evidence accumulation","volume":"27","author":"Fred","year":"2005","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell"},{"key":"2023041403081653200_","doi-asserted-by":"crossref","first-page":"993","DOI":"10.1016\/j.ejor.2017.08.040","article-title":"High dimensional data classification and feature selection using support vector machines","volume":"265","author":"Ghaddar","year":"2018","journal-title":"Eur. J. Oper. Res"},{"key":"2023041403081653200_","doi-asserted-by":"crossref","first-page":"1460","DOI":"10.1109\/TCYB.2017.2702343","article-title":"Locally weighted ensemble clustering","volume":"48","author":"Huang","year":"2017","journal-title":"IEEE Trans. Cybern"},{"key":"2023041403081653200_","doi-asserted-by":"crossref","first-page":"1513","DOI":"10.1093\/bioinformatics\/btq226","article-title":"LCE: a link-based cluster ensemble method for improved gene expression data analysis","volume":"26","author":"Iam-On","year":"2010","journal-title":"Bioinformatics"},{"key":"2023041403081653200_","doi-asserted-by":"crossref","first-page":"2396","DOI":"10.1109\/TPAMI.2011.84","article-title":"A link-based approach to the cluster ensemble problem","volume":"33","author":"Iam-On","year":"2011","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell"},{"key":"2023041403081653200_","doi-asserted-by":"crossref","first-page":"6687","DOI":"10.1038\/sj.onc.1210754","article-title":"Hematopoietic developmental pathways: on cellular basis","volume":"26","author":"Iwasaki","year":"2007","journal-title":"Oncogene"},{"key":"2023041403081653200_","doi-asserted-by":"crossref","first-page":"241","DOI":"10.1007\/BF02289588","article-title":"Hierarchical clustering schemes","volume":"32","author":"Johnson","year":"1967","journal-title":"Psychometrika"},{"key":"2023041403081653200_","doi-asserted-by":"crossref","first-page":"483","DOI":"10.1038\/nmeth.4236","article-title":"SC3: consensus clustering of single-cell RNA-seq data","volume":"14","author":"Kiselev","year":"2017","journal-title":"Nat. Methods"},{"key":"2023041403081653200_","doi-asserted-by":"crossref","first-page":"231","DOI":"10.1002\/widm.30","article-title":"Density-based clustering","volume":"1","author":"Kriegel","year":"2011","journal-title":"WIREs Data Mining Knowl. Discov"},{"key":"2023041403081653200_","doi-asserted-by":"crossref","first-page":"2809","DOI":"10.1093\/bioinformatics\/bty1056","article-title":"Single-cell RNA-seq interpretations using evolutionary multiobjective ensemble pruning","volume":"35","author":"Li","year":"2019","journal-title":"Bioinformatics"},{"key":"2023041403081653200_","doi-asserted-by":"crossref","first-page":"bbab368","DOI":"10.1093\/bib\/bbab368","article-title":"High-throughput single-cell RNA-seq data imputation and characterization with surrogate-assisted automated deep learning","volume":"23","author":"Li","year":"2022","journal-title":"Brief. Bioinform"},{"key":"2023041403081653200_","doi-asserted-by":"crossref","first-page":"2691","DOI":"10.1093\/bioinformatics\/btx167","article-title":"Entropy-based consensus clustering for patient stratification","volume":"33","author":"Liu","year":"2017","journal-title":"Bioinformatics"},{"key":"2023041403081653200_","doi-asserted-by":"crossref","first-page":"385","DOI":"10.1007\/s10618-017-0539-5","article-title":"Infinite ensemble clustering","volume":"32","author":"Liu","year":"2018","journal-title":"Data Min. Knowl. Disc"},{"key":"2023041403081653200_","article-title":"Consensus clustering: an embedding perspective, extension and beyond","author":"Liu","year":"2019"},{"key":"2023041403081653200_","doi-asserted-by":"crossref","first-page":"536","DOI":"10.1038\/s42256-021-00333-y","article-title":"Simultaneous deep generative modelling and clustering of single-cell genomic data","volume":"3","author":"Liu","year":"2021","journal-title":"Nat. Mach. Intell"},{"key":"2023041403081653200_","first-page":"281","article-title":"Some methods for classification and analysis of multivariate observations","volume-title":"Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability","author":"MacQueen","year":"1967"},{"key":"2023041403081653200_","article-title":"UMAP: uniform manifold approximation and projection for dimension reduction","author":"McInnes","year":"2018"},{"key":"2023041403081653200_","doi-asserted-by":"crossref","first-page":"1226","DOI":"10.1109\/TPAMI.2005.159","article-title":"Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy","volume":"27","author":"Peng","year":"2005","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell"},{"key":"2023041403081653200_","doi-asserted-by":"crossref","first-page":"2047","DOI":"10.1109\/TNNLS.2015.2451151","article-title":"Space structure and clustering of categorical data","volume":"27","author":"Qian","year":"2015","journal-title":"IEEE Trans. Neural Netw. Learn. Syst"},{"key":"2023041403081653200_","doi-asserted-by":"crossref","first-page":"4396","DOI":"10.1109\/TPAMI.2020.3002843","article-title":"Infinite feature selection: a graph-based feature filtering approach","volume":"43","author":"Roffo","year":"2020","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell"},{"key":"2023041403081653200_","doi-asserted-by":"crossref","first-page":"495","DOI":"10.1038\/nbt.3192","article-title":"Spatial reconstruction of single-cell gene expression data","volume":"33","author":"Satija","year":"2015","journal-title":"Nat. Biotechnol"},{"key":"2023041403081653200_","first-page":"583","article-title":"Cluster ensembles\u2014a knowledge reuse framework for combining multiple partitions","volume":"3","author":"Strehl","year":"2002","journal-title":"J. Mach. Learn. Res"},{"key":"2023041403081653200_","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41467-020-19465-7","article-title":"Ensemble dimensionality reduction and feature gene extraction for single-cell RNA-seq data","volume":"11","author":"Sun","year":"2020","journal-title":"Nat. Commun"},{"key":"2023041403081653200_","doi-asserted-by":"crossref","first-page":"191","DOI":"10.1038\/s42256-019-0037-0","article-title":"Clustering single-cell RNA-seq data with a model-based deep learning approach","volume":"1","author":"Tian","year":"2019","journal-title":"Nat. Mach. Intell"},{"key":"2023041403081653200_","doi-asserted-by":"crossref","first-page":"331","DOI":"10.1109\/ICDM.2003.1250937","article-title":"Combining multiple weak clusterings","volume-title":"Third IEEE International Conference on Data Mining","author":"Topchy","year":"2003"},{"key":"2023041403081653200_","doi-asserted-by":"crossref","first-page":"414","DOI":"10.1038\/nmeth.4207","article-title":"Visualization and analysis of single-cell RNA-seq data by kernel-based similarity learning","volume":"14","author":"Wang","year":"2017","journal-title":"Nat. Methods"},{"key":"2023041403081653200_","doi-asserted-by":"crossref","first-page":"132","DOI":"10.1016\/S0165-5728(03)00093-6","article-title":"Antigen processing and presentation in human muscle: cathepsin s is critical for MHC class II expression and upregulated in inflammatory myopathies","volume":"138","author":"Wiendl","year":"2003","journal-title":"J. Neuroimmunol"},{"key":"2023041403081653200_","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s13059-017-1382-0","article-title":"Scanpy: large-scale single-cell gene expression data analysis","volume":"19","author":"Wolf","year":"2018","journal-title":"Genome Biol"},{"key":"2023041403081653200_","first-page":"155","article-title":"K-means-based consensus clustering: a unified view","volume":"27","author":"Wu","year":"2015","journal-title":"IEEE Comput. Arch. Lett"},{"key":"2023041403081653200_","doi-asserted-by":"crossref","first-page":"D721","DOI":"10.1093\/nar\/gky900","article-title":"Cellmarker: a manually curated resource of cell markers in human and mouse","volume":"47","author":"Zhang","year":"2019","journal-title":"Nucleic Acids Res"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btac290\/43590035\/btac290.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/11\/3020\/49878660\/btac290.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/11\/3020\/49878660\/btac290.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,20]],"date-time":"2023-11-20T05:09:09Z","timestamp":1700456949000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/38\/11\/3020\/6572336"}},"subtitle":[],"editor":[{"given":"Pier Luigi","family":"Martelli","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2022,4,22]]},"references-count":34,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2022,5,26]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btac290","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2022,6,1]]},"published":{"date-parts":[[2022,4,22]]}}}