{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,25]],"date-time":"2026-03-25T10:57:20Z","timestamp":1774436240486,"version":"3.50.1"},"reference-count":40,"publisher":"Oxford University Press (OUP)","issue":"9","license":[{"start":{"date-parts":[[2025,8,13]],"date-time":"2025-08-13T00:00:00Z","timestamp":1755043200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62273153"],"award-info":[{"award-number":["62273153"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100021171","name":"Guangdong Basic and Applied Basic Research Foundation","doi-asserted-by":"publisher","award":["2024A1515010900"],"award-info":[{"award-number":["2024A1515010900"]}],"id":[{"id":"10.13039\/501100021171","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,9,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Accurate tumor subtype diagnosis is crucial for precision oncology, yet current methodologies face significant challenges. These include balancing model accuracy with interpretability and the high costs of generating multi-omics data in clinical settings. Moreover, there is a lack of validated models capable of classifying hierarchical tumor subtypes across a comprehensive pan-cancer cohort.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>We present a graph neural network, HallmarkGraph, the first biologically informed model developed to classify hierarchical tumor subtypes in human cancer. Inspired by cancer hallmarks, the model\u2019s architecture integrates transcriptome profiles and gene regulatory interactions to perform multi-label classification. We evaluate the model on a comprehensive pan-cancer cohort comprising 11\u00a0476 samples from 26 primary cancers with 405 subtypes up to eight levels. The model demonstrates exceptional performance, achieving 5-fold cross-validation accuracy between 85% and 99% for tumor subtypes labeled with increasing details of genomic information. It also shows good generalizability on a validation dataset of 887 samples, assessed using three metrics that consider tumor subtypes at individual, combined, and sample levels. Benchmarking and ablation experiments show that hallmark-based embeddings slightly influence model performance, while the integrated multilayer perceptron plays a significant role in determining classifier accuracy. Additionally, we use the SHAP method to link cancer hallmarks with genes, identifying key features that influence model decisions. Our findings present a biologically informed machine learning framework capable of tracking tumor transcriptomic trajectories and distinguishing inter- and intra-tumor heterogeneity in pan-cancer. This approach holds promise for enhancing cancer diagnostics.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>HallmarkGraph is accessible at https:\/\/github.com\/laixn\/HallmarkGraph.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaf444","type":"journal-article","created":{"date-parts":[[2025,8,13]],"date-time":"2025-08-13T16:34:14Z","timestamp":1755102854000},"source":"Crossref","is-referenced-by-count":3,"title":["HallmarkGraph: a cancer hallmark informed graph neural network for classifying hierarchical tumor subtypes"],"prefix":"10.1093","volume":"41","author":[{"given":"Qingsong","family":"Zhang","sequence":"first","affiliation":[{"name":"School of Software Engineering, South China University of Technology , Guangzhou, 510006,","place":["China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9415-0496","authenticated-orcid":false,"given":"Fei","family":"Liu","sequence":"additional","affiliation":[{"name":"School of Software Engineering, South China University of Technology , Guangzhou, 510006,","place":["China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4913-5822","authenticated-orcid":false,"given":"Xin","family":"Lai","sequence":"additional","affiliation":[{"name":"Systems and Network Medicine Lab, Biomedicine Unit, Faculty of Medicine and Health Technology, Tampere University , Tampere, 33520,","place":["Finland"]},{"name":"Department of Dermatology, Universit\u00e4tklinikum Erlangen and Friedrich-Alexander-Universit\u00e4t Erlangen-N\u00fcrnberg , Erlangen, 91054,","place":["Germany"]}]}],"member":"286","published-online":{"date-parts":[[2025,8,13]]},"reference":[{"key":"2025090120014466600_btaf444-B1","author":"Gandrud C, Allaire JJ , Kent R","year":"2017"},{"key":"2025090120014466600_btaf444-B2","doi-asserted-by":"crossref","first-page":"279","DOI":"10.1007\/s10994-020-05913-4","article-title":"LoRAS: an oversampling approach for imbalanced datasets","volume":"110","author":"Bej","year":"2021","journal-title":"Mach Learn"},{"key":"2025090120014466600_btaf444-B3","doi-asserted-by":"crossref","first-page":"178","DOI":"10.1186\/s12859-021-04105-8","article-title":"Establishing a consensus for the hallmarks of cancer based on gene ontology and pathway annotations","volume":"22","author":"Chen","year":"2021","journal-title":"BMC Bioinformatics"},{"key":"2025090120014466600_btaf444-B4","doi-asserted-by":"crossref","first-page":"656","DOI":"10.1038\/s41591-023-02221-x","article-title":"Diagnostic classification of childhood cancer using multiscale transcriptomics","volume":"29","author":"Comitani","year":"2023","journal-title":"Nat Med"},{"key":"2025090120014466600_btaf444-B5","doi-asserted-by":"crossref","first-page":"590","DOI":"10.1038\/s41419-023-06092-5","article-title":"AZGP1 activation by Lenvatinib suppresses intrahepatic cholangiocarcinoma epithelial-mesenchymal transition through the TGF-\u03b21\/Smad3 pathway","volume":"14","author":"Deng","year":"2023","journal-title":"Cell Death Dis"},{"key":"2025090120014466600_btaf444-B6","doi-asserted-by":"crossref","first-page":"1185","DOI":"10.3390\/cancers14051185","article-title":"Deep learning-based pan-cancer classification model reveals tissue-of-origin specific gene expression signatures","volume":"14","author":"Divate","year":"2022","journal-title":"Cancers (Basel)"},{"key":"2025090120014466600_btaf444-B3160745","doi-asserted-by":"publisher","first-page":"195","DOI":"10.1016\/j.ccell.2024.12.002","article-title":"Classification of non-TCGA cancer samples to TCGA molecular subtypes using compact feature sets","volume":"43","author":"Ellrott","year":"2025","journal-title":"Cancer Cell"},{"key":"2025090120014466600_btaf444-B7","doi-asserted-by":"crossref","first-page":"141","DOI":"10.1038\/s43018-020-00159-4","article-title":"Pathway-based classification of glioblastoma uncovers a mitochondrial subtype with therapeutic vulnerabilities","volume":"2","author":"Garofano","year":"2021","journal-title":"Nat Cancer"},{"key":"2025090120014466600_btaf444-B8","doi-asserted-by":"crossref","first-page":"5858","DOI":"10.3390\/cancers15245858","article-title":"Graph neural networks in cancer and oncology research: emerging and future trends","volume":"15","author":"Gogoshin","year":"2023","journal-title":"Cancers (Basel)"},{"key":"2025090120014466600_btaf444-B9","doi-asserted-by":"crossref","first-page":"2847","DOI":"10.1093\/bioinformatics\/btw313","article-title":"Complex heatmaps reveal patterns and correlations in multidimensional genomic data","volume":"32","author":"Gu","year":"2016","journal-title":"Bioinformatics"},{"key":"2025090120014466600_btaf444-B10","doi-asserted-by":"crossref","first-page":"31","DOI":"10.1158\/2159-8290.CD-21-1059","article-title":"Hallmarks of cancer: new dimensions","volume":"12","author":"Hanahan","year":"2022","journal-title":"Cancer Discov"},{"key":"2025090120014466600_btaf444-B11","doi-asserted-by":"crossref","first-page":"42","DOI":"10.1007\/s00262-024-03629-1","article-title":"Investigating the immunological function of alpha-2-glycoprotein 1, zinc-binding in regulating tumor response in the breast cancer microenvironment","volume":"73","author":"Hanamura","year":"2024","journal-title":"Cancer Immunol Immunother"},{"key":"2025090120014466600_btaf444-B12","doi-asserted-by":"crossref","first-page":"D891","DOI":"10.1093\/nar\/gkad1049","article-title":"Ensembl 2024","volume":"52","author":"Harrison","year":"2024","journal-title":"Nucleic Acids Res"},{"key":"2025090120014466600_btaf444-B13","first-page":"770","author":"He","year":"2016"},{"key":"2025090120014466600_btaf444-B14","doi-asserted-by":"crossref","first-page":"82","DOI":"10.1038\/s41586-020-1969-6","article-title":"Pan-cancer analysis of whole genomes","volume":"578","author":"ICGC\/TCGA Pan-Cancer Analysis of Whole Genomes Consortium","year":"2020","journal-title":"Nature"},{"key":"2025090120014466600_btaf444-B15","doi-asserted-by":"crossref","first-page":"1696","DOI":"10.1038\/s41588-023-01507-7","article-title":"Molecular classification of hormone receptor-positive HER2-negative breast cancer","volume":"55","author":"Jin","year":"2023","journal-title":"Nat Genet"},{"key":"2025090120014466600_btaf444-B16","author":"Kingma"},{"key":"2025090120014466600_btaf444-B17","doi-asserted-by":"crossref","first-page":"1029","DOI":"10.1002\/ijc.33860","article-title":"A disease network-based deep learning approach for characterizing melanoma","volume":"150","author":"Lai","year":"2022","journal-title":"Int J Cancer"},{"key":"2025090120014466600_btaf444-B18","doi-asserted-by":"crossref","first-page":"27","DOI":"10.1186\/s12859-023-05622-4","article-title":"A multimodal graph neural network framework for cancer molecular subtype classification","volume":"25","author":"Li","year":"2024","journal-title":"BMC Bioinformatics"},{"key":"2025090120014466600_btaf444-B19","doi-asserted-by":"crossref","first-page":"508","DOI":"10.1186\/s12864-017-3906-0","article-title":"A comprehensive genomic pan-cancer classification using the cancer genome atlas gene expression data","volume":"18","author":"Li","year":"2017","journal-title":"BMC Genomics"},{"key":"2025090120014466600_btaf444-B20","doi-asserted-by":"crossref","first-page":"1095","DOI":"10.1016\/j.ccell.2022.09.012","article-title":"Artificial intelligence for multimodal data integration in oncology","volume":"40","author":"Lipkova","year":"2022","journal-title":"Cancer Cell"},{"key":"2025090120014466600_btaf444-B21","volume-title":"Advances in Neural Information Processing Systems (NIPS), Long Beach, California, USA","author":"Lundberg","year":"2017"},{"key":"2025090120014466600_btaf444-B22","doi-asserted-by":"crossref","first-page":"658","DOI":"10.1038\/s41591-022-01717-2","article-title":"Delivering precision oncology to patients with cancer","volume":"28","author":"Mateo","year":"2022","journal-title":"Nat Med"},{"key":"2025090120014466600_btaf444-B23","volume-title":"J Open Source Softw","author":"McInnes","year":"2018"},{"key":"2025090120014466600_btaf444-B24","doi-asserted-by":"crossref","first-page":"D672","DOI":"10.1093\/nar\/gkad1025","article-title":"The reactome pathway knowledgebase 2024","volume":"52","author":"Milacic","year":"2024","journal-title":"Nucleic Acids Res"},{"key":"2025090120014466600_btaf444-B25","doi-asserted-by":"crossref","first-page":"44","DOI":"10.1186\/s12920-020-0677-2","article-title":"Convolutional neural network models for cancer type prediction based on gene expression","volume":"13","author":"Mostavi","year":"2020","journal-title":"BMC Med Genomics"},{"key":"2025090120014466600_btaf444-B26","doi-asserted-by":"crossref","first-page":"747","DOI":"10.1038\/35021093","article-title":"Molecular portraits of human breast tumours","volume":"406","author":"Perou","year":"2000","journal-title":"Nature"},{"key":"2025090120014466600_btaf444-B27","doi-asserted-by":"crossref","first-page":"203","DOI":"10.3389\/fphy.2020.00203","article-title":"Classification of cancer types using graph convolutional neural networks","volume":"8","author":"Ramirez","year":"2020","journal-title":"Front Phys"},{"key":"2025090120014466600_btaf444-B28","doi-asserted-by":"crossref","first-page":"639","DOI":"10.1038\/s41568-019-0185-x","article-title":"An analysis of genetic heterogeneity in untreated cancers","volume":"19","author":"Reiter","year":"2019","journal-title":"Nat Rev Cancer"},{"key":"2025090120014466600_btaf444-B29","doi-asserted-by":"crossref","first-page":"430","DOI":"10.1186\/s12859-022-04980-9","article-title":"Deep learning approach for cancer subtype classification using high-dimensional gene expression data","volume":"23","author":"Shen","year":"2022","journal-title":"BMC Bioinformatics"},{"key":"2025090120014466600_btaf444-B30","doi-asserted-by":"crossref","first-page":"bbab569","DOI":"10.1093\/bib\/bbab569","article-title":"Multimodal deep learning for biomedical data fusion: a review","volume":"23","author":"Stahlschmidt","year":"2022","journal-title":"Brief Bioinform"},{"key":"2025090120014466600_btaf444-B31","first-page":"207","article-title":"Downregulation of AZGP1 by Ikaros and histone deacetylase promotes tumor progression through the PTEN\/Akt and CD44s pathways in hepatocellular carcinoma","volume":"38","author":"Tian","year":"2017","journal-title":"Carcinogenesis"},{"key":"2025090120014466600_btaf444-B32","doi-asserted-by":"crossref","first-page":"bbac433","DOI":"10.1093\/bib\/bbac433","article-title":"Melanoma 2.0. Skin cancer as a paradigm for emerging diagnostic technologies, computational modelling and artificial intelligence","volume":"23","author":"Vera","year":"2022","journal-title":"Brief Bioinform"},{"key":"2025090120014466600_btaf444-B33","doi-asserted-by":"crossref","first-page":"970","DOI":"10.1038\/s41586-024-07894-z","article-title":"A pathology foundation model for cancer diagnosis and prognosis prediction","volume":"634","author":"Wang","year":"2024","journal-title":"Nature"},{"key":"2025090120014466600_btaf444-B34","doi-asserted-by":"crossref","first-page":"1408843","DOI":"10.3389\/frai.2024.1408843","article-title":"Multimodal data integration for oncology in the era of deep neural networks: a review","volume":"7","author":"Waqas","year":"2024","journal-title":"Front Artif Intell"},{"key":"2025090120014466600_btaf444-B35","first-page":"5075","author":"Wehrmann","year":"2018"},{"key":"2025090120014466600_btaf444-B36","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-319-24277-4","volume-title":"ggplot2: Elegant Graphics for Data Analysis","author":"Wickham","year":"2016","edition":"2nd ed"},{"key":"2025090120014466600_btaf444-B37","doi-asserted-by":"crossref","first-page":"181","DOI":"10.1038\/s41586-024-07441-w","article-title":"A whole-slide foundation model for digital pathology from real-world data","volume":"630","author":"Xu","year":"2024","journal-title":"Nature"},{"key":"2025090120014466600_btaf444-B38","doi-asserted-by":"crossref","first-page":"572","DOI":"10.1093\/bib\/bby026","article-title":"Molecular subtyping of cancer: current status and moving toward clinical applications","volume":"20","author":"Zhao","year":"2019","journal-title":"Brief Bioinform"},{"key":"2025090120014466600_btaf444-B39","doi-asserted-by":"crossref","first-page":"103030","DOI":"10.1016\/j.ebiom.2020.103030","article-title":"CUP-AI-Dx: a tool for inferring cancer tissue of origin and molecular subtype using RNA gene-expression data and artificial intelligence","volume":"61","author":"Zhao","year":"2020","journal-title":"EBioMedicine"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaf444\/64037353\/btaf444.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/9\/btaf444\/64037353\/btaf444.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/9\/btaf444\/64037353\/btaf444.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,2]],"date-time":"2025-09-02T00:01:50Z","timestamp":1756771310000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btaf444\/8233693"}},"subtitle":[],"editor":[{"given":"Laura","family":"Cantini","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2025,8,13]]},"references-count":40,"journal-issue":{"issue":"9","published-print":{"date-parts":[[2025,9,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaf444","relation":{},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2025,9]]},"published":{"date-parts":[[2025,8,13]]},"article-number":"btaf444"}}