{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,11]],"date-time":"2026-02-11T13:29:49Z","timestamp":1770816589997,"version":"3.50.1"},"reference-count":25,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2024,7,15]],"date-time":"2024-07-15T00:00:00Z","timestamp":1721001600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Bioinform."],"abstract":"<jats:p>The identification of cancer subtypes plays a very important role in the field of medicine. Accurate identification of cancer subtypes is helpful for both cancer treatment and prognosis. Currently, most methods for cancer subtype identification are based on single-omics data, such as gene expression data. However, multi-omics data can show various characteristics about cancer, which also can improve the accuracy of cancer subtype identification. Therefore, how to extract features from multi-omics data for cancer subtype identification is the main challenge currently faced by researchers. In this paper, we propose a cancer subtype identification method named CAEM-GBDT, which takes gene expression data, miRNA expression data, and DNA methylation data as input, and adopts a convolutional autoencoder network to identify cancer subtypes. Through a convolutional encoder layer, the method performs feature extraction on the input data. Within the convolutional encoder layer, a convolutional self-attention module is embedded to recognize higher-level representations of the multi-omics data. The extracted high-level representations from the convolutional encoder are then concatenated with the input to the decoder. The GBDT (Gradient Boosting Decision Tree) is utilized for cancer subtype identification. 
In the experiments, we compare CAEM-GBDT with existing cancer subtype identifying methods. Experimental results demonstrate that the proposed CAEM-GBDT outperforms other methods. The source code is available from GitHub at <jats:ext-link>https:\/\/github.com\/gxh-1\/CAEM-GBDT.git<\/jats:ext-link>.<\/jats:p>","DOI":"10.3389\/fbinf.2024.1403826","type":"journal-article","created":{"date-parts":[[2024,7,15]],"date-time":"2024-07-15T04:26:32Z","timestamp":1721017592000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["CAEM-GBDT: a cancer subtype identifying method using multi-omics data and convolutional autoencoder network"],"prefix":"10.3389","volume":"4","author":[{"given":"Jiquan","family":"Shen","sequence":"first","affiliation":[]},{"given":"Xuanhui","family":"Guo","sequence":"additional","affiliation":[]},{"given":"Hanwen","family":"Bai","sequence":"additional","affiliation":[]},{"given":"Junwei","family":"Luo","sequence":"additional","affiliation":[]}],"member":"1965","published-online":{"date-parts":[[2024,7,15]]},"reference":[{"key":"B1","doi-asserted-by":"publisher","first-page":"295","DOI":"10.3390\/cancers14020295","article-title":"Assessment of the molecular heterogeneity of E-cadherin expression in invasive lobular breast cancer","volume":"14","author":"Alexander","year":"2022","journal-title":"Cancers"},{"key":"B2","doi-asserted-by":"publisher","first-page":"822","DOI":"10.2174\/0929867328666210917115733","article-title":"Research progress in predicting DNA methylation modifications and the relation with human diseases","volume":"29","author":"Ao","year":"2022","journal-title":"Curr. Med. 
Chem."},{"key":"B3","doi-asserted-by":"publisher","first-page":"61","DOI":"10.1038\/nature11412","article-title":"Comprehensive molecular portraits of human breast tumours","volume":"490","author":"Brigham","year":"2012","journal-title":"Nature"},{"key":"B4","doi-asserted-by":"publisher","first-page":"169","DOI":"10.1186\/s12859-023-05273-5","article-title":"moBRCA-net: a breast cancer subtype classification framework based on multi-omics attention neural networks","volume":"24","author":"Choi","year":"2023","journal-title":"BMC Bioinforma."},{"key":"B5","doi-asserted-by":"publisher","first-page":"65","DOI":"10.3390\/genes13010065","article-title":"Identifying cancer subtypes using a residual graph convolution model on a sample similarity network","volume":"13","author":"Dai","year":"2021","journal-title":"Genes."},{"key":"B6","doi-asserted-by":"publisher","first-page":"1574","DOI":"10.3390\/math9131574","article-title":"A cascade deep forest model for breast cancer subtype classification using multi-omics data","volume":"9","author":"El-Nabawy","year":"2021","journal-title":"Mathematics"},{"key":"B7","doi-asserted-by":"publisher","first-page":"171","DOI":"10.1007\/s12539-023-00550-6","article-title":"MSResG: using GAE and residual GCN to predict drug\u2013drug interactions based on multi-source drug features","volume":"15","author":"Guo","year":"2023","journal-title":"Interdiscip. Sci. Comput. 
Life Sci."},{"key":"B8","doi-asserted-by":"publisher","first-page":"519","DOI":"10.1038\/nature11404","article-title":"Comprehensive genomic characterization of squamous cell lung cancers","volume":"489","author":"Hammerman","year":"2012","journal-title":"Nature"},{"key":"B9","doi-asserted-by":"publisher","first-page":"bbab454","DOI":"10.1093\/bib\/bbab454","article-title":"A roadmap for multi-omics data integration using deep learning","volume":"23","author":"Kang","year":"2022","journal-title":"Briefings Bioinforma."},{"key":"B10","doi-asserted-by":"publisher","first-page":"1056605","DOI":"10.3389\/fphar.2022.1056605","article-title":"Drug repositioning based on heterogeneous networks and variational graph autoencoders","volume":"13","author":"Lei","year":"2022","journal-title":"Front. Pharmacol."},{"key":"B11","doi-asserted-by":"publisher","first-page":"1095","DOI":"10.1016\/j.ccell.2022.09.012","article-title":"Artificial intelligence for multimodal data integration in oncology","volume":"40","author":"Lipkova","year":"2022","journal-title":"Cancer Cell."},{"key":"B12","doi-asserted-by":"publisher","first-page":"71","DOI":"10.1093\/biostatistics\/kxx017","article-title":"A fully Bayesian latent variable model for integrative clustering analysis of multi-type omics data","volume":"19","author":"Mo","year":"2018","journal-title":"Biostatistics"},{"key":"B13","doi-asserted-by":"publisher","first-page":"330","DOI":"10.1038\/nature11252","article-title":"Comprehensive molecular characterization of human colon and rectal cancer","volume":"487","author":"Muzny","year":"2012","journal-title":"Nature"},{"key":"B14","first-page":"1","article-title":"Convolutional autoencoder application for breast cancer classification","volume-title":"2020 IEEE 2nd international conference on system analysis and intelligent computing 
(SAIC)","author":"Naderan","year":"2020"},{"key":"B15","doi-asserted-by":"publisher","first-page":"430","DOI":"10.1186\/s12859-022-04980-9","article-title":"Deep learning approach for cancer subtype classification using high-dimensional gene expression data","volume":"23","author":"Shen","year":"2022","journal-title":"BMC Bioinforma."},{"key":"B16","doi-asserted-by":"publisher","first-page":"bbab398","DOI":"10.1093\/bib\/bbab398","article-title":"Subtype-WESLR: identifying cancer subtype with weighted ensemble sparse latent representation of multi-view data","volume":"23","author":"Song","year":"2022","journal-title":"Briefings Bioinforma."},{"key":"B17","doi-asserted-by":"publisher","first-page":"1032768","DOI":"10.3389\/fgene.2022.1032768","article-title":"SADLN: self-attention based deep learning network of integrating multi-omics data for cancer subtype recognition","volume":"13","author":"Sun","year":"2023","journal-title":"Front. Genet."},{"key":"B18","doi-asserted-by":"publisher","first-page":"333","DOI":"10.1038\/nmeth.2810","article-title":"Similarity network fusion for aggregating data types on a genomic scale","volume":"11","author":"Wang","year":"2014","journal-title":"Nat. methods"},{"key":"B19","doi-asserted-by":"publisher","first-page":"5340","DOI":"10.2174\/0929867325666181101115801","article-title":"A review of drug repositioning based chemical-induced cell line expression data","volume":"27","author":"Wang","year":"2020","journal-title":"Curr. Med. 
Chem."},{"key":"B20","doi-asserted-by":"publisher","first-page":"1022","DOI":"10.1186\/s12864-015-2223-8","article-title":"Fast dimension reduction and integrative clustering of multi-omics data using low-rank approximation: application to cancer molecular classification","volume":"16","author":"Wu","year":"2015","journal-title":"BMC genomics"},{"key":"B21","doi-asserted-by":"publisher","first-page":"527","DOI":"10.1186\/s12859-019-3116-7","article-title":"A hierarchical integration deep flexible neural forest framework for cancer subtype classification by integrating multi-omics data","volume":"20","author":"Xu","year":"2019","journal-title":"BMC Bioinforma."},{"key":"B22","doi-asserted-by":"publisher","first-page":"2231","DOI":"10.1093\/bioinformatics\/btab109","article-title":"Subtype-GAN: a deep learning approach for integrative cancer subtyping of multi-omics data","volume":"37","author":"Yang","year":"2021","journal-title":"Bioinformatics"},{"key":"B23","doi-asserted-by":"publisher","first-page":"872785","DOI":"10.3389\/fphar.2022.872785","article-title":"Drug repositioning with GraphSAGE and clustering constraints based on drug and disease networks","volume":"13","author":"Zhang","year":"2022","journal-title":"Front. Pharmacol."},{"key":"B24","doi-asserted-by":"publisher","first-page":"bbad025","DOI":"10.1093\/bib\/bbad025","article-title":"Subtype-DCC: decoupled contrastive clustering method for cancer subtype identification based on multi-omics data","volume":"24","author":"Zhao","year":"2023","journal-title":"Briefings Bioinforma."},{"key":"B25","doi-asserted-by":"publisher","first-page":"74","DOI":"10.1093\/nsr\/nwy108","article-title":"Deep forest","volume":"6","author":"Zhou","year":"2019","journal-title":"Natl. Sci. 
Rev."}],"container-title":["Frontiers in Bioinformatics"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fbinf.2024.1403826\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,7,15]],"date-time":"2024-07-15T04:26:35Z","timestamp":1721017595000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fbinf.2024.1403826\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,7,15]]},"references-count":25,"alternative-id":["10.3389\/fbinf.2024.1403826"],"URL":"https:\/\/doi.org\/10.3389\/fbinf.2024.1403826","relation":{},"ISSN":["2673-7647"],"issn-type":[{"value":"2673-7647","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,7,15]]},"article-number":"1403826"}}