{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,3]],"date-time":"2026-03-03T15:45:59Z","timestamp":1772552759895,"version":"3.50.1"},"reference-count":48,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2025,2,18]],"date-time":"2025-02-18T00:00:00Z","timestamp":1739836800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Trond Mohn Foundation","award":["TMS2021TMT09"],"award-info":[{"award-number":["TMS2021TMT09"]}]},{"name":"Trond Mohn Foundation","award":["TMS2020TMT11"],"award-info":[{"award-number":["TMS2020TMT11"]}]},{"name":"Centre for Antimicrobial Resistance in Western Norway (CAMRIA)","award":["TMS2021TMT09"],"award-info":[{"award-number":["TMS2021TMT09"]}]},{"name":"Centre for Antimicrobial Resistance in Western Norway (CAMRIA)","award":["TMS2020TMT11"],"award-info":[{"award-number":["TMS2020TMT11"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Data"],"abstract":"<jats:p>High-dimensional survival data, such as microarray datasets, present significant challenges in variable selection and model performance due to their complexity and dimensionality. Identifying important genes and understanding how these genes influence the survival of patients with cancer are of great interest and a major challenge to biomedical scientists, healthcare practitioners, and oncologists. Therefore, this study combined the strengths of two complementary feature selection methodologies: a filtering (correlation-based) approach and a wrapper method based on Iterative Bayesian Model Averaging (IBMA). This new approach, termed Correlation-Based IBMA, offers a highly efficient and effective means of selecting the most important and influential genes for predicting the survival of patients with cancer. The efficiency and consistency of the method were demonstrated using diffuse large B-cell lymphoma cancer data. The results revealed that the 15 most important genes out of 3835 gene features were consistently selected at a threshold p-value of 0.001, with genes with posterior probabilities below 1% being removed. The influence of these 15 genes on patient survival was assessed using the Cox Proportional Hazards (Cox-PH) Model. The results further revealed that eight genes were highly associated with patient survival at a 0.05 level of significance. Finally, these findings underscore the importance of integrating feature selection with robust modeling approaches to enhance accuracy and interpretability in high-dimensional survival data analysis.<\/jats:p>","DOI":"10.3390\/data10020026","type":"journal-article","created":{"date-parts":[[2025,2,18]],"date-time":"2025-02-18T06:41:12Z","timestamp":1739860872000},"page":"26","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Consistency and Stability in Feature Selection for High-Dimensional Microarray Survival Data in Diffuse Large B-Cell Lymphoma Cancer"],"prefix":"10.3390","volume":"10","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4392-8592","authenticated-orcid":false,"given":"Kazeem A.","family":"Dauda","sequence":"first","affiliation":[{"name":"Department of Mathematics, University of Bergen, 5007 Bergen, Norway"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-7922-9403","authenticated-orcid":false,"given":"Rasheed K.","family":"Lamidi","sequence":"additional","affiliation":[{"name":"Department of Mathematics and Statistics, Kwara State University, Malete, P.M.B. 1530, Ilorin 23431, Kwara State, Nigeria"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2025,2,18]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"132","DOI":"10.1007\/s12672-023-00754-8","article-title":"Evolving therapeutic landscape of diffuse large B-cell lymphoma: Challenges and aspirations","volume":"14","author":"Chan","year":"2023","journal-title":"Discov. Oncol."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/B978-0-12-800266-7.00004-2","article-title":"B-cell receptor signaling in lymphoid malignancies and autoimmunity","volume":"123","author":"Avalos","year":"2014","journal-title":"Adv. Immunol."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"3315","DOI":"10.1007\/s00277-024-05880-z","article-title":"Advances in biology, diagnosis and treatment of DLBCL","volume":"103","author":"Shi","year":"2024","journal-title":"Ann. Hematol."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"959","DOI":"10.1080\/17474086.2019.1660159","article-title":"Remaining challenges in predicting patient outcomes for diffuse large B-cell lymphoma","volume":"12","author":"Harkins","year":"2019","journal-title":"Expert Rev. Hematol."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"He, R., Oliveira, J.L., Hoyer, J.D., and Viswanatha, D.S. (2018). Molecular hematopathology. Hematopathology, Elsevier.","DOI":"10.1016\/B978-0-323-47913-4.00024-0"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Satam, H., Joshi, K., Mangrolia, U., Waghoo, S., Zaidi, G., Rawool, S., Thakare, R.P., Banday, S., Mishra, A.K., and Das, G. (2023). Next-generation sequencing technology: Current trends and advancements. Biology, 12.","DOI":"10.3390\/biology12070997"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Rahnenf\u00fchrer, J., De Bin, R., Benner, A., Ambrogi, F., Lusa, L., Boulesteix, A.L., Migliavacca, E., Binder, H., Michiels, S., and Sauerbrei, W. (2023). Statistical analysis of high-dimensional biomedical data: A gentle introduction to analytical goals, common approaches and challenges. BMC Med., 21.","DOI":"10.1186\/s12916-023-02858-y"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Dauda, K.A., Adeniyi, E.J., Lamidi, R.K., and Wahab, O.T. (2025). Exploring Flexible Penalization of Bayesian Survival Analysis Using Beta Process Prior for Baseline Hazard. Computation, 13.","DOI":"10.3390\/computation13020021"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"9191","DOI":"10.1007\/s13369-019-04064-6","article-title":"Hybrid filter\u2013wrapper feature selection method for sentiment classification","volume":"44","author":"Ansari","year":"2019","journal-title":"Arab. J. Sci. Eng."},{"key":"ref_10","unstructured":"Annest, A., and Yeung, W.K.Y. (2023). iterativeBMAsurv: The Iterative Bayesian Model Averaging (BMA) Algorithm for Survival Analysis, Bioconductor. R package version 1.60.0."},{"key":"ref_11","unstructured":"Raftery, A., Hoeting, J., Volinsky, C., Painter, I., and Yeung, K.Y. (2024). BMA: Bayesian Model Averaging, CRAN. R package version 3.18.19."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"382","DOI":"10.1214\/ss\/1009212519","article-title":"Bayesian Model Averaging: A Tutorial (with Comments by M. Clyde, D. Draper, and E.I. George, and a Rejoinder by the Authors)","volume":"14","author":"Hoeting","year":"1999","journal-title":"Stat. Sci."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Atmakuru, A., Di Fatta, G., Nicosia, G., and Badii, A. (2023, January 22\u201326). Improved Filter-Based Feature Selection Using Correlation and Clustering Techniques. Proceedings of the International Conference on Machine Learning, Optimization, and Data Science, Grasmere, UK.","DOI":"10.1007\/978-3-031-53969-5_28"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"139","DOI":"10.1007\/s10142-024-01415-x","article-title":"A review on advancements in feature selection and feature extraction for high-dimensional NGS data analysis","volume":"24","author":"Borah","year":"2024","journal-title":"Funct. Integr. Genom."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"1728","DOI":"10.1007\/s10278-024-01049-2","article-title":"From Pixels to Prognosis: A Survey on AI-Driven Cancer Patient Survival Prediction Using Digital Histology Images","volume":"37","author":"Parvaiz","year":"2024","journal-title":"J. Imaging Inform. Med."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"1105","DOI":"10.1007\/s12652-019-01364-5","article-title":"A novel hybrid wrapper\u2013filter approach based on genetic algorithm particle swarm optimization for feature subset selection","volume":"11","author":"Moslehi","year":"2020","journal-title":"J. Ambient Intell. Humaniz. Comput."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"15091","DOI":"10.1007\/s00521-021-06406-8","article-title":"A systematic review of emerging feature selection optimization methods for optimal text classification: The present state and prospective opportunities","volume":"33","author":"Abiodun","year":"2021","journal-title":"Neural Comput. Appl."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"1950020","DOI":"10.1142\/S1469026819500202","article-title":"A New Hybrid Feature Subset Selection Framework Based on Binary Genetic Algorithm and Information Theory","volume":"18","author":"Shukla","year":"2019","journal-title":"Int. J. Comput. Intell. Appl."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"7839","DOI":"10.1007\/s00521-019-04171-3","article-title":"A wrapper-filter feature selection technique based on ant colony optimization","volume":"32","author":"Ghosh","year":"2020","journal-title":"Neural Comput. Appl."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"17","DOI":"10.1016\/j.engappai.2014.12.014","article-title":"Hybrid filter\u2013wrapper feature selection for short-term load forecasting","volume":"40","author":"Hu","year":"2015","journal-title":"Eng. Appl. Artif. Intell."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"43","DOI":"10.1016\/j.aca.2019.06.054","article-title":"A new hybrid filter\/wrapper algorithm for feature selection in classification","volume":"1080","author":"Zhang","year":"2019","journal-title":"Anal. Chim. Acta"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"193","DOI":"10.1007\/s12293-018-0269-2","article-title":"A multi-objective hybrid filter-wrapper evolutionary approach for feature selection","volume":"11","author":"Hammami","year":"2019","journal-title":"Memetic Comput."},{"key":"ref_23","first-page":"e00778","article-title":"A novel hybrid dimension reduction technique for efficient selection of bio-marker genes and prediction of heart failure status of patients","volume":"12","author":"Dauda","year":"2021","journal-title":"Sci. Afr."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"5233","DOI":"10.1007\/s00500-018-3545-7","article-title":"Differential evolution for feature selection: A fuzzy wrapper\u2013filter approach","volume":"23","author":"Hancer","year":"2019","journal-title":"Soft Comput."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"3452","DOI":"10.1200\/JCO.2011.41.0985","article-title":"Concurrent Expression of MYC and BCL2 in Diffuse Large B-Cell Lymphoma Treated With Rituximab Plus Cyclophosphamide, Doxorubicin, Vincristine, and Prednisone","volume":"30","author":"Johnson","year":"2012","journal-title":"J. Clin. Oncol."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Dauda, K.A., Olorede, K.O., Banjoko, A.W., Yahya, W.B., and Ayipo, Y.O. (2024). Genetic Diagnosis, Classification, and Risk Prediction in Cancer Using Next-Generation Sequencing in Oncology. Computational Approaches in Biomaterials and Biomedical Engineering Applications, CRC Press.","DOI":"10.1201\/9781032699882-5"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"37","DOI":"10.1517\/17530059.2012.718329","article-title":"Challenges in biomarker discovery: Combining expert insights with statistical analysis of complex omics data","volume":"7","author":"McDermott","year":"2013","journal-title":"Expert Opin. Med Diagn."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Banegas-Luna, A.J., Pe\u00f1a-Garc\u00eda, J., Iftene, A., Guadagni, F., Ferroni, P., Scarpato, N., Zanzotto, F.M., Bueno-Crespo, A., and P\u00e9rez-S\u00e1nchez, H. (2021). Towards the interpretability of machine learning predictions for medical applications targeting personalised therapies: A cancer case survey. Int. J. Mol. Sci., 22.","DOI":"10.3390\/ijms22094394"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Xu, X., Qi, Z., Han, X., Wang, Y., Yu, M., and Geng, Z. (2024). Combined-task deep network based on LassoNet feature selection for predicting the comorbidities of acute coronary syndrome. Comput. Biol. Med., 170.","DOI":"10.1016\/j.compbiomed.2024.107992"},{"key":"ref_30","first-page":"25","article-title":"Survival analysis with multivariate adaptive regression splines using Cox-Snell residual","volume":"13","author":"Dauda","year":"2015","journal-title":"Ann. Comput. Sci. Ser."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"701","DOI":"10.1214\/aos\/1176344247","article-title":"Nonparametric Inference for a Family of Counting Processes","volume":"6","author":"Aalen","year":"1978","journal-title":"Ann. Stat."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"187","DOI":"10.1111\/j.2517-6161.1972.tb00899.x","article-title":"Regression models and life-tables","volume":"34","author":"Cox","year":"1972","journal-title":"J. R. Stat. Soc. Ser. B"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"697","DOI":"10.1016\/j.bbe.2019.05.001","article-title":"Decision tree for modeling survival data with competing risks","volume":"39","author":"Dauda","year":"2019","journal-title":"Biocybern. Biomed. Eng."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Dudoit, S., and Fridlyand, J. (2002). A prediction-based resampling method for estimating the number of clusters in a dataset. Genome Biol., 3.","DOI":"10.1186\/gb-2002-3-7-research0036"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Annest, A., Bumgarner, R.E., Raftery, A.E., and Yeung, K.Y. (2009). Iterative bayesian model averaging: A method for the application of survival analysis to high-dimensional microarray data. BMC Bioinform., 10.","DOI":"10.1186\/1471-2105-10-72"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"67","DOI":"10.21315\/mjms2022.29.6.7","article-title":"Optimal tuning of random survival forest hyperparameter with an application to liver disease","volume":"29","author":"Dauda","year":"2022","journal-title":"Malays. J. Med. Sci. MJMS"},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"96831","DOI":"10.1109\/ACCESS.2023.3312310","article-title":"Exploiting censored information in self-training for time-to-event prediction","volume":"11","author":"Haredasht","year":"2023","journal-title":"IEEE Access"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Klein, J.P., and Moeschberger, M.L. (2003). Survival Analysis: Techniques for Censored and Truncated Data, Springer.","DOI":"10.1007\/b97377"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Hosmer, D.W., Lemeshow, S., and May, S. (2008). Applied Survival Analysis: Regression Modeling of Time-to-Event Data, Wiley.","DOI":"10.1002\/9780470258019"},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"171","DOI":"10.1093\/bioinformatics\/bth469","article-title":"Outcome signature genes in breast cancer: Is there a unique set?","volume":"21","author":"Kela","year":"2005","journal-title":"Bioinformatics"},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"179","DOI":"10.1080\/01621459.1997.10473615","article-title":"Bayesian model averaging for linear regression models","volume":"92","author":"Raftery","year":"1997","journal-title":"J. Am. Stat. Assoc."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"361","DOI":"10.1002\/(SICI)1097-0258(19960229)15:4<361::AID-SIM168>3.0.CO;2-4","article-title":"Multivariable prognostic models: Issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors","volume":"15","author":"Harrell","year":"1996","journal-title":"Stat. Med."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Kalbfleisch, J.D., and Prentice, R.L. (2002). The Statistical Analysis of Failure Time Data, John Wiley & Sons.","DOI":"10.1002\/9781118032985"},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"583","DOI":"10.1016\/B978-0-444-51862-0.50018-6","article-title":"AIC, BIC and recent advances in model selection","volume":"7","author":"Chakrabarti","year":"2011","journal-title":"Philos. Stat."},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"83","DOI":"10.1016\/j.zemedi.2023.06.003","article-title":"Application of multi-method-multi-model inference to radiation related solid cancer excess risks models for astronaut risk assessment","volume":"34","author":"Hafner","year":"2024","journal-title":"Z. F\u00fcr Med. Phys."},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Dauda, K.A., Lamidi, R.K., Dauda, A.A., and Yahya, W.B. (2023). A New Generalized Gamma-Weibull Distribution with Applications to Time-to-event Data. bioRxiv.","DOI":"10.1101\/2023.11.18.567670"},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"2313","DOI":"10.1056\/NEJMoa0802885","article-title":"Stromal gene signatures in large-B-cell lymphomas","volume":"359","author":"Lenz","year":"2008","journal-title":"N. Engl. J. Med."},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Balakrishnan, N., Colton, T., Everitt, B., Piegorsch, W., Ruggeri, F., and Teugels, J.L. (2014). Parametric Models in Survival Analysis. Wiley StatsRef: Statistics Reference Online, Wiley.","DOI":"10.1002\/9781118445112"}],"container-title":["Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2306-5729\/10\/2\/26\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T16:36:49Z","timestamp":1760027809000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2306-5729\/10\/2\/26"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,2,18]]},"references-count":48,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2025,2]]}},"alternative-id":["data10020026"],"URL":"https:\/\/doi.org\/10.3390\/data10020026","relation":{},"ISSN":["2306-5729"],"issn-type":[{"value":"2306-5729","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,2,18]]}}}