{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,29]],"date-time":"2025-10-29T13:37:19Z","timestamp":1761745039816,"version":"build-2065373602"},"reference-count":34,"publisher":"MDPI AG","issue":"11","license":[{"start":{"date-parts":[[2025,10,29]],"date-time":"2025-10-29T00:00:00Z","timestamp":1761696000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Future Internet"],"abstract":"<jats:p>Lung cancer (LC) remains a leading cause of cancer mortality worldwide, where accurate and early identification of gene mutations such as epidermal growth factor receptor (EGFR) is critical for precision treatment. However, machine learning-based radiomics approaches often face challenges due to the small and imbalanced nature of the datasets. This study proposes a comprehensive framework based on Generic Sparse Regularized Autoencoders with Kullback\u2013Leibler divergence (GSRA-KL) to generate high-quality synthetic radiomics data and overcome these limitations. A systematic approach generated 63 synthetic radiomics datasets by tuning a novel kl_weight regularization hyperparameter across three hidden-layer sizes, optimized using Optuna for computational efficiency. A rigorous assessment was conducted to evaluate the impact of hyperparameter tuning across 63 synthetic datasets, with a focus on the EGFR gene mutation. This evaluation utilized resemblance-dimension scores (RDS), novel utility-dimension scores (UDS), and t-SNE visualizations to ensure the validation of data quality, revealing that GSRA-KL achieves excellent performance (RDS &gt; 0.45, UDS &gt; 0.7), especially when class distribution is balanced, while remaining competitive with the Tabular Variational Autoencoder (TVAE). 
Additionally, a comprehensive statistical correlation analysis demonstrated strong and significant monotonic relationships among resemblance-based performance metrics up to moderate scaling (\u22641.0*), confirming the robustness and stability of inter-metric associations under varying configurations. Complementary computational cost evaluation further indicated that moderate kl_weight values yield an optimal balance between reconstruction accuracy and resource utilization, with Spearman correlations revealing improved reconstruction quality (MSE \u03c1=\u22120.78, p&lt;0.001) at reduced computational overhead. The ablation-style analysis confirmed that including the KL divergence term meaningfully enhances the generative capacity of GSRA-KL over its baseline counterpart. Furthermore, the GSRA-KL framework achieved substantial improvements in computational efficiency compared to prior PSO-based optimization methods, resulting in reduced memory usage and training time. Overall, GSRA-KL represents an incremental yet practical advancement for augmenting small and imbalanced high-dimensional radiomics datasets, showing promise for improved mutation prediction and downstream precision oncology studies.<\/jats:p>","DOI":"10.3390\/fi17110495","type":"journal-article","created":{"date-parts":[[2025,10,29]],"date-time":"2025-10-29T09:16:24Z","timestamp":1761729384000},"page":"495","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Sparse Regularized Autoencoders-Based Radiomics Data Augmentation for Improved EGFR Mutation Prediction in NSCLC"],"prefix":"10.3390","volume":"17","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2899-8223","authenticated-orcid":false,"given":"Muhammad Asif","family":"Munir","sequence":"first","affiliation":[{"name":"Department of Electrical Engineering, The Islamia University of Bahawalpur, Bahawalpur 63100, Pakistan"},{"name":"Department of Electrical Engineering, 
Swedish College of Engineering and Technology, Shahbazpur Road, Rahim Yar Khan 64200, Pakistan"}]},{"given":"Reehan Ali","family":"Shah","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Shaheed Benazir Bhutto University, SBA (SBBU-SBA), Nawabshah 67450, Pakistan"},{"name":"Department of Computer Systems Engineering, The Islamia University of Bahawalpur, Bahawalpur 63100, Pakistan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2779-1642","authenticated-orcid":false,"given":"Urooj","family":"Waheed","sequence":"additional","affiliation":[{"name":"Department of Computer Science, DHA Suffa University, Karachi 75500, Pakistan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7819-7961","authenticated-orcid":false,"given":"Muhammad Aqeel","family":"Aslam","sequence":"additional","affiliation":[{"name":"Department of Electrical Engineering, GIFT University, Gujranwala 52250, Pakistan"}]},{"given":"Zeeshan","family":"Rashid","sequence":"additional","affiliation":[{"name":"Department of Electrical Engineering, The Islamia University of Bahawalpur, Bahawalpur 63100, Pakistan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9064-9596","authenticated-orcid":false,"given":"Mohammed","family":"Aman","sequence":"additional","affiliation":[{"name":"Department of Industrial Engineering, College of Engineering, University of Business and Technology, Jeddah 21361, Saudi Arabia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6538-2984","authenticated-orcid":false,"given":"Muhammad I.","family":"Masud","sequence":"additional","affiliation":[{"name":"Department of Electrical Engineering, College of Engineering, University of Business and Technology, Jeddah 21361, Saudi Arabia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7359-2743","authenticated-orcid":false,"given":"Zeeshan Ahmad","family":"Arfeen","sequence":"additional","affiliation":[{"name":"Department of Electrical Engineering, The Islamia University of Bahawalpur, Bahawalpur 63100, 
Pakistan"}]}],"member":"1968","published-online":{"date-parts":[[2025,10,29]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"e065303","DOI":"10.1136\/bmjopen-2022-065303","article-title":"Lung cancer mortality in the wake of the changing smoking epidemic: A descriptive study of the global burden in 2020 and 2040","volume":"13","author":"Morgan","year":"2023","journal-title":"BMJ Open"},{"key":"ref_2","first-page":"209","article-title":"Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries","volume":"71","author":"Sung","year":"2021","journal-title":"CA Cancer J. Clin."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"958","DOI":"10.1056\/NEJMoa0904554","article-title":"Screening for epidermal growth factor receptor mutations in lung cancer","volume":"361","author":"Rosell","year":"2009","journal-title":"N. Engl. J. Med."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"2380","DOI":"10.1056\/NEJMoa0909530","article-title":"Gefitinib or chemotherapy for non\u2013small-cell lung cancer with mutated EGFR","volume":"362","author":"Maemondo","year":"2010","journal-title":"N. Engl. J. Med."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"41","DOI":"10.1056\/NEJMoa1913662","article-title":"Overall survival with osimertinib in untreated, EGFR-mutated advanced NSCLC","volume":"382","author":"Ramalingam","year":"2020","journal-title":"N. Engl. J. Med."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"446","DOI":"10.1038\/nature25183","article-title":"The biology and management of non-small cell lung cancer","volume":"553","author":"Herbst","year":"2018","journal-title":"Nature"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1186\/s12931-020-01608-5","article-title":"Mutation profile of non-small cell lung cancer revealed by next generation sequencing","volume":"22","author":"Chang","year":"2021","journal-title":"Respir. 
Res."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"563","DOI":"10.1148\/radiol.2015151169","article-title":"Radiomics: Images are more than pictures, they are data","volume":"278","author":"Gillies","year":"2016","journal-title":"Radiology"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"749","DOI":"10.1038\/nrclinonc.2017.141","article-title":"Radiomics: The bridge between medical imaging and personalized medicine","volume":"14","author":"Lambin","year":"2017","journal-title":"Nat. Rev. Clin. Oncol."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"4006","DOI":"10.1038\/ncomms5006","article-title":"Decoding tumour phenotype by noninvasive imaging using a quantitative radiomics approach","volume":"5","author":"Aerts","year":"2014","journal-title":"Nat. Commun."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Parmar, C., Grossmann, P., Rietveld, D., Rietbergen, M.M., Lambin, P., and Aerts, H.J.W.L. (2015). Radiomic machine-learning classifiers for prognostic biomarkers of head and neck cancer. Front. Oncol., 5.","DOI":"10.3389\/fonc.2015.00272"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"995","DOI":"10.1016\/j.csbj.2019.07.001","article-title":"Radiomics and artificial intelligence for biomarker and prediction model development in oncology","volume":"17","author":"Forghani","year":"2019","journal-title":"Comput. Struct. Biotechnol. J."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"257","DOI":"10.1080\/14737140.2021.1852935","article-title":"Radiomics features as predictive and prognostic biomarkers in NSCLC","volume":"21","author":"Bortolotto","year":"2021","journal-title":"Expert Rev. 
Anticancer Ther."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"28","DOI":"10.1016\/j.lungcan.2019.03.025","article-title":"Radiomics signature: A potential and incremental predictor for EGFR mutation status in NSCLC patients, comparison with CT morphology","volume":"132","author":"Tu","year":"2019","journal-title":"Lung Cancer"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"691","DOI":"10.1166\/jbn.2021.3056","article-title":"Radiomics analysis to enhance precise identification of epidermal growth factor receptor mutation based on positron emission tomography images of lung cancer patients","volume":"17","author":"Li","year":"2021","journal-title":"J. Biomed. Nanotechnol."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"154","DOI":"10.3390\/tomography7020014","article-title":"A radiogenomics ensemble to predict EGFR and KRAS mutations in NSCLC","volume":"7","author":"Moreno","year":"2021","journal-title":"Tomography"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"7324","DOI":"10.1109\/TNNLS.2022.3190671","article-title":"GMILT: A novel transformer network that can noninvasively predict EGFR mutation status","volume":"35","author":"Zhao","year":"2022","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"10851","DOI":"10.2147\/CMAR.S232473","article-title":"Application of radiomics for personalized treatment of cancer patients","volume":"11","author":"Meng","year":"2019","journal-title":"Cancer Manag. Res."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Bidzi\u0144ska, J., and Szurowska, E. (2023). See Lung Cancer with an AI. Cancers, 15.","DOI":"10.3390\/cancers15041321"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Wu, Y., Wu, F., Yang, S., Tang, E., and Liang, C. (2022). Radiomics in early lung cancer diagnosis: From diagnosis to clinical decision support and education. 
Diagnostics, 12.","DOI":"10.3390\/diagnostics12051064"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"e19","DOI":"10.1055\/s-0042-1760247","article-title":"Synthetic tabular data evaluation in the health domain covering resemblance, utility, and privacy dimensions","volume":"62","author":"Hernandez","year":"2023","journal-title":"Methods Inf. Med."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"2892","DOI":"10.1016\/j.csbj.2024.07.005","article-title":"Synthetic data generation methods in healthcare: A review on open-source tools and methods","volume":"23","author":"Pezoulas","year":"2024","journal-title":"Comput. Struct. Biotechnol. J."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"112223","DOI":"10.1016\/j.asoc.2024.112223","article-title":"Challenges and opportunities of generative models on tabular data","volume":"166","author":"Wang","year":"2024","journal-title":"Appl. Soft Comput."},{"key":"ref_24","first-page":"7335","article-title":"Modeling tabular data using conditional gan","volume":"32","author":"Xu","year":"2019","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"7499","DOI":"10.1109\/TNNLS.2022.3229161","article-title":"Deep Neural Networks and Tabular Data: A Survey","volume":"35","author":"Borisov","year":"2022","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"7407","DOI":"10.1109\/ACCESS.2024.3523330","article-title":"Enhancing Gene Mutation Prediction With Sparse Regularized Autoencoders in Lung Cancer Radiomics Analysis","volume":"13","author":"Munir","year":"2024","journal-title":"IEEE Access"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Akiba, T., Sano, S., Yanase, T., Ohta, T., and Koyama, M. (2019, January 4\u20138). Optuna: A next-generation hyperparameter optimization framework. 
Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA.","DOI":"10.1145\/3292500.3330701"},{"key":"ref_28","first-page":"1","article-title":"Statistical comparisons of classifiers over multiple data sets","volume":"7","year":"2006","journal-title":"J. Mach. Learn. Res."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"2044","DOI":"10.1016\/j.ins.2009.12.010","article-title":"Advanced nonparametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: Experimental analysis of power","volume":"180","author":"Luengo","year":"2010","journal-title":"Inf. Sci."},{"key":"ref_30","first-page":"2234","article-title":"Improved techniques for training GANs","volume":"29","author":"Salimans","year":"2016","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_31","unstructured":"Arjovsky, M., and Bottou, L. (2017). Towards principled methods for training generative adversarial networks. arXiv."},{"key":"ref_32","first-page":"5769","article-title":"Improved training of Wasserstein GANs","volume":"30","author":"Gulrajani","year":"2017","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_33","first-page":"2951","article-title":"Practical Bayesian optimization of machine learning algorithms","volume":"25","author":"Snoek","year":"2012","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_34","unstructured":"Bergstra, J., Yamins, D., and Cox, D.D. (2013, January 16\u201321). Making a science of model search: Hyperparameter optimization in hundreds of dimensions for vision architectures. 
Proceedings of the 30th International Conference on Machine Learning, Atlanta, GA, USA."}],"container-title":["Future Internet"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-5903\/17\/11\/495\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,29]],"date-time":"2025-10-29T09:25:40Z","timestamp":1761729940000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-5903\/17\/11\/495"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,10,29]]},"references-count":34,"journal-issue":{"issue":"11","published-online":{"date-parts":[[2025,11]]}},"alternative-id":["fi17110495"],"URL":"https:\/\/doi.org\/10.3390\/fi17110495","relation":{},"ISSN":["1999-5903"],"issn-type":[{"value":"1999-5903","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,10,29]]}}}