{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,14]],"date-time":"2026-01-14T23:41:17Z","timestamp":1768434077050,"version":"3.49.0"},"reference-count":34,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2025,2,18]],"date-time":"2025-02-18T00:00:00Z","timestamp":1739836800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Dunarea de Jos University of Galati","award":["825\/30.09.2024"],"award-info":[{"award-number":["825\/30.09.2024"]}]},{"name":"Dunarea de Jos University of Galati","award":["826\/30.09.2024"],"award-info":[{"award-number":["826\/30.09.2024"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["BDCC"],"abstract":"<jats:p>The topic of this study is the testing of the robustness of machine learning (ML) and neural network (NN) models with a new idea based on corrupted data. Typically, ML and NN classifiers are trained on real feature data; however, a portion of the features may be false, with noise, or incorrect. The undesired content was analyzed in eight experiments with false data, six with feature noise, and six with label noise. These tests were all conducted on the public Breast Cancer Wisconsin Dataset (BCWD). Throughout this, the false and noise data were gradually corrupted in a random way, generating new data and replacing raw features that belonged to the BCWD. Artificial Intelligence (AI) should be properly selected while categorizing different diseases using medical data. The Pearson correlation coefficient (PCC) applied between features monitored their correlation in each experiment, and a correlation matrix between both true and false features was used. 
Four machine learning (ML) algorithms were used for both the analysis of important features (IF) and the binary classification: Random Forest (RF), XGBClassifier (XGB), K-Nearest Neighbors (KNN), and Support Vector Machine (SVM). The study was completed using three deep neural networks: a simple Deep Neural Network (DNN), a Convolutional Neural Network (CNN), and a Transformer Neural Network (TNN). For the binary classification of malignant versus benign breast cancer (BC), the accuracy, F1-score, Area Under the Curve (AUC), and Matthews correlation coefficient (MCC) were computed as performance metrics. The results demonstrated the robustness of some methods and the sensitivity of other machine learning algorithms in the context of corrupted data, computational cost, and hyperparameter optimization.<\/jats:p>","DOI":"10.3390\/bdcc9020045","type":"journal-article","created":{"date-parts":[[2025,2,18]],"date-time":"2025-02-18T12:16:37Z","timestamp":1739880997000},"page":"45","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["Impact on Classification Process Generated by Corrupted Features"],"prefix":"10.3390","volume":"9","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5934-329X","authenticated-orcid":false,"given":"Simona","family":"Moldovanu","sequence":"first","affiliation":[{"name":"Department of Computer Science and Information Technology, Faculty of Automation, Computers, Electrical Engineering and Electronics, Dunarea de Jos University of Galati, 47 Domneasca Str., 800008 Galati, Romania"},{"name":"The Modelling & Simulation Laboratory, Dunarea de Jos University of Galati, 47 Domneasca Str., 800008 Galati, Romania"}]},{"given":"Dan","family":"Munteanu","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Information Technology, Faculty of Automation, Computers, Electrical Engineering and Electronics, 
Dunarea de Jos University of Galati, 47 Domneasca Str., 800008 Galati, Romania"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5812-6613","authenticated-orcid":false,"given":"Carmen","family":"S\u00eerbu","sequence":"additional","affiliation":[{"name":"Department of Administration, Dunarea de Jos University of Galati, 47 Domneasca Str., 800008 Galati, Romania"}]}],"member":"1968","published-online":{"date-parts":[[2025,2,18]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Pichugin, Y.A., Malafeyev, O.A., Rylow, D., and Zaitseva, I. (2018). A statistical method for corrupt agents detection. AIP Conference Proceedings, AIP Publishing.","DOI":"10.1063\/1.5043758"},{"key":"ref_2","unstructured":"Zhu, Z., Dong, Z., and Liu, Y. (2022, January 17\u201323). Detecting corrupted labels without training a model to predict. Proceedings of the 39th International Conference on Machine Learning, PMLR, Baltimore, MD, USA."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"e18910","DOI":"10.2196\/18910","article-title":"Reliability of Supervised Machine Learning Using Synthetic Data in Health Care: Model to Preserve Privacy for Data Sharing","volume":"8","author":"Rankin","year":"2020","journal-title":"JMIR Med. Inform."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"485","DOI":"10.1016\/j.ins.2021.12.018","article-title":"Differentially private synthetic medical data generation using convolutional GANs","volume":"586","author":"Torfi","year":"2022","journal-title":"Inf. Sci."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Goncalves, A., Ray, P., Soper, B., Stevens, J., Coyle, L., and Sales, A.P. (2020). Generation and evaluation of synthetic patient data. BMC Med. Res. 
Methodol., 20.","DOI":"10.1186\/s12874-020-00977-1"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"118994","DOI":"10.1016\/j.neuroimage.2022.118994","article-title":"Robust learning from corrupted EEG with dynamic spatial filtering","volume":"251","author":"Banville","year":"2022","journal-title":"NeuroImage"},{"key":"ref_7","unstructured":"Budach, L., Feuerpfeil, M., Ihde, N., Nathansen, A., Noack, N., Patzlaff, H., Harmouch, H., and Naumann, F. (2022). The Effects of Data Quality on Machine Learning Performance. arXiv."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"e17164","DOI":"10.1002\/aic.17164","article-title":"Machine learning modeling and predictive control of nonlinear processes using noisy data","volume":"67","author":"Wu","year":"2021","journal-title":"AIChE J."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"34","DOI":"10.1016\/j.cherd.2022.07.035","article-title":"Physics-informed machine learning modeling for predictive control using noisy data","volume":"186","author":"Alhajeri","year":"2022","journal-title":"Chem. Eng. Res. Des."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"1367","DOI":"10.1214\/22-EJS1987","article-title":"Binary classification with corrupted labels","volume":"16","author":"Lee","year":"2022","journal-title":"Electron. J. Stat."},{"key":"ref_11","unstructured":"Feldman, S., Einbinder, B.S., Bates, S., Angelopoulos, A.N., Gendler, A., and Romano, Y. (2023, January 13\u201315). Conformal prediction is robust to dispersive label noise. Proceedings of the Conformal and Probabilistic Prediction with Applications, Limassol, Cyprus."},{"key":"ref_12","first-page":"10456","article-title":"Using trusted data to train deep networks on labels corrupted by severe noise","volume":"31","author":"Hendrycks","year":"2018","journal-title":"Adv. Neural Inf. Process. 
Syst."},{"key":"ref_13","first-page":"4864","article-title":"Comparison of breast cancer classification models on Wisconsin dataset","volume":"2089","author":"Kadhim","year":"2022","journal-title":"Int. J. Reconfigurable Embed. Syst."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Mohammad, W., Teete, R., Al-Aaraj, H., Rubbai, Y., and Arabyat, M. (2022). Diagnosis of breast cancer pathology on the Wisconsin dataset with the help of data mining classification and clustering techniques. Appl. Bionics Biomech., 9.","DOI":"10.1155\/2022\/6187275"},{"key":"ref_15","first-page":"67","article-title":"An evaluation of the Wisconsin breast cancer dataset using ensemble classifiers and RFE feature selection","volume":"55","author":"Abdulkareem","year":"2021","journal-title":"Int. J. Sci. Basic Appl. Res."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"El-Shair, Z.A., S\u00e1nchez-P\u00e9rez, L.A., and Rawashdeh, S.A. (August, January 31). Comparative Study of Machine Learning Algorithms Using a Breast Cancer Dataset. Proceedings of the 2020 IEEE International Conference on Electro Information Technology (EIT), Chicago, IL, USA.","DOI":"10.1109\/EIT48999.2020.9208315"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"15","DOI":"10.62527\/ijasce.5.1.105","article-title":"Comparative Study of Machine Learning Models on Multiple Breast Cancer Datasets","volume":"5","author":"Sujon","year":"2023","journal-title":"Int. J. Adv. Sci. Comput. Eng."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Hern\u00e1ndez-Julio, Y.F., D\u00edaz-Pertuz, L.A., Prieto-Guevara, M.J., Barrios-Barrios, M.A., and Nieto-Bernal, W. (2023). Intelligent fuzzy system to predict the wisconsin breast cancer dataset. Int. J. Environ. Res. 
Public Health, 20.","DOI":"10.3390\/ijerph20065103"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"109","DOI":"10.56532\/mjsat.v4i2.245","article-title":"Deep Learning Paradigms for Breast Cancer Diagnosis: A Comparative Study on Wisconsin Diagnostic Dataset","volume":"4","author":"Jony","year":"2024","journal-title":"Malays. J. Sci. Adv. Technol."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"e212706","DOI":"10.1049\/ell2.12706","article-title":"Effective kernel-principal component analysis based approach for wisconsin breast cancer diagnosis","volume":"59","author":"Mushtaq","year":"2023","journal-title":"Electron. Lett."},{"key":"ref_21","unstructured":"Street, W.N., Wolberg, W.H., and Mangasarian, O.L. (February, January 31). Nuclear feature extraction for breast tumor diagnosis. Proceedings of the IS&T\/SPIE 1993 International Symposium on Electronic Imaging: Science and Technology, San Jose, CA, USA."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Hu, J., and Szymczak, S. (2023). A review on longitudinal data analysis with random forest. Briefings Bioinform., 24.","DOI":"10.1093\/bib\/bbad002"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Tabacaru, G., Moldovanu, S., R\u0103ducan, E., and Barbu, M. (2024). A Robust Machine Learning Model for Diabetic Retinopathy Classification. J. Imaging, 10.","DOI":"10.3390\/jimaging10010008"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Damian, F.A., Moldovanu, S., and Moraru, L. (2022, January 17\u201318). Melanoma detection using a random forest algorithm. Proceedings of the 2022 E-Health and Bioengineering Conference (EHB), Iasi, Romania.","DOI":"10.1109\/EHB55594.2022.9991668"},{"key":"ref_25","first-page":"6","article-title":"Comparison of multiclass classification techniques using dry bean dataset","volume":"4","author":"Khan","year":"2023","journal-title":"Int. J. Cogn. Comput. 
Eng."},{"key":"ref_26","first-page":"10007","article-title":"A comparative analysis of K-Nearest Neighbour, Genetic, Support Vector Machine, Decision Tree, and Long Short Term Memory algorithms in machine learning","volume":"3","author":"Bansal","year":"2022","journal-title":"Decis. Anal. J."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"509","DOI":"10.1016\/j.spc.2022.12.002","article-title":"Adopting Artificial Intelligence for enhancing the implementation of systemic circularity in the construction industry: A critical review","volume":"35","author":"Oluleye","year":"2022","journal-title":"Sustain. Prod. Consum."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"9","DOI":"10.52846\/stccj.2024.4.1.59","article-title":"Analyzing deep learning algorithms with statistical methods","volume":"4","author":"Trifan","year":"2024","journal-title":"Syst. Theory Control Comput. J."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"436","DOI":"10.1038\/nature14539","article-title":"Deep learning","volume":"521","author":"LeCun","year":"2015","journal-title":"Nature"},{"key":"ref_30","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, \u0141., and Polosukhin, I. (2017). Attention is all you need. Adv. Neural Inf. Process."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Starmans, M.P., van der Voort, S.R., Tovar, J.M.C., Veenland, J.F., Klein, S., and Niessen, W.J. (2020). Radiomics: Data mining using quantitative medical image features. Handbook of Medical Image Computing and Computer Assisted Intervention, Elsevier.","DOI":"10.1016\/B978-0-12-816176-0.00023-5"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"107043","DOI":"10.1016\/j.csda.2020.107043","article-title":"A new correlation coefficient between categorical, ordinal and interval variables with pearson characteristics","volume":"152","author":"Baak","year":"2020","journal-title":"Comput. Stat. 
Data Anal."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"139","DOI":"10.1057\/jt.2009.5","article-title":"The correlation coefficient: Its values range between +1\/\u22121, or do they?","volume":"17","author":"Ratner","year":"2009","journal-title":"J. Target. Meas. Anal. Mark."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Chicco, D., T\u00f6tsch, N., and Jurman, G. (2021). The Matthews correlation coefficient (MCC) is more reliable than balanced accuracy, bookmaker informedness, and markedness in two-class confusion matrix evaluation. BioData Min., 14.","DOI":"10.1186\/s13040-021-00244-z"}],"container-title":["Big Data and Cognitive Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-2289\/9\/2\/45\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T16:37:09Z","timestamp":1760027829000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-2289\/9\/2\/45"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,2,18]]},"references-count":34,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2025,2]]}},"alternative-id":["bdcc9020045"],"URL":"https:\/\/doi.org\/10.3390\/bdcc9020045","relation":{},"ISSN":["2504-2289"],"issn-type":[{"value":"2504-2289","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,2,18]]}}}