{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,2]],"date-time":"2026-05-02T07:17:37Z","timestamp":1777706257031,"version":"3.51.4"},"reference-count":31,"publisher":"SAGE Publications","issue":"3","license":[{"start":{"date-parts":[[2025,8,25]],"date-time":"2025-08-25T00:00:00Z","timestamp":1756080000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology"],"published-print":{"date-parts":[[2026,3]]},"abstract":"<jats:p>Credit scoring, which forecasts the probability of loan default based on borrower attributes and credit history, is still a crucial task in the financial industry. Finding the most important characteristics to improve credit scoring accuracy has become more difficult due to the complexity of borrower profiles. This paper presents a systematic and multidimensional evaluation of the impact of different feature selection techniques, namely wrapper-based, filter-based, and embedded methods, on the performance of various machine learning classifiers such as Random Forest (RF) and Extreme Gradient Boosting (XGBoost). The influence of data resampling techniques to address class imbalance is also explored. The study evaluates all combinations under three settings: original, oversampled, and undersampled data, using three publicly available datasets: German, Taiwan, and Australian credit scoring datasets. Experimental results show that ensemble classifiers, especially XGBoost and RF, consistently outperform single classifier models. 
Additionally, feature selection methods, especially embedded and wrapper techniques, enhance model performance and reduce false positive and false negative rates across the three datasets.<\/jats:p>","DOI":"10.1177\/18758967251369775","type":"journal-article","created":{"date-parts":[[2025,8,25]],"date-time":"2025-08-25T15:00:05Z","timestamp":1756134005000},"page":"851-872","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":0,"title":["Improving Credit Scoring with Feature Selection and Predictive Modeling"],"prefix":"10.1177","volume":"50","author":[{"ORCID":"https:\/\/orcid.org\/0009-0006-0266-980X","authenticated-orcid":false,"given":"Mahmoud","family":"Abdelsalam","sequence":"first","affiliation":[{"name":"Mahmoud Abdelsalam, Department of Information Systems, Faculty of Computers and Information, Mansoura University, Mansoura, Egypt"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1943-2861","authenticated-orcid":false,"given":"Samir","family":"Abdelrazek","sequence":"additional","affiliation":[{"name":"Mahmoud Abdelsalam, Department of Information Systems, Faculty of Computers and Information, Mansoura University, Mansoura, Egypt"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7260-8857","authenticated-orcid":false,"given":"Islam R","family":"Abdelmaksoud","sequence":"additional","affiliation":[{"name":"Mahmoud Abdelsalam, Department of Information Systems, Faculty of Computers and Information, Mansoura University, Mansoura, Egypt"}]}],"member":"179","published-online":{"date-parts":[[2025,8,25]]},"reference":[{"key":"e_1_3_2_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2023.3325331"},{"key":"e_1_3_2_3_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2016.04.013"},{"key":"e_1_3_2_4_1","unstructured":"Asuncion A. Newman D. (2007a). UCI machine learning repository. https:\/\/archive.ics.uci.edu\/ml\/datasets\/statlog+(german+credit+data). 
Accessed: 2025-01-01."},{"key":"e_1_3_2_5_1","unstructured":"Asuncion A. Newman D. (2007b). UCI machine learning repository. https:\/\/archive.ics.uci.edu\/dataset\/143\/statlog+australian+credit+approval. Accessed: 2025-05-01."},{"key":"e_1_3_2_6_1","doi-asserted-by":"crossref","unstructured":"Belete D. M. Manjaiah D. (2020). A comparative study of filter and wrapper methods on EDHS\u2013HIV\/AIDS dataset. In 2020 Third international conference on smart systems and inventive technology (ICSSIT) (pp. 1264\u20131271). IEEE.","DOI":"10.1109\/ICSSIT48917.2020.9214212"},{"key":"e_1_3_2_7_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.surg.2015.12.029"},{"key":"e_1_3_2_8_1","doi-asserted-by":"crossref","unstructured":"Chen T. Guestrin C. (2016). Xgboost: A scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining (pp. 785\u2013794).","DOI":"10.1145\/2939672.2939785"},{"key":"e_1_3_2_9_1","doi-asserted-by":"publisher","DOI":"10.1002\/pds.5391"},{"key":"e_1_3_2_10_1","doi-asserted-by":"publisher","DOI":"10.1186\/s40537-024-00882-0"},{"key":"e_1_3_2_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2022.3148298"},{"key":"e_1_3_2_12_1","volume-title":"Data mining: Accuracy and error measures for classification and prediction","author":"Galdi P.","year":"2019","unstructured":"Galdi P., Tagliaferri R. (2019). Data mining: Accuracy and error measures for classification and prediction. Elsevier."},{"issue":"2","key":"e_1_3_2_13_1","first-page":"105","article-title":"Why 70\/30 or 80\/20 relation between training and testing sets: A pedagogical explanation","volume":"11","author":"Gholamy A.","year":"2018","unstructured":"Gholamy A., Kreinovich V., Kosheleva O. (2018). Why 70\/30 or 80\/20 relation between training and testing sets: A pedagogical explanation. 
International Journal of Intelligent Technologies and Applied Statistics, 11(2), 105\u2013111.","journal-title":"International Journal of Intelligent Technologies and Applied Statistics"},{"key":"e_1_3_2_14_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2018.01.012"},{"key":"e_1_3_2_15_1","unstructured":"Ke G. Meng Q. Finley T. Wang T. Chen W. Ma W. Ye Q. Liu T. Y. (2017). LightGBM: A highly efficient gradient boosting decision tree. In I. Guyon U. V. Luxburg S. Bengio H. Wallach R. Fergus S. Vishwanathan & R. Garnett (Eds.) Advances in neural information processing systems volume 30. Curran Associates Inc."},{"key":"e_1_3_2_16_1","unstructured":"Koc O. Ugur O. Kestel A. S. (2023). The impact of feature selection and transformation on machine learning methods in determining the credit scoring. arXiv preprint arXiv:2303.05427."},{"key":"e_1_3_2_17_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2024.125327"},{"key":"e_1_3_2_18_1","unstructured":"Labatut V. Cherifi H. (2012). Accuracy measures for the comparison of classifiers. CoRR abs\/1207.3790."},{"key":"e_1_3_2_19_1","doi-asserted-by":"publisher","DOI":"10.3390\/math9070746"},{"key":"e_1_3_2_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2019.2930235"},{"issue":"1","key":"e_1_3_2_21_1","first-page":"55","article-title":"Data augmentation methods for reject inference in credit risk models","volume":"10","author":"Liao J.","year":"2024","unstructured":"Liao J., Wang W., Xue J., Lei A. (2024). Data augmentation methods for reject inference in credit risk models. Journal of Financial Data Science, 10(1), 55\u201372.","journal-title":"Journal of Financial Data Science"},{"key":"e_1_3_2_22_1","doi-asserted-by":"publisher","DOI":"10.51519\/journalisi.v5i2.487"},{"key":"e_1_3_2_23_1","unstructured":"Prokhorenkova L. Gusev G. Vorobev A. Dorogush A. V. Gulin A. (2018). CatBoost: Unbiased boosting with categorical features. 
Advances in neural information processing systems volume 31."},{"key":"e_1_3_2_24_1","doi-asserted-by":"crossref","unstructured":"Re M. Valentini G. (2012). Ensemble methods. Advances in machine learning and data mining for astronomy (pp. 563\u2013593).","DOI":"10.1201\/b11822-34"},{"key":"e_1_3_2_25_1","doi-asserted-by":"publisher","DOI":"10.52465\/joiser.v2i1.203"},{"key":"e_1_3_2_26_1","doi-asserted-by":"publisher","DOI":"10.1117\/1.JBO.26.10.105001"},{"key":"e_1_3_2_27_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00521-023-09232-2"},{"key":"e_1_3_2_28_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11277-021-09158-9"},{"key":"e_1_3_2_29_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ejor.2024.10.046"},{"key":"e_1_3_2_30_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.dss.2023.114084"},{"key":"e_1_3_2_31_1","doi-asserted-by":"crossref","unstructured":"Yeh I. C. Lien C. H. (2009). The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients.","DOI":"10.1016\/j.eswa.2007.12.020"},{"key":"e_1_3_2_32_1","doi-asserted-by":"publisher","DOI":"10.1002\/ijfe.2019"}],"container-title":["Journal of Intelligent & Fuzzy Systems: Applications in Engineering and 
Technology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/18758967251369775","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/18758967251369775","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/18758967251369775","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T09:46:25Z","timestamp":1777455985000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/18758967251369775"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,8,25]]},"references-count":31,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2026,3]]}},"alternative-id":["10.1177\/18758967251369775"],"URL":"https:\/\/doi.org\/10.1177\/18758967251369775","relation":{},"ISSN":["1064-1246","1875-8967"],"issn-type":[{"value":"1064-1246","type":"print"},{"value":"1875-8967","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,8,25]]}}}