{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,26]],"date-time":"2026-03-26T19:07:41Z","timestamp":1774552061671,"version":"3.50.1"},"reference-count":43,"publisher":"MDPI AG","issue":"8","license":[{"start":{"date-parts":[[2022,8,19]],"date-time":"2022-08-19T00:00:00Z","timestamp":1660867200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Special Funding Project for the Science and Technology Innovation Cultivation of Guangdong University Students","award":["pdjh2020a0748"],"award-info":[{"award-number":["pdjh2020a0748"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],"abstract":"<jats:p>This paper proposes a new method that can identify and predict financial fraud among listed companies based on machine learning. We collected 18,060 transactions and 363 indicators of finance, including 362 financial variables and a class variable. Then, we eliminated 9 indicators which were not related to financial fraud and processed the missing values. After that, we extracted 13 indicators from 353 indicators which have a big impact on financial fraud based on multiple feature selection models and the frequency of occurrence of features in all algorithms. Then, we established five single classification models and three ensemble models for the prediction of financial fraud records of listed companies, including LR, RF, XGBOOST, SVM, and DT and ensemble models with a voting classifier. Finally, we chose the optimal single model from five machine learning algorithms and the best ensemble model among all hybrid models. In choosing the model parameter, optimal parameters were selected by using the grid search method and comparing several evaluation metrics of models. 
The results determined the accuracy of the optimal single model to be in a range from 97% to 99%, and that of the ensemble models as higher than 99%. This shows that the optimal ensemble model performs well and can efficiently predict and detect fraudulent activity of companies. Thus, a hybrid model which combines a logistic regression model with an XGBOOST model is the best among all models. In the future, it will not only be able to predict fraudulent behavior in company management but also reduce the burden of doing so.<\/jats:p>","DOI":"10.3390\/e24081157","type":"journal-article","created":{"date-parts":[[2022,8,21]],"date-time":"2022-08-21T22:23:13Z","timestamp":1661120593000},"page":"1157","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":52,"title":["Financial Fraud Detection and Prediction in Listed Companies Using SMOTE and Machine Learning Algorithms"],"prefix":"10.3390","volume":"24","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1375-2127","authenticated-orcid":false,"given":"Zhihong","family":"Zhao","sequence":"first","affiliation":[{"name":"School of Applied Science and Civil Engineering, Beijing Institute of Technology, Zhuhai 519085, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7696-519X","authenticated-orcid":false,"given":"Tongyuan","family":"Bai","sequence":"additional","affiliation":[{"name":"Faculty of Natural, Mathematical and Engineering Sciences, King\u2019s College, London WC2R 2LS, UK"}]}],"member":"1968","published-online":{"date-parts":[[2022,8,19]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"1292","DOI":"10.1111\/joes.12294","article-title":"Financial fraud: A literature review","volume":"32","author":"Reurink","year":"2018","journal-title":"J. Econ. Surv."},{"key":"ref_2","first-page":"177","article-title":"Corrupt behavior in a psychological perspective","volume":"4","author":"Restya","year":"2019","journal-title":"Asia Pac. Fraud. 
J."},{"key":"ref_3","unstructured":"Treadway, J.C., Thompson, G., and Woolworth, F.W. (1987). Comment letters to the National Commission on Fraudulent Financial Reporting, Treadway Commission."},{"key":"ref_4","first-page":"31","article-title":"A study for establishing a fraud audit","volume":"17","author":"Li","year":"2002","journal-title":"Audit. Econ. Res."},{"key":"ref_5","first-page":"2383","article-title":"The impact of financial distress, stability, and liquidity on the likelihood of financial statement fraud","volume":"17","author":"Handoko","year":"2020","journal-title":"Palarch\u2019s J. Archaeol. Egypt\/Egyptology"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Peng, Z. (2020, August 01). A Ripple in the Muddy Waters: The Luckin Coffee Scandal and Short Selling Attacks. Available online: https:\/\/ssrn.com\/abstract=3672971.","DOI":"10.2139\/ssrn.3672971"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Li, Y. (2021). Research on the Effectiveness of China\u2019s A-share Main Board Market. E3S Web of Conferences, EDP Sciences.","DOI":"10.1051\/e3sconf\/202123501031"},{"key":"ref_8","first-page":"100176","article-title":"Intelligent financial fraud detection practices in post-pandemic era","volume":"2","author":"Zhu","year":"2021","journal-title":"Innovation"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Mohammed, R.A., Wong, K.W., Shiratuddin, M.F., and Wang, X. (2018). Scalable machine learning techniques for highly imbalanced credit card fraud detection: A comparative study. Pacific Rim International Conference on Artificial Intelligence, Springer.","DOI":"10.1007\/978-3-319-97310-4_27"},{"key":"ref_10","first-page":"1","article-title":"A systematic review on imbalanced data challenges in machine learning: Applications and solutions","volume":"52","author":"Kaur","year":"2020","journal-title":"ACM Comput. Surv. 
(CSUR)"},{"key":"ref_11","first-page":"42","article-title":"An overview of classification algorithms for imbalanced data set","volume":"2","author":"Ganganwar","year":"2012","journal-title":"Int. J. Emerg. Technol. Adv. Eng."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"129","DOI":"10.1007\/s10994-005-1505-9","article-title":"Combined SVM-based feature selection and classification","volume":"61","author":"Neumann","year":"2005","journal-title":"Mach. Learn."},{"key":"ref_13","unstructured":"Tang, J., Alelyani, S., and Liu, H. (2014). Feature selection for classification: A review. Data Classif. Algorithms Appl., 37\u201364."},{"key":"ref_14","first-page":"1157","article-title":"An introduction to variable and feature selection","volume":"3","author":"Guyon","year":"2003","journal-title":"J. Mach. Learn. Res."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"273","DOI":"10.1016\/S0004-3702(97)00043-X","article-title":"Wrappers for feature subset selection","volume":"97","author":"Kohavi","year":"1997","journal-title":"Artif. Intell."},{"key":"ref_16","first-page":"64","article-title":"Review of feature selection for solving classification problems","volume":"3","author":"Omar","year":"2013","journal-title":"J. Inf. Syst. Res. Innov."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"726","DOI":"10.1080\/18756891.2016.1204120","article-title":"A mutual information estimator for continuous and discrete variables applied to feature selection and classification problem","volume":"9","author":"Coelho","year":"2016","journal-title":"Int. J. Comput. Intell. Syst."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"169","DOI":"10.2308\/aud.2000.19.1.169","article-title":"A Decision Aid for Assessing the Likelihood of Fraudulent Financial Reporting","volume":"19","author":"Bell","year":"2000","journal-title":"Audit. J. Pract. 
Theory"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"179","DOI":"10.1108\/02686900210424321","article-title":"Detecting False Financial Statements Using Published Data: Some Evidence from Greece","volume":"17","author":"Spathis","year":"2002","journal-title":"Manag. Audit. J."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"995","DOI":"10.1016\/j.eswa.2006.02.016","article-title":"Data mining techniques for the detection of fraudulent financial statements","volume":"32","author":"Kirkos","year":"2007","journal-title":"Expert Syst. Appl."},{"key":"ref_21","first-page":"53","article-title":"Detecting and Predicting Financial Statement Fraud: The Effectiveness of the Fraud Triangle and SAS No. 99","volume":"13","author":"Skousen","year":"2008","journal-title":"Soc. Sci. Electron. Publ."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"491","DOI":"10.1016\/j.dss.2010.11.006","article-title":"Detection of financial statement fraud and feature selection using data mining techniques","volume":"50","author":"Ravisankar","year":"2011","journal-title":"Decis. Support Syst."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"595","DOI":"10.1016\/j.dss.2010.08.010","article-title":"A computational model for financial reporting fraud detection","volume":"50","author":"Glancy","year":"2011","journal-title":"Decis. Support Syst."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"321","DOI":"10.1613\/jair.953","article-title":"SMOTE: Synthetic minority over-sampling technique","volume":"16","author":"Chawla","year":"2002","journal-title":"J. Artif. Intell. 
Res."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"59475","DOI":"10.1109\/ACCESS.2018.2874063","article-title":"Cervical Cancer Diagnosis Using Random Forest Classifier with SMOTE and Feature Reduction Techniques","volume":"6","author":"Abdoh","year":"2018","journal-title":"IEEE Access"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"165286","DOI":"10.1109\/ACCESS.2021.3134330","article-title":"Performance Evaluation of Machine Learning Methods for Credit Card Fraud Detection Using SMOTE and AdaBoost","volume":"6","author":"Ileberi","year":"2021","journal-title":"IEEE Access"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"352","DOI":"10.1016\/S1532-0464(03)00034-0","article-title":"Logistic regression and artificial neural network classification models: A methodology review","volume":"35","author":"Dreiseitl","year":"2002","journal-title":"J. Biomed. Inform."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"93","DOI":"10.1016\/j.eswa.2019.05.028","article-title":"A comparison of random forest variable selection methods for classification prediction modeling","volume":"134","author":"Speiser","year":"2019","journal-title":"Expert Syst. Appl."},{"key":"ref_29","first-page":"651","article-title":"Experimenting XGBOOST algorithm for prediction and classification of different data sets","volume":"9","author":"Ramraj","year":"2016","journal-title":"Int. J. Control. Theory Appl."},{"key":"ref_30","first-page":"185","article-title":"A review on support vector machine for data classification","volume":"11","author":"Bhavsar","year":"2012","journal-title":"Int. J. Adv. Res. Comput. Eng. Technol."},{"key":"ref_31","first-page":"130","article-title":"Decision tree methods: Applications for classification and prediction","volume":"27","author":"Song","year":"2015","journal-title":"Shanghai Arch. 
Psychiatry"},{"key":"ref_32","first-page":"2825","article-title":"Scikit-learn: Machine Learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J. Mach. Learn. Res."},{"key":"ref_33","unstructured":"(2022, March 29). LogisticRegression. Available online: https:\/\/scikit-learn.org\/stable\/modules\/classes.html."},{"key":"ref_34","unstructured":"(2022, March 29). RandomForestClassifier. Available online: https:\/\/scikit-learn.org\/stable\/supervised_learning.html."},{"key":"ref_35","unstructured":"(2022, March 29). SVC. Available online: https:\/\/scikit-learn.org\/stable\/supervised_learning.html."},{"key":"ref_36","unstructured":"(2022, March 29). DecisionTreeClassifier. Available online: https:\/\/scikit-learn.org\/stable\/supervised_learning.html."},{"key":"ref_37","first-page":"19","article-title":"Comparison of bagging and voting ensemble machine learning algorithm as a classifier","volume":"9","author":"Kabari","year":"2019","journal-title":"Int. J. Adv. Res. Comput. Sci. Softw. Eng."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"14277","DOI":"10.1109\/ACCESS.2018.2806420","article-title":"Credit card fraud detection using AdaBoost and majority voting","volume":"6","author":"Randhawa","year":"2018","journal-title":"IEEE Access"},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"25579","DOI":"10.1109\/ACCESS.2020.2971354","article-title":"An intelligent approach to credit card fraud detection using an optimized light gradient boosting machine","volume":"8","author":"Taha","year":"2020","journal-title":"IEEE Access"},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"155","DOI":"10.1177\/8756479308317006","article-title":"Measures of association: How to choose?","volume":"24","author":"Khamis","year":"2008","journal-title":"J. Diagn. Med. 
Sonogr."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"9293877","DOI":"10.1155\/2021\/9293877","article-title":"Financial fraud detection in healthcare using machine learning and deep learning techniques","volume":"2021","author":"Mehbodniya","year":"2021","journal-title":"Secur. Commun. Netw."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"96","DOI":"10.14445\/22315381\/IJETT-V69I3P216","article-title":"A comparative study of using various machine learning and deep learning-based fraud detection models for universal health coverage schemes","volume":"69","author":"Gupta","year":"2021","journal-title":"Int. J. Eng. Trends Technol."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Mathew, A., Amudha, P., and Sivakumari, S. (2020). Deep learning techniques: An overview. International Conference on Advanced Machine Learning Technologies and Applications, Springer.","DOI":"10.1007\/978-981-15-3383-9_54"}],"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1099-4300\/24\/8\/1157\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T00:12:39Z","timestamp":1760141559000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1099-4300\/24\/8\/1157"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,8,19]]},"references-count":43,"journal-issue":{"issue":"8","published-online":{"date-parts":[[2022,8]]}},"alternative-id":["e24081157"],"URL":"https:\/\/doi.org\/10.3390\/e24081157","relation":{},"ISSN":["1099-4300"],"issn-type":[{"value":"1099-4300","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,8,19]]}}}