{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,28]],"date-time":"2026-04-28T02:21:14Z","timestamp":1777342874160,"version":"3.51.4"},"reference-count":27,"publisher":"MDPI AG","issue":"12","license":[{"start":{"date-parts":[[2025,12,9]],"date-time":"2025-12-09T00:00:00Z","timestamp":1765238400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Computation"],"abstract":"<jats:p>Fraud in financial services\u2014especially account opening fraud\u2014poses major operational and reputational risks. Static rules struggle to adapt to evolving tactics, missing novel patterns and generating excessive false positives. Machine learning promises adaptive detection, but deployment faces severe class imbalance: in the NeurIPS 2022 BAF Base benchmark used here, fraud prevalence is 1.10%. Standard metrics (accuracy, f1_weighted) can look strong while doing little for the minority class. We compare Logistic Regression, SVM (RBF), Random Forest, LightGBM, and a GRU model on N = 1,000,000 accounts under a unified preprocessing pipeline. All models are trained to minimize their loss function, while configurations are selected on a stratified development set using validation-weighted F1-score f1_weighted. For the four classical models, class weighting in the loss (class_weight\u00a0\u2208{None,\u2018balanced\u2019}) is treated as a hyperparameter and tuned. Similarly, the GRU is trained with a fixed class-weighted CrossEntropy loss that up-weights fraud cases. This ensures that both model families leverage weighted training objectives, while their final hyperparameters are consistently selected by the f1_weighted metric. 
Despite similar AUCs and aligned feature importance across families, the classical models converge to high-precision, low-recall solutions (1\u20136% fraud recall), whereas the GRU recovers 78% recall at 5% precision (AUC = 0.8800). Under extreme imbalance, objective choice and operating point matter at least as much as architecture.<\/jats:p>","DOI":"10.3390\/computation13120290","type":"journal-article","created":{"date-parts":[[2025,12,9]],"date-time":"2025-12-09T09:02:50Z","timestamp":1765270970000},"page":"290","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":15,"title":["Objective over Architecture: Fraud Detection Under Extreme Imbalance in Bank Account Opening"],"prefix":"10.3390","volume":"13","author":[{"ORCID":"https:\/\/orcid.org\/0009-0007-2585-6015","authenticated-orcid":false,"given":"Wenxi","family":"Sun","sequence":"first","affiliation":[{"name":"Krieger School of Arts and Sciences, Johns Hopkins University, Washington, DC 20001, USA"}]},{"given":"Qiannan","family":"Shen","sequence":"additional","affiliation":[{"name":"Graduate School of Art and Science, Boston University, Boston, MA 02215, USA"}]},{"given":"Yijun","family":"Gao","sequence":"additional","affiliation":[{"name":"Krieger School of Arts and Sciences, Johns Hopkins University, Washington, DC 20001, USA"}]},{"given":"Qinkai","family":"Mao","sequence":"additional","affiliation":[{"name":"Weissman School of Arts and Sciences, Baruch College, City University of New York, New York, NY 10010, USA"}]},{"given":"Tongsong","family":"Qi","sequence":"additional","affiliation":[{"name":"Charles V. Schaefer Jr.
School of Engineering and Science, Stevens Institute of Technology, Hoboken, NJ 07030, USA"}]},{"given":"Shuo","family":"Xu","sequence":"additional","affiliation":[{"name":"Computer Science and Engineering Department, University of California San Diego, La Jolla, CA 92093, USA"}]}],"member":"1968","published-online":{"date-parts":[[2025,12,9]]},"reference":[{"key":"ref_1","unstructured":"Aite-Novarica Group (2021). Synthetic Identity Fraud: The Elephant in the Room, Aite-Novarica Group. Technical Report."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"1130","DOI":"10.1057\/s41599-024-03606-0","article-title":"Financial fraud detection through the application of machine learning techniques: A literature review","volume":"11","year":"2024","journal-title":"Humanit. Soc. Sci. Commun."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"101744","DOI":"10.1016\/j.ribaf.2022.101744","article-title":"Insurance fraud detection: Evidence from artificial intelligence and machine learning","volume":"62","author":"Aslam","year":"2022","journal-title":"Res. Int. Bus. Financ."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"6157","DOI":"10.1007\/s41060-025-00850-8","article-title":"A systematic review of machine learning approaches for detecting deceptive activities on social media: Methods, challenges, and biases","volume":"20","author":"Liu","year":"2025","journal-title":"Int. J. Data Sci. Anal."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Tian, Y., Xu, S., Cao, Y., Wang, Z., and Wei, Z. (2025). An empirical comparison of machine learning and deep learning models for automated fake news detection. Mathematics, 13.","DOI":"10.20944\/preprints202506.0122.v1"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Xu, S., Cao, Y., Wang, Z., and Tian, Y. (2025, January 20\u201322). Fraud detection in online transactions: Toward hybrid supervised\u2013unsupervised learning pipelines. 
Proceedings of the 6th International Conference on Electronic Communication and Artificial Intelligence (ICECAI 2025), Chengdu, China.","DOI":"10.1109\/ICECAI66283.2025.11171265"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Baisholan, N., Dietz, J.E., Gnatyuk, S., Turdalyuly, M., Matson, E.T., and Baisholanova, K. (2025). A Systematic Review of Machine Learning in Credit Card Fraud Detection Under Original Class Imbalance. Computers, 14.","DOI":"10.3390\/computers14100437"},{"key":"ref_8","unstructured":"Boabang, F., and Goussiatiner, S.A. (2025). An Enhanced Focal Loss Function to Mitigate Class Imbalance in Auto Insurance Fraud Detection. arXiv."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Saito, T., and Rehmsmeier, M. (2015). The precision\u2013recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS ONE, 10.","DOI":"10.1371\/journal.pone.0118432"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Ali, A., Shukor, A.R., Omar, S.H., and Abdu, S. (2022). Financial fraud detection based on machine learning: A systematic literature review. Appl. Sci., 12.","DOI":"10.3390\/app12199637"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"47","DOI":"10.1016\/j.cose.2015.09.005","article-title":"Intelligent financial fraud detection: A comprehensive review","volume":"57","author":"West","year":"2016","journal-title":"Comput. Secur."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"66","DOI":"10.1186\/s40854-023-00470-w","article-title":"Online payment fraud: From anomaly detection to risk management","volume":"9","author":"Vanini","year":"2023","journal-title":"Financ. Innov."},{"key":"ref_13","unstructured":"Jesus, S., Pombal, J., Alves, D., Cruz, A., Saleiro, P., Ribeiro, R.P., Gama, J., and Bizarro, P. (2022). Turning the Tables: Biased, Imbalanced, Dynamic Tabular Datasets for ML Evaluation. 
Advances in Neural Information Processing Systems, Curran Associates, Inc."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Hosmer, D.W., Lemeshow, S., and Sturdivant, R.X. (2013). Applied Logistic Regression, John Wiley & Sons, Inc. [3rd ed.].","DOI":"10.1002\/9781118548387"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"100343","DOI":"10.1016\/j.simpa.2022.100343","article-title":"PLSSVM\u2014Parallel Least Squares Support Vector Machine","volume":"14","author":"Breyer","year":"2022","journal-title":"Softw. Impacts"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"43","DOI":"10.1007\/s43069-023-00223-6","article-title":"Random Forest Pruning Techniques: A Recent Review","volume":"4","author":"Manzali","year":"2023","journal-title":"Oper. Res. Forum"},{"key":"ref_17","unstructured":"Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., Ye, Q., and Liu, T.Y. (2017, January 4\u20139). LightGBM: A highly efficient gradient boosting decision tree. Proceedings of the 31st International Conference on Neural Information Processing Systems (NeurIPS), Long Beach, CA, USA."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Cho, K., van Merri\u00ebnboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., and Bengio, Y. (2014, January 25\u201329). Learning Phrase Representations Using RNN Encoder\u2013Decoder for Statistical Machine Translation. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar.","DOI":"10.3115\/v1\/D14-1179"},{"key":"ref_19","first-page":"37","article-title":"Evaluation: From precision, recall and F-measure to ROC, informedness, markedness and correlation","volume":"2","author":"Powers","year":"2011","journal-title":"J. Mach. Learn. Technol."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Davis, J., and Goadrich, M. (2006, January 25\u201329). The relationship between Precision-Recall and ROC curves.
Proceedings of the 23rd International Conference on Machine Learning (ICML), Pittsburgh, PA, USA.","DOI":"10.1145\/1143844.1143874"},{"key":"ref_21","first-page":"2825","article-title":"Scikit-learn: Machine Learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J. Mach. Learn. Res."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"273","DOI":"10.1023\/A:1022627411411","article-title":"Support-Vector Networks","volume":"20","author":"Cortes","year":"1995","journal-title":"Mach. Learn."},{"key":"ref_23","unstructured":"Breiman, L., Friedman, J.H., Olshen, R.A., and Stone, C.J. (1984). Classification and Regression Trees, Wadsworth & Brooks\/Cole Advanced Books & Software."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Mach. Learn."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"1189","DOI":"10.1214\/aos\/1013203451","article-title":"Greedy function approximation: A gradient boosting machine","volume":"29","author":"Friedman","year":"2001","journal-title":"Ann. Stat."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"106332","DOI":"10.1016\/j.jbankfin.2021.106332","article-title":"The exclamation mark of Cain: Risk attitude and mutual fund flows","volume":"134","author":"Mugerman","year":"2022","journal-title":"J. Bank. Financ."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"67","DOI":"10.35566\/jbds\/caoyc","article-title":"Machine learning approaches for depression detection on social media: A systematic review of biases and methodological challenges","volume":"5","author":"Cao","year":"2025","journal-title":"J. Behav. 
Data Sci."}],"container-title":["Computation"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2079-3197\/13\/12\/290\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,12,9]],"date-time":"2025-12-09T10:58:37Z","timestamp":1765277917000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2079-3197\/13\/12\/290"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,12,9]]},"references-count":27,"journal-issue":{"issue":"12","published-online":{"date-parts":[[2025,12]]}},"alternative-id":["computation13120290"],"URL":"https:\/\/doi.org\/10.3390\/computation13120290","relation":{},"ISSN":["2079-3197"],"issn-type":[{"value":"2079-3197","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,12,9]]}}}