{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,9]],"date-time":"2026-05-09T10:54:04Z","timestamp":1778324044489,"version":"3.51.4"},"reference-count":27,"publisher":"MDPI AG","issue":"9","license":[{"start":{"date-parts":[[2023,9,12]],"date-time":"2023-09-12T00:00:00Z","timestamp":1694476800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61202479"],"award-info":[{"award-number":["61202479"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Information"],"abstract":"<jats:p>This research primarily explores the application of Natural Language Processing (NLP) technology in precision financial fraud detection, with a particular focus on the implementation and optimization of the FinChain-BERT model. Firstly, the FinChain-BERT model has been successfully employed for financial fraud detection tasks, improving the capability of handling complex financial text information through deep learning techniques. Secondly, novel attempts have been made in the selection of loss functions, with a comparison conducted between negative log-likelihood function and Keywords Loss Function. The results indicated that the Keywords Loss Function outperforms the negative log-likelihood function when applied to the FinChain-BERT model. Experimental results validated the efficacy of the FinChain-BERT model and its optimization measures. Whether in the selection of loss functions or the application of lightweight technology, the FinChain-BERT model demonstrated superior performance. The utilization of Keywords Loss Function resulted in a model achieving 0.97 in terms of accuracy, recall, and precision. 
Simultaneously, the model size was successfully reduced to 43 MB through the application of integer distillation technology, which holds significant importance for environments with limited computational resources. In conclusion, this research makes a crucial contribution to the application of NLP in financial fraud detection and provides a useful reference for future studies.<\/jats:p>","DOI":"10.3390\/info14090499","type":"journal-article","created":{"date-parts":[[2023,9,12]],"date-time":"2023-09-12T21:41:12Z","timestamp":1694554872000},"page":"499","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":27,"title":["FinChain-BERT: A High-Accuracy Automatic Fraud Detection Model Based on NLP Methods for Financial Scenarios"],"prefix":"10.3390","volume":"14","author":[{"given":"Xinze","family":"Yang","sequence":"first","affiliation":[{"name":"China Agricultural University, Beijing 100083, China"}]},{"given":"Chunkai","family":"Zhang","sequence":"additional","affiliation":[{"name":"China Agricultural University, Beijing 100083, China"}]},{"given":"Yizhi","family":"Sun","sequence":"additional","affiliation":[{"name":"China Agricultural University, Beijing 100083, China"}]},{"given":"Kairui","family":"Pang","sequence":"additional","affiliation":[{"name":"School of Business and Management, Jilin University, Jilin 130015, China"}]},{"given":"Luru","family":"Jing","sequence":"additional","affiliation":[{"name":"School of Software and Microelectronics, Peking University, Beijing 100083, China"}]},{"given":"Shiyun","family":"Wa","sequence":"additional","affiliation":[{"name":"Applied Computational Science and Engineering, Imperial College London, South Kensington Campus, London SW7 2AZ, UK"}]},{"given":"Chunli","family":"Lv","sequence":"additional","affiliation":[{"name":"China Agricultural University, Beijing 100083, 
China"}]}],"member":"1968","published-online":{"date-parts":[[2023,9,12]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Syed, A.A., Ahmed, F., Kamal, M.A., and Trinidad Segovia, J.E. (2021). Assessing the role of digital finance on shadow economy and financial instability: An empirical analysis of selected South Asian countries. Mathematics, 9.","DOI":"10.2139\/ssrn.3982585"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"1143","DOI":"10.1108\/JFC-04-2020-0062","article-title":"The risk of financial fraud: A management perspective","volume":"27","author":"Hashim","year":"2020","journal-title":"J. Financ. Crime"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"113421","DOI":"10.1016\/j.dss.2020.113421","article-title":"Deep learning for detecting financial statement fraud","volume":"139","author":"Craja","year":"2020","journal-title":"Decis. Support Syst."},{"key":"ref_4","first-page":"100176","article-title":"Intelligent financial fraud detection practices in post-pandemic era","volume":"2","author":"Zhu","year":"2021","journal-title":"Innovation"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"64","DOI":"10.1002\/jcaf.22389","article-title":"Data-driven auditing: A predictive modeling approach to fraud detection and classification","volume":"30","author":"Singh","year":"2019","journal-title":"J. Corp. Account. Financ."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Jan, C.L. (2021). Detection of financial statement fraud using deep learning for sustainable development of capital markets under information asymmetry. 
Sustainability, 13.","DOI":"10.3390\/su13179879"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"22516","DOI":"10.1109\/ACCESS.2022.3153478","article-title":"An analysis on financial statement fraud detection for Chinese listed companies using deep learning","volume":"10","author":"Xiuguo","year":"2022","journal-title":"IEEE Access"},{"key":"ref_8","first-page":"100269","article-title":"Fraud prediction using machine learning: The case of investment advisors in Canada","volume":"8","author":"Lokanan","year":"2022","journal-title":"Mach. Learn. Appl."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"De Oliveira, N.R., Pisa, P.S., Lopez, M.A., de Medeiros, D.S.V., and Mattos, D.M. (2021). Identifying fake news on social networks based on natural language processing: Trends and challenges. Information, 12.","DOI":"10.3390\/info12010038"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"444","DOI":"10.3390\/info12110444","article-title":"Investigating machine learning & natural language processing techniques applied for predicting depression disorder from online support forums: A systematic literature review","volume":"12","author":"Sandanapitchai","year":"2021","journal-title":"Information"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Kanakogi, K., Washizaki, H., Fukazawa, Y., Ogata, S., Okubo, T., Kato, T., Kanuka, H., Hazeyama, A., and Yoshioka, N. (2021). Tracing cve vulnerability information to capec attack patterns using natural language processing techniques. Information, 12.","DOI":"10.24251\/HICSS.2021.841"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Zhang, Y., He, S., Wa, S., Zong, Z., Lin, J., Fan, D., Fu, J., and Lv, C. (2022). Symmetry GAN Detection Network: An Automatic One-Stage High-Accuracy Detection Network for Various Types of Lesions on CT Images. 
Symmetry, 14.","DOI":"10.3390\/sym14020234"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Taherdoost, H., and Madanchian, M. (2023). Artificial intelligence and sentiment analysis: A review in competitive research. Computers, 12.","DOI":"10.3390\/computers12020037"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Dang, N.C., Moreno-Garc\u00eda, M.N., and De la Prieta, F. (2020). Sentiment analysis based on deep learning: A comparative study. Electronics, 9.","DOI":"10.3390\/electronics9030483"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Villavicencio, C., Macrohon, J.J., Inbaraj, X.A., Jeng, J.H., and Hsieh, J.G. (2021). Twitter sentiment analysis towards COVID-19 vaccines in the Philippines using na\u00efve bayes. Information, 12.","DOI":"10.3390\/info12050204"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Kwon, H.J., Ban, H.J., Jun, J.K., and Kim, H.S. (2021). Topic modeling and sentiment analysis of online review for airlines. Information, 12.","DOI":"10.3390\/info12020078"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Zhang, Y., Li, D., Wang, Y., Fang, Y., and Xiao, W. (2019). Abstract text summarization with a convolutional seq2seq model. Appl. Sci., 9.","DOI":"10.3390\/app9081665"},{"key":"ref_18","unstructured":"Zaremba, W., Sutskever, I., and Vinyals, O. (2014). Recurrent neural network regularization. arXiv."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Graves, A., and Graves, A. (2012). Supervised Sequence Labelling with Recurrent Neural Networks, Springer.","DOI":"10.1007\/978-3-642-24797-2"},{"key":"ref_20","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, \u0141., and Polosukhin, I. (2017, January 4\u20139). Attention is all you need. Proceedings of the Advances in Neural Information Processing Systems 30 (NIPS 2017), Long Beach, CA, USA."},{"key":"ref_21","unstructured":"Federation, C.C. (2019, August 17). 
Negative Financial Information and Subject Determination. Available online: https:\/\/www.datafountain.cn\/competitions\/353."},{"key":"ref_22","unstructured":"Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., and Stoyanov, V. (2019). RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv."},{"key":"ref_23","unstructured":"Lan, Z., Chen, M., Goodman, S., Gimpel, K., Sharma, P., and Soricut, R. (2019). Albert: A lite bert for self-supervised learning of language representations. arXiv."},{"key":"ref_24","unstructured":"Sanh, V., Debut, L., Chaumond, J., and Wolf, T. (2019). DistilBERT, a distilled version of BERT: Smaller, faster, cheaper and lighter. arXiv."},{"key":"ref_25","unstructured":"Devlin, J., Chang, M., Lee, K., and Toutanova, K. (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv."},{"key":"ref_26","unstructured":"Ruder, S. (2016). An overview of gradient descent optimization algorithms. arXiv."},{"key":"ref_27","unstructured":"Kingma, D.P., and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv."}],"container-title":["Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2078-2489\/14\/9\/499\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T20:49:36Z","timestamp":1760129376000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2078-2489\/14\/9\/499"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,9,12]]},"references-count":27,"journal-issue":{"issue":"9","published-online":{"date-parts":[[2023,9]]}},"alternative-id":["info14090499"],"URL":"https:\/\/doi.org\/10.3390\/info14090499","relation":{},"ISSN":["2078-2489"],"issn-type":[{"value":"2078-2489","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,9,12]]}}}