{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,8,1]],"date-time":"2026-08-01T17:06:26Z","timestamp":1785603986298,"version":"3.56.0"},"reference-count":38,"publisher":"MDPI AG","issue":"5","license":[{"start":{"date-parts":[[2024,5,15]],"date-time":"2024-05-15T00:00:00Z","timestamp":1715731200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Computation"],"abstract":"<jats:p>This study is an in-depth exploration of the nascent field of Natural Language Processing (NLP) and generative Artificial Intelligence (AI), and it concentrates on the vital task of distinguishing between human-generated text and content that has been produced by AI models. Particularly, this research pioneers the identification of financial text derived from AI models such as ChatGPT and paraphrasing tools like QuillBot. While our primary focus is on financial content, we have also pinpointed texts generated by paragraph rewriting tools and utilized ChatGPT for various contexts this multiclass identification was missing in previous studies. In this paper, we use a comprehensive feature extraction methodology that combines TF\u2013IDF with Word2Vec, along with individual feature extraction methods. Importantly, combining a Random Forest model with Word2Vec results in impressive outcomes. Moreover, this study investigates the significance of the window size parameters in the Word2Vec approach, revealing that a window size of one produces outstanding scores across various metrics, including accuracy, precision, recall and the F1 measure, all reaching a notable value of 0.74. In addition to this, our developed model performs well in classification, attaining AUC values of 0.94 for the \u2018GPT\u2019 class; 0.77 for the \u2018Quil\u2019 class; and 0.89 for the \u2018Real\u2019 class. We also achieved an accuracy of 0.72, precision of 0.71, recall of 0.72, and F1 of 0.71 for our extended prepared dataset. This study contributes significantly to the evolving landscape of AI text identification, providing valuable insights and promising directions for future research.<\/jats:p>","DOI":"10.3390\/computation12050101","type":"journal-article","created":{"date-parts":[[2024,5,15]],"date-time":"2024-05-15T06:14:42Z","timestamp":1715753682000},"page":"101","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":11,"title":["Unveiling AI-Generated Financial Text: A Computational Approach Using Natural Language Processing and Generative Artificial Intelligence"],"prefix":"10.3390","volume":"12","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5583-1253","authenticated-orcid":false,"given":"Muhammad Asad","family":"Arshed","sequence":"first","affiliation":[{"name":"Department of Software Engineering, University of Management and Technology, Lahore 54770, Pakistan"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2911-6480","authenticated-orcid":false,"given":"\u0218tefan Cristian","family":"Gherghina","sequence":"additional","affiliation":[{"name":"Department of Finance, Bucharest University of Economic Studies, 6 Piata Romana, 010374 Bucharest, Romania"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1284-234X","authenticated-orcid":false,"given":"Christine","family":"Dewi","sequence":"additional","affiliation":[{"name":"Department of Information Technology, Satya Wacana Christian University, Salatiga 50715, Indonesia"},{"name":"School of Information Technology, Deakin University, Campus 221 Burwood Hwy, Burwood, VIC 3125, Australia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Asma","family":"Iqbal","sequence":"additional","affiliation":[{"name":"Department of Software Engineering, University of Management and Technology, Lahore 54770, Pakistan"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2606-2405","authenticated-orcid":false,"given":"Shahzad","family":"Mumtaz","sequence":"additional","affiliation":[{"name":"Department of Data Science, The Islamia University of Bahawalpur, Bahawalpur 63100, Pakistan"},{"name":"School of Natural and Computing Sciences, University of Aberdeen, Aberdeen AB24 3FX, Scotland, UK"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2024,5,15]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Muneer, A., Alwadain, A., Ragab, M.G., and Alqushaibi, A. (2023). Cyberbullying Detection on Social Media Using Stacking Ensemble Learning and Enhanced BERT. Information, 14.","DOI":"10.3390\/info14080467"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Hadi, M.U., Al Tashi, Q., Qureshi, R., Shah, A., Muneer, A., Irfan, M., Zafar, A., Shaikh, M.B., Akhtar, N., and Wu, J. (2023). Large Language Models: A Comprehensive Survey of its Applications, Challenges, Limitations, and Future Prospects. Authorea Prepr.","DOI":"10.36227\/techrxiv.23589741.v2"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"857","DOI":"10.1007\/s11277-023-10312-8","article-title":"Demystifying the Role of Natural Language Processing (NLP) in Smart City Applications: Background, Motivation, Recent Advances, and Future Research Directions","volume":"130","author":"Tyagi","year":"2023","journal-title":"Wirel. Pers. Commun."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"3713","DOI":"10.1007\/s11042-022-13428-4","article-title":"Natural language processing: State of the art, current trends and challenges","volume":"82","author":"Khurana","year":"2023","journal-title":"Multimed. Tools Appl."},{"key":"ref_5","first-page":"84","article-title":"Collaborating with ChatGPT: Considering the Implications of Generative Artificial Intelligence for Journalism and Media Education","volume":"78","author":"Pavlik","year":"2023","journal-title":"J. Mass Commun. Educ."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"292","DOI":"10.1111\/epi.17474","article-title":"Transforming epilepsy research: A systematic review on natural language processing applications","volume":"64","author":"Yew","year":"2022","journal-title":"Epilepsia"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Muneer, A., and Fati, S.M. (2020). A Comparative Analysis of Machine Learning Techniques for Cyberbullying Detection on Twitter. Futur. Internet, 12.","DOI":"10.3390\/fi12110187"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Fati, S.M., Muneer, A., Alwadain, A., and Balogun, A.O. (2023). Cyberbullying Detection on Twitter Using Deep Learning-Based Attention Mechanisms and Continuous Bag of Words Feature Extraction. Mathematics, 11.","DOI":"10.3390\/math11163567"},{"key":"ref_9","unstructured":"Gligori\u0107, K., Anderson, A., and West, R. (2020). Adoption of Twitter\u2019s New Length Limit: Is 280 the New 140?. arXiv."},{"key":"ref_10","unstructured":"(2023, September 04). How Many Users Does Twitter Have?. Available online: https:\/\/www.bankmycell.com\/blog\/how-many-users-does-twitter-have."},{"key":"ref_11","first-page":"183","article-title":"QuillBot as an online tool: Students\u2019 alternative in paraphrasing and rewriting of English writing","volume":"9","author":"Fitria","year":"2021","journal-title":"Englisia J."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"32","DOI":"10.33394\/jtp.v8i1.6392","article-title":"The Effectiveness of Using Quillbot In Improving Writing for Students of English Education Study Program","volume":"8","author":"Nurmayanti","year":"2023","journal-title":"J. Teknol. Pendidik."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Alawida, M., Mejri, S., Mehmood, A., Chikhaoui, B., and Abiodun, O.I. (2023). A Comprehensive Study of ChatGPT: Advancements, Limitations, and Ethical Considerations in Natural Language Processing and Cybersecurity. Information, 14.","DOI":"10.3390\/info14080462"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Liao, W., Liu, Z., Dai, H., Xu, S., Wu, Z., Zhang, Y., and Liu, T. (2023). Differentiate ChatGPT-Generated and Human-Written Medical Texts. arXiv.","DOI":"10.2196\/preprints.48904"},{"key":"ref_15","first-page":"7","article-title":"Academic integrity considerations of AI Large Language Models in the post-pandemic era: ChatGPT and beyond","volume":"20","author":"Perkins","year":"2023","journal-title":"J. Univ. Teach. Learn. Pract."},{"key":"ref_16","unstructured":"Zellers, R., Holtzman, A., Rashkin, H., Bisk, Y., Farhadi, A., Roesner, F., and Choi, Y. (2019). Defending Against Neural Fake News. Adv. Neural Inf. Process. Syst., 32, Available online: https:\/\/arxiv.org\/abs\/1905.12616v3."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Alamleh, H., AlQahtani, A.A.S., and ElSaid, A. (2023, January 27\u201328). Distinguishing Human-Written and ChatGPT-Generated Text Using Machine Learning. Proceedings of the 2023 Systems and Information Engineering Design Symposium (SIEDS), Charlottesville, VA, USA.","DOI":"10.1109\/SIEDS58326.2023.10137767"},{"key":"ref_18","unstructured":"Das, M., Kamalanathan, S., and Alphonse, P. (2023). A Comparative Study on TF-IDF Feature Weighting Method and Its Analysis Using Unstructured Dataset. arXiv."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Jang, B., Kim, I., and Kim, J.W. (2019). Word2vec convolutional neural networks for classification of news articles and tweets. PLoS ONE, 14.","DOI":"10.1371\/journal.pone.0220976"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"26","DOI":"10.1016\/j.procs.2013.05.005","article-title":"The Role of Text Pre-processing in Sentiment Analysis","volume":"17","author":"Haddi","year":"2013","journal-title":"Procedia Comput. Sci."},{"key":"ref_21","unstructured":"(2022, March 20). Tweet-Preprocessor \u00b7 PyPI. Available online: https:\/\/pypi.org\/project\/tweet-preprocessor\/."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"39","DOI":"10.3233\/IDA-150390","article-title":"Extracting domain-specific stopwords for text classifiers","volume":"21","author":"Makrehchi","year":"2017","journal-title":"Intell. Data Anal."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"545","DOI":"10.1017\/S1351324920000224","article-title":"Universal Lemmatizer: A sequence-to-sequence model for lemmatizing Universal Dependencies treebanks","volume":"27","author":"Kanerva","year":"2020","journal-title":"Nat. Lang. Eng."},{"key":"ref_24","first-page":"012021","article-title":"Research of Text Classification Based on TF-IDF and CNN-LSTM","volume":"2171","author":"Zhou","year":"2022","journal-title":"J. Physics"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"215","DOI":"10.1111\/j.2517-6161.1958.tb00292.x","article-title":"The Regression Analysis of Binary Sequences","volume":"20","author":"Cox","year":"1958","journal-title":"J. R. Stat. Soc. Ser. B"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"238","DOI":"10.2307\/1403797","article-title":"Discriminatory analysis. Nonparametric discrimination: Consistency properties","volume":"57","author":"Fix","year":"1989","journal-title":"Int. Stat. Rev.\/Rev. Int. Stat."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"175","DOI":"10.1080\/00031305.1992.10475879","article-title":"An Introduction to Kernel and Nearest-Neighbor Nonparametric Regression","volume":"46","author":"Altman","year":"1992","journal-title":"Am. Stat."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Han, T. (2023, January 26\u201328). Research on Chinese Patent Text Classification Based on SVM. Proceedings of the 2nd International Conference on Mathematical Statistics and Economic Analysis, MSEA 2023, Nanjing, China.","DOI":"10.4108\/eai.26-5-2023.2334244"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"101060","DOI":"10.1016\/j.scp.2023.101060","article-title":"Predicting the amount of medical waste using kernel-based SVM and deep learning methods for a private hospital in Turkey","volume":"33","author":"Altin","year":"2023","journal-title":"Sustain. Chem. Pharm."},{"key":"ref_30","unstructured":"Ho, T.K. (1995, January 14\u201316). Random decision forests. Proceedings of the 3rd International Conference on Document Analysis and Recognition, Montreal, QC, Canada."},{"key":"ref_31","unstructured":"(2023, September 09). Colab.Google. Available online: https:\/\/colab.google\/."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Zeini, H.A., Al-Jeznawi, D., Imran, H., Bernardo, L.F.A., Al-Khafaji, Z., and Ostrowski, K.A. (2023). Random Forest Algorithm for the Strength Prediction of Geopolymer Stabilized Clayey Soil. Sustainability, 15.","DOI":"10.3390\/su15021408"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"106131","DOI":"10.1016\/j.cor.2022.106131","article-title":"Comparing two SVM models through different metrics based on the confusion matrix","volume":"152","author":"Alcaraz","year":"2023","journal-title":"Comput. Oper. Res."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"2249","DOI":"10.1007\/s11063-022-11111-1","article-title":"Improving the Polarity of Text through word2vec Embedding for Primary Classical Arabic Sentiment Analysis","volume":"55","author":"Aoumeur","year":"2023","journal-title":"Neural Process. Lett."},{"key":"ref_35","first-page":"1","article-title":"Malware classification with Word2Vec, HMM2Vec, BERT, and ELMo","volume":"19","author":"Kale","year":"2023","journal-title":"J. Comput. Virol. Hacking Tech."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Wei, L., Wang, L., Liu, F., and Qian, Z. (2023). Clustering Analysis of Wind Turbine Alarm Sequences Based on Domain Knowledge-Fused Word2vec. Appl. Sci., 13.","DOI":"10.3390\/app131810114"},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"106876","DOI":"10.1016\/j.resconrec.2023.106876","article-title":"The evolution of research in resources, conservation & recycling revealed by Word2vec-enhanced data mining","volume":"190","author":"Zhu","year":"2023","journal-title":"Resour. Conserv. Recycl."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"102110","DOI":"10.1016\/j.datak.2022.102110","article-title":"Ontology-based semantic retrieval of documents using Word2vec model","volume":"144","author":"Sharma","year":"2023","journal-title":"Data Knowl. Eng."}],"container-title":["Computation"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2079-3197\/12\/5\/101\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T14:42:36Z","timestamp":1760107356000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2079-3197\/12\/5\/101"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,5,15]]},"references-count":38,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2024,5]]}},"alternative-id":["computation12050101"],"URL":"https:\/\/doi.org\/10.3390\/computation12050101","relation":{},"ISSN":["2079-3197"],"issn-type":[{"value":"2079-3197","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,5,15]]}}}