{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,3]],"date-time":"2026-03-03T00:49:31Z","timestamp":1772498971321,"version":"3.50.1"},"reference-count":33,"publisher":"MDPI AG","issue":"11","license":[{"start":{"date-parts":[[2023,11,9]],"date-time":"2023-11-09T00:00:00Z","timestamp":1699488000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Future Internet"],"abstract":"<jats:p>In this article, we investigate the potential of synthetic resumes as a means for the rapid generation of training data and their effectiveness in data augmentation, especially in categories marked by sparse samples. The widespread implementation of machine learning algorithms in natural language processing (NLP) has notably streamlined the resume classification process, delivering time and cost efficiencies for hiring organizations. However, the performance of these algorithms depends on the abundance of training data. While selecting the right model architecture is essential, it is also crucial to ensure the availability of a robust, well-curated dataset. For many categories in the job market, data sparsity remains a challenge. To deal with this challenge, we employed the OpenAI API to generate both structured and unstructured resumes tailored to specific criteria. These synthetically generated resumes were cleaned, preprocessed and then utilized to train two distinct models: a transformer model (BERT) and a feedforward neural network (FFNN) that incorporated Universal Sentence Encoder 4 (USE4) embeddings. While both models were evaluated on the multiclass classification task of resumes, when trained on an augmented dataset containing 60 percent real data (from Indeed website) and 40 percent synthetic data from ChatGPT, the transformer model presented exceptional accuracy. 
The FFNN, predictably, achieved lower accuracy. These findings highlight the value of augmenting real-world data with ChatGPT-generated synthetic resumes, especially in the context of limited training data. The suitability of the BERT model for such classification tasks further reinforces this narrative.<\/jats:p>","DOI":"10.3390\/fi15110363","type":"journal-article","created":{"date-parts":[[2023,11,9]],"date-time":"2023-11-09T08:08:58Z","timestamp":1699517338000},"page":"363","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":14,"title":["Generating Synthetic Resume Data with Large Language Models for Enhanced Job Description Classification"],"prefix":"10.3390","volume":"15","author":[{"given":"Panagiotis","family":"Skondras","sequence":"first","affiliation":[{"name":"Data and Media Laboratory, Department of Electrical and Computer Engineering, University of Peloponnese, 22100 Tripoli, Greece"}]},{"given":"Panagiotis","family":"Zervas","sequence":"additional","affiliation":[{"name":"Data and Media Laboratory, Department of Electrical and Computer Engineering, University of Peloponnese, 22100 Tripoli, Greece"}]},{"given":"Giannis","family":"Tzimas","sequence":"additional","affiliation":[{"name":"Data and Media Laboratory, Department of Electrical and Computer Engineering, University of Peloponnese, 22100 Tripoli, Greece"}]}],"member":"1968","published-online":{"date-parts":[[2023,11,9]]},"reference":[{"key":"ref_1","unstructured":"Ye, J., Chen, X., Xu, N., Zu, C., Shao, Z., Liu, S., Cui, Y., Zhou, Z., Gong, C., and Shen, Y. (2023). A Comprehensive Capability Analysis of GPT-3 and GPT-3.5 Series Models. arXiv."},{"key":"ref_2","unstructured":"Kuchnik, M., Smith, V., and Amvrosiadis, G. (2022). Validating Large Language Models with ReLM. arXiv."},{"key":"ref_3","unstructured":"(2023, September 29). OpenAI API.
Available online: https:\/\/bit.ly\/3UOELSX."},{"key":"ref_4","unstructured":"White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., and Schmidt, D. (2023). A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT. arXiv."},{"key":"ref_5","first-page":"1146","article-title":"Interactive and Visual Prompt Engineering for Ad-hoc Task Adaptation with Large Language Models","volume":"29","author":"Strobelt","year":"2023","journal-title":"IEEE Trans. Vis. Comput. Graph."},{"key":"ref_6","unstructured":"Liu, Y., Deng, G., Xu, Z., Li, Y., Zheng, Y., Zhang, Y., Zhao, L., Zhang, T., and Liu, Y. (2023). Jailbreaking ChatGPT via Prompt Engineering: An Empirical Study. arXiv."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Gao, A. (2023). Prompt Engineering for Large Language Models. Soc. Sci. Res. Netw., in press.","DOI":"10.2139\/ssrn.4504303"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Liu, V., and Chilton, L.B. (May, January 29). Design Guidelines for Prompt Engineering Text-to-Image Generative Models. Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems (CHI \u201822), New Orleans, LA, USA.","DOI":"10.1145\/3491102.3501825"},{"key":"ref_9","unstructured":"Sabit, E. (2023). Prompt Engineering for ChatGPT: A Quick Guide to Techniques, Tips, and Best Practices. TechRxiv."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Josifoski, M., Sakota, M., Peyrard, M., and West, R. (2023). Exploiting Asymmetry for Synthetic Training Data Generation: SynthIE and the Case of Information Extraction. arXiv.","DOI":"10.18653\/v1\/2022.naacl-main.342"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Xu, B., Wang, Q., Lyu, Y., Dai, D., Zhang, Y., and Mao, Z. (2023, July 9\u201314). S2ynRE: Two-stage Self-training with Synthetic Data for Low-resource Relation Extraction.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Toronto, ON, Canada.","DOI":"10.18653\/v1\/2023.acl-long.455"},{"key":"ref_12","unstructured":"Whitehouse, C., Choudhury, M., and Aji, A.F. (2023). LLM-powered Data Augmentation for Enhanced Crosslingual Performance. arXiv."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Jeronymo, V., Bonifacio, L., Abonizio, H., Fadaee, M., Lotufo, R., Zavrel, J., and Nogueira, R. (2023). InPars-v2: Large Language Models as Efficient Dataset Generators for Information Retrieval. arXiv.","DOI":"10.1145\/3477495.3531863"},{"key":"ref_14","unstructured":"Veselovsky, V., Ribeiro, M.H., Arora, A., Josifoski, M., Anderson, A., and West, R. (2023). Generating Faithful Synthetic Data with Large Language Models: A Case Study in Computational Social Science. arXiv."},{"key":"ref_15","unstructured":"Abonizio, H., Bonifacio, L., Jeronymo, V., Lotufo, R., Zavrel, J., and Nogueira, R. (2023). InPars Toolkit: A Unified and Reproducible Synthetic Data Generation Pipeline for Neural Information Retrieval. arXiv."},{"key":"ref_16","first-page":"7","article-title":"A Survey on Data Augmentation for Text Classification","volume":"55","author":"Bayer","year":"2022","journal-title":"ACM Comput. Surv."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Shi, Z., and Lipani, A. (2023). Rethink the Effectiveness of Text Data Augmentation: An Empirical Analysis. arXiv.","DOI":"10.14428\/esann\/2023.ES2023-42"},{"key":"ref_18","unstructured":"Kumar, V., Choudhary, A., and Cho, E. (2021). Data Augmentation using Pre-trained Transformer Models. arXiv."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"3182","DOI":"10.14778\/3476311.3476403","article-title":"Data augmentation for ML-driven data preparation and integration","volume":"14","author":"Li","year":"2021","journal-title":"ACM Proc. 
VLDB Endow."},{"key":"ref_20","unstructured":"Devlin, J., Chang, M.W., Lee, K., and Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Malinowski, J., Keim, T., Wendt, O., and Weitzel, T. (2006, January 4\u20137). Matching people and jobs: A bilateral recommendation approach. Proceedings of the 39th Annual Hawaii International Conference on System Sciences (HICSS\u201906), Kauai, HI, USA.","DOI":"10.1109\/HICSS.2006.266"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Yi, X., Allan, J., and Croft, W.B. (2007, July 23\u201327). Matching resumes and jobs based on relevance models. Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Amsterdam, The Netherlands.","DOI":"10.1145\/1277741.1277920"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Tallapragada, V.V.S., Raj, V.S., Deepak, U., Sai, P.D., and Mallikarjuna, T. (2023, January 17\u201319). Improved Resume Parsing based on Contextual Meaning Extraction using BERT. Proceedings of the 2023 7th International Conference on Intelligent Computing and Control Systems (ICICCS), Madurai, India.","DOI":"10.1109\/ICICCS56967.2023.10142800"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"5069","DOI":"10.1007\/s00521-020-05302-x","article-title":"Skills prediction based on multi-label resume classification using CNN with model predictions explanation","volume":"33","author":"Jiechieu","year":"2020","journal-title":"Neural Comput. Appl."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Li, X., Shu, H., Zhai, Y., and Lin, Z. (2021, January 13\u201316). A Method for Resume Information Extraction Using BERT-BiLSTM-CRF.
Proceedings of the 2021 IEEE 21st International Conference on Communication Technology (ICCT), Tianjin, China.","DOI":"10.1109\/ICCT52962.2021.9657937"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"84559","DOI":"10.1109\/ACCESS.2021.3087913","article-title":"Information Extraction from Free-Form CV Documents in Multiple Languages","volume":"9","author":"Vukadin","year":"2021","journal-title":"IEEE Access"},{"key":"ref_27","unstructured":"(2023, September 29). O*NET Code Connector. Available online: https:\/\/www.onetcodeconnector.org\/."},{"key":"ref_28","unstructured":"(2023, September 29). \u201cWelcome to the O*Net Web Services Site!\u201d O*NET Web Services. Available online: https:\/\/services.onetcenter.org\/."},{"key":"ref_29","unstructured":"Anand, Y., Nussbaum, Z., Duderstadt, B., Schmidt, B., and Mulyar, A. (2023, September 29). GPT4All: Training an Assistant-Style Chatbot with Large Scale Data Distillation from GPT-3.5-Turbo. Available online: https:\/\/github.com\/nomic-ai\/gpt4all."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"39","DOI":"10.1145\/219717.219748","article-title":"WordNet: A lexical database for English","volume":"38","author":"Miller","year":"1995","journal-title":"Commun. ACM"},{"key":"ref_31","unstructured":"(2023, September 29). Hugging Face Libraries. Available online: https:\/\/huggingface.co\/docs\/hub\/models-libraries."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Skondras, P., Psaroudakis, G., Zervas, P., and Tzimas, G. (2023, January 10\u201312). Efficient Resume Classification through Rapid Dataset Creation Using ChatGPT. Proceedings of the Fourteenth International Conference on Information, Intelligence, Systems and Applications (IISA 2023), Volos, Greece.","DOI":"10.1109\/IISA59645.2023.10345870"},{"key":"ref_33","unstructured":"Decorte, J.-J., Van Hautte, J., Demeester, T., and Develder, C. (2021). JobBERT: Understanding Job Titles through Skills. 
arXiv."}],"container-title":["Future Internet"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-5903\/15\/11\/363\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T21:20:18Z","timestamp":1760131218000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-5903\/15\/11\/363"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,11,9]]},"references-count":33,"journal-issue":{"issue":"11","published-online":{"date-parts":[[2023,11]]}},"alternative-id":["fi15110363"],"URL":"https:\/\/doi.org\/10.3390\/fi15110363","relation":{},"ISSN":["1999-5903"],"issn-type":[{"value":"1999-5903","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,11,9]]}}}