{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,8]],"date-time":"2026-01-08T17:04:08Z","timestamp":1767891848777,"version":"3.49.0"},"reference-count":36,"publisher":"MDPI AG","issue":"10","license":[{"start":{"date-parts":[[2024,9,27]],"date-time":"2024-09-27T00:00:00Z","timestamp":1727395200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Ministry of Trade, Industry and Energy (MOTIE) and Korea Institute for Advancement of Technology (KIAT)","award":["P0025661"],"award-info":[{"award-number":["P0025661"]}]},{"name":"Ministry of Trade, Industry and Energy (MOTIE) and Korea Institute for Advancement of Technology (KIAT)","award":["RS-2022-00155885"],"award-info":[{"award-number":["RS-2022-00155885"]}]},{"name":"Institute of Information & communications Technology Planning & Evaluation (IITP)","award":["P0025661"],"award-info":[{"award-number":["P0025661"]}]},{"name":"Institute of Information & communications Technology Planning & Evaluation (IITP)","award":["RS-2022-00155885"],"award-info":[{"award-number":["RS-2022-00155885"]}]},{"name":"Artificial Intelligence Convergence Innovation Human Resources Development","award":["P0025661"],"award-info":[{"award-number":["P0025661"]}]},{"name":"Artificial Intelligence Convergence Innovation Human Resources Development","award":["RS-2022-00155885"],"award-info":[{"award-number":["RS-2022-00155885"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Future Internet"],"abstract":"<jats:p>Survey data play a crucial role in various research fields, including economics, education, and healthcare, by providing insights into human behavior and opinions. 
However, item non-response, where respondents fail to answer specific questions, presents a significant challenge by creating incomplete datasets that undermine data integrity and can hinder or even prevent accurate analysis. Traditional methods for addressing missing data, such as statistical imputation techniques and deep learning models, often fall short when dealing with the rich linguistic content of survey data. These approaches are also hampered by high time complexity for training and the need for extensive preprocessing or feature selection. In this paper, we introduce an approach that leverages Large Language Models (LLMs) through prompt engineering for predicting item non-responses in survey data. Our method combines the strengths of both traditional imputation techniques and deep learning methods with the advanced linguistic understanding of LLMs. By integrating respondent similarities, question relevance, and linguistic semantics, our approach enhances the accuracy and comprehensiveness of survey data analysis. The proposed method bypasses the need for complex preprocessing and additional training, making it adaptable, scalable, and capable of generating explainable predictions in natural language. We evaluated the effectiveness of our LLM-based approach through a series of experiments, demonstrating its competitive performance against established methods such as Multivariate Imputation by Chained Equations (MICE), MissForest, and deep learning models like TabTransformer. 
The results show that our approach not only matches but, in some cases, exceeds the performance of these methods while significantly reducing the time required for data processing.<\/jats:p>","DOI":"10.3390\/fi16100351","type":"journal-article","created":{"date-parts":[[2024,9,27]],"date-time":"2024-09-27T06:10:27Z","timestamp":1727417427000},"page":"351","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["Predicting Missing Values in Survey Data Using Prompt Engineering for Addressing Item Non-Response"],"prefix":"10.3390","volume":"16","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4587-4337","authenticated-orcid":false,"given":"Junyung","family":"Ji","sequence":"first","affiliation":[{"name":"Department of Applied Artificial Intelligence, Hanyang University at Ansan, Ansan 15588, Republic of Korea"}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-2066-0003","authenticated-orcid":false,"given":"Jiwoo","family":"Kim","sequence":"additional","affiliation":[{"name":"Department of Applied Artificial Intelligence, Hanyang University at Ansan, Ansan 15588, Republic of Korea"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3049-035X","authenticated-orcid":false,"given":"Younghoon","family":"Kim","sequence":"additional","affiliation":[{"name":"Department of Applied Artificial Intelligence, Hanyang University at Ansan, Ansan 15588, Republic of Korea"}]}],"member":"1968","published-online":{"date-parts":[[2024,9,27]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"188","DOI":"10.1017\/S0266267120000152","article-title":"Measuring norms using social survey data","volume":"37","author":"Lisciandra","year":"2021","journal-title":"Econ. 
Philos."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"42","DOI":"10.24191\/ajue.v15i3.05","article-title":"Student engagement and academic performance of students of partido state university","volume":"15","author":"Delfino","year":"2019","journal-title":"Asian J. Univ. Educ."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Liamputtong, P. (2023). Social Surveys and Public Health. Handbook of Social Sciences and Global Public Health, Springer International Publishing.","DOI":"10.1007\/978-3-031-25110-8"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"215","DOI":"10.1177\/096228029600500302","article-title":"Handling missing data in survey research","volume":"5","author":"Brick","year":"1996","journal-title":"Stat. Methods Med Res."},{"key":"ref_5","first-page":"113","article-title":"Nearest neighbor imputation for survey data","volume":"16","author":"Chen","year":"2000","journal-title":"J. Off. Stat."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"1","DOI":"10.18637\/jss.v045.i07","article-title":"Amelia II: A Program for Missing Data","volume":"45","author":"Honaker","year":"2011","journal-title":"J. Stat. Softw."},{"key":"ref_7","first-page":"1","article-title":"mice: Multivariate Imputation by Chained Equations in R","volume":"45","year":"2011","journal-title":"J. Stat. Softw."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"112","DOI":"10.1093\/bioinformatics\/btr597","article-title":"MissForest\u2014Non-parametric missing value imputation for mixed-type data","volume":"28","author":"Stekhoven","year":"2011","journal-title":"Bioinformatics"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"541","DOI":"10.1038\/nmeth.3904","article-title":"Points of Significance: Logistic regression","volume":"13","author":"Lever","year":"2016","journal-title":"Nat. 
Methods"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random Forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Mach. Learn."},{"key":"ref_11","unstructured":"Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., Ye, Q., and Liu, T.Y. (2017, January 4\u20139). LightGBM: A Highly Efficient Gradient Boosting Decision Tree. Proceedings of the Neural Information Processing Systems, Long Beach, CA, USA."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Ke, G., Xu, Z., Zhang, J., Bian, J., and Liu, T.Y. (2019, January 4\u20138). DeepGBM: A Deep Learning Framework Distilled by GBDT for Online Prediction Tasks. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, New York, NY, USA.","DOI":"10.1145\/3292500.3330858"},{"key":"ref_13","unstructured":"Huang, X., Khetan, A., Cvitkovic, M., and Karnin, Z. (2020). TabTransformer: Tabular Data Modeling Using Contextual Embeddings. arXiv."},{"key":"ref_14","unstructured":"Somepalli, G., Goldblum, M., Schwarzschild, A., Bruss, C.B., and Goldstein, T. (2021). SAINT: Improved Neural Networks for Tabular Data via Row Attention and Contrastive Pre-Training. arXiv."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Moser, C., and Kalton, G. (2017). Question wording. Research Design, Routledge.","DOI":"10.4324\/9781315128498-12"},{"key":"ref_16","unstructured":"Brown, T.B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., and Askell, A. (2020). Language Models are Few-Shot Learners. arXiv."},{"key":"ref_17","unstructured":"Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., and Iwasawa, Y. (2023). Large Language Models are Zero-Shot Reasoners. arXiv."},{"key":"ref_18","unstructured":"Liu, J., Liu, C., Zhou, P., Lv, R., Zhou, K., and Zhang, Y. (2023). Is ChatGPT a Good Recommender? A Preliminary Study. 
arXiv."},{"key":"ref_19","unstructured":"Yang, F., Chen, Z., Jiang, Z., Cho, E., Huang, X., and Lu, Y. (2023). PALR: Personalization Aware LLMs for Recommendation. arXiv."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Salemi, A., Mysore, S., Bendersky, M., and Zamani, H. (2024). LaMP: When Large Language Models Meet Personalization. arXiv.","DOI":"10.18653\/v1\/2024.acl-long.399"},{"key":"ref_21","unstructured":"Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., K\u00fcttler, H., Lewis, M., tau Yih, W., and Rockt\u00e4schel, T. (2021). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. arXiv."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Izacard, G., and Grave, E. (2021). Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering. arXiv.","DOI":"10.18653\/v1\/2021.eacl-main.74"},{"key":"ref_23","unstructured":"Xu, P., Ping, W., Wu, X., McAfee, L., Zhu, C., Liu, Z., Subramanian, S., Bakhturina, E., Shoeybi, M., and Catanzaro, B. (2024). Retrieval meets Long Context Large Language Models. arXiv."},{"key":"ref_24","unstructured":"Rogers, A., Boyd-Graber, J., and Okazaki, N. Precise Zero-Shot Dense Retrieval without Relevance Labels. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)."},{"key":"ref_25","unstructured":"Asai, A., Wu, Z., Wang, Y., Sil, A., and Hajishirzi, H. (2023). Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection. arXiv."},{"key":"ref_26","unstructured":"Yan, S.Q., Gu, J.C., Zhu, Y., and Ling, Z.H. (2024). Corrective Retrieval Augmented Generation. arXiv."},{"key":"ref_27","unstructured":"Kim, J., and Lee, B. (2024). AI-Augmented Surveys: Leveraging Large Language Models and Surveys for Opinion Prediction. arXiv."},{"key":"ref_28","unstructured":"Simmons, G., and Savinov, V. (2024). 
Assessing Generalization for Subpopulation Representative Modeling via In-Context Learning. arXiv."},{"key":"ref_29","unstructured":"Plaat, A., Wong, A., Verberne, S., Broekens, J., van Stein, N., and Back, T. (2024). Reasoning with Large Language Models, a Survey. arXiv."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Huang, J., and Chang, K.C.C. (2023). Towards Reasoning in Large Language Models: A Survey. arXiv.","DOI":"10.18653\/v1\/2023.findings-acl.67"},{"key":"ref_31","unstructured":"Gao, Y., Xiong, Y., Gao, X., Jia, K., Pan, J., Bi, Y., Dai, Y., Sun, J., Wang, M., and Wang, H. (2024). Retrieval-Augmented Generation for Large Language Models: A Survey. arXiv."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Ram, O., Levine, Y., Dalmedigos, I., Muhlgay, D., Shashua, A., Leyton-Brown, K., and Shoham, Y. (2023). In-Context Retrieval-Augmented Language Models. arXiv.","DOI":"10.1162\/tacl_a_00605"},{"key":"ref_33","first-page":"31210","article-title":"Large Language Models Can Be Easily Distracted by Irrelevant Context","volume":"Volume 202","author":"Krause","year":"2023","journal-title":"Proceedings of the 40th International Conference on Machine Learning, Honolulu, HI, USA, 23\u201329 July 2023"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Liu, J., Shen, D., Zhang, Y., Dolan, B., Carin, L., and Chen, W. (2021). What Makes Good In-Context Examples for GPT-3?. arXiv.","DOI":"10.18653\/v1\/2022.deelio-1.10"},{"key":"ref_35","first-page":"2825","article-title":"Scikit-learn: Machine Learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J. Mach. Learn. Res."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Chen, J., Xiao, S., Zhang, P., Luo, K., Lian, D., and Liu, Z. (2024). BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation. 
arXiv.","DOI":"10.18653\/v1\/2024.findings-acl.137"}],"container-title":["Future Internet"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-5903\/16\/10\/351\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T16:04:53Z","timestamp":1760112293000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-5903\/16\/10\/351"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,9,27]]},"references-count":36,"journal-issue":{"issue":"10","published-online":{"date-parts":[[2024,10]]}},"alternative-id":["fi16100351"],"URL":"https:\/\/doi.org\/10.3390\/fi16100351","relation":{},"ISSN":["1999-5903"],"issn-type":[{"value":"1999-5903","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,9,27]]}}}