{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,2]],"date-time":"2025-08-02T16:51:16Z","timestamp":1754153476058,"version":"3.41.2"},"reference-count":32,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2025,7,23]],"date-time":"2025-07-23T00:00:00Z","timestamp":1753228800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Artif. Intell."],"abstract":"<jats:sec><jats:title>Introduction<\/jats:title><jats:p>The current study leverages large language models (LLMs) to capture health behaviors expressed in social media posts, focusing on COVID-19 vaccine-related content from 2020 to 2021.<\/jats:p><\/jats:sec><jats:sec><jats:title>Methods<\/jats:title><jats:p>To examine the capabilities of prompt engineering and fine-tuning approaches with LLMs, this study examines the performance of three state-of-the-art LLMs: GPT-4o, GPT-4o-mini, and GPT-4o-mini with fine-tuning, focusing on their ability to classify individuals\u2019 vaccination behavior, intention to vaccinate, and information sharing. We then cross-validate these classifications with nationwide vaccination statistics to assess alignment with observed trends.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>GPT-4o-mini with fine-tuning outperformed both GPT-4o and the standard GPT-4o-mini in terms of accuracy, precision, recall, and F1 score. Using GPT-4o-mini with fine-tuning for classification, about 9.84% of the posts (<jats:italic>N<\/jats:italic>\u202f=\u202f36,912) included personal behavior related to getting the COVID-19 vaccine while a majority of posts (71.45%; <jats:italic>N<\/jats:italic>\u202f=\u202f267,930) included information sharing about the virus. 
Lastly, we found a strong correlation (<jats:italic>r<\/jats:italic>\u202f=\u202f0.76, <jats:italic>p<\/jats:italic>\u202f&lt;\u202f0.01) between vaccination behaviors expressed on social media and the actual vaccine uptake over time.<\/jats:p><\/jats:sec><jats:sec><jats:title>Discussion<\/jats:title><jats:p>This study suggests that LLMs can serve as powerful tools for estimating real-world behaviors. Methodological and practical implications of utilizing LLMs in human behavior research are further discussed.<\/jats:p><\/jats:sec>","DOI":"10.3389\/frai.2025.1602984","type":"journal-article","created":{"date-parts":[[2025,7,23]],"date-time":"2025-07-23T05:36:01Z","timestamp":1753248961000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["From digital traces to public vaccination behaviors: leveraging large language models for big data classification"],"prefix":"10.3389","volume":"8","author":[{"given":"Yoo Jung","family":"Oh","sequence":"first","affiliation":[]},{"given":"Muhammad Ehab","family":"Rasul","sequence":"additional","affiliation":[]},{"given":"Emily","family":"McKinley","sequence":"additional","affiliation":[]},{"given":"Christopher","family":"Calabrese","sequence":"additional","affiliation":[]}],"member":"1965","published-online":{"date-parts":[[2025,7,23]]},"reference":[{"key":"ref1","doi-asserted-by":"crossref","DOI":"10.31234\/osf.io\/5b26t","volume-title":"Which humans?","author":"Atari","year":"2023"},{"key":"ref2","doi-asserted-by":"publisher","first-page":"589","DOI":"10.1001\/jamainternmed.2023.1838","article-title":"Comparing physician and artificial intelligence chatbot responses to patient questions posted to a public social media forum","volume":"183","author":"Ayers","year":"2023","journal-title":"JAMA Intern. 
Med."},{"key":"ref3","doi-asserted-by":"crossref","DOI":"10.1038\/s41467-025-61345-5","article-title":"LLM-generated messages can persuade humans on policy issues","volume-title":"Nature Communications","author":"Bai","year":"2025"},{"key":"ref4","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2406.08660","article-title":"Fine-tuned \u201csmall\u201d LLMs (still) significantly outperform zero-shot generative AI models in text classification","author":"Bucher","year":"2024","journal-title":"arXiv [csCL]"},{"key":"ref5","doi-asserted-by":"crossref","first-page":"1441","DOI":"10.1145\/3589335.3651910","article-title":"Automated claim matching with large language models: empowering fact-checkers in the fight against misinformation","volume-title":"Companion proceedings of the ACM web conference 2024","author":"Choi","year":"2024"},{"key":"ref6","article-title":"Scaling instruction-finetuned language models","author":"Chung","year":"2022","journal-title":"arXiv [csLG]"},{"key":"ref7","doi-asserted-by":"publisher","first-page":"eadq1814","DOI":"10.1126\/science.adq1814","article-title":"Durably reducing conspiracy beliefs through dialogues with AI","volume":"385","author":"Costello","year":"2024","journal-title":"Science"},{"key":"ref8","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N19-1423","article-title":"BERT: pre-training of deep bidirectional transformers for language understanding","author":"Devlin","year":"2018","journal-title":"In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies"},{"key":"ref9","doi-asserted-by":"publisher","first-page":"6649","DOI":"10.1016\/j.vaccine.2014.09.039","article-title":"Mapping vaccine hesitancy-country-specific characteristics of a global 
phenomenon","volume":"32","author":"Dub\u00e9","year":"2014","journal-title":"Vaccine"},{"key":"ref10","doi-asserted-by":"publisher","first-page":"e2305016120","DOI":"10.1073\/pnas.2305016120","article-title":"ChatGPT outperforms crowd workers for text-annotation tasks","volume":"120","author":"Gilardi","year":"2023","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"ref11","doi-asserted-by":"publisher","first-page":"6239","DOI":"10.1177\/20531680241236239","article-title":"Large language models as a substitute for human experts in annotating political text","volume":"11","author":"Heseltine","year":"2024","journal-title":"Res. Politics"},{"key":"ref12","doi-asserted-by":"publisher","first-page":"22105","DOI":"10.1609\/aaai.v38i20.30214","article-title":"Bad actor, good advisor: exploring the role of large language models in fake news detection","volume":"38","author":"Hu","year":"2024","journal-title":"Proc. AAAI Conf. Artif. Intell."},{"key":"ref13","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1080\/10410236.2024.2375478","article-title":"Impact of COVID-19 vaccine persuasion strategies on social endorsement and public response on Chinese social media","volume":"40","author":"Ji","year":"2024","journal-title":"Health Commun."},{"key":"ref14","doi-asserted-by":"publisher","first-page":"e30251","DOI":"10.2196\/30251","article-title":"Leveraging transfer learning to analyze opinions, attitudes, and behavioral intentions toward COVID-19 vaccines: social media content and temporal analysis","volume":"23","author":"Liu","year":"2021","journal-title":"J. Med. Internet Res."},{"key":"ref15","doi-asserted-by":"publisher","first-page":"103809","DOI":"10.1016\/j.ipm.2024.103809","article-title":"Are LLMs good at structured outputs? A benchmark for evaluating structured output capabilities in LLMs","volume":"61","author":"Liu","year":"2024","journal-title":"Inf. Process. 
Manag."},{"key":"ref16","doi-asserted-by":"publisher","first-page":"563","DOI":"10.1093\/joc\/jqz033","article-title":"Toward an aggregate, implicit, and dynamic model of norm formation: capturing large-scale media representations of dynamic descriptive norms through automated and crowdsourced content analysis","volume":"69","author":"Liu","year":"2019","journal-title":"J. Commun."},{"key":"ref17","doi-asserted-by":"publisher","first-page":"947","DOI":"10.1038\/s41562-021-01122-8","article-title":"A global database of COVID-19 vaccinations","volume":"5","author":"Mathieu","year":"2021","journal-title":"Nat. Hum. Behav."},{"year":"2024","key":"ref18"},{"key":"ref19","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2303.08774","article-title":"GPT-4 technical report","author":"Achiam","year":"2023","journal-title":"arXiv [csCL]"},{"key":"ref20","doi-asserted-by":"publisher","first-page":"e2308950121","DOI":"10.1073\/pnas.2308950121","article-title":"GPT is an effective tool for multilingual psychological text analysis","volume":"121","author":"Rathje","year":"2024","journal-title":"Proc. Natl. Acad. 
Sci."},{"key":"ref21","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2304.11085","article-title":"Testing the reliability of ChatGPT for text annotation and classification: a cautionary remark","author":"Reiss","year":"2023","journal-title":"arXiv [csCL]"},{"key":"ref22","doi-asserted-by":"crossref","first-page":"1640","DOI":"10.1145\/3459637.3482440","article-title":"Integrating pattern-and fact-based fake news detection via model preference learning","volume-title":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","author":"Sheng","year":"2021"},{"key":"ref23","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2102.02503","article-title":"Understanding the capabilities, limitations, and societal impact of large language models","author":"Tamkin","year":"2021","journal-title":"arXiv [csCL]"},{"key":"ref24","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2304.06588","article-title":"ChatGPT-4 outperforms experts and crowd workers in annotating political twitter messages with zero-shot learning","author":"T\u00f6rnberg","year":"2023","journal-title":"arXiv [csCL]"},{"key":"ref25","doi-asserted-by":"publisher","first-page":"41","DOI":"10.1038\/s41746-024-01029-4","article-title":"Prompt engineering in consistency and reliability with the evidence-based guideline for LLMs","volume":"7","author":"Wang","year":"2024","journal-title":"NPJ Digit Med"},{"key":"ref26","doi-asserted-by":"publisher","first-page":"249","DOI":"10.1037\/0033-2909.132.2.249","article-title":"Does changing behavioral intentions engender behavior change? A meta-analysis of the experimental evidence","volume":"132","author":"Webb","year":"2006","journal-title":"Psychol. 
Bull."},{"key":"ref27","article-title":"Chain-of-thought prompting elicits reasoning in large language models","author":"Wei","year":"2022","journal-title":"Advances in neural information processing systems"},{"key":"ref28","article-title":"A prompt pattern catalog to enhance prompt engineering with ChatGPT","author":"White","year":"2023","journal-title":"arXiv [csSE]"},{"volume-title":"Sizing up twitter users","year":"2019","author":"Wojcik","key":"ref29"},{"key":"ref30","doi-asserted-by":"publisher","first-page":"103665","DOI":"10.1016\/j.ipm.2024.103665","article-title":"AI for social science and social science of AI: a survey","volume":"61","author":"Xu","year":"2024","journal-title":"Inf. Process. Manag."},{"key":"ref31","article-title":"Tree of thoughts: deliberate problem solving with large language models","author":"Yao","year":"2023","journal-title":"Advances in neural information processing systems"},{"key":"ref32","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2304.10145","article-title":"Can chatGPT reproduce human-generated labels? 
A study of social computing tasks","author":"Zhu","year":"2023","journal-title":"arXiv [csAI]"}],"container-title":["Frontiers in Artificial Intelligence"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frai.2025.1602984\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,23]],"date-time":"2025-07-23T05:36:04Z","timestamp":1753248964000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frai.2025.1602984\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,7,23]]},"references-count":32,"alternative-id":["10.3389\/frai.2025.1602984"],"URL":"https:\/\/doi.org\/10.3389\/frai.2025.1602984","relation":{},"ISSN":["2624-8212"],"issn-type":[{"type":"electronic","value":"2624-8212"}],"subject":[],"published":{"date-parts":[[2025,7,23]]},"article-number":"1602984"}}