{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,7,7]],"date-time":"2026-07-07T01:21:29Z","timestamp":1783387289012,"version":"3.54.6"},"reference-count":16,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2025,3,26]],"date-time":"2025-03-26T00:00:00Z","timestamp":1742947200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Artif. Intell."],"abstract":"<jats:p>Large language models (LLMs), such as GPT-4o, provide versatile techniques for generating and formatting structured data. However, prompt style plays a critical role in determining the accuracy, efficiency, and token cost of the generated outputs. This paper explores the effectiveness of three specific prompt styles\u2013JSON, YAML, and Hybrid CSV\/Prefix\u2013for structured data generation across diverse applications. We focus on scenarios such as personal stories, receipts, and medical records, using randomized datasets to evaluate each prompt style's impact. Our analysis examines these prompt styles across three key metrics: accuracy in preserving data attributes, token cost associated with output generation, and processing time required for completion. By incorporating structured validation and comparative analysis, we ensure precise evaluation of each prompt style's performance. Results are visualized through metrics-based comparisons, such as Prompt Style vs. Accuracy, Prompt Style vs. Token Cost, and Prompt Style vs. Processing Time. Our findings reveal trade-offs between prompt style complexity and performance, with JSON providing high accuracy for complex data, YAML offering a balance between readability and efficiency, and Hybrid CSV\/Prefix excelling in token and time efficiency for flat data structures. 
This paper explores the pros and cons of applying the GPT-4o LLM to generate structured data. It also provides practical recommendations for selecting prompt styles tailored to specific requirements, such as data integrity, cost-effectiveness, and real-time processing needs. Our findings contribute to research on how prompt engineering can optimize structured data generation for AI-driven applications, as well as documenting limitations that motivate future work needed to improve LLMs for complex tasks.<\/jats:p>","DOI":"10.3389\/frai.2025.1558938","type":"journal-article","created":{"date-parts":[[2025,3,26]],"date-time":"2025-03-26T06:53:07Z","timestamp":1742971987000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":11,"title":["Enhancing structured data generation with GPT-4o evaluating prompt efficiency across prompt styles"],"prefix":"10.3389","volume":"8","author":[{"given":"Ashraf","family":"Elnashar","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jules","family":"White","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Douglas C.","family":"Schmidt","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1965","published-online":{"date-parts":[[2025,3,26]]},"reference":[{"key":"B1","article-title":"Gpt-4 Technical Report","author":"Achiam","year":"2023","journal-title":"arXiv preprint arXiv:2303.08774"},{"key":"B2","doi-asserted-by":"crossref","unstructured":"Arnes J. I., Horsch A. Schema-Based Priming of Large Language Model for Data Object Validation Compliance. 2023","DOI":"10.2139\/ssrn.4453361"},{"key":"B3","first-page":"9","article-title":"\u201cLightweight formats for product model data exchange and preservation,\u201d","author":"Ball","year":"2007","journal-title":"PV 2007 Conference"},{"key":"B4","article-title":"Language models are few-shot learners","author":"Brown","year":"2020","journal-title":"arXiv preprint arXiv:2005.14165"},{"key":"B5","doi-asserted-by":"publisher","first-page":"1418","DOI":"10.1038\/s41467-024-45563-x","article-title":"Structured information extraction from scientific text with large language models","volume":"15","author":"Dagdelen","year":"2024","journal-title":"Nat. Commun"},{"key":"B6","article-title":"Schema-guided natural language generation","author":"Du","year":"2020","journal-title":"arXiv preprint arXiv:2005.05480"},{"key":"B7","first-page":"1","article-title":"\u201cComparison between JSON and YAML for data serialization,\u201d","author":"Eriksson","year":"2011","journal-title":"The School of Computer Science and Engineering Royal Institute of Technology"},{"key":"B8","article-title":"Hierarchical neural story generation","author":"Fan","year":"2018","journal-title":"arXiv preprint arXiv:1805.04833"},{"key":"B9","doi-asserted-by":"publisher","first-page":"358","DOI":"10.1080\/02702711.2022.2156949","article-title":"The effects of the format and frequency of prompts on source evaluation and multiple-text comprehension","volume":"44","author":"Guo","year":"2023","journal-title":"Read. 
Psychol"},{"key":"B10","doi-asserted-by":"publisher","DOI":"10.1145\/3442442.3451893","article-title":"\u201cGenerating rich product descriptions for conversational E-commerce systems,\u201d","author":"Kedia","year":"2021","journal-title":"Companion Proceedings of the Web Conference 2021"},{"key":"B11","first-page":"215","article-title":"\u201cMedSyn: LLM-based synthetic medical text generation framework,\u201d","volume-title":"Joint European Conference on Machine Learning and Knowledge Discovery in Databases","author":"Kumichev","year":"2024"},{"key":"B12","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3560815","article-title":"Pre-train, prompt, and predict: a systematic survey of prompting methods in natural language processing","volume":"55","author":"Liu","year":"2023","journal-title":"ACM Comput. Surv"},{"key":"B13","article-title":"\u201cPrompt patterns for structured data extraction from unstructured text,\u201d","volume-title":"Proceedings of the 31st Pattern Languages of Programming (PLoP) Conference","author":"Moundas","year":"2024"},{"key":"B14","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3636430","article-title":"StructCoder: structure-aware transformer for code generation","volume":"18","author":"Tipirneni","year":"2024","journal-title":"ACM Trans. Knowl. 
Discov Data"},{"key":"B15","article-title":"\u201cAttention is all you need,\u201d","author":"Vaswani","year":"2017","journal-title":"Advances in Neural Information Processing Systems"},{"key":"B16","article-title":"Mint: evaluating LLMs in multi-turn interaction with tools and language feedback","author":"Wang","year":"2023","journal-title":"arXiv preprint arXiv:2309.10691"}],"container-title":["Frontiers in Artificial Intelligence"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frai.2025.1558938\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,26]],"date-time":"2025-03-26T06:53:34Z","timestamp":1742972014000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frai.2025.1558938\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,3,26]]},"references-count":16,"alternative-id":["10.3389\/frai.2025.1558938"],"URL":"https:\/\/doi.org\/10.3389\/frai.2025.1558938","relation":{},"ISSN":["2624-8212"],"issn-type":[{"value":"2624-8212","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,3,26]]},"article-number":"1558938"}}