{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,7]],"date-time":"2026-03-07T18:02:52Z","timestamp":1772906572471,"version":"3.50.1"},"reference-count":0,"publisher":"IOS Press","isbn-type":[{"value":"9781643685335","type":"electronic"}],"license":[{"start":{"date-parts":[[2024,8,22]],"date-time":"2024-08-22T00:00:00Z","timestamp":1724284800000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,8,22]]},"abstract":"<jats:p>Synthetic tabular health data plays a crucial role in healthcare research, addressing privacy regulations and the scarcity of publicly available datasets. This is essential for diagnostic and treatment advancements. Among the most promising models are transformer-based Large Language Models (LLMs) and Generative Adversarial Networks (GANs). In this paper, we compare LLM models of the Pythia LLM Scaling Suite with varying model sizes ranging from 14M to 1B, against a reference GAN model (CTGAN). The generated synthetic data are used to train random forest estimators for classification tasks to make predictions on the real-world data. Our findings indicate that as the number of parameters increases, LLM models outperform the reference GAN model. Even the smallest 14M parameter models perform comparably to GANs. Moreover, we observe a positive correlation between the size of the training dataset and model performance. We discuss implications, challenges, and considerations for the real-world usage of LLM models for synthetic tabular data generation.<\/jats:p>","DOI":"10.3233\/shti240571","type":"book-chapter","created":{"date-parts":[[2024,8,23]],"date-time":"2024-08-23T09:54:07Z","timestamp":1724406847000},"source":"Crossref","is-referenced-by-count":2,"title":["Large Language Models for Synthetic Tabular Health Data: A Benchmark Study"],"prefix":"10.3233","author":[{"given":"Marko","family":"Miletic","sequence":"first","affiliation":[{"name":"Bern University of Applied Sciences, Switzerland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3432-2860","authenticated-orcid":false,"given":"Murat","family":"Sariyar","sequence":"additional","affiliation":[{"name":"Bern University of Applied Sciences, Switzerland"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"7437","container-title":["Studies in Health Technology and Informatics","Digital Health and Informatics Innovations for Sustainable Health Care Systems"],"original-title":[],"link":[{"URL":"https:\/\/ebooks.iospress.nl\/pdf\/doi\/10.3233\/SHTI240571","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,8,23]],"date-time":"2024-08-23T09:54:09Z","timestamp":1724406849000},"score":1,"resource":{"primary":{"URL":"https:\/\/ebooks.iospress.nl\/doi\/10.3233\/SHTI240571"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,8,22]]},"ISBN":["9781643685335"],"references-count":0,"URL":"https:\/\/doi.org\/10.3233\/shti240571","relation":{},"ISSN":["0926-9630","1879-8365"],"issn-type":[{"value":"0926-9630","type":"print"},{"value":"1879-8365","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,8,22]]}}}