{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,1]],"date-time":"2026-04-01T05:14:09Z","timestamp":1775020449790,"version":"3.50.1"},"reference-count":39,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2025,4,24]],"date-time":"2025-04-24T00:00:00Z","timestamp":1745452800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Digit. Health"],"abstract":"<jats:p>The generation of synthetic tabular data has emerged as a key privacy-enhancing technology to address challenges in data sharing, particularly in healthcare, where sensitive attributes can compromise patient privacy. Despite significant progress, balancing fidelity, utility, and privacy in complex medical datasets remains a substantial challenge. This paper introduces a comprehensive and holistic evaluation framework for synthetic tabular data, consolidating metrics and privacy risk measures across three key categories (fidelity, utility and privacy) and incorporating a fidelity-utility tradeoff metric. The framework was applied to three open-source medical datasets to evaluate synthetic tabular data generated by five generative models, both with and without differential privacy. Results showed that simpler models generally achieved better fidelity and utility, while more complex models provided lower privacy risks. The addition of differential privacy enhanced privacy preservation but often reduced fidelity and utility, highlighting the complexity of balancing fidelity, utility and privacy in synthetic data generation for medical datasets. 
Despite its contributions, this study acknowledges limitations, such as the lack of evaluation metrics or privacy risk measures for required model training time and resource usage, reliance on default model parameters, and the assessment of models that incorporate differential privacy with only a single privacy budget. Future work should explore parameter optimization, alternative privacy mechanisms, broader applications of the framework to diverse datasets and domains, and collaborations with clinicians for clinical utility evaluation. This study provides a foundation for improving synthetic tabular data evaluation and advancing privacy-preserving data sharing in healthcare.<\/jats:p>","DOI":"10.3389\/fdgth.2025.1576290","type":"journal-article","created":{"date-parts":[[2025,4,24]],"date-time":"2025-04-24T05:29:26Z","timestamp":1745472566000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":11,"title":["Comprehensive evaluation framework for synthetic tabular data in health: fidelity, utility and privacy analysis of generative models with and without privacy guarantees"],"prefix":"10.3389","volume":"7","author":[{"given":"Mikel","family":"Hernandez","sequence":"first","affiliation":[]},{"given":"Pablo A.","family":"Osorio-Marulanda","sequence":"additional","affiliation":[]},{"given":"Mikel","family":"Catalina","sequence":"additional","affiliation":[]},{"given":"Lorea","family":"Loinaz","sequence":"additional","affiliation":[]},{"given":"Gorka","family":"Epelde","sequence":"additional","affiliation":[]},{"given":"Naiara","family":"Aginako","sequence":"additional","affiliation":[]}],"member":"1965","published-online":{"date-parts":[[2025,4,24]]},"reference":[{"key":"B1","doi-asserted-by":"publisher","first-page":"18","DOI":"10.1155\/2020\/8830200","article-title":"The recent progress and applications of digital technologies in healthcare: a 
review","volume":"2020","author":"Senbekov","year":"2020","journal-title":"Int J Telemed Appl"},{"key":"B2","doi-asserted-by":"publisher","first-page":"341","DOI":"10.51594\/imsrj.v4i3.932","article-title":"The impact of big data on healthcare product development: a theoretical and analytical review","volume":"4","author":"Oluwaseun Ogundipe","year":"2024","journal-title":"Int Med Sci Res J"},{"key":"B3","article-title":"Regulation (EU) 2016\/679 of the European parliament and of the council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing directive 95\/46\/EC (general data protection regulation) (Tech. rep.)","year":""},{"key":"B4","article-title":"Health insurance portability and accountability act of 1996 (HIPAA) (Tech. rep.). U.S. Department of Health & Human Services (1996)","year":""},{"key":"B5","doi-asserted-by":"publisher","first-page":"88048","DOI":"10.1109\/ACCESS.2024.3417608","article-title":"Privacy mechanisms and evaluation metrics for synthetic data generation: a systematic review","volume":"12","author":"Osorio-Marulanda","year":"2024","journal-title":"IEEE Access"},{"key":"B6","doi-asserted-by":"publisher","first-page":"28","DOI":"10.1016\/j.neucom.2022.04.053","article-title":"Synthetic data generation for tabular health records: a systematic review","volume":"493","author":"Hernandez","year":"2022","journal-title":"Neurocomputing"},{"key":"B7","doi-asserted-by":"publisher","first-page":"55","DOI":"10.3390\/bdcc8060055","article-title":"A secure data publishing and access service for sensitive data from living labs: enabling collaboration with external researchers via shareable data","volume":"8","author":"Hernandez","year":"2024","journal-title":"Big Data Cogn Comput"},{"key":"B8","doi-asserted-by":"publisher","first-page":"105763","DOI":"10.1016\/j.ijmedinf.2024.105763","article-title":"Synthetic data generation in healthcare: a scoping 
review of reviews on domains, motivations, and future applications","volume":"195","author":"Rujas","year":"2025","journal-title":"Int J Med Inform"},{"key":"B9","doi-asserted-by":"crossref","DOI":"10.1145\/3339252.3339281","article-title":"On the utility of synthetic data: an empirical evaluation on machine learning tasks","author":"Hittmeir","year":""},{"key":"B10","doi-asserted-by":"publisher","first-page":"e18910","DOI":"10.2196\/18910","article-title":"Reliability of supervised machine learning using synthetic data in health care: model to preserve privacy for data sharing","volume":"8","author":"Rankin","year":"2020","journal-title":"JMIR Med Inform"},{"key":"B11","doi-asserted-by":"publisher","first-page":"11147","DOI":"10.1109\/ACCESS.2022.3144765","article-title":"A multi-dimensional evaluation of synthetic data generators","volume":"10","author":"Dankar","year":"2022","journal-title":"IEEE Access"},{"key":"B12","doi-asserted-by":"publisher","first-page":"e35734","DOI":"10.2196\/35734","article-title":"Utility metrics for evaluating synthetic health data generation methods: validation study","volume":"10","author":"El Emam","year":"2022","journal-title":"JMIR Med Inform"},{"key":"B13","doi-asserted-by":"publisher","first-page":"e19","DOI":"10.1055\/s-0042-1760247","article-title":"Synthetic tabular data evaluation in the health domain covering resemblance, utility, and privacy dimensions","volume":"62","author":"Hernandez","year":"2023","journal-title":"Methods Inf Med"},{"key":"B14","doi-asserted-by":"publisher","first-page":"6","DOI":"10.1007\/s10618-024-01081-4","article-title":"Syntheval: a framework for detailed utility and privacy evaluation of tabular synthetic data","volume":"39","author":"Lautrup","year":"2025","journal-title":"Data Min Knowl Discov"},{"key":"B15","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-031-63219-8_24","article-title":"An evaluation framework for synthetic data generation 
models","author":"Livieris","year":""},{"key":"B16","doi-asserted-by":"publisher","first-page":"44497","DOI":"10.1109\/ACCESS.2025.3549680","article-title":"Utility meets privacy: a critical evaluation of tabular data synthesizers","volume":"13","author":"H\u00f6llig","year":"2025","journal-title":"IEEE Access"},{"key":"B17","article-title":"Structured evaluation of synthetic tabular data. arXiv [Preprint]","author":"Cheng-Hsin Yang","year":""},{"key":"B18","doi-asserted-by":"publisher","first-page":"105413","DOI":"10.1016\/j.ijmedinf.2024.105413","article-title":"Can I trust my fake data\u2014a comprehensive quality assessment framework for synthetic tabular data in healthcare","volume":"185","author":"Vallevik","year":"2024","journal-title":"Int J Med Inform"},{"key":"B19","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3666006","article-title":"Thinking in categories: a survey on assessing the quality for time series synthesis","volume":"16","author":"Stenger","year":"2024","journal-title":"J Data Inform Qual"},{"key":"B20","doi-asserted-by":"publisher","first-page":"60","DOI":"10.1038\/s41746-024-01359-3","article-title":"A scoping review of privacy and utility metrics in medical synthetic data","volume":"8","author":"Kaabachi","year":"2025","journal-title":"npj Digit Med"},{"key":"B21","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-030-62365-4_3","article-title":"A novel evaluation metric for synthetic data generation","author":"Galloni","year":""},{"key":"B22","doi-asserted-by":"publisher","first-page":"107043","DOI":"10.1016\/j.csda.2020.107043","article-title":"A new correlation coefficient between categorical, ordinal and interval variables with Pearson characteristics","volume":"152","author":"Baak","year":"2020","journal-title":"Comput Stat Data Anal"},{"key":"B23","doi-asserted-by":"publisher","first-page":"783","DOI":"10.1214\/aos\/1018031260","article-title":"Multivariate analysis by data depth: descriptive statistics, graphics and 
inference, (with discussion and a rejoinder by Liu and Singh)","volume":"27","author":"Liu","year":"1999","journal-title":"Ann Stat"},{"key":"B24","doi-asserted-by":"publisher","first-page":"1601","DOI":"10.3390\/electronics12071601","article-title":"Nonparametric generation of synthetic data using copulas","volume":"12","author":"Restrepo","year":"2023","journal-title":"Electronics"},{"key":"B25","article-title":"A unified framework for quantifying privacy risk in synthetic data. In: Proceedings of Privacy Enhancing Technologies Symposium (2023)","author":"Giomi","year":""},{"key":"B26","doi-asserted-by":"publisher","first-page":"2209","DOI":"10.1056\/NEJMoa1516192","article-title":"Genomic classification and prognosis in acute myeloid leukemia","volume":"374","author":"Papaemmanuil","year":"2016","journal-title":"New Engl J Med"},{"key":"B27","article-title":"Data from: Data for: a hybrid machine learning approach to cerebral stroke prediction based on imbalanced medical-datasets (2019)","author":"Liu","year":""},{"key":"B28","article-title":"Data from: Cardiovascular disease dataset (2018)","author":"Ulianova","year":""},{"key":"B29","article-title":"Differentially private non parametric copulas: generating synthetic data with non parametric copulas under privacy guarantees","author":"Osorio-Marulanda","year":""},{"key":"B30","doi-asserted-by":"crossref","DOI":"10.1109\/DSAA.2016.49","article-title":"The synthetic data vault","author":"Patki","year":""},{"key":"B31","doi-asserted-by":"publisher","first-page":"108","DOI":"10.1186\/s12874-020-00977-1","article-title":"Generation and evaluation of synthetic patient data","volume":"20","author":"Goncalves","year":"2020","journal-title":"BMC Med Res Methodol"},{"key":"B32","doi-asserted-by":"publisher","first-page":"244","DOI":"10.1016\/j.neucom.2019.12.136","article-title":"Generation and evaluation of privacy preserving synthetic health 
data","volume":"416","author":"Yale","year":"2020","journal-title":"Neurocomputing"},{"key":"B33","article-title":"Creating a differential privacy securing synthetic data generation for tabular, relational and time series data (Tech. rep.). GitHub (2022)","author":"Priyanshu","year":""},{"key":"B34","article-title":"Modeling tabular data using conditional GAN","author":"Xu","year":""},{"key":"B35","article-title":"Generating tabular datasets under differential privacy. arXiv [Preprint]","author":"Truda","year":""},{"key":"B36","article-title":"TableDiffusion (Tech. rep.). GitHub (2023)","author":"Truda","year":""},{"key":"B37","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-540-79228-4_1","article-title":"Differential privacy: a survey of results","author":"Dwork","year":""},{"key":"B38","first-page":"39","article-title":"Accelerating the machine learning lifecycle with MLflow","volume":"41","author":"Zaharia","year":"2018","journal-title":"IEEE Data Eng Bull"},{"key":"B39","article-title":"Systematic assessment of tabular data synthesis. In: Proceedings of the 13th International Conference on Learning Representations (ICLR). 
(2024)","author":"Du","year":""}],"container-title":["Frontiers in Digital Health"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fdgth.2025.1576290\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,4,24]],"date-time":"2025-04-24T05:29:29Z","timestamp":1745472569000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fdgth.2025.1576290\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,4,24]]},"references-count":39,"alternative-id":["10.3389\/fdgth.2025.1576290"],"URL":"https:\/\/doi.org\/10.3389\/fdgth.2025.1576290","relation":{},"ISSN":["2673-253X"],"issn-type":[{"value":"2673-253X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,4,24]]},"article-number":"1576290"}}