{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,20]],"date-time":"2026-03-20T12:38:54Z","timestamp":1774010334642,"version":"3.50.1"},"reference-count":35,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T00:00:00Z","timestamp":1772064000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2026,3,20]],"date-time":"2026-03-20T00:00:00Z","timestamp":1773964800000},"content-version":"vor","delay-in-days":22,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"Regulators Pioneer Fund, MRC & Innovate"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Int J Comput Intell Syst"],"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>\n                    Synthetic data offers a promising avenue for addressing privacy, scarcity, and fairness challenges in healthcare datasets. However, there is limited evaluation of how different generation methods balance fidelity, utility, and fairness, particularly for underrepresented subgroups. This study addresses this gap by comparing representative generative modelling techniques, both probabilistic and deep approaches, that are popular in the research literature. We empirically evaluate BayesBoost, CTGAN, TVAE, CopulaGAN, and DECAF on two healthcare datasets containing numerical, binary, and categorical features. Each model\u2019s performance is assessed along three axes: data fidelity, machine learning utility, and fairness, using Accuracy Parity, Equalised Odds, and Predictive Rate Parity. 
Results show that BayesBoost consistently achieved superior fidelity, utility, and fairness preservation, particularly when paired with Random Forest classifiers, achieving around\n                    <jats:italic>60<\/jats:italic>\n                    \u2013\n                    <jats:italic>63<\/jats:italic>\n                    % higher downstream utility than GAN-based deep generative baselines (e.g., Random Forest accuracy up to\n                    <jats:italic>0.88<\/jats:italic>\n                    with BayesBoost versus\n                    <jats:italic>0.54<\/jats:italic>\n                    to\n                    <jats:italic>0.55<\/jats:italic>\n                    for GAN-based methods). Deep generative models, while effective in capturing complex structures, often degraded fairness, especially for underrepresented groups, with equalised odds deviating by over\n                    <jats:italic>100<\/jats:italic>\n                    % from the ideal parity value of\n                    <jats:italic>1.0<\/jats:italic>\n                    in some settings. The Variational Autoencoder outperformed other deep generative models in fairness preservation, especially for equalised odds, although with some reduction in fidelity and utility. 
Overall, these findings suggest that synthetic data generation for healthcare must move beyond fidelity evaluations to explicitly assess fairness and subgroup impacts, with probabilistic models such as BayesBoost showing strong potential for ethical deployment, while deep generative models require further adaptation for fairness-sensitive applications.\n                  <\/jats:p>","DOI":"10.1007\/s44196-026-01173-7","type":"journal-article","created":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T03:23:16Z","timestamp":1772076196000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Probabilistic Versus Deep Generative Models: A Fairness Centred Evaluation of Synthetic Healthcare Tabular Data"],"prefix":"10.1007","volume":"19","author":[{"given":"Dima","family":"Alattal","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Barbara","family":"Draghi","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Puja","family":"Myles","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Richard","family":"Branson","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Allan","family":"Tucker","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2026,2,26]]},"reference":[{"key":"1173_CR1","doi-asserted-by":"publisher","unstructured":"Jordon, J., Szpruch, L., Houssiau, F., Bottarelli, M., Cherubin, G., et al.: Synthetic data - what, why and how? Preprint at (2022). 
https:\/\/doi.org\/10.48550\/arXiv.2205.03257","DOI":"10.48550\/arXiv.2205.03257"},{"key":"1173_CR2","doi-asserted-by":"publisher","first-page":"53275","DOI":"10.7554\/eLife.53275","volume":"9","author":"DS Quintana","year":"2020","unstructured":"Quintana, D.S.: A synthetic dataset primer for the biobehavioural sciences to promote reproducibility and hypothesis generation. Elife 9, 53275 (2020)","journal-title":"Elife"},{"key":"1173_CR3","doi-asserted-by":"publisher","first-page":"28","DOI":"10.1016\/j.neucom.2022.04.053","volume":"493","author":"M Hernandez","year":"2022","unstructured":"Hernandez, M., Epelde, G., Alberdi, A., Cilla, R., Rankin, D.: Synthetic data generation for tabular health records: a systematic review. Neurocomputing 493, 28\u201345 (2022)","journal-title":"Neurocomputing"},{"issue":"1","key":"1173_CR4","doi-asserted-by":"publisher","first-page":"31","DOI":"10.1080\/20476965.2020.1857214","volume":"10","author":"PR Harper","year":"2021","unstructured":"Harper, P.R., Moore, J.W., Woolley, T.E.: Covid-19 transmission modelling of students returning home from university. Health Systems 10(1), 31\u201340 (2021)","journal-title":"Health Systems"},{"key":"1173_CR5","doi-asserted-by":"crossref","unstructured":"Caramelo, F., Ferreira, N., Oliveiros, B.: Estimation of risk factors for covid-19 mortality-preliminary results. 2020\u201302 (2020)","DOI":"10.1101\/2020.02.24.20027268"},{"key":"1173_CR6","doi-asserted-by":"crossref","unstructured":"Wang, Z., Myles, P., Tucker, A.: Generating and evaluating synthetic uk primary care data: preserving data utility & patient privacy. In: 2019 IEEE 32nd International symposium on computer-based medical systems (CBMS), pp. 126\u2013131 (2019). 
IEEE","DOI":"10.1109\/CBMS.2019.00036"},{"issue":"2","key":"1173_CR7","doi-asserted-by":"publisher","first-page":"819","DOI":"10.1111\/coin.12427","volume":"37","author":"Z Wang","year":"2021","unstructured":"Wang, Z., Myles, P., Tucker, A.: Generating and evaluating cross-sectional synthetic electronic healthcare data: preserving data utility and patient privacy. Comput. Intell. 37(2), 819\u2013851 (2021)","journal-title":"Comput. Intell."},{"key":"1173_CR8","doi-asserted-by":"crossref","unstructured":"Draghi, B., Wang, Z., Myles, P., Tucker, A.: Bayesboost: identifying and handling bias using synthetic data generators. In: Third international workshop on learning with imbalanced domains: theory and applications, pp. 49\u201362 (2021). PMLR","DOI":"10.2139\/ssrn.4052302"},{"key":"1173_CR9","doi-asserted-by":"crossref","unstructured":"Yadav, P., Gaur, M., Madhukar, R.K., Verma, G., Kumar, P.: Rigorous experimental analysis of tabular data generated using tvae and ctgan. Int. J. Adv. Comput. Sci. Appl. 15(4) (2024)","DOI":"10.14569\/IJACSA.2024.01504125"},{"key":"1173_CR10","doi-asserted-by":"publisher","first-page":"15","DOI":"10.9734\/ajrcos\/2023\/v16i1331","volume":"16","author":"P Marecha","year":"2023","unstructured":"Marecha, P., Ye, L.: Generation and evaluation of tabular data in different domains using gans. Asian J. Res. Comput. Sci. 16, 15\u201327 (2023). https:\/\/doi.org\/10.9734\/ajrcos\/2023\/v16i1331","journal-title":"Asian J. Res. Comput. Sci."},{"issue":"14","key":"1173_CR11","doi-asserted-by":"publisher","first-page":"5975","DOI":"10.3390\/app14145975","volume":"14","author":"M Miletic","year":"2024","unstructured":"Miletic, M., Sariyar, M.: Challenges of using synthetic data generation methods for tabular microdata. Appl. Sci. 14(14), 5975 (2024)","journal-title":"Appl. 
Sci."},{"key":"1173_CR12","first-page":"22221","volume":"34","author":"B Van Breugel","year":"2021","unstructured":"Van Breugel, B., Kyono, T., Berrevoets, J., Schaar, M.: Decaf: generating fair synthetic data using causally-aware generative networks. Adv. Neural. Inf. Process. Syst. 34, 22221\u201322233 (2021)","journal-title":"Adv. Neural. Inf. Process. Syst."},{"key":"1173_CR13","unstructured":"Liu, T., Qian, Z., Berrevoets, J., Schaar, M.: Goggle: Generative modelling for tabular data by learning relational structure. In: The Eleventh international conference on learning representations (2023)"},{"key":"1173_CR14","doi-asserted-by":"crossref","unstructured":"Wang, A.X., Chukova, S.S., Simpson, C.R., Nguyen, B.P.: Challenges and opportunities of generative models on tabular data. Appl. Soft Comput. 112223 (2024)","DOI":"10.1016\/j.asoc.2024.112223"},{"key":"1173_CR15","unstructured":"Stoian, M.C., Dyrmishi, S., Cordy, M., Lukasiewicz, T., Giunchiglia, E.: How realistic is your synthetic data? constraining deep generative models for tabular data. arXiv preprint arXiv:2402.04823 (2024)"},{"key":"1173_CR16","doi-asserted-by":"crossref","unstructured":"Nik, A.H.Z., Riegler, M.A., Halvorsen, P., Stor\u00e5s, A.M.: Generation of synthetic tabular healthcare data using generative adversarial networks. In: International conference on multimedia modeling. pp. 434\u2013446 (2023). Springer","DOI":"10.1007\/978-3-031-27077-2_34"},{"key":"1173_CR17","doi-asserted-by":"crossref","unstructured":"Pezoulas, V.C., Zaridis, D.I., Mylona, E., Androutsos, C., Apostolidis, K., Tachos, N.S., Fotiadis, D.I.: Synthetic data generation methods in healthcare: a review on open-source tools and methods. Comput. Struct. Biotechnol. J. 
(2024)","DOI":"10.1016\/j.csbj.2024.07.005"},{"key":"1173_CR18","doi-asserted-by":"crossref","unstructured":"Zand, R., Abedi, V., Hontecillas, R., Lu, P., Noorbakhsh-Sabet, N., Verma, M., Leber, A., Tubau-Juni, N., Bassaganya-Riera, J.: Development of synthetic patient populations and in silico clinical trials. Accelerated Path to Cures, 57\u201377 (2018)","DOI":"10.1007\/978-3-319-73238-1_5"},{"issue":"5","key":"1173_CR19","doi-asserted-by":"publisher","first-page":"1699","DOI":"10.1093\/bib\/bby043","volume":"20","author":"F Pappalardo","year":"2019","unstructured":"Pappalardo, F., Russo, G., Tshinanu, F.M., Viceconti, M.: In silico clinical trials: concepts and early adoptions. Brief. Bioinform. 20(5), 1699\u20131708 (2019)","journal-title":"Brief. Bioinform."},{"issue":"11","key":"1173_CR20","doi-asserted-by":"publisher","first-page":"139","DOI":"10.1145\/3422622","volume":"63","author":"I Goodfellow","year":"2020","unstructured":"Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial networks. Commun. ACM 63(11), 139\u2013144 (2020)","journal-title":"Commun. ACM"},{"key":"1173_CR21","unstructured":"Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional gan. Advances in neural information processing systems. 32 (2019)"},{"key":"1173_CR22","doi-asserted-by":"crossref","unstructured":"Shahul Hameed, M.A., Qureshi, A.M., Kaushik, A.: Bias mitigation via synthetic data generation: a review. 
Electronics (2079\u20139292) 13(19) (2024)","DOI":"10.3390\/electronics13193909"},{"key":"1173_CR23","doi-asserted-by":"publisher","first-page":"231197","DOI":"10.1001\/jamahealthforum.2023.1197","volume":"4","author":"A Jain","year":"2023","unstructured":"Jain, A., Brooks, J.R., Alford, C.C., Chang, C.S., Mueller, N.M., Umscheid, C.A., Bierman, A.S.: Awareness of racial and ethnic bias and potential solutions to address bias with use of health care algorithms. JAMA Health Forum 4, 231197\u2013231197 (2023). (American Medical Association)","journal-title":"JAMA Health Forum"},{"key":"1173_CR24","doi-asserted-by":"crossref","unstructured":"Dwork, C., Hardt, M., Pitassi, T., Reingold, O., Zemel, R.: Fairness through awareness. In: Proceedings of the 3rd innovations in theoretical computer science conference, pp. 214\u2013226 (2012)","DOI":"10.1145\/2090236.2090255"},{"issue":"2","key":"1173_CR25","doi-asserted-by":"publisher","first-page":"153","DOI":"10.1089\/big.2016.0047","volume":"5","author":"A Chouldechova","year":"2017","unstructured":"Chouldechova, A.: Fair prediction with disparate impact: a study of bias in recidivism prediction instruments. Big data 5(2), 153\u2013163 (2017)","journal-title":"Big data"},{"key":"1173_CR26","unstructured":"Hardt, M., Price, E., Srebro, N.: Equality of opportunity in supervised learning. Advances in neural information processing systems. 29 (2016)"},{"key":"1173_CR27","doi-asserted-by":"crossref","unstructured":"Verma, S., Rubin, J.: Fairness definitions explained. In: Proceedings of the international workshop on software fairness. pp. 1\u20137 (2018)","DOI":"10.1145\/3194770.3194776"},{"key":"1173_CR28","doi-asserted-by":"publisher","unstructured":"Datalink, C.P.R.: CPRD cardiovascular disease synthetic dataset (Version 2020.06.001) [Data set]. Clinical Practice Research Datalink (2020). 
https:\/\/doi.org\/10.11581\/YK6N-B652","DOI":"10.11581\/YK6N-B652"},{"key":"1173_CR29","doi-asserted-by":"publisher","unstructured":"kharoua, R.E.: Diabetes Health Dataset Analysis. Kaggle (2024). https:\/\/doi.org\/10.34740\/KAGGLE\/DSV\/8665939","DOI":"10.34740\/KAGGLE\/DSV\/8665939"},{"key":"1173_CR30","unstructured":"Fusion, B.: GeNIe Modeller. Bayes Fusion (2025). https:\/\/www.bayesfusion.com\/"},{"key":"1173_CR31","doi-asserted-by":"publisher","unstructured":"Alattal, D.R., Wang, Z., Myles, P., Tucker, A.: Creating synthetic geospatial patient data to mimic real data whilst preserving privacy: *2022 35th international symposium on computer-based medical systems (cbms). In: 2023 IEEE 36th International Symposium on Computer-Based Medical Systems (CBMS), pp. 7\u201312 (2023). https:\/\/doi.org\/10.1109\/CBMS58004.2023.00183","DOI":"10.1109\/CBMS58004.2023.00183"},{"key":"1173_CR32","volume":"15","author":"E Ferrara","year":"2024","unstructured":"Ferrara, E.: The butterfly effect in artificial intelligence systems: implications for ai bias and fairness. Mach. Learn. Appl. 15, 100525 (2024)","journal-title":"Mach. Learn. Appl."},{"key":"1173_CR33","doi-asserted-by":"crossref","unstructured":"Draghi, B., Wang, Z., Myles, P., Tucker, A.: Identifying and handling data bias within primary healthcare data using synthetic data generators. Heliyon. 10(2) (2024)","DOI":"10.1016\/j.heliyon.2024.e24164"},{"key":"1173_CR34","doi-asserted-by":"crossref","unstructured":"Makhlouf, K., Zhioua, S., Palamidessi, C.: When causality meets fairness: a survey. J. Log. Algeb. Methods Programm. 101000 (2024)","DOI":"10.1016\/j.jlamp.2024.101000"},{"key":"1173_CR35","unstructured":"Binkyte-Sadauskiene, R., Makhlouf, K., Pinz\u00f3n, C., Zhioua, S., Palamidessi, C.: Causal discovery for fairness. 
CoRR arXiv: 2206.06685 (2022)"}],"container-title":["International Journal of Computational Intelligence Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s44196-026-01173-7","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s44196-026-01173-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s44196-026-01173-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,20]],"date-time":"2026-03-20T10:36:40Z","timestamp":1774003000000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s44196-026-01173-7"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,2,26]]},"references-count":35,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2026,12]]}},"alternative-id":["1173"],"URL":"https:\/\/doi.org\/10.1007\/s44196-026-01173-7","relation":{},"ISSN":["1875-6883"],"issn-type":[{"value":"1875-6883","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,2,26]]},"assertion":[{"value":"8 September 2025","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"13 January 2026","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"15 January 2026","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"26 February 2026","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Not 
applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}},{"value":"For the use of CPRD data to generate synthetic data: this was covered by CPRD\u2019s Database Research Ethics Approval (IRAS number: 242149).","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"Not applicable.","order":5,"name":"Ethics","group":{"name":"EthicsHeading","label":"Materials availability"}},{"value":"Not applicable.","order":6,"name":"Ethics","group":{"name":"EthicsHeading","label":"Code availability"}}],"article-number":"135"}}