{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,5]],"date-time":"2026-05-05T01:07:01Z","timestamp":1777943221997,"version":"3.51.4"},"reference-count":52,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2026,5,2]],"date-time":"2026-05-02T00:00:00Z","timestamp":1777680000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2026,5,2]],"date-time":"2026-05-02T00:00:00Z","timestamp":1777680000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"WASP\u2013DDLS postdoctoral grant Visualization and de-Identification of Biobank Data to Propel Precision Medicine Research","award":["KAW 2023-03705"],"award-info":[{"award-number":["KAW 2023-03705"]}]},{"name":"WASP\u2013DDLS postdoctoral grant Visualization and de-Identification of Biobank Data to Propel Precision Medicine Research","award":["KAW 2023-03705"],"award-info":[{"award-number":["KAW 2023-03705"]}]},{"DOI":"10.13039\/100012538","name":"Swedish Cancer Foundation","doi-asserted-by":"publisher","award":["24 3406 Pj 01 H"],"award-info":[{"award-number":["24 3406 Pj 01 H"]}],"id":[{"id":"10.13039\/100012538","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Ume\u00e5 University Infrastructure","award":["FS 2.1.6-1689-24"],"award-info":[{"award-number":["FS 2.1.6-1689-24"]}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["npj Digit. Med."],"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Access to large, diverse biomedical datasets is critical for advancing medical research, yet privacy regulations severely restrict data sharing. 
We present an end-to-end framework for privacy-preserving health data synthesis that integrates advanced deep generative models (DGMs) with robust preprocessing, formal differential privacy (DP) training for select DGMs, empirical privacy risk evaluation, data-sufficiency analysis, domain-guided quality control, and biobank visualization tools. Released as open-source containerized software, the framework ensures reproducible deployment while preserving statistical fidelity, machine learning (ML) utility, and privacy guarantees. Empirical evaluations across diverse biobank datasets demonstrate that a transformer-based diffusion model combined with our correlation- and distribution-aware loss function achieves superior performance, balancing fidelity, privacy, and computational efficiency. The tailored preprocessing pipeline effectively handles high missingness rates, substantially improving distributional accuracy and clinical plausibility. Across 26 biobank datasets spanning three regulatory levels, the framework shows that this model with the correlation- and distribution-aware loss function consistently achieves superior performance in terms of fidelity, privacy, and computational efficiency.<\/jats:p>","DOI":"10.1038\/s41746-026-02662-x","type":"journal-article","created":{"date-parts":[[2026,5,2]],"date-time":"2026-05-02T07:05:38Z","timestamp":1777705538000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Anonymization and visualization of health data and biomarkers"],"prefix":"10.1038","volume":"9","author":[{"given":"Minh 
H.","family":"Vu","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Daniel","family":"Edler","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Carl","family":"Wibom","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Martin","family":"Rosvall","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Beatrice","family":"Melin","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2026,5,2]]},"reference":[{"key":"2662_CR1","doi-asserted-by":"publisher","first-page":"2929","DOI":"10.1038\/s41591-023-02608-w","volume":"29","author":"A Arora","year":"2023","unstructured":"Arora, A. et al. The value of standards for health datasets in artificial intelligence-based applications. Nat. Med. 29, 2929\u20132938 (2023).","journal-title":"Nat. Med."},{"key":"2662_CR2","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1177\/2050312120934839","volume":"8","author":"M Mallappallil","year":"2020","unstructured":"Mallappallil, M. et al. A review of big data and medical research. SAGE Open Med. 8, 1\u201311 (2020).","journal-title":"SAGE Open Med."},{"key":"2662_CR3","doi-asserted-by":"publisher","DOI":"10.1186\/s12911-018-0719-2","volume":"18","author":"M Prosperi","year":"2018","unstructured":"Prosperi, M. et al. Big data hurdles in precision medicine and precision public health. BMC Med. Inform. Decis. Mak. 18, 74 (2018).","journal-title":"BMC Med. Inform. Decis. Mak."},{"key":"2662_CR4","doi-asserted-by":"publisher","first-page":"812","DOI":"10.1093\/bib\/bbaa418","volume":"22","author":"A Dagliati","year":"2021","unstructured":"Dagliati, A. et al. Health informatics and EHR to support clinical research in the big data era. Brief. Bioinforma. 
22, 812\u2013827 (2021).","journal-title":"Brief. Bioinforma."},{"key":"2662_CR5","doi-asserted-by":"publisher","first-page":"e2312417","DOI":"10.1001\/jamanetworkopen.2023.45892","volume":"6","author":"MY Ng","year":"2023","unstructured":"Ng, M. Y. et al. Perceptions of data set experts on important characteristics of health data sets for AI readiness. JAMA Netw. Open 6, e2312417 (2023).","journal-title":"JAMA Netw. Open"},{"key":"2662_CR6","doi-asserted-by":"publisher","first-page":"687","DOI":"10.1197\/jamia.M2470","volume":"14","author":"RL Richesson","year":"2007","unstructured":"Richesson, R. L. & Krischer, J. P. Data standards in clinical research: gaps, overlaps, challenges and opportunities. J. Am. Med. Inform. Assoc. 14, 687\u2013695 (2007).","journal-title":"J. Am. Med. Inform. Assoc."},{"key":"2662_CR7","doi-asserted-by":"publisher","DOI":"10.1038\/sdata.2016.18","volume":"3","author":"MD Wilkinson","year":"2016","unstructured":"Wilkinson, M. D. et al. The FAIR guiding principles for scientific data management and stewardship. Sci. Data 3, 160018 (2016).","journal-title":"Sci. Data"},{"key":"2662_CR8","doi-asserted-by":"crossref","unstructured":"Miotto, R., Li, L., Kidd, B. A. & Dudley, J. T. Deep Patient: An Unsupervised Representation to Predict the Future of Patients from the Electronic Health Records. Sci. Rep. 6, 26094 (2016).","DOI":"10.1038\/srep26094"},{"key":"2662_CR9","doi-asserted-by":"publisher","DOI":"10.1038\/s41746-018-0029-1","volume":"1","author":"A Rajkomar","year":"2018","unstructured":"Rajkomar, A. et al. Scalable and accurate deep learning with electronic health records. NPJ Digital Med. 1, 18 (2018).","journal-title":"NPJ Digital Med."},{"key":"2662_CR10","unstructured":"European Parliament and Council of the European Union. Regulation (EU) 2016\/679 (General Data Protection Regulation). Off. J. Eur. 
Union https:\/\/eur-lex.europa.eu\/eli\/reg\/2016\/679\/oj (2016)."},{"key":"2662_CR11","doi-asserted-by":"publisher","DOI":"10.1038\/s41746-020-00362-8","volume":"4","author":"D McGraw","year":"2021","unstructured":"McGraw, D. & Mandl, K. D. Privacy protections to encourage use of health-relevant digital data in a learning health system. NPJ Digital Med. 4, 2 (2021).","journal-title":"NPJ Digital Med."},{"key":"2662_CR12","doi-asserted-by":"publisher","first-page":"e069925","DOI":"10.1136\/bmjopen-2022-069925","volume":"13","author":"MC Jones","year":"2023","unstructured":"Jones, M. C., Stone, T., Mason, S. M., Eames, A. & Franklin, M. Navigating data governance associated with real-world data for public benefit: an overview in the UK and future considerations. BMJ Open 13, e069925 (2023).","journal-title":"BMJ Open"},{"key":"2662_CR13","doi-asserted-by":"publisher","first-page":"6967166","DOI":"10.1155\/2021\/6967166","volume":"2021","author":"D Xiang","year":"2021","unstructured":"Xiang, D. & Cai, W. Privacy protection and secondary use of health data: strategies and methods. BioMed. Res. Int. 2021, 6967166 (2021).","journal-title":"BioMed. Res. Int."},{"key":"2662_CR14","doi-asserted-by":"publisher","first-page":"1297","DOI":"10.1016\/S0140-6736(18)33067-8","volume":"393","author":"KM Keyes","year":"2019","unstructured":"Keyes, K. M. & Westreich, D. UK Biobank, big data, and the consequences of non-representativeness. Lancet 393, 1297 (2019).","journal-title":"Lancet"},{"key":"2662_CR15","doi-asserted-by":"publisher","first-page":"1216","DOI":"10.1038\/s41562-023-01579-9","volume":"7","author":"T Schoeler","year":"2023","unstructured":"Schoeler, T. et al. Participation bias in the UK Biobank distorts genetic associations and downstream analyses. Nat. Hum. Behav. 7, 1216\u20131227 (2023).","journal-title":"Nat. Hum. 
Behav."},{"key":"2662_CR16","doi-asserted-by":"publisher","first-page":"dyae054","DOI":"10.1093\/ije\/dyae054","volume":"53","author":"S van Alten","year":"2024","unstructured":"van Alten, S., Domingue, B. W., Faul, J., Galama, T. & Marees, A. T. Reweighting UK Biobank corrects for pervasive selection bias due to volunteering. Int. J. Epidemiol. 53, dyae054 (2024).","journal-title":"Int. J. Epidemiol."},{"key":"2662_CR17","doi-asserted-by":"publisher","first-page":"b2393","DOI":"10.1136\/bmj.b2393","volume":"338","author":"JA Sterne","year":"2009","unstructured":"Sterne, J. A. et al. Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls. BMJ 338, b2393 (2009).","journal-title":"BMJ"},{"key":"2662_CR18","unstructured":"European Commission. Rare diseases and European reference networks. https:\/\/health.ec.europa.eu\/rare-diseases-and-european-reference-networks\/rare-diseases_en (2025)."},{"key":"2662_CR19","doi-asserted-by":"crossref","unstructured":"Dwork, C. Differential privacy. In Automata, Languages and Programming (ICALP)\u2014Lecture Notes in Computer Science, Vol. 4052, 1\u201312 (Springer, 2006).","DOI":"10.1007\/11787006_1"},{"key":"2662_CR20","doi-asserted-by":"publisher","first-page":"e005122","DOI":"10.1161\/CIRCOUTCOMES.118.005122","volume":"12","author":"BK Beaulieu-Jones","year":"2019","unstructured":"Beaulieu-Jones, B. K. et al. Privacy-preserving generative deep neural networks support clinical data sharing. Circ. Cardiovasc. Qual. Outcomes 12, e005122 (2019).","journal-title":"Circ. Cardiovasc. Qual. Outcomes"},{"key":"2662_CR21","unstructured":"Goodfellow, I. J. et al. Generative adversarial nets. In Advances in Neural Information Processing Systems (NeurIPS), 2672\u20132680 (Curran Associates, Inc., 2014)."},{"key":"2662_CR22","unstructured":"Kingma, D. P. & Welling, M. Auto-encoding variational Bayes. 
In 2nd International Conference on Learning Representations (ICLR) https:\/\/openreview.net\/forum?id=33X9fd2-9FyZd (OpenReview.net, 2014)."},{"key":"2662_CR23","unstructured":"Ho, J., Jain, A. & Abbeel, P. Denoising diffusion probabilistic models. In Proc. of the 34th International Conference on Neural Information Processing Systems, 6840\u20136851 (Curran Associates, Inc., 2020)."},{"key":"2662_CR24","doi-asserted-by":"publisher","first-page":"228","DOI":"10.1093\/jamia\/ocy142","volume":"26","author":"MK Baowaly","year":"2019","unstructured":"Baowaly, M. K., Lin, C.-C., Liu, C.-L. & Chen, K.-T. Synthesizing electronic health records using improved generative adversarial networks. J. Am. Med. Inform. Assoc. 26, 228\u2013241 (2019).","journal-title":"J. Am. Med. Inform. Assoc."},{"key":"2662_CR25","unstructured":"Yahi, A., Vanguri, R., Elhadad, N. & Tatonetti, N. P. Generative Adversarial Networks for Electronic Health Records: A Framework for Exploring and Evaluating Methods for Predicting Drug-Induced Laboratory Test Trajectories. NIPS Workshop on Machine Learning for Health Care (Curran Associates, Inc., 2017)."},{"key":"2662_CR26","unstructured":"Xu, L., Skoularidou, M., Cuesta-Infante, A. & Veeramachaneni, K. Modeling tabular data using conditional GAN. In Proc. of the 33rd International Conference on Neural Information Processing Systems, 7335\u20137345 (Curran Associates, Inc., 2019)."},{"key":"2662_CR27","unstructured":"Zhao, Z., Kunar, A., Birke, R. & Chen, L. Y. CTAB-GAN: effective table data synthesizing. In Asian Conference on Machine Learning, 97\u2013112 (PMLR, 2021)."},{"key":"2662_CR28","doi-asserted-by":"crossref","unstructured":"Sun, C., van Soest, J. & Dumontier, M. Generating Synthetic Personal Health Data Using Conditional Generative Adversarial Networks Combining with Differential Privacy. J. Biomed. Inform. 141, 104404 (2023).","DOI":"10.1016\/j.jbi.2023.104404"},{"key":"2662_CR29","unstructured":"Kotelnikov, A., Baranchuk, D., Rubachev, I. 
& Babenko, A. TabDDPM: modelling tabular data with diffusion models. In International Conference on Machine Learning, 17564\u201317579 (PMLR, 2023)."},{"key":"2662_CR30","unstructured":"Zhang, H. et al. Mixed-Type Tabular Data Synthesis with Score-Based Diffusion in Latent Space. In The Twelfth International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=4Ay23yeuz0 (OpenReview.net, 2024)."},{"key":"2662_CR31","unstructured":"Vu, M. H. et al. A unified framework for tabular generative modeling: loss functions, benchmarks, and improved multi-objective Bayesian optimization approaches. Transact. Mach. Learn. Res. https:\/\/openreview.net\/forum?id=RPZ0EW0lz0 (2026)."},{"key":"2662_CR32","unstructured":"Choi, E. et al. Generating multi-label discrete patient records using generative adversarial networks. In Machine Learning for Healthcare Conference (PMLR, 2017)."},{"key":"2662_CR33","doi-asserted-by":"publisher","first-page":"e23139","DOI":"10.2196\/23139","volume":"22","author":"K El Emam","year":"2020","unstructured":"El Emam, K. et al. Evaluating identity disclosure risk in fully synthetic health data: model development and validation. J. Med. Internet Res. 22, e23139 (2020).","journal-title":"J. Med. Internet Res."},{"key":"2662_CR34","doi-asserted-by":"crossref","unstructured":"Patki, N., Wedge, R. & Veeramachaneni, K. The synthetic data vault. In International Conference on Data Science and Advanced Analytics (IEEE, 2016).","DOI":"10.1109\/DSAA.2016.49"},{"key":"2662_CR35","doi-asserted-by":"publisher","first-page":"e308","DOI":"10.1371\/journal.pone.0000308","volume":"2","author":"HA Piwowar","year":"2007","unstructured":"Piwowar, H. A., Day, R. S. & Fridsma, D. B. Sharing detailed research data is associated with increased citation rate. PLoS ONE 2, e308 (2007).","journal-title":"PLoS ONE"},{"key":"2662_CR36","unstructured":"Bica, I. et al. Synthcity: a library for benchmarking synthetic data generation and evaluation. 
https:\/\/github.com\/vanderschaarlab\/synthcity (2023)."},{"key":"2662_CR37","unstructured":"DataCebo, Inc. Synthetic Data Metrics. Version 0.23.0. https:\/\/docs.sdv.dev\/sdmetrics\/ (2025)."},{"key":"2662_CR38","doi-asserted-by":"crossref","unstructured":"Little, R. J. & Rubin, D. B. Statistical Analysis with Missing Data 3rd edn (Wiley, 2019).","DOI":"10.1002\/9781119482260"},{"key":"2662_CR39","doi-asserted-by":"publisher","first-page":"107501","DOI":"10.1016\/j.patcog.2020.107501","volume":"107","author":"A Naz\u00e1bal","year":"2020","unstructured":"Naz\u00e1bal, A., Olmos, P. M., Ghahramani, Z. & Valera, I. Handling incomplete heterogeneous data using VAEs. Pattern Recognit. 107, 107501 (2020).","journal-title":"Pattern Recognit."},{"key":"2662_CR40","first-page":"35839","volume":"35","author":"I Peis","year":"2022","unstructured":"Peis, I., Ma, C. & Hern\u00e1ndez-Lobato, J. M. Missing data imputation and acquisition with deep hierarchical models and Hamiltonian Monte Carlo. Adv. Neural Inf. Process. Syst. 35, 35839\u201335851 (2022).","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"2662_CR41","doi-asserted-by":"crossref","unstructured":"Satopaa, V., Albrecht, J., Irwin, D. & Raghavan, B. Finding a \u201ckneedle\" in a Haystack: detecting knee points in system behavior. In 2011 31st International Conference on Distributed Computing Systems Workshops (IEEE, 2011).","DOI":"10.1109\/ICDCSW.2011.20"},{"key":"2662_CR42","unstructured":"Kingma, D. P. & Ba, J. Adam: A Method for Stochastic Optimization. In 3rd International Conference on Learning Representations (ICLR) https:\/\/openreview.net\/forum?id=8gmWwjFyLj (OpenReview.net, 2015)."},{"key":"2662_CR43","doi-asserted-by":"crossref","unstructured":"Raschka, S., Patterson, J. & Nolet, C. Machine Learning in Python: Main Developments and Technology Trends in Data Science, Machine Learning, and Artificial Intelligence. 
Information 11, 193 (2020).","DOI":"10.3390\/info11040193"},{"key":"2662_CR44","first-page":"2825","volume":"12","author":"F Pedregosa","year":"2011","unstructured":"Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825\u20132830 (2011).","journal-title":"J. Mach. Learn. Res."},{"key":"2662_CR45","doi-asserted-by":"crossref","unstructured":"Karras, T., Aittala, M., Aila, T. & Laine, S. Elucidating the design space of diffusion-based generative models. In Advances in Neural Information Processing Systems. Vol. 35, 26565\u201326577 (Curran Associates, Inc., 2022).","DOI":"10.52202\/068431-1926"},{"key":"2662_CR46","unstructured":"Bergstra, J., Bardenet, R., Bengio, Y. & K\u00e9gl, B. Algorithms for Hyper-Parameter Optimization. In Advances in Neural Information Processing Systems. Vol. 24, 2546\u20132554 (Curran Associates, Inc., 2011)."},{"key":"2662_CR47","doi-asserted-by":"crossref","unstructured":"Friedman, M. The Use of Ranks to Avoid the Assumption of Normality Implicit in the Analysis of Variance. J. Am. Stat. Assoc. 32, 675\u2013701 (1937).","DOI":"10.1080\/01621459.1937.10503522"},{"key":"2662_CR48","unstructured":"Nemenyi, P. B. Distribution-free Multiple Comparisons (Princeton University, 1963)."},{"key":"2662_CR49","doi-asserted-by":"crossref","unstructured":"Sweeney, L. k-Anonymity: a model for protecting privacy. Int. J. Uncertain. Fuzziness Knowl. Based Syst. 10, 557\u2013570 (2002).","DOI":"10.1142\/S0218488502001648"},{"key":"2662_CR50","doi-asserted-by":"crossref","unstructured":"Li, N., Li, T. & Venkatasubramanian, S. t-Closeness: privacy beyond k-Anonymity and l-Diversity. In 2007 IEEE 23rd International Conference on Data Engineering (IEEE, 2007).","DOI":"10.1109\/ICDE.2007.367856"},{"key":"2662_CR51","doi-asserted-by":"publisher","first-page":"2378","DOI":"10.1109\/JBHI.2020.2980262","volume":"24","author":"J Yoon","year":"2020","unstructured":"Yoon, J., Drumright, L. N. & van der Schaar, M. 
Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Inform. 24, 2378\u20132388 (2020).","journal-title":"IEEE J. Biomed. Health Inform."},{"key":"2662_CR52","unstructured":"Liu, L., Zhang, J. & Shokri, R. DoMIA: membership inference attacks via density ratio estimation. Preprint at arXiv https:\/\/arxiv.org\/abs\/2112.01526 (2021)."}],"container-title":["npj Digital Medicine"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.nature.com\/articles\/s41746-026-02662-x","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s41746-026-02662-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s41746-026-02662-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,5,2]],"date-time":"2026-05-02T07:19:21Z","timestamp":1777706361000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.nature.com\/articles\/s41746-026-02662-x"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,5,2]]},"references-count":52,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2026,12]]}},"alternative-id":["2662"],"URL":"https:\/\/doi.org\/10.1038\/s41746-026-02662-x","relation":{},"ISSN":["2398-6352"],"issn-type":[{"value":"2398-6352","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,5,2]]},"assertion":[{"value":"8 December 2025","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"12 April 2026","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"2 May 2026","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"The 
authors declare no competing interests.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"347"}}