{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,24]],"date-time":"2026-03-24T21:50:41Z","timestamp":1774389041964,"version":"3.50.1"},"reference-count":40,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2023,8,11]],"date-time":"2023-08-11T00:00:00Z","timestamp":1691712000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,8,11]],"date-time":"2023-08-11T00:00:00Z","timestamp":1691712000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["npj Digit. Med."],"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Privacy concerns often arise as the key bottleneck for the sharing of data between consumers and data holders, particularly for sensitive data such as Electronic Health Records (EHR). This impedes the application of data analytics and ML-based innovations with tremendous potential. One promising approach for such privacy concerns is to instead use synthetic data. We propose a generative modeling framework, EHR-Safe, for generating highly realistic and privacy-preserving synthetic EHR data. EHR-Safe is based on a two-stage model that consists of sequential encoder-decoder networks and generative adversarial networks. Our innovations focus on the key challenging aspects of real-world EHR data: heterogeneity, sparsity, coexistence of numerical and categorical features with distinct characteristics, and time-varying features with highly-varying sequence lengths. Under numerous evaluations, we demonstrate that the fidelity of EHR-Safe is almost-identical with real data (&lt;3% accuracy difference for the models trained on them) while yielding almost-ideal performance in practical privacy metrics.<\/jats:p>","DOI":"10.1038\/s41746-023-00888-7","type":"journal-article","created":{"date-parts":[[2023,8,11]],"date-time":"2023-08-11T18:02:23Z","timestamp":1691776943000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":66,"title":["EHR-Safe: generating high-fidelity and privacy-preserving synthetic electronic health records"],"prefix":"10.1038","volume":"6","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5481-5171","authenticated-orcid":false,"given":"Jinsung","family":"Yoon","sequence":"first","affiliation":[]},{"given":"Michel","family":"Mizrahi","sequence":"additional","affiliation":[]},{"given":"Nahid Farhady","family":"Ghalaty","sequence":"additional","affiliation":[]},{"given":"Thomas","family":"Jarvinen","sequence":"additional","affiliation":[]},{"given":"Ashwin S.","family":"Ravi","sequence":"additional","affiliation":[]},{"given":"Peter","family":"Brune","sequence":"additional","affiliation":[]},{"given":"Fanyu","family":"Kong","sequence":"additional","affiliation":[]},{"given":"Dave","family":"Anderson","sequence":"additional","affiliation":[]},{"given":"George","family":"Lee","sequence":"additional","affiliation":[]},{"given":"Arie","family":"Meir","sequence":"additional","affiliation":[]},{"given":"Farhana","family":"Bandukwala","sequence":"additional","affiliation":[]},{"given":"Elli","family":"Kanal","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6333-1729","authenticated-orcid":false,"given":"Sercan \u00d6.","family":"Ar\u0131k","sequence":"additional","affiliation":[]},{"given":"Tomas","family":"Pfister","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,8,11]]},"reference":[{"key":"888_CR1","doi-asserted-by":"publisher","first-page":"2744","DOI":"10.1109\/JBHI.2020.3040225","volume":"25","author":"T Zhu","year":"2020","unstructured":"Zhu, T., Li, K., Herrero, P. & Georgiou, P. Deep learning for diabetes: a systematic review. IEEE J. Biomed. Health Inform. 25, 2744\u20132757 (2020).","journal-title":"IEEE J. Biomed. Health Inform."},{"key":"888_CR2","doi-asserted-by":"publisher","first-page":"35558","DOI":"10.1109\/ACCESS.2018.2848936","volume":"6","author":"L Yu","year":"2018","unstructured":"Yu, L., Chan, W. M., Zhao, Y. & Tsui, K.-L. Personalized health monitoring system of elderly wellness at the community level in Hong Kong. IEEE Access 6, 35558\u201335567 (2018).","journal-title":"IEEE Access"},{"key":"888_CR3","doi-asserted-by":"publisher","first-page":"1656","DOI":"10.1038\/s41591-022-01873-5","volume":"28","author":"R Liu","year":"2022","unstructured":"Liu, R. et al. Systematic pan-cancer analysis of mutation\u2013treatment interactions using large real-world clinicogenomics data. Nat. Med. 28, 1656\u20131661 (2022).","journal-title":"Nat. Med."},{"key":"888_CR4","doi-asserted-by":"publisher","first-page":"73","DOI":"10.1016\/j.procs.2017.08.292","volume":"113","author":"K Abouelmehdi","year":"2017","unstructured":"Abouelmehdi, K., Beni-Hssane, A., Khaloufi, H. & Saadi, M. Big data security and privacy in healthcare: a review. Procedia Comput. Sci. 113, 73\u201380 (2017).","journal-title":"Procedia Comput. Sci."},{"key":"888_CR5","doi-asserted-by":"publisher","first-page":"29","DOI":"10.1109\/MIC.2018.022021660","volume":"22","author":"A Iyengar","year":"2018","unstructured":"Iyengar, A., Kundu, A. & Pallis, G. Healthcare informatics and privacy. IEEE Internet Comput. 22, 29\u201331 (2018).","journal-title":"IEEE Internet Comput."},{"key":"888_CR6","doi-asserted-by":"crossref","unstructured":"Ray, P. & Wimalasiri, J. The need for technical solutions for maintaining the privacy of EHR. In Proc. 2006 International Conference of the IEEE Engineering in Medicine and Biology Society, 4686\u20134689 (IEEE, 2006).","DOI":"10.1109\/IEMBS.2006.260862"},{"key":"888_CR7","doi-asserted-by":"publisher","first-page":"425","DOI":"10.1016\/j.procs.2015.08.363","volume":"63","author":"M Azarm-Daigle","year":"2015","unstructured":"Azarm-Daigle, M., Kuziemsky, C. & Peyton, L. A review of cross organizational healthcare data sharing. Procedia Comput. Sci. 63, 425\u2013432 (2015).","journal-title":"Procedia Comput. Sci."},{"key":"888_CR8","doi-asserted-by":"publisher","first-page":"550","DOI":"10.1197\/jamia.M2444","volume":"14","author":"\u00d6 Uzuner","year":"2007","unstructured":"Uzuner, \u00d6., Luo, Y. & Szolovits, P. Evaluating the state-of-the-art in automatic de-identification. J. Am. Med. Inform. Assoc. 14, 550\u2013563 (2007).","journal-title":"J. Am. Med. Inform. Assoc."},{"key":"888_CR9","unstructured":"Janmey, V. & Elkin, P. L. Re-identification risk in HIPAA de-identified datasets: the MVA attack. AMIA Annu. Symp. Proc. 2018, 1329\u20131337 (2018)."},{"key":"888_CR10","doi-asserted-by":"publisher","first-page":"493","DOI":"10.1038\/s41551-021-00751-8","volume":"5","author":"RJ Chen","year":"2021","unstructured":"Chen, R. J., Lu, M. Y., Chen, T. Y., Williamson, D. F. & Mahmood, F. Synthetic data in machine learning for medicine and healthcare. Nat. Biomed. Eng. 5, 493\u2013497 (2021).","journal-title":"Nat. Biomed. Eng."},{"key":"888_CR11","unstructured":"Goodfellow, I. et al. Generative adversarial nets. In Proc. 27th International Conference on Neural Information Processing Systems, Vol. 27, 2672\u20132680 (2014)."},{"key":"888_CR12","unstructured":"Van den Oord, A. et al. Conditional image generation with PixelCNN decoders. In Proc. 30th International Conference on Neural Information Processing Systems, 4797\u20134805 (2016)."},{"key":"888_CR13","unstructured":"Van den Oord, A. et al. Wavenet: a generative model for raw audio. Preprint at https:\/\/arxiv.org\/abs\/1609.03499 (2016)."},{"key":"888_CR14","unstructured":"Nowozin, S., Cseke, B. & Tomioka, R. f-GAN: training generative neural samplers using variational divergence minimization. In Proc. 30th International Conference on Neural Information Processing Systems, 271\u2013279 (2016)."},{"key":"888_CR15","unstructured":"Yoon, J., Jarrett, D. & Van der Schaar, M. Time-series generative adversarial networks. In Proc. 33rd Conference on Neural Information Processing Systems (2019)."},{"key":"888_CR16","doi-asserted-by":"publisher","first-page":"53","DOI":"10.1109\/MSP.2017.2765202","volume":"35","author":"A Creswell","year":"2018","unstructured":"Creswell, A. et al. Generative adversarial networks: an overview. IEEE Signal Process. Mag. 35, 53\u201365 (2018).","journal-title":"IEEE Signal Process. Mag."},{"key":"888_CR17","unstructured":"Karras, T., Aila, T., Laine, S. & Lehtinen, J. Progressive growing of GANs for improved quality, stability, and variation. In Proc. International Conference on Learning Representations (ICLR) (2018)."},{"key":"888_CR18","first-page":"17022","volume":"33","author":"J Kong","year":"2020","unstructured":"Kong, J., Kim, J. & Bae, J. HiFi-GAN: generative adversarial networks for efficient and high fidelity speech synthesis. Adv. Neural Inf. Process. Syst. 33, 17022\u201317033 (2020).","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"888_CR19","unstructured":"de Masson d\u2019Autume, C., Mohamed, S., Rosca, M. & Rae, J. Training language GANs from scratch. In Proc. 33rd Conference on Neural Information Processing Systems (2019)."},{"key":"888_CR20","doi-asserted-by":"crossref","unstructured":"Liu, Y., Peng, J., James, J. & Wu, Y. PPGAN: privacy-preserving generative adversarial network. In Proc. 2019 IEEE 25th International Conference on Parallel and Distributed Systems (ICPADS), 985\u2013989 (IEEE, 2019).","DOI":"10.1109\/ICPADS47876.2019.00150"},{"key":"888_CR21","unstructured":"Jordon, J., Yoon, J. & Van Der Schaar, M. PATE-GAN: generating synthetic data with differential privacy guarantees. In Proc. 2019 International Conference On Learning Representations (2019)."},{"key":"888_CR22","first-page":"28968","volume":"34","author":"D Jarrett","year":"2021","unstructured":"Jarrett, D., Bica, I. & van der Schaar, M. Time-series generation by contrastive imitation. Adv. Neural Inf. Process. Syst. 34, 28968\u201328982 (2021).","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"888_CR23","unstructured":"Choi, E. et al. Generating multi-label discrete patient records using generative adversarial networks. PMLR 68, 286\u2013305 (2017)."},{"key":"888_CR24","unstructured":"Lu, C., Reddy, C. K., Wang, P., Nie, D. & Ning, Y. Multi-label clinical time-series generation via conditional GAN. Preprint at https:\/\/arxiv.org\/abs\/2204.04797 (2022)."},{"key":"888_CR25","unstructured":"Johnson, A., Pollard, T. & Mark, R. MIMIC-III clinical database (version 1.4). PhysioNet 10 (2016). https:\/\/physionet.org\/content\/mimiciii\/1.4\/."},{"key":"888_CR26","doi-asserted-by":"publisher","first-page":"160035","DOI":"10.1038\/sdata.2016.35","volume":"3","author":"AE Johnson","year":"2016","unstructured":"Johnson, A. E. et al. MIMIC-III, a freely accessible critical care database. Sci. Data 3, 160035 (2016).","journal-title":"Sci. Data"},{"key":"888_CR27","doi-asserted-by":"publisher","first-page":"e215","DOI":"10.1161\/01.CIR.101.23.e215","volume":"101","author":"AL Goldberger","year":"2000","unstructured":"Goldberger, A. L. et al. PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals. Circulation 101, e215\u2013e220 (2000).","journal-title":"Circulation"},{"key":"888_CR28","doi-asserted-by":"publisher","first-page":"180178","DOI":"10.1038\/sdata.2018.178","volume":"5","author":"TJ Pollard","year":"2018","unstructured":"Pollard, T. J. et al. The eICU Collaborative Research Database, a freely available multi-center database for critical care research. Sci. Data 5, 180178 (2018).","journal-title":"Sci. Data"},{"key":"888_CR29","doi-asserted-by":"publisher","first-page":"265","DOI":"10.1016\/j.smhl.2018.07.001","volume":"9","author":"R Sadeghi","year":"2018","unstructured":"Sadeghi, R., Banerjee, T. & Romine, W. Early hospital mortality prediction using vital signals. Smart Health 9, 265\u2013274 (2018).","journal-title":"Smart Health"},{"key":"888_CR30","unstructured":"Sheikhalishahi, S., Balaraman, V. & Osmani, V. Benchmarking machine learning models on eICU critical care dataset. Preprint at https:\/\/arxiv.org\/abs\/1910.00964 (2019)."},{"key":"888_CR31","doi-asserted-by":"publisher","first-page":"907","DOI":"10.1109\/TCSS.2019.2916086","volume":"6","author":"G Liu","year":"2019","unstructured":"Liu, G. et al. SocInf: membership inference attacks on social media health data with machine learning. IEEE Trans. Comput. Soc. Syst. 6, 907\u2013921 (2019).","journal-title":"IEEE Trans. Comput. Soc. Syst."},{"key":"888_CR32","doi-asserted-by":"crossref","unstructured":"Su, D., Huynh, H. T., Chen, Z., Lu, Y. & Lu, W. Re-identification attack to privacy-preserving data analysis with noisy sample-mean. In Proc. 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 1045\u20131053 (2020).","DOI":"10.1145\/3394486.3403148"},{"key":"888_CR33","unstructured":"Mehnaz, S. et al. Are your sensitive attributes private? Novel model inversion attribute inference attacks on classification models. In Proc. 31st USENIX Security Symposium (USENIX Security 22), 4579\u20134596 (2022)."},{"key":"888_CR34","unstructured":"Esteban, C., Hyland, S. L. & R\u00e4tsch, G. Real-valued (medical) time series generation with recurrent conditional GANs. Preprint at https:\/\/arxiv.org\/abs\/1706.02633 (2017)."},{"key":"888_CR35","unstructured":"Mogren, O. C-RNN-GAN: continuous recurrent neural networks with adversarial training. Preprint at https:\/\/arxiv.org\/abs\/1611.09904 (2016)."},{"key":"888_CR36","doi-asserted-by":"crossref","unstructured":"Torkzadehmahani, R., Kairouz, P. & Paten, B. DP-CGAN: differentially private synthetic data and label generation. In Proc. IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops (2019).","DOI":"10.1109\/CVPRW.2019.00018"},{"key":"888_CR37","doi-asserted-by":"crossref","unstructured":"Abadi, M. et al. Deep learning with differential privacy. In Proc. 2016 ACM SIGSAC Conference on Computer and Communications Security, 308\u2013318 (2016).","DOI":"10.1145\/2976749.2978318"},{"key":"888_CR38","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3446374","volume":"54","author":"D Saxena","year":"2021","unstructured":"Saxena, D. & Cao, J. Generative adversarial networks (gans) challenges, solutions, and future directions. ACM Comput. Surv. (CSUR) 54, 1\u201342 (2021).","journal-title":"ACM Comput. Surv. (CSUR)"},{"key":"888_CR39","unstructured":"Arjovsky, M., Chintala, S. & Bottou, L. Wasserstein generative adversarial networks. PMLR 70, 214\u2013223 (2017)."},{"key":"888_CR40","unstructured":"Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V. & Courville, A. C. Improved training of Wasserstein GANs. In Proc. 31st International Conference on Neural Information Processing Systems, 5769\u20135779 (2017)."}],"container-title":["npj Digital Medicine"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.nature.com\/articles\/s41746-023-00888-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s41746-023-00888-7","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s41746-023-00888-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,17]],"date-time":"2023-11-17T20:26:22Z","timestamp":1700252782000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.nature.com\/articles\/s41746-023-00888-7"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,8,11]]},"references-count":40,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2023,12]]}},"alternative-id":["888"],"URL":"https:\/\/doi.org\/10.1038\/s41746-023-00888-7","relation":{},"ISSN":["2398-6352"],"issn-type":[{"value":"2398-6352","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,8,11]]},"assertion":[{"value":"19 January 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"26 July 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"11 August 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"This work was approved by Google, and no extramural funding was used for this project. All authors are affiliated with Google. The authors have no other competing interests to declare.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"141"}}