{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,30]],"date-time":"2026-04-30T03:44:23Z","timestamp":1777520663644,"version":"3.51.4"},"reference-count":56,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2023,4,29]],"date-time":"2023-04-29T00:00:00Z","timestamp":1682726400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,4,29]],"date-time":"2023-04-29T00:00:00Z","timestamp":1682726400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001665","name":"Agence Nationale de la Recherche","doi-asserted-by":"publisher","award":["ANR-19-CHIA-0020"],"award-info":[{"award-number":["ANR-19-CHIA-0020"]}],"id":[{"id":"10.13039\/501100001665","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100004063","name":"Knut och Alice Wallenbergs Stiftelse","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100004063","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100004359","name":"Vetenskapsr\u00e5det","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100004359","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["npj Digit. Med."],"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Data-driven medical care delivery must always respect patient privacy\u2014a requirement that is not easily met. This issue has impeded improvements to healthcare software and has delayed the long-predicted prevalence of artificial intelligence in healthcare. Until now, it has been very difficult to share data between healthcare organizations, resulting in poor statistical models due to unrepresentative patient cohorts. Synthetic data, i.e., artificial but realistic electronic health records, could overcome the drought that is troubling the healthcare sector. Deep neural network architectures, in particular, have shown an incredible ability to learn from complex data sets and generate large amounts of unseen data points with the same statistical properties as the training data. Here, we present a generative neural network model that can create synthetic health records with realistic timelines. These clinical trajectories are generated on a per-patient basis and are represented as linear-sequence graphs of clinical events over time. We use a variational graph autoencoder (VGAE) to generate synthetic samples from real-world electronic health records. Our approach generates health records not seen in the training data. We show that these artificial patient trajectories are realistic and preserve patient privacy and can therefore support the safe sharing of data across organizations.<\/jats:p>","DOI":"10.1038\/s41746-023-00822-x","type":"journal-article","created":{"date-parts":[[2023,4,29]],"date-time":"2023-04-29T18:02:08Z","timestamp":1682791328000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":41,"title":["Synthetic electronic health records generated with variational graph autoencoders"],"prefix":"10.1038","volume":"6","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0336-5879","authenticated-orcid":false,"given":"Giannis","family":"Nikolentzos","sequence":"first","affiliation":[]},{"given":"Michalis","family":"Vazirgiannis","sequence":"additional","affiliation":[]},{"given":"Christos","family":"Xypolopoulos","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4068-0341","authenticated-orcid":false,"given":"Markus","family":"Lingman","sequence":"additional","affiliation":[]},{"given":"Erik G.","family":"Brandt","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,4,29]]},"reference":[{"key":"822_CR1","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/s41746-020-00323-1","volume":"3","author":"N Rieke","year":"2020","unstructured":"Rieke, N. et al. The future of digital health with federated learning. npj Digit. Med. 3, 1\u20137 (2020).","journal-title":"npj Digit. Med."},{"key":"822_CR2","unstructured":"Abadi, M. et al. in Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pp. 308\u2013318 (2016)."},{"key":"822_CR3","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3214303","volume":"51","author":"A Acar","year":"2018","unstructured":"Acar, A., Aksu, H., Uluagac, A. S. & Conti, M. A survey on homomorphic encryption schemes: theory and implementation. ACM Comput. Surv. 51, 1\u201335 (2018).","journal-title":"ACM Comput. Surv."},{"key":"822_CR4","unstructured":"Yoon, J., Jarrett, D. & Van der Schaar, M. in Advances in Neural Information Processing Systems (2019)."},{"key":"822_CR5","doi-asserted-by":"publisher","unstructured":"Ramponi, G., Protopapas, P., Brambilla, M. & Janssen, R. T-cgan: conditional generative adversarial network for data augmentation in noisy time series with irregular sampling. Preprint at arXiv https:\/\/doi.org\/10.48550\/arXiv.1811.08295 (2018).","DOI":"10.48550\/arXiv.1811.08295"},{"key":"822_CR6","doi-asserted-by":"publisher","first-page":"712","DOI":"10.1109\/TITS.2019.2962338","volume":"22","author":"S Kuutti","year":"2020","unstructured":"Kuutti, S., Bowden, R., Jin, Y., Barber, P. & Fallah, S. A survey of deep learning applications to autonomous vehicle control. IEEE Trans. Intell. Transp. Syst. 22, 712\u2013733 (2020).","journal-title":"IEEE Trans. Intell. Transp. Syst."},{"key":"822_CR7","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/s41467-020-18073-9","volume":"11","author":"M Popel","year":"2020","unstructured":"Popel, M. et al. Transforming machine translation: a deep learning system reaches news translation quality comparable to human professionals. Nat. Commun. 11, 1\u201315 (2020).","journal-title":"Nat. Commun."},{"key":"822_CR8","doi-asserted-by":"publisher","first-page":"263","DOI":"10.1021\/acs.accounts.0c00699","volume":"54","author":"WP Walters","year":"2020","unstructured":"Walters, W. P. & Barzilay, R. Applications of deep learning in molecule generation and molecular property prediction. Acc. Chem. Res. 54, 263\u2013270 (2020).","journal-title":"Acc. Chem. Res."},{"key":"822_CR9","unstructured":"Choi, E. et al. in Proceedings of Machine Learning for Healthcare 2017, pp. 286\u2013305 (2017)."},{"key":"822_CR10","unstructured":"Jordon, J., Yoon, J. & Van Der Schaar, M. in 7th International Conference on Learning Representations (2019)."},{"key":"822_CR11","doi-asserted-by":"publisher","unstructured":"Esteban, C., Hyland, S. L. & R\u00e4tsch, G. Real-valued (medical) time series generation with recurrent conditional gans. Preprint at arXiv https:\/\/doi.org\/10.48550\/arXiv.1706.02633 (2017).","DOI":"10.48550\/arXiv.1706.02633"},{"key":"822_CR12","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/s41746-022-00666-x","volume":"5","author":"P Wendland","year":"2022","unstructured":"Wendland, P. et al. Generation of realistic synthetic data using multimodal neural ordinary differential equations. npj Digit. Med. 5, 1\u201310 (2022).","journal-title":"npj Digit. Med."},{"key":"822_CR13","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/s41746-020-00353-9","volume":"3","author":"A Tucker","year":"2020","unstructured":"Tucker, A., Wang, Z., Rotalinti, Y. & Myles, P. Generating high-fidelity synthetic patient data for assessing machine learning healthcare software. npj Digit. Med. 3, 1\u201313 (2020).","journal-title":"npj Digit. Med."},{"key":"822_CR14","doi-asserted-by":"publisher","first-page":"493","DOI":"10.1038\/s41551-021-00751-8","volume":"5","author":"RJ Chen","year":"2021","unstructured":"Chen, R. J., Lu, M. Y., Chen, T. Y., Williamson, D. F. & Mahmood, F. Synthetic data in machine learning for medicine and healthcare. Nat. Biomed. Eng. 5, 493\u2013497 (2021).","journal-title":"Nat. Biomed. Eng."},{"key":"822_CR15","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s12874-020-00977-1","volume":"20","author":"A Goncalves","year":"2020","unstructured":"Goncalves, A. et al. Generation and evaluation of synthetic patient data. BMC Med. Res. Methodol. 20, 1\u201340 (2020).","journal-title":"BMC Med. Res. Methodol."},{"key":"822_CR16","unstructured":"Kingma, D. P. & Welling, M. in 2nd International Conference on Learning Representations (2014)."},{"key":"822_CR17","doi-asserted-by":"publisher","first-page":"307","DOI":"10.1561\/2200000056","volume":"12","author":"DP Kingma","year":"2019","unstructured":"Kingma, D. P. & Welling, M. An introduction to variational autoencoders. Found. Trends\u00ae Mach. Learn. 12, 307\u2013392 (2019).","journal-title":"Found. Trends\u00ae Mach. Learn."},{"key":"822_CR18","doi-asserted-by":"crossref","unstructured":"Simonovsky, M. & Komodakis, N. in Proceedings of the 27th International Conference on Artificial Neural Networks, pp. 412\u2013422 (2018).","DOI":"10.1007\/978-3-030-01418-6_41"},{"key":"822_CR19","unstructured":"Salha, G., Limnios, S., Hennequin, R., Tran, V. A. & Vazirgiannis, M. in Proceedings of the 28th ACM International Conference on Information and Knowledge Management, pp. 589\u2013598 (2019)."},{"key":"822_CR20","doi-asserted-by":"crossref","unstructured":"Chatzianastasis, M., Dasoulas, G., Siolas, G. & Vazirgiannis, M. in Proceedings of the 2021 IEEE\/CVF International Conference on Computer Vision Workshops, pp. 393\u2013402 (2021).","DOI":"10.1109\/ICCVW54120.2021.00048"},{"key":"822_CR21","doi-asserted-by":"publisher","first-page":"53","DOI":"10.1109\/MSP.2017.2765202","volume":"35","author":"A Creswell","year":"2018","unstructured":"Creswell, A. et al. Generative adversarial networks: an overview. IEEE Signal Process. Mag. 35, 53\u201365 (2018).","journal-title":"IEEE Signal Process. Mag."},{"key":"822_CR22","doi-asserted-by":"crossref","unstructured":"Gui, J., Sun, Z., Wen, Y., Tao, D. & Ye, J. A review on generative adversarial networks: algorithms, theory, and applications. IEEE Trans. Knowl. Data Eng., 3313\u20133332 (2021).","DOI":"10.1109\/TKDE.2021.3130191"},{"key":"822_CR23","doi-asserted-by":"publisher","first-page":"801","DOI":"10.1093\/jamia\/ocaa303","volume":"28","author":"D Kaur","year":"2021","unstructured":"Kaur, D. et al. Application of Bayesian networks to generate synthetic health data. J. Am. Med. Inform. Assoc. 28, 801\u2013811 (2021).","journal-title":"J. Am. Med. Inform. Assoc."},{"key":"822_CR24","doi-asserted-by":"publisher","first-page":"230","DOI":"10.1093\/jamia\/ocx079","volume":"25","author":"J Walonoski","year":"2018","unstructured":"Walonoski, J. et al. Synthea: an approach, method, and software mechanism for generating synthetic patients and the synthetic electronic health care record. J. Am. Med. Inform. Assoc. 25, 230\u2013238 (2018).","journal-title":"J. Am. Med. Inform. Assoc."},{"key":"822_CR25","doi-asserted-by":"publisher","first-page":"228","DOI":"10.1093\/jamia\/ocy142","volume":"26","author":"MK Baowaly","year":"2019","unstructured":"Baowaly, M. K., Lin, C. C., Liu, C. L. & Chen, K. T. Synthesizing electronic health records using improved generative adversarial networks. J. Am. Med. Inform. Assoc. 26, 228\u2013241 (2019).","journal-title":"J. Am. Med. Inform. Assoc."},{"key":"822_CR26","doi-asserted-by":"publisher","first-page":"244","DOI":"10.1016\/j.neucom.2019.12.136","volume":"416","author":"A Yale","year":"2020","unstructured":"Yale, A. et al. Generation and evaluation of privacy preserving synthetic health data. Neurocomputing 416, 244\u2013255 (2020).","journal-title":"Neurocomputing"},{"key":"822_CR27","doi-asserted-by":"crossref","unstructured":"Arvanitis, T.N., White, S., Harrison, S., Chaplin, R. & Despotou, G. A method for machine learning generation of realistic synthetic datasets for validating healthcare applications. Health Inform. J. 28, 1\u201316 (2022).","DOI":"10.1177\/14604582221077000"},{"key":"822_CR28","unstructured":"Chin-Cheong, K., Sutter, T. & Vogt, J. E. in Workshop on Machine Learning for Health (ML4H) at the 33rd Conference on Neural Information Processing Systems (2019)."},{"key":"822_CR29","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3446374","volume":"54","author":"D Saxena","year":"2021","unstructured":"Saxena, D. & Cao, J. Generative adversarial networks (gans) challenges, solutions, and future directions. ACM Comput. Surv. 54, 1\u201342 (2021).","journal-title":"ACM Comput. Surv."},{"key":"822_CR30","unstructured":"You, J., Ying, R., Ren, X., Hamilton, W. & Leskovec, J. in Proceedings of the 35th International Conference on Machine Learning, pp. 5708\u20135717 (2018)."},{"key":"822_CR31","unstructured":"Jin, W., Barzilay, R. & Jaakkola, T. in Proceedings of the 35th International Conference on Machine Learning, pp. 2323\u20132332 (2018)."},{"key":"822_CR32","unstructured":"Li, Y., Vinyals, O., Dyer, C., Pascanu, R. & Battaglia, P. in Proceedings of the 35th International Conference on Machine Learning (2018)."},{"key":"822_CR33","doi-asserted-by":"publisher","first-page":"242","DOI":"10.1016\/j.neucom.2021.04.039","volume":"450","author":"P Bongini","year":"2021","unstructured":"Bongini, P., Bianchini, M. & Scarselli, F. Molecular generative graph neural networks for drug discovery. Neurocomputing 450, 242\u2013252 (2021).","journal-title":"Neurocomputing"},{"key":"822_CR34","unstructured":"Johnson, A. et al. Mimic-iv https:\/\/physionet.org\/content\/mimiciv\/1.0\/ (2021)."},{"key":"822_CR35","unstructured":"Implemented in the SHAARPEC Analytics platform. https:\/\/www.shaarpec.com."},{"key":"822_CR36","unstructured":"Bender, D. & Sartipi, K. in Proceedings of the 26th IEEE International Symposium on Computer-Based Medical Systems, pp. 326\u2013331 (2013)."},{"key":"822_CR37","unstructured":"Jang, E., Gu, S. & Poole, B. in 5th International Conference on Learning Representations (2017)."},{"key":"822_CR38","doi-asserted-by":"publisher","unstructured":"De Cao, N. & Kipf, T. Molgan: an implicit generative model for small molecular graphs. Preprint at arXiv https:\/\/doi.org\/10.48550\/arXiv.1805.11973 (2018)","DOI":"10.48550\/arXiv.1805.11973"},{"key":"822_CR39","doi-asserted-by":"publisher","first-page":"943","DOI":"10.1613\/jair.1.13225","volume":"72","author":"G Nikolentzos","year":"2021","unstructured":"Nikolentzos, G., Siglidis, G. & Vazirgiannis, M. Graph kernels: a survey. J. Artif. Intell. Res. 72, 943\u20131027 (2021).","journal-title":"J. Artif. Intell. Res."},{"key":"822_CR40","unstructured":"Shervashidze, N., Schweitzer, P., Van Leeuwen, E. J., Mehlhorn, K. & Borgwardt, K. M. Weisfeiler-Lehman graph kernels. J. Mach. Learn. Res. 12, 2539\u20132561 (2011)."},{"key":"822_CR41","unstructured":"Borgwardt, K. M. & Kriegel, H. P. in Proceedings of the 5th IEEE International Conference on Data Mining (2005)."},{"key":"822_CR42","unstructured":"Weggenmann, B., Rublack, V., Andrejczuk, M., Mattern, J. & Kerschbaum, F. in Proceedings of the ACM Web Conference 2022, pp. 721\u2013731 (2022)."},{"key":"822_CR43","unstructured":"Kawai, W., Mukuta, Y. & Harada, T. Scalable generative models for graphs with graph attention mechanism. Preprint at arXiv https:\/\/arxiv.org\/pdf\/1906.01861.pdf (2019)."},{"key":"822_CR44","first-page":"723","volume":"13","author":"A Gretton","year":"2012","unstructured":"Gretton, A., Borgwardt, K. M., Rasch, M. J., Sch\u00f6lkopf, B. & Smola, A. A Kernel two-sample test. J. Mach. Learn. Res. 13, 723\u2013773 (2012).","journal-title":"J. Mach. Learn. Res."},{"key":"822_CR45","doi-asserted-by":"publisher","first-page":"715","DOI":"10.1186\/1471-2458-13-715","volume":"13","author":"J Engdahl","year":"2013","unstructured":"Engdahl, J., Holm\u00e9n, A., Rosenqvist, M. & Str\u00f6mberg, U. Uptake of atrial fibrillation screening aiming at stroke prevention: geo-mapping of target population and non-participation. BMC Public Health 13, 715\u2013724 (2013).","journal-title":"BMC Public Health"},{"key":"822_CR46","doi-asserted-by":"crossref","first-page":"e2","DOI":"10.1161\/CIR.0b013e318245fac5","volume":"125","author":"WG Members","year":"2012","unstructured":"Members, W. G. et al. Heart disease and stroke statistics\u20142012 update: a report from the American heart association. Circulation 125, e2\u2013e220 (2012).","journal-title":"Circulation"},{"key":"822_CR47","doi-asserted-by":"publisher","first-page":"1545","DOI":"10.1016\/S0140-6736(16)31678-6","volume":"388","author":"T Vos","year":"2016","unstructured":"Vos, T. et al. Global, regional, and national incidence, prevalence, and years lived with disability for 310 diseases and injuries, 1990\u20132015: a systematic analysis for the global burden of disease study 2015. The Lancet 388, 1545\u20131602 (2016).","journal-title":"The Lancet"},{"key":"822_CR48","first-page":"629","volume":"9","author":"BJ Mortazavi","year":"2016","unstructured":"Mortazavi, B. J. et al. Analysis of machine learning techniques for heart failure readmissions. Circulation 9, 629\u2013640 (2016).","journal-title":"Circulation"},{"key":"822_CR49","doi-asserted-by":"publisher","first-page":"1443","DOI":"10.1162\/089976601750264965","volume":"13","author":"B Sch\u00f6lkopf","year":"2001","unstructured":"Sch\u00f6lkopf, B., Platt, J. C., Shawe-Taylor, J., Smola, A. J. & Williamson, R. C. Estimating the support of a high-dimensional distribution. Neural Comput. 13, 1443\u20131471 (2001).","journal-title":"Neural Comput."},{"key":"822_CR50","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3457607","volume":"54","author":"N Mehrabi","year":"2021","unstructured":"Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K. & Galstyan, A. A survey on bias and fairness in machine learning. ACM Comput. Surv. 54, 1\u201335 (2021).","journal-title":"ACM Comput. Surv."},{"key":"822_CR51","doi-asserted-by":"publisher","first-page":"2377","DOI":"10.1001\/jama.2019.18058","volume":"322","author":"RB Parikh","year":"2019","unstructured":"Parikh, R. B., Teeple, S. & Navathe, A. S. Addressing bias in artificial intelligence in health care. JAMA 322, 2377\u20132378 (2019).","journal-title":"JAMA"},{"key":"822_CR52","doi-asserted-by":"publisher","first-page":"99","DOI":"10.29012\/jpc.v1i1.567","volume":"1","author":"JP Reiter","year":"2009","unstructured":"Reiter, J. P. & Mitra, R. Estimating risks of identification disclosure in partially synthetic data. J. Privacy Confid. 1, 99\u2013110 (2009).","journal-title":"J. Privacy Confid."},{"key":"822_CR53","first-page":"531","volume":"18","author":"JP Reiter","year":"2002","unstructured":"Reiter, J. P. Satisfying disclosure restrictions with synthetic data sets. J. Off. Stat. 18, 531 (2002).","journal-title":"J. Off. Stat."},{"key":"822_CR54","doi-asserted-by":"crossref","unstructured":"Park, N. et al. Data synthesis based on generative adversarial networks. Proc. VLDB Endow. 11, 1071\u20131083 (2018).","DOI":"10.14778\/3231751.3231757"},{"key":"822_CR55","doi-asserted-by":"publisher","first-page":"270","DOI":"10.1162\/neco.1989.1.2.270","volume":"1","author":"RJ Williams","year":"1989","unstructured":"Williams, R. J. & Zipser, D. A learning algorithm for continually running fully recurrent neural networks. Neural Comput. 1, 270\u2013280 (1989).","journal-title":"Neural Comput."},{"key":"822_CR56","unstructured":"Fu, H. et al. in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 240\u2013250 (2019)."}],"container-title":["npj Digital Medicine"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.nature.com\/articles\/s41746-023-00822-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s41746-023-00822-x","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s41746-023-00822-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,10,19]],"date-time":"2024-10-19T13:47:31Z","timestamp":1729345651000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.nature.com\/articles\/s41746-023-00822-x"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,4,29]]},"references-count":56,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2023,12]]}},"alternative-id":["822"],"URL":"https:\/\/doi.org\/10.1038\/s41746-023-00822-x","relation":{},"ISSN":["2398-6352"],"issn-type":[{"value":"2398-6352","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,4,29]]},"assertion":[{"value":"12 October 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"5 April 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"29 April 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"The authors declare no competing interests.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"83"}}