{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,6]],"date-time":"2026-04-06T10:25:24Z","timestamp":1775471124884,"version":"3.50.1"},"reference-count":55,"publisher":"Oxford University Press (OUP)","issue":"4","license":[{"start":{"date-parts":[[2020,12,23]],"date-time":"2020-12-23T00:00:00Z","timestamp":1608681600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/100015703","name":"Philips Research North America","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100015703","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,3,18]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Objective<\/jats:title>\n                  <jats:p>This study seeks to develop a fully automated method of generating synthetic data from a real dataset that could be employed by medical organizations to distribute health data to researchers, reducing the need for access to real data. We hypothesize the application of Bayesian networks will improve upon the predominant existing method, medBGAN, in handling the complexity and dimensionality of healthcare data.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Materials and Methods<\/jats:title>\n                  <jats:p>We employed Bayesian networks to learn probabilistic graphical structures and simulated synthetic patient records from the learned structure. We used the University of California Irvine (UCI) heart disease and diabetes datasets as well as the MIMIC-III diagnoses database. We evaluated our method through statistical tests, machine learning tasks, preservation of rare events, disclosure risk, and the ability of a machine learning classifier to discriminate between the real and synthetic data.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>Our Bayesian network model outperformed or equaled medBGAN in all key metrics. Notable improvement was achieved in capturing rare variables and preserving association rules.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Discussion<\/jats:title>\n                  <jats:p>Bayesian networks generated data sufficiently similar to the original data with minimal risk of disclosure, while offering additional transparency, computational efficiency, and capacity to handle more data types in comparison to existing methods. We hope this method will allow healthcare organizations to efficiently disseminate synthetic health data to researchers, enabling them to generate hypotheses and develop analytical tools.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Conclusion<\/jats:title>\n                  <jats:p>We conclude the application of Bayesian networks is a promising option for generating realistic synthetic health data that preserves the features of the original data without compromising data privacy.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/jamia\/ocaa303","type":"journal-article","created":{"date-parts":[[2020,11,16]],"date-time":"2020-11-16T20:16:28Z","timestamp":1605557788000},"page":"801-811","source":"Crossref","is-referenced-by-count":85,"title":["Application of Bayesian networks to generate synthetic health data"],"prefix":"10.1093","volume":"28","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6123-7499","authenticated-orcid":false,"given":"Dhamanpreet","family":"Kaur","sequence":"first","affiliation":[{"name":"Massachusetts Institute of Technology, Cambridge, Massachusetts, USA"}]},{"given":"Matthew","family":"Sobiesk","sequence":"additional","affiliation":[{"name":"Massachusetts Institute of Technology, Cambridge, Massachusetts, USA"}]},{"given":"Shubham","family":"Patil","sequence":"additional","affiliation":[{"name":"Rochester Institute of Technology, Rochester, New York, USA"}]},{"given":"Jin","family":"Liu","sequence":"additional","affiliation":[{"name":"Clinical Informatics, Philips Research North America, Cambridge, Massachusetts, USA"}]},{"given":"Puran","family":"Bhagat","sequence":"additional","affiliation":[{"name":"Clinical Informatics, Philips Research North America, Cambridge, Massachusetts, USA"}]},{"given":"Amar","family":"Gupta","sequence":"additional","affiliation":[{"name":"Massachusetts Institute of Technology, Cambridge, Massachusetts, USA"}]},{"given":"Natasha","family":"Markuzon","sequence":"additional","affiliation":[{"name":"Clinical Informatics, Philips Research North America, Cambridge, Massachusetts, USA"}]}],"member":"286","published-online":{"date-parts":[[2020,12,23]]},"reference":[{"issue":"1","key":"2021031906034260200_ocaa303-B1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1007\/s00392-016-1025-6","article-title":"Electronic health records to facilitate clinical research","volume":"106","author":"Cowie","year":"2017","journal-title":"Clin Res Cardiol"},{"issue":"5","key":"2021031906034260200_ocaa303-B2","doi-asserted-by":"crossref","first-page":"757","DOI":"10.1093\/eurpub\/ckv149","article-title":"Will the trilogue on the EU Data Protection Regulation recognise the importance of health research?","volume":"25","author":"Coppen","year":"2015","journal-title":"Eur J Public Health"},{"issue":"1","key":"2021031906034260200_ocaa303-B3","doi-asserted-by":"crossref","first-page":"8","DOI":"10.1136\/amiajnl-2013-002061","article-title":"Don't take your EHR to heaven, donate it to science: legal and research policies for EHR post mortem","volume":"21","author":"Huser","year":"2014","journal-title":"J Am Med Inform Assoc"},{"issue":"9","key":"2021031906034260200_ocaa303-B4","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1080\/15265161.2010.494215","article-title":"Is deidentification sufficient to protect health privacy in research?","volume":"10","author":"Rothstein","year":"2010","journal-title":"Am J Bioeth"},{"issue":"12","key":"2021031906034260200_ocaa303-B5","doi-asserted-by":"crossref","first-page":"e28071","DOI":"10.1371\/journal.pone.0028071","article-title":"A systematic review of re-identification attacks on health data","volume":"6","author":"Emam","year":"2011","journal-title":"PLoS One"},{"issue":"3","key":"2021031906034260200_ocaa303-B6","doi-asserted-by":"crossref","first-page":"1694","DOI":"10.1166\/asl.2018.11139","article-title":"Anonymizing healthcare records: a study of privacy preserving data publishing techniques","volume":"24","author":"Jayabalan","year":"2018","journal-title":"Adv Sci Lett"},{"issue":"3","key":"2021031906034260200_ocaa303-B7","first-page":"95","article-title":"A review of synthetic data generation methods for privacy preserving data publishing","volume":"6","author":"Surendra","year":"2017","journal-title":"IJSTR"},{"issue":"2","key":"2021031906034260200_ocaa303-B8","first-page":"461","article-title":"Discussion: statistical disclosure limitation","volume":"9","author":"Rubin","year":"1993","journal-title":"J Offic Stat"},{"issue":"3","key":"2021031906034260200_ocaa303-B9","first-page":"441","article-title":"Using CART to generate partially synthetic public use microdata","volume":"21","author":"Reiter","year":"2005","journal-title":"J Offic Stat"},{"issue":"1","key":"2021031906034260200_ocaa303-B10","doi-asserted-by":"crossref","first-page":"183","DOI":"10.1214\/16-BA1047","article-title":"Dirichlet process mixture models for modeling and generating synthetic versions of nested categorical data","volume":"13","author":"Hu","year":"2018","journal-title":"Bayesian Anal"},{"key":"2021031906034260200_ocaa303-B11","first-page":"1","article-title":"Multiple imputation for statistical disclosure limitation","volume":"19","author":"Raghunathan","year":"2003","journal-title":"J Offc Stat"},{"issue":"1","key":"2021031906034260200_ocaa303-B12","first-page":"27","article-title":"Random forests for generating partially synthetic, categorical data","volume":"3","author":"Caiola","year":"2010","journal-title":"Trans Data Priv"},{"key":"2021031906034260200_ocaa303-B13","doi-asserted-by":"crossref","first-page":"148","DOI":"10.1007\/978-3-642-15838-4_14","volume-title":"Privacy in Statistical Databases","author":"Drechsler","year":"2010"},{"key":"2021031906034260200_ocaa303-B14","first-page":"1","article-title":"Bayesian mixture modeling for multivariate conditional distributions","volume":"14","author":"DeYoreo","year":"2016","journal-title":"J Stat Theory Pract"},{"key":"2021031906034260200_ocaa303-B15","author":"Choi","year":"2017"},{"key":"2021031906034260200_ocaa303-B16","author":"Park","year":"2013"},{"issue":"C","key":"2021031906034260200_ocaa303-B17","doi-asserted-by":"crossref","first-page":"439","DOI":"10.1016\/j.spasta.2015.07.008","article-title":"Bayesian marked point process modeling for generating fully synthetic public use data with point-referenced geography","volume":"14","author":"Quick","year":"2015","journal-title":"Spat Stat"},{"issue":"3","key":"2021031906034260200_ocaa303-B18","doi-asserted-by":"crossref","first-page":"230","DOI":"10.1093\/jamia\/ocx079","article-title":"Synthea: an approach, method, and software mechanism for generating synthetic patients and the synthetic electronic health care record","volume":"25","author":"Walonoski","year":"2018","journal-title":"J Am Med Inform Assoc"},{"issue":"1","key":"2021031906034260200_ocaa303-B19","doi-asserted-by":"crossref","DOI":"10.1186\/s12911-019-0793-0","article-title":"The validity of synthetic clinical data: a validation study of a leading synthetic data generator (Synthea) using clinical quality measures","volume":"19","author":"Chen","year":"2019","journal-title":"BMC Med Inform Decis Mak"},{"issue":"3","key":"2021031906034260200_ocaa303-B20","doi-asserted-by":"crossref","first-page":"228","DOI":"10.1093\/jamia\/ocy142","article-title":"Synthesizing electronic health records using improved generative adversarial networks","volume":"26","author":"Baowaly","year":"2019","journal-title":"J Am Med Inform Assoc"},{"issue":"2","key":"2021031906034260200_ocaa303-B21","doi-asserted-by":"crossref","first-page":"94","DOI":"10.7861\/futurehosp.6-2-94","article-title":"The potential for artificial intelligence in healthcare","volume":"6","author":"Davenport","year":"2019","journal-title":"Future Healthc J"},{"issue":"11","key":"2021031906034260200_ocaa303-B22","doi-asserted-by":"crossref","first-page":"e1002689","DOI":"10.1371\/journal.pmed.1002689","article-title":"Machine learning in medicine: addressing ethical challenges","volume":"15","author":"Vayena","year":"2018","journal-title":"PLoS Med"},{"issue":"10","key":"2021031906034260200_ocaa303-B23","doi-asserted-by":"crossref","first-page":"1419","DOI":"10.1093\/jamia\/ocy068","article-title":"Opportunities and challenges in developing deep learning models using electronic health records data: a systematic review","volume":"25","author":"Cao","year":"2018","journal-title":"J Am Med Inform Assoc"},{"issue":"Pt 1","key":"2021031906034260200_ocaa303-B24","first-page":"664","article-title":"A new machine learning classifier for high dimensional healthcare data","volume":"129","author":"Padman","year":"2007","journal-title":"Stud Health Technol Inform"},{"issue":"3","key":"2021031906034260200_ocaa303-B25","doi-asserted-by":"crossref","first-page":"553","DOI":"10.1093\/jamia\/ocv110","article-title":"Real-time prediction of mortality, readmission, and length of stay using electronic health record data","volume":"23","author":"Cai","year":"2016","journal-title":"J Am Med Inform Assoc"},{"key":"2021031906034260200_ocaa303-B26","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.jbi.2018.10.007","article-title":"CBN: constructing a clinical Bayesian network based on data from the electronic medical record","volume":"88","author":"Shen","year":"2018","journal-title":"J Biomed Inform"},{"issue":"e2","key":"2021031906034260200_ocaa303-B27","doi-asserted-by":"crossref","first-page":"e267","DOI":"10.1136\/amiajnl-2013-001865","article-title":"Patient-tailored prioritization for a pediatric care decision support system through machine learning","volume":"20","author":"Klann","year":"2013","journal-title":"J Am Med Inform Assoc"},{"key":"2021031906034260200_ocaa303-B28","volume-title":"Learning Bayesian Networks","author":"Neapolitan","year":"2003"},{"issue":"3","key":"2021031906034260200_ocaa303-B29","doi-asserted-by":"crossref","first-page":"1","DOI":"10.18637\/jss.v035.i03","article-title":"Learning Bayesian networks with the bnlearn R Package","volume":"35","author":"Scutari","year":"2010","journal-title":"J Stat Softw"},{"issue":"3","key":"2021031906034260200_ocaa303-B30","doi-asserted-by":"crossref","first-page":"197","DOI":"10.1007\/BF00994016","article-title":"Learning Bayesian networks: the combination of knowledge and statistical data","volume":"20","author":"Heckerman","year":"1995","journal-title":"Mach Learn"},{"key":"2021031906034260200_ocaa303-B31","first-page":"549","article-title":"Using Bayesian networks to create synthetic data","volume":"25","author":"Young","year":"2009","journal-title":"J Off Stat"},{"issue":"4","key":"2021031906034260200_ocaa303-B32","doi-asserted-by":"crossref","first-page":"1423","DOI":"10.1145\/3134428","article-title":"PrivBayes: private data release via Bayesian networks","volume":"42","author":"Zhang","year":"2017","journal-title":"ACM Trans Database Syst"},{"issue":"12","key":"2021031906034260200_ocaa303-B33","doi-asserted-by":"crossref","first-page":"3232","DOI":"10.1016\/j.csda.2011.06.006","article-title":"An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets","volume":"55","author":"Drechsler","year":"2011","journal-title":"Comput Stat Data Anal"},{"key":"2021031906034260200_ocaa303-B34","author":"Mclachlan","year":"2018"},{"issue":"2","key":"2021031906034260200_ocaa303-B35","doi-asserted-by":"crossref","first-page":"e16492","DOI":"10.2196\/16492","article-title":"Analyzing medical research results based on synthetic data and their relation to real data results: systematic comparison from five observational studies","volume":"8","author":"Reiner-Benaim","year":"2020","journal-title":"JMIR Med Inform"},{"key":"2021031906034260200_ocaa303-B36","author":"Pollard","year":"2016"},{"key":"2021031906034260200_ocaa303-B37","author":"Dua","year":"1988"},{"key":"2021031906034260200_ocaa303-B38","author":"Dua","year":"2014"},{"issue":"5","key":"2021031906034260200_ocaa303-B39","doi-asserted-by":"crossref","first-page":"304","DOI":"10.1016\/0002-9149(89)90524-9","article-title":"International application of a new probability algorithm for the diagnosis of coronary artery disease","volume":"64","author":"Detrano","year":"1989","journal-title":"Am J Cardiol"},{"key":"2021031906034260200_ocaa303-B40","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1155\/2014\/781670","article-title":"Impact of HbA1c measurement on hospital readmission rates: analysis of 70,000 clinical database patient records","volume":"2014","author":"Strack","year":"2014","journal-title":"BioMed Res Int"},{"issue":"1","key":"2021031906034260200_ocaa303-B41","doi-asserted-by":"crossref","first-page":"31","DOI":"10.1007\/s10994-006-6889-7","article-title":"The max-min hill-climbing Bayesian network structure learning algorithm","volume":"65","author":"Tsamardinos","year":"2006","journal-title":"Mach Learn"},{"key":"2021031906034260200_ocaa303-B42","doi-asserted-by":"crossref","first-page":"S14","DOI":"10.1186\/1471-2105-13-S15-S14","article-title":"Empirical evaluation of scoring functions for Bayesian network model selection","volume":"13 (Suppl 15","author":"Liu","year":"2012","journal-title":"BMC Bioinform"},{"key":"2021031906034260200_ocaa303-B43","first-page":"549","volume-title":"Introduction to Algorithms","author":"Cormen","year":"2009","edition":"3rd ed."},{"key":"2021031906034260200_ocaa303-B44","volume-title":"Understanding Bayesian Networks with Examples in R","author":"Scutari","year":"2017"},{"issue":"6","key":"2021031906034260200_ocaa303-B45","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3127881","article-title":"Mining Electronic Health Records (EHRs): a survey","volume":"50","author":"Yadav","year":"2018","journal-title":"ACM Comput Surv"},{"key":"2021031906034260200_ocaa303-B46","first-page":"322","article-title":"Generation of realistic synthetic validation healthcare datasets using generative adversarial networks","volume":"272","author":"Bilici","year":"2020","journal-title":"Stud Health Technol Inform"},{"key":"2021031906034260200_ocaa303-B47","doi-asserted-by":"crossref","first-page":"41","DOI":"10.1007\/978-3-319-09885-2_4","volume-title":"Advanced Research in Data Privacy","author":"Matwin","year":"2015"},{"issue":"1","key":"2021031906034260200_ocaa303-B48","doi-asserted-by":"crossref","first-page":"108","DOI":"10.1186\/s12874-020-00977-1","article-title":"Generation and evaluation of synthetic patient data","volume":"20","author":"Goncalves","year":"2020","journal-title":"BMC Med Res Methodol"},{"key":"2021031906034260200_ocaa303-B49","author":"MacQueen","year":"1967"},{"key":"2021031906034260200_ocaa303-B50","first-page":"880","volume-title":"Advanced Engineering Mathematics","author":"Kreyszig","year":"1979"},{"key":"2021031906034260200_ocaa303-B51","author":"Dagum","year":"1992"},{"key":"2021031906034260200_ocaa303-B52","author":"Gal","year":"2015"},{"issue":"487","key":"2021031906034260200_ocaa303-B53","doi-asserted-by":"crossref","first-page":"1042","DOI":"10.1198\/jasa.2009.tm08439","article-title":"Nonparametric Bayes modeling of multivariate categorical data","volume":"104","author":"Dunson","year":"2009","journal-title":"J Am Stat Assoc"},{"issue":"1","key":"2021031906034260200_ocaa303-B54","first-page":"405","article-title":"Releasing multiply-imputed synthetic data generated in two stages to protect confidentiality","volume":"20","author":"Reiter","year":"2010","journal-title":"Stat Sin"},{"key":"2021031906034260200_ocaa303-B55","author":"Camino","year":"2018"}],"container-title":["Journal of the American Medical Informatics Association"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/jamia\/article-pdf\/28\/4\/801\/36642121\/ocaa303.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"http:\/\/academic.oup.com\/jamia\/article-pdf\/28\/4\/801\/36642121\/ocaa303.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,3,19]],"date-time":"2021-03-19T06:05:03Z","timestamp":1616133903000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/jamia\/article\/28\/4\/801\/6046159"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,12,23]]},"references-count":55,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2020,12,23]]},"published-print":{"date-parts":[[2021,3,18]]}},"URL":"https:\/\/doi.org\/10.1093\/jamia\/ocaa303","relation":{},"ISSN":["1527-974X"],"issn-type":[{"value":"1527-974X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2021,4,1]]},"published":{"date-parts":[[2020,12,23]]}}}