{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,18]],"date-time":"2025-12-18T19:58:41Z","timestamp":1766087921645,"version":"3.37.3"},"reference-count":23,"publisher":"Oxford University Press (OUP)","issue":"1","license":[{"start":{"date-parts":[[2022,10,13]],"date-time":"2022-10-13T00:00:00Z","timestamp":1665619200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/pages\/standard-publication-reuse-rights"}],"funder":[{"DOI":"10.13039\/501100012023","name":"Sanquin Blood Supply Foundation","doi-asserted-by":"publisher","award":["PPOC-16-27"],"award-info":[{"award-number":["PPOC-16-27"]}],"id":[{"id":"10.13039\/501100012023","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,12,13]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Objective<\/jats:title>\n                  <jats:p>Privacy is a concern whenever individual patient health data is exchanged for scientific research. We propose using mixed sum-product networks (MSPNs) as private representations of data and take samples from the network to generate synthetic data that can be shared for subsequent statistical analysis. This anonymization method was evaluated with respect to privacy and information loss.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Materials and methods<\/jats:title>\n                  <jats:p>Using a simulation study, information loss was quantified by assessing whether synthetic data could reproduce regression parameters obtained from the original data. Predictors variable types were varied between continuous, count, categorical, and mixed discrete-continuous. Additionally, we measured whether the MSPN approach successfully anonymizes the data by removing associations between background and sensitive information for these datasets.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>The synthetic data generated with MSPNs yielded regression results highly similar to those generated with original data, differing less than 5% in most simulation scenarios. Standard errors increased compared to the original data. Particularly for smaller datasets (1000 records), this resulted in a discrepancy between the estimated and empirical standard errors. Sensitive values could no longer be inferred from background information for at least 99% of tested individuals.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Discussion<\/jats:title>\n                  <jats:p>The proposed anonymization approach yields very promising results. Further research is required to evaluate its performance with other types of data and analyses, and to predict how user parameter choices affect a bias-privacy trade-off.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Conclusion<\/jats:title>\n                  <jats:p>Generating synthetic data from MSPNs is a promising, easy-to-use approach for anonymization of sensitive individual health data that yields informative and private data.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/jamia\/ocac184","type":"journal-article","created":{"date-parts":[[2022,10,13]],"date-time":"2022-10-13T19:06:49Z","timestamp":1665688009000},"page":"16-25","source":"Crossref","is-referenced-by-count":4,"title":["Generating synthetic mixed discrete-continuous health records with mixed sum-product networks"],"prefix":"10.1093","volume":"30","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2761-1245","authenticated-orcid":false,"given":"Shannon K S","family":"Kroes","sequence":"first","affiliation":[{"name":"Transfusion Technology Assessment Group, Donor Medicine Research Department, Sanquin Research , Amsterdam, The Netherlands"},{"name":"Leiden Institute of Advanced Computer Science, Computer Science, Leiden University , Leiden, The Netherlands"},{"name":"Department of Clinical Epidemiology, Leiden University Medical Center , Leiden, The Netherlands"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0510-3549","authenticated-orcid":false,"given":"Matthijs","family":"van Leeuwen","sequence":"additional","affiliation":[{"name":"Leiden Institute of Advanced Computer Science, Computer Science, Leiden University , Leiden, The Netherlands"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9238-6999","authenticated-orcid":false,"given":"Rolf H H","family":"Groenwold","sequence":"additional","affiliation":[{"name":"Department of Clinical Epidemiology, Leiden University Medical Center , Leiden, The Netherlands"},{"name":"Department of Biomedical Data Sciences, Leiden University Medical Center , Leiden, The Netherlands"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1682-7817","authenticated-orcid":false,"given":"Mart P","family":"Janssen","sequence":"additional","affiliation":[{"name":"Transfusion Technology Assessment Group, Donor Medicine Research Department, Sanquin Research , Amsterdam, The Netherlands"},{"name":"Leiden Institute of Advanced Computer Science, Computer Science, Leiden University , Leiden, The Netherlands"}]}],"member":"286","published-online":{"date-parts":[[2022,10,13]]},"reference":[{"year":"2020","author":"Torfi","key":"2022121408280652800_ocac184-B1"},{"key":"2022121408280652800_ocac184-B2","doi-asserted-by":"crossref","first-page":"406","DOI":"10.1007\/978-3-030-45385-5_36","volume-title":"International Work-Conference on Bioinformatics and Biomedical Engineering","author":"Piacentino","year":"2020"},{"issue":"3","key":"2022121408280652800_ocac184-B3","doi-asserted-by":"crossref","first-page":"228","DOI":"10.1093\/jamia\/ocy142","article-title":"Synthesizing electronic health records using improved generative adversarial networks","volume":"26","author":"Baowaly","year":"2019","journal-title":"J Am Med Inform Assoc"},{"key":"2022121408280652800_ocac184-B4","first-page":"253","article-title":"PeGS: perturbed gibbs samplers that generate privacy-compliant synthetic data","author":"Park","year":"2014","journal-title":"Trans. Data Priv"},{"key":"2022121408280652800_ocac184-B5","doi-asserted-by":"crossref","first-page":"148","DOI":"10.1007\/978-3-642-15838-4_14","volume-title":"International Conference on Privacy in Statistical Databases","author":"Drechsler","year":"2010"},{"issue":"4","key":"2022121408280652800_ocac184-B6","doi-asserted-by":"crossref","first-page":"801","DOI":"10.1093\/jamia\/ocaa303","article-title":"Application of Bayesian networks to generate synthetic health data","volume":"28","author":"Kaur","year":"2021","journal-title":"J Am Med Inform Assoc"},{"key":"2022121408280652800_ocac184-B7","first-page":"1677","volume-title":"Proceedings of the VLDB Endowment International Conference on Very Large Data Bases","author":"Li","year":"2014"},{"key":"2022121408280652800_ocac184-B8","first-page":"1","volume-title":"International Conference on Theory and Applications of Models of Computation","author":"Dwork","year":"2008"},{"key":"2022121408280652800_ocac184-B9","doi-asserted-by":"crossref","first-page":"325","DOI":"10.1007\/978-3-642-24861-0_22","volume-title":"International Conference on Information Security","author":"Lee","year":"2011"},{"first-page":"689","year":"2011","author":"Poon","key":"2022121408280652800_ocac184-B10"},{"key":"2022121408280652800_ocac184-B11","first-page":"44","article-title":"Sum-product networks: a survey","author":"Sanchez-Cauce","year":"2021","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"2022121408280652800_ocac184-B12","first-page":"3828","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","author":"Molina","year":"2018"},{"key":"2022121408280652800_ocac184-B13","doi-asserted-by":"crossref","first-page":"45","DOI":"10.1016\/j.cosrev.2019.02.001","article-title":"Privacy preserving publication of relational and transaction data: survey on the anonymization of patient data","volume":"32","author":"Puri","year":"2019","journal-title":"Comput Sci Rev"},{"issue":"1","key":"2022121408280652800_ocac184-B14","doi-asserted-by":"crossref","first-page":"3\u2013es","DOI":"10.1145\/1217299.1217302","article-title":"l-diversity: privacy beyond k-anonymity","volume":"1","author":"Machanavajjhala","year":"2007","journal-title":"ACM Trans Knowl Discov Data"},{"issue":"05","key":"2022121408280652800_ocac184-B15","doi-asserted-by":"crossref","first-page":"557","DOI":"10.1142\/S0218488502001648","article-title":"k-anonymity: a model for protecting privacy","volume":"10","author":"Sweeney","year":"2002","journal-title":"Int J Unc Fuzz Knowl Based Syst"},{"first-page":"139","year":"2006","author":"Xiao","key":"2022121408280652800_ocac184-B16"},{"issue":"3","key":"2022121408280652800_ocac184-B17","doi-asserted-by":"crossref","first-page":"561","DOI":"10.1109\/TKDE.2010.236","article-title":"Slicing: a new approach for privacy preserving data publishing","volume":"24","author":"Li","year":"2012","journal-title":"IEEE Trans Knowl Data Eng"},{"year":"2020","author":"Terrovitis","key":"2022121408280652800_ocac184-B18","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1207.0135"},{"issue":"4","key":"2022121408280652800_ocac184-B19","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/1749603.1749605","article-title":"Privacy-preserving data publishing: a survey of recent developments","volume":"42","author":"Fung","year":"2010","journal-title":"ACM Comput Surv"},{"year":"2019","author":"Molina","key":"2022121408280652800_ocac184-B20","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1901.03704"},{"author":"Arthur","key":"2022121408280652800_ocac184-B21","first-page":"1027"},{"issue":"2","key":"2022121408280652800_ocac184-B22","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1177\/1460458220983398","article-title":"Evaluating privacy of individuals in medical data","volume":"27","author":"Kroes","year":"2021","journal-title":"Health Inform J"},{"key":"2022121408280652800_ocac184-B23","doi-asserted-by":"crossref","first-page":"473","DOI":"10.1145\/1376616.1376666","volume-title":"Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data","author":"Li","year":"2008"}],"container-title":["Journal of the American Medical Informatics Association"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/jamia\/article-pdf\/30\/1\/16\/47829656\/ocac184.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/jamia\/article-pdf\/30\/1\/16\/47829656\/ocac184.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,14]],"date-time":"2022-12-14T11:50:21Z","timestamp":1671018621000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/jamia\/article\/30\/1\/16\/6760688"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,10,13]]},"references-count":23,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2022,10,13]]},"published-print":{"date-parts":[[2022,12,13]]}},"URL":"https:\/\/doi.org\/10.1093\/jamia\/ocac184","relation":{},"ISSN":["1067-5027","1527-974X"],"issn-type":[{"type":"print","value":"1067-5027"},{"type":"electronic","value":"1527-974X"}],"subject":[],"published-other":{"date-parts":[[2023,1,1]]},"published":{"date-parts":[[2022,10,13]]}}}