{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,9]],"date-time":"2026-06-09T04:42:41Z","timestamp":1780980161085,"version":"3.54.1"},"reference-count":35,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2025,8,16]],"date-time":"2025-08-16T00:00:00Z","timestamp":1755302400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,8,16]],"date-time":"2025-08-16T00:00:00Z","timestamp":1755302400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"NIH","award":["R01MH126137, R41TR004515, T32GM141746"],"award-info":[{"award-number":["R01MH126137, R41TR004515, T32GM141746"]}]},{"name":"NSF","award":["1916425, 1734853, 1636840"],"award-info":[{"award-number":["1916425, 1734853, 1636840"]}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["npj Digit. Med."],"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>The sensitive nature of electronic health records (EHR) and wearable data presents challenges in sharing biomedical resources while minimizing re-identification risks. This article introduces an end-to-end, titratable pipeline that generates privacy-preserving \u201cdigital twin\u201d datasets from complex EHR and wearable-device records (Apple Watch data from 3029 participants) using DataSifter and Synthetic Data Vault (SDV) methods. 
Various obfuscation levels were applied (DataSifter: small, medium, large; SDV: CTGAN, Gaussian Copula) and benchmarked using utility (statistical fidelity, machine learning performance) and privacy (re-identification risk, detection likelihood) metrics. The highest-obfuscation DataSifter twin delivered the strongest privacy protection (0.83) while preserving key statistical and predictive signals (83.1% confidence interval overlap in regression models), outperforming SDV, particularly for longitudinal data. Despite declining performance in machine learning tasks with higher obfuscation, utility was generally preserved. The study underscores the importance of digital twin datasets and highlights DataSifter\u2019s adaptability in privacy-utility trade-offs, advocating its utility for secure data sharing.<\/jats:p>","DOI":"10.1038\/s41746-025-01935-1","type":"journal-article","created":{"date-parts":[[2025,8,16]],"date-time":"2025-08-16T17:28:42Z","timestamp":1755365322000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":4,"title":["Medical data sharing and synthetic clinical data generation \u2013 maximizing biomedical resource utilization and minimizing participant re-identification 
risks"],"prefix":"10.1038","volume":"8","author":[{"given":"Simeone","family":"Marino","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Ruth","family":"Cassidy","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Joseph","family":"Nanni","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Yuxuan","family":"Wang","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Yipeng","family":"Liu","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Mingyi","family":"Tang","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Yuan","family":"Yuan","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Toby","family":"Chen","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Anik","family":"Sinha","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Balaji","family":"Pandian","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Ivo D.","family":"Dinov","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Michael L.","family":"Burns","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2025,8,16]]},"reference":[{"key":"1935_CR1","doi-asserted-by":"publisher","first-page":"1629","DOI":"10.1111\/cts.13055","volume":"14","author":"CP Austin","year":"2021","unstructured":"Austin, C. P. Opportunities and challenges in translational science. Clin. Transl. Sci. 14, 1629\u20131647 (2021).","journal-title":"Clin. Transl. 
Sci."},{"key":"1935_CR2","doi-asserted-by":"publisher","first-page":"108","DOI":"10.1186\/s12874-020-00977-1","volume":"20","author":"A Goncalves","year":"2020","unstructured":"Goncalves, A. et al. Generation and evaluation of synthetic patient data. BMC Med. Res. Methodol. 20, 108 (2020).","journal-title":"BMC Med. Res. Methodol"},{"key":"1935_CR3","doi-asserted-by":"publisher","first-page":"4684","DOI":"10.1007\/s11227-018-2686-x","volume":"75","author":"G Ayoade","year":"2019","unstructured":"Ayoade, G. et al. Secure data processing for IoT middleware systems. J. Supercomput. 75, 4684\u20134709 (2019).","journal-title":"J. Supercomput."},{"key":"1935_CR4","doi-asserted-by":"crossref","unstructured":"Hong, Z., Li, Z. & Xia, Y. SDVisor: Secure debug enclave with hypervisor. in 2019 IEEE International Conference on Service-Oriented System Engineering (SOSE) (IEEE, 2019).","DOI":"10.1109\/SOSE.2019.00036"},{"key":"1935_CR5","doi-asserted-by":"crossref","unstructured":"Gentry, C. A. Fully Homomorphic Encryption Scheme. (Proquest, Umi Dissertation Publishing, 2011).","DOI":"10.1007\/978-3-642-20465-4_9"},{"key":"1935_CR6","doi-asserted-by":"crossref","unstructured":"Wood, A., Shpilrain, V., Najarian, K., Mostashari, A. & Kahrobaei, D. Private-key fully homomorphic encryption for private classification. in Mathematical Software \u2013 ICMS 2018 475\u2013481 (Springer International Publishing, Cham, 2018).","DOI":"10.1007\/978-3-319-96418-8_56"},{"key":"1935_CR7","doi-asserted-by":"crossref","unstructured":"Dwork, C. Differential privacy: A survey of results. in Lecture Notes in Computer Science 1\u201319 (Springer Berlin Heidelberg, Berlin, Heidelberg, 2008).","DOI":"10.1007\/978-3-540-79228-4_1"},{"key":"1935_CR8","doi-asserted-by":"crossref","unstructured":"Dwork, C. The differential privacy frontier (extended abstract). 
in Theory of Cryptography 496\u2013502 (Springer Berlin Heidelberg, Berlin, Heidelberg, 2009).","DOI":"10.1007\/978-3-642-00457-5_29"},{"key":"1935_CR9","first-page":"17","volume":"7","author":"C Dwork","year":"2017","unstructured":"Dwork, C., McSherry, F., Nissim, K. & Smith, A. Calibrating noise to sensitivity in private data analysis. J. Priv. Confid. 7, 17\u201351 (2017).","journal-title":"J. Priv. Confid."},{"key":"1935_CR10","doi-asserted-by":"crossref","unstructured":"Dwork, C. & Roth, A. The Algorithmic Foundations of Differential Privacy. (now, Hanover, MD, 2014).","DOI":"10.1561\/9781601988195"},{"key":"1935_CR11","doi-asserted-by":"crossref","unstructured":"Xiao, X. iReduct: Differential privacy with reduced relative errors. in Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data (ACM, 2011).","DOI":"10.1145\/1989323.1989348"},{"key":"1935_CR12","doi-asserted-by":"publisher","first-page":"1200","DOI":"10.1109\/TKDE.2010.247","volume":"23","author":"X Xiao","year":"2011","unstructured":"Xiao, X., Wang, G. & Gehrke, J. Differential privacy via wavelet transforms. IEEE Trans. Knowl. Data Eng. 23, 1200\u20131214 (2011).","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"1935_CR13","doi-asserted-by":"publisher","first-page":"636","DOI":"10.1126\/science.aaa9375","volume":"349","author":"C Dwork","year":"2015","unstructured":"Dwork, C. et al. STATISTICS. The reusable holdout: Preserving validity in adaptive data analysis. Science 349, 636\u2013638 (2015).","journal-title":"Science"},{"key":"1935_CR14","doi-asserted-by":"publisher","first-page":"102","DOI":"10.1136\/amiajnl-2012-001047","volume":"20","author":"C Dwork","year":"2013","unstructured":"Dwork, C. & Pottenger, R. Toward practicing privacy. J. Am. Med. Inform. Assoc. 20, 102\u2013108 (2013).","journal-title":"J. Am. Med. Inform. 
Assoc."},{"key":"1935_CR15","doi-asserted-by":"publisher","first-page":"9561","DOI":"10.1007\/s10586-018-2723-9","volume":"22","author":"G Prabu Kanna","year":"2019","unstructured":"Prabu Kanna, G. & Vasudevan, V. A fully homomorphic\u2013elliptic curve cryptography based encryption algorithm for ensuring the privacy preservation of the cloud data. Clust. Comput 22, 9561\u20139569 (2019).","journal-title":"Clust. Comput"},{"key":"1935_CR16","doi-asserted-by":"publisher","first-page":"1442","DOI":"10.1111\/j.1475-6773.2010.01140.x","volume":"45","author":"S Rosenbaum","year":"2010","unstructured":"Rosenbaum, S. Data governance and stewardship: designing data stewardship entities and advancing data access. Health Serv. Res. 45, 1442\u20131455 (2010).","journal-title":"Health Serv. Res."},{"key":"1935_CR17","doi-asserted-by":"publisher","first-page":"752","DOI":"10.1109\/TKDE.2013.38","volume":"26","author":"S Bajaj","year":"2014","unstructured":"Bajaj, S. & Sion, R. TrustedDB: A trusted hardware-based database with privacy and data confidentiality. IEEE Trans. Knowl. Data Eng. 26, 752\u2013765 (2014).","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"1935_CR18","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/2799647","volume":"33","author":"A Baumann","year":"2015","unstructured":"Baumann, A., Peinado, M. & Hunt, G. Shielding applications from an untrusted cloud with Haven. ACM Trans. Comput. Syst. 33, 1\u201326 (2015).","journal-title":"ACM Trans. Comput. Syst."},{"key":"1935_CR19","doi-asserted-by":"crossref","unstructured":"Gupta, C. P. & Sharma, I. A fully homomorphic encryption scheme with symmetric keys with application to private data processing in clouds. In 2013 Fourth International Conference on the Network of the Future (NoF) (IEEE, 2013).","DOI":"10.1109\/NOF.2013.6724526"},{"key":"1935_CR20","doi-asserted-by":"crossref","unstructured":"Johnson, N., Near, J. P., Hellerstein, J. M. & Song, D. 
Chorus: A programming framework for building scalable differential privacy mechanisms. in 2020 IEEE European Symposium on Security and Privacy (EuroS&P) (IEEE, 2020).","DOI":"10.1109\/EuroSP48549.2020.00041"},{"key":"1935_CR21","doi-asserted-by":"publisher","first-page":"104404","DOI":"10.1016\/j.jbi.2023.104404","volume":"143","author":"C Sun","year":"2023","unstructured":"Sun, C., van Soest, J. & Dumontier, M. Generating synthetic personal health data using conditional generative adversarial networks combining with differential privacy. J. Biomed. Inform. 143, 104404 (2023).","journal-title":"J. Biomed. Inform."},{"key":"1935_CR22","unstructured":"Xie, L., Lin, K., Wang, S., Wang, F. & Zhou, J. Differentially private generative adversarial network. arXiv [cs.LG] (2018)."},{"key":"1935_CR23","doi-asserted-by":"publisher","first-page":"485","DOI":"10.1016\/j.ins.2021.12.018","volume":"586","author":"A Torfi","year":"2022","unstructured":"Torfi, A., Fox, E. A. & Reddy, C. K. Differentially private synthetic medical data generation using convolutional GANs. Inf. Sci. 586, 485\u2013500 (2022).","journal-title":"Inf. Sci."},{"key":"1935_CR24","doi-asserted-by":"crossref","unstructured":"Torkzadehmahani, R., Kairouz, P. & Paten, B. DP-CGAN: Differentially private synthetic data and label generation. arXiv [cs.LG] (2020).","DOI":"10.1109\/CVPRW.2019.00018"},{"key":"1935_CR25","doi-asserted-by":"publisher","first-page":"64","DOI":"10.1038\/s41746-022-00610-z","volume":"5","author":"R Laubenbacher","year":"2022","unstructured":"Laubenbacher, R. et al. Building digital twins of the human immune system: toward a roadmap. NPJ Digit. Med. 5, 64 (2022).","journal-title":"NPJ Digit. Med."},{"key":"1935_CR26","doi-asserted-by":"publisher","first-page":"77","DOI":"10.1038\/s41746-024-01073-0","volume":"7","author":"E Katsoulakis","year":"2024","unstructured":"Katsoulakis, E. et al. Digital twins for health: a scoping review. NPJ Digit. Med. 
7, 77 (2024).","journal-title":"NPJ Digit. Med."},{"key":"1935_CR27","doi-asserted-by":"crossref","unstructured":"Patki, N., Wedge, R. & Veeramachaneni, K. The synthetic data vault. in 2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA) (IEEE, 2016).","DOI":"10.1109\/DSAA.2016.49"},{"key":"1935_CR28","unstructured":"Dinov, I., Vandervest, J. & Marino, S. Electronic Medical Record Datasifter. (Google Patents, 2020)."},{"key":"1935_CR29","doi-asserted-by":"crossref","unstructured":"Johnson, N., Near, J. P., Hellerstein, J. M. & Song, D. Chorus: a programming framework for building scalable differential privacy mechanisms. in 2020 IEEE European Symposium on Security and Privacy (EuroS&P) 535\u2013551 (IEEE, 2020).","DOI":"10.1109\/EuroSP48549.2020.00041"},{"key":"1935_CR30","doi-asserted-by":"publisher","first-page":"e16492","DOI":"10.2196\/16492","volume":"8","author":"A Reiner Benaim","year":"2020","unstructured":"Reiner Benaim, A. et al. Analyzing medical research results based on synthetic data and their relation to real data results: Systematic comparison from five observational studies. JMIR Med. Inform. 8, e16492 (2020).","journal-title":"JMIR Med. Inform."},{"key":"1935_CR31","doi-asserted-by":"publisher","first-page":"e707","DOI":"10.1016\/S2589-7500(21)00138-2","volume":"3","author":"JR Golbus","year":"2021","unstructured":"Golbus, J. R., Pescatore, N. A., Nallamothu, B. K., Shah, N. & Kheterpal, S. Wearable device signals and home blood pressure data across age, sex, race, ethnicity, and clinical phenotypes in the Michigan Predictive Activity & Clinical Trajectories in Health (MIPACT) study: a prospective, community-based observational study. Lancet Digit. Health 3, e707\u2013e715 (2021).","journal-title":"Lancet Digit. Health"},{"key":"1935_CR32","unstructured":"The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) Statement: guidelines for reporting observational studies. 
https:\/\/www.equator-network.org\/reporting-guidelines\/strobe\/."},{"key":"1935_CR33","doi-asserted-by":"publisher","first-page":"249","DOI":"10.1080\/00949655.2018.1545228","volume":"89","author":"S Marino","year":"2018","unstructured":"Marino, S. et al. HDDA: DataSifter: statistical obfuscation of electronic health records and other sensitive datasets. J. Stat. Comput. Simul. 89, 249\u2013271 (2018).","journal-title":"J. Stat. Comput. Simul."},{"key":"1935_CR34","doi-asserted-by":"publisher","first-page":"174830262110653","DOI":"10.1177\/17483026211065379","volume":"16","author":"N Zhou","year":"2022","unstructured":"Zhou, N., Wang, L., Marino, S., Zhao, Y. & Dinov, I. D. DataSifter II: Partially synthetic data sharing of sensitive information containing time-varying correlated observations. J. Algorithm Comput. Technol. 16, 174830262110653 (2022).","journal-title":"J. Algorithm Comput. Technol."},{"key":"1935_CR35","doi-asserted-by":"publisher","DOI":"10.1007\/s10916-022-01880-6","volume":"46","author":"N Zhou","year":"2022","unstructured":"Zhou, N., Wu, Q., Wu, Z., Marino, S. & Dinov, I. D. DataSifterText: Partially synthetic text generation for sensitive clinical notes. J. Med. Syst. 46, 96 (2022).","journal-title":"J. Med. 
Syst."}],"container-title":["npj Digital Medicine"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.nature.com\/articles\/s41746-025-01935-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s41746-025-01935-1","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s41746-025-01935-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,9]],"date-time":"2025-09-09T15:14:33Z","timestamp":1757430873000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.nature.com\/articles\/s41746-025-01935-1"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,8,16]]},"references-count":35,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2025,12]]}},"alternative-id":["1935"],"URL":"https:\/\/doi.org\/10.1038\/s41746-025-01935-1","relation":{},"ISSN":["2398-6352"],"issn-type":[{"value":"2398-6352","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,8,16]]},"assertion":[{"value":"7 March 2025","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"5 August 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"16 August 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"SM and ID developed the DataSifter statistical obfuscator technique, which is patented by the University of Michigan, and exclusively licensed to GrayRain. There is no funding support from this company.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"526"}}