{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,24]],"date-time":"2026-04-24T15:16:41Z","timestamp":1777043801153,"version":"3.51.4"},"reference-count":59,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2025,3,18]],"date-time":"2025-03-18T00:00:00Z","timestamp":1742256000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Digit. Health"],"abstract":"<jats:p>Rare disease research faces significant challenges due to limited patient data, strict privacy regulations, and the need for diverse datasets to develop accurate AI-driven diagnostics and treatments. Synthetic data\u2014artificially generated datasets that mimic patient data while preserving privacy\u2014offer a promising solution to these issues. This article explores how synthetic data can bridge data gaps, enabling the training of AI models, simulating clinical trials, and facilitating cross-border collaborations in rare disease research. We examine case studies where synthetic data successfully replicated patient characteristics, supported predictive modelling, and ensured compliance with regulations like GDPR and HIPAA. 
While acknowledging current limitations, we discuss synthetic data\u2019s potential to revolutionise rare disease research by enhancing data availability and privacy, enabling more efficient and effective research efforts in diagnosing, treating, and managing rare diseases globally.<\/jats:p>","DOI":"10.3389\/fdgth.2025.1563991","type":"journal-article","created":{"date-parts":[[2025,3,18]],"date-time":"2025-03-18T07:00:56Z","timestamp":1742281256000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":35,"title":["Synthetic data generation: a privacy-preserving approach to accelerate rare disease research"],"prefix":"10.3389","volume":"7","author":[{"given":"Jorge M.","family":"Mendes","sequence":"first","affiliation":[]},{"given":"Aziz","family":"Barbar","sequence":"additional","affiliation":[]},{"given":"Marwa","family":"Refaie","sequence":"additional","affiliation":[]}],"member":"1965","published-online":{"date-parts":[[2025,3,18]]},"reference":[{"key":"B1","doi-asserted-by":"publisher","first-page":"185","DOI":"10.1007\/s40290-020-00332-1","article-title":"Establishing patient registries for rare diseases: rationale and challenges","volume":"34","author":"Boulanger","year":"2020","journal-title":"Pharmaceut Med"},{"key":"B2","doi-asserted-by":"publisher","first-page":"2011","DOI":"10.1093\/ndt\/gfp095","article-title":"Eunefron, the european network for the study of orphan nephropathies","volume":"24","author":"Devuyst","year":"2009","journal-title":"Nephrol Dial Transplant"},{"key":"B3","doi-asserted-by":"publisher","first-page":"416","DOI":"10.1017\/s0266462314000464","article-title":"Generating health technology assessment evidence for rare diseases","volume":"30","author":"Facey","year":"2014","journal-title":"Int J Technol Assess Health Care"},{"key":"B4","doi-asserted-by":"publisher","first-page":"105313","DOI":"10.1016\/j.compbiomed.2022.105313","article-title":"Fairvasc: a semantic web 
approach to rare disease registry integration","volume":"145","author":"Mcglinn","year":"2022","journal-title":"Comput Biol Med"},{"key":"B5","doi-asserted-by":"publisher","first-page":"23","DOI":"10.1016\/j.atg.2014.04.003","article-title":"Rare disease research: breaking the privacy barrier","volume":"3","author":"Mascalzoni","year":"2014","journal-title":"Appl Transl Genomics"},{"key":"B6","doi-asserted-by":"publisher","first-page":"186","DOI":"10.1038\/s41746-023-00927-3","article-title":"Harnessing the power of synthetic data in healthcare: innovation, application, and privacy","volume":"6","author":"Giuffr\u00e9","year":"2023","journal-title":"npj Digit Med"},{"key":"B7","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1109\/SmartNets58706.2023.10215825","article-title":"Leveraging generative AI models for synthetic data generation in healthcare: balancing research and privacy","volume-title":"2023 International Conference on Smart Applications, Communications and Networking (SmartNets)","author":"Jadon","year":"2023"},{"key":"B8","doi-asserted-by":"publisher","first-page":"1190","DOI":"10.21474\/IJAR01\/12392","article-title":"Rare disease registries \u2013 purpose, challenges & solutions","volume":"4","author":"Aziz","year":"2021","journal-title":"Int J Adv Res"},{"key":"B9","doi-asserted-by":"publisher","first-page":"12","DOI":"10.46982\/gjmt.2023.108","article-title":"Navigating the complexity of rare diseases: challenges, innovations, and future directions","volume":"5","author":"Ibrahim","year":"2023","journal-title":"Glob J Med Ther"},{"key":"B10","doi-asserted-by":"publisher","first-page":"686","DOI":"10.1007\/s11427-017-9099-3","article-title":"Towards efficiency in rare disease research: what is distinctive and important?","volume":"60","author":"Jia","year":"2017","journal-title":"Sci China Life Sci"},{"key":"B11","doi-asserted-by":"crossref","first-page":"223","DOI":"10.1007\/978-90-481-9485-8_14","volume-title":"Rare Diseases Social 
Epidemiology: Analysis of Inequalities","author":"Kole","year":"2010"},{"key":"B12","doi-asserted-by":"publisher","first-page":"867","DOI":"10.1016\/j.drudis.2019.01.005","article-title":"Orphan drugs: major development challenges at the clinical stage","volume":"24","author":"Fonseca","year":"2019","journal-title":"Drug Discov Today"},{"key":"B13","doi-asserted-by":"publisher","first-page":"1412","DOI":"10.1002\/cpt.3407","article-title":"Getting the dose right in drug development for rare diseases: barriers and enablers","volume":"16","author":"Ahmed","year":"2024","journal-title":"Clin Pharmacol Ther"},{"key":"B14","doi-asserted-by":"publisher","first-page":"247","DOI":"10.1016\/j.berh.2014.03.004","article-title":"Methodology of clinical trials for rare diseases","volume":"28","author":"Smith","year":"2014","journal-title":"Best Pract Res Clin Rheumatol"},{"key":"B15","doi-asserted-by":"publisher","first-page":"1","DOI":"10.17140\/ctpoj-3-110","article-title":"Beyond placebo: alternative options to the randomized control trial design in rare disease studies","volume":"3","author":"Sriram","year":"2020","journal-title":"Clin Trials Pract Open J"},{"key":"B16","doi-asserted-by":"publisher","first-page":"2","DOI":"10.4172\/2471-9846.1000167","article-title":"Orphan drugs: getting arms around rare diseases","volume":"3","author":"Irmak","year":"2017","journal-title":"J Community Public Health Nurs"},{"key":"B17","doi-asserted-by":"publisher","first-page":"69","DOI":"10.3390\/jimaging9030069","article-title":"GANs for medical image synthesis: an empirical study","volume":"9","author":"Skandarani","year":"2023","journal-title":"J Imaging"},{"key":"B18","doi-asserted-by":"publisher","first-page":"171","DOI":"10.1038\/s41598-023-50566-7","article-title":"Conditional generative learning for medical image imputation","volume":"14","author":"Raad","year":"2024","journal-title":"Sci Rep"},{"key":"B19","article-title":"Cyclegan models for MRI image translation. 
CoRR [Preprint]","volume-title":"abs\/2401.00023","author":"Czobit","year":"2024"},{"key":"B20","doi-asserted-by":"publisher","first-page":"868","DOI":"10.1016\/j.neunet.2023.06.033","article-title":"Tcgan: convolutional generative adversarial network for time series classification and clustering","volume":"165","author":"Huang","year":"2023","journal-title":"Neural Netw"},{"key":"B21","doi-asserted-by":"publisher","first-page":"vead022","DOI":"10.1093\/ve\/vead022","article-title":"Mutagan: a sequence-to-sequence gan framework to predict mutations of evolving protein populations","volume":"9","author":"Berman","year":"2023","journal-title":"Virus Evol"},{"key":"B22","article-title":"Multi-modal conditional GAN: data synthesis in the medical domain","volume-title":"NeurIPS 2022 Workshop on Synthetic Data for Empowering ML Research","author":"Ziegler","year":"2022"},{"key":"B23","doi-asserted-by":"publisher","first-page":"17","DOI":"10.35940\/ijeat.B3263.1211221","article-title":"Gans and vaes as methods of synthetic data generation and augmentation to enhance heart disease prediction","volume":"11","author":"Sahoo","year":"2021","journal-title":"Int J Eng Adv Technol"},{"key":"B24","doi-asserted-by":"publisher","first-page":"7609","DOI":"10.1038\/s41467-022-35295-1","article-title":"A multifaceted benchmarking of synthetic electronic health record generation models","volume":"13","author":"Yan","year":"2022","journal-title":"Nat Commun"},{"key":"B25","doi-asserted-by":"publisher","first-page":"107105","DOI":"10.1016\/j.compeleceng.2021.107105","article-title":"Key strategies for synthetic data generation for training intelligent systems based on people detection from omnidirectional cameras","volume":"92","author":"Aranjuelo","year":"2021","journal-title":"Comput Electr Eng"},{"key":"B26","doi-asserted-by":"publisher","first-page":"2892","DOI":"10.1016\/j.csbj.2024.07.005","article-title":"Synthetic data generation methods in healthcare: a review on open-source 
tools and methods","volume":"23","author":"Pezoulas","year":"2024","journal-title":"Comput Struct Biotechnol J"},{"key":"B27","doi-asserted-by":"publisher","first-page":"35","DOI":"10.62802\/x9ae7523","article-title":"Advanced AI and augmented reality (AR) integration in medical and surgical practice","volume":"8","author":"Liv","year":"2024","journal-title":"Next Front Life Sci AI"},{"key":"B28","doi-asserted-by":"publisher","first-page":"443","DOI":"10.60087\/jaigs.v5i1.256","article-title":"The role of synthetic data in advancing ai models: opportunities, challenges, and ethical considerations","volume":"5","author":"Kumar","year":"2024","journal-title":"J Artif Intell Gen Sci"},{"key":"B29","doi-asserted-by":"publisher","first-page":"43","DOI":"10.1007\/s12038-022-00278-3","article-title":"Leveraging deep learning algorithms for synthetic data generation to design and analyze biological networks","volume":"47","author":"Achuthan","year":"2022","journal-title":"J Biosci"},{"key":"B30","doi-asserted-by":"crossref","DOI":"10.1101\/2024.06.09.24308649","article-title":"Cross-modality synthetic data augmentation using GANs: enhancing brain MRI and chest x-ray classification","volume-title":"medRxiv","author":"Dhawan","year":"2024"},{"key":"B31","doi-asserted-by":"publisher","first-page":"25813","DOI":"10.21203\/rs.3.rs-4473429\/v1","article-title":"Adversarial robustness improvement for x-ray bone segmentation using synthetic data created from computed tomography scans","volume":"14","author":"Fok","year":"2024","journal-title":"Springer Sci Bus Media LLC"},{"key":"B32","doi-asserted-by":"publisher","first-page":"76","DOI":"10.1038\/s41746-024-01076-x","article-title":"Mimicking clinical trials with synthetic acute myeloid leukemia patients using generative artificial intelligence","volume":"7","author":"Eckardt","year":"2024","journal-title":"npj Digit 
Med"},{"key":"B33","doi-asserted-by":"publisher","first-page":"e2300021","DOI":"10.1200\/cci.23.00021","article-title":"Synthetic data generation by artificial intelligence to accelerate research and precision medicine in hematology","volume":"7","author":"D\u2019Amico","year":"2023","journal-title":"JCO Clin Cancer Inform"},{"key":"B34","doi-asserted-by":"publisher","first-page":"167","DOI":"10.1186\/s12911-024-02563-7","article-title":"Collaborative learning from distributed data with differentially private synthetic data","volume":"24","author":"Prediger","year":"2024","journal-title":"BMC Med Inform Decis Mak"},{"key":"B35","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1155\/2022\/2886795","article-title":"Security and privacy threats to federated learning: issues, methods, and challenges","volume":"2022","author":"Zhang","year":"2022","journal-title":"Secur Commun Netw"},{"key":"B36","doi-asserted-by":"publisher","first-page":"103638","DOI":"10.1016\/j.im.2022.103638","article-title":"Antecedents and consequences of data breaches: a systematic review","volume":"59","author":"Schlackl","year":"2022","journal-title":"Inform Manage"},{"key":"B37","doi-asserted-by":"publisher","first-page":"e1009303","DOI":"10.1371\/journal.pgen.1009303","article-title":"Creating artificial human genomes using generative neural networks","volume":"17","author":"Yelmen","year":"2021","journal-title":"PLoS Genet"},{"key":"B38","article-title":"Data from: European health data & evidence network: Advancing health research with synthetic data solutions in a gdpr-compliant framework","volume-title":"EHDEN Project Documentation","year":"2022"},{"key":"B39","article-title":"Data from: Omop common data model","year":""},{"key":"B40","doi-asserted-by":"publisher","first-page":"209","DOI":"10.1093\/jamia\/ocad214","article-title":"European health data & evidence network\u2013learnings from building out a standardized international health data 
network","volume":"31","author":"Voss","year":"2023","journal-title":"J Am Med Inform Assoc"},{"key":"B41","doi-asserted-by":"publisher","first-page":"28","DOI":"10.3233\/SHTI240007","article-title":"Mapping the bulgarian diabetes register to OMOP CDM: Application results","volume":"313","author":"Krastev","year":"2024","journal-title":"Stud Health Technol Inform"},{"key":"B42","doi-asserted-by":"publisher","first-page":"265","DOI":"10.1186\/s13023-024-03254-2","article-title":"Synthetic datasets for open software development in rare disease research","volume":"19","author":"Al-Dhamari","year":"2024","journal-title":"Orphanet J Rare Dis"},{"key":"B43","doi-asserted-by":"publisher","first-page":"106263","DOI":"10.1016\/j.bspc.2024.106263","article-title":"Robust deep learning for eye fundus images: bridging real and synthetic data for enhancing generalization","volume":"94","author":"Oliveira","year":"2024","journal-title":"Biomed Signal Process Control"},{"key":"B44","doi-asserted-by":"publisher","first-page":"104688","DOI":"10.1016\/j.imavis.2023.104688","article-title":"Synthetic data for face recognition: current state and future prospects","volume":"135","author":"Boutros","year":"2023","journal-title":"Image Vis Comput"},{"key":"B45","doi-asserted-by":"publisher","first-page":"205034","DOI":"10.1109\/access.2020.3036916","article-title":"Gdpr compliant information confidentiality preservation in big data processing","volume":"8","author":"Caruccio","year":"2020","journal-title":"IEEE Access"},{"key":"B46","doi-asserted-by":"publisher","first-page":"1347","DOI":"10.1080\/08870446.2019.1606222","article-title":"Why and how we should care about the general data protection regulation","volume":"34","author":"Crutzen","year":"2019","journal-title":"Psychol Health"},{"key":"B47","doi-asserted-by":"publisher","first-page":"ezad289","DOI":"10.1093\/ejcts\/ezad289","article-title":"The significance of general data protection regulation in the compliant data 
contribution to the European society of thoracic surgeons database","volume":"64","author":"Bertolaccini","year":"2023","journal-title":"Eur J Cardiothorac Surg"},{"key":"B48","doi-asserted-by":"publisher","first-page":"1607","DOI":"10.1038\/ejhg.2015.27","article-title":"Stakeholders\u2019 perspectives on biobank-based genomic research: systematic review of the literature","volume":"23","author":"Husedzinovic","year":"2015","journal-title":"Eur J Hum Genet"},{"key":"B49","doi-asserted-by":"publisher","first-page":"37","DOI":"10.1038\/s41746-023-00771-5","article-title":"Patient-centric synthetic data generation, no reason to risk reidentification in biomedical data analysis","volume":"6","author":"Guillaudeux","year":"2023","journal-title":"Npj Digit Med"},{"key":"B50","doi-asserted-by":"publisher","first-page":"63","DOI":"10.1038\/s41746-018-0070-0","article-title":"Natural language generation for electronic health records","volume":"1","author":"Lee","year":"2018","journal-title":"Npj Digit Med"},{"key":"B51","doi-asserted-by":"publisher","first-page":"RP84874","DOI":"10.7554\/elife.84874","article-title":"Expanding the stdpopsim species catalog, and lessons learned for realistic genome simulations","volume":"12","author":"Lauterbur","year":"2023","journal-title":"eLife"},{"key":"B52","doi-asserted-by":"publisher","first-page":"3446","DOI":"10.1109\/tpami.2022.3180560","article-title":"Memory uncertainty learning for real-world single image deraining","volume":"45","author":"Huang","year":"2023","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"B53","doi-asserted-by":"publisher","first-page":"223","DOI":"10.1080\/00031305.1990.10475726","article-title":"Improving the teaching of applied statistics: putting the data back into data analysis","volume":"44","author":"Singer","year":"1990","journal-title":"Am Stat"},{"key":"B54","doi-asserted-by":"publisher","first-page":"300","DOI":"10.1109\/tai.2022.3229289","article-title":"A universal metric for 
robust evaluation of synthetic tabular data","volume":"5","author":"Chundawat","year":"2024","journal-title":"IEEE Trans Artif Intell"},{"key":"B55","doi-asserted-by":"publisher","first-page":"485","DOI":"10.1016\/j.ins.2021.12.018","article-title":"Differentially private synthetic medical data generation using convolutional GANs","volume":"586","author":"Torfi","year":"2021","journal-title":"Inf Sci"},{"key":"B56","doi-asserted-by":"publisher","first-page":"105331","DOI":"10.1016\/j.isci.2022.105331","article-title":"Synthetic data as an enabler for machine learning applications in medicine","volume":"25","author":"Rajotte","year":"2022","journal-title":"IScience"},{"key":"B57","doi-asserted-by":"publisher","first-page":"78","DOI":"10.1109\/mic.2008.55","article-title":"Generating synthetic data to match data mining patterns","volume":"12","author":"Eno","year":"2008","journal-title":"IEEE Internet Comput"},{"key":"B58","doi-asserted-by":"publisher","first-page":"1181","DOI":"10.3390\/s19051181","article-title":"Synsys: a synthetic data generation system for healthcare applications","volume":"19","author":"Dahmen","year":"2019","journal-title":"Sensors"},{"key":"B59","doi-asserted-by":"publisher","first-page":"e0000082","DOI":"10.1371\/journal.pdig.0000082","article-title":"Synthetic data in health care: a narrative review","volume":"2","author":"Gonzales","year":"2023","journal-title":"PLoS Digit Health"}],"container-title":["Frontiers in Digital 
Health"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fdgth.2025.1563991\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,18]],"date-time":"2025-03-18T07:01:02Z","timestamp":1742281262000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fdgth.2025.1563991\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,3,18]]},"references-count":59,"alternative-id":["10.3389\/fdgth.2025.1563991"],"URL":"https:\/\/doi.org\/10.3389\/fdgth.2025.1563991","relation":{},"ISSN":["2673-253X"],"issn-type":[{"value":"2673-253X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,3,18]]},"article-number":"1563991"}}