{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,4]],"date-time":"2026-03-04T07:43:06Z","timestamp":1772610186977,"version":"3.50.1"},"reference-count":36,"publisher":"SAGE Publications","issue":"2","license":[{"start":{"date-parts":[[2022,4,1]],"date-time":"2022-04-01T00:00:00Z","timestamp":1648771200000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100023699","name":"HDR UK","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100023699","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/100011693","name":"Department for Business, Energy and Industrial Strategy, UK Government","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100011693","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["Health Informatics J"],"published-print":{"date-parts":[[2022,4]]},"abstract":"<jats:p>Digital health applications can improve quality and effectiveness of healthcare, by offering a number of new tools to users, which are often considered a medical device. Assuring their safe operation requires, amongst others, clinical validation, needing large datasets to test them in realistic clinical scenarios. Access to datasets is challenging, due to patient privacy concerns. Development of synthetic datasets is seen as a potential alternative. The objective of the paper is the development of a method for the generation of realistic synthetic datasets, statistically equivalent to real clinical datasets, and demonstrate that the Generative Adversarial Network (GAN) based approach is fit for purpose. A generative adversarial network was implemented and trained, in a series of six experiments, using numerical and categorical variables, including ICD-9 and laboratory codes, from three clinically relevant datasets. 
A number of contextual steps provided the success criteria for the synthetic dataset. A synthetic dataset that exhibits very similar statistical characteristics with the real dataset was generated. Pairwise association of variables is very similar. A high degree of Jaccard similarity and a successful K-S test further support this. The proof of concept of generating realistic synthetic datasets was successful, with the approach showing promise for further work.<\/jats:p>","DOI":"10.1177\/14604582221077000","type":"journal-article","created":{"date-parts":[[2022,4,13]],"date-time":"2022-04-13T04:45:23Z","timestamp":1649825123000},"update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":24,"title":["A method for machine learning generation of realistic synthetic datasets for validating healthcare applications"],"prefix":"10.1177","volume":"28","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-5473-135X","authenticated-orcid":false,"given":"Theodoros N","family":"Arvanitis","sequence":"first","affiliation":[{"name":"Institute of Digital Healthcare, WMG, University of Warwick, Coventry, UK"}]},{"given":"Sean","family":"White","sequence":"additional","affiliation":[{"name":"Clinical Assurance Team, NHS Digital, Leeds, UK"}]},{"given":"Stuart","family":"Harrison","sequence":"additional","affiliation":[{"name":"Institute of Digital Healthcare, WMG, University of Warwick, Coventry, UK"}]},{"given":"Rupert","family":"Chaplin","sequence":"additional","affiliation":[{"name":"Data Science and Innovation, NHS Digital, London, UK"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3437-6412","authenticated-orcid":false,"given":"George","family":"Despotou","sequence":"additional","affiliation":[{"name":"Institute of Digital Healthcare, WMG, University of Warwick, Coventry, UK"}]}],"member":"179","published-online":{"date-parts":[[2022,4,13]]},"reference":[{"key":"bibr1-14604582221077000","unstructured":"MHRA. 
Guidance: medical device stand-alone software including apps (including IVDMDs). Available: https:\/\/assets.publishing.service.gov.uk\/government\/uploads\/system\/uploads\/attachment_data\/file\/717865\/Software_flow_chart_Ed_1-05.pdf (10\/10\/2019)"},{"key":"bibr2-14604582221077000","unstructured":"Regulation (EU) 2017\/745 of the European Parliament and of the Council of 5 April 2017 on medical devices, amending Directive 2001\/83\/EC, Regulation (EC) No 178\/2002 and Regulation (EC) No 1223\/2009 and repealing Council Directives 90\/385\/EEC and 93\/42\/EEC."},{"key":"bibr3-14604582221077000","volume-title":"Clinical Decision Support Software Draft Guidance for Industry and Food and Drug Administration Staff","author":"FDA"},{"key":"bibr4-14604582221077000","first-page":"1","volume":"22","author":"Bellovin SM","year":"2019","journal-title":"Stan Tech L Rev"},{"key":"bibr5-14604582221077000","doi-asserted-by":"publisher","DOI":"10.1186\/1472-6947-10-59"},{"key":"bibr6-14604582221077000","doi-asserted-by":"publisher","DOI":"10.5210\/ojphi.v1i1.2720"},{"key":"bibr7-14604582221077000","doi-asserted-by":"publisher","DOI":"10.1093\/jamia\/ocx079"},{"key":"bibr8-14604582221077000","doi-asserted-by":"publisher","DOI":"10.1186\/1472-6947-10-59"},{"key":"bibr11-14604582221077000","doi-asserted-by":"publisher","DOI":"10.3390\/data3030030"},{"key":"bibr9-14604582221077000","doi-asserted-by":"crossref","unstructured":"Begoli E, Brown K, Srinivas S, et al. A generator framework for high-volume, high-fidelity synthetic mental health notes. 2018 IEEE International Conference on Big Data (Big Data), Seattle, WA, USA. 2018: 951\u2013958.","DOI":"10.1109\/BigData.2018.8621981"},{"key":"bibr10-14604582221077000","doi-asserted-by":"publisher","DOI":"10.1038\/s41746-018-0070-0"},{"key":"bibr12-14604582221077000","doi-asserted-by":"crossref","unstructured":"McLachlan S, Dube K, Gallagher T. 
Using the CareMap with health incidents statistics for generating the realistic synthetic electronic healthcare record. 2016 IEEE International Conference on Healthcare Informatics (ICHI), Chicago, IL. 2016: 439\u2013448.","DOI":"10.1109\/ICHI.2016.83"},{"key":"bibr13-14604582221077000","doi-asserted-by":"publisher","DOI":"10.3390\/s19051181"},{"key":"bibr14-14604582221077000","unstructured":"Avino L, Ruffini M, Gavalda R. Generating synthetic but plausible healthcare record datasets. arXiv:1807.01514v1 [stat.ML] 4 Jul 2018"},{"key":"bibr15-14604582221077000","doi-asserted-by":"publisher","DOI":"10.1109\/CBMS.2019.00036"},{"key":"bibr16-14604582221077000","doi-asserted-by":"publisher","DOI":"10.1109\/AIKE.2019.00057"},{"key":"bibr17-14604582221077000","unstructured":"McLachlan S, Dube K, Gallagher T, et al. The ATEN framework for creating the realistic synthetic electronic health record. Proceedings of the 11th International Joint Conference on Biomedical Engineering Systems and Technologies, HEALTHINF, 5: 220\u2013230."},{"key":"bibr18-14604582221077000","doi-asserted-by":"publisher","DOI":"10.1016\/j.procs.2018.10.130"},{"key":"bibr19-14604582221077000","doi-asserted-by":"crossref","unstructured":"Baowaly MK, Liu C, Chen K. Realistic data synthesis using enhanced generative adversarial networks. 2019 IEEE Second International Conference on Artificial Intelligence and Knowledge Engineering (AIKE), Sardinia, Italy, 2019: 289\u2013292.","DOI":"10.1109\/AIKE.2019.00057"},{"key":"bibr20-14604582221077000","doi-asserted-by":"crossref","unstructured":"Norgaard S, Saeedi R, Sasani K, et al. Synthetic sensor data generation for health applications: a supervised deep learning approach. 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Honolulu, HI, 2018: 1164\u20131167.","DOI":"10.1109\/EMBC.2018.8512470"},{"key":"bibr21-14604582221077000","unstructured":"Guan J, Li R, Yu S, et al. 
Generation of synthetic electronic medical record text. arXiv:1812.02793 [cs.CL]"},{"key":"bibr22-14604582221077000","unstructured":"Choi E, Biswal S, Malin B, et al. Generating multi-label discrete patient records using generative adversarial networks. arXiv:1703.06490 [cs.LG]"},{"key":"bibr23-14604582221077000","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01216-8_14"},{"key":"bibr24-14604582221077000","doi-asserted-by":"crossref","unstructured":"Zare M, Wojtusiak J. Weighted Itemsets Error (WIE) approach for evaluating generated synthetic patient data. 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA), Orlando, FL, 2018: 1017\u20131022.","DOI":"10.1109\/ICMLA.2018.00166"},{"key":"bibr25-14604582221077000","doi-asserted-by":"publisher","DOI":"10.1016\/j.cviu.2018.10.009"},{"key":"bibr26-14604582221077000","volume-title":"Hospital Admitted Patient Care Activity, 2017-18, Publication, Part of Hospital Admitted Patient Care Activity, National statistics","author":"NHS Digital"},{"key":"bibr27-14604582221077000","volume-title":"Hospital Accident and Emergency Activity, 2017-18, Publication, Part of Hospital Accident & Emergency Activity","author":"NHS Digital"},{"key":"bibr28-14604582221077000","doi-asserted-by":"publisher","DOI":"10.1038\/sdata.2016.35"},{"key":"bibr30-14604582221077000","first-page":"2672","volume-title":"Advances in neural information processing systems","author":"Goodfellow I","year":"2014"},{"key":"bibr31-14604582221077000","unstructured":"Mirza M, Osindero S. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784. 2014 Nov 6."},{"key":"bibr32-14604582221077000","unstructured":"Arjovsky M, Chintala S, Bottou L. Wasserstein gan. arXiv preprint arXiv:1701.07875. 
2017 Jan 26."},{"key":"bibr33-14604582221077000","first-page":"5767","author":"Gulrajani I","year":"2017","journal-title":"In Advances in neural information processing systems."},{"key":"bibr29-14604582221077000","doi-asserted-by":"publisher","DOI":"10.1016\/j.csbj.2019.06.003"},{"key":"bibr34-14604582221077000","unstructured":"Kingma DP, Ba J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980. 2014 Dec 22."},{"key":"bibr35-14604582221077000","volume-title":"Searching for activation functions","author":"Ramachandran P"},{"key":"bibr36-14604582221077000","unstructured":"Despotou G, Harrison S, White S, et al. Safety justification of healthcare applications using synthetic datasets. In: The Importance of Health Informatics in Public Health during a Pandemic. Studies in Health Technology and Informatics, IOS Press, 272, pp. 35\u201338."}],"container-title":["Health Informatics Journal"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/14604582221077000","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/14604582221077000","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/14604582221077000","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,2]],"date-time":"2025-03-02T05:03:00Z","timestamp":1740891780000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/14604582221077000"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,4]]},"references-count":36,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2022,4]]}},"alternative-id":["10.1177\/14604582221077000"],"URL":"https:\/\/doi.org\/10.1177\/14604582221077000","relation":{"has-preprint":[{"i
d-type":"doi","id":"10.1101\/2021.02.11.21250741","asserted-by":"object"}]},"ISSN":["1460-4582","1741-2811"],"issn-type":[{"value":"1460-4582","type":"print"},{"value":"1741-2811","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,4]]},"article-number":"14604582221077000"}}