{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,15]],"date-time":"2026-05-15T15:31:42Z","timestamp":1778859102109,"version":"3.51.4"},"reference-count":27,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2021,12,1]],"date-time":"2021-12-01T00:00:00Z","timestamp":1638316800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,12,13]],"date-time":"2021-12-13T00:00:00Z","timestamp":1639353600000},"content-version":"vor","delay-in-days":12,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Discov Artif Intell"],"published-print":{"date-parts":[[2021,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Synthetic data is a rapidly evolving field with growing interest from multiple industry stakeholders and European bodies. In particular, the pharmaceutical industry is starting to realise the value of synthetic data which is being utilised more prevalently as a method to optimise data utility and sharing, ultimately as an innovative response to the growing demand for improved privacy. Synthetic data is data generated by simulation, based upon and mirroring properties of an original dataset. Here, with supporting viewpoints from across the pharmaceutical industry, we set out to explore use cases for synthetic data across seven key but relatable areas for optimising data utility for improved data privacy and protection. We also discuss the various methods which can be used to produce a synthetic dataset and availability of metrics to ensure robust quality of generated synthetic datasets. Lastly, we discuss the potential merits, challenges and future direction of synthetic data within the pharmaceutical industry and the considerations for this privacy enhancing technology.<\/jats:p>","DOI":"10.1007\/s44163-021-00016-y","type":"journal-article","created":{"date-parts":[[2021,12,13]],"date-time":"2021-12-13T12:13:54Z","timestamp":1639397634000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":63,"title":["Synthetic data use: exploring use cases to optimise data utility"],"prefix":"10.1007","volume":"1","author":[{"given":"Stefanie","family":"James","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chris","family":"Harbron","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Janice","family":"Branson","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mimmi","family":"Sundler","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2021,12,13]]},"reference":[{"key":"16_CR1","unstructured":"Vivli. About Vivli: Overview\u2014Vivli; 2021. https:\/\/vivli.org\/about\/overview\/. Accessed 4 Oct 2021."},{"key":"16_CR2","unstructured":"Sdv.dev. The synthetic data vault. Put synthetic data to work!; 2021. https:\/\/sdv.dev\/. Accessed 4 Oct 2021."},{"key":"16_CR3","unstructured":"European Data Protection Supervisor. Is the future of privacy synthetic; 2021. https:\/\/edps.europa.eu\/press-publications\/press-news\/blog\/future-privacy-synthetic_en. Accessed 4 Oct 2021."},{"key":"16_CR4","unstructured":"AIMultiple.\u00a0The ultimate guide to synthetic data: uses, benefits & tools; 2021. https:\/\/research.aimultiple.com\/synthetic-data\/. Accessed 4 Oct 2021."},{"key":"16_CR5","unstructured":"Forrester.\u00a0AI 2.0: upgrade your enterprise with five next-generation ai advances; 2021. https:\/\/www.forrester.com\/report\/AI-20-Upgrade-Your-Enterprise-With-Five-NextGeneration-AI-Advances\/RES163520?objectid=RES163520. Accessed 29 Nov 2021."},{"key":"16_CR6","unstructured":"Cprd.com. Synthetic data | CPRD; 2021. https:\/\/www.cprd.com\/content\/synthetic-data. Accessed 4 Oct 2021."},{"key":"16_CR7","unstructured":"Iknl.nl. Synthetische dataset NKR beschikbaar voor onderzoekers; 2021. https:\/\/iknl.nl\/nieuws\/2021\/synthetische-data-nkr-beschikbaar-voor-onderzoeker. Accessed 22 Oct 2021."},{"key":"16_CR8","unstructured":"Ico.org.uk.\u00a0Privacy attacks on AI models; 2021. https:\/\/ico.org.uk\/about-the-ico\/news-and-events\/ai-blog-privacy-attacks-on-ai-models\/. Accessed 29 Nov 2021."},{"key":"16_CR9","unstructured":"OneTrust Data Guidance. Norway: datatilsynet fines NIF NOK 1.2M for disclosing personal data of 3.2M individuals; 2021. https:\/\/www.dataguidance.com\/news\/norway-datatilsynet-fines-nif-nok-12m-disclosing. Accessed 4 Oct 2021."},{"key":"16_CR10","unstructured":"General Data Protection Regulation (GDPR). Chapter 2\u2014principles\u2014General Data Protection Regulation (GDPR). https:\/\/gdpr-info.eu\/chapter-2\/. Accessed 3 Nov 2021."},{"key":"16_CR11","unstructured":"General Data Protection Regulation (GDPR). Art. 5 GDPR\u2014principles relating to processing of personal data\u2014General Data Protection Regulation (GDPR). https:\/\/gdpr-info.eu\/art-5-gdpr\/. Accessed 3 Nov 2021."},{"issue":"4","key":"16_CR12","doi-asserted-by":"publisher","first-page":"e043497","DOI":"10.1136\/bmjopen-2020-043497","volume":"11","author":"Z Azizi","year":"2021","unstructured":"Azizi Z, Zheng C, Mosquera L, Pilote L, El Emam K. Can synthetic data be a proxy for real clinical trial data? A validation study. BMJ Open. 2021;11(4):e043497.","journal-title":"BMJ Open"},{"key":"16_CR13","unstructured":"healthdatainsight.org.uk. The Simulacrum\u2014healthdatainsight.org.uk; 2021. https:\/\/healthdatainsight.org.uk\/project\/the-simulacrum\/. Accessed 4 Oct 2021."},{"key":"16_CR14","unstructured":"Replica-analytics.com. Replica analytics | resources | knowledgebase; 2021. https:\/\/replica-analytics.com\/knowledgebase. Accessed 4 Oct 2021."},{"key":"16_CR15","doi-asserted-by":"publisher","first-page":"14","DOI":"10.4018\/978-1-59140-471-2.ch003","volume":"14","author":"R Wilson","year":"2003","unstructured":"Wilson R, Rosen P. Protecting data through perturbation techniques: the impact on knowledge discovery in databases. J Database Manag. 2003;14:14\u201326. https:\/\/doi.org\/10.4018\/978-1-59140-471-2.ch003.","journal-title":"J Database Manag"},{"key":"16_CR16","doi-asserted-by":"publisher","first-page":"147","DOI":"10.1038\/s41746-020-00353-9","volume":"3","author":"A Tucker","year":"2020","unstructured":"Tucker A, Wang Z, Rotalinti Y, et al. Generating high-fidelity synthetic patient data for assessing machine learning healthcare software. NPJ Digit Med. 2020;3:147. https:\/\/doi.org\/10.1038\/s41746-020-00353-9.","journal-title":"NPJ Digit Med"},{"key":"16_CR17","unstructured":"Legislation.gov.uk. Regulation (EU) 2016\/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95\/46\/EC (General Data Protection Regulation) (Text with EEA relevance); 2021. https:\/\/www.legislation.gov.uk\/eur\/2016\/679\/contents. Accessed 4 Oct 2021."},{"issue":"3","key":"16_CR18","first-page":"411","volume":"66","author":"MA Babyak","year":"2004","unstructured":"Babyak MA. What you see may not be what you get: a brief, nontechnical introduction to overfitting in regression-type models. Psychosom Med. 2004;66(3):411\u201321.","journal-title":"Psychosom Med"},{"key":"16_CR19","unstructured":"Langberg H, Hvidbak T, Closter Jespersen M. Synthetic health data Hackathon. Deloitte and Rigshospitalet, 2021, p. 8."},{"issue":"3","key":"16_CR20","doi-asserted-by":"crossref","first-page":"281","DOI":"10.69554\/JCFU2737","volume":"3","author":"S Bamford","year":"2021","unstructured":"Bamford S. Applications of privacy-enhancing technology to data sharing at a global pharmaceutical company. J Data Protect Privacy. 2021;3(3):281\u201390.","journal-title":"J Data Protect Privacy"},{"key":"#cr-split#-16_CR21.1","unstructured":"Andrew White. By 2024, 60% of the data used for the development of AI and analytics projects will be synthetically generated-Andrew White"},{"key":"#cr-split#-16_CR21.2","unstructured":"2021. https:\/\/blogs.gartner.com\/andrew_white\/2021\/07\/24\/by-2024-60-of-the-data-used-for-the-development-of-ai-and-analytics-projects-will-be-synthetically-generated\/. Accessed 22 Oct 2021."},{"key":"16_CR22","doi-asserted-by":"publisher","first-page":"e1280","DOI":"10.1002\/widm.1280","volume":"8","author":"A Zimek","year":"2018","unstructured":"Zimek A, Filzmoser P. There and back again: outlier detection between statistical reasoning and data mining algorithms. Wiley Interdiscip Rev Data Mining Knowl Discov. 2018;8:e1280.","journal-title":"Wiley Interdiscip Rev Data Mining Knowl Discov"},{"key":"16_CR23","unstructured":"Peachey J, Li G, Chew P, Manak D.\u00a0Faster and cheaper clinical trials. The benefit of synthetic data. [ebook] Accenture; 2021. https:\/\/www.accenture.com\/_acnmedia\/PDF-148\/Accenture-Insilico-Faster-And-Cheaper.pdf. Accessed 7 Oct 2021."},{"issue":"6","key":"16_CR24","doi-asserted-by":"publisher","first-page":"1002","DOI":"10.1002\/pst.2120","volume":"20","author":"H Burger","year":"2021","unstructured":"Burger H, Gerlinger C, Harbron C, Koch A, Posch M, Rochon J, Schiel A. The use of external controls: to what extent can it currently be recommended? Pharm Stat. 2021;20(6):1002\u201316.","journal-title":"Pharm Stat"},{"key":"16_CR25","unstructured":"Deloitte Insights.\u00a0Global Human Capital Trends 2017; 2021. https:\/\/www2.deloitte.com\/us\/en\/insights\/focus\/human-capital-trends\/2017.html. Accessed 15 Oct 2021."},{"key":"16_CR26","unstructured":"van der Schaar Lab.\u00a0Synthetic data: breaking the data logjam in machine learning for healthcare \/\/ van der Schaar Lab; 2021. https:\/\/www.vanderschaar-lab.com\/synthetic-data-breaking-the-data-logjam-in-machine-learning-for-healthcare\/. Accessed 4 Oct 2021."}],"container-title":["Discover Artificial Intelligence"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s44163-021-00016-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s44163-021-00016-y\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s44163-021-00016-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,9,14]],"date-time":"2024-09-14T09:09:21Z","timestamp":1726304961000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s44163-021-00016-y"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,12]]},"references-count":27,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2021,12]]}},"alternative-id":["16"],"URL":"https:\/\/doi.org\/10.1007\/s44163-021-00016-y","relation":{},"ISSN":["2731-0809"],"issn-type":[{"value":"2731-0809","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,12]]},"assertion":[{"value":"12 November 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"7 December 2021","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"13 December 2021","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors have no competing interests to declare that are relevant to the content of this article.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"15"}}