{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,29]],"date-time":"2026-03-29T09:22:08Z","timestamp":1774776128926,"version":"3.50.1"},"reference-count":33,"publisher":"Oxford University Press (OUP)","issue":"12","license":[{"start":{"date-parts":[[2025,11,4]],"date-time":"2025-11-04T00:00:00Z","timestamp":1762214400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100000943","name":"CSIRO","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100000943","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,12,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Machine-generated or synthetic data is a valuable resource for training artificial intelligence algorithms, evaluating rare workflows, and sharing data under stricter data legislations. However, current statistical and deep learning methods struggle with large data volumes, are prone to hallucinating scenarios incompatible with reality, and seldom quantify privacy meaningfully.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>Here, we introduce Genomator, a logic solving approach (SAT solving), which efficiently produces private and realistic representations of the original data. We demonstrate the method on genomic data, which arguably is the most complex and private information. We benchmark Genomator against state-of-the-art methodologies (Markov generation, Wasserstein Generative Adversarial Network and Conditional Restricted Boltzmann Machines), demonstrating a 40%\u2013530% accuracy improvement and 57%\u2013172% higher privacy. Genomator is also 3\u2013100 times more efficient, making it the only tested method that scales to whole genomes. We show the universal trade-off between privacy and accuracy, and use Genomator\u2019s tuning capability to cater to all applications along the spectrum, from provable private representations of sensitive cohorts, to datasets with indistinguishable pharmacogenomic profiles. Demonstrating the production-scale generation of tuneable synthetic genomes hold great potential for balancing underrepresented populations in medical research and advancing global data exchange.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>Genomator is available at https:\/\/github.com\/csiro\/genomator.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaf600","type":"journal-article","created":{"date-parts":[[2025,10,30]],"date-time":"2025-10-30T13:14:53Z","timestamp":1761830093000},"source":"Crossref","is-referenced-by-count":1,"title":["Privacy-hardened and hallucination-resistant synthetic data generation with logic-solvers"],"prefix":"10.1093","volume":"41","author":[{"given":"Mark A","family":"Burgess","sequence":"first","affiliation":[{"name":"Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation , Canberra, 2601,","place":["Australia"]}]},{"given":"Brendan","family":"Hosking","sequence":"additional","affiliation":[{"name":"Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation , Sydney, 2145,","place":["Australia"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0350-3899","authenticated-orcid":false,"given":"Roc","family":"Reguant","sequence":"additional","affiliation":[{"name":"Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation , Sydney, 2145,","place":["Australia"]}]},{"given":"Anubhav","family":"Kaphle","sequence":"additional","affiliation":[{"name":"Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation , Melbourne, 3052,","place":["Australia"]}]},{"given":"Mitchell J","family":"O\u2019Brien","sequence":"additional","affiliation":[{"name":"Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation , Sydney, 2145,","place":["Australia"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0217-9927","authenticated-orcid":false,"given":"Letitia M F","family":"Sng","sequence":"additional","affiliation":[{"name":"Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation , Sydney, 2145,","place":["Australia"]}]},{"given":"Yatish","family":"Jain","sequence":"additional","affiliation":[{"name":"Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation , Sydney, 2145,","place":["Australia"]},{"name":"Applied BioSciences, Faculty of Science and Engineering, Macquarie University , Macquarie Park, 2109,","place":["Australia"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8033-9810","authenticated-orcid":false,"given":"Denis C","family":"Bauer","sequence":"additional","affiliation":[{"name":"Applied BioSciences, Faculty of Science and Engineering, Macquarie University , Macquarie Park, 2109,","place":["Australia"]},{"name":"Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation , Adelaide, 5000,","place":["Australia"]},{"name":"Department of Biomedical Informatics and Digital Health, School of Medical Sciences, University of Sydney , Sydney, 2050,","place":["Australia"]},{"name":"The University of Adelaide, Australian Institute for Machine Learning , Adelaide, 5000,","place":["Australia"]}]}],"member":"286","published-online":{"date-parts":[[2025,11,4]]},"reference":[{"key":"2025122218562945300_btaf600-B1","doi-asserted-by":"crossref","first-page":"68","DOI":"10.1038\/nature15393","article-title":"A global reference for human genetic variation","volume":"526","author":"1000 Genomes Project Consortium; Auton A, Brooks LD, Durbin RM","year":"2015","journal-title":"Nature"},{"key":"2025122218562945300_btaf600-B2","doi-asserted-by":"crossref","DOI":"10.3233\/FAIA336","volume-title":"Handbook of Satisfiability","author":"Biere","year":"2021"},{"key":"2025122218562945300_btaf600-B3","doi-asserted-by":"crossref","first-page":"646","DOI":"10.1038\/s41588-020-0651-0","article-title":"Privacy challenges and research opportunities for genomic data sharing","volume":"52","author":"Bonomi","year":"2020","journal-title":"Nat Genet"},{"key":"2025122218562945300_btaf600-B4","doi-asserted-by":"crossref","first-page":"7","DOI":"10.1186\/s13742-015-0047-8","article-title":"Second-generation PLINK: rising to the challenge of larger and richer datasets","volume":"4","author":"Chang","year":"2015","journal-title":"Gigascience"},{"key":"2025122218562945300_btaf600-B5","doi-asserted-by":"crossref","first-page":"493","DOI":"10.1038\/s41551-021-00751-8","article-title":"Synthetic data in machine learning for medicine and healthcare","volume":"5","author":"Chen","year":"2021","journal-title":"Nat Biomed Eng"},{"key":"2025122218562945300_btaf600-B6","doi-asserted-by":"crossref","first-page":"643","DOI":"10.1002\/1097-0142(19940201)73:3<643::AID-CNCR2820730323>3.0.CO;2-5","article-title":"Autosomal dominant inheritance of early-onset breast cancer implications for risk prediction","volume":"73","author":"Claus","year":"1994","journal-title":"Cancer"},{"key":"2025122218562945300_btaf600-B7","doi-asserted-by":"crossref","first-page":"13149","DOI":"10.3390\/ijms232113149","article-title":"What is a digital twin? Experimental design for a data-centric machine learning perspective in health","volume":"23","author":"Emmert-Streib","year":"2022","journal-title":"Int J Mol Sci"},{"key":"2025122218562945300_btaf600-B8","first-page":"1","article-title":"POT: Python optimal transport","volume":"22","author":"Flamary","year":"2021","journal-title":"J Mach Learn Res"},{"key":"2025122218562945300_btaf600-B9","doi-asserted-by":"crossref","first-page":"113","DOI":"10.1038\/tpj.2015.70","article-title":"Interethnic variation of CYP2C19 alleles, \u2018predicted\u2019 phenotypes and \u2018measured\u2019 metabolic phenotypes across world populations","volume":"16","author":"Fricke-Galindo","year":"2016","journal-title":"Pharmacogenomics J"},{"key":"2025122218562945300_btaf600-B10","doi-asserted-by":"crossref","first-page":"312","DOI":"10.56553\/popets-2023-0055","article-title":"A unified framework for quantifying privacy risk in synthetic data","volume":"2023","author":"Giomi","year":"2023","journal-title":"PoPETs"},{"key":"2025122218562945300_btaf600-B11","doi-asserted-by":"crossref","first-page":"e0000082","DOI":"10.1371\/journal.pdig.0000082","article-title":"Synthetic data in health care: a narrative review","volume":"2","author":"Gonzales","year":"2023","journal-title":"PLOS Digit Health"},{"key":"2025122218562945300_btaf600-B12","first-page":"428","author":"Ignatiev","year":"2018"},{"key":"2025122218562945300_btaf600-B13","doi-asserted-by":"crossref","first-page":"959","DOI":"10.1002\/cpt.2526","article-title":"Clinical pharmacogenetics implementation consortium guideline for CYP2C19 genotype and clopidogrel therapy: 2022 update","volume":"112","author":"Lee","year":"2022","journal-title":"Clin Pharmacol Ther"},{"key":"2025122218562945300_btaf600-B14","doi-asserted-by":"crossref","first-page":"3827","DOI":"10.1182\/blood-2009-12-255992","article-title":"Warfarin pharmacogenetics: a single VKORC1 polymorphism is predictive of dose across 3 racial groups","volume":"115","author":"Limdi","year":"2010","journal-title":"Blood"},{"key":"2025122218562945300_btaf600-B15","doi-asserted-by":"crossref","first-page":"100129","DOI":"10.1016\/j.xgen.2022.100129","article-title":"PrecisionFDA truth challenge V2: calling variants from short and long reads in difficult-to-map regions","volume":"2","author":"Olson","year":"2022","journal-title":"Cell Genom"},{"key":"2025122218562945300_btaf600-B16","doi-asserted-by":"crossref","first-page":"47","DOI":"10.1093\/idpl\/ipad002","article-title":"Protection of genomic data and the Australian privacy act: when are genomic data \u2018personal information\u2019?","volume":"13","author":"Paltiel","year":"2023","journal-title":"Int Data Privacy Law"},{"key":"2025122218562945300_btaf600-B17","author":"POT: Python Optimal Transport\u2013POT Python Optimal Transport 0.9.3 documentation"},{"key":"2025122218562945300_btaf600-B18","author":"Ramos","year":"2021"},{"key":"2025122218562945300_btaf600-B19","doi-asserted-by":"crossref","first-page":"100029","DOI":"10.1016\/j.xgen.2021.100029","article-title":"GA4GH: international policies and standards for data sharing across genomic research and healthcare","volume":"1","author":"Rehm","year":"2021","journal-title":"Cell Genom"},{"key":"2025122218562945300_btaf600-B20","doi-asserted-by":"crossref","first-page":"839","DOI":"10.1534\/genetics.108.093153","article-title":"Linkage disequilibrium between loci with unknown phase","volume":"182","author":"Rogers","year":"2009","journal-title":"Genetics"},{"key":"2025122218562945300_btaf600-B21","article-title":"Anonymization and risk","author":"Rubinstein","journal-title":"91 Washington Law Review 703, NYU School of Law, Public Law Research Paper No. 15-36, 2016."},{"key":"2025122218562945300_btaf600-B22","doi-asserted-by":"crossref","first-page":"1017","DOI":"10.1038\/s41431-022-01113-x","article-title":"Recommendations for whole genome sequencing in diagnostics for rare diseases","volume":"30","author":"Souche","year":"2022","journal-title":"Eur J Hum Genet"},{"key":"2025122218562945300_btaf600-B23","first-page":"1451","article-title":"Synthetic data\u2014anonymisation groundhog day","volume":"2022","author":"Stadler","year":"2020","journal-title":"Proc 31st USENIX Secur Symp Secur"},{"key":"2025122218562945300_btaf600-B24","doi-asserted-by":"crossref","first-page":"2304","DOI":"10.1093\/bioinformatics\/btr341","article-title":"HAPGEN2: simulation of multiple disease SNPs","volume":"27","author":"Su","year":"2011","journal-title":"Bioinformatics"},{"key":"2025122218562945300_btaf600-B25","first-page":"629","article-title":"One hundred years of linkage disequilibrium","volume":"209","author":"Sved","year":"2018","journal-title":"Genetics"},{"key":"2025122218562945300_btaf600-B26","doi-asserted-by":"publisher","author":"Tao","year":"2021","DOI":"10.48550\/arXiv.2112.09238"},{"key":"2025122218562945300_btaf600-B27","author":"Three steps for businesses to make AI data and compute more sustainable\u2014OECD.AI"},{"key":"2025122218562945300_btaf600-B28","doi-asserted-by":"crossref","first-page":"365","DOI":"10.1093\/bioinformatics\/bty643","article-title":"Re-identification of individuals in genomic data-sharing beacons via allele inference","volume":"35","author":"Von Thenen","year":"2019","journal-title":"Bioinformatics"},{"key":"2025122218562945300_btaf600-B29","doi-asserted-by":"publisher","author":"Wharrie","year":"2022","DOI":"10.1101\/2022.12.22.521552"},{"key":"2025122218562945300_btaf600-B30","doi-asserted-by":"publisher","author":"Xie","year":"2018","DOI":"10.48550\/arXiv.1802.06739"},{"key":"2025122218562945300_btaf600-B31","doi-asserted-by":"crossref","first-page":"244","DOI":"10.1016\/j.neucom.2019.12.136","article-title":"Generation and evaluation of privacy preserving synthetic health data","volume":"416","author":"Yale","year":"2020","journal-title":"Neurocomputing (Amst)"},{"key":"2025122218562945300_btaf600-B32","doi-asserted-by":"crossref","first-page":"e1009303","DOI":"10.1371\/journal.pgen.1009303","article-title":"Creating artificial human genomes using generative neural networks","volume":"17","author":"Yelmen","year":"2021","journal-title":"PLoS Genet"},{"key":"2025122218562945300_btaf600-B33","doi-asserted-by":"crossref","first-page":"e1011584","DOI":"10.1371\/journal.pcbi.1011584","article-title":"Deep convolutional and conditional neural networks for large-scale genomic data generation","volume":"19","author":"Yelmen","year":"2023","journal-title":"PLoS Comput Biol"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaf600\/65175517\/btaf600.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaf600\/65175517\/btaf600.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaf600\/65175517\/btaf600.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,12,22]],"date-time":"2025-12-22T23:56:40Z","timestamp":1766447800000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btaf600\/8314204"}},"subtitle":[],"editor":[{"given":"Inanc","family":"Birol","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2025,11,4]]},"references-count":33,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2025,12,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaf600","relation":{},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2025,12]]},"published":{"date-parts":[[2025,11,4]]},"article-number":"btaf600"}}