{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,29]],"date-time":"2026-03-29T09:22:07Z","timestamp":1774776127195,"version":"3.50.1"},"update-to":[{"DOI":"10.1371\/journal.pcbi.1011584","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2023,11,9]],"date-time":"2023-11-09T00:00:00Z","timestamp":1699488000000}}],"reference-count":45,"publisher":"Public Library of Science (PLoS)","issue":"10","license":[{"start":{"date-parts":[[2023,10,30]],"date-time":"2023-10-30T00:00:00Z","timestamp":1698624000000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001665","name":"Agence Nationale de la Recherche","doi-asserted-by":"publisher","award":["ANR-20-CE45-0010-01 RoDAPoG"],"award-info":[{"award-number":["ANR-20-CE45-0010-01 RoDAPoG"]}],"id":[{"id":"10.13039\/501100001665","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001665","name":"Agence Nationale de la Recherche","doi-asserted-by":"publisher","award":["ANR-20-CE45-0010-01 RoDAPoG"],"award-info":[{"award-number":["ANR-20-CE45-0010-01 RoDAPoG"]}],"id":[{"id":"10.13039\/501100001665","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001665","name":"Agence Nationale de la Recherche","doi-asserted-by":"publisher","award":["ANR-20-CE45-0010-01 RoDAPoG"],"award-info":[{"award-number":["ANR-20-CE45-0010-01 RoDAPoG"]}],"id":[{"id":"10.13039\/501100001665","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001665","name":"Agence Nationale de la Recherche","doi-asserted-by":"publisher","award":["ANR-20-CE45-0010-01 RoDAPoG"],"award-info":[{"award-number":["ANR-20-CE45-0010-01 RoDAPoG"]}],"id":[{"id":"10.13039\/501100001665","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001665","name":"Agence Nationale de la Recherche","doi-asserted-by":"publisher","award":["ANR-20-CE45-0010-01 RoDAPoG"],"award-info":[{"award-number":["ANR-20-CE45-0010-01 RoDAPoG"]}],"id":[{"id":"10.13039\/501100001665","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001665","name":"Agence Nationale de la Recherche","doi-asserted-by":"publisher","award":["ANR-20-CE45-0010-01 RoDAPoG"],"award-info":[{"award-number":["ANR-20-CE45-0010-01 RoDAPoG"]}],"id":[{"id":"10.13039\/501100001665","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100012818","name":"Comunidad de Madrid","doi-asserted-by":"publisher","award":["Refs. 2019-T1\/TIC-13298"],"award-info":[{"award-number":["Refs. 2019-T1\/TIC-13298"]}],"id":[{"id":"10.13039\/100012818","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Banco Santander and the UCM","award":["grant PR44\/21-29937"],"award-info":[{"award-number":["grant PR44\/21-29937"]}]},{"DOI":"10.13039\/501100008530","name":"Fondo Europeo de Desarrollo Regional","doi-asserted-by":"crossref","award":["PID2021-125506NA-I00"],"award-info":[{"award-number":["PID2021-125506NA-I00"]}],"id":[{"id":"10.13039\/501100008530","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100001665","name":"Agence Nationale de la Recherche","doi-asserted-by":"publisher","award":["ANR-11-LABEX-0045-DIGICOSME"],"award-info":[{"award-number":["ANR-11-LABEX-0045-DIGICOSME"]}],"id":[{"id":"10.13039\/501100001665","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001665","name":"Agence Nationale de la Recherche","doi-asserted-by":"publisher","award":["ANR-11-IDEX-0003-02"],"award-info":[{"award-number":["ANR-11-IDEX-0003-02"]}],"id":[{"id":"10.13039\/501100001665","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["www.ploscompbiol.org"],"crossmark-restriction":false},"short-container-title":["PLoS Comput Biol"],"abstract":"<jats:p>Applications of generative models for genomic data have gained significant momentum in the past few years, with scopes ranging from data characterization to generation of genomic segments and functional sequences. In our previous study, we demonstrated that generative adversarial networks (GANs) and restricted Boltzmann machines (RBMs) can be used to create novel high-quality artificial genomes (AGs) which can preserve the complex characteristics of real genomes such as population structure, linkage disequilibrium and selection signals. However, a major drawback of these models is scalability, since the large feature space of genome-wide data increases computational complexity vastly. To address this issue, we implemented a novel convolutional Wasserstein GAN (WGAN) model along with a novel conditional RBM (CRBM) framework for generating AGs with high SNP number. These networks implicitly learn the varying landscape of haplotypic structure in order to capture complex correlation patterns along the genome and generate a wide diversity of plausible haplotypes. We performed comparative analyses to assess both the quality of these generated haplotypes and the amount of possible privacy leakage from the training data. As the importance of genetic privacy becomes more prevalent, the need for effective privacy protection measures for genomic data increases. We used generative neural networks to create large artificial genome segments which possess many characteristics of real genomes without substantial privacy leakage from the training dataset. In the near future, with further improvements in haplotype quality and privacy preservation, large-scale artificial genome databases can be assembled to provide easily accessible surrogates of real databases, allowing researchers to conduct studies with diverse genomic data within a safe ethical framework in terms of donor privacy.<\/jats:p>","DOI":"10.1371\/journal.pcbi.1011584","type":"journal-article","created":{"date-parts":[[2023,10,30]],"date-time":"2023-10-30T14:05:32Z","timestamp":1698674732000},"page":"e1011584","update-policy":"https:\/\/doi.org\/10.1371\/journal.pcbi.corrections_policy","source":"Crossref","is-referenced-by-count":21,"title":["Deep convolutional and conditional neural networks for large-scale genomic data generation"],"prefix":"10.1371","volume":"19","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0731-0223","authenticated-orcid":true,"given":"Burak","family":"Yelmen","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3017-0858","authenticated-orcid":true,"given":"Aur\u00e9lien","family":"Decelle","sequence":"additional","affiliation":[]},{"given":"Leila Lea","family":"Boulos","sequence":"additional","affiliation":[]},{"given":"Antoine","family":"Szatkownik","sequence":"additional","affiliation":[]},{"given":"Cyril","family":"Furtlehner","sequence":"additional","affiliation":[]},{"given":"Guillaume","family":"Charpiat","sequence":"additional","affiliation":[]},{"given":"Flora","family":"Jay","sequence":"additional","affiliation":[]}],"member":"340","published-online":{"date-parts":[[2023,10,30]]},"reference":[{"key":"pcbi.1011584.ref001","article-title":"Deep learning for population size history inference: Design, comparison and combination with approximate Bayesian computation","author":"T Sanchez","year":"2020","journal-title":"Molecular Ecology Resources"},{"key":"pcbi.1011584.ref002","doi-asserted-by":"crossref","first-page":"87","DOI":"10.1007\/978-1-0716-0199-0_5","volume-title":"Statistical Population Genomics. Methods in Molecular Biology","author":"A Koropoulis","year":"2020"},{"key":"pcbi.1011584.ref003","doi-asserted-by":"crossref","DOI":"10.3389\/fgene.2020.00350","article-title":"Reaching the End-Game for GWAS: Machine Learning Approaches for the Prioritization of Complex Disease Loci","volume":"11","author":"HL Nicholls","year":"2020","journal-title":"Frontiers in Genetics"},{"key":"pcbi.1011584.ref004","doi-asserted-by":"crossref","first-page":"5762","DOI":"10.1016\/j.csbj.2021.10.009","article-title":"AI applications in functional genomics","volume":"19","author":"C Caudai","year":"2021","journal-title":"Computational and Structural Biotechnology Journal"},{"key":"pcbi.1011584.ref005","article-title":"Deep learning in population genetics","author":"K Korfmann","year":"2023","journal-title":"Genome Biology and Evolution"},{"issue":"1","key":"pcbi.1011584.ref006","doi-asserted-by":"crossref","first-page":"null","DOI":"10.1146\/annurev-biodatasci-020722-115651","article-title":"An Overview of Deep Generative Models in Functional and Evolutionary Genomics","volume":"6","author":"B Yelmen","year":"2023","journal-title":"Annual Review of Biomedical Data Science"},{"key":"pcbi.1011584.ref007","unstructured":"Killoran N, Lee LJ, Delong A, Duvenaud D, Frey BJ. Generating and designing DNA with deep generative models; 2017. Available from: http:\/\/arxiv.org\/abs\/1712.06148."},{"key":"pcbi.1011584.ref008","author":"WW Booker","year":"2022","journal-title":"This population doesn\u2019t exist: learning the distribution of evolutionary histories with generative adversarial networks"},{"key":"pcbi.1011584.ref009","doi-asserted-by":"crossref","unstructured":"Perera M, Montserrat DM, Barrab\u00e9s M, Geleta M, Gir\u00f3-I-Nieto X, Ioannidis AG. Generative Moment Matching Networks for Genotype Simulation. In: 2022 44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC); 2022. p. 1379\u20131383.","DOI":"10.1109\/EMBC48229.2022.9871045"},{"key":"pcbi.1011584.ref010","doi-asserted-by":"crossref","unstructured":"Das S, Shi X. Offspring GAN augments biased human genomic data. In: Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics. BCB\u201922. New York, NY, USA; 2022. p. 1\u201310. Available from: https:\/\/doi.org\/10.1145\/3535508.3545537.","DOI":"10.1145\/3535508.3545537"},{"key":"pcbi.1011584.ref011","unstructured":"Montserrat DM, Bustamante C, Ioannidis A. Class-Conditional VAE-GAN for Local-Ancestry Simulation; 2019. Available from: http:\/\/arxiv.org\/abs\/1911.13220."},{"key":"pcbi.1011584.ref012","unstructured":"Goodfellow IJ, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, et al. Generative Adversarial Networks; 2014. Available from: http:\/\/arxiv.org\/abs\/1406.2661."},{"key":"pcbi.1011584.ref013","unstructured":"Radford A, Metz L, Chintala S. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks; 2016. Available from: http:\/\/arxiv.org\/abs\/1511.06434."},{"key":"pcbi.1011584.ref014","doi-asserted-by":"crossref","unstructured":"Guo J, Lu S, Cai H, Zhang W, Yu Y, Wang J. Long Text Generation via Adversarial Training with Leaked Information; 2017. Available from: http:\/\/arxiv.org\/abs\/1709.08624.","DOI":"10.1609\/aaai.v32i1.11957"},{"issue":"1","key":"pcbi.1011584.ref015","doi-asserted-by":"crossref","first-page":"5684","DOI":"10.1038\/s41467-021-26017-0","article-title":"VEGA is an interpretable generative model for inferring biological network activity in single-cell transcriptomics","volume":"12","author":"L Seninge","year":"2021","journal-title":"Nature Communications"},{"key":"pcbi.1011584.ref016","unstructured":"Yoon J, Jordon J, van der Schaar M. GAIN: Missing Data Imputation using Generative Adversarial Nets; 2018. Available from: http:\/\/arxiv.org\/abs\/1806.02920."},{"issue":"6","key":"pcbi.1011584.ref017","first-page":"e48316","article-title":"Re-identifiability of genomic data and the GDPR","volume":"20","author":"M Shabani","year":"2019","journal-title":"EMBO reports"},{"issue":"5","key":"pcbi.1011584.ref018","doi-asserted-by":"crossref","first-page":"531","DOI":"10.1038\/gim.2017.141","article-title":"Impact of HIPAA\u2019s minimum necessary standard on genomic data sharing","volume":"20","author":"BJ Evans","year":"2018","journal-title":"Genetics in Medicine"},{"issue":"1","key":"pcbi.1011584.ref019","doi-asserted-by":"crossref","first-page":"S2","DOI":"10.1186\/1472-6947-14-S1-S2","article-title":"Differentially private genome data dissemination through top-down specialization","volume":"14","author":"S Wang","year":"2014","journal-title":"BMC Medical Informatics and Decision Making"},{"issue":"11","key":"pcbi.1011584.ref020","doi-asserted-by":"crossref","first-page":"909","DOI":"10.1038\/s42256-022-00551-y","article-title":"Federated learning and Indigenous genomic data sovereignty","volume":"4","author":"N Boscarino","year":"2022","journal-title":"Nature Machine Intelligence"},{"issue":"2","key":"pcbi.1011584.ref021","doi-asserted-by":"crossref","DOI":"10.1371\/journal.pgen.1009303","article-title":"Creating artificial human genomes using generative neural networks","volume":"17","author":"B Yelmen","year":"2021","journal-title":"PLOS Genetics"},{"key":"pcbi.1011584.ref022","unstructured":"Arjovsky M, Chintala S, Bottou L. Wasserstein GAN; 2017. Available from: https:\/\/arxiv.org\/abs\/1701.07875."},{"key":"pcbi.1011584.ref023","volume-title":"Advances in Neural Information Processing Systems","author":"GW Taylor","year":"2006"},{"key":"pcbi.1011584.ref024","first-page":"5345","volume-title":"Advances in Neural Information Processing Systems","author":"A Decelle","year":"2021"},{"key":"pcbi.1011584.ref025","unstructured":"Gulrajani I, Ahmed F, Arjovsky M, Dumoulin V, Courville A. Improved training of wasserstein GANs. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. NIPS\u201917. Red Hook, NY, USA; 2017. p. 5769\u20135779."},{"key":"pcbi.1011584.ref026","doi-asserted-by":"crossref","unstructured":"Karras T, Laine S, Aila T. A Style-Based Generator Architecture for Generative Adversarial Networks; 2019. Available from: http:\/\/arxiv.org\/abs\/1812.04948.","DOI":"10.1109\/CVPR.2019.00453"},{"issue":"3","key":"pcbi.1011584.ref027","doi-asserted-by":"crossref","DOI":"10.1093\/g3journal\/jkac020","article-title":"A deep learning framework for characterization of genotype data","volume":"12","author":"K Ausmees","year":"2022","journal-title":"G3 Genes|Genomes|Genetics"},{"key":"pcbi.1011584.ref028","doi-asserted-by":"crossref","unstructured":"He K, Zhang X, Ren S, Sun J. Deep Residual Learning for Image Recognition; 2015. Available from: http:\/\/arxiv.org\/abs\/1512.03385.","DOI":"10.1109\/CVPR.2016.90"},{"key":"pcbi.1011584.ref029","unstructured":"Lin Z, Khetan A, Fanti G, Oh S. PacGAN: The power of two samples in generative adversarial networks; 2018. Available from: http:\/\/arxiv.org\/abs\/1712.04086."},{"issue":"1","key":"pcbi.1011584.ref030","doi-asserted-by":"crossref","first-page":"014110","DOI":"10.1103\/PhysRevE.108.014110","article-title":"Unsupervised hierarchical clustering using the learning dynamics of restricted Boltzmann machines","volume":"108","author":"A Decelle","year":"2023","journal-title":"Physical Review E"},{"key":"pcbi.1011584.ref031","doi-asserted-by":"crossref","first-page":"68","DOI":"10.1038\/nature15393","article-title":"A global reference for human genetic variation","volume":"526","author":"Consortium TGP","year":"2015","journal-title":"Nature"},{"key":"pcbi.1011584.ref032","unstructured":"Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library; 2019. Available from: http:\/\/arxiv.org\/abs\/1912.01703."},{"issue":"8","key":"pcbi.1011584.ref033","doi-asserted-by":"crossref","first-page":"1771","DOI":"10.1162\/089976602760128018","article-title":"Training Products of Experts by Minimizing Contrastive Divergence","volume":"14","author":"GE Hinton","year":"2002","journal-title":"Neural Computation"},{"key":"pcbi.1011584.ref034","unstructured":"Agoritsas E, Catania G, Decelle A, Seoane B. Explaining the effects of non-convergent sampling in the training of Energy-Based Models; 2023. Available from: http:\/\/arxiv.org\/abs\/2301.09428."},{"key":"pcbi.1011584.ref035","unstructured":"Fissore G, Decelle A, Furtlehner C, Han Y. Robust Multi-Output Learning with Highly Incomplete Data via Restricted Boltzmann Machines; 2019. Available from: http:\/\/arxiv.org\/abs\/1912.09382."},{"key":"pcbi.1011584.ref036","unstructured":"Kingma DP, Welling M. Auto-Encoding Variational Bayes; 2022. Available from: http:\/\/arxiv.org\/abs\/1312.6114."},{"key":"pcbi.1011584.ref037","doi-asserted-by":"crossref","unstructured":"Yale A, Dash S, Dutta R, Guyon I, Pavao A, Bennett KP. Privacy Preserving Synthetic Health Data; 2019. Available from: https:\/\/hal.inria.fr\/hal-02160496.","DOI":"10.1016\/j.neucom.2019.12.136"},{"key":"pcbi.1011584.ref038","doi-asserted-by":"crossref","DOI":"10.24072\/pcjournal.72","article-title":"Simulation of bacterial populations with SLiM","volume":"2","author":"J Cury","year":"2022","journal-title":"Peer Community Journal"},{"key":"pcbi.1011584.ref039","doi-asserted-by":"crossref","unstructured":"Hayes J, Melis L, Danezis G, De Cristofaro E. LOGAN: Membership Inference Attacks Against Generative Models; 2018. Available from: http:\/\/arxiv.org\/abs\/1705.07663.","DOI":"10.2478\/popets-2019-0008"},{"issue":"1","key":"pcbi.1011584.ref040","doi-asserted-by":"crossref","DOI":"10.1093\/g3journal\/jkaa036","article-title":"Visualizing population structure with variational autoencoders","volume":"11","author":"CJ Battey","year":"2021","journal-title":"G3 Genes|Genomes|Genetics"},{"issue":"8","key":"pcbi.1011584.ref041","doi-asserted-by":"crossref","first-page":"2689","DOI":"10.1111\/1755-0998.13386","article-title":"Automatic inference of demographic parameters using generative adversarial networks","volume":"21","author":"Z Wang","year":"2021","journal-title":"Molecular Ecology Resources"},{"issue":"2","key":"pcbi.1011584.ref042","doi-asserted-by":"crossref","first-page":"203","DOI":"10.1038\/s41592-020-01008-z","article-title":"Automated Design of Deep Learning Methods for Biomedical Image Segmentation","volume":"18","author":"F Isensee","year":"2021","journal-title":"Nature Methods"},{"key":"pcbi.1011584.ref043","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1109\/TETC.2022.3218372","article-title":"I Choose You: Automated Hyperparameter Tuning for Deep Learning-based Side-channel Analysis","author":"L Wu","year":"2022","journal-title":"IEEE Transactions on Emerging Topics in Computing"},{"key":"pcbi.1011584.ref044","doi-asserted-by":"crossref","unstructured":"B\u00e9reux N, Decelle A, Furtlehner C, Seoane B. Learning a Restricted Boltzmann Machine using biased Monte Carlo sampling; 2022. Available from: http:\/\/arxiv.org\/abs\/2206.01310.","DOI":"10.21468\/SciPostPhys.14.3.032"},{"key":"pcbi.1011584.ref045","doi-asserted-by":"crossref","first-page":"339","DOI":"10.1016\/j.neunet.2022.06.022","article-title":"Privacy preserving Generative Adversarial Networks to model Electronic Health Records","volume":"153","author":"R Venugopal","year":"2022","journal-title":"Neural Networks"}],"updated-by":[{"DOI":"10.1371\/journal.pcbi.1011584","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2023,11,9]],"date-time":"2023-11-09T00:00:00Z","timestamp":1699488000000}}],"container-title":["PLOS Computational Biology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1011584","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,9]],"date-time":"2023-11-09T13:57:29Z","timestamp":1699538249000},"score":1,"resource":{"primary":{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1011584"}},"subtitle":[],"editor":[{"given":"Piero","family":"Fariselli","sequence":"first","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2023,10,30]]},"references-count":45,"journal-issue":{"issue":"10","published-online":{"date-parts":[[2023,10,30]]}},"URL":"https:\/\/doi.org\/10.1371\/journal.pcbi.1011584","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2023.03.07.530442","asserted-by":"object"}]},"ISSN":["1553-7358"],"issn-type":[{"value":"1553-7358","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,10,30]]}}}