{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T20:33:29Z","timestamp":1772138009591,"version":"3.50.1"},"reference-count":29,"publisher":"Oxford University Press (OUP)","issue":"4","license":[{"start":{"date-parts":[[2020,10,2]],"date-time":"2020-10-02T00:00:00Z","timestamp":1601596800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"name":"Federal Ministry of Education and Research in Germany"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,7,20]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Deep generative models can be trained to represent the joint distribution of data, such as measurements of single nucleotide polymorphisms (SNPs) from several individuals. Subsequently, synthetic observations are obtained by drawing from this distribution. This has been shown to be useful for several tasks, such as removal of noise, imputation, for better understanding underlying patterns, or even exchanging data under privacy constraints. Yet, it is still unclear how well these approaches work with limited sample size. We investigate such settings specifically for binary data, e.g. as relevant when considering SNP measurements, and evaluate three frequently employed generative modeling approaches, variational autoencoders (VAEs), deep Boltzmann machines (DBMs) and generative adversarial networks (GANs). This includes conditional approaches, such as when considering gene expression conditional on SNPs. Recovery of pair-wise odds ratios (ORs) is considered as a primary performance criterion. For simulated as well as real SNP data, we observe that DBMs generally can recover structure for up to 300 variables, with a tendency of over-estimating ORs when not carefully tuned. VAEs generally get the direction and relative strength of pairwise relations right, yet with considerable under-estimation of ORs. GANs provide stable results only with larger sample sizes and strong pair-wise relations in the data. Taken together, DBMs and VAEs (in contrast to GANs) appear to be well suited for binary omics data, even at rather small sample sizes. This opens the way for many potential applications where synthetic observations from omics data might be useful.<\/jats:p>","DOI":"10.1093\/bib\/bbaa226","type":"journal-article","created":{"date-parts":[[2020,8,24]],"date-time":"2020-08-24T07:08:38Z","timestamp":1598252918000},"source":"Crossref","is-referenced-by-count":6,"title":["Synthetic observations from deep generative models and binary omics data with limited sample size"],"prefix":"10.1093","volume":"22","author":[{"given":"Jens","family":"Nu\u00dfberger","sequence":"first","affiliation":[{"name":"Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg, Germany"}]},{"given":"Frederic","family":"Boesel","sequence":"additional","affiliation":[{"name":"Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg, Germany"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9135-1743","authenticated-orcid":false,"given":"Stefan","family":"Lenz","sequence":"additional","affiliation":[{"name":"Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg, Germany"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5666-8662","authenticated-orcid":false,"given":"Harald","family":"Binder","sequence":"additional","affiliation":[{"name":"Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg, Germany"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4021-1796","authenticated-orcid":false,"given":"Moritz","family":"Hess","sequence":"additional","affiliation":[{"name":"Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg, Germany"}]}],"member":"286","published-online":{"date-parts":[[2020,10,2]]},"reference":[{"key":"2021072112100944000_ref1","first-page":"448","article-title":"Deep Boltzmann machines","volume":"5","author":"Salakhutdinov","year":"2009","journal-title":"Artif Intell Stat"},{"key":"2021072112100944000_ref2","article-title":"Auto-encoding variational bayes","author":"Kingma","year":"2013","journal-title":"arXiv preprint arXiv:13126114"},{"key":"2021072112100944000_ref3","first-page":"2672","article-title":"Generative adversarial nets","volume":"27","author":"Goodfellow","year":"2014","journal-title":"Adv Neural Inform Process Syst"},{"key":"2021072112100944000_ref4","doi-asserted-by":"crossref","DOI":"10.1161\/CIRCOUTCOMES.118.005122","article-title":"Privacy-preserving generative deep neural networks support clinical data sharing","volume":"12","author":"Beaulieu-Jones","year":"2019","journal-title":"Circ Cardiovasc Qual Outcomes"},{"key":"2021072112100944000_ref5","article-title":"Creating artificial human genomes using generative models","author":"Yelmen","year":"2019","journal-title":"bioRxiv"},{"key":"2021072112100944000_ref6","doi-asserted-by":"crossref","first-page":"1053","DOI":"10.1038\/s41592-018-0229-2","article-title":"Deep generative modeling for single-cell transcriptomics","volume":"15","author":"Lopez","year":"2018","journal-title":"Nat Methods"},{"key":"2021072112100944000_ref7","doi-asserted-by":"crossref","first-page":"3743","DOI":"10.1093\/bioinformatics\/btz158","article-title":"Dr. VAE: improving drug response prediction via modeling of drug perturbation effects","volume":"35","author":"Ramp\u00e1\u0161ek","year":"2019","journal-title":"Bioinformatics"},{"key":"2021072112100944000_ref8","doi-asserted-by":"crossref","first-page":"3173","DOI":"10.1093\/bioinformatics\/btx408","article-title":"Partitioned learning of deep Boltzmann machines for SNP data","volume":"33","author":"Hess","year":"2017","journal-title":"Bioinformatics"},{"key":"2021072112100944000_ref9","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41467-018-07931-2","article-title":"Single-cell RNA-Seq denoising using a deep count autoencoder","volume":"10","author":"Eraslan","year":"2019","journal-title":"Nat Commun"},{"key":"2021072112100944000_ref10","doi-asserted-by":"crossref","first-page":"389","DOI":"10.1038\/s41576-019-0122-6","article-title":"Deep learning: new computational modelling techniques for genomics","volume":"20","author":"Eraslan","year":"2019","journal-title":"Nat Rev Genet"},{"key":"2021072112100944000_ref11","article-title":"Generating multi-label discrete patient records using generative adversarial networks","author":"Choi","year":"2017","journal-title":"arXiv preprint arXiv:170306490"},{"key":"2021072112100944000_ref12","doi-asserted-by":"crossref","first-page":"75","DOI":"10.1038\/nature15394","article-title":"An integrated map of structural variation in 2,504 human genomes","volume":"526","author":"Sudmant","year":"2015","journal-title":"Nature"},{"key":"2021072112100944000_ref13","article-title":"A note on the evaluation of generative models","author":"Theis","year":"2015","journal-title":"arXiv preprint arXiv:151101844"},{"key":"2021072112100944000_ref14","doi-asserted-by":"crossref","first-page":"i603","DOI":"10.1093\/bioinformatics\/bty563","article-title":"Conditional generative adversarial network for gene expression inference","volume":"34","author":"Wang","year":"2018","journal-title":"Bioinformatics"},{"key":"2021072112100944000_ref15","first-page":"1125","article-title":"Image-to-image translation with conditional adversarial networks","author":"Isola","year":"2017","journal-title":"Proc IEEE Conf Comput Vis Pat Recogn"},{"key":"2021072112100944000_ref16","article-title":"Conditional generative adversarial nets","author":"Mirza","year":"2014","journal-title":"arXiv preprint arXiv:14111784"},{"key":"2021072112100944000_ref17","first-page":"2","article-title":"Conditional generative adversarial nets for convolutional face generation. Class project for Stanford CS231N","volume":"2014","author":"Gauthier","year":"2014","journal-title":"Convolut Neural Netw Vis Recogn, Winter Semester"},{"key":"2021072112100944000_ref18","doi-asserted-by":"crossref","first-page":"147","DOI":"10.1207\/s15516709cog0901_7","article-title":"A learning algorithm for Boltzmann machines","volume":"9","author":"Ackley","year":"1985","journal-title":"Cognit Sci"},{"key":"2021072112100944000_ref19","doi-asserted-by":"crossref","first-page":"791","DOI":"10.1145\/1273496.1273596","article-title":"Restricted Boltzmann machines for collaborative filtering","author":"Salakhutdinov","year":"2007","journal-title":"Proceedings of the 24th International Conference on Machine Learning"},{"key":"2021072112100944000_ref20","doi-asserted-by":"crossref","first-page":"663","DOI":"10.1111\/rssa.12358","article-title":"General and specific utility measures for synthetic data","volume":"181","author":"Snoke","year":"2018","journal-title":"J R Stat Soc A Stat Soc"},{"key":"2021072112100944000_ref21","doi-asserted-by":"crossref","first-page":"1468","DOI":"10.1136\/bmj.320.7247.1468","article-title":"The odds ratio","volume":"320","author":"Bland","year":"2000","journal-title":"BMJ"},{"key":"2021072112100944000_ref22","first-page":"265","article-title":"TensorFlow: a system for large-scale machine learning","volume":"16","author":"Abadi","year":"2016","journal-title":"12th {Usenix} Symposium on Operating Systems Design and Implementation {Osdi}"},{"key":"2021072112100944000_ref23","doi-asserted-by":"crossref","first-page":"602","DOI":"10.21105\/joss.00602","article-title":"Flux: elegant machine learning with Julia","volume":"3","author":"Innes","year":"2018","journal-title":"J Open Source Softw"},{"key":"2021072112100944000_ref24","doi-asserted-by":"crossref","first-page":"65","DOI":"10.1137\/141000671","article-title":"Julia: a fresh approach to numerical computing","volume":"59","author":"Bezanson","year":"2017","journal-title":"SIAM Rev"},{"key":"2021072112100944000_ref25","article-title":"Unsupervised deep learning on biomedical data with Boltzmann Machines","author":"Lenz","year":"2019","journal-title":"bioRxiv"},{"key":"2021072112100944000_ref26","doi-asserted-by":"crossref","first-page":"859","DOI":"10.1080\/01621459.2017.1285773","article-title":"Variational inference: a review for statisticians","volume":"112","author":"Blei","year":"2017","journal-title":"J Am Stat Assoc"},{"key":"2021072112100944000_ref27","first-page":"700","article-title":"Are GANs created equal? A large-scale study","volume":"31","author":"Lucic","year":"2018","journal-title":"Adv Neural Inform Process Syst"},{"key":"2021072112100944000_ref28","article-title":"Wasserstein GAN","author":"Arjovsky","year":"2017","journal-title":"arXiv preprint arXiv:170107875"},{"key":"2021072112100944000_ref29","first-page":"255","article-title":"Convolutional networks for images, speech, and time series. The handbook of brain theory and","volume":"3361","author":"LeCun","year":"1995","journal-title":"Neural Netw"}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bib\/article-pdf\/22\/4\/bbaa226\/39136273\/bbaa226.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"http:\/\/academic.oup.com\/bib\/article-pdf\/22\/4\/bbaa226\/39136273\/bbaa226.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,7,21]],"date-time":"2021-07-21T08:14:50Z","timestamp":1626855290000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbaa226\/5917048"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,10,2]]},"references-count":29,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2021,7,20]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbaa226","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2020.06.11.147058","asserted-by":"object"}]},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2021,7]]},"published":{"date-parts":[[2020,10,2]]},"article-number":"bbaa226"}}