{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,28]],"date-time":"2026-03-28T17:09:39Z","timestamp":1774717779955,"version":"3.50.1"},"reference-count":50,"publisher":"SAGE Publications","issue":"4","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["IDT"],"published-print":{"date-parts":[[2022,1,10]]},"abstract":"<jats:p>We introduce the DP-auto-GAN framework for synthetic data generation, which combines the low dimensional representation of autoencoders with the flexibility of Generative Adversarial Networks (GANs). This framework can be used to take in raw sensitive data and privately train a model for generating synthetic data that will satisfy similar statistical properties as the original data. This learned model can generate an arbitrary amount of synthetic data, which can then be freely shared due to the post-processing guarantee of differential privacy. Our framework is applicable to unlabeled mixed-type data, that may include binary, categorical, and real-valued data. We implement this framework on both binary data (MIMIC-III) and mixed-type data (ADULT), and compare its performance with existing private algorithms on metrics in unsupervised settings. We also introduce a new quantitative metric able to detect diversity, or lack thereof, of synthetic data.<\/jats:p>","DOI":"10.3233\/idt-210195","type":"journal-article","created":{"date-parts":[[2021,12,7]],"date-time":"2021-12-07T19:08:07Z","timestamp":1638904087000},"page":"779-807","source":"Crossref","is-referenced-by-count":6,"title":["Differentially private synthetic mixed-type data generation for unsupervised learning"],"prefix":"10.1177","volume":"15","author":[{"given":"Uthaipon Tao","family":"Tantipongpipat","sequence":"first","affiliation":[{"name":"Twitter, San Francisco, CA, USA"}]},{"given":"Chris","family":"Waites","sequence":"additional","affiliation":[{"name":"Stanford University, Stanford, CA, USA"}]},{"given":"Digvijay","family":"Boob","sequence":"additional","affiliation":[{"name":"Southern Methodist University, Dallas, TX, USA"}]},{"given":"Amaresh Ankit","family":"Siva","sequence":"additional","affiliation":[{"name":"Amazon, Seattle, WA, USA"}]},{"given":"Rachel","family":"Cummings","sequence":"additional","affiliation":[{"name":"Columbia University, New York, NY, USA"}]}],"member":"179","reference":[{"key":"10.3233\/IDT-210195_ref1","doi-asserted-by":"crossref","unstructured":"Narayanan A, Shmatikov V. Robust De-anonymization of Large Sparse Datasets. In: Proceedings of the 2008 IEEE Symposium on Security and Privacy. Oakland S&P \u201908; 2008. pp.\u00a0111-125.","DOI":"10.1109\/SP.2008.33"},{"key":"10.3233\/IDT-210195_ref2","unstructured":"Barbaro M, Zeller T. A Face is Exposed for AOL Searcher No. 4417749. New York Times; 2006. [Online, Retrieved 9\/25\/2019]. New York Times. Available from: https\/\/www.nytimes.com\/2006\/08\/09\/technology\/09aol.html."},{"key":"10.3233\/IDT-210195_ref3","first-page":"1701","article-title":"Broken promises of privacy: Responding to the surprising failure of anonymization","volume":"57","author":"Ohm","year":"2010","journal-title":"UCLA Law Review."},{"key":"10.3233\/IDT-210195_ref4","unstructured":"Carlini N, Liu C, Erlingsson \u00da, Kos J, Song D. The Secret Sharer: Evaluating and testing unintended memorization in neural networks. In: Proceedings of the 28th USENIX Security Symposium. USENIX Security \u201919; 2019. pp. 267-284."},{"key":"10.3233\/IDT-210195_ref5","doi-asserted-by":"crossref","unstructured":"Dwork C, McSherry F, Nissim K, Smith A. Calibrating noise to sensitivity in private data analysis. In: Proceedings of the 3rd Conference on Theory of Cryptography. TCC \u201906; 2006. pp. 265-284.","DOI":"10.1007\/11681878_14"},{"key":"10.3233\/IDT-210195_ref6","unstructured":"Triastcyn A, Faltings B. Generating artificial data for private deep learning. In: Proceedings of the PAL: Privacy-Enhancing Artificial Intelligence and Language Technologies. PAL \u201918; 2018. pp. 33-40."},{"key":"10.3233\/IDT-210195_ref7","doi-asserted-by":"crossref","unstructured":"Blum A, Ligett K, Roth A. A learning theory approach to non-interactive database privacy. In: Proceedings of the 40th Annual ACM Symposium on Theory of Computing. STOC \u201908; 2008. pp. 609-618.","DOI":"10.1145\/1374376.1374464"},{"key":"10.3233\/IDT-210195_ref8","doi-asserted-by":"crossref","unstructured":"Hardt M, Rothblum GN. A multiplicative weights mechanism for privacy-preserving data analysis. In: Proceedings of the 51st Annual IEEE Symposium on Foundations of Computer Science. FOCS \u201910; 2010. pp. 61-70.","DOI":"10.1109\/FOCS.2010.85"},{"key":"10.3233\/IDT-210195_ref9","unstructured":"Kingma DP, Welling M. Auto-encoding variational bayes; 2013. ArXiv preprint 1312.6114."},{"key":"10.3233\/IDT-210195_ref10","first-page":"510","article-title":"Privacy preserving synthetic data release using deep learning","author":"Abay","year":"2018","journal-title":"Machine Learning and Knowledge Discovery in Databases (ECML PKDD \u201918). vol. 11051 of Lecture Notes in Computer Science"},{"key":"10.3233\/IDT-210195_ref11","unstructured":"Chen Q, Xiang C, Xue M, Li B, Borisov N, Kaarfar D, et al. Differentially Private Data Generative Models; 2018. ArXiv preprint 1812.02274."},{"key":"10.3233\/IDT-210195_ref12","doi-asserted-by":"crossref","first-page":"160035","DOI":"10.1038\/sdata.2016.35","article-title":"MIMIC-III, a freely accessible critical care database","volume":"3","author":"Johnson","year":"2016","journal-title":"Scientific Data."},{"key":"10.3233\/IDT-210195_ref13","unstructured":"Dua D, Graff C. UCI Machine Learning Repository; 2017. Available from: http:\/\/archive.ics.uci.edu\/ml."},{"key":"10.3233\/IDT-210195_ref14","doi-asserted-by":"crossref","unstructured":"Frigerio L, de\u00a0Oliveira AS, Gomez L, Duverger P. Differentially private generative adversarial networks for time series, continuous, and discrete open data. In: International Conference on ICT Systems Security and Privacy Protection. IFIP SEC \u201919 2019. pp. 151-164.","DOI":"10.1007\/978-3-030-22312-0_11"},{"key":"10.3233\/IDT-210195_ref15","unstructured":"Xie L, Lin K, Wang S, Wang F, Zhou J. Differentially private generative adversarial network; 2018. ArXiv preprint 1802.06739."},{"issue":"6","key":"10.3233\/IDT-210195_ref16","doi-asserted-by":"crossref","first-page":"1109","DOI":"10.1109\/TKDE.2018.2855136","article-title":"Differentially private mixture of generative neural networks","volume":"31","author":"Acs","year":"2018","journal-title":"IEEE Transactions on Knowledge and Data Engineering."},{"key":"10.3233\/IDT-210195_ref17","unstructured":"Hardt M, Ligett K, McSherry F. A simple and practical algorithm for differentially private data release. In: Advances in Neural Information Processing Systems 25, NIPS \u201912; 2012. pp. 2339-2347."},{"key":"10.3233\/IDT-210195_ref18","unstructured":"Gaboardi M, Arias EJG, Hsu J, Roth A, Wu ZS. Dual query: Practical private query release for high dimensional data. In: Proceedings of the 31st International Conference on Machine Learning. ICML \u201914; 2014. pp. 1170-1178."},{"key":"10.3233\/IDT-210195_ref19","doi-asserted-by":"crossref","unstructured":"Zhang J, Cormode G, Procopiuc CM, Srivastava D, Xiao X. PrivBayes: Private data release via Bayesian networks. ACM Transactions on Database Systems (TODS). 2017; 42(4): 25.","DOI":"10.1145\/3134428"},{"key":"10.3233\/IDT-210195_ref20","doi-asserted-by":"crossref","unstructured":"Ping H, Stoyanovich J, Howe B. DataSynthesizer: Privacy-preserving synthetic datasets. In: Proceedings of the 29th International Conference on Scientific and Statistical Database Management. SSDBM \u201917; 2017. pp.\u00a0421-42:5.","DOI":"10.1145\/3085504.3091117"},{"key":"10.3233\/IDT-210195_ref21","first-page":"95","article-title":"A review of synthetic data generation methods for privacy preserving data publishing","volume":"6","author":"Surendra","year":"2017","journal-title":"International Journal of Scientific and Technology."},{"key":"10.3233\/IDT-210195_ref22","doi-asserted-by":"crossref","unstructured":"Abadi M, Chu A, Goodfellow I, McMahan HB, Mironov I, Talwar K, et al. Deep learning with differential privacy. In: Proceedings of the 2016 ACM Conference on Computer and Communications Security. CCS \u201916; 2016. pp. 308-318.","DOI":"10.1145\/2976749.2978318"},{"key":"10.3233\/IDT-210195_ref23","doi-asserted-by":"crossref","unstructured":"Mironov I. R\u00e9nyi differential privacy. In: Proceedings of the 2017 IEEE 30th Computer Security Foundations Symposium. CSF \u201917; 2017. pp.\u00a0263-275.","DOI":"10.1109\/CSF.2017.11"},{"key":"10.3233\/IDT-210195_ref24","unstructured":"Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, et al. Generative Adversarial Nets. In: Advances in Neural Information Processing Systems 27, NIPS \u201914, 2014. pp. 2672-2680."},{"key":"10.3233\/IDT-210195_ref25","unstructured":"Mogren O. C-RNN-GAN: Continuous recurrent neural networks with adversarial training. Constructive Machine Learning Workshop (CML) at NeurIPS 2016, 2016."},{"key":"10.3233\/IDT-210195_ref26","doi-asserted-by":"crossref","unstructured":"Saito M, Matsumoto E, Saito S. Temporal generative adversarial nets with singular value clipping. In: Proceedings of the IEEE International Conference on Computer Vision. ICCV \u201917, 2017. pp. 2830-2839.","DOI":"10.1109\/ICCV.2017.308"},{"key":"10.3233\/IDT-210195_ref27","unstructured":"Salimans T, Goodfellow I, Zaremba W, Cheung V, Radford A, Chen X. Improved Techniques for Training GANs. In: Advances in Neural Information Processing Systems 29, NIPS \u201916, 2016. pp. 2234-2242."},{"key":"10.3233\/IDT-210195_ref28","unstructured":"Jang E, Gu S, Poole B. Categorical reparameterization with Gumbel-softmax. In: Proceedings of the 5th International Conference on Learning Representations. ICLR \u201917; 2017. Available from: https\/\/openreview.net\/forum?id=rkE3y85ee."},{"key":"10.3233\/IDT-210195_ref29","unstructured":"Kusner MJ, Hern\u00e1ndez-Lobato JM. GANs for sequences of discrete elements with the Gumbel-softmax distribution; 2016. ArXiv preprint 1611.04051."},{"key":"10.3233\/IDT-210195_ref30","doi-asserted-by":"crossref","unstructured":"Wang H, Wang J, Wang J, Zhao M, Zhang W, Zhang F, et al. GraphGAN: Graph representation learning with generative adversarial nets. In: Proceedings of the 32nd AAAI Conference on Artificial Intelligence. AAAI \u201918; 2018. pp. 2508-2515.","DOI":"10.1609\/aaai.v32i1.11872"},{"key":"10.3233\/IDT-210195_ref31","unstructured":"Xu L, Skoularidou M, Cuesta-Infante A, Veeramachaneni K. Modeling tabular data using Conditional GAN. In: Advances in Neural Information Processing Systems 32, NeurIPS \u201919; 2019. pp. 7333-7343."},{"key":"10.3233\/IDT-210195_ref32","doi-asserted-by":"crossref","unstructured":"Lim SK, Loo Y, Tran NT, Cheung NM, Roig G, Elovici Y. DOPING: Generative data augmentation for unsupervised anomaly detection with GAN. In: Proceedings of the 2018 IEEE International Conference on Data Mining. ICDM \u201918; 2018. pp. 1122-1127.","DOI":"10.1109\/ICDM.2018.00146"},{"key":"10.3233\/IDT-210195_ref33","doi-asserted-by":"crossref","unstructured":"Park N, Mohammadi M, Gorde K, Jajodia S, Park H, Kim Y. Data synthesis based on generative adversarial networks. Proceedings of the VLDB Endowment. 2018; 11(10): 1071-1083.","DOI":"10.14778\/3231751.3231757"},{"key":"10.3233\/IDT-210195_ref34","unstructured":"Arjovsky M, Chintala S, Bottou L. Wasserstein GAN, 2017. ArXiv preprint 1701.07875."},{"key":"10.3233\/IDT-210195_ref35","unstructured":"Gulrajani I, Ahmed F, Arjovsky M, Dumoulin V, Courville AC. Improved Training of Wasserstein GANs. In: Advances in Neural Information Processing Systems 30, NIPS \u201917; 2017. pp. 5767-5777."},{"key":"10.3233\/IDT-210195_ref36","unstructured":"Alzantot M, Srivastava M. Differential Privacy Synthetic Data Generation using WGANs, 2019. Available from: https:\/\/github.com\/nesl\/nist_differential_privacy_synthetic_data_challenge\/."},{"key":"10.3233\/IDT-210195_ref37","unstructured":"Mirza M, Osindero S. Conditional generative adversarial nets, 2014. ArXiv preprint 1411.1784."},{"key":"10.3233\/IDT-210195_ref38","doi-asserted-by":"crossref","unstructured":"Torkzadehmahani R, Kairouz P, Paten B. DP-CGAN: Differentially Private Synthetic Data and Label Generation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2019.","DOI":"10.1109\/CVPRW.2019.00018"},{"key":"10.3233\/IDT-210195_ref39","unstructured":"Papernot N, Abadi M, Erlingsson U, Goodfellow I, Talwar K. Semi-supervised knowledge transfer for deep learning from private training data. In: International Conference on Learning Representations. ICLR \u201917, 2017. Available from: https\/\/openreview.net\/forum?id=HkwoSDPgg."},{"key":"10.3233\/IDT-210195_ref40","unstructured":"Papernot N, Song S, Mironov I, Raghunathan A, Talwar K, Erlingsson \u00da. Scalable private learning with PATE. In: International Conference on Learning Representations. ICLR \u201918, 2018. Available from: https\/\/openreview.net\/forum?id=rkZB1XbRZ."},{"key":"10.3233\/IDT-210195_ref41","unstructured":"Jordon J, Yoon J, van\u00a0der Schaar M. PATE-GAN: generating synthetic data with differential privacy guarantees. In: Proceedings of the 7th International Conference on Learning Representations. ICLR \u201919; 2019. Available from: https\/\/openreview.net\/forum?id=S1zk9iRqF7."},{"key":"10.3233\/IDT-210195_ref42","unstructured":"Park M, Foulds J, Choudhary K, Welling M. DP-EM: Differentially Private Expectation Maximization. In: Proceedings of the 20th International Conference on Artificial Intelligence and Statistics. AISTATS \u201917; 2017. pp. 896-904."},{"key":"10.3233\/IDT-210195_ref46","doi-asserted-by":"crossref","unstructured":"Charest AS. How can we analyze differentially-private synthetic datasets? Journal of Privacy and Confidentiality. 2011; 2(2): 21-33.","DOI":"10.29012\/jpc.v2i2.589"},{"key":"10.3233\/IDT-210195_ref47","unstructured":"Choi E, Biswal S, Malin B, Duke J, Stewart WF, Sun J. Generating multi-label discrete patient records using generative adversarial networks. In: Proceedings of Machine Learning for Healthcare, 2017, pp. 286-305."},{"key":"10.3233\/IDT-210195_ref48","unstructured":"McMahan HB, Andrew G. A general approach to adding differential privacy to iterative training procedures. PPML18: Privacy Preserving Machine Learning \u2013 NeurIPS 2018 Workshop. 2018."},{"issue":"7","key":"10.3233\/IDT-210195_ref50","doi-asserted-by":"crossref","first-page":"3797","DOI":"10.1109\/TIT.2014.2320500","article-title":"R\u00e9nyi divergence and Kullback-Leibler divergence","volume":"60","author":"Van\u00a0Erven","year":"2014","journal-title":"IEEE Transactions on Information Theory."},{"key":"10.3233\/IDT-210195_ref51","unstructured":"Wang YX, Balle B, Kasiviswanathan S. Subsampled R\u00e9nyi Differential Privacy and Analytical Moments Accountant. In: Proceedings of the 22th International Conference on Artificial Intelligence and Statistics. AISTATS \u201919; 2019. pp. 1226-1235."},{"key":"10.3233\/IDT-210195_ref52","doi-asserted-by":"crossref","first-page":"41","DOI":"10.1016\/j.cviu.2018.10.009","article-title":"Pros and cons of GAN evaluation measures","volume":"179","author":"Borji","year":"2019","journal-title":"Computer Vision and Image Understanding."},{"key":"10.3233\/IDT-210195_ref53","unstructured":"Bagdasaryan E, Poursaeed O, Shmatikov V. Differential privacy has disparate impact on model accuracy. In: Advances in Neural Information Processing Systems 32, NeurIPS \u201919; 2019. pp. 15479-15488."},{"key":"10.3233\/IDT-210195_ref54","doi-asserted-by":"crossref","unstructured":"Jeffreys H. An invariant form for the prior probability in estimation problems. Proceedings of the Royal Society of London Series A Mathematical and Physical Sciences. 1946; 186(1007): 453-461.","DOI":"10.1098\/rspa.1946.0056"}],"container-title":["Intelligent Decision Technologies"],"original-title":[],"link":[{"URL":"https:\/\/content.iospress.com\/download?id=10.3233\/IDT-210195","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,10]],"date-time":"2025-03-10T20:53:59Z","timestamp":1741640039000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/full\/10.3233\/IDT-210195"}},"subtitle":[],"editor":[{"given":"George A.","family":"Tsihrintzis","sequence":"additional","affiliation":[]},{"given":"Maria","family":"Virvou","sequence":"additional","affiliation":[]},{"given":"Ioannis","family":"Hatzilygeroudis","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2022,1,10]]},"references-count":50,"journal-issue":{"issue":"4"},"URL":"https:\/\/doi.org\/10.3233\/idt-210195","relation":{},"ISSN":["1872-4981","1875-8843"],"issn-type":[{"value":"1872-4981","type":"print"},{"value":"1875-8843","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,1,10]]}}}