{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,2]],"date-time":"2025-11-02T04:31:08Z","timestamp":1762057868474,"version":"build-2065373602"},"reference-count":38,"publisher":"MDPI AG","issue":"7","license":[{"start":{"date-parts":[[2022,7,1]],"date-time":"2022-07-01T00:00:00Z","timestamp":1656633600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>We consider the problem of enhancing user privacy in common data analysis and machine learning development tasks, such as data annotation and inspection, by substituting the real data with samples from a generative adversarial network. We propose employing Bayesian differential privacy as the means to achieve a rigorous theoretical guarantee while providing a better privacy-utility trade-off. We demonstrate experimentally that our approach produces higher-fidelity samples compared to prior work, allowing to (1) detect more subtle data errors and biases, and (2) reduce the need for real data labelling by achieving high accuracy when training directly on artificial samples.<\/jats:p>","DOI":"10.3390\/a15070232","type":"journal-article","created":{"date-parts":[[2022,7,2]],"date-time":"2022-07-02T11:12:35Z","timestamp":1656760355000},"page":"232","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":4,"title":["Generating Higher-Fidelity Synthetic Datasets with Privacy Guarantees"],"prefix":"10.3390","volume":"15","author":[{"given":"Aleksei","family":"Triastcyn","sequence":"first","affiliation":[{"name":"\u00c9cole Polytechnique F\u00e9d\u00e9rale de Lausanne, 1015 Lausanne, Switzerland"}]},{"given":"Boi","family":"Faltings","sequence":"additional","affiliation":[{"name":"\u00c9cole Polytechnique F\u00e9d\u00e9rale de Lausanne, 1015 Lausanne, Switzerland"}]}],"member":"1968","published-online":{"date-parts":[[2022,7,1]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Fredrikson, M., Jha, S., and Ristenpart, T. (2015, January 12\u201316). Model inversion attacks that exploit confidence information and basic countermeasures. Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, Denver, CO, USA.","DOI":"10.1145\/2810103.2813677"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Shokri, R., Stronati, M., Song, C., and Shmatikov, V. (2017, January 22\u201326). Membership inference attacks against machine learning models. Proceedings of the 2017 IEEE Symposium on Security and Privacy (SP), San Jose, CA, USA.","DOI":"10.1109\/SP.2017.41"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Hitaj, B., Ateniese, G., and P\u00e9rez-Cruz, F. (November, January 30). Deep models under the GAN: Information leakage from collaborative deep learning. Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, Dallas, TX, USA.","DOI":"10.1145\/3133956.3134012"},{"key":"ref_4","unstructured":"Truex, S., Liu, L., Gursoy, M.E., Yu, L., and Wei, W. (2018). Towards demystifying membership inference attacks. arXiv."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Abadi, M., Chu, A., Goodfellow, I., McMahan, H.B., Mironov, I., Talwar, K., and Zhang, L. (2016, January 24\u201328). Deep learning with differential privacy. Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, Vienna, Austria.","DOI":"10.1145\/2976749.2978318"},{"key":"ref_6","unstructured":"Dwork, C. (2006, January 10\u201314). Differential Privacy. Proceedings of the 33rd International Colloquium on Automata, Languages and Programming, part II (ICALP 2006), Venice, Italy."},{"key":"ref_7","unstructured":"McMahan, H.B., Moore, E., Ramage, D., Hampson, S., and Arcas, B.A.y. (2016). Communication-efficient learning of deep networks from decentralized data. arXiv."},{"key":"ref_8","unstructured":"Augenstein, S., McMahan, H.B., Ramage, D., Ramaswamy, S., Kairouz, P., Chen, M., and Mathews, R. (2019). Generative Models for Effective ML on Private, Decentralized Datasets. arXiv."},{"key":"ref_9","unstructured":"Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. (2014, January 8\u201313). Generative adversarial nets. Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, USA."},{"key":"ref_10","unstructured":"Triastcyn, A., and Faltings, B. (2019). Bayesian Differential Privacy for Machine Learning. arXiv."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Shokri, R., and Shmatikov, V. (2015, January 12\u201316). Privacy-preserving deep learning. Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, Denver, CO, USA.","DOI":"10.1145\/2810103.2813687"},{"key":"ref_12","unstructured":"Papernot, N., Abadi, M., Erlingsson, U., Goodfellow, I., and Talwar, K. (2016). Semi-supervised knowledge transfer for deep learning from private training data. arXiv."},{"key":"ref_13","unstructured":"Papernot, N., Song, S., Mironov, I., Raghunathan, A., Talwar, K., and Erlingsson, \u00da. (2018). Scalable Private Learning with PATE. arXiv."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Mironov, I., Pandey, O., Reingold, O., and Vadhan, S. (2009, January 16\u201320). Computational differential privacy. Proceedings of the Annual International Cryptology Conference, Santa Barbara, CA, USA.","DOI":"10.1007\/978-3-642-03356-8_8"},{"key":"ref_15","unstructured":"Mir, D.J. (2012, January 25\u201326). Information-theoretic foundations of differential privacy. Proceedings of the International Symposium on Foundations and Practice of Security, Montreal, QC, Canada."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"5018","DOI":"10.1109\/TIT.2016.2584610","article-title":"On the relation between identifiability, differential privacy, and mutual-information privacy","volume":"62","author":"Wang","year":"2016","journal-title":"IEEE Trans. Inf. Theory"},{"key":"ref_17","unstructured":"Dwork, C., and Rothblum, G.N. (2016). Concentrated differential privacy. arXiv."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Bun, M., and Steinke, T. (2016, January 10\u201313). Concentrated differential privacy: Simplifications, extensions, and lower bounds. Proceedings of the Theory of Cryptography Conference, Tel Aviv, Israel.","DOI":"10.1007\/978-3-662-53641-4_24"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Bun, M., Dwork, C., Rothblum, G.N., and Steinke, T. (2018, January 25\u201329). Composable and versatile privacy via truncated CDP. Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, Los Angeles, CA, USA.","DOI":"10.1145\/3188745.3188946"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Mironov, I. (2017, January 21\u201325). Renyi differential privacy. Proceedings of the 2017 IEEE 30th Computer Security Foundations Symposium (CSF), Santa Barbara, CA, USA.","DOI":"10.1109\/CSF.2017.11"},{"key":"ref_21","first-page":"4","article-title":"Differential privacy applications to Bayesian and linear mixed model estimation","volume":"5","author":"Abowd","year":"2013","journal-title":"J. Priv. Confidentiality"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"963","DOI":"10.1111\/rssa.12100","article-title":"A new method for protecting interrelated time series with Bayesian prior distributions and synthetic data","volume":"178","author":"Schneider","year":"2015","journal-title":"J. R. Stat. Soc. Ser. A Stat. Soc."},{"key":"ref_23","first-page":"3","article-title":"On the Meaning and Limits of Empirical Differential Privacy","volume":"7","author":"Charest","year":"2017","journal-title":"J. Priv. Confidentiality"},{"key":"ref_24","unstructured":"Triastcyn, A., and Faltings, B. (2019, January 12). Federated Generative Privacy. Proceedings of the IJCAI Workshop on Federated Machine Learning for User Privacy and Data Confidentiality (FML 2019), Macau, China."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Beaulieu-Jones, B.K., Wu, Z.S., Williams, C., and Greene, C.S. (2017). Privacy-preserving generative deep neural networks support clinical data sharing. bioRxiv, 159756.","DOI":"10.1101\/159756"},{"key":"ref_26","unstructured":"Xie, L., Lin, K., Wang, S., Wang, F., and Zhou, J. (2018). Differentially Private Generative Adversarial Network. arXiv."},{"key":"ref_27","unstructured":"Zhang, X., Ji, S., and Wang, T. (2018). Differentially Private Releasing via Deep Generative Model. arXiv."},{"key":"ref_28","unstructured":"Jordon, J., Yoon, J., and van der Schaar, M. (May, January 30). PATE-GAN: Generating synthetic data with differential privacy guarantees. Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada."},{"key":"ref_29","unstructured":"Long, Y., Lin, S., Yang, Z., Gunter, C.A., and Li, B. (2019). Scalable Differentially Private Generative Student Model via PATE. arXiv."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1561\/0400000042","article-title":"The algorithmic foundations of differential privacy","volume":"9","author":"Dwork","year":"2014","journal-title":"Found. Trends\u00ae Theor. Comput. Sci."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Aldous, D.J. (1985). Exchangeability and related topics. \u00c9cole d\u2019\u00c9t\u00e9 de Probabilit\u00e9s de Saint-Flour XIII\u20141983, Springer.","DOI":"10.1007\/BFb0099420"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Triastcyn, A., and Faltings, B. (2019). Federated Learning with Bayesian Differential Privacy. arXiv.","DOI":"10.1109\/BigData47090.2019.9005465"},{"key":"ref_33","unstructured":"Triastcyn, A. (2020). Data-Aware Privacy-Preserving Machine Learning, \u00c9cole Polytechnique F\u00e9d\u00e9rale de Lausanne\u2014EPFL. Technical Report."},{"key":"ref_34","unstructured":"Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., and Courville, A.C. (2017, January 4\u20139). Improved training of wasserstein gans. Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"2278","DOI":"10.1109\/5.726791","article-title":"Gradient-based learning applied to document recognition","volume":"86","author":"LeCun","year":"1998","journal-title":"Proc. IEEE"},{"key":"ref_36","unstructured":"Xiao, H., Rasul, K., and Vollgraf, R. (2017). Fashion-MNIST: A Novel Image Dataset for Benchmarking Machine Learning Algorithms. arXiv."},{"key":"ref_37","unstructured":"Arjovsky, M., Chintala, S., and Bottou, L. (2017). Wasserstein GAN. arXiv."},{"key":"ref_38","unstructured":"Klambauer, G., Unterthiner, T., Mayr, A., and Hochreiter, S. (2017, January 4\u20139). Self-normalizing neural networks. Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA."}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/15\/7\/232\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T23:41:40Z","timestamp":1760139700000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/15\/7\/232"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,7,1]]},"references-count":38,"journal-issue":{"issue":"7","published-online":{"date-parts":[[2022,7]]}},"alternative-id":["a15070232"],"URL":"https:\/\/doi.org\/10.3390\/a15070232","relation":{},"ISSN":["1999-4893"],"issn-type":[{"type":"electronic","value":"1999-4893"}],"subject":[],"published":{"date-parts":[[2022,7,1]]}}}