{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,31]],"date-time":"2026-03-31T11:45:04Z","timestamp":1774957504624,"version":"3.50.1"},"reference-count":50,"publisher":"Association for Computing Machinery (ACM)","issue":"2","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2024,10]]},"abstract":"<jats:p>\n            Benchmarking is crucial for evaluating a DBMS, yet existing benchmarks often fail to reflect the varied nature of user workloads. As a result, there is increasing momentum toward creating databases that incorporate real-world user data to more accurately mirror business environments. However, privacy concerns deter users from directly sharing their data, underscoring the importance of creating synthesized databases for benchmarking that also prioritize privacy protection. Differential privacy (DP)-based data synthesis has become a key method for safeguarding privacy when sharing data, but the focus has largely been on minimizing errors in aggregate queries or downstream ML tasks, with less attention given to benchmarking factors like query runtime performance. This paper delves into differentially private database synthesis specifically for benchmark publishing scenarios, aiming to produce a synthetic database whose benchmarking factors closely resemble those of the original data. Introducing\n            <jats:italic>PrivBench<\/jats:italic>\n            , an innovative synthesis framework based on sum-product networks (SPNs), we support the synthesis of high-quality benchmark databases that maintain fidelity in both data distribution and query runtime performance while preserving privacy. We validate that PrivBench can ensure database-level DP even when generating multi-relation databases with complex reference relationships. 
Our extensive experiments show that PrivBench efficiently synthesizes data that maintains privacy and excels in both data distribution similarity and query runtime similarity.\n          <\/jats:p>","DOI":"10.14778\/3705829.3705855","type":"journal-article","created":{"date-parts":[[2025,2,28]],"date-time":"2025-02-28T23:21:06Z","timestamp":1740784866000},"page":"413-425","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Privacy-Enhanced Database Synthesis for Benchmark Publishing"],"prefix":"10.14778","volume":"18","author":[{"given":"Yunqing","family":"Ge","sequence":"first","affiliation":[{"name":"Shenzhen University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jianbin","family":"Qin","sequence":"additional","affiliation":[{"name":"SICS, Shenzhen University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shuyuan","family":"Zheng","sequence":"additional","affiliation":[{"name":"Osaka University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yongrui","family":"Zhong","sequence":"additional","affiliation":[{"name":"Shenzhen University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Bo","family":"Tang","sequence":"additional","affiliation":[{"name":"Southern University of Science and Technology"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yu-Xuan","family":"Qiu","sequence":"additional","affiliation":[{"name":"Beijing Institute of Technology"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Rui","family":"Mao","sequence":"additional","affiliation":[{"name":"SICS, Shenzhen University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ye","family":"Yuan","sequence":"additional","affiliation":[{"name":"Beijing Institute of Technology"}],"role":[{"role":"author","vocabulary":"crossref"}]},
{"given":"Makoto","family":"Onizuka","sequence":"additional","affiliation":[{"name":"Osaka University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chuan","family":"Xiao","sequence":"additional","affiliation":[{"name":"Osaka University, Nagoya University"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,2,28]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"Internet Movie Database. https:\/\/www.imdb.com\/."},{"key":"e_1_2_1_2_1","unstructured":"TPC benchmarks. https:\/\/www.tpc.org\/."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.14778\/3476249.3476272"},{"issue":"2","key":"e_1_2_1_4_1","first-page":"1","article-title":"Privlava: Synthesizing relational data with foreign keys under differential privacy","volume":"1","author":"Cai K.","year":"2023","unstructured":"K. Cai, X. Xiao, and G. Cormode. Privlava: Synthesizing relational data with foreign keys under differential privacy. SIGMOD, 1(2):142:1--142:25, 2023.","journal-title":"SIGMOD"},{"key":"e_1_2_1_5_1","first-page":"12673","article-title":"GS-WGAN: A gradient-sanitized approach for learning differentially private generators","volume":"33","author":"Chen D.","year":"2020","unstructured":"D. Chen, T. Orekondy, and M. Fritz. GS-WGAN: A gradient-sanitized approach for learning differentially private generators. NeurIPS, 33:12673--12684, 2020.","journal-title":"NeurIPS"},{"key":"e_1_2_1_6_1","first-page":"1535","volume-title":"SIGMOD","author":"Chen L.","year":"2019","unstructured":"L. Chen, P. Koutris, and A. Kumar. Towards model-based pricing for machine learning in a data marketplace. In SIGMOD, pages 1535--1552, 2019."},{"key":"e_1_2_1_7_1","first-page":"759","volume-title":"SIGMOD","author":"Dong W.","year":"2022","unstructured":"W. Dong, J. Fang, K. Yi, Y. Tao, and A. Machanavajjhala. R2T: Instance-optimal truncation for differentially private query evaluation with foreign keys. In SIGMOD, pages 759--772, 2022."},
{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1007\/11787006_1"},{"key":"e_1_2_1_9_1","volume-title":"The algorithmic foundations of differential privacy. FnT TCS, 9(3--4):211--407","author":"Dwork C.","year":"2014","unstructured":"C. Dwork and A. Roth. The algorithmic foundations of differential privacy. FnT TCS, 9(3--4):211--407, 2014."},{"issue":"10","key":"e_1_2_1_10_1","first-page":"1886","article-title":"Kamino: Constraint-aware differentially private data synthesis","volume":"14","author":"Ge C.","year":"2021","unstructured":"C. Ge, S. Mohapatra, X. He, and I. F. Ilyas. Kamino: Constraint-aware differentially private data synthesis. PVLDB, 14(10):1886--1899, 2021.","journal-title":"PVLDB"},{"key":"e_1_2_1_11_1","volume-title":"Privacy-enhanced database synthesis for benchmark publishing. arXiv preprint arXiv:2405.01312","author":"Ge Y.","year":"2024","unstructured":"Y. Ge, J. Qin, S. Zheng, Y. Zhong, B. Tang, Y.-X. Qiu, R. Mao, Y. Yuan, M. Onizuka, and C. Xiao. Privacy-enhanced database synthesis for benchmark publishing. arXiv preprint arXiv:2405.01312, 2024."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.14778\/3503585.3503586"},{"key":"e_1_2_1_13_1","volume-title":"DeepDB: Learn from data, not from queries! PVLDB, 13(7):992--1005","author":"Hilprecht B.","year":"2020","unstructured":"B. Hilprecht, A. Schmidt, M. Kulessa, A. Molina, K. Kersting, and C. Binnig. DeepDB: Learn from data, not from queries! PVLDB, 13(7):992--1005, 2020."},{"key":"e_1_2_1_14_1","first-page":"398","volume-title":"CSF","author":"Hsu J.","year":"2014","unstructured":"J. Hsu, M. Gaboardi, A. Haeberlen, S. Khanna, A. Narayan, B. C. Pierce, and A. Roth. Differential privacy: An economic method for choosing epsilon. In CSF, pages 398--410, 2014."},{"key":"e_1_2_1_15_1","volume-title":"ICLR","author":"Jordon J.","year":"2019","unstructured":"J. Jordon, J. Yoon, and M. van der Schaar. 
PATE-GAN: Generating synthetic data with differential privacy guarantees. In ICLR, 2019."},{"key":"e_1_2_1_16_1","volume-title":"CIDR","author":"Kipf A.","year":"2019","unstructured":"A. Kipf, T. Kipf, B. Radke, V. Leis, P. A. Boncz, and A. Kemper. Learned cardinalities: Estimating correlated joins with deep learning. In CIDR, 2019."},{"key":"e_1_2_1_17_1","first-page":"202","volume-title":"KDD","volume":"96","author":"Kohavi R.","year":"1996","unstructured":"R. Kohavi et al. Scaling up the accuracy of Naive-Bayes classifiers: A decision-tree hybrid. In KDD, volume 96, pages 202--207, 1996."},{"key":"e_1_2_1_18_1","first-page":"19","volume-title":"PAC","author":"Kohli N.","year":"2018","unstructured":"N. Kohli and P. Laskowski. Epsilon voting: Mechanism design for parameter selection in differential privacy. In PAC, pages 19--30, 2018."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.14778\/3342263.3342274"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/2770870"},{"issue":"1","key":"e_1_2_1_21_1","first-page":"16","article-title":"Generating synthetic mixed discrete-continuous health records with mixed sum-product networks","volume":"30","author":"Kroes S. K. S.","year":"2022","unstructured":"S. K. S. Kroes, M. van Leeuwen, R. H. H. Groenwold, and M. P. Janssen. Generating synthetic mixed discrete-continuous health records with mixed sum-product networks. JAMIA, 30(1):16--25, 2022.","journal-title":"JAMIA"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.14778\/2850583.2850594"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/2691190.2691191"},{"issue":"13","key":"e_1_2_1_24_1","first-page":"1677","article-title":"DPSynthesizer: Differentially private data synthesizer for privacy preserving data sharing","volume":"7","author":"Li H.","year":"2014","unstructured":"H. Li, L. Xiong, L. Zhang, and X. Jiang. DPSynthesizer: Differentially private data synthesizer for privacy preserving data sharing. 
PVLDB, 7(13):1677--1680, 2014.","journal-title":"PVLDB"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.14778\/3447689.3447700"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-023-00807-y"},{"key":"e_1_2_1_27_1","first-page":"2965","volume-title":"NeurIPS","author":"Long Y.","year":"2021","unstructured":"Y. Long, B. Wang, Z. Yang, B. Kailkhura, A. Zhang, C. A. Gunter, and B. Li. G-PATE: Scalable differentially private data generator via private aggregation of teacher discriminators. In NeurIPS, pages 2965--2977, 2021."},{"key":"e_1_2_1_28_1","volume-title":"Winning the NIST contest: A scalable and general approach to differentially private synthetic data. JPC, 11(3)","author":"McKenna R.","year":"2021","unstructured":"R. McKenna, G. Miklau, and D. Sheldon. Winning the NIST contest: A scalable and general approach to differentially private synthetic data. JPC, 11(3), 2021."},{"issue":"11","key":"e_1_2_1_29_1","first-page":"2599","article-title":"AIM: An adaptive and iterative mechanism for differentially private synthetic data","volume":"15","author":"McKenna R.","year":"2022","unstructured":"R. McKenna, B. Mullins, D. Sheldon, and G. Miklau. AIM: An adaptive and iterative mechanism for differentially private synthetic data. PVLDB, 15(11):2599--2612, 2022.","journal-title":"PVLDB"},{"key":"e_1_2_1_30_1","first-page":"4435","volume-title":"ICML","volume":"97","author":"McKenna R.","year":"2019","unstructured":"R. McKenna, D. Sheldon, and G. Miklau. Graphical-model based estimation and inference for differential privacy. In ICML, volume 97, pages 4435--4444, 2019."},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.14778\/1687627.1687738"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/3085504.3091117"},{"key":"e_1_2_1_33_1","first-page":"689","volume-title":"ICCVW","author":"Poon H.","year":"2011","unstructured":"H. Poon and P. M. Domingos. Sum-product networks: A new deep architecture. 
In ICCVW, pages 689--690, 2011."},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.14778\/3583140.3583168"},{"issue":"7","key":"e_1_2_1_35_1","first-page":"3821","article-title":"Sum-product networks: A survey","volume":"44","author":"S\u00e1nchez-Cauce R.","year":"2022","unstructured":"R. S\u00e1nchez-Cauce, I. Par\u00eds, and F. J. D\u00edez. Sum-product networks: A survey. TPAMI, 44(7):3821--3839, 2022.","journal-title":"TPAMI"},{"key":"e_1_2_1_36_1","first-page":"138","volume-title":"PSD","author":"Snoke J.","year":"2018","unstructured":"J. Snoke and A. B. Slavkovic. pmse mechanism: Differentially private synthetic data with maximal distributional similarity. In PSD, pages 138--159, 2018."},{"key":"e_1_2_1_37_1","doi-asserted-by":"crossref","first-page":"26","DOI":"10.1145\/2857705.2857708","volume-title":"CODASPY","author":"Su D.","year":"2016","unstructured":"D. Su, J. Cao, N. Li, E. Bertino, and H. Jin. Differentially private K-Means clustering. In CODASPY, pages 26--37, 2016."},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3389762"},{"key":"e_1_2_1_39_1","first-page":"98","volume-title":"CVPRW","author":"Torkzadehmahani R.","year":"2019","unstructured":"R. Torkzadehmahani, P. Kairouz, and B. Paten. DP-CGAN: Differentially private synthetic data and label generation. In CVPRW, pages 98--104, 2019."},{"key":"e_1_2_1_40_1","first-page":"1946","volume-title":"ECAI","author":"Treiber A.","year":"2020","unstructured":"A. Treiber, A. Molina, C. Weinert, T. Schneider, and K. Kersting. CryptoSPN: Privacy-preserving sum-product network inference. In ECAI, pages 1946--1953, 2020."},{"key":"e_1_2_1_41_1","first-page":"9765","volume-title":"ICML","author":"Vietri G.","year":"2020","unstructured":"G. Vietri, G. Tian, M. Bun, T. Steinke, and S. Wu. New oracle-efficient algorithms for private synthetic data release. 
In ICML, pages 9765--9774, 2020."},{"key":"e_1_2_1_42_1","volume-title":"DPSyn: Experiences in the NIST differential privacy data synthesis challenges. JPC, 11(2)","author":"Wang T.","year":"2021","unstructured":"T. Wang, N. Li, and Z. Zhang. DPSyn: Experiences in the NIST differential privacy data synthesis challenges. JPC, 11(2), 2021."},{"key":"e_1_2_1_43_1","volume-title":"Differentially private generative adversarial network. CoRR, abs\/1802.06739","author":"Xie L.","year":"2018","unstructured":"L. Xie, K. Lin, S. Wang, F. Wang, and J. Zhou. Differentially private generative adversarial network. CoRR, abs\/1802.06739, 2018."},{"key":"e_1_2_1_44_1","first-page":"1542","volume-title":"SIGMOD","author":"Yang J.","year":"2022","unstructured":"J. Yang, P. Wu, G. Cong, T. Zhang, and X. He. SAM: Database generation from query workloads with supervised autoregressive models. In SIGMOD, pages 1542--1555, 2022."},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.14778\/3421424.3421432"},{"issue":"4","key":"e_1_2_1_46_1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3134428","article-title":"PrivBayes: Private data release via bayesian networks","volume":"42","author":"Zhang J.","year":"2017","unstructured":"J. Zhang, G. Cormode, C. M. Procopiuc, D. Srivastava, and X. Xiao. PrivBayes: Private data release via bayesian networks. TODS, 42(4):25:1--25:41, 2017.","journal-title":"TODS"},{"key":"e_1_2_1_47_1","first-page":"929","volume-title":"USENIX Security","author":"Zhang Z.","year":"2021","unstructured":"Z. Zhang, T. Wang, N. Li, J. Honorio, M. Backes, S. He, J. Chen, and Y. Zhang. PrivSyn: Differentially private data synthesis. In USENIX Security, pages 929--946, 2021."},{"key":"e_1_2_1_48_1","first-page":"29","volume-title":"MDM","author":"Zheng S.","year":"2020","unstructured":"S. Zheng, Y. Cao, and M. Yoshikawa. Money cannot buy everything: Trading mobile data with controllable privacy loss. 
In MDM, pages 29--38, 2020."},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.14778\/3587136.3587141"},{"key":"e_1_2_1_50_1","first-page":"1525","volume-title":"IEEE BigData","author":"Zheng S.","year":"2022","unstructured":"S. Zheng, Y. Cao, M. Yoshikawa, H. Li, and Q. Yan. FL-Market: Trading private models in federated learning. In IEEE BigData, pages 1525--1534, 2022."}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3705829.3705855","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,2,28]],"date-time":"2025-02-28T23:24:39Z","timestamp":1740785079000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3705829.3705855"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,10]]},"references-count":50,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2024,10]]}},"alternative-id":["10.14778\/3705829.3705855"],"URL":"https:\/\/doi.org\/10.14778\/3705829.3705855","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2024,10]]},"assertion":[{"value":"2025-02-28","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}