{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,1]],"date-time":"2026-05-01T10:17:59Z","timestamp":1777630679309,"version":"3.51.4"},"reference-count":26,"publisher":"Association for Computing Machinery (ACM)","issue":"5","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2022,1]]},"abstract":"<jats:p>\n            Synthesizing data using declarative formalisms has been persuasively advocated in contemporary data generation frameworks. In particular, they specify operator output volumes through row-cardinality constraints. However, thus far, adherence to these volumetric constraints has been limited to the Filter and Join operators. A critical deficiency is the lack of support for the Projection operator, which is at the core of basic SQL constructs such as Distinct, Union and Group By. The technical challenge here is that cardinality\n            <jats:italic>unions<\/jats:italic>\n            in multi-dimensional space, and not mere summations, need to be captured in the generation process. Further, dependencies\n            <jats:italic>across<\/jats:italic>\n            different data subspaces need to be taken into account.\n          <\/jats:p>\n          <jats:p>\n            We address the above lacuna by presenting\n            <jats:bold>PiGen<\/jats:bold>\n            , a dynamic data generator that incorporates Projection cardinality constraints in its ambit. The design is based on a projection subspace division strategy that supports the expression of constraints using optimized linear programming formulations. Further, techniques of symmetric refinement and workload decomposition are introduced to handle constraints across different projection subspaces. Finally, PiGen supports dynamic generation, where data is generated on-demand during query processing, making it amenable to Big Data environments. 
A detailed evaluation on workloads derived from real-world and synthetic benchmarks demonstrates that PiGen can accurately and efficiently model Projection outcomes, representing an essential step forward in customized database generation.\n          <\/jats:p>","DOI":"10.14778\/3510397.3510398","type":"journal-article","created":{"date-parts":[[2022,5,18]],"date-time":"2022-05-18T22:23:10Z","timestamp":1652912590000},"page":"998-1010","source":"Crossref","is-referenced-by-count":6,"title":["Projection-compliant database generation"],"prefix":"10.14778","volume":"15","author":[{"given":"Anupam","family":"Sanghi","sequence":"first","affiliation":[{"name":"Indian Institute of Science"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shadab","family":"Ahmed","sequence":"additional","affiliation":[{"name":"Indian Institute of Science"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jayant R.","family":"Haritsa","sequence":"additional","affiliation":[{"name":"Indian Institute of Science"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2022,5,18]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"[n.d.]. Dagstuhl Seminar 21442. Ensuring the Reliability and Robustness of Database Management Systems. https:\/\/www.dagstuhl.de\/en\/program\/calendar\/semhp\/?semnr=21442"},{"key":"e_1_2_1_2_1","unstructured":"[n.d.]. JOB Benchmark. https:\/\/github.com\/gregrahn\/join-order-benchmark"},{"key":"e_1_2_1_3_1","unstructured":"[n.d.]. PostgreSQL. https:\/\/www.postgresql.org\/docs\/9.6\/"},{"key":"e_1_2_1_4_1","unstructured":"[n.d.]. TPC-DS. 
http:\/\/tpc.org\/tpcds\/"},{"key":"e_1_2_1_5_1","unstructured":"[n.d.]. TPC-H. http:\/\/tpc.org\/tpch\/"},{"key":"e_1_2_1_6_1","unstructured":"[n.d.]. Z3. https:\/\/github.com\/Z3Prover\/z3"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.14778\/2367502.2367530"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/1989323.1989395"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.14778\/3402755.3402785"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2007.367896"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/1247480.1247520"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1112\/blms\/27.5.417"},{"key":"e_1_2_1_13_1","volume-title":"Proc. of 31st VLDB Conf. 1097--1107","author":"Bruno Nicolas","year":"2005","unstructured":"Nicolas Bruno and Surajit Chaudhuri. 2005. Flexible Database Generators. In Proc. of 31st VLDB Conf. 1097--1107."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/3448016.3457242"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/191839.191886"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1137\/19M1296306"},{"key":"e_1_2_1_17_1","volume-title":"Proc. of 13th VLDB Conf. 147--154","author":"Lenzerini Maurizio","year":"1987","unstructured":"Maurizio Lenzerini and Paolo Nobili. 1987. On The Satisfiability of Dependency Constraints in Entity-Relationship Schemata. In Proc. of 13th VLDB Conf. 147--154."},{"key":"e_1_2_1_18_1","volume-title":"Proc. of USENIX ATC. 
575--586","author":"Li Yuming","year":"2018","unstructured":"Yuming Li, Rong Zhang, Xiaoyan Yang, Zhenjie Zhang, and Aoying Zhou. 2018. Touchstone: Generating Enormous Query-Aware Test Databases. In Proc. of USENIX ATC. 575--586."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-014-0354-1"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2735378"},{"key":"e_1_2_1_22_1","volume-title":"Proc. of 26th DASFAA Conf. 105--112","author":"Sanghi Anupam","unstructured":"Anupam Sanghi, Rajkumar Santhanam, and Jayant R. Haritsa. 2021. Towards Generating HiFi Databases. In Proc. of 26th DASFAA Conf. 105--112."},{"key":"e_1_2_1_23_1","volume-title":"Proc. of 21st EDBT Conf. 301--312","author":"Sanghi Anupam","year":"2018","unstructured":"Anupam Sanghi, Raghav Sood, Jayant R. Haritsa, and Srikanta Tirthapura. 2018. Scalable and Dynamic Regeneration of Big Data Volumes. In Proc. of 21st EDBT Conf. 301--312."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.14778\/3229863.3236238"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/2479440.2479445"},{"key":"e_1_2_1_26_1","unstructured":"Abraham Silberschatz Henry F Korth and S. Sudarshan. 2020. Database System Concepts Seventh Edition. 
McGraw-Hill New York."},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1137\/18M1184679"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3510397.3510398","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T09:20:40Z","timestamp":1672219240000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3510397.3510398"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,1]]},"references-count":26,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2022,1]]}},"alternative-id":["10.14778\/3510397.3510398"],"URL":"https:\/\/doi.org\/10.14778\/3510397.3510398","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2022,1]]}}}