{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,19]],"date-time":"2026-05-19T07:14:23Z","timestamp":1779174863300,"version":"3.51.4"},"reference-count":110,"publisher":"Association for Computing Machinery (ACM)","issue":"3","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. ACM Manag. Data"],"published-print":{"date-parts":[[2025,6,17]]},"abstract":"<jats:p>After decades of research in approximate query processing (AQP), its adoption in the industry remains limited. Existing methods struggle to simultaneously provide user-specified error guarantees, eliminate maintenance overheads, and avoid modifications to database management systems. To address these challenges, we introduce two novel techniques, TAQA and BSAP. TAQA is a two-stage online AQP algorithm that achieves all three properties for arbitrary queries. However, it can be slower than exact queries if we use standard row-level sampling. BSAP resolves this by enabling block-level sampling with statistical guarantees in TAQA. We implement TAQA and BSAP in a prototype middleware system, PilotDB, that is compatible with all DBMSs supporting efficient block-level sampling. We evaluate PilotDB on PostgreSQL, SQL Server, and DuckDB over real-world benchmarks, demonstrating up to 126X speedups when running with a 5% guaranteed error.<\/jats:p>","DOI":"10.1145\/3725335","type":"journal-article","created":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T21:23:29Z","timestamp":1750281809000},"page":"1-28","source":"Crossref","is-referenced-by-count":2,"title":["PilotDB: Database-Agnostic Online Approximate Query Processing with A Priori Error Guarantees"],"prefix":"10.1145","volume":"3","author":[{"ORCID":"https:\/\/orcid.org\/0009-0001-6738-8930","authenticated-orcid":false,"given":"Yuxuan","family":"Zhu","sequence":"first","affiliation":[{"name":"University of Illinois Urbana Champaign, Champaign, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-0353-4184","authenticated-orcid":false,"given":"Tengjun","family":"Jin","sequence":"additional","affiliation":[{"name":"University of Illinois Urbana Champaign, Champaign, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-4061-7094","authenticated-orcid":false,"given":"Stefanos","family":"Baziotis","sequence":"additional","affiliation":[{"name":"University of Illinois Urbana Champaign, Champaign, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-2120-1879","authenticated-orcid":false,"given":"Chengsong","family":"Zhang","sequence":"additional","affiliation":[{"name":"University of Illinois Urbana Champaign, Champaign, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8140-2321","authenticated-orcid":false,"given":"Charith","family":"Mendis","sequence":"additional","affiliation":[{"name":"University of Illinois Urbana Champaign, Champaign, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9860-9938","authenticated-orcid":false,"given":"Daniel","family":"Kang","sequence":"additional","affiliation":[{"name":"University of Illinois Urbana Champaign, Champaign, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,6,18]]},"reference":[{"key":"e_1_2_2_1_1","volume-title":"Aqua: A fast decision support systems using approximate query answers. In PVLDB.","author":"Acharya Swarup","year":"1999","unstructured":"Swarup Acharya, Phillip B Gibbons, and Viswanath Poosala. 1999. Aqua: A fast decision support systems using approximate query answers. In PVLDB."},{"key":"e_1_2_2_2_1","volume-title":"Congressional samples for approximate answering of group-by queries. In SIGMOD.","author":"Acharya Swarup","year":"2000","unstructured":"Swarup Acharya, Phillip B Gibbons, and Viswanath Poosala. 2000. Congressional samples for approximate answering of group-by queries. In SIGMOD."},{"key":"e_1_2_2_3_1","doi-asserted-by":"crossref","unstructured":"Swarup Acharya Phillip B Gibbons Viswanath Poosala and Sridhar Ramaswamy. 1999. The aqua approximate query answering system. In SIGMOD.","DOI":"10.1145\/304182.304581"},{"key":"e_1_2_2_4_1","doi-asserted-by":"crossref","unstructured":"Swarup Acharya Phillip B Gibbons Viswanath Poosala and Sridhar Ramaswamy. 1999. Join synopses for approximate query answering. In SIGMOD.","DOI":"10.1145\/304182.304207"},{"key":"e_1_2_2_5_1","doi-asserted-by":"crossref","unstructured":"Sameer Agarwal Barzan Mozafari Aurojit Panda Henry Milner Samuel Madden and Ion Stoica. 2013. BlinkDB: queries with bounded errors and bounded response times on very large data. In EuroSys.","DOI":"10.1145\/2465351.2465355"},{"key":"e_1_2_2_6_1","unstructured":"Azure Synapse Analytics. [n. d.]. Designing tables. https:\/\/learn.microsoft.com\/en-us\/azure\/synapse-analytics\/sqldata-warehouse\/sql-data-warehouse-tables-overview Accessed: 2024-05--12."},{"key":"e_1_2_2_7_1","doi-asserted-by":"crossref","unstructured":"Brian Babcock Surajit Chaudhuri and Gautam Das. 2003. Dynamic sample selection for approximate query processing. In SIGMOD.","DOI":"10.1145\/872757.872822"},{"key":"e_1_2_2_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3389732"},{"key":"e_1_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.14778\/3352063.3352073"},{"key":"e_1_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.1137\/S1052623497325107"},{"key":"e_1_2_2_11_1","unstructured":"Surajit Chaudhuri Gautam Das Mayur Datar Rajeev Motwani and Vivek Narasayya. 2001. Overcoming limitations of sampling for aggregation queries. In ICDE."},{"key":"e_1_2_2_12_1","volume-title":"Optimized stratified sampling for approximate query processing. ACM Transactions on Database Systems","author":"Chaudhuri Surajit","year":"2007","unstructured":"Surajit Chaudhuri, Gautam Das, and Vivek Narasayya. 2007. Optimized stratified sampling for approximate query processing. ACM Transactions on Database Systems (2007)."},{"key":"e_1_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/3035918.3056097"},{"key":"e_1_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/304181.304206"},{"key":"e_1_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.14778\/3007263.3007277"},{"key":"e_1_2_2_16_1","doi-asserted-by":"crossref","unstructured":"Jiecao Chen and Qin Zhang. 2017. Bias-Aware Sketches. PVLDB (2017).","DOI":"10.14778\/3099622.3099627"},{"key":"e_1_2_2_17_1","unstructured":"Xingguang Chen Fangyuan Zhang and Sibo Wang. 2022. Efficient Approximate Algorithms for Empirical Variance with Hashed Block Sampling. In KDD."},{"key":"e_1_2_2_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/3085504.3085514"},{"key":"e_1_2_2_19_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-21042-1_29"},{"key":"e_1_2_2_20_1","unstructured":"ClickHouse. [n. d.]. ClickBench: a Benchmark For Analytical Databases. https:\/\/github.com\/ClickHouse\/ClickBench Accessed: 2024-06-04."},{"key":"e_1_2_2_21_1","unstructured":"Google Cloud. [n. d.]. Approximate aggregate functions | BigQuery. https:\/\/cloud.google.com\/bigquery\/docs\/reference\/standard-sql\/approximate_aggregate_functions Accessed: 2024-06-05."},{"key":"e_1_2_2_22_1","unstructured":"Google Cloud. [n. d.]. Table sampling | BigQuery | Google Cloud. https:\/\/cloud.google.com\/bigquery\/docs\/tablesampling Accessed: 2024-05--12."},{"key":"e_1_2_2_23_1","unstructured":"Transaction Processing Performance Council. [n. d.]. TPC-H Homepage. https:\/\/www.tpc.org\/tpch\/ Accessed: 2024-06-04."},{"key":"e_1_2_2_24_1","unstructured":"Azure Databricks. 2024. TABLESAMPLE clause. https:\/\/learn.microsoft.com\/en-us\/azure\/databricks\/sql\/languagemanual\/ sql-ref-syntax-qry-select-sampling Accessed: 2024-05--12."},{"key":"e_1_2_2_25_1","unstructured":"DuckDB Developers. 2025. DuckDB 1.2.0 ''Histrionicus''. https:\/\/github.com\/duckdb\/duckdb\/releases\/tag\/v1.2.0"},{"key":"e_1_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.14778\/3484224.3484234"},{"key":"e_1_2_2_27_1","doi-asserted-by":"crossref","unstructured":"Bolin Ding Silu Huang Surajit Chaudhuri Kaushik Chakrabarti and Chi Wang. 2016. Sample+ seek: Approximating aggregates with distribution precision guarantee. In SIGMOD.","DOI":"10.1145\/2882903.2915249"},{"key":"e_1_2_2_28_1","volume-title":"Turbo-charging estimate convergence in DBO. PVLDB","author":"Dobra Alin","year":"2009","unstructured":"Alin Dobra, Chris Jermaine, Florin Rusu, and Fei Xu. 2009. Turbo-charging estimate convergence in DBO. PVLDB (2009)."},{"key":"e_1_2_2_29_1","unstructured":"DuckDB. [n. d.]. Samples. https:\/\/duckdb.org\/docs\/sql\/samples.html Accessed: 2024-05--12."},{"key":"e_1_2_2_30_1","unstructured":"DuckDB. [n. d.]. SELECT Statement. https:\/\/duckdb.org\/docs\/sql\/statements\/select.html#row-ids Accessed: 2024-06- 21."},{"key":"e_1_2_2_31_1","volume-title":"Proceedings of the USENIX Annual Technical Conference (ATC). 1--14","author":"Duplyakin Dmitry","year":"2019","unstructured":"Dmitry Duplyakin, Robert Ricci, Aleksander Maricq, GaryWong, Jonathon Duerig, Eric Eide, Leigh Stoller, Mike Hibler, David Johnson, Kirk Webb, Aditya Akella, Kuangching Wang, Glenn Ricart, Larry Landweber, Chip Elliott, Michael Zink, Emmanuel Cecchet, Snigdhaswin Kar, and Prabodh Mishra. 2019. The Design and Operation of CloudLab. In Proceedings of the USENIX Annual Technical Conference (ATC). 1--14. https:\/\/www.flux.utah.edu\/paper\/duplyakin-atc19"},{"key":"e_1_2_2_32_1","volume-title":"Revisiting reuse for approximate query processing. PVLDB","author":"Galakatos Alex","year":"2017","unstructured":"Alex Galakatos, Andrew Crotty, Emanuel Zgraggen, Carsten Binnig, and Tim Kraska. 2017. Revisiting reuse for approximate query processing. PVLDB (2017)."},{"key":"e_1_2_2_33_1","volume-title":"Icicles: Self-tuning samples for approximate query answering. In PVLDB.","author":"Ganti Venkatesh","year":"2000","unstructured":"Venkatesh Ganti, Mong-Li Lee, and Raghu Ramakrishnan. 2000. Icicles: Self-tuning samples for approximate query answering. In PVLDB."},{"key":"e_1_2_2_34_1","doi-asserted-by":"crossref","unstructured":"Minos Garofalakis Johannes Gehrke and Rajeev Rastogi. 2002. Querying and Mining Data Streams: You Only Get One Look. (2002).","DOI":"10.1145\/564691.564794"},{"key":"e_1_2_2_35_1","doi-asserted-by":"crossref","unstructured":"Sudipto Guha and Boulos Harb. 2005. Wavelet synopsis for data streams: minimizing non-euclidean error. In KDD.","DOI":"10.1145\/1081870.1081884"},{"key":"e_1_2_2_36_1","volume-title":"Ripple joins for online aggregation. ACM SIGMOD Record","author":"Haas Peter J","year":"1999","unstructured":"Peter J Haas and Joseph M Hellerstein. 1999. Ripple joins for online aggregation. ACM SIGMOD Record (1999)."},{"key":"e_1_2_2_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/1007568.1007601"},{"key":"e_1_2_2_38_1","doi-asserted-by":"publisher","DOI":"10.1006\/jcss.1996.0041"},{"key":"e_1_2_2_39_1","doi-asserted-by":"crossref","unstructured":"Joseph M Hellerstein Peter J Haas and Helen J Wang. 1997. Online aggregation. In SIGMOD.","DOI":"10.1145\/253262.253291"},{"key":"e_1_2_2_40_1","volume-title":"Considerations in determining sample size for pilot studies. Research in nursing & health 31, 2","author":"Hertzog Melody A","year":"2008","unstructured":"Melody A Hertzog. 2008. Considerations in determining sample size for pilot studies. Research in nursing & health 31, 2 (2008), 180--191."},{"key":"e_1_2_2_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3389704"},{"key":"e_1_2_2_42_1","unstructured":"Apache Hive. [n. d.]. LanguageManual Sampling. https:\/\/cwiki.apache.org\/confluence\/display\/hive\/languagemanual+ sampling Accessed: 2024-05--12."},{"key":"e_1_2_2_43_1","unstructured":"Robert V Hogg Joseph W McKean and Allen T Craig. 2019. Introduction to mathematical statistics. Pearson."},{"key":"e_1_2_2_44_1","volume-title":"Statistical estimators for aggregate relational algebra queries. ACM Transactions on Database Systems (TODS)","author":"Hou Wen-Chi","year":"1991","unstructured":"Wen-Chi Hou and Gultekin Ozsoyoglu. 1991. Statistical estimators for aggregate relational algebra queries. ACM Transactions on Database Systems (TODS) (1991)."},{"key":"e_1_2_2_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/119995.115837"},{"key":"e_1_2_2_46_1","unstructured":"Wen-Chi Hou Gultekin Ozsoyoglu and Baldeo K Taneja. 1988. Statistical estimators for relational algebra expressions. In SIGMOD."},{"key":"e_1_2_2_47_1","volume-title":"Seth Pettie, and Barzan Mozafari.","author":"Huang Dawei","year":"2019","unstructured":"Dawei Huang, Dong Young Yoon, Seth Pettie, and Barzan Mozafari. 2019. Joins on samples: A theoretical guide for practitioners. arXiv preprint arXiv:1912.03443 (2019)."},{"key":"e_1_2_2_48_1","unstructured":"Microsoft Community Hub. [n. d.]. Where is a record really located? https:\/\/techcommunity.microsoft.com\/t5\/coreinfrastructure- and-security\/where-is-a-record-really-located\/ba-p\/370972 Accessed: 2024-06--21."},{"key":"e_1_2_2_49_1","unstructured":"Apache Impala. [n. d.]. Impala 4.0.0 Documentation: Table and Column Statistics. https:\/\/docs.cloudera.com\/cdwruntime\/cloud\/impala-sql-reference\/topics\/impala-virtual-columns.html#pnavId1 Accessed: 2024-06--21."},{"key":"e_1_2_2_50_1","unstructured":"Apache Impala. [n. d.]. TABLESAMPLE Clause. https:\/\/impala.apache.org\/docs\/build\/html\/topics\/impala_tablesample.html Accessed: 2024-05--12."},{"key":"e_1_2_2_51_1","unstructured":"Apache Impala. [n. d.]. Understanding Impala Query Performance. https:\/\/impala.apache.org\/docs\/build\/html\/topics\/impala_explain_plan.html Accessed: 2024-09--21."},{"key":"e_1_2_2_52_1","unstructured":"Instacart. [n. d.]. Instacart Market Basket Analysis. https:\/\/www.kaggle.com\/c\/instacart-market-basket-analysis\/data Accessed: 2024-05--12."},{"key":"e_1_2_2_53_1","volume-title":"Scalable approximate query processing with the DBO engine. ACM Transactions on Database Systems","author":"Jermaine Chris","year":"2008","unstructured":"Chris Jermaine, Subramanian Arumugam, Abhijit Pol, and Alin Dobra. 2008. Scalable approximate query processing with the DBO engine. ACM Transactions on Database Systems (2008)."},{"key":"e_1_2_2_54_1","doi-asserted-by":"publisher","DOI":"10.1021\/ac00224a023"},{"key":"e_1_2_2_55_1","doi-asserted-by":"publisher","DOI":"10.1145\/3097983.3097992"},{"key":"e_1_2_2_56_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF02869535"},{"key":"e_1_2_2_57_1","doi-asserted-by":"publisher","DOI":"10.1145\/2882903.2882940"},{"key":"e_1_2_2_58_1","volume-title":"Accelerating approximate aggregation queries with expensive predicates. arXiv preprint arXiv:2108.06313","author":"Kang Daniel","year":"2021","unstructured":"Daniel Kang, John Guibas, Peter Bailis, Tatsunori Hashimoto, Yi Sun, and Matei Zaharia. 2021. Accelerating approximate aggregation queries with expensive predicates. arXiv preprint arXiv:2108.06313 (2021)."},{"key":"e_1_2_2_59_1","volume-title":"The data warehouse toolkit: The definitive guide to dimensional modeling","author":"Kimball Ralph","unstructured":"Ralph Kimball and Margy Ross. 2013. The data warehouse toolkit: The definitive guide to dimensional modeling. John Wiley & Sons."},{"key":"e_1_2_2_60_1","volume-title":"Modular Analytic Query Engine. In Companion of the 2024 International Conference on Management of Data. 5--17","author":"Lamb Andrew","year":"2024","unstructured":"Andrew Lamb, Yijie Shen, Dani\u00ebl Heres, Jayjeet Chakraborty, Mehmet Ozan Kabak, Liang-Chi Hsieh, and Chao Sun. 2024. Apache Arrow DataFusion: A Fast, Embeddable, Modular Analytic Query Engine. In Companion of the 2024 International Conference on Management of Data. 5--17."},{"key":"e_1_2_2_61_1","unstructured":"Database Languages. 2003. SQL ISO\/IEC 9075*:2003."},{"key":"e_1_2_2_62_1","unstructured":"Microsoft Learn. [n. d.]. Intelligent query processing features in detail. https:\/\/learn.microsoft.com\/en-us\/sql\/relationaldatabases\/performance\/intelligent-query-processing-details?view=sql-server-ver16#approximate-queryprocessing Accessed: 2024-06-05."},{"key":"e_1_2_2_63_1","unstructured":"Microsoft Learn. [n. d.]. SET SHOWPLAN_ALL (Transact-SQL). https:\/\/learn.microsoft.com\/en-us\/sql\/t-sql\/statements\/ set-showplan-all-transact-sql?view=sql-server-ver16 Accessed: 2024-09--21."},{"key":"e_1_2_2_64_1","unstructured":"Feifei Li Bin Wu Ke Yi and Zhuoyue Zhao. 2016. Wander join: Online aggregation via random walks. In SIGMOD."},{"key":"e_1_2_2_65_1","volume-title":"Bounded approximate query processing. TKDE","author":"Li Kaiyu","year":"2018","unstructured":"Kaiyu Li, Yong Zhang, Guoliang Li, Wenbo Tao, and Ying Yan. 2018. Bounded approximate query processing. TKDE (2018)."},{"key":"e_1_2_2_66_1","doi-asserted-by":"publisher","DOI":"10.1145\/3299869.3324958"},{"key":"e_1_2_2_67_1","unstructured":"Microsoft. 2024. SQL Server 2022 CU13. https:\/\/packages.microsoft.com\/ubuntu\/20.04\/mssql-server-2022\/pool\/main\/m\/mssql-server\/"},{"key":"e_1_2_2_68_1","doi-asserted-by":"crossref","unstructured":"Barzan Mozafari. 2017. Approximate query engines: Commercial challenges and research opportunities. In SIGMOD.","DOI":"10.1145\/3035918.3056098"},{"key":"e_1_2_2_69_1","doi-asserted-by":"publisher","DOI":"10.1145\/3555041.3589681"},{"key":"e_1_2_2_70_1","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2749454"},{"key":"e_1_2_2_71_1","doi-asserted-by":"publisher","DOI":"10.1145\/2882903.2915230"},{"key":"e_1_2_2_72_1","volume-title":"A Sampling Algebra for Aggregate Estimation. PVLDB","author":"Nirkhiwale Supriya","year":"2013","unstructured":"Supriya Nirkhiwale, Alin Dobra, and Christopher Jermaine. 2013. A Sampling Algebra for Aggregate Estimation. PVLDB (2013)."},{"key":"e_1_2_2_73_1","unstructured":"Frank Olken and Doron Rotem. 1986. Simple Random Sampling from Relational Databases. In PVLDB."},{"key":"e_1_2_2_74_1","volume-title":"Taster: Self-tuning, elastic and online approximate query processing","author":"Olma Matthaios","year":"2019","unstructured":"Matthaios Olma, Odysseas Papapetrou, Raja Appuswamy, and Anastasia Ailamaki. 2019. Taster: Self-tuning, elastic and online approximate query processing. In ICDE. IEEE."},{"key":"e_1_2_2_75_1","first-page":"50","article-title":"The star schema benchmark (SSB)","volume":"200","author":"O'Neil Patrick E","year":"2007","unstructured":"Patrick E O'Neil, Elizabeth J O'Neil, and Xuedong Chen. 2007. The star schema benchmark (SSB). Pat 200, 0 (2007), 50.","journal-title":"Pat"},{"key":"e_1_2_2_76_1","volume-title":"Propagation of uncertainty through mathematical operations","author":"Palmer M","year":"2003","unstructured":"M Palmer. 2003. Propagation of uncertainty through mathematical operations. Massachusetts Institute of (2003)."},{"key":"e_1_2_2_77_1","doi-asserted-by":"crossref","unstructured":"Prashant Pandey Michael A Bender Rob Johnson and Rob Patro. 2017. A general-purpose counting filter: Making every bit count. In SIGMOD.","DOI":"10.1145\/3035918.3035963"},{"key":"e_1_2_2_78_1","doi-asserted-by":"publisher","DOI":"10.14778\/3402707.3402748"},{"key":"e_1_2_2_79_1","doi-asserted-by":"publisher","DOI":"10.1145\/3183713.3196905"},{"key":"e_1_2_2_80_1","volume-title":"Accurate estimation of the number of tuples satisfying a condition. ACM Sigmod Record","author":"Piatetsky-Shapiro Gregory","year":"1984","unstructured":"Gregory Piatetsky-Shapiro and Charles Connell. 1984. Accurate estimation of the number of tuples satisfying a condition. ACM Sigmod Record (1984)."},{"key":"e_1_2_2_81_1","doi-asserted-by":"publisher","DOI":"10.1145\/1066157.1066224"},{"key":"e_1_2_2_82_1","volume-title":"Improved histograms for selectivity estimation of range predicates. ACM SIGMOD Record","author":"Poosala Viswanath","year":"1996","unstructured":"Viswanath Poosala, Peter J Haas, Yannis E Ioannidis, and Eugene J Shekita. 1996. Improved histograms for selectivity estimation of range predicates. ACM SIGMOD Record (1996)."},{"key":"e_1_2_2_83_1","unstructured":"PostgreSQL. [n. d.]. EXPLAIN. https:\/\/www.postgresql.org\/docs\/current\/sql-explain.html Accessed: 2024-09--21."},{"key":"e_1_2_2_84_1","unstructured":"PostgreSQL. [n. d.]. PostgreSQL: Documentation: 16: 5.5. System Columns. https:\/\/www.postgresql.org\/docs\/current\/ddlsystem-columns.html#DDL-SYSTEM-COLUMNS-CTID Accessed: 2024-06--21."},{"key":"e_1_2_2_85_1","unstructured":"Presto. [n. d.]. Cost in Explain. https:\/\/prestodb.io\/docs\/current\/optimizer\/cost-in-explain.html Accessed: 2024-09--21."},{"key":"e_1_2_2_86_1","unstructured":"Presto. [n. d.]. Hive Connector. https:\/\/prestodb.io\/docs\/current\/connector\/hive.html#extra-hidden-columns Accessed: 2024-06--21."},{"key":"e_1_2_2_87_1","unstructured":"Presto. [n. d.]. SELECT - Presto 0.287 Documentation. https:\/\/prestodb.io\/docs\/current\/sql\/select.html#tablesample Accessed: 2024-05--12."},{"key":"e_1_2_2_88_1","doi-asserted-by":"publisher","DOI":"10.1145\/3299869.3320212"},{"key":"e_1_2_2_89_1","volume-title":"Approximate partition selection for big-data workloads using summary statistics. PVLDB","author":"Rong Kexin","year":"2020","unstructured":"Kexin Rong, Yao Lu, Peter Bailis, Srikanth Kandula, and Philip Levis. 2020. Approximate partition selection for big-data workloads using summary statistics. PVLDB (2020)."},{"key":"e_1_2_2_90_1","volume-title":"Accelerating aggregation queries on unstructured streams of data. arXiv preprint arXiv:2308.09157","author":"Russo Matthew","year":"2023","unstructured":"Matthew Russo, Tatsunori Hashimoto, Daniel Kang, Yi Sun, and Matei Zaharia. 2023. Accelerating aggregation queries on unstructured streams of data. arXiv preprint arXiv:2308.09157 (2023)."},{"key":"e_1_2_2_91_1","doi-asserted-by":"publisher","DOI":"10.5555\/645481.653275"},{"key":"e_1_2_2_92_1","doi-asserted-by":"publisher","DOI":"10.14778\/3137628.3137658"},{"key":"e_1_2_2_93_1","volume-title":"Deepola: Online aggregation for deeply nested queries. In SIGMOD.","author":"Sheoran Nikhil","year":"2022","unstructured":"Nikhil Sheoran. 2022. Deepola: Online aggregation for deeply nested queries. In SIGMOD."},{"key":"e_1_2_2_94_1","doi-asserted-by":"publisher","DOI":"10.1145\/2983323.2983353"},{"key":"e_1_2_2_95_1","unstructured":"Talia Tamarin-Brodsky and John Marshall. [n. d.]. Error Analysis. ([n. d.])."},{"key":"e_1_2_2_96_1","unstructured":"PostgreSQL Team. 2024. PostgreSQL 16.3. The PostgreSQL Global Development Group."},{"key":"e_1_2_2_97_1","unstructured":"PostgreSQL Team. 2025. tsm_system_rows - the SYSTEM_ROWS sampling method for TABLESAMPLE. https:\/\/www.postgresql.org\/docs\/current\/tsm-system-rows.html"},{"key":"e_1_2_2_98_1","first-page":"126","article-title":"Data warehouse configuration","volume":"97","author":"Theodoratos Dimitri","year":"1997","unstructured":"Dimitri Theodoratos, Timos Sellis, et al. 1997. Data warehouse configuration. In VLDB, Vol. 97. 126--135.","journal-title":"VLDB"},{"key":"e_1_2_2_99_1","doi-asserted-by":"publisher","DOI":"10.14778\/3611540.3611602"},{"key":"e_1_2_2_100_1","unstructured":"PostgreSQL wiki. [n. d.]. TABLESAMPLE Implementation. https:\/\/wiki.postgresql.org\/wiki\/TABLESAMPLE_ Implementation Accessed: 2024-05--12."},{"key":"e_1_2_2_101_1","volume-title":"Beng Chin Ooi, and Kian-Lee Tan","author":"Wu Sai","year":"2010","unstructured":"Sai Wu, Beng Chin Ooi, and Kian-Lee Tan. 2010. Continuous sampling for online aggregation over multiple queries. In SIGMOD."},{"key":"e_1_2_2_102_1","volume-title":"2013 IEEE 29th International Conference on Data Engineering (ICDE). IEEE, 1081--1092","author":"Wu Wentao","year":"2013","unstructured":"Wentao Wu, Yun Chi, Shenghuo Zhu, Junichi Tatemura, Hakan Hacig\u00fcm\u00fcs, and Jeffrey F Naughton. 2013. Predicting query execution time: Are optimizer cost models really unusable?. In 2013 IEEE 29th International Conference on Data Engineering (ICDE). IEEE, 1081--1092."},{"key":"e_1_2_2_103_1","volume-title":"Confidence bounds for sampling-based group by estimates. ACM Transactions on Database Systems","author":"Xu Fei","year":"2008","unstructured":"Fei Xu, Christopher Jermaine, and Alin Dobra. 2008. Confidence bounds for sampling-based group by estimates. ACM Transactions on Database Systems (2008)."},{"key":"e_1_2_2_104_1","doi-asserted-by":"publisher","DOI":"10.14778\/2733004.2733022"},{"key":"e_1_2_2_105_1","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2735381"},{"key":"e_1_2_2_106_1","doi-asserted-by":"crossref","unstructured":"Kai Zeng Shi Gao Jiaqi Gu Barzan Mozafari and Carlo Zaniolo. 2014. ABS: a system for scalable approximate queries with accuracy guarantees. In SIGMOD.","DOI":"10.1145\/2588555.2594532"},{"key":"e_1_2_2_107_1","doi-asserted-by":"crossref","unstructured":"Kai Zeng Shi Gao Barzan Mozafari and Carlo Zaniolo. 2014. The analytical bootstrap: a new method for fast error estimation in approximate query processing. In SIGMOD. 277--288.","DOI":"10.1145\/2588555.2588579"},{"key":"e_1_2_2_108_1","doi-asserted-by":"publisher","DOI":"10.1145\/3183713.3183739"},{"key":"e_1_2_2_109_1","doi-asserted-by":"publisher","DOI":"10.14778\/3538598.3538606"},{"key":"e_1_2_2_110_1","unstructured":"Yuxuan Zhu Tengjun Jin Stefanos Baziotis Chengsong Zhang Charith Mendis and Daniel Kang. 2025. PilotDB: Database-Agnostic Online Approximate Query Processing with A Priori Error Guarantees (Technical Report). https:\/\/arxiv.org\/abs\/2503.21087"}],"container-title":["Proceedings of the ACM on Management of Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3725335","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,31]],"date-time":"2026-03-31T18:51:19Z","timestamp":1774983079000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3725335"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,17]]},"references-count":110,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2025,6,17]]}},"alternative-id":["10.1145\/3725335"],"URL":"https:\/\/doi.org\/10.1145\/3725335","relation":{},"ISSN":["2836-6573"],"issn-type":[{"value":"2836-6573","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,6,17]]}}}