{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,8]],"date-time":"2026-04-08T08:54:36Z","timestamp":1775638476563,"version":"3.50.1"},"reference-count":35,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2024,3,12]],"date-time":"2024-03-12T00:00:00Z","timestamp":1710201600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"NSF","award":["2008815"],"award-info":[{"award-number":["2008815"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. ACM Manag. Data"],"published-print":{"date-parts":[[2024,3,12]]},"abstract":"<jats:p>Sketches are single-pass small-space data summaries that can quickly estimate the cardinality of join queries. However, sketches are not directly applicable to join queries with dynamic filter conditions --- where arbitrary selection predicate(s) are applied --- since a sketch is limited to a fixed selection. While multiple sketches for various selections can be used in combination, they each incur individual storage and maintenance costs. Alternatively, exact sketches can be built during runtime for every selection. To make this process scale, a high-degree of parallelism --- available in hardware accelerators such as GPUs --- is required. Therefore, sketch usage for cardinality estimation in query optimization is limited. Following recent work that applies transformers to cardinality estimation, we design a novel learning-based method to approximate the sketch of any arbitrary selection, enabling sketches for join queries with filter conditions. We train a transformer on each table to estimate the sketch of any subset of the table, i.e., any arbitrary selection. Transformers achieve this by learning the joint distribution amongst table attributes, which is equivalent to a multidimensional sketch. Subsequently, transformers can approximate any sketch, enabling sketches for join cardinality estimation. In turn, estimating joins via approximate sketches allows tables to be modeled individually and thus scales linearly with the number of tables. We evaluate the accuracy and efficacy of approximate sketches on queries with selection predicates consisting of conjunctions of point and range conditions. Approximate sketches achieve similar accuracy to exact sketches with at least one order of magnitude less overhead.<\/jats:p>","DOI":"10.1145\/3639321","type":"journal-article","created":{"date-parts":[[2024,3,26]],"date-time":"2024-03-26T18:51:32Z","timestamp":1711479092000},"page":"1-24","source":"Crossref","is-referenced-by-count":5,"title":["Approximate Sketches"],"prefix":"10.1145","volume":"2","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8407-7563","authenticated-orcid":false,"given":"Brian","family":"Tsan","sequence":"first","affiliation":[{"name":"University of California, Merced, Merced, CA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-7161-8444","authenticated-orcid":false,"given":"Asoke","family":"Datta","sequence":"additional","affiliation":[{"name":"University of California, Merced, Merced, CA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3484-2535","authenticated-orcid":false,"given":"Yesdaulet","family":"Izenov","sequence":"additional","affiliation":[{"name":"University of California, Merced, Merced, CA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7018-9043","authenticated-orcid":false,"given":"Florin","family":"Rusu","sequence":"additional","affiliation":[{"name":"University of California, Merced, Merced, CA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2024,3,26]]},"reference":[{"key":"e_1_2_2_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/303976.303978"},{"key":"e_1_2_2_2_1","doi-asserted-by":"publisher","DOI":"10.1006\/jcss.1997.1545"},{"key":"e_1_2_2_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/362686.362692"},{"key":"e_1_2_2_4_1","volume-title":"Lin (Eds.)","volume":"33","author":"Brown Tom","year":"2020","unstructured":"Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel Ziegler, Jeffrey Wu, Clemens Winter, Chris Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. 2020. Language Models are Few-Shot Learners. In Advances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin (Eds.), Vol. 33. Curran Associates, Inc., 1877--1901. https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2020\/file\/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf"},{"key":"e_1_2_2_5_1","volume-title":"Proceedings of the 31st International Conference on Very Large Data Bases","author":"Cormode Graham","year":"2005","unstructured":"Graham Cormode and Minos N. Garofalakis. 2005. Sketching Streams Through the Net: Distributed Approximate Query Tracking. In Proceedings of the 31st International Conference on Very Large Data Bases, Trondheim, Norway, August 30 - September 2, 2005, Klemens B\u00f6 hm, Christian S. Jensen, Laura M. Haas, Martin L. Kersten, Per-\u00c5ke Larson, and Beng Chin Ooi (Eds.). ACM, 13--24. http:\/\/www.vldb.org\/archives\/website\/2005\/program\/paper\/tue\/p13-cormode.pdf"},{"key":"e_1_2_2_6_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jalgor.2003.12.001"},{"key":"e_1_2_2_7_1","unstructured":"Fan Deng and Davood Rafiei. 2006. New Estimation Algorithms for Streaming Data: Count-min Can Do More. https:\/\/webdocs.cs.ualberta.ca\/ drafiei\/papers\/cmm.pdf"},{"key":"e_1_2_2_8_1","volume-title":"BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arxiv","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arxiv: 1810.04805 [cs.CL]"},{"key":"e_1_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.46298\/dmtcs.3545"},{"key":"e_1_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-24741-8_33"},{"key":"e_1_2_2_11_1","volume-title":"Proceedings of the 28th International Conference on Very Large Data Bases","author":"Gilbert Anna C.","unstructured":"Anna C. Gilbert, Yannis Kotidis, S. Muthukrishnan, and Martin J. Strauss. 2002. How to Summarize the Universe: Dynamic Maintenance of Quantiles. In Proceedings of the 28th International Conference on Very Large Data Bases (Hong Kong, China) (VLDB '02). VLDB Endowment, 454--465."},{"key":"e_1_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.14778\/3503585.3503586"},{"key":"e_1_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.14778\/3384345.3384349"},{"key":"e_1_2_2_14_1","volume-title":"Intl Conf. on Learning Representations. 20 pages. https:\/\/doi.org\/paper\/2019_c_hsu_iclr","author":"Hsu C","year":"2019","unstructured":"C Hsu, P Indyk, D Katabi, and A Vakilian. 2019. Learning-based frequency estimation algorithms. In Intl Conf. on Learning Representations. 20 pages. https:\/\/doi.org\/paper\/2019_c_hsu_iclr"},{"key":"e_1_2_2_15_1","unstructured":"IMDB. 2023. Internet Movie Database. https:\/\/www.imdb.com\/"},{"key":"e_1_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/3448016.3452840"},{"key":"e_1_2_2_17_1","doi-asserted-by":"crossref","unstructured":"Norman P. Jouppi Cliff Young Nishant Patil David Patterson Gaurav Agrawal Raminder Bajwa Sarah Bates Suresh Bhatia Nan Boden Al Borchers Rick Boyle Pierre luc Cantin Clifford Chao Chris Clark Jeremy Coriell Mike Daley Matt Dau Jeffrey Dean Ben Gelb Tara Vazir Ghaemmaghami Rajendra Gottipati William Gulland Robert Hagmann C. Richard Ho Doug Hogberg John Hu Robert Hundt Dan Hurt Julian Ibarz Aaron Jaffey Alek Jaworski Alexander Kaplan Harshit Khaitan Andy Koch Naveen Kumar Steve Lacy James Laudon James Law Diemthu Le Chris Leary Zhuyuan Liu Kyle Lucke Alan Lundin Gordon MacKean Adriana Maggiore Maire Mahony Kieran Miller Rahul Nagarajan Ravi Narayanaswami Ray Ni Kathy Nix Thomas Norrie Mark Omernick Narayana Penukonda Andy Phelps and Jonathan Ross. 2017. In-Datacenter Performance Analysis of a Tensor Processing Unit. https:\/\/arxiv.org\/pdf\/1704.04760.pdf","DOI":"10.1145\/3140659.3080246"},{"key":"e_1_2_2_18_1","volume-title":"Learned Cardinalities: Estimating Correlated Joins with Deep Learning. arxiv","author":"Kipf Andreas","year":"2018","unstructured":"Andreas Kipf, Thomas Kipf, Bernhard Radke, Viktor Leis, Peter Boncz, and Alfons Kemper. 2018. Learned Cardinalities: Estimating Correlated Joins with Deep Learning. arxiv: 1809.00677 [cs.DB]"},{"key":"e_1_2_2_19_1","volume-title":"Jeffrey Dean, and Neoklis Polyzotis.","author":"Kraska Tim","year":"2017","unstructured":"Tim Kraska, Alex Beutel, Ed H. Chi, Jeffrey Dean, and Neoklis Polyzotis. 2017. The Case for Learned Index Structures. CoRR, Vol. abs\/1712.01208 (2017). showeprint[arXiv]1712.01208 http:\/\/arxiv.org\/abs\/1712.01208"},{"key":"e_1_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-017-0480--7"},{"key":"e_1_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.14778\/3476249.3476254"},{"key":"e_1_2_2_22_1","doi-asserted-by":"publisher","DOI":"10.5555\/3326943.3326986"},{"key":"e_1_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.14778\/1687627.1687738"},{"key":"e_1_2_2_24_1","doi-asserted-by":"publisher","DOI":"10.14778\/3476249.3476259"},{"key":"e_1_2_2_25_1","unstructured":"PostgreSQL. 2023. PostgreSQL Documentation 13. https:\/\/www.postgresql.org\/docs\/13\/row-estimation-examples.html"},{"key":"e_1_2_2_26_1","volume-title":"Morcos","author":"Sorscher Ben","year":"2023","unstructured":"Ben Sorscher, Robert Geirhos, Shashank Shekhar, Surya Ganguli, and Ari S. Morcos. 2023. Beyond neural scaling laws: beating power law scaling via data pruning. arxiv: 2206.14486 [cs.LG]"},{"key":"e_1_2_2_27_1","doi-asserted-by":"crossref","unstructured":"Brian Tsan. 2024. Approximate Sketches. https:\/\/github.com\/Btsan\/ApproximateSketch.","DOI":"10.1145\/3639321"},{"key":"e_1_2_2_28_1","volume-title":"CoRR","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention Is All You Need. CoRR, Vol. abs\/1706.03762 (2017). showeprint[arXiv]1706.03762"},{"key":"e_1_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.14778\/2824032.2824051"},{"key":"e_1_2_2_30_1","volume-title":"BayesCard: A Unified Bayesian Framework for Cardinality Estimation. CoRR","author":"Wu Ziniu","year":"2020","unstructured":"Ziniu Wu and Amir Shaikhha. 2020. BayesCard: A Unified Bayesian Framework for Cardinality Estimation. CoRR, Vol. abs\/2012.14743 (2020). showeprint[arXiv]2012.14743 https:\/\/arxiv.org\/abs\/2012.14743"},{"key":"e_1_2_2_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/3514221.3517885"},{"key":"e_1_2_2_32_1","doi-asserted-by":"publisher","DOI":"10.14778\/3421424.3421432"},{"key":"e_1_2_2_33_1","doi-asserted-by":"publisher","DOI":"10.14778\/3368289.3368294"},{"key":"e_1_2_2_34_1","volume-title":"8th International Conference on Learning Representations, ICLR 2020","author":"You Yang","year":"2020","unstructured":"Yang You, Jing Li, Sashank J. Reddi, Jonathan Hseu, Sanjiv Kumar, Srinadh Bhojanapalli, Xiaodan Song, James Demmel, Kurt Keutzer, and Cho-Jui Hsieh. 2020. Large Batch Optimization for Deep Learning: Training BERT in 76 minutes. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26--30, 2020. OpenReview.net. https:\/\/openreview.net\/forum?id=Syx4wnEtvH"},{"key":"e_1_2_2_35_1","doi-asserted-by":"publisher","DOI":"10.14778\/3461535.3461539"}],"container-title":["Proceedings of the ACM on Management of Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3639321","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3639321","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,21]],"date-time":"2025-08-21T15:16:46Z","timestamp":1755789406000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3639321"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,3,12]]},"references-count":35,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2024,3,12]]}},"alternative-id":["10.1145\/3639321"],"URL":"https:\/\/doi.org\/10.1145\/3639321","relation":{},"ISSN":["2836-6573"],"issn-type":[{"value":"2836-6573","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,3,12]]}}}