{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,14]],"date-time":"2026-03-14T09:51:04Z","timestamp":1773481864092,"version":"3.50.1"},"reference-count":44,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2024,3,12]],"date-time":"2024-03-12T00:00:00Z","timestamp":1710201600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"the National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["92267203 62021002 62072265 62232005"],"award-info":[{"award-number":["92267203 62021002 62072265 62232005"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Beijing Key Laboratory of Industrial Big Data System and Application"},{"name":"the National Key Research and Development Plan","award":["2021YFB3300500"],"award-info":[{"award-number":["2021YFB3300500"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. ACM Manag. Data"],"published-print":{"date-parts":[[2024,3,12]]},"abstract":"<jats:p>Quantiles are fundamental statistics in various data science tasks, but costly to compute, e.g., by loading the entire data in memory for ranking. With limited memory space, prevalent in end devices or databases with heavy loads, it needs to scan the data in multiple passes. The idea is to gradually shrink the range of the queried quantile till it is small enough to fit in memory for ranking the result. Existing methods use deterministic sketches to determine the exact range of quantile, known as deterministic filter, which could be inefficient in range shrinking. In this study, we propose to shrink the ranges more aggressively, using randomized summaries such as KLL sketch. That is, with a high probability the quantile lies in a smaller range, namely probabilistic filter, determined by the randomized sketch. Specifically, we estimate the expected passes for determining the exact quantiles with probabilistic filters, and select a proper probability that can minimize the expected passes. Analyses show that our exact quantile determination method can terminate in P passes with 1-\u03b4 confidence, storing O(N 1\/P logP-1\/2P (1\/\u03b4)) items, close to the lower bound \u00d8mega(N1\/P) for a fixed \u03b4. The approach has been deployed as a function in an LSM-tree based time-series database Apache IoTDB. Remarkably, the randomized sketches can be pre-computed for the immutable SSTables in LSM-tree. Moreover, multiple quantile queries could share the data passes for probabilistic filters in range estimation. Extensive experiments on real and synthetic datasets demonstrate the superiority of our proposal compared to the existing methods with deterministic filters. On average, our method takes 0.48 fewer passes and 18% of the time compared with the state-of-the-art deterministic sketch (GK sketch).<\/jats:p>","DOI":"10.1145\/3639280","type":"journal-article","created":{"date-parts":[[2024,3,26]],"date-time":"2024-03-26T18:51:32Z","timestamp":1711479092000},"page":"1-26","source":"Crossref","is-referenced-by-count":3,"title":["Determining Exact Quantiles with Randomized Summaries"],"prefix":"10.1145","volume":"2","author":[{"ORCID":"https:\/\/orcid.org\/0009-0008-3197-2722","authenticated-orcid":false,"given":"Ziling","family":"Chen","sequence":"first","affiliation":[{"name":"Tsinghua University, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-5412-4489","authenticated-orcid":false,"given":"Haoquan","family":"Guan","sequence":"additional","affiliation":[{"name":"Tsinghua University, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9503-2755","authenticated-orcid":false,"given":"Shaoxu","family":"Song","sequence":"additional","affiliation":[{"name":"Tsinghua University, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6868-4045","authenticated-orcid":false,"given":"Xiangdong","family":"Huang","sequence":"additional","affiliation":[{"name":"Tsinghua University, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1698-8992","authenticated-orcid":false,"given":"Chen","family":"Wang","sequence":"additional","affiliation":[{"name":"Tsinghua University, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6841-7943","authenticated-orcid":false,"given":"Jianmin","family":"Wang","sequence":"additional","affiliation":[{"name":"Tsinghua University, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2024,3,26]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/3183713.3183761"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/2500128"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.14778\/3476249.3476266"},{"key":"e_1_2_1_4_1","unstructured":"Experiment Code and Data. 2023. https:\/\/github.com\/thssdb\/exact-quantile."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/3452021.3458323"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/3447548.3467152"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/3375395.3387650"},{"key":"e_1_2_1_8_1","unstructured":"Bitcoin Dataset. 2018. https:\/\/www.kaggle.com\/shiheyingzhe\/bitcoin-transaction-data-from-2009-to-2018."},{"key":"e_1_2_1_9_1","unstructured":"Binance Dataset. 2022. https:\/\/www.kaggle.com\/datasets\/jorijnsmit\/binance-full-history."},{"key":"e_1_2_1_10_1","unstructured":"Electric Dataset. 2022. https:\/\/www.kaggle.com\/datasets\/jeanmidev\/smart-meters-in-london."},{"key":"e_1_2_1_11_1","unstructured":"IBM Dataset. 2023. https:\/\/www.kaggle.com\/datasets\/ealtman2019\/ibm-transactions-for-anti-money-laundering-aml."},{"key":"e_1_2_1_12_1","unstructured":"Thruster Dataset. 2022. https:\/\/www.kaggle.com\/datasets\/patrickfleith\/spacecraft-thruster-firing-tests-dataset."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/1327452.1327492"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.simpa.2020.100049"},{"key":"e_1_2_1_15_1","volume-title":"Computing Extremely Accurate Quantiles Using t-Digests. CoRR abs\/1902.04023","author":"Dunning Ted","year":"2019","unstructured":"Ted Dunning and Otmar Ertl. 2019. Computing Extremely Accurate Quantiles Using t-Digests. CoRR abs\/1902.04023 (2019). arXiv:1902.04023 http:\/\/arxiv.org\/abs\/1902.04023"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/375663.375670"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.14778\/2367502.2367510"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1093\/comjnl\/5.1.10"},{"key":"e_1_2_1_19_1","unstructured":"Configuration in Apache IoTDB. 2023. https:\/\/iotdb.apache.org\/UserGuide\/Master\/Reference\/Common-Config-Manual. html."},{"key":"e_1_2_1_20_1","unstructured":"Document in Apache IoTDB. 2023. https:\/\/iotdb.apache.org\/UserGuide\/Master\/Operators-Functions\/Data-Profiling. html#quantile."},{"key":"e_1_2_1_21_1","unstructured":"Implementation in Apache IoTDB. 2023. https:\/\/github.com\/apache\/iotdb\/tree\/research\/exact-quantile."},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1016\/B978-012722442-8\/50011-2"},{"key":"e_1_2_1_23_1","unstructured":"Apache IoTDB. 2023. https:\/\/iotdb.apache.org\/."},{"key":"e_1_2_1_24_1","volume-title":"Streaming Quantiles Algorithms with Small Space and Update Time. CoRR abs\/1907.00236","author":"Ivkin Nikita","year":"2019","unstructured":"Nikita Ivkin, Edo Liberty, Kevin J. Lang, Zohar S. Karnin, and Vladimir Braverman. 2019. Streaming Quantiles Algorithms with Small Space and Update Time. CoRR abs\/1907.00236 (2019). arXiv:1907.00236 http:\/\/arxiv.org\/abs\/ 1907.00236"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE53745.2022.00315"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/FOCS.2016.17"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1090\/S0002-9939-1953-0055639-3"},{"key":"e_1_2_1_28_1","volume-title":"Rolia","author":"Kwon YongChul","year":"2011","unstructured":"YongChul Kwon, Magdalena Balazinska, Bill Howe, and Jerome A. Rolia. 2011. A Study of Skew in MapReduce Applications. https:\/\/api.semanticscholar.org\/CorpusID:134490"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1007\/S11390-023--3088-Y"},{"key":"e_1_2_1_30_1","volume-title":"MCUNet: Tiny Deep Learning on IoT Devices. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020","author":"Lin Ji","year":"2020","unstructured":"Ji Lin, Wei-Ming Chen, Yujun Lin, John Cohn, Chuang Gan, and Song Han. 2020. MCUNet: Tiny Deep Learning on IoT Devices. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6--12, 2020, virtual, Hugo Larochelle, Marc'Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin (Eds.). https:\/\/proceedings.neurips.cc\/paper\/2020\/hash\/ 86c51678350f656dcc7f490a43946ee5-Abstract.html"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-019-00555-y"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/304182.304204"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.14778\/3352063.3352135"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1016\/0304--3975(80)90061--4"},{"key":"e_1_2_1_35_1","volume-title":"The Flashsort1 Algorithm. Dr. Dobb's Journal (02","author":"Neubert Karl-Dietrich","year":"1998","unstructured":"Karl-Dietrich Neubert. 1998. The Flashsort1 Algorithm. Dr. Dobb's Journal (02 1998), 123--125."},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1007\/s002360050048"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1016\/0020-0190(95)00150-B"},{"key":"e_1_2_1_38_1","unstructured":"InfluxDB Quantile. 2023. https:\/\/github.com\/influxdata\/flux\/blob\/8ad36639ede4826242455389fff4810adfc4e884\/stdlib\/ universe\/quantile.go#L404."},{"key":"e_1_2_1_39_1","unstructured":"PostgreSQL Quantile. 2023. https:\/\/www.postgresql.org\/docs\/current\/functions-aggregate.html."},{"key":"e_1_2_1_40_1","unstructured":"Supplementary. 2023. https:\/\/github.com\/thssdb\/exact-quantile\/blob\/main\/Appendix.pdf."},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/3589775"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/3410403"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1007\/s41019-020-00151-z"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1007\/S41019-023-00223-W"}],"container-title":["Proceedings of the ACM on Management of Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3639280","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3639280","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,21]],"date-time":"2025-08-21T15:15:49Z","timestamp":1755789349000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3639280"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,3,12]]},"references-count":44,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2024,3,12]]}},"alternative-id":["10.1145\/3639280"],"URL":"https:\/\/doi.org\/10.1145\/3639280","relation":{},"ISSN":["2836-6573"],"issn-type":[{"value":"2836-6573","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,3,12]]}}}