{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,6]],"date-time":"2026-05-06T06:27:24Z","timestamp":1778048844464,"version":"3.51.4"},"reference-count":56,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2024,5,29]],"date-time":"2024-05-29T00:00:00Z","timestamp":1716940800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. ACM Manag. Data"],"published-print":{"date-parts":[[2024,5,29]]},"abstract":"<jats:p>Log-structured merge-trees (LSM-trees) are widely used in key-value stores because of its excellent write performance. To reduce LSM-tree's read amplification due to overlapping sorted runs, each file (i.e., SSTable) in an LSM-tree is typically associated with a point or range filter to reduce unnecessary I\/Os to the runs that do not contain the target key (range). However, as modern SSDs get faster, probing multiple in-memory filters per query often makes the system CPU bottlenecked, thus compromising the system's throughput. In this paper, we developed the Global Range Filter (GRF) for RocksDB that reduces the number of filter probes per query to one. We follow the pioneering Chucky's approach by storing the sorted run IDs within the filter. However, we identify two practical challenges in building a global range filter: correctness in multi-version concurrency control and efficiency in frequent updates. We solve both challenges by the novel Shape Encoding algorithm. With further optimizations, GRF achieves a dominating performance over the state-of-the-art filters under different workloads when integrated into RocksDB.<\/jats:p>","DOI":"10.1145\/3654944","type":"journal-article","created":{"date-parts":[[2024,5,30]],"date-time":"2024-05-30T09:44:53Z","timestamp":1717062293000},"page":"1-27","source":"Crossref","is-referenced-by-count":14,"title":["GRF: A Global Range Filter for LSM-Trees with Shape Encoding"],"prefix":"10.1145","volume":"2","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8006-4220","authenticated-orcid":false,"given":"Hengrui","family":"Wang","sequence":"first","affiliation":[{"name":"Institute for Interdisciplinary Information Science, Tsinghua University, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-7642-2763","authenticated-orcid":false,"given":"Te","family":"Guo","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Purdue University, West Lafayette, IN, USA"}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-4366-7376","authenticated-orcid":false,"given":"Junzhao","family":"Yang","sequence":"additional","affiliation":[{"name":"Institute for Interdisciplinary Information Science, Tsinghua University, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-4821-1558","authenticated-orcid":false,"given":"Huanchen","family":"Zhang","sequence":"additional","affiliation":[{"name":"Institute for Interdisciplinary Information Science, Tsinghua University, Beijing, China"}]}],"member":"320","published-online":{"date-parts":[[2024,5,30]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"[n. d.]. Apache. Cassandra. http:\/\/cassandra.apache.org."},{"key":"e_1_2_1_2_1","unstructured":"[n. d.]. Apache. HBase. http:\/\/hbase.apache.org\/."},{"key":"e_1_2_1_3_1","unstructured":"[n. d.]. Asynchronous IO in RocksDB. https:\/\/rocksdb.org\/blog\/2022\/10\/07\/asynchronous-io-in-rocksdb.html."},{"key":"e_1_2_1_4_1","unstructured":"[n. d.]. Facebook. RocksDB. https:\/\/github.com\/facebook\/rocksdb.."},{"key":"e_1_2_1_5_1","volume-title":"Disk and Log in Log Structured Key-Value Stores. In USENIX Annual Technical Conference. https:\/\/api.semanticscholar.org\/CorpusID:22824631","author":"Balmau Oana","year":"2017","unstructured":"Oana Balmau, Diego Didona, Rachid Guerraoui, Willy Zwaenepoel, Huapeng Yuan, Aashray Arora, Karan Gupta, and Pavan Konka. 2017. TRIAD: Creating Synergies Between Memory, Disk and Log in Log Structured Key-Value Stores. In USENIX Annual Technical Conference. https:\/\/api.semanticscholar.org\/CorpusID:22824631"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/362686.362692"},{"key":"e_1_2_1_7_1","volume-title":"Reed","author":"Carrea Laura","year":"2016","unstructured":"Laura Carrea, Alexei Vernitski, and Martin J. Reed. 2016. Yes-no Bloom filter: A way of representing sets with fewer false positives. ArXiv abs\/1603.01060 (2016). https:\/\/api.semanticscholar.org\/CorpusID:16308190"},{"key":"e_1_2_1_8_1","volume-title":"USENIX Annual Technical Conference. https:\/\/api.semanticscholar.org\/CorpusID:260548458","author":"Chan Helen H. W.","year":"2018","unstructured":"Helen H. W. Chan, Yongkun Li, Patrick P. C. Lee, and Yinlong Xu. 2018. HashKV: Enabling Efficient Updates in KV Storage via Hashing. In USENIX Annual Technical Conference. https:\/\/api.semanticscholar.org\/CorpusID:260548458"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.14778\/3485450.3485461"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/3588726"},{"key":"e_1_2_1_11_1","volume-title":"SplinterDB: Closing the Bandwidth Gap for NVMe Key-Value Stores. In 2020 USENIX Annual Technical Conference (USENIX ATC 20)","author":"Conway Alexander","year":"2020","unstructured":"Alexander Conway, Abhishek Gupta, Vijay Chidambaram, Martin Farach-Colton, Richard Spillane, Amy Tai, and Rob Johnson. 2020. SplinterDB: Closing the Bandwidth Gap for NVMe Key-Value Stores. In 2020 USENIX Annual Technical Conference (USENIX ATC 20). USENIX Association, 49--63. https:\/\/www.usenix.org\/conference\/atc20\/presentation\/ conway"},{"key":"e_1_2_1_12_1","volume-title":"Adaptive Learned Bloom Filter (Ada-BF): Efficient Utilization of the Classifier. ArXiv abs\/1910.09131","author":"Dai Zhenwei","year":"2019","unstructured":"Zhenwei Dai and Anshumali Shrivastava. 2019. Adaptive Learned Bloom Filter (Ada-BF): Efficient Utilization of the Classifier. ArXiv abs\/1910.09131 (2019). https:\/\/api.semanticscholar.org\/CorpusID:204800314"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/3035918.3064054"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/3589285"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/2882903.2915219"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/3183713.3196927"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3299869.3319903"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/3448016.3457273"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.14778\/3551793.3551853"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.14778\/3436905.3436919"},{"key":"e_1_2_1_21_1","volume-title":"Dillinger and Stefan Walzer","author":"Peter","year":"2021","unstructured":"Peter C. Dillinger and Stefan Walzer. 2021. Ribbon filter: practically smaller than Bloom and Xor. ArXiv abs\/2103.02515 (2021)."},{"key":"e_1_2_1_22_1","volume-title":"Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data","author":"Ding Jialin","year":"2019","unstructured":"Jialin Ding, Umar Farooq Minhas, Hantian Zhang, Yinan Li, Chi Wang, Badrish Chandramouli, Johannes Gehrke, Donald Kossmann, and David B. Lomet. 2019. ALEX: An Updatable Adaptive Learned Index. Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data (2019)."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/3035918.3064033"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/3483840"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/1368436.1368454"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.14778\/3523210.3523211"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/2674005.2674994"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.14778\/3389133.3389135"},{"key":"e_1_2_1_29_1","volume-title":"Proceedings of the 2019 International Conference on Management of Data","author":"Galakatos Alex","year":"2018","unstructured":"Alex Galakatos, Michael Markovitch, Carsten Binnig, Rodrigo Fonseca, and Tim Kraska. 2018. FITing-Tree: A Dataaware Index Structure. Proceedings of the 2019 International Conference on Management of Data (2018)."},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIT.1975.1055357"},{"key":"e_1_2_1_31_1","volume-title":"Xor Filters: Faster and Smaller Than Bloom and Cuckoo Filters. arXiv: Data Structures and Algorithms","author":"Graf Thomas Mueller","year":"2019","unstructured":"Thomas Mueller Graf and Daniel Lemire. 2019. Xor Filters: Faster and Smaller Than Bloom and Cuckoo Filters. arXiv: Data Structures and Algorithms (2019)."},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/3299869.3314041"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.14778\/3529337.3529345"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/3514221.3526167"},{"key":"e_1_2_1_35_1","volume-title":"Coconut: A Scalable Bottom-Up Approach for Building Data Series Indexes. ArXiv abs\/2006.13713","author":"Kondylakis Haridimos","year":"2018","unstructured":"Haridimos Kondylakis, Niv Dayan, Konstantinos Zoumpatianos, and Themis Palpanas. 2018. Coconut: A Scalable Bottom-Up Approach for Building Data Series Indexes. ArXiv abs\/2006.13713 (2018)."},{"key":"e_1_2_1_36_1","volume-title":"Proceedings of the 2018 International Conference on Management of Data","author":"Kraska Tim","year":"2017","unstructured":"Tim Kraska, Alex Beutel, Ed H. Chi, Jeffrey Dean, and Neoklis Polyzotis. 2017. The Case for Learned Index Structures. Proceedings of the 2018 International Conference on Management of Data (2017)."},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.14778\/3303753.3303757"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/3485447.3511996"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/LCOMM.2015.2478462"},{"key":"e_1_2_1_40_1","volume-title":"USENIX Conference on File and Storage Technologies. https:\/\/api.semanticscholar.org\/CorpusID:11367463","author":"Lu Lanyue","unstructured":"Lanyue Lu, Thanumalayan Sankaranarayana Pillai, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau. 2016. WiscKey: Separating Keys from Values in SSD-conscious Storage. In USENIX Conference on File and Storage Technologies. https:\/\/api.semanticscholar.org\/CorpusID:11367463"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3389731"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.14778\/3421424.3421425"},{"key":"e_1_2_1_43_1","doi-asserted-by":"crossref","unstructured":"Michael Mitzenmacher. 2018. A Model for Learned Bloom Filters and Optimizing by Sandwiching. In Neural Information Processing Systems. https:\/\/api.semanticscholar.org\/CorpusID:54173703","DOI":"10.1007\/978-1-4614-8265-9_751"},{"key":"e_1_2_1_44_1","first-page":"1","article-title":"Adaptive Cuckoo Filters","volume":"25","author":"Mitzenmacher Michael","year":"2017","unstructured":"Michael Mitzenmacher, Salvatore Pontarelli, and Pedro Reviriego. 2017. Adaptive Cuckoo Filters. Journal of Experimental Algorithmics (JEA) 25 (2017), 1 -- 20. https:\/\/api.semanticscholar.org\/CorpusID:9085650","journal-title":"Journal of Experimental Algorithmics (JEA)"},{"key":"e_1_2_1_45_1","volume-title":"Learning to Optimize LSM-trees: Towards A Reinforcement Learning based Key-Value Store for Dynamic Workloads. ArXiv abs\/2308.07013","author":"Mo Dingheng","year":"2023","unstructured":"Dingheng Mo, Fanchao Chen, Siqiang Luo, and Caihua Shan. 2023. Learning to Optimize LSM-trees: Towards A Reinforcement Learning based Key-Value Store for Dynamic Workloads. ArXiv abs\/2308.07013 (2023). https:\/\/api.semanticscholar.org\/CorpusID:260886977"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/3617333"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/3448016.3452841"},{"key":"e_1_2_1_48_1","volume-title":"Lillicrap","author":"Rae W.","year":"2019","unstructured":"JackW. Rae, Sergey Bartunov, and Timothy P. Lillicrap. 2019. Meta-Learning Neural Bloom Filters. ArXiv abs\/1906.04304 (2019). https:\/\/api.semanticscholar.org\/CorpusID:86477253"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.14778\/3476249.3476274"},{"key":"e_1_2_1_50_1","volume-title":"On Log-Structured Merge for Solid-State Drives. 2017 IEEE 33rd International Conference on Data Engineering (ICDE) (2017","author":"Thonangi Risi","year":"2017","unstructured":"Risi Thonangi and Jun Yang. 2017. On Log-Structured Merge for Solid-State Drives. 2017 IEEE 33rd International Conference on Data Engineering (ICDE) (2017), 683--694. https:\/\/api.semanticscholar.org\/CorpusID:852089"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.14778\/3529337"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE55515.2023.00158"},{"key":"e_1_2_1_53_1","volume-title":"Hash Adaptive Bloom Filter. 2021 IEEE 37th International Conference on Data Engineering (ICDE) (2021","author":"Xie Rongbiao","year":"2021","unstructured":"Rongbiao Xie, Meng Li, Zheyu Miao, Rong Gu, He Huang, Haipeng Dai, and Guihai Chen. 2021. Hash Adaptive Bloom Filter. 2021 IEEE 37th International Conference on Data Engineering (ICDE) (2021), 636--647. https:\/\/api.semanticscholar. org\/CorpusID:235421991"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1145\/3183713.3196931"},{"key":"e_1_2_1_55_1","volume-title":"REMIX: Efficient Range Query for LSM-trees. In 19th USENIX Conference on File and Storage Technologies (FAST 21)","author":"Zhong Wenshao","year":"2021","unstructured":"Wenshao Zhong, Chen Chen, Xingbo Wu, and Song Jiang. 2021. REMIX: Efficient Range Query for LSM-trees. In 19th USENIX Conference on File and Storage Technologies (FAST 21). USENIX Association, 51--64. https:\/\/www.usenix.org\/conference\/fast21\/presentation\/zhong"},{"key":"e_1_2_1_56_1","unstructured":"Zichen Zhu. 2023. SHaMBa: Reducing Bloom Filter Overhead in LSM Trees. In PhD@VLDB. https:\/\/api.semanticscholar.org\/CorpusID:259848936"}],"container-title":["Proceedings of the ACM on Management of Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3654944","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3654944","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,21]],"date-time":"2025-08-21T14:38:57Z","timestamp":1755787137000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3654944"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,5,29]]},"references-count":56,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2024,5,29]]}},"alternative-id":["10.1145\/3654944"],"URL":"https:\/\/doi.org\/10.1145\/3654944","relation":{},"ISSN":["2836-6573"],"issn-type":[{"value":"2836-6573","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,5,29]]}}}