{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,27]],"date-time":"2026-02-27T04:30:54Z","timestamp":1772166654699,"version":"3.50.1"},"reference-count":35,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2025,2,27]],"date-time":"2025-02-27T00:00:00Z","timestamp":1740614400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,2,27]],"date-time":"2025-02-27T00:00:00Z","timestamp":1740614400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Big Data"],"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>This study aims to develop an LSM tree that uses two different kinds of probabilistic data structures (PDS). These two data structures are the Bloom Filters and the state-of-the-art Cuckoo Filter released in 2014 by Fan. Cuckoo filters are a perfect choice for saving space and also for deleting keys that are not needed in the algorithm so the filters are synchronized with the keys of all the SSTables. The implementation in this study of Cuckoo filters showed three main advantages (1) Keys that are deleted in the SSTables are immediately reflected in the Cuckoo Filters (2) When SSTables are merged, Cuckoo Filters are also merged by measuring its load factor for availability (3) The performance of Cuckoo Filters compared to Bloom filters after deleting and query shows better performance despite that cuckoo filters are also being updated. This approach is expected to reduce storage space and the amount of time for queries and deletions on the overall data structure. This new optimized LSM tree is expected to be implemented in HBase as future work to check its performance with a real-world database for more empirical results.<\/jats:p>","DOI":"10.1186\/s40537-025-01097-7","type":"journal-article","created":{"date-parts":[[2025,2,28]],"date-time":"2025-02-28T07:55:43Z","timestamp":1740729343000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["LSM tree read-deletion operations optimization through the implementation of cuckoo filters"],"prefix":"10.1186","volume":"12","author":[{"given":"Humberto Cesar Villalta","family":"Valverde","sequence":"first","affiliation":[]},{"given":"KwangSik","family":"Kim","sequence":"additional","affiliation":[]},{"given":"Kisu","family":"Kim","sequence":"additional","affiliation":[]},{"given":"Jinman","family":"Kwon","sequence":"additional","affiliation":[]},{"given":"Jaechoon","family":"Lim","sequence":"additional","affiliation":[]},{"given":"Yongjoo","family":"Jun","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,2,27]]},"reference":[{"key":"1097_CR1","doi-asserted-by":"publisher","unstructured":"Kahrs S. Red-black trees with types. In: Cambridge University Press, United Kingdom 11.4 2001, pp. 425-432. https:\/\/doi.org\/10.1017\/S0956796801004026.","DOI":"10.1017\/S0956796801004026"},{"key":"1097_CR2","unstructured":"Siddharth N, Shubham S, Singh OS. AVL tree and its operations. In: International Journal of Innovative Research in Technology 1.6. 2014, pp. 1826-1828."},{"key":"1097_CR3","doi-asserted-by":"publisher","first-page":"173","DOI":"10.1016\/S0019-9958(85)80034-6","volume":"67","author":"AK Tsakalidis","year":"1995","unstructured":"Tsakalidis AK. AVL-trees for localized search. Inf Control. 1995;67:173\u201394.","journal-title":"Inf Control"},{"key":"1097_CR4","unstructured":"Robert S. Left-leaning red-black trees. Tech. rep. Princeton University."},{"key":"1097_CR5","doi-asserted-by":"publisher","unstructured":"Praveena DA, Bharathi B. A survey paper on big data analytics. In: 2017 International Conference on Information Communication and Embedded Systems (ICICES), Chennai, India 2017, pp. 1-9. https:\/\/doi.org\/10.1109\/ICICES.2017.8070723.","DOI":"10.1109\/ICICES.2017.8070723"},{"key":"1097_CR6","doi-asserted-by":"publisher","unstructured":"Ruijie T. et al. A survey of spatio-temporal big data indexing methods in distributed environment. In: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 15 (May 2022). https:\/\/doi.org\/10.1109\/JSTARS.2022.3175657.","DOI":"10.1109\/JSTARS.2022.3175657"},{"key":"1097_CR7","doi-asserted-by":"publisher","first-page":"229","DOI":"10.3233\/AIS-220101","volume":"14","author":"K Mandeep","year":"2021","unstructured":"Mandeep K, Amritpal S. Probabilistic data structures in smart city: survey, applications, challenges, and research directions. J Ambient Intell Smart Environ. 2021;14:229\u201384. https:\/\/doi.org\/10.3233\/AIS-220101.","journal-title":"J Ambient Intell Smart Environ"},{"key":"1097_CR8","doi-asserted-by":"publisher","unstructured":"Niv D, Moshe T. Chucky: a succinct Cuckoo filter for LSM-Tree. In: International Conference on Management of Data 2021, pp. 365-378. https:\/\/doi.org\/10.1145\/3448016.3457273.","DOI":"10.1145\/3448016.3457273."},{"key":"1097_CR9","doi-asserted-by":"publisher","unstructured":"Novak B, Ari T, David S. Birdwatching: false negatives in cuckoo filters. In: Association for Computing Machinery. 2020, pp. 13-14. https:\/\/doi.org\/10.1145\/3426746.3434063.","DOI":"10.1145\/3426746.3434063"},{"key":"1097_CR10","doi-asserted-by":"publisher","unstructured":"Feiyue W. et al. The power of better choice: reducing relocations in cuckoo filter. In: International Conference on Distributed Computing Systems (ICDCS) 2019, pp. 358-367. https:\/\/doi.org\/10.1109\/ICDCS.2019.00043.","DOI":"10.1109\/ICDCS.2019.00043"},{"key":"1097_CR11","doi-asserted-by":"publisher","unstructured":"Hanhua C. et al. The dynamic cuckoo filter. In: International Conference on Network Protocols (ICNP) (2017), pp. 1-10. https:\/\/doi.org\/10.1109\/ICNP.2017.8117563.","DOI":"10.1109\/ICNP.2017.8117563"},{"key":"1097_CR12","doi-asserted-by":"publisher","unstructured":"Kun H, Tong Y. Additive and subtractive cuckoo filters. In: International Symposium on Quality of Service (IWQoS). 2020, pp. 1-10. https:\/\/doi.org\/10.1109\/IWQoS49365.2020.9213014.","DOI":"10.1109\/IWQoS49365.2020.9213014"},{"key":"1097_CR13","doi-asserted-by":"publisher","unstructured":"Bin F. et al. Cuckoo filter: practically better than bloom. In: 2014. https:\/\/doi.org\/10.1145\/2674005.2674994.","DOI":"10.1145\/2674005.2674994"},{"key":"1097_CR14","doi-asserted-by":"publisher","unstructured":"Pedro R, Jorge M, Salvatore P. CFBF: Reducing the insertion time of cuckoo filters with an integrated bloom filter. In: IEEE 22 Communications Letters 23.10 2019, pp. 1857-1861. https:\/\/doi.org\/10.1109\/LCOMM.2019.2930508.","DOI":"10.1109\/LCOMM.2019.2930508"},{"key":"1097_CR15","unstructured":"Su Y. et al. Authorized certificateless conjunctive keyword search on encrypted EHRs from WSNs\u201d. In: Journal of Information Science Engineering 36.4. 2020."},{"key":"1097_CR16","doi-asserted-by":"crossref","unstructured":"Zhang T et al. Cuckoo-RPL: cuckoo filter based RPL for defending AMI network from blackhole attacks. In: Chinese Control Conference (CCC) (2019), pp. 8920-8925.","DOI":"10.23919\/ChiCC.2019.8866139"},{"key":"1097_CR17","doi-asserted-by":"crossref","unstructured":"Chaudhary R et al. SDN-enabled multi-attribute-based secure communication for smart grid in IIoT environment. In: IEEE Transactions on Industrial Informatics 14.6 (2018), pp. 2629-2640.","DOI":"10.1109\/TII.2018.2789442"},{"key":"1097_CR18","doi-asserted-by":"crossref","unstructured":"Kalalas C, Alonso-Zarate J. Lightweight and space-efficient vehicle authentication based on Cuckoo filter. In: IEEE 3rd 5G World Forum (5GWF) (2020), 139-144.","DOI":"10.1109\/5GWF49715.2020.9221363"},{"key":"1097_CR19","doi-asserted-by":"publisher","unstructured":"O\u2019Neil Patrick. et al. The log-structured merge-tree (LSM-Tree)\u201d. In: Acta Informatica 15.2. 1996. https:\/\/doi.org\/10.1007\/s002360050048","DOI":"10.1007\/s002360050048"},{"key":"1097_CR20","unstructured":"Apache Reference Guide. accessed October 25, 2023. url: https:\/\/hbase.apache.org\/apache"},{"key":"1097_CR21","unstructured":"User Manual (2.x). accessed April 10, 2023. url: https:\/\/accumulo.apache.org\/docs\/2.x\/getting-started\/quickstart"},{"key":"1097_CR22","unstructured":"Fay C. et al. Bigtable: a distributed storage system for structured data. In: OSDI 2006."},{"key":"1097_CR23","unstructured":"Welcome to Apache Cassandra\u2019s documentation! accessed May 5, 2023. url: https:\/\/cassandra.apache.org\/doc\/latest\/"},{"key":"1097_CR24","unstructured":"Stefan H, Marc N, Alexander H. HyperLogLog in practice: algorithmic engineering of a state of the art cardinality estimation algorithm. In: EDBT\/ICDT (Mar. 2018)."},{"key":"1097_CR25","unstructured":"Rasmus P. Cuckoo hashing. In: Elsevier Science . 2003."},{"key":"1097_CR26","doi-asserted-by":"publisher","unstructured":"Reviriego et al. Cuckoo filters and Bloom filters: comparison and application to packet classification. In: IEEE Transactions on Network and Service Management 17.4 (Dec. 2020), pp. 2690-2701. https:\/\/doi.org\/10.1109\/TNSM.2020.3024680.","DOI":"10.1109\/TNSM.2020.3024680"},{"key":"1097_CR27","doi-asserted-by":"crossref","unstructured":"Burton BH. Space\/time trade-offs in hash coding with allowable errors. In: Communications of the ACM 13.7 1970. pp. 422-426.","DOI":"10.1145\/362686.362692"},{"key":"1097_CR28","doi-asserted-by":"publisher","unstructured":"a Paulo S\u00e9rgio Almeida, Baquero Carlos, Pregui\u00e7a, Nuno, Hutchison, David. \u201cScalable Bloom Filters\u201d, Elsevier, 2007;101, 255-261, https:\/\/doi.org\/10.1016\/j.ipl.2006.10.007.","DOI":"10.1016\/j.ipl.2006.10.007"},{"key":"1097_CR29","doi-asserted-by":"publisher","unstructured":"Andrew A, et al. Gaussian KD-Trees for fast high-dimensional filtering. In: ACM Transactions on Graphics 8.3 2009, pp. 1-12. https:\/\/doi.org\/10.1145\/1531326.1531327.","DOI":"10.1145\/1531326.1531327"},{"key":"1097_CR30","doi-asserted-by":"crossref","unstructured":"Steve L, Michael M. Using the power of two choices to improve Bloom filters. In: Internet Mathematics 4.1 2007 pp. 17-33.","DOI":"10.1080\/15427951.2007.10129136"},{"key":"1097_CR31","doi-asserted-by":"publisher","unstructured":"Radhakrishna B, Kanala TR, Vaibhav, Panyala R. Hunting the pertinency of hash and bloom filter combinations on GPU for fast pattern matching. In: Int. j. inf. tecnol.14 (2022), 2667-2679. https:\/\/doi.org\/10.1007\/s41870-022-00964-3.23","DOI":"10.1007\/s41870-022-00964-3.23"},{"key":"1097_CR32","unstructured":"Intel \u00ae NUC 11 Performance kit - NUC11PAHi7. accessed April 10, 2023. URL:https:\/\/ark.intel.com\/content\/www\/us\/en\/ark\/products\/205073\/intel-nuc-11-performance-kit-nuc11pahi7.html."},{"key":"1097_CR33","doi-asserted-by":"crossref","unstructured":"He K, et al. FlatLSM: write-optimized LSM-Tree for PM-based KV stores. In: ACM Transactions on Storage 19.2. 2023, pp. 1-26.","DOI":"10.1145\/3579855"},{"key":"1097_CR34","unstructured":"Mun JH, et al. LSM-Trees Under (Memory) pressure. In: Proceedings of the International Workshop on Accelerating Data Management Systems Using Modern Processor and Storage Architectures (Sept. 2022), pp. 1-13."},{"key":"1097_CR35","doi-asserted-by":"crossref","unstructured":"Ren K, et al. SlimDB: a space-efficient KKey-ValueStorage engine for semi sorted data. In: PVLDB. 2017; 10(13):2037\u201348.","DOI":"10.14778\/3151106.3151108"}],"container-title":["Journal of Big Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s40537-025-01097-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s40537-025-01097-7\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s40537-025-01097-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,2,28]],"date-time":"2025-02-28T07:55:49Z","timestamp":1740729349000},"score":1,"resource":{"primary":{"URL":"https:\/\/journalofbigdata.springeropen.com\/articles\/10.1186\/s40537-025-01097-7"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,2,27]]},"references-count":35,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2025,12]]}},"alternative-id":["1097"],"URL":"https:\/\/doi.org\/10.1186\/s40537-025-01097-7","relation":{"has-preprint":[{"id-type":"doi","id":"10.21203\/rs.3.rs-3226421\/v1","asserted-by":"object"}]},"ISSN":["2196-1115"],"issn-type":[{"value":"2196-1115","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,2,27]]},"assertion":[{"value":"2 August 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"11 February 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"27 February 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Not applicable","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"Not applicable","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"51"}}