{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,31]],"date-time":"2026-03-31T19:48:23Z","timestamp":1774986503437,"version":"3.50.1"},"reference-count":44,"publisher":"Association for Computing Machinery (ACM)","issue":"3","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. ACM Manag. Data"],"published-print":{"date-parts":[[2025,6,17]]},"abstract":"<jats:p>Key-value databases with Log Structured Merge tree are increasingly favored by modern applications. Apart from supporting fast lookup on primary key, efficient queries on non-key attributes are also highly demanded by many of these applications. To enhance query performance, many auxiliary structures like secondary indexing and filters have been developed. However, existing auxiliary structures suffer from three limitations. First, creating filter for every disk component has low lookup efficiency as all components need to be searched during query processing. Second, current secondary index design requires primary table access to fetch the data entries for each output primary key from the index. This indirect entries fetching process involves significant point lookup overhead in the primary table and hence hinders the query performance. Last, maintaining the consistency between the secondary index and the primary table is challenging due to the out-of-place update mechanism of the LSM-tree. To overcome the limitations in existing auxiliary structures for non-key attributes queries, this paper proposes a novel secondary index framework, NEXT, for LSM-based key-value storage system. NEXT utilizes a two-level structure which is integrated with the primary table. In particular, NEXT proposes to create secondary index blocks on each LSM disk component to map the secondary attributes to their corresponding data blocks. In addition, NEXT introduces a global index component which is created on top of all secondary index blocks to direct the secondary index operation to the target secondary index blocks. Finally, NEXT adopts two optimization strategies to further improve the query performance. We implement NEXT on RocksDB and experimentally evaluate its performance against existing methods. Experiments on both static and mixed workloads demonstrate that NEXT outperforms existing methods for different types of non-key attributes.<\/jats:p>","DOI":"10.1145\/3725330","type":"journal-article","created":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T21:23:29Z","timestamp":1750281809000},"page":"1-25","source":"Crossref","is-referenced-by-count":0,"title":["NEXT: A New Secondary Index Framework for LSM-based Data Storage"],"prefix":"10.1145","volume":"3","author":[{"ORCID":"https:\/\/orcid.org\/0009-0000-0295-0200","authenticated-orcid":false,"given":"Jiachen","family":"Shi","sequence":"first","affiliation":[{"name":"Nanyang Technological University, Singapore, Singapore"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-3641-1054","authenticated-orcid":false,"given":"Jingyi","family":"Yang","sequence":"additional","affiliation":[{"name":"Nanyang Technological University, Singapore, Singapore"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4430-6373","authenticated-orcid":false,"given":"Gao","family":"Cong","sequence":"additional","affiliation":[{"name":"Nanyang Technological Univesity, Singapore, Singapore"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0762-6562","authenticated-orcid":false,"given":"Xiaoli","family":"Li","sequence":"additional","affiliation":[{"name":"Institute for Infocomm Research , A*STAR, Singapore, Singapore and Nanyang Technological University, Singapore, Singapore"}]}],"member":"320","published-online":{"date-parts":[[2025,6,18]]},"reference":[{"key":"e_1_2_2_1_1","doi-asserted-by":"publisher","DOI":"10.14778\/2733085.2733096"},{"key":"e_1_2_2_2_1","doi-asserted-by":"publisher","DOI":"10.14778\/2732951.2732958"},{"key":"e_1_2_2_3_1","unstructured":"Apache. [n. d.]. Cassandra. http:\/\/cassandra.apache.org\/."},{"key":"e_1_2_2_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/2463676.2465296"},{"key":"e_1_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/3340964.3340965"},{"key":"e_1_2_2_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/1365815.1365816"},{"key":"e_1_2_2_7_1","volume-title":"HINT: A Hierarchical Index for Intervals in Main Memory. In SIGMOD '22: International Conference on Management of Data","author":"Christodoulou George","year":"2022","unstructured":"George Christodoulou, Panagiotis Bouros, and Nikos Mamoulis. 2022. HINT: A Hierarchical Index for Intervals in Main Memory. In SIGMOD '22: International Conference on Management of Data, Philadelphia, PA, USA, June 12 - 17, 2022, Zachary G. Ives, Angela Bonifati, and Amr El Abbadi (Eds.). ACM, 1257--1270."},{"key":"e_1_2_2_8_1","unstructured":"Google Code. [n. d.]. Google Code CPP-Btree. https:\/\/code.google.com\/archive\/p\/cpp-btree\/source."},{"key":"e_1_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/2491245"},{"key":"e_1_2_2_10_1","volume-title":"14th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2020","author":"Dai Yifan","year":"2020","unstructured":"Yifan Dai, Yien Xu, Aishwarya Ganesan, Ramnatthan Alagappan, Brian Kroth, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau. 2020. From WiscKey to Bourbon: A Learned Index for Log-Structured Merge Trees. In 14th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2020, Virtual Event, November 4--6, 2020. USENIX Association, 155--171."},{"key":"e_1_2_2_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/3035918.3064054"},{"key":"e_1_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/3183713.3196927"},{"key":"e_1_2_2_13_1","volume-title":"Chucky: A Succinct Cuckoo Filter for LSM-Tree. In SIGMOD '21: International Conference on Management of Data","author":"Dayan Niv","year":"2021","unstructured":"Niv Dayan and Moshe Twitto. 2021. Chucky: A Succinct Cuckoo Filter for LSM-Tree. In SIGMOD '21: International Conference on Management of Data, Virtual Event, China, June 20--25, 2021. ACM, 365--378."},{"key":"e_1_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/1294261.1294281"},{"key":"e_1_2_2_15_1","volume-title":"Optimizing Space Amplification in RocksDB. In 8th Biennial Conference on Innovative Data Systems Research, CIDR 2017, Chaminade, CA, USA, January 8--11, 2017, Online Proceedings. www.cidrdb.org.","author":"Dong Siying","year":"2017","unstructured":"Siying Dong, Mark Callaghan, Leonidas Galanis, Dhruba Borthakur, Tony Savor, and Michael Strum. 2017. Optimizing Space Amplification in RocksDB. In 8th Biennial Conference on Innovative Data Systems Research, CIDR 2017, Chaminade, CA, USA, January 8--11, 2017, Online Proceedings. www.cidrdb.org."},{"key":"e_1_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/3483840"},{"key":"e_1_2_2_17_1","volume-title":"SpatialHadoop: A MapReduce Framework for Spatial Data. In 31st IEEE International Conference on Data Engineering, ICDE 2015","author":"Eldawy Ahmed","year":"2015","unstructured":"Ahmed Eldawy and Mohamed F. Mokbel. 2015. SpatialHadoop: A MapReduce Framework for Spatial Data. In 31st IEEE International Conference on Data Engineering, ICDE 2015, Seoul, South Korea, April 13--17, 2015. 1352--1363."},{"key":"e_1_2_2_18_1","unstructured":"Facebook. [n. d.]. RocksDB. https:\/\/github.com\/facebook\/rocksdb\/."},{"key":"e_1_2_2_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/2674005.2674994"},{"key":"e_1_2_2_20_1","unstructured":"Google. [n. d.]. LevelDB. https:\/\/github.com\/google\/leveldb\/."},{"key":"e_1_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.3390\/a15040113"},{"key":"e_1_2_2_22_1","unstructured":"MongoDB Inc. [n. d.]. MongoDB. https:\/\/www.mongodb.com\/."},{"key":"e_1_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.14778\/3424573.3424577"},{"key":"e_1_2_2_24_1","unstructured":"Cockroach Labs. [n. d.]. CockroachDB. https:\/\/www.cockroachlabs.com\/."},{"key":"e_1_2_2_25_1","volume-title":"SineKV: Decoupled Secondary Indexing for LSM-based Key-Value Stores. In 40th IEEE International Conference on Distributed Computing Systems, ICDCS 2020","author":"Li Fei","year":"2020","unstructured":"Fei Li, Youyou Lu, Zhe Yang, and Jiwu Shu. 2020. SineKV: Decoupled Secondary Indexing for LSM-based Key-Value Stores. In 40th IEEE International Conference on Distributed Computing Systems, ICDCS 2020, Singapore, November 29 - December 1, 2020. IEEE, 1112--1122."},{"key":"e_1_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/3033273"},{"key":"e_1_2_2_27_1","doi-asserted-by":"publisher","DOI":"10.14778\/3303753.3303759"},{"key":"e_1_2_2_28_1","first-page":"3217","article-title":"MyRocks","volume":"13","author":"Matsunobu Yoshinori","year":"2020","unstructured":"Yoshinori Matsunobu, Siying Dong, and Herman Lee. 2020. MyRocks: LSM-Tree Database Storage Engine Serving Facebook's Social Graph. Proc. VLDB Endow., Vol. 13, 12 (2020), 3217--3230.","journal-title":"LSM-Tree Database Storage Engine Serving Facebook's Social Graph. Proc. VLDB Endow."},{"key":"e_1_2_2_29_1","unstructured":"MongoDB. [n. d.]. MongoDB Manual. https:\/\/www.mongodb.com\/docs\/manual\/."},{"key":"e_1_2_2_30_1","unstructured":"Gero Mueller. [n. d.]. R-Trees: A Dynamic Index Structure for Spatial Searching. http:\/\/www.superliminal.com\/sources\/sources.htm."},{"key":"e_1_2_2_31_1","doi-asserted-by":"publisher","DOI":"10.1007\/s002360050048"},{"key":"e_1_2_2_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/1498698.1594230"},{"key":"e_1_2_2_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/3183713.3196900"},{"key":"e_1_2_2_34_1","doi-asserted-by":"publisher","DOI":"10.14778\/3151106.3151108"},{"key":"e_1_2_2_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/3514221.3522563"},{"key":"e_1_2_2_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3389757"},{"key":"e_1_2_2_37_1","volume-title":"Constructing and Analyzing the LSM Compaction Design Space (Updated Version). CoRR","author":"Sarkar Subhadeep","year":"2022","unstructured":"Subhadeep Sarkar, Dimitris Staratzis, Zichen Zhu, and Manos Athanassoulis. 2022. Constructing and Analyzing the LSM Compaction Design Space (Updated Version). CoRR, Vol. abs\/2202.04522 (2022)."},{"key":"e_1_2_2_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/2213836.2213862"},{"key":"e_1_2_2_39_1","article-title":"Review - The R*-Tree: An Efficient and Robust Access Method for Points and Rectangles. ACM SIGMOD","volume":"2","author":"Sellis Timos K.","year":"2000","unstructured":"Timos K. Sellis. 2000. Review - The R*-Tree: An Efficient and Robust Access Method for Points and Rectangles. ACM SIGMOD Digit. Rev., Vol. 2 (2000).","journal-title":"Digit. Rev."},{"key":"e_1_2_2_40_1","volume-title":"The LSM RUM-Tree: A Log Structured Merge R-Tree for Update-intensive Spatial Workloads. In 37th IEEE International Conference on Data Engineering, ICDE 2021","author":"Shin Jaewoo","year":"2021","unstructured":"Jaewoo Shin, Jianguo Wang, and Walid G. Aref. 2021. The LSM RUM-Tree: A Log Structured Merge R-Tree for Update-intensive Spatial Workloads. In 37th IEEE International Conference on Data Engineering, ICDE 2021, Chania, Greece, April 19--22, 2021. IEEE, 2285--2290."},{"key":"e_1_2_2_41_1","volume-title":"Aref","author":"Shin Jaewoo","year":"2023","unstructured":"Jaewoo Shin, Jianguo Wang, and Walid G. Aref. 2023. An Update-intensive LSM-based R-tree Index. CoRR, Vol. abs\/2305.01087 (2023)."},{"key":"e_1_2_2_42_1","volume-title":"LSbM-tree: Re-Enabling Buffer Caching in Data Management for Mixed Reads and Writes. In 37th IEEE International Conference on Distributed Computing Systems, ICDCS 2017","author":"Teng Dejun","year":"2017","unstructured":"Dejun Teng, Lei Guo, Rubao Lee, Feng Chen, Siyuan Ma, Yanfeng Zhang, and Xiaodong Zhang. 2017. LSbM-tree: Re-Enabling Buffer Caching in Data Management for Mixed Reads and Writes. In 37th IEEE International Conference on Distributed Computing Systems, ICDCS 2017, Atlanta, GA, USA, June 5--8, 2017. IEEE Computer Society, 68--79."},{"key":"e_1_2_2_43_1","unstructured":"UC-Reverside-DatabaseLab. [n. d.]. Chirp: A Twitter-like Workload Generator. http:\/\/www.cs.ucr.edu\/ ameno002\/benchmark\/."},{"key":"e_1_2_2_44_1","volume-title":"Revisiting Secondary Indexing in LSM-based Storage Systems with Persistent Memory. In 2023 USENIX Annual Technical Conference, USENIX ATC 2023","author":"Wang Jing","year":"2023","unstructured":"Jing Wang, Youyou Lu, Qing Wang, Yuhao Zhang, and Jiwu Shu. 2023. Revisiting Secondary Indexing in LSM-based Storage Systems with Persistent Memory. In 2023 USENIX Annual Technical Conference, USENIX ATC 2023, Boston, MA, USA, July 10--12, 2023. USENIX Association, 817--832."}],"container-title":["Proceedings of the ACM on Management of Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3725330","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,31]],"date-time":"2026-03-31T18:51:42Z","timestamp":1774983102000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3725330"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,17]]},"references-count":44,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2025,6,17]]}},"alternative-id":["10.1145\/3725330"],"URL":"https:\/\/doi.org\/10.1145\/3725330","relation":{},"ISSN":["2836-6573"],"issn-type":[{"value":"2836-6573","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,6,17]]}}}