{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,14]],"date-time":"2025-11-14T15:48:21Z","timestamp":1763135301602,"version":"3.45.0"},"reference-count":28,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2025,11,14]],"date-time":"2025-11-14T00:00:00Z","timestamp":1763078400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,11,14]],"date-time":"2025-11-14T00:00:00Z","timestamp":1763078400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100000145","name":"Division of Information and Intelligent Systems","doi-asserted-by":"publisher","award":["1838222","1838222"],"award-info":[{"award-number":["1838222","1838222"]}],"id":[{"id":"10.13039\/100000145","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Big Data"],"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>\n                    The efficiency of LSM-tree-based systems is critically dependent on the chosen compaction strategy, broadly categorized into leveled and stack-based strategies. Leveled compaction is beneficial due to its ability to break down larger compactions into more manageable, smaller sub-compactions via partitioning, thereby enhancing parallelism, reducing write stalling, and increasing disk utilization. Particularly for sequential insertions, it enables the transfer of whole files to lower levels without necessitating rewrites, referred to as trivial moves, a capability typically absent in stack-based strategies. These strategies often resort to no or simplistic partitioning techniques, limiting parallelism and the feasibility of trivial moves. This work introduces a hybrid strategy aiming to integrate the best features of both compaction strategies. We introduce two innovative coordinated partitioning algorithms,\n                    <jats:italic>Local-Range<\/jats:italic>\n                    and\n                    <jats:italic>Global-Range<\/jats:italic>\n                    , designed to apply to any stack-based strategy. These algorithms aim to boost parallelism in compactions and facilitate trivial moves, thus reducing the overall compaction costs. By extending RocksDB to incorporate partitioning for stack-based strategies and conducting a comparative analysis with various baselines across different workloads, we demonstrate that the Global-Range partitioning notably improves compaction performance with minimal added overhead.\n                  <\/jats:p>","DOI":"10.1186\/s40537-025-01298-0","type":"journal-article","created":{"date-parts":[[2025,11,14]],"date-time":"2025-11-14T15:44:59Z","timestamp":1763135099000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Enhancing LSM trees merge efficiency via coordinated sorted runs partitioning"],"prefix":"10.1186","volume":"12","author":[{"given":"Qizhong","family":"Mao","sequence":"first","affiliation":[]},{"given":"Vagelis","family":"Hristidis","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,11,14]]},"reference":[{"key":"1298_CR1","unstructured":"Apache: AsterixDB (2022).\u00a0https:\/\/asterixdb.apache.org."},{"key":"1298_CR2","unstructured":"Google: Bigtable (2022). https:\/\/cloud.google.com\/bigtable"},{"key":"1298_CR3","unstructured":"Apache: Cassandra (2022). http:\/\/cassandra.apache.org"},{"key":"1298_CR4","unstructured":"Apache: HBase (2022). https:\/\/hbase.apache.org"},{"issue":"3","key":"1298_CR5","doi-asserted-by":"publisher","first-page":"361","DOI":"10.1007\/s00778-020-00638-1","volume":"30","author":"Q Mao","year":"2021","unstructured":"Mao Q, Jacobs S, Amjad W, Hristidis V, Tsotras VJ, Young NE. Comparison and evaluation of state-of-the-art LSM merge policies. VLDB J. 2021;30(3):361\u201378.","journal-title":"VLDB J"},{"key":"1298_CR6","unstructured":"LevelDB (2019). https:\/\/github.com\/google\/leveldb"},{"key":"1298_CR7","unstructured":"Meta: RocksDB (2022). https:\/\/rocksdb.org"},{"key":"1298_CR8","unstructured":"Mao Q. RocksDB (Partitioned Stack-based Compaction) (2022). https:\/\/github.com\/autopear\/rocksdb\/tree\/pstack"},{"issue":"11","key":"1298_CR9","doi-asserted-by":"publisher","first-page":"3071","DOI":"10.14778\/3551793.3551853","volume":"15","author":"N Dayan","year":"2022","unstructured":"Dayan N, Weiss T, Dashevsky S, Pan M, Bortnikov E, Twitto M. Spooky: Granulating LSM-tree compactions correctly. Proc VLDB Endow. 2022;15(11):3071\u201384.","journal-title":"Proc VLDB Endow"},{"key":"1298_CR10","unstructured":"Meta: RocksDB Wiki (2022). https:\/\/github.com\/facebook\/rocksdb\/wiki"},{"issue":"9","key":"1298_CR11","doi-asserted-by":"publisher","first-page":"509","DOI":"10.1145\/361002.361007","volume":"18","author":"JL Bentley","year":"1975","unstructured":"Bentley JL. Multidimensional binary search trees used for associative searching. Commun ACM. 1975;18(9):509\u201317.","journal-title":"Commun ACM"},{"key":"1298_CR12","unstructured":"Sp\u00f6cker G. streamhist-cpp (2021). https:\/\/github.com\/guntersp\/streamhist-cpp"},{"key":"1298_CR13","unstructured":"Ben-Haim Y, Tom-Tov E. A streaming parallel decision tree algorithm. J Mach Learn Res 2010;11(2)\u00a0"},{"issue":"4","key":"1298_CR14","doi-asserted-by":"publisher","first-page":"417","DOI":"10.1007\/BF01302964","volume":"14","author":"M Ajtai","year":"1994","unstructured":"Ajtai M. The complexity of the pigeonhole principle. Combinatorica. 1994;14(4):417\u201333.","journal-title":"Combinatorica"},{"key":"1298_CR15","unstructured":"Meta: RocksDB Wiki: Performance Benchmarks (2022). https:\/\/github.com\/facebook\/rocksdb\/wiki\/Performance-Benchmarks"},{"issue":"4","key":"1298_CR16","doi-asserted-by":"publisher","first-page":"417","DOI":"10.1007\/s00778-005-0171-7","volume":"16","author":"C Jermaine","year":"2007","unstructured":"Jermaine C, Omiecinski E, Yee WG. The partitioned exponential file for database storage management. VLDB J. 2007;16(4):417\u201337.","journal-title":"VLDB J"},{"key":"1298_CR17","doi-asserted-by":"crossref","unstructured":"Zhang W, Xu Y, Li Y, Li D. Improving write performance of LSMT-based key-value store. In: 2016 IEEE 22nd Int Conf Para Distrib Syst (ICPADS), 2016. pp. 553\u2013560. IEEE","DOI":"10.1109\/ICPADS.2016.0079"},{"key":"1298_CR18","doi-asserted-by":"crossref","unstructured":"Mei F, Cao Q, Jiang H, Li J. SifrDB: A unified solution for write-optimized key-value stores in large datacenter. In: Proc of the ACM Symp on Cloud Comput, 2018. pp. 477\u2013489.","DOI":"10.1145\/3267809.3267829"},{"issue":"4","key":"1298_CR19","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3139922","volume":"13","author":"T Yao","year":"2017","unstructured":"Yao T, Wan J, Huang P, He X, Wu F, Xie C. Building efficient key-value stores via a lightweight compaction tree. ACM Trans Storage. 2017;13(4):1\u201328.","journal-title":"ACM Trans Storage"},{"key":"1298_CR20","doi-asserted-by":"crossref","unstructured":"Yao T, Wan J, Huang P, He X, Gui Q, Wu F, Xie C. A light-weight compaction tree to reduce I\/O amplification toward efficient key-value stores. In: Proc 33rd Int Conf Massive Storage Syst Technol (MSST), 2017. pp. 1\u201313.","DOI":"10.1145\/3139922"},{"key":"1298_CR21","doi-asserted-by":"crossref","unstructured":"Raju P, Kadekodi R, Chidambaram V, Abraham I. PebblesDB: Building key-value stores using fragmented log-structured merge trees. In: Proc 26th Symp Oper Syst Prin, 2017. pp. 497\u2013514.","DOI":"10.1145\/3132747.3132765"},{"key":"1298_CR22","volume-title":"Design of a write-optimized data store","author":"H Amur","year":"2013","unstructured":"Amur H, Andersen DG, Kaminsky M, Schwan K. Design of a write-optimized data store. Georgia Institute of Technology: Technical report; 2013."},{"key":"1298_CR23","unstructured":"Wu X, Xu Y, Shao Z, Jiang S. LSM-trie: An LSM-tree-based ultra-large key-value store for small data items. In: 2015 USENIX Annu Tech Conf (USENIX ATC 15), 2015. pp. 71\u201382."},{"issue":"3","key":"1298_CR24","first-page":"1","volume":"15","author":"Y Li","year":"2019","unstructured":"Li Y, Chan HH, Lee PP, Xu Y. Enabling efficient updates in KV storage via hashing: design and performance evaluation. ACM Trans Storage. 2019;15(3):1\u201329.","journal-title":"ACM Trans Storage"},{"key":"1298_CR25","doi-asserted-by":"crossref","unstructured":"Mao Q, Hristidis V. Increase merge efficiency in lsm trees through coordinated partitioning of sorted runs. In: 2023 IEEE Inter Conf on Big Data (BigData), 2023. pp. 530\u2013535. IEEE","DOI":"10.1109\/BigData59044.2023.10386222"},{"key":"1298_CR26","unstructured":"Zhong W, Chen C, Wu X, Jiang S. REMIX: Efficient range query for LSM-trees. In: 19th USENIX Conf on File and Storage Tech (FAST 21), 2021. pp. 51\u201364."},{"key":"1298_CR27","doi-asserted-by":"crossref","unstructured":"Zhang H, Lim H, Leis V, Andersen DG, Kaminsky M, Keeton K, Pavlo A. SuRF: Practical range query filtering with fast succinct tries. In: Proceedings of the 2018 Inter Conf Mgmt of Data, 2018. pp. 323\u2013336.\u00a0","DOI":"10.1145\/3183713.3196931"},{"key":"1298_CR28","doi-asserted-by":"crossref","unstructured":"Luo S, Chatterjee S, Ketsetsidis R, Dayan N, Qin, W, Idreos S. Rosetta: A robust space-time optimized range filter for key-value stores. In: Proc 2020 ACM SIGMOD Inter Conf Mgmt of Data, 2020. pp. 2071\u20132086.","DOI":"10.1145\/3318464.3389731"}],"container-title":["Journal of Big Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s40537-025-01298-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s40537-025-01298-0\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s40537-025-01298-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,11,14]],"date-time":"2025-11-14T15:45:06Z","timestamp":1763135106000},"score":1,"resource":{"primary":{"URL":"https:\/\/journalofbigdata.springeropen.com\/articles\/10.1186\/s40537-025-01298-0"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,11,14]]},"references-count":28,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2025,12]]}},"alternative-id":["1298"],"URL":"https:\/\/doi.org\/10.1186\/s40537-025-01298-0","relation":{},"ISSN":["2196-1115"],"issn-type":[{"value":"2196-1115","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,11,14]]},"assertion":[{"value":"27 March 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"8 October 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"14 November 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Not applicable","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare no competing interests.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"252"}}