{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T01:42:27Z","timestamp":1760060547922,"version":"build-2065373602"},"reference-count":44,"publisher":"MDPI AG","issue":"9","license":[{"start":{"date-parts":[[2025,8,30]],"date-time":"2025-08-30T00:00:00Z","timestamp":1756512000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"National Key Research and Development Plan of China","award":["2023YFB4502704"],"award-info":[{"award-number":["2023YFB4502704"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Future Internet"],"abstract":"<jats:p>Optimizing metadata indexing remains critical for enhancing distributed file system performance. The Traditional Log-Structured Merge-Trees (LSM-Trees) architecture, while effective for write-intensive operations, exhibits significant limitations when handling massive metadata workloads, particularly manifesting as suboptimal read performance and substantial indexing overhead. Although existing learned indexes perform well on read-only workloads, they struggle to support modifications such as inserts and updates effectively. This paper proposes SwiftKV, a novel metadata indexing scheme that combines LSM-Tree and learned indexes to address these issues. Firstly, SwiftKV employs a dynamic partition strategy to narrow the metadata search range. Secondly, a two-level learned index block, consisting of Greedy Piecewise Linear Regression (Greedy-PLR) and Linear Regression (LR) models, is leveraged to replace the typical Sorted String Table (SSTable) index block for faster location prediction than binary search. Thirdly, SwiftKV incorporates a load-aware construction mechanism and parallel optimization to minimize training overhead and enhance efficiency. 
This work bridges the gap between LSM-Trees\u2019 write efficiency and learned indexes\u2019 query performance, offering a scalable and high-performance solution for modern distributed file systems. This paper implements the prototype of SwiftKV based on RocksDB. The experimental results show that it narrows the memory usage of index blocks by 30.06% and reduces read latency by 1.19\u00d7~1.60\u00d7 without affecting write performance. Furthermore, SwiftKV\u2019s two-level learned index achieves a 15.13% reduction in query latency and a 44.03% reduction in memory overhead compared to a single-level model. For all YCSB workloads, SwiftKV outperforms other schemes.<\/jats:p>","DOI":"10.3390\/fi17090398","type":"journal-article","created":{"date-parts":[[2025,9,2]],"date-time":"2025-09-02T08:23:38Z","timestamp":1756801418000},"page":"398","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["SwiftKV: A Metadata Indexing Scheme Integrating LSM-Tree and Learned Index for Distributed KV Stores"],"prefix":"10.3390","volume":"17","author":[{"given":"Zhenfei","family":"Wang","sequence":"first","affiliation":[{"name":"School of Computer Science and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-4251-8441","authenticated-orcid":false,"given":"Jianxun","family":"Feng","sequence":"additional","affiliation":[{"name":"School of Computer Science and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Longxiang","family":"Dun","sequence":"additional","affiliation":[{"name":"School of Computer Science and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, 
China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ziliang","family":"Bao","sequence":"additional","affiliation":[{"name":"School of Computer Science and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chunfeng","family":"Du","sequence":"additional","affiliation":[{"name":"School of Computer Science and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2025,8,30]]},"reference":[{"key":"ref_1","unstructured":"Leung, A.W., Shao, M., Bisson, T., Pasupathy, S., and Miller, E.L. (2009, January 24\u201327). Spyglass: Fast, Scalable Metadata Search for Large-Scale Storage Systems. Proceedings of the 7th USENIX Conference on File and Storage Technologies (FAST\u201909), San Francisco, CA, USA."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"3850","DOI":"10.1109\/TPDS.2022.3170574","article-title":"The State of the Art of Metadata Managements in Large-Scale Distributed File Systems\u2014Scalability, Performance and Availability","volume":"33","author":"Dai","year":"2022","journal-title":"IEEE Trans. Parallel Distrib. Syst."},{"key":"ref_3","first-page":"1","article-title":"End-to-end I\/O Monitoring on Leading Supercomputers","volume":"19","author":"Yang","year":"2023","journal-title":"ACM Trans. Storage"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"2840","DOI":"10.1109\/TNET.2023.3266400","article-title":"An Adaptive Metadata Management Scheme Based on Deep Reinforcement Learning for Large-Scale Distributed File Systems","volume":"31","author":"Huang","year":"2023","journal-title":"IEEE\/ACM Trans. Netw."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Jiao, Y., Bertron, S., Patel, S., Zeller, L., Bennett, R., Mukherjee, N., Bender, M.A., Condict, M., Conway, A., and Farach-Colton, M. 
(2022, January 5\u20138). BetrFS: A Compleat File System for Commodity SSDs. Proceedings of the 7th European Conference on Computer Systems (EuroSys\u201922), Rennes, France.","DOI":"10.1145\/3492321.3519571"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Ren, K., Zheng, Q., Patil, S., and Gibson, G. (2014, January 16\u201321). IndexFS: Scaling File System Metadata Performance with Stateless Caching and Bulk Insertion. Proceedings of the 14th International Conference for High Performance Computing, Networking, Storage and Analysis (SC\u201914), New Orleans, LA, USA.","DOI":"10.1109\/SC.2014.25"},{"key":"ref_7","unstructured":"Ren, K., and Gibson, G. (November, January 31). TABLEFS: Embedding a NoSQL Database Inside the Local File System. Proceedings of the 2012 Asia-Pacific Magnetic Recording Conference (APMRC\u201912), Singapore."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Mei, F., Cao, Q., Jiang, H., and Tintri, L.T. (2017, January 25\u201327). LSM-tree Managed Storage for Large-Scale Key-Value Store. Proceedings of the 2017 Symposium on Cloud Computing (SOCC\u201917), Santa Clara, CA, USA.","DOI":"10.1145\/3127479.3127486"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/1365815.1365816","article-title":"Bigtable: A Distributed Storage System for Structured Data","volume":"26","author":"Chang","year":"2008","journal-title":"ACM Trans. Comput. Syst."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Wang, L., Ding, G., Zhao, Y., Wu, D., and He, C. (2017, January 18\u201320). Optimization of LevelDB by Separating Key and Value. 
Proceedings of the 18th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT\u201917), Taipei, Taiwan.","DOI":"10.1109\/PDCAT.2017.00074"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3483840","article-title":"Rocksdb: Evolution of Development Priorities in a KV Store Serving Large-Scale Applications","volume":"17","author":"Dong","year":"2021","journal-title":"ACM Trans. Storage"},{"key":"ref_12","unstructured":"Vora, M.N. (2011, January 24\u201326). Hadoop-HBase for Large-Scale Data. Proceedings of the 2011 International Conference on Computer Science and Network Technology (ICCSNT\u201911), Harbin, China."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"35","DOI":"10.1145\/1773912.1773922","article-title":"Cassandra: A Decentralized Structured Storage System","volume":"44","author":"Lakshman","year":"2010","journal-title":"ACM SIGOPS Oper. Syst. Rev."},{"key":"ref_14","unstructured":"Wu, F., Yang, M.H., Zhang, B., and Du, D.H. (2020, January 15\u201317). AC-Key: Adaptive Caching for LSM-based Key-Value Stores. Proceedings of the 2020 USENIX Annual Technical Conference (USENIX ATC\u201920), Online."},{"key":"ref_15","unstructured":"Zhong, W., Chen, C., Wu, X., and Jiang, S. (2021, January 23\u201325). REMIX: Efficient Range Query for LSM-trees. Proceedings of the 19th USENIX Conference on File and Storage Technologies (FAST\u201921), Online."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Kraska, T., Beutel, A., Chi, E.H., Dean, J., and Polyzotis, N. (2018, January 10\u201315). The Case for Learned Index Structures. Proceedings of the 2018 International Conference on Management of Data (SIGMOD\u201918), Portland, OR, USA.","DOI":"10.1145\/3183713.3196909"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Ding, J., Minhas, U.F., Yu, J., Wang, C., Do, J., Li, Y., Zhang, H., Chandramouli, B., Gehrke, J., and Kossmann, D. (2020, January 14\u201319). 
ALEX: An Updatable Adaptive Learned Index. Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data (SIGMOD\u201920), Portland, OR, USA.","DOI":"10.1145\/3318464.3389711"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"1162","DOI":"10.14778\/3389133.3389135","article-title":"The PGM-index: A Fully-Dynamic Compressed Learned Index with Provable Worst-Case Bounds","volume":"13","author":"Ferragina","year":"2020","journal-title":"Proc. VLDB Endow."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Tang, C., Wang, Y., Dong, Z., Hu, G., Wang, Z., Wang, M., and Chen, H. (2020, January 22\u201326). XIndex: A Scalable Learned Index for Multicore Data Storage. Proceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP\u201920), San Diego, CA, USA.","DOI":"10.1145\/3332466.3374547"},{"key":"ref_20","unstructured":"Dai, Y., Xu, Y., Ganesan, A., Alagappan, R., Kroth, B., Arpaci-Dusseau, A., and Arpaci-Dusseau, R. (2020, January 4\u20136). From WiscKey to Bourbon: A Learned Index for Log-Structured Merge Trees. Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI\u201920), Virtual Event."},{"key":"ref_21","unstructured":"Abu-Libdeh, H., Alt\u0131nb\u00fcken, D., Beutel, A., Chi, E.H., Doshi, L., Kraska, T., (Steve)Li, X., Andy Ly, A., and Olston, C. (2020). Learned Indexes for a Google-Scale Disk-Based Database. arXiv."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3465405","article-title":"Survey of Distributed File System Design Choices","volume":"18","author":"Macko","year":"2022","journal-title":"ACM Trans. Storage"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"062015","DOI":"10.1088\/1742-6596\/898\/6\/062015","article-title":"CephFS: A New Generation Storage Platform for Australian High Energy Physics","volume":"898","author":"Borges","year":"2017","journal-title":"J. Phys. Conf. 
Ser."},{"key":"ref_24","unstructured":"Karun, A.K., and Chitharanjan, K. (2013, January 11\u201312). A Review on Hadoop\u2014HDFS InfraStructure Extensions. Proceedings of the 2013 IEEE Conference on Information & Communication Technologies (ICT\u201913), Thuckalay, India."},{"key":"ref_25","unstructured":"Yang, X., Liu, Q., Yin, B., Zhang, Q., Zhou, D., and Wei, X. (2017, January 19\u201321). kd Tree Construction Designed for Motion Blur. Proceedings of the 28th Eurographics Symposium on Rendering: Experimental Ideas & Implementations (EGSR\u201917), Helsinki, Finland."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Hadjieleftheriou, M., Manolopoulos, Y., Theodoridis, Y., and Tsotras, V.J. (2017). R-trees: A Dynamic Index Structure for Spatial Searching. Encyclopedia of GIS, Springer.","DOI":"10.1007\/978-3-319-17885-1_1151"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Hua, Y., Jiang, H., Zhu, Y., Feng, D., and Tian, L. (2009, January 14\u201320). SmartStore: A New Metadata Organization Paradigm with Semantic-Awareness for Next-Generation File Systems. Proceedings of the 2009 Conference on High Performance Computing Networking, Storage and Analysis (SC\u201909), Portland, OR, USA.","DOI":"10.1145\/1654059.1654070"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Manno, D., Lee, J., Challa, P., Zheng, Q., Bonnie, D., Grider, G., and Settlemyer, B. (2022, January 13\u201318). Gufi: Fast, Secure File System Metadata Search for Both Privileged and Unprivileged Users. Proceedings of the 2022 International Conference for High Performance Computing, Networking, Storage and Analysis (SC\u201922), Dallas, TX, USA.","DOI":"10.1109\/SC41404.2022.00062"},{"key":"ref_29","unstructured":"Leibovici, T. (2015). Taking Back Control of HPC File Systems with Robinhood Policy Engine. arXiv."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Li, S., Lu, Y., Shu, J., Hu, Y., and Li, T. (2017, January 12\u201317). 
Locofs: A Loosely-Coupled Metadata Service for Distributed File Systems. Proceedings of the 2017 International Conference for High Performance Computing, Networking, Storage and Analysis (SC\u201917), Denver, CO, USA.","DOI":"10.1145\/3126908.3126928"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Roh, H., Park, S., Kim, S., Shin, M., and Lee, S.W. (2011). B+-tree Index Optimization by Exploiting Internal Parallelism of Flash-based Solid State Drives. arXiv.","DOI":"10.14778\/2095686.2095688"},{"key":"ref_32","unstructured":"Zuo, P., Hua, Y., and Wu, J. (2018, January 8\u201310). Write-Optimized and High-Performance Hashing index scheme for persistent memory. Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI\u201918), Carlsbad, CA, USA."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"108","DOI":"10.1016\/j.isprsjprs.2019.08.006","article-title":"An Enhanced Bloom Index for Quantifying Floral Phenology Using Multi-Scale Remote Sensing Observations","volume":"156","author":"Chen","year":"2019","journal-title":"J. Photogramm. Remote Sens."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"843","DOI":"10.1109\/TPDS.2022.3232382","article-title":"PetaKV: Building Efficient Key-Value Store for File System Metadata on Persistent Memory","volume":"34","author":"Zhang","year":"2022","journal-title":"IEEE Trans. Parallel Distrib. Syst."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"1696","DOI":"10.14778\/3401960.3401967","article-title":"Sharing Opportunities for OLTP Workloads in Different Isolation Levels","volume":"13","author":"Rehrmann","year":"2020","journal-title":"Proc. VLDB Endow."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Mitra, S., Winslett, M., and Hsu, W. (2008, January 9\u201312). Query-Based Partitioning of Documents and Indexes for Information Lifecycle Management. 
Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data (SIGMOD\u201908), Vancouver, BC, Canada.","DOI":"10.1145\/1376616.1376680"},{"key":"ref_37","unstructured":"Keogh, E., Chu, S., Hart, D., and Pazzani, M. (December, January 29). An Online Algorithm for Segmenting Time Series. Proceedings of the 2001 IEEE International Conference on Data Mining (ICDM\u201901), San Jose, CA, USA."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Fedorova, A., Mustard, C., Beschastnikh, I., Rubin, J., Wong, A., Miucin, S., and Ye, L. (2018, January 4\u20139). Performance Comprehension at WiredTiger. Proceedings of the 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC\/FSE\u201918), Lake Buena Vista, FL, USA.","DOI":"10.1145\/3236024.3236081"},{"key":"ref_39","unstructured":"Chan, H., Li, Y., Lee, P., and Xu, Y. (2018, January 11\u201313). HashKV: Enabling Efficient Updates in KV Storage via Hashing. Proceedings of the 2018 USENIX Annual Technical Conference (ATC\u201918), Boston, MA, USA."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/2701415","article-title":"Verification of a Cryptographic Primitive: SHA-256","volume":"37","author":"Appel","year":"2015","journal-title":"ACM Trans. Program. Lang. Syst."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Cooper, B.F., Silberstein, A., Tam, E., Ramakrishnan, R., and Sears, R. (2010, January 10\u201311). Benchmarking Cloud Serving Systems with YCSB. Proceedings of the 1st ACM Symposium on Cloud Computing (SOCC\u201910), Indianapolis, IN, USA.","DOI":"10.1145\/1807128.1807152"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Wu, J., Zhang, Y., Chen, S., Wang, J., Chen, Y., and Xing, C. (2021). Updatable Learned Index with Precise Positions. 
arXiv.","DOI":"10.14778\/3457390.3457393"},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"243","DOI":"10.14778\/3565816.3565826","article-title":"PLIN: A persistent learned index for non-volatile memory with high performance and instant recovery","volume":"16","author":"Zhang","year":"2022","journal-title":"Proc. VLDB Endow."},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Sun, J., Li, S., Sun, Y., Sun, C., Vucinic, D., and Huang, J. (2023, January 25\u201329). LeaFTL: A Learning-Based Flash Translation Layer for Solid-State Drives. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS\u201923), Vancouver, BC, Canada.","DOI":"10.1145\/3575693.3575744"}],"container-title":["Future Internet"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-5903\/17\/9\/398\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T18:36:14Z","timestamp":1760034974000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-5903\/17\/9\/398"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,8,30]]},"references-count":44,"journal-issue":{"issue":"9","published-online":{"date-parts":[[2025,9]]}},"alternative-id":["fi17090398"],"URL":"https:\/\/doi.org\/10.3390\/fi17090398","relation":{},"ISSN":["1999-5903"],"issn-type":[{"type":"electronic","value":"1999-5903"}],"subject":[],"published":{"date-parts":[[2025,8,30]]}}}