{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,19]],"date-time":"2026-05-19T07:14:41Z","timestamp":1779174881491,"version":"3.51.4"},"reference-count":53,"publisher":"Association for Computing Machinery (ACM)","issue":"9","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2025,5]]},"abstract":"<jats:p>\n            Learned indexes have attracted significant research interest due to their potential to offer better space-time trade-offs compared to B+-tree variants. Among various learned indexes, the PGM-Index based on error-bounded piecewise linear approximation is an elegant data structure that has demonstrated\n            <jats:italic toggle=\"yes\">provably<\/jats:italic>\n            superior performance over conventional B+-tree indexes. However, despite numerous efforts to optimize the design of the PGM-Index, few systematically study the root causes of performance mismatches observed in practice. In this paper, we explore two key research questions.\n            <jats:bold>Q1<\/jats:bold>\n            :\n            <jats:italic toggle=\"yes\">Why are PGM-Indexes theoretically effective?<\/jats:italic>\n            and\n            <jats:bold>Q2<\/jats:bold>\n            :\n            <jats:italic toggle=\"yes\">Why do PGM-Indexes underperform in practice?<\/jats:italic>\n            For\n            <jats:bold>Q1<\/jats:bold>\n            , we show that for a set of\n            <jats:italic toggle=\"yes\">N<\/jats:italic>\n            sorted keys, the PGM-Index can achieve a lookup time of\n            <jats:italic toggle=\"yes\">O<\/jats:italic>\n            (log log\n            <jats:italic toggle=\"yes\">N<\/jats:italic>\n            ) while using\n            <jats:italic toggle=\"yes\">O<\/jats:italic>\n            (\n            <jats:italic toggle=\"yes\">N<\/jats:italic>\n            ) space. For\n            <jats:bold>Q2<\/jats:bold>\n            , we identify that querying PGM-Indexes is highly memory-bound, where the internal index search operations often become the bottleneck. To fill the performance gap, we propose PGM++, a\n            <jats:italic toggle=\"yes\">simple yet effective<\/jats:italic>\n            extension to the original PGM-Index that employs a mixture of different search strategies, with hyper-parameters automatically tuned through a cost model calibrated by theoretical findings. Extensive experiments show that, at comparable space costs, PGM++ speeds up index lookup queries by up to 2.31X and 1.56X when compared to the original PGM-Index and SOTA baselines.\n          <\/jats:p>","DOI":"10.14778\/3746405.3746415","type":"journal-article","created":{"date-parts":[[2025,9,3]],"date-time":"2025-09-03T17:06:20Z","timestamp":1756919180000},"page":"2886-2898","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["Why Are Learned Indexes So Effective but Sometimes Ineffective?"],"prefix":"10.14778","volume":"18","author":[{"given":"Qiyu","family":"Liu","sequence":"first","affiliation":[{"name":"Southwest University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Siyuan","family":"Han","sequence":"additional","affiliation":[{"name":"HKUST"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yanlin","family":"Qi","sequence":"additional","affiliation":[{"name":"HIT Shenzhen"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jingshu","family":"Peng","sequence":"additional","affiliation":[{"name":"ByteDance"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jin","family":"Li","sequence":"additional","affiliation":[{"name":"Harvard University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Longlong","family":"Lin","sequence":"additional","affiliation":[{"name":"Southwest University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Lei","family":"Chen","sequence":"additional","affiliation":[{"name":"HKUST &amp; HKUST (GZ)"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,9,3]]},"reference":[{"key":"e_1_2_1_1_1","article-title":"A Kernel Multiple Change-point Algorithm via Model Selection","volume":"20","author":"Arlot Sylvain","year":"2019","unstructured":"Sylvain Arlot, Alain Celisse, and Za\u00efd Harchaoui. 2019. A Kernel Multiple Change-point Algorithm via Model Selection. J. Mach. Learn. Res. 20 (2019), 162:1\u2013162:56.","journal-title":"J. Mach. Learn. Res."},{"key":"e_1_2_1_2_1","unstructured":"biglittle [n.d.]. big.LITTLE: Balancing Power Efficiency and Performance. https:\/\/www.arm.com\/en\/technologies\/big-little. Accessed: 2024-06-12."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/3183713.3196896"},{"key":"e_1_2_1_4_1","volume-title":"A course in probability theory","author":"Chung Kai Lai","unstructured":"Kai Lai Chung. 2000. A course in probability theory. Elsevier."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/356770.356776"},{"key":"e_1_2_1_6_1","volume-title":"Introduction to algorithms","author":"Cormen Thomas H","unstructured":"Thomas H Cormen, Charles E Leiserson, Ronald L Rivest, and Clifford Stein. 2022. Introduction to algorithms. MIT press."},{"key":"e_1_2_1_7_1","volume-title":"Arpaci-Dusseau","author":"Dai Yifan","year":"2020","unstructured":"Yifan Dai, Yien Xu, Aishwarya Ganesan, Ramnatthan Alagappan, Brian Kroth, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau. 2020. From WiscKey to Bourbon: A Learned Index for Log-Structured Merge Trees. In OSDI. USENIX Association, 155\u2013171."},{"key":"e_1_2_1_8_1","volume-title":"ALEX: An Updatable Adaptive Learned Index. In SIGMOD Conference. ACM, 969\u2013984","author":"Ding Jialin","year":"2020","unstructured":"Jialin Ding, Umar Farooq Minhas, Jia Yu, Chi Wang, Jaeyoung Do, Yinan Li, Hantian Zhang, Badrish Chandramouli, Johannes Gehrke, Donald Kossmann, David B. Lomet, and Tim Kraska. 2020. ALEX: An Updatable Adaptive Learned Index. In SIGMOD Conference. ACM, 969\u2013984."},{"key":"e_1_2_1_9_1","volume-title":"ICML (Proceedings of Machine Learning Research)","volume":"119","author":"Ferragina Paolo","year":"2020","unstructured":"Paolo Ferragina, Fabrizio Lillo, and Giorgio Vinciguerra. 2020. Why Are Learned Indexes So Effective?. In ICML (Proceedings of Machine Learning Research), Vol. 119. PMLR, 3123\u20133132."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.14778\/3389133.3389135"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/3299869.3319860"},{"key":"e_1_2_1_12_1","volume-title":"Computer architecture: a quantitative approach","author":"Hennessy John L","unstructured":"John L Hennessy and David A Patterson. 2011. Computer architecture: a quantitative approach. Elsevier."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/356924.356928"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/1807167.1807206"},{"key":"e_1_2_1_15_1","first-page":"1","article-title":"RadixSpline: a single-pass learned index. In aiDM@SIGMOD","volume":"5","author":"Kipf Andreas","year":"2020","unstructured":"Andreas Kipf, Ryan Marcus, Alexander van Renen, Mihail Stoian, Alfons Kemper, Tim Kraska, and Thomas Neumann. 2020. RadixSpline: a single-pass learned index. In aiDM@SIGMOD. ACM, 5:1\u20135:5.","journal-title":"ACM"},{"key":"e_1_2_1_16_1","volume-title":"The Case for Learned Index Structures. In SIGMOD Conference. ACM, 489\u2013504","author":"Kraska Tim","year":"2018","unstructured":"Tim Kraska, Alex Beutel, Ed H. Chi, Jeffrey Dean, and Neoklis Polyzotis. 2018. The Case for Learned Index Structures. In SIGMOD Conference. ACM, 489\u2013504."},{"key":"e_1_2_1_17_1","volume-title":"The Bw-Tree: AB-tree for new hardware platforms","author":"Levandoski Justin J.","unstructured":"Justin J. Levandoski, David B. Lomet, and Sudipta Sengupta. 2013. The Bw-Tree: AB-tree for new hardware platforms. In ICDE. IEEE Computer Society, 302\u2013313."},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.14778\/3598581.3598593"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-024-00893-6"},{"key":"e_1_2_1_20_1","volume-title":"BitTuner: A Toolbox for Automatically Configuring Learned Data Compressors. In 2025 IEEE 41st International Conference on Data Engineering (ICDE). IEEE Computer Society, 4548\u20134551","author":"Liu Qiyu","year":"2025","unstructured":"Qiyu Liu, Yuxin Luo, Mengke Cui, Siyuan Han, Jingshu Peng, Jin Li, and Lei Chen. 2025. BitTuner: A Toolbox for Automatically Configuring Learned Data Compressors. In 2025 IEEE 41st International Conference on Data Engineering (ICDE). IEEE Computer Society, 4548\u20134551."},{"key":"e_1_2_1_21_1","volume-title":"LHist: Towards Learning Multi-dimensional Histogram for Massive Spatial Data","author":"Liu Qiyu","unstructured":"Qiyu Liu, Yanyan Shen, and Lei Chen. 2021. LHist: Towards Learning Multi-dimensional Histogram for Massive Spatial Data. In ICDE. IEEE, 1188\u20131199."},{"key":"e_1_2_1_22_1","volume-title":"HAP: An Efficient Hamming Space Index Based on Augmented Pigeonhole Principle. In SIGMOD Conference. ACM, 917\u2013930","author":"Liu Qiyu","year":"2022","unstructured":"Qiyu Liu, Yanyan Shen, and Lei Chen. 2022. HAP: An Efficient Hamming Space Index Based on Augmented Pigeonhole Principle. In SIGMOD Conference. ACM, 917\u2013930."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.14778\/3407790.3407830"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.14778\/3421424.3421425"},{"key":"e_1_2_1_25_1","volume-title":"CDFShop: Exploring and Optimizing Learned Index Structures. In SIGMOD Conference. ACM, 2789\u20132792","author":"Marcus Ryan","year":"2020","unstructured":"Ryan Marcus, Emily Zhang, and Tim Kraska. 2020. CDFShop: Exploring and Optimizing Learned Index Structures. In SIGMOD Conference. ACM, 2789\u20132792."},{"key":"e_1_2_1_26_1","unstructured":"openstreetmap [n.d.]. OpenStreetMap. https:\/\/www.openstreetmap.org\/. Accessed: 2024-06-12."},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/358746.358758"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1007\/s002360050048"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/359545.359557"},{"key":"e_1_2_1_30_1","unstructured":"pgm [n.d.]. PGM-Index. https:\/\/github.com\/gvinciguerra\/PGM-index. Accessed: 2024-06-12."},{"key":"e_1_2_1_31_1","unstructured":"postgresql [n.d.]. PostgreSQL. https:\/\/www.postgresql.org\/docs\/current\/indexes.html. Accessed: 2024-06-12."},{"key":"e_1_2_1_32_1","unstructured":"pytorch [n.d.]. PyTorch. https:\/\/pytorch.org\/. Accessed: 2024-06-12."},{"key":"e_1_2_1_33_1","unstructured":"rmi [n.d.]. rmi. https:\/\/github.com\/learnedsystems\/RMI\/. Accessed: 2024-06-12."},{"key":"e_1_2_1_34_1","volume-title":"SIGMOD Conference. ACM, 36\u201353","author":"Sandt Peter Van","unstructured":"Peter Van Sandt, Yannis Chronis, and Jignesh M. Patel. 2019. Efficiently Searching In-Memory Sorted Arrays: Revenge of the Interpolation Search?. In SIGMOD Conference. ACM, 36\u201353."},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.14778\/3236187.3236205"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.14778\/3236187.3236205"},{"key":"e_1_2_1_37_1","unstructured":"sparksql [n.d.]. Spark SQL. https:\/\/spark.apache.org\/sql\/. Accessed: 2024-06-12."},{"key":"e_1_2_1_38_1","unstructured":"stx [n.d.]. STXB+-tree. https:\/\/panthema.net\/2007\/stx-btree\/. Accessed: 2024-06-12."},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.14778\/3594512.3594528"},{"key":"e_1_2_1_40_1","doi-asserted-by":"crossref","unstructured":"Chuzhe Tang Youyun Wang Zhiyuan Dong Gansen Hu Zhaoguo Wang Minjie Wang and Haibo Chen. 2020. XIndex: a scalable learned index for multicore data storage. In PPoPP. ACM 308\u2013320.","DOI":"10.1145\/3332466.3374547"},{"key":"e_1_2_1_41_1","unstructured":"tensorflow [n.d.]. TensorFlow. https:\/\/www.tensorflow.org\/. Accessed: 2024-06-12."},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/3478289"},{"key":"e_1_2_1_43_1","unstructured":"wikidata [n.d.]. Wikidata. https:\/\/www.wikidata.org\/wiki\/Wikidata:Main_Page. Accessed: 2024-06-12."},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/1498765.1498785"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.14778\/3551793.3551848"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.14778\/3457390.3457393"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.14778\/3547305.3547322"},{"key":"e_1_2_1_48_1","first-page":"1","article-title":"Wormhole","volume":"18","author":"Wu Xingbo","year":"2019","unstructured":"Xingbo Wu, Fan Ni, and Song Jiang. 2019. Wormhole: A Fast Ordered Index for In-memory Data Management. In EuroSys. ACM, 18:1\u201318:16.","journal-title":"A Fast Ordered Index for In-memory Data Management. In EuroSys. ACM"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-014-0355-0"},{"key":"e_1_2_1_50_1","volume-title":"ICML (Proceedings of Machine Learning Research)","volume":"202","author":"Zeighami Sepanta","year":"2023","unstructured":"Sepanta Zeighami and Cyrus Shahabi. 2023. On Distribution Dependent Sub-Logarithmic Query Time of Learned Indexing. In ICML (Proceedings of Machine Learning Research), Vol. 202. PMLR, 40669\u201340680."},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.14778\/3551793.3551823"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1145\/3654954"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.14778\/3565816.3565826"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3746405.3746415","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,4]],"date-time":"2025-09-04T19:52:49Z","timestamp":1757015569000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3746405.3746415"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,5]]},"references-count":53,"journal-issue":{"issue":"9","published-print":{"date-parts":[[2025,5]]}},"alternative-id":["10.14778\/3746405.3746415"],"URL":"https:\/\/doi.org\/10.14778\/3746405.3746415","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2025,5]]},"assertion":[{"value":"2025-09-03","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}