{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,8]],"date-time":"2026-04-08T08:57:09Z","timestamp":1775638629863,"version":"3.50.1"},"reference-count":48,"publisher":"Association for Computing Machinery (ACM)","issue":"5","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2024,1]]},"abstract":"<jats:p>Efficient query optimization is crucial for database management systems. Recently, machine learning models have been applied in query optimizers to generate better plans, but the unpredictable performance regressions prevent them from being truly applicable. To be more specific, while a learned query optimizer commonly outperforms the traditional query optimizer on average for a workload of queries, its performance regression seems inevitable for some queries due to model under-fitting and difficulty in generalization. In this paper, we propose a system called Eraser to resolve this problem. Eraser aims at eliminating performance regressions while still attaining considerable overall performance improvement. To this end, Eraser applies a two-stage strategy to estimate the model accuracy for each candidate plan, and helps the learned query optimizer select more reliable plans. The first stage serves as a coarse-grained filter that removes all highly risky plans with feature values that are seen for the first time. The second stage clusters plans in a more fine-grained manner and evaluates each cluster according to the prediction quality of learned query optimizers for selecting the final execution plan. Eraser can be deployed as a plugin on top of any learned query optimizer. We implement Eraser and demonstrate its superiority on PostgreSQL and Spark. 
In our experiments, Eraser eliminates most of the regressions while bringing very little negative impact on the overall performance of learned query optimizers, no matter whether they perform better or worse than the traditional query optimizer. Meanwhile, it is adaptive to dynamic settings and generally applicable to different database systems.<\/jats:p>","DOI":"10.14778\/3641204.3641205","type":"journal-article","created":{"date-parts":[[2024,5,2]],"date-time":"2024-05-02T22:05:43Z","timestamp":1714687543000},"page":"926-938","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":17,"title":["Eraser: Eliminating Performance Regression on Learned Query Optimizer"],"prefix":"10.14778","volume":"17","author":[{"given":"Lianggui","family":"Weng","sequence":"first","affiliation":[{"name":"Alibaba Group, Hangzhou, China"}]},{"given":"Rong","family":"Zhu","sequence":"additional","affiliation":[{"name":"Alibaba Group, Hangzhou, China"}]},{"given":"Di","family":"Wu","sequence":"additional","affiliation":[{"name":"Alibaba Group, HUST, Hangzhou, China"}]},{"given":"Bolin","family":"Ding","sequence":"additional","affiliation":[{"name":"Alibaba Group, Hangzhou, China"}]},{"given":"Bolong","family":"Zheng","sequence":"additional","affiliation":[{"name":"HUST, Wuhan, China"}]},{"given":"Jingren","family":"Zhou","sequence":"additional","affiliation":[{"name":"Alibaba Group, Hangzhou, China"}]}],"member":"320","published-online":{"date-parts":[[2024,5,2]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"2022. HyperQO implementation. https:\/\/github.com\/yxfish13\/HyperQO."},{"key":"e_1_2_1_2_1","unstructured":"2022. Lero implementation. https:\/\/github.com\/Blondig\/Lero-on-PostgreSQL."},{"key":"e_1_2_1_3_1","unstructured":"2022. PerfGuard implementation. https:\/\/github.com\/WoodyBryant\/Perfguard."},{"key":"e_1_2_1_4_1","volume-title":"Multi-loss sub-ensembles for accurate classification with uncertainty estimation. 
ArXiv Preprint ArXiv:2010.01917","author":"Achrack Omer","year":"2020","unstructured":"Omer Achrack, Raizy Kellerman, and Ouriel Barzilay. 2020. Multi-loss sub-ensembles for accurate classification with uncertainty estimation. ArXiv Preprint ArXiv:2010.01917 (2020)."},{"key":"e_1_2_1_5_1","volume-title":"Alekh Jindal, Peter Orenberg, Hiren Patel, Shi Qiao, Vijay Ramani, Lucas Rosenblatt, et al.","author":"Ammerlaan Remmelt","year":"2021","unstructured":"Remmelt Ammerlaan, Gilbert Antonius, Marc Friedman, HM Sajjad Hossain, Alekh Jindal, Peter Orenberg, Hiren Patel, Shi Qiao, Vijay Ramani, Lucas Rosenblatt, et al. 2021. PerfGuard: Deploying ML-for-systems without performance regressions, almost! Proceedings of the VLDB Endowment 14, 13 (2021), 3362--3375."},{"key":"e_1_2_1_6_1","volume-title":"VLDB","volume":"97","author":"Chaudhuri Surajit","year":"1997","unstructured":"Surajit Chaudhuri and Vivek R Narasayya. 1997. An efficient, cost-driven index selection tool for Microsoft SQL server. In VLDB, Vol. 97. San Francisco, 146--155."},{"key":"e_1_2_1_7_1","unstructured":"Transaction Processing Performance Council (TPC). 2021. TPC-H version 2 and version 3. http:\/\/www.tpc.org\/tpch\/."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/3299869.3324957"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.asoc.2015.01.026"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.14778\/3503585.3503586"},{"key":"e_1_2_1_11_1","volume-title":"Kai Zeng, Gao Cong, Yanzhao Qin, Andreas Pfadler, et al.","author":"Han Yuxing","year":"2021","unstructured":"Yuxing Han, Ziniu Wu, Peizhi Wu, Rong Zhu, Jingyi Yang, Liang Wei Tan, Kai Zeng, Gao Cong, Yanzhao Qin, Andreas Pfadler, et al. 2021. Cardinality estimation in DBMS: A comprehensive benchmark evaluation. 
ArXiv Preprint ArXiv:2109.05877 (2021)."},{"key":"e_1_2_1_12_1","doi-asserted-by":"crossref","unstructured":"Benjamin Hilprecht Andreas Schmidt Moritz Kulessa Alejandro Molina Kristian Kersting and Carsten Binnig. 2020. DeepDB: Learn from data not from queries! PVLDB 13 7 992--1005.","DOI":"10.14778\/3384345.3384349"},{"key":"e_1_2_1_13_1","volume-title":"Snapshot ensembles: Train 1, get m for free. ArXiv Preprint ArXiv.00109","author":"Huang Gao","year":"2017","unstructured":"Gao Huang, Yixuan Li, Geoff Pleiss, Zhuang Liu, John E Hopcroft, and Kilian Q Weinberger. 2017. Snapshot ensembles: Train 1, get m for free. ArXiv Preprint ArXiv.00109 (2017)."},{"key":"e_1_2_1_14_1","volume-title":"VLDB","volume":"24","author":"Darera Jayant Harish D","year":"2008","unstructured":"Harish D Pooja N Darera Jayant and R Haritsa. 2008. Identifying robust plans through plan diagram reduction. In VLDB, Vol. 24. Citeseer, 25."},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.14778\/3192965.3192971"},{"key":"e_1_2_1_16_1","volume-title":"Learned cardinalities: Estimating correlated joins with deep learning. ArXiv Preprint ArXiv:1809.00677","author":"Kipf Andreas","year":"2018","unstructured":"Andreas Kipf, Thomas Kipf, Bernhard Radke, Viktor Leis, Peter Boncz, and Alfons Kemper. 2018. Learned cardinalities: Estimating correlated joins with deep learning. ArXiv Preprint ArXiv:1809.00677 (2018)."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3380591"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/3588713"},{"key":"e_1_2_1_19_1","volume-title":"Simple and scalable predictive uncertainty estimation using deep ensembles. Advances in Neural Information Processing Systems 30","author":"Lakshminarayanan Balaji","year":"2017","unstructured":"Balaji Lakshminarayanan, Alexander Pritzel, and Charles Blundell. 2017. Simple and scalable predictive uncertainty estimation using deep ensembles. 
Advances in Neural Information Processing Systems 30 (2017)."},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.14778\/2850583.2850594"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/3514221.3526179"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.14778\/3352063.3352129"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDEW.2007.4401028"},{"key":"e_1_2_1_24_1","volume-title":"Bao: Making learned query optimization practical. In SIGMOD. 1275--1288.","author":"Marcus Ryan","year":"2021","unstructured":"Ryan Marcus, Parimarjan Negi, Hongzi Mao, Nesime Tatbul, Mohammad Alizadeh, and Tim Kraska. 2021. Bao: Making learned query optimization practical. In SIGMOD. 1275--1288."},{"key":"e_1_2_1_25_1","volume-title":"Neo: A learned query optimizer. ArXiv Preprint ArXiv:1904.03711","author":"Marcus Ryan","year":"2019","unstructured":"Ryan Marcus, Parimarjan Negi, Hongzi Mao, Chi Zhang, Mohammad Alizadeh, Tim Kraska, Olga Papaemmanouil, and Nesime Tatbul. 2019. Neo: A learned query optimizer. ArXiv Preprint ArXiv:1904.03711 (2019)."},{"key":"e_1_2_1_26_1","volume-title":"Plan-structured deep neural network models for query performance prediction. ArXiv Preprint ArXiv:1902.00132","author":"Marcus Ryan","year":"2019","unstructured":"Ryan Marcus and Olga Papaemmanouil. 2019. Plan-structured deep neural network models for query performance prediction. ArXiv Preprint ArXiv:1902.00132 (2019)."},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v30i1.10139"},{"key":"e_1_2_1_28_1","volume-title":"An empirical analysis of deep learning for cardinality estimation. ArXiv Preprint ArXiv:1905.06425","author":"Ortiz Jennifer","year":"2019","unstructured":"Jennifer Ortiz, Magdalena Balazinska, Johannes Gehrke, and S Sathiya Keerthi. 2019. An empirical analysis of deep learning for cardinality estimation. 
ArXiv Preprint ArXiv:1905.06425 (2019)."},{"key":"e_1_2_1_29_1","volume-title":"Hybrid Artificial Intelligent Systems: 13th International Conference, HAIS 2018, Oviedo, Spain, June 20--22, 2018, Proceedings 13","author":"Pedrozo Wendel G\u00f3es","year":"2018","unstructured":"Wendel G\u00f3es Pedrozo, J\u00falio Cesar Nievola, and Deborah Carvalho Ribeiro. 2018. An adaptive approach for index tuning with learning classifier systems on hybrid storage environments. In Hybrid Artificial Intelligent Systems: 13th International Conference, HAIS 2018, Oviedo, Spain, June 20--22, 2018, Proceedings 13. Springer, 716--729."},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDEW49219.2020.00035"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDEW.2007.4401029"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/582095.582099"},{"key":"e_1_2_1_33_1","volume-title":"An end-to-end learning-based cost estimator. ArXiv Preprint ArXiv:1906.02560","author":"Sun Ji","year":"2019","unstructured":"Ji Sun and Guoliang Li. 2019. An end-to-end learning-based cost estimator. ArXiv Preprint ArXiv:1906.02560 (2019)."},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.14778\/3339490.3339503"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.14778\/3402707.3402724"},{"key":"e_1_2_1_36_1","volume-title":"Deep sub-ensembles for fast uncertainty estimation in image classification. ArXiv Preprint ArXiv:1910.08168","author":"Valdenegro-Toro Matias","year":"2019","unstructured":"Matias Valdenegro-Toro. 2019. Deep sub-ensembles for fast uncertainty estimation in image classification. ArXiv Preprint ArXiv:1910.08168 (2019)."},{"key":"e_1_2_1_37_1","article-title":"Visualizing data using t-SNE","volume":"9","author":"der Maaten Laurens Van","year":"2008","unstructured":"Laurens Van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-SNE. 
Journal of Machine Learning Research 9, 11 (2008).","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_2_1_38_1","volume-title":"Are we ready for learned cardinality estimation? ArXiv Preprint ArXiv:2012.06743","author":"Wang Xiaoying","year":"2020","unstructured":"Xiaoying Wang, Changbo Qu, Weiyuan Wu, Jiannan Wang, and Qingqing Zhou. 2020. Are we ready for learned cardinality estimation? ArXiv Preprint ArXiv:2012.06743 (2020)."},{"key":"e_1_2_1_39_1","volume-title":"Batchensemble: An alternative approach to efficient ensemble and lifelong learning. ArXiv Preprint ArXiv:2002.06715","author":"Wen Yeming","year":"2020","unstructured":"Yeming Wen, Dustin Tran, and Jimmy Ba. 2020. Batchensemble: An alternative approach to efficient ensemble and lifelong learning. ArXiv Preprint ArXiv:2002.06715 (2020)."},{"key":"e_1_2_1_40_1","volume-title":"Eraser: Eliminating performance regression on learned query optimizer. https:\/\/github.com\/duoyw\/Eraser\/tree\/main\/paper","author":"Weng Lianggui","year":"2023","unstructured":"Lianggui Weng, Rong Zhu, Di Wu, Bolin Ding, Bolong Zheng, and Jingren Zhou. 2023. Eraser: Eliminating performance regression on learned query optimizer. https:\/\/github.com\/duoyw\/Eraser\/tree\/main\/paper (2023)."},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/3514221.3517885"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.14778\/3565838.3565846"},{"key":"e_1_2_1_43_1","first-page":"10","article-title":"Spark: Cluster computing with working sets","volume":"10","author":"Zaharia Matei","year":"2010","unstructured":"Matei Zaharia, Mosharaf Chowdhury, Michael J Franklin, Scott Shenker, Ion Stoica, et al. 2010. Spark: Cluster computing with working sets. 
HotCloud 10, 10--10 (2010), 95.","journal-title":"HotCloud"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/3299869.3300065"},{"key":"e_1_2_1_45_1","first-page":"1466","article-title":"Lero: A learning-to-rank query optimizer","volume":"16","author":"Zhu Rong","year":"2022","unstructured":"Rong Zhu, Wei Chen, Bolin Ding, Xingguang Chen, Andreas Pfadler, Ziniu Wu, and Jingren Zhou. 2022. Lero: A learning-to-rank query optimizer. Proc. VLDB Endow. 16, 6 (2022), 1466--1479.","journal-title":"Proc. VLDB Endow."},{"key":"e_1_2_1_46_1","unstructured":"Rong Zhu Ziniu Wu Chengliang Chai Andreas Pfadler Bolin Ding Guoliang Li and Jingren Zhou. 2022. Learned query optimizer: At the forefront of AI-driven databases. In EDBT. 1--4."},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.14778\/3461535.3461539"},{"key":"e_1_2_1_48_1","volume-title":"International Conference on Autonomic Computing, 2004. Proceedings. IEEE, 180--187","author":"Zilio Daniel C","year":"2004","unstructured":"Daniel C Zilio, Calisto Zuzarte, Sam Lightstone, Wenbin Ma, Guy M Lohman, Roberta J Cochrane, Hamid Pirahesh, Latha Colby, Jarek Gryz, Eric Alton, et al. 2004. Recommending materialized views and indexes with the IBM DB2 design advisor. In International Conference on Autonomic Computing, 2004. Proceedings. 
IEEE, 180--187."}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3641204.3641205","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,5,2]],"date-time":"2024-05-02T22:09:11Z","timestamp":1714687751000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3641204.3641205"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,1]]},"references-count":48,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2024,1]]}},"alternative-id":["10.14778\/3641204.3641205"],"URL":"https:\/\/doi.org\/10.14778\/3641204.3641205","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2024,1]]},"assertion":[{"value":"2024-05-02","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}