{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,30]],"date-time":"2026-05-30T00:48:35Z","timestamp":1780102115956,"version":"3.54.0"},"reference-count":44,"publisher":"Association for Computing Machinery (ACM)","issue":"7","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2024,3]]},"abstract":"<jats:p>The current boom of learned query optimizers (LQO) can be explained not only by the general continuous improvement of deep learning (DL) methods but also by the straightforward formulation of a query optimization problem (QOP) as a machine learning (ML) one. The idea is often to replace dynamic programming approaches, widespread for solving QOP, with more powerful methods such as reinforcement learning. However, such a rapid \"game change\" in the field of QOP could not pass without consequences - other parts of the ML pipeline, except for predictive model development, have large improvement potential. For instance, different LQOs introduce their own restrictions on training data generation from queries, use an arbitrary train\/validation approach, and evaluate on a voluntary split of benchmark queries.<\/jats:p>\n          <jats:p>\n            In this paper, we attempt to standardize the ML pipeline for evaluating LQOs by introducing a new\n            <jats:italic>end-to-end benchmarking framework.<\/jats:italic>\n            Additionally, we guide the reader through each data science stage in the ML pipeline and provide novel insights from the machine learning perspective, considering the specifics of QOP. Finally, we perform a\n            <jats:italic>rigorous evaluation of existing LQOs, showing that PostgreSQL outperforms these LQOs in almost all experiments depending on the train\/test splits.<\/jats:italic>\n          <\/jats:p>","DOI":"10.14778\/3654621.3654625","type":"journal-article","created":{"date-parts":[[2024,5,30]],"date-time":"2024-05-30T22:21:08Z","timestamp":1717107668000},"page":"1565-1577","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":16,"title":["Is Your Learned Query Optimizer Behaving As You Expect? A Machine Learning Perspective"],"prefix":"10.14778","volume":"17","author":[{"given":"Claude","family":"Lehmann","sequence":"first","affiliation":[{"name":"Zurich University of Applied Sciences, Winterthur, Switzerland"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Pavel","family":"Sulimov","sequence":"additional","affiliation":[{"name":"Zurich University of Applied Sciences, Winterthur, Switzerland"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Kurt","family":"Stockinger","sequence":"additional","affiliation":[{"name":"Zurich University of Applied Sciences, Winterthur, Switzerland"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2024,5,30]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/335191.335420"},{"key":"e_1_2_1_2_1","unstructured":"Jason Brownlee. 2020. Data preparation for machine learning: data cleaning feature selection and data transforms in Python. Machine Learning Mastery."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.14778\/3587136.3587150"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.14778\/3598581.3598597"},{"key":"e_1_2_1_5_1","unstructured":"Vijay Prakash Dwivedi and Xavier Bresson. 2021. A Generalization of Transformer Networks to Graphs. arXiv:2012.09699 [cs.LG]"},{"key":"e_1_2_1_6_1","volume-title":"Database Systems: The Complete Book (2 ed.)","author":"Garcia-Molina Hector","year":"2008","unstructured":"Hector Garcia-Molina, Jeffrey D. Ullman, and Jennifer Widom. 2008. Database Systems: The Complete Book (2 ed.). Prentice Hall Press, USA."},{"key":"e_1_2_1_7_1","volume-title":"Deep Learning","author":"Goodfellow Ian J.","unstructured":"Ian J. Goodfellow, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. MIT Press, Cambridge, MA, USA. http:\/\/www.deeplearningbook.org."},{"key":"e_1_2_1_8_1","volume-title":"Jingliang Duan, Jie Li, Yangang Ren, Qi Sun, and Bo Cheng.","author":"Guan Yang","year":"2021","unstructured":"Yang Guan, Shengbo Eben Li, Jingliang Duan, Jie Li, Yangang Ren, Qi Sun, and Bo Cheng. 2021. Direct and indirect reinforcement learning. arXiv:1912.10600 [cs.LG]"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-35488-8_1"},{"key":"e_1_2_1_10_1","volume-title":"Kai Zeng, Gao Cong, Yanzhao Qin, Andreas Pfadler, Zhengping Qian, Jingren Zhou, Jiangneng Li, and Bin Cui.","author":"Han Yuxing","year":"2022","unstructured":"Yuxing Han, Ziniu Wu, Peizhi Wu, Rong Zhu, Jingyi Yang, Tan Wei Liang, Kai Zeng, Gao Cong, Yanzhao Qin, Andreas Pfadler, Zhengping Qian, Jingren Zhou, Jiangneng Li, and Bin Cui. 2022. Cardinality Estimation in DBMS: A Comprehensive Benchmark Evaluation. VLDB 15, 4 (2022)."},{"key":"e_1_2_1_11_1","volume-title":"Join query optimization with deep reinforcement learning algorithms. arXiv preprint arXiv:1911.11689","author":"Heitz Jonas","year":"2019","unstructured":"Jonas Heitz and Kurt Stockinger. 2019. Join query optimization with deep reinforcement learning algorithms. arXiv preprint arXiv:1911.11689 (2019)."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.14778\/3384345.3384349"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-0-387-39940-9_384"},{"key":"e_1_2_1_14_1","volume-title":"Proceedings of the 5th International Conference on Learning Representations (Palais des Congr\u00e8s Neptune","author":"Thomas","unstructured":"Thomas N. Kipf and Max Welling. 2017. Semi-Supervised Classification with Graph Convolutional Networks. In Proceedings of the 5th International Conference on Learning Representations (Palais des Congr\u00e8s Neptune, Toulon, France) (ICLR '17). https:\/\/openreview.net\/forum?id=SJU4ayYgl"},{"key":"e_1_2_1_15_1","volume-title":"Learning to Optimize Join Queries With Deep Reinforcement Learning. (08","author":"Krishnan Sanjay","year":"2018","unstructured":"Sanjay Krishnan, Zongheng Yang, Kenneth Goldberg, Joseph Hellerstein, and Ion Stoica. 2018. Learning to Optimize Join Queries With Deep Reinforcement Learning. (08 2018)."},{"key":"e_1_2_1_16_1","volume-title":"Learning to optimize join queries with deep reinforcement learning. arXiv preprint arXiv:1808.03196","author":"Krishnan Sanjay","year":"2018","unstructured":"Sanjay Krishnan, Zongheng Yang, Ken Goldberg, Joseph Hellerstein, and Ion Stoica. 2018. Learning to optimize join queries with deep reinforcement learning. arXiv preprint arXiv:1808.03196 (2018)."},{"key":"e_1_2_1_17_1","doi-asserted-by":"crossref","unstructured":"Joseph P La Salle. 1976. The stability of dynamical systems. SIAM.","DOI":"10.1137\/1.9781611970432"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.14778\/2850583.2850594"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2013.6544811"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1561\/1500000016"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1214\/aoms\/1177730491"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/3448016.3452838"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.14778\/3342263.3342644"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/3211954.3211957"},{"key":"e_1_2_1_25_1","volume-title":"Towards a Hands-Free Query Optimizer through Deep Learning. (09","author":"Marcus Ryan","year":"2018","unstructured":"Ryan Marcus and Olga Papaemmanouil. 2018. Towards a Hands-Free Query Optimizer through Deep Learning. (09 2018)."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v30i1.10139"},{"key":"e_1_2_1_27_1","volume-title":"pg_hint_plan Documentation. https:\/\/pghintplan.osdn.jp\/pg_hint_plan.html. [Online","author":"Nippon Telegraph and Telephone Corporation","year":"2023","unstructured":"Nippon Telegraph and Telephone Corporation. 2012. pg_hint_plan Documentation. https:\/\/pghintplan.osdn.jp\/pg_hint_plan.html. [Online; accessed August, 2023]."},{"key":"e_1_2_1_28_1","volume-title":"Research and Development in Intelligent Systems XXVII","author":"Petkovi\u0107 Du\u0161an","unstructured":"Du\u0161an Petkovi\u0107. 2011. Dynamic Programming Algorithm vs. Genetic Algorithm: Which is Faster?. In Research and Development in Intelligent Systems XXVII, Max Bramer, Miltos Petridis, and Adrian Hopgood (Eds.). Springer London, London, 483--488."},{"key":"e_1_2_1_29_1","volume-title":"Queries in PostgreSQL: Sort and Merge. https:\/\/postgrespro.com\/blog\/pgsql\/5969770. [Online","author":"Rogov Egor","year":"2023","unstructured":"Egor Rogov. 2022. Queries in PostgreSQL: Sort and Merge. https:\/\/postgrespro.com\/blog\/pgsql\/5969770. [Online; accessed August, 2023]."},{"key":"e_1_2_1_30_1","volume-title":"Artificial Intelligence: A Modern Approach (3 ed.)","author":"Russell Stuart","year":"2010","unstructured":"Stuart Russell and Peter Norvig. 2010. Artificial Intelligence: A Modern Approach (3 ed.). Prentice Hall."},{"key":"e_1_2_1_31_1","volume-title":"Tuning Your PostgreSQL Server. https:\/\/wiki.postgresql.org\/wiki\/Tuning_Your_PostgreSQL_Server. [Online","author":"Smith Greg","year":"2023","unstructured":"Greg Smith, Robert Treat, and Christopher Browne. 2021. Tuning Your PostgreSQL Server. https:\/\/wiki.postgresql.org\/wiki\/Tuning_Your_PostgreSQL_Server. [Online; accessed August, 2023]."},{"key":"e_1_2_1_32_1","volume-title":"Barto","author":"Sutton Richard S.","year":"2018","unstructured":"Richard S. Sutton and Andrew G. Barto. 2018. Reinforcement Learning: An Introduction (second ed.). The MIT Press. http:\/\/incompleteideas.net\/book\/the-book-2nd.html"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/P15-1150"},{"key":"e_1_2_1_34_1","volume-title":"Genetic Query Optimization (GEQO) in PostgreSQL. https:\/\/www.postgresql.org\/docs\/current\/geqo-pg-intro.html. [Online","author":"The PostgreSQL Global Development Group","year":"2023","unstructured":"The PostgreSQL Global Development Group. 2023. Genetic Query Optimization (GEQO) in PostgreSQL. https:\/\/www.postgresql.org\/docs\/current\/geqo-pg-intro.html. [Online; accessed August, 2023]."},{"key":"e_1_2_1_35_1","volume-title":"TPC Benchmarks Overview. https:\/\/www.tpc.org\/information\/benchmarks5.asp. [Online","author":"Transaction Processing Performance Council","year":"2023","unstructured":"Transaction Processing Performance Council. 2023. TPC Benchmarks Overview. https:\/\/www.tpc.org\/information\/benchmarks5.asp. [Online; accessed August, 2023]."},{"key":"e_1_2_1_36_1","volume-title":"Proceedings of the 30th International Conference on Machine Learning (Proceedings of Machine Learning Research), Sanjoy Dasgupta and David McAllester (Eds.)","volume":"28","author":"Wan Li","year":"2013","unstructured":"Li Wan, Matthew Zeiler, Sixin Zhang, Yann Le Cun, and Rob Fergus. 2013. Regularization of Neural Networks using DropConnect. In Proceedings of the 30th International Conference on Machine Learning (Proceedings of Machine Learning Research), Sanjoy Dasgupta and David McAllester (Eds.), Vol. 28. PMLR, Atlanta, Georgia, USA, 1058--1066. https:\/\/proceedings.mlr.press\/v28\/wan13.html"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/69.536256"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","unstructured":"Geoffrey I. Webb. 2010. Overfitting. Springer US Boston MA 744--744. 10.1007\/978-0-387-30164-8_623","DOI":"10.1007\/978-0-387-30164-8_623"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/3514221.3517885"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.14778\/3421424.3421432"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.14778\/3565838.3565846"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE48307.2020.00116"},{"key":"e_1_2_1_43_1","doi-asserted-by":"crossref","first-page":"2962","DOI":"10.14778\/3611479.3611501","article-title":"Simple Adaptive Query Processing vs. Learned Query Optimizers: Observations and Analysis","volume":"16","author":"Yunjia Zhang","year":"2023","unstructured":"Zhang Yunjia, Chronis Yannis, Patel Jignesh M., and Rekatsinas Theodoros. 2023. Simple Adaptive Query Processing vs. Learned Query Optimizers: Observations and Analysis. Proc. VLDB Endow. 16, 9 (2023), 2962--2975.","journal-title":"Proc. VLDB Endow."},{"key":"e_1_2_1_44_1","volume-title":"Lero: A Learning-to-Rank Query Optimizer. arXiv preprint arXiv:2302.06873","author":"Zhu Rong","year":"2023","unstructured":"Rong Zhu, Wei Chen, Bolin Ding, Xingguang Chen, Andreas Pfadler, Ziniu Wu, and Jingren Zhou. 2023. Lero: A Learning-to-Rank Query Optimizer. arXiv preprint arXiv:2302.06873 (2023)."}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3654621.3654625","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,5,30]],"date-time":"2024-05-30T22:24:03Z","timestamp":1717107843000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3654621.3654625"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,3]]},"references-count":44,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2024,3]]}},"alternative-id":["10.14778\/3654621.3654625"],"URL":"https:\/\/doi.org\/10.14778\/3654621.3654625","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2024,3]]},"assertion":[{"value":"2024-05-30","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}