{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,13]],"date-time":"2026-06-13T04:58:02Z","timestamp":1781326682982,"version":"3.54.1"},"reference-count":52,"publisher":"Association for Computing Machinery (ACM)","issue":"4","funder":[{"DOI":"10.13039\/501100001711","name":"Swiss National Science Foundation","doi-asserted-by":"crossref","award":["192105"],"award-info":[{"award-number":["192105"]}],"id":[{"id":"10.13039\/501100001711","id-type":"DOI","asserted-by":"crossref"}]},{"name":"European Union's Horizon Europe Research and Innovation Programme","award":["101188416"],"award-info":[{"award-number":["101188416"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. ACM Manag. Data"],"published-print":{"date-parts":[[2025,9,22]]},"abstract":"<jats:p>Query optimization has become a research area where classical algorithms are being challenged by machine learning algorithms. At the same time, recent trends in learned query optimizers have shown that it is prudent to take advantage of decades of database research and augment classical query optimizers by shrinking the plan search space through different types of hints (e.g. by specifying the join type, scan type or the order of joins) rather than completely replacing the classical query optimizer with machine learning models. It is especially relevant for cases when classical optimizers cannot fully enumerate all logical and physical plans and, as an alternative, need to rely on less robust approaches like genetic algorithms. However, even symbiotically learned query optimizers are hampered by the need for vast amounts of training data, slow plan generation during inference and unstable results across various workload conditions. 
In this paper, we present GenJoin - a novel learned query optimizer that considers the query optimization problem as a generative task and is capable of learning from a random set of subplan hints to produce query plans that outperform classical optimizers. GenJoin is the first learned query optimizer that significantly and consistently outperforms PostgreSQL as well as state-of-the-art methods on two well-known real-world benchmarks across a variety of workloads using rigorous machine learning evaluations.<\/jats:p>","DOI":"10.1145\/3749165","type":"journal-article","created":{"date-parts":[[2025,9,23]],"date-time":"2025-09-23T17:17:03Z","timestamp":1758647823000},"page":"1-25","source":"Crossref","is-referenced-by-count":3,"title":["GenJoin: Conditional Generative Plan-to-Plan Query Optimizer that Learns from Subplan Hints"],"prefix":"10.1145","volume":"3","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2885-2646","authenticated-orcid":false,"given":"Pavel","family":"Sulimov","sequence":"first","affiliation":[{"name":"Zurich University of Applied Sciences, Winterthur, Switzerland"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4693-0444","authenticated-orcid":false,"given":"Claude","family":"Lehmann","sequence":"additional","affiliation":[{"name":"Zurich University of Applied Sciences, Winterthur, Switzerland"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4034-4812","authenticated-orcid":false,"given":"Kurt","family":"Stockinger","sequence":"additional","affiliation":[{"name":"Zurich University of Applied Sciences, Winterthur, Switzerland"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2025,9,23]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Alekh Jindal, Peter Orenberg, Hiren Patel, Shi Qiao, Vijay Ramani, Lucas Rosenblatt, et al.","author":"Ammerlaan Remmelt","year":"2021","unstructured":"Remmelt 
Ammerlaan, Gilbert Antonius, Marc Friedman, HM Sajjad Hossain, Alekh Jindal, Peter Orenberg, Hiren Patel, Shi Qiao, Vijay Ramani, Lucas Rosenblatt, et al., 2021. PerfGuard: deploying ML-for-systems without performance regressions, almost! Proceedings of the VLDB Endowment, Vol. 14, 13 (2021), 3362-3375."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.14778\/3611540.3611544"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/335191.335420"},{"key":"e_1_2_1_4_1","volume-title":"Pattern Recognition and Machine Learning (Information Science and Statistics)","author":"Bishop Christopher M.","unstructured":"Christopher M. Bishop. 2006. Pattern Recognition and Machine Learning (Information Science and Statistics). Springer-Verlag, Berlin, Heidelberg."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.14778\/3587136.3587150"},{"key":"e_1_2_1_6_1","first-page":"2261","article-title":"LEON","volume":"16","author":"Chen Xu","year":"2023","unstructured":"Xu Chen, Haitian Chen, Zibo Liang, Shuncheng Liu, Jinghong Wang, Kai Zeng, Han Su, and Kai Zheng. 2023a. LEON: A New Framework for ML-Aided Query Optimization. Proc. VLDB Endow., Vol. 16, 9 (2023), 2261-2273.","journal-title":"A New Framework for ML-Aided Query Optimization. Proc. VLDB Endow."},{"key":"e_1_2_1_7_1","volume-title":"Introduction to Algorithms","author":"Cormen Thomas H.","unstructured":"Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. 2009. Introduction to Algorithms, Third Edition (3rd ed.). The MIT Press.","edition":"3"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1080\/01621459.2017.1285773"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1561\/1900000082"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0927-0507(06)13019-4"},{"key":"e_1_2_1_11_1","volume-title":"The Cascades Framework for Query Optimization","author":"Graefe Goetz","year":"1995","unstructured":"Goetz Graefe. 1995. 
The Cascades Framework for Query Optimization. IEEE Data(base) Engineering Bulletin, Vol. 18 (1995), 19-29. https:\/\/api.semanticscholar.org\/CorpusID:260706023"},{"key":"e_1_2_1_12_1","volume-title":"Join query optimization with deep reinforcement learning algorithms. arXiv preprint arXiv:1911.11689","author":"Heitz Jonas","year":"2019","unstructured":"Jonas Heitz and Kurt Stockinger. 2019. Join query optimization with deep reinforcement learning algorithms. arXiv preprint arXiv:1911.11689 (2019)."},{"key":"e_1_2_1_13_1","volume-title":"Zero-shot cost models for out-of-the-box learned cost prediction. arXiv preprint arXiv:2201.00561","author":"Hilprecht Benjamin","year":"2022","unstructured":"Benjamin Hilprecht and Carsten Binnig. 2022. Zero-shot cost models for out-of-the-box learned cost prediction. arXiv preprint arXiv:2201.00561 (2022)."},{"key":"e_1_2_1_14_1","volume-title":"Deepdb: Learn from data, not from queries! arXiv preprint arXiv:1909.00607","author":"Hilprecht Benjamin","year":"2019","unstructured":"Benjamin Hilprecht, Andreas Schmidt, Moritz Kulessa, Alejandro Molina, Kristian Kersting, and Carsten Binnig. 2019. Deepdb: Learn from data, not from queries! arXiv preprint arXiv:1909.00607 (2019)."},{"key":"e_1_2_1_15_1","volume-title":"Inference and Relationship. Griffin. https:\/\/books.google.ch\/books?id=elabQwAACAAJ","author":"Kendall M.G.","year":"1973","unstructured":"M.G. Kendall and A. Stuart. 1973. The Advanced Theory of Statistics. Vol. 2: Inference and Relationship. Griffin. https:\/\/books.google.ch\/books?id=elabQwAACAAJ"},{"key":"e_1_2_1_16_1","volume-title":"Learned cardinalities: Estimating correlated joins with deep learning. arXiv preprint arXiv:1809.00677","author":"Kipf Andreas","year":"2018","unstructured":"Andreas Kipf, Thomas Kipf, Bernhard Radke, Viktor Leis, Peter Boncz, and Alfons Kemper. 2018. Learned cardinalities: Estimating correlated joins with deep learning. 
arXiv preprint arXiv:1809.00677 (2018)."},{"key":"e_1_2_1_17_1","volume-title":"Learning to optimize join queries with deep reinforcement learning. arXiv preprint arXiv:1808.03196","author":"Krishnan Sanjay","year":"2018","unstructured":"Sanjay Krishnan, Zongheng Yang, Ken Goldberg, Joseph Hellerstein, and Ion Stoica. 2018. Learning to optimize join queries with deep reinforcement learning. arXiv preprint arXiv:1808.03196 (2018)."},{"key":"e_1_2_1_18_1","volume-title":"Re: Bitmap indexes etc., https:\/\/www.postgresql.org\/message-id\/12553.1135634231@sss.pgh.pa.us. [Online","author":"Lane Tom","year":"2005","unstructured":"Tom Lane. 2005. Re: Bitmap indexes etc., https:\/\/www.postgresql.org\/message-id\/12553.1135634231@sss.pgh.pa.us. [Online; accessed April, 2025]."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.14778\/3654621.3654625"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.14778\/2850583.2850594"},{"key":"e_1_2_1_21_1","doi-asserted-by":"crossref","unstructured":"D.S. Lemons and P. Langevin. 2002. An Introduction to Stochastic Processes in Physics. Johns Hopkins University Press. 2001046459 https:\/\/books.google.ch\/books?id=Uw6YDkd_CXcC","DOI":"10.56021\/9780801868665"},{"key":"e_1_2_1_22_1","volume-title":"Cardinality Estimation: Is Machine Learning a Silver Bullet?. In AIDB. https:\/\/www.microsoft.com\/en-us\/research\/publication\/cardinality-estimation-is-machine-learning-a-silver-bullet\/","author":"Li Beibin","year":"2021","unstructured":"Beibin Li, Yao Lu, Chi Wang, and Srikanth Kandula. 2021. Cardinality Estimation: Is Machine Learning a Silver Bullet?. In AIDB. 
https:\/\/www.microsoft.com\/en-us\/research\/publication\/cardinality-estimation-is-machine-learning-a-silver-bullet\/"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-981-19-7784-8"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.14778\/3476249.3476254"},{"key":"e_1_2_1_25_1","volume-title":"Learned Query Superoptimization. arXiv preprint arXiv:2303.15308","author":"Marcus Ryan","year":"2023","unstructured":"Ryan Marcus. 2023. Learned Query Superoptimization. arXiv preprint arXiv:2303.15308 (2023)."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/3542700.3542703"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.14778\/3342263.3342644"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/3211954.3211957"},{"key":"e_1_2_1_29_1","unstructured":"Ryan Marcus and Olga Papaemmanouil. 2018b. Towards a Hands-Free Query Optimizer through Deep Learning. (2018). arXiv:1809.10212 [cs.DB] https:\/\/arxiv.org\/abs\/1809.10212"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.5555\/1792918.1792952"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.14778\/3636218.3636229"},{"key":"e_1_2_1_32_1","volume-title":"Learning structured output representation using deep conditional generative models. Advances in neural information processing systems","author":"Sohn Kihyuk","year":"2015","unstructured":"Kihyuk Sohn, Honglak Lee, and Xinchen Yan. 2015. Learning structured output representation using deep conditional generative models. Advances in neural information processing systems, Vol. 28 (2015)."},{"key":"e_1_2_1_33_1","volume-title":"The probable error of a mean. Biometrika","year":"1908","unstructured":"Student. 1908. The probable error of a mean. 
Biometrika (1908), 1-25."},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.5555\/3312046"},{"key":"e_1_2_1_35_1","volume-title":"Steering the PostgreSQL query optimizer using hinting: State-Of-The-Art and open challenges. (06","author":"Thiessat Jerome","year":"2024","unstructured":"Jerome Thiessat, Dirk Habich, and Wolfgang Lehner. 2024. Steering the PostgreSQL query optimizer using hinting: State-Of-The-Art and open challenges. (06 2024)."},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF02289263"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/69.536256"},{"key":"e_1_2_1_38_1","unstructured":"Junxiong Wang Kaiwen Wang Yueying Li Nathan Kallus Immanuel Trummer and Wen Sun. 2024. JoinGym: An Efficient Query Optimization Environment for Reinforcement Learning. https:\/\/openreview.net\/forum?id=aAEBTnTGo3"},{"key":"e_1_2_1_39_1","volume-title":"Denny Zhou, et al.","author":"Wei Jason","year":"2022","unstructured":"Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, et al., 2022. Chain-of-thought prompting elicits reasoning in large language models. Advances in neural information processing systems, Vol. 35 (2022), 24824-24837."},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.14778\/3641204.3641205"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.14778\/3611479.3611528"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/3588721"},{"key":"e_1_2_1_43_1","volume-title":"Bayescard: Revitilizing bayesian frameworks for cardinality estimation. arXiv preprint arXiv:2012.14743","author":"Wu Ziniu","year":"2020","unstructured":"Ziniu Wu, Amir Shaikhha, Rong Zhu, Kai Zeng, Yuxing Han, and Jingren Zhou. 2020. Bayescard: Revitilizing bayesian frameworks for cardinality estimation. arXiv preprint arXiv:2012.14743 (2020)."},{"key":"e_1_2_1_44_1","volume-title":"COOOL: A Learning-To-Rank Approach for SQL Hint Recommendations. 
arXiv:2304.04407 [cs.DB] https:\/\/arxiv.org\/abs\/2304.04407","author":"Xu Xianghong","year":"2023","unstructured":"Xianghong Xu, Zhibing Zhao, Tieying Zhang, Rong Kang, Luming Sun, and Jianjun Chen. 2023. COOOL: A Learning-To-Rank Approach for SQL Hint Recommendations. arXiv:2304.04407 [cs.DB] https:\/\/arxiv.org\/abs\/2304.04407"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/3514221.3517885"},{"key":"e_1_2_1_46_1","volume-title":"Neurocard: One cardinality estimator for all tables. arXiv preprint arXiv:2006.08109","author":"Yang Zongheng","year":"2020","unstructured":"Zongheng Yang, Amog Kamsetty, Sifei Luan, Eric Liang, Yan Duan, Xi Chen, and Ion Stoica. 2020. Neurocard: One cardinality estimator for all tables. arXiv preprint arXiv:2006.08109 (2020)."},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.14778\/3565838.3565846"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE48307.2020.00116"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.14778\/3529337.3529349"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.14778\/3583140.3583160"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.14778\/3641204.3641209"},{"key":"e_1_2_1_52_1","volume-title":"FLAT: fast, lightweight and accurate method for cardinality estimation. arXiv preprint arXiv:2011.09022","author":"Zhu Rong","year":"2020","unstructured":"Rong Zhu, Ziniu Wu, Yuxing Han, Kai Zeng, Andreas Pfadler, Zhengping Qian, Jingren Zhou, and Bin Cui. 2020. FLAT: fast, lightweight and accurate method for cardinality estimation. 
arXiv preprint arXiv:2011.09022 (2020)."}],"container-title":["Proceedings of the ACM on Management of Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3749165","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,6,13]],"date-time":"2026-06-13T04:40:34Z","timestamp":1781325634000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3749165"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,9,22]]},"references-count":52,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2025,9,22]]}},"alternative-id":["10.1145\/3749165"],"URL":"https:\/\/doi.org\/10.1145\/3749165","relation":{},"ISSN":["2836-6573"],"issn-type":[{"value":"2836-6573","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,9,22]]}}}