{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,8]],"date-time":"2026-04-08T09:01:18Z","timestamp":1775638878010,"version":"3.50.1"},"reference-count":40,"publisher":"Association for Computing Machinery (ACM)","issue":"8","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2022,4]]},"abstract":"<jats:p>Machine learning has become a prominent method in many database optimization problems such as cost estimation, index selection and query optimization. Translating query execution plans into their vectorized representations is non-trivial. Recently, several query plan representation methods have been proposed. However, they have two limitations. First, they do not fully utilize readily available database statistics in the representation, which characterizes the data distribution. Second, they typically have difficulty in modeling long paths of information flow in a query plan, and capturing parent-children dependency between operators.<\/jats:p>\n          <jats:p>To tackle these limitations, we propose QueryFormer, a learning-based query plan representation model with a tree-structured Transformer architecture. In particular, we propose a novel scheme to integrate histograms obtained from database systems into query plan encoding. In addition, to effectively capture the information flow following the tree structure of a query plan, we develop a tree-structured model with the attention mechanism. We integrate QueryFormer into four machine learning models, each for a database optimization task, and experimental results show that QueryFormer is able to improve performance of these models significantly.<\/jats:p>","DOI":"10.14778\/3529337.3529349","type":"journal-article","created":{"date-parts":[[2022,6,22]],"date-time":"2022-06-22T22:23:05Z","timestamp":1655936585000},"page":"1658-1670","source":"Crossref","is-referenced-by-count":87,"title":["QueryFormer"],"prefix":"10.14778","volume":"15","author":[{"given":"Yue","family":"Zhao","sequence":"first","affiliation":[{"name":"Nanyang Technological University"}]},{"given":"Gao","family":"Cong","sequence":"additional","affiliation":[{"name":"Nanyang Technological University"}]},{"given":"Jiachen","family":"Shi","sequence":"additional","affiliation":[{"name":"Nanyang Technological University"}]},{"given":"Chunyan","family":"Miao","sequence":"additional","affiliation":[{"name":"Nanyang Technological University"}]}],"member":"320","published-online":{"date-parts":[[2022,6,22]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"11 Histograms. Retrieved","year":"2022","unstructured":"2021. Database SQL Tuning Guide , 11 Histograms. Retrieved February 10, 2022 from https:\/\/docs.oracle.com\/database\/121\/TGSQL\/tgsql_histo.htm#TGSQL366 2021. Database SQL Tuning Guide, 11 Histograms. Retrieved February 10, 2022 from https:\/\/docs.oracle.com\/database\/121\/TGSQL\/tgsql_histo.htm#TGSQL366"},{"key":"e_1_2_1_2_1","volume-title":"Row Estimation Examples. Retrieved","year":"2022","unstructured":"2021. Documentation PostgreSQL 12 71.1 . Row Estimation Examples. Retrieved February 10, 2022 from https:\/\/www.postgresql.org\/docs\/12\/row-estimation-examples.html 2021. Documentation PostgreSQL 12 71.1. Row Estimation Examples. Retrieved February 10, 2022 from https:\/\/www.postgresql.org\/docs\/12\/row-estimation-examples.html"},{"key":"e_1_2_1_3_1","volume-title":"Explain. Retrieved","year":"2022","unstructured":"2021. Documentation PostgreSQL 12 , Explain. Retrieved February 10, 2022 from https:\/\/www.postgresql.org\/docs\/12\/sql-explain.html 2021. Documentation PostgreSQL 12, Explain. Retrieved February 10, 2022 from https:\/\/www.postgresql.org\/docs\/12\/sql-explain.html"},{"key":"e_1_2_1_4_1","volume-title":"EXPLAIN Statement. Retrieved","author":"SQL","year":"2022","unstructured":"2021. My SQL 8. 0 Reference Manual , EXPLAIN Statement. Retrieved February 10, 2022 from https:\/\/dev.mysql.com\/doc\/refman\/8.0\/en\/explain.html 2021. MySQL 8.0 Reference Manual, EXPLAIN Statement. Retrieved February 10, 2022 from https:\/\/dev.mysql.com\/doc\/refman\/8.0\/en\/explain.html"},{"key":"e_1_2_1_5_1","volume-title":"Learning-based Query Performance Modeling and Prediction. In IEEE 28th International Conference on Data Engineering (ICDE","author":"Akdere Mert","year":"2012","unstructured":"Mert Akdere , Ugur \u00c7etintemel , Matteo Riondato , Eli Upfal , and Stanley B. Zdonik . 2012 . Learning-based Query Performance Modeling and Prediction. In IEEE 28th International Conference on Data Engineering (ICDE 2012 ). Mert Akdere, Ugur \u00c7etintemel, Matteo Riondato, Eli Upfal, and Stanley B. Zdonik. 2012. Learning-based Query Performance Modeling and Prediction. In IEEE 28th International Conference on Data Engineering (ICDE 2012)."},{"key":"e_1_2_1_6_1","volume-title":"VLDB","author":"Dageville Beno\u00eet","year":"2004","unstructured":"Beno\u00eet Dageville , Dinesh Das , Karl Dias , Khaled Yagoub , Mohamed Za\u00eft , and Mohamed Ziauddin . 2004 . Automatic SQL Tuning in Oracle 10g. In (e)Proceedings of the Thirtieth International Conference on Very Large Data Bases , VLDB 2004. Beno\u00eet Dageville, Dinesh Das, Karl Dias, Khaled Yagoub, Mohamed Za\u00eft, and Mohamed Ziauddin. 2004. Automatic SQL Tuning in Oracle 10g. In (e)Proceedings of the Thirtieth International Conference on Very Large Data Bases, VLDB 2004."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/375663.375685"},{"key":"e_1_2_1_8_1","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin , Ming-Wei Chang , Kenton Lee , and Kristina Toutanova . 2019 . BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding . In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019. Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019."},{"key":"e_1_2_1_9_1","volume-title":"Narasayya","author":"Ding Bailu","year":"2019","unstructured":"Bailu Ding , Sudipto Das , Ryan Marcus , Wentao Wu , Surajit Chaudhuri , and Vivek R . Narasayya . 2019 . AI Meets AI: Leveraging Query Executions to Improve Index Recommendations (SIGMOD '19). Bailu Ding, Sudipto Das, Ryan Marcus, Wentao Wu, Surajit Chaudhuri, and Vivek R. Narasayya. 2019. AI Meets AI: Leveraging Query Executions to Improve Index Recommendations (SIGMOD '19)."},{"key":"e_1_2_1_10_1","volume-title":"Representation Learning on Graphs: Methods and Applications","author":"Hamilton William L.","year":"2017","unstructured":"William L. Hamilton , Rex Ying , and Jure Leskovec . 2017. Representation Learning on Graphs: Methods and Applications . IEEE Data Eng. Bull . 40, 3 ( 2017 ). William L. Hamilton, Rex Ying, and Jure Leskovec. 2017. Representation Learning on Graphs: Methods and Applications. IEEE Data Eng. Bull. 40, 3 (2017)."},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/375663.375687"},{"key":"e_1_2_1_12_1","volume-title":"Learned Cardinalities: Estimating Correlated Joins with Deep Learning. CoRR abs\/1809.00677","author":"Kipf Andreas","year":"2018","unstructured":"Andreas Kipf , Thomas Kipf , Bernhard Radke , Viktor Leis , Peter A. Boncz , and Alfons Kemper . 2018 . Learned Cardinalities: Estimating Correlated Joins with Deep Learning. CoRR abs\/1809.00677 (2018). Andreas Kipf, Thomas Kipf, Bernhard Radke, Viktor Leis, Peter A. Boncz, and Alfons Kemper. 2018. Learned Cardinalities: Estimating Correlated Joins with Deep Learning. CoRR abs\/1809.00677 (2018)."},{"key":"e_1_2_1_13_1","volume-title":"Which is the Best in the Land? An Experimental Evaluation of Index Selection Algorithms. 13, 12","author":"Kossmann Jan","year":"2020","unstructured":"Jan Kossmann , Stefan Halfpap , Marcel Jankrift , and Rainer Schlosser . 2020. Magic Mirror in My Hand , Which is the Best in the Land? An Experimental Evaluation of Index Selection Algorithms. 13, 12 ( 2020 ). Jan Kossmann, Stefan Halfpap, Marcel Jankrift, and Rainer Schlosser. 2020. Magic Mirror in My Hand, Which is the Best in the Land? An Experimental Evaluation of Index Selection Algorithms. 13, 12 (2020)."},{"key":"e_1_2_1_14_1","volume-title":"ImageNet Classification with Deep Convolutional Neural Networks. In Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems","author":"Krizhevsky Alex","year":"2012","unstructured":"Alex Krizhevsky , Ilya Sutskever , and Geoffrey E. Hinton . 2012 . ImageNet Classification with Deep Convolutional Neural Networks. In Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012 . Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. ImageNet Classification with Deep Convolutional Neural Networks. In Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012."},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.14778\/2850583.2850594"},{"key":"e_1_2_1_16_1","volume-title":"Bao: Making Learned Query Optimization Practical. In SIGMOD '21: International Conference on Management of Data.","author":"Marcus Ryan","year":"2021","unstructured":"Ryan Marcus , Parimarjan Negi , Hongzi Mao , Nesime Tatbul , Mohammad Alizadeh , and Tim Kraska . 2021 . Bao: Making Learned Query Optimization Practical. In SIGMOD '21: International Conference on Management of Data. Ryan Marcus, Parimarjan Negi, Hongzi Mao, Nesime Tatbul, Mohammad Alizadeh, and Tim Kraska. 2021. Bao: Making Learned Query Optimization Practical. In SIGMOD '21: International Conference on Management of Data."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.14778\/3342263.3342644"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/3211954.3211957"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.14778\/3342263.3342646"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.14778\/1687627.1687738"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.5555\/3015812.3016002"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.14778\/3213880.3213882"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.5555\/1182635.1164217"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3389727"},{"key":"e_1_2_1_25_1","volume-title":"Proceedings of the 30th International Conference on Machine Learning, ICML 2013 (JMLR Workshop and Conference Proceedings)","volume":"28","author":"Pascanu Razvan","year":"2013","unstructured":"Razvan Pascanu , Tom\u00e1s Mikolov , and Yoshua Bengio . 2013 . On the difficulty of training recurrent neural networks . In Proceedings of the 30th International Conference on Machine Learning, ICML 2013 (JMLR Workshop and Conference Proceedings) , Vol. 28 . Razvan Pascanu, Tom\u00e1s Mikolov, and Yoshua Bengio. 2013. On the difficulty of training recurrent neural networks. In Proceedings of the 30th International Conference on Machine Learning, ICML 2013 (JMLR Workshop and Conference Proceedings), Vol. 28."},{"key":"e_1_2_1_26_1","volume-title":"New TPC Benchmarks for Decision Support and Web Commerce. SIGMOD Rec. 29, 4","author":"P\u00f6ss Meikel","year":"2000","unstructured":"Meikel P\u00f6ss and Chris Floyd . 2000. New TPC Benchmarks for Decision Support and Web Commerce. SIGMOD Rec. 29, 4 ( 2000 ). Meikel P\u00f6ss and Chris Floyd. 2000. New TPC Benchmarks for Decision Support and Web Commerce. SIGMOD Rec. 29, 4 (2000)."},{"key":"e_1_2_1_27_1","volume-title":"Human Work Interaction Design. Designing Engaging Automation - 5th IFIP WG 13.6 Working Conference","author":"Seymoens Tom","year":"2018","unstructured":"Tom Seymoens , Femke Ongenae , An Jacobs , Stijn Verstichel , and Ann Ackaert . 2018. A Methodology to Involve Domain Experts and Machine Learning Techniques in the Design of Human-Centered Algorithms . In Human Work Interaction Design. Designing Engaging Automation - 5th IFIP WG 13.6 Working Conference , 2018 (IFIP Advances in Information and Communication Technology) , Vol. 544 . Tom Seymoens, Femke Ongenae, An Jacobs, Stijn Verstichel, and Ann Ackaert. 2018. A Methodology to Involve Domain Experts and Machine Learning Techniques in the Design of Human-Centered Algorithms. In Human Work Interaction Design. Designing Engaging Automation - 5th IFIP WG 13.6 Working Conference, 2018 (IFIP Advances in Information and Communication Technology), Vol. 544."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.14778\/3436905.3436907"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3380584"},{"key":"e_1_2_1_30_1","volume-title":"An End-to-End Learning-based Cost Estimator. PVLDB 13, 3","author":"Sun Ji","year":"2019","unstructured":"Ji Sun and Guoliang Li. 2019. An End-to-End Learning-based Cost Estimator. PVLDB 13, 3 ( 2019 ). Ji Sun and Guoliang Li. 2019. An End-to-End Learning-based Cost Estimator. PVLDB 13, 3 (2019)."},{"key":"e_1_2_1_31_1","volume-title":"Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing.","author":"Tai Kai Sheng","unstructured":"Kai Sheng Tai , Richard Socher , and Christopher D. Manning . 2015. Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks . In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing. Kai Sheng Tai, Richard Socher, and Christopher D. Manning. 2015. Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing."},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/2505515.2505756"},{"key":"e_1_2_1_33_1","volume-title":"Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani , Noam Shazeer , Niki Parmar , Jakob Uszkoreit , Llion Jones , Aidan N. Gomez , Lukasz Kaiser , and Illia Polosukhin . 2017 . Attention is All you Need . In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is All you Need. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017."},{"key":"e_1_2_1_34_1","volume-title":"Le","author":"Yang Zhilin","year":"2019","unstructured":"Zhilin Yang , Zihang Dai , Yiming Yang , Jaime Carbonell , Ruslan Salakhutdinov , and Quoc V . Le . 2019 . XLNet: Generalized Autoregressive Pretraining for Language Understanding . Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, and Quoc V. Le. 2019. XLNet: Generalized Autoregressive Pretraining for Language Understanding."},{"key":"e_1_2_1_35_1","volume-title":"Do Transformers Really Perform Bad for Graph Representation? CoRR abs\/2106.05234","author":"Ying Chengxuan","year":"2021","unstructured":"Chengxuan Ying , Tianle Cai , Shengjie Luo , Shuxin Zheng , Guolin Ke , Di He , Yanming Shen , and Tie-Yan Liu . 2021. Do Transformers Really Perform Bad for Graph Representation? CoRR abs\/2106.05234 ( 2021 ). Chengxuan Ying, Tianle Cai, Shengjie Luo, Shuxin Zheng, Guolin Ke, Di He, Yanming Shen, and Tie-Yan Liu. 2021. Do Transformers Really Perform Bad for Graph Representation? CoRR abs\/2106.05234 (2021)."},{"key":"e_1_2_1_36_1","volume-title":"Reinforcement Learning with Tree-LSTM for Join Order Selection. In 2020 IEEE 36th International Conference on Data Engineering (ICDE).","author":"Yu Xiang","year":"2020","unstructured":"Xiang Yu , Guoliang Li , Chengliang Chai , and Nan Tang . 2020 . Reinforcement Learning with Tree-LSTM for Join Order Selection. In 2020 IEEE 36th International Conference on Data Engineering (ICDE). Xiang Yu, Guoliang Li, Chengliang Chai, and Nan Tang. 2020. Reinforcement Learning with Tree-LSTM for Join Order Selection. In 2020 IEEE 36th International Conference on Data Engineering (ICDE)."},{"key":"e_1_2_1_37_1","volume-title":"Automatic View Generation with Deep Learning and Reinforcement Learning. In 36th IEEE International Conference on Data Engineering, ICDE","author":"Yuan Haitao","year":"2020","unstructured":"Haitao Yuan , Guoliang Li , Ling Feng , Ji Sun , and Yue Han . 2020 . Automatic View Generation with Deep Learning and Reinforcement Learning. In 36th IEEE International Conference on Data Engineering, ICDE 2020. Haitao Yuan, Guoliang Li, Ling Feng, Ji Sun, and Yue Han. 2020. Automatic View Generation with Deep Learning and Reinforcement Learning. In 36th IEEE International Conference on Data Engineering, ICDE 2020."},{"key":"e_1_2_1_38_1","volume-title":"Automatic View Generation with Deep Learning and Reinforcement Learning. In 36th IEEE International Conference on Data Engineering, ICDE 2020","author":"Yuan Haitao","year":"2020","unstructured":"Haitao Yuan , Guoliang Li , Ling Feng , Ji Sun , and Yue Han . 2020 . Automatic View Generation with Deep Learning and Reinforcement Learning. In 36th IEEE International Conference on Data Engineering, ICDE 2020 , 2020. Haitao Yuan, Guoliang Li, Ling Feng, Ji Sun, and Yue Han. 2020. Automatic View Generation with Deep Learning and Reinforcement Learning. In 36th IEEE International Conference on Data Engineering, ICDE 2020, 2020."},{"key":"e_1_2_1_39_1","volume-title":"Feng Cheng, Shixuan Sun, and Bingsheng He.","author":"Zhi Kang Johan Kok","year":"2021","unstructured":"Johan Kok Zhi Kang , Gaurav , Sien Yi Tan , Feng Cheng, Shixuan Sun, and Bingsheng He. 2021 . Efficient Deep Learning Pipelines for Accurate Cost Estimations Over Large Scale Query Workload . Johan Kok Zhi Kang, Gaurav, Sien Yi Tan, Feng Cheng, Shixuan Sun, and Bingsheng He. 2021. Efficient Deep Learning Pipelines for Accurate Cost Estimations Over Large Scale Query Workload."},{"key":"e_1_2_1_40_1","volume-title":"VLDB","author":"Zilio Daniel C.","year":"2004","unstructured":"Daniel C. Zilio , Jun Rao , Sam Lightstone , Guy M. Lohman , Adam J. Storm , Christian Garcia-Arellano , and Scott Fadden . 2004 . DB2 Design Advisor: Integrated Automatic Physical Database Design. In (e)Proceedings of the Thirtieth International Conference on Very Large Data Bases , VLDB 2004. Daniel C. Zilio, Jun Rao, Sam Lightstone, Guy M. Lohman, Adam J. Storm, Christian Garcia-Arellano, and Scott Fadden. 2004. DB2 Design Advisor: Integrated Automatic Physical Database Design. In (e)Proceedings of the Thirtieth International Conference on Very Large Data Bases, VLDB 2004."}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3529337.3529349","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T09:47:39Z","timestamp":1672220859000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3529337.3529349"}},"subtitle":["a tree transformer model for query plan representation"],"short-title":[],"issued":{"date-parts":[[2022,4]]},"references-count":40,"journal-issue":{"issue":"8","published-print":{"date-parts":[[2022,4]]}},"alternative-id":["10.14778\/3529337.3529349"],"URL":"https:\/\/doi.org\/10.14778\/3529337.3529349","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2022,4]]}}}