{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,19]],"date-time":"2026-04-19T01:24:41Z","timestamp":1776561881296,"version":"3.51.2"},"reference-count":62,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2023,5,26]],"date-time":"2023-05-26T00:00:00Z","timestamp":1685059200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["61872315"],"award-info":[{"award-number":["61872315"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. ACM Manag. Data"],"published-print":{"date-parts":[[2023,5,26]]},"abstract":"<jats:p>Generation-based testing techniques have shown their effectiveness in detecting logic bugs of DBMS, which are often caused by improper implementation of query optimizers. Nonetheless, existing generation-based debug tools are limited to single-table queries and there is a substantial research gap regarding multi-table queries with join operators. In this paper, we propose TQS, a novel testing framework targeted at detecting logic bugs derived by queries involving multi-table joins. Given a target DBMS, TQS achieves the goal with two key components: Data-guided Schema and Query Generation (DSG) and Knowledge-guided Query Space Exploration (KQE). DSG addresses the key challenge of multi-table query debugging: how to generate ground-truth (query, result) pairs for verification. It adopts the database normalization technique to generate a testing schema and maintains a bitmap index for result tracking. To improve debug efficiency, DSG also artificially inserts some noises into the generated data. To avoid repetitive query space search, KQE forms the problem as isomorphic graph set discovery and combines the graph embedding and weighted random walk for query generation. We evaluated TQS on four popular DBMSs: MySQL, MariaDB, TiDB and PolarDB. Experimental results show that TQS is effective in finding logic bugs of join optimization in database management systems. It successfully detected 115 bugs within 24 hours, including 31 bugs in MySQL, 30 in MariaDB, 31 in TiDB, and 23 in PolarDB respectively.<\/jats:p>","DOI":"10.1145\/3588909","type":"journal-article","created":{"date-parts":[[2023,5,30]],"date-time":"2023-05-30T17:42:05Z","timestamp":1685468525000},"page":"1-26","source":"Crossref","is-referenced-by-count":26,"title":["Detecting Logic Bugs of Join Optimizations in DBMS"],"prefix":"10.1145","volume":"1","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8611-0283","authenticated-orcid":false,"given":"Xiu","family":"Tang","sequence":"first","affiliation":[{"name":"Zhejiang University, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7903-1496","authenticated-orcid":false,"given":"Sai","family":"Wu","sequence":"additional","affiliation":[{"name":"Zhejiang University, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9964-2470","authenticated-orcid":false,"given":"Dongxiang","family":"Zhang","sequence":"additional","affiliation":[{"name":"Zhejiang University, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-0770-5775","authenticated-orcid":false,"given":"Feifei","family":"Li","sequence":"additional","affiliation":[{"name":"Zhejiang University, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7483-0045","authenticated-orcid":false,"given":"Gang","family":"Chen","sequence":"additional","affiliation":[{"name":"Zhejiang University, Hangzhou, China"}]}],"member":"320","published-online":{"date-parts":[[2023,5,30]]},"reference":[{"key":"e_1_2_2_1_1","volume-title":"Tamer \u00d6 zsu, and Khuzaima Daudjee","author":"Alucc G\u00fc","year":"2014","unstructured":"G\u00fc nes Alucc, Olaf Hartig, M. Tamer \u00d6 zsu, and Khuzaima Daudjee. 2014a. Diversified Stress Testing of RDF Data Management Systems. In ISWC. Springer, 197--212."},{"key":"e_1_2_2_2_1","volume-title":"International Semantic Web Conference. 197--212","author":"Alucc G\u00fcnecs","year":"2014","unstructured":"G\u00fcnecs Alucc, Olaf Hartig, M Tamer \u00d6zsu, and Khuzaima Daudjee. 2014b. Diversified stress testing of RDF data management systems. In International Semantic Web Conference. 197--212."},{"key":"e_1_2_2_3_1","doi-asserted-by":"publisher","DOI":"10.14778\/3204028.3204034"},{"key":"e_1_2_2_4_1","volume-title":"Aur\u00e9 lien Lemay, and Nicky Advokaat","author":"Bagan Guillaume","year":"2017","unstructured":"Guillaume Bagan, Angela Bonifati, Radu Ciucanu, George H. L. Fletcher, Aur\u00e9 lien Lemay, and Nicky Advokaat. 2017. gMark: Schema-Driven Generation of Graphs and Queries. In ICDE. 63--64."},{"key":"e_1_2_2_5_1","unstructured":"Hardik Bati Leo Giakoumakis Steve Herbert and Aleksandras Surna. 2007. A genetic approach for random testing of database systems. In VLDB. ACM 1243--1251."},{"key":"e_1_2_2_6_1","volume-title":"Tamer \u00d6 zsu","author":"Binnig Carsten","year":"2007","unstructured":"Carsten Binnig, Donald Kossmann, Eric Lo, and M. Tamer \u00d6 zsu. 2007. QAGen: generating query-aware test databases. In SIGMOD. ACM, 341--352."},{"key":"e_1_2_2_7_1","volume-title":"Semantics in Databases","author":"Biskup Joachim","unstructured":"Joachim Biskup. 1995. Achievements of Relational Database Schema Design Theory Revisited. In Semantics in Databases. Springer, 29--54."},{"key":"e_1_2_2_8_1","volume-title":"Bernstein","author":"Biskup Joachim","year":"1979","unstructured":"Joachim Biskup, Umeshwar Dayal, and Philip A. Bernstein. 1979. Synthesizing Independent Database Schemas. In SIGMOD, Philip A. Bernstein (Ed.). ACM, 143--151."},{"key":"e_1_2_2_9_1","first-page":"1","article-title":"The berlin sparql benchmark","volume":"5","author":"Bizer Christian","year":"2009","unstructured":"Christian Bizer and Andreas Schultz. 2009. The berlin sparql benchmark. IJSWIS, Vol. 5, 2 (2009), 1--24.","journal-title":"IJSWIS"},{"key":"e_1_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-21064-8_2"},{"key":"e_1_2_2_11_1","volume-title":"Graph homomorphisms and universal algebra course notes. TU Dresden","author":"Bodirsky Manuel","year":"2015","unstructured":"Manuel Bodirsky. 2015. Graph homomorphisms and universal algebra course notes. TU Dresden (2015)."},{"key":"e_1_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/1670412.1670413"},{"key":"e_1_2_2_13_1","unstructured":"Nicolas Bruno and Surajit Chaudhuri. 2005. Flexible Database Generators. In VLDB. ACM 1097--1107."},{"key":"e_1_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2006.190"},{"key":"e_1_2_2_15_1","doi-asserted-by":"crossref","unstructured":"Wei Cao Yingqiang Zhang Xinjun Yang Feifei Li Sheng Wang Qingda Hu Xuntao Cheng Zongzhi Chen Zhenjun Liu Jing Fang Bo Wang Yuhui Wang Haiqing Sun Ze Yang Zhushi Cheng Sen Chen Jian Wu Wei Hu Jianwei Zhao Yusong Gao Songlu Cai Yunyang Zhang and Jiawang Tong. 2021. PolarDB Serverless: A Cloud Native Database for Disaggregated Data Centers. In SIGMOD. ACM 2477--2489.","DOI":"10.1145\/3448016.3457560"},{"key":"e_1_2_2_16_1","volume-title":"NASA Formal Methods - 4th International Symposium","author":"Cuoq Pascal","unstructured":"Pascal Cuoq, Benjamin Monate, Anne Pacalet, Virgile Prevosto, John Regehr, Boris Yakobowski, and Xuejun Yang. 2012. Testing Static Analyzers with Randomly Generated Programs. In NASA Formal Methods - 4th International Symposium, NFM. Springer, 120--125."},{"key":"e_1_2_2_17_1","volume-title":"LMKG: Learned Models for Cardinality Estimation in Knowledge Graphs. In EDBT. OpenProceedings.org, 2:169--2:182.","author":"Davitkova Angjela","year":"2022","unstructured":"Angjela Davitkova, Damjan Gjurovski, and Sebastian Michel. 2022. LMKG: Learned Models for Cardinality Estimation in Knowledge Graphs. In EDBT. OpenProceedings.org, 2:169--2:182."},{"key":"e_1_2_2_18_1","unstructured":"DB-Engines. 2018. DB-Engines Ranking. [EB\/OL]. https:\/\/db-engines.com\/en\/ranking."},{"key":"e_1_2_2_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/44498.44499"},{"key":"e_1_2_2_20_1","first-page":"730","article-title":"Efficient Streaming Subgraph Isomorphism with Graph Neural Networks","volume":"14","author":"Duong Chi Thang","year":"2021","unstructured":"Chi Thang Duong, Dung Hoang, Hongzhi Yin, Matthias Weidlich, Quoc Viet Hung Nguyen, and Karl Aberer. 2021. Efficient Streaming Subgraph Isomorphism with Graph Neural Networks. VLDB, Vol. 14, 5 (2021), 730--742.","journal-title":"VLDB"},{"key":"e_1_2_2_21_1","doi-asserted-by":"crossref","unstructured":"Orri Erling Alex Averbuch Josep Larriba-Pey Hassan Chafi Andrey Gubichev Arnau Prat Minh-Duc Pham and Peter Boncz. 2015. The LDBC social network benchmark: Interactive workload. In SIGMOD. 619--630.","DOI":"10.1145\/2723372.2742786"},{"key":"e_1_2_2_22_1","volume-title":"Johnson","author":"Garey M. R.","year":"1979","unstructured":"M. R. Garey and David S. Johnson. 1979. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman."},{"key":"e_1_2_2_23_1","volume-title":"Weinberger","author":"Gray Jim","year":"1994","unstructured":"Jim Gray, Prakash Sundaresan, Susanne Englert, Kenneth Baclawski, and Peter J. Weinberger. 1994. Quickly Generating Billion-Record Synthetic Databases. In SIGMOD. ACM Press, 243--252."},{"key":"e_1_2_2_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/2304510.2304525"},{"key":"e_1_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.websem.2005.06.005"},{"key":"e_1_2_2_26_1","unstructured":"Kenneth Houkj\u00e6r Kristian Torp and Rico Wind. 2006. Simple and Realistic Data Generation. In VLDB. ACM 1243--1246."},{"key":"e_1_2_2_27_1","first-page":"3072","article-title":"TiDB: a Raft-based HTAP database","volume":"13","author":"Huang Dongxu","year":"2020","unstructured":"Dongxu Huang, Qi Liu, Qiu Cui, Zhuhe Fang, Xiaoyu Ma, Fei Xu, Li Shen, Liu Tang, Yuxing Zhou, Menglong Huang, et al. 2020. TiDB: a Raft-based HTAP database. VLDB, Vol. 13, 12 (2020), 3072--3084.","journal-title":"VLDB"},{"key":"e_1_2_2_28_1","volume-title":"TANE: An efficient algorithm for discovering functional and approximate dependencies. The computer journal","author":"Huhtala Yka","year":"1999","unstructured":"Yka Huhtala, Juha K\"arkk\"ainen, Pasi Porkka, and Hannu Toivonen. 1999. TANE: An efficient algorithm for discovering functional and approximate dependencies. The computer journal, Vol. 42, 2 (1999), 100--111."},{"key":"e_1_2_2_29_1","first-page":"57","article-title":"APOLLO","volume":"13","author":"Jung Jinho","year":"2019","unstructured":"Jinho Jung, Hong Hu, Joy Arulraj, Taesoo Kim, and Woon-Hak Kang. 2019. APOLLO: Automatic Detection and Diagnosis of Performance Regressions in Database Systems. VLDB, Vol. 13, 1 (2019), 57--70.","journal-title":"VLDB"},{"key":"e_1_2_2_30_1","volume-title":"Automatic testing of symbolic execution engines via program generation and differential testing","author":"Kapus Timotej","unstructured":"Timotej Kapus and Cristian Cadar. 2017. Automatic testing of symbolic execution engines via program generation and differential testing. In ASE. IEEE Computer Society, 590--600."},{"key":"e_1_2_2_31_1","volume-title":"Query-Aware Test Generation Using a Relational Constraint Solver","author":"Khalek Shadi Abdul","unstructured":"Shadi Abdul Khalek, Bassem Elkarablieh, Yai O. Laleye, and Sarfraz Khurshid. 2008. Query-Aware Test Generation Using a Relational Constraint Solver. In ASE. IEEE Computer Society, 238--247."},{"key":"e_1_2_2_32_1","doi-asserted-by":"crossref","unstructured":"Shadi Abdul Khalek and Sarfraz Khurshid. 2010. Automated SQL query generation for systematic testing of database engines. In ASE. ACM 329--332.","DOI":"10.1145\/1858996.1859063"},{"key":"e_1_2_2_33_1","doi-asserted-by":"crossref","unstructured":"Vu Le Mehrdad Afshari and Zhendong Su. 2014. Compiler validation via equivalence modulo inputs. In PLDI. ACM 216--226.","DOI":"10.1145\/2594291.2594334"},{"key":"e_1_2_2_34_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-009-0157-y"},{"key":"e_1_2_2_35_1","volume-title":"The Theory of Relational Databases","author":"Maier David","unstructured":"David Maier. 1983. The Theory of Relational Databases. Computer Science Press."},{"key":"e_1_2_2_36_1","unstructured":"MariaDB. 2022. MariaDB hints. [EB\/OL]. https:\/\/mariadb.com\/kb\/en\/optimizer-switch\/."},{"key":"e_1_2_2_37_1","unstructured":"Mariadb. 2022. Mariadb Homepage. [EB\/OL]. https:\/\/mariadb.org\/."},{"key":"e_1_2_2_38_1","first-page":"100","article-title":"Differential Testing for Software","volume":"10","author":"McKeeman William M.","year":"1998","unstructured":"William M. McKeeman. 1998. Differential Testing for Software. Digit. Tech. J., Vol. 10, 1 (1998), 100--107.","journal-title":"Digit. Tech. J."},{"key":"e_1_2_2_39_1","doi-asserted-by":"crossref","unstructured":"Chaitanya Mishra Nick Koudas and Calisto Zuzarte. 2008. Generating targeted queries for database testing. In SIGMOD. ACM 499--510.","DOI":"10.1145\/1376616.1376668"},{"key":"e_1_2_2_40_1","unstructured":"MySQL. 2022a. MySQL hints. [EB\/OL]. https:\/\/dev.mysql.com\/doc\/refman\/8.0\/en\/optimizer-hints.html."},{"key":"e_1_2_2_41_1","unstructured":"MySQL. 2022b. MySQL Homepage. [EB\/OL]. https:\/\/www.mysql.com."},{"key":"e_1_2_2_42_1","unstructured":"Stack Overflow. 2021. Developer Survey Results. [EB\/OL]. https:\/\/insights.stackoverflow.com\/survey\/2021."},{"key":"e_1_2_2_43_1","volume-title":"A Hybrid Approach to Functional Dependency Discovery","author":"Papenbrock Thorsten","unstructured":"Thorsten Papenbrock and Felix Naumann. 2016. A Hybrid Approach to Functional Dependency Discovery. In SIGMOD, Fatma\u00d6zcan, Georgia Koutrika, and Sam Madden (Eds.). ACM, 821--833."},{"key":"e_1_2_2_44_1","unstructured":"Thorsten Papenbrock and Felix Naumann. 2017. Data-driven Schema Normalization. In EDBT. OpenProceedings.org 342--353."},{"key":"e_1_2_2_45_1","volume-title":"Stephens","author":"Poess Meikel","year":"2004","unstructured":"Meikel Poess and John M. Stephens. 2004. Generating Thousand Benchmark Queries in Seconds. In VLDB. Morgan Kaufmann, 1045--1053."},{"key":"e_1_2_2_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/2897356.2897362"},{"key":"e_1_2_2_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/3368089.3409710"},{"key":"e_1_2_2_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/3428279"},{"key":"e_1_2_2_49_1","unstructured":"Manuel Rigger and Zhendong Su. 2020c. SQLancer. [EB\/OL]. https:\/\/github.com\/sqlancer\/sqlancer."},{"key":"e_1_2_2_50_1","unstructured":"Manuel Rigger and Zhendong Su. 2020d. Testing database engines via pivoted query synthesis. In OSDI 20. 667--682."},{"key":"e_1_2_2_51_1","volume-title":"a SPARQL performance benchmark","author":"Schmidt Michael","unstructured":"Michael Schmidt, Thomas Hornung, Georg Lausen, and Christoph Pinkel. 2009. SP^ 2Bench: a SPARQL performance benchmark. In ICDE. IEEE, 222--233."},{"key":"e_1_2_2_52_1","unstructured":"Andreas Seltenreich. 2020. SQLSmith. [EB\/OL]. https:\/\/github.com\/anse1\/sqlsmith."},{"key":"e_1_2_2_53_1","unstructured":"Donald R. Slutz. 1998. Massive Stochastic Testing of SQL. In VLDB Ashish Gupta Oded Shmueli and Jennifer Widom (Eds.). Morgan Kaufmann 618--622."},{"key":"e_1_2_2_54_1","unstructured":"TiDB. 2022. TiDB hints. [EB\/OL]. https:\/\/docs.pingcap.com\/tidb\/v5.3\/optimizer-hints."},{"key":"e_1_2_2_55_1","volume-title":"Rundensteiner","author":"Vartak Manasi","year":"2010","unstructured":"Manasi Vartak, Venkatesh Raghavan, and Elke A. Rundensteiner. 2010. QRelX: generating meaningful queries that provide cardinality assurance. In SIGMOD. ACM, 1215--1218."},{"key":"e_1_2_2_56_1","first-page":"1458","article-title":"Embedded Functional Dependencies and Data-completeness Tailored Database Design","volume":"12","author":"Wei Ziheng","year":"2019","unstructured":"Ziheng Wei and Sebastian Link. 2019. Embedded Functional Dependencies and Data-completeness Tailored Database Design. VLDB, Vol. 12, 11 (2019), 1458--1470.","journal-title":"VLDB"},{"key":"e_1_2_2_57_1","volume-title":"Compressing Bitmap Indexes for Faster Search Operations","author":"Wu Kesheng","unstructured":"Kesheng Wu, Ekow J. Otoo, and Arie Shoshani. 2002. Compressing Bitmap Indexes for Faster Search Operations. In SSDBM. IEEE Computer Society, 99--108."},{"key":"e_1_2_2_58_1","first-page":"1","article-title":"Snowtrail: Testing with Production Queries on a Cloud Database","volume":"4","author":"Yan Jiaqi","year":"2018","unstructured":"Jiaqi Yan, Qiuye Jin, Shrainik Jain, Stratis D. Viglas, and Allison W. Lee. 2018. Snowtrail: Testing with Production Queries on a Cloud Database. In SIGMOD. ACM, 4:1--4:6.","journal-title":"SIGMOD. ACM"},{"key":"e_1_2_2_59_1","doi-asserted-by":"crossref","unstructured":"Xuejun Yang Yang Chen Eric Eide and John Regehr. 2011. Finding and understanding bugs in C compilers. In SIGPLAN PLDI. ACM 283--294.","DOI":"10.1145\/1993498.1993532"},{"key":"e_1_2_2_60_1","first-page":"984","article-title":"Efficient Bi-triangle Counting for Large Bipartite Networks","volume":"14","author":"Yang Yixing","year":"2021","unstructured":"Yixing Yang, Yixiang Fang, Maria E. Orlowska, Wenjie Zhang, and Xuemin Lin. 2021. Efficient Bi-triangle Counting for Large Bipartite Networks. VLDB, Vol. 14, 6 (2021), 984--996.","journal-title":"VLDB"},{"key":"e_1_2_2_61_1","volume-title":"Neural Subgraph Matching. CoRR","author":"Ying Rex","year":"2020","unstructured":"Rex Ying, Zhaoyu Lou, Jiaxuan You, Chengtao Wen, Arquimedes Canedo, and Jure Leskovec. 2020. Neural Subgraph Matching. CoRR, Vol. abs\/2007.03092 (2020)."},{"key":"e_1_2_2_62_1","volume-title":"SQUIRREL: Testing Database Management Systems with Language Validity and Coverage Feedback. In CCS. ACM, 955--970.","author":"Zhong Rui","year":"2020","unstructured":"Rui Zhong, Yongheng Chen, Hong Hu, Hangfan Zhang, Wenke Lee, and Dinghao Wu. 2020. SQUIRREL: Testing Database Management Systems with Language Validity and Coverage Feedback. In CCS. ACM, 955--970."}],"container-title":["Proceedings of the ACM on Management of Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3588909","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3588909","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:47:36Z","timestamp":1750178856000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3588909"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,5,26]]},"references-count":62,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2023,5,26]]}},"alternative-id":["10.1145\/3588909"],"URL":"https:\/\/doi.org\/10.1145\/3588909","relation":{},"ISSN":["2836-6573"],"issn-type":[{"value":"2836-6573","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,5,26]]}}}