{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,4]],"date-time":"2025-12-04T06:19:52Z","timestamp":1764829192651,"version":"3.41.0"},"reference-count":50,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2024,2,28]],"date-time":"2024-02-28T00:00:00Z","timestamp":1709078400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"NSF","award":["IIS-0639106, IIS-0415101, and EIA-0080123"],"award-info":[{"award-number":["IIS-0639106, IIS-0415101, and EIA-0080123"]}]},{"name":"NRF","award":["NRF-2021R1I1A3056669 and NRF-2018R1A6A1A03025109"],"award-info":[{"award-number":["NRF-2021R1I1A3056669 and NRF-2018R1A6A1A03025109"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Database Syst."],"published-print":{"date-parts":[[2024,3,31]]},"abstract":"<jats:p>\n            The query optimization phase within a database management system (DBMS) ostensibly finds the fastest query execution plan from a potentially large set of enumerated plans, all of which correctly compute the same result of the specified query. Sometimes the cost-based optimizer selects a slower plan, for a variety of reasons. Previous work has focused on increasing the performance of specific components, often a single operator, within an individual DBMS. However, that does not address the fundamental question: from where does this suboptimality arise, across DBMSes generally? In particular, the contribution of each of many possible factors to DBMS suboptimality is currently unknown. To identify the root causes of DBMS suboptimality, we first introduce the notion of\n            <jats:italic>empirical suboptimality<\/jats:italic>\n            of a query plan chosen by the DBMS, indicated by the existence of a query plan that performs more efficiently than the chosen plan, for the same query. A crucial aspect is that this can be measured externally to the DBMS, and thus does not require access to its source code. We then propose a novel\n            <jats:italic>predictive model<\/jats:italic>\n            to explain the relationship between various factors in query optimization and empirical suboptimality. Our model associates suboptimality with the factors of complexity of the schema, of the underlying data on which the query is evaluated, of the query itself, and of the DBMS optimizer. The model also characterizes concomitant interactions among these factors. This model induces a number of specific hypotheses that were tested on multiple DBMSes. We performed a series of experiments that examined the plans for thousands of queries run on four popular DBMSes. We tested the model on over a million of these query executions, using correlational analysis, regression analysis, and causal analysis, specifically Structural Equation Modeling (SEM). We observed that the dependent construct of empirical suboptimality prevalence correlates positively with nine specific constructs characterizing four identified factors that explain in concert much of the variance of suboptimality of two extensive benchmarks, across these disparate DBMSes. This predictive model shows that it is the common aspects of these DBMSes that predict suboptimality,\n            <jats:italic>not<\/jats:italic>\n            the particulars embedded in the inordinate complexity of each of these DBMSes. This paper thus provides a new methodology to study mature query optimizers, identifies underlying DBMS-independent causes for the observed suboptimality, and quantifies the relative contribution of each of these causes to the observed suboptimality. This work thus provides a roadmap for fundamental improvements of cost-based query optimizers.\n          <\/jats:p>","DOI":"10.1145\/3636425","type":"journal-article","created":{"date-parts":[[2024,1,10]],"date-time":"2024-01-10T12:24:31Z","timestamp":1704889471000},"page":"1-40","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["Identifying the Root Causes of DBMS Suboptimality"],"prefix":"10.1145","volume":"49","author":[{"ORCID":"https:\/\/orcid.org\/0009-0003-2818-6353","authenticated-orcid":false,"given":"Sabah","family":"Currim","sequence":"first","affiliation":[{"name":"Office of Budget and Planning, University of Arizona, Tucson, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4703-7460","authenticated-orcid":false,"given":"Richard T.","family":"Snodgrass","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Arizona, Tucson, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3124-2566","authenticated-orcid":false,"given":"Young-Kyoon","family":"Suh","sequence":"additional","affiliation":[{"name":"School of Computer Science &amp; Engineering, Kyungpook National University, Daegu, South Korea"}]}],"member":"320","published-online":{"date-parts":[[2024,2,28]]},"reference":[{"key":"e_1_3_4_2_2","doi-asserted-by":"publisher","DOI":"10.1145\/1322432.1322433"},{"key":"e_1_3_4_3_2","doi-asserted-by":"publisher","DOI":"10.4018\/jkm.2007040101"},{"key":"e_1_3_4_4_2","doi-asserted-by":"publisher","DOI":"10.1145\/342009.335420"},{"key":"e_1_3_4_5_2","doi-asserted-by":"publisher","DOI":"10.1145\/1066157.1066172"},{"key":"e_1_3_4_6_2","doi-asserted-by":"publisher","DOI":"10.1037\/0022-3514.51.6.1173"},{"key":"e_1_3_4_7_2","doi-asserted-by":"publisher","DOI":"10.14778\/2536222.2536235"},{"key":"e_1_3_4_8_2","doi-asserted-by":"publisher","DOI":"10.2307\/258353"},{"key":"e_1_3_4_9_2","doi-asserted-by":"publisher","DOI":"10.4204\/EPTCS.26.6"},{"key":"e_1_3_4_10_2","doi-asserted-by":"publisher","DOI":"10.5555\/208443"},{"key":"e_1_3_4_11_2","doi-asserted-by":"publisher","DOI":"10.1145\/2996454"},{"key":"e_1_3_4_12_2","doi-asserted-by":"publisher","DOI":"10.1145\/2463676.2465331"},{"key":"e_1_3_4_13_2","volume-title":"The Design of Experiments (9 ed.)","author":"Fisher Ronald A.","year":"1971","unstructured":"Ronald A. Fisher. 1971. The Design of Experiments (9 ed.). Macmillan, New York."},{"key":"e_1_3_4_14_2","doi-asserted-by":"publisher","DOI":"10.1145\/152610.152611"},{"key":"e_1_3_4_15_2","doi-asserted-by":"publisher","DOI":"10.1145\/3186728.3164145"},{"key":"e_1_3_4_16_2","first-page":"1081","volume-title":"Proceedings of the International Conference on Very Large Data Bases","author":"Harish D.","year":"2007","unstructured":"D. Harish, Pooja N. Darera, and Jayant R. Haritsa. 2007. On the production of anorexic plan diagrams. In Proceedings of the International Conference on Very Large Data Bases. VLDB Endowment, Vienna, Austria, 1081\u20131092."},{"issue":"1","key":"e_1_3_4_17_2","first-page":"1517","article-title":"The Picasso database query optimizer visualizer","volume":"3","author":"Haritsa Jayant R.","year":"2010","unstructured":"Jayant R. Haritsa. 2010. The Picasso database query optimizer visualizer. Proceedings of the International Conference on Very Large Data Bases 3, 1-2 (Sept.2010), 1517\u20131520.","journal-title":"Proceedings of the International Conference on Very Large Data Bases"},{"key":"e_1_3_4_18_2","doi-asserted-by":"publisher","DOI":"10.1145\/275487.275492"},{"key":"e_1_3_4_19_2","doi-asserted-by":"publisher","DOI":"10.1145\/234313.234367"},{"key":"e_1_3_4_20_2","doi-asserted-by":"publisher","DOI":"10.2307\/25148625"},{"key":"e_1_3_4_21_2","doi-asserted-by":"publisher","DOI":"10.1080\/10705519909540118"},{"key":"e_1_3_4_22_2","doi-asserted-by":"publisher","DOI":"10.1016\/B978-155860869-6\/50023-8"},{"key":"e_1_3_4_23_2","unstructured":"IMDb.com. 2020. IMDb Datasets. URL: https:\/\/www.imdb.com\/interfaces\/."},{"key":"e_1_3_4_24_2","doi-asserted-by":"publisher","DOI":"10.1016\/B978-012722442-8\/50011-2"},{"volume-title":"ISO SQL:2008 International Standard","year":"2008","key":"e_1_3_4_25_2","unstructured":"ISO. 2008. ISO SQL:2008 International Standard. Technical Report. International Organization for Standardization, Geneva, Switzerland."},{"key":"e_1_3_4_26_2","doi-asserted-by":"publisher","DOI":"10.1145\/356924.356928"},{"key":"e_1_3_4_27_2","doi-asserted-by":"publisher","DOI":"10.1145\/276304.276315"},{"key":"e_1_3_4_28_2","doi-asserted-by":"publisher","DOI":"10.14778\/2180912.2180913"},{"key":"e_1_3_4_29_2","doi-asserted-by":"publisher","DOI":"10.14778\/2850583.2850594"},{"key":"e_1_3_4_30_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2007.367848"},{"key":"e_1_3_4_31_2","doi-asserted-by":"publisher","DOI":"10.1145\/62061.62063"},{"issue":"3","key":"e_1_3_4_32_2","first-page":"59","article-title":"Causality in databases","volume":"33","author":"Meliou Alexandra","year":"2010","unstructured":"Alexandra Meliou, Wolfgang Gatterbauer, Joseph Y. Halpern, Christoph Koch, Katherine F. Moore, and Dan Suciu. 2010. Causality in databases. IEEE Data Eng. Bull. 33, 3 (2010), 59\u201367.","journal-title":"IEEE Data Eng. Bull."},{"key":"e_1_3_4_33_2","doi-asserted-by":"publisher","DOI":"10.14778\/2733004.2733070"},{"key":"e_1_3_4_34_2","volume-title":"Advanced SQL: 1999","author":"Melton Jim","year":"2003","unstructured":"Jim Melton. 2003. Advanced SQL: 1999. Morgan Kaufmann, Burlington, MA."},{"key":"e_1_3_4_35_2","doi-asserted-by":"publisher","DOI":"10.5555\/647520.727704"},{"key":"e_1_3_4_36_2","doi-asserted-by":"publisher","DOI":"10.1007\/s002360050048"},{"key":"e_1_3_4_37_2","doi-asserted-by":"publisher","DOI":"10.2202\/1557-4679.1203"},{"key":"e_1_3_4_38_2","first-page":"68","volume-title":"Handbook of Structural Equation Modeling","author":"Pearl Judea","year":"2012","unstructured":"Judea Pearl. 2012. The causal foundations of structural equation modeling. In Handbook of Structural Equation Modeling. The Guilford Press, New York, NY, US, 68\u201391."},{"key":"e_1_3_4_39_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-87993-0_3"},{"key":"e_1_3_4_40_2","first-page":"1228","volume-title":"Proceedings of the International Conference on Very Large Data Bases","author":"Reddy Naveen","year":"2005","unstructured":"Naveen Reddy and Jayant R. Haritsa. 2005. Analyzing plan diagrams of database query optimizers. In Proceedings of the International Conference on Very Large Data Bases. VLDB Endowment, Trondheim, Norway, 1228\u20131239."},{"key":"e_1_3_4_41_2","doi-asserted-by":"publisher","DOI":"10.1145\/872757.872804"},{"key":"e_1_3_4_42_2","doi-asserted-by":"publisher","DOI":"10.18637\/jss.v048.i02"},{"key":"e_1_3_4_43_2","first-page":"7","volume-title":"Proceedings of the 8th USENIX Conference on Theory and Practice of Provenance (TaPP\u201916)","author":"Salimi Babak","year":"2016","unstructured":"Babak Salimi, Leopoldo Bertossi, Dan Suciu, and Guy Van Den Broeck. 2016. Quantifying causal effects on query answering in databases. In Proceedings of the 8th USENIX Conference on Theory and Practice of Provenance (TaPP\u201916). USENIX Association, Washington, DC, 7\u201310."},{"key":"e_1_3_4_44_2","doi-asserted-by":"publisher","DOI":"10.1145\/335191.336564"},{"key":"e_1_3_4_45_2","doi-asserted-by":"publisher","DOI":"10.1145\/582095.582099"},{"key":"e_1_3_4_46_2","doi-asserted-by":"publisher","DOI":"10.1145\/3508024"},{"key":"e_1_3_4_47_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.is.2016.12.004"},{"key":"e_1_3_4_48_2","doi-asserted-by":"publisher","DOI":"10.14778\/2733004.2733050"},{"key":"e_1_3_4_49_2","first-page":"122","volume-title":"The SAGE Encyclopedia of Communication Research Methods","author":"Warner Benjamin R.","year":"2018","unstructured":"Benjamin R. Warner. 2018. Causality. In The SAGE Encyclopedia of Communication Research Methods, Mike Allen (Ed.). SAGE Publications, Inc, Thousand Oaks, CA, US, 122\u2013124."},{"key":"e_1_3_4_50_2","doi-asserted-by":"publisher","DOI":"10.1145\/565117.565127"},{"key":"e_1_3_4_51_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2013.6544881"}],"container-title":["ACM Transactions on Database Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3636425","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3636425","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:35:41Z","timestamp":1750178141000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3636425"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,2,28]]},"references-count":50,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2024,3,31]]}},"alternative-id":["10.1145\/3636425"],"URL":"https:\/\/doi.org\/10.1145\/3636425","relation":{},"ISSN":["0362-5915","1557-4644"],"issn-type":[{"type":"print","value":"0362-5915"},{"type":"electronic","value":"1557-4644"}],"subject":[],"published":{"date-parts":[[2024,2,28]]},"assertion":[{"value":"2021-11-29","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-10-04","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-02-28","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}