{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,11]],"date-time":"2026-04-11T02:17:06Z","timestamp":1775873826409,"version":"3.50.1"},"reference-count":45,"publisher":"Association for Computing Machinery (ACM)","issue":"8","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2022,4]]},"abstract":"<jats:p>\n            This paper is an experimental and analytical study of two classes of summary-based cardinality estimators that use statistics about input relations and small-size joins in the context of graph database management systems: (i) optimistic estimators that make uniformity and conditional independence assumptions; and (ii) the recent pessimistic estimators that use information theoretic linear programs (LPs). We begin by analyzing how optimistic estimators use pre-computed statistics to generate cardinality estimates. We show these estimators can be modeled as picking bottom-to-top paths in a\n            <jats:italic>cardinality estimation graph<\/jats:italic>\n            (CEG), which contains sub-queries as nodes and edges whose weights are average degree statistics. We show that existing optimistic estimators have either undefined or fixed choices for picking CEG paths as their estimates and ignore alternative choices. Instead, we outline a space of optimistic estimators to make an estimate on CEGs, which subsumes existing estimators. We show, using an extensive empirical analysis, that effective paths depend on the structure of the queries. While on acyclic queries and queries with small-size cycles, using the maximum-weight path is effective to address the well known underestimation problem, on queries with larger cycles these estimates tend to overestimate, which can be addressed by using minimum weight paths. We next show that optimistic estimators and seemingly disparate LP-based pessimistic estimators are in fact connected. Specifically, we show that CEGs can also model some recent pessimistic estimators. This connection allows us to adopt an optimization from pessimistic estimators to optimistic ones, and provide insights into the pessimistic estimators, such as showing that they have combinatorial solutions.\n          <\/jats:p>","DOI":"10.14778\/3529337.3529339","type":"journal-article","created":{"date-parts":[[2022,6,22]],"date-time":"2022-06-22T22:23:05Z","timestamp":1655936585000},"page":"1533-1545","source":"Crossref","is-referenced-by-count":17,"title":["Accurate summary-based cardinality estimation through the lens of cardinality estimation graphs"],"prefix":"10.14778","volume":"15","author":[{"given":"Jeremy","family":"Chen","sequence":"first","affiliation":[{"name":"University of Waterloo"}]},{"given":"Yuqing","family":"Huang","sequence":"additional","affiliation":[{"name":"University of Waterloo"}]},{"given":"Mushi","family":"Wang","sequence":"additional","affiliation":[{"name":"University of Waterloo"}]},{"given":"Semih","family":"Salihoglu","sequence":"additional","affiliation":[{"name":"University of Waterloo"}]},{"given":"Ken","family":"Salem","sequence":"additional","affiliation":[{"name":"University of Waterloo"}]}],"member":"320","published-online":{"date-parts":[[2022,6,22]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"crossref","unstructured":"Mahmoud Abo Khamis Hung Q. Ngo and Dan Suciu. 2016. Computing Join Queries with Functional Dependencies. In PODS.  Mahmoud Abo Khamis Hung Q. Ngo and Dan Suciu. 2016. Computing Join Queries with Functional Dependencies. In PODS.","DOI":"10.1145\/2902251.2902289"},{"key":"e_1_2_1_2_1","volume-title":"Naughton","author":"Aboulnaga Ashraf","year":"2001","unstructured":"Ashraf Aboulnaga , Alaa R. Alameldeen , and Jeffrey F . Naughton . 2001 . Estimating the Selectivity of XML Path Expressions for Internet Scale Applications. In VLDB. Ashraf Aboulnaga, Alaa R. Alameldeen, and Jeffrey F. Naughton. 2001. Estimating the Selectivity of XML Path Expressions for Internet Scale Applications. In VLDB."},{"key":"e_1_2_1_3_1","doi-asserted-by":"crossref","unstructured":"Ashraf Aboulnaga and Surajit Chaudhuri. 1999. Self-Tuning Histograms: Building Histograms Without Looking at Data. In SIGMOD.  Ashraf Aboulnaga and Surajit Chaudhuri. 1999. Self-Tuning Histograms: Building Histograms Without Looking at Data. In SIGMOD.","DOI":"10.1145\/304182.304198"},{"key":"e_1_2_1_4_1","doi-asserted-by":"crossref","unstructured":"G\u00fcne\u015f Alu\u00e7 Olaf Hartig M. Tamer \u00d6zsu and Khuzaima Daudjee. 2014. Diversified Stress Testing of RDF Data Management Systems. In ISWC.  G\u00fcne\u015f Alu\u00e7 Olaf Hartig M. Tamer \u00d6zsu and Khuzaima Daudjee. 2014. Diversified Stress Testing of RDF Data Management Systems. In ISWC.","DOI":"10.1007\/978-3-319-11964-9_13"},{"key":"e_1_2_1_5_1","doi-asserted-by":"crossref","unstructured":"A. Atserias M. Grohe and D. Marx. 2013. Size Bounds and Query Plans for Relational Joins. SICOMP 42 4 (2013).  A. Atserias M. Grohe and D. Marx. 2013. Size Bounds and Query Plans for Relational Joins. SICOMP 42 4 (2013).","DOI":"10.1137\/110859440"},{"key":"e_1_2_1_6_1","unstructured":"Walter Cai Magdalena Balazinska and Dan Suciu. 2019. Pessimistic Cardinality Estimation: Tighter Upper Bounds for Intermediate Join Cardinalities. In SIGMOD.  Walter Cai Magdalena Balazinska and Dan Suciu. 2019. Pessimistic Cardinality Estimation: Tighter Upper Bounds for Intermediate Join Cardinalities. In SIGMOD."},{"key":"e_1_2_1_8_1","unstructured":"Yu Chen and Ke Yi. 2020. Random Sampling and Size Estimation Over Cyclic Joins. In ICDT.  Yu Chen and Ke Yi. 2020. Random Sampling and Size Estimation Over Cyclic Joins. In ICDT."},{"key":"e_1_2_1_9_1","volume-title":"Introduction to Algorithms (2 ed.)","author":"Cormen Thomas H.","unstructured":"Thomas H. Cormen , Charles E. Leiserson , Ronald L. Rivest , and Clifford Stein . 2001. Introduction to Algorithms (2 ed.) . The MIT Press . Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. 2001. Introduction to Algorithms (2 ed.). The MIT Press."},{"key":"e_1_2_1_10_1","first-page":"2012","volume":"201","unstructured":"DBLP 201 2. DBLP 2012 - 2011 -28 Dump. https:\/\/dblp.org\/. DBLP 2012. DBLP 2012-11-28 Dump. https:\/\/dblp.org\/.","journal-title":"DBLP"},{"key":"e_1_2_1_11_1","unstructured":"Epinions 2003. Epinions. https:\/\/snap.stanford.edu\/data\/soc-Epinions1.html.  Epinions 2003. Epinions. https:\/\/snap.stanford.edu\/data\/soc-Epinions1.html."},{"key":"e_1_2_1_12_1","doi-asserted-by":"crossref","unstructured":"Lise Getoor Benjamin Taskar and Daphne Koller. 2001. Selectivity Estimation Using Probabilistic Models. In SIGMOD.  Lise Getoor Benjamin Taskar and Daphne Koller. 2001. Selectivity Estimation Using Probabilistic Models. In SIGMOD.","DOI":"10.1145\/375663.375727"},{"key":"e_1_2_1_13_1","volume-title":"Gregory Valiant, and Paul Valiant.","author":"Gottlob Georg","year":"2012","unstructured":"Georg Gottlob , Stephanie Tien Lee , Gregory Valiant, and Paul Valiant. 2012 . Size and Treewidth Bounds for Conjunctive Queries. JACM 59, 3 (2012). Georg Gottlob, Stephanie Tien Lee, Gregory Valiant, and Paul Valiant. 2012. Size and Treewidth Bounds for Conjunctive Queries. JACM 59, 3 (2012)."},{"key":"e_1_2_1_14_1","unstructured":"Graphflow 2019. Graphflow Source Code. https:\/\/tinyurl.com\/wyuzb9pr.  Graphflow 2019. Graphflow Source Code. https:\/\/tinyurl.com\/wyuzb9pr."},{"key":"e_1_2_1_15_1","volume-title":"Arun N.","author":"Naughton Peter","year":"1996","unstructured":"Haas, Peter J. and Naughton , Jeffrey F. and Seshadri , S . and Swami , Arun N. 1996 . Selectivity and Cost Estimation for Joins Based on Random Sampling. JCSS 52, 3 (1996). Haas, Peter J. and Naughton, Jeffrey F. and Seshadri, S. and Swami, Arun N. 1996. Selectivity and Cost Estimation for Joins Based on Random Sampling. JCSS 52, 3 (1996)."},{"key":"e_1_2_1_16_1","unstructured":"Hetionet 2015. Hetionet v1.0. https:\/\/het.io\/.  Hetionet 2015. Hetionet v1.0. https:\/\/het.io\/."},{"key":"e_1_2_1_17_1","first-page":"4","volume":"62","author":"Joglekar Manas","year":"2018","unstructured":"Manas Joglekar and Christopher R\u00e9. 2018. It's All a Matter of Degree - Using Degree Information to Optimize Multiway Joins. TOCS 62 , 4 ( 2018 ). Manas Joglekar and Christopher R\u00e9. 2018. It's All a Matter of Degree - Using Degree Information to Optimize Multiway Joins. TOCS 62, 4 (2018).","journal-title":"Optimize Multiway Joins. TOCS"},{"key":"e_1_2_1_18_1","unstructured":"Kyoungmin Kim Hyeonji Kim George Fletcher and Wook-Shin Han. 2021. Combining Sampling and Synopses with Worst-Case Optimal Runtime and Quality Guarantees for Graph Pattern Cardinality Estimation. In SIGMOD.  Kyoungmin Kim Hyeonji Kim George Fletcher and Wook-Shin Han. 2021. Combining Sampling and Synopses with Worst-Case Optimal Runtime and Quality Guarantees for Graph Pattern Cardinality Estimation. In SIGMOD."},{"key":"e_1_2_1_19_1","volume-title":"Really? PVLDB 9, 3","author":"Leis Viktor","year":"2015","unstructured":"Viktor Leis , Andrey Gubichev , Atanas Mirchev , Peter Boncz , Alfons Kemper , and Thomas Neumann . 2015. How Good Are Query Optimizers , Really? PVLDB 9, 3 ( 2015 ). Viktor Leis, Andrey Gubichev, Atanas Mirchev, Peter Boncz, Alfons Kemper, and Thomas Neumann. 2015. How Good Are Query Optimizers, Really? PVLDB 9, 3 (2015)."},{"key":"e_1_2_1_20_1","unstructured":"Viktor Leis Bernhard Radke Andrey Gubichev Alfons Kemper and Thomas Neumann. 2017. Cardinality Estimation Done Right: Index-Based Join Sampling. In CIDR.  Viktor Leis Bernhard Radke Andrey Gubichev Alfons Kemper and Thomas Neumann. 2017. Cardinality Estimation Done Right: Index-Based Join Sampling. In CIDR."},{"key":"e_1_2_1_21_1","volume-title":"Query Optimization Through the Looking Glass, and What We Found Running the Join Order Benchmark. VLDBJ 27, 5","author":"Leis Viktor","year":"2018","unstructured":"Viktor Leis , Bernhard Radke , Andrey Gubichev , Atanas Mirchev , Peter Boncz , Alfons Kemper , and Thomas Neumann . 2018. Query Optimization Through the Looking Glass, and What We Found Running the Join Order Benchmark. VLDBJ 27, 5 ( 2018 ). Viktor Leis, Bernhard Radke, Andrey Gubichev, Atanas Mirchev, Peter Boncz, Alfons Kemper, and Thomas Neumann. 2018. Query Optimization Through the Looking Glass, and What We Found Running the Join Order Benchmark. VLDBJ 27, 5 (2018)."},{"key":"e_1_2_1_22_1","volume-title":"Wander Join: Online Aggregation via Random Walks. In SIGMOD.","author":"Li Feifei","year":"2016","unstructured":"Feifei Li , Bin Wu , Ke Yi , and Zhuoyue Zhao . 2016 . Wander Join: Online Aggregation via Random Walks. In SIGMOD. Feifei Li, Bin Wu, Ke Yi, and Zhuoyue Zhao. 2016. Wander Join: Online Aggregation via Random Walks. In SIGMOD."},{"key":"e_1_2_1_23_1","unstructured":"Angela Maduko Kemafor Anyanwu Amit Sheth and Paul Schliekelman. 2008. Graph Summaries for Subgraph Frequency Estimation. In ESWC.  Angela Maduko Kemafor Anyanwu Amit Sheth and Paul Schliekelman. 2008. Graph Summaries for Subgraph Frequency Estimation. In ESWC."},{"key":"e_1_2_1_24_1","volume-title":"Jeffrey Scott Vitter, and Min Wang","author":"Matias Yossi","year":"1998","unstructured":"Yossi Matias , Jeffrey Scott Vitter, and Min Wang . 1998 . Wavelet-Based Histograms for Selectivity Estimation. In SIGMOD. Yossi Matias, Jeffrey Scott Vitter, and Min Wang. 1998. Wavelet-Based Histograms for Selectivity Estimation. In SIGMOD."},{"key":"e_1_2_1_25_1","volume-title":"Optimizing Subgraph Queries by Combining Binary and Worst-Case Optimal Joins. PVLDB 12, 11","author":"Mhedhbi Amine","year":"2019","unstructured":"Amine Mhedhbi and Semih Salihoglu . 2019. Optimizing Subgraph Queries by Combining Binary and Worst-Case Optimal Joins. PVLDB 12, 11 ( 2019 ). Amine Mhedhbi and Semih Salihoglu. 2019. Optimizing Subgraph Queries by Combining Binary and Worst-Case Optimal Joins. PVLDB 12, 11 (2019)."},{"key":"e_1_2_1_26_1","first-page":"i","volume":"198","author":"Muralikrishna M.","unstructured":"M. Muralikrishna and David J. DeWitt. 198 8. Equ i -Depth Histograms for Estimating Selectivity Factors for Multi-Dimensional Queries. In SIGMOD. M. Muralikrishna and David J. DeWitt. 1988. Equi-Depth Histograms for Estimating Selectivity Factors for Multi-Dimensional Queries. In SIGMOD.","journal-title":"David J. DeWitt."},{"key":"e_1_2_1_27_1","unstructured":"Parimarjan Negi Ryan Marcus Hongzi Mao Nesime Tatbul Tim Kraska and Mohammad Alizadeh. 2020. Cost-Guided Cardinality Estimation: Focus Where it Matters. In ICDEW.  Parimarjan Negi Ryan Marcus Hongzi Mao Nesime Tatbul Tim Kraska and Mohammad Alizadeh. 2020. Cost-Guided Cardinality Estimation: Focus Where it Matters. In ICDEW."},{"key":"e_1_2_1_28_1","volume-title":"Characteristic Sets: Accurate Cardinality Estimation for RDF Queries with Multiple Joins. In ICDE.","author":"Neumann Thomas","year":"2011","unstructured":"Thomas Neumann and Guido Moerkotte . 2011 . Characteristic Sets: Accurate Cardinality Estimation for RDF Queries with Multiple Joins. In ICDE. Thomas Neumann and Guido Moerkotte. 2011. Characteristic Sets: Accurate Cardinality Estimation for RDF Queries with Multiple Joins. In ICDE."},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.14778\/1453856.1453927"},{"key":"e_1_2_1_30_1","unstructured":"Yannis Papakonstantinou Hector Garcia-Molina and Jennifer Widom. 1995. Object Exchange Across Heterogeneous Information Sources. In ICDE.  Yannis Papakonstantinou Hector Garcia-Molina and Jennifer Widom. 1995. Object Exchange Across Heterogeneous Information Sources. In ICDE."},{"key":"e_1_2_1_31_1","doi-asserted-by":"crossref","unstructured":"Yeonsu Park Seongyun Ko Sourav S. Bhowmick Kyoungmin Kim Kijae Hong and Wook-Shin Han. 2020. G-CARE: A Framework for Performance Benchmarking of Cardinality Estimation Techniques for Subgraph Matching. In SIGMOD.  Yeonsu Park Seongyun Ko Sourav S. Bhowmick Kyoungmin Kim Kijae Hong and Wook-Shin Han. 2020. G-CARE: A Framework for Performance Benchmarking of Cardinality Estimation Techniques for Subgraph Matching. In SIGMOD.","DOI":"10.1145\/3318464.3389702"},{"key":"e_1_2_1_32_1","doi-asserted-by":"crossref","unstructured":"Neoklis Polyzotis and Minos Garofalakis. 2002. Statistical Synopses for Graph-Structured XML Databases. In SIGMOD.  Neoklis Polyzotis and Minos Garofalakis. 2002. Statistical Synopses for Graph-Structured XML Databases. In SIGMOD.","DOI":"10.1145\/564691.564733"},{"key":"e_1_2_1_33_1","doi-asserted-by":"crossref","unstructured":"Neoklis Polyzotis Minos Garofalakis and Yannis Ioannidis. 2004. Approximate XML Query Answers. In SIGMOD.  Neoklis Polyzotis Minos Garofalakis and Yannis Ioannidis. 2004. Approximate XML Query Answers. In SIGMOD.","DOI":"10.1145\/1007568.1007599"},{"key":"e_1_2_1_34_1","volume-title":"Ioannidis","author":"Poosala Viswanath","year":"1997","unstructured":"Viswanath Poosala and Yannis E . Ioannidis . 1997 . Selectivity Estimation Without the Attribute Value Independence Assumption. In VLDB. Viswanath Poosala and Yannis E. Ioannidis. 1997. Selectivity Estimation Without the Attribute Value Independence Assumption. In VLDB."},{"key":"e_1_2_1_35_1","unstructured":"RDF3X 2020. RDF-3X Source Code. https:\/\/github.com\/gh-rdf3x\/gh-rdf3x\/.  RDF3X 2020. RDF-3X Source Code. https:\/\/github.com\/gh-rdf3x\/gh-rdf3x\/."},{"key":"e_1_2_1_36_1","volume-title":"Kostylev","author":"Stefanoni Giorgio","year":"2018","unstructured":"Giorgio Stefanoni , Boris Motik , and Egor V . Kostylev . 2018 . Estimating the Cardinality of Conjunctive Queries over RDF Data Using Graph Summarisation. In WWW. Giorgio Stefanoni, Boris Motik, and Egor V. Kostylev. 2018. Estimating the Cardinality of Conjunctive Queries over RDF Data Using Graph Summarisation. In WWW."},{"key":"e_1_2_1_37_1","doi-asserted-by":"crossref","unstructured":"Wei Sun Yibei Ling Naphtali Rishe and Yi Deng. 1993. An Instant and Accurate Size Estimation Method for Joins and Selections in a Retrieval-Intensive Environment. In SIGMOD.  Wei Sun Yibei Ling Naphtali Rishe and Yi Deng. 1993. An Instant and Accurate Size Estimation Method for Joins and Selections in a Retrieval-Intensive Environment. In SIGMOD.","DOI":"10.1145\/170035.170055"},{"key":"e_1_2_1_38_1","volume-title":"Mohamed Zait, and Sunil P. Chakkappen.","author":"Vengerov David","year":"2015","unstructured":"David Vengerov , Andre Cavalheiro Menck , Mohamed Zait, and Sunil P. Chakkappen. 2015 . Join Size Estimation Subject to Filter Conditions. PVLDB 8, 12 (2015). David Vengerov, Andre Cavalheiro Menck, Mohamed Zait, and Sunil P. Chakkappen. 2015. Join Size Estimation Subject to Filter Conditions. PVLDB 8, 12 (2015)."},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1016\/B978-012088469-8.50024-3"},{"key":"e_1_2_1_40_1","unstructured":"WatDiv2014. WatDiv v.0.6. https:\/\/dsg.uwaterloo.ca\/watdiv\/.  WatDiv2014. WatDiv v.0.6. https:\/\/dsg.uwaterloo.ca\/watdiv\/."},{"key":"e_1_2_1_41_1","doi-asserted-by":"crossref","unstructured":"Lucas Woltmann Claudio Hartmann Maik Thiele Dirk Habich and Wolfgang Lehner. 2019. Cardinality Estimation with Local Deep Learning Models. In aiDM.  Lucas Woltmann Claudio Hartmann Maik Thiele Dirk Habich and Wolfgang Lehner. 2019. Cardinality Estimation with Local Deep Learning Models. In aiDM.","DOI":"10.1145\/3329859.3329875"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.14778\/3476311.3476327"},{"key":"e_1_2_1_43_1","unstructured":"Wentao Wu Jeffrey F. Naughton and Harneet Singh. 2016. Sampling-Based Query Re-Optimization. In SIGMOD.  Wentao Wu Jeffrey F. Naughton and Harneet Singh. 2016. Sampling-Based Query Re-Optimization. In SIGMOD."},{"key":"e_1_2_1_44_1","unstructured":"Yuqing Wu Jignesh M. Patel and H. V. Jagadish. 2002. Estimating Answer Sizes for XML Queries. In EDBT.  Yuqing Wu Jignesh M. Patel and H. V. Jagadish. 2002. Estimating Answer Sizes for XML Queries. In EDBT."},{"key":"e_1_2_1_45_1","unstructured":"YAGO 2008. YAGO 1. https:\/\/yago-knowledge.org\/downloads\/yago-1.  YAGO 2008. YAGO 1. https:\/\/yago-knowledge.org\/downloads\/yago-1."},{"key":"e_1_2_1_46_1","volume-title":"Ilyas","author":"Zhang Ning","year":"2006","unstructured":"Ning Zhang , M. Tamer Ozsu , Ashraf Aboulnaga , and Ihab F . Ilyas . 2006 . XSEED : Accurate and Fast Cardinality Estimation for XPath Queries. In ICDE. Ning Zhang, M. Tamer Ozsu, Ashraf Aboulnaga, and Ihab F. Ilyas. 2006. XSEED: Accurate and Fast Cardinality Estimation for XPath Queries. In ICDE."}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3529337.3529339","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T09:43:57Z","timestamp":1672220637000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3529337.3529339"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,4]]},"references-count":45,"journal-issue":{"issue":"8","published-print":{"date-parts":[[2022,4]]}},"alternative-id":["10.14778\/3529337.3529339"],"URL":"https:\/\/doi.org\/10.14778\/3529337.3529339","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2022,4]]}}}