{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,21]],"date-time":"2025-02-21T17:38:32Z","timestamp":1740159512317,"version":"3.37.3"},"reference-count":32,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2022,1,10]],"date-time":"2022-01-10T00:00:00Z","timestamp":1641772800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,1,10]],"date-time":"2022-01-10T00:00:00Z","timestamp":1641772800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100002957","name":"Technische Universit\u00e4t Dresden","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100002957","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Datenbank Spektrum"],"published-print":{"date-parts":[[2022,3]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Cardinality estimation is a\u00a0fundamental task in database query processing and optimization. As shown in recent papers, machine learning (ML)-based approaches may deliver more accurate cardinality estimations than traditional approaches. However, a\u00a0lot of training queries have to be executed during the<jats:italic>model training phase<\/jats:italic>to learn a\u00a0data-dependent ML model making it very time-consuming. Many of those training or example queries use the same base data, have the same query structure, and only differ in their selective predicates. To speed up the model training phase, our core idea is to determine a\u00a0<jats:italic>predicate-independent pre-aggregation<\/jats:italic>of the base data and to execute the example queries over this pre-aggregated data. 
Based on this idea, we present a\u00a0specific<jats:italic>aggregate-based training phase<\/jats:italic>for ML-based cardinality estimation approaches in this paper. As we are going to show with different workloads in our evaluation, we are able to achieve an average speedup of 90 with our<jats:italic>aggregate-based training phase<\/jats:italic>and thus outperform indexes.<\/jats:p>","DOI":"10.1007\/s13222-021-00400-z","type":"journal-article","created":{"date-parts":[[2022,1,10]],"date-time":"2022-01-10T12:04:03Z","timestamp":1641816243000},"page":"45-57","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["Aggregate-based Training Phase for ML-based Cardinality Estimation"],"prefix":"10.1007","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0720-8878","authenticated-orcid":false,"given":"Lucas","family":"Woltmann","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Claudio","family":"Hartmann","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Dirk","family":"Habich","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Wolfgang","family":"Lehner","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2022,1,10]]},"reference":[{"issue":"3","key":"400_CR1","doi-asserted-by":"publisher","first-page":"204","DOI":"10.14778\/2850583.2850594","volume":"9","author":"V Leis","year":"2015","unstructured":"Leis V, Gubichev A, Mirchev A et al (2015) How good are query optimizers, really? 
Proc VLDB Endow 9(3):204\u2013215","journal-title":"Proc VLDB Endow"},{"issue":"4","key":"400_CR2","doi-asserted-by":"publisher","first-page":"499","DOI":"10.1145\/3186728.3164145","volume":"11","author":"H Harmouch","year":"2017","unstructured":"Harmouch H, Naumann F (2017) Cardinality estimation: an experimental survey. Proc VLDB Endow 11(4):499\u2013512","journal-title":"Proc VLDB Endow"},{"issue":"1","key":"400_CR3","doi-asserted-by":"publisher","first-page":"982","DOI":"10.14778\/1687627.1687738","volume":"2","author":"G Moerkotte","year":"2009","unstructured":"Moerkotte G, Neumann T, Steidl G (2009) Preventing bad plans by bounding the impact of cardinality estimation errors. Proc VLDB Endow 2(1):982\u2013993","journal-title":"Proc VLDB Endow"},{"key":"400_CR4","first-page":"409","volume-title":"VLDB","author":"K Youssefi","year":"1979","unstructured":"Youssefi K, Wong E (1979) Query processing in a\u00a0relational database management system. In: VLDB, pp 409\u2013417"},{"key":"400_CR5","first-page":"864","volume-title":"ICDE","author":"P Fender","year":"2011","unstructured":"Fender P, Moerkotte G (2011) A\u00a0new, highly efficient, and easy to implement top-down join enumeration algorithm. In: ICDE, pp 864\u2013875"},{"key":"400_CR6","first-page":"1","volume-title":"ADMS@VLDB","author":"V Rosenfeld","year":"2015","unstructured":"Rosenfeld V, Heimel M, Viebig C et al (2015) The operator variant selection problem on heterogeneous hardware. In: ADMS@VLDB, pp 1\u201312"},{"issue":"3","key":"400_CR7","doi-asserted-by":"publisher","first-page":"9:1","DOI":"10.1145\/3323991","volume":"44","author":"P Damme","year":"2019","unstructured":"Damme P, Ungeth\u00fcm A, Hildebrandt J et al (2019) From a\u00a0comprehensive experimental survey to a\u00a0cost-based selection strategy for lightweight integer compression algorithms. ACM Trans Database Syst 44(3):9:1\u20139:46. 
https:\/\/doi.org\/10.1145\/3323991","journal-title":"ACM Trans Database Syst"},{"issue":"7","key":"400_CR8","doi-asserted-by":"publisher","first-page":"733","DOI":"10.14778\/3067421.3067423","volume":"10","author":"T Karnagel","year":"2017","unstructured":"Karnagel T, Habich D, Lehner W (2017) Adaptive work placement for query processing on heterogeneous computing resources. Proc VLDB Endow 10(7):733\u2013744","journal-title":"Proc VLDB Endow"},{"key":"400_CR9","volume-title":"CIDR","author":"A Kipf","year":"2019","unstructured":"Kipf A, Kipf T, Radke B et al (2019) Learned cardinalities: Estimating correlated joins with deep learning. In: CIDR"},{"key":"400_CR10","first-page":"53","volume-title":"CASCON","author":"H Liu","year":"2015","unstructured":"Liu H, Xu M, Yu Z et al (2015) Cardinality estimation using neural networks. In: CASCON, pp 53\u201359"},{"key":"400_CR11","doi-asserted-by":"publisher","first-page":"5:1","DOI":"10.1145\/3329859.3329875","volume-title":"Proceedings of the Second International Workshop on Exploiting Artificial Intelligence Techniques for Data Management, aiDM at SIGMOD 2019","author":"L Woltmann","year":"2019","unstructured":"Woltmann L, Hartmann C, Thiele M et al (2019) Cardinality estimation with local deep learning models. In: Bordawekar R, Shmueli O (eds) Proceedings of the Second International Workshop on Exploiting Artificial Intelligence Techniques for Data Management, aiDM at SIGMOD 2019 Amsterdam, The Netherlands, 5 July 2019. vol\u00a010. ACM, pp\u00a05:1\u20135:8 https:\/\/doi.org\/10.1145\/3329859.3329875"},{"key":"400_CR12","series-title":"Proceedings, LNI","doi-asserted-by":"publisher","first-page":"135","DOI":"10.18420\/btw2021-07","volume-title":"Datenbanksysteme f\u00fcr Business, Technologie und Web (BTW 2021), 19. 
Fachtagung des GI-Fachbereichs \u201cDatenbanken und Informationssysteme\u201d (DBIS)","author":"L Woltmann","year":"2021","unstructured":"Woltmann L, Hartmann C, Habich D et al (2021) Aggregate-based training phase for ML-based cardinality estimation. In: Sattler K, Herschel M, Lehner W (eds) Datenbanksysteme f\u00fcr Business, Technologie und Web (BTW 2021), 19. Fachtagung des GI-Fachbereichs \u201cDatenbanken und Informationssysteme\u201d (DBIS) Dresden, Germany, 13.-17. September 2021. Proceedings, LNI, vol P\u2011311. Gesellschaft f\u00fcr Informatik, Bonn, pp 135\u2013154 https:\/\/doi.org\/10.18420\/btw2021-07"},{"key":"400_CR13","volume-title":"ICDE","author":"J Gray","year":"1996","unstructured":"Gray J, Bosworth A, Layman A et al (1996) Data cube: a\u00a0relational aggregation operator generalizing group-by, cross-tab, and sub-totals. In: ICDE"},{"key":"400_CR14","unstructured":"IMDb (2017) Imdb: Internet movie database. ftp:\/\/ftp.fu-berlin.de\/pub\/misc\/movies\/database\/frozendata\/. Accessed 1 Sept 2021"},{"issue":"11","key":"400_CR15","doi-asserted-by":"publisher","first-page":"1733","DOI":"10.14778\/3342263.3342646","volume":"12","author":"R Marcus","year":"2019","unstructured":"Marcus R, Papaemmanouil O (2019) Plan-structured deep neural network models for query performance prediction. Proc VLDB Endow 12(11):1733\u20131746. https:\/\/doi.org\/10.14778\/3342263.3342646","journal-title":"Proc VLDB Endow"},{"issue":"3","key":"400_CR16","doi-asserted-by":"publisher","first-page":"307","DOI":"10.14778\/3368289.3368296","volume":"13","author":"J Sun","year":"2019","unstructured":"Sun J, Li G (2019) An end-to-end learning-based cost estimator. Proc VLDB Endow 13(3):307\u2013319. https:\/\/doi.org\/10.14778\/3368289.3368296","journal-title":"Proc VLDB Endow"},{"key":"400_CR17","first-page":"489","volume-title":"SIGMOD","author":"T Kraska","year":"2018","unstructured":"Kraska T, Beutel A, Chi EH et al (2018) The case for learned index structures. 
In: SIGMOD, pp 489\u2013504"},{"key":"400_CR18","unstructured":"Kipf A (2019) Learned cardinalities in pytorch. https:\/\/github.com\/andreaskipf\/learnedcardinalities\/. Accessed 20 Oct 2021"},{"key":"400_CR19","doi-asserted-by":"crossref","unstructured":"Woltmann L (2019) Cardinality estimation with local deep learning models. https:\/\/github.com\/lucaswo\/cardest\/. Accessed 20 Oct 2021","DOI":"10.1145\/3329859.3329875"},{"key":"400_CR20","first-page":"506","volume-title":"VLDB","author":"S Agarwal","year":"1996","unstructured":"Agarwal S, Agrawal R, Deshpande P et al (1996) On the computation of multidimensional aggregates. In: VLDB, pp 506\u2013521"},{"key":"400_CR21","first-page":"205","volume-title":"SIGMOD","author":"V Harinarayan","year":"1996","unstructured":"Harinarayan V, Rajaraman A, Ullman JD (1996) Implementing data cubes efficiently. In: SIGMOD, pp 205\u2013216"},{"key":"400_CR22","first-page":"522","volume-title":"VLDB","author":"A Shukla","year":"1996","unstructured":"Shukla A, Deshpande P, Naughton JF et al (1996) Storage estimation for multidimensional aggregates in the presence of hierarchies. In: VLDB, pp 522\u2013531"},{"key":"400_CR23","first-page":"159","volume-title":"SIGMOD","author":"Y Zhao","year":"1997","unstructured":"Zhao Y, Deshpande P, Naughton JF (1997) An array-based algorithm for simultaneous multidimensional aggregates. In: SIGMOD, pp 159\u2013170"},{"key":"400_CR24","first-page":"1717","volume-title":"SIGMOD","author":"A Kumar","year":"2017","unstructured":"Kumar A, Boehm M, Yang J (2017) Data management in machine learning: challenges, techniques, and systems. In: SIGMOD, pp 1717\u20131722"},{"key":"400_CR25","doi-asserted-by":"publisher","first-page":"1937","DOI":"10.1145\/3299869.3320218","volume-title":"SIGMOD","author":"A Kipf","year":"2019","unstructured":"Kipf A, Vorona D, M\u00fcller J et al (2019) Estimating cardinalities with deep sketches. 
In: SIGMOD, pp 1937\u20131940 https:\/\/doi.org\/10.1145\/3299869.3320218"},{"key":"400_CR26","first-page":"28","volume":"35","author":"F F\u00e4rber","year":"2012","unstructured":"F\u00e4rber F, May N, Lehner W et al (2012) The SAP HANA database \u2013 an architecture overview. IEEE Data Eng Bull 35:28\u201333","journal-title":"IEEE Data Eng Bull"},{"key":"400_CR27","first-page":"287","volume-title":"KDD","author":"R Agrawal","year":"1996","unstructured":"Agrawal R, Shim K (1996) Developing tightly-coupled data mining applications on a\u00a0relational database system. In: KDD, pp 287\u2013290"},{"issue":"1","key":"400_CR28","doi-asserted-by":"publisher","first-page":"23","DOI":"10.1007\/s10844-007-0047-y","volume":"32","author":"C Cho","year":"2009","unstructured":"Cho C, Wu Y, Chen ALP (2009) Effective database transformation and efficient support computation for mining sequential patterns. J\u00a0Intell Inf Syst 32(1):23\u201351","journal-title":"J Intell Inf Syst"},{"key":"400_CR29","first-page":"429","volume-title":"VLDB","author":"A Hinneburg","year":"2003","unstructured":"Hinneburg A, Lehner W, Habich D (2003) Combi-operator: database support for data mining applications. In: VLDB, pp 429\u2013439"},{"key":"400_CR30","first-page":"379","volume-title":"ICDE","author":"A Netz","year":"2001","unstructured":"Netz A, Chaudhuri S, Fayyad UM et al (2001) Integrating data mining with SQL databases: OLE DB for data mining. In: ICDE, pp 379\u2013387"},{"key":"400_CR31","doi-asserted-by":"crossref","first-page":"559","DOI":"10.1145\/342009.335468","volume-title":"SIGMOD","author":"C Ordonez","year":"2000","unstructured":"Ordonez C, Cereghini P (2000) SQLEM: fast clustering in SQL using the EM algorithm. 
In: SIGMOD, pp 559\u2013570"},{"issue":"12","key":"400_CR32","doi-asserted-by":"publisher","first-page":"2715","DOI":"10.14778\/3476311.3476327","volume":"14","author":"L Woltmann","year":"2021","unstructured":"Woltmann L, Olwig D, Hartmann C et al (2021) PostCENN: postgresql with machine learning models for cardinality estimation. Proc VLDB Endow 14(12):2715\u20132718","journal-title":"Proc VLDB Endow"}],"container-title":["Datenbank-Spektrum"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s13222-021-00400-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s13222-021-00400-z\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s13222-021-00400-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,22]],"date-time":"2023-01-22T12:25:50Z","timestamp":1674390350000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s13222-021-00400-z"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,1,10]]},"references-count":32,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2022,3]]}},"alternative-id":["400"],"URL":"https:\/\/doi.org\/10.1007\/s13222-021-00400-z","relation":{},"ISSN":["1618-2162","1610-1995"],"issn-type":[{"type":"print","value":"1618-2162"},{"type":"electronic","value":"1610-1995"}],"subject":[],"published":{"date-parts":[[2022,1,10]]},"assertion":[{"value":"27 October 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"13 December 2021","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"10 January 
2022","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}