{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,19]],"date-time":"2026-02-19T17:02:52Z","timestamp":1771520572639,"version":"3.50.1"},"reference-count":60,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2023,5,26]],"date-time":"2023-05-26T00:00:00Z","timestamp":1685059200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001809","name":"The National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["61825205 61902343"],"award-info":[{"award-number":["61825205 61902343"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"The Fundamental Research Funds for the Central Universities","award":["2021FZZX001-25"],"award-info":[{"award-number":["2021FZZX001-25"]}]},{"name":"The Zhejiang Provincial Natural Science Foundation for Distinguished Young Scholars","award":["LR21F020005"],"award-info":[{"award-number":["LR21F020005"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. ACM Manag. Data"],"published-print":{"date-parts":[[2023,5,26]]},"abstract":"<jats:p>Cardinality estimation, predicting the query result size, is a fundamental problem in databases. Existing skyline cardinality estimation methods are computationally infeasible for massive skyline queries over the large-scale database. In this paper, we introduce a unified skyline family w.r.t. various skyline variants. We propose an efficient and effective skyline family cardinality estimation model, named EECE, in an end-to-end manner. EECE consists of two modules, unsupervised data distribution learning (DDL) and supervised monotonic cardinality estimation (MCE). DDL leverages the mixture data guided transformer to learn the distribution of database and query parameters for model pre-training. MCE further incorporates supervised learning and parameter clamping to enhance the estimation under monotonicity guarantees. We develop an efficient incremental learning algorithm for EECE to adapt the database and query logs update. Extensive experiments on several real-world and synthetic datasets demonstrate that, EECE speeds up the cardinality estimation by six orders of magnitude, with more than 39% accuracy gain, compared to the state-of-the-art approaches.<\/jats:p>","DOI":"10.1145\/3588958","type":"journal-article","created":{"date-parts":[[2023,5,30]],"date-time":"2023-05-30T17:42:05Z","timestamp":1685468525000},"page":"1-21","source":"Crossref","is-referenced-by-count":3,"title":["Efficient and Effective Cardinality Estimation for Skyline Family"],"prefix":"10.1145","volume":"1","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-8632-1539","authenticated-orcid":false,"given":"Xiaoye","family":"Miao","sequence":"first","affiliation":[{"name":"Zhejiang University, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9531-0906","authenticated-orcid":false,"given":"Yangyang","family":"Wu","sequence":"additional","affiliation":[{"name":"Zhejiang University, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-6573-0918","authenticated-orcid":false,"given":"Jiazhen","family":"Peng","sequence":"additional","affiliation":[{"name":"Zhejiang University, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3816-8450","authenticated-orcid":false,"given":"Yunjun","family":"Gao","sequence":"additional","affiliation":[{"name":"Zhejiang University, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4703-7348","authenticated-orcid":false,"given":"Jianwei","family":"Yin","sequence":"additional","affiliation":[{"name":"Zhejiang University, Hangzhou, China"}]}],"member":"320","published-online":{"date-parts":[[2023,5,30]]},"reference":[{"key":"e_1_2_2_1_1","unstructured":"Stephan Borzsony Donald Kossmann and Konrad Stocker. 2001. The skyline operator. In ICDE. 421--430."},{"key":"e_1_2_2_2_1","volume-title":"Anthony KH Tung, and Zhenjie Zhang","author":"Chan Chee-Yong","year":"2006","unstructured":"Chee-Yong Chan, HV Jagadish, Kian-Lee Tan, Anthony KH Tung, and Zhenjie Zhang. 2006. Finding k-dominant skylines in high dimensional space. In SIGMOD. 503--514."},{"key":"e_1_2_2_3_1","doi-asserted-by":"crossref","unstructured":"Surajit Chaudhuri Nilesh Dalvi and Raghav Kaushik. 2006. Robust cardinality and cost estimation for skyline operator. In ICDE. 64--73.","DOI":"10.1109\/ICDE.2006.131"},{"key":"e_1_2_2_4_1","unstructured":"Mark Chen Alec Radford Rewon Child Jeffrey Wu Heewoo Jun David Luan and Ilya Sutskever. 2020. Generative pretraining from pixels. In ICML. 1691--1703."},{"key":"e_1_2_2_5_1","first-page":"1547","article-title":"The unique qualities of a geographic information system: A commentary","volume":"54","author":"Cooperative GI","year":"1988","unstructured":"GI Cooperative and Fort Collins. 1988. The unique qualities of a geographic information system: A commentary. Photogrammetric Engineering and Remote Sensing, Vol. 54, 11 (1988), 1547--9.","journal-title":"Photogrammetric Engineering and Remote Sensing"},{"key":"e_1_2_2_6_1","unstructured":"Evangelos Dellis and Bernhard Seeger. 2007. Efficient computation of reverse skyline queries.. In VLDB. 291--302."},{"key":"e_1_2_2_7_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1022491825047"},{"key":"e_1_2_2_8_1","doi-asserted-by":"publisher","DOI":"10.14778\/3329772.3329780"},{"key":"e_1_2_2_9_1","doi-asserted-by":"crossref","unstructured":"Hannes Eder and Fang Wei. 2009. Evaluation of skyline algorithms in PostgreSQL. In IDEAS. 334--337.","DOI":"10.1145\/1620432.1620473"},{"key":"e_1_2_2_10_1","unstructured":"Dumitru Erhan Aaron Courville Yoshua Bengio and Pascal Vincent. 2010. Why does unsupervised pre-training help deep learning?. In AISTATS. 201--208."},{"key":"e_1_2_2_11_1","doi-asserted-by":"crossref","unstructured":"Theodoros Evgeniou and Massimiliano Pontil. 2004. Regularized multi--task learning. In SIGKDD. 109--117.","DOI":"10.1145\/1014052.1014067"},{"key":"e_1_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11280-017-0444-2"},{"key":"e_1_2_2_13_1","doi-asserted-by":"crossref","unstructured":"Malay Haldar Prashant Ramanathan Tyler Sax Mustafa Abdool Lanbo Zhang Aamir Mansawala Shulin Yang Bradley Turnbull and Junshuo Liao. 2020. Improving deep learning for airbnb search. In SIGKDD. 2822--2830.","DOI":"10.1145\/3394486.3403333"},{"key":"e_1_2_2_14_1","volume-title":"Kai Zeng, Gao Cong, Yanzhao Qin, Andreas Pfadler, et al.","author":"Han Yuxing","year":"2021","unstructured":"Yuxing Han, Ziniu Wu, Peizhi Wu, Rong Zhu, Jingyi Yang, Liang Wei Tan, Kai Zeng, Gao Cong, Yanzhao Qin, Andreas Pfadler, et al. 2021. Cardinality estimation in DBMS: A comprehensive benchmark evaluation. ArXiv Preprint ArXiv:2109.05877 (2021)."},{"key":"e_1_2_2_15_1","volume-title":"Patrick Kamnang Wanko, and Sofian Maabout","author":"Hanusse Nicolas","year":"2016","unstructured":"Nicolas Hanusse, Patrick Kamnang Wanko, and Sofian Maabout. 2016. Using histograms for skyline size estimation. In IDEAS. 125--134."},{"key":"e_1_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/3186728.3164145"},{"key":"e_1_2_2_17_1","unstructured":"Robert L Heckman and William R King. 1994. Behavioral indicators of customer satisfaction with vendor-provided information services. In ICIS. 429--444."},{"key":"e_1_2_2_18_1","doi-asserted-by":"crossref","unstructured":"David Held Sebastian Thrun and Silvio Savarese. 2016. Learning to track at 100 FPS with deep regression networks. In ECCV. 749--765.","DOI":"10.1007\/978-3-319-46448-0_45"},{"key":"e_1_2_2_19_1","doi-asserted-by":"publisher","DOI":"10.14778\/3384345.3384349"},{"key":"e_1_2_2_20_1","volume-title":"Fleetrec: Large-scale recommendation inference on hybrid GPU-FPGA clusters. In SIGKDD. 3097--3105.","author":"Jiang Wenqi","year":"2021","unstructured":"Wenqi Jiang, Zhenhao He, Shuai Zhang, Kai Zeng, Liang Feng, Jiansong Zhang, Tongxuan Liu, Yong Li, Jingren Zhou, Ce Zhang, et al. 2021. Fleetrec: Large-scale recommendation inference on hybrid GPU-FPGA clusters. In SIGKDD. 3097--3105."},{"key":"e_1_2_2_21_1","volume-title":"A survey of skyline query processing. ArXiv Preprint ArXiv:1704.01788","author":"Kalyvas Christos","year":"2017","unstructured":"Christos Kalyvas and Theodoros Tzouramanis. 2017. A survey of skyline query processing. ArXiv Preprint ArXiv:1704.01788 (2017)."},{"key":"e_1_2_2_22_1","doi-asserted-by":"crossref","unstructured":"Werner Kie\u00dfling and Gerhard K\u00f6stler. 2002. Preference SQL$-$Design implementation experiences. In VLDB. 990--1001.","DOI":"10.1016\/B978-155860869-6\/50098-6"},{"key":"e_1_2_2_23_1","volume-title":"Seo, Wook-Shin Han, Kangwoo Choi, and Jaehyok Chong.","author":"Kim Kyoungmin","year":"2022","unstructured":"Kyoungmin Kim, Jisung Jung, In Seo, Wook-Shin Han, Kangwoo Choi, and Jaehyok Chong. 2022. Learned cardinality estimation: An in-depth study. In SIGMOD. 1214--1227."},{"key":"e_1_2_2_24_1","volume-title":"Learned cardinalities: Estimating correlated joins with deep learning. ArXiv Preprint ArXiv:1809.00677","author":"Kipf Andreas","year":"2018","unstructured":"Andreas Kipf, Thomas Kipf, Bernhard Radke, Viktor Leis, Peter Boncz, and Alfons Kemper. 2018. Learned cardinalities: Estimating correlated joins with deep learning. ArXiv Preprint ArXiv:1809.00677 (2018)."},{"key":"e_1_2_2_25_1","doi-asserted-by":"crossref","unstructured":"Julia A Lasserre Christopher M Bishop and Thomas P Minka. 2006. Principled hybrids of generative and discriminative models. In CVPR. 87--94.","DOI":"10.1109\/CVPR.2006.227"},{"key":"e_1_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.14778\/2850583.2850594"},{"key":"e_1_2_2_27_1","volume-title":"Warper: Efficiently adapting learned cardinality estimators to data and workload drifts. In SIGMOD. 1--14.","author":"Li Beibin","year":"2022","unstructured":"Beibin Li, Yao Lu, and Srikanth Kandula. 2022. Warper: Efficiently adapting learned cardinality estimators to data and workload drifts. In SIGMOD. 1--14."},{"key":"e_1_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.14778\/3476249.3476254"},{"key":"e_1_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10115-011-0441-1"},{"key":"e_1_2_2_30_1","volume-title":"BTW","author":"Mandl Stefan","year":"2015","unstructured":"Stefan Mandl, Oleksandr Kozachuk, Markus Endres, and Werner Kie\u00dfling. 2015. Preference analytics in EXASolution. In BTW 2015. 613--632."},{"key":"e_1_2_2_31_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ins.2016.07.034"},{"key":"e_1_2_2_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2019.2946798"},{"key":"e_1_2_2_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2015.2460742"},{"key":"e_1_2_2_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2018.2805824"},{"key":"e_1_2_2_35_1","doi-asserted-by":"publisher","DOI":"10.14778\/3494124.3494143"},{"key":"e_1_2_2_36_1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1109\/TKDE.2022.3186498","article-title":"An experimental survey of missing data imputation algorithms","volume":"1","author":"Miao Xiaoye","year":"2022","unstructured":"Xiaoye Miao, Yangyang Wu, Lu Chen, Yunjun Gao, and Jianwei Yin. 2022a. An experimental survey of missing data imputation algorithms. IEEE Transactions on Knowledge and Data Engineering, Vol. 1, 1 (2022), 1--20.","journal-title":"IEEE Transactions on Knowledge and Data Engineering"},{"key":"e_1_2_2_37_1","doi-asserted-by":"crossref","unstructured":"Xiaoye Miao Yangyang Wu Jun Wang Yunjun Gao Xudong Mao and Jianwei Yin. 2021. Generative semi-supervised learning for multivariate time series imputation. In AAAI. 8983--8991.","DOI":"10.1609\/aaai.v35i10.17086"},{"key":"e_1_2_2_38_1","doi-asserted-by":"crossref","unstructured":"Guido Moerkotte David DeHaan Norman May Anisoara Nica and Alexander B\u00f6hm. 2014. Exploiting ordered dictionaries to efficiently construct histograms with q-error guarantees in SAP HANA. In SIGMOD. 361--372.","DOI":"10.1145\/2588555.2595629"},{"key":"e_1_2_2_39_1","volume-title":"EANA: Reducing privacy risk on large-scale recommendation models. In RecSys. 399--407.","author":"Ning Lin","year":"2022","unstructured":"Lin Ning, Steve Chien, Shuang Song, Mei Chen, Yunqi Xue, and Devora Berlowitz. 2022. EANA: Reducing privacy risk on large-scale recommendation models. In RecSys. 399--407."},{"key":"e_1_2_2_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/1061318.1061320"},{"key":"e_1_2_2_41_1","unstructured":"Jian Pei Wen Jin Martin Ester and Yufei Tao. 2005. Catching the best views of skyline: A semantic approach based on decisive subspaces. In VLDB. 253--264."},{"key":"e_1_2_2_42_1","unstructured":"PostgreSQL. 1996. https:\/\/www.postgresql.org\/. (1996)."},{"key":"e_1_2_2_43_1","unstructured":"Alec Radford Jeffrey Wu Rewon Child David Luan Dario Amodei Ilya Sutskever et al. 2019. Language models are unsupervised multitask learners. OpenAI blog Vol. 1 8 (2019) 9."},{"key":"e_1_2_2_44_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.imavis.2018.04.004"},{"key":"e_1_2_2_45_1","doi-asserted-by":"crossref","unstructured":"Ji Sun Guoliang Li and Nan Tang. 2021a. Learned cardinality estimation for similarity queries. In SIGMOD. 1745--1757.","DOI":"10.1145\/3448016.3452790"},{"key":"e_1_2_2_46_1","doi-asserted-by":"publisher","DOI":"10.14778\/3485450.3485459"},{"key":"e_1_2_2_47_1","doi-asserted-by":"crossref","unstructured":"Xiu Tang Sai Wu Mingli Song Shanshan Ying Feifei Li and Gang Chen. 2022. PreQR: Pre-training representation for SQL understanding. In SIGMOD. 204--216.","DOI":"10.1145\/3514221.3517878"},{"key":"e_1_2_2_48_1","unstructured":"Hugo Touvron Matthieu Cord Matthijs Douze Francisco Massa Alexandre Sablayrolles and Herv\u00e9 J\u00e9gou. 2021. Training data-efficient image transformers & distillation through attention. In ICML. 10347--10357."},{"key":"e_1_2_2_49_1","unstructured":"Ashish Vaswani Noam Shazeer Niki Parmar Jakob Uszkoreit Llion Jones Aidan N Gomez \u0141ukasz Kaiser and Illia Polosukhin. 2017. Attention is all you need. In NeurIPS. 5998--6008."},{"key":"e_1_2_2_50_1","doi-asserted-by":"publisher","DOI":"10.14778\/3485450.3485458"},{"key":"e_1_2_2_51_1","doi-asserted-by":"publisher","DOI":"10.14778\/3461535.3461552"},{"key":"e_1_2_2_52_1","doi-asserted-by":"crossref","unstructured":"Yaoshu Wang Chuan Xiao Jianbin Qin Rui Mao Makoto Onizuka Wei Wang Rui Zhang and Yoshiharu Ishikawa. 2020. Consistent and flexible selectivity estimation for high-dimensional data. In SIGMOD. 2319--2327.","DOI":"10.1145\/3448016.3452772"},{"key":"e_1_2_2_53_1","doi-asserted-by":"crossref","unstructured":"Peizhi Wu and Gao Cong. 2021. A unified deep model of learning from both data and queries for cardinality estimation. In SIGMOD. 2009--2022.","DOI":"10.1145\/3448016.3452830"},{"key":"e_1_2_2_54_1","doi-asserted-by":"crossref","unstructured":"Tian Xia Donghui Zhang and Yufei Tao. 2008. On skylining with flexible dominance relation. In ICDE. 1397--1399.","DOI":"10.1109\/ICDE.2008.4497568"},{"key":"e_1_2_2_55_1","volume-title":"NeuroCard: One cardinality estimator for all tables. ArXiv Preprint ArXiv:2006.08109","author":"Yang Zongheng","year":"2020","unstructured":"Zongheng Yang, Amog Kamsetty, Sifei Luan, Eric Liang, Yan Duan, Xi Chen, and Ion Stoica. 2020. NeuroCard: One cardinality estimator for all tables. ArXiv Preprint ArXiv:2006.08109 (2020)."},{"key":"e_1_2_2_56_1","doi-asserted-by":"publisher","DOI":"10.14778\/3368289.3368294"},{"key":"e_1_2_2_57_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2013.119"},{"key":"e_1_2_2_58_1","doi-asserted-by":"crossref","unstructured":"Zhenjie Zhang Yin Yang Ruichu Cai Dimitris Papadias and Anthony Tung. 2009. Kernel-based skyline cardinality estimation. In SIGMOD. 509--522.","DOI":"10.1145\/1559845.1559899"},{"key":"e_1_2_2_59_1","volume-title":"Zongyan He, Rui Li, and Hao Zhang.","author":"Zhao Kangfei","year":"2022","unstructured":"Kangfei Zhao, Jeffrey Xu Yu, Zongyan He, Rui Li, and Hao Zhang. 2022. Lightweight and accurate cardinality estimation by neural network gaussian process. In SIGMOD. 973--987."},{"key":"e_1_2_2_60_1","doi-asserted-by":"publisher","DOI":"10.14778\/3461535.3461539"}],"container-title":["Proceedings of the ACM on Management of Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3588958","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3588958","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:47:38Z","timestamp":1750178858000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3588958"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,5,26]]},"references-count":60,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2023,5,26]]}},"alternative-id":["10.1145\/3588958"],"URL":"https:\/\/doi.org\/10.1145\/3588958","relation":{},"ISSN":["2836-6573"],"issn-type":[{"value":"2836-6573","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,5,26]]}}}