{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,31]],"date-time":"2026-03-31T19:01:50Z","timestamp":1774983710364,"version":"3.50.1"},"reference-count":62,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2025,2,10]],"date-time":"2025-02-10T00:00:00Z","timestamp":1739145600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001809","name":"NSFC","doi-asserted-by":"crossref","award":["No. 62472068, 62272086"],"award-info":[{"award-number":["No. 62472068, 62272086"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Shenzhen Municipal Science and Technology R&D Funding Basic Research Program","award":["JCYJ20210324133607021"],"award-info":[{"award-number":["JCYJ20210324133607021"]}]},{"name":"Municipal Government of Quzhou under Grant","award":["No. 2023D044, 2023Z003, 2023D043, 2022D037, 2022D039, 2022D020"],"award-info":[{"award-number":["No. 2023D044, 2023Z003, 2023D043, 2022D037, 2022D039, 2022D020"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. ACM Manag. Data"],"published-print":{"date-parts":[[2025,2,10]]},"abstract":"<jats:p>\n                    In the realm of big data and cloud analytics, efficiently managing and retrieving high-dimensional data presents a critical challenge. Traditional indexes often struggle with the storage overhead inherent in large datasets. There is a growing interest in the adoption of Small Materialize Aggregation (SMA) among cloud database vendors due to its ability to maintain lightweight block-level metadata, facilitating efficient block skipping. However, SMA performance relies heavily on data layout. 
This is especially critical in scenarios with wide tables containing hundreds of dimensions, where the curse of dimensionality exacerbates the issue. In this paper, we propose\n                    <jats:sc>AdaCurve<\/jats:sc>\n                    , a novel approach aimed at enhancing block skipping in high-dimensional datasets through adaptive optimization of data layout. Unlike conventional static and non-adaptive space-filling curves (SFCs),\n                    <jats:sc>AdaCurve<\/jats:sc>\n                    leverages machine learning to develop an adaptive curve: a dynamically adjusting optimal projection function tailored to high-dimensional workloads and data characteristics. We introduce an attention-based network to handle high-dimensional data and a learnable objective for training adaptive curves in an end-to-end manner. Extensive experiments conducted on Spark with real-world datasets demonstrate the effectiveness of\n                    <jats:sc>AdaCurve<\/jats:sc>\n                    . 
We have shown that\n                    <jats:sc>AdaCurve<\/jats:sc>\n                    effectively scales to datasets with dimensions of up to 1,000 columns, achieving a 2.8\u00d7 improvement in block skipping compared to SFCs.\n                  <\/jats:p>","DOI":"10.1145\/3709710","type":"journal-article","created":{"date-parts":[[2025,2,11]],"date-time":"2025-02-11T15:45:06Z","timestamp":1739288706000},"page":"1-26","source":"Crossref","is-referenced-by-count":1,"title":["Optimizing Block Skipping for High-Dimensional Data with Learned Adaptive Curve"],"prefix":"10.1145","volume":"3","author":[{"ORCID":"https:\/\/orcid.org\/0009-0004-3909-4021","authenticated-orcid":false,"given":"Xu","family":"Chen","sequence":"first","affiliation":[{"name":"University of Electronic Science and Technology of China, Chengdu, China, &amp; GaussDB(DWS) Team, Huawei Technologies Co., Ltd., Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3557-6598","authenticated-orcid":false,"given":"Shuncheng","family":"Liu","sequence":"additional","affiliation":[{"name":"Huawei Technologies Co., Ltd., Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-0723-5819","authenticated-orcid":false,"given":"Tong","family":"Yuan","sequence":"additional","affiliation":[{"name":"Huawei Technologies Co., Ltd., Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-1003-763X","authenticated-orcid":false,"given":"Tao","family":"Ye","sequence":"additional","affiliation":[{"name":"Huawei Technologies Co., Ltd., Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-5788-5668","authenticated-orcid":false,"given":"Kai","family":"Zeng","sequence":"additional","affiliation":[{"name":"Huawei Technologies Co. 
Ltd., Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5579-0378","authenticated-orcid":false,"given":"Han","family":"Su","sequence":"additional","affiliation":[{"name":"Yangtze Delta Region Institute(Quzhou), University of Electronic Science and Technology of China, Quzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0217-3998","authenticated-orcid":false,"given":"Kai","family":"Zheng","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China, &amp; Shenzhen Institute for Advanced Study, University of Electronic Science and Technology of China, Shenzhen, China"}]}],"member":"320","published-online":{"date-parts":[[2025,2,11]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"2016. Compound and Interleaved Sort Keys. https:\/\/aws.amazon.com\/cn\/blogs\/big-data\/amazon-redshift-engineerings-advanced-table-design-playbook-compound-and-interleaved-sort-keys\/."},{"key":"e_1_2_1_2_1","unstructured":"2017. Attribute Clustering. https:\/\/docs.oracle.com\/database\/121\/DWHSG\/attcluster.htm."},{"key":"e_1_2_1_3_1","unstructured":"2017. Z-Order Indexing for Multifaceted Queries in Amazon DynamoDB: Part 1. https:\/\/aws.amazon.com\/cn\/blogs\/database\/z-order-indexing-for-multifaceted-queries-in-amazon-dynamodb-part-1\/."},{"key":"e_1_2_1_4_1","unstructured":"2018. Processing Petabytes of Data in Seconds with Databricks Delta. https:\/\/www.databricks.com\/blog\/2018\/07\/31\/processing-petabytes-of-data-in-seconds-with-databricks-delta.html."},{"key":"e_1_2_1_5_1","unstructured":"2023. BMTree source code. https:\/\/github.com\/gravesprite\/Learned-BMTree."},{"key":"e_1_2_1_6_1","unstructured":"2024. TPC-H. 
https:\/\/www.tpc.org\/tpch\/."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.14778\/3415478.3415560"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2742797"},{"key":"e_1_2_1_9_1","volume-title":"Jamie Ryan Kiros, and Geoffrey E Hinton","author":"Ba Jimmy Lei","year":"2016","unstructured":"Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E Hinton. 2016. Layer normalization. arXiv preprint arXiv:1607.06450 (2016)."},{"key":"e_1_2_1_10_1","volume-title":"Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473","author":"Bahdanau Dzmitry","year":"2014","unstructured":"Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473 (2014)."},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-7091-0738-6_2"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/361002.361007"},{"key":"e_1_2_1_13_1","volume-title":"International Conference on Machine Learning. PMLR, 950--959","author":"Blondel Mathieu","year":"2020","unstructured":"Mathieu Blondel, Olivier Teboul, Quentin Berthet, and Josip Djolonga. 2020. Fast differentiable sorting and ranking. In International Conference on Machine Learning. PMLR, 950--959."},{"key":"e_1_2_1_14_1","unstructured":"Ralf Bousseljot Dieter Kreiseler and Allard Schnabel. 1995. Nutzung der EKG-Signaldatenbank CARDIODAT der PTB \u00fcber das Internet. (1995)."},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.14778\/3598581.3598597"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.14778\/3594512.3594525"},{"key":"e_1_2_1_17_1","unstructured":"Angjela Davitkova Evica Milchevski and Sebastian Michel. 2020. The ML-Index: A Multidimensional Learned Index for Point Range and Nearest-Neighbor Queries.. In EDBT. 
407--410."},{"key":"e_1_2_1_18_1","volume-title":"Yanzhu Ji, Davide Pagano, Gopal Paliwal, Panos Parchas, Pascal Pfeil, Orestis Polychroniou, et al .","author":"Ding Jialin","year":"2024","unstructured":"Jialin Ding, Matt Abrams, Sanghita Bandyopadhyay, Luciano Di Palma, Yanzhu Ji, Davide Pagano, Gopal Paliwal, Panos Parchas, Pascal Pfeil, Orestis Polychroniou, et al . 2024. Automated multidimensional data layouts in Amazon Redshift. (2024)."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/3448016.3457270"},{"key":"e_1_2_1_20_1","volume-title":"Tsunami: A learned multi-dimensional index for correlated data and skewed workloads. arXiv preprint arXiv:2006.13282","author":"Ding Jialin","year":"2020","unstructured":"Jialin Ding, Vikram Nathan, Mohammad Alizadeh, and Tim Kraska. 2020. Tsunami: A learned multi-dimensional index for correlated data and skewed workloads. arXiv preprint arXiv:2006.13282 (2020)."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE53745.2022.00201"},{"key":"e_1_2_1_22_1","volume-title":"LMSFC: A Novel Multidimensional Index based on Learned Monotonic Space Filling Curves. arXiv preprint arXiv:2304.12635","author":"Gao Jian","year":"2023","unstructured":"Jian Gao, Xin Cao, Xin Yao, Gong Zhang, and Wei Wang. 2023. LMSFC: A Novel Multidimensional Index based on Learned Monotonic Space Filling Curves. arXiv preprint arXiv:2304.12635 (2023)."},{"key":"e_1_2_1_23_1","volume-title":"Leveraging soft functional dependencies for indexing multi-dimensional data. arXiv preprint arXiv:2006.16393","author":"Ghaffari Behzad","year":"2020","unstructured":"Behzad Ghaffari, Ali Hadian, and Thomas Heinis. 2020. Leveraging soft functional dependencies for indexing multi-dimensional data. 
arXiv preprint arXiv:2006.16393 (2020)."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-03730-6_10"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-16175-9_2"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/602259.602266"},{"key":"e_1_2_1_27_1","volume-title":"Reducing the dimensionality of data with neural networks. science 313, 5786","author":"Hinton Geoffrey E","year":"2006","unstructured":"Geoffrey E Hinton and Ruslan R Salakhutdinov. 2006. Reducing the dimensionality of data with neural networks. science 313, 5786 (2006), 504--507."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/93597.98742"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/1071610.1071612"},{"key":"e_1_2_1_30_1","volume-title":"Proceedings of the 3rd International Workshop on Applied AI for Database Systems and Applications.","author":"Kang Rong","year":"2021","unstructured":"Rong Kang, Wentao Wu, Chen Wang, Ce Zhang, and Jianmin Wang. 2021. The case for ml-enhanced high-dimensional indexes. In Proceedings of the 3rd International Workshop on Applied AI for Database Systems and Applications."},{"key":"e_1_2_1_31_1","volume-title":"Supervised contrastive learning. Advances in neural information processing systems 33","author":"Khosla Prannay","year":"2020","unstructured":"Prannay Khosla, Piotr Teterwak, Chen Wang, Aaron Sarna, Yonglong Tian, Phillip Isola, Aaron Maschinot, Ce Liu, and Dilip Krishnan. 2020. Supervised contrastive learning. Advances in neural information processing systems 33 (2020)."},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.14778\/1687627.1687765"},{"key":"e_1_2_1_33_1","volume-title":"Learned cardinalities: Estimating correlated joins with deep learning. arXiv preprint arXiv:1809.00677","author":"Kipf Andreas","year":"2018","unstructured":"Andreas Kipf, Thomas Kipf, Bernhard Radke, Viktor Leis, Peter Boncz, and Alfons Kemper. 2018. 
Learned cardinalities: Estimating correlated joins with deep learning. arXiv preprint arXiv:1809.00677 (2018)."},{"key":"e_1_2_1_34_1","volume-title":"SOSD: A benchmark for learned indexes. arXiv preprint arXiv:1911.13014","author":"Kipf Andreas","year":"2019","unstructured":"Andreas Kipf, Ryan Marcus, Alexander van Renen, Mihail Stoian, Alfons Kemper, Tim Kraska, and Thomas Neumann. 2019. SOSD: A benchmark for learned indexes. arXiv preprint arXiv:1911.13014 (2019)."},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/3183713.3196909"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.5555\/646102.681186"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.14778\/3598581.3598589"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/MPRV.2018.03367731"},{"key":"e_1_2_1_39_1","volume-title":"Small materialized aggregates: A light weight index structure for data warehousing. None","author":"Moerkotte Guido","year":"1998","unstructured":"Guido Moerkotte. 1998. Small materialized aggregates: A light weight index structure for data warehousing. None (1998)."},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-45072-6_7"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1025196714293"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/69.908985"},{"key":"e_1_2_1_43_1","unstructured":"Guy M Morton. 1966. A computer oriented geodetic data base and a new technique in file sequencing. (1966)."},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3380579"},{"key":"e_1_2_1_45_1","volume-title":"Cortex: Harnessing correlations to boost query performance. arXiv preprint arXiv:2012.06683","author":"Nathan Vikram","year":"2020","unstructured":"Vikram Nathan, Jialin Ding, Tim Kraska, and Mohammad Alizadeh. 2020. Cortex: Harnessing correlations to boost query performance. 
arXiv preprint arXiv:2012.06683 (2020)."},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/348.318586"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/MDM.2011.41"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/3035918.3035934"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.14778\/3407790.3407829"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.14778\/2536222.2536233"},{"key":"e_1_2_1_51_1","volume-title":"Experience replay for continual learning. Advances in neural information processing systems 32","author":"Rolnick David","year":"2019","unstructured":"David Rolnick, Arun Ahuja, Jonathan Schwarz, Timothy Lillicrap, and Gregory Wayne. 2019. Experience replay for continual learning. Advances in neural information processing systems 32 (2019)."},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1145\/2588555.2610515"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.14778\/3025111.3025123"},{"key":"e_1_2_1_54_1","unstructured":"Ashish Vaswani Noam Shazeer Niki Parmar Jakob Uszkoreit Llion Jones Aidan N Gomez Lukasz Kaiser and Illia Polosukhin. 2017. Attention is all you need. In Advances in neural information processing systems. 5998--6008."},{"key":"e_1_2_1_55_1","volume-title":"a large publicly available electrocardiography dataset. Scientific data 7, 1","author":"Wagner Patrick","year":"2020","unstructured":"Patrick Wagner, Nils Strodthoff, Ralf-Dieter Bousseljot, Dieter Kreiseler, Fatima I Lunze, Wojciech Samek, and Tobias Schaeffter. 2020. PTB-XL, a large publicly available electrocardiography dataset. 
Scientific data 7, 1 (2020), 154."},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1109\/MDM.2019.00121"},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10586-022-03723-y"},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2018.00119"},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3389770"},{"key":"e_1_2_1_60_1","volume-title":"Deep unsupervised cardinality estimation. arXiv preprint arXiv:1905.04278","author":"Yang Zongheng","year":"2019","unstructured":"Zongheng Yang, Eric Liang, Amog Kamsetty, Chenggang Wu, Yan Duan, Xi Chen, Pieter Abbeel, Joseph M Hellerstein, Sanjay Krishnan, and Ion Stoica. 2019. Deep unsupervised cardinality estimation. arXiv preprint arXiv:1905.04278 (2019)."},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1155\/2020\/8897026"},{"key":"e_1_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.14778\/3137765.3137769"}],"container-title":["Proceedings of the ACM on Management of Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3709710","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3709710","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,31]],"date-time":"2026-03-31T18:20:27Z","timestamp":1774981227000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3709710"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,2,10]]},"references-count":62,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2025,2,10]]}},"alternative-id":["10.1145\/3709710"],"URL":"https:\/\/doi.org\/10.1145\/3709710","relation":{},"ISSN":["2836-6573"],"issn-type":[{"value":"2836-6573","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,2,10]]}}}