{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,7]],"date-time":"2026-04-07T05:11:39Z","timestamp":1775538699821,"version":"3.50.1"},"reference-count":85,"publisher":"Association for Computing Machinery (ACM)","issue":"6","funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["No. 62272353 and No. 62276193"],"award-info":[{"award-number":["No. 62272353 and No. 62276193"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100001809","name":"Huawei Cloud Database Innovation Lab","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. ACM Manag. Data"],"published-print":{"date-parts":[[2025,12,4]]},"abstract":"<jats:p>With the development of AI and the growing demand for computational power, hardware is becoming increasingly specialized and heterogeneous. The emergence of diverse specialized hardware architectures, each with distinct characteristics and programming abstractions, poses significant portability and sustainability challenges for existing data processing systems. Tensor Computation Runtimes (TCRs) abstract away the low-level hardware complexities by providing users with a hardware-independent tensor-based interface, enabling data scientists to effectively leverage the powerful capabilities of new hardware accelerators (collectively referred to as XPU). Built on TCRs, the existing relational query engine TQP demonstrates portability across a wide range of target hardware and sustainability along with the ongoing evolution of TCRs and hardware. 
However, it neglects the big gap between irregular SQL workloads and uniform tensor operations when mapping SQL operators to tensor programs, which causes significant storage and computation overhead. In this paper, for the first time, we analyze the underlying gap between SQL and tensors, and provide guidelines to bridge it. Following these guidelines, we build a new Tensor-based Query Engine Enhanced (TQEx) by bridging the gap from multiple aspects: develop efficient storage and computation strategies for variable-length data, and design efficient SQL operators such as join and aggregate based on tensors. We also extend TQEx to multi-XPUs for large-scale data processing. Extensive experimental studies show that our query engine, TQEx, achieves a 9.6\u00d7 speedup (with a peak of 41.9\u00d7) over TQP on TPC-H, and it is also 27.9\u00d7 faster than leading GPU databases such as HeavyDB. On TPC-H at scale factor 100, TQEx outperforms DuckDB by 12.2\u00d7 and HeavyDB by 22.7\u00d7 on supported queries.<\/jats:p>","DOI":"10.1145\/3769835","type":"journal-article","created":{"date-parts":[[2025,12,6]],"date-time":"2025-12-06T04:32:13Z","timestamp":1764995533000},"page":"1-27","source":"Crossref","is-referenced-by-count":0,"title":["TQEx: Tensor-based Query Engine Enhanced by Bridging the Gap"],"prefix":"10.1145","volume":"3","author":[{"ORCID":"https:\/\/orcid.org\/0009-0005-6371-5462","authenticated-orcid":false,"given":"Haitao","family":"Zhang","sequence":"first","affiliation":[{"name":"School of Computer Science, Wuhan University, Wuhan, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-4773-561X","authenticated-orcid":false,"given":"Ran","family":"Pang","sequence":"additional","affiliation":[{"name":"School of Computer Science, Wuhan University, Wuhan, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3422-8017","authenticated-orcid":false,"given":"Yuanyuan","family":"Zhu","sequence":"additional","affiliation":[{"name":"School of Computer Science, Wuhan 
University, Wuhan, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0026-9283","authenticated-orcid":false,"given":"Hao","family":"Zhang","sequence":"additional","affiliation":[{"name":"Huawei Technologies, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-5483-3317","authenticated-orcid":false,"given":"Congli","family":"Gao","sequence":"additional","affiliation":[{"name":"Huawei Technologies, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9376-818X","authenticated-orcid":false,"given":"Ming","family":"Zhong","sequence":"additional","affiliation":[{"name":"School of Computer Science, Wuhan University, Wuhan, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0051-0046","authenticated-orcid":false,"given":"Jiawei","family":"Jiang","sequence":"additional","affiliation":[{"name":"School of Computer Science, Wuhan University, Wuhan, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4667-5794","authenticated-orcid":false,"given":"Tieyun","family":"Qian","sequence":"additional","affiliation":[{"name":"School of Computer Science, Wuhan University, Wuhan, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9738-827X","authenticated-orcid":false,"given":"Jeffrey Xu","family":"Yu","sequence":"additional","affiliation":[{"name":"The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China"}]}],"member":"320","published-online":{"date-parts":[[2025,12,5]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"2024. BlazingSQL. https:\/\/github.com\/BlazingDB"},{"key":"e_1_2_1_2_1","unstructured":"2024. HeavyDB. https:\/\/www.heavy.ai"},{"key":"e_1_2_1_3_1","unstructured":"2024. Kinetica. https:\/\/www.kinetica.com"},{"key":"e_1_2_1_4_1","unstructured":"2024. Profiler. https:\/\/github.com\/pytorch\/pytorch\/blob\/main\/torch\/autograd\/profiler.py."},{"key":"e_1_2_1_5_1","unstructured":"2025. Broadcasting semantics. 
https:\/\/pytorch.org\/docs\/stable\/notes\/broadcasting.html access date: 2025\/4\/10."},{"key":"e_1_2_1_6_1","volume-title":"Benoit Steiner, Paul A. Tucker, Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng.","author":"Abadi Mart\u00edn","year":"2016","unstructured":"Mart\u00edn Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek Gordon Murray, Benoit Steiner, Paul A. Tucker, Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. 2016. TensorFlow: A System for Large-Scale Machine Learning. In OSDI. 265-283."},{"key":"e_1_2_1_7_1","first-page":"85","article-title":"Multi-core, main-memory joins: sort vs. hash revisited","volume":"7","author":"Balkesen Cagri","year":"2013","unstructured":"Cagri Balkesen, Gustavo Alonso, Jens Teubner, and M. Tamer \u00d6zsu. 2013. Multi-core, main-memory joins: sort vs. hash revisited. PVLDB 7 (2013), 85-96.","journal-title":"PVLDB"},{"key":"e_1_2_1_8_1","first-page":"221","article-title":"Apache Calcite","author":"Begoli Edmon","year":"2018","unstructured":"Edmon Begoli, Jes\u00fas Camacho-Rodr\u00edguez, Julian Hyde, Michael J. Mior, and Daniel Lemire. 2018. Apache Calcite: A Foundational Framework for Optimized Query Processing Over Heterogeneous Data Sources. In SIGMOD. 221-230.","journal-title":"In SIGMOD."},{"key":"e_1_2_1_9_1","first-page":"37","article-title":"Design and evaluation of main memory hash join algorithms for multi-core CPUs","author":"Blanas Spyros","year":"2011","unstructured":"Spyros Blanas, Yinan Li, and Jignesh M. Patel. 2011. Design and evaluation of main memory hash join algorithms for multi-core CPUs. In SIGMOD. 
37-48.","journal-title":"SIGMOD."},{"key":"e_1_2_1_10_1","first-page":"1003","article-title":"GaccO - A GPU-accelerated OLTP DBMS","author":"Boeschen Nils","year":"2022","unstructured":"Nils Boeschen and Carsten Binnig. 2022. GaccO - A GPU-accelerated OLTP DBMS. In SIGMOD. 1003-1016.","journal-title":"SIGMOD."},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/1941487.1941507"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/359842.359859"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.14778\/2536274.2536325"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.14778\/3632093.3632107"},{"key":"e_1_2_1_15_1","volume-title":"Jianbin Qin, and Bo Tang.","author":"Cao Jiaping","year":"2025","unstructured":"Jiaping Cao, Le Xu, Man Lung Yiu, Jianbin Qin, and Bo Tang. 2025. GPH: An Efficient and Effective Perfect Hashing Scheme for GPU Architectures. SIGMOD 3 (2025), 165:1-165:26."},{"key":"e_1_2_1_16_1","volume-title":"MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems. CoRR abs\/1512.01274","author":"Chen Tianqi","year":"2015","unstructured":"Tianqi Chen, Mu Li, Yutian Li, Min Lin, Naiyan Wang, Minjie Wang, Tianjun Xiao, Bing Xu, Chiyuan Zhang, and Zheng Zhang. 2015. MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems. CoRR abs\/1512.01274 (2015)."},{"key":"e_1_2_1_17_1","first-page":"578","article-title":"TVM","author":"Chen Tianqi","year":"2018","unstructured":"Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Q. Yan, Haichen Shen, Meghan Cowan, Leyuan Wang, Yuwei Hu, Luis Ceze, Carlos Guestrin, and Arvind Krishnamurthy. 2018. TVM: An Automated End-to-End Optimizing Compiler for Deep Learning. In OSDI. 578-594.","journal-title":"An Automated End-to-End Optimizing Compiler for Deep Learning. 
In OSDI."},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.14778\/1454159.1454171"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.14778\/3303753.3303760"},{"key":"e_1_2_1_20_1","first-page":"483","article-title":"Automatic contention detection and amelioration for data-intensive operations","author":"Cieslewicz John","year":"2010","unstructured":"John Cieslewicz, Kenneth A. Ross, Kyoho Satsumi, and Yang Ye. 2010. Automatic contention detection and amelioration for data-intensive operations. In SIGMOD. 483-494.","journal-title":"SIGMOD."},{"key":"e_1_2_1_21_1","unstructured":"Coralogix. 2024. Coralogix - Full-Stack Observability Platform with In-Stream Data Analytics. https:\/\/coralogix.com"},{"key":"e_1_2_1_22_1","unstructured":"Transaction Processing Performance Council. 2005. Transaction processing performance council. Web Site http:\/\/www. tpc.org (2005)."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-55210-3_215"},{"key":"e_1_2_1_24_1","unstructured":"DBpedia. 2025. DBpedia. https:\/\/www.dbpedia.org. Accessed: 2025-07-15."},{"key":"e_1_2_1_25_1","volume-title":"Data, Everywhere - A Special Report on Managing Information. The Economist","author":"Economist The","year":"2010","unstructured":"The Economist. 2010. Data, Data, Everywhere - A Special Report on Managing Information. The Economist (2010). https:\/\/www.economist.com\/special-report\/2010\/02\/27\/data-data-everywhere"},{"key":"e_1_2_1_26_1","first-page":"93","article-title":"Massive atomics for massive parallelism on GPUs","author":"Egielski Ian J.","year":"2014","unstructured":"Ian J. Egielski, Jesse Huang, and Eddy Z. Zhang. 2014. Massive atomics for massive parallelism on GPUs. In ISMM. 
93-103.","journal-title":"ISMM."},{"key":"e_1_2_1_27_1","first-page":"1603","article-title":"Pipelined Query Processing in Coprocessor Environments","author":"Funke Henning","year":"2018","unstructured":"Henning Funke, Sebastian Bre\u00df, Stefan Noll, Volker Markl, and Jens Teubner. 2018. Pipelined Query Processing in Coprocessor Environments. In SIGMOD. 1603-1618.","journal-title":"SIGMOD."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.14778\/3380750.3380758"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.14778\/3551793.3551833"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.14778\/2536360.2536370"},{"key":"e_1_2_1_31_1","first-page":"1360","article-title":"TCUDB","author":"Hu Yu-Ching","year":"2022","unstructured":"Yu-Ching Hu, Yuliang Li, and Hung-Wei Tseng. 2022. TCUDB: Accelerating Database with Tensor Processors. In SIGMOD. 1360-1374.","journal-title":"Accelerating Database with Tensor Processors. In SIGMOD."},{"key":"e_1_2_1_32_1","unstructured":"InfluxData Inc. 2024. InfluxDB - open source time series metrics and analytics database. https:\/\/influxdata.com\/"},{"key":"e_1_2_1_33_1","first-page":"189","article-title":"AA-Sort","author":"Inoue Hiroshi","year":"2007","unstructured":"Hiroshi Inoue, Takao Moriyama, Hideaki Komatsu, and Toshio Nakatani. 2007. AA-Sort: A New Parallel Sorting Algorithm for Multi-Core SIMD Processors. In PACT. 189-198.","journal-title":"In PACT."},{"key":"e_1_2_1_34_1","first-page":"1202","article-title":"Caribou: Intelligent distributed storage","volume":"10","author":"Istv\u00e1n Zsolt","year":"2017","unstructured":"Zsolt Istv\u00e1n, David Sidler, and Gustavo Alonso. 2017. Caribou: Intelligent distributed storage. PVLDB 10, 11 (2017), 1202-1213.","journal-title":"PVLDB"},{"key":"e_1_2_1_35_1","volume-title":"The Modern Data Architecture: The Deconstructed Database. login Usenix Mag. 
43, 4","author":"Khurana Amandeep","year":"2018","unstructured":"Amandeep Khurana and Julien Le Dem. 2018. The Modern Data Architecture: The Deconstructed Database. login Usenix Mag. 43, 4 (2018), 37-40."},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.14778\/1687553.1687564"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1137\/0206024"},{"key":"e_1_2_1_38_1","first-page":"998","article-title":"Relax","author":"Lai Ruihang","year":"2025","unstructured":"Ruihang Lai, Junru Shao, Siyuan Feng, Steven Lyubomirsky, Bohan Hou, Wuwei Lin, Zihao Ye, Hongyi Jin, Yuchen Jin, Jiawei Liu, Lesheng Jin, Yaxing Cai, Ziheng Jiang, Yong Wu, Sunghyun Park, Prakalp Srivastava, Jared Roesch, Todd C. Mowry, and Tianqi Chen. 2025. Relax: Composable Abstractions for End-to-End Dynamic Machine Learning. In ASPLOS. 998-1013.","journal-title":"Composable Abstractions for End-to-End Dynamic Machine Learning. In ASPLOS."},{"key":"e_1_2_1_39_1","volume-title":"Liang-Chi Hsieh, and Chao Sun.","author":"Lamb Andrew","year":"2024","unstructured":"Andrew Lamb, Yijie Shen, Dani\u00ebl Heres, Jayjeet Chakraborty, Mehmet Ozan Kabak, Liang-Chi Hsieh, and Chao Sun. 2024. Apache Arrow DataFusion: A Fast, Embeddable, Modular Analytic Query Engine. In SIGMOD. 5-17."},{"key":"e_1_2_1_40_1","first-page":"743","article-title":"Morsel-driven parallelism: a NUMA-aware query evaluation framework for the many-core age","author":"Leis Viktor","year":"2014","unstructured":"Viktor Leis, Peter Boncz, Alfons Kemper, and Thomas Neumann. 2014. Morsel-driven parallelism: a NUMA-aware query evaluation framework for the many-core age. In SIGMOD. 
743-754.","journal-title":"SIGMOD."},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.14778\/3007328.3007331"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2020.3030548"},{"key":"e_1_2_1_43_1","first-page":"1633","article-title":"Pump Up the Volume","author":"Lutz Clemens","year":"2020","unstructured":"Clemens Lutz, Sebastian Bre\u00df, Steffen Zeuch, Tilmann Rabl, and Volker Markl. 2020. Pump Up the Volume: Processing Large Data on GPUs with Fast Interconnects. In SIGMOD. 1633-1649.","journal-title":"Processing Large Data on GPUs with Fast Interconnects. In SIGMOD."},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.14778\/3151113.3151114"},{"key":"e_1_2_1_45_1","unstructured":"Microsoft. 2022. ONNX Runtime. https:\/\/github.com\/microsoft\/onnxruntime"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.14778\/3503585.3503601"},{"key":"e_1_2_1_47_1","volume-title":"Todd C. Mowry, and Andrew Pavlo.","author":"Ngom Amadou","year":"2021","unstructured":"Amadou Ngom, Prashanth Menon, Matthew Butrovich, Lin Ma, Wan Shen Lim, Todd C. Mowry, and Andrew Pavlo. 2021. Filter Representation in Vectorized Query Execution. In DAMON. Article 6, 7 pages."},{"key":"e_1_2_1_48_1","unstructured":"NVIDIA. 2025. cuDF. https:\/\/docs.rapids.ai\/api\/cudf\/stable\/. Accessed: 2025-07-15."},{"key":"e_1_2_1_49_1","unstructured":"NVIDIA. 2025. Nsight Compute. https:\/\/developer.nvidia.com\/nsight-compute. Accessed: 2025-07-15."},{"key":"e_1_2_1_50_1","first-page":"125","article-title":"Accelerating database systems using FPGAs: A survey","author":"Papaphilippou Philippos","year":"2018","unstructured":"Philippos Papaphilippou and Wayne Luk. 2018. Accelerating database systems using FPGAs: A survey. In FPL. 
125-1255.","journal-title":"FPL."},{"key":"e_1_2_1_51_1","first-page":"8024","article-title":"PyTorch: An Imperative Style, High-Performance Deep Learning Library","author":"Paszke Adam","year":"2019","unstructured":"Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas K\u00f6pf, Edward Z. Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. 2019. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In NIPS. 8024-8035.","journal-title":"NIPS."},{"key":"e_1_2_1_52_1","first-page":"1935","article-title":"GPL: A GPU-based pipelined query processing engine","author":"Paul Johns","year":"2016","unstructured":"Johns Paul, Jiong He, and Bingsheng He. 2016. GPL: A GPU-based pipelined query processing engine. In SIGMOD. 1935-1950.","journal-title":"SIGMOD."},{"key":"e_1_2_1_53_1","first-page":"3372","article-title":"Velox: Meta's Unified Execution Engine","volume":"15","author":"Pedreira Pedro","year":"2022","unstructured":"Pedro Pedreira, Orri Erling, Maria Basmanova, Kevin Wilfong, Laith Sakka, Krishna Pai, Wei He, and Biswapesh Chattopadhyay. 2022. Velox: Meta's Unified Execution Engine. PVLDB 15, 12 (2022), 3372-3384.","journal-title":"PVLDB"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.14778\/3603581.3603604"},{"key":"e_1_2_1_55_1","first-page":"755","article-title":"A comprehensive study of main-memory partitioning and its application to large-scale comparison- and radix-sort","author":"Polychroniou Orestis","year":"2014","unstructured":"Orestis Polychroniou and Kenneth A. Ross. 2014. A comprehensive study of main-memory partitioning and its application to large-scale comparison- and radix-sort. In SIGMOD. 755-766.","journal-title":"SIGMOD."},{"key":"e_1_2_1_56_1","unstructured":"The Dask Project. 2024. The dask-sql Project. 
https:\/\/dask-sql.readthedocs.io\/en\/latest\/"},{"key":"e_1_2_1_57_1","unstructured":"Pytorch. 2024. Accelerating PyTorch with CUDA Graphs. https:\/\/pytorch.org\/blog\/accelerating-pytorch-with-cudagraphs\/"},{"key":"e_1_2_1_58_1","unstructured":"PyTorch. 2025. Introduction to torch.compile. https:\/\/pytorch.org\/tutorials\/intermediate\/torch_compile_tutorial.html. Accessed: 2025-04-15."},{"key":"e_1_2_1_59_1","unstructured":"Mark Raasveldt and Hannes M\u00fchleisen. 2020. Data Management for Data Science-Towards Embedded Analytics. In CIDR."},{"key":"e_1_2_1_60_1","volume-title":"Relay: A High-Level Compiler for Deep Learning. CoRR abs\/1904.08368","author":"Roesch Jared","year":"2019","unstructured":"Jared Roesch, Steven Lyubomirsky, Marisa Kirisame, Logan Weber, Josh Pollock, Luis Vega, Ziheng Jiang, Tianqi Chen, Thierry Moreau, and Zachary Tatlock. 2019. Relay: A High-Level Compiler for Deep Learning. CoRR abs\/1904.08368 (2019)."},{"key":"e_1_2_1_61_1","volume-title":"Query Processing on Heterogeneous CPU\/GPU Systems. ACM Comput. Surv. 55, 2","author":"Rosenfeld Viktor","year":"2023","unstructured":"Viktor Rosenfeld, Sebastian Bre\u00df, and Volker Markl. 2023. Query Processing on Heterogeneous CPU\/GPU Systems. ACM Comput. Surv. 55, 2 (2023), 11:1-11:38."},{"key":"e_1_2_1_62_1","first-page":"1961","article-title":"An Experimental Comparison of Thirteen Relational Equi-Joins in Main Memory","author":"Schuh Stefan","year":"2016","unstructured":"Stefan Schuh, Xiao Chen, and Jens Dittrich. 2016. An Experimental Comparison of Thirteen Relational Equi-Joins in Main Memory. In SIGMOD. 1961-1976.","journal-title":"SIGMOD."},{"key":"e_1_2_1_63_1","unstructured":"Seafowl. 2024. Seafowl Postgres Accelerator. 
https:\/\/seafowl.io\/"},{"key":"e_1_2_1_64_1","first-page":"1617","article-title":"A study of the fundamental performance characteristics of GPUs and CPUs for database analytics","author":"Shanbhag Anil","year":"2020","unstructured":"Anil Shanbhag, Samuel Madden, and Xiangyao Yu. 2020. A study of the fundamental performance characteristics of GPUs and CPUs for database analytics. In SIGMOD. 1617-1632.","journal-title":"SIGMOD."},{"key":"e_1_2_1_65_1","first-page":"1390","article-title":"Tile-based Lightweight Integer Compression in GPU","author":"Shanbhag Anil","year":"2022","unstructured":"Anil Shanbhag, Bobbi W. Yogatama, Xiangyao Yu, and Samuel Madden. 2022. Tile-based Lightweight Integer Compression in GPU. In SIGMOD. 1390-1403.","journal-title":"SIGMOD."},{"key":"e_1_2_1_66_1","unstructured":"SIGOPS. 2020. The Increasing Heterogeneity of Cloud Hardware and What It Means for Systems. https:\/\/www.sigops.org\/2020\/the-increasing-heterogeneity-of-cloud-hardware-and-what-it-means-for-systems\/"},{"key":"e_1_2_1_67_1","first-page":"719","article-title":"GPU-accelerated string matching for database applications","volume":"25","author":"Sitaridi Evangelia A.","year":"2016","unstructured":"Evangelia A. Sitaridi and Kenneth A. Ross. 2016. GPU-accelerated string matching for database applications. PVLDB 25 (2016), 719-740.","journal-title":"PVLDB"},{"key":"e_1_2_1_68_1","doi-asserted-by":"publisher","DOI":"10.1002\/spe.4380211006"},{"key":"e_1_2_1_69_1","doi-asserted-by":"publisher","DOI":"10.1126\/science.331.6018.692"},{"key":"e_1_2_1_70_1","unstructured":"Statista. 2024. Volume of data\/information created captured copied and consumed worldwide from 2010 to 2025. https:\/\/www.statista.com\/statistics\/871513\/worldwide-data-created\/"},{"key":"e_1_2_1_71_1","doi-asserted-by":"publisher","DOI":"10.1145\/79173.79184"},{"key":"e_1_2_1_72_1","unstructured":"Synnada. 2024. Synnada Real-Time Data Platform. 
https:\/\/www.synnada.ai\/"},{"key":"e_1_2_1_73_1","first-page":"362","article-title":"Main-memory hash joins on multi-core CPUs: Tuning to the underlying hardware","author":"Teubner Jens","year":"2013","unstructured":"Jens Teubner, Gustavo Alonso, Cagri Balkesen, and M. Tamer Ozsu. 2013. Main-memory hash joins on multi-core CPUs: Tuning to the underlying hardware. In ICDE. 362-373.","journal-title":"ICDE."},{"key":"e_1_2_1_74_1","doi-asserted-by":"publisher","DOI":"10.1109\/MCSE.2017.29"},{"key":"e_1_2_1_75_1","first-page":"51","article-title":"Characterization and analysis of dynamic parallelism in unstructured GPU applications","author":"Wang Jin","year":"2014","unstructured":"Jin Wang and Sudhakar Yalamanchili. 2014. Characterization and analysis of dynamic parallelism in unstructured GPU applications. In IISWC. 51-60.","journal-title":"IISWC."},{"key":"e_1_2_1_76_1","doi-asserted-by":"publisher","DOI":"10.14778\/2732967.2732976"},{"key":"e_1_2_1_77_1","first-page":"111","article-title":"Shuhai: Benchmarking high bandwidth memory on fpgas","author":"Wang Zeke","year":"2020","unstructured":"Zeke Wang, Hongjing Huang, Jie Zhang, and Gustavo Alonso. 2020. Shuhai: Benchmarking high bandwidth memory on fpgas. In FCCM. 111-119.","journal-title":"FCCM."},{"key":"e_1_2_1_78_1","doi-asserted-by":"publisher","DOI":"10.1145\/1519103.1519113"},{"key":"e_1_2_1_79_1","doi-asserted-by":"publisher","DOI":"10.1145\/1498765.1498785"},{"key":"e_1_2_1_80_1","volume-title":"Efficiently Processing Joins and Grouped Aggregations on GPUs. SIGMOD 3","author":"Wu Bowen","year":"2025","unstructured":"Bowen Wu, Dimitrios Koutsoukos, and Gustavo Alonso. 2025. Efficiently Processing Joins and Grouped Aggregations on GPUs. SIGMOD 3 (2025), 27 pages."},{"key":"e_1_2_1_81_1","first-page":"1","article-title":"Scalable aggregation on multicore processors","author":"Ye Yang","year":"2011","unstructured":"Yang Ye, Kenneth A. Ross, and Norases Vesdapunt. 2011. 
Scalable aggregation on multicore processors. In DaMoN. 1-9.","journal-title":"DaMoN."},{"key":"e_1_2_1_82_1","doi-asserted-by":"publisher","DOI":"10.14778\/3551793.3551809"},{"key":"e_1_2_1_83_1","doi-asserted-by":"publisher","DOI":"10.14778\/2536206.2536210"},{"key":"e_1_2_1_84_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2021.3100743"},{"key":"e_1_2_1_85_1","first-page":"1060","article-title":"Incorporating partitioning and parallel plans into the SCOPE optimizer","author":"Zhou Jingren","year":"2010","unstructured":"Jingren Zhou, Per-Ake Larson, and Ronnie Chaiken. 2010. Incorporating partitioning and parallel plans into the SCOPE optimizer. In ICDE. 1060-1071.","journal-title":"ICDE."}],"container-title":["Proceedings of the ACM on Management of Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3769835","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,7]],"date-time":"2026-04-07T04:29:57Z","timestamp":1775536197000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3769835"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,12,4]]},"references-count":85,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2025,12,4]]}},"alternative-id":["10.1145\/3769835"],"URL":"https:\/\/doi.org\/10.1145\/3769835","relation":{},"ISSN":["2836-6573"],"issn-type":[{"value":"2836-6573","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,12,4]]}}}