{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,8]],"date-time":"2026-02-08T03:59:53Z","timestamp":1770523193136,"version":"3.49.0"},"publisher-location":"New York, NY, USA","reference-count":77,"publisher":"ACM","license":[{"start":{"date-parts":[[2024,6,9]],"date-time":"2024-06-09T00:00:00Z","timestamp":1717891200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2024,6,9]]},"DOI":"10.1145\/3626246.3653368","type":"proceedings-article","created":{"date-parts":[[2024,5,23]],"date-time":"2024-05-23T10:26:39Z","timestamp":1716459999000},"page":"5-17","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":9,"title":["Apache Arrow DataFusion: A Fast, Embeddable, Modular Analytic Query Engine"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0009-0004-5687-5276","authenticated-orcid":false,"given":"Andrew","family":"Lamb","sequence":"first","affiliation":[{"name":"InfluxData, Boston, MA, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2679-1088","authenticated-orcid":false,"given":"Yijie","family":"Shen","sequence":"additional","affiliation":[{"name":"Space and Time, Irvine, CA, USA"}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-3087-4992","authenticated-orcid":false,"given":"Dani\u00ebl","family":"Heres","sequence":"additional","affiliation":[{"name":"Coralogix, Utrecht, Netherlands"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1647-2494","authenticated-orcid":false,"given":"Jayjeet","family":"Chakraborty","sequence":"additional","affiliation":[{"name":"UC Santa Cruz, Santa Cruz, CA, USA"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-5425-2834","authenticated-orcid":false,"given":"Mehmet Ozan","family":"Kabak","sequence":"additional","affiliation":[{"name":"Synnada, Austin, TX, USA"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-6513-984X","authenticated-orcid":false,"given":"Liang-Chi","family":"Hsieh","sequence":"additional","affiliation":[{"name":"Apple, Seattle, WA, USA"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-8192-3496","authenticated-orcid":false,"given":"Chao","family":"Sun","sequence":"additional","affiliation":[{"name":"Apple, Cupertino, CA, USA"}]}],"member":"320","published-online":{"date-parts":[[2024,6,9]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2007.367892"},{"key":"e_1_3_2_1_2_1","unstructured":"Mustafa Akur and Mehmet Ozan Kabak. 2023. Running Windowing Queries in Stream Processing. https:\/\/www.synnada.ai\/blog\/running-window-query-instream- processing"},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.14778\/3415478.3415560"},{"key":"e_1_3_2_1_4_1","unstructured":"Arroyo. 2023. Arroyo - Serverless Stream Processing. https:\/\/www.arroyo.dev\/"},{"key":"e_1_3_2_1_5_1","unstructured":"The Blaze Authors. 2023. The Blaze accelerator for Apache Spark. https:\/\/github. com\/blaze-init\/blaze"},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/3183713.3190662"},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/3514221.3526054"},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/3277104.3277119"},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/3299869.3314045"},{"key":"e_1_3_2_1_10_1","volume-title":"VLDB 2000, Proceedings of 26th International Conference on Very Large Data Bases, September 10--14","author":"Chaudhuri Surajit","year":"2000","unstructured":"Surajit Chaudhuri and Gerhard Weikum. 2000. Rethinking Database System Architecture: Towards a Self-Tuning RISC-Style Database System. In VLDB 2000, Proceedings of 26th International Conference on Very Large Data Bases, September 10--14, 2000, Cairo, Egypt, Amr El Abbadi, Michael L. Brodie, Sharma Chakravarthy, Umeshwar Dayal, Nabil Kamel, Gunter Schlageter, and Kyu-Young Whang (Eds.). Morgan Kaufmann, 1--10. http:\/\/www.vldb.org\/conf\/2000\/P001.pdf"},{"key":"e_1_3_2_1_11_1","unstructured":"The Apache Arrow DataFusion Comet. 2024. The Comet accelerator for Apache Spark. https:\/\/github.com\/apache\/arrow-datafusion-comet"},{"key":"e_1_3_2_1_12_1","volume-title":"Cargo: Rust's built-in package manager. https: \/\/crates.io\/","author":"The Rust","year":"2023","unstructured":"The Rust community. 2023. Cargo: Rust's built-in package manager. https: \/\/crates.io\/"},{"key":"e_1_3_2_1_13_1","unstructured":"Coralogix. 2023. Coralogix - Full-Stack Observability Platform with In-Stream Data Analytics. https:\/\/coralogix.com"},{"key":"e_1_3_2_1_14_1","unstructured":"IBM Corporation. 2023. IBM DB2. https:\/\/www.ibm.com\/products\/db2"},{"key":"e_1_3_2_1_15_1","unstructured":"Microsoft Corporation. 2023. Microsoft SQL Server. https:\/\/www.microsoft.com\/ en-us\/sql-server"},{"key":"e_1_3_2_1_16_1","unstructured":"Oracle Corporation. 2023. The Oracle Database Server. https:\/\/www.oracle.com\/ database\/"},{"key":"e_1_3_2_1_17_1","unstructured":"The Transaction Processing Council. 2023. The TPC-H Benchmark. https:\/\/www. tpc.org\/tpch\/"},{"key":"e_1_3_2_1_18_1","unstructured":"Voltron Data. 2023. The Composable Codex. https:\/\/voltrondata.com\/codex"},{"key":"e_1_3_2_1_19_1","unstructured":"Jeffrey Dean and Sanjay Ghemawat. 2004. MapReduce: Simplified Data Processing on Large Clusters. (2004) 137--150. http:\/\/www.usenix.org\/events\/osdi04\/tech\/ dean.html"},{"key":"e_1_3_2_1_20_1","unstructured":"Delta-rs. 2024. A native Rust library for Delta Lake. https:\/\/github.com\/deltaio\/ delta-rs"},{"key":"e_1_3_2_1_21_1","unstructured":"Arrow developers. 2023. Mailing list: [DISCUSS][Format] Starting the draft implementation of the ArrayView array format. https:\/\/lists.apache.org\/thread\/ r28rw5n39jwtvn08oljl09d4q2c1ysvb"},{"key":"e_1_3_2_1_22_1","volume-title":"Substrait: Cross-Language Serialization for Relational Algebra. https:\/\/substrait.io\/","author":"Developers Substrait","year":"2023","unstructured":"Substrait Developers. 2023. Substrait: Cross-Language Serialization for Relational Algebra. https:\/\/substrait.io\/"},{"key":"e_1_3_2_1_23_1","volume-title":"DeWitt and Michael Stonebraker","author":"David","year":"2008","unstructured":"David J. DeWitt and Michael Stonebraker. 2008. MapReduce: A major step backwards. https:\/\/homes.cs.washington.edu\/~billhowe\/mapreduce_a_major_step_ backwards.html"},{"key":"e_1_3_2_1_24_1","unstructured":"Apache Software Foundation. 2023. Apache Arrow. https:\/\/arrow.apache.org"},{"key":"e_1_3_2_1_25_1","unstructured":"Apache Software Foundation. 2023. Apache Arrow DataFusion. https:\/\/arrow. apache.org\/datafusion\/"},{"key":"e_1_3_2_1_26_1","unstructured":"Apache Software Foundation. 2023. Apache Parquet. https:\/\/parquet.apache.org"},{"key":"e_1_3_2_1_27_1","unstructured":"Apache Software Foundation. 2023. A Primer on ASF Governance. https:\/\/www. apache.org\/foundation\/governance\/"},{"key":"e_1_3_2_1_28_1","unstructured":"Apache Software Foundation. 2023. PyArrow - Apache Arrow Python bindings. https:\/\/arrow.apache.org\/docs\/python\/index.html"},{"key":"e_1_3_2_1_29_1","unstructured":"The Apache Software Foundation. 2023. Apache DataFusion SQL reference. https: \/\/arrow.apache.org\/datafusion\/user-guide\/sql\/index.html"},{"key":"e_1_3_2_1_30_1","volume-title":"Apache Iceberg: The open table format for analytic datasets. https:\/\/iceberg.apache.org\/","author":"Software Foundation The Apache","year":"2024","unstructured":"The Apache Software Foundation. 2024. Apache Iceberg: The open table format for analytic datasets. https:\/\/iceberg.apache.org\/"},{"key":"e_1_3_2_1_31_1","unstructured":"The Zig Software Foundation. 2023. The Zig programming language. https: \/\/ziglang.org\/"},{"key":"e_1_3_2_1_32_1","unstructured":"Rust futures crate. 2023. Stream trait. https:\/\/docs.rs\/futures\/0.3.28\/futures\/ prelude\/stream\/trait.Stream.html"},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/93597.98720"},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/1132960.1132964"},{"key":"e_1_3_2_1_35_1","unstructured":"H2O.ai. 2023. Database-like ops benchmark. https:\/\/h2oai.github.io\/dbbenchmark\/"},{"key":"e_1_3_2_1_36_1","first-page":"40","article-title":"MonetDB: Two Decades of Research in Columnoriented Database Architectures","volume":"35","author":"Idreos Stratos","year":"2012","unstructured":"Stratos Idreos, Fabian Groffen, Niels Nes, Stefan Manegold, K. Sjoerd Mullender, and Martin L. Kersten. 2012. MonetDB: Two Decades of Research in Columnoriented Database Architectures. IEEE Data Eng. Bull. 35, 1 (2012), 40--45. http: \/\/sites.computer.org\/debull\/A12mar\/monetdb.pdf","journal-title":"IEEE Data Eng. Bull."},{"key":"e_1_3_2_1_37_1","unstructured":"Apple Inc. 2023. The Swift programming language. https:\/\/developer.apple.com\/ swift\/"},{"key":"e_1_3_2_1_38_1","unstructured":"ClickHouse Inc. 2023. ClickBench - a Benchmark For Analytical DBMS. https: \/\/benchmark.clickhouse.com\/"},{"key":"e_1_3_2_1_39_1","unstructured":"InfluxData Inc. 2023. Announcing InfluxDB IOx - The Future Core of InfluxDB Built with Rust and Arrow. https:\/\/www.influxdata.com\/blog\/announcing-influxdbiox\/"},{"key":"e_1_3_2_1_40_1","unstructured":"InfluxData Inc. 2023. The Influx Query Language Specification. https:\/\/github. com\/influxdata\/influxql"},{"key":"e_1_3_2_1_41_1","unstructured":"InfluxData Inc. 2023. InfluxDB - open source time series metrics and analytics database. https:\/\/influxdata.com\/"},{"key":"e_1_3_2_1_42_1","volume-title":"Information technology - Database languages - SQL. Standard","author":"IEC","unstructured":"ISO\/IEC 9075:2023 2023. Information technology - Database languages - SQL. Standard. International Organization for Standardization."},{"key":"e_1_3_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.14778\/3275366.3275370"},{"key":"e_1_3_2_1_44_1","volume-title":"The Modern Data Architecture: The Deconstructed Database. login Usenix Mag. 43, 4","author":"Khurana Amandeep","year":"2018","unstructured":"Amandeep Khurana and Julien Le Dem. 2018. The Modern Data Architecture: The Deconstructed Database. login Usenix Mag. 43, 4 (2018). https:\/\/www.usenix. org\/publications\/login\/winter-2018-vol-43-no-4\/khurana"},{"key":"e_1_3_2_1_45_1","unstructured":"DuckDB Labs. 2023. Parallel Grouped Aggregation in DuckDB. https:\/\/duckdb. org\/2022\/03\/07\/aggregate-hashtable.html"},{"key":"e_1_3_2_1_46_1","unstructured":"Andrew Lamb. 2022. Using Rustlang's Async Tokio Runtime for CPU-Bound Tasks. https:\/\/thenewstack.io\/using-rustlangs-async-tokio-runtime-for-cpubound- tasks\/"},{"key":"e_1_3_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.14778\/2367502.2367518"},{"key":"e_1_3_2_1_48_1","unstructured":"Andrew Lamb and Raphael Taylor-Davies. 2022. Querying Parquet with Millisecond Latency. https:\/\/arrow.apache.org\/blog\/2022\/12\/26\/querying-parquet-withmillisecond- latency\/"},{"key":"e_1_3_2_1_49_1","unstructured":"Andrew Lamb and Raphael Taylor-Davies. 2023. Fast and Memory Efficient Multi- Column Sorts in Apache Arrow Rust. https:\/\/arrow.apache.org\/blog\/2022\/11\/07\/ multi-column-sorts-in-arrow-rust-part-1\/"},{"key":"e_1_3_2_1_50_1","unstructured":"Andrew Lamb Raphael Taylor-Davies and Dani\u00ebl Heres. 2023. Aggregating Millions of Groups Fast in Apache Arrow DataFusion. https:\/\/www.influxdata. com\/blog\/aggregating-millions-groups-fast-apache-arrow-datafusion"},{"key":"e_1_3_2_1_51_1","unstructured":"Lance. 2023. Lance: modern columnar data format for ML. https:\/\/lancedb.github. io\/lance\/"},{"key":"e_1_3_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1109\/CGO"},{"key":"e_1_3_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/2588555.2610507"},{"key":"e_1_3_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.14778\/2794367.2794375"},{"key":"e_1_3_2_1_55_1","unstructured":"Jon Mease. 2023. VegaFusion: Server side scaling for the Vega visualization library. https:\/\/vegafusion.io\/"},{"key":"e_1_3_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.14778\/1920841.1920886"},{"key":"e_1_3_2_1_57_1","volume-title":"VLDB'98, Proceedings of 24rd International Conference on Very Large Data Bases, August 24--27","author":"Moerkotte Guido","year":"1998","unstructured":"Guido Moerkotte. 1998. Small Materialized Aggregates: A Light Weight Index Structure for Data Warehousing. In VLDB'98, Proceedings of 24rd International Conference on Very Large Data Bases, August 24--27, 1998, New York City, New York, USA, Ashish Gupta, Oded Shmueli, and Jennifer Widom (Eds.). Morgan Kaufmann, 476--487. http:\/\/www.vldb.org\/conf\/1998\/p476.pdf"},{"key":"e_1_3_2_1_58_1","volume-title":"Interval analysis","author":"Moore Ramon E.","unstructured":"Ramon E. Moore. 1966. Interval analysis. Vol. 4. Prentice-Hall Englewood Cliffs."},{"key":"e_1_3_2_1_59_1","unstructured":"The pandas development team. 2020. pandas-dev\/pandas: Pandas. https:\/\/doi. org\/10.5281\/zenodo.3509134"},{"key":"e_1_3_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.14778\/3554821.3554829"},{"key":"e_1_3_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.14778\/3603581.3603604"},{"key":"e_1_3_2_1_62_1","unstructured":"PostgreSQL. 2024. The PostgreSQL Relationa Database. https:\/\/www.postgresql. org\/"},{"key":"e_1_3_2_1_63_1","unstructured":"The Dask project. 2023. The dask-sql project. https:\/\/dask-sql.readthedocs.io\/en\/ latest\/"},{"key":"e_1_3_2_1_64_1","volume-title":"Gluten: Plugin to Double SparkSQL's Performance. https: \/\/h2oai.github.io\/db-benchmark\/","author":"The OAP","year":"2023","unstructured":"The OAP project. 2023. Gluten: Plugin to Double SparkSQL's Performance. https: \/\/h2oai.github.io\/db-benchmark\/"},{"key":"e_1_3_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.1145\/3209950.3209955"},{"key":"e_1_3_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.1145\/3299869.3320212"},{"key":"e_1_3_2_1_67_1","volume-title":"Tokio: As asynchronous Rust runtime. https:\/\/tokio.rs\/","author":"Developers Tokio","year":"2023","unstructured":"Tokio rs Developers. 2023. Tokio: As asynchronous Rust runtime. https:\/\/tokio.rs\/"},{"key":"e_1_3_2_1_68_1","unstructured":"SDF. 2023. SDF. https:\/\/www.sdf.com\/engine"},{"key":"e_1_3_2_1_69_1","unstructured":"Seafowl. 2024. Seafowl Postgres Accelerator. https:\/\/seafowl.io\/"},{"key":"e_1_3_2_1_70_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2013.6544909"},{"key":"e_1_3_2_1_71_1","doi-asserted-by":"publisher","DOI":"10.14778\/3510397.3510408"},{"key":"e_1_3_2_1_72_1","unstructured":"The sqlparser-rs authors. 2023. sqlparser-rs: Extensible SQL Lexer and Parser for Rust. https:\/\/github.com\/sqlparser-rs\/sqlparser-rs"},{"key":"e_1_3_2_1_73_1","doi-asserted-by":"publisher","DOI":"10.1145\/1409360.1409379"},{"key":"e_1_3_2_1_74_1","volume-title":"Proceedings of the 31st International Conference on Very Large Data Bases","author":"Stonebraker Michael","year":"2005","unstructured":"Michael Stonebraker, Daniel J. Abadi, Adam Batkin, Xuedong Chen, Mitch Cherniack, Miguel Ferreira, Edmond Lau, Amerson Lin, Samuel Madden, Elizabeth J. O'Neil, Patrick E. O'Neil, Alex Rasin, Nga Tran, and Stanley B. Zdonik. 2005. C-Store: A Column-oriented DBMS. In Proceedings of the 31st International Conference on Very Large Data Bases, Trondheim, Norway, August 30 - September 2, 2005, Klemens B\u00f6hm, Christian S. Jensen, Laura M. Haas, Martin L. Kersten, Per-\u00c5ke Larson, and Beng Chin Ooi (Eds.). ACM, 553--564. http:\/\/www.vldb.org\/ archives\/website\/2005\/program\/paper\/thu\/p553-stonebraker.pdf"},{"key":"e_1_3_2_1_75_1","unstructured":"Synnada. 2023. Synnada realtime data platform. https:\/\/www.synnada.ai\/"},{"key":"e_1_3_2_1_76_1","unstructured":"The Rust team. 2023. The Rust programming language. https:\/\/www.rustlang. org\/"},{"key":"e_1_3_2_1_77_1","doi-asserted-by":"publisher","DOI":"10.1145\/2934664"}],"event":{"name":"SIGMOD\/PODS '24: International Conference on Management of Data","location":"Santiago AA Chile","acronym":"SIGMOD\/PODS '24","sponsor":["SIGMOD ACM Special Interest Group on Management of Data"]},"container-title":["Companion of the 2024 International Conference on Management of Data"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3626246.3653368","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3626246.3653368","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,22]],"date-time":"2025-08-22T11:29:15Z","timestamp":1755862155000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3626246.3653368"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,6,9]]},"references-count":77,"alternative-id":["10.1145\/3626246.3653368","10.1145\/3626246"],"URL":"https:\/\/doi.org\/10.1145\/3626246.3653368","relation":{},"subject":[],"published":{"date-parts":[[2024,6,9]]},"assertion":[{"value":"2024-06-09","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}