{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,30]],"date-time":"2025-08-30T00:06:21Z","timestamp":1756512381666,"version":"3.44.0"},"reference-count":76,"publisher":"Association for Computing Machinery (ACM)","issue":"7","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2025,3]]},"abstract":"<jats:p>Data scientists today often need to analyze data from various places. This makes it necessary for corresponding engines to support query federation (i.e., the ability to perform SQL queries over data hosted in different sources). Although many systems come with federation capabilities, their implementations are tightly coupled with the core engine design. This not only increases complexity and reduces portability across engines, but also often leads to performance issues by missing optimization opportunities. This paper proposes Accio, a new \"bolt-on\" approach to query federation. Accio is a middleware library that decouples query federation from the target system. It enables two key optimizations\u2014join pushdown and query partitioning\u2014via a declarative interface that can be easily leveraged by different engines. 
Our experience of adapting five popular data science query engines shows that Accio can outperform existing approaches by orders of magnitude in various scenarios without the need for any intrusive changes or extra maintenance.<\/jats:p>","DOI":"10.14778\/3734839.3734849","type":"journal-article","created":{"date-parts":[[2025,8,29]],"date-time":"2025-08-29T16:01:06Z","timestamp":1756483266000},"page":"2126-2135","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Accio: Bolt-on Query Federation"],"prefix":"10.14778","volume":"18","author":[{"given":"Xiaoying","family":"Wang","sequence":"first","affiliation":[{"name":"Simon Fraser University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jiannan","family":"Wang","sequence":"additional","affiliation":[{"name":"Huawei Technologies, Simon Fraser University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Tianzheng","family":"Wang","sequence":"additional","affiliation":[{"name":"Simon Fraser University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yong","family":"Zhang","sequence":"additional","affiliation":[{"name":"Huawei Technologies"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,8,29]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"2023. Federated Query using Spark. https:\/\/medium.com\/globant\/federated-query-using-spark-a32ad9152e77."},{"key":"e_1_2_1_2_1","unstructured":"2024. Amazon Athena. https:\/\/docs.aws.amazon.com\/athena\/latest\/ug\/what-is.html."},{"key":"e_1_2_1_3_1","unstructured":"2024. Apache Arrow. https:\/\/arrow.apache.org\/."},{"key":"e_1_2_1_4_1","unstructured":"2024. Are different database systems going to be supported data sources? https:\/\/github.com\/apache\/datafusion\/issues\/1048."},{"key":"e_1_2_1_5_1","unstructured":"2024. Calcite CoreRules. 
https:\/\/calcite.apache.org\/javadocAggregate\/org\/apache\/calcite\/rel\/rules\/CoreRules.html."},{"key":"e_1_2_1_6_1","unstructured":"2024. Calcite RelDecorrelator. https:\/\/calcite.apache.org\/javadocAggregate\/org\/apache\/calcite\/sql2rel\/RelDecorrelator.html."},{"key":"e_1_2_1_7_1","unstructured":"2024. chDB. https:\/\/clickhouse.com\/docs\/en\/chdb."},{"key":"e_1_2_1_8_1","unstructured":"2024. ClickHouse. https:\/\/github.com\/ClickHouse\/ClickHouse."},{"key":"e_1_2_1_9_1","unstructured":"2024. Common Subexpression Elimination Explained. https:\/\/learn.microsoft.com\/en-us\/sql\/analytics-platform-system\/common-sub-expression-elimination?view=aps-pdw-2016-au7."},{"key":"e_1_2_1_10_1","unstructured":"2024. Database-like ops benchmark. https:\/\/duckdblabs.github.io\/db-benchmark\/."},{"key":"e_1_2_1_11_1","unstructured":"2024. DuckDB Postgres extension. https:\/\/github.com\/duckdb\/postgres_scanner."},{"key":"e_1_2_1_12_1","unstructured":"2024. DuckDB SQLite extension. https:\/\/github.com\/duckdb\/sqlite_scanner."},{"key":"e_1_2_1_13_1","unstructured":"2024. Federated Querying using DuckDB on blockchain data. https:\/\/kowalskidefi.medium.com\/federated-querying-using-duckdb-x-postgres-on-blockchain-data-5391518601ee."},{"key":"e_1_2_1_14_1","unstructured":"2024. Huawei Hetu Cyberverse. https:\/\/www.huaweicloud.com\/product\/live\/dspace.html."},{"key":"e_1_2_1_15_1","unstructured":"2024. MySQL Constant-Folding Optimization. https:\/\/dev.mysql.com\/doc\/refman\/8.4\/en\/constant-folding-optimization.html."},{"key":"e_1_2_1_16_1","unstructured":"2024. Polars: Fast multi-threaded DataFrame library in Rust and Python. https:\/\/github.com\/pola-rs\/polars."},{"key":"e_1_2_1_17_1","unstructured":"2024. scan_database feature. https:\/\/github.com\/pola-rs\/polars\/issues\/9091."},{"key":"e_1_2_1_18_1","unstructured":"2024. SQLServer Constant Folding and Expression Evaluation. 
https:\/\/learn.microsoft.com\/en-us\/sql\/relational-databases\/query-processing-architecture-guide?view=sql-server-ver16#constant-folding-and-expression-evaluation."},{"key":"e_1_2_1_19_1","unstructured":"2024. Subexpression Elimination In Code-Generated Expression Evaluation (Common Expression Reuse). https:\/\/books.japila.pl\/spark-sql-internals\/subexpression-elimination\/."},{"key":"e_1_2_1_20_1","unstructured":"2024. TPC-H Homepage. http:\/\/www.tpc.org\/tpch."},{"key":"e_1_2_1_21_1","unstructured":"2024. Trino a query engine that runs at ludicrous speed. https:\/\/trino.io\/."},{"key":"e_1_2_1_22_1","unstructured":"2024. Trino Dynamic Filtering. https:\/\/trino.io\/docs\/current\/admin\/dynamic-filtering.html."},{"key":"e_1_2_1_23_1","unstructured":"Apache Spark 3.4.0. 2024. JDBC To Other Databases. https:\/\/spark.apache.org\/docs\/latest\/sql-data-sources-jdbc.html."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.14778\/1687627.1687731"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.46912\/napas.29"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.14778\/3236187.3236195"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/3299869.3319895"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2742797"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/3631504.3631510"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/3183713.3190662"},{"key":"e_1_2_1_31_1","volume-title":"OBDA for the Web: Creating Virtual RDF Graphs On Top of Web Data Sources. CoRR abs\/2005.11264","author":"Bereta Konstantina","year":"2020","unstructured":"Konstantina Bereta, George Papadakis, and Manolis Koubarakis. 2020. OBDA for the Web: Creating Virtual RDF Graphs On Top of Web Data Sources. CoRR abs\/2005.11264 (2020). 
arXiv:2005.11264 https:\/\/arxiv.org\/abs\/2005.11264"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/IGARSS.2018.8518255"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.3233\/SW-160217"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/RIDE.1995.378736"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.5555\/645920.672834"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2002.994788"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/2463676.2463709"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/2814710.2814713"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1007\/BFB0054528"},{"key":"e_1_2_1_40_1","doi-asserted-by":"crossref","unstructured":"Haralampos Gavriilidis, Kaustubh Beedkar, Jorge-Arnulfo Quian\u00e9-Ruiz, and Volker Markl. 2023. In-Situ Cross-Database Query Processing. In ICDE.","DOI":"10.1109\/ICDE55515.2023.00214"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/2741948.2741968"},{"key":"e_1_2_1_42_1","unstructured":"Andy Grove. [n.d.]. Apache DataFusion. https:\/\/github.com\/apache\/arrow-datafusion"},{"key":"e_1_2_1_43_1","unstructured":"S. Hemminger. April 2005. Network Emulation with NetEm. 
Open Source Development Lab."},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF01277518"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.14778\/3554821.3554898"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1016\/J.WEBSEM.2017.02.001"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.14778\/3554821.3554891"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10619-015-7185-y"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/371578.371598"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/352958.352982"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.14778\/3611479.3611484"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-020-00612-x"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-68234-9_37"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.14778\/2850583.2850594"},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1145\/3653317"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1145\/1559845.1559889"},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-26253-2"},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","unstructured":"The pandas development team. 2020. pandas-dev\/pandas: Pandas. 10.5281\/zenodo.3509134","DOI":"10.5281\/zenodo.3509134"},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.14778\/3407790.3407807"},{"key":"e_1_2_1_60_1","unstructured":"Mark Raasveldt and Hannes Muehleisen. [n.d.]. DuckDB. https:\/\/github.com\/duckdb\/duckdb"},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1016\/J.FUTURE.2016.08.013"},{"key":"e_1_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.25080\/Majora-7b98e3ed-013"},{"key":"e_1_2_1_63_1","volume-title":"Schwarz","author":"Roth Mary Tork","year":"1997","unstructured":"Mary Tork Roth and Peter M. Schwarz. 1997. Don't Scrap It, Wrap It! 
A Wrapper Architecture for Legacy Data Sources. In VLDB'97, Proceedings of 23rd International Conference on Very Large Data Bases, August 25\u201329, 1997, Athens, Greece, Matthias Jarke, Michael J. Carey, Klaus R. Dittrich, Frederick H. Lochovsky, Pericles Loucopoulos, and Manfred A. Jeusfeld (Eds.). Morgan Kaufmann, 266\u2013275. http:\/\/www.vldb.org\/conf\/1997\/P266.PDF"},{"key":"e_1_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2019.00196"},{"key":"e_1_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.1145\/96602.96604"},{"key":"e_1_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.1007\/S007780050015"},{"key":"e_1_2_1_67_1","first-page":"281","article-title":"Counting connected sets and connected partitions of a graph","volume":"67","author":"Vince Andrew","year":"2017","unstructured":"Andrew Vince. 2017. Counting connected sets and connected partitions of a graph. Australas. J Comb. 67 (2017), 281\u2013293. http:\/\/ajc.maths.uq.edu.au\/pdf\/67\/ajc_v67_p281.pdf","journal-title":"Australas. J Comb."},{"key":"e_1_2_1_68_1","volume-title":"8th Biennial Conference on Innovative Data Systems Research, CIDR","author":"Wang Jingjing","year":"2017","unstructured":"Jingjing Wang, Tobin Baker, Magdalena Balazinska, Daniel Halperin, Brandon Haynes, Bill Howe, Dylan Hutchison, Shrainik Jain, Ryan Maas, Parmita Mehta, Dominik Moritz, Brandon Myers, Jennifer Ortiz, Dan Suciu, Andrew Whitaker, and Shengliang Xu. 2017. The Myria Big Data Management and Analytics System and Cloud Services. In 8th Biennial Conference on Innovative Data Systems Research, CIDR 2017, Chaminade, CA, USA, January 8\u201311, 2017, Online Proceedings. www.cidrdb.org. http:\/\/cidrdb.org\/cidr2017\/papers\/p37-wang-cidr17.pdf"},{"key":"e_1_2_1_69_1","unstructured":"Xiaoying Wang, Jiannan Wang, Tianzheng Wang, and Yong Zhang. 2024. [Technical Report] Accio: Bolt-on Query Federation. 
https:\/\/github.com\/sfu-db\/accio\/blob\/main\/accio_technical_report.pdf."},{"key":"e_1_2_1_70_1","doi-asserted-by":"publisher","DOI":"10.14778\/3551793.3551847"},{"key":"e_1_2_1_71_1","doi-asserted-by":"publisher","DOI":"10.1145\/3329859.3329873"},{"key":"e_1_2_1_72_1","doi-asserted-by":"publisher","DOI":"10.14778\/3611540.3611563"},{"key":"e_1_2_1_73_1","unstructured":"W Yan and Per-Ake Larson. 1995. Interchanging the order of grouping and join. Technical Report. Citeseer."},{"key":"e_1_2_1_74_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.1994.283001"},{"key":"e_1_2_1_75_1","doi-asserted-by":"publisher","DOI":"10.1007\/S00778-024-00867-8"},{"key":"e_1_2_1_76_1","doi-asserted-by":"publisher","DOI":"10.1145\/3514221.3526171"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3734839.3734849","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,29]],"date-time":"2025-08-29T16:03:38Z","timestamp":1756483418000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3734839.3734849"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,3]]},"references-count":76,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2025,3]]}},"alternative-id":["10.14778\/3734839.3734849"],"URL":"https:\/\/doi.org\/10.14778\/3734839.3734849","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2025,3]]},"assertion":[{"value":"2025-08-29","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}