{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,16]],"date-time":"2025-12-16T12:52:17Z","timestamp":1765889537422,"version":"3.44.0"},"reference-count":52,"publisher":"Association for Computing Machinery (ACM)","issue":"12","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2025,8]]},"abstract":"<jats:p>Real-time data warehouses are essential for modern applications. Extract-Transform-Load (ETL) as a fundamental component of offline data warehouses also provides crucial support within realtime data warehouses. Among various traditional ETL approaches, Lambda and Kappa have emerged as classic real-time data processing solutions due to their freshness and query performance, which best meet business demands. However, both of them often require the integration of external stream processing engines, introducing challenges related to complexity, efficiency, and consistency. ZeroETL has emerged as an approach to address these issues. Nevertheless, existing ZeroETL-based solutions primarily emphasize the implementation of extraction and loading, resulting in limitations in handling transformation. Incremental View Maintenance (IVM) offers an alternative that can enhance ZeroETL. However, existing IVM implementations often focus on query acceleration rather than supporting high-throughput, complex real-time workloads.<\/jats:p>\n          <jats:p>To address these challenges, we propose Streaming View, an efficient real-time data processing engine integrated within AnalyticDB of Alibaba Cloud. Unlike existing solutions, Streaming View supports high-throughput, complex data processing for realtime streaming ETL workloads. Furthermore, it can be leveraged to optimize ZeroETL-based approaches by enhancing transformation capabilities. We design tailored algorithms and optimizations for diverse syntaxes and high-throughput scenarios, ensuring the system meets complex application needs. By integrating incremental computation into the data warehouse, Streaming View reduces complexity, ensures data consistency, and boosts performance, offering a robust solution for real-world applications. Experiments show Streaming View improves processing performance by up to 7x and 20x over traditional ETL and IVM methods, respectively, and addresses complex scenarios unsolved by existing solutions.<\/jats:p>","DOI":"10.14778\/3750601.3750634","type":"journal-article","created":{"date-parts":[[2025,9,16]],"date-time":"2025-09-16T13:38:05Z","timestamp":1758029885000},"page":"5153-5165","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Streaming View: An Efficient Data Processing Engine for Modern Real-Time Data Warehouse of Alibaba Cloud"],"prefix":"10.14778","volume":"18","author":[{"given":"Fangyuan","family":"Zhang","sequence":"first","affiliation":[{"name":"Alibaba Cloud Computing, Hangzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mengqi","family":"Wu","sequence":"additional","affiliation":[{"name":"Alibaba Cloud Computing, Hangzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chunlei","family":"Xu","sequence":"additional","affiliation":[{"name":"Alibaba Cloud Computing, Hangzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yunong","family":"Bao","sequence":"additional","affiliation":[{"name":"Alibaba Cloud Computing, Hangzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jiyu","family":"Qiao","sequence":"additional","affiliation":[{"name":"Alibaba Cloud Computing, Hangzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yingli","family":"Zhou","sequence":"additional","affiliation":[{"name":"Alibaba Cloud Computing, Hangzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hua","family":"Fan","sequence":"additional","affiliation":[{"name":"Alibaba Cloud Computing, Hangzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Caihua","family":"Yin","sequence":"additional","affiliation":[{"name":"Alibaba Cloud Computing, Hangzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Wenchao","family":"Zhou","sequence":"additional","affiliation":[{"name":"Alibaba Cloud Computing, Hangzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Feifei","family":"Li","sequence":"additional","affiliation":[{"name":"Alibaba Cloud Computing, Hangzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,9,16]]},"reference":[{"key":"e_1_2_1_1_1","first-page":"496","article-title":"Automated selection of materialized views and indexes in SQL databases","volume":"2000","author":"Agrawal Sanjay","year":"2000","unstructured":"Sanjay Agrawal, Surajit Chaudhuri, and Vivek R Narasayya. 2000. Automated selection of materialized views and indexes in SQL databases. In VLDB, Vol. 2000. 496\u2013505.","journal-title":"VLDB"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.14778\/1687553.1687592"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.14778\/3415478.3415533"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/3629104.3672430"},{"key":"e_1_2_1_5_1","volume-title":"Retrieved","author":"Services Amazon Web","year":"2022","unstructured":"Amazon Web Services 2022. AWS announces Amazon Aurora zero-ETL intergration with Amazon Redshift. Retrieved March 16, 2025 from https:\/\/aws.amazon.com\/cn\/about-aws\/whats-new\/2022\/11\/amazon-aurora-zero-etl-integration-redshift\/"},{"key":"e_1_2_1_6_1","volume-title":"Retrieved","author":"DB","year":"2025","unstructured":"AnalyticDB for PostgreSQL 2025. AnalyticDB for PostgreSQL: Online MPP Data Warehousing Service - Vector Database - Alibaba Cloud. Retrieved Mar 16, 2025 from https:\/\/www.alibabacloud.com\/en\/product\/hybriddb-postgresql"},{"key":"e_1_2_1_7_1","volume-title":"Retrieved","author":"Flink Apache","year":"2025","unstructured":"Apache Flink 2025. Apache Flink\u00ae \u2014 Stateful Computations over Data Streams. Retrieved Mar 16, 2025 from https:\/\/flink.apache.org\/"},{"key":"e_1_2_1_8_1","volume-title":"Retrieved","author":"Spark Apache","year":"2025","unstructured":"Apache Spark 2025. Apache Doris: Open source data warehouse for real time data analytics - Apache Doris. Retrieved Mar 16, 2025 from https:\/\/doris.apache.org\/"},{"key":"e_1_2_1_9_1","volume-title":"Retrieved","author":"Spark Apache","year":"2025","unstructured":"Apache Spark 2025. Apache Spark\u2122 - Unified Engine for large-scale data analytics. Retrieved Mar 16, 2025 from https:\/\/spark.apache.org\/"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/3514221.3526045"},{"key":"e_1_2_1_11_1","first-page":"836","article-title":"Design of a massively parallel processor","volume":"100","year":"1980","unstructured":"Batcher. 1980. Design of a massively parallel processor. IEEE Trans. Comput. 100, 9 (1980), 836\u2013840.","journal-title":"IEEE Trans. Comput."},{"key":"e_1_2_1_12_1","first-page":"24","article-title":"Materialized views in Oracle","volume":"98","author":"Bello Randall G","year":"1998","unstructured":"Randall G Bello, Karl Dias, Alan Downing, James Feenan, Jim Finnerty, William D Norcott, Harry Sun, Andrew Witkowski, and Mohamed Ziauddin. 1998. Materialized views in Oracle. In VLDB, Vol. 98. 24\u201327.","journal-title":"VLDB"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.14778\/3587136.3587137"},{"key":"e_1_2_1_14_1","volume-title":"Apache flink: Stream and batch processing in a single engine. The Bulletin of the Technical Committee on Data Engineering 38, 4","author":"Carbone Paris","year":"2015","unstructured":"Paris Carbone, Asterios Katsifodimos, Stephan Ewen, Volker Markl, Seif Haridi, and Kostas Tzoumas. 2015. Apache flink: Stream and batch processing in a single engine. The Bulletin of the Technical Committee on Data Engineering 38, 4 (2015)."},{"key":"e_1_2_1_15_1","volume-title":"Further normalization of the data base relational model. Data base systems 6","author":"Codd Edgar F","year":"1972","unstructured":"Edgar F Codd. 1972. Further normalization of the data base relational model. Data base systems 6, 1972 (1972), 33\u201364."},{"key":"e_1_2_1_16_1","volume-title":"International Symposium on Intelligent and Distributed Computing. Springer, 319\u2013329","author":"Coviello Giuseppe","year":"2021","unstructured":"Giuseppe Coviello, Kunal Rao, Murugan Sankaradas, and Srimat Chakradhar. 2021. DataX: A system for data exchange and transformation of streams. In International Symposium on Intelligent and Distributed Computing. Springer, 319\u2013329."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/2882903.2903741"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.14778\/2733004.2733080"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-021-00682-5"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/290593.290597"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2742795"},{"key":"e_1_2_1_22_1","volume-title":"Apache Flume: distributed log collection for Hadoop","author":"Hoffman Steve","unstructured":"Steve Hoffman. 2013. Apache Flume: distributed log collection for Hadoop. Packt Publishing Ltd."},{"key":"e_1_2_1_23_1","volume-title":"Delta Live Tables. Databricks Product Page. Retrieved","author":"Databricks Inc. 2025.","year":"2025","unstructured":"Databricks Inc. 2025. Delta Live Tables. Databricks Product Page. Retrieved June 2, 2025 from https:\/\/www.databricks.com\/product\/data-engineering\/dlt"},{"key":"e_1_2_1_24_1","unstructured":"Snowflake Inc. 2025. Dynamic Tables. Retrieved June 2 2025 from https:\/\/docs.snowflake.com\/en\/user-guide\/dynamic-tables-about"},{"key":"e_1_2_1_25_1","unstructured":"HV Jagadish PPS Narayan Sridhar Seshadri S Sudarshan and Rama Kanneganti. 1997. Incremental organization for data recording and warehousing. In VLDB. ResearchGate GmbH 16\u201325."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-023-00817-w"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/322326.322332"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-013-0348-4"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/1807085.1807100"},{"key":"e_1_2_1_30_1","volume-title":"2007 IEEE 23rd International Conference on Data Engineering. IEEE, 56\u201365","author":"Larson Per-Ake","year":"2006","unstructured":"Per-Ake Larson and Jingren Zhou. 2006. Efficient maintenance of materialized outer-join views. In 2007 IEEE 23rd International Conference on Data Engineering. IEEE, 56\u201365."},{"key":"e_1_2_1_31_1","unstructured":"Patrick Lewis Ethan Perez Aleksandra Piktus Fabio Petroni Vladimir Karpukhin Naman Goyal Heinrich K\u00fcttler Mike Lewis Wen-tau Yih Tim Rockt\u00e4schel et al. 2020. Retrieval-augmented generation for knowledge-intensive nlp tasks. Advances in neural information processing systems 33 (2020) 9459\u20139474."},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.14778\/2732951.2732965"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.14778\/3611540.3611639"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/MIC.2017.3481351"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/3448016.3457562"},{"key":"e_1_2_1_36_1","volume-title":"Materialize - The Streaming SQL Database. Materialize. Retrieved","year":"2025","unstructured":"Materialize. 2025. Materialize - The Streaming SQL Database. Materialize. Retrieved June 2, 2025 from https:\/\/materialize.com\/"},{"key":"e_1_2_1_37_1","volume-title":"VLDB DISPA Workshop","author":"Narayanan Deepak","year":"2020","unstructured":"Deepak Narayanan, Keshav Santhanam, Fiodar Kazhamiaka, Amar Phanishayee, and Matei Zaharia. 2020. Analysis and exploitation of dynamic pricing in the public cloud for ml training. In VLDB DISPA Workshop 2020."},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/3183713.3183758"},{"key":"e_1_2_1_39_1","doi-asserted-by":"crossref","unstructured":"Milos Nikolic Haozhe Zhang Ahmet Kara and Dan Olteanu. 2020. F-IVM: learning over fast-evolving relational data. In SIGMOD. 2773\u20132776.","DOI":"10.1145\/3318464.3384702"},{"key":"e_1_2_1_40_1","volume-title":"RisingWave: The Cloud-Native Streaming Database. Rising-Wave. Retrieved","year":"2025","unstructured":"RisingWave. 2025. RisingWave: The Cloud-Native Streaming Database. Rising-Wave. Retrieved June 2, 2025 from https:\/\/www.risingwave.com\/"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1080\/17445760.2019.1585848"},{"key":"e_1_2_1_42_1","volume-title":"Retrieved","author":"The Transaction Processing Council","year":"2025","unstructured":"The Transaction Processing Council 2025. TPC-H Homepage. Retrieved Mar 16, 2025 from https:\/\/www.tpc.org\/tpch\/"},{"key":"e_1_2_1_43_1","volume-title":"Nexmark-a benchmark for queries over data streams (draft). Technical report","author":"Tucker Pete","year":"2008","unstructured":"Pete Tucker, Kristin Tufte, Vassilis Papadimos, and David Maier. 2008. Nexmark-a benchmark for queries over data streams (draft). Technical report (2008)."},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.4018\/jdwm.2009070101"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.14778\/3415478.3415504"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.14778\/3421424.3421427"},{"key":"e_1_2_1_47_1","volume-title":"Big Data: Principles and best practices of scalable realtime data systems.","author":"Warren James","year":"2015","unstructured":"James Warren and Nathan Marz. 2015. Big Data: Principles and best practices of scalable realtime data systems."},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/2934664"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.14778\/3352063.3352124"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.14778\/3641204.3641206"},{"key":"e_1_2_1_51_1","volume-title":"2007 IEEE 23rd International Conference on Data Engineering. IEEE, 526\u2013535","author":"Zhou Jingren","year":"2006","unstructured":"Jingren Zhou, Per-Ake Larson, Jonathan Goldstein, and Luping Ding. 2006. Dynamic materialized views. In 2007 IEEE 23rd International Conference on Data Engineering. IEEE, 526\u2013535."},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.5555\/1325851.1325881"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3750601.3750634","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,16]],"date-time":"2025-09-16T13:38:25Z","timestamp":1758029905000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3750601.3750634"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,8]]},"references-count":52,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2025,8]]}},"alternative-id":["10.14778\/3750601.3750634"],"URL":"https:\/\/doi.org\/10.14778\/3750601.3750634","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2025,8]]},"assertion":[{"value":"2025-09-16","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}