{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,7]],"date-time":"2026-03-07T02:38:07Z","timestamp":1772851087304,"version":"3.50.1"},"reference-count":66,"publisher":"Association for Computing Machinery (ACM)","issue":"12","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2025,8]]},"abstract":"<jats:p>We present Apache Flink 2.0, an evolution of the popular stream processing system's architecture that decouples computation from state management. Flink 2.0 relies on a remote distributed file system (DFS) for primary state storage and uses local disks as a secondary cache, with state updates streamed continuously and directly to the DFS. To address the latency implications of remote storage, Flink 2.0 incorporates an asynchronous runtime execution model. Furthermore, Flink 2.0 introduces ForSt, a novel state store featuring a unified file system that enables faster and lightweight checkpointing, recovery, and reconfiguration with minimal intrusion to the existing Flink runtime architecture. Using a comprehensive set of Nexmark benchmarks and a large-scale stateful production workload, we evaluate Flink 2.0's large-state processing, checkpointing, and recovery mechanisms. Our results show significant performance improvements and reduced resource utilization compared to the baseline Flink 1.20 implementation. Specifically, we observe up to 94% reduction in checkpoint duration, up to 49\u00d7 faster recovery after failures or a rescaling operation, and up to 50% cost savings.<\/jats:p>","DOI":"10.14778\/3750601.3750609","type":"journal-article","created":{"date-parts":[[2025,9,16]],"date-time":"2025-09-16T13:38:05Z","timestamp":1758029885000},"page":"4846-4859","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["Disaggregated State Management in Apache Flink\u00ae 2.0"],"prefix":"10.14778","volume":"18","author":[{"given":"Yuan","family":"Mei","sequence":"first","affiliation":[{"name":"Alibaba Group"}]},{"given":"Rui","family":"Xia","sequence":"additional","affiliation":[{"name":"Alibaba Group"}]},{"given":"Zhaoqian","family":"Lan","sequence":"additional","affiliation":[{"name":"Alibaba Group"}]},{"given":"Kaitian","family":"Hu","sequence":"additional","affiliation":[{"name":"Alibaba Group"}]},{"given":"Lei","family":"Huang","sequence":"additional","affiliation":[{"name":"Boston University"}]},{"given":"Paris","family":"Carbone","sequence":"additional","affiliation":[{"name":"KTH Royal Institute of Technology"}]},{"given":"Yanfei","family":"Lei","sequence":"additional","affiliation":[{"name":"Alibaba Group"}]},{"given":"Vasiliki","family":"Kalavri","sequence":"additional","affiliation":[{"name":"Boston University"}]},{"given":"Han","family":"Yin","sequence":"additional","affiliation":[{"name":"Alibaba Group"}]},{"given":"Feng","family":"Wang","sequence":"additional","affiliation":[{"name":"Alibaba Group"}]}],"member":"320","published-online":{"date-parts":[[2025,9,16]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-003-0095-z"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.14778\/3415478.3415545"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.14778\/2536222.2536229"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.14778\/2536222.2536229"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.14778\/3476311.3476389"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.14778\/2824032.2824076"},{"key":"e_1_2_1_7_1","volume-title":"Alibaba Realtime Compute. https:\/\/www.alibabacloud.com\/product\/realtime-compute Last access","year":"2025","unstructured":"Alibaba. 2025. Alibaba Realtime Compute. https:\/\/www.alibabacloud.com\/product\/realtime-compute Last access: March 2025."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/3299869.3314047"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/872757.872854"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/3514221.3526045"},{"key":"e_1_2_1_11_1","unstructured":"Asynchronous Execution Model 2024. https:\/\/cwiki.apache.org\/confluence\/x\/S4p3EQ."},{"key":"e_1_2_1_12_1","unstructured":"Asynchronous State APIs 2024. https:\/\/cwiki.apache.org\/confluence\/pages\/viewpage.action?pageId=293046857."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/1066157.1066160"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.14778\/2733004.2733016"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.14778\/3587136.3587137"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.14778\/3229863.3229872"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.14778\/3137765.3137777"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3383131"},{"key":"e_1_2_1_19_1","volume-title":"Apache flink: Stream and batch processing in a single engine. The Bulletin of the Technical Committee on Data Engineering 38, 4","author":"Carbone Paris","year":"2015","unstructured":"Paris Carbone, Asterios Katsifodimos, Stephan Ewen, Volker Markl, Seif Haridi, and Kostas Tzoumas. 2015. Apache flink: Stream and batch processing in a single engine. The Bulletin of the Technical Committee on Data Engineering 38, 4 (2015)."},{"key":"e_1_2_1_20_1","doi-asserted-by":"crossref","unstructured":"U\u011fur \u00c7etintemel Daniel Abadi Yanif Ahmad Hari Balakrishnan Magdalena Balazinska Mitch Cherniack Jeong-Hyon Hwang Samuel Madden Anurag Maskey Alexander Rasin et al. 2016. The aurora and borealis stream processing engines. In Data Stream Management. Springer 337\u2013359.","DOI":"10.1007\/978-3-540-28608-0_17"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/872757.872857"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/1365815.1365816"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/2882903.2904441"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","unstructured":"Jianjun Chen David J. DeWitt Feng Tian and Yuan Wang. [n.d.]. NiagaraCQ: A scalable continuous query system for internet databases. ([n. d.]) 379\u2013390. 10.1145\/342009.335432","DOI":"10.1145\/342009.335432"},{"key":"e_1_2_1_25_1","volume-title":"Alibaba Cloud OSS. https:\/\/www.alibabacloud.com\/help\/en\/oss\/user-guide\/overview-53 Last access","author":"Cloud Alibaba","year":"2025","unstructured":"Alibaba Cloud. 2025. Alibaba Cloud OSS. https:\/\/www.alibabacloud.com\/help\/en\/oss\/user-guide\/overview-53 Last access: July 2025."},{"key":"e_1_2_1_26_1","volume-title":"Alibaba Cloud Pricing. https:\/\/www.alibabacloud.com\/en\/product\/ecs-pricing-list\/en Last access","author":"Cloud Alibaba","year":"2025","unstructured":"Alibaba Cloud. 2025. Alibaba Cloud Pricing. https:\/\/www.alibabacloud.com\/en\/product\/ecs-pricing-list\/en Last access: July 2025."},{"key":"e_1_2_1_27_1","volume-title":"Resource Setup in Alibaba Cloud's Real-time Computing Service. https:\/\/www.alibabacloud.com\/help\/en\/flink\/product-overview\/basic-concepts Last access","author":"Cloud Alibaba","year":"2025","unstructured":"Alibaba Cloud. 2025. Resource Setup in Alibaba Cloud's Real-time Computing Service. https:\/\/www.alibabacloud.com\/help\/en\/flink\/product-overview\/basic-concepts Last access: July 2025."},{"key":"e_1_2_1_28_1","volume-title":"https:\/\/github.com\/ververica\/ForSt\/ Last access","author":"Community ForSt","year":"2025","unstructured":"ForSt Community. 2024. ForSt Project. https:\/\/github.com\/ververica\/ForSt\/ Last access: July 2025."},{"key":"e_1_2_1_29_1","volume-title":"Nexmark Github repo. https:\/\/github.com\/nexmark\/nexmark\/ Last access","author":"Community Nexmark","year":"2025","unstructured":"Nexmark Community. 2020. Nexmark Github repo. https:\/\/github.com\/nexmark\/nexmark\/ Last access: July 2025."},{"key":"e_1_2_1_30_1","unstructured":"Compute Units in Alibaba Cloud 2024. Metering method of Realtime Compute for Apache Flink in Alibaba Cloud. https:\/\/www.alibabacloud.com\/help\/en\/flink\/product-overview\/billable-items."},{"key":"e_1_2_1_31_1","volume-title":"Spanner: Google's Globally-Distributed Database. In OSDI.","author":"Corbett James C.","year":"2012","unstructured":"James C. Corbett, Jeffrey Dean, Michael Epstein, Andrew Fikes, Christopher Frost, JJ Furman, Sanjay Ghemawat, Andrey Gubarev, Christopher Heiser, Peter Hochschild, Wilson Hsieh, Sebastian Kanthak, Eugene Kogan, Hongyi Li, Alexander Lloyd, Sergey Melnik, David Mwaura, David Nagle, Sean Quinlan, Rajesh Rao, Lindsay Rolig, Dale Woodford, Yasushi Saito, Christopher Taylor, Michal Szymaniak, and Ruth Wang. 2012. Spanner: Google's Globally-Distributed Database. In OSDI."},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/872757.872838"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/2882903.2903741"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3386129"},{"key":"e_1_2_1_35_1","volume-title":"Powered by Flink. https:\/\/flink.apache.org\/what-is-flink\/powered-by\/ Last access","author":"Flink Apache","year":"2025","unstructured":"Apache Flink. 2025. Powered by Flink. https:\/\/flink.apache.org\/what-is-flink\/powered-by\/ Last access: July 2025."},{"key":"e_1_2_1_36_1","unstructured":"Flink Remote Compaction 2025. https:\/\/github.com\/AlexYinHan\/flink\/tree\/remote_compaction_feature."},{"key":"e_1_2_1_37_1","volume-title":"Apache Flink Committee. https:\/\/projects.apache.org\/committee.html?flink Last access","author":"Foundation Apache Software","year":"2025","unstructured":"Apache Software Foundation. 2025. Apache Flink Committee. https:\/\/projects.apache.org\/committee.html?flink Last access: July 2025."},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-023-00819-8"},{"key":"e_1_2_1_39_1","volume-title":"Google Cloud Dataflow. https:\/\/cloud.google.com\/dataflow Last access","year":"2025","unstructured":"Google. 2025. Google Cloud Dataflow. https:\/\/cloud.google.com\/dataflow Last access: March 2025."},{"key":"e_1_2_1_40_1","volume-title":"Stream processing with Apache Flink: fundamentals, implementation, and operation of streaming applications","author":"Hueske Fabian","unstructured":"Fabian Hueske and Vasiliki Kalavri. 2019. Stream processing with Apache Flink: fundamentals, implementation, and operation of streaming applications. O'Reilly Media."},{"key":"e_1_2_1_41_1","volume-title":"The Open Group Base Specifications Issue 7","author":"IEEE and The Open Group","year":"2018","unstructured":"IEEE and The Open Group. 2018. The Open Group Base Specifications Issue 7, 2018 edition. https:\/\/pubs.opengroup.org\/onlinepubs\/9699919799 Last access: July 2025."},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.14778\/3007263.3007272"},{"key":"e_1_2_1_43_1","unstructured":"Keynote Flink Forward 2024. The Past Present and Future of Apache Flink. https:\/\/www.alibabacloud.com\/blog\/the-past-present-and-future-of-apache-flink_601867."},{"key":"e_1_2_1_44_1","volume-title":"Processing Units in AWS","author":"Kinesis","year":"2024","unstructured":"Kinesis Processing Units in AWS 2024. Managed Service for Apache Flink application resources in AWS. https:\/\/docs.aws.amazon.com\/managed-flink\/latest\/java\/how-resources.html."},{"key":"e_1_2_1_45_1","volume-title":"https:\/\/risingwave.com\/ Last access","author":"Labs RisingWave","year":"2025","unstructured":"RisingWave Labs. 2025. RisingWave. https:\/\/risingwave.com\/ Last access: March 2025."},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.14778\/3352063.3352141"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.14778\/1453856.1453890"},{"key":"e_1_2_1_48_1","volume-title":"More Than Capacity: Performance-oriented Evolution of Pangu in Alibaba. In 21st USENIX Conference on File and Storage Technologies (FAST 23)","author":"Li Qiang","year":"2023","unstructured":"Qiang Li, Qiao Xiang, Yuxin Wang, et al. 2023. More Than Capacity: Performance-oriented Evolution of Pangu in Alibaba. In 21st USENIX Conference on File and Storage Technologies (FAST 23). USENIX Association, Santa Clara, CA, 331\u2013346. https:\/\/www.usenix.org\/conference\/fast23\/presentation\/li-qiang-deployed"},{"key":"e_1_2_1_49_1","volume-title":"S-store: Streaming meets transaction processing. arXiv preprint arXiv:1503.01143","author":"Meehan John","year":"2015","unstructured":"John Meehan, Nesime Tatbul, Stan Zdonik, Cansu Aslantas, Ugur Cetintemel, Jiang Du, Tim Kraska, Samuel Madden, David Maier, Andrew Pavlo, et al. 2015. S-store: Streaming meets transaction processing. arXiv preprint arXiv:1503.01143 (2015)."},{"key":"e_1_2_1_50_1","volume-title":"ZippyDB: Facebook's key value store. https:\/\/engineering.fb.com\/2021\/08\/06\/core-infra\/zippydb\/ Last access","year":"2025","unstructured":"Meta. 2021. ZippyDB: Facebook's key value store. https:\/\/engineering.fb.com\/2021\/08\/06\/core-infra\/zippydb\/ Last access: July 2025."},{"key":"e_1_2_1_51_1","volume-title":"2017 USENIX Annual Technical Conference (USENIX ATC 17)","author":"Miao Hongyu","year":"2017","unstructured":"Hongyu Miao, Heejin Park, Myeongjae Jeon, Gennady Pekhimenko, Kathryn S. McKinley, and Felix Xiaozhu Lin. 2017. StreamBox: Modern Stream Processing on a Multicore Machine. In 2017 USENIX Annual Technical Conference (USENIX ATC 17). USENIX Association, Santa Clara, CA, 617\u2013629. https:\/\/www.usenix.org\/conference\/atc17\/technical-sessions\/presentation\/miao"},{"key":"e_1_2_1_52_1","volume-title":"Azure Stream Analytics. https:\/\/azure.microsoft.com\/en-us\/services\/stream-analytics\/ Last access","year":"2025","unstructured":"Microsoft. 2025. Azure Stream Analytics. https:\/\/azure.microsoft.com\/en-us\/services\/stream-analytics\/ Last access: March 2025."},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/2517349.2522738"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDMW.2010.172"},{"key":"e_1_2_1_55_1","unstructured":"Nexmark Queries [n.d.]. NEXMark benchmark. http:\/\/datalab.cs.pdx.edu\/niagaraST\/NEXMark."},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.14778\/3137765.3137770"},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1145\/1055558.1055596"},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1145\/130283.130333"},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1145\/2588555.2595641"},{"key":"e_1_2_1_60_1","unstructured":"Pete Tucker Kristin Tufte Vassilis Papadimos and David Maier. 2002. NEXMark\u2014A Benchmark for Queries over Data Streams. Technical Report. OGI School of Science & Engineering at OHSU."},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1145\/3035918.3056101"},{"key":"e_1_2_1_62_1","volume-title":"17th USENIX Symposium on Networked Systems Design and Implementation (NSDI 20)","author":"Vuppalapati Midhul","year":"2020","unstructured":"Midhul Vuppalapati, Justin Miron, Rachit Agarwal, Dan Truong, Ashish Motivala, and Thierry Cruanes. 2020. Building an elastic query engine on disaggregated storage. In 17th USENIX Symposium on Networked Systems Design and Implementation (NSDI 20). 449\u2013462."},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1145\/3448016.3457556"},{"key":"e_1_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.1145\/3555041.3589403"},{"key":"e_1_2_1_65_1","volume-title":"Move Fast and Meet Deadlines: Fine-grained Real-time Stream Processing with Cameo. In 18th USENIX Symposium on Networked Systems Design and Implementation (NSDI 21)","author":"Xu Le","year":"2021","unstructured":"Le Xu, Shivaram Venkataraman, Indranil Gupta, Luo Mai, and Rahul Potharaju. 2021. Move Fast and Meet Deadlines: Fine-grained Real-time Stream Processing with Cameo. In 18th USENIX Symposium on Networked Systems Design and Implementation (NSDI 21). 389\u2013405."},{"key":"e_1_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.14778\/3352063.3352124"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3750601.3750609","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,16]],"date-time":"2025-09-16T13:39:10Z","timestamp":1758029950000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3750601.3750609"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,8]]},"references-count":66,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2025,8]]}},"alternative-id":["10.14778\/3750601.3750609"],"URL":"https:\/\/doi.org\/10.14778\/3750601.3750609","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2025,8]]},"assertion":[{"value":"2025-09-16","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}