{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,28]],"date-time":"2025-09-28T20:38:23Z","timestamp":1759091903427},"reference-count":6,"publisher":"Association for Computing Machinery (ACM)","issue":"12","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2018,8]]},"abstract":"<jats:p>\n            Distributed dataflow systems (DDS) are widely employed in graph processing and machine learning (ML), where many of these algorithms are iterative in nature. Typically, DDS achieve fault-tolerance using checkpointing mechanisms or they exploit algorithmic properties to enable fault-tolerance without the need for checkpoints. Recently, for graph processing, we proposed utilizing\n            <jats:italic>unblocking checkpointing<\/jats:italic>\n            , to parallelize the execution pipeline and checkpoint writing, as well as\n            <jats:italic>confined recovery<\/jats:italic>\n            , to enable fast recovery upon partial node failures. Furthermore, for ML algorithms implemented using broadcast variables, we proposed utilizing\n            <jats:italic>replica recovery<\/jats:italic>\n            , to leverage broadcast variable replicas and facilitate failure recovery checkpointing-free. In this demonstration, we showcase these fault-tolerance techniques using Apache Flink. Attendees will be able to: (i) run representative iterative algorithms including PageRank, Connected Components, and K-Means, (ii) explore the internal behavior of DDS under the influence of unblocking checkpointing, and (iii) trigger failures, to observe the effects of confined recovery and replica recovery.\n          <\/jats:p>","DOI":"10.14778\/3229863.3236242","type":"journal-article","created":{"date-parts":[[2018,9,10]],"date-time":"2018-09-10T12:12:28Z","timestamp":1536581548000},"page":"1990-1993","source":"Crossref","is-referenced-by-count":1,"title":["Fault-tolerance for distributed iterative dataflows in action"],"prefix":"10.14778","volume":"11","author":[{"given":"Chen","family":"Xu","sequence":"first","affiliation":[{"name":"East China Normal University, Shanghai, China and TU Berlin"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Rudi Poepsel","family":"Lemaitre","sequence":"additional","affiliation":[{"name":"Technische Universit\u00e4t Berlin, Berlin, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Juan","family":"Soto","sequence":"additional","affiliation":[{"name":"Technische Universit\u00e4t Berlin, Berlin, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Volker","family":"Markl","sequence":"additional","affiliation":[{"name":"Technische Universit\u00e4t Berlin, Berlin, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2018,8]]},"reference":[{"key":"e_1_2_1_1_1","first-page":"9","volume-title":"TPCTC","author":"Boden C.","year":"2017","unstructured":"C. Boden : A framework for benchmarking distributed systems and algorithms . In TPCTC , pages 9 -- 24 , 2017 . C. Boden et al. PEEL: A framework for benchmarking distributed systems and algorithms. In TPCTC, pages 9--24, 2017."},{"issue":"4","key":"e_1_2_1_2_1","first-page":"28","article-title":"Apache flink\u2122: Stream and batch processing in a single engine","volume":"38","author":"Carbone P.","year":"2015","unstructured":"P. Carbone Apache flink\u2122: Stream and batch processing in a single engine . IEEE Data Eng. Bull. , 38 ( 4 ): 28 -- 38 , 2015 . P. Carbone et al. Apache flink\u2122: Stream and batch processing in a single engine. IEEE Data Eng. Bull., 38(4):28--38, 2015.","journal-title":"IEEE Data Eng. Bull."},{"key":"e_1_2_1_3_1","first-page":"137","volume-title":"OSDI","author":"Dean J.","year":"2004","unstructured":"J. Dean : Simplified data processing on large clusters . In OSDI , pages 137 -- 150 , 2004 . J. Dean et al. Mapreduce: Simplified data processing on large clusters. In OSDI, pages 137--150, 2004."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2016.7498275"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2017.2690431"},{"key":"e_1_2_1_6_1","first-page":"1","volume-title":"HotCloud","author":"Zaharia M.","year":"2010","unstructured":"M. Zaharia : Cluster computing with working sets . In HotCloud , pages 10: 1 -- 6 , 2010 . M. Zaharia et al. Spark: Cluster computing with working sets. In HotCloud, pages 10:1--6, 2010."}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3229863.3236242","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T10:06:48Z","timestamp":1672222008000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3229863.3236242"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,8]]},"references-count":6,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2018,8]]}},"alternative-id":["10.14778\/3229863.3236242"],"URL":"https:\/\/doi.org\/10.14778\/3229863.3236242","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2018,8]]}}}