{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,11]],"date-time":"2026-06-11T14:05:56Z","timestamp":1781186756892,"version":"3.54.1"},"reference-count":33,"publisher":"Association for Computing Machinery (ACM)","issue":"12","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2015,8]]},"abstract":"<jats:p>Unbounded, unordered, global-scale datasets are increasingly common in day-to-day business (e.g. Web logs, mobile usage statistics, and sensor networks). At the same time, consumers of these datasets have evolved sophisticated requirements, such as event-time ordering and windowing by features of the data themselves, in addition to an insatiable hunger for faster answers. Meanwhile, practicality dictates that one can never fully optimize along all dimensions of correctness, latency, and cost for these types of input. As a result, data processing practitioners are left with the quandary of how to reconcile the tensions between these seemingly competing propositions, often resulting in disparate implementations and systems.<\/jats:p>\n          <jats:p>We propose that a fundamental shift of approach is necessary to deal with these evolved requirements in modern data processing. 
We as a field must stop trying to groom unbounded datasets into finite pools of information that eventually become complete, and instead live and breathe under the assumption that we will never know if or when we have seen all of our data, only that new data will arrive, old data may be retracted, and the only way to make this problem tractable is via principled abstractions that allow the practitioner the choice of appropriate tradeoffs along the axes of interest: correctness, latency, and cost.<\/jats:p>\n          <jats:p>In this paper, we present one such approach, the Dataflow Model, along with a detailed examination of the semantics it enables, an overview of the core principles that guided its design, and a validation of the model itself via the real-world experiences that led to its development.<\/jats:p>","DOI":"10.14778\/2824032.2824076","type":"journal-article","created":{"date-parts":[[2015,9,16]],"date-time":"2015-09-16T12:18:17Z","timestamp":1442405897000},"page":"1792-1803","source":"Crossref","is-referenced-by-count":489,"title":["The dataflow model"],"prefix":"10.14778","volume":"8","author":[{"given":"Tyler","family":"Akidau","sequence":"first","affiliation":[{"name":"Google"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Robert","family":"Bradshaw","sequence":"additional","affiliation":[{"name":"Google"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Craig","family":"Chambers","sequence":"additional","affiliation":[{"name":"Google"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Slava","family":"Chernyak","sequence":"additional","affiliation":[{"name":"Google"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Rafael 
J.","family":"Fern\u00e1ndez-Moctezuma","sequence":"additional","affiliation":[{"name":"Google"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Reuven","family":"Lax","sequence":"additional","affiliation":[{"name":"Google"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Sam","family":"McVeety","sequence":"additional","affiliation":[{"name":"Google"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Daniel","family":"Mills","sequence":"additional","affiliation":[{"name":"Google"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Frances","family":"Perry","sequence":"additional","affiliation":[{"name":"Google"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Eric","family":"Schmidt","sequence":"additional","affiliation":[{"name":"Google"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Sam","family":"Whittle","sequence":"additional","affiliation":[{"name":"Google"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2015,8]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-003-0095-z"},{"key":"e_1_2_1_2_1","volume-title":"Proc. of the 39th Int. Conf. on Very Large Data Bases (VLDB)","author":"Akidau T.","year":"2013"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-014-0357-y"},{"key":"e_1_2_1_4_1","unstructured":"Apache. Apache Hadoop. http:\/\/hadoop.apache.org 2012."},{"key":"e_1_2_1_5_1","unstructured":"Apache. Apache Storm. http:\/\/storm.apache.org 2013."},{"key":"e_1_2_1_6_1","unstructured":"Apache. Apache Flink. http:\/\/flink.apache.org\/ 2014."},{"key":"e_1_2_1_7_1","unstructured":"Apache. Apache Samza. 
http:\/\/samza.apache.org 2014."},{"key":"e_1_2_1_8_1","first-page":"363","volume-title":"Proc. of the Third Biennial Conf. on Innovative Data Systems Research (CIDR)","author":"Barga R. S.","year":"2007"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.14778\/1920841.1920874"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.14778\/2733004.2733016"},{"key":"e_1_2_1_11_1","unstructured":"Cask. Tigon. http:\/\/tigon.io\/ 2015."},{"key":"e_1_2_1_12_1","first-page":"363","volume-title":"Efficient Data-Parallel Pipelines. In Proc. of the 2010 ACM SIGPLAN Conf. on Programming Language Design and Implementation (PLDI)","author":"Chambers C.","year":"2010"},{"key":"e_1_2_1_13_1","volume-title":"Trill: A High-Performance Incremental Query Processor for Diverse Analytics. In Proc. of the 41st Int. Conf. on Very Large Data Bases (VLDB)","author":"Chandramouli B.","year":"2015"},{"key":"e_1_2_1_14_1","doi-asserted-by":"crossref","first-page":"668","DOI":"10.1145\/872757.872857","volume-title":"TelegraphCQ: Continuous Dataflow Processing. In Proc. of the 2003 ACM SIGMOD Int. Conf. on Management of Data (SIGMOD), SIGMOD '03","author":"Chandrasekaran S.","year":"2003"},{"key":"e_1_2_1_15_1","first-page":"379","volume-title":"NiagaraCQ: A Scalable Continuous Query System for Internet Databases. In Proc. of the 2000 ACM SIGMOD Int. Conf. on Management of Data (SIGMOD)","author":"Chen J.","year":"2000"},{"key":"e_1_2_1_16_1","volume-title":"Proc. of the Sixth Symposium on Operating System Design and Implementation (OSDI)","author":"Dean J.","year":"2004"},{"key":"e_1_2_1_17_1","unstructured":"EsperTech. Esper. http:\/\/www.espertech.com\/esper\/ 2006."},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.14778\/1687553.1687568"},{"key":"e_1_2_1_19_1","unstructured":"Google. Dataflow SDK. 
https:\/\/github.com\/GoogleCloudPlatform\/DataflowJavaSDK 2015."},{"key":"e_1_2_1_20_1","unstructured":"Google. Google Cloud Dataflow. https:\/\/cloud.google.com\/dataflow\/ 2015."},{"key":"e_1_2_1_21_1","first-page":"1079","volume-title":"Proc. of the 31st Int. Conf. on Very Large Data Bases (VLDB)","author":"Johnson T.","year":"2005"},{"key":"e_1_2_1_22_1","first-page":"311","volume-title":"Proceedings of the ACM SIGMOD Int. Conf. on Management of Data (SIGMOD)","author":"Li J.","year":"2005"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.14778\/1453856.1453890"},{"key":"e_1_2_1_24_1","first-page":"37","volume-title":"Proc. of the 10th Int. Conf. on Database Theory (ICDT)","author":"Maier D.","year":"2005"},{"key":"e_1_2_1_25_1","unstructured":"N. Marz. How to beat the CAP theorem. http:\/\/nathanmarz.com\/blog\/how-to-beat-the-cap-theorem.html 2011."},{"key":"e_1_2_1_26_1","unstructured":"S. Murthy et al. Pulsar -- Real-Time Analytics at Scale. Technical report eBay 2015."},{"key":"e_1_2_1_27_1","unstructured":"SQLStream. http:\/\/sqlstream.com\/ 2015."},{"key":"e_1_2_1_28_1","first-page":"263","volume-title":"Proc. of the 23rd ACM SIGMOD-SIGACT-SIGART Symp. on Princ. of Database Systems","author":"Srivastava U.","year":"2004"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.14778\/1687553.1687609"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2003.1198390"},{"key":"e_1_2_1_31_1","volume-title":"Proc. of the ACM SIGSPATIAL Int. 
Workshop on GeoStreaming (IWGS), 2010","author":"Whiteneck J.","year":"2010"},{"key":"e_1_2_1_33_1","first-page":"15","volume-title":"Proc. of the 9th USENIX Conf. on Networked Systems Design and Implementation (NSDI)","author":"Zaharia M.","year":"2012"},{"key":"e_1_2_1_34_1","volume-title":"Proc. of the 24th ACM Symp. on Operating Systems Principles","author":"Zaharia M.","year":"2013"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/2824032.2824076","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T10:10:01Z","timestamp":1672222201000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/2824032.2824076"}},"subtitle":["a practical approach to balancing correctness, latency, and cost in massive-scale, unbounded, out-of-order data processing"],"short-title":[],"issued":{"date-parts":[[2015,8]]},"references-count":33,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2015,8]]}},"alternative-id":["10.14778\/2824032.2824076"],"URL":"https:\/\/doi.org\/10.14778\/2824032.2824076","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2015,8]]}}}