{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,12]],"date-time":"2026-03-12T14:02:50Z","timestamp":1773324170010,"version":"3.50.1"},"reference-count":19,"publisher":"Association for Computing Machinery (ACM)","issue":"13","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2014,8]]},"abstract":"<jats:p>\n            With the addition of lambda expressions and the Stream API in Java 8, Java has gained a powerful and expressive query language that operates over in-memory collections of Java objects, making the transformation and analysis of data more convenient, scalable and efficient. In this paper, we build on Java 8 Stream and add a\n            <jats:italic>DistributableStream<\/jats:italic>\n            abstraction that supports federated query execution over an extensible set of distributed compute engines. Each query eventually results in the creation of a materialized result that is returned either as a local object or as an engine defined distributed Java Collection that can be saved and\/or used as a source for future queries. Distinctively, DistributableStream supports the changing of compute engines both between and within a query, allowing different parts of a computation to be executed on different platforms. At execution time, the query is organized as a sequence of pipelined stages, each stage potentially running on a different engine. Each node that is part of a stage executes its portion of the computation on the data available locally or produced by the previous stage of the computation. This approach allows for computations to be assigned to engines based on pricing, data locality, and resource availability. 
Coupled with the inherent laziness of stream operations, this brings great flexibility to query planning and separates the semantics of the query from the details of the engine used to execute it. We currently support three engines, Local, Apache Hadoop MapReduce and Oracle Coherence, and we illustrate how new engines and data sources can be added.\n          <\/jats:p>","DOI":"10.14778\/2733004.2733007","type":"journal-article","created":{"date-parts":[[2015,5,12]],"date-time":"2015-05-12T15:37:52Z","timestamp":1431445072000},"page":"1343-1354","source":"Crossref","is-referenced-by-count":11,"title":["Changing engines in midstream"],"prefix":"10.14778","volume":"7","author":[{"given":"Xueyuan","family":"Su","sequence":"first","affiliation":[{"name":"Oracle Corporation"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Garret","family":"Swart","sequence":"additional","affiliation":[{"name":"Oracle Corporation"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Brian","family":"Goetz","sequence":"additional","affiliation":[{"name":"Oracle Corporation"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Brian","family":"Oliver","sequence":"additional","affiliation":[{"name":"Oracle Corporation"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Paul","family":"Sandoz","sequence":"additional","affiliation":[{"name":"Oracle Corporation"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2014,8]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"Apache Hadoop. http:\/\/hadoop.apache.org\/."},{"key":"e_1_2_1_2_1","unstructured":"Apache Hadoop YARN. http:\/\/hadoop.apache.org\/docs\/current\/hadoop-yarn\/hadoop-yarn-site\/YARN.html."},{"key":"e_1_2_1_3_1","unstructured":"Apache Hive. http:\/\/hive.apache.org\/."},
{"key":"e_1_2_1_4_1","unstructured":"Apache Pig. http:\/\/pig.apache.org\/."},{"key":"e_1_2_1_5_1","unstructured":"Apache Spark. http:\/\/spark.apache.org\/."},{"key":"e_1_2_1_6_1","unstructured":"Apache Storm. http:\/\/storm.incubator.apache.org\/."},{"key":"e_1_2_1_7_1","unstructured":"Apache Tez. http:\/\/tez.incubator.apache.org\/."},{"key":"e_1_2_1_8_1","unstructured":"Apache ZooKeeper. http:\/\/zookeeper.apache.org\/."},{"key":"e_1_2_1_9_1","unstructured":"Cascading. http:\/\/www.cascading.org\/."},{"key":"e_1_2_1_10_1","first-page":"137","volume-title":"Proceedings of OSDI","author":"Dean J.","year":"2004","unstructured":"J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. In Proceedings of OSDI, pages 137--150, 2004."},{"key":"e_1_2_1_11_1","unstructured":"The Dryad project. http:\/\/research.microsoft.com\/en-us\/projects\/Dryad\/."},{"key":"e_1_2_1_12_1","unstructured":"JDK 8 project. https:\/\/jdk8.java.net\/."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2011.68"},{"key":"e_1_2_1_14_1","unstructured":"LINQ: language integrated query. http:\/\/msdn.microsoft.com\/en-us\/library\/bb397926.aspx\/."},{"key":"e_1_2_1_15_1","unstructured":"Oracle Big Data Appliance. http:\/\/www.oracle.com\/us\/products\/database\/big-data-appliance\/."},
{"key":"e_1_2_1_16_1","unstructured":"Oracle Coherence. http:\/\/www.oracle.com\/technetwork\/middleware\/coherence\/."},{"key":"e_1_2_1_17_1","unstructured":"Shark. http:\/\/shark.cs.berkeley.edu\/."},{"key":"e_1_2_1_18_1","unstructured":"Spark streaming. http:\/\/spark.apache.org\/streaming\/."},{"key":"e_1_2_1_19_1","first-page":"15","volume-title":"Proceedings of NSDI","author":"Zaharia M.","year":"2012","unstructured":"M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M. J. Franklin, S. Shenker, and I. Stoica. Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In Proceedings of NSDI, pages 15--28, 2012."}],
"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/2733004.2733007","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T09:39:01Z","timestamp":1672220341000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/2733004.2733007"}},"subtitle":["a Java stream computational model for big data processing"],"short-title":[],"issued":{"date-parts":[[2014,8]]},"references-count":19,"journal-issue":{"issue":"13","published-print":{"date-parts":[[2014,8]]}},"alternative-id":["10.14778\/2733004.2733007"],"URL":"https:\/\/doi.org\/10.14778\/2733004.2733007","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2014,8]]}}}