{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,1]],"date-time":"2026-04-01T08:34:51Z","timestamp":1775032491210,"version":"3.50.1"},"reference-count":63,"publisher":"Association for Computing Machinery (ACM)","issue":"11","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2018,7]]},"abstract":"<jats:p>\n            Solving business problems increasingly requires going beyond the limits of a single data processing platform (platform for short), such as Hadoop or a DBMS. As a result, organizations typically perform tedious and costly tasks to juggle their code and data across different platforms. Addressing this pain and achieving automatic cross-platform data processing is quite challenging: finding the most efficient platform for a given task requires quite good expertise for all the available platforms. We present R\n            <jats:sc>heem<\/jats:sc>\n            , a general-purpose cross-platform data processing system that decouples applications from the underlying platforms. It not only determines the best platform to run an incoming task, but also splits the task into subtasks and assigns each subtask to a specific platform to minimize the overall cost (e.g., runtime or monetary cost). It features (i) an interface to easily compose data analytic tasks; (ii) a novel cost-based optimizer able to find the most efficient platform in almost all cases; and (iii) an executor to efficiently orchestrate tasks over different platforms. As a result, it allows users to focus on the business logic of their applications rather than on the mechanics of how to compose and execute them. 
Using different real-world applications with R\n            <jats:sc>heem<\/jats:sc>\n            , we demonstrate how cross-platform data processing can accelerate performance by more than one order of magnitude compared to single-platform data processing.\n          <\/jats:p>","DOI":"10.14778\/3236187.3236195","type":"journal-article","created":{"date-parts":[[2018,9,10]],"date-time":"2018-09-10T12:12:28Z","timestamp":1536581548000},"page":"1414-1427","source":"Crossref","is-referenced-by-count":42,"title":["RHEEM: enabling cross-platform data processing"],"prefix":"10.14778","volume":"11","author":[{"given":"Divy","family":"Agrawal","sequence":"first","affiliation":[{"name":"UCSB and QCRI"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sanjay","family":"Chawla","sequence":"additional","affiliation":[{"name":"Qatar Computing Research Institute (QCRI), HBKU"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Bertty","family":"Contreras-Rojas","sequence":"additional","affiliation":[{"name":"Qatar Computing Research Institute (QCRI), HBKU"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ahmed","family":"Elmagarmid","sequence":"additional","affiliation":[{"name":"Qatar Computing Research Institute (QCRI), HBKU"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yasser","family":"Idris","sequence":"additional","affiliation":[{"name":"Qatar Computing Research Institute (QCRI), HBKU"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zoi","family":"Kaoudi","sequence":"additional","affiliation":[{"name":"Qatar Computing Research Institute (QCRI), HBKU"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sebastian","family":"Kruse","sequence":"additional","affiliation":[{"name":"Hasso Plattner Institute (HPI) and QCRI"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ji","family":"Lucas","sequence":"additional","affiliation":[{"name":"Qatar Computing Research Institute 
(QCRI), HBKU"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Essam","family":"Mansour","sequence":"additional","affiliation":[{"name":"Qatar Computing Research Institute (QCRI), HBKU"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mourad","family":"Ouzzani","sequence":"additional","affiliation":[{"name":"Qatar Computing Research Institute (QCRI), HBKU"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Paolo","family":"Papotti","sequence":"additional","affiliation":[{"name":"Eurecom and QCRI"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jorge-Arnulfo","family":"Quian\u00e9-Ruiz","sequence":"additional","affiliation":[{"name":"Qatar Computing Research Institute (QCRI), HBKU"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Nan","family":"Tang","sequence":"additional","affiliation":[{"name":"Qatar Computing Research Institute (QCRI), HBKU"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Saravanan","family":"Thirumuruganathan","sequence":"additional","affiliation":[{"name":"Qatar Computing Research Institute (QCRI), HBKU"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Anis","family":"Troudi","sequence":"additional","affiliation":[{"name":"Qatar Computing Research Institute (QCRI), HBKU"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2018,7]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"Apache Beam. https:\/\/beam.apache.org.  Apache Beam. https:\/\/beam.apache.org."},{"key":"e_1_2_1_2_1","unstructured":"Apache Drill. https:\/\/drill.apache.org.  Apache Drill. https:\/\/drill.apache.org."},{"key":"e_1_2_1_3_1","unstructured":"Apache Flink. https:\/\/flink.apache.org.  Apache Flink. https:\/\/flink.apache.org."},{"key":"e_1_2_1_4_1","unstructured":"Apache Flume. https:\/\/flume.apache.org.  Apache Flume. https:\/\/flume.apache.org."},{"key":"e_1_2_1_5_1","unstructured":"Apache HBase. 
http:\/\/hbase.apache.org\/."},{"key":"e_1_2_1_6_1","unstructured":"Apache Hive: A data warehouse software for distributed storage. http:\/\/hive.apache.org."},{"key":"e_1_2_1_7_1","unstructured":"Apache Mahout. http:\/\/mahout.apache.org."},{"key":"e_1_2_1_8_1","unstructured":"Apache Spark: Lightning-Fast Cluster Computing. http:\/\/spark.incubator.apache.org\/."},{"key":"e_1_2_1_9_1","unstructured":"Fortune magazine. http:\/\/fortune.com\/2014\/06\/19\/big-data-airline-industry\/."},{"key":"e_1_2_1_10_1","unstructured":"Luigi Project. https:\/\/github.com\/spotify\/luigi."},{"key":"e_1_2_1_11_1","unstructured":"PostgreSQL. http:\/\/www.postgresql.org\/."},{"key":"e_1_2_1_12_1","unstructured":"PrestoDB Project. https:\/\/prestodb.io."},{"key":"e_1_2_1_13_1","unstructured":"Spark MLlib: http:\/\/spark.apache.org\/mllib."},{"key":"e_1_2_1_14_1","unstructured":"Spark SQL programming guide. http:\/\/spark.apache.org\/docs\/latest\/sql-programming-guide.html."},{"key":"e_1_2_1_15_1","first-page":"265","volume-title":"OSDI","author":"Abadi M.","year":"2016","unstructured":"M. Abadi , P. Barham , J. Chen , Z. Chen , A. Davis , J. Dean , M. Devin , S. Ghemawat , G. Irving , M. Isard , M. Kudlur , J. Levenberg , R. Monga , S. Moore , D. G. Murray , B. Steiner , P. A. Tucker , V. Vasudevan , P. Warden , M. Wicke , Y. Yu , and X. Zheng . 
TensorFlow: A System for Large-Scale Machine Learning . In OSDI , pages 265 -- 283 , 2016 . M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, M. Kudlur, J. Levenberg, R. Monga, S. Moore, D. G. Murray, B. Steiner, P. A. Tucker, V. Vasudevan, P. Warden, M. Wicke, Y. Yu, and X. Zheng. TensorFlow: A System for Large-Scale Machine Learning. In OSDI, pages 265--283, 2016."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/2882903.2899414"},{"key":"e_1_2_1_17_1","first-page":"479","volume-title":"EDBT","author":"Agrawal D.","year":"2016","unstructured":"D. Agrawal , S. Chawla , A. K. Elmagarmid , Z. Kaoudi , M. Ouzzani , P. Papotti , J. Quian\u00e9-Ruiz , N. Tang , and M. J. Zaki . Road to Freedom in Big Data Analytics . In EDBT , pages 479 -- 484 , 2016 . D. Agrawal, S. Chawla, A. K. Elmagarmid, Z. Kaoudi, M. Ouzzani, P. Papotti, J. Quian\u00e9-Ruiz, N. Tang, and M. J. Zaki. Road to Freedom in Big Data Analytics. In EDBT, pages 479--484, 2016."},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-014-0357-y"},{"key":"e_1_2_1_19_1","volume-title":"How to use big data technologies to optimize operations in upstream petroleum industry. In 21<sup>st<\/sup> World Petroleum Congress","author":"Baaziz A.","year":"2014","unstructured":"A. Baaziz and L. Quoniam . How to use big data technologies to optimize operations in upstream petroleum industry. In 21<sup>st<\/sup> World Petroleum Congress , 2014 . A. Baaziz and L. Quoniam. How to use big data technologies to optimize operations in upstream petroleum industry. 
In 21<sup>st<\/sup> World Petroleum Congress, 2014."},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.14778\/3007263.3007279"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/2.223544"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.5555\/548301.827880"},{"key":"e_1_2_1_23_1","first-page":"7","volume-title":"IPSJ","author":"Chawathe S. S.","year":"1994","unstructured":"S. S. Chawathe , H. Garcia-Molina , J. Hammer , K. Ireland , Y. Papakonstantinou , J. D. Ullman , and J. Widom . The TSIMMIS Project: Integration of Heterogeneous Information Sources . In IPSJ , pages 7 -- 18 , 1994 . S. S. Chawathe, H. Garcia-Molina, J. Hammer, K. Ireland, Y. Papakonstantinou, J. D. Ullman, and J. Widom. The TSIMMIS Project: Integration of Heterogeneous Information Sources. In IPSJ, pages 7--18, 1994."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/MS.2016.36"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/2463676.2465327"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/1327452.1327492"},{"key":"e_1_2_1_27_1","volume-title":"CIDR","author":"Deng D.","year":"2017","unstructured":"D. Deng , R. C. Fernandez , Z. Abedjan , S. Wang , M. Stonebraker , A. K. Elmagarmid , I. F. Ilyas , S. Madden , M. Ouzzani , and N. Tang . The Data Civilizer System . In CIDR , 2017 . D. Deng, R. C. Fernandez, Z. Abedjan, S. Wang, M. Stonebraker, A. K. Elmagarmid, I. F. Ilyas, S. Madden, M. Ouzzani, and N. Tang. The Data Civilizer System. 
In CIDR, 2017."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/2463676.2463709"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/BigData.2016.7840605"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.14778\/2824032.2824098"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/1366102.1366103"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/3035918.3058740"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/2684822.2685326"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/2741948.2741968"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/2987550.2987567"},{"key":"e_1_2_1_36_1","unstructured":"A. Hems A. Soofi and E. Perez. How innovative oil and gas companies are using big data to outmaneuver the competition. Microsoft White Paper http:\/\/goo.gl\/2Bn0xq 2014.  A. Hems A. Soofi and E. Perez. How innovative oil and gas companies are using big data to outmaneuver the competition. Microsoft White Paper http:\/\/goo.gl\/2Bn0xq 2014."},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.14778\/2350229.2350244"},{"key":"e_1_2_1_38_1","unstructured":"IBM. Data-driven healthcare organizations use big data analytics for big gains. White paper http:\/\/goo.gl\/AFIHpk.  IBM. Data-driven healthcare organizations use big data analytics for big gains. White paper http:\/\/goo.gl\/AFIHpk."},{"key":"e_1_2_1_39_1","volume-title":"ICDE (tutorial)","author":"Kaoudi Z.","year":"2018","unstructured":"Z. Kaoudi and J.-A. Quian\u00e9-Ruiz . Cross-Platform Data Processing: Use Cases and Challenges . In ICDE (tutorial) , 2018 . Z. Kaoudi and J.-A. Quian\u00e9-Ruiz. Cross-Platform Data Processing: Use Cases and Challenges. 
In ICDE (tutorial), 2018."},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/3035918.3064042"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2747646"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.14778\/2831360.2831362"},{"key":"e_1_2_1_43_1","volume-title":"RHEEMix in the Data Jungle - A Cross-Platform Query Optimizer. arXiv","author":"Kruse S.","year":"2018","unstructured":"S. Kruse, Z. Kaoudi, J.-A. Quian\u00e9-Ruiz, S. Chawla, F. Naumann, and B. Contreras-Rojas. RHEEMix in the Data Jungle - A Cross-Platform Query Optimizer. arXiv:1805.03533 https:\/\/arxiv.org\/abs\/1805.03533, 2018."},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/2588555.2588568"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.14778\/2850583.2850594"},{"key":"e_1_2_1_46_1","volume-title":"CIDR","author":"Lim H.","year":"2013","unstructured":"H. Lim, Y. Han, and S. Babu. How to Fit when No One Size Fits. In CIDR, 2013."},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2018.00179"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/1007568.1007642"},{"key":"e_1_2_1_49_1","volume-title":"Manning","author":"Marz N.","year":"2015","unstructured":"N. Marz and J. Warren. Big Data: Principles and best practices of scalable realtime data systems. 
Manning, 2015."},{"key":"e_1_2_1_50_1","doi-asserted-by":"crossref","DOI":"10.7551\/mitpress\/3927.001.0001","volume-title":"An introduction to genetic algorithms","author":"Mitchell M.","year":"1998","unstructured":"M. Mitchell. An introduction to genetic algorithms. MIT Press, 1998."},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1145\/1376616.1376726"},{"key":"e_1_2_1_52_1","volume-title":"CIDR","author":"Palkar S.","year":"2017","unstructured":"S. Palkar, J. J. Thomas, A. Shanbhag, M. Schwarzkopf, S. P. Amarasinghe, and M. Zaharia. Weld: A Common Runtime for High Performance Data Analysis. In CIDR, 2017."},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/1559845.1559865"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.is.2015.04.002"},{"key":"e_1_2_1_55_1","volume-title":"NoSQL distilled: A brief guide to the emerging world of polyglot persistence","author":"Sadalage P. J.","year":"2012","unstructured":"P. J. Sadalage and M. Fowler. NoSQL distilled: A brief guide to the emerging world of polyglot persistence. Addison-Wesley Professional, 2012."},{"key":"e_1_2_1_56_1","unstructured":"S. Shankar, A. Choi, and J.-P. Dijcks. Integrating Hadoop Data with Oracle Parallel Processing. 
Oracle White Paper http:\/\/www.oracle.com\/technetwork\/database\/bi-datawarehousing\/twp-integrating-hadoop-data-with-or-130063.pdf 2010."},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1145\/96602.96604"},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1145\/2213836.2213963"},{"key":"e_1_2_1_59_1","volume-title":"CASCON","author":"Statchuk C.","year":"2015","unstructured":"C. Statchuk , N. Madhavji , A. Miranskyy , and F. Dehne . Taming a tiger: Software engineering in the era of big data & continuous development . In CASCON , 2015 . C. Statchuk, N. Madhavji, A. Miranskyy, and F. Dehne. Taming a tiger: Software engineering in the era of big data & continuous development. In CASCON, 2015."},{"key":"e_1_2_1_60_1","volume-title":"http:\/\/wp.sigmod.org\/?p=1629","author":"Stonebraker M.","year":"2015","unstructured":"M. Stonebraker . The Case for Polystores . http:\/\/wp.sigmod.org\/?p=1629 , 2015 . M. Stonebraker. The Case for Polystores. http:\/\/wp.sigmod.org\/?p=1629, 2015."},{"key":"e_1_2_1_61_1","first-page":"406","volume-title":"Euro-Par","author":"Tsoumakos D.","year":"2013","unstructured":"D. Tsoumakos and C. Mantas . The Case for Multi-Engine Data Analytics . In Euro-Par , pages 406 -- 415 , 2013 . D. Tsoumakos and C. Mantas. The Case for Multi-Engine Data Analytics. In Euro-Par, pages 406--415, 2013."},{"key":"e_1_2_1_62_1","volume-title":"CIDR","author":"Wang J.","year":"2017","unstructured":"J. Wang , T. Baker , M. Balazinska , D. Halperin , B. Haynes , B. Howe , D. Hutchison , S. Jain , R. Maas , P. Mehta , D. Moritz , B. Myers , J. Ortiz , D. Suciu , A. Whitaker , and S. Xu . The Myria Big Data Management and Analytics System and Cloud Services . In CIDR , 2017 . J. Wang, T. Baker, M. Balazinska, D. Halperin, B. Haynes, B. Howe, D. Hutchison, S. Jain, R. Maas, P. Mehta, D. Moritz, B. Myers, J. Ortiz, D. Suciu, A. Whitaker, and S. Xu. The Myria Big Data Management and Analytics System and Cloud Services. 
In CIDR, 2017."},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.14778\/2876473.2876477"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3236187.3236195","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T09:45:56Z","timestamp":1672220756000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3236187.3236195"}},"subtitle":["may the big data be with you!"],"short-title":[],"issued":{"date-parts":[[2018,7]]},"references-count":63,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2018,7]]}},"alternative-id":["10.14778\/3236187.3236195"],"URL":"https:\/\/doi.org\/10.14778\/3236187.3236195","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2018,7]]}}}