{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,22]],"date-time":"2026-04-22T20:57:19Z","timestamp":1776891439678,"version":"3.51.2"},"reference-count":54,"publisher":"Centre pour la Communication Scientifique Directe (CCSD)","license":[{"start":{"date-parts":[[2017,12,11]],"date-time":"2017-12-11T00:00:00Z","timestamp":1512950400000},"content-version":"unspecified","delay-in-days":344,"URL":"https:\/\/www.cambridge.org\/core\/terms"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["J. Funct. Prog."],"published-print":{"date-parts":[[2017]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>We present an algebra for data-intensive scalable computing based on monoid homomorphisms that consists of a small set of operations that capture most features supported by current domain-specific languages for data-centric distributed computing. This algebra is being used as the formal basis of MRQL, which is a query processing and optimization system for large-scale distributed data analysis. The MRQL semantics is given in terms of monoid comprehensions, which support group-by and order-by syntax and can work on heterogeneous collections without requiring any extension to the monoid algebra. We present the syntax and semantics of monoid comprehensions and provide rules to translate them to the monoid algebra. We give evidence of the effectiveness of our algebra by presenting some important optimization rules, such as converting nested queries to joins.<\/jats:p>","DOI":"10.1017\/s0956796817000193","type":"journal-article","created":{"date-parts":[[2017,12,11]],"date-time":"2017-12-11T02:19:39Z","timestamp":1512958779000},"source":"Crossref","is-referenced-by-count":13,"title":["An algebra for distributed Big Data analytics"],"prefix":"10.46298","volume":"27","author":[{"given":"LEONIDAS","family":"FEGARAS","sequence":"first","affiliation":[]}],"member":"25203","published-online":{"date-parts":[[2017,12,11]]},"reference":[{"key":"S0956796817000193_ref54","doi-asserted-by":"crossref","unstructured":"Zaharia M. , Das T. , Li H. , Hunter T. , Shenker S. & Stoica I. (2013) Discretized streams: Fault-tolerant streaming computation at scale. In Symposium on Operating Systems Principles (SOSP).","DOI":"10.1145\/2517349.2522737"},{"key":"S0956796817000193_ref53","unstructured":"Zaharia M. , Chowdhury M. , Das T. , Dave A. , Ma J. , McCauley M. , Franklin M. J. , Shenker S. & Stoica I. (2012) Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In USENIX Symposium on Networked Systems Design and Implementation (NSDI)."},{"key":"S0956796817000193_ref48","doi-asserted-by":"publisher","DOI":"10.1145\/79173.79181"},{"key":"S0956796817000193_ref47","unstructured":"Trinder P. W. (1991) Comprehensions, a query notation for DBPLs. In International Workshop on Database Programming Languages (DBPL). pp. 55\u201368."},{"key":"S0956796817000193_ref46","unstructured":"Trinder P. & Wadler P. (1989) Improving list comprehension database queries. In TENCON. pp. 186\u2013192."},{"key":"S0956796817000193_ref44","doi-asserted-by":"publisher","DOI":"10.14778\/1687553.1687609"},{"key":"S0956796817000193_ref42","unstructured":"Steele G. L. Jr. (2009) Organizing functional code for parallel execution or, foldl and foldr considered slightly harmful. In ICFP. pp. 1\u20132."},{"key":"S0956796817000193_ref41","doi-asserted-by":"publisher","DOI":"10.14778\/2367502.2367513"},{"key":"S0956796817000193_ref40","unstructured":"Power R. & Li J. (2010) Piccolo: Building fast, distributed programs with partitioned tables. In Symposium on Operating System Design and Implementation (OSDI)."},{"key":"S0956796817000193_ref39","doi-asserted-by":"crossref","unstructured":"Olston C. , Reed B. , Srivastava U. , Kumar R. & Tomkins A. (2008) Pig Latin: A not-so-foreign language for data processing. In ACM SIGMOD International Conference on Management of Data. pp. 1099\u20131110.","DOI":"10.1145\/1376616.1376726"},{"key":"S0956796817000193_ref36","doi-asserted-by":"crossref","DOI":"10.2200\/S00274ED1V01Y201006HLT007","volume-title":"Data-Intensive Text Processing with MapReduce","author":"Lin","year":"2010"},{"key":"S0956796817000193_ref34","doi-asserted-by":"crossref","unstructured":"Holsch J. , Grossniklaus M. & Scholl M. H. (2016) Optimization of nested queries using the NF2 algebra. In ACM SIGMOD International Conference on Management of Data. pp. 1765\u20131780.","DOI":"10.1145\/2882903.2915241"},{"key":"S0956796817000193_ref33","doi-asserted-by":"publisher","DOI":"10.1023\/A:1008705026446"},{"key":"S0956796817000193_ref32","doi-asserted-by":"crossref","unstructured":"Giorgidze G. , Grust T. , Schweinsberg N. & Weijers J. (2011) Bringing back monad comprehensions. In Haskell Symposium, pp. 13\u201322.","DOI":"10.1145\/2034675.2034678"},{"key":"S0956796817000193_ref31","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-30936-1_7"},{"key":"S0956796817000193_ref28","doi-asserted-by":"publisher","DOI":"10.1145\/377674.377676"},{"key":"S0956796817000193_ref27","doi-asserted-by":"crossref","unstructured":"Fegaras L. & Maier D. (1995) Towards an effective calculus for object query languages. In International Conference on Management of Data (SIGMOD). pp. 47\u201358.","DOI":"10.1145\/568271.223789"},{"key":"S0956796817000193_ref25","unstructured":"Fegaras L. , Li C. , Gupta U. & Philip J. J. (2011) XML query optimization in map-reduce. In International Workshop on the Web and Databases (WebDB)."},{"key":"S0956796817000193_ref24","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2016.2601103"},{"key":"S0956796817000193_ref22","unstructured":"Dean J. & Ghemawat S. (2004) MapReduce: Simplified data processing on large clusters. In Symposium on Operating System Design and Implementation (OSDI)."},{"key":"S0956796817000193_ref18","doi-asserted-by":"publisher","DOI":"10.1109\/MCSE.2011.73"},{"key":"S0956796817000193_ref14","doi-asserted-by":"crossref","unstructured":"Battre D. , Ewen S. , Hueske F. , Kao O. , Markl V. & Warneke D. (2010) Nephele\/PACTs: A programming model and execution framework for web-scale analytical processing. In 1st ACM Symposium on Cloud Computing (SOCC'10). pp. 119\u2013130.","DOI":"10.1145\/1807128.1807148"},{"key":"S0956796817000193_ref11","unstructured":"Apache Spark. (2017) Available at: http:\/\/spark.apache.org\/, accessed January 2, 2017."},{"key":"S0956796817000193_ref8","unstructured":"Apache Hama. (2017) Available at: http:\/\/hama.apache.org\/, accessed January 2, 2017."},{"key":"S0956796817000193_ref6","unstructured":"Apache Giraph. (2017) Available at: http:\/\/giraph.apache.org\/, accessed January 2, 2017."},{"key":"S0956796817000193_ref5","unstructured":"Apache Flink. (2017) Available at: http:\/\/flink.apache.org\/, accessed January 2, 2017."},{"key":"S0956796817000193_ref3","doi-asserted-by":"publisher","DOI":"10.1145\/2949741.2949754"},{"key":"S0956796817000193_ref2","doi-asserted-by":"crossref","unstructured":"A\u00eft-Kaci H. (2013) An abstract, reusable & extensible programming language design architecture. In In Search of Elegance in the Theory and Practice of Computation, Springer 2013, LNCS 8000, pp. 112\u2013166. Available at http:\/\/hassan-ait-kaci.net\/pdf\/hak-opb.pdf.","DOI":"10.1007\/978-3-642-41660-6_6"},{"key":"S0956796817000193_ref49","doi-asserted-by":"crossref","unstructured":"Wadler P. (1990) Comprehending monads. In ACM Symposium on Lisp and Functional Programming. pp. 61\u201378.","DOI":"10.1145\/91556.91592"},{"key":"S0956796817000193_ref1","doi-asserted-by":"publisher","DOI":"10.1145\/1596527.1596530"},{"key":"S0956796817000193_ref51","unstructured":"Wadler P. & Peyton Jones S. (2007) Comprehensive comprehensions (comprehensions with \u2018Order by\u2019 and \u2018Group by\u2019). In Haskell Symposium. pp. 61\u201372."},{"key":"S0956796817000193_ref17","doi-asserted-by":"publisher","DOI":"10.14778\/2733004.2733016"},{"key":"S0956796817000193_ref9","unstructured":"Apache Hive. (2017) Available at: http:\/\/hive.apache.org\/, accessed January 2, 2017."},{"key":"S0956796817000193_ref12","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-57499-9_15"},{"key":"S0956796817000193_ref15","unstructured":"Blelloch G. (1993) NESL: A nested data-parallel language. Technical report, Carnegie Mellon University. CMU-CS-93-129."},{"key":"S0956796817000193_ref45","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2010.5447738"},{"key":"S0956796817000193_ref13","unstructured":"Bancilhon F. , Briggs T. , Khoshafian S. & Valduriez P. (1987) FAD, a powerful and simple database language. In Proceedings of the International Conference on Very Large Data Bases. pp. 97\u2013105."},{"key":"S0956796817000193_ref21","doi-asserted-by":"crossref","unstructured":"Chakrabarti D. , Zhan Y. & Faloutsos C. (2004) R-MAT: A recursive model for graph mining. In SIAM International Conference on Data Mining (SDM). pp. 442\u2013446.","DOI":"10.1137\/1.9781611972740.43"},{"key":"S0956796817000193_ref43","unstructured":"Tannen V. B. , Buneman P. & Naqvi S. (1991) Structural recursion as a query language. In International Workshop on Database Programming Languages: Bulk Types and Persistent Data (DBPL). pp. 9\u201319."},{"key":"S0956796817000193_ref19","doi-asserted-by":"publisher","DOI":"10.1145\/181550.181564"},{"key":"S0956796817000193_ref35","unstructured":"Isard M. & Yu Y. (2009) Distributed data-parallel computing using a high-level programming language. In ACM SIGMOD International Conference on Management of Data. pp. 987\u2013994."},{"key":"S0956796817000193_ref20","doi-asserted-by":"publisher","DOI":"10.14778\/1454159.1454166"},{"key":"S0956796817000193_ref10","unstructured":"Apache MRQL (incubating). (2017) Available at: http:\/\/mrql.incubator.apache.org\/ The MRQL syntax is described at http:\/\/wiki.apache.org\/mrql\/LanguageDescription, accessed January 2, 2017."},{"key":"S0956796817000193_ref29","doi-asserted-by":"publisher","DOI":"10.14778\/1687553.1687568"},{"key":"S0956796817000193_ref52","doi-asserted-by":"publisher","DOI":"10.1017\/S0956796899003585"},{"key":"S0956796817000193_ref30","doi-asserted-by":"publisher","DOI":"10.1017\/S0956796800001908"},{"key":"S0956796817000193_ref37","doi-asserted-by":"publisher","DOI":"10.14778\/2212351.2212354"},{"key":"S0956796817000193_ref50","volume-title":"The Implementation of Functional Programming Languages","author":"Wadler","year":"1987"},{"key":"S0956796817000193_ref26","doi-asserted-by":"crossref","unstructured":"Fegaras L. , Li C. & Gupta U. (2012) An optimization framework for map-reduce queries. In International Conference on Extending Database Technology (EDBT). pp. 26\u201337.","DOI":"10.1145\/2247596.2247601"},{"key":"S0956796817000193_ref23","doi-asserted-by":"crossref","unstructured":"Fegaras L. (2012) Supporting bulk synchronous parallelism in map-reduce queries. In International Workshop on Data Intensive Computing in the Clouds (DataCloud).","DOI":"10.1109\/SC.Companion.2012.129"},{"key":"S0956796817000193_ref4","doi-asserted-by":"crossref","unstructured":"Armbrust M. , Xin R. S. , Lian C. , Huai Y. , Liu D. , Bradley J. K. , Meng X. , Kaftan T. , Franklin M. J. , Ghodsi A. & Zaharia M. (2015) Spark SQL: Relational data processing in Spark. In International Conference on Management of Data (SIGMOD). pp. 1383\u20131394.","DOI":"10.1145\/2723372.2742797"},{"key":"S0956796817000193_ref38","doi-asserted-by":"crossref","unstructured":"Malewicz G. , Austern M. H. , Bik A. J. C. , Dehnert J. C. , Horn I. , Leiser N. & Czajkowski G. (2010) Pregel: A system for large-scale graph processing. In ACM SIGMOD International Conference on Management of Data. pp. 135\u2013146.","DOI":"10.1145\/1807167.1807184"},{"key":"S0956796817000193_ref7","unstructured":"Apache Hadoop. (2017) Available at: http:\/\/hadoop.apache.org\/, accessed January 2, 2017."},{"key":"S0956796817000193_ref16","doi-asserted-by":"publisher","DOI":"10.1016\/0743-7315(90)90087-6"}],"container-title":["Journal of Functional Programming"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.cambridge.org\/core\/services\/aop-cambridge-core\/content\/view\/S0956796817000193","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,22]],"date-time":"2026-04-22T20:19:52Z","timestamp":1776889192000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.cambridge.org\/core\/product\/identifier\/S0956796817000193\/type\/journal_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017]]},"references-count":54,"alternative-id":["S0956796817000193"],"URL":"https:\/\/doi.org\/10.1017\/s0956796817000193","relation":{},"ISSN":["0956-7968","1469-7653"],"issn-type":[{"value":"0956-7968","type":"print"},{"value":"1469-7653","type":"electronic"}],"subject":[],"published":{"date-parts":[[2017]]},"article-number":"e27"}}