{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,14]],"date-time":"2026-03-14T09:51:47Z","timestamp":1773481907890,"version":"3.50.1"},"reference-count":47,"publisher":"Association for Computing Machinery (ACM)","issue":"12","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2015,8]]},"abstract":"<jats:p>Data analytics has recently grown to include increasingly sophisticated techniques, such as machine learning and advanced statistics. Users frequently express these complex analytics tasks as workflows of user-defined functions (UDFs) that specify each algorithmic step. However, given typical hardware configurations and dataset sizes, the core challenge of complex analytics is no longer sheer data volume but rather the computation itself, and the next generation of analytics frameworks must focus on optimizing for this computation bottleneck. While query compilation has gained widespread popularity as a way to tackle the computation bottleneck for traditional SQL workloads, relatively little work addresses UDF-centric workflows in the domain of complex analytics.<\/jats:p>\n          <jats:p>\n            In this paper, we describe a novel architecture for automatically compiling workflows of UDFs. We also propose several optimizations that consider properties of the data, UDFs, and hardware together in order to generate different code on a case-by-case basis. 
To evaluate our approach, we implemented these techniques in T\n            <jats:sc>upleware<\/jats:sc>\n            , a new high-performance distributed analytics system, and our benchmarks show performance improvements of up to three orders of magnitude compared to alternative systems.\n          <\/jats:p>","DOI":"10.14778\/2824032.2824045","type":"journal-article","created":{"date-parts":[[2015,9,16]],"date-time":"2015-09-16T12:18:17Z","timestamp":1442405897000},"page":"1466-1477","source":"Crossref","is-referenced-by-count":64,"title":["An architecture for compiling UDF-centric workflows"],"prefix":"10.14778","volume":"8","author":[{"given":"Andrew","family":"Crotty","sequence":"first","affiliation":[{"name":"Brown University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Alex","family":"Galakatos","sequence":"additional","affiliation":[{"name":"Brown University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Kayhan","family":"Dursun","sequence":"additional","affiliation":[{"name":"Brown University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Tim","family":"Kraska","sequence":"additional","affiliation":[{"name":"Brown University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Carsten","family":"Binnig","sequence":"additional","affiliation":[{"name":"Brown University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ugur","family":"Cetintemel","sequence":"additional","affiliation":[{"name":"Brown University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Stan","family":"Zdonik","sequence":"additional","affiliation":[{"name":"Brown University"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2015,8]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"Apache Hadoop. http:\/\/hadoop.apache.org."},{"key":"e_1_2_1_2_1","unstructured":"BLAS. 
http:\/\/netlib.org\/blas\/."},{"key":"e_1_2_1_3_1","unstructured":"Lists of instruction latencies throughputs and micro-operation breakdowns for Intel AMD and VIA CPUs. http:\/\/agner.org\/optimize\/instruction_tables.pdf."},{"key":"e_1_2_1_4_1","unstructured":"Mahout. http:\/\/mahout.apache.org\/."},{"key":"e_1_2_1_5_1","unstructured":"Matlab. http:\/\/mathworks.com\/products\/matlab\/."},{"key":"e_1_2_1_6_1","unstructured":"MLlib. http:\/\/spark.apache.org\/mllib\/."},{"key":"e_1_2_1_7_1","unstructured":"R Project. http:\/\/r-project.org\/."},{"key":"e_1_2_1_8_1","unstructured":"UK Crime Dataset. http:\/\/data.police.uk\/."},{"key":"e_1_2_1_9_1","unstructured":"Wikipedia Webgraph. http:\/\/dumps.wikimedia.org\/."},{"key":"e_1_2_1_10_1","first-page":"1898","volume-title":"VLDB","author":"Alsubaiee S.","year":"2012"},{"key":"e_1_2_1_11_1","first-page":"42","volume-title":"IEEE Computer","author":"Astrahan M. M.","year":"1979"},{"key":"e_1_2_1_12_1","first-page":"591","volume-title":"ISMIR","author":"Bertin-Mahieux T.","year":"2011"},{"key":"e_1_2_1_13_1","first-page":"24","volume-title":"IEEE Data Eng. Bull.","author":"Borkar V. 
R.","year":"2012"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.14778\/1920841.1920881"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/1859127.1859141"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/2487575.2487677"},{"key":"e_1_2_1_17_1","first-page":"1265","volume-title":"VLDB","author":"Chaiken R.","year":"2008"},{"key":"e_1_2_1_18_1","first-page":"1318","volume-title":"VLDB","author":"Chattopadhyay B.","year":"2011"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.14778\/2367502.2367519"},{"key":"e_1_2_1_20_1","first-page":"137","volume-title":"OSDI","author":"Dean J.","year":"2004"},{"key":"e_1_2_1_21_1","first-page":"810","volume-title":"HPDC","author":"Ekanayake J.","year":"2010"},{"key":"e_1_2_1_22_1","first-page":"231","volume-title":"ICDE","author":"Ghoting A.","year":"2011"},{"key":"e_1_2_1_23_1","unstructured":"B. Graham and M. R. Rangaswami. Do You Hadoop? A Survey of Big Data Practitioners. 2013."},{"key":"e_1_2_1_24_1","first-page":"41","volume-title":"IEEE Data Eng. Bull.","author":"Kemper A.","year":"2013"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.14778\/2732951.2732959"},{"key":"e_1_2_1_26_1","first-page":"613","volume-title":"ICDE","author":"Krikellas K.","year":"2010"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/359545.359563"},{"key":"e_1_2_1_28_1","first-page":"75","volume-title":"CGO","author":"Lattner C.","year":"2004"},{"key":"e_1_2_1_29_1","unstructured":"A. Nadkarni and L. DuBois. Trends in Enterprise Hadoop Deployments. 
2013."},{"key":"e_1_2_1_30_1","first-page":"539","volume-title":"VLDB","author":"Neumann T.","year":"2011"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/2485278.2485284"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2006.40"},{"key":"e_1_2_1_33_1","first-page":"693","volume-title":"NIPS","author":"Recht B.","year":"2011"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.14778\/2536206.2536213"},{"key":"e_1_2_1_35_1","first-page":"109","volume-title":"PODS","author":"Ross K. A.","year":"2002"},{"key":"e_1_2_1_36_1","first-page":"1297","volume-title":"ICDE","author":"Ross K. A.","year":"2007"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/2169090.2169092"},{"key":"e_1_2_1_38_1","first-page":"1231","volume-title":"SIGMOD","author":"R\u0103ducanu B.","year":"2013"},{"key":"e_1_2_1_39_1","first-page":"609","volume-title":"ICML","author":"Sujeeth A. K.","year":"2011"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/1996092.1996095"},{"key":"e_1_2_1_41_1","first-page":"1292","volume-title":"ICDE","author":"Tzoumas K.","year":"2013"},{"key":"e_1_2_1_42_1","first-page":"31","volume-title":"IEEE Data Eng. Bull.","author":"Wanderman-Milne S.","year":"2014"},{"key":"e_1_2_1_43_1","first-page":"1","volume-title":"OSDI","author":"Yu Y.","year":"2008"},{"key":"e_1_2_1_44_1","first-page":"15","volume-title":"NSDI","author":"Zaharia M.","year":"2012"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.14778\/2732977.2733001"},{"key":"e_1_2_1_46_1","first-page":"17","volume-title":"IEEE Data Eng. 
Bull.","author":"Zukowski M.","year":"2005"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2012.148"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/2824032.2824045","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T10:17:01Z","timestamp":1672222621000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/2824032.2824045"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2015,8]]},"references-count":47,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2015,8]]}},"alternative-id":["10.14778\/2824032.2824045"],"URL":"https:\/\/doi.org\/10.14778\/2824032.2824045","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2015,8]]}}}