{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,13]],"date-time":"2026-03-13T15:05:53Z","timestamp":1773414353945,"version":"3.50.1"},"reference-count":51,"publisher":"Association for Computing Machinery (ACM)","issue":"12","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2012,8]]},"abstract":"<jats:p>Within the past few years, organizations in diverse industries have adopted MapReduce-based systems for large-scale data processing. Along with these new users, important new workloads have emerged which feature many small, short, and increasingly interactive jobs in addition to the large, long-running batch jobs for which MapReduce was originally designed. As interactive, large-scale query processing is a strength of the RDBMS community, it is important that lessons from that field be carried over and applied where possible in this new domain. However, these new workloads have not yet been described in the literature. We fill this gap with an empirical analysis of MapReduce traces from six separate business-critical deployments inside Facebook and at Cloudera customers in e-commerce, telecommunications, media, and retail. Our key contribution is a characterization of new MapReduce workloads which are driven in part by interactive analysis, and which make heavy use of query-like programming frameworks on top of MapReduce. These workloads display diverse behaviors which invalidate prior assumptions about MapReduce such as uniform data access, regular diurnal patterns, and prevalence of large jobs. A secondary contribution is a first step towards creating a TPC-like data processing benchmark for MapReduce.<\/jats:p>","DOI":"10.14778\/2367502.2367519","type":"journal-article","created":{"date-parts":[[2014,6,24]],"date-time":"2014-06-24T12:17:57Z","timestamp":1403612277000},"page":"1802-1813","source":"Crossref","is-referenced-by-count":319,"title":["Interactive analytical processing in big data systems"],"prefix":"10.14778","volume":"5","author":[{"given":"Yanpei","family":"Chen","sequence":"first","affiliation":[{"name":"University of California, Berkeley"}]},{"given":"Sara","family":"Alspaugh","sequence":"additional","affiliation":[{"name":"University of California, Berkeley"}]},{"given":"Randy","family":"Katz","sequence":"additional","affiliation":[{"name":"University of California, Berkeley"}]}],"member":"320","published-online":{"date-parts":[[2012,8]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"Apache Hive. http:\/\/hive.apache.org\/. Apache Hive. http:\/\/hive.apache.org\/."},{"key":"e_1_2_1_2_1","unstructured":"Apache Oozie(TM) Workflow Scheduler for Hadoop. http:\/\/incubator.apache.org\/oozie\/. Apache Oozie(TM) Workflow Scheduler for Hadoop. http:\/\/incubator.apache.org\/oozie\/."},{"key":"e_1_2_1_3_1","unstructured":"Apache Pig. http:\/\/pig.apache.org\/. Apache Pig. http:\/\/pig.apache.org\/."},{"key":"e_1_2_1_4_1","unstructured":"Gridmix. HADOOP-HOME\/mapred\/src\/benchmarks\/gridmix in Hadoop 0.21.0 onwards. Gridmix. HADOOP-HOME\/mapred\/src\/benchmarks\/gridmix in Hadoop 0.21.0 onwards."},{"key":"e_1_2_1_5_1","unstructured":"Hadoop World 2011 Speakers. http:\/\/www.hadoopworld.com\/speakers\/. Hadoop World 2011 Speakers. http:\/\/www.hadoopworld.com\/speakers\/."},{"key":"e_1_2_1_6_1","unstructured":"Sort benchmark home page. http:\/\/sortbenchmark.org\/. Sort benchmark home page. http:\/\/sortbenchmark.org\/."},{"key":"e_1_2_1_7_1","first-page":"63","volume-title":"SIGCOMM","author":"Al-Fares M.","year":"2008"},{"key":"e_1_2_1_8_1","first-page":"63","volume-title":"SIGCOMM","author":"Alizadeh M.","year":"2010"},{"key":"e_1_2_1_9_1","volume-title":"MIT Press","author":"Alpaydin E.","year":"2004"},{"key":"e_1_2_1_10_1","first-page":"1","volume-title":"OSDI","author":"Ananthanarayanan G.","year":"2010"},{"key":"e_1_2_1_11_1","first-page":"287","volume-title":"Eurosys","author":"Ananthanarayanan G.","year":"2011"},{"key":"e_1_2_1_12_1","first-page":"20","volume-title":"NSDI","author":"Ananthanarayanan G.","year":"2012"},{"key":"e_1_2_1_13_1","first-page":"1","volume-title":"FAST","author":"Bairavasundaram L.","year":"2008"},{"key":"e_1_2_1_14_1","first-page":"1071","volume-title":"SIGMOD","author":"Borthakur D.","year":"2011"},{"key":"e_1_2_1_15_1","first-page":"126","volume-title":"INFOCOM","author":"Breslau L.","year":"1999"},{"key":"e_1_2_1_16_1","first-page":"285","volume-title":"VLDB","author":"Bu Y.","year":"2010"},{"key":"e_1_2_1_17_1","first-page":"43","volume-title":"SOSP","author":"Chen Y.","year":"2011"},{"key":"e_1_2_1_18_1","first-page":"390","volume-title":"MASCOTS","author":"Chen Y.","year":"2011"},{"key":"e_1_2_1_19_1","first-page":"43","volume-title":"EuroSys","author":"Chen Y.","year":"2012"},{"key":"e_1_2_1_20_1","first-page":"98","volume-title":"SIGCOMM","author":"Chowdhury M.","year":"2011"},{"key":"e_1_2_1_21_1","unstructured":"Cloudera Inc. Cloudera Manager Datasheet. Cloudera Inc. Cloudera Manager Datasheet."},{"key":"e_1_2_1_22_1","first-page":"107","volume-title":"OSDI","author":"Dean J.","year":"2004"},{"key":"e_1_2_1_23_1","first-page":"515","volume-title":"VLDB","author":"Dittrich J.","year":"2010"},{"key":"e_1_2_1_24_1","unstructured":"EMC and IDC iView. Digital Universe. http:\/\/www.emc.com\/leadership\/programs\/digital-universe.htm. EMC and IDC iView. Digital Universe. http:\/\/www.emc.com\/leadership\/programs\/digital-universe.htm."},{"key":"e_1_2_1_25_1","first-page":"43","volume-title":"NSDI","author":"Feamster N.","year":"2005"},{"key":"e_1_2_1_26_1","first-page":"87","volume-title":"SMDB","author":"Ganapathi A.","year":"2010"},{"key":"e_1_2_1_27_1","first-page":"1414","volume-title":"VLDB","author":"Gates A. F.","year":"2009"},{"key":"e_1_2_1_28_1","doi-asserted-by":"crossref","first-page":"243","DOI":"10.1145\/191839.191886","volume-title":"SIGMOD","author":"Gray J.","year":"1994"},{"key":"e_1_2_1_29_1","first-page":"51","volume-title":"SIGCOMM","author":"Greenberg A.","year":"2009"},{"key":"e_1_2_1_30_1","volume-title":"United Nations Economic Commission for Europe","author":"Hellerstein J.","year":"2008"},{"key":"e_1_2_1_31_1","first-page":"1111","volume-title":"VLDB","author":"Herodotou H.","year":"2011"},{"key":"e_1_2_1_32_1","first-page":"22","volume-title":"NSDI","author":"Hindman B.","year":"2011"},{"key":"e_1_2_1_33_1","doi-asserted-by":"crossref","first-page":"261","DOI":"10.1145\/1629575.1629601","volume-title":"SOSP","author":"Isard M.","year":"2009"},{"key":"e_1_2_1_34_1","first-page":"385","volume-title":"VLDB","author":"Jahani E.","year":"2011"},{"key":"e_1_2_1_35_1","first-page":"1105","volume-title":"VLDB","author":"Krompass S.","year":"2007"},{"key":"e_1_2_1_36_1","first-page":"129","volume-title":"VLDB","author":"Lang W.","year":"2010"},{"key":"e_1_2_1_37_1","first-page":"1","volume-title":"SIGCOMM","author":"Leland W.","year":"1993"},{"key":"e_1_2_1_38_1","first-page":"319","volume-title":"ISCA","author":"Meisner D.","year":"2011"},{"key":"e_1_2_1_39_1","first-page":"330","volume-title":"VLDB","author":"Melnik S.","year":"2010"},{"key":"e_1_2_1_40_1","first-page":"44","volume-title":"ICAC","author":"Mesnier M.","year":"2004"},{"key":"e_1_2_1_41_1","first-page":"34","volume-title":"SIGMETRICS","author":"Mishra A.","year":"2010"},{"key":"e_1_2_1_42_1","first-page":"299","volume-title":"SIGCOMM","author":"Mogul J. C.","year":"1995"},{"key":"e_1_2_1_43_1","first-page":"507","volume-title":"SIGMOD","author":"Morton K.","year":"2010"},{"key":"e_1_2_1_44_1","first-page":"15","volume-title":"SOSP","author":"Ousterhout J.","year":"1985"},{"key":"e_1_2_1_45_1","first-page":"165","volume-title":"SIGMOD","author":"Pavlo A.","year":"2009"},{"key":"e_1_2_1_46_1","first-page":"139","volume-title":"SIGCOMM","author":"Paxson V.","year":"1997"},{"key":"e_1_2_1_47_1","doi-asserted-by":"crossref","first-page":"29","DOI":"10.1145\/1460412.1460416","volume-title":"SenSys","author":"Srinivasan K.","year":"2008"},{"key":"e_1_2_1_48_1","first-page":"169","volume-title":"EuroSys","author":"Thereska E.","year":"2011"},{"key":"e_1_2_1_49_1","first-page":"1626","volume-title":"VLDB","author":"Thusoo A.","year":"2009"},{"key":"e_1_2_1_50_1","unstructured":"Transactional Processing Performance Council. The TPC-W Benchmark. http:\/\/www.tpc.org\/tpcw\/default.asp. Transactional Processing Performance Council. The TPC-W Benchmark. http:\/\/www.tpc.org\/tpcw\/default.asp."},{"key":"e_1_2_1_51_1","unstructured":"Transactional Processing Performance Council. TPC-* Benchmarks. http:\/\/www.tpc.org\/. Transactional Processing Performance Council. TPC-* Benchmarks. http:\/\/www.tpc.org\/."}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/2367502.2367519","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,5,28]],"date-time":"2024-05-28T00:01:27Z","timestamp":1716854487000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/2367502.2367519"}},"subtitle":["a cross-industry study of MapReduce workloads"],"short-title":[],"issued":{"date-parts":[[2012,8]]},"references-count":51,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2012,8]]}},"alternative-id":["10.14778\/2367502.2367519"],"URL":"https:\/\/doi.org\/10.14778\/2367502.2367519","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2012,8]]}}}