{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,9]],"date-time":"2026-04-09T14:36:57Z","timestamp":1775745417972,"version":"3.50.1"},"reference-count":81,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2012,1,11]],"date-time":"2012-01-11T00:00:00Z","timestamp":1326240000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["SIGMOD Rec."],"published-print":{"date-parts":[[2012,1,11]]},"abstract":"<jats:p>A prominent parallel data processing tool MapReduce is gaining significant momentum from both industry and academia as the volume of data to analyze grows rapidly. While MapReduce is used in many areas where massive data analysis is required, there are still debates on its performance, efficiency per node, and simple abstraction. This survey intends to assist the database and open source communities in understanding various technical aspects of the MapReduce framework. In this survey, we characterize the MapReduce framework and discuss its inherent pros and cons. We then introduce its optimization strategies reported in the recent literature. We also discuss the open issues and challenges raised on parallel data analysis with MapReduce.<\/jats:p>","DOI":"10.1145\/2094114.2094118","type":"journal-article","created":{"date-parts":[[2012,1,17]],"date-time":"2012-01-17T17:21:44Z","timestamp":1326820904000},"page":"11-20","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":385,"title":["Parallel data processing with MapReduce"],"prefix":"10.1145","volume":"40","author":[{"given":"Kyong-Ha","family":"Lee","sequence":"first","affiliation":[{"name":"KAIST"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yoon-Joon","family":"Lee","sequence":"additional","affiliation":[{"name":"KAIST"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hyunsik","family":"Choi","sequence":"additional","affiliation":[{"name":"Korea University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yon Dohn","family":"Chung","sequence":"additional","affiliation":[{"name":"Korea University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Bongki","family":"Moon","sequence":"additional","affiliation":[{"name":"University of Arizona"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2012,1,11]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"Mahout: Scalable machine-learning and data-mining library. http:\/\/mapout.apache.org 2010.  Mahout: Scalable machine-learning and data-mining library. http:\/\/mapout.apache.org 2010."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/1739041.1739056"},{"key":"e_1_2_1_3_1","first-page":"169","volume-title":"The VLDB Journal","author":"Ailamaki A.","year":"2001","unstructured":"A. Ailamaki , D.J. DeWitt , M.D. Hill , and M. Skounakis . Weaving relations for cache performance . The VLDB Journal , pages 169 -- 180 , 2001 . A. Ailamaki, D.J. DeWitt, M.D. Hill, and M. Skounakis. Weaving relations for cache performance. The VLDB Journal, pages 169--180, 2001."},{"key":"e_1_2_1_4_1","volume-title":"Scaling Hadoop to 4000 nodes at Yahoo! http:\/\/goo.gl\/8dRMq","author":"Anand A.","year":"2008","unstructured":"A. Anand . Scaling Hadoop to 4000 nodes at Yahoo! http:\/\/goo.gl\/8dRMq , 2008 . A. Anand. Scaling Hadoop to 4000 nodes at Yahoo! http:\/\/goo.gl\/8dRMq, 2008."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/1807128.1807150"},{"key":"e_1_2_1_6_1","unstructured":"Nokia Research Center. Disco: Massive data- minimal code. http:\/\/discoproject.org 2010.  Nokia Research Center. Disco: Massive data- minimal code. http:\/\/discoproject.org 2010."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.14778\/1920841.1921020"},{"issue":"5","key":"e_1_2_1_8_1","first-page":"1","article-title":"de Kruijf and K. Sankaralingam. Mapreduce for the cell broadband engine architecture","volume":"53","author":"M","year":"2009","unstructured":"M . de Kruijf and K. Sankaralingam. Mapreduce for the cell broadband engine architecture . IBM Journal of Research and Development , 53 ( 5 ):10: 1 -- 10 :12, 2009 . M. de Kruijf and K. Sankaralingam. Mapreduce for the cell broadband engine architecture. IBM Journal of Research and Development, 53(5):10:1--10:12, 2009.","journal-title":"IBM Journal of Research and Development"},{"key":"e_1_2_1_9_1","volume-title":"lessons and advice from building large distributed systems. Keynote from LADIS","author":"Dean J.","year":"2009","unstructured":"J. Dean . Designs , lessons and advice from building large distributed systems. Keynote from LADIS , 2009 . J. Dean. Designs, lessons and advice from building large distributed systems. Keynote from LADIS, 2009."},{"key":"e_1_2_1_10_1","first-page":"1","article-title":"MapReduce: A major step backwards","author":"DeWitt D.","year":"2008","unstructured":"D. DeWitt and M. Stonebraker . MapReduce: A major step backwards . The Database Column , 1 , 2008 . D. DeWitt and M. Stonebraker. MapReduce: A major step backwards. The Database Column, 1, 2008.","journal-title":"The Database Column"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.14778\/1687627.1687731"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.14778\/1988776.1988778"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/eScience.2008.62"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/1989323.1989423"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/1559845.1559865"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.14778\/1687553.1687609"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2010.5447738"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.14778\/1687553.1687568"},{"key":"e_1_2_1_19_1","volume-title":"Workshop on Software Tools for MultiCore Systems","author":"Catanzaro B.","year":"2008","unstructured":"B. Catanzaro A map reduce framework for programming graphics processors . In Workshop on Software Tools for MultiCore Systems , 2008 . B. Catanzaro et al. A map reduce framework for programming graphics processors. In Workshop on Software Tools for MultiCore Systems, 2008."},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/1454115.1454152"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/1989323.1989426"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/1376616.1376726"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/1871940.1871955"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2007.346181"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2010.5447913"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/1807128.1807148"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2010.248"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.14778\/1920841.1920903"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.14778\/1454159.1454204"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/1646468.1646476"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.14778\/1453856.1453865"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/1740390.1740400"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.14778\/1687553.1687567"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.14778\/1978665.1978670"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/1365815.1365816"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/1807167.1807184"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/1247480.1247602"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/1327452.1327492"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/1629175.1629198"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.14778\/1920841.1920908"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/eScience.2008.59"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/1851476.1851593"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/1740390.1740405"},{"key":"e_1_2_1_44_1","first-page":"137","volume-title":"In Proceedings of the 6th USENIX OSDI","year":"2004","unstructured":"Jeffrey Dean et al. Mapreduce: Simplified data processing on large clusters . In In Proceedings of the 6th USENIX OSDI , pages 137 -- 150 , 2004 . Jeffrey Dean et al. Mapreduce: Simplified data processing on large clusters. In In Proceedings of the 6th USENIX OSDI, pages 137--150, 2004."},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2010.5447919"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/1807167.1807223"},{"key":"e_1_2_1_47_1","unstructured":"Kenton etal Protocol Buffer- Google's data interchange format. http:\/\/code.google.com\/p\/protobuf\/.  Kenton et al. Protocol Buffer- Google's data interchange format. http:\/\/code.google.com\/p\/protobuf\/."},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/1272996.1273005"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/1559845.1559962"},{"key":"e_1_2_1_50_1","volume-title":"Conference on Innovative Data Systems Research (CIDR)","author":"Stonebraker M.","year":"2007","unstructured":"M. Stonebraker One size fits all? Part 2: Benchmarking results . In Conference on Innovative Data Systems Research (CIDR) , 2007 . M. Stonebraker et al. One size fits all? Part 2: Benchmarking results. In Conference on Innovative Data Systems Research (CIDR), 2007."},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1145\/1629175.1629197"},{"key":"e_1_2_1_52_1","first-page":"29","volume-title":"Proceedings of the 8th USENIX OSDI","author":"Zaharia M.","year":"2008","unstructured":"M. Zaharia Improving mapreduce performance in heterogeneous environments . In Proceedings of the 8th USENIX OSDI , pages 29 -- 42 , 2008 . M. Zaharia et al. Improving mapreduce performance in heterogeneous environments. In Proceedings of the 8th USENIX OSDI, pages 29--42, 2008."},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.14778\/1454159.1454166"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1109\/CLUSTR.2009.5289201"},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1155\/2005\/962135"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1145\/1807167.1807222"},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1145\/1807167.1807273"},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1145\/1807167.1807275"},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1145\/1165389.945450"},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1109\/CCGRID.2010.112"},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1109\/CLUSTR.2009.5289149"},{"key":"e_1_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.14778\/1920841.1920886"},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.5555\/1855711.1855732"},{"key":"e_1_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.14778\/1920841.1920906"},{"key":"e_1_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.1109\/CCGRID.2010.10"},{"key":"e_1_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.14778\/1920841.1920881"},{"key":"e_1_2_1_67_1","doi-asserted-by":"publisher","DOI":"10.1145\/1851290.1851296"},{"key":"e_1_2_1_68_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2011.5767933"},{"key":"e_1_2_1_69_1","doi-asserted-by":"publisher","DOI":"10.1145\/1989323.1989424"},{"key":"e_1_2_1_70_1","doi-asserted-by":"publisher","DOI":"10.1145\/1807167.1807272"},{"key":"e_1_2_1_71_1","first-page":"1","volume-title":"Proceedings of the 8th USENIX OSDI","author":"Yu Y.","year":"2008","unstructured":"Y. Yu : A system for general-purpose distributed data-parallel computing using a high-level language . In Proceedings of the 8th USENIX OSDI , pages 1 -- 14 , 2008 . Y. Yu et al. DryadLINQ: A system for general-purpose distributed data-parallel computing using a high-level language. In Proceedings of the 8th USENIX OSDI, pages 1--14, 2008."},{"key":"e_1_2_1_72_1","doi-asserted-by":"publisher","DOI":"10.1145\/1558334.1558339"},{"key":"e_1_2_1_73_1","unstructured":"Saptarshi Guha. RHIPE-R and Hadoop Integrated Processing Environment. http:\/\/www.stat.purdue.edu\/sguha\/rhipe\/ 2010.  Saptarshi Guha. RHIPE-R and Hadoop Integrated Processing Environment. http:\/\/www.stat.purdue.edu\/sguha\/rhipe\/ 2010."},{"key":"e_1_2_1_74_1","doi-asserted-by":"publisher","DOI":"10.14778\/1920841.1920862"},{"key":"e_1_2_1_75_1","doi-asserted-by":"publisher","DOI":"10.2200\/S00274ED1V01Y201006HLT007"},{"key":"e_1_2_1_76_1","volume-title":"Proceedings of sort benchmark","author":"O'Malley O.","year":"2009","unstructured":"O. O'Malley and A.C. Murthy . Winning a 60 second dash with a yellow elephant . Proceedings of sort benchmark , 2009 . O. O'Malley and A.C. Murthy. Winning a 60 second dash with a yellow elephant. Proceedings of sort benchmark, 2009."},{"key":"e_1_2_1_77_1","doi-asserted-by":"publisher","DOI":"10.1145\/1327452.1327491"},{"key":"e_1_2_1_78_1","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btp236"},{"key":"e_1_2_1_79_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2005.1"},{"key":"e_1_2_1_80_1","doi-asserted-by":"publisher","DOI":"10.1186\/1471-2105-11-S12-S1"},{"key":"e_1_2_1_81_1","volume-title":"The Definitive Guide","author":"White T.","year":"2010","unstructured":"T. White . Hadoop : The Definitive Guide . Yahoo Press , 2010 . T. White. Hadoop: The Definitive Guide. Yahoo Press, 2010."}],"container-title":["ACM SIGMOD Record"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2094114.2094118","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2094114.2094118","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T09:48:41Z","timestamp":1750240121000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2094114.2094118"}},"subtitle":["a survey"],"short-title":[],"issued":{"date-parts":[[2012,1,11]]},"references-count":81,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2012,1,11]]}},"alternative-id":["10.1145\/2094114.2094118"],"URL":"https:\/\/doi.org\/10.1145\/2094114.2094118","relation":{},"ISSN":["0163-5808"],"issn-type":[{"value":"0163-5808","type":"print"}],"subject":[],"published":{"date-parts":[[2012,1,11]]},"assertion":[{"value":"2012-01-11","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}