{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,1]],"date-time":"2025-10-01T15:23:00Z","timestamp":1759332180502},"reference-count":47,"publisher":"Springer Science and Business Media LLC","issue":"4","license":[{"start":{"date-parts":[[2013,5,25]],"date-time":"2013-05-25T00:00:00Z","timestamp":1369440000000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/www.springer.com\/tdm"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Peer-to-Peer Netw. Appl."],"published-print":{"date-parts":[[2013,12]]},"DOI":"10.1007\/s12083-013-0213-7","type":"journal-article","created":{"date-parts":[[2013,5,24]],"date-time":"2013-05-24T06:33:32Z","timestamp":1369377212000},"page":"409-424","source":"Crossref","is-referenced-by-count":39,"title":["Handling partitioning skew in MapReduce using LEEN"],"prefix":"10.1007","volume":"6","author":[{"given":"Shadi","family":"Ibrahim","sequence":"first","affiliation":[]},{"given":"Hai","family":"Jin","sequence":"additional","affiliation":[]},{"given":"Lu","family":"Lu","sequence":"additional","affiliation":[]},{"given":"Bingsheng","family":"He","sequence":"additional","affiliation":[]},{"given":"Gabriel","family":"Antoniu","sequence":"additional","affiliation":[]},{"given":"Song","family":"Wu","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2013,5,25]]},"reference":[{"key":"213_CR1","doi-asserted-by":"crossref","first-page":"107","DOI":"10.1145\/1327452.1327492","volume":"51","author":"J Dean","year":"2008","unstructured":"Dean J, Ghemawat S (2008) Mapreduce: simplified data processing on large clusters. Commun ACM 51:107\u2013113","journal-title":"Commun ACM"},{"key":"213_CR2","first-page":"3","volume-title":"Tools and technologies for building the clouds. Cloud computing: principles systems and applications, Computer Communications and Networks","author":"H Jin","year":"2010","unstructured":"Jin H, Ibrahim S, Bell T, Qi L, Cao H, Wu S, Shi X (2010) Tools and technologies for building the clouds. Cloud computing: principles systems and applications, Computer Communications and Networks. Springer-Verlag, Berlin, pp 3\u201320"},{"key":"213_CR3","doi-asserted-by":"crossref","first-page":"373","DOI":"10.1002\/9780470940105.ch14","volume-title":"Cloud computing: principles and paradigms","author":"H Jin","year":"2011","unstructured":"Jin H, Ibrahim S, Qi L, Cao H, Wu S, Shi X (2011) The mapreduce programming model and implementations. In: Buyya R, Broberg J, Goscinski A (eds) Cloud computing: principles and paradigms. Wiley, Hoboken, pp 373\u2013390"},{"key":"213_CR4","doi-asserted-by":"crossref","unstructured":"Isard M, Budiu M, Yu Y, Birrell A, Fetterly D (2007) Dryad: distributed data-parallel programs from sequential building blocks. In: Proceedings of the 2nd ACM European conference on computer systems (EuroSys \u201907), Lisbon, pp 59\u201372","DOI":"10.1145\/1272996.1273005"},{"key":"213_CR5","doi-asserted-by":"crossref","unstructured":"He B, Fang W, Luo Q, Govindaraju NK, Wang T (2008) Mars: a mapreduce framework on graphics processors. In: Proceedings of the 17th international conference on parallel architectures and compilation techniques, Toronto, pp 260\u2013269","DOI":"10.1145\/1454115.1454152"},{"key":"213_CR6","doi-asserted-by":"crossref","unstructured":"Ranger C, Raghuraman R, Penmetsa A, Bradski G, Kozyrakis C (2007) Evaluating mapreduce for multi-core and multiprocessor systems. In: Proceedings of the 2007 IEEE 13th international symposium on high performance computer architecture (HPCA-13), Phoenix, pp 13\u201324","DOI":"10.1109\/HPCA.2007.346181"},{"key":"213_CR7","unstructured":"Hadoop project (2011) http:\/\/lucene.apache.org\/hadoop"},{"key":"213_CR8","unstructured":"Yahoo! (2011) Yahoo! developer network, http:\/\/developer.yahoo.com\/blogs\/hadoop\/2008\/02\/yahoo-worldslargest-production-hadoop.html"},{"key":"213_CR9","unstructured":"Hadoop (2011) Applications powered by hadoop: http:\/\/wiki.apache.org\/hadoop\/PoweredB"},{"issue":"5","key":"213_CR10","doi-asserted-by":"crossref","first-page":"29","DOI":"10.1145\/1165389.945450","volume":"37","author":"S Ghemawat","year":"2003","unstructured":"Ghemawat S, Gobioff H, Leung S-T (2003) The Google file system. SIGOPS - Oper Syst Rev 37(5):29\u201343","journal-title":"SIGOPS - Oper Syst Rev"},{"key":"213_CR11","unstructured":"Condie T, Conway N, Alvaro P, Hellerstein JM, Elmeleegy K, Sears R (2010) Mapreduce online. In: Proceedings of the 7th USENIX conference on networked systems design and implementation (NSDI\u201910), San Jose"},{"key":"213_CR12","unstructured":"Kwon Y, Balazinska M, Howe B, Rolia J (2011) A study of skew in mapreduce applications, http:\/\/nuage.cs.washington.edu\/pubs\/opencirrus2011.pdf"},{"key":"213_CR13","doi-asserted-by":"crossref","unstructured":"Qiu X, Ekanayake J, Beason S, Gunarathne T, Fox G, Barga R, Gannon D (2009) Cloud technologies for bioinformatics applications. In: Proceedings of the 2nd workshop on many-task computing on grids and supercomputers (MTAGS \u201909)","DOI":"10.1145\/1646468.1646474"},{"key":"213_CR14","unstructured":"Lin J (2009) The curse of zipf and limits to parallelization: a look at the stragglers problem in mapreduce. In: Proceedings of the 7th workshop on large-scale distributed systems for information retrieval (LSDS-IR\u201909)"},{"key":"213_CR15","unstructured":"DeWitt DJ, Stonebraker M (2008) Mapreduce: a major step backwards, http:\/\/databasecolumn.vertica.com\/databaseinnovation\/mapreduce-a-major-step-backwards"},{"key":"213_CR16","unstructured":"Wiley K, Connolly A, Gardner JP, Krughof S, Balazinska M, Howe B, Kwon Y, Bu Y (2011) Astronomy in the cloud: using MapReduce for image coaddition, CoRR abs\/1010.1015"},{"key":"213_CR17","doi-asserted-by":"crossref","unstructured":"Chen R, Yang M, Weng X, Choi B, He B, Li X (2012) Improving large graph processing on partitioned graphs in the cloud. In: Proceedings of the third ACM symposium on cloud computing, SoCC \u201912, ACM, New York, pp 3:1\u20133:13. doi: 10.1145\/2391229.2391232 . http:\/\/doi.acm.org\/10.1145\/2391229.2391232","DOI":"10.1145\/2391229.2391232"},{"key":"213_CR18","doi-asserted-by":"crossref","first-page":"1363","DOI":"10.1093\/bioinformatics\/btp236","volume":"25","author":"MC Schatz","year":"2009","unstructured":"Schatz MC (2009) Cloudburst: highly sensitive read mapping with mapreduce. Bioinformatics 25:1363\u20131369","journal-title":"Bioinformatics"},{"key":"213_CR19","doi-asserted-by":"crossref","unstructured":"Verma A, Llor\u00e0 X, Goldberg DE, Campbell RH (2009) Scaling genetic algorithms using MapReduce. In: Proceedings of the 2009 9th international conference on intelligent systems design and applications, pp 13\u201318","DOI":"10.1109\/ISDA.2009.181"},{"key":"213_CR20","unstructured":"Ng AY, Bradski G, Chu C-T, Olukotun K, Kim SK, Lin Y-A, Yu Y (2006) MapReduce for machine learning on multicore. In: Proceedings of the twentieth annual conference on neural information processing systems (NIPS \u201906), Vancouver, pp 281\u2013288"},{"key":"213_CR21","doi-asserted-by":"crossref","unstructured":"Lin J, Schatz M (2010) Design patterns for efficient graph algorithms in mapreduce. In: Proceedings of the eighth workshop on mining and learning with graphs, Washington, pp 78\u201385","DOI":"10.1145\/1830252.1830263"},{"key":"213_CR22","unstructured":"Xen hypervisor homepage (2011) http:\/\/www.xen.org\/"},{"key":"213_CR23","unstructured":"Amazon elastic compute cloud (2011) http:\/\/aws.amazon.com\/ec2\/"},{"key":"213_CR24","doi-asserted-by":"crossref","unstructured":"Ibrahim S, Jin H, Lu L, Wu S, He B, Qi L (2010) Leen: locality\/fairness-aware key partitioning for mapreduce in the cloud. In: Proceedings of the 2010 IEEE second international conference on cloud computing technology and science (CLOUDCOM\u201910), Indianapolis, pp 17\u201324","DOI":"10.1109\/CloudCom.2010.25"},{"key":"213_CR25","unstructured":"Jain R, Chiu D-M, Hawe W (1984) A quantitative measure of fairness and discrimination for resource allocation in shared computer systems, DEC Research Report TR-301"},{"key":"213_CR26","doi-asserted-by":"crossref","unstructured":"Ibrahim S, Jin H, Lu L, Qi L, Wu S, Shi X (2009) Evaluating mapreduce on virtual machines: the hadoop case. In: Proceedings of the 1st international conference on cloud computing (CLOUDCOM\u201909), Beijing, pp 519\u2013528","DOI":"10.1007\/978-3-642-10665-1_47"},{"key":"213_CR27","doi-asserted-by":"crossref","unstructured":"Ibrahim S, Jin H, Cheng B, Cao H, Wu S, Qi L (2009) Cloudlet: towards mapreduce implementation on virtual machines. In: Proceedings of the 18th ACM international symposium on high performance distributed computing (HPDC-18), Garching, pp 65\u201366","DOI":"10.1145\/1551609.1551624"},{"key":"213_CR28","doi-asserted-by":"crossref","unstructured":"Zaharia M, Borthakur D, Sen Sarma J, Elmeleegy K, Shenker S, Stoica I (2010) Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling. In: Proceedings of the 5th ACM European conference on computer systems (EuroSys\u201910), Paris, pp 265\u2013278","DOI":"10.1145\/1755913.1755940"},{"key":"213_CR29","doi-asserted-by":"crossref","unstructured":"Ibrahim S, Jin H, Lu L, He B, Antoniu G, Wu S (2012) Maestro: replica-aware map scheduling for mapreduce. In: Proceedings of the 12th IEEE\/ACM international symposium on cluster, cloud and grid computing (CCGrid 2012), Ottawa","DOI":"10.1109\/CCGrid.2012.122"},{"key":"213_CR30","doi-asserted-by":"crossref","unstructured":"Ibrahim S, Jin H, Lu L, He B, Wu S (2011) Adaptive disk i\/o scheduling for mapreduce in virtualized environment. In: Proceedings of the 2011 international conference on parallel processing (ICPP\u201911), Taipei, pp 335\u2013344","DOI":"10.1109\/ICPP.2011.86"},{"key":"213_CR31","doi-asserted-by":"crossref","unstructured":"Menon RK, Bhat GP, Schatz MC (2011) Rapid parallel genome indexing with MapReduce. In: Proceedings of the 2nd international workshop on MapReduce and its applications, San Jose, pp 51\u201358","DOI":"10.1145\/1996092.1996104"},{"key":"213_CR32","doi-asserted-by":"crossref","unstructured":"Ekanayake J, Pallickara S, Fox G (2008) Mapreduce for data intensive scientific analyses. In: Proceedings of the 2008 fourth IEEE international conference on eScience, pp 277\u2013284","DOI":"10.1109\/eScience.2008.59"},{"key":"213_CR33","doi-asserted-by":"crossref","unstructured":"Gunarathne T, Wu T-L, Qiu J, Fox G (2010) MapReduce in the clouds for science. In: Proceedings of the 2010 IEEE second international conference on cloud computing technology and science, pp 565\u2013572","DOI":"10.1109\/CloudCom.2010.107"},{"key":"213_CR34","doi-asserted-by":"crossref","unstructured":"Ganjisaffar Y, Debeauvais T, Javanmardi S, Caruana R, Lopes CV (2011) Distributed tuning of machine learning algorithms using MapReduce clusters. In: Proceedings of the 3rd workshop on large scale data mining: theory and applications, San Diego, pp 2:1\u20132:8","DOI":"10.1145\/2002945.2002947"},{"key":"213_CR35","doi-asserted-by":"crossref","unstructured":"Blanas S, Patel JM, Ercegovac V, Rao J, Shekita EJ, Tian Y (2010) A comparison of join algorithms for log processing in mapreduce. In: Proceedings of the 2010 international conference on Management of data, Indianapolis, pp 975\u2013986","DOI":"10.1145\/1807167.1807273"},{"key":"213_CR36","unstructured":"Logothetis D, Trezzo C, Webb KC, Yocum K (2011) In-situ mapreduce for log processing. In: Proceedings of the 2011 USENIX conference on USENIX annual technical conference, Portland, pp 9\u20139"},{"key":"213_CR37","doi-asserted-by":"crossref","unstructured":"Seo S, Jang I, Woo K, Kim I, Kim J-S, Maeng S (2009) Hpmr: prefetching and pre-shuffling in shared mapreduce computation environment. In: Proceedings of the 2009 IEEE international conference on cluster computing (CLUSTER\u201909), New Orleans","DOI":"10.1109\/CLUSTR.2009.5289171"},{"issue":"6","key":"213_CR38","doi-asserted-by":"crossref","first-page":"843","DOI":"10.1016\/j.future.2010.09.001","volume":"27","author":"Y-L Su","year":"2011","unstructured":"Su Y-L, Chen P-C, Chang J-B, Shieh C-K (2011) Variable-sized map and locality-aware reduce on public-resource grids. Futur Gener Comput Syst 27(6):843\u2013849","journal-title":"Futur Gener Comput Syst"},{"key":"213_CR39","doi-asserted-by":"crossref","first-page":"85","DOI":"10.1145\/129888.129894","volume":"35","author":"D DeWitt","year":"1992","unstructured":"DeWitt D, Gray J (1992) Parallel database systems: the future of high performance database systems. Commun ACM 35:85\u201398","journal-title":"Commun ACM"},{"key":"213_CR40","unstructured":"Chen S, Schlosser SW (2008) Map-reduce meets wider varieties of applications, Tech. Rep. IRP-TR-08-05, Technical Report. Intel Research Pittsburgh"},{"key":"213_CR41","unstructured":"Ananthanarayanan G, Kandula S, Greenberg A, Stoica I, Lu Y, Saha B, Harris E (2010) Reining in the outliers in map-reduce clusters using mantri. In: Proceedings of the 9th USENIX conference on operating systems design and implementation (OSDI\u201910), Vancouver, pp 1\u201316"},{"key":"213_CR42","doi-asserted-by":"crossref","unstructured":"Kwon Y, Balazinska M, Howe B, Rolia J (2010) Skew-resistant parallel processing of feature-extracting scientific user-defined functions. In: Proceedings of the 1st ACM symposium on Cloud computing (SoCC \u201910)","DOI":"10.1145\/1807128.1807140"},{"key":"213_CR43","doi-asserted-by":"crossref","unstructured":"Gufler B, Augsten N, Reiser A, Kemper A (2012) Load balancing in mapreduce based on scalable cardinality estimates. In: Proceedings of the 2012 IEEE 28th international conference on data engineering (ICDE \u201912)","DOI":"10.1109\/ICDE.2012.58"},{"key":"213_CR44","doi-asserted-by":"crossref","unstructured":"Kwon Y, Balazinska M, Howe B, Rolia J (2012) Skewtune: mitigating skew in mapreduce applications. In: Proceedings of the 2012 ACM SIGMOD international conference on management of data (SIGMOD \u201912)","DOI":"10.1145\/2213836.2213840"},{"key":"213_CR45","unstructured":"He B, Yang M, Guo Z, Chen R, Lin W, Su B, Wang H, Zhou L (2009) Wave computing in the cloud. In: Proceedings of the 12th conference on hot topics in operating systems (HotOS\u201909)"},{"key":"213_CR46","doi-asserted-by":"crossref","unstructured":"He B, Yang M, Guo Z, Chen R, Su B, Lin W, Zhou L (2010) Comet: batched stream processing for data intensive distributed computing. In: Proceedings of the 1st ACM symposium on Cloud computing (SoCC \u201910)","DOI":"10.1145\/1807128.1807139"},{"key":"213_CR47","doi-asserted-by":"crossref","unstructured":"Ibrahim S, He B, Jin H (2011) Towards pay-as-you-consume cloud computing. In: Proceedings of the 2011 IEEE international conference on services computing (SCC\u201911), Washington, DC, pp 370\u2013377","DOI":"10.1109\/SCC.2011.38"}],"container-title":["Peer-to-Peer Networking and Applications"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/s12083-013-0213-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1007\/s12083-013-0213-7\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/s12083-013-0213-7","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2019,7,13]],"date-time":"2019-07-13T23:46:50Z","timestamp":1563061610000},"score":1,"resource":{"primary":{"URL":"http:\/\/link.springer.com\/10.1007\/s12083-013-0213-7"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2013,5,25]]},"references-count":47,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2013,12]]}},"alternative-id":["213"],"URL":"https:\/\/doi.org\/10.1007\/s12083-013-0213-7","relation":{},"ISSN":["1936-6442","1936-6450"],"issn-type":[{"value":"1936-6442","type":"print"},{"value":"1936-6450","type":"electronic"}],"subject":[],"published":{"date-parts":[[2013,5,25]]}}}