{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,8,7]],"date-time":"2024-08-07T15:16:19Z","timestamp":1723043779078},"reference-count":59,"publisher":"Springer Science and Business Media LLC","issue":"12","license":[{"start":{"date-parts":[[2016,6,1]],"date-time":"2016-06-01T00:00:00Z","timestamp":1464739200000},"content-version":"unspecified","delay-in-days":0,"URL":"http:\/\/www.springer.com\/tdm"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Supercomput"],"published-print":{"date-parts":[[2016,12]]},"DOI":"10.1007\/s11227-016-1760-5","type":"journal-article","created":{"date-parts":[[2016,6,2]],"date-time":"2016-06-02T00:13:24Z","timestamp":1464826404000},"page":"4573-4600","update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Characterizing and benchmarking stand-alone Hadoop MapReduce on modern HPC clusters"],"prefix":"10.1007","volume":"72","author":[{"given":"Dipti","family":"Shankar","sequence":"first","affiliation":[]},{"given":"Xiaoyi","family":"Lu","sequence":"additional","affiliation":[]},{"given":"Md.","family":"Wasi-ur-Rahman","sequence":"additional","affiliation":[]},{"given":"Nusrat","family":"Islam","sequence":"additional","affiliation":[]},{"given":"Dhabaleswar K.","family":"Panda","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2016,6,1]]},"reference":[{"key":"1760_CR1","unstructured":"Adjacency List Representation: https:\/\/xlinux.nist.gov\/dads\/\/HTML\/adjacencyListRep.html"},{"key":"1760_CR2","unstructured":"Agrawal R, Srikant R (1994) Fast algorithms for mining association rules in large databases. In: Proceedings of the 20th international conference on very large data bases. VLDB \u201994, San Francisco, pp 487\u2013499"},{"key":"1760_CR3","unstructured":"Ahmad F, Lee S, Thottethodi M, Vijaykumar T (2012) PUMA: Purdue MapReduce Benchmarks Suite"},{"key":"1760_CR4","unstructured":"Ananthanarayanan G, Ghodsi A, Wang A, Borthakur D, Kandula S, Shenker S, Stoica I (2012) PACMan: coordinated memory caching for parallel jobs. In: Proceedings of the 9th USENIX conference on networked systems design and implementation. USENIX Association, p 20"},{"key":"1760_CR5","unstructured":"Apache Hadoop NextGen MapReduce (YARN): http:\/\/hadoop.apache.org\/docs\/current\/hadoop-yarn\/hadoop-yarn-site\/YARN.html"},{"key":"1760_CR6","unstructured":"Apache Mahout: http:\/\/mahout.apache.org"},{"key":"1760_CR7","unstructured":"Apache Spark: https:\/\/spark.apache.org"},{"key":"1760_CR8","unstructured":"BigDataBench: A Big Data Benchmark Suite. http:\/\/prof.ict.ac.cn\/BigDataBench"},{"key":"1760_CR9","doi-asserted-by":"crossref","unstructured":"Chen Y, Ganapathi A, Griffith R, Katz R (2011) The case for evaluating MapReduce performance using workload suites. In: 2011 IEEE 19th international symposium on modeling, analysis simulation of computer and telecommunication systems. MASCOTS (July 2011), pp 390\u2013399","DOI":"10.1109\/MASCOTS.2011.12"},{"key":"1760_CR10","unstructured":"Comet at SDSC: http:\/\/www.sdsc.edu\/services\/hpc\/hpc_systems.html#comet"},{"key":"1760_CR11","unstructured":"Connected-component labeling: http:\/\/en.wikipedia.org\/wiki\/Connected-component_labeling"},{"key":"1760_CR12","unstructured":"Dean J, Ghemawat S (2004) MapReduce: simplified data processing on large clusters. In: Proceedings of the 6th conference on Symposium on Operating Systems Design and Implementation. OSDI, San Francisco, CA (December 2004)"},{"key":"1760_CR13","unstructured":"Gordon at SDSC: http:\/\/www.sdsc.edu\/us\/resources\/gordon\/"},{"key":"1760_CR14","unstructured":"GraySort and MinuteSort at Yahoo on Hadoop 0.23: http:\/\/sortbenchmark.org\/Yahoo2013Sort.pdf"},{"key":"1760_CR15","unstructured":"Grep: http:\/\/wiki.apache.org\/hadoop\/Grep"},{"key":"1760_CR16","unstructured":"GridMix3: Emulating Production Workload for Apache Hadoop: https:\/\/developer.yahoo.com\/blogs\/hadoop\/gridmix3-emulating-production-workload-apache-hadoop-450.html"},{"key":"1760_CR17","first-page":"107","volume-title":"Proceedings of the 10th international conference on autonomic computing (ICAC\u2019 13)","author":"Y Guo","year":"2013","unstructured":"Guo Y, Rao J, Zhou X (2013) iShuffle: Improving Hadoop Performance with Shuffle-on-Write. Proceedings of the 10th international conference on autonomic computing (ICAC\u2019 13). USENIX, San Jose, pp 107\u2013117"},{"key":"1760_CR18","unstructured":"Hadoop Distributed File System (HDFS): https:\/\/hadoop.apache.org\/docs\/stable\/hadoop-project-dist\/hadoop-hdfs\/HdfsDesign.html"},{"key":"1760_CR19","unstructured":"High-Performance Big Data (HiBD). http:\/\/hibd.cse.ohio-state.edu"},{"key":"1760_CR20","doi-asserted-by":"crossref","unstructured":"Huang S, Huang J, Dai J, Xie T, Huang B (2010) The HiBench benchmark suite: characterization of the MapReduce-based data analysis. In: Proceedings of the 26th international conference on data engineering workshops. ICDEW, Long Beach, CA (March 2010)","DOI":"10.1109\/ICDEW.2010.5452747"},{"key":"1760_CR21","unstructured":"International Data Corporation (IDC): New IDC Worldwide HPC End-User Study Identifies Latest Trends in High Performance Computing Usage and Spending. http:\/\/www.idc.com\/getdoc.jsp?containerId=prUS24409313"},{"key":"1760_CR22","unstructured":"Inverted Index: https:\/\/en.wikipedia.org\/wiki\/Inverted_index"},{"key":"1760_CR23","doi-asserted-by":"crossref","unstructured":"Islam NS, Lu X, Rahman W, Shankar D, Panda DK (2015) Triple-H: a hybrid approach to accelerate HDFS on HPC clusters with heterogeneous storage architecture . In: 15th IEEE\/ACM international symposium on cluster, cloud and grid computing. CCGrid, Shenzhen, China (May 2015)","DOI":"10.1109\/CCGrid.2015.161"},{"key":"1760_CR24","unstructured":"Islam NS, Lu X, Rahman MW, Jose J, Panda DK (2012) A micro-benchmark suite for evaluating HDFS operations on modern clusters. In: Proceedings of the 2nd workshop on Big Data benchmarking. WBDB"},{"key":"1760_CR25","doi-asserted-by":"crossref","unstructured":"Islam NS, Rahman MW, Jose J, Rajachandrasekar R, Wang H, Subramoni H, Murthy C, Panda DK (2012) High performance RDMA-based design of HDFS over InfiniBand. In: Proceedings of the international conference on high performance computing, networking, storage and analysis. SC (November 2012)","DOI":"10.1109\/SC.2012.65"},{"key":"1760_CR26","doi-asserted-by":"crossref","unstructured":"Islam NS, Lu X, Rahman MWu, Panda DKD (2014) SOR-HDFS: a SEDA-based approach to maximize overlapping in RDMA-enhanced HDFS. In: Proceedings of the 23rd international symposium on high-performance parallel and distributed computing. HPDC \u201914, ACM, New York, pp 261\u2013264","DOI":"10.1145\/2600212.2600715"},{"key":"1760_CR27","unstructured":"Hartigan JA, MAW (1979) Algorithm as 136: a k-means clustering algorithm. J R Stat Soc Ser C (Appl Stat) 28(1):100\u2013108. http:\/\/www.jstor.org\/stable\/2346830"},{"key":"1760_CR28","doi-asserted-by":"crossref","unstructured":"Jia Z, Zhan J, Wang L, Han R, McKee SA, Yang Q, Luo C, Li J (2014) Characterizing and subsetting Big Data workloads. arXiv:1409.0792","DOI":"10.1109\/IISWC.2014.6983058"},{"key":"1760_CR29","doi-asserted-by":"crossref","unstructured":"Kang U, Tsourakakis CE, Faloutsos C (2009) PEGASUS: a peta-scale graph mining system - implementation and observation. In: Data mining, 2009. ICDM\u201909. Ninth IEEE international conference on IEEE, pp 229\u2013238","DOI":"10.1109\/ICDM.2009.14"},{"key":"1760_CR30","doi-asserted-by":"crossref","unstructured":"Kim K, Jeon K, Han H, gyu Kim S, Jung H, Yeom H (2008) MRBench: a benchmark for MapReduce framework. In: Proceedings of the IEEE 14th international conference on parallel and distributed systems. ICPADS, Melbourne, Victoria, Australia (December 2008)","DOI":"10.1109\/ICPADS.2008.70"},{"key":"1760_CR31","unstructured":"Kwon Y, Balazinska M, Howe B, Rolia J (2011) A study of skew in MapReduce applications. Open Cirrus Summit"},{"issue":"1","key":"1760_CR32","first-page":"24","volume":"36","author":"Y Kwon","year":"2013","unstructured":"Kwon Y, Ren K, Balazinska M, Howe B, Rolia J (2013) Managing skew in Hadoop. IEEE Data Eng Bull 36(1):24\u201333","journal-title":"IEEE Data Eng Bull"},{"key":"1760_CR33","doi-asserted-by":"crossref","unstructured":"Lu X, Islam NS, Rahman MW, Jose J, Subramoni H, Wang H, Panda DK (2013) High-performance design of Hadoop RPC with RDMA over InfiniBand. In: Proceedings of the IEEE 42th international conference on parallel processing. ICPP, Lyon","DOI":"10.1109\/ICPP.2013.78"},{"key":"1760_CR34","doi-asserted-by":"crossref","unstructured":"Lu X, Islam NS, Wasi-Ur-Rahman M, Panda DK (2013) A micro-benchmark suite for evaluating Hadoop RPC on high-performance networks. In: Proceedings of the 3rd workshop on Big Data benchmarking. WBDB (May 2013)","DOI":"10.1007\/978-3-319-10596-3_3"},{"key":"1760_CR35","doi-asserted-by":"crossref","unstructured":"Lu X, Rahman M, Islam N, Shankar D, Panda D (2014) Accelerating Spark with RDMA for Big Data processing: early experiences. In: High-performance interconnects (HOTI), 2014 IEEE 22nd annual symposium on, pp 9\u201316 (Aug 2014)","DOI":"10.1109\/HOTI.2014.15"},{"key":"1760_CR36","doi-asserted-by":"crossref","unstructured":"Lu X, Wang B, Zha L, Xu Z (2011) Can MPI benefit Hadoop and MapReduce applications? In: Proceedings of the IEEE 40th international conference on parallel processing workshops. ICPPW (September 2011)","DOI":"10.1109\/ICPPW.2011.56"},{"key":"1760_CR37","unstructured":"Lustre filesystem: http:\/\/lustre.org"},{"key":"1760_CR38","unstructured":"Memory Storage Support in HDFS: https:\/\/hadoop.apache.org\/docs\/r2.7.1\/hadoop-project-dist\/hadoop-hdfs\/MemoryStorage.html"},{"key":"1760_CR39","doi-asserted-by":"crossref","unstructured":"Ming Z, Luo C, Gao W, Han R, Yang Q, Wang L, Zhan J (2014) BDGS: a scalable Big Data generator suite in Big Data benchmarking. arXiv:1401.5465","DOI":"10.1007\/978-3-319-10596-3_11"},{"key":"1760_CR40","unstructured":"NullOutputFormat (Hadoop 1.2.1 API). https:\/\/hadoop.apache.org\/docs\/r1.2.1\/api\/org\/apache\/hadoop\/mapred\/lib\/NullOutputFormat.html"},{"key":"1760_CR41","unstructured":"PageRank: http:\/\/en.wikipedia.org\/wiki\/PageRank"},{"key":"1760_CR42","unstructured":"PoweredBy-Hadoop Wiki: https:\/\/wiki.apache.org\/hadoop\/PoweredBy"},{"key":"1760_CR43","unstructured":"PUMA MapReduce Benchmarks: https:\/\/engineering.purdue.edu\/~puma\/pumabenchmarks.htm"},{"key":"1760_CR44","unstructured":"Rahman MW, Lu X, Islam NS, Rajachadrasekar R, Panda DK (2015) High-performance design of YARN MapReduce on modern HPC clusters with Lustre and RDMA. In: 29th IEEE international parallel and distributed processing symposium. IPDPS (May 2015)"},{"key":"1760_CR45","unstructured":"Rahman MW, Lu X, Islam N, Rajachandrasekar R, Panda D (2014) MapReduce over Lustre: Can RDMA-based approach benefit? In: Euro-Par 2014 parallel processing, lecture notes in computer science, vol 8632. Springer International Publishing (August 2014), pp 644\u2013655"},{"key":"1760_CR46","doi-asserted-by":"crossref","unstructured":"Rahman MW, Islam NS, Lu X, Jose J, Subramoni H, Wang H, Panda DK (2013) High-performance RDMA-based design of Hadoop MapReduce over InfiniBand. In: International workshop on high performance data intensive computing. HPDIC, Boston (May 2013)","DOI":"10.1109\/SC.2012.65"},{"key":"1760_CR47","doi-asserted-by":"crossref","unstructured":"Rahman MW, Lu X, Islam NS, Panda DK (2014) HOMR: a hybrid approach to exploit maximum overlapping in MapReduce over high performance interconnects. In: Proceedings of the 28th ACM international conference on supercomputing. ICS, ACM, Munich, pp 33\u201342 (June 2014)","DOI":"10.1145\/2597652.2597684"},{"key":"1760_CR48","doi-asserted-by":"crossref","unstructured":"Sangroya A, Serrano D, Bouchenak S (2013) MRBS: Towards dependability benchmarking for Hadoop MapReduce. In: Proceedings of the 18th international conference on parallel processing workshops. Euro-Par, Aachen (Aug 2013)","DOI":"10.1007\/978-3-642-36949-0_2"},{"key":"1760_CR49","doi-asserted-by":"crossref","unstructured":"Shankar D, Lu X, Rahman MW, Islam N, Panda DK (2014) A Micro-benchmark Suite for Evaluating Hadoop MapReduce on high-performance networks. In: Proceedings of the fifth workshop on Big Data benchmarks, performance optimization, and emerging hardware, BPOE-5, vol 8807. Springer International Publishing, Hangzhou, pp 19\u201333 (Sep 2014)","DOI":"10.1007\/978-3-319-13021-7_2"},{"key":"1760_CR50","unstructured":"Sort: http:\/\/wiki.apache.org\/hadoop\/Sort"},{"key":"1760_CR51","unstructured":"Stampede at TACC: http:\/\/www.tacc.utexas.edu\/resources\/hpc\/stampede"},{"key":"1760_CR52","unstructured":"Stanford Large Network Dataset Collection (SNAP): https:\/\/snap.stanford.edu\/data\/"},{"key":"1760_CR53","unstructured":"TeraSort Package: https:\/\/hadoop.apache.org\/docs\/current\/api\/org\/apache\/hadoop\/examples\/terasort\/package-summary.html"},{"key":"1760_CR54","unstructured":"The Apache Software Foundation: Apache Hadoop. http:\/\/hadoop.apache.org"},{"key":"1760_CR55","unstructured":"Top500 Supercomputing System: http:\/\/www.top500.org"},{"key":"1760_CR56","doi-asserted-by":"crossref","unstructured":"Wang L, Zhan J, Luo C, Zhu Y, Yang Q, He Y, Gao W, Jia Z, Shi Y, Zhang S, Zheng C, Lu G, Zhan K, Li X, Qiu B (2014) BigDataBench: a Big Data Benchmark Suite from Internet Services. In: Proceedings of the 20th IEEE international symposium on high performance computer architecture. HPCA, Orlando (Feb 2014)","DOI":"10.1109\/HPCA.2014.6835958"},{"key":"1760_CR57","doi-asserted-by":"crossref","unstructured":"Wang Y, Que X, Yu W, Goldenberg D, Sehgal D (2011) Hadoop acceleration through network levitated merge. In: Proceedings of international conference for high performance computing, networking, storage and analysis (SC). Seattle (Nov 2011)","DOI":"10.1145\/2063384.2063461"},{"key":"1760_CR58","unstructured":"Wikipedia Dumps: http:\/\/dumps.wikimedia.org\/enwiki\/"},{"key":"1760_CR59","unstructured":"WordCount: http:\/\/wiki.apache.org\/hadoop\/WordCount"}],"container-title":["The Journal of Supercomputing"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/s11227-016-1760-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1007\/s11227-016-1760-5\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/s11227-016-1760-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,6,30]],"date-time":"2022-06-30T23:21:09Z","timestamp":1656631269000},"score":1,"resource":{"primary":{"URL":"http:\/\/link.springer.com\/10.1007\/s11227-016-1760-5"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016,6,1]]},"references-count":59,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2016,12]]}},"alternative-id":["1760"],"URL":"https:\/\/doi.org\/10.1007\/s11227-016-1760-5","relation":{},"ISSN":["0920-8542","1573-0484"],"issn-type":[{"value":"0920-8542","type":"print"},{"value":"1573-0484","type":"electronic"}],"subject":[],"published":{"date-parts":[[2016,6,1]]}}}