{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,3,13]],"date-time":"2024-03-13T13:04:59Z","timestamp":1710335099810},"reference-count":11,"publisher":"Association for Computing Machinery (ACM)","issue":"12","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2013,8,28]]},"abstract":"<jats:p>The underlying assumption behind Hadoop and, more generally, the need for distributed processing is that the data to be analyzed cannot be held in memory on a single machine. Today, this assumption needs to be re-evaluated. Although petabyte-scale data-stores are increasingly common, it is unclear whether \"typical\" analytics tasks require more than a single high-end server. Additionally, we are seeing increased sophistication in analytics, e.g., machine learning, which generally operates over smaller and more refined datasets. To address these trends, we propose \"scaling down\" Hadoop to run on shared-memory machines. This paper presents a prototype runtime called Hone, intended to be both API and binary compatible with standard (distributed) Hadoop. That is, Hone can take an existing Hadoop jar and efficiently execute it, without modification, on a multi-core shared memory machine. This allows us to take existing Hadoop algorithms and find the most suitable run-time environment for execution on datasets of varying sizes. 
Our experiments show that Hone can be an order of magnitude faster than Hadoop pseudo-distributed mode (PDM); on dataset sizes that fit into memory, Hone can outperform a fully-distributed 15-node Hadoop cluster in some cases as well.<\/jats:p>","DOI":"10.14778\/2536274.2536314","type":"journal-article","created":{"date-parts":[[2014,6,24]],"date-time":"2014-06-24T12:17:57Z","timestamp":1403612277000},"page":"1354-1357","source":"Crossref","is-referenced-by-count":13,"title":["Hone"],"prefix":"10.14778","volume":"6","author":[{"given":"K. Ashwin","family":"Kumar","sequence":"first","affiliation":[{"name":"University of Maryland, College Park"}]},{"given":"Jonathan","family":"Gluck","sequence":"additional","affiliation":[{"name":"University of Maryland, College Park"}]},{"given":"Amol","family":"Deshpande","sequence":"additional","affiliation":[{"name":"University of Maryland, College Park"}]},{"given":"Jimmy","family":"Lin","sequence":"additional","affiliation":[{"name":"University of Maryland, College Park"}]}],"member":"320","published-online":{"date-parts":[[2013,8]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"VLDB","author":"Bu Y.","year":"2010","unstructured":"Y. Bu, B. Howe, M. Balazinska, and M. Ernst. HaLoop: Efficient iterative data processing on large clusters. VLDB, 2010."},{"key":"e_1_2_1_2_1","volume-title":"PACT","author":"Chen R.","year":"2010","unstructured":"R. Chen, H. Chen, and B. Zang. Tiled-MapReduce: optimizing resource usages of data-parallel applications on multicore with tiling. PACT, 2010."},{"key":"e_1_2_1_3_1","volume-title":"OSDI","author":"Dean J.","year":"2004","unstructured":"J. Dean and S. Ghemawat. 
MapReduce: Simplified data processing on large clusters. OSDI, 2004."},{"key":"e_1_2_1_4_1","volume-title":"CCGRID","author":"Jiang W.","year":"2010","unstructured":"W. Jiang, V. Ravi, and G. Agrawal. A Map-Reduce system with an alternate API for multi-core environments. CCGRID, 2010."},{"key":"e_1_2_1_6_1","volume-title":"HPCA","author":"Ranger C.","year":"2007","unstructured":"C. Ranger, R. Raghuraman, A. Penmetsa, G. Bradski, and C. Kozyrakis. Evaluating MapReduce for multi-core and multiprocessor systems. HPCA, 2007."},{"key":"e_1_2_1_7_1","volume-title":"HotCloud","author":"Rowstron A.","year":"2012","unstructured":"A. Rowstron, D. Narayanan, A. Donnelly, G. O'Shea, and A. Douglas. Nobody ever got fired for using Hadoop on a cluster. HotCloud, 2012."},{"key":"e_1_2_1_8_1","volume-title":"Aug.","author":"Shinnar A.","year":"2012","unstructured":"A. Shinnar, D. Cunningham, V. Saraswat, and B. Herta. M3R: increased performance for in-memory Hadoop jobs. VLDB, 5(12), Aug. 2012."},{"key":"e_1_2_1_9_1","volume-title":"MapReduce","author":"Talbot J.","year":"2011","unstructured":"J. Talbot, R. Yoo, and C. Kozyrakis. Phoenix++: modular MapReduce for shared-memory systems. 
MapReduce, 2011."},{"key":"e_1_2_1_10_1","volume-title":"IISWC","author":"Yoo R.","year":"2009","unstructured":"R. Yoo, A. Romano, and C. Kozyrakis. Phoenix rebirth: Scalable MapReduce on a large-scale shared-memory system. IISWC, 2009."},{"key":"e_1_2_1_11_1","volume-title":"HotCloud","author":"Zaharia M.","year":"2010","unstructured":"M. Zaharia, M. Chowdhury, M. Franklin, S. Shenker, and I. Stoica. Spark: cluster computing with working sets. HotCloud, 2010."},{"key":"e_1_2_1_12_1","volume-title":"WWW","author":"Zhai K.","year":"2012","unstructured":"K. Zhai, J. Boyd-Graber, N. Asadi, and M. Alkhouja. Mr. LDA: A flexible large scale topic modeling package using variational inference in MapReduce. 
WWW, 2012."}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/2536274.2536314","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T10:57:08Z","timestamp":1672225028000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/2536274.2536314"}},"subtitle":["\"Scaling down\" Hadoop on shared-memory systems"],"short-title":[],"issued":{"date-parts":[[2013,8]]},"references-count":11,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2013,8,28]]}},"alternative-id":["10.14778\/2536274.2536314"],"URL":"https:\/\/doi.org\/10.14778\/2536274.2536314","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2013,8]]}}}