{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,27]],"date-time":"2025-10-27T20:37:05Z","timestamp":1761597425416,"version":"3.41.0"},"reference-count":13,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2013,4,29]],"date-time":"2013-04-29T00:00:00Z","timestamp":1367193600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["SIGMETRICS Perform. Eval. Rev."],"published-print":{"date-parts":[[2013,4,29]]},"abstract":"<jats:p>MapReduce\/Hadoop framework has been widely used to process large-scale datasets on computing clusters. Scheduling map tasks to improve data locality is crucial to the performance of MapReduce. Many works have been devoted to increasing data locality for better efficiency. However, to the best of our knowledge, fundamental limits of MapReduce computing clusters with data locality, including the capacity region and throughput optimal algorithms, have not been studied. In this paper, we address these problems from a stochastic network perspective. Our focus is to strike the right balance between data-locality and load-balancing to maximize throughput. We present a new queueing architecture and propose a map task scheduling algorithm constituted by the Join the Shortest Queue policy together with the MaxWeight policy. We identify an outer bound on the capacity region, and then prove that the proposed algorithm can stabilize any arrival rate vector strictly within this outer bound. It shows that the algorithm is throughput optimal and the outer bound coincides with the actual capacity region. 
The proofs in this paper deal with random processing time with different parameters and nonpreemptive tasks, which differentiate our work from many other works, so the proof technique itself is also a contribution of this paper.<\/jats:p>","DOI":"10.1145\/2479942.2479947","type":"journal-article","created":{"date-parts":[[2013,5,1]],"date-time":"2013-05-01T19:47:09Z","timestamp":1367437629000},"page":"33-42","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":19,"title":["A throughput optimal algorithm for map task scheduling in mapreduce with data locality"],"prefix":"10.1145","volume":"40","author":[{"given":"Weina","family":"Wang","sequence":"first","affiliation":[{"name":"Arizona State University, Tempe, Arizona"}]},{"given":"Kai","family":"Zhu","sequence":"additional","affiliation":[{"name":"Arizona State University, Tempe, Arizona"}]},{"given":"Lei","family":"Ying","sequence":"additional","affiliation":[{"name":"Arizona State University, Tempe, Arizona"}]},{"given":"Jian","family":"Tan","sequence":"additional","affiliation":[{"name":"IBM T. J. Watson Research Center, Yorktown Heights, New York"}]},{"given":"Li","family":"Zhang","sequence":"additional","affiliation":[{"name":"IBM T. J. Watson Research Center, Yorktown Heights, New York"}]}],"member":"320","published-online":{"date-parts":[[2013,4,29]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"Hadoop. http:\/\/hadoop.apache.org.  Hadoop. http:\/\/hadoop.apache.org."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/1966445.1966472"},{"key":"e_1_2_1_3_1","first-page":"20","volume-title":"Proc. Conf. Networked Systems Design and Implementations (USENIX)","author":"Ananthanarayanan G.","year":"2012","unstructured":"G. Ananthanarayanan , A. Ghodsi , A. Wang , D. Borthakur , S. Kandula , S. Shenker , and I. Stoica . Pacman: coordinated memory caching for parallel jobs . In Proc. Conf. 
Networked Systems Design and Implementations (USENIX) , pages 20 -- 20 , 2012 ."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/1327452.1327492"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/945445.945450"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/1629575.1629601"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/CCGRID.2010.112"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/INFCOM.2013.6566988"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/MSST.2010.5496972"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/9.182479"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/18.212277"},{"key":"e_1_2_1_12_1","volume-title":"The definitive guide","author":"White T.","year":"2010","unstructured":"T. White. Hadoop: The definitive guide. 
Yahoo Press, 2010."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/1755913.1755940"}],"container-title":["ACM SIGMETRICS Performance Evaluation Review"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2479942.2479947","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2479942.2479947","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T08:35:28Z","timestamp":1750235728000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2479942.2479947"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2013,4,29]]},"references-count":13,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2013,4,29]]}},"alternative-id":["10.1145\/2479942.2479947"],"URL":"https:\/\/doi.org\/10.1145\/2479942.2479947","relation":{},"ISSN":["0163-5999"],"issn-type":[{"type":"print","value":"0163-5999"}],"subject":[],"published":{"date-parts":[[2013,4,29]]},"assertion":[{"value":"2013-04-29","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}