{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,1]],"date-time":"2025-11-01T16:46:35Z","timestamp":1762015595958,"version":"3.41.0"},"reference-count":14,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2016,12,3]],"date-time":"2016-12-03T00:00:00Z","timestamp":1480723200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"project Handling Uncertainty in Data Intensive Applications"},{"name":"Operational Program \u201cEducation and Lifelong Learning,\u201d under the program THALES"},{"name":"European Union (European Social Fund) and Greek national funds"},{"name":"Rita Altura Trust Chair in Computer Sciences"},{"name":"Infrastructure Research in the Field of Advanced Computing and Cyber Security"},{"DOI":"10.13039\/501100003977","name":"Israel Science Foundation","doi-asserted-by":"crossref","award":["428\/11"],"award-info":[{"award-number":["428\/11"]}],"id":[{"id":"10.13039\/501100003977","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Lynne and William Frankel Center for Computer Sciences"},{"name":"Cabarnit Cyber Security MAGNET Consortium"},{"name":"Ministry of Science and Technolog"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Knowl. Discov. Data"],"published-print":{"date-parts":[[2017,5,31]]},"abstract":"<jats:p>\n            A MapReduce algorithm can be described by a\n            <jats:italic>mapping schema<\/jats:italic>\n            , which assigns inputs to a set of reducers, such that for each required output there exists a reducer that receives all the inputs participating in the computation of this output. Reducers have a capacity that limits the sets of inputs they can be assigned. However, individual inputs may vary in terms of size. We consider, for the first time, mapping schemas where input sizes are part of the considerations and restrictions. One of the significant parameters to optimize in any MapReduce job is communication cost between the map and reduce phases. The communication cost can be optimized by minimizing the number of copies of inputs sent to the reducers. The communication cost is closely related to the number of reducers of constrained capacity that are used to accommodate appropriately the inputs, so that the requirement of how the inputs must meet in a reducer is satisfied. In this work, we consider a family of problems where it is required that each input meets with each other input in at least one reducer. We also consider a slightly different family of problems in which each input of a list,\n            <jats:italic>X<\/jats:italic>\n            , is required to meet each input of another list,\n            <jats:italic>Y<\/jats:italic>\n            , in at least one reducer. We prove that finding an optimal mapping schema for these families of problems is NP-hard, and present a bin-packing-based approximation algorithm for finding a near optimal mapping schema.\n          <\/jats:p>","DOI":"10.1145\/2987376","type":"journal-article","created":{"date-parts":[[2016,12,5]],"date-time":"2016-12-05T16:47:16Z","timestamp":1480956436000},"page":"1-35","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["Assignment Problems of Different-Sized Inputs in MapReduce"],"prefix":"10.1145","volume":"11","author":[{"given":"Foto","family":"Afrati","sequence":"first","affiliation":[{"name":"National Technical University of Athens, Zografou, Greece"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shlomi","family":"Dolev","sequence":"additional","affiliation":[{"name":"Ben-Gurion University, Beer-Sheva, Israel"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ephraim","family":"Korach","sequence":"additional","affiliation":[{"name":"Ben-Gurion University, Beer-Sheva, Israel"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shantanu","family":"Sharma","sequence":"additional","affiliation":[{"name":"Ben-Gurion University, Beer-Sheva, Israel"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jeffrey D.","family":"Ullman","sequence":"additional","affiliation":[{"name":"Stanford University, Stanford, CA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2016,12,3]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Proceedings of the 2nd Workshop on Algorithms and Systems for MapReduce and Beyond (BeyondMR). 28--37. Also appears as a Brief Announcement in International Symposium on Distributed Computing (DISC)","author":"Afrati Foto","year":"2014","unstructured":"Foto Afrati , Shlomi Dolev , Ephraim Korach , Shantanu Sharma , and Jeffrey D. Ullman . 2015. Assignment of different-sized inputs in MapReduce . In Proceedings of the 2nd Workshop on Algorithms and Systems for MapReduce and Beyond (BeyondMR). 28--37. Also appears as a Brief Announcement in International Symposium on Distributed Computing (DISC) , 2014 , and as a technical report 14-05 at Department of Computer Science, Ben-Gurion University of the Negev. Foto Afrati, Shlomi Dolev, Ephraim Korach, Shantanu Sharma, and Jeffrey D. Ullman. 2015. Assignment of different-sized inputs in MapReduce. In Proceedings of the 2nd Workshop on Algorithms and Systems for MapReduce and Beyond (BeyondMR). 28--37. Also appears as a Brief Announcement in International Symposium on Distributed Computing (DISC), 2014, and as a technical report 14-05 at Department of Computer Science, Ben-Gurion University of the Negev."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.14778\/2535570.2488334"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/2513591.2513663"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/1242572.1242591"},{"key":"e_1_2_1_5_1","unstructured":"E. G. Coffman Jr. M. R. Garey and D. S. Johnson. 1997. Approximation algorithms for bin packing: a survey. In Approximation Algorithms for NP-Hard Problems. PWS Publishing Co. 46--93.   E. G. Coffman Jr. M. R. Garey and D. S. Johnson. 1997. Approximation algorithms for bin packing: a survey. In Approximation Algorithms for NP-Hard Problems. PWS Publishing Co. 46--93."},{"key":"e_1_2_1_6_1","volume-title":"Proceedings of the 6th Symposium on Operating System Design and Implementation (OSDI\u201904)","author":"Dean Jeffrey","year":"2004","unstructured":"Jeffrey Dean and Sanjay Ghemawat . 2004 . MapReduce: Simplified data processing on large clusters . In Proceedings of the 6th Symposium on Operating System Design and Implementation (OSDI\u201904) . 137--150. Jeffrey Dean and Sanjay Ghemawat. 2004. MapReduce: Simplified data processing on large clusters. In Proceedings of the 6th Symposium on Operating System Design and Implementation (OSDI\u201904). 137--150."},{"key":"e_1_2_1_7_1","volume-title":"Johnson","author":"Garey M. R.","year":"1979","unstructured":"M. R. Garey and David S . Johnson . 1979 . Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman . M. R. Garey and David S. Johnson. 1979. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman."},{"key":"e_1_2_1_8_1","unstructured":"Michael T. Goodrich. 2010. Simulating parallel algorithms in the mapreduce framework with applications to parallel computational geometry. arXiv preprint arXiv:1004.4708.  Michael T. Goodrich. 2010. Simulating parallel algorithms in the mapreduce framework with applications to parallel computational geometry. arXiv preprint arXiv:1004.4708."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-85363-3_9"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.5555\/1873601.1873677"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/1989493.1989505"},{"key":"e_1_2_1_13_1","volume-title":"Ullman","author":"Leskovec Jure","year":"2014","unstructured":"Jure Leskovec , Anand Rajaraman , and Jeffrey D . Ullman . 2014 . Mining of Massive Datasets, 2 nd ed. Cambridge University Press . Jure Leskovec, Anand Rajaraman, and Jeffrey D. Ullman. 2014. Mining of Massive Datasets, 2nd ed. Cambridge University Press.","edition":"2"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/2304576.2304607"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/2331042.2331053"}],"container-title":["ACM Transactions on Knowledge Discovery from Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2987376","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2987376","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T21:38:37Z","timestamp":1750282717000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2987376"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016,12,3]]},"references-count":14,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2017,5,31]]}},"alternative-id":["10.1145\/2987376"],"URL":"https:\/\/doi.org\/10.1145\/2987376","relation":{},"ISSN":["1556-4681","1556-472X"],"issn-type":[{"type":"print","value":"1556-4681"},{"type":"electronic","value":"1556-472X"}],"subject":[],"published":{"date-parts":[[2016,12,3]]},"assertion":[{"value":"2015-07-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2016-08-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2016-12-03","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}