{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,2]],"date-time":"2026-03-02T22:29:45Z","timestamp":1772490585458,"version":"3.50.1"},"reference-count":22,"publisher":"Association for Computing Machinery (ACM)","issue":"2","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2015,10]]},"abstract":"<jats:p>In the era of global-scale services, big data analytical queries are often required to process datasets that span multiple data centers (DCs). In this setting, cross-DC bandwidth is often the scarcest, most volatile, and\/or most expensive resource. However, current widely deployed big data analytics frameworks make no attempt to minimize the traffic traversing these links.<\/jats:p>\n          <jats:p>\n            In this paper, we present P\n            <jats:sc>ixida<\/jats:sc>\n            , a scheduler that aims to minimize data movement across resource constrained links. To achieve this, we introduce a new abstraction called S\n            <jats:sc>ilo<\/jats:sc>\n            , which is key to modeling P\n            <jats:sc>ixida<\/jats:sc>\n            's scheduling goals as a graph partitioning problem. Furthermore, we show that existing graph partitioning problem formulations do not map to how big data jobs work, causing their solutions to miss opportunities for avoiding data movement. To address this, we formulate a new graph partitioning problem and propose a novel algorithm to solve it. 
We integrated P\n            <jats:sc>ixida<\/jats:sc>\n            in Spark and our experiments show that, when compared to existing schedulers, P\n            <jats:sc>ixida<\/jats:sc>\n            achieves a significant traffic reduction of up to ~ 9x on the aforementioned links.\n          <\/jats:p>","DOI":"10.14778\/2850578.2850582","type":"journal-article","created":{"date-parts":[[2016,2,1]],"date-time":"2016-02-01T14:10:31Z","timestamp":1454335831000},"page":"72-83","source":"Crossref","is-referenced-by-count":69,"title":["Pixida"],"prefix":"10.14778","volume":"9","author":[{"given":"Konstantinos","family":"Kloudas","sequence":"first","affiliation":[{"name":"University of Lisbon"}]},{"given":"Margarida","family":"Mamede","sequence":"additional","affiliation":[{"name":"Universidade NOVA de Lisboa"}]},{"given":"Nuno","family":"Pregui\u00e7a","sequence":"additional","affiliation":[{"name":"Universidade NOVA de Lisboa"}]},{"given":"Rodrigo","family":"Rodrigues","sequence":"additional","affiliation":[{"name":"University of Lisbon"}]}],"member":"320","published-online":{"date-parts":[[2015,10]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"Amazon EC2 Pricing. http:\/\/aws.amazon.com\/ec2\/pricing\/."},{"key":"e_1_2_1_2_1","first-page":"281","volume-title":"Proceedings of the 9th USENIX Symposium on Networked Systems Design and Implementation, NSDI'12","author":"Agarwal S.","year":"2012","unstructured":"S. Agarwal, S. Kandula, N. Bruno, M.-C. Wu, I. Stoica, and J. Zhou. Re-optimizing Data-parallel Computing. 
In Proceedings of the 9th USENIX Symposium on Networked Systems Design and Implementation, NSDI'12, pages 281--294, San Jose, CA, USA, 2012. USENIX Association."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.14778\/2536222.2536229"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/2463676.2465272"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/1966445.1966449"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/2213836.2213959"},{"key":"e_1_2_1_7_1","first-page":"251","volume-title":"Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation, OSDI'12","author":"Corbett J. C.","year":"2012","unstructured":"J. C. Corbett, J. Dean, M. Epstein, A. Fikes, C. Frost, J. Furman, S. Ghemawat, A. Gubarev, C. Heiser, P. Hochschild, W. Hsieh, S. Kanthak, E. Kogan, H. Li, A. Lloyd, S. Melnik, D. Mwaura, D. Nagle, S. Quinlan, R. Rao, L. Rolig, Y. Saito, M. Szymaniak, C. Taylor, R. Wang, and D. Woodford. Spanner: Google's Globally-Distributed Database. In Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation, OSDI'12, pages 251--264, Hollywood, CA, 2012. USENIX Association."},{"key":"e_1_2_1_8_1","volume-title":"Introduction to Algorithms","author":"Cormen T. H.","year":"2009","unstructured":"T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to Algorithms. 
McGraw-Hill Higher Education, 3rd edition, 2009.","edition":"3"},{"key":"e_1_2_1_9_1","first-page":"137","volume-title":"Proceedings of the 6th USENIX Symposium on Operating Systems Design and Implementation, OSDI'04","author":"Dean J.","year":"2004","unstructured":"J. Dean and S. Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In Proceedings of the 6th USENIX Symposium on Operating Systems Design and Implementation, OSDI'04, pages 137--149, San Francisco, CA, USA, 2004. USENIX Association."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.5555\/2627817.2627840"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/2168836.2168847"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.14778\/2732977.2732999"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2006.105"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/2785956.2787505"},{"key":"e_1_2_1_16_1","first-page":"275","volume-title":"Proceedings of the 11th USENIX Symposium on Networked Systems Design and Implementation, NSDI'14","author":"Rabkin A.","year":"2014","unstructured":"A. Rabkin, M. Arye, S. Sen, V. S. Pai, and M. J. Freedman. Aggregation and Degradation in JetStream: Streaming Analytics in the Wide Area. 
In Proceedings of the 11th USENIX Symposium on Networked Systems Design and Implementation, NSDI'14, pages 275--288, Seattle, WA, USA, 2014. USENIX Association."},{"key":"e_1_2_1_17_1","volume-title":"Approximation Algorithms","author":"Vazirani V. V.","year":"2001","unstructured":"V. V. Vazirani. Approximation Algorithms. Springer-Verlag New York, Inc., New York, NY, USA, 2001."},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2735365"},{"key":"e_1_2_1_19_1","first-page":"323","volume-title":"Proceedings of the 12th USENIX Symposium on Networked Systems Design and Implementation, NSDI'15","author":"Vulimiri A.","year":"2015","unstructured":"A. Vulimiri, C. Curino, P. B. Godfrey, T. Jungblut, J. Padhye, and G. Varghese. Global Analytics in the Face of Bandwidth and Regulatory Constraints. In Proceedings of the 12th USENIX Symposium on Networked Systems Design and Implementation, NSDI'15, pages 323--336, Oakland, CA, USA, 2015. USENIX Association."},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/2517349.2522730"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/1755913.1755940"},{"key":"e_1_2_1_22_1","first-page":"15","volume-title":"Proceedings of the 9th USENIX Symposium on Networked Systems Design and Implementation, NSDI'12","author":"Zaharia M.","year":"2012","unstructured":"M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M. J. Franklin, S. Shenker, and I. Stoica. Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing. 
In Proceedings of the 9th USENIX Symposium on Networked Systems Design and Implementation, NSDI'12, pages 15--28, San Jose, CA, USA, 2012. USENIX."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/2517349.2522737"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/2850578.2850582","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T10:16:52Z","timestamp":1672222612000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/2850578.2850582"}},"subtitle":["optimizing data parallel jobs in wide-area data analytics"],"short-title":[],"issued":{"date-parts":[[2015,10]]},"references-count":22,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2015,10]]}},"alternative-id":["10.14778\/2850578.2850582"],"URL":"https:\/\/doi.org\/10.14778\/2850578.2850582","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2015,10]]}}}