{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,22]],"date-time":"2025-08-22T05:16:23Z","timestamp":1755839783693,"version":"3.32.0"},"reference-count":36,"publisher":"Association for Computing Machinery (ACM)","issue":"12","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2024,8]]},"abstract":"<jats:p>\n            At internet scale companies like ByteDance, data is generated and consumed at enormously high speed by many different applications. Achieving low latency on such big data jobs is an important problem. However, the naive approach of aggregating all the data required by a job to a single location is not always feasible in a geo-distributed environment. Similarly, existing approaches in geo-distributed job scheduling often try to minimize WAN usage, which may come at the cost of latency. Another crucial element to ensure low latency is resource load balancing among DCs, which enables flexibility in job scheduling and avoids resource bottlenecks. Therefore, to minimize latency, optimizing job completion time (JCT) while maintaining resource utilization balance is important. To this end, we propose\n            <jats:italic>ResLake<\/jats:italic>\n            , a global scheduling platform for data-intensive workloads. ResLake aims to reduce JCT of geo-distributed applications while balancing the compute (CPU\/Memory) and storage (Disk) usages across DCs and efficiently using WAN interconnections. We have deployed ResLake in ByteDance's production for over 1.5 years. ResLake has scheduled billions of jobs since its deployment. 
We find that ResLake improves JCT of jobs by at least 20%, and can improve resource utilization balance across DCs by up to 53%.\n          <\/jats:p>","DOI":"10.14778\/3685800.3685817","type":"journal-article","created":{"date-parts":[[2024,11,8]],"date-time":"2024-11-08T17:25:21Z","timestamp":1731086721000},"page":"3934-3946","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["<i>ResLake<\/i>\n            : Towards Minimum Job Latency and Balanced Resource Utilization in Geo-Distributed Job Scheduling"],"prefix":"10.14778","volume":"17","author":[{"given":"Xinchun","family":"Zhang","sequence":"first","affiliation":[{"name":"ByteDance"}]},{"given":"Aqsa","family":"Kashaf","sequence":"additional","affiliation":[{"name":"ByteDance"}]},{"given":"Yihan","family":"Zou","sequence":"additional","affiliation":[{"name":"ByteDance"}]},{"given":"Wei","family":"Zhang","sequence":"additional","affiliation":[{"name":"ByteDance"}]},{"given":"Weibo","family":"Liao","sequence":"additional","affiliation":[{"name":"ByteDance"}]},{"given":"Haoxiang","family":"Song","sequence":"additional","affiliation":[{"name":"ByteDance"}]},{"given":"Jintao","family":"Ye","sequence":"additional","affiliation":[{"name":"ByteDance"}]},{"given":"Yakun","family":"Li","sequence":"additional","affiliation":[{"name":"ByteDance"}]},{"given":"Rui","family":"Shi","sequence":"additional","affiliation":[{"name":"ByteDance"}]},{"given":"Yong","family":"Tian","sequence":"additional","affiliation":[{"name":"ByteDance"}]},{"given":"Wei","family":"Feng","sequence":"additional","affiliation":[{"name":"ByteDance"}]},{"given":"Binbin","family":"Chen","sequence":"additional","affiliation":[{"name":"ByteDance"}]},{"given":"Zuzhi","family":"Chen","sequence":"additional","affiliation":[{"name":"ByteDance"}]},{"given":"Tieying","family":"Zhang","sequence":"additional","affiliation":[{"name":"ByteDance"}]},{"given":"Yongping","family":"Tang","sequence":"addi
tional","affiliation":[{"name":"ByteDance"}]}],"member":"320","published-online":{"date-parts":[[2024,11,8]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Volley: Automated data placement for geo-distributed cloud services. In NSDI. USENIX, 2.","author":"Agarwal Sharad","year":"2010","unstructured":"Sharad Agarwal, John Dunagan, Navendu Jain, Stefan Saroiu, Alec Wolman, and Habinder Bhogan. 2010. Volley: Automated data placement for geo-distributed cloud services. In NSDI. USENIX, 2."},{"key":"e_1_2_1_2_1","volume-title":"13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18)","author":"Annamalai Muthukaruppan","year":"2018","unstructured":"Muthukaruppan Annamalai, Kaushik Ravichandran, Harish Srinivas, Igor Zinkovsky, Luning Pan, Tony Savor, David Nagle, and Michael Stumm. 2018. Sharding the shards: managing datastore locality at scale with Akkio. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18). USENIX, 445--460."},{"key":"e_1_2_1_3_1","unstructured":"Apache Software Foundation. [n. d.]. Hadoop. https:\/\/hadoop.apache.org"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.5555\/2685048.2685077"},{"volume-title":"Convex analysis and optimization","author":"Bertsekas Dimitri","key":"e_1_2_1_5_1","unstructured":"Dimitri Bertsekas, Angelia Nedic, and Asuman Ozdaglar. 2003. Convex analysis and optimization. Vol. 1. Athena Scientific."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/3445814.3446760"},{"key":"e_1_2_1_7_1","first-page":"28","article-title":"Apache flink: Stream and batch processing in a single engine","volume":"38","author":"Carbone Paris","year":"2015","unstructured":"Paris Carbone, Asterios Katsifodimos, Stephan Ewen, Volker Markl, Seif Haridi, and Kostas Tzoumas. 2015. Apache flink: Stream and batch processing in a single engine. 
The Bulletin of the Technical Committee on Data Engineering 38, 4 (2015), 28--38.","journal-title":"The Bulletin of the Technical Committee on Data Engineering"},{"key":"e_1_2_1_8_1","volume-title":"Data Availability and Durability with the Hadoop Distributed File System. login Usenix Mag. 37","author":"Chansler Robert J.","year":"2012","unstructured":"Robert J. Chansler. 2012. Data Availability and Durability with the Hadoop Distributed File System. login Usenix Mag. 37 (2012). https:\/\/api.semanticscholar.org\/CorpusID:2146015"},{"key":"e_1_2_1_9_1","volume-title":"Yang Zhang, and Samuel R Madden.","author":"Curino Carlo","year":"2010","unstructured":"Carlo Curino, Evan Philip Charles Jones, Yang Zhang, and Samuel R Madden. 2010. Schism: a workload-driven approach to database replication and partitioning. (2010), 48--57."},{"volume-title":"Multiobjective Optimization Using Evolutionary Algorithms","author":"Deb Kalyan","key":"e_1_2_1_10_1","unstructured":"Kalyan Deb. 2001. Multiobjective Optimization Using Evolutionary Algorithms. Wiley, New York. Wiley."},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.5555\/645496.658058"},{"key":"e_1_2_1_12_1","unstructured":"IBM Documentation. 2024. IBM Storage Ceph - Edge clusters Erasure-coding. 
https:\/\/www.ibm.com\/docs\/en\/storage-ceph\/7?topic=components-erasure-coding"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/2486001.2486012"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/INFOCOM.2016.7524469"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.14778\/3352063.3352132"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/3190508.3190528"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/1272996.1273005"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/2534169.2486019"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/2829988.2787488"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/3342195.3387551"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/3419394.3423664"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00158-005-0557-6"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2016.2626285"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/CloudCom.2016.0032"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.5555\/2789770.2789791"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/2829988.2787505"},{"key":"e_1_2_1_27_1","volume-title":"Nick Feamster, Renata Teixeira, Sam Crawford, and Antonio Pescap\u00e8.","author":"Sundaresan Srikanth","year":"2011","unstructured":"Srikanth Sundaresan, Walter De Donato, Nick Feamster, Renata Teixeira, Sam Crawford, and Antonio Pescap\u00e8. 2011. Broadband internet performance: a view from the gateway. 
ACM SIGCOMM computer communication review 41, 4 (2011), 134--145."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPSW.2013.95"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/2523616.2523633"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2735365"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.future.2012.09.001"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2018.2866993"},{"volume-title":"Integer and combinatorial optimization","author":"Wolsey Laurence A","key":"e_1_2_1_33_1","unstructured":"Laurence A Wolsey and George L Nemhauser. 2014. Integer and combinatorial optimization. John Wiley & Sons."},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/3620678.3624663"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10878-023-01051-4"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/2934664"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3685800.3685817","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,12,31]],"date-time":"2024-12-31T05:35:39Z","timestamp":1735623339000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3685800.3685817"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,8]]},"references-count":36,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2024,8]]}},"alternative-id":["10.14778\/3685800.3685817"],"URL":"https:\/\/doi.org\/10.14778\/3685800.3685817","relation":{},"ISSN":["2150-8097"],"issn-type":[{"type":"print","value":"2150-8097"}],"subject":[],"published":{"date-parts":[[2024,8]]},"assertion":[{"value":"2024-11-08","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}