{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,8]],"date-time":"2026-01-08T06:06:30Z","timestamp":1767852390203,"version":"3.49.0"},"reference-count":69,"publisher":"Association for Computing Machinery (ACM)","issue":"11","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2022,7]]},"abstract":"<jats:p>Big data processing at the production scale presents a highly complex environment for resource optimization (RO), a problem crucial for meeting performance goals and budgetary constraints of analytical users. The RO problem is challenging because it involves a set of decisions (the partition count, placement of parallel instances on machines, and resource allocation to each instance), requires multi-objective optimization (MOO), and is compounded by the scale and complexity of big data systems while having to meet stringent time constraints for scheduling. This paper presents a MaxCompute based integrated system to support multi-objective resource optimization via fine-grained instance-level modeling and optimization. We propose a new architecture that breaks RO into a series of simpler problems, new fine-grained predictive models, and novel optimization methods that exploit these models to make effective instance-level RO decisions well under a second. Evaluation using production workloads shows that our new RO system could reduce 37--72% latency and 43--78% cost at the same time, compared to the current optimizer and scheduler, while running in 0.02-0.23s.<\/jats:p>","DOI":"10.14778\/3551793.3551855","type":"journal-article","created":{"date-parts":[[2022,9,29]],"date-time":"2022-09-29T22:25:03Z","timestamp":1664490303000},"page":"3098-3111","source":"Crossref","is-referenced-by-count":11,"title":["Fine-grained modeling and optimization for intelligent resource management in big data processing"],"prefix":"10.14778","volume":"15","author":[{"given":"Chenghao","family":"Lyu","sequence":"first","affiliation":[{"name":"University of Massachusetts, Amherst"}]},{"given":"Qi","family":"Fan","sequence":"additional","affiliation":[{"name":"Ecole Polytechnique"}]},{"given":"Fei","family":"Song","sequence":"additional","affiliation":[{"name":"Ecole Polytechnique"}]},{"given":"Arnab","family":"Sinha","sequence":"additional","affiliation":[{"name":"Ecole Polytechnique"}]},{"given":"Yanlei","family":"Diao","sequence":"additional","affiliation":[{"name":"University of Massachusetts, Amherst and Ecole Polytechnique"}]},{"given":"Wei","family":"Chen","sequence":"additional","affiliation":[{"name":"Alibaba Group"}]},{"given":"Li","family":"Ma","sequence":"additional","affiliation":[{"name":"Alibaba Group"}]},{"given":"Yihui","family":"Feng","sequence":"additional","affiliation":[{"name":"Alibaba Group"}]},{"given":"Yaliang","family":"Li","sequence":"additional","affiliation":[{"name":"Alibaba Group"}]},{"given":"Kai","family":"Zeng","sequence":"additional","affiliation":[{"name":"Alibaba Group"}]},{"given":"Jingren","family":"Zhou","sequence":"additional","affiliation":[{"name":"Alibaba Group"}]}],"member":"320","published-online":{"date-parts":[[2022,9,29]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1201\/b17320"},{"key":"e_1_2_1_2_1","volume-title":"Towards Plan-aware Resource Allocation in Serverless Query Processing. In 12th USENIX Workshop on Hot Topics in Cloud Computing, HotCloud 2020","author":"Bag Malay","year":"2020","unstructured":"Malay Bag , Alekh Jindal , and Hiren Patel . 2020 . Towards Plan-aware Resource Allocation in Serverless Query Processing. In 12th USENIX Workshop on Hot Topics in Cloud Computing, HotCloud 2020 , July 13 --14 , 2020, Amar Phanishayee and Ryan Stutsman (Eds.). USENIX Association. https:\/\/www.usenix.org\/conference\/hotcloud20\/presentation\/bag Malay Bag, Alekh Jindal, and Hiren Patel. 2020. Towards Plan-aware Resource Allocation in Serverless Query Processing. In 12th USENIX Workshop on Hot Topics in Cloud Computing, HotCloud 2020, July 13--14, 2020, Amar Phanishayee and Ryan Stutsman (Eds.). USENIX Association. https:\/\/www.usenix.org\/conference\/hotcloud20\/presentation\/bag"},{"key":"e_1_2_1_3_1","volume-title":"Hyracks: A flexible and extensible foundation for data-intensive computing. In ICDE. 1151--1162.","author":"Borkar Vinayak R.","year":"2011","unstructured":"Vinayak R. Borkar , Michael J. Carey , Raman Grover , Nicola Onose , and Rares Vernica . 2011 . Hyracks: A flexible and extensible foundation for data-intensive computing. In ICDE. 1151--1162. Vinayak R. Borkar, Michael J. Carey, Raman Grover, Nicola Onose, and Rares Vernica. 2011. Hyracks: A flexible and extensible foundation for data-intensive computing. In ICDE. 1151--1162."},{"key":"e_1_2_1_4_1","volume-title":"Introduction to Algorithms","author":"Cormen Thomas H.","unstructured":"Thomas H. Cormen , Charles E. Leiserson , Ronald L. Rivest , and Clifford Stein . 2009. Introduction to Algorithms , Third Edition (3 rd ed.). The MIT Press . Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. 2009. Introduction to Algorithms, Third Edition (3rd ed.). The MIT Press.","edition":"3"},{"key":"e_1_2_1_5_1","volume-title":"Differentiable Expected Hypervolume Improvement for Parallel Multi-Objective Bayesian Optimization. CoRR abs\/2006.05078","author":"Daulton Samuel","year":"2020","unstructured":"Samuel Daulton , Maximilian Balandat , and Eytan Bakshy . 2020. Differentiable Expected Hypervolume Improvement for Parallel Multi-Objective Bayesian Optimization. CoRR abs\/2006.05078 ( 2020 ). arXiv:2006.05078 https:\/\/arxiv.org\/abs\/2006.05078 Samuel Daulton, Maximilian Balandat, and Eytan Bakshy. 2020. Differentiable Expected Hypervolume Improvement for Parallel Multi-Objective Bayesian Optimization. CoRR abs\/2006.05078 (2020). arXiv:2006.05078 https:\/\/arxiv.org\/abs\/2006.05078"},{"key":"e_1_2_1_6_1","volume-title":"OSDI'04: Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation","author":"Dean Jeffrey","year":"2004","unstructured":"Jeffrey Dean and Sanjay Ghemawat . 2004 . MapReduce: simplified data processing on large clusters . In OSDI'04: Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation ( San Francisco, CA). USENIX Association, Berkeley, CA, USA, 10--10. Jeffrey Dean and Sanjay Ghemawat. 2004. MapReduce: simplified data processing on large clusters. In OSDI'04: Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation (San Francisco, CA). USENIX Association, Berkeley, CA, USA, 10--10."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2735372"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11047-018-9685-y"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.14778\/1687553.1687568"},{"key":"e_1_2_1_10_1","first-page":"19","article-title":"The Cascades Framework for Query Optimization","volume":"18","author":"Graefe Goetz","year":"1995","unstructured":"Goetz Graefe . 1995 . The Cascades Framework for Query Optimization . IEEE Data Eng. Bull. 18 , 3 (1995), 19 -- 29 . http:\/\/sites.computer.org\/debull\/95SEP-CD.pdf Goetz Graefe. 1995. The Cascades Framework for Query Optimization. IEEE Data Eng. Bull. 18, 3 (1995), 19--29. http:\/\/sites.computer.org\/debull\/95SEP-CD.pdf","journal-title":"IEEE Data Eng. Bull."},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3389741"},{"key":"e_1_2_1_12_1","volume-title":"Amar Shah, and Ryan P. Adams.","author":"Hern\u00e1ndez-Lobato Daniel","year":"2016","unstructured":"Daniel Hern\u00e1ndez-Lobato , Jos\u00e9 Miguel Hern\u00e1ndez-Lobato , Amar Shah, and Ryan P. Adams. 2016 . Predictive Entropy Search for Multi-objective Bayesian Optimization. In Proceedings of the 33nd International Conference on Machine Learning, ICML 2016, New York City, NY, USA, June 19--24, 2016 (JMLR Workshop and Conference Proceedings), Maria-Florina Balcan and Kilian Q. Weinberger (Eds.), Vol. 48 . JMLR. org, 1492--1501. http:\/\/proceedings.mlr.press\/v48\/hernandez-lobatoa16.html Daniel Hern\u00e1ndez-Lobato, Jos\u00e9 Miguel Hern\u00e1ndez-Lobato, Amar Shah, and Ryan P. Adams. 2016. Predictive Entropy Search for Multi-objective Bayesian Optimization. In Proceedings of the 33nd International Conference on Machine Learning, ICML 2016, New York City, NY, USA, June 19--24, 2016 (JMLR Workshop and Conference Proceedings), Maria-Florina Balcan and Kilian Q. Weinberger (Eds.), Vol.48. JMLR.org, 1492--1501. http:\/\/proceedings.mlr.press\/v48\/hernandez-lobatoa16.html"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.14778\/3461535.3461545"},{"key":"e_1_2_1_14_1","volume-title":"Proceedings of the 28th International Conference on Very Large Data Bases","author":"Hulgeri Arvind","unstructured":"Arvind Hulgeri and S. Sudarshan . 2002. Parametric Query Optimization for Linear and Piecewise Linear Cost Functions . In Proceedings of the 28th International Conference on Very Large Data Bases ( Hong Kong, China) (VLDB '02). VLDB Endowment, 167--178. http:\/\/dl.acm.org\/citation.cfm?id=1287369.1287385 Arvind Hulgeri and S. Sudarshan. 2002. Parametric Query Optimization for Linear and Piecewise Linear Cost Functions. In Proceedings of the 28th International Conference on Very Large Data Bases (Hong Kong, China) (VLDB '02). VLDB Endowment, 167--178. http:\/\/dl.acm.org\/citation.cfm?id=1287369.1287385"},{"key":"e_1_2_1_15_1","volume-title":"Morpheus: Towards Automated SLOs for Enterprise Clusters. In 12th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2016","author":"Jyothi Sangeetha Abdu","year":"2016","unstructured":"Sangeetha Abdu Jyothi , Carlo Curino , Ishai Menache , Shravan Matthur Narayanamurthy , Alexey Tumanov , Jonathan Yaniv , Ruslan Mavlyutov , I\u00f1igo Goiri , Subru Krishnan , Janardhan Kulkarni , and Sriram Rao . 2016 . Morpheus: Towards Automated SLOs for Enterprise Clusters. In 12th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2016 , Savannah, GA, USA, November 2--4 , 2016. 117--134. https:\/\/www.usenix.org\/conference\/osdi16\/technical-sessions\/presentation\/jyothi Sangeetha Abdu Jyothi, Carlo Curino, Ishai Menache, Shravan Matthur Narayanamurthy, Alexey Tumanov, Jonathan Yaniv, Ruslan Mavlyutov, I\u00f1igo Goiri, Subru Krishnan, Janardhan Kulkarni, and Sriram Rao. 2016. Morpheus: Towards Automated SLOs for Enterprise Clusters. In 12th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2016, Savannah, GA, USA, November 2--4, 2016. 117--134. https:\/\/www.usenix.org\/conference\/osdi16\/technical-sessions\/presentation\/jyothi"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/1989323.1989355"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.14778\/3461535.3461549"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.14778\/2735461.2735464"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.14778\/3184470.3184474"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.14778\/3476249.3476254"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.14778\/3494124.3494127"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2207.02026"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/3448016.3457276"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.14778\/2977797.2977804"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.14778\/3342263.3342644"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.14778\/3342263.3342646"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00158-003-0368-6"},{"key":"e_1_2_1_28_1","unstructured":"MaxCompute [n.d.]. Open Data Processing Service. https:\/\/www.alibabacloud.com\/product\/maxcompute.  MaxCompute [n.d.]. Open Data Processing Service. https:\/\/www.alibabacloud.com\/product\/maxcompute."},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.2514\/2.936"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00158-002-0276-1"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/2517349.2522738"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.14778\/3476249.3476259"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/3448016.3452821"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/2987550.2987566"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3380584"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE51399.2021.00041"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.14778\/3368289.3368296"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/3448016.3452790"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/p15-1150"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.14778\/2977797.2977799"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.14778\/1687553.1687609"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/2588555.2610527"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.14778\/2735508.2735512"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2746484"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.14778\/3494124.3494126"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/3035918.3064029"},{"key":"e_1_2_1_47_1","unstructured":"M. van Steen and A.S. Tanenbaum. 2017. Distributed Systems (3 ed.).  M. van Steen and A.S. Tanenbaum. 2017. Distributed Systems (3 ed.)."},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/2523616.2523633"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2018.00156"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.14778\/3485450.3485458"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.14778\/3476311.3476327"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.14778\/3291264.3291267"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/3448016.3452830"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2012.14743"},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1145\/2463676.2465288"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.14778\/3421424.3421432"},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.14778\/3368289.3368294"},{"key":"e_1_2_1_58_1","volume-title":"Graph Transformer Networks. In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019","author":"Yun Seongjun","year":"2019","unstructured":"Seongjun Yun , Minbyul Jeong , Raehyun Kim , Jaewoo Kang , and Hyunwoo J. Kim . 2019 . Graph Transformer Networks. In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019 , NeurIPS 2019 , December 8--14, 2019, Vancouver, BC, Canada, Hanna M. Wallach, Hugo Larochelle, Alina Beygelzimer, Florence d'Alch\u00e9-Buc, Emily B. Fox, and Roman Garnett (Eds.). 11960--11970. https:\/\/proceedings.neurips.cc\/paper\/2019\/hash\/9d63484abb477c97640154d40595a3bb-Abstract.html Seongjun Yun, Minbyul Jeong, Raehyun Kim, Jaewoo Kang, and Hyunwoo J. Kim. 2019. Graph Transformer Networks. In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8--14, 2019, Vancouver, BC, Canada, Hanna M. Wallach, Hugo Larochelle, Alina Beygelzimer, Florence d'Alch\u00e9-Buc, Emily B. Fox, and Roman Garnett (Eds.). 11960--11970. https:\/\/proceedings.neurips.cc\/paper\/2019\/hash\/9d63484abb477c97640154d40595a3bb-Abstract.html"},{"key":"e_1_2_1_59_1","volume-title":"Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation","author":"Zaharia Matei","year":"2012","unstructured":"Matei Zaharia , Mosharaf Chowdhury , Tathagata Das , Ankur Dave , Justin Ma , Murphy McCauley , Michael J. Franklin , Scott Shenker , and Ion Stoica . 2012 . Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing . In Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation ( San Jose, CA) (NSDI'12). USENIX Association, Berkeley, CA, USA, 2--2. http:\/\/dl.acm.org\/citation.cfm?id=2228298.2228301 Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael J. Franklin, Scott Shenker, and Ion Stoica. 2012. Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. In Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation (San Jose, CA) (NSDI'12). USENIX Association, Berkeley, CA, USA, 2--2. http:\/\/dl.acm.org\/citation.cfm?id=2228298.2228301"},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.14778\/3352063.3352103"},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1145\/3299869.3300085"},{"key":"e_1_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.1145\/3448016.3457291"},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.14778\/2733004.2733012"},{"key":"e_1_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-012-0280-z"},{"key":"e_1_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.14778\/3397230.3397238"},{"key":"e_1_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.14778\/3461535.3461539"},{"key":"e_1_2_1_67_1","doi-asserted-by":"publisher","DOI":"10.14778\/3476249.3476298"},{"key":"e_1_2_1_68_1","volume-title":"ClassyTune: A Performance Auto-Tuner for Systems in the Cloud","author":"Zhu Yuqing","year":"2019","unstructured":"Yuqing Zhu and Jianxun Liu . 2019. ClassyTune: A Performance Auto-Tuner for Systems in the Cloud . IEEE Transactions on Cloud Computing ( 2019 ), 1--1. Yuqing Zhu and Jianxun Liu. 2019. ClassyTune: A Performance Auto-Tuner for Systems in the Cloud. IEEE Transactions on Cloud Computing (2019), 1--1."},{"key":"e_1_2_1_69_1","doi-asserted-by":"publisher","DOI":"10.1145\/3127479.3128605"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3551793.3551855","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T10:51:40Z","timestamp":1672224700000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3551793.3551855"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,7]]},"references-count":69,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2022,7]]}},"alternative-id":["10.14778\/3551793.3551855"],"URL":"https:\/\/doi.org\/10.14778\/3551793.3551855","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2022,7]]}}}