{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,7]],"date-time":"2026-05-07T15:29:15Z","timestamp":1778167755097,"version":"3.51.4"},"reference-count":72,"publisher":"Association for Computing Machinery (ACM)","issue":"11","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2024,7]]},"abstract":"<jats:p>\n            As Spark becomes a common big data analytics platform, its growing complexity makes automatic tuning of numerous parameters critical for performance. Our work on Spark parameter tuning is particularly motivated by two recent trends: Spark's\n            <jats:italic>Adaptive Query Execution<\/jats:italic>\n            (AQE) based on runtime statistics, and the increasingly popular\n            <jats:italic>Spark cloud deployments<\/jats:italic>\n            that make cost-performance reasoning crucial for the end user. This paper presents our design of\n            <jats:italic>a Spark optimizer that controls all tunable parameters of each query in the new AQE architecture to explore its performance benefits and, at the same time, casts the tuning problem in the theoretically sound multi-objective optimization (MOO) setting to better adapt to user cost-performance preferences.<\/jats:italic>\n            To this end, we propose a novel hybrid compile-time\/runtime approach to multi-granularity tuning of diverse, correlated Spark parameters, as well as a suite of modeling and optimization techniques to solve the tuning problem in the MOO setting while meeting the stringent time constraint of 1--2 seconds for cloud use. 
Evaluation results using TPC-H and TPC-DS benchmarks demonstrate the superior performance of our approach:\n            <jats:italic>(i)<\/jats:italic>\n            When prioritizing latency, it achieves 63% and 65% reduction for TPC-H and TPC-DS, respectively, under an average solving time of 0.7--0.8 sec, outperforming the most competitive MOO method that reduces only 18--25% latency with 2.6--15 sec solving time.\n            <jats:italic>(ii)<\/jats:italic>\n            When shifting preferences between latency and cost, our approach dominates the solutions of alternative methods, exhibiting superior adaptability to varying preferences.\n          <\/jats:p>","DOI":"10.14778\/3681954.3682021","type":"journal-article","created":{"date-parts":[[2024,8,30]],"date-time":"2024-08-30T16:23:36Z","timestamp":1725035016000},"page":"3565-3579","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":7,"title":["A Spark Optimizer for Adaptive, Fine-Grained Parameter Tuning"],"prefix":"10.14778","volume":"17","author":[{"given":"Chenghao","family":"Lyu","sequence":"first","affiliation":[{"name":"University of Massachusetts, Amherst"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Qi","family":"Fan","sequence":"additional","affiliation":[{"name":"Ecole Polytechnique"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Philippe","family":"Guyard","sequence":"additional","affiliation":[{"name":"Ecole Polytechnique"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yanlei","family":"Diao","sequence":"additional","affiliation":[{"name":"Ecole Polytechnique and University of Massachusetts, 
Amherst"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2024,8,30]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/2797022.2797029"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2742797"},{"key":"e_1_2_1_3_1","volume-title":"Hyracks: A flexible and extensible foundation for data-intensive computing. In ICDE. 1151--1162.","author":"Borkar Vinayak R.","year":"2011","unstructured":"Vinayak R. Borkar, Michael J. Carey, Raman Grover, Nicola Onose, and Rares Vernica. 2011. Hyracks: A flexible and extensible foundation for data-intensive computing. In ICDE. 1151--1162."},{"key":"e_1_2_1_4_1","unstructured":"Google Cloud. 2022. Dataflow Pricing. https:\/\/cloud.google.com\/dataflow\/pricing"},{"key":"e_1_2_1_5_1","volume-title":"Differentiable Expected Hypervolume Improvement for Parallel Multi-Objective Bayesian Optimization. CoRR abs\/2006.05078","author":"Daulton Samuel","year":"2020","unstructured":"Samuel Daulton, Maximilian Balandat, and Eytan Bakshy. 2020. Differentiable Expected Hypervolume Improvement for Parallel Multi-Objective Bayesian Optimization. CoRR abs\/2006.05078 (2020). arXiv:2006.05078 https:\/\/arxiv.org\/abs\/2006.05078"},{"key":"e_1_2_1_6_1","volume-title":"OSDI'04: Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation","author":"Dean Jeffrey","year":"2004","unstructured":"Jeffrey Dean and Sanjay Ghemawat. 2004. MapReduce: simplified data processing on large clusters. In OSDI'04: Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation (San Francisco, CA). 
USENIX Association, Berkeley, CA, USA, 10--10."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2735372"},{"key":"e_1_2_1_8_1","volume-title":"AAAI Workshop on Deep Learning on Graphs: Methods and Applications","author":"Dwivedi Vijay Prakash","year":"2021","unstructured":"Vijay Prakash Dwivedi and Xavier Bresson. 2021. A Generalization of Transformer Networks to Graphs. AAAI Workshop on Deep Learning on Graphs: Methods and Applications (2021)."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11047-018-9685-y"},{"key":"e_1_2_1_10_1","unstructured":"Wenchen Fan Herman van Hovell and MaryAnn Xue. 2020. Adaptive Query Execution: Speeding Up Spark SQL at Runtime. https:\/\/www.databricks.com\/blog\/2020\/05\/29\/adaptive-query-execution-speeding-up-spark-sql-at-runtime.html."},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/3394486.3403299"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.14778\/1687553.1687568"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3389741"},{"key":"e_1_2_1_14_1","volume-title":"Amar Shah, and Ryan P. Adams.","author":"Hern\u00e1ndez-Lobato Daniel","year":"2016","unstructured":"Daniel Hern\u00e1ndez-Lobato, Jos\u00e9 Miguel Hern\u00e1ndez-Lobato, Amar Shah, and Ryan P. Adams. 2016. Predictive Entropy Search for Multi-objective Bayesian Optimization. In Proceedings of the 33nd International Conference on Machine Learning, ICML 2016, New York City, NY, USA, June 19--24, 2016 (JMLR Workshop and Conference Proceedings), Maria-Florina Balcan and Kilian Q. Weinberger (Eds.), Vol. 48. JMLR.org, 1492--1501. 
http:\/\/proceedings.mlr.press\/v48\/hernandez-lobatoa16.html"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.14778\/3461535.3461545"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2021.3055019"},{"key":"e_1_2_1_17_1","volume-title":"Proceedings of the 28th International Conference on Very Large Data Bases","author":"Hulgeri Arvind","unstructured":"Arvind Hulgeri and S. Sudarshan. 2002. Parametric Query Optimization for Linear and Piecewise Linear Cost Functions. In Proceedings of the 28th International Conference on Very Large Data Bases (Hong Kong, China) (VLDB '02). VLDB Endowment, 167--178. http:\/\/dl.acm.org\/citation.cfm?id=1287369.1287385"},{"key":"e_1_2_1_18_1","volume-title":"Morpheus: Towards Automated SLOs for Enterprise Clusters. In 12th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2016","author":"Jyothi Sangeetha Abdu","year":"2016","unstructured":"Sangeetha Abdu Jyothi, Carlo Curino, Ishai Menache, Shravan Matthur Narayanamurthy, Alexey Tumanov, Jonathan Yaniv, Ruslan Mavlyutov, I\u00f1igo Goiri, Subru Krishnan, Janardhan Kulkarni, and Sriram Rao. 2016. Morpheus: Towards Automated SLOs for Enterprise Clusters. In 12th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2016, Savannah, GA, USA, November 2--4, 2016. 117--134. https:\/\/www.usenix.org\/conference\/osdi16\/technical-sessions\/presentation\/jyothi"},{"key":"e_1_2_1_19_1","volume-title":"12th USENIX Workshop on Hot Topics in Storage and File Systems, HotStorage 2020","author":"Kanellis Konstantinos","year":"2020","unstructured":"Konstantinos Kanellis, Ramnatthan Alagappan, and Shivaram Venkataraman. 2020. Too Many Knobs to Tune? Towards Faster Database Tuning by Preselecting Important Knobs. In 12th USENIX Workshop on Hot Topics in Storage and File Systems, HotStorage 2020, July 13--14, 2020, Anirudh Badam and Vijay Chidambaram (Eds.). USENIX Association. 
https:\/\/www.usenix.org\/conference\/hotstorage20\/presentation\/kanellis"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/1989323.1989355"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/3092931.3092934"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/321906.321910"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3380591"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.14778\/3461535.3461549"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.14778\/3352063.3352129"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1007\/S10619-018-7244-2"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.14778\/3611540.3611548"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE53745.2022.00195"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.14778\/3476249.3476254"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.14778\/3494124.3494127"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.14778\/3551793.3551855"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/3448016.3457562"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.14778\/2977797.2977804"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00158-003-0368-6"},{"key":"e_1_2_1_36_1","unstructured":"MaxCompute [n.d.]. Open Data Processing Service. https:\/\/www.alibabacloud.com\/product\/maxcompute."},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1080\/00401706.2000.10485979"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.2514\/2.936"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00158-002-0276-1"},{"key":"e_1_2_1_40_1","volume-title":"Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. 
Proceedings of a meeting held December 5--8, 2013","author":"Mikolov Tom\u00e1s","year":"2013","unstructured":"Tom\u00e1s Mikolov, Ilya Sutskever, Kai Chen, Gregory S. Corrado, and Jeffrey Dean. 2013. Distributed Representations of Words and Phrases and their Compositionality. In Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5--8, 2013, Lake Tahoe, Nevada, United States, Christopher J. C. Burges, L\u00e9on Bottou, Zoubin Ghahramani, and Kilian Q. Weinberger (Eds.). 3111--3119. https:\/\/proceedings.neurips.cc\/paper\/2013\/hash\/9aa42b31882ec039965f3c4923ce901b-Abstract.html"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/2517349.2522738"},{"key":"e_1_2_1_42_1","volume-title":"Intelligent Scaling in Amazon Redshift. In SIGMOD '24: International Conference on Management of Data","author":"Nathan Vikram","year":"2024","unstructured":"Vikram Nathan, Vikramank Singh, Zhengchun Liu, Mohammad Rahman, Andreas Kipf, Dominik Horn, Davide Pagano, Balakrishnan Narayanaswamy Gaurav Saxena, and Tim Kraska. [n.d.]. Intelligent Scaling in Amazon Redshift. In SIGMOD '24: International Conference on Management of Data, Philadelphia, 2024. ACM, 1--. 
To appear."},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.14778\/3476249.3476259"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/3448016.3452821"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/2987550.2987566"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/2988336.2988337"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE51399.2021.00041"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/3448016.3452790"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/3589769"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.14778\/2977797.2977799"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.14778\/1687553.1687609"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1145\/2588555.2610527"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.14778\/2735508.2735512"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2746484"},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1145\/3035918.3064029"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1145\/2523616.2523633"},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.14778\/3485450.3485458"},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.14778\/3484224.3484236"},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.14778\/3476311.3476327"},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1145\/3448016.3452830"},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","unstructured":"Ziniu Wu Amir Shaikhha Rong Zhu Kai Zeng Yuxing Han and Jingren Zhou. 2020. BayesCard: Revitilizing Bayesian Frameworks for Cardinality Estimation. 
10.48550\/ARXIV.2012.14743","DOI":"10.48550\/ARXIV.2012.14743"},{"key":"e_1_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.1145\/3514221.3526157"},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1145\/2463676.2465288"},{"key":"e_1_2_1_64_1","volume-title":"Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation","author":"Zaharia Matei","year":"2012","unstructured":"Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael J. Franklin, Scott Shenker, and Ion Stoica. 2012. Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. In Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation (San Jose, CA) (NSDI'12). USENIX Association, Berkeley, CA, USA, 2--2. http:\/\/dl.acm.org\/citation.cfm?id=2228298.2228301"},{"key":"e_1_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.14778\/3352063.3352103"},{"key":"e_1_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.1145\/3299869.3300085"},{"key":"e_1_2_1_67_1","doi-asserted-by":"publisher","DOI":"10.1145\/3448016.3457291"},{"key":"e_1_2_1_68_1","doi-asserted-by":"publisher","DOI":"10.1145\/3514221.3526176"},{"key":"e_1_2_1_69_1","doi-asserted-by":"publisher","DOI":"10.14778\/2733004.2733012"},{"key":"e_1_2_1_70_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-012-0280-z"},{"key":"e_1_2_1_71_1","doi-asserted-by":"publisher","DOI":"10.14778\/3461535.3461539"},{"key":"e_1_2_1_72_1","volume-title":"ClassyTune: A Performance Auto-Tuner for Systems in the Cloud","author":"Zhu Yuqing","year":"2019","unstructured":"Yuqing Zhu and Jianxun Liu. 2019. ClassyTune: A Performance Auto-Tuner for Systems in the Cloud. 
IEEE Transactions on Cloud Computing (2019), 1--1."},{"key":"e_1_2_1_73_1","doi-asserted-by":"publisher","DOI":"10.1145\/3127479.3128605"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3681954.3682021","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,9,4]],"date-time":"2024-09-04T18:28:13Z","timestamp":1725474493000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3681954.3682021"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,7]]},"references-count":72,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2024,7]]}},"alternative-id":["10.14778\/3681954.3682021"],"URL":"https:\/\/doi.org\/10.14778\/3681954.3682021","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2024,7]]},"assertion":[{"value":"2024-08-30","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}