{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,21]],"date-time":"2025-08-21T16:53:15Z","timestamp":1755795195047,"version":"3.44.0"},"reference-count":39,"publisher":"Oxford University Press (OUP)","issue":"8","license":[{"start":{"date-parts":[[2025,2,27]],"date-time":"2025-02-27T00:00:00Z","timestamp":1740614400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/pages\/standard-publication-reuse-rights"}],"funder":[{"name":"3rd Xinjiang Scientific Expedition Program","award":["2021xjkk1300"],"award-info":[{"award-number":["2021xjkk1300"]}]},{"name":"Shenzhen Science and Technology Plan Project","award":["SGDX20220530111001003"],"award-info":[{"award-number":["SGDX20220530111001003"]}]},{"name":"Shenzhen Science and Technology Program","award":["CJGJZD20230724093659004"],"award-info":[{"award-number":["CJGJZD20230724093659004"]}]},{"name":"SIAT-Suntang Big Data&AI Joint Innovation Laboratory","award":["E3Z092"],"award-info":[{"award-number":["E3Z092"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,8,14]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Moving data warehouses (DWs) to the cloud is what today\u2019s companies consider a trend towards cost-effective data management. To fully achieve the goal, the cloud DW system is supposed to adjust its resource provisioning to adapt to changing workload requirements. However, traditional data warehousing architecture lacks the flexibility for on-demand resource control, which severely restricts cost optimization and quality of service for both cloud providers and users. To build cloud DWs, new architectures are needed. This paper explores an architecture that decouples data management and processing to enable on-demand resource control. This optimized design enhances system elasticity and adaptability. 
However, this separation design is not without cost, as cooperation overhead can be high if not well optimized. For proof of concept, we build a prototype system, DuoSQL, using PostgreSQL for data management and Spark for data processing. To optimize cooperation, we conduct joint parameter tuning to improve overall system performance. We validate the system with the TPC-H benchmark. Results show the decoupling approach is flexible and offers significant performance potential.<\/jats:p>","DOI":"10.1093\/comjnl\/bxaf014","type":"journal-article","created":{"date-parts":[[2025,2,5]],"date-time":"2025-02-05T07:25:21Z","timestamp":1738740321000},"page":"926-938","source":"Crossref","is-referenced-by-count":0,"title":["DuoSQL: towards elastic data warehousing via separated data management and processing"],"prefix":"10.1093","volume":"68","author":[{"given":"Weikang","family":"Zhang","sequence":"first","affiliation":[{"name":"College of Engineering , Southern University of Science and Technology, Shenzhen, Guangdong 518055,","place":["China"]}]},{"given":"Zhi","family":"Liu","sequence":"additional","affiliation":[{"name":"Shenzhen Institutes of Advanced Technology , Chinese Academy of Sciences, Shenzhen, Guangdong 518000,","place":["China"]}]},{"given":"Tongxin","family":"Bai","sequence":"additional","affiliation":[{"name":"Beijing Academy of Artificial Intelligence , Beijing, Beijing 100084","place":["China"]}]},{"given":"Furong","family":"Zheng","sequence":"additional","affiliation":[{"name":"The SIAT-Suntang Big Data&AI Joint Innovation Laboratory , Shenzhen, Guangdong 518052,","place":["China"]}]},{"given":"Wenming","family":"Jin","sequence":"additional","affiliation":[{"name":"The SIAT-Suntang Big Data&AI Joint Innovation Laboratory , Shenzhen, Guangdong 518052,","place":["China"]}]},{"given":"Yang","family":"Wang","sequence":"additional","affiliation":[{"name":"Shenzhen Institutes of Advanced Technology , Chinese Academy of Sciences, Shenzhen, Guangdong 
518000,","place":["China"]}]}],"member":"286","published-online":{"date-parts":[[2025,2,27]]},"reference":[{"key":"2025081702465970600_ref1","doi-asserted-by":"publisher","first-page":"599","DOI":"10.1016\/j.future.2008.12.001","article-title":"Cloud computing and emerging IT platforms: vision, hype, and reality for delivering computing as the 5th utility","volume":"25","author":"Buyya","year":"2008","journal-title":"Future Gener Comput Syst"},{"key":"2025081702465970600_ref2","first-page":"443","article-title":"FaaSNet: scalable and fast provisioning of custom Serverless container runtimes at Alibaba cloud function compute","volume-title":"Proceedings of the USENIX ATC 21. Boston, USA 14-16 July","author":"Wang"},{"key":"2025081702465970600_ref3","doi-asserted-by":"publisher","first-page":"3528","DOI":"10.14778\/3611540.3611545","article-title":"Krypton: real-time serving and analytical SQL engine at ByteDance","volume":"16","author":"Chen","year":"2023","journal-title":"Proc VLDB Endow"},{"key":"2025081702465970600_ref4","first-page":"1208","article-title":"Real-time LSM-trees for HTAP workloads","volume-title":"Proceedings of the 39th ICDE, Anaheim, USA, 3-7 April","author":"Saxena","year":"2023"},{"key":"2025081702465970600_ref5","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1561\/1900000036","article-title":"Massively parallel databases and mapreduce systems","volume":"5","author":"Babu","year":"2013","journal-title":"Found Trends Databases"},{"key":"2025081702465970600_ref6","first-page":"48","article-title":"Lifting the fog of uncertainties: dynamic resource orchestration for the containerized cloud","volume-title":"Proceedings of the SoCC\u201923, Santa Cruz, USA, 30 October-1 November","author":"Zhang","year":"2023"},{"key":"2025081702465970600_ref7","first-page":"1","article-title":"Autopilot: workload autoscaling at Google","volume-title":"Proceedings of the EuroSys\u201920, Heraklion, Greece, 27-30 
April","author":"Rzadca","year":"2020"},{"key":"2025081702465970600_ref8","first-page":"387","article-title":"AWARE: automate workload autoscaling with reinforcement learning in production cloud systems","volume-title":"Proceedings of the USENIX ATC 23, Boston, USA, 10-12 July","author":"Qiu","year":"2023"},{"key":"2025081702465970600_ref9","doi-asserted-by":"publisher","first-page":"3808","DOI":"10.14778\/3611540.3611566","article-title":"MagicScaler: uncertainty-aware, predictive autoscaling","volume":"16","author":"Pan","year":"2023","journal-title":"Proc VLDB Endow"},{"key":"2025081702465970600_ref10","first-page":"2762","article-title":"RobustScaler: QoS-aware autoscaling for complex workloads","volume-title":"Proceedings of the 38th ICDE, Kuala Lumpur, Malaysia (held virtually), 9-12 May","author":"Qian","year":"2022"},{"key":"2025081702465970600_ref11","first-page":"1275","article-title":"Bao: making learned query optimization practical","volume-title":"Proceedings of the SIGMOD\u201921, Virtual Event China, 20-25 June","author":"Marcus","year":"2021"},{"key":"2025081702465970600_ref12","doi-asserted-by":"publisher","first-page":"2742","DOI":"10.14778\/3611479.3611484","article-title":"Epoxy: ACID transactions across diverse data stores","volume":"16","author":"Kraft","year":"2023","journal-title":"Proc VLDB Endow"},{"key":"2025081702465970600_ref13","first-page":"95","article-title":"Improving spark application throughput via memory aware task co-location: a mixture of experts approach","volume-title":"Proceedings of Middleware\u201917, Las Vegas, USA, 11-15 December","author":"Marco","year":"2017"},{"key":"2025081702465970600_ref14","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3589279","article-title":"QaaD (query-as-a-data): scalable execution of massive number of small queries in spark","volume":"1","author":"Park","year":"2023","journal-title":"Proc ACM Manag Data"},{"key":"2025081702465970600_ref15","article-title":"Spark: cluster 
computing with working sets","volume-title":"Proceedings of HotCloud 2010, Boston, USA, 22 June","author":"Zaharia","year":"2010"},{"key":"2025081702465970600_ref16","first-page":"215","article-title":"The snowflake elastic data warehouse","volume-title":"Proceedings of the SIGMOD\u201916, San Francisco, USA, 26 June-1 July","author":"Dageville","year":"2016"},{"key":"2025081702465970600_ref17","doi-asserted-by":"publisher","first-page":"3162","DOI":"10.14778\/3476311.3476391","article-title":"The evolution of Amazon Redshift","volume":"14","author":"Pandis","year":"2021","journal-title":"Proc VLDB Endow"},{"key":"2025081702465970600_ref18","doi-asserted-by":"publisher","first-page":"3461","DOI":"10.14778\/3415478.3415568","article-title":"Dremel: a decade of interactive SQL analysis at web scale","volume":"13","author":"Melnik","year":"2020","journal-title":"Proc VLDB Endow"},{"key":"2025081702465970600_ref19","first-page":"1667","article-title":"Black or white? How to develop an AutoTuner for memory-based analytics","volume-title":"Proceedings of the SIGMOD\u201920, Portland, USA, 14-19 June","author":"Kunjir","year":"2020"},{"key":"2025081702465970600_ref20","doi-asserted-by":"publisher","first-page":"2118","DOI":"10.14778\/3352063.3352129","article-title":"QTune: a query-aware database tuning system with deep reinforcement learning","volume":"12","author":"Li","year":"2019","journal-title":"Proc VLDB Endow"},{"key":"2025081702465970600_ref21","first-page":"2102","article-title":"ResTune: resource oriented tuning boosted by meta-learning for cloud databases","volume-title":"Proceedings of the SIGMOD\u201921, Virtual Event China, 20-25 June","author":"Zhang","year":"2021"},{"key":"2025081702465970600_ref22","doi-asserted-by":"publisher","first-page":"539","DOI":"10.14778\/3632093.3632114","article-title":"An efficient transfer learning based configuration adviser for database tuning","volume":"17","author":"Zhang","year":"2023","journal-title":"Proc VLDB 
Endow"},{"key":"2025081702465970600_ref23","first-page":"674","article-title":"Locat: Low-overhead online configuration auto-tuning of spark sql applications","volume-title":"Proceedings of the SIGMOD\u201922","author":"Xin","year":"2022"},{"key":"2025081702465970600_ref24","first-page":"338","article-title":"BestConfig: tapping the performance potential of systems via automatic configuration tuning","volume-title":"Proceedings of the SoCC\u201917, Santa Clara, USA, 24-27 September","author":"Zhu","year":"2017"},{"key":"2025081702465970600_ref25","first-page":"415","article-title":"An end-to-end automatic cloud database tuning system using deep reinforcement learning","volume-title":"Proceedings of the SIGMOD\u201919, Amsterdam, Netherlands, 30 June-5 July","author":"Zhang","year":"2019"},{"key":"2025081702465970600_ref26","first-page":"508","article-title":"Demonstrating $\\lambda $-tune: exploiting large language models for workload-adaptive database system tuning","volume-title":"Proceedings of the SIGMOD\u201924, Santiago, Chile, 9\u201315 June","author":"Giannakouris","year":"2024"},{"key":"2025081702465970600_ref27","doi-asserted-by":"publisher","first-page":"1939","DOI":"10.14778\/3659437.3659449","article-title":"GPTuner: a manual-reading database tuning system via GPT-guided Bayesian optimization","volume":"17","author":"Lao","year":"2024","journal-title":"Proc VLDB Endow"},{"key":"2025081702465970600_ref28","first-page":"208","article-title":"Finding the right cloud configuration for analytics clusters","volume-title":"Proceedings of the SoCC\u201920, Virtual Event, 19-21 October","author":"Bilal","year":"2020"},{"key":"2025081702465970600_ref29","first-page":"221","article-title":"Apache Calcite: a foundational framework for optimized query processing over heterogeneous data sources","volume-title":"Proceedings of the SIGMOD\u201918, Houston, USA, 10-15 
June","author":"Begoli","year":"2018"},{"key":"2025081702465970600_ref30","first-page":"1009","article-title":"Automatic database management system tuning through large-scale machine learning","volume-title":"Proceedings of the SIGMOD\u201917, Chicago, USA, 14-19 May","author":"Aken","year":"2017"},{"key":"2025081702465970600_ref31","doi-asserted-by":"publisher","first-page":"267","DOI":"10.1111\/j.2517-6161.1996.tb02080.x","article-title":"Regression shrinkage and selection via the lasso","volume":"58","author":"Tibshirani","year":"1996","journal-title":"J R Stat Soc Ser B"},{"key":"2025081702465970600_ref32","doi-asserted-by":"publisher","first-page":"69","DOI":"10.1142\/S0129065704001899","article-title":"Gaussian processes for machine learning","volume":"14","author":"Seeger","year":"2004","journal-title":"Int J Neural Syst"},{"key":"2025081702465970600_ref33","first-page":"2546","article-title":"Algorithms for hyper-parameter optimization","volume-title":"Proceedings of NIPS 2011, Granada, Spain, 12-14 December","author":"Bergstra","year":"2011"},{"key":"2025081702465970600_ref34","doi-asserted-by":"publisher","first-page":"185","DOI":"10.1016\/0925-2312(93)90006-O","article-title":"Backpropagation and stochastic gradient descent method","volume":"5","author":"Amari","year":"1993","journal-title":"Neurocomputing"},{"key":"2025081702465970600_ref35","first-page":"110","article-title":"A protocol for extending analytics capability of SQL database","volume-title":"Proceedings of the 7th CCBD, Macau, China, 16-18 November","author":"Cai","year":"2016"},{"key":"2025081702465970600_ref36","first-page":"15","article-title":"Resilient distributed datasets: a fault-tolerant abstraction for In-memory cluster computing","volume-title":"Proceedings of the NSDI 12, San Jose, USA, 25-27 April","author":"Zaharia","year":"2012"},{"author":"Scikit-learn","key":"2025081702465970600_ref37","article-title":"Scikit-Learn, machine learning in 
Python"},{"author":"Tensorflow","key":"2025081702465970600_ref38","article-title":"TensorFlow, an end-to-end platform for machine learning"},{"key":"2025081702465970600_ref39","doi-asserted-by":"publisher","first-page":"1700","DOI":"10.14778\/2367502.2367510","article-title":"The MADlib analytics library or mad skills, the SQL","volume":"5","author":"Hellerstein","year":"2012","journal-title":"Proc VLDB Endow"}],"container-title":["The Computer Journal"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/comjnl\/article-pdf\/68\/8\/926\/62198119\/bxaf014.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/comjnl\/article-pdf\/68\/8\/926\/62198119\/bxaf014.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,17]],"date-time":"2025-08-17T06:47:07Z","timestamp":1755413227000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/comjnl\/article\/68\/8\/926\/8045384"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,2,27]]},"references-count":39,"journal-issue":{"issue":"8","published-online":{"date-parts":[[2025,2,27]]},"published-print":{"date-parts":[[2025,8,14]]}},"URL":"https:\/\/doi.org\/10.1093\/comjnl\/bxaf014","relation":{},"ISSN":["0010-4620","1460-2067"],"issn-type":[{"type":"print","value":"0010-4620"},{"type":"electronic","value":"1460-2067"}],"subject":[],"published-other":{"date-parts":[[2025,8]]},"published":{"date-parts":[[2025,2,27]]}}}