{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,31]],"date-time":"2026-03-31T19:01:29Z","timestamp":1774983689477,"version":"3.50.1"},"reference-count":48,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2025,2,10]],"date-time":"2025-02-10T00:00:00Z","timestamp":1739145600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. ACM Manag. Data"],"published-print":{"date-parts":[[2025,2,10]]},"abstract":"<jats:p>Database management system (DBMS) configuration debugging, e.g., diagnosing poorly configured DBMS knobs and generating troubleshooting recommendations, is crucial in optimizing DBMS performance. However, the configuration debugging process is tedious and sometimes challenging, even for seasoned database administrators (DBAs) with sufficient experience in DBMS configurations and a good understanding of the DBMS internals (e.g., MySQL or Oracle). To address this difficulty, we propose Andromeda, a framework that utilizes large language models (LLMs) to enable automatic DBMS configuration debugging. Andromeda serves as a natural surrogate of DBAs to answer a wide range of natural language (NL) questions on DBMS configuration issues, and to generate diagnostic suggestions to fix these issues. Nevertheless, directly prompting LLMs with these professional questions may result in overly generic and often unsatisfying answers. To this end, we propose a retrieval-augmented generation (RAG) strategy that effectively provides matched domain-specific contexts for the question from multiple sources. They come from related historical questions, troubleshooting manuals and DBMS telemetries, which significantly improve the performance of configuration debugging. 
To support the RAG strategy, we develop a document retrieval mechanism addressing heterogeneous documents and design an effective method for telemetry analysis. Extensive experiments on real-world DBMS configuration debugging datasets show that Andromeda significantly outperforms existing solutions.<\/jats:p>","DOI":"10.1145\/3709663","type":"journal-article","created":{"date-parts":[[2025,2,11]],"date-time":"2025-02-11T15:45:06Z","timestamp":1739288706000},"page":"1-27","source":"Crossref","is-referenced-by-count":3,"title":["Automatic Database Configuration Debugging using Retrieval-Augmented Language Models"],"prefix":"10.1145","volume":"3","author":[{"ORCID":"https:\/\/orcid.org\/0009-0001-5331-5829","authenticated-orcid":false,"given":"Sibei","family":"Chen","sequence":"first","affiliation":[{"name":"Renmin University of China, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4729-9903","authenticated-orcid":false,"given":"Ju","family":"Fan","sequence":"additional","affiliation":[{"name":"Renmin University of China, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4743-1006","authenticated-orcid":false,"given":"Bin","family":"Wu","sequence":"additional","affiliation":[{"name":"Alibaba Group Computing, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2832-0295","authenticated-orcid":false,"given":"Nan","family":"Tang","sequence":"additional","affiliation":[{"name":"HKUST (GZ), Guangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-9312-2110","authenticated-orcid":false,"given":"Chao","family":"Deng","sequence":"additional","affiliation":[{"name":"Renmin University of China, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-5634-9782","authenticated-orcid":false,"given":"Pengyi","family":"Wang","sequence":"additional","affiliation":[{"name":"Renmin University of China, Beijing, 
China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1860-2723","authenticated-orcid":false,"given":"Ye","family":"Li","sequence":"additional","affiliation":[{"name":"Alibaba Cloud Computing, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1080-9300","authenticated-orcid":false,"given":"Jian","family":"Tan","sequence":"additional","affiliation":[{"name":"Alibaba Cloud Computing, Sunnyvale, CA, USA"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-0770-5775","authenticated-orcid":false,"given":"Feifei","family":"Li","sequence":"additional","affiliation":[{"name":"Alibaba Cloud Computing, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4220-2634","authenticated-orcid":false,"given":"Jingren","family":"Zhou","sequence":"additional","affiliation":[{"name":"Alibaba Cloud Computing, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5757-9135","authenticated-orcid":false,"given":"Xiaoyong","family":"Du","sequence":"additional","affiliation":[{"name":"Renmin University of China, Beijing, China"}]}],"member":"320","published-online":{"date-parts":[[2025,2,11]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"[n. d.]. Llama-3.1--70B-Instruct. https:\/\/huggingface.co\/meta-llama\/Llama-3.1--70B-Instruct."},{"key":"e_1_2_1_2_1","unstructured":"[n. d.]. Llama-3.1--8B-Instruct. https:\/\/huggingface.co\/meta-llama\/Llama-3.1--8B-Instruct."},{"key":"e_1_2_1_3_1","unstructured":"[n. d.]. MySQL Forum Dataset. https:\/\/forums.mysql.com."},{"key":"e_1_2_1_4_1","unstructured":"[n. d.]. MySQL official manual. https:\/\/dev.mysql.com\/doc\/refman\/8.0\/en\/."},{"key":"e_1_2_1_5_1","unstructured":"[n. d.]. PostgreSQL official manual. https:\/\/www.postgresql.org\/docs\/."},{"key":"e_1_2_1_6_1","unstructured":"[n. d.]. Qwen2--72B-Instruct. https:\/\/huggingface.co\/Qwen\/Qwen2--72B-Instruct."},{"key":"e_1_2_1_7_1","unstructured":"[n. d.]. Qwen2--7B-Instruct. https:\/\/huggingface.co\/Qwen\/Qwen2--7B-Instruct."},{"key":"e_1_2_1_8_1","unstructured":"[n. 
d.]. Sentence-Bert Model. https:\/\/huggingface.co\/sentence-transformers\/all-mpnet-base-v2."},{"key":"e_1_2_1_9_1","unstructured":"[n. d.]. StackOverflow Dataset. https:\/\/www.kaggle.com\/datasets\/stackoverflow\/stackoverflow."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/3035918.3064029"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2310.11511"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/P14-1091"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/3654925"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/3588945"},{"key":"e_1_2_1_15_1","volume-title":"Terpenning","author":"Cleveland R. B.","year":"1990","unstructured":"R. B. Cleveland, William S. Cleveland, Jean E. McRae, and Irma J. Terpenning. 1990. STL: A seasonal-trend decomposition procedure based on loess (with discussion). https:\/\/api.semanticscholar.org\/CorpusID:268093671"},{"key":"e_1_2_1_16_1","volume-title":"Second Biennial Conference on Innovative Data Systems Research, CIDR","author":"Dias Karl","year":"2005","unstructured":"Karl Dias, Mark Ramacher, Uri Shaft, Venkateshwaran Venkataramani, and Graham Wood. 2005. Automatic Performance Diagnosis and Tuning in Oracle. In Second Biennial Conference on Innovative Data Systems Research, CIDR 2005, Asilomar, CA, USA, January 4--7, 2005, Online Proceedings. www.cidrdb.org, 84--94. http:\/\/cidrdb.org\/cidr2005\/papers\/P07.pdf"},{"key":"e_1_2_1_17_1","volume-title":"Second Biennial Conference on Innovative Data Systems Research, CIDR","author":"Dias Karl","year":"2005","unstructured":"Karl Dias, Mark Ramacher, Uri Shaft, Venkateshwaran Venkataramani, and Graham Wood. 2005. Automatic Performance Diagnosis and Tuning in Oracle. In Second Biennial Conference on Innovative Data Systems Research, CIDR 2005, Asilomar, CA, USA, January 4--7, 2005, Online Proceedings. www.cidrdb.org, 84--94. 
http:\/\/cidrdb.org\/cidr2005\/papers\/P07.pdf"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.14778\/1687627.1687767"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/3394486.3403299"},{"key":"e_1_2_1_20_1","unstructured":"Matthias Feurer. 2018. Scalable Meta-Learning for Bayesian Optimization using Ranking-Weighted Gaussian Process Ensembles. https:\/\/api.semanticscholar.org\/CorpusID:51795721"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.14778\/3583140.3583155"},{"key":"e_1_2_1_22_1","volume-title":"Automatic Anomaly Detection in the Cloud Via Statistical Learning. CoRR abs\/1704.07706","author":"Hochenbaum Jordan","year":"2017","unstructured":"Jordan Hochenbaum, Owen S. Vallis, and Arun Kejariwal. 2017. Automatic Anomaly Detection in the Cloud Via Statistical Learning. CoRR abs\/1704.07706 (2017). arXiv:1704.07706 http:\/\/arxiv.org\/abs\/1704.07706"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2311.17502"},{"key":"e_1_2_1_24_1","volume-title":"Robust statistics","author":"Huber Peter J","unstructured":"Peter J Huber and Elvezio M Ronchetti. 2011. Robust statistics. John Wiley & Sons."},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2403.14403"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2311.03157"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.14778\/3352063.3352129"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2310.07637"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.14778\/3389133.3389136"},{"key":"e_1_2_1_30_1","volume-title":"WebGPT: Browser-assisted question-answering with human feedback. 
CoRR abs\/2112.09332","author":"Nakano Reiichiro","year":"2021","unstructured":"Reiichiro Nakano, Jacob Hilton, Suchir Balaji, Jeff Wu, Long Ouyang, Christina Kim, Christopher Hesse, Shantanu Jain, Vineet Kosaraju, William Saunders, Xu Jiang, Karl Cobbe, Tyna Eloundou, Gretchen Krueger, Kevin Button, Matthew Knight, Benjamin Chess, and John Schulman. 2021. WebGPT: Browser-assisted question-answering with human feedback. CoRR abs\/2112.09332 (2021). arXiv:2112.09332 https:\/\/arxiv.org\/abs\/2112.09332"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1410"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1410"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.2307\/1268354"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/1835449.1835518"},{"key":"e_1_2_1_35_1","volume-title":"14th Conference on Innovative Data Systems Research, CIDR 2024","author":"Singh Vikramank Y.","year":"2024","unstructured":"Vikramank Y. Singh, Kapil Vaidya, Vinayshekhar Bannihatti Kumar, Sopan Khosla, Balakrishnan Narayanaswamy, Rashmi Gangadharaiah, and Tim Kraska. 2024. Panda: Performance Debugging for Databases using LLM Agents. In 14th Conference on Innovative Data Systems Research, CIDR 2024, Chaminade, HI, USA, January 14--17, 2024. www.cidrdb.org. 
https:\/\/www.cidrdb.org\/cidr2024\/papers\/p6-singh.pdf"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00530"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/3588938"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.5591\/978--1--57735--516--8\/IJCAI11--317"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/2882903.2915218"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/2882903.2915218"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/3626772.3657923"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/3299869.3300085"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/2661829.2661908"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.14778\/3538598.3538604"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/3448016.3457291"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2014.2356461"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1007\/978--3--319--73618--1_56"},{"key":"e_1_2_1_48_1","volume-title":"D-bot: Database diagnosis system using large language models. arXiv preprint arXiv:2312.01454","author":"Zhou Xuanhe","year":"2023","unstructured":"Xuanhe Zhou, Guoliang Li, Zhaoyan Sun, Zhiyuan Liu, Weize Chen, Jianming Wu, Jiesi Liu, Ruohang Feng, and Guoyang Zeng. 2023. D-bot: Database diagnosis system using large language models. 
arXiv preprint arXiv:2312.01454 (2023)."}],"container-title":["Proceedings of the ACM on Management of Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3709663","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3709663","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,31]],"date-time":"2026-03-31T18:19:25Z","timestamp":1774981165000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3709663"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,2,10]]},"references-count":48,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2025,2,10]]}},"alternative-id":["10.1145\/3709663"],"URL":"https:\/\/doi.org\/10.1145\/3709663","relation":{},"ISSN":["2836-6573"],"issn-type":[{"value":"2836-6573","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,2,10]]}}}