{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,31]],"date-time":"2026-03-31T16:22:13Z","timestamp":1774974133313,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":30,"publisher":"ACM","funder":[{"name":"Key-Area Research and Development Program of Guangdong Province, China","award":["2020B010164003"],"award-info":[{"award-number":["2020B010164003"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2025,6,23]]},"DOI":"10.1145\/3696630.3728492","type":"proceedings-article","created":{"date-parts":[[2025,7,28]],"date-time":"2025-07-28T19:10:43Z","timestamp":1753729843000},"page":"525-529","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["AgentFM: Role-Aware Failure Management for Distributed Databases with LLM-Driven Multi-Agents"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0009-0005-9500-4489","authenticated-orcid":false,"given":"Lingzhe","family":"Zhang","sequence":"first","affiliation":[{"name":"Peking University, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3344-4543","authenticated-orcid":false,"given":"Yunpeng","family":"Zhai","sequence":"additional","affiliation":[{"name":"Alibaba Group, Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5946-9829","authenticated-orcid":false,"given":"Tong","family":"Jia","sequence":"additional","affiliation":[{"name":"Peking University, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-3462-5324","authenticated-orcid":false,"given":"Xiaosong","family":"Huang","sequence":"additional","affiliation":[{"name":"Peking University, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-4422-6323","authenticated-orcid":false,"given":"Chiming","family":"Duan","sequence":"additional","affiliation":[{"name":"Peking University, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9667-2423","authenticated-orcid":false,"given":"Ying","family":"Li","sequence":"additional","affiliation":[{"name":"Peking University, Beijing, China"}]}],"member":"320","published-online":{"date-parts":[[2025,7,28]]},"reference":[{"key":"e_1_3_2_1_1_1","volume-title":"2023 IEEE\/ACM 45th International Conference on Software Engineering (ICSE). IEEE, 1737\u20131749","author":"Ahmed Toufique","year":"2023","unstructured":"Toufique Ahmed, Supriyo Ghosh, Chetan Bansal, Thomas Zimmermann, Xuchao Zhang, and Saravan Rajmohan. 2023. Recommending root-cause and mitigation steps for cloud incidents using large language models. In 2023 IEEE\/ACM 45th International Conference on Software Engineering (ICSE). IEEE, 1737\u20131749."},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/2491245","article-title":"Spanner: Google's globally distributed database","volume":"31","author":"Corbett James C","year":"2013","unstructured":"James C Corbett, Jeffrey Dean, Michael Epstein, Andrew Fikes, Christopher Frost, Jeffrey John Furman, Sanjay Ghemawat, Andrey Gubarev, Christopher Heiser, Peter Hochschild, et al. 2013. Spanner: Google's globally distributed database. ACM Transactions on Computer Systems (TOCS) 31, 3 (2013), 1\u201322.","journal-title":"ACM Transactions on Computer Systems (TOCS)"},{"key":"e_1_3_2_1_3_1","volume-title":"Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering. 417\u2013428","author":"Goel Drishti","year":"2024","unstructured":"Drishti Goel, Fiza Husain, Aditya Singh, Supriyo Ghosh, Anjaly Parayil, Chetan Bansal, Xuchao Zhang, and Saravan Rajmohan. 2024. X-lifecycle learning for cloud incident management using llms. In Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering. 417\u2013428."},{"key":"e_1_3_2_1_4_1","volume-title":"Proceedings of the 46th International Conference on Software Engineering: Software Engineering in Practice. 47\u201357","author":"Hrusto Adha","year":"2024","unstructured":"Adha Hrusto, Per Runeson, and Magnus C Ohlsson. 2024. Autonomous monitors for detecting failures early and reporting interpretable alerts in cloud operations. In Proceedings of the 46th International Conference on Software Engineering: Software Engineering in Practice. 47\u201357."},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"crossref","first-page":"3072","DOI":"10.14778\/3415478.3415535","article-title":"TiDB: a Raft-based HTAP database","volume":"13","author":"Huang Dongxu","year":"2020","unstructured":"Dongxu Huang, Qi Liu, Qiu Cui, Zhuhe Fang, Xiaoyu Ma, Fei Xu, Li Shen, Liu Tang, Yuxing Zhou, Menglong Huang, et al. 2020. TiDB: a Raft-based HTAP database. Proceedings of the VLDB Endowment 13, 12 (2020), 3072\u20133084.","journal-title":"Proceedings of the VLDB Endowment"},{"key":"e_1_3_2_1_6_1","volume-title":"2023 38th IEEE\/ACM International Conference on Automated Software Engineering (ASE). IEEE, 66\u201378","author":"Huang Jun","year":"2023","unstructured":"Jun Huang, Yang Yang, Hang Yu, Jianguo Li, and Xiao Zheng. 2023. Twin graph-based anomaly detection via attentive multi-modal learning for microservice system. In 2023 38th IEEE\/ACM International Conference on Automated Software Engineering (ASE). IEEE, 66\u201378."},{"key":"e_1_3_2_1_7_1","volume-title":"2022 IEEE 38th International Conference on Data Engineering (ICDE). IEEE, 3340\u20133352","author":"Kang Yuyuan","year":"2022","unstructured":"Yuyuan Kang, Xiangdong Huang, Shaoxu Song, Lingzhe Zhang, Jialin Qiao, Chen Wang, Jianmin Wang, and Julian Feinauer. 2022. Separation or not: On handing out-of-order time-series data in leveled lsm-tree. In 2022 IEEE 38th International Conference on Data Engineering (ICDE). IEEE, 3340\u20133352."},{"key":"e_1_3_2_1_8_1","volume-title":"2023 IEEE\/ACM 45th International Conference on Software Engineering (ICSE). IEEE, 1750\u20131762","author":"Lee Cheryl","year":"2023","unstructured":"Cheryl Lee, Tianyi Yang, Zhuangbin Chen, Yuxin Su, and Michael R Lyu. 2023. Eadro: An end-to-end troubleshooting framework for microservices on multi-source data. In 2023 IEEE\/ACM 45th International Conference on Software Engineering (ICSE). IEEE, 1750\u20131762."},{"key":"e_1_3_2_1_9_1","volume-title":"2023 IEEE\/ACM 45th International Conference on Software Engineering (ICSE). IEEE, 1724\u20131736","author":"Lee Cheryl","year":"2023","unstructured":"Cheryl Lee, Tianyi Yang, Zhuangbin Chen, Yuxin Su, Yongqiang Yang, and Michael R Lyu. 2023. Heterogeneous anomaly detection for software systems via semi-supervised cross-modal attention. In 2023 IEEE\/ACM 45th International Conference on Software Engineering (ICSE). IEEE, 1724\u20131736."},{"key":"e_1_3_2_1_10_1","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","volume":"38","author":"Lin Cheng-Ming","year":"2024","unstructured":"Cheng-Ming Lin, Ching Chang, Wei-Yao Wang, Kuang-Da Wang, and Wen-Chih Peng. 2024. Root Cause Analysis in Microservice Using Neural Granger Causal Discovery. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 38. 206\u2013213."},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"crossref","first-page":"1176","DOI":"10.14778\/3389133.3389136","article-title":"Diagnosing root causes of intermittent slow queries in cloud databases","volume":"13","author":"Ma Minghua","year":"2020","unstructured":"Minghua Ma, Zheng Yin, Shenglin Zhang, Sheng Wang, Christopher Zheng, Xinhao Jiang, Hanwen Hu, Cheng Luo, Yilin Li, Nengjun Qiu, et al. 2020. Diagnosing root causes of intermittent slow queries in cloud databases. Proceedings of the VLDB Endowment 13, 8 (2020), 1176\u20131189.","journal-title":"Proceedings of the VLDB Endowment"},{"key":"e_1_3_2_1_12_1","volume-title":"Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering. 208\u2013219","author":"Roy Devjeet","year":"2024","unstructured":"Devjeet Roy, Xuchao Zhang, Rashi Bhave, Chetan Bansal, Pedro Las-Casas, Rodrigo Fonseca, and Saravan Rajmohan. 2024. Exploring llm-based agents for root cause analysis. In Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering. 208\u2013219."},{"key":"e_1_3_2_1_13_1","volume-title":"Proceedings of the 2024 ACM Symposium on Cloud Computing. 99\u2013110","author":"Shetty Manish","year":"2024","unstructured":"Manish Shetty, Yinfang Chen, Gagan Somashekar, Minghua Ma, Yogesh Simmhan, Xuchao Zhang, Jonathan Mace, Dax Vandevoorde, Pedro Las-Casas, Shachee Mishra Gupta, et al. 2024. Building AI Agents for Autonomous Clouds: Challenges and Design Principles. In Proceedings of the 2024 ACM Symposium on Cloud Computing. 99\u2013110."},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"crossref","first-page":"2901","DOI":"10.14778\/3415478.3415504","article-title":"Apache iotdb: time-series database for internet of things","volume":"13","author":"Wang Chen","year":"2020","unstructured":"Chen Wang, Xiangdong Huang, Jialin Qiao, Tian Jiang, Lei Rui, Jinrui Zhang, Rong Kang, Julian Feinauer, Kevin A McGrail, Peng Wang, et al. 2020. Apache iotdb: time-series database for internet of things. Proceedings of the VLDB Endowment 13, 12 (2020), 2901\u20132904.","journal-title":"Proceedings of the VLDB Endowment"},{"key":"e_1_3_2_1_15_1","volume-title":"2021 IEEE\/ACM 43rd International Conference on Software Engineering (ICSE). IEEE, 1448\u20131460","author":"Yang Lin","year":"2021","unstructured":"Lin Yang, Junjie Chen, Zan Wang, Weijing Wang, Jiajun Jiang, Xuyuan Dong, and Wenbin Zhang. 2021. Semi-supervised log-based anomaly detection via probabilistic label estimation. In 2021 IEEE\/ACM 43rd International Conference on Software Engineering (ICSE). IEEE, 1448\u20131460."},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"crossref","first-page":"3385","DOI":"10.14778\/3554821.3554830","article-title":"OceanBase: a 707 million tpmC distributed relational database system","volume":"15","author":"Yang Zhenkun","year":"2022","unstructured":"Zhenkun Yang, Chuanhui Yang, Fusheng Han, Mingqiang Zhuang, Bing Yang, Zhifeng Yang, Xiaojun Cheng, Yuzhong Zhao, Wenhui Shi, Huafeng Xi, et al. 2022. OceanBase: a 707 million tpmC distributed relational database system. Proceedings of the VLDB Endowment 15, 12 (2022), 3385\u20133397.","journal-title":"Proceedings of the VLDB Endowment"},{"key":"e_1_3_2_1_17_1","volume-title":"Proceedings of the 44th international conference on software engineering. 623\u2013634","author":"Zhang Chenxi","year":"2022","unstructured":"Chenxi Zhang, Xin Peng, Chaofeng Sha, Ke Zhang, Zhenqing Fu, Xiya Wu, Qingwei Lin, and Dongmei Zhang. 2022. Deeptralog: Trace-log combined microservice anomaly detection through graph-based deep learning. In Proceedings of the 44th international conference on software engineering. 623\u2013634."},{"key":"e_1_3_2_1_18_1","volume-title":"Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering. 388\u2013398","author":"Zhang Dylan","year":"2024","unstructured":"Dylan Zhang, Xuchao Zhang, Chetan Bansal, Pedro Las-Casas, Rodrigo Fonseca, and Saravan Rajmohan. 2024. LM-PACE: Confidence estimation by large language models for effective root causing of cloud incidents. In Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering. 388\u2013398."},{"key":"e_1_3_2_1_19_1","volume-title":"Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 4256\u20134267","author":"Zhang Lingzhe","year":"2024","unstructured":"Lingzhe Zhang, Tong Jia, Mengxi Jia, Ying Li, Yong Yang, and Zhonghai Wu. 2024. Multivariate Log-based Anomaly Detection for Distributed Database. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 4256\u20134267."},{"key":"e_1_3_2_1_20_1","volume-title":"Towards Close-To-Zero Runtime Collection Overhead: Raft-Based Anomaly Diagnosis on System Faults for Distributed Storage System","author":"Zhang Lingzhe","year":"2024","unstructured":"Lingzhe Zhang, Tong Jia, Mengxi Jia, Hongyi Liu, Yong Yang, Zhonghai Wu, and Ying Li. 2024. Towards Close-To-Zero Runtime Collection Overhead: Raft-Based Anomaly Diagnosis on System Faults for Distributed Storage System. IEEE Transactions on Services Computing (2024)."},{"key":"e_1_3_2_1_21_1","volume-title":"A Survey of AIOps for Failure Management in the Era of Large Language Models. arXiv preprint arXiv:2406.11213","author":"Zhang Lingzhe","year":"2024","unstructured":"Lingzhe Zhang, Tong Jia, Mengxi Jia, Yifan Wu, Aiwei Liu, Yong Yang, Zhonghai Wu, Xuming Hu, Philip S Yu, and Ying Li. 2024. A Survey of AIOps for Failure Management in the Era of Large Language Models. arXiv preprint arXiv:2406.11213 (2024)."},{"key":"e_1_3_2_1_22_1","volume-title":"ScalaLog: Scalable Log-Based Failure Diagnosis Using LLM. In ICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 1\u20135.","author":"Zhang Lingzhe","year":"2025","unstructured":"Lingzhe Zhang, Tong Jia, Mengxi Jia, Yifan Wu, Hongyi Liu, and Ying Li. 2025. ScalaLog: Scalable Log-Based Failure Diagnosis Using LLM. In ICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 1\u20135."},{"key":"e_1_3_2_1_23_1","volume-title":"XRAGLog: A Resource-Efficient and Context-Aware Log-Based Anomaly Detection Method Using Retrieval-Augmented Generation. In AAAI 2025 Workshop on Preventing and Detecting LLM Misinformation (PDLM).","author":"Zhang Lingzhe","year":"2025","unstructured":"Lingzhe Zhang, Tong Jia, Mengxi Jia, Yifan Wu, Hongyi Liu, and Ying Li. 2025. XRAGLog: A Resource-Efficient and Context-Aware Log-Based Anomaly Detection Method Using Retrieval-Augmented Generation. In AAAI 2025 Workshop on Preventing and Detecting LLM Misinformation (PDLM)."},{"key":"e_1_3_2_1_24_1","volume-title":"Proceedings of the 18th ACM\/IEEE International Symposium on Empirical Software Engineering and Measurement. 538\u2013548","author":"Zhang Lingzhe","year":"2024","unstructured":"Lingzhe Zhang, Tong Jia, Kangjin Wang, Mengxi Jia, Yong Yang, and Ying Li. 2024. Reducing Events to Augment Log-based Anomaly Detection Models: An Empirical Study. In Proceedings of the 18th ACM\/IEEE International Symposium on Empirical Software Engineering and Measurement. 538\u2013548."},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"crossref","first-page":"102224","DOI":"10.1016\/j.aei.2023.102224","article-title":"Time-tired compaction: An elastic compaction scheme for LSM-tree based time-series database","volume":"59","author":"Zhang Ling-Zhe","year":"2024","unstructured":"Ling-Zhe Zhang, Xiang-Dong Huang, Yan-Kai Wang, Jia-Lin Qiao, Shao-Xu Song, and Jian-Min Wang. 2024. Time-tired compaction: An elastic compaction scheme for LSM-tree based time-series database. Advanced Engineering Informatics 59 (2024), 102224.","journal-title":"Advanced Engineering Informatics"},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"crossref","first-page":"3851","DOI":"10.1109\/TSC.2023.3290018","article-title":"Robust failure diagnosis of microservice system through multimodal data","volume":"16","author":"Zhang Shenglin","year":"2023","unstructured":"Shenglin Zhang, Pengxiang Jin, Zihan Lin, Yongqian Sun, Bicheng Zhang, Sibo Xia, Zhengdan Li, Zhenyu Zhong, Minghua Ma, Wa Jin, et al. 2023. Robust failure diagnosis of microservice system through multimodal data. IEEE Transactions on Services Computing 16, 6 (2023), 3851\u20133864.","journal-title":"IEEE Transactions on Services Computing"},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"crossref","unstructured":"Wei Zhang Hongcheng Guo Jian Yang Yi Zhang Chaoran Yan Zhoujin Tian Hangyuan Ji Zhoujun Li Tongliang Li Tieqiao Zheng et al. 2024. mABC: multi-Agent Blockchain-Inspired Collaboration for root cause analysis in micro-services architecture. arXiv preprint arXiv:2404.12135 (2024).","DOI":"10.18653\/v1\/2024.findings-emnlp.232"},{"key":"e_1_3_2_1_28_1","volume-title":"Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering. 266\u2013277","author":"Zhang Xuchao","year":"2024","unstructured":"Xuchao Zhang, Supriyo Ghosh, Chetan Bansal, Rujia Wang, Minghua Ma, Yu Kang, and Saravan Rajmohan. 2024. Automated root causing of cloud incidents using in-context learning with gpt-4. In Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering. 266\u2013277."},{"key":"e_1_3_2_1_29_1","volume-title":"Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 527\u2013539","author":"Zhao Nengwen","year":"2021","unstructured":"Nengwen Zhao, Junjie Chen, Zhaoyang Yu, Honglin Wang, Jiesong Li, Bin Qiu, Hongyu Xu, Wenchi Zhang, Kaixin Sui, and Dan Pei. 2021. Identifying bad software changes via multimodal anomaly detection for online service systems. In Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 527\u2013539."},{"key":"e_1_3_2_1_30_1","volume-title":"Multi-modal Causal Structure Learning and Root Cause Analysis. arXiv preprint arXiv:2402.02357","author":"Zheng Lecheng","year":"2024","unstructured":"Lecheng Zheng, Zhengzhang Chen, Jingrui He, and Haifeng Chen. 2024. Multi-modal Causal Structure Learning and Root Cause Analysis. arXiv preprint arXiv:2402.02357 (2024)."}],"event":{"name":"FSE Companion '25: 33rd ACM International Conference on the Foundations of Software Engineering","location":"Clarion Hotel Trondheim Trondheim Norway","acronym":"FSE Companion '25","sponsor":["SIGSOFT ACM Special Interest Group on Software Engineering"]},"container-title":["Proceedings of the 33rd ACM International Conference on the Foundations of Software Engineering"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3696630.3728492","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,28]],"date-time":"2025-07-28T19:12:26Z","timestamp":1753729946000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3696630.3728492"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,23]]},"references-count":30,"alternative-id":["10.1145\/3696630.3728492","10.1145\/3696630"],"URL":"https:\/\/doi.org\/10.1145\/3696630.3728492","relation":{},"subject":[],"published":{"date-parts":[[2025,6,23]]},"assertion":[{"value":"2025-07-28","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}