{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,8]],"date-time":"2025-10-08T15:27:57Z","timestamp":1759937277942,"version":"3.41.0"},"reference-count":50,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2023,5,26]],"date-time":"2023-05-26T00:00:00Z","timestamp":1685059200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. ACM Manag. Data"],"published-print":{"date-parts":[[2023,5,26]]},"abstract":"<jats:p>Root Cause Analysis (RCA) plays an indispensable role in distributed data system maintenance and operations, as it bridges the gap between fault detection and system recovery. Existing works mainly study multidimensional localization or graph-based root cause localization. This paper opens up the possibilities of exploiting the recently developed framework of explainable AI (XAI) for the purpose of RCA. In particular, we propose BALANCE (BAyesian Linear AttributioN for root CausE localization), which formulates the problem of RCA through the lens of attribution in XAI and seeks to explain the anomalies in the target KPIs by the behavior of the candidate root causes. BALANCE consists of three innovative components. First, we propose a Bayesian multicollinear feature selection (BMFS) model to predict the target KPIs given the candidate root causes in a forward manner while promoting sparsity and concurrently paying attention to the correlation between the candidate root causes. Second, we introduce attribution analysis to compute the attribution score for each candidate in a backward manner. Third, we merge the estimated root causes related to each KPI if there are multiple KPIs. We extensively evaluate the proposed BALANCE method on one synthesis dataset as well as three real-world RCA tasks, that is, bad SQL localization, container fault localization, and fault type diagnosis for Exathlon. Results show that BALANCE outperforms the state-of-the-art (SOTA) methods in terms of accuracy with the least amount of running time, and achieves at least 6% notably higher accuracy than SOTA methods for real tasks. BALANCE has been deployed to production to tackle real-world RCA problems, and the online results further advocate its usage for real-time diagnosis in distributed data systems.<\/jats:p>","DOI":"10.1145\/3588949","type":"journal-article","created":{"date-parts":[[2023,5,30]],"date-time":"2023-05-30T17:42:05Z","timestamp":1685468525000},"page":"1-26","source":"Crossref","is-referenced-by-count":5,"title":["BALANCE: Bayesian Linear Attribution for Root Cause Localization"],"prefix":"10.1145","volume":"1","author":[{"ORCID":"https:\/\/orcid.org\/0009-0005-6133-4324","authenticated-orcid":false,"given":"Chaoyu","family":"Chen","sequence":"first","affiliation":[{"name":"Ant Group, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5639-0912","authenticated-orcid":false,"given":"Hang","family":"Yu","sequence":"additional","affiliation":[{"name":"Ant Group, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-1599-4675","authenticated-orcid":false,"given":"Zhichao","family":"Lei","sequence":"additional","affiliation":[{"name":"Ant Group, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8645-0680","authenticated-orcid":false,"given":"Jianguo","family":"Li","sequence":"additional","affiliation":[{"name":"Ant Group, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0509-8986","authenticated-orcid":false,"given":"Shaokang","family":"Ren","sequence":"additional","affiliation":[{"name":"Ant Group, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-6561-1561","authenticated-orcid":false,"given":"Tingkai","family":"Zhang","sequence":"additional","affiliation":[{"name":"Ant Group, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-9305-7675","authenticated-orcid":false,"given":"Silin","family":"Hu","sequence":"additional","affiliation":[{"name":"Ant Group, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-2749-2611","authenticated-orcid":false,"given":"Jianchao","family":"Wang","sequence":"additional","affiliation":[{"name":"Ant Group, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9903-2138","authenticated-orcid":false,"given":"Wenhui","family":"Shi","sequence":"additional","affiliation":[{"name":"OceanBase, Beijing, China"}]}],"member":"320","published-online":{"date-parts":[[2023,5,30]]},"reference":[{"key":"e_1_2_2_1_1","volume-title":"International Conference on Service-Oriented Computing. 137--149","author":"Aggarwal Pooja","year":"2020","unstructured":"Pooja Aggarwal, Ajay Gupta, Prateeti Mohapatra, et al. 2020. Localization of operational faults in cloud applications by mining causal dependencies in logs using golden signals. In International Conference on Service-Oriented Computing. 137--149."},{"key":"e_1_2_2_2_1","volume-title":"International Conference on Machine Learning (ICML). 272--281","author":"Ancona Marco","year":"2019","unstructured":"Marco Ancona, Cengiz Oztireli, and Markus Gross. 2019. Explaining deep neural networks with a polynomial time algorithm for shapley value approximation. In International Conference on Machine Learning (ICML). 272--281."},{"key":"e_1_2_2_3_1","doi-asserted-by":"publisher","DOI":"10.1613\/jair.1.13428"},{"key":"e_1_2_2_4_1","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0130140"},{"key":"e_1_2_2_5_1","volume-title":"11th USENIX Symposium on Networked Systems Design and Implementation (NSDI). 43--55","author":"Bhagwan Ranjita","year":"2014","unstructured":"Ranjita Bhagwan, Rahul Kumar, Ramachandran Ramjee, et al. 2014. Adtributor: Revenue debugging in advertising systems. In 11th USENIX Symposium on Networked Systems Design and Implementation (NSDI). 43--55."},{"volume-title":"Pattern recognition and machine learning","author":"Bishop Christopher M","key":"e_1_2_2_6_1","unstructured":"Christopher M Bishop and Nasser M Nasrabadi. 2006. Pattern recognition and machine learning. Vol. 4."},{"key":"e_1_2_2_7_1","doi-asserted-by":"crossref","unstructured":"Jonathan Boss Jyotishka Datta Xin Wang et al. 2021. Group Inverse-Gamma Gamma Shrinkage for Sparse Regression with Block-Correlated Predictors. arXiv preprint arXiv:2102.10670 (2021).","DOI":"10.32614\/CRAN.package.gigg"},{"key":"e_1_2_2_8_1","volume-title":"Adelaida Creosteanu, et al.","author":"Bykov Kirill","year":"2021","unstructured":"Kirill Bykov, Marina M-C H\u00f6hne, Adelaida Creosteanu, et al. 2021. Explaining bayesian neural networks. arXiv preprint arXiv:2108.10346 (2021)."},{"key":"e_1_2_2_9_1","unstructured":"Carlos M Carvalho Nicholas G Polson and James G Scott. 2009. Handling sparsity via the horseshoe. In Artificial Intelligence and Statistics (AIStat). 73--80."},{"key":"e_1_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/INFOCOM.2014.6848128"},{"key":"e_1_2_2_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE-Companion.2019.00023"},{"key":"e_1_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.14778\/3476249.3476307"},{"key":"e_1_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/3299869.3314048"},{"key":"e_1_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/2494232.2465753"},{"key":"e_1_2_2_15_1","volume-title":"International Conference on Machine Learning (ICML). 3992--4002","author":"Lin Wu","year":"2019","unstructured":"Wu Lin, Mohammad Emtiyaz Khan, and Mark Schmidt. 2019. Fast and simple natural-gradient variational inference with mixture of exponential-family approximations. In International Conference on Machine Learning (ICML). 3992--4002."},{"key":"e_1_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.3390\/e23010018"},{"key":"e_1_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE-SEIP52600.2021.00043"},{"key":"e_1_2_2_18_1","volume-title":"A unified approach to interpreting model predictions. Advances in neural information processing systems (NIPS)","author":"Lundberg Scott M","year":"2017","unstructured":"Scott M Lundberg and Su-In Lee. 2017. A unified approach to interpreting model predictions. Advances in neural information processing systems (NIPS), Vol. 30 (2017)."},{"key":"e_1_2_2_19_1","doi-asserted-by":"publisher","DOI":"10.14778\/3389133.3389136"},{"key":"e_1_2_2_20_1","unstructured":"Raha Moraffah Paras Sheth Mansooreh Karami et al. 2021. Causal inference for time series analysis: Problems methods and evaluation. Knowledge and Information Systems (2021) 1--45."},{"key":"e_1_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/2783258.2788624"},{"key":"e_1_2_2_22_1","doi-asserted-by":"publisher","DOI":"10.1214\/14-EJS910"},{"key":"e_1_2_2_23_1","unstructured":"Dmitry Pavlyuk. 2022. fsMTS: Feature Selection for Multivariate Time Series."},{"key":"e_1_2_2_24_1","doi-asserted-by":"publisher","DOI":"10.3390\/app10062166"},{"key":"e_1_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/2939672.2939778"},{"key":"e_1_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.1007\/s40300-014-0047-y"},{"key":"e_1_2_2_27_1","volume-title":"International conference on machine learning (ICML). 3145--3153","author":"Shrikumar Avanti","year":"2017","unstructured":"Avanti Shrikumar, Peyton Greenside, and Anshul Kundaje. 2017. Learning important features through propagating activation differences. In International conference on machine learning (ICML). 3145--3153."},{"key":"e_1_2_2_28_1","volume-title":"ICLR workshop.","author":"Simonyan Karen","year":"2014","unstructured":"Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. 2014. Deep inside convolutional networks: Visualising image classification models and saliency maps. In ICLR workshop."},{"key":"e_1_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/3501297"},{"key":"e_1_2_2_30_1","volume-title":"An algorithm for fast recovery of sparse causal graphs. Social science computer review","author":"Spirtes Peter","year":"1991","unstructured":"Peter Spirtes and Clark Glymour. 1991. An algorithm for fast recovery of sparse causal graphs. Social science computer review, Vol. 9, 1 (1991), 62--72."},{"key":"e_1_2_2_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/1993574.1993601"},{"key":"e_1_2_2_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2018.2804764"},{"key":"e_1_2_2_33_1","volume-title":"International conference on machine learning (ICML). 3319--3328","author":"Sundararajan Mukund","year":"2017","unstructured":"Mukund Sundararajan, Ankur Taly, and Qiqi Yan. 2017. Axiomatic attribution for deep networks. In International conference on machine learning (ICML). 3319--3328."},{"key":"e_1_2_2_34_1","volume-title":"Istemi Ekin Akkus, et al","author":"Thalheim J\u00f6rg","year":"2017","unstructured":"J\u00f6rg Thalheim, Antonio Rodrigues, Istemi Ekin Akkus, et al. 2017. Sieve: Actionable insights from monitored metrics in microservices. arXiv preprint arXiv:1709.06686 (2017)."},{"key":"e_1_2_2_35_1","doi-asserted-by":"publisher","DOI":"10.1111\/j.2517-6161.1996.tb02080.x"},{"key":"e_1_2_2_36_1","doi-asserted-by":"publisher","DOI":"10.1162\/15324430152748236"},{"key":"e_1_2_2_37_1","volume-title":"International Conference on Machine Learning (ICML).","author":"Wang Guanchu","year":"2022","unstructured":"Guanchu Wang, Yu-Neng Chuang, Mengnan Du, et al. 2022. Accelerating Shapley Explanation via Contributive Cooperator Selection. In International Conference on Machine Learning (ICML)."},{"key":"e_1_2_2_38_1","doi-asserted-by":"publisher","DOI":"10.14778\/3352063.3352105"},{"key":"e_1_2_2_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/CCGRID.2018.00076"},{"key":"e_1_2_2_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/TNET.2018.2843805"},{"key":"e_1_2_2_41_1","doi-asserted-by":"publisher","DOI":"10.14778\/3554821.3554830"},{"key":"e_1_2_2_42_1","first-page":"124","article-title":"Fast Bayesian Inference of Sparse Networks with Automatic Sparsity Determination","volume":"21","author":"Yu Hang","year":"2020","unstructured":"Hang Yu, Songwei Wu, Luyin Xin, et al. 2020. Fast Bayesian Inference of Sparse Networks with Automatic Sparsity Determination. J. Mach. Learn. Res., Vol. 21 (2020), 124--1.","journal-title":"J. Mach. Learn. Res."},{"key":"e_1_2_2_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSP.2019.2953651"},{"key":"e_1_2_2_44_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-10590-1_53"},{"key":"e_1_2_2_45_1","volume-title":"On assessing prior distributions and Bayesian regression analysis with g-prior distributions. Bayesian inference and decision techniques","author":"Zellner Arnold","year":"1986","unstructured":"Arnold Zellner. 1986. On assessing prior distributions and Bayesian regression analysis with g-prior distributions. Bayesian inference and decision techniques (1986)."},{"key":"e_1_2_2_46_1","unstructured":"Kai Zhang Chao Tian Kun Zhang et al. 2021c. A Fast PC Algorithm with Reversed-order Pruning and A Parallelization Strategy. arXiv preprint arXiv:2109.04626 (2021)."},{"key":"e_1_2_2_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/3447548.3467190"},{"key":"e_1_2_2_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/3459637.3481903"},{"key":"e_1_2_2_49_1","volume-title":"Advances in Neural Information Processing Systems (NIPS)","volume":"31","author":"Zheng Xun","year":"2018","unstructured":"Xun Zheng, Bryon Aragam, Pradeep K Ravikumar, et al. 2018. Dags with no tears: Continuous optimization for structure learning. Advances in Neural Information Processing Systems (NIPS), Vol. 31 (2018)."},{"key":"e_1_2_2_50_1","doi-asserted-by":"publisher","DOI":"10.1111\/j.1467-9868.2005.00503.x"}],"container-title":["Proceedings of the ACM on Management of Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3588949","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3588949","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:47:38Z","timestamp":1750178858000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3588949"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,5,26]]},"references-count":50,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2023,5,26]]}},"alternative-id":["10.1145\/3588949"],"URL":"https:\/\/doi.org\/10.1145\/3588949","relation":{},"ISSN":["2836-6573"],"issn-type":[{"type":"electronic","value":"2836-6573"}],"subject":[],"published":{"date-parts":[[2023,5,26]]}}}