{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,25]],"date-time":"2026-02-25T17:36:38Z","timestamp":1772040998330,"version":"3.50.1"},"reference-count":90,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2025,2,23]],"date-time":"2025-02-23T00:00:00Z","timestamp":1740268800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Research Grants Council of the Hong Kong Special Administrative Region, China","award":["CUHK 14206921"],"award-info":[{"award-number":["CUHK 14206921"]}]},{"name":"Fundamental Research Funds for the Central Universities, Sun Yat-sen University","award":["76250-31610005"],"award-info":[{"award-number":["76250-31610005"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Softw. Eng. Methodol."],"published-print":{"date-parts":[[2025,3,31]]},"abstract":"<jats:p>Cloud systems, typically comprised of various components (e.g., microservices), are susceptible to performance issues, which may cause service-level agreement violations and financial losses. Identifying performance issues is thus of paramount importance for cloud vendors. In current practice, crucial metrics, i.e., Key Performance Indicators (KPIs), are monitored periodically to provide insight into the operational status of components. Identifying performance issues is often formulated as an anomaly detection problem, which is tackled by analyzing each metric independently. However, this approach overlooks the complex dependencies existing among cloud components. Some graph neural network-based methods take both temporal and relational information into account; however, the correlation violations in the metrics that serve as indicators of underlying performance issues are difficult for them to identify. 
Furthermore, the large number of components in a cloud system results in a vast array of noisy metrics. This complexity makes it impractical for engineers to fully comprehend the correlations, making it challenging to identify performance issues accurately. To address these limitations, we propose Identifying Performance Issues based on Relational-Temporal Features (ISOLATE), a learning-based approach that leverages both the relational and temporal features of metrics to identify performance issues. In particular, it adopts a graph neural network with attention to characterize the relations among metrics and extracts long-term and multi-scale temporal patterns using a GRU and a convolutional network, respectively. The learned graph attention weights can be further used to localize the correlation-violated metrics. Moreover, to relieve the impact of noisy data, ISOLATE utilizes a Positive Unlabeled (PU) Learning strategy that tags pseudo-labels based on a small portion of confirmed negative examples. Extensive evaluation on both public and industrial datasets shows that ISOLATE outperforms all baseline models, achieving an F1 score of 0.945 and a Hit Rate@3 of 0.920. The ablation study also confirms the effectiveness of the relational-temporal features and the PU-Learning strategy. 
Furthermore, we share the success stories of leveraging ISOLATE to identify performance issues in Huawei Cloud, which demonstrates its superiority in practice.<\/jats:p>","DOI":"10.1145\/3702978","type":"journal-article","created":{"date-parts":[[2024,11,5]],"date-time":"2024-11-05T16:29:38Z","timestamp":1730824178000},"page":"1-31","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["Identifying Performance Issues in Cloud Service Systems Based on Relational-Temporal Features"],"prefix":"10.1145","volume":"34","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1096-2732","authenticated-orcid":false,"given":"Wenwei","family":"Gu","sequence":"first","affiliation":[{"name":"The Chinese University of Hong Kong, Hong Kong, Hong Kong SAR"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0037-1912","authenticated-orcid":false,"given":"Jinyang","family":"Liu","sequence":"additional","affiliation":[{"name":"The Chinese University of Hong Kong, Hong Kong, Hong Kong SAR"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5158-6716","authenticated-orcid":false,"given":"Zhuangbin","family":"Chen","sequence":"additional","affiliation":[{"name":"Sun Yat-sen University, Zhuhai, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8262-9608","authenticated-orcid":false,"given":"Jianping","family":"Zhang","sequence":"additional","affiliation":[{"name":"The Chinese University of Hong Kong, Hong Kong, Hong Kong SAR"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3338-8561","authenticated-orcid":false,"given":"Yuxin","family":"Su","sequence":"additional","affiliation":[{"name":"Sun Yat-sen University, Zhuhai, 
China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5831-9474","authenticated-orcid":false,"given":"Jiazhen","family":"Gu","sequence":"additional","affiliation":[{"name":"The Chinese University of Hong Kong, Hong Kong, Hong Kong SAR"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-5556-4004","authenticated-orcid":false,"given":"Cong","family":"Feng","sequence":"additional","affiliation":[{"name":"Huawei Cloud Computing Technology Co., Ltd, Shenzhen, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6307-7310","authenticated-orcid":false,"given":"Zengyin","family":"Yang","sequence":"additional","affiliation":[{"name":"Huawei Cloud Computing Technology Co., Ltd, Shenzhen, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9733-4346","authenticated-orcid":false,"given":"Yongqiang","family":"Yang","sequence":"additional","affiliation":[{"name":"Huawei Cloud Computing Technology Co., Ltd, Shenzhen, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3666-5798","authenticated-orcid":false,"given":"Michael R.","family":"Lyu","sequence":"additional","affiliation":[{"name":"The Chinese University of Hong Kong, Hong Kong, Hong Kong SAR"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,2,23]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"publisher","DOI":"10.1145\/3611643.3616316"},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/INM.2015.7140319"},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1145\/3394486.3403392"},{"key":"e_1_3_1_5_2","first-page":"777","volume-title":"Proceedings of the 2013 International Conference on Green Computing, Communication and Conservation of Energy (ICGCE)","author":"Bali Malvinder 
Singh","year":"2013","unstructured":"Malvinder Singh Bali and Shivani Khurana. 2013. Effect of latency on network and end user domains in cloud computing. In Proceedings of the 2013 International Conference on Green Computing, Communication and Conservation of Energy (ICGCE). IEEE, 777\u2013782."},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33019428"},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1145\/342009.335388"},{"key":"e_1_3_1_8_2","unstructured":"Shaked Brody Uri Alon and Eran Yahav. 2021. How attentive are graph attention networks? arXiv:2105.14491. Retrieved from https:\/\/arxiv.org\/abs\/2105.14491"},{"key":"e_1_3_1_9_2","unstructured":"C\u0103t\u0103lina Cangea Petar Veli\u010dkovi\u0107 Nikola Jovanovi\u0107 Thomas Kipf and Pietro Li\u00f2. 2018. Towards sparse hierarchical graph classifiers. arXiv:1811.01287. Retrieved from https:\/\/arxiv.org\/abs\/1811.01287"},{"key":"e_1_3_1_10_2","first-page":"12","article-title":"Learning graph structures with transformer for multivariate time-series anomaly detection in IoT","volume":"9","author":"Chen Zekai","year":"2021","unstructured":"Zekai Chen, Dingshuo Chen, Xiao Zhang, Zixuan Yuan, and Xiuzhen Cheng. 2021. Learning graph structures with transformer for multivariate time-series anomaly detection in IoT. IEEE Internet of Things Journal 9, 12 (2021), 9179\u20139189.","journal-title":"IEEE Internet of Things Journal"},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1145\/3368089.3417055"},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1145\/3510003.3510085"},{"key":"e_1_3_1_13_2","first-page":"430","volume-title":"Proceedings of the 2021 36th IEEE\/ACM International Conference on Automated Software Engineering (ASE)","author":"Chen Zhuangbin","year":"2021","unstructured":"Zhuangbin Chen, Jinyang Liu, Yuxin Su, Hongyu Zhang, Xuemin Wen, Xiao Ling, Yongqiang Yang, and Michael R. Lyu. 2021. 
Graph-based incident aggregation for large-scale online service systems. In Proceedings of the 2021 36th IEEE\/ACM International Conference on Automated Software Engineering (ASE). IEEE, 430\u2013442."},{"key":"e_1_3_1_14_2","doi-asserted-by":"crossref","first-page":"30","DOI":"10.1145\/2996890.2996906","volume-title":"Proceedings of the 9th International Conference on Utility and Cloud Computing","author":"Chhetri Mohan Baruwal","year":"2016","unstructured":"Mohan Baruwal Chhetri, Quoc Bao Vo, and Ryszard Kowalczyk. 2016. CL-SLAM: Cross-layer SLA monitoring framework for cloud service-based applications. In Proceedings of the 9th International Conference on Utility and Cloud Computing, 30\u201336."},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1145\/3503222.3507743"},{"key":"e_1_3_1_16_2","first-page":"1","volume-title":"Proceedings of the ACM on Programming Languages","volume":"2","author":"Degenbaev Ulan","year":"2018","unstructured":"Ulan Degenbaev, Jochen Eisinger, Kentaro Hara, Marcel Hlopko, Michael Lippautz, and Hannes Payer. 2018. Cross-component garbage collection. Proceedings of the ACM on Programming Languages 2, OOPSLA (2018), 1\u201324."},{"key":"e_1_3_1_17_2","first-page":"4027","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","volume":"35","author":"Deng Ailin","year":"2021","unstructured":"Ailin Deng and Bryan Hooi. 2021. Graph neural network-based anomaly detection in multivariate time series. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35, 4027\u20134035."},{"key":"e_1_3_1_18_2","first-page":"324","volume-title":"Proceedings of the 2016 31st Youth Academic Annual Conference of Chinese Association of Automation (YAC)","author":"Fu Rui","year":"2016","unstructured":"Rui Fu, Zuo Zhang, and Li Li. 2016. Using LSTM and GRU neural network methods for traffic flow prediction. 
In Proceedings of the 2016 31st Youth Academic Annual Conference of Chinese Association of Automation (YAC). IEEE, 324\u2013328."},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jnca.2017.04.007"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1145\/3542929.3563482"},{"key":"e_1_3_1_21_2","first-page":"1387","volume-title":"Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC\/FSE)","author":"Guo Xiaofeng","year":"2020","unstructured":"Xiaofeng Guo, Xin Peng, Hanzhang Wang, Wanxue Li, Huai Jiang, Dan Ding, Tao Xie, and Liangfei Su. 2020. Graph-based trace analysis for microservice architecture understanding and problem diagnosis. In Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC\/FSE), 1387\u20131397."},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1145\/3534678.3539117"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1145\/3603269.3604877"},{"issue":"4","key":"e_1_3_1_24_2","first-page":"1705","article-title":"A spatiotemporal deep learning approach for unsupervised anomaly detection in cloud systems","volume":"34","author":"He Zilong","year":"2020","unstructured":"Zilong He, Pengfei Chen, Xiaoyun Li, Yongfeng Wang, Guangba Yu, Cailin Chen, Xinrui Li, and Zibin Zheng. 2020. A spatiotemporal deep learning approach for unsupervised anomaly detection in cloud systems. 
IEEE Transactions on Neural Networks and Learning Systems 34, 4 (2020), 1705\u20131719.","journal-title":"IEEE Transactions on Neural Networks and Learning Systems"},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.1145\/3485447.3511984"},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1145\/3551349.3560414"},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1145\/3219819.3219845"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1145\/2791120"},{"key":"e_1_3_1_29_2","first-page":"448","volume-title":"Proceedings of the International Conference on Machine Learning (ICML)","author":"Ioffe Sergey","year":"2015","unstructured":"Sergey Ioffe and Christian Szegedy. 2015. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the International Conference on Machine Learning (ICML). PMLR, 448\u2013456."},{"key":"e_1_3_1_30_2","first-page":"150","volume-title":"Proceedings of the 2021 IEEE\/ACM 43rd International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP)","author":"Islam Mohammad S.","year":"2021","unstructured":"Mohammad S. Islam, William Pourmajidi, Lei Zhang, John Steinbacher, Tony Erwin, and Andriy Miranskyy. 2021. Anomaly detection in a large-scale cloud platform. In Proceedings of the 2021 IEEE\/ACM 43rd International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP). IEEE, 150\u2013159."},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1145\/1555228.1555233"},{"issue":"4","key":"e_1_3_1_32_2","doi-asserted-by":"crossref","first-page":"779","DOI":"10.1109\/TPDS.2019.2945315","article-title":"Hotspot-aware hybrid memory management for in-memory key-value stores","volume":"31","author":"Jin Hai","year":"2019","unstructured":"Hai Jin, Zhiwei Li, Haikun Liu, Xiaofei Liao, and Yu Zhang. 2019. Hotspot-aware hybrid memory management for in-memory key-value stores. 
IEEE Transactions on Parallel and Distributed Systems 31, 4 (2019), 779\u2013792.","journal-title":"IEEE Transactions on Parallel and Distributed Systems"},{"key":"e_1_3_1_33_2","unstructured":"Ming Jin Huan Yee Koh Qingsong Wen Daniele Zambon Cesare Alippi Geoffrey I. Webb Irwin King and Shirui Pan. 2023. A survey on graph neural networks for time series: Forecasting classification imputation and anomaly detection. arXiv:2307.03759. Retrieved from https:\/\/arxiv.org\/abs\/2307.03759"},{"key":"e_1_3_1_34_2","unstructured":"Diederik P. Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv:1412.6980. Retrieved from https:\/\/arxiv.org\/abs\/1412.6980"},{"key":"e_1_3_1_35_2","unstructured":"Diederik P. Kingma and Max Welling. 2013. Auto-encoding variational Bayes. arXiv:1312.6114. Retrieved from https:\/\/arxiv.org\/abs\/1312.6114"},{"key":"e_1_3_1_36_2","unstructured":"Thomas N. Kipf and Max Welling. 2016. Semi-supervised classification with graph convolutional networks. arXiv:1609.02907. Retrieved from https:\/\/arxiv.org\/abs\/1609.02907"},{"key":"e_1_3_1_37_2","first-page":"1674","volume-title":"Proceedings of the Advances in Neural Information Processing Systems (NeurIPS)","author":"Kiryo Ryuichi","year":"2017","unstructured":"Ryuichi Kiryo, Gang Niu, Marthinus C. Du Plessis, and Masashi Sugiyama. 2017. Positive-unlabeled learning with non-negative risk estimator. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), 1674\u20131684."},{"key":"e_1_3_1_38_2","first-page":"1724","volume-title":"Proceedings of the 2023 IEEE\/ACM 45th International Conference on Software Engineering (ICSE)","author":"Lee Cheryl","year":"2023","unstructured":"Cheryl Lee, Tianyi Yang, Zhuangbin Chen, Yuxin Su, Yongqiang Yang, and Michael R. Lyu. 2023. Heterogeneous anomaly detection for software systems via semi-supervised cross-modal attention. 
In Proceedings of the 2023 IEEE\/ACM 45th International Conference on Software Engineering (ICSE). IEEE, 1724\u20131736."},{"key":"e_1_3_1_39_2","first-page":"3734","volume-title":"Proceedings of the International Conference on Machine Learning (ICML)","author":"Lee Junhyun","year":"2019","unstructured":"Junhyun Lee, Inyeop Lee, and Jaewoo Kang. 2019. Self-attention graph pooling. In Proceedings of the International Conference on Machine Learning (ICML). PMLR, 3734\u20133743."},{"key":"e_1_3_1_40_2","first-page":"121","volume-title":"Proceedings of the 2022 IEEE 33rd International Symposium on Software Reliability Engineering (ISSRE)","author":"Li Xiaoyun","year":"2022","unstructured":"Xiaoyun Li, Guangba Yu, Pengfei Chen, Hongyang Chen, and Zhekang Chen. 2022. Going through the life cycle of faults in clouds: Guidelines on fault handling. In Proceedings of the 2022 IEEE 33rd International Symposium on Software Reliability Engineering (ISSRE). IEEE, 121\u2013132."},{"key":"e_1_3_1_41_2","first-page":"3","volume-title":"Proceedings of the 6th International Conference on Service-Oriented Computing (ICSOC \u201918)","author":"Lin JinJin","year":"2018","unstructured":"JinJin Lin, Pengfei Chen, and Zibin Zheng. 2018. Microscope: Pinpoint performance issues with causal graphs in micro-service environments. In Proceedings of the 6th International Conference on Service-Oriented Computing (ICSOC \u201918). Springer, 3\u201320."},{"key":"e_1_3_1_42_2","first-page":"413","volume-title":"Proceedings of the 2008 8th IEEE International Conference on Data Mining (ICDM)","author":"Liu Fei Tony","year":"2008","unstructured":"Fei Tony Liu, Kai Ming Ting, and Zhi-Hua Zhou. 2008. Isolation forest. In Proceedings of the 2008 8th IEEE International Conference on Data Mining (ICDM). IEEE, 413\u2013422."},{"key":"e_1_3_1_43_2","unstructured":"Jinyang Liu Wenwei Gu Zhuangbin Chen Yichen Li Yuxin Su and Michael R. Lyu. 2024. 
MTAD: Tools and benchmarks for multivariate time series anomaly detection. arXiv:2401.06175. Retrieved from https:\/\/arxiv.org\/abs\/2401.06175"},{"key":"e_1_3_1_44_2","unstructured":"Jinyang Liu Junjie Huang Yintong Huo Zhihan Jiang Jiazhen Gu Zhuangbin Chen Cong Feng Minzhi Yan and Michael R. Lyu. 2023. Scalable and adaptive log-based anomaly detection with expert in the loop. arXiv:2306.05032. Retrieved from https:\/\/arxiv.org\/abs\/2306.05032"},{"key":"e_1_3_1_45_2","first-page":"36","volume-title":"Proceedings of the 2023 IEEE 34th International Symposium on Software Reliability Engineering (ISSRE)","author":"Liu Jinyang","year":"2023","unstructured":"Jinyang Liu, Tianyi Yang, Zhuangbin Chen, Yuxin Su, Cong Feng, Zengyin Yang, and Michael R. Lyu. 2023. Practical anomaly detection over multivariate monitoring metrics for online services. In Proceedings of the 2023 IEEE 34th International Symposium on Software Reliability Engineering (ISSRE). IEEE, 36\u201345."},{"key":"e_1_3_1_46_2","doi-asserted-by":"publisher","DOI":"10.1145\/3472883.3487003"},{"key":"e_1_3_1_47_2","doi-asserted-by":"publisher","DOI":"10.14778\/3389133.3389136"},{"key":"e_1_3_1_48_2","first-page":"413","volume-title":"Proceedings of the 2021 USENIX Annual Technical Conference (USENIX ATC \u201921)","author":"Ma Minghua","year":"2021","unstructured":"Minghua Ma, Shenglin Zhang, Junjie Chen, Jim Xu, Haozhe Li, Yongliang Lin, Xiaohui Nie, Bo Zhou, Yong Wang, and Dan Pei. 2021. Jump-starting multivariate time series anomaly detection for online service systems. 
In Proceedings of the 2021 USENIX Annual Technical Conference (USENIX ATC \u201921), 413\u2013426."},{"key":"e_1_3_1_49_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10922-019-09504-0"},{"key":"e_1_3_1_50_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jnca.2016.06.003"},{"key":"e_1_3_1_51_2","first-page":"1","volume-title":"Proceedings of the Managing Large-Scale Systems via the Analysis of System Logs and the Application of Machine Learning Techniques","author":"Nguyen Hiep","year":"2011","unstructured":"Hiep Nguyen, Yongmin Tan, and Xiaohui Gu. 2011. Pal: Propagation-aware anomaly localization for cloud hosted distributed applications. In Proceedings of the Managing Large-Scale Systems via the Analysis of System Logs and the Application of Machine Learning Techniques, 1\u20138."},{"key":"e_1_3_1_52_2","unstructured":"Aaron van den Oord Sander Dieleman Heiga Zen Karen Simonyan Oriol Vinyals Alex Graves Nal Kalchbrenner Andrew Senior and Koray Kavukcuoglu. 2016. Wavenet: A generative model for raw audio. arXiv:1609.03499. Retrieved from https:\/\/arxiv.org\/abs\/1609.03499"},{"issue":"1","key":"e_1_3_1_53_2","first-page":"16","article-title":"Big data processing with Hadoop-MapReduce in cloud systems","volume":"2","author":"Padhy Rabi Prasad","year":"2013","unstructured":"Rabi Prasad Padhy. 2013. Big data processing with Hadoop-MapReduce in cloud systems. International Journal of Cloud Computing and Services Science 2, 1 (2013), 16.","journal-title":"International Journal of Cloud Computing and Services Science"},{"key":"e_1_3_1_54_2","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2018.2801475"},{"key":"e_1_3_1_55_2","first-page":"769","volume-title":"Proceedings of the IEEE 7th International Conference on Cloud Computing","author":"Peiris Manjula","year":"2014","unstructured":"Manjula Peiris, James H. Hill, Jorgen Thelin, Sergey Bykov, Gabriel Kliot, and Christian Konig. 2014. Pad: Performance anomaly detection in multi-server distributed systems. 
In Proceedings of the IEEE 7th International Conference on Cloud Computing. IEEE, 769\u2013776."},{"key":"e_1_3_1_56_2","first-page":"776","volume-title":"Proceedings of the 2018 IEEE 16th International Conference on Dependable, Autonomic and Secure Computing, 16th International Conference on Pervasive Intelligence and Computing, 4th International Conference on Big Data Intelligence and Computing and Cyber Science and Technology Congress (DASC\/PiCom\/DataCom\/CyberSciTech)","author":"Picoreti Rodolfo","year":"2018","unstructured":"Rodolfo Picoreti, Alexandre Pereira do Carmo, Felippe Mendonca de Queiroz, Anilton Salles Garcia, Raquel Frizera Vassallo, and Dimitra Simeonidou. 2018. Multilevel observability in cloud orchestration. In Proceedings of the 2018 IEEE 16th International Conference on Dependable, Autonomic and Secure Computing, 16th International Conference on Pervasive Intelligence and Computing, 4th International Conference on Big Data Intelligence and Computing and Cyber Science and Technology Congress (DASC\/PiCom\/DataCom\/CyberSciTech). IEEE, 776\u2013784."},{"key":"e_1_3_1_57_2","doi-asserted-by":"crossref","first-page":"626","DOI":"10.1007\/978-3-642-10665-1_63","volume-title":"Proceedings of the IEEE International Conference on Cloud Computing","author":"Qian Ling","year":"2009","unstructured":"Ling Qian, Zhiguo Luo, Yujian Du, and Leitao Guo. 2009. Cloud computing: An overview. In Proceedings of the IEEE International Conference on Cloud Computing. Springer, 626\u2013631."},{"key":"e_1_3_1_58_2","doi-asserted-by":"publisher","DOI":"10.1145\/3292500.3330680"},{"key":"e_1_3_1_59_2","first-page":"1278","volume-title":"Proceedings of the International Conference on Machine Learning (ICML)","author":"Rezende Danilo Jimenez","year":"2014","unstructured":"Danilo Jimenez Rezende, Shakir Mohamed, and Daan Wierstra. 2014. Stochastic backpropagation and approximate inference in deep generative models. 
In Proceedings of the International Conference on Machine Learning (ICML). PMLR, 1278\u20131286."},{"key":"e_1_3_1_60_2","first-page":"7","volume-title":"Proceedings of the 2021 IEEE\/ACM International Workshop on Cloud Intelligence (CloudIntelligence)","author":"Scheinert Dominik","year":"2021","unstructured":"Dominik Scheinert, Alexander Acker, Lauritz Thamsen, Morgan K. Geldenhuys, and Odej Kao. 2021. Learning dependencies in distributed cloud applications to identify and localize anomalies. In Proceedings of the 2021 IEEE\/ACM International Workshop on Cloud Intelligence (CloudIntelligence). IEEE, 7\u201312."},{"key":"e_1_3_1_61_2","doi-asserted-by":"publisher","DOI":"10.1162\/089976601750264965"},{"key":"e_1_3_1_62_2","doi-asserted-by":"crossref","first-page":"3215","DOI":"10.1145\/3308558.3313653","volume-title":"Proceedings of the World Wide Web Conference (WWW)","author":"Shan Huasong","year":"2019","unstructured":"Huasong Shan, Yuan Chen, Haifeng Liu, Yunpeng Zhang, Xiao Xiao, Xiaofeng He, Min Li, and Wei Ding. 2019. \u03f5-diagnosis: Unsupervised and real-time diagnosis of small-window long-tail latency in large-scale microservice platforms. In Proceedings of the World Wide Web Conference (WWW), 3215\u20133222."},{"key":"e_1_3_1_63_2","first-page":"13016","volume-title":"Proceedings of the Advances in Neural Information Processing Systems (NeurIPS)","author":"Shen Lifeng","year":"2020","unstructured":"Lifeng Shen, Zhuocong Li, and James Kwok. 2020. Timeseries anomaly detection using temporal hierarchical one-class network. 
In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), 13016\u201313026."},{"key":"e_1_3_1_64_2","doi-asserted-by":"publisher","DOI":"10.1145\/3310361"},{"key":"e_1_3_1_65_2","doi-asserted-by":"publisher","DOI":"10.1145\/3501297"},{"key":"e_1_3_1_66_2","doi-asserted-by":"publisher","DOI":"10.1145\/3292500.3330672"},{"key":"e_1_3_1_67_2","first-page":"1201","volume-title":"Proceedings of the VLDB Endowment","volume":"15","author":"Tuli Shreshth","year":"2022","unstructured":"Shreshth Tuli, Giuliano Casale, and Nicholas R. Jennings. 2022. TranAD: Deep transformer networks for anomaly detection in multivariate time series data. Proceedings of the VLDB Endowment 15, 6 (2022), 1201\u20131214."},{"key":"e_1_3_1_68_2","first-page":"6000","volume-title":"Proceedings of the Advances in Neural Information Processing Systems (NeurIPS)","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), 6000\u20136010."},{"key":"e_1_3_1_69_2","first-page":"419","volume-title":"Proceedings of the 2021 36th IEEE\/ACM International Conference on Automated Software Engineering (ASE)","author":"Wang Hanzhang","year":"2021","unstructured":"Hanzhang Wang, Zhengkai Wu, Huai Jiang, Yichao Huang, Jiamu Wang, Selcuk Kopru, and Tao Xie. 2021. Groot: An event-graph-based approach for root cause analysis in industrial settings. In Proceedings of the 2021 36th IEEE\/ACM International Conference on Automated Software Engineering (ASE). 
IEEE, 419\u2013429."},{"key":"e_1_3_1_70_2","doi-asserted-by":"crossref","first-page":"263","DOI":"10.1109\/DSN53405.2022.00036","volume-title":"Proceedings of the 2022 52nd Annual IEEE\/IFIP International Conference on Dependable Systems and Networks (DSN)","author":"Wang Wenlu","year":"2022","unstructured":"Wenlu Wang, Pengfei Chen, Yibin Xu, and Zilong He. 2022. Active-MTSAD: Multivariate time series anomaly detection with active learning. In Proceedings of the 2022 52nd Annual IEEE\/IFIP International Conference on Dependable Systems and Networks (DSN). IEEE, 263\u2013274."},{"key":"e_1_3_1_71_2","first-page":"885","volume-title":"Proceedings of the 2021 IEEE\/ACM 43rd International Conference on Software Engineering (ICSE)","author":"Wang Yaohui","year":"2021","unstructured":"Yaohui Wang, Guozheng Li, Zijian Wang, Yu Kang, Yangfan Zhou, Hongyu Zhang, Feng Gao, Jeffrey Sun, Li Yang, Pochian Lee, et al. 2021. Fast outage analysis of large-scale production clouds with service correlation mining. In Proceedings of the 2021 IEEE\/ACM 43rd International Conference on Software Engineering (ICSE). IEEE, 885\u2013896."},{"key":"e_1_3_1_72_2","doi-asserted-by":"publisher","DOI":"10.1145\/2286996.2287001"},{"key":"e_1_3_1_73_2","doi-asserted-by":"publisher","DOI":"10.1145\/3377811.3380396"},{"key":"e_1_3_1_74_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNET.2018.2843805"},{"key":"e_1_3_1_75_2","unstructured":"Bing Xu Naiyan Wang Tianqi Chen and Mu Li. 2015. Empirical evaluation of rectified activations in convolutional network. arXiv:1505.00853. 
Retrieved from https:\/\/arxiv.org\/abs\/1505.00853"},{"key":"e_1_3_1_76_2","doi-asserted-by":"publisher","DOI":"10.1145\/3178876.3185996"},{"key":"e_1_3_1_77_2","doi-asserted-by":"publisher","DOI":"10.1145\/3534678.3539109"},{"key":"e_1_3_1_78_2","first-page":"476","volume-title":"Proceedings of the 2021 IEEE International Conference on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA\/BDCloud\/SocialCom\/SustainCom)","author":"Yan Shili","year":"2021","unstructured":"Shili Yan, Bing Tang, Jincheng Luo, Xing Fu, and Xiaoyuan Zhang. 2021. Unsupervised anomaly detection with variational auto-encoder and local outliers factor for KPIs. In Proceedings of the 2021 IEEE International Conference on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA\/BDCloud\/SocialCom\/SustainCom). IEEE, 476\u2013483."},{"key":"e_1_3_1_79_2","first-page":"24","volume-title":"Proceedings of the ACM on Software Engineering","volume":"1","author":"Yu Guangba","year":"2024","unstructured":"Guangba Yu, Pengfei Chen, Zilong He, Qiuyu Yan, Yu Luo, Fangyuan Li, and Zibin Zheng. 2024. ChangeRCA: Finding root causes from software changes in large online systems. In Proceedings of the ACM on Software Engineering 1, FSE (2024), 24\u201346."},{"key":"e_1_3_1_80_2","first-page":"1763","volume-title":"Proceedings of the 2023 IEEE\/ACM 45th International Conference on Software Engineering (ICSE)","author":"Yu Guangba","year":"2023","unstructured":"Guangba Yu, Pengfei Chen, Pairui Li, Tianjun Weng, Haibing Zheng, Yuetang Deng, and Zibin Zheng. 2023. Logreducer: Identify and reduce log hotspots in kernel on the fly. In Proceedings of the 2023 IEEE\/ACM 45th International Conference on Software Engineering (ICSE). 
IEEE, 1763\u20131775."},{"key":"e_1_3_1_81_2","doi-asserted-by":"publisher","DOI":"10.1145\/3611643.3616249"},{"key":"e_1_3_1_82_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33011409"},{"key":"e_1_3_1_83_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neunet.2023.12.023"},{"key":"e_1_3_1_84_2","doi-asserted-by":"publisher","DOI":"10.1145\/3580305.3599902"},{"issue":"3","key":"e_1_3_1_85_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3440757","article-title":"Predicting performance anomalies in software systems at run-time","volume":"30","author":"Zhao Guoliang","year":"2021","unstructured":"Guoliang Zhao, Safwat Hassan, Ying Zou, Derek Truong, and Toby Corbin. 2021. Predicting performance anomalies in software systems at run-time. ACM Transactions on Software Engineering and Methodology 30, 3 (2021), 1\u201333.","journal-title":"ACM Transactions on Software Engineering and Methodology"},{"key":"e_1_3_1_86_2","doi-asserted-by":"crossref","first-page":"841","DOI":"10.1109\/ICDM50108.2020.00093","volume-title":"Proceedings of the 2020 IEEE International Conference on Data Mining (ICDM)","author":"Zhao Hang","year":"2020","unstructured":"Hang Zhao, Yujing Wang, Juanyong Duan, Congrui Huang, Defu Cao, Yunhai Tong, Bixiong Xu, Jing Bai, Jie Tong, and Qi Zhang. 2020. Multivariate time-series anomaly detection via graph attention network. In Proceedings of the 2020 IEEE International Conference on Data Mining (ICDM). IEEE, 841\u2013850."},{"key":"e_1_3_1_87_2","doi-asserted-by":"publisher","DOI":"10.1145\/3377813.3381363"},{"key":"e_1_3_1_88_2","doi-asserted-by":"publisher","DOI":"10.1145\/3468264.3468543"},{"key":"e_1_3_1_89_2","unstructured":"Yue Zhao Zain Nasrullah and Zheng Li. 2019. Pyod: A python toolbox for scalable outlier detection. arXiv:1901.01588. Retrieved from https:\/\/arxiv.org\/abs\/1901.01588"},{"key":"e_1_3_1_90_2","unstructured":"Renyi Zhong Yichen Li Jinxi Kuang Wenwei Gu Yintong Huo and Michael R. Lyu. 2024. 
Automated defects detection and fix in logging statement. arXiv:2408.03101. Retrieved from https:\/\/arxiv.org\/abs\/2408.03101"},{"key":"e_1_3_1_91_2","volume-title":"Proceedings of the International Conference on Learning Representations (ICLR)","author":"Zong Bo","year":"2018","unstructured":"Bo Zong, Qi Song, Martin Renqiang Min, Wei Cheng, Cristian Lumezanu, Daeki Cho, and Haifeng Chen. 2018. Deep autoencoding Gaussian mixture model for unsupervised anomaly detection. In Proceedings of the International Conference on Learning Representations (ICLR)."}],"container-title":["ACM Transactions on Software Engineering and Methodology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3702978","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3702978","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:18:04Z","timestamp":1750295884000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3702978"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,2,23]]},"references-count":90,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2025,3,31]]}},"alternative-id":["10.1145\/3702978"],"URL":"https:\/\/doi.org\/10.1145\/3702978","relation":{},"ISSN":["1049-331X","1557-7392"],"issn-type":[{"value":"1049-331X","type":"print"},{"value":"1557-7392","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,2,23]]},"assertion":[{"value":"2023-12-09","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-09-17","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2025-02-23","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}