{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,21]],"date-time":"2026-05-21T10:24:55Z","timestamp":1779359095847,"version":"3.51.4"},"reference-count":72,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2021,4,23]],"date-time":"2021-04-23T00:00:00Z","timestamp":1619136000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Softw. Eng. Methodol."],"published-print":{"date-parts":[[2021,7,31]]},"abstract":"<jats:p>\n            High performance is a critical factor to achieve and maintain the success of a software system. Performance anomalies represent the performance degradation issues (e.g., slowing down in system response times) of software systems at run-time. Performance anomalies can cause a dramatically negative impact on users\u2019 satisfaction. Prior studies propose different approaches to detect anomalies by analyzing execution logs and resource utilization metrics after the anomalies have happened. However, the prior detection approaches cannot predict the anomalies ahead of time; such limitation causes an inevitable delay in taking corrective actions to prevent performance anomalies from happening. We propose an approach that can\n            <jats:italic>predict performance anomalies<\/jats:italic>\n            in software systems and raise anomaly warnings in advance. Our approach uses a Long-Short Term Memory neural network to capture the normal behaviors of a software system. Then, our approach predicts performance anomalies by identifying the early deviations from the captured normal system behaviors. We conduct extensive experiments to evaluate our approach using two real-world software systems (i.e., Elasticsearch and Hadoop). 
We compare the performance of our approach with two baselines. The first baseline is one state-of-the-art baseline called Unsupervised Behavior Learning. The second baseline predicts performance anomalies by checking if the resource utilization exceeds pre-defined thresholds. Our results show that our approach can predict various performance anomalies with high precision (i.e., 97\u2013100%) and recall (i.e., 80\u2013100%), while the baselines achieve 25\u201397% precision and 93\u2013100% recall. For a range of performance anomalies, our approach can achieve sufficient lead times that vary from 20 to 1,403 s (i.e., 23.4 min). We also demonstrate the ability of our approach to predict the performance anomalies that are caused by real-world performance bugs. For predicting performance anomalies that are caused by real-world performance bugs, our approach achieves 95\u2013100% precision and 87\u2013100% recall, while the baselines achieve 49\u201383% precision and 100% recall. The obtained results show that our approach outperforms the existing anomaly prediction approaches and is able to predict performance anomalies in real-world systems.\n          <\/jats:p>","DOI":"10.1145\/3440757","type":"journal-article","created":{"date-parts":[[2021,4,23]],"date-time":"2021-04-23T10:32:50Z","timestamp":1619173970000},"page":"1-33","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":24,"title":["Predicting Performance Anomalies in Software Systems at Run-time"],"prefix":"10.1145","volume":"30","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0152-5100","authenticated-orcid":false,"given":"Guoliang","family":"Zhao","sequence":"first","affiliation":[{"name":"School of Computing, Queen\u2019s University, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Safwat","family":"Hassan","sequence":"additional","affiliation":[{"name":"Department of Engineering, Thompson Rivers University, 
Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ying","family":"Zou","sequence":"additional","affiliation":[{"name":"Department of Electrical and Computer Engineering, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Derek","family":"Truong","sequence":"additional","affiliation":[{"name":"IBM, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Toby","family":"Corbin","sequence":"additional","affiliation":[{"name":"IBM, United Kingdom"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2021,4,23]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"Amazon. [n.d.]. Amazon EC2 Instance Types. Retrieved from\u00a0https:\/\/aws.amazon.com\/ec2\/instance-types\/."},{"key":"e_1_2_1_2_1","unstructured":"Apache. [n.d.]. Apache Hadoop RandomTextWriter application. Retrieved from\u00a0https:\/\/hadoop.apache.org\/docs\/r1.2.1\/api\/org\/apache\/hadoop\/examples\/RandomTextWriter.html."},{"key":"e_1_2_1_3_1","unstructured":"Apache. [n.d.]. Apache Hadoop System. Retrieved from http:\/\/hadoop.apache.org\/."},{"key":"e_1_2_1_4_1","volume-title":"Proceedings of the USENIX Symposium on Operating Systems Design and Implementation (OSDI\u201904)","volume":"4","author":"Barham Paul","year":"2004","unstructured":"Paul Barham, Austin Donnelly, Rebecca Isaacs, and Richard Mortier. 2004. Using magpie for request extraction and workload modelling. In Proceedings of the USENIX Symposium on Operating Systems Design and Implementation (OSDI\u201904), Vol. 4. 18--18."},{"key":"e_1_2_1_5_1","volume-title":"Proceedings of the 27th International Conference on Software Engineering. 571--579","author":"Berner Stefan","unstructured":"Stefan Berner, Roland Weber, and Rudolf K. Keller. 2005. Observations and lessons learned from automated testing. In Proceedings of the 27th International Conference on Software Engineering. 
571--579."},{"key":"e_1_2_1_6_1","volume-title":"Proceedings of the Systems Modeling Language Conference (SysML).","author":"Bod\u00edk Peter","year":"2008","unstructured":"Peter Bod\u00edk, Moises Goldszmidt, and Armando Fox. 2008. HiLighter: Automatically building robust signatures of performance behavior for small-and large-scale systems. In Proceedings of the Systems Modeling Language Conference (SysML)."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/DSAA.2015.7344872"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICAC.2004.1301345"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/DSN.2008.4630116"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/1095810.1095821"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/2371536.2371572"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2016.0103"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/3133956.3134015"},{"key":"e_1_2_1_14_1","unstructured":"Elastic. [n.d.]. Rally. Retrieved from https:\/\/github.com\/elastic\/rally."},{"key":"e_1_2_1_15_1","unstructured":"ElasticSearch. [n.d.]. Elasticsearch. Retrieved from https:\/\/www.elastic.co."},{"key":"e_1_2_1_16_1","unstructured":"ElasticSearch. [n.d.]. Elasticsearch Reference. Retrieved from https:\/\/www.elastic.co\/guide\/en\/elasticsearch\/reference\/5.3\/modules-threadpool.html."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/2382553.2382555"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2009.60"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2009.128"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/DSN.2006.70"},{"key":"e_1_2_1_21_1","volume-title":"Proceedings of the IEEE International Conference on Web Services (ICWS\u201917)","author":"He Pinjia","unstructured":"Pinjia He, Jieming Zhu, Zibin Zheng, and Michael R. Lyu. 
2017. Drain: An online log parsing approach with fixed depth tree. In Proceedings of the IEEE International Conference on Web Services (ICWS\u201917). IEEE, 33--40."},{"key":"e_1_2_1_22_1","volume-title":"Advances in Neural Information Processing Systems","author":"Hermans Michiel","unstructured":"Michiel Hermans and Benjamin Schrauwen. 2013. Training and analysing deep recurrent neural networks. In Advances in Neural Information Processing Systems. MIT Press, 190--198."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"e_1_2_1_24_1","volume-title":"Proceedings of the IEEE\/ACM 24th International Symposium on Quality of Service (IWQoS\u201916)","author":"Huang Shaohan","year":"2016","unstructured":"Shaohan Huang, Carol Fung, Kui Wang, Polo Pei, Zhongzhi Luan, and Depei Qian. 2016. Using recurrent neural networks toward black-box system anomaly prediction. In Proceedings of the IEEE\/ACM 24th International Symposium on Quality of Service (IWQoS\u201916). IEEE, 1--10."},{"key":"e_1_2_1_25_1","unstructured":"IBM. [n.d.]. IBM Javametrics. Retrieved from https:\/\/developer.ibm.com\/javasdk\/application-metrics-java\/."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10586-006-0008-1"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/TDSC.2006.52"},{"key":"e_1_2_1_28_1","volume-title":"Proceedings of the IEEE\/IFIP International Conference on Dependable Systems & Networks. IEEE, 285--294","author":"Jiang Miao","unstructured":"Miao Jiang, Mohammad A. Munawar, Thomas Reidemeister, and Paul A. S. Ward. 2009. Automatic fault detection and diagnosis in complex software systems by information-theoretic monitoring. In Proceedings of the IEEE\/IFIP International Conference on Dependable Systems & Networks. IEEE, 285--294."},{"key":"e_1_2_1_29_1","volume-title":"Proceedings of the 6th International Conference on Autonomic Computing. 
ACM, 13--22","author":"Jiang Miao","unstructured":"Miao Jiang, Mohammad A. Munawar, Thomas Reidemeister, and Paul A. S. Ward. 2009. System monitoring with metric-correlation models: Problems and solutions. In Proceedings of the 6th International Conference on Autonomic Computing. ACM, 13--22."},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/2345156.2254075"},{"key":"e_1_2_1_31_1","unstructured":"JustGlowing. [n.d.]. MiniSom: a minimalistic implementation of the Self Organizing Maps. Retrieved from https:\/\/github.com\/JustGlowing\/minisom."},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2005.79"},{"key":"e_1_2_1_33_1","unstructured":"Keras. [n.d.]. Keras: The Python Deep Learning library. Retrieved from https:\/\/keras.io\/."},{"key":"e_1_2_1_34_1","volume-title":"Kingma and Jimmy Ba","author":"Diederik","year":"2014","unstructured":"Diederik P. Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. Retrieved from https:\/\/arXiv:1412.6980."},{"key":"e_1_2_1_35_1","unstructured":"Mayuresh Kunjir Yuzhang Han and Shivnath Babu. 2016. Where does memory go?: Study of memory management in JVM-based data analytics."},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2007.46"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/2889160.2889232"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.14778\/3297753.3297759"},{"key":"e_1_2_1_39_1","volume-title":"Proceedings of the USENIX Annual Technical Conference. 1--14","author":"Lou Jian-Guang","year":"2010","unstructured":"Jian-Guang Lou, Qiang Fu, Shengqi Yang, Ye Xu, and Jiang Li. 2010. Mining invariants from console logs for system problem detection. In Proceedings of the USENIX Annual Technical Conference. 
1--14."},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2011.138"},{"key":"e_1_2_1_41_1","unstructured":"Pankaj Malhotra Anusha Ramakrishnan Gaurangi Anand Lovekesh Vig Puneet Agarwal and Gautam Shroff. 2016. LSTM-based encoder-decoder for multi-sensor anomaly detection. Retrieved from https:\/\/arXiv:1607.00148."},{"key":"e_1_2_1_42_1","volume-title":"Proceedings. Presses Universitaires de Louvain, 89","author":"Malhotra Pankaj","year":"2015","unstructured":"Pankaj Malhotra, Lovekesh Vig, Gautam Shroff, and Puneet Agarwal. 2015. Long short term memory networks for anomaly detection in time series. In Proceedings. Presses Universitaires de Louvain, 89."},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-981-15-0121-0"},{"key":"e_1_2_1_44_1","volume-title":"Proceedings of the Conference of the Center for Advanced Studies on Collaborative Research. IBM Corp., 152--166","author":"Mohammad","unstructured":"Mohammad A. Munawar and Paul A. S. Ward. 2007. A comparative study of pairwise regression techniques for problem determination. In Proceedings of the Conference of the Center for Advanced Studies on Collaborative Research. IBM Corp., 152--166."},{"key":"e_1_2_1_45_1","volume-title":"Ward","author":"Munawar Mohammad Ahmad","year":"2007","unstructured":"Mohammad Ahmad Munawar and Paul A. S. Ward. 2007. Leveraging many simple statistical models to adaptively monitor software systems. In Proceedings of the International Symposium on Parallel and Distributed Processing and Applications. Springer, 457--470."},{"key":"e_1_2_1_46_1","volume-title":"Openjdk documentation. Retrieved","year":"2019","unstructured":"Openjdk. [n.d.]. Openjdk documentation. Retrieved December 2, 2019 from http:\/\/openjdk.java.net\/groups\/hotspot\/docs\/HotSpotGlossary.html"},{"key":"e_1_2_1_47_1","unstructured":"Oracle. [n.d.]. jconsole. 
Retrieved from http:\/\/openjdk.java.net\/tools\/svc\/jconsole\/."},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/1081870.1081976"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2015.7113365"},{"key":"e_1_2_1_50_1","volume-title":"Proceedings of the Software Evolution Week\u2014IEEE Conference on Software Maintenance, Reengineering, and Reverse Engineering (CSMR-WCRE\u201914)","author":"Saha Ripon K.","unstructured":"Ripon K. Saha, Sarfraz Khurshid, and Dewayne E. Perry. 2014. An empirical study of long lived bugs. In Proceedings of the Software Evolution Week\u2014IEEE Conference on Software Maintenance, Reengineering, and Reverse Engineering (CSMR-WCRE\u201914). IEEE, 144--153."},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1145\/1555349.1555360"},{"key":"e_1_2_1_52_1","unstructured":"solarwinds. [n.d.]. Solarwinds SAM Server & Application Monitor. Retrieved from https:\/\/www.solarwinds.com\/server-application-monitor?CMP=BIZ-TAD-PCWDLD-SAM_PP-A-PP-Q116."},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/1272996.1273002"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2012-65"},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1109\/MASCOTS.2010.22"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1145\/1835698.1835741"},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDCS.2012.65"},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1145\/2063576.2063690"},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1109\/DSAA.2016.20"},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1145\/1375457.1375489"},{"key":"e_1_2_1_61_1","unstructured":"Dylan Tweney. 2013. Amazon website goes down for 40 minutes costing the company $5 million. 
Retrieved from https:\/\/venturebeat.com\/2013\/08\/19\/amazon-website-down\/."},{"key":"e_1_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPOM.2003.1251233"},{"key":"e_1_2_1_63_1","unstructured":"VMware. [n.d.]. Virtual machine CPU usage alarm. Retrieved from https:\/\/kb.vmware.com\/s\/article\/2057830."},{"key":"e_1_2_1_64_1","volume-title":"Proceedings of the IEEE Network Operations and Management Symposium (NOMS\u201910)","author":"Wang Chengwei","year":"2010","unstructured":"Chengwei Wang, Vanish Talwar, Karsten Schwan, and Parthasarathy Ranganathan. 2010. Online detection of utility cloud anomalies using metric distributions. In Proceedings of the IEEE Network Operations and Management Symposium (NOMS\u201910). IEEE, 96--103."},{"key":"e_1_2_1_65_1","unstructured":"James C. Warner. 2013. top Linux man page. Retrieved from https:\/\/linux.die.net\/man\/1\/top."},{"key":"e_1_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2007.370345"},{"key":"e_1_2_1_67_1","doi-asserted-by":"publisher","DOI":"10.3354\/cr030079"},{"key":"e_1_2_1_68_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2009.19"},{"key":"e_1_2_1_69_1","volume-title":"Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles. ACM, 117--132","author":"Xu Wei","unstructured":"Wei Xu, Ling Huang, Armando Fox, David Patterson, and Michael I. Jordan. 2009. Detecting large-scale system problems by mining console logs. In Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles. ACM, 117--132."},{"key":"e_1_2_1_70_1","doi-asserted-by":"publisher","DOI":"10.1145\/2523649.2523670"},{"key":"e_1_2_1_71_1","unstructured":"Chunting Zhou Chonglin Sun Zhiyuan Liu and Francis Lau. 2015. A C-LSTM neural network for text classification. 
Retrieved from https:\/\/arXiv:1511.08630."},{"key":"e_1_2_1_72_1","volume-title":"Lyu","author":"Zhu Jieming","year":"2018","unstructured":"Jieming Zhu, Shilin He, Jinyang Liu, Pinjia He, Qi Xie, Zibin Zheng, and Michael R. Lyu. 2018. Tools and benchmarks for automated log parsing. Retrieved from https:\/\/arXiv:1811.03509."}],"container-title":["ACM Transactions on Software Engineering and Methodology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3440757","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3440757","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T21:28:18Z","timestamp":1750195698000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3440757"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,4,23]]},"references-count":72,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2021,7,31]]}},"alternative-id":["10.1145\/3440757"],"URL":"https:\/\/doi.org\/10.1145\/3440757","relation":{},"ISSN":["1049-331X","1557-7392"],"issn-type":[{"value":"1049-331X","type":"print"},{"value":"1557-7392","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,4,23]]},"assertion":[{"value":"2020-03-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-11-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-04-23","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}