{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,31]],"date-time":"2026-01-31T07:37:30Z","timestamp":1769845050575,"version":"3.49.0"},"reference-count":29,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2022,6,14]],"date-time":"2022-06-14T00:00:00Z","timestamp":1655164800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["SIGOPS Oper. Syst. Rev."],"published-print":{"date-parts":[[2022,6,14]]},"abstract":"<jats:p>Cloud applications are increasingly shifting from large monolithic services to complex graphs of loosely-coupled microservices. Despite their benefits, microservices are prone to cascading performance issues, and can lead to prolonged periods of degraded performance.<\/jats:p>\n          <jats:p>We present Sage, a machine learning-driven root cause analysis system for interactive cloud microservices that is both accurate and practical. 
We show that Sage correctly identifies the root causes of performance issues across a diverse set of microservices and takes action to address them, leading to more predictable, performant, and efficient cloud systems.<\/jats:p>","DOI":"10.1145\/3544497.3544503","type":"journal-article","created":{"date-parts":[[2022,6,15]],"date-time":"2022-06-15T10:06:57Z","timestamp":1655287617000},"page":"34-41","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["Enabling Practical Cloud Performance Debugging with Unsupervised Learning"],"prefix":"10.1145","volume":"56","author":[{"given":"Yu","family":"Gan","sequence":"first","affiliation":[{"name":"Cornell University"}]},{"given":"Mingyu","family":"Liang","sequence":"additional","affiliation":[{"name":"Cornell University"}]},{"given":"Sundar","family":"Dev","sequence":"additional","affiliation":[{"name":"Google"}]},{"given":"David","family":"Lo","sequence":"additional","affiliation":[{"name":"Google"}]},{"given":"Christina","family":"Delimitrou","sequence":"additional","affiliation":[{"name":"Cornell University"}]}],"member":"320","published-online":{"date-parts":[[2022,6,14]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"\"Jaeger: open source end-to-end distributed tracing \" https:\/\/www.jaegertracing.io\/."},{"key":"e_1_2_1_2_1","unstructured":"\"Zipkin \" http:\/\/zipkin.io."},
{"key":"e_1_2_1_3_1","unstructured":"\"The evolution of microservices \" https:\/\/www.slideshare.net\/adriancockcroft\/evolution-of-microservices-craft-conference 2016."},{"key":"e_1_2_1_4_1","volume-title":"workshop: Why, what, and how to get there,\" http:\/\/www.slideshare.net\/adriancockcroft\/microservices-workshop-craft-conference.","unstructured":"\"Microservices workshop: Why, what, and how to get there,\" http:\/\/www.slideshare.net\/adriancockcroft\/microservices-workshop-craft-conference."},{"key":"e_1_2_1_5_1","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-031-01722-3","volume-title":"The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines","author":"Barroso L.","year":"2009","unstructured":"L. Barroso and U. Hoelzle, The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines. MC Publishers, 2009."},{"key":"e_1_2_1_6_1","first-page":"1887","volume-title":"Causeinfer: Automatic and distributed performance diagnosis with hierarchical causality graph in large distributed systems,\" in IEEE INFOCOM 2014 - IEEE Conference on Computer Communications","author":"Chen P.","year":"2014","unstructured":"P. Chen, Y. Qi, P. Zheng, and D. Hou, \"Causeinfer: Automatic and distributed performance diagnosis with hierarchical causality graph in large distributed systems,\" in IEEE INFOCOM 2014 - IEEE Conference on Computer Communications, 2014, pp. 
1887--1895."},{"key":"e_1_2_1_7_1","volume-title":"The tail at scale,\" in CACM","author":"Dean J.","unstructured":"J. Dean and L. A. Barroso, \"The tail at scale,\" in CACM, Vol. 56 No. 2."},{"key":"e_1_2_1_8_1","doi-asserted-by":"crossref","unstructured":"C. Delimitrou and C. Kozyrakis \"Paragon: QoS-Aware Scheduling for Heterogeneous Datacenters \" in Proceedings of the Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS). Houston TX USA 2013.","DOI":"10.1145\/2451116.2451125"},{"issue":"4","key":"e_1_2_1_9_1","volume":"31","author":"Delimitrou C.","year":"2013","unstructured":"C. Delimitrou and C. Kozyrakis, \"QoS-Aware Scheduling in Heterogeneous Datacenters with Paragon,\" in ACM Transactions on Computer Systems (TOCS), Vol. 31 Issue 4. December 2013.","journal-title":"\"QoS-Aware Scheduling in Heterogeneous Datacenters with Paragon,\" in ACM Transactions on Computer Systems (TOCS)"},{"key":"e_1_2_1_10_1","volume-title":"Quality-of-Service-Aware Scheduling in Heterogeneous Datacenters with Paragon,\" in IEEE Micro Special Issue on Top Picks from the Computer Architecture Conferences. May\/June","author":"Delimitrou C.","year":"2014","unstructured":"C. 
Delimitrou and C. Kozyrakis, \"Quality-of-Service-Aware Scheduling in Heterogeneous Datacenters with Paragon,\" in IEEE Micro Special Issue on Top Picks from the Computer Architecture Conferences. May\/June 2014."},{"key":"e_1_2_1_11_1","doi-asserted-by":"crossref","unstructured":"C. Delimitrou and C. Kozyrakis \"Quasar: Resource-Efficient and QoS-Aware Cluster Management \" in Proceedings of the Nineteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS). Salt Lake City UT USA 2014.","DOI":"10.1145\/2541940.2541941"},{"key":"e_1_2_1_12_1","volume-title":"April","author":"Delimitrou C.","year":"2016","unstructured":"C. Delimitrou and C. Kozyrakis, \"HCloud: Resource-Efficient Provisioning in Shared Cloud Systems,\" in Proceedings of the Twenty First International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), April 2016."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/3037697.3037703"},{"key":"e_1_2_1_14_1","volume-title":"August","author":"Delimitrou C.","year":"2015","unstructured":"C. Delimitrou, D. Sanchez, and C. 
Kozyrakis, \"Tarcil: Reconciling Scheduling Speed and Quality in Large Shared Clusters,\" in Proceedings of the Sixth ACM Symposium on Cloud Computing (SOCC), August 2015."},{"key":"e_1_2_1_15_1","first-page":"20","volume-title":"NSDI'07","author":"Fonseca R.","year":"2007","unstructured":"R. Fonseca, G. Porter, R. H. Katz, S. Shenker, and I. Stoica, \"X-trace: A pervasive network tracing framework,\" in Proceedings of the 4th USENIX Conference on Networked Systems Design & Implementation, ser. NSDI'07. Berkeley, CA, USA: USENIX Association, 2007, pp. 20--20."},{"issue":"2","key":"e_1_2_1_16_1","volume":"17","author":"Gan Y.","year":"2018","unstructured":"Y. Gan and C. Delimitrou, \"The Architectural Implications of Cloud Microservices,\" in Computer Architecture Letters (CAL), vol. 17, iss. 2, Jul-Dec 2018.","journal-title":"\"The Architectural Implications of Cloud Microservices,\" in Computer Architecture Letters (CAL)"},{"key":"#cr-split#-e_1_2_1_17_1.1","doi-asserted-by":"crossref","unstructured":"Y. Gan M. Liang S. Dev D. Lo and C. Delimitrou \"Sage: Practical and scalable ml-driven performance debugging in microservices \" in Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems ser. ASPLOS 2021. New York NY USA: Association for Computing Machinery April 2021 p. 135--151. [Online]. 
Available: https:\/\/doi.org\/10.1145\/3445814.3446700","DOI":"10.1145\/3445814.3446700"},{"key":"#cr-split#-e_1_2_1_17_1.2","doi-asserted-by":"crossref","unstructured":"Y. Gan M. Liang S. Dev D. Lo and C. Delimitrou \"Sage: Practical and scalable ml-driven performance debugging in microservices \" in Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems ser. ASPLOS 2021. New York NY USA: Association for Computing Machinery April 2021 p. 135--151. [Online]. Available: https:\/\/doi.org\/10.1145\/3445814.3446700","DOI":"10.1145\/3445814.3446700"},{"key":"e_1_2_1_18_1","volume-title":"July","author":"Gan Y.","year":"2018","unstructured":"Y. Gan, M. Pancholi, D. Cheng, S. Hu, Y. He, and C. Delimitrou, \"Seer: Leveraging Big Data to Navigate the Complexity of Cloud Debugging,\" in Proceedings of the Tenth USENIX Workshop on Hot Topics in Cloud Computing (HotCloud), July 2018."},{"key":"e_1_2_1_19_1","volume-title":"April","author":"Gan Y.","year":"2019","unstructured":"Y. Gan, Y. Zhang, D. Cheng, A. Shetty, P. Rathi, N. Katarki, A. Bruno, J. Hu, B. Ritchken, B. Jackson, K. Hu, M. 
Pancholi, Y. He, B. Clancy, C. Colen, F. Wen, C. Leung, S. Wang, L. Zaruvinsky, M. Espinosa, R. Lin, Z. Liu, J. Padilla, and C. Delimitrou, \"An Open-Source Benchmark Suite for Microservices and Their Hardware-Software Implications for Cloud and Edge Systems,\" in Proceedings of the Twenty Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), April 2019."},{"key":"e_1_2_1_20_1","volume-title":"April","author":"Gan Y.","year":"2019","unstructured":"Y. Gan, Y. Zhang, K. Hu, Y. He, M. Pancholi, D. Cheng, and C. Delimitrou, \"Seer: Leveraging Big Data to Navigate the Complexity of Performance Debugging in Cloud Microservices,\" in Proceedings of the Twenty Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), April 2019."},{"key":"e_1_2_1_21_1","volume-title":"April","author":"Lazarev N.","year":"2021","unstructured":"N. Lazarev, N. Adit, S. Xiang, Z. Zhang, and C. 
Delimitrou, \"Dagger: Towards Efficient RPCs in Cloud Microservices with Near-Memory Reconfigurable NICs,\" in Proceedings of the Twenty Sixth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), April 2021."},{"key":"e_1_2_1_22_1","first-page":"3","volume-title":"Microscope: Pinpoint performance issues with causal graphs in micro-service environments,\" in International Conference on Service-Oriented Computing","author":"Lin J.","year":"2018","unstructured":"J. Lin, P. Chen, and Z. Zheng, \"Microscope: Pinpoint performance issues with causal graphs in micro-service environments,\" in International Conference on Service-Oriented Computing. Springer, 2018, pp. 3--20."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/2391229.2391236"},{"key":"e_1_2_1_24_1","volume-title":"Tech. Rep., 2010","author":"Sigelman B. H.","year":"2010","unstructured":"B. H. Sigelman, L. A. Barroso, M. Burrows, P. Stephenson, M. Plakal, D. Beaver, S. Jaspan, and C. Shanbhag, \"Dapper, a large-scale distributed systems tracing infrastructure,\" Google, Inc., Tech. Rep., 2010. [Online]. Available: https:\/\/research.google.com\/archive\/papers\/dapper-2010-1.pdf"},{"key":"#cr-split#-e_1_2_1_25_1.1","doi-asserted-by":"crossref","unstructured":"A. Sriraman and A. 
Dhanotia \"Accelerometer: Understanding acceleration opportunities for data center overheads at hyperscale \" in Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems ser. ASPLOS '20. New York NY USA: Association for Computing Machinery 2020 p. 733--750. [Online]. Available: https:\/\/doi.org\/10.1145\/3373376.3378450","DOI":"10.1145\/3373376.3378450"},{"key":"#cr-split#-e_1_2_1_25_1.2","doi-asserted-by":"crossref","unstructured":"A. Sriraman and A. Dhanotia \"Accelerometer: Understanding acceleration opportunities for data center overheads at hyperscale \" in Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems ser. ASPLOS '20. New York NY USA: Association for Computing Machinery 2020 p. 733--750. [Online]. Available: https:\/\/doi.org\/10.1145\/3373376.3378450","DOI":"10.1145\/3373376.3378450"},{"key":"e_1_2_1_26_1","first-page":"177","volume-title":"CA: USENIX Association","author":"Sriraman A.","year":"2018","unstructured":"A. Sriraman and T. F. Wenisch, \"μTune: Auto-tuned threading for OLDI microservices,\" in 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18). Carlsbad, CA: USENIX Association, Oct. 2018, pp. 177--194. [Online]. Available: https:\/\/www.usenix.org\/conference\/osdi18\/presentation\/sriraman"},{"key":"e_1_2_1_27_1","volume-title":"April","author":"Zhang Y.","year":"2021","unstructured":"Y. Zhang, W. Hua, Z. Zhou, E. Suh, and C. 
Delimitrou, \"Sinan: ML-Based and QoS-Aware Resource Management for Cloud Microservices,\" in Proceedings of the Twenty Sixth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), April 2021."}],"container-title":["ACM SIGOPS Operating Systems Review"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3544497.3544503","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3544497.3544503","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:02:54Z","timestamp":1750186974000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3544497.3544503"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,6,14]]},"references-count":29,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2022,6,14]]}},"alternative-id":["10.1145\/3544497.3544503"],"URL":"https:\/\/doi.org\/10.1145\/3544497.3544503","relation":{},"ISSN":["0163-5980"],"issn-type":[{"value":"0163-5980","type":"print"}],"subject":[],"published":{"date-parts":[[2022,6,14]]},"assertion":[{"value":"2022-06-14","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}