{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,21]],"date-time":"2025-08-21T15:10:23Z","timestamp":1755789023393,"version":"3.44.0"},"publisher-location":"New York, NY, USA","reference-count":115,"publisher":"ACM","license":[{"start":{"date-parts":[[2025,3,30]],"date-time":"2025-03-30T00:00:00Z","timestamp":1743292800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc-sa\/4.0\/"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2025,3,30]]},"DOI":"10.1145\/3669940.3707244","type":"proceedings-article","created":{"date-parts":[[2025,2,6]],"date-time":"2025-02-06T12:28:01Z","timestamp":1738844881000},"page":"214-232","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Cooperative Graceful Degradation in Containerized Clouds"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0009-0008-3110-3041","authenticated-orcid":false,"given":"Kapil","family":"Agrawal","sequence":"first","affiliation":[{"name":"University of California, Irvine, Irvine, USA"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-0503-4478","authenticated-orcid":false,"given":"Sangeetha","family":"Abdu Jyothi","sequence":"additional","affiliation":[{"name":"University of California, Irvine and VMware Research, Irvine, USA"}]}],"member":"320","published-online":{"date-parts":[[2025,3,30]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/3477132.3483546"},{"key":"e_1_3_2_1_2_1","unstructured":"Fault tolerance through optimal workload placement. https:\/\/engineering.fb.com\/2020\/09\/08\/data-center-engineering\/faulttolerance-through-optimal-workload-placement\/. (Accessed on 03\/12\/2023)."},{"key":"e_1_3_2_1_3_1","first-page":"373","volume-title":"13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18)","author":"Veeraraghavan Kaushik","year":"2018","unstructured":"Kaushik Veeraraghavan, Justin Meza, Scott Michelson, Sankaralingam Panneerselvam, Alex Gyori, David Chou, Sonia Margulis, Daniel Obenshain, Shruti Padmanabha, Ashish Shah, et al. Maelstrom: Mitigating datacenter-level disasters by draining interdependent traffic safely and efficiently. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18), pages 373--389, 2018."},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/3542929.3563482"},{"key":"e_1_3_2_1_5_1","unstructured":"AWS outage: What happens when the world's largest cloud service provider goes offline? https:\/\/techwireasia.com\/06\/2023\/whathappens-when-the-worlds-largest-cloud-service-provider-goesoffline\/. (Accessed on 04\/04\/2023)."},{"key":"e_1_3_2_1_6_1","unstructured":"Google's London data center outage during heatwave caused by ''simultaneous failure of multiple redundant cooling systems''. https:\/\/www.datacenterdynamics.com\/en\/news\/googles-londondata-center-outage-during-heatwave-caused-by-simultaneousfailure-of-multiple-redundant-cooling-systems\/. (Accessed on 04\/21\/2024)."},{"key":"e_1_3_2_1_7_1","unstructured":"AWS Internet Outage Cause Human Error Incorrect Command. https:\/\/www.vox.com\/2017\/3\/2\/14792636\/amazon-awsinternet-outage-cause-human-error-incorrect-command. (Accessed on 04\/04\/2023)."},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/3341301.3359664"},{"key":"e_1_3_2_1_9_1","first-page":"217","volume-title":"18th USENIX Symposium on Networked Systems Design and Implementation","author":"Xia Yiting","year":"2021","unstructured":"Yiting Xia, Ying Zhang, Zhizhen Zhong, Guanqing Yan, Chiunlin Lim, Satyajeet Singh Ahuja, Soshant Bali, Alexander Nikolaidis, Kimia Ghobadi, and Manya Ghobadi. A social network under social distancing: risk-driven backbone management during covid-19 and beyond. In 18th USENIX Symposium on Networked Systems Design and Implementation, pages 217--231. USENIX Association, 2021."},{"key":"e_1_3_2_1_10_1","unstructured":"Managing Failure Modes in Microservice Architectures. https:\/\/www.infoq.com\/presentations\/microservices-failure-modes\/. (Accessed on 05\/12\/2022)."},{"key":"e_1_3_2_1_11_1","first-page":"59","volume-title":"FAST","volume":"4","author":"Keeton Kimberly","year":"2004","unstructured":"Kimberly Keeton, Cipriano A Santos, Dirk Beyer, Jeffrey S Chase, John Wilkes, et al. Designing for disasters. In FAST, volume 4, pages 59--62, 2004."},{"key":"e_1_3_2_1_12_1","first-page":"589","volume-title":"17th USENIX Symposium on Operating Systems Design and Implementation (OSDI 23)","author":"Eriksen Marius","year":"2023","unstructured":"Marius Eriksen, Kaushik Veeraraghavan, Yusuf Abdulghani, Andrew Birchall, Po-Yen Chou, Richard Cornew, Adela Kabiljo, Maroo Lieuw, Justin Meza, Scott Michelson, et al. Global capacity management with flux. In 17th USENIX Symposium on Operating Systems Design and Implementation (OSDI 23), pages 589--606, 2023."},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/3452296.3472902"},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/3477132.3483578"},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/50202.50214"},{"key":"e_1_3_2_1_16_1","unstructured":"TIA 942. https:\/\/tiaonline.org\/standards-the-key-to-improving-data-center-resilience-efficiency-and-sustainability\/#:~:text=One%20of%20the%20most%20significant are%20always%20available%20when%20needed. (Accessed on 04\/21\/2024)."},{"key":"e_1_3_2_1_17_1","unstructured":"Shrinking the time to mitigate production incidents-CRE life lessons. https:\/\/cloud.google.com\/blog\/products\/management-tools\/shrinking-the-time-to-mitigate-production-incidents. (Accessed on 04\/04\/2023)."},{"key":"e_1_3_2_1_18_1","unstructured":"Chaos Engineering. https:\/\/netflixtechblog.com\/tagged\/chaosengineering. (Accessed on 04\/04\/2023)."},{"key":"e_1_3_2_1_19_1","unstructured":"Verify the resilience of your workloads using Chaos Engineering. https:\/\/aws.amazon.com\/blogs\/architecture\/verify-theresilience-of-your-workloads-using-chaos-engineering\/. (Accessed on 04\/04\/2023)."},{"key":"e_1_3_2_1_20_1","unstructured":"Failover with AWS. https:\/\/docs.aws.amazon.com\/whitepapers\/latest\/web-application-hosting-best-practices\/failover-withaws.html. (Accessed on 04\/04\/2023)."},{"key":"e_1_3_2_1_21_1","first-page":"1155","volume-title":"14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20)","author":"Levy Sebastien","year":"2020","unstructured":"Sebastien Levy, Randolph Yao, Youjiang Wu, Yingnong Dang, Peng Huang, Zheng Mu, Pu Zhao, Tarun Ramani, Naga Govindaraju, Xukun Li, et al. Predictive and adaptive failure mitigation to avert production cloud {VM} interruptions. In 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20), pages 1155--1170, 2020."},{"key":"e_1_3_2_1_22_1","first-page":"287","volume-title":"17th USENIX Symposium on Operating Systems Design and Implementation (OSDI 23)","author":"Lyu Jialun","year":"2023","unstructured":"Jialun Lyu, Marisa You, Celine Irvene, Mark Jung, Tyler Narmore, Jacob Shapiro, Luke Marshall, Savyasachi Samal, Ioannis Manousakis, Lisa Hsu, et al. Hyrax:{Fail-in-Place} server operation in cloud platforms. In 17th USENIX Symposium on Operating Systems Design and Implementation (OSDI 23), pages 287--304, 2023."},{"key":"e_1_3_2_1_23_1","first-page":"473","volume-title":"2021 USENIX Annual Technical Conference (USENIX ATC 21)","author":"Kumbhare Alok Gautam","year":"2021","unstructured":"Alok Gautam Kumbhare, Reza Azimi, Ioannis Manousakis, Anand Bonde, Felipe Frujeri, Nithish Mahalingam, Pulkit A Misra, Seyyed Ahmad Javadi, Bianca Schroeder, Marcus Fontoura, et al. {Prediction-Based} power oversubscription in cloud platforms. In 2021 USENIX Annual Technical Conference (USENIX ATC 21), pages 473--487, 2021."},{"key":"e_1_3_2_1_24_1","first-page":"1241","volume-title":"14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20)","author":"Li Shaohong","year":"2020","unstructured":"Shaohong Li, Xi Wang, Faria Kalim, Xiao Zhang, Sangeetha Abdu Jyothi, Karan Grover, Vasileios Kontorinis, Nina Narodytska, Owolabi Legunsen, Sreekumar Kodakara, et al. Thunderbolt:{Throughput-Optimized},{Quality-of-Service-Aware} power capping at scale. In 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20), pages 1241--1255, 2020."},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/2342356.2342438"},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/3617232.3624853"},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/3007787.3001187"},{"key":"e_1_3_2_1_28_1","first-page":"325","volume-title":"19th USENIX Symposium on Networked Systems Design and Implementation (NSDI 22)","author":"Krishnaswamy Umesh","year":"2022","unstructured":"Umesh Krishnaswamy, Rachee Singh, Nikolaj Bj\u00f8rner, and Himanshu Raj. Decentralized cloud wide-area network traffic engineering with {BLASTSHIELD}. In 19th USENIX Symposium on Networked Systems Design and Implementation (NSDI 22), pages 325--338, 2022."},{"key":"e_1_3_2_1_29_1","first-page":"2013","article-title":"The datacenter as a computer","author":"Barroso Luiz Andr\u00e9","year":"2018","unstructured":"Luiz Andr\u00e9 Barroso, U Holzle, and P Ranganathan. The datacenter as a computer. Morgan Claypool 2013, 2018.","journal-title":"Morgan Claypool"},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/MCOM.2014.6917403"},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.adhoc.2015.12.008"},{"key":"e_1_3_2_1_32_1","first-page":"659","volume-title":"Lessons learned from natural disasters and preparedness of data centers","author":"Geng Hwaiyu","year":"2015","unstructured":"Hwaiyu Geng and Masatoshi Kajimoto. Lessons learned from natural disasters and preparedness of data centers. Data Center Handbook, pages 659--667, 2015."},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/2568225.2568227"},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/71.80192"},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/HOTOS.1999.798396"},{"key":"e_1_3_2_1_36_1","first-page":"15","volume-title":"9th USENIX symposium on networked systems design and implementation (NSDI 12)","author":"Zaharia Matei","year":"2012","unstructured":"Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauly, Michael J Franklin, Scott Shenker, and Ion Stoica. Resilient distributed datasets: A {Fault-Tolerant} abstraction for {In-Memory} cluster computing. In 9th USENIX symposium on networked systems design and implementation (NSDI 12), pages 15--28, 2012."},{"key":"e_1_3_2_1_37_1","first-page":"607","volume-title":"17th USENIX Symposium on Operating Systems Design and Implementation (OSDI 23)","author":"Meza Justin J","year":"2023","unstructured":"Justin J Meza, Thote Gowda, Ahmed Eid, Tomiwa Ijaware, Dmitry Chernyshev, Yi Yu, Md Nazim Uddin, Rohan Das, Chad Nachiappan, Sari Tran, et al. Defcon: Preventing overload with graceful feature degradation. In 17th USENIX Symposium on Operating Systems Design and Implementation (OSDI 23), pages 607--622, 2023."},{"key":"e_1_3_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/1327452.1327492"},{"key":"e_1_3_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.5555\/1751626.1751630"},{"key":"e_1_3_2_1_40_1","first-page":"87","volume-title":"USENIX Annual Technical Conference, General Track","author":"von Behren J Robert","year":"2002","unstructured":"J Robert von Behren, Eric A Brewer, Nikita Borisov, Michael Chen, Matt Welsh, Josh MacDonald, Jeremy Lau, and David E Culler. Ninja: A framework for network services. In USENIX Annual Technical Conference, General Track, pages 87--102, 2002."},{"key":"e_1_3_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1016\/S1389-1286(99)00031-6"},{"key":"e_1_3_2_1_42_1","volume-title":"Workshop on Dependable Software, Tools and Methods","author":"Hibino Hideaki","year":"2005","unstructured":"Hideaki Hibino, Kenichi Kourai, and S Shiba. Difference of degradation schemes among operating systems: Experimental analysis for web application servers. In Workshop on Dependable Software, Tools and Methods, Yokohama, Japan. Citeseer, 2005."},{"key":"e_1_3_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/3267809.3267823"},{"key":"e_1_3_2_1_44_1","unstructured":"SpringBooth. https:\/\/spring.io\/projects\/spring-boot. (Accessed on 04\/21\/2024)."},{"key":"e_1_3_2_1_45_1","unstructured":"Resilience4j. https:\/\/resilience4j.readme.io\/docs\/getting-started-3. (Accessed on 04\/21\/2024)."},{"key":"e_1_3_2_1_46_1","unstructured":"GoBackoff. https:\/\/github.com\/cenkalti\/backoff. (Accessed on 04\/21\/2024)."},{"key":"e_1_3_2_1_47_1","unstructured":"GoLimiter. https:\/\/github.com\/ulule\/limiter. (Accessed on 04\/21\/2024)."},{"key":"e_1_3_2_1_48_1","unstructured":"Hystrix. https:\/\/github.com\/Netflix\/Hystrix. (Accessed on 04\/21\/2024)."},{"key":"e_1_3_2_1_49_1","unstructured":"Simplify observability traffic management security and policy with the leading service mesh. https:\/\/istio.io. (Accessed on 04\/04\/2023)."},{"key":"e_1_3_2_1_50_1","unstructured":"Sentinel. https:\/\/github.com\/alibaba\/Sentinel. (Accessed on 04\/21\/2024)."},{"key":"e_1_3_2_1_51_1","unstructured":"gRPC: Identifying Failed Connections. https:\/\/grpc.io\/blog\/grpc-on-http2\/#identifying-failed-connections. (Accessed on 11\/26\/2023)."},{"key":"e_1_3_2_1_52_1","unstructured":"Wire: Automatic Dependency Injection in Go. https:\/\/go.dev\/blog\/wire. (Accessed on 04\/06\/2023)."},{"key":"e_1_3_2_1_53_1","unstructured":"Distributed Systems Safety Research. https:\/\/jepsen.io. (Accessed on 04\/03\/2023)."},{"key":"e_1_3_2_1_54_1","unstructured":"The Netflix Simian Army. http:\/\/techblog.netflix.com\/2011\/07\/netflix-simian-army.html. (Accessed on 04\/03\/2023)."},{"key":"e_1_3_2_1_55_1","unstructured":"LitmusChaos: Open Source Chaos Engineering platform. https:\/\/litmuschaos.io. (Accessed on 04\/04\/2023)."},{"key":"e_1_3_2_1_56_1","unstructured":"Manage reliability to a higher standard with Gremlin. https:\/\/www.gremlin.com. (Accessed on 04\/04\/2023)."},{"key":"e_1_3_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1145\/3342195.3387517"},{"key":"e_1_3_2_1_58_1","first-page":"787","volume-title":"14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20)","author":"Tang Chunqiang","year":"2020","unstructured":"Chunqiang Tang, Kenny Yu, Kaushik Veeraraghavan, Jonathan Kaldor, Scott Michelson, Thawan Kooburat, Aravind Anbudurai, Matthew Clark, Kabir Gogia, Long Cheng, et al. Twine: A unified cluster management system for shared infrastructure. In 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20), pages 787--803, 2020."},{"key":"e_1_3_2_1_59_1","unstructured":"Production Grade Container Orchestration. https:\/\/kubernetes.io. (Accessed on 06\/11\/2022)."},{"key":"e_1_3_2_1_60_1","volume-title":"Evolved The easy to use, online, collaborative LaTeX editor","author":"Overleaf","year":"2023","unstructured":"Overleaf: LaTeX, Evolved The easy to use, online, collaborative LaTeX editor. https:\/\/www.overleaf.com. (Accessed on 12\/06\/2023)."},{"key":"e_1_3_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1145\/3297858.3304013"},{"key":"e_1_3_2_1_62_1","first-page":"22","volume-title":"NSDI","volume":"11","author":"Hindman Benjamin","year":"2011","unstructured":"Benjamin Hindman, Andy Konwinski, Matei Zaharia, Ali Ghodsi, Anthony D Joseph, Randy H Katz, Scott Shenker, and Ion Stoica. Mesos: A platform for fine-grained resource sharing in the data center. In NSDI, volume 11, pages 22--22, 2011."},{"key":"e_1_3_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1145\/2523616.2523633"},{"key":"e_1_3_2_1_64_1","unstructured":"Microsoft Blames ''Severe'' Weather for Azure Cloud Outage. https:\/\/www.datacenterknowledge.com\/uptime\/microsoft-blames-severe-weather-azure-cloud-outage. (Accessed on 12\/01\/2023)."},{"key":"e_1_3_2_1_65_1","unstructured":"Google cloud service health. https:\/\/bit.ly\/46WTJLb. (Accessed on 12\/02\/2023)."},{"key":"e_1_3_2_1_66_1","unstructured":"Designing for failure: Architecting resilient systems on AWS. https:\/\/d1.awsstatic.com\/events\/reinvent\/2019\/REPEAT_1_Designing_for_failure_Architecting_resilient_systems_on_AWS_ARC335-R1.pdf. (Accessed on 05\/12\/2022)."},{"key":"e_1_3_2_1_67_1","first-page":"692","volume-title":"Proceedings of the 2021 ACM SIGCOMM Conference","author":"Jyothi Sangeetha Abdu","year":"2021","unstructured":"Sangeetha Abdu Jyothi. Solar Superstorms: Planning for an Internet Apocalypse. In Proceedings of the 2021 ACM SIGCOMM Conference, pages 692--704, 2021."},{"key":"e_1_3_2_1_68_1","unstructured":"Incident Review--Google Cloud Outage has Widespread Downstream Impact. https:\/\/www.catchpoint.com\/blog\/incident-review-google-cloud-outage. (Accessed on 04\/04\/2023)."},{"key":"e_1_3_2_1_69_1","doi-asserted-by":"publisher","DOI":"10.1145\/2670979.2670986"},{"key":"e_1_3_2_1_70_1","doi-asserted-by":"publisher","DOI":"10.1145\/2987550.2987583"},{"key":"e_1_3_2_1_71_1","volume-title":"Brownout approach for adaptive management of resources and applications in cloud computing systems: A taxonomy and future directions. ACM Computing Surveys (CSUR), 52(1):1--27","author":"Xu Minxian","year":"2019","unstructured":"Minxian Xu and Rajkumar Buyya. Brownout approach for adaptive management of resources and applications in cloud computing systems: A taxonomy and future directions. ACM Computing Surveys (CSUR), 52(1):1--27, 2019."},{"key":"e_1_3_2_1_72_1","unstructured":"Consul. https:\/\/www.consul.io. (Accessed on 10\/31\/2022)."},{"key":"e_1_3_2_1_73_1","doi-asserted-by":"publisher","DOI":"10.1145\/319344.319152"},{"key":"e_1_3_2_1_74_1","doi-asserted-by":"publisher","DOI":"10.1109\/CLUSTER.2012.56"},{"key":"e_1_3_2_1_75_1","doi-asserted-by":"publisher","DOI":"10.1109\/SRDS.2012.34"},{"key":"e_1_3_2_1_76_1","doi-asserted-by":"publisher","DOI":"10.1145\/1281100.1281138"},{"key":"e_1_3_2_1_77_1","doi-asserted-by":"publisher","DOI":"10.1145\/2009916.2009994"},{"key":"e_1_3_2_1_78_1","unstructured":"Google - site reliability engineering. https:\/\/sre.google\/sre-book\/addressing-cascading-failures\/#xref_cascading-failure_load-shed-graceful-degredation. (Accessed on 12\/05\/2023)."},{"key":"e_1_3_2_1_79_1","unstructured":"Target group load shedding for application load balancer | networking & content delivery. https:\/\/aws.amazon.com\/blogs\/networking-and-content-delivery\/target-group-load-shedding-for-application-load-balancer\/. (Accessed on 12\/05\/2023)."},{"key":"e_1_3_2_1_80_1","unstructured":"Using load shedding to avoid overload. https:\/\/aws.amazon.com\/builders-library\/using-load-shedding-to-avoid-overload\/. (Accessed on 12\/05\/2023)."},{"key":"e_1_3_2_1_81_1","volume-title":"Site reliability engineering: How Google runs production systems. ''O'Reilly Media","author":"Beyer Betsy","year":"2016","unstructured":"Betsy Beyer, Chris Jones, Jennifer Petoff, and Niall Richard Murphy. Site reliability engineering: How Google runs production systems. ''O'Reilly Media, Inc.'', 2016."},{"key":"e_1_3_2_1_82_1","first-page":"299","volume-title":"14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20)","author":"Cho Inho","year":"2020","unstructured":"Inho Cho, Ahmed Saeed, Joshua Fried, Seo Jin Park, Mohammad Alizadeh, and Adam Belay. Overload control for {\u03bcs-scale} {RPCs} with breakwater. In 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20), pages 299--314, 2020."},{"key":"e_1_3_2_1_83_1","doi-asserted-by":"publisher","DOI":"10.1145\/3452296.3472918"},{"key":"e_1_3_2_1_84_1","doi-asserted-by":"publisher","DOI":"10.1145\/3603269.3604860"},{"key":"e_1_3_2_1_85_1","unstructured":"That time we unplugged a data center to test our disaster readiness. https:\/\/dropbox.tech\/infrastructure\/disaster-readiness-test-failover-blackhole-sjc. (Accessed on 04\/21\/2024)."},{"key":"e_1_3_2_1_86_1","unstructured":"Istioldie 1.4 \/ circuit breaking. https:\/\/istio.io\/v1.4\/docs\/tasks\/traffic-management\/circuit-breaking\/. (Accessed on 12\/05\/2023)."},{"key":"e_1_3_2_1_87_1","unstructured":"Kubernetes: Pod Priority and Preemption. https:\/\/kubernetes.io\/docs\/concepts\/scheduling-eviction\/pod-priority-preemption\/. (Accessed on 04\/04\/2023)."},{"key":"e_1_3_2_1_88_1","unstructured":"Disaster recovery planning guidebook . https:\/\/cloud.google.com\/architecture\/dr-scenarios-planning-guide. (Accessed on 10\/24\/2024)."},{"key":"e_1_3_2_1_89_1","unstructured":"Patterns for enabling data persistence. https:\/\/docs.aws.amazon.com\/prescriptive-guidance\/latest\/modernization-data-persistence\/enabling-patterns.html. (Accessed on 09\/11\/2022)."},{"key":"e_1_3_2_1_90_1","unstructured":"Shared Responsibility Model for Resilience . https:\/\/docs.aws.amazon.com\/whitepapers\/latest\/disaster-recovery-workloads-on-aws\/shared-responsibility-model-for-resiliency.html. (Accessed on 10\/24\/2024)."},{"key":"e_1_3_2_1_91_1","unstructured":"Composite SLAs . https:\/\/learn.microsoft.com\/en-us\/azure\/well-architected\/reliability\/metrics#understand-service-level-agreements. (Accessed on 10\/24\/2024)."},{"key":"e_1_3_2_1_92_1","first-page":"419","volume-title":"2023 USENIX Annual Technical Conference (USENIX ATC 23)","author":"Huye Darby","year":"2023","unstructured":"Darby Huye, Yuri Shkuro, and Raja R Sambasivan. Lifting the veil on Meta's microservice architecture: Analyses of topology and request workflows. In 2023 USENIX Annual Technical Conference (USENIX ATC 23), pages 419--432, 2023."},{"key":"e_1_3_2_1_93_1","volume-title":"Cooperative graceful degradation in containerized clouds","author":"Agrawal Kapil","year":"2024","unstructured":"Kapil Agrawal and Sangeetha Abdu Jyothi. Cooperative graceful degradation in containerized clouds, 2024. Available at https:\/\/arxiv.org\/abs\/2312.12809."},{"key":"e_1_3_2_1_94_1","doi-asserted-by":"publisher","DOI":"10.1145\/3593856.3595909"},{"key":"e_1_3_2_1_95_1","doi-asserted-by":"publisher","DOI":"10.1145\/2785956.2787478"},{"key":"e_1_3_2_1_96_1","unstructured":"Fair Bandwidth Allocation. https:\/\/www.comm.utoronto.ca\/~jorg\/teaching\/ece1545\/schedslides\/bw-allocation.pdf month = year = note = (Accessed on 06\/12\/2023)."},{"key":"e_1_3_2_1_97_1","doi-asserted-by":"publisher","DOI":"10.1109\/TNET.2007.896231"},{"key":"e_1_3_2_1_98_1","unstructured":"Best-fit bin packing. https:\/\/en.wikipedia.org\/wiki\/Best-fit_bin_packing. (Accessed on 04\/06\/2023)."},{"key":"e_1_3_2_1_99_1","unstructured":"Bin packing problem. https:\/\/en.wikipedia.org\/wiki\/Bin_packing_problem. (Accessed on 04\/06\/2023)."},{"key":"e_1_3_2_1_100_1","unstructured":"Phoenix. https:\/\/github.com\/NetSAIL-UCI\/Phoenix. (Accessed on 10\/29\/2024)."},{"key":"e_1_3_2_1_101_1","unstructured":"NetworkX. https:\/\/networkx.org. (Accessed on 12\/06\/2023)."},{"key":"e_1_3_2_1_102_1","unstructured":"Sorted Containers. https:\/\/grantjenks.com\/docs\/sortedcontainers\/introduction.html#sorted-list. (Accessed on 09\/25\/2023)."},{"key":"e_1_3_2_1_103_1","doi-asserted-by":"publisher","DOI":"10.1145\/3135974.3135977"},{"key":"e_1_3_2_1_104_1","doi-asserted-by":"publisher","DOI":"10.1145\/3286685.3286687"},{"key":"e_1_3_2_1_105_1","unstructured":"Principles of Chaos Engineering. https:\/\/principlesofchaos.org. (Accessed on 11\/13\/2023)."},{"key":"e_1_3_2_1_106_1","unstructured":"SPS: the Pulse of Netflix Streaming. https:\/\/netflixtechblog.com\/sps-the-pulse-of-netflix-streaming-ae4db0e05f8a. (Accessed on 11\/13\/2023)."},{"key":"e_1_3_2_1_107_1","unstructured":"MongoDB. https:\/\/www.mongodb.com. (Accessed on 10\/31\/2022)."},{"key":"e_1_3_2_1_108_1","doi-asserted-by":"publisher","DOI":"10.1145\/3472883.3487003"},{"key":"e_1_3_2_1_109_1","unstructured":"Kubelet. https:\/\/kubernetes.io\/docs\/reference\/command-line-tools-reference\/kubelet\/. (Accessed on 11\/23\/2023)."},{"key":"e_1_3_2_1_110_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2022.3174631"},{"key":"e_1_3_2_1_111_1","doi-asserted-by":"publisher","DOI":"10.1145\/3542929.3563477"},{"key":"e_1_3_2_1_112_1","unstructured":"Azure Trace for Packing 2020. https:\/\/github.com\/Azure\/AzurePublicDataset\/blob\/master\/AzureTracesForPacking2020.md. (Accessed on 04\/04\/2023)."},{"key":"e_1_3_2_1_113_1","first-page":"623","volume-title":"17th USENIX Symposium on Operating Systems Design and Implementation (OSDI 23)","author":"Bhardwaj Romil","year":"2023","unstructured":"Romil Bhardwaj, Kirthevasan Kandasamy, Asim Biswal, Wenshuo Guo, Benjamin Hindman, Joseph Gonzalez, Michael Jordan, and Ion Stoica. Cilantro:{Performance-Aware} resource allocation for general objectives via online feedback. In 17th USENIX Symposium on Operating Systems Design and Implementation (OSDI 23), pages 623--643. USENIX Association, 2023."},{"key":"e_1_3_2_1_114_1","first-page":"805","volume-title":"14th USENIX symposium on operating systems design and implementation (OSDI 20)","author":"Qiu Haoran","year":"2020","unstructured":"Haoran Qiu, Subho S Banerjee, Saurabh Jha, Zbigniew T Kalbarczyk, and Ravishankar K Iyer. {FIRM}: An intelligent fine-grained resource management framework for {SLO-Oriented} microservices. In 14th USENIX symposium on operating systems design and implementation (OSDI 20), pages 805--825, 2020."},{"key":"e_1_3_2_1_115_1","doi-asserted-by":"publisher","DOI":"10.1145\/3342195.3387524"}],"event":{"name":"ASPLOS '25: 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems","sponsor":["SIGPLAN ACM Special Interest Group on Programming Languages","SIGOPS ACM Special Interest Group on Operating Systems","SIGARCH ACM Special Interest Group on Computer Architecture"],"location":"Rotterdam Netherlands","acronym":"ASPLOS '25"},"container-title":["Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3669940.3707244","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3669940.3707244","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,21]],"date-time":"2025-08-21T14:50:43Z","timestamp":1755787843000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3669940.3707244"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,3,30]]},"references-count":115,"alternative-id":["10.1145\/3669940.3707244","10.1145\/3669940"],"URL":"https:\/\/doi.org\/10.1145\/3669940.3707244","relation":{},"subject":[],"published":{"date-parts":[[2025,3,30]]},"assertion":[{"value":"2025-03-30","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}