{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,24]],"date-time":"2026-04-24T20:52:47Z","timestamp":1777063967290,"version":"3.51.4"},"publisher-location":"New York, NY, USA","reference-count":73,"publisher":"ACM","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2026,4,27]]},"DOI":"10.1145\/3767295.3769383","type":"proceedings-article","created":{"date-parts":[[2026,4,24]],"date-time":"2026-04-24T20:20:04Z","timestamp":1777062004000},"page":"2173-2188","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Garen: Reliable Cluster Management with Atomic State Reconciliation"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0009-0008-0870-7568","authenticated-orcid":false,"given":"Mingi","family":"Kim","sequence":"first","affiliation":[{"name":"FriendliAI, Redwood City, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8962-4976","authenticated-orcid":false,"given":"Ahnjae","family":"Shin","sequence":"additional","affiliation":[{"name":"FriendliAI, Redwood City, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8644-3176","authenticated-orcid":false,"given":"Jaewoo","family":"Maeng","sequence":"additional","affiliation":[{"name":"Samsung advanced institute of technology, Suwon, Republic of Korea"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0748-6627","authenticated-orcid":false,"given":"Myeongjae","family":"Jeon","sequence":"additional","affiliation":[{"name":"POSTECH, Pohang, Republic of Korea"}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-2356-971X","authenticated-orcid":false,"given":"Byung-Gon","family":"Chun","sequence":"additional","affiliation":[{"name":"FriendliAI, Redwood City, USA"},{"name":"Seoul National University, Seoul, Republic of Korea"}]}],"member":"320","published-online":{"date-parts":[[2026,4,26]]},"reference":[{"key":"e_1_3_2_1_1_1","volume-title":"https:\/\/github.com\/kubernetes\/kubernetes\/issues\/43806","year":"2017","unstructured":"Kube-scheduler gets stuck if there's a pod in a namespace that doesn't exist assigned to a node that doesn't exist. https:\/\/github.com\/kubernetes\/kubernetes\/issues\/43806, 2017."},{"key":"e_1_3_2_1_2_1","volume-title":"https:\/\/github.com\/kubernetes\/kubernetes\/issues\/42433","author":"CPU.","year":"2017","unstructured":"kube-scheduler scheduling despite OutOfCPU. https:\/\/github.com\/kubernetes\/kubernetes\/issues\/42433, 2017."},{"key":"e_1_3_2_1_3_1","volume-title":"https:\/\/github.com\/kubernetes\/kubernetes\/issues\/56057","author":"Pending","year":"2017","unstructured":"Pending pods with affinity rules looking for non-existing node. https:\/\/github.com\/kubernetes\/kubernetes\/issues\/56057, 2017."},{"key":"e_1_3_2_1_4_1","volume-title":"https:\/\/github.com\/kubernetes\/kubernetes\/issues\/56261","author":"Scheduler","year":"2017","unstructured":"Scheduler should delete a node from its cache if it gets \"node not found\" error. https:\/\/github.com\/kubernetes\/kubernetes\/issues\/56261, 2017."},{"key":"e_1_3_2_1_5_1","volume-title":"violating critical pod safety guarantees. https:\/\/github.com\/kubernetes\/kubernetes\/issues\/59848","author":"Kubernetes","year":"2018","unstructured":"Kubernetes is vulnerable to stale reads, violating critical pod safety guarantees. https:\/\/github.com\/kubernetes\/kubernetes\/issues\/59848, 2018."},{"key":"e_1_3_2_1_6_1","volume-title":"https:\/\/kubernetes.io\/docs\/reference\/using-api\/api-concepts\/#dry-run","author":"Kubernetes","year":"2020","unstructured":"Kubernetes Dry-run. https:\/\/kubernetes.io\/docs\/reference\/using-api\/api-concepts\/#dry-run, 2020."},{"key":"e_1_3_2_1_7_1","volume-title":"https:\/\/github.com\/kubernetes\/kubernetes\/issues\/94437","author":"Missing","year":"2020","unstructured":"Missing a single update message will cause the scheduler never able to schedule pods to the right nodes. https:\/\/github.com\/kubernetes\/kubernetes\/issues\/94437, 2020."},{"key":"e_1_3_2_1_8_1","volume-title":"https:\/\/brooker.co.za\/blog\/2021\/01\/22\/cloud-scale.html","author":"Scaling The Fundamental","year":"2020","unstructured":"The Fundamental Mechanism of Scaling. https:\/\/brooker.co.za\/blog\/2021\/01\/22\/cloud-scale.html, 2020."},{"key":"e_1_3_2_1_9_1","volume-title":"https:\/\/jira.percona.com\/browse\/K8SPSMDB-438","author":"Arbiter","year":"2021","unstructured":"[BUG] Arbiter statefulset gets mistakenly deleted when reading stale 'replset.arbiter.enabled'. https:\/\/jira.percona.com\/browse\/K8SPSMDB-438, 2021."},{"key":"e_1_3_2_1_10_1","volume-title":"https:\/\/github.com\/Orange-OpenSource\/casskop\/issues\/370","author":"Casskop","year":"2021","unstructured":"[BUG] Casskop fails to clean up PVC and refuses to handle user requests after crash and restart. https:\/\/github.com\/Orange-OpenSource\/casskop\/issues\/370, 2021."},{"key":"e_1_3_2_1_11_1","volume-title":"https:\/\/jira.percona.com\/browse\/K8SPSMDB-433","author":"Config","year":"2021","unstructured":"[BUG] Config statefulset gets mistakenly deleted when reading stale 'spec.sharding.enabled'. https:\/\/jira.percona.com\/browse\/K8SPSMDB-433, 2021."},{"key":"e_1_3_2_1_12_1","volume-title":"https:\/\/jira.percona.com\/browse\/K8SPXC-725","author":"Aproxy","year":"2021","unstructured":"[BUG] HAproxy statefulset and services get mistakenly deleted when reading stale 'spec.haproxy.enabled'. https:\/\/jira.percona.com\/browse\/K8SPXC-725, 2021."},{"key":"e_1_3_2_1_13_1","volume-title":"https:\/\/jira.percona.com\/browse\/K8SPXC-897","author":"Operator","year":"2021","unstructured":"[BUG] Operator never creates ssl-internal certificate if crash happens at some particular point. https:\/\/jira.percona.com\/browse\/K8SPXC-897, 2021."},{"key":"e_1_3_2_1_14_1","volume-title":"https:\/\/github.com\/Orange-OpenSource\/casskop\/issues\/321","author":"Pod","year":"2021","unstructured":"[BUG] Pod disruption budget's MaxPodUnavailable gets changed unexpectedly when reading stale information from apiserver. https:\/\/github.com\/Orange-OpenSource\/casskop\/issues\/321, 2021."},{"key":"e_1_3_2_1_15_1","volume-title":"https:\/\/github.com\/instaclustr\/cassandra-operator\/issues\/402","author":"PVC","year":"2021","unstructured":"[BUG] PVC can be accidentally deleted when controller reads stale data from apiserver. https:\/\/github.com\/instaclustr\/cassandra-operator\/issues\/402, 2021."},{"key":"e_1_3_2_1_16_1","volume-title":"https:\/\/github.com\/Orange-OpenSource\/casskop\/issues\/316","author":"PVC","year":"2021","unstructured":"[BUG] PVC is mistakenly deleted when the controller reads stale deletion Timestamp information. https:\/\/github.com\/Orange-OpenSource\/casskop\/issues\/316, 2021."},{"key":"e_1_3_2_1_17_1","volume-title":"https:\/\/github.com\/instaclustr\/cassandra-operator\/issues\/407","author":"Reading","year":"2021","unstructured":"[BUG] Reading stale pod information can lead to undesired PVC deletion. https:\/\/github.com\/instaclustr\/cassandra-operator\/issues\/407, 2021."},{"key":"e_1_3_2_1_18_1","volume-title":"https:\/\/github.com\/rabbitmq\/cluster-operator\/issues\/648","author":"Reading","year":"2021","unstructured":"[BUG] Reading stale RabbitmqCluster information can lead to undesired statefulset deletion. https:\/\/github.com\/rabbitmq\/cluster-operator\/issues\/648, 2021."},{"key":"e_1_3_2_1_19_1","volume-title":"https:\/\/github.com\/pravega\/zookeeper-operator\/issues\/410","author":"Unexpected","year":"2021","unstructured":"[BUG] Unexpected resource creation when the ZookeeperCluster is already deleted. https:\/\/github.com\/pravega\/zookeeper-operator\/issues\/410, 2021."},{"key":"e_1_3_2_1_20_1","volume-title":"https:\/\/k8ssandra.atlassian.net\/browse\/K8SSAND-1023","author":"Cassandra","year":"2021","unstructured":"Cassandra operator fails to start the Cassandra cluster due to a missing secret caused by unexpected crash. https:\/\/k8ssandra.atlassian.net\/browse\/K8SSAND-1023, 2021."},{"key":"e_1_3_2_1_21_1","volume-title":"https:\/\/cluster-api.sigs.k8s.io\/developer\/providers\/implementers-guide\/controllers_and_reconciliation.html","author":"Reconciliation Controllers","year":"2021","unstructured":"Controllers and Reconciliation. https:\/\/cluster-api.sigs.k8s.io\/developer\/providers\/implementers-guide\/controllers_and_reconciliation.html, 2021."},{"key":"e_1_3_2_1_22_1","volume-title":"https:\/\/etcd.io\/docs\/v3.3\/learning\/api\/#transaction","year":"2021","unstructured":"etcd Mini-transaction. https:\/\/etcd.io\/docs\/v3.3\/learning\/api\/#transaction, 2021."},{"key":"e_1_3_2_1_23_1","volume-title":"https:\/\/github.com\/crossplane-contrib\/provider-aws\/issues\/802","author":"Leaked","year":"2021","unstructured":"Leaked RouteTable due to eventually consistent DescribeRouteTables API. https:\/\/github.com\/crossplane-contrib\/provider-aws\/issues\/802, 2021."},{"key":"e_1_3_2_1_24_1","volume-title":"https:\/\/github.com\/kubernetes\/kubernetes\/issues\/106419","author":"OutOfcpu Pod","year":"2021","unstructured":"Pod created many times when node is OutOfcpu. https:\/\/github.com\/kubernetes\/kubernetes\/issues\/106419, 2021."},{"key":"e_1_3_2_1_25_1","volume-title":"https:\/\/k8ssandra.atlassian.net\/browse\/K8SSAND-559","author":"PVC","year":"2021","unstructured":"PVC can be deleted mistakenly when reading stale deletionTimestamp information. https:\/\/k8ssandra.atlassian.net\/browse\/K8SSAND-559, 2021."},{"key":"e_1_3_2_1_26_1","volume-title":"https:\/\/github.com\/k8ssandra\/cass-operator\/issues\/118","author":"PVC","year":"2021","unstructured":"PVC can be deleted mistakenly when reading stale deletionTimestamp information. https:\/\/github.com\/k8ssandra\/cass-operator\/issues\/118, 2021."},{"key":"e_1_3_2_1_27_1","volume-title":"https:\/\/github.com\/kubernetes\/kubernetes\/issues\/101872","author":"Scheduler","year":"2021","unstructured":"Scheduler scheduled pods to a deleted node. https:\/\/github.com\/kubernetes\/kubernetes\/issues\/101872, 2021."},{"key":"e_1_3_2_1_28_1","volume-title":"https:\/\/github.com\/kubernetes\/kubernetes\/issues\/106884","author":"Status","year":"2021","unstructured":"Status of pods can become \"OutOfCpu\" when many pods are created and completed in a short time on the same node. https:\/\/github.com\/kubernetes\/kubernetes\/issues\/106884, 2021."},{"key":"e_1_3_2_1_29_1","volume-title":"https:\/\/book-v1.book.kubebuilder.io\/basics\/what_is_a_controller.html","author":"Level What","year":"2021","unstructured":"What is a Level Based API. https:\/\/book-v1.book.kubebuilder.io\/basics\/what_is_a_controller.html, 2021."},{"key":"e_1_3_2_1_30_1","volume-title":"https:\/\/github.com\/elastic\/cloud-on-k8s\/issues\/5249","author":"Elastic","year":"2022","unstructured":"[BUG] Elastic operator mistakenly deletes the secret objects when seeing a stale cluster state. https:\/\/github.com\/elastic\/cloud-on-k8s\/issues\/5249, 2022."},{"key":"e_1_3_2_1_31_1","volume-title":"https:\/\/github.com\/elastic\/cloud-on-k8s\/issues\/5274","author":"Elastic","year":"2022","unstructured":"[BUG] Elastic operator mistakenly issues statefulset scaledown leading to unexpected node shutdown and data migration. https:\/\/github.com\/elastic\/cloud-on-k8s\/issues\/5274, 2022."},{"key":"e_1_3_2_1_32_1","volume-title":"https:\/\/github.com\/konpyutaika\/nifikop\/issues\/79","year":"2022","unstructured":"[BUG] nifikop fails to scale down nifi cluster due to a crash in the middle of reconcileNifiPod(). https:\/\/github.com\/konpyutaika\/nifikop\/issues\/79, 2022."},{"key":"e_1_3_2_1_33_1","volume-title":"https:\/\/jira.percona.com\/browse\/K8SPXC-896","author":"Operator","year":"2022","unstructured":"[BUG] Operator cannot create ssl-internal secret if crash happens at some particular point. https:\/\/jira.percona.com\/browse\/K8SPXC-896, 2022."},{"key":"e_1_3_2_1_34_1","volume-title":"PVC and services get mistakenly deleted when reading stale proxysql information. https:\/\/jira.percona.com\/browse\/K8SPXC-763","author":"Proxysql","year":"2022","unstructured":"[BUG] Proxysql statefulset, PVC and services get mistakenly deleted when reading stale proxysql information. https:\/\/jira.percona.com\/browse\/K8SPXC-763, 2022."},{"key":"e_1_3_2_1_35_1","volume-title":"https:\/\/github.com\/rabbitmq\/cluster-operator\/issues\/782","author":"PVC","year":"2022","unstructured":"[BUG] PVC expansion fails if the operator crashes in the middle of reconciliation. https:\/\/github.com\/rabbitmq\/cluster-operator\/issues\/782, 2022."},{"key":"e_1_3_2_1_36_1","volume-title":"https:\/\/jira.percona.com\/browse\/K8SPXC-979","author":"Cs","year":"2022","unstructured":"[BUG] xtradb-operator fails to delete the PVCs and secrets if it crashes and restarts in the middle of deleteStatefulSet(). https:\/\/jira.percona.com\/browse\/K8SPXC-979, 2022."},{"key":"e_1_3_2_1_37_1","volume-title":"https:\/\/github.com\/kubernetes\/kubernetes\/issues\/107679","year":"2022","unstructured":"kube-scheduler schedules pods despite of insufficient resource and the pod fails. https:\/\/github.com\/kubernetes\/kubernetes\/issues\/107679, 2022."},{"key":"e_1_3_2_1_38_1","volume-title":"https:\/\/kubernetes.io\/docs\/reference\/access-authn-authz\/admission-controllers\/","author":"Controllers Admission","year":"2023","unstructured":"Admission Controllers. https:\/\/kubernetes.io\/docs\/reference\/access-authn-authz\/admission-controllers\/, 2023."},{"key":"e_1_3_2_1_39_1","volume-title":"https:\/\/github.com\/kubernetes\/autoscaler","author":"Kubernetes Autoscaling","year":"2023","unstructured":"Autoscaling components for Kubernetes. https:\/\/github.com\/kubernetes\/autoscaler, 2023."},{"key":"e_1_3_2_1_40_1","volume-title":"https:\/\/kubernetes.io\/docs\/concepts\/extend-kubernetes\/api-extension\/custom-resources\/","author":"Resources Custom","year":"2023","unstructured":"Custom Resources. https:\/\/kubernetes.io\/docs\/concepts\/extend-kubernetes\/api-extension\/custom-resources\/, 2023."},{"key":"e_1_3_2_1_41_1","volume-title":"https:\/\/kubernetes.io\/docs\/concepts\/extend-kubernetes\/","author":"Kubernetes Extending","year":"2023","unstructured":"Extending Kubernetes. https:\/\/kubernetes.io\/docs\/concepts\/extend-kubernetes\/, 2023."},{"key":"e_1_3_2_1_42_1","volume-title":"setting OutOfCpu on scheduled pods. https:\/\/github.com\/kubernetes\/kubernetes\/issues\/115325","author":"Kubelet","year":"2023","unstructured":"Kubelet accepting pod, setting OutOfCpu on scheduled pods. https:\/\/github.com\/kubernetes\/kubernetes\/issues\/115325, 2023."},{"key":"e_1_3_2_1_43_1","volume-title":"https:\/\/kubernetes.io\/docs\/reference\/using-api\/api-concepts\/","author":"Concepts Kubernetes API","year":"2023","unstructured":"Kubernetes API Concepts. https:\/\/kubernetes.io\/docs\/reference\/using-api\/api-concepts\/, 2023."},{"key":"e_1_3_2_1_44_1","volume-title":"https:\/\/kubernetes.io\/docs\/reference\/generated\/kubernetes-api\/v1.27\/#-strong-api-overview-strong-","author":"Overview Kubernetes API","year":"2023","unstructured":"Kubernetes API Overview. https:\/\/kubernetes.io\/docs\/reference\/generated\/kubernetes-api\/v1.27\/#-strong-api-overview-strong-, 2023."},{"key":"e_1_3_2_1_45_1","volume-title":"https:\/\/kubernetes.io\/docs\/reference\/using-api\/api-concepts\/#resource-versions","author":"Version Semantic Kubernetes Resource","year":"2023","unstructured":"Kubernetes Resource Version Semantic. https:\/\/kubernetes.io\/docs\/reference\/using-api\/api-concepts\/#resource-versions, 2023."},{"key":"e_1_3_2_1_46_1","volume-title":"Cloud native certificate management. https:\/\/certmanager.io\/","author":"Cert-Manager","year":"2024","unstructured":"Cert-Manager: Cloud native certificate management. https:\/\/certmanager.io\/, 2024."},{"key":"e_1_3_2_1_47_1","volume-title":"https:\/\/github.com\/kubernetes-sigs\/scheduler-plugins\/blob\/master\/pkg\/coscheduling\/README.md","author":"Coscheduling based on PodGroup CRD.","year":"2024","unstructured":"Coscheduling based on PodGroup CRD. https:\/\/github.com\/kubernetes-sigs\/scheduler-plugins\/blob\/master\/pkg\/coscheduling\/README.md, 2024."},{"key":"e_1_3_2_1_48_1","volume-title":"A distributed, reliable key-value store for the most critical data of a distributed system. https:\/\/etcd.io\/","year":"2024","unstructured":"etcd: A distributed, reliable key-value store for the most critical data of a distributed system. https:\/\/etcd.io\/, 2024."},{"key":"e_1_3_2_1_49_1","volume-title":"https:\/\/github.com\/kubernetes-sigs\/controller-runtime","author":"Repo","year":"2025","unstructured":"Repo for the controller-runtime subproject of kubebuilder (sigapimachinery). https:\/\/github.com\/kubernetes-sigs\/controller-runtime, 2025."},{"key":"e_1_3_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/2898442.2898444"},{"key":"e_1_3_2_1_51_1","volume-title":"Keep the Space Shuttle Flying: Writing Robust Operators. https:\/\/kccnceu19.sched.com\/event\/MPaN","author":"Chekrygin Illya","year":"2019","unstructured":"Illya Chekrygin. Keep the Space Shuttle Flying: Writing Robust Operators. https:\/\/kccnceu19.sched.com\/event\/MPaN, 2019."},{"key":"e_1_3_2_1_52_1","volume-title":"KubeCon North America. https:\/\/kccncna2022.sched.com\/event\/182GE\/preventing-controller-sprawl-from-taking-down-your-cluster-when-a-scalable-pattern-stops-being-scalable-madhu-cs-robinhood-markets","author":"Madhusudan","year":"2022","unstructured":"Madhusudan C.S. Preventing Controller Sprawl From Taking Down Your Cluster. In KubeCon North America. https:\/\/kccncna2022.sched.com\/event\/182GE\/preventing-controller-sprawl-from-taking-down-your-cluster-when-a-scalable-pattern-stops-being-scalable-madhu-cs-robinhood-markets, 2022."},{"key":"e_1_3_2_1_53_1","first-page":"112","volume-title":"Proceedings of the 29th Symposium on Operating Systems Principles, SOSP '23","author":"Gu Jiawei Tyler","year":"2023","unstructured":"Jiawei Tyler Gu, Xudong Sun, Wentao Zhang, Yuxuan Jiang, Chen Wang, Mandana Vaziri, Owolabi Legunsen, and Tianyin Xu. Acto: Automatic end-to-end testing for operation correctness of cloud system management. In Proceedings of the 29th Symposium on Operating Systems Principles, SOSP '23, page 96\u2013112, New York, NY, USA, 2023. Association for Computing Machinery."},{"key":"e_1_3_2_1_54_1","volume-title":"KubeCon North America. https:\/\/kccncna19.sched.com\/event\/UaeV","author":"Guilloux Sebastien","year":"2019","unstructured":"Sebastien Guilloux. Writing a Kubernetes Operator: the Hard Parts. In KubeCon North America. https:\/\/kccncna19.sched.com\/event\/UaeV, 2019."},{"key":"e_1_3_2_1_55_1","volume-title":"KubeCon Europe. https:\/\/kccnceu2022.sched.com\/event\/ytr1\/how-a-couple-of-characters-and-gitops-brought-down-our-site-guy-templeton-stuart-davidson-skyscanner","author":"Guy Templeton Stuart Davidson","year":"2022","unstructured":"Stuart Davidson Guy Templeton. How a Couple of Characters (and GitOps) Brought Down Our Site. In KubeCon Europe. https:\/\/kccnceu2022.sched.com\/event\/ytr1\/how-a-couple-of-characters-and-gitops-brought-down-our-site-guy-templeton-stuart-davidson-skyscanner, 2022."},{"key":"e_1_3_2_1_56_1","first-page":"861","volume-title":"14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20)","author":"Hadary Ori","unstructured":"Ori Hadary, Luke Marshall, Ishai Menache, Abhisek Pan, Esaias E Greeff, David Dion, Star Dorminey, Shailesh Joshi, Yang Chen, Mark Russinovich, and Thomas Moscibroda. Protean: VM allocation service at scale. In 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20), pages 845\u2013861. USENIX Association, November 2020."},{"key":"e_1_3_2_1_57_1","volume-title":"KubeCon North America. https:\/\/kccncna19.sched.com\/event\/UadX","author":"Hemant Kumar Jan \u0160afr\u00e1nek","year":"2019","unstructured":"Jan \u0160afr\u00e1nek Hemant Kumar. Storage on Kubernetes - Learning From Failures. In KubeCon North America. https:\/\/kccncna19.sched.com\/event\/UadX, 2019."},{"key":"e_1_3_2_1_58_1","first-page":"891","volume-title":"2018 USENIX Annual Technical Conference (USENIX ATC 18)","author":"Hu Yige","year":"2018","unstructured":"Yige Hu, Zhiting Zhu, Ian Neal, Youngjin Kwon, Tianyu Cheng, Vijay Chidambaram, and Emmett Witchel. TxFS: Leveraging File-System crash consistency to provide ACID transactions. In 2018 USENIX Annual Technical Conference (USENIX ATC 18), pages 879\u2013891, Boston, MA, July 2018. USENIX Association."},{"key":"e_1_3_2_1_59_1","unstructured":"IBM. Example multi-node cluster specifications. 2023."},{"key":"e_1_3_2_1_60_1","volume-title":"ContainerDays. https:\/\/speakerdeck.com\/maxlaverse\/moving-to-kubernetes-the-bad-and-the-ugly","author":"Lagresle Maxime","year":"2019","unstructured":"Maxime Lagresle. Moving to Kubernetes: the Bad and the Ugly. In ContainerDays. https:\/\/speakerdeck.com\/maxlaverse\/moving-to-kubernetes-the-bad-and-the-ugly, 2019."},{"key":"e_1_3_2_1_61_1","volume-title":"KubeCon North America. https:\/\/kccncna20.sched.com\/event\/ek9r\/10-more-weird-ways-to-blow-up-your-kubernetes-jian-cheung-joseph-kim-airbnb","author":"Melanie Cebula Bruce Sherrod","year":"2019","unstructured":"Bruce Sherrod Melanie Cebula. 10 Weird Ways to Blow Up Your Kubernetes. In KubeCon North America. https:\/\/kccncna20.sched.com\/event\/ek9r\/10-more-weird-ways-to-blow-up-your-kubernetes-jian-cheung-joseph-kim-airbnb, 2019."},{"key":"e_1_3_2_1_62_1","first-page":"157","volume-title":"Proceedings of the 13th Symposium on Cloud Computing, SoCC '22","author":"Melissaris Themis","year":"2022","unstructured":"Themis Melissaris, Kunal Nabar, Rares Radut, Samir Rehmtulla, Arthur Shi, Samartha Chandrashekar, and Ioannis Papapanagiotou. Elastic cloud services: scaling snowflake's control plane. In Proceedings of the 13th Symposium on Cloud Computing, SoCC '22, page 142\u2013157, New York, NY, USA, 2022. Association for Computing Machinery."},{"key":"e_1_3_2_1_63_1","volume-title":"Azure Public Dataset V2. https:\/\/github.com\/Azure\/AzurePublicDataset","year":"2019","unstructured":"Microsoft. Azure Public Dataset V2. https:\/\/github.com\/Azure\/AzurePublicDataset, 2019."},{"key":"e_1_3_2_1_64_1","first-page":"234","volume-title":"2015 USENIX Annual Technical Conference (USENIX ATC 15)","author":"Min Changwoo","year":"2015","unstructured":"Changwoo Min, Woon-Hak Kang, Taesoo Kim, Sang-Won Lee, and Young Ik Eom. Lightweight Application-Level crash consistency on transactional flash storage. In 2015 USENIX Annual Technical Conference (USENIX ATC 15), pages 221\u2013234, Santa Clara, CA, July 2015. USENIX Association."},{"key":"e_1_3_2_1_65_1","first-page":"176","volume-title":"Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles, SOSP '09","author":"Porter Donald E.","year":"2009","unstructured":"Donald E. Porter, Owen S. Hofmann, Christopher J. Rossbach, Alexander Benn, and Emmett Witchel. Operating system transactions. In Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles, SOSP '09, page 161\u2013176, New York, NY, USA, 2009. Association for Computing Machinery."},{"key":"e_1_3_2_1_66_1","first-page":"364","volume-title":"Proceedings of the 8th ACM European Conference on Computer Systems, EuroSys '13","author":"Schwarzkopf Malte","year":"2013","unstructured":"Malte Schwarzkopf, Andy Konwinski, Michael Abd-El-Malek, and John Wilkes. Omega: Flexible, scalable schedulers for large compute clusters. In Proceedings of the 8th ACM European Conference on Computer Systems, EuroSys '13, page 351\u2013364, New York, NY, USA, 2013. Association for Computing Machinery."},{"key":"e_1_3_2_1_67_1","first-page":"159","volume-title":"16th USENIX Symposium on Operating Systems Design and Implementation (OSDI 22)","author":"Sun Xudong","year":"2022","unstructured":"Xudong Sun, Wenqing Luo, Jiawei Tyler Gu, Aishwarya Ganesan, Ramnatthan Alagappan, Michael Gasch, Lalith Suresh, and Tianyin Xu. Automatic reliability testing for cluster management controllers. In 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI 22), pages 143\u2013159, Carlsbad, CA, July 2022. USENIX Association."},{"key":"e_1_3_2_1_68_1","first-page":"666","volume-title":"18th USENIX Symposium on Operating Systems Design and Implementation (OSDI 24)","author":"Sun Xudong","year":"2024","unstructured":"Xudong Sun, Wenjie Ma, Jiawei Tyler Gu, Zicheng Ma, Tej Chajed, Jon Howell, Andrea Lattuada, Oded Padon, Lalith Suresh, Adriana Szekeres, and Tianyin Xu. Anvil: Verifying liveness of cluster management controllers. In 18th USENIX Symposium on Operating Systems Design and Implementation (OSDI 24), pages 649\u2013666, Santa Clara, CA, July 2024. USENIX Association."},{"key":"e_1_3_2_1_69_1","first-page":"220","volume-title":"Proceedings of the Workshop on Hot Topics in Operating Systems, HotOS '21","author":"Sun Xudong","year":"2021","unstructured":"Xudong Sun, Lalith Suresh, Aishwarya Ganesan, Ramnatthan Alagappan, Michael Gasch, Lilia Tang, and Tianyin Xu. Reasoning about modern datacenter infrastructures using partial histories. In Proceedings of the Workshop on Hot Topics in Operating Systems, HotOS '21, page 213\u2013220, New York, NY, USA, 2021. Association for Computing Machinery."},{"key":"e_1_3_2_1_70_1","first-page":"844","volume-title":"14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20)","author":"Suresh Lalith","unstructured":"Lalith Suresh, Jo\u00e3o Loff, Faria Kalim, Sangeetha Abdu Jyothi, Nina Narodytska, Leonid Ryzhyk, Sahan Gamage, Brian Oki, Pranshu Jain, and Michael Gasch. Building scalable and flexible cluster managers using declarative programming. In 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20), pages 827\u2013844. USENIX Association, November 2020."},{"key":"e_1_3_2_1_71_1","first-page":"803","volume-title":"14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20)","author":"Tang Chunqiang","unstructured":"Chunqiang Tang, Kenny Yu, Kaushik Veeraraghavan, Jonathan Kaldor, Scott Michelson, Thawan Kooburat, Aravind Anbudurai, Matthew Clark, Kabir Gogia, Long Cheng, Ben Christensen, Alex Gartrell, Maxim Khutornenko, Sachin Kulkarni, Marcin Pawlowski, Tuomas Pelkonen, Andre Rodrigues, Rounak Tibrewal, Vaishnavi Venkatesan, and Peter Zhang. Twine: A unified cluster management system for shared infrastructure. In 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20), pages 787\u2013803. USENIX Association, November 2020."},{"key":"e_1_3_2_1_72_1","doi-asserted-by":"publisher","DOI":"10.1145\/2741948.2741964"},{"key":"e_1_3_2_1_73_1","first-page":"449","volume-title":"Proceedings of the 5th International Conference on Software Engineering, ICSE '81","author":"Weiser Mark","unstructured":"Mark Weiser. Program slicing. In Proceedings of the 5th International Conference on Software Engineering, ICSE '81, page 439\u2013449. IEEE Press, 1981."}],"event":{"name":"EUROSYS '26: 21st European Conference on Computer Systems","location":"McEwan Hall\/The University of Edinburgh Edinburgh Scotland UK","acronym":"EUROSYS '26","sponsor":["SIGOPS ACM Special Interest Group on Operating Systems"]},"container-title":["Proceedings of the 21st European Conference on Computer Systems"],"original-title":[],"deposited":{"date-parts":[[2026,4,24]],"date-time":"2026-04-24T20:25:05Z","timestamp":1777062305000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3767295.3769383"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,4,26]]},"references-count":73,"alternative-id":["10.1145\/3767295.3769383","10.1145\/3767295"],"URL":"https:\/\/doi.org\/10.1145\/3767295.3769383","relation":{},"subject":[],"published":{"date-parts":[[2026,4,26]]},"assertion":[{"value":"2026-04-26","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}