{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,8]],"date-time":"2026-04-08T08:59:58Z","timestamp":1775638798330,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":81,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,10,26]],"date-time":"2021-10-26T00:00:00Z","timestamp":1635206400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,10,26]]},"DOI":"10.1145\/3477132.3483577","type":"proceedings-article","created":{"date-parts":[[2021,10,19]],"date-time":"2021-10-19T15:59:18Z","timestamp":1634659158000},"page":"116-131","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":38,"title":["Understanding and Detecting Software Upgrade Failures in Distributed Systems"],"prefix":"10.1145","author":[{"given":"Yongle","family":"Zhang","sequence":"first","affiliation":[{"name":"Purdue University"}]},{"given":"Junwen","family":"Yang","sequence":"additional","affiliation":[{"name":"University of Chicago"}]},{"given":"Zhuqi","family":"Jin","sequence":"additional","affiliation":[{"name":"University of Toronto"}]},{"given":"Utsav","family":"Sethi","sequence":"additional","affiliation":[{"name":"University of Chicago"}]},{"given":"Kirk","family":"Rodrigues","sequence":"additional","affiliation":[{"name":"University of Toronto"}]},{"given":"Shan","family":"Lu","sequence":"additional","affiliation":[{"name":"University of Chicago"}]},{"given":"Ding","family":"Yuan","sequence":"additional","affiliation":[{"name":"University of Toronto"}]}],"member":"320","published-online":{"date-parts":[[2021,10,26]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"American fuzzy lop. https:\/\/lcamtuf.coredump.cx\/afl\/.  American fuzzy lop. 
https:\/\/lcamtuf.coredump.cx\/afl\/."},{"key":"e_1_3_2_1_2_1","unstructured":"Application deployment and testing strategies. https:\/\/cloud.google.com\/architecture\/application-deployment-and-testing-strategies.  Application deployment and testing strategies. https:\/\/cloud.google.com\/architecture\/application-deployment-and-testing-strategies."},{"key":"e_1_3_2_1_3_1","unstructured":"Canary deployment. https:\/\/cloud.google.com\/blog\/products\/gcp\/how-release-canaries-can-save-your-bacon-cre-life-lessons.  Canary deployment. https:\/\/cloud.google.com\/blog\/products\/gcp\/how-release-canaries-can-save-your-bacon-cre-life-lessons."},{"key":"e_1_3_2_1_4_1","unstructured":"CASSANDRA-10652. https:\/\/jira.apache.org\/jira\/browse\/CASSANDRA-10652.  CASSANDRA-10652. https:\/\/jira.apache.org\/jira\/browse\/CASSANDRA-10652."},{"key":"e_1_3_2_1_5_1","unstructured":"CASSANDRA-10822. https:\/\/jira.apache.org\/jira\/browse\/CASSANDRA-10822.  CASSANDRA-10822. https:\/\/jira.apache.org\/jira\/browse\/CASSANDRA-10822."},{"key":"e_1_3_2_1_6_1","unstructured":"CASSANDRA-13441. https:\/\/issues.apache.org\/jira\/browse\/CASSANDRA-13441.  CASSANDRA-13441. https:\/\/issues.apache.org\/jira\/browse\/CASSANDRA-13441."},{"key":"e_1_3_2_1_7_1","unstructured":"CASSANDRA-5102. https:\/\/issues.apache.org\/jira\/browse\/CASSANDRA-5102.  CASSANDRA-5102. https:\/\/issues.apache.org\/jira\/browse\/CASSANDRA-5102."},{"key":"e_1_3_2_1_8_1","unstructured":"CASSANDRA-6678. https:\/\/issues.apache.org\/jira\/browse\/CASSANDRA-6678.  CASSANDRA-6678. https:\/\/issues.apache.org\/jira\/browse\/CASSANDRA-6678."},{"key":"e_1_3_2_1_9_1","unstructured":"Docker hub. https:\/\/www.docker.com\/products\/docker-hub.  Docker hub. https:\/\/www.docker.com\/products\/docker-hub."},{"key":"e_1_3_2_1_10_1","unstructured":"Dropbox upgrade failure. https:\/\/dropbox.tech\/infrastructure\/outage-post-mortem.  Dropbox upgrade failure. 
https:\/\/dropbox.tech\/infrastructure\/outage-post-mortem."},{"key":"e_1_3_2_1_11_1","unstructured":"HDFS-11856. https:\/\/issues.apache.org\/jira\/browse\/HDFS-11856.  HDFS-11856. https:\/\/issues.apache.org\/jira\/browse\/HDFS-11856."},{"key":"e_1_3_2_1_12_1","unstructured":"HDFS-14726. https:\/\/issues.apache.org\/jira\/browse\/HDFS-14726.  HDFS-14726. https:\/\/issues.apache.org\/jira\/browse\/HDFS-14726."},{"key":"e_1_3_2_1_13_1","unstructured":"HDFS-15624. https:\/\/issues.apache.org\/jira\/browse\/HDFS-15624.  HDFS-15624. https:\/\/issues.apache.org\/jira\/browse\/HDFS-15624."},{"key":"e_1_3_2_1_14_1","unstructured":"HDFS-1936. https:\/\/issues.apache.org\/jira\/browse\/HDFS-1936.  HDFS-1936. https:\/\/issues.apache.org\/jira\/browse\/HDFS-1936."},{"key":"e_1_3_2_1_15_1","unstructured":"HDFS-5988. https:\/\/issues.apache.org\/jira\/browse\/HDFS-5988.  HDFS-5988. https:\/\/issues.apache.org\/jira\/browse\/HDFS-5988."},{"key":"e_1_3_2_1_16_1","unstructured":"HDFS-8676. https:\/\/issues.apache.org\/jira\/browse\/HDFS-8676.  HDFS-8676. https:\/\/issues.apache.org\/jira\/browse\/HDFS-8676."},{"key":"e_1_3_2_1_17_1","unstructured":"Java virtual machine. https:\/\/en.wikipedia.org\/wiki\/Java_virtual_machine.  Java virtual machine. https:\/\/en.wikipedia.org\/wiki\/Java_virtual_machine."},{"key":"e_1_3_2_1_18_1","unstructured":"JUnit 5. https:\/\/junit.org\/junit5\/.  JUnit 5. https:\/\/junit.org\/junit5\/."},{"key":"e_1_3_2_1_19_1","unstructured":"KAFKA-10173. https:\/\/issues.apache.org\/jira\/browse\/KAFKA-10173.  KAFKA-10173. https:\/\/issues.apache.org\/jira\/browse\/KAFKA-10173."},{"key":"e_1_3_2_1_20_1","unstructured":"KAFKA-6238. https:\/\/issues.apache.org\/jira\/browse\/KAFKA-6238.  KAFKA-6238. https:\/\/issues.apache.org\/jira\/browse\/KAFKA-6238."},{"key":"e_1_3_2_1_21_1","unstructured":"KAFKA-7403. https:\/\/jira.apache.org\/jira\/browse\/KAFKA-7403.  KAFKA-7403. 
https:\/\/jira.apache.org\/jira\/browse\/KAFKA-7403."},{"key":"e_1_3_2_1_22_1","unstructured":"Linux containers. https:\/\/linuxcontainers.org\/.  Linux containers. https:\/\/linuxcontainers.org\/."},{"key":"e_1_3_2_1_23_1","unstructured":"MESOS-3834. https:\/\/issues.apache.org\/jira\/browse\/MESOS-3834.  MESOS-3834. https:\/\/issues.apache.org\/jira\/browse\/MESOS-3834."},{"key":"e_1_3_2_1_24_1","unstructured":"Microsoft Azure Blog. https:\/\/azure.microsoft.com\/en-us\/blog\/.  Microsoft Azure Blog. https:\/\/azure.microsoft.com\/en-us\/blog\/."},{"key":"e_1_3_2_1_25_1","unstructured":"Microsoft says 11-hour azure outage was caused by system update. https:\/\/www.entrepreneur.com\/article\/240029.  Microsoft says 11-hour azure outage was caused by system update. https:\/\/www.entrepreneur.com\/article\/240029."},{"key":"e_1_3_2_1_26_1","unstructured":"Proto Buffer Guide. https:\/\/developers.google.com\/protocol-buffers\/docs\/proto.  Proto Buffer Guide. https:\/\/developers.google.com\/protocol-buffers\/docs\/proto."},{"key":"e_1_3_2_1_27_1","unstructured":"PyParsing. https:\/\/github.com\/pyparsing\/pyparsing.  PyParsing. https:\/\/github.com\/pyparsing\/pyparsing."},{"key":"e_1_3_2_1_28_1","unstructured":"Summary of windows azure service disruption on feb 29th 2012. https:\/\/azure.microsoft.com\/en-us\/blog\/summary-of-windows-azure-service-disruption-on-feb-29th-2012\/.  Summary of windows azure service disruption on feb 29th 2012. https:\/\/azure.microsoft.com\/en-us\/blog\/summary-of-windows-azure-service-disruption-on-feb-29th-2012\/."},{"key":"e_1_3_2_1_29_1","unstructured":"Thrift Compatibility Checker. https:\/\/github.com\/brunorijsman\/thrift-compatibility.  Thrift Compatibility Checker. https:\/\/github.com\/brunorijsman\/thrift-compatibility."},{"key":"e_1_3_2_1_30_1","unstructured":"Thrift Guide. https:\/\/diwakergupta.github.io\/thrift-missing-guide\/.  Thrift Guide. 
https:\/\/diwakergupta.github.io\/thrift-missing-guide\/."},{"key":"e_1_3_2_1_31_1","unstructured":"ZOOKEEPER-1805. https:\/\/issues.apache.org\/jira\/browse\/ZOOKEEPER-1805.  ZOOKEEPER-1805. https:\/\/issues.apache.org\/jira\/browse\/ZOOKEEPER-1805."},{"key":"e_1_3_2_1_32_1","first-page":"8","volume-title":"Proceedings of the 9th Conference on Hot Topics in Operating Systems -","volume":"9","author":"Ajmani Sameer","year":"2003","unstructured":"Sameer Ajmani , Barbara Liskov , and Liuba Shrira . Scheduling and simulation: How to upgrade distributed systems . In Proceedings of the 9th Conference on Hot Topics in Operating Systems - Volume 9 , HOTOS'03, pages 8 -- 8 , 2003 . Sameer Ajmani, Barbara Liskov, and Liuba Shrira. Scheduling and simulation: How to upgrade distributed systems. In Proceedings of the 9th Conference on Hot Topics in Operating Systems - Volume 9, HOTOS'03, pages 8--8, 2003."},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1007\/11785477_26"},{"key":"e_1_3_2_1_34_1","unstructured":"Apache Cassandra. http:\/\/cassandra.apache.org.  Apache Cassandra. http:\/\/cassandra.apache.org."},{"key":"e_1_3_2_1_35_1","unstructured":"Apache HBase. http:\/\/hbase.apache.org.  Apache HBase. http:\/\/hbase.apache.org."},{"key":"e_1_3_2_1_36_1","unstructured":"Apache Kafka. https:\/\/kafka.apache.org\/.  Apache Kafka. https:\/\/kafka.apache.org\/."},{"key":"e_1_3_2_1_37_1","unstructured":"Apache Mesos. https:\/\/mesos.apache.org\/.  Apache Mesos. https:\/\/mesos.apache.org\/."},{"key":"e_1_3_2_1_38_1","unstructured":"Apache ZooKeeper. https:\/\/zookeeper.apache.org\/.  Apache ZooKeeper. https:\/\/zookeeper.apache.org\/."},{"key":"e_1_3_2_1_39_1","volume-title":"Camil Demetrescu, and Irene Finocchi. A survey of symbolic execution techniques. ACM Computing Surveys (CSUR), 51(3):1--39","author":"Baldoni Roberto","year":"2018","unstructured":"Roberto Baldoni , Emilio Coppa , Daniele Cono D'elia , Camil Demetrescu, and Irene Finocchi. 
A survey of symbolic execution techniques. ACM Computing Surveys (CSUR), 51(3):1--39 , 2018 . Roberto Baldoni, Emilio Coppa, Daniele Cono D'elia, Camil Demetrescu, and Irene Finocchi. A survey of symbolic execution techniques. ACM Computing Surveys (CSUR), 51(3):1--39, 2018."},{"key":"e_1_3_2_1_40_1","first-page":"209","volume-title":"Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation, OSDI'08","author":"Cadar Cristian","year":"2008","unstructured":"Cristian Cadar , Daniel Dunbar , and Dawson Engler . Klee : Unassisted and automatic generation of high-coverage tests for complex systems programs . In Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation, OSDI'08 , page 209 -- 224 , 2008 . Cristian Cadar, Daniel Dunbar, and Dawson Engler. Klee: Unassisted and automatic generation of high-coverage tests for complex systems programs. In Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation, OSDI'08, page 209--224, 2008."},{"key":"e_1_3_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/502034.502042"},{"key":"e_1_3_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/IC2E.2018.00022"},{"key":"e_1_3_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.5555\/1656980.1657005"},{"key":"e_1_3_2_1_44_1","first-page":"416","volume-title":"Network and Distributed System Security Symposium, NDSS'08","author":"Godefroid Patrice","year":"2008","unstructured":"Patrice Godefroid , Michael Y Levin , David A Molnar , Automated whitebox fuzz testing . In Network and Distributed System Security Symposium, NDSS'08 , pages 416 -- 426 , 2008 . Patrice Godefroid, Michael Y Levin, David A Molnar, et al. Automated whitebox fuzz testing. 
In Network and Distributed System Security Symposium, NDSS'08, pages 416--426, 2008."},{"key":"e_1_3_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/2670979.2670986"},{"key":"e_1_3_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/2987550.2987583"},{"key":"e_1_3_2_1_47_1","unstructured":"Hadoop Distributed File System (HDFS) architecture guide. https:\/\/hadoop.apache.org\/docs\/stable\/hadoop-project-dist\/hadoop-hdfs\/HdfsDesign.html.  Hadoop Distributed File System (HDFS) architecture guide. https:\/\/hadoop.apache.org\/docs\/stable\/hadoop-project-dist\/hadoop-hdfs\/HdfsDesign.html."},{"key":"e_1_3_2_1_48_1","unstructured":"Hadoop MapReduce. https:\/\/hadoop.apache.org\/docs\/stable\/hadoop-mapreduce-client\/hadoop-mapreduce-clientcore\/MapReduceTutorial.html.  Hadoop MapReduce. https:\/\/hadoop.apache.org\/docs\/stable\/hadoop-mapreduce-client\/hadoop-mapreduce-clientcore\/MapReduceTutorial.html."},{"key":"e_1_3_2_1_49_1","unstructured":"Apache Hadoop YARN. https:\/\/hadoop.apache.org\/docs\/stable\/hadoop-yarn\/hadoop-yarn-site\/YARN.html.  Apache Hadoop YARN. https:\/\/hadoop.apache.org\/docs\/stable\/hadoop-yarn\/hadoop-yarn-site\/YARN.html."},{"key":"e_1_3_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.5555\/3291168.3291170"},{"key":"e_1_3_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1145\/3102980.3103005"},{"key":"e_1_3_2_1_52_1","volume-title":"Continuous delivery: reliable software releases through build, test, and deployment automation","author":"Humble Jez","year":"2015","unstructured":"Jez Humble and David Farley . Continuous delivery: reliable software releases through build, test, and deployment automation . Addison-Wesley , 2015 . Jez Humble and David Farley. Continuous delivery: reliable software releases through build, test, and deployment automation. 
Addison-Wesley, 2015."},{"key":"e_1_3_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/2254064.2254075"},{"key":"e_1_3_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1145\/1592568.1592597"},{"key":"e_1_3_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1145\/360248.360252"},{"key":"e_1_3_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1145\/2872362.2872374"},{"key":"e_1_3_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1145\/2043556.2043583"},{"key":"e_1_3_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE.2013.6606646"},{"key":"e_1_3_2_1_59_1","first-page":"389","volume-title":"Proceedings of the 17th USENIX Symposium on Networked Systems Design and Implementation (NSDI 20)","author":"Li Ze","year":"2020","unstructured":"Ze Li , Qian Cheng , Ken Hsieh , Yingnong Dang , Peng Huang , Pankaj Singh , Xinsheng Yang , Qingwei Lin , Youjiang Wu , Sebastien Levy , and Murali Chintalapati . Gandalf : An intelligent, end-to-end analytics service for safe deployment in large-scale cloud infrastructure . In Proceedings of the 17th USENIX Symposium on Networked Systems Design and Implementation (NSDI 20) , pages 389 -- 402 , 2020 . Ze Li, Qian Cheng, Ken Hsieh, Yingnong Dang, Peng Huang, Pankaj Singh, Xinsheng Yang, Qingwei Lin, Youjiang Wu, Sebastien Levy, and Murali Chintalapati. Gandalf: An intelligent, end-to-end analytics service for safe deployment in large-scale cloud infrastructure. 
In Proceedings of the 17th USENIX Symposium on Networked Systems Design and Implementation (NSDI 20), pages 389--402, 2020."},{"key":"e_1_3_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1145\/1181309.1181314"},{"key":"e_1_3_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1145\/3317550.3321438"},{"key":"e_1_3_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.5555\/2591272.2591276"},{"key":"e_1_3_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1145\/1346281.1346323"},{"key":"e_1_3_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.1145\/633025.633027"},{"key":"e_1_3_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.1145\/940071.940110"},{"key":"e_1_3_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.5555\/1251254.1251259"},{"key":"e_1_3_2_1_67_1","doi-asserted-by":"publisher","DOI":"10.5555\/1251460.1251461"},{"key":"e_1_3_2_1_68_1","doi-asserted-by":"publisher","DOI":"10.1145\/1950365.1950401"},{"key":"e_1_3_2_1_69_1","doi-asserted-by":"publisher","DOI":"10.1145\/3297858.3304063"},{"key":"e_1_3_2_1_70_1","doi-asserted-by":"publisher","DOI":"10.1109\/MS.2012.73"},{"key":"e_1_3_2_1_71_1","unstructured":"Redis: an open source advanced key-value store. http:\/\/redis.io\/.  Redis: an open source advanced key-value store. 
http:\/\/redis.io\/."},{"key":"e_1_3_2_1_72_1","doi-asserted-by":"publisher","DOI":"10.1145\/2889160.2889223"},{"key":"e_1_3_2_1_73_1","doi-asserted-by":"publisher","DOI":"10.1109\/TDSC.2009.4"},{"key":"e_1_3_2_1_74_1","doi-asserted-by":"publisher","DOI":"10.1109\/SP.2010.26"},{"key":"e_1_3_2_1_75_1","doi-asserted-by":"publisher","DOI":"10.1145\/1807128.1807161"},{"key":"e_1_3_2_1_76_1","doi-asserted-by":"publisher","DOI":"10.1145\/2517349.2522727"},{"key":"e_1_3_2_1_77_1","doi-asserted-by":"publisher","DOI":"10.1145\/3180155.3180194"},{"key":"e_1_3_2_1_78_1","doi-asserted-by":"publisher","DOI":"10.1145\/2043556.2043572"},{"key":"e_1_3_2_1_79_1","doi-asserted-by":"publisher","DOI":"10.1145\/2025113.2025121"},{"key":"e_1_3_2_1_80_1","first-page":"249","volume-title":"Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI'14)","author":"Yuan Ding","year":"2014","unstructured":"Ding Yuan , Yu Luo , Xin Zhuang , Guilherme Renna Rodrigues , Xu Zhao , Yongle Zhang , Pranay U Jain , and Michael Stumm . Simple testing can prevent most critical failures: An analysis of production failures in distributed data-intensive systems . In Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI'14) , pages 249 -- 265 , 2014 . Ding Yuan, Yu Luo, Xin Zhuang, Guilherme Renna Rodrigues, Xu Zhao, Yongle Zhang, Pranay U Jain, and Michael Stumm. Simple testing can prevent most critical failures: An analysis of production failures in distributed data-intensive systems. In Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI'14), pages 249--265, 2014."},{"key":"e_1_3_2_1_81_1","volume-title":"The Fuzzing Book","author":"Zeller Andreas","year":"2021","unstructured":"Andreas Zeller , Rahul Gopinath , Marcel B\u00f6hme , Gordon Fraser , and Christian Holler . The Fuzzing Book . CISPA Helmholtz Center for Information Security , 2021 . 
Andreas Zeller, Rahul Gopinath, Marcel B\u00f6hme, Gordon Fraser, and Christian Holler. The Fuzzing Book. CISPA Helmholtz Center for Information Security, 2021."}],"event":{"name":"SOSP '21: ACM SIGOPS 28th Symposium on Operating Systems Principles","location":"Virtual Event Germany","acronym":"SOSP '21","sponsor":["SIGOPS ACM Special Interest Group on Operating Systems","USENIX Assoc USENIX Assoc"]},"container-title":["Proceedings of the ACM SIGOPS 28th Symposium on Operating Systems Principles"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3477132.3483577","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3477132.3483577","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:49:16Z","timestamp":1750193356000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3477132.3483577"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,10,26]]},"references-count":81,"alternative-id":["10.1145\/3477132.3483577","10.1145\/3477132"],"URL":"https:\/\/doi.org\/10.1145\/3477132.3483577","relation":{},"subject":[],"published":{"date-parts":[[2021,10,26]]},"assertion":[{"value":"2021-10-26","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}