{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,11]],"date-time":"2026-03-11T01:37:52Z","timestamp":1773193072233,"version":"3.50.1"},"reference-count":12,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2021,4,11]],"date-time":"2021-04-11T00:00:00Z","timestamp":1618099200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["SIGCOMM Comput. Commun. Rev."],"published-print":{"date-parts":[[2021,4,11]]},"abstract":"<jats:p>Switch failures can hamper access to client services, cause link congestion and blackhole network traffic. In this study, we examine the nature of switch failures in the datacenters of a large commercial cloud provider through the lens of survival theory. We study a cohort of over 180,000 switches with a variety of hardware and software configurations and find that datacenter switches have a 98% likelihood of functioning uninterrupted for over 3 months since deployment in production. However, there is significant heterogeneity in switch survival rates with respect to their hardware and software: the switches of one vendor are twice as likely to fail compared to the others. We attribute the majority of switch failures to hardware impairments and unplanned power losses. 
We find that the in-house switch operating system, SONiC, boosts the survival likelihood of switches in datacenters by 1% by eliminating switch failures caused by software bugs in vendor switch OSes.<\/jats:p>","DOI":"10.1145\/3464994.3464996","type":"journal-article","created":{"date-parts":[[2021,5,10]],"date-time":"2021-05-10T22:10:44Z","timestamp":1620684644000},"page":"2-9","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":25,"title":["Surviving switch failures in cloud datacenters"],"prefix":"10.1145","volume":"51","author":[{"given":"Rachee","family":"Singh","sequence":"first","affiliation":[{"name":"Microsoft"}]},{"given":"Muqeet","family":"Mukhtar","sequence":"additional","affiliation":[{"name":"Microsoft"}]},{"given":"Ashay","family":"Krishna","sequence":"additional","affiliation":[{"name":"Microsoft"}]},{"given":"Aniruddha","family":"Parkhi","sequence":"additional","affiliation":[{"name":"Microsoft"}]},{"given":"Jitendra","family":"Padhye","sequence":"additional","affiliation":[{"name":"Microsoft"}]},{"given":"David","family":"Maltz","sequence":"additional","affiliation":[{"name":"Microsoft"}]}],"member":"320","published-online":{"date-parts":[[2021,5,10]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"Arista Networks. AAA Configuration. https:\/\/www.arista.com\/en\/um-eos\/eos-aaa-configuration. (Accessed on 2020-05-11)."},{"key":"e_1_2_1_2_1","unstructured":"Arista Networks. EOS Central: Does this indicate a possible DRAM issue? https:\/\/eos.arista.com\/forum\/getting-ipt_crcerrpkt-and-jer_int_idr_mmu_ecc_1b_err_int-log-output-does-this-indicate-a-possible-dram-issue-on-bank-b\/. (Accessed on 2020-05-11)."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/2377677.2377760"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1111\/j.2517-6161.1972.tb00899.x"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/2018436.2018477"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1080\/01621459.1958.10501452"},{"key":"e_1_2_1_7_1","volume-title":"Survival analysis","author":"Kleinbaum D. G.","year":"2010","unstructured":"D. G. Kleinbaum and M. Klein. Survival analysis, volume 3. Springer, 2010."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/3278532.3278566"},{"key":"e_1_2_1_9_1","unstructured":"Microsoft Azure. Software for open networking in the cloud. https:\/\/azure.github.io\/SONiC\/. (Accessed on 2020-05-11)."},{"key":"e_1_2_1_10_1","unstructured":"Net-SNMP. SNMP coldStart. http:\/\/net-snmp.sourceforge.net\/docs\/mibs\/snmpMIB.html. (Accessed on 2020-05-11)."},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/2504730.2504737"},{"key":"e_1_2_1_12_1","volume-title":"Jupiter rising: A decade of clos topologies and centralized control in google's datacenter network. ACM SIGCOMM computer communication review, 45(4):183--197","author":"Singh A.","year":"2015","unstructured":"A. Singh, J. Ong, A. Agarwal, G. Anderson, A. Armistead, R. Bannon, S. Boving, G. Desai, B. Felderman, P. Germano, et al. Jupiter rising: A decade of clos topologies and centralized control in google's datacenter network. ACM SIGCOMM computer communication review, 45(4):183--197, 2015."}],"container-title":["ACM SIGCOMM Computer Communication Review"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3464994.3464996","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3464994.3464996","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:18:25Z","timestamp":1750191505000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3464994.3464996"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,4,11]]},"references-count":12,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2021,4,11]]}},"alternative-id":["10.1145\/3464994.3464996"],"URL":"https:\/\/doi.org\/10.1145\/3464994.3464996","relation":{},"ISSN":["0146-4833"],"issn-type":[{"value":"0146-4833","type":"print"}],"subject":[],"published":{"date-parts":[[2021,4,11]]},"assertion":[{"value":"2021-05-10","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}