{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,27]],"date-time":"2025-12-27T07:33:35Z","timestamp":1766820815800,"version":"3.37.0"},"reference-count":20,"publisher":"Association for Computing Machinery (ACM)","issue":"3","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Storage"],"published-print":{"date-parts":[[2008,11]]},"abstract":"<jats:p>Building reliable storage systems becomes increasingly challenging as the complexity of modern storage systems continues to grow. Understanding storage failure characteristics is crucially important for designing and building a reliable storage system. While several recent studies have been conducted on understanding storage failures, almost all of them focus on the failure characteristics of one component\u2014disks\u2014and do not study other storage component failures.<\/jats:p><jats:p>This article analyzes the failure characteristics of storage subsystems. More specifically, we analyzed the storage logs collected from about 39,000 storage systems commercially deployed at various customer sites. The dataset covers a period of 44 months and includes about 1,800,000 disks hosted in about 155,000 storage-shelf enclosures. Our study reveals many interesting findings, providing useful guidelines for designing reliable storage systems. Some of our major findings include: (1) In addition to disk failures that contribute to 20--55% of storage subsystem failures, other components such as physical interconnects and protocol stacks also account for a significant percentage of storage subsystem failures. (2) Each individual storage subsystem failure type, and storage subsystem failure as a whole, exhibits strong self-correlations. In addition, these failures exhibit \u201cbursty\u201d patterns. 
(3) Storage subsystems configured with redundant interconnects experience 30--40% lower failure rates than those with a single interconnect. (4) Spanning disks of a RAID group across multiple shelves provides a more resilient solution for storage subsystems than within a single shelf.<\/jats:p>","DOI":"10.1145\/1416944.1416946","type":"journal-article","created":{"date-parts":[[2008,12,4]],"date-time":"2008-12-04T17:19:26Z","timestamp":1228411166000},"page":"1-25","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":75,"title":["Are disks the dominant contributor for storage failures?"],"prefix":"10.1145","volume":"4","author":[{"given":"Weihang","family":"Jiang","sequence":"first","affiliation":[{"name":"University of Illinois at Urbana Champaign, Urbana, IL"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chongfeng","family":"Hu","sequence":"additional","affiliation":[{"name":"University of Illinois at Urbana Champaign, Urbana, IL"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yuanyuan","family":"Zhou","sequence":"additional","affiliation":[{"name":"University of Illinois at Urbana Champaign, Urbana, IL"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Arkady","family":"Kanevsky","sequence":"additional","affiliation":[{"name":"Network Appliance, Inc., Sunnyvale, CA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2008,11,24]]},"reference":[{"key":"e_1_2_1_1_1","first-page":"9","article-title":"Monitoring hard disks with smart","volume":"117","author":"Allen B.","year":"2004","unstructured":"Allen, B. 2004. Monitoring hard disks with smart. Linux J. 117, 9.","journal-title":"Linux J."},{"volume-title":"Proceedings of the 6th USENIX Conference on File and Storage Technologies (FAST)","author":"Bairavasundaram L.
N.","key":"e_1_2_1_2_1","unstructured":"Bairavasundaram, L. N., Goodson, G. R., Schroeder, B., Arpaci-Dusseau, A. C., and Arpaci-Dusseau, R. H. 2008. An Analysis of data corruption in the storage stack. In Proceedings of the 6th USENIX Conference on File and Storage Technologies (FAST), San Jose, CA."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/1269899.1254917"},{"volume-title":"Proceedings of the 3rd USENIX Conference on File and Storage Technologies (FAST), 1--14","author":"Corbett P.","key":"e_1_2_1_5_1","unstructured":"Corbett, P., English, B., Goel, A., Grcanac, T., Kleiman, S., Leong, J., and Sankar, S. 2004. Row-Diagonal parity for double disk failure correction. In Proceedings of the 3rd USENIX Conference on File and Storage Technologies (FAST), 1--14."},{"volume-title":"Proceedings of the IEEE Reliability and Maintainability Symposium, 151--156","author":"Elerath J. G.","key":"e_1_2_1_6_1","unstructured":"Elerath, J. G. and Shah, S. 2004. Server class disk drives: How reliable are they. In Proceedings of the IEEE Reliability and Maintainability Symposium, 151--156."},{"volume-title":"Proceedings of the Reliability and Maintainability Symposium, 608--612","author":"Elerath J. G.","key":"e_1_2_1_7_1","unstructured":"Elerath, J. G. and Shah, S.
2003. Disk drive reliability case study: Dependence upon head fly-height and quantity of heads. In Proceedings of the Reliability and Maintainability Symposium, 608--612."},{"key":"e_1_2_1_8_1","unstructured":"EMC. 2007. EMC symmetrix DMX-4 specification sheet. http:\/\/www.emc.com\/products\/systems\/symmetrix\/symmetri_DMX1000\/pdf\/DMX3000.pdf."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/945445.945450"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/24.58719"},{"volume-title":"Proceedings of the 15th USENIX Conference on System Administration (LISA)","author":"Lancaster L.","key":"e_1_2_1_11_1","unstructured":"Lancaster, L. and Rowe, A. 2001. Measuring real-world data availability. In Proceedings of the 15th USENIX Conference on System Administration (LISA), Berkeley, CA, 93--100."},{"key":"e_1_2_1_12_1","unstructured":"NetApp. 2008. FAS6000 series technical specifications. http:\/\/www.netapp.com\/products\/filer\/fas6000_tech_specs.html."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/50202.50214"},{"volume-title":"Proceedings of the 5th USENIX Conference on File and Storage Technologies (FAST)","author":"Pinheiro E.","key":"e_1_2_1_14_1","unstructured":"Pinheiro, E., Weber, W.-D., and Barroso, L. A. 2007. Failure trends in a large disk drive population.
In Proceedings of the 5th USENIX Conference on File and Storage Technologies (FAST), Berkeley, CA."},{"volume-title":"Elementary Principles of Statistics","author":"Rosander A. C.","key":"e_1_2_1_15_1","unstructured":"Rosander, A. C. 1951. Elementary Principles of Statistics. D. Van Nostrand Company."},{"key":"e_1_2_1_16_1","unstructured":"Schroeder B. and Gibson G. A. 2007. Disk failures in the real world: What does an MTTF of 1 000 000 hours mean to you? In Proceedings of the 5th USENIX Conference on File and Storage Technologies (FAST) Berkeley CA."},{"key":"e_1_2_1_17_1","doi-asserted-by":"crossref","unstructured":"Schulze M. Gibson G. A. Katz R. H. and Patterson D. A. 1989. How reliable is a RAID? In Proceedings of the COMPCON. 118--123.","DOI":"10.1109\/CMPCON.1989.301913"},{"volume-title":"Proceedings of the IEEE Reliability and Maintainability Symposium, 226--231","author":"Shah S.","key":"e_1_2_1_18_1","unstructured":"Shah, S. and Elerath, J. G. 2005. Reliability analysis of disk drive failure mechanisms. In Proceedings of the IEEE Reliability and Maintainability Symposium, 226--231."},{"key":"e_1_2_1_19_1","unstructured":"SNIA. 2008.
Storage Networking Industry Association dictionary. http:\/\/www.snia.org\/education\/dictionary\/."},{"key":"e_1_2_1_20_1","volume-title":"Tech. Rep. UCB\/CSD-99-1042. Electrical Engineering and Computer Science Department","author":"Talagala N.","year":"1999","unstructured":"Talagala, N. and Patterson, D. 1999. An analysis of error behaviour in a large storage system. Tech. Rep. UCB\/CSD-99-1042. Electrical Engineering and Computer Science Department, University of California, Berkeley. February."},{"volume-title":"Proceedings of the Reliability and Maintainability Symposium, 403--409","author":"Yang J.","key":"e_1_2_1_21_1","unstructured":"Yang, J. and Sun, F.-B. 1999. A comprehensive review of hard-disk drive reliability.
In Proceedings of the Reliability and Maintainability Symposium, 403--409."}],"container-title":["ACM Transactions on Storage"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1416944.1416946","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,2,5]],"date-time":"2025-02-05T12:06:20Z","timestamp":1738757180000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1416944.1416946"}},"subtitle":["A comprehensive study of storage subsystem failure characteristics"],"short-title":[],"issued":{"date-parts":[[2008,11]]},"references-count":20,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2008,11]]}},"alternative-id":["10.1145\/1416944.1416946"],"URL":"https:\/\/doi.org\/10.1145\/1416944.1416946","relation":{},"ISSN":["1553-3077","1553-3093"],"issn-type":[{"type":"print","value":"1553-3077"},{"type":"electronic","value":"1553-3093"}],"subject":[],"published":{"date-parts":[[2008,11]]},"assertion":[{"value":"2008-02-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2008-08-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2008-11-24","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}