{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,10]],"date-time":"2026-02-10T15:59:58Z","timestamp":1770739198212,"version":"3.49.0"},"reference-count":38,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2007,10,1]],"date-time":"2007-10-01T00:00:00Z","timestamp":1191196800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Storage"],"published-print":{"date-parts":[[2007,10]]},"abstract":"<jats:p>Component failure in large-scale IT installations is becoming an ever-larger problem as the number of components in a single cluster approaches a million.<\/jats:p>\n          <jats:p>This article is an extension of our previous study on disk failures [Schroeder and Gibson 2007] and presents and analyzes field-gathered disk replacement data from a number of large production systems, including high-performance computing sites and internet services sites. More than 110,000 disks are covered by this data, some for an entire lifetime of five years. The data includes drives with SCSI and FC, as well as SATA interfaces. The mean time-to-failure (MTTF) of those drives, as specified in their datasheets, ranges from 1,000,000 to 1,500,000 hours, suggesting a nominal annual failure rate of at most 0.88%.<\/jats:p>\n          <jats:p>We find that in the field, annual disk replacement rates typically exceed 1%, with 2--4% common and up to 13% observed on some systems. 
This suggests that field replacement is a fairly different process than one might predict based on datasheet MTTF.<\/jats:p>\n          <jats:p>We also find evidence, based on records of disk replacements in the field, that failure rate is not constant with age, and that rather than a significant infant mortality effect, we see a significant early onset of wear-out degradation. In other words, the replacement rates in our data grew constantly with age, an effect often assumed not to set in until after a nominal lifetime of 5 years.<\/jats:p>\n          <jats:p>Interestingly, we observe little difference in replacement rates between SCSI, FC, and SATA drives, potentially an indication that disk-independent factors such as operating conditions affect replacement rates more than component-specific ones. On the other hand, we see only one instance of a customer rejecting an entire population of disks as a bad batch, in this case because of media error rates, and this instance involved SATA disks.<\/jats:p>\n          <jats:p>Time between replacement, a proxy for time between failure, is not well modeled by an exponential distribution and exhibits significant levels of correlation, including autocorrelation and long-range dependence.<\/jats:p>","DOI":"10.1145\/1288783.1288785","type":"journal-article","created":{"date-parts":[[2007,11,15]],"date-time":"2007-11-15T14:26:02Z","timestamp":1195136762000},"page":"8","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":93,"title":["Understanding disk failure rates"],"prefix":"10.1145","volume":"3","author":[{"given":"Bianca","family":"Schroeder","sequence":"first","affiliation":[{"name":"Carnegie Mellon University, Pittsburgh, PA"}]},{"given":"Garth A.","family":"Gibson","sequence":"additional","affiliation":[{"name":"Carnegie Mellon University, Pittsburgh, 
PA"}]}],"member":"320","published-online":{"date-parts":[[2007,10]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/1254882.1254917"},{"key":"e_1_2_1_2_1","unstructured":"CFDR. 2007. The computer failure data repository. http:\/\/cfdr.usenix.org\/."},{"key":"e_1_2_1_3_1","unstructured":"Cole G. 2000. Estimating drive reliability in desktop computers and consumer electronics systems. TP-338.1. Seagate Technology November."},{"key":"e_1_2_1_4_1","volume-title":"Proceedings of the Conference on File and Storage Technologies (FAST).","author":"Corbett P. F.","unstructured":"Corbett, P. F., English, R., Goel, A., Grcanac, T., Kleiman, S., Leong, J., and Sankar, S. 2004. Row-diagonal parity for double disk failure correction. In Proceedings of the Conference on File and Storage Technologies (FAST)."},{"key":"e_1_2_1_5_1","unstructured":"Drummer D. Khurshudov A. Riedel E. and Watts R. 2006. Personal communication."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/RAMS.2000.816286"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/RAMS.2000.816306"},{"key":"e_1_2_1_8_1","unstructured":"Elerath J. G. and Shah S. 2004. 
Server class drives: How reliable are they? In Proceedings of the Annual Reliability and Maintainability Symposium."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/945445.945450"},{"key":"e_1_2_1_10_1","volume-title":"Redundant disk arrays: Reliable, parallel secondary storage. Dissertation","author":"Gibson G. A.","unstructured":"Gibson, G. A. 1992. Redundant disk arrays: Reliable, parallel secondary storage. Dissertation. MIT Press, New York."},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/24.58719"},{"key":"e_1_2_1_12_1","volume-title":"Proceedings of the 5th Symposium on Reliability in Distributed Software and Database Systems.","author":"Gray J.","year":"1986","unstructured":"Gray, J. 1986. Why do computers stop and what can be done about it. In Proceedings of the 5th Symposium on Reliability in Distributed Software and Database Systems."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/511334.511362"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/6420.6422"},{"key":"e_1_2_1_15_1","volume-title":"Proceedings of the 18th IEEE Symposium on Reliable Distributed Systems.","author":"Kalyanakrishnam M.","unstructured":"Kalyanakrishnam, M., Kalbarczyk, Z., and Iyer, R. 1999. Failure data analysis of a LAN of Windows NT-based computers. In Proceedings of the 18th IEEE Symposium on Reliable Distributed Systems."},{"key":"e_1_2_1_16_1","volume-title":"Selfis: A short tutorial. Tech. 
rep.","author":"Karagiannis T.","year":"2002","unstructured":"Karagiannis, T. 2002. Selfis: A short tutorial. Tech. rep., University of California, Riverside."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/MIC.2004.46"},{"key":"e_1_2_1_18_1","unstructured":"LANL. http:\/\/www.lanl.gov\/projects\/computerscience\/data\/."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/90.282603"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/24.58720"},{"key":"e_1_2_1_21_1","volume-title":"Proceedings of the International Symposium on Fault-Tolerant Computing.","author":"Meyer J.","unstructured":"Meyer, J. and Wei, L. 1988. Analysis of workload influence on dependability. In Proceedings of the International Symposium on Fault-Tolerant Computing."},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1002\/qre.4680110505"},{"key":"e_1_2_1_23_1","unstructured":"NERSC. 2007. Systems disk failure. http:\/\/pdsi.nersc.gov\/all_diskfailure.php."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1007\/11549468_50"},{"key":"e_1_2_1_25_1","unstructured":"Oppenheimer D. L. Ganapathi A. and Patterson D. A. 2003. 
Why do internet services fail and what can be done about it? In USENIX Symposium on Internet Technologies and Systems."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/50202.50214"},{"key":"e_1_2_1_27_1","volume-title":"Proceedings of the Conference on File and Storage Technologies (FAST).","author":"Pinheiro E.","unstructured":"Pinheiro, E., Weber, W. D., and Barroso, L. A. 2007. Failure trends in a large disk drive population. In Proceedings of the Conference on File and Storage Technologies (FAST)."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/1095810.1095830"},{"key":"e_1_2_1_29_1","volume-title":"Introduction to Probability Models","author":"Ross S. M.","unstructured":"Ross, S. M. Introduction to Probability Models. 6th edn. Academic Press.","edition":"6"},{"key":"e_1_2_1_30_1","volume-title":"Proceedings of the International Conference on Dependable Systems and Networks (DSN).","author":"Sahoo R. K.","unstructured":"Sahoo, R. K., Sivasubramaniam, A., Squillante, M. S., and Zhang, Y. 2004. Failure data analysis of a large-scale heterogeneous server environment. In Proceedings of the International Conference on Dependable Systems and Networks (DSN)."},{"key":"e_1_2_1_31_1","unstructured":"Schroeder B. and Gibson G. A. 
2007. Disk failures in the real world: What does an MTTF of 1,000,000 hours mean to you? In Proceedings of the Conference on File and Storage Technologies (FAST)."},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/DSN.2006.5"},{"key":"e_1_2_1_33_1","volume-title":"NASA\/IEEE Conference on Mass Storage Systems and Technologies (MSST) Work in Progress Session.","author":"Schwarz T.","unstructured":"Schwarz, T., Baker, M., Bassi, S., Baumgart, B., Flagg, W., van Ingen, C., Joste, K., Manasse, M., and Shah, M. 2006. Disk failure investigations at the internet archive. In NASA\/IEEE Conference on Mass Storage Systems and Technologies (MSST) Work in Progress Session."},{"key":"e_1_2_1_34_1","volume-title":"The IEEE Workshop on Fault Tolerance in Parallel and Distributed Systems.","author":"Talagala N.","unstructured":"Talagala, N. and Patterson, D. 1999. An analysis of error behaviour in a large storage system. In The IEEE Workshop on Fault Tolerance in Parallel and Distributed Systems."},{"key":"e_1_2_1_35_1","volume-title":"Proceedings of the International Symposium on Fault-tolerant Computing.","author":"Tang D.","unstructured":"Tang, D., Iyer, R. K., and Subramani, S. S. 1990. Failure analysis and modelling of a VAX cluster system. 
In Proceedings of the International Symposium on Fault-tolerant Computing."},{"key":"e_1_2_1_36_1","volume-title":"Tech. Rep. MSR-TR-2005-166, Microsoft Research, December.","author":"van Ingen C.","year":"2005","unstructured":"van Ingen, C. and Gray, J. 2005. Empirical measurements of disk failure rates and error rates. Tech. Rep. MSR-TR-2005-166, Microsoft Research, December."},{"key":"e_1_2_1_37_1","volume-title":"Proceedings of the Pacific Rim International Symposium on Dependable Computing.","author":"Xu J.","unstructured":"Xu, J., Kalbarczyk, Z., and Iyer, R. K. 1999. Networked Windows NT system field failure data analysis. In Proceedings of the Pacific Rim International Symposium on Dependable Computing."},{"key":"e_1_2_1_38_1","volume-title":"Proceedings of the Annual Reliability and Maintainability Symposium.","author":"Yang J.","unstructured":"Yang, J. and Sun, F.-B. 1999. A comprehensive review of hard-disk drive reliability. 
In Proceedings of the Annual Reliability and Maintainability Symposium."}],"container-title":["ACM Transactions on Storage"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1288783.1288785","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1288783.1288785","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T14:58:03Z","timestamp":1750258683000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1288783.1288785"}},"subtitle":["What does an MTTF of 1,000,000 hours mean to you?"],"short-title":[],"issued":{"date-parts":[[2007,10]]},"references-count":38,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2007,10]]}},"alternative-id":["10.1145\/1288783.1288785"],"URL":"https:\/\/doi.org\/10.1145\/1288783.1288785","relation":{},"ISSN":["1553-3077","1553-3093"],"issn-type":[{"value":"1553-3077","type":"print"},{"value":"1553-3093","type":"electronic"}],"subject":[],"published":{"date-parts":[[2007,10]]},"assertion":[{"value":"2007-10-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}