{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,1]],"date-time":"2026-06-01T23:21:24Z","timestamp":1780356084425,"version":"3.54.1"},"reference-count":49,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2015,11,20]],"date-time":"2015-11-20T00:00:00Z","timestamp":1447977600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Storage"],"published-print":{"date-parts":[[2015,11,21]]},"abstract":"<jats:p>Modern storage systems orchestrate a group of disks to achieve their performance and reliability goals. Even though such systems are designed to withstand the failure of individual disks, failure of multiple disks poses a unique set of challenges. We empirically investigate disk failure data from a large number of production systems, specifically focusing on the impact of disk failures on RAID storage systems. Our data covers about one million SATA disks from six disk models for periods up to 5 years. We show how observed disk failures weaken the protection provided by RAID. The count of <jats:italic>reallocated sectors<\/jats:italic> correlates strongly with impending failures.<\/jats:p><jats:p>With these findings we designed RAIDS<jats:sc>hield<\/jats:sc>, which consists of two components. First, we have built and evaluated an active defense mechanism that monitors the health of each disk and replaces those that are predicted to fail imminently. This proactive protection has been incorporated into our product and is observed to eliminate 88% of triple disk errors, which are 80% of all RAID failures. 
Second, we have designed and simulated a method of using the joint failure probability to quantify and predict how likely a RAID group is to face multiple simultaneous disk failures, which can identify disks that collectively represent a risk of failure even when no individual disk is flagged in isolation. We find in simulation that RAID-level analysis can effectively identify most vulnerable RAID-6 systems, improving the coverage to 98% of triple errors.<\/jats:p><jats:p>We conclude with discussions of operational considerations in deploying RAIDS<jats:sc>hield<\/jats:sc> more broadly and new directions in the analysis of disk errors. One interesting approach is to combine multiple metrics, allowing the values of different indicators to be used for predictions. Using newer field data that reports an additional metric, <jats:italic>medium errors<\/jats:italic>, we find that the relative efficacy of reallocated sectors and medium errors varies across disk models, offering an additional way to predict failures.<\/jats:p>","DOI":"10.1145\/2820615","type":"journal-article","created":{"date-parts":[[2015,11,23]],"date-time":"2015-11-23T13:20:44Z","timestamp":1448284844000},"page":"1-28","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":75,"title":["RAIDShield"],"prefix":"10.1145","volume":"11","author":[{"given":"Ao","family":"Ma","sequence":"first","affiliation":[{"name":"EMC Corporation, Santa Clara, CA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Rachel","family":"Traylor","sequence":"additional","affiliation":[{"name":"EMC Corporation and University of Texas at Arlington, Santa Clara, CA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Fred","family":"Douglis","sequence":"additional","affiliation":[{"name":"EMC Corporation, Santa Clara, 
CA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Mark","family":"Chamness","sequence":"additional","affiliation":[{"name":"EMC Corporation, Santa Clara, CA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Guanlin","family":"Lu","sequence":"additional","affiliation":[{"name":"EMC Corporation, Santa Clara, CA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Darren","family":"Sawyer","sequence":"additional","affiliation":[{"name":"EMC Corporation, Santa Clara, CA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Surendar","family":"Chandra","sequence":"additional","affiliation":[{"name":"Datrium, Inc., Santa Clara, CA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Windsor","family":"Hsu","sequence":"additional","affiliation":[{"name":"Datrium, Inc."}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2015,11,20]]},"reference":[{"key":"e_1_2_1_1_1","first-page":"117","article-title":"Monitoring hard disks with S.M.A.R.T","volume":"2004","author":"Allen Bruce","year":"2004","journal-title":"Linux Journal"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/264107.264132"},{"key":"e_1_2_1_3_1","volume-title":"Proceedings of the 2014 International Conference on Computing, Networking and Communications (ICNC). IEEE, 907--913","author":"Amer Ahmed"},{"key":"e_1_2_1_4_1","volume-title":"ATA. In Proceedings of the 2nd USENIX Conference on File and Storage Technologies (FAST\u201903)","author":"Anderson Dave","year":"2003"},{"key":"e_1_2_1_5_1","volume-title":"Proceedings of the 8th Workshop on Hot Topics in Operating Systems (HotOS VIII). 
33--38","author":"Remzi"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/1254882.1254917"},{"key":"e_1_2_1_7_1","volume-title":"Proceedings of the 6th USENIX Conference on File and Storage Technologies (FAST\u201908)","author":"Bairavasundaram Lakshmi N."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/TDSC.2004.4"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/1755913.1755926"},{"key":"e_1_2_1_10_1","volume-title":"SNIA Software Developers\u2019s Conference.","author":"Bonwick Jeff","year":"2008"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/176979.176981"},{"key":"e_1_2_1_12_1","volume-title":"Proceedings of the 3rd USENIX Conference on File and Storage Technologies (FAST\u201904)","author":"Corbett Peter","year":"2004"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/1353452.1353453"},{"key":"e_1_2_1_14_1","volume-title":"Proceedings of the 7th USENIX Conference on File and Storage Technologies (FAST\u201909)","author":"Cezary"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/1516046.1516059"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/2577386"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/945445.945450"},{"key":"e_1_2_1_18_1","unstructured":"Garth Gibson. 1992. Redundant Disk Arrays: Reliable Parallel Secondary Storage. Ph.D. Dissertation. University of California Berkeley CA."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/2421648.2421655"},{"key":"e_1_2_1_20_1","unstructured":"Moises Goldszmidt. 2012. Finding soon-to-fail disks in a haystack. In USENIX HotStorage\u201912."},{"key":"e_1_2_1_21_1","unstructured":"Kevin M. Greenan James S. 
Plank and Jay J. Wylie. 2010. Mean time to meaningless: MTTDL Markov models and storage system reliability. In USENIX HotStorage\u201910. 5."},{"key":"e_1_2_1_22_1","volume-title":"Proceedings of the 4th Conference on USENIX Conference on File and Storage Technologies (FAST\u201905)","author":"Hafner James Lee","year":"2005"},{"key":"e_1_2_1_23_1","volume-title":"Proceedings of the 4th Conference on USENIX Conference on File and Storage Technologies (FAST\u201905)","author":"Hafner James Lee"},{"key":"e_1_2_1_24_1","unstructured":"Greg Hamerly and Charles Elkan. 2001. Bayesian approaches to failure prediction for disk drives. In ICML\u201901. 202--209."},{"key":"e_1_2_1_25_1","volume-title":"Proceedings of the USENIX Annual Technical Conference.","author":"Cheng"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/TR.2002.802886"},{"key":"e_1_2_1_27_1","volume-title":"Proceedings of the 4th Conference on USENIX Conference on File and Storage Technologies (FAST\u201905)","author":"Jain Navendu","year":"2005"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.5555\/647828.737176"},{"key":"e_1_2_1_29_1","unstructured":"Hannu H. Kari. 1997. Latent Sector Faults and Reliability of Disk Arrays. Ph.D. Dissertation. 
Helsinki University of Technology Espoo Finland."},{"key":"e_1_2_1_30_1","volume-title":"Proceedings of the 10th USENIX Conference on File and Storage Technologies (FAST\u201912)","author":"Khan O."},{"key":"e_1_2_1_31_1","volume-title":"Proceedings of the 6th USENIX Conference on File and Storage Technologies (FAST\u201908)","author":"Krioukov Andrew"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/1480439.1480444"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.5555\/2750482.2750501"},{"key":"e_1_2_1_34_1","unstructured":"Chris Mellor. 2014. Kryder\u2019s law craps out: Race to UBER-cheap storage is over. The A Register. Retrieved from http:\/\/www.theregister.co.uk\/2014\/11\/10\/kryders_law_of_ever_cheaper_storage_disproven."},{"key":"e_1_2_1_35_1","unstructured":"Joseph F. Murray Gordon F. Hughes and Kenneth Kreutz-Delgado. 2003. Hard drive failure prediction using non-parametric statistical methods. In ICANN\/ICONIP. 4."},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.5555\/1046920.1088699"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/NAS.2010.37"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/50202.50214"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.5555\/1973333.1973342"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.5555\/1267903.1267905"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/2560013"},{"key":"e_1_2_1_42_1","volume-title":"Proceedings of the 11th USENIX Conference on File and Storage Technologies (FAST\u201913)","author":"Plank J. 
S."},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/1095810.1095830"},{"key":"e_1_2_1_44_1","volume-title":"Proceedings of the 5th USENIX Conference on File and Storage Technologies (FAST\u201907)","author":"Schroeder Bianca"},{"key":"e_1_2_1_45_1","doi-asserted-by":"crossref","unstructured":"Thomas J. E. Schwarz Qin Xin Ethan L. Miller Darrell D. E. Long Andy Hospodor and Spencer Ng. 2004. Disk scrubbing in large archival storage systems. In IEEE MASCOTS\u201904. IEEE Computer Society 409--418.","DOI":"10.1109\/MASCOT.2004.1348296"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/MSST.2010.5496972"},{"key":"e_1_2_1_47_1","volume-title":"Proceedings of the IEEE Workshop on Fault Tolerance in Parallel and Distributed Systems.","author":"Talagala Nisha","year":"1999"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.5555\/2208461.2208465"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1038\/scientificamerican0805-32"}],"container-title":["ACM Transactions on Storage"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2820615","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2820615","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T05:48:13Z","timestamp":1750225693000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2820615"}},"subtitle":["Characterizing, Monitoring, and Proactively Protecting Against Disk 
Failures"],"short-title":[],"issued":{"date-parts":[[2015,11,20]]},"references-count":49,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2015,11,21]]}},"alternative-id":["10.1145\/2820615"],"URL":"https:\/\/doi.org\/10.1145\/2820615","relation":{},"ISSN":["1553-3077","1553-3093"],"issn-type":[{"value":"1553-3077","type":"print"},{"value":"1553-3093","type":"electronic"}],"subject":[],"published":{"date-parts":[[2015,11,20]]},"assertion":[{"value":"2015-08-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2015-08-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2015-11-20","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}