{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,24]],"date-time":"2025-08-24T22:48:35Z","timestamp":1756075715067,"version":"3.41.0"},"reference-count":25,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2016,8,16]],"date-time":"2016-08-16T00:00:00Z","timestamp":1471305600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Storage"],"published-print":{"date-parts":[[2016,8,29]]},"abstract":"<jats:p>\n            Data-intensive applications require extreme scaling of their underlying storage systems. Such scaling, together with the fact that storage systems must be implemented in actual data centers, increases the risk of data loss from failures of underlying components. Accurate engineering requires quantitatively predicting reliability, but this remains challenging due to the need to account for extreme scale, redundancy scheme type and strength, distribution architecture, and component dependencies. This article introduces CQS\n            <jats:sc>im<\/jats:sc>\n            -R, a tool suite for predicting the reliability of large-scale storage system designs and deployments. CQS\n            <jats:sc>im<\/jats:sc>\n            -R includes (a) direct calculations based on an only-drives-fail failure model and (b) an event-based simulator for detailed prediction that handles failures of and failure dependencies among arbitrary (drive or nondrive) components. These are based on a common combinatorial framework for modeling placement strategies. The article demonstrates CQS\n            <jats:sc>im<\/jats:sc>\n            -R using models of common storage systems, including replicated and erasure coded designs. 
New results, such as the poor reliability scaling of spread-placed systems and a quantification of the impact of data center distribution and rack-awareness on reliability, demonstrate the usefulness and generality of the tools. Analysis and empirical studies show the tools\u2019 soundness, performance, and scalability.\n          <\/jats:p>","DOI":"10.1145\/2911987","type":"journal-article","created":{"date-parts":[[2016,8,16]],"date-time":"2016-08-16T12:14:20Z","timestamp":1471349660000},"page":"1-30","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":11,"title":["Tools for Predicting the Reliability of Large-Scale Storage Systems"],"prefix":"10.1145","volume":"12","author":[{"given":"Robert J.","family":"Hall","sequence":"first","affiliation":[{"name":"AT&T Labs Research, Bedminster, NJ"}]}],"member":"320","published-online":{"date-parts":[[2016,8,16]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/24.3761"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/1254882.1254917"},{"key":"e_1_2_1_4_1","volume-title":"Retrieved","author":"LHC.","year":"2016","unstructured":"CERN-LHC. 2016. Computing. Retrieved June 28, 2016, from http:\/\/home.cern\/about\/computing."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/176979.176981"},{"key":"e_1_2_1_6_1","volume-title":"Proceedings of the 2013 Usenix Annual Technical Conference.","author":"Cidon Asaf","year":"2013","unstructured":"Asaf Cidon, Stephen Rumble, Ryan Stutsman, Sachin Katti, John Ousterhout, and Mendel Rosenblum. 2013. 
Copysets: Reducing the frequency of data loss in cloud storage. In Proceedings of the 2013 Usenix Annual Technical Conference."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/DSN.2007.41"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/2577386"},{"key":"e_1_2_1_9_1","volume-title":"Proceedings of the 9th Usenix Symposium on Operating Systems Design and Implementation.","author":"Ford Daniel","year":"2010","unstructured":"Daniel Ford, Fran\u00e7ois Labelle, Florentina I. Popovici, Murray Stokely, Van-Anh Truong, Luiz Barroso, Carrie Grimes, and Sean Quinlan. 2010. Availability in globally distributed storage systems. In Proceedings of the 9th Usenix Symposium on Operating Systems Design and Implementation."},{"key":"e_1_2_1_10_1","volume-title":"Proceedings of the 2nd USENIX Conference on Hot Topics in Storage Systems.","author":"Greenan Kevin","year":"2010","unstructured":"Kevin Greenan, James Plank, and Jay Wylie. 2010. Mean time to meaningless: MTTDL, Markov models, and storage system reliability. In Proceedings of the 2nd USENIX Conference on Hot Topics in Storage Systems."},{"key":"e_1_2_1_11_1","article-title":"Power law distribution: Method of multi-scale inferential statistics","author":"Guerriero V.","year":"2012","unstructured":"V. Guerriero. 2012. 
Power law distribution: Method of multi-scale inferential statistics. Journal of Modern Mathematics Frontier, 21--28.","journal-title":"Journal of Modern Mathematics Frontier, 21--28."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/2700311"},{"volume-title":"Proceedings of the IEEE 29th Symposium on Mass Storage Systems and Technologies.","author":"Kao H.-W.","key":"e_1_2_1_13_1","unstructured":"H.-W. Kao, J.-F. Paris, T. Schwarz, and D. Long. 2013. A flexible simulation tool for estimating data loss risks in storage arrays. In Proceedings of the IEEE 29th Symposium on Mass Storage Systems and Technologies."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.14778\/2536222.2536234"},{"key":"e_1_2_1_15_1","volume-title":"Proceedings of the Workshop on Designing Storage Architectures for Digital Collections","author":"Patiejunas Kestutis","year":"2014","unstructured":"Kestutis Patiejunas. 2014. Freezing exabytes of data at Facebook\u2019s cold storage. In Proceedings of the Workshop on Designing Storage Architectures for Digital Collections 2014."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.5555\/1267903.1267905"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/DSN.2006.61"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/1837915.1837917"},{"volume-title":"Proceedings of the 5th Usenix Conference on File and Storage Technologies.","author":"Schroeder Bianca","key":"e_1_2_1_19_1","unstructured":"Bianca Schroeder and Garth A. Gibson. 2007. 
Disk failures in the real world: What does an MTTF of 1,000,000 hours mean to you? In Proceedings of the 5th Usenix Conference on File and Storage Technologies."},{"key":"e_1_2_1_20_1","volume-title":"Proceedings of the 6th USENIX Conference on File and Storage Technologies.","author":"Storer Mark","year":"2008","unstructured":"Mark Storer, Kevin Greenan, Ethan Miller, and Kaladhar Voruganti. 2008. Pergamum: Replacing tape with energy efficient, reliable, disk-based archival storage. In Proceedings of the 6th USENIX Conference on File and Storage Technologies."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/QEST.2012.32"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/MASCOTS.2011.53"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/MASCOTS.2012.31"},{"key":"e_1_2_1_24_1","volume-title":"Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementations.","author":"Weil Sage A.","year":"2006","unstructured":"Sage A. Weil, Scott A. Brandt, Ethan L. Miller, Darrell D. E. Long, and Carlos Maltzahn. 2006b. Ceph: A scalable, high-performance distributed file system. 
In Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementations."},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/1188455.1188582"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.5555\/824467.825001"}],"container-title":["ACM Transactions on Storage"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2911987","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2911987","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:56:07Z","timestamp":1750222567000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2911987"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016,8,16]]},"references-count":25,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2016,8,29]]}},"alternative-id":["10.1145\/2911987"],"URL":"https:\/\/doi.org\/10.1145\/2911987","relation":{},"ISSN":["1553-3077","1553-3093"],"issn-type":[{"type":"print","value":"1553-3077"},{"type":"electronic","value":"1553-3093"}],"subject":[],"published":{"date-parts":[[2016,8,16]]},"assertion":[{"value":"2015-07-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2016-04-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2016-08-16","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}