{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,24]],"date-time":"2026-04-24T15:12:26Z","timestamp":1777043546353,"version":"3.51.4"},"reference-count":22,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2011,2,1]],"date-time":"2011-02-01T00:00:00Z","timestamp":1296518400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Commun. ACM"],"published-print":{"date-parts":[[2011,2]]},"abstract":"<jats:p>Errors in dynamic random access memory (DRAM) are a common form of hardware failure in modern compute clusters. Failures are costly both in terms of hardware replacement costs and service disruption. While a large body of work exists on DRAM in laboratory conditions, little has been reported on real DRAM failures in large production clusters. In this paper, we analyze measurements of memory errors in a large fleet of commodity servers over a period of 2.5 years. The collected data covers multiple vendors, DRAM capacities and technologies, and comprises many millions of dual in-line memory module (DIMM) days.<\/jats:p>\n          <jats:p>The goal of this paper is to answer questions such as the following: How common are memory errors in practice? What are their statistical properties? How are they affected by external factors, such as temperature and utilization, and by chip-specific factors, such as chip density, memory technology, and DIMM age?<\/jats:p>\n          <jats:p>We find that DRAM error behavior in the field differs in many key aspects from commonly held assumptions. 
For example, we observe DRAM error rates that are orders of magnitude higher than previously reported, with 25,000--70,000 errors per billion device hours per Mb and more than 8% of DIMMs affected by errors per year. We provide strong evidence that memory errors are dominated by hard errors, rather than soft errors, which previous work suspects to be the dominant error mode. We find that temperature, known to strongly impact DIMM error rates in lab conditions, has a surprisingly small effect on error behavior in the field, when taking all other factors into account. Finally, unlike commonly feared, we do not observe any indication that newer generations of DIMMs have worse error behavior.<\/jats:p>","DOI":"10.1145\/1897816.1897844","type":"journal-article","created":{"date-parts":[[2011,2,1]],"date-time":"2011-02-01T15:50:21Z","timestamp":1296575421000},"page":"100-107","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":90,"title":["DRAM errors in the wild"],"prefix":"10.1145","volume":"54","author":[{"given":"Bianca","family":"Schroeder","sequence":"first","affiliation":[{"name":"University of Toronto, Toronto, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Eduardo","family":"Pinheiro","sequence":"additional","affiliation":[{"name":"Google Inc., Mountain View, CA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Wolf-Dietrich","family":"Weber","sequence":"additional","affiliation":[{"name":"Google Inc., Mountain View, CA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2011,2]]},"reference":[{"key":"e_1_2_1_1_1","article-title":"adds soft-error protection, correction","author":"Mosys","year":"2002","unstructured":"Mosys adds soft-error protection, correction. Semiconductor Business News (28 Jan.
2002).","journal-title":"Semiconductor Business News"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.5555\/648021.746286"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/MDT.2005.69"},{"key":"e_1_2_1_4_1","volume-title":"Proceedings of 46th Annual International Reliability Physics Symposium","author":"Borucki L.","year":"2008","unstructured":"Borucki, L., Schindlbeck, G., Slayman, C. Comparison of accelerated DRAM soft error rates measured at component and system level. In Proceedings of 46th Annual International Reliability Physics Symposium (2008)."},{"key":"e_1_2_1_5_1","volume-title":"Proceedings of OSDI'06","author":"Chang F.","year":"2006","unstructured":"Chang, F., Dean, J., Ghemawat, S., Hsieh, W.C., Wallach, D.A., Burrows, M., Chandra, T., Fikes, A., Gruber, R.E. Bigtable: A distributed storage system for structured data. In Proceedings of OSDI'06 (2006)."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1147\/rd.282.0124"},{"key":"e_1_2_1_7_1","volume-title":"A white paper on the benefits of chipkill-correct ECC for PC server main memory. IBM Microelectronics","author":"Dell T.J.","year":"1997","unstructured":"Dell, T.J. A white paper on the benefits of chipkill-correct ECC for PC server main memory.
IBM Microelectronics (1997)."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/16.678551"},{"key":"e_1_2_1_9_1","volume-title":"Proceedings of the 4th Annual Conference on Reliability","author":"Johnston A.H.","year":"2000","unstructured":"Johnston, A.H. Scaling and technology issues for soft error rates. In Proceedings of the 4th Annual Conference on Reliability (2000)."},{"key":"e_1_2_1_10_1","volume-title":"Proceedings of USENIX Annual Technical Conference","author":"Li X.","year":"2007","unstructured":"Li, X., Shen, K., Huang, M., Chu, L. A memory soft error measurement on production systems. In Proceedings of USENIX Annual Technical Conference (2007)."},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/T-ED.1979.19370"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2004.119"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/566726.566749"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.5555\/977407.978748"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2005.37"},{"key":"e_1_2_1_16_1","first-page":"43","article-title":"Single event upset at ground level","volume":"6","author":"Normand E","year":"1996","unstructured":"Normand, E. Single event upset at ground level. IEEE Trans. Nucl. Sci. 6, 43 (1996), 2742--2750.","journal-title":"IEEE Trans. Nucl.
Sci."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1147\/rd.401.0041"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/DSN.2006.5"},{"key":"e_1_2_1_19_1","volume-title":"5th USENIX FAST Conference","author":"Schroeder B.","year":"2007","unstructured":"Schroeder, B., Gibson, G.A. Disk failures in the real world: What does an MTTF of 1,000,000 hours mean to you? In 5th USENIX FAST Conference (2007)."},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/1555349.1555372"},{"key":"e_1_2_1_21_1","article-title":"Origin and characteristics of alpha-particle-induced permanent junction leakage","author":"Takeuchi K.","year":"1999","unstructured":"Takeuchi, K., Shimohigashi, K., Kozuka, H., Toyabe, T., Itoh, K., Kurosawa, H. Origin and characteristics of alpha-particle-induced permanent junction leakage. IEEE Trans. Electron Dev. (Mar. 1999).","journal-title":"IEEE Trans.
Electron Dev."},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1126\/science.206.4420.776"}],"container-title":["Communications of the ACM"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1897816.1897844","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1897816.1897844","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T10:52:36Z","timestamp":1750243956000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1897816.1897844"}},"subtitle":["a large-scale field study"],"short-title":[],"issued":{"date-parts":[[2011,2]]},"references-count":22,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2011,2]]}},"alternative-id":["10.1145\/1897816.1897844"],"URL":"https:\/\/doi.org\/10.1145\/1897816.1897844","relation":{},"ISSN":["0001-0782","1557-7317"],"issn-type":[{"value":"0001-0782","type":"print"},{"value":"1557-7317","type":"electronic"}],"subject":[],"published":{"date-parts":[[2011,2]]},"assertion":[{"value":"2011-02-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}