{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T14:05:50Z","timestamp":1773842750722,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":30,"publisher":"ACM","license":[{"start":{"date-parts":[[2019,9,30]],"date-time":"2019-09-30T00:00:00Z","timestamp":1569801600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2019,9,30]]},"DOI":"10.1145\/3357526.3357558","type":"proceedings-article","created":{"date-parts":[[2019,11,6]],"date-time":"2019-11-06T14:25:56Z","timestamp":1573050356000},"page":"69-84","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":14,"title":["DRAM errors in the field"],"prefix":"10.1145","author":[{"given":"Darko","family":"Zivanovic","sequence":"first","affiliation":[{"name":"Barcelona Supercomputing Center, Barcelona, Spain"}]},{"given":"Pouya Esmaili","family":"Dokht","sequence":"additional","affiliation":[{"name":"Barcelona Supercomputing Center"}]},{"given":"Sergi","family":"Mor\u00e9","sequence":"additional","affiliation":[{"name":"Barcelona Supercomputing Center"}]},{"given":"Javier","family":"Bartolome","sequence":"additional","affiliation":[{"name":"Barcelona Supercomputing Center"}]},{"given":"Paul M.","family":"Carpenter","sequence":"additional","affiliation":[{"name":"Barcelona Supercomputing Center"}]},{"given":"Petar","family":"Radojkovi\u0107","sequence":"additional","affiliation":[{"name":"Barcelona Supercomputing Center"}]},{"given":"Eduard","family":"Ayguad\u00e9","sequence":"additional","affiliation":[{"name":"Universitat Polit\u00e8cnica de Catalunya"}]}],"member":"320","published-online":{"date-parts":[[2019,9,30]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/TDSC.2004.2"},{"key":"e_1_3_2_1_2_1","unstructured":"Barcelona Supercomputing Center. 2016. MareNostrum 3 User's Guide.  Barcelona Supercomputing Center. 2016. MareNostrum 3 User's Guide."},{"key":"e_1_3_2_1_3_1","unstructured":"Timothy J. Dell. 1997. A White Paper on the Benefits of Chipkill-Correct ECC for PC Server Main Memory. Technical white paper 4AA4-3490ENW. IBM.  Timothy J. Dell. 1997. A White Paper on the Benefits of Chipkill-Correct ECC for PC Server Main Memory . Technical white paper 4AA4-3490ENW. IBM."},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/3154448.3154451"},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1209\/0295-5075\/81\/48002"},{"key":"e_1_3_2_1_6_1","first-page":"1","article-title":"Failures in Large Scale Systems: Long-term Measurement, Analysis, and Implications. In Proceedings of the International Conference for High Performance Computing","volume":"44","author":"Gupta Saurabh","year":"2017","unstructured":"Saurabh Gupta , Tirthak Patel , Christian Engelmann , and Devesh Tiwari . 2017 . Failures in Large Scale Systems: Long-term Measurement, Analysis, and Implications. In Proceedings of the International Conference for High Performance Computing , Networking, Storage and Analysis. 44 : 1 -- 44 :12. Saurabh Gupta, Tirthak Patel, Christian Engelmann, and Devesh Tiwari. 2017. Failures in Large Scale Systems: Long-term Measurement, Analysis, and Implications. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. 44:1--44:12.","journal-title":"Networking, Storage and Analysis."},{"key":"e_1_3_2_1_7_1","unstructured":"Hewlett Packard Enterprise 2016. HPE ProLiant DL580 Gen9 Server User Guide. Hewlett Packard Enterprise.  Hewlett Packard Enterprise 2016. HPE ProLiant DL580 Gen9 Server User Guide. Hewlett Packard Enterprise."},{"key":"e_1_3_2_1_8_1","unstructured":"HP. 2016. How memory RAS technologies can enhance the uptime of HPE ProLiant servers. Technical white paper 4AA4-3490ENW. Hewlett Packard Enterprise.  HP. 2016. How memory RAS technologies can enhance the uptime of HPE ProLiant servers . Technical white paper 4AA4-3490ENW. Hewlett Packard Enterprise."},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/2150976.2150989"},{"key":"e_1_3_2_1_10_1","unstructured":"IBM 2014. System x iDataPlex dx360 M4 Types 7912 and 7913: Problem Determination and Service Guide. IBM.  IBM 2014. System x iDataPlex dx360 M4 Types 7912 and 7913: Problem Determination and Service Guide . IBM."},{"key":"e_1_3_2_1_11_1","unstructured":"Intel Server Products and Solutions 2017. System Event Log (SEL) Troubleshooting Guide. Intel Server Products and Solutions.  Intel Server Products and Solutions 2017. System Event Log (SEL) Troubleshooting Guide . Intel Server Products and Solutions."},{"key":"e_1_3_2_1_12_1","volume-title":"Wang","author":"Jacob Bruce","year":"2008","unstructured":"Bruce Jacob , Spencer W. NG, and David T . Wang . 2008 . Memory Systems : Cache, DRAM, Disk. Morgan Kaufmann . Bruce Jacob, Spencer W. NG, and David T. Wang. 2008. Memory Systems: Cache, DRAM, Disk. Morgan Kaufmann."},{"key":"e_1_3_2_1_13_1","volume-title":"MCELOG: Memory Error Handling in User Space. In International Linux System Technology Conference (Linux Kongress).","author":"Kleen Andy","year":"2010","unstructured":"Andy Kleen . 2010 . MCELOG: Memory Error Handling in User Space. In International Linux System Technology Conference (Linux Kongress). Andy Kleen. 2010. MCELOG: Memory Error Handling in User Space. In International Linux System Technology Conference (Linux Kongress)."},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2018.00046"},{"key":"e_1_3_2_1_15_1","volume-title":"Proc. of the USENIX Conference on USENIX Annual Technical Conference (USENIXATC). 6--6.","author":"Li Xin","year":"2010","unstructured":"Xin Li , Michael C. Huang , Kai Shen , and Lingkun Chu . 2010 . A Realistic Evaluation of Memory Hardware Errors and Software System Susceptibility . In Proc. of the USENIX Conference on USENIX Annual Technical Conference (USENIXATC). 6--6. Xin Li, Michael C. Huang, Kai Shen, and Lingkun Chu. 2010. A Realistic Evaluation of Memory Hardware Errors and Software System Susceptibility. In Proc. of the USENIX Conference on USENIX Annual Technical Conference (USENIXATC). 6--6."},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/DSN.2014.62"},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/DSN.2015.57"},{"key":"e_1_3_2_1_18_1","volume-title":"2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).","author":"Nie B.","unstructured":"B. Nie , D. Tiwari , S. Gupta , E. Smirni , and J. H. Rogers . 2016. A large-scale study of soft-errors on GPUs in the field . In 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA). B. Nie, D. Tiwari, S. Gupta, E. Smirni, and J. H. Rogers. 2016. A large-scale study of soft-errors on GPUs in the field. In 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA)."},{"key":"e_1_3_2_1_19_1","volume-title":"2018 48th Annual IEEE\/IFIP International Conference on Dependable Systems and Networks (DSN).","author":"Nie B.","unstructured":"B. Nie , J. Xue , S. Gupta , T. Patel , C. Engelmann , E. Smirni , and D. Tiwari . 2018. Machine Learning Models for GPU Error Prediction in a Large Scale HPC System . In 2018 48th Annual IEEE\/IFIP International Conference on Dependable Systems and Networks (DSN). B. Nie, J. Xue, S. Gupta, T. Patel, C. Engelmann, E. Smirni, and D. Tiwari. 2018. Machine Learning Models for GPU Error Prediction in a Large Scale HPC System. In 2018 48th Annual IEEE\/IFIP International Conference on Dependable Systems and Networks (DSN)."},{"key":"e_1_3_2_1_20_1","unstructured":"PRACE. 2019. PRACE Research Infrastructure. http:\/\/www.prace-ri.eu.  PRACE. 2019. PRACE Research Infrastructure. http:\/\/www.prace-ri.eu."},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/TDSC.2009.4"},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/1555349.1555372"},{"key":"e_1_3_2_1_23_1","volume-title":"IEEE Workshop on Silicon Errors in Logic - System Effects (SELSE).","author":"Siddiqua Taniya","year":"2013","unstructured":"Taniya Siddiqua , Athanasios Papathanasiou , Arijit Biswas , , and Sudhanva Gurumurthi . 2013 . Analysis of Memory Errors from Large-Scale Field Data Collection . In IEEE Workshop on Silicon Errors in Logic - System Effects (SELSE). Taniya Siddiqua, Athanasios Papathanasiou, Arijit Biswas, , and Sudhanva Gurumurthi. 2013. Analysis of Memory Errors from Large-Scale Field Data Collection. In IEEE Workshop on Silicon Errors in Logic - System Effects (SELSE)."},{"key":"e_1_3_2_1_24_1","volume-title":"Proceedings of the Royal Society of London. Number v. 58","author":"The Royal Society","year":"1895","unstructured":"The Royal Society . 1895 . Proceedings of the Royal Society of London. Number v. 58 . Taylor & Francis. The Royal Society. 1895. Proceedings of the Royal Society of London. Number v. 58. Taylor & Francis."},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/2694344.2694348"},{"key":"e_1_3_2_1_26_1","first-page":"1","article-title":"A Study of DRAM Failures in the Field. In Proc. of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC)","volume":"76","author":"Sridharan Vilas","year":"2012","unstructured":"Vilas Sridharan and Dean Liberty . 2012 . A Study of DRAM Failures in the Field. In Proc. of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC) . Article 76 , 76: 1 -- 76 :11 pages. Vilas Sridharan and Dean Liberty. 2012. A Study of DRAM Failures in the Field. In Proc. of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC). Article 76, 76:1--76:11 pages.","journal-title":"Article"},{"key":"e_1_3_2_1_27_1","first-page":"1","article-title":"Feng Shui of Supercomputer Memory: Positional Effects in DRAM and SRAM Faults. In Proc. of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC)","volume":"22","author":"Sridharan Vilas","year":"2013","unstructured":"Vilas Sridharan , Jon Stearley , Nathan DeBardeleben , Sean Blanchard , and Sudhanva Gurumurthi . 2013 . Feng Shui of Supercomputer Memory: Positional Effects in DRAM and SRAM Faults. In Proc. of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC) . Article 22 , 22: 1 -- 22 :11 pages. Vilas Sridharan, Jon Stearley, Nathan DeBardeleben, Sean Blanchard, and Sudhanva Gurumurthi. 2013. Feng Shui of Supercomputer Memory: Positional Effects in DRAM and SRAM Faults. In Proc. of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC). Article 22, 22:1--22:11 pages.","journal-title":"Article"},{"key":"e_1_3_2_1_28_1","volume-title":"Assessment of the Effect of Memory Page Retirement on System RAS Against Hardware Faults. In International Conference on Dependable Systems and Networks (DSN'06)","author":"Tang D.","unstructured":"D. Tang , P. Carruthers , Z. Totari , and M. W. Shapiro . 2006 . Assessment of the Effect of Memory Page Retirement on System RAS Against Hardware Faults. In International Conference on Dependable Systems and Networks (DSN'06) . D. Tang, P. Carruthers, Z. Totari, and M. W. Shapiro. 2006. Assessment of the Effect of Memory Page Retirement on System RAS Against Hardware Faults. In International Conference on Dependable Systems and Networks (DSN'06)."},{"key":"e_1_3_2_1_29_1","volume-title":"2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).","author":"Tiwari D.","unstructured":"D. Tiwari , S. Gupta , J. Rogers , D. Maxwell , P. Rech , S. Vazhkudai , D. Oliveira , D. Londo , N. DeBardeleben , P. Navaux , L. Carro , and A. Bland . 2015. Understanding GPU errors on large-scale HPC systems and the implications for system design and operation . In 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA). D. Tiwari, S. Gupta, J. Rogers, D. Maxwell, P. Rech, S. Vazhkudai, D. Oliveira, D. Londo, N. DeBardeleben, P. Navaux, L. Carro, and A. Bland. 2015. Understanding GPU errors on large-scale HPC systems and the implications for system design and operation. In 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA)."},{"key":"e_1_3_2_1_30_1","unstructured":"D. Watts R. Doughty and I. Solovyev. 2018. Lenovo System x3850 X6 and x3950 X6 Planning and Implementation Guide. Lenovo Press.  D. Watts R. Doughty and I. Solovyev. 2018. Lenovo System x3850 X6 and x3950 X6 Planning and Implementation Guide . Lenovo Press."}],"event":{"name":"MEMSYS '19: The International Symposium on Memory Systems","location":"Washington District of Columbia USA","acronym":"MEMSYS '19"},"container-title":["Proceedings of the International Symposium on Memory Systems"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3357526.3357558","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3357526.3357558","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T23:23:22Z","timestamp":1750202602000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3357526.3357558"}},"subtitle":["a statistical approach"],"short-title":[],"issued":{"date-parts":[[2019,9,30]]},"references-count":30,"alternative-id":["10.1145\/3357526.3357558","10.1145\/3357526"],"URL":"https:\/\/doi.org\/10.1145\/3357526.3357558","relation":{},"subject":[],"published":{"date-parts":[[2019,9,30]]},"assertion":[{"value":"2019-09-30","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}