{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,1,27]],"date-time":"2025-01-27T03:40:07Z","timestamp":1737949207492,"version":"3.33.0"},"reference-count":43,"publisher":"SAGE Publications","issue":"1","license":[{"start":{"date-parts":[[2014,12,9]],"date-time":"2014-12-09T00:00:00Z","timestamp":1418083200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["The International Journal of High Performance Computing Applications"],"published-print":{"date-parts":[[2015,2]]},"abstract":"<jats:p> Building the next generation of extreme-scale distributed systems will require overcoming several challenges related to system resilience. As the number of processors in these systems grows, the failure rate increases proportionally. One of the most common sources of failure in large-scale systems is memory. In this paper, we propose a novel runtime for transparently exploiting memory content similarity to improve system resilience by reducing the rate at which memory errors lead to node failure. We evaluate the viability of this approach by examining memory snapshots collected from eight high-performance computing (HPC) applications and two important HPC operating systems. Based on the characteristics of the similarity uncovered, we conclude that our proposed approach shows promise for addressing system resilience in large-scale systems. 
<\/jats:p>","DOI":"10.1177\/1094342014560354","type":"journal-article","created":{"date-parts":[[2014,12,10]],"date-time":"2014-12-10T03:49:14Z","timestamp":1418183354000},"page":"5-20","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":1,"title":["A study of the viability of exploiting memory content similarity to improve resilience to memory errors"],"prefix":"10.1177","volume":"29","author":[{"given":"Scott","family":"Levy","sequence":"first","affiliation":[{"name":"Department of Computer Science, University of New Mexico, USA"}]},{"given":"Kurt B","family":"Ferreira","sequence":"additional","affiliation":[{"name":"Sandia National Laboratories, USA"}]},{"given":"Patrick G","family":"Bridges","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of New Mexico, USA"}]},{"given":"Aidan P","family":"Thompson","sequence":"additional","affiliation":[{"name":"Sandia National Laboratories, USA"}]},{"given":"Christian","family":"Trott","sequence":"additional","affiliation":[{"name":"Sandia National Laboratories, USA"}]}],"member":"179","published-online":{"date-parts":[[2014,12,9]]},"reference":[{"key":"bibr1-1094342014560354","first-page":"19","volume-title":"Proceedings of the Linux Symposium","author":"Arcangeli A","year":"2009"},{"key":"bibr2-1094342014560354","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevLett.104.136403"},{"key":"bibr3-1094342014560354","unstructured":"Bergman K, Borkar S, Campbell D, (2008) Exascale computing study: Technology challenges in achieving exascale systems. 
http:\/\/www.science.energy.gov\/ascr\/Research\/CS\/DARPA\/exascale-hardware(2008).pdf."},{"key":"bibr4-1094342014560354","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2011.24"},{"key":"bibr5-1094342014560354","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-29740-3_28"},{"key":"bibr6-1094342014560354","doi-asserted-by":"publisher","DOI":"10.1145\/265924.265930"},{"key":"bibr7-1094342014560354","doi-asserted-by":"publisher","DOI":"10.1109\/DFT.2008.50"},{"key":"bibr8-1094342014560354","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2006.1639333"},{"key":"bibr9-1094342014560354","unstructured":"corbet (2007) The SLUB allocator. LWN.net, 11 April. http:\/\/lwn.net\/Articles\/229984\/."},{"key":"bibr10-1094342014560354","doi-asserted-by":"publisher","DOI":"10.1109\/TDSC.2004.15"},{"key":"bibr11-1094342014560354","doi-asserted-by":"publisher","DOI":"10.1145\/568522.568525"},{"key":"bibr12-1094342014560354","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-24449-0_31"},{"key":"bibr13-1094342014560354","doi-asserted-by":"publisher","DOI":"10.1145\/2063384.2063443"},{"key":"bibr14-1094342014560354","doi-asserted-by":"publisher","DOI":"10.1145\/2318916.2318930"},{"key":"bibr15-1094342014560354","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2011.95"},{"key":"bibr16-1094342014560354","doi-asserted-by":"publisher","DOI":"10.1145\/1831407.1831429"},{"key":"bibr17-1094342014560354","doi-asserted-by":"publisher","DOI":"10.1016\/S0168-9274(01)00115-5"},{"volume-title":"Improving Performance via Mini-applications","year":"2009","author":"Heroux MA","key":"bibr18-1094342014560354"},{"key":"bibr19-1094342014560354","doi-asserted-by":"publisher","DOI":"10.1002\/rsa.3240060207"},{"key":"bibr20-1094342014560354","doi-asserted-by":"publisher","DOI":"10.1145\/2150976.2150989"},{"key":"bibr21-1094342014560354","doi-asserted-by":"publisher","DOI":"10.1145\/62546.62575"},{"volume-title":"Proceedings of Linux Kongress","year":"2010","author":"Kleen 
A","key":"bibr22-1094342014560354"},{"key":"bibr23-1094342014560354","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2010.5470482"},{"key":"bibr24-1094342014560354","unstructured":"Lawrence Livermore National Laboratories (2009) ASC Sequoia Benchmark Codes. https:\/\/asc.llnl.gov\/sequoia\/benchmarks."},{"key":"bibr25-1094342014560354","unstructured":"Lawrence Livermore National Laboratories (2014a) IRS: Implicit Radiation Solver 1.4 Build Notes. https:\/\/asc.llnl.gov\/computing_resources\/purple\/archive\/benchmarks\/irs\/irs.readme.html."},{"key":"bibr26-1094342014560354","unstructured":"Lawrence Livermore National Laboratories (2014b) SAMRAI. https:\/\/computation.llnl.gov\/casc\/SAMRAI\/index.html."},{"key":"bibr27-1094342014560354","unstructured":"Los Alamos National Laboratories (1999) Sweep3d. http:\/\/www.c3.lanl.gov\/pal\/software\/sweep3d\/sweep3d_readme.html."},{"key":"bibr28-1094342014560354","doi-asserted-by":"publisher","DOI":"10.1016\/0734-743X(90)90071-3"},{"key":"bibr29-1094342014560354","unstructured":"Meuer H, Strohmaier E, Dongarra J, Simon H (2013) June 2013 TOP500 Supercomputer Sites. http:\/\/top500.org\/lists\/2013\/06\/."},{"key":"bibr30-1094342014560354","unstructured":"Noyes K (2012) 94 Percent of the World\u2019s Top 500 Supercomputers Run Linux. http:\/\/www.linux.com\/news\/enterprise\/high-performance\/147-high-performance\/666669-94-percent-of-the-worlds-top-500-supercomputers-run-linux-"},{"key":"bibr31-1094342014560354","doi-asserted-by":"publisher","DOI":"10.1006\/jcph.1995.1039"},{"key":"bibr32-1094342014560354","doi-asserted-by":"publisher","DOI":"10.1002\/cpe.v21:6"},{"key":"bibr33-1094342014560354","unstructured":"Sandia National Laboratories (2010) The LAMMPS molecular dynamics simulator. http:\/\/lammps.sandia.gov."},{"key":"bibr34-1094342014560354","unstructured":"Sandia National Laboratories (2012) Kitten lightweight kernel. 
https:\/\/software.sandia.gov\/trac\/kitten."},{"key":"bibr35-1094342014560354","unstructured":"Sandia National Laboratories (2014) Mantevo. http:\/\/software.sandia.gov\/mantevo."},{"key":"bibr36-1094342014560354","unstructured":"Schroeder B, Gibson GA (2006) A large-scale study of failures in high-performance computing systems. In: Proceedings of the International Conference on Dependable Systems and Networks (DSN\u201906). http:\/\/www.pdl.cmu.edu\/PDL-FTP\/stray\/dsn06_abs.html."},{"key":"bibr37-1094342014560354","doi-asserted-by":"publisher","DOI":"10.1016\/j.jcp.2014.12.018"},{"key":"bibr38-1094342014560354","unstructured":"Tuininga A (2006) Cx bsdiff. http:\/\/starship.python.net\/crew\/atuining\/cx_bsdiff\/index.html."},{"key":"bibr39-1094342014560354","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2011.342"},{"key":"bibr40-1094342014560354","doi-asserted-by":"publisher","DOI":"10.1145\/844128.844146"},{"key":"bibr41-1094342014560354","doi-asserted-by":"publisher","DOI":"10.1145\/2287056.2287061"},{"key":"bibr42-1094342014560354","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2010.5470468"},{"key":"bibr43-1094342014560354","first-page":"18:1","volume-title":"Proceedings of the 6th USENIX Conference on File and Storage Technologies (FAST\u201908)","author":"Zhu B","year":"2008"}],"container-title":["The International Journal of High Performance Computing 
Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342014560354","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/1094342014560354","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342014560354","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,1,27]],"date-time":"2025-01-27T02:32:19Z","timestamp":1737945139000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/1094342014560354"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2014,12,9]]},"references-count":43,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2015,2]]}},"alternative-id":["10.1177\/1094342014560354"],"URL":"https:\/\/doi.org\/10.1177\/1094342014560354","relation":{},"ISSN":["1094-3420","1741-2846"],"issn-type":[{"type":"print","value":"1094-3420"},{"type":"electronic","value":"1741-2846"}],"subject":[],"published":{"date-parts":[[2014,12,9]]}}}