{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,20]],"date-time":"2026-02-20T07:35:31Z","timestamp":1771572931155,"version":"3.50.1"},"reference-count":24,"publisher":"SAGE Publications","issue":"1","license":[{"start":{"date-parts":[[2016,7,28]],"date-time":"2016-07-28T00:00:00Z","timestamp":1469664000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["The International Journal of High Performance Computing Applications"],"published-print":{"date-parts":[[2017,1]]},"abstract":"<jats:p> Commodity clusters revolutionized high-performance computing when they first appeared two decades ago. As scale and complexity have grown, new challenges in reliability and systemic resilience, energy efficiency and optimization and software complexity have emerged that suggest the need for re-evaluation of current approaches. This paper reviews the state of the art and reflects on some of the challenges likely to be faced when building trans-petascale computing systems, using insights and perspectives drawn from operational experience and community debates. <\/jats:p>","DOI":"10.1177\/1094342015597083","type":"journal-article","created":{"date-parts":[[2015,8,7]],"date-time":"2015-08-07T02:00:39Z","timestamp":1438912839000},"page":"104-113","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":55,"title":["A survey of high-performance computing scaling challenges"],"prefix":"10.1177","volume":"31","author":[{"given":"Al","family":"Geist","sequence":"first","affiliation":[{"name":"Oak Ridge National Laboratory, USA"}]},{"given":"Daniel A","family":"Reed","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Iowa, USA"}]}],"member":"179","published-online":{"date-parts":[[2016,7,28]]},"reference":[{"key":"bibr1-1094342015597083","doi-asserted-by":"publisher","DOI":"10.1109\/40.342018"},{"key":"bibr2-1094342015597083","doi-asserted-by":"publisher","DOI":"10.1109\/MIC.2011.20"},{"key":"bibr3-1094342015597083","volume-title":"How to Build a Beowulf: A Guide to the Implementation and Application of PC Clusters","author":"Becker DJ","year":"1999"},{"key":"bibr4-1094342015597083","unstructured":"Bennett C, Tseitlin A (2012) Chaos Monkey Released Into The Wild Netflix. Netflix. Available at: http:\/\/techblog.netflix.com\/2012\/07\/chaos-monkey-released-into-wild.html"},{"key":"bibr5-1094342015597083","doi-asserted-by":"crossref","unstructured":"Bougeret M, Casanova H, Rabie M, (2011) Checkpointing strategies for parallel jobs. In: SC11.","DOI":"10.1145\/2063384.2063428"},{"key":"bibr6-1094342015597083","doi-asserted-by":"publisher","DOI":"10.1109\/JSSC.1974.1050511"},{"key":"bibr7-1094342015597083","doi-asserted-by":"crossref","unstructured":"Fuller SH, Millett I (eds) (2011) The future of computing performance: Game over or next level? Committee on Sustaining Growth in Computing Performance, National Research Council.","DOI":"10.1109\/MC.2011.15"},{"key":"bibr8-1094342015597083","doi-asserted-by":"publisher","DOI":"10.1177\/1094342009347445"},{"key":"bibr9-1094342015597083","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-45540-X_6"},{"key":"bibr10-1094342015597083","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2008.4536223"},{"key":"bibr11-1094342015597083","volume-title":"Queueing Systems. Volume 1: Theory","author":"Kleinrock K","year":"1975"},{"key":"bibr12-1094342015597083","doi-asserted-by":"publisher","DOI":"10.1145\/1088149.1088179"},{"key":"bibr13-1094342015597083","unstructured":"Meuer H, Strohmaier E, Dongarra J, (2014) Top 500 Supercomputer Sites. Available at: http:\/\/www.top500.org\/"},{"key":"bibr14-1094342015597083","unstructured":"Microsoft Research (2009) The Fourth Paradigm: Data-Intensive Scientific Discovery."},{"key":"bibr15-1094342015597083","unstructured":"Rath J (2010) Blue Waters: Awesome power, awesome efficiency. Data Center Knowledge. Available at: http:\/\/www.datacenterknowledge.com\/archives\/2010\/06\/24\/blue-waters-awesome-power-awesome-efficiency\/b"},{"key":"bibr16-1094342015597083","doi-asserted-by":"publisher","DOI":"10.1145\/2491472.2491475"},{"key":"bibr17-1094342015597083","doi-asserted-by":"publisher","DOI":"10.1109\/DSN.2006.5"},{"key":"bibr18-1094342015597083","doi-asserted-by":"publisher","DOI":"10.1145\/1897816.1897844"},{"key":"bibr19-1094342015597083","doi-asserted-by":"crossref","unstructured":"Sridharan V, Liberty D (2012) A study of DRAM failures in the field. In: SC12.","DOI":"10.1109\/SC.2012.13"},{"key":"bibr20-1094342015597083","doi-asserted-by":"crossref","unstructured":"Sridharan V, Stearley J, DeBardeleben N, (2013) Feng shui of supercomputer memory: positional effects in DRAM and SRAM faults. In: SC13.","DOI":"10.1145\/2503210.2503257"},{"key":"bibr21-1094342015597083","unstructured":"Stoyanov M, Webster C (YEAR) Numerical analysis in the presence of hardware faults: Fixed point methods. SIAM Journal of Scientific Computing."},{"key":"bibr22-1094342015597083","doi-asserted-by":"crossref","unstructured":"Tiwari D, Gupta S, Rogers J, (2015) Understanding GPU errors on large-scale HPC systems and the implications for system design and operation. In: Proceedings of the 21st IEEE international symposium on high-performance computer architecture (HPCA), Burlingame, CA, 7\u201311 February 2015, pp. 331\u2013342.","DOI":"10.1109\/HPCA.2015.7056044"},{"key":"bibr23-1094342015597083","doi-asserted-by":"publisher","DOI":"10.1145\/1552272.1552275"},{"key":"bibr24-1094342015597083","doi-asserted-by":"crossref","first-page":"270","DOI":"10.1109\/HPCA.2013.6522325","volume-title":"2013 IEEE 19th international symposium on high performance computer architecture (HPCA2013)","author":"Xun J","year":"2013"}],"container-title":["The International Journal of High Performance Computing Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342015597083","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/1094342015597083","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342015597083","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,2,28]],"date-time":"2025-02-28T23:39:13Z","timestamp":1740785953000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/1094342015597083"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016,7,28]]},"references-count":24,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2017,1]]}},"alternative-id":["10.1177\/1094342015597083"],"URL":"https:\/\/doi.org\/10.1177\/1094342015597083","relation":{},"ISSN":["1094-3420","1741-2846"],"issn-type":[{"value":"1094-3420","type":"print"},{"value":"1741-2846","type":"electronic"}],"subject":[],"published":{"date-parts":[[2016,7,28]]}}}