{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,3,3]],"date-time":"2025-03-03T06:01:17Z","timestamp":1740981677374,"version":"3.38.0"},"reference-count":19,"publisher":"SAGE Publications","issue":"4","license":[{"start":{"date-parts":[[2012,11,8]],"date-time":"2012-11-08T00:00:00Z","timestamp":1352332800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["The International Journal of High Performance Computing Applications"],"published-print":{"date-parts":[[2013,11]]},"abstract":"<jats:p> A high-performance computing (HPC) system, which is composed of a large number of components, is prone to failure. To maximize HPC system utilization, one should understand the failure behavior and the reliability of the system. Studies in the literature show that the time to failure of a node is best described by a Weibull distribution. In this study, we consider, without loss of generality, the Weibull as the distribution of time to failure and develop a reliability model for a system of k nodes where nodes can fail simultaneously. From this model, we develop expressions for the probability of failure of the system at any time t, for the failure rate, and for the mean time to failure. Also, we validate the model by using failure data from the Blue Gene\/L logs obtained from the Lawrence Livermore National Laboratory. Results show that if failures of the components (nodes) in the system possess a degree of dependency, the system becomes less reliable, which means that the failure rate increases and the mean time to failure decreases. Also, an increase in the number of nodes decreases the reliability of the system. <\/jats:p>","DOI":"10.1177\/1094342012464506","type":"journal-article","created":{"date-parts":[[2012,11,9]],"date-time":"2012-11-09T03:49:15Z","timestamp":1352432955000},"page":"474-482","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":11,"title":["Reliability model of a system of <i>k<\/i> nodes with simultaneous failures for high-performance computing applications"],"prefix":"10.1177","volume":"27","author":[{"given":"Thanadech","family":"Thanakornworakij","sequence":"first","affiliation":[{"name":"College of Engineering and Science, Louisiana Tech University, Ruston, LA, USA"}]},{"given":"Raja","family":"Nassar","sequence":"additional","affiliation":[{"name":"College of Engineering and Science, Louisiana Tech University, Ruston, LA, USA"}]},{"given":"Chokchai Box","family":"Leangsuksun","sequence":"additional","affiliation":[{"name":"College of Engineering and Science, Louisiana Tech University, Ruston, LA, USA"}]},{"given":"Mihaela","family":"Paun","sequence":"additional","affiliation":[{"name":"College of Engineering and Science, Louisiana Tech University, Ruston, LA, USA"},{"name":"National Institute of Research and Development for Biological Sciences, Bucharest, Romania"}]}],"member":"179","published-online":{"date-parts":[[2012,11,8]]},"reference":[{"key":"bibr1-1094342012464506","first-page":"189","volume-title":"Proceedings of the 8th IASTED International Conference on Parallel and Distributed Computing and Networks (PDCN)","author":"Engelmann C","year":"2009"},{"key":"bibr2-1094342012464506","first-page":"2006","volume-title":"Proceedings of High Availability and Performance Workshop (HAPCW) 2006, in conjunction with Los Alamos Computer Science Institute (LACSI) Symposium","author":"Gottumukkala NR","year":"2006"},{"volume-title":"Continuous Univariate Distributions","year":"1994","author":"Johnson NL","key":"bibr3-1094342012464506"},{"key":"bibr4-1094342012464506","doi-asserted-by":"publisher","DOI":"10.1109\/TR.2009.2034291"},{"key":"bibr5-1094342012464506","first-page":"193","volume":"11","author":"Hanagal DD","year":"1996","journal-title":"Economic Quality Control"},{"key":"bibr6-1094342012464506","doi-asserted-by":"publisher","DOI":"10.1145\/511334.511362"},{"key":"bibr7-1094342012464506","volume-title":"Introduction to Mathematical Statistics","author":"Hogg RV","year":"2005","edition":"6"},{"key":"bibr8-1094342012464506","doi-asserted-by":"publisher","DOI":"10.1109\/CCGRID.2008.99"},{"key":"bibr9-1094342012464506","doi-asserted-by":"publisher","DOI":"10.1016\/j.csda.2008.11.009"},{"key":"bibr10-1094342012464506","doi-asserted-by":"publisher","DOI":"10.1016\/j.future.2003.12.026"},{"key":"bibr11-1094342012464506","doi-asserted-by":"publisher","DOI":"10.1109\/24.58720"},{"key":"bibr12-1094342012464506","doi-asserted-by":"publisher","DOI":"10.1080\/01621459.1967.10482885"},{"key":"bibr13-1094342012464506","doi-asserted-by":"publisher","DOI":"10.1016\/0026-2714(82)90174-3"},{"key":"bibr14-1094342012464506","doi-asserted-by":"publisher","DOI":"10.1080\/01621459.1976.10480370"},{"key":"bibr15-1094342012464506","doi-asserted-by":"publisher","DOI":"10.1109\/DSN.2004.1311948"},{"key":"bibr16-1094342012464506","doi-asserted-by":"publisher","DOI":"10.1109\/DSN.2006.5"},{"key":"bibr17-1094342012464506","doi-asserted-by":"crossref","first-page":"72","DOI":"10.1093\/forestscience\/55.1.72","volume":"55","author":"Strimbu BM","year":"2009","journal-title":"Forest Science"},{"key":"bibr18-1094342012464506","doi-asserted-by":"publisher","DOI":"10.1145\/1183401.1183433"},{"key":"bibr19-1094342012464506","first-page":"pp","volume-title":"Proceedings of the 1999 Pacific Rim International Symposium on Dependable Computing","author":"Xu J","year":"1999"}],"container-title":["The International Journal of High Performance Computing Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342012464506","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/1094342012464506","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342012464506","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,3]],"date-time":"2025-03-03T02:11:10Z","timestamp":1740967870000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/1094342012464506"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,11,8]]},"references-count":19,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2013,11]]}},"alternative-id":["10.1177\/1094342012464506"],"URL":"https:\/\/doi.org\/10.1177\/1094342012464506","relation":{},"ISSN":["1094-3420","1741-2846"],"issn-type":[{"type":"print","value":"1094-3420"},{"type":"electronic","value":"1741-2846"}],"subject":[],"published":{"date-parts":[[2012,11,8]]}}}