{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,5,28]],"date-time":"2025-05-28T04:10:03Z","timestamp":1748405403770,"version":"3.41.0"},"reference-count":48,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2015,5,28]],"date-time":"2015-05-28T00:00:00Z","timestamp":1432771200000},"content-version":"unspecified","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Cloud Comp"],"published-print":{"date-parts":[[2015,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Cloud computing paradigm has ushered in the need to provide resources to users in a scalable, flexible, and transparent fashion much like any other utility. This has led to a need for developing evaluation techniques that can provide quantitative measures of reliability of a cloud computing system (CCS) for efficient planning and expansion. This paper presents a new, scalable algorithm based on non-sequential Monte Carlo Simulation (MCS) to evaluate large scale cloud computing system (CCS) reliability, and it develops appropriate performance measures. Also, a new iterative algorithm is proposed and developed that leverages the MCS method for the design of highly reliable and highly utilized CCSs. The combination of these two algorithms allows CCSs to be evaluated by providers and users alike, providing a new method for estimating the parameters of service level agreements (SLAs) and designing CCSs to match those contractual requirements posed in SLAs. Results demonstrate that the proposed methods are effective and applicable to systems at a large scale. Multiple insights are also provided into the nature of CCS reliability and CCS design.<\/jats:p>","DOI":"10.1186\/s13677-015-0036-6","type":"journal-article","created":{"date-parts":[[2015,5,27]],"date-time":"2015-05-27T10:52:08Z","timestamp":1432723928000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":16,"title":["Evaluation and design of highly reliable and highly utilized cloud computing systems"],"prefix":"10.1186","volume":"4","author":[{"given":"Brett","family":"Snyder","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jordan","family":"Ringenberg","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Robert","family":"Green","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Vijay","family":"Devabhaktuni","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mansoor","family":"Alam","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2015,5,28]]},"reference":[{"key":"36_CR1","unstructured":"IWGCR: International Working Group on Cloud Computing Resiliency. http:\/\/iwgcr.org\/ (2013)."},{"key":"36_CR2","unstructured":"Gagnaire M, Diaz F, Coti C, Cerin C, Shiozaki K, Xu Y, Delort P, Smets JP, Lous JL, Lubiarz S, Leclerc P (2011) Downtime statistics of current cloud solutions. Technical report, International Working Group on Cloud Computing Resiliency (June 2012) https:\/\/iwgcr.files.wordpress.com\/2012\/06\/iwgcr-paris-ranking-001-en1.pdf"},{"key":"36_CR3","unstructured":"Cerin C, Coti C, Delort P, Diaz F, Gagnaire M, Gaumer Q, Guillaume N, Lous J, Lubiarz S, Raffaelli J, Shiozaki K, Schauer H, Smets J, Seguin L. Downtime statistics of current cloud solutions. Technical report, International Working Group on Cloud Computing Resiliency (June 2013) http:\/\/iwgcr.org\/wp-content\/uploads\/2013\/06\/IWGCR-Paris.Ranking-003.2-en.pdf."},{"key":"36_CR4","unstructured":"Kundra V. Federal cloud computing strategy. Technical report, The United States Government https:\/\/www.dhs.gov\/sites\/default\/files\/publications\/digital-strategy\/federal-cloud-computingstrategy.pdf."},{"key":"36_CR5","volume-title":"The Cloud at Your Service: The When, How, and Why of Enterprise Cloud Computing","author":"J Rosenberg","year":"2011","unstructured":"Rosenberg J, Mateos A (2011) The Cloud at Your Service: The When, How, and Why of Enterprise Cloud Computing. 1st edn. Manning Publications, Greenwich, Connecticut."},{"key":"36_CR6","unstructured":"Izrailevsky Y, Tseitlin A (2011) The Netflix Simian Army. http:\/\/techblog.netflix.com\/2011\/07\/netflix-simian-army.html."},{"key":"36_CR7","unstructured":"IBM\/Google Academic Cloud Computing Initiative. http:\/\/www.cloudbook.net\/directories\/research-clouds\/ibm-google-academic-cloud-computing-initiative (2012)."},{"key":"36_CR8","unstructured":"Cloud Computing. http:\/\/labs.yahoo.com\/nnComputing (2011)."},{"key":"36_CR9","first-page":"87","volume-title":"Systems and Virtualization Management. Standards and the Cloud. Communications in Computer and Information Science, vol 71","author":"CS Chang","year":"2010","unstructured":"Chang CS, Bostjancic D, Williams M (2010) Availability management in a virtualized world. In: Boursas L, Carlson M, Jin H, Sibilla M, Wold K (eds)Systems and Virtualization Management. Standards and the Cloud. Communications in Computer and Information Science, vol 71, 87\u201393.. Springer, Berlin Heidelberg."},{"key":"36_CR10","doi-asserted-by":"publisher","DOI":"10.1002\/9781118393994","volume-title":"Reliability and Availability of Cloud Computing","author":"E Bauer","year":"2012","unstructured":"Bauer E, Adams R (2012) Reliability and Availability of Cloud Computing. 1st edn. Wiley-IEEE Press, Piscataway, New Jersey."},{"issue":"1","key":"36_CR11","doi-asserted-by":"publisher","first-page":"224","DOI":"10.1287\/opre.1120.1133","volume":"61","author":"KB Oner","year":"2013","unstructured":"Oner KB, Scheller-Wolf A, van Houtum G-J (2013) Redundancy optimization for critical components in high-availability technical systems. Oper Res 61(1): 224\u2013264.","journal-title":"Oper Res"},{"key":"36_CR12","doi-asserted-by":"crossref","unstructured":"Kim DS, Machida F, Trivedi KS (2009) Availability modeling and analysis of a virtualized system In: 15th IEEE Pacific Rim International Symposium on Dependable Computing, 365\u2013371, Shanghai, China.","DOI":"10.1109\/PRDC.2009.64"},{"key":"36_CR13","unstructured":"Ghosh R, Trevedi K, Naik V, Kim D (2012) Interacting markov chain based hierarchical approach for cloud services. Technical report, IBM (April 2010) http:\/\/domino.research.ibm.com\/library\/cyberdig.nsf\/papers\/AABCE247ECDECE0F8525771A005D42B6."},{"key":"36_CR14","doi-asserted-by":"crossref","unstructured":"Che J, Zhang T, Lin W, Xi H (2011) A markov chain-based availability model of virtual cluster nodes In: International Conference on Computational Intelligence and Security, 507\u2013511, Hainan, China.","DOI":"10.1109\/CIS.2011.118"},{"key":"36_CR15","doi-asserted-by":"crossref","unstructured":"Zheng J, Okamura H, Dohi T (2012) Component importance analysis of virtualized system In: International Conference on Ubiquitous Intelligence and Computing, 462\u2013469, Fukuoka, Japan.","DOI":"10.1109\/UIC-ATC.2012.128"},{"issue":"PrePrints","key":"36_CR16","first-page":"1","volume":"99","author":"F Longo","year":"2014","unstructured":"Longo F, Trivedi K, Russo S, Ghosh R, Frattini F (2014) Scalable Analytics for IaaS Cloud Availability. IEEE Trans Cloud Comput 99(PrePrints): 1.","journal-title":"IEEE Trans Cloud Comput"},{"issue":"3","key":"36_CR17","doi-asserted-by":"publisher","first-page":"583","DOI":"10.1016\/j.future.2010.12.006","volume":"28","author":"D Zissis","year":"2012","unstructured":"Zissis D, Lekkas D (2012) Addressing cloud computing security issues. Future Generation Comput Syst 28(3): 583\u2013592.","journal-title":"Future Generation Comput Syst"},{"key":"36_CR18","unstructured":"Page SCloud computing-availability. Technical report, ISA\/BIT Learning Centre http:\/\/uwcisa.uwaterloo.ca\/Biblio2\/Topic\/ACC626."},{"issue":"1","key":"36_CR19","first-page":"2","volume":"1","author":"SP Ahuja","year":"2012","unstructured":"Ahuja SP, Mani S (2012) Availability of services in the era of cloud computing. Netw Commun Technol 1(1): 2\u20136.","journal-title":"Netw Commun Technol"},{"key":"36_CR20","doi-asserted-by":"crossref","unstructured":"Wang W, Chen H, Chen X (2012) An availability-aware approach to resource placement of dynamic scaling in cloud In: IEEE Fifth International Conference on Cloud Computing, 930\u2013931, Honolulu, Hawaii.","DOI":"10.1109\/CLOUD.2012.82"},{"issue":"1","key":"36_CR21","doi-asserted-by":"publisher","first-page":"15","DOI":"10.1016\/j.compeleceng.2012.03.005","volume":"39","author":"YS Jeong","year":"2013","unstructured":"Jeong YS, Park JH (2013) High availability and efficient energy consumption for cloud computing service with grid infrastructure. Comput Electrical Eng 39(1): 15\u201323. http:\/\/www.sciencedirect.com\/science\/article\/pii\/S0045790612000456.","journal-title":"Comput Electrical Eng"},{"issue":"4","key":"36_CR22","first-page":"465","volume":"9","author":"RE Manesh","year":"2012","unstructured":"Manesh RE, Jamshidi M, Zareie A, Abdi S, Parseh F, Parandin F (2012) Presentation an approach for useful availability servers cloud computing in schedule list algorithm. Int J Comput Sci Issues 9(4): 465\u2013470.","journal-title":"Int J Comput Sci Issues"},{"key":"36_CR23","doi-asserted-by":"publisher","first-page":"35","DOI":"10.1145\/2342509.2342517","volume-title":"MCC Workshop on Mobile Cloud Computing","author":"A Ferrari","year":"2012","unstructured":"Ferrari A, Puccinelli D, Giordano S (2012) Characterization of the impact of resource availability on opportunistic computing In: MCC Workshop on Mobile Cloud Computing, 35\u201340.. ACM, Helsinki, Finland."},{"issue":"1","key":"36_CR24","first-page":"53","volume":"7","author":"YK Lin","year":"2010","unstructured":"Lin YK, Chang PC (2010) Estimation of maintenance reliability for a cloud computing network. Int J Oper Res 7(1): 53\u201360.","journal-title":"Int J Oper Res"},{"issue":"11","key":"36_CR25","first-page":"14185","volume":"38","author":"YK Lin","year":"2011","unstructured":"Lin YK, Chang PC (2011) Maintenance reliability estimation for a cloud computing network with nodes failure. Expert Syst Appl 38(11): 14185\u201314189.","journal-title":"Expert Syst Appl"},{"issue":"3","key":"36_CR26","first-page":"1","volume":"47","author":"YK Lin","year":"2011","unstructured":"Lin YK, Chang PC (2011) Performance indicator evaluation for a cloud computing system from qos viewpoint. Quality & Quantity 47(3): 1\u201312.","journal-title":"Quality & Quantity"},{"issue":"1","key":"36_CR27","doi-asserted-by":"publisher","first-page":"83","DOI":"10.1002\/sys.20196","volume":"15","author":"YK Lin","year":"2012","unstructured":"Lin YK, Chang PC (2012) Evaluation of system reliability for a cloud computing system with imperfect nodes. Syst Eng 15(1): 83\u201394.","journal-title":"Syst Eng"},{"issue":"2","key":"36_CR28","doi-asserted-by":"publisher","first-page":"543","DOI":"10.1016\/j.ijpe.2012.05.028","volume":"139","author":"YK Lin","year":"2012","unstructured":"Lin YK, Chang PC (2012) Approximate and accurate maintenance reliabilities of a cloud computing network with nodes failure subject to budget. Int J Prod Econ 139(2): 543\u2013550.","journal-title":"Int J Prod Econ"},{"key":"36_CR29","unstructured":"Lin YK, Chang PC (2012) Estimation Method to Evaluate a System Reliability of a Cloud Computing Network. United States Patent Application. http:\/\/www.google.com\/patents\/US20120023372."},{"key":"36_CR30","doi-asserted-by":"crossref","unstructured":"Qian H, Medhi D, Trivedi K (2011) A hierarchical model to evaluate quality of experience of online services hosted by cloud computing In: IFIP\/IEEE International Symposium on Integrated Network Management, 105\u2013112, Dublin, Ireland.","DOI":"10.1109\/INM.2011.5990680"},{"key":"36_CR31","doi-asserted-by":"crossref","unstructured":"Jhawar R, Piuri V, Santambrogio M (2012) A comprehensive conceptual system-level approach to fault tolerance in cloud computing In: IEEE International Systems Conference (SysCon), 1\u20135, Vancouver, BC.","DOI":"10.1109\/SysCon.2012.6189503"},{"key":"36_CR32","doi-asserted-by":"crossref","unstructured":"Jhawar R, Piuri V (2013). Chapter 7 - Fault Tolerance and Resilience in Cloud Computing Environments, In Computer and Information Security Handbook (Second Edition), edited by John R. Vacca, Morgan Kaufmann, Boston, 2013, Pages 125-141, ISBN: 9780123943972, http:\/\/dx.doi.org\/10.1016\/B978-0-12-394397-2.00007-6. (http:\/\/www.sciencedirect.com\/science\/article\/pii\/B9780123943972000076.","DOI":"10.1016\/B978-0-12-394397-2.00007-6"},{"key":"36_CR33","unstructured":"Limrungsi N, Zhao J, Xiang Y, Lan T, Huang HH, Subramaniam S (2013) Providing reliability as an elastic service in cloud computing. Technical report, George Washington University (February 2012) ISBN: 978-1-4577-2052-9."},{"issue":"3","key":"36_CR34","doi-asserted-by":"publisher","first-page":"1","DOI":"10.5815\/ijieeb.2012.03.01","volume":"4","author":"D Singh","year":"2012","unstructured":"Singh D, Singh J, Chhabra A (2012) Failures in cloud computing data centers in 3-tier cloud architecture. Int J Inform Eng Electron Business 4(3): 1\u20138.","journal-title":"Int J Inform Eng Electron Business"},{"key":"36_CR35","doi-asserted-by":"crossref","unstructured":"Zhao W, Melliar-Smith PM, Moser LE (2010) Fault tolerance middleware for cloud computing In: International Conference on Cloud Computing, 1\u20138, Miami, Florida.","DOI":"10.1109\/CLOUD.2010.26"},{"key":"36_CR36","doi-asserted-by":"publisher","first-page":"193","DOI":"10.1145\/1807128.1807161","volume-title":"Proceedings of the 1st ACM Symposium on Cloud Computing. SoCC \u201910","author":"KV Vishwanath","year":"2010","unstructured":"Vishwanath KV, Nagappan N (2010) Characterizing cloud computing hardware reliability In: Proceedings of the 1st ACM Symposium on Cloud Computing. SoCC \u201910, 193\u2013204.. ACM, New York, NY, USA."},{"key":"36_CR37","doi-asserted-by":"crossref","unstructured":"Rashid L, Pattabiraman K, Gopalakrishnan S (2012) Intermittent hardware errors recovery: Modeling and evaluation In: 9th International Conference on Quantitative Evaluation of SysTems. QEST 2012, 1\u201310.","DOI":"10.1109\/QEST.2012.37"},{"key":"36_CR38","doi-asserted-by":"publisher","first-page":"343","DOI":"10.1145\/1966445.1966477","volume-title":"Proceedings of the Sixth Conference on Computer Systems. EuroSys \u201911","author":"EB Nightingale","year":"2011","unstructured":"Nightingale EB, Douceur JR, Orgovan V (2011) Cycles, cells and platters: an empirical analysisof hardware failures on a million consumer pcs In: Proceedings of the Sixth Conference on Computer Systems. EuroSys \u201911, 343\u2013356.. ACM, New York, NY, USA."},{"key":"36_CR39","doi-asserted-by":"crossref","unstructured":"Pham C, Cao P, Kalbarczyk Z, Iyer RK (2012) Toward a high availability cloud: Techniques and challenges In: Second International Workshop on Dependability of Clouds, Data Centers and Virtual Machine Technology, 1\u20136, Boston, Massachusetts.","DOI":"10.1109\/DSNW.2012.6264687"},{"key":"36_CR40","doi-asserted-by":"publisher","first-page":"350","DOI":"10.1145\/2018436.2018477","volume-title":"Proceedings of the ACM SIGCOMM 2011 Conference. SIGCOMM \u201911","author":"P Gill","year":"2011","unstructured":"Gill P, Jain N, Nagappan N (2011) Understanding network failures in data centers: measurement, analysis, and implications In: Proceedings of the ACM SIGCOMM 2011 Conference. SIGCOMM \u201911, 350\u2013361.. ACM, New York, NY, USA."},{"issue":"4","key":"36_CR41","doi-asserted-by":"publisher","first-page":"350","DOI":"10.1145\/2043164.2018477","volume":"41","author":"P Gill","year":"2011","unstructured":"Gill P, Jain N, Nagappan N (2011) Understanding network failures in data centers: measurement, analysis, and implications. SIGCOMM Comput Commun Rev 41(4): 350\u2013361.","journal-title":"SIGCOMM Comput Commun Rev"},{"key":"36_CR42","unstructured":"Birke R, Chen LY, Smirni EData centers in the wild: A large performance study. Technical report, IBM (April 2012) http:\/\/domino.research.ibm.com\/library\/cyberdig.nsf\/papers\/0C306B31CF0D3861852579E40045F17F."},{"key":"36_CR43","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4471-4588-2","volume-title":"The Monte Carlo Simulation Method for System Reliability and Risk Analysiss. Springer Series in Reliability Engineering","author":"E Zio","year":"2013","unstructured":"Zio E (2013) Monte carlo simulation: The method In: The Monte Carlo Simulation Method for System Reliability and Risk Analysiss. Springer Series in Reliability Engineering.. Springer, London."},{"key":"36_CR44","unstructured":"Extreme Science and Engineering Discovery Environment. https:\/\/www.xsede.org\/home."},{"key":"36_CR45","unstructured":"Amazon EC2 Instance Types. http:\/\/aws.amazon.com\/ec2\/instance-types\/."},{"key":"36_CR46","unstructured":"Services AW (2013) Amazon EC2 Service Level Agreement. http:\/\/aws.amazon.com\/ec2-sla."},{"key":"36_CR47","unstructured":"Rackspace: Managed Service Level Agreement. http:\/\/www.rackspace.com\/managed_hosting\/support\/servicelevels\/managedsla\/ (2013)."},{"key":"36_CR48","unstructured":"Harris C (2011) IT Downtime Costs $ 26.5 Billion In Lost Revenue. http:\/\/www.informationweek.com\/storage\/disaster-recovery\/it-downtime-costs-265-billion-in-lost-re\/229625441."}],"container-title":["Journal of Cloud Computing"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s13677-015-0036-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1186\/s13677-015-0036-6\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s13677-015-0036-6","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13677-015-0036-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,5,28]],"date-time":"2025-05-28T03:33:56Z","timestamp":1748403236000},"score":1,"resource":{"primary":{"URL":"https:\/\/journalofcloudcomputing.springeropen.com\/articles\/10.1186\/s13677-015-0036-6"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2015,5,28]]},"references-count":48,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2015,12]]}},"alternative-id":["36"],"URL":"https:\/\/doi.org\/10.1186\/s13677-015-0036-6","relation":{},"ISSN":["2192-113X"],"issn-type":[{"type":"electronic","value":"2192-113X"}],"subject":[],"published":{"date-parts":[[2015,5,28]]},"assertion":[{"value":"9 June 2014","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"11 May 2015","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"28 May 2015","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"11"}}