{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,5]],"date-time":"2025-11-05T10:29:36Z","timestamp":1762338576304,"version":"build-2065373602"},"reference-count":54,"publisher":"MDPI AG","issue":"7","license":[{"start":{"date-parts":[[2024,7,19]],"date-time":"2024-07-19T00:00:00Z","timestamp":1721347200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Computation"],"abstract":"<jats:p>Moving high-performance computing (HPC) applications from HPC clusters to cloud computing clusters, also known as the HPC cloud, has recently been proposed by the HPC research community. Migrating these applications from the former environment to the latter can have an important impact on their performance, due to the different technologies used and the suboptimal use and configuration of cloud resources such as heterogeneous storage. Probabilistic models can be applied to predict the performance of these applications and to optimise them for the new system. Modelling the performance in the HPC cloud of applications that use heterogeneous storage is a difficult task, due to the variations in performance. This paper presents a novel model based on Extreme Value Theory (EVT) for the analysis, characterisation and prediction of the performance of HPC applications that use heterogeneous storage technologies in the cloud and high-performance distributed parallel file systems. Unlike standard approaches, our model focuses on extreme values, capturing the true variability and potential bottlenecks in storage performance. 
Our model is validated using return level analysis to study the performance of representative scientific benchmarks running on heterogeneous cloud storage at a large scale and gives prediction errors of less than 7%.<\/jats:p>","DOI":"10.3390\/computation12070150","type":"journal-article","created":{"date-parts":[[2024,7,19]],"date-time":"2024-07-19T14:16:38Z","timestamp":1721398598000},"page":"150","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["Modelling the Impact of Cloud Storage Heterogeneity on HPC Application Performance"],"prefix":"10.3390","volume":"12","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2673-3507","authenticated-orcid":false,"given":"Jack","family":"Marquez","sequence":"first","affiliation":[{"name":"Faculty of Engineering, Universidad Autonoma de Occidente, Cali 760030, Colombia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5772-6545","authenticated-orcid":false,"given":"Oscar H.","family":"Mondragon","sequence":"additional","affiliation":[{"name":"Faculty of Engineering, Universidad Autonoma de Occidente, Cali 760030, Colombia"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2024,7,19]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Neuwirth, S., and Paul, A.K. (2021, January 7\u201310). Parallel i\/o evaluation techniques and emerging hpc workloads: A perspective. Proceedings of the 2021 IEEE International Conference on Cluster Computing (CLUSTER), Portland, OR, USA.","DOI":"10.1109\/Cluster48925.2021.00100"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Mell, P., and Grance, T. (2011). 
The NIST Definition of Cloud Computing, NIST.","DOI":"10.6028\/NIST.SP.800-145"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3150224","article-title":"HPC Cloud for Scientific and Business Applications: Taxonomy, Vision, and Research Challenges","volume":"51","author":"Netto","year":"2018","journal-title":"ACM Comput. Surv."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Borin, E., Drummond, L.M.A., Gaudiot, J.L., Melo, A., Alves, M.M., and Navaux, P.O.A. (2023). High Performance Computing in Clouds: Moving HPC Applications to a Scalable and Cost-Effective Environment, Springer Nature.","DOI":"10.1007\/978-3-031-29769-4"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"2273","DOI":"10.1007\/s10586-023-04060-4","article-title":"Cloud benchmarking and performance analysis of an HPC application in Amazon EC2","volume":"27","author":"Dancheva","year":"2024","journal-title":"Clust. Comput."},{"key":"ref_6","first-page":"65","article-title":"Information communication & computation technology (ICCT) as a strategic tool for industry sectors","volume":"3","author":"Aithal","year":"2019","journal-title":"Int. J. Appl. Eng. Manag. Lett. (IJAEML)"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"45","DOI":"10.22456\/2175-2745.106794","article-title":"Cloud infrastructure for HPC investment analysis","volume":"27","author":"Cavalheiro","year":"2020","journal-title":"Rev. Inform\u00e1tica Te\u00f3rica E Apl."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"80","DOI":"10.1016\/j.jpdc.2020.02.001","article-title":"How fast can one resize a distributed file system?","volume":"140","author":"Cheriere","year":"2020","journal-title":"J. Parallel Distrib. Comput."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Subramanyam, R. (2015, January 21\u201325). HDFS Heterogeneous Storage Resource Management Based on Data Temperature. 
Proceedings of the 2015 International Conference on Cloud and Autonomic Computing, Boston, MA, USA.","DOI":"10.1109\/ICCAC.2015.33"},{"key":"ref_10","unstructured":"Braam, P. (2019). The Lustre storage architecture. arXiv."},{"key":"ref_11","unstructured":"Heichler, J. (2024, April 01). An introduction to BeeGFS. Available online: http:\/\/www.beegfs.de\/docs\/whitepapers\/Introduction_to_BeeGFS_by_ThinkParQ.pdf."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Souza Filho, P., Felipe, L., Arag\u00e4o, P., Bejarano, L., de Paula, D.T., Sardinha, A., Azambuja, A., and Sierra, F. (2020, January 8\u201311). Large Scale Seismic Processing in Public Cloud. Proceedings of the 82nd EAGE Annual Conference & Exhibition, Amsterdam, The Netherlands.","DOI":"10.3997\/2214-4609.202011916"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Rao, M.V. (2020). Data duplication using Amazon Web Services cloud storage. Data Deduplication Approaches: Concepts, Strategies, and Challenges, Academic Press.","DOI":"10.1016\/B978-0-12-823395-5.00006-9"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Chakraborty, M., and Kundan, A.P. (2021). Grafana. Monitoring Cloud-Native Applications: Lead Agile Operations Confidently Using Open Source Software, Springer.","DOI":"10.1007\/978-1-4842-6888-9"},{"key":"ref_15","unstructured":"Haan, L., and Ferreira, A. (2006). Extreme Value Theory: An Introduction, Springer."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"103135","DOI":"10.1016\/j.micpro.2020.103135","article-title":"Probabilistic-WCET reliability: Statistical testing of EVT hypotheses","volume":"77","author":"Reghenzani","year":"2020","journal-title":"Microprocess. 
Microsyst."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"569","DOI":"10.4236\/jmf.2020.104034","article-title":"Forecasting value-at-risk of financial markets under the global pandemic of COVID-19 using conditional extreme value theory","volume":"10","author":"Omar","year":"2020","journal-title":"J. Math. Financ."},{"key":"ref_18","unstructured":"Embrechts, P., Kl\u00fcppelberg, C., and Mikosch, T. (2013). Modelling Extremal Events: For Insurance and Finance, Springer Science & Business Media."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Coles, S., Bawa, J., Trenner, L., and Dorazio, P. (2001). An Introduction to Statistical Modeling of Extreme Values, Springer.","DOI":"10.1007\/978-1-4471-3675-0"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"281","DOI":"10.1016\/j.trc.2018.03.011","article-title":"A combined use of microscopic traffic simulation and extreme value methods for traffic safety evaluation","volume":"90","author":"Wang","year":"2018","journal-title":"Transp. Res. Part C Emerg. Technol."},{"key":"ref_21","unstructured":"Beirlant, J., Goegebeur, Y., Segers, J., and Teugels, J.L. (2006). Statistics of Extremes: Theory and Applications, John Wiley & Sons."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"467","DOI":"10.1061\/(ASCE)0733-9496(1985)111:4(467)","article-title":"Application of extreme value theory to flood damage","volume":"111","author":"Ouellette","year":"1985","journal-title":"J. Water Resour. Plan. Manag."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"e2021WR030506","DOI":"10.1029\/2021WR030506","article-title":"Understanding heavy tails of flood peak distributions","volume":"58","author":"Merz","year":"2022","journal-title":"Water Resour. 
Res."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"125932","DOI":"10.1016\/j.jhydrol.2020.125932","article-title":"Extreme value analysis dilemma for climate change impact assessment on global flood and extreme precipitation","volume":"593","author":"Tabari","year":"2021","journal-title":"J. Hydrol."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Haskins, K., Wofford, Q., and Bridges, P.G. (2019, January 23\u201326). Workflows for performance predictable and reproducible hpc applications. Proceedings of the 2019 IEEE International Conference on Cluster Computing (CLUSTER), Albuquerque, NM, USA.","DOI":"10.1109\/CLUSTER.2019.8891043"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Mondragon, O.H., Bridges, P.G., Levy, S., Ferreira, K.B., and Widener, P. (2016, January 13\u201318). Understanding performance interference in next-generation HPC systems. Proceedings of SC\u201916: International Conference for High Performance Computing, Networking, Storage and Analysis, Salt Lake City, UT, USA.","DOI":"10.1109\/SC.2016.32"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Seelam, S., Fong, L., Tantawi, A., Lewars, J., Divirgilio, J., and Gildea, K. (2010, January 19\u201323). Extreme scale computing: Modeling the impact of system noise in multicore clustered systems. 
Proceedings of the 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), Atlanta, GA, USA.","DOI":"10.1109\/IPDPS.2010.5470398"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"180","DOI":"10.1017\/S0305004100015681","article-title":"Limiting forms of the frequency distribution of the largest or smallest member of a sample","volume":"24","author":"Fisher","year":"1928","journal-title":"Mathematical Proceedings of the Cambridge Philosophical Society"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"423","DOI":"10.2307\/1968974","article-title":"Sur La Distribution Limite Du Terme Maximum D\u2019Une S\u00e9rie Al\u00e9atoire","volume":"44","author":"Gnedenko","year":"1943","journal-title":"Ann. Math."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"158","DOI":"10.1002\/qj.49708134804","article-title":"The frequency distribution of the annual maximum (or minimum) values of meteorological elements","volume":"81","author":"Jenkinson","year":"1955","journal-title":"Q. J. R. Meteorol. Soc."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"35","DOI":"10.3905\/jod.2011.18.3.035","article-title":"The generalized extreme value distribution, implied tail index, and option pricing","volume":"18","author":"Markose","year":"2011","journal-title":"J. Deriv."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"247","DOI":"10.1016\/0022-1694(92)90167-T","article-title":"Variance of two-and three-parameter GEV\/PWM quantile estimators: Formulae, confidence intervals, and a comparison","volume":"138","author":"Lu","year":"1992","journal-title":"J. Hydrol."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"43","DOI":"10.1109\/94.485513","article-title":"Maximum likelihood estimation in the 3-parameter Weibull distribution. A look through the generalized extreme-value distribution","volume":"3","author":"Hirose","year":"1996","journal-title":"IEEE Trans. Dielectr. Electr. 
Insul."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"105","DOI":"10.1111\/j.2517-6161.1990.tb01775.x","article-title":"L-moments: Analysis and estimation of distributions using linear combinations of order statistics","volume":"52","author":"Hosking","year":"1990","journal-title":"J. R. Stat. Soc. Ser. B (Methodol.)"},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"27","DOI":"10.1016\/0022-1694(86)90004-1","article-title":"Extreme value theory based on the r largest annual events","volume":"86","author":"Smith","year":"1986","journal-title":"J. Hydrol."},{"key":"ref_36","unstructured":"McNeil, A.J. (1998). Calculating Quantile Risk Measures for Financial Return Series Using Extreme Value Theory, ETH Zurich. Technical Report."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Mehta, N.J., and Yang, F. (2022). Portfolio optimization for extreme risks with maximum diversification: An empirical analysis. Risks, 10.","DOI":"10.3390\/risks10050101"},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"104144","DOI":"10.1016\/j.advwatres.2022.104144","article-title":"Extreme precipitation in China: A review on statistical methods and applications","volume":"163","author":"Gu","year":"2022","journal-title":"Adv. Water Resour."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"106407","DOI":"10.1016\/j.ijfatigue.2021.106407","article-title":"More than 25 years of extreme value statistics for defects: Fundamentals, historical developments, recent applications","volume":"151","author":"Beretta","year":"2021","journal-title":"Int. J. Fatigue"},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"87","DOI":"10.1080\/03610918.2010.530368","article-title":"Minimum sample size determination for generalized extreme value distribution","volume":"40","author":"Cai","year":"2010","journal-title":"Commun. Stat. Comput."},{"key":"ref_41","unstructured":"Henwood, R., Watkins, N.W., Chapman, S.C., and McLay, R. (2018). 
A parallel workload has extreme variability in a production environment. arXiv."},{"key":"ref_42","unstructured":"Duplyakin, D., Ricci, R., Maricq, A., Wong, G., Duerig, J., Eide, E., Stoller, L., Hibler, M., Johnson, D., and Webb, K. (2019, January 10\u201312). The Design and Operation of CloudLab. Proceedings of the 2019 USENIX Annual Technical Conference (ATC 2019), Renton, WA, USA."},{"key":"ref_43","unstructured":"Fragalla, J. (2024, April 01). Configure, Tune, and Benchmark a Lustre FileSystem. In 2014 Oil & Gas HPC Workshop. Available online: http:\/\/rice2014oghpc.blogs.rice.edu\/files\/2014\/03\/Fragalla-Xyratex_Lustre_PerformanceTuning_Fragalla_0314.pdf."},{"key":"ref_44","unstructured":"NORCOTT (2024, April 01). Iozone Filesystem Benchmark. Available online: http:\/\/www.iozone.org\/."},{"key":"ref_45","unstructured":"Conway, A., Bakshi, A., Jiao, Y., Jannen, W., Zhan, Y., Yuan, J., Bender, M.A., Johnson, R., Kuszmaul, B.C., and Porter, D.E. (March, January 27). File systems fated for senescence? nonsense, says science!. Proceedings of the 15th USENIX Conference on File and Storage Technologies (FAST 17), Santa Clara, CA, USA."},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Yu, W., Vetter, J., Canon, R.S., and Jiang, S. (2007, January 14\u201317). Exploiting lustre file joining for effective collective io. Proceedings of the Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid\u201907), Rio de Janeiro, Brazil.","DOI":"10.1109\/CCGRID.2007.51"},{"key":"ref_47","unstructured":"Wong, P., and Der Wijngaart, R. (2003). NAS Parallel Benchmarks I\/O, NASA Ames Research Center. Version 2.4; Tech. Rep. NAS-03-002."},{"key":"ref_48","unstructured":"Oracle (2024, April 06). Lustre 1.6 Operations Manual. Available online: https:\/\/docs.oracle.com\/cd\/E19091-01\/lustre.fs16\/820-3681-11\/820-3681-11.pdf."},{"key":"ref_49","unstructured":"Amaral, J.N. (2024, April 02). About Computing Science Research Methodology. 
Available online: https:\/\/webdocs.cs.ualberta.ca\/~amaral\/courses\/MetodosDePesquisa\/papers\/Amaral-research-methods.pdf."},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Huang, H.H., Li, S., Szalay, A., and Terzis, A. (2011, January 23\u201327). Performance modeling and analysis of flash-based storage devices. Proceedings of the 2011 IEEE 27th Symposium on Mass Storage Systems and Technologies (MSST), Denver, CO, USA.","DOI":"10.1109\/MSST.2011.5937213"},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Dominguez-Trujillo, J., Haskins, K., Khouzani, S.J., Leap, C., Tashakkori, S., Wofford, Q., Estrada, T., Bridges, P.G., and Widener, P.M. Lightweight Measurement and Analysis of HPC Performance Variability. Proceedings of the 2020 IEEE\/ACM Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS).","DOI":"10.1109\/PMBS51919.2020.00011"},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Lima, G., Dias, D., and Barros, E. (2016, January 5\u20138). Extreme value theory for estimating task execution time bounds: A careful look. Proceedings of the 2016 28th Euromicro Conference on Real-Time Systems (ECRTS), Toulouse, France.","DOI":"10.1109\/ECRTS.2016.20"},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"Berezovskyi, K., Santinelli, L., Bletsas, K., and Tovar, E. (2014, January 8\u201310). WCET measurement-based and extreme value theory characterisation of CUDA kernels. Proceedings of the 22nd International Conference on Real-Time Networks and Systems, Versailles, France.","DOI":"10.1145\/2659787.2659827"},{"key":"ref_54","doi-asserted-by":"crossref","first-page":"348","DOI":"10.1504\/IJDATS.2017.088363","article-title":"Execution time distributions in embedded safety-critical systems using extreme value theory","volume":"9","author":"Castillo","year":"2017","journal-title":"Int. J. Data Anal. Tech. 
Strateg."}],"container-title":["Computation"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2079-3197\/12\/7\/150\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T15:19:53Z","timestamp":1760109593000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2079-3197\/12\/7\/150"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,7,19]]},"references-count":54,"journal-issue":{"issue":"7","published-online":{"date-parts":[[2024,7]]}},"alternative-id":["computation12070150"],"URL":"https:\/\/doi.org\/10.3390\/computation12070150","relation":{},"ISSN":["2079-3197"],"issn-type":[{"type":"electronic","value":"2079-3197"}],"subject":[],"published":{"date-parts":[[2024,7,19]]}}}