{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,22]],"date-time":"2026-04-22T10:05:56Z","timestamp":1776852356235,"version":"3.51.2"},"reference-count":39,"publisher":"MDPI AG","issue":"15","license":[{"start":{"date-parts":[[2020,7,23]],"date-time":"2020-07-23T00:00:00Z","timestamp":1595462400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Spanish Ministry of Economy and Competitiveness","award":["TEC2017-84197-C4-2-R"],"award-info":[{"award-number":["TEC2017-84197-C4-2-R"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Performance analysis is an essential task in high-performance computing (HPC) systems, and it is applied for different purposes, such as anomaly detection, optimal resource allocation, and budget planning. HPC monitoring tasks generate a huge number of key performance indicators (KPIs) to supervise the status of the jobs running in these systems. KPIs give data about CPU usage, memory usage, network (interface) traffic, or other sensors that monitor the hardware. Analyzing this data, it is possible to obtain insightful information about running jobs, such as their characteristics, performance, and failures. The main contribution in this paper was to identify which metric\/s (KPIs) is\/are the most appropriate to identify\/classify different types of jobs according to their behavior in the HPC system. With this aim, we had applied different clustering techniques (partition and hierarchical clustering algorithms) using a real dataset from the Galician computation center (CESGA). We concluded that (i) those metrics (KPIs) related to the network (interface) traffic monitoring provided the best cohesion and separation to cluster HPC jobs, and (ii) hierarchical clustering algorithms were the most suitable for this task. Our approach was validated using a different real dataset from the same HPC center.<\/jats:p>","DOI":"10.3390\/s20154111","type":"journal-article","created":{"date-parts":[[2020,7,23]],"date-time":"2020-07-23T11:26:01Z","timestamp":1595503561000},"page":"4111","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":9,"title":["Unsupervised KPIs-Based Clustering of Jobs in HPC Data Centers"],"prefix":"10.3390","volume":"20","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0736-554X","authenticated-orcid":false,"given":"Mohamed S.","family":"Halawa","sequence":"first","affiliation":[{"name":"Business Information Systems Department, Arab Academy for Science Technology and Maritime Transport, Cairo 11799, Egypt"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2367-2219","authenticated-orcid":false,"given":"Rebeca P.","family":"D\u00edaz Redondo","sequence":"additional","affiliation":[{"name":"Information &amp; Computing Lab, AtlanTTIC Research Center, Universidade de Vigo, 36310 Vigo, Spain"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1047-2143","authenticated-orcid":false,"given":"Ana","family":"Fern\u00e1ndez Vilas","sequence":"additional","affiliation":[{"name":"Information &amp; Computing Lab, AtlanTTIC Research Center, Universidade de Vigo, 36310 Vigo, Spain"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2020,7,23]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Sorkunlu, N., Chandola, V., and Patra, A. (2017, January 5\u20138). Tracking System Behavior from Resource Usage Data. Proceedings of the 2017 IEEE International Conference on Cluster Computing (CLUSTER), Honolulu, HI, USA.","DOI":"10.1109\/CLUSTER.2017.70"},{"key":"ref_2","unstructured":"Frey, S., Claudia, L., and Reich, C. (2013). Key Performance Indicators for Cloud Computing SLAs. Int. Conf. Emerg. Netw. Intell., 60\u201364."},{"key":"ref_3","first-page":"1","article-title":"Anomaly detection","volume":"14","author":"Prasad","year":"2009","journal-title":"Comput. Mater. Contin."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"289","DOI":"10.1016\/j.ijinfomgt.2018.08.006","article-title":"Real-time big data processing for anomaly detection: A Survey","volume":"45","author":"Habeeb","year":"2019","journal-title":"Int. J. Inf. Manag."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Wang, C., Viswanathan, K., Choudur, L., Talwar, V., Satterfield, W., and Schwan, K. (2011, January 23\u201327). Statistical techniques for online anomaly detection in data centers. Proceedings of the 12th IFIP\/IEEE International Symposium on Integrated Network Management (IM 2011) and Workshops, Dublin, Ireland.","DOI":"10.1109\/INM.2011.5990537"},{"key":"ref_6","first-page":"1","article-title":"Time series k -means: A new k -means type smooth subspace clustering for time series data","volume":"367","author":"Huang","year":"2016","journal-title":"Inf. Sci."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"16","DOI":"10.1016\/j.is.2015.04.007","article-title":"Time-series clustering\u2014A decade review","volume":"53","author":"Aghabozorgi","year":"2015","journal-title":"Inf. Syst."},{"key":"ref_8","unstructured":"Vallis, O., Hochenbaum, J., and Kejariwal, A. (2014). A Novel Technique for Long-Term Anomaly Detection in the Cloud, Twitter Inc."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Peiris, M., Hill, J.H., Thelin, J., Bykov, S., Kliot, G., and K\u00f6nig, C. (July, January 27). PAD: Performance Anomaly Detection in Multi-server Distributed Systems. Proceedings of the 2014 IEEE 7th International Conference on Cloud Computing, Anchorage, AK, USA.","DOI":"10.1109\/CLOUD.2014.107"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"680","DOI":"10.1007\/978-3-030-14118-9_67","article-title":"Supervised Performance Anomaly Detection in HPC Data Centers","volume":"Volume 921","author":"Halawa","year":"2019","journal-title":"Proceedings of the Advances in Intelligent Systems and Computing"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"70","DOI":"10.1145\/2627534.2627557","article-title":"Big data classification: Problems and challenges in network intrusion prediction with machine learning","volume":"41","author":"Suthaharan","year":"2014","journal-title":"Perform. Eval. Rev."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"558","DOI":"10.1016\/j.ins.2017.08.065","article-title":"Unsupervised clustering of service performance behaviors","volume":"422","author":"Yahyaoui","year":"2018","journal-title":"Inf. Sci."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"12015","DOI":"10.1088\/1742-6596\/1235\/1\/012015","article-title":"Enhancement Clustering Evaluation Result of Davies-Bouldin Index with Determining Initial Centroid of K-Means Algorithm","volume":"1235","author":"Sitompul","year":"2019","journal-title":"J. Physics Conf. Ser."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"24","DOI":"10.1016\/j.procs.2016.03.005","article-title":"Penalty Parameter Selection for Hierarchical Data Stream Clustering","volume":"79","author":"Bhagat","year":"2016","journal-title":"Procedia Comput. Sci."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"8","DOI":"10.20982\/tqmp.11.1.p008","article-title":"Hierarchical Cluster Analysis: Comparison of Three Linkage Measures and Application to Psychological Data","volume":"11","author":"Yim","year":"2015","journal-title":"Quant. Methods Psychol."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.eswa.2019.06.064","article-title":"Combining hierarchical clustering approaches using the PCA method","volume":"137","author":"Jafarzadegan","year":"2019","journal-title":"Expert Syst. Appl."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Ma, R., Angryk, R., and Riley, P. (2016, January 5\u20138). A data-driven analysis of interplanetary coronal mass ejecta and magnetic flux ropes. Proceedings of the 2016 IEEE International Conference on Big Data (Big Data), Washington, DC, USA.","DOI":"10.1109\/BigData.2016.7840973"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"e2227","DOI":"10.1002\/smr.2227","article-title":"Automatically identifying valid API versions for software development tutorials on the Web","volume":"32","author":"Nishi","year":"2020","journal-title":"J. Softw. Evol. Process."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"356","DOI":"10.1007\/978-981-10-7398-4_37","article-title":"Comparison of Similarity Measures in Collaborative Filtering Algorithm","volume":"Volume 464","author":"Wang","year":"2018","journal-title":"Proceedings of the Lecture Notes in Electrical Engineering"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"335","DOI":"10.1007\/s10618-005-0039-x","article-title":"Characteristic-Based Clustering for Time Series Data","volume":"13","author":"Wang","year":"2006","journal-title":"Data Min. Knowl. Discov."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Liu, Y., Li, Z., Xiong, H., Gao, X., and Wu, J. (2010, January 13\u201317). Understanding of Internal Clustering Validation Measures. Proceedings of the 2010 IEEE International Conference on Data Mining, Sydney, NSW, Australia.","DOI":"10.1109\/ICDM.2010.35"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"53","DOI":"10.1016\/0377-0427(87)90125-7","article-title":"Silhouettes: A graphical aid to the interpretation and validation of cluster analysis","volume":"20","author":"Rousseeuw","year":"1987","journal-title":"J. Comput. Appl. Math."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Davies, D.L., and Bouldin, D.W. (1979). A Cluster Separation Measure. IEEE Trans. Pattern Anal. Mach. Intell., 224\u2013227.","DOI":"10.1109\/TPAMI.1979.4766909"},{"key":"ref_24","unstructured":"Dalmaijer, E.S., Nord, C.L., and Astle, D.E. (2003). Statistical Power for Cluster Analysis. arXiv."},{"key":"ref_25","unstructured":"Van Der Maaten, L., Postma, E., and Van Den Herik, J. (2009). Dimensionality Reduction: A Comparative Review, Tilburg Centre for Creative Computing."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"303","DOI":"10.1038\/nbt0308-303","article-title":"What is principal component analysis?","volume":"26","year":"2008","journal-title":"Nat. Biotechnol."},{"key":"ref_27","first-page":"2579","article-title":"Visualizing data using t-SNE","volume":"9","author":"Hinton","year":"2008","journal-title":"J. Mach. Learn. Res."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"McInnes, L., and Healy, J. (2018). UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. arXiv.","DOI":"10.21105\/joss.00861"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Ding, C., and He, X. (2004, January 22\u201324). Principal Component Analysis and Effective K-means Clustering. Proceedings of the 2004 SIAM International Conference on Data Mining; Society for Industrial & Applied Mathematics (SIAM), Lake Buena Vista, FL, USA.","DOI":"10.1137\/1.9781611972740.54"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Tajunisha, N., and Saravanan, V. (2010, January 5\u20137). An Increased Performance of Clustering High Dimensional Data Using Principal Component Analysis. Proceedings of the 2010 First International Conference on Integrated Intelligent Computing, Bangalore, India.","DOI":"10.1109\/ICIIC.2010.31"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Cao, D., Tian, Y., and Bai, D. (2015). Time Series Clustering Method Based on Principal Component Analysis, Atlantis Press.","DOI":"10.2991\/icimm-15.2015.163"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"239","DOI":"10.1016\/j.neucom.2019.03.060","article-title":"Multivariate time series clustering based on common principal component analysis","volume":"349","author":"Li","year":"2019","journal-title":"Neurocomputing"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"166","DOI":"10.1016\/j.future.2018.06.031","article-title":"CANF: Clustering and anomaly detection method using nearest and farthest neighbor","volume":"89","author":"Faroughi","year":"2018","journal-title":"Futur. Gener. Comput. Syst."},{"key":"ref_34","first-page":"17","article-title":"Toward Cloud Computing: Security and Performance","volume":"5","author":"Zanoon","year":"2015","journal-title":"Int. J. Cloud Comput. Serv. Arch."},{"key":"ref_35","first-page":"355","article-title":"Diagnosing Performance Variations in HPC Applications Using Machine Learning","volume":"Volume 10266","author":"Tuncer","year":"2017","journal-title":"Proceedings of the Intelligent Tutoring Systems"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Li, Z., Zhao, Y., Liu, R., and Pei, D. (2018, January 4\u20136). Robust and Rapid Clustering of KPIs for Large-Scale Anomaly Detection. Proceedings of the 2018 IEEE\/ACM 26th International Symposium on Quality of Service (IWQoS), Banff, AB, Canada.","DOI":"10.1109\/IWQoS.2018.8624168"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Mariani, L., Monni, C., Pezze, M., Riganelli, O., and Xin, R. (2018, January 9\u201313). Localizing Faults in Cloud Systems. Proceedings of the 2018 IEEE 11th International Conference on Software Testing, Verification and Validation (ICST), Vasteras, Sweden.","DOI":"10.1109\/ICST.2018.00034"},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"261","DOI":"10.1038\/s41592-019-0686-2","article-title":"SciPy 1.0: Fundamental algorithms for scientific computing in Python","volume":"17","author":"Virtanen","year":"2020","journal-title":"Nat. Methods"},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"34","DOI":"10.1186\/s40537-018-0145-4","article-title":"Intrusion detection model using machine learning algorithm on Big Data environment","volume":"5","author":"Othman","year":"2018","journal-title":"J. Big Data"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/20\/15\/4111\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T09:51:16Z","timestamp":1760176276000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/20\/15\/4111"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,7,23]]},"references-count":39,"journal-issue":{"issue":"15","published-online":{"date-parts":[[2020,8]]}},"alternative-id":["s20154111"],"URL":"https:\/\/doi.org\/10.3390\/s20154111","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,7,23]]}}}