{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,4]],"date-time":"2026-06-04T03:56:11Z","timestamp":1780545371277,"version":"3.54.1"},"reference-count":23,"publisher":"MDPI AG","issue":"7","license":[{"start":{"date-parts":[[2022,6,30]],"date-time":"2022-06-30T00:00:00Z","timestamp":1656547200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],"abstract":"<jats:p>In this article, we evaluate the efficiency and performance of two clustering algorithms: AHC (Agglomerative Hierarchical Clustering) and K\u2212Means. We are aware that there are various linkage options and distance measures that influence the clustering results. We assess the quality of clustering using the Davies\u2013Bouldin and Dunn cluster validity indexes. The main contribution of this research is to verify whether the quality of clusters without outliers is higher than those with outliers in the data. To do this, we compare and analyze outlier detection algorithms depending on the applied clustering algorithm. In our research, we use and compare the LOF (Local Outlier Factor) and COF (Connectivity-based Outlier Factor) algorithms for detecting outliers before and after removing 1%, 5%, and 10% of outliers. Next, we analyze how the quality of clustering has improved. In the experiments, three real data sets were used with a different number of instances.<\/jats:p>","DOI":"10.3390\/e24070917","type":"journal-article","created":{"date-parts":[[2022,6,30]],"date-time":"2022-06-30T20:53:02Z","timestamp":1656622382000},"page":"917","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":28,"title":["How the Outliers Influence the Quality of Clustering?"],"prefix":"10.3390","volume":"24","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7238-1170","authenticated-orcid":false,"given":"Agnieszka","family":"Nowak-Brzezi\u0144ska","sequence":"first","affiliation":[{"name":"Institute of Computer Science, Faculty of Science and Technology, University of Silesia, Bankowa 12, 40-007 Katowice, Poland"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4708-9036","authenticated-orcid":false,"given":"Igor","family":"Gaibei","sequence":"additional","affiliation":[{"name":"Institute of Computer Science, Faculty of Science and Technology, University of Silesia, Bankowa 12, 40-007 Katowice, Poland"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2022,6,30]]},"reference":[{"key":"ref_1","unstructured":"Kaufman, L., and Rousseeuw, P.J. (2005). Finding Groups in Data: An Introduction to Cluster Analysis, John Wiley & Sons."},{"key":"ref_2","unstructured":"Legany, C., Juhasz, S., and Babos, A. (2006, January 15\u201317). Cluster validity measurement techniques. Proceedings of the 5th WSEAS International Conference on Artificial Intelligence, Knowledge Engineering and Data Bases, AIKED\u201906, Madrid, Spain."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"95","DOI":"10.1080\/01969727408546059","article-title":"Well Separated Clusters and Optimal Fuzzy Partitions","volume":"4","author":"Dunn","year":"1974","journal-title":"J. Cybern."},{"key":"ref_4","unstructured":"Steinbach, M.S., Karypis, G., and Kumar, V. (2000). A Comparison of Document Clustering Techniques, University of Minnesota. Technical Report."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"1600","DOI":"10.30534\/ijeter\/2020\/20852020","article-title":"A Comparative Study on K-Means Clustering and Agglomerative Hierarchical Clustering","volume":"8","author":"Karthikeyan","year":"2020","journal-title":"Int. J. Emerg. Trends Eng. Res."},{"key":"ref_6","first-page":"74","article-title":"Comparison of K-Means Algorithm and Hierarchical Algorithm using Weka Tool","volume":"7","author":"Saleena","year":"2018","journal-title":"Int. J. Adv. Res. Comput. Commun. Eng."},{"key":"ref_7","first-page":"203","article-title":"An enhanced algorithm for improved cluster generation to remove outlier\u2019s ratio for large datasets in data mining","volume":"1","author":"Vadgasiya","year":"2014","journal-title":"Int. J. Adv. Eng. Res. Dev."},{"key":"ref_8","first-page":"76","article-title":"Local and Global Outlier Detection Algorithms in Unsupervised Approach: A Review","volume":"17","author":"Jabbar","year":"2021","journal-title":"Iraqi J. Electr. Electron. Eng."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"1420","DOI":"10.1016\/j.procs.2020.09.152","article-title":"Outliers in rules-the comparision of LOF, COF and KMEANS algorithms","volume":"176","year":"2020","journal-title":"Procedia Comput. Sci."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"271","DOI":"10.1093\/comjnl\/10.3.271","article-title":"A general theory of classificatory sorting strategies. II Clustering systems","volume":"10","author":"Lance","year":"1967","journal-title":"Comput. J."},{"key":"ref_11","unstructured":"(2009). The CLUSTER Procedure: Clustering Methods. SAS\/STAT 9.2 Users Guide, SAS Institute."},{"key":"ref_12","unstructured":"Kishan, G.M., Chilukuri, K.M., and HuaMing, H. (2017). Anomaly Detection Principles and Algorithms, Springer."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Ranga Suri, N.N.R., Narasimha Murty, M., and Athithan, G. (2019). Outlier Detection: Techniques and Applications, Springer.","DOI":"10.1007\/978-3-030-05127-3"},{"key":"ref_14","unstructured":"Maddala, G.S. (1992). Outliers. Introduction to Econometrics, MacMillan. [2nd ed.]."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Breunig, M., Kriegel, H.P., Ng, R.T., and Sander, J. (2000, January 16\u201318). LOF: Identifying Density-Based Local Outliers. Proceedings of the ACM SIGMOD 2000 International Conference on Management of Data, Dallas, TX, USA.","DOI":"10.1145\/342009.335388"},{"key":"ref_16","unstructured":"(2021, October 10). UCI Machine Learning Repository. Available online: https:\/\/archive.ics.uci.edu\/ml\/."},{"key":"ref_17","unstructured":"Martiniano, A., Ferreira, R.P., and Sassi, R.J. (2018, April 05). Universidade Nove de Julho-Postgraduate Program in Informatics and Knowledge Management. Available online: https:\/\/archive.ics.uci.edu\/ml\/datasets\/Absenteeism+at+work."},{"key":"ref_18","unstructured":"Alzahrani, A., and Sadaoui, S. (2020, March 10). Shill Bidding Dataset. Available online: https:\/\/archive.ics.uci.edu\/ml\/datasets\/Shill+Bidding+Dataset."},{"key":"ref_19","unstructured":"Gardner, A., Selmic, R.R., Kanno, J., and Duncan, C.A. (2016, November 02). MoCap Hand Postures Data Set, Louisiana Tech University, Quinnipiac University. Available online: https:\/\/archive.ics.uci.edu\/ml\/datasets\/MoCap+Hand+Postures."},{"key":"ref_20","unstructured":"Wes McKinney and the Pandas Development Team (2021, November 20). Pandas: Powerful Python Data Analysis Toolkit Release 1.3.3. Available online: https:\/\/devdocs.io\/pandas~0.25\/."},{"key":"ref_21","unstructured":"(2021, November 20). NumPy Reference, Release 1.21.0, Written by the NumPy Community. Available online: https:\/\/numpy.org\/doc\/stable\/numpy-ref.pdf."},{"key":"ref_22","unstructured":"(2021, November 20). Anomaly Detection Tutorial (ANO101). Available online: https:\/\/github.com\/pycaret\/pycaret\/blob\/master\/tutorials\/Anomaly%20Detection%20Tutorial%20Level%20Beginner%20-%20ANO101.ipynb."},{"key":"ref_23","unstructured":"(2021, October 10). An Introduction to Machine Learning with Scikit-Learn. Available online: https:\/\/scikit-learn.org\/0.21\/tutorial\/basic\/tutorial.html."}],"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1099-4300\/24\/7\/917\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T23:41:33Z","timestamp":1760139693000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1099-4300\/24\/7\/917"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,6,30]]},"references-count":23,"journal-issue":{"issue":"7","published-online":{"date-parts":[[2022,7]]}},"alternative-id":["e24070917"],"URL":"https:\/\/doi.org\/10.3390\/e24070917","relation":{},"ISSN":["1099-4300"],"issn-type":[{"value":"1099-4300","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,6,30]]}}}