{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,13]],"date-time":"2026-03-13T07:07:52Z","timestamp":1773385672179,"version":"3.50.1"},"reference-count":39,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2019,6,5]],"date-time":"2019-06-05T00:00:00Z","timestamp":1559692800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["MAKE"],"abstract":"<jats:p>The sensitivity of the elbow rule in determining an optimal number of clusters in high-dimensional spaces that are characterized by tightly distributed data points is demonstrated. The high-dimensional data samples are not artificially generated, but they are taken from a real world evolutionary many-objective optimization. They comprise of Pareto fronts from the last 10 generations of an evolutionary optimization computation with 14 objective functions. The choice for analyzing Pareto fronts is strategic, as it is squarely intended to benefit the user who only needs one solution to implement from the Pareto set, and therefore a systematic means of reducing the cardinality of solutions is imperative. As such, clustering the data and identifying the cluster from which to pick the desired solution is covered in this manuscript, highlighting the implementation of the elbow rule and the use of hyper-radial distances for cluster identity. The Calinski-Harabasz statistic was favored for determining the criteria used in the elbow rule because of its robustness. The statistic takes into account the variance within clusters and also the variance between the clusters. This exercise also opened an opportunity to revisit the justification of using the highest Calinski-Harabasz criterion for determining the optimal number of clusters for multivariate data. The elbow rule predicted the maximum end of the optimal number of clusters, and the highest Calinski-Harabasz criterion method favored the number of clusters at the lower end. Both results are used in a unique way for understanding high-dimensional data, despite being inconclusive regarding which of the two methods determine the true optimal number of clusters.<\/jats:p>","DOI":"10.3390\/make1020042","type":"journal-article","created":{"date-parts":[[2019,6,6]],"date-time":"2019-06-06T03:38:01Z","timestamp":1559792281000},"page":"715-744","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":17,"title":["Optimal Clustering and Cluster Identity in Understanding High-Dimensional Data Spaces with Tightly Distributed Points"],"prefix":"10.3390","volume":"1","author":[{"given":"Oliver","family":"Chikumbo","sequence":"first","affiliation":[{"name":"Living PlanIT AG, Knonauerstrasse 52E, 6330 Cham, Switzerland"}]},{"given":"Vincent","family":"Granville","sequence":"additional","affiliation":[{"name":"Data Science Central, 2428 35th Avenue NE, Issaquah, WA 98029, USA"}]}],"member":"1968","published-online":{"date-parts":[[2019,6,5]]},"reference":[{"key":"ref_1","first-page":"37","article-title":"Pushing the limit in Visual Data Exploration: Techniques and Applications","volume":"Volume 2821","author":"Keim","year":"2003","journal-title":"KI 2003: Advances in Artificial Intelligence, Lecture Notes in Computer Science"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"233","DOI":"10.1007\/s40708-016-0043-5","article-title":"Visual analytics for concept exploration in subspaces of patient groups","volume":"3","author":"Hund","year":"2016","journal-title":"Brain Inform."},{"key":"ref_3","first-page":"1307","article-title":"Sammon mapping","volume":"18","author":"Henderson","year":"1997","journal-title":"Pattern Recognit. Lett."},{"key":"ref_4","first-page":"429","article-title":"On some mathematics for visualizing high dimensional data","volume":"64","author":"Wegman","year":"2001","journal-title":"Indian J. Stat."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"1","DOI":"10.18637\/jss.v040.i02","article-title":"Tourr: An R package for exploring multivariate data with projections","volume":"40","author":"Wickham","year":"2011","journal-title":"J. Stat. Softw."},{"key":"ref_6","unstructured":"Wegman, E.J. (1995). Visualization Methods for the Exploration of High Dimensional Data, US Army Research Office Rpt DAAL03-91-G-0039, George Mason University, Centre for Computational Statistics."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"857","DOI":"10.1016\/S0169-7161(05)80150-6","article-title":"Statistical graphics and visualization","volume":"Volume 9","author":"Rao","year":"1993","journal-title":"Computational Statistics"},{"key":"ref_8","unstructured":"Savoska, S., and Loskovska, S. (2009, January 24\u201326). Parallel coordinates as a tool of exploratory data analysis. Proceedings of the 17th Telecommunications forum, TELFOR 2009, Serbia, Belgrade."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"69","DOI":"10.1007\/BF01898350","article-title":"The plane with parallel coordinates","volume":"1","author":"Inselberg","year":"1985","journal-title":"Visual Comput."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"165","DOI":"10.1080\/00031305.1979.10482688","article-title":"Graphical methods in statistics","volume":"33","author":"Fienberg","year":"1979","journal-title":"Am. Stat."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"59","DOI":"10.1007\/BF00337288","article-title":"Self-Organized Formation of Topologically Correct Feature Maps","volume":"43","author":"Kohonen","year":"1982","journal-title":"Biol. Cybern."},{"key":"ref_12","first-page":"2812","article-title":"Principal component analysis","volume":"6","author":"Bro","year":"2014","journal-title":"R. Soc. Chem. Anal. Methods"},{"key":"ref_13","unstructured":"Granville, V. (2018). Applied Stochastic Processes, Chaos Modeling and Probabilistic Properties of Numeration Systems, Data Science Central."},{"key":"ref_14","unstructured":"Arthur, D., and Vassilvitskii, S. (2007, January 7\u20139). K-means++: The advantages of careful seeding. Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, New Orleans, LA, USA."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"401","DOI":"10.1109\/T-C.1969.222678","article-title":"A nonlinear mapping for data structure analysis","volume":"C-18","author":"Sammon","year":"1969","journal-title":"IEEE Trans. Comput."},{"key":"ref_16","first-page":"3","article-title":"Determining profitability for Ngati Whakaue Tribal Lands Inc., farms by developing a sustainable land management plan","volume":"41","author":"Chikumbo","year":"2011","journal-title":"N. Z. J. For. Sci."},{"key":"ref_17","unstructured":"James, R.N., and Tarlton, G.L. (1990). STANDPAK stand modelling system for radiata pine. New Approaches to Spacing and Thinning in Plantation Forestry, Ministry of Forestry. FRI Bulletin No 151."},{"key":"ref_18","first-page":"409","article-title":"Description and validation of C change: A model for simulating carbon content in managed Pinus radiata stands","volume":"29","author":"Beets","year":"1999","journal-title":"N. Z. J. For. Sci."},{"key":"ref_19","unstructured":"Warner, M. (2003). Putting the Sustainable \u2018Development\u2019 Performance of Companies on the Balance Sheet, Overseas Development Institute."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"13","DOI":"10.1080\/00288231003606054","article-title":"Description and evaluation of the Farmax Dairy Pro decision support model","volume":"53","author":"Bryant","year":"2010","journal-title":"N. Z. J. Agric. Res."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.plrev.2006.10.002","article-title":"Fundamentals of natural computing: An overview","volume":"4","year":"2007","journal-title":"Phys. Life Rev."},{"key":"ref_22","unstructured":"Katoen, J.-P., and Stevens, P. (2002, January 8\u201312). Exploring the very large state spaces using genetic algorithms. Proceedings of the 8th International Conference on Tools and Algorithms for the construction and Analysis of Systems, Grenoble, France."},{"key":"ref_23","unstructured":"Holland, J.H. (2017, September 03). Genetic Algorithms. Available online: https:\/\/wiki.eecs.yorku.ca\/course_archive\/2011-12\/F\/4403\/_media\/genetic_algorithms.pdf."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"8","DOI":"10.1002\/humu.10296","article-title":"Allelic genes of blood group antigens: A source of human mutations and cSNPs documented in the Blood Group Antigen Gene Mutation Database","volume":"23","author":"Blumenfeld","year":"2004","journal-title":"Hum. Mutat."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"1465","DOI":"10.1021\/bi702209s","article-title":"Misfolding of the cystic fibrosis transmembrane conductance regulator and disease","volume":"47","author":"Cheung","year":"2008","journal-title":"Biochemistry"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"133","DOI":"10.1002\/mcda.1536","article-title":"The triple bottomline many-objective-based decision making for a land use management problem","volume":"22","author":"Chikumbo","year":"2015","journal-title":"J. Multi-Criteria Decis. Anal."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1007\/BF02289565","article-title":"Multidimensional scaling by optimizing goodness of fit to a non-metric hypothesis","volume":"29","author":"Kruskal","year":"1964","journal-title":"Psychometrika"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"61","DOI":"10.1007\/s100440050006","article-title":"On the initialization of Sammon\u2019s nonlinear mapping","volume":"3","author":"Lerner","year":"2000","journal-title":"Patterns Anal. Appl."},{"key":"ref_29","unstructured":"Ripley, B.D. (1996). Pattern Recognition and Neural Networks, Cambridge University Press. Chapter 9."},{"key":"ref_30","unstructured":"Pohlheim, H. (2019, June 04). GEATbx: Introduction, Evolutionary Algorithms: Overview, Methods and Operators. Available online: www.geatbx.com."},{"key":"ref_31","unstructured":"MathWorks Inc. (2015). Statistics and Machine Learning Toolbox, MathWorks Inc.. Package: Clustering.evaluation, Documentation."},{"key":"ref_32","unstructured":"Bishop, C.M. (2006). Pattern Recognition and Machine Learning, Springer."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"667","DOI":"10.1007\/s00500-007-0247-y","article-title":"A particular Gaussian mixture model for clustering and its application to image retrieval","volume":"12","author":"Sahbi","year":"2008","journal-title":"Soft Comput."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"267","DOI":"10.1007\/BF02289263","article-title":"Who belongs in the family?","volume":"18","author":"Thorndike","year":"1953","journal-title":"Psychometrika"},{"key":"ref_35","unstructured":"Granville, V. (2019, June 04). How to Automatically Determine the Number of Clusters in Your Data\u2014And More. Available online: https:\/\/www.datasciencecentral.com\/profiles\/blogs\/how-to-automatically-determine-the-number-of-clusters-in-your-dat."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"129","DOI":"10.1109\/TIT.1982.1056489","article-title":"Least squares quantization in PCM","volume":"28","author":"Lloyd","year":"1982","journal-title":"IEEE Trans. Inform. Theory"},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"4","DOI":"10.1504\/IJPD.2009.026172","article-title":"The hyper-radial visualization method for multi-attribute decision-making under certainty","volume":"9","author":"Chiu","year":"2009","journal-title":"Int. J. Prod. Dev."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Naim, A.M., Chiu, P.-W., Bloebaum, C.L., and Lewis, K.E. (2009, January 10\u201312). Hyper-radial visualization for multi-objective decision-making support under uncertainty using preference ranges: The PRUF method. Proceedings of the 12th AIAA\/ISSMO Multidisciplinary Analysis and Optimization Conference, Victoria, BC, Canada.","DOI":"10.2514\/6.2008-6087"},{"key":"ref_39","unstructured":"Balling, R. (1999, January 17\u201321). Design by shopping: A new paradigm?. Proceedings of the 3rd World Congress of Structural and Multidisciplinary Optimization (WCSMO-3), University at Buffalo, Buffalo, NY, USA."}],"container-title":["Machine Learning and Knowledge Extraction"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-4990\/1\/2\/42\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T12:56:31Z","timestamp":1760187391000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-4990\/1\/2\/42"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,6,5]]},"references-count":39,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2019,6]]}},"alternative-id":["make1020042"],"URL":"https:\/\/doi.org\/10.3390\/make1020042","relation":{},"ISSN":["2504-4990"],"issn-type":[{"value":"2504-4990","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,6,5]]}}}