{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,26]],"date-time":"2026-06-26T04:01:04Z","timestamp":1782446464280,"version":"3.54.5"},"reference-count":101,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2024,7,6]],"date-time":"2024-07-06T00:00:00Z","timestamp":1720224000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,7,6]],"date-time":"2024-07-06T00:00:00Z","timestamp":1720224000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100000923","name":"Australian Research Council","doi-asserted-by":"publisher","award":["ARC DP210100227"],"award-info":[{"award-number":["ARC DP210100227"]}],"id":[{"id":"10.13039\/501100000923","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Classif"],"published-print":{"date-parts":[[2025,3]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>Minimum spanning trees (MSTs) provide a convenient representation of datasets in numerous pattern recognition activities. Moreover, they are relatively fast to compute. In this paper, we quantify the extent to which they are meaningful in low-dimensional partitional data clustering tasks. By identifying the upper bounds for the agreement between the best (oracle) algorithm and the expert labels from a large battery of benchmark data, we discover that MST methods can be very competitive. Next, we review, study, extend, and generalise a few existing, state-of-the-art MST-based partitioning schemes. This leads to some new noteworthy approaches. 
Overall, the Genie and the information-theoretic methods often outperform the non-MST algorithms such as K-means, Gaussian mixtures, spectral clustering, Birch, density-based, and classical hierarchical agglomerative procedures. Nevertheless, we identify that there is still some room for improvement, and thus the development of novel algorithms is encouraged.<\/jats:p>","DOI":"10.1007\/s00357-024-09483-1","type":"journal-article","created":{"date-parts":[[2024,7,6]],"date-time":"2024-07-06T10:01:36Z","timestamp":1720260096000},"page":"90-112","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":17,"title":["Clustering with Minimum Spanning Trees: How Good Can It Be?"],"prefix":"10.1007","volume":"42","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0637-6028","authenticated-orcid":false,"given":"Marek","family":"Gagolewski","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8697-5383","authenticated-orcid":false,"given":"Anna","family":"Cena","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6088-8273","authenticated-orcid":false,"given":"Maciej","family":"Bartoszuk","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3625-3312","authenticated-orcid":false,"given":"\u0141ukasz","family":"Brzozowski","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2024,7,6]]},"reference":[{"issue":"1","key":"9483_CR1","doi-asserted-by":"publisher","first-page":"243","DOI":"10.1016\/j.patcog.2012.07.021","volume":"46","author":"O Arbelaitz","year":"2013","unstructured":"Arbelaitz, O., Gurrutxaga, I., Muguerza, J., P\u00e9rez, J. M., & Perona, I. (2013). 
An extensive comparative study of cluster validity indices. Pattern Recognition, 46(1), 243\u2013256. https:\/\/doi.org\/10.1016\/j.patcog.2012.07.021","journal-title":"Pattern Recognition"},{"key":"9483_CR2","unstructured":"Ball, G., & Hall, D. (1965). ISODATA: A novel method of data analysis and pattern classification (Tech. Rep. No. AD699616). Stanford Research Institute."},{"issue":"3","key":"9483_CR3","doi-asserted-by":"publisher","first-page":"368","DOI":"10.1109\/91.771092","volume":"7","author":"J Bezdek","year":"1999","unstructured":"Bezdek, J., Keller, J., Krishnapuram, R., Kuncheva, L., & Pal, N. (1999). Will the real Iris data please stand up? IEEE Transactions on Fuzzy Systems, 7(3), 368\u2013369. https:\/\/doi.org\/10.1109\/91.771092","journal-title":"IEEE Transactions on Fuzzy Systems"},{"key":"9483_CR4","doi-asserted-by":"publisher","unstructured":"Bezdek, J., & Pal, N. (1998). Some new indexes of cluster validity. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 28(3), 301\u2013315. https:\/\/doi.org\/10.1109\/3477.678624","DOI":"10.1109\/3477.678624"},{"issue":"495","key":"9483_CR5","doi-asserted-by":"publisher","first-page":"1075","DOI":"10.1198\/jasa.2011.tm10183","volume":"106","author":"J Bien","year":"2011","unstructured":"Bien, J., & Tibshirani, R. (2011). Hierarchical clustering with prototypes via Minimax linkage. The Journal of the American Statistical Association, 106(495), 1075\u20131084.","journal-title":"The Journal of the American Statistical Association"},{"key":"9483_CR6","doi-asserted-by":"crossref","unstructured":"Blum, A., Hopcroft, J., & Kannan, R. (2020). Foundations of data science. Cambridge University Press. Retrieved from https:\/\/www.cs.cornell.edu\/jeh\/book.pdf","DOI":"10.1017\/9781108755528"},{"key":"9483_CR7","first-page":"37","volume":"3","author":"O Bor\u016fvka","year":"1926","unstructured":"Bor\u016fvka, O. (1926). O jist\u00e9m probl\u00e9mu minim\u00e1ln\u00edm. 
Pr\u00e1ce Moravsk\u00e9 P\u0159\u00edrodov\u011bdeck\u00e9 Spole\u010dnosti v Brn\u011b, 3, 37\u201358.","journal-title":"Pr\u00e1ce Moravsk\u00e9 P\u0159\u00edrodov\u011bdeck\u00e9 Spole\u010dnosti v Brn\u011b"},{"issue":"1","key":"9483_CR8","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1080\/03610927408827101","volume":"3","author":"T Cali\u0144ski","year":"1974","unstructured":"Cali\u0144ski, T., & Harabasz, J. (1974). A dendrite method for cluster analysis. Communications in Statistics, 3(1), 1\u201327. https:\/\/doi.org\/10.1080\/03610927408827101","journal-title":"Communications in Statistics"},{"key":"9483_CR9","doi-asserted-by":"publisher","first-page":"70","DOI":"10.1016\/j.ins.2022.11.114","volume":"623","author":"A Campagner","year":"2023","unstructured":"Campagner, A., Ciucci, D., & Denoeux, T. (2023). A general framework for evaluating and comparing soft clusterings. Information Sciences, 623, 70\u201393. https:\/\/doi.org\/10.1016\/j.ins.2022.11.114","journal-title":"Information Sciences"},{"key":"9483_CR10","doi-asserted-by":"publisher","unstructured":"Campello, R.J.G.B., Moulavi, D., Zimek, A., & Sander, J. (2015). Hierarchical density estimates for data clustering, visualization, and outlier detection. ACM Transactions on Knowledge Discovery from Data, 10(1), 5:1\u20135:51. https:\/\/doi.org\/10.1145\/2733381","DOI":"10.1145\/2733381"},{"key":"9483_CR11","volume-title":"Adaptive hierarchical clustering algorithms based on data aggregation methods (Unpublished doctoral dissertation)","author":"A Cena","year":"2018","unstructured":"Cena, A. (2018). Adaptive hierarchical clustering algorithms based on data aggregation methods (Unpublished doctoral dissertation). Polish Academy of Sciences: Systems Research Institute. (In Polish)."},{"key":"9483_CR12","unstructured":"Chaudhuri, K., & Dasgupta, S. (2010). Rates of convergence for the cluster tree. Advances in neural information processing systems (pp. 
343\u2013351)."},{"key":"9483_CR13","unstructured":"Cormen, T., Leiserson, C., Rivest, R., & Stein, C. (2009). Introduction to algorithms. MIT Press and McGraw-Hill."},{"key":"9483_CR14","unstructured":"Dasgupta, S., & Ng, V. (2009). Single data, multiple clusterings. Proceedings NIPS Workshop Clustering: Science or Art? Towards Principled Approaches."},{"key":"9483_CR15","doi-asserted-by":"publisher","unstructured":"Davies, D.L., & Bouldin, D.W. (1979). A cluster separation measure. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI\u20131(2), 224\u2013227. https:\/\/doi.org\/10.1109\/TPAMI.1979.4766909","DOI":"10.1109\/TPAMI.1979.4766909"},{"issue":"1","key":"9483_CR16","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1111\/j.2517-6161.1977.tb01600.x","volume":"39","author":"A Dempster","year":"1977","unstructured":"Dempster, A., Laird, N., & Rubin, D. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1), 1\u201322. https:\/\/doi.org\/10.1111\/j.2517-6161.1977.tb01600.x","journal-title":"Journal of the Royal Statistical Society, Series B"},{"issue":"5","key":"9483_CR17","doi-asserted-by":"publisher","first-page":"525","DOI":"10.1016\/0031-3203(83)90057-2","volume":"16","author":"V Di Gesu","year":"1983","unstructured":"Di Gesu, V., & Sacco, B. (1983). Some statistical properties of the minimum spanning forest. Pattern Recognition, 16(5), 525\u2013531.","journal-title":"Pattern Recognition"},{"issue":"5","key":"9483_CR18","doi-asserted-by":"publisher","first-page":"420","DOI":"10.1147\/rd.175.0420","volume":"17","author":"W Donath","year":"1973","unstructured":"Donath, W., & Hoffman, A. (1973). Lower bounds for the partitioning of graphs. IBM Journal of Research and Development, 17(5), 420\u2013425. https:\/\/doi.org\/10.1147\/rd.175.0420","journal-title":"IBM Journal of Research and Development"},{"key":"9483_CR19","unstructured":"Dua, D., & Graff, C. (2021). 
UCI Machine Learning Repository. Irvine, CA. http:\/\/archive.ics.uci.edu\/ml"},{"issue":"3","key":"9483_CR20","doi-asserted-by":"publisher","first-page":"32","DOI":"10.1080\/01969727308546046","volume":"3","author":"J Dunn","year":"1974","unstructured":"Dunn, J. (1974). A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters. Journal of Cybernetics, 3(3), 32\u201357. https:\/\/doi.org\/10.1080\/01969727308546046","journal-title":"Journal of Cybernetics"},{"key":"9483_CR21","doi-asserted-by":"publisher","first-page":"600","DOI":"10.1007\/s00357-022-09420-0","volume":"39","author":"P D\u2019Urso","year":"2022","unstructured":"D\u2019Urso, P., & Vitale, V. (2022). A Kemeny distance-based robust fuzzy clustering for preference data. Journal of Classification, 39, 600\u2013647. https:\/\/doi.org\/10.1007\/s00357-022-09420-0","journal-title":"Journal of Classification"},{"issue":"2","key":"9483_CR22","doi-asserted-by":"publisher","first-page":"362","DOI":"10.2307\/2528096","volume":"21","author":"AWF Edwards","year":"1965","unstructured":"Edwards, A. W. F., & Cavalli-Sforza, L. L. (1965). A method for cluster analysis. Biometrics, 21(2), 362\u2013375. https:\/\/doi.org\/10.2307\/2528096","journal-title":"Biometrics"},{"key":"9483_CR23","doi-asserted-by":"publisher","unstructured":"Eggels, A., & Crommelin, D. (2019). Quantifying data dependencies with R\u00e9nyi mutual information and minimum spanning trees. Entropy, 21(2). https:\/\/doi.org\/10.3390\/e21020100","DOI":"10.3390\/e21020100"},{"key":"9483_CR24","doi-asserted-by":"publisher","first-page":"282","DOI":"10.4064\/cm-2-3-4-282-285","volume":"2","author":"K Florek","year":"1951","unstructured":"Florek, K., \u0141ukasiewicz, J., Perkal, J., Steinhaus, H., & Zubrzycki, S. (1951). Sur la liaison et la division des points d\u2019un ensemble fini. 
Colloquium Mathematicum, 2, 282\u2013285.","journal-title":"Colloquium Mathematicum"},{"issue":"12","key":"9483_CR25","doi-asserted-by":"publisher","first-page":"4743","DOI":"10.1007\/s10489-018-1238-7","volume":"48","author":"P Fr\u00e4nti","year":"2018","unstructured":"Fr\u00e4nti, P., & Sieranoja, S. (2018). K-means properties on six clustering benchmark datasets. Applied Intelligence, 48(12), 4743\u20134759.","journal-title":"Applied Intelligence"},{"issue":"5","key":"9483_CR26","doi-asserted-by":"publisher","first-page":"761","DOI":"10.1016\/j.patcog.2005.09.012","volume":"39","author":"P Fr\u00e4nti","year":"2006","unstructured":"Fr\u00e4nti, P., & Virmajoki, O. (2006). Iterative shrinking method for clustering problems. Pattern Recognition, 39(5), 761\u2013765.","journal-title":"Pattern Recognition"},{"key":"9483_CR27","doi-asserted-by":"crossref","unstructured":"Fr\u00e4nti, P., Virmajoki, O., & Hautam\u00e4ki, V. (2006). Fast agglomerative clustering using a k-nearest neighbor graph. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(11).","DOI":"10.1109\/TPAMI.2006.227"},{"key":"9483_CR28","doi-asserted-by":"publisher","DOI":"10.1016\/j.softx.2021.100722","volume":"15","author":"M Gagolewski","year":"2021","unstructured":"Gagolewski, M. (2021). genieclust: Fast and robust hierarchical clustering. SoftwareX, 15, 100722. https:\/\/doi.org\/10.1016\/j.softx.2021.100722","journal-title":"SoftwareX"},{"key":"9483_CR29","doi-asserted-by":"publisher","unstructured":"Gagolewski, M. (2022). A framework for benchmarking clustering algorithms. SoftwareX, 20, 101270. Retrieved from https:\/\/clustering-benchmarks.gagolewski.com\/ https:\/\/doi.org\/10.1016\/j.softx.2022.101270","DOI":"10.1016\/j.softx.2022.101270"},{"key":"9483_CR30","doi-asserted-by":"publisher","first-page":"8","DOI":"10.1016\/j.ins.2016.05.003","volume":"363","author":"M Gagolewski","year":"2016","unstructured":"Gagolewski, M., Bartoszuk, M., & Cena, A. (2016). 
Genie: A new, fast, and outlier-resistant hierarchical clustering algorithm. Information Sciences, 363, 8\u201323.","journal-title":"Information Sciences"},{"key":"9483_CR31","doi-asserted-by":"publisher","first-page":"620","DOI":"10.1016\/j.ins.2021.10.004","volume":"581","author":"M Gagolewski","year":"2021","unstructured":"Gagolewski, M., Bartoszuk, M., & Cena, A. (2021). Are cluster validity measures (in)valid? Information Sciences, 581, 620\u2013636. https:\/\/doi.org\/10.1016\/j.ins.2021.10.004","journal-title":"Information Sciences"},{"key":"9483_CR32","doi-asserted-by":"publisher","DOI":"10.1007\/s10618-022-00902-8","author":"T Gerald","year":"2023","unstructured":"Gerald, T., Zaatiti, H., Hajri, H., et al. (2023). A hyperbolic approach for learning communities on graphs. Data Mining and Knowledge Discovery. https:\/\/doi.org\/10.1007\/s10618-022-00902-8","journal-title":"Data Mining and Knowledge Discovery"},{"key":"9483_CR33","doi-asserted-by":"publisher","first-page":"23","DOI":"10.1016\/S0167-7152(02)00421-2","volume":"62","author":"JM Gonz\u00e1lez-Barrios","year":"2003","unstructured":"Gonz\u00e1lez-Barrios, J. M., & Quiroz, A. J. (2003). A clustering procedure based on the comparison between the k nearest neighbors graph and the minimal spanning tree. Statistics & Probability Letters, 62, 23\u201334. https:\/\/doi.org\/10.1016\/S0167-7152(02)00421-2","journal-title":"Statistics & Probability Letters"},{"key":"9483_CR34","doi-asserted-by":"crossref","unstructured":"Gower, J.C., & Ross, G.J.S. (1969). Minimum spanning trees and single linkage cluster analysis. Journal of the Royal Statistical Society. Series C (Applied Statistics), 18(1), 54\u201364.","DOI":"10.2307\/2346439"},{"issue":"1","key":"9483_CR35","doi-asserted-by":"publisher","first-page":"43","DOI":"10.1109\/MAHC.1985.10011","volume":"7","author":"R Graham","year":"1985","unstructured":"Graham, R., & Hell, P. (1985). On the history of the minimum spanning tree problem. 
Annals of the History of Computing, 7(1), 43\u201357.","journal-title":"Annals of the History of Computing"},{"key":"9483_CR36","doi-asserted-by":"publisher","first-page":"522","DOI":"10.1016\/j.fss.2009.10.021","volume":"161","author":"D Graves","year":"2010","unstructured":"Graves, D., & Pedrycz, W. (2010). Kernel-based fuzzy clustering: A comparative experimental study. Fuzzy Sets and Systems, 161, 522\u2013543.","journal-title":"Fuzzy Sets and Systems"},{"key":"9483_CR37","doi-asserted-by":"crossref","unstructured":"Grygorash, O., Zhou, Y., & Jorgensen, Z. (2006). Minimum spanning tree based clustering algorithms. Proceedings ICTAI\u201906 (pp. 1\u20139).","DOI":"10.1109\/ICTAI.2006.83"},{"key":"9483_CR38","doi-asserted-by":"publisher","DOI":"10.1016\/j.envres.2022.114877","volume":"217","author":"X Guo","year":"2023","unstructured":"Guo, X., Yang, Z., Li, C., Xiong, H., & Ma, C. (2023). Combining the classic vulnerability index and affinity propagation clustering algorithm to assess the intrinsic aquifer vulnerability of coastal aquifers on an integrated scale. Environmental Research, 217, 114877. https:\/\/doi.org\/10.1016\/j.envres.2022.114877","journal-title":"Environmental Research"},{"key":"9483_CR39","doi-asserted-by":"publisher","unstructured":"Halkidi, M., Batistakis, Y., & Vazirgiannis, M. (2001). On clustering validation techniques. Journal of Intelligent Information Systems, 107\u2013145. https:\/\/doi.org\/10.1023\/A:1012801612483","DOI":"10.1023\/A:1012801612483"},{"key":"9483_CR40","doi-asserted-by":"publisher","first-page":"53","DOI":"10.1016\/j.patrec.2015.04.009","volume":"64","author":"C Hennig","year":"2015","unstructured":"Hennig, C. (2015). What are the true clusters? Pattern Recognition Letters, 64, 53\u201362. https:\/\/doi.org\/10.1016\/j.patrec.2015.04.009","journal-title":"Pattern Recognition Letters"},{"key":"9483_CR41","doi-asserted-by":"publisher","unstructured":"Hero III, A.O., & Michel, O. (1998). 
Robust entropy estimation strategies based on edge weighted random graphs. In: A. Mohammad-Djafari (Ed.), Bayesian inference for inverse problems (vol. 3459, pp. 250\u2013261). SPIE. https:\/\/doi.org\/10.1117\/12.323804","DOI":"10.1117\/12.323804"},{"issue":"93","key":"9483_CR42","first-page":"2949","volume":"16","author":"D Horta","year":"2015","unstructured":"Horta, D., & Campello, R. (2015). Comparing hard and overlapping clusterings. Journal of Machine Learning Research, 16(93), 2949\u20132997.","journal-title":"Journal of Machine Learning Research"},{"key":"9483_CR43","doi-asserted-by":"publisher","first-page":"193","DOI":"10.1007\/BF01908075","volume":"2","author":"L Hubert","year":"1985","unstructured":"Hubert, L., & Arabie, P. (1985). Comparing partitions. Journal of Classification, 2, 193\u2013218.","journal-title":"Journal of Classification"},{"key":"9483_CR44","doi-asserted-by":"publisher","DOI":"10.1016\/j.metabol.2023.155514","volume":"141","author":"Y-C Hwang","year":"2023","unstructured":"Hwang, Y.-C., Ahn, H.-Y., Jun, J. E., Jeong, I.-K., Ahn, K. J., & Chung, H. Y. (2023). Subtypes of type 2 diabetes and their association with outcomes in Korean adults - A cluster analysis of community-based prospective cohort. Metabolism, 141, 155514. https:\/\/doi.org\/10.1016\/j.metabol.2023.155514","journal-title":"Metabolism"},{"key":"9483_CR45","doi-asserted-by":"crossref","unstructured":"Jackson, T., & Read, N. (2010a). Theory of minimum spanning trees. II. Exact graphical methods and perturbation expansion at the percolation threshold. Physical Review E, 81, 021131.","DOI":"10.1103\/PhysRevE.81.021131"},{"key":"9483_CR46","doi-asserted-by":"crossref","unstructured":"Jackson, T., & Read, N. (2010b). Theory of minimum spanning trees. I. Mean-field theory and strongly disordered spin-glass model. 
Physical Review E, 81, 021130.","DOI":"10.1103\/PhysRevE.81.021130"},{"issue":"3","key":"9483_CR47","doi-asserted-by":"publisher","DOI":"10.1002\/wics.1597","volume":"15","author":"A Jaeger","year":"2023","unstructured":"Jaeger, A., & Banks, D. (2023). Cluster analysis: A modern statistical review. Wiley Interdisciplinary Reviews: Computational Statistics, 15(3), e1597. https:\/\/doi.org\/10.1002\/wics.1597","journal-title":"Wiley Interdisciplinary Reviews: Computational Statistics"},{"key":"9483_CR48","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1007\/11590316_1","volume":"3776","author":"A Jain","year":"2005","unstructured":"Jain, A., & Law, M. (2005). Data clustering: A user\u2019s dilemma. Lecture Notes in Computer Science, 3776, 1\u201310.","journal-title":"Lecture Notes in Computer Science"},{"key":"9483_CR49","first-page":"57","volume":"6","author":"V Jarn\u00edk","year":"1930","unstructured":"Jarn\u00edk, V. (1930). O jist\u00e9m probl\u00e9mu minim\u00e1ln\u00edm (z dopisu panu O. Bor\u016fvkovi). Pr\u00e1ce Moravsk\u00e9 P\u0159\u00edrodov\u011bdeck\u011b Spole\u010dnosti v Brn\u011b, 6, 57\u201363.","journal-title":"Pr\u00e1ce Moravsk\u00e9 P\u0159\u00edrodov\u011bdeck\u011b Spole\u010dnosti v Brn\u011b"},{"key":"9483_CR50","doi-asserted-by":"publisher","first-page":"1219","DOI":"10.1007\/s10618-022-00829-0","volume":"36","author":"P Jaskowiak","year":"2022","unstructured":"Jaskowiak, P., Costa, I., & Campello, R. (2022). The area under the ROC curve as a measure of clustering quality. Data Mining and Knowledge Discovery, 36, 1219\u20131245. https:\/\/doi.org\/10.1007\/s10618-022-00829-0","journal-title":"Data Mining and Knowledge Discovery"},{"issue":"8","key":"9483_CR51","doi-asserted-by":"publisher","first-page":"68","DOI":"10.1109\/2.781637","volume":"32","author":"G Karypis","year":"1999","unstructured":"Karypis, G., Han, E., & Kumar, V. (1999). CHAMELEON: Hierarchical clustering using dynamic modeling. Computer, 32(8), 68\u201375. 
https:\/\/doi.org\/10.1109\/2.781637","journal-title":"Computer"},{"key":"9483_CR52","doi-asserted-by":"publisher","unstructured":"Kobren, A., Monath, N., Krishnamurthy, A., & McCallum, A. (2017). A hierarchical algorithm for extreme clustering. Proceedings 23rd ACM SIGKDD\u201917 (pp. 255\u2013264). https:\/\/doi.org\/10.1145\/3097983.3098079","DOI":"10.1145\/3097983.3098079"},{"key":"9483_CR53","doi-asserted-by":"publisher","first-page":"48","DOI":"10.1090\/S0002-9939-1956-0078686-7","volume":"7","author":"JB Kruskal","year":"1956","unstructured":"Kruskal, J. B. (1956). On the shortest spanning subtree of a graph and the traveling salesman problem. Proceedings of the American Mathematical Society, 7, 48\u201350.","journal-title":"Proceedings of the American Mathematical Society"},{"key":"9483_CR54","doi-asserted-by":"publisher","unstructured":"Lloyd, S. (1957). Least squares quantization in PCM. IEEE Transactions on Information Theory, 28, 128\u2013137. (Originally a 1957 Bell Telephone Laboratories Research Report; republished in 1982) https:\/\/doi.org\/10.1109\/TIT.1982.1056489","DOI":"10.1109\/TIT.1982.1056489"},{"key":"9483_CR55","doi-asserted-by":"publisher","first-page":"194","DOI":"10.1016\/j.ins.2020.12.016","volume":"557","author":"Y Ma","year":"2021","unstructured":"Ma, Y., Lin, H., Wang, Y., Huang, H., & He, X. (2021). A multi-stage hierarchical clustering algorithm based on centroid of tree and cut edge constraint. Information Sciences, 557, 194\u2013219. https:\/\/doi.org\/10.1016\/j.ins.2020.12.016","journal-title":"Information Sciences"},{"key":"9483_CR56","doi-asserted-by":"crossref","unstructured":"March, W.B., Ram, P., & Gray, A.G. (2010). Fast Euclidean minimum spanning tree: Algorithm, analysis, and applications. Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 603\u2013612). 
ACM.","DOI":"10.1145\/1835804.1835882"},{"issue":"4","key":"9483_CR57","doi-asserted-by":"publisher","first-page":"558","DOI":"10.1109\/72.238311","volume":"4","author":"TM Martinetz","year":"1993","unstructured":"Martinetz, T. M., Berkovich, S. G., & Schulten, K. J. (1993). \u2018Neural-gas\u2019 network for vector quantization and its application to time-series prediction. IEEE Transactions on Neural Networks, 4(4), 558\u2013569.","journal-title":"IEEE Transactions on Neural Networks"},{"issue":"12","key":"9483_CR58","doi-asserted-by":"publisher","first-page":"1650","DOI":"10.1109\/TPAMI.2002.1114856","volume":"24","author":"U Maulik","year":"2002","unstructured":"Maulik, U., & Bandyopadhyay, S. (2002). Performance evaluation of some clustering algorithms and validity indices. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(12), 1650\u20131654. https:\/\/doi.org\/10.1109\/TPAMI.2002.1114856","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"9483_CR59","doi-asserted-by":"publisher","unstructured":"McInnes, L., Healy, J., & Astels, S. (2017). hdbscan: Hierarchical density based clustering. The Journal of Open Source Software, 2(11), 205. https:\/\/doi.org\/10.21105\/joss.00205","DOI":"10.21105\/joss.00205"},{"issue":"2","key":"9483_CR60","doi-asserted-by":"publisher","first-page":"159","DOI":"10.1007\/BF02294245","volume":"50","author":"GW Milligan","year":"1985","unstructured":"Milligan, G. W., & Cooper, M. C. (1985). An examination of procedures for determining the number of clusters in a data set. Psychometrika, 50(2), 159\u2013179.","journal-title":"Psychometrika"},{"key":"9483_CR61","doi-asserted-by":"publisher","first-page":"28","DOI":"10.1016\/j.eswa.2019.04.048","volume":"132","author":"G Mishra","year":"2019","unstructured":"Mishra, G., & Mohanty, S. K. (2019). A fast hybrid clustering technique based on local nearest neighbor using minimum spanning tree. 
Expert Systems with Applications, 132, 28\u201343. https:\/\/doi.org\/10.1016\/j.eswa.2019.04.048","journal-title":"Expert Systems with Applications"},{"issue":"4","key":"9483_CR62","doi-asserted-by":"publisher","first-page":"354","DOI":"10.1093\/comjnl\/26.4.354","volume":"26","author":"F Murtagh","year":"1983","unstructured":"Murtagh, F. (1983). A survey of recent advances in hierarchical clustering algorithms. The Computer Journal, 26(4), 354\u2013359.","journal-title":"The Computer Journal"},{"key":"9483_CR63","doi-asserted-by":"crossref","unstructured":"M\u00fcller, A., Nowozin, S., & Lampert, C. (2012). Information theoretic clustering using minimum spanning trees. Proceedings German Conference on Pattern Recognition. https:\/\/github.com\/amueller\/information-theoretic-mst","DOI":"10.1007\/978-3-642-32717-9_21"},{"key":"9483_CR64","unstructured":"M\u00fcllner, D. (2011). Modern hierarchical, agglomerative clustering algorithms. arXiv:1109.2378"},{"key":"9483_CR65","unstructured":"Naidan, B., Boytsov, L., Malkov, Y., & Novak, D. (2019). Non-metric space library (NMSLIB) manual, version 2.0 [Computer software manual]. Retrieved from https:\/\/github.com\/nmslib\/nmslib\/blob\/master\/manual\/latex\/manual.pdf"},{"key":"9483_CR66","unstructured":"P\u00e1l, D., P\u00f3czos, B., & Szepesv\u00e1ri, C. (2010). Estimation of r\u00e9nyi entropy and mutual information based on generalized nearest-neighbor graphs. Advances in Neural Information Processing Systems, 23."},{"key":"9483_CR67","first-page":"2825","volume":"12","author":"F Pedregosa","year":"2011","unstructured":"Pedregosa, F., et al. (2011). scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825\u20132830.","journal-title":"Journal of Machine Learning Research"},{"key":"9483_CR68","doi-asserted-by":"publisher","unstructured":"Peter, S. (2013). Local density-based hierarchical clustering using minimum spanning tree. 
Journal of Discrete Mathematical Sciences and Cryptography, 16. https:\/\/doi.org\/10.1080\/09720529.2013.778471","DOI":"10.1080\/09720529.2013.778471"},{"issue":"6","key":"9483_CR69","doi-asserted-by":"publisher","first-page":"1389","DOI":"10.1002\/j.1538-7305.1957.tb01515.x","volume":"36","author":"RC Prim","year":"1957","unstructured":"Prim, R. C. (1957). Shortest connection networks and some generalizations. Bell System Technical Journal, 36(6), 1389\u20131401. https:\/\/doi.org\/10.1002\/j.1538-7305.1957.tb01515.x","journal-title":"Bell System Technical Journal"},{"issue":"8","key":"9483_CR70","doi-asserted-by":"publisher","first-page":"2173","DOI":"10.1109\/TKDE.2016.2551240","volume":"28","author":"M Rezaei","year":"2016","unstructured":"Rezaei, M., & Fr\u00e4nti, P. (2016). Set matching measures for external cluster validity. IEEE Transactions on Knowledge and Data Engineering, 28(8), 2173\u20132186. https:\/\/doi.org\/10.1109\/TKDE.2016.2551240","journal-title":"IEEE Transactions on Knowledge and Data Engineering"},{"key":"9483_CR71","volume-title":"Pattern recognition and neural networks","author":"BD Ripley","year":"2007","unstructured":"Ripley, B. D. (2007). Pattern recognition and neural networks. Cambridge University Press."},{"issue":"6191","key":"9483_CR72","doi-asserted-by":"publisher","first-page":"1492","DOI":"10.1126\/science.124207","volume":"344","author":"A Rodriguez","year":"2014","unstructured":"Rodriguez, A., & Laio, A. (2014). Clustering by fast search and find of density peaks. Science, 344(6191), 1492\u20131496. https:\/\/doi.org\/10.1126\/science.124207","journal-title":"Science"},{"key":"9483_CR73","doi-asserted-by":"publisher","first-page":"53","DOI":"10.1016\/0377-0427(87)90125-7","volume":"20","author":"PJ Rousseeuw","year":"1987","unstructured":"Rousseeuw, P. J. (1987). Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20, 53\u201365. 
https:\/\/doi.org\/10.1016\/0377-0427(87)90125-7","journal-title":"Journal of Computational and Applied Mathematics"},{"key":"9483_CR74","doi-asserted-by":"publisher","DOI":"10.1002\/9780470316801","author":"PJ Rousseeuw","year":"1990","unstructured":"Rousseeuw, P. J., & Kaufman, L. (1990). Finding groups in data. Wiley. https:\/\/doi.org\/10.1002\/9780470316801","journal-title":"Wiley"},{"key":"9483_CR75","doi-asserted-by":"publisher","first-page":"551","DOI":"10.1016\/j.patrec.2019.10.019","volume":"128","author":"S Sieranoja","year":"2019","unstructured":"Sieranoja, S., & Fr\u00e4nti, P. (2019). Fast and general density peaks clustering. Pattern Recognition Letters, 128, 551\u2013558. https:\/\/doi.org\/10.1016\/j.patrec.2019.10.019","journal-title":"Pattern Recognition Letters"},{"issue":"1","key":"9483_CR76","doi-asserted-by":"publisher","first-page":"201","DOI":"10.1099\/00221287-17-1-201","volume":"17","author":"P Sneath","year":"1957","unstructured":"Sneath, P. (1957). The application of computers to taxonomy. Journal of General Microbiology, 17(1), 201\u2013226. https:\/\/doi.org\/10.1099\/00221287-17-1-201","journal-title":"Journal of General Microbiology"},{"key":"9483_CR77","doi-asserted-by":"publisher","first-page":"151","DOI":"10.1007\/s00357-005-0012-9","volume":"22","author":"G Szekely","year":"2005","unstructured":"Szekely, G., & Rizzo, M. (2005). Hierarchical clustering via joint between-within distances: Extending Ward\u2019s minimum variance method. Journal of Classification, 22, 151\u2013183. https:\/\/doi.org\/10.1007\/s00357-005-0012-9","journal-title":"Journal of Classification"},{"key":"9483_CR78","doi-asserted-by":"publisher","unstructured":"Temple, J. (2023). Characteristics of distance matrices based on Euclidean, Manhattan and Hausdorff coefficients. Journal of Classification. 
https:\/\/doi.org\/10.1007\/s00357-023-09435-1","DOI":"10.1007\/s00357-023-09435-1"},{"key":"9483_CR79","doi-asserted-by":"publisher","DOI":"10.1016\/j.softx.2020.100642","volume":"13","author":"M Thrun","year":"2021","unstructured":"Thrun, M., & Stier, Q. (2021). Fundamental clustering algorithms suite. SoftwareX, 13, 100642. https:\/\/doi.org\/10.1016\/j.softx.2020.100642","journal-title":"SoftwareX"},{"key":"9483_CR80","doi-asserted-by":"publisher","DOI":"10.1016\/j.dib.2020.105501","volume":"30","author":"M Thrun","year":"2020","unstructured":"Thrun, M., & Ultsch, A. (2020). Clustering benchmark datasets exploiting the fundamental clustering problems. Data in Brief, 30, 105501. https:\/\/doi.org\/10.1016\/j.dib.2020.105501","journal-title":"Data in Brief"},{"issue":"3","key":"9483_CR81","doi-asserted-by":"publisher","DOI":"10.1002\/widm.1444","volume":"12","author":"T Ullmann","year":"2022","unstructured":"Ullmann, T., Hennig, C., & Boulesteix, A.-L. (2022). Validation of cluster analysis results on validation data: A systematic framework. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 12(3), e1444. https:\/\/doi.org\/10.1002\/widm.1444","journal-title":"Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery"},{"key":"9483_CR82","unstructured":"Ultsch, A. (2005). Clustering with SOM: U*C. Workshop on self-organizing maps (pp. 75\u201382). WSOM 2005."},{"key":"9483_CR83","doi-asserted-by":"publisher","first-page":"353","DOI":"10.1007\/s41237-018-0075-7","volume":"46","author":"H van der Hoef","year":"2019","unstructured":"van der Hoef, H., & Warrens, M. (2019). Understanding information theoretic measures for comparing clusterings. Behaviormetrika, 46, 353\u2013370. https:\/\/doi.org\/10.1007\/s41237-018-0075-7","journal-title":"Behaviormetrika"},{"key":"9483_CR84","doi-asserted-by":"publisher","unstructured":"van Mechelen, I., Boulesteix, A.-L., Dangl, R., et al. (2023). 
A white paper on good research practices in benchmarking: The case of cluster analysis. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, e1511. https:\/\/doi.org\/10.1002\/widm.1511","DOI":"10.1002\/widm.1511"},{"key":"9483_CR85","unstructured":"von Luxburg, U., Williamson, R., & Guyon, I. (2012). Clustering: Science or art? In: I. Guyon et al. (Eds.), Proceedings ICML Workshop on Unsupervised and Transfer Learning (vol. 27, pp. 65\u201379)."},{"key":"9483_CR86","unstructured":"Wagner, S., & Wagner, D. (2006). Comparing clusterings \u2013 An overview (Tech. Rep. No. 2006-04). Faculty of Informatics, Universit\u00e4t Karlsruhe (TH)."},{"issue":"7","key":"9483_CR87","doi-asserted-by":"publisher","first-page":"945","DOI":"10.1109\/TKDE.2009.37","volume":"21","author":"X Wang","year":"2009","unstructured":"Wang, X., Wang, X., & Wilkes, D. M. (2009). A divide-and-conquer approach for minimum spanning tree-based clustering. IEEE Transactions on Knowledge and Data Engineering, 21(7), 945\u2013958.","journal-title":"IEEE Transactions on Knowledge and Data Engineering"},{"key":"9483_CR88","doi-asserted-by":"publisher","unstructured":"Wang, X., & Xu, Y. (2015). Fast clustering using adaptive density peak detection. Statistical Methods in Medical Research, 26(6). https:\/\/doi.org\/10.1177\/0962280215609948","DOI":"10.1177\/0962280215609948"},{"issue":"301","key":"9483_CR89","doi-asserted-by":"publisher","first-page":"236","DOI":"10.1080\/01621459.1963.10500845","volume":"58","author":"JH Ward Jr","year":"1963","unstructured":"Ward, J. H., Jr. (1963). Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301), 236\u2013244. 
https:\/\/doi.org\/10.1080\/01621459.1963.10500845","journal-title":"Journal of the American Statistical Association"},{"key":"9483_CR90","doi-asserted-by":"publisher","first-page":"487","DOI":"10.1007\/s00357-022-09413-z","volume":"39","author":"M Warrens","year":"2022","unstructured":"Warrens, M., & van der Hoef, H. (2022). Understanding the adjusted Rand index and other partition comparison indices based on counting object pairs. Journal of Classification, 39, 487\u2013509. https:\/\/doi.org\/10.1007\/s00357-022-09413-z","journal-title":"Journal of Classification"},{"key":"9483_CR91","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-69308-8","volume-title":"Modern algorithms of cluster analysis","author":"S Wierzcho\u0144","year":"2018","unstructured":"Wierzcho\u0144, S., & K\u0142opotek, M. (2018). Modern algorithms of cluster analysis. Springer."},{"key":"9483_CR92","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2020.113367","volume":"151","author":"Q Xu","year":"2020","unstructured":"Xu, Q., Zhang, Q., Liu, J., & Luo, B. (2020). Efficient synthetical clustering validity indexes for hierarchical clustering. Expert Systems with Applications, 151, 113367. https:\/\/doi.org\/10.1016\/j.eswa.2020.113367","journal-title":"Expert Systems with Applications"},{"issue":"4","key":"9483_CR93","doi-asserted-by":"publisher","first-page":"536","DOI":"10.1093\/bioinformatics\/18.4.536","volume":"18","author":"Y Xu","year":"2002","unstructured":"Xu, Y., Olman, V., & Xu, D. (2002). Clustering gene expression data using a graph-theoretic approach: An application of minimum spanning trees. Bioinformatics, 18(4), 536\u2013545.","journal-title":"Bioinformatics"},{"issue":"12","key":"9483_CR94","doi-asserted-by":"publisher","first-page":"3146","DOI":"10.1016\/j.patcog.2008.12.013","volume":"42","author":"F Yin","year":"2009","unstructured":"Yin, F., & Liu, C.-L. (2009). Handwritten Chinese text line segmentation by clustering with distance metric learning. 
Pattern Recognition, 42(12), 3146\u20133157. https:\/\/doi.org\/10.1016\/j.patcog.2008.12.013","journal-title":"Pattern Recognition"},{"key":"9483_CR95","doi-asserted-by":"crossref","unstructured":"Zahn, C. (1971). Graph-theoretical methods for detecting and describing gestalt clusters. IEEE Transactions on Computers, C-20(1), 68\u201386.","DOI":"10.1109\/T-C.1971.223083"},{"key":"9483_CR96","doi-asserted-by":"crossref","unstructured":"Zhang, T., Ramakrishnan, R., & Livny, M. (1996). BIRCH: An efficient data clustering method for large databases. Proceedings ACM SIGMOD International Conference on Management of data \u2013 SIGMOD \u201996 (pp. 103\u2013114).","DOI":"10.1145\/233269.233324"},{"key":"9483_CR97","doi-asserted-by":"publisher","unstructured":"Zhao, W., Ma, J., Liu, Q., et al. (2023). Comparison and application of SOFM, fuzzy c-means and k-means clustering algorithms for natural soil environment regionalization in China. Environmental Research, 216, 114519. https:\/\/doi.org\/10.1016\/j.envres.2022.114519","DOI":"10.1016\/j.envres.2022.114519"},{"key":"9483_CR98","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/j.ins.2014.10.012","volume":"295","author":"C Zhong","year":"2015","unstructured":"Zhong, C., Malinen, M., Miao, D., & Fr\u00e4nti, P. (2015). A fast minimum spanning tree algorithm based on k-means. Information Sciences, 295, 1\u201317. https:\/\/doi.org\/10.1016\/j.ins.2014.10.012","journal-title":"Information Sciences"},{"key":"9483_CR99","doi-asserted-by":"publisher","first-page":"3397","DOI":"10.1016\/j.ins.2011.04.013","volume":"181","author":"C Zhong","year":"2011","unstructured":"Zhong, C., Miao, D., & Fr\u00e4nti, P. (2011). Minimum spanning tree based split-and-merge: A hierarchical clustering method. Information Sciences, 181, 3397\u20133410. 
https:\/\/doi.org\/10.1016\/j.ins.2011.04.013","journal-title":"Information Sciences"},{"issue":"3","key":"9483_CR100","doi-asserted-by":"publisher","first-page":"752","DOI":"10.1016\/j.patcog.2009.07.010","volume":"43","author":"C Zhong","year":"2010","unstructured":"Zhong, C., Miao, D., & Wang, R. (2010). A graph-theoretical clustering method based on two rounds of minimum spanning trees. Pattern Recognition, 43(3), 752\u2013766. https:\/\/doi.org\/10.1016\/j.patcog.2009.07.010","journal-title":"Pattern Recognition"},{"key":"9483_CR101","doi-asserted-by":"publisher","DOI":"10.1016\/j.dcan.2023.01.010","author":"H Zhou","year":"2023","unstructured":"Zhou, H., Bai, J., Wang, Y., Ren, J., Yang, X., & Jiao, L. (2023). Deep radio signal clustering with interpretability analysis based on saliency map. Digital Communications and Networks. https:\/\/doi.org\/10.1016\/j.dcan.2023.01.010","journal-title":"Digital Communications and Networks"}],"container-title":["Journal of Classification"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00357-024-09483-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s00357-024-09483-1\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00357-024-09483-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,19]],"date-time":"2025-03-19T08:10:45Z","timestamp":1742371845000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s00357-024-09483-1"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,7,6]]},"references-count":101,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2025,3]]}},"alternative-id":["9483"],"URL":"https:\/\
/doi.org\/10.1007\/s00357-024-09483-1","relation":{},"ISSN":["0176-4268","1432-1343"],"issn-type":[{"value":"0176-4268","type":"print"},{"value":"1432-1343","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,7,6]]},"assertion":[{"value":"19 June 2024","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"6 July 2024","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no competing interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of Interest"}}]}}