{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,5]],"date-time":"2026-04-05T08:10:13Z","timestamp":1775376613281,"version":"3.50.1"},"reference-count":104,"publisher":"Springer Science and Business Media LLC","issue":"3","license":[{"start":{"date-parts":[[2023,9,29]],"date-time":"2023-09-29T00:00:00Z","timestamp":1695945600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,9,29]],"date-time":"2023-09-29T00:00:00Z","timestamp":1695945600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100010571","name":"Bundesministerium f\u00fcr Bildung, Wissenschaft, Forschung und Technologie","doi-asserted-by":"publisher","award":["01IS18036A"],"award-info":[{"award-number":["01IS18036A"]}],"id":[{"id":"10.13039\/501100010571","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100005722","name":"Ludwig-Maximilians-Universit\u00e4t M\u00fcnchen","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100005722","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Data Min Knowl Disc"],"published-print":{"date-parts":[[2024,5]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>We discuss topological aspects of cluster analysis and show that inferring the topological structure of a dataset before clustering it can considerably enhance cluster detection: we show that clustering embedding vectors representing the inherent structure of a dataset 
instead of the observed feature vectors themselves is highly beneficial. To demonstrate, we combine the manifold learning method UMAP for inferring the topological structure with the density-based clustering method DBSCAN. Synthetic and real data results show that this both simplifies and improves clustering in a diverse set of low- and high-dimensional problems including clusters of varying density and\/or entangled shapes. Our approach simplifies clustering because topological pre-processing consistently reduces the parameter sensitivity of DBSCAN. Clustering the resulting embeddings with DBSCAN can then even outperform complex methods such as SPECTACL and ClusterGAN. Finally, our investigation suggests that the crucial issue in clustering does not appear to be the nominal dimension of the data or how many irrelevant features it contains, but rather how <jats:italic>separable<\/jats:italic> the clusters are in the ambient observation space they are embedded in, which is usually the (high-dimensional) Euclidean space defined by the features of the data. 
The approach is successful because it performs the cluster analysis after projecting the data into a more suitable space that is optimized for separability, in some sense.<\/jats:p>","DOI":"10.1007\/s10618-023-00980-2","type":"journal-article","created":{"date-parts":[[2023,9,29]],"date-time":"2023-09-29T15:02:12Z","timestamp":1695999732000},"page":"840-887","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":15,"title":["Enhancing cluster analysis via topological manifold learning"],"prefix":"10.1007","volume":"38","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4893-5812","authenticated-orcid":false,"given":"Moritz","family":"Herrmann","sequence":"first","affiliation":[]},{"given":"Daniyal","family":"Kazempour","sequence":"additional","affiliation":[]},{"given":"Fabian","family":"Scheipl","sequence":"additional","affiliation":[]},{"given":"Peer","family":"Kr\u00f6ger","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,9,29]]},"reference":[{"issue":"8","key":"980_CR1","doi-asserted-by":"publisher","first-page":"1065","DOI":"10.1016\/0031-3203(94)90145-7","volume":"27","author":"S Aeberhard","year":"1994","unstructured":"Aeberhard S, Coomans D, de Vel O (1994) Comparative analysis of statistical pattern recognition methods in high dimensional settings. Pattern Recognit 27(8):1065\u20131077. https:\/\/doi.org\/10.1016\/0031-3203(94)90145-7","journal-title":"Pattern Recognit"},{"key":"980_CR2","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1201\/9781315373515","volume-title":"Data clustering","author":"CC Aggarwal","year":"2014","unstructured":"Aggarwal CC (2014) An introduction to cluster analysis. In: Aggarwal CC, Reddy CK (eds) Data clustering, 1st edn. Chapman and Hall\/CRC, Boca Raton, pp 1\u201328. 
https:\/\/doi.org\/10.1201\/9781315373515","edition":"1"},{"key":"980_CR3","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-14142-8","volume-title":"Data mining: the textbook","author":"CC Aggarwal","year":"2015","unstructured":"Aggarwal CC (2015) Data mining: the textbook. Springer, Cham. https:\/\/doi.org\/10.1007\/978-3-319-14142-8"},{"key":"980_CR4","doi-asserted-by":"publisher","DOI":"10.1201\/9781315373515","volume-title":"Data clustering: Algorithms and Applications","author":"CC Aggarwal","year":"2014","unstructured":"Aggarwal CC, Reddy CK (2014) Data clustering: Algorithms and Applications. Chapman and Hall\/CRC, Boca Raton. https:\/\/doi.org\/10.1201\/9781315373515"},{"key":"980_CR5","unstructured":"Ala\u00edz CM, Fern\u00e1ndez \u00c1, Dorronsoro JR (2015) Diffusion maps parameters selection based on neighbourhood preservation. Comput Intell 6"},{"issue":"1","key":"980_CR6","first-page":"1","volume":"9","author":"F Alimo\u011flu","year":"2001","unstructured":"Alimo\u011flu F, Alpaydin E (2001) Combining multiple representations for pen-based handwritten digit recognition. Turk J Electr Eng Comp Sci 9(1):1\u201312","journal-title":"Turk J Electr Eng Comp Sci"},{"key":"980_CR7","doi-asserted-by":"publisher","first-page":"317","DOI":"10.1007\/978-3-030-51935-3_34","volume-title":"International conference on image and signal processing","author":"M Allaoui","year":"2020","unstructured":"Allaoui M, Kherfi ML, Cheriet A (2020) Considerably improving clustering algorithms using UMAP dimensionality reduction technique: a comparative study. International conference on image and signal processing. Springer, Cham, pp 317\u2013325. https:\/\/doi.org\/10.1007\/978-3-030-51935-3_34"},{"key":"980_CR8","first-page":"2","volume":"59","author":"E Anderson","year":"1935","unstructured":"Anderson E (1935) The irises of the Gasp\u00e9 Peninsula. 
Bull Am Iris Soc 59:2\u20135","journal-title":"Bull Am Iris Soc"},{"issue":"2","key":"980_CR9","doi-asserted-by":"publisher","first-page":"49","DOI":"10.1145\/304181.304187","volume":"28","author":"M Ankerst","year":"1999","unstructured":"Ankerst M, Breunig MM, Kriegel HP et al (1999) OPTICS: ordering points to identify the clustering structure. ACM SIGMOD Rec 28(2):49\u201360. https:\/\/doi.org\/10.1145\/304181.304187","journal-title":"ACM SIGMOD Rec"},{"issue":"9","key":"980_CR10","first-page":"1","volume":"18","author":"E Arias-Castro","year":"2017","unstructured":"Arias-Castro E, Lerman G, Zhang T (2017) Spectral clustering based on local PCA. J Mach Learn Res 18(9):1\u201357","journal-title":"J Mach Learn Res"},{"issue":"4","key":"980_CR11","doi-asserted-by":"publisher","first-page":"340","DOI":"10.1002\/widm.1062","volume":"2","author":"I Assent","year":"2012","unstructured":"Assent I (2012) Clustering high dimensional data. WIREs Data Min Knowl Discov 2(4):340\u2013350. https:\/\/doi.org\/10.1002\/widm.1062","journal-title":"WIREs Data Min Knowl Discov"},{"key":"980_CR12","doi-asserted-by":"publisher","DOI":"10.1145\/3299876","author":"T Barton","year":"2019","unstructured":"Barton T, Bruna T, Kordik P (2019) Chameleon 2: an improved graph-based clustering algorithm. ACM Trans Knowl Discov Data. https:\/\/doi.org\/10.1145\/3299876","journal-title":"ACM Trans Knowl Discov Data"},{"key":"980_CR13","unstructured":"Belkin M, Niyogi P (2001) Laplacian eigenmaps and spectral techniques for embedding and clustering. In: Dietterich T, Becker S, Ghahramani Z (eds) Advances in neural information processing systems, vol 14. 
MIT Press, Cambridge https:\/\/proceedings.neurips.cc\/paper\/2001\/file\/f106b7f99d2cb30c3db1c3cc0fde9ccb-Paper.pdf"},{"issue":"1","key":"980_CR14","doi-asserted-by":"publisher","first-page":"5415","DOI":"10.1038\/s41467-019-13055-y","volume":"10","author":"AC Belkina","year":"2019","unstructured":"Belkina AC, Ciccolella CO, Anno R et al (2019) Automated optimized parameters for t-distributed stochastic neighbor embedding improve visualization and analysis of large datasets. Nature Commun 10(1):5415","journal-title":"Nature Commun"},{"key":"980_CR15","unstructured":"Ben-David S, Ackerman M (2008) Measures of clustering quality: a working set of axioms for clustering. In: Koller D, Schuurmans D, Bengio Y, et\u00a0al (eds) Advances in neural information processing systems, vol\u00a021. Curran Associates, Inc., https:\/\/proceedings.neurips.cc\/paper\/2008\/hash\/beed13602b9b0e6ecb5b568ff5058f07-Abstract.html"},{"issue":"8","key":"980_CR16","doi-asserted-by":"publisher","first-page":"1798","DOI":"10.1109\/TPAMI.2013.50","volume":"35","author":"Y Bengio","year":"2013","unstructured":"Bengio Y, Courville A, Vincent P (2013) Representation learning: a review and new perspectives. IEEE Trans Pattern Anal Mach Intell 35(8):1798\u20131828. https:\/\/doi.org\/10.1109\/TPAMI.2013.50","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"980_CR17","doi-asserted-by":"publisher","unstructured":"Beyer K, Goldstein J, Ramakrishnan R, et\u00a0al (1999) When is \u201cnearest neighbor\u201d meaningful? In: Beeri C, Buneman P (eds) Database Theory - ICDT\u201999. ICDT 1999. Lecture Notes in Computer Science, vol 1540. 
Springer, Berlin, pp 217\u2013235, https:\/\/doi.org\/10.1007\/3-540-49257-7_15","DOI":"10.1007\/3-540-49257-7_15"},{"issue":"2","key":"980_CR18","doi-asserted-by":"publisher","DOI":"10.1002\/widm.1484","volume":"13","author":"B Bischl","year":"2023","unstructured":"Bischl B, Binder M, Lang M et al (2023) Hyperparameter optimization: foundations, algorithms, best practices, and open challenges. Wiley Interdiscip Rev Data Min Knowl Discov 13(2):e1484","journal-title":"Wiley Interdiscip Rev Data Min Knowl Discov"},{"key":"980_CR19","doi-asserted-by":"publisher","DOI":"10.1017\/9781108755528","volume-title":"Foundations of data science","author":"A Blum","year":"2020","unstructured":"Blum A, Hopcroft J, Kannan R (2020) Foundations of data science. Cambridge University Press, Cambridge"},{"key":"980_CR20","doi-asserted-by":"publisher","unstructured":"Boudiaf M, Rony J, Ziko IM, et\u00a0al (2020) A unifying mutual information view of metric learning: cross-entropy versus pairwise losses. In: Vedaldi A, Bischof H, Brox T, et\u00a0al (eds) Computer Vision - ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12351. Springer, Cham, pp 548\u2013564, https:\/\/doi.org\/10.1007\/978-3-030-58539-6_33","DOI":"10.1007\/978-3-030-58539-6_33"},{"issue":"3","key":"980_CR21","first-page":"77","volume":"16","author":"P Bubenik","year":"2015","unstructured":"Bubenik P (2015) Statistical topological data analysis using persistence landscapes. J Mach Learn Res 16(3):77\u2013102","journal-title":"J Mach Learn Res"},{"key":"980_CR22","doi-asserted-by":"publisher","unstructured":"Campello RJ, Moulavi D, Sander J (2013) Density-based clustering based on hierarchical density estimates. In: Pei J, Tseng V, Cao L, et\u00a0al (eds) Advances in knowledge discovery and data mining. PAKDD 2013. Lecture Notes in Computer Science, vol 7819. 
Springer, Berlin, Heidelberg, pp 160\u2013172, https:\/\/doi.org\/10.1007\/978-3-642-37456-2_14","DOI":"10.1007\/978-3-642-37456-2_14"},{"key":"980_CR23","unstructured":"Cayton L (2005) Algorithms for manifold learning. University of California at San Diego, Tech. rep"},{"key":"980_CR24","doi-asserted-by":"publisher","DOI":"10.3389\/frai.2021.667963","author":"F Chazal","year":"2021","unstructured":"Chazal F, Michel B (2021) An introduction to topological data analysis: Fundamental and practical aspects for data scientists. Front Artif Intell. https:\/\/doi.org\/10.3389\/frai.2021.667963","journal-title":"Front Artif Intell"},{"issue":"485","key":"980_CR25","doi-asserted-by":"publisher","first-page":"209","DOI":"10.1198\/jasa.2009.0111","volume":"104","author":"L Chen","year":"2009","unstructured":"Chen L, Buja A (2009) Local multidimensional scaling for nonlinear dimension reduction, graph drawing, and proximity analysis. J Am Stat Assoc 104(485):209\u2013219. https:\/\/doi.org\/10.1198\/jasa.2009.0111","journal-title":"J Am Stat Assoc"},{"key":"980_CR26","doi-asserted-by":"crossref","unstructured":"Cohen-Addad V, Schwiegelshohn C (2017) On the local structure of stable clustering instances. arXiv preprint arXiv:1701.08423","DOI":"10.1109\/FOCS.2017.14"},{"key":"980_CR27","unstructured":"Dalmia A, Sia S (2021) Clustering with UMAP: why and how connectivity matters. arXiv preprint https:\/\/arxiv.org\/abs\/2108.05525"},{"key":"980_CR28","doi-asserted-by":"publisher","first-page":"224","DOI":"10.1109\/TPAMI.1979.4766909","volume":"2","author":"DL Davies","year":"1979","unstructured":"Davies DL, Bouldin DW (1979) A cluster separation measure. IEEE Trans Pattern Anal Mach Intell 2:224\u2013227","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"980_CR29","unstructured":"Debatty T, Michiardi P, Mees W, et\u00a0al (2014) Determining the k in k-means with MapReduce. 
In: EDBT\/ICDT 2014 Joint Conference, Athens, Greece, proceedings of the workshops of the EDBT\/ICDT 2014 joint conference, https:\/\/hal.archives-ouvertes.fr\/hal-01525708"},{"issue":"1","key":"980_CR30","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1111\/j.2517-6161.1977.tb01600.x","volume":"39","author":"AP Dempster","year":"1977","unstructured":"Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Series B Stat Methodol 39(1):1\u201322","journal-title":"J R Stat Soc Series B Stat Methodol"},{"issue":"2","key":"980_CR31","doi-asserted-by":"publisher","first-page":"561","DOI":"10.1109\/TVCG.2020.3030441","volume":"27","author":"H Doraiswamy","year":"2021","unstructured":"Doraiswamy H, Tierny J, Silva PJ et al (2021) TopoMap: a 0-dimensional homology preserving projection of high-dimensional data. IEEE Trans Vis Comput Graph 27(2):561\u2013571. https:\/\/doi.org\/10.1109\/TVCG.2020.3030441","journal-title":"IEEE Trans Vis Comput Graph"},{"key":"980_CR32","unstructured":"Dua D, Graff C (2017) UCI machine learning repository. http:\/\/archive.ics.uci.edu\/ml"},{"key":"980_CR33","unstructured":"Ester M, Kriegel HP, Sander J, et\u00a0al (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the second international conference on knowledge discovery and data mining, pp 226\u2013231"},{"issue":"3","key":"980_CR34","doi-asserted-by":"publisher","first-page":"601","DOI":"10.1137\/18M1209854","volume":"49","author":"D Feldman","year":"2020","unstructured":"Feldman D, Schmidt M, Sohler C (2020) Turning big data into tiny data: constant-size coresets for k-means, PCA, and projective clustering. SIAM J Comput 49(3):601\u2013657. 
https:\/\/doi.org\/10.1137\/18M1209854","journal-title":"SIAM J Comput"},{"issue":"2","key":"980_CR35","doi-asserted-by":"publisher","first-page":"179","DOI":"10.1111\/j.1469-1809.1936.tb02137.x","volume":"7","author":"RA Fisher","year":"1936","unstructured":"Fisher RA (1936) The use of multiple measurements in taxonomic problems. Ann Eugen 7(2):179\u2013188","journal-title":"Ann Eugen"},{"key":"980_CR36","unstructured":"Forina M, Leardi R, Armanino C, et\u00a0al (1988) Parvus: an extendible package for data exploration, classification and correlation. Institute of Pharmaceutical and Food Analysis and Technologies, Via Brigata Salerno, 16147 Genoa, Italy"},{"key":"980_CR37","doi-asserted-by":"publisher","DOI":"10.1007\/978-981-13-0553-5","volume-title":"An introduction to clustering with R","author":"P Giordani","year":"2020","unstructured":"Giordani P, Ferraro MB, Martella F (2020) An introduction to clustering with R, 1st edn. Springer, Singapore. https:\/\/doi.org\/10.1007\/978-981-13-0553-5","edition":"1"},{"key":"980_CR38","doi-asserted-by":"publisher","unstructured":"Goebl S, He X, Plant C, et\u00a0al (2014) Finding the optimal subspace for clustering. In: 2014 IEEE international conference on data mining, pp 130\u2013139, https:\/\/doi.org\/10.1109\/ICDM.2014.34","DOI":"10.1109\/ICDM.2014.34"},{"key":"980_CR39","unstructured":"Guan S, Loew M (2021) A novel intrinsic measure of data separability. arXiv preprint https:\/\/arxiv.org\/abs\/2109.05180"},{"key":"980_CR40","unstructured":"Hamerly G, Elkan C (2003) Learning the k in k-means. In: Thrun S, Saul L, Sch\u00f6lkopf B (eds) Advances in neural information processing systems, vol\u00a016. 
MIT Press, https:\/\/proceedings.neurips.cc\/paper\/2003\/file\/234833147b97bb6aed53a8f4f1c7a7d8-Paper.pdf"},{"key":"980_CR41","doi-asserted-by":"publisher","DOI":"10.1007\/978-0-387-84858-7","volume-title":"The elements of statistical learning: data mining, inference and prediction","author":"T Hastie","year":"2009","unstructured":"Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning: data mining, inference and prediction, 2nd edn. Springer, New York. https:\/\/doi.org\/10.1007\/978-0-387-84858-7","edition":"2"},{"key":"980_CR42","doi-asserted-by":"publisher","DOI":"10.1201\/b19706","volume-title":"Handbook of cluster analysis","author":"C Hennig","year":"2015","unstructured":"Hennig C, Meila M, Murtagh F et al (2015) Handbook of cluster analysis, 1st edn. Chapman and Hall\/CRC, New York. https:\/\/doi.org\/10.1201\/b19706","edition":"1"},{"key":"980_CR43","unstructured":"Herrmann M, Scheipl F (2020) Unsupervised functional data analysis via nonlinear dimension reduction. arXiv preprint arXiv:2012.11987"},{"issue":"4","key":"980_CR44","doi-asserted-by":"publisher","first-page":"971","DOI":"10.3390\/stats4040057","volume":"4","author":"M Herrmann","year":"2021","unstructured":"Herrmann M, Scheipl F (2021) A geometric perspective on functional outlier detection. Stats 4(4):971\u20131011. https:\/\/doi.org\/10.3390\/stats4040057","journal-title":"Stats"},{"key":"980_CR45","doi-asserted-by":"publisher","unstructured":"Hess S, Duivesteijn W, Honysz P, et\u00a0al (2019) The SpectACl of nonconvex clustering: a spectral approach to density-based clustering. 
In: Proceedings of the AAAI conference on artificial intelligence, pp 3788\u20133795, https:\/\/doi.org\/10.1609\/aaai.v33i01.33013788","DOI":"10.1609\/aaai.v33i01.33013788"},{"issue":"2","key":"980_CR46","doi-asserted-by":"publisher","first-page":"213","DOI":"10.1093\/oxfordjournals.aob.a083391","volume":"18","author":"B Hopkins","year":"1954","unstructured":"Hopkins B, Skellam JG (1954) A new method for determining the type of distribution of plant individuals. Ann Bot 18(2):213\u2013227","journal-title":"Ann Bot"},{"key":"980_CR47","doi-asserted-by":"publisher","first-page":"104264","DOI":"10.1016\/j.compbiomed.2021.104264","volume":"131","author":"Y Hozumi","year":"2021","unstructured":"Hozumi Y, Wang R, Yin C et al (2021) UMAP-assisted k-means clustering of large-scale SARS-CoV-2 mutation datasets. Comput Biol Med 131:104264. https:\/\/doi.org\/10.1016\/j.compbiomed.2021.104264","journal-title":"Comput Biol Med"},{"issue":"1","key":"980_CR48","doi-asserted-by":"publisher","first-page":"193","DOI":"10.1007\/BF01908075","volume":"2","author":"L Hubert","year":"1985","unstructured":"Hubert L, Arabie P (1985) Comparing partitions. J Classif 2(1):193\u2013218. https:\/\/doi.org\/10.1007\/BF01908075","journal-title":"J Classif"},{"issue":"8","key":"980_CR49","doi-asserted-by":"publisher","first-page":"651","DOI":"10.1016\/j.patrec.2009.09.011","volume":"31","author":"AK Jain","year":"2010","unstructured":"Jain AK (2010) Data clustering: 50 years beyond k-means. Pattern Recognit Lett 31(8):651\u2013666. https:\/\/doi.org\/10.1016\/j.patrec.2009.09.011","journal-title":"Pattern Recognit Lett"},{"issue":"3","key":"980_CR50","doi-asserted-by":"publisher","first-page":"264","DOI":"10.1145\/331499.331504","volume":"31","author":"AK Jain","year":"1999","unstructured":"Jain AK, Murty MN, Flynn PJ (1999) Data clustering: a review. ACM Comput Surv 31(3):264\u2013323. 
https:\/\/doi.org\/10.1145\/331499.331504","journal-title":"ACM Comput Surv"},{"key":"980_CR51","doi-asserted-by":"publisher","first-page":"19","DOI":"10.1016\/j.cmpb.2016.11.011","volume":"140","author":"IE Kaya","year":"2017","unstructured":"Kaya IE, Pehlivanl\u0131 A\u00c7, Sekizkarde\u015f EG et al (2017) PCA based clustering for brain tumor segmentation of T1w MRI images. Comput Methods Programs Biomed 140:19\u201328. https:\/\/doi.org\/10.1016\/j.cmpb.2016.11.011","journal-title":"Comput Methods Programs Biomed"},{"issue":"2","key":"980_CR52","doi-asserted-by":"publisher","first-page":"156","DOI":"10.1038\/s41587-020-00809-z","volume":"39","author":"D Kobak","year":"2021","unstructured":"Kobak D, Linderman GC (2021) Initialization is critical for preserving global data structure in both t-SNE and UMAP. Nat Biotechnol 39(2):156\u2013157. https:\/\/doi.org\/10.1038\/s41587-020-00809-z","journal-title":"Nat Biotechnol"},{"issue":"1","key":"980_CR53","doi-asserted-by":"publisher","first-page":"342","DOI":"10.32614\/RJ-2018-039","volume":"10","author":"G Kraemer","year":"2018","unstructured":"Kraemer G, Reichstein M, Mahecha MD (2018) dimRed and coRanking - Unifying Dimensionality Reduction in R. R J 10(1):342. https:\/\/doi.org\/10.32614\/RJ-2018-039","journal-title":"R J"},{"issue":"1","key":"980_CR54","first-page":"1","DOI":"10.1145\/1497577.1497578","volume":"3","author":"HP Kriegel","year":"2009","unstructured":"Kriegel HP, Kr\u00f6ger P, Zimek A (2009) Clustering high-dimensional data: a survey on subspace clustering, pattern-based clustering, and correlation clustering. ACM Trans Knowl Discov Data 3(1):1\u201358. https:\/\/doi.org\/10.1145\/1497577.1497578","journal-title":"ACM Trans Knowl Discov Data"},{"issue":"3","key":"980_CR55","doi-asserted-by":"publisher","first-page":"231","DOI":"10.1002\/widm.30","volume":"1","author":"HP Kriegel","year":"2011","unstructured":"Kriegel HP, Kr\u00f6ger P, Sander J et al (2011) Density-based clustering. WIREs Data Min Knowl Discov 1(3):231\u2013240. 
https:\/\/doi.org\/10.1002\/widm.30","journal-title":"WIREs Data Min Knowl Discov"},{"key":"980_CR56","doi-asserted-by":"publisher","unstructured":"LeCun Y, Bottou L, Bengio Y, et\u00a0al (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278\u20132324. https:\/\/doi.org\/10.1109\/5.726791","DOI":"10.1109\/5.726791"},{"key":"980_CR57","doi-asserted-by":"publisher","DOI":"10.1007\/978-0-387-39351-3","volume-title":"Nonlinear dimensionality reduction","author":"JA Lee","year":"2007","unstructured":"Lee JA, Verleysen M (2007) Nonlinear dimensionality reduction, 1st edn. Springer, New York. https:\/\/doi.org\/10.1007\/978-0-387-39351-3","edition":"1"},{"key":"980_CR58","unstructured":"Lee JA, Verleysen M (2008) Quality assessment of nonlinear dimensionality reduction based on K-ary neighborhoods. In: Saeys Y, Liu H, Inza I, et\u00a0al (eds) Proceedings of the workshop on new challenges for feature selection in data mining and knowledge discovery at ECML\/PKDD 2008, proceedings of machine learning research, vol\u00a04. PMLR, Antwerp, Belgium, pp 21\u201335"},{"issue":"7\u20139","key":"980_CR59","doi-asserted-by":"publisher","first-page":"1431","DOI":"10.1016\/j.neucom.2008.12.017","volume":"72","author":"JA Lee","year":"2009","unstructured":"Lee JA, Verleysen M (2009) Quality assessment of dimensionality reduction: rank-based criteria. Neurocomputing 72(7\u20139):1431\u20131443. https:\/\/doi.org\/10.1016\/j.neucom.2008.12.017","journal-title":"Neurocomputing"},{"issue":"1","key":"980_CR60","doi-asserted-by":"publisher","first-page":"98","DOI":"10.1002\/sam.11445","volume":"13","author":"J Liang","year":"2020","unstructured":"Liang J, Chenouri S, Small CG (2020) A new method for performance analysis in nonlinear dimensionality reduction. Stat Anal Data Min ASA Data Sci J 13(1):98\u2013108. 
https:\/\/doi.org\/10.1002\/sam.11445","journal-title":"Stat Anal Data Min ASA Data Sci J"},{"issue":"2","key":"980_CR61","doi-asserted-by":"publisher","first-page":"313","DOI":"10.1137\/18M1216134","volume":"1","author":"GC Linderman","year":"2019","unstructured":"Linderman GC, Steinerberger S (2019) Clustering with t-SNE, provably. SIAM J Math Data Sci 1(2):313\u2013332. https:\/\/doi.org\/10.1137\/18M1216134","journal-title":"SIAM J Math Data Sci"},{"key":"980_CR62","doi-asserted-by":"publisher","first-page":"177","DOI":"10.1201\/9781315373515","volume-title":"Data clustering","author":"J Liu","year":"2014","unstructured":"Liu J, Han J (2014) Spectral clustering. In: Aggarwal CC, Reddy CK (eds) Data clustering, 1st edn. Chapman and Hall\/CRC, Boca Raton, pp 177\u2013200. https:\/\/doi.org\/10.1201\/9781315373515","edition":"1"},{"issue":"2","key":"980_CR63","doi-asserted-by":"publisher","first-page":"129","DOI":"10.1109\/TIT.1982.1056489","volume":"28","author":"S Lloyd","year":"1982","unstructured":"Lloyd S (1982) Least squares quantization in PCM. IEEE Trans Inf Theory 28(2):129\u2013137. https:\/\/doi.org\/10.1109\/TIT.1982.1056489","journal-title":"IEEE Trans Inf Theory"},{"key":"980_CR64","doi-asserted-by":"publisher","DOI":"10.1201\/b11431","volume-title":"Manifold learning theory and applications","year":"2012","unstructured":"Ma Y, Fu Y (eds) (2012) Manifold learning theory and applications, vol 434, 1st edn. CRC Press, Boca Raton. https:\/\/doi.org\/10.1201\/b11431","edition":"1"},{"key":"980_CR65","doi-asserted-by":"publisher","unstructured":"Mautz D, Ye W, Plant C, et\u00a0al (2017) Towards an optimal subspace for k-means. In: Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining, pp 365\u2013373, https:\/\/doi.org\/10.1145\/3097983.3097989","DOI":"10.1145\/3097983.3097989"},{"key":"980_CR66","unstructured":"McInnes L (2018) Using UMAP for Clustering. 
https:\/\/umap-learn.readthedocs.io\/en\/latest\/clustering.html, [Online; accessed 11-January-2022]"},{"key":"980_CR67","doi-asserted-by":"crossref","unstructured":"McInnes L, Healy J, Melville J (2018) UMAP: Uniform manifold approximation and projection for dimension reduction. arXiv preprint https:\/\/arxiv.org\/abs\/1802.03426","DOI":"10.21105\/joss.00861"},{"key":"980_CR68","doi-asserted-by":"publisher","unstructured":"Mehar AM, Matawie K, Maeder A (2013) Determining an optimal value of k in k-means clustering. In: 2013 IEEE international conference on bioinformatics and biomedicine, pp 51\u201355, https:\/\/doi.org\/10.1109\/BIBM.2013.6732734","DOI":"10.1109\/BIBM.2013.6732734"},{"issue":"3","key":"980_CR69","doi-asserted-by":"publisher","DOI":"10.1002\/widm.1300","volume":"9","author":"M Mittal","year":"2019","unstructured":"Mittal M, Goyal LM, Hemanth DJ et al (2019) Clustering approaches for high-dimensional databases: a review. WIREs Data Min Knowl Discov 9(3):e1300. https:\/\/doi.org\/10.1002\/widm.1300","journal-title":"WIREs Data Min Knowl Discov"},{"key":"980_CR70","doi-asserted-by":"publisher","unstructured":"Mu Z, Wu Y, Yin H, et\u00a0al (2020) Study on single-phase ground fault location of distribution network based on MDS and DBSCAN clustering. In: 2020 39th Chinese control conference (CCC), IEEE, pp 6146\u20136150, https:\/\/doi.org\/10.23919\/CCC50068.2020.9188678","DOI":"10.23919\/CCC50068.2020.9188678"},{"key":"980_CR71","doi-asserted-by":"publisher","unstructured":"Mukherjee S, Asnani H, Lin E, et\u00a0al (2019) ClusterGAN: latent space clustering in generative adversarial networks. In: Proceedings of the AAAI conference on artificial intelligence, pp 4610\u20134617, https:\/\/doi.org\/10.1609\/aaai.v33i01.33014610","DOI":"10.1609\/aaai.v33i01.33014610"},{"key":"980_CR72","unstructured":"Nene S, Nayar S, Murase H (1996) Columbia object image library: COIL-20. Department of Computer Science, Columbia University, New York, Tech. 
rep"},{"issue":"3","key":"980_CR73","doi-asserted-by":"publisher","first-page":"646","DOI":"10.1137\/090762932","volume":"40","author":"P Niyogi","year":"2011","unstructured":"Niyogi P, Smale S, Weinberger S (2011) A topological view of unsupervised learning from noisy data. SIAM J Comput 40(3):646\u2013663. https:\/\/doi.org\/10.1137\/090762932","journal-title":"SIAM J Comput"},{"issue":"2","key":"980_CR74","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3132088","volume":"12","author":"D Pandove","year":"2018","unstructured":"Pandove D, Goel S, Rani R (2018) Systematic review of clustering high-dimensional and large datasets. ACM Trans Knowl Discov Data 12(2):1\u201368. https:\/\/doi.org\/10.1145\/3132088","journal-title":"ACM Trans Knowl Discov Data"},{"key":"980_CR75","doi-asserted-by":"publisher","unstructured":"Pealat C, Bouleux G, Cheutet V (2021) Improved time-series clustering with UMAP dimension reduction method. In: 2020 25th international conference on pattern recognition (ICPR), IEEE, pp 5658\u20135665, https:\/\/doi.org\/10.1109\/ICPR48806.2021.9412261","DOI":"10.1109\/ICPR48806.2021.9412261"},{"issue":"1","key":"980_CR76","doi-asserted-by":"publisher","first-page":"103","DOI":"10.1243\/095440605X8298","volume":"219","author":"DT Pham","year":"2005","unstructured":"Pham DT, Dimov SS, Nguyen CD (2005) Selection of k in k-means clustering. Proc Inst Mech Eng Part C J Mech Eng Sci 219(1):103\u2013119. https:\/\/doi.org\/10.1243\/095440605X8298","journal-title":"Proc Inst Mech Eng Part C J Mech Eng Sci"},{"key":"980_CR77","doi-asserted-by":"publisher","first-page":"624","DOI":"10.1007\/978-3-030-30490-4_50","volume-title":"Artificial neural networks and machine learning - ICANN 2019: text and time series. ICANN 2019. Lecture notes in computer science","author":"GH Putri","year":"2019","unstructured":"Putri GH, Read MN, Koprinska I et al (2019) Dimensionality reduction for clustering and cluster tracking of cytometry data. 
In: Tetko IV, K\u016frkov\u00e1 V, Karpov P et al (eds) Artificial neural networks and machine learning - ICANN 2019: text and time series. ICANN 2019. Lecture notes in computer science, vol 11730. Springer, Cham, pp 624\u2013640. https:\/\/doi.org\/10.1007\/978-3-030-30490-4_50"},{"key":"980_CR78","volume-title":"Advances in neural information processing systems","author":"C Rasmussen","year":"2000","unstructured":"Rasmussen C (2000) The infinite gaussian mixture model. In: Solla S, Leen T, M\u00fcller K (eds) Advances in neural information processing systems, vol 12. MIT Press, Cambridge"},{"issue":"1","key":"980_CR79","first-page":"27","volume":"5","author":"E Rend\u00f3n","year":"2011","unstructured":"Rend\u00f3n E, Abundez I, Arizmendi A et al (2011) Internal versus external cluster validation indexes. Int J Comput Commun Control 5(1):27\u201334","journal-title":"Int J Comput Commun Control"},{"issue":"3","key":"980_CR80","doi-asserted-by":"publisher","first-page":"431","DOI":"10.1111\/cgf.12655","volume":"34","author":"B Rieck","year":"2015","unstructured":"Rieck B, Leitte H (2015) Persistent homology for the evaluation of dimensionality reduction schemes. Comput Graph Forum 34(3):431\u2013440. https:\/\/doi.org\/10.1111\/cgf.12655","journal-title":"Comput Graph Forum"},{"key":"980_CR81","unstructured":"Riehl E (2011) A leisurely introduction to simplicial sets. Unpublished expository article available online at http:\/\/www.math.harvard.edu\/eriehl"},{"key":"980_CR82","doi-asserted-by":"publisher","first-page":"53","DOI":"10.1016\/0377-0427(87)90125-7","volume":"20","author":"PJ Rousseeuw","year":"1987","unstructured":"Rousseeuw PJ (1987) Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J Comput Appl Math 20:53\u201365. 
https:\/\/doi.org\/10.1016\/0377-0427(87)90125-7","journal-title":"J Comput Appl Math"},{"key":"980_CR83","doi-asserted-by":"publisher","first-page":"293","DOI":"10.7551\/mitpress\/6173.003.0022","volume-title":"Semi-supervised Learning","author":"LK Saul","year":"2006","unstructured":"Saul LK, Weinberger KQ, Sha F et al (2006) Spectral methods for dimensionality reduction. In: Chapelle O, Sch\u00f6lkopf B, Zien A (eds) Semi-supervised Learning. MIT Press, Cambridge, Massachusetts, pp 293\u2013308"},{"key":"980_CR84","doi-asserted-by":"publisher","first-page":"664","DOI":"10.1016\/j.neucom.2017.06.053","volume":"267","author":"A Saxena","year":"2017","unstructured":"Saxena A, Prasad M, Gupta A et al (2017) A review of clustering techniques and developments. Neurocomputing 267:664\u2013681. https:\/\/doi.org\/10.1016\/j.neucom.2017.06.053","journal-title":"Neurocomputing"},{"issue":"3","key":"980_CR85","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3068335","volume":"42","author":"E Schubert","year":"2017","unstructured":"Schubert E, Sander J, Ester M et al (2017) DBSCAN revisited, revisited: Why and how you should (still) use DBSCAN. ACM Trans Database Syst 42(3):1\u201321. https:\/\/doi.org\/10.1145\/3068335","journal-title":"ACM Trans Database Syst"},{"key":"980_CR86","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-74552-3","volume-title":"Cluster analysis and applications","author":"R Scitovski","year":"2021","unstructured":"Scitovski R, Sabo K, Mart\u00ednez \u00c1lvarez F et al (2021) Cluster analysis and applications, 1st edn. Springer, Cham. https:\/\/doi.org\/10.1007\/978-3-030-74552-3","edition":"1"},{"key":"980_CR87","doi-asserted-by":"publisher","unstructured":"Souvenir R, Pless R (2005) Manifold clustering. 
In: Tenth IEEE international conference on computer vision (ICCV\u201905), vol 1, IEEE, pp 648\u2013653, https:\/\/doi.org\/10.1109\/ICCV.2005.149","DOI":"10.1109\/ICCV.2005.149"},{"issue":"5500","key":"980_CR88","doi-asserted-by":"publisher","first-page":"2319","DOI":"10.1126\/science.290.5500.2319","volume":"290","author":"JB Tenenbaum","year":"2000","unstructured":"Tenenbaum JB, De Silva V, Langford JC (2000) A global geometric framework for nonlinear dimensionality reduction. Science 290(5500):2319\u20132323. https:\/\/doi.org\/10.1126\/science.290.5500.2319","journal-title":"Science"},{"issue":"105","key":"980_CR89","doi-asserted-by":"publisher","first-page":"501","DOI":"10.1016\/j.dib.2020.105501","volume":"30","author":"MC Thrun","year":"2020","unstructured":"Thrun MC, Ultsch A (2020) Clustering benchmark datasets exploiting the fundamental clustering problems. Data Brief 30(105):501. https:\/\/doi.org\/10.1016\/j.dib.2020.105501","journal-title":"Data Brief"},{"issue":"3","key":"980_CR90","doi-asserted-by":"publisher","DOI":"10.1002\/widm.1444","volume":"12","author":"T Ullmann","year":"2022","unstructured":"Ullmann T, Hennig C, Boulesteix AL (2022) Validation of cluster analysis results on validation data: a systematic framework. WIREs Data Min Knowl Discov 12(3):e1444. https:\/\/doi.org\/10.1002\/widm.1444","journal-title":"WIREs Data Min Knowl Discov"},{"key":"980_CR91","doi-asserted-by":"publisher","unstructured":"Ultsch A (2005) Clustering with SOM: U*C. In: Proceedings of the workshop on self-organizing maps, Paris, France, https:\/\/doi.org\/10.13140\/RG.2.1.2394.5446","DOI":"10.13140\/RG.2.1.2394.5446"},{"key":"980_CR92","doi-asserted-by":"publisher","DOI":"10.3390\/data5010013","author":"A Ultsch","year":"2020","unstructured":"Ultsch A, L\u00f6tsch J (2020) The fundamental clustering and projection suite (FCPS): a dataset collection to test the performance of clustering and data projection algorithms. Data. 
https:\/\/doi.org\/10.3390\/data5010013","journal-title":"Data"},{"issue":"86","key":"980_CR93","first-page":"2579","volume":"9","author":"L van der Maaten","year":"2008","unstructured":"van der Maaten L, Hinton G (2008) Visualizing data using t-SNE. J Mach Learn Res 9(86):2579\u20132605","journal-title":"J Mach Learn Res"},{"key":"980_CR94","unstructured":"Van\u00a0Mechelen I, Boulesteix AL, Dangl R, et\u00a0al (2018) Benchmarking in cluster analysis: a white paper. arXiv:1809.10496 [stat]"},{"issue":"95","key":"980_CR95","first-page":"2837","volume":"11","author":"NX Vinh","year":"2010","unstructured":"Vinh NX, Epps J, Bailey J (2010a) Information theoretic measures for clusterings comparison: Variants, properties, normalization and correction for chance. J Mach Learn Res 11(95):2837\u20132854","journal-title":"J Mach Learn Res"},{"issue":"95","key":"980_CR96","first-page":"2837","volume":"11","author":"NX Vinh","year":"2010","unstructured":"Vinh NX, Epps J, Bailey J (2010b) Information theoretic measures for clusterings comparison: variants, properties, normalization and correction for chance. J Mach Learn Res 11(95):2837\u20132854","journal-title":"J Mach Learn Res"},{"issue":"4","key":"980_CR97","doi-asserted-by":"publisher","first-page":"395","DOI":"10.1007\/s11222-007-9033-z","volume":"17","author":"U Von Luxburg","year":"2007","unstructured":"Von Luxburg U (2007) A tutorial on spectral clustering. Stat Comput 17(4):395\u2013416. https:\/\/doi.org\/10.1007\/s11222-007-9033-z","journal-title":"Stat Comput"},{"key":"980_CR98","first-page":"1","volume":"22","author":"Y Wang","year":"2021","unstructured":"Wang Y, Huang H, Rudin C et al (2021) Understanding how dimension reduction tools work: An empirical approach to deciphering t-SNE, UMAP, TriMap, and PaCMAP for data visualization. 
J Mach Learn Res 22:1\u201373","journal-title":"J Mach Learn Res"},{"issue":"1","key":"980_CR99","doi-asserted-by":"publisher","first-page":"501","DOI":"10.1146\/annurev-statistics-031017-100045","volume":"5","author":"L Wasserman","year":"2018","unstructured":"Wasserman L (2018) Topological data analysis. Annu Rev Stat Appl 5(1):501\u2013532. https:\/\/doi.org\/10.1146\/annurev-statistics-031017-100045","journal-title":"Annu Rev Stat Appl"},{"issue":"59","key":"980_CR100","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13059-019-1663-x","volume":"20","author":"FA Wolf","year":"2019","unstructured":"Wolf FA, Hamey FK, Plass M et al (2019) PAGA: graph abstraction reconciles clustering with trajectory inference through a topology preserving map of single cells. Genome Biol 20(59):1\u20139. https:\/\/doi.org\/10.1186\/s13059-019-1663-x","journal-title":"Genome Biol"},{"key":"980_CR101","unstructured":"Xiao H, Rasul K, Vollgraf R (2017) Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747"},{"issue":"1","key":"980_CR102","doi-asserted-by":"publisher","first-page":"121","DOI":"10.1007\/s10994-013-5334-y","volume":"98","author":"A Zimek","year":"2015","unstructured":"Zimek A, Vreeken J (2015) The blind men and the elephant: on meeting the problem of multiple truths in data from clustering and pattern mining perspectives. Mach Learn 98(1):121\u2013155. https:\/\/doi.org\/10.1007\/s10994-013-5334-y","journal-title":"Mach Learn"},{"issue":"2","key":"980_CR103","doi-asserted-by":"publisher","DOI":"10.1002\/widm.1330","volume":"10","author":"A Zimmermann","year":"2020","unstructured":"Zimmermann A (2020) Method evaluation, parameterization, and result validation in unsupervised data mining: a critical survey. WIREs Data Min Knowl Discov 10(2):e1330. 
https:\/\/doi.org\/10.1002\/widm.1330","journal-title":"WIREs Data Min Knowl Discov"},{"issue":"2","key":"980_CR104","doi-asserted-by":"publisher","first-page":"249","DOI":"10.1007\/s00454-004-1146-y","volume":"33","author":"A Zomorodian","year":"2005","unstructured":"Zomorodian A, Carlsson G (2005) Computing persistent homology. Discrete Comput Geom 33(2):249\u2013274. https:\/\/doi.org\/10.1007\/s00454-004-1146-y","journal-title":"Discrete Comput Geom"}],"container-title":["Data Mining and Knowledge Discovery"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10618-023-00980-2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10618-023-00980-2\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10618-023-00980-2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,5,4]],"date-time":"2024-05-04T09:10:50Z","timestamp":1714813850000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10618-023-00980-2"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,9,29]]},"references-count":104,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2024,5]]}},"alternative-id":["980"],"URL":"https:\/\/doi.org\/10.1007\/s10618-023-00980-2","relation":{},"ISSN":["1384-5810","1573-756X"],"issn-type":[{"value":"1384-5810","type":"print"},{"value":"1573-756X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,9,29]]},"assertion":[{"value":"26 February 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"24 August 
2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"29 September 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors have no competing interests to declare that are relevant to the content of this article.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}