{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,3]],"date-time":"2026-06-03T15:25:05Z","timestamp":1780500305546,"version":"3.54.1"},"reference-count":56,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2023,6,13]],"date-time":"2023-06-13T00:00:00Z","timestamp":1686614400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"German Federal Ministry of Education and Research","award":["01IS1705"],"award-info":[{"award-number":["01IS1705"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. ACM Manag. Data"],"published-print":{"date-parts":[[2023,6,13]]},"abstract":"<jats:p>Analysts often struggle with the combined algorithm selection and hyperparameter optimization problem, a.k.a. CASH problem in literature. Typically, they execute several algorithms with varying hyperparameter settings to find configurations that show valuable results. Efficiently finding these configurations is a major challenge. In clustering analyses, analysts face the additional challenge to select a cluster validity index that allows them to evaluate clustering results in a purely unsupervised fashion. Many different cluster validity indices exist and each one has its benefits depending on the dataset characteristics. While experienced analysts might address these challenges using their domain knowledge and experience, especially novice analysts struggle with them. In this paper, we propose a new meta-learning approach to address these challenges. Our approach uses knowledge from past clustering evaluations to apply strategies that experienced analysts would exploit. 
In particular, we use meta-learning to (a) select a suitable clustering validity index, (b) efficiently select well-performing clustering algorithm and hyperparameter configurations, and (c) reduce the search space to suitable clustering algorithms. In the evaluation, we show that our approach significantly outperforms state-of-the-art approaches regarding accuracy and runtime.<\/jats:p>","DOI":"10.1145\/3589289","type":"journal-article","created":{"date-parts":[[2023,6,20]],"date-time":"2023-06-20T20:26:45Z","timestamp":1687292805000},"page":"1-26","source":"Crossref","is-referenced-by-count":8,"title":["ML2DAC: Meta-Learning to Democratize AutoML for Clustering Analysis"],"prefix":"10.1145","volume":"1","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2502-4215","authenticated-orcid":false,"given":"Dennis","family":"Treder-Tschechlov","sequence":"first","affiliation":[{"name":"Universit\u00e4t Stuttgart, Stuttgart, Germany"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4640-8477","authenticated-orcid":false,"given":"Manuel","family":"Fritz","sequence":"additional","affiliation":[{"name":"Universit\u00e4t Stuttgart, Stuttgart, Germany"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7085-2813","authenticated-orcid":false,"given":"Holger","family":"Schwarz","sequence":"additional","affiliation":[{"name":"Universit\u00e4t Stuttgart, Stuttgart, Germany"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0809-9159","authenticated-orcid":false,"given":"Bernhard","family":"Mitschang","sequence":"additional","affiliation":[{"name":"Universit\u00e4t Stuttgart, Stuttgart, 
Germany"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2023,6,20]]},"reference":[{"key":"e_1_2_2_1_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2012.07.021"},{"key":"e_1_2_2_2_1","volume-title":"DNA Microarrays and Gene Expression: From Experiments to Data Analysis and Modeling","author":"Baldi Pierre","unstructured":"Pierre Baldi and G. Wesley Hatfield. 2011. DNA Microarrays and Gene Expression: From Experiments to Data Analysis and Modeling. Cambridge University Press."},{"key":"e_1_2_2_3_1","volume-title":"Algorithms for hyper-parameter optimization. Advances in neural information processing systems","author":"Bergstra James","year":"2011","unstructured":"James Bergstra, R\u00e9mi Bardenet, Yoshua Bengio, and Bal\u00e1zs K\u00e9gl. 2011. Algorithms for hyper-parameter optimization. Advances in neural information processing systems, Vol. 24 (2011)."},{"key":"e_1_2_2_4_1","volume-title":"Metalearning: Applications to data mining.","author":"Brazdil P.","year":"2008","unstructured":"P. Brazdil, C. Giraud-Carrier, C. Soares, and R. Vilalta. 2008. Metalearning: Applications to data mining."},{"key":"e_1_2_2_5_1","volume-title":"Random forests. Machine learning","author":"Breiman Leo","year":"2001","unstructured":"Leo Breiman. 2001. Random forests. Machine learning, Vol. 45, 1 (2001), 5--32."},{"key":"e_1_2_2_6_1","doi-asserted-by":"crossref","unstructured":"T. Cali\u0144ski and J. Harabasz. 1974. A Dendrite Method For Cluster Analysis. Communications in Statistics 1 (1974).","DOI":"10.1080\/03610927408827101"},{"key":"e_1_2_2_7_1","doi-asserted-by":"publisher","DOI":"10.1016\/0167-8655(85)90053-4"},{"key":"e_1_2_2_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/34.1000236"},{"key":"e_1_2_2_9_1","volume-title":"Bouldin","author":"Davies David L.","year":"1979","unstructured":"David L. Davies and Donald W. Bouldin. 1979. A Cluster Separation Measure. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 2 (1979)."},{"key":"e_1_2_2_10_1","volume-title":"Proceedings of the International Joint Conference on Neural Networks.","author":"Marcilio","unstructured":"Marcilio C.P. De Souto et al. 2008. Ranking and selecting clustering algorithms using a meta-learning approach. In Proceedings of the International Joint Conference on Neural Networks."},{"key":"e_1_2_2_11_1","doi-asserted-by":"publisher","DOI":"10.1080\/01969727408546059"},{"key":"e_1_2_2_12_1","volume-title":"Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining.","author":"Martin","unstructured":"Martin Ester et al. 1996. A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. In Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining."},{"key":"e_1_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-35380-2_18"},{"key":"e_1_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ins.2014.12.044"},{"key":"e_1_2_2_15_1","volume-title":"International Workshop on Automatic Machine Learning at ICML.","author":"Matthias","unstructured":"Matthias Feurer et al. 2018. Practical automated machine learning. In International Workshop on Automatic Machine Learning at ICML."},{"key":"e_1_2_2_16_1","unstructured":"Matthias Feurer Aaron Klein Katharina Eggensperger Jost Springenberg Manuel Blum and Frank Hutter. 2015a. Efficient and robust automated machine learning. In Advances in neural information processing systems."},{"key":"e_1_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.5555\/2887007.2887164"},{"key":"e_1_2_2_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/34.765656"},{"key":"e_1_2_2_19_1","first-page":"4","article-title":"Efficient exploratory clustering analyses in large-scale exploration processes","volume":"31","author":"Manuel Fritz","year":"2021","unstructured":"Manuel Fritz et al. 2021. 
Efficient exploratory clustering analyses in large-scale exploration processes. The VLDB Journal, Vol. 31, 4 (nov 2021), 711--732.","journal-title":"The VLDB Journal"},{"key":"e_1_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.14778\/3407790.3407813"},{"key":"e_1_2_2_21_1","volume-title":"Google Vizier: A Service for Black-Box Optimization. In ACM SIGKDD (KDD '17). 1487--1495.","author":"Daniel Golovin","year":"2017","unstructured":"Daniel Golovin et al. 2017. Google Vizier: A Service for Black-Box Optimization. In ACM SIGKDD (KDD '17). 1487--1495."},{"key":"e_1_2_2_22_1","first-page":"10","article-title":"SEP\/COP: An efficient method to find the best partition in hierarchical clustering based on a new cluster validity index","volume":"43","author":"Ibai Gurrutxaga","year":"2010","unstructured":"Ibai Gurrutxaga et al. 2010. SEP\/COP: An efficient method to find the best partition in hierarchical clustering based on a new cluster validity index. Pattern Recognition, Vol. 43, 10 (oct 2010), 3364--3373.","journal-title":"Pattern Recognition"},{"key":"e_1_2_2_23_1","volume-title":"Journal of Intelligent Information Systems","author":"Halkidi Maria","year":"2001","unstructured":"Maria Halkidi, Yannis Batistakis, and Michalis Vazirgiannis. 2001. Journal of Intelligent Information Systems, Vol. 17, 2\/3 (2001), 107--145."},{"key":"e_1_2_2_24_1","volume-title":"Neural Information Processing Systems","volume":"17","author":"Hamerly Greg","year":"2003","unstructured":"Greg Hamerly and Charles Elkan. 2003. Learning the K in K-Means. In Neural Information Processing Systems, Vol. 17 (2003)."},{"key":"e_1_2_2_25_1","volume-title":"Comparing partitions. Journal of Classification","author":"Hubert Lawrence","year":"1985","unstructured":"Lawrence Hubert and Phipps Arabie. 1985. Comparing partitions. 
Journal of Classification (1985)."},{"key":"e_1_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-25566-3_40"},{"key":"e_1_2_2_27_1","volume-title":"Data clustering: 50 years beyond K-means. Pattern recognition letters 8","author":"Jain Anil K","year":"2010","unstructured":"Anil K Jain. 2010. Data clustering: 50 years beyond K-means. Pattern recognition letters 8 (2010)."},{"key":"e_1_2_2_28_1","doi-asserted-by":"crossref","unstructured":"Haifeng Jin Qingquan Song and Xia Hu. 2019. Auto-Keras: An Efficient Neural Architecture Search System. In ACM SIGKDD. 1946--1956.","DOI":"10.1145\/3292500.3330648"},{"key":"e_1_2_2_29_1","volume-title":"Dubes","author":"Jain Anil K.","year":"1988","unstructured":"Anil K. Jain and Richard C. Dubes. 1988. Algorithms for clustering data."},{"key":"e_1_2_2_30_1","doi-asserted-by":"crossref","unstructured":"Yue Liu Shuang Li and Wenjie Tian. 2021. AutoCluster: Meta-learning Based Ensemble Method for Automated Unsupervised Clustering. In PAKDD.","DOI":"10.1007\/978-3-030-75768-7_20"},{"key":"e_1_2_2_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2010.35"},{"key":"e_1_2_2_32_1","volume-title":"Understanding and enhancement of internal clustering validation measures","author":"Liu Yanchi","year":"2013","unstructured":"Yanchi Liu, Zhongmou Li, Hui Xiong, Xuedong Gao, Junjie Wu, and Sen Wu. 2013. Understanding and enhancement of internal clustering validation measures. IEEE Transactions on Cybernetics 3 (6 2013)."},{"key":"e_1_2_2_33_1","volume-title":"Proceedings of the fifth Berkeley Symposium on Mathematical Statistics and Probability.","author":"MacQueen J","year":"1967","unstructured":"J MacQueen. 1967. Some methods for classification and analysis of multivariate observations. 
In Proceedings of the fifth Berkeley Symposium on Mathematical Statistics and Probability."},{"key":"e_1_2_2_34_1","volume-title":"Accelerated Hierarchical Density Based Clustering","author":"McInnes Leland","unstructured":"Leland McInnes and John Healy. 2017. Accelerated Hierarchical Density Based Clustering. In IEEE ICDMW."},{"key":"e_1_2_2_35_1","doi-asserted-by":"crossref","unstructured":"Davoud Moulavi et al. 2014. Density-based clustering validation. In SIAM\/ SDM.","DOI":"10.1137\/1.9781611973440.96"},{"key":"e_1_2_2_36_1","volume-title":"Nascimento et al","author":"Andr\u00e9","year":"2009","unstructured":"Andr\u00e9 C.A. Nascimento et al. 2009. Mining rules for the automatic selection process of clustering methods applied to cancer gene expression data. In Artificial Neural Networks -- ICANN 2009, Vol. 5769 LNCS. Springer Berlin Heidelberg."},{"key":"e_1_2_2_37_1","volume-title":"Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research","author":"F Pedregosa","year":"2011","unstructured":"F Pedregosa et al. 2011. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research (2011)."},{"key":"e_1_2_2_38_1","volume-title":"Proceedings of the Seventeenth International Conference on Machine Learning.","author":"Pelleg Dan","year":"2000","unstructured":"Dan Pelleg and Andrew W Moore. 2000. X-means: Extending K-means with Efficient Estimation of the Number of Clusters. In Proceedings of the Seventeenth International Conference on Machine Learning."},{"key":"e_1_2_2_39_1","volume-title":"Giraud-Carrier","author":"Pfahringer Bernhard","year":"2000","unstructured":"Bernhard Pfahringer, Hilan Bensusan, and Christophe G. Giraud-Carrier. 2000. Meta-Learning by Landmarking Various Learning Algorithms. In ICML."},{"key":"e_1_2_2_40_1","volume-title":"de Carvalho","author":"Pimentel Bruno Almeida","year":"2019","unstructured":"Bruno Almeida Pimentel and Andr\u00e9 C.P.L.F. de Carvalho. 2019. 
A new data characterization for selecting clustering algorithms using meta-learning. Information Sciences (3 2019)."},{"key":"e_1_2_2_41_1","volume-title":"2018 International Joint Conference on Neural Networks.","author":"Pimentel Bruno Almeida","unstructured":"Bruno Almeida Pimentel and Andre C. P. L. F. de Carvalho. 2018. Statistical versus Distance-Based Meta-Features for Clustering Algorithm recommendation Using Meta-Learning. In 2018 International Joint Conference on Neural Networks."},{"key":"e_1_2_2_42_1","volume-title":"AutoClust: A Framework for Automated Clustering Based on Cluster Validity Indices. In 2020 IEEE International Conference on Data Mining (ICDM).","author":"Poulakis Yannis","year":"2020","unstructured":"Yannis Poulakis, Christos Doulkeridis, and Dimosthenis Kyriazis. 2020. AutoClust: A Framework for Automated Clustering Based on Cluster Validity Indices. In 2020 IEEE International Conference on Data Mining (ICDM)."},{"key":"e_1_2_2_43_1","unstructured":"Carl Edward Rasmussen. 1999. The infinite Gaussian mixture model. In Advances in neural information processing systems."},{"key":"e_1_2_2_44_1","unstructured":"Adriano Rivolli et al. 2018. Towards reproducible empirical research in meta-learning. arXiv preprint arXiv:1808.10406 (2018)."},{"key":"e_1_2_2_45_1","volume-title":"Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. C","author":"Rousseeuw Peter J.","year":"1987","unstructured":"Peter J. Rousseeuw. 1987. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. C (1987)."},{"key":"e_1_2_2_46_1","volume-title":"A Meta-Learning Recommendation System for Characterizing Unsupervised Problems: On Using Quality Indices to Describe Data Conformations","author":"Corchado J A","year":"2019","unstructured":"J A S\u00e1ez and E Corchado. 2019. 
A Meta-Learning Recommendation System for Characterizing Unsupervised Problems: On Using Quality Indices to Describe Data Conformations. IEEE Access (2019)."},{"key":"e_1_2_2_47_1","unstructured":"Jasper Snoek Hugo Larochelle and Ryan P Adams. 2012. Practical bayesian optimization of machine learning algorithms. In Advances in neural information processing systems."},{"key":"e_1_2_2_48_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-04274-4_14"},{"key":"e_1_2_2_49_1","unstructured":"Alexander Strehl and Joydeep Ghosh. 2003. Cluster ensembles - A knowledge reuse framework for combining multiple partitions. In JMLR."},{"key":"e_1_2_2_50_1","volume-title":"Guttag","author":"Suresh Harini","year":"2018","unstructured":"Harini Suresh, Jen J. Gong, and John V. Guttag. 2018. Learning Tasks for Multitask Learning: Heterogenous Patient Populations in the ICU. In ACM SIGKDD."},{"key":"e_1_2_2_51_1","doi-asserted-by":"crossref","unstructured":"Chris Thornton et al. 2013. Auto-WEKA: combined selection and hyperparameter optimization of classification algorithms. In ACM SIGKDD.","DOI":"10.1145\/2487575.2487629"},{"key":"e_1_2_2_52_1","unstructured":"Dennis Tschechlov Manuel Fritz and Holger Schwarz. 2021. AutoML4Clust: Efficient AutoML for Clustering Analyses. In EDBT."},{"key":"e_1_2_2_53_1","volume-title":"ICML","author":"van Craenendonck Toon","year":"2015","unstructured":"Toon van Craenendonck and Hendrik Blockeel. 2015. Using internal validity measures to compare clustering algorithms. In ICML 2015."},{"key":"e_1_2_2_54_1","volume-title":"Automated machine learning","author":"Vanschoren Joaquin","unstructured":"Joaquin Vanschoren. 2019. Meta-learning. In Automated machine learning. Springer, Cham, 35--61."},{"key":"e_1_2_2_55_1","volume-title":"Proceedings of ICML workshop on unsupervised and transfer learning.","author":"Von Ulrike","unstructured":"Ulrike Von Luxburg et al. 2012. Clustering: Science or art?. 
In Proceedings of ICML workshop on unsupervised and transfer learning."},{"key":"e_1_2_2_56_1","doi-asserted-by":"crossref","unstructured":"Milan Vukicevic et al. 2016. Extending meta-learning framework for clustering gene expression data with component-based algorithm design and internal evaluation measures. IJDMB (2016).","DOI":"10.1504\/IJDMB.2016.074682"}],"container-title":["Proceedings of the ACM on Management of Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3589289","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3589289","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T17:48:54Z","timestamp":1750182534000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3589289"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,6,13]]},"references-count":56,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2023,6,13]]}},"alternative-id":["10.1145\/3589289"],"URL":"https:\/\/doi.org\/10.1145\/3589289","relation":{},"ISSN":["2836-6573"],"issn-type":[{"value":"2836-6573","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,6,13]]}}}