{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,11]],"date-time":"2026-03-11T01:45:29Z","timestamp":1773193529368,"version":"3.50.1"},"reference-count":66,"publisher":"Association for Computing Machinery (ACM)","issue":"11","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2024,7]]},"abstract":"<jats:p>\n            Efficient clustering algorithms, such as\n            <jats:italic>k<\/jats:italic>\n            -Means, are often used in practice because they scale well for large datasets. However, they are only able to detect simple data characteristics. Ensemble clustering can overcome this limitation by combining multiple results of efficient algorithms. However, analysts face several challenges when applying ensemble clustering, i. e., analysts struggle to (a) efficiently generate an ensemble and (b) combine the ensemble using a suitable consensus function with a corresponding hyperparameter setting. In this paper, we propose EffEns, an efficient ensemble clustering approach to address these challenges. Our approach relies on meta-learning to learn about dataset characteristics and the correlation between generated base clusterings and the performance of consensus functions. We apply the learned knowledge to generate appropriate ensembles and select a suitable consensus function to combine their results. Further, we use a state-of-the-art optimization technique to tune the hyperparameters of the selected consensus function. Our comprehensive evaluation on synthetic and real-world datasets demonstrates that EffEns significantly outperforms state-of-the-art approaches w.r.t. 
accuracy and runtime.\n          <\/jats:p>","DOI":"10.14778\/3681954.3681970","type":"journal-article","created":{"date-parts":[[2024,8,30]],"date-time":"2024-08-30T16:23:36Z","timestamp":1725035016000},"page":"2880-2892","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["Ensemble Clustering Based on Meta-Learning and Hyperparameter Optimization"],"prefix":"10.14778","volume":"17","author":[{"given":"Dennis","family":"Treder-Tschechlov","sequence":"first","affiliation":[{"name":"University of Stuttgart, Stuttgart, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Manuel","family":"Fritz","sequence":"additional","affiliation":[{"name":"University of Stuttgart, Stuttgart, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Holger","family":"Schwarz","sequence":"additional","affiliation":[{"name":"University of Stuttgart, Stuttgart, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Bernhard","family":"Mitschang","sequence":"additional","affiliation":[{"name":"University of Stuttgart, Stuttgart, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2024,8,30]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"crossref","unstructured":"Ebrahim Akbari et al. 2015. Hierarchical cluster ensemble selection. Engineering Applications of Artificial Intelligence (2015).","DOI":"10.1016\/j.engappai.2014.12.005"},{"key":"e_1_2_1_2_1","first-page":"1","article-title":"MFE: Towards reproducible meta-feature extraction","volume":"21","author":"Edesio Alcoba\u00e7a","year":"2020","unstructured":"Edesio Alcoba\u00e7a et al. 2020. MFE: Towards reproducible meta-feature extraction. 
Journal of Machine Learning Research 21, 111 (2020), 1--5.","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_2_1_3_1","volume-title":"OPTICS: Ordering Points to Identify the Clustering Structure. In ACM SIGMOD.","author":"Mihael Ankerst","year":"1999","unstructured":"Mihael Ankerst et al. 1999. OPTICS: Ordering Points to Identify the Clustering Structure. In ACM SIGMOD."},{"key":"e_1_2_1_4_1","volume-title":"Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms. Society for Industrial and Applied Mathematics Philadelphia, 1027--1035","author":"Arthur D.","year":"2007","unstructured":"D. Arthur and Vassilvitskii. 2007. k-means++: The Advantages of Careful Seeding. In Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms. Society for Industrial and Applied Mathematics Philadelphia, 1027--1035."},{"key":"e_1_2_1_5_1","doi-asserted-by":"crossref","unstructured":"Hanan Ayad and Mohamed Kamel. 2003. Finding Natural Clusters Using Multi-Clusterer Combiner Based on Shared Nearest Neighbors. In MCS.","DOI":"10.1007\/3-540-44938-8_17"},{"key":"e_1_2_1_6_1","volume-title":"Kamel","author":"Ayad Hanan G.","year":"2008","unstructured":"Hanan G. Ayad and Mohamed S. Kamel. 2008. Cumulative Voting Consensus Method for Partitions with Variable Number of Clusters. IEEE TPAMI (2008)."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2009.11.012"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.5555\/1661445.1661603"},{"key":"e_1_2_1_9_1","volume-title":"Pattern Recognition and Machine Learning (Information Science and Statistics)","author":"Bishop Christopher M.","unstructured":"Christopher M. Bishop. 2006. Pattern Recognition and Machine Learning (Information Science and Statistics). Springer-Verlag, Berlin, Heidelberg."},{"key":"e_1_2_1_10_1","volume-title":"Cluster ensembles: A survey of approaches with recent extensions and applications. 
Computer Science Review","author":"Boongoen Tossapon","year":"2018","unstructured":"Tossapon Boongoen and Natthakan Iam-On. 2018. Cluster ensembles: A survey of approaches with recent extensions and applications. Computer Science Review (2018)."},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2004.03.009"},{"key":"e_1_2_1_12_1","volume-title":"Fayyad","author":"Bradley Paul S.","year":"1998","unstructured":"Paul S. Bradley and Usama M. Fayyad. 1998. Refining Initial Points for K-Means Clustering. In ICML."},{"key":"e_1_2_1_13_1","volume-title":"Metalearning: Applications to data mining. (1 ed.).","author":"Brazdil P.","year":"2008","unstructured":"P. Brazdil, C. Giraud-Carrier, C. Soares, and R. Vilalta. 2008. Metalearning: Applications to data mining. (1 ed.)."},{"key":"e_1_2_1_14_1","volume-title":"Random forests. Machine learning 45, 1","author":"Breiman Leo","year":"2001","unstructured":"Leo Breiman. 2001. Random forests. Machine learning 45, 1 (2001), 5--32."},{"key":"e_1_2_1_15_1","doi-asserted-by":"crossref","unstructured":"T. Cali\u00f1ski and J. Harabasz. 1974. A Dendrite Method For Cluster Analysis. Communications in Statistics 1 (1974).","DOI":"10.1080\/03610927408827101"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/34.1000236"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.1979.4766909"},{"key":"e_1_2_1_18_1","volume-title":"Weighted cluster ensembles. ACM TKDD","author":"Domeniconi Carlotta","year":"2009","unstructured":"Carlotta Domeniconi and Muna Al-Razgan. 2009. Weighted cluster ensembles. ACM TKDD (2009)."},{"key":"e_1_2_1_19_1","unstructured":"Dheeru Dua and Casey Graff. 2017. UCI Machine Learning Repository. http:\/\/archive.ics.uci.edu\/ml"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/BigData52589.2021.9671542"},{"key":"e_1_2_1_21_1","volume-title":"TPE-AutoClust: A Tree-based Pipline Ensemble Framework for Automated Clustering. 
In 2022 IEEE International Conference on Data Mining Workshops (ICDMW). 1144--1153","author":"ElShawi Radwa","year":"2022","unstructured":"Radwa ElShawi and Sherif Sakr. 2022. TPE-AutoClust: A Tree-based Pipline Ensemble Framework for Automated Clustering. In 2022 IEEE International Conference on Data Mining Workshops (ICDMW). 1144--1153."},{"key":"e_1_2_1_22_1","unstructured":"Martin Ester et al. 1996. A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. In ACM SIGKDD."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1002\/sam.10008"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ins.2014.12.044"},{"key":"e_1_2_1_25_1","unstructured":"Matthias Feurer et al. 2015. Efficient and robust automated machine learning. In Advances in neural information processing systems. 2962--2970."},{"key":"e_1_2_1_26_1","volume-title":"Jain","author":"Fred Ana L.N.","year":"2005","unstructured":"Ana L.N. Fred and Anil K. Jain. 2005. Combining multiple clusterings using evidence accumulation. IEEE PAMI (2005)."},{"key":"e_1_2_1_27_1","first-page":"4","article-title":"Efficient exploratory clustering analyses in large-scale exploration processes","volume":"31","author":"Manuel Fritz","year":"2021","unstructured":"Manuel Fritz et al. 2021. Efficient exploratory clustering analyses in large-scale exploration processes. The VLDB Journal 31, 4 (nov 2021), 711--732.","journal-title":"The VLDB Journal"},{"key":"e_1_2_1_28_1","doi-asserted-by":"crossref","unstructured":"Manuel Fritz Dennis Tschechlov and Holger Schwarz. 2020. Learning from Past Observations: Meta-Learning for Efficient Clustering Analyses. In Big Data Analytics and Knowledge Discovery.","DOI":"10.1007\/978-3-030-59065-9_28"},{"key":"e_1_2_1_29_1","volume-title":"LOG-Means: Efficiently Estimating the number of Clusters in Large Datasets. 
PVLDB","author":"Michael Behringer Holger Schwarz Manuel Fritz","year":"2020","unstructured":"Manuel Fritz Michael Behringer Holger Schwarz. 2020. LOG-Means: Efficiently Estimating the number of Clusters in Large Datasets. PVLDB (2020)."},{"key":"e_1_2_1_30_1","unstructured":"Junhao Gan and Yufei Tao. 2015. DBSCAN revisited: Mis-claim un-fixability and approximation. In ACM SIGMOD."},{"key":"e_1_2_1_31_1","doi-asserted-by":"crossref","unstructured":"Andrey Goder and Vladimir Filkov. 2008. Consensus Clustering Algorithms: Comparison and Refinement. In ALENEX.","DOI":"10.1137\/1.9781611972887.11"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.engappai.2021.104388"},{"key":"e_1_2_1_33_1","doi-asserted-by":"crossref","unstructured":"Ibai Gurrutxaga et al. 2010. SEP\/COP: An efficient method to find the best partition in hierarchical clustering based on a new cluster validity index. Pattern Recognition (2010).","DOI":"10.1016\/j.patcog.2010.04.021"},{"key":"e_1_2_1_34_1","volume-title":"On Clustering Validation Techniques. Journal of Intelligent Information Systems","author":"Halkidi Maria","year":"2001","unstructured":"Maria Halkidi, Yannis Batistakis, and Michalis Vazirgiannis. 2001. On Clustering Validation Techniques. Journal of Intelligent Information Systems (2001)."},{"key":"e_1_2_1_35_1","doi-asserted-by":"crossref","unstructured":"Francisco Herrera et al. 2016. Multilabel classification. Springer.","DOI":"10.1007\/978-3-319-41111-8"},{"key":"e_1_2_1_36_1","volume-title":"SCAR: Spectral Clustering Accelerated and Robustified. PVLDB","author":"Ellen Hohma","year":"2022","unstructured":"Ellen Hohma et al. 2022. SCAR: Spectral Clustering Accelerated and Robustified. PVLDB (2022)."},{"key":"e_1_2_1_37_1","volume-title":"Goldgof","author":"Hore Prodip","year":"2009","unstructured":"Prodip Hore, Lawrence O. Hall, and Dmitry B. Goldgof. 2009. A scalable framework for cluster ensembles. 
Pattern Recognition (2009)."},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCYB.2017.2702343"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-25566-3_40"},{"key":"e_1_2_1_40_1","volume-title":"Data clustering: 50 years beyond K-means. Pattern recognition letters 31, 8","author":"Jain Anil K","year":"2010","unstructured":"Anil K Jain. 2010. Data clustering: 50 years beyond K-means. Pattern recognition letters 31, 8 (2010), 651--666."},{"key":"e_1_2_1_41_1","volume-title":"Dubes","author":"Jain Anil K.","year":"1988","unstructured":"Anil K. Jain and Richard C. Dubes. 1988. Algorithms for clustering data. Prentice Hall. 1--320 pages."},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICIF.2006.301614"},{"key":"e_1_2_1_43_1","unstructured":"Marius Lindauer et al. 2022. SMAC3: A Versatile Bayesian Optimization Package for Hyperparameter Optimization. Journal of Machine Learning Research (2022)."},{"key":"e_1_2_1_44_1","volume-title":"Proceedings - IEEE International Conference on Data Mining, ICDM.","author":"Yanchi","unstructured":"Yanchi Liu et al. 2010. Understanding of internal clustering validation measures. In Proceedings - IEEE International Conference on Data Mining, ICDM."},{"key":"e_1_2_1_45_1","doi-asserted-by":"crossref","unstructured":"Yue Liu Shuang Li and Wenjie Tian. 2021. AutoCluster: Meta-learning Based Ensemble Method for Automated Unsupervised Clustering. In PAKDD.","DOI":"10.1007\/978-3-030-75768-7_20"},{"key":"e_1_2_1_47_1","volume-title":"Deep Clustering With Consensus Representations. 2022 IEEE International Conference on Data Mining (ICDM)","author":"Lukas","year":"2022","unstructured":"Lukas Miklautz et al. 2022. Deep Clustering With Consensus Representations. 
2022 IEEE International Conference on Data Mining (ICDM) (2022), 1119--1124."},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10618-012-0290-x"},{"key":"e_1_2_1_49_1","volume-title":"On spectral clustering: Analysis and an algorithm. Advances in neural information processing systems 14","author":"Ng Andrew","year":"2001","unstructured":"Andrew Ng, Michael Jordan, and Yair Weiss. 2001. On spectral clustering: Analysis and an algorithm. Advances in neural information processing systems 14 (2001)."},{"key":"e_1_2_1_50_1","volume-title":"Consensus Clusterings","author":"Nguyen Nam","unstructured":"Nam Nguyen and Rich Caruana. 2007. Consensus Clusterings. In IEEE ICDM."},{"key":"e_1_2_1_51_1","first-page":"2825","article-title":"Scikit-learn: Machine Learning in Python","volume":"12","author":"F Pedregosa","year":"2011","unstructured":"F Pedregosa et al. 2011. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 12 (2011), 2825--2830.","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_2_1_52_1","volume-title":"A Random Subspace Approach for the Missing Feature Problem. Pattern Recognition","author":"Polikar Robi","year":"2010","unstructured":"Robi Polikar and others. 2010. Learn++.MF: A Random Subspace Approach for the Missing Feature Problem. Pattern Recognition (2010)."},{"key":"e_1_2_1_53_1","volume-title":"AutoClust: A Framework for Automated Clustering Based on Cluster Validity Indices","author":"Poulakis Yannis","unstructured":"Yannis Poulakis, Christos Doulkeridis, and Dimosthenis Kyriazis. 2020. AutoClust: A Framework for Automated Clustering Based on Cluster Validity Indices. In IEEE ICDM."},{"key":"e_1_2_1_54_1","volume-title":"Towards reproducible empirical research in meta-learning. arXiv preprint arXiv:1808.10406","author":"Rivolli Adriano","year":"2018","unstructured":"Adriano Rivolli, Lu\u00eds P F Garcia, Carlos Soares, Joaquin Vanschoren, and Andr\u00e9 CPLF de Carvalho. 2018. 
Towards reproducible empirical research in meta-learning. arXiv preprint arXiv:1808.10406 (2018)."},{"key":"e_1_2_1_55_1","first-page":"1","article-title":"Adjusting for chance clustering comparison measures","volume":"17","author":"Romano Simone","year":"2016","unstructured":"Simone Romano, Nguyen Xuan Vinh, James Bailey, and Karin Verspoor. 2016. Adjusting for chance clustering comparison measures. Journal of Machine Learning Research 17, 134 (2016), 1--32.","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_2_1_56_1","unstructured":"Jasper Snoek Hugo Larochelle and Ryan P Adams. 2012. Practical bayesian optimization of machine learning algorithms. In NeurIPS."},{"key":"e_1_2_1_57_1","unstructured":"Alexander Strehl and Joydeep Ghosh. 2003. Cluster Ensembles - a Knowledge Reuse Framework for Combining Multiple Partitions. J. Mach. Learn. Res. (2003)."},{"key":"e_1_2_1_58_1","doi-asserted-by":"crossref","unstructured":"A. Topchy A.K. Jain and W. Punch. 2005. Clustering ensembles: models of consensus and weak partitions. IEEE TPAMI (2005).","DOI":"10.1109\/TPAMI.2005.237"},{"key":"e_1_2_1_59_1","series-title":"SIAM SDM.","volume-title":"A Mixture Model for Clustering Ensembles","author":"Topchy Alexander","unstructured":"Alexander Topchy, Anil K. Jain, and William Punch. 2004. A Mixture Model for Clustering Ensembles. In SIAM SDM."},{"key":"e_1_2_1_60_1","unstructured":"A.P. Topchy M.H.C. Law A.K. Jain and A.L. Fred. 2004. Analysis of Consensus Partition in Cluster Ensemble. In IEEE ICDM."},{"key":"e_1_2_1_61_1","volume-title":"Proc. ACM Manag. Data","author":"Dennis","year":"2023","unstructured":"Dennis Treder-Tschechlov et al. 2023. ML2DAC: Meta-Learning to Democratize AutoML for Clustering Analysis. Proc. ACM Manag. Data (2023)."},{"key":"e_1_2_1_62_1","unstructured":"Dennis Tschechlov Manuel Fritz and Holger Schwarz. 2021. AutoML4Clust: Efficient AutoML for Clustering Analyses. 
In EDBT."},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1145\/2641190.2641198"},{"key":"e_1_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.1142\/S0218001411008683"},{"key":"e_1_2_1_65_1","volume-title":"Information theoretic measures for clusterings comparison: Variants, properties, normalization and correction for chance. Journal of Machine Learning Research","author":"Vinh Nguyen Xuan","year":"2010","unstructured":"Nguyen Xuan Vinh, Julien Epps, and James Bailey. 2010. Information theoretic measures for clusterings comparison: Variants, properties, normalization and correction for chance. Journal of Machine Learning Research (2010)."},{"key":"e_1_2_1_66_1","unstructured":"Xindong Wu et al. 2008. Top 10 algorithms in data mining. Knowledge and information systems (2008)."},{"key":"e_1_2_1_67_1","volume-title":"Clustering ensemble selection for categorical data based on internal validity indices. Pattern Recognition","author":"Zhao Xingwang","year":"2017","unstructured":"Xingwang Zhao, Jiye Liang, and Chuangyin Dang. 2017. Clustering ensemble selection for categorical data based on internal validity indices. 
Pattern Recognition (2017)."}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3681954.3681970","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,9,4]],"date-time":"2024-09-04T18:27:40Z","timestamp":1725474460000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3681954.3681970"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,7]]},"references-count":66,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2024,7]]}},"alternative-id":["10.14778\/3681954.3681970"],"URL":"https:\/\/doi.org\/10.14778\/3681954.3681970","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2024,7]]},"assertion":[{"value":"2024-08-30","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}