{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,13]],"date-time":"2026-06-13T08:44:44Z","timestamp":1781340284153,"version":"3.54.1"},"reference-count":27,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2007,3,1]],"date-time":"2007-03-01T00:00:00Z","timestamp":1172707200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Knowl. Discov. Data"],"published-print":{"date-parts":[[2007,3]]},"abstract":"<jats:p>\n            We consider the following problem: given a set of clusterings, find a single clustering that agrees as much as possible with the input clusterings. This problem,\n            <jats:italic>clustering aggregation<\/jats:italic>\n            , appears naturally in various contexts. For example, clustering categorical data is an instance of the clustering aggregation problem; each categorical attribute can be viewed as a clustering of the input rows where rows are grouped together if they take the same value on that attribute. Clustering aggregation can also be used as a metaclustering method to improve the robustness of clustering by combining the output of multiple algorithms. Furthermore, the problem formulation does not require a priori information about the number of clusters; it is naturally determined by the optimization function.\n          <\/jats:p>\n          <jats:p>\n            In this article, we give a formal statement of the clustering aggregation problem, and we propose a number of algorithms. Our algorithms make use of the connection between clustering aggregation and the problem of\n            <jats:italic>correlation clustering<\/jats:italic>\n            . Although the problems we consider are NP-hard, for several of our methods, we provide theoretical guarantees on the quality of the solutions. Our work provides the best deterministic approximation algorithm for the variation of the correlation clustering problem we consider. We also show how sampling can be used to scale the algorithms for large datasets. We give an extensive empirical evaluation demonstrating the usefulness of the problem and of the solutions.\n          <\/jats:p>","DOI":"10.1145\/1217299.1217303","type":"journal-article","created":{"date-parts":[[2007,6,8]],"date-time":"2007-06-08T15:00:08Z","timestamp":1181314808000},"page":"4","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":596,"title":["Clustering aggregation"],"prefix":"10.1145","volume":"1","author":[{"given":"Aristides","family":"Gionis","sequence":"first","affiliation":[{"name":"Yahoo! Research Labs, Barcelona, Spain"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Heikki","family":"Mannila","sequence":"additional","affiliation":[{"name":"University of Helsinki and Helsinki University of Technology"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Panayiotis","family":"Tsaparas","sequence":"additional","affiliation":[{"name":"Microsoft Search Labs"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2007,3]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/1060590.1060692"},{"key":"e_1_2_1_2_1","volume-title":"Proceedings of the International Conference on Extending Database Technology (EDBT). 123--146","author":"Andritsos P.","unstructured":"Andritsos , P. , Tsaparas , P. , Miller , R. J. , and Sevcik , K. C . 2004. LIMBO: Scalable clustering of categorical data . In Proceedings of the International Conference on Extending Database Technology (EDBT). 123--146 . Andritsos, P., Tsaparas, P., Miller, R. J., and Sevcik, K. C. 2004. LIMBO: Scalable clustering of categorical data. In Proceedings of the International Conference on Extending Database Technology (EDBT). 123--146."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1023\/B:MACH.0000033116.57574.95"},{"key":"e_1_2_1_4_1","doi-asserted-by":"crossref","unstructured":"Barthelemy J.-P. and Leclerc B. 1995. The median procedure for partitions. DIMACS Series in Discrete Mathematics 3--34.  Barthelemy J.-P. and Leclerc B. 1995. The median procedure for partitions. DIMACS Series in Discrete Mathematics 3--34.","DOI":"10.1090\/dimacs\/019\/01"},{"key":"e_1_2_1_5_1","unstructured":"Blake C. L. and Merz C. J. 1998. UCI repository of machine learning databases.  Blake C. L. and Merz C. J. 1998. UCI repository of machine learning databases."},{"key":"e_1_2_1_6_1","volume-title":"Proceedings of the European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD). 63--74","author":"Boulis C.","unstructured":"Boulis , C. and Ostendorf , M . 2004. Combining multiple clustering systems . In Proceedings of the European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD). 63--74 . Boulis, C. and Ostendorf, M. 2004. Combining multiple clustering systems. In Proceedings of the European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD). 63--74."},{"key":"e_1_2_1_7_1","volume-title":"Proceedings of the IEEE Symposium on Foundations of Computer Science (FOCS). 524--533","author":"Charikar M.","unstructured":"Charikar , M. , Guruswami , V. , and Wirth , A . 2003. Clustering with qualitative information . In Proceedings of the IEEE Symposium on Foundations of Computer Science (FOCS). 524--533 . Charikar, M., Guruswami, V., and Wirth, A. 2003. Clustering with qualitative information. In Proceedings of the IEEE Symposium on Foundations of Computer Science (FOCS). 524--533."},{"key":"e_1_2_1_8_1","unstructured":"Cristofor D. and Simovici D. A. 2001. An information-theoretical approach to genetic algorithms for clustering. Tech. rep. TR-01-02 UMass Boston MA.  Cristofor D. and Simovici D. A. 2001. An information-theoretical approach to genetic algorithms for clustering. Tech. rep. TR-01-02 UMass Boston MA."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.tcs.2006.05.008"},{"key":"e_1_2_1_10_1","doi-asserted-by":"crossref","unstructured":"Deza M. and Laurent M. 1997. Geometry of Cuts and Metrics. Springer-Verlag.  Deza M. and Laurent M. 1997. Geometry of Cuts and Metrics. Springer-Verlag.","DOI":"10.1007\/978-3-642-04295-9"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/371920.372165"},{"key":"e_1_2_1_12_1","volume-title":"Proceedings of the ACM-SIAM Symposium on Discrete Algorithms (SODA). 28--36","author":"Fagin R.","unstructured":"Fagin , R. , Kumar , R. , and Sivakumar , D . 2003. Comparing top k lists . In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms (SODA). 28--36 . Fagin, R., Kumar, R., and Sivakumar, D. 2003. Comparing top k lists. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms (SODA). 28--36."},{"key":"e_1_2_1_13_1","volume-title":"Proceedings of the International Conference on Machine Learning (ICML). 186--193","author":"Fern X. Z.","unstructured":"Fern , X. Z. and Brodley , C. E . 2003. Random projection for high dimensional data clustering: A cluster ensemble approach . In Proceedings of the International Conference on Machine Learning (ICML). 186--193 . Fern, X. Z. and Brodley, C. E. 2003. Random projection for high dimensional data clustering: A cluster ensemble approach. In Proceedings of the International Conference on Machine Learning (ICML). 186--193."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1142\/S0218213004001867"},{"key":"e_1_2_1_15_1","volume-title":"Proceedings of the International Conference on Pattern Recognition (ICPR). 276--280","author":"Fred A.","unstructured":"Fred , A. and Jain , A. K . 2002. Data clustering using evidence accumulation . In Proceedings of the International Conference on Pattern Recognition (ICPR). 276--280 . Fred, A. and Jain, A. K. 2002. Data clustering using evidence accumulation. In Proceedings of the International Conference on Pattern Recognition (ICPR). 276--280."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0306-4379(00)00022-3"},{"key":"e_1_2_1_17_1","unstructured":"Hamerly G. and Elkan C. 2003. Learning the k in k-means. In Advances in Neural Information Processing Systems (NIPS).  Hamerly G. and Elkan C. 2003. Learning the k in k-means. In Advances in Neural Information Processing Systems (NIPS)."},{"key":"e_1_2_1_18_1","volume-title":"Data Mining: Concepts and Techniques. Morgan Kaufmann.","author":"Han J.","year":"2001","unstructured":"Han , J. and Kamber , M . 2001 . Data Mining: Concepts and Techniques. Morgan Kaufmann. Han, J. and Kamber, M. 2001. Data Mining: Concepts and Techniques. Morgan Kaufmann."},{"key":"e_1_2_1_19_1","unstructured":"Hand D. Mannila H. and Smyth P. 2001. Principles of Data Mining. The MIT Press Cambridge MA.   Hand D. Mannila H. and Smyth P. 2001. Principles of Data Mining. The MIT Press Cambridge MA."},{"key":"e_1_2_1_20_1","doi-asserted-by":"crossref","unstructured":"Hochbaum D. and Shmoys D. 1985. A best possible heuristic for the k-center problem. Mathem. Operat. Resea. 180--184.  Hochbaum D. and Shmoys D. 1985. A best possible heuristic for the k-center problem. Mathem. Operat. Resea. 180--184.","DOI":"10.1287\/moor.10.2.180"},{"key":"e_1_2_1_21_1","unstructured":"Jain A. K. and Dubes R. C. 1988. Algorithms for Clustering Data. Prentice-Hall.   Jain A. K. and Dubes R. C. 1988. Algorithms for Clustering Data. Prentice-Hall."},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/1150402.1150442"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1214\/aos\/1176344136"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1008940618127"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1162\/153244303321897735"},{"key":"e_1_2_1_26_1","volume-title":"Proceedings of the ACM-SIAM Symposium on Discrete Algorithms (SODA). 526--527","author":"Swamy C.","year":"2004","unstructured":"Swamy , C. 2004 . Correlation clustering: maximizing agreements via semidefinite programming . In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms (SODA). 526--527 . Swamy, C. 2004. Correlation clustering: maximizing agreements via semidefinite programming. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms (SODA). 526--527."},{"key":"e_1_2_1_27_1","volume-title":"Proceedings of the SIAM International Conference on Data Mining (SDM). 379--390","author":"Topchy A.","unstructured":"Topchy , A. , Jain , A. K. , and Punch , W . 2004. A mixture model of clustering ensembles . In Proceedings of the SIAM International Conference on Data Mining (SDM). 379--390 . Topchy, A., Jain, A. K., and Punch, W. 2004. A mixture model of clustering ensembles. In Proceedings of the SIAM International Conference on Data Mining (SDM). 379--390."}],"container-title":["ACM Transactions on Knowledge Discovery from Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1217299.1217303","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1217299.1217303","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T22:29:33Z","timestamp":1750285773000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1217299.1217303"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2007,3]]},"references-count":27,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2007,3]]}},"alternative-id":["10.1145\/1217299.1217303"],"URL":"https:\/\/doi.org\/10.1145\/1217299.1217303","relation":{},"ISSN":["1556-4681","1556-472X"],"issn-type":[{"value":"1556-4681","type":"print"},{"value":"1556-472X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2007,3]]},"assertion":[{"value":"2007-03-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}