{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,29]],"date-time":"2025-09-29T08:21:53Z","timestamp":1759134113697,"version":"3.41.0"},"reference-count":68,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2008,7,1]],"date-time":"2008-07-01T00:00:00Z","timestamp":1214870400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100000145","name":"Division of Information and Intelligent Systems","doi-asserted-by":"publisher","award":["IIS-0713142IIS-0325116IIS-0307792"],"award-info":[{"award-number":["IIS-0713142IIS-0325116IIS-0307792"]}],"id":[{"id":"10.13039\/100000145","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Knowl. Discov. Data"],"published-print":{"date-parts":[[2008,7]]},"abstract":"<jats:p>\n            In classical clustering, each data point is assigned to at least one cluster. However, in many applications only a small subset of the available data is relevant for the problem and the rest needs to be ignored in order to obtain good clusters. Certain nonparametric density-based clustering methods find the most relevant data as multiple dense regions, but such methods are generally limited to low-dimensional data and do not scale well to large, high-dimensional datasets. Also, they use a specific notion of \u201cdistance\u201d, typically Euclidean or Mahalanobis distance, which further limits their applicability. On the other hand, the recent One Class Information Bottleneck (OC-IB) method is fast and works on a large class of distortion measures known as Bregman Divergences, but can only find a\n            <jats:italic>single<\/jats:italic>\n            dense region. This article presents a broad framework for finding\n            <jats:italic>k<\/jats:italic>\n            dense clusters while ignoring the rest of the data. It includes a seeding algorithm that can automatically determine a suitable value for\n            <jats:italic>k<\/jats:italic>\n            . When\n            <jats:italic>k<\/jats:italic>\n            is forced to 1, our method gives rise to an improved version of OC-IB with optimality guarantees. We provide a generative model that yields the proposed iterative algorithm for finding\n            <jats:italic>k<\/jats:italic>\n            dense regions as a special case. Our analysis reveals an interesting and novel connection between the problem of finding dense regions and exponential mixture models; a hard model corresponding to\n            <jats:italic>k<\/jats:italic>\n            exponential mixtures with a uniform background results in a set of\n            <jats:italic>k<\/jats:italic>\n            dense clusters. The proposed method describes a highly scalable algorithm for finding multiple dense regions that works with any Bregman Divergence, thus extending density based clustering to a variety of non-Euclidean problems not addressable by earlier methods. We present empirical results on three artificial, two microarray and one text dataset to show the relevance and effectiveness of our methods.\n          <\/jats:p>","DOI":"10.1145\/1376815.1376817","type":"journal-article","created":{"date-parts":[[2008,7,22]],"date-time":"2008-07-22T13:04:05Z","timestamp":1216731845000},"page":"1-49","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":10,"title":["Bregman bubble clustering"],"prefix":"10.1145","volume":"2","author":[{"given":"Gunjan","family":"Gupta","sequence":"first","affiliation":[{"name":"University of Texas at Austin, Austin, TX"}]},{"given":"Joydeep","family":"Ghosh","sequence":"additional","affiliation":[{"name":"University of Texas at Austin, Austin, TX"}]}],"member":"320","published-online":{"date-parts":[[2008,7,24]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/304182.304187"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1177\/002224378101800305"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/1081870.1081932"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/1014052.1014112"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.5555\/1046920.1194902"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/974614.974637"},{"volume-title":"Proceedings of the 22nd International Conference on Very Large Databases. Morgan-Kaufmann","author":"Berchtold S.","key":"e_1_2_1_7_1"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASSP.1980.1163445"},{"key":"e_1_2_1_9_1","unstructured":"Casella G. Robert C. P. and Wells M. T. 2003. Mixture models latent variables and partitioned importance sampling. In Technical Report RePEc:fth:inseep:2000-03 Institut National de la Statistique et des Etudes Economiques. http:\/\/ideas.repec.org\/p\/fth\/inseep\/2000-03.html.  Casella G. Robert C. P. and Wells M. T. 2003. Mixture models latent variables and partitioned importance sampling. In Technical Report RePEc:fth:inseep:2000-03 Institut National de la Statistique et des Etudes Economiques. http:\/\/ideas.repec.org\/p\/fth\/inseep\/2000-03.html."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/72.536318"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/34.400568"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/1015330.1015399"},{"volume-title":"Proceedings of COLT-03","author":"Crammer K.","key":"e_1_2_1_13_1"},{"volume-title":"Proceedings of the Workshop for Data Mining in Marketing (DMM 2007)","author":"Deodhar M.","key":"e_1_2_1_14_1"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1186\/gb-2002-3-12-research0069"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.5555\/944919.944973"},{"volume-title":"Proceedings of 2nd SIAM International Conference on Data Mining (Workshop on Clustering High-Dimensional Data and Its Applications). SIAM","author":"Dhillon I. S.","key":"e_1_2_1_17_1"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1007612920971"},{"volume-title":"Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'96)","author":"Ester M.","key":"e_1_2_1_19_1"},{"key":"e_1_2_1_20_1","first-page":"112","article-title":"Genomic expression programs in the response of yeast cells to environmental changes","volume":"11","author":"Gasch A. P.","year":"2000","journal-title":"Mol. Bio. Cell."},{"volume-title":"ICCV '03: Proceedings of the 9th IEEE International Conference on Computer Vision","author":"Georgescu B.","key":"e_1_2_1_21_1"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1093\/nar\/gkg078"},{"volume-title":"Proceedings of the 15th International Conference on Data Engineering","author":"Guha S.","key":"e_1_2_1_23_1"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/1102351.1102386"},{"key":"e_1_2_1_26_1","unstructured":"Gupta G. and Ghosh J. 2006a. Bregman Bubble Clustering: A robust framework for mining dense clusters. In Tech Report Dept. of Elec. &amp; Comp. Engineering University of Texas at Austin. IDEAL-TR04 http:\/\/www.lans.ece.utexas.edu\/techreps.html.  Gupta G. and Ghosh J. 2006a. Bregman Bubble Clustering: A robust framework for mining dense clusters. In Tech Report Dept. of Elec. &amp; Comp. Engineering University of Texas at Austin. IDEAL-TR04 http:\/\/www.lans.ece.utexas.edu\/techreps.html."},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2006.32"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCBB.2008.32"},{"volume-title":"Proceedings of the 1st International SIAM Conference on Data Mining (SDM'01)","author":"Gupta G. K.","key":"e_1_2_1_30_1"},{"volume-title":"Value Balanced Agglomerative Connectivity Clustering. In SPIE Proceedings of the Conference on Data Mining and Knowledge Discovery III","author":"Gupta G. K.","key":"e_1_2_1_31_1"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1186\/gb-2000-1-2-research0003"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1137\/0916028"},{"key":"e_1_2_1_34_1","doi-asserted-by":"crossref","unstructured":"Hubert L. and Arabie P. 1985. Comparing partitions. J. Class. 193--218.  Hubert L. and Arabie P. 1985. Comparing partitions. J. Class. 193--218.","DOI":"10.1007\/BF01908075"},{"key":"e_1_2_1_35_1","unstructured":"Jain A. K. and Dubes R. C. 1988. Algorithms for Clustering Data. Prentice Hall Englewood Cliffs NJ.   Jain A. K. and Dubes R. C. 1988. Algorithms for Clustering Data. Prentice Hall Englewood Cliffs NJ."},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/1014052.1014101"},{"volume-title":"DHC: A density-based hierarchical clustering method for time series gene expression data. In Proceedings of BIBE'03 (Washington, DC)","year":"2003","author":"Jiang D.","key":"e_1_2_1_37_1"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF02289588"},{"key":"e_1_2_1_39_1","doi-asserted-by":"crossref","unstructured":"Judd A. and Hovland M. 2007. Seabed Fluid Flow: The Impact of Geology Biology and the Marine Environment. Cambridge University Press Cambridge UK.  Judd A. and Hovland M. 2007. Seabed Fluid Flow: The Impact of Geology Biology and the Marine Environment. Cambridge University Press Cambridge UK.","DOI":"10.1017\/CBO9780511535918"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/266021.266273"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1137\/S1064827595287997"},{"volume-title":"Proceedings of the 13th Conference on Uncertainty in Artificial Intelligence. AAAI","author":"Kearns M.","key":"e_1_2_1_42_1"},{"volume":"26","volume-title":"BTW. LNI","author":"Kriegel H.-P.","key":"e_1_2_1_43_1"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1016\/B978-1-55860-377-6.50048-7"},{"issue":"1","key":"e_1_2_1_45_1","first-page":"61","article-title":"Plaid models for gene expression data","volume":"12","author":"Lazzeroni L.","year":"2002","journal-title":"Stat. Sin."},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1126\/science.1099511"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCOM.1980.1094577"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/MIC.2003.1167344"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/1273496.1273568"},{"volume-title":"Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability. 281--297","year":"1967","author":"MacQueen J.","key":"e_1_2_1_50_1"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1074\/jbc.M400589200"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1093\/nar\/28.22.4523"},{"key":"e_1_2_1_53_1","doi-asserted-by":"crossref","unstructured":"Neal R. and Hinton G. 1998. A view of the EM algorithm that justifies incremental sparse and other variants. In Learning in Graphical Models M. I. Jordan Ed. Kluwer.   Neal R. and Hinton G. 1998. A view of the EM algorithm that justifies incremental sparse and other variants. In Learning in Graphical Models M. I. Jordan Ed. Kluwer.","DOI":"10.1007\/978-94-011-5014-9_12"},{"volume-title":"Tech. Rep. CMU-CS-01-109, School of Computer Science","year":"2001","author":"Pietra S. D.","key":"e_1_2_1_54_1"},{"key":"e_1_2_1_55_1","doi-asserted-by":"crossref","unstructured":"Schmid C. Sengstag T. Bucher P. and Delorenzi M. 2007. MADAP a flexible clustering tool for the interpretation of one-dimensional genome annotation data. Nucleic Acids Res W201--W205.  Schmid C. Sengstag T. Bucher P. and Delorenzi M. 2007. MADAP a flexible clustering tool for the interpretation of one-dimensional genome annotation data. Nucleic Acids Res W201--W205.","DOI":"10.1093\/nar\/gkm343"},{"volume-title":"Proceedings of KDD. AAAI Press","author":"Sch\u00f6lkopf B.","key":"e_1_2_1_56_1"},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1162\/089976601750264965"},{"volume-title":"Proceedings of 8th Pacific Symposium on Biocomputing (PSB), Kaua'i.","author":"Segal E.","key":"e_1_2_1_58_1"},{"volume-title":"Proceedings of 8th ISMB, 307--316","author":"Sharan R.","key":"e_1_2_1_59_1"},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.0507432102"},{"volume":"4057","volume-title":"Proceedings of the SPIE Conference on Data Mining and Knowledge Discovery: Theory, Tools, and Technology II","author":"Strehl A.","key":"e_1_2_1_61_1"},{"key":"e_1_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.1287\/ijoc.15.2.208.14448"},{"volume-title":"Proceedings of the AAAI Workshop on AI for Web Search (AAAI","year":"2000","author":"Strehl A.","key":"e_1_2_1_63_1"},{"volume-title":"Proceedings of the ESANN-99","author":"Tax D.","key":"e_1_2_1_64_1"},{"key":"e_1_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.1126\/science.290.5500.2319"},{"key":"e_1_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0893-6080(97)00133-0"},{"key":"e_1_2_1_67_1","doi-asserted-by":"publisher","DOI":"10.1177\/002224379102800401"},{"volume-title":"Proceedings of the Colloquium in Numerical Taxonomy","year":"1968","author":"Wishart D.","key":"e_1_2_1_68_1"},{"volume-title":"Proceedings of the Computer Software and Applications Conference. 505--510","author":"Yun C.-H.","key":"e_1_2_1_69_1"},{"key":"e_1_2_1_70_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.neunet.2005.06.008"}],"container-title":["ACM Transactions on Knowledge Discovery from Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1376815.1376817","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1376815.1376817","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T13:57:55Z","timestamp":1750255075000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1376815.1376817"}},"subtitle":["A robust framework for mining dense clusters"],"short-title":[],"issued":{"date-parts":[[2008,7]]},"references-count":68,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2008,7]]}},"alternative-id":["10.1145\/1376815.1376817"],"URL":"https:\/\/doi.org\/10.1145\/1376815.1376817","relation":{},"ISSN":["1556-4681","1556-472X"],"issn-type":[{"type":"print","value":"1556-4681"},{"type":"electronic","value":"1556-472X"}],"subject":[],"published":{"date-parts":[[2008,7]]},"assertion":[{"value":"2007-05-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2008-03-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2008-07-24","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}