{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:36:27Z","timestamp":1750307787629,"version":"3.41.0"},"reference-count":27,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2008,1,1]],"date-time":"2008-01-01T00:00:00Z","timestamp":1199145600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100000143","name":"Division of Computing and Communication Foundations","doi-asserted-by":"publisher","award":["CCF-0325459","IIS-0347408","IIS-0612170"],"award-info":[{"award-number":["CCF-0325459","IIS-0347408","IIS-0612170"]}],"id":[{"id":"10.13039\/100000143","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000145","name":"Division of Information and Intelligent Systems","doi-asserted-by":"publisher","award":["CCF-0325459","IIS-0347408","IIS-0612170"],"award-info":[{"award-number":["CCF-0325459","IIS-0347408","IIS-0612170"]}],"id":[{"id":"10.13039\/100000145","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Knowl. Discov. Data"],"published-print":{"date-parts":[[2008,1]]},"abstract":"<jats:p>\n            Using a mixture of random variables to model data is a tried-and-tested method common in data mining, machine learning, and statistics. By using mixture modeling it is often possible to accurately model even complex, multimodal data via very simple components. However, the classical mixture model assumes that a data point is generated by a single component in the model. A lot of datasets can be modeled closer to the underlying reality if we drop this restriction. 
We propose a probabilistic framework, the\n            <jats:italic>mixture-of-subsets (MOS) model<\/jats:italic>\n            , by making two fundamental changes to the classical mixture model. First, we allow a data point to be generated by a set of components, rather than just a single component. Next, we limit the number of data attributes that each component can influence. We also propose an EM framework to learn the MOS model from a dataset, and experimentally evaluate it on real, high-dimensional datasets. Our results show that the MOS model learned from the data represents the underlying nature of the data accurately.\n          <\/jats:p>","DOI":"10.1145\/1324172.1324175","type":"journal-article","created":{"date-parts":[[2008,2,8]],"date-time":"2008-02-08T15:32:16Z","timestamp":1202484736000},"page":"1-42","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":4,"title":["Learning correlations using the mixture-of-subsets model"],"prefix":"10.1145","volume":"1","author":[{"given":"Manas","family":"Somaiya","sequence":"first","affiliation":[{"name":"University of Florida, Gainesville, FL"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Christopher","family":"Jermaine","sequence":"additional","affiliation":[{"name":"University of Florida, Gainesville, FL"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sanjay","family":"Ranka","sequence":"additional","affiliation":[{"name":"University of Florida, Gainesville, FL"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2008,2,2]]},"reference":[{"doi-asserted-by":"publisher","key":"e_1_2_1_1_1","DOI":"10.1145\/304182.304188"},{"doi-asserted-by":"publisher","key":"e_1_2_1_2_1","DOI":"10.1145\/342009.335383"},{"doi-asserted-by":"publisher","key":"e_1_2_1_3_1","DOI":"10.1145\/276304.276314"},{"volume-title":"Proceedings of the 20th International Conference on Very Large Databases 
(VLDB). Morgan Kaufmann","author":"Agrawal R.","key":"e_1_2_1_4_1"},{"key":"e_1_2_1_5_1","volume-title":"Lecture Notes in Math","volume":"1117","author":"Aldous D. J.","year":"1985"},{"doi-asserted-by":"publisher","key":"e_1_2_1_6_1","DOI":"10.1016\/0893-6080(95)00003-8"},{"doi-asserted-by":"publisher","key":"e_1_2_1_7_1","DOI":"10.1145\/1014052.1014111"},{"doi-asserted-by":"publisher","key":"e_1_2_1_9_1","DOI":"10.1145\/347090.347119"},{"doi-asserted-by":"publisher","key":"e_1_2_1_10_1","DOI":"10.1145\/502512.502523"},{"doi-asserted-by":"publisher","key":"e_1_2_1_11_1","DOI":"10.1145\/508791.508886"},{"doi-asserted-by":"publisher","key":"e_1_2_1_12_1","DOI":"10.1145\/312129.312199"},{"doi-asserted-by":"crossref","unstructured":"Dempster A. P. Laird N. M. and Rubin D. B. 1977. Maximum likelihood from incomplete data via the EM algorithm. J. Royal Statist. Soc. B-39 1--39.","key":"e_1_2_1_13_1","DOI":"10.1111\/j.2517-6161.1977.tb01600.x"},{"volume-title":"Proceedings of the 3rd IEEE International Conference on Data Mining. IEEE Computer Society","author":"Dhillon I. S.","key":"e_1_2_1_14_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_15_1","DOI":"10.1145\/956750.956764"},{"doi-asserted-by":"publisher","key":"e_1_2_1_16_1","DOI":"10.1111\/j.1467-9868.2004.02059.x"},{"doi-asserted-by":"publisher","key":"e_1_2_1_17_1","DOI":"10.1145\/1081870.1081879"},{"doi-asserted-by":"publisher","key":"e_1_2_1_18_1","DOI":"10.1109\/TSP.2006.870586"},{"unstructured":"Griffiths T. and Ghahramani Z. 2006. Infinite latent feature models and the Indian buffet process. In Advances in Neural Information Processing Systems 18 Y. 
Weiss et al. eds. MIT Press Cambridge MA 475--482.","key":"e_1_2_1_19_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_20_1","DOI":"10.1145\/354756.354775"},{"volume-title":"Mixture Models: Inference and Applications to Clustering","year":"1988","author":"McLachlan G. J.","key":"e_1_2_1_21_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_22_1","DOI":"10.1093\/bioinformatics\/18.3.413"},{"doi-asserted-by":"crossref","unstructured":"McLachlan G. J. and Peel D. 2000. Finite Mixture Models. Wiley New York.","key":"e_1_2_1_23_1","DOI":"10.1002\/0471721182"},{"volume-title":"Mafia: Efficient and scalable subspace clustering for very large datasets. Tech. Rep. CPDC-TR-9906-010","year":"1999","author":"Nagesh H.","key":"e_1_2_1_24_1"},{"unstructured":"Pitman J. 2002. Combinatorial stochastic processes. Notes for Saint Flour Summer School.","key":"e_1_2_1_25_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_26_1","DOI":"10.1145\/564691.564739"},{"doi-asserted-by":"publisher","key":"e_1_2_1_27_1","DOI":"10.1016\/j.infsof.2003.07.003"},{"volume-title":"Proceedings of the 18th International Conference on Data Engineering. 
IEEE Computer Society","author":"Yang J.","key":"e_1_2_1_28_1"}],"container-title":["ACM Transactions on Knowledge Discovery from Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1324172.1324175","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1324172.1324175","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T13:56:15Z","timestamp":1750254975000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1324172.1324175"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2008,1]]},"references-count":27,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2008,1]]}},"alternative-id":["10.1145\/1324172.1324175"],"URL":"https:\/\/doi.org\/10.1145\/1324172.1324175","relation":{},"ISSN":["1556-4681","1556-472X"],"issn-type":[{"type":"print","value":"1556-4681"},{"type":"electronic","value":"1556-472X"}],"subject":[],"published":{"date-parts":[[2008,1]]},"assertion":[{"value":"2007-01-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2007-11-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2008-02-02","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}