{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,3,11]],"date-time":"2025-03-11T09:10:24Z","timestamp":1741684224962,"version":"3.38.0"},"reference-count":53,"publisher":"SAGE Publications","issue":"6","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["IDA"],"published-print":{"date-parts":[[2020,12,18]]},"abstract":"<jats:p>Subgroup Discovery is a supervised, exploratory data mining paradigm that aims to identify subsets of a dataset that show interesting behaviour with respect to some designated target attribute. The way in which such distributional differences are quantified varies with the target attribute type. This work concerns continuous targets, which are important in many practical applications. For such targets, differences are often quantified using z-score and similar measures that compare simple statistics such as the mean and variance of the subset and the data. However, most distributions are not fully determined by their mean and variance alone. As a result, measures of distributional difference solely based on such simple statistics will miss potentially interesting subgroups. This work proposes methods to recognise distributional differences in a much broader sense. To this end, density estimation is performed using histogram and kernel density estimation techniques. In the spirit of Exceptional Model Mining, the proposed methods are extended to deal with multiple continuous target attributes, such that comparisons are not restricted to univariate distributions, but are available for joint distributions of any dimensionality. The methods can be incorporated easily into existing Subgroup Discovery frameworks, so no new frameworks are developed.<\/jats:p>","DOI":"10.3233\/ida-194719","type":"journal-article","created":{"date-parts":[[2020,12,22]],"date-time":"2020-12-22T19:58:41Z","timestamp":1608667121000},"page":"1403-1439","source":"Crossref","is-referenced-by-count":2,"title":["Uni- and multivariate probability density models for numeric subgroup discovery"],"prefix":"10.1177","volume":"24","author":[{"given":"Marvin","family":"Meeng","sequence":"first","affiliation":[{"name":"LIACS, Leiden University, the Netherlands"}]},{"given":"Harm","family":"de Vries","sequence":"additional","affiliation":[{"name":"Universit\u00e9 de Montr\u00e9al, Canada"}]},{"given":"Peter","family":"Flach","sequence":"additional","affiliation":[{"name":"University of Bristol, United Kingdom"}]},{"given":"Siegfried","family":"Nijssen","sequence":"additional","affiliation":[{"name":"Universit\u00e9 Catholique de Louvain, Belgium"}]},{"given":"Arno","family":"Knobbe","sequence":"additional","affiliation":[{"name":"LIACS, Leiden University, the Netherlands"}]}],"member":"179","reference":[{"issue":"1","key":"10.3233\/IDA-194719_ref1","first-page":"35","article-title":"Subgroup discovery","volume":"5","author":"Atzm\u00fcller","year":"2015","journal-title":"Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery"},{"key":"10.3233\/IDA-194719_ref2","doi-asserted-by":"crossref","unstructured":"P.A. Flach, Machine Learning: The Art and Science of Algorithms that Make Sense of Data, Cambridge University Press, 2012.","DOI":"10.1017\/CBO9780511973000"},{"key":"10.3233\/IDA-194719_ref3","first-page":"249","article-title":"EXPLORA: A multipattern and multistrategy discovery assistant","author":"Kl\u00f6sgen","year":"1996","journal-title":"Advances in Knowledge Discovery and Data Mining"},{"key":"10.3233\/IDA-194719_ref4","first-page":"78","article-title":"An algorithm for multi-relational discovery of subgroups","author":"Wrobel","year":"1997","journal-title":"PKDD 1997, Principles of Data Mining and Knowledge Discovery, European Symposium, Trondheim, Norway, 24\u201327 June, 1997, Proceedings"},{"key":"10.3233\/IDA-194719_ref5","first-page":"35","article-title":"Fast subgroup discovery for continuous target concepts","author":"Atzm\u00fcller","year":"2009","journal-title":"ISMIS 2009, International Symposium on Methodologies for Intelligent Systems, Prague, Czech Republic, 14\u201317 September, 2009, Proceedings"},{"issue":"3","key":"10.3233\/IDA-194719_ref6","doi-asserted-by":"crossref","first-page":"711","DOI":"10.1007\/s10618-015-0436-8","article-title":"Fast exhaustive subgroup discovery with numerical target concepts","volume":"30","author":"Lemmerich","year":"2016","journal-title":"Data Mining and Knowledge Discovery"},{"key":"10.3233\/IDA-194719_ref7","unstructured":"B.F.I. Pieters, A. Knobbe and S. D\u017eeroski, Subgroup discovery in ranked data, with an application to gene set enrichment, in PL-10, Preference Learning Workshop at ECML PKDD 2010, 2010."},{"key":"10.3233\/IDA-194719_ref8","first-page":"1","article-title":"Exceptional model mining","author":"Leman","year":"2008","journal-title":"ECML PKDD 2008, European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, Antwerp, Belgium, 15\u201319 September, 2008, Proceedings, Part II"},{"issue":"3","key":"10.3233\/IDA-194719_ref9","doi-asserted-by":"crossref","first-page":"1065","DOI":"10.1214\/aoms\/1177704472","article-title":"On estimation of a probability density function and mode","volume":"33","author":"Parzen","year":"1962","journal-title":"The Annals of Mathematical Statistics"},{"issue":"3","key":"10.3233\/IDA-194719_ref10","doi-asserted-by":"crossref","first-page":"832","DOI":"10.1214\/aoms\/1177728190","article-title":"Remarks on some nonparametric estimates of a density function","volume":"27","author":"Rosenblatt","year":"1956","journal-title":"The Annals of Mathematical Statistics"},{"key":"10.3233\/IDA-194719_ref11","unstructured":"M.P. Wand and M.C. Jones, Kernel Smoothing. No.\u00a060 in Monographs on Statistics and Applied Probability, Boca Raton, FL, USA: Chapman & Hall\/CRC, 1994."},{"key":"10.3233\/IDA-194719_ref12","first-page":"247","article-title":"Distribution rules with numeric attributes of interest","author":"Jorge","year":"2006","journal-title":"PKDD 2006, European Conference on Principles and Practice of Knowledge Discovery in Databases, Berlin, Germany, 18\u201322 September, 2006, Proceedings"},{"issue":"1","key":"10.3233\/IDA-194719_ref13","doi-asserted-by":"crossref","first-page":"155","DOI":"10.1093\/mnras\/225.1.155","article-title":"A multidimensional version of the Kolmogorov-Smirnov test","volume":"225","author":"Fasano","year":"1987","journal-title":"Monthly Notices of the Royal Astronomical Society"},{"issue":"3","key":"10.3233\/IDA-194719_ref14","doi-asserted-by":"crossref","first-page":"615","DOI":"10.1093\/mnras\/202.3.615","article-title":"Two-dimensional goodness-of-fit testing in astronomy","volume":"202","author":"Peacock","year":"1983","journal-title":"Monthly Notices of the Royal Astronomical Society"},{"key":"10.3233\/IDA-194719_ref15","unstructured":"H. Grosskreutz, Cascaded subgroups discovery with an application to regression, in: LeGo-08, From Local Patterns to Global Models Workshop at ECML PKDD 2008, 2008."},{"issue":"5","key":"10.3233\/IDA-194719_ref16","doi-asserted-by":"crossref","first-page":"1391","DOI":"10.1007\/s10618-017-0520-3","article-title":"Identifying consistent statements about numerical data with dispersion-corrected subgroup discovery","volume":"31","author":"Boley","year":"2017","journal-title":"Data Mining and Knowledge Discovery"},{"key":"10.3233\/IDA-194719_ref17","first-page":"288","article-title":"Difference-based estimates for generalization-aware subgroup discovery","author":"Lemmerich","year":"2013","journal-title":"ECML PKDD 2013, European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, Prague, Czech Republic, 23\u201327 September, 2013, Proceedings, Part III"},{"key":"10.3233\/IDA-194719_ref18","first-page":"194","article-title":"Supervised and unsupervised discretization of continuous features","author":"Dougherty","year":"1995","journal-title":"ICML 1995, International Conference on Machine Learning, Tahoe City, California, USA, 9\u201312 July, 1995, Proceedings"},{"key":"10.3233\/IDA-194719_ref19","doi-asserted-by":"crossref","unstructured":"M. Atzm\u00fcller and F. Lemmerich, VIKAMINE \u2013 Open-Source subgroup discovery, pattern mining, and analytics, in Flach et al. , pp.\u00a0842\u2013845.","DOI":"10.1007\/978-3-642-33486-3_60"},{"key":"10.3233\/IDA-194719_ref20","doi-asserted-by":"crossref","unstructured":"M. Meeng and A. Knobbe, For real \u2013 A thorough look at numeric attributes in subgroup discovery, Data Mining and Knowledge Discovery, 2020.","DOI":"10.1007\/s10618-020-00703-x"},{"key":"10.3233\/IDA-194719_ref21","first-page":"533","article-title":"Fast and memory-efficient discovery of the top-k relevant subgroups in a reduced candidate space","author":"Grosskreutz","year":"2011","journal-title":"ECML PKDD 2011, European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, Athens, Greece, 5\u20139 September, 2011, Proceedings, Part I"},{"key":"10.3233\/IDA-194719_ref22","doi-asserted-by":"crossref","unstructured":"F. Lemmerich, M. Becker and M. Atzm\u00fcller, Generic pattern trees for exhaustive exceptional model mining, in Flach et al. , pp.\u00a0277\u2013292.","DOI":"10.1007\/978-3-642-33486-3_18"},{"key":"10.3233\/IDA-194719_ref23","first-page":"868","article-title":"Different slopes for different folks \u2013 Mining for exceptional regression models with Cook\u2019s distance","author":"Duivesteijn","year":"2012","journal-title":"KDD 2012, ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Beijing, China, 12\u201316 August, 2012, Proceedings"},{"key":"10.3233\/IDA-194719_ref24","first-page":"1","article-title":"Mining frequent patterns without candidate generation","author":"Han","year":"2000","journal-title":"SIGMOD 2000, International Conference on Management of Data, Dallas, Texas, USA, 16\u201318, May, 2000, Proceedings"},{"key":"10.3233\/IDA-194719_ref25","first-page":"158","article-title":"Subgroup discovery meets Bayesian networks \u2013 An exceptional model mining approach","author":"Duivesteijn","year":"2010","journal-title":"ICDM 2010, IEEE International Conference on Data Mining, Sydney, Australia, 14\u201317 December, 2010, Proceedings"},{"key":"10.3233\/IDA-194719_ref26","doi-asserted-by":"crossref","unstructured":"E. Galbrun and P. Miettinen, Redescription Mining, Briefs in Computer Science, Springer, 2017.","DOI":"10.1007\/978-3-319-72889-6"},{"key":"10.3233\/IDA-194719_ref27","first-page":"704","article-title":"ROCsearch \u2013 An ROC-guided search strategy for subgroup discovery","author":"Meeng","year":"2014","journal-title":"SDM 2014, International Conference on Data Mining, Philadelphia, Pennsylvania, USA, 24\u201326 April, 2014, Proceedings"},{"issue":"2","key":"10.3233\/IDA-194719_ref28","doi-asserted-by":"crossref","first-page":"208","DOI":"10.1007\/s10618-012-0273-y","article-title":"Diverse subgroup set discovery","volume":"25","author":"van Leeuwen","year":"2012","journal-title":"Data Mining and Knowledge Discovery"},{"key":"10.3233\/IDA-194719_ref29","first-page":"117","article-title":"Flexible enrichment with Cortana \u2013 Software demo","author":"Meeng","year":"2011","journal-title":"Benelearn 2011, Belgian Dutch Conference on Machine Learn., The Hague, The Netherlands, 20 May, 2011, Proceedings"},{"issue":"4","key":"10.3233\/IDA-194719_ref30","doi-asserted-by":"crossref","first-page":"453","DOI":"10.1007\/BF01025868","article-title":"On the histogram as a density estimator: L2 theory","volume":"57","author":"Freedman","year":"1981","journal-title":"Zeitschrift f\u00fcr Wahrscheinlichkeitstheorie und Verwandte Gebiete"},{"key":"10.3233\/IDA-194719_ref31","first-page":"466","article-title":"Fast incremental maintenance of approximate histograms","author":"Gibbons","year":"1997","journal-title":"VLDB 1997, International Conference on Very Large Data Bases, Athens, Greece, 25\u201329 August, 1997, Proceedings"},{"key":"10.3233\/IDA-194719_ref32","doi-asserted-by":"crossref","unstructured":"Y.E. Ioannidis, The history of histograms (abridged), in VLDB 2003, International Conference on Very Large Data Bases, Berlin, Germany, 9\u201312 September, 2003, Proceedings, J.C. Freytag, P.C. Lockemann, S. Abiteboul, M.J. Carey, P.G. Selinger and A. Heuer, eds, (San Francisco, CA, USA), Morgan Kaufmann, 2003.","DOI":"10.1016\/B978-012722442-8\/50011-2"},{"key":"10.3233\/IDA-194719_ref33","first-page":"1022","article-title":"Multi-interval discretization of continuous-valued attributes for classification learning","author":"Fayyad","year":"1993","journal-title":"IJCAI 1993, International Joint Conference on Artificial Intelligence, Chamb\u00e9ry, France, 28 August \u2013 3 September, 1993, Proceedings, Part II"},{"key":"10.3233\/IDA-194719_ref34","first-page":"219","article-title":"MDL histogram density estimation","author":"Kontkanen","year":"2007","journal-title":"AISTATS 2007, International Conference on Artificial Intelligence and Statistics, San Juan, Puerto Rico, 21\u201324 March, 2007, Proceedings, Part II"},{"issue":"3","key":"10.3233\/IDA-194719_ref35","first-page":"64","article-title":"Markov processes over denumerable products of spaces, describing large systems of automata","volume":"5","author":"Vaserstein","year":"1969","journal-title":"Problemy Peredachi Informatsii"},{"key":"10.3233\/IDA-194719_ref36","first-page":"99","article-title":"On a measure of divergence between two statistical populations defined by their probability distributions","volume":"35","author":"Bhattacharyya","year":"1943","journal-title":"Bulletin of the Calcutta Mathematical Society"},{"issue":"4","key":"10.3233\/IDA-194719_ref37","doi-asserted-by":"crossref","first-page":"588","DOI":"10.1214\/088342304000000297","article-title":"Density estimation","volume":"19","author":"Sheather","year":"2004","journal-title":"Statistical Science"},{"key":"10.3233\/IDA-194719_ref38","unstructured":"C.M. Bishop, Pattern Recognition and Machine Learning, New York, NY, USA: Springer Verlag, 2006."},{"issue":"1","key":"10.3233\/IDA-194719_ref39","doi-asserted-by":"crossref","first-page":"79","DOI":"10.1214\/aoms\/1177729694","article-title":"On information and sufficiency","volume":"22","author":"Kullback","year":"1951","journal-title":"The Annals of Mathematical Statistics"},{"issue":"1","key":"10.3233\/IDA-194719_ref40","doi-asserted-by":"crossref","first-page":"145","DOI":"10.1109\/18.61115","article-title":"Divergence measures based on the Shannon entropy","volume":"37","author":"Lin","year":"1991","journal-title":"IEEE Transactions on Information Theory"},{"issue":"1","key":"10.3233\/IDA-194719_ref41","doi-asserted-by":"crossref","first-page":"41","DOI":"10.1006\/jmva.1994.1033","article-title":"Two-sample test statistics for measuring discrepancies between two multivariate probability density functions using kernel-based density estimates","volume":"50","author":"Anderson","year":"1994","journal-title":"Journal of Multivariate Analysis"},{"key":"10.3233\/IDA-194719_ref42","doi-asserted-by":"crossref","unstructured":"C.E. Rasmussen and C.K.I. Williams, Gaussian Processes for Machine Learning, Adaptive computation and machine learning, MIT Press, 2006.","DOI":"10.7551\/mitpress\/3206.001.0001"},{"issue":"1","key":"10.3233\/IDA-194719_ref44","doi-asserted-by":"crossref","first-page":"7","DOI":"10.1023\/A:1008323212047","article-title":"Predicting chemical parameters of river water quality from bioindicator data","volume":"13","author":"D\u017eeroski","year":"2000","journal-title":"Applied Intelligence"},{"key":"10.3233\/IDA-194719_ref45","first-page":"481","article-title":"Cross-mining binary and numerical attributes","author":"Garriga","year":"2007","journal-title":"ICDM 2007, IEEE International Conference on Data Mining, Omaha, Nebraska, USA, 28\u201331 October, 2007, Proceedings"},{"key":"10.3233\/IDA-194719_ref46","unstructured":"A.J. Mitchell-Jones, G. Amori, W. Bogdanowicz, B. Krystufek, P.J.H. Reijnders, F. Spitzenberger, M. Stubbe, J.B.M. Thissen, V. Vohralik and J. Zima, The Atlas of European Mammals 3. Academic Press London, 1999."},{"key":"10.3233\/IDA-194719_ref47","doi-asserted-by":"crossref","unstructured":"B.W. Silverman, Density Estimation for Statistics and Data Analysis, Chapman & Hall, 1986.","DOI":"10.1007\/978-1-4899-3324-9"},{"issue":"1","key":"10.3233\/IDA-194719_ref48","doi-asserted-by":"crossref","first-page":"21","DOI":"10.1198\/jcgs.2009.0002","article-title":"Variations on the histogram","volume":"18","author":"Denby","year":"2009","journal-title":"Journal of Computational and Graphical Statistics"},{"issue":"433","key":"10.3233\/IDA-194719_ref49","doi-asserted-by":"crossref","first-page":"401","DOI":"10.1080\/01621459.1996.10476701","article-title":"A brief survey of bandwidth selection for density estimation","volume":"91","author":"Jones","year":"1996","journal-title":"Journal of the American Statistical Association"},{"issue":"3","key":"10.3233\/IDA-194719_ref50","doi-asserted-by":"crossref","first-page":"605","DOI":"10.1093\/biomet\/66.3.605","article-title":"On optimal and data-based histograms","volume":"66","author":"Scott","year":"1979","journal-title":"Biometrika"},{"issue":"3","key":"10.3233\/IDA-194719_ref51","doi-asserted-by":"crossref","first-page":"683","DOI":"10.1111\/j.2517-6161.1991.tb01857.x","article-title":"A reliable data-based bandwidth selection method for kernel density estimation","volume":"53","author":"Sheather","year":"1991","journal-title":"Journal of the Royal Statistical Society"},{"issue":"2","key":"10.3233\/IDA-194719_ref52","doi-asserted-by":"crossref","first-page":"353","DOI":"10.1093\/biomet\/71.2.353","article-title":"An alternative method of cross-validation for the smoothing of density estimates","volume":"71","author":"Bowman","year":"1984","journal-title":"Biometrika"},{"issue":"1","key":"10.3233\/IDA-194719_ref53","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1007\/BF01205233","article-title":"Smoothed cross-validation","volume":"92","author":"Hall","year":"1992","journal-title":"Probability Theory and Related Fields"},{"key":"10.3233\/IDA-194719_ref54","doi-asserted-by":"crossref","unstructured":"P.A. Flach, De Bie and N. Cristianini, eds, ECML PKDD 2012, European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, Bristol, UK, 24\u201328 September, 2012, Proceedings II, vol.\u00a07524 of LNCS, (Berlin, Heidelberg, Germany), Springer, 2012.","DOI":"10.1007\/978-3-642-33486-3"}],"container-title":["Intelligent Data Analysis"],"original-title":[],"link":[{"URL":"https:\/\/content.iospress.com\/download?id=10.3233\/IDA-194719","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,11]],"date-time":"2025-03-11T08:08:01Z","timestamp":1741680481000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/full\/10.3233\/IDA-194719"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,12,18]]},"references-count":53,"journal-issue":{"issue":"6"},"URL":"https:\/\/doi.org\/10.3233\/ida-194719","relation":{},"ISSN":["1088-467X","1571-4128"],"issn-type":[{"type":"print","value":"1088-467X"},{"type":"electronic","value":"1571-4128"}],"subject":[],"published":{"date-parts":[[2020,12,18]]}}}