{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:24:57Z","timestamp":1750220697943,"version":"3.41.0"},"reference-count":37,"publisher":"Association for Computing Machinery (ACM)","issue":"5","license":[{"start":{"date-parts":[[2020,6,21]],"date-time":"2020-06-21T00:00:00Z","timestamp":1592697600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100003500","name":"University of Padova","doi-asserted-by":"crossref","award":["SID2017 and STARS"],"award-info":[{"award-number":["SID2017 and STARS"]}],"id":[{"id":"10.13039\/501100003500","id-type":"DOI","asserted-by":"crossref"}]},{"name":"National Science Foundation","award":["IIS-1247581"],"award-info":[{"award-number":["IIS-1247581"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Knowl. Discov. Data"],"published-print":{"date-parts":[[2020,10,31]]},"abstract":"<jats:p>We present MiSoSouP, a suite of algorithms for extracting high-quality approximations of the most interesting subgroups, according to different popular interestingness measures, from a random sample of a transactional dataset. We describe a new formulation of these measures as functions of averages, that makes it possible to approximate them using sampling. We then discuss how pseudodimension, a key concept from statistical learning theory, relates to the sample size needed to obtain an high-quality approximation of the most interesting subgroups. We prove an upper bound on the pseudodimension of the problem at hand, which depends on characteristic quantities of the dataset and of the language of patterns of interest. This upper bound then leads to small sample sizes. Our evaluation on real datasets shows that MiSoSouP outperforms state-of-the-art algorithms offering the same guarantees, and it vastly speeds up the discovery of subgroups w.r.t.\u00a0analyzing the whole dataset.<\/jats:p>","DOI":"10.1145\/3385653","type":"journal-article","created":{"date-parts":[[2020,6,22]],"date-time":"2020-06-22T02:39:58Z","timestamp":1592793598000},"page":"1-31","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["MiSoSouP"],"prefix":"10.1145","volume":"14","author":[{"given":"Matteo","family":"Riondato","sequence":"first","affiliation":[{"name":"Amherst College, East Drive, Amherst, MA"}]},{"given":"Fabio","family":"Vandin","sequence":"additional","affiliation":[{"name":"Universit\u00e0 di Padova, Via G. Gradenigo, Padova, Italy"}]}],"member":"320","published-online":{"date-parts":[[2020,6,21]]},"reference":[{"volume-title":"Bartlett","year":"1999","author":"Anthony Martin","key":"e_1_2_1_1_1"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1002\/widm.1144"},{"volume-title":"Proceedings of the 24th Annual European Symposium on Algorithms (ESA\u201916)","year":"2016","author":"Borassi Michele","key":"e_1_2_1_3_1"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10618-017-0547-5"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4612-0537-1"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/2339530.2339668"},{"volume-title":"Proceedings of the18th National Conference on Artificial Intelligence, Rina Dechter and Richard S. Sutton (Eds.). AAAI Press\/The MIT Press, 140--145","year":"2002","author":"Elomaa Tapio","key":"e_1_2_1_7_1"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10115-010-0356-2"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1080\/01621459.1963.10500830"},{"key":"e_1_2_1_10_1","article-title":"Selective Rademacher penalization and reduced error pruning of decision trees","author":"K\u00e4\u00e4ri\u00e4inen Matti","year":"2004","journal-title":"Journal of Machine Learning Research 5"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1002\/int.4550070707"},{"volume-title":"Proceedings of the Advances in Knowledge Discovery and Data Mining. American Association for Artificial Intelligence, 249--271","year":"1996","author":"Kl\u00f6sgen Willi","key":"e_1_2_1_12_1"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/18.930926"},{"key":"e_1_2_1_14_1","first-page":"377","article-title":"Supervised descriptive rule discovery: A unifying survey of contrast set, emerging pattern and subgroup mining","author":"Novak Petra Kralj","year":"2009","journal-title":"Journal of Machine Learning Research 10"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1006\/jcss.2000.1741"},{"key":"e_1_2_1_16_1","unstructured":"M. Lichman. 2013. UCI Machine Learning Repository. Retrieved from http:\/\/archive.ics.uci.edu\/ml.  M. Lichman. 2013. UCI Machine Learning Repository. Retrieved from http:\/\/archive.ics.uci.edu\/ml."},{"volume-title":"Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 422--436","year":"2014","author":"Uno Takeaki","key":"e_1_2_1_17_1"},{"volume-title":"Probability and Computing: Randomized Algorithms and Probabilistic Analysis","author":"Mitzenmacher Michael","key":"e_1_2_1_18_1","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511813603"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-12571-8_18"},{"key":"e_1_2_1_20_1","unstructured":"Yotam Ottolenghi. 2012. Yotam Ottolenghi\u2019s recipes for char-grilled sprouting broccoli with sweet tahini plus gingery fish balls in miso soup. Retrieved from https:\/\/www.theguardian.com\/lifeandstyle\/2012\/feb\/03\/grilled-broccoli-fishball-soup-recipes.  Yotam Ottolenghi. 2012. Yotam Ottolenghi\u2019s recipes for char-grilled sprouting broccoli with sweet tahini plus gingery fish balls in miso soup. Retrieved from https:\/\/www.theguardian.com\/lifeandstyle\/2012\/feb\/03\/grilled-broccoli-fishball-soup-recipes."},{"volume-title":"Knowledge Discovery in Databases","author":"Piatetsky-Shapiro Gregory","key":"e_1_2_1_21_1"},{"volume-title":"Convergence of Stochastic Processes","author":"Pollard David","key":"e_1_2_1_22_1","doi-asserted-by":"crossref","DOI":"10.1007\/978-1-4612-5254-2"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/3035918.3035951"},{"volume-title":"Efficient discovery of association rules and frequent itemsets through sampling with tight performance guarantees. ACM Transactions on Knowledge Discovery from Data 8, 4","year":"2014","author":"Riondato Matteo","key":"e_1_2_1_24_1"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/2783258.2783265"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/3208351"},{"volume-title":"Proceedings of the 2014 SIAM International Conference on Data Mining, Mohammed Javeed Zaki, Zoran Obradovic, Pang-Ning Tan","author":"Riondato Matteo","key":"e_1_2_1_27_1"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/3219819.3219989"},{"key":"e_1_2_1_29_1","article-title":"Finding the most interesting patterns in a database quickly by using sequential sampling","author":"Scheffer Tobias","year":"2002","journal-title":"Journal of Machine Learning Research. 3"},{"volume-title":"Understanding Machine Learning: From Theory to Algorithms","author":"Shalev-Shwartz Shai","key":"e_1_2_1_30_1","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9781107298019"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.1302233110"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-23808-6_30"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46307-0_4"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4757-3264-1"},{"volume-title":"Chervonenkis","year":"1971","author":"Vapnik Vladimir N.","key":"e_1_2_1_35_1"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.5555\/645801.669035"},{"volume-title":"Proceedings of the Advances in Neural Information Processing Systems (NIPS\u201916)","year":"2016","author":"Zhao Shengjia","key":"e_1_2_1_37_1"}],"container-title":["ACM Transactions on Knowledge Discovery from Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3385653","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3385653","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T22:32:49Z","timestamp":1750199569000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3385653"}},"subtitle":["Mining Interesting Subgroups with Sampling and Pseudodimension"],"short-title":[],"issued":{"date-parts":[[2020,6,21]]},"references-count":37,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2020,10,31]]}},"alternative-id":["10.1145\/3385653"],"URL":"https:\/\/doi.org\/10.1145\/3385653","relation":{},"ISSN":["1556-4681","1556-472X"],"issn-type":[{"type":"print","value":"1556-4681"},{"type":"electronic","value":"1556-472X"}],"subject":[],"published":{"date-parts":[[2020,6,21]]},"assertion":[{"value":"2019-01-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-02-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-06-21","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}