{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,8,7]],"date-time":"2024-08-07T00:42:01Z","timestamp":1722991321730},"reference-count":45,"publisher":"Association for Computing Machinery (ACM)","issue":"10","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2024,6]]},"abstract":"<jats:p>\n            Significant pattern mining is a fundamental task in mining transactional data, requiring to identify\n            <jats:italic>patterns<\/jats:italic>\n            significantly associated with the value of a given feature, the\n            <jats:italic>target.<\/jats:italic>\n            In several applications, such as biomedicine, basket market analysis, and social networks, the goal is to discover patterns whose association with the target is defined with respect to an underlying population, or process, of which the dataset represents only a collection of observations, or samples. A natural way to capture the association of a pattern with the target is to consider its\n            <jats:italic>statistical significance<\/jats:italic>\n            , assessing its deviation from the (null) hypothesis of independence between the pattern and the target. While several algorithms have been proposed to find statistically significant patterns, it remains a computationally demanding task, and for complex patterns such as subgroups, no efficient solution exists.\n          <\/jats:p>\n          <jats:p>We present FSR, an efficient algorithm to identify statistically significant patterns with rigorous guarantees on the probability of false discoveries. FSR builds on a novel general framework for mining significant patterns that captures some of the most commonly considered patterns, including itemsets, sequential patterns, and subgroups. FSR uses a small number of resampled datasets, obtained by assigning i.i.d. labels to each transaction, to rigorously bound the supremum deviation of a quality statistic measuring the significance of patterns. FSR builds on novel tight bounds on the supremum deviation that require to mine a small number of resampled datasets, while providing a high effectiveness in discovering significant patterns. As a test case, we consider significant subgroup mining, and our evaluation on several real datasets shows that FSR is effective in discovering significant subgroups, while requiring a small number of resampled datasets.<\/jats:p>","DOI":"10.14778\/3675034.3675055","type":"journal-article","created":{"date-parts":[[2024,8,6]],"date-time":"2024-08-06T22:19:11Z","timestamp":1722982751000},"page":"2668-2680","update-policy":"http:\/\/dx.doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Efficient Discovery of Significant Patterns with Few-Shot Resampling"],"prefix":"10.14778","volume":"17","author":[{"given":"Leonardo","family":"Pellegrina","sequence":"first","affiliation":[{"name":"Dept. of Information Engineering, University of Padova, Padova, Italy"}]},{"given":"Fabio","family":"Vandin","sequence":"additional","affiliation":[{"name":"Dept. of Information Engineering, University of Padova, Padova, Italy"}]}],"member":"320","published-online":{"date-parts":[[2024,8,6]]},"reference":[{"key":"e_1_2_1_1_1","first-page":"1","article-title":"On dense pattern mining in graph streams","volume":"3","author":"Aggarwal Charu C","year":"2010","unstructured":"Charu C Aggarwal, Yao Li, Philip S Yu, and Ruoming Jin. 2010. On dense pattern mining in graph streams. Proceedings of the VLDB Endowment 3, 1-2 (2010), 975--984.","journal-title":"Proceedings of the VLDB Endowment"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/170035.170072"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.14778\/1687627.1687710"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1002\/widm.1144"},{"key":"e_1_2_1_5_1","volume-title":"A new test for 2\u00d7 2 tables. Nature 156, 3954","author":"Barnard GA","year":"1945","unstructured":"GA Barnard. 1945. A new test for 2\u00d7 2 tables. Nature 156, 3954 (1945)."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1111\/j.2517-6161.1995.tb02031.x"},{"key":"e_1_2_1_7_1","volume-title":"Teoria statistica delle classi e calcolo delle probabilita. Pubblicazioni del R Istituto Superiore di Scienze Economiche e Commericiali di Firenze 8","author":"Bonferroni Carlo","year":"1936","unstructured":"Carlo Bonferroni. 1936. Teoria statistica delle classi e calcolo delle probabilita. Pubblicazioni del R Istituto Superiore di Scienze Economiche e Commericiali di Firenze 8 (1936), 3--62."},{"key":"e_1_2_1_8_1","volume-title":"Concentration inequalities: A nonasymptotic theory of independence","author":"Boucheron St\u00e9phane","unstructured":"St\u00e9phane Boucheron, G\u00e1bor Lugosi, and Pascal Massart. 2013. Concentration inequalities: A nonasymptotic theory of independence. Oxford university press."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.14778\/3324301.3324308"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.14778\/3565838.3565840"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.14778\/1687627.1687711"},{"key":"e_1_2_1_12_1","volume-title":"Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining.","author":"Dalleiger Sebastian","year":"2022","unstructured":"Sebastian Dalleiger and Jilles Vreeken. 2022. Discovering significant patterns under sequential false discovery control. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining."},{"key":"e_1_2_1_13_1","volume-title":"Concentration of measure for the analysis of randomized algorithms","author":"Dubhashi Devdatt P","unstructured":"Devdatt P Dubhashi and Alessandro Panconesi. 2009. Concentration of measure for the analysis of randomized algorithms. Cambridge University Press."},{"key":"e_1_2_1_14_1","volume-title":"Exploring the Inner Life of Neural Networks with Robust Rules. In Proceedings of the 38th International Conference on Machine Learning (Proceedings of Machine Learning Research)","volume":"139","author":"Fischer Jonas","year":"2021","unstructured":"Jonas Fischer, Anna Olah, and Jilles Vreeken. 2021. What's in the Box? Exploring the Inner Life of Neural Networks with Robust Rules. In Proceedings of the 38th International Conference on Machine Learning (Proceedings of Machine Learning Research), Vol. 139. PMLR, 3352--3362."},{"key":"e_1_2_1_15_1","article-title":"On the interpretation of \u03c7 2 from contingency tables, and the calculation of P","volume":"85","author":"Fisher Ronald A","year":"1922","unstructured":"Ronald A Fisher. 1922. On the interpretation of \u03c7 2 from contingency tables, and the calculation of P. Journal of the royal statistical society 85, 1 (1922).","journal-title":"Journal of the royal statistical society"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10618-018-0590-x"},{"key":"e_1_2_1_17_1","volume-title":"Frequent pattern mining: current status and future directions. Data mining and knowledge discovery 15, 1","author":"Han Jiawei","year":"2007","unstructured":"Jiawei Han, Hong Cheng, Dong Xin, and Xifeng Yan. 2007. Frequent pattern mining: current status and future directions. Data mining and knowledge discovery 15, 1 (2007), 55--86."},{"key":"e_1_2_1_18_1","first-page":"673","article-title":"Efficient temporal pattern mining in big time series using mutual information","volume":"15","author":"Thao Ho Nguyen Thi","year":"2022","unstructured":"Nguyen Thi Thao Ho, Torben Bach Pedersen, et al. 2022. Efficient temporal pattern mining in big time series using mutual information. Proceedings of the VLDB Endowment 15, 3 (2022), 673--685.","journal-title":"Proceedings of the VLDB Endowment"},{"key":"e_1_2_1_19_1","volume-title":"A simple sequentially rejective multiple test procedure. Scandinavian journal of statistics","author":"Holm Sture","year":"1979","unstructured":"Sture Holm. 1979. A simple sequentially rejective multiple test procedure. Scandinavian journal of statistics (1979), 65--70."},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1214\/aoms\/1177698243"},{"key":"e_1_2_1_21_1","volume-title":"2017 IEEE International Conference on Data Mining (ICDM). IEEE.","author":"Kalofolias Janis","year":"2017","unstructured":"Janis Kalofolias, Mario Boley, and Jilles Vreeken. 2017. Efficiently discovering locally exceptional yet globally representative subgroups. In 2017 IEEE International Conference on Data Mining (ICDM). IEEE."},{"key":"e_1_2_1_22_1","unstructured":"Yann LeCun and Corinna Cortes. 2010. MNIST handwritten digit database. http:\/\/yann.lecun.com\/exdb\/mnist\/"},{"key":"e_1_2_1_23_1","volume-title":"ECML PKDD","author":"Lemmerich Florian","year":"2018","unstructured":"Florian Lemmerich and Martin Becker. 2019. pysubgroup: Easy-to-use subgroup discovery in python. In ECML PKDD 2018. Springer, 658--662."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1006\/jcss.2000.1741"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/2783258.2783363"},{"key":"e_1_2_1_26_1","volume-title":"On the method of bounded differences. Surveys in combinatorics 141, 1","author":"McDiarmid Colin","year":"1989","unstructured":"Colin McDiarmid. 1989. On the method of bounded differences. Surveys in combinatorics 141, 1 (1989), 148--188."},{"key":"e_1_2_1_27_1","volume-title":"Asymptotic optimality of the Westfall-Young permutation procedure for multiple testing under dependence. The Annals of Statistics","author":"Meinshausen Nicolai","year":"2011","unstructured":"Nicolai Meinshausen, Marloes H Maathuis, and Peter B\u00fchlmann. 2011. Asymptotic optimality of the Westfall-Young permutation procedure for multiple testing under dependence. The Annals of Statistics (2011), 3369--3391."},{"key":"e_1_2_1_28_1","volume-title":"ECML PKDD","author":"Uno Takeaki","year":"2014","unstructured":"Shin-ichi Minato, Takeaki Uno, Koji Tsuda, Aika Terada, and Jun Sese. 2014. A fast method of statistical assessment for combinatorial hypotheses based on frequent itemset enumeration. In ECML PKDD 2014. Springer."},{"key":"e_1_2_1_29_1","volume-title":"MCRapper: Monte-Carlo Rademacher averages for poset families and approximate pattern mining. ACM Transactions on Knowledge Discovery from Data (TKDD) 16, 6","author":"Pellegrina Leonardo","year":"2022","unstructured":"Leonardo Pellegrina, Cyrus Cousins, Fabio Vandin, and Matteo Riondato. 2022. MCRapper: Monte-Carlo Rademacher averages for poset families and approximate pattern mining. ACM Transactions on Knowledge Discovery from Data (TKDD) 16, 6 (2022), 1--29."},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/3292500.3332286"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/3292500.3330978"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10618-020-00687-8"},{"key":"e_1_2_1_33_1","volume-title":"Efficient Discovery of Significant Patterns with Few-Shot Resampling (Extended version). arXiv preprint arXiv:2406.11803","author":"Pellegrina Leonardo","year":"2024","unstructured":"Leonardo Pellegrina and Fabio Vandin. 2024. Efficient Discovery of Significant Patterns with Few-Shot Resampling (Extended version). arXiv preprint arXiv:2406.11803 (2024). http:\/\/arxiv.org\/abs\/2406.11803"},{"key":"e_1_2_1_34_1","volume-title":"International Conference on Discovery Science. Springer, 275--280","author":"Pietracaprina Andrea","year":"2007","unstructured":"Andrea Pietracaprina and Fabio Vandin. 2007. Efficient incremental mining of top-K frequent closed itemsets. In International Conference on Discovery Science. Springer, 275--280."},{"key":"e_1_2_1_35_1","volume-title":"Convergence of stochastic processes","author":"Pollard David","unstructured":"David Pollard. 2012. Convergence of stochastic processes. Springer Science & Business Media."},{"key":"e_1_2_1_36_1","volume-title":"MiSoSouP: Mining interesting subgroups with sampling and pseudodimension. ACM Transactions on Knowledge Discovery from Data (TKDD) 14, 5","author":"Riondato Matteo","year":"2020","unstructured":"Matteo Riondato and Fabio Vandin. 2020. MiSoSouP: Mining interesting subgroups with sampling and pseudodimension. ACM Transactions on Knowledge Discovery from Data (TKDD) 14, 5 (2020), 1--31."},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.3390\/a13050123"},{"key":"e_1_2_1_38_1","volume-title":"Understanding Machine Learning: From Theory to Algorithms","author":"Shalev-Shwartz Shai","unstructured":"Shai Shalev-Shwartz and Shai Ben-David. 2014. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press."},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/2808719.2808721"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.1302233110"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10618-012-0273-y"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/1150402.1150451"},{"key":"e_1_2_1_44_1","volume-title":"Discovering significant patterns. Machine learning 68","author":"Webb Geoffrey I","year":"2007","unstructured":"Geoffrey I Webb. 2007. Discovering significant patterns. Machine learning 68 (2007), 1--33."},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10994-008-5046-x"},{"key":"e_1_2_1_46_1","volume-title":"Resampling-based multiple testing: Examples and methods for p-value adjustment","author":"Westfall Peter H","unstructured":"Peter H Westfall and S Stanley Young. 1993. Resampling-based multiple testing: Examples and methods for p-value adjustment. John Wiley & Sons."}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3675034.3675055","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,8,6]],"date-time":"2024-08-06T22:20:04Z","timestamp":1722982804000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3675034.3675055"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,6]]},"references-count":45,"journal-issue":{"issue":"10","published-print":{"date-parts":[[2024,6]]}},"alternative-id":["10.14778\/3675034.3675055"],"URL":"https:\/\/doi.org\/10.14778\/3675034.3675055","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2024,6]]},"assertion":[{"value":"2024-08-06","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}