{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,6]],"date-time":"2025-08-06T12:32:38Z","timestamp":1754483558935,"version":"3.37.3"},"reference-count":32,"publisher":"Springer Science and Business Media LLC","issue":"9","license":[{"start":{"date-parts":[[2022,8,2]],"date-time":"2022-08-02T00:00:00Z","timestamp":1659398400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,8,2]],"date-time":"2022-08-02T00:00:00Z","timestamp":1659398400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Knowl Inf Syst"],"published-print":{"date-parts":[[2022,9]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Pattern mining is a fundamental data mining task with applications in several domains. In this work, we consider the scenario in which we have a sequence of datasets generated by potentially different underlying generative processes, and we study the problem of mining <jats:italic>statistically robust patterns<\/jats:italic>, which are patterns whose probabilities of appearing in transactions drawn from such generative processes respect well-defined conditions. Such conditions define the patterns of interest, describing the evolution of their probabilities through the datasets in the sequence, which may, for example, increase, decrease, or stay stable, through the sequence. Due to the stochastic nature of the data, one cannot identify the exact set of the statistically robust patterns by analyzing a sequence of samples, i.e., the datasets, taken from the generative processes, and has to resort to approximations. 
We then propose <jats:sc>gRosSo<\/jats:sc>, an algorithm to find rigorous approximations of the statistically robust patterns that do not contain false positives or false negatives with high probability. We apply our framework to the mining of statistically robust sequential patterns and statistically robust itemsets. Our extensive evaluation on pseudo-artificial and real data shows that <jats:sc>gRosSo<\/jats:sc> provides high-quality approximations for the problem of mining statistically robust sequential patterns and statistically robust itemsets.\n<\/jats:p>","DOI":"10.1007\/s10115-022-01689-2","type":"journal-article","created":{"date-parts":[[2022,8,2]],"date-time":"2022-08-02T04:02:54Z","timestamp":1659412974000},"page":"2329-2359","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["gRosSo: mining statistically robust patterns from a sequence of datasets"],"prefix":"10.1007","volume":"64","author":[{"given":"Andrea","family":"Tonon","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2244-2320","authenticated-orcid":false,"given":"Fabio","family":"Vandin","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2022,8,2]]},"reference":[{"key":"1689_CR1","doi-asserted-by":"crossref","unstructured":"Tonon A, Vandin F (2020) gRosSo: mining statistically robust patterns from a sequence of datasets. In: Proceedings of the 20th IEEE international conference on data mining, IEEE, ICDM\u201920, pp 551\u2013560","DOI":"10.1109\/ICDM50108.2020.00064"},{"issue":"1","key":"1689_CR2","doi-asserted-by":"publisher","first-page":"55","DOI":"10.1007\/s10618-006-0059-1","volume":"15","author":"J Han","year":"2007","unstructured":"Han J, Cheng H, Xin D, Yan X (2007) Frequent pattern mining: current status and future directions. 
Data Min Knowl Disc 15(1):55\u201386","journal-title":"Data Min Knowl Disc"},{"key":"1689_CR3","doi-asserted-by":"publisher","first-page":"207","DOI":"10.1145\/170036.170072","volume":"22","author":"R Agrawal","year":"1993","unstructured":"Agrawal R, Imieli\u0144ski T, Swami A (1993) Mining association rules between sets of items in large databases. SIGMOD Rec 22:207\u2013216","journal-title":"SIGMOD Rec"},{"key":"1689_CR4","doi-asserted-by":"crossref","unstructured":"Agrawal R, Srikant R (1995) Mining sequential patterns. In: Proceedings of the 11th international conference on data engineering, IEEE, ICDE\u201995, pp 3\u201314","DOI":"10.1109\/ICDE.1995.380415"},{"issue":"7","key":"1689_CR5","doi-asserted-by":"publisher","first-page":"649","DOI":"10.1002\/int.4550070707","volume":"7","author":"W Kl\u00f6sgen","year":"1992","unstructured":"Kl\u00f6sgen W (1992) Problems for knowledge discovery in databases and their treatment in the statistics interpreter explora. Int J Intell Syst 7(7):649\u2013673","journal-title":"Int J Intell Syst"},{"key":"1689_CR6","doi-asserted-by":"crossref","unstructured":"Ahmed NK, Neville J, Rossi RA, Duffield N (2015) Efficient graphlet counting for large networks. In: Proceedings of the 2015 IEEE international conference on data mining, IEEE, ICDM\u201915, pp 1\u201310","DOI":"10.1109\/ICDM.2015.141"},{"issue":"2","key":"1689_CR7","doi-asserted-by":"publisher","first-page":"325","DOI":"10.1007\/s10618-018-0590-x","volume":"33","author":"W H\u00e4m\u00e4l\u00e4inen","year":"2019","unstructured":"H\u00e4m\u00e4l\u00e4inen W, Webb GI (2019) A tutorial on statistically sound pattern discovery. Data Min Knowl Disc 33(2):325\u2013377","journal-title":"Data Min Knowl Disc"},{"key":"1689_CR8","doi-asserted-by":"crossref","unstructured":"Pellegrina L, Riondato M, Vandin F (2019) Hypothesis testing and statistically-sound pattern mining. 
In: Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery and data mining, pp 3215\u20133216","DOI":"10.1145\/3292500.3332286"},{"key":"1689_CR9","doi-asserted-by":"crossref","unstructured":"Komiyama J, Ishihata M, Arimura H, Nishibayashi T, Minato SI (2017) Statistical emerging pattern mining with multiple testing correction. In: Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining, pp 897\u2013906","DOI":"10.1145\/3097983.3098137"},{"key":"1689_CR10","doi-asserted-by":"crossref","unstructured":"Llinares-L\u00f3pez F, Sugiyama M, Papaxanthos L, Borgwardt K (2015) Fast and memory-efficient significant pattern mining via permutation testing. In: Proceedings of the 21st ACM SIGKDD international conference on knowledge discovery and data mining, pp 725\u2013734","DOI":"10.1145\/2783258.2783363"},{"key":"1689_CR11","doi-asserted-by":"crossref","unstructured":"Pellegrina L, Riondato M, Vandin F (2019) SPuManTE: Significant pattern mining with unconditional testing. In: Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery and data mining, pp 1528\u20131538","DOI":"10.1145\/3292500.3330978"},{"key":"1689_CR12","doi-asserted-by":"publisher","first-page":"1201","DOI":"10.1007\/s10618-020-00687-8","volume":"34","author":"L Pellegrina","year":"2020","unstructured":"Pellegrina L, Vandin F (2020) Efficient mining of the most significant patterns with permutation testing. Data Min Knowl Disc 34:1201\u20131234","journal-title":"Data Min Knowl Disc"},{"key":"1689_CR13","doi-asserted-by":"crossref","unstructured":"Gwadera R, Crestani F (2010) Ranking sequential patterns with respect to significance. 
In: Zaki MJ, Yu JX, Ravindran B, Pudi V (eds) Advances in knowledge discovery and data mining, PAKDD 2010, pp 286\u2013299","DOI":"10.1007\/978-3-642-13657-3_32"},{"key":"1689_CR14","doi-asserted-by":"crossref","unstructured":"Low-Kam C, Ra\u00efssi C, Kaytoue M, Pei J (2013) Mining statistically significant sequential patterns. In: Proceedings of the 13th IEEE international conference on data mining, IEEE, ICDM\u201913, pp 488\u2013497","DOI":"10.1109\/ICDM.2013.124"},{"key":"1689_CR15","doi-asserted-by":"crossref","unstructured":"Tonon A, Vandin F (2019) Permutation strategies for mining significant sequential patterns. In: Proceedings of the 19th IEEE international conference on data mining, IEEE, ICDM\u201919, pp 1330\u20131335","DOI":"10.1109\/ICDM.2019.00169"},{"key":"1689_CR16","doi-asserted-by":"crossref","unstructured":"Dong G, Li J (1999) Efficient mining of emerging patterns: discovering trends and differences. In: Proceedings of the 5th ACM SIGKDD international conference on knowledge discovery and data mining, pp 43\u201352","DOI":"10.1145\/312129.312191"},{"key":"1689_CR17","unstructured":"Agrawal R, Srikant R (1994) Fast algorithms for mining association rules. In: Proceedings of the 20th international conference on very large data bases, VLDB\u201994, pp 487\u2013499"},{"issue":"1","key":"1689_CR18","doi-asserted-by":"publisher","first-page":"53","DOI":"10.1023\/B:DAMI.0000005258.31418.83","volume":"8","author":"J Han","year":"2004","unstructured":"Han J, Pei J, Yin Y, Mao R (2004) Mining frequent patterns without candidate generation: a frequent-pattern tree approach. Data Min Knowl Disc 8(1):53\u201387","journal-title":"Data Min Knowl Disc"},{"key":"1689_CR19","doi-asserted-by":"crossref","unstructured":"Srikant R, Agrawal R (1996) Mining sequential patterns: generalizations and performance improvements. 
In: Proceedings of the 5th international conference on extending database technology, EDBT\u201996, pp 1\u201317","DOI":"10.1007\/BFb0014140"},{"issue":"11","key":"1689_CR20","doi-asserted-by":"publisher","first-page":"1424","DOI":"10.1109\/TKDE.2004.77","volume":"16","author":"J Pei","year":"2004","unstructured":"Pei J, Han J, Mortazavi-Asl B, Wang J, Pinto H, Chen Q, Dayal U, Hsu MC (2004) Mining sequential patterns by pattern-growth: the prefixspan approach. IEEE Trans Knowl Data Eng 16(11):1424\u20131440","journal-title":"IEEE Trans Knowl Data Eng"},{"key":"1689_CR21","doi-asserted-by":"publisher","first-page":"1313","DOI":"10.1007\/s10115-019-01393-8","volume":"62","author":"S Servan-Schreiber","year":"2020","unstructured":"Servan-Schreiber S, Riondato M, Zgraggen E (2020) ProSecCo: progressive sequence mining with convergence guarantees. Knowl Inf Syst 62:1313\u20131340","journal-title":"Knowl Inf Syst"},{"issue":"5","key":"1689_CR22","doi-asserted-by":"publisher","first-page":"123","DOI":"10.3390\/a13050123","volume":"13","author":"D Santoro","year":"2020","unstructured":"Santoro D, Tonon A, Vandin F (2020) Mining sequential patterns with VC-dimension and rademacher complexity. Algorithms 13(5):123","journal-title":"Algorithms"},{"issue":"4","key":"1689_CR23","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/2629586","volume":"8","author":"M Riondato","year":"2014","unstructured":"Riondato M, Upfal E (2014) Efficient discovery of association rules and frequent itemsets through sampling with tight performance guarantees. ACM Trans Knowl Discov Data (TKDD) 8(4):1\u201332","journal-title":"ACM Trans Knowl Discov Data (TKDD)"},{"key":"1689_CR24","doi-asserted-by":"crossref","unstructured":"Riondato M, Vandin F (2014) Finding the true frequent itemsets. 
In: Zaki MJ, Obradovic Z, Tan P, Banerjee A, Kamath C, Parthasarathy S (eds) Proceedings of the 2014 SIAM international conference on data mining, SIAM, pp 497\u2013505","DOI":"10.1137\/1.9781611973440.57"},{"key":"1689_CR25","doi-asserted-by":"crossref","unstructured":"Zhu F, Yan X, Han J, Yu PS, Cheng H (2007) Mining colossal frequent patterns by core pattern fusion. In: 2007 IEEE 23rd international conference on data engineering, pp 706\u2013715","DOI":"10.1109\/ICDE.2007.367916"},{"issue":"1","key":"1689_CR26","doi-asserted-by":"publisher","first-page":"53","DOI":"10.1007\/s10115-016-1002-4","volume":"52","author":"E Egho","year":"2017","unstructured":"Egho E, Gay D, Boull\u00e9 M, Voisine N, Cl\u00e9rot F (2017) A user parameter-free approach for mining robust sequential classification rules. Knowl Inf Syst 52(1):53\u201381","journal-title":"Knowl Inf Syst"},{"key":"1689_CR27","doi-asserted-by":"publisher","first-page":"11","DOI":"10.1007\/978-3-319-21852-6_3","volume-title":"Measures of complexity","author":"VN Vapnik","year":"2015","unstructured":"Vapnik VN, Chervonenkis AY (2015) On the uniform convergence of relative frequencies of events to their probabilities. In: Vovk V, Papadopoulos H, Gammerman A (eds) Measures of complexity. Springer, Cham, pp 11\u201330"},{"key":"1689_CR28","doi-asserted-by":"publisher","first-page":"323","DOI":"10.1051\/ps:2005018","volume":"9","author":"S Boucheron","year":"2005","unstructured":"Boucheron S, Bousquet O, Lugosi G (2005) Theory of classification: a survey of some recent advances. ESAIM Probab Stat 9:323\u2013375","journal-title":"ESAIM Probab Stat"},{"key":"1689_CR29","volume-title":"Probability and computing: randomization and probabilistic techniques in algorithms and data analysis","author":"M Mitzenmacher","year":"2017","unstructured":"Mitzenmacher M, Upfal E (2017) Probability and computing: randomization and probabilistic techniques in algorithms and data analysis. 
Cambridge University Press, Cambridge"},{"issue":"3","key":"1689_CR30","doi-asserted-by":"publisher","first-page":"516","DOI":"10.1006\/jcss.2000.1741","volume":"62","author":"Y Li","year":"2001","unstructured":"Li Y, Long PM, Srinivasan A (2001) Improved bounds on the sample complexity of learning. J Comput Syst Sci 62(3):516\u2013527","journal-title":"J Comput Syst Sci"},{"issue":"3","key":"1689_CR31","doi-asserted-by":"publisher","first-page":"732","DOI":"10.1007\/s10618-014-0362-1","volume":"29","author":"E Egho","year":"2015","unstructured":"Egho E, Ra\u00efssi C, Calders T, Jay N, Napoli A (2015) On measuring similarity for sequences of itemsets. Data Min Knowl Discov 29(3):732\u2013764","journal-title":"Data Min Knowl Discov"},{"key":"1689_CR32","doi-asserted-by":"crossref","unstructured":"Fournier-Viger P, Lin JCW, Gomariz A, Gueniche T, Soltani A, Deng Z, Lam HT (2016) The SPMF open-source data mining library version 2. In: Proceedings of 19th European conference on machine learning and principles and practice of knowledge discovery and data mining (Part III), ECML PKDD\u201916","DOI":"10.1007\/978-3-319-46131-1_8"}],"container-title":["Knowledge and Information 
Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10115-022-01689-2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10115-022-01689-2\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10115-022-01689-2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,8,31]],"date-time":"2022-08-31T17:05:26Z","timestamp":1661965526000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10115-022-01689-2"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,8,2]]},"references-count":32,"journal-issue":{"issue":"9","published-print":{"date-parts":[[2022,9]]}},"alternative-id":["1689"],"URL":"https:\/\/doi.org\/10.1007\/s10115-022-01689-2","relation":{},"ISSN":["0219-1377","0219-3116"],"issn-type":[{"type":"print","value":"0219-1377"},{"type":"electronic","value":"0219-3116"}],"subject":[],"published":{"date-parts":[[2022,8,2]]},"assertion":[{"value":"28 January 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"4 April 2022","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"9 April 2022","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"2 August 2022","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no conflict of 
interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}