{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,10]],"date-time":"2026-01-10T19:01:01Z","timestamp":1768071661178,"version":"3.49.0"},"reference-count":39,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2021,10,4]],"date-time":"2021-10-04T00:00:00Z","timestamp":1633305600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,10,4]],"date-time":"2021-10-04T00:00:00Z","timestamp":1633305600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Data Min Knowl Disc"],"published-print":{"date-parts":[[2022,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Pattern mining is well established in data mining research, especially for mining binary datasets. Surprisingly, there is much less work about numerical pattern mining and this research area remains under-explored. In this paper we propose <jats:sc>Mint<\/jats:sc>, an efficient MDL-based algorithm for mining numerical datasets. The MDL principle is a robust and reliable framework widely used in pattern mining, and as well in subgroup discovery. In <jats:sc>Mint<\/jats:sc> we reuse MDL for discovering useful patterns and returning a set of non-redundant overlapping patterns with well-defined boundaries and covering meaningful groups of objects. <jats:sc>Mint<\/jats:sc> is not alone in the category of numerical pattern miners based on MDL. 
In the experiments presented in the paper we show that <jats:sc>Mint<\/jats:sc> outperforms competitors among which IPD, <jats:sc>RealKrimp<\/jats:sc>, and <jats:sc>Slim<\/jats:sc>.<\/jats:p>","DOI":"10.1007\/s10618-021-00799-9","type":"journal-article","created":{"date-parts":[[2021,10,5]],"date-time":"2021-10-05T04:19:20Z","timestamp":1633407560000},"page":"108-145","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":9,"title":["Mint: MDL-based approach for Mining INTeresting Numerical Pattern Sets"],"prefix":"10.1007","volume":"36","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6724-3803","authenticated-orcid":false,"given":"Tatiana","family":"Makhalova","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3284-9001","authenticated-orcid":false,"given":"Sergei O.","family":"Kuznetsov","sequence":"additional","affiliation":[]},{"given":"Amedeo","family":"Napoli","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2021,10,4]]},"reference":[{"key":"799_CR1","doi-asserted-by":"crossref","unstructured":"Akoglu L, Tong H, Vreeken J, Faloutsos C (2012) Fast and reliable anomaly detection in categorical data. In: Proceedings of the 21st ACM international conference on information and knowledge management. ACM, pp 415\u2013424","DOI":"10.1145\/2396761.2396816"},{"key":"799_CR2","doi-asserted-by":"crossref","unstructured":"Bariatti F, Cellier P, Ferr\u00e9 S (2020) GraphMDL: graph pattern selection based on minimum description length. In: International symposium on intelligent data analysis (IDA). Springer, pp 54\u201366","DOI":"10.1007\/978-3-030-44584-3_5"},{"issue":"1","key":"799_CR3","doi-asserted-by":"publisher","first-page":"35","DOI":"10.1007\/s10115-009-0230-2","volume":"24","author":"A Bondu","year":"2010","unstructured":"Bondu A, Boull\u00e9 M, Lemaire V (2010) A non-parametric semi-supervised discretization method. 
Knowl Inf Syst 24(1):35\u201357","journal-title":"Knowl Inf Syst"},{"issue":"1","key":"799_CR4","doi-asserted-by":"publisher","first-page":"131","DOI":"10.1007\/s10994-006-8364-x","volume":"65","author":"M Boull\u00e9","year":"2006","unstructured":"Boull\u00e9 M (2006) MODL: a Bayes optimal discretization method for continuous attributes. Mach Learn 65(1):131\u2013165","journal-title":"Mach Learn"},{"key":"799_CR5","doi-asserted-by":"crossref","unstructured":"Budhathoki K, Vreeken J (2015) The difference and the norm\u2014characterising similarities and differences between databases. In: Joint European conference on machine learning and knowledge discovery in databases. Springer, pp 206\u2013223","DOI":"10.1007\/978-3-319-23525-7_13"},{"key":"799_CR6","doi-asserted-by":"crossref","unstructured":"Calders T, Goethals B, Jaroszewicz S (2006) Mining rank-correlated sets of numerical attributes. In: Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining, pp 96\u2013105","DOI":"10.1145\/1150402.1150417"},{"key":"799_CR7","unstructured":"Coenen F (2003) The LUCS-KDD discretised\/normalised ARM and CARM data library. Department of CS, The University of Liverpool, UK http:\/\/www.csc.liv.ac.uk\/~frans\/KDD\/Software\/LUCS_KDD_DN"},{"issue":"3","key":"799_CR8","first-page":"29","volume":"2","author":"R Dash","year":"2011","unstructured":"Dash R, Lochan PR, Rasmita D (2011) Comparative analysis of supervised and unsupervised discretization techniques. Int J Adv Sci Technol 2(3):29\u201337","journal-title":"Int J Adv Sci Technol"},{"key":"799_CR9","unstructured":"Dua D, Graff C (2017) UCI machine learning repository. http:\/\/archive.ics.uci.edu\/ml"},{"key":"799_CR10","doi-asserted-by":"crossref","unstructured":"Faas M, van Leeuwen M (2020) Vouw: geometric pattern mining using the MDL principle. In: International symposium on intelligent data analysis (IDA). 
Springer, pp 158\u2013170","DOI":"10.1007\/978-3-030-44584-3_13"},{"key":"799_CR11","unstructured":"Fayyad UM, Irani KB (1993) Multi-interval discretization of continuous-valued attributes for classification learning. In: Ruzena B (ed) Proceedings of the 13th international joint conference on artificial intelligence. Morgan Kaufmann, pp 1022\u20131029"},{"key":"799_CR12","unstructured":"Galbrun E (2020) The minimum description length principle for pattern mining: a survey. arXiv:2007.14009"},{"key":"799_CR13","doi-asserted-by":"publisher","DOI":"10.7551\/mitpress\/4643.001.0001","volume-title":"The minimum description length principle","author":"P Gr\u00fcnwald","year":"2007","unstructured":"Gr\u00fcnwald P (2007) The minimum description length principle. MIT, Cambridge"},{"issue":"8","key":"799_CR14","doi-asserted-by":"publisher","first-page":"651","DOI":"10.1016\/j.patrec.2009.09.011","volume":"31","author":"AK Jain","year":"2010","unstructured":"Jain AK (2010) Data clustering: 50 years beyond K-means. Pattern Recogn Lett 31(8):651\u2013666","journal-title":"Pattern Recogn Lett"},{"key":"799_CR15","doi-asserted-by":"crossref","unstructured":"Jeantet I, Mikl\u00f3s Z, Gross-Amblard D (2020) Overlapping hierarchical clustering (OHC). In: Proceedings of the 18th international symposium on intelligent data analysis (IDA), lecture notes in computer science, vol 12080. Springer, pp 261\u2013273","DOI":"10.1007\/978-3-030-44584-3_21"},{"key":"799_CR16","doi-asserted-by":"crossref","unstructured":"Kang Y, Wang S, Liu X, Lai H, Wang H, Miao B (2006) An ICA-based multivariate discretization algorithm. In: International conference on knowledge science, engineering and management. Springer, pp 556\u2013562","DOI":"10.1007\/11811220_47"},{"key":"799_CR17","unstructured":"Kaytoue M, Kuznetsov SO, Napoli A (2011) Revisiting numerical pattern mining with formal concept analysis. 
In: Twenty-second international joint conference on artificial intelligence"},{"key":"799_CR18","unstructured":"Kontkanen P, Myllym\u00e4ki P (2007) MDL histogram density estimation. In: Artificial intelligence and statistics, pp 219\u2013226"},{"issue":"1","key":"799_CR19","doi-asserted-by":"publisher","first-page":"37","DOI":"10.1007\/s11634-019-00383-6","volume":"15","author":"T Makhalova","year":"2021","unstructured":"Makhalova T, Trnecka M (2021) From-below Boolean matrix factorization algorithm based on MDL. Adv Data Anal Classif 15(1):37\u201356","journal-title":"Adv Data Anal Classif"},{"key":"799_CR20","doi-asserted-by":"crossref","unstructured":"Makhalova T, Kuznetsov SO, Napoli A (2019) Numerical pattern mining through compression. In: 2019 data compression conference (DCC). IEEE, pp 112\u2013121","DOI":"10.1109\/DCC.2019.00019"},{"key":"799_CR21","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9780511809071","volume-title":"Introduction to Information Retrieval","author":"CD Manning","year":"2008","unstructured":"Manning CD, Raghavan P, Sch\u00fctze H (2008) Introduction to Information Retrieval. Cambridge University Press, Cambridge"},{"issue":"9","key":"799_CR22","doi-asserted-by":"publisher","first-page":"1174","DOI":"10.1109\/TKDE.2005.153","volume":"17","author":"S Mehta","year":"2005","unstructured":"Mehta S, Parthasarathy S, Yang H (2005) Toward unsupervised correlation preserving discretization. IEEE Trans Knowl Data Eng 17(9):1174\u20131185","journal-title":"IEEE Trans Knowl Data Eng"},{"issue":"4","key":"799_CR23","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/2601437","volume":"8","author":"P Miettinen","year":"2014","unstructured":"Miettinen P, Vreeken J (2014) MDL4BMF: minimum description length for Boolean matrix factorization. 
ACM Trans Knowl Discov Data: TKDD 8(4):1\u201331","journal-title":"ACM Trans Knowl Discov Data: TKDD"},{"issue":"5\u20136","key":"799_CR24","doi-asserted-by":"publisher","first-page":"1366","DOI":"10.1007\/s10618-014-0350-5","volume":"28","author":"H-V Nguyen","year":"2014","unstructured":"Nguyen H-V, M\u00fcller E, Vreeken J, B\u00f6hm K (2014) Unsupervised interaction-preserving discretization of multivariate data. Data Min Knowl Disc 28(5\u20136):1366\u20131397","journal-title":"Data Min Knowl Disc"},{"key":"799_CR25","doi-asserted-by":"publisher","first-page":"1372","DOI":"10.1016\/j.ins.2019.10.050","volume":"512","author":"HM Proen\u00e7a","year":"2020","unstructured":"Proen\u00e7a HM, van Leeuwen M (2020) Interpretable multiclass classification by MDL-based rule lists. Inf Sci 512:1372\u20131393","journal-title":"Inf Sci"},{"issue":"2","key":"799_CR26","doi-asserted-by":"publisher","first-page":"416","DOI":"10.1214\/aos\/1176346150","volume":"11","author":"J Rissanen","year":"1983","unstructured":"Rissanen J (1983) A universal prior for integers and estimation by minimum description length. Ann Stat 11(2):416\u2013431","journal-title":"Ann Stat"},{"issue":"2","key":"799_CR27","doi-asserted-by":"publisher","first-page":"315","DOI":"10.1109\/18.119689","volume":"38","author":"J Rissanen","year":"1992","unstructured":"Rissanen J, Speed TP, Bin Yu (1992) Density estimation by stochastic complexity. IEEE Trans Inf Theory 38(2):315\u2013323","journal-title":"IEEE Trans Inf Theory"},{"key":"799_CR28","doi-asserted-by":"crossref","unstructured":"Siebes A, Vreeken J, van Leeuwen M (2006) Item sets that compress. In: Proceedings of the 2006 SIAM international conference on data mining. SIAM, pp 395\u2013406","DOI":"10.1137\/1.9781611972764.35"},{"key":"799_CR29","doi-asserted-by":"crossref","unstructured":"Smets K, Vreeken J (2012) Slim: directly mining descriptive patterns. In: Proceedings of SIAM. 
SIAM, pp 236\u2013247","DOI":"10.1137\/1.9781611972825.21"},{"key":"799_CR30","doi-asserted-by":"crossref","unstructured":"Srikant R, Agrawal R (1996) Mining quantitative association rules in large relational tables. In: Proceedings of the ACM SIGMOD international conference on management of data, pp 1\u201312","DOI":"10.1145\/235968.233311"},{"key":"799_CR31","doi-asserted-by":"crossref","unstructured":"Tatti N (2013) Itemsets for real-valued datasets. In: 2013 IEEE 13th international conference on data mining. IEEE, pp 717\u2013726","DOI":"10.1109\/ICDM.2013.138"},{"key":"799_CR32","doi-asserted-by":"crossref","unstructured":"Tatti N, Vreeken J (2008) Finding good itemsets by packing data. In: Eighth IEEE international conference on data mining. IEEE, pp 588\u2013597","DOI":"10.1109\/ICDM.2008.39"},{"key":"799_CR33","doi-asserted-by":"crossref","unstructured":"Tatti N, Vreeken J (2012a) The long and the short of it: summarising event sequences with serial episodes. In: Proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining, pp 462\u2013470","DOI":"10.1145\/2339530.2339606"},{"key":"799_CR34","doi-asserted-by":"crossref","unstructured":"Tatti N, Vreeken J (2012b) Discovering descriptive tile trees\u2014by mining optimal geometric subtiles. In: Proceedings of the European conference on machine learning and knowledge discovery in databases (ECML-PKDD), lecture notes in computer science, vol 7523. Springer, pp 9\u201324","DOI":"10.1007\/978-3-642-33460-3_6"},{"key":"799_CR35","doi-asserted-by":"crossref","unstructured":"van Craenendonck T, Dumancic S, Blockeel H (2017) COBRA: a fast and simple method for active clustering with pairwise constraints. 
In: Proceedings of the 26th international joint conference on artificial intelligence (IJCAI), pp 2871\u20132877","DOI":"10.24963\/ijcai.2017\/400"},{"key":"799_CR36","doi-asserted-by":"crossref","first-page":"105","DOI":"10.1007\/978-3-319-07821-2_5","volume-title":"Frequent pattern mining","author":"J Vreeken","year":"2014","unstructured":"Vreeken J, Tatti N (2014) Interesting patterns. In: Aggarwal CC, Han J (eds) Frequent pattern mining. Springer, Berlin, pp 105\u2013134"},{"issue":"1","key":"799_CR37","doi-asserted-by":"publisher","first-page":"169","DOI":"10.1007\/s10618-010-0202-x","volume":"23","author":"J Vreeken","year":"2011","unstructured":"Vreeken J, Van Leeuwen M, Siebes A (2011) Krimp: mining itemsets that compress. Data Min Knowl Discov 23(1):169\u2013214","journal-title":"Data Min Knowl Discov"},{"key":"799_CR38","unstructured":"Witteveen J (2012) Mining hyperintervals\u2014getting to grips with real-valued data. Bachelor\u2019s thesis"},{"key":"799_CR39","doi-asserted-by":"crossref","unstructured":"Witteveen J, Duivesteijn W, Knobbe A, Gr\u00fcnwald P (2014) Realkrimp\u2014finding hyperintervals that compress with MDL for real-valued data. In: International symposium on intelligent data analysis. 
Springer, pp 368\u2013379","DOI":"10.1007\/978-3-319-12571-8_32"}],"container-title":["Data Mining and Knowledge Discovery"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10618-021-00799-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10618-021-00799-9\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10618-021-00799-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,10]],"date-time":"2023-11-10T00:44:58Z","timestamp":1699577098000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10618-021-00799-9"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,10,4]]},"references-count":39,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2022,1]]}},"alternative-id":["799"],"URL":"https:\/\/doi.org\/10.1007\/s10618-021-00799-9","relation":{},"ISSN":["1384-5810","1573-756X"],"issn-type":[{"value":"1384-5810","type":"print"},{"value":"1573-756X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,10,4]]},"assertion":[{"value":"22 May 2020","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"6 September 2021","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"4 October 2021","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}