{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,25]],"date-time":"2025-06-25T13:33:40Z","timestamp":1750858420290,"version":"3.37.3"},"reference-count":43,"publisher":"Springer Science and Business Media LLC","issue":"4","license":[{"start":{"date-parts":[[2022,3,5]],"date-time":"2022-03-05T00:00:00Z","timestamp":1646438400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,3,5]],"date-time":"2022-03-05T00:00:00Z","timestamp":1646438400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"Helmholtz-Zentrum f\u00fcr Informationssicherheit \u2013 CISPA gGmbH"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Knowl Inf Syst"],"published-print":{"date-parts":[[2022,4]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Suppose we are given a discrete-valued time series<jats:inline-formula><jats:alternatives><jats:tex-math>$$X $$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\"><mml:mi>X<\/mml:mi><\/mml:math><\/jats:alternatives><\/jats:inline-formula>of observed events and an equally long binary sequence<jats:inline-formula><jats:alternatives><jats:tex-math>$$Y $$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\"><mml:mi>Y<\/mml:mi><\/mml:math><\/jats:alternatives><\/jats:inline-formula>that indicates whether something of interest happened at that particular point in time. We consider the problem of mining serial episodes, sequential patterns allowing for gaps, from<jats:inline-formula><jats:alternatives><jats:tex-math>$$X $$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\"><mml:mi>X<\/mml:mi><\/mml:math><\/jats:alternatives><\/jats:inline-formula>that reliably predict those interesting events. With<jats:italic>reliable<\/jats:italic>we mean patterns that not only predict<jats:italic>that<\/jats:italic>an interesting event is likely to follow, but in particular that we can also accurately tell how<jats:italic>how long<\/jats:italic>until that event will happen. In other words, we are specifically interested in patterns with a highly skewed distribution of delays between pattern occurrences and predicted events. As it is unlikely that a single pattern can explain a complex real-world progress, we are after the smallest, least redundant set of such patterns that together explain the interesting events well. We formally define this problem in terms of the Minimum Description Length principle, by which we identify the best patterns as those that describe the occurrences of interesting events<jats:inline-formula><jats:alternatives><jats:tex-math>$$Y $$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\"><mml:mi>Y<\/mml:mi><\/mml:math><\/jats:alternatives><\/jats:inline-formula>most succinctly given the data over<jats:inline-formula><jats:alternatives><jats:tex-math>$$X $$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\"><mml:mi>X<\/mml:mi><\/mml:math><\/jats:alternatives><\/jats:inline-formula>. As neither discovering the optimal explanation of<jats:inline-formula><jats:alternatives><jats:tex-math>$$Y $$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\"><mml:mi>Y<\/mml:mi><\/mml:math><\/jats:alternatives><\/jats:inline-formula>given a set of patterns, nor the discovery of optimal pattern set are problems that allow for straightforward optimization, we break the problem in two and propose effective heuristics for both. Through extensive empirical evaluation, we show that both our main method,<jats:sc>Omen<\/jats:sc>, and its fast approximation<jats:sc>fOmen<\/jats:sc>, work well in practice and both quantitatively and qualitatively beat the state of the art.<\/jats:p>","DOI":"10.1007\/s10115-022-01660-1","type":"journal-article","created":{"date-parts":[[2022,3,5]],"date-time":"2022-03-05T21:02:31Z","timestamp":1646514151000},"page":"1013-1045","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Omen: discovering sequential patterns with reliable prediction delays"],"prefix":"10.1007","volume":"64","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6628-2192","authenticated-orcid":false,"given":"Joscha","family":"C\u00fcppers","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5842-8750","authenticated-orcid":false,"given":"Janis","family":"Kalofolias","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2310-2806","authenticated-orcid":false,"given":"Jilles","family":"Vreeken","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2022,3,5]]},"reference":[{"key":"1660_CR1","unstructured":"Aggarwal CC, Han J (eds) (2004) Frequent pattern mining. Springer"},{"issue":"1","key":"1660_CR2","doi-asserted-by":"publisher","first-page":"115","DOI":"10.1007\/s10115-015-0819-6","volume":"46","author":"I Batal","year":"2016","unstructured":"Batal I, Cooper GF, Fradkin D, Harrison J, Moerchen F, Hauskrecht M (2016) An efficient pattern mining approach for event detection in multivariate temporal data. Knowl Inf Syst 46(1):115\u2013150","journal-title":"Knowl Inf Syst"},{"key":"1660_CR3","doi-asserted-by":"crossref","unstructured":"Bertens R, Vreeken J, Siebes A (2016) Keeping it short and simple: summarising complex event sequences with multivariate patterns. In: Proceedings of the 22nd ACM international conference on knowledge discovery and data mining (SIGKDD), San Francisco, CA, pp 735\u2013744","DOI":"10.1145\/2939672.2939761"},{"key":"1660_CR4","doi-asserted-by":"crossref","unstructured":"Bhattacharyya A, Vreeken J (2017) Efficiently summarising event sequences with rich interleaving patterns. In: Proceedings of the SIAM international conference on data mining (SDM), Houston, TX, SIAM, pp 795\u2013803","DOI":"10.1137\/1.9781611974973.89"},{"key":"1660_CR5","doi-asserted-by":"crossref","unstructured":"Budhathoki K, Vreeken J (2018) Causal inference on event sequences. In: Proceedings of the SIAM international conference on data mining (SDM), San Diego, CA, SIAM, pp 55\u201363","DOI":"10.1137\/1.9781611975321.7"},{"issue":"1","key":"1660_CR6","doi-asserted-by":"publisher","first-page":"26","DOI":"10.1016\/j.physleta.2004.02.032","volume":"324","author":"Y Chen","year":"2004","unstructured":"Chen Y, Rangarajan G, Feng J, Ding M (2004) Analyzing multiple nonlinear time series with extended granger causality. Phys Lett A 324(1):26\u201335","journal-title":"Phys Lett A"},{"key":"1660_CR7","unstructured":"Corbi\u00e8re C, Thome N, Bar-Hen A, Cord M, P\u00e9rez P (2019) Addressing failure prediction by learning model confidence. pp 2898\u20132909"},{"key":"1660_CR8","volume-title":"Elements of information theory","author":"TM Cover","year":"2006","unstructured":"Cover TM, Thomas JA (2006) Elements of information theory. Wiley, New York"},{"key":"1660_CR9","doi-asserted-by":"crossref","unstructured":"C\u00fcppers J, Vreeken J (2020) Just wait for it... mining sequential patterns with reliable prediction delays. In: Proceedings of the 20th IEEE international conference on data mining (ICDM), Virtual Event, Sorrento, Italy","DOI":"10.1109\/ICDM50108.2020.00017"},{"issue":"2","key":"1660_CR10","doi-asserted-by":"publisher","first-page":"248","DOI":"10.1145\/321694.321699","volume":"19","author":"J Edmonds","year":"1972","unstructured":"Edmonds J, Karp RM (1972) Theoretical improvements in algorithmic efficiency for network flow problems. J ACM 19(2):248\u2013264","journal-title":"J ACM"},{"issue":"1","key":"1660_CR11","doi-asserted-by":"publisher","first-page":"53","DOI":"10.1007\/s10115-016-1002-4","volume":"52","author":"E Egho","year":"2017","unstructured":"Egho E, Gay D, Boull\u00e9 M, Voisine N, Cl\u00e9rot F (2017) A user parameter-free approach for mining robust sequential classification rules. Knowl Inf Syst 52(1):53\u201381","journal-title":"Knowl Inf Syst"},{"key":"1660_CR12","doi-asserted-by":"crossref","unstructured":"Fowkes J, Sutton C (2016) A subsequence interleaving model for sequential pattern mining. In: Proceedings of the 22nd ACM international conference on knowledge discovery and data mining (SIGKDD), San Francisco, CA, pp 835\u2013844","DOI":"10.1145\/2939672.2939787"},{"key":"1660_CR13","doi-asserted-by":"crossref","unstructured":"Galbrun E, Cellier P, Tatti N, Termier A, Cr\u00e9milleux B (2018) Mining periodic patterns with a mdl criterion. In: Proceedings of the European conference on machine learning and principles and practice of knowledge discovery in databases (ECML PKDD), Dublin, Ireland, Springer, pp 535\u2013551","DOI":"10.1007\/978-3-030-10928-8_32"},{"key":"1660_CR14","doi-asserted-by":"publisher","first-page":"115","DOI":"10.1016\/j.dss.2014.02.003","volume":"61","author":"MS Gerber","year":"2014","unstructured":"Gerber MS (2014) Predicting crime using twitter and kernel density estimation. Decis Supp Syst 61:115\u2013125","journal-title":"Decis Supp Syst"},{"key":"1660_CR15","doi-asserted-by":"crossref","unstructured":"Granger CW (1969) Investigating causal relations by econometric models and cross-spectral methods. Econometrica: 424\u2013438","DOI":"10.2307\/1912791"},{"key":"1660_CR16","doi-asserted-by":"crossref","unstructured":"Gr\u00fcnwald PD, Grunwald A (2007) The minimum description length principle. MIT press","DOI":"10.7551\/mitpress\/4643.001.0001"},{"key":"1660_CR17","doi-asserted-by":"crossref","unstructured":"Hagberg AA, Schult DA, Swart PJ (2008) Exploring network structure, dynamics, and function using networkx. In: Varoquaux G, Vaught T, Millman J (eds) Proceedings of the 7th Python in Science conference, Pasadena, CA USA, pp 11 \u2013 15","DOI":"10.25080\/TCWV9851"},{"key":"1660_CR18","doi-asserted-by":"crossref","unstructured":"Hooi B, Faloutsos C (2019) Branch and border: partition-based change detection in multivariate time series. In: Proceedings of the SIAM international conference on data mining (SDM), Alberta, Canada, SIAM, pp 504\u2013512","DOI":"10.1137\/1.9781611975673.57"},{"key":"1660_CR19","unstructured":"Hoyer P, Janzing D, Mooij J, Peters J, Sch\u00f6lkopf B (2009) Nonlinear causal discovery with additive noise models. pp 689\u2013696"},{"issue":"10","key":"1660_CR20","doi-asserted-by":"publisher","first-page":"5168","DOI":"10.1109\/TIT.2010.2060095","volume":"56","author":"D Janzing","year":"2010","unstructured":"Janzing D, Scholkopf B (2010) Causal inference using the algorithmic markov condition. IEEE Trans Inf Technol 56(10):5168\u20135194","journal-title":"IEEE Trans Inf Technol"},{"key":"1660_CR21","doi-asserted-by":"crossref","unstructured":"Laxman S, Sastry PS, Unnikrishnan KP (2007) A fast algorithm for finding frequent episodes in event streams. In: Proceedings of the 13th ACM international conference on knowledge discovery and data mining (SIGKDD), San Jose, CA, ACM, pp 410\u2013419","DOI":"10.1145\/1281192.1281238"},{"key":"1660_CR22","doi-asserted-by":"crossref","unstructured":"Laxman S, Tankasali V, White RW (2008) Stream prediction using a generative model based on frequent episodes in event sequences. In: Proceedings of the 14th ACM international conference on knowledge discovery and data mining (SIGKDD), Las Vegas, NV, pp 453\u2013461","DOI":"10.1145\/1401890.1401947"},{"key":"1660_CR23","doi-asserted-by":"crossref","unstructured":"Li M, Vit\u00e1nyi P (1993) An introduction to kolmogorov complexity and its applications. Springer","DOI":"10.1007\/978-1-4757-3860-5"},{"issue":"2","key":"1660_CR24","doi-asserted-by":"publisher","first-page":"107","DOI":"10.1007\/s10618-007-0064-z","volume":"15","author":"J Lin","year":"2007","unstructured":"Lin J, Keogh E, Wei L, Lonardi S (2007) Experiencing sax: a novel symbolic representation of time series. Data Min Knowl Discov 15(2):107\u2013144","journal-title":"Data Min Knowl Discov"},{"issue":"3","key":"1660_CR25","doi-asserted-by":"publisher","first-page":"259","DOI":"10.1023\/A:1009748302351","volume":"1","author":"H Mannila","year":"1997","unstructured":"Mannila H, Toivonen H, Verkamo AI (1997) Discovery of frequent episodes in event sequences. Data Min Knowl Discov 1(3):259\u2013289","journal-title":"Data Min Knowl Discov"},{"key":"1660_CR26","doi-asserted-by":"crossref","unstructured":"Pearl J (2009) Causality: models, reasoning and inference, 2nd edn. Cambridge University Press","DOI":"10.1017\/CBO9780511803161"},{"key":"1660_CR27","doi-asserted-by":"crossref","unstructured":"Radinsky K, Horvitz E (2013) Mining the web to predict future events. In: WSDM, pp 255\u2013264","DOI":"10.1145\/2433396.2433431"},{"issue":"5","key":"1660_CR28","doi-asserted-by":"publisher","first-page":"465","DOI":"10.1016\/0005-1098(78)90005-5","volume":"14","author":"J Rissanen","year":"1978","unstructured":"Rissanen J (1978) Modeling by shortest data description. Automatica 14(5):465\u2013471","journal-title":"Automatica"},{"issue":"2","key":"1660_CR29","doi-asserted-by":"publisher","first-page":"416","DOI":"10.1214\/aos\/1176346150","volume":"11","author":"J Rissanen","year":"1983","unstructured":"Rissanen J (1983) A universal prior for integers and estimation by minimum description length. Ann Stat 11(2):416\u2013431","journal-title":"Ann Stat"},{"key":"1660_CR30","doi-asserted-by":"crossref","unstructured":"Saadallah A, Jakobs M, Morik K (2021) Explainable online deep neural network selection using adaptive saliency maps for time series forecasting. In: Proceedings of the European conference on machine learning and principles and practice of knowledge discovery in databases (ECML PKDD), Virtual","DOI":"10.1007\/978-3-030-86486-6_25"},{"key":"1660_CR31","doi-asserted-by":"crossref","unstructured":"Scharw\u00e4chter E, M\u00fcller E (2020) Two-sample testing for event impacts in time series. In: Proceedings of the SIAM international conference on data mining (SDM), SIAM, pp 10\u201318","DOI":"10.1137\/1.9781611976236.2"},{"issue":"2","key":"1660_CR32","doi-asserted-by":"publisher","first-page":"461","DOI":"10.1103\/PhysRevLett.85.461","volume":"85","author":"T Schreiber","year":"2000","unstructured":"Schreiber T (2000) Measuring information transfer. Phys Rev Lett 85(2):461","journal-title":"Phys Rev Lett"},{"key":"1660_CR33","first-page":"2003","volume":"7","author":"S Shimizu","year":"2006","unstructured":"Shimizu S, Hoyer PO, Hyv\u00e4rinen A, Kerminen A (2006) A linear non-gaussian acyclic model for causal discovery. J Mach Learn Res 7:2003\u20132030","journal-title":"J Mach Learn Res"},{"issue":"4","key":"1660_CR34","doi-asserted-by":"publisher","first-page":"1046","DOI":"10.1007\/s10618-013-0327-9","volume":"28","author":"N Tatti","year":"2014","unstructured":"Tatti N (2014) Discovering episodes with compact minimal windows. Data Min Knowl Discov 28(4):1046\u20131077","journal-title":"Data Min Knowl Discov"},{"key":"1660_CR35","doi-asserted-by":"crossref","unstructured":"Tatti N, Vreeken J (2012) The long and the short of it: summarising event sequences with serial episodes. In: Proceedings of the 18th ACM international conference on knowledge discovery and data mining (SIGKDD), Beijing, China, pp 462\u2013470","DOI":"10.1145\/2339530.2339606"},{"issue":"8","key":"1660_CR36","doi-asserted-by":"publisher","first-page":"1042","DOI":"10.1109\/TKDE.2007.1043","volume":"19","author":"J Wang","year":"2007","unstructured":"Wang J, Han J, Li C (2007) Frequent closed sequence mining without candidate maintenance. IEEE Trans Knowl Data Eng 19(8):1042\u20131056","journal-title":"IEEE Trans Knowl Data Eng"},{"key":"1660_CR37","first-page":"359","volume":"98","author":"GM Weiss","year":"1998","unstructured":"Weiss GM, Hirsh H (1998) Learning to predict rare events in event sequences. Proc ACM Int Conf Knowl Discov Data Min (SIGKDD) 98:359\u2013363","journal-title":"Proc ACM Int Conf Knowl Discov Data Min (SIGKDD)"},{"key":"1660_CR38","doi-asserted-by":"crossref","unstructured":"Wu Q, Gao Y, Gao X, Weng P, Chen G (2019) Dual sequential prediction models linking sequential recommendation and information dissemination. In: Proceedings of the ACM international conference on knowledge discovery and data mining (SIGKDD), pp 447\u2013457","DOI":"10.1145\/3292500.3330959"},{"key":"1660_CR39","doi-asserted-by":"crossref","unstructured":"Wu Z, Pan S, Long G, Jiang J, Chang X, Zhang C (2020) Connecting the dots: multivariate time series forecasting with graph neural networks. In: Proceedings of the ACM international conference on knowledge discovery and data mining (SIGKDD), ACM, Virtual Event CA USA, pp 753\u2013763","DOI":"10.1145\/3394486.3403118"},{"issue":"12","key":"1660_CR40","doi-asserted-by":"publisher","first-page":"1802","DOI":"10.14778\/3137765.3137784","volume":"10","author":"CCM Yeh","year":"2017","unstructured":"Yeh CCM, Kavantzas N, Keogh E (2017) Matrix profile IV: using weakly labeled time series to predict outcomes. Proc VLDB Endow 10(12):1802\u20131812","journal-title":"Proc VLDB Endow"},{"key":"1660_CR41","doi-asserted-by":"crossref","unstructured":"Zhao L, Ye J, Chen F, Lu CT, Ramakrishnan N (2016) Hierarchical incomplete multi-source feature learning for spatiotemporal event forecasting. In: Proceedings of the ACM international conference on knowledge discovery and data mining (SIGKDD), pp 2085\u20132094","DOI":"10.1145\/2939672.2939847"},{"issue":"23","key":"1660_CR42","doi-asserted-by":"publisher","first-page":"9294","DOI":"10.1016\/j.eswa.2015.08.021","volume":"42","author":"C Zhou","year":"2015","unstructured":"Zhou C, Cule B, Goethals B (2015) A pattern based predictor for event streams. Expert Syst Appl 42(23):9294\u20139306","journal-title":"Expert Syst Appl"},{"issue":"5","key":"1660_CR43","doi-asserted-by":"publisher","first-page":"1285","DOI":"10.1109\/TKDE.2015.2510010","volume":"28","author":"C Zhou","year":"2016","unstructured":"Zhou C, Cule B, Goethals B (2016) Pattern based sequence classification. IEEE Trans Knowl Data Eng 28(5):1285\u20131298","journal-title":"IEEE Trans Knowl Data Eng"}],"container-title":["Knowledge and Information Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10115-022-01660-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10115-022-01660-1\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10115-022-01660-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,9,19]],"date-time":"2024-09-19T20:41:28Z","timestamp":1726778488000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10115-022-01660-1"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,3,5]]},"references-count":43,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2022,4]]}},"alternative-id":["1660"],"URL":"https:\/\/doi.org\/10.1007\/s10115-022-01660-1","relation":{},"ISSN":["0219-1377","0219-3116"],"issn-type":[{"type":"print","value":"0219-1377"},{"type":"electronic","value":"0219-3116"}],"subject":[],"published":{"date-parts":[[2022,3,5]]},"assertion":[{"value":"18 August 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"18 January 2022","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"21 January 2022","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"5 March 2022","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}