{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,31]],"date-time":"2026-01-31T02:13:49Z","timestamp":1769825629433,"version":"3.49.0"},"reference-count":50,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2007,1,26]],"date-time":"2007-01-26T00:00:00Z","timestamp":1169769600000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/www.springer.com\/tdm"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Data Min Knowl Disc"],"published-print":{"date-parts":[[2007,2,24]]},"DOI":"10.1007\/s10618-006-0049-3","type":"journal-article","created":{"date-parts":[[2007,1,25]],"date-time":"2007-01-25T20:23:03Z","timestamp":1169756583000},"page":"99-129","source":"Crossref","is-referenced-by-count":73,"title":["Compression-based data mining of sequential data"],"prefix":"10.1007","volume":"14","author":[{"given":"Eamonn","family":"Keogh","sequence":"first","affiliation":[]},{"given":"Stefano","family":"Lonardi","sequence":"additional","affiliation":[]},{"given":"Chotirat Ann","family":"Ratanamahatana","sequence":"additional","affiliation":[]},{"given":"Li","family":"Wei","sequence":"additional","affiliation":[]},{"given":"Sang-Hee","family":"Lee","sequence":"additional","affiliation":[]},{"given":"John","family":"Handley","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2007,1,26]]},"reference":[{"issue":"1","key":"49_CR1","doi-asserted-by":"crossref","first-page":"43","DOI":"10.1016\/S0097-8485(00)80006-6","volume":"24","author":"L Allison","year":"2000","unstructured":"Allison L, Stern L, Edgoose T, Dix TI (2000) Sequence complexity for biological sequence analysis. Comput Chem 24(1):43\u201355","journal-title":"Comput Chem"},{"key":"49_CR2","unstructured":"Baronchelli A, Caglioti E, Loreto V (2005) Artificial sequences and complexity measures. J. Stat. Mech: Theory and Exp, Issue 04, P04002"},{"key":"49_CR3","doi-asserted-by":"crossref","first-page":"048702","DOI":"10.1103\/PhysRevLett.88.048702","volume":"88","author":"D Benedetto","year":"2002","unstructured":"Benedetto D, Caglioti E, Loreto V (2002) Language trees and zipping. Phys Rev Lett 88: 048702","journal-title":"Phys Rev Lett"},{"key":"49_CR4","doi-asserted-by":"crossref","unstructured":"Chakrabarti D, Papadimitriou S, Modha D, Faloutsos C (2004) Fully automatic cross-assocations, In: Proceedings of the KDD 2004, Seattle, WA","DOI":"10.1145\/1014052.1014064"},{"key":"49_CR5","volume-title":"Towards automated data linkage and deduplication","author":"P Christen","year":"2005","unstructured":"Christen P, Goiser K (2005) Towards automated data linkage and deduplication. Tech Report, Australian National University"},{"issue":"2","key":"49_CR6","doi-asserted-by":"crossref","first-page":"32","DOI":"10.1109\/5254.850825","volume":"15","author":"D Cook","year":"2000","unstructured":"Cook D, Holder LB (2000) Graph-based data mining. IEEE Intell Syst 15(2):32\u201341","journal-title":"IEEE Intell Syst"},{"key":"49_CR7","unstructured":"Dasgupta D, Forrest S (1999) Novelty detection in time series data using ideas from immunology. In: Proc. of the international conference on intelligent systems, Heidelberg, Germany"},{"key":"49_CR8","unstructured":"Domingos P (1998) A process-oriented heuristic for model selection. In: Machine learning Proc. of the fifteenth international conference,. Morgan Kaufmann Publishers, San Francisco, CA, pp 27\u2013135"},{"key":"49_CR9","doi-asserted-by":"crossref","unstructured":"Elkan, C (2001) Magical thinking in data mining: lessons from CoIL challenge 2000. In Proc. of SIGKDD 2001, San Francisco, CA, USA, pp 426\u2013431","DOI":"10.1145\/502512.502576"},{"key":"49_CR10","unstructured":"Elkan C (2003) Using the triangle inequality to accelerate k-Means. In: Proc. of ICML 2003, Washington DC, USA, pp 147\u2013153"},{"key":"49_CR11","doi-asserted-by":"crossref","unstructured":"Faloutsos C, Lin K (1995) FastMap: a fast algorithm for indexing, data-mining and visualization of traditional and multimedia datasets. In Proc. of 24th ACM SIGMOD, San Jose, CA, USA","DOI":"10.1145\/223784.223812"},{"key":"49_CR12","unstructured":"Farach M, Noordewier M, Savari S, Shepp L, Wyner A, Ziv J (1995) On the entropy of DNA: algorithms and measurements based on memory and rapid convergence. In: Proc. of the symp. on discrete algorithms, San Francisco, CA, USA pp 48-57"},{"key":"49_CR13","unstructured":"Ferrandina F, Meyer T, Zicari R (1994) Implementing lazy database updates for an object database system. In: Proc. of the 20 international conference on very large databases, Santiago de Chile, Chile, pp 261\u2013272"},{"key":"49_CR14","unstructured":"Flexer A (1996) Statistical evaluation of neural networks experiments: minimum requirements and current practice. In: Proc. of the 13th european meeting on cybernetics and systems research, vol. 2, Austria, pp 1005\u20131008"},{"key":"49_CR15","doi-asserted-by":"crossref","unstructured":"Frank E, Chui C, Witten I (2000) Text categorization using compression models. In: Proc. of the IEEE data compression conference, Snowbird, Utah, IEEE Comput Soc p555","DOI":"10.1109\/DCC.2000.838202"},{"key":"49_CR16","doi-asserted-by":"crossref","unstructured":"Gaussier E, Goutte C, Popat K, Chen F (2002) A hierarchical model for clustering and categorising documents source lecture notes in computer science; Vol. 2291 archive Proceedings of the 24th BCS-IRSG european colloquium on IR research: advances in information retrieval, Glasgow, UK","DOI":"10.1007\/3-540-45886-7_16"},{"key":"49_CR17","volume-title":"Information theory and the living systems","author":"L Gatlin","year":"1972","unstructured":"Gatlin L (1972) Information theory and the living systems. Columbia University Press, columbia"},{"key":"49_CR18","unstructured":"Gavrilov M, Anguelov D, Indyk P, Motwahl R (2000) Mining the stock market: which measure is best? In: Proc. of the 6th ACM SIGKDD, 2000, Boston, MA, USA"},{"key":"49_CR19","doi-asserted-by":"crossref","unstructured":"Ge X, Smyth P (2000) Deformable Markov model templates for time-series pattern matching. In: Proc. of the 6th ACM SIGKDD, Boston, MA, pp 81\u201390","DOI":"10.1145\/347090.347109"},{"issue":"23","key":"49_CR20","doi-asserted-by":"crossref","first-page":"e215","DOI":"10.1161\/01.CIR.101.23.e215","volume":"101","author":"A.L Goldberger","year":"2000","unstructured":"Goldberger A.L, Amaral L, Glass L, Hausdorff JM, Ivanov PCh, Mark RG, Mietus JE, Moody GB, Peng CK, Stanley HE (2000) PhysioBank, physioToolkit, and physioNet: components of a new research resource for complex physiologic signals. Circulation 101(23):e215\u2013e220","journal-title":"Circulation"},{"key":"49_CR21","doi-asserted-by":"crossref","unstructured":"Kalpakis K, Gada D, Puttagunta V (2001) Distance measures for effective clustering of ARIMA time-series. In: Proceedings of the 1st IEEE ICDM, San Jose, CA, pp 273-280","DOI":"10.1109\/ICDM.2001.989529"},{"key":"49_CR22","doi-asserted-by":"crossref","unstructured":"Kennel M (2004) Testing time symmetry in time series using data compression dictionaries. Phys Rev E 69; 056208","DOI":"10.1103\/PhysRevE.69.056208"},{"key":"49_CR23","unstructured":"Keogh E. http:\/\/www.cs.ucr.edu\/\u223ceamonn\/SIGKDD2004, University of California, Riverside"},{"key":"49_CR24","volume-title":"The UCR time series data mining archive","author":"E Keogh","year":"2002","unstructured":"Keogh E, Folias T (2002) The UCR time series data mining archive. University of California, Riverside CA [http:\/\/www.cs.ucr.edu\/\u223ceamonn\/TSDMA\/index.html]"},{"key":"49_CR25","doi-asserted-by":"crossref","unstructured":"Keogh E, Kasetty S (2002) On the need for time series data mining benchmarks: a survey and empirical demonstration. In: Proc. of SIGKDD, Edmonton, Alberta, Canada","DOI":"10.1145\/775047.775062"},{"key":"49_CR26","doi-asserted-by":"crossref","unstructured":"Keogh E, Lin J, Truppel W (2003) Clustering of time series subsequences is meaningless: implications for past and future research. In: Proc. of the 3rd IEEE ICDM, Melbourne, FL, pp 115\u2013122","DOI":"10.1109\/ICDM.2003.1250910"},{"key":"49_CR27","unstructured":"Kit C (1998) A goodness measure for phrase learning via compression with the MDL principle. In: Kruijff-Korbayova I (ed) The ELLSSI-98 student session, Chapt 13, Saarbrueken, pp 175\u2013187"},{"key":"49_CR28","doi-asserted-by":"crossref","first-page":"149","DOI":"10.1093\/bioinformatics\/17.2.149","volume":"17","author":"M Li","year":"2001","unstructured":"Li M, Badger JH, Chen X, Kwong S, Kearney P, Zhang H (2001) An information-based sequence distance and its application to whole mitochondrial genome phylogeny. Bioinformatics 17:149\u2013154","journal-title":"Bioinformatics"},{"key":"49_CR29","unstructured":"Li M, Chen X, Li X, Ma B, Vitanyi, P (2003) The similarity metric. In: Proc. of the fourteenth annual ACM-SIAM symposium on Discrete algorithms, Baltimore, MD, USA, pp 863\u2013872"},{"key":"49_CR30","doi-asserted-by":"crossref","unstructured":"Li M, Vitanyi P (1997) An introduction to kolmogorov complexity and its applications, 2nd edn, Springer Verlag, Berlin","DOI":"10.1007\/978-1-4757-2606-0"},{"key":"49_CR31","doi-asserted-by":"crossref","unstructured":"Lin J, Keogh E, Lonardi S, Chiu B (2003) A symbolic representation of time series, with implications for streaming algorithms. In: Proc. of the 8th ACM SIGMOD workshop on research issues in data mining and knowledge discovery, San Diego, CA","DOI":"10.1145\/882082.882086"},{"key":"49_CR32","unstructured":"Loewenstern D, Hirsh H, Yianilos P, Noordewier M (1995) DNA sequence classification using compression-based induction, DIMACS Technical Report 95-04"},{"key":"49_CR33","doi-asserted-by":"crossref","unstructured":"Loewenstern D, Yianilos PN (1999) Significantly lower entropy estimates for natural DNA sequences, J Comput Biol 6(1)","DOI":"10.1089\/cmb.1999.6.125"},{"key":"49_CR34","doi-asserted-by":"crossref","unstructured":"Ma J, Perkins S (2003) Online novelty detection on temporal sequences. In: Proc. international conference on knowledge discovery and data mining, Washington, DC","DOI":"10.1145\/956804.956828"},{"key":"49_CR35","unstructured":"Mahoney M, Chan P (2005) Learning rules for time series anomaly detection. SensorMiner Tech report (available at [www.interfacecontrol.com\/products\/sensorMiner\/])"},{"key":"49_CR36","unstructured":"Mehta M, Rissanen J, Agrawal R (1995) MDL-based decision tree pruning, In: Proceedings of the first international conference on knowledge discovery and data mining (KDD\u201995), Montreal, Canada"},{"key":"49_CR37","unstructured":"Needham S, Dowe D(2001) Message length as an effective ockham\u2019s razor in decision tree induction, In: Proc. 8th international workshop on AI and statistics, Key West, FL, USA, pp 253\u2013260"},{"key":"49_CR38","unstructured":"Ortega A, Beferull-Lozano B, Srinivasamurthy N, Xie H (2000) Compression for recognition and content based retrieval. In: Proc. of the European signal processing conference, EUSIPCO\u201900, Tampere, Finland"},{"key":"49_CR39","doi-asserted-by":"crossref","unstructured":"Papadimitriou S, Gionis A, Tsaparas P, V\u00e4is\u00e4nen A, Mannila H, Faloutsos C (2005) Parameter-free spatial data mining using MDL, In: Proc of the 5th International Conference on Data Mining (ICDM), Houston, TX, USA","DOI":"10.1109\/ICDM.2005.117"},{"key":"49_CR40","doi-asserted-by":"crossref","first-page":"227","DOI":"10.1016\/0890-5401(89)90010-2","volume":"80","author":"JR Quinlan","year":"1989","unstructured":"Quinlan JR, Rivest RL (1989) Inferring decision trees using the minimum description length principle. Infor Comput 80:227\u2013248","journal-title":"Infor Comput"},{"key":"49_CR41","doi-asserted-by":"crossref","unstructured":"Ratanamahatana CA, Keogh E (2004) Making time-series classification more accurate using learned constraints. In: Proc. of SIAM international conference on data mining (SDM \u201904), Lake Buena Vista, Florida","DOI":"10.1137\/1.9781611972740.2"},{"key":"49_CR42","doi-asserted-by":"crossref","first-page":"465","DOI":"10.1016\/0005-1098(78)90005-5","volume":"14","author":"J Rissanen","year":"1978","unstructured":"Rissanen J (1978) Modeling by shortest data description. Automatica, 14:465\u2013471","journal-title":"Automatica,"},{"issue":"3","key":"49_CR43","doi-asserted-by":"crossref","first-page":"317","DOI":"10.1023\/A:1009752403260","volume":"1","author":"SL Salzberg","year":"1997","unstructured":"Salzberg SL (1997) On comparing classifiers: Pitfalls to avoid and a recommended approach. Data Min Knowl Disc 1(3):317\u2013328","journal-title":"Data Min Knowl Disc"},{"key":"49_CR44","doi-asserted-by":"crossref","unstructured":"Segen J (1990) Graph clustering and model learning by data compression. In: Proc. of the machine learning conference, Austin, TX, USA, pp 93\u2013101","DOI":"10.1016\/B978-1-55860-141-3.50015-8"},{"key":"49_CR45","doi-asserted-by":"crossref","unstructured":"Sculley D, Brodley CE (2006) Compression and machine learning: a new perspective on feature space vectors, In: Proceedings of data compression conference, Snowbird, UT, USA, pp 332\u2013341","DOI":"10.1109\/DCC.2006.13"},{"key":"49_CR46","doi-asserted-by":"crossref","unstructured":"Shahabi C, Tian X, Zhao W (2000) TSA-tree: a wavelet-based approach to improve the efficiency of multi-level surprise and trend queries. In: Proc. of the 12th Int\u2019l conference on scientific and statistical database management (SSDBM 2000), Berlin, Germany","DOI":"10.1109\/SSDM.2000.869778"},{"key":"49_CR47","doi-asserted-by":"crossref","first-page":"375","DOI":"10.1162\/089120100561746","volume":"26","author":"WJ Teahan","year":"2000","unstructured":"Teahan WJ, Wen Y, McNab RJ, Witten IH (2000) A compression-based algorithm for Chinese word segmentation. Comput Linguist 26:375\u2013393","journal-title":"Comput Linguist"},{"key":"49_CR48","doi-asserted-by":"crossref","unstructured":"Vlachos M, Hadjieleftheriou M, Gunopulos D, Keogh E (2003) Indexing multi-dimensional time-series with support for multiple distance measures. In: Proc. of the 9th ACM SIGKDD, Washington, DC, USA, pp 216\u2013225","DOI":"10.1145\/956750.956777"},{"issue":"(2","key":"49_CR49","doi-asserted-by":"crossref","first-page":"185","DOI":"10.1093\/comjnl\/11.2.185","volume":"11","author":"C Wallace","year":"1968","unstructured":"Wallace C, Boulton (1968) An information measure for classification. Comput J 11 (2):185\u2013194","journal-title":"Comput J"},{"key":"49_CR50","unstructured":"Yairi T, Kato Y, Hori K (2001) Fault detection by mining association rules from house-keeping data. In: Proc. of Int\u2019l sym. on AI, Robotics and Automation in Space"}],"container-title":["Data Mining and Knowledge Discovery"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/s10618-006-0049-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1007\/s10618-006-0049-3\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/s10618-006-0049-3","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2019,5,30]],"date-time":"2019-05-30T19:29:39Z","timestamp":1559244579000},"score":1,"resource":{"primary":{"URL":"http:\/\/link.springer.com\/10.1007\/s10618-006-0049-3"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2007,1,26]]},"references-count":50,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2007,2,24]]}},"alternative-id":["49"],"URL":"https:\/\/doi.org\/10.1007\/s10618-006-0049-3","relation":{},"ISSN":["1384-5810","1573-756X"],"issn-type":[{"value":"1384-5810","type":"print"},{"value":"1573-756X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2007,1,26]]}}}