{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,29]],"date-time":"2025-09-29T11:50:09Z","timestamp":1759146609201},"reference-count":33,"publisher":"IGI Global","issue":"4","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2010,10,1]]},"abstract":"<p>In many data mining applications, both classification and clustering algorithms require a distance\/similarity measure. The central problem in similarity based clustering\/classification comprising sequential data is deciding an appropriate similarity metric. The existing metrics like Euclidean, Jaccard, Cosine, and so forth do not exploit the sequential nature of data explicitly. In this paper, the authors propose a similarity preserving function called Sequence and Set Similarity Measure (S3M) that captures both the order of occurrence of items in sequences and the constituent items of sequences. The authors demonstrate the usefulness of the proposed measure for classification and clustering tasks. Experiments were conducted on benchmark datasets, that is, DARPA\u201998 and msnbc, for classification task in intrusion detection and clustering task in web mining domains. Results show the usefulness of the proposed measure.<\/p>","DOI":"10.4018\/jdwm.2010100102","type":"journal-article","created":{"date-parts":[[2011,2,15]],"date-time":"2011-02-15T20:14:43Z","timestamp":1297800883000},"page":"16-32","source":"Crossref","is-referenced-by-count":13,"title":["A New Similarity Metric for Sequential Data"],"prefix":"10.4018","volume":"6","author":[{"given":"Pradeep","family":"Kumar","sequence":"first","affiliation":[{"name":"Indian Institute of Management, India"}]},{"given":"Bapi S.","family":"Raju","sequence":"additional","affiliation":[{"name":"University of Hyderabad, India"}]},{"given":"P. 
Radha","family":"Krishna","sequence":"additional","affiliation":[{"name":"Infosys Technologies Limited, Hyderabad, India"}]}],"member":"2432","reference":[{"issue":"4","key":"jdwm.2010100102-0","doi-asserted-by":"crossref","first-page":"62","DOI":"10.4018\/jdwm.2008100104","article-title":"Effectiveness of fuzzy classifier rules in capturing correlations between genes.","volume":"4","author":"M.Alshalalfa","year":"2008","journal-title":"International Journal of Data Warehousing and Mining"},{"key":"jdwm.2010100102-1","doi-asserted-by":"publisher","DOI":"10.1007\/BF01840365"},{"key":"jdwm.2010100102-2","unstructured":"Bergroth, L., Hakonen, H., & Raita, T. (2000). A survey of longest common subsequence algorithm. In Proceedings of the Seventh International Symposium on String Processing and Information Retrieval SPIRE, Atlanta (pp. 39-48). Washington, DC: IEEE Computer Society."},{"key":"jdwm.2010100102-3","author":"R. O.Duda","year":"2001","journal-title":"Pattern Classification"},{"key":"jdwm.2010100102-4","doi-asserted-by":"publisher","DOI":"10.1145\/360825.360861"},{"key":"jdwm.2010100102-5","doi-asserted-by":"publisher","DOI":"10.1007\/BF01934514"},{"key":"jdwm.2010100102-6","doi-asserted-by":"publisher","DOI":"10.1145\/359581.359603"},{"key":"jdwm.2010100102-7","doi-asserted-by":"publisher","DOI":"10.1016\/0167-8655(85)90061-3"},{"key":"jdwm.2010100102-8","doi-asserted-by":"publisher","DOI":"10.1145\/245108.245126"},{"issue":"1","key":"jdwm.2010100102-9","doi-asserted-by":"crossref","first-page":"29","DOI":"10.4018\/jdwm.2007010102","article-title":"SeqPAM: A sequence clustering algorithm for web personalization.","volume":"3","author":"P.Kumar","year":"2007","journal-title":"International Journal of Data Warehousing and Mining"},{"key":"jdwm.2010100102-10","doi-asserted-by":"publisher","DOI":"10.1016\/j.datak.2007.01.003"},{"key":"jdwm.2010100102-11","doi-asserted-by":"crossref","unstructured":"Kumar, P., Rao, M. V., Krishna, P. R., & Bapi, R. S. (2005). 
Using sub-sequence information with kNN for classification of sequential data. In International Conference on Distributed Computing and Internet Technology (LNCS, pp. 536-546). New York: Springer.","DOI":"10.1007\/11604655_60"},{"issue":"7","key":"jdwm.2010100102-12","first-page":"707","article-title":"Binary codes capable of correcting deletions, insertions, and reversals.","volume":"10","author":"L. I.Levenshtein","year":"1966","journal-title":"Soviet Physics, Doklady"},{"key":"jdwm.2010100102-13","unstructured":"Liao, Y., & Vemuri, V. R. (2002). Using text categorization techniques for intrusion detection. In Proceedings of the 11th USENIX Security Symposium, Berkeley, CA (pp. 51-59). USENIX Association."},{"key":"jdwm.2010100102-14","doi-asserted-by":"publisher","DOI":"10.1016\/0022-0000(80)90002-1"},{"key":"jdwm.2010100102-15","author":"T. M.Mitchell","year":"1997","journal-title":"Machine learning"},{"issue":"2","key":"jdwm.2010100102-16","doi-asserted-by":"crossref","first-page":"55","DOI":"10.4018\/jdwm.2006040103","article-title":"Improving similarity search in time series using wavelets.","volume":"2","author":"S. I. L.Mohamad","year":"2006","journal-title":"International Journal of Data Warehousing and Mining"},{"key":"jdwm.2010100102-17","author":"D. W.Mount","year":"2004","journal-title":"Bioinformatics: Sequence and Genome Analysis"},{"key":"jdwm.2010100102-18","doi-asserted-by":"publisher","DOI":"10.1007\/BF01840446"},{"key":"jdwm.2010100102-19","doi-asserted-by":"publisher","DOI":"10.1007\/BF00264437"},{"issue":"2","key":"jdwm.2010100102-20","doi-asserted-by":"crossref","first-page":"63","DOI":"10.4018\/jdwm.2008040108","article-title":"Classification of imbalanced data with random sets and mean-variance filtering.","volume":"4","author":"V.Nikulin","year":"2008","journal-title":"International Journal of Data Warehousing and Mining"},{"key":"jdwm.2010100102-21","doi-asserted-by":"crossref","unstructured":"Paterson, M., & Danc\u2019ik, V. (1994). 
Longest common subsequences. In Proceedings of the 19th International Symposium Mathematical Foundations of Computer Science, Kosice, Slovakia (LNCS, pp. 127-142). Berlin: Springer Verlag.","DOI":"10.1007\/3-540-58338-6_63"},{"issue":"1","key":"jdwm.2010100102-22","first-page":"43","article-title":"Intrusion detection using processing techniques with a binary-weighted cosine metric.","volume":"1","author":"S.Rawat","year":"2006","journal-title":"Journal of Information Assurance and Security"},{"key":"jdwm.2010100102-23","doi-asserted-by":"crossref","unstructured":"Rick. (1994). New algorithms for the longest common subsequence problem (Tech. Rep. No. 85123-CS). Bonn, Germany: University of Bonn, Department of Computer Science.","DOI":"10.1007\/3-540-60044-2_53"},{"key":"jdwm.2010100102-24","unstructured":"Sankoff, D., & Kruskal, J. B. (1983). Time warps, string edits, and macromolecules: The theory and practice of sequence comparison. Reading, MA: Addison-Wesley."},{"issue":"3","key":"jdwm.2010100102-25","doi-asserted-by":"crossref","first-page":"28","DOI":"10.4018\/jdwm.2007070103","article-title":"A single pass algorithm for discovering significant intervals in time-series data.","volume":"3","author":"S.Savla","year":"2007","journal-title":"International Journal of Data Warehousing and Mining"},{"key":"jdwm.2010100102-26","unstructured":"Shahabi, C., Zarkesh, A. M., Adibi, J., & Shah, V. (1997). Knowledge discovery from user\u2019s web-page navigation. In Proceedings of the 7th International Workshop on Research Issues in Data Engineering, High Performance Database Management for Large-Scale Applications, Birmingham, England (pp. 20-29). 
Washington, DC: IEEE Computer Society."},{"key":"jdwm.2010100102-27","first-page":"79","article-title":"Sequence comparison: some theory and some practice","author":"I.Simon","year":"1987","journal-title":"Electronic Dictionaries and Automata in Computational Linguistics, Saint Pierre d\u2019Oleron"},{"issue":"4","key":"jdwm.2010100102-28","doi-asserted-by":"crossref","first-page":"83","DOI":"10.4018\/jdwm.2007100105","article-title":"Acquiring semantic sibling associations from web documents.","volume":"3","author":"M.Spiliopoulou","year":"2007","journal-title":"International Journal of Data Warehousing and Mining"},{"issue":"1","key":"jdwm.2010100102-29","doi-asserted-by":"crossref","first-page":"1","DOI":"10.3233\/HIS-2006-3101","article-title":"Maximum-entropy estimated distribution model for classification problems.","volume":"3","author":"L.Tan","year":"2006","journal-title":"International Journal Hybrid Intelligent System"},{"key":"jdwm.2010100102-30","doi-asserted-by":"publisher","DOI":"10.1145\/321796.321811"},{"key":"jdwm.2010100102-31","doi-asserted-by":"crossref","unstructured":"Yan, T. W., Jacobsen, M., Molina, H. G., & Dayal, U. (1996). From user access patterns to dynamic hypertext linking. In Proceedings of the fifth international World Wide Web conference on Computer networks and ISDN systems (pp. 1007-1014). 
Amsterdam: Elsevier.","DOI":"10.1016\/0169-7552(96)00051-7"},{"issue":"4","key":"jdwm.2010100102-32","doi-asserted-by":"crossref","first-page":"84","DOI":"10.4018\/jdwm.2008100105","article-title":"A Graph-Based Biomedical Literature Clustering Approach Utilizing Term\u2019s Global and Local Importance Information.","volume":"4","author":"X.Zhang","year":"2008","journal-title":"International Journal of Data Warehousing and Mining"}],"container-title":["International Journal of Data Warehousing and Mining"],"original-title":[],"language":"ng","link":[{"URL":"https:\/\/www.igi-global.com\/viewtitle.aspx?TitleId=46941","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,6,1]],"date-time":"2022-06-01T22:51:27Z","timestamp":1654123887000},"score":1,"resource":{"primary":{"URL":"https:\/\/services.igi-global.com\/resolvedoi\/resolve.aspx?doi=10.4018\/jdwm.2010100102"}},"subtitle":[""],"short-title":[],"issued":{"date-parts":[[2010,10,1]]},"references-count":33,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2010,10]]}},"URL":"https:\/\/doi.org\/10.4018\/jdwm.2010100102","relation":{},"ISSN":["1548-3924","1548-3932"],"issn-type":[{"value":"1548-3924","type":"print"},{"value":"1548-3932","type":"electronic"}],"subject":[],"published":{"date-parts":[[2010,10,1]]}}}