{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,2]],"date-time":"2025-08-02T18:04:57Z","timestamp":1754157897533,"version":"3.41.2"},"reference-count":42,"publisher":"Emerald","issue":"3","license":[{"start":{"date-parts":[[2009,8,28]],"date-time":"2009-08-28T00:00:00Z","timestamp":1251417600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/www.emerald.com\/insight\/site-policies"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2009,8,28]]},"abstract":"<jats:sec><jats:title content-type=\"abstract-heading\">Purpose<\/jats:title><jats:p>Web users' clustering is an important mining task since it contributes to identifying usage patterns, a beneficial task for a wide range of applications that rely on the web. The purpose of this paper is to examine the use of Kullback\u2010Leibler (KL) divergence, an information\u2010theoretic distance, as an alternative for measuring distances in web users clustering.<\/jats:p><\/jats:sec><jats:sec><jats:title content-type=\"abstract-heading\">Design\/methodology\/approach<\/jats:title><jats:p>KL\u2010divergence is compared with other well\u2010known distance measures, and clustering results are evaluated using a criterion function, validity indices, and graphical representations. Furthermore, the impact of noise (i.e. occasional or mistaken page visits) is evaluated, since it is imperative to assess whether a clustering process exhibits tolerance in noisy environments such as the web.<\/jats:p><\/jats:sec><jats:sec><jats:title content-type=\"abstract-heading\">Findings<\/jats:title><jats:p>The proposed KL clustering approach performs similarly to other distance measures under both synthetic and real data workloads. 
Moreover, when extra noise is imposed on real data, the approach shows less deterioration than most of the other conventional distance measures.<\/jats:p><\/jats:sec><jats:sec><jats:title content-type=\"abstract-heading\">Practical implications<\/jats:title><jats:p>The experimental results show that a probabilistic measure such as KL\u2010divergence is quite efficient in noisy environments and thus constitutes a good alternative for the web users clustering problem.<\/jats:p><\/jats:sec><jats:sec><jats:title content-type=\"abstract-heading\">Originality\/value<\/jats:title><jats:p>This work is inspired by the use of divergence in clustering biological data, and the authors introduce it to the area of web clustering. According to the experimental results presented in this paper, KL\u2010divergence can be considered a good alternative for measuring distances in noisy environments such as the web.<\/jats:p><\/jats:sec>","DOI":"10.1108\/17440080910983583","type":"journal-article","created":{"date-parts":[[2009,10,5]],"date-time":"2009-10-05T10:55:23Z","timestamp":1254740123000},"page":"348-371","source":"Crossref","is-referenced-by-count":5,"title":["A new approach to web users clustering and validation: a divergence\u2010based scheme"],"prefix":"10.1108","volume":"5","author":[{"given":"Vassiliki A.","family":"Koutsonikola","sequence":"first","affiliation":[]},{"given":"Sophia G.","family":"Petridou","sequence":"additional","affiliation":[]},{"given":"Athena I.","family":"Vakali","sequence":"additional","affiliation":[]},{"given":"Georgios I.","family":"Papadimitriou","sequence":"additional","affiliation":[]}],"member":"140","reference":[{"key":"key2022012519550593800_b2","unstructured":"Baeza\u2010Yates, R. and Frakes, W. (1992), Information Retrieval: Data Structures and Algorithms, Prentice\u2010Hall, Upper Saddle River, NJ."},{"key":"key2022012519550593800_b3","unstructured":"Boutin, F. and Hasco\u00ebt, M. 
(2004), \u201cCluster validity indices for graph partitioning\u201d, Proceedings of the 8th IEEE International Conference on Information Visualisation, London, pp. 376\u201081."},{"key":"key2022012519550593800_b4","unstructured":"Cadez, I., Heckerman, D., Meek, C., Smyth, P. and White, S. (2002), \u201cVisualization of navigation patterns on a website using model\u2010based clustering\u201d, Technical Report MSR\u2010TR\u201000\u201018, Microsoft Research."},{"key":"key2022012519550593800_b5","doi-asserted-by":"crossref","unstructured":"Castellano, G., Fanelli, A.M., Mencar, C. and Torsello, M.A. (2007), \u201cSimilarity\u2010based fuzzy clustering for user profiling\u201d, IEEE\/WIC\/ACM International Conferences on Web Intelligence and Intelligent Agent Technology Workshops, pp. 75\u20108.","DOI":"10.1109\/WI-IATW.2007.32"},{"key":"key2022012519550593800_b6","doi-asserted-by":"crossref","unstructured":"Charikar, M., Guha, S., Tardos, E. and Shmoys, D. (1999), \u201cA constant\u2010factor approximation algorithm for the k\u2010median problem\u201d, Proceedings of the 31st Annual ACM Symposium on Theory of Computing (STOC), ACM, Atlanta, GA, May 1\u20104, pp. 1\u201010.","DOI":"10.1145\/301250.301257"},{"key":"key2022012519550593800_b7","doi-asserted-by":"crossref","unstructured":"Cover, T. and Thomas, J. (1991), Elements of Information Theory, Wiley, New York, NY.","DOI":"10.1002\/0471200611"},{"key":"key2022012519550593800_b8","doi-asserted-by":"crossref","unstructured":"Davies, D. and Bouldin, D. (1979), \u201cA cluster separation measure\u201d, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 1 No. 4, pp. 224\u20107.","DOI":"10.1109\/TPAMI.1979.4766909"},{"key":"key2022012519550593800_b9","doi-asserted-by":"crossref","unstructured":"Dhillon, I., Mallela, S. and Kumar, R. 
(2002), \u201cEnhanced word clustering for hierarchical text classification\u201d, Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), ACM, Edmonton, Canada, July 23\u201026, pp. 191\u2010200.","DOI":"10.1145\/775047.775076"},{"key":"key2022012519550593800_b10","doi-asserted-by":"crossref","unstructured":"Dunn, J.C. (1974), \u201cWell\u2010separated clusters and optimal fuzzy partitions\u201d, Journal of Cybernetics, Vol. 4 No. 3, pp. 95\u2010104.","DOI":"10.1080\/01969727408546059"},{"key":"key2022012519550593800_b12","unstructured":"Fung, G. (2001), \u201cA comprehensive overview of basic clustering algorithms\u201d, Technical Report, University of Wisconsin, Madison, WI."},{"key":"key2022012519550593800_b13","unstructured":"Garey, M. and Johnson, D. (1979), Computers and Intractability: A Guide to the Theory of NP\u2010Completeness, W.H. Freeman, New York, NY."},{"key":"key2022012519550593800_b14","doi-asserted-by":"crossref","unstructured":"Gunter, S. and Bunke, H. (2003), \u201cValidation indices for graph clustering\u201d, Pattern Recognition Letters, Vol. 24 No. 8, pp. 1107\u201013.","DOI":"10.1016\/S0167-8655(02)00257-X"},{"key":"key2022012519550593800_b15","doi-asserted-by":"crossref","unstructured":"Halkidi, M., Batistakis, Y. and Vazirgiannis, M. (2002a), \u201cCluster validity methods: part I\u201d, SIGMOD Record, Vol. 31 No. 2, pp. 40\u20105.","DOI":"10.1145\/565117.565124"},{"key":"key2022012519550593800_b16","doi-asserted-by":"crossref","unstructured":"Halkidi, M., Batistakis, Y. and Vazirgiannis, M. (2002b), \u201cClustering validity checking methods: part II\u201d, SIGMOD Record, Vol. 31 No. 3, pp. 19\u201027.","DOI":"10.1145\/601858.601862"},{"key":"key2022012519550593800_b17","doi-asserted-by":"crossref","unstructured":"Jain, A., Murty, M. and Flynn, P. (1999), \u201cData clustering: a review\u201d, ACM Computing Surveys, Vol. 31 No. 3, pp. 
264\u2010323.","DOI":"10.1145\/331499.331504"},{"key":"key2022012519550593800_b18","doi-asserted-by":"crossref","unstructured":"Kasturi, J., Acharya, R. and Ramanathan, M. (2003), \u201cAn information theoretic approach for analyzing temporal patterns of gene expression\u201d, Bioinformatics, Vol. 19 No. 4, pp. 449\u201058.","DOI":"10.1093\/bioinformatics\/btg020"},{"key":"key2022012519550593800_b19","doi-asserted-by":"crossref","unstructured":"Kerr, M. and Churchill, A. (2001), \u201cBootstrapping cluster analysis: assessing the reliability of conclusions from microarray experiments\u201d, Proceedings of the National Academy of Sciences of the United States of America, Vol. 98, pp. 8961\u20105.","DOI":"10.1073\/pnas.161273698"},{"key":"key2022012519550593800_b20","unstructured":"Lazzerini, B., Marcelloni, F. and Cococcioni, M. (2003), \u201cA system based on hierarchical fuzzy clustering for web users profiling\u201d, IEEE International Conference on Systems, Man and Cybernetics, pp. 1995\u20102000."},{"key":"key2022012519550593800_b21","unstructured":"Liu, X., He, P. and Yang, Q. (2005), \u201cMining user access patterns based on web logs\u201d, Canadian Conference on Electrical and Computer Engineering, pp. 2280\u20103."},{"key":"key2022012519550593800_b23","unstructured":"MacQueen, J. (1967), \u201cSome methods for classification and analysis of multivariate observations\u201d, Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, CA, June\u2010July, Vol. 1, pp. 281\u201097."},{"key":"key2022012519550593800_b24","unstructured":"Mojica, J.A., Rojas, D.A., Gomez, J. and Gonzalez, F. (2005), \u201cPage clustering using a distance based algorithm\u201d, paper presented at the 3rd Latin American Web Congress, October 31\u2010November 2, p. 7."},{"key":"key2022012519550593800_b25","doi-asserted-by":"crossref","unstructured":"Petridou, S., Koutsonikola, V., Vakali, A. and Papadimitriou, G. 
(2006), \u201cA divergence\u2010oriented approach for web users clustering\u201d, Proceedings of the International Conference on Computational Science and its Applications (ICCSA 2006), Glasgow, pp. 1229\u201038.","DOI":"10.1007\/11751588_130"},{"key":"key2022012519550593800_b26","doi-asserted-by":"crossref","unstructured":"Petridou, S., Koutsonikola, V., Vakali, A. and Papadimitriou, G. (2008), \u201cTime aware web users clustering\u201d, IEEE Transactions on Knowledge and Data Engineering, Vol. 20 No. 5, pp. 653\u201067.","DOI":"10.1109\/TKDE.2007.190741"},{"key":"key2022012519550593800_b29","doi-asserted-by":"crossref","unstructured":"Shokry, R.A., Saad, A.A., El\u2010Makkey, N.M. and Ismail, M.A. (2006), \u201cUsing new soft clustering technique in adaptive web site\u201d, IEEE\/WIC\/ACM International Conference on Web Intelligence and Intelligent Agent Technology, pp. 281\u20106.","DOI":"10.1109\/WI-IATW.2006.140"},{"key":"key2022012519550593800_b30","doi-asserted-by":"crossref","unstructured":"Srivastava, J., Cooley, R., Deshpande, M. and Tan, P.\u2010N. (2000), \u201cWeb usage mining: discovery and applications of usage patterns from web data\u201d, SIGKDD Explorations, Vol. 1 No. 2, pp. 12\u201023.","DOI":"10.1145\/846183.846188"},{"key":"key2022012519550593800_b31","unstructured":"Stein, B., Meyer zu Eissen, S. and Wibrock, F. (2003), \u201cOn cluster validity and the information need of users\u201d, 3rd IASTED International Conference on Artificial Intelligence and Applications (AIA 03), ACTA Press, Benalm\u00e1dena, Spain, September 8\u201010, pp. 216\u201021."},{"key":"key2022012519550593800_b32","unstructured":"Sturn, A. (2001), \u201cCluster analysis for large scale gene expression studies\u201d, Master's thesis, Graz University of Technology, Graz."},{"key":"key2022012519550593800_b34","doi-asserted-by":"crossref","unstructured":"Vakali, A., Pokorny, J. and Dalamagas, T. 
(2004), \u201cAn overview of web data clustering practices\u201d, Proceedings of the EDBT 2004 Workshop, Lecture Notes in Computer Science (LNCS) Series, Springer Verlag, Heraklion, pp. 597\u2010606.","DOI":"10.1007\/978-3-540-30192-9_59"},{"key":"key2022012519550593800_b41","doi-asserted-by":"crossref","unstructured":"Vakali, A., Pallis, G. and Angelis, L. (2006), \u201cClustering web information sources\u201d, in Vakali, A. and Pallis, G. (Eds), Web Data Management Practices: Emerging Techniques and Technologies, Idea Group Publishing, Hershey, PA, pp. 34\u201055.","DOI":"10.4018\/978-1-59904-228-2.ch002"},{"key":"key2022012519550593800_b35","doi-asserted-by":"crossref","unstructured":"Wijaya, D.T. and Bressan, S. (2006), \u201cClustering web documents using co\u2010citation, coupling, incoming, and outgoing hyperlinks: a comparative performance analysis of algorithms\u201d, International Journal of Web Information Systems, Vol. 2 No. 2, pp. 69\u201076.","DOI":"10.1108\/17440080680000102"},{"key":"key2022012519550593800_b42","doi-asserted-by":"crossref","unstructured":"Xu, R. and Wunsch, D.I. (2005), \u201cSurvey of clustering algorithms\u201d, IEEE Transactions on Neural Networks, Vol. 16 No. 3, pp. 645\u201078.","DOI":"10.1109\/TNN.2005.845141"},{"key":"key2022012519550593800_b36","doi-asserted-by":"crossref","unstructured":"Yang, Y. and Padmanabhan, B. (2005), \u201cGHIC: a hierarchical pattern\u2010based clustering algorithm for grouping web transactions\u201d, IEEE Transactions on Knowledge and Data Engineering, Vol. 17 No. 9, pp. 1300\u20104.","DOI":"10.1109\/TKDE.2005.145"},{"key":"key2022012519550593800_b37","unstructured":"Zeng, H.\u2010J., Chen, Z. and Ma, W.\u2010Y. (2002), \u201cA unified framework for clustering heterogeneous web objects\u201d, Proceedings of the 3rd International Conference on Web Information Systems Engineering, IEEE Press, Grand Hyatt, Singapore, pp. 
161\u201072."},{"key":"key2022012519550593800_b38","doi-asserted-by":"crossref","unstructured":"Zhang, K. (1995), \u201cAlgorithms for the constrained editing distance between ordered labeled trees and related problems\u201d, Pattern Recognition, Vol. 28 No. 3, pp. 463\u201074.","DOI":"10.1016\/0031-3203(94)00109-Y"},{"key":"key2022012519550593800_b40","doi-asserted-by":"crossref","unstructured":"Zhao, Y. and Karypis, G. (2005), \u201cTopic\u2010driven clustering for document datasets\u201d, Proceedings of the SIAM International Conference on Data Mining, Newport Beach, CA, pp. 358\u201069.","DOI":"10.1137\/1.9781611972757.32"},{"key":"key2022012519550593800_frd1","unstructured":"Agrawal, R. and Srikant, R. (1994), \u201cFast algorithms for mining association rules\u201d, Proceedings of the 20th International Conference on Very Large Data Bases, Santiago, pp. 487\u201099."},{"key":"key2022012519550593800_frd2","doi-asserted-by":"crossref","unstructured":"Eirinaki, M. and Vazirgiannis, M. (2003), \u201cWeb mining for web personalization\u201d, ACM Transactions on Internet Technology, Vol. 3 No. 1, pp. 1\u201027.","DOI":"10.1145\/643477.643478"},{"key":"key2022012519550593800_frd3","unstructured":"Mannila, H., Toivonen, H. and Verkamo, A. (1995), \u201cDiscovering frequent episodes in sequences\u201d, Proceedings of the 1st International Conference on Knowledge Discovery and Data Mining (KDD\u201995), American Association for Artificial Intelligence, Menlo Park, CA, pp. 210\u20105."},{"key":"key2022012519550593800_frd4","doi-asserted-by":"crossref","unstructured":"Schockaert, S., De Cock, M., Cornelis, C. and Kerre, E.E. (2007), \u201cClustering web search results using fuzzy ants\u201d, International Journal of Intelligent Systems, Vol. 22 No. 5, pp. 455\u201074.","DOI":"10.1002\/int.20209"},{"key":"key2022012519550593800_frd5","doi-asserted-by":"crossref","unstructured":"Shen, D., Chen, Z., Yang, Q., Zeng, H.\u2010J., Zhang, B., Lu, Y. and Ma, W.\u2010Y. 
(2004), \u201cText classification: web\u2010page classification through summarization\u201d, Proceedings of the 27th ACM International Conference on Research and Development in Information Retrieval, SIGIR'04, Sheffield, July 25\u201029, pp. 242\u20109.","DOI":"10.1145\/1008992.1009035"},{"key":"key2022012519550593800_frd6","doi-asserted-by":"crossref","unstructured":"Vakali, A. and Papadimitriou, G. (2004), \u201cWeb engineering: the evolution of new technologies\u201d, Guest Editorial in IEEE Computing in Science and Engineering, Vol. 6 No. 4, pp. 10\u201011.","DOI":"10.1109\/MCSE.2004.11"},{"key":"key2022012519550593800_frd7","doi-asserted-by":"crossref","unstructured":"Zhang, Y.J. and Liu, Z.Q. (2004), \u201cRefining web search engine results using incremental clustering\u201d, International Journal of Intelligent Systems, Vol. 19 Nos 1\/2, pp. 191\u20109.","DOI":"10.1002\/int.10161"}],"container-title":["International Journal of Web Information Systems"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/www.emeraldinsight.com\/doi\/full-xml\/10.1108\/17440080910983583","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.emerald.com\/insight\/content\/doi\/10.1108\/17440080910983583\/full\/xml","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.emerald.com\/insight\/content\/doi\/10.1108\/17440080910983583\/full\/html","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,25]],"date-time":"2025-07-25T00:24:59Z","timestamp":1753403099000},"score":1,"resource":{"primary":{"URL":"http:\/\/www.emerald.com\/ijwis\/article\/5\/3\/348-371\/164881"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2009,8,28]]},"references-count":42,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2009,8,28]]}},"alternative-id":["10.1108\/17440080910
983583"],"URL":"https:\/\/doi.org\/10.1108\/17440080910983583","relation":{},"ISSN":["1744-0084"],"issn-type":[{"type":"print","value":"1744-0084"}],"subject":[],"published":{"date-parts":[[2009,8,28]]}}}