{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,28]],"date-time":"2025-10-28T03:11:18Z","timestamp":1761621078926,"version":"3.41.0"},"reference-count":101,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2011,10,1]],"date-time":"2011-10-01T00:00:00Z","timestamp":1317427200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Comput. Surv."],"published-print":{"date-parts":[[2011,10]]},"abstract":"<jats:p>In the last few years we have observed a proliferation of approaches for clustering XML documents and schemas based on their structure and content. The presence of such a huge amount of approaches is due to the different applications requiring the clustering of XML data. These applications need data in the form of similar contents, tags, paths, structures, and semantics. In this article, we first outline the application contexts in which clustering is useful, then we survey approaches so far proposed relying on the abstract representation of data (instances or schema), on the identified similarity measure, and on the clustering algorithm. In this presentation, we aim to draw a taxonomy in which the current approaches can be classified and compared. We aim at introducing an integrated view that is useful when comparing XML data clustering approaches, when developing a new clustering algorithm, and when implementing an XML clustering component. 
Finally, the article moves into the description of future trends and research issues that still need to be faced.<\/jats:p>","DOI":"10.1145\/1978802.1978804","type":"journal-article","created":{"date-parts":[[2011,10,18]],"date-time":"2011-10-18T13:01:58Z","timestamp":1318942918000},"page":"1-41","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":45,"title":["XML data clustering"],"prefix":"10.1145","volume":"43","author":[{"given":"Alsayed","family":"Algergawy","sequence":"first","affiliation":[{"name":"Magdeburg University, Magdeburg, Germany"}]},{"given":"Marco","family":"Mesiti","sequence":"additional","affiliation":[{"name":"University of Milano, Milano, Italy"}]},{"given":"Richi","family":"Nayak","sequence":"additional","affiliation":[{"name":"Queensland University of Technology, Brisbane, Australia"}]},{"given":"Gunter","family":"Saake","sequence":"additional","affiliation":[{"name":"Magdeburg University, Magdeburg, Germany"}]}],"member":"320","published-online":{"date-parts":[[2011,10,18]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/1281192.1281201"},{"volume-title":"Proceedings of the 1st International Workshop Model-Based Software and Data Integration (MBSDI'08)","author":"Algergawy A.","key":"e_1_2_1_2_1","unstructured":"Algergawy , A. , Schallehn , E. , and Saake , G . 2008a. Combining effectiveness and efficiency for schema matching evaluation . In Proceedings of the 1st International Workshop Model-Based Software and Data Integration (MBSDI'08) . 19--30. Algergawy, A., Schallehn, E., and Saake, G. 2008a. Combining effectiveness and efficiency for schema matching evaluation. In Proceedings of the 1st International Workshop Model-Based Software and Data Integration (MBSDI'08). 
19--30."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/1497308.1497337"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.datak.2009.01.001"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1093\/bib\/bbn058"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/1363686.1363940"},{"volume-title":"Modern Information Retrieval","author":"Baeza-Yates R.","key":"e_1_2_1_7_1","unstructured":"Baeza-Yates , R. and Ribeiro-Neto , B. 1999. Modern Information Retrieval . ACM Press\/Addison-Wesley . Baeza-Yates, R. and Ribeiro-Neto, B. 1999. Modern Information Retrieval. ACM Press\/Addison-Wesley."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/27633.27634"},{"key":"e_1_2_1_9_1","unstructured":"Berkhin P. 2002. Survey of clustering data mining techniques. 10.1.1.145.895.pdf.  Berkhin P. 2002. Survey of clustering data mining techniques. 10.1.1.145.895.pdf."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/4236.968835"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0306-4379(03)00031-0"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10844-006-0023-y"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.tcs.2004.12.030"},{"key":"e_1_2_1_14_1","unstructured":"Bolshakova N. and Cunningham P. 2005. cluML: a markup language for clustering and cluster validity assessment of microarray data. Tech. rep. TCD-CS-2005-23 The University of Dublin.  Bolshakova N. and Cunningham P. 2005. cluML: a markup language for clustering and cluster validity assessment of microarray data. Tech. rep. TCD-CS-2005-23 The University of Dublin."},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/1353343.1353358"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/1096601.1096629"},{"key":"e_1_2_1_17_1","unstructured":"Bourret R. 2009. XML database products. 
http:\/\/www.rpbourret.com\/xml\/XMLDatabaseProds.htm.  Bourret R. 2009. XML database products. http:\/\/www.rpbourret.com\/xml\/XMLDatabaseProds.htm."},{"key":"e_1_2_1_18_1","volume-title":"Proceedings of the International Conference on Internet Computing. 3--9.","author":"Buttler D.","year":"2004","unstructured":"Buttler , D. 2004 . A short survey of document structure similarity algorithms . In Proceedings of the International Conference on Internet Computing. 3--9. Buttler, D. 2004. A short survey of document structure similarity algorithms. In Proceedings of the International Conference on Internet Computing. 3--9."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/130226.134466"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-34963-1_36"},{"volume-title":"XML for Bioinformatics","author":"Cerami E.","key":"e_1_2_1_21_1","unstructured":"Cerami , E. 2005. XML for Bioinformatics . Springer New York . Cerami, E. 2005. XML for Bioinformatics. Springer New York."},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1137\/S0097539702418498"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.5555\/645925.671669"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.datak.2006.02.004"},{"volume-title":"Proceedings of the IJCAI-03 Workshop on Information Integration on the Web (IIWeb). 73--78","author":"Cohen W. W.","key":"e_1_2_1_25_1","unstructured":"Cohen , W. W. , Ravikumar , P. , and Fienberg , S. E . 2003. A comparison of string distance metrics for name-matching tasks . In Proceedings of the IJCAI-03 Workshop on Information Integration on the Web (IIWeb). 73--78 . Cohen, W. W., Ravikumar, P., and Fienberg, S. E. 2003. A comparison of string distance metrics for name-matching tasks. In Proceedings of the IJCAI-03 Workshop on Information Integration on the Web (IIWeb). 
73--78."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1013625426931"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.is.2004.11.009"},{"volume-title":"Proceedings of the 28th International Conference on Very Large Data Bases (VLDB). 610--621","author":"Do H. H.","key":"e_1_2_1_28_1","unstructured":"Do , H. H. and Rahm , E . 2002. COMA- A system for flexible combination of schema matching approaches . In Proceedings of the 28th International Conference on Very Large Data Bases (VLDB). 610--621 . Do, H. H. and Rahm, E. 2002. COMA- A system for flexible combination of schema matching approaches. In Proceedings of the 28th International Conference on Very Large Data Bases (VLDB). 610--621."},{"key":"e_1_2_1_29_1","doi-asserted-by":"crossref","unstructured":"Doan A. Madhavan J. Domingos P. and Halevy A. 2004. Handbook on Ontologies. Springer 385--404.  Doan A. Madhavan J. Domingos P. and Halevy A. 2004. Handbook on Ontologies. Springer 385--404.","DOI":"10.1007\/978-3-540-24750-0_19"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/1031453.1031465"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.95.25.14863"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2005.27"},{"key":"e_1_2_1_33_1","first-page":"27","article-title":"Storing and querying XML data using an RDMBS","volume":"22","author":"Florescu D.","year":"1999","unstructured":"Florescu , D. and Kossmann , D. 1999 . Storing and querying XML data using an RDMBS . IEEE Data Eng. Bull. 22 , 3, 27 -- 34 . Florescu, D. and Kossmann, D. 1999. Storing and querying XML data using an RDMBS. IEEE Data Eng. Bull. 22, 3, 27--34.","journal-title":"IEEE Data Eng. Bull."},{"volume-title":"Proceedings of the European Conference on Principles of Data Mining and Knowledge Discovery (PKDD). 175--187","author":"Giannotti F.","key":"e_1_2_1_34_1","unstructured":"Giannotti , F. , Gozzi , C. , and Manco , G . 2002. 
Clustering transactional data . In Proceedings of the European Conference on Principles of Data Mining and Knowledge Discovery (PKDD). 175--187 . Giannotti, F., Gozzi, C., and Manco, G. 2002. Clustering transactional data. In Proceedings of the European Conference on Principles of Data Mining and Knowledge Discovery (PKDD). 175--187."},{"key":"e_1_2_1_35_1","first-page":"1","article-title":"Semantic matching: Algorithms and implementation","volume":"9","author":"Giunchiglia F.","year":"2007","unstructured":"Giunchiglia , F. , Yatskevich , M. , and Shvaiko , P. 2007 . Semantic matching: Algorithms and implementation . J. Data Semantics 9 , 1 -- 38 . Giunchiglia, F., Yatskevich, M., and Shvaiko, P. 2007. Semantic matching: Algorithms and implementation. J. Data Semantics 9, 1--38.","journal-title":"J. Data Semantics"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2007.1060"},{"key":"e_1_2_1_37_1","doi-asserted-by":"crossref","unstructured":"Guerrini G. Mesiti M. and Sanz I. 2007. An Overview of Similarity Measures for Clustering XML Documents. Web Data Management Practices: Emerging Techniques and Technologies. IDEA GROUP.  Guerrini G. Mesiti M. and Sanz I. 2007. An Overview of Similarity Measures for Clustering XML Documents. Web Data Management Practices: Emerging Techniques and Technologies. IDEA GROUP.","DOI":"10.4018\/978-1-59904-228-2.ch003"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/276304.276312"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0306-4379(00)00022-3"},{"key":"e_1_2_1_40_1","first-page":"313","article-title":"ProML-the protein markup language for specification of protein sequences, structures and families","volume":"2","author":"Hanisch D.","year":"2002","unstructured":"Hanisch , D. , Zimmer , R. , and Lengauer , T. 2002 . ProML-the protein markup language for specification of protein sequences, structures and families . Silico Biol. 2 , 3, 313 -- 324 . 
Hanisch, D., Zimmer, R., and Lengauer, T. 2002. ProML-the protein markup language for specification of protein sequences, structures and families. Silico Biol. 2, 3, 313--324.","journal-title":"Silico Biol."},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1009769707641"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btg015"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/331499.331504"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2007.01.025"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1007\/11915034_96"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/1410140.1410178"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1007\/11751632_22"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/362084.362140"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/584792.584841"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-006-0024-z"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1145\/1321440.1321483"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10115-004-0156-7"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0169-023X(99)00044-0"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2004.1264824"},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1007\/11511854_7"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.2197\/ipsjdc.2.382"},{"volume-title":"Proceedings of 27th International Conference on Very Large Data Bases (VLDB'01)","author":"Madhavan J.","key":"e_1_2_1_57_1","unstructured":"Madhavan , J. , Bernstein , P. A. , and Rahm , E . 2001. Generic schema matching with cupid . In Proceedings of 27th International Conference on Very Large Data Bases (VLDB'01) . 49--58. Madhavan, J., Bernstein, P. A., and Rahm, E. 2001. 
Generic schema matching with cupid. In Proceedings of 27th International Conference on Very Large Data Bases (VLDB'01). 49--58."},{"key":"e_1_2_1_58_1","doi-asserted-by":"crossref","unstructured":"Manning C. D. Raghavan P. and Sch\u00fctze H. 2008. Introduction to Information Retrieval. Cambridge University Press.   Manning C. D. Raghavan P. and Sch\u00fctze H. 2008. Introduction to Information Retrieval. Cambridge University Press.","DOI":"10.1017\/CBO9780511809071"},{"volume-title":"Proceedings of the 18th International Conference on Data Engineering (ICDE).","author":"Melnik S.","key":"e_1_2_1_59_1","unstructured":"Melnik , S. , Garcia-Molina , H. , and Rahm , E . 2002. Similarity flooding: A versatile graph matching algorithm and its application to schema matching . In Proceedings of the 18th International Conference on Data Engineering (ICDE). Melnik, S., Garcia-Molina, H., and Rahm, E. 2002. Similarity flooding: A versatile graph matching algorithm and its application to schema matching. In Proceedings of the 18th International Conference on Data Engineering (ICDE)."},{"key":"e_1_2_1_60_1","doi-asserted-by":"crossref","unstructured":"Melton J. and Buxton S. 2006. Querying XML: XQuery XPath and SQL\/XML in Context. Morgan Kaufmann\/Elsevier.   Melton J. and Buxton S. 2006. Querying XML: XQuery XPath and SQL\/XML in Context. Morgan Kaufmann\/Elsevier.","DOI":"10.1016\/B978-155860711-8\/50016-2"},{"volume-title":"Proceedings of the Second Joint Meeting of the Institute of Mathematical Statistics and International Society for Bayesian Analysis (IMS-ISBA).","author":"Muller T.","key":"e_1_2_1_61_1","unstructured":"Muller , T. , Selinski , S. , and Ickstadt , K . 2005a. Cluster analysis: A comparison of different similarity measures for SNP data . In Proceedings of the Second Joint Meeting of the Institute of Mathematical Statistics and International Society for Bayesian Analysis (IMS-ISBA). Muller, T., Selinski, S., and Ickstadt, K. 2005a. 
Cluster analysis: A comparison of different similarity measures for SNP data. In Proceedings of the Second Joint Meeting of the Institute of Mathematical Statistics and International Society for Bayesian Analysis (IMS-ISBA)."},{"key":"e_1_2_1_62_1","unstructured":"Muller T. Selinski S. and Ickstadt K. 2005b. How similar is it? towards personalized similarity measures in ontologies. In 7. International Tagung Wirschaftinformatik.  Muller T. Selinski S. and Ickstadt K. 2005b. How similar is it? towards personalized similarity measures in ontologies. In 7. International Tagung Wirschaftinformatik."},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10115-007-0080-8"},{"key":"e_1_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2006.08.006"},{"key":"e_1_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.1142\/S0218001407005648"},{"volume-title":"Proceedings of the 5th International Workshop on the Web and Databases (WebDB). 61--66","author":"Nierman A.","key":"e_1_2_1_66_1","unstructured":"Nierman , A. and Jagadish , H. V . 2002. Evaluating structural similarity in XML documents . In Proceedings of the 5th International Workshop on the Web and Databases (WebDB). 61--66 . Nierman, A. and Jagadish, H. V. 2002. Evaluating structural similarity in XML documents. In Proceedings of the 5th International Workshop on the Web and Databases (WebDB). 
61--66."},{"key":"e_1_2_1_67_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2004.25"},{"key":"e_1_2_1_68_1","doi-asserted-by":"publisher","DOI":"10.1109\/TNN.2002.1031947"},{"key":"e_1_2_1_69_1","doi-asserted-by":"publisher","DOI":"10.1109\/DEXA.2006.62"},{"key":"e_1_2_1_70_1","doi-asserted-by":"publisher","DOI":"10.1007\/s007780100057"},{"key":"e_1_2_1_71_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.is.2008.01.010"},{"key":"e_1_2_1_72_1","doi-asserted-by":"publisher","DOI":"10.1145\/361219.361220"},{"key":"e_1_2_1_73_1","doi-asserted-by":"publisher","DOI":"10.1109\/DEXA.2008.55"},{"key":"e_1_2_1_74_1","doi-asserted-by":"publisher","DOI":"10.1007\/11687238_88"},{"key":"e_1_2_1_75_1","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/18.suppl_1.S14"},{"volume-title":"Proceedings of the 25th International Conference on Very Large Data Bases (VLDB). 302--314","author":"Shanmugasundaram J.","key":"e_1_2_1_76_1","unstructured":"Shanmugasundaram , J. , Tufte , K. , He , G. , Zhang , C. , DeWitt , D. , and Naughton , J . 1999. Relational databases for querying XML documents: Limitations and opportunities . In Proceedings of the 25th International Conference on Very Large Data Bases (VLDB). 302--314 . Shanmugasundaram, J., Tufte, K., He, G., Zhang, C., DeWitt, D., and Naughton, J. 1999. Relational databases for querying XML documents: Limitations and opportunities. In Proceedings of the 25th International Conference on Very Large Data Bases (VLDB). 302--314."},{"key":"e_1_2_1_77_1","unstructured":"Shasha D. and Zhang K. 1995. Approximate tree pattern matching. In Pattern Matching in Strings Trees and Arrays. Oxford University Press.  Shasha D. and Zhang K. 1995. Approximate tree pattern matching. In Pattern Matching in Strings Trees and Arrays. 
Oxford University Press."},{"key":"e_1_2_1_78_1","first-page":"35","article-title":"Modern information retrieval: A brief overview","volume":"24","author":"Singhal A.","year":"2001","unstructured":"Singhal , A. 2001 . Modern information retrieval: A brief overview . IEEE Data Eng. Bull. 24 , 4, 35 -- 43 . Singhal, A. 2001. Modern information retrieval: A brief overview. IEEE Data Eng. Bull. 24, 4, 35--43.","journal-title":"IEEE Data Eng. Bull."},{"volume-title":"Proceedings of the 3rd International Conference on Discovery Science. 76--85","author":"Somervuo P.","key":"e_1_2_1_79_1","unstructured":"Somervuo , P. and Kohonen , T . 2000. Clustering and visualization of large protein sequence databases by means of an extension of the self-organizing map . In Proceedings of the 3rd International Conference on Discovery Science. 76--85 . Somervuo, P. and Kohonen, T. 2000. Clustering and visualization of large protein sequence databases by means of an extension of the self-organizing map. In Proceedings of the 3rd International Conference on Discovery Science. 76--85."},{"volume-title":"Proceedings of the 5th International Conference on Extending Database Technology (EDBT). 3--17","author":"Srikant R.","key":"e_1_2_1_80_1","unstructured":"Srikant , R. and Agrawal , R . 1996. Mining sequential patterns: Generalizations and performance improvements . In Proceedings of the 5th International Conference on Extending Database Technology (EDBT). 3--17 . Srikant, R. and Agrawal, R. 1996. Mining sequential patterns: Generalizations and performance improvements. In Proceedings of the 5th International Conference on Extending Database Technology (EDBT). 3--17."},{"volume-title":"Proceedings of the 6th SIAM International Conference on Data Mining (SDM). 188--199","author":"Tagarelli A.","key":"e_1_2_1_81_1","unstructured":"Tagarelli , A. and Greco , S . 2006. Toward semantic XML clustering . In Proceedings of the 6th SIAM International Conference on Data Mining (SDM). 188--199 . 
Tagarelli, A. and Greco, S. 2006. Toward semantic XML clustering. In Proceedings of the 6th SIAM International Conference on Data Mining (SDM). 188--199."},{"key":"e_1_2_1_82_1","doi-asserted-by":"publisher","DOI":"10.1145\/322139.322143"},{"key":"e_1_2_1_83_1","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.96.6.2907"},{"key":"e_1_2_1_84_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.cosrev.2009.03.001"},{"volume-title":"Proceedings of the 8th International Conference on Web Information Systems Engineering (WISE). 196--211","author":"Tekli J.","key":"e_1_2_1_85_1","unstructured":"Tekli , J. , Chbeir , R. , and Ytongnon , K . 2007. Structural similarity evaluation between XML documents and DTDs . In Proceedings of the 8th International Conference on Web Information Systems Engineering (WISE). 196--211 . Tekli, J., Chbeir, R., and Ytongnon, K. 2007. Structural similarity evaluation between XML documents and DTDs. In Proceedings of the 8th International Conference on Web Information Systems Engineering (WISE). 196--211."},{"volume-title":"Proceedings of the 7th Australasian Data Mining Conference (AusDM). 219--226","author":"Tran T.","key":"e_1_2_1_86_1","unstructured":"Tran , T. , Nayak , R. , and Bruza , P . 2008. Combining structure and content similarities for XML document clustering . In Proceedings of the 7th Australasian Data Mining Conference (AusDM). 219--226 . Tran, T., Nayak, R., and Bruza, P. 2008. Combining structure and content similarities for XML document clustering. In Proceedings of the 7th Australasian Data Mining Conference (AusDM). 219--226."},{"key":"e_1_2_1_87_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-30192-9_59"},{"key":"e_1_2_1_88_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-88871-0_35"},{"volume-title":"Proceedings of the International World Wide Web Conference (WWW).","author":"Vutukuru V.","key":"e_1_2_1_89_1","unstructured":"Vutukuru , V. , Pasupuleti , K. , Khare , A. , and Garg , A . 2002. 
Conceptemy: An issue in XML information retrieval . In Proceedings of the International World Wide Web Conference (WWW). Vutukuru, V., Pasupuleti, K., Khare, A., and Garg, A. 2002. Conceptemy: An issue in XML information retrieval. In Proceedings of the International World Wide Web Conference (WWW)."},{"key":"e_1_2_1_90_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF02944801"},{"key":"e_1_2_1_91_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10618-008-0100-7"},{"key":"e_1_2_1_92_1","doi-asserted-by":"publisher","DOI":"10.1145\/1364782.1364795"},{"key":"e_1_2_1_93_1","doi-asserted-by":"publisher","DOI":"10.1109\/TNN.2005.845141"},{"key":"e_1_2_1_94_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10115-008-0138-2"},{"key":"e_1_2_1_95_1","doi-asserted-by":"publisher","DOI":"10.1142\/S0218213005002387"},{"key":"e_1_2_1_96_1","doi-asserted-by":"publisher","DOI":"10.1145\/1066157.1066243"},{"key":"e_1_2_1_97_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1012861931139"},{"key":"e_1_2_1_98_1","doi-asserted-by":"publisher","DOI":"10.1137\/0218082"},{"key":"e_1_2_1_99_1","doi-asserted-by":"publisher","DOI":"10.1145\/233269.233324"},{"key":"e_1_2_1_100_1","unstructured":"Zhao Y. and Karypis G. 2002a. Criterion functions for document clustering: Experiments and analysis. Tech. rep. 01-40 Department of Computer Science University of Minnesota.  Zhao Y. and Karypis G. 2002a. Criterion functions for document clustering: Experiments and analysis. Tech. rep. 
01-40 Department of Computer Science University of Minnesota."},{"key":"e_1_2_1_101_1","doi-asserted-by":"publisher","DOI":"10.1145\/584792.584877"}],"container-title":["ACM Computing Surveys"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1978802.1978804","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1978802.1978804","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T10:59:37Z","timestamp":1750244377000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1978802.1978804"}},"subtitle":["An overview"],"short-title":[],"issued":{"date-parts":[[2011,10]]},"references-count":101,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2011,10]]}},"alternative-id":["10.1145\/1978802.1978804"],"URL":"https:\/\/doi.org\/10.1145\/1978802.1978804","relation":{},"ISSN":["0360-0300","1557-7341"],"issn-type":[{"type":"print","value":"0360-0300"},{"type":"electronic","value":"1557-7341"}],"subject":[],"published":{"date-parts":[[2011,10]]},"assertion":[{"value":"2009-03-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2009-10-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2011-10-18","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}