{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2022,12,29]],"date-time":"2022-12-29T20:27:41Z","timestamp":1672345661204},"reference-count":11,"publisher":"World Scientific Pub Co Pte Lt","issue":"04","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Int. J. Artif. Intell. Tools"],"published-print":{"date-parts":[[2005,8]]},"abstract":"<jats:p> XML is increasingly important in data exchange and information management. A great deal of efforts have been spent in developing efficient techniques for storing, querying, indexing and accessing XML documents. In this paper we propose a new approach to clustering XML data. In contrast to previous work, which focused on documents defined by different DTDs, the proposed method works for documents with the same DTD. Our approach is to extract features from documents, modeled by ordered labeled trees, and transform the documents to vectors in a high-dimensional Euclidean space based on the occurrences of the features in the documents. We then reduce the dimensionality of the vectors by principal component analysis (PCA) and cluster the vectors in the reduced dimensional space. The PCA enables one to identify vectors with co-occurrent features, thereby enhancing the accuracy of the clustering. We also discuss an extension of our techniques to XML retrieval. Experimental results based on documents obtained from Wisconsin's XML data bank show the effectiveness and good performance of the proposed techniques. <\/jats:p>","DOI":"10.1142\/s0218213005002326","type":"journal-article","created":{"date-parts":[[2005,7,25]],"date-time":"2005-07-25T04:57:18Z","timestamp":1122267438000},"page":"683-699","source":"Crossref","is-referenced-by-count":5,"title":["XML CLUSTERING AND RETRIEVAL THROUGH PRINCIPAL COMPONENT ANALYSIS"],"prefix":"10.1142","volume":"14","author":[{"given":"JASON T. L.","family":"WANG","sequence":"first","affiliation":[{"name":"Department of Computer Science,  New Jersey Institute of Technology, University Heights,  Newark, NJ 07102, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"JIANGHUI","family":"LIU","sequence":"additional","affiliation":[{"name":"Department of Computer Science,  New Jersey Institute of Technology, University Heights,  Newark, NJ 07102, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"JUNHAN","family":"WANG","sequence":"additional","affiliation":[{"name":"Department of Computer Science,  New Jersey Institute of Technology, University Heights,  Newark, NJ 07102, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"219","published-online":{"date-parts":[[2011,11,21]]},"reference":[{"key":"rf6","doi-asserted-by":"publisher","DOI":"10.1142\/S1469026802000671"},{"key":"rf11","doi-asserted-by":"publisher","DOI":"10.1137\/0218082"},{"key":"rf14","first-page":"205","volume":"14","author":"Dubiner M.","journal-title":"J. ACM"},{"key":"rf15","doi-asserted-by":"publisher","DOI":"10.1137\/S0097539791218202"},{"key":"rf16","doi-asserted-by":"publisher","DOI":"10.1006\/jagm.1999.1044"},{"key":"rf18","unstructured":"K. G.\u00a0Herbert, J. T. L.\u00a0Wang and J.\u00a0Liu, Handbook of Computer Science and Engineering, 2nd edn., ed. A. B.\u00a0Tucker (CRC Press, 2004)\u00a0pp. 75.1\u201375.16."},{"key":"rf20","volume-title":"Data Mining: Concepts and Techniques","author":"Han J.","year":"2001"},{"key":"rf23","doi-asserted-by":"publisher","DOI":"10.1007\/s101150050009"},{"key":"rf24","volume-title":"Matrix Computation","author":"Golub G.","year":"1989"},{"key":"rf26","volume-title":"Digital Image Processing","author":"Gonzales R. C.","year":"1993"},{"key":"rf32","doi-asserted-by":"publisher","DOI":"10.1016\/j.datak.2003.06.001"}],"container-title":["International Journal on Artificial Intelligence Tools"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.worldscientific.com\/doi\/pdf\/10.1142\/S0218213005002326","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2019,8,7]],"date-time":"2019-08-07T02:30:33Z","timestamp":1565145033000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.worldscientific.com\/doi\/abs\/10.1142\/S0218213005002326"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2005,8]]},"references-count":11,"journal-issue":{"issue":"04","published-online":{"date-parts":[[2011,11,21]]},"published-print":{"date-parts":[[2005,8]]}},"alternative-id":["10.1142\/S0218213005002326"],"URL":"https:\/\/doi.org\/10.1142\/s0218213005002326","relation":{},"ISSN":["0218-2130","1793-6349"],"issn-type":[{"value":"0218-2130","type":"print"},{"value":"1793-6349","type":"electronic"}],"subject":[],"published":{"date-parts":[[2005,8]]}}}