{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:37:02Z","timestamp":1750307822643,"version":"3.41.0"},"reference-count":31,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2008,6,1]],"date-time":"2008-06-01T00:00:00Z","timestamp":1212278400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Transactions on Asian Language Information Processing"],"published-print":{"date-parts":[[2008,6]]},"abstract":"<jats:p>\n            Internet directories such as Yahoo! are an approach to improve\nthe efficacy and efficiency of Information Retrieval (IR) on the\nWeb, as pages (documents) are organized into hierarchical\ncategories, and similar pages are grouped together. Most of the\nsearch engines on the Web service find documents that are assigned\nto a single classification hierarchy. Categories in the hierarchy\nare carefully defined by human experts and documents are well\norganized. However, a single hierarchy in one language is often\ninsufficient to find all relevant material, as each hierarchy tends\nto have some bias in both defining hierarchical structure and\nclassifying documents. Moreover, documents written in a language\nother than the user's native language often include large amounts of\ninformation related to the user's request. In this article, we\npropose a method of integrating cross-language (CL) category\nhierarchies, that is, Reuters 96 hierarchy and UDC code hierarchy\nof Japanese by estimating category similarities. The method does\nnot simply merge two different hierarchies into one large hierarchy\nbut instead extracts sets of similar categories, where each element\nof the sets is relevant with each other. 
It consists of three\nsteps. First, we classify documents from one hierarchy into\ncategories with another hierarchy using a cross-language text\nclassification (CLTC) technique, and extract category pairs of two\nhierarchies. Next, we apply\n            <jats:italic>\u03c7<\/jats:italic>\n            <jats:sup>2<\/jats:sup>\n            statistics\nto these pairs to obtain similar category pairs, and finally we\napply the generating function of the Apriori algorithm\n(Apriori-Gen) to the category pairs, and find sets of similar\ncategories. Moreover, we examined whether integrating hierarchies\nhelps to support retrieval of documents with similar contents. The\nretrieval results showed a 42.7% improvement over the baseline\nnonhierarchy model, and a 21.6% improvement over a single\nhierarchy.\n          <\/jats:p>","DOI":"10.1145\/1386869.1386870","type":"journal-article","created":{"date-parts":[[2008,8,27]],"date-time":"2008-08-27T11:56:36Z","timestamp":1219838196000},"page":"1-22","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Integrating Cross-Language Hierarchies and Its Application to Retrieving Relevant Documents"],"prefix":"10.1145","volume":"7","author":[{"given":"Fumiyo","family":"Fukumoto","sequence":"first","affiliation":[{"name":"University of Yamanashi"}]},{"given":"Yoshimi","family":"Suzuki","sequence":"additional","affiliation":[{"name":"University of Yamanashi"}]}],"member":"320","published-online":{"date-parts":[[2008,6]]},"reference":[{"volume-title":"Proceedings of the 24th International Conference on Very Large Databases (VLDB\u201998)","author":"Agrawal R.","unstructured":"Agrawal , R. and Srikant , R . 1998. Fast algorithms for mining association rules . In Proceedings of the 24th International Conference on Very Large Databases (VLDB\u201998) . Morgan Kaufmann, San Francisco, CA., 478--499. Agrawal, R. and Srikant, R. 1998. Fast algorithms for mining association rules. 
In Proceedings of the 24th International Conference on Very Large Databases (VLDB\u201998). Morgan Kaufmann, San Francisco, CA., 478--499.","key":"e_1_2_1_1_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_2_1","DOI":"10.1145\/371920.372163"},{"volume-title":"Topic Detection and Tracking: Event-Based Information Organization","author":"Allan J.","unstructured":"Allan , J. 2002. Topic Detection and Tracking: Event-Based Information Organization . Kluwer Academic Publishers , Boston, MA . Allan, J. 2002. Topic Detection and Tracking: Event-Based Information Organization. Kluwer Academic Publishers, Boston, MA.","key":"e_1_2_1_3_1"},{"volume-title":"Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL\u201904)","author":"Chen F.","unstructured":"Chen , F. , Farahat , A. , and Brants , T . 2004. Multiple similarity measures and source-pair information . In Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL\u201904) . Boston, MA, 313--320. Chen, F., Farahat, A., and Brants, T. 2004. Multiple similarity measures and source-pair information. In Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL\u201904). Boston, MA, 313--320.","key":"e_1_2_1_4_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_5_1","DOI":"10.1145\/1168092.1168097"},{"doi-asserted-by":"publisher","key":"e_1_2_1_6_1","DOI":"10.3115\/980845.980888"},{"doi-asserted-by":"publisher","key":"e_1_2_1_7_1","DOI":"10.1145\/511446.511532"},{"doi-asserted-by":"publisher","key":"e_1_2_1_8_1","DOI":"10.1145\/345508.345593"},{"volume-title":"Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI\u201903)","author":"Ichise R.","unstructured":"Ichise , R. , Tanaka , H. , and Honiden , S . 2003. 
Integrating multiple internet directories by instance-based learning . In Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI\u201903) Morgan Kaufmann, San Francisco, CA, 22--30. Ichise, R., Tanaka, H., and Honiden, S. 2003. Integrating multiple internet directories by instance-based learning. In Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI\u201903) Morgan Kaufmann, San Francisco, CA, 22--30.","key":"e_1_2_1_9_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_10_1","DOI":"10.1162\/089120103322711587"},{"volume-title":"Elements of Machine Learning. Morgan Kaufmann","author":"Langley P.","unstructured":"Langley , P. 1996. Elements of Machine Learning. Morgan Kaufmann , San Francisco, CA . Langley, P. 1996. Elements of Machine Learning. Morgan Kaufmann, San Francisco, CA.","key":"e_1_2_1_11_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_12_1","DOI":"10.3115\/992730.992806"},{"unstructured":"Matsumoto Y. Kitauchi A. Yamashita T. Hirano Y. Matsuda Y. Takaoka K. and Asahara M. 2000. Japanese morphological analysis system ChaSen version 2.2.1. NAIST Tech. rep. NAIST Nara Japan. Matsumoto Y. Kitauchi A. Yamashita T. Hirano Y. Matsuda Y. Takaoka K. and Asahara M. 2000. Japanese morphological analysis system ChaSen version 2.2.1. NAIST Tech. rep. NAIST Nara Japan.","key":"e_1_2_1_13_1"},{"volume-title":"Proceedings of the 15th International Conference on Machine Learning (ICML\u201998)","author":"McCallum A.","unstructured":"McCallum , A. , Rosenfeld , R. , Mitchell , T. , and Ng , A . 1998. Improving text classification by shrinkage in a hierarchy of classes . In Proceedings of the 15th International Conference on Machine Learning (ICML\u201998) . Morgan Kaufmann, San Francisco, CA, 359--367. McCallum, A., Rosenfeld, R., Mitchell, T., and Ng, A. 1998. Improving text classification by shrinkage in a hierarchy of classes. 
In Proceedings of the 15th International Conference on Machine Learning (ICML\u201998). Morgan Kaufmann, San Francisco, CA, 359--367.","key":"e_1_2_1_14_1"},{"volume-title":"Machine Learning","author":"Mitchell T.","unstructured":"Mitchell , T. 1997. Machine Learning . McGraw-Hill , New York . Mitchell, T. 1997. Machine Learning. McGraw-Hill, New York.","key":"e_1_2_1_15_1"},{"volume-title":"Proceedings of the 17th National Conference on Artificial Intelligence (AAAI\u201900)","author":"Noy N. F.","unstructured":"Noy , N. F. and Musen , M. A . 2000. Algorithm and tool for automated ontology merging and alignment . In Proceedings of the 17th National Conference on Artificial Intelligence (AAAI\u201900) . Austin, Texas. Noy, N. F. and Musen, M. A. 2000. Algorithm and tool for automated ontology merging and alignment. In Proceedings of the 17th National Conference on Artificial Intelligence (AAAI\u201900). Austin, Texas.","key":"e_1_2_1_16_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_17_1","DOI":"10.1162\/089120103322711578"},{"key":"e_1_2_1_18_1","volume-title":"Information Retrieval: Uncertainty and Logics","author":"Rijsbergen V.","year":"1998","unstructured":"Rijsbergen , V. , Joost , C. , Fabio , C. , and Mounia , L . 1998 . Information Retrieval: Uncertainty and Logics . Kluwer Academic Publishers . Rijsbergen, V., Joost, C., Fabio, C., and Mounia, L. 1998. Information Retrieval: Uncertainty and Logics. Kluwer Academic Publishers."},{"volume-title":"Proceedings of the 9th Text Retrieval Conference (TREC-9). NIST","author":"Robertson S. E.","unstructured":"Robertson , S. E. and Hull , D. A . 2000. TREC-9 filtering track final report . In Proceedings of the 9th Text Retrieval Conference (TREC-9). NIST , Gaithersburg, MD, 25--40. Robertson, S. E. and Hull, D. A. 2000. TREC-9 filtering track final report. In Proceedings of the 9th Text Retrieval Conference (TREC-9). 
NIST, Gaithersburg, MD, 25--40.","key":"e_1_2_1_19_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_20_1","DOI":"10.1145\/1008992.1009021"},{"unstructured":"RWCP. 1998. Rwc text database. In Real World Computing Partnership. RWCP. 1998. Rwc text database. In Real World Computing Partnership .","key":"e_1_2_1_21_1"},{"key":"e_1_2_1_22_1","volume-title":"Proceedings of the EACL SIGDAT Workshop","author":"Schmid H.","year":"1995","unstructured":"Schmid , H. 1995 . Improvements in part-of-speech tagging with an application to German . In Proceedings of the EACL SIGDAT Workshop . Dublin, Ireland. Schmid, H. 1995. Improvements in part-of-speech tagging with an application to German. In Proceedings of the EACL SIGDAT Workshop. Dublin, Ireland."},{"volume-title":"Proceedings of the International Workshop on Foundations of Models for Information Integration (FMII\u201901)","author":"Stumme G.","unstructured":"Stumme , G. and Madche , A . 2001. Ontology merging for federated ontologies on the semantic Web . In Proceedings of the International Workshop on Foundations of Models for Information Integration (FMII\u201901) . Viterbo, Italy, 413--419. Stumme, G. and Madche, A. 2001. Ontology merging for federated ontologies on the semantic Web. In Proceedings of the International Workshop on Foundations of Models for Information Integration (FMII\u201901). Viterbo, Italy, 413--419.","key":"e_1_2_1_23_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_24_1","DOI":"10.1145\/1277741.1277778"},{"doi-asserted-by":"publisher","key":"e_1_2_1_25_1","DOI":"10.3115\/1075096.1075106"},{"doi-asserted-by":"publisher","key":"e_1_2_1_26_1","DOI":"10.3115\/1067807.1067854"},{"volume-title":"The Nature of Statistical Learning Theory","author":"Vapnik V.","unstructured":"Vapnik , V. 1995. The Nature of Statistical Learning Theory . Springer , Berlin, Germany . Vapnik, V. 1995. The Nature of Statistical Learning Theory. Springer, Berlin, Germany.","key":"e_1_2_1_27_1"},{"unstructured":"Weston J. 
and Watkin C. 1998. Multi-class support vector machines. Tech. rep. Department of Computer Science Royal Holloway University of London UK. Weston J. and Watkin C. 1998. Multi-class support vector machines. Tech. rep. Department of Computer Science Royal Holloway University of London UK.","key":"e_1_2_1_28_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_29_1","DOI":"10.1145\/312624.312647"},{"doi-asserted-by":"publisher","key":"e_1_2_1_30_1","DOI":"10.1145\/860435.860455"},{"doi-asserted-by":"publisher","key":"e_1_2_1_31_1","DOI":"10.3115\/1220175.1220250"}],"container-title":["ACM Transactions on Asian Language Information Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1386869.1386870","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1386869.1386870","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T13:57:48Z","timestamp":1750255068000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1386869.1386870"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2008,6]]},"references-count":31,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2008,6]]}},"alternative-id":["10.1145\/1386869.1386870"],"URL":"https:\/\/doi.org\/10.1145\/1386869.1386870","relation":{},"ISSN":["1530-0226","1558-3430"],"issn-type":[{"type":"print","value":"1530-0226"},{"type":"electronic","value":"1558-3430"}],"subject":[],"published":{"date-parts":[[2008,6]]},"assertion":[{"value":"2007-05-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2008-01-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2008-06-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}