{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,4]],"date-time":"2026-05-04T00:27:29Z","timestamp":1777854449321,"version":"3.51.4"},"reference-count":32,"publisher":"SAGE Publications","issue":"3","license":[{"start":{"date-parts":[[2011,5,9]],"date-time":"2011-05-09T00:00:00Z","timestamp":1304899200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["Journal of Information Science"],"published-print":{"date-parts":[[2011,6]]},"abstract":"<jats:p>It is often necessary to categorize automatically multilingual document sets, in which documents written in a variety of languages are included, into topically homogeneous subsets, such as when applying an automatic summarization system for multilingual news articles. However, there have been few studies on multilingual document clustering to date. In particular, it is not known whether clustering techniques are effective in medium- or large-scale multilingual document sets. For scalability, techniques should be based on dictionary-based translation and a single- or double-pass clustering algorithm. This article reports on experiments of applying multilingual document clustering to medium-scale sets of English, French, German and Italian documents (Reuters news articles). The results show that the double-pass algorithm has a positive effect in the case that each document is translated. On the other hand, the cluster translation strategy in which clusters obtained by applying a clustering algorithm to each language document set are translated has almost no effect. Also, translation disambiguation techniques can improve, but only slightly, the effectiveness of clustering.<\/jats:p>","DOI":"10.1177\/0165551511404867","type":"journal-article","created":{"date-parts":[[2011,5,10]],"date-time":"2011-05-10T00:35:46Z","timestamp":1304987746000},"page":"304-321","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":4,"title":["Double-pass clustering technique for multilingual document collections"],"prefix":"10.1177","volume":"37","author":[{"given":"Kazuaki","family":"Kishida","sequence":"first","affiliation":[{"name":"Keio University, Japan,"}]}],"member":"179","published-online":{"date-parts":[[2011,5,9]]},"reference":[{"key":"atypb1","doi-asserted-by":"publisher","DOI":"10.1016\/0306-4573(75)90031-X"},{"key":"atypb2","volume-title":"Proceedings of the 10th ACM SIGIR conference on research and development in information retrieval","author":"Rasmussen E."},{"key":"atypb3","unstructured":"J. Rasmussen E. Clustering algorithm. In: Frakes WB and Baeza-Yates R (eds) Information retrieval: data structures & algorithms. Englewood Cliffs, NJ : PTR Prentice Hall, 1992, pp. 419-442."},{"key":"atypb4","volume-title":"Proceedings of the 18th conference on computational linguistics","author":"J Chen HH"},{"key":"atypb5","volume-title":"Newsblaster Russian-English clustering performance analysis (Technical Report CUCS-010-03)","author":"Leftin LJ","year":"2003"},{"key":"atypb6","volume-title":"A platform for multilingual news summarization (Technical Report CUCS-014-03)","author":"Evans DK","year":"2003"},{"key":"atypb7","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-71701-0_107"},{"key":"atypb8","first-page":"223","volume":"33","author":"Oard DW","year":"1998","journal-title":"Annual Review of Information Science and Technology"},{"key":"atypb9","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipm.2004.06.007"},{"key":"atypb10","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-031-02138-1","volume-title":"Cross-language information retrieval","author":"Nie JY","year":"2010"},{"key":"atypb11","doi-asserted-by":"crossref","first-page":"1092","DOI":"10.1002\/asi.21311","volume":"61","author":"Kishida K.","year":"2010","journal-title":"Journal of the American Society for Information Science and Technology"},{"key":"atypb12","first-page":"361","volume":"5","author":"Lewis DD","year":"2004","journal-title":"Journal of Machine Learning Research"},{"key":"atypb13","volume-title":"Proceedings of the third all-Russian scientific conference - digital libraries: advanced methods and technologies, digital collections (RCDL01)","author":"Rauber A."},{"key":"atypb14","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4615-0933-2"},{"key":"atypb15","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4615-0933-2_4"},{"key":"atypb16","volume-title":"Proceedings of the 20th international conference on computational linguistics (COLING)","author":"Pouliquen B."},{"key":"atypb17","doi-asserted-by":"publisher","DOI":"10.1016\/j.patrec.2007.07.011"},{"key":"atypb18","doi-asserted-by":"publisher","DOI":"10.4018\/978-1-59904-618-1.ch004"},{"key":"atypb19","doi-asserted-by":"publisher","DOI":"10.1007\/11427445_38"},{"key":"atypb20","doi-asserted-by":"publisher","DOI":"10.1016\/j.dss.2007.07.008"},{"key":"atypb21","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-45329-6_11"},{"key":"atypb22","unstructured":"Mathieu B., Besan\u00e7on R., Fluhr C. Multilingual document clusters discovery. In: Proceedings of RIAO\u2019 2004. Avignon, 2004, pp. 1-10."},{"key":"atypb23","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4615-0933-2_12"},{"key":"atypb24","doi-asserted-by":"crossref","unstructured":"Oard DW, Levow GA, Cabezas, CI CLEF experiments at Maryland: statistical stemming and backoff translation . In: Peters C (ed.) Cross-language information retrieval and evaluation. Berlin: Springer, 2001, pp. 176-187.","DOI":"10.1007\/3-540-44645-1_17"},{"key":"atypb25","volume-title":"Proceedings of the fourth NTCIR workshop on research in information access technologies","author":"Kang IS"},{"key":"atypb26","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipm.2006.04.006"},{"key":"atypb27","doi-asserted-by":"publisher","DOI":"10.1023\/A:1009989801965"},{"key":"atypb28","volume-title":"Proceedings of 24th ACM SIGIR conference on research and development in information retrieval","author":"Gao J."},{"key":"atypb29","unstructured":"Croft WB, Metzler D., Strohman T. Search engines: information retrieval in practice. Boston, MA: Addison Wesley, 2010, p. 242."},{"key":"atypb30","unstructured":"Duda RO, Hart PE, Stork DG Pattern classification, 2nd ed. New York: John Wiley & Sons, 2001, p. 562."},{"key":"atypb31","doi-asserted-by":"publisher","DOI":"10.21236\/ADA459769"},{"key":"atypb32","volume-title":"Proceedings of the 25th ACM SIGIR conference on research and development in information retrieval","author":"Liu X."}],"container-title":["Journal of Information Science"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/0165551511404867","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/0165551511404867","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T23:08:02Z","timestamp":1777504082000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/0165551511404867"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2011,5,9]]},"references-count":32,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2011,6]]}},"alternative-id":["10.1177\/0165551511404867"],"URL":"https:\/\/doi.org\/10.1177\/0165551511404867","relation":{},"ISSN":["0165-5515","1741-6485"],"issn-type":[{"value":"0165-5515","type":"print"},{"value":"1741-6485","type":"electronic"}],"subject":[],"published":{"date-parts":[[2011,5,9]]}}}