{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,4]],"date-time":"2026-05-04T00:28:27Z","timestamp":1777854507630,"version":"3.51.4"},"reference-count":49,"publisher":"SAGE Publications","issue":"4","license":[{"start":{"date-parts":[[2015,5,19]],"date-time":"2015-05-19T00:00:00Z","timestamp":1431993600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["Journal of Information Science"],"published-print":{"date-parts":[[2015,8]]},"abstract":"<jats:p>The evaluation of clustering results is one of the most important issues in cluster analysis, a core task for effective information access. There are two types of measures for evaluating the quality of clustering results: internal and external. External validity measures evaluate how well the clustering results match prior knowledge about the data, whereas internal measures do not need external information, dealing only with information within the data. In this regard, the main drawback of external evaluation measures is that they are not applicable in real-world situations. In this paper we present an experimental study to determine whether it is possible to predict the quality of multilingual news clustering results by means of an internal evaluation measure. We study whether the internal evaluation measure Expected Density correlates with the external measure F-measure, the most common way of evaluating clustering results. In the experiments, we use different data collections, clustering algorithms and similarity measures in order to determine their influence in the correlation between those measures. Regarding similarity measures, another important issue in clustering, we propose a new similarity measure to calculate how similar two news documents are. This measure is based on the Named Entities shared by both documents. The results show that correlation depends on several different factors, such as the type of collection, the granularity of the clusters, the type of algorithm and the similarity measure.<\/jats:p>","DOI":"10.1177\/0165551515586671","type":"journal-article","created":{"date-parts":[[2015,5,19]],"date-time":"2015-05-19T22:30:44Z","timestamp":1432074644000},"page":"518-530","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":0,"title":["Quality prediction of multilingual news clustering: An experimental study"],"prefix":"10.1177","volume":"41","author":[{"given":"Soto","family":"Montalvo","sequence":"first","affiliation":[{"name":"Universidad Rey Juan Carlos, M\u00f3stoles, Spain"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Raquel","family":"Mart\u00ednez","sequence":"additional","affiliation":[{"name":"Universidad Nacional de Educaci\u00f3n a Distancia, Madrid, Spain"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"V\u00edctor","family":"Fresno","sequence":"additional","affiliation":[{"name":"Universidad Nacional de Educaci\u00f3n a Distancia, Madrid, Spain"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"179","published-online":{"date-parts":[[2015,5,19]]},"reference":[{"key":"bibr1-0165551515586671","doi-asserted-by":"publisher","DOI":"10.1023\/A:1012801612483"},{"key":"bibr2-0165551515586671","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2000.860153"},{"key":"bibr3-0165551515586671","volume-title":"Algorithms for clustering data","author":"Jain A","year":"1988"},{"key":"bibr4-0165551515586671","doi-asserted-by":"publisher","DOI":"10.1016\/j.patrec.2010.11.006"},{"key":"bibr5-0165551515586671","volume-title":"Proceedings of KDD workshop on text mining","author":"Steinbach M","year":"2000"},{"key":"bibr6-0165551515586671","doi-asserted-by":"publisher","DOI":"10.1080\/01969727408546059"},{"key":"bibr7-0165551515586671","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.1979.4766909"},{"key":"bibr8-0165551515586671","first-page":"216","volume-title":"Proceedings of the 3rd IASTED international conference on artificial intelligence and applications (AIA)","author":"Stein B","year":"2003"},{"key":"bibr9-0165551515586671","doi-asserted-by":"publisher","DOI":"10.1108\/eb026584"},{"key":"bibr10-0165551515586671","unstructured":"Zhao Y, Karypis G. Criterion functions for document clustering: Experiments and analysis. 2001. Technical Report 01-40. University of Minnesota, Department of Computer Science."},{"key":"bibr11-0165551515586671","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-24630-5_74"},{"key":"bibr12-0165551515586671","first-page":"55","volume":"39","author":"Ingaramo D","year":"2007","journal-title":"Sociedad Espa\u00f1ola para el Procesamiento del Lenguaje Natural"},{"key":"bibr13-0165551515586671","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2006.06.026"},{"key":"bibr14-0165551515586671","doi-asserted-by":"publisher","DOI":"10.1007\/s10791-008-9066-8"},{"key":"bibr15-0165551515586671","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2010.35"},{"key":"bibr16-0165551515586671","doi-asserted-by":"publisher","DOI":"10.1016\/j.patrec.2010.01.002"},{"key":"bibr17-0165551515586671","first-page":"410","volume-title":"Proceedings of the 2007 joint conference on empirical methods in natural language processing and computational natural language learning (EMNLP-CoNLL)","author":"Rosenberg A","year":"2007"},{"key":"bibr18-0165551515586671","first-page":"55","volume-title":"Proceedings of the 2nd multiclust workshop on discovering, summarizing and using multiple clusterings","author":"Kriegel H","year":"2011"},{"key":"bibr19-0165551515586671","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-22732-5_18"},{"key":"bibr20-0165551515586671","doi-asserted-by":"publisher","DOI":"10.1145\/2020408.2020555"},{"key":"bibr21-0165551515586671","first-page":"29","volume-title":"Proceedings of the 3rd international conference on language resources and evaluation","author":"Rose T","year":"2002"},{"key":"bibr22-0165551515586671","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-78135-6_48"},{"key":"bibr23-0165551515586671","doi-asserted-by":"publisher","DOI":"10.1109\/BIBM.2011.97"},{"key":"bibr24-0165551515586671","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-70981-7_41"},{"key":"bibr25-0165551515586671","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipm.2006.07.016"},{"key":"bibr26-0165551515586671","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2014.11.017"},{"key":"bibr27-0165551515586671","doi-asserted-by":"publisher","DOI":"10.1613\/jair.991"},{"key":"bibr28-0165551515586671","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2009.07.048"},{"key":"bibr29-0165551515586671","doi-asserted-by":"publisher","DOI":"10.1145\/1989323.1989474"},{"key":"bibr30-0165551515586671","volume-title":"Proceedings of the international joint conferences on artificial intelligence (IJCAI)","author":"Gael J","year":"2007"},{"key":"bibr31-0165551515586671","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-71701-0_107"},{"key":"bibr32-0165551515586671","first-page":"108","volume-title":"Proceedings of the second international conference on multidisciplinary information retrieval facility (IRFC)","author":"Kumar N","year":"2011"},{"key":"bibr33-0165551515586671","volume-title":"Proceedings of the 4th Slovenian language technology conference, information society","author":"Steinberger R","year":"2004"},{"key":"bibr34-0165551515586671","doi-asserted-by":"publisher","DOI":"10.3115\/1220355.1220493"},{"key":"bibr35-0165551515586671","doi-asserted-by":"publisher","DOI":"10.2498\/cit.2005.04.01"},{"key":"bibr36-0165551515586671","volume-title":"Proceedings of 11th international conference on intelligent text processing and computational linguistics (CICLing)","author":"Denicia-Carral C","year":"2010"},{"key":"bibr37-0165551515586671","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4615-0933-2"},{"key":"bibr38-0165551515586671","doi-asserted-by":"publisher","DOI":"10.1109\/5254.784083"},{"key":"bibr39-0165551515586671","doi-asserted-by":"publisher","DOI":"10.3115\/1220355.1220477"},{"key":"bibr40-0165551515586671","doi-asserted-by":"publisher","DOI":"10.1145\/1183614.1183771"},{"key":"bibr41-0165551515586671","doi-asserted-by":"publisher","DOI":"10.1145\/1008992.1009044"},{"key":"bibr42-0165551515586671","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2012.02.084"},{"key":"bibr43-0165551515586671","doi-asserted-by":"publisher","DOI":"10.1016\/j.patrec.2007.07.011"},{"key":"bibr44-0165551515586671","unstructured":"Montalvo S. Estudio y nuevas estrategias en el uso de las entidades nombradas en el clustering biling\u00fce de noticias. 2012. PhD thesis, Universidad Rey Juan Carlos, Spain."},{"key":"bibr45-0165551515586671","doi-asserted-by":"crossref","unstructured":"Karypis G. Cluto: A clustering toolkit. 2003. Technical Report 02-017, University of Minnesota, Department of Computer Science.","DOI":"10.21236\/ADA439508"},{"key":"bibr46-0165551515586671","first-page":"81","volume":"48","author":"Montalvo S","year":"2012","journal-title":"Procesamiento del Lenguaje Natural"},{"key":"bibr47-0165551515586671","volume-title":"Working Notes for the CLEF 2003 Workshop","author":"Peters C","year":"2003"},{"key":"bibr48-0165551515586671","first-page":"116","volume-title":"Proceedings of the conference of recherche d\u2019information assistee par ordinateur (RIAO 2004)","author":"Mathieu B","year":"2004"},{"key":"bibr49-0165551515586671","volume-title":"Proceedings of 7th Language Resources and Evaluation Conference (LREC)","author":"Padr\u00f3 L","year":"2010"}],"container-title":["Journal of Information Science"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/0165551515586671","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/0165551515586671","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/0165551515586671","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T23:09:06Z","timestamp":1777504146000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/0165551515586671"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2015,5,19]]},"references-count":49,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2015,8]]}},"alternative-id":["10.1177\/0165551515586671"],"URL":"https:\/\/doi.org\/10.1177\/0165551515586671","relation":{},"ISSN":["0165-5515","1741-6485"],"issn-type":[{"value":"0165-5515","type":"print"},{"value":"1741-6485","type":"electronic"}],"subject":[],"published":{"date-parts":[[2015,5,19]]}}}