{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,4]],"date-time":"2025-11-04T15:49:27Z","timestamp":1762271367664,"version":"3.37.0"},"reference-count":54,"publisher":"Cambridge University Press (CUP)","issue":"1","license":[{"start":{"date-parts":[[2009,9,9]],"date-time":"2009-09-09T00:00:00Z","timestamp":1252454400000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/www.cambridge.org\/core\/terms"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Nat. Lang. Eng."],"published-print":{"date-parts":[[2010,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>In this article, we present a comprehensive study aimed at computing semantic relatedness of word pairs. We analyze the performance of a large number of semantic relatedness measures proposed in the literature with respect to different experimental conditions, such as (i) the datasets employed, (ii) the language (English or German), (iii) the underlying knowledge source, and (iv) the evaluation task (computing scores of semantic relatedness, ranking word pairs, solving word choice problems). To our knowledge, this study is the first to systematically analyze semantic relatedness on a large number of datasets with different properties, while emphasizing the role of the knowledge source compiled either by the \u2018wisdom of linguists\u2019 (i.e., classical wordnets) or by the \u2018wisdom of crowds\u2019 (i.e., collaboratively constructed knowledge sources like Wikipedia).<\/jats:p><jats:p>The article discusses benefits and drawbacks of different approaches to evaluating semantic relatedness. We show that results should be interpreted carefully to evaluate particular aspects of semantic relatedness. 
For the first time, we apply a vector based measure of semantic relatedness, relying on a concept space built from documents, to the first paragraph of Wikipedia articles, to English WordNet glosses, and to GermaNet based pseudo glosses. Contrary to previous research (Strube and Ponzetto 2006; Gabrilovich and Markovitch 2007; Zesch <jats:italic>et al<\/jats:italic>. 2007), we find that \u2018wisdom of crowds\u2019 based resources are not superior to \u2018wisdom of linguists\u2019 based resources. We also find that using the first paragraph of a Wikipedia article as opposed to the whole article leads to better precision, but decreases recall. Finally, we present two systems that were developed to aid the experiments presented herein and are freely available<jats:sup>1<\/jats:sup> for research purposes: (i) DEXTRACT, a software to semi-automatically construct corpus-driven semantic relatedness datasets, and (ii) JWPL, a Java-based high-performance Wikipedia Application Programming Interface (API) for building natural language processing (NLP) applications.<\/jats:p>","DOI":"10.1017\/s1351324909990167","type":"journal-article","created":{"date-parts":[[2009,9,9]],"date-time":"2009-09-09T09:51:41Z","timestamp":1252489901000},"page":"25-59","source":"Crossref","is-referenced-by-count":67,"title":["Wisdom of crowds versus wisdom of linguists \u2013 measuring the semantic relatedness of words"],"prefix":"10.1017","volume":"16","author":[{"given":"TORSTEN","family":"ZESCH","sequence":"first","affiliation":[]},{"given":"IRYNA","family":"GUREVYCH","sequence":"additional","affiliation":[]}],"member":"56","published-online":{"date-parts":[[2009,9,9]]},"reference":[{"key":"S1351324909990167_ref34","doi-asserted-by":"publisher","DOI":"10.1109\/21.24528"},{"key":"S1351324909990167_ref50","first-page":"1","volume-title":"Proceedings of the TextGraphs-2 Workshop (NAACL-HLT 
2007)","author":"Zesch","year":"2007"},{"key":"S1351324909990167_ref54","first-page":"861","volume-title":"Proceedings of AAAI","author":"Zesch","year":"2008"},{"key":"S1351324909990167_ref5","doi-asserted-by":"publisher","DOI":"10.1162\/coli.2006.32.1.13"},{"volume-title":"Reader's Digest, das Beste f\u00fcr Deutschland","year":"2005","author":"Wallace","key":"S1351324909990167_ref45"},{"key":"S1351324909990167_ref20","doi-asserted-by":"publisher","DOI":"10.1145\/318723.318728"},{"key":"S1351324909990167_ref2","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-45715-1_11"},{"key":"S1351324909990167_ref21","doi-asserted-by":"crossref","first-page":"871","DOI":"10.1109\/TKDE.2003.1209005","article-title":"An approach for measuring semantic similarity between words using multiple information sources","volume":"15","author":"Li","year":"2003","journal-title":"IEEE Transactions on Knowledge and Data Engineering"},{"key":"S1351324909990167_ref19","doi-asserted-by":"crossref","first-page":"265","DOI":"10.7551\/mitpress\/7287.003.0018","volume-title":"WordNet: An Electronic Lexical Database","author":"Leacock","year":"1998"},{"key":"S1351324909990167_ref14","first-page":"305","volume-title":"WordNet: An Electronic Lexical Database and Some of Its Applications","author":"Hirst","year":"1998"},{"key":"S1351324909990167_ref15","first-page":"111","volume-title":"Proceedings of Recent Advances in Natural Language Processing","author":"Jarmasz","year":"2003"},{"key":"S1351324909990167_ref25","doi-asserted-by":"publisher","DOI":"10.1080\/01690969108406936"},{"volume-title":"Roget's International Thesaurus","year":"1962","author":"Roget","key":"S1351324909990167_ref36"},{"volume-title":"Longman Dictionary of Contemporary English","year":"1978","author":"Procter","key":"S1351324909990167_ref32"},{"key":"S1351324909990167_ref17","first-page":"232","volume-title":"Proceedings of the sixth conference of the European chapter of the Association for Computational 
Linguistics","author":"Kozima","year":"1993"},{"key":"S1351324909990167_ref31","first-page":"1271","volume-title":"OTM '08: Proceedings of the OTM 2008 Confederated International Conferences, CoopIS, DOA, GADA, IS, and ODBASE","author":"Pirro","year":"2008"},{"volume-title":"Introduction to Modern Information Retrieval","year":"1983","author":"Salton","key":"S1351324909990167_ref38"},{"key":"S1351324909990167_ref10","first-page":"767","volume-title":"Proceedings of the 2nd International Joint Conference on Natural Language Processing","author":"Gurevych","year":"2005"},{"key":"S1351324909990167_ref49","doi-asserted-by":"publisher","DOI":"10.3115\/1641976.1641980"},{"key":"S1351324909990167_ref52","first-page":"205","volume-title":"Proceedings of Human Language Technologies: The Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT)","author":"Zesch","year":"2007"},{"key":"S1351324909990167_ref26","first-page":"571","volume-title":"Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)","author":"Mohammad","year":"2007"},{"key":"S1351324909990167_ref28","first-page":"46","volume-title":"Workshop on Computational Lexical Semantics, Human Language Technology Conference of the North American Chapter of the ACL","author":"Morris","year":"2004"},{"volume-title":"Proceedings of the 16th ACM International Conference on Research and Development in Information Retrieval","year":"1993","author":"Qiu","key":"S1351324909990167_ref33"},{"volume-title":"Cohesion in English","year":"1976","author":"Halliday","key":"S1351324909990167_ref13"},{"volume-title":"Proceedings of the Conference on Language Resources and Evaluation (LREC)","year":"2008","author":"Zesch","key":"S1351324909990167_ref53"},{"key":"S1351324909990167_ref51","first-page":"197","volume-title":"Data Structures for Linguistic Resources and 
Applications","author":"Zesch","year":"2007"},{"key":"S1351324909990167_ref9","first-page":"1486","volume-title":"Proceedings of 18th International Joint Conference on Artificial Intelligence (IJCAI'03)","author":"Galley","year":"2003"},{"key":"S1351324909990167_ref48","first-page":"121","volume-title":"Proceedings of the Third International WordNet Conference (GWC-06)","author":"Yang","year":"2006"},{"key":"S1351324909990167_ref23","unstructured":"McHale M. 1998. A comparison of wordnet and roget's taxonomy for measuring semantic similarity. CoRR, cmp-lg\/9809003."},{"key":"S1351324909990167_ref47","first-page":"133","volume-title":"32nd Annual Meeting of the ACL","author":"Wu","year":"1994"},{"key":"S1351324909990167_ref11","first-page":"1032","volume-title":"Proceedings of ACL","author":"Gurevych","year":"2007"},{"key":"S1351324909990167_ref42","first-page":"1419","volume-title":"Proceedings of the 21st National Conference on Artificial Intelligence (AAAI-06)","author":"Strube","year":"2006"},{"volume-title":"The Macquarie Thesaurus","year":"1986","author":"Bernard","key":"S1351324909990167_ref3"},{"key":"S1351324909990167_ref12","doi-asserted-by":"publisher","DOI":"10.3115\/1220355.1220465"},{"key":"S1351324909990167_ref37","doi-asserted-by":"publisher","DOI":"10.1145\/365628.365657"},{"key":"S1351324909990167_ref43","first-page":"313","volume-title":"Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the ACL","author":"Turney","year":"2006"},{"key":"S1351324909990167_ref30","first-page":"1","volume-title":"Proceedings of the EACL 2006 Workshop Making Sense of Sense - Bringing Computational Linguistics and Psycholinguistics Together","author":"Patwardhan","year":"2006"},{"key":"S1351324909990167_ref44","unstructured":"Voss J. 2006. Collaborative thesaurus tagging the Wikipedia way. 
CoRR, abs\/cs\/0604036."},{"key":"S1351324909990167_ref6","doi-asserted-by":"crossref","DOI":"10.7551\/mitpress\/7287.001.0001","volume-title":"WordNet an Electronic Lexical Database","author":"Fellbaum","year":"1998"},{"key":"S1351324909990167_ref27","first-page":"21","article-title":"Lexical cohesion computed by thesaural relations as an indicator of the structure of text","volume":"17","author":"Morris","year":"1991","journal-title":"Computational Linguistics"},{"key":"S1351324909990167_ref1","doi-asserted-by":"crossref","first-page":"17","DOI":"10.1080\/00031305.1973.10478966","article-title":"Graphs in statistical analysis","volume":"27","author":"Anscombe","year":"1973","journal-title":"American Statistician"},{"key":"S1351324909990167_ref24","first-page":"152","volume-title":"Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics","author":"Mihalcea","year":"1999"},{"key":"S1351324909990167_ref41","doi-asserted-by":"publisher","DOI":"10.3115\/1219840.1219887"},{"key":"S1351324909990167_ref22","first-page":"296","volume-title":"Proceedings of International Conference on Machine Learning","author":"Lin","year":"1998"},{"key":"S1351324909990167_ref35","first-page":"448","volume-title":"Proceedings of the 14th International Joint Conference on Artificial Intelligence","author":"Resnik","year":"1995"},{"key":"S1351324909990167_ref29","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-36456-0_24"},{"key":"S1351324909990167_ref40","doi-asserted-by":"publisher","DOI":"10.1162\/089120102762671954"},{"volume-title":"Proceedings of the 10th International Conference on Research in Computational Linguistics","year":"1997","author":"Jiang","key":"S1351324909990167_ref16"},{"volume-title":"Proceedings of ECAI'2004, the 16th European Conference on Artificial Intelligence","year":"2004","author":"Seco","key":"S1351324909990167_ref39"},{"key":"S1351324909990167_ref46","doi-asserted-by":"crossref","unstructured":"Weeds J. E. 2003. 
Measures and Applications of Lexical Distributional Similarity. PhD thesis, East Sussex, UK: University of Sussex.","DOI":"10.3115\/1220355.1220501"},{"key":"S1351324909990167_ref8","first-page":"1606","volume-title":"Proceedings of The 20th International Joint Conference on Artificial Intelligence (IJCAI)","author":"Gabrilovich","year":"2007"},{"volume-title":"Proceedings of the Third Global WordNet Meeting","year":"2006","author":"Boyd-Graber","key":"S1351324909990167_ref4"},{"key":"S1351324909990167_ref7","doi-asserted-by":"publisher","DOI":"10.1145\/503104.503110"},{"key":"S1351324909990167_ref18","first-page":"423","volume-title":"Lexikalisch-semantische Wortnetze","author":"Kunze","year":"2004"}],"container-title":["Natural Language Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.cambridge.org\/core\/services\/aop-cambridge-core\/content\/view\/S1351324909990167","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,2,12]],"date-time":"2025-02-12T01:27:47Z","timestamp":1739323667000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.cambridge.org\/core\/product\/identifier\/S1351324909990167\/type\/journal_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2009,9,9]]},"references-count":54,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2010,1]]}},"alternative-id":["S1351324909990167"],"URL":"https:\/\/doi.org\/10.1017\/s1351324909990167","relation":{},"ISSN":["1351-3249","1469-8110"],"issn-type":[{"type":"print","value":"1351-3249"},{"type":"electronic","value":"1469-8110"}],"subject":[],"published":{"date-parts":[[2009,9,9]]}}}