{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,17]],"date-time":"2025-10-17T13:45:19Z","timestamp":1760708719082,"version":"3.40.4"},"reference-count":82,"publisher":"MIT Press","issue":"4","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Computational Linguistics"],"published-print":{"date-parts":[[2014,12]]},"abstract":"<jats:p>The evaluation of several tasks in lexical semantics is often limited by the lack of large amounts of manual annotations, not only for training purposes, but also for testing purposes. Word Sense Disambiguation (WSD) is a case in point, as hand-labeled datasets are particularly hard and time-consuming to create. Consequently, evaluations tend to be performed on a small scale, which does not allow for in-depth analysis of the factors that determine a system's performance.<\/jats:p><jats:p>In this paper we address this issue by means of a realistic simulation of large-scale evaluation for the WSD task. We do this by providing two main contributions: First, we put forward two novel approaches to the wide-coverage generation of semantically aware pseudowords (i.e., artificial words capable of modeling real polysemous words); second, we leverage the most suitable type of pseudoword to create large pseudosense-annotated corpora, which enable a large-scale experimental framework for the comparison of state-of-the-art supervised and knowledge-based algorithms. 
Using this framework, we study the impact of supervision and knowledge on the two major disambiguation paradigms and perform an in-depth analysis of the factors which affect their performance.<\/jats:p>","DOI":"10.1162\/coli_a_00202","type":"journal-article","created":{"date-parts":[[2014,6,18]],"date-time":"2014-06-18T15:12:36Z","timestamp":1403104356000},"page":"837-881","source":"Crossref","is-referenced-by-count":25,"title":["A Large-Scale Pseudoword-Based Evaluation Framework for State-of-the-Art Word Sense Disambiguation"],"prefix":"10.1162","volume":"40","author":[{"given":"Mohammad Taher","family":"Pilehvar","sequence":"first","affiliation":[{"name":"Sapienza University of Rome"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Roberto","family":"Navigli","sequence":"additional","affiliation":[{"name":"Sapienza University of Rome"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"281","reference":[{"key":"R1","doi-asserted-by":"publisher","DOI":"10.3115\/1620754.1620758"},{"key":"R2","unstructured":"Agirre, Eneko, Olatz Ansa, David Martinez, and Eduard Hovy. 2001. Enriching WordNet concepts with topic signatures. In Proceedings of the NAACL Workshop on WordNet and Other Lexical Resources, pages 23\u201328, Pittsburgh, PA."},{"key":"R3","unstructured":"Agirre, Eneko and Oier Lopez de Lacalle. 2004. Publicly available topic signatures for all WordNet nominal senses. In Proceedings of the LREC, pages 1,123\u20131,126, Lisbon."},{"key":"R4","unstructured":"Agirre, Eneko, Oier Lopez de Lacalle, and Aitor Soroa. 2009. Knowledge-based WSD on specific domains: Performing better than generic supervised WSD. In Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI), pages 1,501\u20131,506, Pasadena, CA."},{"key":"R5","doi-asserted-by":"publisher","DOI":"10.1162\/COLI_a_00164"},{"key":"R6","unstructured":"Agirre, Eneko and David Mart\u00ednez. 2004. 
Unsupervised WSD based on automatically retrieved examples: The importance of bias. In Proceedings of EMNLP, pages 25\u201332, Barcelona."},{"key":"R7","doi-asserted-by":"publisher","DOI":"10.3115\/1609067.1609070"},{"key":"R8","doi-asserted-by":"crossref","unstructured":"Banko, Michele and Eric Brill. 2001. Scaling to very very large corpora for natural language disambiguation. In Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics, pages 26\u201333, Toulouse.","DOI":"10.3115\/1073012.1073017"},{"key":"R9","doi-asserted-by":"crossref","unstructured":"Bergsma, Shane, Dekang Lin, and Randy Goebel. 2008. Discriminative learning of selectional preference from unlabeled text. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, pages 59\u201368, Honolulu, HI.","DOI":"10.3115\/1613715.1613725"},{"key":"R10","unstructured":"Bordag, Stefan. 2006. Word Sense Induction: Triplet-based clustering and automatic evaluation. In Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics (EACL), pages 137\u2013144, Trento."},{"key":"R12","unstructured":"Chambers, Nathanael and Dan Jurafsky. 2010. Improving the use of pseudo-words for evaluating Selectional Preferences. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, ACL '10, pages 445\u2013453, Uppsala."},{"key":"R13","unstructured":"Chan, Yee Seng and Hwee Tou Ng. 2005a. Scaling up word sense disambiguation via parallel texts. In Proceedings of the 20th National Conference on Artificial Intelligence (AAAI), pages 1,037\u20131,042, Pittsburgh, PA."},{"key":"R14","unstructured":"Chan, Yee Seng and Hwee Tou Ng. 2005b. Word Sense Disambiguation with distribution estimation. In Proceedings of the 19th International Joint Conference on Artificial Intelligence, IJCAI'05, pages 1,010\u20131,015, Edinburgh."},{"key":"R15","unstructured":"Chan, Yee Seng and Hwee Tou Ng. 2007. 
Domain adaptation with active learning for Word Sense Disambiguation. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, pages 49\u201356, Prague."},{"key":"R16","doi-asserted-by":"publisher","DOI":"10.3115\/1621474.1621528"},{"key":"R17","doi-asserted-by":"publisher","DOI":"10.3115\/1610075.1610149"},{"key":"R18","doi-asserted-by":"publisher","DOI":"10.3115\/1621474.1621489"},{"key":"R19","doi-asserted-by":"crossref","unstructured":"Cuadros, Montse and German Rigau. 2008. KnowNet: Building a large net of knowledge from the Web. In Proceedings of the 22nd International Conference on Computational Linguistics, pages 161\u2013168, Manchester.","DOI":"10.3115\/1599081.1599102"},{"key":"R20","doi-asserted-by":"crossref","unstructured":"Curran, James R. and Stephen Clark. 2003. Investigating GIS and smoothing for maximum entropy taggers. In Proceedings of the Tenth Conference of the European Chapter of the Association for Computational Linguistics - Volume 1, pages 91\u201398, Budapest.","DOI":"10.3115\/1067807.1067821"},{"key":"R21","doi-asserted-by":"publisher","DOI":"10.1162\/COLI_a_00148"},{"key":"R22","unstructured":"Erk, Katrin. 2007. A simple, similarity-based model for Selectional Preferences. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, pages 216\u2013223, Prague."},{"key":"R23","doi-asserted-by":"publisher","DOI":"10.1162\/coli_a_00017"},{"key":"R24","doi-asserted-by":"crossref","unstructured":"Escudero, Gerard, Llu\u00eds M\u00e0rquez, and German Rigau. 2000. An empirical study of the domain dependence of supervised Word Sense Disambiguation systems. 
In Proceedings of the 2000 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora: Held in Conjunction with the 38th Annual Meeting of the Association for Computational Linguistics - Volume 13, EMNLP '00, pages 172\u2013180, Hong Kong.","DOI":"10.3115\/1117794.1117816"},{"key":"R25","unstructured":"Faralli, Stefano and Roberto Navigli. 2012. A new minimally-supervised framework for Domain Word Sense Disambiguation. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 1,411\u20131,422, Jeju."},{"key":"R27","doi-asserted-by":"publisher","DOI":"10.1613\/jair.3456"},{"key":"R28","unstructured":"Gale, William, Kenneth Church, and David Yarowsky. 1992a. Work on statistical methods for Word Sense Disambiguation. In Proceedings of the AAAI Fall Symposium on Probabilistic Approaches to Natural Language, pages 54\u201360, Cambridge, MA."},{"key":"R29","doi-asserted-by":"publisher","DOI":"10.1007\/BF00136984"},{"key":"R30","unstructured":"Gaustad, Tanja. 2001. Statistical corpus-based word sense disambiguation: Pseudowords vs. real ambiguous words. In Proceedings of the Student Research Workshop of the 39th Annual Meeting of the Association for Computational Linguistics (ACL\/EACL 2001), pages 61\u201366, Toulouse."},{"key":"R31","unstructured":"Graff, David and Christopher Cieri. 2003. English gigaword, LDC2003T05. In Linguistic Data Consortium. Philadelphia, PA."},{"key":"R32","doi-asserted-by":"crossref","unstructured":"Haveliwala, Taher H. 2002. Topic-sensitive PageRank. In Proceedings of the 11th International Conference on World Wide Web (WWW 2002), pages 517\u2013526, Honolulu, HI.","DOI":"10.1145\/511446.511513"},{"key":"R33","doi-asserted-by":"publisher","DOI":"10.1016\/j.artint.2012.10.002"},{"key":"R34","unstructured":"Hughes, Thad and Daniel Ramage. 2007. Lexical semantic relatedness with random graph walks. 
In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, EMNLP-CoNLL '07, pages 581\u2013589, Prague."},{"key":"R35","unstructured":"Ide, Nancy, Collin F. Baker, Christiane Fellbaum, and Rebecca J. Passonneau. 2010. The manually annotated sub-corpus: A community resource for and by the people. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (Short Papers), pages 68\u201373, Uppsala."},{"key":"R36","unstructured":"Jurgens, David and Keith Stevens. 2011. Measuring the impact of sense similarity on Word Sense Induction. In Proceedings of the First Workshop on Unsupervised Learning in NLP, EMNLP '11, pages 113\u2013123, Edinburgh."},{"key":"R37","unstructured":"Khapra, Mitesh, Anup Kulkarni, Saurabh Sohoney, and Pushpak Bhattacharyya. 2010. All words domain adapted WSD: Finding a middle ground between supervision and unsupervision. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL), pages 1,532\u20131,541, Uppsala."},{"key":"R38","unstructured":"Kilgarriff, Adam and Joseph Rosenzweig. 2000. English Senseval: Report and results. In Proceedings of the 2nd Conference on Language Resources and Evaluation (LREC), pages 1,239\u20131,244, Athens."},{"key":"R39","unstructured":"Leacock, Claudia, Martin Chodorow, and George Miller. 1998. Using corpus statistics and WordNet relations for sense identification. Computational Linguistics, 24(1):147\u2013166."},{"key":"R40","doi-asserted-by":"publisher","DOI":"10.3115\/1118693.1118699"},{"key":"R41","doi-asserted-by":"publisher","DOI":"10.3115\/990820.990892"},{"key":"R42","unstructured":"Lin, Dekang. 1998. An information-theoretic definition of similarity. In Proceedings of the 15th International Conference on Machine Learning, pages 296\u2013304, Madison, WI."},{"key":"R43","unstructured":"Litkowski, Ken. 2004. 
Senseval-3 task: Word Sense Disambiguation of WordNet glosses. In Senseval-3: Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text, pages 13\u201316, Barcelona."},{"key":"R44","doi-asserted-by":"crossref","unstructured":"Lu, Zhimao, Haifeng Wang, Jianmin Yao, Ting Liu, and Sheng Li. 2006. An equivalent pseudoword solution to Chinese Word Sense Disambiguation. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics, pages 457\u2013464, Sydney.","DOI":"10.3115\/1220175.1220233"},{"key":"R45","doi-asserted-by":"crossref","unstructured":"Marcus, Mitchell, Grace Kim, Mary Ann Marcinkiewicz, Robert Macintyre, Ann Bies, Mark Ferguson, Karen Katz, and Britta Schasberger. 1994. The Penn Treebank: Annotating predicate argument structure. In ARPA Human Language Technology Workshop, pages 114\u2013119, Plainsboro, NJ.","DOI":"10.3115\/1075812.1075835"},{"key":"R47","doi-asserted-by":"publisher","DOI":"10.1613\/jair.2395"},{"key":"R48","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00217"},{"key":"R49","doi-asserted-by":"publisher","DOI":"10.3115\/1218955.1218991"},{"key":"R50","unstructured":"Mihalcea, Rada. 2002. Bootstrapping large sense tagged corpora. In Proceedings of the 3rd International Conference on Language Resources and Evaluations (LREC), pages 1,407\u20131,411, Las Palmas."},{"key":"R51","unstructured":"Mihalcea, Rada. 2007. Using Wikipedia for automatic Word Sense Disambiguation. In Proceedings of NAACL-HLT-07, pages 196\u2013203, Rochester, NY."},{"key":"R52","unstructured":"Mihalcea, Rada, Timothy Chklovski, and Adam Kilgarriff. 2004. The Senseval-3 English lexical sample task. In Proceedings of Senseval-3: The Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text, pages 25\u201328, Barcelona."},{"key":"R53","unstructured":"Mihalcea, Rada and Dan Moldovan. 1999. 
An automatic method for generating sense tagged corpora. In Proceedings AAAI '99, pages 461\u2013466, Orlando, FL."},{"key":"R54","unstructured":"Mihalcea, Rada and Dan Moldovan. 2001. eXtended WordNet: Progress report. In Proceedings of the NAACL Workshop on WordNet and Other Lexical Resources, pages 95\u2013100, Pittsburgh, PA."},{"key":"R55","doi-asserted-by":"publisher","DOI":"10.1093\/ijl\/3.4.235"},{"key":"R56","doi-asserted-by":"publisher","DOI":"10.3115\/1075671.1075742"},{"key":"R58","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00179"},{"key":"R59","doi-asserted-by":"crossref","unstructured":"Nakov, Preslav I. and Marti A. Hearst. 2003. Category-based pseudowords. In HLT-NAACL 2003\u2013Short Papers, pages 67\u201369, Edmonton.","DOI":"10.3115\/1073483.1073506"},{"key":"R60","unstructured":"Navigli, Roberto. 2005. Semi-automatic extension of large-scale linguistic knowledge bases. In Proceedings of FLAIRS-05, pages 548\u2013553, Clearwater Beach, FL."},{"key":"R61","doi-asserted-by":"publisher","DOI":"10.1017\/S1351324908004749"},{"key":"R62","doi-asserted-by":"publisher","DOI":"10.1145\/1459352.1459355"},{"key":"R64","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2009.36"},{"key":"R65","doi-asserted-by":"publisher","DOI":"10.1016\/j.artint.2012.07.001"},{"key":"R66","unstructured":"Navigli, Roberto and Simone Paolo Ponzetto. 2012b. Joining forces pays off: Multilingual joint Word Sense Disambiguation. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 1,399\u20131,410, Jeju."},{"key":"R67","unstructured":"Navigli, Roberto and Daniele Vannella. 2013. SemEval-2013 task 11: Evaluating Word Sense Induction and Disambiguation within an end-user application. 
In Proceedings of the 7th International Workshop on Semantic Evaluation (SemEval 2013), in conjunction with the Second Joint Conference on Lexical and Computational Semantics (\u2021SEM 2013), pages 193\u2013201, Atlanta, GA."},{"key":"R68","unstructured":"Otrusina, Lubomir and Pavel Smrz. 2010. A new approach to pseudoword generation. In Proceedings of the International Conference on Language Resources and Evaluation (LREC), pages 1,195\u20131,199, Valletta."},{"key":"R69","doi-asserted-by":"publisher","DOI":"10.1017\/S135132490500402X"},{"key":"R70","unstructured":"Pham, Thanh Phong, Hwee Tou Ng, and Wee Sun Lee. 2005. Word Sense Disambiguation with semi-supervised learning. In Proceedings of the 20th National Conference on Artificial Intelligence, AAAI'05, pages 1,093\u20131,098, Pittsburgh, PA."},{"key":"R71","unstructured":"Pilehvar, Mohammad Taher, David Jurgens, and Roberto Navigli. 2013. Align, disambiguate and walk: A unified approach for measuring semantic similarity. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pages 1,341\u20131,351, Sofia."},{"key":"R72","unstructured":"Pilehvar, Mohammad Taher and Roberto Navigli. 2013. Paving the way to a large-scale pseudosense-annotated dataset. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2013), pages 1,100\u20131,109, Atlanta, GA."},{"key":"R73","unstructured":"Ponzetto, Simone Paolo and Roberto Navigli. 2010. Knowledge-rich Word Sense Disambiguation rivaling supervised system. 
In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL), pages 1,522\u20131,531, Uppsala."},{"key":"R74","doi-asserted-by":"publisher","DOI":"10.3115\/1621474.1621490"},{"key":"R75","doi-asserted-by":"publisher","DOI":"10.1109\/ICSC.2007.83"},{"key":"R76","doi-asserted-by":"publisher","DOI":"10.1145\/326440.326447"},{"key":"R77","doi-asserted-by":"crossref","unstructured":"Sch\u00fctze, Hinrich. 1992. Dimensions of meaning. In Supercomputing '92: Proceedings of the 1992 ACM\/IEEE Conference on Supercomputing, pages 787\u2013796, Los Alamitos, CA.","DOI":"10.1109\/SUPERC.1992.236684"},{"key":"R78","unstructured":"Shen, Hui, Razvan Bunescu, and Rada Mihalcea. 2013. Coarse to fine grained sense disambiguation in Wikipedia. In Second Joint Conference on Lexical and Computational Semantics (\u2021SEM), Volume 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity, pages 22\u201331, Atlanta, GA."},{"key":"R79","doi-asserted-by":"publisher","DOI":"10.3115\/1613715.1613751"},{"key":"R80","unstructured":"Snyder, Benjamin and Martha Palmer. 2004. The English all-words task. In Proceedings of the 3rd International Workshop on the Evaluation of Systems for the Semantic Analysis of Text (SENSEVAL-3), pages 41\u201343, Barcelona."},{"key":"R81","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/P14-1122"},{"key":"R82","unstructured":"Venhuizen, Noortje J., Valerio Basile, Kilian Evang, and Johan Bos. 2013. Gamification for word sense labeling. In Proceedings of the International Conference on Computational Semantics (IWCS), pages 397\u2013403, Potsdam."},{"key":"R83","doi-asserted-by":"publisher","DOI":"10.3115\/1220575.1220644"},{"key":"R84","doi-asserted-by":"publisher","DOI":"10.3115\/1075671.1075731"},{"key":"R85","doi-asserted-by":"publisher","DOI":"10.3115\/981658.981684"},{"key":"R86","unstructured":"Zhong, Zhi and Hwee Tou Ng. 2009. Word Sense Disambiguation for all words without hard labor. 
In Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI), pages 1,616\u20131,622, Pasadena, CA."},{"key":"R87","unstructured":"Zhong, Zhi and Hwee Tou Ng. 2010. It makes sense: A wide-coverage Word Sense Disambiguation system for free text. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL), pages 78\u201383, Uppsala."}],"container-title":["Computational Linguistics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mitpressjournals.org\/doi\/pdf\/10.1162\/COLI_a_00202","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,5,3]],"date-time":"2025-05-03T11:14:48Z","timestamp":1746270888000},"score":1,"resource":{"primary":{"URL":"https:\/\/direct.mit.edu\/coli\/article\/40\/4\/837-881\/1488"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2014,12]]},"references-count":82,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2014,12]]}},"alternative-id":["10.1162\/COLI_a_00202"],"URL":"https:\/\/doi.org\/10.1162\/coli_a_00202","relation":{},"ISSN":["0891-2017","1530-9312"],"issn-type":[{"type":"print","value":"0891-2017"},{"type":"electronic","value":"1530-9312"}],"subject":[],"published":{"date-parts":[[2014,12]]}}}