{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,7,4]],"date-time":"2024-07-04T18:02:55Z","timestamp":1720116175544},"reference-count":38,"publisher":"Cambridge University Press (CUP)","issue":"4","license":[{"start":{"date-parts":[[2013,3,5]],"date-time":"2013-03-05T00:00:00Z","timestamp":1362441600000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/www.cambridge.org\/core\/terms"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Nat. Lang. Eng."],"published-print":{"date-parts":[[2014,10]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>In recent studies it has been shown that syntax-based semantic space models outperform models in which the context is represented as a bag-of-words in several semantic analysis tasks. This has been generally attributed to the fact that syntax-based models employ corpora that are syntactically annotated by a parser and a computational grammar. However, if the corpora processed contain words which are unknown to the parser and the grammar, a syntax-based model may lose its advantage since the syntactic properties of such words are unavailable. On the other hand, bag-of-words models do not face this issue since they operate on raw, non-annotated corpora and are thus more robust. In this paper, we compare the performance of syntax-based and bag-of-words models when applied to the task of learning the semantics of unknown words. In our experiments, unknown words are considered the words which are not known to the Alpino parser and grammar of Dutch. In our study, the semantics of an unknown word is defined by finding its most similar word in<jats:sc>cornetto<\/jats:sc>, a Dutch lexico-semantic hierarchy. We show that for unknown words the syntax-based model performs worse than the bag-of-words approach. Furthermore, we show that if we first learn the syntactic properties of unknown words by an appropriate lexical acquisition method, then in fact the syntax-based model does outperform the bag-of-words approach. The conclusion we draw is that, for words unknown to a given grammar, a bag-of-words model is more robust than a syntax-based model. However, the combination of lexical acquisition and syntax-based semantic models is best suited for learning the semantics of unknown words.<\/jats:p>","DOI":"10.1017\/s1351324913000053","type":"journal-article","created":{"date-parts":[[2013,3,5]],"date-time":"2013-03-05T13:45:34Z","timestamp":1362491134000},"page":"537-555","source":"Crossref","is-referenced-by-count":1,"title":["Lexical acquisition and semantic space models: Learning the semantics of unknown words"],"prefix":"10.1017","volume":"20","author":[{"given":"KOSTADIN","family":"CHOLAKOV","sequence":"first","affiliation":[]}],"member":"56","published-online":{"date-parts":[[2013,3,5]]},"reference":[{"key":"S1351324913000053_ref35","unstructured":"van Noord G. 2006. At last parsing is now operational. In Proceedings of TALN, Leuven, Belgium, pp. 20\u201342."},{"key":"S1351324913000053_ref36","doi-asserted-by":"crossref","unstructured":"Vossen P. (ed.) 1998. EuroWordNet: A Multilingual Database with Lexical Semantic Networks. Dordrecht, Netherlands: Kluwer.","DOI":"10.1007\/978-94-017-1491-4"},{"key":"S1351324913000053_ref31","unstructured":"Sch\u00fctze H. 1998. Automatic word sense discrimination. Computational Linguistics 24 (1): 97\u2013123."},{"key":"S1351324913000053_ref30","doi-asserted-by":"crossref","unstructured":"Salton G. , Wong A. , and Yang C. S. 1975. A vector space model for automatic indexing. Communications of the ACM 18: 613\u201320.","DOI":"10.1145\/361219.361220"},{"key":"S1351324913000053_ref10","unstructured":"Copestake A. , and Flickinger D. 2000. An open-source grammar development environment and broad-coverage English grammar using HPSG. In Proceedings of the 2nd International Conference on Language Resource and Evaluation (LREC 2000), Athens, Greece."},{"key":"S1351324913000053_ref5","doi-asserted-by":"crossref","unstructured":"Cholakov K. , Kordoni V. , and Zhang Y. 2008. Towards domain-independent deep linguistic processing: ensuring portability and re-usability of lexicalised grammars. In Proceedings of COLING 2008 Workshop on Grammar Engineering Across Frameworks (GEAF08), Manchester, UK, pp. 57\u201364.","DOI":"10.3115\/1611546.1611554"},{"key":"S1351324913000053_ref26","unstructured":"Ordelman R. J. F. 2002. Twente nieuws corpus (TwNC). Technical report, Parlevink Language Technology Group, University of Twente, Enschede, Netherlands."},{"key":"S1351324913000053_ref19","unstructured":"Hor\u00e1k A. , Vossen P. , and Rambousek A. 2008. The development of a complex-structured lexicon based on WordNet. In Proceedings of the 4th International Global WordNet Conference (GWC-2008), Szeged, Hungary, pp. 200\u20138."},{"key":"S1351324913000053_ref23","doi-asserted-by":"crossref","unstructured":"Malouf R. 2002. A comparison of algorithms for maximum entropy parameter estimation. In Proceedings of the 6th conference on Natural Language Learning (CoNLL-2002), Taipei, Taiwan, pp. 49\u201355.","DOI":"10.3115\/1118853.1118871"},{"key":"S1351324913000053_ref18","doi-asserted-by":"crossref","unstructured":"Grefenstette G. 1994. Explorations in Automatic Thesaurus Discovery. New York: Springer.","DOI":"10.1007\/978-1-4615-2710-7"},{"key":"S1351324913000053_ref17","unstructured":"Golub G. H. and Van Loan C. F. 1996. Matrix Computations, vol. 3. St Baltimore, MD: Johns Hopkins Univ. Press."},{"key":"S1351324913000053_ref11","unstructured":"Crysmann B. 2003. On the efficient implementation of German verb placement in HPSG. In Proceedings of RANLP 2003, Borovets, Bulgaria."},{"key":"S1351324913000053_ref4","doi-asserted-by":"crossref","unstructured":"Berry M. W. , Dumais S. T. , and O'Brien G. W. 1994. Using linear algebra for intelligent information retrieval. SIAM Review 37: 573\u201395.","DOI":"10.1137\/1037127"},{"key":"S1351324913000053_ref2","unstructured":"Baldwin T. 2005. General-purpose lexical acquisition: Procedures, questions and results. In Proceedings of the Pacific Association for Computational Linguistics, Tokyo, Japan, pp. 23\u201332."},{"key":"S1351324913000053_ref27","doi-asserted-by":"publisher","DOI":"10.1162\/coli.2007.33.2.161"},{"key":"S1351324913000053_ref22","unstructured":"Lowe W. 2001. Towards a theory of semantic space. In Proceedings of the 2nd Annual Conference of the Cognitive Science Society, Edinburgh, UK, pp. 576\u201381."},{"key":"S1351324913000053_ref1","unstructured":"Almuhareb A. , and Poesio M. 2004. Attribute-based and value-based clustering: an evaluation. In Proceedings of EMNLP 2004, Edinburgh, UK, pp. 158\u201365."},{"key":"S1351324913000053_ref13","doi-asserted-by":"crossref","unstructured":"Erbach G. 1990. Syntactic processing of unknown words. IWBS Technical report 131, IBM, Stuttgart.","DOI":"10.1016\/B978-0-444-88771-9.50046-5"},{"key":"S1351324913000053_ref8","unstructured":"Cholakov K. , van Noord G. , Kordoni V. , and Zhang Y. 2011. Adaptability of lexical acquisition for large-scale grammars. In Proceedings of the 8th Conference on Recent Advances in Natural Language Processing (RANLP), Hissar, Bulgaria, pp. 355\u201362."},{"key":"S1351324913000053_ref3","unstructured":"Barg P. , and Walther M. 1998. Processing unknown words in HPSG. In Proceedings of the 36th Conference of the ACL, Montreal, Quebec, Canada, pp. 91\u20135."},{"key":"S1351324913000053_ref16","doi-asserted-by":"crossref","unstructured":"Fouvry F. 2003. Lexicon acquisition with a large-coverage unification-based grammar. In Companion to the 10th Conference of EACL, Budapest, Hungary, pp. 87\u201390.","DOI":"10.3115\/1067737.1067755"},{"key":"S1351324913000053_ref20","unstructured":"Lin D. 1998a. Automatic retrieval and clustering of similar words. In Proceedings of the 17th International Conference on Computational Linguistics, Montreal, Canada, pp. 768\u201374."},{"key":"S1351324913000053_ref9","unstructured":"Church K. W. , and Hanks P. 1990. Word association norms, mutual information, and lexicography. Computational Linguistics 16 (1): 22\u20139."},{"key":"S1351324913000053_ref33","unstructured":"Van de Cruys T. 2008. A comparison of bag of words and syntax-based approaches for word categorization. In Proceedings of the ESSLLI Workshop on Distributional Lexical Semantics. Bridging the Gap Between Semantic Theory and Computational Simulations, Hamburg, Germany, pp. 47\u201354."},{"key":"S1351324913000053_ref32","doi-asserted-by":"crossref","unstructured":"Turney Peter D. , and Pantel P. 2010. From frequency to meaning. Vector space models of semantics. Journal of Artificial Intelligence Research 37 (1): 141\u201388.","DOI":"10.1613\/jair.2934"},{"key":"S1351324913000053_ref29","doi-asserted-by":"crossref","unstructured":"Rothenh\u00e4usler K. , and Sch\u00fctze H. 2009. Unsupervised classification with dependency-based word spaces. In Proceedings of the Workshop on Geometrical Models of Natural Language Semantics, Singapore, pp. 17\u201324.","DOI":"10.3115\/1705415.1705418"},{"key":"S1351324913000053_ref15","doi-asserted-by":"crossref","unstructured":"Fellbaum C. 1998. WordNet: An Electronic Lexical Database. Cambridge, MA: The MIT Press.","DOI":"10.7551\/mitpress\/7287.001.0001"},{"key":"S1351324913000053_ref14","unstructured":"Erk K. 2007. A simple, similarity-based model for selectional preferences. In Proceedings of the 45th ACL Meeting, Prague, Czech Republic, pp. 216\u201323."},{"key":"S1351324913000053_ref7","unstructured":"Cholakov K. and van Noord G. 2010. Acquisition of unknown word paradigms for large-scale grammars. In Proceedings of the 23rd International Conference on Computational Linguistics (COLING-2010), Beijing, China, pp. 153\u201361."},{"key":"S1351324913000053_ref24","doi-asserted-by":"crossref","unstructured":"McCarthy D. , Koeling R. , Weeds J. , and Carroll J. 2004. Finding predominant word senses in untagged text. In Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics, Stroudsburg, PA, pp. 279\u201386.","DOI":"10.3115\/1218955.1218991"},{"key":"S1351324913000053_ref25","doi-asserted-by":"publisher","DOI":"10.1080\/01690969108406936"},{"key":"S1351324913000053_ref34","doi-asserted-by":"crossref","unstructured":"Van der Plas L. , and Tiedemann J. 2006. Finding synonyms using automatic word alignment and measures of distributional similarity. In Proceedings of the COLING-ACL Joint Conference, Sydney, Australia, pp. 866\u201373.","DOI":"10.3115\/1273073.1273184"},{"key":"S1351324913000053_ref37","doi-asserted-by":"crossref","unstructured":"Wu Z. , and Palmer M. 1994. Verbs semantics and lexical selection. In Proceedings of the 32nd Annual Meeting on Association for Computational Linguistics, Las Cruces, New Mexico, pp. 133\u20138.","DOI":"10.3115\/981732.981751"},{"key":"S1351324913000053_ref28","unstructured":"Rapp R. 2004. A freely available automatically generated thesaurus of related words. In Proceedings of the 4th Language Resources and Evaluation Conference (LREC 2004), Lisbon, Portugal, pp. 395\u20138."},{"key":"S1351324913000053_ref38","unstructured":"Zhang Y. , and Kordoni V. 2006. Automated deep lexical acquisition for robust open text processing. In Proceedings of the 5th International Conference on Language Recourses and Evaluation (LREC 2006), Genoa, Italy, pp. 275\u201380."},{"key":"S1351324913000053_ref21","unstructured":"Lin D. 1998b. An information-theoretic definition of similarity. In Proceedings of the 15th International Conference on Machine Learning, Madison, WI, pp. 296\u2013304."},{"key":"S1351324913000053_ref12","doi-asserted-by":"crossref","unstructured":"Curran J. R. , and Moens M. 2002. Improvements in automatic thesaurus extraction. In Proceedings of the ACL 2002 Workshop on Unsupervised Lexical Acquisition, Philadelphia, PA, pp. 59\u201366.","DOI":"10.3115\/1118627.1118635"},{"key":"S1351324913000053_ref6","unstructured":"Cholakov K. and van Noord G. 2009. Combining finite state and corpus-based techniques for unknown word prediction. In Proceedings of the 7th Conference on Recent Advances in Natural Language Processing (RANLP), Borovets, Bulgaria, pp. 60\u201365."}],"container-title":["Natural Language Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.cambridge.org\/core\/services\/aop-cambridge-core\/content\/view\/S1351324913000053","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2019,7,10]],"date-time":"2019-07-10T13:14:15Z","timestamp":1562764455000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.cambridge.org\/core\/product\/identifier\/S1351324913000053\/type\/journal_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2013,3,5]]},"references-count":38,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2014,10]]}},"alternative-id":["S1351324913000053"],"URL":"https:\/\/doi.org\/10.1017\/s1351324913000053","relation":{},"ISSN":["1351-3249","1469-8110"],"issn-type":[{"value":"1351-3249","type":"print"},{"value":"1469-8110","type":"electronic"}],"subject":[],"published":{"date-parts":[[2013,3,5]]}}}