{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T04:05:59Z","timestamp":1760241959577,"version":"build-2065373602"},"reference-count":41,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2018,11,22]],"date-time":"2018-11-22T00:00:00Z","timestamp":1542844800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100003329","name":"Ministerio de Econom\u00eda y Competitividad","doi-asserted-by":"publisher","award":["FFI2014-51978-C2-1-R"],"award-info":[{"award-number":["FFI2014-51978-C2-1-R"]}],"id":[{"id":"10.13039\/501100003329","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["MAKE"],"abstract":"<jats:p>In this article, we define the outlier detection task and use it to compare neural-based word embeddings with transparent count-based distributional representations. Using the English Wikipedia as a text source to train the models, we observed that embeddings outperform count-based representations when their contexts are made up of bag-of-words. However, there are no sharp differences between the two models if the word contexts are defined as syntactic dependencies. In general, syntax-based models tend to perform better than those based on bag-of-words for this specific task. Similar experiments were carried out for Portuguese with similar results. 
The test datasets we have created for the outlier detection task in English and Portuguese are freely available.<\/jats:p>","DOI":"10.3390\/make1010013","type":"journal-article","created":{"date-parts":[[2018,11,23]],"date-time":"2018-11-23T03:41:31Z","timestamp":1542944491000},"page":"211-223","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["Using the Outlier Detection Task to Evaluate Distributional Semantic Models"],"prefix":"10.3390","volume":"1","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5819-2469","authenticated-orcid":false,"given":"Pablo","family":"Gamallo","sequence":"first","affiliation":[{"name":"Centro Singular de Investigaci\u00f3n en Tecnolox\u00edas da Informaci\u00f3n (CiTIUS), Campus Vida, Universidade de Santiago de Compostela, 15782 Santiago de Compostela, Galiza, Spain"}]}],"member":"1968","published-online":{"date-parts":[[2018,11,22]]},"reference":[{"key":"ref_1","first-page":"13:1","article-title":"Evaluation of Distributional Models with the Outlier Detection Task","volume":"Volume 62","author":"Henriques","year":"2018","journal-title":"7th Symposium on Languages, Applications and Technologies (SLATE 2018)"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Batchkarov, M., Kober, T., Reffin, J., Weeds, J., and Weir, D. (2016, January 7\u201312). A Critique of Word Similarity as a Method for Evaluating Distributional Semantic Models. Proceedings of the ACL Workshop on Evaluating Vector Space Representations for NLP, Berlin, Germany.","DOI":"10.18653\/v1\/W16-2502"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Camacho-Collados, J., and Navigli, R. (2016, January 7\u201312). Find the word that does not belong: A Framework for an Intrinsic Evaluation of Word Vector Representations. 
Proceedings of the ACL Workshop on Evaluating Vector Space Representations for NLP, Berlin, Germany.","DOI":"10.18653\/v1\/W16-2508"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Baroni, M., Dinu, G., and Kruszewski, G. (2014, January 23\u201325). Don\u2019t count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Baltimore, MD, USA.","DOI":"10.3115\/v1\/P14-1023"},{"key":"ref_5","unstructured":"Mikolov, T., Yih, W.-T., and Zweig, G. (2013, January 9\u201314). Linguistic Regularities in Continuous Space Word Representations. Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Atlanta, GA, USA."},{"key":"ref_6","first-page":"417","article-title":"Rehabilitation of Count-Based Models for Word Vector Representations","volume":"Volume 9041","author":"Gelbukh","year":"2015","journal-title":"CICLing-2015"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1162\/tacl_a_00134","article-title":"Improving Distributional Similarity with Lessons Learned from Word Embeddings","volume":"3","author":"Levy","year":"2015","journal-title":"Trans. Assoc. Comput. Linguist."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Levy, O., and Goldberg, Y. (2014, January 26\u201327). Linguistic Regularities in Sparse and Explicit Word Representations. Proceedings of the Eighteenth Conference on Computational Natural Language Learning (CoNLL 2014), Baltimore, MD, USA.","DOI":"10.3115\/v1\/W14-1618"},{"key":"ref_9","unstructured":"Blacoe, W., and Lapata, M. (2012, January 12\u201314). A comparison of vector-based representations for semantic composition. 
Proceedings of the Empirical Methods in Natural Language Processing-EMNLP-2012, Jeju Island, Korea."},{"key":"ref_10","unstructured":"Faruqui, M., and Dyer, C. (2015, January 26\u201331). Non-distributional Word Vector Representations. Proceedings of the ACL, Beijing, China."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"727","DOI":"10.1007\/s10579-016-9357-4","article-title":"Comparing Explicit and Predictive Distributional Semantic Models Endowed with Syntactic Contexts","volume":"51","author":"Gamallo","year":"2017","journal-title":"Lang. Resour. Eval."},{"key":"ref_12","unstructured":"Huang, E., Socher, R., and Manning, C. (2012, January 8\u201314). Improving word representations via global context and multiple word prototypes. Proceedings of the ACL-2012, Jeju Island, Korea."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Levy, O., and Goldberg, Y. (2014, January 22\u201327). Dependency-Based Word Embeddings. Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL 2014), Baltimore, MD, USA.","DOI":"10.3115\/v1\/P14-2050"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Camacho-Collados, J., Pilehvar, M., Collier, N., and Navigli, R. (2017, January 3\u20134). SemEval-2017 Task 2: Multilingual and Cross-lingual Semantic Word Similarity. Proceedings of the SemEval, Vancouver, BC, Canada.","DOI":"10.18653\/v1\/S17-2002"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"116","DOI":"10.1145\/503104.503110","article-title":"Placing search in context: The concept revisited","volume":"20","author":"Finkelstein","year":"2002","journal-title":"ACM Trans. Inf. Syst."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Agirre, E., Alfonseca, E., Hall, K., Kravalova, J., Pa\u015fca, M., and Soroa, A. (2009). A Study on Similarity and Relatedness Using Distributional and WordNet-based Approaches. 
Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL \u201909), Association for Computational Linguistics.","DOI":"10.3115\/1620754.1620758"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"665","DOI":"10.1162\/COLI_a_00237","article-title":"SimLex-999: Evaluating semantic models with (genuine) similarity estimation","volume":"41","author":"Hill","year":"2015","journal-title":"Comput. Linguist."},{"key":"ref_18","unstructured":"Mitchell, J., and Lapata, M. (2008, January 19\u201320). Vector-based models of semantic composition. Proceedings of the ACL-08: HLT, Columbus, OH, USA."},{"key":"ref_19","unstructured":"Grefenstette, E., and Sadrzadeh, M. (2011, January 31). Experimenting with Transitive Verbs in a DisCoCat. Proceedings of the Workshop on Geometrical Models of Natural Language Semantics (EMNLP-2011), Edinburgh, UK."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1037\/0033-295X.104.2.211","article-title":"A solution to Plato\u2019s problem: The Latent Semantic Analysis theory of acquisition, induction and representation of knowledge","volume":"104","author":"Landauer","year":"1997","journal-title":"Psychol. Rev."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Turney, P. (2001, January 5\u20137). Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL. Proceedings of the 12th European Conference on Machine Learning, Freiburg, Germany.","DOI":"10.1007\/3-540-44795-4_42"},{"key":"ref_22","unstructured":"Wilkens, R., Zilio, L., Ferreira, E., and Villavicencio, A. (2016, January 23\u201328). B2SG: A TOEFL-like Task for Portuguese. 
Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016, Portoro\u017e, Slovenia."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"77","DOI":"10.2307\/3585941","article-title":"The role of vocabulary teaching","volume":"10","author":"Richards","year":"1976","journal-title":"TESOL Q."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"107","DOI":"10.1162\/0891201053630318","article-title":"Clustering Syntactic Positions with Similar Semantic Requirements","volume":"31","author":"Gamallo","year":"2005","journal-title":"Comput. Linguist."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"673","DOI":"10.1162\/coli_a_00016","article-title":"Distributional Memory: A General Framework for Corpus-based Semantics","volume":"36","author":"Baroni","year":"2010","journal-title":"Comput. Linguist."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"95","DOI":"10.1007\/s10579-010-9129-5","article-title":"Is Singular Value Decomposition Useful for Word Similarity Extraction?","volume":"45","author":"Gamallo","year":"2011","journal-title":"Lang. Resour. Eval."},{"key":"ref_27","unstructured":"Bordag, S. (2008, January 17\u201323). A Comparison of Co-occurrence and Similarity Measures as Simulations of Context. Proceedings of the 9th CICLing, Haifa, Israel."},{"key":"ref_28","first-page":"61","article-title":"Accurate Methods for the Statistics of Surprise and Coincidence","volume":"19","author":"Dunning","year":"1993","journal-title":"Comput. Linguist."},{"key":"ref_29","first-page":"55","article-title":"Text: Now in 2D! A Framework for Lexical Expansion with Contextual Similarity","volume":"1","author":"Biemann","year":"2013","journal-title":"J. Lang. Model."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Padr\u00f3, M., Idiart, M., Villavicencio, A., and Ramisch, C. (2014, January 25\u201329). Nothing like Good Old Frequency: Studying Context Filters for Distributional Thesauri. 
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP 2014, A Meeting of SIGDAT, a Special Interest Group of the ACL), Doha, Qatar.","DOI":"10.3115\/v1\/D14-1047"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Gamallo, P. (2015, January 5\u20137). Dependency Parsing with Compression Rules. Proceedings of the 14th International Workshop on Parsing Technology (IWPT 2015), Association for Computational Linguistics, Bilbao, Spain.","DOI":"10.18653\/v1\/W15-2214"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Pennington, J., Socher, R., and Manning, C.D. (2014, January 25\u201329). GloVe: Global Vectors for Word Representation. Proceedings of the Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar.","DOI":"10.3115\/v1\/D14-1162"},{"key":"ref_33","unstructured":"Goldberg, Y., and Nivre, J. (2012, January 8\u201315). A Dynamic Oracle for Arc-Eager Dependency Parsing. Proceedings of the 24th International Conference on Computational Linguistics: Technical Papers (COLING 2012), Mumbai, India."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"1244","DOI":"10.1016\/j.ipm.2018.05.003","article-title":"Dependency parsing with finite state transducers and compression rules","volume":"54","author":"Gamallo","year":"2018","journal-title":"Inf. Process. Manag."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Gamallo, P. (2008). Comparing Window and Syntax Based Strategies for Semantic Extraction. PROPOR-2008, Springer. Lecture Notes in Computer Science.","DOI":"10.1007\/978-3-540-85980-2_5"},{"key":"ref_36","unstructured":"Gamallo, P. (2009, January 12\u201315). Comparing Different Properties Involved in Word Similarity Extraction. Proceedings of the 14th Portuguese Conference on Artificial Intelligence (EPIA\u201909), LNCS, Aveiro, Portugal."},{"key":"ref_37","unstructured":"Grefenstette, G. (1993, January 21). 
Evaluation techniques for automatic semantic extraction: Comparing syntactic and window-based approaches. Proceedings of the Workshop on Acquisition of Lexical Knowledge from Text SIGLEX\/ACL, Columbus, OH, USA."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"161","DOI":"10.1162\/coli.2007.33.2.161","article-title":"Dependency-Based Construction of Semantic Space Models","volume":"33","author":"Lapata","year":"2007","journal-title":"Comput. Linguist."},{"key":"ref_39","unstructured":"Peirsman, Y., Heylen, K., and Speelman, D. (2007, January 20). Finding semantically related words in Dutch. Co-occurrences versus syntactic contexts. Proceedings of the CoSMO Workshop, Roskilde, Denmark."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Seretan, V., and Wehrli, E. (2006, January 17\u201318). Accurate Collocation Extraction Using a Multilingual Parser. Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the ACL, Sydney, Australia.","DOI":"10.3115\/1220175.1220295"},{"key":"ref_41","unstructured":"Hartmann, N., Fonseca, E.R., Shulby, C., Treviso, M.V., Silva, J., and Alu\u00edsio, S.M. (2017, January 2\u20135). Portuguese Word Embeddings: Evaluating on Word Analogies and Natural Language Tasks. 
Proceedings of the 11th Brazilian Symposium in Information and Human Language Technology (STIL 2017), Uberl\u00e2ndia, Brazil."}],"container-title":["Machine Learning and Knowledge Extraction"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-4990\/1\/1\/13\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T15:31:30Z","timestamp":1760196690000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-4990\/1\/1\/13"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,11,22]]},"references-count":41,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2019,3]]}},"alternative-id":["make1010013"],"URL":"https:\/\/doi.org\/10.3390\/make1010013","relation":{},"ISSN":["2504-4990"],"issn-type":[{"type":"electronic","value":"2504-4990"}],"subject":[],"published":{"date-parts":[[2018,11,22]]}}}