{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,20]],"date-time":"2026-02-20T08:04:39Z","timestamp":1771574679191,"version":"3.50.1"},"reference-count":56,"publisher":"Cambridge University Press (CUP)","issue":"4","license":[{"start":{"date-parts":[[2010,10,11]],"date-time":"2010-10-11T00:00:00Z","timestamp":1286755200000},"content-version":"unspecified","delay-in-days":10,"URL":"https:\/\/www.cambridge.org\/core\/terms"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Nat. Lang. Eng."],"published-print":{"date-parts":[[2010,10]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Languages are not uniform. Speakers of different language varieties use certain words differently \u2013 more or less frequently, or with different meanings. We argue that distributional semantics is the ideal framework for the investigation of such lexical variation. We address two research questions and present our analysis of the lexical variation between Belgian Dutch and Netherlandic Dutch. The first question involves a classic application of distributional models: the automatic retrieval of synonyms. We use corpora of two different language varieties to identify the Netherlandic Dutch synonyms for a set of typically Belgian words. Second, we address the problem of automatically identifying words that are typical of a given lect, either because of their high frequency or because of their divergent meaning. Overall, we show that distributional models are able to identify more <jats:italic>lectal markers<\/jats:italic> than traditional keyword methods. Distributional models also have a bias towards a different type of variation. In summary, our results demonstrate how distributional semantics can help research in variational linguistics, with possible future applications in lexicography or terminology extraction.<\/jats:p>","DOI":"10.1017\/s1351324910000161","type":"journal-article","created":{"date-parts":[[2010,10,11]],"date-time":"2010-10-11T15:13:10Z","timestamp":1286809990000},"page":"469-491","source":"Crossref","is-referenced-by-count":17,"title":["The automatic identification of lexical variation between language varieties"],"prefix":"10.1017","volume":"16","author":[{"given":"YVES","family":"PEIRSMAN","sequence":"first","affiliation":[]},{"given":"DIRK","family":"GEERAERTS","sequence":"additional","affiliation":[]},{"given":"DIRK","family":"SPEELMAN","sequence":"additional","affiliation":[]}],"member":"56","published-online":{"date-parts":[[2010,10,11]]},"reference":[{"key":"S1351324910000161_ref55","doi-asserted-by":"publisher","DOI":"10.1075\/z.136.17wul"},{"key":"S1351324910000161_ref51","doi-asserted-by":"crossref","first-page":"141","DOI":"10.1613\/jair.2934","article-title":"From frequency to meaning: vector space models of semantics","volume":"37","author":"Turney","year":"2010","journal-title":"Journal of Artificial Intelligence Research"},{"key":"S1351324910000161_ref48","doi-asserted-by":"publisher","DOI":"10.1023\/A:1025019216574"},{"key":"S1351324910000161_ref46","doi-asserted-by":"publisher","DOI":"10.1016\/S0346-251X(97)00011-0"},{"key":"S1351324910000161_ref44","volume-title":"Introduction to Modern Information Retrieval","author":"Salton","year":"1983"},{"key":"S1351324910000161_ref41","first-page":"926","volume-title":"Le poids des mots. Actes des 7es Journ\u00e9es internationales d'Analyse statistique des Donn\u00e9es Textuelles (JADT 2004)","author":"Rayson","year":"2004"},{"key":"S1351324910000161_ref39","doi-asserted-by":"publisher","DOI":"10.3115\/981658.981709"},{"key":"S1351324910000161_ref37","first-page":"648","volume-title":"Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2009)","author":"Peirsman","year":"2009"},{"key":"S1351324910000161_ref35","doi-asserted-by":"publisher","DOI":"10.1162\/coli.2007.33.2.161"},{"key":"S1351324910000161_ref34","first-page":"1","article-title":"TwNC: a multifaceted Dutch news corpus","volume":"12","author":"Ordelman","year":"2007","journal-title":"ELRA Newsletter"},{"key":"S1351324910000161_ref33","first-page":"571","volume-title":"Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL 2007)","author":"Mohammad","year":"2007"},{"key":"S1351324910000161_ref32","doi-asserted-by":"publisher","DOI":"10.1126\/science.1152876"},{"key":"S1351324910000161_ref47","volume-title":"Advances in Cognitive Sociolinguistics","author":"Soares da Silva","year":"2010"},{"key":"S1351324910000161_ref30","first-page":"675","volume-title":"Proceedings of the 22nd Annual Conference of the Cognitive Science Society (CogSci 2000)","author":"Lowe","year":"2000"},{"key":"S1351324910000161_ref27","doi-asserted-by":"publisher","DOI":"10.3758\/BF03212981"},{"key":"S1351324910000161_ref25","doi-asserted-by":"publisher","DOI":"10.1075\/ijcl.6.1.05kil"},{"key":"S1351324910000161_ref22","doi-asserted-by":"publisher","DOI":"10.1080\/00437956.1954.11659520"},{"key":"S1351324910000161_ref21","doi-asserted-by":"publisher","DOI":"10.1515\/9783110197709"},{"key":"S1351324910000161_ref20","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4615-2710-7"},{"key":"S1351324910000161_ref18","volume-title":"Convergentie en Divergentie in de Nederlandse Woordenschat","author":"Geeraerts","year":"1999"},{"key":"S1351324910000161_ref17","first-page":"820","volume-title":"Language and Space. An International Handbook of Linguistic Variation","author":"Geeraerts","year":"2010"},{"key":"S1351324910000161_ref14","unstructured":"Fung P. , and Yee L. Y. 1998. An IR approach for translating new words from non-parallel, comparable texts. In Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and the 17th International Conference on Computational Linguistics (COLING-ACL 1998), pp. 414\u2013420."},{"key":"S1351324910000161_ref13","unstructured":"Fung P. , and McKeown K. 1997. Finding terminology translations from non-parallel corpora. In Proceedings of the 5th Workshop on Very Large Corpora, pp. 192\u2013202."},{"key":"S1351324910000161_ref10","first-page":"61","article-title":"Accurate methods for the statistics of surprise and coincidence","volume":"19","author":"Dunning","year":"1993","journal-title":"Computational Linguistics"},{"key":"S1351324910000161_ref40","first-page":"519","volume-title":"Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics (ACL 1999)","author":"Rapp","year":"1999"},{"key":"S1351324910000161_ref7","unstructured":"Curran J. R. 2004. From Distributional to Semantic Similarity. PhD thesis, University of Edinburgh, Edinburgh, UK."},{"key":"S1351324910000161_ref5","first-page":"111","article-title":"Overcrowding in semantic neighborhoods: modeling deep dyslexia","volume":"32","author":"Buchanan","year":"1996","journal-title":"Brain and Cognition"},{"key":"S1351324910000161_ref3","first-page":"179","volume-title":"Actes des 9es Journ\u00e9es internationales d'Analyse statistique des Donn\u00e9es Textuelles (JADT 2008)","author":"Bertels","year":"2008"},{"key":"S1351324910000161_ref1","first-page":"688","volume-title":"Proceedings of the 14th ACM International Conference on Information and Knowledge Management (CIKM 2005)","author":"Bai","year":"2005"},{"key":"S1351324910000161_ref2","doi-asserted-by":"publisher","DOI":"10.3115\/1629795.1629802"},{"key":"S1351324910000161_ref4","unstructured":"Boussidan A. , Sagi E. , and Ploux S. 2009. Phonaesthemic and etymological effects on the distribution of senses in statistical models of semantics. In Proceedings of the CogSci Workshop on Distributional Semantics Beyond Concrete Concepts (DiSCo 2009), pp. 35\u201340. http:\/\/www.let.rug.nl\/disco2009\/proc\/disco2009_proceedings.pdf"},{"key":"S1351324910000161_ref49","volume-title":"Advances in Cognitive Sociolinguistics","author":"Szmrecsanyi","year":"2010"},{"key":"S1351324910000161_ref12","doi-asserted-by":"publisher","DOI":"10.3758\/BF03204765"},{"key":"S1351324910000161_ref50","doi-asserted-by":"publisher","DOI":"10.1515\/cllt.2005.1.2.225"},{"key":"S1351324910000161_ref36","first-page":"613","volume-title":"Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2002)","author":"Pantel","year":"2002"},{"key":"S1351324910000161_ref8","volume-title":"Van Dale Groot Woordenboek van de Nederlandse taal (14th ed.)","author":"Den Boon","year":"2005"},{"key":"S1351324910000161_ref26","doi-asserted-by":"publisher","DOI":"10.1515\/cllt.2005.1.2.263"},{"key":"S1351324910000161_ref56","doi-asserted-by":"publisher","DOI":"10.1162\/coli.08-032-R1-06-96"},{"key":"S1351324910000161_ref15","first-page":"19","volume-title":"Proceedings of the LREC-2008 Workshop on Comparable Corpora","author":"Gamallo Otero","year":"2008"},{"key":"S1351324910000161_ref28","doi-asserted-by":"publisher","DOI":"10.1037\/0033-295X.104.2.211"},{"key":"S1351324910000161_ref38","unstructured":"Peirsman Y. , Heylen K. , and Speelman D. 2007. Finding semantically related words in Dutch. Co-occurrences versus syntactic contexts. In Proceedings of the Workshop on Contextual Information in Semantic Space Models (CoSMO 2007), pp. 34\u201341. http:\/\/clic.cimec.unitn.it\/marco\/beyond_words\/proceedings\/proceedingsCosmo.pdf"},{"key":"S1351324910000161_ref54","volume-title":"Philosophical Investigations","author":"Wittgenstein","year":"1953"},{"key":"S1351324910000161_ref43","unstructured":"Sahlgren M. 2006. The Word-Space Model. Using Distributional Analysis to Represent Syntagmatic and Paradigmatic Relations Between Words in High-dimensional Vector Spaces. PhD thesis, Stockholm University, Stockholm, Sweden."},{"key":"S1351324910000161_ref24","doi-asserted-by":"publisher","DOI":"10.3115\/1609829.1609835"},{"key":"S1351324910000161_ref19","unstructured":"Glynn D. 2007. Mapping Meaning. Toward a Usage-Based Methodology in Cognitive Semantics. PhD thesis, University of Leuven, Leuven, Belgium."},{"key":"S1351324910000161_ref6","doi-asserted-by":"publisher","DOI":"10.1080\/01638539809545027"},{"key":"S1351324910000161_ref45","first-page":"97","article-title":"Automatic word sense discrimination","volume":"24","author":"Sch\u00fctze","year":"1998","journal-title":"Computational Linguistics"},{"key":"S1351324910000161_ref31","unstructured":"Martin W. 2005. Het Belgisch-Nederlands anders bekeken: het Referentiebestand Belgisch-Nederlands (RBBN). Technical report, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands."},{"key":"S1351324910000161_ref53","unstructured":"Van der Plas L. 2008. Automatic Lexico-Semantic Acquisition for Question Answering. PhD thesis, University of Groningen, Groningen, the Netherlands."},{"key":"S1351324910000161_ref23","first-page":"449","volume-title":"Machine Learning Challenges, Evaluating Predictive Uncertainty, Visual Object Classification and Recognizing Textual Entailment, First PASCAL Machine Learning Challenges Workshop (MLCW 2005), Lecture Notes in Computer Science 3944","author":"Jijkoun","year":"2005"},{"key":"S1351324910000161_ref11","first-page":"1","volume-title":"Studies in Linguistic Analysis","author":"Firth","year":"1957"},{"key":"S1351324910000161_ref42","doi-asserted-by":"publisher","DOI":"10.3115\/1705415.1705429"},{"key":"S1351324910000161_ref9","doi-asserted-by":"publisher","DOI":"10.1515\/CLLT.2006.002"},{"key":"S1351324910000161_ref16","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-78135-6_36"},{"key":"S1351324910000161_ref52","unstructured":"Van de Cruys T. 2008. A comparison of bag of words and syntax-based approaches for word categorization. In Baroni M. , Evert S. , and Lenci A. (eds.), Proceedings of the ESSLLI Workshop on Distributional Lexical Semantics, pp. 47\u201354. http:\/\/wordspace.collocations.de\/lib\/exe\/fetch.php\/workshop:esslli:esslli_2008_lexicalsemantics.pdf"},{"key":"S1351324910000161_ref29","first-page":"768","volume-title":"Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and the 17th International Conference on Computational Linguistics (COLING-ACL 1998)","author":"Lin","year":"1998"}],"container-title":["Natural Language Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.cambridge.org\/core\/services\/aop-cambridge-core\/content\/view\/S1351324910000161","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2019,4,27]],"date-time":"2019-04-27T21:13:58Z","timestamp":1556399638000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.cambridge.org\/core\/product\/identifier\/S1351324910000161\/type\/journal_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2010,10]]},"references-count":56,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2010,10]]}},"alternative-id":["S1351324910000161"],"URL":"https:\/\/doi.org\/10.1017\/s1351324910000161","relation":{},"ISSN":["1351-3249","1469-8110"],"issn-type":[{"value":"1351-3249","type":"print"},{"value":"1469-8110","type":"electronic"}],"subject":[],"published":{"date-parts":[[2010,10]]}}}