{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,5,13]],"date-time":"2025-05-13T22:00:06Z","timestamp":1747173606876,"version":"3.40.5"},"reference-count":32,"publisher":"Cambridge University Press (CUP)","issue":"5","license":[{"start":{"date-parts":[[2019,8,15]],"date-time":"2019-08-15T00:00:00Z","timestamp":1565827200000},"content-version":"unspecified","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["cambridge.org"],"crossmark-restriction":true},"short-container-title":["Nat. Lang. Eng."],"published-print":{"date-parts":[[2020,9]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>This paper describes experiments in which I tried to distinguish between Flemish and Netherlandic Dutch subtitles, as originally proposed in the VarDial 2018 Dutch\u2013Flemish Subtitle task. However, rather than using all data as a monolithic block, I divided them into two non-overlapping domains and then investigated how the relation between training and test domains influences the recognition quality. I show that the best estimate of the level of recognizability of the language varieties is derived when training on one domain and testing on another. Apart from the quantitative results, I also present a qualitative analysis, by investigating in detail the most distinguishing features in the various scenarios. 
Here too, it is with the out-of-domain recognition that some genuine differences between Flemish and Netherlandic Dutch can be found.<\/jats:p>","DOI":"10.1017\/s1351324919000445","type":"journal-article","created":{"date-parts":[[2019,8,15]],"date-time":"2019-08-15T08:17:42Z","timestamp":1565857062000},"page":"493-510","update-policy":"https:\/\/doi.org\/10.1017\/policypage","source":"Crossref","is-referenced-by-count":1,"title":["Domain bias in distinguishing Flemish and Dutch subtitles"],"prefix":"10.1017","volume":"26","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8115-1799","authenticated-orcid":false,"given":"Hans","family":"van Halteren","sequence":"first","affiliation":[]}],"member":"56","published-online":{"date-parts":[[2019,8,15]]},"reference":[{"unstructured":"\u00c7\u00f6ltekin \u00c7., Rama, T. and Blaschke, V. (2018). T\u00fcbingen-Oslo team at the VarDial 2018 evaluation campaign: an analysis of n-gram features in language variety identification. In Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial), Santa Fe, USA, pp. 55\u201365.","key":"S1351324919000445_ref8"},{"doi-asserted-by":"publisher","key":"S1351324919000445_ref25","DOI":"10.18653\/v1\/W17-1224"},{"doi-asserted-by":"publisher","key":"S1351324919000445_ref31","DOI":"10.18653\/v1\/W17-1201"},{"unstructured":"Malmasi, S. , Zampieri, M. , Ljube\u0161i\u0107, N. , Nakov, P. , Ali, A. and Tiedemann, J. (2016). Discriminating between similar languages and Arabic dialect identification: a report on the third DSL shared task. In Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial3), pp. 1\u201314.","key":"S1351324919000445_ref16"},{"doi-asserted-by":"publisher","key":"S1351324919000445_ref18","DOI":"10.18653\/v1\/W16-5805"},{"unstructured":"Komen, E. (2015). Surfacing Dutch syntactic parses. Presentation at Computational Linguistics in the Netherlands (CLIN26), Amsterdam, 2015. 
http:wordpress.let.vupr.nlclin26abstracts.","key":"S1351324919000445_ref13"},{"key":"S1351324919000445_ref20","first-page":"37","article-title":"Multiple discriminant analysis in linguistic problems","volume":"4","author":"Mustonen","year":"1965","journal-title":"Statistical Methods in Linguistics"},{"unstructured":"Malmasi, S. , Refaee, E. and Dras, M. (2015). Arabic dialect identification using a parallel multidialectal corpus. In Conference of the Pacific Association for Computational Linguistics, pp. 35\u201353.","key":"S1351324919000445_ref15"},{"unstructured":"Zampieri, M. , Malmasi, S. , Nakov, P. , Ali, A. , Shon, S. , Glass, J. , Scherrer, Y. , Samard\u017ei\u0107, T. , Ljube\u0161i\u0107, N. , Tiedemann, J. , van der Lee, C. , Grondelaers, S. , Oostdijk, N. , van den Bosch, A. , Kumar, R. , Lahiri, B. and Jain, M. (2018). Language identification and morphosyntactic tagging: the second VarDial evaluation campaign. In Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial), Santa Fe, USA, pp. 1\u201317.","key":"S1351324919000445_ref32"},{"doi-asserted-by":"publisher","key":"S1351324919000445_ref9","DOI":"10.1075\/mdm.1.13cal"},{"doi-asserted-by":"publisher","key":"S1351324919000445_ref11","DOI":"10.18653\/v1\/W17-1211"},{"key":"S1351324919000445_ref5","first-page":"45","article-title":"Alpino: wide-coverage computational analysis of Dutch","volume":"37","author":"Bouma","year":"2001","journal-title":"Language and Computers"},{"unstructured":"Campigotto, R. , Conde C\u00e9spedes, P. and Guillaume, J.-L. (2014). A generalized and adaptive method for community detection. arXiv preprint arXiv:1406.2518.","key":"S1351324919000445_ref6"},{"unstructured":"Baayen, H. , van Halteren, H. , Neijt, A. and Tweedie, F. (2002). An experiment in authorship attribution. In Proceedings of JADT 2002: Sixth International Conference on Textual Data Statistical Analysis, pp. 
29\u201337.","key":"S1351324919000445_ref3"},{"unstructured":"Argamon, S. , \u0160ari\u0107, M. and Stein, S.S. (2003). Style mining of electronic messages for multiple author discrimination. In Proceedings of ACM Conference on Knowledge Discovery and Data Mining, 2003.","key":"S1351324919000445_ref2"},{"doi-asserted-by":"publisher","key":"S1351324919000445_ref19","DOI":"10.1080\/09296170500500694"},{"doi-asserted-by":"publisher","key":"S1351324919000445_ref10","DOI":"10.1017\/S1470542711000110"},{"unstructured":"van der Lee, C. (2017). Text-Based Video Genre Classification Using Multiple Feature Categories and Categorization Methods. Master\u2019s Thesis, Tilburg University.","key":"S1351324919000445_ref24"},{"doi-asserted-by":"publisher","key":"S1351324919000445_ref14","DOI":"10.1002\/asi.20961"},{"doi-asserted-by":"publisher","key":"S1351324919000445_ref26","DOI":"10.1145\/1187415.1187416"},{"key":"S1351324919000445_ref28","first-page":"171","article-title":"Gender recognition on Dutch tweets","volume":"4","author":"van Halteren","year":"2014","journal-title":"Computational Linguistics in the Netherlands Journal"},{"doi-asserted-by":"publisher","key":"S1351324919000445_ref29","DOI":"10.3115\/v1\/W14-5307"},{"unstructured":"Mikros, G.K. and Argiri, E.K. (2007). Investigating topic influence in authorship attribution. In Proceedings of the Workshop on Plagiarism Analysis, Authorship Identification, and Near-Duplicate Detection, SIGIR \u201907, Amsterdam.","key":"S1351324919000445_ref17"},{"doi-asserted-by":"publisher","key":"S1351324919000445_ref22","DOI":"10.1002\/asi.21001"},{"doi-asserted-by":"publisher","key":"S1351324919000445_ref21","DOI":"10.3115\/v1\/W14-3907"},{"unstructured":"Zampieri, M. , Tan, L. , Ljube\u0161i\u0107, N. , Tiedemann, J. and Nakov, P. (2015). Overview of the DSL shared task 2015. In Proceedings of the Joint Workshop on Language Technology for Closely Related Languages, Varieties and Dialects, pp. 
1\u20139.","key":"S1351324919000445_ref30"},{"doi-asserted-by":"publisher","key":"S1351324919000445_ref4","DOI":"10.1088\/1742-5468\/2008\/10\/P10008"},{"unstructured":"Aguilar, G. , AlGhamdi, F. , Soto, V. , Solorio, T. , Diab, M. and Hirschberg, J. (2018). Proceedings of the Third Workshop on Computational Approaches to Linguistic Code-Switching. Melbourne, Australia: Association for Computational Linguistics.","key":"S1351324919000445_ref1"},{"unstructured":"van den Bosch, A. , Busser, B. and Daelemans, W. (2007). An efficient memory-based morphosyntactic tagger and parser for Dutch. In van Eynde F., Dirix P., Schuurman I. and Vandeghinste V. (eds), Selected Papers of the 17th Computational Linguistics in the Netherlands Meeting, Leuven, Belgium, pp. 99\u2013114.","key":"S1351324919000445_ref23"},{"unstructured":"van Halteren, H. and Oostdijk, N. (2018). Identification of differences between Dutch language varieties with the VarDial2018 Dutch-Flemish subtitle data. In Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial), Santa Fe, USA, pp. 199\u2013209.","key":"S1351324919000445_ref27"},{"key":"S1351324919000445_ref7","first-page":"1","article-title":"Empirical evaluations of language-based author identification techniques","volume":"8","author":"Chaski","year":"2001","journal-title":"Forensic Linguistics"},{"unstructured":"Jauhiainen, T. , Lui, M. , Zampieri, M. , Baldwin, T. and Lind\u00e9n, K. (2018). Automatic language identification in texts: a survey. 
arXiv preprint arXiv:1804.08186.","key":"S1351324919000445_ref12"}],"container-title":["Natural Language Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.cambridge.org\/core\/services\/aop-cambridge-core\/content\/view\/S1351324919000445","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2020,8,10]],"date-time":"2020-08-10T13:31:36Z","timestamp":1597066296000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.cambridge.org\/core\/product\/identifier\/S1351324919000445\/type\/journal_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,8,15]]},"references-count":32,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2020,9]]}},"alternative-id":["S1351324919000445"],"URL":"https:\/\/doi.org\/10.1017\/s1351324919000445","relation":{},"ISSN":["1351-3249","1469-8110"],"issn-type":[{"type":"print","value":"1351-3249"},{"type":"electronic","value":"1469-8110"}],"subject":[],"published":{"date-parts":[[2019,8,15]]},"assertion":[{"value":"\u00a9 The Author(s) 2019","name":"copyright","label":"Copyright","group":{"name":"copyright_and_licensing","label":"Copyright and Licensing"}},{"value":"This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http:\/\/creativecommons.org\/licenses\/by\/4.0\/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.","name":"license","label":"License","group":{"name":"copyright_and_licensing","label":"Copyright and Licensing"}},{"value":"This content has been made available to all.","name":"free","label":"Free to read"}]}}