{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,25]],"date-time":"2026-03-25T15:10:53Z","timestamp":1774451453563,"version":"3.50.1"},"reference-count":55,"publisher":"Cambridge University Press (CUP)","issue":"2","license":[{"start":{"date-parts":[[2020,11,26]],"date-time":"2020-11-26T00:00:00Z","timestamp":1606348800000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/www.cambridge.org\/core\/terms"}],"content-domain":{"domain":["cambridge.org"],"crossmark-restriction":true},"short-container-title":["Nat. Lang. Eng."],"published-print":{"date-parts":[[2022,3]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Native language identification (NLI)\u2014the task of automatically identifying the native language (L1) of persons based on their writings in the second language (L2)\u2014is based on the hypothesis that characteristics of L1 will surface and interfere in the production of texts in L2 to the extent that L1 is identifiable. We present an in-depth investigation of features that model a variety of linguistic phenomena potentially involved in native language interference in the context of the NLI task: the languages\u2019 structuring of information through punctuation usage, emotion expression in language, and similarities of form with the L1 vocabulary through the use of anglicized words, cognates, and other misspellings. The results of experiments with different combinations of features in a variety of settings allow us to quantify the native language interference value of these linguistic phenomena and show how robust they are in cross-corpus experiments and with respect to proficiency in L2. These experiments provide a deeper insight into the NLI task, showing how native language interference explains the gap between baseline, corpus-independent features, and the state of the art that relies on features\/representations that cover (indiscriminately) a variety of linguistic phenomena.<\/jats:p>","DOI":"10.1017\/s1351324920000595","type":"journal-article","created":{"date-parts":[[2020,11,26]],"date-time":"2020-11-26T07:54:00Z","timestamp":1606377240000},"page":"167-197","update-policy":"https:\/\/doi.org\/10.1017\/policypage","source":"Crossref","is-referenced-by-count":3,"title":["Exploiting native language interference for native language identification"],"prefix":"10.1017","volume":"28","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9533-748X","authenticated-orcid":false,"given":"Ilia","family":"Markov","sequence":"first","affiliation":[]},{"given":"Vivi","family":"Nastase","sequence":"additional","affiliation":[]},{"given":"Carlo","family":"Strapparava","sequence":"additional","affiliation":[]}],"member":"56","published-online":{"date-parts":[[2020,11,26]]},"reference":[{"key":"S1351324920000595_ref1","doi-asserted-by":"publisher","DOI":"10.1016\/S0388-0001(00)00027-9"},{"key":"S1351324920000595_ref27","doi-asserted-by":"publisher","DOI":"10.1177\/0146167211399103"},{"key":"S1351324920000595_ref37","first-page":"6","article-title":"What\u2019s the point?","volume":"3","author":"Moore","year":"2016","journal-title":"The role of punctuation in realising information structure in written English. Functional Linguistics"},{"key":"S1351324920000595_ref6","doi-asserted-by":"publisher","DOI":"10.1016\/0271-5309(93)90019-J"},{"key":"S1351324920000595_ref43","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00024"},{"key":"S1351324920000595_ref28","first-page":"707","article-title":"Binary codes capable of correcting deletions, insertions, and reversals","volume":"10","author":"Levenshtein","year":"1966","journal-title":"Soviet Physics Doklady"},{"key":"S1351324920000595_ref54","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N18-2096"},{"key":"S1351324920000595_ref45","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipm.2015.06.003"},{"key":"S1351324920000595_ref15","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D18-1395"},{"key":"S1351324920000595_ref40","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9781139524537"},{"key":"S1351324920000595_ref42","volume-title":"Linguistic Inquiry and Word Count: LIWC2007","author":"Pennebaker","year":"2007"},{"key":"S1351324920000595_ref12","unstructured":"de Melo, G. and Weikum, G. (2010). Towards universal multilingual knowledge bases. In Principles, Construction, and Applications of Multilingual Wordnets. Proceedings of the 5th Global WordNet Conference. Mumbai, India: Narosa Publishing House, pp. 149\u2013156."},{"key":"S1351324920000595_ref29","doi-asserted-by":"publisher","DOI":"10.1017\/S1351324915000406"},{"key":"S1351324920000595_ref20","doi-asserted-by":"crossref","unstructured":"Ionescu, R.T. and Popescu, M. (2017). Can string kernels pass the test of time in native language identification? In Proceedings of the 12th Workshop on Building Educational Applications Using NLP. Copenhagen, Denmark: ACL, pp. 224\u2013234.","DOI":"10.18653\/v1\/W17-5024"},{"key":"S1351324920000595_ref55","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9780511521256"},{"key":"S1351324920000595_ref9","unstructured":"Chen, L. (2016). Native Language Identification on Learner Corpora. M.Phil. Thesis, University of Trento, Department of Information Engineering and Science, Trento, Italy."},{"key":"S1351324920000595_ref4","unstructured":"Brooke, J. and Hirst, G. (2011). Native language detection with \u2018cheap\u2019 learner corpora. In Proceedings of the Conference of Learner Corpus Research. Louvain-la-Neuve, Belgium: Presses universitaires de Louvain, pp. 37\u201347."},{"key":"S1351324920000595_ref35","doi-asserted-by":"publisher","DOI":"10.1007\/BF02295996"},{"key":"S1351324920000595_ref47","first-page":"13","volume-title":"Improvements in Part-of-Speech Tagging With an Application to German","author":"Schmid","year":"1999"},{"key":"S1351324920000595_ref50","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/W14-3907"},{"key":"S1351324920000595_ref51","unstructured":"Tetreault, J. , Blanchard, D. , Cahill, A. and Chodorow, M. (2012). Native tongues, lost and found: Resources and empirical evaluations in native language identification. In Proceedings of the 24th International Conference on Computational Linguistics. Mumbai, India: The COLING 2012 Organizing Committee, pp. 2585\u20132602."},{"key":"S1351324920000595_ref30","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W17-5007"},{"key":"S1351324920000595_ref16","unstructured":"G\u00f3mez-Adorno, H. , Bel-Enguix, G. , Sierra, G. , S\u00e1nchez, O. and Quezada, D. (2018). A machine learning approach for detecting aggressive tweets in Spanish. In Proceedings of the Third Workshop on Evaluation of Human Language Technologies for Iberian Languages, vol. 2150. Seville, Spain: CEUR-WS.org, pp. 97\u2013101."},{"key":"S1351324920000595_ref22","unstructured":"Jarvis, S. , Bestgen, Y. and Pepper, S. (2013). Maximizing classification accuracy in native language identification. In Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications. Atlanta, GA, USA: ACL, pp. 111\u2013118."},{"key":"S1351324920000595_ref21","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1142"},{"key":"S1351324920000595_ref26","unstructured":"Kumar, A. , Ganesh, B. , Singh, S. , Soman, P. and Rosso, P. (2017). Overview of the INLI PAN at FIRE-2017 track on Indian native language identification. In Working notes of FIRE 2017 - Forum for Information Retrieval Evaluation, vol. 2036. Bangalore, India: CEUR Workshop Proceedings, pp. 99\u2013105."},{"key":"S1351324920000595_ref33","doi-asserted-by":"crossref","unstructured":"Markov, I. and Sidorov, G. (2018). CIC-IPN@INLI2018: Indian native language identification. In Working Notes of FIRE 2018 - Forum for Information Retrieval Evaluation, vol. 2266. Gandhinagar, India: CEUR Workshop Proceedings, pp. 82\u201388.","DOI":"10.1145\/3293339.3293342"},{"key":"S1351324920000595_ref44","unstructured":"Rangel, F. and Rosso, P. (2013). On the identification of emotions and authors\u2019 gender in facebook comments on the basis of their writing style. In Proceedings of the First International Workshop on Emotion and Sentiment in Social and Expressive Media: Approaches and perspectives from AI, vol. 1096. Torino, Italy: CEUR-WS.org, pp. 34\u201346."},{"key":"S1351324920000595_ref8","first-page":"1","article-title":"Empirical evaluations of language-based author identification techniques","volume":"8","author":"Chaski","year":"2001","journal-title":"Forensic Linguistics"},{"key":"S1351324920000595_ref3","doi-asserted-by":"publisher","DOI":"10.1002\/j.2333-8504.2013.tb02331.x"},{"key":"S1351324920000595_ref13","doi-asserted-by":"publisher","DOI":"10.4018\/IJDET.2018100102"},{"key":"S1351324920000595_ref2","unstructured":"Bergsma, S. and Kondrak, G. (2007). Alignment-based discriminative string similarity. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics. Prague, Czech Republic: ACL, pp. 656\u2013663."},{"key":"S1351324920000595_ref23","doi-asserted-by":"crossref","unstructured":"Kestemont, M. (2014). Function words in authorship attribution. From black magic to theory? In Proceedings of the 3rd Workshop on Computational Linguistics for Literature. Gothenburg, Sweden: ACL, pp. 59\u201366.","DOI":"10.3115\/v1\/W14-0908"},{"key":"S1351324920000595_ref49","unstructured":"Smith, T. and Witten, I. (1993). Language inference from function words. Tech. rept. 93\/3. Department of Computer Science, University of Waikato. Computer Science Working Papers."},{"key":"S1351324920000595_ref10","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P17-2086"},{"key":"S1351324920000595_ref32","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W17-5042"},{"key":"S1351324920000595_ref41","first-page":"2825","article-title":"Scikit-learn: Machine learning in python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"Journal of Machine Learning Research"},{"key":"S1351324920000595_ref39","unstructured":"Nicolai, G. , Hauer, B. , Salameh, M. , Yao, L. and Kondrak, G. (2013). Cognate and misspelling features for natural language identification. In Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications. Atlanta, GA, USA: ACL, pp. 140\u2013145."},{"key":"S1351324920000595_ref48","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-37807-2_1"},{"key":"S1351324920000595_ref52","unstructured":"Tetreault, J. , Blanchard, D. and Cahill, A. (2013). A report on the first native language identification shared task. In Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications. Atlanta, GA, USA: ACL, pp. 48\u201357."},{"key":"S1351324920000595_ref46","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W18-1605"},{"key":"S1351324920000595_ref53","doi-asserted-by":"publisher","DOI":"10.1002\/asi.22627"},{"key":"S1351324920000595_ref14","doi-asserted-by":"publisher","DOI":"10.1016\/j.procs.2017.08.068"},{"key":"S1351324920000595_ref5","unstructured":"Brooke, J. and Hirst, G. (2012). Robust, lexicalized native language identification. In Proceedings of the 24th International Conference on Computational Linguistics. Mumbai, India: The COLING 2012 Organizing Committee, pp. 391\u2013408."},{"key":"S1351324920000595_ref38","doi-asserted-by":"publisher","DOI":"10.1177\/0146167203029005010"},{"key":"S1351324920000595_ref36","doi-asserted-by":"publisher","DOI":"10.1111\/j.1467-8640.2012.00460.x"},{"key":"S1351324920000595_ref31","doi-asserted-by":"publisher","DOI":"10.3115\/1073336.1073356"},{"key":"S1351324920000595_ref25","doi-asserted-by":"crossref","unstructured":"Kumar, A. , Ganesh, B. , Ajay, S. and Soman, P. (2018). Overview of the second shared task on Indian native language identification (INLI). In Working notes of FIRE 2018 - Forum for Information Retrieval Evaluation, vol. 2266. Gandhinagar, India: CEUR Workshop Proceedings, pp. 39\u201350.","DOI":"10.1145\/3293339.3293342"},{"key":"S1351324920000595_ref17","volume-title":"Louvain-la-Neuve","author":"Granger","year":"2009"},{"key":"S1351324920000595_ref24","doi-asserted-by":"publisher","DOI":"10.1145\/1081870.1081947"},{"key":"S1351324920000595_ref34","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-77116-8_21"},{"key":"S1351324920000595_ref18","doi-asserted-by":"publisher","DOI":"10.1093\/llc\/fqm020"},{"key":"S1351324920000595_ref7","doi-asserted-by":"publisher","DOI":"10.3389\/fpsyg.2014.01055"},{"key":"S1351324920000595_ref19","doi-asserted-by":"publisher","DOI":"10.1016\/j.system.2012.01.006"},{"key":"S1351324920000595_ref11","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W17-5049"}],"container-title":["Natural Language Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.cambridge.org\/core\/services\/aop-cambridge-core\/content\/view\/S1351324920000595","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,2,8]],"date-time":"2022-02-08T03:26:39Z","timestamp":1644290799000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.cambridge.org\/core\/product\/identifier\/S1351324920000595\/type\/journal_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,11,26]]},"references-count":55,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2022,3]]}},"alternative-id":["S1351324920000595"],"URL":"https:\/\/doi.org\/10.1017\/s1351324920000595","relation":{},"ISSN":["1351-3249","1469-8110"],"issn-type":[{"value":"1351-3249","type":"print"},{"value":"1469-8110","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,11,26]]},"assertion":[{"value":"\u00a9 The Author(s), 2020. Published by Cambridge University Press","name":"copyright","label":"Copyright","group":{"name":"copyright_and_licensing","label":"Copyright and Licensing"}}]}}