{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,25]],"date-time":"2026-03-25T15:10:59Z","timestamp":1774451459380,"version":"3.50.1"},"reference-count":74,"publisher":"Oxford University Press (OUP)","issue":"3","license":[{"start":{"date-parts":[[2023,4,10]],"date-time":"2023-04-10T00:00:00Z","timestamp":1681084800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/pages\/standard-publication-reuse-rights"}],"funder":[{"DOI":"10.13039\/501100000780","name":"European Commission","doi-asserted-by":"publisher","award":["871042"],"award-info":[{"award-number":["871042"]}],"id":[{"id":"10.13039\/501100000780","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023,8,31]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Native language identification (NLI) is the task of training (via supervised machine learning) a classifier that guesses the native language of the author of a text. This task has been extensively researched in the last decade, and the performance of NLI systems has steadily improved over the years. We focus on a different facet of the NLI task, i.e. that of analysing the internals of an NLI classifier trained by an explainable machine learning (EML) algorithm, in order to obtain explanations of its classification decisions, with the ultimate goal of gaining insight into which linguistic phenomena \u2018give a speaker\u2019s native language away\u2019. We use this perspective in order to tackle both NLI and a (much less researched) companion task, i.e. guessing whether a text has been written by a native or a non-native speaker. 
Using three datasets of different provenance (two datasets of English learners\u2019 essays and a dataset of social media posts), we investigate which kind of linguistic traits (lexical, morphological, syntactic, and statistical) are most effective for solving our two tasks, namely, are most indicative of a speaker\u2019s L1; our experiments indicate that the most discriminative features are the lexical ones, followed by the morphological, syntactic, and statistical features, in this order. We also present two case studies, one on Italian and one on Spanish learners of English, in which we analyse individual linguistic traits that the classifiers have singled out as most important for spotting these L1s; we show that the traits identified as most discriminative well align with our intuition, i.e. represent typical patterns of language misuse, underuse, or overuse, by speakers of the given L1. Overall, our study shows that the use of EML can be a valuable tool for the scholar who investigates interlanguage facts and language transfer.<\/jats:p>","DOI":"10.1093\/llc\/fqad019","type":"journal-article","created":{"date-parts":[[2023,4,10]],"date-time":"2023-04-10T18:38:51Z","timestamp":1681151931000},"page":"953-977","source":"Crossref","is-referenced-by-count":2,"title":["Unravelling interlanguage facts via explainable machine learning"],"prefix":"10.1093","volume":"38","author":[{"ORCID":"https:\/\/orcid.org\/0009-0001-3454-8934","authenticated-orcid":false,"given":"Barbara","family":"Berti","sequence":"first","affiliation":[{"name":"Dipartimento di Lingue, Letterature, Culture e Mediazioni, Universit\u00e0 degli Studi di Milano , Milano, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5725-4322","authenticated-orcid":false,"given":"Andrea","family":"Esuli","sequence":"additional","affiliation":[{"name":"Istituto di Scienza e Tecnologie dell\u2019Informazione, Consiglio Nazionale delle Ricerche , Pisa, 
Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4221-6427","authenticated-orcid":false,"given":"Fabrizio","family":"Sebastiani","sequence":"additional","affiliation":[{"name":"Istituto di Scienza e Tecnologie dell\u2019Informazione, Consiglio Nazionale delle Ricerche , Pisa, Italy"}]}],"member":"286","published-online":{"date-parts":[[2023,4,10]]},"reference":[{"key":"2023083111394426700_fqad019-B1","first-page":"141","volume-title":"Learner English on Computer","author":"Aarts","year":"1998"},{"key":"2023083111394426700_fqad019-B2","doi-asserted-by":"crossref","DOI":"10.1075\/scl.54","volume-title":"Advances in Corpus-based Contrastive Linguistics: Studies in Honour of Stig Johansson","author":"Aijmer","year":"2013"},{"key":"2023083111394426700_fqad019-B3","first-page":"105","article-title":"Lexical bundles in learner writing: an analysis of formulaic language in the ALESS learner corpus","volume":"1","author":"Allen","year":"2010","journal-title":"Komaba Journal of English Education"},{"key":"2023083111394426700_fqad019-B4","first-page":"80","volume-title":"Learner English on Computer","author":"Altenberg","year":"1998"},{"key":"2023083111394426700_fqad019-B5","first-page":"39","volume-title":"Working Notes of the Forum for Information Retrieval Evaluation (FIRE 2018)","author":"Anand Kumar","year":"2018"},{"key":"2023083111394426700_fqad019-B6","first-page":"99","volume-title":"Working Notes of the Forum for Information Retrieval Evaluation (FIRE 2017)","author":"Anand Kumar","year":"2017"},{"key":"2023083111394426700_fqad019-B7","first-page":"1","volume-title":"The TESOL Encyclopedia of English Language Teaching","author":"Bardovi-Harlig","year":"2018"},{"key":"2023083111394426700_fqad019-B8","first-page":"177","volume-title":"Atti del Convegno Nazionale dell-Associazione Italiana 
Terminologia","author":"Basile","year":"2008"},{"key":"2023083111394426700_fqad019-B9","author":"Beare","year":"2000"},{"key":"2023083111394426700_fqad019-B10","doi-asserted-by":"crossref","first-page":"688969","DOI":"10.3389\/fdata.2021.688969","article-title":"Principles and practice of explainable machine learning","volume":"4","author":"Belle","year":"2021","journal-title":"Frontiers in Big Data"},{"issue":"1\u20134","key":"2023083111394426700_fqad019-B11","doi-asserted-by":"crossref","first-page":"28","DOI":"10.1016\/j.jslw.2014.09.004","article-title":"Quantifying the development of phraseological competence in L2 English writing: an automated approach","volume":"26","author":"Bestegen","year":"2014","journal-title":"Journal of Second Language Writing"},{"issue":"2","key":"2023083111394426700_fqad019-B12","doi-asserted-by":"crossref","first-page":"9","DOI":"10.1080\/09332480.2003.10554843","article-title":"Who wrote the 15th Book of Oz? An application of multivariate analysis to authorship attribution","volume":"16","author":"Binongo","year":"2003","journal-title":"Chance"},{"key":"2023083111394426700_fqad019-B13","author":"Blanchard","year":"2013"},{"key":"2023083111394426700_fqad019-B14","author":"Brooke","year":"2011"},{"key":"2023083111394426700_fqad019-B15","volume-title":"Odd Pairs and False Friends: Dizionario di false analogie e ambigue affinit\u00e0 fra inglese e italiano","author":"Browne","year":"1987"},{"key":"2023083111394426700_fqad019-B16","first-page":"115","article-title":"A contrastive analysis of epistemic modality in scientific English","volume":"18","author":"Carri\u00f3 Pastor","year":"2012","journal-title":"Revista de Lengua para Fines Espec\u00edficos"},{"issue":"3","key":"2023083111394426700_fqad019-B17","doi-asserted-by":"crossref","first-page":"192","DOI":"10.1016\/j.jeap.2013.04.002","article-title":"A contrastive study of the variation of sentence connectors in academic English","volume":"12","author":"Carri\u00f3 
Pastor","year":"2013","journal-title":"Journal of English for Academic Purposes"},{"key":"2023083111394426700_fqad019-B18","first-page":"148","author":"Corbara","year":"2019"},{"issue":"1\u20134","key":"2023083111394426700_fqad019-B19","first-page":"161","article-title":"The significance of learner\u2019s errors","volume":"5","author":"Corder","year":"1967","journal-title":"International Review of Applied Linguistics in Language Teaching"},{"issue":"1","key":"2023083111394426700_fqad019-B20","first-page":"11","article-title":"On sources of errors in foreign language learning","volume":"7","author":"Duv\u0161kov\u00e1","year":"1969","journal-title":"International Review of Applied Linguistics in Language Teaching"},{"issue":"2","key":"2023083111394426700_fqad019-B21","doi-asserted-by":"crossref","DOI":"10.1145\/3433164","article-title":"A critical reassessment of the Saerens-Latinne-Decaestecker algorithm for posterior probability adjustment","volume":"39","author":"Esuli","year":"2021","journal-title":"ACM Transactions on Information Systems"},{"issue":"4","key":"2023083111394426700_fqad019-B22","doi-asserted-by":"crossref","first-page":"825","DOI":"10.3102\/00028312033004825","article-title":"A cognitive theory of orthographic transitioning: predictable errors in how Spanish-speaking children spell English words","volume":"33","author":"Fashola","year":"1996","journal-title":"American Educational Research Journal"},{"key":"2023083111394426700_fqad019-B23","volume-title":"Teaching and Learning English as a Foreign Language","author":"Fries","year":"1945"},{"key":"2023083111394426700_fqad019-B24","first-page":"1","author":"Geertzen","year":"2013"},{"key":"2023083111394426700_fqad019-B25","doi-asserted-by":"crossref","first-page":"96","DOI":"10.56021\/9780801834585","volume-title":"Clues, Myths, and the Historical Method: Works of Carlo 
Ginzburg","author":"Ginzburg","year":"1989"},{"key":"2023083111394426700_fqad019-B26","first-page":"3591","author":"Goldin","year":"2018"},{"key":"2023083111394426700_fqad019-B27","first-page":"3","volume-title":"Learner English on Computer","author":"Granger","year":"1998"},{"key":"2023083111394426700_fqad019-B28","volume-title":"International Corpus of Learner English","author":"Granger","year":"2009","edition":"2nd edn."},{"issue":"1","key":"2023083111394426700_fqad019-B29","doi-asserted-by":"crossref","first-page":"17","DOI":"10.1111\/j.1467-971X.1996.tb00089.x","article-title":"Connector usage in the English essay writing of native and non-native EFL speakers of English","volume":"15","author":"Granger","year":"1996","journal-title":"World Englishes"},{"key":"2023083111394426700_fqad019-B30","doi-asserted-by":"crossref","first-page":"143","DOI":"10.21832\/9781847693389-007","volume-title":"Thinking and Speaking in Two Languages","author":"Gullberg","year":"2011"},{"issue":"1","key":"2023083111394426700_fqad019-B31","doi-asserted-by":"crossref","first-page":"28","DOI":"10.1075\/ijcl.16080.hua","article-title":"Dependency parsing of learner English","volume":"23","author":"Huang","year":"2018","journal-title":"International Journal of Corpus Linguistics"},{"key":"2023083111394426700_fqad019-B32","article-title":"Un caso di attribuzionismo novecentesco: Il \u201cDiario Postumo\u201d di Montale","volume":"6","author":"Italia","year":"2013","journal-title":"Cognitive Philology"},{"key":"2023083111394426700_fqad019-B33","doi-asserted-by":"crossref","DOI":"10.21832\/9781847696991","volume-title":"Approaching Language Transfer through Text Classification Explorations in the Detection-based 
Approach","author":"Jarvis","year":"2012"},{"key":"2023083111394426700_fqad019-B34","first-page":"3309","author":"Jiang","year":"2014"},{"issue":"6245","key":"2023083111394426700_fqad019-B35","doi-asserted-by":"crossref","first-page":"255","DOI":"10.1126\/science.aaa8415","article-title":"Machine learning: trends, perspectives, and prospects","volume":"349","author":"Jordan","year":"2015","journal-title":"Science"},{"issue":"1","key":"2023083111394426700_fqad019-B36","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1007\/s10579-018-9424-0","article-title":"Computational authorship attribution in medieval Latin corpora: the case of the Monk of Lido (ca. 1101\u201308) and Gallus Anonymous (ca. 1113\u201317)","volume":"54","author":"Kabala","year":"2020","journal-title":"Language Resources and Evaluation"},{"issue":"2","key":"2023083111394426700_fqad019-B37","doi-asserted-by":"crossref","first-page":"199","DOI":"10.1093\/llc\/fqt063","article-title":"Collaborative authorship in the twelfth century: a stylometric study of Hildegard of Bingen and Guibert of Gembloux","volume":"30","author":"Kestemont","year":"2015","journal-title":"Digital Scholarship in the Humanities"},{"key":"2023083111394426700_fqad019-B38","first-page":"624","author":"Koppel","year":"2005"},{"key":"2023083111394426700_fqad019-B39","first-page":"135","volume-title":"Language Transfer in Language Learning: Issues in Second Language Research","author":"Krashen","year":"1983"},{"key":"2023083111394426700_fqad019-B40","author":"K\u00f6hlmyr","year":"2001"},{"key":"2023083111394426700_fqad019-B41","volume-title":"Linguistics Across Cultures","author":"Lado","year":"1957"},{"key":"2023083111394426700_fqad019-B42","first-page":"61","article-title":"Principes de stylom\u00e9trie","volume":"11","author":"Lutos\u0142awski","year":"1898","journal-title":"Revue des \u00c9tudes 
Grecques"},{"key":"2023083111394426700_fqad019-B43","author":"Malmasi","year":"2016"},{"key":"2023083111394426700_fqad019-B44","first-page":"1403","author":"Malmasi","year":"2015"},{"key":"2023083111394426700_fqad019-B45","first-page":"62","author":"Malmasi","year":"2017"},{"issue":"2","key":"2023083111394426700_fqad019-B46","doi-asserted-by":"crossref","first-page":"109","DOI":"10.1017\/S0272263100004137","article-title":"On determining developmental stages in natural second language acquisition","volume":"3","author":"Meisel","year":"1981","journal-title":"Studies in Second Language Acquisition"},{"key":"2023083111394426700_fqad019-B47","doi-asserted-by":"crossref","first-page":"237","DOI":"10.1126\/science.ns-9.214S.237","article-title":"The characteristic curves of composition","volume":"9","author":"Mendenhall","year":"1887","journal-title":"Science"},{"key":"2023083111394426700_fqad019-B48","author":"Miliander","year":"2003"},{"issue":"2","key":"2023083111394426700_fqad019-B49","doi-asserted-by":"crossref","first-page":"302","DOI":"10.1109\/TKDE.2018.2883446","article-title":"Learning to weight for text classification","volume":"32","author":"Moreo","year":"2020","journal-title":"IEEE Transactions on Knowledge and Data Engineering"},{"key":"2023083111394426700_fqad019-B50","volume-title":"Inference and Disputed Authorship: The Federalist","author":"Mosteller","year":"1964"},{"issue":"1","key":"2023083111394426700_fqad019-B51","first-page":"1","article-title":"An investigation of three Chinese students\u2019 English writing strategies","volume":"11","author":"Mu","year":"2007","journal-title":"TESL-EJ: The Electronic Journal for English as a Second Language"},{"key":"2023083111394426700_fqad019-B52","author":"Narita","year":"2004"},{"key":"2023083111394426700_fqad019-B53","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9781139524537","volume-title":"Language Transfer: Cross-linguistic Influence in Language 
Learning","author":"Odlin","year":"1989"},{"key":"2023083111394426700_fqad019-B54","doi-asserted-by":"crossref","first-page":"436","DOI":"10.1002\/9780470756492.ch15","volume-title":"The Handbook of Second Language Acquisition","author":"Odlin","year":"2003"},{"key":"2023083111394426700_fqad019-B55","doi-asserted-by":"crossref","first-page":"127","DOI":"10.1163\/9789401206204_006","volume-title":"Linking Up Contrastive and Learner Corpus Research","author":"Osborne","year":"2008"},{"key":"2023083111394426700_fqad019-B56","doi-asserted-by":"crossref","first-page":"61","DOI":"10.7551\/mitpress\/1113.003.0008","volume-title":"Advances in Large Margin Classifiers","author":"Platt","year":"2000"},{"key":"2023083111394426700_fqad019-B57","doi-asserted-by":"crossref","first-page":"329","DOI":"10.1162\/tacl_a_00024","article-title":"Native language cognate effects on second-language lexical choice","volume":"6","author":"Rabinovich","year":"2018","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"2023083111394426700_fqad019-B58","author":"Ros\u00e9n","year":"2006"},{"key":"2023083111394426700_fqad019-B59","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-030-53360-1","volume-title":"Machine Learning Methods for Stylometry: Authorship Attribution and Author Profiling","author":"Savoy","year":"2020"},{"key":"2023083111394426700_fqad019-B60","first-page":"98","volume-title":"Language Transfer in Language Learning: Issues in Second Language Research","author":"Schachter","year":"1983"},{"issue":"1","key":"2023083111394426700_fqad019-B61","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/505282.505283","article-title":"Machine learning in automated text categorization","volume":"34","author":"Sebastiani","year":"2002","journal-title":"ACM Computing 
Surveys"},{"issue":"1\u20134","key":"2023083111394426700_fqad019-B62","first-page":"209","article-title":"Interlanguage","volume":"10","author":"Selinker","year":"1972","journal-title":"International Review of Applied Linguistics in Language Teaching"},{"key":"2023083111394426700_fqad019-B63","doi-asserted-by":"crossref","first-page":"9","DOI":"10.13053\/rcs-123-1-1","article-title":"Authorship verification: a review of recent advances","volume":"123","author":"Stamatatos","year":"2016","journal-title":"Research in Computing Science"},{"key":"2023083111394426700_fqad019-B64","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511667121","volume-title":"Learner English: A Teacher\u2019s Guide to Interference and Other Problems","author":"Swan","year":"2001"},{"key":"2023083111394426700_fqad019-B65","first-page":"48","author":"Tetreault","year":"2013"},{"key":"2023083111394426700_fqad019-B66","first-page":"2585","author":"Tetreault","year":"2012"},{"issue":"2","key":"2023083111394426700_fqad019-B67","first-page":"435","article-title":"An application of a profile-based method for authorship verification: investigating the authenticity of Pliny the Younger\u2019s letter to Trajan concerning the Christians","volume":"32","author":"Tuccinardi","year":"2017","journal-title":"Digital Scholarship in the Humanities"},{"issue":"2","key":"2023083111394426700_fqad019-B68","doi-asserted-by":"crossref","first-page":"580","DOI":"10.1093\/llc\/fqaa067","article-title":"Arden of Faversham, the authorship problem: Shakespeare, Watson, or Kyd?","volume":"37","author":"Vickers","year":"2022","journal-title":"Digital Scholarship in the Humanities"},{"issue":"2","key":"2023083111394426700_fqad019-B69","doi-asserted-by":"crossref","first-page":"123","DOI":"10.2307\/3586182","article-title":"The contrastive analysis hypothesis","volume":"4","author":"Wardhaugh","year":"1970","journal-title":"TESOL 
Quarterly"},{"issue":"3","key":"2023083111394426700_fqad019-B70","doi-asserted-by":"crossref","first-page":"41","DOI":"10.3991\/ijet.v10i3.4563","article-title":"An error analysis of the word class: a case study of Chinese college students","volume":"10","author":"Xia","year":"2015","journal-title":"International Journal of Emerging Technologies in Learning"},{"key":"2023083111394426700_fqad019-B71","first-page":"2048","author":"Xu","year":"2015"},{"key":"2023083111394426700_fqad019-B72","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1515\/iprg.2004.1.2.211","article-title":"Chinese categorization of interpersonal relationships and the cultural logic of Chinese social interaction: an indigenous perspective","volume":"1","author":"Ye","year":"2004","journal-title":"Intercultural Pragmatics"},{"issue":"5","key":"2023083111394426700_fqad019-B73","doi-asserted-by":"crossref","first-page":"578","DOI":"10.4304\/jltr.1.5.578-582","article-title":"A study of Chinese learning of English tag questions","volume":"1","author":"Zhang","year":"2010","journal-title":"Journal of Language Teaching and Research"},{"key":"2023083111394426700_fqad019-B74","doi-asserted-by":"crossref","first-page":"941","DOI":"10.1007\/978-0-387-30164-8_804","volume-title":"Encyclopedia of Machine Learning","author":"Zhang","year":"2011"}],"container-title":["Digital Scholarship in the 
Humanities"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/dsh\/article-pdf\/38\/3\/953\/51309582\/fqad019.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/dsh\/article-pdf\/38\/3\/953\/51309582\/fqad019.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,12,9]],"date-time":"2023-12-09T16:11:57Z","timestamp":1702138317000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/dsh\/article\/38\/3\/953\/7111841"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,4,10]]},"references-count":74,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2023,4,10]]},"published-print":{"date-parts":[[2023,8,31]]}},"URL":"https:\/\/doi.org\/10.1093\/llc\/fqad019","relation":{},"ISSN":["2055-7671","2055-768X"],"issn-type":[{"value":"2055-7671","type":"print"},{"value":"2055-768X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2023,9,1]]},"published":{"date-parts":[[2023,4,10]]}}}