{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,17]],"date-time":"2026-03-17T04:58:14Z","timestamp":1773723494495,"version":"3.50.1"},"reference-count":64,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2023,11,30]],"date-time":"2023-11-30T00:00:00Z","timestamp":1701302400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Artif. Intell."],"abstract":"<jats:p>We discover sizable differences between the lexical complexity assignments of first language (L1) and second language (L2) English speakers. The complexity assignments of 940 shared tokens without context were extracted and compared from three lexical complexity prediction (LCP) datasets: the CompLex dataset, the Word Complexity Lexicon, and the CEFR-J wordlist. It was found that word frequency, length, syllable count, familiarity, and prevalence as well as a number of derivations had a greater effect on perceived lexical complexity for L2 English speakers than they did for L1 English speakers. We explain these findings in connection to several theories from applied linguistics and then use these findings to inform a binary classifier that is trained to distinguish between spelling errors made by L1 and L2 English speakers. Our results indicate that several of our findings are generalizable. Differences in perceived lexical complexity are shown to be useful in the automatic identification of problematic words for these differing target populations. 
This gives support to the development of personalized lexical complexity prediction and text simplification systems.<\/jats:p>","DOI":"10.3389\/frai.2023.1236963","type":"journal-article","created":{"date-parts":[[2023,11,30]],"date-time":"2023-11-30T07:12:15Z","timestamp":1701328335000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["Features of lexical complexity: insights from L1 and L2 speakers"],"prefix":"10.3389","volume":"6","author":[{"given":"Kai","family":"North","sequence":"first","affiliation":[]},{"given":"Marcos","family":"Zampieri","sequence":"additional","affiliation":[]}],"member":"1965","published-online":{"date-parts":[[2023,11,30]]},"reference":[{"key":"B1","doi-asserted-by":"publisher","first-page":"446","DOI":"10.1177\/1367006911429511","article-title":"The acquisition of concrete, abstract, and emotion words in a second language","volume":"16","author":"Altarriba","year":"2011","journal-title":"Int. J. Bilingual."},{"key":"B2","doi-asserted-by":"publisher","first-page":"161","DOI":"10.1186\/1471-2105-13-161","article-title":"Concept Annotation in the CRAFT corpus","volume":"13","author":"Bada","year":"2012","journal-title":"BMC Bioinform."},{"key":"B3","unstructured":"The British National Corpus, XML Edition. Oxford Text Archive, 2015."},{"key":"B4","article-title":"\u201cWeb 1t 5-gram version 1,\u201d","volume-title":"Linguistic Data Consortium (LDC)","author":"Brants","year":"2006"},{"key":"B5","doi-asserted-by":"publisher","first-page":"1520","DOI":"10.3758\/s13428-016-0811-4","article-title":"Test-based age-of-acquisition norms for 44 thousand English word meanings","volume":"49","author":"Brysbaert","year":"2017","journal-title":"Behav. 
Res."},{"key":"B6","doi-asserted-by":"publisher","first-page":"467","DOI":"10.3758\/s13428-018-1077-9","article-title":"Word prevalence norms for 62,000 English lemmas","volume":"51","author":"Brysbaert","year":"2019","journal-title":"Behav. Res. Methods"},{"key":"B7","doi-asserted-by":"publisher","first-page":"904","DOI":"10.3758\/s13428-013-0403-5","article-title":"Concreteness ratings for 40 thousand generally known English word lemmas","volume":"46","author":"Brysbaert","year":"2013","journal-title":"Behav. Res. Methods"},{"key":"B8","doi-asserted-by":"publisher","first-page":"375","DOI":"10.1007\/s10579-014-9287-y","article-title":"A massively parallel corpus: the bible in 100 languages","volume":"49","author":"Christodouloupoulos","year":"2015","journal-title":"Lang. Resour. Eval."},{"key":"B9","doi-asserted-by":"publisher","first-page":"107","DOI":"10.1017\/S0142716406060206","article-title":"Continuity and shallow structures in language processing","volume":"27","author":"Clahsen","year":"","journal-title":"Appl. Psycholinguist."},{"key":"B10","doi-asserted-by":"publisher","first-page":"564","DOI":"10.1016\/j.tics.2006.10.002","volume":"10","author":"Clahsen","year":"","journal-title":"Trends Cogn. Sci."},{"key":"B11","doi-asserted-by":"publisher","first-page":"639","DOI":"10.1017\/S0272263117000250","article-title":"Critical commentary: some notes on the shallow structure hypothesis","volume":"40","author":"Clahsen","year":"2018","journal-title":"Stud. Second Lang. Acquisit."},{"key":"B12","year":"2020","journal-title":"Common European Framework of Reference for Languages: Learning, Teaching, Assessment"},{"key":"B13","doi-asserted-by":"publisher","first-page":"119","DOI":"10.1016\/j.jslw.2009.02.002","article-title":"Computational assessment of lexical differences in L1 and L2 writing","volume":"18","author":"Crossley","year":"2009","journal-title":"J. Second Lang. 
Writ."},{"key":"B14","doi-asserted-by":"publisher","first-page":"1377","DOI":"10.1080\/17470210802483834","article-title":"The different representational frameworks underpinning abstract and concrete knowledge: evidence from odd-one-out judgements","volume":"62","author":"Crutch","year":"2009","journal-title":"Q. J. Exp. Psychol."},{"key":"B15","article-title":"\u201cLCP-RIT at SemEval-2021 task 1: exploring linguistic features for lexical complexity prediction,\u201d","author":"Desai","year":"2021","journal-title":"Proceedings of SemEval"},{"key":"B16","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1017\/S014271640606005X","volume":"27","author":"Dowens","year":"2006","journal-title":"Appl. Psycholinguist."},{"key":"B17","first-page":"62","article-title":"Pronunciation of English consonants, vowels and diphthongs of Mandarin Chinese speakers","volume":"8","author":"Enli","year":"2014","journal-title":"Stud. Lit. Lang."},{"key":"B18","first-page":"7","article-title":"Brown Corpus manual","volume":"5","author":"Francis","year":"1979","journal-title":"Lett. Edit"},{"key":"B19","doi-asserted-by":"crossref","first-page":"395","DOI":"10.3758\/BF03201693","article-title":"Age-of-acquisition, imagery, concreteness, familiarity, and ambiguity measures for 1,944 words","volume":"12","author":"Gilhooly","year":"1980","journal-title":"Behav. Res. Methods Instrument."},{"key":"B20","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1111\/j.1467-9922.2009.00549.x","volume":"60","author":"Gor","year":"2010","journal-title":"Lang. Learn."},{"key":"B21","doi-asserted-by":"publisher","first-page":"250","DOI":"10.1080\/10489223.2014.892943","article-title":"Working memory effects in the L2 processing of ambiguous relative clauses","volume":"21","author":"Hopp","year":"2014","journal-title":"Lang. 
Acquisit."},{"key":"B22","doi-asserted-by":"publisher","first-page":"74","DOI":"10.1016\/j.cortex.2019.01.012","article-title":"Acquisition of L2 morphology by adult language learners","volume":"116","author":"Kimppa","year":"2019","journal-title":"Cortex"},{"key":"B23","article-title":"\u201cEuroparl: a parallel corpus for statistical machine translation,\u201d","volume-title":"Proceedings of MT Summit","author":"Koehn","year":"2005"},{"key":"B24","doi-asserted-by":"publisher","first-page":"1030","DOI":"10.3758\/s13428-017-0924-4","article-title":"The tool for the automatic analysis of lexical sophistication (TAALES): version 2.0","volume":"50","author":"Kyle","year":"2018","journal-title":"Behav. Res."},{"key":"B25","doi-asserted-by":"crossref","DOI":"10.1109\/ICNLSP.2018.8374392","article-title":"\u201cAutomatic prediction of vocabulary knowledge for learners of Chinese as a foreign language,\u201d","volume-title":"Proceedings of ICNLSP","author":"Lee","year":""},{"key":"B26","article-title":"\u201cPersonalizing lexical simplification,\u201d","volume-title":"Proceedings of COLING","author":"Lee","year":""},{"key":"B27","doi-asserted-by":"crossref","DOI":"10.18653\/v1\/D18-1410","article-title":"\u201cA word-complexity lexicon and a neural readability ranking model for lexical simplification,\u201d","volume-title":"Proceedings of EMNLP","author":"Maddela","year":"2018"},{"key":"B28","first-page":"33","article-title":"The CEFR and English education in Japan","volume":"56","author":"Markel","year":"2018","journal-title":"J. Policy Stud."},{"key":"B29","doi-asserted-by":"publisher","first-page":"554","DOI":"10.1017\/S1366728919000233","article-title":"The grammatical class effect is separable from the concreteness effect in language learning","volume":"23","author":"Martin","year":"2020","journal-title":"Bilingual. Lang. 
Cogn."},{"key":"B30","doi-asserted-by":"publisher","first-page":"4398","DOI":"10.1002\/hbm.23668","article-title":"Recently learned foreign abstract and concrete nouns are represented in distinct cortical networks similar to the native language","volume":"38","author":"Mayer","year":"2017","journal-title":"Hum. Brain Mapp."},{"key":"B31","doi-asserted-by":"publisher","DOI":"10.1037\/tmb0000063","article-title":"Toward more effective and equitable learning: identifying barriers and solutions for the future of online education","author":"McCarthy","year":"2022","journal-title":"Technol. Mind Behav"},{"key":"B32","doi-asserted-by":"publisher","first-page":"381","DOI":"10.1016\/j.jml.2006.06.006","article-title":"Beyond the critical period: processing-based explanations for poor grammaticality judgment performance by late second language learners","volume":"55","author":"McDonald","year":"2006","journal-title":"J. Mem. Lang."},{"key":"B33","doi-asserted-by":"publisher","DOI":"10.1177\/02655322221147924","article-title":"L2 and L1 semantic context indices as automated measures of lexical sophistication","author":"Monteiro","year":"2023","journal-title":"Lang. Test."},{"key":"B34","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s41239-020-00227-w","article-title":"Negotiating growth of online education in higher education","volume":"17","author":"Morris","year":"2020","journal-title":"Int. J. Educ. Technol. 
Higher Educ."},{"key":"B35","article-title":"\u201cAlejandro Mosquera at SemEval-2021 task 1: exploring sentence and word features for lexical complexity prediction,\u201d","author":"Mosquera","year":"2021","journal-title":"Proceedings of SemEval"},{"key":"B36","doi-asserted-by":"crossref","DOI":"10.18653\/v1\/S16-1152","article-title":"\u201cJU_NLP at SemEval-2016 task 11: identifying complex words in a sentence,\u201d","volume-title":"Proceedings of SemEval","author":"Mukherjee","year":"2016"},{"key":"B37","doi-asserted-by":"publisher","first-page":"551","DOI":"10.1162\/tacl_a_00282","article-title":"Enabling robust grammatical error correction in new domains: data sets, metrics, and analyses","volume":"7","author":"Napoles","year":"2019","journal-title":"Trans. Assoc. Comput. Linguist."},{"key":"B38","article-title":"\u201cWord complexity estimation for Japanese lexical simplification,\u201d","volume-title":"Proceedings of LREC","author":"Nishihara","year":"2020"},{"key":"B39","doi-asserted-by":"publisher","DOI":"10.1145\/3557885","article-title":"Lexical complexity prediction: an overview","author":"North","year":"2022","journal-title":"ACM Comput. 
Surv"},{"key":"B40","doi-asserted-by":"crossref","DOI":"10.18653\/v1\/S16-1085","article-title":"\u201cSemEval 2016 Task 11: complex word identification,\u201d","volume-title":"Proceedings of SemEval","author":"Paetzold","year":"2016"},{"key":"B41","volume-title":"Mind and Its Evolution: A Dual Coding Theoretical Account","author":"Paivio","year":"2006"},{"key":"B42","article-title":"\u201cDeepBlueAI at SemEval-2021 task 1: lexical complexity prediction with a deep ensemble approach,\u201d","author":"Pan","year":"2021","journal-title":"Proceedings of SemEval"},{"key":"B43","doi-asserted-by":"publisher","first-page":"1373","DOI":"10.3389\/fpsyg.2014.01373","article-title":"The effect of morphology on spelling and reading accuracy: a study on Italian children","volume":"5","author":"Paola","year":"2014","journal-title":"Front. Psychol."},{"key":"B44","doi-asserted-by":"crossref","DOI":"10.18653\/v1\/S16-1161","article-title":"\u201cHMC at SemEval-2016 task 11: identifying complex words using depth-limited decision trees,\u201d","volume-title":"Proceedings of SemEval","author":"Quijada","year":"2016"},{"key":"B45","article-title":"\u201cRG PA at SemEval-2021 task 1: a contextual attention-based model with RoBERTa for lexical complexity prediction,\u201d","author":"Rao","year":"2021","journal-title":"Proceedings of SemEval"},{"key":"B46","doi-asserted-by":"publisher","first-page":"705","DOI":"10.1111\/jcal.12517","article-title":"To simplify or not? Facilitating English L2 users' comprehension and processing of open educational resources in English using text simplification","volume":"37","author":"Rets","year":"2020","journal-title":"J. Comput. Assist. 
Learn."},{"key":"B47","article-title":"\u201cCompLex \u2014 a new corpus for lexical complexity prediction from likert scale data,\u201d","volume-title":"Proceedings of READI","author":"Shardlow","year":"2020"},{"key":"B48","article-title":"\u201cSemEval-2021 task 1: lexical complexity prediction,\u201d","author":"Shardlow","year":"","journal-title":"Proceedings of SemEval"},{"key":"B49","article-title":"\u201cPredicting lexical complexity in English texts,\u201d","volume-title":"Proceedings of LREC","author":"Shardlow","year":""},{"key":"B50","volume-title":"Complex word identification for Swedish","author":"Smolenska","year":"2018"},{"key":"B51","unstructured":"Tack, A. (2021). Mark my words! On the automated prediction of lexical difficulty for foreign language readers. Ph.D. thesis."},{"key":"B52","first-page":"221","article-title":"\u201cMod\u00e8les Adaptatifs pour Pr\u00e9dire Automatiquement la Comp\u00e9tence Lexicale D'un apprenant de Fran\u00e7ais Langue \u00e9trang\u00e8re (Adaptive Models for Automatically Predicting the Lexical Competence of French as a Foreign Language Learners),\u201d","volume-title":"Actes de la conf\u00e9rence conjointe JEP-TALN-RECITAL 2016. volume 2 : TALN (Articles longs)","author":"Tack","year":"2016"},{"key":"B53","first-page":"31","volume":"4","author":"Tono","year":"2017","journal-title":"The CEFR-J and its Impact on English Language Teaching in Japan"},{"key":"B54","unstructured":"Tseng, S.-S., and Yeh, H.-C. (2019). Fostering EFL teachers' CALL competencies through project-based learning. Educ. Technol. Soc. 22, 94-105."},{"key":"B55","article-title":"\u201cCEFR-based lexical simplification dataset,\u201d","volume-title":"Proceedings of LREC","author":"Uchida","year":"2018"},{"key":"B56","doi-asserted-by":"publisher","first-page":"533","DOI":"10.1111\/tops.12347","article-title":"Learning and processing abstract words and concepts: insights from typical and atypical development","volume":"10","author":"Vigliocco","year":"2018","journal-title":"Top. Cogn. Sci."},{"key":"B57","doi-asserted-by":"crossref","first-page":"6","DOI":"10.3758\/BF03202594","article-title":"MRC psycholinguistic database: machine-usable dictionary, version 2.00","volume":"20","author":"Wilson","year":"1988","journal-title":"Behav. Res. Methods Instrum. Comput."},{"key":"B58","unstructured":"Wu, Y.-C. (2013). The Linguistic Profiles of Spelling Errors in Fourth, Fifth, and Seventh Grade Students. Ph.D. thesis."},{"key":"B59","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s40862-017-0036-9","article-title":"An investigation of cross-linguistic transfer between Chinese and English: a meta-analysis","volume":"2","author":"Yang","year":"2017","journal-title":"Asian Pac. J. Second Foreign Lang. 
Educ."},{"key":"B60","article-title":"\u201cJUST-BLUE at SemEval-2021 task 1: predicting lexical complexity using BERT and RoBERTa pre-trained language models,\u201d","author":"Yaseen","year":"2021","journal-title":"Proceedings of SemEval"},{"key":"B61","article-title":"\u201cPersonalized text retrieval for learners of Chinese as a foreign language,\u201d","volume-title":"Proceedings of COLING","author":"Yeung","year":"2018"},{"key":"B62","doi-asserted-by":"crossref","DOI":"10.18653\/v1\/W18-0507","article-title":"\u201cA report on the complex word identification shared task 2018,\u201d","volume-title":"Proceedings of BEA","author":"Yimam","year":"2018"},{"key":"B63","doi-asserted-by":"crossref","DOI":"10.18653\/v1\/S16-1155","article-title":"\u201cMacSaar at SemEval-2016 task 11: Zipfian and character features for complexword identification,\u201d","volume-title":"Proceedings of SemEval","author":"Zampieri","year":"2016"},{"key":"B64","first-page":"184","article-title":"\u201cA text corpora-based estimation of the familiarity of health terminology,\u201d","volume-title":"ISBMDA'05","author":"Zeng","year":"2005"}],"container-title":["Frontiers in Artificial Intelligence"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frai.2023.1236963\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,30]],"date-time":"2023-11-30T07:12:27Z","timestamp":1701328347000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frai.2023.1236963\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,11,30]]},"references-count":64,"alternative-id":["10.3389\/frai.2023.1236963"],"URL":"https:\/\/doi.org\/10.3389\/frai.2023.1236963","relation":{},"ISSN":["2624-8212"],"issn-type":[{"value":"2624-8212","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,11,30]]},"article-number":"1236963"}}