{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,17]],"date-time":"2026-03-17T04:58:13Z","timestamp":1773723493869,"version":"3.50.1"},"reference-count":78,"publisher":"Elsevier BV","issue":"3","license":[{"start":{"date-parts":[[2024,9,13]],"date-time":"2024-09-13T00:00:00Z","timestamp":1726185600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,9,13]],"date-time":"2024-09-13T00:00:00Z","timestamp":1726185600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001659","name":"Deutsche Forschungsgemeinschaft","doi-asserted-by":"publisher","award":["MO 648\/25-2"],"award-info":[{"award-number":["MO 648\/25-2"]}],"id":[{"id":"10.13039\/501100001659","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001711","name":"Schweizerischer Nationalfonds zur F\u00f6rderung der Wissenschaftlichen Forschung","doi-asserted-by":"publisher","award":["197968"],"award-info":[{"award-number":["197968"]}],"id":[{"id":"10.13039\/501100001711","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100002869","name":"Christian-Albrechts-Universit\u00e4t zu Kiel","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100002869","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Int J Artif Intell Educ"],"published-print":{"date-parts":[[2025,9]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Recent investigations in automated essay scoring research imply that hybrid models, which combine feature engineering and the powerful tools of deep neural networks (DNNs), reach state-of-the-art performance. However, most of these findings are from holistic scoring tasks. 
In the present study, we use a total of four prompts from two different corpora consisting of both L1 and L2 learner essays annotated with trait scores (e.g., content, organization, and language quality). In our main experiments, we compare three variants of trait-specific models using different inputs: (1) models based on 220 linguistic features, (2) models using essay-level contextual embeddings from the distilled version of the pre-trained transformer BERT (DistilBERT), and (3) a hybrid model using both types of features. Results imply that when trait-specific models are trained based on a single resource, the feature-based models slightly outperform the embedding-based models. These differences are most prominent for the organization traits. The hybrid models outperform the single-resource models, indicating that linguistic features and embeddings indeed capture partially different aspects relevant for the assessment of essay traits. To gain more insights into the interplay between both feature types, we run addition and ablation tests for individual feature groups. Trait-specific addition tests across prompts indicate that the embedding-based models can most consistently be enhanced in content assessment when combined with morphological complexity features. Most consistent performance gains in the organization traits are achieved when embeddings are combined with length features, and most consistent performance gains in the assessment of the language traits when combined with lexical complexity, error, and occurrence features. 
Cross-prompt scoring again reveals slight advantages for the feature-based models.<\/jats:p>","DOI":"10.1007\/s40593-024-00426-w","type":"journal-article","created":{"date-parts":[[2024,9,13]],"date-time":"2024-09-13T14:02:22Z","timestamp":1726236142000},"page":"1178-1217","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["Neural Networks or Linguistic Features? - Comparing Different Machine-Learning Approaches for Automated Assessment of Text Quality Traits Among L1- and L2-Learners\u2019 Argumentative Essays"],"prefix":"10.1016","volume":"35","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5864-9692","authenticated-orcid":false,"given":"Julian F.","family":"Lohmann","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0009-0009-4834-8325","authenticated-orcid":false,"given":"Fynn","family":"Junge","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1767-5859","authenticated-orcid":false,"given":"Jens","family":"M\u00f6ller","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4488-1455","authenticated-orcid":false,"given":"Johanna","family":"Fleckenstein","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9915-8611","authenticated-orcid":false,"given":"Ruth","family":"Tr\u00fcb","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0115-5314","authenticated-orcid":false,"given":"Stefan","family":"Keller","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9714-6505","authenticated-orcid":false,"given":"Thorben","family":"Jansen","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0009-0004-3680-3304","authenticated-orcid":false,"given":"Andrea","family":"Horbach","sequence":"additional","affiliation":[]}],"member":"78","published-online":{"date-parts":[[2024,9,13]]},"reference":[{"key":"426_CR1","do
i-asserted-by":"publisher","unstructured":"Alikaniotis, D., Yannakoudakis, H., & Rei, M. (2016). Automatic text scoring using neural networks. In K. Erk & N. A. Smith (Eds.), Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016\u00a0(pp. 715\u2013725). Association for Computational Linguistics. https:\/\/doi.org\/10.18653\/v1\/P16-1068","DOI":"10.18653\/v1\/P16-1068"},{"key":"426_CR2","doi-asserted-by":"publisher","first-page":"376","DOI":"10.1017\/9781316832134.019","volume-title":"The Cambridge handbook of instructional feedback","author":"HL Andrade","year":"2018","unstructured":"Andrade, H. L. (2018). Feedback in the context of self-assessment. In A. A. Lipnevich & J. K. Smith (Eds.), The Cambridge handbook of instructional feedback (pp. 376\u2013408). Cambridge University Press. https:\/\/doi.org\/10.1017\/9781316832134.019"},{"key":"426_CR3","doi-asserted-by":"publisher","unstructured":"Attali, Y., & Powers, D. (2008). A developmental writing scale. ETS Research Report Series, 2008(1). https:\/\/doi.org\/10.1002\/j.2333-8504.2008.tb02105.x","DOI":"10.1002\/j.2333-8504.2008.tb02105.x"},{"issue":"4","key":"426_CR4","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1007\/s40593-022-00323-0","volume":"33","author":"X Bai","year":"2022","unstructured":"Bai, X., & Stede, M. (2022). A survey of current machine learning approaches to Student Free-text evaluation for intelligent tutoring. International Journal of Artificial Intelligence in Education, 33(4), 1\u201339. https:\/\/doi.org\/10.1007\/s40593-022-00323-0","journal-title":"International Journal of Artificial Intelligence in Education"},{"issue":"2","key":"426_CR5","first-page":"281","volume":"13","author":"J Bergstra","year":"2012","unstructured":"Bergstra, J., & Bengio, Y. (2012). Random search for hyper-parameter optimization. 
Journal of Machine Learning Research,13(2), 281\u2013305.","journal-title":"Journal of Machine Learning Research"},{"key":"426_CR6","doi-asserted-by":"publisher","unstructured":"Beseiso, M., & Alzahrani, S. (2020). An empirical analysis of BERT Embedding for Automated Essay Scoring. International Journal of Advanced Computer Science and Applications, 11(10). https:\/\/doi.org\/10.14569\/IJACSA.2020.0111027","DOI":"10.14569\/IJACSA.2020.0111027"},{"issue":"3","key":"426_CR7","doi-asserted-by":"publisher","first-page":"727","DOI":"10.1007\/s12528-021-09283-1","volume":"33","author":"M Beseiso","year":"2021","unstructured":"Beseiso, M., Alzubi, O. A., & Rashaideh, H. (2021). A novel automated essay scoring approach for reliable higher educational assessments. Journal of Computing in Higher Education,33(3), 727\u2013746. https:\/\/doi.org\/10.1007\/s12528-021-09283-1","journal-title":"Journal of Computing in Higher Education"},{"key":"426_CR8","doi-asserted-by":"publisher","unstructured":"Bexte, M., Horbach, A., & Zesch, T. (2022). Similarity-Based Content Scoring - How to Make S-BERT Keep Up With BERT. In E. Kochmar, J. C. Burstein, A. Horbach, R. Laarmann-Quante, N. Madnani, A. Tack, V. Yaneva, Z. Yuan, & T. Zesch (Eds.), Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022) (pp. 118\u2013123). Association for Computational Linguistics. https:\/\/doi.org\/10.18653\/v1\/2022.bea-1.16","DOI":"10.18653\/v1\/2022.bea-1.16"},{"key":"426_CR9","doi-asserted-by":"publisher","unstructured":"Bexte, M., Horbach, A., & Zesch, T. (2023). Similarity-Based Content Scoring - A more Classroom-Suitable Alternative to Instance-Based Scoring? In A. Rogers, J. Boyd-Graber, & N. Okazaki (Eds.), Findings of the Association for Computational Linguistics: ACL 2023 (pp. 1892\u20131903). Association for Computational Linguistics. 
https:\/\/doi.org\/10.18653\/v1\/2023.findings-acl.119","DOI":"10.18653\/v1\/2023.findings-acl.119"},{"issue":"1","key":"426_CR10","doi-asserted-by":"publisher","first-page":"99","DOI":"10.1177\/0267658316643125","volume":"35","author":"V Brezina","year":"2019","unstructured":"Brezina, V., & Pallotti, G. (2019). Morphological complexity in written L2 texts. Second Language Research, 35(1), 99\u2013119. https:\/\/doi.org\/10.1177\/0267658316643125","journal-title":"Second Language Research"},{"key":"426_CR11","doi-asserted-by":"publisher","unstructured":"Chassab, R. H., Zakaria, L. Q., & Tiun, S. (2021). Automatic essay Scoring: A review on the feature analysis techniques. International Journal of Advanced Computer Science and Applications, 12(10). https:\/\/doi.org\/10.14569\/IJACSA.2021.0121028","DOI":"10.14569\/IJACSA.2021.0121028"},{"key":"426_CR12","doi-asserted-by":"publisher","unstructured":"Chen, X., & Meurers, D. (2016). CTAP: A Web-Based Tool Supporting Automatic Complexity Analysis.https:\/\/doi.org\/10.17863\/CAM.39630","DOI":"10.17863\/CAM.39630"},{"issue":"1","key":"426_CR13","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1002\/ets2.12094","volume":"2016","author":"J Chen","year":"2016","unstructured":"Chen, J., Fife, J. H., Bejar, I. I., & Rupp, A. A. (2016). Building e-rater \u00ae scoring models using machine learning methods. ETS Research Report Series,2016(1), 1\u201312. https:\/\/doi.org\/10.1002\/ets2.12094","journal-title":"ETS Research Report Series"},{"issue":"4","key":"426_CR14","doi-asserted-by":"publisher","first-page":"213","DOI":"10.1037\/h0026256","volume":"70","author":"J Cohen","year":"1968","unstructured":"Cohen, J. (1968). Weighted kappa: Nominal scale agreement with provision for scaled disagreement or partial credit. Psychological Bulletin,70(4), 213\u2013220. 
https:\/\/doi.org\/10.1037\/h0026256","journal-title":"Psychological Bulletin"},{"key":"426_CR15","doi-asserted-by":"publisher","first-page":"100651","DOI":"10.1016\/j.asw.2022.100651","volume":"53","author":"W Condon","year":"2022","unstructured":"Condon, W., & Elliot, N. (2022). Liz Hamp Lyons: a life in writing assessment. Assessing Writing,53, 100651. https:\/\/doi.org\/10.1016\/j.asw.2022.100651","journal-title":"Assessing Writing"},{"issue":"2","key":"426_CR16","doi-asserted-by":"publisher","first-page":"251","DOI":"10.17239\/jowr-2019.11.02.01","volume":"11","author":"SA Crossley","year":"2019","unstructured":"Crossley, S. A. (2019). Using human judgments to examine the validity of automated grammar, syntax, and mechanical errors in writing. Journal of Writing Research,11(2), 251\u2013270. https:\/\/doi.org\/10.17239\/jowr-2019.11.02.01","journal-title":"Journal of Writing Research"},{"issue":"3","key":"426_CR17","doi-asserted-by":"publisher","first-page":"415","DOI":"10.17239\/jowr-2020.11.03.01","volume":"11","author":"SA Crossley","year":"2020","unstructured":"Crossley, S. A. (2020). Linguistic features in writing quality and development: An overview. Journal of Writing Research,11(3), 415\u2013443. https:\/\/doi.org\/10.17239\/jowr-2020.11.03.01","journal-title":"Journal of Writing Research"},{"issue":"1","key":"426_CR18","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1075\/jsls.22006.cro","volume":"6","author":"SA Crossley","year":"2023","unstructured":"Crossley, S. A., & Holmes, L. (2023). Assessing receptive vocabulary using state\u2013of\u2013the\u2013art natural language processing techniques. Journal of Second Language Studies,6(1), 1\u201328. https:\/\/doi.org\/10.1075\/jsls.22006.cro","journal-title":"Journal of Second Language Studies"},{"key":"426_CR19","doi-asserted-by":"publisher","first-page":"66","DOI":"10.1016\/j.jslw.2014.09.006","volume":"26","author":"SA Crossley","year":"2014","unstructured":"Crossley, S. A., & McNamara, D. 
S. (2014). Does writing development equal writing quality? A computational investigation of syntactic complexity in L2 learners. Journal of Second Language Writing,26, 66\u201379. https:\/\/doi.org\/10.1016\/j.jslw.2014.09.006","journal-title":"Journal of Second Language Writing"},{"key":"426_CR20","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/j.jslw.2016.01.003","volume":"32","author":"SA Crossley","year":"2016","unstructured":"Crossley, S. A., Kyle, K., & McNamara, D. S. (2016). The development and use of cohesive devices in L2 writing and their relations to judgments of essay quality. Journal of Second Language Writing,32, 1\u201316. https:\/\/doi.org\/10.1016\/j.jslw.2016.01.003","journal-title":"Journal of Second Language Writing"},{"issue":"3","key":"426_CR21","doi-asserted-by":"publisher","first-page":"803","DOI":"10.3758\/s13428-016-0743-z","volume":"49","author":"SA Crossley","year":"2017","unstructured":"Crossley, S. A., Kyle, K., & McNamara, D. S. (2017). Sentiment analysis and social cognition engine (SEANCE): An automatic tool for sentiment, social cognition, and social-order analysis. Behavior Research Methods,49(3), 803\u2013821. https:\/\/doi.org\/10.3758\/s13428-016-0743-z","journal-title":"Behavior Research Methods"},{"issue":"1","key":"426_CR22","doi-asserted-by":"publisher","first-page":"1","DOI":"10.2307\/1494403","volume":"8","author":"M Crowhurst","year":"1983","unstructured":"Crowhurst, M. (1983). Syntactic complexity and writing quality: A review. Canadian Journal of Education \/ Revue Canadienne De L\u2019\u00e9ducation,8(1), 1. https:\/\/doi.org\/10.2307\/1494403","journal-title":"Canadian Journal of Education \/ Revue Canadienne De L\u2019\u00e9ducation"},{"key":"426_CR23","doi-asserted-by":"publisher","unstructured":"Dasgupta, T., Naskar, A., Dey, L., & Rupsa, S. (2018). Augmenting textual qualitative features in deep convolution recurrent neural network for automatic essay scoring. 
Proceedings of the 5th Workshop on Natural Language Processing Techniques for Educational Applications, 93\u2013102. https:\/\/doi.org\/10.18653\/v1\/W18-3713","DOI":"10.18653\/v1\/W18-3713"},{"key":"426_CR24","doi-asserted-by":"publisher","unstructured":"Deane, P., Yan, D., Castellano, K., Attali, Y., Lamar, M., Zhang, M., Blood, I., Bruno, J. V., Li, C., [Chen], Cui, W., Ruan, C., Appel, C., James, K., Long, R., & Qureshi, F. (2024). Modeling writing traits in a formative essay Corpus. ETS Research Report Series, Article ets2.12377. https:\/\/doi.org\/10.1002\/ets2.12377. Advance online publication","DOI":"10.1002\/ets2.12377"},{"key":"426_CR25","doi-asserted-by":"publisher","unstructured":"Ding, Y., Riordan, B., Horbach, A., Cahill, A., & Zesch, T. (2020). Don\u2019t take nswvtnvakgxpm for an answer \u2013The surprising vulnerability of automatic content scoring systems to adversarial input. In D. Scott, N. Bel, & C. Zong (Eds.), Proceedings of the 28th International Conference on Computational Linguistics (pp. 882\u2013892). International Committee on Computational Linguistics. https:\/\/doi.org\/10.18653\/v1\/2020.coling-main.76","DOI":"10.18653\/v1\/2020.coling-main.76"},{"key":"426_CR26","unstructured":"Doewes, A., Kurdhi, N., & Saxena, A. (2023). Evaluating quadratic weighted kappa as the standard performance\u00a0metric for automated essay scoring. In 16th International Conference on Educational Data Mining, EDM 2023 (pp.\u00a0103\u2013113). International Educational Data Mining Society (IEDMS)."},{"issue":"1","key":"426_CR27","doi-asserted-by":"publisher","first-page":"34","DOI":"10.1111\/emip.12537","volume":"42","author":"T Firoozi","year":"2023","unstructured":"Firoozi, T., Mohammadi, H., & Gierl, M. J. (2023). Using active learning methods to strategically select essays for automated scoring. Educational Measurement: Issues and Practice,42(1), 34\u201343. 
https:\/\/doi.org\/10.1111\/emip.12537","journal-title":"Educational Measurement: Issues and Practice"},{"key":"426_CR28","doi-asserted-by":"publisher","first-page":"100420","DOI":"10.1016\/j.asw.2019.100420","volume":"43","author":"J Fleckenstein","year":"2020","unstructured":"Fleckenstein, J., Keller, S., Kr\u00fcger, M., Tannenbaum, R. J., & K\u00f6ller, O. (2020). Linking TOEFL iBT\u00ae writing rubrics to CEFR levels: Cut scores and validity evidence from a standard setting study. Assessing Writing,43, 100420. https:\/\/doi.org\/10.1016\/j.asw.2019.100420","journal-title":"Assessing Writing"},{"key":"426_CR29","doi-asserted-by":"publisher","first-page":"562462","DOI":"10.3389\/fpsyg.2020.562462","volume":"11","author":"J Fleckenstein","year":"2020","unstructured":"Fleckenstein, J., Meyer, J., Jansen, T., Keller, S., & K\u00f6ller, O. (2020). Is a long essay always a good essay? The effect of text length on writing Assessment. Frontiers in Psychology,11, 562462. https:\/\/doi.org\/10.3389\/fpsyg.2020.562462","journal-title":"Frontiers in Psychology"},{"key":"426_CR30","first-page":"251","volume-title":"Handbook on automated essay evaluation: Current applications and new directions","author":"M Gamon","year":"2013","unstructured":"Gamon, M., Chodorow, M., Leacock, C., & Tetreault, J. (2013). Grammatical error detection in Automatic Essay Scoring and Feedback. In M. D. Shermis & J. C. Burstein (Eds.), Handbook on automated essay evaluation: Current applications and new directions (pp. 251\u2013266). Routledge Academic."},{"key":"426_CR31","doi-asserted-by":"publisher","unstructured":"Horbach, A., & Palmer, A. (2016). Investigating Active Learning for Short-Answer Scoring. In J. Tetreault, J. C. Burstein, C. Leacock, & H. Yannakoudakis (Eds.), Proceedings of the 11th Workshop on Innovative Use of NLP for Building Educational Applications (pp. 301\u2013311). Association for Computational Linguistics. 
https:\/\/doi.org\/10.18653\/v1\/W16-0535","DOI":"10.18653\/v1\/W16-0535"},{"key":"426_CR32","doi-asserted-by":"publisher","unstructured":"Horbach, A., Scholten-Akoun, D., Ding, Y., & Zesch, T. (2017). Fine-grained essay scoring of a complex writing task for native speakers.\u00a0In J. Tetreault, J. Burstein, C. Leacock, & H. Yannakoudakis (Eds.),\u00a0Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications (pp.\u00a0357\u2013366).\u00a0Association for Computational Linguistics. https:\/\/doi.org\/10.18653\/v1\/W17-5040","DOI":"10.18653\/v1\/W17-5040"},{"key":"426_CR33","doi-asserted-by":"publisher","first-page":"e208","DOI":"10.7717\/peerj-cs.208","volume":"5","author":"MA Hussein","year":"2019","unstructured":"Hussein, M. A., Hassan, H., & Nassef, M. (2019). Automated language essay scoring systems: A literature review. PeerJ Computer Science,5, e208. https:\/\/doi.org\/10.7717\/peerj-cs.208","journal-title":"PeerJ Computer Science"},{"issue":"5","key":"426_CR34","doi-asserted-by":"publisher","first-page":"3299","DOI":"10.1007\/s10462-020-09948-w","volume":"54","author":"M Injadat","year":"2021","unstructured":"Injadat, M., Moubayed, A., Nassif, A. B., & Shami, A. (2021). Machine learning towards intelligent systems: Applications, challenges, and opportunities. Artificial Intelligence Review,54(5), 3299\u20133348. https:\/\/doi.org\/10.1007\/s10462-020-09948-w","journal-title":"Artificial Intelligence Review"},{"issue":"s1","key":"426_CR35","doi-asserted-by":"publisher","first-page":"87","DOI":"10.1111\/j.1467-9922.2012.00739.x","volume":"63","author":"S Jarvis","year":"2013","unstructured":"Jarvis, S. (2013). Capturing the diversity in lexical diversity. Language Learning,63(s1), 87\u2013106. https:\/\/doi.org\/10.1111\/j.1467-9922.2012.00739.x","journal-title":"Language Learning"},{"key":"426_CR36","doi-asserted-by":"publisher","unstructured":"Ke, Z., & Ng, V. (2019). 
Automated essay scoring: A survey of the state of the art. In T. Eiter & S.\u00a0Kraus (Eds.), Proceedings of the Twenty-Eighth International Joint Conference on Artificial\u00a0Intelligence (pp. 6300\u20136308). International Joint Conferences on Artificial Intelligence Organization.\u00a0https:\/\/doi.org\/10.24963\/ijcai.2019\/879","DOI":"10.24963\/ijcai.2019\/879"},{"key":"426_CR37","doi-asserted-by":"publisher","first-page":"100700","DOI":"10.1016\/j.jslw.2019.100700","volume":"48","author":"SD Keller","year":"2020","unstructured":"Keller, S. D., Fleckenstein, J., Kr\u00fcger, M., K\u00f6ller, O., & Rupp, A. A. (2020). English writing skills of students in upper secondary education: Results from an empirical study in Switzerland and Germany. Journal of Second Language Writing,48, 100700. https:\/\/doi.org\/10.1016\/j.jslw.2019.100700","journal-title":"Journal of Second Language Writing"},{"key":"426_CR38","doi-asserted-by":"publisher","first-page":"101129","DOI":"10.1016\/j.jslw.2024.101129","volume":"65","author":"SD Keller","year":"2024","unstructured":"Keller, S. D., Lohmann, J., Tr\u00fcb, R., Fleckenstein, J., Meyer, J., Jansen, T., & M\u00f6ller, J. (2024). Language quality, content, structure: What analytic ratings tell us about EFL writing skills at upper secondary school level in Germany and Switzerland. Journal of Second Language Writing,65, 101129. https:\/\/doi.org\/10.1016\/j.jslw.2024.101129","journal-title":"Journal of Second Language Writing"},{"issue":"3","key":"426_CR39","doi-asserted-by":"publisher","first-page":"538","DOI":"10.1007\/s40593-020-00211-5","volume":"31","author":"VS Kumar","year":"2021","unstructured":"Kumar, V. S., & Boulanger, D. (2021). Automated essay scoring and the deep learning black box: How are rubric scores determined? International Journal of Artificial Intelligence in Education,31(3), 538\u2013584. 
https:\/\/doi.org\/10.1007\/s40593-020-00211-5","journal-title":"International Journal of Artificial Intelligence in Education"},{"key":"426_CR40","doi-asserted-by":"publisher","unstructured":"Kusuma, J. S., Halim, K., Pranoto, E. J. P., Kanigoro, B., & Irwansyah, E. (2022). Automated Essay Scoring Using Machine Learning. In 2022 4th International Conference on Cybernetics and Intelligent System (ICORIS) (pp. 1\u20135). IEEE. https:\/\/doi.org\/10.1109\/ICORIS56080.2022.10031338","DOI":"10.1109\/ICORIS56080.2022.10031338"},{"issue":"3","key":"426_CR41","doi-asserted-by":"publisher","first-page":"1030","DOI":"10.3758\/s13428-017-0924-4","volume":"50","author":"K Kyle","year":"2018","unstructured":"Kyle, K., Crossley, S. A., & Berger, C. (2018). The tool for the automatic analysis of lexical sophistication (TAALES): Version 2.0. Behavior Research Methods,50(3), 1030\u20131046. https:\/\/doi.org\/10.3758\/s13428-017-0924-4","journal-title":"Behavior Research Methods"},{"key":"426_CR42","doi-asserted-by":"publisher","unstructured":"Lagakis, P., & Demetriadis, S. (2021). Automated essay scoring: A review of the field. In 2021 International Conference on Computer, Information and Telecommunication Systems (CITS) (pp. 1\u20136). IEEE. https:\/\/doi.org\/10.1109\/CITS52676.2021.9618476","DOI":"10.1109\/CITS52676.2021.9618476"},{"key":"426_CR43","doi-asserted-by":"publisher","unstructured":"Lample, G., & Conneau, A. (2019). Cross-lingual Language Model Pretraining.https:\/\/doi.org\/10.48550\/arXiv.1901.07291","DOI":"10.48550\/arXiv.1901.07291"},{"key":"426_CR44","doi-asserted-by":"publisher","unstructured":"Lewis, M., Liu, Y., [Yinhan], Goyal, N., Ghazvininejad, M., Mohamed, A., Levy, O., Stoyanov, V., & Zettlemoyer, L. (2019). 
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension.https:\/\/doi.org\/10.48550\/arXiv.1910.13461","DOI":"10.48550\/arXiv.1910.13461"},{"key":"426_CR45","volume-title":"Many-facet rasch measurement","author":"JM Linacre","year":"1994","unstructured":"Linacre, J. M. (1994). Many-facet rasch measurement (2nd ed.). Mesa Press.","edition":"2"},{"key":"426_CR46","volume-title":"Facets (Version 3.82.1)","author":"JM Linacre","year":"2019","unstructured":"Linacre, J. M. (2019). Facets (Version 3.82.1). [Computer software]."},{"key":"426_CR47","unstructured":"Mathias, S., & Bhattacharyya, P. (2018). ASAP++: Enriching the ASAP automated essay grading dataset with essay attribute scores. Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). https:\/\/aclanthology.org\/L18-1187. Accessed\u00a010.12.2023."},{"key":"426_CR48","doi-asserted-by":"publisher","unstructured":"Mathias, S., & Bhattacharyya, P. (2020). Can Neural Networks Automatically Score Essay Traits? In J. C. Burstein, E. Kochmar, C. Leacock, N. Madnani, I. Pil\u00e1n, H. Yannakoudakis, & T. Zesch (Eds.), Proceedings of the Fifteenth Workshop on Innovative Use of NLP for Building Educational Applications (pp. 85\u201391). Association for Computational Linguistics. https:\/\/doi.org\/10.18653\/v1\/2020.bea-1.8","DOI":"10.18653\/v1\/2020.bea-1.8"},{"key":"426_CR49","doi-asserted-by":"publisher","unstructured":"Mayfield, E., & Black, A. W. (2020). Should You Fine-Tune BERT for Automated Essay Scoring? In J. C. Burstein, E. Kochmar, C. Leacock, N. Madnani, I. Pil\u00e1n, H. Yannakoudakis, & T. Zesch (Eds.), Proceedings of the Fifteenth Workshop on Innovative Use of NLP for Building Educational Applications (pp. 151\u2013162). Association for Computational Linguistics. 
https:\/\/doi.org\/10.18653\/v1\/2020.bea-1.15","DOI":"10.18653\/v1\/2020.bea-1.15"},{"key":"426_CR50","doi-asserted-by":"publisher","unstructured":"McNamara, D. S., Graesser, A. C., McCarthy, P. M., & Cai, Z. (2014). Automated evaluation of text and discourse with Coh-Metrix. Cambridge University Press. https:\/\/doi.org\/10.1017\/CBO9780511894664","DOI":"10.1017\/CBO9780511894664"},{"key":"426_CR51","doi-asserted-by":"publisher","unstructured":"Mesgar, M., & Strube, M. (2018). A Neural Local Coherence Model for Text Quality Assessment. In E. Riloff, D. Chiang, J. Hockenmaier, & J. Tsujii (Eds.), Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (pp. 4328\u20134339). Association for Computational Linguistics. https:\/\/doi.org\/10.18653\/v1\/D18-1464","DOI":"10.18653\/v1\/D18-1464"},{"key":"426_CR52","doi-asserted-by":"publisher","unstructured":"Mitkov, R., & Voutilainen, A. (2012). Part-of-Speech Tagging (Vol. 1). Oxford University Press. https:\/\/doi.org\/10.1093\/oxfordhb\/9780199276349.013.0011","DOI":"10.1093\/oxfordhb\/9780199276349.013.0011"},{"issue":"2","key":"426_CR53","doi-asserted-by":"publisher","first-page":"100050","DOI":"10.1016\/j.rmal.2023.100050","volume":"2","author":"A Mizumoto","year":"2023","unstructured":"Mizumoto, A., & Eguchi, M. (2023). Exploring the potential of using an AI language model for automated essay scoring. Research Methods in Applied Linguistics,2(2), 100050. https:\/\/doi.org\/10.1016\/j.rmal.2023.100050","journal-title":"Research Methods in Applied Linguistics"},{"key":"426_CR54","doi-asserted-by":"publisher","unstructured":"Nadeem, F., Nguyen, H., Liu, Y., [Yang], & Ostendorf, M. (2019). Automated Essay Scoring with Discourse-Aware Neural Models. In H. Yannakoudakis, E. Kochmar, C. Leacock, N. Madnani, I. Pil\u00e1n, & T. Zesch (Eds.), Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications (pp. 484\u2013493). 
Association for Computational Linguistics. https:\/\/doi.org\/10.18653\/v1\/W19-4450","DOI":"10.18653\/v1\/W19-4450"},{"issue":"3","key":"426_CR55","doi-asserted-by":"publisher","first-page":"138","DOI":"10.1111\/j.1749-818X.2010.00187.x","volume":"4","author":"J Nivre","year":"2010","unstructured":"Nivre, J. (2010). Dependency parsing. Language and Linguistics Compass,4(3), 138\u2013152. https:\/\/doi.org\/10.1111\/j.1749-818X.2010.00187.x","journal-title":"Language and Linguistics Compass"},{"key":"426_CR56","doi-asserted-by":"publisher","first-page":"104","DOI":"10.1016\/j.asw.2014.05.001","volume":"21","author":"L Perelman","year":"2014","unstructured":"Perelman, L. (2014). When the state of the art is counting words. Assessing Writing,21, 104\u2013111. https:\/\/doi.org\/10.1016\/j.asw.2014.05.001","journal-title":"Assessing Writing"},{"key":"426_CR57","doi-asserted-by":"crossref","unstructured":"Pitler, E., & Nenkova, A. (2008). Revisiting readability: A unified framework for predicting text quality. In M. Lapata & H. T. Ng (Eds.), Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing (pp. 186\u2013195).\u00a0Association for Computational Linguistics.","DOI":"10.3115\/1613715.1613742"},{"issue":"3","key":"426_CR58","doi-asserted-by":"publisher","first-page":"2495","DOI":"10.1007\/s10462-021-10068-2","volume":"55","author":"D Ramesh","year":"2022","unstructured":"Ramesh, D., & Sanampudi, S. K. (2022). An automated essay scoring systems: A systematic literature review. Artificial Intelligence Review,55(3), 2495\u20132527. https:\/\/doi.org\/10.1007\/s10462-021-10068-2","journal-title":"Artificial Intelligence Review"},{"issue":"2","key":"426_CR59","doi-asserted-by":"publisher","first-page":"201","DOI":"10.1017\/S0305000900012885","volume":"14","author":"B Richards","year":"1987","unstructured":"Richards, B. (1987). Type\/token ratios: What do they really tell us? Journal of Child Language,14(2), 201\u2013209. 
https:\/\/doi.org\/10.1017\/S0305000900012885","journal-title":"Journal of Child Language"},{"issue":"1","key":"426_CR60","first-page":"101","volume":"60","author":"A Robitzsch","year":"2018","unstructured":"Robitzsch, A., & Steinfeld, J. (2018). Item response models for human ratings: Overview, estimation methods, and implementation in R. Psychological Test and Assessment Modeling, 60(1), 101\u2013139.","journal-title":"Psychological Test and Assessment Modeling"},{"key":"426_CR61","doi-asserted-by":"publisher","unstructured":"Rodriguez, P. U., Jafari, A., & Ormerod, C. M. (2019). Language models and Automated Essay Scoring.https:\/\/doi.org\/10.48550\/arXiv.1909.09482","DOI":"10.48550\/arXiv.1909.09482"},{"issue":"1","key":"426_CR62","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1002\/ets2.12249","volume":"2019","author":"AA Rupp","year":"2019","unstructured":"Rupp, A. A., Casabianca, J. M., Kr\u00fcger, M., Keller, S., & K\u00f6ller, O. (2019). Automated essay scoring at scale: A case study in Switzerland and Germany. ETS Research Report Series,2019(1), 1\u201323. https:\/\/doi.org\/10.1002\/ets2.12249","journal-title":"ETS Research Report Series"},{"key":"426_CR63","unstructured":"Sanh, V., Debut, L., Chaumond, J., & Wolf, T. (2019). DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. https:\/\/arxiv.org\/pdf\/1910.01108v4. Accessed 10.12.2023."},{"key":"426_CR64","first-page":"210","volume-title":"Proceedings of the 19th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2024) (pp. 210\u2013221).","author":"NJ Schaller","year":"2024","unstructured":"Schaller, N. J., Ding, Y., Horbach, A., Meyer, J., & Jansen, T. (2024). Fairness in Automated Essay Scoring: A Comparative Analysis of Algorithms on German Learner Essays from Secondary Education. In E. Kochmar, M. Bexte, J. C. Burstein, A. Horbach, R. 
Laarmann-Quante, A. Tack, V. Yaneva, & Z. Yuan (Eds.), Proceedings of the 19th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2024) (pp. 210\u2013221)."},{"key":"426_CR65","doi-asserted-by":"publisher","unstructured":"Shen, D., Wang, G., Wang, W., Min, M. R., Su, Q., Zhang, Y., Li, C. [Chunyuan], Henao, R., & Carin, L. (2018). Baseline Needs More Love: On Simple Word-Embedding-Based Models and Associated Pooling Mechanisms. https:\/\/doi.org\/10.48550\/arXiv.1805.09843","DOI":"10.48550\/arXiv.1805.09843"},{"key":"426_CR66","doi-asserted-by":"publisher","unstructured":"Shermis, M. D., & Burstein, J. C. (2003). Automated essay scoring. Routledge. https:\/\/doi.org\/10.4324\/9781410606860","DOI":"10.4324\/9781410606860"},{"key":"426_CR67","doi-asserted-by":"publisher","unstructured":"Taghipour, K., & Ng, H. T. (2016). A Neural Approach to Automated Essay Scoring. In J. Su, K. Duh, & X. Carreras (Eds.), Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (pp. 1882\u20131891). Association for Computational Linguistics. https:\/\/doi.org\/10.18653\/v1\/d16-1193","DOI":"10.18653\/v1\/d16-1193"},{"key":"426_CR68","volume-title":"TensorFlow [Computer software]","author":"TensorFlow Developers","year":"2024","unstructured":"TensorFlow Developers. (2024). TensorFlow [Computer software]. Zenodo."},{"issue":"2","key":"426_CR69","doi-asserted-by":"publisher","first-page":"459","DOI":"10.1007\/s41237-021-00142-y","volume":"48","author":"M Uto","year":"2021","unstructured":"Uto, M. (2021). A review of deep-neural automated essay scoring models. Behaviormetrika, 48(2), 459\u2013484. https:\/\/doi.org\/10.1007\/s41237-021-00142-y","journal-title":"Behaviormetrika"},{"key":"426_CR70","doi-asserted-by":"publisher","unstructured":"Uto, M., & Okano, M. (2020). Robust Neural Automated Essay Scoring Using Item Response Theory. In I. I. Bittencourt, M. Cukurova, K. Muldner, R. Luckin, & E. 
Mill\u00e1n (Eds.), Lecture Notes in Computer Science. Artificial Intelligence in Education (Vol. 12163, pp. 549\u2013561). Springer International Publishing. https:\/\/doi.org\/10.1007\/978-3-030-52237-7_44","DOI":"10.1007\/978-3-030-52237-7_44"},{"key":"426_CR71","doi-asserted-by":"publisher","first-page":"pp. 6077","DOI":"10.18653\/v1\/2020.coling-main.535","volume-title":"Proceedings of the 28th International Conference on Computational Linguistics","author":"M Uto","year":"2020","unstructured":"Uto, M., Xie, Y., & Ueno, M. (2020). Neural automated essay scoring incorporating handcrafted features. Proceedings of the 28th International Conference on Computational Linguistics (pp. 6077\u20136088)"},{"key":"426_CR72","doi-asserted-by":"publisher","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017). Attention Is All You Need. https:\/\/doi.org\/10.48550\/arXiv.1706.03762","DOI":"10.48550\/arXiv.1706.03762"},{"key":"426_CR73","doi-asserted-by":"publisher","unstructured":"Wang, X., Lee, Y., & Park, J. (2022). Automated Evaluation for Student Argumentative Writing: A Survey. https:\/\/doi.org\/10.48550\/arXiv.2205.04083","DOI":"10.48550\/arXiv.2205.04083"},{"key":"426_CR74","doi-asserted-by":"publisher","unstructured":"Weigle, S. C. (2002). Assessing writing. Cambridge University Press. https:\/\/doi.org\/10.1017\/CBO9780511732997","DOI":"10.1017\/CBO9780511732997"},{"key":"426_CR75","doi-asserted-by":"publisher","first-page":"125403","DOI":"10.1109\/ACCESS.2021.3110683","volume":"9","author":"J Xue","year":"2021","unstructured":"Xue, J., Tang, X., & Zheng, L. (2021). A hierarchical BERT-based transfer learning approach for multi-dimensional essay scoring. IEEE Access: Practical Innovations, Open Solutions, 9, 125403\u2013125415. 
https:\/\/doi.org\/10.1109\/ACCESS.2021.3110683","journal-title":"IEEE Access: Practical Innovations, Open Solutions"},{"key":"426_CR76","doi-asserted-by":"crossref","unstructured":"Yan, D. (2020). Handbook of automated scoring: Theory into practice. Chapman and Hall\/CRC statistics in the social and behavioral sciences ser. CRC Press LLC. https:\/\/ebookcentral.proquest.com\/lib\/kxp\/detail.action?docID=6124217","DOI":"10.1201\/9781351264808"},{"key":"426_CR77","doi-asserted-by":"publisher","unstructured":"Yang, Z., Dai, Z., Yang, Y., Carbonell, J., Salakhutdinov, R., & Le, Q. V. (2019). XLNet: Generalized Autoregressive Pretraining for Language Understanding. https:\/\/doi.org\/10.48550\/arXiv.1906.08237","DOI":"10.48550\/arXiv.1906.08237"},{"key":"426_CR78","doi-asserted-by":"publisher","unstructured":"Zesch, T., Wojatzki, M., & Scholten-Akoun, D. (2015). Task-Independent Features for Automated Essay Grading. In J. Tetreault, J. C. Burstein, & C. Leacock (Eds.), Proceedings of the Tenth Workshop on Innovative Use of NLP for Building Educational Applications (pp. 224\u2013232). Association for Computational Linguistics. 
https:\/\/doi.org\/10.3115\/v1\/W15-0626","DOI":"10.3115\/v1\/W15-0626"}],"container-title":["International Journal of Artificial Intelligence in Education"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40593-024-00426-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s40593-024-00426-w\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40593-024-00426-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,4]],"date-time":"2026-03-04T18:12:36Z","timestamp":1772647956000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s40593-024-00426-w"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,9,13]]},"references-count":78,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2025,9]]}},"alternative-id":["426"],"URL":"https:\/\/doi.org\/10.1007\/s40593-024-00426-w","relation":{"has-preprint":[{"id-type":"doi","id":"10.21203\/rs.3.rs-3979182\/v1","asserted-by":"object"}]},"ISSN":["1560-4292","1560-4306"],"issn-type":[{"value":"1560-4292","type":"print"},{"value":"1560-4306","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,9,13]]},"assertion":[{"value":"17 August 2024","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"13 September 2024","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no competing interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing 
Interests"}}]}}