{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,16]],"date-time":"2026-04-16T16:05:31Z","timestamp":1776355531255,"version":"3.51.2"},"reference-count":108,"publisher":"Springer Science and Business Media LLC","issue":"5","license":[{"start":{"date-parts":[[2025,7,24]],"date-time":"2025-07-24T00:00:00Z","timestamp":1753315200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,7,24]],"date-time":"2025-07-24T00:00:00Z","timestamp":1753315200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"DIPF | Leibniz-Institut f\u00fcr Bildungsforschung und Bildungsinformation"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Int J Artif Intell Educ"],"published-print":{"date-parts":[[2025,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>\n                    Equal treatment of groups and individuals is crucial for fair assessment and demands unbiased scoring decisions. We examined algorithmic fairness, focusing on demographic disparities between groups of different gender and language use based on automatic scoring (\n                    <jats:inline-formula>\n                      <jats:tex-math>$$\\:n\\:=\\:\\text{38,722}$$<\/jats:tex-math>\n                    <\/jats:inline-formula>\n                    text responses). We tested various combinations of semantic representations and classification methods on responses to reading comprehension items from the 2015 German PISA assessment. Classifications from the most accurate method, namely a Support Vector Machine trained with RoBERTa embeddings, exhibited no discernible gender differences, but a small yet significant bias in the automatic scoring of students based on their language background. 
Specifically, students speaking mainly a foreign language at home received significantly higher automatic scores than their actual performance warranted, thereby gaining a relative advantage from the machine scoring system. Lower-performing groups, who give more incorrect responses, tend to be over-credited because incorrect responses are generally less likely to be recognized as such. Differences are particularly evident at the item level, where we identified several factors that promote algorithmic unfairness, such as scoring accuracy, student performance, linguistic diversity of text responses, and the psychometrically determined item difficulty.\n                  <\/jats:p>","DOI":"10.1007\/s40593-025-00495-5","type":"journal-article","created":{"date-parts":[[2025,7,24]],"date-time":"2025-07-24T20:46:04Z","timestamp":1753389964000},"page":"3128-3165","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["Algorithmic Fairness in Automatic Short Answer Scoring"],"prefix":"10.1007","volume":"35","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-8333-9071","authenticated-orcid":false,"given":"Nico","family":"Andersen","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1660-4576","authenticated-orcid":false,"given":"Julia","family":"Mang","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0289-9534","authenticated-orcid":false,"given":"Frank","family":"Goldhammer","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3512-1403","authenticated-orcid":false,"given":"Fabian","family":"Zehner","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,7,24]]},"reference":[{"issue":"2","key":"495_CR1","doi-asserted-by":"publisher","first-page":"235","DOI":"10.1007\/s11145-018-9859-0","volume":"32","author":"AM Adams","year":"2019","unstructured":"Adams, A. M., & Simmons, F. 
R. (2019). Exploring individual and gender differences in early writing performance. Reading and Writing, 32(2), 235\u2013263. https:\/\/doi.org\/10.1007\/s11145-018-9859-0","journal-title":"Reading and Writing"},{"key":"495_CR2","doi-asserted-by":"publisher","unstructured":"Al-Saadi, Z. (2020). Gender differences in writing: The mediating effect of language proficiency and writing fluency in text quality. Cogent Education, 7(1), Article 1770923. https:\/\/doi.org\/10.1080\/2331186X.2020.1770923","DOI":"10.1080\/2331186X.2020.1770923"},{"key":"495_CR3","unstructured":"American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing. American Educational Research Association. https:\/\/www.apa.org\/science\/programs\/testing\/standards"},{"issue":"3","key":"495_CR4","doi-asserted-by":"publisher","first-page":"841","DOI":"10.1111\/jcal.12717","volume":"39","author":"N Andersen","year":"2023","unstructured":"Andersen, N., Zehner, F., & Goldhammer, F. (2023). Semi-automatic coding of open-ended text responses in large-scale assessments. Journal of Computer Assisted Learning, 39(3), 841\u2013854. https:\/\/doi.org\/10.1111\/jcal.12717","journal-title":"Journal of Computer Assisted Learning"},{"issue":"3","key":"495_CR5","doi-asserted-by":"publisher","first-page":"321","DOI":"10.1515\/text.2003.014","volume":"23","author":"S Argamon","year":"2003","unstructured":"Argamon, S., Fine, J., & Shimoni, A. (2003). Gender, genre, and writing style in formal written texts. Text & Talk, 23(3), 321\u2013346. https:\/\/doi.org\/10.1515\/text.2003.014","journal-title":"Text & Talk"},{"key":"495_CR6","unstructured":"Artelt, C., Naumann, J., & Schneider, W. (2010). Lesemotivation und Lernstrategien. In E. Klieme, C. Artelt, J. Hartig, N. Jude, O. K\u00f6ller, M. Prenzel, W. Schneider, & P. Stanat (Eds.), PISA 2009: Bilanz nach einem Jahrzehnt (pp. 
73\u2013112). Waxmann. https:\/\/doi.org\/10.25656\/01:3531"},{"issue":"2","key":"495_CR7","doi-asserted-by":"publisher","first-page":"139","DOI":"10.1007\/s11133-019-9413-7","volume":"42","author":"P Aspers","year":"2019","unstructured":"Aspers, P., & Corte, U. (2019). What is qualitative in qualitative research. Qualitative Sociology, 42(2), 139\u2013160. https:\/\/doi.org\/10.1007\/s11133-019-9413-7","journal-title":"Qualitative Sociology"},{"key":"495_CR8","unstructured":"Attali, Y., & Burstein, J. (2006). Automated essay scoring with e-rater\u00ae V.2. The Journal of Technology, Learning and Assessment, 4(3), Article 3. https:\/\/www.ejournals.bc.edu\/index.php\/jtla\/article\/view\/1650"},{"key":"495_CR9","doi-asserted-by":"publisher","first-page":"391","DOI":"10.1162\/tacl_a_00236","volume":"1","author":"S Basu","year":"2013","unstructured":"Basu, S., Jacobs, C., & Vanderwende, L. (2013). Powergrading: A clustering approach to amplify human effort for short answer grading. Transactions of the Association for Computational Linguistics, 1, 391\u2013402. https:\/\/doi.org\/10.1162\/tacl_a_00236","journal-title":"Transactions of the Association for Computational Linguistics"},{"issue":"3","key":"495_CR10","doi-asserted-by":"publisher","first-page":"2","DOI":"10.1111\/j.1745-3992.2012.00238.x","volume":"31","author":"II Bejar","year":"2012","unstructured":"Bejar, I. I. (2012). Rater cognition: Implications for validity. Educational Measurement: Issues and Practice, 31(3), 2\u20139. https:\/\/doi.org\/10.1111\/j.1745-3992.2012.00238.x","journal-title":"Educational Measurement: Issues and Practice"},{"key":"495_CR11","unstructured":"Bolukbasi, T., Chang, K. W., Zou, J. Y., Saligrama, V., & Kalai, A. T. (2016). Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. In D. D. Lee, U. von Luxburg, R. Garnett, M. Sugiyama, & I. 
Guyon (Eds.), NIPS\u201916: Proceedings of the 30th International Conference on Neural Information Processing Systems (Vol. 29, pp. 1\u20139). Curran Associates, Inc. https:\/\/papers.nips.cc\/paper_files\/paper\/2016\/hash\/a486cd07e4ac3d270571622f4f316ec5-Abstract.html"},{"key":"495_CR12","doi-asserted-by":"publisher","unstructured":"Bonefeld, M., & Dickh\u00e4user, O. (2018). (Biased) grading of students\u2019 performance: Students\u2019 names, performance level, and implicit attitudes. Frontiers in Psychology, 9, Article 481. https:\/\/doi.org\/10.3389\/fpsyg.2018.00481","DOI":"10.3389\/fpsyg.2018.00481"},{"key":"495_CR13","unstructured":"Bridgeman, B., Trapani, C., & Attali, Y. (2009, April 13). Considering fairness and validity in evaluating automated scoring [Paper presentation]. National Council on Measurement in Education (NCME), San Diego, CA."},{"key":"495_CR14","doi-asserted-by":"publisher","first-page":"245","DOI":"10.1613\/jair.1.12228","volume":"70","author":"N Burkart","year":"2021","unstructured":"Burkart, N., & Huber, M. F. (2021). A survey on the explainability of supervised machine learning. Journal of Artificial Intelligence Research, 70, 245\u2013317. https:\/\/doi.org\/10.1613\/jair.1.12228","journal-title":"Journal of Artificial Intelligence Research"},{"issue":"6","key":"495_CR15","doi-asserted-by":"publisher","first-page":"691","DOI":"10.1080\/00224540209603929","volume":"142","author":"V Burr","year":"2002","unstructured":"Burr, V. (2002). Judging gender from samples of adult handwriting: Accuracy and use of cues. The Journal of Social Psychology, 142(6), 691\u2013700. https:\/\/doi.org\/10.1080\/00224540209603929","journal-title":"The Journal of Social Psychology"},{"issue":"1","key":"495_CR16","doi-asserted-by":"publisher","first-page":"60","DOI":"10.1007\/s40593-014-0026-8","volume":"25","author":"S Burrows","year":"2015","unstructured":"Burrows, S., Gurevych, I., & Stein, B. (2015). The eras and trends of automatic short answer grading. 
International Journal of Artificial Intelligence in Education, 25(1), 60\u2013117. https:\/\/doi.org\/10.1007\/s40593-014-0026-8","journal-title":"International Journal of Artificial Intelligence in Education"},{"key":"495_CR17","doi-asserted-by":"crossref","unstructured":"Burstein, J., & Chodorow, M. (1999). Automated essay scoring for nonnative English speakers. In M. B. Olsen (Ed.), Computer Mediated Language Assessment and Evaluation in Natural Language Processing (pp. 68\u201375). https:\/\/aclanthology.org\/W99-0411","DOI":"10.3115\/1598834.1598847"},{"issue":"3","key":"495_CR18","doi-asserted-by":"publisher","first-page":"27","DOI":"10.1609\/aimag.v25i3.1774","volume":"25","author":"J Burstein","year":"2004","unstructured":"Burstein, J., Chodorow, M., & Leacock, C. (2004). Automated essay evaluation: The Criterion online writing service. AI Magazine, 25(3), 27\u201327. https:\/\/doi.org\/10.1609\/aimag.v25i3.1774","journal-title":"AI Magazine"},{"issue":"6334","key":"495_CR19","doi-asserted-by":"publisher","first-page":"183","DOI":"10.1126\/science.aal4230","volume":"356","author":"A Caliskan","year":"2017","unstructured":"Caliskan, A., Bryson, J. J., & Narayanan, A. (2017). Semantics derived automatically from language corpora contain human-like biases. Science, 356(6334), 183\u2013186. https:\/\/doi.org\/10.1126\/science.aal4230","journal-title":"Science"},{"key":"495_CR20","doi-asserted-by":"publisher","unstructured":"Camus, L., & Filighera, A. (2020). Investigating transformers for automatic short answer grading. In I. I. Bittencourt, M. Cukurova, K. Muldner, R. Luckin, & E. Mill\u00e1n (Eds.), Artificial Intelligence in Education (Vol. 12164, pp. 43\u201348). Springer. https:\/\/doi.org\/10.1007\/978-3-030-52240-7_8","DOI":"10.1007\/978-3-030-52240-7_8"},{"key":"495_CR21","doi-asserted-by":"publisher","unstructured":"Chen, J., Kallus, N., Mao, X., Svacha, G., & Udell, M. (2019). 
Fairness under unawareness: Assessing disparity when protected class is unobserved. Proceedings of the Conference on Fairness, Accountability, and Transparency, 339\u2013348. https:\/\/doi.org\/10.1145\/3287560.3287594","DOI":"10.1145\/3287560.3287594"},{"key":"495_CR22","doi-asserted-by":"publisher","unstructured":"Chen, L., Chen, P., & Lin, Z. (2020). Artificial intelligence in education: A review. IEEE Access, 8, 75264\u201375278. https:\/\/doi.org\/10.1109\/ACCESS.2020.2988510","DOI":"10.1109\/ACCESS.2020.2988510"},{"issue":"1","key":"495_CR23","doi-asserted-by":"publisher","first-page":"37","DOI":"10.1177\/001316446002000104","volume":"20","author":"J Cohen","year":"1960","unstructured":"Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1), 37\u201346. https:\/\/doi.org\/10.1177\/001316446002000104","journal-title":"Educational and Psychological Measurement"},{"issue":"6","key":"495_CR24","doi-asserted-by":"publisher","first-page":"474","DOI":"10.1080\/01434639708666335","volume":"18","author":"VJ Cook","year":"1997","unstructured":"Cook, V. J. (1997). L2 users and English spelling. Journal of Multilingual and Multicultural Development, 18(6), 474\u2013488. https:\/\/doi.org\/10.1080\/01434639708666335","journal-title":"Journal of Multilingual and Multicultural Development"},{"issue":"3","key":"495_CR25","doi-asserted-by":"publisher","first-page":"273","DOI":"10.1007\/BF00994018","volume":"20","author":"C Cortes","year":"1995","unstructured":"Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273\u2013297. https:\/\/doi.org\/10.1007\/BF00994018","journal-title":"Machine Learning"},{"issue":"1","key":"495_CR26","doi-asserted-by":"publisher","first-page":"81","DOI":"10.1111\/j.1467-1770.1989.tb00592.x","volume":"39","author":"A Cumming","year":"1989","unstructured":"Cumming, A. (1989). Writing expertise and second-language proficiency. 
Language Learning, 39(1), 81\u2013135. https:\/\/doi.org\/10.1111\/j.1467-1770.1989.tb00592.x","journal-title":"Language Learning"},{"key":"495_CR27","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2302.13007","author":"H Dai","year":"2023","unstructured":"Dai, H., Liu, Z., Liao, W., Huang, X., Cao, Y., Wu, Z., Zhao, L., Xu, S., Liu, W., Liu, N., Li, S., Zhu, D., Cai, H., Sun, L., Li, Q., Shen, D., Liu, T., & Li, X. (2023). AugGPT: Leveraging ChatGPT for text data augmentation. ArXiv. https:\/\/doi.org\/10.48550\/ArXiv.2302.13007","journal-title":"ArXiv"},{"key":"495_CR28","doi-asserted-by":"publisher","unstructured":"Deerwester, S., Dumais, S. T., Furnas, G. W., & Landauer, T. K. (1990). Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6), 391\u2013407. https:\/\/doi.org\/10.1002\/(SICI)1097-4571(199009)41:6%3C391::AID-ASI1%3E3.0.CO;2-9","DOI":"10.1002\/(SICI)1097-4571(199009)41:6%3C391::AID-ASI1%3E3.0.CO;2-9"},{"key":"495_CR29","doi-asserted-by":"publisher","unstructured":"Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. In J. Burstein, C. Doran, & T. Solorio (Eds.), Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) (Vol. 1, pp. 4171\u20134186). Association for Computational Linguistics. https:\/\/doi.org\/10.18653\/v1\/N19-1423","DOI":"10.18653\/v1\/N19-1423"},{"issue":"3","key":"495_CR30","doi-asserted-by":"publisher","first-page":"355","DOI":"10.1111\/j.1468-0475.2005.00137.x","volume":"6","author":"H Entorf","year":"2005","unstructured":"Entorf, H., & Minoiu, N. (2005). What a difference immigration policy makes: A comparison of PISA scores in Europe and traditional countries of immigration. German Economic Review, 6(3), 355\u2013376. 
https:\/\/doi.org\/10.1111\/j.1468-0475.2005.00137.x","journal-title":"German Economic Review"},{"key":"495_CR31","unstructured":"Erickson, J. A., Botelho, A. F., Peng, Z., Huang, R., Kasal, M. V., & Heffernan, N. (2021). Is it fair? Automated open response grading. In Proceedings of the 14th International Conference on Educational Data Mining. International Educational Data Mining Society. https:\/\/educationaldatamining.org\/EDM2021\/virtual\/poster_paper214.html"},{"key":"495_CR32","unstructured":"European Commission. (2020). White paper on artificial intelligence\u2014A European approach to excellence and trust. https:\/\/eur-lex.europa.eu\/legal-content\/EN\/ALL\/?uri=CELEX:52020DC0065"},{"key":"495_CR33","doi-asserted-by":"publisher","unstructured":"Flor, M., Futagi, Y., Lopez, M., & Mulholland, M. (2015). Patterns of misspellings in L2 and L1 English: A view from the ETS Spelling Corpus. Bergen Language and Linguistics Studies, 6. https:\/\/doi.org\/10.15845\/bells.v6i0.811","DOI":"10.15845\/bells.v6i0.811"},{"issue":"6","key":"495_CR34","doi-asserted-by":"publisher","first-page":"112","DOI":"10.1145\/3345317","volume":"52","author":"T Folt\u00fdnek","year":"2019","unstructured":"Folt\u00fdnek, T., Meuschke, N., & Gipp, B. (2019). Academic plagiarism detection: A systematic literature review. ACM Computing Surveys, 52(6), Article 112. https:\/\/doi.org\/10.1145\/3345317","journal-title":"ACM Computing Surveys"},{"key":"495_CR35","doi-asserted-by":"publisher","unstructured":"Fraillon, J., Ainley, J., Schulz, W., Duckworth, D., & Friedman, T. (2019). Computer and information literacy framework. In J. Fraillon, J. Ainley, W. Schulz, D. Duckworth, & T. Friedman (Eds.), IEA International Computer and Information Literacy Study 2018 Assessment Framework (pp. 13\u201323). Springer International Publishing. https:\/\/doi.org\/10.1007\/978-3-030-19389-8_2","DOI":"10.1007\/978-3-030-19389-8_2"},{"key":"495_CR36","doi-asserted-by":"publisher","unstructured":"Galhardi, L. B., & Brancher, J. D. (2018). 
Machine learning approach for automatic short answer grading: A systematic review. In G. R. Simari, E. Ferm\u00e9, F. Guti\u00e9rrez Segura, & J. A. Rodr\u00edguez Melquiades (Eds.), Advances in Artificial Intelligence\u2014IBERAMIA 2018 (pp. 380\u2013391). Springer International Publishing. https:\/\/doi.org\/10.1007\/978-3-030-03928-8_31","DOI":"10.1007\/978-3-030-03928-8_31"},{"key":"495_CR37","doi-asserted-by":"publisher","DOI":"10.1007\/s40593-023-00387-6","author":"S Gombert","year":"2024","unstructured":"Gombert, S., Fink, A., Giorgashvili, T., Jivet, I., Di Mitri, D., Yau, J., Frey, A., & Drachsler, H. (2024). From the automated assessment of student essay content to highly informative feedback: A case study. International Journal of Artificial Intelligence in Education. https:\/\/doi.org\/10.1007\/s40593-023-00387-6","journal-title":"International Journal of Artificial Intelligence in Education"},{"issue":"3","key":"495_CR38","doi-asserted-by":"publisher","first-page":"361","DOI":"10.1111\/j.2044-8279.2011.02029.x","volume":"82","author":"R Greifeneder","year":"2012","unstructured":"Greifeneder, R., Zelt, S., Seele, T., Bottenberg, K., & Alt, A. (2012). Towards a better understanding of the legibility bias in performance assessments: The case of gender-based inferences. British Journal of Educational Psychology, 82(3), 361\u2013374. https:\/\/doi.org\/10.1111\/j.2044-8279.2011.02029.x","journal-title":"British Journal of Educational Psychology"},{"key":"495_CR39","doi-asserted-by":"publisher","unstructured":"Guo, X., Yin, Y., Dong, C., Yang, G., & Zhou, G. (2008). On the class imbalance problem. In Maozu Guo, Liang Zhao, & Lipo Wang (Eds.), 2008 Fourth International Conference on Natural Computation (Vol. 4, pp. 192\u2013201). IEEE. https:\/\/doi.org\/10.1109\/ICNC.2008.871","DOI":"10.1109\/ICNC.2008.871"},{"key":"495_CR40","doi-asserted-by":"publisher","unstructured":"Haeri, M. A., & Zweig, K. A. (2020). 
The crucial role of sensitive attributes in fair classification. 2020 IEEE Symposium Series on Computational Intelligence (SSCI), 2993\u20133002. https:\/\/doi.org\/10.1109\/SSCI47803.2020.9308585","DOI":"10.1109\/SSCI47803.2020.9308585"},{"key":"495_CR41","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2204.03503","author":"S Haller","year":"2022","unstructured":"Haller, S., Aldea, A., Seifert, C., & Strisciuglio, N. (2022). Survey on automated short answer grading with deep learning: From word embeddings to transformers. ArXiv. https:\/\/doi.org\/10.48550\/ArXiv.2204.03503","journal-title":"ArXiv"},{"issue":"5220","key":"495_CR42","doi-asserted-by":"publisher","first-page":"41","DOI":"10.1126\/science.7604277","volume":"269","author":"LV Hedges","year":"1995","unstructured":"Hedges, L. V., & Nowell, A. (1995). Sex differences in mental test scores, variability, and numbers of high-scoring individuals. Science, 269(5220), 41\u201345. https:\/\/doi.org\/10.1126\/science.7604277","journal-title":"Science"},{"issue":"4","key":"495_CR43","first-page":"811","volume":"106","author":"D Hellman","year":"2020","unstructured":"Hellman, D. (2020). Measuring algorithmic fairness. Virginia Law Review, 106(4), 811\u2013866.","journal-title":"Virginia Law Review"},{"key":"495_CR44","unstructured":"Hellstr\u00f6m, T., Dignum, V., & Bensch, S. (2020). Bias in machine learning\u2014What is it good for? In A. Saffiotti, L. Serafini, & P. Lukowicz (Eds.), CEUR Workshop Proceedings (Vol. 2659, pp. 3\u201310). CEUR-WS. https:\/\/ceur-ws.org\/Vol-2659\/hellstrom.pdf"},{"issue":"17","key":"495_CR45","doi-asserted-by":"publisher","first-page":"2879","DOI":"10.1080\/09500693.2015.1114190","volume":"37","author":"SI Hofer","year":"2015","unstructured":"Hofer, S. I. (2015). Studying gender bias in physics grading: The role of teaching experience and country. International Journal of Science Education, 37(17), 2879\u20132905. 
https:\/\/doi.org\/10.1080\/09500693.2015.1114190","journal-title":"International Journal of Science Education"},{"key":"495_CR46","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2403.00742","author":"V Hofmann","year":"2024","unstructured":"Hofmann, V., Kalluri, P. R., Jurafsky, D., & King, S. (2024). Dialect prejudice predicts AI decisions about people\u2019s character, employability, and criminality. ArXiv. https:\/\/doi.org\/10.48550\/ArXiv.2403.00742","journal-title":"ArXiv"},{"key":"495_CR47","unstructured":"Horbach, A., & Pinkal, M. (2018). Semi-supervised clustering for short answer scoring. In N. Calzolari, K. Choukri, C. Cieri, T. Declerck, S. Goggi, K. Hasida, H. Isahara, B. Maegaard, J. Mariani, H. Mazo, A. Moreno, J. Odijk, S. Piperidis, & T. Tokunaga (Eds.), Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018) (pp. 4065\u20134071). European Language Resources Association (ELRA). https:\/\/aclanthology.org\/L18-1641."},{"issue":"6","key":"495_CR48","doi-asserted-by":"publisher","first-page":"581","DOI":"10.1037\/0003-066X.60.6.581","volume":"60","author":"JS Hyde","year":"2005","unstructured":"Hyde, J. S. (2005). The gender similarities hypothesis. American Psychologist, 60(6), 581\u2013592. https:\/\/doi.org\/10.1037\/0003-066X.60.6.581","journal-title":"American Psychologist"},{"key":"495_CR49","doi-asserted-by":"publisher","first-page":"752273","DOI":"10.3389\/fpsyg.2021.752273","volume":"12","author":"JA Ib\u00e1\u00f1ez-Alfonso","year":"2021","unstructured":"Ib\u00e1\u00f1ez-Alfonso, J. A., Hern\u00e1ndez-Cabrera, J. A., Du\u00f1abeitia, J. A., Est\u00e9vez, A., Macizo, P., Bajo, M. T., Fuentes, L. J., & Salda\u00f1a, D. (2021). Socioeconomic status, culture, and reading comprehension in immigrant students. Frontiers in Psychology, 12, 752273. 
https:\/\/doi.org\/10.3389\/fpsyg.2021.752273","journal-title":"Frontiers in Psychology"},{"issue":"3","key":"495_CR50","doi-asserted-by":"publisher","first-page":"338","DOI":"10.1111\/jedm.12335","volume":"59","author":"MS Johnson","year":"2022","unstructured":"Johnson, M. S., Liu, X., & McCaffrey, D. F. (2022). Psychometric methods to evaluate measurement and algorithmic bias in automated scoring. Journal of Educational Measurement, 59(3), 338\u2013361. https:\/\/doi.org\/10.1111\/jedm.12335","journal-title":"Journal of Educational Measurement"},{"issue":"2","key":"495_CR51","doi-asserted-by":"publisher","first-page":"456","DOI":"10.2307\/20466646","volume":"30","author":"S Jones","year":"2007","unstructured":"Jones, S., & Myhill, D. (2007). Discourses of difference? Examining gender differences in linguistic characteristics of writing. Canadian Journal of Education, 30(2), 456\u2013482. https:\/\/doi.org\/10.2307\/20466646","journal-title":"Canadian Journal of Education"},{"issue":"3","key":"495_CR52","doi-asserted-by":"publisher","first-page":"379","DOI":"10.1080\/0261976032000128201","volume":"26","author":"J Klein","year":"2003","unstructured":"Klein, J., & El, L. P. (2003). Impairment of teacher efficiency during extended sessions of test correction. European Journal of Teacher Education, 26(3), 379\u2013392. https:\/\/doi.org\/10.1080\/0261976032000128201","journal-title":"European Journal of Teacher Education"},{"issue":"4","key":"495_CR53","doi-asserted-by":"publisher","first-page":"401","DOI":"10.1093\/llc\/17.4.401","volume":"17","author":"M Koppel","year":"2002","unstructured":"Koppel, M., Argamon, S., & Shimoni, A. R. (2002). Automatically categorizing written texts by author gender. Literary and Linguistic Computing, 17(4), 401\u2013412. https:\/\/doi.org\/10.1093\/llc\/17.4.401","journal-title":"Literary and Linguistic Computing"},{"key":"495_CR54","doi-asserted-by":"publisher","unstructured":"Kusner, M., Loftus, J., Russell, C., & Silva, R. (2017). 
Counterfactual fairness. In U. von Luxburg, I. Guyon, S. Bengio, H. Wallach, & R. Fergus (Eds.), Proceedings of the 31st International Conference on Neural Information Processing Systems (pp. 4069\u20134079). Curran Associates, Inc. https:\/\/doi.org\/10.5555\/3294996.3295162","DOI":"10.5555\/3294996.3295162"},{"key":"495_CR55","doi-asserted-by":"publisher","first-page":"159","DOI":"10.2307\/2529310","volume":"33","author":"JR Landis","year":"1977","unstructured":"Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33(1), 159\u2013174. https:\/\/doi.org\/10.2307\/2529310","journal-title":"Biometrics"},{"key":"495_CR56","unstructured":"European Parliament. (2024). Legislative resolution on the proposal for a regulation of the European Parliament and of the Council on laying down harmonised rules on artificial intelligence (Artificial Intelligence Act) and amending certain Union legislative acts (2021\/0106(COD); P9_TA(2024)0138). https:\/\/www.europarl.europa.eu\/doceo\/document\/TA-9-2024-0138_EN.pdf"},{"issue":"4","key":"495_CR57","doi-asserted-by":"publisher","first-page":"479","DOI":"10.1177\/0265532214530699","volume":"31","author":"G Ling","year":"2014","unstructured":"Ling, G., Mollaun, P., & Xi, X. (2014). A study on the impact of fatigue on human raters when scoring speaking responses. Language Testing, 31(4), 479\u2013499. https:\/\/doi.org\/10.1177\/0265532214530699","journal-title":"Language Testing"},{"key":"495_CR58","doi-asserted-by":"publisher","unstructured":"Litman, D., Zhang, H., Correnti, R., Matsumura, L. C., & Wang, E. (2021). A fairness evaluation of automated methods for scoring text evidence usage in writing. Artificial Intelligence in Education: 22nd International Conference, AIED 2021, Utrecht, The Netherlands, June 14\u201318, 2021, Proceedings, Part I, 255\u2013267. 
https:\/\/doi.org\/10.1007\/978-3-030-78292-4_21","DOI":"10.1007\/978-3-030-78292-4_21"},{"key":"495_CR59","unstructured":"Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., & Stoyanov, V. (2019). RoBERTa: A robustly optimized BERT pretraining approach (arXiv:1907.11692). arXiv. https:\/\/arxiv.org\/abs\/1907.11692"},{"key":"495_CR60","doi-asserted-by":"publisher","first-page":"16","DOI":"10.1016\/j.eap.2020.11.003","volume":"69","author":"LA Lopez-Agudo","year":"2021","unstructured":"Lopez-Agudo, L. A., Gonz\u00e1lez-Betancor, S. M., & Marcenaro-Gutierrez, O. D. (2021). Language at home and academic performance: The case of Spain. Economic Analysis and Policy, 69, 16\u201333. https:\/\/doi.org\/10.1016\/j.eap.2020.11.003","journal-title":"Economic Analysis and Policy"},{"key":"495_CR61","doi-asserted-by":"publisher","unstructured":"Loukina, A., Madnani, N., & Zechner, K. (2019). The many dimensions of algorithmic fairness in educational applications. In H. Yannakoudakis, E. Kochmar, C. Leacock, N. Madnani, I. Pil\u00e1n, & T. Zesch (Eds.), Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications (pp. 1\u201310). Association for Computational Linguistics. https:\/\/doi.org\/10.18653\/v1\/W19-4401","DOI":"10.18653\/v1\/W19-4401"},{"key":"495_CR62","doi-asserted-by":"publisher","unstructured":"Lowie, W., & Verspoor, M. (2008). Input versus transfer? The role of frequency and similarity in the acquisition of L2 prepositions. In M. Achard & S. Niemeier (Eds.), Cognitive Linguistics, Second Language Acquisition, and Foreign Language Teaching (Vol. 18, pp. 77\u201394). De Gruyter Mouton. 
https:\/\/doi.org\/10.1515\/9783110199857.77","DOI":"10.1515\/9783110199857.77"},{"issue":"4","key":"495_CR63","doi-asserted-by":"publisher","first-page":"897","DOI":"10.3390\/psych3040056","volume":"3","author":"S Ludwig","year":"2021","unstructured":"Ludwig, S., Mayer, C., Hansen, C., Eilers, K., & Brandt, S. (2021). Automated essay scoring using transformer models. Psych, 3(4), 897\u2013915. https:\/\/doi.org\/10.3390\/psych3040056","journal-title":"Psych"},{"key":"495_CR64","unstructured":"Madnani, N., & Cahill, A. (2018). Automated scoring: Beyond natural language processing. In E. M. Bender, L. Derczynski, & P. Isabelle (Eds.), Proceedings of the 27th International Conference on Computational Linguistics (pp. 1099\u20131109). Association for Computational Linguistics. https:\/\/aclanthology.org\/C18-1094"},{"issue":"1","key":"495_CR65","doi-asserted-by":"publisher","first-page":"47","DOI":"10.1017\/S1366728903001020","volume":"6","author":"BC Malt","year":"2003","unstructured":"Malt, B. C., & Sloman, S. A. (2003). Linguistic diversity and object naming by non-native speakers of English. Bilingualism: Language and Cognition, 6(1), 47\u201367. https:\/\/doi.org\/10.1017\/S1366728903001020","journal-title":"Bilingualism: Language and Cognition"},{"key":"495_CR66","doi-asserted-by":"publisher","unstructured":"Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K., & Galstyan, A. (2021). A survey on bias and fairness in machine learning. ACM Computing Surveys, 54(6), Article 115. https:\/\/doi.org\/10.1145\/3457607","DOI":"10.1145\/3457607"},{"key":"495_CR67","unstructured":"Meyer, D., Dimitriadou, E., Hornik, K., Weingessel, A., & Leisch, F. (2024). e1071: Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien (Version 1.7-16) [Computer software]. https:\/\/cran.r-project.org\/web\/packages\/e1071\/index.html"},{"key":"495_CR68","unstructured":"Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. 
(2013). Distributed representations of words and phrases and their compositionality. In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, & K. Q. Weinberger (Eds.), Proceedings of the 26th International Conference on Neural Information Processing Systems\u2014Volume 2 (Vol. 26, pp. 3111\u20133119). Curran Associates, Inc. https:\/\/papers.nips.cc\/paper_files\/paper\/2013\/hash\/9aa42b31882ec039965f3c4923ce901b-Abstract.html"},{"issue":"4","key":"495_CR69","doi-asserted-by":"publisher","first-page":"297","DOI":"10.1177\/014662169301700401","volume":"17","author":"RE Millsap","year":"1993","unstructured":"Millsap, R. E., & Everson, H. T. (1993). Methodology review: Statistical approaches for assessing measurement bias. Applied Psychological Measurement, 17(4), 297\u2013334. https:\/\/doi.org\/10.1177\/014662169301700401","journal-title":"Applied Psychological Measurement"},{"issue":"1","key":"495_CR70","doi-asserted-by":"publisher","first-page":"142","DOI":"10.1080\/10494820.2018.1558257","volume":"29","author":"E Mousavinasab","year":"2021","unstructured":"Mousavinasab, E., Zarifsanaiey, N., Niakan Kalhori, S. R., Rakhshan, M., Keikha, L., & Ghazi Saeedi, M. (2021). Intelligent tutoring systems: A systematic review of characteristics, applications, and evaluation methods. Interactive Learning Environments, 29(1), 142\u2013163. https:\/\/doi.org\/10.1080\/10494820.2018.1558257","journal-title":"Interactive Learning Environments"},{"key":"495_CR71","unstructured":"Mullis, I. V. S., & Martin, M. O. (Eds.). (2019). PIRLS 2021 assessment frameworks. TIMSS & PIRLS International Study Center."},{"issue":"11","key":"495_CR72","doi-asserted-by":"publisher","first-page":"16","DOI":"10.5120\/ijca2015906113","volume":"125","author":"RR Naik","year":"2015","unstructured":"Naik, R. R., Landge, M. B., & Mahender, C. N. (2015). A review on plagiarism detection tools. 
International Journal of Computer Applications, 125(11), 16\u201322.","journal-title":"International Journal of Computer Applications"},{"key":"495_CR73","doi-asserted-by":"publisher","unstructured":"Northcutt, C. G., Athalye, A., & Mueller, J. (2021). Pervasive Label Errors in Test Sets Destabilize Machine Learning Benchmarks. arXiv. https:\/\/doi.org\/10.48550\/arXiv.2103.14749","DOI":"10.48550\/arXiv.2103.14749"},{"key":"495_CR74","doi-asserted-by":"publisher","unstructured":"Ntoutsi, E., Fafalios, P., Gadiraju, U., Iosifidis, V., Nejdl, W., Vidal, M. E., Ruggieri, S., Turini, F., Papadopoulos, S., Krasanakis, E., Kompatsiaris, I., Kinder-Kurlanda, K., Wagner, C., Karimi, F., Fernandez, M., Alani, H., Berendt, B., Kruegel, T., Heinze, C., & Staab, S. (2020). Bias in data-driven artificial intelligence systems\u2014An introductory survey. WIREs Data Mining and Knowledge Discovery, 10(3). https:\/\/doi.org\/10.1002\/widm.1356. Article e1356.","DOI":"10.1002\/widm.1356"},{"issue":"2","key":"495_CR75","doi-asserted-by":"publisher","first-page":"223","DOI":"10.1348\/000712606X117649","volume":"98","author":"JV Oakhill","year":"2007","unstructured":"Oakhill, J. V., & Petrides, A. (2007). Sex differences in the effects of interest on boys\u2019 and girls\u2019 reading comprehension. British Journal of Psychology, 98(2), 223\u2013235. https:\/\/doi.org\/10.1348\/000712606X117649","journal-title":"British Journal of Psychology"},{"issue":"6464","key":"495_CR76","doi-asserted-by":"publisher","first-page":"447","DOI":"10.1126\/science.aax2342","volume":"366","author":"Z Obermeyer","year":"2019","unstructured":"Obermeyer, Z., Powers, B., Vogeli, C., & Mullainathan, S. (2019). Dissecting racial bias in an algorithm used to manage the health of populations. Science, 366(6464), 447\u2013453. 
https:\/\/doi.org\/10.1126\/science.aax2342","journal-title":"Science"},{"key":"495_CR77","doi-asserted-by":"publisher","DOI":"10.1787\/19963777","author":"OECD","year":"2009","unstructured":"OECD. (2009). Take the test: Sample questions from OECD\u2019s PISA assessments. OECD Publishing. https:\/\/doi.org\/10.1787\/19963777","journal-title":"OECD Publishing"},{"key":"495_CR79","unstructured":"OECD. (2017a). PISA 2015 assessment and analytical framework: Science, reading, mathematic, financial literacy and collaborative problem solving. OECD Publishing."},{"key":"495_CR81","unstructured":"OECD. (2024). PISA 2022 technical report. OECD Publishing. https:\/\/www.oecd-ilibrary.org\/education\/pisa-2022-technical-report_01820d6d-en"},{"key":"495_CR78","doi-asserted-by":"publisher","unstructured":"OECD. (2011). Education at a Glance 2011: OECD Indicators. OECD Publishing. https:\/\/doi.org\/10.1787\/eag-2011-en","DOI":"10.1787\/eag-2011-en"},{"key":"495_CR80","unstructured":"OECD. (2017b). PISA 2015 technical report. OECD Publishing. https:\/\/www.oecd.org\/content\/dam\/oecd\/en\/about\/programmes\/edu\/pisa\/publications\/technical-report\/PISA2015_TechRep_Final.pdf"},{"key":"495_CR82","unstructured":"Palviainen, \u00c5., Kalaja, P., & M\u00e4ntyl\u00e4, K. (2012). Development of L2 writing: Fluency and proficiency. In L. Meril\u00e4inen, L. Kolehmainen, & T. Nieminen (Eds.), AFinLA-e Soveltavan kielitieteen tutkimuksia 2012 (Vol. 4, pp. 47\u201359). AFinLA. https:\/\/journal.fi\/afinla\/article\/view\/7037"},{"issue":"2","key":"495_CR83","doi-asserted-by":"publisher","first-page":"246","DOI":"10.1162\/neco.1991.3.2.246","volume":"3","author":"J Park","year":"1991","unstructured":"Park, J., & Sandberg, I. W. (1991). Universal approximation using radial-basis-function networks. Neural Computation, 3(2), 246\u2013257. 
https:\/\/doi.org\/10.1162\/neco.1991.3.2.246","journal-title":"Neural Computation"},{"issue":"3","key":"495_CR84","doi-asserted-by":"publisher","first-page":"434","DOI":"10.1016\/j.paid.2010.10.026","volume":"50","author":"TW Payne","year":"2011","unstructured":"Payne, T. W., & Lynn, R. (2011). Sex differences in second language comprehension. Personality and Individual Differences, 50(3), 434\u2013436. https:\/\/doi.org\/10.1016\/j.paid.2010.10.026","journal-title":"Personality and Individual Differences"},{"key":"495_CR85","doi-asserted-by":"publisher","unstructured":"Pennington, J., Socher, R., & Manning, C. (2014). GloVe: Global vectors for word representation. In A. Moschitti, B. Pang, & W. Daelemans (Eds.), Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 1532\u20131543). Association for Computational Linguistics. https:\/\/doi.org\/10.3115\/v1\/D14-1162","DOI":"10.3115\/v1\/D14-1162"},{"issue":"4","key":"495_CR86","doi-asserted-by":"publisher","first-page":"1269","DOI":"10.1007\/s10648-018-9450-x","volume":"30","author":"J Petersen","year":"2018","unstructured":"Petersen, J. (2018). Gender difference in verbal performance: A meta-analysis of United States state performance assessments. Educational Psychology Review, 30(4), 1269\u20131281. https:\/\/doi.org\/10.1007\/s10648-018-9450-x","journal-title":"Educational Psychology Review"},{"issue":"2","key":"495_CR87","doi-asserted-by":"publisher","first-page":"114","DOI":"10.1027\/1016-9040.13.2.114","volume":"13","author":"A Piolat","year":"2008","unstructured":"Piolat, A., Barbier, M. L., & Roussey, J. Y. (2008). Fluency and cognitive effort during first- and second-language notetaking and writing by undergraduate students. European Psychologist, 13(2), 114\u2013125. https:\/\/doi.org\/10.1027\/1016-9040.13.2.114","journal-title":"European Psychologist"},{"key":"495_CR88","doi-asserted-by":"publisher","unstructured":"Prenzel, M., Blum, W., & Klieme, E. (2015). 
The impact of PISA on mathematics teaching and learning in Germany. In K. Stacey & R. Turner (Eds.), Assessing Mathematical Literacy: The PISA Experience (pp. 239\u2013248). Springer International Publishing. https:\/\/doi.org\/10.1007\/978-3-319-10121-7_12","DOI":"10.1007\/978-3-319-10121-7_12"},{"key":"495_CR89","unstructured":"R Core Team. (2020). R: A language and environment for statistical computing [Computer software]."},{"issue":"3","key":"495_CR90","doi-asserted-by":"publisher","first-page":"2495","DOI":"10.1007\/s10462-021-10068-2","volume":"55","author":"D Ramesh","year":"2022","unstructured":"Ramesh, D., & Sanampudi, S. K. (2022). An automated essay scoring systems: A systematic literature review. Artificial Intelligence Review, 55(3), 2495\u20132527. https:\/\/doi.org\/10.1007\/s10462-021-10068-2","journal-title":"Artificial Intelligence Review"},{"key":"495_CR91","unstructured":"Regulation (EU) 2024\/1689 of the European Parliament and of the Council of 13 June 2024 laying down harmonised rules on artificial intelligence and amending Regulations (2024). https:\/\/eur-lex.europa.eu\/legal-content\/EN\/TXT\/?uri=CELEX:32024R1689"},{"key":"495_CR92","doi-asserted-by":"crossref","unstructured":"Ruder, S. (2022). Square one bias in NLP: Towards a multi-dimensional exploration of the research manifold. In S. Muresan, P. Nakov, & A. Villavicencio (Eds.), Findings of the Association for Computational Linguistics: ACL 2022. Association for Computational Linguistics. https:\/\/doi.org\/10.18653\/v1\/2022.findings-acl.184","DOI":"10.18653\/v1\/2022.findings-acl.184"},{"key":"495_CR93","doi-asserted-by":"publisher","unstructured":"Schlippe, T., Stierstorfer, Q., Koppel, M., & Libbrecht, P. (2023). Explainability in automatic short answer grading. In E. C. K. Cheng, T. Wang, T. Schlippe, & G. N. Beligiannis (Eds.), Artificial Intelligence in Education Technologies: New Development and Innovative Practices (pp. 69\u201387). Springer Nature. 
https:\/\/doi.org\/10.1007\/978-981-19-8040-4_5","DOI":"10.1007\/978-981-19-8040-4_5"},{"key":"495_CR94","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1177\/2378023120967171","volume":"6","author":"C Schwemmer","year":"2020","unstructured":"Schwemmer, C., Knight, C., Bello-Pardo, E. D., Oklobdzija, S., Schoonvelde, M., & Lockhart, J. W. (2020). Diagnosing gender bias in image recognition systems. Socius, 6, 1\u201317. https:\/\/doi.org\/10.1177\/2378023120967171","journal-title":"Socius"},{"key":"495_CR95","unstructured":"Shin, H. J., Andersen, N., Horbach, A., Kim, E., Baik, J., & Zehner, F. (2024). Operational automatic scoring of text responses in 2016 ePIRLS: Performance and linguistic variance. https:\/\/www.iea.nl\/sites\/default\/files\/2024-04\/Operational-Automatic-Scoring-of-Text-Responses-ePIRLS.pdf"},{"issue":"4","key":"495_CR96","doi-asserted-by":"publisher","first-page":"887","DOI":"10.1007\/s10044-014-0371-0","volume":"18","author":"I Siddiqi","year":"2014","unstructured":"Siddiqi, I., Djeddi, C., Raza, A., & Souici-Meslati, L. (2014). Automatic analysis of handwriting for gender classification. Pattern Analysis and Applications, 18(4), 887\u2013899. https:\/\/doi.org\/10.1007\/s10044-014-0371-0","journal-title":"Pattern Analysis and Applications"},{"issue":"4","key":"495_CR97","doi-asserted-by":"publisher","first-page":"657","DOI":"10.2307\/3587400","volume":"27","author":"T Silva","year":"1993","unstructured":"Silva, T. (1993). Toward an understanding of the distinct nature of L2 writing: The ESL research and its implications. TESOL Quarterly, 27(4), 657\u2013677. https:\/\/doi.org\/10.2307\/3587400","journal-title":"TESOL Quarterly"},{"issue":"3","key":"495_CR98","doi-asserted-by":"publisher","first-page":"271","DOI":"10.1080\/0141192890150304","volume":"15","author":"MG Spear","year":"1989","unstructured":"Spear, M. G. (1989). Differences between the written work of boys and girls. 
British Educational Research Journal, 15(3), 271\u2013277. https:\/\/doi.org\/10.1080\/0141192890150304","journal-title":"British Educational Research Journal"},{"key":"495_CR99","doi-asserted-by":"publisher","unstructured":"Steinig, W., & Betzel, D. (2013). Schreiben Grundsch\u00fcler heute schlechter als vor 40 Jahren? Texte von Viertkl\u00e4sslern aus den Jahren 1972, 2002 und 2012. Sprachverfall? Dynamik - Wandel - Variation, 353\u2013371. https:\/\/doi.org\/10.1515\/9783110343007.353","DOI":"10.1515\/9783110343007.353"},{"issue":"3","key":"495_CR100","doi-asserted-by":"publisher","first-page":"21","DOI":"10.1111\/j.1745-3992.2012.00240.x","volume":"31","author":"I Suto","year":"2012","unstructured":"Suto, I. (2012). A critical review of some qualitative research methods used to explore rater cognition. Educational Measurement: Issues and Practice, 31(3), 21\u201330. https:\/\/doi.org\/10.1111\/j.1745-3992.2012.00240.x","journal-title":"Educational Measurement: Issues and Practice"},{"key":"495_CR101","doi-asserted-by":"publisher","unstructured":"Tatman, R. (2017). Gender and dialect bias in YouTube\u2019s automatic captions (pp. 53\u201359). https:\/\/doi.org\/10.18653\/v1\/W17-1606","DOI":"10.18653\/v1\/W17-1606"},{"key":"495_CR102","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, \u0141., & Polosukhin, I. (2017). Attention is all you need. In U. von Luxburg, I. Guyon, S. Bengio, H. Wallach, & R. Fergus (Eds.), Proceedings of the 31st International Conference on Neural Information Processing Systems (pp. 6000\u20136010). Curran Associates Inc."},{"issue":"4","key":"495_CR103","doi-asserted-by":"publisher","first-page":"863","DOI":"10.1016\/S0191-8869(02)00288-X","volume":"35","author":"EM Weiss","year":"2003","unstructured":"Weiss, E. M., Kemmler, G., Deisenhammer, E. A., Fleischhacker, W. W., & Delazer, M. (2003). Sex differences in cognitive functions. Personality and Individual Differences, 35(4), 863\u2013875. 
https:\/\/doi.org\/10.1016\/S0191-8869(02)00288-X","journal-title":"Personality and Individual Differences"},{"issue":"2","key":"495_CR104","doi-asserted-by":"publisher","first-page":"554","DOI":"10.2307\/20466650","volume":"30","author":"B White","year":"2007","unstructured":"White, B. (2007). Are girls better readers than boys? Which boys? Which girls? Canadian Journal of Education, 30(2), 554\u2013581. https:\/\/doi.org\/10.2307\/20466650","journal-title":"Canadian Journal of Education"},{"key":"495_CR105","doi-asserted-by":"publisher","unstructured":"Yamada, I., Asai, A., Sakuma, J., Shindo, H., Takeda, H., Takefuji, Y., & Matsumoto, Y. (2020). Wikipedia2Vec: An efficient toolkit for learning and visualizing the embeddings of words and entities from Wikipedia. In Q. Liu & D. Schlangen (Eds.), Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations (pp. 23\u201330). Association for Computational Linguistics. https:\/\/doi.org\/10.18653\/v1\/2020.emnlp-demos.4","DOI":"10.18653\/v1\/2020.emnlp-demos.4"},{"key":"495_CR106","doi-asserted-by":"publisher","unstructured":"Zanga, G., & De Gioannis, E. (2023). Discrimination in grading: A scoping review of studies on teachers\u2019 discrimination in school. Studies in Educational Evaluation, 78, Article 101284. https:\/\/doi.org\/10.1016\/j.stueduc.2023.101284","DOI":"10.1016\/j.stueduc.2023.101284"},{"issue":"2","key":"495_CR108","doi-asserted-by":"publisher","first-page":"280","DOI":"10.1177\/0013164415590022","volume":"76","author":"F Zehner","year":"2016","unstructured":"Zehner, F., S\u00e4lzer, C., & Goldhammer, F. (2016). Automatic coding of short text responses via clustering in educational assessment. Educational and Psychological Measurement, 76(2), 280\u2013303. 
https:\/\/doi.org\/10.1177\/0013164415590022","journal-title":"Educational and Psychological Measurement"},{"key":"495_CR107","doi-asserted-by":"publisher","unstructured":"Zehner, F., Goldhammer, F., & S\u00e4lzer, C. (2018). Automatically analyzing text responses for exploring gender-specific cognitions in PISA reading. Large-Scale Assessments in Education, 6(1). https:\/\/doi.org\/10.1186\/s40536-018-0060-3. Article 7.","DOI":"10.1186\/s40536-018-0060-3"}],"container-title":["International Journal of Artificial Intelligence in Education"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40593-025-00495-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s40593-025-00495-5","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40593-025-00495-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,4]],"date-time":"2026-03-04T18:12:45Z","timestamp":1772647965000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s40593-025-00495-5"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,7,24]]},"references-count":108,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2025,12]]}},"alternative-id":["495"],"URL":"https:\/\/doi.org\/10.1007\/s40593-025-00495-5","relation":{},"ISSN":["1560-4292","1560-4306"],"issn-type":[{"value":"1560-4292","type":"print"},{"value":"1560-4306","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,7,24]]},"assertion":[{"value":"16 June 2025","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"24 July 2025","order":2,"name":"first_online","label":"First 
Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no competing interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing Interests"}}]}}