{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,13]],"date-time":"2026-06-13T02:58:48Z","timestamp":1781319528487,"version":"3.54.1"},"publisher-location":"New York, NY, USA","reference-count":29,"publisher":"ACM","license":[{"start":{"date-parts":[[2026,6,12]],"date-time":"2026-06-12T00:00:00Z","timestamp":1781222400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/legalcode"}],"funder":[{"DOI":"10.13039\/100014013","name":"UK Research and Innovation","doi-asserted-by":"publisher","award":["MR\/Z505882\/1"],"award-info":[{"award-number":["MR\/Z505882\/1"]}],"id":[{"id":"10.13039\/100014013","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2026,6,13]]},"DOI":"10.1145\/3802974.3809410","type":"proceedings-article","created":{"date-parts":[[2026,6,13]],"date-time":"2026-06-13T02:25:06Z","timestamp":1781317506000},"page":"171-175","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Should Machines Get to Judge? Rethinking the Design of AI-Mediated Assessment in Education"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4032-4766","authenticated-orcid":false,"given":"Zaki","family":"Pauzi","sequence":"first","affiliation":[{"name":"UCL Knowledge Lab, University College London, London, United Kingdom"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0575-0823","authenticated-orcid":false,"given":"Manolis","family":"Mavrikis","sequence":"additional","affiliation":[{"name":"UCL Knowledge Lab, University College London, London, United Kingdom"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2026,6,12]]},"reference":[{"key":"e_1_3_3_2_2_2","doi-asserted-by":"publisher","unstructured":"Daniele Agostini and Federica Picasso. 2024. Large language models for sustainable assessment and feedback in higher education: Towards a Pedagogical and Technological Framework. Intelligenza Artificiale 18 1 (2024) 121\u2013138. 10.3233\/IA-240033","DOI":"10.3233\/IA-240033"},{"key":"e_1_3_3_2_3_2","doi-asserted-by":"publisher","unstructured":"Catalin Anghel Andreea\u00a0Alexandra Anghel Emilia Pecheanu Adina Cocu Marian\u00a0Viorel Craciun Paul Iacobescu Antonio\u00a0Stefan Balau and Constantin\u00a0Adrian Andrei. 2025. GraderAssist: A Graph-Based Multi-LLM Framework for Transparent and Reproducible Automated Evaluation. Informatics 12 4 (2025). 10.3390\/informatics12040123","DOI":"10.3390\/informatics12040123"},{"key":"e_1_3_3_2_4_2","doi-asserted-by":"publisher","unstructured":"Catalin Anghel Marian\u00a0Viorel Craciun Emilia Pecheanu Adina Cocu Andreea\u00a0Alexandra Anghel Paul Iacobescu Calina Maier Constantin\u00a0Adrian Andrei Cristian Scheau and Serban Dragosloveanu. 2025. CourseEvalAI: Rubric-Guided Framework for Transparent and Consistent Evaluation of Large Language Models. Computers 14 10 (2025). 10.3390\/computers14100431","DOI":"10.3390\/computers14100431"},{"key":"e_1_3_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-99267-4_14"},{"key":"e_1_3_3_2_6_2","doi-asserted-by":"publisher","unstructured":"Okan Bulut and Maggie Beiting-Parrish. 2024. The Rise of Artificial Intelligence in Educational Measurement: Opportunities and Ethical Challenges. Chinese\/English Journal of Educational Measurement and Evaluation 5 3 (Dec. 2024). 10.59863\/miql7785","DOI":"10.59863\/miql7785"},{"key":"e_1_3_3_2_7_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2024.emnlp-main.146"},{"key":"e_1_3_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/TALE66047.2025.11346771"},{"key":"e_1_3_3_2_9_2","unstructured":"Clayton Cohn Nicole Hutchins and Gautam Biswas. 2023. Towards a formative feedback generation agent: Leveraging a human-in-the-loop chain-of-thought prompting approach with LLMs to evaluate formative assessment responses in K-12 science. NSF Preprint (2023). https:\/\/par.nsf.gov\/biblio\/10468997 Preprint\u2014not formally published."},{"key":"e_1_3_3_2_10_2","doi-asserted-by":"publisher","unstructured":"Mohamed Diab\u00a0Idris Xiaohua Feng and Vladimir Dyo. 2024. Revolutionizing Higher Education: Unleashing the Potential of Large Language Models for Strategic Transformation. IEEE Access 12 (2024) 67738\u201367757. 10.1109\/ACCESS.2024.3400164","DOI":"10.1109\/ACCESS.2024.3400164"},{"key":"e_1_3_3_2_11_2","doi-asserted-by":"publisher","unstructured":"Jonas Flod\u00e9n. 2024. Grading exams using large language models: A comparison between human and AI grading of exams in higher education using ChatGPT. British Educational Research Journal 51 1 (Sept. 2024) 201\u2013224. 10.1002\/berj.4069","DOI":"10.1002\/berj.4069"},{"key":"e_1_3_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.4135\/9781849208574"},{"key":"e_1_3_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1145\/3643795.3648375"},{"key":"e_1_3_3_2_14_2","doi-asserted-by":"publisher","unstructured":"Yue Huang Corey Palermo Ruitao Liu and Yong He. 2025. An Early Review of Generative Language Models in Automated Writing Evaluation: Advancements Challenges and Future Directions for Automated Essay Scoring and Feedback Generation. Chinese\/English Journal of Educational Measurement and Evaluation (Aug. 2025). 10.59863\/famj7696","DOI":"10.59863\/famj7696"},{"key":"e_1_3_3_2_15_2","first-page":"71","volume-title":"Proceedings of the Artificial Intelligence in Measurement and Education Conference (AIME-Con): Works in Progress","author":"Huang Yue","year":"2025","unstructured":"Yue Huang and Joshua Wilson. 2025. Evaluating LLM-Based Automated Essay Scoring: Accuracy, Fairness, and Validity. In Proceedings of the Artificial Intelligence in Measurement and Education Conference (AIME-Con): Works in Progress, Joshua Wilson, Christopher Ormerod, and Magdalen Beiting\u00a0Parrish (Eds.). National Council on Measurement in Education (NCME), Wyndham Grand Pittsburgh, Downtown, Pittsburgh, Pennsylvania, United States, 71\u201383. https:\/\/aclanthology.org\/2025.aimecon-wip.9\/"},{"key":"e_1_3_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-56069-9_28"},{"key":"e_1_3_3_2_17_2","doi-asserted-by":"publisher","unstructured":"Zhaokun Jiang and Ziyin Zhang. 2025. From black box to transparency: Enhancing automated interpreting assessment with explainable AI in college classrooms. Research Methods in Applied Linguistics 4 3 (2025) 100237. 10.1016\/j.rmal.2025.100237","DOI":"10.1016\/j.rmal.2025.100237"},{"key":"e_1_3_3_2_18_2","doi-asserted-by":"publisher","unstructured":"Matthew Johnson and Mo Zhang. 2024. Examining the responsible use of zero-shot AI approaches to scoring essays. Scientific Reports 14 1 (Dec. 2024). 10.1038\/s41598-024-79208-2","DOI":"10.1038\/s41598-024-79208-2"},{"key":"e_1_3_3_2_19_2","doi-asserted-by":"publisher","unstructured":"Arnab Kundu and Tripti Bej. 2025. AI policies in school education: a comparative study on China Singapore Finland and the US. Journal of Science and Technology Policy Management (10 2025). 10.1108\/JSTPM-06-2024-0218","DOI":"10.1108\/JSTPM-06-2024-0218"},{"key":"e_1_3_3_2_20_2","doi-asserted-by":"publisher","unstructured":"Jinsook Lee Yann Hicke Renzhe Yu Christopher Brooks and Ren\u00e9\u00a0F. Kizilcec. 2024. The life cycle of large language models in education: A framework for understanding sources of bias. British Journal of Educational Technology 55 5 (July 2024) 1982\u20132002. 10.1111\/bjet.13505","DOI":"10.1111\/bjet.13505"},{"key":"e_1_3_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/FIE63693.2025.11328653"},{"key":"e_1_3_3_2_22_2","doi-asserted-by":"publisher","unstructured":"Alfredo Milani Valentina Franzoni Emanuele Florindi Assel Omarbekova Gulmira Bekmanova and Banu Yergesh. 2025. When AI Is Fooled: Hidden Risks in LLM-Assisted Grading. Education Sciences 15 11 (2025). 10.3390\/educsci15111419","DOI":"10.3390\/educsci15111419"},{"key":"e_1_3_3_2_23_2","doi-asserted-by":"publisher","unstructured":"Majdi Quttainah Vinaytosh Mishra Somayya Madakam Yotam Lurie and Shlomo Mark. 2024. Cost Usability Credibility Fairness Accountability Transparency and Explainability Framework for Safe and Effective Large Language Models in Medical Education: Narrative Review and Qualitative Study. JMIR AI 3 (23 Apr 2024) e51834. 10.2196\/51834","DOI":"10.2196\/51834"},{"key":"e_1_3_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.14324\/111.9781787357242"},{"key":"e_1_3_3_2_25_2","doi-asserted-by":"publisher","unstructured":"Daniel Schiff. 2021. Education for AI not AI for Education: The Role of Education and Ethics in National AI Policy Strategies. International Journal of Artificial Intelligence in Education 32 3 (Sept. 2021) 527\u2013563. 10.1007\/s40593-021-00270-2","DOI":"10.1007\/s40593-021-00270-2"},{"key":"e_1_3_3_2_26_2","volume-title":"ICLR 2025 Workshop on Building Trust in Language Models and Applications","author":"Wei Hui","year":"2025","unstructured":"Hui Wei, Shenghua He, Tian Xia, Fei Liu, Andy Wong, Jingyang Lin, and Mei Han. 2025. Systematic Evaluation of LLM-as-a-Judge in LLM Alignment Tasks: Explainable Metrics and Diverse Prompt Templates. In ICLR 2025 Workshop on Building Trust in Language Models and Applications. https:\/\/openreview.net\/forum?id=CAgBCSt8gL"},{"key":"e_1_3_3_2_27_2","volume-title":"ICLR 2025 Workshop on Building Trust in Language Models and Applications","author":"Wei Kevin","year":"2025","unstructured":"Kevin Wei, Patricia Paskov, Sunishchal Dev, Michael\u00a0J Byun, Anka Reuel, Xavier Roberts-Gaal, Rachel Calcott, Evie Coxon, and Chinmay Deshpande. 2025. Model Evaluations Need Rigorous and Transparent Human Baselines. In ICLR 2025 Workshop on Building Trust in Language Models and Applications. https:\/\/openreview.net\/forum?id=VbG9sIsn4F"},{"key":"e_1_3_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2025.findings-naacl.314"},{"key":"e_1_3_3_2_29_2","doi-asserted-by":"publisher","unstructured":"Xuansheng Wu Padmaja\u00a0Pravin Saraf Gyeonggeon Lee Ehsan Latif Ninghao Liu and Xiaoming Zhai. 2025. Unveiling Scoring Processes: Dissecting the Differences Between LLMs and Human Graders in Automatic Scoring. Technology Knowledge and Learning (March 2025). 10.1007\/s10758-025-09836-8","DOI":"10.1007\/s10758-025-09836-8"},{"key":"e_1_3_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1145\/3209889.3209897"}],"event":{"name":"DIS '26: Designing Interactive Systems Conference","location":"Singapore Singapore","acronym":"DIS '26 Companion","sponsor":["SIGCHI ACM Special Interest Group on Computer-Human Interaction"]},"container-title":["Companion Publication of the 2026 ACM Designing Interactive Systems Conference"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3802974.3809410","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,6,13]],"date-time":"2026-06-13T02:26:15Z","timestamp":1781317575000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3802974.3809410"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,6,12]]},"references-count":29,"alternative-id":["10.1145\/3802974.3809410","10.1145\/3802974"],"URL":"https:\/\/doi.org\/10.1145\/3802974.3809410","relation":{},"subject":[],"published":{"date-parts":[[2026,6,12]]},"assertion":[{"value":"2026-06-12","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}