{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,23]],"date-time":"2026-04-23T07:54:47Z","timestamp":1776930887472,"version":"3.51.2"},"publisher-location":"New York, NY, USA","reference-count":154,"publisher":"ACM","funder":[{"name":"National Research Foundation, Singapore and Infocomm Media Development Authority","award":["DTC-RGC-09"],"award-info":[{"award-number":["DTC-RGC-09"]}]},{"name":"Ministry of Education, Singapore","award":["MOE-T2EP20121-0010"],"award-info":[{"award-number":["MOE-T2EP20121-0010"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2026,4,13]]},"DOI":"10.1145\/3772318.3790539","type":"proceedings-article","created":{"date-parts":[[2026,4,13]],"date-time":"2026-04-13T04:12:28Z","timestamp":1776053548000},"page":"1-28","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["iRULER: Intelligible Rubric-Based User-Defined LLM Evaluation for Revision"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-5118-993X","authenticated-orcid":false,"given":"Jingwen","family":"Bai","sequence":"first","affiliation":[{"name":"Department of Computer Science, National University of Singapore, Singapore, Singapore"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-2179-3455","authenticated-orcid":false,"given":"Wei Soon","family":"Cheong","sequence":"additional","affiliation":[{"name":"Department of Computer Science, National University of Singapore, Singapore, Singapore"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6765-4020","authenticated-orcid":false,"given":"Philippe","family":"Muller","sequence":"additional","affiliation":[{"name":"IRIT, University of Toulouse, Toulouse, 
France"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0543-2414","authenticated-orcid":false,"given":"Brian Y","family":"Lim","sequence":"additional","affiliation":[{"name":"Department of Computer Science, National University of Singapore, Singapore, Singapore"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2026,4,13]]},"reference":[{"key":"e_1_3_3_3_2_2","doi-asserted-by":"publisher","DOI":"10.1145\/3173574.3174156"},{"key":"e_1_3_3_3_3_2","doi-asserted-by":"crossref","unstructured":"Bekir Afsar Kaisa Miettinen and Francisco Ruiz. 2021. Assessing the performance of interactive multiobjective optimization methods: A survey. ACM Computing Surveys (CSUR) 54 4 (2021) 1\u201327.","DOI":"10.1145\/3448301"},{"key":"e_1_3_3_3_4_2","doi-asserted-by":"publisher","DOI":"10.1145\/3613904.3642780"},{"key":"e_1_3_3_3_5_2","doi-asserted-by":"crossref","unstructured":"Rola Ajjawi and David Boud. 2017. Researching feedback dialogue: An interactional analysis approach. Assessment & evaluation in higher education 42 2 (2017) 252\u2013265.","DOI":"10.1080\/02602938.2015.1102863"},{"key":"e_1_3_3_3_6_2","doi-asserted-by":"crossref","unstructured":"Rola Ajjawi Fiona Kent Jaclyn Broadbent Joanna Hong-Meng Tai Margaret Bearman and David Boud. 2022. Feedback that works: A realist review of feedback interventions for written tasks. Studies in Higher Education 47 7 (2022) 1343\u20131356.","DOI":"10.1080\/03075079.2021.1894115"},{"key":"e_1_3_3_3_7_2","doi-asserted-by":"publisher","DOI":"10.1145\/3290605.3300233"},{"key":"e_1_3_3_3_8_2","unstructured":"Judith Arter. 2000. Rubrics Scoring Guides and Performance Criteria: Classroom Tools for Assessing and Improving Student Learning. (2000)."},{"key":"e_1_3_3_3_9_2","volume-title":"Creating & Recognizing Quality Rubrics","author":"Arter J.A.","year":"2007","unstructured":"J.A. Arter and J. Chappuis. 2007. 
Creating & Recognizing Quality Rubrics. Pearson Education. https:\/\/books.google.com.sg\/books?id=AMdKAAAAYAAJ"},{"key":"e_1_3_3_3_10_2","volume-title":"Scoring rubrics in the classroom: Using performance criteria for assessing and improving student performance","author":"Arter Judith","year":"2001","unstructured":"Judith Arter and Jay McTighe. 2001. Scoring rubrics in the classroom: Using performance criteria for assessing and improving student performance. Corwin Press."},{"key":"e_1_3_3_3_11_2","unstructured":"Zahra Ashktorab Michael Desmond Qian Pan James\u00a0M Johnson Martin\u00a0Santillan Cooper Elizabeth\u00a0M Daly Rahul Nair Tejaswini Pedapati Hyo\u00a0Jin Do and Werner Geyer. 2024. Aligning human and LLM judgments: Insights from evalassist on task-specific evaluations and ai-assisted assessment strategy preferences. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2410.00873 (2024)."},{"key":"e_1_3_3_3_12_2","doi-asserted-by":"publisher","DOI":"10.1145\/3746059.3747740"},{"key":"e_1_3_3_3_13_2","doi-asserted-by":"crossref","unstructured":"Seyyed\u00a0Kazem Banihashem Nafiseh\u00a0Taghizadeh Kerman Omid Noroozi Jewoong Moon and Hendrik Drachsler. 2024. Feedback sources in essay writing: peer-generated or AI-generated feedback? International Journal of Educational Technology in Higher Education 21 1 (2024) 23.","DOI":"10.1186\/s41239-024-00455-4"},{"key":"e_1_3_3_3_14_2","doi-asserted-by":"crossref","unstructured":"Margaret Bearman and Rola Ajjawi. 2021. Can a rubric do more than be transparent? Invitation as a new metaphor for assessment criteria. Studies in Higher Education 46 2 (2021) 359\u2013368.","DOI":"10.1080\/03075079.2019.1637842"},{"key":"e_1_3_3_3_15_2","volume-title":"Advertising & promotion","author":"Belch George\u00a0E","year":"2016","unstructured":"George\u00a0E Belch and Michael\u00a0A Belch. 2016. Advertising & promotion. The McGraw-Hill\/Irwin Series in Marketing."},{"key":"e_1_3_3_3_16_2","unstructured":"Lloyd\u00a0F Bitzer. 1968. 
The rhetorical situation. Philosophy & rhetoric (1968) 1\u201314."},{"key":"e_1_3_3_3_17_2","doi-asserted-by":"publisher","DOI":"10.1145\/3613904.3642689"},{"key":"e_1_3_3_3_18_2","doi-asserted-by":"crossref","unstructured":"Ljubi\u0161a Boji\u0107 Olga Zagovora Asta Zelenkauskaite Vuk Vukovi\u0107 Milan \u010cabarkapa Selma Veseljevi\u0107\u00a0Jerkovi\u0107 and Ana Jovan\u010devi\u0107. 2025. Comparing large Language models and human annotators in latent content analysis of sentiment political leaning emotional intensity and sarcasm. Scientific reports 15 1 (2025) 11477.","DOI":"10.1038\/s41598-025-96508-3"},{"key":"e_1_3_3_3_19_2","doi-asserted-by":"crossref","unstructured":"Saskia Brand-Gruwel Yvonne Kammerer Ludo Van\u00a0Meeuwen and Tamara Van\u00a0Gog. 2017. Source evaluation of domain experts and novices during Web search. Journal of computer assisted learning 33 3 (2017) 234\u2013251.","DOI":"10.1111\/jcal.12162"},{"key":"e_1_3_3_3_20_2","volume-title":"How to create and use rubrics for formative assessment and grading","author":"Brookhart Susan\u00a0M","year":"2013","unstructured":"Susan\u00a0M Brookhart. 2013. How to create and use rubrics for formative assessment and grading. Ascd."},{"key":"e_1_3_3_3_21_2","doi-asserted-by":"crossref","unstructured":"Susan\u00a0M Brookhart and Fei Chen. 2015. The quality and effectiveness of descriptive rubrics. Educational Review 67 3 (2015) 343\u2013368.","DOI":"10.1080\/00131911.2014.929565"},{"key":"e_1_3_3_3_22_2","unstructured":"Tom Brown Benjamin Mann Nick Ryder Melanie Subbiah Jared\u00a0D Kaplan Prafulla Dhariwal Arvind Neelakantan Pranav Shyam Girish Sastry Amanda Askell et\u00a0al. 2020. Language models are few-shot learners. 
Advances in neural information processing systems 33 (2020) 1877\u20131901."},{"key":"e_1_3_3_3_23_2","doi-asserted-by":"publisher","DOI":"10.1145\/3544548.3580959"},{"key":"e_1_3_3_3_24_2","doi-asserted-by":"publisher","DOI":"10.1145\/3613904.3642731"},{"key":"e_1_3_3_3_25_2","doi-asserted-by":"publisher","DOI":"10.1145\/3459043.3459065"},{"key":"e_1_3_3_3_26_2","doi-asserted-by":"crossref","unstructured":"Tsung-Ping Chen and Li Su. 2021. Attend to chords: Improving harmonic analysis of symbolic music using transformer-based models. Transactions of the International Society for Music Information Retrieval 4 1 (2021).","DOI":"10.5334\/tismir.65"},{"key":"e_1_3_3_3_27_2","doi-asserted-by":"publisher","DOI":"10.1145\/3613904.3641904"},{"key":"e_1_3_3_3_28_2","doi-asserted-by":"crossref","unstructured":"Michelene\u00a0TH Chi and Ruth Wylie. 2014. The ICAP framework: Linking cognitive engagement to active learning outcomes. Educational psychologist 49 4 (2014) 219\u2013243.","DOI":"10.1080\/00461520.2014.965823"},{"key":"e_1_3_3_3_29_2","unstructured":"Cheng-Han Chiang and Hung-yi Lee. 2023. Can large language models be an alternative to human evaluations? arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2305.01937 (2023)."},{"key":"e_1_3_3_3_30_2","doi-asserted-by":"crossref","unstructured":"Heeryung Choi Jelena Jovanovic Oleksandra Poquet Christopher Brooks Sre\u0107ko Joksimovi\u0107 and Joseph\u00a0Jay Williams. 2023. The benefit of reflection prompts for encouraging learning with hints in an online programming course. The Internet and Higher Education 58 (2023) 100903.","DOI":"10.1016\/j.iheduc.2023.100903"},{"key":"e_1_3_3_3_31_2","doi-asserted-by":"publisher","unstructured":"Daisy Cristine\u00a0Albuquerque da Silva C. Mello and Ana Cristina\u00a0Bicharra Garcia. 2024. Analysis of the Effectiveness of Large Language Models in Assessing Argumentative Writing and Generating Feedback. (2024) 573\u2013582. 
10.5220\/0012466600003636","DOI":"10.5220\/0012466600003636"},{"key":"e_1_3_3_3_32_2","doi-asserted-by":"publisher","DOI":"10.4135\/9781452230115"},{"key":"e_1_3_3_3_33_2","doi-asserted-by":"crossref","unstructured":"Phillip Dawson Michael Henderson Paige Mahoney Michael Phillips Tracii Ryan David Boud and Elizabeth Molloy. 2019. What makes for effective feedback: Staff and student perspectives. Assessment & Evaluation in Higher Education 44 1 (2019) 25\u201336.","DOI":"10.1080\/02602938.2018.1467877"},{"key":"e_1_3_3_3_34_2","doi-asserted-by":"crossref","unstructured":"Linqian Ding and Di Zou. 2024. Automated writing evaluation systems: A systematic review of Grammarly Pigai and Criterion with a perspective on future directions in the age of generative artificial intelligence. Education and Information Technologies 29 11 (2024) 14151\u201314203.","DOI":"10.1007\/s10639-023-12402-3"},{"key":"e_1_3_3_3_35_2","doi-asserted-by":"publisher","DOI":"10.1145\/2858036.2858405"},{"key":"e_1_3_3_3_36_2","doi-asserted-by":"publisher","DOI":"10.1145\/3411764.3445188"},{"key":"e_1_3_3_3_37_2","doi-asserted-by":"publisher","DOI":"10.1145\/3301275.3302316"},{"key":"e_1_3_3_3_38_2","doi-asserted-by":"publisher","unstructured":"Estela Ene and Virginia Kosobucki. 2016. Rubrics and corrective feedback in ESL writing: A longitudinal case study of an L2 writer. Assessing Writing 30 (2016) 3\u201320. 10.1016\/J.ASW.2016.06.003","DOI":"10.1016\/J.ASW.2016.06.003"},{"key":"e_1_3_3_3_39_2","doi-asserted-by":"crossref","unstructured":"Erkan Er Yannis Dimitriadis and Dragan Ga\u0161evi\u0107. 2021. A collaborative learning approach to dialogic peer feedback: a theoretical framework. Assessment & Evaluation in Higher Education 46 4 (2021) 586\u2013600.","DOI":"10.1080\/02602938.2020.1786497"},{"key":"e_1_3_3_3_40_2","doi-asserted-by":"crossref","unstructured":"Juan Escalante Austin Pack and Alex Barrett. 2023. 
AI-generated feedback on writing: Insights into efficacy and ENL student preference. International Journal of Educational Technology in Higher Education 20 1 (2023) 57.","DOI":"10.1186\/s41239-023-00425-2"},{"key":"e_1_3_3_3_41_2","doi-asserted-by":"crossref","unstructured":"Mingming Fan Xianyou Yang TszTung Yu Q\u00a0Vera Liao and Jian Zhao. 2022. Human-ai collaboration for UX evaluation: effects of explanation and synchronization. Proceedings of the ACM on human-computer interaction 6 CSCW1 (2022) 1\u201332.","DOI":"10.1145\/3512943"},{"key":"e_1_3_3_3_42_2","volume-title":"Problem-Solving Strategies for Writing. Third Edition.","author":"Flower Linda","year":"1989","unstructured":"Linda Flower. 1989. Problem-Solving Strategies for Writing. Third Edition. Harcourt Brace Jovanovich."},{"key":"e_1_3_3_3_43_2","doi-asserted-by":"crossref","unstructured":"Linda Flower and John\u00a0R Hayes. 1980. The cognition of discovery: Defining a rhetorical problem. College Composition & Communication 31 1 (1980) 21\u201332.","DOI":"10.58680\/ccc198015963"},{"key":"e_1_3_3_3_44_2","doi-asserted-by":"publisher","DOI":"10.1145\/3290605.3300619"},{"key":"e_1_3_3_3_45_2","doi-asserted-by":"crossref","unstructured":"Alberto Gandolfi. 2025. GPT-4 in education: Evaluating aptness reliability and loss of coherence in solving calculus problems and grading submissions. International Journal of Artificial Intelligence in Education 35 1 (2025) 367\u2013397.","DOI":"10.1007\/s40593-024-00403-3"},{"key":"e_1_3_3_3_46_2","doi-asserted-by":"crossref","unstructured":"John\u00a0Maurice Gayed May Kristine\u00a0Jonson Carlon Angelu\u00a0Mari Oriola and Jeffrey\u00a0S Cross. 2022. Exploring an AI-based writing Assistant\u2019s impact on English language learners. 
Computers and Education: Artificial Intelligence 3 (2022) 100055.","DOI":"10.1016\/j.caeai.2022.100055"},{"key":"e_1_3_3_3_47_2","doi-asserted-by":"publisher","DOI":"10.1145\/3729176.3729199"},{"key":"e_1_3_3_3_48_2","doi-asserted-by":"publisher","DOI":"10.1145\/3532106.3533533"},{"key":"e_1_3_3_3_49_2","doi-asserted-by":"crossref","unstructured":"Mario Gielen and Bram De\u00a0Wever. 2015. Structuring peer assessment: Comparing the impact of the degree of structure on peer feedback content. Computers in Human Behavior 52 (2015) 315\u2013325.","DOI":"10.1016\/j.chb.2015.06.019"},{"key":"e_1_3_3_3_50_2","unstructured":"Jiawei Gu Xuhui Jiang Zhichao Shi Hexiang Tan Xuehao Zhai Chengjin Xu Wei Li Yinghan Shen Shengjie Ma Honghao Liu et\u00a0al. 2024. A survey on llm-as-a-judge. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2411.15594 (2024)."},{"key":"e_1_3_3_3_51_2","doi-asserted-by":"publisher","DOI":"10.1145\/3681758.3698010"},{"key":"e_1_3_3_3_52_2","doi-asserted-by":"crossref","unstructured":"Ali Alsagheer\u00a0Abdelal Hasan. 2022. Effect of rubric-based feedback on the writing skills of high school graders. Journal of Innovation in Educational and Cultural Research 3 1 (2022) 49\u201358.","DOI":"10.46843\/jiecr.v3i1.52"},{"key":"e_1_3_3_3_53_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.491"},{"key":"e_1_3_3_3_54_2","unstructured":"Helia Hashemi Jason Eisner Corby Rosset Benjamin Van\u00a0Durme and Chris Kedzie. 2024. LLM-rubric: A multidimensional calibrated approach to automated evaluation of natural language texts. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2501.00274 (2024)."},{"key":"e_1_3_3_3_55_2","doi-asserted-by":"crossref","unstructured":"John Hattie and Helen Timperley. 2007. The power of feedback. Review of educational research 77 1 (2007) 81\u2013112.","DOI":"10.3102\/003465430298487"},{"key":"e_1_3_3_3_56_2","unstructured":"Ari Holtzman Jan Buys Li Du Maxwell Forbes and Yejin Choi. 2019. 
The curious case of neural text degeneration. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/1904.09751 (2019)."},{"key":"e_1_3_3_3_57_2","doi-asserted-by":"crossref","unstructured":"Lei Huang Weijiang Yu Weitao Ma Weihong Zhong Zhangyin Feng Haotian Wang Qianglong Chen Weihua Peng Xiaocheng Feng Bing Qin et\u00a0al. 2025. A survey on hallucination in large language models: Principles taxonomy challenges and open questions. ACM Transactions on Information Systems 43 2 (2025) 1\u201355.","DOI":"10.1145\/3703155"},{"key":"e_1_3_3_3_58_2","unstructured":"Daphne Ippolito Ann Yuan Andy Coenen and Sehmon Burnam. 2022. Creative writing with an ai-powered writing assistant: Perspectives from professional writers. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2211.05030 (2022)."},{"key":"e_1_3_3_3_59_2","unstructured":"Shinichiro Ishikawa. 2018. The ICNALE edited essays; A dataset for analysis of L2 English learner essays based on a new integrative viewpoint. English Corpus Studies 25 (2018) 117\u2013130."},{"key":"e_1_3_3_3_60_2","volume-title":"Testing ESL composition: A practical approach. English composition program.","author":"Jacobs Holly\u00a0L","year":"1981","unstructured":"Holly\u00a0L Jacobs et\u00a0al. 1981. Testing ESL composition: A practical approach. English composition program. ERIC."},{"key":"e_1_3_3_3_61_2","doi-asserted-by":"publisher","DOI":"10.1145\/3544548.3581196"},{"key":"e_1_3_3_3_62_2","doi-asserted-by":"crossref","unstructured":"Thorben Jansen Lars H\u00f6ft Luca Bahr Johanna Fleckenstein Jens M\u00f6ller Olaf K\u00f6ller and Jennifer Meyer. 2024. Empirische arbeit: comparing generative AI and expert feedback to students\u2019 writing: insights from student teachers. Psychologie in Erziehung und Unterricht 71 2 (2024) 80\u201392.","DOI":"10.2378\/peu2024.art08d"},{"key":"e_1_3_3_3_63_2","doi-asserted-by":"crossref","unstructured":"Anders Jonsson. 2014. Rubrics as a way of providing transparency in assessment. 
Assessment & Evaluation in Higher Education 39 7 (2014) 840\u2013852.","DOI":"10.1080\/02602938.2013.875117"},{"key":"e_1_3_3_3_64_2","doi-asserted-by":"crossref","unstructured":"Anders Jonsson and Gunilla Svingby. 2007. The use of scoring rubrics: Reliability validity and educational consequences. Educational research review 2 2 (2007) 130\u2013144.","DOI":"10.1016\/j.edurev.2007.05.002"},{"key":"e_1_3_3_3_65_2","doi-asserted-by":"publisher","DOI":"10.1145\/3313831.3376219"},{"key":"e_1_3_3_3_66_2","doi-asserted-by":"crossref","unstructured":"Qusai Khraisha Sophie Put Johanna Kappenberg Azza Warraitch and Kristin Hadfield. 2024. Can large language models replace humans in systematic reviews? Evaluating GPT-4\u2019s efficacy in screening and extracting data from peer-reviewed and grey literature in multiple languages. Research Synthesis Methods 15 4 (2024) 616\u2013626.","DOI":"10.1002\/jrsm.1715"},{"key":"e_1_3_3_3_67_2","volume-title":"The Twelfth International Conference on Learning Representations","author":"Kim Seungone","year":"2023","unstructured":"Seungone Kim, Jamin Shin, Yejin Cho, Joel Jang, Shayne Longpre, Hwaran Lee, Sangdoo Yun, Seongjin Shin, Sungdong Kim, James Thorne, et\u00a0al. 2023. Prometheus: Inducing fine-grained evaluation capability in language models. In The Twelfth International Conference on Learning Representations."},{"key":"e_1_3_3_3_68_2","doi-asserted-by":"publisher","DOI":"10.1145\/3706598.3714020"},{"key":"e_1_3_3_3_69_2","doi-asserted-by":"publisher","DOI":"10.1145\/3544548.3581001"},{"key":"e_1_3_3_3_70_2","doi-asserted-by":"publisher","DOI":"10.1145\/3613904.3642216"},{"key":"e_1_3_3_3_71_2","doi-asserted-by":"crossref","unstructured":"Stephen\u00a0P Klein Brian\u00a0M Stecher Richard\u00a0J Shavelson Daniel McCaffrey Tor Ormseth Robert\u00a0M Bell Kathy Comfort and Abdul\u00a0R Othman. 1998. Analytic versus holistic scoring of science performance tasks. 
Applied Measurement in Education 11 2 (1998) 121\u2013137.","DOI":"10.1207\/s15324818ame1102_1"},{"key":"e_1_3_3_3_72_2","unstructured":"Klaus Krippendorff. 2011. Computing Krippendorff\u2019s alpha-reliability. (2011)."},{"key":"e_1_3_3_3_73_2","volume-title":"Content analysis: An introduction to its methodology","author":"Krippendorff Klaus","year":"2018","unstructured":"Klaus Krippendorff. 2018. Content analysis: An introduction to its methodology. Sage publications."},{"key":"e_1_3_3_3_74_2","doi-asserted-by":"publisher","DOI":"10.3389\/feduc.2020.572367"},{"key":"e_1_3_3_3_75_2","doi-asserted-by":"publisher","DOI":"10.1145\/3654777.3676419"},{"key":"e_1_3_3_3_76_2","doi-asserted-by":"crossref","unstructured":"Vivian Lai Yiming Zhang Chacha Chen Q\u00a0Vera Liao and Chenhao Tan. 2023. Selective explanations: Leveraging human input to align explainable ai. Proceedings of the ACM on Human-Computer Interaction 7 CSCW2 (2023) 1\u201335.","DOI":"10.1145\/3610206"},{"key":"e_1_3_3_3_77_2","doi-asserted-by":"crossref","unstructured":"J\u00a0Richard Landis and Gary\u00a0G Koch. 1977. An application of hierarchical kappa-type statistics in the assessment of majority agreement among multiple observers. Biometrics (1977) 363\u2013374.","DOI":"10.2307\/2529786"},{"key":"e_1_3_3_3_78_2","doi-asserted-by":"crossref","unstructured":"J\u00a0Richard Landis and Gary\u00a0G Koch. 1977. The measurement of observer agreement for categorical data. biometrics (1977) 159\u2013174.","DOI":"10.2307\/2529310"},{"key":"e_1_3_3_3_79_2","doi-asserted-by":"publisher","DOI":"10.1145\/3613904.3642697"},{"key":"e_1_3_3_3_80_2","doi-asserted-by":"publisher","DOI":"10.1145\/3491102.3502030"},{"key":"e_1_3_3_3_81_2","doi-asserted-by":"crossref","unstructured":"Min\u00a0Hun Lee and Chong\u00a0Jun Chew. 2023. Understanding the effect of counterfactual explanations on trust and reliance on ai for human-ai collaborative clinical decision making. 
Proceedings of the ACM on human-computer interaction 7 CSCW2 (2023) 1\u201322.","DOI":"10.1145\/3610218"},{"key":"e_1_3_3_3_82_2","doi-asserted-by":"crossref","unstructured":"Tiffany\u00a0I Leung Taiane de Azevedo\u00a0Cardoso Amaryllis Mavragani and Gunther Eysenbach. 2023. Best practices for using AI tools as an author peer reviewer or editor. e51584\u00a0pages.","DOI":"10.2196\/51584"},{"key":"e_1_3_3_3_83_2","doi-asserted-by":"publisher","DOI":"10.1145\/3706598.3713241"},{"key":"e_1_3_3_3_84_2","doi-asserted-by":"publisher","DOI":"10.1145\/3313831.3376590"},{"key":"e_1_3_3_3_85_2","doi-asserted-by":"publisher","DOI":"10.1145\/1620545.1620576"},{"key":"e_1_3_3_3_86_2","doi-asserted-by":"publisher","DOI":"10.1145\/2037373.2037399"},{"key":"e_1_3_3_3_87_2","doi-asserted-by":"publisher","DOI":"10.1145\/1518701.1519023"},{"key":"e_1_3_3_3_88_2","unstructured":"Bill\u00a0Yuchen Lin Yuntian Deng Khyathi Chandu Faeze Brahman Abhilasha Ravichander Valentina Pyatkin Nouha Dziri Ronan\u00a0Le Bras and Yejin Choi. 2024. Wildbench: Benchmarking llms with challenging tasks from real users in the wild. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2406.04770 (2024)."},{"key":"e_1_3_3_3_89_2","unstructured":"Jieru Lin Danqing Huang Tiejun Zhao Dechen Zhan and Chin-Yew Lin. 2024. Designprobe: A graphic design benchmark for multimodal large language models. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2404.14801 (2024)."},{"key":"e_1_3_3_3_90_2","doi-asserted-by":"publisher","unstructured":"A. Lipnevich E. Panadero and Terrence Calistro. 2022. Unraveling the effects of rubrics and exemplars on student writing performance. Journal of experimental psychology. Applied (2022). 10.1037\/xap0000434","DOI":"10.1037\/xap0000434"},{"key":"e_1_3_3_3_91_2","doi-asserted-by":"crossref","unstructured":"Jie Liu Kim Marriott Tim Dwyer and Guido Tack. 2023. Increasing user trust in optimisation through feedback and interaction. 
ACM Transactions on Computer-Human Interaction 29 5 (2023) 1\u201334.","DOI":"10.1145\/3503461"},{"key":"e_1_3_3_3_92_2","unstructured":"Shansong Liu Atin\u00a0Sakkeer Hussain Qilong Wu Chenshuo Sun and Ying Shan. 2024. Mumu-llama: Multi-modal music understanding and generation via large language models. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2412.06660 3 5 (2024) 6."},{"key":"e_1_3_3_3_93_2","doi-asserted-by":"crossref","unstructured":"Yang Liu Dan Iter Yichong Xu Shuohang Wang Ruochen Xu and Chenguang Zhu. 2023. G-eval: NLG evaluation using gpt-4 with better human alignment. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2303.16634 (2023).","DOI":"10.18653\/v1\/2023.emnlp-main.153"},{"key":"e_1_3_3_3_94_2","unstructured":"Yang Liu Yuanshun Yao Jean-Francois Ton Xiaoying Zhang Ruocheng Guo Hao Cheng Yegor Klochkov Muhammad\u00a0Faaiz Taufiq and Hang Li. 2023. Trustworthy llms: a survey and guideline for evaluating large language models\u2019 alignment. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2308.05374 (2023)."},{"key":"e_1_3_3_3_95_2","doi-asserted-by":"crossref","unstructured":"Teun Lucassen Rienco Muilwijk Matthijs\u00a0L Noordzij and Jan\u00a0Maarten Schraagen. 2013. Topic familiarity and information skills in online credibility evaluation. Journal of the American Society for Information Science and Technology 64 2 (2013) 254\u2013264.","DOI":"10.1002\/asi.22743"},{"key":"e_1_3_3_3_96_2","doi-asserted-by":"crossref","unstructured":"Yan Lyu Hangxin Lu Min\u00a0Kyung Lee Gerhard Schmitt and Brian\u00a0Y Lim. 2024. IF-City: Intelligible fair city planning to measure explain and mitigate inequality. IEEE Transactions on Visualization and Computer Graphics 30 7 (2024) 3749\u20133766.","DOI":"10.1109\/TVCG.2023.3239909"},{"key":"e_1_3_3_3_97_2","doi-asserted-by":"crossref","unstructured":"Qianou Ma Weirui Peng Chenyang Yang Hua Shen Ken Koedinger and Tongshuang Wu. 2025. What should we engineer in prompts? 
training humans in requirement-driven llm use. ACM Transactions on Computer-Human Interaction 32 4 (2025) 1\u201327.","DOI":"10.1145\/3731756"},{"key":"e_1_3_3_3_98_2","doi-asserted-by":"crossref","unstructured":"Aman Madaan Niket Tandon Prakhar Gupta Skyler Hallinan Luyu Gao Sarah Wiegreffe Uri Alon Nouha Dziri Shrimai Prabhumoye Yiming Yang et\u00a0al. 2023. Self-refine: Iterative refinement with self-feedback. Advances in Neural Information Processing Systems 36 (2023) 46534\u201346594.","DOI":"10.52202\/075280-2019"},{"key":"e_1_3_3_3_99_2","doi-asserted-by":"publisher","DOI":"10.1145\/3581641.3584048"},{"key":"e_1_3_3_3_100_2","doi-asserted-by":"publisher","unstructured":"Jennifer Meyer Thorben Jansen Ronja Schiller Lucas Liebenow Marlene Steinbach Andrea Horbach and Johanna Fleckenstein. 2023. Using LLMs to bring evidence-based feedback into the classroom: AI-generated feedback increases secondary students\u2019 text revision motivation and positive emotions. Comput. Educ. Artif. Intell. 6 (2023) 100199. 10.1016\/j.caeai.2023.100199","DOI":"10.1016\/j.caeai.2023.100199"},{"key":"e_1_3_3_3_101_2","doi-asserted-by":"crossref","unstructured":"Jennifer Meyer Thorben Jansen Ronja Schiller Lucas\u00a0W Liebenow Marlene Steinbach Andrea Horbach and Johanna Fleckenstein. 2024. Using LLMs to bring evidence-based feedback into the classroom: AI-generated feedback increases secondary students\u2019 text revision motivation and positive emotions. Computers and Education: Artificial Intelligence 6 (2024) 100199.","DOI":"10.1016\/j.caeai.2023.100199"},{"key":"e_1_3_3_3_102_2","doi-asserted-by":"crossref","unstructured":"Katelyn Morrison Philipp Spitzer Violet Turri Michelle Feng Niklas K\u00fchl and Adam Perer. 2024. The impact of imperfect XAI on human-AI decision-making. 
Proceedings of the ACM on human-computer interaction 8 CSCW1 (2024) 1\u201339.","DOI":"10.1145\/3641022"},{"key":"e_1_3_3_3_103_2","doi-asserted-by":"publisher","DOI":"10.1145\/3351095.3372850"},{"key":"e_1_3_3_3_104_2","unstructured":"B.\u00a0B. Mullinix. 2002. A Rubric for Rubrics. Monmouth University Faculty Resource Center. https:\/\/www.bates.edu\/research\/files\/2018\/07\/A-Rubric-for-Rubrics.pdf"},{"key":"e_1_3_3_3_105_2","doi-asserted-by":"publisher","DOI":"10.1145\/3173574.3173629"},{"key":"e_1_3_3_3_106_2","doi-asserted-by":"publisher","DOI":"10.1145\/3491102.3517667"},{"key":"e_1_3_3_3_107_2","doi-asserted-by":"publisher","DOI":"10.1145\/2702123.2702149"},{"key":"e_1_3_3_3_108_2","doi-asserted-by":"publisher","unstructured":"Ernesto Panadero and Anders J\u00f6nsson. 2013. The Use of Scoring Rubrics for Formative Assessment Purposes Revisited: A Review. Educational Research Review 9 (01 2013) 129\u2013144. 10.1016\/j.edurev.2013.01.002","DOI":"10.1016\/j.edurev.2013.01.002"},{"key":"e_1_3_3_3_109_2","doi-asserted-by":"publisher","DOI":"10.1145\/3706598.3713726"},{"key":"e_1_3_3_3_110_2","doi-asserted-by":"crossref","unstructured":"Cecilia Panigutti Andrea Beretta Daniele Fadda Fosca Giannotti Dino Pedreschi Alan Perotti and Salvatore Rinzivillo. 2023. Co-design of human-centered explainable AI for clinical decision support. ACM Transactions on Interactive Intelligent Systems 13 4 (2023) 1\u201335.","DOI":"10.1145\/3587271"},{"key":"e_1_3_3_3_111_2","unstructured":"Samir Passi and Mihaela Vorvoreanu. 2022. Overreliance on AI literature review. Microsoft Research 339 (2022) 340."},{"key":"e_1_3_3_3_112_2","unstructured":"PI Pavlik Keith Brawner Andrew Olney and Antonija Mitrovic. 2013. A review of student models used in intelligent tutoring systems. Design recommendations for intelligent tutoring systems 1 (2013) 39\u201368."},{"key":"e_1_3_3_3_113_2","doi-asserted-by":"crossref","unstructured":"Ann Poulos and Mary\u00a0Jane Mahony. 2008. 
Effectiveness of feedback: The students\u2019 perspective. Assessment & Evaluation in Higher Education 33 2 (2008) 143\u2013154.","DOI":"10.1080\/02602930601127869"},{"key":"e_1_3_3_3_114_2","doi-asserted-by":"publisher","unstructured":"Margaret Price Karen Handley Jill Millar and Berry O\u2019Donovan. 2010. Feedback : all that effort but what is the effect? Assessment & Evaluation in Higher Education 35 3 (2010) 277\u2013289. 10.1080\/02602930903541007","DOI":"10.1080\/02602930903541007"},{"key":"e_1_3_3_3_115_2","doi-asserted-by":"crossref","unstructured":"Hanieh\u00a0Shafiee Rad Rasoul Alipour and Aliakbar Jafarpour. 2024. Using artificial intelligence to foster students\u2019 writing feedback literacy engagement and outcome: A case of Wordtune application. Interactive Learning Environments 32 9 (2024) 5020\u20135040.","DOI":"10.1080\/10494820.2023.2208170"},{"key":"e_1_3_3_3_116_2","doi-asserted-by":"crossref","unstructured":"Y\u00a0Malini Reddy and Heidi Andrade. 2010. A review of rubric use in higher education. Assessment & evaluation in higher education 35 4 (2010) 435\u2013448.","DOI":"10.1080\/02602930902862859"},{"key":"e_1_3_3_3_117_2","doi-asserted-by":"crossref","unstructured":"Jeba Rezwana and Mary\u00a0Lou Maher. 2023. Designing creative AI partners with COFI: A framework for modeling interaction in human-AI co-creative systems. ACM Transactions on Computer-Human Interaction 30 5 (2023) 1\u201328.","DOI":"10.1145\/3519026"},{"key":"e_1_3_3_3_118_2","doi-asserted-by":"crossref","unstructured":"Joan\u00a0M Sargeant Karen\u00a0V Mann Cees\u00a0P Van\u00a0der Vleuten and Job\u00a0F Metsemakers. 2009. Reflection: a link between receiving and using assessment feedback. 
Advances in health sciences education 14 3 (2009) 399\u2013410.","DOI":"10.1007\/s10459-008-9124-4"},{"key":"e_1_3_3_3_119_2","doi-asserted-by":"publisher","DOI":"10.1145\/3654777.3676450"},{"key":"e_1_3_3_3_120_2","doi-asserted-by":"publisher","DOI":"10.1145\/1099554.1099747"},{"key":"e_1_3_3_3_121_2","doi-asserted-by":"crossref","unstructured":"Sungbok Shin Sanghyun Hong and Niklas Elmqvist. 2025. Visualizationary: Automating design feedback for visualization designers using llms. IEEE Transactions on Visualization and Computer Graphics (2025).","DOI":"10.1109\/TVCG.2025.3579700"},{"key":"e_1_3_3_3_122_2","doi-asserted-by":"crossref","unstructured":"Jeffrey\u00a0S Smith. 2017. Assessing creativity: Creating a rubric to effectively evaluate mediated digital portfolios. Journalism & Mass Communication Educator 72 1 (2017) 24\u201336.","DOI":"10.1177\/1077695816648866"},{"key":"e_1_3_3_3_123_2","unstructured":"Changhao Song Yazhou Zhang Hui Gao Ben Yao and Peng Zhang. 2025. Large language models for subjective language understanding: A survey. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2508.07959 (2025)."},{"key":"e_1_3_3_3_124_2","unstructured":"Kaya Stechly Matthew Marquez and Subbarao Kambhampati. 2023. Gpt-4 doesn\u2019t know it\u2019s wrong: An analysis of iterative prompting for reasoning problems. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2310.12397 (2023)."},{"key":"e_1_3_3_3_125_2","doi-asserted-by":"publisher","unstructured":"Jacob Steiss Tamara Tate Steve Graham Jazmin Cruz Michael Hebert Jiali Wang Youngsun Moon Waverly Tseng M. Warschauer and Carol\u00a0Booth Olson. 2024. Comparing the quality of human and ChatGPT feedback of students\u2019 writing. Learning and Instruction (2024). 
10.1016\/j.learninstruc.2024.101894","DOI":"10.1016\/j.learninstruc.2024.101894"},{"key":"e_1_3_3_3_126_2","volume-title":"Introduction to rubrics: An assessment tool to save grading time, convey effective feedback, and promote student learning","author":"Stevens Dannelle\u00a0D","year":"2023","unstructured":"Dannelle\u00a0D Stevens. 2023. Introduction to rubrics: An assessment tool to save grading time, convey effective feedback, and promote student learning. Routledge."},{"key":"e_1_3_3_3_127_2","doi-asserted-by":"publisher","DOI":"10.1145\/3640543.3645159"},{"key":"e_1_3_3_3_128_2","unstructured":"Annalisa Szymanski Simret\u00a0Araya Gebreegziabher Oghenemaro Anuyah Ronald\u00a0A Metoyer and Toby Jia-Jun Li. 2024. Comparing criteria development across domain experts lay users and models in large language model evaluation. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2410.02054 (2024)."},{"key":"e_1_3_3_3_129_2","doi-asserted-by":"publisher","DOI":"10.1145\/3708359.3712091"},{"key":"e_1_3_3_3_130_2","doi-asserted-by":"crossref","unstructured":"Xiaoyi Tang Hongwei Chen Daoyu Lin and Kexin Li. 2024. Harnessing LLMs for multi-dimensional writing assessment: Reliability and alignment with human judgments. Heliyon 10 14 (2024).","DOI":"10.1016\/j.heliyon.2024.e34262"},{"key":"e_1_3_3_3_131_2","doi-asserted-by":"crossref","unstructured":"Miles Turpin Julian Michael Ethan Perez and Samuel Bowman. 2023. Language models don\u2019t always say what they think: Unfaithful explanations in chain-of-thought prompting. Advances in Neural Information Processing Systems 36 (2023) 74952\u201374965.","DOI":"10.52202\/075280-3275"},{"key":"e_1_3_3_3_132_2","unstructured":"Satoru Uchida. 2024. Evaluating the accuracy of ChatGPT in assessing writing and speaking: A verification study using ICNALE GRA. 
Learner Corpus Studies in Asia and the World 6 (2024) 1\u201312."},{"key":"e_1_3_3_3_133_2","unstructured":"Manya Wadhwa Zayne Sprague Chaitanya Malaviya Philippe Laban Junyi\u00a0Jessy Li and Greg Durrett. 2025. Evalagent: Discovering implicit evaluation criteria from the web. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2504.15219 (2025)."},{"key":"e_1_3_3_3_134_2","unstructured":"Xingyao Wang Zihan Wang Jiateng Liu Yangyi Chen Lifan Yuan Hao Peng and Heng Ji. 2023. Mint: Evaluating LLMs in multi-turn interaction with tools and language feedback. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2309.10691 (2023)."},{"key":"e_1_3_3_3_135_2","doi-asserted-by":"publisher","DOI":"10.1145\/3491102.3517551"},{"key":"e_1_3_3_3_136_2","doi-asserted-by":"publisher","DOI":"10.1145\/3544548.3580816"},{"key":"e_1_3_3_3_137_2","doi-asserted-by":"crossref","unstructured":"Monika Westphal Michael V\u00f6ssing Gerhard Satzger Galit\u00a0B Yom-Tov and Anat Rafaeli. 2023. Decision control and explanations in human-AI collaboration: Improving user perceptions and compliance. Computers in Human Behavior 144 (2023) 107714.","DOI":"10.1016\/j.chb.2023.107714"},{"key":"e_1_3_3_3_138_2","doi-asserted-by":"crossref","unstructured":"Naomi\u00a0E Winstone Robert\u00a0A Nash Michael Parker and James Rowntree. 2017. Supporting learners\u2019 agentic engagement with feedback: A systematic review and a taxonomy of recipience processes. Educational psychologist 52 1 (2017) 17\u201337.","DOI":"10.1080\/00461520.2016.1207538"},{"key":"e_1_3_3_3_139_2","unstructured":"Kenneth Wolf and Ellen Stevens. 2007. The role of rubrics in advancing and assessing student learning. Journal of effective teaching 7 1 (2007) 3\u201314."},{"key":"e_1_3_3_3_140_2","doi-asserted-by":"crossref","unstructured":"Mareike Wollenschl\u00e4ger John Hattie Nils Machts Jens M\u00f6ller and Ute Harms. 2016. What makes rubrics effective in teacher-feedback? Transparency of learning goals is not enough. 
Contemporary Educational Psychology 44 (2016) 1\u201311.","DOI":"10.1016\/j.cedpsych.2015.11.003"},{"key":"e_1_3_3_3_141_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52734.2025.02301"},{"key":"e_1_3_3_3_142_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-78462-1_13"},{"key":"e_1_3_3_3_143_2","unstructured":"Wenjing Xie Juxin Niu Chun\u00a0Jason Xue and Nan Guan. 2024. Grade like a human: Rethinking automated assessment with large language models. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2405.19694 (2024)."},{"key":"e_1_3_3_3_144_2","doi-asserted-by":"crossref","unstructured":"Wei Xu Marvin\u00a0J Dainoff Liezhong Ge and Zaifeng Gao. 2023. Transitioning to human interaction with AI systems: New challenges and opportunities for HCI professionals to enable human-centered AI. International Journal of Human\u2013Computer Interaction 39 3 (2023) 494\u2013518.","DOI":"10.1080\/10447318.2022.2041900"},{"key":"e_1_3_3_3_145_2","doi-asserted-by":"crossref","unstructured":"Da Yan. 2024. Rubric co-creation to promote quality interactivity and uptake of peer feedback. Assessment & Evaluation in Higher Education 49 8 (2024) 1017\u20131034.","DOI":"10.1080\/02602938.2024.2333005"},{"key":"e_1_3_3_3_146_2","doi-asserted-by":"publisher","DOI":"10.1145\/3313831.3376301"},{"key":"e_1_3_3_3_147_2","doi-asserted-by":"publisher","DOI":"10.1145\/3654777.3676357"},{"key":"e_1_3_3_3_148_2","doi-asserted-by":"crossref","unstructured":"So\u00a0Jung Yune Sang\u00a0Yeoup Lee Sun\u00a0Ju Im Bee\u00a0Sung Kam and Sun\u00a0Yong Baek. 2018. Holistic rubric vs. analytic rubric for measuring clinical performance levels in medical students. 
BMC medical education 18 1 (2018) 124.","DOI":"10.1186\/s12909-018-1228-9"},{"key":"e_1_3_3_3_149_2","doi-asserted-by":"publisher","DOI":"10.1145\/3706598.3714316"},{"key":"e_1_3_3_3_150_2","doi-asserted-by":"publisher","DOI":"10.1145\/3491102.3501826"},{"key":"e_1_3_3_3_151_2","unstructured":"Xingxuan Zhang Jiansheng Li Wenjing Chu Junjia Hai Renzhe Xu Yuqing Yang Shikai Guan Jiazheng Xu and Peng Cui. 2024. On the out-of-distribution generalization of multimodal large language models. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2402.06599 (2024)."},{"key":"e_1_3_3_3_152_2","unstructured":"Yue Zhang Yafu Li Leyang Cui Deng Cai Lemao Liu Tingchen Fu Xinting Huang Enbo Zhao Yu Zhang Yulong Chen et\u00a0al. 2025. Siren\u2019s Song in the AI Ocean: A Survey on Hallucination in Large Language Models. Computational Linguistics (2025) 1\u201346."},{"key":"e_1_3_3_3_153_2","doi-asserted-by":"publisher","DOI":"10.1145\/3586183.3606800"},{"key":"e_1_3_3_3_154_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2025.emnlp-main.483"},{"key":"e_1_3_3_3_155_2","doi-asserted-by":"crossref","unstructured":"Shuang Zhou Mingquan Lin Sirui Ding Jiashuo Wang Canyu Chen Genevieve\u00a0B Melton James Zou and Rui Zhang. 2025. Explainable differential diagnosis with dual-inference large language models. 
npj Health Systems 2 1 (2025) 12.","DOI":"10.1038\/s44401-025-00015-6"}],"event":{"name":"CHI 2026: CHI Conference on Human Factors in Computing Systems","location":"Barcelona Spain","acronym":"CHI '26","sponsor":["SIGCHI ACM Special Interest Group on Computer-Human Interaction"]},"container-title":["Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3772318.3790539","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,17]],"date-time":"2026-04-17T10:29:48Z","timestamp":1776421788000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3772318.3790539"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,4,13]]},"references-count":154,"alternative-id":["10.1145\/3772318.3790539","10.1145\/3772318"],"URL":"https:\/\/doi.org\/10.1145\/3772318.3790539","relation":{},"subject":[],"published":{"date-parts":[[2026,4,13]]},"assertion":[{"value":"2026-04-13","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}