{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,25]],"date-time":"2026-03-25T00:50:57Z","timestamp":1774399857913,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":38,"publisher":"ACM","funder":[{"DOI":"10.13039\/501100006374","name":"National Science Foundation","doi-asserted-by":"publisher","award":["1846017"],"award-info":[{"award-number":["1846017"]}],"id":[{"id":"10.13039\/501100006374","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2025,7,18]]},"DOI":"10.1145\/3731120.3744591","type":"proceedings-article","created":{"date-parts":[[2025,7,18]],"date-time":"2025-07-18T13:34:06Z","timestamp":1752845646000},"page":"254-263","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":4,"title":["Criteria-Based LLM Relevance Judgments"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0009-0000-3297-8888","authenticated-orcid":false,"given":"Naghmeh","family":"Farzi","sequence":"first","affiliation":[{"name":"University of New Hampshire, Durham, NH, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1624-3907","authenticated-orcid":false,"given":"Laura","family":"Dietz","sequence":"additional","affiliation":[{"name":"University of New Hampshire, Durham, NH, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,7,18]]},"reference":[{"key":"e_1_3_2_1_1_1","volume-title":"Can we use large language models to fill relevance judgment holes? arXiv preprint arXiv:2405.05600","author":"Abbasiantaeb Zahra","year":"2024","unstructured":"Zahra Abbasiantaeb, Chuan Meng, Leif Azzopardi, and Mohammad Aliannejadi. Can we use large language models to fill relevance judgment holes? arXiv preprint arXiv:2405.05600, 2024."},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/3673791.3698431"},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/3726302.3730305"},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0306-4573(97)00078-2"},{"key":"e_1_3_2_1_5_1","volume-title":"The concept of relevance in IR. Department of Information Studies","author":"Borlund Pia","year":"2003","unstructured":"Pia Borlund. The concept of relevance in IR. Department of Information Studies, Royal School of Library and Information Science, 2003."},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1016\/0306-4573(82)90033-4"},{"key":"e_1_3_2_1_7_1","volume-title":"Proceedings of the 10th International Workshop on Evaluating Information Access (EVIA 2025), a Satellite Workshop of the NTCIR-18 Conference","author":"Charles L.","year":"2025","unstructured":"Charles L. A. Clarke and Laura Dietz. LLM-based relevance assessment still can't replace human relevance assessment. In Proceedings of the 10th International Workshop on Evaluating Information Access (EVIA 2025), a Satellite Workshop of the NTCIR-18 Conference, Tokyo, Japan, June 2025. National Institute of Informatics."},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0306-4573(99)00072-2"},{"key":"e_1_3_2_1_9_1","volume-title":"Daniel Campos. Overview of the TREC 2020 deep learning track. In Proceedings of the Twenty-Ninth Text REtrieval Conference (TREC 2020","volume":"1266","author":"Craswell Nick","year":"2020","unstructured":"Nick Craswell, Bhaskar Mitra, Emine Yilmaz, and Daniel Campos. Overview of the TREC 2020 deep learning track. In Proceedings of the Twenty-Ninth Text REtrieval Conference (TREC 2020), volume 1266 of NIST Special Publication, Online (virtual), November 2020. National Institute of Standards and Technology."},{"key":"e_1_3_2_1_10_1","volume-title":"Voorhees. Overview of the TREC 2019 deep learning track. In Proceedings of the Twenty-Eighth Text REtrieval Conference (TREC 2019","volume":"1250","author":"Craswell Nick","year":"2019","unstructured":"Nick Craswell, Bhaskar Mitra, Emine Yilmaz, Daniel Campos, and Ellen M. Voorhees. Overview of the TREC 2019 deep learning track. In Proceedings of the Twenty-Eighth Text REtrieval Conference (TREC 2019), volume 1250 of NIST Special Publication, Gaithersburg, MD, USA, November 2019. National Institute of Standards and Technology."},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/3626772.3657871"},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/3731120.3744588"},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/3578337.3605136"},{"key":"e_1_3_2_1_14_1","volume-title":"Farzi and Laura Dietz. EXAM: LLM-based Answerability Metrics for IR Evaluation. In Proceedings of LLM4Eval: The First Workshop on Large Language Models for Evaluation in Information Retrieval","author":"Naghmeh","year":"2024","unstructured":"Naghmeh Farzi and Laura Dietz. EXAM: LLM-based Answerability Metrics for IR Evaluation. In Proceedings of LLM4Eval: The First Workshop on Large Language Models for Evaluation in Information Retrieval, 2024."},{"key":"e_1_3_2_1_15_1","first-page":"175","volume-title":"Automatic Rubric-based Evaluation of Retrieve\/Generate Systems. In Proceedings of the 2024 ACM SIGIR International Conference on Theory of Information Retrieval","author":"Farzi Naghmeh","year":"2024","unstructured":"Naghmeh Farzi and Laura Dietz. Pencils Down! Automatic Rubric-based Evaluation of Retrieve\/Generate Systems. In Proceedings of the 2024 ACM SIGIR International Conference on Theory of Information Retrieval, pages 175-184, Washington DC USA, August 2024. ACM."},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/3726302.3730317"},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-45442-5_21"},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/3539618.3592032"},{"key":"e_1_3_2_1_19_1","first-page":"320","volume-title":"Proceedings of the 20th International ISCRAM Conference","author":"McCreadie Richard","year":"2023","unstructured":"Richard McCreadie and Cody Buntain. Crisisfacts: building and evaluating crisis timelines. In Proceedings of the 20th International ISCRAM Conference, pages 320-339, 2023."},{"key":"e_1_3_2_1_20_1","volume-title":"Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '24)","author":"Meng Chuan","year":"2024","unstructured":"Chuan Meng, Negar Arabzadeh, Arian Askari, Mohammad Aliannejadi, and Maarten de Rijke. Query performance prediction using relevance judgments generated by large language models. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '24), Washington, DC, USA, July 2024. ACM."},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/2124295.2124343"},{"key":"e_1_3_2_1_22_1","volume-title":"Initial nugget evaluation results for the trec 2024 rag track with the autonuggetizer framework","author":"Pradeep Ronak","year":"2024","unstructured":"Ronak Pradeep, Nandan Thakur, Shivani Upadhyay, Daniel Campos, Nick Craswell, and Jimmy Lin. Initial nugget evaluation results for the trec 2024 rag track with the autonuggetizer framework, 2024."},{"key":"e_1_3_2_1_23_1","volume-title":"Towards understanding bias in synthetic data for evaluation","author":"Rahmani Hossein A.","year":"2025","unstructured":"Hossein A. Rahmani, Varsha Ramineni, Nick Craswell, Bhaskar Mitra, and Emine Yilmaz. Towards understanding bias in synthetic data for evaluation, 2025."},{"key":"e_1_3_2_1_24_1","volume-title":"Judging the judges: A collection of llm-generated relevance judgements","author":"Rahmani Hossein A.","year":"2025","unstructured":"Hossein A. Rahmani, Clemencia Siro, Mohammad Aliannejadi, Nick Craswell, Charles L. A. Clarke, Guglielmo Faggioli, Bhaskar Mitra, Paul Thomas, and Emine Yilmaz. Judging the judges: A collection of llm-generated relevance judgements, 2025."},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/3701716.3715536"},{"key":"e_1_3_2_1_26_1","first-page":"3040","volume-title":"Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '24)","author":"Rahmani Hossein A.","year":"2024","unstructured":"Hossein A. Rahmani, Emine Yilmaz, Nick Craswell, Bhaskar Mitra, Paul Thomas, Charles L. A. Clarke, Mohammad Aliannejadi, Clemencia Siro, and Guglielmo Faggioli. LLMJudge: LLMs for relevance judgments. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '24), pages 3040-3043, Washington, DC, USA, July 2024. ACM."},{"key":"e_1_3_2_1_27_1","volume-title":"Swan: A generic framework for auditing textual conversational systems","author":"Sakai Tetsuya","year":"2023","unstructured":"Tetsuya Sakai. Swan: A generic framework for auditing textual conversational systems, 2023."},{"key":"e_1_3_2_1_28_1","first-page":"136","volume-title":"DESIRES","author":"Sander David P","year":"2021","unstructured":"David P Sander and Laura Dietz. Exam: How to evaluate retrieve-and-generate systems for users who do not (yet) know what they want. In DESIRES, pages 136-146, 2021."},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.5555\/1315930.1315947"},{"issue":"1","key":"e_1_3_2_1_30_1","first-page":"10","article-title":"LLMs to make relevance judgments","volume":"1","author":"Soboroff Ian","unstructured":"Ian Soboroff. Don't use LLMs to make relevance judgments. Information Retrieval Research Journal, 1(1):10.54195\/irrj.19625, March 2025.","journal-title":"Information Retrieval Research Journal"},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.emnlp-main.923"},{"key":"e_1_3_2_1_32_1","volume-title":"Annual International ACM SIGIR Conference on Research and Development in Information Retrieval","author":"Thomas Paul","year":"2023","unstructured":"Paul Thomas, Seth Spielman, Nick Craswell, and Bhaskar Mitra. Large language models can accurately predict searcher preferences. In Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2023."},{"key":"e_1_3_2_1_33_1","volume-title":"Together inference api","author":"Together","year":"2024","unstructured":"Together AI. Together inference api, 2024. [Online Service] Available at: https:\/\/www.together.ai\/."},{"key":"e_1_3_2_1_34_1","volume-title":"Umbrela: Umbrela is the (open-source reproduction of the) bing relevance assessor","author":"Upadhyay Shivani","year":"2024","unstructured":"Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Nick Craswell, and Jimmy Lin. Umbrela: Umbrela is the (open-source reproduction of the) bing relevance assessor, 2024."},{"key":"e_1_3_2_1_35_1","first-page":"24824","volume-title":"Quoc V. Le, and Denny Zhou. Chain-of-thought prompting elicits reasoning in large language models. In Advances in Neural Information Processing Systems 35 (NeurIPS","author":"Wei Jason","year":"2022","unstructured":"Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Huai hsin Chi, Quoc V. Le, and Denny Zhou. Chain-of-thought prompting elicits reasoning in large language models. In Advances in Neural Information Processing Systems 35 (NeurIPS 2022), pages 24824-24837. Curran Associates, 2022."},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.5555\/1133031.1133039"},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/3626772.3657784"},{"key":"e_1_3_2_1_38_1","volume-title":"Proceedings of the 11th International Conference on Learning Representations (ICLR 2023","author":"Zhou Denny","year":"2023","unstructured":"Denny Zhou, Nathanael Sch\u00e4rli, Le Hou, Jason Wei, Nathan Scales, Xuezhi Wang, Dale Schuurmans, Claire Cui, Olivier Bousquet, Quoc Le, and Ed H. Chi. Least-to-most prompting enables complex reasoning in large language models. In Proceedings of the 11th International Conference on Learning Representations (ICLR 2023), Kigali, Rwanda, May 2023. OpenReview."}],"event":{"name":"ICTIR '25: International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval","location":"Padua Italy","acronym":"ICTIR '25","sponsor":["SIGIR ACM Special Interest Group on Information Retrieval"]},"container-title":["Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR)"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3731120.3744591","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,22]],"date-time":"2025-08-22T13:18:05Z","timestamp":1755868685000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3731120.3744591"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,7,18]]},"references-count":38,"alternative-id":["10.1145\/3731120.3744591","10.1145\/3731120"],"URL":"https:\/\/doi.org\/10.1145\/3731120.3744591","relation":{},"subject":[],"published":{"date-parts":[[2025,7,18]]},"assertion":[{"value":"2025-07-18","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}