{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,20]],"date-time":"2026-01-20T04:11:42Z","timestamp":1768882302596,"version":"3.49.0"},"reference-count":58,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2023,8,21]],"date-time":"2023-08-21T00:00:00Z","timestamp":1692576000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"European Union\u2019s Horizon 2020 research and innovation programme","award":["893667"],"award-info":[{"award-number":["893667"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Inf. Syst."],"published-print":{"date-parts":[[2024,1,31]]},"abstract":"<jats:p>\n            The present study leverages a recent opportunity we had to create a new English web search test collection for the NTCIR-16 We Want Web (WWW-4) task, which concluded in June 2022. More specifically, through the test collection construction effort, we examined two factors that may affect the relevance assessments of depth-\n            <jats:italic>k<\/jats:italic>\n            pools, which in turn may affect the relative evaluation of different IR systems. The first factor is the document ordering strategy for the assessors, namely, prioritisation (PRI) and randomisation (RND). PRI is a method that has been used in NTCIR tasks for over a decade; it ranks the pooled documents by a kind of pseudorelevance for the assessors. The second factor is assessor type, i.e., Gold or Bronze. Gold assessors are the topic creators and therefore they \u201cknow\u201d which documents are (highly) relevant and which are not; Bronze assessors are not the topic creators and may lack sufficient knowledge about the topics. 
We believe that our study is unique in that the authors of this article served as the Gold assessors when creating the WWW-4 test collection, which enabled us to closely examine why Bronze assessments differ from the Gold ones. Our research questions examine assessor efficiency (\n            <jats:bold>RQ1<\/jats:bold>\n            ), inter-assessor agreement (\n            <jats:bold>RQ2<\/jats:bold>\n            ), system ranking similarity with different qrels files (\n            <jats:bold>RQ3<\/jats:bold>\n            ), system ranking robustness to the choice of test topics (\n            <jats:bold>RQ4<\/jats:bold>\n            ), and the reasons why Bronze assessors tend to be more liberal than Gold assessors (\n            <jats:bold>RQ5<\/jats:bold>\n            ). The most remarkable of our results are as follows: First, in the comparisons for\n            <jats:bold>RQ1<\/jats:bold>\n            through\n            <jats:bold>RQ4<\/jats:bold>\n            , it turned out that what may matter more than the document ordering strategy (PRI vs. RND) and the assessor type (Gold vs. Bronze) is how well-motivated and\/or well-trained the Bronze assessors are. Second, regarding\n            <jats:bold>RQ5<\/jats:bold>\n            , of the documents originally judged nonrelevant by the Gold assessors contrary to the Bronze assessors in our experiments, almost one half were truly relevant according to the Gold assessors\u2019 own reconsiderations. 
This result suggests that even Gold assessors are far from perfect; budget permitting, it may be beneficial to hire highly motivated Bronze assessors in addition to Gold assessors so they can complement each other.\n          <\/jats:p>","DOI":"10.1145\/3600227","type":"journal-article","created":{"date-parts":[[2023,5,27]],"date-time":"2023-05-27T10:27:16Z","timestamp":1685183236000},"page":"1-31","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["On the Ordering of Pooled Web Pages, Gold Assessments, and Bronze Assessments"],"prefix":"10.1145","volume":"42","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6720-963X","authenticated-orcid":false,"given":"Tetsuya","family":"Sakai","sequence":"first","affiliation":[{"name":"Waseda University, Japan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6751-5303","authenticated-orcid":false,"given":"Sijie","family":"Tao","sequence":"additional","affiliation":[{"name":"Waseda University, Japan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8600-8203","authenticated-orcid":false,"given":"Nuo","family":"Chen","sequence":"additional","affiliation":[{"name":"Waseda University, Japan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-7204-526X","authenticated-orcid":false,"given":"Yujing","family":"Li","sequence":"additional","affiliation":[{"name":"Waseda University, Japan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7001-4817","authenticated-orcid":false,"given":"Maria","family":"Maistro","sequence":"additional","affiliation":[{"name":"University of Copenhagen, 
Denmark"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1233-0184","authenticated-orcid":false,"given":"Zhumin","family":"Chu","sequence":"additional","affiliation":[{"name":"Tsinghua University, P. R. C."}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9219-6239","authenticated-orcid":false,"given":"Nicola","family":"Ferro","sequence":"additional","affiliation":[{"name":"University of Padua, Italy"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2023,8,21]]},"reference":[{"key":"e_1_3_4_2_2","doi-asserted-by":"publisher","DOI":"10.1145\/2637002.2637025"},{"key":"e_1_3_4_3_2","volume-title":"Proceedings of the 16th Text REtrieval Conference (TREC\u201907)","author":"Allan James","year":"2008","unstructured":"James Allan, Ben Carterette, Javed A. Aslam, Virgil Pavlu, Blagovest Dachev, and Evangelos Kanoulas. 2008. Million query track 2007 overview. In Proceedings of the 16th Text REtrieval Conference (TREC\u201907). NIST."},{"key":"e_1_3_4_4_2","volume-title":"Proceedings of the 26th Text REtrieval Conference (TREC\u201917)","author":"Allan James","year":"2018","unstructured":"James Allan, Donna Harman, Evangelos Kanoulas, Dan Li, Christophe Van Gysel, and Ellen Voorhees. 2018. TREC common core track overview. In Proceedings of the 26th Text REtrieval Conference (TREC\u201917). NIST."},{"key":"e_1_3_4_5_2","doi-asserted-by":"crossref","unstructured":"Omar Alonso and Stefano Mizzaro. 2012. Using crowdsourcing for TREC relevance assessment. Inf. Process. Manag. 
48 6 (2012) 1053\u20131066.","DOI":"10.1016\/j.ipm.2012.01.004"},{"key":"e_1_3_4_6_2","doi-asserted-by":"crossref","first-page":"667","DOI":"10.1145\/1390334.1390447","volume-title":"Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval","author":"Bailey Peter","year":"2008","unstructured":"Peter Bailey, Nick Craswell, Ian Soboroff, Paul Thomas, Arjen P. de Vries, and Emine Yilmaz. 2008. Relevance assessment: Are judges exchangeable and does it matter? In Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Association for Computing Machinery, 667\u2013674."},{"key":"e_1_3_4_7_2","volume-title":"Proceedings of the 18th Text REtrieval Conference (TREC\u201909)","author":"Carterette Ben","year":"2010","unstructured":"Ben Carterette, Virgil Pavlu, Hui Fang, and Evangelos Kanoulas. 2010. Million query track 2009 overview. In Proceedings of the 18th Text REtrieval Conference (TREC\u201909). NIST."},{"key":"e_1_3_4_8_2","doi-asserted-by":"crossref","first-page":"103","DOI":"10.1145\/2433396.2433411","volume-title":"Proceedings of the 6th ACM International Conference on Web Search and Data Mining","author":"Chouldechova Alexandra","year":"2013","unstructured":"Alexandra Chouldechova and David Mease. 2013. Differences in search engine evaluations between query owners and non-owners. In Proceedings of the 6th ACM International Conference on Web Search and Data Mining. Association for Computing Machinery, 103\u2013112."},{"key":"e_1_3_4_9_2","volume-title":"Proceedings of the 18th Text REtrieval Conference (TREC\u201909)","author":"Clarke Charles L. A.","year":"2010","unstructured":"Charles L. A. Clarke, Nick Craswell, and Ian Soboroff. 2010. Overview of the TREC 2009 web track. In Proceedings of the 18th Text REtrieval Conference (TREC\u201909). 
NIST."},{"key":"e_1_3_4_10_2","volume-title":"The Effect of Variation in Relevance Assessments in Comparative Experimental Tests of Indexing Languages","author":"Cleverdon Cyril","year":"1970","unstructured":"Cyril Cleverdon. 1970. The Effect of Variation in Relevance Assessments in Comparative Experimental Tests of Indexing Languages. Technical Report. College of Aeronautics, Cranfield, UK."},{"key":"e_1_3_4_11_2","doi-asserted-by":"crossref","unstructured":"Paul Clough Mark Sanderson Jiayu Tang Tim Gollins and Amy Warner. 2012. Examining the limits of crowdsourcing for relevance assessment. IEEE Internet Comput. 17 4 (2012) 32\u201338.","DOI":"10.1109\/MIC.2012.95"},{"key":"e_1_3_4_12_2","doi-asserted-by":"crossref","unstructured":"Jacob Cohen. 1968. Weighted kappa: Nominal scale agreement provision for scaled disagreement or partial credit. Psychol. Bull. 70 4 (1968) 213\u2013220.","DOI":"10.1037\/h0026256"},{"key":"e_1_3_4_13_2","first-page":"303","volume-title":"Proceedings of the 6th Text REtrieval Conference (TREC\u201998)","author":"Cormack Gordon V.","year":"1998","unstructured":"Gordon V. Cormack, Charles L. A. Clarke, Christopher R. Palmer, and Samuel S. L. To. 1998. Passage-based refinement (MultiText experiment for TREC-6). In Proceedings of the 6th Text REtrieval Conference (TREC\u201998). NIST, 303\u2013319."},{"key":"e_1_3_4_14_2","doi-asserted-by":"crossref","unstructured":"Gordon V. Cormack and Maura R. Grossman. 2015. Autonomy and reliability of continuous active learning for technology-assisted review. Retrieved from https:\/\/arxiv.org\/abs\/1504.06868.","DOI":"10.1145\/2766462.2767771"},{"key":"e_1_3_4_15_2","doi-asserted-by":"publisher","DOI":"10.1145\/290941.291009"},{"key":"e_1_3_4_16_2","doi-asserted-by":"publisher","DOI":"10.1145\/3269206.3271750"},{"key":"e_1_3_4_17_2","doi-asserted-by":"crossref","unstructured":"Michael Eisenberg and Carol Barry. 1988. 
Order effects: A study of the possible influence of presentation order on user judgments of document relevance. J. Amer. Societ. Inf. Sci. 39 5 (1988) 293\u2013300.","DOI":"10.1002\/(SICI)1097-4571(198809)39:5<293::AID-ASI1>3.0.CO;2-I"},{"key":"e_1_3_4_18_2","volume-title":"Meta-Analysis in Social Research","author":"Glass Gene V.","year":"1981","unstructured":"Gene V. Glass, Barry McGaw, and Mary Lee Smith. 1981. Meta-Analysis in Social Research. Sage Publications."},{"key":"e_1_3_4_19_2","volume-title":"TREC: Experiment and Evaluation in Information Retrieval","author":"Harman Donna K.","year":"2005","unstructured":"Donna K. Harman. 2005. The TREC test collections. In TREC: Experiment and Evaluation in Information Retrieval, Ellen M. Voorhees and Donna K. Harman (Eds.). The MIT Press."},{"key":"e_1_3_4_20_2","doi-asserted-by":"crossref","unstructured":"Mu-Hsuan Huang and Hui-Yu Wang. 2004. The influence of document presentation order and number of documents judged on users\u2019 judgments of relevance. J. Amer. Societ. Inf. Sci. 55 11 (2004) 970\u2013979.","DOI":"10.1002\/asi.20047"},{"key":"e_1_3_4_21_2","doi-asserted-by":"crossref","unstructured":"Kalervo J\u00e4rvelin and Jaana Kek\u00e4l\u00e4inen. 2002. Cumulated gain-based evaluation of IR techniques. ACM Trans. Inf. Syst. 20 4 (2002) 422\u2013446.","DOI":"10.1145\/582415.582418"},{"key":"e_1_3_4_22_2","first-page":"29","volume-title":"Comparative Evaluation of Multilingual Information Access Systems (Lecture Notes in Computer Science, Vol. 3237)","author":"Kando Noriko","year":"2004","unstructured":"Noriko Kando. 2004. Evaluation of information access technologies at the NTCIR workshop. In Comparative Evaluation of Multilingual Information Access Systems (Lecture Notes in Computer Science, Vol. 3237), Carol Peters, Julio Gonzalo, Martin Braschler, and Michael Kluck (Eds.). 
Springer, 29\u201343."},{"key":"e_1_3_4_23_2","volume-title":"Rank Correlation Methods","author":"Kendall Maurice G.","year":"1962","unstructured":"Maurice G. Kendall. 1962. Rank Correlation Methods (3rd Edition). Charles Griffin and Company Limited."},{"key":"e_1_3_4_24_2","doi-asserted-by":"publisher","DOI":"10.1145\/1458082.1458160"},{"key":"e_1_3_4_25_2","first-page":"805","volume-title":"Proceedings of the 41st International ACM SIGIR Conference on Research and Development in Information Retrieval","author":"Kutlu Mucahid","year":"2018","unstructured":"Mucahid Kutlu, Tyler McDonnell, Yassmine Barkallah, Tamer Elsayed, and Matthew Lease. 2018. Crowd vs. Expert: What can relevance judgment rationales teach us about assessor disagreement? In Proceedings of the 41st International ACM SIGIR Conference on Research and Development in Information Retrieval. Association for Computing Machinery, 805\u2013814."},{"key":"e_1_3_4_26_2","doi-asserted-by":"crossref","unstructured":"Aldo Lipani David E. Losada Guido Zuccon and Mihai Lupu. 2021. Fixed-cost pooling strategies. IEEE Trans. Knowl. Data Eng. 33 4 (2021) 1503\u20131522.","DOI":"10.1109\/TKDE.2019.2947049"},{"key":"e_1_3_4_27_2","doi-asserted-by":"crossref","unstructured":"Jeffrey D. Long and Norman Cliff. 1997. Confidence intervals for Kendall\u2019s tau. Brit. J. Math. Statist. Psych. 50 (1997) 31\u201341.","DOI":"10.1111\/j.2044-8317.1997.tb01100.x"},{"key":"e_1_3_4_28_2","doi-asserted-by":"crossref","unstructured":"David E. Losada Javier Parapar and \u00c1lvaro Barreiro. 2017. Multi-armed bandits for ordering judgements in pooling-based evaluation. Inf. Process. Manag. 53 3 (2017) 1005\u20131025.","DOI":"10.1016\/j.ipm.2017.04.005"},{"key":"e_1_3_4_29_2","doi-asserted-by":"crossref","unstructured":"David E. Losada Javier Parapar and \u00c1lvaro Barreiro. 2018. When to stop making relevance judgments? A study of stopping methods for building information retrieval test collections. J. Assoc. Inf. Sci. Technol. 
70 1 (2018) 49\u201360.","DOI":"10.1002\/asi.24077"},{"key":"e_1_3_4_30_2","first-page":"139","volume-title":"Proceedings of the AAAI Conference on Human Computation and Crowdsourcing","author":"McDonnell Tyler","year":"2016","unstructured":"Tyler McDonnell, Matthew Lease, Mucahid Kutlu, and Tamer Elsayed. 2016. Why is that relevant? Collecting annotator rationales for relevance judgments. In Proceedings of the AAAI Conference on Human Computation and Crowdsourcing. AAAI, 139\u2013148."},{"key":"e_1_3_4_31_2","doi-asserted-by":"publisher","DOI":"10.1145\/1277741.1277756"},{"key":"e_1_3_4_32_2","first-page":"116","volume-title":"Bridging between Information Retrieval and Databases. PROMISE 2013 (Lecture Notes in Computer Science, Vol. 8173)","author":"Sakai Tetsuya","year":"2014","unstructured":"Tetsuya Sakai. 2014. Metrics, statistics, tests. In Bridging between Information Retrieval and Databases. PROMISE 2013 (Lecture Notes in Computer Science, Vol. 8173), Nicola Ferro (Ed.). 116\u2013163."},{"key":"e_1_3_4_33_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-981-13-1199-4"},{"key":"e_1_3_4_34_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4899-7993-3_80616-1"},{"key":"e_1_3_4_35_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-22948-1_3"},{"key":"e_1_3_4_36_2","first-page":"1","volume-title":"Evaluating Information Retrieval and Access Tasks: NTCIR\u2019s Legacy of Research Impact","author":"Sakai Tetsuya","year":"2020","unstructured":"Tetsuya Sakai. 2020. Graded relevance. In Evaluating Information Retrieval and Access Tasks: NTCIR\u2019s Legacy of Research Impact, Tetsuya Sakai, Douglas W. Oard, and Noriko Kando (Eds.). Springer, 1\u201320."},{"key":"e_1_3_4_37_2","first-page":"572","volume-title":"Advances in Information Retrieval (ECIR\u201921) (Lecture Notes in Computer Science, Vol. 12656)","author":"Sakai Tetsuya","year":"2021","unstructured":"Tetsuya Sakai. 2021. On the instability of diminishing return IR measures. 
In Advances in Information Retrieval (ECIR\u201921) (Lecture Notes in Computer Science, Vol. 12656), Djoerd Hiemstra, Marie-Francine Moens, Josiane Mothe, Raffaele Perego, Martin Potthast, and Fabrizio Sebastiani (Eds.). 572\u2013586."},{"key":"e_1_3_4_38_2","first-page":"77","volume-title":"Proceedings of the 7th NTCIR Workshop Meeting on Evaluation of Information Access Technologies: Information Retrieval, Question Answering, and Cross-lingual Information Access","author":"Sakai Tetsuya","year":"2008","unstructured":"Tetsuya Sakai, Noriko Kando, Chuan-Jie Lin, Teruko Mitamura, Hideki Shima, Donghong Ji, Kuang-Hua Chen, and Eric Nyberg. 2008. Overview of the NTCIR-7 ACLIA IR4QA task. In Proceedings of the 7th NTCIR Workshop Meeting on Evaluation of Information Access Technologies: Information Retrieval, Question Answering, and Cross-lingual Information Access. National Institute of Informatics, 77\u2013114."},{"key":"e_1_3_4_39_2","first-page":"234","volume-title":"Proceedings of the 16th NTCIR Conference: Evaluation of Information Access Technologies","author":"Sakai Tetsuya","year":"2022","unstructured":"Tetsuya Sakai, Sijie Tao, Zhumin Chu, Maria Maistro, Yujing Li, Nuo Chen, Nicola Ferro, Junjie Wang, Ian Soboroff, and Yiqun Liu. 2022. Overview of the NTCIR-16 We Want Web with CENTRE (WWW-4) task. In Proceedings of the 16th NTCIR Conference: Evaluation of Information Access Technologies. National Institute of Informatics, 234\u2013245."},{"key":"e_1_3_4_40_2","unstructured":"Tetsuya Sakai Sijie Tao Maria Maistro Zhumin Chu Yujing Li Nuo Chen Nicola Ferro Junjie Wang Ian Soboroff and Yiqun Liu. 2022. Corrected evaluation results of the NTCIR WWW-2 WWW-3 and WWW-4 English subtasks. Retrieved from https:\/\/arxiv.org\/abs\/2210.10266."},{"key":"e_1_3_4_41_2","doi-asserted-by":"crossref","unstructured":"Tetsuya Sakai Sijie Tao and Zhaohao Zeng. 2022. Relevance assessments for web search evaluation: Should we randomise or prioritise the pooled documents? ACM Trans. 
Inf. Syst. 40 4 (2022).","DOI":"10.1145\/3494833"},{"key":"e_1_3_4_42_2","doi-asserted-by":"crossref","unstructured":"Tetsuya Sakai Sijie Tao and Zhaohao Zeng. 2022. Relevance assessments for web search evaluation: Should we randomise or prioritise the pooled documents? (CORRECTED VERSION). Retrieved from http:\/\/arxiv.org\/abs\/2211.00981.","DOI":"10.1145\/3494833"},{"key":"e_1_3_4_43_2","first-page":"94","volume-title":"Proceedings of the 15th Asia Information Retrieval Societies Conference (AIRS\u201919) (Lecture Notes in Computer Science, Vol. 12004)","author":"Sakai Tetsuya","year":"2019","unstructured":"Tetsuya Sakai and Peng Xiao. 2019. Randomised vs. Prioritised pools for relevance assessments: Sample size considerations. In Proceedings of the 15th Asia Information Retrieval Societies Conference (AIRS\u201919) (Lecture Notes in Computer Science, Vol. 12004). Springer, 94\u2013105."},{"key":"e_1_3_4_44_2","doi-asserted-by":"publisher","DOI":"10.1145\/2484028.2484090"},{"key":"e_1_3_4_45_2","doi-asserted-by":"publisher","DOI":"10.1145\/2766462.2767841"},{"key":"e_1_3_4_46_2","doi-asserted-by":"crossref","first-page":"324","DOI":"10.1145\/564376.564433","volume-title":"Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval","author":"Sormunen Eero","year":"2002","unstructured":"Eero Sormunen. 2002. Liberal relevance criteria of TREC\u2014Counting on negligible documents? In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Association for Computing Machinery, 324\u2013330."},{"key":"e_1_3_4_47_2","volume-title":"Report on a Design Study for the \u201cIdeal\u201d Information Retrieval Test Collection","author":"Jones K. Sparck","year":"1977","unstructured":"K. Sparck Jones and R. G. Bates. 1977. Report on a Design Study for the \u201cIdeal\u201d Information Retrieval Test Collection. Technical Report. 
Computer Laboratory, University of Cambridge, British Library Research and Development Report No. 5481."},{"key":"e_1_3_4_48_2","volume-title":"Report on the Need for and Provision of an \u201cIdeal\u201d Information Retrieval Test Collection","author":"Jones K. Sparck","year":"1975","unstructured":"K. Sparck Jones and C. J. van Rijsbergen. 1975. Report on the Need for and Provision of an \u201cIdeal\u201d Information Retrieval Test Collection. Technical Report. Computer Laboratory, University of Cambridge, British Library Research and Development Report No. 5266."},{"key":"e_1_3_4_49_2","first-page":"243","volume-title":"Proceedings of the 16th NTCIR Conference: Evaluation of Information Access Technologies","author":"Ubukata Yuya","year":"2022","unstructured":"Yuya Ubukata, Masaki Muraoka, Sijie Tao, and Tetsuya Sakai. 2022. SLWWW at the NTCIR-16 WWW-4 task. In Proceedings of the 16th NTCIR Conference: Evaluation of Information Access Technologies. National Institute of Informatics, 243\u2013246."},{"key":"e_1_3_4_50_2","first-page":"247","volume-title":"Proceedings of the 16th NTCIR Conference: Evaluation of Information Access Technologies","author":"Usuha Kota","year":"2022","unstructured":"Kota Usuha, Kohei Shinden, and Makoto P. Kato. 2022. KASYS at the NTCIR-16 WWW-4 task. In Proceedings of the 16th NTCIR Conference: Evaluation of Information Access Technologies. National Institute of Informatics, 247\u2013253."},{"key":"e_1_3_4_51_2","doi-asserted-by":"crossref","unstructured":"Ellen M. Voorhees. 2000. Variations in relevance judgments and the measurement of retrieval effectiveness. Inf. Process. Manag. 
36 5 (2000) 697\u2013716.","DOI":"10.1016\/S0306-4573(00)00010-8"},{"key":"e_1_3_4_52_2","doi-asserted-by":"publisher","DOI":"10.1145\/3269206.3271766"},{"key":"e_1_3_4_53_2","doi-asserted-by":"publisher","DOI":"10.1145\/564376.564432"},{"key":"e_1_3_4_54_2","doi-asserted-by":"crossref","first-page":"2970","DOI":"10.1145\/3477495.3531728","volume-title":"Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval","author":"Voorhees Ellen M.","year":"2022","unstructured":"Ellen M. Voorhees, Nick Craswell, and Jimmy Lin. 2022. Too many relevants: Whither Cranfield test collections? In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval. Association for Computing Machinery, 2970\u20132980."},{"key":"e_1_3_4_55_2","doi-asserted-by":"crossref","DOI":"10.6028\/NIST.SP.500-246","volume-title":"NIST Special Publication 500-246: The Eighth Text REtrieval Conference (TREC 8)","author":"Voorhees Ellen M.","year":"2000","unstructured":"Ellen M. Voorhees and Donna Harman. 2000. Overview of the eighth Text REtrieval Conference (TREC-8). In NIST Special Publication 500-246: The Eighth Text REtrieval Conference (TREC 8). NIST."},{"key":"e_1_3_4_56_2","first-page":"173","volume-title":"Proceedings of the ACM Conference on Human Information Interaction and Retrieval","author":"Wakeling Simon","year":"2016","unstructured":"Simon Wakeling, Martin Halvey, Robert Villa, and Laura Hasler. 2016. A comparison of primary and secondary relevance judgements for real-life topics. In Proceedings of the ACM Conference on Human Information Interaction and Retrieval. 
Association for Computing Machinery, 173\u2013182."},{"key":"e_1_3_4_57_2","first-page":"254","volume-title":"Proceedings of the 16th NTCIR Conference: Evaluation of Information Access Technologies","author":"Yang Shenghao","year":"2022","unstructured":"Shenghao Yang, Haitao Li, Zhumin Chu, Jingtao Zhan, Yiqun Liu, Min Zhang, and Shaoping Ma. 2022. THUIR at the NTCIR-16 WWW-4 task. In Proceedings of the 16th NTCIR Conference: Evaluation of Information Access Technologies. National Institute of Informatics, 254\u2013257."},{"key":"e_1_3_4_58_2","doi-asserted-by":"publisher","DOI":"10.1145\/1183614.1183633"},{"key":"e_1_3_4_59_2","doi-asserted-by":"publisher","DOI":"10.1145\/290941.291014"}],"container-title":["ACM Transactions on Information Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3600227","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3600227","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T18:55:31Z","timestamp":1750272931000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3600227"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,8,21]]},"references-count":58,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2024,1,31]]}},"alternative-id":["10.1145\/3600227"],"URL":"https:\/\/doi.org\/10.1145\/3600227","relation":{},"ISSN":["1046-8188","1558-2868"],"issn-type":[{"value":"1046-8188","type":"print"},{"value":"1558-2868","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,8,21]]},"assertion":[{"value":"2022-11-15","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2023-05-18","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-08-21","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}