{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,27]],"date-time":"2025-10-27T21:07:10Z","timestamp":1761599230221,"version":"3.41.0"},"reference-count":47,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2022,1,11]],"date-time":"2022-01-11T00:00:00Z","timestamp":1641859200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Inf. Syst."],"published-print":{"date-parts":[[2022,10,31]]},"abstract":"<jats:p>\n            In the context of depth-\n            <jats:italic>k<\/jats:italic>\n            pooling for constructing web search test collections, we compare two approaches to ordering pooled documents for relevance assessors: The prioritisation strategy (PRI) used widely at NTCIR, and the simple randomisation strategy (RND). In order to address research questions regarding PRI and RND, we have constructed and released the WWW3E8 dataset, which contains eight independent relevance labels for 32,375 topic-document pairs, i.e., a total of 259,000 labels. Four of the eight relevance labels were obtained from PRI-based pools; the other four were obtained from RND-based pools. Using WWW3E8, we compare PRI and RND in terms of inter-assessor agreement, system ranking agreement, and robustness to new systems that did not contribute to the pools. We also utilise an assessor activity log we obtained as a byproduct of WWW3E8 to compare the two strategies in terms of assessment efficiency. Our main findings are: (a)\u00a0The presentation order has no substantial impact on assessment efficiency; (b)\u00a0While the presentation order substantially affects which documents are judged (highly) relevant, the difference between the inter-assessor agreement under the PRI condition and that under the RND condition is of no practical significance; (c)\u00a0Different system rankings under the PRI condition are substantially more similar to one another than those under the RND condition; and (d)\u00a0PRI-based relevance assessment files (qrels) are substantially and statistically significantly more robust to new systems than RND-based ones. Finding\u00a0(d) suggests that PRI helps the assessors identify relevant documents that affect the evaluation of many existing systems, including those that did not contribute to the pools. Hence, if researchers need to evaluate their current IR systems using legacy IR test collections, we recommend the use of those constructed using the PRI approach unless they have a good reason to believe that their systems retrieve relevant documents that are vastly different from the pooled documents. While this robustness of PRI may also mean that the PRI-based pools are biased against future systems that retrieve highly novel relevant documents, one should note that there is no evidence that RND is any better in this respect.\n          <\/jats:p>","DOI":"10.1145\/3494833","type":"journal-article","created":{"date-parts":[[2022,1,17]],"date-time":"2022-01-17T06:04:22Z","timestamp":1642399462000},"page":"1-35","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":4,"title":["Relevance Assessments for Web Search Evaluation: Should We Randomise or Prioritise the Pooled Documents?"],"prefix":"10.1145","volume":"40","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6720-963X","authenticated-orcid":false,"given":"Tetsuya","family":"Sakai","sequence":"first","affiliation":[{"name":"Waseda University, Okubo, Shinjuku, Tokyo, Japan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6751-5303","authenticated-orcid":false,"given":"Sijie","family":"Tao","sequence":"additional","affiliation":[{"name":"Waseda University, Okubo, Shinjuku, Tokyo, Japan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9275-8526","authenticated-orcid":false,"given":"Zhaohao","family":"Zeng","sequence":"additional","affiliation":[{"name":"Waseda University, Okubo, Shinjuku, Tokyo, Japan"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2022,1,11]]},"reference":[{"doi-asserted-by":"crossref","unstructured":"James Allan Ben Carterette Javed A. Aslam Virgil Pavlu Blagovest Dachev and Evangelos Kanoulas. 2008. Million query track 2007 overview.","key":"e_1_3_3_2_2","DOI":"10.21236\/ADA477388"},{"key":"e_1_3_3_3_2","volume-title":"Proceedings of the TREC 2017","author":"Allan James","year":"2018","unstructured":"James Allan, Donna Harman, Evangelos Kanoulas, Dan Li, Christophe Van Gysel, and Ellen Voorhees. 2018. TREC common core track overview. In Proceedings of the TREC 2017."},{"doi-asserted-by":"publisher","key":"e_1_3_3_4_2","DOI":"10.1145\/1390334.1390447"},{"key":"e_1_3_3_5_2","volume-title":"Proceedings of the TREC 2009","author":"Carterette Ben","year":"2010","unstructured":"Ben Carterette, Virgil Pavlu, Hui Fang, and Evangelos Kanoulas. 2010. Million query track 2009 overview. In Proceedings of the TREC 2009."},{"doi-asserted-by":"publisher","key":"e_1_3_3_6_2","DOI":"10.1145\/2433396.2433411"},{"key":"e_1_3_3_7_2","volume-title":"Proceedings of the TREC 2009","author":"Clarke Charles L. A.","year":"2010","unstructured":"Charles L. A. Clarke, Nick Craswell, and Ian Soboroff. 2010. Overview of the TREC 2009 web track. In Proceedings of the TREC 2009."},{"key":"e_1_3_3_8_2","volume-title":"Factors Determining the Performance of Indexing Systems; Volume 2","author":"Cleverdon Cyril","year":"1966","unstructured":"Cyril Cleverdon and Michael Keen. 1966. Factors Determining the Performance of Indexing Systems; Volume 2. Technical Report. College of Aeronautics, Cranfield, UK."},{"key":"e_1_3_3_9_2","volume-title":"Factors Determining the Performance of Indexing Systems; Volume 1: Design","author":"Cleverdon Cyril","year":"1966","unstructured":"Cyril Cleverdon, Jack Mills, and Michael Keen. 1966. Factors Determining the Performance of Indexing Systems; Volume 1: Design. Technical Report. College of Aeronautics, Cranfield, UK."},{"doi-asserted-by":"publisher","key":"e_1_3_3_10_2","DOI":"10.1145\/290941.291009"},{"doi-asserted-by":"publisher","key":"e_1_3_3_11_2","DOI":"10.1145\/3269206.3271750"},{"doi-asserted-by":"publisher","key":"e_1_3_3_12_2","DOI":"10.1145\/2970398.2970431"},{"doi-asserted-by":"publisher","key":"e_1_3_3_13_2","DOI":"10.1002\/(SICI)1097-4571(198809)39:5<293::AID-ASI1>3.0.CO;2-I"},{"doi-asserted-by":"publisher","key":"e_1_3_3_14_2","DOI":"10.1007\/978-3-030-22948-1"},{"doi-asserted-by":"publisher","key":"e_1_3_3_15_2","DOI":"10.1037\/h0031619"},{"key":"e_1_3_3_16_2","volume-title":"Proceedings of the TREC: Experiment and Evaluation in Information Retrieval","author":"Harman Donna K.","year":"2005","unstructured":"Donna K. Harman. 2005. The TREC test collections. In Proceedings of the TREC: Experiment and Evaluation in Information Retrieval, Ellen M. Voorhees and Donna K. Harman (Eds.), The MIT Press, Chapter 2."},{"doi-asserted-by":"publisher","key":"e_1_3_3_17_2","DOI":"10.1002\/asi.20047"},{"doi-asserted-by":"publisher","key":"e_1_3_3_18_2","DOI":"10.1007\/978-3-540-30222-3_4"},{"key":"e_1_3_3_19_2","volume-title":"Content Analysis: An Introduction to Its Methodology (Fourth Edition)","author":"Krippendorff Klaus","year":"2018","unstructured":"Klaus Krippendorff. 2018. Content Analysis: An Introduction to Its Methodology (Fourth Edition). SAGE Publications."},{"doi-asserted-by":"publisher","key":"e_1_3_3_20_2","DOI":"10.1109\/TKDE.2019.2947049"},{"doi-asserted-by":"publisher","key":"e_1_3_3_21_2","DOI":"10.1016\/j.ipm.2017.04.005"},{"key":"e_1_3_3_22_2","article-title":"When to stop making relevance judgments? A study of stopping methods for building information retrieval test collections","author":"Losada David E.","year":"2018","unstructured":"David E. Losada, Javier Parapar, and \u00c1lvaro Barreiro. 2018. When to stop making relevance judgments? A study of stopping methods for building information retrieval test collections. Journal of the Association for Information Science and Technology 70, 3 (2018), 49\u201360.","journal-title":"Journal of the Association for Information Science and Technology"},{"key":"e_1_3_3_23_2","first-page":"394","volume-title":"Proceedings of the NTCIR-13","author":"Luo Cheng","year":"2017","unstructured":"Cheng Luo, Tetsuya Sakai, Yiqun Liu, Zhicheng Dou, Chenyan Xiong, and Jingfang Xu. 2017. Overview of the NTCIR-13 we want web task. In Proceedings of the NTCIR-13. 394\u2013401."},{"key":"e_1_3_3_24_2","first-page":"455","volume-title":"Proceedings of the NTCIR-14","author":"Mao Jiaxin","year":"2019","unstructured":"Jiaxin Mao, Tetsuya Sakai, Cheng Luo, Peng Xiao, Yiqun Liu, and Zhicheng Dou. 2019. Overview of the NTCIR-14 we want web task. In Proceedings of the NTCIR-14. 455\u2013467."},{"key":"e_1_3_3_25_2","first-page":"243","volume-title":"Proceedings of the NTCIR-15","author":"Muraoka Masaki","year":"2020","unstructured":"Masaki Muraoka, Zhaohao Zeng, and Tetsuya Sakai. 2020. SLWWW at the NTCIR-15 WWW-3 task. In Proceedings of the NTCIR-15. 243\u2013246."},{"doi-asserted-by":"publisher","key":"e_1_3_3_26_2","DOI":"10.1145\/1458082.1458159"},{"doi-asserted-by":"publisher","key":"e_1_3_3_27_2","DOI":"10.1007\/978-3-642-54798-0_6"},{"doi-asserted-by":"publisher","key":"e_1_3_3_28_2","DOI":"10.1145\/2911451.2911492"},{"doi-asserted-by":"publisher","key":"e_1_3_3_29_2","DOI":"10.1007\/s10791-015-9273-z"},{"doi-asserted-by":"crossref","unstructured":"Tetsuya Sakai. 2018. Laboratory Experiments in Information Retrieval: Sample Sizes Effect Sizes and Statistical Power . Springer.","key":"e_1_3_3_30_2","DOI":"10.1007\/978-981-13-1199-4"},{"doi-asserted-by":"publisher","key":"e_1_3_3_31_2","DOI":"10.1007\/978-3-030-22948-1_3"},{"doi-asserted-by":"publisher","key":"e_1_3_3_32_2","DOI":"10.1007\/978-3-030-72113-8_38"},{"doi-asserted-by":"publisher","key":"e_1_3_3_33_2","DOI":"10.1007\/978-3-642-35341-3_3"},{"key":"e_1_3_3_34_2","first-page":"77","volume-title":"Proceedings of the NTCIR-7","author":"Sakai Tetsuya","year":"2008","unstructured":"Tetsuya Sakai, Noriko Kando, Chuan-Jie Lin, Teruko Mitamura, Hideki Shima, Donghong Ji, Kuang-Hua Chen, and Eric Nyberg. 2008. Overview of the NTCIR-7 ACLIA IR4QA task. In Proceedings of the NTCIR-7. 77\u2013114."},{"key":"e_1_3_3_35_2","first-page":"25","volume-title":"Proceedings of the EVIA 2010","author":"Sakai Tetsuya","year":"2010","unstructured":"Tetsuya Sakai and Chin-Yew Lin. 2010. Ranking retrieval systems without relevance assessments - revisited. In Proceedings of the EVIA 2010. 25\u201333."},{"key":"e_1_3_3_36_2","volume-title":"Evaluating Information Retrieval and Access Tasks: NTCIR\u2019s Legacy of Research Impact","author":"Sakai Tetsuya","year":"2020","unstructured":"Tetsuya Sakai, Douglas W. Oard, and Noriko Kando (Eds.), 2020. Evaluating Information Retrieval and Access Tasks: NTCIR\u2019s Legacy of Research Impact. Springer."},{"doi-asserted-by":"publisher","key":"e_1_3_3_37_2","DOI":"10.1145\/3404835.3463236"},{"key":"e_1_3_3_38_2","first-page":"219","volume-title":"Proceedings of the NTCIR-15","author":"Sakai Tetsuya","year":"2020","unstructured":"Tetsuya Sakai, Sijie Tao, Zhaohao Zeng, Yukun Zheng, Jiaxin Mao, Zhumin Chu, Yiqun Liu, Zhicheng Dou, Nicola Ferro, Maria Maistro, and Ian Soboroff. 2020. Overview of the NTCIR-15 we want web with CENTRE (WWW-3) task. In Proceedings of the NTCIR-15. 219\u2013234."},{"key":"e_1_3_3_39_2","first-page":"94","volume-title":"Proceedings of the AIRS 2019 (LNCS 12004)","author":"Sakai Tetsuya","year":"2019","unstructured":"Tetsuya Sakai and Peng Xiao. 2019. Randomised vs. prioritised pools for relevance assessments: Sample size considerations. In Proceedings of the AIRS 2019 (LNCS 12004). 94\u2013105."},{"doi-asserted-by":"publisher","key":"e_1_3_3_40_2","DOI":"10.1145\/3331184.3331215"},{"doi-asserted-by":"publisher","key":"e_1_3_3_41_2","DOI":"10.1145\/3431813"},{"doi-asserted-by":"publisher","key":"e_1_3_3_42_2","DOI":"10.1145\/2484028.2484090"},{"key":"e_1_3_3_43_2","volume-title":"Report on a Design Study for the \u201cIdeal\u201d Information Retrieval Test Collection","author":"Jones K. Sparck","year":"1977","unstructured":"K. Sparck Jones and R. G. Bates. 1977. Report on a Design Study for the \u201cIdeal\u201d Information Retrieval Test Collection. Technical Report. Computer Laboratory, University of Cambridge, British Library Research and Development Report No. 5481."},{"key":"e_1_3_3_44_2","volume-title":"Report on the Need for and Provision of an \u201cIdeal\u201d Information Retrieval Test Collection","author":"Jones K. Sparck","year":"1975","unstructured":"K. Sparck Jones and C. J. van Rijsbergen. 1975. Report on the Need for and Provision of an \u201cIdeal\u201d Information Retrieval Test Collection. Technical Report. Computer Laboratory, University of Cambridge, British Library Research and Development Report No. 5266."},{"doi-asserted-by":"publisher","key":"e_1_3_3_45_2","DOI":"10.1145\/2766462.2767760"},{"doi-asserted-by":"publisher","key":"e_1_3_3_46_2","DOI":"10.1016\/S0306-4573(00)00010-8"},{"doi-asserted-by":"publisher","key":"e_1_3_3_47_2","DOI":"10.5555\/648264.753539"},{"doi-asserted-by":"publisher","key":"e_1_3_3_48_2","DOI":"10.1145\/290941.291014"}],"container-title":["ACM Transactions on Information Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3494833","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3494833","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:31:18Z","timestamp":1750188678000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3494833"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,1,11]]},"references-count":47,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2022,10,31]]}},"alternative-id":["10.1145\/3494833"],"URL":"https:\/\/doi.org\/10.1145\/3494833","relation":{},"ISSN":["1046-8188","1558-2868"],"issn-type":[{"type":"print","value":"1046-8188"},{"type":"electronic","value":"1558-2868"}],"subject":[],"published":{"date-parts":[[2022,1,11]]},"assertion":[{"value":"2021-05-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-10-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-01-11","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}