{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,22]],"date-time":"2025-08-22T04:58:58Z","timestamp":1755838738440,"version":"3.41.0"},"reference-count":47,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2018,9,29]],"date-time":"2018-09-29T00:00:00Z","timestamp":1538179200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100000923","name":"Australian Research Council","doi-asserted-by":"crossref","award":["DP180102687"],"award-info":[{"award-number":["DP180102687"]}],"id":[{"id":"10.13039\/501100000923","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Google Faculty Research Grant"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["J. Data and Information Quality"],"published-print":{"date-parts":[[2018,9,30]]},"abstract":"<jats:p>\n            One typical way of building test collections for offline measurement of information retrieval systems is to pool the ranked outputs of different systems down to some chosen depth\n            <jats:italic>d<\/jats:italic>\n            and then form relevance judgments for those documents only. Non-pooled documents\u2014ones that did not appear in the top-\n            <jats:italic>d<\/jats:italic>\n            sets of any of the contributing systems\u2014are then deemed to be non-relevant for the purposes of evaluating the relative behavior of the systems. In this article, we use RBP-derived residuals to re-examine the reliability of that process. By fitting the RBP parameter \u03d5 to maximize similarity between AP- and NDCG-induced system rankings, on the one hand, and RBP-induced rankings, on the other, an estimate can be made as to the potential score uncertainty associated with those two recall-based metrics. 
We then consider the effect that residual size\u2014as an indicator of possible measurement uncertainty in utility-based metrics\u2014has in connection with recall-based metrics by computing the effect of increasing pool sizes and examining the trends that arise in terms of both metric score and system separability using standard statistical tests. The experimental results show that the confidence levels expressed via the\n            <jats:italic>p<\/jats:italic>\n            -values generated by statistical tests are only weakly connected to the size of the residual and to the degree of measurement uncertainty caused by the presence of unjudged documents. Statistical confidence estimates are, however, largely consistent as pooling depths are altered. We therefore recommend that all such experimental results should report, in addition to the outcomes of statistical significance tests, the residual measurements generated by a suitably matched weighted-precision metric, to give a clear indication of measurement uncertainty that arises due to the presence of unjudged documents in test collections with finite pooled judgments.\n          <\/jats:p>","DOI":"10.1145\/3239572","type":"journal-article","created":{"date-parts":[[2018,10,1]],"date-time":"2018-10-01T12:15:55Z","timestamp":1538396155000},"page":"1-22","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":4,"title":["Estimating Measurement Uncertainty for Information Retrieval Effectiveness Metrics"],"prefix":"10.1145","volume":"10","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6638-0232","authenticated-orcid":false,"given":"Alistair","family":"Moffat","sequence":"first","affiliation":[{"name":"The University of Melbourne, Victoria, 
Australia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9094-0810","authenticated-orcid":false,"given":"Falk","family":"Scholer","sequence":"additional","affiliation":[{"name":"RMIT University, Victoria, Australia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ziying","family":"Yang","sequence":"additional","affiliation":[{"name":"The University of Melbourne, Victoria, Australia"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2018,9,29]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/956863.956953"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/1390334.1390447"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/2911451.2914671"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10791-007-9032-x"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/1008992.1009000"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/1277741.1277755"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/1277741.1277754"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/1148170.1148219"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/1645953.1646033"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/3106371"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/3190580.3190586"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/582415.582418"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/3019612.3019692"},{"volume-title":"Proceedings of the European Conference in Information Retrieval (ECIR\u201917)","author":"Lipani A.","key":"e_1_2_1_14_1","unstructured":"A. Lipani, J. R. M. Palotti, M. Lupu, F. Piroi, G. Zuccon, and A. Hanbury. 2017. Fixed-cost pooling strategies based on IR evaluation measures. 
In Proceedings of the European Conference in Information Retrieval (ECIR\u201917). 357--368."},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipm.2017.04.005"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10791-016-9282-6"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3077136.3080793"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.5555\/262192.262203"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-45068-6_1"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/3052768"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/2505515.2507665"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/1277741.1277806"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/1416950.1416952"},{"volume-title":"Proceedings of the Australasian Database Conference (ADC\u201909)","author":"Ravana S. D.","key":"e_1_2_1_24_1","unstructured":"S. D. Ravana and A. Moffat. 2009. Score aggregation techniques in retrieval experimentation. In Proceedings of the Australasian Database Conference (ADC\u201909). 59--67."},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/1183614.1183630"},{"key":"e_1_2_1_26_1","volume-title":"Proceedings of the NII Testbeds and Community for Information Access Research (NTCIR\u201904)","author":"Sakai T.","year":"2004","unstructured":"T. Sakai. 2004. New performance metrics based on multigrade relevance: Their application to question answering. 
In Proceedings of the NII Testbeds and Community for Information Access Research (NTCIR\u201904)."},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/1148170.1148261"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/1277741.1277756"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/1390334.1390454"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/1458082.1458159"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/2911451.2911492"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10791-008-9059-7"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1561\/1500000009"},{"volume-title":"Handbook of Parametric and Nonparametric Statistical Procedures","author":"Sheskin D.","key":"e_1_2_1_34_1","unstructured":"D. Sheskin. 1997. Handbook of Parametric and Nonparametric Statistical Procedures. CRC Press."},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/1321440.1321528"},{"key":"e_1_2_1_36_1","volume-title":"Information Retrieval Test Collection. Computer Laboratory","author":"Sp\u00e4rck Jones K.","year":"1975","unstructured":"K. Sp\u00e4rck Jones and C. J. Van Rijsbergen. 1975. Report on the Need for and Provision of an \u2018Ideal\u2019 Information Retrieval Test Collection. 
Computer Laboratory, University of Cambridge, British Library Research and Development Report No. 5266 (1975)."},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0306-4573(00)00010-8"},{"key":"e_1_2_1_38_1","volume-title":"Overview of the TREC 2004 robust retrieval track. In Proceedings of the Text Retrieval Conference (TREC\u201904)","author":"Voorhees E. M.","year":"2004","unstructured":"E. M. Voorhees. 2004. Overview of the TREC 2004 robust retrieval track. In Proceedings of the Text Retrieval Conference (TREC\u201904). NIST Special Publication 500-261."},{"key":"e_1_2_1_39_1","volume-title":"Overview of TREC 2004. In Proceedings of the Text Retrieval Conference (TREC\u201904)","author":"Voorhees E. M.","year":"2004","unstructured":"E. M. Voorhees. 2004. Overview of TREC 2004. In Proceedings of the Text Retrieval Conference (TREC\u201904). NIST Special Publication 500-261."},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.6028\/NIST.SP.500-246"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.6028\/NIST.SP.500-246"},{"key":"e_1_2_1_42_1","volume-title":"TREC: Experiment and Evaluation in Information Retrieval","author":"Voorhees E. M.","year":"2005","unstructured":"E. M. Voorhees and D. K. Harman (Eds.). 2005. TREC: Experiment and Evaluation in Information Retrieval. MIT Press."},{"volume-title":"Proceedings of the Workshop on Evaluating Information Access (EVIA\u201910)","author":"Webber W.","key":"e_1_2_1_43_1","unstructured":"W. Webber, A. Moffat, and J. Zobel. 2010. 
The effect of pooling and evaluation depth on metric stability. In Proceedings of the Workshop on Evaluating Information Access (EVIA\u201910). 7--15."},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/1852102.1852106"},{"volume-title":"Proceedings of the Asia Information Retrieval Societies Conference (AIRS\u201916)","author":"Yang Z.","key":"e_1_2_1_45_1","unstructured":"Z. Yang, A. Moffat, and A. Turpin. 2016. How precise does document scoring need to be? In Proceedings of the Asia Information Retrieval Societies Conference (AIRS\u201916). 279--291."},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/1390334.1390435"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/290941.291014"}],"container-title":["Journal of Data and Information 
Quality"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3239572","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3239572","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T01:08:20Z","timestamp":1750208900000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3239572"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,9,29]]},"references-count":47,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2018,9,30]]}},"alternative-id":["10.1145\/3239572"],"URL":"https:\/\/doi.org\/10.1145\/3239572","relation":{},"ISSN":["1936-1955","1936-1963"],"issn-type":[{"type":"print","value":"1936-1955"},{"type":"electronic","value":"1936-1963"}],"subject":[],"published":{"date-parts":[[2018,9,29]]},"assertion":[{"value":"2017-10-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2018-07-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2018-09-29","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}