{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,16]],"date-time":"2026-01-16T19:19:57Z","timestamp":1768591197221,"version":"3.49.0"},"reference-count":52,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2016,6,9]],"date-time":"2016-06-09T00:00:00Z","timestamp":1465430400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Australian Research Council's Discovery Projects scheme","award":["DP130104007"],"award-info":[{"award-number":["DP130104007"]}]},{"DOI":"10.13039\/100008242","name":"NICTA","doi-asserted-by":"crossref","id":[{"id":"10.13039\/100008242","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Inf. Syst."],"published-print":{"date-parts":[[2016,9,14]]},"abstract":"<jats:p>We present a study of which baseline to use when testing a new retrieval technique. In contrast to past work, we show that measuring a statistically significant improvement over a weak baseline is not a good predictor of whether a similar improvement will be measured on a strong baseline. Sometimes strong baselines are made worse when a new technique is applied. We investigate whether conducting comparisons against a range of weaker baselines can increase confidence that an observed effect will also show improvements on a stronger baseline. Our results indicate that this is not the case -- at best, testing against a range of baselines means that an experimenter can be more confident that the new technique is unlikely to significantly harm a strong baseline. Examining recent past work, we present evidence that the information retrieval (IR) community continues to test against weak baselines. 
This is unfortunate as, in light of our experiments, we conclude that the only way to be confident that a new technique is a contribution is to compare it against nothing less than the state of the art.<\/jats:p>","DOI":"10.1145\/2882782","type":"journal-article","created":{"date-parts":[[2016,6,10]],"date-time":"2016-06-10T13:54:00Z","timestamp":1465566840000},"page":"1-18","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":27,"title":["Examining Additivity and Weak Baselines"],"prefix":"10.1145","volume":"34","author":[{"given":"Sadegh","family":"Kharazmi","sequence":"first","affiliation":[{"name":"RMIT University & NICTA, Melbourne, Australia"}]},{"given":"Falk","family":"Scholer","sequence":"additional","affiliation":[{"name":"RMIT University, Melbourne, Australia"}]},{"given":"David","family":"Vallet","sequence":"additional","affiliation":[{"name":"Google, Sydney, Australia"}]},{"given":"Mark","family":"Sanderson","sequence":"additional","affiliation":[{"name":"RMIT University, Melbourne, Australia"}]}],"member":"320","published-online":{"date-parts":[[2016,6,9]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/1498759.1498766"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/1645953.1646031"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/2492189.2492191"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/1277741.1277805"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/2422256.2422258"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/290941.291025"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/1645953.1646033"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/1390334.1390446"},{"key":"e_1_2_1_9_1","volume-title":"Advances in Information Retrieval","author":"Cummins Ronan","unstructured":"Ronan Cummins, Mounia Lalmas, and Colm 
O\u2019Riordan. 2011. The limits of retrieval effectiveness. In Advances in Information Retrieval. Springer, 277--282."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/2348283.2348296"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1250\/ast.29.247"},{"key":"e_1_2_1_12_1","volume-title":"Advances in Music Information Retrieval","author":"Downie J. Stephen","unstructured":"J. Stephen Downie, Andreas F. Ehmann, Mert Bay, and M. Cameron Jones. 2010. The music information retrieval evaluation exchange: Some observations and insights. In Advances in Music Information Retrieval. Springer, 93--115."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/2600428.2611178"},{"key":"e_1_2_1_14_1","volume-title":"Advances in Information Retrieval","author":"Ferro Nicola","unstructured":"Nicola Ferro and Gianmaria Silvello. 2015. Rank-biased precision reloaded: Reproducibility and generalization. In Advances in Information Retrieval. Springer, 768--780."},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/2348283.2348285"},{"key":"e_1_2_1_16_1","volume-title":"Advances in Information Retrieval","author":"Hagen Matthias","unstructured":"Matthias Hagen, Martin Potthast, Michel B\u00fcchner, and Benno Stein. 2015. Twitter sentiment detection via ensemble classification using averaged confidence scores. In Advances in Information Retrieval. 
Springer, 741--754."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/2348283.2348397"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/2500887"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/2661829.2661894"},{"key":"e_1_2_1_20_1","volume-title":"Jones","author":"Leveling Johannes","year":"2012","unstructured":"Johannes Leveling, Lorraine Goeuriot, Liadh Kelly, and Gareth J. Jones. 2012. DCU@ TRECMed 2012: Using ad-hoc baselines for domain-specific retrieval. In Proceedings of TREC."},{"key":"e_1_2_1_21_1","volume-title":"Bibliometric-Enhanced Information Retrieval","author":"Mayr Philipp","year":"2016","unstructured":"Philipp Mayr, Andrea Scharnhorst, Birger Larsen, Philipp Schaer, and Peter Mutschke. 2014. Bibliometric-Enhanced Information Retrieval. Springer, 798--801. Retrieved May 13, 2016 from http:\/\/link.springer.com\/chapter\/10.1007\/978-3-319-06028-6_99"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/2348283.2348534"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/1816123.1816156"},{"key":"e_1_2_1_24_1","volume-title":"Proceedings of Australasian Language Technology Association Workshop. 96--100","author":"Puurula Antti","year":"2013","unstructured":"Antti Puurula. 2013. 
Cumulative progress in language models for information retrieval. In Proceedings of Australasian Language Technology Association Workshop. 96--100."},{"key":"e_1_2_1_25_1","volume-title":"Advances in Information Retrieval","author":"Rao Jinfeng","unstructured":"Jinfeng Rao, Jimmy Lin, and Miles Efron. 2015. Reproducible experiments on lexical and temporal feedback for tweet search. In Advances in Information Retrieval. Springer, 755--767."},{"key":"e_1_2_1_26_1","doi-asserted-by":"crossref","unstructured":"Stephen E. Robertson, Steve Walker, Susan Jones, Micheline M. Hancock-Beaulieu, and Mike Gatford. 1995. Okapi at TREC-3. NIST SPECIAL PUBLICATION SP 109--109.","DOI":"10.6028\/NIST.SP.500-225.city"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/2645710.2645746"},{"key":"e_1_2_1_28_1","volume-title":"3rd International Workshop on Evaluating Information Access (EVIA\u201910)","author":"Sakai Tetsuya","year":"2010","unstructured":"Tetsuya Sakai and Chin-Yew Lin. 2010. Ranking retrieval systems without relevance assessments-revisited. In 3rd International Workshop on Evaluating Information Access (EVIA\u201910). 
25--33."},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1561\/1500000009"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/2396761.2398553"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/2063576.2063869"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/2009916.2010112"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/2009916.2009997"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/2009916.2010111"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10791-011-9180-x"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-12275-0_11"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1561\/1500000042"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/2600428.2609511"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1108\/eb026616"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-53974-9_10"},{"key":"e_1_2_1_41_1","volume-title":"Proceedings of the International Conference on Intelligent Analysis","volume":"2","author":"Strohman Trevor","year":"2016","unstructured":"Trevor Strohman, Donald Metzler, Howard Turtle, and W. Bruce Croft. 2005. Indri: A language model-based search engine for complex queries. In Proceedings of the International Conference on Intelligent Analysis, Vol. 2. Citeseer, 26. 
Retrieved May 13, 2016 from http:\/\/citeseerx.ist.psu.edu\/viewdoc\/download?doi=10.1.1.65.3502&rep=rep1&type=pdf."},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/2009916.2010066"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/2682862.2682863"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/2348283.2348396"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/2348283.2348297"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/564376.564432"},{"key":"e_1_2_1_47_1","volume-title":"TREC: Experiment and Evaluation in Information Retrieval","author":"Voorhees E. M.","year":"2005","unstructured":"E. M. Voorhees and D. K. Harman. 2005. TREC: Experiment and Evaluation in Information Retrieval. MIT Press, Cambridge MA."},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/1571941.1571963"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/1390334.1390346"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/2348283.2348451"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.5555\/1138797.1710806"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-28997-2_26"}],"container-title":["ACM Transactions on Information 
Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2882782","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2882782","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T19:04:28Z","timestamp":1750273468000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2882782"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016,6,9]]},"references-count":52,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2016,9,14]]}},"alternative-id":["10.1145\/2882782"],"URL":"https:\/\/doi.org\/10.1145\/2882782","relation":{},"ISSN":["1046-8188","1558-2868"],"issn-type":[{"value":"1046-8188","type":"print"},{"value":"1558-2868","type":"electronic"}],"subject":[],"published":{"date-parts":[[2016,6,9]]},"assertion":[{"value":"2015-07-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2016-01-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2016-06-09","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}