{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,22]],"date-time":"2025-10-22T20:10:44Z","timestamp":1761163844804,"version":"build-2065373602"},"reference-count":30,"publisher":"Wiley","issue":"1","license":[{"start":{"date-parts":[[2019,2,1]],"date-time":"2019-02-01T00:00:00Z","timestamp":1548979200000},"content-version":"vor","delay-in-days":396,"URL":"http:\/\/onlinelibrary.wiley.com\/termsAndConditions#vor"}],"content-domain":{"domain":["asistdl.onlinelibrary.wiley.com"],"crossmark-restriction":true},"short-container-title":["Proc. Assoc. Info. Sci. Tech."],"published-print":{"date-parts":[[2018,1]]},"abstract":"<jats:title>ABSTRACT<\/jats:title>\n                  <jats:p>Within many domains, such as news, medicine and patent, documents contain a variety of fields such as title, author, body, source, etc. As such fielded retrieval models that query across these fields are often employed. It is largely presumed that fielding provides a better representation of the document and offers more control when querying and that this will lead to improved retrieval performance. However, depending on how the fields are weighted and if the fields are populated, the retrieval algorithm may unduly favour certain documents over others. This is known as algorithmic bias and it can be detrimental to retrieval systems performance. In this paper, we explore the impact of fielding on retrieval bias and performance across a variety of TREC News Test Collections. We perform an extensive large\u2010scale analysis on two types of fielded retrieval model variations that are based on the popular BM25 retrieval algorithm where either: fields are scored independently and then combined (Model 1), or fields are first combined and then scored (Model 2). Our findings show that for Model 1 fielding, a strong correlation exists between retrieval bias and performance such that as title fields are weighted more heavily, bias increases, while retrieval performance decreases. When weighting is applied to content\u2010based fields, performance increases as bias decreases, showing that relying more on content may be favourable in terms of fairness and performance. On the other hand, for Model 2 fielding, the relationship between retrieval bias and performance is more complex. But, crucially we show that Model 2 fielding results in lower retrieval bias and greater performance than Model 1 fielding. And, we observed that under Model 1, news articles without titles are substantially less retrievable (i.e. more susceptible to algorithmic bias). These findings have serious ramifications as many popular Open Source Information Retrieval frameworks, commonly used by professional searchers, use the default implementation of Model 1 for their fielded search capability. This research shows the importance of analysing retrieval algorithms with respect to both bias and performance to ensure they minimize any unwanted or unintended biases when maximising performance. Further work is required to examine this phenomenon in more detail and to design fielded retrieval models that have the advantages of control and performance without detrimental biases.<\/jats:p>","DOI":"10.1002\/pra2.2018.14505501061","type":"journal-article","created":{"date-parts":[[2019,2,1]],"date-time":"2019-02-01T16:52:19Z","timestamp":1549039939000},"page":"564-572","update-policy":"https:\/\/doi.org\/10.1002\/crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["The impact of fielding on retrieval performance and bias"],"prefix":"10.1002","volume":"55","author":[{"given":"Colin","family":"Wilkie","sequence":"first","affiliation":[{"name":"University of Glasgow United Kingdom"}]},{"given":"Leif","family":"Azzopardi","sequence":"additional","affiliation":[{"name":"University of Strathclyde United Kingdom"}]}],"member":"311","published-online":{"date-parts":[[2019,2]]},"reference":[{"key":"e_1_2_8_2_1","doi-asserted-by":"crossref","unstructured":"Azzopardi L.andOwens C.(2009) \u2018Search engine predilection towards news media providers\u2019 inProceedings \u2013 32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval SIGIR2009. doi:10.1145\/1571941.1572122.","DOI":"10.1145\/1571941.1572122"},{"key":"e_1_2_8_3_1","unstructured":"Azzopardi L. De Rijke M.and others (2006) \u2018Query intention acquisition: A case study on automatically inferring structured queries\u2019 inProceedings of the 6th Dutch\u2010Belgian Information Retrieval Workshop pp.3\u201310."},{"key":"e_1_2_8_4_1","doi-asserted-by":"crossref","unstructured":"Azzopardi L.andVinay V.(2008a)Accessibility in information retrieval Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). doi:10.1007\/978-3-540-78646-7_46.","DOI":"10.1007\/978-3-540-78646-7_46"},{"key":"e_1_2_8_5_1","doi-asserted-by":"crossref","unstructured":"Azzopardi L.andVinay V.(2008b) \u2018Retrievability: An evaluation measure for higher order information access tasks\u2019 inInternational Conference on Information and Knowledge Management Proceedings. doi:10.1145\/1458082.1458157.","DOI":"10.1145\/1458082.1458157"},{"key":"e_1_2_8_6_1","doi-asserted-by":"crossref","unstructured":"Bashir S.andRauber A.(2009a) \u2018Identification of Low\/High Retrievable Patents Using Content\u2010based Features\u2019 inProceedings of the 2Nd International Workshop on Patent Information Retrieval. New York NY USA: ACM (PaIR \u201809) pp.9\u201316. doi:10.1145\/1651343.1651346.","DOI":"10.1145\/1651343.1651346"},{"key":"e_1_2_8_7_1","doi-asserted-by":"crossref","unstructured":"Bashir S.andRauber A.(2009b) \u2018Improving retrievability of patents with cluster\u2010based pseudo\u2010relevance feedback documents selection\u2019 inProc. of the 18th ACM CIKM pp.1863\u20131866.","DOI":"10.1145\/1645953.1646250"},{"key":"e_1_2_8_8_1","doi-asserted-by":"publisher","DOI":"10.1002\/asi.21549"},{"key":"e_1_2_8_9_1","first-page":"189","volume-title":"Knowledge and Information Systems","author":"Bashir S.","year":"2014"},{"key":"e_1_2_8_10_1","doi-asserted-by":"crossref","unstructured":"Blanco R.andBoldi P.(2012) \u2018Extending BM25 with Multiple Query Operators\u2019 inProceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval. New York NY USA: ACM (SIGIR \u201812) pp.921\u2013930. doi:10.1145\/2348283.2348406.","DOI":"10.1145\/2348283.2348406"},{"key":"e_1_2_8_11_1","doi-asserted-by":"crossref","unstructured":"Chen R.\u2010C. Azzopardi L.andScholer F.(2017) \u2018An empirical analysis of pruning techniques performance retrievability and bias\u2019 inInternational Conference on Information and Knowledge Management Proceedings. doi:10.1145\/3132847.3133151.","DOI":"10.1145\/3132847.3133151"},{"key":"e_1_2_8_12_1","doi-asserted-by":"publisher","DOI":"10.2307\/1937992"},{"key":"e_1_2_8_13_1","doi-asserted-by":"crossref","unstructured":"Itakura K. Y.andClarke C. L. A.(2010) \u2018A Framework for BM25F\u2010based XML Retrieval\u2019 inProceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval. New York NY USA: ACM (SIGIR \u201810) pp.843\u2013844. doi:10.1145\/1835449.1835644.","DOI":"10.1145\/1835449.1835644"},{"key":"e_1_2_8_14_1","doi-asserted-by":"crossref","unstructured":"Jimmy Zuccon G.andKoopman B.(2016) \u2018Boosting Titles Does Not Generally Improve Retrieval Effectiveness\u2019 inProceedings of the 21st Australasian Document Computing Symposium. New York NY USA: ACM (ADCS \u201816) pp.25\u201332. doi:10.1145\/3015022.3015028.","DOI":"10.1145\/3015022.3015028"},{"key":"e_1_2_8_15_1","doi-asserted-by":"crossref","unstructured":"Kim J. Xue X.andCroft W. B.(2009) \u2018A Probabilistic Retrieval Model for Semistructured Data\u2019 inProceedings of the 31th European Conference on IR Research on Advances in Information Retrieval. Berlin Heidelberg: Springer\u2010Verlag (ECIR \u201809) pp.228\u2013239. doi:10.1007\/978-3-642-00958-7_22.","DOI":"10.1007\/978-3-642-00958-7_22"},{"key":"e_1_2_8_16_1","doi-asserted-by":"crossref","unstructured":"Kim J. Y.andCroft W. B.(2012) \u2018A Field Relevance Model for Structured Document Retrieval\u2019 inProceedings of the 34th European Conference on Advances in Information Retrieval. Berlin Heidelberg: Springer\u2010Verlag (ECIR'12) pp.97\u2013108. doi:10.1007\/978-3-642-28997-2_9.","DOI":"10.1007\/978-3-642-28997-2_9"},{"key":"e_1_2_8_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/2983270"},{"key":"e_1_2_8_18_1","doi-asserted-by":"crossref","unstructured":"Lipani A.et al. (2015) \u2018An Initial Analytical Exploration of Retrievability\u2019 inProc. of the 2015 ICTIR. ACM (ICTIR \u201815) pp.329\u2013332.","DOI":"10.1145\/2808194.2809495"},{"key":"e_1_2_8_19_1","doi-asserted-by":"crossref","unstructured":"Ogilvie P.andCallan J.(2003) \u2018Combining Document Representations for Known\u2010item Search\u2019 inProceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Informaion Retrieval. New York NY USA: ACM (SIGIR \u201803) pp.143\u2013150. doi:10.1145\/860435.860463.","DOI":"10.1145\/860435.860463"},{"key":"e_1_2_8_20_1","doi-asserted-by":"crossref","unstructured":"Plachouras V.andOunis I.(2007) \u2018Multinomial Randomness Models for Retrieval with Document Fields\u2019 inProceedings of the 29th European Conference on IR Research. Berlin Heidelberg: Springer\u2010Verlag (ECIR'07) pp.28\u201339.","DOI":"10.1007\/978-3-540-71496-5_6"},{"key":"e_1_2_8_21_1","doi-asserted-by":"publisher","DOI":"10.1561\/1500000019"},{"key":"e_1_2_8_22_1","doi-asserted-by":"crossref","unstructured":"Robertson S. Zaragoza H.andTaylor M.(2004) \u2018Simple BM25 extension to multiple weighted fields\u2019 inProceedings of the 13th ACM CIKM pp.42\u201349.","DOI":"10.1145\/1031171.1031181"},{"key":"e_1_2_8_23_1","first-page":"1","volume-title":"International Journal on Digital Libraries","author":"Samar T.","year":"2017"},{"key":"e_1_2_8_24_1","doi-asserted-by":"crossref","unstructured":"Singhal A. Buckley C.andMitra M.(1996) \u2018Pivoted document length normalization\u2019 inProce. of the 19th ACM SIGIR conference. (SIGIR \u201896) pp.21\u201329.","DOI":"10.1145\/243199.243206"},{"key":"e_1_2_8_25_1","doi-asserted-by":"crossref","unstructured":"Traub M. C.et al. (2016) \u2018Querylog\u2010based assessment of retrievability bias in a large newspaper corpus\u2019 in2016 IEEE\/ACM Joint Conference on Digital Libraries (JCDL) pp.7\u201316.","DOI":"10.1145\/2910896.2910907"},{"key":"e_1_2_8_26_1","doi-asserted-by":"crossref","unstructured":"Wilkie C.andAzzopardi L.(2013) \u2018Relating retrievability performance and length\u2019 inSIGIR 2013 \u2010 Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval. doi:10.1145\/2484028.2484145.","DOI":"10.1145\/2484028.2484145"},{"key":"e_1_2_8_27_1","doi-asserted-by":"crossref","unstructured":"Wilkie C.andAzzopardi L.(2014a)Best and fairest: An empirical analysis of retrieval system bias Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). doi:10.1007\/978-3-319-06028-6_2.","DOI":"10.1007\/978-3-319-06028-6_2"},{"key":"e_1_2_8_28_1","doi-asserted-by":"crossref","unstructured":"Wilkie C.andAzzopardi L.(2014b)Efficiently estimating retrievability bias Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). doi:10.1007\/978-3-319-06028-6_82.","DOI":"10.1007\/978-3-319-06028-6_82"},{"key":"e_1_2_8_29_1","doi-asserted-by":"crossref","unstructured":"Wilkie C.andAzzopardi L.(2015) \u2018Query length retrievability bias and performance\u2019 inInternational Conference on Information and Knowledge Management Proceedings. doi:10.1145\/2806416.2806604.","DOI":"10.1145\/2806416.2806604"},{"key":"e_1_2_8_30_1","doi-asserted-by":"crossref","unstructured":"Wilkie C.andAzzopardi L.(2017) \u2018Algorithmic Bias: Do Good Systems Make Relevant Documents More Retrievable\u2019 inProceedings of the International ACM CIKM. (CIKM \u201817).","DOI":"10.1145\/3132847.3133135"},{"key":"e_1_2_8_31_1","doi-asserted-by":"crossref","unstructured":"Wilkie C.andAzzopardi L.(2017) \u2018An initial investigation of query expansion bias\u2019 inICTIR 2017 \u2010 Proceedings of the 2017 ACM SIGIR International Conference on the Theory of Information Retrieval. doi:10.1145\/3121050.3121097.","DOI":"10.1145\/3121050.3121097"}],"container-title":["Proceedings of the Association for Information Science and Technology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/api.wiley.com\/onlinelibrary\/tdm\/v1\/articles\/10.1002%2Fpra2.2018.14505501061","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/pdf\/10.1002\/pra2.2018.14505501061","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/full-xml\/10.1002\/pra2.2018.14505501061","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/asistdl.onlinelibrary.wiley.com\/doi\/pdf\/10.1002\/pra2.2018.14505501061","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,21]],"date-time":"2025-10-21T17:42:54Z","timestamp":1761068574000},"score":1,"resource":{"primary":{"URL":"https:\/\/asistdl.onlinelibrary.wiley.com\/doi\/10.1002\/pra2.2018.14505501061"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,1]]},"references-count":30,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2018,1]]}},"alternative-id":["10.1002\/pra2.2018.14505501061"],"URL":"https:\/\/doi.org\/10.1002\/pra2.2018.14505501061","archive":["Portico"],"relation":{},"ISSN":["2373-9231","2373-9231"],"issn-type":[{"type":"print","value":"2373-9231"},{"type":"electronic","value":"2373-9231"}],"subject":[],"published":{"date-parts":[[2018,1]]},"assertion":[{"value":"2019-02-01","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}