{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:23:58Z","timestamp":1750220638565,"version":"3.41.0"},"reference-count":54,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2020,12,17]],"date-time":"2020-12-17T00:00:00Z","timestamp":1608163200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100011105","name":"European Commission","doi-asserted-by":"publisher","award":["780751"],"award-info":[{"award-number":["780751"]}],"id":[{"id":"10.13039\/100011105","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Ministero dell\u2019Istruzione, dell\u2019Universit\u00e0 e della Ricerca","award":["ARS01_00917"],"award-info":[{"award-number":["ARS01_00917"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Inf. Syst."],"published-print":{"date-parts":[[2021,4,30]]},"abstract":"<jats:p>We observe that in curated documents the distribution of the occurrences of salient terms, e.g., terms with a high Inverse Document Frequency, is not uniform, and such terms are primarily concentrated towards the beginning and the end of the document. Exploiting this observation, we propose a novel version of the classical BM25 weighting model, called BM25 Passage (BM25P), which scores query results by computing a linear combination of term statistics in the different portions of the document. We study a multiplicity of partitioning schemes of document content into passages and compute the collection-dependent weights associated with them on the basis of the distribution of occurrences of salient terms in documents. 
Moreover, we tune BM25P hyperparameters and investigate their impact on ad hoc document retrieval through fully reproducible experiments conducted using four publicly available datasets. Our findings demonstrate that our BM25P weighting model markedly and consistently outperforms BM25 in terms of effectiveness by up to 17.44% in NDCG@5 and 85% in NDCG@1, and up to 21% in MRR.<\/jats:p>","DOI":"10.1145\/3428687","type":"journal-article","created":{"date-parts":[[2020,12,17]],"date-time":"2020-12-17T17:51:13Z","timestamp":1608227473000},"page":"1-11","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["Weighting Passages Enhances Accuracy"],"prefix":"10.1145","volume":"39","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-5265-1831","authenticated-orcid":false,"given":"Cristina Ioana","family":"Muntean","sequence":"first","affiliation":[{"name":"ISTI-CNR, Italy"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Franco Maria","family":"Nardini","sequence":"additional","affiliation":[{"name":"ISTI-CNR, Italy"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Raffaele","family":"Perego","sequence":"additional","affiliation":[{"name":"ISTI-CNR, Italy"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7427-1001","authenticated-orcid":false,"given":"Nicola","family":"Tonellotto","sequence":"additional","affiliation":[{"name":"University of Pisa, Italy"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ophir","family":"Frieder","sequence":"additional","affiliation":[{"name":"Georgetown University, 
U.S.A."}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2020,12,17]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1007\/11735106_3"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/1367497.1367717"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10791-009-9118-8"},{"volume-title":"Proceedings of the ACM Conference on Recommender Systems (RecSys\u201907)","author":"Bogers Toine","key":"e_1_2_1_4_1","unstructured":"Toine Bogers and Antal van den Bosch. 2007. Comparing and evaluating information retrieval algorithms for news recommendation. In Proceedings of the ACM Conference on Recommender Systems (RecSys\u201907). Association for Computing Machinery, New York, NY, 141--144. DOI:https:\/\/doi.org\/10.1145\/1297231.1297256"},{"key":"e_1_2_1_5_1","volume-title":"Proceedings of the 15th ACM International Conference on Information and Knowledge Management (CIKM\u201906)","author":"B\u00fcttcher Stefan","year":"2006","unstructured":"Stefan B\u00fcttcher and Charles L. A. Clarke. 2006. A document-centric approach to static index pruning in text retrieval systems. 
In Proceedings of the 15th ACM International Conference on Information and Knowledge Management (CIKM\u201906). Association for Computing Machinery, New York, NY, 182--189. DOI:https:\/\/doi.org\/10.1145\/1183614.1183644"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/1148170.1148285"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.5555\/188490.188589"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipm.2016.05.004"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/366836.366860"},{"key":"e_1_2_1_10_1","volume-title":"Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR\u201919)","author":"Catena Matteo","year":"2019","unstructured":"Matteo Catena, Ophir Frieder, Cristina Ioana Muntean, Franco Maria Nardini, Raffaele Perego, and Nicola Tonellotto. 2019. Enhanced news retrieval: Passages lead the way! In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR\u201919). ACM, New York, NY, 1269--1272. DOI:https:\/\/doi.org\/10.1145\/3331184.3331373"},{"key":"e_1_2_1_11_1","volume-title":"Proceedings of the NewsIR\u201916 Workshop","author":"Corney David","year":"2016","unstructured":"David Corney, Dyaa Albakour, Miguel Martinez, and Samir Moussa. 2016. What do a million news articles look like? 
In Proceedings of the NewsIR\u201916 Workshop. CEUR-WS, 42--47."},{"key":"e_1_2_1_12_1","volume-title":"Search Engines: Information Retrieval in Practice","author":"Croft Bruce","year":"2009","unstructured":"Bruce Croft, Donald Metzler, and Trevor Strohman. 2009. Search Engines: Information Retrieval in Practice (1st ed.). Addison-Wesley Publishing Company.","edition":"1"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/3209978.3209980"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/160688.160720"},{"key":"e_1_2_1_15_1","volume-title":"Knowl. Inf. Syst.","author":"G\u00e9ry Mathias","year":"2012","unstructured":"Mathias G\u00e9ry and Christine Largeron. 2012. BM25T: A BM25 extension for focused information retrieval. Knowl. Inf. Syst. 32, 1 (01 July 2012), 217--241. 
DOI:https:\/\/doi.org\/10.1007\/s10115-011-0426-0"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ins.2011.03.007"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.5555\/369401.369409"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2018.2860982"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.5555\/1005332.1005345"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/3269206.3271764"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/3331184.3331205"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/3308774.3308781"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/2808194.2809486"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/2766462.2767827"},{"volume-title":"Proceedings of the 11th International Conference on Information and Knowledge Management (CIKM\u201902)","author":"Liu Xiaoyong","key":"e_1_2_1_25_1","unstructured":"Xiaoyong Liu and W. Bruce Croft. 2002. Passage retrieval based on language models. In Proceedings of the 11th International Conference on Information and Knowledge Management (CIKM\u201902). Association for Computing Machinery, New York, NY, 375--382. 
DOI:https:\/\/doi.org\/10.1145\/584792.584854"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/1571941.1571994"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/3331184.3331316"},{"key":"e_1_2_1_28_1","volume-title":"Proceedings of the OSIR Workshop at the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR\u201912)","author":"Macdonald Craig","year":"2012","unstructured":"Craig Macdonald, Richard McCreadie, Rodrygo L. T. Santos, and Iadh Ounis. 2012. From puppy to maturity: Experiences in developing Terrier. In Proceedings of the OSIR Workshop at the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR\u201912). 60--63."},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/3077136.3080827"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.5555\/1526645.1526657"},{"key":"e_1_2_1_31_1","volume-title":"Proceedings of the SIGIR Workshop on Information Retrieval for Question Answering","volume":"2","author":"Monz Christof","year":"2004","unstructured":"Christof Monz. 2004. Minimal span weighting retrieval for question answering. In Proceedings of the SIGIR Workshop on Information Retrieval for Question Answering, Vol. 
2."},{"key":"e_1_2_1_32_1","volume-title":"MS MARCO: A human generated machine reading comprehension dataset.","author":"Nguyen Tri","year":"2016","unstructured":"Tri Nguyen, Mir Rosenberg, Xia Song, Jianfeng Gao, Saurabh Tiwary, Rangan Majumder, and Li Deng. 2016. MS MARCO: A human generated machine reading comprehension dataset. Retrieved from https:\/\/www.microsoft.com\/en-us\/research\/publication\/ms-marco-human-generated-machine-reading-comprehension-dataset\/."},{"key":"e_1_2_1_33_1","volume-title":"Proceedings of the 1st Instructional Conference on Machine Learning","volume":"242","author":"Ramos Juan","year":"2003","unstructured":"Juan Ramos et\u00a0al. 2003. Using TF-IDF to determine word relevance in document queries. In Proceedings of the 1st Instructional Conference on Machine Learning, Vol. 242. Piscataway, NJ, 133--142."},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/502932.502945"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-36618-0_15"},{"volume-title":"Proceedings of the 3rd Text REtrieval Conference (TREC-3)","author":"Robertson Stephen","key":"e_1_2_1_36_1","unstructured":"
Stephen Robertson, S. Walker, S. Jones, M. M. Hancock-Beaulieu, and M. Gatford. 1995. Okapi at TREC-3. In Proceedings of the 3rd Text REtrieval Conference (TREC-3). 109--126. Retrieved from https:\/\/www.microsoft.com\/en-us\/research\/publication\/okapi-at-trec-3\/."},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/1031171.1031181"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1561\/1500000019"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/160688.160693"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1002\/asi.10060"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/1321440.1321528"},{"key":"e_1_2_1_42_1","volume-title":"Proceedings of the 8th International Conference on Information and Knowledge Management (CIKM\u201999)","author":"Song Fei","year":"1999","unstructured":"Fei Song and W. Bruce Croft. 1999. A general language model for information retrieval. In Proceedings of the 8th International Conference on Information and Knowledge Management (CIKM\u201999). Association for Computing Machinery, New York, NY, 316--321. 
DOI:https:\/\/doi.org\/10.1145\/319950.320022"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/1277741.1277794"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/1183614.1183698"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/290941.290947"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1561\/1500000057"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipm.2003.10.003"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/2682862.2682863"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/1390334.1390407"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.5555\/188490.188591"},{"key":"e_1_2_1_51_1","volume-title":"Proceedings IEEE Advances in Digital Libraries","author":"Wolff J. E.","year":"2000","unstructured":"J. E. Wolff, H. Florke, and A. B. Cremers. 2000. Searching and browsing collections of structural information. In Proceedings IEEE Advances in Digital Libraries. 141--150. 
DOI:https:\/\/doi.org\/10.1109\/ADL.2000.848377"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1145\/3331184.3331233"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/984321.984322"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1145\/2348283.2348507"}],"container-title":["ACM Transactions on Information Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3428687","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3428687","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T22:01:59Z","timestamp":1750197719000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3428687"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,12,17]]},"references-count":54,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2021,4,30]]}},"alternative-id":["10.1145\/3428687"],"URL":"https:\/\/doi.org\/10.1145\/3428687","relation":{},"ISSN":["1046-8188","1558-2868"],"issn-type":[{"type":"print","value":"1046-8188"},{"type":"electronic","value":"1558-2868"}],"subject":[],"published":{"date-parts":[[2020,12,17]]},"assertion":[{"value":"2020-01-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-10-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-12-17","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}