{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,25]],"date-time":"2026-03-25T02:48:55Z","timestamp":1774406935397,"version":"3.50.1"},"reference-count":11,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2017,8,2]],"date-time":"2017-08-02T00:00:00Z","timestamp":1501632000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["SIGIR Forum"],"published-print":{"date-parts":[[2017,8,2]]},"abstract":"<jats:p>Automatic information retrieval systems have to deal with documents of varying lengths in a text collection. Document length normalization is used to fairly retrieve documents of all lengths. In this study, we observe that a normalization scheme that retrieves documents of all lengths with similar chances as their likelihood of relevance will outperform another scheme which retrieves documents with chances very different from their likelihood of relevance. We show that the retrieval probabilities for a particular normalization method deviate systematically from the relevance probabilities across different collections. We present pivoted normalization, a technique that can be used to modify any normalization function thereby reducing the gap between the relevance and the retrieval probabilities. Training pivoted normalization on one collection, we can successfully use it on other (new) text collections, yielding a robust, collection independent normalization technique. We use the idea of pivoting with the well known cosine normalization function. 
We point out some shortcomings of the cosine function and present two new normalization functions--pivoted unique normalization and pivoted byte size normalization.<\/jats:p>","DOI":"10.1145\/3130348.3130365","type":"journal-article","created":{"date-parts":[[2017,8,2]],"date-time":"2017-08-02T19:36:12Z","timestamp":1501702572000},"page":"176-184","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":31,"title":["Pivoted Document Length Normalization"],"prefix":"10.1145","volume":"51","author":[{"given":"Amit","family":"Singhal","sequence":"first","affiliation":[{"name":"Cornell University, Ithaca, NY"}]},{"given":"Chris","family":"Buckley","sequence":"additional","affiliation":[{"name":"Cornell University, Ithaca, NY"}]},{"given":"Mandar","family":"Mitra","sequence":"additional","affiliation":[{"name":"Cornell University, Ithaca, NY"}]}],"member":"320","published-online":{"date-parts":[[2017,8,2]]},"reference":[{"key":"e_1_2_1_1_1","first-page":"29","volume-title":"Proceedings of the Third Text REtrieval Conference (TREC-3)","author":"Broglio J.","year":"1995","unstructured":"J. Broglio, J.P. Callan, W.B. Croft, and D.W. Nachbar. Document retrieval and routing using the INQUERY system. In D. K. Harman, editor, Proceedings of the Third Text REtrieval Conference (TREC-3), pages 29--38. NIST Special Publication 500-225, April 1995."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.3115\/1075671.1075753"},{"key":"e_1_2_1_3_1","first-page":"69","volume-title":"Proceedings of the Third Text REtrieval Conference (TREC-3)","author":"Buckley Chris","year":"1995","unstructured":"Chris Buckley, James Allan, Gerard Salton, and Amit Singhal. 
Automatic query expansion using SMART: TREC 3. In D. K. Harman, editor, Proceedings of the Third Text REtrieval Conference (TREC-3), pages 69--80. NIST Special Publication 500-225, April 1995."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.6028\/NIST.SP.500-225"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4471-2099-5_24"},{"key":"e_1_2_1_6_1","first-page":"109","volume-title":"Proceedings of the Third Text REtrieval Conference (TREC-3)","author":"Robertson S.E.","year":"1995","unstructured":"S.E. Robertson, S. Walker, S. Jones, M.M. Hancock-Beaulieu, and M. Gatford. Okapi at TREC-3. In D. K. Harman, editor, Proceedings of the Third Text REtrieval Conference (TREC-3), pages 109--126. NIST Special Publication 500-225, April 1995."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.5555\/77013"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1016\/0306-4573(88)90021-0"},{"key":"e_1_2_1_9_1","volume-title":"Introduction to Modern Information Retrieval","author":"Salton Gerard","year":"1983","unstructured":"Gerard Salton and M.J. McGill. Introduction to Modern Information Retrieval. 
McGraw Hill Book Co., New York, 1983."},{"issue":"11","key":"e_1_2_1_10_1","first-page":"613","article-title":"A vector space model for information retrieval","volume":"18","author":"Salton Gerard","year":"1975","unstructured":"Gerard Salton, A. Wong, and C.S. Yang. A vector space model for information retrieval. Journal of the American Society for Information Science, 18(11):613--620, November 1975.","journal-title":"Journal of the American Society for Information Science"},{"key":"e_1_2_1_11_1","first-page":"149","volume-title":"Fifth Annual Symposium on Document Analysis and Information Retrieval","author":"Singhal Amit","year":"1996","unstructured":"Amit Singhal, Gerard Salton, and Chris Buckley. Length normalization in degraded text collections. In Fifth Annual Symposium on Document Analysis and Information Retrieval, pages 149--162, April 1996. 
Also Technical Report TR95-1507, Department of Computer Science, Cornell University, Ithaca, NY 14853, April 1995."}],"container-title":["ACM SIGIR Forum"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3130348.3130365","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3130348.3130365","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T02:26:17Z","timestamp":1750213577000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3130348.3130365"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017,8,2]]},"references-count":11,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2017,8,2]]}},"alternative-id":["10.1145\/3130348.3130365"],"URL":"https:\/\/doi.org\/10.1145\/3130348.3130365","relation":{},"ISSN":["0163-5840"],"issn-type":[{"value":"0163-5840","type":"print"}],"subject":[],"published":{"date-parts":[[2017,8,2]]},"assertion":[{"value":"2017-08-02","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}