{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,21]],"date-time":"2026-04-21T15:27:06Z","timestamp":1776785226566,"version":"3.51.2"},"reference-count":47,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2023,8,18]],"date-time":"2023-08-18T00:00:00Z","timestamp":1692316800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Inf. Syst."],"published-print":{"date-parts":[[2024,1,31]]},"abstract":"<jats:p>\n            We study hybrid search in text retrieval where lexical and semantic search are\n            <jats:italic>fused<\/jats:italic>\n            together with the intuition that the two are complementary in how they model relevance. In particular, we examine fusion by a convex combination of lexical and semantic scores, as well as the reciprocal rank fusion (RRF) method, and identify their advantages and potential pitfalls. 
Contrary to existing studies, we find RRF to be sensitive to its parameters; that the learning of a convex combination fusion is generally agnostic to the choice of score normalization; that convex combination outperforms RRF in in-domain and out-of-domain settings; and finally, that convex combination is sample efficient, requiring only a small set of training examples to tune its only parameter to a target domain.\n          <\/jats:p>","DOI":"10.1145\/3596512","type":"journal-article","created":{"date-parts":[[2023,5,20]],"date-time":"2023-05-20T08:59:21Z","timestamp":1684573161000},"page":"1-35","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":36,"title":["An Analysis of Fusion Functions for Hybrid Retrieval"],"prefix":"10.1145","volume":"42","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2469-8242","authenticated-orcid":false,"given":"Sebastian","family":"Bruch","sequence":"first","affiliation":[{"name":"Pinecone, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5912-5196","authenticated-orcid":false,"given":"Siyu","family":"Gai","sequence":"additional","affiliation":[{"name":"University of California, Berkeley, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6639-8240","authenticated-orcid":false,"given":"Amir","family":"Ingber","sequence":"additional","affiliation":[{"name":"Pinecone, Israel"}]}],"member":"320","published-online":{"date-parts":[[2023,8,18]]},"reference":[{"key":"e_1_3_5_2_2","volume-title":"Multi-Stage Search Architectures for Streaming Documents","author":"Asadi Nima","year":"2013","unstructured":"Nima Asadi. 2013. Multi-Stage Search Architectures for Streaming Documents. 
University of Maryland."},{"key":"e_1_3_5_3_2","doi-asserted-by":"publisher","DOI":"10.1145\/2484028.2484132"},{"key":"e_1_3_5_4_2","doi-asserted-by":"publisher","DOI":"10.1145\/3477495.3531704"},{"key":"e_1_3_5_5_2","doi-asserted-by":"publisher","DOI":"10.1145\/3331184.3331347"},{"key":"e_1_3_5_6_2","doi-asserted-by":"crossref","unstructured":"Tao Chen Mingyang Zhang Jing Lu Michael Bendersky and Marc Najork. 2022. Out-of-domain semantics to the rescue! Zero-shot hybrid retrieval models. In Advances in Information Retrieval. Lecture Notes in Computer Science Vol. 13185. Springer 95\u2013110.","DOI":"10.1007\/978-3-030-99736-6_7"},{"key":"e_1_3_5_7_2","doi-asserted-by":"publisher","DOI":"10.1145\/1571941.1572114"},{"key":"e_1_3_5_8_2","doi-asserted-by":"crossref","first-page":"423","DOI":"10.1007\/978-3-642-36973-5_36","volume-title":"Advances in Information Retrieval","author":"Dang Van","year":"2013","unstructured":"Van Dang, Michael Bendersky, and W. Bruce Croft. 2013. Two-stage learning to rank for information retrieval. In Advances in Information Retrieval. Springer, 423\u2013434."},{"key":"e_1_3_5_9_2","first-page":"4171","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 
4171\u20134186."},{"key":"e_1_3_5_10_2","doi-asserted-by":"publisher","DOI":"10.1145\/3477495.3531857"},{"key":"e_1_3_5_11_2","doi-asserted-by":"publisher","DOI":"10.1145\/3404835.3462891"},{"key":"e_1_3_5_12_2","doi-asserted-by":"publisher","DOI":"10.1145\/345508.345545"},{"key":"e_1_3_5_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/TBDATA.2019.2921572"},{"key":"e_1_3_5_14_2","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP\u201920)","author":"Karpukhin Vladimir","year":"2020","unstructured":"Vladimir Karpukhin, Barlas Oguz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-Tau Yih. 2020. Dense passage retrieval for open-domain question answering. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP\u201920)."},{"key":"e_1_3_5_15_2","unstructured":"Saar Kuzi Mingyang Zhang Cheng Li Michael Bendersky and Marc Najork. 2020. Leveraging semantic and lexical matching to improve the recall of document retrieval systems: A hybrid approach. arXiv:cs.IR\/2010.01195 (2020)."},{"key":"e_1_3_5_16_2","first-page":"2495","volume-title":"Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval","author":"Li Hang","year":"2022","unstructured":"Hang Li, Shuai Wang, Shengyao Zhuang, Ahmed Mourad, Xueguang Ma, Jimmy Lin, and Guido Zuccon. 2022. To interpolate or not to interpolate: PRF, dense and sparse retrievers. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2495\u20132500."},{"key":"e_1_3_5_17_2","doi-asserted-by":"crossref","unstructured":"Jimmy Lin Rodrigo Nogueira and Andrew Yates. 2021. Pretrained Transformers for text ranking: BERT and beyond. 
arXiv:cs.IR\/2010.06467 (2021).","DOI":"10.1007\/978-3-031-02181-7"},{"key":"e_1_3_5_18_2","doi-asserted-by":"publisher","DOI":"10.1561\/1500000016"},{"key":"e_1_3_5_19_2","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00369"},{"key":"e_1_3_5_20_2","volume-title":"Proceedings of the Conference and Labs of the Evaluation Forum (CLEF\u201920)","author":"Ma Ji","year":"2020","unstructured":"Ji Ma, Ivan Korotkov, Keith Hall, and Ryan T. McDonald. 2020. Hybrid first-stage retrieval models for biomedical literature. In Proceedings of the Conference and Labs of the Evaluation Forum (CLEF\u201920)."},{"key":"e_1_3_5_21_2","unstructured":"Xueguang Ma Kai Sun Ronak Pradeep and Jimmy J. Lin. 2021. A replication study of dense passage retriever. arXiv:cs.CL\/2004.04906 (2021)."},{"key":"e_1_3_5_22_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10791-012-9209-9"},{"key":"e_1_3_5_23_2","unstructured":"Yu. A. Malkov and D. A. Yashunin. 2016. Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs. arXiv:cs.DS\/1603.09320 (2016)."},{"key":"e_1_3_5_24_2","first-page":"50","volume-title":"Proceedings of the Open-Source IR Replicability Challenge Co-Located with the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval","author":"Mallia Antonio","year":"2019","unstructured":"Antonio Mallia, Michal Siedlaczek, Joel Mackenzie, and Torsten Suel. 2019. PISA: Performant indexes and search for academia. In Proceedings of the Open-Source IR Replicability Challenge Co-Located with the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval. 50\u201356."},{"key":"e_1_3_5_25_2","doi-asserted-by":"publisher","DOI":"10.1145\/3397271.3401266"},{"key":"e_1_3_5_26_2","unstructured":"Bhaskar Mitra Eric Nalisnick Nick Craswell and Rich Caruana. 2016. A dual embedding space model for document ranking. 
arXiv:cs.IR\/1602.01137 (2016)."},{"key":"e_1_3_5_27_2","unstructured":"Tri Nguyen Mir Rosenberg Xia Song Jianfeng Gao Saurabh Tiwary Rangan Majumder and Li Deng. 2016. MS MARCO: A human generated MAchine Reading COmprehension dataset. arXiv:1611.09268 (2016)."},{"key":"e_1_3_5_28_2","unstructured":"Rodrigo Nogueira and Kyunghyun Cho. 2020. Passage re-ranking with BERT. arXiv:cs.IR\/1901.04085 ."},{"key":"e_1_3_5_29_2","doi-asserted-by":"crossref","first-page":"708","DOI":"10.18653\/v1\/2020.findings-emnlp.63","volume-title":"Findings of the Association for Computational Linguistics: EMNLP 2020","author":"Nogueira Rodrigo","year":"2020","unstructured":"Rodrigo Nogueira, Zhiying Jiang, Ronak Pradeep, and Jimmy Lin. 2020. Document ranking with a pretrained sequence-to-sequence model. In Findings of the Association for Computational Linguistics: EMNLP 2020. Association for Computational Linguistics, 708\u2013718."},{"key":"e_1_3_5_30_2","unstructured":"Rodrigo Nogueira Wei Yang Kyunghyun Cho and Jimmy Lin. 2019. Multi-stage document ranking with BERT. arXiv:cs.IR\/1910.14424 (2019)."},{"key":"e_1_3_5_31_2","unstructured":"Rodrigo Nogueira Wei Yang Jimmy Lin and Kyunghyun Cho. 2019. Document expansion by query prediction. arXiv:cs.IR\/1904.08375 (2019)."},{"key":"e_1_3_5_32_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10791-009-9124-x"},{"key":"e_1_3_5_33_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1410"},{"key":"e_1_3_5_34_2","doi-asserted-by":"publisher","DOI":"10.1561\/1500000019"},{"key":"e_1_3_5_35_2","first-page":"109","volume-title":"TREC (NIST Special Publication)","author":"Robertson Stephen E.","year":"1994","unstructured":"Stephen E. Robertson, Steve Walker, Susan Jones, Micheline Hancock-Beaulieu, and Mike Gatford. 1994. Okapi at TREC-3. In TREC (NIST Special Publication), Vol. 500-225. 
National Institute of Standards and Technology (NIST), 109\u2013126."},{"key":"e_1_3_5_36_2","doi-asserted-by":"publisher","DOI":"10.1145\/3477495.3531766"},{"key":"e_1_3_5_37_2","doi-asserted-by":"publisher","DOI":"10.3115\/1220835.1220887"},{"key":"e_1_3_5_38_2","volume-title":"Proceedings of the 35th Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2)","author":"Thakur Nandan","year":"2021","unstructured":"Nandan Thakur, Nils Reimers, Andreas R\u00fcckl\u00e9, Abhishek Srivastava, and Iryna Gurevych. 2021. BEIR: A heterogeneous benchmark for zero-shot evaluation of information retrieval models. In Proceedings of the 35th Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2)."},{"key":"e_1_3_5_39_2","doi-asserted-by":"publisher","DOI":"10.5555\/3295222.3295349"},{"key":"e_1_3_5_40_2","first-page":"105","volume-title":"Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval","author":"Wang Lidan","year":"2011","unstructured":"Lidan Wang, Jimmy Lin, and Donald Metzler. 2011. A cascade ranking model for efficient ranked retrieval. In Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, New York, NY, 105\u2013114."},{"key":"e_1_3_5_41_2","doi-asserted-by":"publisher","DOI":"10.1145\/3471158.3472233"},{"key":"e_1_3_5_42_2","article-title":"Adapting boosting for information retrieval measures","author":"Wu Qiang","year":"2010","unstructured":"Qiang Wu, Christopher J. C. Burges, Krysta M. Svore, and Jianfeng Gao. 2010. Adapting boosting for information retrieval measures. Information Retrieval 13 (2010), 254\u2013270.","journal-title":"Information Retrieval"},{"key":"e_1_3_5_43_2","unstructured":"Xiang Wu Ruiqi Guo David Simcha Dave Dopson and Sanjiv Kumar. 2019. Efficient inner product approximation in hybrid spaces. 
arXiv:cs.LG\/1903.08690 (2019)."},{"key":"e_1_3_5_44_2","unstructured":"Yonghui Wu Mike Schuster Zhifeng Chen Quoc V. Le Mohammad Norouzi Wolfgang Macherey Maxim Krikun et\u00a0al. 2016. Google\u2019s Neural Machine Translation system: Bridging the gap between human and machine translation. arXiv:cs.CL\/1609.08144 (2016)."},{"key":"e_1_3_5_45_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Xiong Lee","year":"2021","unstructured":"Lee Xiong, Chenyan Xiong, Ye Li, Kwok-Fung Tang, Jialin Liu, Paul Bennett, Junaid Ahmed, and Arnold Overwijk. 2021. Approximate nearest neighbor negative contrastive learning for dense text retrieval. In Proceedings of the International Conference on Learning Representations."},{"key":"e_1_3_5_46_2","doi-asserted-by":"publisher","DOI":"10.1145\/2939672.2939677"},{"key":"e_1_3_5_47_2","doi-asserted-by":"publisher","DOI":"10.1145\/3539813.3545141"},{"key":"e_1_3_5_48_2","unstructured":"Jingtao Zhan Jiaxin Mao Yiqun Liu Min Zhang and Shaoping Ma. 2020. RepBERT: Contextualized text embeddings for first-stage retrieval. 
arXiv:cs.IR\/2006.15498 (2020)."}],"container-title":["ACM Transactions on Information Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3596512","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3596512","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:48:00Z","timestamp":1750178880000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3596512"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,8,18]]},"references-count":47,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2024,1,31]]}},"alternative-id":["10.1145\/3596512"],"URL":"https:\/\/doi.org\/10.1145\/3596512","relation":{},"ISSN":["1046-8188","1558-2868"],"issn-type":[{"value":"1046-8188","type":"print"},{"value":"1558-2868","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,8,18]]},"assertion":[{"value":"2022-09-22","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-05-03","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-08-18","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}