{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,14]],"date-time":"2025-11-14T17:43:08Z","timestamp":1763142188432,"version":"3.41.0"},"reference-count":85,"publisher":"Association for Computing Machinery (ACM)","issue":"5","license":[{"start":{"date-parts":[[2024,4,29]],"date-time":"2024-04-29T00:00:00Z","timestamp":1714348800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"European Union - Horizon 2020 Program","award":["INFRAIA-01-2018-2019"],"award-info":[{"award-number":["INFRAIA-01-2018-2019"]}]},{"name":"Integrating Activities for Advanced Communities","award":["871042"],"award-info":[{"award-number":["871042"]}]},{"name":"SoBigData++: European Integrated Infrastructure for Social Mining and Big Data Analytics"},{"name":"Science and Engineering Research Board, Department of Science and Technology, Government of India","award":["SRG\/2022\/001548"],"award-info":[{"award-number":["SRG\/2022\/001548"]}]},{"name":"DST-INSPIRE Faculty Fellowship","award":["DST\/INSPIRE\/04\/2021\/003055"],"award-info":[{"award-number":["DST\/INSPIRE\/04\/2021\/003055"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Inf. Syst."],"published-print":{"date-parts":[[2024,9,30]]},"abstract":"<jats:p>Contextual ranking models have delivered impressive performance improvements over classical models in the document ranking task. However, these highly over-parameterized models tend to be data-hungry and require large amounts of data even for fine-tuning. In this article, we propose data-augmentation methods for effective and robust ranking performance. One of the key benefits of using data augmentation is in achieving<jats:italic>sample efficiency<\/jats:italic>or learning effectively when we have only a small amount of training data. 
We propose supervised and unsupervised data augmentation schemes by creating training data using parts of the relevant documents in the query-document pairs. We then adapt a family of contrastive losses for the document ranking task that can exploit the augmented data to learn an effective ranking model. Our extensive experiments on subsets of the<jats:sc>MS MARCO<\/jats:sc>and<jats:sc>TREC-DL<\/jats:sc>test sets show that data augmentation, along with the ranking-adapted contrastive losses, results in performance improvements under most dataset sizes. Apart from sample efficiency, we conclusively show that data augmentation results in robust models when transferred to out-of-domain benchmarks. Our performance improvements in in-domain and more prominently in out-of-domain benchmarks show that augmentation regularizes the ranking model and improves its robustness and generalization capability.<\/jats:p>","DOI":"10.1145\/3634911","type":"journal-article","created":{"date-parts":[[2023,11,29]],"date-time":"2023-11-29T11:58:56Z","timestamp":1701259136000},"page":"1-29","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":6,"title":["Data Augmentation for Sample Efficient and Robust Document Ranking"],"prefix":"10.1145","volume":"42","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-8006-0649","authenticated-orcid":false,"given":"Abhijit","family":"Anand","sequence":"first","affiliation":[{"name":"L3S Research Center, Hannover, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1992-9261","authenticated-orcid":false,"given":"Jurek","family":"Leonhardt","sequence":"additional","affiliation":[{"name":"Delft University of Technology, The Netherlands and L3S ResearchCenter, Hannover, 
Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4364-2487","authenticated-orcid":false,"given":"Jaspreet","family":"Singh","sequence":"additional","affiliation":[{"name":"Independent Researcher, Berlin, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2486-7608","authenticated-orcid":false,"given":"Koustav","family":"Rudra","sequence":"additional","affiliation":[{"name":"Indian Institute of Technology Kharagpur, Kharagpur, India"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0163-0739","authenticated-orcid":false,"given":"Avishek","family":"Anand","sequence":"additional","affiliation":[{"name":"Delft University of Technology, Delft, Netherlands"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2024,4,29]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-12275-0_30"},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.1145\/3539813.3545139"},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.1145\/3368567.3368587"},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1145\/3477495.3531863"},{"key":"e_1_3_2_6_2","first-page":"1","article-title":"Evaluation of scholarly information retrieval using precision and recall","author":"Butt Kiran","year":"2021","unstructured":"Kiran Butt and Abid Hussain. 2021. Evaluation of scholarly information retrieval using precision and recall. Library Philosophy and Practice (2021), 1\u201311.","journal-title":"Library Philosophy and Practice"},{"key":"e_1_3_2_7_2","unstructured":"Wei-Cheng Chang X. Yu Felix Yin-Wen Chang Yiming Yang and Sanjiv Kumar. 2020. Pre-training tasks for embedding-based large-scale retrieval. In 8th International Conference on Learning Representations (ICLR\u201920) Addis Ababa Ethiopia April 26-30 2020. 
https:\/\/openreview.net\/forum?id=rkg-mA4FDr"},{"key":"e_1_3_2_8_2","unstructured":"Pengguang Chen Shu Liu Hengshuang Zhao and Jiaya Jia. 2020. Gridmask data augmentation. arXiv:2001.04086. Retrieved from https:\/\/arxiv.org\/abs\/cs\/001.04086"},{"key":"e_1_3_2_9_2","doi-asserted-by":"crossref","unstructured":"Yanda Chen Chris Kedzie Suraj Nair Petra Galu\u0161\u010d\u00e1kov\u00e1 Rui Zhang Douglas W. Oard and Kathleen Mckeown. 2021. Cross-language sentence selection via data augmentation and rationale training. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 3881\u20133895.","DOI":"10.18653\/v1\/2021.acl-long.300"},{"key":"e_1_3_2_10_2","article-title":"TREC-2019-Deep-Learning","author":"Craswell Nick","year":"2019","unstructured":"Nick Craswell, Bhaskar Mitra, Emine Yilmaz, and Daniel Campos. 2019. TREC-2019-Deep-Learning. Retrieved from https:\/\/microsoft.github.io\/TREC-2019-Deep-Learning\/. (2019).","journal-title":"https:\/\/microsoft.github.io\/TREC-2019-Deep-Learning\/"},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00020"},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.1145\/3331184.3331303"},{"key":"e_1_3_2_13_2","unstructured":"Zhuyun Dai Vincent Y. Zhao Ji Ma Yi Luan Jianmo Ni Jing Lu Anton Bakalov Kelvin Guu Keith Hall and Ming-Wei Chang. 2022. Promptagator: Few-shot dense retrieval from 8 examples. In The Eleventh International Conference on Learning Representations. https:\/\/openreview.net\/pdf?id=gmL46YMpu2J"},{"key":"e_1_3_2_14_2","unstructured":"Jacob Devlin Ming-Wei Chang Kenton Lee and Kristina Toutanova. [n. d.]. Bert: Pre-training of deep bidirectional transformers for language understanding. 
In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies NAACL-HLT 2019 Minneapolis MN USA June 2-7 2019 Volume 1 (Long and Short Papers). 4171."},{"key":"e_1_3_2_15_2","unstructured":"Jeff Donahue and Karen Simonyan. 2019. Large scale adversarial representation learning. In Proceedings of the 33rd International Conference on Neural Information Processing Systems. 10542\u201310552."},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1145\/3404835.3463098"},{"key":"e_1_3_2_17_2","article-title":"Pairwise t-test on TREC Run Files","author":"Gallagher Luke","year":"2019","unstructured":"Luke Gallagher. 2019. Pairwise t-test on TREC Run Files. Retrieved from https:\/\/github.com\/lgrz\/pairwise-ttest\/. (2019).","journal-title":"https:\/\/github.com\/lgrz\/pairwise-ttest\/"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.emnlp-main.75"},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.acl-long.203"},{"key":"e_1_3_2_20_2","doi-asserted-by":"crossref","unstructured":"Xiang Gao Ripon K. Saha Mukul R. Prasad and Abhik Roychoudhury. 2020. Fuzz testing based data augmentation to improve robustness of deep neural networks. In Proceedings of the ACM\/IEEE 42nd International Conference on Software Engineering. 1147\u20131158.","DOI":"10.1145\/3377811.3380415"},{"key":"e_1_3_2_21_2","unstructured":"Spyros Gidaris Praveer Singh and Nikos Komodakis. 2018. Unsupervised Representation Learning by Predicting Image Rotations. In 6th International Conference on Learning Representations (ICLR\u201918) Vancouver BC Canada April 30 - May 3 2018 Conference Track Proceedings. https:\/\/openreview.net\/forum?id=S1v4N2l0-"},{"key":"e_1_3_2_22_2","unstructured":"Jacob Goldberger Sam T. Roweis Geoffrey E. Hinton and Ruslan Salakhutdinov. 2004. Neighbourhood components analysis. 
In Advances in Neural Information Processing Systems 17 [Neural Information Processing Systems (NIPS\u201904) December 13-18 2004 Vancouver British Columbia Canada]. 513\u2013520."},{"key":"e_1_3_2_23_2","unstructured":"Beliz Gunel Jingfei Du Alexis Conneau and Veselin Stoyanov. 2021. Supervised contrastive learning for Pre-trained language model fine-tuning. In 9th International Conference on Learning Representations (ICLR\u201921). Virtual Event Austria. https:\/\/openreview.net\/forum?id=cu7IUiOhujH"},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00975"},{"key":"e_1_3_2_25_2","unstructured":"R. Devon Hjelm Alex Fedorov Samuel Lavoie-Marchildon Karan Grewal Phil Bachman Adam Trischler and Yoshua Bengio. 2018. Learning deep representations by mutual information estimation and maximization. In 7th International Conference on Learning Representations (ICLR\u201919) New Orleans LA."},{"key":"e_1_3_2_26_2","doi-asserted-by":"crossref","unstructured":"Sebastian Hofst\u00e4tter Hamed Zamani Bhaskar Mitra Nick Craswell and Allan Hanbury. 2020. Local self-attention over long text for efficient document retrieval. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 2021\u20132024.","DOI":"10.1145\/3397271.3401224"},{"key":"e_1_3_2_27_2","unstructured":"Sebastian Hofst\u00e4tter Markus Zlabinger and Allan Hanbury. 2020. Interpretable & time-budget-constrained contextualization for re-ranking. In ECAI 2020 24th European Conference on Artificial Intelligence 29 August-8 September 2020 Santiago de Compostela Spain-Including 10th Conference on Prestigious Applications of Artificial Intelligence (PAIS\u201920). 
IOS Press 1\u20138."},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.3390\/technologies9010002"},{"key":"e_1_3_2_29_2","doi-asserted-by":"crossref","unstructured":"Vladimir Karpukhin Barlas Oguz Sewon Min Patrick Lewis Ledell Wu Sergey Edunov Danqi Chen and Wen-tau Yih. 2020. Dense passage retrieval for open-domain question answering. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). 6769\u20136781.","DOI":"10.18653\/v1\/2020.emnlp-main.550"},{"key":"e_1_3_2_30_2","doi-asserted-by":"crossref","unstructured":"Omar Khattab and Matei Zaharia. 2020. Colbert: Efficient and effective passage search via contextualized late interaction over bert. In Proceedings of the 43rd International ACM SIGIR conference on research and development in Information Retrieval. 39\u201348.","DOI":"10.1145\/3397271.3401075"},{"key":"e_1_3_2_31_2","unstructured":"Prannay Khosla Piotr Teterwak Chen Wang Aaron Sarna Yonglong Tian Phillip Isola Aaron Maschinot Ce Liu and Dilip Krishnan. 2020. Supervised contrastive learning. In Proceedings of the 34th International Conference on Neural Information Processing Systems. 18661\u201318673."},{"key":"e_1_3_2_32_2","unstructured":"Varun Kumar Ashutosh Choudhary and Eunah Cho. 2020. Data augmentation using Pre-trained transformer models. In Proceedings of the 2nd Workshop on Life-long Learning for Spoken Language Systems. 18\u201326."},{"key":"e_1_3_2_33_2","doi-asserted-by":"crossref","unstructured":"Carlos Lassance Herv\u00e9 Dejean and St\u00e9phane Clinchant. 2023. An experimental study on pretraining transformers from scratch for IR. In Advances in Information Retrieval: 45th European Conference on Information Retrieval ECIR 2023 Dublin Ireland April 2\u20136 2023 Proceedings Part I. 504\u2013520.","DOI":"10.1007\/978-3-031-28244-7_32"},{"key":"e_1_3_2_34_2","doi-asserted-by":"crossref","unstructured":"Jurek Leonhardt Koustav Rudra and Avishek Anand. 2023. 
Extractive explanations for interpretable text ranking. ACM Transactions on Information Systems 41 4 (2023) 1\u201331.","DOI":"10.1145\/3576924"},{"key":"e_1_3_2_35_2","doi-asserted-by":"crossref","unstructured":"Jurek Leonhardt Koustav Rudra Megha Khosla Abhijit Anand and Avishek Anand. 2022. Efficient neural ranking using forward indexes. In WWW\u201922: The ACM Web Conference 2022 Virtual Event Lyon France 266\u2013276.","DOI":"10.1145\/3485447.3511955"},{"key":"e_1_3_2_36_2","doi-asserted-by":"crossref","unstructured":"Canjia Li Andrew Yates Sean MacAvaney Ben He and Yingfei Sun. 2023. PARADE: Passage representation aggregation for document reranking. ACM Transactions on Information Systems 42 2 (2023) 1\u201326.","DOI":"10.1145\/3600088"},{"issue":"3","key":"e_1_3_2_37_2","first-page":"1","article-title":"The power of selecting key blocks with local pre-ranking for long document information retrieval","volume":"41","author":"Li Minghan","year":"2023","unstructured":"Minghan Li, Diana Nicoleta Popa, Johan Chagnon, Yagmur Gizem Cinar, and Eric Gaussier. 2023. The power of selecting key blocks with local pre-ranking for long document information retrieval. ACM Transactions on Information Systems 41, 3 (2023), 1\u201335.","journal-title":"ACM Transactions on Information Systems"},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.1145\/3471158.3472245"},{"key":"e_1_3_2_39_2","unstructured":"Yijiang Lian Zhenjun You Fan Wu Wenqiang Liu and Jing Jia. 2020. Retrieve synonymous keywords for frequent queries in sponsored search in a data augmentation way. arXiv:2008.01969. 
Retrieved from https:\/\/arxiv.org\/abs\/cs\/2008.01969"},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.1145\/3308774.3308781"},{"key":"e_1_3_2_41_2","first-page":"4134","volume-title":"Advances in Neural Information Processing Systems","author":"Lindgren Erik","year":"2021","unstructured":"Erik Lindgren, Sashank Reddi, Ruiqi Guo, and Sanjiv Kumar. 2021. 
Efficient training of retrieval models using negative cache. In Advances in Neural Information Processing Systems. M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, and J. Wortman Vaughan (Eds.), Vol. 34. Curran Associates, Inc., 4134\u20134146. Retrieved from https:\/\/proceedings.neurips.cc\/paper\/2021\/file\/2175f8c5cd9604f6b1e576b252d4c86e-Paper.pdf"},{"key":"e_1_3_2_42_2","first-page":"7","volume-title":"Proceedings of the ICML","volume":"2","author":"Liu Weiyang","year":"2016","unstructured":"Weiyang Liu, Yandong Wen, Zhiding Yu, and Meng Yang. 2016. Large-margin softmax loss for convolutional neural networks. In Proceedings of the ICML, Vol. 2. 7."},{"key":"e_1_3_2_43_2","unstructured":"Yinhan Liu Myle Ott Naman Goyal Jingfei Du Mandar Joshi Danqi Chen Omer Levy Mike Lewis Luke Zettlemoyer and Veselin Stoyanov. 2019. Roberta: A robustly optimized bert pretraining approach. arXiv:1907.11692. Retrieved from https:\/\/arxiv.org\/abs\/cs\/1907.11692"},{"key":"e_1_3_2_44_2","doi-asserted-by":"crossref","unstructured":"Shayne Longpre Yu Wang and Chris DuBois. 2020. How effective is task-agnostic data augmentation for pretrained transformers? In Findings of the Association for Computational Linguistics: (EMNLP\u201920). 4401\u20134411.","DOI":"10.18653\/v1\/2020.findings-emnlp.394"},{"key":"e_1_3_2_45_2","unstructured":"Xueguang Ma Xinyu Zhang Ronak Pradeep and Jimmy Lin. 2023. Zero-shot listwise document reranking with a large language model. arXiv:2305.02156. Retrieved from https:\/\/arxiv.org\/abs\/cs\/2305.02156"},{"key":"e_1_3_2_46_2","unstructured":"Sean MacAvaney Andrew Yates Arman Cohan and Nazli Goharian. 2019. CEDR: Contextualized embeddings for document ranking. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval. 1101\u20131104."},{"key":"e_1_3_2_47_2","doi-asserted-by":"crossref","unstructured":"John Morris Eli Lifland Jin Yong Yoo Jake Grigsby Di Jin and Yanjun Qi. 2020. 
TextAttack: A framework for adversarial attacks data augmentation and adversarial training in NLP. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. 119\u2013126.","DOI":"10.18653\/v1\/2020.emnlp-demos.16"},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.1007\/s00799-022-00337-y"},{"key":"e_1_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP40776.2020.9054130"},{"key":"e_1_3_2_50_2","unstructured":"Rodrigo Nogueira and Kyunghyun Cho. 2019. Passage re-ranking with BERT. arXiv:1901.04085. Retrieved from https:\/\/arxiv.org\/abs\/cs\/1901.04085"},{"key":"e_1_3_2_51_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISRITI48646.2019.9034594"},{"key":"e_1_3_2_52_2","unstructured":"Aaron van den Oord Yazhe Li and Oriol Vinyals. 2018. Representation learning with contrastive predictive coding. arXiv:1807.03748. Retrieved from https:\/\/arxiv.org\/abs\/cs\/1807.03748"},{"key":"e_1_3_2_53_2","doi-asserted-by":"publisher","unstructured":"Baolin Peng Chenguang Zhu Michael Zeng and Jianfeng Gao. 2021. Data augmentation for spoken language understanding via pretrained language models. In Interspeech 2021 22nd Annual Conference of the International Speech Communication Association. 1219\u20131223. 10.21437\/Interspeech.2021-117","DOI":"10.21437\/Interspeech.2021-117"},{"key":"e_1_3_2_54_2","first-page":"3853","volume-title":"Proceedings of the 29th International Conference on International Joint Conferences on Artificial Intelligence","author":"Qin Libo","year":"2021","unstructured":"Libo Qin, Minheng Ni, Yue Zhang, and Wanxiang Che. 2021. CoSDA-ML: Multi-lingual code-switching data augmentation for zero-shot cross-lingual NLP. In Proceedings of the 29th International Conference on International Joint Conferences on Artificial Intelligence. 
3853\u20133860."},{"key":"e_1_3_2_55_2","doi-asserted-by":"crossref","unstructured":"Zhen Qin Rolf Jagerman Kai Hui Honglei Zhuang Junru Wu Jiaming Shen Tianqi Liu Jialu Liu Donald Metzler Xuanhui Wang and Michael Freisleben. 2023. Large language models are effective text rankers with pairwise ranking prompting. arXiv:2306.17563. Retrieved from https:\/\/arxiv.org\/abs\/cs\/2306.17563","DOI":"10.18653\/v1\/2024.findings-naacl.97"},{"key":"e_1_3_2_56_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.naacl-main.466"},{"key":"e_1_3_2_57_2","unstructured":"Roberta Raileanu Maxwell Goldstein Denis Yarats Ilya Kostrikov and Rob Fergus. 2021. Automatic Data augmentation for generalization in reinforcement learning. In Advances in Neural Information Processing Systems Vol. 34. Curran Associates Inc. 5402\u20135415."},{"key":"e_1_3_2_58_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.emnlp-main.562"},{"key":"e_1_3_2_59_2","doi-asserted-by":"publisher","DOI":"10.1145\/3340531.3412124"},{"key":"e_1_3_2_60_2","unstructured":"Victor Sanh Lysandre Debut Julien Chaumond and Thomas Wolf. 2019. DistilBERT a distilled version of BERT: Smaller faster cheaper and lighter. arXiv:1910.01108. Retrieved from https:\/\/arxiv.org\/abs\/cs\/1910.01108"},{"key":"e_1_3_2_61_2","doi-asserted-by":"publisher","DOI":"10.1186\/s40537-019-0197-0"},{"key":"e_1_3_2_62_2","doi-asserted-by":"publisher","DOI":"10.1186\/s40537-021-00492-0"},{"key":"e_1_3_2_63_2","doi-asserted-by":"publisher","DOI":"10.1145\/2854946.2854959"},{"key":"e_1_3_2_64_2","unstructured":"Kihyuk Sohn. 2016. Improved deep metric learning with multi-class n-pair loss objective. In Advances in Neural Information Processing Systems 29 Annual Conference on Neural Information Processing Systems. 1857\u20131865."},{"key":"e_1_3_2_65_2","doi-asserted-by":"crossref","unstructured":"Lichao Sun Congying Xia Wenpeng Yin Tingting Liang S. Yu Philip and Lifang He. 2020. 
Mixup-transformer: Dynamic data augmentation for NLP tasks. In Proceedings of the 28th International Conference on Computational Linguistics. 3436\u20133440.","DOI":"10.18653\/v1\/2020.coling-main.305"},{"key":"e_1_3_2_66_2","doi-asserted-by":"crossref","unstructured":"Weiwei Sun Lingyong Yan Xinyu Ma Shuaiqiang Wang Pengjie Ren Zhumin Chen Dawei Yin and Zhaochun Ren. 2023. Is ChatGPT Good at Search? Investigating large language models as re-ranking agents. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP\u201923). 14918\u201314937.","DOI":"10.18653\/v1\/2023.emnlp-main.923"},{"key":"e_1_3_2_67_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P16-1044"},{"key":"e_1_3_2_68_2","volume-title":"Proceedings of the 35th Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2)","author":"Thakur Nandan","year":"2021","unstructured":"Nandan Thakur, Nils Reimers, Andreas R\u00fcckl\u00e9, Abhishek Srivastava, and Iryna Gurevych. 2021. BEIR: A heterogeneous benchmark for zero-shot evaluation of information retrieval models. In Proceedings of the 35th Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2). Retrieved from https:\/\/openreview.net\/forum?id=wCu6T5xFjeJ"},{"key":"e_1_3_2_69_2","doi-asserted-by":"crossref","unstructured":"Hoang Van Vikas Yadav and Mihai Surdeanu. 2021. Cheap and good? simple and effective data augmentation for low resource machine reading. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2116\u20132120.","DOI":"10.1145\/3404835.3463099"},{"key":"e_1_3_2_70_2","first-page":"9929","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Wang Tongzhou","year":"2020","unstructured":"Tongzhou Wang and Phillip Isola. 2020. Understanding contrastive representation learning through alignment and uniformity on the hypersphere. 
In Proceedings of the International Conference on Machine Learning. PMLR, 9929\u20139939."},{"key":"e_1_3_2_71_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-92273-3_18"},{"key":"e_1_3_2_72_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00393"},{"key":"e_1_3_2_73_2","unstructured":"Lee Xiong Chenyan Xiong Ye Li Kwok-Fung Tang Jialin Liu Paul N. Bennett Junaid Ahmed and Arnold Overwijk. 2021. Approximate nearest neighbor negative contrastive learning for dense text retrieval. In 9th International Conference on Learning Representations (ICLR\u201921) Virtual Event Austria. https:\/\/openreview.net\/forum?id=zeFrfgyZln"},{"key":"e_1_3_2_74_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.acl-long.477"},{"key":"e_1_3_2_75_2","doi-asserted-by":"crossref","unstructured":"Wei Yang Yuqing Xie Luchen Tan Kun Xiong Ming Li and Jimmy Lin. 2019. Data augmentation for bert fine-tuning in open-domain question answering. arXiv:1904.06652. Retrieved from https:\/\/arxiv.org\/abs\/cs\/1904.06652","DOI":"10.18653\/v1\/N19-4013"},{"key":"e_1_3_2_76_2","doi-asserted-by":"crossref","unstructured":"Yinfei Yang Ning Jin Kuo Lin Mandy Guo and Daniel Cer. 2021. Neural retrieval for question answering with cross-attention supervised data augmentation. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers). 263\u2013268.","DOI":"10.18653\/v1\/2021.acl-short.35"},{"key":"e_1_3_2_77_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.coling-main.399"},{"key":"e_1_3_2_78_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1352"},{"key":"e_1_3_2_79_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-60239-0_19"},{"key":"e_1_3_2_80_2","doi-asserted-by":"publisher","DOI":"10.1145\/3404835.3462880"},{"key":"e_1_3_2_81_2","unstructured":"Xingyu Zhang Tong Xiao Yidong Chen and Qun Liu. 2021. 
Text augmentation for neural machine translation: A review. arXiv:2103.09065. Retrieved from https:\/\/arxiv.org\/abs\/cs\/2103.09065"},{"key":"e_1_3_2_82_2","doi-asserted-by":"publisher","DOI":"10.1145\/3437963.3441758"},{"key":"e_1_3_2_83_2","doi-asserted-by":"publisher","DOI":"10.5555\/3327546.3327555"},{"key":"e_1_3_2_84_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i07.7000"},{"key":"e_1_3_2_85_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCC51575.2020.9344922"},{"key":"e_1_3_2_86_2","doi-asserted-by":"publisher","DOI":"10.1145\/3459637.3482243"}],"container-title":["ACM Transactions on Information Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3634911","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3634911","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T22:51:07Z","timestamp":1750287067000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3634911"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,4,29]]},"references-count":85,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2024,9,30]]}},"alternative-id":["10.1145\/3634911"],"URL":"https:\/\/doi.org\/10.1145\/3634911","relation":{},"ISSN":["1046-8188","1558-2868"],"issn-type":[{"type":"print","value":"1046-8188"},{"type":"electronic","value":"1558-2868"}],"subject":[],"published":{"date-parts":[[2024,4,29]]},"assertion":[{"value":"2023-03-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-11-20","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2024-04-29","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}