{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,10]],"date-time":"2026-04-10T10:02:21Z","timestamp":1775815341853,"version":"3.50.1"},"reference-count":114,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2022,12,20]],"date-time":"2022-12-20T00:00:00Z","timestamp":1671494400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Web"],"published-print":{"date-parts":[[2023,2,28]]},"abstract":"<jats:p>Pre-trained language representation models (PLMs) such as BERT and Enhanced Representation through kNowledge IntEgration (ERNIE) have been integral to achieving recent improvements on various downstream tasks, including information retrieval. However, it is nontrivial to directly utilize these models for the large-scale web search due to the following challenging issues: (1) the prohibitively expensive computations of massive neural PLMs, especially for long texts in the web document, prohibit their deployments in the web search system that demands extremely low latency; (2) the discrepancy between existing task-agnostic pre-training objectives and the ad hoc retrieval scenarios that demand comprehensive relevance modeling is another main barrier for improving the online retrieval and ranking effectiveness; and (3) to create a significant impact on real-world applications, it also calls for practical solutions to seamlessly interweave the resultant PLM and other components into a cooperative system to serve web-scale data. Accordingly, we contribute a series of successfully applied techniques in tackling these exposed issues in this work when deploying the state-of-the-art Chinese pre-trained language model, i.e., ERNIE, in the online search engine system. 
We first present novel practices to perform expressive PLM-based semantic retrieval with a flexible poly-interaction scheme and cost-efficiently contextualize and rank web documents with a cheap yet powerful Pyramid-ERNIE architecture. We then endow innovative pre-training and fine-tuning paradigms to explicitly incentivize the query-document relevance modeling in PLM-based retrieval and ranking with the large-scale noisy and biased post-click behavioral data. We also introduce a series of effective strategies to seamlessly interweave the designed PLM-based models with other conventional components into a cooperative system. Extensive offline and online experimental results show that our proposed techniques are crucial to achieving more effective search performance. We also provide a thorough analysis of our methodology and experimental results.<\/jats:p>","DOI":"10.1145\/3568681","type":"journal-article","created":{"date-parts":[[2022,10,20]],"date-time":"2022-10-20T11:50:59Z","timestamp":1666266659000},"page":"1-36","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":16,"title":["Pre-trained Language Model-based Retrieval and Ranking for Web Search"],"prefix":"10.1145","volume":"17","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6755-871X","authenticated-orcid":false,"given":"Lixin","family":"Zou","sequence":"first","affiliation":[{"name":"Baidu Inc., Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0761-3419","authenticated-orcid":false,"given":"Weixue","family":"Lu","sequence":"additional","affiliation":[{"name":"Baidu Inc., Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6857-261X","authenticated-orcid":false,"given":"Yiding","family":"Liu","sequence":"additional","affiliation":[{"name":"Baidu Inc., Beijing, 
China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7147-5666","authenticated-orcid":false,"given":"Hengyi","family":"Cai","sequence":"additional","affiliation":[{"name":"Baidu Inc., Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7407-2005","authenticated-orcid":false,"given":"Xiaokai","family":"Chu","sequence":"additional","affiliation":[{"name":"Baidu Inc., Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2215-5356","authenticated-orcid":false,"given":"Dehong","family":"Ma","sequence":"additional","affiliation":[{"name":"Baidu Inc., Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4926-3357","authenticated-orcid":false,"given":"Daiting","family":"Shi","sequence":"additional","affiliation":[{"name":"Baidu Inc., Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5430-5534","authenticated-orcid":false,"given":"Yu","family":"Sun","sequence":"additional","affiliation":[{"name":"Baidu Inc., Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6503-4581","authenticated-orcid":false,"given":"Zhicong","family":"Cheng","sequence":"additional","affiliation":[{"name":"Baidu Inc., Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0113-4540","authenticated-orcid":false,"given":"Simiu","family":"Gu","sequence":"additional","affiliation":[{"name":"Baidu Inc., Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9212-1947","authenticated-orcid":false,"given":"Shuaiqiang","family":"Wang","sequence":"additional","affiliation":[{"name":"Baidu Inc., Beijing, 
China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8846-2001","authenticated-orcid":false,"given":"Dawei","family":"Yin","sequence":"additional","affiliation":[{"name":"Baidu Inc., Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2022,12,20]]},"reference":[{"key":"e_1_3_2_2_2","article-title":"Better fine-tuning by reducing representational collapse","author":"Aghajanyan Armen","year":"2020","unstructured":"Armen Aghajanyan, Akshat Shrivastava, Anchit Gupta, Naman Goyal, Luke Zettlemoyer, and Sonal Gupta. 2020. Better fine-tuning by reducing representational collapse. arXiv:2008.03156. Retrieved from https:\/\/arxiv.org\/abs\/2008.03156.","journal-title":"arXiv:2008.03156"},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-main.394"},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1539"},{"key":"e_1_3_2_5_2","article-title":"Longformer: The long-document transformer","author":"Beltagy Iz","year":"2020","unstructured":"Iz Beltagy, Matthew E. Peters, and Arman Cohan. 2020. Longformer: The long-document transformer. arXiv:2004.05150. Retrieved from https:\/\/arxiv.org\/abs\/2004.05150.","journal-title":"arXiv:2004.05150"},{"key":"e_1_3_2_6_2","article-title":"Language models are few-shot learners","author":"Brown Tom B.","year":"2020","unstructured":"Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et\u00a0al. 2020. Language models are few-shot learners. arXiv:2005.14165. Retrieved from https:\/\/arxiv.org\/abs\/2005.14165.","journal-title":"arXiv:2005.14165"},{"key":"e_1_3_2_7_2","article-title":"From ranknet to lambdarank to lambdamart: An overview","author":"Burges Christopher J. C.","year":"2010","unstructured":"Christopher J. C. Burges. 2010. 
From ranknet to lambdarank to lambdamart: An overview. Learning.","journal-title":"Learning"},{"key":"e_1_3_2_8_2","article-title":"Semantic models for the first-stage retrieval: A comprehensive review","author":"Cai Yinqiong","year":"2021","unstructured":"Yinqiong Cai, Yixing Fan, Jiafeng Guo, Fei Sun, Ruqing Zhang, and Xueqi Cheng. 2021. Semantic models for the first-stage retrieval: A comprehensive review. arXiv:2103.04831. Retrieved from https:\/\/arxiv.org\/abs\/2103.04831.","journal-title":"arXiv:2103.04831"},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.1145\/1273496.1273513"},{"key":"e_1_3_2_10_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Chang Wei-Cheng","year":"2019","unstructured":"Wei-Cheng Chang, X. Yu Felix, Yin-Wen Chang, Yiming Yang, and Sanjiv Kumar. 2019. Pre-training tasks for embedding-based large-scale retrieval. In Proceedings of the International Conference on Learning Representations."},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1145\/2094072.2094078"},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.1145\/1526709.1526711"},{"key":"e_1_3_2_13_2","article-title":"Bias and debias in recommender system: A survey and future directions","author":"Chen J.","year":"2020","unstructured":"J. Chen, Hande Dong, Xiao lei Wang, Fuli Feng, Ming-Chieh Wang, and X. He. 2020. Bias and debias in recommender system: A survey and future directions. arXiv:2010.03240. Retrieved from https:\/\/arxiv.org\/abs\/2010.03240.","journal-title":"arXiv:2010.03240"},{"key":"e_1_3_2_14_2","article-title":"Rethinking attention with performers","author":"Choromanski Krzysztof","year":"2020","unstructured":"Krzysztof Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, Tamas Sarlos, Peter Hawkins, Jared Davis, Afroz Mohiuddin, Lukasz Kaiser, et\u00a0al. 2020. Rethinking attention with performers. arXiv:2009.14794. 
Retrieved from https:\/\/arxiv.org\/abs\/2009.14794.","journal-title":"arXiv:2009.14794"},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1145\/3477495.3531986"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1145\/133160.133199"},{"key":"e_1_3_2_17_2","volume-title":"Advances in Neural Information Processing Systems","author":"Ding Ming","year":"2020","unstructured":"Ming Ding, Chang Zhou, Hongxia Yang, and Jie Tang. 2020. CogLTX: Applying BERT to long texts. In Advances in Neural Information Processing Systems."},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01261-8_28"},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D16-1026"},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.5555\/945365.964285"},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1006\/jcss.1997.1504"},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.1145\/2009916.2010007"},{"key":"e_1_3_2_23_2","volume-title":"Proceedings of the Conference on Empirical Methods in Natural Language Processing","author":"Gao Luyu","year":"2020","unstructured":"Luyu Gao and Zhuyun Dai. 2020. Modularized transformer-based ranking framework. 
In Proceedings of the Conference on Empirical Methods in Natural Language Processing."},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-72240-1_26"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.5555\/3086952"},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.1145\/3340531.3412697"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1145\/2983323.2983769"},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1145\/3404835.3462917"},{"key":"e_1_3_2_29_2","article-title":"Don\u2019t stop pretraining: Adapt language models to domains and tasks","author":"Gururangan Suchin","year":"2020","unstructured":"Suchin Gururangan, Ana Marasovi\u0107, Swabha Swayamdipta, Kyle Lo, Iz Beltagy, Doug Downey, and Noah A. Smith. 2020. Don\u2019t stop pretraining: Adapt language models to domains and tasks. arXiv:2004.10964. Retrieved from https:\/\/arxiv.org\/abs\/2004.10964.","journal-title":"arXiv:2004.10964"},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1145\/3394486.3403333"},{"key":"e_1_3_2_31_2","article-title":"Distilling the knowledge in a neural network","author":"Hinton Geoffrey","year":"2015","unstructured":"Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. 2015. Distilling the knowledge in a neural network. arXiv:1503.02531. Retrieved from https:\/\/arxiv.org\/abs\/1503.02531.","journal-title":"arXiv:1503.02531"},{"key":"e_1_3_2_32_2","article-title":"Dynabert: Dynamic bert with adaptive width and depth","author":"Hou Lu","year":"2020","unstructured":"Lu Hou, Zhiqi Huang, Lifeng Shang, Xin Jiang, Xiao Chen, and Qun Liu. 2020. Dynabert: Dynamic bert with adaptive width and depth. arXiv:2004.04037. 
Retrieved from https:\/\/arxiv.org\/abs\/2004.04037.","journal-title":"arXiv:2004.04037"},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.1145\/3394486.3403305"},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1145\/2505515.2505665"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.5555\/3122009.3242044"},{"key":"e_1_3_2_36_2","article-title":"Poly-encoders: Transformer architectures and pre-training strategies for fast and accurate multi-sentence scoring","author":"Humeau Samuel","year":"2019","unstructured":"Samuel Humeau, Kurt Shuster, Marie-Anne Lachaux, and Jason Weston. 2019. Poly-encoders: Transformer architectures and pre-training strategies for fast and accurate multi-sentence scoring. arXiv:1905.01969.","journal-title":"arXiv:1905.01969"},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00286"},{"key":"e_1_3_2_38_2","first-page":"243","volume-title":"ACM SIGIR Forum","author":"J\u00e4rvelin Kalervo","year":"2017","unstructured":"Kalervo J\u00e4rvelin and Jaana Kek\u00e4l\u00e4inen. 2017. IR evaluation methods for retrieving highly relevant documents. In ACM SIGIR Forum, Vol. 51. ACM New York, NY, 243\u2013250."},{"key":"e_1_3_2_39_2","article-title":"Smart: Robust and efficient fine-tuning for pre-trained natural language models through principled regularized optimization","author":"Jiang Haoming","year":"2019","unstructured":"Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao, and Tuo Zhao. 2019. Smart: Robust and efficient fine-tuning for pre-trained natural language models through principled regularized optimization. 
arXiv:1911.03437.","journal-title":"arXiv:1911.03437"},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.1145\/775047.775067"},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00065"},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-main.550"},{"key":"e_1_3_2_43_2","first-page":"4171","volume-title":"Proceedings of Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT\u201919)","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT\u201919). 4171\u20134186."},{"key":"e_1_3_2_44_2","doi-asserted-by":"publisher","DOI":"10.1145\/3397271.3401075"},{"key":"e_1_3_2_45_2","unstructured":"Diederik P. Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv:1412.6980."},{"key":"e_1_3_2_46_2","article-title":"Reformer: The efficient transformer","author":"Kitaev Nikita","year":"2020","unstructured":"Nikita Kitaev, \u0141ukasz Kaiser, and Anselm Levskaya. 2020. Reformer: The efficient transformer. arXiv:2001.04451. Retrieved from https:\/\/arxiv.org\/abs\/2001.04451.","journal-title":"arXiv:2001.04451"},{"key":"e_1_3_2_47_2","article-title":"Quantizing deep convolutional networks for efficient inference: A whitepaper","author":"Krishnamoorthi Raghuraman","year":"2018","unstructured":"Raghuraman Krishnamoorthi. 2018. Quantizing deep convolutional networks for efficient inference: A whitepaper. arXiv:1806.08342. 
Retrieved from https:\/\/arxiv.org\/abs\/1806.08342.","journal-title":"arXiv:1806.08342"},{"key":"e_1_3_2_48_2","article-title":"Albert: A lite bert for self-supervised learning of language representations","author":"Lan Zhenzhong","year":"2019","unstructured":"Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut. 2019. Albert: A lite bert for self-supervised learning of language representations. arXiv:1909.11942. Retrieved from https:\/\/arxiv.org\/abs\/1909.11942.","journal-title":"arXiv:1909.11942"},{"key":"e_1_3_2_49_2","article-title":"BioBERT: A pre-trained biomedical language representation model for biomedical text mining","author":"Lee Jinhyuk","year":"2020","unstructured":"Jinhyuk Lee, Wonjin Yoon, Sungdong Kim, D. Kim, Sunkyu Kim, Chan Ho So, and Jaewoo Kang. 2020. BioBERT: A pre-trained biomedical language representation model for biomedical text mining. Bioinformatics 36, 4 (2020), 1234\u20131240.","journal-title":"Bioinformatics"},{"key":"e_1_3_2_50_2","article-title":"Latent retrieval for weakly supervised open domain question answering","author":"Lee Kenton","year":"2019","unstructured":"Kenton Lee, Ming-Wei Chang, and Kristina Toutanova. 2019. Latent retrieval for weakly supervised open domain question answering. arXiv:1906.00300. Retrieved from https:\/\/arxiv.org\/abs\/1906.00300.","journal-title":"arXiv:1906.00300"},{"key":"e_1_3_2_51_2","doi-asserted-by":"publisher","DOI":"10.1561\/1500000035"},{"key":"e_1_3_2_52_2","article-title":"Mcrank: Learning to rank using multiple classification and gradient boosting","volume":"20","author":"Li Ping","year":"2007","unstructured":"Ping Li, Qiang Wu, and Christopher Burges. 2007. Mcrank: Learning to rank using multiple classification and gradient boosting. Adv. Neural Inf. Process. Syst. 20 (2007).","journal-title":"Adv. Neural Inf. Process. 
Syst."},{"key":"e_1_3_2_53_2","doi-asserted-by":"publisher","DOI":"10.2200\/S01123ED1V01Y202108HLT053"},{"key":"e_1_3_2_54_2","doi-asserted-by":"publisher","DOI":"10.1561\/1500000016"},{"key":"e_1_3_2_55_2","doi-asserted-by":"publisher","DOI":"10.1145\/3340531.3412695"},{"key":"e_1_3_2_56_2","doi-asserted-by":"publisher","DOI":"10.1145\/3447548.3467149"},{"key":"e_1_3_2_57_2","article-title":"RoBERTa: A robustly optimized BERT pretraining approach","author":"Liu Y.","year":"2019","unstructured":"Y. Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, M. Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. arXiv:1907.11692. Retrieved from https:\/\/arxiv.org\/abs\/1907.11692.","journal-title":"arXiv:1907.11692"},{"key":"e_1_3_2_58_2","article-title":"Neural passage retrieval with improved negative contrast","author":"Lu Jing","year":"2020","unstructured":"Jing Lu, Gustavo Hernandez Abrego, Ji Ma, Jianmo Ni, and Yinfei Yang. 2020. Neural passage retrieval with improved negative contrast. arXiv:2010.12523. Retrieved from https:\/\/arxiv.org\/abs\/2010.12523.","journal-title":"arXiv:2010.12523"},{"key":"e_1_3_2_59_2","doi-asserted-by":"publisher","DOI":"10.1145\/3340531.3412747"},{"key":"e_1_3_2_60_2","volume-title":"Proceedings of the 26th International Conference on Neural Information Processing Systems (NIPS\u201913)","volume":"26","author":"Lu Zhengdong","year":"2013","unstructured":"Zhengdong Lu and Hang Li. 2013. A deep architecture for matching short texts. In Proceedings of the 26th International Conference on Neural Information Processing Systems (NIPS\u201913), Vol. 26. Curran Associates, Inc."},{"key":"e_1_3_2_61_2","article-title":"Sparse, dense, and attentional representations for text retrieval","author":"Luan Yi","year":"2020","unstructured":"Yi Luan, Jacob Eisenstein, Kristina Toutanova, and Michael Collins. 2020. 
Sparse, dense, and attentional representations for text retrieval. arXiv:2005.00181. Retrieved from https:\/\/arxiv.org\/abs\/2005.00181.","journal-title":"arXiv:2005.00181"},{"key":"e_1_3_2_62_2","article-title":"Model-based unbiased learning to rank","author":"Luo Dan","year":"2022","unstructured":"Dan Luo, Lixin Zou, Qingyao Ai, Zhiyu Chen, Dawei Yin, and Brian D. Davison. 2022. Model-based unbiased learning to rank. arXiv:2207.11785. Retrieved from https:\/\/arxiv.org\/abs\/2207.11785.","journal-title":"arXiv:2207.11785"},{"key":"e_1_3_2_63_2","article-title":"PROP: Pre-training with representative words prediction for ad-hoc retrieval","author":"Ma Xinyu","year":"2020","unstructured":"Xinyu Ma, Jiafeng Guo, Ruqing Zhang, Yixing Fan, Xiang Ji, and Xueqi Cheng. 2020. PROP: Pre-training with representative words prediction for ad-hoc retrieval. arXiv:2010.10137. Retrieved from https:\/\/arxiv.org\/abs\/2010.10137.","journal-title":"arXiv:2010.10137"},{"key":"e_1_3_2_64_2","article-title":"B-PROP: Bootstrapped pre-training with representative words prediction for ad-hoc retrieval","author":"Ma Xinyu","year":"2021","unstructured":"Xinyu Ma, Jiafeng Guo, Ruqing Zhang, Yixing Fan, Xiang Ji, and Xueqi Cheng. 2021. B-PROP: Bootstrapped pre-training with representative words prediction for ad-hoc retrieval. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval.","journal-title":"Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval"},{"key":"e_1_3_2_65_2","volume-title":"Data Mining with Decision Trees: Theory and Applications","author":"Maimon Oded Z.","year":"2014","unstructured":"Oded Z. Maimon and Lior Rokach. 2014. Data Mining with Decision Trees: Theory and Applications. 
World Scientific."},{"key":"e_1_3_2_66_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-32153-5_10"},{"key":"e_1_3_2_67_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.is.2013.10.006"},{"key":"e_1_3_2_68_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2018.2889473"},{"key":"e_1_3_2_69_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D18-1211"},{"key":"e_1_3_2_70_2","doi-asserted-by":"publisher","DOI":"10.1561\/1500000061"},{"key":"e_1_3_2_71_2","doi-asserted-by":"publisher","DOI":"10.1145\/3038912.3052579"},{"key":"e_1_3_2_72_2","article-title":"Passage re-ranking with BERT","author":"Nogueira Rodrigo","year":"2019","unstructured":"Rodrigo Nogueira and Kyunghyun Cho. 2019. Passage re-ranking with BERT. arXiv:1901.04085. Retrieved from https:\/\/arxiv.org\/abs\/1901.04085.","journal-title":"arXiv:1901.04085"},{"key":"e_1_3_2_73_2","article-title":"Multi-stage document ranking with BERT","author":"Nogueira Rodrigo","year":"2019","unstructured":"Rodrigo Nogueira, W. Yang, Kyunghyun Cho, and Jimmy Lin. 2019. Multi-stage document ranking with BERT. arXiv:1910.14424. Retrieved from https:\/\/arxiv.org\/abs\/1910.14424.","journal-title":"arXiv:1910.14424"},{"key":"e_1_3_2_74_2","article-title":"Semantic modelling with long-short-term memory for information retrieval","author":"Palangi Hamid","year":"2014","unstructured":"Hamid Palangi, Li Deng, Yelong Shen, Jianfeng Gao, Xiaodong He, Jianshu Chen, Xinying Song, and R. Ward. 2014. Semantic modelling with long-short-term memory for information retrieval. arXiv:1412.6629. Retrieved from https:\/\/arxiv.org\/abs\/1412.6629.","journal-title":"arXiv:1412.6629"},{"key":"e_1_3_2_75_2","doi-asserted-by":"crossref","unstructured":"H. Palangi, L. Deng, Y. Shen, J. Gao, X. He, J. Chen, X. Song, and R. Ward. 2015. 
Deep sentence embedding using the long short term memory network: Analysis and application to information retrieval.","DOI":"10.1109\/TASLP.2016.2520371"},{"key":"e_1_3_2_76_2","article-title":"Modeling of pruning techniques for deep neural networks simplification","author":"Pasandi Morteza Mousa","year":"2020","unstructured":"Morteza Mousa Pasandi, Mohsen Hajabdollahi, Nader Karimi, and Shadrokh Samavi. 2020. Modeling of pruning techniques for deep neural networks simplification. arXiv:2001.04062. Retrieved from https:\/\/arxiv.org\/abs\/2001.04062.","journal-title":"arXiv:2001.04062"},{"key":"e_1_3_2_77_2","article-title":"Deep contextualized word representations","author":"Peters Matthew E.","year":"2018","unstructured":"Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word representations. arXiv:1802.05365. Retrieved from https:\/\/arxiv.org\/abs\/1802.05365.","journal-title":"arXiv:1802.05365"},{"key":"e_1_3_2_78_2","volume-title":"Proceedings of the International Conference on Information and Communication Technologies & Applications","volume":"17","author":"Ponomarenko Alexander","year":"2011","unstructured":"Alexander Ponomarenko, Yury Malkov, Andrey Logvinov, and Vladimir Krylov. 2011. Approximate nearest neighbor search small world approach. In Proceedings of the International Conference on Information and Communication Technologies & Applications, Vol. 17."},{"key":"e_1_3_2_79_2","article-title":"The expando-mono-duo design pattern for text ranking with pretrained sequence-to-sequence models","author":"Pradeep Ronak","year":"2021","unstructured":"Ronak Pradeep, Rodrigo Nogueira, and Jimmy Lin. 2021. The expando-mono-duo design pattern for text ranking with pretrained sequence-to-sequence models. arXiv:2101.05667. 
Retrieved from https:\/\/arxiv.org\/abs\/2101.05667.","journal-title":"arXiv:2101.05667"},{"key":"e_1_3_2_80_2","article-title":"Intermediate-task transfer learning with pretrained models for natural language understanding: When and why does it work?","author":"Pruksachatkun Yada","year":"2020","unstructured":"Yada Pruksachatkun, Jason Phang, Haokun Liu, Phu Mon Htut, Xiaoyi Zhang, Richard Yuanzhe Pang, Clara Vania, Katharina Kann, and Samuel R. Bowman. 2020. Intermediate-task transfer learning with pretrained models for natural language understanding: When and why does it work? arXiv:2005.00628. Retrieved from https:\/\/arxiv.org\/abs\/2005.00628.","journal-title":"arXiv:2005.00628"},{"key":"e_1_3_2_81_2","article-title":"Exploring transfer learning with T5: The text-to-text transfer transformer","author":"Roberts A.","year":"2020","unstructured":"A. Roberts and C. Raffel. 2020. Exploring transfer learning with T5: The text-to-text transfer transformer. Google AI Blog. https:\/\/ai.googleblog.com\/2020\/02\/exploring-transfer-learning-with-t5.html.","journal-title":"Google AI Blog"},{"key":"e_1_3_2_82_2","doi-asserted-by":"publisher","DOI":"10.5555\/1823431"},{"key":"e_1_3_2_83_2","doi-asserted-by":"publisher","DOI":"10.1162\/089976604773135104"},{"issue":"7","key":"e_1_3_2_84_2","first-page":"969","article-title":"Semantic hashing","volume":"50","author":"Salakhutdinov Ruslan","year":"2009","unstructured":"Ruslan Salakhutdinov and Geoffrey Hinton. 2009. Semantic hashing. Int. J. Adv. Res. 50, 7 (2009), 969\u2013978.","journal-title":"Int. J. Adv. 
Res."},{"key":"e_1_3_2_85_2","doi-asserted-by":"publisher","DOI":"10.1145\/2766462.2767738"},{"key":"e_1_3_2_86_2","doi-asserted-by":"publisher","DOI":"10.1145\/2661829.2661935"},{"key":"e_1_3_2_87_2","doi-asserted-by":"publisher","DOI":"10.1145\/2567948.2577348"},{"key":"e_1_3_2_88_2","article-title":"ERNIE: Enhanced representation through knowledge integration","author":"Sun Yu","year":"2019","unstructured":"Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, and Hua Wu. 2019. ERNIE: Enhanced representation through knowledge integration. arXiv:1904.09223.","journal-title":"arXiv:1904.09223"},{"key":"e_1_3_2_89_2","article-title":"Efficient transformers: A survey","author":"Tay Yi","year":"2020","unstructured":"Yi Tay, Mostafa Dehghani, Dara Bahri, and Donald Metzler. 2020. Efficient transformers: A survey. arXiv:2009.06732. Retrieved from https:\/\/arxiv.org\/abs\/2009.06732.","journal-title":"arXiv:2009.06732"},{"key":"e_1_3_2_90_2","article-title":"Attention is all you need","volume":"30","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Adv. Neural Inf. Process. Syst. 30 (2017).","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"e_1_3_2_91_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v30i1.10342"},{"key":"e_1_3_2_92_2","article-title":"Cross-thought for sentence encoder pre-training","author":"Wang Shuohang","year":"2020","unstructured":"Shuohang Wang, Yuwei Fang, Siqi Sun, Zhe Gan, Yu Cheng, Jing Jiang, and Jingjing Liu. 2020. Cross-thought for sentence encoder pre-training. arXiv:2010.03652. 
Retrieved from https:\/\/arxiv.org\/abs\/2010.03652.","journal-title":"arXiv:2010.03652"},{"key":"e_1_3_2_93_2","article-title":"Linformer: Self-attention with linear complexity","author":"Wang Sinong","year":"2020","unstructured":"Sinong Wang, Belinda Li, Madian Khabsa, Han Fang, and Hao Ma. 2020. Linformer: Self-attention with linear complexity. arXiv:2006.04768. Retrieved from https:\/\/arxiv.org\/abs\/2006.04768.","journal-title":"arXiv:2006.04768"},{"key":"e_1_3_2_94_2","first-page":"5383","volume-title":"International Conference on Machine Learning","author":"Xia Yingce","year":"2018","unstructured":"Yingce Xia, Xu Tan, Fei Tian, Tao Qin, Nenghai Yu, and Tie-Yan Liu. 2018. Model-level dual learning. In International Conference on Machine Learning. PMLR, 5383\u20135392."},{"key":"e_1_3_2_95_2","doi-asserted-by":"publisher","DOI":"10.1145\/3077136.3080809"},{"key":"e_1_3_2_96_2","article-title":"Approximate nearest neighbor negative contrastive learning for dense text retrieval","author":"Xiong Lee","year":"2020","unstructured":"Lee Xiong, Chenyan Xiong, Ye Li, Kwok-Fung Tang, Jialin Liu, Paul Bennett, Junaid Ahmed, and Arnold Overwijk. 2020. Approximate nearest neighbor negative contrastive learning for dense text retrieval. arXiv:2007.00808.","journal-title":"arXiv:2007.00808"},{"key":"e_1_3_2_97_2","article-title":"Xlnet: Generalized autoregressive pretraining for language understanding","volume":"32","author":"Yang Zhilin","year":"2019","unstructured":"Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Russ R. Salakhutdinov, and Quoc V. Le. 2019. Xlnet: Generalized autoregressive pretraining for language understanding. Adv. Neural Inf. Process. Syst. 32 (2019).","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"e_1_3_2_98_2","first-page":"247","volume-title":"Proceedings of the 15th Conference on Computational Natural Language Learning","author":"Yih Wen-tau","year":"2011","unstructured":"Wen-tau Yih, Kristina Toutanova, John C. 
Platt, and Christopher Meek. 2011. Learning discriminative projections for text similarity measures. In Proceedings of the 15th Conference on Computational Natural Language Learning. 247\u2013256."},{"key":"e_1_3_2_99_2","doi-asserted-by":"publisher","DOI":"10.1145\/2939672.2939677"},{"key":"e_1_3_2_100_2","article-title":"Towards personalized and semantic retrieval: An end-to-end solution for e-commerce search via embedding learning","author":"Zhang Han","year":"2020","unstructured":"Han Zhang, Songlin Wang, Kang Zhang, Zhiling Tang, Yunjiang Jiang, Yun Xiao, Weipeng Yan, and Wen-Yun Yang. 2020. Towards personalized and semantic retrieval: An end-to-end solution for e-commerce search via embedding learning. arXiv:2006.02282. Retrieved from https:\/\/arxiv.org\/abs\/2006.02282.","journal-title":"arXiv:2006.02282"},{"key":"e_1_3_2_101_2","first-page":"11328","volume-title":"International Conference on Machine Learning","author":"Zhang Jingqing","year":"2020","unstructured":"Jingqing Zhang, Yao Zhao, Mohammad Saleh, and Peter Liu. 2020. Pegasus: Pre-training with extracted gap-sentences for abstractive summarization. In International Conference on Machine Learning. PMLR, 11328\u201311339."},{"key":"e_1_3_2_102_2","article-title":"Multi-stage pre-training for low-resource domain adaptation","author":"Zhang R.","year":"2020","unstructured":"R. Zhang, Revanth Reddy Gangi Reddy, Md Arafat Sultan, V. Castelli, Anthony Ferritto, Radu Florian, Efsun Sarioglu Kayi, S. Roukos, A. Sil, and T. Ward. 2020. Multi-stage pre-training for low-resource domain adaptation. arXiv:2010.05904. Retrieved from https:\/\/arxiv.org\/abs\/2010.05904.","journal-title":"arXiv:2010.05904"},{"key":"e_1_3_2_103_2","volume-title":"Proceedings of the International Joint Conference on Natural Language Processing (IJCNLP\u201911)","author":"Zhao Shiqi","year":"2011","unstructured":"Shiqi Zhao, H. Wang, Chao Li, T. Liu, and Y. Guan. 2011. 
Automatically generating questions from queries for community-based question answering. In Proceedings of the International Joint Conference on Natural Language Processing (IJCNLP\u201911)."},{"key":"e_1_3_2_104_2","article-title":"Memory-efficient embedding for recommendations","author":"Zhao Xiangyu","year":"2020","unstructured":"Xiangyu Zhao, Haochen Liu, Hui Liu, Jiliang Tang, Weiwei Guo, Jun Shi, Sida Wang, Huiji Gao, and Bo Long. 2020. Memory-efficient embedding for recommendations. arXiv:2006.14827. Retrieved from https:\/\/arxiv.org\/abs\/2006.14827.","journal-title":"arXiv:2006.14827"},{"key":"e_1_3_2_105_2","article-title":"Autoemb: Automated embedding dimensionality search in streaming recommendations","author":"Zhao Xiangyu","year":"2020","unstructured":"Xiangyu Zhao, Chong Wang, Ming Chen, Xudong Zheng, Xiaobing Liu, and Jiliang Tang. 2020. Autoemb: Automated embedding dimensionality search in streaming recommendations. arXiv:2002.11252. Retrieved from https:\/\/arxiv.org\/abs\/2002.11252.","journal-title":"arXiv:2002.11252"},{"key":"e_1_3_2_106_2","doi-asserted-by":"publisher","DOI":"10.1145\/3340531.3412044"},{"key":"e_1_3_2_107_2","doi-asserted-by":"publisher","DOI":"10.1145\/1277741.1277792"},{"key":"e_1_3_2_108_2","article-title":"Pre-training text-to-text transformers for concept-centric common sense","author":"Zhou Wangchunshu","year":"2020","unstructured":"Wangchunshu Zhou, Dong-Ho Lee, Ravi Kiran Selvam, Seyeon Lee, Bill Yuchen Lin, and Xiang Ren. 2020. Pre-training text-to-text transformers for concept-centric common sense. arXiv:2011.07956. Retrieved from https:\/\/arxiv.org\/abs\/2011.07956.","journal-title":"arXiv:2011.07956"},{"key":"e_1_3_2_109_2","article-title":"Fine-tuning language models from human preferences","author":"Ziegler Daniel M.","year":"2019","unstructured":"Daniel M. Ziegler, Nisan Stiennon, Jeffrey Wu, Tom B. Brown, Alec Radford, Dario Amodei, Paul Christiano, and Geoffrey Irving. 2019. 
Fine-tuning language models from human preferences. arXiv:1909.08593. Retrieved from https:\/\/arxiv.org\/abs\/1909.08593.","journal-title":"arXiv:1909.08593"},{"key":"e_1_3_2_110_2","volume-title":"NeurIPS Dataset Track","author":"Zou Lixin","year":"2022","unstructured":"Lixin Zou, Haitao Mao, Xiaokai Chu, Jiliang Tang, Wenwen Ye, Shuaiqiang Wang, and Dawei Yin. 2022. A large scale search dataset for unbiased learning to rank. In NeurIPS Dataset Track."},{"key":"e_1_3_2_111_2","doi-asserted-by":"publisher","DOI":"10.1145\/3511808.3557145"},{"key":"e_1_3_2_112_2","doi-asserted-by":"publisher","DOI":"10.1145\/3292500.3330668"},{"key":"e_1_3_2_113_2","doi-asserted-by":"publisher","DOI":"10.1145\/3336191.3371801"},{"key":"e_1_3_2_114_2","doi-asserted-by":"publisher","DOI":"10.1145\/3397271.3401181"},{"key":"e_1_3_2_115_2","doi-asserted-by":"publisher","DOI":"10.1145\/3447548.3467147"}],"container-title":["ACM Transactions on the Web"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3568681","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3568681","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T18:08:34Z","timestamp":1750183714000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3568681"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,12,20]]},"references-count":114,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2023,2,28]]}},"alternative-id":["10.1145\/3568681"],"URL":"https:\/\/doi.org\/10.1145\/3568681","relation":{},"ISSN":["1559-1131","1559-114X"],"issn-type":[{"value":"1559-1131","type":"print"},{"value":"1559-114X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,12,20]]},"assertion":[{"value":"2022-05-27","order":0,"name":"received","label":"
Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-09-13","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-12-20","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}