{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,10]],"date-time":"2026-04-10T10:00:07Z","timestamp":1775815207219,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":112,"publisher":"ACM","license":[{"start":{"date-parts":[[2025,4,22]],"date-time":"2025-04-22T00:00:00Z","timestamp":1745280000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Beijing Natural Science Foundation","award":["L233008"],"award-info":[{"award-number":["L233008"]}]},{"name":"Engineering Research Center of Next-Generation Intelligent Search and Recommendation, MOE."},{"name":"Beijing Municipal Science and Technology Project","award":["Z231100010323009"],"award-info":[{"award-number":["Z231100010323009"]}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62272467"],"award-info":[{"award-number":["62272467"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2025,4,28]]},"DOI":"10.1145\/3696410.3714717","type":"proceedings-article","created":{"date-parts":[[2025,4,22]],"date-time":"2025-04-22T22:52:18Z","timestamp":1745362338000},"page":"4206-4225","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":24,"title":["Understand What LLM Needs: Dual Preference Alignment for Retrieval-Augmented Generation"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2318-0281","authenticated-orcid":false,"given":"Guanting","family":"Dong","sequence":"first","affiliation":[{"name":"Gaoling School of Artificial Intelligence, Renmin University of China, 
Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9432-3251","authenticated-orcid":false,"given":"Yutao","family":"Zhu","sequence":"additional","affiliation":[{"name":"Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-8399-2258","authenticated-orcid":false,"given":"Chenghao","family":"Zhang","sequence":"additional","affiliation":[{"name":"Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-8624-1634","authenticated-orcid":false,"given":"Zechen","family":"Wang","sequence":"additional","affiliation":[{"name":"Beijing University of Posts and Telecommunications, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9777-9676","authenticated-orcid":false,"given":"Ji-Rong","family":"Wen","sequence":"additional","affiliation":[{"name":"Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9781-948X","authenticated-orcid":false,"given":"Zhicheng","family":"Dou","sequence":"additional","affiliation":[{"name":"Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China"}]}],"member":"320","published-online":{"date-parts":[[2025,4,22]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","unstructured":"Rohan Anil Andrew M. Dai Orhan Firat Melvin Johnson Dmitry Lepikhin Alexandre Passos Siamak Shakeri Emanuel Taropa Paige Bailey Zhifeng Chen Eric Chu Jonathan H. Clark Laurent El Shafey Yanping Huang Kathy Meier-Hellstern Gaurav Mishra Erica Moreira Mark Omernick Kevin Robinson Sebastian Ruder Yi Tay Kefan Xiao Yuanzhong Xu Yujing Zhang Gustavo Hern\u00e1ndez \u00c1brego Junwhan Ahn Jacob Austin Paul Barham Jan A. Botha James Bradbury Siddhartha Brahma Kevin Brooks Michele Catasta Yong Cheng Colin Cherry Christopher A. 
Choquette-Choo Aakanksha Chowdhery Cl\u00e9ment Crepy Shachi Dave Mostafa Dehghani Sunipa Dev Jacob Devlin Mark D\u00edaz Nan Du Ethan Dyer Vladimir Feinberg Fangxiaoyu Feng Vlad Fienber Markus Freitag Xavier Garcia Sebastian Gehrmann Lucas Gonzalez and et al. 2023. PaLM 2 Technical Report. CoRR Vol. abs\/2305.10403 (2023). https:\/\/doi.org\/10.48550\/ARXIV.2305.10403 showeprint[arXiv]2305.10403","DOI":"10.48550\/ARXIV.2305.10403"},{"key":"e_1_3_2_1_4_1","volume-title":"Marzieh Fadaee, and Rodrigo Frassetto Nogueira.","author":"Bonifacio Luiz Henrique","year":"2022","unstructured":"Luiz Henrique Bonifacio, Hugo Queiroz Abonizio, Marzieh Fadaee, and Rodrigo Frassetto Nogueira. 2022. InPars: Data Augmentation for Information Retrieval using Large Language Models. CoRR, Vol. abs\/2202.05144 (2022). showeprint[arXiv]2202.05144 https:\/\/arxiv.org\/abs\/2202.05144"},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1007379606734"},{"key":"e_1_3_2_1_6_1","unstructured":"Mark Chen Jerry Tworek Heewoo Jun Qiming Yuan Henrique Pond\u00e9 de Oliveira Pinto Jared Kaplan Harrison Edwards Yuri Burda Nicholas Joseph Greg Brockman Alex Ray Raul Puri Gretchen Krueger Michael Petrov Heidy Khlaaf Girish Sastry Pamela Mishkin Brooke Chan Scott Gray Nick Ryder Mikhail Pavlov Alethea Power Lukasz Kaiser Mohammad Bavarian Clemens Winter Philippe Tillet Felipe Petroski Such Dave Cummings Matthias Plappert Fotios Chantzis Elizabeth Barnes Ariel Herbert-Voss William Hebgen Guss Alex Nichol Alex Paino Nikolas Tezak Jie Tang Igor Babuschkin Suchir Balaji Shantanu Jain William Saunders Christopher Hesse Andrew N. Carr Jan Leike Joshua Achiam Vedant Misra Evan Morikawa Alec Radford Matthew Knight Miles Brundage Mira Murati Katie Mayer Peter Welinder Bob McGrew Dario Amodei Sam McCandlish Ilya Sutskever and Wojciech Zaremba. 2021. Evaluating Large Language Models Trained on Code. CoRR Vol. abs\/2107.03374 (2021). 
showeprint[arXiv]2107.03374 https:\/\/arxiv.org\/abs\/2107.03374"},{"key":"e_1_3_2_1_7_1","volume-title":"Proceedings of the 29th International Conference on Computational Linguistics, COLING 2022, Gyeongju, Republic of Korea, October 12--17","author":"Dong Guanting","year":"2022","unstructured":"Guanting Dong, Daichi Guo, Liwen Wang, Xuefeng Li, Zechen Wang, Chen Zeng, Keqing He, Jinzheng Zhao, Hao Lei, Xinyue Cui, Yi Huang, Junlan Feng, and Weiran Xu. 2022. PSSAT: A Perturbed Semantic Structure Awareness Transferring Method for Perturbation-Robust Slot Filling. In Proceedings of the 29th International Conference on Computational Linguistics, COLING 2022, Gyeongju, Republic of Korea, October 12--17, 2022, Nicoletta Calzolari, Chu-Ren Huang, Hansaem Kim, James Pustejovsky, Leo Wanner, Key-Sun Choi, Pum-Mo Ryu, Hsin-Hsi Chen, Lucia Donatelli, Heng Ji, Sadao Kurohashi, Patrizia Paggio, Nianwen Xue, Seokhwan Kim, Younggyun Hahm, Zhong He, Tony Kyungil Lee, Enrico Santus, Francis Bond, and Seung-Hoon Na (Eds.). International Committee on Computational Linguistics, 5327--5334. https:\/\/aclanthology.org\/2022.coling-1.473"},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.18653\/V1"},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/3583780.3615150"},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2406.13542"},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2410.09584"},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/3583780.3614766"},{"key":"e_1_3_2_1_13_1","volume-title":"2023 d. How abilities in large language models are affected by supervised fine-tuning data composition. arXiv preprint arXiv:2310.05492","author":"Dong Guanting","year":"2023","unstructured":"Guanting Dong, Hongyi Yuan, Keming Lu, Chengpeng Li, Mingfeng Xue, Dayiheng Liu, Wei Wang, Zheng Yuan, Chang Zhou, and Jingren Zhou. 2023 d. 
How abilities in large language models are affected by supervised fine-tuning data composition. arXiv preprint arXiv:2310.05492 (2023)."},{"key":"e_1_3_2_1_14_1","volume-title":"Progressive multimodal reasoning via active retrieval. arXiv preprint arXiv:2412.14835","author":"Dong Guanting","year":"2024","unstructured":"Guanting Dong, Chenghao Zhang, Mengjie Deng, Yutao Zhu, Zhicheng Dou, and Ji-Rong Wen. 2024c. Progressive multimodal reasoning via active retrieval. arXiv preprint arXiv:2412.14835 (2024)."},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1007\/978--3-031--44693--1_53"},{"key":"e_1_3_2_1_16_1","unstructured":"Shihan Dou Enyu Zhou Yan Liu Songyang Gao Jun Zhao Wei Shen Yuhao Zhou Zhiheng Xi Xiao Wang Xiaoran Fan et al. 2023. The Art of Balancing: Revolutionizing Mixture of Experts for Maintaining World Knowledge in Language Model Alignment. arXiv preprint arXiv:2312.09979 (2023)."},{"key":"e_1_3_2_1_17_1","unstructured":"Yin Fang Ningyu Zhang Zhuo Chen Lingbing Guo Xiaohui Fan and Huajun Chen. 2024. Domain-Agnostic Molecular Generation with Chemical Feedback. arxiv: 2301.11259 [cs.LG]"},{"key":"e_1_3_2_1_18_1","volume-title":"REALM: Retrieval-Augmented Language Model Pre-Training. CoRR","author":"Guu Kelvin","year":"2020","unstructured":"Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, and Ming-Wei Chang. 2020. REALM: Retrieval-Augmented Language Model Pre-Training. CoRR, Vol. abs\/2002.08909 (2020). showeprint[arXiv]2002.08909 https:\/\/arxiv.org\/abs\/2002.08909"},{"key":"e_1_3_2_1_19_1","volume-title":"Weizhu Chen, Allie Del Giorno, Ronen Eldan, Sivakanth Gopi, et al.","author":"Javaheripi Mojan","year":"2023","unstructured":"Mojan Javaheripi, S\u00e9bastien Bubeck, Marah Abdin, Jyoti Aneja, Sebastien Bubeck, Caio C\u00e9sar Teodoro Mendes, Weizhu Chen, Allie Del Giorno, Ronen Eldan, Sivakanth Gopi, et al. 2023. Phi-2: The surprising power of small language models. 
Microsoft Research Blog (2023)."},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2301.01820"},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2310.19852"},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","unstructured":"Albert Q. Jiang Alexandre Sablayrolles Arthur Mensch Chris Bamford Devendra Singh Chaplot Diego de Las Casas Florian Bressand Gianna Lengyel Guillaume Lample Lucile Saulnier L\u00e9lio Renard Lavaud Marie-Anne Lachaux Pierre Stock Teven Le Scao Thibaut Lavril Thomas Wang Timoth\u00e9e Lacroix and William El Sayed. 2023a. Mistral 7B. CoRR Vol. abs\/2310.06825 (2023). https:\/\/doi.org\/10.48550\/ARXIV.2310.06825 showeprint[arXiv]2310.06825","DOI":"10.48550\/ARXIV.2310.06825"},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.18653\/V1"},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.emnlp-main.495"},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2405.13576"},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2402.12174"},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.18653\/V1"},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/3404835.3463048"},{"key":"e_1_3_2_1_29_1","volume-title":"Scaling Laws for Neural Language Models. CoRR","author":"Kaplan Jared","year":"2020","unstructured":"Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. 2020. Scaling Laws for Neural Language Models. CoRR, Vol. abs\/2001.08361 (2020). showeprint[arXiv]2001.08361 https:\/\/arxiv.org\/abs\/2001.08361"},{"key":"e_1_3_2_1_30_1","volume-title":"Dense Passage Retrieval for Open-Domain Question Answering. arxiv","author":"Karpukhin Vladimir","year":"2020","unstructured":"Vladimir Karpukhin, Barlas O\u011fuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih. 
2020. Dense Passage Retrieval for Open-Domain Question Answering. arxiv: 2004.04906 [cs.CL]"},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/3397271.3401075"},{"key":"e_1_3_2_1_32_1","volume-title":"Supervised Contrastive Learning. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020","author":"Khosla Prannay","year":"2020","unstructured":"Prannay Khosla, Piotr Teterwak, Chen Wang, Aaron Sarna, Yonglong Tian, Phillip Isola, Aaron Maschinot, Ce Liu, and Dilip Krishnan. 2020. Supervised Contrastive Learning. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6--12, 2020, virtual, Hugo Larochelle, Marc'Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin (Eds.). https:\/\/proceedings.neurips.cc\/paper\/2020\/hash\/d89a66c7c80a29b1bdbab0f2a1a94af8-Abstract.html"},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1162\/TACL_A_00276"},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.25300\/MISQ"},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2309.11911"},{"key":"e_1_3_2_1_36_1","volume-title":"Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020","author":"Lewis Patrick S. H.","year":"2020","unstructured":"Patrick S. H. Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich K\u00fcttler, Mike Lewis, Wen-tau Yih, Tim Rockt\u00e4schel, Sebastian Riedel, and Douwe Kiela. 2020. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. 
In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6--12, 2020, virtual, Hugo Larochelle, Marc'Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin (Eds.). https:\/\/proceedings.neurips.cc\/paper\/2020\/hash\/6b493230205f780e1bc26945df7481e5-Abstract.html"},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2407.04078"},{"key":"e_1_3_2_1_38_1","volume-title":"Query and response augmentation cannot help out-of-domain math reasoning generalization. arXiv preprint arXiv:2310.05506","author":"Li Chengpeng","year":"2023","unstructured":"Chengpeng Li, Zheng Yuan, Guanting Dong, Keming Lu, Jiancan Wu, Chuanqi Tan, Xiang Wang, and Chang Zhou. 2023. Query and response augmentation cannot help out-of-domain math reasoning generalization. arXiv preprint arXiv:2310.05506 (2023)."},{"key":"e_1_3_2_1_39_1","volume-title":"A Survey on Retrieval-Augmented Text Generation. CoRR","author":"Li Huayang","year":"2022","unstructured":"Huayang Li, Yixuan Su, Deng Cai, Yan Wang, and Lemao Liu. 2022a. A Survey on Retrieval-Augmented Text Generation. CoRR, Vol. abs\/2202.01110 (2022). showeprint[arXiv]2202.01110 https:\/\/arxiv.org\/abs\/2202.01110"},{"key":"e_1_3_2_1_40_1","volume-title":"Search-o1: Agentic Search-Enhanced Large Reasoning Models. arXiv preprint arXiv:2501.05366","author":"Li Xiaoxi","year":"2025","unstructured":"Xiaoxi Li, Guanting Dong, Jiajie Jin, Yuyao Zhang, Yujia Zhou, Yutao Zhu, Peitian Zhang, and Zhicheng Dou. 2025. Search-o1: Agentic Search-Enhanced Large Reasoning Models. arXiv preprint arXiv:2501.05366 (2025)."},{"key":"e_1_3_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/3487553.3524208"},{"key":"e_1_3_2_1_42_1","volume-title":"Pareto Multi-Task Learning. 
In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019","author":"Lin Xi","year":"2019","unstructured":"Xi Lin, Hui-Ling Zhen, Zhenhua Li, Qingfu Zhang, and Sam Kwong. 2019. Pareto Multi-Task Learning. In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8--14, 2019, Vancouver, BC, Canada, Hanna M. Wallach, Hugo Larochelle, Alina Beygelzimer, Florence d'Alch\u00e9-Buc, Emily B. Fox, and Roman Garnett (Eds.). 12037--12047. https:\/\/proceedings.neurips.cc\/paper\/2019\/hash\/685bfde03eb646c27ed565881917c71c-Abstract.html"},{"key":"e_1_3_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.18653\/V1"},{"key":"e_1_3_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2302.02676"},{"key":"e_1_3_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/3560815"},{"key":"e_1_3_2_1_46_1","volume-title":"Training Socially Aligned Language Models in Simulated Human Society. CoRR","author":"Liu Ruibo","year":"2023","unstructured":"Ruibo Liu, Ruixin Yang, Chenyan Jia, Ge Zhang, Denny Zhou, Andrew M. Dai, Diyi Yang, and Soroush Vosoughi. 2023b. Training Socially Aligned Language Models in Simulated Human Society. CoRR, Vol. abs\/2305.16960 (2023)."},{"key":"e_1_3_2_1_47_1","volume-title":"2023 e. Statistical Rejection Sampling Improves Preference Optimization. CoRR","author":"Liu Tianqi","year":"2023","unstructured":"Tianqi Liu, Yao Zhao, Rishabh Joshi, Misha Khalman, Mohammad Saleh, Peter J. Liu, and Jialu Liu. 2023 e. Statistical Rejection Sampling Improves Preference Optimization. CoRR, Vol. abs\/2309.06657 (2023)."},{"key":"e_1_3_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2312.15685"},{"key":"e_1_3_2_1_49_1","volume-title":"The Flan Collection: Designing Data and Methods for Effective Instruction Tuning. 
In International Conference on Machine Learning, ICML 2023","volume":"22648","author":"Longpre Shayne","year":"2023","unstructured":"Shayne Longpre, Le Hou, Tu Vu, Albert Webson, Hyung Won Chung, Yi Tay, Denny Zhou, Quoc V. Le, Barret Zoph, Jason Wei, and Adam Roberts. 2023. The Flan Collection: Designing Data and Methods for Effective Instruction Tuning. In International Conference on Machine Learning, ICML 2023, 23--29 July 2023, Honolulu, Hawaii, USA (Proceedings of Machine Learning Research, Vol. 202), Andreas Krause, Emma Brunskill, Kyunghyun Cho, Barbara Engelhardt, Sivan Sabato, and Jonathan Scarlett (Eds.). PMLR, 22631--22648. https:\/\/proceedings.mlr.press\/v202\/longpre23a.html"},{"key":"e_1_3_2_1_50_1","volume-title":"Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101","author":"Loshchilov Ilya","year":"2017","unstructured":"Ilya Loshchilov and Frank Hutter. 2017. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101 (2017)."},{"key":"e_1_3_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2308.07074"},{"key":"e_1_3_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2308.09583"},{"key":"e_1_3_2_1_53_1","volume-title":"Chatkbqa: A generate-then-retrieve framework for knowledge base question answering with fine-tuned large language models. arXiv preprint arXiv:2310.08975","author":"Luo Haoran","year":"2023","unstructured":"Haoran Luo, Zichen Tang, Shiyao Peng, Yikai Guo, Wentai Zhang, Chenghao Ma, Guanting Dong, Meina Song, Wei Lin, et al. 2023b. Chatkbqa: A generate-then-retrieve framework for knowledge base question answering with fine-tuned large language models. 
arXiv preprint arXiv:2310.08975 (2023)."},{"key":"e_1_3_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2306.08568"},{"key":"e_1_3_2_1_55_1","volume-title":"4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2--4, 2016, Conference Track Proceedings, Yoshua Bengio and Yann LeCun (Eds.). http:\/\/arxiv.org\/abs\/1511","author":"Luong Minh-Thang","year":"2016","unstructured":"Minh-Thang Luong, Quoc V. Le, Ilya Sutskever, Oriol Vinyals, and Lukasz Kaiser. 2016. Multi-task Sequence to Sequence Learning. In 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2--4, 2016, Conference Track Proceedings, Yoshua Bengio and Yann LeCun (Eds.). http:\/\/arxiv.org\/abs\/1511.06114"},{"key":"e_1_3_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2310.08319"},{"key":"e_1_3_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2305.02156"},{"key":"e_1_3_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.18653\/V1"},{"key":"e_1_3_2_1_59_1","unstructured":"Meta. 2024. Introducing Meta Llama 3: The most capable openly available LLM to date. https:\/\/ai.meta.com\/blog\/meta-llama-3\/"},{"key":"e_1_3_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.18653\/V1"},{"key":"e_1_3_2_1_61_1","unstructured":"Inc. NetEase Youdao. 2023. BCEmbedding: Bilingual and Crosslingual Embedding for RAG. https:\/\/github.com\/netease-youdao\/BCEmbedding."},{"key":"e_1_3_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.18653\/V1"},{"key":"e_1_3_2_1_63_1","volume-title":"Multi-Stage Document Ranking with BERT. CoRR","author":"Nogueira Rodrigo Frassetto","year":"2019","unstructured":"Rodrigo Frassetto Nogueira, Wei Yang, Kyunghyun Cho, and Jimmy Lin. 2019. Multi-Stage Document Ranking with BERT. CoRR, Vol. abs\/1910.14424 (2019). 
showeprint[arXiv]1910.14424 http:\/\/arxiv.org\/abs\/1910.14424"},{"key":"e_1_3_2_1_66_1","volume-title":"Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022","author":"Ouyang Long","year":"2022","unstructured":"Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul F. Christiano, Jan Leike, and Ryan Lowe. 2022a. Training language models to follow instructions with human feedback. In Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, November 28 - December 9, 2022, Sanmi Koyejo, S. Mohamed, A. Agarwal, Danielle Belgrave, K. Cho, and A. Oh (Eds.). http:\/\/papers.nips.cc\/paper_files\/paper\/2022\/hash\/b1efde53be364a73914f58805a001731-Abstract-Conference.html"},{"key":"e_1_3_2_1_67_1","unstructured":"Long Ouyang Jeffrey Wu Xu Jiang Diogo Almeida Carroll L. Wainwright Pamela Mishkin Chong Zhang Sandhini Agarwal Katarina Slama Alex Ray John Schulman Jacob Hilton Fraser Kelton Luke Miller Maddie Simens Amanda Askell Peter Welinder Paul F. Christiano Jan Leike and Ryan Lowe. 2022b. Training language models to follow instructions with human feedback. In NeurIPS."},{"key":"e_1_3_2_1_68_1","doi-asserted-by":"publisher","DOI":"10.1145\/3298689.3347000"},{"key":"e_1_3_2_1_69_1","doi-asserted-by":"publisher","DOI":"10.18653\/V1"},{"key":"e_1_3_2_1_70_1","volume-title":"Rodrigo Frassetto Nogueira, and Jimmy Lin","author":"Pradeep Ronak","year":"2021","unstructured":"Ronak Pradeep, Rodrigo Frassetto Nogueira, and Jimmy Lin. 2021. The Expando-Mono-Duo Design Pattern for Text Ranking with Pretrained Sequence-to-Sequence Models. CoRR, Vol. abs\/2101.05667 (2021). 
showeprint[arXiv]2101.05667 https:\/\/arxiv.org\/abs\/2101.05667"},{"key":"e_1_3_2_1_71_1","doi-asserted-by":"publisher","DOI":"10.18653\/V1"},{"key":"e_1_3_2_1_72_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2306.17563"},{"key":"e_1_3_2_1_73_1","volume-title":"Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023","author":"Rafailov Rafael","year":"2023","unstructured":"Rafael Rafailov, Archit Sharma, Eric Mitchell, Christopher D. Manning, Stefano Ermon, and Chelsea Finn. 2023. Direct Preference Optimization: Your Language Model is Secretly a Reward Model. In Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10 - 16, 2023, Alice Oh, Tristan Naumann, Amir Globerson, Kate Saenko, Moritz Hardt, and Sergey Levine (Eds.). http:\/\/papers.nips.cc\/paper_files\/paper\/2023\/hash\/a85b405ed65c6477a4fe8302b5e06ce7-Abstract-Conference.html"},{"key":"e_1_3_2_1_74_1","doi-asserted-by":"publisher","DOI":"10.18653\/V1"},{"key":"e_1_3_2_1_75_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2307.11019"},{"key":"e_1_3_2_1_76_1","doi-asserted-by":"publisher","DOI":"10.18653\/V1"},{"key":"e_1_3_2_1_77_1","doi-asserted-by":"publisher","DOI":"10.18653\/V1"},{"key":"e_1_3_2_1_78_1","volume-title":"Proximal Policy Optimization Algorithms. CoRR","author":"Schulman John","year":"2017","unstructured":"John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. 2017. Proximal Policy Optimization Algorithms. CoRR, Vol. abs\/1707.06347 (2017)."},{"key":"e_1_3_2_1_79_1","volume-title":"Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018","author":"Sener Ozan","year":"2018","unstructured":"Ozan Sener and Vladlen Koltun. 2018. Multi-Task Learning as Multi-Objective Optimization. 
In Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, December 3--8, 2018, Montr\u00e9al, Canada, Samy Bengio, Hanna M. Wallach, Hugo Larochelle, Kristen Grauman, Nicol\u00f2 Cesa-Bianchi, and Roman Garnett (Eds.). 525--536. https:\/\/proceedings.neurips.cc\/paper\/2018\/hash\/432aca3a1e345e339f35a30c8f65edce-Abstract.html"},{"key":"e_1_3_2_1_80_1","doi-asserted-by":"publisher","DOI":"10.1002\/J.1538-7305.1948.TB01338.X"},{"key":"e_1_3_2_1_81_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2301.12652"},{"key":"e_1_3_2_1_82_1","doi-asserted-by":"publisher","DOI":"10.1609\/AAAI.V38I17.29865"},{"key":"e_1_3_2_1_83_1","volume-title":"Christiano","author":"Stiennon Nisan","year":"2020","unstructured":"Nisan Stiennon, Long Ouyang, Jeff Wu, Daniel M. Ziegler, Ryan Lowe, Chelsea Voss, Alec Radford, Dario Amodei, and Paul F. Christiano. 2020. Learning to summarize from human feedback. CoRR, Vol. abs\/2009.01325 (2020). showeprint[arXiv]2009.01325 https:\/\/arxiv.org\/abs\/2009.01325"},{"key":"e_1_3_2_1_84_1","doi-asserted-by":"publisher","DOI":"10.18653\/V1"},{"key":"e_1_3_2_1_85_1","volume-title":"MathScale: Scaling Instruction Tuning for Mathematical Reasoning. In Forty-first International Conference on Machine Learning, ICML 2024","author":"Tang Zhengyang","year":"2024","unstructured":"Zhengyang Tang, Xingxing Zhang, Benyou Wang, and Furu Wei. 2024. MathScale: Scaling Instruction Tuning for Mathematical Reasoning. In Forty-first International Conference on Machine Learning, ICML 2024, Vienna, Austria, July 21--27, 2024. OpenReview.net. 
https:\/\/openreview.net\/forum?id=Kjww7ZN47M"},{"key":"e_1_3_2_1_86_1","doi-asserted-by":"publisher","unstructured":"Hugo Touvron Louis Martin Kevin Stone Peter Albert Amjad Almahairi Yasmine Babaei Nikolay Bashlykov Soumya Batra Prajjwal Bhargava Shruti Bhosale Dan Bikel Lukas Blecher Cristian Canton-Ferrer Moya Chen Guillem Cucurull David Esiobu Jude Fernandes Jeremy Fu Wenyin Fu Brian Fuller Cynthia Gao Vedanuj Goswami Naman Goyal Anthony Hartshorn Saghar Hosseini Rui Hou Hakan Inan Marcin Kardas Viktor Kerkez Madian Khabsa Isabel Kloumann Artem Korenev Punit Singh Koura Marie-Anne Lachaux Thibaut Lavril Jenya Lee Diana Liskovich Yinghai Lu Yuning Mao Xavier Martinet Todor Mihaylov Pushkar Mishra Igor Molybog Yixin Nie Andrew Poulton Jeremy Reizenstein Rashi Rungta Kalyan Saladi Alan Schelten Ruan Silva Eric Michael Smith Ranjan Subramanian Xiaoqing Ellen Tan Binh Tang Ross Taylor Adina Williams Jian Xiang Kuan Puxin Xu Zheng Yan Iliyan Zarov Yuchen Zhang Angela Fan Melanie Kambadur Sharan Narang Aur\u00e9lien Rodriguez Robert Stojnic Sergey Edunov and Thomas Scialom. 2023. Llama 2: Open Foundation and Fine-Tuned Chat Models. CoRR Vol. abs\/2307.09288 (2023). https:\/\/doi.org\/10.48550\/ARXIV.2307.09288 showeprint[arXiv]2307.09288","DOI":"10.48550\/ARXIV.2307.09288"},{"key":"e_1_3_2_1_87_1","doi-asserted-by":"publisher","DOI":"10.18653\/V1"},{"key":"e_1_3_2_1_88_1","volume-title":"Representation Learning with Contrastive Predictive Coding. CoRR","author":"van den Oord A\u00e4ron","year":"2018","unstructured":"A\u00e4ron van den Oord, Yazhe Li, and Oriol Vinyals. 2018. Representation Learning with Contrastive Predictive Coding. CoRR, Vol. abs\/1807.03748 (2018). showeprint[arXiv]1807.03748 http:\/\/arxiv.org\/abs\/1807.03748"},{"key":"e_1_3_2_1_89_1","doi-asserted-by":"publisher","DOI":"10.1145\/2629489"},{"key":"e_1_3_2_1_90_1","unstructured":"Keheng Wang Feiyu Duan Peiguang Li Sirui Wang and Xunliang Cai. 2024a. 
LLMs Know What They Need: Leveraging a Missing Information Guided Framework to Empower Retrieval-Augmented Generation. arxiv: 2404.14043 [cs.CL]"},{"key":"e_1_3_2_1_91_1","doi-asserted-by":"crossref","unstructured":"Yejie Wang Keqing He Guanting Dong Pei Wang Weihao Zeng Muxi Diao Yutao Mou Mengdi Zhang Jingang Wang Xunliang Cai et al. 2024b. DolphCoder: Echo-Locating Code Large Language Models with Diverse and Multi-Objective Instruction Tuning. arXiv preprint arXiv:2402.09136 (2024).","DOI":"10.18653\/v1\/2024.acl-long.259"},{"key":"e_1_3_2_1_92_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2307.12966"},{"key":"e_1_3_2_1_93_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2311.08377"},{"key":"e_1_3_2_1_94_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2403.05313"},{"key":"e_1_3_2_1_95_1","volume-title":"Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. In Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022","author":"Wei Jason","year":"2022","unstructured":"Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed H. Chi, Quoc V. Le, and Denny Zhou. 2022. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. In Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, November 28 - December 9, 2022, Sanmi Koyejo, S. Mohamed, A. Agarwal, Danielle Belgrave, K. Cho, and A. Oh (Eds.). http:\/\/papers.nips.cc\/paper_files\/paper\/2022\/hash\/9d5609613524ecf4f15af0f7b31abca4-Abstract-Conference.html"},{"key":"e_1_3_2_1_96_1","volume-title":"Llama pro: Progressive llama with block expansion. arXiv preprint arXiv:2401.02415","author":"Wu Chengyue","year":"2024","unstructured":"Chengyue Wu, Yukang Gan, Yixiao Ge, Zeyu Lu, Jiahao Wang, Ye Feng, Ping Luo, and Ying Shan. 2024. Llama pro: Progressive llama with block expansion. 
arXiv preprint arXiv:2401.02415 (2024)."},{"key":"e_1_3_2_1_97_1","volume-title":"Fine-Grained Human Feedback Gives Better Rewards for Language Model Training. CoRR","author":"Wu Zeqiu","year":"2023","unstructured":"Zeqiu Wu, Yushi Hu, Weijia Shi, Nouha Dziri, Alane Suhr, Prithviraj Ammanabrolu, Noah A. Smith, Mari Ostendorf, and Hannaneh Hajishirzi. 2023. Fine-Grained Human Feedback Gives Better Rewards for Language Model Training. CoRR, Vol. abs\/2306.01693 (2023)."},{"key":"e_1_3_2_1_98_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2309.07597"},{"key":"e_1_3_2_1_99_1","volume-title":"[n.,d.]. Qwen2 technical report. arXiv","author":"Yang A","year":"2024","unstructured":"A Yang, B Yang, B Hui, B Zheng, B Yu, C Zhou, C Li, C Li, D Liu, F Huang, et al. [n.,d.]. Qwen2 technical report. arXiv 2024. arXiv preprint arXiv:2407.10671 ( [n.,d.])."},{"key":"e_1_3_2_1_101_1","doi-asserted-by":"publisher","DOI":"10.18653\/V1"},{"key":"e_1_3_2_1_102_1","volume-title":"ReAct: Synergizing Reasoning and Acting in Language Models. In The Eleventh International Conference on Learning Representations, ICLR 2023","author":"Yao Shunyu","year":"2023","unstructured":"Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik R. Narasimhan, and Yuan Cao. 2023. ReAct: Synergizing Reasoning and Acting in Language Models. In The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1--5, 2023. OpenReview.net. https:\/\/openreview.net\/pdf?id=WE_vluYUL-X"},{"key":"e_1_3_2_1_103_1","doi-asserted-by":"publisher","DOI":"10.18653\/V1"},{"key":"e_1_3_2_1_104_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2309.12284"},{"key":"e_1_3_2_1_105_1","volume-title":"System-Level Natural Language Feedback. CoRR","author":"Yuan Weizhe","year":"2023","unstructured":"Weizhe Yuan, Kyunghyun Cho, and Jason Weston. 2023a. System-Level Natural Language Feedback. CoRR, Vol. 
abs\/2306.13588 (2023)."},{"key":"e_1_3_2_1_106_1","volume-title":"Scaling relationship on learning mathematical reasoning with large language models. arXiv preprint arXiv:2308.01825","author":"Yuan Zheng","year":"2023","unstructured":"Zheng Yuan, Hongyi Yuan, Chengpeng Li, Guanting Dong, Chuanqi Tan, and Chang Zhou. 2023b. Scaling relationship on learning mathematical reasoning with large language models. arXiv preprint arXiv:2308.01825 (2023)."},{"key":"e_1_3_2_1_107_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2304.05302"},{"key":"e_1_3_2_1_108_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2406.00770"},{"key":"e_1_3_2_1_109_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2403.10131"},{"key":"e_1_3_2_1_110_1","doi-asserted-by":"crossref","unstructured":"Yichi Zhang, Zhuo Chen, Yin Fang, Yanxi Lu, Fangming Li, Wen Zhang, and Huajun Chen. 2024a. Knowledgeable Preference Alignment for LLMs in Domain-specific Question Answering. arXiv: 2311.06503 [cs.CL]","DOI":"10.18653\/v1\/2024.findings-acl.52"},{"key":"e_1_3_2_1_111_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2309.01219"},{"key":"e_1_3_2_1_112_1","volume-title":"SLiC-HF: Sequence Likelihood Calibration with Human Feedback. CoRR","author":"Zhao Yao","year":"2023","unstructured":"Yao Zhao, Rishabh Joshi, Tianqi Liu, Misha Khalman, Mohammad Saleh, and Peter J. Liu. 2023. SLiC-HF: Sequence Likelihood Calibration with Human Feedback. CoRR, Vol. abs\/2305.10425 (2023)."},{"key":"e_1_3_2_1_113_1","volume-title":"LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models. arXiv preprint arXiv:2403.13372","author":"Zheng Yaowei","year":"2024","unstructured":"Yaowei Zheng, Richong Zhang, Junhao Zhang, Yanhan Ye, Zheyan Luo, and Yongqiang Ma. 2024. LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models. arXiv preprint arXiv:2403.13372 (2024). http:\/\/arxiv.org\/abs\/2403.13372"},{"key":"e_1_3_2_1_114_1","volume-title":"LIMA: Less Is More for Alignment. 
In Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023","author":"Zhou Chunting","year":"2023","unstructured":"Chunting Zhou, Pengfei Liu, Puxin Xu, Srinivasan Iyer, Jiao Sun, Yuning Mao, Xuezhe Ma, Avia Efrat, Ping Yu, Lili Yu, Susan Zhang, Gargi Ghosh, Mike Lewis, Luke Zettlemoyer, and Omer Levy. 2023. LIMA: Less Is More for Alignment. In Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10 - 16, 2023, Alice Oh, Tristan Naumann, Amir Globerson, Kate Saenko, Moritz Hardt, and Sergey Levine (Eds.). http:\/\/papers.nips.cc\/paper_files\/paper\/2023\/hash\/ac662d74829e4407ce1d126477f4a03a-Abstract-Conference.html"},{"key":"e_1_3_2_1_115_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2402.11626"},{"key":"e_1_3_2_1_116_1","doi-asserted-by":"publisher","DOI":"10.1145\/3539618.3592047"},{"key":"e_1_3_2_1_117_1","doi-asserted-by":"publisher","DOI":"10.18653\/V1"}],"event":{"name":"WWW '25: The ACM Web Conference 2025","location":"Sydney NSW Australia","acronym":"WWW '25","sponsor":["SIGWEB ACM Special Interest Group on Hypertext, Hypermedia, and Web"]},"container-title":["Proceedings of the ACM on Web Conference 
2025"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3696410.3714717","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3696410.3714717","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:18:57Z","timestamp":1750295937000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3696410.3714717"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,4,22]]},"references-count":112,"alternative-id":["10.1145\/3696410.3714717","10.1145\/3696410"],"URL":"https:\/\/doi.org\/10.1145\/3696410.3714717","relation":{},"subject":[],"published":{"date-parts":[[2025,4,22]]},"assertion":[{"value":"2025-04-22","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}