{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,25]],"date-time":"2026-01-25T03:50:24Z","timestamp":1769313024614,"version":"3.49.0"},"publisher-location":"New York, NY, USA","reference-count":68,"publisher":"ACM","license":[{"start":{"date-parts":[[2024,10,27]],"date-time":"2024-10-27T00:00:00Z","timestamp":1729987200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"National Natural Science Foundation of China Projects","award":["U1936213"],"award-info":[{"award-number":["U1936213"]}]},{"name":"National Natural Science Foundation of China Projects","award":["62272473"],"award-info":[{"award-number":["62272473"]}]},{"name":"Science and Technology Innovation Program of Hunan Province","award":["2023RC1001"],"award-info":[{"award-number":["2023RC1001"]}]},{"name":"Shanghai Science, Technology Development","award":["22dz1200704"],"award-info":[{"award-number":["22dz1200704"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2024,10,27]]},"DOI":"10.1145\/3691620.3694987","type":"proceedings-article","created":{"date-parts":[[2024,10,18]],"date-time":"2024-10-18T15:39:19Z","timestamp":1729265959000},"page":"65-77","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Preference-Guided Refactored Tuning for Retrieval Augmented Code Generation"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0009-0006-9306-6106","authenticated-orcid":false,"given":"Xinyu","family":"Gao","sequence":"first","affiliation":[{"name":"Fudan University, Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8575-5415","authenticated-orcid":false,"given":"Yun","family":"Xiong","sequence":"additional","affiliation":[{"name":"Fudan University, Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7935-6840","authenticated-orcid":false,"given":"Deze","family":"Wang","sequence":"additional","affiliation":[{"name":"National University of Defense Technology, Hunan, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-5364-7961","authenticated-orcid":false,"given":"Zhenhan","family":"Guan","sequence":"additional","affiliation":[{"name":"Fudan University, Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-1312-7738","authenticated-orcid":false,"given":"Zejian","family":"Shi","sequence":"additional","affiliation":[{"name":"Fudan University, Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3018-3824","authenticated-orcid":false,"given":"Haofen","family":"Wang","sequence":"additional","affiliation":[{"name":"Tongji University, Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0798-974X","authenticated-orcid":false,"given":"Shanshan","family":"Li","sequence":"additional","affiliation":[{"name":"National University of Defense Technology, Hunan, China"}]}],"member":"320","published-online":{"date-parts":[[2024,10,27]]},"reference":[{"key":"e_1_3_2_1_1_1","volume-title":"Niklas Muennighoff, Mayank Mishra, Alex Gu, Manan Dey, et al. Santacoder: don't reach for the stars! arXiv preprint arXiv:2301.03988","author":"Allal Loubna Ben","year":"2023","unstructured":"Loubna Ben Allal, Raymond Li, Denis Kocetkov, Chenghao Mou, Christopher Akiki, Carlos Munoz Ferrandis, Niklas Muennighoff, Mayank Mishra, Alex Gu, Manan Dey, et al. Santacoder: don't reach for the stars! 
arXiv preprint arXiv:2301.03988, 2023."},{"key":"e_1_3_2_1_2_1","volume-title":"Selfrag: Learning to retrieve, generate, and critique through self-reflection. arXiv preprint arXiv:2310.11511","author":"Asai Akari","year":"2023","unstructured":"Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, and Hannaneh Hajishirzi. Selfrag: Learning to retrieve, generate, and critique through self-reflection. arXiv preprint arXiv:2310.11511, 2023."},{"key":"e_1_3_2_1_3_1","volume-title":"Optimizing retrieval-augmented reader models via token elimination. arXiv preprint arXiv:2310.13682","author":"Berchansky Moshe","year":"2023","unstructured":"Moshe Berchansky, Peter Izsak, Avi Caciularu, Ido Dagan, and Moshe Wasserblat. Optimizing retrieval-augmented reader models via token elimination. arXiv preprint arXiv:2310.13682, 2023."},{"key":"e_1_3_2_1_4_1","first-page":"2206","volume-title":"International conference on machine learning","author":"Borgeaud Sebastian","year":"2022","unstructured":"Sebastian Borgeaud, Arthur Mensch, Jordan Hoffmann, Trevor Cai, Eliza Rutherford, Katie Millican, George Bm Van Den Driessche, Jean-Baptiste Lespiau, Bogdan Damoc, Aidan Clark, et al. Improving language models by retrieving from trillions of tokens. In International conference on machine learning, pages 2206--2240. PMLR, 2022."},{"key":"e_1_3_2_1_5_1","volume-title":"Language models are few-shot learners. Advances in neural information processing systems, 33:1877--1901","author":"Brown Tom","year":"2020","unstructured":"Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. Advances in neural information processing systems, 33:1877--1901, 2020."},{"key":"e_1_3_2_1_6_1","volume-title":"Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805","author":"Devlin Jacob","year":"2018","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018."},{"key":"e_1_3_2_1_7_1","volume-title":"The faiss library","author":"Douze Matthijs","year":"2024","unstructured":"Matthijs Douze, Alexandr Guzhva, Chengqi Deng, Jeff Johnson, Gergely Szilvasy, Pierre-Emmanuel Mazar\u00e9, Maria Lomeli, Lucas Hosseini, and Herv\u00e9 J\u00e9gou. The faiss library. 2024."},{"key":"e_1_3_2_1_8_1","volume-title":"Ast-t5: Structure-aware pre-training for code generation and understanding. arXiv preprint arXiv:2401.03003","author":"Gong Linyuan","year":"2024","unstructured":"Linyuan Gong, Mostafa Elhoushi, and Alvin Cheung. Ast-t5: Structure-aware pre-training for code generation and understanding. arXiv preprint arXiv:2401.03003, 2024."},{"key":"e_1_3_2_1_9_1","first-page":"3929","volume-title":"International conference on machine learning","author":"Guu Kelvin","year":"2020","unstructured":"Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, and Mingwei Chang. Retrieval augmented language model pre-training. In International conference on machine learning, pages 3929--3938. PMLR, 2020."},{"key":"e_1_3_2_1_10_1","volume-title":"Levenshtein distance technique in dictionary lookup methods: An improved approach. arXiv preprint arXiv:1101.1232","author":"Haldar Rishin","year":"2011","unstructured":"Rishin Haldar and Debajyoti Mukhopadhyay. Levenshtein distance technique in dictionary lookup methods: An improved approach. 
arXiv preprint arXiv:1101.1232, 2011."},{"key":"e_1_3_2_1_11_1","volume-title":"Codesearchnet challenge: Evaluating the state of semantic code search. arXiv preprint arXiv:1909.09436","author":"Husain Hamel","year":"2019","unstructured":"Hamel Husain, Ho-Hsiang Wu, Tiferet Gazit, Miltiadis Allamanis, and Marc Brockschmidt. Codesearchnet challenge: Evaluating the state of semantic code search. arXiv preprint arXiv:1909.09436, 2019."},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.emnlp-main.825"},{"key":"e_1_3_2_1_13_1","volume-title":"Active retrieval augmented generation. arXiv preprint arXiv:2305.06983","author":"Jiang Zhengbao","year":"2023","unstructured":"Zhengbao Jiang, Frank F Xu, Luyu Gao, Zhiqing Sun, Qian Liu, Jane Dwivedi-Yu, Yiming Yang, Jamie Callan, and Graham Neubig. Active retrieval augmented generation. arXiv preprint arXiv:2305.06983, 2023."},{"key":"e_1_3_2_1_14_1","first-page":"15696","volume-title":"International Conference on Machine Learning","author":"Kandpal Nikhil","year":"2023","unstructured":"Nikhil Kandpal, Haikang Deng, Adam Roberts, Eric Wallace, and Colin Raffel. Large language models struggle to learn long-tail knowledge. In International Conference on Machine Learning, pages 15696--15707. PMLR, 2023."},{"key":"e_1_3_2_1_15_1","volume-title":"Bridging the preference gap between retrievers and llms. arXiv preprint arXiv:2401.06954","author":"Ke Zixuan","year":"2024","unstructured":"Zixuan Ke, Weize Kong, Cheng Li, Mingyang Zhang, Qiaozhu Mei, and Michael Bendersky. Bridging the preference gap between retrievers and llms. arXiv preprint arXiv:2401.06954, 2024."},{"key":"e_1_3_2_1_16_1","volume-title":"Generalization through memorization: Nearest neighbor language models. arXiv preprint arXiv:1911.00172","author":"Khandelwal Urvashi","year":"2019","unstructured":"Urvashi Khandelwal, Omer Levy, Dan Jurafsky, Luke Zettlemoyer, and Mike Lewis. Generalization through memorization: Nearest neighbor language models. arXiv preprint arXiv:1911.00172, 2019."},{"key":"e_1_3_2_1_17_1","volume-title":"The Eleventh International Conference on Learning Representations","author":"Lan Tian","year":"2023","unstructured":"Tian Lan, Deng Cai, Yan Wang, Heyan Huang, and Xian-Ling Mao. Copy is all you need. In The Eleventh International Conference on Learning Representations, 2023."},{"key":"e_1_3_2_1_18_1","volume-title":"Large language model-aware in-context learning for code generation. arXiv preprint arXiv:2310.09748","author":"Li Jia","year":"2023","unstructured":"Jia Li, Ge Li, Chongyang Tao, Huangzhao Zhang, Fang Liu, and Zhi Jin. Large language model-aware in-context learning for code generation. arXiv preprint arXiv:2310.09748, 2023."},{"key":"e_1_3_2_1_19_1","volume-title":"Jingyuan Wang, Jian-Yun Nie, and Ji-Rong Wen. The web can be your oyster for improving large language models. arXiv preprint arXiv:2305.10998","author":"Li Junyi","year":"2023","unstructured":"Junyi Li, Tianyi Tang, Wayne Xin Zhao, Jingyuan Wang, Jian-Yun Nie, and Ji-Rong Wen. The web can be your oyster for improving large language models. arXiv preprint arXiv:2305.10998, 2023."},{"key":"e_1_3_2_1_20_1","volume-title":"Unlocking context constraints of llms: Enhancing context efficiency of llms with self-information-based content filtering. arXiv preprint arXiv:2304.12102","author":"Li Yucheng","year":"2023","unstructured":"Yucheng Li. Unlocking context constraints of llms: Enhancing context efficiency of llms with self-information-based content filtering. 
arXiv preprint arXiv:2304.12102, 2023."},{"key":"e_1_3_2_1_21_1","first-page":"20852","volume-title":"Less is more: Task-aware layer-wise distillation for language model compression","author":"Liang Chen","year":"2022","unstructured":"Chen Liang, Simiao Zuo, Qingru Zhang, Pengcheng He, Weizhu Chen, and Tuo Zhao. Less is more: Task-aware layer-wise distillation for language model compression. pages 20852--20867, 2022."},{"key":"e_1_3_2_1_22_1","volume-title":"et al. Repofuse: Repository-level code completion with fused dual context. arXiv preprint arXiv:2402.14323","author":"Liang Ming","year":"2024","unstructured":"Ming Liang, Xiaoheng Xie, Gehao Zhang, Xunjin Zheng, Peng Di, Hongwei Chen, Chengpeng Wang, Gang Fan, et al. Repofuse: Repository-level code completion with fused dual context. arXiv preprint arXiv:2402.14323, 2024."},{"key":"e_1_3_2_1_23_1","volume-title":"et al. Ra-dit: Retrieval-augmented dual instruction tuning. arXiv preprint arXiv:2310.01352","author":"Lin Xi Victoria","year":"2023","unstructured":"Xi Victoria Lin, Xilun Chen, Mingda Chen, Weijia Shi, Maria Lomeli, Rich James, Pedro Rodriguez, Jacob Kahn, Gergely Szilvasy, Mike Lewis, et al. Ra-dit: Retrieval-augmented dual instruction tuning. arXiv preprint arXiv:2310.01352, 2023."},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/ASE56229.2023.00159"},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00638"},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/3580305.3599931"},{"key":"e_1_3_2_1_27_1","volume-title":"et al. Codexglue: A machine learning benchmark dataset for code understanding and generation. arXiv preprint arXiv:2102.04664","author":"Lu Shuai","year":"2021","unstructured":"Shuai Lu, Daya Guo, Shuo Ren, Junjie Huang, Alexey Svyatkovskiy, Ambrosio Blanco, Colin Clement, Dawn Drain, Daxin Jiang, Duyu Tang, et al. Codexglue: A machine learning benchmark dataset for code understanding and generation. arXiv preprint arXiv:2102.04664, 2021."},{"key":"e_1_3_2_1_28_1","volume-title":"Nonparametric masked language modeling. arXiv preprint arXiv:2212.01349","author":"Min Sewon","year":"2022","unstructured":"Sewon Min, Weijia Shi, Mike Lewis, Xilun Chen, Wen-tau Yih, Hannaneh Hajishirzi, and Luke Zettlemoyer. Nonparametric masked language modeling. arXiv preprint arXiv:2212.01349, 2022."},{"key":"e_1_3_2_1_29_1","volume-title":"Codegen: An open large language model for code with multi-turn program synthesis. arXiv preprint arXiv:2203.13474","author":"Nijkamp Erik","year":"2022","unstructured":"Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, and Caiming Xiong. Codegen: An open large language model for code with multi-turn program synthesis. arXiv preprint arXiv:2203.13474, 2022."},{"key":"e_1_3_2_1_30_1","volume-title":"Llmlingua-2: Data distillation for efficient and faithful task-agnostic prompt compression. arXiv preprint arXiv:2403.12968","author":"Pan Zhuoshi","year":"2024","unstructured":"Zhuoshi Pan, Qianhui Wu, Huiqiang Jiang, Menglin Xia, Xufang Luo, Jue Zhang, Qingwei Lin, Victor R\u00fchle, Yuqing Yang, Chin-Yew Lin, et al. Llmlingua-2: Data distillation for efficient and faithful task-agnostic prompt compression. 
arXiv preprint arXiv:2403.12968, 2024."},{"key":"e_1_3_2_1_31_1","first-page":"311","volume-title":"Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics","author":"Papineni Kishore","year":"2002","unstructured":"Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. Bleu: a method for automatic evaluation of machine translation. In Pierre Isabelle, Eugene Charniak, and Dekang Lin, editors, Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 311--318, Philadelphia, Pennsylvania, USA, July 2002. Association for Computational Linguistics."},{"key":"e_1_3_2_1_32_1","volume-title":"Saikat Chakraborty, Baishakhi Ray, and Kai-Wei Chang. Retrieval augmented code generation and summarization. arXiv preprint arXiv:2108.11601","author":"Parvez Md Rizwan","year":"2021","unstructured":"Md Rizwan Parvez, Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, and Kai-Wei Chang. Retrieval augmented code generation and summarization. arXiv preprint arXiv:2108.11601, 2021."},{"key":"e_1_3_2_1_33_1","volume-title":"Language models are unsupervised multitask learners","author":"Radford Alec","year":"2019","unstructured":"Alec Radford, Jeff Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. Language models are unsupervised multitask learners. 2019."},{"key":"e_1_3_2_1_34_1","volume-title":"Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of machine learning research, 21(140):1--67","author":"Raffel Colin","year":"2020","unstructured":"Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J Liu. Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of machine learning research, 21(140):1--67, 2020."},{"key":"e_1_3_2_1_35_1","volume-title":"Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context. arXiv preprint arXiv:2403.05530","author":"Reid Machel","year":"2024","unstructured":"Machel Reid, Nikolay Savinov, Denis Teplyashin, Dmitry Lepikhin, Timothy Lillicrap, Jean-baptiste Alayrac, Radu Soricut, Angeliki Lazaridou, Orhan Firat, Julian Schrittwieser, et al. Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context. arXiv preprint arXiv:2403.05530, 2024."},{"key":"e_1_3_2_1_36_1","volume-title":"Codebleu: a method for automatic evaluation of code synthesis. arXiv preprint arXiv:2009.10297","author":"Ren Shuo","year":"2020","unstructured":"Shuo Ren, Daya Guo, Shuai Lu, Long Zhou, Shujie Liu, Duyu Tang, Neel Sundaresan, Ming Zhou, Ambrosio Blanco, and Shuai Ma. Codebleu: a method for automatic evaluation of code synthesis. arXiv preprint arXiv:2009.10297, 2020."},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/ASE56229.2023.00143"},{"key":"e_1_3_2_1_38_1","unstructured":"Baptiste Rozi\u00e8re Jonas Gehring Fabian Gloeckle Sten Sootla Itai Gat Xiaoqing Tan Yossi Adi Jingyu Liu Tal Remez J\u00e9r\u00e9my Rapin Artyom Kozhevnikov I. Evtimov Joanna Bitton Manish P Bhatt Cristian Cant\u00f3n Ferrer Aaron Grattafiori Wenhan Xiong Alexandre D'efossez Jade Copet Faisal Azhar Hugo Touvron Louis Martin Nicolas Usunier Thomas Scialom and Gabriel Synnaeve. Code llama: Open foundation models for code. ArXiv abs\/2308.12950 2023."},{"key":"e_1_3_2_1_39_1","volume-title":"Proximal policy optimization algorithms. 
arXiv preprint arXiv:1707.06347","author":"Schulman John","year":"2017","unstructured":"John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017."},{"key":"e_1_3_2_1_40_1","volume-title":"Pangu-coder2: Boosting large language models for code with ranking feedback. arXiv preprint arXiv:2307.14936","author":"Shen Bo","year":"2023","unstructured":"Bo Shen, Jiaxin Zhang, Taihong Chen, Daoguang Zan, Bing Geng, An Fu, Muhan Zeng, Ailun Yu, Jichuan Ji, Jingyang Zhao, et al. Pangu-coder2: Boosting large language models for code with ranking feedback. arXiv preprint arXiv:2307.14936, 2023."},{"key":"e_1_3_2_1_41_1","series-title":"Proceedings of Machine Learning Research","first-page":"31210","volume-title":"Nathanael Sch\u00e4rli, and Denny Zhou. Large language models can be easily distracted by irrelevant context","author":"Shi Freda","year":"2023","unstructured":"Freda Shi, Xinyun Chen, Kanishka Misra, Nathan Scales, David Dohan, Ed H. Chi, Nathanael Sch\u00e4rli, and Denny Zhou. Large language models can be easily distracted by irrelevant context. In Andreas Krause, Emma Brunskill, Kyunghyun Cho, Barbara Engelhardt, Sivan Sabato, and Jonathan Scarlett, editors, Proceedings of the 40th International Conference on Machine Learning, volume 202 of Proceedings of Machine Learning Research, pages 31210--31227. PMLR, 23--29 Jul 2023."},{"key":"e_1_3_2_1_42_1","volume-title":"Replug: Retrieval-augmented black-box language models. arXiv preprint arXiv:2301.12652","author":"Shi Weijia","year":"2023","unstructured":"Weijia Shi, Sewon Min, Michihiro Yasunaga, Minjoon Seo, Rich James, Mike Lewis, Luke Zettlemoyer, and Wen-tau Yih. Replug: Retrieval-augmented black-box language models. arXiv preprint arXiv:2301.12652, 2023."},{"key":"e_1_3_2_1_43_1","volume-title":"Biomedical knowledge graph-enhanced prompt generation for large language models. arXiv preprint arXiv:2311.17330","author":"Soman Karthik","year":"2023","unstructured":"Karthik Soman, Peter W Rose, John H Morris, Rabia E Akbas, Brett Smith, Braian Peetoom, Catalina Villouta-Reyes, Gabriel Cerono, Yongmei Shi, Angela Rizk-Jackson, et al. Biomedical knowledge graph-enhanced prompt generation for large language models. arXiv preprint arXiv:2311.17330, 2023."},{"key":"e_1_3_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v38i17.29865"},{"key":"e_1_3_2_1_45_1","volume-title":"Arks: Active retrieval in knowledge soup for code generation. arXiv preprint arXiv:2402.12317","author":"Su Hongjin","year":"2024","unstructured":"Hongjin Su, Shuyang Jiang, Yuhang Lai, Haoyuan Wu, Boao Shi, Che Liu, Qian Liu, and Tao Yu. Arks: Active retrieval in knowledge soup for code generation. arXiv preprint arXiv:2402.12317, 2024."},{"key":"e_1_3_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/ASE56229.2023.00076"},{"key":"e_1_3_2_1_47_1","volume-title":"Xunzhu Tang, Shing-Chi Cheung, Jacques Klein, and Tegawend\u00e9 F Bissyand\u00e9. Is chatgpt the ultimate programming assistant-how far is it? arXiv preprint arXiv:2304.11938","author":"Tian Haoye","year":"2023","unstructured":"Haoye Tian, Weiqi Lu, Tsz On Li, Xunzhu Tang, Shing-Chi Cheung, Jacques Klein, and Tegawend\u00e9 F Bissyand\u00e9. Is chatgpt the ultimate programming assistant-how far is it? arXiv preprint arXiv:2304.11938, 2023."},{"key":"e_1_3_2_1_48_1","volume-title":"Llama 2: Open foundation and fine-tuned chat models. 
arXiv preprint arXiv:2307.09288","author":"Touvron Hugo","year":"2023","unstructured":"Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, et al. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288, 2023."},{"key":"e_1_3_2_1_49_1","volume-title":"Instructretro: Instruction tuning post retrieval-augmented pretraining. arXiv preprint arXiv:2310.07713","author":"Wang Boxin","year":"2023","unstructured":"Boxin Wang, Wei Ping, Lawrence McAfee, Peng Xu, Bo Li, Mohammad Shoeybi, and Bryan Catanzaro. Instructretro: Instruction tuning post retrieval-augmented pretraining. arXiv preprint arXiv:2310.07713, 2023."},{"key":"e_1_3_2_1_50_1","volume-title":"Shall we pretrain autoregressive language models with retrieval? a comprehensive study. arXiv preprint arXiv:2304.06762","author":"Wang Boxin","year":"2023","unstructured":"Boxin Wang, Wei Ping, Peng Xu, Lawrence McAfee, Zihan Liu, Mohammad Shoeybi, Yi Dong, Oleksii Kuchaiev, Bo Li, Chaowei Xiao, et al. Shall we pretrain autoregressive language models with retrieval? a comprehensive study. arXiv preprint arXiv:2304.06762, 2023."},{"key":"e_1_3_2_1_51_1","volume-title":"Nghi DQ Bui, Junnan Li, and Steven CH Hoi. Codet5+: Open code large language models for code understanding and generation. arXiv preprint arXiv:2305.07922","author":"Wang Yue","year":"2023","unstructured":"Yue Wang, Hung Le, Akhilesh Deepak Gotmare, Nghi DQ Bui, Junnan Li, and Steven CH Hoi. Codet5+: Open code large language models for code understanding and generation. arXiv preprint arXiv:2305.07922, 2023."},{"key":"e_1_3_2_1_52_1","volume-title":"Codet5: Identifier-aware unified pre-trained encoder-decoder models for code understanding and generation. arXiv preprint arXiv:2109.00859","author":"Wang Yue","year":"2021","unstructured":"Yue Wang, Weishi Wang, Shafiq Joty, and Steven CH Hoi. Codet5: Identifier-aware unified pre-trained encoder-decoder models for code understanding and generation. arXiv preprint arXiv:2109.00859, 2021."},{"key":"e_1_3_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2024.acl-long.558"},{"key":"e_1_3_2_1_54_1","volume-title":"Recomp: Improving retrieval-augmented lms with compression and selective augmentation. arXiv preprint arXiv:2310.04408","author":"Xu Fangyuan","year":"2023","unstructured":"Fangyuan Xu, Weijia Shi, and Eunsol Choi. Recomp: Improving retrieval-augmented lms with compression and selective augmentation. arXiv preprint arXiv:2310.04408, 2023."},{"key":"e_1_3_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1145\/3520312.3534862"},{"key":"e_1_3_2_1_56_1","volume-title":"Corrective retrieval augmented generation. arXiv preprint arXiv:2401.15884","author":"Yan Shi-Qi","year":"2024","unstructured":"Shi-Qi Yan, Jia-Chen Gu, Yun Zhu, and Zhen-Hua Ling. Corrective retrieval augmented generation. arXiv preprint arXiv:2401.15884, 2024."},{"key":"e_1_3_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.emnlp-main.326"},{"key":"e_1_3_2_1_58_1","volume-title":"Rrhf: Rank responses to align language models with human feedback without tears. arXiv preprint arXiv:2304.05302","author":"Yuan Zheng","year":"2023","unstructured":"Zheng Yuan, Hongyi Yuan, Chuanqi Tan, Wei Wang, Songfang Huang, and Fei Huang. Rrhf: Rank responses to align language models with human feedback without tears. 
arXiv preprint arXiv:2304.05302, 2023."},{"key":"e_1_3_2_1_59_1","volume-title":"When language model meets private library. arXiv preprint arXiv:2210.17236","author":"Zan Daoguang","year":"2022","unstructured":"Daoguang Zan, Bei Chen, Zeqi Lin, Bei Guan, Yongji Wang, and Jian-Guang Lou. When language model meets private library. arXiv preprint arXiv:2210.17236, 2022."},{"key":"e_1_3_2_1_60_1","volume-title":"Repocoder: Repository-level code completion through iterative retrieval and generation. arXiv preprint arXiv:2303.12570","author":"Zhang Fengji","year":"2023","unstructured":"Fengji Zhang, Bei Chen, Yue Zhang, Jacky Keung, Jin Liu, Daoguang Zan, Yi Mao, Jian-Guang Lou, and Weizhu Chen. Repocoder: Repository-level code completion through iterative retrieval and generation. arXiv preprint arXiv:2303.12570, 2023."},{"key":"e_1_3_2_1_61_1","volume-title":"Codeagent: Enhancing code generation with tool-integrated agent systems for real-world repo-level coding challenges. arXiv preprint arXiv:2401.07339","author":"Zhang Kechi","year":"2024","unstructured":"Kechi Zhang, Jia Li, Ge Li, Xianjie Shi, and Zhi Jin. Codeagent: Enhancing code generation with tool-integrated agent systems for real-world repo-level coding challenges. arXiv preprint arXiv:2401.07339, 2024."},{"key":"e_1_3_2_1_62_1","volume-title":"Tool-coder: Teach code generation models to use api search tools. arXiv preprint arXiv:2305.04032","author":"Zhang Kechi","year":"2023","unstructured":"Kechi Zhang, Huangzhao Zhang, Ge Li, Jia Li, Zhuo Li, and Zhi Jin. Tool-coder: Teach code generation models to use api search tools. arXiv preprint arXiv:2305.04032, 2023."},{"key":"e_1_3_2_1_63_1","volume-title":"The Eleventh International Conference on Learning Representations","author":"Zhang Shun","year":"2023","unstructured":"Shun Zhang, Zhenfang Chen, Yikang Shen, Mingyu Ding, Joshua B. Tenenbaum, and Chuang Gan. Planning with large language models for code generation. In The Eleventh International Conference on Learning Representations, 2023."},{"key":"e_1_3_2_1_64_1","first-page":"41832","volume-title":"International Conference on Machine Learning","author":"Zhang Tianyi","year":"2023","unstructured":"Tianyi Zhang, Tao Yu, Tatsunori Hashimoto, Mike Lewis, Wen-tau Yih, Daniel Fried, and Sida Wang. Coder reviewer reranking for code generation. In International Conference on Machine Learning, pages 41832--41846. PMLR, 2023."},{"key":"e_1_3_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.findings-emnlp.90"},{"key":"e_1_3_2_1_66_1","volume-title":"Ltgc: Long-tail recognition via leveraging llms-driven generated content. arXiv preprint arXiv:2403.05854","author":"Zhao Qihao","year":"2024","unstructured":"Qihao Zhao, Yalun Dai, Hao Li, Wei Hu, Fan Zhang, and Jun Liu. Ltgc: Long-tail recognition via leveraging llms-driven generated content. arXiv preprint arXiv:2403.05854, 2024."},{"key":"e_1_3_2_1_67_1","volume-title":"et al. Codegeex: A pre-trained model for code generation with multilingual evaluations on humaneval-x. arXiv preprint arXiv:2303.17568","author":"Zheng Qinkai","year":"2023","unstructured":"Qinkai Zheng, Xiao Xia, Xu Zou, Yuxiao Dong, Shan Wang, Yufei Xue, Zihan Wang, Lei Shen, Andi Wang, Yang Li, et al. Codegeex: A pre-trained model for code generation with multilingual evaluations on humaneval-x. arXiv preprint arXiv:2303.17568, 2023."},{"key":"e_1_3_2_1_68_1","volume-title":"Docprompting: Generating code by retrieving the docs. 
arXiv preprint arXiv:2207.05987","author":"Zhou Shuyan","year":"2022","unstructured":"Shuyan Zhou, Uri Alon, Frank F Xu, Zhiruo Wang, Zhengbao Jiang, and Graham Neubig. Docprompting: Generating code by retrieving the docs. arXiv preprint arXiv:2207.05987, 2022."}],"event":{"name":"ASE '24: 39th IEEE\/ACM International Conference on Automated Software Engineering","location":"Sacramento CA USA","acronym":"ASE '24","sponsor":["SIGAI ACM Special Interest Group on Artificial Intelligence","SIGSOFT ACM Special Interest Group on Software Engineering","IEEE CS"]},"container-title":["Proceedings of the 39th IEEE\/ACM International Conference on Automated Software Engineering"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3691620.3694987","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3691620.3694987","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T00:04:06Z","timestamp":1750291446000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3691620.3694987"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,10,27]]},"references-count":68,"alternative-id":["10.1145\/3691620.3694987","10.1145\/3691620"],"URL":"https:\/\/doi.org\/10.1145\/3691620.3694987","relation":{},"subject":[],"published":{"date-parts":[[2024,10,27]]},"assertion":[{"value":"2024-10-27","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}
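
A work record in the shape shown above can be retrieved from the public Crossref REST API by DOI. The snippet below is a minimal sketch, assuming Python with the third-party requests package installed; the endpoint is the standard api.crossref.org works route, and the field accesses mirror the keys visible in this record.

# Minimal sketch: fetch the Crossref work record shown above by its DOI.
# Assumes the public Crossref REST API and the third-party `requests` package.
import requests

DOI = "10.1145/3691620.3694987"
resp = requests.get(f"https://api.crossref.org/works/{DOI}", timeout=30)
resp.raise_for_status()

# The payload mirrors the record above: {"status": "ok", ..., "message": {...}}.
work = resp.json()["message"]
print(work["title"][0])                 # paper title
print(work["container-title"][0])       # proceedings title
print(work["DOI"], work["page"])        # "10.1145/3691620.3694987 65-77"
print(len(work.get("reference", [])))   # 68 deposited references
for author in work["author"]:
    print(author["given"], author["family"])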