{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,7]],"date-time":"2026-05-07T15:49:10Z","timestamp":1778168950455,"version":"3.51.4"},"reference-count":96,"publisher":"Association for Computing Machinery (ACM)","issue":"FSE","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. ACM Softw. Eng."],"published-print":{"date-parts":[[2025,6,19]]},"abstract":"<jats:p>Code generation has largely improved development efficiency in the era of large language models (LLMs). With the ability to follow instructions, current LLMs can be prompted to generate code solutions given detailed descriptions in natural language. Many research efforts are being devoted to improving the correctness of LLM-generated code, and many benchmarks are proposed to evaluate the correctness comprehensively. Despite the focus on correctness, the time efficiency of LLM-generated code solutions is under-explored. Current correctness benchmarks are not suitable for time efficiency evaluation since their test cases cannot well distinguish the time efficiency of different code solutions. Besides, the current execution time measurement is not stable and comprehensive, threatening the validity of the time efficiency evaluation.\n \n \n \n \n \n \n \n\n \n \n \n \n \n \n \nTo address the challenges in the time efficiency evaluation of code generation, we propose COFFE, a code generation benchmark for evaluating the time efficiency of LLM-generated code solutions. COFFE contains 398 and 358 problems for function-level and file-level code generation, respectively. To improve the distinguishability, we design a novel stressful test case generation approach with contracts and two new formats of test cases to improve the accuracy of generation. For the time evaluation metric, we propose efficienct@k based on CPU instruction count to ensure a stable and solid comparison between different solutions. 
We evaluate 14 popular LLMs on COFFE and identify four findings. Based on the findings, we draw some implications for LLM researchers and software practitioners to facilitate future research and usage of LLMs in code generation.<\/jats:p>","DOI":"10.1145\/3715727","type":"journal-article","created":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T15:16:02Z","timestamp":1750346162000},"page":"242-265","source":"Crossref","is-referenced-by-count":9,"title":["COFFE: A Code Efficiency Benchmark for Code Generation"],"prefix":"10.1145","volume":"2","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1936-5598","authenticated-orcid":false,"given":"Yun","family":"Peng","sequence":"first","affiliation":[{"name":"The Chinese University of Hong Kong, Hong Kong, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-3294-688X","authenticated-orcid":false,"given":"Jun","family":"Wan","sequence":"additional","affiliation":[{"name":"Zhejiang University, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-8370-644X","authenticated-orcid":false,"given":"Yichen","family":"Li","sequence":"additional","affiliation":[{"name":"The Chinese University of Hong Kong, Hong Kong, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5526-1617","authenticated-orcid":false,"given":"Xiaoxue","family":"Ren","sequence":"additional","affiliation":[{"name":"Zhejiang University, Hangzhou, China"}]}],"member":"320","published-online":{"date-parts":[[2025,6,19]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2404.14219"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2301.03988"},{"key":"e_1_2_1_3_1","unstructured":"Anthropic. 2024. API reference provided by Anthropic. https:\/\/docs.anthropic.com\/en\/api\/getting-started"},{"key":"e_1_2_1_4_1","unstructured":"Anthropic. 2024. Claude 3.5 Sonnet. 
https:\/\/www.anthropic.com\/news\/claude-3-5-sonnet"},{"key":"e_1_2_1_5_1","volume-title":"Program Synthesis with Large Language Models. CoRR, abs\/2108.07732","author":"Austin Jacob","year":"2021","unstructured":"Jacob Austin, Augustus Odena, Maxwell I. Nye, Maarten Bosma, Henryk Michalewski, David Dohan, Ellen Jiang, Carrie J. Cai, Michael Terry, Quoc V. Le, and Charles Sutton. 2021. Program Synthesis with Large Language Models. CoRR, abs\/2108.07732 (2021), arXiv:2108.07732."},{"key":"e_1_2_1_6_1","unstructured":"Ned Batchelder. 2024. The Coverage.py library. https:\/\/github.com\/nedbat\/coveragepy"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2312.10622"},{"key":"e_1_2_1_8_1","volume-title":"Jared Kaplan, Harrison Edwards, Yuri Burda, Nicholas Joseph, and Greg Brockman.","author":"Chen Mark","year":"2021","unstructured":"Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Pond\u00e9 de Oliveira Pinto, Jared Kaplan, Harrison Edwards, Yuri Burda, Nicholas Joseph, and Greg Brockman. 2021. Evaluating Large Language Models Trained on Code. CoRR, abs\/2107.03374 (2021), arXiv:2107.03374."},{"key":"e_1_2_1_9_1","volume-title":"Jared Kaplan, Harrison Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, and Alex Ray.","author":"Chen Mark","year":"2021","unstructured":"Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Pond\u00e9 de Oliveira Pinto, Jared Kaplan, Harrison Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, and Alex Ray. 2021. Evaluating Large Language Models Trained on Code. CoRR, abs\/2107.03374 (2021), arXiv:2107.03374."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2304.05128"},{"key":"e_1_2_1_11_1","volume-title":"ChatUniTest: A Framework for LLM-Based Test Generation. 
In Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering, FSE 2024, Porto de Galinhas","author":"Chen Yinghao","year":"2024","unstructured":"Yinghao Chen, Zehao Hu, Chen Zhi, Junxiao Han, Shuiguang Deng, and Jianwei Yin. 2024. ChatUniTest: A Framework for LLM-Based Test Generation. In Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering, FSE 2024, Porto de Galinhas, Brazil, July 15-19, 2024. ACM, 572\u2013576. https:\/\/doi.org\/10.1145\/3663529.3663801"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2210.11416"},{"key":"e_1_2_1_13_1","unstructured":"The MITRE Corporation. 2024. Performance efficiency CWEs. https:\/\/cwe.mitre.org\/data\/definitions\/1132.html"},{"key":"e_1_2_1_14_1","unstructured":"Deepmind. 2024. Gemini 1.5 Pro. https:\/\/deepmind.google\/technologies\/gemini\/pro\/"},{"key":"e_1_2_1_15_1","unstructured":"DeepSeek. 2024. DeepSeek API. https:\/\/platform.deepseek.com\/"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2405.04434"},{"key":"e_1_2_1_17_1","volume-title":"Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis, ISSTA 2023","author":"Deng Yinlin","year":"2023","unstructured":"Yinlin Deng, Chunqiu Steven Xia, Haoran Peng, Chenyuan Yang, and Lingming Zhang. 2023. Large Language Models Are Zero-Shot Fuzzers: Fuzzing Deep-Learning Libraries via Large Language Models. In Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis, ISSTA 2023, Seattle, WA, USA, July 17-21, 2023. ACM, 423\u2013435. 
https:\/\/doi.org\/10.1145\/3597926.3598067"},{"key":"e_1_2_1_18_1","volume-title":"Hantian Ding, Ming Tan, Nihal Jain, Murali Krishna Ramanathan, Ramesh Nallapati, Parminder Bhatia, Dan Roth, and Bing Xiang.","author":"Ding Yangruibo","year":"2023","unstructured":"Yangruibo Ding, Zijian Wang, Wasi Uddin Ahmad, Hantian Ding, Ming Tan, Nihal Jain, Murali Krishna Ramanathan, Ramesh Nallapati, Parminder Bhatia, Dan Roth, and Bing Xiang. 2023. CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion. In Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10 - 16, 2023. http:\/\/papers.nips.cc\/paper_files\/paper\/2023\/hash\/920f2dced7d32ab2ba2f1970bc306af6-Abstract-Datasets_and_Benchmarks.html"},{"key":"e_1_2_1_19_1","unstructured":"Docker. 2024. Docker. https:\/\/www.docker.com\/"},{"key":"e_1_2_1_20_1","volume-title":"Proceedings of the ACM on Software Engineering, 1, FSE (2024)","author":"Endres Madeline","year":"2024","unstructured":"Madeline Endres, Sarah Fakhoury, Saikat Chakraborty, and Shuvendu K Lahiri. 2024. Can Large Language Models Transform Natural Language Intent into Formal Method Postconditions? Proceedings of the ACM on Software Engineering, 1, FSE (2024), 1889\u20131912."},{"key":"e_1_2_1_21_1","unstructured":"The Linux Foundation. 2024. The perf tool on linux. https:\/\/perf.wiki.kernel.org\/index.php\/Main_Page"},{"key":"e_1_2_1_22_1","volume-title":"InCoder: A Generative Model for Code Infilling and Synthesis. In The Eleventh International Conference on Learning Representations, ICLR 2023","author":"Fried Daniel","year":"2023","unstructured":"Daniel Fried, Armen Aghajanyan, Jessy Lin, Sida Wang, Eric Wallace, Freda Shi, Ruiqi Zhong, Scott Yih, Luke Zettlemoyer, and Mike Lewis. 2023. 
InCoder: A Generative Model for Code Infilling and Synthesis. In The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023. OpenReview.net. https:\/\/openreview.net\/pdf?id=hQwb-lbM6EL"},{"key":"e_1_2_1_23_1","unstructured":"GitHub, Inc. 2022. GitHub Octoverse report on programming languages. https:\/\/octoverse.github.com\/2022\/top-programming-languages"},{"key":"e_1_2_1_24_1","unstructured":"Google. 2023. Sanitized version of MBPP benchmark released by Google. https:\/\/huggingface.co\/datasets\/google-research-datasets\/mbpp\/viewer\/sanitized\/test"},{"key":"e_1_2_1_25_1","unstructured":"Google. 2024. API reference provided by Google. https:\/\/ai.google.dev\/gemini-api\/docs\/models\/gemini"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2401.14196"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2406.11927"},{"key":"e_1_2_1_28_1","volume-title":"Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, NeurIPS Datasets and Benchmarks 2021","author":"Hendrycks Dan","year":"2021","unstructured":"Dan Hendrycks, Steven Basart, Saurav Kadavath, Mantas Mazeika, Akul Arora, Ethan Guo, Collin Burns, Samir Puranik, Horace He, Dawn Song, and Jacob Steinhardt. 2021. Measuring Coding Challenge Competence With APPS. In Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, NeurIPS Datasets and Benchmarks 2021, December 2021, virtual. 
https:\/\/datasets-benchmarks-proceedings.neurips.cc\/paper\/2021\/hash\/c24cd76e1ce41366a4bbe8a49b02a028-Abstract-round2.html"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2310.02003"},{"key":"e_1_2_1_30_1","volume-title":"MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework. In The Twelfth International Conference on Learning Representations, ICLR 2024","author":"Hong Sirui","year":"2024","unstructured":"Sirui Hong, Mingchen Zhuge, Jonathan Chen, Xiawu Zheng, Yuheng Cheng, Jinlin Wang, Ceyao Zhang, Zili Wang, and Steven Ka Shing Yau. 2024. MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework. In The Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11, 2024. OpenReview.net. https:\/\/openreview.net\/forum?id=VtmBAGCN7o"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2405.03786"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2312.13010"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2402.02037"},{"key":"e_1_2_1_34_1","unstructured":"Deep Infra. 2024. Deep Infra API. https:\/\/deepinfra.com\/"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2401.04088"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2310.06770"},{"key":"e_1_2_1_37_1","volume-title":"Lei Xu, Weidong Shi, and Mohammad Amin Alipour.","author":"Karanjai Rabimba","year":"2024","unstructured":"Rabimba Karanjai, Aftab Hussain, Md Rafiqul Islam Rabin, Lei Xu, Weidong Shi, and Mohammad Amin Alipour. 2024. Harnessing the Power of LLMs: Automating Unit Test Generation for High-Performance Computing. arxiv:2407.05202. 
"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2303.03004"},{"key":"e_1_2_1_39_1","volume-title":"Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC\/FSE","author":"Laaber Christoph","year":"2020","unstructured":"Christoph Laaber, Stefan W\u00fcrsten, Harald C. Gall, and Philipp Leitner. 2020. Dynamically reconfiguring software microbenchmarks: reducing execution time without sacrificing result quality. In Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC\/FSE 2020). Association for Computing Machinery, New York, NY, USA. 989\u20131001. isbn:9781450370431 https:\/\/doi.org\/10.1145\/3368089.3409683"},{"key":"e_1_2_1_40_1","volume-title":"Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022","author":"Le Hung","year":"2022","unstructured":"Hung Le, Yue Wang, Akhilesh Deepak Gotmare, Silvio Savarese, and Steven Chu-Hong Hoi. 2022. CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning. In Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, November 28 - December 9, 2022. 
http:\/\/papers.nips.cc\/paper_files\/paper\/2022\/hash\/8636419dea1aa9fbd25fc4248e702da4-Abstract-Conference.html"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2404.13340"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2305.06161"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","unstructured":"Yujia Li, David Choi, Junyoung Chung, Nate Kushman, Julian Schrittwieser, R\u00e9mi Leblond, Tom Eccles, James Keeling, Felix Gimeno, Agustin Dal Lago, Thomas Hubert, Peter Choy, Cyprien de Masson d\u2019Autume, Igor Babuschkin, Xinyun Chen, Po-Sen Huang, Johannes Welbl, Sven Gowal, Alexey Cherepanov, James Molloy, Daniel J. Mankowitz, Esme Sutherland Robson, Pushmeet Kohli, Nando de Freitas, Koray Kavukcuoglu, and Oriol Vinyals. 2022. Competition-level code generation with AlphaCode. Science 378, 6624 (2022), 1092\u20131097. https:\/\/doi.org\/10.1126\/science.abq1158","DOI":"10.1126\/science.abq1158"},{"key":"e_1_2_1_44_1","unstructured":"Jiawei Liu, Thanh Nguyen, Mingyue Shang, Hantian Ding, Xiaopeng Li, Yu Yu, Varun Kumar, and Zijian Wang. 2024. Learning Code Preference via Synthetic Evolution. arxiv:2410.03837."},{"key":"e_1_2_1_45_1","volume-title":"Rigorous Evaluation of Large Language Models for Code Generation. In Thirty-seventh Conference on Neural Information Processing Systems. https:\/\/openreview.net\/forum?id=1qvx610Cu7","author":"Liu Jiawei","year":"2023","unstructured":"Jiawei Liu, Chunqiu Steven Xia, Yuyao Wang, and Lingming Zhang. 2023. Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation. In Thirty-seventh Conference on Neural Information Processing Systems. 
https:\/\/openreview.net\/forum?id=1qvx610Cu7"},{"key":"e_1_2_1_46_1","volume-title":"Yuyao Wang, and Lingming Zhang.","author":"Liu Jiawei","year":"2023","unstructured":"Jiawei Liu, Chunqiu Steven Xia, Yuyao Wang, and Lingming Zhang. 2023. The MBPP Plus benchmark. https:\/\/github.com\/evalplus\/evalplus\/releases\/tag\/v0.2.1"},{"key":"e_1_2_1_47_1","volume-title":"Yuyao Wang, and Lingming Zhang.","author":"Liu Jiawei","year":"2024","unstructured":"Jiawei Liu, Chunqiu Steven Xia, Yuyao Wang, and Lingming Zhang. 2024. EvalPlus Leaderboard. https:\/\/evalplus.github.io\/leaderboard.html"},{"key":"e_1_2_1_48_1","unstructured":"Jiawei Liu, Songrun Xie, Junhao Wang, Yuxiang Wei, Yifeng Ding, and Lingming Zhang. 2024. Evaluating Language Models for Efficient Code Generation. arxiv:2408.06450."},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2404.10304"},{"key":"e_1_2_1_50_1","volume-title":"9th International Conference on Learning Representations, ICLR 2021","author":"Liu Shangqing","year":"2021","unstructured":"Shangqing Liu, Yu Chen, Xiaofei Xie, Jing Kai Siow, and Yang Liu. 2021. Retrieval-Augmented Generation for Code Summarization via Hybrid GNN. In 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net. https:\/\/openreview.net\/forum?id=zv-typ1gPxA"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2306.03091"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2402.19173"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2306.08568"},{"key":"e_1_2_1_54_1","unstructured":"Meta. 2024. Llama3. https:\/\/ai.meta.com\/blog\/meta-llama-3\/"},{"key":"e_1_2_1_55_1","unstructured":"Meta. 2024. Llama3.1. 
https:\/\/ai.meta.com\/blog\/meta-llama-3-1\/"},{"key":"e_1_2_1_56_1","volume-title":"OctoPack: Instruction Tuning Code Large Language Models. In The Twelfth International Conference on Learning Representations, ICLR 2024","author":"Muennighoff Niklas","year":"2024","unstructured":"Niklas Muennighoff, Qian Liu, Armel Randy Zebaze, Qinkai Zheng, Binyuan Hui, Terry Yue Zhuo, Swayam Singh, Xiangru Tang, Leandro von Werra, and Shayne Longpre. 2024. OctoPack: Instruction Tuning Code Large Language Models. In The Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11, 2024. OpenReview.net. https:\/\/openreview.net\/forum?id=mw1PWNSWZP"},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2406.12952"},{"key":"e_1_2_1_58_1","volume-title":"International Conference on Machine Learning, ICML 2023","volume":"26128","author":"Ni Ansong","year":"2023","unstructured":"Ansong Ni, Srini Iyer, Dragomir Radev, Veselin Stoyanov, Wen-Tau Yih, Sida I. Wang, and Xi Victoria Lin. 2023. LEVER: Learning to Verify Language-to-Code Generation with Execution. In International Conference on Machine Learning, ICML 2023, 23-29 July 2023, Honolulu, Hawaii, USA (Proceedings of Machine Learning Research, Vol. 202). PMLR, 26106\u201326128. https:\/\/proceedings.mlr.press\/v202\/ni23b.html"},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2305.02309"},{"key":"e_1_2_1_60_1","volume-title":"CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis. In The Eleventh International Conference on Learning Representations, ICLR 2023","author":"Nijkamp Erik","year":"2023","unstructured":"Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, and Caiming Xiong. 2023. CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis. 
In The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023. OpenReview.net. https:\/\/openreview.net\/pdf?id=iaYcJKpY2B_"},{"key":"e_1_2_1_61_1","unstructured":"OpenAI. 2022. ChatGPT. https:\/\/openai.com\/blog\/chatgpt"},{"key":"e_1_2_1_62_1","unstructured":"OpenAI. 2023. GPT-4 Technical Report. CoRR abs\/2303.08774 (2023) https:\/\/doi.org\/10.48550\/ARXIV.2303.08774 arXiv:2303.08774."},{"key":"e_1_2_1_63_1","unstructured":"OpenAI. 2024. GPT-4o. https:\/\/openai.com\/index\/hello-gpt-4o\/"},{"key":"e_1_2_1_64_1","unstructured":"OpenAI. 2024. OpenAI API. https:\/\/openai.com\/api\/"},{"key":"e_1_2_1_65_1","volume-title":"Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022","author":"Ouyang Long","year":"2022","unstructured":"Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, and Chong Zhang. 2022. Training language models to follow instructions with human feedback. In Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, November 28 - December 9, 2022. http:\/\/papers.nips.cc\/paper_files\/paper\/2022\/hash\/b1efde53be364a73914f58805a001731-Abstract-Conference.html"},{"key":"e_1_2_1_66_1","volume-title":"Bissyand\u00e9","author":"Ou\u00e9draogo Wendk\u00fbuni C.","year":"2024","unstructured":"Wendk\u00fbuni C. Ou\u00e9draogo, Kader Kabor\u00e9, Haoye Tian, Yewei Song, Anil Koyuncu, Jacques Klein, David Lo, and Tegawend\u00e9 F. Bissyand\u00e9. 2024. Large-scale, Independent and Comprehensive study of the power of LLMs for test case generation. arxiv:2407.00225."},{"key":"e_1_2_1_67_1","volume-title":"Retrieval Augmented Code Generation and Summarization. 
In Findings of the Association for Computational Linguistics: EMNLP 2021","author":"Rizwan Parvez Md.","year":"2021","unstructured":"Md. Rizwan Parvez, Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, and Kai-Wei Chang. 2021. Retrieval Augmented Code Generation and Summarization. In Findings of the Association for Computational Linguistics: EMNLP 2021, Virtual Event \/ Punta Cana, Dominican Republic, 16-20 November, 2021. Association for Computational Linguistics, 2719\u20132734. https:\/\/doi.org\/10.18653\/V1\/2021.FINDINGS-EMNLP.232"},{"key":"e_1_2_1_68_1","volume-title":"Hennessy","author":"Patterson David A.","year":"2012","unstructured":"David A. Patterson and John L. Hennessy. 2012. Computer Organization and Design - The Hardware \/ Software Interface (Revised 4th Edition). Academic Press. isbn:978-0-12-374750-1 http:\/\/www.elsevierdirect.com\/product.jsp?isbn=9780123747501"},{"key":"e_1_2_1_69_1","volume-title":"Michael Lyu, Caiming Xiong, Silvio Savarese, and Doyen Sahoo.","author":"Peng Yun","year":"2024","unstructured":"Yun Peng, Akhilesh Deepak Gotmare, Michael Lyu, Caiming Xiong, Silvio Savarese, and Doyen Sahoo. 2024. PerfCodeGen: Improving Performance of LLM Generated Code with Execution Feedback. arxiv:2412.03578."},{"key":"e_1_2_1_70_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2401.08500"},{"key":"e_1_2_1_71_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2308.12950"},{"key":"e_1_2_1_72_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2406.07021"},{"key":"e_1_2_1_73_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2023.3334955"},{"key":"e_1_2_1_74_1","volume-title":"Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023","author":"Shinn Noah","year":"2023","unstructured":"Noah Shinn, Federico Cassano, Ashwin Gopinath, Karthik Narasimhan, and Shunyu Yao. 2023. 
Reflexion: language agents with verbal reinforcement learning. In Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10 - 16, 2023. http:\/\/papers.nips.cc\/paper_files\/paper\/2023\/hash\/1b44b878bb782e6954cd888628510e90-Abstract-Conference.html"},{"key":"e_1_2_1_75_1","volume-title":"Learning Performance-Improving Code Edits. In The Twelfth International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=ix7rLVHXyY","author":"Shypula Alexander G","year":"2024","unstructured":"Alexander G Shypula, Aman Madaan, Yimeng Zeng, Uri Alon, Jacob R. Gardner, Yiming Yang, Milad Hashemi, Graham Neubig, Parthasarathy Ranganathan, Osbert Bastani, and Amir Yazdanbakhsh. 2024. Learning Performance-Improving Code Edits. In The Twelfth International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=ix7rLVHXyY"},{"key":"e_1_2_1_76_1","unstructured":"Matt Stuchlik, Bruno P. Kinoshita, and Donald Lee. 2024. The Cirron library. https:\/\/github.com\/s7nfo\/Cirron"},{"key":"e_1_2_1_77_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2402.12317"},{"key":"e_1_2_1_78_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10664-022-10247-x"},{"key":"e_1_2_1_79_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2406.04531"},{"key":"e_1_2_1_80_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2402.01030"},{"key":"e_1_2_1_81_1","volume-title":"Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023","author":"Wang Yue","year":"2023","unstructured":"Yue Wang, Hung Le, Akhilesh Gotmare, Nghi D. Q. Bui, Junnan Li, and Steven C. H. Hoi. 2023. CodeT5+: Open Code Large Language Models for Code Understanding and Generation. 
In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023, Singapore, December 6-10, 2023. Association for Computational Linguistics, 1069\u20131088. https:\/\/doi.org\/10.18653\/V1\/2023.EMNLP-MAIN.68"},{"key":"e_1_2_1_82_1","volume-title":"Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021, Virtual Event \/ Punta Cana","author":"Wang Yue","year":"2021","unstructured":"Yue Wang, Weishi Wang, Shafiq R. Joty, and Steven C. H. Hoi. 2021. CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021, Virtual Event \/ Punta Cana, Dominican Republic, 7-11 November, 2021. Association for Computational Linguistics, 8696\u20138708. https:\/\/doi.org\/10.18653\/V1\/2021.EMNLP-MAIN.685"},{"key":"e_1_2_1_83_1","first-page":"2022","volume-title":"Trans. Mach. Learn. Res.","author":"Wei Jason","year":"2022","unstructured":"Jason Wei, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, Sebastian Borgeaud, Dani Yogatama, Maarten Bosma, Denny Zhou, Donald Metzler, Ed H. Chi, Tatsunori Hashimoto, Oriol Vinyals, Percy Liang, Jeff Dean, and William Fedus. 2022. Emergent Abilities of Large Language Models. Trans. Mach. Learn. Res., 2022 (2022), https:\/\/openreview.net\/forum?id=yzkSU5zdwD"},{"key":"e_1_2_1_84_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2312.02120"},{"key":"e_1_2_1_85_1","unstructured":"Papers with Code. 2024. The Leaderboard of APPS benchmark on Papers with Code. https:\/\/paperswithcode.com\/sota\/code-generation-on-apps"},{"key":"e_1_2_1_86_1","unstructured":"Papers with Code. 2024. The Leaderboard of the Code Contests benchmark. 
https:\/\/paperswithcode.com\/sota\/code-generation-on-codecontests"},{"key":"e_1_2_1_87_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2405.15793"},{"key":"e_1_2_1_88_1","volume-title":"Jin Liu, and Xin Xia.","author":"Yu Xiao","year":"2024","unstructured":"Xiao Yu, Lei Liu, Xing Hu, Jacky Wai Keung, Jin Liu, and Xin Xia. 2024. Where Are Large Language Models for Code Generation on GitHub? arxiv:2406.19544."},{"key":"e_1_2_1_89_1","volume-title":"2009 IEEE International Symposium on Performance Analysis of Systems and Software. 23\u201332","author":"Zaparanuks Dmitrijs","year":"2009","unstructured":"Dmitrijs Zaparanuks, Milan Jovic, and Matthias Hauswirth. 2009. Accuracy of performance counter measurements. In 2009 IEEE International Symposium on Performance Analysis of Systems and Software. 23\u201332. https:\/\/doi.org\/10.1109\/ISPASS.2009.4919635"},{"key":"e_1_2_1_90_1","volume-title":"Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023","author":"Zhang Fengji","year":"2023","unstructured":"Fengji Zhang, Bei Chen, Yue Zhang, Jacky Keung, Jin Liu, Daoguang Zan, Yi Mao, Jian-Guang Lou, and Weizhu Chen. 2023. RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023, Singapore, December 6-10, 2023. Association for Computational Linguistics, 2471\u20132484. https:\/\/doi.org\/10.18653\/V1\/2023.EMNLP-MAIN.151"},{"key":"e_1_2_1_91_1","volume-title":"Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023","author":"Zhang Fengji","year":"2023","unstructured":"Fengji Zhang, Bei Chen, Yue Zhang, Jacky Keung, Jin Liu, Daoguang Zan, Yi Mao, Jian-Guang Lou, and Weizhu Chen. 2023. 
RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023, Singapore, December 6-10, 2023. Association for Computational Linguistics, 2471\u20132484. https:\/\/doi.org\/10.18653\/V1\/2023.EMNLP-MAIN.151"},{"key":"e_1_2_1_92_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2404.05427"},{"key":"e_1_2_1_93_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2303.17568"},{"key":"e_1_2_1_94_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2310.04406"},{"key":"e_1_2_1_95_1","volume-title":"The Eleventh International Conference on Learning Representations, ICLR 2023","author":"Zhou Shuyan","year":"2023","unstructured":"Shuyan Zhou, Uri Alon, Frank F. Xu, Zhengbao Jiang, and Graham Neubig. 2023. DocPrompting: Generating Code by Retrieving the Docs. In The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023. OpenReview.net. https:\/\/openreview.net\/forum?id=ZTCxT2t2Ru"},{"key":"e_1_2_1_96_1","unstructured":"Qihao Zhu, Daya Guo, Zhihong Shao, Dejian Yang, Peiyi Wang, Runxin Xu, Y. Wu, Yukun Li, Huazuo Gao, and Shirong Ma. 2024. DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence. 
arXiv preprint arXiv:2406.11931."}],"container-title":["Proceedings of the ACM on Software Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3715727","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T15:21:14Z","timestamp":1750346474000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3715727"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,19]]},"references-count":96,"journal-issue":{"issue":"FSE","published-print":{"date-parts":[[2025,6,19]]}},"alternative-id":["10.1145\/3715727"],"URL":"https:\/\/doi.org\/10.1145\/3715727","relation":{},"ISSN":["2994-970X"],"issn-type":[{"value":"2994-970X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,6,19]]}}}