{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,30]],"date-time":"2026-04-30T15:55:03Z","timestamp":1777564503724,"version":"3.51.4"},"reference-count":52,"publisher":"Association for Computing Machinery (ACM)","issue":"FSE","license":[{"start":{"date-parts":[[2024,7,12]],"date-time":"2024-07-12T00:00:00Z","timestamp":1720742400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Youth Innovation Promotion Association Chinese Academy of Sciences, Basic Research Program of ISCAS","award":["No. ISCAS-JCZD-202304"],"award-info":[{"award-number":["No. ISCAS-JCZD-202304"]}]},{"name":"Major Program of ISCAS","award":["No. ISCAS-ZD-202302"],"award-info":[{"award-number":["No. ISCAS-ZD-202302"]}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["No. 62332001, No. 62232016, No. 62072442, and No. 62272445"],"award-info":[{"award-number":["No. 62332001, No. 62232016, No. 62072442, and No. 62272445"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. ACM Softw. Eng."],"published-print":{"date-parts":[[2024,7,12]]},"abstract":"<jats:p>\n                    Large Language Models (LLMs), such as ChatGPT, have demonstrated impressive capabilities in automatically generating code from provided natural language requirements. However, in real-world practice, it is inevitable that the requirements written by users might be ambiguous or insufficient. Current LLMs will directly generate programs according to those unclear requirements, regardless of interactive clarification, which will likely deviate from the original user intents. 
To bridge that gap, we introduce a novel framework named C\n                    <jats:sc>larify<\/jats:sc>\n                    GPT, which aims to enhance code generation by empowering LLMs with the ability to identify ambiguous requirements and ask targeted clarifying questions. Specifically, C\n                    <jats:sc>larify<\/jats:sc>\n                    GPT first detects whether a given requirement is ambiguous by performing a code consistency check. If it is ambiguous, C\n                    <jats:sc>larify<\/jats:sc>\n                    GPT prompts an LLM to generate targeted clarifying questions. After receiving question responses, C\n                    <jats:sc>larify<\/jats:sc>\n                    GPT refines the ambiguous requirement and inputs it into the same LLM to generate a final code solution. To evaluate our C\n                    <jats:sc>larify<\/jats:sc>\n                    GPT, we invite ten participants to use C\n                    <jats:sc>larify<\/jats:sc>\n                    GPT for code generation on two benchmarks: MBPP-sanitized and MBPP-ET. The results show that C\n                    <jats:sc>larify<\/jats:sc>\n                    GPT elevates the performance (Pass@1) of GPT-4 from 70.96% to 80.80% on MBPP-sanitized. Furthermore, to conduct large-scale automated evaluations of C\n                    <jats:sc>larify<\/jats:sc>\n                    GPT across different LLMs and benchmarks without requiring user participation, we introduce a high-fidelity simulation method to simulate user responses. The results demonstrate that C\n                    <jats:sc>larify<\/jats:sc>\n                    GPT can significantly enhance code generation performance compared to the baselines. In particular, C\n                    <jats:sc>larify<\/jats:sc>\n                    GPT improves the average performance of GPT-4 and ChatGPT across five benchmarks from 62.43% to 69.60% and from 54.32% to 62.37%, respectively. 
A human evaluation also confirms the effectiveness of C\n                    <jats:sc>larify<\/jats:sc>\n                    GPT in detecting ambiguous requirements and generating high-quality clarifying questions. We believe that C\n                    <jats:sc>larify<\/jats:sc>\n                    GPT can effectively facilitate the practical application of LLMs in real-world development environments.\n                  <\/jats:p>","DOI":"10.1145\/3660810","type":"journal-article","created":{"date-parts":[[2024,7,12]],"date-time":"2024-07-12T10:22:09Z","timestamp":1720779729000},"page":"2332-2354","source":"Crossref","is-referenced-by-count":57,"title":["ClarifyGPT: A Framework for Enhancing LLM-Based Code Generation via Requirements Clarification"],"prefix":"10.1145","volume":"1","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8577-7932","authenticated-orcid":false,"given":"Fangwen","family":"Mu","sequence":"first","affiliation":[{"name":"Institute of Software at Chinese Academy of Sciences, Beijing, China"},{"name":"University of Chinese Academy of Sciences, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1476-7213","authenticated-orcid":false,"given":"Lin","family":"Shi","sequence":"additional","affiliation":[{"name":"Beihang University, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0617-2877","authenticated-orcid":false,"given":"Song","family":"Wang","sequence":"additional","affiliation":[{"name":"York University, Toronto, Canada"}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-0037-0396","authenticated-orcid":false,"given":"Zhuohao","family":"Yu","sequence":"additional","affiliation":[{"name":"Institute of Software at Chinese Academy of Sciences, Beijing, China"},{"name":"University of Chinese Academy of Sciences, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6468-9842","authenticated-orcid":false,"given":"Binquan","family":"Zhang","sequence":"additional","affiliation":[{"name":"Beihang University, Beijing, 
China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-7552-4649","authenticated-orcid":false,"given":"ChenXue","family":"Wang","sequence":"additional","affiliation":[{"name":"Institute of Software at Chinese Academy of Sciences, Beijing, China"},{"name":"Harbin Institute of Technology, Harbin, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-7186-3096","authenticated-orcid":false,"given":"Shichao","family":"Liu","sequence":"additional","affiliation":[{"name":"Software Huawei Central Software Institute, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2618-5694","authenticated-orcid":false,"given":"Qing","family":"Wang","sequence":"additional","affiliation":[{"name":"Institute of Software at Chinese Academy of Sciences, Beijing, China"},{"name":"University of Chinese Academy of Sciences, Beijing, China"}]}],"member":"320","published-online":{"date-parts":[[2024,7,12]]},"reference":[{"key":"e_1_3_1_2_2","unstructured":"2023. CoderEval. https:\/\/github.com\/CoderEval\/CoderEval."},{"key":"e_1_3_1_3_2","unstructured":"2023. Website. https:\/\/github.com\/ClarifyGPT\/ClarifyGPT."},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","unstructured":"Wasi Uddin Ahmad Saikat Chakraborty Baishakhi Ray and Kai-Wei Chang. 2021. Unified Pre-training for Program Understanding and Generation. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies NAACL-HLT 2021 Online June 6-11 2021. 2655-2668. https:\/\/doi.org\/10.18653\/v1\/2021.naacl-main.211 10.18653\/v1\/2021.naacl-main.211","DOI":"10.18653\/v1\/2021.naacl-main.211"},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.1145\/3331184.3331265"},{"key":"e_1_3_1_6_2","unstructured":"Jacob Austin Augustus Odena Maxwell I. Nye Maarten Bosma Henryk Michalewski David Dohan Ellen Jiang Carrie J. Cai Michael Terry Quoc V. Le and Charles Sutton. 2021. Program Synthesis with Large Language Models. CoRR abs\/2108.07732 (2021). 
arXiv:2108.07732 https:\/\/arxiv.org\/abs\/2108.07732"},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","unstructured":"Sid Black Stella Biderman Eric Hallahan Quentin Anthony Leo Gao Laurence Golding Horace He Connor Leahy Kyle McDonell Jason Phang Michael Pieler USVSN Sai Prashanth Shivanshu Purohit Laria Reynolds Jonathan Tow Ben Wang and Samuel Weinbach. 2022. GPT-NeoX-20B: An Open-Source Autoregressive Language Model. CoRR abs\/2204.06745 (2022). https:\/\/doi.org\/10.48550\/arXiv.2204.06745 10.48550\/arXiv.2204.06745 arXiv:2204.06745","DOI":"10.48550\/arXiv.2204.06745"},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","unstructured":"S\u00e9bastien Bubeck Varun Chandrasekaran Ronen Eldan Johannes Gehrke Eric Horvitz Ece Kamar Peter Lee Yin Tat Lee Yuanzhi Li Scott M. Lundberg Harsha Nori Hamid Palangi Marco Tulio Ribeiro and Yi Zhang. 2023. Sparks of Artificial General Intelligence: Early experiments with GPT-4. CoRR abs\/2303.12712 (2023). https:\/\/doi.org\/10.48550\/ARXIV.2303.12712 10.48550\/ARXIV.2303.12712 arXiv:2303.12712","DOI":"10.48550\/ARXIV.2303.12712"},{"key":"e_1_3_1_9_2","doi-asserted-by":"publisher","unstructured":"Bei Chen Fengji Zhang Anh Nguyen Daoguang Zan Zeqi Lin Jian-Guang Lou and Weizhu Chen. 2022. CodeT: Code Generation with Generated Tests. CoRR abs\/2207.10397 (2022). 
https:\/\/doi.org\/10.48550\/arXiv.2207.10397 10.48550\/arXiv.2207.10397 arXiv:2207.10397","DOI":"10.48550\/arXiv.2207.10397"},{"key":"e_1_3_1_10_2","unstructured":"Mark Chen Jerry Tworek Heewoo Jun Qiming Yuan Henrique Pond\u00e9 de Oliveira Pinto Jared Kaplan Harrison Edwards Yuri Burda Nicholas Joseph Greg Brockman Alex Ray Raul Puri Gretchen Krueger Michael Petrov Heidy Khlaaf Girish Sastry Pamela Mishkin Brooke Chan Scott Gray Nick Ryder Mikhail Pavlov Alethea Power Lukasz Kaiser Mohammad Bavarian Clemens Winter Philippe Tillet Felipe Petroski Such Dave Cummings Matthias Plappert Fotios Chantzis Elizabeth Barnes Ariel Herbert-Voss William Hebgen Guss Alex Nichol Alex Paino Nikolas Tezak Jie Tang Igor Babuschkin Suchir Balaji Shantanu Jain William Saunders Christopher Hesse Andrew N. Carr Jan Leike Joshua Achiam Vedant Misra Evan Morikawa Alec Radford Matthew Knight Miles Brundage Mira Murati Katie Mayer Peter Welinder Bob McGrew Dario Amodei Sam McCandlish Ilya Sutskever and Wojciech Zaremba. 2021. Evaluating Large Language Models Trained on Code. CoRR abs\/2107.03374 (2021). arXiv:2107.03374 https:\/\/arxiv.org\/abs\/2107.03374"},{"key":"e_1_3_1_11_2","unstructured":"Kaustubh D. Dhole. 2020. Resolving Intent Ambiguities by Retrieving Discriminative Clarifying Questions. CoRR abs\/2008.07559 (2020). arXiv:2008.07559 https:\/\/arxiv.org\/abs\/2008.07559"},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","unstructured":"Yihong Dong Jiazheng Ding Xue Jiang Zhuo Li Ge Li and Zhi Jin. 2023. CodeScore: Evaluating Code Generation by Learning Code Execution. CoRR abs\/2301.09043 (2023). https:\/\/doi.org\/10.48550\/arXiv.2301.09043 10.48550\/arXiv.2301.09043 arXiv:2301.09043","DOI":"10.48550\/arXiv.2301.09043"},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","unstructured":"Yihong Dong Xue Jiang Zhi Jin and Ge Li. 2023. Self-collaboration Code Generation via ChatGPT. CoRR abs\/2304.07590 (2023). 
https:\/\/doi.org\/10.48550\/arXiv.2304.07590 10.48550\/arXiv.2304.07590 arXiv:2304.07590","DOI":"10.48550\/arXiv.2304.07590"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/SANER53432.2022.00028"},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","unstructured":"Daniel Fried Armen Aghajanyan Jessy Lin Sida Wang Eric Wallace Freda Shi Ruiqi Zhong Wen-tau Yih Luke Zettlemoyer and Mike Lewis. 2022. InCoder: A Generative Model for Code Infilling and Synthesis. CoRR abs\/2204.05999 (2022). https:\/\/doi.org\/10.48550\/arXiv.2204.05999 10.48550\/arXiv.2204.05999 arXiv:2204.05999","DOI":"10.48550\/arXiv.2204.05999"},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","unstructured":"Shuzheng Gao Xin-Cheng Wen Cuiyun Gao Wenxuan Wang and Michael R. Lyu. 2023. Constructing Effective In-Context Demonstration for Code Intelligence Tasks: An Empirical Study. CoRR abs\/2304.07575 (2023). https:\/\/doi.org\/10.48550\/ARXIV.2304.07575 10.48550\/ARXIV.2304.07575 arXiv:2304.07575","DOI":"10.48550\/ARXIV.2304.07575"},{"issue":"5","key":"e_1_3_1_17_2","doi-asserted-by":"crossref","first-page":"313","DOI":"10.1002\/(SICI)1097-4571(199007)41:5<313::AID-ASI1>3.0.CO;2-G","article-title":"Evaluating the effectiveness of information retrieval systems using simulated queries","volume":"41","author":"Gordon Michael D","year":"1990","unstructured":"Michael D Gordon. 1990. Evaluating the effectiveness of information retrieval systems using simulated queries. Journal of the American Society for Information Science 41, 5 (1990), 313-323.","journal-title":"Journal of the American Society for Information Science"},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","unstructured":"Xue Jiang Yihong Dong Lecheng Wang Qiwei Shang and Ge Li. 2023. Self-planning Code Generation with Large Language Model. CoRR abs\/2303.06689 (2023). 
https:\/\/doi.org\/10.48550\/arXiv.2303.06689 10.48550\/arXiv.2303.06689 arXiv:2303.06689","DOI":"10.48550\/arXiv.2303.06689"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1145\/3534965"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","unstructured":"Takeshi Kojima Shixiang Shane Gu Machel Reid Yutaka Matsuo and Yusuke Iwasawa. 2022. Large Language Models are Zero-Shot Reasoners. CoRR abs\/2205.11916 (2022). https:\/\/doi.org\/10.48550\/arXiv.2205.11916 10.48550\/arXiv.2205.11916 arXiv:2205.11916","DOI":"10.48550\/arXiv.2205.11916"},{"key":"e_1_3_1_21_2","unstructured":"Dmitrii Krasheninnikov Egor Krasheninnikov and David Krueger. 2022. Assistance with large language models. In NeurIPS ML Safety Workshop."},{"key":"e_1_3_1_22_2","unstructured":"Lorenz Kuhn Yarin Gal and Sebastian Farquhar. 2023. CLAM: Selective Clarification for Ambiguous Questions with Generative Language Models. (2023)."},{"key":"e_1_3_1_23_2","unstructured":"Sumith Kulal Panupong Pasupat Kartik Chandra Mina Lee Oded Padon Alex Aiken and Percy Liang. 2019. SPoC: Search-based Pseudocode to Code. In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019 NeurIPS 2019 December 8-14 2019 Vancouver BC Canada. 11883-11894. https:\/\/proceedings.neurips.cc\/paper\/2019\/hash\/7298332f04ac004a0ca44cc69ecf6f6b-Abstract.html"},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","unstructured":"Shuvendu K. Lahiri Aaditya Naik Georgios Sakkas Piali Choudhury Curtis von Veh Madanlal Musuvathi Jeevana Priya Inala Chenglong Wang and Jianfeng Gao. 2022. Interactive Code Generation via Test-Driven User-Intent Formalization. CoRR abs\/2208.05950 (2022). 
https:\/\/doi.org\/10.48550\/arXiv.2208.05950 10.48550\/arXiv.2208.05950 arXiv:2208.05950","DOI":"10.48550\/arXiv.2208.05950"},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE48619.2023.00085"},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.acl-long.799"},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","unstructured":"Jia Li Ge Li Yongmin Li and Zhi Jin. 2023. Enabling Programming Thinking in Large Language Models Toward Code Generation. CoRR abs\/2305.06599 (2023). https:\/\/doi.org\/10.48550\/ARXIV.2305.06599 10.48550\/ARXIV.2305.06599 arXiv:2305.06599","DOI":"10.48550\/ARXIV.2305.06599"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","unstructured":"Yujia Li David H. Choi Junyoung Chung Nate Kushman Julian Schrittwieser R\u00e9mi Leblond Tom Eccles James Keeling Felix Gimeno Agustin Dal Lago Thomas Hubert Peter Choy Cyprien de Masson d\u2019Autume Igor Babuschkin Xinyun Chen Po-Sen Huang Johannes Welbl Sven Gowal Alexey Cherepanov James Molloy Daniel J. Mankowitz Esme Sutherland Robson Pushmeet Kohli Nando de Freitas Koray Kavukcuoglu and Oriol Vinyals. 2022. Competition-Level Code Generation with AlphaCode. CoRR abs\/2203.07814 (2022). https:\/\/doi.org\/10.48550\/arXiv.2203.07814 10.48550\/arXiv.2203.07814 arXiv:2203.07814","DOI":"10.48550\/arXiv.2203.07814"},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","unstructured":"Chao Liu Xuanlin Bao Hongyu Zhang Neng Zhang Haibo Hu Xiaohong Zhang and Meng Yan. 2023. Improving ChatGPT Prompt for Code Generation. CoRR abs\/2305.08360 (2023). https:\/\/doi.org\/10.48550\/ARXIV.2305.08360 10.48550\/ARXIV.2305.08360 arXiv:2305.08360","DOI":"10.48550\/ARXIV.2305.08360"},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","unstructured":"Jiawei Liu Chunqiu Steven Xia Yuyao Wang and Lingming Zhang. 2023. Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation. CoRR abs\/2305.01210 (2023). 
https:\/\/doi.org\/10.48550\/arXiv.2305.01210 10.48550\/arXiv.2305.01210 arXiv:2305.01210","DOI":"10.48550\/arXiv.2305.01210"},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.18653\/V1\/2020.EMNLP-MAIN.466"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE48619.2023.00205"},{"key":"e_1_3_1_33_2","doi-asserted-by":"publisher","unstructured":"Feng Nie Meixi Chen Zhirui Zhang and Xu Cheng. 2022. Improving Few-Shot Performance of Language Models via Nearest Neighbor Calibration. CoRR abs\/2212.02216 (2022). https:\/\/doi.org\/10.48550\/ARXIV.2212.02216 10.48550\/ARXIV.2212.02216 arXiv:2212.02216","DOI":"10.48550\/ARXIV.2212.02216"},{"key":"e_1_3_1_34_2","unstructured":"Erik Nijkamp Bo Pang Hiroaki Hayashi Lifu Tu Huan Wang Yingbo Zhou Silvio Savarese and Caiming Xiong. 2023. CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis. In The Eleventh International Conference on Learning Representations ICLR 2023 Kigali Rwanda May 1-5 2023. OpenReview.net. https:\/\/openreview.net\/pdf?id=iaYcJKpY2B_"},{"key":"e_1_3_1_35_2","unstructured":"OpenAI. 2022. ChatGPT. https:\/\/openai.com\/blog\/chatgpt\/."},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","unstructured":"OpenAI. 2023. GPT-4 Technical Report. CoRR abs\/2303.08774 (2023). https:\/\/doi.org\/10.48550\/arXiv.2303.08774 10.48550\/arXiv.2303.08774 arXiv:2303.08774","DOI":"10.48550\/arXiv.2303.08774"},{"key":"e_1_3_1_37_2","unstructured":"Anton Osika. 2023. GPT-Engineer. https:\/\/github.com\/AntonOsika\/gpt-engineer\/."},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/n19-1013"},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","unstructured":"Max Sch\u00e4fer Sarah Nadi Aryaz Eghbali and Frank Tip. 2023. Adaptive Test Generation Using a Large Language Model. CoRR abs\/2302.06527 (2023). 
https:\/\/doi.org\/10.48550\/ARXIV.2302.06527 10.48550\/ARXIV.2302.06527 arXiv:2302.06527","DOI":"10.48550\/ARXIV.2302.06527"},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1145\/3488560.3498440"},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","unstructured":"Disha Shrivastava Hugo Larochelle and Daniel Tarlow. 2022. Repository-Level Prompt Generation for Large Language Models of Code. CoRR abs\/2206.12839 (2022). https:\/\/doi.org\/10.48550\/arXiv.2206.12839 10.48550\/arXiv.2206.12839 arXiv:2206.12839","DOI":"10.48550\/arXiv.2206.12839"},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-15712-8_18"},{"key":"e_1_3_1_43_2","doi-asserted-by":"publisher","unstructured":"Vasudev Vikram Caroline Lemieux and Rohan Padhye. 2023. Can Large Language Models Write Good Property-Based Tests? CoRR abs\/2307.04346 (2023). https:\/\/doi.org\/10.48550\/ARXIV.2307.04346 10.48550\/ARXIV.2307.04346 arXiv:2307.04346","DOI":"10.48550\/ARXIV.2307.04346"},{"key":"e_1_3_1_44_2","doi-asserted-by":"publisher","DOI":"10.1145\/3459637.3482199"},{"key":"e_1_3_1_45_2","unstructured":"Xuezhi Wang Jason Wei Dale Schuurmans Quoc V. Le Ed H. Chi Sharan Narang Aakanksha Chowdhery and Denny Zhou. 2023. Self-Consistency Improves Chain of Thought Reasoning in Language Models. In The Eleventh International Conference on Learning Representations ICLR 2023 Kigali Rwanda May 1-5 2023. OpenReview.net. https:\/\/openreview.net\/pdf?id=1PL1NIMMrw"},{"key":"e_1_3_1_46_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.emnlp-main.685"},{"key":"e_1_3_1_47_2","doi-asserted-by":"publisher","unstructured":"Zhiruo Wang Shuyan Zhou Daniel Fried and Graham Neubig. 2022. Execution-Based Evaluation for Open-Domain Code Generation. CoRR abs\/2212.10481 (2022). 
https:\/\/doi.org\/10.48550\/arXiv.2212.10481 10.48550\/arXiv.2212.10481 arXiv:2212.10481","DOI":"10.48550\/arXiv.2212.10481"},{"key":"e_1_3_1_48_2","unstructured":"Jason Wei Xuezhi Wang Dale Schuurmans Maarten Bosma Ed H. Chi Quoc Le and Denny Zhou. 2022. Chain of Thought Prompting Elicits Reasoning in Large Language Models. CoRR abs\/2201.11903 (2022). arXiv:2201.11903 https:\/\/arxiv.org\/abs\/2201.11903"},{"key":"e_1_3_1_49_2","doi-asserted-by":"publisher","DOI":"10.1145\/3520312.3534862"},{"key":"e_1_3_1_50_2","doi-asserted-by":"publisher","DOI":"10.1145\/3597503.3623316"},{"key":"e_1_3_1_51_2","unstructured":"Michal Zalewski. 2018. American fuzzy lop. https:\/\/lcamtuf.coredump.cx\/afl\/."},{"key":"e_1_3_1_52_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISSRE.2011.26"},{"key":"e_1_3_1_53_2","doi-asserted-by":"publisher","unstructured":"Tianyi Zhang Tao Yu Tatsunori B. Hashimoto Mike Lewis Wen-tau Yih Daniel Fried and Sida I. Wang. 2022. Coder Reviewer Reranking for Code Generation. CoRR abs\/2211.16490 (2022). 
https:\/\/doi.org\/10.48550\/arXiv.2211.16490 10.48550\/arXiv.2211.16490 arXiv:2211.16490","DOI":"10.48550\/arXiv.2211.16490"}],"container-title":["Proceedings of the ACM on Software Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3660810","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3660810","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,2,4]],"date-time":"2026-02-04T08:03:21Z","timestamp":1770192201000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3660810"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,7,12]]},"references-count":52,"journal-issue":{"issue":"FSE","published-print":{"date-parts":[[2024,7,12]]}},"alternative-id":["10.1145\/3660810"],"URL":"https:\/\/doi.org\/10.1145\/3660810","relation":{},"ISSN":["2994-970X"],"issn-type":[{"value":"2994-970X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,7,12]]}}}