{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,16]],"date-time":"2026-04-16T03:38:38Z","timestamp":1776310718933,"version":"3.50.1"},"reference-count":73,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2025,1,22]],"date-time":"2025-01-22T00:00:00Z","timestamp":1737504000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"UKRI Centre for Doctoral Training in Safe and Trusted Artificial Intelligence","award":["EP\/S023356\/1"],"award-info":[{"award-number":["EP\/S023356\/1"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Softw. Eng. Methodol."],"published-print":{"date-parts":[[2025,2,28]]},"abstract":"<jats:p>There has been a recent explosion of research on Large Language Models (LLMs) for software engineering tasks, in particular code generation. However, results from LLMs can be highly unstable; non-deterministically returning very different code for the same prompt. Such non-determinism affects the correctness and consistency of the generated code, undermines developers\u2019 trust in LLMs, and yields low reproducibility in LLM-based papers. Nevertheless, there is no work investigating how serious this non-determinism threat is.<\/jats:p>\n          <jats:p>\n            To fill this gap, this article conducts an empirical study on the non-determinism of ChatGPT in code generation. We chose to study ChatGPT because it is already highly prevalent in the code generation research literature. We report results from a study of 829 code generation problems across three code generation benchmarks (i.e., CodeContests, APPS and HumanEval) with three aspects of code similarities: semantic similarity, syntactic similarity, and structural similarity. 
Our results reveal that ChatGPT exhibits a high degree of non-determinism under the default setting: the ratio of coding tasks with zero equal test output across different requests is 75.76%, 51.00% and 47.56% for three different code generation datasets (i.e., CodeContests, APPS and HumanEval), respectively. In addition, we find that setting the\n            <jats:italic>temperature<\/jats:italic>\n            to 0 does not guarantee determinism in code generation, although it indeed brings less non-determinism than the default configuration (\n            <jats:italic>temperature<\/jats:italic>\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\(=\\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            1). In order to put LLM-based research on firmer scientific foundations, researchers need to take into account non-determinism in drawing their conclusions.\n          <\/jats:p>","DOI":"10.1145\/3697010","type":"journal-article","created":{"date-parts":[[2024,9,26]],"date-time":"2024-09-26T15:43:55Z","timestamp":1727365435000},"page":"1-28","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":129,"title":["An Empirical Study of the Non-Determinism of ChatGPT in Code Generation"],"prefix":"10.1145","volume":"34","author":[{"ORCID":"https:\/\/orcid.org\/0009-0007-0056-3101","authenticated-orcid":false,"given":"Shuyin","family":"Ouyang","sequence":"first","affiliation":[{"name":"King\u2019s College London, London, UK"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0481-7264","authenticated-orcid":false,"given":"Jie M.","family":"Zhang","sequence":"additional","affiliation":[{"name":"King\u2019s College London, London, UK"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5864-4488","authenticated-orcid":false,"given":"Mark","family":"Harman","sequence":"additional","affiliation":[{"name":"University College London, London, 
UK"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7780-630X","authenticated-orcid":false,"given":"Meng","family":"Wang","sequence":"additional","affiliation":[{"name":"University of Bristol, Bristol, UK"}]}],"member":"320","published-online":{"date-parts":[[2025,1,22]]},"reference":[{"key":"e_1_3_2_2_2","unstructured":"GitHub. Retrieved from https:\/\/152334h.github.io\/blog\/non-determinism-in-gpt-4\/"},{"key":"e_1_3_2_3_2","unstructured":"OpenAI. Retrieved from https:\/\/chat.openai.com\/chat"},{"key":"e_1_3_2_4_2","unstructured":"GitHub. Retrieved from https:\/\/github.com\/ShuyinOuyang\/LLM-is-a-box-of-chocolate"},{"key":"e_1_3_2_5_2","unstructured":"Josh Achiam Steven Adler Sandhini Agarwal Lama Ahmad Ilge Akkaya Florencia Leoni Aleman Diogo Almeida Janko Altenschmidt Sam Altman Shyamal Anadkat Red Avila Igor Babuschkin Suchir Balaji Valerie Balcom Paul Baltescu Haiming Bao Mohammad Bavarian Jeff Belgum Irwan Bello Jake Berdine Gabriel Bernadett-Shapiro Christopher Berner Lenny Bogdonoff Oleg Boiko Madelaine Boyd Anna-Luisa Brakman Greg Brockman Tim Brooks Miles Brundage Kevin Button Trevor Cai Rosie Campbell Andrew Cann Brittany Carey Chelsea Carlson Rory Carmichael Brooke Chan Che Chang Fotis Chantzis Derek Chen Sully Chen Ruby Chen Jason Chen Mark Chen Ben Chess Chester Cho Casey Chu Hyung Won Chung Dave Cummings Jeremiah Currier Yunxing Dai Cory Decareaux Thomas Degry Noah Deutsch Damien Deville Arka Dhar David Dohan Steve Dowling Sheila Dunning Adrien Ecoffet Atty Eleti Tyna Eloundou David Farhi Liam Fedus Niko Felix Sim\u00f3n Posada Fishman Juston Forte Isabella Fulford Leo Gao Elie Georges Christian Gibson Vik Goel Tarun Gogineni Gabriel Goh Rapha Gontijo-Lopes Jonathan Gordon Morgan Grafstein Scott Gray Ryan Greene Joshua Gross Shixiang Shane Gu Yufei Guo Chris Hallacy Jesse Han Jeff Harris Yuchen He Mike Heaton Johannes Heidecke Chris Hesse Alan Hickey Wade Hickey Peter Hoeschele Brandon Houghton Kenny Hsu Shengli Hu Xin Hu Joost Huizinga Shantanu Jain 
and Shawn Jain. 2023. Gpt-4 technical report. arXiv:2303.08774. Retrieved from https:\/\/arxiv.org\/pdf\/2303.08774"},{"key":"e_1_3_2_6_2","unstructured":"Jacob Austin Augustus Odena Maxwell Nye Maarten Bosma Henryk Michalewski David Dohan Ellen Jiang Carrie Cai Michael Terry Quoc Le and Charles Sutton. 2021. Program synthesis with large language models. arXiv:2108.07732. Retrieved from https:\/\/arxiv.org\/pdf\/2108.07732"},{"key":"e_1_3_2_7_2","doi-asserted-by":"crossref","unstructured":"Y. Bang S. Cahyawijaya N. Lee W. Dai D. Su B. Wilie H. Lovenia Z. Ji T. Yu W. Chung Q. V. Do Y. Xu and P. Fung. 2023. A multitask multilingual multimodal evaluation of ChatGPT on reasoning hallucination and interactivity. arXiv:2302.04023. Retrieved from https:\/\/arxiv.org\/pdf\/2302.04023","DOI":"10.18653\/v1\/2023.ijcnlp-main.45"},{"key":"e_1_3_2_8_2","doi-asserted-by":"crossref","unstructured":"Bhavya Bhavya Jinjun Xiong and Chengxiang Zhai. 2022. Analogy generation by prompting large language models: A case study of instructgpt. arXiv:2210.04186. Retrieved from https:\/\/arxiv.org\/pdf\/2210.04186","DOI":"10.18653\/v1\/2022.inlg-main.25"},{"issue":"8","key":"e_1_3_2_9_2","doi-asserted-by":"crossref","first-page":"585","DOI":"10.1109\/TSMC.1978.4310035","article-title":"The inference of regular LISP programs from examples","volume":"8","author":"Biermann Alan W.","year":"1978","unstructured":"Alan W. Biermann. 1978. The inference of regular LISP programs from examples. IEEE Transactions on Systems, Man, and Cybernetics 8, 8 (1978), 585\u2013600.","journal-title":"IEEE Transactions on Systems, Man, and Cybernetics"},{"key":"e_1_3_2_10_2","first-page":"1877","volume-title":"Proceedings of the 34th International Conference on Neural Information Processing Systems","author":"Brown Tom","year":"2020","unstructured":"Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D. 
Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. 2020. Language models are few-shot learners. In Proceedings of the 34th International Conference on Neural Information Processing Systems, 1877\u20131901."},{"key":"e_1_3_2_11_2","unstructured":"S\u00e9bastien Bubeck Varun Chandrasekaran Ronen Eldan Johannes Gehrke Eric Horvitz Ece Kamar Peter Lee Yin Tat Lee Yuanzhi Li Scott Lundberg Harsha Nori Hamid Palangi Marco Tulio Ribeiro and Yi Zhang. 2023. Sparks of artificial general intelligence: Early experiments with GPT-4. arXiv:2303.12712. Retrieved from https:\/\/arxiv.org\/pdf\/2303.12712"},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","unstructured":"Subhashis Chatterjee Deepjyoti Saha Akhilesh Sharma and Yogesh Verma. 2022. Reliability and optimal release time analysis for multi up-gradation software with imperfect debugging and varied testing coverage under the effect of random field environments. Annals of Operations Research 312 1 (May 2022) 65\u201385. 
DOI: 10.1007\/s10479-021-04258-y","DOI":"10.1007\/s10479-021-04258-y"},{"key":"e_1_3_2_13_2","unstructured":"Mark Chen Jerry Tworek Heewoo Jun Qiming Yuan Henrique Ponde de Oliveira Pinto Jared Kaplan Harri Edwards Yuri Burda Nicholas Joseph Greg Brockman Alex Ray Raul Puri Gretchen Krueger Michael Petrov Heidy Khlaaf Girish Sastry Pamela Mishkin Brooke Chan Scott Gray Nick Ryder Mikhail Pavlov Alethea Power Lukasz Kaiser Mohammad Bavarian Clemens Winter Philippe Tillet Felipe Petroski Such Dave Cummings Matthias Plappert Fotios Chantzis Elizabeth Barnes Ariel Herbert-Voss William Hebgen Guss Alex Nichol Alex Paino Nikolas Tezak Jie Tang Igor Babuschkin Suchir Balaji Shantanu Jain William Saunders Christopher Hesse Andrew N. Carr Jan Leike Josh Achiam Vedant Misra Evan Morikawa Alec Radford Matthew Knight Miles Brundage Mira Murati Katie Mayer Peter Welinder Bob McGrew Dario Amodei Sam McCandlish Ilya Sutskever and Wojciech Zaremba. 2021. Evaluating large language models trained on code. arXiv:2107.03374. Retrieved from https:\/\/arxiv.org\/pdf\/2107.03374"},{"key":"e_1_3_2_14_2","doi-asserted-by":"crossref","unstructured":"Yinlin Deng Chunqiu Steven Xia Chenyuan Yang Shizhuo Dylan Zhang Shujing Yang and Lingming Zhang. 2023. Large language models are edge-case fuzzers: Testing deep learning libraries via fuzzgpt. arXiv:2304.02014. Retrieved from https:\/\/arxiv.org\/pdf\/2304.02014","DOI":"10.1145\/3597926.3598067"},{"key":"e_1_3_2_15_2","doi-asserted-by":"crossref","unstructured":"Li Dong and Mirella Lapata. 2016. Language to logical form with neural attention. arXiv:1601.01280. 
Retrieved from https:\/\/arxiv.org\/pdf\/1601.01280","DOI":"10.18653\/v1\/P16-1004"},{"key":"e_1_3_2_16_2","first-page":"31","volume-title":"2023 IEEE\/ACM International Conference on Software Engineering: Future of Software Engineering (ICSE-FoSE)","author":"Fan Angela","year":"2023","unstructured":"Angela Fan, Beliz Gokkaya, Mark Harman, Mitya Lyubarskiy, Shubho Sengupta, Shin Yoo, and Jie M. Zhang. 2023. Large language models for software engineering: Survey and open problems. In 2023 IEEE\/ACM International Conference on Software Engineering: Future of Software Engineering (ICSE-FoSE). IEEE, 31\u201353."},{"key":"e_1_3_2_17_2","first-page":"876","volume-title":"Proceedings of the IEEE 47th Annual Computers, Software, and Applications Conference (COMPSAC)","author":"Feng Yunhe","year":"2023","unstructured":"Yunhe Feng, Sreecharan Vanam, Manasa Cherukupally, Weijian Zheng, Meikang Qiu, and Haihua Chen. 2023. Investigating code generation performance of ChatGPT with crowdsourcing social data. In Proceedings of the IEEE 47th Annual Computers, Software, and Applications Conference (COMPSAC). IEEE, 876\u2013885."},{"key":"e_1_3_2_18_2","doi-asserted-by":"crossref","unstructured":"Zhangyin Feng Daya Guo Duyu Tang Nan Duan Xiaocheng Feng Ming Gong Linjun Shou Bing Qin Ting Liu Daxin Jiang and Ming Zhou. 2020. Codebert: A pre-trained model for programming and natural languages. arXiv:2002.08155. Retrieved from https:\/\/arxiv.org\/pdf\/2002.08155","DOI":"10.18653\/v1\/2020.findings-emnlp.139"},{"key":"e_1_3_2_19_2","article-title":"Mathematical capabilities of chatgpt","volume":"36","author":"Frieder Simon","year":"2024","unstructured":"Simon Frieder, Luca Pinchetti, Ryan-Rhys Griffiths, Tommaso Salvatori, Thomas Lukasiewicz, Philipp Petersen, and Julius Berner. 2024. Mathematical capabilities of chatgpt.
Advances in neural information processing systems 36 (2024).","journal-title":"Advances in neural information processing systems"},{"key":"e_1_3_2_20_2","doi-asserted-by":"crossref","first-page":"202","DOI":"10.1016\/B978-0-934613-03-3.50019-2","volume-title":"Readings in Artificial Intelligence","author":"Green Cordell","year":"1981","unstructured":"Cordell Green. 1981. Application of theorem proving to problem solving. In Readings in Artificial Intelligence. Elsevier, 202\u2013222."},{"issue":"1","key":"e_1_3_2_21_2","doi-asserted-by":"crossref","first-page":"317","DOI":"10.1145\/1925844.1926423","article-title":"Automating string processing in spreadsheets using input-output examples","volume":"46","author":"Gulwani Sumit","year":"2011","unstructured":"Sumit Gulwani. 2011. Automating string processing in spreadsheets using input-output examples. ACM Sigplan Notices 46, 1 (2011), 317\u2013330.","journal-title":"ACM Sigplan Notices"},{"issue":"1","key":"e_1_3_2_22_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1561\/2500000010","article-title":"Program synthesis","volume":"4","author":"Gulwani Sumit","year":"2017","unstructured":"Sumit Gulwani, Oleksandr Polozov, and Rishabh Singh. 2017. Program synthesis. Foundations and Trends\u00ae in Programming Languages 4, 1\u20132 (2017), 1\u2013119.","journal-title":"Foundations and Trends\u00ae in Programming Languages"},{"key":"e_1_3_2_23_2","unstructured":"Daya Guo Shuo Ren Shuai Lu Zhangyin Feng Duyu Tang Shujie Liu Long Zhou Nan Duan Alexey Svyatkovskiy Shengyu Fu Michele Tufano Shao Kun Deng Colin Clement Dawn Drain Neel Sundaresan Jian Yin Daxin Jiang and Ming Zhou. 2020. Graphcodebert: Pre-training code representations with data flow. arXiv:2009.08366. 
Retrieved from https:\/\/arxiv.org\/pdf\/2009.08366"},{"key":"e_1_3_2_24_2","first-page":"1","volume-title":"Proceedings of the 46th IEEE\/ACM International Conference on Software Engineering","author":"Guo Qi","year":"2024","unstructured":"Qi Guo, Junming Cao, Xiaofei Xie, Shangqing Liu, Xiaohong Li, Bihuan Chen, and Xin Peng. 2024. Exploring the potential of chatgpt in automated code refinement: An empirical study. In Proceedings of the 46th IEEE\/ACM International Conference on Software Engineering, 1\u201313."},{"key":"e_1_3_2_25_2","volume-title":"Proceedings of the 32nd International Conference on Neural Information Processing Systems","author":"Hashimoto Tatsunori B.","year":"2018","unstructured":"Tatsunori B. Hashimoto, Kelvin Guu, Yonatan Oren, and Percy S. Liang. 2018. A retrieve-and-edit framework for predicting structured outputs. In Proceedings of the 32nd International Conference on Neural Information Processing Systems."},{"issue":"2","key":"e_1_3_2_26_2","doi-asserted-by":"crossref","first-page":"62","DOI":"10.3390\/bdcc7020062","article-title":"The role of ChatGPT in data science: How AI-assisted conversational interfaces are revolutionizing the field","volume":"7","author":"Hassani Hossein","year":"2023","unstructured":"Hossein Hassani and Emmanuel Sirmal Silva. 2023. The role of ChatGPT in data science: How AI-assisted conversational interfaces are revolutionizing the field. Big Data and Cognitive Computing 7, 2 (2023), 62.","journal-title":"Big Data and Cognitive Computing"},{"key":"e_1_3_2_27_2","unstructured":"Dan Hendrycks Steven Basart Saurav Kadavath Mantas Mazeika Akul Arora Ethan Guo Collin Burns Samir Puranik Horace He Dawn Song and Jacob Steinhardt. 2021. Measuring coding challenge competence with apps. arXiv:2105.09938. Retrieved from https:\/\/arxiv.org\/pdf\/2105.09938"},{"key":"e_1_3_2_28_2","unstructured":"Dan Hendrycks Collin Burns Steven Basart Andrew Critch Jerry Li Dawn Song and Jacob Steinhardt. 2020. 
Aligning ai with shared human values. arXiv:2008.02275. Retrieved from https:\/\/arxiv.org\/pdf\/2008.02275"},{"key":"e_1_3_2_29_2","first-page":"13419","volume-title":"Proceedings of the 36th International Conference on Neural Information Processing Systems","author":"Inala Jeevana Priya","year":"2022","unstructured":"Jeevana Priya Inala, Chenglong Wang, Mei Yang, Andres Codas, Mark Encarnaci\u00f3n, Shuvendu Lahiri, Madanlal Musuvathi, and Jianfeng Gao. 2022. Fault-aware neural code rankers. In Proceedings of the 36th International Conference on Neural Information Processing Systems, 13419\u201313432."},{"key":"e_1_3_2_30_2","first-page":"4130","volume-title":"Proceedings of the IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW)","author":"Jalil Sajed","year":"2023","unstructured":"Sajed Jalil, Suzzana Rafi, Thomas D. LaToza, Kevin Moran, and Wing Lam. 2023. Chatgpt and software testing education: Promises & perils. In Proceedings of the IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW). IEEE, 4130\u20134137."},{"key":"e_1_3_2_31_2","volume-title":"Efficient Model Checking: The Power of Randomness","author":"Kiviriga Andrej","year":"2023","unstructured":"Andrej Kiviriga. 2023. Efficient Model Checking: The Power of Randomness. Aalborg Universitetsforlag."},{"key":"e_1_3_2_32_2","doi-asserted-by":"crossref","unstructured":"Kalpesh Krishna Yapei Chang John Wieting and Mohit Iyyer. 2022. Rankgen: Improving text generation with large ranking models. arXiv:2205.09726. Retrieved from https:\/\/arxiv.org\/pdf\/2205.09726","DOI":"10.18653\/v1\/2022.emnlp-main.15"},{"key":"e_1_3_2_33_2","volume-title":"Proceedings of the 33rd International Conference on Neural Information Processing Systems","author":"Kulal Sumith","year":"2019","unstructured":"Sumith Kulal, Panupong Pasupat, Kartik Chandra, Mina Lee, Oded Padon, Alex Aiken, and Percy S. Liang. 2019. 
SPoC: Search-based pseudocode to code. In Proceedings of the 33rd International Conference on Neural Information Processing Systems."},{"key":"e_1_3_2_34_2","unstructured":"Emanuele La Malfa Aleksandar Petrov Simon Frieder Christoph Weinhuber Ryan Burnell Anthony G. Cohn Nigel Shadbolt and Michael Wooldridge. 2023. The ARRT of language-models-as-a-service: Overview of a new paradigm and its challenges. arXiv:2309.16573. Retrieved from https:\/\/arxiv.org\/pdf\/2309.16573"},{"key":"e_1_3_2_35_2","first-page":"1","volume-title":"Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems","author":"Lee Mina","year":"2022","unstructured":"Mina Lee, Percy Liang, and Qian Yang. 2022. Coauthor: Designing a human-ai collaborative writing dataset for exploring language model capabilities. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems, 1\u201319."},{"issue":"6","key":"e_1_3_2_36_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3597207","article-title":"Codeeditor: Learning to edit source code with pre-trained models","volume":"32","author":"Li Jia","year":"2023","unstructured":"Jia Li, Ge Li, Zhuo Li, Zhi Jin, Xing Hu, Kechi Zhang, and Zhiyi Fu. 2023. Codeeditor: Learning to edit source code with pre-trained models. ACM Transactions on Software Engineering and Methodology 32, 6 (2023), 1\u201322.","journal-title":"ACM Transactions on Software Engineering and Methodology"},{"key":"e_1_3_2_37_2","first-page":"2124","volume-title":"Proceedings of the IEEE\/ACM 45th International Conference on Software Engineering (ICSE)","author":"Li Jia","year":"2023","unstructured":"Jia Li, Yongmin Li, Ge Li, Zhi Jin, Yiyang Hao, and Xing Hu. 2023a. Skcoder: A sketch-based approach for automatic code generation. In Proceedings of the IEEE\/ACM 45th International Conference on Software Engineering (ICSE). 
IEEE, 2124\u20132135."},{"key":"e_1_3_2_38_2","first-page":"6624","article-title":"Competition-level code generation with alphacode","volume":"378","author":"Li Yujia","year":"2022","unstructured":"Yujia Li, David Choi, Junyoung Chung, Nate Kushman, Julian Schrittwieser, R\u00e9mi Leblond, Tom Eccles, James Keeling, Felix Gimeno, Agustin Dal Lago, Thomas Hubert, Peter Choy, Cyprien de Masson d\u2019Autume, Igor Babuschkin, Xinyun Chen, Po-Sen Huang, Johannes Welbl, Sven Gowal, Alexey Cherepanov, James Molloy, Daniel J. Mankowitz, Esme Sutherland Robson, Pushmeet Kohli, Nando de Freitas, Koray Kavukcuoglu, and Oriol Vinyals. 2022. Competition-level code generation with alphacode. Science 378, 6624 (2022), 1092\u20131097.","journal-title":"Science"},{"key":"e_1_3_2_39_2","first-page":"1238","volume-title":"Proceedings of the IEEE\/ACM 45th International Conference on Software Engineering (ICSE)","author":"Li Zongjie","year":"2023","unstructured":"Zongjie Li, Chaozheng Wang, Zhibo Liu, Haoxuan Wang, Dong Chen, Shuai Wang, and Cuiyun Gao. 2023. Cctest: Testing and repairing code completion systems. In Proceedings of the IEEE\/ACM 45th International Conference on Software Engineering (ICSE). IEEE, 1238\u20131250."},{"key":"e_1_3_2_40_2","first-page":"9493","volume-title":"Proceedings of the IEEE International Conference on Robotics and Automation (ICRA)","author":"Liang Jacky","year":"2023","unstructured":"Jacky Liang, Wenlong Huang, Fei Xia, Peng Xu, Karol Hausman, Brian Ichter, Pete Florence, and Andy Zeng. 2023. Code as policies: Language model programs for embodied control. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA). IEEE, 9493\u20139500."},{"key":"e_1_3_2_41_2","doi-asserted-by":"crossref","unstructured":"Wang Ling Edward Grefenstette Karl Moritz Hermann Tom\u00e1\u0161 Ko\u010disky Andrew Senior Fumin Wang and Phil Blunsom. 2016. Latent predictor networks for code generation. arXiv:1603.06744. 
Retrieved from https:\/\/arxiv.org\/pdf\/1603.06744","DOI":"10.18653\/v1\/P16-1057"},{"key":"e_1_3_2_42_2","unstructured":"Jiawei Liu Chunqiu Steven Xia Yuyao Wang and Lingming Zhang. 2023. Is your code generated by chatgpt really correct? Rigorous evaluation of large language models for code generation. arXiv:2305.01210."},{"key":"e_1_3_2_43_2","volume-title":"Proceedings of the 37th International Conference on Neural Information Processing Systems","author":"Liu Jiawei","year":"2024","unstructured":"Jiawei Liu, Chunqiu Steven Xia, Yuyao Wang, and Lingming Zhang. 2024. Is your code generated by chatgpt really correct? Rigorous evaluation of large language models for code generation. In Proceedings of the 37th International Conference on Neural Information Processing Systems."},{"key":"e_1_3_2_44_2","unstructured":"Yiheng Liu Tianle Han Siyuan Ma Jiayue Zhang Yuanyuan Yang Jiaming Tian Hao He Antong Li Mengshen He Zhengliang Liu Zihao Wu Lin Zhao Dajiang Zhu Xiang Li Ning Qiang Dingang Shen Tianming Liu and Bao Ge. 2023. Summary of chatgpt\/GPT-4 research and perspective towards the future of large language models. arXiv:2304.01852. Retrieved from https:\/\/arxiv.org\/pdf\/2304.01852"},{"key":"e_1_3_2_45_2","doi-asserted-by":"publisher","DOI":"10.1145\/3643674"},{"key":"e_1_3_2_46_2","unstructured":"Shuai Lu Daya Guo Shuo Ren Junjie Huang Alexey Svyatkovskiy Ambrosio Blanco Colin Clement Dawn Drain Daxin Jiang Duyu Tang Ge Li Lidong Zhou Linjun Shou Long Zhou Michele Tufano Ming Gong Ming Zhou Nan Duan Neel Sundaresan Shao Kun Deng Shengyu Fu and Shujie Liu. 2021. Codexglue: A machine learning benchmark dataset for code understanding and generation. arXiv:2102.04664. 
Retrieved from https:\/\/arxiv.org\/pdf\/2102.04664"},{"issue":"3","key":"e_1_3_2_47_2","doi-asserted-by":"crossref","first-page":"151","DOI":"10.1145\/362566.362568","article-title":"Toward automatic program synthesis","volume":"14","author":"Manna Zohar","year":"1971","unstructured":"Zohar Manna and Richard J. Waldinger. 1971. Toward automatic program synthesis. Communications of the ACM 14, 3 (1971), 151\u2013165.","journal-title":"Communications of the ACM"},{"key":"e_1_3_2_48_2","first-page":"2149","volume-title":"Proceedings of the IEEE\/ACM 45th International Conference on Software Engineering (ICSE)","author":"Mastropaolo Antonio","year":"2023","unstructured":"Antonio Mastropaolo, Luca Pascarella, Emanuela Guglielmi, Matteo Ciniselli, Simone Scalabrino, Rocco Oliveto, and Gabriele Bavota. 2023. On the robustness of code generation techniques: An empirical study on github copilot. In Proceedings of the IEEE\/ACM 45th International Conference on Software Engineering (ICSE). IEEE, 2149\u20132160."},{"key":"e_1_3_2_49_2","first-page":"1","volume-title":"The Corsini Encyclopedia of Psychology","author":"McKnight Patrick E.","year":"2010","unstructured":"Patrick E. McKnight and Julius Najab. 2010. Kruskal-Wallis test. In The Corsini Encyclopedia of Psychology. John Wiley & Sons, Inc., 1\u20131."},{"key":"e_1_3_2_50_2","first-page":"1","volume-title":"The Corsini Encyclopedia of Psychology","author":"McKnight Patrick E.","year":"2010","unstructured":"Patrick E. McKnight and Julius Najab. 2010. Mann-Whitney U test. In The Corsini Encyclopedia of Psychology. John Wiley & Sons, Inc., 1\u20131."},{"key":"e_1_3_2_51_2","first-page":"24950","volume-title":"International Conference on Machine Learning","author":"Mitchell Eric","year":"2023","unstructured":"Eric Mitchell, Yoonho Lee, Alexander Khazatsky, Christopher D. Manning, and Chelsea Finn. 2023. Detectgpt: Zero-shot machine-generated text detection using probability curvature.
In International Conference on Machine Learning. PMLR, 24950\u201324962."},{"key":"e_1_3_2_52_2","unstructured":"Prabhat Nagarajan Garrett Warnell and Peter Stone. 2018. Deterministic implementations for reproducibility in deep reinforcement learning. arXiv:1809.05676. Retrieved from https:\/\/arxiv.org\/pdf\/1809.05676"},{"key":"e_1_3_2_53_2","unstructured":"OpenAI. 2023. GPT-4 technical report. arXiv:2303.08774."},{"key":"e_1_3_2_54_2","doi-asserted-by":"crossref","first-page":"771","DOI":"10.1145\/3324884.3416545","volume-title":"Proceedings of the 35th IEEE\/ACM International Conference on Automated Software Engineering","author":"Pham Hung Viet","year":"2020","unstructured":"Hung Viet Pham, Shangshu Qian, Jiannan Wang, Thibaud Lutellier, Jonathan Rosenthal, Lin Tan, Yaoliang Yu, and Nachiappan Nagappan. 2020. Problems and opportunities in training deep learning software systems: An analysis of variance. In Proceedings of the 35th IEEE\/ACM International Conference on Automated Software Engineering, 771\u2013783."},{"key":"e_1_3_2_55_2","unstructured":"Gabriel Poesia Oleksandr Polozov Vu Le Ashish Tiwari Gustavo Soares Christopher Meek and Sumit Gulwani. 2022. Synchromesh: Reliable code generation from pre-trained language models. arXiv:2201.11227. Retrieved from https:\/\/arxiv.org\/pdf\/2201.11227"},{"key":"e_1_3_2_56_2","unstructured":"Joan Puigcerver Carlos Riquelme Basil Mustafa and Neil Houlsby. 2023. From sparse to soft mixtures of experts. arXiv:2308.00951. Retrieved from https:\/\/arxiv.org\/pdf\/2308.00951"},{"key":"e_1_3_2_57_2","author":"Radford Alec","year":"2018","unstructured":"Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. 2018. Improving Language Understanding by Generative Pre-Training. 
Technical Report, OpenAI.","journal-title":"Improving Language Understanding by Generative Pre-Training"},{"key":"e_1_3_2_58_2","first-page":"260","volume-title":"Proceedings of the 4th International Joint Conference on Artificial Intelligence (IJCAI)","volume":"75","author":"Shaw David E","year":"1975","unstructured":"David E Shaw, William R Swartout, and C. Cordell Green. 1975. Inferring LISP programs from examples. In Proceedings of the 4th International Joint Conference on Artificial Intelligence (IJCAI), Vol. 75, 260\u2013267."},{"key":"e_1_3_2_59_2","doi-asserted-by":"crossref","DOI":"10.21236\/ADA016811","volume-title":"Pygmalion: A Creative Programming Environment","author":"Smith David Canfield","year":"1975","unstructured":"David Canfield Smith. 1975. Pygmalion: A Creative Programming Environment. Stanford University."},{"key":"e_1_3_2_60_2","volume-title":"Proceedings of the Annual Meeting of the Association for Computational Linguistics","author":"Soares Ioana Baldini","year":"2022","unstructured":"Ioana Baldini Soares, Dennis Wei, Karthikeyan Natesan Ramamurthy, Moninder Singh, and Mikhail Yurochkin. 2022. Your fairness may vary: Pretrained language model fairness in toxic text classification. In Proceedings of the Annual Meeting of the Association for Computational Linguistics."},{"issue":"1","key":"e_1_3_2_61_2","doi-asserted-by":"crossref","first-page":"161","DOI":"10.1145\/321992.322002","article-title":"A methodology for LISP program construction from examples","volume":"24","author":"Summers Phillip D.","year":"1977","unstructured":"Phillip D. Summers. 1977. A methodology for LISP program construction from examples. 
Journal of the ACM 24, 1 (1977), 161\u2013175.","journal-title":"Journal of the ACM"},{"key":"e_1_3_2_62_2","first-page":"7055","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","volume":"33","author":"Sun Zeyu","year":"2019","unstructured":"Zeyu Sun, Qihao Zhu, Lili Mou, Yingfei Xiong, Ge Li, and Lu Zhang. 2019. A grammar-based structural cnn decoder for code generation. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, 7055\u20137062."},{"issue":"1","key":"e_1_3_2_63_2","doi-asserted-by":"crossref","first-page":"17","DOI":"10.55529\/ijitc.31.17.22","article-title":"Use chat gpt to solve programming bugs","volume":"3","author":"Surameery Nigar M Shafiq","year":"2023","unstructured":"Nigar M Shafiq Surameery and Mohammed Y Shakor. 2023. Use chat gpt to solve programming bugs. International Journal of Information Technology & Computer Engineering 3, 1 (2023), 17\u201322.","journal-title":"International Journal of Information Technology & Computer Engineering"},{"key":"e_1_3_2_64_2","doi-asserted-by":"crossref","first-page":"1433","DOI":"10.1145\/3368089.3417058","volume-title":"Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering","author":"Svyatkovskiy Alexey","year":"2020","unstructured":"Alexey Svyatkovskiy, Shao Kun Deng, Shengyu Fu, and Neel Sundaresan. 2020. Intellicode compose: Code generation using transformer. In Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 1433\u20131443."},{"key":"e_1_3_2_65_2","first-page":"1","volume-title":"Proceedings of the CHI Conference on Human Factors in Computing Systems Extended Abstracts","author":"Vaithilingam Priyan","year":"2022","unstructured":"Priyan Vaithilingam, Tianyi Zhang, and Elena L. Glassman. 2022. Expectation vs. 
experience: Evaluating the usability of code generation tools powered by large language models. In Proceedings of the CHI Conference on Human Factors in Computing Systems Extended Abstracts, 1\u20137."},{"key":"e_1_3_2_66_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2024.3368208"},{"key":"e_1_3_2_67_2","volume-title":"Proceedings of the 33rd International Conference on Neural Information Processing Systems","author":"Wei Bolin","year":"2019","unstructured":"Bolin Wei, Ge Li, Xin Xia, Zhiyi Fu, and Zhi Jin. 2019. Code generation as a dual task of code summarization. In Proceedings of the 33rd International Conference on Neural Information Processing Systems."},{"key":"e_1_3_2_68_2","first-page":"172","volume-title":"Proceedings of the 28th International Conference on Program Comprehension","author":"Wu Xiongfei","year":"2020","unstructured":"Xiongfei Wu, Liangyu Qin, Bing Yu, Xiaofei Xie, Lei Ma, Yinxing Xue, Yang Liu, and Jianjun Zhao. 2020. How are deep learning models similar? An empirical study on clone analysis of deep learning software. In Proceedings of the 28th International Conference on Program Comprehension, 172\u2013183."},{"issue":"2","key":"e_1_3_2_69_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3487569","article-title":"In-ide code generation from natural language: Promise and challenges","volume":"31","author":"Xu Frank F.","year":"2022","unstructured":"Frank F. Xu, Bogdan Vasilescu, and Graham Neubig. 2022. In-ide code generation from natural language: Promise and challenges. ACM Transactions on Software Engineering and Methodology 31, 2 (2022), 1\u201347.","journal-title":"ACM Transactions on Software Engineering and Methodology"},{"key":"e_1_3_2_70_2","unstructured":"Burak Yeti\u015ftiren I\u015fi\u0307k \u00d6zsoy Miray Ayerdem and Eray T\u00fcz\u00fcn. 2023. Evaluating the code quality of AI-assisted code generation tools: An empirical study on GitHub Copilot Amazon CodeWhisperer and ChatGPT. arXiv:2304.10778. 
Retrieved from https:\/\/arxiv.org\/pdf\/2304.10778"},{"key":"e_1_3_2_71_2","unstructured":"Pengcheng Yin and Graham Neubig. 2017. A syntactic neural model for general-purpose code generation. arXiv:1704.01696. Retrieved from https:\/\/arxiv.org\/pdf\/1704.01696"},{"key":"e_1_3_2_72_2","unstructured":"Pengcheng Yin and Graham Neubig. 2018. Tranx: A transition-based neural abstract syntax parser for semantic parsing and code generation. arXiv:1810.02720. Retrieved from https:\/\/arxiv.org\/pdf\/1810.02720"},{"key":"e_1_3_2_73_2","first-page":"1","volume-title":"Proceedings of the 46th IEEE\/ACM International Conference on Software Engineering","author":"Yu Hao","year":"2024","unstructured":"Hao Yu, Bo Shen, Dezhi Ran, Jiaxin Zhang, Qi Zhang, Yuchi Ma, Guangtai Liang, Ying Li, Qianxiang Wang, and Tao Xie. 2024. Codereval: A benchmark of pragmatic code generation with generative pre-trained models. In Proceedings of the 46th IEEE\/ACM International Conference on Software Engineering, 1\u201312."},{"key":"e_1_3_2_74_2","unstructured":"Daoguang Zan Bei Chen Dejian Yang Zeqi Lin Minsu Kim Bei Guan Yongji Wang Weizhu Chen and Jian-Guang Lou. 2022. CERT: Continual pre-training on Sketches for Library-oriented Code Generation. arXiv:2206.06888. 
Retrieved from https:\/\/arxiv.org\/pdf\/2206.06888"}],"container-title":["ACM Transactions on Software Engineering and Methodology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3697010","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3697010","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T18:43:16Z","timestamp":1750272196000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3697010"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,1,22]]},"references-count":73,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2025,2,28]]}},"alternative-id":["10.1145\/3697010"],"URL":"https:\/\/doi.org\/10.1145\/3697010","relation":{},"ISSN":["1049-331X","1557-7392"],"issn-type":[{"value":"1049-331X","type":"print"},{"value":"1557-7392","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,1,22]]},"assertion":[{"value":"2023-11-03","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-08-23","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-01-22","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}