{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,18]],"date-time":"2026-04-18T05:23:05Z","timestamp":1776489785858,"version":"3.51.2"},"reference-count":95,"publisher":"Association for Computing Machinery (ACM)","issue":"FSE","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. ACM Softw. Eng."],"published-print":{"date-parts":[[2025,6,19]]},"abstract":"<jats:p>Large language models (LLMs) have brought a paradigm shift to the field of code generation, offering the potential to enhance the software development process. However, previous research mainly focuses on the accuracy of code generation, while coding style differences between LLMs and human developers remain under-explored. In this paper, we empirically analyze the differences in coding style between the code generated by mainstream LLMs and the code written by human developers, and summarize coding style inconsistency taxonomy. Specifically, we first summarize the types of coding style inconsistencies by manually analyzing a large number of generation results. We then compare the code generated by LLMs with the code written by human programmers in terms of readability, conciseness, and robustness. The results reveal that LLMs and developers exhibit differences in coding style. Additionally, we study the possible causes of these inconsistencies and provide some solutions to alleviate the problem.<\/jats:p>","DOI":"10.1145\/3715749","type":"journal-article","created":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T15:16:02Z","timestamp":1750346162000},"page":"690-712","source":"Crossref","is-referenced-by-count":6,"title":["Beyond Functional Correctness: Investigating Coding Style Inconsistencies in Large Language Models"],"prefix":"10.1145","volume":"2","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7761-7269","authenticated-orcid":false,"given":"Yanlin","family":"Wang","sequence":"first","affiliation":[{"name":"Sun Yat-sen University, Zhuhai, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-1305-8084","authenticated-orcid":false,"given":"Tianyue","family":"Jiang","sequence":"additional","affiliation":[{"name":"Sun Yat-sen University, Zhuhai, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3462-997X","authenticated-orcid":false,"given":"Mingwei","family":"Liu","sequence":"additional","affiliation":[{"name":"Sun Yat-sen University, Zhuhai, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0192-9992","authenticated-orcid":false,"given":"Jiachi","family":"Chen","sequence":"additional","affiliation":[{"name":"Sun Yat-sen University, Zhuhai, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9369-7828","authenticated-orcid":false,"given":"Mingzhi","family":"Mao","sequence":"additional","affiliation":[{"name":"Sun Yat-sen University, Zhuhai, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-4870-1012","authenticated-orcid":false,"given":"Xilin","family":"Liu","sequence":"additional","affiliation":[{"name":"Huawei Cloud Computing Technologies, Shenzhen, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-3304-1389","authenticated-orcid":false,"given":"Yuchi","family":"Ma","sequence":"additional","affiliation":[{"name":"Huawei Cloud Computing Technologies, Shenzhen, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7878-4330","authenticated-orcid":false,"given":"Zibin","family":"Zheng","sequence":"additional","affiliation":[{"name":"Sun Yat-sen University, Zhuhai, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,6,19]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"2024. Replication Package. https:\/\/github.com\/DeepSoftwareAnalytics\/Coding-Style-Empirical"},{"key":"e_1_2_1_2_1","volume-title":"Diogo Almeida, Janko Altenschmidt, Sam Altman, and Shyamal Anadkat.","author":"Achiam Josh","year":"2023","unstructured":"Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, and Shyamal Anadkat. 2023. Gpt-4 technical report. arXiv preprint arXiv:2303.08774."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/3597503.3639183"},{"key":"e_1_2_1_4_1","unstructured":"Jacob Austin Augustus Odena Maxwell Nye Maarten Bosma Henryk Michalewski David Dohan Ellen Jiang Carrie Cai Michael Terry and Quoc Le. 2021. Program synthesis with large language models. arXiv preprint arXiv:2108.07732."},{"key":"e_1_2_1_5_1","volume-title":"2019 IEEE\/ACM 16th International Conference on Mining Software Repositories (MSR). 210\u2013214","author":"Bafatakis Nikolaos","year":"2019","unstructured":"Nikolaos Bafatakis, Niels Boecker, Wenjie Boon, Martin Cabello Salazar, Jens Krinke, Gazi Oznacar, and Robert White. 2019. Python coding style compliance on stack overflow. In 2019 IEEE\/ACM 16th International Conference on Mining Software Repositories (MSR). 210\u2013214."},{"key":"e_1_2_1_6_1","volume-title":"Codeplan: Repository-level coding using llms and planning. arXiv preprint arXiv:2309.12499.","author":"Bairi Ramakrishna","year":"2023","unstructured":"Ramakrishna Bairi, Atharv Sonwane, Aditya Kanade, Arun Iyer, Suresh Parthasarathy, Sriram Rajamani, B Ashok, and Shashank Shet. 2023. Codeplan: Repository-level coding using llms and planning. arXiv preprint arXiv:2309.12499."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/2465.2469"},{"key":"e_1_2_1_8_1","volume-title":"Jun Shern Chan, Samuel R Bowman, Kyunghyun Cho, and Ethan Perez.","author":"Chen Angelica","year":"2023","unstructured":"Angelica Chen, J\u00e9r\u00e9my Scheurer, Tomasz Korbak, Jon Ander Campos, Jun Shern Chan, Samuel R Bowman, Kyunghyun Cho, and Ethan Perez. 2023. Improving code generation by training with natural language feedback. arXiv preprint arXiv:2303.16749."},{"key":"e_1_2_1_9_1","volume-title":"2023 IEEE\/ACM 45th International Conference on Software Engineering (ICSE). 2362\u20132373","author":"Chen Binger","year":"2023","unstructured":"Binger Chen and Ziawasch Abedjan. 2023. DUETCS: Code Style Transfer through Generation and Retrieval. In 2023 IEEE\/ACM 45th International Conference on Software Engineering (ICSE). 2362\u20132373."},{"key":"e_1_2_1_10_1","volume-title":"Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, and Greg Brockman.","author":"Chen Mark","year":"2021","unstructured":"Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde De Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, and Greg Brockman. 2021. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374."},{"key":"e_1_2_1_11_1","volume-title":"2022 26th International Conference on Engineering of Complex Computer Systems (ICECCS). 173\u2013182","author":"Chen Penglong","year":"2022","unstructured":"Penglong Chen, Zhen Li, Yu Wen, and Lili Liu. 2022. Generating adversarial source programs using important tokens-based structural transformations. In 2022 26th International Conference on Engineering of Complex Computer Systems (ICECCS). 173\u2013182."},{"key":"e_1_2_1_12_1","unstructured":"Anton Cheshkov Pavel Zadorozhny and Rodion Levichev. 2023. Evaluation of chatgpt model for vulnerability detection. arXiv preprint arXiv:2304.07232."},{"key":"e_1_2_1_13_1","volume-title":"A coefficient of agreement for nominal scales. Educational and psychological measurement, 20, 1","author":"Cohen Jacob","year":"1960","unstructured":"Jacob Cohen. 1960. A coefficient of agreement for nominal scales. Educational and psychological measurement, 20, 1 (1960), 37\u201346."},{"key":"e_1_2_1_14_1","doi-asserted-by":"crossref","unstructured":"Domenico Cotroneo Cristina Improta Pietro Liguori and Roberto Natella. 2023. Vulnerabilities in ai code generators: Exploring targeted data poisoning attacks. arXiv preprint arXiv:2308.04451.","DOI":"10.1145\/3643916.3644416"},{"key":"e_1_2_1_15_1","unstructured":"Yihong Dong Xue Jiang Zhi Jin and Ge Li. 2023. Self-collaboration Code Generation via ChatGPT. arXiv preprint arXiv:2304.07590."},{"key":"e_1_2_1_16_1","volume-title":"Classeval: A manually-crafted benchmark for evaluating llms on class-level code generation. arXiv preprint arXiv:2308.01861.","author":"Du Xueying","year":"2023","unstructured":"Xueying Du, Mingwei Liu, Kaixin Wang, Hanlin Wang, Junwei Liu, Yixuan Chen, Jiayi Feng, Chaofeng Sha, Xin Peng, and Yiling Lou. 2023. Classeval: A manually-crafted benchmark for evaluating llms on class-level code generation. arXiv preprint arXiv:2308.01861."},{"key":"e_1_2_1_17_1","unstructured":"Jing Gong Yanghui Wu Linxi Liang Zibin Zheng and Yanlin Wang. 2024. CoSQA+: Enhancing Code Search Dataset with Matching Code. arXiv preprint arXiv:2406.11589."},{"key":"e_1_2_1_18_1","unstructured":"Daya Guo Qihao Zhu Dejian Yang Zhenda Xie Kai Dong Wentao Zhang Guanting Chen Xiao Bi Y Wu and YK Li. 2024. DeepSeek-Coder: When the Large Language Model Meets Programming\u2013The Rise of Code Intelligence. arXiv preprint arXiv:2401.14196."},{"key":"e_1_2_1_19_1","unstructured":"Daya Guo Qihao Zhu Dejian Yang Zhenda Xie Kai Dong Wentao Zhang Guanting Chen Xiao Bi Y Wu and YK Li. 2024. DeepSeek-Coder: When the Large Language Model Meets Programming\u2013The Rise of Code Intelligence. arXiv preprint arXiv:2401.14196."},{"key":"e_1_2_1_20_1","volume-title":"Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis. 1073\u20131085","author":"Guo Lianghong","year":"2024","unstructured":"Lianghong Guo, Yanlin Wang, Ensheng Shi, Wanjun Zhong, Hongyu Zhang, Jiachi Chen, Ruikai Zhang, Yuchi Ma, and Zibin Zheng. 2024. When to stop? towards efficient code generation in llms with excess token prevention. In Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis. 1073\u20131085."},{"key":"e_1_2_1_21_1","unstructured":"Rajarshi Haldar and Julia Hockenmaier. 2024. Analyzing the performance of large language models on code summarization. arXiv preprint arXiv:2404.08018."},{"key":"e_1_2_1_22_1","volume-title":"Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING","author":"Hu Fan","year":"2024","unstructured":"Fan Hu, Yanlin Wang, Lun Du, Hongyu Zhang, Dongmei Zhang, and Xirong Li. 2024. Tackling Long Code Search with Splitting, Encoding, and Aggregating. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). 15500\u201315510."},{"key":"e_1_2_1_23_1","unstructured":"Baizhou Huang Shuai Lu Weizhu Chen Xiaojun Wan and Nan Duan. 2023. Enhancing Large Language Models in Coding Through Multi-Perspective Self-Consistency. arXiv preprint arXiv:2309.17272."},{"key":"e_1_2_1_24_1","volume-title":"Proceedings of the 2024 IEEE\/ACM 46th International Conference on Software Engineering: Companion Proceedings. 270\u2013271","author":"Huang Tao","year":"2024","unstructured":"Tao Huang, Zhihong Sun, Zhi Jin, Ge Li, and Chen Lyu. 2024. KareCoder: A New Knowledge-Enriched Code Generation System. In Proceedings of the 2024 IEEE\/ACM 46th International Conference on Software Engineering: Companion Proceedings. 270\u2013271."},{"key":"e_1_2_1_25_1","unstructured":"Naman Jain Tianjun Zhang Wei-Lin Chiang Joseph E Gonzalez Koushik Sen and Ion Stoica. 2023. LLM-Assisted Code Cleaning For Training Accurate Code Generators. arXiv preprint arXiv:2311.14904."},{"key":"e_1_2_1_26_1","doi-asserted-by":"crossref","unstructured":"Hui Jiang Chulun Zhou Fandong Meng Biao Zhang Jie Zhou Degen Huang Qingqiang Wu and Jinsong Su. 2021. Exploring dynamic selection of branch expansion orders for code generation. arXiv preprint arXiv:2106.00261.","DOI":"10.18653\/v1\/2021.acl-long.394"},{"key":"e_1_2_1_27_1","unstructured":"Shuyang Jiang Yuhao Wang and Yu Wang. 2023. SelfEvolve: A Code Evolution Framework via Large Language Models. arXiv preprint arXiv:2306.02907."},{"key":"e_1_2_1_28_1","unstructured":"Xue Jiang Yihong Dong Lecheng Wang Qiwei Shang and Ge Li. 2023. Self-planning code generation with large language model. arXiv preprint arXiv:2303.06689."},{"key":"e_1_2_1_29_1","volume-title":"Open coding","author":"Khandkar Shahedul Huq","year":"2009","unstructured":"Shahedul Huq Khandkar. 2009. Open coding. University of Calgary, 23, 2009 (2009)."},{"key":"e_1_2_1_30_1","volume-title":"Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop). 130\u2013137","author":"Kondo Mizuki","year":"2024","unstructured":"Mizuki Kondo, Daisuke Kawahara, and Toshiyuki Kurabayashi. 2024. Improving Repository-level Code Search with Text Conversion. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop). 130\u2013137."},{"key":"e_1_2_1_31_1","first-page":"21314","article-title":"Coderl: Mastering code generation through pretrained models and deep reinforcement learning","volume":"35","author":"Le Hung","year":"2022","unstructured":"Hung Le, Yue Wang, Akhilesh Deepak Gotmare, Silvio Savarese, and Steven Chu Hong Hoi. 2022. Coderl: Mastering code generation through pretrained models and deep reinforcement learning. Advances in Neural Information Processing Systems, 35 (2022), 21314\u201321328.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_2_1_32_1","unstructured":"Jia Li Ge Li Yongmin Li and Zhi Jin. 2023. Structured Chain-of-Thought Prompting for Code Generation. arXiv preprint arXiv:2305.06599."},{"key":"e_1_2_1_33_1","unstructured":"Jia Li Ge Li Chongyang Tao Huangzhao Zhang Fang Liu and Zhi Jin. 2023. Large Language Model-Aware In-Context Learning for Code Generation. arXiv preprint arXiv:2310.09748."},{"key":"e_1_2_1_34_1","doi-asserted-by":"crossref","unstructured":"Jia Li Ge Li Xuanming Zhang Yihong Dong and Zhi Jin. 2024. EvoCodeBench: An Evolving Code Generation Benchmark Aligned with Real-World Code Repositories. arXiv preprint arXiv:2404.00599.","DOI":"10.18653\/v1\/2024.findings-acl.214"},{"key":"e_1_2_1_35_1","doi-asserted-by":"crossref","unstructured":"Jia Li Ge Li Yunfei Zhao Yongmin Li Huanyu Liu Hao Zhu Lecheng Wang Kaibo Liu Zheng Fang and Lanshen Wang. 2024. DevEval: A Manually-Annotated Code Generation Benchmark Aligned with Real-World Code Repositories. arXiv e-prints arXiv\u20132405.","DOI":"10.18653\/v1\/2024.findings-acl.214"},{"key":"e_1_2_1_36_1","volume-title":"Skcoder: A sketch-based approach for automatic code generation. arXiv preprint arXiv:2302.06144.","author":"Li Jia","year":"2023","unstructured":"Jia Li, Yongmin Li, Ge Li, Zhi Jin, Yiyang Hao, and Xing Hu. 2023. Skcoder: A sketch-based approach for automatic code generation. arXiv preprint arXiv:2302.06144."},{"key":"e_1_2_1_37_1","volume-title":"Proceedings of the 32nd IEEE\/ACM International Conference on Program Comprehension. 47\u201351","author":"Li Jiliang","year":"2024","unstructured":"Jiliang Li, Yifan Zhang, Zachary Karas, Collin McMillan, Kevin Leach, and Yu Huang. 2024. Do Machines and Humans Focus on Similar Code? Exploring Explainability of Large Language Models in Code Summarization. In Proceedings of the 32nd IEEE\/ACM International Conference on Program Comprehension. 47\u201351."},{"key":"e_1_2_1_38_1","unstructured":"Jia Li Yunfei Zhao Yongmin Li Ge Li and Zhi Jin. 2023. Towards Enhancing In-Context Learning for Code Generation. arXiv preprint arXiv:2303.17780."},{"key":"e_1_2_1_39_1","volume-title":"Yangtian Zi, Niklas Muennighoff, Denis Kocetkov, Chenghao Mou, Marc Marone, Christopher Akiki, Jia Li, and Jenny Chim.","author":"Li Raymond","year":"2023","unstructured":"Raymond Li, Loubna Ben Allal, Yangtian Zi, Niklas Muennighoff, Denis Kocetkov, Chenghao Mou, Marc Marone, Christopher Akiki, Jia Li, and Jenny Chim. 2023. Starcoder: may the source be with you!. arXiv preprint arXiv:2305.06161."},{"key":"e_1_2_1_40_1","unstructured":"Xin-Ye Li Jiang-Tian Xue Zheng Xie and Ming Li. 2023. Think Outside the Code: Brainstorming Boosts Large Language Models in Code Generation. arXiv preprint arXiv:2305.10679."},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/3671016.3674819"},{"key":"e_1_2_1_42_1","unstructured":"Zehan Li Jianfei Zhang Chuantao Yin Yuanxin Ouyang and Wenge Rong. 2024. ProCQA: A Large-scale Community-based Programming Question Answering Dataset for Code Search. arXiv preprint arXiv:2403.16702."},{"key":"e_1_2_1_43_1","unstructured":"Junwei Liu Yixuan Chen Mingwei Liu Xin Peng and Yiling Lou. 2024. STALL+: Boosting LLM-based Repository-level Code Completion with Static Analysis. arXiv preprint arXiv:2406.10018."},{"key":"e_1_2_1_44_1","volume-title":"Rigorous Evaluation of Large Language Models for Code Generation. In Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023","author":"Liu Jiawei","year":"2023","unstructured":"Jiawei Liu, Chunqiu Steven Xia, Yuyao Wang, and Lingming Zhang. 2023. Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation. In Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10 - 16, 2023, Alice Oh, Tristan Naumann, Amir Globerson, Kate Saenko, Moritz Hardt, and Sergey Levine (Eds.). http:\/\/papers.nips.cc\/paper_files\/paper\/2023\/hash\/43e9d647ccd3e4b7b5baab53f0368686-Abstract-Conference.html"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/ASE56229.2023.00159"},{"key":"e_1_2_1_46_1","unstructured":"Cristina V Lopes Vanessa I Klotzman Iris Ma and Iftekar Ahmed. 2024. Commit Messages in the Age of Large Language Models. arXiv preprint arXiv:2401.17622."},{"key":"e_1_2_1_47_1","volume-title":"COMET: Generating Commit Messages using Delta Graph Context Representation. arXiv preprint arXiv:2402.01841.","author":"Mandli Abhinav Reddy","year":"2024","unstructured":"Abhinav Reddy Mandli, Saurabhsingh Rajput, and Tushar Sharma. 2024. COMET: Generating Commit Messages using Delta Graph Context Representation. arXiv preprint arXiv:2402.01841."},{"key":"e_1_2_1_48_1","volume-title":"2019 IEEE\/ACM 16th International Conference on Mining Software Repositories (MSR). 468\u2013478","author":"Markovtsev Vadim","year":"2019","unstructured":"Vadim Markovtsev, Waren Long, Hugo Mougard, Konstantin Slavnov, and Egor Bulychev. 2019. STYLE-ANALYZER: fixing code style inconsistencies with interpretable unsupervised algorithms. In 2019 IEEE\/ACM 16th International Conference on Mining Software Repositories (MSR). 468\u2013478."},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/2972958.2972963"},{"key":"e_1_2_1_50_1","unstructured":"Fangwen Mu Lin Shi Song Wang Zhuohao Yu Binquan Zhang Chenxue Wang Shichao Liu and Qing Wang. 2023. ClarifyGPT: Empowering LLM-based Code Generation with Intention Clarification. arXiv preprint arXiv:2310.10996."},{"key":"e_1_2_1_51_1","volume-title":"International Conference on Machine Learning. 26106\u201326128","author":"Ni Ansong","year":"2023","unstructured":"Ansong Ni, Srini Iyer, Dragomir Radev, Veselin Stoyanov, Wen-tau Yih, Sida Wang, and Xi Victoria Lin. 2023. Lever: Learning to verify language-to-code generation with execution. In International Conference on Machine Learning. 26106\u201326128."},{"key":"e_1_2_1_52_1","volume-title":"Codegen: An open large language model for code with multi-turn program synthesis. arXiv preprint arXiv:2203.13474.","author":"Nijkamp Erik","year":"2022","unstructured":"Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, and Caiming Xiong. 2022. Codegen: An open large language model for code with multi-turn program synthesis. arXiv preprint arXiv:2203.13474."},{"key":"e_1_2_1_53_1","unstructured":"Sanghak Oh Kiho Lee Seonhye Park Doowon Kim and Hyoungshick Kim. 2023. Poisoned ChatGPT Finds Work for Idle Hands: Exploring Developers\u2019 Coding Practices with Insecure Suggestions from Poisoned AI Models. arXiv preprint arXiv:2312.06227."},{"key":"e_1_2_1_54_1","volume-title":"Chenglong Wang, Jianfeng Gao, and Armando Solar-Lezama.","author":"Olausson Theo X","year":"2023","unstructured":"Theo X Olausson, Jeevana Priya Inala, Chenglong Wang, Jianfeng Gao, and Armando Solar-Lezama. 2023. Demystifying GPT Self-Repair for Code Generation. arXiv preprint arXiv:2306.09896."},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1145\/100348.100385"},{"key":"e_1_2_1_56_1","unstructured":"OpenAI. 2021. OpenAI Code. https:\/\/openai.com\/blog\/openai-code"},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1145\/2997364.2997383"},{"key":"e_1_2_1_58_1","unstructured":"Huy N Phan Hoang N Phan Tien N Nguyen and Nghi DQ Bui. 2024. RepoHyper: Better Context Retrieval Is All You Need for Repository-Level Code Completion. arXiv preprint arXiv:2403.06095."},{"key":"e_1_2_1_59_1","unstructured":"F Rohlf. 1981. Biometry the principles and practice of statistics in biological research."},{"key":"e_1_2_1_60_1","volume-title":"Yossi Adi, Jingyu Liu, Tal Remez, and J\u00e9r\u00e9my Rapin.","author":"Roziere Baptiste","year":"2023","unstructured":"Baptiste Roziere, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Tal Remez, and J\u00e9r\u00e9my Rapin. 2023. Code llama: Open foundation models for code. arXiv preprint arXiv:2308.12950."},{"key":"e_1_2_1_61_1","article-title":"An empirical evaluation of using large language models for automated unit test generation","author":"Sch\u00e4fer Max","year":"2023","unstructured":"Max Sch\u00e4fer, Sarah Nadi, Aryaz Eghbali, and Frank Tip. 2023. An empirical evaluation of using large language models for automated unit test generation. IEEE Transactions on Software Engineering.","journal-title":"IEEE Transactions on Software Engineering."},{"key":"e_1_2_1_62_1","unstructured":"Ensheng Shi Yanlin Wang Hongyu Zhang Lun Du Shi Han Dongmei Zhang and Hongbin Sun. 2023. Towards Efficient Fine-tuning of Pre-trained Code Models: An Experimental Study and Beyond. arXiv preprint arXiv:2304.05216."},{"key":"e_1_2_1_63_1","volume-title":"Noshin Ulfat, FA Rifat, and V Carvalho Lopes.","author":"Siddiq Mohammed Latif","year":"2023","unstructured":"Mohammed Latif Siddiq, Joanna Santos, Ridwanul Hasan Tanvir, Noshin Ulfat, FA Rifat, and V Carvalho Lopes. 2023. Exploring the effectiveness of large language models in generating unit tests. arXiv preprint arXiv:2305.00418."},{"key":"e_1_2_1_64_1","first-page":"354","article-title":"Python\u2013the fastest growing programming language","volume":"4","author":"Srinath KR","year":"2017","unstructured":"KR Srinath. 2017. Python\u2013the fastest growing programming language. International Research Journal of Engineering and Technology, 4, 12 (2017), 354\u2013357.","journal-title":"International Research Journal of Engineering and Technology"},{"key":"e_1_2_1_65_1","doi-asserted-by":"crossref","first-page":"22","DOI":"10.1007\/s10515-024-00421-4","article-title":"Distilled GPT for source code summarization","volume":"31","author":"Su Chia-Yi","year":"2024","unstructured":"Chia-Yi Su and Collin McMillan. 2024. Distilled GPT for source code summarization. Automated Software Engineering, 31, 1 (2024), 22.","journal-title":"Automated Software Engineering"},{"key":"e_1_2_1_66_1","unstructured":"Weisong Sun Chunrong Fang Yudu You Yun Miao Yi Liu Yuekang Li Gelei Deng Shenghan Huang Yuchen Chen and Quanjun Zhang. 2023. Automatic code summarization via chatgpt: How far are we? arXiv preprint arXiv:2305.12865."},{"key":"e_1_2_1_67_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE-Companion58688.2023.00089"},{"key":"e_1_2_1_68_1","unstructured":"Karl Tamberg and Hayretdin Bahsi. 2024. Harnessing Large Language Models for Software Vulnerability Detection: A Comprehensive Benchmarking Study. arxiv:2405.15614. arxiv:2405.15614"},{"key":"e_1_2_1_69_1","volume-title":"KADEL: Knowledge-Aware Denoising Learning for Commit Message Generation. ACM Transactions on Software Engineering and Methodology.","author":"Tao Wei","year":"2024","unstructured":"Wei Tao, Yucheng Zhou, Yanlin Wang, Hongyu Zhang, Haofen Wang, and Wenqiang Zhang. 2024. KADEL: Knowledge-Aware Denoising Learning for Commit Message Generation. ACM Transactions on Software Engineering and Methodology."},{"key":"e_1_2_1_70_1","volume-title":"Structcoder: Structure-aware transformer for code generation. ACM Transactions on Knowledge Discovery from Data, 18, 3","author":"Tipirneni Sindhu","year":"2024","unstructured":"Sindhu Tipirneni, Ming Zhu, and Chandan K Reddy. 2024. Structcoder: Structure-aware transformer for code generation. ACM Transactions on Knowledge Discovery from Data, 18, 3 (2024), 1\u201320."},{"key":"e_1_2_1_71_1","unstructured":"Shubham Ugare Tarun Suresh Hangoo Kang Sasa Misailovic and Gagandeep Singh. 2024. Improving llm code generation with grammar augmentation. arXiv preprint arXiv:2403.01632."},{"key":"e_1_2_1_72_1","volume-title":"You Augment Me: Exploring ChatGPT-based Data Augmentation for Semantic Code Search. In 2023 IEEE International Conference on Software Maintenance and Evolution (ICSME). 14\u201325","author":"Wang Yanlin","year":"2023","unstructured":"Yanlin Wang, Lianghong Guo, Ensheng Shi, Wenqing Chen, Jiachi Chen, Wanjun Zhong, Menghan Wang, Hui Li, Hongyu Zhang, and Ziyu Lyu. 2023. You Augment Me: Exploring ChatGPT-based Data Augmentation for Semantic Code Search. In 2023 IEEE International Conference on Software Maintenance and Evolution (ICSME). 14\u201325."},{"key":"e_1_2_1_73_1","doi-asserted-by":"crossref","unstructured":"Yanlin Wang Yanxian Huang Daya Guo Hongyu Zhang and Zibin Zheng. 2024. SparseCoder: Identifier-Aware Sparse Transformer for File-Level Code Summarization. arXiv preprint arXiv:2401.14727.","DOI":"10.1109\/SANER60148.2024.00068"},{"key":"e_1_2_1_74_1","volume-title":"Rlcoder: Reinforcement learning for repository-level code completion. arXiv preprint arXiv:2407.19487.","author":"Wang Yanlin","year":"2024","unstructured":"Yanlin Wang, Yanli Wang, Daya Guo, Jiachi Chen, Ruikai Zhang, Yuchi Ma, and Zibin Zheng. 2024. Rlcoder: Reinforcement learning for repository-level code completion. arXiv preprint arXiv:2407.19487."},{"key":"e_1_2_1_75_1","unstructured":"Ziliang Wang Ge Li Jia Li Yingfei Xiong and Zhi Jin. 2024. M2CVD: Multi-Model Collaboration for Code Vulnerability Detection. arXiv preprint arXiv:2406.05940."},{"key":"e_1_2_1_76_1","unstructured":"Zejun Wang Jia Li Ge Li and Zhi Jin. 2023. ChatCoder: Chat-based Refine Requirement Improves LLMs\u2019 Code Generation. arXiv preprint arXiv:2311.00272."},{"key":"e_1_2_1_77_1","unstructured":"Zhuokui Xie Yinghao Chen Chen Zhi Shuiguang Deng and Jianwei Yin. 2023. ChatUniTest: a ChatGPT-based automated unit test generation tool. arXiv preprint arXiv:2305.04764."},{"key":"e_1_2_1_78_1","doi-asserted-by":"crossref","unstructured":"Prateek Yadav Qing Sun Hantian Ding Xiaopeng Li Dejiao Zhang Ming Tan Xiaofei Ma Parminder Bhatia Ramesh Nallapati and Murali Krishna Ramanathan. 2023. Exploring continual learning for code generation models. arXiv preprint arXiv:2307.02435.","DOI":"10.18653\/v1\/2023.acl-short.68"},{"key":"e_1_2_1_79_1","unstructured":"Aidan Z. H. Yang Haoye Tian He Ye Ruben Martins and Claire Le Goues. 2024. Security Vulnerability Detection with Multitask Self-Instructed Fine-Tuning of Large Language Models. arxiv:2406.05892. arxiv:2406.05892"},{"key":"e_1_2_1_80_1","unstructured":"Rafed Muhammad Yasir and Dr Ahmedul Kabir. 2022. Exploring the Impact of Code Style in Identifying Good Programmers. arXiv preprint arXiv:2206.10891."},{"key":"e_1_2_1_81_1","doi-asserted-by":"publisher","DOI":"10.1145\/3597503.3623316"},{"key":"e_1_2_1_82_1","unstructured":"Zhiqiang Yuan Junwei Liu Qiancheng Zi Mingwei Liu Xin Peng and Yiling Lou. 2023. Evaluating instruction-tuned large language models on code comprehension and generation. arXiv preprint arXiv:2308.01240."},{"key":"e_1_2_1_83_1","unstructured":"Zhiqiang Yuan Yiling Lou Mingwei Liu Shiji Ding Kaixin Wang Yixuan Chen and Xin Peng. 2023. No more manual tests? evaluating and improving chatgpt for unit test generation. arXiv preprint arXiv:2305.04207."},{"key":"e_1_2_1_84_1","unstructured":"Imam Nur Bani Yusuf and Lingxiao Jiang. 2024. Your Instructions Are Not Always Helpful: Assessing the Efficacy of Instruction Fine-tuning for Software Vulnerability Detection. arxiv:2401.07466. arxiv:2401.07466"},{"key":"e_1_2_1_85_1","unstructured":"Daoguang Zan Bei Chen Yongshun Gong Junzhi Cao Fengji Zhang Bingchao Wu Bei Guan Yilong Yin and Yongji Wang. 2023. Private-library-oriented code generation with large language models. arXiv preprint arXiv:2307.15370."},{"key":"e_1_2_1_86_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.acl-long.411"},{"key":"e_1_2_1_87_1","doi-asserted-by":"crossref","unstructured":"Fengji Zhang Bei Chen Yue Zhang Jacky Keung Jin Liu Daoguang Zan Yi Mao Jian-Guang Lou and Weizhu Chen. 2023. RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation. arxiv:2303.12570.","DOI":"10.18653\/v1\/2023.emnlp-main.151"},{"key":"e_1_2_1_88_1","doi-asserted-by":"publisher","DOI":"10.1145\/3511887"},{"key":"e_1_2_1_89_1","volume-title":"Self-Edit: Fault-Aware Code Editor","author":"Zhang Kechi","unstructured":"Kechi Zhang, Zhuo Li, Jia Li, Ge Li, and Zhi Jin. 2023. Self-Edit: Fault-Aware Code Editor for Code Generation. arXiv preprint arXiv:2305.04087."},{"key":"e_1_2_1_90_1","volume-title":"Planning with Large Language Models for Code Generation. In The Eleventh International Conference on Learning Representations.","author":"Zhang Shun","year":"2023","unstructured":"Shun Zhang, Zhenfang Chen, Yikang Shen, Mingyu Ding, Joshua B. Tenenbaum, and Chuang Gan. 2023. Planning with Large Language Models for Code Generation. In The Eleventh International Conference on Learning Representations."},{"key":"e_1_2_1_91_1","article-title":"Automatic commit message generation: A critical review and directions for future work","author":"Zhang Yuxia","year":"2024","unstructured":"Yuxia Zhang, Zhiqing Qiu, Klaas-Jan Stol, Wenhui Zhu, Jiaxin Zhu, Yingchen Tian, and Hui Liu. 2024. Automatic commit message generation: A critical review and directions for future work. IEEE Transactions on Software Engineering.","journal-title":"IEEE Transactions on Software Engineering."},{"key":"e_1_2_1_92_1","doi-asserted-by":"crossref","unstructured":"Zibin Zheng Kaiwen Ning Jiachi Chen Yanlin Wang Wenqing Chen Lianghong Guo and Weicheng Wang. 2023. Towards an understanding of large language models in software engineering tasks. arXiv preprint arXiv:2308.11396.","DOI":"10.1007\/s10664-024-10602-0"},{"key":"e_1_2_1_93_1","unstructured":"Zibin Zheng Kaiwen Ning Yanlin Wang Jingwen Zhang Dewu Zheng Mingxi Ye and Jiachi Chen. 2023. A survey of large language models for code: Evolution benchmarking and future trends. arXiv preprint arXiv:2311.10372."},{"key":"e_1_2_1_94_1","unstructured":"Zibin Zheng Kaiwen Ning Yanlin Wang Jingwen Zhang Dewu Zheng Mingxi Ye and Jiachi Chen. 2023. A Survey of Large Language Models for Code: Evolution Benchmarking and Future Trends. arXiv preprint arXiv:2311.10372."},{"key":"e_1_2_1_95_1","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence. 38","author":"Zhu Yuqi","year":"2024","unstructured":"Yuqi Zhu, Jia Li, Ge Li, YunFei Zhao, Zhi Jin, and Hong Mei. 2024. Hot or Cold? Adaptive Temperature Sampling for Code Generation with Large Language Models. In Proceedings of the AAAI Conference on Artificial Intelligence. 38, 437\u2013445."}],"container-title":["Proceedings of the ACM on Software Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3715749","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T15:20:53Z","timestamp":1750346453000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3715749"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,19]]},"references-count":95,"journal-issue":{"issue":"FSE","published-print":{"date-parts":[[2025,6,19]]}},"alternative-id":["10.1145\/3715749"],"URL":"https:\/\/doi.org\/10.1145\/3715749","relation":{},"ISSN":["2994-970X"],"issn-type":[{"value":"2994-970X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,6,19]]}}}