{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,25]],"date-time":"2026-04-25T06:47:42Z","timestamp":1777099662314,"version":"3.51.4"},"reference-count":113,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2025,4,28]],"date-time":"2025-04-28T00:00:00Z","timestamp":1745798400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62202419"],"award-info":[{"award-number":["62202419"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100012226","name":"Fundamental Research Funds for the Central Universities","doi-asserted-by":"crossref","award":["226-2022-00064"],"award-info":[{"award-number":["226-2022-00064"]}],"id":[{"id":"10.13039\/501100012226","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Zhejiang Provincial Natural Science Foundation of China","award":["LY24F020008"],"award-info":[{"award-number":["LY24F020008"]}]},{"DOI":"10.13039\/100007834","name":"Ningbo Natural Science Foundation","doi-asserted-by":"crossref","award":["2022J184"],"award-info":[{"award-number":["2022J184"]}],"id":[{"id":"10.13039\/100007834","id-type":"DOI","asserted-by":"crossref"}]},{"name":"State Street Zhejiang University Technology Center"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Softw. Eng. Methodol."],"published-print":{"date-parts":[[2025,5,31]]},"abstract":"<jats:p>Large language models (LLMs), such as ChatGPT released by OpenAI, have attracted significant attention from both industry and academia due to their demonstrated ability to generate high-quality content for various tasks. Despite the impressive capabilities of LLMs, there are growing concerns regarding their potential risks in various fields, such as news, education, and software engineering. Recently, several commercial and open source LLM-generated content detectors have been proposed, which, however, are primarily designed for detecting natural language content without considering the specific characteristics of program code. This article aims to fill this gap by proposing a novel ChatGPT-generated code detector, CodeGPTSensor, based on a contrastive learning framework and a semantic encoder built with UniXcoder. To assess the effectiveness of CodeGPTSensor on differentiating ChatGPT-generated code from human-written code, we first curate a large-scale Human and Machine comparison Corpus (HMCorp), which includes 550k pairs of human-written and ChatGPT-generated code (i.e., 288k Python code pairs and 222k Java code pairs). Based on the HMCorp dataset, our qualitative and quantitative analysis of the characteristics of ChatGPT-generated code reveals the challenge and opportunity of distinguishing ChatGPT-generated code from human-written code with their representative features. Our experimental results indicate that CodeGPTSensor can effectively identify ChatGPT-generated code, outperforming all selected baselines.<\/jats:p>","DOI":"10.1145\/3705300","type":"journal-article","created":{"date-parts":[[2025,1,27]],"date-time":"2025-01-27T15:53:31Z","timestamp":1737993211000},"page":"1-31","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":10,"title":["Distinguishing LLM-Generated from Human-Written Code by Contrastive Learning"],"prefix":"10.1145","volume":"34","author":[{"ORCID":"https:\/\/orcid.org\/0009-0001-4066-0535","authenticated-orcid":false,"given":"Xiaodan","family":"Xu","sequence":"first","affiliation":[{"name":"State Key Laboratory of Blockchain and Data Security, Zhejiang University, Hangzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2906-0598","authenticated-orcid":false,"given":"Chao","family":"Ni","sequence":"additional","affiliation":[{"name":"State Key Laboratory of Blockchain and Data Security, Zhejiang University, Hangzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-0499-0222","authenticated-orcid":false,"given":"Xinrong","family":"Guo","sequence":"additional","affiliation":[{"name":"State Key Laboratory of Blockchain and Data Security, Zhejiang University, Hangzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-6996-9479","authenticated-orcid":false,"given":"Shaoxuan","family":"Liu","sequence":"additional","affiliation":[{"name":"State Key Laboratory of Blockchain and Data Security, Zhejiang University, Hangzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-3706-0663","authenticated-orcid":false,"given":"Xiaoya","family":"Wang","sequence":"additional","affiliation":[{"name":"State Key Laboratory of Blockchain and Data Security, Zhejiang University, Hangzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0145-615X","authenticated-orcid":false,"given":"Kui","family":"Liu","sequence":"additional","affiliation":[{"name":"Software Engineering Application Technology Lab, Huawei, Hangzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4111-4189","authenticated-orcid":false,"given":"Xiaohu","family":"Yang","sequence":"additional","affiliation":[{"name":"State Key Laboratory of Blockchain and Data Security, Zhejiang University, Hangzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,4,28]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"crossref","unstructured":"Wasi Uddin Ahmad Saikat Chakraborty Baishakhi Ray and Kai-Wei Chang. 2021. Unified pre-training for program understanding and generation. arXiv:2103.06333. Retrieved from https:\/\/arxiv.org\/abs\/2103.06333.","DOI":"10.18653\/v1\/2021.naacl-main.211"},{"key":"e_1_3_2_3_2","unstructured":"Apache. 2024. fury. Retrieved from https:\/\/github.com\/apache\/fury"},{"key":"e_1_3_2_4_2","unstructured":"Apache. 2024. Retrieved from https:\/\/github.com\/apache\/hertzbeat"},{"key":"e_1_3_2_5_2","unstructured":"AI at Meta. 2023. InCoder-6B. Retrieved from https:\/\/huggingface.co\/facebook\/incoder-6B"},{"key":"e_1_3_2_6_2","unstructured":"Adam Bannister. 2021. DevSecAI: GitHub Copilot prone to writing security flaws. Retrieved from https:\/\/portswigger.net\/daily-swig\/devsecai-github-copilot-prone-to-writing-security-flaws"},{"key":"e_1_3_2_7_2","doi-asserted-by":"publisher","DOI":"10.1145\/3655103.3655106"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1145\/3404835.3462840"},{"key":"e_1_3_2_9_2","unstructured":"BurhanUlTayyab. 2023. Pytorch implementation of DetectGPT. Retrieved from https:\/\/github.com\/BurhanUlTayyab\/DetectGPT"},{"key":"e_1_3_2_10_2","unstructured":"Mark Chen Jerry Tworek Heewoo Jun Qiming Yuan Henrique Ponde de Oliveira Pinto Jared Kaplan Harri Edwards Yuri Burda Nicholas Joseph Greg Brockman et al. 2021. Evaluating large language models trained on code. arXiv:2107.03374. Retrieved from https:\/\/arxiv.org\/abs\/2107.03374"},{"key":"e_1_3_2_11_2","unstructured":"Yutian Chen Hao Kang Vivian Zhai Liangze Li Rita Singh and Bhiksha Ramakrishnan. 2023. Gpt-sentinel: Distinguishing human and chatgpt generated content. arXiv:2305.07969. Retrieved from http:\/\/arxiv.org\/abs\/2305.07969"},{"key":"e_1_3_2_12_2","unstructured":"Anton Cheshkov Zadorozhny Pavel and Levichev Rodion. 2023. Evaluation of ChatGPT model for vulnerability detection. arXiv:2304.07232. Retrieved from https:\/\/arxiv.org\/abs\/2304.07232"},{"key":"e_1_3_2_13_2","unstructured":"Kyunghyun Cho Bart Van Merri\u00ebnboer Dzmitry Bahdanau and Yoshua Bengio. 2014. On the properties of neural machine translation: Encoder-decoder approaches. arXiv:1409.1259. Retrieved from https:\/\/arxiv.org\/abs\/1409.1259"},{"key":"e_1_3_2_14_2","doi-asserted-by":"crossref","unstructured":"Jonathan H. Choi Kristin E. Hickman Amy Monahan and Daniel Schwarcz. 2023. Chatgpt goes to law school. SSRN Electronic Journal. Retrieved from https:\/\/doi.org\/10.2139\/ssrn.4335905","DOI":"10.2139\/ssrn.4335905"},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.5555\/3294996.3295184"},{"key":"e_1_3_2_16_2","unstructured":"OpenAI community. 2023. gpt2-medium. Retrieved from https:\/\/huggingface.co\/gpt2-medium"},{"key":"e_1_3_2_17_2","unstructured":"OpenAI community. 2023. RoBERTa base OpenAI detector. Retrieved from https:\/\/huggingface.co\/roberta-base-openai-detector"},{"key":"e_1_3_2_18_2","unstructured":"T5 Community. 2023. t5-large. Retrieved from https:\/\/huggingface.co\/t5-large"},{"key":"e_1_3_2_19_2","unstructured":"Copyleaks. 2023. Copyleaks: AI Content Detector. Retrieved from https:\/\/copyleaks.com\/ai-content-detector"},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE48619.2023.00022"},{"key":"e_1_3_2_21_2","doi-asserted-by":"crossref","first-page":"876","DOI":"10.1109\/COMPSAC57700.2023.00117","volume-title":"2023 IEEE 47th Annual Computers, Software, and Applications Conference (COMPSAC)","author":"Feng Yunhe","year":"2023","unstructured":"Yunhe Feng, Sreecharan Vanam, Manasa Cherukupally, Weijian Zheng, Meikang Qiu, and Haihua Chen. 2023. Investigating Code Generation Performance of Chat-GPT with Crowdsourcing Social Data. In 2023 IEEE 47th Annual Computers, Software, and Applications Conference (COMPSAC). IEEE, 876\u2013885."},{"key":"e_1_3_2_22_2","doi-asserted-by":"crossref","unstructured":"Zhangyin Feng Daya Guo Duyu Tang Nan Duan Xiaocheng Feng Ming Gong Linjun Shou Bing Qin Ting Liu Daxin Jiang et al. 2020. Codebert: A pre-trained model for programming and natural languages. arXiv:2002.08155. Retrieved from https:\/\/arxiv.org\/abs\/2002.08155","DOI":"10.18653\/v1\/2020.findings-emnlp.139"},{"key":"e_1_3_2_23_2","first-page":"608","volume-title":"19th IEEE\/ACM International Conference on Mining Software Repositories","author":"Fu Michael","year":"2022","unstructured":"Michael Fu and Chakkrit Tantithamthavorn. 2022. LineVul: A Transformer-based Line-Level Vulnerability Prediction. In 19th IEEE\/ACM International Conference on Mining Software Repositories, 608\u2013620."},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1145\/3540250.3549098"},{"key":"e_1_3_2_25_2","unstructured":"Tianyu Gao Xingcheng Yao and Danqi Chen. 2021. Simcse: Simple contrastive learning of sentence embeddings. arXiv:2104.08821. Retrieved from https:\/\/arxiv.org\/abs\/2104.08821"},{"key":"e_1_3_2_26_2","doi-asserted-by":"crossref","unstructured":"Sebastian Gehrmann Hendrik Strobelt and Alexander M. Rush. 2019. Gltr: Statistical detection and visualization of generated text. arXiv:1906.04043. Retrieved from https:\/\/arxiv.org\/abs\/1906.04043","DOI":"10.18653\/v1\/P19-3019"},{"key":"e_1_3_2_27_2","unstructured":"GitHub. 2024. Example human-written Java function from QBit in GitHub. Retrieved from https:\/\/github.com\/advantageous\/qbit\/blob\/533b3671785f238d576b02b5290c6525ed60f583\/qbit\/admin\/src\/main\/java\/io\/advantageous\/qbit\/admin\/ManagedServiceBuilder.java#L717-L730"},{"key":"e_1_3_2_28_2","unstructured":"GitHub. 2024. Example human-written Python function from lobocv\/anonymous usage in GitHub. Retrieved from https:\/\/github.com\/lobocv\/anonymoususage\/blob\/847bdad0746ad1cc6c57fb9def201beb59fb8300\/anonymoususage\/tools.py#L67-L78"},{"key":"e_1_3_2_29_2","unstructured":"GitHub. 2024. Example human-written Python function from purr in GitHub. Retrieved from https:\/\/github.com\/ska-sa\/purr\/blob\/4c848768d0485d0f88b30850d0d5372221b21b66\/Purr\/Plugins\/local_pychart\/pychart_util.py#L26-L35"},{"key":"e_1_3_2_30_2","unstructured":"GPTZero. 2023. GPTZero. Retrieved from https:\/\/gptzero.me\/"},{"key":"e_1_3_2_31_2","unstructured":"Biyang Guo Xin Zhang Ziyuan Wang Minqi Jiang Jinran Nie Yuxuan Ding Jianwei Yue and Yupeng Wu. 2023. How close is ChatGPT to human experts? Comparison corpus evaluation and detection. arxiv:2301.07597. Retrieved from https:\/\/arxiv.org\/abs\/2301.07597"},{"key":"e_1_3_2_32_2","doi-asserted-by":"crossref","unstructured":"Daya Guo Shuai Lu Nan Duan Yanlin Wang Ming Zhou and Jian Yin. 2022. UniXcoder: Unified cross-modal pre-training for code representation. arXiv:2203.03850. Retrieved from http:\/\/arxiv.org\/abs\/2203.03850","DOI":"10.18653\/v1\/2022.acl-long.499"},{"key":"e_1_3_2_33_2","unstructured":"Daya Guo Shuo Ren Shuai Lu Zhangyin Feng Duyu Tang Shujie Liu Long Zhou Nan Duan Alexey Svyatkovskiy Shengyu Fu et al. 2020. Graphcodebert: Pre-training code representations with data flow. arXiv:2009.08366. Retrieved from https:\/\/arxiv.org\/abs\/2009.08366"},{"key":"e_1_3_2_34_2","unstructured":"HarryDulaney. [n.d.]. Solutions to Introduction to Java Programming by Y. Daniel Liang. 10th Edition. Retrieved from https:\/\/github.com\/HarryDulaney\/intro-to-java-programming"},{"key":"e_1_3_2_35_2","unstructured":"Hello-SimpleAI. 2023. RoBERTa-QA. Retrieved from https:\/\/huggingface.co\/Hello-SimpleAI\/chatgpt-qa-detector-roberta"},{"key":"e_1_3_2_36_2","unstructured":"Hello-SimpleAI. 2023. RoBERTa-single. Retrieved from https:\/\/huggingface.co\/Hello-SimpleAI\/chatgpt-detector-roberta"},{"key":"e_1_3_2_37_2","doi-asserted-by":"crossref","unstructured":"David Hin Andrey Kan Huaming Chen and M. Ali Babar. 2022. LineVD: Statement-level vulnerability detection using graph neural networks. arXiv:2203.05181. Retrieved from https:\/\/arxiv.org\/abs\/2203.05181","DOI":"10.1145\/3524842.3527949"},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-24261-3_7"},{"key":"e_1_3_2_40_2","first-page":"15077","volume-title":"Proceedings of the 37th International Conference on Neural Information Processing Systems (NIPS \u201923)","author":"Hu Xiaomeng","year":"2023","unstructured":"Xiaomeng Hu, Pin-Yu Chen, and Tsung-Yi Ho. 2023. Radar: Robust AI-text detection via adversarial learning. In Proceedings of the 37th International Conference on Neural Information Processing Systems (NIPS \u201923), 15077\u201315095."},{"key":"e_1_3_2_41_2","unstructured":"Hamel Husain Ho-Hsiang Wu Tiferet Gazit Miltiadis Allamanis and Marc Brockschmidt. 2019. CodeSearchNet challenge: Evaluating the state of semantic code search. arXiv:1909.09436. Retrieved from https:\/\/arxiv.org\/abs\/1909.09436"},{"key":"e_1_3_2_42_2","doi-asserted-by":"crossref","unstructured":"Paras Jain Ajay Jain Tianjun Zhang Pieter Abbeel Joseph E. Gonzalez and Ion Stoica. 2020. Contrastive code representation learning. arXiv:2007.04973. Retrieved from https:\/\/arxiv.org\/abs\/2007.04973","DOI":"10.18653\/v1\/2021.emnlp-main.482"},{"key":"e_1_3_2_43_2","unstructured":"JavaParser. 2024. Retrieved from https:\/\/javaparser.org\/"},{"key":"e_1_3_2_44_2","unstructured":"Rapha\u00ebl Khoury Anderson R. Avila Jacob Brunelle and Baba Mamadou Camara. 2023. How secure is code generated by ChatGPT? arXiv:2304.09655. Retrieved from https:\/\/arxiv.org\/abs\/2304.09655"},{"key":"e_1_3_2_45_2","first-page":"17061","volume-title":"International Conference on Machine Learning","author":"Kirchenbauer John","year":"2023","unstructured":"John Kirchenbauer, Jonas Geiping, Yuxin Wen, Jonathan Katz, Ian Miers, and Tom Goldstein. 2023. A watermark for large language models. In International Conference on Machine Learning. PMLR, 17061\u201317084."},{"key":"e_1_3_2_46_2","first-page":"27469","volume-title":"Proceedings of the 37th International Conference on Neural Information Processing Systems (NIPS \u201923)","author":"Krishna Kalpesh","year":"2023","unstructured":"Kalpesh Krishna, Yixiao Song, Marzena Karpinska, John Wieting, and Mohit Iyyer. 2023. Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense. In Proceedings of the 37th International Conference on Neural Information Processing Systems (NIPS \u201923), 27469\u201327500."},{"issue":"2","key":"e_1_3_2_47_2","doi-asserted-by":"crossref","first-page":"e0000198","DOI":"10.1371\/journal.pdig.0000198","article-title":"Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models","volume":"2","author":"Tiffany H.","year":"2023","unstructured":"H. Tiffany, Morgan Kung, Arielle Cheatham, Czarina Medenilla, Lorie De Sillos, Camille Leon, Maria Elepa\u00f1o, Rimel Madriaga, Giezel Aggabao, James Diaz-Candido, et al. 2023. Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models. PLoS Digital Health 2, 2 (2023), e0000198.","journal-title":"PLoS Digital Health"},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.1038\/nature14539"},{"key":"e_1_3_2_49_2","unstructured":"Taehyun Lee Seokhee Hong Jaewoo Ahn Ilgee Hong Hwaran Lee Sangdoo Yun Jamin Shin and Gunhee Kim. 2023. Who wrote this code? Watermarking for code generation. arXiv:2305.15060. Retrieved from https:\/\/arxiv.org\/abs\/2305.15060"},{"key":"e_1_3_2_50_2","unstructured":"Yujia Li Daniel Tarlow Marc Brockschmidt and Richard Zemel. 2015. Gated graph sequence neural networks. arXiv:1511.05493. Retrieved from https:\/\/arxiv.org\/abs\/1511.05493"},{"key":"e_1_3_2_51_2","doi-asserted-by":"publisher","DOI":"10.1145\/3468264.3468597"},{"key":"e_1_3_2_52_2","doi-asserted-by":"publisher","DOI":"10.1145\/3540250.3549137"},{"key":"e_1_3_2_53_2","volume-title":"Introduction to Java Programming","author":"Liang Y. Daniel","year":"2003","unstructured":"Y. Daniel Liang. 2003. Introduction to Java Programming. Pearson Education India."},{"key":"e_1_3_2_54_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2022.3233901"},{"key":"e_1_3_2_55_2","unstructured":"Xiaoming Liu Zhaohan Zhang Yichen Wang Hang Pu Yu Lan and Chao Shen. 2022. Coco: Coherence-enhanced machine-generated text detection under data limitation with contrastive learning. arXiv:2212.10341. Retrieved from https:\/\/arxiv.org\/abs\/2212.10341"},{"issue":"5","key":"e_1_3_2_56_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3643674","article-title":"Refining ChatGPT-generated code: Characterizing and mitigating code quality issues","volume":"33","author":"Liu Yue","year":"2023","unstructured":"Yue Liu, Thanh Le-Cong, Ratnadira Widyasari, Chakkrit Tantithamthavorn, Li Li, Xuan-Bach D. Le, and David Lo. 2023. Refining ChatGPT-generated code: Characterizing and mitigating code quality issues. ACM Transactions on Software Engineering and Methodology 33, 5 (2023), 1\u201326. arXiv:2307.12596","journal-title":"ACM Transactions on Software Engineering and Methodology"},{"key":"e_1_3_2_57_2","unstructured":"Yinhan Liu Myle Ott Naman Goyal Jingfei Du Mandar Joshi Danqi Chen Omer Levy Mike Lewis Luke Zettlemoyer and Veselin Stoyanov. 2019. Roberta: A robustly optimized bert pretraining approach. arXiv:1907.11692. Retrieved from https:\/\/arxiv.org\/abs\/1907.11692"},{"key":"e_1_3_2_58_2","unstructured":"Zhijie Liu Yutian Tang Xiapu Luo Yuming Zhou and Liang Feng Zhang. 2023. No need to lift a finger anymore? Assessing the quality of code generation by ChatGPT. arXiv:2308.04838. Retrieved from https:\/\/arxiv.org\/abs\/2308.04838"},{"key":"e_1_3_2_59_2","unstructured":"Code Llama. 2023. CodeLlama-7b-hf. Retrieved from https:\/\/huggingface.co\/codellama\/CodeLlama-7b-hf"},{"key":"e_1_3_2_60_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.1976.233837"},{"key":"e_1_3_2_61_2","doi-asserted-by":"publisher","DOI":"10.1007\/BF02295996"},{"key":"e_1_3_2_62_2","unstructured":"MDEGroup. 2024. Implementation of GPTSniffer. Retrieved from https:\/\/github.com\/MDEGroup\/GPTSniffer"},{"key":"e_1_3_2_63_2","unstructured":"Microsoft. 2023. unixcoder-base-nine. Retrieved from https:\/\/huggingface.co\/microsoft\/unixcoder-base-nine"},{"key":"e_1_3_2_64_2","unstructured":"Fatemehsadat Mireshghallah Justus Mattern Sicun Gao Reza Shokri and Taylor Berg-Kirkpatrick. 2023. Smaller language models are better black-box machine-generated text detectors. arXiv:2305.09859. Retrieved from https:\/\/arxiv.org\/abs\/2305.09859"},{"key":"e_1_3_2_65_2","unstructured":"Eric Mitchell Yoonho Lee Alexander Khazatsky Christopher D. Manning and Chelsea Finn. 2023. DetectGPT: Zero-shot machine-generated text detection using probability curvature. arXiv:2301.11305. Retrieved from https:\/\/arxiv.org\/abs\/2301.11305"},{"key":"e_1_3_2_66_2","unstructured":"Madhav Nair Rajat Sadhukhan and Debdeep Mukhopadhyay. 2023. Generating secure hardware using chatgpt resistant to cwes. Cryptology ePrint Archive. Retrieved from https:\/\/eprint.iacr.org\/2023\/212.pdf"},{"key":"e_1_3_2_67_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jss.2024.112059"},{"key":"e_1_3_2_68_2","first-page":"672","volume-title":"Proceedings of the 2022 30th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering","author":"Ni Chao","year":"2022","unstructured":"Chao Ni, Wei Wang, Kaiwen Yang, Xin Xia, Kui Liu, and David Lo. 2022. The best of both worlds: Integrating semantic features with expert features for defect prediction and localization. In Proceedings of the 2022 30th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. ACM, 672\u2013683."},{"key":"e_1_3_2_69_2","doi-asserted-by":"publisher","DOI":"10.1145\/3508479"},{"key":"e_1_3_2_70_2","unstructured":"M. T. Nietzel. 2023. More than half of college students believe using chatgpt to complete assignments is cheating. Retrieved from https:\/\/www.forbes.com\/sites\/michaeltnietzel\/2023\/03\/20\/more-than-half-of-college-students-believe-using-chatgpt-to-complete-assignments-is-cheating\/?sh=5d4d763c18f9"},{"key":"e_1_3_2_71_2","unstructured":"NinedayWang. 2023. PolyCoder-160M. Retrieved from https:\/\/huggingface.co\/NinedayWang\/PolyCoder-160M"},{"key":"e_1_3_2_72_2","unstructured":"OpenAI. 2019. GPT-2 output detector. Retrieved from https:\/\/github.com\/openai\/gpt-2-output-dataset\/tree\/master\/detector"},{"key":"e_1_3_2_73_2","unstructured":"OpenAI. 2022. Introducing ChatGPT. Retrieved from https:\/\/openai.com\/index\/chatgpt"},{"key":"e_1_3_2_74_2","unstructured":"OpenAI. 2023. AI Text Classifier. Retrieved from https:\/\/platform.openai.com\/ai-textclassifier"},{"key":"e_1_3_2_75_2","unstructured":"OpenAI. 2023. New AI classifier for indicating AI-written text. Retrieved from https:\/\/openai.com\/blog\/new-ai-classifier-for-indicating-ai-written-text"},{"key":"e_1_3_2_76_2","unstructured":"OpenAI. 2023. tiktoken. Retrieved from https:\/\/github.com\/openai\/openai-cookbook\/blob\/main\/examples\/How_to_count_tokens_with_tiktoken.ipynb"},{"key":"e_1_3_2_77_2","unstructured":"Openkoda. 2024. Openkoda. Retrieved from https:\/\/github.com\/openkoda\/openkoda"},{"key":"e_1_3_2_78_2","first-page":"27730","article-title":"Training language models to follow instructions with human feedback","volume":"35","author":"Ouyang Long","year":"2022","unstructured":"Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et al. 2022. Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems 35 (2022), 27730\u201327744.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_79_2","unstructured":"Stack Overflow. 2023. Why posting GPT and ChatGPT generated answers is not currently acceptable. Retrieved from https:\/\/stackoverflow.com\/help\/gpt-policy"},{"key":"e_1_3_2_80_2","unstructured":"Wei Hung Pan Ming Jie Chok Jonathan Leong Shan Wong Yung Xin Shin Yeong Shian Poon Zhou Yang Chun Yong Chong David Lo and Mei Kuan Lim. 2024. Assessing AI detectors in identifying AI-generated code: Implications for education. arXiv:2401.03676. Retrieved from https:\/\/arxiv.org\/abs\/2401.03676"},{"key":"e_1_3_2_81_2","unstructured":"Alec Radford Karthik Narasimhan Tim Salimans Ilya Sutskever et al. 2018. Improving language understanding by generative pre-training. Retrieved from https:\/\/openai.com\/index\/language-unsupervised"},{"key":"e_1_3_2_82_2","unstructured":"Sapling. 2023. Sapling. Retrieved from https:\/\/sapling.ai\/ai-content-detector"},{"key":"e_1_3_2_83_2","doi-asserted-by":"publisher","DOI":"10.1145\/3540250.3549145"},{"key":"e_1_3_2_84_2","unstructured":"Irene Solaiman Miles Brundage Jack Clark Amanda Askell Ariel Herbert-Voss Jeff Wu Alec Radford Gretchen Krueger Jong Wook Kim Sarah Kreps et al. 2019. Release strategies and the social impacts of language models. arXiv:1908.09203. Retrieved from https:\/\/arxiv.org\/abs\/1908.09203"},{"key":"e_1_3_2_85_2","unstructured":"statsmodels. 2024. proportions_ztest. Retrieved from https:\/\/www.statsmodels.org\/stable\/generated\/statsmodels.stats.proportion.proportions_ztest.html"},{"key":"e_1_3_2_86_2","first-page":"17","article-title":"Use chat gpt to solve programming bugs","volume":"3","author":"Nigar","year":"2023","unstructured":"Nigar, M. Shafiq Surameery, Mohammed, and Y. Shakor. 2023. Use chat gpt to solve programming bugs. International Journal of Information Technology & Computer Engineering (IJITC) ISSN: 2455-5290 3 (2023), 17\u201322.","journal-title":"International Journal of Information Technology & Computer Engineering (IJITC)"},{"key":"e_1_3_2_87_2","unstructured":"Susnjak Teo. 2022. ChatGPT: The end of online exam integrity? arXiv:2212.09292. Retrieved from https:\/\/arxiv.org\/abs\/2212.09292"},{"key":"e_1_3_2_88_2","unstructured":"Yahya Tashtoush Mohammed Al-Maolegi and Bassam Arkok. 2014. The correlation among software complexity metrics with case study. arXiv:1408.4523. Retrieved from https:\/\/arxiv.org\/abs\/1408.4523"},{"key":"e_1_3_2_89_2","unstructured":"Yuchuan Tian Hanting Chen Xutao Wang Zheyuan Bai Qinghua Zhang Ruifeng Li Chao Xu and Yunhe Wang. 2023. Multiscale positive-unlabeled detection of AI-generated texts. arXiv:2305.18149. Retrieved from https:\/\/arxiv.org\/abs\/2305.18149"},{"key":"e_1_3_2_90_2","unstructured":"TIOBE. 2023. TIOBE Index for April 2023. Retrieved from https:\/\/www.tiobe.com\/tiobe-index\/"},{"key":"e_1_3_2_91_2","doi-asserted-by":"publisher","DOI":"10.5555\/3295222.3295349"},{"key":"e_1_3_2_92_2","unstructured":"Vivek Verma Eve Fleisig Nicholas Tomlin and Dan Klein. 2023. Ghostbuster: Detecting text ghostwritten by large language models. arXiv:2305.15047. Retrieved from https:\/\/arxiv.org\/abs\/2305.15047"},{"key":"e_1_3_2_93_2","doi-asserted-by":"crossref","unstructured":"Yue Wang Weishi Wang Shafiq Joty and Steven C. H. Hoi. 2021. Codet5: Identifier-aware unified pre-trained encoder-decoder models for code understanding and generation. arXiv:2109.00859. Retrieved from https:\/\/arxiv.org\/abs\/2109.00859","DOI":"10.18653\/v1\/2021.emnlp-main.685"},{"key":"e_1_3_2_94_2","volume-title":"Computer Systems That Learn: Classification and Prediction Methods from Statistics, Neural Nets, Machine Learning, and Expert Systems","author":"Sholom","year":"1991","unstructured":"Sholom, M. Weiss, and Casimir A. Kulikowski. 1991. Computer Systems That Learn: Classification and Prediction Methods from Statistics, Neural Nets, Machine Learning, and Expert Systems. Morgan Kaufmann Publishers Inc."},{"key":"e_1_3_2_95_2","unstructured":"wjx.cn. 2023. WJX. Retrieved from https:\/\/www.wjx.cn\/"},{"key":"e_1_3_2_96_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-demos.6"},{"key":"e_1_3_2_97_2","unstructured":"Writer. 2023. Writer: AI Content Detector. Retrieved from https:\/\/writer.com\/ai-content-detector\/"},{"key":"e_1_3_2_98_2","unstructured":"Kangxi Wu Liang Pang Huawei Shen Xueqi Cheng and Tat-Seng Chua. 2023. Llmdet: A large language models detection tool. arXiv:2305.15004. Retrieved from https:\/\/arxiv.org\/abs\/2305.15004"},{"key":"e_1_3_2_99_2","unstructured":"Chunqiu Steven Xia and Lingming Zhang. 2023. Keep the conversation going: Fixing 162 out of 337 bugs for \\(\\$\\) 0.42 each using ChatGPT. arXiv:2304.00385. Retrieved from https:\/\/arxiv.org\/abs\/2304.00385"},{"key":"e_1_3_2_100_2","doi-asserted-by":"publisher","DOI":"10.1145\/3520312.3534862"},{"key":"e_1_3_2_101_2","unstructured":"Xiaodan Xu. 2024. Replication of this paper. Retrieved from https:\/\/github.com\/doriscullen\/CodeGPTSensor"},{"key":"e_1_3_2_102_2","unstructured":"Xianjun Yang. 2024. Codes for paper: Zero-shot detection of machine-generated codes. Retrieved from https:\/\/github.com\/Xianjun-Yang\/Code_detection"},{"key":"e_1_3_2_103_2","unstructured":"Xianjun Yang Wei Cheng Linda Petzold William Yang Wang and Haifeng Chen. 2023. Dna-gpt: Divergent n-gram analysis for training-free detection of gpt-generated text. arXiv:2305.17359. Retrieved from https:\/\/arxiv.org\/abs\/2305.17359"},{"key":"e_1_3_2_104_2","unstructured":"Yang Xianjun Pan Liangming Zhao Xuandong Chen Haifeng Petzold Linda Wang William Yang and Cheng Wei. 2023. A survey on detection of llms-generated content. arXiv:2310.15654. Retrieved from https:\/\/arxiv.org\/abs\/2310.15654"},{"key":"e_1_3_2_105_2","unstructured":"Yang Xianjun Zhang Kexun Chen Haifeng Petzold Linda Wang William Yang and Cheng Wei. 2023. Zero-shot detection of machine-generated codes. arXiv:2310.05103. Retrieved from https:\/\/arxiv.org\/abs\/2310.05103"},{"key":"e_1_3_2_106_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33017378"},{"key":"e_1_3_2_107_2","unstructured":"Terry Yin. 2023. lizard. Retrieved from https:\/\/github.com\/terryyin\/lizard"},{"key":"e_1_3_2_108_2","unstructured":"Yu Xiao Qi Yuang Chen Kejiang Chen Guoqiang Yang Xi Zhu Pengyuan Zhang Weiming and Yu Nenghai. 2023. Gpt paternity test: Gpt generated text detection with gpt genetic inheritance. arXiv:2305.12519. Retrieved from https:\/\/arxiv.org\/abs\/2305.12519"},{"key":"e_1_3_2_109_2","unstructured":"ZeroGPT. 2023. AI Text Detector. Retrieved from https:\/\/www.zerogpt.com"},{"key":"e_1_3_2_110_2","unstructured":"Zhan Haolan He Xuanli Xu Qiongkai Wu Yuxiang and Stenetorp Pontus. 2023. G3detector: General gpt-generated text detector. arXiv:2305.12680. Retrieved from https:\/\/arxiv.org\/abs\/2305.12680"},{"key":"e_1_3_2_111_2","unstructured":"Susan Zhang Stephen Roller Naman Goyal Mikel Artetxe Moya Chen Shuohui Chen Christopher Dewan Mona Diab Xian Li Xi Victoria Lin et al. 2022. Opt: Open pre-trained transformer language models. arXiv:2205.01068. Retrieved from https:\/\/arxiv.org\/abs\/2205.01068"},{"key":"e_1_3_2_112_2","unstructured":"Li Zhong and Zilong Wang. 2023. A study on robustness and reliability of large language model code generation. arXiv:2308.10335. Retrieved from https:\/\/arxiv.org\/abs\/2308.10335"},{"key":"e_1_3_2_113_2","first-page":"10197","volume-title":"Proceedings of the 33rd International Conference on Neural Information Processing Systems","author":"Zhou Yaqin","year":"2019","unstructured":"Yaqin Zhou, Shangqing Liu, Jingkai Siow, Xiaoning Du, and Yang Liu. 2019. Devign: Effective vulnerability identification by learning comprehensive program semantics via graph neural networks. In Proceedings of the 33rd International Conference on Neural Information Processing Systems, 10197\u201310207."},{"key":"e_1_3_2_114_2","unstructured":"Yuxiang Zhu and Minxue Pan. 2019. Automatic code summarization: A systematic literature review. arXiv:1909.04352. Retrieved from https:\/\/arxiv.org\/abs\/1909.04352"}],"container-title":["ACM Transactions on Software Engineering and Methodology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3705300","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3705300","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:18:02Z","timestamp":1750295882000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3705300"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,4,28]]},"references-count":113,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2025,5,31]]}},"alternative-id":["10.1145\/3705300"],"URL":"https:\/\/doi.org\/10.1145\/3705300","relation":{},"ISSN":["1049-331X","1557-7392"],"issn-type":[{"value":"1049-331X","type":"print"},{"value":"1557-7392","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,4,28]]},"assertion":[{"value":"2023-10-18","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-10-14","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-04-28","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}