{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,10]],"date-time":"2026-06-10T16:33:26Z","timestamp":1781109206637,"version":"3.54.1"},"reference-count":83,"publisher":"Springer Science and Business Media LLC","issue":"4","license":[{"start":{"date-parts":[[2025,4,16]],"date-time":"2025-04-16T00:00:00Z","timestamp":1744761600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,4,16]],"date-time":"2025-04-16T00:00:00Z","timestamp":1744761600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Empir Software Eng"],"published-print":{"date-parts":[[2025,7]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>The widespread adoption of conversational LLMs for software development has raised new security concerns regarding the safety of LLM-generated content. Our motivational study outlines ChatGPT\u2019s potential in volunteering context-specific information to the developers, promoting safe coding practices. Motivated by this finding, we conduct a study to evaluate the degree of security awareness exhibited by three prominent LLMs: Claude 3, GPT-4, and Llama 3. We prompt these LLMs with Stack Overflow questions that contain vulnerable code to evaluate whether they merely provide answers to the questions or if they also warn users about the insecure code, thereby demonstrating a degree of security awareness. Further, we assess whether LLM responses provide information about the causes, exploits, and the potential fixes of the vulnerability, to help raise users\u2019 awareness. Our findings show that all three models struggle to accurately detect and warn users about vulnerabilities, achieving a detection rate of only 12.6% to 40% across our datasets. 
We also observe that the LLMs tend to identify certain types of vulnerabilities related to sensitive information exposure and improper input neutralization much more frequently than other types, such as those involving external control of file names or paths. Furthermore, when LLMs do issue security warnings, they often provide more information on the causes, exploits, and fixes of vulnerabilities compared to Stack Overflow responses. Finally, we provide an in-depth discussion on the implications of our findings, and demonstrate a CLI-based prompting tool that can be used to produce more secure LLM responses.<\/jats:p>","DOI":"10.1007\/s10664-025-10658-6","type":"journal-article","created":{"date-parts":[[2025,4,16]],"date-time":"2025-04-16T08:57:04Z","timestamp":1744793824000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":13,"title":["Do LLMs consider security? an empirical study on responses to programming questions"],"prefix":"10.1007","volume":"30","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0188-7005","authenticated-orcid":false,"given":"Amirali","family":"Sajadi","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Binh","family":"Le","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Anh","family":"Nguyen","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Kostadin","family":"Damevski","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Preetha","family":"Chatterjee","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2025,4,16]]},"reference":[{"key":"10658_CR1","unstructured":"AI Stack Exchange (2023) Was ChatGPT Trained on Stack Overflow Data? 
https:\/\/ai.stackexchange.com\/questions\/38660\/was-chatgpt-trained-on-stack-overflow-data. Accessed 07 Oct 2024"},{"key":"10658_CR2","doi-asserted-by":"crossref","unstructured":"Akuthota V, Kasula R, Sumona ST, Mohiuddin M, Reza MT, Rahman MM (2023) Vulnerability detection and monitoring using llm. In: 2023 IEEE 9th International women in engineering (WIE) conference on electrical and computer engineering (WIECON-ECE), pp 309\u2013314. IEEE","DOI":"10.1109\/WIECON-ECE60392.2023.10456393"},{"key":"10658_CR3","unstructured":"Anthropic (2024) Claude. https:\/\/www.anthropic.com\/claude. Accessed 20 May 2024"},{"issue":"6","key":"10658_CR4","doi-asserted-by":"publisher","first-page":"129","DOI":"10.1007\/s10664-023-10380-1","volume":"28","author":"O Asare","year":"2023","unstructured":"Asare O, Nagappan M, Asokan N (2023) Is github\u2019s copilot as bad as humans at introducing vulnerabilities in code? Empir Softw Eng 28(6):129","journal-title":"Empir Softw Eng"},{"key":"10658_CR5","unstructured":"Bakhshandeh A, Keramatfar A, Norouzi A, Chekidehkhoun MM (2023) Using chatgpt as a static application security testing tool. arXiv preprint arXiv:2308.14434"},{"key":"10658_CR6","doi-asserted-by":"publisher","unstructured":"Banerjee S, Sussman M, Lian Y (2023) Dimensional Analysis in Error Reduction for Prediction of Nucleate Boiling Heat Flux by Artificial Neural Networks for Limited Dataset. ASME J Heat Mass Transfer 145(6):061602. https:\/\/doi.org\/10.1115\/1.4056539 https:\/\/asmedigitalcollection.asme.org\/heattransfer\/article-pdf\/145\/6\/061602\/6979081\/ht_145_06_061602.pdf","DOI":"10.1115\/1.4056539"},{"key":"10658_CR7","doi-asserted-by":"crossref","unstructured":"Belzner L, Gabor T, Wirsing M (2023) Large language model assisted software engineering: prospects, challenges, and a case study. In: International conference on bridging the gap between AI and reality, pp 355\u2013374. 
Springer","DOI":"10.1007\/978-3-031-46002-9_23"},{"key":"10658_CR8","doi-asserted-by":"crossref","unstructured":"Chakraborty S, Ahmed T, Ding Y, Devanbu PT, Ray B (2022) Natgen: generative pre-training by \u201cnaturalizing\u201d source code. In: Proceedings of the 30th ACM joint european software engineering conference and symposium on the foundations of software engineering, pp 18\u201330","DOI":"10.1145\/3540250.3549162"},{"key":"10658_CR9","doi-asserted-by":"publisher","unstructured":"Chatterjee P, Damevski K, Pollock L, Augustine V, Kraft NA (2019) Exploratory Study of Slack Q&A Chats as a Mining Source for Software Engineering Tools. In: Proceedings of the 16th international conference on mining software repositories (MSR\u201919). https:\/\/doi.org\/10.1109\/MSR.2019.00075","DOI":"10.1109\/MSR.2019.00075"},{"key":"10658_CR10","doi-asserted-by":"publisher","unstructured":"Chatterjee P, Kong M, Pollock L (2020) Finding help with programming errors: An exploratory study of novice software engineers\u2019 focus in stack overflow posts. J Syst Softw 159:110454. https:\/\/doi.org\/10.1016\/j.jss.2019.110454 http:\/\/www.sciencedirect.com\/science\/article\/pii\/S0164121219302286","DOI":"10.1016\/j.jss.2019.110454"},{"key":"10658_CR11","unstructured":"Cheshkov A, Zadorozhny P, Levichev R (2023) Evaluation of chatgpt model for vulnerability detection. arXiv preprint arXiv:2304.07232"},{"key":"10658_CR12","unstructured":"Chopra B, Singha A, Fariha A, Gulwani S, Parnin C, Tiwari A, Henley AZ (2023) Conversational challenges in ai-powered data science: Obstacles, needs, and design opportunities. arXiv:2310.16164"},{"key":"10658_CR13","unstructured":"CVE Program (2024) Cve - common vulnerabilities and exposures: Metrics. https:\/\/www.cve.org\/about\/Metrics. Accessed 02 May 2025"},{"key":"10658_CR14","unstructured":"Da\u00a0Silva L, Samhi J, Khomh F (2024) Chatgpt vs llama: Impact, reliability, and challenges in stack overflow discussions. 
arXiv preprint arXiv:2402.08801"},{"key":"10658_CR15","doi-asserted-by":"crossref","unstructured":"Das JK, Mondal S, Roy CK (2024) Investigating the utility of chatgpt in the issue tracking system: An exploratory study. In: 2024 IEEE\/ACM 21st International conference on mining software repositories (MSR)","DOI":"10.1145\/3643991.3645083"},{"key":"10658_CR16","doi-asserted-by":"crossref","unstructured":"Delile Z, Radel S, Godinez J, Engstrom G, Brucker T, Young K, Ghanavati S (2023) Evaluating privacy questions from stack overflow: Can chatgpt compete? In: 2023 IEEE 31st International requirements engineering conference workshops (REW), pp 239\u2013244. IEEE","DOI":"10.1109\/REW57809.2023.00048"},{"key":"10658_CR17","unstructured":"Eastman D (2023) How conversational programming will democratize computing. https:\/\/thenewstack.io\/how-conversational-programming-will-democratize-computing\/"},{"key":"10658_CR18","unstructured":"Eli S, Gil D (2023) Self-enhancing pattern detection with llms: Our answer to uncovering malicious packages at scale. https:\/\/apiiro.com\/blog\/llm-code-pattern-malicious-package-detection\/. Accessed 20 May 2024"},{"key":"10658_CR19","doi-asserted-by":"crossref","unstructured":"Falade PV (2023) Decoding the threat landscape: Chatgpt, fraudgpt, and wormgpt in social engineering attacks. arXiv preprint arXiv:2310.05595","DOI":"10.32628\/CSEIT2390533"},{"key":"10658_CR20","doi-asserted-by":"crossref","unstructured":"Fischer F, B\u00f6ttinger K, Xiao H, Stransky C, Acar Y, Backes M, Fahl S (2017) Stack overflow considered harmful? the impact of copy&paste on android application security. In: 2017 IEEE symposium on security and privacy (SP), pp 121\u2013136. IEEE","DOI":"10.1109\/SP.2017.31"},{"key":"10658_CR21","unstructured":"Fu Y, Liang P, Tahir A, Li Z, Shahin M, Yu J, Chen J (2023) Security weaknesses of copilot generated code in github. arXiv preprint arXiv:2310.02059"},{"key":"10658_CR22","unstructured":"GitHub (2022) CodeQL. 
https:\/\/github.com\/github\/codeql"},{"key":"10658_CR23","unstructured":"GitHub and OpenAI (2024) GitHub Copilot \u00b7 Your AI pair programmer. https:\/\/copilot.github.com\/. Accessed 16 Feb 2024"},{"key":"10658_CR24","doi-asserted-by":"crossref","unstructured":"Hamer S, d\u2019Amorim M, Williams L (2024) Just another copy and paste? comparing the security vulnerabilities of chatgpt generated code and stackoverflow answers. arXiv preprint arXiv:2403.15600","DOI":"10.1109\/SPW63631.2024.00014"},{"key":"10658_CR25","doi-asserted-by":"crossref","unstructured":"Hao H, Hasan KA, Qin H, Macedo M, Tian Y, Ding SHH, Hassan AE (2024) An empirical study on developers shared conversations with chatgpt in github pull requests and issues. 2403.10468","DOI":"10.1007\/s10664-024-10540-x"},{"key":"10658_CR26","doi-asserted-by":"crossref","unstructured":"Happe A, Cito J (2023) Getting pwn\u2019d by ai: Penetration testing with large language models. In: Proceedings of the 31st ACM joint european software engineering conference and symposium on the foundations of software engineering, pp 2082\u20132086","DOI":"10.1145\/3611643.3613083"},{"key":"10658_CR27","unstructured":"Henrik P (2023) LLM-assisted malware review: Ai and humans join forces to combat malware. https:\/\/shorturl.at\/loqT4. Accessed 20 May 2024"},{"key":"10658_CR28","unstructured":"Hou X, Zhao Y, Liu Y, Yang Z, Wang K, Li L, Luo X, Lo D, Grundy J, Wang H (2023) Large language models for software engineering: A systematic literature review. arXiv preprint arXiv:2308.10620"},{"key":"10658_CR29","unstructured":"Huang Q, Xia X, Lo D, Murphy GC (2018) Automating intention mining. IEEE Trans Softw Eng 1\u20131"},{"key":"10658_CR30","unstructured":"Hui B, Yang J, Cui Z, Yang J, Liu D, Zhang L, Liu T, Zhang J, Yu B, Lu K et al (2024) Qwen2.5-coder technical report. 
arXiv preprint arXiv:2409.12186"},{"key":"10658_CR31","doi-asserted-by":"publisher","unstructured":"Imran MM, Damevski K (2022) Using clarification questions to improve software developers\u2019 web search. Inf Softw Technol 151(C). https:\/\/doi.org\/10.1016\/j.infsof.2022.107021","DOI":"10.1016\/j.infsof.2022.107021"},{"key":"10658_CR32","unstructured":"Jensen RIT, Tawosi V, Alamir S (2024) Software vulnerability and functionality assessment using llms. arXiv preprint arXiv:2403.08429"},{"key":"10658_CR33","unstructured":"JetBrains (2023) The state of developer ecosystem 2023. https:\/\/www.jetbrains.com\/lp\/devecosystem-2023\/"},{"key":"10658_CR34","doi-asserted-by":"crossref","unstructured":"Jiang N, Liu K, Lutellier T, Tan L (2023) Impact of code language models on automated program repair. In: 2023 IEEE\/ACM 45th International conference on software engineering (ICSE), pp 1430\u20131442. IEEE","DOI":"10.1109\/ICSE48619.2023.00125"},{"key":"10658_CR35","doi-asserted-by":"crossref","unstructured":"Jin M, Shahriar S, Tufano M, Shi X, Lu S, Sundaresan N, Svyatkovskiy A (2023) Inferfix: End-to-end program repair with llms. In: Proceedings of the 31st ACM joint european software engineering conference and symposium on the foundations of software engineering, pp 1646\u20131656","DOI":"10.1145\/3611643.3613892"},{"key":"10658_CR36","doi-asserted-by":"crossref","unstructured":"Kabir S, Udo-Imeh DN, Kou B, Zhang T (2024) Is stack overflow obsolete? an empirical study of the characteristics of chatgpt answers to stack overflow questions. In: Proceedings of the CHI conference on human factors in computing systems, pp 1\u201317","DOI":"10.1145\/3613904.3642596"},{"key":"10658_CR37","doi-asserted-by":"crossref","unstructured":"Khoury R, Avila AR, Brunelle J, Camara BM (2023) How secure is code generated by chatgpt? In: 2023 IEEE International conference on systems, man, and cybernetics (SMC), pp 2445\u20132451. 
IEEE","DOI":"10.1109\/SMC53992.2023.10394237"},{"key":"10658_CR38","unstructured":"Kosinski M (2023) Log4j vulnerability detection and patching. https:\/\/www.ibm.com\/think\/topics\/log4j. Accessed 08 Nov 2024"},{"key":"10658_CR39","doi-asserted-by":"crossref","unstructured":"Licorish SA, Nishatharan T (2021) Contextual profiling of stack overflow java code security vulnerabilities initial insights from a pilot study. In: 2021 IEEE 21st International conference on software quality, reliability and security companion (QRS-C), pp 1060\u20131068. IEEE","DOI":"10.1109\/QRS-C55045.2021.00160"},{"key":"10658_CR40","doi-asserted-by":"crossref","unstructured":"Lill A, Meyer AN, Fritz T (2024) On the helpfulness of answering developer questions on discord with similar conversations and posts from the past. In: Proceedings of the 46th IEEE\/ACM international conference on software engineering, pp 1\u201313","DOI":"10.1145\/3597503.3623341"},{"key":"10658_CR41","unstructured":"Liu J, Tang X, Li L, Chen P, Liu Y (2023a) Which is a better programming assistant? a comparative study between chatgpt and stack overflow. arXiv preprint arXiv:2308.13851"},{"key":"10658_CR42","unstructured":"Liu P, Sun C, Zheng Y, Feng X, Qin C, Wang Y, Li Z, Sun L (2023b) Harnessing the power of llm to support binary taint analysis. arXiv preprint arXiv:2310.08275"},{"key":"10658_CR43","doi-asserted-by":"publisher","DOI":"10.1016\/j.jss.2024.112031","volume":"212","author":"G Lu","year":"2024","unstructured":"Lu G, Ju X, Chen X, Pei W, Cai Z (2024) Grace: Empowering llm-based software vulnerability detection with graph structure and in-context learning. J Syst Softw 212:112031","journal-title":"J Syst Softw"},{"issue":"3","key":"10658_CR44","doi-asserted-by":"publisher","first-page":"276","DOI":"10.11613\/BM.2012.031","volume":"22","author":"ML McHugh","year":"2012","unstructured":"McHugh ML (2012) Interrater reliability: the kappa statistic. 
Biochemia Medica 22(3):276\u2013282","journal-title":"Biochemia Medica"},{"key":"10658_CR45","unstructured":"Meta (2024) LLaMA. https:\/\/llama.meta.com\/ Accessed 20 May 2024"},{"key":"10658_CR46","doi-asserted-by":"publisher","unstructured":"Nam D, Macvean A, Hellendoorn V, Vasilescu B, Myers B (2024) Using an llm to help with code understanding. In: Proceedings of the IEEE\/ACM 46th international conference on software engineering, association for computing machinery. New York, NY, USA, ICSE \u201924. https:\/\/doi.org\/10.1145\/3597503.3639187","DOI":"10.1145\/3597503.3639187"},{"key":"10658_CR47","unstructured":"Noever D (2023) Can large language models find and fix vulnerable software? arXiv preprint arXiv:2308.10345"},{"key":"10658_CR48","unstructured":"OpenAI (2023) Chatgpt. https:\/\/www.openai.com\/"},{"key":"10658_CR49","unstructured":"OpenAI (2024) OpenAI Documentation. https:\/\/platform.openai.com\/docs\/api-reference\/chat\/create. Accessed 02 July 2024"},{"key":"10658_CR50","doi-asserted-by":"crossref","unstructured":"Pan S, Bao L, Ren X, Xia X, Lo D, Li S (2021) Automating developer chat mining. In: 2021 36th IEEE\/ACM International conference on automated software engineering (ASE), pp 854\u2013866. IEEE","DOI":"10.1109\/ASE51524.2021.9678923"},{"key":"10658_CR51","doi-asserted-by":"crossref","unstructured":"Pearce H, Ahmad B, Tan B, Dolan-Gavitt B, Karri R (2022) Asleep at the keyboard? assessing the security of github copilot\u2019s code contributions. In: 2022 IEEE symposium on security and privacy (SP), pp 754\u2013768. IEEE","DOI":"10.1109\/SP46214.2022.9833571"},{"key":"10658_CR52","doi-asserted-by":"crossref","unstructured":"Pearce H, Tan B, Ahmad B, Karri R, Dolan-Gavitt B (2023) Examining zero-shot vulnerability repair with large language models. In: 2023 IEEE symposium on security and privacy (SP), pp 2339\u20132356. 
IEEE","DOI":"10.1109\/SP46215.2023.10179324"},{"key":"10658_CR53","doi-asserted-by":"crossref","unstructured":"Perry N, Srivastava M, Kumar D, Boneh D (2023) Do users write more insecure code with ai assistants? In: Proceedings of the 2023 ACM SIGSAC conference on computer and communications security, pp 2785\u20132799","DOI":"10.1145\/3576915.3623157"},{"key":"10658_CR54","doi-asserted-by":"crossref","unstructured":"Purba MD, Ghosh A, Radford BJ, Chu B (2023) Software vulnerability detection using large language models. In: 2023 IEEE 34th International symposium on software reliability engineering workshops (ISSREW), pp 112\u2013119. IEEE","DOI":"10.1109\/ISSREW60843.2023.00058"},{"key":"10658_CR55","doi-asserted-by":"publisher","unstructured":"Rahman MM, Roy CK, Keivanloo I (2015) Recommending Insightful Comments for Source Code using Crowdsourced Knowledge. In: 2015 IEEE 15th International working conference on source code analysis and manipulation (SCAM), pp 81\u201390. https:\/\/doi.org\/10.1109\/SCAM.2015.7335404","DOI":"10.1109\/SCAM.2015.7335404"},{"key":"10658_CR56","unstructured":"Ramos D, Mamede C, Jain K, Canelas P, Gamboa C, Goues CL (2024) Are large language models memorizing bug benchmarks? arXiv preprint arXiv:2411.13323"},{"key":"10658_CR57","doi-asserted-by":"publisher","first-page":"1192","DOI":"10.1007\/s10664-015-9379-3","volume":"21","author":"C Rosen","year":"2015","unstructured":"Rosen C, Shihab E (2015) What are mobile developers asking about? a large scale study using stack overflow. Empir Softw Eng 21:1192\u20131223","journal-title":"Empir Softw Eng"},{"key":"10658_CR58","doi-asserted-by":"publisher","first-page":"84","DOI":"10.1016\/j.jss.2019.06.001","volume":"156","author":"ER Russo","year":"2019","unstructured":"Russo ER, Di Sorbo A, Visaggio CA, Canfora G (2019) Summarizing vulnerabilities\u2019 descriptions to support experts during vulnerability assessment activities. 
J Syst Softw 156:84\u201399","journal-title":"J Syst Softw"},{"key":"10658_CR59","doi-asserted-by":"crossref","unstructured":"Savelka J, Agarwal A, Bogart C, Song Y, Sakr M (2023) Can generative pre-trained transformers (gpt) pass assessments in higher education programming courses? In: Proceedings of the 2023 conference on innovation and technology in computer science education V 1, pp 117\u2013123","DOI":"10.1145\/3587102.3588792"},{"key":"10658_CR60","unstructured":"Shani I (2024) Survey reveals AI\u2019s impact on the developer experience. https:\/\/github.blog\/2023-06-13-survey-reveals-ais-impact-on-the-developer-experience\/. Accessed 22 July 2024"},{"key":"10658_CR61","doi-asserted-by":"crossref","unstructured":"Siddiq ML, Santos JC (2022) Securityeval dataset: mining vulnerability examples to evaluate machine learning-based code generation techniques. In: Proceedings of the 1st international workshop on mining software repositories applications for privacy and security, pp 29\u201333","DOI":"10.1145\/3549035.3561184"},{"key":"10658_CR62","doi-asserted-by":"crossref","unstructured":"Siddiq ML, Da\u00a0Silva\u00a0Santos JC, Tanvir RH, Ulfat N, Al\u00a0Rifat F, Carvalho\u00a0Lopes V (2024a) Using large language models to generate junit tests: An empirical study. In: Proceedings of the 28th international conference on evaluation and assessment in software engineering, pp 313\u2013322","DOI":"10.1145\/3661167.3661216"},{"key":"10658_CR63","doi-asserted-by":"crossref","unstructured":"Siddiq ML, Roney L, Zhang J, Santos JCDS (2024b) Quality assessment of chatgpt generated code and their use by developers. In: Proceedings of the 21st international conference on mining software repositories, pp 152\u2013156","DOI":"10.1145\/3643991.3645071"},{"key":"10658_CR64","unstructured":"Stack Exchange (2023) Stack overflow data dump. https:\/\/archive.org\/details\/stackexchange. 
Accessed 02 July 2024"},{"key":"10658_CR65","unstructured":"Stack Overflow (2024) 2024 developer survey. https:\/\/survey.stackoverflow.co\/2024\/technology\/. Accessed 04 Nov 2024"},{"key":"10658_CR66","unstructured":"Steenhoek B, Rahman MM, Roy MK, Alam MS, Barr ET, Le W (2024) A comprehensive study of the capabilities of large language models for vulnerability detection. arXiv preprint arXiv:2403.17218"},{"key":"10658_CR67","doi-asserted-by":"crossref","unstructured":"Mohamed S, E P, Parvin A (2024) Chatting with ai: Deciphering developer conversations with chatgpt. In: 2024 IEEE\/ACM 21st International conference on mining software repositories (MSR)","DOI":"10.1145\/3643991.3645078"},{"key":"10658_CR68","unstructured":"Team DC (2024) Deepseek coder: Code generation and understanding models. https:\/\/deepseekcoder.github.io\/. Accessed 25 March 2025"},{"key":"10658_CR69","doi-asserted-by":"crossref","unstructured":"Thapa C, Jang SI, Ahmed ME, Camtepe S, Pieprzyk J, Nepal S (2022) Transformer-based language models for software vulnerability detection. In: Proceedings of the 38th annual computer security applications conference, pp 481\u2013496","DOI":"10.1145\/3564625.3567985"},{"key":"10658_CR70","doi-asserted-by":"publisher","unstructured":"Treude C, Barzilay O, Storey MA (2011) How do programmers ask and answer questions on the web? (nier track). In: Proceedings of the 33rd international conference on software engineering. ACM, New York, NY, USA, ICSE \u201911, pp 804\u2013807. https:\/\/doi.org\/10.1145\/1985793.1985907","DOI":"10.1145\/1985793.1985907"},{"key":"10658_CR71","doi-asserted-by":"crossref","unstructured":"Ullah S, Han M, Pujar S, Pearce H, Coskun A, Stringhini G (2024) Llms cannot reliably identify and reason about security vulnerabilities (yet?): A comprehensive evaluation, framework, and benchmarks. 
In: IEEE symposium on security and privacy","DOI":"10.1109\/SP54263.2024.00210"},{"key":"10658_CR72","unstructured":"Wang J, Huang Z, Liu H, Yang N, Xiao Y (2023) Defecthunter: A novel llm-driven boosted-conformer-based code vulnerability detection mechanism. arXiv preprint arXiv:2309.15324"},{"key":"10658_CR73","unstructured":"Xia CS, Zhang L (2023) Conversational automated program repair. 2301.13246"},{"key":"10658_CR74","unstructured":"Xia CS, Wei Y, Zhang L (2022) Practical program repair in the era of large pre-trained language models. arXiv preprint arXiv:2210.14179"},{"key":"10658_CR75","doi-asserted-by":"crossref","unstructured":"Xiao T, Treude C, Hata H, Matsumoto K (2023) Devgpt: Studying developer-chatgpt conversations. arXiv preprint arXiv:2309.03914","DOI":"10.1145\/3643991.3648400"},{"key":"10658_CR76","doi-asserted-by":"publisher","first-page":"910","DOI":"10.1007\/s11390-016-1672-0","volume":"31","author":"XL Yang","year":"2016","unstructured":"Yang XL, Lo D, Xia X, Wan ZY, Sun JL (2016) What security questions do developers ask? a large-scale study of stack overflow posts. J Comput Sci Technol 31:910\u2013924","journal-title":"J Comput Sci Technol"},{"key":"10658_CR77","unstructured":"Yao JY, Ning KP, Liu ZH, Ning MN, Yuan L (2023) Llm lies: Hallucinations are not bugs, but features as adversarial examples. arXiv preprint arXiv:2310.01469"},{"key":"10658_CR78","doi-asserted-by":"crossref","unstructured":"Yao Y, Duan J, Xu K, Cai Y, Sun Z, Zhang Y (2024) A survey on large language model (llm) security and privacy: The good, the bad, and the ugly. High-Confidence Computing p 100211","DOI":"10.1016\/j.hcc.2024.100211"},{"key":"10658_CR79","doi-asserted-by":"crossref","unstructured":"Zhang T, Upadhyaya G, Reinhardt A, Rajan H, Kim M (2018) Are code examples on an online q&a forum reliable? a study of api misuse on stack overflow. 
In: Proceedings of the 40th international conference on software engineering, pp 886\u2013896","DOI":"10.1145\/3180155.3180260"},{"key":"10658_CR80","unstructured":"Zhang Y, Song W, Ji Z, Meng N et\u00a0al (2023) How well does llm generate security tests? arXiv preprint arXiv:2310.00710"},{"key":"10658_CR81","doi-asserted-by":"crossref","unstructured":"Zheng Z, Ning K, Chen J, Wang Y, Chen W, Guo L, Wang W (2023) Towards an understanding of large language models in software engineering tasks. arXiv preprint arXiv:2308.11396","DOI":"10.1007\/s10664-024-10602-0"},{"key":"10658_CR82","doi-asserted-by":"crossref","unstructured":"Zhou X, Kim K, Xu B, Han D, Lo D (2024a) Out of sight, out of mind: Better automatic vulnerability repair by broadening input ranges and sources. In: Proceedings of the IEEE\/ACM 46th international conference on software engineering, pp 1\u201313","DOI":"10.1145\/3597503.3639222"},{"key":"10658_CR83","doi-asserted-by":"crossref","unstructured":"Zhou X, Zhang T, Lo D (2024b) Large language model for vulnerability detection: Emerging results and future directions. 
arXiv preprint arXiv:2401.15468","DOI":"10.1145\/3639476.3639762"}],"container-title":["Empirical Software Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10664-025-10658-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10664-025-10658-6\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10664-025-10658-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,5]],"date-time":"2025-06-05T09:56:33Z","timestamp":1749117393000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10664-025-10658-6"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,4,16]]},"references-count":83,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2025,7]]}},"alternative-id":["10658"],"URL":"https:\/\/doi.org\/10.1007\/s10664-025-10658-6","relation":{},"ISSN":["1382-3256","1573-7616"],"issn-type":[{"value":"1382-3256","type":"print"},{"value":"1573-7616","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,4,16]]},"assertion":[{"value":"3 April 2025","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"16 April 2025","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Not applicable to this study.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethical Standards"}},{"value":"Not applicable to this study.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethical Approval"}},{"value":"Not 
applicable to this study.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Informed Consent"}},{"value":"The authors declare that they have no conflict of interest.","order":5,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of Interest"}}],"article-number":"101"}}