{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,25]],"date-time":"2026-02-25T18:10:54Z","timestamp":1772043054661,"version":"3.50.1"},"reference-count":77,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2025,11,19]],"date-time":"2025-11-19T00:00:00Z","timestamp":1763510400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100000038","name":"Natural Sciences and Engineering Research Council","doi-asserted-by":"publisher","award":["RGPIN-2020-06797"],"award-info":[{"award-number":["RGPIN-2020-06797"]}],"id":[{"id":"10.13039\/501100000038","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["MAKE"],"abstract":"<jats:p>Common Weakness Enumerations (CWEs) and Common Vulnerabilities and Exposures (CVEs) are open knowledge bases that provide definitions, descriptions, and samples of code vulnerabilities. The combination of Large Language Models (LLMs) with vulnerability knowledge bases helps to enhance and automate code vulnerability repair. Several key factors come into play in this setting, including (1) the retrieval of the most relevant context to a specific vulnerable code snippet; (2) augmenting LLM prompts with the retrieved context; and (3) the generated artifact form, such as a code repair with natural language explanations or a code repair only. Artifacts produced by these factors often lack transparency and explainability regarding the rationale behind the repair. In this paper, we propose an LLM-enabled framework for explainable recommendation of vulnerable code repairs with techniques addressing each factor. Our method is data-driven, which means the data characteristics of the selected CWE and CVE datasets and the knowledge base determine the best retrieval strategies. Across 100 experiments, we observe the inadequacy of the SOTA metrics to differentiate between low-quality and irrelevant repairs. To address this limitation, we design the LLM-as-a-Judge framework to enhance the robustness of recommendation assessments. Compared to baselines from prior works, as well as using static code analysis and LLMs in zero-shot, our findings highlight that multifaceted LLMs guided by retrieval context produce explainable and reliable recommendations under a small to mild level of self-alignment bias. Our work is developed on open-source knowledge bases and models, which makes it reproducible and extensible to new datasets and retrieval strategies.<\/jats:p>","DOI":"10.3390\/make7040149","type":"journal-article","created":{"date-parts":[[2025,11,19]],"date-time":"2025-11-19T15:03:15Z","timestamp":1763564595000},"page":"149","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Explainable Recommendation of Software Vulnerability Repair Based on Metadata Retrieval and Multifaceted LLMs"],"prefix":"10.3390","volume":"7","author":[{"ORCID":"https:\/\/orcid.org\/0009-0009-9687-9120","authenticated-orcid":false,"given":"Alfred Asare","family":"Amoah","sequence":"first","affiliation":[{"name":"Department of Electrical and Computer Engineering, Concordia University, Montr\u00e9al, QC H4B 1R6, Canada"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6747-8151","authenticated-orcid":false,"given":"Yan","family":"Liu","sequence":"additional","affiliation":[{"name":"Department of Electrical and Computer Engineering, Concordia University, Montr\u00e9al, QC H4B 1R6, Canada"}]}],"member":"1968","published-online":{"date-parts":[[2025,11,19]]},"reference":[{"key":"ref_1","unstructured":"NIST (2025, August 08). NVD\u2014Vulnerabilities, Available online: https:\/\/nvd.nist.gov\/vuln."},{"key":"ref_2","unstructured":"(2025, August 08). Octoverse 2024: The State of Open Source. Available online: https:\/\/github.blog\/news-insights\/octoverse\/octoverse-2024\/."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"58","DOI":"10.1109\/MSP.2017.15","article-title":"Seven Years of Software Vulnerabilities: The Ebb and Flow","volume":"15","author":"Homaei","year":"2017","journal-title":"IEEE Secur. Priv."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"11","DOI":"10.1145\/2187671.2187673","article-title":"Mitigating program security vulnerabilities: Approaches and challenges","volume":"44","author":"Shahriar","year":"2012","journal-title":"ACM Comput. Surv."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Zhou, X., Kim, K., Xu, B., Han, D., and Lo, D. (2024, January 14\u201320). Out of Sight, Out of Mind: Better Automatic Vulnerability Repair by Broadening Input Ranges and Sources. Proceedings of the IEEE\/ACM 46th International Conference on Software Engineering, New York, NY, USA.","DOI":"10.1145\/3597503.3639222"},{"key":"ref_6","first-page":"2224","article-title":"\u03bc VulDeePecker: A Deep Learning-Based System for Multiclass Vulnerability Detection","volume":"18","author":"Zou","year":"2021","journal-title":"IEEE Trans. Dependable Secur. Comput."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Jiang, N., Lutellier, T., and Tan, L. (2021, January 22\u201330). CURE: Code-Aware Neural Machine Translation for Automatic Program Repair. Proceedings of the 43rd International Conference on Software Engineering, Madrid, Spain.","DOI":"10.1109\/ICSE43902.2021.00107"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Fu, M., Tantithamthavorn, C., Le, T., Nguyen, V., and Phung, D. (2022, January 14\u201316). VulRepair: A T5-based automated software vulnerability repair. Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, New York, NY, USA.","DOI":"10.1145\/3540250.3549098"},{"key":"ref_9","unstructured":"Berabi, B., Gronskiy, A., Raychev, V., Sivanrupan, G., Chibotaru, V., and Vechev, M.T. (2024). DeepCode AI Fix: Fixing Security Vulnerabilities with Large Language Models. arXiv."},{"key":"ref_10","unstructured":"Hou, X., Zhao, Y., Liu, Y., Yang, Z., Wang, K., Li, L., Luo, X., Lo, D., Grundy, J.C., and Wang, H. (2023). Large Language Models for Software Engineering: A Systematic Literature Review. arXiv."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Mashhadi, E., and Hemmati, H. (2021). Applying CodeBERT for Automated Program Repair of Java Simple Bugs. arXiv.","DOI":"10.1109\/MSR52588.2021.00063"},{"key":"ref_12","unstructured":"Ren, S., Guo, D., Lu, S., Zhou, L., Liu, S., Tang, D., Zhou, M., Blanco, A., and Ma, S. (2020). CodeBLEU: A Method for Automatic Evaluation of Code Synthesis. arXiv."},{"key":"ref_13","unstructured":"Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., K\u00fcttler, H., Lewis, M., Yih, W.T., and Rockt\u00e4schel, T. (2020, January 6\u201312). Retrieval-augmented generation for knowledge-intensive NLP tasks. Proceedings of the 34th International Conference on Neural Information Processing Systems, Red Hook, NY, USA."},{"key":"ref_14","unstructured":"OpenAI (2024, December 06). Hello GPT-4o. Available online: https:\/\/openai.com\/index\/hello-gpt-4o\/."},{"key":"ref_15","unstructured":"Meta (2024, December 06). Meta-Llama-3-8B-Instruct. Available online: https:\/\/huggingface.co\/meta-llama\/Meta-Llama-3-8B-Instruct."},{"key":"ref_16","unstructured":"Jiang, A.Q., Sablayrolles, A., Roux, A., Mensch, A., Savary, B., Bamford, C., Chaplot, D.S., de las Casas, D., Hanna, E.B., and Bressand, F. (2024). Mixtral of Experts. arXiv."},{"key":"ref_17","unstructured":"Rozi\u00e8re, B., Gehring, J., Gloeckle, F., Sootla, S., Gat, I., Tan, X.E., Adi, Y., Liu, J., Sauvestre, R., and Remez, T. (2024). Code Llama: Open Foundation Models for Code. arXiv."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Fan, J., Li, Y., Wang, S., and Nguyen, T.N. (2020, January 29\u201330). A C\/C++ Code Vulnerability Dataset with Code Changes and CVE Summaries. Proceedings of the 2020 IEEE\/ACM 17th International Conference on Mining Software Repositories (MSR), Seoul, Republic of Korea.","DOI":"10.1145\/3379597.3387501"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Bhandari, G., Naseer, A., and Moonen, L. (2021, January 19\u201320). CVEfixes: Automated collection of vulnerabilities and their fixes from open-source software. Proceedings of the 17th International Conference on Predictive Models and Data Analytics in Software Engineering, Athens, Greece.","DOI":"10.1145\/3475960.3475985"},{"key":"ref_20","unstructured":"MITRE (2025, October 08). CVE Website. Available online: https:\/\/www.cve.org\/."},{"key":"ref_21","unstructured":"(2025, October 08). CWE\u2014Common Weakness Enumeration. Available online: https:\/\/cwe.mitre.org\/."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"147","DOI":"10.1109\/TSE.2022.3147265","article-title":"Neural Transfer Learning for Repairing Security Vulnerabilities in C Code","volume":"49","author":"Chen","year":"2023","journal-title":"IEEE Trans. Softw. Eng."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Pearce, H., Tan, B., Ahmad, B., Karri, R., and Dolan-Gavitt, B. (2022). Examining Zero-Shot Vulnerability Repair with Large Language Models. arXiv.","DOI":"10.1109\/SP46215.2023.10179324"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Russell, R.L., Kim, L., Hamilton, L.H., Lazovich, T., Harer, J.A., Ozdemir, O., Ellingwood, P.M., and McConley, M.W. (2018). Automated Vulnerability Detection in Source Code Using Deep Representation Learning. arXiv.","DOI":"10.1109\/ICMLA.2018.00120"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Wu, Y., Jiang, N., Pham, H.V., Lutellier, T., Davis, J., Tan, L., Babkin, P., and Shah, S. (2023, January 17\u201321). How Effective Are Neural Networks for Fixing Security Vulnerabilities. Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis, Seattle, WA, USA.","DOI":"10.1145\/3597926.3598135"},{"key":"ref_26","unstructured":"Islam, N.T., Khoury, J., Seong, A., Karkevandi, M.B., Parra, G.D.L.T., Bou-Harb, E., and Najafirad, P. (2024). LLM-Powered Code Vulnerability Repair with Reinforcement Learning and Semantic Reward. arXiv."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Joshi, H., Cambronero, J., Gulwani, S., Le, V., Radicek, I., and Verbruggen, G. (2022). Repair Is Nearly Generation: Multilingual Program Repair with LLMs. arXiv.","DOI":"10.1609\/aaai.v37i4.25642"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"109291","DOI":"10.1016\/j.engappai.2024.109291","article-title":"Enhanced automated code vulnerability repair using large language models","volume":"138","year":"2024","journal-title":"Eng. Appl. Artif. Intell."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"111741","DOI":"10.1016\/j.jss.2023.111741","article-title":"Out of the BLEU: How should we assess quality of the Code Generation models?","volume":"203","author":"Evtikhiev","year":"2023","journal-title":"J. Syst. Softw."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Fu, M., Tantithamthavorn, C., Nguyen, V., and Le, T. (2023). ChatGPT for Vulnerability Detection, Classification, and Repair: How Far Are We?. arXiv.","DOI":"10.1109\/APSEC60848.2023.00085"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Ahmed, T., Ghosh, S., Bansal, C., Zimmermann, T., Zhang, X., and Rajmohan, S. (2023). Recommending Root-Cause and Mitigation Steps for Cloud Incidents using Large Language Models. arXiv.","DOI":"10.1109\/ICSE48619.2023.00149"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Bavaresco, A., Bernardi, R., Bertolazzi, L., Elliott, D., Fern\u00e1ndez, R., Gatt, A., Ghaleb, E., Giulianelli, M., Hanna, M., and Koller, A. (2024). LLMs instead of Human Judges? A Large Scale Empirical Study across 20 NLP Evaluation Tasks. arXiv.","DOI":"10.18653\/v1\/2025.acl-short.20"},{"key":"ref_33","unstructured":"Zhu, L., Wang, X., and Wang, X. (2025). JudgeLM: Fine-tuned Large Language Models are Scalable Judges. arXiv."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Fu, J., Ng, S.K., Jiang, Z., and Liu, P. (2023). GPTScore: Evaluate as You Desire. arXiv.","DOI":"10.18653\/v1\/2024.naacl-long.365"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Kim, T.S., Lee, Y., Shin, J., Kim, Y.H., and Kim, J. (2024, January 11\u201316). EvalLM: Interactive Evaluation of Large Language Model Prompts on User-Defined Criteria. Proceedings of the CHI Conference on Human Factors in Computing Systems, Honolulu, HI, USA.","DOI":"10.1145\/3613904.3642216"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Liu, Y., Iter, D., Xu, Y., Wang, S., Xu, R., and Zhu, C. (2023). G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment. arXiv.","DOI":"10.18653\/v1\/2023.emnlp-main.153"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Hu, R., Cheng, Y., Meng, L., Xia, J., Zong, Y., Shi, X., and Lin, W. (2025). Training an LLM-as-a-Judge Model: Pipeline, Insights, and Practical Lessons. arXiv.","DOI":"10.1145\/3701716.3715265"},{"key":"ref_38","unstructured":"Ye, J., Wang, Y., Huang, Y., Chen, D., Zhang, Q., Moniz, N., Gao, T., Geyer, W., Huang, C., and Chen, P.Y. (2024). Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge. arXiv."},{"key":"ref_39","unstructured":"Rogers, A., Boyd-Graber, J., and Okazaki, N. Surface-Based Retrieval Reduces Perplexity of Retrieval-Augmented Language Models. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)."},{"key":"ref_40","unstructured":"Setty, S., Thakkar, H., Lee, A., Chung, E., and Vidra, N. (2024). Improving Retrieval for RAG based Question Answering Models on Financial Documents. arXiv."},{"key":"ref_41","unstructured":"Yepes, A.J., You, Y., Milczek, J., Laverde, S., and Li, R. (2024). Financial Report Chunking for Effective Retrieval Augmented Generation. arXiv."},{"key":"ref_42","unstructured":"LangChain (2024, October 09). LangChain. Available online: https:\/\/www.langchain.com\/."},{"key":"ref_43","unstructured":"Blog, N.T. (2024, December 04). Introduction to LLM Agents. Available online: https:\/\/developer.nvidia.com\/blog\/introduction-to-llm-agents\/."},{"key":"ref_44","unstructured":"Team, G., Anil, R., Borgeaud, S., Alayrac, J.B., Yu, J., Soricut, R., Schalkwyk, J., Dai, A.M., Hauth, A., and Millican, K. (2024). Gemini: A Family of Highly Capable Multimodal Models. arXiv."},{"key":"ref_45","unstructured":"(2024, November 12). The Claude 3 Model Family: Opus, Sonnet, Haiku. Available online: https:\/\/www-cdn.anthropic.com\/de8ba9b01c9ab7cbabf5c33b80b7bbc618857627\/Model_Card_Claude_3.pdf."},{"key":"ref_46","unstructured":"Guo, D., Zhu, Q., Yang, D., Xie, Z., Dong, K., Zhang, W., Chen, G., Bi, X., Wu, Y., and Li, Y.K. (2024). DeepSeek-Coder: When the Large Language Model Meets Programming\u2014The Rise of Code Intelligence. arXiv."},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Ding, N., Chen, Y., Xu, B., Qin, Y., Zheng, Z., Hu, S., Liu, Z., Sun, M., and Zhou, B. (2023). Enhancing Chat Language Models by Scaling High-quality Instructional Conversations. arXiv.","DOI":"10.18653\/v1\/2023.emnlp-main.183"},{"key":"ref_48","unstructured":"xai (2025, March 10). Grok OS. Available online: https:\/\/x.ai\/news\/grok-os."},{"key":"ref_49","unstructured":"Du, N., Huang, Y., Dai, A.M., Tong, S., Lepikhin, D., Xu, Y., Krikun, M., Zhou, Y., Yu, A.W., and Firat, O. (2022). GLaM: Efficient Scaling of Language Models with Mixture-of-Experts. arXiv."},{"key":"ref_50","unstructured":"Sanseviero, O., Tunstall, L., Schmid, P., Mangrulkar, S., Belkada, Y., and Cuenca, P. (2024, November 12). Mixture of Experts Explained. Available online: https:\/\/huggingface.co\/blog\/moe."},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Wang, Y., Wang, W., Joty, S., and Hoi, S.C.H. (2021). CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation. arXiv.","DOI":"10.18653\/v1\/2021.emnlp-main.685"},{"key":"ref_52","unstructured":"Al-Onaizan, Y., Bansal, M., and Chen, Y.N. On Leakage of Code Generation Evaluation Datasets. Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024."},{"key":"ref_53","unstructured":"Jiang, X., Wu, L., Sun, S., Li, J., Xue, J., Wang, Y., Wu, T., and Liu, M. (2025). Investigating Large Language Models for Code Vulnerability Detection: An Experimental Study. arXiv."},{"key":"ref_54","unstructured":"Du, X., Zheng, G., Wang, K., Feng, J., Deng, W., Liu, M., Chen, B., Peng, X., Ma, T., and Lou, Y. (2024). Vul-RAG: Enhancing LLM-based Vulnerability Detection via Knowledge-level RAG. arXiv."},{"key":"ref_55","doi-asserted-by":"crossref","unstructured":"Popovic, M. (2015, January 17\u201318). chrF: Character n-gram F-score for automatic MT evaluation. Proceedings of the Tenth Workshop on Statistical Machine Translation, Lisboa, Portugal.","DOI":"10.18653\/v1\/W15-3049"},{"key":"ref_56","doi-asserted-by":"crossref","unstructured":"Tran, N., Tran, H., Nguyen, S., Nguyen, H., and Nguyen, T. (2019, January 25\u201326). Does BLEU Score Work for Code Migration?. Proceedings of the 2019 IEEE\/ACM 27th International Conference on Program Comprehension (ICPC), Los Alamitos, CA, USA.","DOI":"10.1109\/ICPC.2019.00034"},{"key":"ref_57","unstructured":"GitHub (2025, March 10). CodeQL Documentation. Available online: https:\/\/codeql.github.com\/docs\/."},{"key":"ref_58","unstructured":"Snyk (2025, March 10). Developer Security. Available online: https:\/\/snyk.io\/."},{"key":"ref_59","unstructured":"SonarSource (2025, March 30). Advanced Security with SonarQube. Available online: https:\/\/www.sonarsource.com\/solutions\/security\/."},{"key":"ref_60","doi-asserted-by":"crossref","unstructured":"Feng, Z., Guo, D., Tang, D., Duan, N., Feng, X., Gong, M., Shou, L., Qin, B., Liu, T., and Jiang, D. (2020, January 16\u201320). CodeBERT: A Pre-Trained Model for Programming and Natural Languages. Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020, Online.","DOI":"10.18653\/v1\/2020.findings-emnlp.139"},{"key":"ref_61","unstructured":"Guo, D., Ren, S., Lu, S., Feng, Z., Tang, D., Liu, S., Zhou, L., Duan, N., Yin, J., and Jiang, D. (2020). GraphCodeBERT: Pre-training Code Representations with Data Flow. arXiv."},{"key":"ref_62","doi-asserted-by":"crossref","unstructured":"Xu, F.F., Alon, U., Neubig, G., and Hellendoorn, V.J. (2022, January 13). A systematic evaluation of large language models of code. Proceedings of the 6th ACM SIGPLAN International Symposium on Machine Programming, MAPS 2022, New York, NY, USA.","DOI":"10.1145\/3520312.3534862"},{"key":"ref_63","unstructured":"Nijkamp, E., Pang, B., Hayashi, H., Tu, L., Wang, H., Zhou, Y., Savarese, S., and Xiong, C. (2022, January 25\u201329). CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis. Proceedings of the International Conference on Learning Representations, Online."},{"key":"ref_64","doi-asserted-by":"crossref","unstructured":"Li, Z., Lu, S., Guo, D., Duan, N., Jannu, S., Jenks, G., Majumder, D., Green, J., Svyatkovskiy, A., and Fu, S. (2022, January 14\u201318). Automating code review activities by large-scale pre-training. Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC\/FSE 2022, New York, NY, USA.","DOI":"10.1145\/3540250.3549081"},{"key":"ref_65","doi-asserted-by":"crossref","unstructured":"Peng, J., Cui, L., Huang, K., Yang, J., and Ray, B. (2025). CWEval: Outcome-driven Evaluation on Functionality and Security of LLM Code Generation. arXiv.","DOI":"10.1109\/LLM4Code66737.2025.00009"},{"key":"ref_66","unstructured":"(2024, December 02). Confident AI. LLM Evaluation Metrics: Everything You Need for LLM Evaluation. Available online: https:\/\/www.confident-ai.com\/blog\/llm-evaluation-metrics-everything-you-need-for-llm-evaluation."},{"key":"ref_67","unstructured":"Microsoft (2025, March 06). Evaluation Metrics Built-In. Available online: https:\/\/learn.microsoft.com\/en-us\/azure\/ai-foundry\/concepts\/evaluation-metrics-built-in?tabs=warning."},{"key":"ref_68","unstructured":"Chang, Y., Wang, X., Wang, J., Wu, Y., Yang, L., Zhu, K., Chen, H., Yi, X., Wang, C., and Wang, Y. (2023). A Survey on Evaluation of Large Language Models. arXiv."},{"key":"ref_69","unstructured":"Friel, R., and Sanyal, A. (2023). Chainpoll: A high efficacy method for LLM hallucination detection. arXiv."},{"key":"ref_70","unstructured":"Galileo (2025, March 06). Completeness. Available online: https:\/\/docs.galileo.ai\/galileo\/gen-ai-studio-products\/galileo-guardrail-metrics\/completeness."},{"key":"ref_71","unstructured":"Lemos, R. (2025, March 06). Patch Imperfect: Software Fixes Failing to Shut Out Attackers. Available online: https:\/\/www.darkreading.com\/vulnerabilities-threats\/patch-imperfect-software-fixes-failing-to-shut-out-attackers."},{"key":"ref_72","unstructured":"Magazine, I. (2025, March 06). Google: Incomplete Patches Caused Quarter of Vulnerabilities, in Some Cases. Available online: https:\/\/www.infosecurity-magazine.com\/news\/google-incomplete-patches-quarter\/#:~:text=Google."},{"key":"ref_73","unstructured":"Galileo (2025, March 06). Correctness. Available online: https:\/\/docs.galileo.ai\/galileo\/gen-ai-studio-products\/galileo-guardrail-metrics\/correctness."},{"key":"ref_74","doi-asserted-by":"crossref","unstructured":"Qi, Z., Long, F., Achour, S., and Rinard, M. (2015, January 13\u201317). An analysis of patch plausibility and correctness for generate-and-validate patch generation systems. Proceedings of the 2015 International Symposium on Software Testing and Analysis, ISSTA 2015, New York, NY, USA.","DOI":"10.1145\/2771783.2771791"},{"key":"ref_75","unstructured":"Liu, P., Liu, J., Fu, L., Lu, K., Xia, Y., Zhang, X., Chen, W., Weng, H., Ji, S., and Wang, W. (2024, January 14\u201316). Exploring ChatGPT\u2019s capabilities on vulnerability management. Proceedings of the 33rd USENIX Conference on Security Symposium SEC \u201924, Philadelphia, PA, USA."},{"key":"ref_76","doi-asserted-by":"crossref","unstructured":"Liu, M., Wang, J., Lin, T., Ma, Q., Fang, Z., and Wu, Y. (2024). An Empirical Study of the Code Generation of Safety-Critical Software Using LLMs. Appl. Sci., 14.","DOI":"10.3390\/app14031046"},{"key":"ref_77","doi-asserted-by":"crossref","unstructured":"Diener, M.J. (2010). Cohen\u2019s d. The Corsini Encyclopedia of Psychology, John Wiley & Sons, Ltd.","DOI":"10.1002\/9780470479216.corpsy0200"}],"container-title":["Machine Learning and Knowledge Extraction"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-4990\/7\/4\/149\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,12,3]],"date-time":"2025-12-03T05:10:28Z","timestamp":1764738628000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-4990\/7\/4\/149"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,11,19]]},"references-count":77,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2025,12]]}},"alternative-id":["make7040149"],"URL":"https:\/\/doi.org\/10.3390\/make7040149","relation":{},"ISSN":["2504-4990"],"issn-type":[{"value":"2504-4990","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,11,19]]}}}