{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,8,1]],"date-time":"2026-08-01T03:19:39Z","timestamp":1785554379949,"version":"3.56.0"},"reference-count":54,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2024,5,13]],"date-time":"2024-05-13T00:00:00Z","timestamp":1715558400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Big Data"],"abstract":"<jats:sec><jats:title>Introduction<\/jats:title><jats:p>Artificial Intelligence (AI) is increasingly used as a helper to develop computing programs. While it can boost software development and improve coding proficiency, this practice offers no guarantee of security. On the contrary, recent research shows that some AI models produce software with vulnerabilities. This situation leads to the question: How serious and widespread are the security flaws in code generated using AI models?<\/jats:p><\/jats:sec><jats:sec><jats:title>Methods<\/jats:title><jats:p>Through a systematic literature review, this work reviews the state of the art on how AI models impact software security. It systematizes the knowledge about the risks of using AI in coding security-critical software.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>It reviews what security flaws of well-known vulnerabilities (e.g., the MITRE CWE Top 25 Most Dangerous Software Weaknesses) are commonly hidden in AI-generated code. It also reviews works that discuss how vulnerabilities in AI-generated code can be exploited to compromise security and lists the attempts to improve the security of such AI-generated code.<\/jats:p><\/jats:sec><jats:sec><jats:title>Discussion<\/jats:title><jats:p>Overall, this work provides a comprehensive and systematic overview of the impact of AI in secure coding. This topic has sparked interest and concern within the software security engineering community. It highlights the importance of setting up security measures and processes, such as code verification, and that such practices could be customized for AI-aided code production.<\/jats:p><\/jats:sec>","DOI":"10.3389\/fdata.2024.1386720","type":"journal-article","created":{"date-parts":[[2024,5,13]],"date-time":"2024-05-13T04:54:22Z","timestamp":1715576062000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":45,"title":["A systematic literature review on the impact of AI models on the security of code generation"],"prefix":"10.3389","volume":"7","author":[{"given":"Claudia","family":"Negri-Ribalta","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"R\u00e9mi","family":"Geraud-Stewart","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Anastasia","family":"Sergeeva","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Gabriele","family":"Lenzini","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1965","published-online":{"date-parts":[[2024,5,13]]},"reference":[{"key":"B1","first-page":"2655","author":"Ahmad","year":"2021","journal-title":"Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies"},{"key":"B2","doi-asserted-by":"publisher","first-page":"129","DOI":"10.48550\/arXiv.2204.04741","volume":"28","author":"Asare","year":"2023"},{"key":"B3","doi-asserted-by":"crossref","first-page":"500","DOI":"10.1145\/3545945.3569759","article-title":"\u201cProgramming is hard-or at least it used to be: educational opportunities and challenges of ai code generation,\u201d","volume-title":"Proceedings of the 54th ACM Technical Symposium on Computer Science Education V.1","author":"Becker","year":"2023"},{"key":"B4","doi-asserted-by":"crossref","first-page":"238","DOI":"10.1109\/SPW59333.2023.00027","article-title":"\u201cGPThreats-3: is automatic malware generation a threat?\u201d","volume-title":"2023 IEEE Security and Privacy Workshops (SPW)","author":"Botacin","year":"2023"},{"key":"B5","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1703.03906","article-title":"Massive exploration of neural machine translation architectures","author":"Britz","year":"2017","journal-title":"ArXiv e-prints"},{"key":"B6","author":"Burgess","year":"2023","journal-title":"Criminals Have Created Their Own ChatGPT Clones"},{"key":"B7","doi-asserted-by":"publisher","first-page":"101895","DOI":"10.1016\/j.mex.2022.101895","article-title":"How-to conduct a systematic literature review: a quick guide for computer science research","volume":"9","author":"Carrera-Rivera","year":"2022","journal-title":"MethodsX"},{"key":"B8","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2107.03374","article-title":"Evaluating large language models trained on code","author":"Chen","year":"2021","journal-title":"CoRR"},{"key":"B9","doi-asserted-by":"crossref","first-page":"508","DOI":"10.1145\/3379597.3387501","article-title":"\u201cA C\/C++ code vulnerability dataset with code changes and CVE summaries,\u201d","volume-title":"Proceedings of the 17th International Conference on Mining Software Repositories, MSR '20","author":"Fan","year":"2020"},{"key":"B10","first-page":"1536","author":"Feng","year":"2020","journal-title":"Findings of the Association for Computational Linguistics: EMNLP 2020"},{"key":"B11","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2204.05999","article-title":"InCoder: a generative model for code infilling and synthesis","author":"Fried","year":"2022","journal-title":"ArXiv"},{"key":"B12","unstructured":"\u201cGraphCodeBERT: pre-training code representations with data flow,\u201d\n            GuoD.\n            RenS.\n            LuS.\n            FengZ.\n            TangD.\n            LiuS.\n          9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3\u20137, 20212021"},{"key":"B13","doi-asserted-by":"crossref","first-page":"1865","DOI":"10.1145\/3576915.3623175","article-title":"\u201cLarge language models for code: Security hardening and adversarial testing,\u201d","volume-title":"Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security","author":"He","year":"2023"},{"key":"B14","doi-asserted-by":"crossref","first-page":"526","DOI":"10.1109\/SANER53432.2022.00070","article-title":"\u201cSemantic robustness of models of source code,\u201d","volume-title":"2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)","author":"Henkel","year":"2022"},{"key":"B15","unstructured":"\u201cThe curious case of neural text degeneration,\u201d\n            HoltzmanA.\n            BuysJ.\n            DuL.\n            ForbesM.\n            ChoiY.\n          8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26\u201330, 20202020"},{"key":"B16","author":"Huang","year":"2023","journal-title":"Do Not Give Away My Secrets: Uncovering the Privacy Issue of Neural Code Completion Tools"},{"key":"B17","unstructured":"2022"},{"key":"B18","doi-asserted-by":"crossref","first-page":"5954","DOI":"10.18653\/v1\/2021.emnlp-main.482","author":"Jain","year":"2021","journal-title":"Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing"},{"key":"B19","doi-asserted-by":"crossref","first-page":"563","DOI":"10.1109\/MSR59073.2023.00082","article-title":"\u201cLarge language models and simple, stupid bugs,\u201d","volume-title":"2023 IEEE\/ACM 20th International Conference on Mining Software Repositories (MSR)","author":"Jesse","year":"2023"},{"key":"B20","doi-asserted-by":"crossref","first-page":"14892","DOI":"10.1609\/aaai.v37i12.26739","author":"Jha","year":"2023","journal-title":"Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 37"},{"key":"B21","doi-asserted-by":"crossref","first-page":"212","DOI":"10.1109\/SANER56733.2023.00029","article-title":"\u201cCLAWSAT: towards both robust and accurate code models,\u201d","volume-title":"2023 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)","author":"Jia","year":"2023"},{"key":"B22","doi-asserted-by":"crossref","first-page":"573","DOI":"10.1145\/3379597.3387491","article-title":"\u201cHow often do single-statement bugs occur? The manySStuBs4J dataset,\u201d","volume-title":"Proceedings of the 17th International Conference on Mining Software Repositories, MSR '20","author":"Karampatsis","year":"2020"},{"key":"B23","unstructured":"Guidelines for performing systematic literature reviews in software engineering\n            KitchenhamB.\n            ChartersS.\n          Tech. Rep.2007"},{"key":"B24","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/1852786.1852789","article-title":"\u201cCan we evaluate the quality of software engineering experiments?,\u201d","volume-title":"Proceedings of the 2010 ACM-IEEE International Symposium on Empirical Software Engineering and Measurement","author":"Kitchenham","year":"2010"},{"key":"B25","doi-asserted-by":"publisher","author":"Li","year":"2023","DOI":"10.48550\/arXiv.2305.06161"},{"key":"B26","doi-asserted-by":"publisher","first-page":"120073","DOI":"10.48550\/arXiv.2212.06008","article-title":"Who evaluates the evaluators? On automatic metrics for assessing AI-based offensive code generators","volume":"225","author":"Liguori","year":"2023","journal-title":"Expert Syst. Appl."},{"key":"B27","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1907.11692","article-title":"RoBERTa: a robustly optimized BERT pretraining approach","author":"Liu","year":"2019","journal-title":"CoRR"},{"key":"B28","doi-asserted-by":"publisher","first-page":"111734","DOI":"10.48550\/arXiv.2206.15331","volume":"203","author":"Moradi Dakhel","year":"2023"},{"key":"B29","year":"2021","journal-title":"GPT Code Clippy: The Open Source Version of GitHub Copilot"},{"key":"B30","doi-asserted-by":"crossref","first-page":"320","DOI":"10.1007\/978-3-031-34671-2_23","article-title":"\u201cHow hardened is your hardware? Guiding ChatGPT to generate secure hardware resistant to CWEs,\u201d","volume-title":"International Symposium on Cyber Security, Cryptology, and Machine Learning","author":"Nair","year":"2023"},{"key":"B31","doi-asserted-by":"publisher","first-page":"1219","DOI":"10.48550\/arXiv.2402.01219","volume":"2024","author":"Natella","year":"2024"},{"key":"B32","volume-title":"CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis","author":"Nijkamp","year":"2023"},{"key":"B33","first-page":"1565","article-title":"\u201cCrossVul: a cross-language vulnerability dataset with commit data,\u201d","volume-title":"Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC\/FSE 2021","author":"Nikitopoulos","year":"2021"},{"key":"B34","first-page":"2133","author":"Niu","year":"2023","journal-title":"32nd USENIX Security Symposium (USENIX Security 23)"},{"key":"B35","unstructured":"Modern neural networks generalize on small data sets36233632\n            OlsonM.\n            WynerA.\n            BerkR.\n          Adv. Neural Inform. Process. Syst312018"},{"key":"B36","first-page":"10","article-title":"\u201cAn attacker's dream? Exploring the capabilities of chatgpt for developing malware,\u201d","volume-title":"Proceedings of the 16th Cyber Security Experimentation and Test Workshop","author":"Pa Pa","year":"2023"},{"key":"B37","doi-asserted-by":"crossref","first-page":"754","DOI":"10.1109\/SP46214.2022.9833571","article-title":"\u201cAsleep at the keyboard? Assessing the security of GitHub Copilot's code contributions,\u201d","volume-title":"2022 IEEE Symposium on Security and Privacy (SP)","author":"Pearce","year":"2022"},{"key":"B38","doi-asserted-by":"crossref","first-page":"2339","DOI":"10.1109\/SP46215.2023.10179324","article-title":"\u201cExamining zero-shot vulnerability repair with large language models,\u201d","volume-title":"2023 IEEE Symposium on Security and Privacy (SP)","author":"Pearce","year":"2023"},{"key":"B39","doi-asserted-by":"crossref","first-page":"2785","DOI":"10.1145\/3576915.3623157","article-title":"\u201cDo users write more insecure code with AI assistants?,\u201d","volume-title":"Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security","author":"Perry","year":"2023"},{"key":"B40","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/j.infsof.2015.03.007","article-title":"Guidelines for conducting systematic mapping studies in software engineering: an update","volume":"64","author":"Petersen","year":"2015","journal-title":"Inform. Softw. Technol"},{"key":"B41","first-page":"2205","article-title":"\u201cLost at C: a user study on the security implications of large language model code assistants,\u201d","volume-title":"32nd USENIX Security Symposium (USENIX Security 23)","author":"Sandoval","year":"2023"},{"key":"B42","doi-asserted-by":"crossref","first-page":"71","DOI":"10.1109\/SCAM55253.2022.00014","article-title":"\u201cAn empirical study of code smells in transformer-based code generation techniques,\u201d","volume-title":"2022 IEEE 22nd International Working Conference on Source Code Analysis and Manipulation (SCAM)","author":"Siddiq","year":"2022"},{"key":"B43","doi-asserted-by":"crossref","first-page":"683","DOI":"10.1109\/ISSRE59848.2023.00035","article-title":"\u201cEfficient avoidance of vulnerabilities in auto-completed smart contract code using vulnerability-constrained decoding,\u201d","volume-title":"2023 IEEE 34th International Symposium on Software Reliability Engineering (ISSRE)","author":"Storhaug","year":"2023"},{"key":"B44","doi-asserted-by":"crossref","first-page":"270","DOI":"10.1007\/978-3-030-01424-7_27","author":"Tan","year":"2018","journal-title":"Artificial Neural Networks and Machine Learning \u2013 ICANN 2018"},{"key":"B45","doi-asserted-by":"crossref","first-page":"270","DOI":"10.1109\/QRS57517.2022.00094","article-title":"\u201cGitHub considered harmful? Analyzing open-source projects for the automatic generation of cryptographic API call sequences,\u201d","volume-title":"2022 IEEE 22nd International Conference on Software Quality, Reliability and Security (QRS)","author":"Tony","year":"2022"},{"key":"B46","first-page":"588","article-title":"\u201cLLMSecEval: a dataset of natural language prompts for security evaluations,\u201d","volume-title":"20th IEEE\/ACM International Conference on Mining Software Repositories, MSR 2023, Melbourne, Australia, May 15-16, 2023","author":"Tony","year":"2023"},{"key":"B47","first-page":"5998","author":"Vaswani","year":"2017","journal-title":"Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017"},{"key":"B48","doi-asserted-by":"crossref","first-page":"8696","DOI":"10.18653\/v1\/2021.emnlp-main.685","author":"Wang","year":"2021","journal-title":"Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing"},{"key":"B49","doi-asserted-by":"publisher","first-page":"106809","DOI":"10.48550\/arXiv.2201.08441","article-title":"VUDENC: vulnerability detection with deep learning on a natural codebase for Python","volume":"144","author":"Wartschinski","year":"2022","journal-title":"Inform. Softw. Technol"},{"key":"B50","doi-asserted-by":"publisher","first-page":"102","DOI":"10.1007\/s00766-005-0021-6","article-title":"Requirements engineering paper classification and evaluation criteria: a proposal and a discussion","volume":"11","author":"Wieringa","year":"2006","journal-title":"Requir. Eng"},{"key":"B51","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/2601248.2601268","article-title":"\u201cGuidelines for snowballing in systematic literature studies and a replication in software engineering,\u201d","volume-title":"Proceedings of the 18th International Conference on Evaluation and Assessment in Software Engineering","author":"Wohlin","year":"2014"},{"key":"B52","doi-asserted-by":"publisher","first-page":"2594","DOI":"10.1016\/j.jss.2013.04.076","article-title":"On the reliability of mapping studies in software engineering","volume":"86","author":"Wohlin","year":"2013","journal-title":"J. Syst. Softw"},{"key":"B53","first-page":"1282","article-title":"\u201cHow effective are neural networks for fixing security vulnerabilities,\u201d","volume-title":"Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis, ISSTA 2023","author":"Wu","year":"2023"},{"key":"B54","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3520312.3534862","article-title":"\u201cA systematic evaluation of large language models of code,\u201d","volume-title":"Proceedings of the 6th ACM SIGPLAN International Symposium on Machine Programming","author":"Xu","year":"2022"}],"container-title":["Frontiers in Big Data"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fdata.2024.1386720\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,5,13]],"date-time":"2024-05-13T04:54:47Z","timestamp":1715576087000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fdata.2024.1386720\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,5,13]]},"references-count":54,"alternative-id":["10.3389\/fdata.2024.1386720"],"URL":"https:\/\/doi.org\/10.3389\/fdata.2024.1386720","relation":{},"ISSN":["2624-909X"],"issn-type":[{"value":"2624-909X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,5,13]]},"article-number":"1386720"}}