{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T01:20:07Z","timestamp":1760145607291,"version":"build-2065373602"},"reference-count":40,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2024,7,31]],"date-time":"2024-07-31T00:00:00Z","timestamp":1722384000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"European Union","award":["RRF-2.3.1-21-2022-00004","TKP2021-NVA-09","KP2021-NVA"],"award-info":[{"award-number":["RRF-2.3.1-21-2022-00004","TKP2021-NVA-09","KP2021-NVA"]}]},{"name":"Artificial Intelligence National Laboratory","award":["RRF-2.3.1-21-2022-00004","TKP2021-NVA-09","KP2021-NVA"],"award-info":[{"award-number":["RRF-2.3.1-21-2022-00004","TKP2021-NVA-09","KP2021-NVA"]}]},{"name":"Ministry of Culture and Innovation of Hungary","award":["RRF-2.3.1-21-2022-00004","TKP2021-NVA-09","KP2021-NVA"],"award-info":[{"award-number":["RRF-2.3.1-21-2022-00004","TKP2021-NVA-09","KP2021-NVA"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Software"],"abstract":"<jats:p>Software vulnerabilities pose a significant threat to computer systems because they can jeopardize the integrity of both software and hardware. The existing tools for detecting vulnerabilities are inadequate. Machine learning algorithms may struggle to interpret enormous datasets because of their limited ability to understand intricate linkages within high-dimensional data. Traditional procedures, on the other hand, take a long time and require a lot of manual labor. Furthermore, earlier deep-learning approaches failed to acquire adequate feature data. Self-attention mechanisms can process information across large distances, but they do not collect structural data. This work addresses the critical problem of inadequate vulnerability detection in software systems. We propose a novel method that combines self-attention with convolutional networks to enhance the detection of software vulnerabilities by capturing both localized, position-specific features and global, content-driven interactions. Our contribution lies in the integration of these methodologies to improve the precision and F1 score of vulnerability detection systems, achieving unprecedented results on complex Python datasets. In addition, we improve the self-attention approaches by changing the denominator to address the issue of excessive attention heads creating irrelevant disturbances. We assessed the effectiveness of this strategy using six complex Python vulnerability datasets obtained from GitHub. Our rigorous study and comparison of data with previous studies resulted in the most precise outcomes and F1 score (99%) ever attained by machine learning systems.<\/jats:p>","DOI":"10.3390\/software3030016","type":"journal-article","created":{"date-parts":[[2024,8,1]],"date-time":"2024-08-01T18:04:59Z","timestamp":1722535499000},"page":"310-327","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Towards a Block-Level Conformer-Based Python Vulnerability Detection"],"prefix":"10.3390","volume":"3","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9691-7937","authenticated-orcid":false,"given":"Amirreza","family":"Bagheri","sequence":"first","affiliation":[{"name":"Department of Software Engineering, University of Szeged, 6720 Szeged, Hungary"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4592-6504","authenticated-orcid":false,"given":"P\u00e9ter","family":"Heged\u0171s","sequence":"additional","affiliation":[{"name":"Department of Software Engineering, University of Szeged, 6720 Szeged, Hungary"}]}],"member":"1968","published-online":{"date-parts":[[2024,7,31]]},"reference":[{"key":"ref_1","unstructured":"(2024, July 28). Nist, Available online: https:\/\/nvd.nist.gov\/vuln\/vulnerability-detail-pages."},{"key":"ref_2","unstructured":"Ferschke, O., Gurevych, I., and Rittberger, M. (2012). FlawFinder: A Modular System for Predicting Quality Flaws in Wikipedia. CLEF (Online Working Notes\/Labs\/Workshop), AAAI."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"22","DOI":"10.1109\/MS.2008.130","article-title":"Using static analysis to find bugs","volume":"25","author":"Ayewah","year":"2008","journal-title":"IEEE Softw."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Perl, H., Dechand, S., Smith, M., Arp, D., Yamaguchi, F., Rieck, K., Fahl, S., and Acar, Y. (2015, January 12\u201316). Vccfinder: Finding potential vulnerabilities in open-source projects to assist code audits. Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, Denver, CO, USA.","DOI":"10.1145\/2810103.2813604"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3092566","article-title":"Software vulnerability analysis and discovery using machine-learning and data-mining techniques: A survey","volume":"50","author":"Ghaffarian","year":"2017","journal-title":"ACM Comput. Surv. (CSUR)"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"1943","DOI":"10.1109\/TIFS.2020.3044773","article-title":"Combining graph-based learning with automated data collection for code vulnerability detection","volume":"16","author":"Wang","year":"2020","journal-title":"IEEE Trans. Inf. Forensics Secur."},{"key":"ref_7","unstructured":"Zhou, Y., Liu, S., Siow, J., Du, X., and Liu, Y. (2019). Devign: Effective vulnerability identification by learning comprehensive program semantics via graph neural networks. NIPS Proceedings\u2014Advances in Neural Information Processing Systems 32 (NIPS 2019), Vancouver, Canada, 8\u201314 December 2019, Neural Information Processing Systems (NIPS)."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Feng, Z., Guo, D., Tang, D., Duan, N., Feng, X., Gong, M., Shou, L., Qin, B., Liu, T., and Jiang, D. (2020). Codebert: A pre-trained model for programming and natural languages. arXiv.","DOI":"10.18653\/v1\/2020.findings-emnlp.139"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Wang, Y., Wang, W., Joty, S., and Hoi, S.C. (2021). Codet5: Identifier-aware unified pre-trained encoder-decoder models for code understanding and generation. arXiv.","DOI":"10.18653\/v1\/2021.emnlp-main.685"},{"key":"ref_10","unstructured":"(2024, July 28). Gpt-4. Available online: https:\/\/platform.openai.com\/playground\/chat?mode=chat&model=gpt-4o&models=gpt-4o."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Gulati, A., Qin, J., Chiu, C.C., Parmar, N., Zhang, Y., Yu, J., Han, W., Wang, S., Zhang, Z., and Wu, Y. (2020). Conformer: Convolution-augmented transformer for speech recognition. arXiv.","DOI":"10.21437\/Interspeech.2020-3015"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Cui, S., Zhao, G., Gao, Y., Tavu, T., and Huang, J. (2022, January 7\u201311). VRust: Automated vulnerability detection for solana smart contracts. Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security, Los Angeles, CA, USA.","DOI":"10.1145\/3548606.3560552"},{"key":"ref_13","unstructured":"Johns, M., Pfistner, S., and SAP SE (2018). End-to-End Taint Tracking for Detection and Mitigation of iNjection Vulnerabilities in Web Applications. (10,129,285), U.S. Patent."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Wang, D., Jiang, B., and Chan, W.K. (2020). WANA: Symbolic execution of wasm bytecode for cross-platform smart contract vulnerability detection. arXiv.","DOI":"10.1109\/QRS54544.2021.00102"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Dinh, S.T., Cho, H., Martin, K., Oest, A., Zeng, K., Kapravelos, A., Ahn, G.J., Bao, T., Wang, R., and Doup\u00e9, A. (2021, January 21\u201325). Favocado: Fuzzing the Binding Code of JavaScript Engines Using Semantically Correct Test Cases. Proceedings of the Network and Distributed System Security Symposium, Virtual.","DOI":"10.14722\/ndss.2021.24224"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"He, J., Balunovi\u0107, M., Ambroladze, N., Tsankov, P., and Vechev, M. (2019, January 11\u201315). Learning to fuzz from symbolic execution with application to smart contracts. Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, London, UK.","DOI":"10.1145\/3319535.3363230"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"296","DOI":"10.1016\/j.eswa.2016.09.041","article-title":"Multi-level hybrid support vector machine and extreme learning machine based on modified K-means for intrusion detection system","volume":"67","author":"Othman","year":"2017","journal-title":"Expert Syst. Appl."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"111283","DOI":"10.1016\/j.jss.2022.111283","article-title":"Just-in-time software vulnerability detection: Are we there yet?","volume":"188","author":"Lomio","year":"2022","journal-title":"J. Syst. Softw."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"6822","DOI":"10.1109\/JIOT.2019.2912022","article-title":"Machine learning-based network vulnerability analysis of industrial Internet of Things","volume":"6","author":"Zolanvari","year":"2019","journal-title":"IEEE Internet Things J."},{"key":"ref_20","first-page":"2224","article-title":"\u03bcVulDeePecker: A Deep Learning-Based System for Multiclass Vulnerability Detection","volume":"18","author":"Zou","year":"2019","journal-title":"IEEE Trans. Dependable Secur. Comput."},{"key":"ref_21","unstructured":"Allamanis, M., Brockschmidt, M., and Khademi, M. (2017). Learning to represent programs with graphs. arXiv."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Steenhoek, B., Rahman, M.M., Jiles, R., and Le, W. (2023, January 14\u201320). An empirical study of deep learning models for vulnerability detection. Proceedings of the 2023 IEEE\/ACM 45th International Conference on Software Engineering (ICSE), Melbourne, Australia.","DOI":"10.1109\/ICSE48619.2023.00188"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Hin, D., Kan, A., Chen, H., and Babar, M.A. (2022, January 23\u201324). Linevd: Statement-level vulnerability detection using graph neural networks. Proceedings of the 19th International Conference on Mining Software Repositories, Pittsburgh, PA, USA.","DOI":"10.1145\/3524842.3527949"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3290353","article-title":"Code2Vec: Learning Distributed Representations of Code","volume":"3","author":"Alon","year":"2019","journal-title":"Proc. ACM Program. Lang."},{"key":"ref_25","unstructured":"Guo, D., Ren, S., Lu, S., Pan, J., Zhang, C., Feng, X., and de Rijke, M. (2021, January 7\u201311). GraphCodeBERT: Pre-training Code Representations with Data Flow. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Virtual Event\/Punta Cana, Dominican Republic."},{"key":"ref_26","unstructured":"Kanade, A., Maniatis, P., Balakrishnan, G., and Shi, K. (2020). Learning and Evaluating Contextual Embedding of Source Code. arXiv."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"731","DOI":"10.1145\/3022671.2984041","article-title":"Probabilistic Model for Code with Decision Trees","volume":"Volume 51","author":"Raychev","year":"2016","journal-title":"ACM SIGPLAN Notices"},{"key":"ref_28","unstructured":"Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M.A., Lacroix, T., Rozi\u00e8re, B., Goyal, N., Hambro, E., and Azhar, F. (2023). Llama: Open and efficient foundation language models. arXiv."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Finnie-Ansley, J., Denny, P., Becker, B.A., Luxton-Reilly, A., and Prather, J. (2022, January 14\u201318). The robots are coming: Exploring the implications of openai codex on introductory programming. Proceedings of the 24th Australasian Computing Education Conference, Melbourne, VIC, Australia.","DOI":"10.1145\/3511861.3511863"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Pearce, H., Tan, B., Ahmad, B., Karri, R., and Dolan-Gavitt, B. (2023, January 23\u201324). Examining zero-shot vulnerability repair with large language models. Proceedings of the 2023 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA.","DOI":"10.1109\/SP46215.2023.10179324"},{"key":"ref_31","unstructured":"Cheshkov, A., Zadorozhny, P., and Levichev, R. (2023). Evaluation of chatgpt model for vulnerability detection. arXiv."},{"key":"ref_32","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, \u0141., and Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30, NIPS\u201917: Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4\u20139 December 2017, Curran Associates Inc."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Zhou, Y., and Sharma, A. (2017, January 4\u20138). Automated identification of security issues from commit messages and bug reports. Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering, Paderborn, Germany.","DOI":"10.1145\/3106237.3117771"},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"165","DOI":"10.1109\/TSE.2018.2884955","article-title":"Mining fix patterns for findbugs violations","volume":"47","author":"Liu","year":"2018","journal-title":"IEEE Trans. Softw. Eng."},{"key":"ref_35","unstructured":"Bagheri, A., and Heged\u0171s, P. (2021). A comparison of different source code representation methods for vulnerability prediction in python. Quality of Information and Communications Technology: 14th International Conference, QUATIC 2021, Algarve, Portugal, 8\u201311 September 2021, Proceedings 14, Springer International Publishing."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Morrison, P., Herzig, K., Murphy, B., and Williams, L. (2015, January 21\u201322). Challenges with applying vulnerability prediction models. Proceedings of the 2015 Symposium and Bootcamp on the Science of Security, Urbana, IL, USA.","DOI":"10.1145\/2746194.2746198"},{"key":"ref_37","unstructured":"Dam, H.K., Tran, T., and Pham, T. (2016). A deep language model for software code. arXiv."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Hovsepyan, A., Scandariato, R., Joosen, W., and Walden, J. (2012, January 21). Software vulnerability prediction using text analysis techniques. Proceedings of the 4th International Workshop on Security Measurements and Metrics, Lund, Sweden.","DOI":"10.1145\/2372225.2372230"},{"key":"ref_39","unstructured":"(2024, July 28). Owasp. Available online: https:\/\/owasp.org\/www-community\/attacks."},{"key":"ref_40","unstructured":"(2024, July 28). Hpc. Available online: https:\/\/docs.hpc.kifu.hu\/tasks\/overview.html#compute-nodes."}],"container-title":["Software"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2674-113X\/3\/3\/16\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T15:27:12Z","timestamp":1760110032000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2674-113X\/3\/3\/16"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,7,31]]},"references-count":40,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2024,9]]}},"alternative-id":["software3030016"],"URL":"https:\/\/doi.org\/10.3390\/software3030016","relation":{},"ISSN":["2674-113X"],"issn-type":[{"type":"electronic","value":"2674-113X"}],"subject":[],"published":{"date-parts":[[2024,7,31]]}}}