{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,15]],"date-time":"2026-06-15T08:25:36Z","timestamp":1781511936914,"version":"3.54.1"},"reference-count":40,"publisher":"Wiley","license":[{"start":{"date-parts":[[2022,1,18]],"date-time":"2022-01-18T00:00:00Z","timestamp":1642464000000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62062069"],"award-info":[{"award-number":["62062069"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["YNOE-2020-01"],"award-info":[{"award-number":["YNOE-2020-01"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["2021J011131"],"award-info":[{"award-number":["2021J011131"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Optoelectronic Information Technology Key Laboratory Open Project Fund of Yunnan Province","award":["62062069"],"award-info":[{"award-number":["62062069"]}]},{"name":"Optoelectronic Information Technology Key Laboratory Open Project Fund of Yunnan Province","award":["YNOE-2020-01"],"award-info":[{"award-number":["YNOE-2020-01"]}]},{"name":"Optoelectronic Information Technology Key Laboratory Open Project Fund of Yunnan Province","award":["2021J011131"],"award-info":[{"award-number":["2021J011131"]}]},{"name":"Natural Science Foundation Project of Fujian Province","award":["62062069"],"award-info":[{"award-number":["62062069"]}]},{"name":"Natural Science Foundation Project of Fujian 
Province","award":["YNOE-2020-01"],"award-info":[{"award-number":["YNOE-2020-01"]}]},{"name":"Natural Science Foundation Project of Fujian Province","award":["2021J011131"],"award-info":[{"award-number":["2021J011131"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Security and Communication Networks"],"published-print":{"date-parts":[[2022,1,18]]},"abstract":"<jats:p>Due to the multitude of vulnerabilities in sophisticated software programs, the detection performance of existing approaches requires further improvement. Multiple vulnerability detection approaches have been proposed to aid code inspection. Among them is a line of approaches that apply deep learning (DL) techniques and achieve promising results. This paper uses CodeBERT, a deep contextualized model, as an embedding solution to facilitate the detection of vulnerabilities in open-source C projects. Applying CodeBERT to code analysis reveals the rich and latent patterns within software code and has the potential to facilitate various downstream tasks such as software vulnerability detection. CodeBERT inherits the architecture of BERT, providing a stacked transformer encoder in a bidirectional structure. This facilitates the learning of vulnerable code patterns, which requires long-range dependency analysis. Additionally, the multihead attention mechanism of the transformer allows the model to focus on multiple key variables of a data flow, which is crucial for analyzing and tracing potentially vulnerable data flows and ultimately results in optimized detection performance. To evaluate the effectiveness of the proposed CodeBERT-based embedding solution, four mainstream embedding methods are compared for generating software code embeddings, including Word2Vec, GloVe, and FastText. 
Experimental results show that CodeBERT-based embedding outperforms the other embedding models on the downstream vulnerability detection tasks. To further boost performance, we propose including synthetic vulnerable functions and fine-tuning on both synthetic and real-world data to help the model learn C-related vulnerable code patterns. We also explore suitable configurations of CodeBERT. The evaluation results show that the model with the new parameters outperforms some state-of-the-art detection methods on our dataset.<\/jats:p>","DOI":"10.1155\/2022\/5203217","type":"journal-article","created":{"date-parts":[[2022,1,19]],"date-time":"2022-01-19T02:20:52Z","timestamp":1642558852000},"page":"1-12","source":"Crossref","is-referenced-by-count":27,"title":["Deep Neural Embedding for Software Vulnerability Discovery: Comparison and Optimization"],"prefix":"10.1155","volume":"2022","author":[{"given":"Xue","family":"Yuan","sequence":"first","affiliation":[{"name":"School of Physics and Electronic Information, Yunnan Normal University, Kunming 650000, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Guanjun","family":"Lin","sequence":"additional","affiliation":[{"name":"School of Information Engineering, Sanming University, Sanming, Fujian 365004, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9186-475X","authenticated-orcid":true,"given":"Yonghang","family":"Tai","sequence":"additional","affiliation":[{"name":"School of Physics and Electronic Information, Yunnan Normal University, Kunming 650000, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5310-0270","authenticated-orcid":true,"given":"Jun","family":"Zhang","sequence":"additional","affiliation":[{"name":"School of Physics and Electronic Information, Yunnan Normal University, Kunming 650000, 
China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"311","reference":[{"key":"1","doi-asserted-by":"publisher","DOI":"10.1016\/j.dcan.2020.07.003"},{"key":"2","doi-asserted-by":"publisher","DOI":"10.1145\/3465171"},{"key":"3","doi-asserted-by":"publisher","DOI":"10.1109\/TIFS.2019.2932228"},{"key":"4","first-page":"1","article-title":"Deep neural-based vulnerability discovery demystified: data, model and performance","volume-title":"Neural Computing and Applications","author":"G. Lin","year":"2021"},{"key":"5","doi-asserted-by":"publisher","DOI":"10.1109\/comst.2018.2800740"},{"key":"6","doi-asserted-by":"publisher","DOI":"10.1109\/sp.2018.00003"},{"key":"7","doi-asserted-by":"publisher","DOI":"10.1109\/TNET.2014.2320577"},{"key":"8","doi-asserted-by":"publisher","DOI":"10.1109\/TDSC.2020.2984505"},{"key":"9","doi-asserted-by":"publisher","DOI":"10.1109\/tnsm.2019.2899085"},{"key":"10","doi-asserted-by":"publisher","DOI":"10.1109\/TCCN.2017.2758370"},{"key":"11","first-page":"219","article-title":"Deep learning-based vulnerable function detection: a benchmark","author":"G. Lin"},{"key":"12","first-page":"2123","article-title":"Bimodal modelling of source code and natural language","author":"M. Allamanis"},{"key":"13","article-title":"Automated software vulnerability detection with machine learning","author":"J. A. Harer","year":"2018"},{"key":"14","article-title":"Learning binary code with deep learning to detect software weakness","author":"Y. J. Lee"},{"key":"15","doi-asserted-by":"publisher","DOI":"10.1109\/icmla.2018.00120"},{"key":"16","first-page":"1298","article-title":"Vulnerability detection with deep learning","author":"F. 
Wu"},{"key":"17","doi-asserted-by":"publisher","DOI":"10.1145\/2884781.2884804"},{"key":"18","doi-asserted-by":"publisher","DOI":"10.1109\/tdsc.2019.2954088"},{"key":"19","first-page":"310","article-title":"Predicting common web application vulnerabilities from input validation and sanitization code patterns","author":"L. K. Shar"},{"key":"20","article-title":"VulDeePecker: a deep learning-based system for vulnerability detection","author":"Z. Li","year":"2018"},{"key":"21","first-page":"2539","article-title":"Poster: vulnerability discovery with function representation learning from unlabeled projects","author":"G. Lin"},{"key":"22","doi-asserted-by":"publisher","DOI":"10.1109\/tii.2018.2821768"},{"issue":"1","key":"23","article-title":"Deep learning to find bugs","volume":"4","author":"M. Pradel","year":"2017","journal-title":"TU Darmstadt, Department of Computer Science"},{"key":"24","doi-asserted-by":"publisher","DOI":"10.1145\/3236024.3236085"},{"key":"25","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0228439"},{"key":"26","first-page":"5110","article-title":"Learning and evaluating contextual embedding of source code","author":"A. Kanade"},{"key":"27","article-title":"Scelmo: source code embeddings from language models","author":"R.-M. Karampatsis","year":"2020"},{"key":"28","doi-asserted-by":"crossref","first-page":"197158","DOI":"10.1109\/ACCESS.2020.3034766","article-title":"Software vulnerability analysis and discovery using deep learning techniques: a survey","volume":"8","author":"P. 
Zeng","year":"2020","journal-title":"IEEE Access"},{"key":"29","doi-asserted-by":"publisher","DOI":"10.1007\/978-981-15-4032-5_59"},{"key":"30","doi-asserted-by":"publisher","DOI":"10.1109\/comst.2018.2885561"},{"key":"31","doi-asserted-by":"publisher","DOI":"10.1145\/3417978"},{"key":"32","doi-asserted-by":"publisher","DOI":"10.1109\/jproc.2020.2993293"},{"key":"33","article-title":"Efficient estimation of word representations in vector space","author":"T. Mikolov","year":"2013"},{"key":"34","article-title":"Codebert: a pretrained model for programming and natural languages","author":"Z. Feng","year":"2020"},{"key":"35","first-page":"5998","article-title":"Attention is all you need","author":"A. Vaswani"},{"key":"36","article-title":"Global relational models of source code","author":"V. J. Hellendoorn"},{"key":"37","article-title":"Mapping language to code in programmatic context","author":"S. Iyer","year":"2018"},{"key":"38","doi-asserted-by":"publisher","DOI":"10.1109\/JAS.2021.1004261"},{"key":"39","article-title":"In defense of fully connected layers in visual representation transfer","volume-title":"Pacific Rim Conference on Multimedia","author":"C.-L. 
Zhang","year":"2017"},{"key":"40","author":"FlawFinder"}],"container-title":["Security and Communication Networks"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/downloads.hindawi.com\/journals\/scn\/2022\/5203217.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/downloads.hindawi.com\/journals\/scn\/2022\/5203217.xml","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/downloads.hindawi.com\/journals\/scn\/2022\/5203217.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,1,19]],"date-time":"2022-01-19T02:20:59Z","timestamp":1642558859000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.hindawi.com\/journals\/scn\/2022\/5203217\/"}},"subtitle":[],"editor":[{"given":"Weizhi","family":"Meng","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"editor"}]}],"short-title":[],"issued":{"date-parts":[[2022,1,18]]},"references-count":40,"alternative-id":["5203217","5203217"],"URL":"https:\/\/doi.org\/10.1155\/2022\/5203217","relation":{},"ISSN":["1939-0122","1939-0114"],"issn-type":[{"value":"1939-0122","type":"electronic"},{"value":"1939-0114","type":"print"}],"subject":[],"published":{"date-parts":[[2022,1,18]]}}}