{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,20]],"date-time":"2025-06-20T04:08:53Z","timestamp":1750392533093,"version":"3.41.0"},"reference-count":67,"publisher":"Association for Computing Machinery (ACM)","issue":"FSE","funder":[{"name":"National Science Foundation","award":["CCF-2008905, CCF-2047682"],"award-info":[{"award-number":["CCF-2008905, CCF-2047682"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. ACM Softw. Eng."],"published-print":{"date-parts":[[2025,6,19]]},"abstract":"<jats:p>Understanding software vulnerabilities and their resolutions is crucial for securing modern software systems. This study presents a novel traceability model that links a pair of sentences describing at least one of the three types of semantics (triggers, crash phenomenon and fix action) for a vulnerability in natural language (NL) vulnerability artifacts, to their corresponding pair of code statements. Different from the traditional traceability models, our trace links between a pair of related NL sentences and a pair of code statements can recover the semantic relationship between code statements so that the specific role played by each code statement in a vulnerability can be automatically identified. Our end-to-end approach is implemented in two key steps: VulnExtract and VulnTrace. VulnExtract automatically extracts sentences describing triggers, crash phenomenon and\/or fix action for a vulnerability using 37 discourse patterns derived from NL artifacts (CVE summary, bug reports and commit messages). VulnTrace employs pre-trained code search models to trace these sentences to corresponding code statements. Our empirical study, based on 341 CVEs and their associated code snippets, demonstrates the effectiveness of our approach, with recall exceeding 90% in most cases for NL sentence extraction. 
VulnTrace achieves a Top5 accuracy of over 68.2% for mapping a pair of related NL sentences to corresponding pair of code statements. The end-to-end combined VulnExtract+VulnTrace achieves a Top5 accuracy of 59.6% and 53.1% for mapping two pairs of NL sentences to code statements. These results highlight the potential of our method in automating vulnerability comprehension and reducing manual effort.<\/jats:p>","DOI":"10.1145\/3729360","type":"journal-article","created":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T15:15:34Z","timestamp":1750346134000},"page":"2006-2029","source":"Crossref","is-referenced-by-count":0,"title":["Teaching AI the \u2018Why\u2019 and \u2018How\u2019 of Software Vulnerability Fixes"],"prefix":"10.1145","volume":"2","author":[{"ORCID":"https:\/\/orcid.org\/0009-0006-1074-1912","authenticated-orcid":false,"given":"Amiao","family":"Gao","sequence":"first","affiliation":[{"name":"Southern Methodist University, Dallas, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3811-675X","authenticated-orcid":false,"given":"Zenong","family":"Zhang","sequence":"additional","affiliation":[{"name":"University of Texas at Dallas, Richardson, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7383-276X","authenticated-orcid":false,"given":"Simin","family":"Wang","sequence":"additional","affiliation":[{"name":"Southern Methodist University, Dallas, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7790-0195","authenticated-orcid":false,"given":"LiGuo","family":"Huang","sequence":"additional","affiliation":[{"name":"Southern Methodist University, Dallas, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2826-1857","authenticated-orcid":false,"given":"Shiyi","family":"Wei","sequence":"additional","affiliation":[{"name":"University of Texas at Dallas, Richardson, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8237-429X","authenticated-orcid":false,"given":"Vincent","family":"Ng","sequence":"additional","affiliation":[{"name":"University of Texas at Dallas, 
Richardson, USA"}]}],"member":"320","published-online":{"date-parts":[[2025,6,19]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"2017. CVE-2017-12893. Available from MITRE CVE-ID CVE-2017-12893. https:\/\/cve.mitre.org\/cgi-bin\/cvename.cgi?name=CVE-2017-12893 Accessed: 2024-09-12"},{"key":"e_1_2_1_2_1","unstructured":"2024. Common Vulnerabilities and Exposures (CVE). https:\/\/cve.mitre.org\/ Accessed: 2024-09-12"},{"key":"e_1_2_1_3_1","volume-title":"30th USENIX Security Symposium (USENIX Security 21)","author":"Ahmadi Mansour","year":"2021","unstructured":"Mansour Ahmadi, Reza Mirzazade farkhani, Ryan Williams, and Long Lu. 2021. Finding Bugs Using Your Own Code: Detecting Functionally-similar yet Inconsistent Code. In 30th USENIX Security Symposium (USENIX Security 21). USENIX Association, 2025\u20132040. isbn:978-1-939133-24-3 https:\/\/www.usenix.org\/conference\/usenixsecurity21\/presentation\/ahmadi"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.2307\/1267500"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/1453101.1453146"},{"key":"e_1_2_1_6_1","unstructured":"S. Bird E. Klein and E. Loper. 2009. Natural Language Processing with Python. O\u2019Reilly Media Inc.. https:\/\/www.nltk.org\/book\/ Accessed: 2024-07-25"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10664-023-10415-7"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/ESEM.2017.55"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.infsof.2021.106745"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/3106237.3106285"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1002\/smr.2294"},{"key":"e_1_2_1_12_1","volume-title":"Software and Systems Traceability","author":"Cleland-Huang Jane","unstructured":"Jane Cleland-Huang, Orlena Gotel, and Andrea Zisman. 2012. Software and Systems Traceability. Springer Publishing Company, Incorporated. 
isbn:1447122380"},{"key":"e_1_2_1_13_1","volume-title":"BERT Model and Tokenizer. https:\/\/huggingface.co\/bert-base-uncased Accessed","author":"Face Hugging","year":"2024","unstructured":"Hugging Face. 2024. BERT Model and Tokenizer. https:\/\/huggingface.co\/bert-base-uncased Accessed: July 23, 2024"},{"key":"e_1_2_1_14_1","volume-title":"DistilBERT Model and Tokenizer. https:\/\/huggingface.co\/distilbert-base-uncased Accessed","author":"Face Hugging","year":"2024","unstructured":"Hugging Face. 2024. DistilBERT Model and Tokenizer. https:\/\/huggingface.co\/distilbert-base-uncased Accessed: July 23, 2024"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1"},{"key":"e_1_2_1_16_1","volume-title":"Teaching AI the \u2018Why","author":"Gao Amiao","unstructured":"Amiao Gao, Zenong Zhang, Simin Wang, Liguo Huang, Shiyi Wei, and Vincent Ng. 2025. Teaching AI the \u2018Why\u2019 and \u2018How\u2019 of Software Vulnerability Fixes. https:\/\/github.com\/amiaog\/Teaching-AI-the-Why-and-How-of-Software-Vulnerability-Fixes"},{"key":"e_1_2_1_17_1","unstructured":"Git Contributors. 2024. git diff Manual Page. https:\/\/git-scm.com\/docs\/git-diff Accessed: 2024-09-12"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/2568225.2568260"},{"key":"e_1_2_1_19_1","unstructured":"The Tcpdump Group. 2024. Tcpdump 4.x.y by the Tcpdump Group. https:\/\/github.com\/the-tcpdump-group\/tcpdump Accessed: 2024-09-13"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","unstructured":"D. Guo S. Lu N. Duan Y. Wang M. Zhou and J. Yin. 2022. UniXcoder: Unified Cross-Modal Pre-training for Code Representation. https:\/\/doi.org\/10.48550\/arXiv.2203.03850 arxiv:2203.03850. 
10.48550\/arXiv.2203.03850","DOI":"10.48550\/arXiv.2203.03850"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2009.08366"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/2597073.2597118"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10664-023-10314-x"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE.2013.6606585"},{"key":"e_1_2_1_25_1","unstructured":"Hugging Face. 2024. ELECTRA Model Tokenizer and Config. https:\/\/huggingface.co\/google\/electra-small-discriminator Accessed: 2024-07-23"},{"key":"e_1_2_1_26_1","unstructured":"Hugging Face. 2024. SpanBERT Model and Tokenizer. https:\/\/huggingface.co\/SpanBERT\/spanbert-base-cased Accessed: 2024-07-23"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICPC.2009.5090029"},{"key":"e_1_2_1_28_1","unstructured":"Tim Kientzle and contributors. 2024. Libarchive - Multi-format Archive and Compression Library. https:\/\/github.com\/libarchive\/libarchive Accessed: 2024-09-13"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/ASE.2006.23"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.2307\/2529310"},{"key":"e_1_2_1_31_1","volume-title":"2017 IEEE International Conference on Software Quality, Reliability and Security (QRS), 318\u2013328","author":"Li Jian","unstructured":"Jian Li, Pinjia He, Jieming Zhu, and Michael R. Lyu. 2017. Software Defect Prediction via Convolutional Neural Network. 2017 IEEE International Conference on Software Quality, Reliability and Security (QRS), 318\u2013328. 
https:\/\/api.semanticscholar.org\/CorpusID:4845285"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE43902.2021.00040"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/3387904.3389272"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/3293882.3330577"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1907.11692"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/2914770.2837617"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2102.04664"},{"key":"e_1_2_1_38_1","volume-title":"Proceedings of the 2009 International Conference on Ontology Patterns -","volume":"516","author":"Maynard Diana","year":"2009","unstructured":"Diana Maynard, Adam Funk, and Wim Peters. 2009. Using lexico-syntactic ontology design patterns for ontology creation and population. In Proceedings of the 2009 International Conference on Ontology Patterns - Volume 516 (WOP\u201909). CEUR-WS.org, Aachen, DEU. 39\u201352."},{"key":"e_1_2_1_39_1","volume-title":"Andrew Meneely, Pradeep K. Murukannaiah, Emily Tucker Prud\u2019hommeaux, Josephine Wolff, and Yang Yu.","author":"Munaiah Nuthan","year":"2017","unstructured":"Nuthan Munaiah, Benjamin S. Meyers, Cecilia Ovesdotter Alm, Andrew Meneely, Pradeep K. Murukannaiah, Emily Tucker Prud\u2019hommeaux, Josephine Wolff, and Yang Yu. 2017. Natural Language Insights from Code Reviews that Missed a Vulnerability - A Large Scale Study of Chromium. In Engineering Secure Software and Systems. 
https:\/\/doi.org\/10.1007\/978-3-319-62105-0_5"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/2568225.2568317"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/1315245.1315311"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/SANER48275.2020.9054829"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10664-008-9077-5"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2211.06335"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1201.0490"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/2810103.2813604"},{"key":"e_1_2_1_47_1","unstructured":"GNU Project. 2024. GNU Binutils. https:\/\/github.com\/google\/oss-fuzz\/tree\/master\/projects\/binutils Accessed: 2024-09-13"},{"key":"e_1_2_1_48_1","unstructured":"GNOME Project. 2024. The XML C Parser and Toolkit of GNOME. https:\/\/github.com\/google\/fuzzbench\/tree\/master\/benchmarks\/libxml2_libxml2_xml_reader_for_file_fuzzer Accessed: 2024-09-13"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/2771783.2771791"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/1806799.1806872"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2016.2622264"},{"key":"e_1_2_1_52_1","volume-title":"Vulnerability Disclosure in the Age of Social Media: Exploiting Twitter for Predicting Real-World Exploits. In 24th USENIX Security Symposium (USENIX Security 15)","author":"Sabottke Carl","year":"2015","unstructured":"Carl Sabottke, Octavian Suciu, and Tudor Dumitras. 2015. Vulnerability Disclosure in the Age of Social Media: Exploiting Twitter for Predicting Real-World Exploits. In 24th USENIX Security Symposium (USENIX Security 15). USENIX Association, Washington, D.C. 1041\u20131056. 
isbn:978-1-939133-11-3 https:\/\/www.usenix.org\/conference\/usenixsecurity15\/technical-sessions\/presentation\/sabottke"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","unstructured":"Ensheng Shi Yanlin Wang Wenchao Gu Lun Du Hongyu Zhang Shi Han Dongmei Zhang and Hongbin Sun. 2023. CoCoSoDa: Effective Contrastive Learning for Code Search. https:\/\/doi.org\/10.48550\/arXiv.2204.03293 arxiv:2204.03293. 10.48550\/arXiv.2204.03293","DOI":"10.48550\/arXiv.2204.03293"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2010.81"},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1145\/2046660.2046664"},{"key":"e_1_2_1_56_1","unstructured":"systemd Developers. 2024. The systemd System and Service Manager. https:\/\/github.com\/systemd\/systemd Accessed: 2024-09-13"},{"key":"e_1_2_1_57_1","unstructured":"FFmpeg Team. 2024. A complete cross-platform solution to record convert and stream audio and video. https:\/\/github.com\/FFmpeg\/FFmpeg Accessed: 2024-09-13"},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1145\/3340544"},{"key":"e_1_2_1_59_1","volume-title":"Investigations Report","year":"2023","unstructured":"Verizon. 2023. Data Breach Investigations Report 2023."},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2022.3173346"},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","unstructured":"Shichao Wang Yun Zhang Lingfeng Bao Xin Xia and Minghui Wu. 2022. VCMatch: A Ranking-based Approach for Automatic Security Patches Localization for OSS Vulnerabilities. In 2022 IEEE International Conference on Software Analysis Evolution and Reengineering (SANER). 589\u2013600. https:\/\/doi.org\/10.1109\/SANER53432.2022.00076 10.1109\/SANER53432.2022.00076","DOI":"10.1109\/SANER53432.2022.00076"},{"key":"e_1_2_1_62_1","doi-asserted-by":"crossref","unstructured":"T. Wolf L. Debut V. Sanh J. Chaumond C. Delangue A. Moi P. Cistac T. Rault R. Louf M. Funtowicz and J. Brew. 2019. 
HuggingFace\u2019s Transformers: State-of-the-art Natural Language Processing. https:\/\/github.com\/huggingface\/transformers Accessed: 2024-09-12","DOI":"10.18653\/v1\/2020.emnlp-demos.6"},{"key":"e_1_2_1_63_1","volume-title":"33rd USENIX Security Symposium (USENIX Security 24)","author":"Xu Dandan","year":"2024","unstructured":"Dandan Xu, Di Tang, Yi Chen, XiaoFeng Wang, Kai Chen, Haixu Tang, and Longxing Li. 2024. Racing on the Negative Force: Efficient Vulnerability Root-Cause Analysis through Reinforcement Learning on Counterexamples. In 33rd USENIX Security Symposium (USENIX Security 24). USENIX Association, Philadelphia, PA. 4229\u20134246. isbn:978-1-939133-44-1 https:\/\/www.usenix.org\/conference\/usenixsecurity24\/presentation\/xu-dandan"},{"key":"e_1_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10664-021-10087-1"},{"key":"e_1_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE.2013.6606577"},{"key":"e_1_2_1_66_1","volume-title":"FIXREVERTER: A Realistic Bug Injection Methodology for Benchmarking Fuzz Testing. In 31st USENIX Security Symposium (USENIX Security 22)","author":"Zhang Zenong","year":"2022","unstructured":"Zenong Zhang, Zach Patterson, Michael Hicks, and Shiyi Wei. 2022. FIXREVERTER: A Realistic Bug Injection Methodology for Benchmarking Fuzz Testing. In 31st USENIX Security Symposium (USENIX Security 22). USENIX Association, Boston, MA. 3699\u20133715. 
isbn:978-1-939133-31-1 https:\/\/www.usenix.org\/conference\/usenixsecurity22\/presentation\/zhang-zenong"},{"key":"e_1_2_1_67_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10207-023-00795-8"}],"container-title":["Proceedings of the ACM on Software Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3729360","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T15:22:16Z","timestamp":1750346536000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3729360"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,19]]},"references-count":67,"journal-issue":{"issue":"FSE","published-print":{"date-parts":[[2025,6,19]]}},"alternative-id":["10.1145\/3729360"],"URL":"https:\/\/doi.org\/10.1145\/3729360","relation":{},"ISSN":["2994-970X"],"issn-type":[{"value":"2994-970X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,6,19]]}}}