{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,25]],"date-time":"2026-03-25T13:01:52Z","timestamp":1774443712698,"version":"3.50.1"},"reference-count":78,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2024,2,22]],"date-time":"2024-02-22T00:00:00Z","timestamp":1708560000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,2,22]],"date-time":"2024-02-22T00:00:00Z","timestamp":1708560000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Empir Software Eng"],"published-print":{"date-parts":[[2024,3]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Static analysis tools are widely used for vulnerability detection as they can analyze programs with complex behavior and millions of lines of code. Despite their popularity, static analysis tools are known to generate an excess of false positives. The recent ability of Machine Learning models to learn from programming language data opens new possibilities of reducing false positives when applied to static analysis. However, existing datasets to train models for vulnerability identification suffer from multiple limitations such as limited bug context, limited size, and synthetic and unrealistic source code. We propose Differential Dataset Analysis or D2A, a differential analysis based approach to label issues reported by static analysis tools. The dataset built with this approach is called the D2A dataset. The D2A dataset is built by analyzing version pairs from multiple open source projects. From each project, we select bug fixing commits and we run static analysis on the versions before and after such commits. If some issues detected in a before-commit version disappear in the corresponding after-commit version, they are very likely to be real bugs that got fixed by the commit. We use D2A to generate a large labeled dataset. We then train both classic machine learning models and deep learning models for vulnerability identification using the D2A dataset. We show that the dataset can be used to build a classifier to identify possible false alarms among the issues reported by static analysis, hence helping developers prioritize and investigate potential true positives first. To facilitate future research and contribute to the community, we make the dataset generation pipeline and the dataset publicly available. We have also created a leaderboard based on the D2A dataset, which has already attracted attention and participation from the community.<\/jats:p>","DOI":"10.1007\/s10664-023-10405-9","type":"journal-article","created":{"date-parts":[[2024,2,22]],"date-time":"2024-02-22T12:18:20Z","timestamp":1708604300000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":9,"title":["Analyzing source code vulnerabilities in the D2A dataset with ML ensembles and C-BERT"],"prefix":"10.1007","volume":"29","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9772-3162","authenticated-orcid":false,"given":"Saurabh","family":"Pujar","sequence":"first","affiliation":[]},{"given":"Yunhui","family":"Zheng","sequence":"additional","affiliation":[]},{"given":"Luca","family":"Buratti","sequence":"additional","affiliation":[]},{"given":"Burn","family":"Lewis","sequence":"additional","affiliation":[]},{"given":"Yunchung","family":"Chen","sequence":"additional","affiliation":[]},{"given":"Jim","family":"Laredo","sequence":"additional","affiliation":[]},{"given":"Alessandro","family":"Morari","sequence":"additional","affiliation":[]},{"given":"Edward","family":"Epstein","sequence":"additional","affiliation":[]},{"given":"Tsungnan","family":"Lin","sequence":"additional","affiliation":[]},{"given":"Bo","family":"Yang","sequence":"additional","affiliation":[]},{"given":"Zhong","family":"Su","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,2,22]]},"reference":[{"key":"10405_CR1","doi-asserted-by":"crossref","unstructured":"Allamanis M, Barr ET, Devanbu P, Sutton C (2018) A survey of machine learning for big code and naturalness","DOI":"10.1145\/3212695"},{"issue":"5","key":"10405_CR2","doi-asserted-by":"publisher","first-page":"22","DOI":"10.1109\/MS.2008.130","volume":"25","author":"N Ayewah","year":"2008","unstructured":"Ayewah N, Pugh W, Hovemeyer D, Morgenthaler JD, Penix J (2008) Using static analysis to find bugs. IEEE Softw 25(5):22\u201329","journal-title":"IEEE Softw"},{"key":"10405_CR3","doi-asserted-by":"crossref","unstructured":"Ayewah N, Pugh W, Morgenthaler JD, Penix J, Zhou YQ (2007) Using find bugs on production software. In OOPSLA\u201907","DOI":"10.1145\/1297846.1297897"},{"key":"10405_CR4","unstructured":"Buratti L, Pujar S, Bornea M, McCarley JS, Zheng Y, Rossiello G, Morari A, Laredo J, Thost V, Zhuang Y, Domeniconi G (2020) Exploring software naturalness through neural language models. CoRR, abs\/2006.12641"},{"issue":"26","key":"10405_CR5","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/2049697.2049700","volume":"58","author":"C Calcagno","year":"2011","unstructured":"Calcagno C, Distefano D, O\u2019Hearn PW, Yang H (2011) Compositional shape analysis by means of bi-abduction. J ACM 58(26):1\u201366","journal-title":"J ACM"},{"key":"10405_CR6","unstructured":"Chandrasekaran D, Mago V (2020) Evolution of semantic similarity - a survey. CoRR, abs\/2004.13820. https:\/\/arxiv.org\/abs\/2004.13820"},{"key":"10405_CR7","doi-asserted-by":"crossref","unstructured":"Chen T, Guestrin C (2016) XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, KDD\u201916. ACM, New York, pp 785\u2013794. ISBN 978-1-4503-4232-2. http:\/\/doi.acm.org\/10.1145\/2939672.2939785","DOI":"10.1145\/2939672.2939785"},{"key":"10405_CR8","doi-asserted-by":"crossref","unstructured":"Choi M-J, Jeong S, Oh H, Choo J (2017) End-to-end prediction of buffer overruns from raw source code via neural memory networks. In: Proceedings of the 26th international joint conference on artificial intelligence, IJCAI\u201917","DOI":"10.24963\/ijcai.2017\/214"},{"key":"10405_CR9","unstructured":"Clang (2023). Clang tooling. https:\/\/clang.llvm.org\/docs\/Tooling.html"},{"key":"10405_CR10","unstructured":"Cppcheck-team (2023). Cppcheck. http:\/\/cppcheck.sourceforge.net\/"},{"key":"10405_CR11","unstructured":"CWE400 (2023). Cwe-400: uncontrolled resource consumption. https:\/\/cwe.mitre.org\/data\/definitions\/400.html"},{"key":"10405_CR12","unstructured":"CWE457 (2023) Cwe-457: use of uninitialized variable. https:\/\/cwe.mitre.org\/data\/definitions\/457.html"},{"key":"10405_CR13","unstructured":"CWE476 Cwe-476: null pointer dereference. https:\/\/cwe.mitre.org\/data\/definitions\/476.html"},{"key":"10405_CR14","unstructured":"Devlin J, Chang M-W, Lee K, Toutanova K (2019) BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 conference of the north American chapter of the association for computational linguistics: human language technologies, vol 1 (Long and Short Papers), pp 4171\u20134186"},{"key":"10405_CR15","unstructured":"Dorogush AV, Ershov V, Gulin A (2018) Catboost: gradient boosting with categorical features support. arXiv:1810.11363"},{"key":"10405_CR16","doi-asserted-by":"crossref","unstructured":"Du X, Chen B, Li Y, Guo J, Zhou Y, Liu Y, Jiang Y (2019) Leopard: identifying vulnerable code for vulnerability assessment through program metrics. In: 2019 IEEE\/ACM 41st international conference on software engineering (ICSE). IEEE, pp 60\u201371","DOI":"10.1109\/ICSE.2019.00024"},{"key":"10405_CR17","unstructured":"Facebook (2023a) Infer static analyzer. https:\/\/fbinfer.com\/"},{"key":"10405_CR18","unstructured":"Facebook (2023b) Infer reportdiff. https:\/\/fbinfer.com\/docs\/man-infer-reportdiff"},{"key":"10405_CR19","doi-asserted-by":"crossref","unstructured":"Fan G, Wu R, Shi Q, Xiao X, Zhou J, Zhang C (2019) Smoke: scalable path-sensitive memory leak detection for millions of lines of code. In ICSE\u201919","DOI":"10.1109\/ICSE.2019.00025"},{"key":"10405_CR20","doi-asserted-by":"publisher","unstructured":"Feng Z, Guo D, Tang D, Duan N, Feng X, Gong M, Shou L, Qin B, Liu T, Jiang D, Zhou M (2020) CodeBERT: a pre-trained model for programming and natural languages. In: Findings of the association for computational linguistics: EMNLP 2020, pp 1536\u20131547. https:\/\/doi.org\/10.18653\/v1\/2020.findings-emnlp.139","DOI":"10.18653\/v1\/2020.findings-emnlp.139"},{"key":"10405_CR21","unstructured":"Flynn L (2016) Prioritizing alerts from static analysis to find and fix code flaws. http:\/\/insights.sei.cmu.edu\/blog\/prioritizing-alerts-from-static-analysis-to-find-and-fix-code-flaws\/"},{"key":"10405_CR22","doi-asserted-by":"crossref","unstructured":"Guarnieri S, Pistoia M, Tripp O, Dolby J, Teilhet S, Berg R (2011) Saving the world wide web from vulnerable Javascript. In: Proceedings of the 2011 international symposium on software testing and analysis, ISSTA\u201911","DOI":"10.1145\/2001420.2001442"},{"key":"10405_CR23","unstructured":"Guo D, Ren S, Lu S, Feng Z, Tang D, Liu S, Zhou L, Duan N, Svyatkovskiy A, Fu S, Tufano M, Deng SK, Clement C, Drain D, Sundaresan N, Yin J, Jiang D, Zhou M (2021) Graphcodebert: pre-training code representations with data flow. In: International conference on learning representations"},{"key":"10405_CR24","doi-asserted-by":"crossref","unstructured":"Hanam Q, Tan L, Holmes R, Lam P (2014) Finding patterns in static analysis alerts: improving actionable alert ranking. In: Proceedings of the 11th working conference on mining software repositories, MSR 2014, pp 152\u2013161","DOI":"10.1145\/2597073.2597100"},{"key":"10405_CR25","doi-asserted-by":"publisher","unstructured":"Hindle A, Barr ET, Su Z, Gabel M, Devanbu P (2012) On the naturalness of software. In: 2012 34th international conference on software engineering (ICSE), pp 837\u2013847. https:\/\/doi.org\/10.1109\/ICSE.2012.6227135","DOI":"10.1109\/ICSE.2012.6227135"},{"key":"10405_CR26","unstructured":"Infer. Infer issue types. https:\/\/github.com\/facebook\/infer\/blob\/ea4f7cf\/infer\/man\/man1\/infer.txt#L370"},{"key":"10405_CR27","doi-asserted-by":"crossref","unstructured":"Johnson B, Song Y, Murphy-Hill E, Bowdidge R (2013) Why don\u2019t software developers use static analysis tools to find bugs? In ICSE\u201913, pp 672\u2013681","DOI":"10.1109\/ICSE.2013.6606613"},{"key":"10405_CR28","doi-asserted-by":"crossref","unstructured":"Jung Y, Kim J, Shin J, Yi K (2013) Taming false alarms from a domain-unaware C analyzer by a bayesian statistical post analysis. In: Proceedings of the 12th international conference on static analysis, SAS\u201905, pp 203\u2013217","DOI":"10.1007\/11547662_15"},{"key":"10405_CR29","unstructured":"Kanade A, Maniatis P, Balakrishnan G, Shi K (2020) Learning and evaluating contextual embedding of source code. In: International conference on machine learning. PMLR, pp 5110\u20135121"},{"key":"10405_CR30","unstructured":"Ke G, Meng Q, Finley T, Wang T, Chen W, Ma W, Ye Q, Liu T-Y (2017) Lightgbm: a highly efficient gradient boosting decision tree. Adv Neural Inf Process Syst 30:3146\u20133154"},{"key":"10405_CR31","first-page":"3146","volume":"30","author":"G Ke","year":"2017","unstructured":"Ke G, Meng Q, Finley T, Wang T, Chen W, Ma W, Ye Q, Liu T-Y (2017) Lightgbm: a highly efficient gradient boosting decision tree. Adv Neural Inf Process Syst 30:3146\u20133154","journal-title":"Adv Neural Inf Process Syst"},{"key":"10405_CR32","doi-asserted-by":"crossref","unstructured":"Koc U, Saadatpanah P, Foster JS, Porter AA (2017a) Learning a classifier for false positive error reports emitted by static code analysis tools. In MAPL\u201917, pp 35\u201342","DOI":"10.1145\/3088525.3088675"},{"key":"10405_CR33","doi-asserted-by":"crossref","unstructured":"Koc U, Saadatpanah P, Foster JS, Porter AA (2017b) Learning a classifier for false positive error reports emitted by static code analysis tools. In MAPL\u201917","DOI":"10.1145\/3088525.3088675"},{"key":"10405_CR34","doi-asserted-by":"crossref","unstructured":"Kremenek T, Engler DR (2003) Z-ranking: using statistical analysis to counter the impact of static analysis approximations. In: Cousot R (ed), Static analysis, 10th international symposium, SAS 2003","DOI":"10.1007\/3-540-44898-5_16"},{"key":"10405_CR35","doi-asserted-by":"crossref","unstructured":"Kudo T, Richardson J (2018) Sentencepiece: a simple and language independent subword tokenizer and detokenizer for neural text processing. In EMNLP","DOI":"10.18653\/v1\/D18-2012"},{"key":"10405_CR36","doi-asserted-by":"crossref","unstructured":"LaToza TD, Venolia G, DeLine R (2006) Maintaining mental models: a study of developer work habits. In: Proceedings of the 28th international conference on software engineering","DOI":"10.1145\/1134285.1134355"},{"key":"10405_CR37","doi-asserted-by":"crossref","unstructured":"Li Z, Zou D, Xu S, Ou X, Jin H, Wang S, Deng Z, Zhong Y (2018) Vuldeepecker: a deep learning-based system for vulnerability detection. In: 25th annual network and distributed system security symposium, NDSS\u201918","DOI":"10.14722\/ndss.2018.23158"},{"issue":"10","key":"10405_CR38","doi-asserted-by":"publisher","first-page":"1825","DOI":"10.1109\/JPROC.2020.2993293","volume":"108","author":"G Lin","year":"2020","unstructured":"Lin G, Wen S, Han QL, Zhang J, Xiang Y (2020) Software vulnerability detection using deep neural networks: a survey. Proc IEEE 108(10):1825\u20131848","journal-title":"Proc IEEE"},{"key":"10405_CR39","unstructured":"Liu Y, Ott M, Goyal N, Du J, Joshi M, Chen D, Levy O, Lewis M, Zettlemoyer L, Stoyanov V (2019) Roberta: a robustly optimized bert pretraining approach. ArXiv, abs\/1907.11692"},{"key":"10405_CR40","unstructured":"Livshits VB, Lam MS (2005) Finding security vulnerabilities in java applications with static analysis. In: Proceedings of the 14th conference on USENIX security symposium"},{"key":"10405_CR41","unstructured":"LLVM. The clang static analyzer. https:\/\/clang-analyzer.llvm.org\/"},{"key":"10405_CR42","unstructured":"Flynn L, Snavely W, Kurtz Z (2018) Test suites as a source of training data for static analysis alert classifiers. SEI Blog. https:\/\/insights.sei.cmu.edu\/sei_blog\/2018\/04\/static-analysis-alert-test-suites-as-a-source-of-training-data-for-alert-classifiers.html"},{"key":"10405_CR43","unstructured":"Lu S, Guo D, Ren S, Huang J, Svyatkovskiy A, Blanco A, Clement C, Drain D, Jiang D, Tang D, Li G, Zhou L, Shou L, Zhou L, Tufano M, Gong M, Zhou M, Duan N, Sundaresan N, Deng SK, Fu S, Liu S (2021) Codexglue: a machine learning benchmark dataset for code understanding and generation. ArXiv, abs\/2102.04664"},{"key":"10405_CR44","unstructured":"MITRETop25. Cwe top 25 most dangerous software weaknessess. https:\/\/cwe.mitre.org\/top25\/archive\/2022\/2022_cwe_top25.html"},{"issue":"1","key":"10405_CR45","doi-asserted-by":"publisher","first-page":"65","DOI":"10.1109\/TSE.2014.2357438","volume":"41","author":"E Murphy-Hill","year":"2015","unstructured":"Murphy-Hill E, Zimmermann T, Bird C, Nagappan N (2015) The design space of bug fixes and how developers navigate it. IEEE Trans Software Eng 41(1):65\u201381","journal-title":"IEEE Trans Software Eng"},{"key":"10405_CR46","doi-asserted-by":"crossref","unstructured":"Muske T, Serebrenik A (2016) Survey of approaches for handling static analysis alarms. In: 2016 IEEE 16th international working conference on source code analysis and manipulation (SCAM), pp 157\u2013166","DOI":"10.1109\/SCAM.2016.25"},{"key":"10405_CR47","doi-asserted-by":"crossref","unstructured":"Muske TB, Baid A, Sanas T (2013) Review efforts reduction by partitioning of static analysis warnings. In: 13th international working conference on source code analysis and manipulation","DOI":"10.1109\/SCAM.2013.6648191"},{"key":"10405_CR48","doi-asserted-by":"crossref","unstructured":"Nie P, Zhang J, Li JJ, Mooney RJ, Gligoric M (2021) Impact of evaluation methodologies on code summarization. arXiv:2108.09619","DOI":"10.18653\/v1\/2022.acl-long.339"},{"key":"10405_CR49","unstructured":"NIST (2023a) National vulnerability database. https:\/\/nvd.nist.gov\/"},{"key":"10405_CR50","unstructured":"NIST (2023b) Juliet test suite for c\/c++ version 1.3. https:\/\/samate.nist.gov\/SRD\/testsuite.php"},{"key":"10405_CR51","doi-asserted-by":"crossref","unstructured":"O\u2019Hearn P, Reynolds J, Yang H (2001) Local reasoning about programs that alter data structures. LNCS 2142","DOI":"10.1007\/3-540-44802-0_1"},{"key":"10405_CR52","doi-asserted-by":"crossref","unstructured":"Paletov R, Tsankov P, Raychev V, Vechev M (2018) Inferring crypto api rules from code changes. In: Proceedings of the 39th ACM SIGPLAN conference on programming language design and implementation, PLDI 2018, pp 450\u2013464","DOI":"10.1145\/3192366.3192403"},{"key":"10405_CR53","unstructured":"Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay \u00c9 (2011) Scikit-learn: machine learning in python. J Mach Learn Res 12(85):2825\u20132830. http:\/\/jmlr.org\/papers\/v12\/pedregosa11a.html"},{"key":"10405_CR54","unstructured":"Puri R, Kung DS, Janssen G, Zhang W, Domeniconi G, Zolotov V, Dolby J, Chen J, Choudhury MR, Decker L, Thost V, Buratti L, Pujar S, Finkler U (2021) Project codenet: a large-scale AI for code dataset for learning a diversity of coding tasks. ArXiv, abs\/2105.12655"},{"key":"10405_CR55","doi-asserted-by":"crossref","unstructured":"Raghothaman M, Kulkarni S, Heo K, Naik M (2018) User-guided program reasoning using bayesian inference. In Proceedings of the 39th ACM SIGPLAN conference on Programming Language Design and Implementation, PLDI 2018, pp 722\u2013735","DOI":"10.1145\/3192366.3192417"},{"key":"10405_CR56","doi-asserted-by":"crossref","unstructured":"Ray B, Hellendoorn V, Godhane S, Tu Z, Bacchelli A, Devanbu P (2016) On the naturalness of buggy code. ICSE \u201916, pp 428\u2013439","DOI":"10.1145\/2884781.2884848"},{"key":"10405_CR57","doi-asserted-by":"crossref","unstructured":"Reynolds ZP, Jayanth AB, Koc U, Porter AA, Raje RR, Hill JH (2017) Identifying and documenting false positive patterns generated by static code analysis tools. In: 4th international workshop on software engineering research and industrial practice","DOI":"10.1109\/SER-IP.2017..20"},{"key":"10405_CR58","doi-asserted-by":"crossref","unstructured":"Russell RL, Kim LY, Hamilton LH, Lazovich T, Harer J, Ozdemir O, Ellingwood PM, McConley MW (2018) Automated vulnerability detection in source code using deep representation learning. In ICMLA\u201918","DOI":"10.1109\/ICMLA.2018.00120"},{"key":"10405_CR59","doi-asserted-by":"crossref","unstructured":"Sahami M, Heilman TD (2006) A web-based kernel function for measuring the similarity of short text snippets. In WWW \u201906","DOI":"10.1145\/1135777.1135834"},{"key":"10405_CR60","unstructured":"Sestili CD, Snavely WS, VanHoudnos NM (2018) Towards security defect prediction with AI. CoRR, abs\/1808.09897. http:\/\/arxiv.org\/abs\/1808.09897"},{"key":"10405_CR61","doi-asserted-by":"crossref","unstructured":"Sui Y, Cheng X, Zhang G, Wang H (2020) Flow2vec: value-flow-based precise code embedding. OOPSLA","DOI":"10.1145\/3428301"},{"key":"10405_CR62","unstructured":"Suneja S, Zheng Y, Zhuang Y, Laredo J, Morari A (2020) Learning to map source code to software vulnerability using code-as-a-graph. CoRR, abs\/2006.08614"},{"key":"10405_CR63","doi-asserted-by":"crossref","unstructured":"Tripp O, Guarnieri S, Pistoia M, Aravkin A (2014) ALETHEIA: improving the usability of static security analysis. In: Proceedings of the 2014 ACM SIGSAC conference on computer and communications security, pp 762\u2013774","DOI":"10.1145\/2660267.2660339"},{"key":"10405_CR64","unstructured":"Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser \u0141, Polosukhin I (2017) Attention is all you need. In: Advances in neural information processing systems"},{"key":"10405_CR65","unstructured":"Villard J (2023). Infer is not deterministic, infer issue #1110. https:\/\/github.com\/facebook\/infer\/issues\/1110"},{"key":"10405_CR66","doi-asserted-by":"publisher","unstructured":"Wang Y, Wang W, Joty S, Hoi SCH (2021) CodeT5: identifier-aware unified pre-trained encoder-decoder models for code understanding and generation. In: Proceedings of the 2021 conference on empirical methods in natural language processing, pp 8696\u20138708. Association for computational linguistics, November 2021. https:\/\/doi.org\/10.18653\/v1\/2021.emnlp-main.685","DOI":"10.18653\/v1\/2021.emnlp-main.685"},{"key":"10405_CR67","unstructured":"Wheeler DA (2023). Flawfinder. https:\/\/dwheeler.com\/flawfinder\/"},{"key":"10405_CR68","unstructured":"Wiki (2023). Libav. https:\/\/en.wikipedia.org\/wiki\/Libav#Fork_from_FFmpeg"},{"issue":"2","key":"10405_CR69","doi-asserted-by":"publisher","first-page":"241","DOI":"10.1016\/S0893-6080(05)80023-1","volume":"5","author":"DH Wolpert","year":"1992","unstructured":"Wolpert DH (1992) Stacked generalization. Neural Netw 5(2):241\u2013259","journal-title":"Neural Netw"},{"key":"10405_CR70","doi-asserted-by":"publisher","unstructured":"Yamaguchi F, Golde N, Arp D, Rieck K (2014) Modeling and discovering vulnerabilities with code property graphs. In: 2014 IEEE symposium on security and privacy, pp 590\u2013604. https:\/\/doi.org\/10.1109\/SP.2014.44","DOI":"10.1109\/SP.2014.44"},{"key":"10405_CR71","doi-asserted-by":"crossref","unstructured":"Yamaguchi F, Maier A, Gascon H, Rieck K (2015) Automatic inference of search patterns for taint-style vulnerabilities. In: 2015 IEEE symposium on security and privacy","DOI":"10.1109\/SP.2015.54"},{"key":"10405_CR72","unstructured":"Yin T (2019) Lizard: an extensible cyclomatic complexity analyzer"},{"key":"10405_CR73","doi-asserted-by":"crossref","unstructured":"Y\u00fcksel U, S\u00f6zer H (2013a) Automated classification of static code analysis alerts: a case study. In ICSM\u201913","DOI":"10.1109\/ICSM.2013.89"},{"key":"10405_CR74","doi-asserted-by":"crossref","unstructured":"Y\u00fcksel U, S\u00f6zer H (2013b) Automated classification of static code analysis alerts: a case study. In: 2013 IEEE international conference on software maintenance, pp 532\u2013535","DOI":"10.1109\/ICSM.2013.89"},{"key":"10405_CR75","doi-asserted-by":"crossref","unstructured":"Zhang X, Si X, Naik M (2017) Combining the logical and the probabilistic in program analysis. In: Proceedings of the 1st ACM SIGPLAN international workshop on machine learning and programming languages, MAPL 2017, pp 27\u201334","DOI":"10.1145\/3088525.3088563"},{"key":"10405_CR76","unstructured":"Zheng Y (2023). Parallelism gives inconsistent results, infer issue #1239. https:\/\/github.com\/facebook\/infer\/issues\/1239"},{"key":"10405_CR77","doi-asserted-by":"crossref","unstructured":"Zheng Y, Pujar S, Lewis B, Buratti L, Epstein E, Yang B, Laredo J, Morari A, Su Z (2021) D2a: a dataset built for ai-based vulnerability detection methods using differential analysis. In: 2021 IEEE\/ACM 43rd international conference on software engineering: software engineering in practice (ICSE-SEIP). IEEE, pp 111\u2013120","DOI":"10.1109\/ICSE-SEIP52600.2021.00020"},{"key":"10405_CR78","doi-asserted-by":"crossref","unstructured":"Zhou Y, Liu S, Siow JK, Du X, Liu Y (2019) Devign: effective vulnerability identification by learning comprehensive program semantics via graph neural networks. In NeurIPS\u201919","DOI":"10.1007\/978-3-031-01587-8_4"}],"container-title":["Empirical Software Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10664-023-10405-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10664-023-10405-9\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10664-023-10405-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,3,23]],"date-time":"2024-03-23T02:21:05Z","timestamp":1711160465000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10664-023-10405-9"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,2,22]]},"references-count":78,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2024,3]]}},"alternative-id":["10405"],"URL":"https:\/\/doi.org\/10.1007\/s10664-023-10405-9","relation":{},"ISSN":["1382-3256","1573-7616"],"issn-type":[{"value":"1382-3256","type":"print"},{"value":"1573-7616","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,2,22]]},"assertion":[{"value":"28 September 2023","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"22 February 2024","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"48"}}