{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,21]],"date-time":"2025-02-21T23:29:32Z","timestamp":1740180572862,"version":"3.37.3"},"reference-count":39,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2023,4,3]],"date-time":"2023-04-03T00:00:00Z","timestamp":1680480000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,4,3]],"date-time":"2023-04-03T00:00:00Z","timestamp":1680480000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Science Foundation of China","doi-asserted-by":"crossref","award":["U1836211"],"award-info":[{"award-number":["U1836211"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Cybersecurity"],"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Locating bug code snippets (BugCode for short) has long been a difficult problem in software security, mainly because the constraints that define BugCode are obscure and hard to summarize. Previously, security analysts attempted to define such constraints manually (e.g., limiting buffer size to detect overflow), but such hand-written constraints cover only a limited set of BugCode types. More recent work addresses this problem by extracting constraints from program documentation, an approach that has shown promise for detecting API misuse. 
However, for bugs beyond the scope of API misuse, such an approach becomes less effective, since the corresponding constraints are not defined in documents, not to mention programs that lack documentation altogether. In this paper, inspired by the fact that expert programmers often correct BugCode on open forums such as StackOverflow, we design an approach that automatically extracts knowledge from StackOverflow and leverages it to detect BugCode. The content on StackOverflow comes from ordinary developers, whose writing tends to be loosely organized and varied in style, making it more challenging to analyze than program documentation. To address these challenges, we design a custom tokenization approach to segment sentences and employ sentiment analysis to find Controversial Sentences (CSs), which typically contain the constraints needed for code analysis. We then use constituency parsing to extract knowledge from CSs, which helps locate BugCode. We evaluated our system on 41,144 comments from questions tagged with Java and Android. The results show that our approach achieves 95.5% precision in discovering CSs. Through manual validation, we confirmed 276 pieces of BugCode, including one that was assigned a CVE. 
89.3% of the discovered bugs still remained in the current versions of the answers and were unknown to users.<\/jats:p>","DOI":"10.1186\/s42400-023-00153-0","type":"journal-article","created":{"date-parts":[[2023,4,3]],"date-time":"2023-04-03T10:28:53Z","timestamp":1680517733000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Jeu de mots paronomasia: a StackOverflow-driven bug discovery approach"],"prefix":"10.1186","volume":"6","author":[{"given":"Yi","family":"Yang","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2628-3737","authenticated-orcid":false,"given":"Ying","family":"Li","sequence":"additional","affiliation":[]},{"given":"Kai","family":"Chen","sequence":"additional","affiliation":[]},{"given":"Jinghua","family":"Liu","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,4,3]]},"reference":[{"key":"153_CR1","doi-asserted-by":"crossref","unstructured":"Aggarwal A, Jalote P (2006) Integrating static and dynamic analysis for detecting vulnerabilities. In: 30th annual international computer software and applications conference (COMPSAC\u201906), vol 1, pp 343\u2013350. IEEE","DOI":"10.1109\/COMPSAC.2006.55"},{"key":"153_CR2","unstructured":"Ahmadi M, Farkhani RM, Williams R, Lu L (2021) Finding bugs using your own code: detecting functionally-similar yet inconsistent code. In: 30th USENIX security symposium (USENIX Security 21), pp 2025\u20132040"},{"issue":"10","key":"153_CR3","doi-asserted-by":"publisher","first-page":"326","DOI":"10.3390\/info10100326","volume":"10","author":"A Amin","year":"2019","unstructured":"Amin A, Eldessouki A, Magdy MT, Abdeen N, Hindy H, Hegazy I (2019) Androshield: automated android applications vulnerability detection, a hybrid static and dynamic analysis approach. 
Information 10(10):326","journal-title":"Information"},{"issue":"5","key":"153_CR4","doi-asserted-by":"publisher","first-page":"22","DOI":"10.1109\/MS.2008.130","volume":"25","author":"N Ayewah","year":"2008","unstructured":"Ayewah N, Pugh W, Hovemeyer D, Morgenthaler JD, Penix J (2008) Using static analysis to find bugs. IEEE Softw 25(5):22\u201329","journal-title":"IEEE Softw"},{"key":"153_CR5","doi-asserted-by":"crossref","unstructured":"B\u00f6hme M, Pham V-T, Nguyen M-D, Roychoudhury A (2017) Directed greybox fuzzing. In: Proceedings of the 2017 ACM SIGSAC conference on computer and communications security, pp 2329\u20132344","DOI":"10.1145\/3133956.3134020"},{"key":"153_CR6","unstructured":"Bug-java (2022) http:\/\/bugs.java.com\/bugdatabase\/view_bug.do?bug_id=5003595"},{"key":"153_CR7","doi-asserted-by":"crossref","unstructured":"Chen M, Fischer F, Meng N, Wang X, Grossklags J (2019) How reliable is the crowdsourced knowledge of security implementation? In: 2019 IEEE\/ACM 41st international conference on software engineering (ICSE), pp 536\u2013547. IEEE","DOI":"10.1109\/ICSE.2019.00065"},{"key":"153_CR8","doi-asserted-by":"crossref","unstructured":"Chen D, Manning CD (2014) A fast and accurate dependency parser using neural networks. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pp 740\u2013750","DOI":"10.3115\/v1\/D14-1082"},{"key":"153_CR9","doi-asserted-by":"crossref","unstructured":"Chen H, Xue Y, Li Y, Chen B, Xie X, Wu X, Liu Y (2018) Hawkeye: Towards a desired directed grey-box fuzzer. In: Proceedings of the 2018 ACM SIGSAC conference on computer and communications security, pp 2095\u20132108","DOI":"10.1145\/3243734.3243849"},{"key":"153_CR10","unstructured":"Constituency Parser (2022) https:\/\/github.com\/nitishgupta\/constituency-parse-predictor"},{"key":"153_CR11","unstructured":"CVE-2020-8908. 
https:\/\/cve.mitre.org\/cgi-bin\/cvename.cgi?name=CVE-2020-8908 (2022)"},{"key":"153_CR12","doi-asserted-by":"crossref","unstructured":"Fischer F, B\u00f6ttinger K, Xiao H, Stransky C, Acar Y, Backes M, Fahl S (2017) Stack Overflow considered harmful? The impact of copy & paste on Android application security. In: 2017 IEEE symposium on security and privacy (SP), pp 121\u2013136. IEEE","DOI":"10.1109\/SP.2017.31"},{"key":"153_CR13","doi-asserted-by":"crossref","unstructured":"Gardner M, Grus J, Neumann M, Tafjord O, Dasigi P, Liu NF, Peters M, Schmitz M, Zettlemoyer LS (2017) AllenNLP: a deep semantic natural language processing platform. arXiv:1803.07640","DOI":"10.18653\/v1\/W18-2501"},{"key":"153_CR14","unstructured":"Google NLP API (2022) https:\/\/cloud.google.com\/docs\/"},{"key":"153_CR15","unstructured":"Infer (2022) https:\/\/github.com\/facebook\/infer\/"},{"key":"153_CR16","unstructured":"Issue-3 (2022) https:\/\/github.com\/rakcy\/code-scanner-demo\/issues\/3\/"},{"key":"153_CR17","unstructured":"Joshi C, Singh UK, Tarey K (2015) A review on taxonomies of attacks and vulnerability in computer and network system. Int J 5(1)"},{"key":"153_CR18","doi-asserted-by":"crossref","unstructured":"Kiss B, Kosmatov N, Pariente D, Puccetti A (2015) Combining static and dynamic analyses for vulnerability detection: illustration on Heartbleed. In: Hardware and software: verification and testing: 11th international Haifa verification conference, HVC 2015, Haifa, Israel, November 17\u201319, 2015, Proceedings 11, pp 39\u201350. Springer, Berlin","DOI":"10.1007\/978-3-319-26287-1_3"},{"issue":"1","key":"153_CR19","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s42400-018-0002-y","volume":"1","author":"J Li","year":"2018","unstructured":"Li J, Zhao B, Zhang C (2018) Fuzzing: a survey. 
Cybersecurity 1(1):1\u201313","journal-title":"Cybersecurity"},{"key":"153_CR20","doi-asserted-by":"crossref","unstructured":"Lv T, Li R, Yang Y, Chen K, Liao X, Wang X, Hu P, Xing L (2020) RTFM! Automatic assumption discovery and verification derivation from library document for API misuse detection. In: Proceedings of the 2020 ACM SIGSAC conference on computer and communications security, pp 1837\u20131852","DOI":"10.1145\/3372297.3423360"},{"issue":"4","key":"153_CR21","doi-asserted-by":"publisher","first-page":"1093","DOI":"10.1016\/j.asej.2014.04.011","volume":"5","author":"W Medhat","year":"2014","unstructured":"Medhat W, Hassan A, Korashy H (2014) Sentiment analysis algorithms and applications: a survey. Ain Shams Eng J 5(4):1093\u20131113","journal-title":"Ain Shams Eng J"},{"key":"153_CR22","doi-asserted-by":"publisher","first-page":"102516","DOI":"10.1016\/j.scico.2020.102516","volume":"199","author":"S Meldrum","year":"2020","unstructured":"Meldrum S, Licorish SA, Owen CA, Savarimuthu BTR (2020) Understanding Stack Overflow code quality: a recommendation of caution. Sci Comput Program 199:102516","journal-title":"Sci Comput Program"},{"key":"153_CR23","unstructured":"NLTK (2022) https:\/\/nltk.org\/"},{"key":"153_CR24","doi-asserted-by":"crossref","unstructured":"Pandita R, Taneja K, Williams L, Tung T (2016) ICON: inferring temporal constraints from natural language API descriptions. In: 2016 IEEE international conference on software maintenance and evolution (ICSME), pp 378\u2013388. IEEE","DOI":"10.1109\/ICSME.2016.59"},{"key":"153_CR25","doi-asserted-by":"crossref","unstructured":"Ren X, Sun J, Xing Z, Xia X, Sun J (2020) Demystify official API usage directives with crowdsourced API misuse scenarios, erroneous code examples and patches. 
In: Proceedings of the ACM\/IEEE 42nd international conference on software engineering, pp 925\u2013936","DOI":"10.1145\/3377811.3380430"},{"key":"153_CR26","doi-asserted-by":"crossref","unstructured":"Ren X, Xing Z, Xia X, Li G, Sun J (2019) Discovering, explaining and summarizing controversial discussions in community Q&A sites. In: 2019 34th IEEE\/ACM international conference on automated software engineering (ASE), pp 151\u2013162. IEEE","DOI":"10.1109\/ASE.2019.00024"},{"key":"153_CR27","doi-asserted-by":"crossref","unstructured":"Ren X, Ye X, Xing Z, Xia X, Xu X, Zhu L, Sun J (2020) API-misuse detection driven by fine-grained API-constraint knowledge graph. In: 2020 35th IEEE\/ACM international conference on automated software engineering (ASE), pp 461\u2013472. IEEE","DOI":"10.1145\/3324884.3416551"},{"key":"153_CR28","unstructured":"SO-617414 (2022) https:\/\/stackoverflow.com\/questions\/617414\/how-to-create-a-temporary-directory-folder-in-java\/6403880#6403880"},{"key":"153_CR29","unstructured":"SO-vote-up (2022) https:\/\/stackoverflow.com\/help\/privileges\/vote-up"},{"key":"153_CR30","unstructured":"spaCy (2022) https:\/\/spacy.io\/"},{"key":"153_CR31","unstructured":"Stackexchange (2022) https:\/\/api.stackexchange.com\/"},{"key":"153_CR32","unstructured":"Stackoverflow. https:\/\/stackoverflow.com\/ (2022)"},{"key":"153_CR33","unstructured":"Stanfordnlp (2022) https:\/\/nlp.stanford.edu\/software\/"},{"key":"153_CR34","doi-asserted-by":"crossref","unstructured":"Yamaguchi F, Wressnegger C, Gascon H, Rieck K (2013) Chucky: exposing missing checks in source code for vulnerability discovery. In: Proceedings of the 2013 ACM SIGSAC conference on computer & communications security, pp 499\u2013510","DOI":"10.1145\/2508859.2516665"},{"key":"153_CR35","doi-asserted-by":"crossref","unstructured":"You W, Zong P, Chen K, Wang X, Liao X, Bian P, Liang B (2017) SemFuzz: semantics-based automatic generation of proof-of-concept exploits. 
In: Proceedings of the 2017 ACM SIGSAC conference on computer and communications security, pp 2139\u20132154","DOI":"10.1145\/3133956.3134085"},{"issue":"10","key":"153_CR36","doi-asserted-by":"publisher","first-page":"1898","DOI":"10.1007\/s11431-020-1666-4","volume":"63","author":"M Zhang","year":"2020","unstructured":"Zhang M (2020) A survey of syntactic-semantic parsing based on constituent and dependency structures. Sci China Technol Sci 63(10):1898\u20131920","journal-title":"Sci China Technol Sci"},{"issue":"3","key":"153_CR37","doi-asserted-by":"publisher","first-page":"227","DOI":"10.1007\/s10515-011-0082-3","volume":"18","author":"H Zhong","year":"2011","unstructured":"Zhong H, Zhang L, Xie T, Mei H (2011) Inferring specifications for resources from natural language API documentation. Autom Softw Eng 18(3):227\u2013261","journal-title":"Autom Softw Eng"},{"key":"153_CR38","doi-asserted-by":"crossref","unstructured":"Zhou Y, Gu R, Chen T, Huang Z, Panichella S, Gall H (2017) Analyzing APIs documentation and code to detect directive defects. In: 2017 IEEE\/ACM 39th international conference on software engineering (ICSE), pp 27\u201337. IEEE","DOI":"10.1109\/ICSE.2017.11"},{"key":"153_CR39","unstructured":"Zong P, Lv T, Wang D, Deng Z, Liang R, Chen K (2020) FuzzGuard: filtering out unreachable inputs in directed grey-box fuzzing through deep learning. 
In: 29th USENIX security symposium (USENIX security 20), pp 2255\u20132269"}],"container-title":["Cybersecurity"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s42400-023-00153-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s42400-023-00153-0\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s42400-023-00153-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,4,3]],"date-time":"2023-04-03T10:41:08Z","timestamp":1680518468000},"score":1,"resource":{"primary":{"URL":"https:\/\/cybersecurity.springeropen.com\/articles\/10.1186\/s42400-023-00153-0"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,4,3]]},"references-count":39,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2023,12]]}},"alternative-id":["153"],"URL":"https:\/\/doi.org\/10.1186\/s42400-023-00153-0","relation":{},"ISSN":["2523-3246"],"issn-type":[{"type":"electronic","value":"2523-3246"}],"subject":[],"published":{"date-parts":[[2023,4,3]]},"assertion":[{"value":"28 December 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"19 March 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"3 April 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"We confirm that none of the authors has any competing interests in the manuscript.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing 
interests"}}],"article-number":"17"}}