{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,9]],"date-time":"2026-04-09T07:33:08Z","timestamp":1775719988083,"version":"3.50.1"},"reference-count":75,"publisher":"Association for Computing Machinery (ACM)","issue":"5","license":[{"start":{"date-parts":[[2023,7,22]],"date-time":"2023-07-22T00:00:00Z","timestamp":1689984000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["61972455"],"award-info":[{"award-number":["61972455"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Softw. Eng. Methodol."],"published-print":{"date-parts":[[2023,9,30]]},"abstract":"<jats:p>Software vulnerabilities, once disclosed, can be documented in vulnerability databases, which have great potential to advance vulnerability analysis and security research. People describe the key characteristics of software vulnerabilities in natural language mixed with domain-specific names and concepts. This textual nature poses a significant challenge for the automatic analysis of vulnerability knowledge embedded in text. Automatic extraction of key vulnerability aspects is highly desirable but demands significant effort to manually label data for model training.<\/jats:p>\n          <jats:p>In this article, we propose unsupervised methods to label and extract important vulnerability concepts in textual vulnerability descriptions (TVDs). 
We focus on six types of phrase-based vulnerability concepts (vulnerability type, vulnerable component, root cause, attacker type, impact, and attack vector) as they are much more difficult to label and extract than name- or number-based entities (i.e., vendor, product, and version). Our approach is based on a key observation that the same-type of phrases, no matter how they differ in sentence structures and phrase expressions, usually share syntactically similar paths in the sentence parsing trees. Specifically, we present a source-target neural architecture that learns the Part-of-Speech (POS) tagging to identify a token\u2019s functional role within TVDs, where the source neural model is trained to capture common features found in the TVD corpus, and the target model is trained to identify linguistically malformed words specific to the security domain. Our evaluation confirms that the proposed tagger outperforms (4.45%\u20135.98%) the taggers designed on natural language notions and identifies a broad set of TVDs and natural language contents. Then, based on the key observations, we propose two path representations (absolute paths and relative paths) and use an auto-encoder to encode such syntactic similarities. To address the discrete nature of our paths, we enhance the traditional Variational Auto-encoder (VAE) with Gumble-Max trick for categorical data distribution and thus create a Categorical VAE (CaVAE). In the latent space of absolute and relative paths, we further apply unsupervised clustering techniques to generate clusters of the same-type of concepts. Our evaluation confirms the effectiveness of our CaVAE, which achieves a small (85.85) log-likelihood for encoding path representations and the accuracy (83%\u201389%) of vulnerability concepts in the resulting clusters.<\/jats:p>\n          <jats:p>The resulting clusters accurately label six types of vulnerability concepts from a TVD corpus in an unsupervised way. 
Furthermore, these labeled vulnerability concepts can be mapped back to the corresponding phrases in the original TVDs, which produce labels of six types of vulnerability concepts. The resulting labeled TVDs can be used to train concept extraction models for other TVD corpora. In this work, we present two concept extraction methods (concept classification and sequence labeling model) to demonstrate the utility of the unsupervisedly labeled concepts. Our study shows that models trained with our unsupervisedly labeled vulnerability concepts outperform (3.9%\u20135.14%) those trained with the two manually labeled TVD datasets from previous work due to the consistent boundary and typing by our unsupervised labeling method.<\/jats:p>","DOI":"10.1145\/3579638","type":"journal-article","created":{"date-parts":[[2023,2,9]],"date-time":"2023-02-09T13:48:04Z","timestamp":1675950484000},"page":"1-45","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":10,"title":["Extraction of Phrase-based Concepts in Vulnerability Descriptions through Unsupervised Labeling"],"prefix":"10.1145","volume":"32","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9247-7521","authenticated-orcid":false,"given":"Sofonias","family":"Yitagesu","sequence":"first","affiliation":[{"name":"College of Intelligence and Computing, Tianjin University, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7663-1421","authenticated-orcid":false,"given":"Zhenchang","family":"Xing","sequence":"additional","affiliation":[{"name":"CSIRO\u2019s Data61 and School of Computing, Australia National University, Australia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3931-3886","authenticated-orcid":false,"given":"Xiaowang","family":"Zhang","sequence":"additional","affiliation":[{"name":"College of Intelligence and Computing, Tianjin University, 
China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8158-7453","authenticated-orcid":false,"given":"Zhiyong","family":"Feng","sequence":"additional","affiliation":[{"name":"College of Intelligence and Computing, Tianjin University, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0752-6764","authenticated-orcid":false,"given":"Xiaohong","family":"Li","sequence":"additional","affiliation":[{"name":"College of Intelligence and Computing, Tianjin University, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9747-4426","authenticated-orcid":false,"given":"Linyi","family":"Han","sequence":"additional","affiliation":[{"name":"College of Intelligence and Computing, Tianjin University, China"}]}],"member":"320","published-online":{"date-parts":[[2023,7,22]]},"reference":[{"key":"e_1_3_1_2_2","first-page":"265","volume-title":"Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation","author":"Abadi Mart\u00edn","year":"2016","unstructured":"Mart\u00edn Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, and Jeffrey Dean. 2016. TensorFlow: A system for large-scale machine learning. In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation 265\u2013283."},{"key":"e_1_3_1_3_2","first-page":"1638","volume-title":"Proceedings of the 27th International Conference on Computational Linguistics.","author":"Akbik Alan","year":"2018","unstructured":"Alan Akbik, Duncan Blythe, and Roland Vollgraf. 2018. Contextual string embeddings for sequence labeling. In Proceedings of the 27th International Conference on Computational Linguistics.Emily M. Bender, Leon Derczynski, and Pierre Isabelle (Eds.), 1638\u20131649."},{"key":"e_1_3_1_4_2","first-page":"129","volume-title":"Proceedings of the 3rd International Symposium on Artificial Intelligence and Signal Processing","year":"2015","unstructured":"AtefehZafarian, Ali Rokni, Shahram Khadivi, and Sonia Ghiasifard. 2015. 
Semi-supervised learning for named entity recognition using weakly labeled training data. In Proceedings of the 3rd International Symposium on Artificial Intelligence and Signal Processing. 129\u2013135."},{"key":"e_1_3_1_5_2","doi-asserted-by":"crossref","unstructured":"Hodaya Binyamini Ron Bitton Masaki Inokuchi Tomohiko Yagyu Yuval Elovici and Asaf Shabtai. 2020. An automated end-to-end framework for modeling attacks from vulnerability Descriptions. arXiv:2008.04377. Retrieved from https:\/\/arxiv.org\/abs\/2008.04377.","DOI":"10.1145\/3447548.3467159"},{"key":"e_1_3_1_6_2","first-page":"69","volume-title":"Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference.","author":"Bird Steven","year":"2006","unstructured":"Steven Bird. 2006. NLTK: The natural language toolkit. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference.69\u201372."},{"key":"e_1_3_1_7_2","first-page":"177","volume-title":"Proceedings of the 19th International Conference on Computational Statistics.","author":"Bottou L\u00e9on","year":"2010","unstructured":"L\u00e9on Bottou. 2010. Large-scale machine learning with stochastic gradient descent. In Proceedings of the 19th International Conference on Computational Statistics.177\u2013186."},{"key":"e_1_3_1_8_2","first-page":"437","volume-title":"Proceedings of the 16th IEEE International Conference on Machine Learning and Applications.","author":"Bridges Robert A.","year":"2017","unstructured":"Robert A. Bridges, Kelly M. T. Huffer, Corinne L. Jones, Michael D. Iannacone, and John R. Goodall. 2017. Cybersecurity automated information extraction techniques: Drawbacks of current methods, and enhanced extractors. 
In Proceedings of the 16th IEEE International Conference on Machine Learning and Applications.437\u2013442."},{"key":"e_1_3_1_9_2","unstructured":"Robert A. Bridges Corinne L. Jones Michael D. Iannacone and John R. Goodall. 2013. Automatic labeling for entity extraction in cyber security. arXiv:1308.4941. Retrieved from https:\/\/arxiv.org\/abs\/1308.4941."},{"key":"e_1_3_1_10_2","first-page":"6503","volume-title":"Proceedings of the 28th International Joint Conference on Artificial Intelligence","author":"Chen Haipeng","year":"2019","unstructured":"Haipeng Chen, Jing Liu, Rui Liu, Noseong Park, and V. S. Subrahmanian. 2019. VEST: A system for vulnerability exploit scoring & timing. In Proceedings of the 28th International Joint Conference on Artificial Intelligence. 6503\u20136505. Francois Chollet. 2015. Keras. Retrieved from https:\/\/keras.io."},{"key":"e_1_3_1_11_2","unstructured":"Francois Chollet. 2015. Keras. Retrieved from https:\/\/keras.io."},{"key":"e_1_3_1_12_2","first-page":"1","volume-title":"Proceedings of the 7th Conference on Empirical Methods in Natural Language Processing.","author":"Collins Michael","year":"2002","unstructured":"Michael Collins. 2002. Discriminative training methods for hidden markov models: Theory and experiments with perceptron algorithms. In Proceedings of the 7th Conference on Empirical Methods in Natural Language Processing.1\u20138."},{"key":"e_1_3_1_13_2","unstructured":"Jacob Devlin Ming-Wei Chang Kenton Lee and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805. Retrieved from https:\/\/arxiv.org\/abs\/1810.04805."},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.5555\/3361338.3361399"},{"key":"e_1_3_1_15_2","first-page":"226","volume-title":"Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining","year":"1996","unstructured":"Ester, Hans-Peter Kriegel, J\u00f6rg Sander, and Xiaowei Xu. 1996. 
A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining. 226\u2013231."},{"key":"e_1_3_1_16_2","doi-asserted-by":"crossref","unstructured":"Zhangyin Feng Daya Guo Duyu Tang Nan Duan Xiaocheng Feng Ming Gong Linjun Shou Bing Qin Ting Liu Daxin Jiang and Ming Zhou. 2020. CodeBERT: A pre-trained model for programming and natural languages. arXiv:2002.08155. Retrieved from https:\/\/arxiv.org\/abs\/2002.08155.","DOI":"10.18653\/v1\/2020.findings-emnlp.139"},{"issue":"19","key":"e_1_3_1_17_2","article-title":"Information extraction of cybersecurity concepts: An LSTM approach","volume":"9","author":"Gasmi Houssem","year":"2019","unstructured":"Houssem Gasmi, Jannik Laval, and Abdelaziz Bouras. 2019. Information extraction of cybersecurity concepts: An LSTM approach. Applied Sciences 9, 19 (2019).","journal-title":"Applied Sciences"},{"key":"e_1_3_1_18_2","article-title":"Word2vec","year":"2019","unstructured":"Google. 2019. Word2vec. Retrieved August 30, 2021 from https:\/\/code.google.com\/archive\/p\/word2vec\/.","journal-title":"Retrieved August 30, 2021 from https:\/\/code.google.com\/archive\/p\/word2vec\/"},{"key":"e_1_3_1_19_2","volume-title":"Statistical Theory of Extreme Values and Some Practical Applications: A Series of Lectures","author":"Gumbel Emil Julius","year":"1954","unstructured":"Emil Julius Gumbel. 1954. Statistical Theory of Extreme Values and Some Practical Applications: A Series of Lectures. US Government Printing Office."},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.5555\/3367032.3367077"},{"key":"e_1_3_1_21_2","first-page":"1020","volume-title":"Proceedings of the 45th IEEE Annual Computers, Software, and Applications Conference","author":"Guo Hao","year":"2021","unstructured":"Hao Guo, Zhenchang Xing, Sen Chen, Xiaohong Li, Yude Bai, and Hu Zhang. 2021. 
Key aspects augmentation of vulnerability description based on multiple security databases. In Proceedings of the 45th IEEE Annual Computers, Software, and Applications Conference. 1020\u20131025."},{"key":"e_1_3_1_22_2","article-title":"IBM X-Force Exchange","year":"2019","unstructured":"IBM. 2019. IBM X-Force Exchange. Retrieved June 30, 2021 from https:\/\/exchange.xforce.ibmcloud.com\/.","journal-title":"Retrieved June 30, 2021 from https:\/\/exchange.xforce.ibmcloud.com\/"},{"key":"e_1_3_1_23_2","volume-title":"Proceedings of the 5th International Conference on Learning Representations","author":"Jang Eric","year":"2017","unstructured":"Eric Jang, Shixiang Gu, and Ben Poole. 2017. Categorical reparameterization with gumbel-softmax. In Proceedings of the 5th International Conference on Learning Representations."},{"key":"e_1_3_1_24_2","article-title":"key details phrasing","author":"Evans MITRE Jonathan","year":"2020","unstructured":"MITRE Jonathan Evans. 2020. key details phrasing. Retrieved June, 2021 from http:\/\/cveproject.github.io\/docs\/content\/key-details-phrasing.pdf.","journal-title":"Retrieved June, 2021 from http:\/\/cveproject.github.io\/docs\/content\/key-details-phrasing.pdf"},{"key":"e_1_3_1_25_2","first-page":"11:1\u201311:4","volume-title":"Proceedings of the 10th Annual Cyber and Information Security Research Conference.","author":"Jones Corinne L.","year":"2015","unstructured":"Corinne L. Jones, Robert A. Bridges, Kelly M. T. Huffer, and John R. Goodall. 2015. Towards a relation extraction framework for cyber-security concepts. In Proceedings of the 10th Annual Cyber and Information Security Research Conference.11:1\u201311:4."},{"key":"e_1_3_1_26_2","first-page":"252","volume-title":"Proceedings of the 7th IEEE International Conference on Semantic Computing.","author":"Joshi Arnav","year":"2013","unstructured":"Arnav Joshi, Ravendar Lal, Tim Finin, and Anupam Joshi. 2013. Extracting cybersecurity related linked data from text. 
In Proceedings of the 7th IEEE International Conference on Semantic Computing.252\u2013259."},{"key":"e_1_3_1_27_2","doi-asserted-by":"crossref","unstructured":"Simran K Sriram S Vinayakumar R and K. P. Soman. 2020. Deep learning approach for intelligent named entity recognition of cyber security. arXiv:2004.00502. Retrieved from https:\/\/arxiv.org\/abs\/2004.00502.","DOI":"10.1007\/978-981-15-4828-4_14"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1007\/s13042-020-01122-6"},{"key":"e_1_3_1_29_2","volume-title":"Proceedings of the 3rd International Conference on Learning Representations","author":"Kingma Diederik P.","year":"2015","unstructured":"Diederik P. Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In Proceedings of the 3rd International Conference on Learning Representations."},{"key":"e_1_3_1_30_2","volume-title":"Proceedings of the 2nd International Conference on Learning Representations.","author":"Kingma Diederik P.","year":"2014","unstructured":"Diederik P. Kingma and Max Welling. 2014. Auto-encoding variational bayes. In Proceedings of the 2nd International Conference on Learning Representations."},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.5555\/927682"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.2307\/2529786"},{"issue":"3","key":"e_1_3_1_33_2","first-page":"3072","article-title":"Sequence labeling with meta-learning","volume":"35","author":"Li Jing","year":"2023","unstructured":"Jing Li, Peng Han, Xiangnan Ren, Jilin Hu, Lisi Chen, and Shuo Shang. 2023. Sequence labeling with meta-learning. IEEE Transactions on Knowledge and Data Engineering 35, 3 (2023), 3072\u20133086.","journal-title":"IEEE Transactions on Knowledge and Data Engineering"},{"key":"e_1_3_1_34_2","first-page":"429","volume-title":"Proceedings of the 20th Web Conference","author":"Li Jing","year":"2020","unstructured":"Jing Li, Shuo Shang, and Ling Shao. 2020. 
MetaNER: Named entity recognition with meta-learning. In Proceedings of the 20th Web Conference. 429\u2013440."},{"key":"e_1_3_1_35_2","unstructured":"Jing Li Aixin Sun Jianglei Han and Chenliang Li. 2018. A Survey on deep learning for named entity recognition. arXiv:1812.09449. Retrieved from https:\/\/arxiv.org\/abs\/1812.09449."},{"key":"e_1_3_1_36_2","first-page":"755","volume-title":"Proceedings of the 23rd ACM\/SIGSAC Conference on Computer and Communications Security.","author":"Liao Xiaojing","year":"2016","unstructured":"Xiaojing Liao, Kan Yuan, XiaoFeng Wang, Zhou Li, Luyi Xing, and Raheem A. Beyah. 2016. Acing the IOC game: Toward automatic discovery and analysis of open-source cyber threat intelligence. In Proceedings of the 23rd ACM\/SIGSAC Conference on Computer and Communications Security.755\u2013766."},{"key":"e_1_3_1_37_2","first-page":"1557","volume-title":"Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics","author":"Lim Swee Kiat","year":"2017","unstructured":"Swee Kiat Lim, Aldrian Obaja Muis, Wei Lu, and Ong Chen Hui. 2017. MalwareTextDB: A database for annotated malware articles. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics. 1557\u20131567."},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41592-018-0308-4"},{"key":"e_1_3_1_39_2","volume-title":"Proceedings of the 5th International Conference on Learning Representations","author":"Maddison Chris J.","year":"2017","unstructured":"Chris J. Maddison, Andriy Mnih, and Yee Whye Teh. 2017. The concrete distribution: A continuous relaxation of discrete random variables. 
In Proceedings of the 5th International Conference on Learning Representations."},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.5555\/972470.972475"},{"key":"e_1_3_1_41_2","first-page":"60","volume-title":"Proceedings of the 12th International Conference on Machine Learning and Applications","author":"McNeil Nikki","year":"2013","unstructured":"Nikki McNeil, Robert A. Bridges, Michael D. Iannacone, Bogdan D. Czejdo, Nicolas Perez, and John R. Goodall. 2013. PACE: Pattern accurate computationally efficient bootstrapping for timely discovery of cyber-security concepts. In Proceedings of the 12th International Conference on Machine Learning and Applications. 60\u201365."},{"key":"e_1_3_1_42_2","volume-title":"Proceedings of the Published by FIRST-forum of Incident Response and Security Teams","author":"Mell Peter","year":"2007","unstructured":"Peter Mell, Karen Scarfone, Sasha Romanosky, et\u00a0al. 2007. A complete guide to the common vulnerability scoring system version 2.0. In Proceedings of the Published by FIRST-forum of Incident Response and Security Teams."},{"key":"e_1_3_1_43_2","doi-asserted-by":"publisher","DOI":"10.5555\/541177"},{"key":"e_1_3_1_44_2","article-title":"National vulnerability database (NVD)","author":"MITRE Corporation","year":"2017","unstructured":"Corporation MITRE. 2017. National vulnerability database (NVD). Retrieved January 21, 2021 from https:\/\/nvd.nist.gov\/.","journal-title":"Retrieved January 21, 2021 from https:\/\/nvd.nist.gov\/"},{"key":"e_1_3_1_45_2","article-title":"Common Vulnerabilities and Exposures (CVE)","author":"MITRE Corporation","year":"2019","unstructured":"Corporation MITRE. 2019. Common Vulnerabilities and Exposures (CVE). 
Retrieved June 30, 2021 from https:\/\/cve.mitre.org\/.","journal-title":"Retrieved June 30, 2021 from https:\/\/cve.mitre.org\/"},{"key":"e_1_3_1_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/ASONAM.2016.7752338"},{"key":"e_1_3_1_47_2","first-page":"75","volume-title":"Proceedings of the 33rd IEEE Symposium on Security and Privacy Workshops","author":"More Sumit","year":"2012","unstructured":"Sumit More, Mary Matthews, Anupam Joshi, and Tim Finin. 2012. A knowledge-based approach to intrusion detection modeling. In Proceedings of the 33rd IEEE Symposium on Security and Privacy Workshops. 75\u201381."},{"key":"e_1_3_1_48_2","first-page":"257","volume-title":"Proceedings of the 2nd IEEE\/ACM International Joint Conferenceon Web Intelligence and Intelligent Agent Technology - Workshops.","author":"Mulwad Varish","year":"2011","unstructured":"Varish Mulwad, Wenjia Li, Anupam Joshi, Tim Finin, and Krishnamurthy Viswanathan. 2011. Extracting information about security vulnerabilities from web text. In Proceedings of the 2nd IEEE\/ACM International Joint Conferenceon Web Intelligence and Intelligent Agent Technology - Workshops.257\u2013260."},{"key":"e_1_3_1_49_2","doi-asserted-by":"publisher","DOI":"10.5555\/3104322.3104425"},{"key":"e_1_3_1_50_2","first-page":"7","volume-title":"Proceedings of the 16th IEEE International Conference on Intelligence and Security Informatics","author":"Neil Lorenzo","year":"2018","unstructured":"Lorenzo Neil, Sudip Mittal, and Anupam Joshi. 2018. Mining threat intelligence about open-source projects and libraries from code repository issues and bug reports. In Proceedings of the 16th IEEE International Conference on Intelligence and Security Informatics. 7\u201312."},{"key":"e_1_3_1_51_2","article-title":"National Institute of Standards and Technology (NIST)","year":"2017","unstructured":"NIST. 2017. National Institute of Standards and Technology (NIST). 
Retrieved June 21, 2021 from https:\/\/www.nist.gov\/.","journal-title":"Retrieved June 21, 2021 from https:\/\/www.nist.gov\/"},{"key":"e_1_3_1_52_2","doi-asserted-by":"crossref","unstructured":"Matthew E. Peters Mark Neumann Mohit Iyyer Matt Gardner Christopher Clark Kenton Lee and Luke Zettlemoyer. 2018. Deep contextualized word representations. arXiv:1802.05365. Retrieved from https:\/\/arxiv.org\/abs\/1802.05365.","DOI":"10.18653\/v1\/N18-1202"},{"key":"e_1_3_1_53_2","first-page":"879","volume-title":"Proceedings of the 19th International Conference on Advances in Social Networks Analysis and Mining","author":"Pingle Aditya","year":"2019","unstructured":"Aditya Pingle, Aritran Piplai, Sudip Mittal, Anupam Joshi, James Holt, and Richard Zak. 2019. RelExt: Relation extraction using deep learning approaches for cybersecurity knowledge graph improvement. In Proceedings of the 19th International Conference on Advances in Social Networks Analysis and Mining. 879\u2013886."},{"key":"e_1_3_1_54_2","doi-asserted-by":"publisher","DOI":"10.1631\/FITEE.1800520"},{"key":"e_1_3_1_55_2","unstructured":"Alec Radford Karthik Narasimhan Tim Salimans and Ilya Sutskever. 2018. Improving language understanding by generative pre-training. (2018) 1\u201312."},{"issue":"1","key":"e_1_3_1_56_2","first-page":"012012","article-title":"Determination of optimal epsilon (Eps) value on DBSCAN algorithm to clustering data on peatland hotspots in sumatra","volume":"31","author":"Rahmah Nadia","year":"2016","unstructured":"Nadia Rahmah and Imas Sukaesih Sitanggang. 2016. Determination of optimal epsilon (Eps) value on DBSCAN algorithm to clustering data on peatland hotspots in sumatra. 
IOP Conference Series: Earth and Environmental Science 31, 1 (2016), 012012.","journal-title":"IOP Conference Series: Earth and Environmental Science"},{"key":"e_1_3_1_57_2","doi-asserted-by":"publisher","DOI":"10.5555\/1596374.1596399"},{"key":"e_1_3_1_58_2","first-page":"1278","volume-title":"Proceedings of the 31st International Conference on Machine Learning.","author":"Rezende Danilo Jimenez","year":"2014","unstructured":"Danilo Jimenez Rezende, Shakir Mohamed, and Daan Wierstra. 2014. Stochastic backpropagation and approximate inference in deep generative models. In Proceedings of the 31st International Conference on Machine Learning.1278\u20131286."},{"key":"e_1_3_1_59_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jss.2019.06.001"},{"key":"e_1_3_1_60_2","article-title":"Exploit Database","author":"Security Offensive","year":"2019","unstructured":"Offensive Security. 2019. Exploit Database. Retrieved June 30, 2021 from https:\/\/www.exploit-db.com\/.","journal-title":"Retrieved June 30, 2021 from https:\/\/www.exploit-db.com\/"},{"key":"e_1_3_1_61_2","doi-asserted-by":"publisher","DOI":"10.1162\/COLI_a_00178"},{"key":"e_1_3_1_62_2","volume-title":"Elements of Survey Sampling","author":"Singh Ravindra","year":"2013","unstructured":"Ravindra Singh and Naurang Singh Mangat. 2013. Elements of Survey Sampling. Springer Science and Business Media."},{"key":"e_1_3_1_63_2","unstructured":"Casper Kaae S\u00f8nderby Tapani Raiko Lars Maal\u00f8e S\u00f8ren Kaae S\u00f8nderby and Ole Winther. 2016. How to train deep variational autoencoders and probabilistic ladder networks. arXiv:1602.02282. Retrieved from https:\/\/arxiv.org\/abs\/1602.02282."},{"key":"e_1_3_1_64_2","article-title":"Stanford Parser","year":"2018","unstructured":"Stanford. 2018. Stanford Parser. 
Retrieved January 30, 2021 from https:\/\/nlp.stanford.edu\/software\/lex-parser.shtml.","journal-title":"Retrieved January 30, 2021 from https:\/\/nlp.stanford.edu\/software\/lex-parser.shtml"},{"key":"e_1_3_1_65_2","article-title":"Stanford Tagger","year":"2018","unstructured":"Stanford. 2018. Stanford Tagger. Retrieved January 30, 2021 from https:\/\/nlp.stanford.edu\/software\/tagger.shtml.","journal-title":"Retrieved January 30, 2021 from https:\/\/nlp.stanford.edu\/software\/tagger.shtml"},{"key":"e_1_3_1_66_2","doi-asserted-by":"crossref","unstructured":"Henry Tsai Jason Riesa Melvin Johnson Naveen Arivazhagan Xin Li and Amelia Archer. 2019. Small and practical BERT models for sequence labeling. arXiv:1909.00100. Retrieved from https:\/\/arxiv.org\/abs\/1909.00100.","DOI":"10.18653\/v1\/D19-1374"},{"key":"e_1_3_1_67_2","article-title":"Unsupervised concept extraction from clinical text through semantic composition","volume":"91","author":"Tulkens St\u00e9phan","year":"2019","unstructured":"St\u00e9phan Tulkens, Simon Suster, and Walter Daelemans. 2019. Unsupervised concept extraction from clinical text through semantic composition. Journal of Biomedical Informatics 91 (2019).","journal-title":"Journal of Biomedical Informatics"},{"key":"e_1_3_1_68_2","first-page":"220","volume-title":"Proceedings of the 16th International Conference on Security and Management","author":"Vadapalli Satyanarayan Raju","year":"2018","unstructured":"Satyanarayan Raju Vadapalli, George Hsieh, and Kevin S. Nauer. 2018. Twitterosint: Automated cybersecurity threat intelligence collection and analysis using twitter data. In Proceedings of the 16th International Conference on Security and Management. 220\u2013226."},{"issue":"86","key":"e_1_3_1_69_2","first-page":"2579","article-title":"Visualizing data using t-SNE","volume":"9","author":"Maaten Laurens van der","year":"2008","unstructured":"Laurens van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-SNE. 
Journal of Machine Learning Research 9, 86 (2008), 2579\u20132605.","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_3_1_70_2","first-page":"249","volume-title":"Proceedings of the 20th ACM Southeast Conference","author":"Vu Phong Minh","year":"2019","unstructured":"Phong Minh Vu, Tam The Nguyen, and Tung Thanh Nguyen. 2019. ALPACA: Advanced linguistic pattern and concept analysis framework for software engineering corpora. In Proceedings of the 20th ACM Southeast Conference. 249\u2013252."},{"key":"e_1_3_1_71_2","first-page":"356","volume-title":"Proceedings of the 7th International Symposium.","author":"Weerawardhana Sachini S.","year":"2014","unstructured":"Sachini S. Weerawardhana, Subhojeet Mukherjee, Indrajit Ray, and Adele E. Howe. 2014. Automated extraction of vulnerability information for home computer security. In Proceedings of the 7th International Symposium.356\u2013366."},{"key":"e_1_3_1_72_2","unstructured":"Yonghui Wu Mike Schuster Zhifeng Chen Quoc V. Le Mohammad Norouzi and Wolfgang Macherey. 2016. Google\u2019s neural machine translation system: Bridging the gap between human and machine translation. arXiv:1609.08144. Retrieved from https:\/\/arxiv.org\/abs\/1609.08144."},{"key":"e_1_3_1_73_2","first-page":"2161","volume-title":"Proceedings of the 13th International Conference on Natural Computation, Fuzzy Systems, and Knowledge Discovery","author":"Xiao Zhifeng","year":"2017","unstructured":"Zhifeng Xiao. 2017. Towards a two-phase unsupervised system for cybersecurity concepts extraction. In Proceedings of the 13th International Conference on Natural Computation, Fuzzy Systems, and Knowledge Discovery. 2161\u20132168."},{"key":"e_1_3_1_74_2","doi-asserted-by":"crossref","unstructured":"Semih Yagcioglu Mehmet Saygin Seyfioglu Begum Citamak Batuhan Bardak Seren Guldamlasioglu Azmi Yuksel and Emin Islam Tatli. 2019. Detecting cybersecurity events from noisy short text. arXiv:1904.05054. 
Retrieved from https:\/\/arxiv.org\/abs\/1904.05054.","DOI":"10.18653\/v1\/N19-1138"},{"key":"e_1_3_1_75_2","doi-asserted-by":"publisher","DOI":"10.1109\/ASE51524.2021.9678638"},{"key":"e_1_3_1_76_2","first-page":"29","volume-title":"Proceedings of the 18th IEEE\/ACM International Conference on Mining Software Repositories","author":"Yitagesu Sofonias","year":"2021","unstructured":"Sofonias Yitagesu, Xiaowang Zhang, Zhiyong Feng, Xiaohong Li, and Zhenchang Xing. 2021. Automatic part-of-speech tagging for security vulnerability descriptions. In Proceedings of the 18th IEEE\/ACM International Conference on Mining Software Repositories. 29\u201340."}],"container-title":["ACM Transactions on Software Engineering and Methodology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3579638","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3579638","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T17:48:44Z","timestamp":1750182524000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3579638"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,7,22]]},"references-count":75,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2023,9,30]]}},"alternative-id":["10.1145\/3579638"],"URL":"https:\/\/doi.org\/10.1145\/3579638","relation":{},"ISSN":["1049-331X","1557-7392"],"issn-type":[{"value":"1049-331X","type":"print"},{"value":"1557-7392","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,7,22]]},"assertion":[{"value":"2022-01-28","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-11-26","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2023-07-22","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}