{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,5]],"date-time":"2025-11-05T14:34:55Z","timestamp":1762353295414,"version":"3.40.3"},"publisher-location":"Cham","reference-count":34,"publisher":"Springer Nature Switzerland","isbn-type":[{"type":"print","value":"9783031240485"},{"type":"electronic","value":"9783031240492"}],"license":[{"start":{"date-parts":[[2022,1,1]],"date-time":"2022-01-01T00:00:00Z","timestamp":1640995200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,1,19]],"date-time":"2023-01-19T00:00:00Z","timestamp":1674086400000},"content-version":"vor","delay-in-days":383,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Signature and anomaly-based techniques are the fundamental methods to detect malware. However, in recent years this type of threat has advanced to become more complex and sophisticated, making these techniques less effective. For this reason, researchers have resorted to state-of-the-art machine learning techniques to combat the threat of information security. Nevertheless, despite the integration of the machine learning models, there is still a shortage of data in training that prevents these models from performing at their peak. In the past, generative models have been found to be highly effective at generating image-like data that are similar to the actual data distribution. In this paper, we leverage the knowledge of generative modeling on opcode sequences and aim to generate malware samples by taking advantage of the contextualized embeddings from BERT. We obtained promising results when differentiating between real and generated samples. We observe that generated malware has such similar characteristics to actual malware that the classifiers are having difficulty in distinguishing between the two, in which the classifiers falsely identify the generated malware as actual malware almost <jats:inline-formula><jats:tex-math>$$90\\%$$<\/jats:tex-math><\/jats:inline-formula> of the time.<\/jats:p>","DOI":"10.1007\/978-3-031-24049-2_2","type":"book-chapter","created":{"date-parts":[[2023,1,18]],"date-time":"2023-01-18T16:02:56Z","timestamp":1674057776000},"page":"22-37","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":4,"title":["Word Embeddings for\u00a0Fake Malware Generation"],"prefix":"10.1007","author":[{"given":"Quang Duy","family":"Tran","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2355-7146","authenticated-orcid":false,"given":"Fabio","family":"Di Troia","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2023,1,19]]},"reference":[{"key":"2_CR1","unstructured":"Advanced guide to inception V3, Google. https:\/\/cloud.google.com\/tpu\/docs\/inception-v3-advanced"},{"key":"2_CR2","volume-title":"Computer Viruses and Malware","author":"J Aycock","year":"2006","unstructured":"Aycock, J.: Computer Viruses and Malware. Springer, New York (2006)"},{"key":"2_CR3","series-title":"Computer Communications and Networks","doi-asserted-by":"publisher","first-page":"281","DOI":"10.1007\/978-3-319-92624-7_12","volume-title":"Guide to Vulnerability Analysis for Computer Networks and Systems","author":"D Dhanasekar","year":"2018","unstructured":"Dhanasekar, D., Di Troia, F., Potika, K., Stamp, M.: Detecting encrypted and polymorphic malware using hidden Markov models. In: Parkinson, S., Crampton, A., Hill, R. (eds.) Guide to Vulnerability Analysis for Computer Networks and Systems. CCN, pp. 281\u2013299. Springer, Cham (2018). https:\/\/doi.org\/10.1007\/978-3-319-92624-7_12"},{"key":"2_CR4","unstructured":"Sanh, V., Debut, L., Chaumond, J., Wolf, T.: DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. ArXiv, abs\/1910.01108 (2019)"},{"issue":"5","key":"2_CR5","doi-asserted-by":"publisher","first-page":"41","DOI":"10.1109\/MSP.2011.98","volume":"9","author":"P O\u2019Kane","year":"2011","unstructured":"O\u2019Kane, P., Sezer, S., McLaughlin, K.: Obfuscation: the hidden malware. IEEE Secur. Priv. 9(5), 41\u201347 (2011). https:\/\/doi.org\/10.1109\/MSP.2011.98","journal-title":"IEEE Secur. Priv."},{"key":"2_CR6","unstructured":"Hugging Face. Distilbert. https:\/\/huggingface.co\/transformers\/model_doc\/distilbert.html"},{"key":"2_CR7","doi-asserted-by":"crossref","unstructured":"Clark, K., Khandelwal, U., Levy, O., Manning, C.: What does BERT look at? an analysis of BERT\u2019s attention. In: Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, Florence, Italy, August 2019, pp. 276\u2013286. Association for Computational Linguistics (2019)","DOI":"10.18653\/v1\/W19-4828"},{"key":"2_CR8","unstructured":"Microsoft Security Intelligence. Winwebsec (2010). https:\/\/www.microsoft.com\/security\/portal\/threat\/encyclopedia\/entry.aspx?Name=Win32%2fWinwebsec"},{"key":"2_CR9","unstructured":"Microsoft Security Intelligence. Zbot (2010). https:\/\/www.microsoft.com\/en-us\/wdsi\/threats\/malware-encyclopedia-description?name=win32%2Fzbot"},{"key":"2_CR10","unstructured":"Asher-Dotan, L.: What is zero access malware, cybereason i cybersecurity software to end cyber attacks, 16-May-2016. https:\/\/www.cybereason.com\/blog\/what-is-zeroaccess-malware"},{"key":"2_CR11","unstructured":"Microsoft Security Intelligence. VBInject (2010). https:\/\/www.microsoft.com\/en-us\/wdsi\/threats\/malware-encyclopedia-description?Name=VirTool:Win32\/VBInject%26ThreatID=-2147367171"},{"key":"2_CR12","unstructured":"Microsoft Security Intelligence. Onlinegames (2008). https:\/\/www.microsoft.com\/en-us\/wdsi\/threats\/malware-encyclopedia-description?Name=PWS%3AWin32%2FOnLineGames"},{"key":"2_CR13","unstructured":"Microsoft Security Intelligence. Renos (2006). https:\/\/www.microsoft.com\/en-us\/wdsi\/threats\/malware-encyclopedia-description?Name=TrojanDownloader:Win32\/Renos &threatId=16054"},{"key":"2_CR14","unstructured":"Microsoft Security Intelligence. BHO (2020). https:\/\/www.microsoft.com\/en-us\/wdsi\/threats\/malware-encyclopedia-description?Name=Trojan:Win32\/BHO.BO"},{"key":"2_CR15","unstructured":"Johnson, J.: Number of malware attacks per year 2020, Statista, 20-Aug-2021. https:\/\/www.statista.com\/statistics\/873097\/malware-attacks-per-year-worldwide\/"},{"key":"2_CR16","unstructured":"Vaswani, A., et al.: Attention is all you need (2017). https:\/\/arxiv.org\/abs\/1706.03762"},{"key":"2_CR17","doi-asserted-by":"crossref","unstructured":"Huang, L., Joseph, A.D., Nelson, B., Rubinstein, B.I., Tygar, J.D.: Adversarial machine learning. In: Proceedings of the 4th ACM Workshop on Security and Artificial Intelligence, pp. 43\u201358 (2011)","DOI":"10.1145\/2046684.2046692"},{"issue":"1","key":"2_CR18","doi-asserted-by":"publisher","first-page":"15","DOI":"10.1007\/s10207-014-0248-7","volume":"14","author":"A Nappa","year":"2015","unstructured":"Nappa, A., Rafique, M.Z., Caballero, J.: The MALICIA dataset: identification and analysis of drive-by download operations. Int. J. Inf. Secur. 14(1), 15\u201333 (2015). https:\/\/doi.org\/10.1007\/s10207-014-0248-7","journal-title":"Int. J. Inf. Secur."},{"key":"2_CR19","unstructured":"\u201cnovelty and outlier detection\", scikit-learn. https:\/\/scikit-learn.org\/stable\/modules\/outlier_detection.html"},{"key":"2_CR20","doi-asserted-by":"publisher","first-page":"413","DOI":"10.1109\/ICDM.2008.17","volume":"2008","author":"FT Liu","year":"2008","unstructured":"Liu, F.T., Ting, K.M., Zhou, Z.: Isolation forest. Eighth IEEE Int. Conf. Data Min. 2008, 413\u2013422 (2008). https:\/\/doi.org\/10.1109\/ICDM.2008.17","journal-title":"Eighth IEEE Int. Conf. Data Min."},{"key":"2_CR21","doi-asserted-by":"publisher","unstructured":"Burks, R., Islam, K.A., Lu, Y., Li, J.: Data augmentation with generative models for improved malware detection: a comparative study. In: 2019 IEEE 10th Annual Ubiquitous Computing, Electronics Mobile Communication Conference (UEMCON), pp. 0660\u20130665 (2019). https:\/\/doi.org\/10.1109\/UEMCON47517.2019.8993085","DOI":"10.1109\/UEMCON47517.2019.8993085"},{"key":"2_CR22","unstructured":"Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., Chen, X.: Improved techniques for training GANs (2016)"},{"key":"2_CR23","doi-asserted-by":"publisher","unstructured":"Bounsiar, A., Madden, M.G.: One-class support vector machines revisited. In: International Conference on Information Science & Applications (ICISA) 2014, pp. 1\u20134 (2014). https:\/\/doi.org\/10.1109\/ICISA.2014.6847442","DOI":"10.1109\/ICISA.2014.6847442"},{"key":"2_CR24","doi-asserted-by":"publisher","unstructured":"Pokrajac, D., Lazarevic, A., Latecki, L.J.: Incremental local outlier detection for data streams. In: IEEE Symposium on Computational Intelligence and Data Mining 2007, pp. 504\u2013515 (2007). https:\/\/doi.org\/10.1109\/CIDM.2007.368917","DOI":"10.1109\/CIDM.2007.368917"},{"key":"2_CR25","doi-asserted-by":"publisher","DOI":"10.1007\/s11416-022-00424-3","author":"AS Kale","year":"2022","unstructured":"Kale, A.S., Pandya, V., Di Troia, F., et al.: Malware classification with Word2Vec, HMM2Vec, BERT, and ELMo. J. Comput. Virol. Hack. Tech. (2022). https:\/\/doi.org\/10.1007\/s11416-022-00424-3","journal-title":"J. Comput. Virol. Hack. Tech."},{"key":"2_CR26","unstructured":"Arjovsky, M., Chintala, S., Bottou, L.: Wasserstein GAN (2017)"},{"key":"2_CR27","unstructured":"Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., Hochreiter, S.: GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In: Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS 2017, pp. 6629\u20136640. Curran Associates Inc. (2017)"},{"key":"2_CR28","doi-asserted-by":"publisher","unstructured":"Gibert, D., Mateu, C., Planes, J.: The rise of machine learning for detection and classification of malware: research developments, trends and challenges. J. Netw. Comput. Appl. 153, 102526 (2020). https:\/\/doi.org\/10.1016\/j.jnca.2019.102526, https:\/\/www.sciencedirect.com\/science\/article\/pii\/S1084804519303868","DOI":"10.1016\/j.jnca.2019.102526"},{"key":"2_CR29","doi-asserted-by":"publisher","unstructured":"Lu, Y., Li, J.: Generative adversarial network for improving deep learning based malware classification. In: 2019 Winter Simulation Conference (WSC), pp. 584\u2013593 (2019). https:\/\/doi.org\/10.1109\/WSC40007.2019.9004932","DOI":"10.1109\/WSC40007.2019.9004932"},{"key":"2_CR30","unstructured":"Roberts, J.M.: VirusShare.com - Because Sharing is Caring (2011). http:\/\/www.virusshare.com"},{"key":"2_CR31","doi-asserted-by":"publisher","unstructured":"Harshit, T.: Fake malware opcodes generation using HMM and different GAN algorithms (2021). Master\u2019s Projects. 1001. https:\/\/doi.org\/10.31979\/etd.eq6a-twvq, https:\/\/scholarworks.sjsu.edu\/etd_projects\/1001","DOI":"10.31979\/etd.eq6a-twvq"},{"key":"2_CR32","unstructured":"Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., Courville, A.: Improved training of wasserstein GANs (2017)"},{"issue":"1","key":"2_CR33","doi-asserted-by":"publisher","first-page":"79","DOI":"10.1007\/s11416-019-00345-8","volume":"16","author":"S Basole","year":"2020","unstructured":"Basole, S., Di Troia, F., Stamp, M.: Multifamily malware models. J. Comput. Virol. Hacking Tech. 16(1), 79\u201392 (2020). https:\/\/doi.org\/10.1007\/s11416-019-00345-8","journal-title":"J. Comput. Virol. Hacking Tech."},{"key":"2_CR34","unstructured":"sklearn. Gridsearchcv. https:\/\/scikitlearn.org\/stable\/modules\/generated\/sklearn.model_selection.GridSearchCV.html"}],"container-title":["Communications in Computer and Information Science","Silicon Valley Cybersecurity Conference"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/978-3-031-24049-2_2","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,18]],"date-time":"2023-01-18T16:03:11Z","timestamp":1674057791000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/978-3-031-24049-2_2"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022]]},"ISBN":["9783031240485","9783031240492"],"references-count":34,"URL":"https:\/\/doi.org\/10.1007\/978-3-031-24049-2_2","relation":{},"ISSN":["1865-0929","1865-0937"],"issn-type":[{"type":"print","value":"1865-0929"},{"type":"electronic","value":"1865-0937"}],"subject":[],"published":{"date-parts":[[2022]]},"assertion":[{"value":"19 January 2023","order":1,"name":"first_online","label":"First Online","group":{"name":"ChapterHistory","label":"Chapter History"}},{"value":"SVCC","order":1,"name":"conference_acronym","label":"Conference Acronym","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Silicon Valley Cybersecurity Conference","order":2,"name":"conference_name","label":"Conference Name","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"2022","order":5,"name":"conference_year","label":"Conference Year","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"17 August 2022","order":7,"name":"conference_start_date","label":"Conference Start Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"19 August 2022","order":8,"name":"conference_end_date","label":"Conference End Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"3","order":9,"name":"conference_number","label":"Conference Number","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"svcc2022","order":10,"name":"conference_id","label":"Conference ID","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"https:\/\/svcc2022.svcsi.org\/","order":11,"name":"conference_url","label":"Conference URL","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Double-blind","order":1,"name":"type","label":"Type","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"Easychair","order":2,"name":"conference_management_system","label":"Conference Management System","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"10","order":3,"name":"number_of_submissions_sent_for_review","label":"Number of Submissions Sent for Review","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"8","order":4,"name":"number_of_full_papers_accepted","label":"Number of Full Papers Accepted","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"0","order":5,"name":"number_of_short_papers_accepted","label":"Number of Short Papers Accepted","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"80% - The value is computed by the equation \"Number of Full Papers Accepted \/ Number of Submissions Sent for Review * 100\" and then rounded to a whole number.","order":6,"name":"acceptance_rate_of_full_papers","label":"Acceptance Rate of Full Papers","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"3","order":7,"name":"average_number_of_reviews_per_paper","label":"Average Number of Reviews per Paper","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"1","order":8,"name":"average_number_of_papers_per_reviewer","label":"Average Number of Papers per Reviewer","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"Yes","order":9,"name":"external_reviewers_involved","label":"External Reviewers Involved","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}}]}}