{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,10]],"date-time":"2026-04-10T21:13:01Z","timestamp":1775855581067,"version":"3.50.1"},"reference-count":50,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2021,10,12]],"date-time":"2021-10-12T00:00:00Z","timestamp":1633996800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,10,12]],"date-time":"2021-10-12T00:00:00Z","timestamp":1633996800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Complex Intell. Syst."],"published-print":{"date-parts":[[2022,2]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Enterprises are striving to remain protected against malware-based cyber-attacks on their infrastructure, facilities, networks and systems. Static analysis is an effective approach to detect the malware, i.e., malicious Portable Executable (PE). It performs an in-depth analysis of PE files without executing, which is highly useful to minimize the risk of malicious PE contaminating the system. Yet, instant detection using static analysis has become very difficult due to the exponential rise in volume and variety of malware. The compelling need of early stage detection of malware-based attacks significantly motivates research inclination towards automated malware detection. The recent machine learning aided malware detection approaches using static analysis are mostly supervised. Supervised malware detection using static analysis requires manual labelling and human feedback; therefore, it is less effective in rapidly evolutionary and dynamic threat space. 
To this end, we propose a progressive deep unsupervised framework with a feature attention block for static analysis-based malware detection (PROUD-MAL). The framework is based on cascading blocks of unsupervised clustering and a feature attention-based deep neural network. The proposed deep neural network, embedded with the feature attention block, is trained on the pseudo labels. To evaluate the proposed unsupervised framework, we collected a real-time malware dataset by deploying low- and high-interaction honeypots on an enterprise organizational network. Moreover, an endpoint security solution was also deployed on the enterprise organizational network to collect malware samples. After post-processing and cleaning, the novel dataset consists of 15,457 PE samples comprising 8775 malicious and 6681 benign ones. The proposed PROUD-MAL framework achieved an accuracy of more than 98.09% with better quantitative performance in standard evaluation parameters on the collected dataset and outperformed other conventional machine learning algorithms. 
The implementation and dataset are available at <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"https:\/\/bit.ly\/35Sne3a\">https:\/\/bit.ly\/35Sne3a<\/jats:ext-link>.<\/jats:p>","DOI":"10.1007\/s40747-021-00560-1","type":"journal-article","created":{"date-parts":[[2021,10,12]],"date-time":"2021-10-12T23:33:30Z","timestamp":1634081610000},"page":"673-685","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":24,"title":["PROUD-MAL: static analysis-based progressive framework for deep unsupervised malware classification of windows portable executable"],"prefix":"10.1007","volume":"8","author":[{"given":"Syed Khurram Jah","family":"Rizvi","sequence":"first","affiliation":[]},{"given":"Warda","family":"Aslam","sequence":"additional","affiliation":[]},{"given":"Muhammad","family":"Shahzad","sequence":"additional","affiliation":[]},{"given":"Shahzad","family":"Saleem","sequence":"additional","affiliation":[]},{"given":"Muhammad Moazam","family":"Fraz","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2021,10,12]]},"reference":[{"key":"560_CR1","doi-asserted-by":"publisher","first-page":"565","DOI":"10.1109\/TC.2010.130","volume":"60","author":"Y Tang","year":"2011","unstructured":"Tang Y, Xiao B, Lu X (2011) Signature tree generation for polymorphic worms. IEEE Trans Comput 60:565\u2013579. https:\/\/doi.org\/10.1109\/TC.2010.130","journal-title":"IEEE Trans Comput"},{"key":"560_CR2","unstructured":"Internet security threat report (2019) https:\/\/www.symantec.com\/content\/dam\/symantec\/docs\/reports\/istr-24-2019-en.pdf"},{"key":"560_CR3","first-page":"56","volume":"5","author":"E Gandotra","year":"2014","unstructured":"Gandotra E, Bansal D, Sofat S (2014) Malware analysis and classification: a survey. 
J Inf Secur 5:56\u201364","journal-title":"J Inf Secur"},{"key":"560_CR4","unstructured":"Provos N (2004) A virtual honeypot framework. In: Proceedings of the 13th conference on USENIX Security Symposium"},{"key":"560_CR5","unstructured":"Sung AH, Xu J, Chavez P, Mukkamala S (2004) Static analyzer of vicious executables (SAVE). In: Proceedings of the 20th annual computer security applications conference"},{"key":"560_CR6","first-page":"1","volume":"16","author":"T Wuchner","year":"2020","unstructured":"Wuchner T, Cis\u0142ak A, Ochoa M, Pretschner A (2020) Leveraging compression-based graph mining for behavior-based malware detection. IEEE Trans Depend Secure Comput 16:1","journal-title":"IEEE Trans Depend Secure Comput"},{"key":"560_CR7","doi-asserted-by":"publisher","first-page":"e2998","DOI":"10.7287\/peerj.preprints.2998v2","volume":"6","author":"I Ghafir","year":"2018","unstructured":"Ghafir I, Hammoudeh M, Prenosil V (2018) Defending against the advanced persistent threat: Detection of disguised executable files. PeerJ Preprints 6:e2998. https:\/\/doi.org\/10.7287\/peerj.preprints.2998v2","journal-title":"PeerJ Preprints"},{"key":"560_CR8","unstructured":"Alazab M, Layton R, Venkataraman S, Watters P (2010) Malware detection based on structural and behavioral features of API calls, pg 1\u201310. In: International cyber resilience conference 2010\u2014Perth"},{"key":"560_CR9","first-page":"395","volume":"2","author":"J Devesa","year":"2010","unstructured":"Devesa J, Santos I, Cantero X, Penya YK, Bringas PG (2010) Automatic behavior-based analysis and classification system for malware detection. In ICEIS 2:395\u2013399","journal-title":"In ICEIS"},{"issue":"2","key":"560_CR10","doi-asserted-by":"publisher","first-page":"646","DOI":"10.1016\/j.jnca.2012.10.004","volume":"36","author":"R Slam","year":"2013","unstructured":"Slam R, Tian R, Batten LM, Versteeg S (2013) Classification of malware based on integrated static and dynamic features. 
J Netw Comput Appl 36(2):646\u2013656","journal-title":"J Netw Comput Appl"},{"key":"560_CR11","doi-asserted-by":"publisher","first-page":"2","DOI":"10.1145\/2089125.2089126","volume":"44","author":"M Egele","year":"2012","unstructured":"Egele M, Scholte T, Kirda E, Kruegel C (2012) A survey on automated dynamic malware-analysis techniques and tools. ACM Comput Surv 44:2","journal-title":"ACM Comput Surv"},{"key":"560_CR12","unstructured":"Christodorescu M, Jha S (2003) Static analysis of executables to detect malicious patterns. In: Proceedings of the 12th conference on USENIX security symposium, vol 12, p 12"},{"key":"560_CR13","doi-asserted-by":"crossref","unstructured":"Ye Y, Wang D, Li T, Ye D (2007) IMDS: intelligent malware detection system. In: Proceedings of ACM international conference on knowledge discovery and data mining (SIGKDD), pp 1043\u20131047","DOI":"10.1145\/1281192.1281308"},{"key":"560_CR14","first-page":"56","volume":"2","author":"E Gandotra","year":"2014","unstructured":"Gandotra E, Bansal D, Sofat S (2014) Malware analysis and classification: a survey. J Inf Secur 2:56\u201364","journal-title":"J Inf Secur"},{"key":"560_CR15","doi-asserted-by":"crossref","unstructured":"Schultz MG (2001) Data mining methods for detection of new malicious executables. In: Proceedings of the IEEE symp. on security and privacy, pp 38\u201349","DOI":"10.1109\/SECPRI.2001.924286"},{"key":"560_CR16","doi-asserted-by":"publisher","first-page":"16","DOI":"10.1016\/j.istr.2009.03.003","volume":"14","author":"A Shabtai","year":"2009","unstructured":"Shabtai A, Moskovitch R, Elovici Y, Glezer C (2009) Detection of malicious code by applying machine learning classifiers on static features: a state-of-the-art survey. 
Inf Secur Tech Rep 14:16\u201329","journal-title":"Inf Secur Tech Rep"},{"issue":"2","key":"560_CR17","doi-asserted-by":"publisher","first-page":"77","DOI":"10.1007\/s11416-013-0181-8","volume":"9","author":"M Eskandari","year":"2013","unstructured":"Eskandari M, Khorshidpour Z, Hashemi S (2013) Hdm-analyser: a hybrid analysis approach based on data mining techniques for malware detection. J Comput Virol Hack Tech 9(2):77\u201393","journal-title":"J Comput Virol Hack Tech"},{"key":"560_CR18","doi-asserted-by":"crossref","unstructured":"Khodamoradi P, Fazlali M, Mardukhi F, Nosrati M (2015) Heuristic metamorphic malware detection based on statistics of assembly instructions using classification algorithms. In: 18th CSI international symposium on computer architecture and digital systems (CADS), IEEE, pp 1\u20136","DOI":"10.1109\/CADS.2015.7377792"},{"key":"560_CR19","doi-asserted-by":"crossref","unstructured":"Raff E, Sylvester J, Nicholas C (2017) Learning the PE header, malware detection with minimal domain knowledge. In: Proc. 10th ACM workshop artificial intelligence secure, New York, ACM, pp 121\u2013132","DOI":"10.1145\/3128572.3140442"},{"key":"560_CR20","doi-asserted-by":"crossref","unstructured":"Belaoued M (2015) A real-time PE-malware detection system based on CHI-Square test and pe-file features. In: IFIP international conference on computer science and its applications CIIA 2015: computer science and its applications, pp 416\u2013425","DOI":"10.1007\/978-3-319-19578-0_34"},{"key":"560_CR21","unstructured":"Pietrek M (1994) Peering inside the PE: a tour of the Win32 portable executable file format"},{"key":"560_CR22","doi-asserted-by":"crossref","unstructured":"Rossow C et al (2012) Prudent practices for designing malware experiments: status quo and outlook. In: Proc. IEEE symp. secur. 
privacy (SP), pp 65\u201379","DOI":"10.1109\/SP.2012.14"},{"key":"560_CR23","doi-asserted-by":"crossref","unstructured":"Tobiyama S, Yamaguchi Y, Shimada H, Ikuse T, Yagi T (2016) Malware detection with deep neural network using process behavior. In: Proc. IEEE 40th annu. comput. softw. appl. conf. (COMPSAC), vol 2, pp 577\u2013582","DOI":"10.1109\/COMPSAC.2016.151"},{"key":"560_CR24","doi-asserted-by":"crossref","unstructured":"Shibahara T, Yagi T, Akiyama M, Chiba D, Yada T (2016) Efficient dynamic malware analysis based on network behavior using deep learning. In: Proc. IEEE Global Commun. Conf. (GLOBECOM), pp 1\u20137","DOI":"10.1109\/GLOCOM.2016.7841778"},{"issue":"1","key":"560_CR25","doi-asserted-by":"publisher","first-page":"61","DOI":"10.1145\/1127345.1127348","volume":"9","author":"D Mutz","year":"2006","unstructured":"Mutz D, Valeur F, Vigna G (2006) Anomalous system call detection. ACM Trans Inf Syst Secur 9(1):61\u201393","journal-title":"ACM Trans Inf Syst Secur"},{"issue":"3","key":"560_CR26","doi-asserted-by":"publisher","first-page":"211","DOI":"10.1109\/TDSC.2014.2300482","volume":"11","author":"Y Zhauniarovich","year":"2014","unstructured":"Zhauniarovich Y, Russello G, Conti M, Crispo B, Fernandes E (2014) Moses: supporting and enforcing security profiles on smartphones. Depend Secure Comput IEEE Trans 11(3):211\u2013223","journal-title":"Depend Secure Comput IEEE Trans"},{"key":"560_CR27","doi-asserted-by":"crossref","unstructured":"Raff E, Nicholas C (2017) An alternative to NCD for large sequences, Lempel-Ziv Jaccard distance. In: Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining\u00a0(ACM, 2017), pp 1007\u20131015","DOI":"10.1145\/3097983.3098111"},{"key":"560_CR28","doi-asserted-by":"crossref","unstructured":"Rastogi V, Qu Z, McClurg J, Cao Y, Chen Y (2015) Uranine: real-time privacy leakage monitoring without system modification for android. Springer Int. 
Pub, Cham, pp 256\u2013276","DOI":"10.1007\/978-3-319-28865-9_14"},{"key":"560_CR29","doi-asserted-by":"publisher","first-page":"824","DOI":"10.1016\/j.future.2019.04.044","volume":"110","author":"AP Namanya","year":"2019","unstructured":"Namanya AP, Awan IU, Disso JP, Younas M (2019) Similarity hash-based scoring of portable executable files for efficient malware detection in IoT. Fut Gen Comput Syst 110:824\u2013832. https:\/\/doi.org\/10.1016\/j.future.2019.04.044","journal-title":"Fut Gen Comput Syst"},{"key":"560_CR30","doi-asserted-by":"publisher","unstructured":"Merkel R (2010) Statistical detection of malicious PE executables for fast offline analysis. In: Springer, Berlin, Heidelberg, ISBN 978-3-642-13241-4, pp 93\u2013105. https:\/\/doi.org\/10.1007\/978-3-642-132414_10","DOI":"10.1007\/978-3-642-132414_10"},{"key":"560_CR31","doi-asserted-by":"publisher","DOI":"10.7717\/peerj-cs.285","volume":"6","author":"FO Catak","year":"2020","unstructured":"Catak FO, Yaz\u0131 AF, Elezaj O, Ahmed J (2020) Deep learning based Sequential model for malware analysis using Windows exe API Calls. PeerJ Comput Sci 6:e285. https:\/\/doi.org\/10.7717\/peerj-cs.285","journal-title":"PeerJ Comput Sci"},{"key":"560_CR32","doi-asserted-by":"crossref","unstructured":"Rush AM, Harvard SEAS, Chopra S, Weston J (2015) A neural attention model for sentence summarization. In: Proceedings of the international conference on empirical methods in natural language processing, Lisbon, Portugal","DOI":"10.18653\/v1\/D15-1044"},{"issue":"8","key":"560_CR33","doi-asserted-by":"publisher","first-page":"735","DOI":"10.1109\/TSE.2002.1027797","volume":"28","author":"C Collberg","year":"2002","unstructured":"Collberg C, Thomborson C (2002) Watermarking, tamperproofing, and obfuscation - tools for software protection. 
IEEE Trans Software Eng 28(8):735\u2013746","journal-title":"IEEE Trans Software Eng"},{"key":"560_CR34","doi-asserted-by":"publisher","unstructured":"Koniaris I, Papadimitriou G, Nicopolitidis P, Obaidat M (2014) Honeypots deployment for the analysis and visualization of malware activity and malicious connections. In: 2014 IEEE international conference on communications (ICC), Sydney, pp 1819\u20131824. https:\/\/doi.org\/10.1109\/ICC.2014.6883587","DOI":"10.1109\/ICC.2014.6883587"},{"key":"560_CR35","doi-asserted-by":"crossref","unstructured":"Zhou Y, Jiang X (2012) Dissecting android malware: characterization and evolution. In IEEE symposium on security and privacy, pp 95\u2013109","DOI":"10.1109\/SP.2012.16"},{"issue":"12","key":"560_CR36","doi-asserted-by":"publisher","first-page":"3011","DOI":"10.1109\/TIFS.2017.2730581","volume":"12","author":"M Wan","year":"2017","unstructured":"Wan M, Shang W, Zeng P (2017) Double behavior characteristics for one-class classification anomaly detection in networked control systems. IEEE Trans Inf Forens Secur 12(12):3011\u20133023","journal-title":"IEEE Trans Inf Forens Secur"},{"issue":"4","key":"560_CR37","doi-asserted-by":"publisher","first-page":"18","DOI":"10.1109\/5254.708428","volume":"13","author":"M Hearst","year":"1998","unstructured":"Hearst M, Dumais S, Osman E, Platt J, Scholkopf B (1998) Support vector machines. IEEE Intell Syst Appl 13(4):18\u201328. https:\/\/doi.org\/10.1109\/5254.708428","journal-title":"IEEE Intell Syst Appl"},{"issue":"1","key":"560_CR38","doi-asserted-by":"publisher","first-page":"185","DOI":"10.1109\/TSMCA.2009.2029559","volume":"40","author":"C Seiffert","year":"2010","unstructured":"Seiffert C, Khoshgoftaar TM, Van Hulse J, Napolitano A (2010) RUSBoost: a hybrid approach to alleviating class imbalance. 
IEEE Trans Syst Man Cybern A Syst Hum 40(1):185\u2013197","journal-title":"IEEE Trans Syst Man Cybern A Syst Hum"},{"key":"560_CR39","doi-asserted-by":"crossref","unstructured":"Caruana A et al (2006) An empirical comparison of supervised learning algorithms. In: ICML '06 proceedings of the 23rd international conference on machine learning, pp 161\u2013168","DOI":"10.1145\/1143844.1143865"},{"key":"560_CR40","doi-asserted-by":"publisher","first-page":"881","DOI":"10.1109\/TPAMI.2002.1017616","volume":"24","author":"T Kanungo","year":"2002","unstructured":"Kanungo T, Mount D, Netanyahu N, Piatko C, Silverman R, Wu A (2002) An efficient K-means clustering algorithm analysis and implementation. IEEE Trans Pattern Anal Mach Intell 24:881\u2013892. https:\/\/doi.org\/10.1109\/TPAMI.2002.1017616","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"560_CR41","unstructured":"Boutsidis C, Mahoney M, Drineas P (2009) Unsupervised feature selection for the k-means clustering problem. In: Advances in neural information processing systems 22-proceedings of the conference, pp 153\u2013161"},{"key":"560_CR42","doi-asserted-by":"crossref","unstructured":"Gandotra E, Bansal D, Sofat S (2016) Zero-day malware detection. In: Sixth international symposium on embedded computing and system design\u00a0(IEEE, 2016), pp 171\u2013175","DOI":"10.1109\/ISED.2016.7977076"},{"key":"560_CR43","doi-asserted-by":"publisher","first-page":"e5234","DOI":"10.1002\/cpe.5234","volume":"2019","author":"CK Ng","year":"2019","unstructured":"Ng CK, Jiang F, Zhang LY, Zhou W (2019) Static malware clustering using enhanced deep embedding method. Concurr Computat Pract Exper 2019:e5234. 
https:\/\/doi.org\/10.1002\/cpe.5234","journal-title":"Concurr Computat Pract Exper"},{"key":"560_CR44","doi-asserted-by":"publisher","unstructured":"Algaith A, Gashi I, Sobesto B, Cukier M, Haxhijaha S, Bajrami G (2016) comparing detection capabilities of antivirus products: an empirical study with different versions of products from the same vendors. In:2016 46th Annual IEEE\/IFIP international conference on dependable systems and networks workshop (DSN-W), Toulouse, pp 48\u201353. https:\/\/doi.org\/10.1109\/DSN-W.2016.45","DOI":"10.1109\/DSN-W.2016.45"},{"key":"560_CR45","doi-asserted-by":"publisher","first-page":"225","DOI":"10.1007\/s11416-017-0309-3","volume":"14","author":"AV Kozachok","year":"2018","unstructured":"Kozachok AV, Kozachok VI (2018) Construction and evaluation of the new heuristic malware detection mechanism based on executable files static analysis. J Comput Virol Hack Tech 14:225\u2013231. https:\/\/doi.org\/10.1007\/s11416-017-0309-3","journal-title":"J Comput Virol Hack Tech"},{"key":"560_CR46","doi-asserted-by":"publisher","unstructured":"Hassen M, Carvalho MM, Chan PK (2017) Malware classification using static after validation analysis-based features. In: 2017 IEEE symposium series on computational intelligence (SSCI), Honolulu, pp 1\u20137. https:\/\/doi.org\/10.1109\/SSCI.2017.8285426","DOI":"10.1109\/SSCI.2017.8285426"},{"key":"560_CR47","doi-asserted-by":"crossref","unstructured":"Vadrevu P, Perdisci R (2016) MAXS: scaling malware execution with sequential multi-hypothesis testing. In: Proceedings of the 11th ACM on Asia conference on computer and communications security\u00a0(ACM, 2016), pp 771\u2013782","DOI":"10.1145\/2897845.2897873"},{"key":"560_CR48","unstructured":"Anderson HS, Kharkar A, Filar B, Evans D, Roth P (2018) Learning to evade static PE machine learning malware models via reinforcement learning. 
http:\/\/arxiv.org\/abs\/1801.08917"},{"issue":"01","key":"560_CR49","first-page":"1005","volume":"34","author":"Y Wang","year":"2020","unstructured":"Wang Y, Stokes J, Marinescu M (2020) Actor critic deep reinforcement learning for neural malware control. Proc AAAI Conf Artif Intell 34(01):1005\u20131012","journal-title":"Proc AAAI Conf Artif Intell"},{"key":"560_CR50","doi-asserted-by":"publisher","unstructured":"Wu C, Shi J, Yang Y, Li W (2018) Enhancing machine learning based malware detection model by reinforcement learning. In: Proceedings of the 8th international conference on communication and network security (ICCNS 2018), pp 74\u201378. https:\/\/doi.org\/10.1145\/3290480.3290494","DOI":"10.1145\/3290480.3290494"}],"container-title":["Complex &amp; Intelligent Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-021-00560-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s40747-021-00560-1\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-021-00560-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,3,3]],"date-time":"2022-03-03T12:41:57Z","timestamp":1646311317000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s40747-021-00560-1"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,10,12]]},"references-count":50,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2022,2]]}},"alternative-id":["560"],"URL":"https:\/\/doi.org\/10.1007\/s40747-021-00560-1","relation":{},"ISSN":["2199-4536","2198-6053"],"issn-type":[{"value":"2199-4536","type":"print"},{"value":"2198-6053","type":"electronic"}],"subject":[],"published":{"da
te-parts":[[2021,10,12]]},"assertion":[{"value":"30 January 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"24 September 2021","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"12 October 2021","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declared that they have no conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}