{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,2]],"date-time":"2026-06-02T17:51:06Z","timestamp":1780422666745,"version":"3.54.1"},"reference-count":52,"publisher":"MDPI AG","issue":"9","license":[{"start":{"date-parts":[[2022,9,17]],"date-time":"2022-09-17T00:00:00Z","timestamp":1663372800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>Given their escalating number and variety, combating malware is becoming increasingly strenuous. Machine learning techniques are often used in the literature to automatically discover the models and patterns behind such challenges and create solutions that can maintain the rapid pace at which malware evolves. This article compares various tree-based ensemble learning methods that have been proposed in the analysis of PE malware. A tree-based ensemble is an unconventional learning paradigm that constructs and combines a collection of base learners (e.g., decision trees), as opposed to the conventional learning paradigm, which aims to construct individual learners from training data. Several tree-based ensemble techniques, such as random forest, XGBoost, CatBoost, GBM, and LightGBM, are taken into consideration and are appraised using different performance measures, such as accuracy, MCC, precision, recall, AUC, and F1. In addition, the experiment includes many public datasets, such as BODMAS, Kaggle, and CIC-MalMem-2022, to demonstrate the generalizability of the classifiers in a variety of contexts. Based on the test findings, all tree-based ensembles performed well, and performance differences between algorithms are not statistically significant, particularly when their respective hyperparameters are appropriately configured. 
The proposed tree-based ensemble techniques also outperformed other, similar PE malware detectors that have been published in recent years.<\/jats:p>","DOI":"10.3390\/a15090332","type":"journal-article","created":{"date-parts":[[2022,9,18]],"date-time":"2022-09-18T22:12:43Z","timestamp":1663539163000},"page":"332","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":36,"title":["Tree-Based Classifier Ensembles for PE Malware Analysis: A Performance Revisit"],"prefix":"10.3390","volume":"15","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8274-0990","authenticated-orcid":false,"given":"Maya Hilda Lestari","family":"Louk","sequence":"first","affiliation":[{"name":"Department of Informatics Engineering, University of Surabaya, Surabaya 60293, Indonesia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1821-6438","authenticated-orcid":false,"given":"Bayu Adhi","family":"Tama","sequence":"additional","affiliation":[{"name":"Department of Information Systems, University of Maryland, Baltimore County (UMBC), Baltimore, MD 21250, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2022,9,17]]},"reference":[{"key":"ref_1","unstructured":"Kleidermacher, D., and Kleidermacher, M. (2012). Embedded Systems Security: Practical Methods for Safe and Secure Software and Systems Development, Elsevier."},{"key":"ref_2","unstructured":"Xhafa, F. (2022). Autonomous and Connected Heavy Vehicle Technology, Academic Press."},{"key":"ref_3","unstructured":"Smith, D.J., and Simpson, K.G. (2010). Safety Critical Systems Handbook, Elsevier."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1007\/s11416-015-0261-z","article-title":"A comparison of static, dynamic, and hybrid analysis for malware detection","volume":"13","author":"Damodaran","year":"2017","journal-title":"J. Comput. Virol. 
Hacking Tech."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"149","DOI":"10.1016\/j.neucom.2019.02.056","article-title":"Application of deep learning to cybersecurity: A survey","volume":"347","author":"Mahdavifar","year":"2019","journal-title":"Neurocomputing"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"55","DOI":"10.1109\/TR.2019.2924677","article-title":"DAMBA: Detecting android malware by ORGB analysis","volume":"69","author":"Zhang","year":"2020","journal-title":"IEEE Trans. Reliab."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"123","DOI":"10.1016\/j.cose.2018.11.001","article-title":"Survey of machine learning techniques for malware analysis","volume":"81","author":"Ucci","year":"2019","journal-title":"Comput. Secur."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"314","DOI":"10.1016\/j.cose.2019.01.001","article-title":"Time, accuracy and power consumption tradeoff in mobile malware detection systems","volume":"82","author":"Milosevic","year":"2019","journal-title":"Comput. Secur."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1016\/j.future.2018.07.052","article-title":"Classification of ransomware families with machine learning based on N-gram of opcodes","volume":"90","author":"Zhang","year":"2019","journal-title":"Future Gener. Comput. Syst."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.future.2021.11.030","article-title":"A study on malicious software behaviour analysis and detection techniques: Taxonomy, current trends and challenges","volume":"130","author":"Maniriho","year":"2022","journal-title":"Future Gener. Comput. Syst."},{"key":"ref_11","first-page":"1119","article-title":"Detecting Malware in Cyberphysical Systems Using Machine Learning: A Survey","volume":"15","author":"Montes","year":"2021","journal-title":"KSII Trans. Internet Inf. Syst. 
(TIIS)"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"106273","DOI":"10.1016\/j.infsof.2020.106273","article-title":"Detection of malicious software by analyzing the behavioral artifacts using machine learning algorithms","volume":"121","author":"Singh","year":"2020","journal-title":"Inf. Softw. Technol."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"1","DOI":"10.13164\/mendel.2019.2.001","article-title":"An ensemble-based malware detection model using minimum feature set","volume":"25","author":"Amer","year":"2019","journal-title":"Mendel"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Atluri, V. (2019, January 11\u201314). Malware Classification of Portable Executables using Tree-Based Ensemble Machine Learning. Proceedings of the 2019 SoutheastCon, Huntsville, AL, USA.","DOI":"10.1109\/SoutheastCon42311.2019.9020524"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Azeez, N.A., Odufuwa, O.E., Misra, S., Oluranti, J., and Dama\u0161evi\u010dius, R. (2021). Windows PE Malware Detection Using Ensemble Learning. Informatics, 8.","DOI":"10.3390\/informatics8010010"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Dama\u0161evi\u010dius, R., Ven\u010dkauskas, A., Toldinas, J., and Grigali\u016bnas, \u0160. (2021). Ensemble-Based Classification Using Neural Networks and Machine Learning Models for Windows PE Malware Detection. Electronics, 10.","DOI":"10.3390\/electronics10040485"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Mills, A., Spyridopoulos, T., and Legg, P. (2019, January 3\u20134). Efficient and Interpretable Real-Time Malware Detection Using Random-Forest. Proceedings of the International Conference on Cyber Situational Awareness, Data Analytics And Assessment (Cyber SA), Oxford, UK.","DOI":"10.1109\/CyberSA.2019.8899533"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Zhou, Z.H. (2012). 
Ensemble Methods: Foundations and Algorithms, CRC Press.","DOI":"10.1201\/b12207"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Mach. Learn."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"54","DOI":"10.1016\/j.chemolab.2019.06.003","article-title":"LightGBM-PPI: Predicting protein-protein interactions through LightGBM with multi-information fusion","volume":"191","author":"Chen","year":"2019","journal-title":"Chemom. Intell. Lab. Syst."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"107871","DOI":"10.1016\/j.anucene.2020.107871","article-title":"An assembly-level neutronic calculation method based on LightGBM algorithm","volume":"150","author":"Cai","year":"2021","journal-title":"Ann. Nucl. Energy"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"5472","DOI":"10.1038\/s41598-022-09521-1","article-title":"Human activity recognition of children with wearable devices using LightGBM machine learning","volume":"12","author":"Csizmadia","year":"2022","journal-title":"Sci. Rep."},{"key":"ref_23","first-page":"1","article-title":"Xgboost: Extreme gradient boosting","volume":"Volume 1","author":"Chen","year":"2015","journal-title":"R Package Version 0.4\u20132"},{"key":"ref_24","unstructured":"Dorogush, A.V., Ershov, V., and Gulin, A. (2018). CatBoost: Gradient boosting with categorical features support. arXiv."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"1189","DOI":"10.1214\/aos\/1013203451","article-title":"Greedy function approximation: A gradient boosting machine","volume":"29","author":"Friedman","year":"2001","journal-title":"Ann. Stat."},{"key":"ref_26","first-page":"1","article-title":"Lightgbm: A highly efficient gradient boosting decision tree","volume":"30","author":"Ke","year":"2017","journal-title":"Adv. Neural Inf. Process. 
Syst."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Yang, L., Ciptadi, A., Laziuk, I., Ahmadzadeh, A., and Wang, G. (2021, January 27). BODMAS: An Open Dataset for Learning based Temporal Analysis of PE Malware. Proceedings of the IEEE Security and Privacy Workshops (SPW), San Francisco, CA, USA.","DOI":"10.1109\/SPW53761.2021.00020"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Carrier, T., Victor, P., Tekeoglu, A., and Lashkari, A.H. (2022, January 9\u201311). Detecting Obfuscated Malware using Memory Feature Engineering. Proceedings of the ICISSP, Online.","DOI":"10.5220\/0010908200003120"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"101861","DOI":"10.1016\/j.sysarc.2020.101861","article-title":"A survey on machine learning-based malware detection in executable files","volume":"112","author":"Singh","year":"2021","journal-title":"J. Syst. Archit."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"5061059","DOI":"10.1155\/2022\/5061059","article-title":"An Attribute Extraction for Automated Malware Attack Classification and Detection Using Soft Computing Techniques","volume":"2022","author":"Albishry","year":"2022","journal-title":"Comput. Intell. Neurosci."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Vadrevu, P., Rahbarinia, B., Perdisci, R., Li, K., and Antonakakis, M. (2013, January 9\u201313). Measuring and detecting malware downloads in live network traffic. Proceedings of the European Symposium on Research in Computer Security, Egham, UK.","DOI":"10.1007\/978-3-642-40203-6_31"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Uppal, D., Sinha, R., Mehra, V., and Jain, V. (2014, January 34\u201327). Malware detection and classification based on extraction of API sequences. 
Proceedings of the 2014 International Conference on Advances in Computing, Communications and Informatics (ICACCI), Delhi, India.","DOI":"10.1109\/ICACCI.2014.6968547"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Kwon, B.J., Mondal, J., Jang, J., Bilge, L., and Dumitra\u015f, T. (2015, January 12\u201316). The dropper effect: Insights into malware distribution with downloader graph analytics. Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, Denver, CO, USA.","DOI":"10.1145\/2810103.2813724"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Mao, W., Cai, Z., Towsley, D., and Guan, X. (2015, January 2\u20134). Probabilistic inference on integrity for access behavior based malware detection. Proceedings of the International Symposium on Recent Advances in Intrusion Detection, Tokyo, Japan.","DOI":"10.1007\/978-3-319-26362-5_8"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"W\u00fcchner, T., Ochoa, M., and Pretschner, A. (2015, January 9\u201310). Robust and effective malware detection through quantitative data flow graph metrics. Proceedings of the International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment, Milan, Italy.","DOI":"10.1007\/978-3-319-20550-2_6"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Ahmadi, M., Ulyanov, D., Semenov, S., Trofimov, M., and Giacinto, G. (2016, January 1\u20139). Novel feature extraction, selection and fusion for effective malware family classification. Proceedings of the sixth ACM Conference on Data and Application Security and Privacy, New Orleans, LA, USA.","DOI":"10.1145\/2857705.2857713"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Dener, M., Ok, G., and Orman, A. (2022). Malware Detection Using Memory Analysis Data in Big Data Environment. Appl. 
Sci., 12.","DOI":"10.3390\/app12178604"},{"key":"ref_38","first-page":"510","article-title":"Performance Analysis of Machine Learning Classifiers for Detecting PE Malware","volume":"11","author":"Azmee","year":"2020","journal-title":"Int. J. Adv. Comput. Sci. Appl."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"101682","DOI":"10.1016\/j.cose.2019.101682","article-title":"A novel method for malware detection on ML-based visualization technique","volume":"89","author":"Liu","year":"2020","journal-title":"Comput. Secur."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Asam, M., Hussain, S.J., Mohatram, M., Khan, S.H., Jamal, T., Zafar, A., Khan, A., Ali, M.U., and Zahoora, U. (2021). Detection of Exceptional Malware Variants Using Deep Boosted Feature Spaces and Machine Learning. Appl. Sci., 11.","DOI":"10.3390\/app112110464"},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"102905","DOI":"10.1016\/j.cose.2022.102905","article-title":"EII-MBS: Malware Family Classification via Enhanced Instruction-level Behavior Semantic Learning","volume":"112","author":"Hao","year":"2022","journal-title":"Comput. Secur."},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Hou, S., Saas, A., Ye, Y., and Chen, L. (2016, January 3\u20135). Droiddelver: An android malware detection system using deep belief network based on api call blocks. Proceedings of the International Conference on Web-Age Information Management, Nanchang, China.","DOI":"10.1007\/978-3-319-47121-1_5"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Hou, S., Saas, A., Chen, L., Ye, Y., and Bourlai, T. (August, January 31). Deep neural networks for automatic android malware detection. 
Proceedings of the 2017 IEEE\/ACM International Conference on Advances in Social Networks Analysis and Mining, Sydney, Australia.","DOI":"10.1145\/3110025.3116211"},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"95970","DOI":"10.1109\/ACCESS.2022.3202952","article-title":"Self-Attentive Models for Real-Time Malware Classification","volume":"10","author":"Lu","year":"2022","journal-title":"IEEE Access"},{"key":"ref_45","first-page":"281","article-title":"Random search for hyper-parameter optimization","volume":"13","author":"Bergstra","year":"2012","journal-title":"J. Mach. Learn. Res."},{"key":"ref_46","unstructured":"Wright, M.N., and Ziegler, A. (2015). Ranger: A fast implementation of random forests for high dimensional data in C++ and R. arXiv."},{"key":"ref_47","first-page":"1","article-title":"Package \u2018xgboost\u2019","volume":"90","author":"Chen","year":"2019","journal-title":"R Version"},{"key":"ref_48","unstructured":"Cook, D. (2016). Practical Machine Learning with H2O: Powerful, Scalable Techniques for Deep Learning and AI, O\u2019Reilly Media, Inc."},{"key":"ref_49","first-page":"2579","article-title":"Visualizing data using t-SNE","volume":"9","author":"Hinton","year":"2008","journal-title":"J. Mach. Learn. Res."},{"key":"ref_50","first-page":"13","article-title":"The Matthews correlation coefficient (MCC) is more reliable than balanced accuracy, bookmaker informedness, and markedness in two-class confusion matrix evaluation","volume":"14","author":"Chicco","year":"2021","journal-title":"BioData Min."},{"key":"ref_51","unstructured":"Conover, W.J. (1999). Practical Nonparametric Statistics, John Wiley & Sons."},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Arik, S.\u00d6., and Pfister, T. (2021, January 2\u20139). Tabnet: Attentive interpretable tabular learning. 
Proceedings of the AAAI Conference on Artificial Intelligence, Virtual Conference.","DOI":"10.1609\/aaai.v35i8.16826"}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/15\/9\/332\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T00:33:32Z","timestamp":1760142812000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/15\/9\/332"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,9,17]]},"references-count":52,"journal-issue":{"issue":"9","published-online":{"date-parts":[[2022,9]]}},"alternative-id":["a15090332"],"URL":"https:\/\/doi.org\/10.3390\/a15090332","relation":{},"ISSN":["1999-4893"],"issn-type":[{"value":"1999-4893","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,9,17]]}}}