{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,2]],"date-time":"2025-11-02T09:00:18Z","timestamp":1762074018989,"version":"build-2065373602"},"reference-count":34,"publisher":"MDPI AG","issue":"11","license":[{"start":{"date-parts":[[2022,11,11]],"date-time":"2022-11-11T00:00:00Z","timestamp":1668124800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Computers"],"abstract":"<jats:p>Malware is used to carry out malicious operations on networks and computer systems. Consequently, malware classification is crucial for preventing malicious attacks. Application programming interfaces (APIs) are ideal candidates for characterizing malware behavior. However, the primary challenge is to produce API call features for classification algorithms to achieve high classification accuracy. To achieve this aim, this work employed the Jaccard similarity and visualization analysis to find the hidden patterns created by various malware API calls. Traditional machine learning classifiers, i.e., random forest (RF), support vector machine (SVM), and k-nearest neighborhood (KNN), were used in this research as alternatives to existing neural networks, which use millions of length API call sequences. The benchmark dataset used in this study contains 7107 samples of API call sequences (labeled to eight different malware families). The results showed that RF with the proposed API call features outperformed the LSTM (long short-term memory) and gated recurrent unit (GRU)-based methods against overall evaluation metrics.<\/jats:p>","DOI":"10.3390\/computers11110160","type":"journal-article","created":{"date-parts":[[2022,11,14]],"date-time":"2022-11-14T02:32:01Z","timestamp":1668393121000},"page":"160","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":9,"title":["Features Engineering for Malware Family Classification Based API Call"],"prefix":"10.3390","volume":"11","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2180-676X","authenticated-orcid":false,"given":"Ammar Yahya","family":"Daeef","sequence":"first","affiliation":[{"name":"Technical Institute for Administration, Middle Technical University, Baghdad 10010, Iraq"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8840-9235","authenticated-orcid":false,"given":"Ali","family":"Al-Naji","sequence":"additional","affiliation":[{"name":"Electrical Engineering Technical College, Middle Technical University, Baghdad 10022, Iraq"},{"name":"School of Engineering, University of South Australia, Mawson Lakes, SA 5095, Australia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6496-0543","authenticated-orcid":false,"given":"Javaan","family":"Chahl","sequence":"additional","affiliation":[{"name":"School of Engineering, University of South Australia, Mawson Lakes, SA 5095, Australia"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2022,11,11]]},"reference":[{"key":"ref_1","unstructured":"Institute, A.T. (2022, July 19). Malware Statistics and Trends Report: AV TEST. Available online: https:\/\/www.av-test.org\/en\/statistics\/malware\/."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"42762","DOI":"10.1109\/ACCESS.2022.3168794","article-title":"Deep-Ensemble and Multifaceted Behavioral Malware Variant Detection Model","volume":"10","author":"Ghaleb","year":"2022","journal-title":"IEEE Access"},{"key":"ref_3","unstructured":"Catak, F.O., and Yaz\u0131, A.F. (2019). A benchmark API call dataset for windows PE malware classification. arXiv."},{"key":"ref_4","unstructured":"Oliveira, A., and Sassi, R. (2019). Behavioral malware detection using deep graph convolutional neural networks. TechRxiv, preprint."},{"key":"ref_5","unstructured":"VMRay (2022, July 10). Sans Webcast Recap: Practical Malware Family Identification for Incident Responders. Available online: https:\/\/www.vmray.com\/cyber-security-blog\/practical-malware-family-identification-sans-webcast-recap."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Sebasti\u00e1n, M., Rivera, R., Kotzias, P., and Caballero, J. (2016, January 19\u201321). Avclass: A tool for massive malware labeling. Proceedings of the International Symposium on Research in Attacks, Intrusions, and Defenses, Paris, France.","DOI":"10.1007\/978-3-319-45719-2_11"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"59","DOI":"10.1145\/1743546.1743567","article-title":"A tour through the visualization zoo","volume":"53","author":"Heer","year":"2010","journal-title":"Commun. ACM"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Srivastava, V., and Sharma, R. (2022). Malware Discernment Using Machine Learning. Transforming Management with AI, Big-Data, and IoT, Springer.","DOI":"10.1007\/978-3-030-86749-2_12"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"156900","DOI":"10.1109\/ACCESS.2020.3019282","article-title":"Multifamily classification of Android malware with a fuzzy strategy to resist polymorphic familial variants","volume":"8","author":"Liu","year":"2020","journal-title":"IEEE Access"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"103443","DOI":"10.1016\/j.csi.2020.103443","article-title":"Metamorphic malware identification using engine-specific patterns based on co-opcode graphs","volume":"71","author":"Kakisim","year":"2020","journal-title":"Comput. Stand. Interfaces"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Bayazit, E.C., Sahingoz, O.K., and Dogan, B. (2022, January 9\u201311). A Deep Learning Based Android Malware Detection System with Static Analysis. Proceedings of the 2022 International Congress on Human-Computer Interaction, Optimization and Robotic Applications (HORA), Ankara, Turkey.","DOI":"10.1109\/HORA55278.2022.9800057"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"101682","DOI":"10.1016\/j.cose.2019.101682","article-title":"A novel method for malware detection on ML-based visualization technique","volume":"89","author":"Liu","year":"2020","journal-title":"Comput. Secur."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"3187","DOI":"10.1109\/TII.2018.2822680","article-title":"Detection of malicious code variants based on deep learning","volume":"14","author":"Cui","year":"2018","journal-title":"IEEE Trans. Ind. Inform."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"376","DOI":"10.1016\/j.cose.2019.04.005","article-title":"A feature-hybrid malware variants detection using CNN based opcode embedding and BPNN based API embedding","volume":"84","author":"Zhang","year":"2019","journal-title":"Comput. Secur."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"102871","DOI":"10.1016\/j.cose.2022.102871","article-title":"Efficient and Robust Malware Detection Based on Control Flow Traces Using Deep Neural Networks","volume":"122","author":"Qiang","year":"2022","journal-title":"Comput. Secur."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Pal\u0161a, J., \u00c1d\u00e1m, N., Hurtuk, J., Chovancov\u00e1, E., Mado\u0161, B., Chovanec, M., and Kocan, S. (2022). MLMD\u2014A Malware-Detecting Antivirus Tool Based on the XGBoost Machine Learning Algorithm. Appl. Sci., 12.","DOI":"10.3390\/app12136672"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"124","DOI":"10.1016\/j.future.2021.01.004","article-title":"Intelligent dynamic malware detection using machine learning in IP reputation for forensics data analytics","volume":"118","author":"Usman","year":"2021","journal-title":"Future Gener. Comput. Syst."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"118","DOI":"10.1016\/j.comnet.2019.06.015","article-title":"A multi-dimensional machine learning approach to predict advanced malware","volume":"160","author":"Bahtiyar","year":"2019","journal-title":"Comput. Netw."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"208","DOI":"10.1016\/j.cose.2019.02.007","article-title":"MalDAE: Detecting and explaining malware based on correlation and fusion of static and dynamic characteristics","volume":"83","author":"Han","year":"2019","journal-title":"Comput. Secur."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"248","DOI":"10.1016\/j.procs.2018.03.072","article-title":"ASSCA: API based sequence and statistics features combined malware detection architecture","volume":"129","author":"Xiaofeng","year":"2018","journal-title":"Procedia Comput. Sci."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"578","DOI":"10.1016\/j.cose.2018.05.010","article-title":"Early-stage malware prediction using recurrent neural networks","volume":"77","author":"Rhode","year":"2018","journal-title":"Comput. Secur."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Eskandari, M., Khorshidpur, Z., and Hashemi, S. (2012, January 22\u201324). To incorporate sequential dynamic features in malware detection engines. Proceedings of the 2012 European Intelligence and Security Informatics Conference, Odense, Denmark.","DOI":"10.1109\/EISIC.2012.57"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Lu, F., Cai, Z., Lin, Z., Bao, Y., and Tang, M. (2022). Research on the Construction of Malware Variant Datasets and Their Detection Method. Appl. Sci., 12.","DOI":"10.3390\/app12157546"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Rosenberg, I., Shabtai, A., Rokach, L., and Elovici, Y. (2018, January 10\u201312). Generic black-box end-to-end attack against state of the art API call based malware classifiers. Proceedings of the International Symposium on Research in Attacks, Intrusions, and Defenses, Crete, Greece.","DOI":"10.1007\/978-3-030-00470-5_23"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Yazi, A.F., \u00c7atak, F.\u00d6., and G\u00fcl, E. (2019, January 24\u201326). Classification of methamorphic malware with deep learning (LSTM). Proceedings of the 2019 27th Signal Processing and Communications Applications Conference (SIU), Sivas, Turkey.","DOI":"10.1109\/SIU.2019.8806571"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Hansen, S.S., Larsen, T.M.T., Stevanovic, M., and Pedersen, J.M. (2016, January 15\u201318). An approach for detection and family classification of malware based on behavioral analysis. Proceedings of the 2016 International Conference on Computing, Networking and Communications (ICNC), Kauai, HI, USA.","DOI":"10.1109\/ICCNC.2016.7440587"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Qiao, Y., Yang, Y., Ji, L., and He, J. (2013, January 16\u201318). Analyzing malware by abstracting the frequent itemsets in API call sequences. Proceedings of the 2013 12th IEEE International Conference on Trust, Security and Privacy in Computing and Communications, Melbourne, Australia.","DOI":"10.1109\/TrustCom.2013.36"},{"key":"ref_28","first-page":"617","article-title":"API call-based malware classification using recurrent neural networks","volume":"10","author":"Li","year":"2021","journal-title":"J. Cyber Secur. Mobil."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"367","DOI":"10.1049\/iet-ifs.2018.5268","article-title":"Dynamic API call sequence visualisation for malware classification","volume":"13","author":"Tang","year":"2019","journal-title":"IET Inf. Secur."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"e285","DOI":"10.7717\/peerj-cs.285","article-title":"Deep learning based Sequential model for malware analysis using Windows exe API Calls","volume":"6","author":"Catak","year":"2020","journal-title":"PeerJ Comput. Sci."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Schofield, M., Alicioglu, G., Binaco, R., Turner, P., Thatcher, C., Lam, A., and Sun, B. (2021, January 28\u201330). Convolutional neural network for malware classification based on API call sequence. Proceedings of the Proceedings of the 8th International Conference on Artificial Intelligence and Applications (AIAP 2021), EL-Oued, Algeria.","DOI":"10.5121\/csit.2021.110106"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Rogel-Salazar, J. (2018). Data Science and Analytics with Python, Chapman and Hall\/CRC.","DOI":"10.1201\/9781315151670"},{"key":"ref_33","unstructured":"Networkx (2022, July 20). NetworkX Network Analysis in Python. Available online: https:\/\/networkx.org\/."},{"key":"ref_34","unstructured":"Graphviz (2022, July 20). What Is Graphviz?. Available online: https:\/\/graphviz.org\/."}],"container-title":["Computers"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2073-431X\/11\/11\/160\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T01:16:44Z","timestamp":1760145404000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2073-431X\/11\/11\/160"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,11,11]]},"references-count":34,"journal-issue":{"issue":"11","published-online":{"date-parts":[[2022,11]]}},"alternative-id":["computers11110160"],"URL":"https:\/\/doi.org\/10.3390\/computers11110160","relation":{},"ISSN":["2073-431X"],"issn-type":[{"type":"electronic","value":"2073-431X"}],"subject":[],"published":{"date-parts":[[2022,11,11]]}}}