{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,20]],"date-time":"2026-04-20T10:04:13Z","timestamp":1776679453712,"version":"3.51.2"},"reference-count":66,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2024,1,1]],"date-time":"2024-01-01T00:00:00Z","timestamp":1704067200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"FCT\u2014Funda\u00e7\u00e3o para a Ci\u00eancia e a Tecnologia","award":["SFRH\/BD\/145472\/2019"],"award-info":[{"award-number":["SFRH\/BD\/145472\/2019"]}]},{"name":"FCT\u2014Funda\u00e7\u00e3o para a Ci\u00eancia e a Tecnologia","award":["UIDB\/50008\/2020"],"award-info":[{"award-number":["UIDB\/50008\/2020"]}]},{"name":"FCT\u2014Funda\u00e7\u00e3o para a Ci\u00eancia e a Tecnologia","award":["C645008882-00000055"],"award-info":[{"award-number":["C645008882-00000055"]}]},{"name":"Instituto de Telecomunica\u00e7\u00f5es; and Portuguese Recovery and Resilience Plan","award":["SFRH\/BD\/145472\/2019"],"award-info":[{"award-number":["SFRH\/BD\/145472\/2019"]}]},{"name":"Instituto de Telecomunica\u00e7\u00f5es; and Portuguese Recovery and Resilience Plan","award":["UIDB\/50008\/2020"],"award-info":[{"award-number":["UIDB\/50008\/2020"]}]},{"name":"Instituto de Telecomunica\u00e7\u00f5es; and Portuguese Recovery and Resilience Plan","award":["C645008882-00000055"],"award-info":[{"award-number":["C645008882-00000055"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Information"],"abstract":"<jats:p>The presence of malicious software (malware), for example, in Android applications (apps), has harmful or irreparable consequences to the user and\/or the device. Despite the protections app stores provide to avoid malware, it keeps growing in sophistication and diffusion. 
In this paper, we explore the use of machine learning (ML) techniques to detect malware in Android apps. The focus is on the study of different data pre-processing, dimensionality reduction, and classification techniques, assessing the generalization ability of the learned models using public domain datasets and specifically developed apps. We find that the classifiers that achieve better performance for this task are support vector machines (SVM) and random forests (RF). We emphasize the use of feature selection (FS) techniques to reduce the data dimensionality and to identify the most relevant features in Android malware classification, leading to explainability on this task. Our approach can identify the most relevant features to classify an app as malware. Namely, we conclude that permissions play a prominent role in Android malware detection. The proposed approach reduces the data dimensionality while achieving high accuracy in identifying malware in Android apps.<\/jats:p>","DOI":"10.3390\/info15010025","type":"journal-article","created":{"date-parts":[[2024,1,1]],"date-time":"2024-01-01T13:02:58Z","timestamp":1704114178000},"page":"25","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":14,"title":["Explainable Machine Learning for Malware Detection on Android Applications"],"prefix":"10.3390","volume":"15","author":[{"ORCID":"https:\/\/orcid.org\/0009-0008-4329-1991","authenticated-orcid":false,"given":"Catarina","family":"Palma","sequence":"first","affiliation":[{"name":"ISEL, Instituto Superior de Engenharia de Lisboa, Instituto Polit\u00e9cnico de Lisboa, 1959-007 Lisboa, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6508-0932","authenticated-orcid":false,"given":"Artur","family":"Ferreira","sequence":"additional","affiliation":[{"name":"ISEL, Instituto Superior de Engenharia de Lisboa, Instituto Polit\u00e9cnico de Lisboa, 1959-007 Lisboa, Portugal"},{"name":"Instituto de 
Telecomunica\u00e7\u00f5es, 1049-001 Lisboa, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0970-7745","authenticated-orcid":false,"given":"M\u00e1rio","family":"Figueiredo","sequence":"additional","affiliation":[{"name":"Instituto de Telecomunica\u00e7\u00f5es, 1049-001 Lisboa, Portugal"},{"name":"IST, Instituto Superior T\u00e9cnico, Universidade de Lisboa, 1049-001 Lisboa, Portugal"}]}],"member":"1968","published-online":{"date-parts":[[2024,1,1]]},"reference":[{"key":"ref_1","unstructured":"(2023, December 29). How Many People Have Smartphones?|Oberlo. Available online: https:\/\/www.oberlo.com\/statistics\/how-many-people-have-smartphones."},{"key":"ref_2","unstructured":"Turner, A. (2023, December 29). Android vs. Apple Market Share: Leading Mobile OS. Available online: https:\/\/www.bankmycell.com\/blog\/android-vs-apple-market-share\/."},{"key":"ref_3","unstructured":"(2023, December 29). How Many Apps in Google Play Store?. Available online: https:\/\/www.bankmycell.com\/blog\/number-of-google-play-store-apps\/."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Alkahtani, H., and Aldhyani, T.H. (2022). Artificial intelligence algorithms for malware detection in Android-operated mobile devices. Sensors, 22.","DOI":"10.3390\/s22062268"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Czach\u00f3rski, T., Gelenbe, E., Grochla, K., and Lent, R. (2016). Computer and Information Sciences, Springer International Publishing.","DOI":"10.1007\/978-3-319-47217-1"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"100","DOI":"10.1016\/j.iotcps.2023.03.001","article-title":"Android malware classification using optimum feature selection and ensemble machine learning","volume":"3","author":"Islam","year":"2023","journal-title":"Internet Things Cyber-Phys. 
Syst."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"9517","DOI":"10.1007\/s11042-022-13767-2","article-title":"Android malware detection applying feature selection techniques and machine learning","volume":"82","author":"Keyvanpour","year":"2023","journal-title":"Multimed. Tools Appl."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Mart\u00edn, A., Calleja, A., Men\u00e9ndez, H.D., Tapiador, J., and Camacho, D. (2016, January 6\u20139). ADROIT: Android malware detection using meta-information. Proceedings of the 2016 IEEE Symposium Series on Computational Intelligence (SSCI), Athens, Greece.","DOI":"10.1109\/SSCI.2016.7849904"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"185","DOI":"10.3390\/info12050185","article-title":"A comprehensive survey on machine learning techniques for Android malware detection","volume":"12","author":"Kouliaridis","year":"2021","journal-title":"Information"},{"key":"ref_10","first-page":"8896013","article-title":"A survey of Android malware static detection technology based on machine learning","volume":"2021","author":"Wu","year":"2021","journal-title":"Mob. Inform. Syst."},{"key":"ref_11","unstructured":"Palma, C., Ferreira, A., and Figueiredo, M. (2023, January 7\u20138). On the use of machine learning techniques to detect malware in mobile applications. Proceedings of the 14th Simp\u00f3sio de Inform\u00e1tica (INForum), Porto, Portugal. Available online: https:\/\/www.inforum2023.org\/Atas\/paper_6478\/6478-CR.pdf."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"102833","DOI":"10.1016\/j.cose.2022.102833","article-title":"An in-depth review of machine learning based Android malware detection","volume":"121","author":"Muzaffar","year":"2022","journal-title":"Comput. Secur."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Alqahtani, E.J., Zagrouba, R., and Almuhaideb, A. (2019, January 10\u201313). 
A Survey on Android Malware Detection Techniques Using Machine Learning Algorithms. Proceedings of the 2019 Sixth International Conference on Software Defined Systems (SDS), Rome, Italy.","DOI":"10.1109\/SDS.2019.8768729"},{"key":"ref_14","unstructured":"(2023, December 29). Android Malware Dataset for Machine Learning|Kaggle. Available online: https:\/\/www.kaggle.com\/datasets\/shashwatwork\/android-malware-dataset-for-machine-learning."},{"key":"ref_15","unstructured":"(2023, December 29). Android Permission Dataset|Kaggle. Available online: https:\/\/www.kaggle.com\/datasets\/saurabhshahane\/android-permission-dataset."},{"key":"ref_16","unstructured":"(2023, December 29). Android Malware Dataset|Kaggle. Available online: https:\/\/www.kaggle.com\/datasets\/saurabhshahane\/android-malware-dataset."},{"key":"ref_17","unstructured":"(2023, December 29). Android Malware Static Feature Dataset (6 Datasets)|Kaggle. Available online: https:\/\/www.kaggle.com\/datasets\/laxman1216\/android-static-features-datasets6-features."},{"key":"ref_18","unstructured":"(2023, December 29). Data Preprocessing in Machine Learning [Steps & Techniques]. Available online: https:\/\/www.v7labs.com\/blog\/data-preprocessing-guide."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"1794","DOI":"10.1016\/j.patrec.2012.05.019","article-title":"Efficient feature selection filters for high-dimensional data","volume":"33","author":"Ferreira","year":"2012","journal-title":"Pattern Recognit. Lett."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"321","DOI":"10.1613\/jair.953","article-title":"SMOTE: Synthetic minority over-sampling technique","volume":"16","author":"Chawla","year":"2002","journal-title":"J. Artif. Intell. Res."},{"key":"ref_21","unstructured":"Witten, I., Frank, E., Hall, M., and Pal, C. (2016). Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann. 
[4th ed.]."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Mach. Learn."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"476","DOI":"10.1109\/TSMCC.2004.843247","article-title":"Top-down induction of decision trees classifiers\u2014A survey","volume":"35","author":"Rokach","year":"2005","journal-title":"IEEE Trans. Syst. Man Cybern. Part C Appl. Rev."},{"key":"ref_24","unstructured":"Alpaydin, E. (2010). Introduction to Machine Learning, The MIT Press. [2nd ed.]."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Vapnik, V. (1999). The Nature of Statistical Learning Theory, Springer.","DOI":"10.1007\/978-1-4757-3264-1"},{"key":"ref_26","unstructured":"(2023, December 29). Support Vector Machines (SVM)\u2014An Overview|By Rushikesh Pupale|Towards Data Science. Available online: https:\/\/towardsdatascience.com\/https-medium-com-pupalerushikesh-svm-f4b42800e989."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"37","DOI":"10.1007\/BF00153759","article-title":"Instance-based learning algorithms","volume":"6","author":"Aha","year":"1991","journal-title":"Mach. Learn."},{"key":"ref_28","unstructured":"Duda, R., Hart, P., and Stork, D. (2001). Pattern Classification, John Wiley & Sons. [2nd ed.]."},{"key":"ref_29","unstructured":"Haykin, S. (1999). Neural Networks: A Comprehensive Foundation, Prentice Hall. [2nd ed.]."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Bishop, C. (1995). Neural Networks for Pattern Recognition, Oxford University Press.","DOI":"10.1093\/oso\/9780198538493.001.0001"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"763","DOI":"10.1016\/j.procs.2023.03.101","article-title":"A Comparative Analysis of Machine Learning Algorithms for Android Malware Detection","volume":"220","author":"AlOmari","year":"2023","journal-title":"Procedia Comput. 
Sci."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Kouliaridis, V., Kambourakis, G., and Peng, T. (2020\u20131, January 29). Feature Importance in Android Malware Detection. Proceedings of the 2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), Guangzhou, China.","DOI":"10.1109\/TrustCom50675.2020.00195"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Renault, \u00c9., Boumerdassi, S., and M\u00fchlethaler, P. (2021). Machine Learning for Networking, Springer International Publishing.","DOI":"10.1007\/978-3-030-70866-5"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Kouliaridis, V., Kambourakis, G., Geneiatakis, D., and Potha, N. (2020). Two Anatomists Are Better than One\u2014Dual-Level Android Malware Detection. Symmetry, 12.","DOI":"10.3390\/sym12071128"},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"1077","DOI":"10.1080\/09540091.2020.1853056","article-title":"An extrinsic random-based ensemble approach for android malware detection","volume":"33","author":"Potha","year":"2021","journal-title":"Connect. Sci."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"273","DOI":"10.3390\/digital3030017","article-title":"Web-Based Malware Detection System Using Convolutional Neural Network","volume":"3","author":"Alqahtani","year":"2023","journal-title":"Digital"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Zhang, S., Hu, C., Wang, L., Mihaljevic, M.J., Xu, S., and Lan, T. (2023). A Malware Detection Approach Based on Deep Learning and Memory Forensics. Symmetry, 15.","DOI":"10.3390\/sym15030758"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Alomari, E.S., Nuiaa, R.R., Alyasseri, Z.A.A., Mohammed, H.J., Sani, N.S., Esa, M.I., and Musawi, B.A. (2023). Malware Detection Using Deep Learning and Correlation-Based Feature Selection. 
Symmetry, 15.","DOI":"10.3390\/sym15010123"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Akhtar, M.S., and Feng, T. (2022). Malware Analysis and Detection Using Machine Learning Algorithms. Symmetry, 14.","DOI":"10.3390\/sym14112304"},{"key":"ref_40","first-page":"650","article-title":"Malware Detection and Classification on Different Dataset by Hybridization of CNN and Machine Learning","volume":"12","author":"Hashmi","year":"2023","journal-title":"Int. J. Intell. Syst. Appl. Eng."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Djenna, A., Bouridane, A., Rubab, S., and Marou, I.M. (2023). Artificial Intelligence-Based Malware Detection, Analysis, and Mitigation. Symmetry, 15.","DOI":"10.3390\/sym15030677"},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"102915","DOI":"10.1016\/j.cose.2022.102915","article-title":"An Android Malware Detection and Classification Approach Based on Contrastive Lerning","volume":"123","author":"Yang","year":"2022","journal-title":"Comput. Secur."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Lu, K., Cheng, J., and Yan, A. (2023). Malware Detection Based on the Feature Selection of a Correlation Information Decision Matrix. Mathematics, 11.","DOI":"10.3390\/math11040961"},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"2850932","DOI":"10.1155\/2019\/2850932","article-title":"Improved malware detection model with apriori association rule and particle swarm optimization","volume":"2019","author":"Adebayo","year":"2019","journal-title":"Secur. Commun. Netw."},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Zhang, Y., Yang, S., Xu, L., Li, X., and Zhao, D. (2023). A Malware Detection Framework Based on Semantic Information of Behavioral Features. Appl. Sci., 13.","DOI":"10.3390\/app132212528"},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Wang, G., Ciptadi, A., and Ahmadzadeh, A. (2021). 
Deployable Machine Learning for Security Defense, Springer International Publishing.","DOI":"10.1007\/978-3-030-87839-9"},{"key":"ref_47","first-page":"331","article-title":"Hybroid: A Novel Hybrid Android Malware Detection Framework","volume":"14","year":"2021","journal-title":"Erzincan Univ. J. Sci. Technol."},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Aboaoja, F.A., Zainal, A., Ghaleb, F.A., Al-rimy, B.A.S., Eisa, T.A.E., and Elnour, A.A.H. (2022). Malware Detection Issues, Challenges, and Future Directions: A Survey. Appl. Sci., 12.","DOI":"10.3390\/app12178482"},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Agrawal, P., and Trivedi, B. (2019, January 20\u201322). A Survey on Android Malware and their Detection Techniques. Proceedings of the 2019 IEEE International Conference on Electrical, Computer and Communication Technologies (ICECCT), Coimbatore, India.","DOI":"10.1109\/ICECCT.2019.8868951"},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Almomani, I., Ahmed, M., and El-Shafai, W. (2022). Android malware analysis in a nutshell. PLoS ONE, 17.","DOI":"10.1371\/journal.pone.0270647"},{"key":"ref_51","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3605775","article-title":"Deep Learning for Zero-Day Malware Detection and Classification: A Survey","volume":"56","author":"Deldar","year":"2023","journal-title":"ACM Comput. Surv."},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Faruki, P., Bhan, R., Jain, V., Bhatia, S., El Madhoun, N., and Pamula, R. (2023). A Survey and Evaluation of Android-Based Malware Evasion Techniques and Detection Frameworks. Information, 14.","DOI":"10.3390\/info14070374"},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"Gyamfi, N.K., Goranin, N., Ceponis, D., and \u010cenys, H.A. (2023). Automated System-Level Malware Detection Using Machine Learning: A Comprehensive Review. Appl. 
Sci., 13.","DOI":"10.3390\/app132111908"},{"key":"ref_54","doi-asserted-by":"crossref","first-page":"124579","DOI":"10.1109\/ACCESS.2020.3006143","article-title":"A Review of Android Malware Detection Approaches Based on Machine Learning","volume":"8","author":"Liu","year":"2020","journal-title":"IEEE Access"},{"key":"ref_55","doi-asserted-by":"crossref","first-page":"2007327","DOI":"10.1080\/08839514.2021.2007327","article-title":"A Systematic Overview of Android Malware Detection","volume":"36","author":"Meijin","year":"2022","journal-title":"Appl. Artif. Intell."},{"key":"ref_56","doi-asserted-by":"crossref","first-page":"012011","DOI":"10.1088\/1742-6596\/1807\/1\/012011","article-title":"Malware Detection: Issues and Challenges","volume":"1807","author":"Naseer","year":"2021","journal-title":"J. Phys. Conf. Ser."},{"key":"ref_57","doi-asserted-by":"crossref","unstructured":"Florez, H., Diaz, C., and Chavarriaga, J. (2018). Applied Informatics, Springer International Publishing.","DOI":"10.1007\/978-3-030-01535-0"},{"key":"ref_58","doi-asserted-by":"crossref","first-page":"100358","DOI":"10.1016\/j.cosrev.2020.100358","article-title":"A survey of malware detection in Android apps: Recommendations and perspectives for future research","volume":"39","author":"Razgallah","year":"2021","journal-title":"Comput. Sci. Rev."},{"key":"ref_59","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1186\/s13673-018-0125-x","article-title":"A State-of-the-Art Survey of Malware Detection Approaches Using Data Mining Techniques","volume":"8","author":"Souri","year":"2018","journal-title":"Hum.-Centric Comput. Inf. Sci."},{"key":"ref_60","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3417978","article-title":"A Survey of Android Malware Detection with Deep Neural Models","volume":"53","author":"Qiu","year":"2020","journal-title":"ACM Comput. 
Surv."},{"key":"ref_61","doi-asserted-by":"crossref","unstructured":"Vasani, V., Bairwa, A.K., Joshi, S., Pljonkin, A., Kaur, M., and Amoon, M. (2023). Comprehensive Analysis of Advanced Techniques and Vital Tools for Detecting Malware Intrusion. Electronics, 12.","DOI":"10.3390\/electronics12204299"},{"key":"ref_62","doi-asserted-by":"crossref","unstructured":"Xu, Y., Yan, H., Teng, H., Cai, J., and Li, J. (2023). Machine Learning for Cyber Security, Springer International Publishing.","DOI":"10.1007\/978-3-031-20096-0"},{"key":"ref_63","doi-asserted-by":"crossref","unstructured":"Guyon, I., Gunn, S., Nikravesh, M., and Zadeh, L. (2006). (Eds.) Feature Extraction, Foundations and Applications, Springer.","DOI":"10.1007\/978-3-540-35488-8"},{"key":"ref_64","first-page":"1157","article-title":"An introduction to variable and feature selection","volume":"3","author":"Guyon","year":"2003","journal-title":"J. Mach. Learn. Res. (JMLR)"},{"key":"ref_65","unstructured":"(2023, December 29). sklearn.model_selection.GridSearchCV\u2014Scikit-Learn 1.3.1 Documentation. Available online: https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.model_selection.GridSearchCV.html."},{"key":"ref_66","unstructured":"(2023, December 29). Not So Boring Android Malware|Android-Malware-Samples. 
Available online: https:\/\/maldroid.github.io\/android-malware-samples\/."}],"container-title":["Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2078-2489\/15\/1\/25\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T13:37:54Z","timestamp":1760103474000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2078-2489\/15\/1\/25"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,1,1]]},"references-count":66,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2024,1]]}},"alternative-id":["info15010025"],"URL":"https:\/\/doi.org\/10.3390\/info15010025","relation":{},"ISSN":["2078-2489"],"issn-type":[{"value":"2078-2489","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,1,1]]}}}