{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,31]],"date-time":"2026-03-31T21:37:56Z","timestamp":1774993076032,"version":"3.50.1"},"reference-count":27,"publisher":"Oxford University Press (OUP)","issue":"9","license":[{"start":{"date-parts":[[2023,1,22]],"date-time":"2023-01-22T00:00:00Z","timestamp":1674345600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023,9,18]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Android is the dominant operating system in the smartphone market and there exist millions of applications in various application stores. The increase in the number of applications has necessitated the detection of malicious applications in a short time. Unlike dynamic analysis, static analysis can produce results in less time because there is no need to run the applications. However, obtaining various information from application packages using reverse engineering techniques still requires a substantial amount of processing power. Although some attempts have been made to solve this problem by analyzing binary files without decoding the source code, there is still more work to be done in this area. In this study, we analyzed the applications at the bytecode level without decoding the binary source files. We proposed a model using the Term Frequency - Inverse Document Frequency (TF-IDF) word representation for feature extraction and the Extreme Gradient Boosting (XGBoost) method for classification. 
The experimental results show that our model classifies a given application package as malware or benign in 2.75 s with a 99.05% F1-score on a balanced dataset, and in 3.30 s with a 99.35% F1-score on an imbalanced dataset containing obfuscated malware.<\/jats:p>","DOI":"10.1093\/comjnl\/bxac198","type":"journal-article","created":{"date-parts":[[2023,1,24]],"date-time":"2023-01-24T10:52:39Z","timestamp":1674557559000},"page":"2317-2328","source":"Crossref","is-referenced-by-count":15,"title":["Android Malware Detection in Bytecode Level Using TF-IDF and XGBoost"],"prefix":"10.1093","volume":"66","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-8280-5368","authenticated-orcid":false,"given":"Gokhan","family":"Ozogur","sequence":"first","affiliation":[{"name":"Department of Computer Engineering, Istanbul University-Cerrahpasa , Istanbul, Turkey"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4030-1110","authenticated-orcid":false,"given":"Mehmet Ali","family":"Erturk","sequence":"additional","affiliation":[{"name":"Department of Transportation and Logistics, Istanbul University , Istanbul, Turkey"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4125-0589","authenticated-orcid":false,"given":"Zeynep","family":"Gurkas Aydin","sequence":"additional","affiliation":[{"name":"Department of Computer Engineering, Istanbul University-Cerrahpasa , Istanbul, Turkey"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1846-6090","authenticated-orcid":false,"given":"Muhammed Ali","family":"Aydin","sequence":"additional","affiliation":[{"name":"Department of Computer Engineering, Istanbul University-Cerrahpasa , Istanbul, Turkey"}]}],"member":"286","published-online":{"date-parts":[[2023,1,22]]},"reference":[{"key":"2023091720461399100_ref1","volume-title":"Idc - smartphone market share - market share","author":"Popal","year":"2021"},{"key":"2023091720461399100_ref2","volume-title":"G data mobile security report: more than 2.5 million new malware apps for android devices","author":"G DATA 
CyberDefense AG","year":"2022"},{"key":"2023091720461399100_ref3","doi-asserted-by":"crossref","first-page":"1125","DOI":"10.1093\/comjnl\/bxz121","article-title":"Byte2vec: malware representation and feature selection for android","volume":"63","author":"Yousefi-Azar","year":"2020","journal-title":"The Computer Journal"},{"key":"2023091720461399100_ref4","first-page":"3111","article-title":"Distributed representations of words and phrases and their compositionality","volume-title":"Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 2","author":"Mikolov","year":"2013"},{"key":"2023091720461399100_ref5","doi-asserted-by":"crossref","first-page":"785","DOI":"10.1145\/2939672.2939785","article-title":"Xgboost: A scalable tree boosting system","volume-title":"Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","author":"Chen","year":"2016"},{"key":"2023091720461399100_ref6","first-page":"23","article-title":"Drebin: Effective and explainable detection of android malware in your pocket","volume-title":"Symposium on Network and Distributed System Security (NDSS)","author":"Arp","year":"2014"},{"key":"2023091720461399100_ref7","first-page":"1518","article-title":"Mutual information and feature importance gradient boosting: automatic byte n-gram feature reranking for android malware detection","volume":"51","author":"Yousefi-Azar","year":"2021","journal-title":"Software: Practice and Experience"},{"key":"2023091720461399100_ref8","doi-asserted-by":"crossref","first-page":"2372","DOI":"10.1016\/j.procs.2017.08.216","article-title":"Evaluating convolutional neural network for effective mobile malware detection","volume":"112","author":"Martinelli","year":"2017","journal-title":"Procedia computer science"},{"key":"2023091720461399100_ref9","doi-asserted-by":"crossref","first-page":"S48","DOI":"10.1016\/j.diin.2018.01.007","article-title":"Maldozer: automatic framework for android 
malware detection using deep learning","volume":"24","author":"Karbab","year":"2018","journal-title":"Digital Investigation"},{"key":"2023091720461399100_ref10","article-title":"subgraph2vec: learning distributed representations of rooted sub-graphs from large graphs","author":"Narayanan","year":"2016"},{"key":"2023091720461399100_ref11","first-page":"43","article-title":"Recurrent neural network for malware detection","volume":"11","author":"Halim","year":"2019","journal-title":"Int. J. Advance Soft Compu. Appl"},{"key":"2023091720461399100_ref12","doi-asserted-by":"crossref","first-page":"5770","DOI":"10.1002\/int.22529","article-title":"Hybrid sequence-based android malware detection using natural language processing","volume":"36","author":"Zhang","year":"2021","journal-title":"International Journal of Intelligent Systems"},{"key":"2023091720461399100_ref13","doi-asserted-by":"crossref","DOI":"10.14722\/ndss.2017.23353","article-title":"Mamadroid: Detecting android malware by building markov chains of behavioral models","volume-title":"Proceedings of 24th Network and Distributed System Security Symposium (NDSS 2017)","author":"Mariconti","year":"2017"},{"key":"2023091720461399100_ref14","doi-asserted-by":"crossref","first-page":"62","DOI":"10.1007\/978-3-319-66399-9_4","article-title":"Adversarial examples for malware detection","volume-title":"Computer Security \u2013 ESORICS 2017","author":"Grosse","year":"2017"},{"key":"2023091720461399100_ref15","doi-asserted-by":"crossref","first-page":"51964","DOI":"10.1109\/ACCESS.2018.2870534","article-title":"Dalvik opcode graph based android malware variants detection using global topology features","volume":"6","author":"Zhang","year":"2018","journal-title":"IEEE Access"},{"key":"2023091720461399100_ref16","doi-asserted-by":"crossref","first-page":"12","DOI":"10.1016\/j.compeleceng.2019.04.019","article-title":"A novel parallel classifier scheme for vulnerability detection in 
android","volume":"77","author":"Garg","year":"2019","journal-title":"Computers & Electrical Engineering"},{"key":"2023091720461399100_ref17","doi-asserted-by":"crossref","first-page":"1081","DOI":"10.1016\/j.dib.2018.12.038","article-title":"Data on vulnerability detection in android","volume":"22","author":"Garg","year":"2019","journal-title":"Data Brief"},{"key":"2023091720461399100_ref18","author":"Android developers - reduce your app size"},{"key":"2023091720461399100_ref19","author":"Dalvik executable format \u2014 android open source project"},{"key":"2023091720461399100_ref20","first-page":"468","article-title":"Androzoo: Collecting millions of android apps for the research community","volume-title":"2016 IEEE\/ACM 13th Working Conference on Mining Software Repositories (MSR)","author":"Allix","year":"2016"},{"key":"2023091720461399100_ref21","volume-title":"Github - artemkushnerov\/az: Downloads apks from androzoo repository","author":"Kushniarou","year":"2018"},{"key":"2023091720461399100_ref22","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1109\/CCST.2018.8585560","article-title":"Toward developing a systematic approach to generate benchmark android malware datasets and classification","volume-title":"2018 International Carnahan Conference on Security Technology (ICCST)","author":"Lashkari","year":"2018"},{"key":"2023091720461399100_ref23","first-page":"515","article-title":"Dynamic android malware category classification using semi-supervised deep learning","volume-title":"2020 IEEE Intl Conf on Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive Intelligence and Computing, Intl Conf on Cloud and Big Data Computing, Intl Conf on Cyber Science and Technology Congress (DASC\/PiCom\/CBDCom\/CyberSciTech)","author":"Mahdavifar","year":"2020"},{"key":"2023091720461399100_ref24","author":"Limitcpu"},{"key":"2023091720461399100_ref25","first-page":"2825","article-title":"Scikit-learn: machine learning in 
python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J. Mach. Learn. Res."},{"key":"2023091720461399100_ref26","author":"Xgboost python package"},{"key":"2023091720461399100_ref27","volume-title":"Apktool - a tool for reverse engineering 3rd party, closed, binary android apps","year":"2010"}],"container-title":["The Computer Journal"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/comjnl\/article-pdf\/66\/9\/2317\/51643548\/bxac198.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/comjnl\/article-pdf\/66\/9\/2317\/51643548\/bxac198.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,9,17]],"date-time":"2023-09-17T21:09:58Z","timestamp":1694984998000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/comjnl\/article\/66\/9\/2317\/6995421"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,1,22]]},"references-count":27,"journal-issue":{"issue":"9","published-online":{"date-parts":[[2023,1,22]]},"published-print":{"date-parts":[[2023,9,18]]}},"URL":"https:\/\/doi.org\/10.1093\/comjnl\/bxac198","relation":{},"ISSN":["0010-4620","1460-2067"],"issn-type":[{"value":"0010-4620","type":"print"},{"value":"1460-2067","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2023,9]]},"published":{"date-parts":[[2023,1,22]]}}}