{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,29]],"date-time":"2026-03-29T15:59:36Z","timestamp":1774799976499,"version":"3.50.1"},"reference-count":44,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2021,10,18]],"date-time":"2021-10-18T00:00:00Z","timestamp":1634515200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Manage. Inf. Syst."],"published-print":{"date-parts":[[2022,6,30]]},"abstract":"<jats:p>To relieve the burden of security analysts, Android malware detection and its family classification need to be automated. There are many previous works focusing on using machine (or deep) learning technology to tackle these two important issues, but as the number of mobile applications has increased in recent years, developing a scalable and precise solution is a new challenge that needs to be addressed in the security field. Accordingly, in this article, we propose a novel approach that not only enhances the performance of both Android malware and its family classification, but also reduces the running time of the analysis process. Using large-scale datasets obtained from different sources, we demonstrate that our method is able to output a high F-measure of 99.71% with a low FPR of 0.37%. Meanwhile, the computation time for processing a 300K dataset is reduced to nearly 3.3 hours. In addition, in classification evaluation, we demonstrate that the F-measure, precision, and recall are 97.5%, 96.55%, 98.64%, respectively, when classifying 28 malware families. Finally, we compare our method with previous studies in both detection and classification evaluation. We observe that our method produces better performance in terms of its effectiveness and efficiency.<\/jats:p>","DOI":"10.1145\/3464323","type":"journal-article","created":{"date-parts":[[2021,10,19]],"date-time":"2021-10-19T01:18:12Z","timestamp":1634606292000},"page":"1-21","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":7,"title":["Detecting Android Malware and Classifying Its Families in Large-scale Datasets"],"prefix":"10.1145","volume":"13","author":[{"given":"Bo","family":"Sun","sequence":"first","affiliation":[{"name":"National Institute of Information and Communications Technology, Tokyo, Japan"}]},{"given":"Takeshi","family":"Takahashi","sequence":"additional","affiliation":[{"name":"National Institute of Information and Communications Technology, Tokyo, Japan"}]},{"given":"Tao","family":"Ban","sequence":"additional","affiliation":[{"name":"National Institute of Information and Communications Technology, Tokyo, Japan"}]},{"given":"Daisuke","family":"Inoue","sequence":"additional","affiliation":[{"name":"National Institute of Information and Communications Technology, Tokyo, Japan"}]}],"member":"320","published-online":{"date-parts":[[2021,10,18]]},"reference":[{"key":"e_1_3_2_2_1","unstructured":"2002. MALLET Documentation. Retrieved from https:\/\/www.cs.cmu.edu\/afs\/cs.cmu.edu\/project\/cmt-40\/Nice\/Urdu-MT\/code\/Tools\/POS\/postagger\/mallet_0.4\/doc\/documentation.html."},{"key":"e_1_3_2_3_1","unstructured":"2011. Dedxer. Retrieved from http:\/\/dedexer.sourceforge.net\/."},{"key":"e_1_3_2_4_1","unstructured":"2018. McAfee Labs Threats Report June 2018. Retrieved from https:\/\/www.mcafee.com\/enterprise\/en-us\/assets\/reports\/rp-quarterly-threats-jun-2018.pdf. (2018)."},{"key":"e_1_3_2_5_1","unstructured":"2018. The Statistics Portal. Retrieved from https:\/\/www.statista.com\/statistics\/266136\/global-market-share-held-by-smartphone-operating-systems\/."},{"key":"e_1_3_2_6_1","unstructured":"2019. Google Play Store. Retrieved from https:\/\/play.google.com\/store."},{"key":"e_1_3_2_7_1","unstructured":"2019. Opera Mobile Store - Bemobi. http:\/\/android.oms.apps.bemobi.com\/."},{"key":"e_1_3_2_8_1","unstructured":"2019. scikit-learn:machine learning in Python. http:\/\/scikit-learn.org\/stable\/."},{"key":"e_1_3_2_9_1","unstructured":"2019. TensorFlow. Retrieved from https:\/\/www.tensorflow.org."},{"key":"e_1_3_2_10_1","unstructured":"2019. VirusTotal- Free Online Virus Malware and URL Scanner. Retrieved from https:\/\/www.virustotal.com."},{"key":"e_1_3_2_11_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-04283-1_6"},{"key":"e_1_3_2_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/2901739.2903508"},{"key":"e_1_3_2_13_1","doi-asserted-by":"publisher","DOI":"10.14722\/ndss.2014.23247"},{"key":"e_1_3_2_14_1","doi-asserted-by":"publisher","DOI":"10.5555\/2818754.2818808"},{"key":"e_1_3_2_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/2976749.2978422"},{"key":"e_1_3_2_16_1","doi-asserted-by":"publisher","DOI":"10.5555\/944919.944937"},{"key":"e_1_3_2_17_1","volume-title":"Master\u2019s thesis. Royal Holloway University of London","author":"Korczynski Lorenzo Cavallaro David","year":"2015","unstructured":"Lorenzo Cavallaro David Korczynski. 2015. ClusTheDroid: Clustering Android Malware. Master\u2019s thesis. Royal Holloway University of London."},{"key":"e_1_3_2_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIFS.2018.2806891"},{"key":"e_1_3_2_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/3162625"},{"key":"e_1_3_2_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/2517312.2517315"},{"key":"e_1_3_2_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/2568225.2568276"},{"key":"e_1_3_2_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/WIW.2016.040"},{"key":"e_1_3_2_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICC.2014.6883436"},{"key":"e_1_3_2_24_1","first-page":"281","volume-title":"Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability","author":"Macqueen J.","year":"1967","unstructured":"J. Macqueen. 1967. Some methods for classification and analysis of multivariate observations. In Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability. 281\u2013297."},{"key":"e_1_3_2_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/2513228.2513295"},{"key":"e_1_3_2_26_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1181"},{"key":"e_1_3_2_27_1","doi-asserted-by":"publisher","DOI":"10.5555\/3044805.3045025"},{"key":"e_1_3_2_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCNC.2018.8390391"},{"key":"e_1_3_2_29_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-66332-6_9"},{"key":"e_1_3_2_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/3029806.3029823"},{"key":"e_1_3_2_31_1","doi-asserted-by":"publisher","DOI":"10.5555\/2999792.2999959"},{"key":"e_1_3_2_32_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-47217-1_8"},{"key":"e_1_3_2_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/2382196.2382224"},{"key":"e_1_3_2_34_1","first-page":"45","volume-title":"Proceedings of the LREC Workshop on New Challenges for NLP Frameworks","author":"\u0158eh\u016f\u0159ek Radim","year":"2010","unstructured":"Radim \u0158eh\u016f\u0159ek and Petr Sojka. 2010. Software framework for topic modelling with large corpora. In Proceedings of the LREC Workshop on New Challenges for NLP Frameworks. 45\u201350. Retrieved from http:\/\/is.muni.cz\/publication\/884893\/en."},{"key":"e_1_3_2_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/EISIC.2012.34"},{"key":"e_1_3_2_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/TDSC.2016.2536605"},{"key":"e_1_3_2_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/TrustCom.2016.0070"},{"key":"e_1_3_2_38_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2013.07.106"},{"key":"e_1_3_2_39_1","first-page":"181","volume-title":"Android Application Analysis Using Machine Learning Techniques","author":"Takahashi Takeshi","year":"2019","unstructured":"Takeshi Takahashi and Tao Ban. 2019. Android Application Analysis Using Machine Learning Techniques. Springer International Publishing, Cham, 181\u2013205. DOI:https:\/\/doi.org\/10.1007\/978-3-319-98842-9_7"},{"key":"e_1_3_2_40_1","doi-asserted-by":"publisher","DOI":"10.14722\/ndss.2015.23145"},{"key":"e_1_3_2_41_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-60876-1_12"},{"key":"e_1_3_2_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/3243734.3243835"},{"key":"e_1_3_2_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/AsiaJCIS.2012.18"},{"key":"e_1_3_2_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/2740070.2631434"},{"key":"e_1_3_2_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/SP.2012.16"}],"container-title":["ACM Transactions on Management Information Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3464323","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3464323","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:12:16Z","timestamp":1750191136000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3464323"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,10,18]]},"references-count":44,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2022,6,30]]}},"alternative-id":["10.1145\/3464323"],"URL":"https:\/\/doi.org\/10.1145\/3464323","relation":{},"ISSN":["2158-656X","2158-6578"],"issn-type":[{"value":"2158-656X","type":"print"},{"value":"2158-6578","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,10,18]]},"assertion":[{"value":"2019-11-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-04-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-10-18","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}