{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,14]],"date-time":"2026-04-14T02:37:38Z","timestamp":1776134258691,"version":"3.50.1"},"reference-count":39,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2026,1,18]],"date-time":"2026-01-18T00:00:00Z","timestamp":1768694400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Digital"],"abstract":"<jats:p>Android malware continues to evolve through obfuscation and polymorphism, posing challenges for both signature-based defenses and machine learning models trained on limited and imbalanced datasets. Synthetic data has been proposed as a remedy for scarcity, yet the role of Large Language Models (LLMs) in generating effective malware data for detection tasks remains underexplored. In this study, we fine-tune GPT-4.1-mini to produce structured records for three malware families: BankBot, Locker\/SLocker, and Airpush\/StopSMS, using the KronoDroid dataset. After addressing generation inconsistencies with prompt engineering and post-processing, we evaluate multiple classifiers under three settings: training with real data only, real-plus-synthetic data, and synthetic data alone. Results show that real-only training achieves near-perfect detection, while augmentation with synthetic data preserves high performance with only minor degradations. In contrast, synthetic-only training produces mixed outcomes, with effectiveness varying across malware families and fine-tuning strategies. These findings suggest that LLM-generated tabular malware feature records can enhance scarce datasets without compromising detection accuracy, but remain insufficient as a standalone training source.<\/jats:p>","DOI":"10.3390\/digital6010005","type":"journal-article","created":{"date-parts":[[2026,1,19]],"date-time":"2026-01-19T11:35:27Z","timestamp":1768822527000},"page":"5","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["LLM-Generated Samples for Android Malware Detection"],"prefix":"10.3390","volume":"6","author":[{"ORCID":"https:\/\/orcid.org\/0009-0009-4674-8616","authenticated-orcid":false,"given":"Nik","family":"Rollinson","sequence":"first","affiliation":[{"name":"School of Architecture, Technology and Engineering, University of Brighton, Brighton BN2 4GJ, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4249-4953","authenticated-orcid":false,"given":"Nikolaos","family":"Polatidis","sequence":"additional","affiliation":[{"name":"School of Architecture, Technology and Engineering, University of Brighton, Brighton BN2 4GJ, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2026,1,18]]},"reference":[{"key":"ref_1","first-page":"8896013","article-title":"A Survey of Android Malware Static Detection Technology Based on Machine Learning","volume":"2021","author":"Wu","year":"2021","journal-title":"Mob. Inf. Syst."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"331","DOI":"10.1093\/comjnl\/bxae114","article-title":"ChatGPT-Driven Machine Learning Code Generation for Android Malware Detection","volume":"68","author":"Nelson","year":"2025","journal-title":"Comput. J."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"125546","DOI":"10.1016\/j.eswa.2024.125546","article-title":"AppPoet: Large Language Model Based Android Malware Detection via Multi-View Prompt Engineering","volume":"262","author":"Zhao","year":"2025","journal-title":"Expert Syst. Appl."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Achuthan, K., Ramanathan, S., Srinivas, S., and Raman, R. (2024). Advancing Cybersecurity and Privacy with Artificial Intelligence: Current Trends and Future Research Directions. Front. Big Data, 7.","DOI":"10.3389\/fdata.2024.1497535"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"50","DOI":"10.1007\/s11280-024-01287-y","article-title":"FSSDroid: Feature Subset Selection for Android Malware Detection","volume":"27","author":"Polatidis","year":"2024","journal-title":"World Wide Web"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"143806","DOI":"10.1109\/ACCESS.2024.3468914","article-title":"Applications of LLMs for Generating Cyber Security Exercise Scenarios","volume":"12","author":"Hashmi","year":"2024","journal-title":"IEEE Access"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"1473","DOI":"10.1007\/s40745-022-00444-2","article-title":"Machine Learning for Intelligent Data Analysis and Automation in Cybersecurity: Current and Future Prospects","volume":"10","author":"Sarker","year":"2023","journal-title":"Ann. Data Sci."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"104078","DOI":"10.1016\/j.rineng.2025.104078","article-title":"Enhancing Cybersecurity Incident Response: AI-Driven Optimization for Strengthened Advanced Persistent Threat Detection","volume":"25","author":"Ali","year":"2025","journal-title":"Results Eng."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.iotcps.2025.01.001","article-title":"Generative AI in Cybersecurity: A Comprehensive Review of LLM Applications and Vulnerabilities","volume":"5","author":"Ferrag","year":"2025","journal-title":"Internet Things Cyber-Phys. Syst."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"44662","DOI":"10.1109\/ACCESS.2025.3547433","article-title":"Cyber Attack Prediction: From Traditional Machine Learning to Generative Artificial Intelligence","volume":"13","author":"Ankalaki","year":"2025","journal-title":"IEEE Access"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Botacin, M. (2023, January 25). Gpthreats-3: Is Automatic Malware Generation a Threat?. Proceedings of the 2023 IEEE Security and Privacy Workshops (SPW), San Francisco, CA, USA.","DOI":"10.1109\/SPW59333.2023.00027"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"127","DOI":"10.5937\/bizinfo2302127U","article-title":"The Use of the ChatGPT Language Model in the Creation of Malicious Programs","volume":"14","year":"2023","journal-title":"BizInfo"},{"key":"ref_13","unstructured":"Pa, Y.M.P., Tanizaki, S., Kou, T., van Eeten, M., Yoshioka, K., and Matsumoto, T. (2023, January 7). An Attacker\u2019s Dream? Exploring the Capabilities of ChatGPT for Developing Malware. Proceedings of the 16th Cyber Security Experimentation and Test Workshop, Marina del Rey, CA, USA."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"2075","DOI":"10.1007\/s10207-024-00835-x","article-title":"Generative AI for Pentesting: The Good, the Bad, the Ugly","volume":"23","author":"Hilario","year":"2024","journal-title":"Int. J. Inf. Secur."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Berrios, S., Leiva, D., Olivares, B., Allende-Cid, H., and Hermosilla, P. (2025). Systematic Review: Malware Detection and Classification in Cybersecurity. Appl. Sci., 15.","DOI":"10.3390\/app15147747"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"106030","DOI":"10.1016\/j.engappai.2023.106030","article-title":"A Novel Deep Learning-Based Approach for Malware Detection","volume":"122","author":"Shaukat","year":"2023","journal-title":"Eng. Appl. Artif. Intell."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Gyamfi, N.K., Goranin, N., Ceponis, D., and \u010cenys, H.A. (2023). Automated System-Level Malware Detection Using Machine Learning: A Comprehensive Review. Appl. Sci., 13.","DOI":"10.3390\/app132111908"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"105","DOI":"10.1186\/s40537-024-00957-y","article-title":"Advancing Cybersecurity: A Comprehensive Review of AI-Driven Detection Techniques","volume":"11","author":"Salem","year":"2024","journal-title":"J. Big Data"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"117936","DOI":"10.1016\/j.eswa.2022.117936","article-title":"Generating Realistic Cyber Data for Training and Evaluating Machine Learning Classifiers for Network Intrusion Detection Systems","volume":"207","author":"Bastian","year":"2022","journal-title":"Expert Syst. Appl."},{"key":"ref_20","unstructured":"Ammara, D.A., Ding, J., and Tutschku, K. (2024). Synthetic Data Generation in Cybersecurity: A Comparative Analysis. arXiv, Available online: https:\/\/go.exlibris.link\/3xzCrxHL."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"101212","DOI":"10.1016\/j.iot.2024.101212","article-title":"SYN-GAN: A Robust Intrusion Detection System Using GAN-Based Synthetic Data for IoT Security","volume":"26","author":"Rahman","year":"2024","journal-title":"Internet Things"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Almorjan, A., Basheri, M., and Almasre, M. (2025). Large Language Models for Synthetic Dataset Generation of Cybersecurity Indicators of Compromise. Sensors, 25.","DOI":"10.3390\/s25092825"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Elnashar, A., White, J., and Schmidt, D.C. (2025). Enhancing Structured Data Generation with GPT-4o: Evaluating Prompt Efficiency Across Prompt Styles. Front. Artif. Intell., 8.","DOI":"10.3389\/frai.2025.1558938"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Arp, D., Spreitzenbarth, M., Hubner, M., Gascon, H., Rieck, K., and Siemens, C. (2014, January 23\u201326). Drebin: Effective and Explainable Detection of Android Malware in Your Pocket. Proceedings of the NDSS, San Diego, CA, USA.","DOI":"10.14722\/ndss.2014.23247"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Zhou, Y., and Jiang, X. (2012, January 20\u201323). Dissecting Android Malware: Characterization and Evolution. Proceedings of the 2012 IEEE Symposium on Security and Privacy, San Francisco, CA, USA.","DOI":"10.1109\/SP.2012.16"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Alecci, M., Jim\u00e9nez, P.J.R., Allix, K., Bissyand\u00e9, T.F., and Klein, J. (2024, January 15\u201316). AndroZoo: A Retrospective with a Glimpse into the Future. Proceedings of the 21st International Conference on Mining Software Repositories, Lisbon, Portugal.","DOI":"10.1145\/3643991.3644863"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"212","DOI":"10.1038\/s41597-024-03027-3","article-title":"AndroDex: Android Dex Images of Obfuscated Malware","volume":"11","author":"Aurangzeb","year":"2024","journal-title":"Sci. Data"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"103969","DOI":"10.1016\/j.cose.2024.103969","article-title":"MaDroid: A Maliciousness-Aware Multifeatured Dataset for Detecting Android Malware","volume":"144","author":"Duan","year":"2024","journal-title":"Comput. Secur."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"102399","DOI":"10.1016\/j.cose.2021.102399","article-title":"KronoDroid: Time-Based Hybrid-Featured Dataset for Effective Android Malware Detection and Characterization","volume":"110","author":"Bahsi","year":"2021","journal-title":"Comput. Secur."},{"key":"ref_30","first-page":"1378","article-title":"DBank: Predictive Behavioral Analysis of Recent Android Banking Trojans","volume":"18","author":"Bai","year":"2021","journal-title":"IEEE Trans. Dependable Secur. Comput."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"20381","DOI":"10.1109\/ACCESS.2018.2888568","article-title":"Detecting Android Locker-Ransomware on Chinese Social Networks","volume":"7","author":"Su","year":"2019","journal-title":"IEEE Access"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Rastogi, V., Shao, R., Chen, Y., Pan, X., Zou, S., and Riley, R. (2016, January 21\u201324). Are These Ads Safe: Detecting Hidden Attacks Through the Mobile App\u2013Web Interfaces. Proceedings of the Network and Distributed System Security Symposium, San Diego, CA, USA.","DOI":"10.14722\/ndss.2016.23234"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Babbar, H., Rani, S., Sah, D.K., AlQahtani, S.A., and Bashir, A.K. (2023). Detection of Android Malware in the Internet of Things through the K-Nearest Neighbor Algorithm. Sensors, 23.","DOI":"10.3390\/s23167256"},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"123","DOI":"10.1016\/j.cose.2018.11.001","article-title":"Survey of Machine Learning Techniques for Malware Analysis","volume":"81","author":"Ucci","year":"2019","journal-title":"Comput. Secur."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Ferdous, J., Islam, R., Mahboubi, A., and Islam, M.Z. (2025). A Survey on ML Techniques for Multi-Platform Malware Detection: Securing PC, Mobile Devices, IoT, and Cloud Environments. Sensors, 25.","DOI":"10.3390\/s25041153"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"36","DOI":"10.21015\/vtse.v10i2.963","article-title":"Role of Logistic Regression in Malware Detection: A Systematic Literature Review","volume":"10","author":"Farooq","year":"2022","journal-title":"VFAST Trans. Softw. Eng."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"420","DOI":"10.1007\/s42979-021-00815-1","article-title":"Deep Learning: A Comprehensive Overview on Techniques, Taxonomy, Applications and Research Directions","volume":"2","author":"Sarker","year":"2021","journal-title":"SN Comput. Sci."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"99","DOI":"10.1186\/s40537-025-01157-y","article-title":"Application of Deep Learning in Malware Detection: A Review","volume":"12","author":"Song","year":"2025","journal-title":"J. Big Data"},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"213","DOI":"10.1016\/j.procs.2016.06.047","article-title":"Random Forest Modeling for Network Intrusion Detection System","volume":"89","author":"Farnaaz","year":"2016","journal-title":"Procedia Comput. Sci."}],"container-title":["Digital"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2673-6470\/6\/1\/5\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,1,19]],"date-time":"2026-01-19T12:01:43Z","timestamp":1768824103000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2673-6470\/6\/1\/5"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,1,18]]},"references-count":39,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2026,3]]}},"alternative-id":["digital6010005"],"URL":"https:\/\/doi.org\/10.3390\/digital6010005","relation":{},"ISSN":["2673-6470"],"issn-type":[{"value":"2673-6470","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,1,18]]}}}