{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,27]],"date-time":"2026-03-27T16:08:55Z","timestamp":1774627735125,"version":"3.50.1"},"reference-count":51,"publisher":"Association for Computing Machinery (ACM)","issue":"FSE","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. ACM Softw. Eng."],"published-print":{"date-parts":[[2025,6,19]]},"abstract":"<jats:p>With the rapid development of Large Language Models (LLMs), their integration into automated mobile GUI testing has emerged as a promising research direction. However, existing LLM-based testing approaches face significant challenges, including time inefficiency and high costs due to constant LLM querying. To address these issues, this paper introduces LLMDroid, a novel testing framework designed to enhance existing automated mobile GUI testing tools by leveraging LLMs more efficiently. The workflow of LLMDroid comprises two main stages: Autonomous Exploration and LLM Guidance. During Autonomous Exploration, LLMDroid utilizes existing testing tools while leveraging LLMs to summarize explored pages. When code coverage growth slows, it transitions to LLM Guidance to strategically direct testing towards unexplored functionalities. This approach minimizes LLM interactions while maximizing their impact on test coverage. We applied LLMDroid to three popular open-source Android testing tools and evaluated it on 14 top-listed apps from Google Play. Results demonstrate an average increase of 26.16% in code coverage and 29.31% in activity coverage. Furthermore, our evaluation under different LLMs reveals that LLMDroid outperforms existing step-wise approaches with significant cost efficiency, achieving optimal performance at $0.49 per hour using GPT-4o among tested models, with a cost-effective alternative achieving 94% of this performance at just $0.03 per hour. 
These findings highlight LLMDroid\u2019s effectiveness in enhancing automated mobile app testing and its potential for widespread adoption.<\/jats:p>","DOI":"10.1145\/3715763","type":"journal-article","created":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T15:15:34Z","timestamp":1750346134000},"page":"1001-1022","source":"Crossref","is-referenced-by-count":8,"title":["LLMDroid: Enhancing Automated Mobile App GUI Testing Coverage with Large Language Model Guidance"],"prefix":"10.1145","volume":"2","author":[{"ORCID":"https:\/\/orcid.org\/0009-0005-8764-7230","authenticated-orcid":false,"given":"Chenxu","family":"Wang","sequence":"first","affiliation":[{"name":"Huazhong University of Science and Technology, Wuhan, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5216-933X","authenticated-orcid":false,"given":"Tianming","family":"Liu","sequence":"additional","affiliation":[{"name":"Monash University, Melbourne, Australia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8793-5367","authenticated-orcid":false,"given":"Yanjie","family":"Zhao","sequence":"additional","affiliation":[{"name":"Huazhong University of Science and Technology, Wuhan, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3772-4240","authenticated-orcid":false,"given":"Minghui","family":"Yang","sequence":"additional","affiliation":[{"name":"OPPO, Dongguan, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1100-8633","authenticated-orcid":false,"given":"Haoyu","family":"Wang","sequence":"additional","affiliation":[{"name":"Huazhong University of Science and Technology, Wuhan, China"}]}],"member":"320","published-online":{"date-parts":[[2025,6,19]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"2024. AccessibilityService | Android Developers. https:\/\/developer.android.com\/reference\/android\/accessibilityservice\/AccessibilityService"},{"key":"e_1_2_1_2_1","unstructured":"2024. 
bytedance\/Fastbot_Android: Fastbot(2.0) is a model-based testing tool for modeling GUI transitions to discover app stability problems. https:\/\/github.com\/bytedance\/Fastbot_Android"},{"key":"e_1_2_1_3_1","unstructured":"2024. Dice-S\u00f8rensen coefficient. https:\/\/en.wikipedia.org\/wiki\/Dice-S%C3%B8rensen_coefficient"},{"key":"e_1_2_1_4_1","unstructured":"2024. Fragments | Android Developers. https:\/\/developer.android.com\/guide\/fragments"},{"key":"e_1_2_1_5_1","unstructured":"2024. GPT-3.5 | OpenAI. https:\/\/platform.openai.com\/docs\/models\/gpt-3-5-turbo"},{"key":"e_1_2_1_6_1","unstructured":"2024. GPT-4o | OpenAI. https:\/\/openai.com\/index\/hello-gpt-4o\/"},{"key":"e_1_2_1_7_1","unstructured":"2024. GPT-4o mini | OpenAI. https:\/\/openai.com\/index\/gpt-4o-mini-advancing-cost-efficient-intelligence\/"},{"key":"e_1_2_1_8_1","unstructured":"2024. Java Code Coverage Library. https:\/\/github.com\/jacoco\/jacoco"},{"key":"e_1_2_1_9_1","unstructured":"2024. UI\/Application Exerciser Monkey | Android Developers. https:\/\/developer.android.com\/studio\/test\/other-testing-tools\/monkey"},{"key":"e_1_2_1_10_1","unstructured":"2025. coinse\/droidagent: DroidAgent: Intent-Driven Mobile GUI Testing with Autonomous LLM Agents. https:\/\/github.com\/coinse\/droidagent"},{"key":"e_1_2_1_11_1","unstructured":"2025. DeepSeek. https:\/\/www.deepseek.com\/"},{"key":"e_1_2_1_12_1","unstructured":"2025. Llama. https:\/\/www.llama.com\/"},{"key":"e_1_2_1_13_1","unstructured":"2025. Pricing - OpenAI API. https:\/\/platform.openai.com\/docs\/pricing"},{"key":"e_1_2_1_14_1","unstructured":"2025. testinging6\/GPTDroid. https:\/\/github.com\/testinging6\/GPTDroid"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/ms.2014.55"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/ase.2015.89"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","unstructured":"Chenhui Cui Tao Li Junjie Wang Chunyang Chen Dave Towey and Rubing Huang. 2024. 
Large Language Models for Mobile GUI Text Input Generation: An Empirical Study. arXiv preprint arXiv:2404.08948 https:\/\/doi.org\/10.48550\/arXiv.2404.08948 10.48550\/arXiv.2404.08948","DOI":"10.48550\/arXiv.2404.08948"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/3377811.3380402"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/3597503.3608137"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/icse48619.2023.00084"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/icse.2019.00042"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/3695988"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","unstructured":"Han Hu Han Wang Ruiqi Dong Xiao Chen and Chunyang Chen. 2024. Enhancing GUI Exploration Coverage of Android Apps with Deep Link-Integrated Monkey. ACM Transactions on Software Engineering and Methodology https:\/\/doi.org\/10.1145\/3664810 10.1145\/3664810","DOI":"10.1145\/3664810"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","unstructured":"Yongxiang Hu Xuan Wang Yingchuan Wang Yu Zhang Shiyu Guo Chaoyi Chen Xin Wang and Yangfan Zhou. 2024. AUITestAgent: Automatic Requirements Oriented GUI Function Testing. arXiv preprint arXiv:2407.09018 https:\/\/doi.org\/10.48550\/arXiv.2407.09018 10.48550\/arXiv.2407.09018","DOI":"10.48550\/arXiv.2407.09018"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/3597503.3623298"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","unstructured":"Bangyan Ju Jin Yang Tingting Yu Tamerlan Abdullayev Yuanyuan Wu Dingbang Wang and Yu Zhao. 2024. A Study of Using Multimodal LLMs for Non-Crash Functional Bug Detection in Android Apps. 
arXiv preprint arXiv:2407.19053 https:\/\/doi.org\/10.48550\/arXiv.2407.19053 10.48550\/arXiv.2407.19053","DOI":"10.48550\/arXiv.2407.19053"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/tr.2018.2865733"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/3597503.3623344"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/3691620.3695476"},{"key":"e_1_2_1_30_1","volume-title":"International Conference on Machine Learning. 18893\u201318912","author":"Lee Kenton","year":"2023","unstructured":"Kenton Lee, Mandar Joshi, Iulia Raluca Turc, Hexiang Hu, Fangyu Liu, Julian Martin Eisenschlos, Urvashi Khandelwal, Peter Shaw, Ming-Wei Chang, and Kristina Toutanova. 2023. Pix2struct: Screenshot parsing as pretraining for visual language understanding. In International Conference on Machine Learning. 18893\u201318912."},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2312.03003"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/icse-c.2017.8"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/ase.2019.00104"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/icse48619.2023.00119"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/3597503.3639180"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","unstructured":"Zhe Liu Cheng Li Chunyang Chen Junjie Wang Boyu Wu Yawen Wang Jun Hu and Qing Wang. 2024. Vision-driven Automated Mobile GUI Testing via Multimodal Large Language Model. 
arXiv preprint arXiv:2407.03037 https:\/\/doi.org\/10.48550\/arXiv.2407.03037 10.48550\/arXiv.2407.03037","DOI":"10.48550\/arXiv.2407.03037"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/3551349.3559505"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/3395363.3397354"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/3663529.3663806"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/3106237.3106298"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/3468264.3468620"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/1925805.1925818"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/3544548.3580895"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/3650212.3680341"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/3636534.3649379"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2304.07061"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","unstructured":"An Yan Zhengyuan Yang Wanrong Zhu Kevin Lin Linjie Li Jianfeng Wang Jianwei Yang Yiwu Zhong Julian McAuley and Jianfeng Gao. 2023. Gpt-4v in wonderland: Large multimodal models for zero-shot smartphone gui navigation. 
arXiv preprint arXiv:2311.07562 https:\/\/doi.org\/10.48550\/arXiv.2311.07562 10.48550\/arXiv.2311.07562","DOI":"10.48550\/arXiv.2311.07562"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/3377811.3380347"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2312.13771"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1109\/ase51524.2021.9678778"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICST60714.2024.00020"}],"container-title":["Proceedings of the ACM on Software Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3715763","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T15:29:31Z","timestamp":1750346971000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3715763"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,19]]},"references-count":51,"journal-issue":{"issue":"FSE","published-print":{"date-parts":[[2025,6,19]]}},"alternative-id":["10.1145\/3715763"],"URL":"https:\/\/doi.org\/10.1145\/3715763","relation":{},"ISSN":["2994-970X"],"issn-type":[{"value":"2994-970X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,6,19]]}}}