{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,29]],"date-time":"2026-01-29T21:47:08Z","timestamp":1769723228983,"version":"3.49.0"},"reference-count":50,"publisher":"Association for Computing Machinery (ACM)","issue":"FSE","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. ACM Softw. Eng."],"published-print":{"date-parts":[[2025,6,19]]},"abstract":"<jats:p>Exploratory testing (ET) harnesses tester's knowledge, creativity, and experience to create varying tests that uncover unexpected bugs from the end-user's perspective. Although ET has proven effective in system-level testing of interactive systems, the need for manual execution has hindered large-scale adoption. In this work, we explore the feasibility, challenges and road ahead of automated scenario-based ET (a.k.a soap opera testing). We conduct a formative study, identifying key insights for effective manual soap opera testing and challenges in automating the process. We then develop a multi-agent system leveraging LLMs and a Scenario Knowledge Graph (SKG) to automate soap opera testing. The system consists of three multi-modal agents, Planner, Player, and Detector that collaborate to execute tests and identify potential bugs. Experimental results demonstrate the potential of automated soap opera testing, but there remains a significant gap compared to manual execution, especially under-explored scenario boundaries and incorrectly identified bugs. Based on the observation, we envision road ahead for the future of automated soap opera testing, focusing on three key aspects: the synergy of neural and symbolic approaches, human-AI co-learning, and the integration of soap opera testing with broader software engineering practices. These insights aim to guide and inspire the future research.<\/jats:p>","DOI":"10.1145\/3715752","type":"journal-article","created":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T15:15:34Z","timestamp":1750346134000},"page":"757-778","source":"Crossref","is-referenced-by-count":1,"title":["Automated Soap Opera Testing Directed by LLMs and Scenario Knowledge: Feasibility, Challenges, and Road Ahead"],"prefix":"10.1145","volume":"2","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2410-3229","authenticated-orcid":false,"given":"Yanqi","family":"Su","sequence":"first","affiliation":[{"name":"Australian National University, Canberra, Australia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7663-1421","authenticated-orcid":false,"given":"Zhenchang","family":"Xing","sequence":"additional","affiliation":[{"name":"CSIRO's Data61, Canberra, Australia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1424-6290","authenticated-orcid":false,"given":"Chong","family":"Wang","sequence":"additional","affiliation":[{"name":"Nanyang Technological University, Singapore, Singapore"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2011-9618","authenticated-orcid":false,"given":"Chunyang","family":"Chen","sequence":"additional","affiliation":[{"name":"TU Munich, Heilbronn, Germany"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2273-1862","authenticated-orcid":false,"given":"Sherry (Xiwei)","family":"Xu","sequence":"additional","affiliation":[{"name":"CSIRO's Data61, Sydney, Australia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9466-1672","authenticated-orcid":false,"given":"Qinghua","family":"Lu","sequence":"additional","affiliation":[{"name":"CSIRO, Sydney, Australia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5839-3765","authenticated-orcid":false,"given":"Liming","family":"Zhu","sequence":"additional","affiliation":[{"name":"CSIRO's Data61, Sydney, Australia"},{"name":"UNSW, Sydney, Australia"}]}],"member":"320","published-online":{"date-parts":[[2025,6,19]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10664-014-9301-4"},{"key":"e_1_2_1_2_1","first-page":"32","article-title":"Session-based test management","volume":"2","author":"Bach Jonathan","year":"2000","unstructured":"Jonathan Bach. 2000. Session-based test management. Software Testing and Quality Engineering Magazine, 2, 6 (2000), 32\u201337.","journal-title":"Software Testing and Quality Engineering Magazine"},{"key":"e_1_2_1_3_1","unstructured":"James Bach. 2003. Exploratory testing explained. Online: http:\/\/www. satisfice. com\/articles\/et-article. pdf 1\u201310."},{"key":"e_1_2_1_4_1","unstructured":"James Bach. 2004. Exploratory testing. The testing practitioner 253\u2013265."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/3377811.3380328"},{"key":"e_1_2_1_6_1","volume-title":"Using thematic analysis in psychology. Qualitative research in psychology, 3, 2","author":"Braun Virginia","year":"2006","unstructured":"Virginia Braun and Victoria Clarke. 2006. Using thematic analysis in psychology. Qualitative research in psychology, 3, 2 (2006), 77\u2013101."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/TR.2018.2799957"},{"key":"e_1_2_1_8_1","first-page":"30","article-title":"Soap opera testing","volume":"6","author":"Buwalda Hans","year":"2004","unstructured":"Hans Buwalda. 2004. Soap opera testing. Better Software, 6, 2 (2004), 30\u201337.","journal-title":"Better Software"},{"key":"e_1_2_1_9_1","volume-title":"An introduction to scenario testing","author":"Cem Kaner JD","unstructured":"JD Cem Kaner. 2013. An introduction to scenario testing. Florida Institute of Technology, Melbourne, 1\u201313."},{"key":"e_1_2_1_10_1","unstructured":"JD Cem Kaner and James Bach. 2006. The nature of exploratory testing."},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/3213846.3213869"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE48619.2023.00084"},{"key":"e_1_2_1_13_1","unstructured":"Google. 2024. Android Debug Bridge (adb). https:\/\/developer.android.com\/studio\/command-line\/adb"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/3510454.3517063"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE-Companion52605.2021.00037"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/3510454.3522684"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10664-013-9266-8"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2012.55"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISESE.2005.1541817"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/3510003.3510203"},{"key":"e_1_2_1_21_1","volume-title":"A tutorial in exploratory testing. Tutorial presented at QUEST2008.(Available online at: http:\/\/www. kaner. com\/pdfs\/QAIExploring. pdf, accessed","author":"Kaner Cem","year":"2014","unstructured":"Cem Kaner. 2008. A tutorial in exploratory testing. Tutorial presented at QUEST2008.(Available online at: http:\/\/www. kaner. com\/pdfs\/QAIExploring. pdf, accessed: 26 Jan 2014)."},{"key":"e_1_2_1_22_1","volume-title":"Testing computer software","author":"Kaner Cem","unstructured":"Cem Kaner, Jack Falk, and Hung Q Nguyen. 1999. Testing computer software. John Wiley & Sons."},{"key":"e_1_2_1_23_1","first-page":"9459","article-title":"Retrieval-augmented generation for knowledge-intensive nlp tasks","volume":"33","author":"Lewis Patrick","year":"2020","unstructured":"Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich K\u00fcttler, Mike Lewis, Wen-tau Yih, and Tim Rockt\u00e4schel. 2020. Retrieval-augmented generation for knowledge-intensive nlp tasks. Advances in Neural Information Processing Systems, 33 (2020), 9459\u20139474.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/3377812.3390906"},{"key":"e_1_2_1_25_1","unstructured":"Rensis Likert. 1932. A technique for the measurement of attitudes.. Archives of psychology."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE48619.2023.00119"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/3597503.3639118"},{"key":"e_1_2_1_28_1","volume-title":"Adventures in session-based testing","author":"Lyndsay James","year":"2003","unstructured":"James Lyndsay and Neil Van Eeden. 2003. Adventures in session-based testing. Workroom Productions Ltd. May, 27 (2003)."},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jss.2020.110890"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE48619.2023.00205"},{"key":"e_1_2_1_31_1","volume-title":"2023 IEEE Symposium on Security and Privacy (SP). 2339\u20132356","author":"Pearce Hammond","year":"2023","unstructured":"Hammond Pearce, Benjamin Tan, Baleegh Ahmad, Ramesh Karri, and Brendan Dolan-Gavitt. 2023. Examining zero-shot vulnerability repair with large language models. In 2023 IEEE Symposium on Security and Privacy (SP). 2339\u20132356."},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/2652524.2652531"},{"key":"e_1_2_1_33_1","volume-title":"Sruti Srinivasa Ragavan, and Ben Zorn","author":"Sarkar Advait","year":"2022","unstructured":"Advait Sarkar, Andrew D Gordon, Carina Negreanu, Christian Poelitz, Sruti Srinivasa Ragavan, and Ben Zorn. 2022. What is it like to program with artificial intelligence? arXiv preprint arXiv:2208.06213."},{"key":"e_1_2_1_34_1","volume-title":"Proceedings of the 46th International Conference on Software Engineering (ICSE\u201924)","author":"Sidong Feng","year":"2024","unstructured":"Feng Sidong and Chen Chunyang. 2024. Prompting Is All Your Need: Automated Android Bug Replay with Large Language Models. In Proceedings of the 46th International Conference on Software Engineering (ICSE\u201924)."},{"key":"e_1_2_1_35_1","volume-title":"Elements of survey sampling. 15","author":"Singh Ravindra","unstructured":"Ravindra Singh and Naurang Singh Mangat. 2013. Elements of survey sampling. 15, Springer Science & Business Media."},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/3551349.3556967"},{"key":"e_1_2_1_37_1","volume-title":"SoapOperaTG: A Tool for System Knowledge Graph Based Soap Opera Test Generation. In 2023 IEEE\/ACM 45th International Conference on Software Engineering: Companion Proceedings (ICSE-Companion). 51\u201354","author":"Su Yanqi","year":"2023","unstructured":"Yanqi Su, Zheming Han, Zhenchang Xing, Xiwei Xu, Liming Zhu, and Qinghua Lu. 2023. SoapOperaTG: A Tool for System Knowledge Graph Based Soap Opera Test Generation. In 2023 IEEE\/ACM 45th International Conference on Software Engineering: Companion Proceedings (ICSE-Companion). 51\u201354."},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/3597503.3639157"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/3613904.3642777"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/3377811.3380349"},{"key":"e_1_2_1_41_1","volume-title":"Software quality and software testing in internet times","author":"V\u00e5ga Jarle","unstructured":"Jarle V\u00e5ga and St\u00e5le Amland. 2002. Managing high-speed web testing. In Software quality and software testing in internet times. Springer, 23\u201330."},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/3544548.3580895"},{"key":"e_1_2_1_43_1","volume-title":"Chi, Quoc V Le, and Denny Zhou","author":"Wei Jason","year":"2022","unstructured":"Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, and Denny Zhou. 2022. Chain-of-thought prompting elicits reasoning in large language models. Advances in neural information processing systems, 35 (2022), 24824\u201324837."},{"key":"e_1_2_1_44_1","volume-title":"Shiqi Jiang, Yunhao Liu, Yaqin Zhang, and Yunxin Liu.","author":"Wen Hao","year":"2023","unstructured":"Hao Wen, Yuanchun Li, Guohong Liu, Shanhui Zhao, Tao Yu, Toby Jia-Jun Li, Shiqi Jiang, Yunhao Liu, Yaqin Zhang, and Yunxin Liu. 2023. Empowering llm to use smartphone for intelligent task automation. arXiv e-prints, arXiv\u20132308."},{"key":"e_1_2_1_45_1","unstructured":"James A Whittaker. 2009. Exploratory software testing: tips tricks tours and techniques to guide test design. Pearson Education."},{"key":"e_1_2_1_46_1","first-page":"90","article-title":"Applying session-based testing to medical software","volume":"25","author":"Wood Bill","year":"2003","unstructured":"Bill Wood and David James. 2003. Applying session-based testing to medical software. Medical Device and Diagnostic Industry, 25, 5 (2003), 90\u2013103.","journal-title":"Medical Device and Diagnostic Industry"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/3597926.3598138"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2024.3414672"},{"key":"e_1_2_1_49_1","unstructured":"Zhizheng Zhang Xiaoyi Zhang Wenxuan Xie and Yan Lu. 2023. Responsible task automation: Empowering large language models as responsible task automators. arXiv preprint arXiv:2306.01242."},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE.2019.00030"}],"container-title":["Proceedings of the ACM on Software Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3715752","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T15:24:54Z","timestamp":1750346694000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3715752"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,19]]},"references-count":50,"journal-issue":{"issue":"FSE","published-print":{"date-parts":[[2025,6,19]]}},"alternative-id":["10.1145\/3715752"],"URL":"https:\/\/doi.org\/10.1145\/3715752","relation":{},"ISSN":["2994-970X"],"issn-type":[{"value":"2994-970X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,6,19]]}}}