{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,1]],"date-time":"2026-04-01T22:59:17Z","timestamp":1775084357135,"version":"3.50.1"},"publisher-location":"Cham","reference-count":19,"publisher":"Springer Nature Switzerland","isbn-type":[{"value":"9783032223746","type":"print"},{"value":"9783032223753","type":"electronic"}],"license":[{"start":{"date-parts":[[2026,1,1]],"date-time":"2026-01-01T00:00:00Z","timestamp":1767225600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2026,4,1]],"date-time":"2026-04-01T00:00:00Z","timestamp":1775001600000},"content-version":"vor","delay-in-days":90,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2026]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>\n                    <jats:bold>Background:<\/jats:bold>\n                    Ensuring the quality of user stories is vital to Agile Software Development. Rule-based tools like AQUSA, based on the Quality User Story (QUS) framework, offer reliable structural checks but struggle with context-sensitive or pragmatic issues. Large Language Models (LLMs) have emerged as potential alternatives, yet prior studies often rely on small datasets, older models, or lack direct comparison with rule-based baselines.\n                    <jats:bold>Objective:<\/jats:bold>\n                    This study aims to assess the effectiveness of modern LLMs relative to a rule-based tool (AQUSA) for detecting defects in user stories, considering both structural and contextual dimensions.\n                    <jats:bold>Method:<\/jats:bold>\n                    We conduct a large-scale comparative evaluation involving AQUSA and three GPT-family LLMs (GPT-5, GPT-5-mini, and GPT-4), using 182 user stories drawn from three industrial datasets. We apply both quantitative metrics (precision, recall, F1-score) and qualitative analysis of feedback clarity and defect relevance.\n                    <jats:bold>Results:<\/jats:bold>\n                    GPT-5-mini achieved the highest recall (0.81) and overall F1-score (0.62), while AQUSA attained the highest precision (0.61) with significantly fewer false positives. GPT-5 showed high hallucination rates and instability; GPT-4 was overly conservative, leading to under-detection of defects.\n                    <jats:bold>Conclusion:<\/jats:bold>\n                    Neither rule-based nor GPT-family LLM-based approaches suffice in isolation. Rule-based tools enforce structural rigor, while LLMs capture nuanced linguistic and pragmatic flaws. We advocate a hybrid \u201cDual-gate\u201d strategy\u2014using AQUSA for structural validation followed by lightweight LLMs for contextual refinement\u2014to improve the reliability and scalability of user story quality assessment in agile environments.\n                  <\/jats:p>","DOI":"10.1007\/978-3-032-22375-3_10","type":"book-chapter","created":{"date-parts":[[2026,3,31]],"date-time":"2026-03-31T04:14:18Z","timestamp":1774930458000},"page":"155-174","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Evaluating the\u00a0Quality of\u00a0User Stories: An Extended Comparative Study of\u00a0Multiple LLMs and\u00a0Rule-Based Tools"],"prefix":"10.1007","author":[{"given":"Izabella","family":"Silva","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0009-0008-5558-429X","authenticated-orcid":false,"given":"Jo\u00e3o","family":"Paiva","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9433-4962","authenticated-orcid":false,"given":"Mirko","family":"Perkusich","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5515-7812","authenticated-orcid":false,"given":"Danyllo","family":"Albuquerque","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8204-8731","authenticated-orcid":false,"given":"Emanuel","family":"Dantas","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9796-1382","authenticated-orcid":false,"given":"Kyller","family":"Gorg\u00f4nio","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7377-1258","authenticated-orcid":false,"given":"Angelo","family":"Perkusich","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2026,4,1]]},"reference":[{"key":"10_CR1","doi-asserted-by":"crossref","unstructured":"Arora, C., Grundy, J., Abdelrazek, M.: Advancing requirements engineering through generative ai: assessing the role of llms. In: Generative AI for Effective Software Development, pp. 129\u2013148. Springer, Heidelberg (2024)","DOI":"10.1007\/978-3-031-55642-5_6"},{"key":"10_CR2","doi-asserted-by":"publisher","first-page":"163","DOI":"10.1016\/j.infsof.2015.01.004","volume":"61","author":"M Brhel","year":"2015","unstructured":"Brhel, M., Meth, H., Maedche, A., Werder, K.: Exploring principles of user-centered agile software development: a literature review. Inf. Softw. Technol. 61, 163\u2013181 (2015)","journal-title":"Inf. Softw. Technol."},{"key":"10_CR3","unstructured":"Endres, M., Fakhoury, S., Chakraborty, S., Lahiri, S.K.: Formalizing natural language intent into program specifications via large language models. arXiv preprint arXiv:2310.01831 (2023)"},{"key":"10_CR4","doi-asserted-by":"publisher","DOI":"10.1016\/j.jss.2020.110851","volume":"172","author":"R Kasauli","year":"2021","unstructured":"Kasauli, R., Knauss, E., Horkoff, J., Liebel, G., de Oliveira Neto, F.G.: Requirements engineering challenges and practices in large-scale agile system development. J. Syst. Softw. 172, 110851 (2021)","journal-title":"J. Syst. Softw."},{"key":"10_CR5","unstructured":"Leffingwell, D.: Agile Software Requirements: Lean Requirements Practices for Teams, Programs, and the Enterprise. Addison-Wesley Professional, Boston (2010)"},{"key":"10_CR6","doi-asserted-by":"publisher","first-page":"383","DOI":"10.1007\/s00766-016-0250-x","volume":"21","author":"G Lucassen","year":"2016","unstructured":"Lucassen, G., Dalpiaz, F., van der Werf, J.M.E., Brinkkemper, S.: Improving agile requirements: the quality user story framework and tool. Requir. Eng. 21, 383\u2013403 (2016)","journal-title":"Requir. Eng."},{"key":"10_CR7","doi-asserted-by":"crossref","unstructured":"Ma, L., Liu, S., Li, Y., Xie, X., Bu, L.: Specgen: automated generation of formal program specifications via large language models. arXiv preprint arXiv:2401.08807 (2024)","DOI":"10.1109\/ICSE55347.2025.00129"},{"issue":"4","key":"10_CR8","doi-asserted-by":"publisher","first-page":"365","DOI":"10.1049\/iet-sen.2017.0144","volume":"12","author":"C Pacheco","year":"2018","unstructured":"Pacheco, C., Garc\u00eda, I., Reyes, M.: Requirements elicitation techniques: a systematic literature review based on the maturity of the techniques. IET Softw. 12(4), 365\u2013378 (2018)","journal-title":"IET Softw."},{"key":"10_CR9","doi-asserted-by":"crossref","unstructured":"Perkusich, M., Silva, I., Albuquerque, D., Gorg\u00f4nio, K., Perkusich, A.: Evaluating the quality of user stories: a comparative study of large language models and rule-based tool. In: 2025 International Conference on Software, Telecommunications and Computer Networks (SoftCOM), pp.\u00a01\u20136. IEEE (2025)","DOI":"10.23919\/SoftCOM66362.2025.11197422"},{"key":"10_CR10","doi-asserted-by":"crossref","unstructured":"Perkusich, M., et\u00a0al.: Intelligent software engineering in the context of agile software development: a systematic literature review. Inf. Softw. Technol. 119, 106241 (2020)","DOI":"10.1016\/j.infsof.2019.106241"},{"key":"10_CR11","unstructured":"Poudel, A., Lin, J., Cleland-Huang, J.: Leveraging transformer-based language models to automate requirements satisfaction assessment. arXiv preprint arXiv:2312.04463 (2023)"},{"key":"10_CR12","doi-asserted-by":"crossref","unstructured":"Robeer, M., Lucassen, G., van\u00a0der Werf, J.M.E., Dalpiaz, F., Brinkkemper, S.: Automated extraction of conceptual models from user stories via nlp. In: 2016 IEEE 24th International Requirements Engineering Conference (RE), pp. 196\u2013205. IEEE (2016)","DOI":"10.1109\/RE.2016.40"},{"key":"10_CR13","doi-asserted-by":"crossref","unstructured":"Ronanki, K., Cabrero-Daniel, B., Berger, C.: Chatgpt as a tool for user story quality evaluation: trustworthy out of the box? In: International Conference on Agile Software Development, pp. 173\u2013181. Springer, Heidelberg (2022)","DOI":"10.1007\/978-3-031-48550-3_17"},{"key":"10_CR14","doi-asserted-by":"crossref","unstructured":"Sharma, A., Kumar\u00a0Tripathi, A.: Evaluating user story quality with llms: a comparative study. J. Intell. Inf. Syst. 1\u201329 (2025)","DOI":"10.1007\/s10844-025-00939-3"},{"key":"10_CR15","doi-asserted-by":"crossref","unstructured":"Vogelsang, A., Fischbach, J.: Using large language models for natural language processing tasks in requirements engineering: a systematic guideline. In: Handbook on Natural Language Processing for Requirements Engineering, pp. 435\u2013456. Springer, Heidelberg (2025)","DOI":"10.1007\/978-3-031-73143-3_16"},{"key":"10_CR16","doi-asserted-by":"crossref","unstructured":"White, J., Hays, S., Fu, Q., Spencer-Smith, J., Schmidt, D.C.: Chatgpt prompt patterns for improving code quality, refactoring, requirements elicitation, and software design. In: Generative AI for Effective Software Development, pp. 71\u2013108. Springer, Heidelberg (2024)","DOI":"10.1007\/978-3-031-55642-5_4"},{"key":"10_CR17","doi-asserted-by":"crossref","unstructured":"Wohlin, C., et\u00a0al.: Experimentation in Software Engineering, vol.\u00a0236. Springer, Heidelberg (2012)","DOI":"10.1007\/978-3-642-29044-2"},{"key":"10_CR18","doi-asserted-by":"crossref","unstructured":"Yamani, A., Baslyman, M., Ahmed, M.: Leveraging llms for user stories in AI systems: Ustai dataset. arXiv preprint arXiv:2504.00513 (2025)","DOI":"10.1145\/3727582.3728689"},{"key":"10_CR19","doi-asserted-by":"crossref","unstructured":"Zhang, Z., Rayhan, M., Herda, T., Goisauf, M., Abrahamsson, P.: Llm-based agents for automating the enhancement of user story quality: an early report. In: International Conference on Agile Software Development, pp. 117\u2013126. Springer, Cham (2024)","DOI":"10.1007\/978-3-031-61154-4_8"}],"container-title":["Lecture Notes in Business Information Processing","Agile Processes in Software Engineering and Extreme Programming"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/978-3-032-22375-3_10","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,1]],"date-time":"2026-04-01T22:06:51Z","timestamp":1775081211000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/978-3-032-22375-3_10"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026]]},"ISBN":["9783032223746","9783032223753"],"references-count":19,"URL":"https:\/\/doi.org\/10.1007\/978-3-032-22375-3_10","relation":{},"ISSN":["1865-1348","1865-1356"],"issn-type":[{"value":"1865-1348","type":"print"},{"value":"1865-1356","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026]]},"assertion":[{"value":"1 April 2026","order":1,"name":"first_online","label":"First Online","group":{"name":"ChapterHistory","label":"Chapter History"}},{"value":"XP","order":1,"name":"conference_acronym","label":"Conference Acronym","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"International Conference on Agile Software Development","order":2,"name":"conference_name","label":"Conference Name","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"S\u00e3o Paulo","order":3,"name":"conference_city","label":"Conference City","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Brazil","order":4,"name":"conference_country","label":"Conference Country","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"2026","order":5,"name":"conference_year","label":"Conference Year","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"8 April 2026","order":7,"name":"conference_start_date","label":"Conference Start Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"11 April 2026","order":8,"name":"conference_end_date","label":"Conference End Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"27","order":9,"name":"conference_number","label":"Conference Number","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"xpu2026","order":10,"name":"conference_id","label":"Conference ID","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"https:\/\/agilealliance.org\/event\/xp-2026\/","order":11,"name":"conference_url","label":"Conference URL","group":{"name":"ConferenceInfo","label":"Conference Information"}}]}}