{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,24]],"date-time":"2026-03-24T19:34:47Z","timestamp":1774380887769,"version":"3.50.1"},"reference-count":83,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2025,5,17]],"date-time":"2025-05-17T00:00:00Z","timestamp":1747440000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,5,17]],"date-time":"2025-05-17T00:00:00Z","timestamp":1747440000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100011914","name":"M\u00e4lardalen University","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100011914","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Software Qual J"],"published-print":{"date-parts":[[2025,6]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Text mining techniques, particularly those leveraging machine learning for natural language processing, have gained significant attention for qualitative data analysis in software testing. However, their complexity and lack of transparency can pose challenges, especially in safety-critical domains where simpler, interpretable solutions are often preferred unless accuracy is heavily compromised. This study investigates the trade-offs between complexity, effort, accuracy, and utility in text mining and clustering techniques, focusing on their application for detecting functional dependencies among manual integration test cases in safety-critical systems. Using empirical data from an industrial testing project at ALSTOM Sweden, we evaluate various string distance methods, NCD compressors, and machine learning approaches. 
The results highlight the impact of preprocessing techniques, such as tokenization, and intrinsic factors, such as text length, on algorithm performance. Findings demonstrate how text mining and clustering can be optimized for safety-critical contexts, offering actionable insights for researchers and practitioners aiming to balance simplicity and effectiveness in their testing workflows.<\/jats:p>","DOI":"10.1007\/s11219-025-09722-7","type":"journal-article","created":{"date-parts":[[2025,5,17]],"date-time":"2025-05-17T03:23:08Z","timestamp":1747452188000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["Comparative analysis of text mining and clustering techniques for assessing functional dependency between manual test cases"],"prefix":"10.1007","volume":"33","author":[{"given":"Sahar","family":"Tahvili","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Leo","family":"Hatvani","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Michael","family":"Felderer","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Francisco\u00a0Gomes","family":"de\u00a0Oliveira\u00a0Neto","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Wasif","family":"Afzal","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Robert","family":"Feldt","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2025,5,17]]},"reference":[{"key":"9722_CR1","doi-asserted-by":"publisher","first-page":"152","DOI":"10.1016\/j.procs.2016.03.020","volume":"79","author":"A Ansari","year":"2016","unstructured":"Ansari, A., Khan, A., Khan, A., & Mukadam, K. (2016). 
Optimized regression test using test case prioritization. Procedia Computer Science, 79, 152\u2013160.","journal-title":"Procedia Computer Science"},{"key":"9722_CR2","doi-asserted-by":"crossref","unstructured":"Arlt, S., Morciniec, T., Podelski, A., & Wagner, S. (2015). If a fails, can b still succeed? Inferring dependencies between test results in automotive system testing. In The IEEE 8th international conference on software testing, verification and validation","DOI":"10.1109\/ICST.2015.7102593"},{"key":"9722_CR3","doi-asserted-by":"crossref","unstructured":"Beniwal, R., Jain, M., & Gupta, Y. (2021). Opinion mining to aid user acceptance testing for open beta versions. In P. Bansal, M. Tushir, V. E. Balas, & R. Srivastava (Eds.), Proceedings of international conference on artificial intelligence and applications. Singapore: Springer","DOI":"10.1007\/978-981-15-4992-2_28"},{"issue":"2","key":"9722_CR4","first-page":"290","volume":"26","author":"M Bergener","year":"1976","unstructured":"Bergener, M., Escher, H., & Linden, K. (1976). multidimensional diagnostics in pharmacopsychiatry-results of therapy with desmethyl-loxapine (author\u2019s transl). Arzneimittel-Forschung, 26(2), 290\u2013299.","journal-title":"Arzneimittel-Forschung"},{"key":"9722_CR5","unstructured":"Black, P. E. (2004). Ratcliff\/obershelp pattern recognition. Dictionary of algorithms and data structures [online]"},{"issue":"10","key":"9722_CR6","doi-asserted-by":"publisher","first-page":"10008","DOI":"10.1088\/1742-5468\/2008\/10\/P10008","volume":"2008","author":"VD Blondel","year":"2008","unstructured":"Blondel, V. D., Guillaume, J.-L., Lambiotte, R., & Lefebvre, E. (2008). Fast unfolding of communities in large networks. 
Journal of Statistical Mechanics: Theory and Experiment, 2008(10), 10008.","journal-title":"Journal of Statistical Mechanics: Theory and Experiment"},{"issue":"2","key":"9722_CR7","doi-asserted-by":"publisher","first-page":"189","DOI":"10.1214\/12-STS406","volume":"28","author":"MGB Blum","year":"2013","unstructured":"Blum, M. G. B., Nunes, M. A., Prangle, D., & Sisson, S. A. (2013). A comparative review of dimension reduction methods in approximate Bayesian computation. Statistical Science, 28(2), 189\u2013208.","journal-title":"Statistical Science"},{"key":"9722_CR8","doi-asserted-by":"crossref","unstructured":"Brusco, M., Steinley, D., Stevens, J., & Cradit, D. (2017). Affinity propagation: An exemplar-based tool for clustering in psychological research. British Journal of Mathematical and Statistical Psychology,72.","DOI":"10.1111\/bmsp.12136"},{"issue":"5","key":"9722_CR9","doi-asserted-by":"publisher","first-page":"1843","DOI":"10.1007\/s10664-015-9402-8","volume":"21","author":"T-H Chen","year":"2016","unstructured":"Chen, T.-H., Thomas, S. W., & Hassan, A. E. (2016). A survey on the use of topic models when mining software repositories. Empirical Software Engineering, 21(5), 1843\u20131919.","journal-title":"Empirical Software Engineering"},{"issue":"4","key":"9722_CR10","doi-asserted-by":"publisher","first-page":"1523","DOI":"10.1109\/TIT.2005.844059","volume":"51","author":"R Cilibrasi","year":"2005","unstructured":"Cilibrasi, R., & Vit\u00e1nyi, P. M. (2005). Clustering by compression. IEEE Transactions on Information theory, 51(4), 1523\u20131545.","journal-title":"IEEE Transactions on Information theory"},{"key":"9722_CR11","unstructured":"Cohen, W. W., Ravikumar, P., & Fienberg, S. E. (2003). A comparison of string distance metrics for name-matching tasks. In Proceedings of the 2003 international conference on information integration on the web. IIWEB\u201903, pp. 73\u201378. 
AAAI Press, Acapulco, Mexico"},{"key":"9722_CR12","doi-asserted-by":"crossref","unstructured":"Cohen-addad, V., Kanade, V., Mallmann-trenn, F., & Mathieu, C. (2019). Hierarchical clustering: Objective functions and algorithms. Journal of the ACM,66(4).","DOI":"10.1145\/3321386"},{"key":"9722_CR13","doi-asserted-by":"crossref","unstructured":"Dalirsefat, S. B., Silva Meyer, A., & Mirhoseini, S. Z. (2009). Comparison of similarity coefficients used for cluster analysis with amplified fragment length polymorphism markers in the silkworm, Bombyx mori. Journal of Insect Science,9(1).","DOI":"10.1673\/031.009.7101"},{"key":"9722_CR14","first-page":"167","volume":"3","author":"S Dang","year":"2015","unstructured":"Dang, S. (2015). Performance evaluation of clustering algorithm using different datasets. IJARCSMS, 3, 167\u2013173.","journal-title":"IJARCSMS"},{"key":"9722_CR15","doi-asserted-by":"crossref","unstructured":"Deutsch, P. (1996). RFC1951: Deflate compressed data format specification version 1.3. RFC Editor, USA","DOI":"10.17487\/rfc1951"},{"key":"9722_CR16","unstructured":"Devlin, J., Chang, M. -W., Lee, K., & Toutanova, K. (2019). Bert: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: Human language technologies, volume 1 (Long and Short Papers) (pp. 4171\u20134186)."},{"issue":"3","key":"9722_CR17","doi-asserted-by":"publisher","first-page":"247","DOI":"10.1177\/1473871620909485","volume":"19","author":"M Espadoto","year":"2020","unstructured":"Espadoto, M., Hirata, N. S. T., & Telea, A. C. (2020). Deep learning multidimensional projections. Information Visualization, 19(3), 247\u2013269.","journal-title":"Information Visualization"},{"key":"9722_CR18","unstructured":"Ester, M., Kriegel, H. -P., Sander, J., & Xu, X. (1996). 
A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the second international conference on knowledge discovery and data mining. KDD\u201996, pp. 226\u2013231. Portland, Oregon: AAAI Press."},{"key":"9722_CR19","doi-asserted-by":"crossref","unstructured":"Felderer, M., Enoiu, E. P., & Tahvili, S. (2023). Artificial intelligence techniques in system testing. In J. R.\u00a0Romero, & F. C. Inmaculada\u00a0Medina-Bulo (Eds.), Optimising the software development process with artificial intelligence. Singapore: Springer","DOI":"10.1007\/978-981-19-9948-2_8"},{"key":"9722_CR20","doi-asserted-by":"crossref","unstructured":"Feldt, R., Poulding, S., Clark, D., & Yoo, S. (2016a). Test set diameter: Quantifying the diversity of sets of test cases. In 2016 IEEE International Conference on Software Testing, Verification and Validation (ICST) (pp. 223\u2013233)","DOI":"10.1109\/ICST.2016.33"},{"key":"9722_CR21","doi-asserted-by":"crossref","unstructured":"Feldt, R., Poulding, S., Clark, D., & Yoo, S. (2016b). Test set diameter: Quantifying the diversity of sets of test cases. In 2016 IEEE International conference on software testing, verification and validation","DOI":"10.1109\/ICST.2016.33"},{"key":"9722_CR22","doi-asserted-by":"crossref","unstructured":"Feldt, R., Torkar, R., Gorschek, T., & Afzal, W. (2008a). Searching for cognitively diverse tests: Towards universal test diversity metrics. In 2008 IEEE international conference on software testing verification and validation workshop (pp. 178\u2013186). IEEE","DOI":"10.1109\/ICSTW.2008.36"},{"key":"9722_CR23","doi-asserted-by":"crossref","unstructured":"Feldt, R., Torkar, R., Gorschek, T., & Afzal, W. (2008b). Searching for cognitively diverse tests: Towards universal test diversity metrics. In 2008 IEEE international conference on software testing verification and validation workshop (pp. 
178\u2013186).","DOI":"10.1109\/ICSTW.2008.36"},{"key":"9722_CR24","doi-asserted-by":"crossref","unstructured":"Fischbach, J., Vogelsang, A., Spies, D., Wehrle, A., Junker, M., & Freudenstein, D. (2020). SPECMATE: Automated creation of test cases from acceptance criteria. In 2020 IEEE 13th International conference on software testing, validation and verification","DOI":"10.1109\/ICST46399.2020.00040"},{"issue":"5814","key":"9722_CR25","doi-asserted-by":"publisher","first-page":"972","DOI":"10.1126\/science.1136800","volume":"315","author":"BJ Frey","year":"2007","unstructured":"Frey, B. J., & Dueck, D. (2007). Clustering by passing messages between data points. Science, 315(5814), 972\u20136.","journal-title":"Science"},{"issue":"7","key":"9722_CR26","doi-asserted-by":"publisher","first-page":"1109","DOI":"10.1016\/S0031-3203(96)00140-9","volume":"30","author":"H Frigui","year":"1997","unstructured":"Frigui, H., & Krishnapuram, R. (1997). Clustering by competitive agglomeration. Pattern Recognition, 30(7), 1109\u20131119.","journal-title":"Pattern Recognition"},{"key":"9722_CR27","doi-asserted-by":"publisher","DOI":"10.1016\/j.infsof.2020.106321","volume":"126","author":"V Garousi","year":"2020","unstructured":"Garousi, V., Bauer, S., & Felderer, M. (2020). Nlp-assisted software testing: A systematic mapping of the literature. Information and Software Technology, 126, Article 106321.","journal-title":"Information and Software Technology"},{"key":"9722_CR28","volume-title":"Objects, models, components, patterns","author":"M Greiler","year":"2012","unstructured":"Greiler, M., Deursen, A., & Zaidman, A. (2012). Measuring test case similarity to support test suite understanding. In C. A. Furia & S. Nanz (Eds.), Objects, models, components, patterns. Berlin, Heidelberg: Springer."},{"key":"9722_CR29","unstructured":"Guan, J., Zhu, F., & Bian, F. (2006). Scalable and visualization-oriented clustering for exploratory spatial analysis (vol. 
2)."},{"key":"9722_CR30","doi-asserted-by":"crossref","unstructured":"Guillot, G., & Rousset, F. (2011). Dismantling mantel tests. Methods in Ecology and Evolution,4.","DOI":"10.1111\/2041-210x.12018"},{"key":"9722_CR31","doi-asserted-by":"crossref","unstructured":"Hatvani, L., & Tahvili, S. (2024). Comparative analysis of text mining and clustering techniques for assessing functional dependency between manual test cases. GitHub","DOI":"10.21203\/rs.3.rs-4014160\/v1"},{"key":"9722_CR32","unstructured":"Ilenic, N. (2017). A PyTorch implementation of Paragraph Vectors (doc2vec). GitHub"},{"issue":"3","key":"9722_CR33","doi-asserted-by":"publisher","first-page":"497","DOI":"10.1145\/990308.990313","volume":"51","author":"R Kannan","year":"2004","unstructured":"Kannan, R., Vempala, S., & Vetta, A. (2004). On clusterings: Good, bad and spectral. Journal of the ACM, 51(3), 497\u2013515.","journal-title":"Journal of the ACM"},{"issue":"2","key":"9722_CR34","doi-asserted-by":"publisher","first-page":"100","DOI":"10.1109\/MSP.2010.940003","volume":"28","author":"S Kaski","year":"2011","unstructured":"Kaski, S., & Peltonen, J. (2011). Dimensionality reduction for data visualization [applications corner]. IEEE Signal Processing Magazine, 28(2), 100\u2013104.","journal-title":"IEEE Signal Processing Magazine"},{"key":"9722_CR35","doi-asserted-by":"crossref","unstructured":"Kempe, D., & McSherry, F. (2004). A decentralized algorithm for spectral analysis. In 36th Annual ACM symposium on theory of computing (pp. 561\u2013568)","DOI":"10.1145\/1007352.1007438"},{"key":"9722_CR36","first-page":"111","volume":"1","author":"S Kotsiantis","year":"2006","unstructured":"Kotsiantis, S., Kanellopoulos, D., & Pintelas, P. (2006). Data preprocessing for supervised learning. 
International Journal of Computer Science, 1, 111\u2013117.","journal-title":"International Journal of Computer Science"},{"key":"9722_CR37","unstructured":"Landin, C., Hatvani, L., Tahvili, S., Haggren, H., L\u00e4ngkvist, M., Loutfi, A., & H\u00e5kansson, A. (2020a). Performance comparison of two deep learning algorithms in detecting similarities between manual integration test cases. In The fifteenth international conference on software engineering advances"},{"key":"9722_CR38","doi-asserted-by":"crossref","unstructured":"Landin, C., Tahvili, S., Haggren, H., Muhammad, A., L\u00e4ngkvist, M., & Loutfi, A. (2020b). Cluster-based parallel testing using semantic analysis. In The second IEEE international conference on artificial intelligence testing","DOI":"10.1109\/AITEST49225.2020.00022"},{"key":"9722_CR39","unstructured":"Le, Q., & Mikolov, T. (2014). Distributed representations of sentences and documents. In Proc. of the 31st int. conf. on machine learning"},{"issue":"1","key":"9722_CR40","doi-asserted-by":"publisher","first-page":"65","DOI":"10.1007\/s10515-011-0093-0","volume":"19","author":"Y Ledru","year":"2012","unstructured":"Ledru, Y., Petrenko, A., Boroday, S., & Mandran, N. (2012). Prioritizing test cases with string distances. Automated Software Engineering, 19(1), 65\u201395.","journal-title":"Automated Software Engineering"},{"key":"9722_CR41","doi-asserted-by":"crossref","unstructured":"Lin, J., Jabbarvand, R., & Malek, S. (2019). Test transfer across mobile apps through semantic mapping. In 2019 34th IEEE\/ACM International conference on Automated Software Engineering (ASE).","DOI":"10.1109\/ASE.2019.00015"},{"key":"9722_CR42","first-page":"1","volume":"161","author":"A L\u00f6nnfalt","year":"2024","unstructured":"L\u00f6nnfalt, A., Tu, V., Gay, G., Singh, A., & Tahvili, S. (2024). An intelligent test management system for optimizing decision making during software testing. 
Journal of Systems and Software, 161, 1\u201310.","journal-title":"Journal of Systems and Software"},{"key":"9722_CR43","unstructured":"Maaten, L., Postma, E. O., & Herik, J. (2008).  Dimensionality reduction: A comparative review."},{"key":"9722_CR44","unstructured":"Mahoney, M. (2023). Large text compression benchmark. Webpage. http:\/\/www.mattmahoney.net\/dc\/text.html"},{"key":"9722_CR45","doi-asserted-by":"publisher","first-page":"423","DOI":"10.1016\/j.jss.2018.07.014","volume":"144","author":"M Makki","year":"2018","unstructured":"Makki, M., Van Landuyt, D., Lagaisse, B., & Joosen, W. (2018). A comparative study of workflow customization strategies: Quality implications for multi-tenant saas. Journal of Systems and Software, 144, 423\u2013438.","journal-title":"Journal of Systems and Software"},{"key":"9722_CR46","doi-asserted-by":"crossref","unstructured":"Malik, M. I., Sindhu, M. A., Khattak, A. S., Abbasi, R. A., & Saleem, K. (2020). Automating test oracles from restricted natural language agile requirements. Expert Systems, n\/a(n\/a), 12608","DOI":"10.1111\/exsy.12608"},{"key":"9722_CR47","doi-asserted-by":"publisher","first-page":"355","DOI":"10.1007\/978-981-15-0630-7_35","volume-title":"ICT analysis and applications","author":"SK Mann","year":"2020","unstructured":"Mann, S. K., & Chawla, S. (2020). Clustering based algorithmic design for cab recommender system (crs). In S. Fong, N. Dey, & A. Joshi (Eds.), ICT analysis and applications (pp. 355\u2013363). Singapore: Springer."},{"key":"9722_CR48","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9780511809071","volume-title":"Introduction to information retrieval","author":"CD Manning","year":"2008","unstructured":"Manning, C. D., Sch\u00fctze, H., & Raghavan, P. (2008). Introduction to information retrieval. Cambridge: Cambridge University Press."},{"key":"9722_CR49","doi-asserted-by":"crossref","unstructured":"McInnes, L., & Healy, J. (2017). Accelerated hierarchical density based clustering. 
In 2017 IEEE International Conference on Data Mining Workshops (ICDMW) (pp. 33\u201342).","DOI":"10.1109\/ICDMW.2017.12"},{"key":"9722_CR50","doi-asserted-by":"crossref","unstructured":"Miranda, B., Bertolino, A., & Sabetta, A. (2018). Fast approaches to scalable similarity-based test case prioritization. Proceedings of the ACM\/SIGSOFT International Symposium on Software Testing and Analysis (ISSTA)","DOI":"10.1145\/3180155.3180210"},{"key":"9722_CR51","doi-asserted-by":"crossref","unstructured":"Miranda, B., Cruciani, E., Verdecchia, R., & Bertolino, A. (2018). FAST approaches to scalable similarity-based test case prioritization. In Proceedings of the 40th international conference on software engineering. ICSE \u201918, pp. 222\u2013232. New York, NY, USA: ACM","DOI":"10.1145\/3180155.3180210"},{"key":"9722_CR52","unstructured":"Mohan, V. (2015). Preprocessing techniques for text mining - an overview."},{"key":"9722_CR53","doi-asserted-by":"crossref","unstructured":"Navarro, G., Sutinen, E., & Tarhio, J. (2005). Indexing text with approximate q-grams. Journal of Discrete Algorithms,3(2), 157\u2013175. Combinatorial Pattern Matching (CPM) Special Issue","DOI":"10.1016\/j.jda.2004.08.003"},{"key":"9722_CR54","doi-asserted-by":"crossref","unstructured":"Neto, F. G. D. O., Feldt, R., Erlenhov, L., & Nunes, J. B. D. S. (2018). Visualizing test diversity to support test optimisation. In 2018 25th Asia-Pacific software engineering conference (pp. 149\u2013158). IEEE","DOI":"10.1109\/APSEC.2018.00029"},{"key":"9722_CR55","doi-asserted-by":"crossref","unstructured":"Noor, T. B., & Hemmati, H. (2015). A similarity-based approach for test case prioritization using historical failure data. In 2015 IEEE 26th international symposium on software reliability engineering (pp. 
58\u201368).","DOI":"10.1109\/ISSRE.2015.7381799"},{"key":"9722_CR56","doi-asserted-by":"publisher","first-page":"124","DOI":"10.1016\/j.infsof.2016.08.008","volume":"80","author":"FG Oliveira Neto","year":"2016","unstructured":"Oliveira Neto, F. G., Torkar, R., & Machado, P. D. L. (2016). Full modification coverage through automatic similarity-based test case selection. Information and Software Technology, 80, 124\u2013137.","journal-title":"Information and Software Technology"},{"key":"9722_CR57","doi-asserted-by":"crossref","unstructured":"Oliveira\u00a0Neto, F. G., Ahmad, A., Leifler, O., Sandahl, K., & Enoiu, E. (2018). Improving continuous integration with similarity-based test case selection. In Proceedings of the 13th international workshop on automation of software test (pp. 39\u201345). New York, NY, USA: ACM","DOI":"10.1145\/3194733.3194744"},{"key":"9722_CR58","doi-asserted-by":"crossref","unstructured":"Prescott, J., Pennell, M., Best, T., Swanson, M., Haq, F., Jackson, R., & Gurcan, M. (2009). An automated method to segment the femur for osteoarthritis research. Conference proceedings: ... annual international conference of the IEEE engineering in medicine and biology society. IEEE Engineering in Medicine and Biology Society. Conference 2009, pp. 6364\u20137.","DOI":"10.1109\/IEMBS.2009.5333257"},{"key":"9722_CR59","doi-asserted-by":"crossref","unstructured":"Reimers, N., & Gurevych, I. (2019). Sentence-bert: Sentence embeddings using siamese bert-networks. In Proceedings of the 2019 conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) (pp. 3973\u20133983)","DOI":"10.18653\/v1\/D19-1410"},{"key":"9722_CR60","doi-asserted-by":"crossref","unstructured":"Roy, C. K., Zibran, M. F., & Koschke, R. (2014). The vision of software clone management: Past, present, and future. 
In 2014 Software evolution week - IEEE conference on software maintenance, reengineering, and reverse engineering","DOI":"10.1109\/CSMR-WCRE.2014.6747168"},{"issue":"2","key":"9722_CR61","doi-asserted-by":"publisher","first-page":"131","DOI":"10.1007\/s10664-008-9102-8","volume":"14","author":"P Runeson","year":"2008","unstructured":"Runeson, P., & H\u00f6st, M. (2008). Guidelines for conducting and reporting case study research in software engineering. Empirical Software Engineering, 14(2), 131.","journal-title":"Empirical Software Engineering"},{"key":"9722_CR62","doi-asserted-by":"crossref","unstructured":"Sander, J. (2010). In C. Sammut, & G. I. Webb (Eds.), Density-based clustering (pp. 270\u2013273). Boston, MA: Springer","DOI":"10.1007\/978-0-387-30164-8_211"},{"issue":"1","key":"9722_CR63","doi-asserted-by":"publisher","first-page":"19","DOI":"10.1109\/TR.2015.2434953","volume":"65","author":"Q Shi","year":"2016","unstructured":"Shi, Q., Chen, Z., Fang, C., Feng, Y., & Xu, B. (2016). Measuring the diversity of a test set with distance entropy. IEEE Transactions on Reliability, 65(1), 19\u201327.","journal-title":"IEEE Transactions on Reliability"},{"key":"9722_CR64","doi-asserted-by":"crossref","unstructured":"Sidorov, G., Velasquez, F., Stamatatos, E., Gelbukh, A., & Chanona-Hern\u00e1ndez, L. (2013). Syntactic dependency-based n-grams as classification features. In Advances in computational intelligence (pp. 1\u201311). Springer, Berlin Heidelberg","DOI":"10.1007\/978-3-642-37798-3_1"},{"issue":"4","key":"9722_CR65","first-page":"35","volume":"24","author":"A Singhal","year":"2001","unstructured":"Singhal, A. (2001). Modern information retrieval: A brief overview. IEEE Data Engineering Bulletin, 24(4), 35\u201343.","journal-title":"IEEE Data Engineering Bulletin"},{"key":"9722_CR66","doi-asserted-by":"crossref","unstructured":"Sutar, S., Kumar, R., Pai, S., & BR, S. (2020). Regression test cases selection using natural language processing. 
In 2020 International conference on intelligent engineering and management","DOI":"10.1109\/ICIEM48762.2020.9160225"},{"key":"9722_CR67","unstructured":"Tahvili, S. (2018). Multi-criteria optimization of system integration testing. PhD thesis, M\u00e4lardalen University"},{"key":"9722_CR68","doi-asserted-by":"crossref","unstructured":"Tahvili, S., Ahlberg, M., Fornander, E., Afzal, W., Saadatmand, M., Bohlin, M., & Sarabi, M. (2018a). Functional dependency detection for integration test cases. In Companion of the 18th IEEE int. conf. on software quality, reliability, and security.","DOI":"10.1109\/QRS-C.2018.00047"},{"key":"9722_CR69","doi-asserted-by":"crossref","unstructured":"Tahvili, S., Bohlin, M., Saadatmand, M., Larsson, S., Afzal, W., & Sundmark, D. (2016a). Cost-benefit analysis of using dependency knowledge at integration testing. In The 17th int. conf. on product-focused software process improvement","DOI":"10.1007\/978-3-319-49094-6_17"},{"key":"9722_CR70","doi-asserted-by":"crossref","unstructured":"Tahvili, S., Hatvani, L., Felderer, M., Afzal, W., & Bohlin, M. (2019a). Automated functional dependency detection between test cases using doc2vec and clustering. In The first IEEE international conference on artificial intelligence testing","DOI":"10.1109\/AITest.2019.00-13"},{"key":"9722_CR71","doi-asserted-by":"crossref","unstructured":"Tahvili, S., Hatvani, L., Felderer, M., Afzal, W., Saadatmand, M., & Bohlin, M. (2018b). Cluster-based test scheduling strategies using semantic relationships between test specifications. In 5th Int. workshop on requirements engineering and testing.","DOI":"10.1145\/3195538.3195540"},{"key":"9722_CR72","doi-asserted-by":"crossref","unstructured":"Tahvili, S., Saadatmand, M., Larsson, S., Afzal, W., Bohlin, M., & Sundmark, D. (2016b). Dynamic integration test selection based on test case dependencies. 
In The 11th workshop on testing: Academia-industry collaboration, practice and research techniques","DOI":"10.1109\/ICSTW.2016.14"},{"key":"9722_CR73","volume-title":"Artificial intelligence methods for optimization of the software testing process with practical examples and exercises","author":"S Tahvili","year":"2022","unstructured":"Tahvili, S., & Hatvani, L. (2022). Artificial intelligence methods for optimization of the software testing process with practical examples and exercises. Amsterdam: Elsevier."},{"key":"9722_CR74","doi-asserted-by":"publisher","first-page":"103878","DOI":"10.1016\/j.engappai.2020.103878","volume":"95","author":"S Tahvili","year":"2020","unstructured":"Tahvili, S., Hatvani, L., Ramentol, E., Pimentel, R., Afzal, W., & Herrera, F. (2020). A novel methodology to classify test cases using natural language processing and imbalanced learning. Engineering Applications of Artificial Intelligence, 95, 103878.","journal-title":"Engineering Applications of Artificial Intelligence"},{"key":"9722_CR75","doi-asserted-by":"crossref","unstructured":"Tahvili, S., Pimentel, R., Afzal, W., Ahlberg, M., Fornander, E., & Bohlin, M. (2019b). sortes: A supportive tool for stochastic scheduling of manual integration test cases. Journal of IEEE Access, 6, 1\u201319.","DOI":"10.1109\/ACCESS.2019.2893209"},{"issue":"1","key":"9722_CR76","doi-asserted-by":"publisher","first-page":"182","DOI":"10.1007\/s10664-012-9219-7","volume":"19","author":"S Thomas","year":"2014","unstructured":"Thomas, S., Hemmati, H., Hassan, A., & Blostein, D. (2014). Static test case prioritization using topic models. Empirical Software Engineering, 19(1), 182\u2013212.","journal-title":"Empirical Software Engineering"},{"key":"9722_CR77","unstructured":"Visalakshi, R., Ponnusamy, R., & Manikandan, K. (2016). Literature survey of data mining clustering algorithms (vol. 1, pp. 
310\u2013313)."},{"key":"9722_CR78","doi-asserted-by":"publisher","first-page":"2146","DOI":"10.1109\/TASLP.2020.3008390","volume":"28","author":"B Wang","year":"2020","unstructured":"Wang, B., & Kuo, C.-C.J. (2020). Sbert-wk: A sentence embedding method by dissecting bert-based word models. IEEE\/ACM Transactions on Audio, Speech, and Language Processing, 28, 2146\u20132157.","journal-title":"IEEE\/ACM Transactions on Audio, Speech, and Language Processing"},{"issue":"4","key":"9722_CR79","doi-asserted-by":"publisher","first-page":"245","DOI":"10.1080\/19312458.2017.1387238","volume":"11","author":"K Welbers","year":"2017","unstructured":"Welbers, K., Atteveldt, W. V., & Benoit, K. (2017). Text analysis in r. Communication Methods and Measures, 11(4), 245\u2013265.","journal-title":"Communication Methods and Measures"},{"key":"9722_CR80","doi-asserted-by":"crossref","unstructured":"Xu, D., & Tian, Y. (2015). A comprehensive survey of clustering algorithms. Annals of Data Science,2.","DOI":"10.1007\/s40745-015-0040-1"},{"issue":"3","key":"9722_CR81","doi-asserted-by":"publisher","first-page":"645","DOI":"10.1109\/TNN.2005.845141","volume":"16","author":"R Xu","year":"2005","unstructured":"Xu, R., & Wunsch, D. (2005). Survey of clustering algorithms. IEEE Transactions on Neural Networks, 16(3), 645\u2013678.","journal-title":"IEEE Transactions on Neural Networks"},{"issue":"3","key":"9722_CR82","doi-asserted-by":"publisher","first-page":"171","DOI":"10.1016\/j.artmed.2014.12.007","volume":"63","author":"J Ye","year":"2015","unstructured":"Ye, J. (2015). Improved cosine similarity measures of simplified neutrosophic sets for medical diagnoses. 
Artificial Intelligence in Medicine, 63(3), 171\u2013179.","journal-title":"Artificial Intelligence in Medicine"},{"key":"9722_CR83","doi-asserted-by":"publisher","first-page":"178","DOI":"10.1016\/S1076-6332(03)00671-8","volume":"11","author":"K Zou","year":"2004","unstructured":"Zou, K., Warfield, S., Bharatha, A., Tempany, C., Kaus, M., Haker, S., Wells, W., Jolesz, F., & Kikinis, R. (2004). Statistical validation of image segmentation quality based on a spatial overlap index. Academic Radiology, 11, 178\u201389.","journal-title":"Academic Radiology"}],"container-title":["Software Quality Journal"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11219-025-09722-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11219-025-09722-7\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11219-025-09722-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,9]],"date-time":"2025-06-09T04:34:00Z","timestamp":1749443640000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11219-025-09722-7"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,5,17]]},"references-count":83,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2025,6]]}},"alternative-id":["9722"],"URL":"https:\/\/doi.org\/10.1007\/s11219-025-09722-7","relation":{"has-preprint":[{"id-type":"doi","id":"10.21203\/rs.3.rs-4014160\/v1","asserted-by":"object"}]},"ISSN":["0963-9314","1573-1367"],"issn-type":[{"value":"0963-9314","type":"print"},{"value":"1573-1367","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,5,17]]},"assertion":[{"value":"23 April 
2025","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"17 May 2025","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no competing interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing Interests"}}],"article-number":"24"}}