{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,1]],"date-time":"2026-05-01T12:03:01Z","timestamp":1777636981668,"version":"3.51.4"},"reference-count":85,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2022,1,18]],"date-time":"2022-01-18T00:00:00Z","timestamp":1642464000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,1,18]],"date-time":"2022-01-18T00:00:00Z","timestamp":1642464000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001659","name":"Deutsche Forschungsgemeinschaft","doi-asserted-by":"publisher","award":["402774445"],"award-info":[{"award-number":["402774445"]}],"id":[{"id":"10.13039\/501100001659","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100018933","name":"Technische Universit\u00e4t Clausthal","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100018933","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Empir Software Eng"],"published-print":{"date-parts":[[2022,3]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec>\n                <jats:title>Context<\/jats:title>\n                <jats:p>The SZZ algorithm is the de facto standard for labeling bug fixing commits and finding inducing changes for defect prediction data. Recent research uncovered potential problems in different parts of the SZZ algorithm. 
Most defect prediction data sets provide only static code metrics as features, while research indicates that other features are also important.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Objective<\/jats:title>\n                <jats:p>We provide an empirical analysis of the defect labels created with the SZZ algorithm and the impact of commonly used features on results.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Method<\/jats:title>\n                <jats:p>We used a combination of manual validation and adopted or improved heuristics for the collection of defect data. We conducted an empirical study on 398 releases of 38 Apache projects.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Results<\/jats:title>\n                <jats:p>We found that only half of the bug fixing commits determined by SZZ are actually bug fixing. If a six-month time frame is used in combination with SZZ to determine which bugs affect a release, one file is incorrectly labeled as defective for every file that is correctly labeled as defective. In addition, two defective files are missed. We also explored the impact of the relatively small set of features that are available in most defect prediction data sets, as there are multiple publications that indicate that, e.g., churn related features are important for defect prediction. We found that the difference of using more features is not significant.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Conclusion<\/jats:title>\n                <jats:p>Problems with inaccurate defect labels are a severe threat to the validity of the state of the art of defect prediction. 
Small feature sets seem to be a less severe threat.<\/jats:p>\n              <\/jats:sec>","DOI":"10.1007\/s10664-021-10092-4","type":"journal-article","created":{"date-parts":[[2022,1,18]],"date-time":"2022-01-18T09:17:50Z","timestamp":1642497470000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":48,"title":["Problems with SZZ and features: An empirical study of the state of practice of defect prediction data collection"],"prefix":"10.1007","volume":"27","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9765-2803","authenticated-orcid":false,"given":"Steffen","family":"Herbold","sequence":"first","affiliation":[]},{"given":"Alexander","family":"Trautsch","sequence":"additional","affiliation":[]},{"given":"Fabian","family":"Trautsch","sequence":"additional","affiliation":[]},{"given":"Benjamin","family":"Ledel","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2022,1,18]]},"reference":[{"key":"10092_CR1","doi-asserted-by":"crossref","unstructured":"Altinger H, Siegl S, Dajsuren Y, Wotawa F (2015) A novel industry grade dataset for fault prediction based on model-driven developed automotive embedded software. In: Proceedings of the 12th Working Conference on Mining Software Repositories, IEEE Press, Piscataway, NJ, USA, MSR \u201915, pp 494\u2013497. http:\/\/dl.acm.org\/citation.cfm?id=2820518.2820596","DOI":"10.1109\/MSR.2015.72"},{"key":"10092_CR2","doi-asserted-by":"publisher","unstructured":"Antoniol G, Ayari K, Di Penta M, Khomh F, Gu\u00e9h\u00e9neuc YG (2008) Is it a bug or an enhancement? a text-based approach to classify change requests. 
In: Proceedings of the 2008 Conference of the Center for Advanced Studies on Collaborative Research: Meeting of Minds, Association for Computing Machinery, New York, NY, USA, CASCON \u201908 https:\/\/doi.org\/10.1145\/1463788.1463819.","DOI":"10.1145\/1463788.1463819"},{"key":"10092_CR3","doi-asserted-by":"publisher","unstructured":"Bird C, Bachmann A, Aune E, Duffy J, Bernstein A, Filkov V, Devanbu P (2009a) Fair and balanced?: Bias in bug-fix datasets. In: Proceedings of the 7th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on The Foundations of Software Engineering, ACM, New York, NY, USA, ESEC\/FSE \u201909, pp 121\u2013130 https:\/\/doi.org\/10.1145\/1595696.1595716.","DOI":"10.1145\/1595696.1595716"},{"key":"10092_CR4","doi-asserted-by":"publisher","unstructured":"Bird C, Rigby PC, Barr ET, Hamilton DJ, German DM, Devanbu P (2009b) The promises and perils of mining git. In: 2009 6th IEEE International Working Conference on Mining Software Repositories, pp 1\u201310 https:\/\/doi.org\/10.1109\/MSR.2009.5069475","DOI":"10.1109\/MSR.2009.5069475"},{"key":"10092_CR5","doi-asserted-by":"publisher","unstructured":"Bird C, Bachmann A, Rahman F, Bernstein A (2010) Linkster: Enabling efficient manual inspection and annotation of mined data. In: Proceedings of the Eighteenth ACM SIGSOFT International Symposium on Foundations of Software Engineering, ACM, New York, NY, USA, FSE \u201910, pp 369\u2013370 https:\/\/doi.org\/10.1145\/1882291.1882352.","DOI":"10.1145\/1882291.1882352"},{"key":"10092_CR6","doi-asserted-by":"publisher","unstructured":"Bird C, Nagappan N, Murphy B, Gall H, Devanbu P (2011) Don\u2019t touch my code! examining the effects of ownership on software quality. 
In: Proceedings of the 19th ACM SIGSOFT Symposium and the 13th European Conference on Foundations of Software Engineering, Association for Computing Machinery, New York, NY, USA, ESEC\/FSE \u201911, p 4\u201314 https:\/\/doi.org\/10.1145\/2025113.2025119.","DOI":"10.1145\/2025113.2025119"},{"key":"10092_CR7","doi-asserted-by":"publisher","unstructured":"Bissyand\u00e9 TF, Thung F, Wang S, Lo D, Jiang L, R\u00e9veill\u00e8re L (2013) Empirical evaluation of bug linking. In: 2013 17th European Conference on Software Maintenance and Reengineering, pp 89\u201398 https:\/\/doi.org\/10.1109\/CSMR.2013.19","DOI":"10.1109\/CSMR.2013.19"},{"key":"10092_CR8","doi-asserted-by":"publisher","unstructured":"Bowes D, Hall T, Harman M, Jia Y, Sarro F, Wu F (2016) Mutation-aware fault prediction. In: Proceedings of the 25th International Symposium on Software Testing and Analysis, Association for Computing Machinery, New York, NY, USA, ISSTA 2016, p 330\u2013341 https:\/\/doi.org\/10.1145\/2931037.2931039","DOI":"10.1145\/2931037.2931039"},{"key":"10092_CR9","doi-asserted-by":"crossref","unstructured":"Camargo Cruz AE, Ochimizu K (2009) Towards logistic regression models for predicting fault-prone code across software projects. In: Proc. 3rd Int. Symp. on Empirical Softw. Eng. and Measurement (ESEM), IEEE Computer Society","DOI":"10.1109\/ESEM.2009.5316002"},{"issue":"6","key":"10092_CR10","doi-asserted-by":"publisher","first-page":"476","DOI":"10.1109\/32.295895","volume":"20","author":"SR Chidamber","year":"1994","unstructured":"Chidamber SR, Kemerer CF (1994) A metrics suite for object oriented design. IEEE Trans Softw Eng 20(6):476\u2013493. 
https:\/\/doi.org\/10.1109\/32.295895","journal-title":"IEEE Trans Softw Eng"},{"issue":"3","key":"10092_CR11","doi-asserted-by":"publisher","first-page":"494","DOI":"10.1037\/0033-2909.114.3.494","volume":"114","author":"N Cliff","year":"1993","unstructured":"Cliff N (1993) Dominance statistics: Ordinal analyses to answer ordinal questions. Psychological Bulletin 114(3):494","journal-title":"Psychological Bulletin"},{"issue":"1","key":"10092_CR12","doi-asserted-by":"publisher","first-page":"37","DOI":"10.1177\/001316446002000104","volume":"20","author":"J Cohen","year":"1960","unstructured":"Cohen J (1960) A coefficient of agreement for nominal scales. Educational and Psychological Measurement 20(1):37\u201346. https:\/\/doi.org\/10.1177\/001316446002000104","journal-title":"Educational and Psychological Measurement"},{"issue":"7","key":"10092_CR13","doi-asserted-by":"publisher","first-page":"641","DOI":"10.1109\/TSE.2016.2616306","volume":"43","author":"DA Da Costa","year":"2017","unstructured":"Da Costa DA, McIntosh S, Shang W, Kulesza U, Coelho R, Hassan AE (2017) A framework for evaluating the results of the szz approach for identifying bug-introducing changes. IEEE Transactions on Software Engineering 43(7):641\u2013657. https:\/\/doi.org\/10.1109\/TSE.2016.2616306","journal-title":"IEEE Transactions on Software Engineering"},{"issue":"4\u20135","key":"10092_CR14","doi-asserted-by":"publisher","first-page":"531","DOI":"10.1007\/s10664-011-9173-9","volume":"17","author":"M D\u2019Ambros","year":"2012","unstructured":"D\u2019Ambros M, Lanza M, Robbes R (2012) Evaluating defect prediction approaches: A benchmark and an extensive comparison. Empirical Softw Engg 17(4\u20135):531\u2013577. https:\/\/doi.org\/10.1007\/s10664-011-9173-9","journal-title":"Empirical Softw Engg"},{"key":"10092_CR15","first-page":"1","volume":"7","author":"J Dem\u0161ar","year":"2006","unstructured":"Dem\u0161ar J (2006) Statistical comparisons of classifiers over multiple data sets. 
J Mach Learn Res 7:1\u201330","journal-title":"J Mach Learn Res"},{"key":"10092_CR16","doi-asserted-by":"publisher","unstructured":"Di Penta M, Bavota G, Zampetti F (2020) On the relationship between refactoring actions and bugs: A differentiated replication. In: Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Association for Computing Machinery, New York, NY, USA, ESEC\/FSE 2020, p 556\u2013567 https:\/\/doi.org\/10.1145\/3368089.3409695","DOI":"10.1145\/3368089.3409695"},{"key":"10092_CR17","doi-asserted-by":"publisher","unstructured":"Fan Y, Xia X, Alencar Da Costa D, Lo D, Hassan AE, Li S (2019) The impact of changes mislabeled by szz on just-in-time defect prediction. IEEE Transactions on Software Engineering pp 1 https:\/\/doi.org\/10.1109\/TSE.2019.2929761","DOI":"10.1109\/TSE.2019.2929761"},{"key":"10092_CR18","doi-asserted-by":"publisher","unstructured":"Ferenc R, T\u00f3th Z, Lad\u00e1nyi G, Siket I, Gyim\u00f3thy T (2018) A public unified bug dataset for java. In: Proceedings of the 14th International Conference on Predictive Models and Data Analytics in Software Engineering, ACM, New York, NY, USA, PROMISE\u201918, pp 12\u201321 https:\/\/doi.org\/10.1145\/3273934.3273936","DOI":"10.1145\/3273934.3273936"},{"key":"10092_CR19","doi-asserted-by":"publisher","unstructured":"Ferenc R, Gyimesi P, Gyimesi G, T\u00f3th Z, Gyim\u00f3thy T (2020a) An automatically created novel bug dataset and its validation in bug prediction. 
J Syst Softw 169:110691.\u00a0https:\/\/doi.org\/10.1016\/j.jss.2020.110691","DOI":"10.1016\/j.jss.2020.110691"},{"issue":"4","key":"10092_CR20","doi-asserted-by":"publisher","first-page":"1447","DOI":"10.1007\/s11219-020-09515-0","volume":"28","author":"R Ferenc","year":"2020","unstructured":"Ferenc R, T\u00f3th Z, Lad\u00e1nyi G, Siket I, Gyim\u00f3thy T (2020) A public unified bug dataset for java and its assessment regarding metrics and bug prediction. Software Quality Journal 28(4):1447\u20131506. https:\/\/doi.org\/10.1007\/s11219-020-09515-0","journal-title":"Software Quality Journal"},{"key":"10092_CR21","doi-asserted-by":"publisher","unstructured":"Fischer M, Pinzger M, Gall H (2003) Populating a release history database from version control and bug tracking systems. In: International Conference on Software Maintenance, 2003. ICSM 2003. Proceedings., pp 23\u201332 https:\/\/doi.org\/10.1109\/ICSM.2003.1235403","DOI":"10.1109\/ICSM.2003.1235403"},{"issue":"11","key":"10092_CR22","doi-asserted-by":"publisher","first-page":"725","DOI":"10.1109\/TSE.2007.70731","volume":"33","author":"B Fluri","year":"2007","unstructured":"Fluri B, W\u00fcrsch M, Pinzger M, Gall H (2007) Change distilling: Tree differencing for fine-grained source code change extraction. IEEE Transactions on Software Engineering 33(11):725\u2013743","journal-title":"IEEE Transactions on Software Engineering"},{"issue":"1","key":"10092_CR23","doi-asserted-by":"publisher","first-page":"86","DOI":"10.1214\/aoms\/1177731944","volume":"11","author":"M Friedman","year":"1940","unstructured":"Friedman M (1940) A comparison of alternative tests of significance for the problem of m rankings. 
The Annals of Mathematical Statistics 11(1):86\u201392","journal-title":"The Annals of Mathematical Statistics"},{"issue":"6","key":"10092_CR24","doi-asserted-by":"publisher","first-page":"1276","DOI":"10.1109\/TSE.2011.103","volume":"38","author":"T Hall","year":"2012","unstructured":"Hall T, Beecham S, Bowes D, Gray D, Counsell S (2012) A systematic literature review on fault prediction performance in software engineering. IEEE Transactions on Software Engineering 38(6):1276\u20131304. https:\/\/doi.org\/10.1109\/TSE.2011.103","journal-title":"IEEE Transactions on Software Engineering"},{"key":"10092_CR25","doi-asserted-by":"publisher","unstructured":"Hassan AE (2009) Predicting faults using the complexity of code changes. In: Proceedings of the 31st International Conference on Software Engineering, IEEE Computer Society, Washington, DC, USA, ICSE \u201909, pp 78\u201388 https:\/\/doi.org\/10.1109\/ICSE.2009.5070510","DOI":"10.1109\/ICSE.2009.5070510"},{"key":"10092_CR26","doi-asserted-by":"publisher","unstructured":"Herbold S (2019) On the costs and profit of software defect prediction. IEEE Transactions on Software Engineering pp 1 https:\/\/doi.org\/10.1109\/TSE.2019.2957794","DOI":"10.1109\/TSE.2019.2957794"},{"key":"10092_CR27","doi-asserted-by":"publisher","unstructured":"Herbold S, Trautsch A, Grabowski J (2017) A comparative study to benchmark cross-project defect prediction approaches. IEEE Transactions on Software Engineering PP(99):1 https:\/\/doi.org\/10.1109\/TSE.2017.2724538","DOI":"10.1109\/TSE.2017.2724538"},{"key":"10092_CR28","doi-asserted-by":"publisher","unstructured":"Herbold S, Trautsch A, Ledel B (2020) Large-scale manual validation of bugfixing changes. https:\/\/doi.org\/10.17605\/OSF.IO\/ACNWK","DOI":"10.17605\/OSF.IO\/ACNWK"},{"key":"10092_CR29","doi-asserted-by":"publisher","unstructured":"Herzig K, Just S, Rau A, Zeller A (2013) Predicting defects using change genealogies. 
In: 2013 IEEE 24th International Symposium on Software Reliability Engineering (ISSRE), pp 118\u2013127 https:\/\/doi.org\/10.1109\/ISSRE.2013.6698911","DOI":"10.1109\/ISSRE.2013.6698911"},{"key":"10092_CR30","doi-asserted-by":"crossref","unstructured":"Herzig K, Just S, Zeller A (2013) It\u2019s not a bug, it\u2019s a feature: How misclassification impacts bug prediction. In: Proceedings of the 2013 International Conference on Software Engineering, IEEE Press, Piscataway, NJ, USA, ICSE \u201913, pp 392\u2013401. http:\/\/dl.acm.org\/citation.cfm?id=2486788.2486840","DOI":"10.1109\/ICSE.2013.6606585"},{"key":"10092_CR31","doi-asserted-by":"publisher","unstructured":"Hosseini S, Turhan B, Gunarathna D (2017) A systematic literature review and meta-analysis on cross project defect prediction. IEEE Transactions on Software Engineering PP(99):1 https:\/\/doi.org\/10.1109\/TSE.2017.2770124","DOI":"10.1109\/TSE.2017.2770124"},{"key":"10092_CR32","doi-asserted-by":"publisher","unstructured":"Jureczko M, Madeyski L (2010) Towards identifying software project clusters with regard to defect prediction. In: Proceedings of the 6th International Conference on Predictive Models in Software Engineering, ACM, New York, NY, USA, PROMISE \u201910, pp 9:1\u20139:10 https:\/\/doi.org\/10.1145\/1868328.1868342.","DOI":"10.1145\/1868328.1868342"},{"issue":"6","key":"10092_CR33","doi-asserted-by":"publisher","first-page":"757","DOI":"10.1109\/TSE.2012.70","volume":"39","author":"Y Kamei","year":"2013","unstructured":"Kamei Y, Shihab E, Adams B, Hassan AE, Mockus A, Sinha A, Ubayashi N (2013) A large-scale empirical study of just-in-time quality assurance. IEEE Transactions on Software Engineering 39(6):757\u2013773. https:\/\/doi.org\/10.1109\/TSE.2012.70","journal-title":"IEEE Transactions on Software Engineering"},{"key":"10092_CR34","doi-asserted-by":"publisher","unstructured":"Kim S, Zimmermann T, Pan K, Jr Whitehead EJ (2006) Automatic identification of bug-introducing changes. 
In: 21st IEEE\/ACM International Conference on Automated Software Engineering (ASE\u201906), pp 81\u201390 https:\/\/doi.org\/10.1109\/ASE.2006.23","DOI":"10.1109\/ASE.2006.23"},{"key":"10092_CR35","doi-asserted-by":"publisher","unstructured":"Kovalenko V, Palomba F, Bacchelli A (2018) Mining file histories: Should we consider branches? In: Proceedings of the 33rd ACM\/IEEE International Conference on Automated Software Engineering, ACM, New York, NY, USA, ASE 2018, pp 202\u2013213 https:\/\/doi.org\/10.1145\/3238147.3238169","DOI":"10.1145\/3238147.3238169"},{"issue":"1","key":"10092_CR36","doi-asserted-by":"publisher","first-page":"159","DOI":"10.2307\/2529310","volume":"33","author":"JR Landis","year":"1977","unstructured":"Landis JR, Koch GG (1977) The measurement of observer agreement for categorical data. Biometrics 33(1):159\u2013174","journal-title":"Biometrics"},{"issue":"3","key":"10092_CR37","doi-asserted-by":"publisher","first-page":"393","DOI":"10.1007\/s11219-014-9241-7","volume":"23","author":"L Madeyski","year":"2015","unstructured":"Madeyski L, Jureczko M (2015) Which process metrics can significantly improve defect prediction models? an empirical study. Software Quality Journal 23(3):393\u2013422. https:\/\/doi.org\/10.1007\/s11219-014-9241-7","journal-title":"Software Quality Journal"},{"issue":"05","key":"10092_CR38","doi-asserted-by":"publisher","first-page":"412","DOI":"10.1109\/TSE.2017.2693980","volume":"44","author":"S McIntosh","year":"2018","unstructured":"McIntosh S, Kamei Y (2018) Are fix-inducing changes a moving target? a longitudinal case study of just-in-time defect prediction. IEEE Transactions on Software Engineering 44(05):412\u2013428. 
https:\/\/doi.org\/10.1109\/TSE.2017.2693980","journal-title":"IEEE Transactions on Software Engineering"},{"key":"10092_CR39","unstructured":"Menzies T, Krishna R, Pryor D (2015) The promise repository of empirical software engineering data"},{"key":"10092_CR40","unstructured":"Menzies T, Krishna R, Pryor D (2017) The seacraft repository of empirical software engineering data"},{"key":"10092_CR41","doi-asserted-by":"publisher","unstructured":"Mills C, Pantiuchina J, Parra E, Bavota G, Haiduc S (2018) Are bug reports enough for text retrieval-based bug localization? In: 2018 IEEE International Conference on Software Maintenance and Evolution (ICSME), pp 381\u2013392 https:\/\/doi.org\/10.1109\/ICSME.2018.00046","DOI":"10.1109\/ICSME.2018.00046"},{"key":"10092_CR42","doi-asserted-by":"publisher","unstructured":"Mockus A (2009) Amassing and indexing a large sample of version control systems: Towards the census of public source code history. In: 2009 6th IEEE International Working Conference on Mining Software Repositories, pp 11\u201320 https:\/\/doi.org\/10.1109\/MSR.2009.5069476","DOI":"10.1109\/MSR.2009.5069476"},{"key":"10092_CR43","doi-asserted-by":"publisher","unstructured":"Moser R, Pedrycz W, Succi G (2008) A comparative analysis of the efficiency of change metrics and static code attributes for defect prediction. In: Proceedings of the 30th International Conference on Software Engineering, ACM, New York, NY, USA, ICSE \u201908, pp 181\u2013190 https:\/\/doi.org\/10.1145\/1368088.1368114","DOI":"10.1145\/1368088.1368114"},{"key":"10092_CR44","unstructured":"NASA (2004) Nasa iv & v facility metrics data program.\u00a0http:\/\/web.archive.org\/web\/20110421024209\/, http:\/\/mdp.ivv.nasa.gov\/repository.html. Accessed\u00a017 December 2021"},{"key":"10092_CR45","unstructured":"Nemenyi P (1963) Distribution-free multiple comparison. 
PhD thesis, Princeton University"},{"key":"10092_CR46","doi-asserted-by":"publisher","unstructured":"Neto EC, da Costa DA, Kulesza U (2018) The impact of refactoring changes on the szz algorithm: An empirical study. In: 2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER), pp 380\u2013390 https:\/\/doi.org\/10.1109\/SANER.2018.8330225","DOI":"10.1109\/SANER.2018.8330225"},{"key":"10092_CR47","doi-asserted-by":"publisher","unstructured":"Neto EC, d Costa DA, Kulesza U (2019) Revisiting and improving szz implementations. In: 2019 ACM\/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), pp 1\u201312 https:\/\/doi.org\/10.1109\/ESEM.2019.8870178","DOI":"10.1109\/ESEM.2019.8870178"},{"issue":"4","key":"10092_CR48","doi-asserted-by":"publisher","first-page":"340","DOI":"10.1109\/TSE.2005.49","volume":"31","author":"T Ostrand","year":"2005","unstructured":"Ostrand T, Weyuker E, Bell R (2005) Predicting the location and number of faults in large software systems. IEEE Trans Softw Eng 31(4):340\u2013355. https:\/\/doi.org\/10.1109\/TSE.2005.49","journal-title":"IEEE Trans Softw Eng"},{"issue":"2","key":"10092_CR49","doi-asserted-by":"publisher","first-page":"194","DOI":"10.1109\/TSE.2017.2770122","volume":"45","author":"F Palomba","year":"2019","unstructured":"Palomba F, Zanoni M, Fontana FA, De Lucia A, Oliveto R (2019) Toward a smell-aware bug prediction model. IEEE Transactions on Software Engineering 45(2):194\u2013218. https:\/\/doi.org\/10.1109\/TSE.2017.2770122","journal-title":"IEEE Transactions on Software Engineering"},{"key":"10092_CR50","doi-asserted-by":"publisher","unstructured":"Pascarella L, Palomba F, Bacchelli A (2019) Fine-grained just-in-time defect prediction. 
J Syst Softw 150:22\u201336.\u00a0https:\/\/doi.org\/10.1016\/j.jss.2018.12.001","DOI":"10.1016\/j.jss.2018.12.001"},{"key":"10092_CR51","doi-asserted-by":"publisher","unstructured":"Plosch R, Gruber H, Hentschel A, Pomberger G, Schiffer S (2008) On the relation between external software quality and static code analysis. In: 2008 32nd Annual IEEE Software Engineering Workshop, pp 169\u2013174 https:\/\/doi.org\/10.1109\/SEW.2008.17","DOI":"10.1109\/SEW.2008.17"},{"key":"10092_CR52","doi-asserted-by":"publisher","unstructured":"Rahman F, Posnett D, Hindle A, Barr E, Devanbu P (2011) Bugcache for inspections: Hit or miss? In: Proceedings of the 19th ACM SIGSOFT Symposium and the 13th European Conference on Foundations of Software Engineering, ACM https:\/\/doi.org\/10.1145\/2025113.2025157","DOI":"10.1145\/2025113.2025157"},{"key":"10092_CR53","doi-asserted-by":"publisher","unstructured":"Rahman F, Khatri S, Barr ET, Devanbu P (2014) Comparing static bug finders and statistical prediction. In: Proceedings of the 36th International Conference on Software Engineering, ACM, New York, NY, USA, ICSE 2014, pp 424\u2013434 https:\/\/doi.org\/10.1145\/2568225.2568269","DOI":"10.1145\/2568225.2568269"},{"key":"10092_CR54","doi-asserted-by":"publisher","unstructured":"Rodr\u00edguez-P\u00e9rez G, Zaidman A, Serebrenik A, Robles G, Gonz\u00e1lez-Barahona JM (2018) What if a bug has a different origin? making sense of bugs without an explicit bug introducing change. 
In: Proceedings of the 12th ACM\/IEEE International Symposium on Empirical Software Engineering and Measurement, Association for Computing Machinery, New York, NY, USA, ESEM \u201918 https:\/\/doi.org\/10.1145\/3239235.3267436","DOI":"10.1145\/3239235.3267436"},{"key":"10092_CR55","doi-asserted-by":"publisher","DOI":"10.1007\/s10664-019-09781-y","author":"G Rodr\u00edguez-P\u00e9rez","year":"2020","unstructured":"Rodr\u00edguez-P\u00e9rez G, Robles G, Serebrenik A, Zaidman A, Germ\u00e1n DM, Gonzalez-Barahona JM (2020) How bugs are born: a model to identify how bugs are introduced in software components. Empirical Software Engineering. https:\/\/doi.org\/10.1007\/s10664-019-09781-y","journal-title":"Empirical Software Engineering"},{"key":"10092_CR56","doi-asserted-by":"publisher","first-page":"164","DOI":"10.1016\/j.infsof.2018.03.009","volume":"99","author":"G Rodr\u00edguez-P\u00e9rez","year":"2018","unstructured":"Rodr\u00edguez-P\u00e9rez G, Robles G, Gonz\u00e1lez-Barahona JM (2018) Reproducibility and credibility in empirical software engineering: A case study based on a systematic literature review of the use of the szz algorithm. Information and Software Technology 99:164\u2013176. https:\/\/doi.org\/10.1016\/j.infsof.2018.03.009","journal-title":"Information and Software Technology"},{"key":"10092_CR57","unstructured":"Romano J, Kromrey J, Coraggio J, Skowronek J (2006) Appropriate statistics for ordinal level data: Should we really be using t-test and Cohen\u2019s d for evaluating group differences on the NSSE and other surveys? In: Annual Meeting of the Florida Association of Institutional Research, pp 1\u20133"},{"key":"10092_CR58","doi-asserted-by":"publisher","unstructured":"Rosa G, Pascarella L, Scalabrino S, Tufano R, Bavota G, Lanza M, Oliveto R (2021) Evaluating szz implementations through a developer-informed oracle. 
In: 2021 IEEE\/ACM 43rd International Conference on Software Engineering (ICSE), IEEE Computer Society, Los Alamitos, CA, USA, pp 436\u2013447 https:\/\/doi.org\/10.1109\/ICSE43902.2021.00049","DOI":"10.1109\/ICSE43902.2021.00049"},{"issue":"424","key":"10092_CR59","doi-asserted-by":"publisher","first-page":"1273","DOI":"10.1080\/01621459.1993.10476408","volume":"88","author":"PJ Rousseeuw","year":"1993","unstructured":"Rousseeuw PJ, Croux C (1993) Alternatives to the median absolute deviation. Journal of the American Statistical Association 88(424):1273\u20131283. https:\/\/doi.org\/10.1080\/01621459.1993.10476408","journal-title":"Journal of the American Statistical Association"},{"issue":"2","key":"10092_CR60","doi-asserted-by":"publisher","first-page":"131","DOI":"10.1007\/s10664-008-9102-8","volume":"14","author":"P Runeson","year":"2008","unstructured":"Runeson P, H\u00f6st M (2008) Guidelines for conducting and reporting case study research in software engineering. Empirical Software Engineering 14(2):131. https:\/\/doi.org\/10.1007\/s10664-008-9102-8","journal-title":"Empirical Software Engineering"},{"key":"10092_CR61","doi-asserted-by":"publisher","unstructured":"Shippey T, Hall T, Counsell S, Bowes D (2016) So you need more method level datasets for your software defect prediction?: Voil\u00e0! In: Proceedings of the 10th ACM\/IEEE International Symposium on Empirical Software Engineering and Measurement, ACM, New York, NY, USA, ESEM \u201916, pp 12:1\u201312:6 https:\/\/doi.org\/10.1145\/2961111.2962620","DOI":"10.1145\/2961111.2962620"},{"issue":"2","key":"10092_CR62","doi-asserted-by":"publisher","first-page":"211","DOI":"10.1007\/s10664-008-9060-1","volume":"13","author":"F Shull","year":"2008","unstructured":"Shull F, Carver J, Vegas S, Juristo N (2008) The role of replications in empirical software engineering. 
Empirical Software Engineering 13(2):211\u2013218","journal-title":"Empirical Software Engineering"},{"key":"10092_CR63","doi-asserted-by":"publisher","unstructured":"Silva D, Valente MT (2017) Refdiff: Detecting refactorings in version histories. In: 2017 IEEE\/ACM 14th International Conference on Mining Software Repositories (MSR), pp 269\u2013279 https:\/\/doi.org\/10.1109\/MSR.2017.14","DOI":"10.1109\/MSR.2017.14"},{"key":"10092_CR64","doi-asserted-by":"publisher","unstructured":"\u015aliwerski J, Zimmermann T, Zeller A (2005) When do changes induce fixes? In: Proceedings of the 2005 International Workshop on Mining Software Repositories, ACM, New York, NY, USA, MSR \u201905, pp 1\u20135 https:\/\/doi.org\/10.1145\/1082983.1083147","DOI":"10.1145\/1082983.1083147"},{"key":"10092_CR65","doi-asserted-by":"publisher","unstructured":"Spadini D, Palomba F, Zaidman A, Bruntink M, Bacchelli A (2018) On the relation of test smells to software code quality. In: 2018 IEEE International Conference on Software Maintenance and Evolution (ICSME), pp 1\u201312 https:\/\/doi.org\/10.1109\/ICSME.2018.00010","DOI":"10.1109\/ICSME.2018.00010"},{"key":"10092_CR66","doi-asserted-by":"publisher","unstructured":"Tantithamthavorn C, McIntosh S, Hassan AE, Ihara A, Matsumoto K (2015) The impact of mislabelling on the performance and interpretation of defect prediction models. In: 2015 IEEE\/ACM 37th IEEE International Conference on Software Engineering, vol 1, pp 812\u2013823 https:\/\/doi.org\/10.1109\/ICSE.2015.93","DOI":"10.1109\/ICSE.2015.93"},{"key":"10092_CR67","doi-asserted-by":"publisher","unstructured":"Tantithamthavorn C, Hassan AE, Matsumoto K (2018) The impact of class rebalancing techniques on the performance and interpretation of defect prediction models. 
IEEE Transactions on Software Engineering pp 1 https:\/\/doi.org\/10.1109\/TSE.2018.2876537","DOI":"10.1109\/TSE.2018.2876537"},{"key":"10092_CR68","doi-asserted-by":"publisher","unstructured":"Thongtanunam P, McIntosh S, Hassan AE, Iida H (2016) Revisiting code ownership and its relationship with software quality in the scope of modern code review. In: 2016 IEEE\/ACM 38th International Conference on Software Engineering (ICSE), pp 1039\u20131050 https:\/\/doi.org\/10.1145\/2884781.2884852","DOI":"10.1145\/2884781.2884852"},{"key":"10092_CR69","doi-asserted-by":"publisher","first-page":"625","DOI":"10.1007\/978-3-319-42089-9_44","volume-title":"Computational Science and Its Applications - ICCSA 2016","author":"Z T\u00f3th","year":"2016","unstructured":"T\u00f3th Z, Gyimesi P, Ferenc R (2016) A public bug database of github projects and its application in bug prediction. In: Gervasi O, Murgante B, Misra S, Rocha AMA, Torre CM, Taniar D, Apduhan BO, Stankova E, Wang S (eds) Computational Science and Its Applications - ICCSA 2016. Springer International Publishing, Cham, pp 625\u2013638"},{"key":"10092_CR70","doi-asserted-by":"publisher","unstructured":"Trautsch A, Herbold S, Grabowski J (2020) Static source code metrics and static analysis warnings for fine-grained just-in-time defect prediction. In: 2020 IEEE International Conference on Software Maintenance and Evolution (ICSME), pp 127\u2013138 https:\/\/doi.org\/10.1109\/ICSME46990.2020.00022","DOI":"10.1109\/ICSME46990.2020.00022"},{"key":"10092_CR71","doi-asserted-by":"crossref","unstructured":"Trautsch A, Trautsch F, Herbold S, Ledel B, Grabowski J (2020) The smartshark ecosystem for software repository mining. 
In: Proceedings of the 42nd International Conference on Software Engineering - Demonstrations, ACM","DOI":"10.1145\/3377812.3382139"},{"key":"10092_CR72","doi-asserted-by":"publisher","DOI":"10.1007\/s10664-017-9537-x","author":"F Trautsch","year":"2017","unstructured":"Trautsch F, Herbold S, Makedonski P, Grabowski J (2017) Addressing problems with replicability and validity of repository mining studies through a smart data platform. Empirical Software Engineering. https:\/\/doi.org\/10.1007\/s10664-017-9537-x","journal-title":"Empirical Software Engineering"},{"issue":"4","key":"10092_CR73","doi-asserted-by":"publisher","first-page":"e1838","DOI":"10.1002\/smr.1838","volume":"29","author":"M Tufano","year":"2017","unstructured":"Tufano M, Palomba F, Bavota G, Di Penta M, Oliveto R, De Lucia A, Poshyvanyk D (2017) There and back again: Can you compile that snapshot? Journal of Software: Evolution and Process 29(4):e1838. https:\/\/doi.org\/10.1002\/smr.1838","journal-title":"Journal of Software: Evolution and Process"},{"issue":"5","key":"10092_CR74","doi-asserted-by":"publisher","first-page":"540","DOI":"10.1007\/s10664-008-9103-7","volume":"14","author":"B Turhan","year":"2009","unstructured":"Turhan B, Menzies T, Bener AB, Di Stefano J (2009) On the relative value of cross-company and within-company data for defect prediction. Empirical Software Engineering 14(5):540\u2013578. https:\/\/doi.org\/10.1007\/s10664-008-9103-7","journal-title":"Empirical Software Engineering"},{"key":"10092_CR75","doi-asserted-by":"publisher","unstructured":"Turing AM (1937) On computable numbers, with an application to the Entscheidungsproblem. Proceedings of the London Mathematical Society s2-42(1):230\u2013265 https:\/\/doi.org\/10.1112\/plms\/s2-42.1.230","DOI":"10.1112\/plms\/s2-42.1.230"},{"key":"10092_CR76","doi-asserted-by":"publisher","unstructured":"Williams C, Spacco J (2008) Szz revisited: Verifying when changes induce fixes. 
In: Proceedings of the 2008 Workshop on Defects in Large Software Systems, Association for Computing Machinery, New York, NY, USA, DEFECTS \u201908, pp 32\u201336 https:\/\/doi.org\/10.1145\/1390817.1390826","DOI":"10.1145\/1390817.1390826"},{"key":"10092_CR77","doi-asserted-by":"publisher","unstructured":"Wu R, Zhang H, Kim S, Cheung SC (2011) ReLink: Recovering links between bugs and changes. In: Proceedings of the 19th ACM SIGSOFT Symposium and the 13th European Conference on Foundations of Software Engineering, ACM, New York, NY, USA, ESEC\/FSE \u201911, pp 15\u201325 https:\/\/doi.org\/10.1145\/2025113.2025120","DOI":"10.1145\/2025113.2025120"},{"key":"10092_CR78","doi-asserted-by":"publisher","unstructured":"Yao J, Shepperd M (2020) Assessing software defection prediction performance: Why using the Matthews correlation coefficient matters. In: Proceedings of the Evaluation and Assessment in Software Engineering, Association for Computing Machinery, New York, NY, USA, EASE \u201920, pp 120\u2013129 https:\/\/doi.org\/10.1145\/3383219.3383232","DOI":"10.1145\/3383219.3383232"},{"key":"10092_CR79","doi-asserted-by":"publisher","unstructured":"Yatish S, Jiarpakdee J, Thongtanunam P, Tantithamthavorn C (2019) Mining software defects: Should we consider affected releases? In: Proceedings of the 41st International Conference on Software Engineering, IEEE Press, Piscataway, NJ, USA, ICSE \u201919, pp 654\u2013665 https:\/\/doi.org\/10.1109\/ICSE.2019.00075","DOI":"10.1109\/ICSE.2019.00075"},{"key":"10092_CR80","doi-asserted-by":"publisher","unstructured":"Zhang F, Mockus A, Keivanloo I, Zou Y (2014) Towards building a universal defect prediction model. 
In: Proceedings of the 11th Working Conference on Mining Software Repositories, ACM, New York, NY, USA, MSR 2014, pp 182\u2013191 https:\/\/doi.org\/10.1145\/2597073.2597078","DOI":"10.1145\/2597073.2597078"},{"issue":"5","key":"10092_CR81","doi-asserted-by":"publisher","first-page":"476","DOI":"10.1109\/TSE.2016.2599161","volume":"43","author":"F Zhang","year":"2017","unstructured":"Zhang F, Hassan AE, McIntosh S, Zou Y (2017) The use of summation to aggregate software metrics hinders the performance of defect prediction models. IEEE Transactions on Software Engineering 43(5):476\u2013491. https:\/\/doi.org\/10.1109\/TSE.2016.2599161","journal-title":"IEEE Transactions on Software Engineering"},{"key":"10092_CR82","unstructured":"Zhang H (2004) The optimality of naive Bayes. In: Proceedings of the Seventeenth International Florida Artificial Intelligence Research Society Conference (FLAIRS 2004), AAAI Press"},{"key":"10092_CR83","doi-asserted-by":"publisher","unstructured":"Zhao Y, Leung H, Yang Y, Zhou Y, Xu B (2017) Towards an understanding of change types in bug fixing code. Information and Software Technology 86:37\u201353 https:\/\/doi.org\/10.1016\/j.infsof.2017.02.003, http:\/\/www.sciencedirect.com\/science\/article\/pii\/S0950584917301313","DOI":"10.1016\/j.infsof.2017.02.003"},{"key":"10092_CR84","doi-asserted-by":"publisher","unstructured":"Zhou Y, Yang Y, Lu H, Chen L, Li Y, Zhao Y, Qian J, Xu B (2018) How far we have progressed in the journey? An examination of cross-project defect prediction. ACM Trans Softw Eng Methodol 27(1):1:1\u20131:51 https:\/\/doi.org\/10.1145\/3183339","DOI":"10.1145\/3183339"},{"key":"10092_CR85","doi-asserted-by":"publisher","unstructured":"Zimmermann T, Premraj R, Zeller A (2007) Predicting defects for Eclipse. 
In: Proceedings of the Third International Workshop on Predictor Models in Software Engineering, IEEE Computer Society, Washington, DC, USA, PROMISE \u201907, pp 9\u2013 https:\/\/doi.org\/10.1109\/PROMISE.2007.10","DOI":"10.1109\/PROMISE.2007.10"}],"container-title":["Empirical Software Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10664-021-10092-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10664-021-10092-4\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10664-021-10092-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,4,12]],"date-time":"2022-04-12T13:22:48Z","timestamp":1649769768000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10664-021-10092-4"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,1,18]]},"references-count":85,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2022,3]]}},"alternative-id":["10092"],"URL":"https:\/\/doi.org\/10.1007\/s10664-021-10092-4","relation":{},"ISSN":["1382-3256","1573-7616"],"issn-type":[{"value":"1382-3256","type":"print"},{"value":"1573-7616","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,1,18]]},"assertion":[{"value":"10 November 2021","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"18 January 2022","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"42"}}