{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,30]],"date-time":"2026-04-30T04:56:24Z","timestamp":1777524984241,"version":"3.51.4"},"reference-count":135,"publisher":"Springer Science and Business Media LLC","issue":"6","license":[{"start":{"date-parts":[[2021,8,18]],"date-time":"2021-08-18T00:00:00Z","timestamp":1629244800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,8,18]],"date-time":"2021-08-18T00:00:00Z","timestamp":1629244800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100006447","name":"Universit\u00e4t Z\u00fcrich","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100006447","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Empir Software Eng"],"published-print":{"date-parts":[[2021,11]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Software benchmarks are only as good as the performance measurements they yield. Unstable benchmarks show high variability among repeated measurements, which causes uncertainty about the actual performance and complicates reliable change assessment. However, if a benchmark is stable or unstable only becomes evident after it has been executed and its results are available. In this paper, we introduce a machine-learning-based approach to predict a benchmark\u2019s stability without having to execute it. Our approach relies on 58 statically-computed source code features, extracted for benchmark code and code called by a benchmark, related to (1) meta information, e.g., lines of code (LOC), (2) programming language elements, e.g., conditionals or loops, and (3) potentially performance-impacting standard library calls, e.g., file and network input\/output (I\/O). 
To assess our approach\u2019s effectiveness, we perform a large-scale experiment on 4,461 Go benchmarks coming from 230 open-source software (OSS) projects. First, we assess the prediction performance of our machine learning models using 11 binary classification algorithms. We find that Random Forest performs best with good prediction performance from 0.79 to 0.90, and 0.43 to 0.68, in terms of AUC and MCC, respectively. Second, we perform feature importance analyses for individual features and feature categories. We find that 7 features related to meta-information, slice usage, nested loops, and synchronization application programming interfaces (APIs) are individually important for good predictions; and that the combination of all features of the called source code is paramount for our model, while the combination of features of the benchmark itself is less important. Our results show that although benchmark stability is affected by more than just the source code, we can effectively utilize machine learning models to predict whether a benchmark will be stable or not ahead of execution. 
This enables spending precious testing time on reliable benchmarks, supporting developers to identify unstable benchmarks during development, allowing unstable benchmarks to be repeated more often, estimating stability in scenarios where repeated benchmark execution is infeasible or impossible, and warning developers if new benchmarks or existing benchmarks executed in new environments will be unstable.<\/jats:p>","DOI":"10.1007\/s10664-021-09996-y","type":"journal-article","created":{"date-parts":[[2021,8,18]],"date-time":"2021-08-18T15:03:02Z","timestamp":1629298982000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":25,"title":["Predicting unstable software benchmarks using static source code features"],"prefix":"10.1007","volume":"26","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6817-331X","authenticated-orcid":false,"given":"Christoph","family":"Laaber","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mikael","family":"Basmaci","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8687-052X","authenticated-orcid":false,"given":"Pasquale","family":"Salza","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2021,8,18]]},"reference":[{"key":"9996_CR1","doi-asserted-by":"publisher","unstructured":"Abedi A, Brecht T (2017) Conducting repeatable experiments in highly variable cloud computing environments. In: Proceedings of the 8th ACM\/SPEC on International Conference on Performance Engineering, ICPE, vol 2017. ACM, New York, pp 287\u2013292. https:\/\/doi.org\/10.1145\/3030207.3030229","DOI":"10.1145\/3030207.3030229"},{"key":"9996_CR2","unstructured":"Akinshin A (2020a) Quantile confidence intervals for weighted samples. 
https:\/\/aakinshin.net\/posts\/weighted-quantiles-ci\/, accessed: 2.2. 2021"},{"key":"9996_CR3","unstructured":"Akinshin A (2020b) Quantile-respectful density estimation based on the Harrell-Davis, quantile estimator. https:\/\/aakinshin.net\/posts\/qrde-hd\/, accessed: 2.2. 2021"},{"key":"9996_CR4","unstructured":"Akinshin A (2021) Unbiased median absolute deviation. https:\/\/aakinshin.net\/posts\/unbiased-mad\/, accessed: 9.2.2021"},{"key":"9996_CR5","doi-asserted-by":"publisher","unstructured":"Alam MMu, Liu T, Zeng G, Muzahid A (2017) SyncPerf: Categorizing, detecting, and diagnosing synchronization performance bugs. In: Proceedings of the 12th European Conference on Computer Systems, EuroSys. ACM, New York, pp 298\u2013313. https:\/\/doi.org\/10.1145\/3064176.3064186","DOI":"10.1145\/3064176.3064186"},{"key":"9996_CR6","doi-asserted-by":"publisher","unstructured":"AlGhamdi HM, Syer MD, Shang W, Hassan AE (2016) An automated approach for recommending when to stop performance tests. In: Proceedings of the 32nd IEEE International Conference on Software Maintenance and Evolution, ICSME, vol 2016, pp 279\u2013289. https:\/\/doi.org\/10.1109\/ICSME.2016.46","DOI":"10.1109\/ICSME.2016.46"},{"key":"9996_CR7","doi-asserted-by":"publisher","unstructured":"AlGhamdi HM, Bezemer CP, Shang W, Hassan AE, Flora P (2020) Towards reducing the time needed for load testing. Journal of Software, Evolution and Process. https:\/\/doi.org\/10.1002\/smr.2276","DOI":"10.1002\/smr.2276"},{"key":"9996_CR8","doi-asserted-by":"publisher","unstructured":"Alshoaibi D, Hannigan K, Gupta H, Mkaouer MW (2019) PRICE: Detection of performance regression introducing code changes using static and dynamic metrics. In: Proceedings of the 11th International Symposium on Search Based Software Engineering, Springer Nature, SSBSE 2019, pp 75\u201388. 
https:\/\/doi.org\/10.1007\/978-3-030-27455-9_6","DOI":"10.1007\/978-3-030-27455-9_6"},{"issue":"10","key":"9996_CR9","doi-asserted-by":"publisher","first-page":"1340","DOI":"10.1093\/bioinformatics\/btq134","volume":"26","author":"A Altmann","year":"2010","unstructured":"Altmann A, Tolos\u0307i L, Sander O, Lengauer T (2010) Permutation importance: A corrected feature importance measure. Bioinformatics 26 (10):1340\u20131347. https:\/\/doi.org\/10.1093\/bioinformatics\/btq134","journal-title":"Bioinformatics"},{"key":"9996_CR10","unstructured":"Andersen LO (1994) Program analysis and specialization for the C programming language. PhD thesis, University of Copenhagen, Universitetsparken 1, DK-2100 Copenhagen, Denmark"},{"key":"9996_CR11","doi-asserted-by":"publisher","unstructured":"Arachchige CNPG, Prendergast LA, Staudte RG (2020) Robust analogs to the coefficient of variation. J Appl Stat:1\u201323. https:\/\/doi.org\/10.1080\/02664763.2020.1808599","DOI":"10.1080\/02664763.2020.1808599"},{"issue":"3","key":"9996_CR12","doi-asserted-by":"publisher","first-page":"1490","DOI":"10.1007\/s10664-017-9553-x","volume":"23","author":"MM Arif","year":"2018","unstructured":"Arif MM, Shang W, Shihab E (2018) Empirical study on the discrepancy between performance testing results from virtual and physical environments. Empir Softw Eng 23(3):1490\u20131518. https:\/\/doi.org\/10.1007\/s10664-017-9553-x","journal-title":"Empir Softw Eng"},{"key":"9996_CR13","doi-asserted-by":"publisher","unstructured":"Bacon DF, Sweeney PF (1996) Fast static analysis of C++ virtual function calls. In: Proceedings of the 11th ACM SIGPLAN Conference on Object-oriented Programming, Systems, Languages, and Applications, OOPSLA, vol 1996. ACM, New York, pp 324\u2013341. 
https:\/\/doi.org\/10.1145\/236337.236371","DOI":"10.1145\/236337.236371"},{"key":"9996_CR14","doi-asserted-by":"publisher","unstructured":"Bezemer CP, Eismann S, Ferme V, Grohmann J, Heinrich R, Jamshidi P, Shang W, van Hoorn A, Villavicencio M, Walter J, Willnecker F (2019) How is performance addressed in DevOps?. In: Proceedings of the 10th ACM\/SPEC International Conference on Performance Engineering, ICPE. https:\/\/doi.org\/10.1145\/3297663.3309672, vol 2019. ACM, New York, pp 45\u201350","DOI":"10.1145\/3297663.3309672"},{"key":"9996_CR15","doi-asserted-by":"publisher","unstructured":"Blackburn SM, Cheng P, McKinley KS (2004) (2004) Myths And realities: The performance impact of garbage collection. In: Proceedings of the ACM Joint International Conference on Measurement and Modeling of Computer Systems, ACM, SIGMETRICS\/Performance. https:\/\/doi.org\/10.1145\/1005686.1005693","DOI":"10.1145\/1005686.1005693"},{"key":"9996_CR16","doi-asserted-by":"publisher","unstructured":"Blackburn SM, Diwan A, Hauswirth M, Sweeney PF, Amaral JN, Brecht T, Bulej L, Click C, Eeckhout L, Fischmeister S et al (2016) The truth, the whole truth, and nothing but the truth: A pragmatic guide to assessing empirical evaluations. ACM Trans Programm Lang Syst 38(4). https:\/\/doi.org\/10.1145\/2983574","DOI":"10.1145\/2983574"},{"issue":"7","key":"9996_CR17","doi-asserted-by":"publisher","first-page":"1145","DOI":"10.1016\/S0031-3203(96)00142-2","volume":"30","author":"AP Bradley","year":"1997","unstructured":"Bradley A P (1997) The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recogn 30(7):1145\u20131159. https:\/\/doi.org\/10.1016\/s0031-3203(96)00142-2","journal-title":"Pattern Recogn"},{"key":"9996_CR18","doi-asserted-by":"publisher","unstructured":"Breiman L (2001) Random forests. Mach Learn 45(1):5\u201332. 
https:\/\/doi.org\/10.1023\/a:1010933404324","DOI":"10.1023\/a:1010933404324"},{"key":"9996_CR19","doi-asserted-by":"publisher","unstructured":"Buckland M, Gey F (1994) The relationship between Recall and Precision. J Amer Soc Inf Sci 45(1):12\u201319, https:\/\/doi.org\/10.1002\/(sici)1097-4571(199401)45:1%3C12::aid-asi2%3E3.0.co;2-l","DOI":"10.1002\/(sici)1097-4571(199401)45:1%3C12::aid-asi2%3E3.0.co;2-l"},{"key":"9996_CR20","doi-asserted-by":"publisher","unstructured":"Bulej L, Bure\u0161 T, Keznikl J, koubkov\u00e1 A, Podzimek A, T\u016fma P (2012) Capturing performance assumptions using Stochastic Performance Logic. In: Proceedings of the 3rd ACM\/SPEC International Conference on Performance Engineering, ICPE, vol 2012. ACM, New York, pp 311\u2013322. https:\/\/doi.org\/10.1145\/2188286.2188345","DOI":"10.1145\/2188286.2188345"},{"issue":"1","key":"9996_CR21","doi-asserted-by":"publisher","first-page":"139","DOI":"10.1007\/s10515-015-0188-0","volume":"24","author":"L Bulej","year":"2017","unstructured":"Bulej L, Bure\u0161 T, Hork\u00fd V, Kotr\u010d J, Marek L, Troj\u00e1nek T, T\u016fma P (2017a) Unit testing performance with Stochastic Performance Logic. Autom Softw Eng 24(1):139\u2013187. https:\/\/doi.org\/10.1007\/s10515-015-0188-0","journal-title":"Autom Softw Eng"},{"key":"9996_CR22","doi-asserted-by":"publisher","unstructured":"Bulej L, Hork\u00fd V, T\u016fma P (2017b) Do we teach useful statistics for performance evaluation? In: Proceedings of the 8th ACM\/SPEC on International Conference on Performance Engineering Companion, ICPE 2017 Companion. ACM, New York, pp 185\u2013189. https:\/\/doi.org\/10.1145\/3053600.3053638","DOI":"10.1145\/3053600.3053638"},{"key":"9996_CR23","doi-asserted-by":"publisher","unstructured":"Bulej L, Hork\u00fd V, T\u016fma P, Farquet F, Prokopec A (2020) Duet benchmarking: Improving measurement accuracy in the cloud. In: Proceedings of the 11th ACM\/SPEC International Conference on Performance Engineering, ICPE. 
ACM, New York, p 2020. https:\/\/doi.org\/10.1145\/3358960.3379132","DOI":"10.1145\/3358960.3379132"},{"key":"9996_CR24","doi-asserted-by":"publisher","unstructured":"Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002) SMOTE: Synthetic minority over-sampling technique. J Artif Intell Res 16:321\u2013357. https:\/\/doi.org\/10.1613\/jair.953","DOI":"10.1613\/jair.953"},{"key":"9996_CR25","doi-asserted-by":"publisher","unstructured":"Chen J, Shang W (2017) An exploratory study of performance regression introducing code changes. In: Proceedings of the 33rd IEEE International Conference on Software Maintenance and Evolution, ISCME. IEEE, New York, p 2017. https:\/\/doi.org\/10.1109\/icsme.2017.13","DOI":"10.1109\/icsme.2017.13"},{"key":"9996_CR26","doi-asserted-by":"publisher","unstructured":"Chen J, Shang W, Shihab E (2020) PerfJIT: Test-level just-in-time prediction for performance regression introducing commits. IEEE Transactions on Software Engineering pp 1\u20131. https:\/\/doi.org\/10.1109\/tse.2020.3023955","DOI":"10.1109\/tse.2020.3023955"},{"key":"9996_CR27","doi-asserted-by":"publisher","unstructured":"Chen T H, Syer M D, Shang W, Jiang Z M, Hassan A E, Nasser M, Flora P (2019) Analytics-driven Load testing: An industrial experience report on load testing of large-scale systems. In: Proceedings of the 39th IEEE\/ACM International Conference on Software Engineering: Software Engineering in Practice. IEEE, ICSE-SEIP. https:\/\/doi.org\/10.1109\/icse-seip.2017.26","DOI":"10.1109\/icse-seip.2017.26"},{"key":"9996_CR28","doi-asserted-by":"publisher","unstructured":"Chicco D, Jurman G (2020) The advantages of the Matthews correlation coefficient (MCC) over F1 score and Accuracy in binary classification evaluation. BMC Genom 21(1). https:\/\/doi.org\/10.1186\/s12864-019-6413-7","DOI":"10.1186\/s12864-019-6413-7"},{"key":"9996_CR29","doi-asserted-by":"publisher","unstructured":"Chinchor N (1992) MUC-4 Evaluation metrics. 
In: Proceedings of the 4th Conference on Message Understanding, Association for Computational Linguistics MUC4. https:\/\/doi.org\/10.3115\/1072064.1072067","DOI":"10.3115\/1072064.1072067"},{"key":"9996_CR30","unstructured":"Cliff N (1996) Ordinal Methods for Behavioral Data Analysis, 1st edn. Psychology Press"},{"issue":"3","key":"9996_CR31","doi-asserted-by":"publisher","first-page":"273","DOI":"10.1007\/bf00994018","volume":"20","author":"C Cortes","year":"1995","unstructured":"Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273\u2013297. https:\/\/doi.org\/10.1007\/bf00994018","journal-title":"Mach Learn"},{"key":"9996_CR32","doi-asserted-by":"publisher","unstructured":"Costa D, Andrzejak A, Seboek J, Lo D (2017) Empirical study of usage and performance of java collections. In: Proceedings of the 8th ACM\/SPEC on International Conference on Performance Engineering. ACM, ICPE. https:\/\/doi.org\/10.1145\/3030207.3030221","DOI":"10.1145\/3030207.3030221"},{"key":"9996_CR33","doi-asserted-by":"publisher","unstructured":"Curtsinger C, Berger E D (2013) STABILIZER: Statistically sound performance evaluation. In: Proceedings of the 18th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS. ACM, New York, pp 219\u2013228. https:\/\/doi.org\/10.1145\/2451116.2451141","DOI":"10.1145\/2451116.2451141"},{"issue":"4","key":"9996_CR34","doi-asserted-by":"publisher","first-page":"316","DOI":"10.2307\/2684359","volume":"44","author":"RB D\u2019Agostino","year":"1990","unstructured":"D\u2019Agostino RB, Belanger A, D\u2019Agostino RB Jr (1990) A suggestion for using powerful and informative tests of normality. Amer Stat 44(4):316. https:\/\/doi.org\/10.2307\/2684359","journal-title":"Amer Stat"},{"key":"9996_CR35","doi-asserted-by":"publisher","unstructured":"Damasceno Costa DE, Bezemer CP, Leitner P, Andrzejak A (2019) What\u2019s wrong with my benchmark results? 
Studying bad practices in JMH benchmarks. IEEE Transactions on Software Engineering, pp 1\u20131. https:\/\/doi.org\/10.1109\/TSE.2019.2925345","DOI":"10.1109\/TSE.2019.2925345"},{"key":"9996_CR36","doi-asserted-by":"crossref","unstructured":"Davison AC, Hinkley D (1997) Bootstrap methods and their application. J Am Stat Assoc:94","DOI":"10.1017\/CBO9780511802843"},{"key":"9996_CR37","doi-asserted-by":"publisher","unstructured":"Dean J, Grove D, Chambers C (1995) Optimization of object-oriented programs using static class hierarchy analysis. In: Proceedings of the 9th European Conference on Object-Oriented Programming, Springer Berlin Heidelberg, ECOOP 1995, pp 77\u2013101. https:\/\/doi.org\/10.1007\/3-540-49538-x_5","DOI":"10.1007\/3-540-49538-x_5"},{"key":"9996_CR38","doi-asserted-by":"publisher","unstructured":"Dilley N, Lange J (2019) An empirical study of messaging passing concurrency in Go projects. In: Proceedings of the 26th IEEE International Conference on Software Analysis, Evolution and Reengineering. IEEE, SANER. https:\/\/doi.org\/10.1109\/saner.2019.8668036","DOI":"10.1109\/saner.2019.8668036"},{"key":"9996_CR39","doi-asserted-by":"publisher","unstructured":"Ding Z, Chen J, Shang W (2020) Towards the use of the readily available tests from the release pipeline as performance tests. Are we there yet?. In: Proceedings of the 42nd IEEE\/ACM International Conference on Software Engineering, ICSE. ACM, New York, p 2020. https:\/\/doi.org\/10.1145\/3377811.3380351","DOI":"10.1145\/3377811.3380351"},{"issue":"3","key":"9996_CR40","doi-asserted-by":"publisher","first-page":"241","DOI":"10.1080\/00401706.1964.10490181","volume":"6","author":"OJ Dunn","year":"1964","unstructured":"Dunn O J (1964) Multiple comparisons using rank sums. Technometrics 6(3):241\u2013252. 
https:\/\/doi.org\/10.1080\/00401706.1964.10490181","journal-title":"Technometrics"},{"key":"9996_CR41","doi-asserted-by":"publisher","unstructured":"Foo K C, Jiang Z M J, Adams B, Hassan A E, Zou Y, Flora P (2015) An industrial case study on the automated detection of performance regressions in heterogeneous environments. In: Proceedings of the 37th IEEE\/ACM International Conference on Software Engineering, ICSE 2015, vol 2. IEEE Press, Piscataway, pp 159\u2013168. https:\/\/doi.org\/10.1109\/icse.2015.144","DOI":"10.1109\/icse.2015.144"},{"key":"9996_CR42","unstructured":"Fox J (2016) Applied Regression Analysis and Generalized Linear Models, 3rd edn. SAGE Publications, https:\/\/us.sagepub.com\/en-us\/nam\/applied-regression-analysis-and-generalized-linear-models\/book237254"},{"key":"9996_CR43","doi-asserted-by":"publisher","unstructured":"Fraser G, Arcuri A (2011) EvoSuite: Automatic test suite generation for object-oriented software. In: Proceedings of the 19th ACM SIGSOFT Symposium and the 13th European Conference on Foundations of Software Engineering. ACM, ESEC\/FSE. https:\/\/doi.org\/10.1145\/2025113.2025179","DOI":"10.1145\/2025113.2025179"},{"issue":"1","key":"9996_CR44","doi-asserted-by":"publisher","first-page":"119","DOI":"10.1006\/jcss.1997.1504","volume":"55","author":"Y Freund","year":"1997","unstructured":"Freund Y, Schapire R E (1997) A decision-theoretic generalization of on-line learning and an application to boosting. J Comput Syst Sci 55(1):119\u2013139. https:\/\/doi.org\/10.1006\/jcss.1997.1504","journal-title":"J Comput Syst Sci"},{"issue":"1","key":"9996_CR45","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1214\/aos\/1176347963","volume":"19","author":"JH Friedman","year":"1991","unstructured":"Friedman J H (1991) Multivariate adaptive regression splines. Ann Stat 19(1):1\u201367. 
https:\/\/doi.org\/10.1214\/aos\/1176347963","journal-title":"Ann Stat"},{"issue":"5","key":"9996_CR46","doi-asserted-by":"publisher","first-page":"1189","DOI":"10.1214\/aos\/1013203451","volume":"29","author":"JH Friedman","year":"2001","unstructured":"Friedman J H (2001) Greedy function approximation: a gradient boosting machine. Ann Stat 29(5):1189\u20131232. https:\/\/doi.org\/10.1214\/aos\/1013203451","journal-title":"Ann Stat"},{"key":"9996_CR47","doi-asserted-by":"publisher","unstructured":"Gao R, Jiang ZMJ (2017) An exploratory study on assessing the impact of environment variations on the results of load tests. https:\/\/doi.org\/10.1109\/msr.2017.22","DOI":"10.1109\/msr.2017.22"},{"key":"9996_CR48","doi-asserted-by":"publisher","unstructured":"Georges A, Buytaert D, Eeckhout L (2007) Statistically rigorous java performance evaluation. In: Proceedings of the 22nd Annual ACM SIGPLAN Conference on Object-Oriented Programming Systems and Applications, OOPSLA 2007. ACM, New York, pp 57\u201376. https:\/\/doi.org\/10.1145\/1297027.1297033","DOI":"10.1145\/1297027.1297033"},{"issue":"1","key":"9996_CR49","doi-asserted-by":"publisher","first-page":"6","DOI":"10.1186\/2192-113X-2-6","volume":"2","author":"L Gillam","year":"2013","unstructured":"Gillam L, Li B, O\u2019Loughlin J, Tomar A P S (2013) Fair benchmarking for cloud computing systems. J Cloud Comput Adv Syst Appl 2(1):6. https:\/\/doi.org\/10.1186\/2192-113X-2-6","journal-title":"J Cloud Comput Adv Syst Appl"},{"key":"9996_CR50","doi-asserted-by":"publisher","unstructured":"Gligoric M, Eloussi L, Marinov D (2015) Practical regression test selection with dynamic file dependencies, Proceedings of the 2015 International Symposium on Software Testing and Analysis, ISSTA 2015. ACM, New York, pp 211\u2013222. https:\/\/doi.org\/10.1145\/2771783.2771784","DOI":"10.1145\/2771783.2771784"},{"key":"9996_CR51","unstructured":"Go Authors (2020a) Go \u2013 frequently asked questions (FAQ). 
https:\/\/golang.org\/doc\/faq"},{"key":"9996_CR52","unstructured":"Go Authors (2020b) The Go programming language specification. https:\/\/golang.org\/ref\/spec"},{"key":"9996_CR53","unstructured":"Goldberger J, Roweis S, Hinton GE, Salakhutdinov RR (2004) Neighbourhood components analysis. In: Advances in Neural Information Processing Systems, vol 17. MIT Press, NIPS 2004, vol 17, pp 513\u2013520, https:\/\/proceedings.neurips.cc\/paper\/2004\/file\/42fe880812925e520249e808937738d2-Paper.pdf"},{"issue":"6","key":"9996_CR54","doi-asserted-by":"publisher","first-page":"685","DOI":"10.1145\/506315.506316","volume":"23","author":"D Grove","year":"2001","unstructured":"Grove D, Chambers C (2001) A framework for call graph construction algorithms. ACM Trans Programm Lang Syst 23(6):685\u2013746. https:\/\/doi.org\/10.1145\/506315.506316","journal-title":"ACM Trans Programm Lang Syst"},{"issue":"1","key":"9996_CR55","doi-asserted-by":"publisher","first-page":"29","DOI":"10.1148\/radiology.143.1.7063747","volume":"143","author":"JA Hanley","year":"1982","unstructured":"Hanley J A, McNeil B J (1982) The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 143(1):29\u201336. https:\/\/doi.org\/10.1148\/radiology.143.1.7063747","journal-title":"Radiology"},{"issue":"3","key":"9996_CR56","doi-asserted-by":"publisher","first-page":"635","DOI":"10.1093\/biomet\/69.3.635","volume":"69","author":"FE Harrell","year":"1982","unstructured":"Harrell F E, Davis C E (1982) A new distribution-free quantile estimator. Biometrika 69(3):635\u2013640. https:\/\/doi.org\/10.1093\/biomet\/69.3.635","journal-title":"Biometrika"},{"issue":"2","key":"9996_CR57","doi-asserted-by":"publisher","first-page":"87","DOI":"10.2478\/v10117-011-0021-1","volume":"30","author":"J Hauke","year":"2011","unstructured":"Hauke J, Kossowski T (2011) Comparison of values of pearson\u2019s and spearman\u2019s correlation coefficients on the same sets of data. 
Quaest Geograph 30(2):87\u201393. https:\/\/doi.org\/10.2478\/v10117-011-0021-1","journal-title":"Quaest Geograph"},{"key":"9996_CR58","doi-asserted-by":"publisher","unstructured":"He S, Manns G, Saunders J, Wang W, Pollock L, Soffa M L (2019) A statistics-based performance testing methodology for cloud applications. In: Proceedings of the 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC\/FSE 2019. ACM, New York, pp 188\u2013199. https:\/\/doi.org\/10.1145\/3338906.3338912","DOI":"10.1145\/3338906.3338912"},{"key":"9996_CR59","unstructured":"Hess MR, Kromrey JD (2004) Robust confidence intervals for effect sizes: A comparative study of cohen\u2019s d and cliff\u2019s delta under non-normality and heterogeneous variances. Annual Meeting of the American Educational Research Association"},{"issue":"4","key":"9996_CR60","doi-asserted-by":"publisher","first-page":"371","DOI":"10.1080\/00031305.2015.1089789","volume":"69","author":"TC Hesterberg","year":"2015","unstructured":"Hesterberg T C (2015) What teachers should know about the bootstrap: Resampling in the undergraduate statistics curriculum. Amer Stat 69(4):371\u2013386. https:\/\/doi.org\/10.1080\/00031305.2015.1089789","journal-title":"Amer Stat"},{"key":"9996_CR61","doi-asserted-by":"publisher","unstructured":"Hind M (2001) Pointer Analysis: Haven\u2019t we solved this problem yet?. In: Proceedings of the 2001 ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools and Engineering. ACM, PASTE. https:\/\/doi.org\/10.1145\/379605.379665","DOI":"10.1145\/379605.379665"},{"key":"9996_CR62","doi-asserted-by":"publisher","unstructured":"Hork\u00fd V, Libi\u010d P, Marek L, Steinhauser A, T\u016fma P (2015) Utilizing performance unit tests to increase performance awareness. In: Proceedings of the 6th ACM\/SPEC International Conference on Performance Engineering, ICPE 2015. ACM, New York, pp 289\u2013300. 
https:\/\/doi.org\/10.1145\/2668930.2688051","DOI":"10.1145\/2668930.2688051"},{"key":"9996_CR63","doi-asserted-by":"crossref","unstructured":"Hosmer Jr, DW, Lemeshow S, Sturdivant R X (2013) Applied logistic regression, 3rd edn. Wiley","DOI":"10.1002\/9781118548387"},{"key":"9996_CR64","doi-asserted-by":"publisher","unstructured":"Huang P, Ma X, Shen D, Zhou Y (2014) Performance regression testing target prioritization via performance risk analysis. In: Proceedings of the 36th IEEE\/ACM International Conference on Software Engineering, ICSE 2014. ACM, New York, pp 60\u201371. https:\/\/doi.org\/10.1145\/2568225.2568232","DOI":"10.1145\/2568225.2568232"},{"key":"9996_CR65","unstructured":"Hudson R (2018) Getting to Go: The journey of Go\u2019s garbage collector. https:\/\/blog.golang.org\/ismmkeynote"},{"key":"9996_CR66","doi-asserted-by":"publisher","unstructured":"Iosup A, Yigitbasi N, Epema D (2011) On the performance variability of production cloud services. In: Proceedings of the 11th IEEE\/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGRID 2011. IEEE Computer Society, Washington, pp 104\u2013113. https:\/\/doi.org\/10.1109\/CCGrid.2011.22","DOI":"10.1109\/CCGrid.2011.22"},{"key":"9996_CR67","unstructured":"Jangda A, Powers B, Berger ED, Guha A (2019) Not so fast: Analyzing the performance of WebAssembly vs. native code. In: Proceedings of the 2019 USENIX Annual Technical Conference, USENIX ATC 2019. USENIX Association, Renton, pp 107\u2013120, https:\/\/www.usenix.org\/conference\/atc19\/presentation\/jangda"},{"issue":"11","key":"9996_CR68","doi-asserted-by":"publisher","first-page":"1091","DOI":"10.1109\/TSE.2015.2445340","volume":"41","author":"ZM Jiang","year":"2015","unstructured":"Jiang Z M, Hassan A E (2015) A survey on load testing of large-scale software systems. IEEE Trans Softw Eng 41(11):1091\u20131118. 
https:\/\/doi.org\/10.1109\/TSE.2015.2445340","journal-title":"IEEE Trans Softw Eng"},{"key":"9996_CR69","doi-asserted-by":"publisher","unstructured":"Jiarpakdee J, Tantithamthavorn C, Treude C (2018) AutoSpearman: Automatically mitigating correlated software metrics for interpreting defect models. In: Proceedings of the 34th IEEE International Conference on Software Maintenance and Evolution. IEEE, ICSME. https:\/\/doi.org\/10.1109\/icsme.2018.00018","DOI":"10.1109\/icsme.2018.00018"},{"key":"9996_CR70","unstructured":"Jiarpakdee J, Tantithamthavorn c, Hassan AE (2019) The impact of correlated metrics on the interpretation of defect models. IEEE Transactions on Software Engineering"},{"issue":"5","key":"9996_CR71","doi-asserted-by":"publisher","first-page":"3590","DOI":"10.1007\/s10664-020-09848-1","volume":"25","author":"J Jiarpakdee","year":"2020","unstructured":"Jiarpakdee J, Tantithamthavorn C, Treude C (2020) The impact of automated feature selection techniques on the interpretation of defect models. Empir Softw Eng 25(5):3590\u20133638. https:\/\/doi.org\/10.1007\/s10664-020-09848-1","journal-title":"Empir Softw Eng"},{"key":"9996_CR72","doi-asserted-by":"publisher","unstructured":"Jimenez I, Watkins N, Sevilla M, Lofstead J, Maltzahn C (2018) quiho: Automated performance regression testing using inferred resource utilization profiles. In: Proceedings of the 9th ACM\/SPEC International Conference on Performance Engineering, ICPE 2018. ACM, New York, pp 273\u2013284. https:\/\/doi.org\/10.1145\/3184407.3184422","DOI":"10.1145\/3184407.3184422"},{"key":"9996_CR73","doi-asserted-by":"publisher","unstructured":"Jin G, Song L, Shi X, Scherpelz J, Lu S (2012) Understanding and detecting real-world performance bugs. In: Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2012. ACM, New York, pp 77\u201388. 
https:\/\/doi.org\/10.1145\/2254064.2254075","DOI":"10.1145\/2254064.2254075"},{"key":"9996_CR74","unstructured":"John GH, Langley P (1995) Estimating continuous distributions in bayesian classifiers. In: Proceedings of the 11th Conference on Uncertainty in Artificial Intelligence, UAI 1995. Morgan Kaufmann Publishers Inc., San Francisco, pp 338\u2013345, arXiv:https:\/\/arxiv.org\/abs\/1302.4964"},{"key":"9996_CR75","unstructured":"Kalibera T, Jones R (2012) Quantifying performance changes with effect size confidence intervals. Technical Report 4\u201312, University of Kent, http:\/\/www.cs.kent.ac.uk\/pubs\/2012\/3233"},{"key":"9996_CR76","doi-asserted-by":"publisher","unstructured":"Kalibera T, Jones R (2013) Rigorous benchmarking in reasonable time. In: Proceedings of the 2013 ACM SIGPLAN International Symposium on Memory Management, ISMM 2013. ACM, New York, pp 63\u201374. https:\/\/doi.org\/10.1145\/2464157.2464160","DOI":"10.1145\/2464157.2464160"},{"key":"9996_CR77","doi-asserted-by":"publisher","unstructured":"Kaltenecker C, Grebhahn A, Siegmund N, Guo J, Apel S (2019) Distance-based sampling of software configuration spaces. In: Proceedings of the 41st IEEE\/ACM International Conference on Software Engineering. IEEE, ICSE. https:\/\/doi.org\/10.1109\/icse.2019.00112","DOI":"10.1109\/icse.2019.00112"},{"key":"9996_CR78","doi-asserted-by":"publisher","unstructured":"Kraemer HC, Morgan GA, Leech NL, Gliner JA, Vaske JJ, Harmon RJ (2003) Measures of clinical significance. J Amer Acad Child Adolesc Psych 42(12):1524\u20131529. https:\/\/doi.org\/10.1097\/00004583-200312000-00022","DOI":"10.1097\/00004583-200312000-00022"},{"issue":"260","key":"9996_CR79","doi-asserted-by":"publisher","first-page":"583","DOI":"10.1080\/01621459.1952.10483441","volume":"47","author":"WH Kruskal","year":"1952","unstructured":"Kruskal W H, Wallis W A (1952) Use of ranks in one-criterion variance analysis. J Am Stat Assoc 47(260):583\u2013621. 
https:\/\/doi.org\/10.1080\/01621459.1952.10483441","journal-title":"J Am Stat Assoc"},{"key":"9996_CR80","doi-asserted-by":"publisher","unstructured":"Laaber C, Leitner P (2018) An evaluation of open-source software microbenchmark suites for continuous performance assessment. In: Proceedings of the 15th International Conference on Mining Software Repositories, MSR 2018. ACM, New York, pp 119\u2013130. https:\/\/doi.org\/10.1145\/3196398.3196407","DOI":"10.1145\/3196398.3196407"},{"key":"9996_CR81","doi-asserted-by":"publisher","unstructured":"Laaber C, Scheuner J, Leitner P (2019) Software microbenchmarking in the cloud. How bad is it really? Empirical Software Engineering. https:\/\/doi.org\/10.1007\/s10664-019-09681-1","DOI":"10.1007\/s10664-019-09681-1"},{"key":"9996_CR82","doi-asserted-by":"publisher","unstructured":"Laaber C, W\u00fcrsten S, Gall H C, Leitner P (2020) Dynamically reconfiguring software microbenchmarks: Reducing execution time without sacrificing result quality. In: Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. ACM, ESEC\/FSE. https:\/\/doi.org\/10.1145\/3368089.3409683","DOI":"10.1145\/3368089.3409683"},{"key":"9996_CR83","doi-asserted-by":"publisher","unstructured":"Laaber C, Basmaci M, Salza P (2021) Replication package \u201dPredicting unstable software benchmarks using static source code features\u201d. https:\/\/doi.org\/10.5281\/zenodo.4783139","DOI":"10.5281\/zenodo.4783139"},{"key":"9996_CR84","doi-asserted-by":"publisher","unstructured":"Leitner P, Bezemer C P (2017) An exploratory study of the state of practice of performance testing in java-based open source projects. In: Proceedings of the 8th ACM\/SPEC on International Conference on Performance Engineering, ICPE 2017. ACM, New York, pp 373\u2013384. 
https:\/\/doi.org\/10.1145\/3030207.3030213","DOI":"10.1145\/3030207.3030213"},{"key":"9996_CR85","doi-asserted-by":"publisher","unstructured":"Leitner P, Cito J (2016) Patterns in the chaos \u2013 A study of performance variation and predictability in public IaaS clouds. ACM Trans Internet Technol 16(3):15:1\u201315:23. https:\/\/doi.org\/10.1145\/2885497","DOI":"10.1145\/2885497"},{"key":"9996_CR86","doi-asserted-by":"publisher","unstructured":"Liu Y, Xu C, Cheung S C (2014) Characterizing and detecting performance bugs for smartphone applications. In: Proceedings of the 36th IEEE\/ACM International Conference on Software Engineering, ICSE 2014. ACM, New York, pp 1013\u20131024. https:\/\/doi.org\/10.1145\/2568225.2568229","DOI":"10.1145\/2568225.2568229"},{"key":"9996_CR87","doi-asserted-by":"publisher","unstructured":"Luo Q, Hariri F, Eloussi L, Marinov D (2014) An empirical analysis of flaky tests. In: Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering, FSE 2014. ACM Press. https:\/\/doi.org\/10.1145\/2635868.2635920","DOI":"10.1145\/2635868.2635920"},{"key":"9996_CR88","doi-asserted-by":"publisher","unstructured":"Machalica M, Samylkin A, Porth M, Chandra S (2019) Predictive test selection. In: Proceedings of the 41st IEEE\/ACM International Conference on Software Engineering: Software Engineering in Practice. IEEE, ICSE-SEIP. https:\/\/doi.org\/10.1109\/icse-seip.2019.00018","DOI":"10.1109\/icse-seip.2019.00018"},{"key":"9996_CR89","unstructured":"Maricq A, Duplyakin D, Jimenez I, Maltzahn C, Stutsman R, Ricci R (2018) Taming performance variability. In: Proceedings of the 13th USENIX Conference on Operating Systems Design and Implementation, OSDI 2018. 
USENIX Association, pp 409\u2013425, https:\/\/www.usenix.org\/conference\/osdi18\/presentation\/maricq"},{"issue":"361","key":"9996_CR90","doi-asserted-by":"publisher","first-page":"194","DOI":"10.1080\/01621459.1978.10480027","volume":"73","author":"JS Maritz","year":"1978","unstructured":"Maritz J S, Jarrett R G (1978) A note on estimating the variance of the sample median. J Am Stat Assoc 73(361):194\u2013196. https:\/\/doi.org\/10.1080\/01621459.1978.10480027","journal-title":"J Am Stat Assoc"},{"issue":"2","key":"9996_CR91","doi-asserted-by":"publisher","first-page":"442","DOI":"10.1016\/0005-2795(75)90109-9","volume":"405","author":"BW Matthews","year":"1975","unstructured":"Matthews B W (1975) Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Bioch Biophys Acta (BBAxs) - Protein Struct 405(2):442\u2013451. https:\/\/doi.org\/10.1016\/0005-2795(75)90109-9","journal-title":"Bioch Biophys Acta (BBAxs) - Protein Struct"},{"issue":"3","key":"9996_CR92","doi-asserted-by":"publisher","first-page":"1644","DOI":"10.1007\/s10664-019-09795-6","volume":"25","author":"A Mazuera-Rozo","year":"2020","unstructured":"Mazuera-Rozo A, Trubiani C, Linares-V\u00e1squez M, Bavota G (2020) Investigating types and survivability of performance bugs in mobile apps. Empir Softw Eng 25(3):1644\u20131686. https:\/\/doi.org\/10.1007\/s10664-019-09795-6","journal-title":"Empir Softw Eng"},{"key":"9996_CR93","doi-asserted-by":"publisher","unstructured":"McCabe TJ (1976) A complexity measure. IEEE Trans Softw Eng SE-2(4):308\u2013320. https:\/\/doi.org\/10.1109\/tse.1976.233837","DOI":"10.1109\/tse.1976.233837"},{"issue":"4","key":"9996_CR94","doi-asserted-by":"publisher","first-page":"70","DOI":"10.1109\/MIC.2002.1020328","volume":"6","author":"DA Menasc\u00e9","year":"2002","unstructured":"Menasc\u00e9 D A (2002) Load testing of web sites. IEEE Internet Comput 6(4):70\u201374. 
https:\/\/doi.org\/10.1109\/MIC.2002.1020328","journal-title":"IEEE Internet Comput"},{"key":"9996_CR95","doi-asserted-by":"publisher","unstructured":"Mostafa S, Wang X, Xie T (2017) PerfRanker: Prioritization of performance regression tests for collection-intensive software. In: Proceedings of the 26th ACM SIGSOFT International Symposium on Software Testing and Analysis, ISSTA 2017. ACM, New York, pp 23\u201334. https:\/\/doi.org\/10.1145\/3092703.3092725","DOI":"10.1145\/3092703.3092725"},{"key":"9996_CR96","doi-asserted-by":"publisher","unstructured":"M\u00fchlbauer S, Apel S, Siegmund N (2020) Identifying software performance changes across variants and versions. In: Proceedings of the 35th IEEE\/ACM International Conference on Automated Software Engineering. ACM, ASE. https:\/\/doi.org\/10.1145\/3324884.3416573","DOI":"10.1145\/3324884.3416573"},{"key":"9996_CR97","doi-asserted-by":"publisher","unstructured":"Mytkowicz T, Diwan A, Hauswirth M, Sweeney P F (2009) Producing wrong data without doing anything obviously wrong! In: Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2009. ACM, New York, pp 265\u2013276. https:\/\/doi.org\/10.1145\/1508244.1508275","DOI":"10.1145\/1508244.1508275"},{"key":"9996_CR98","doi-asserted-by":"publisher","unstructured":"Nguyen T H D, Nagappan M, Hassan A E, Nasser M, Flora P (2014) An industrial case study of automatically identifying performance regression-causes. In: Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014. ACM, New York, pp 232\u2013241. https:\/\/doi.org\/10.1145\/2597073.2597092","DOI":"10.1145\/2597073.2597092"},{"key":"9996_CR99","doi-asserted-by":"publisher","unstructured":"Nistor A, Song L, Marinov D, Lu S (2013) Toddler: Detecting performance problems via similar memory-access patterns. In: Proceedings of the 35th IEEE\/ACM International Conference on Software Engineering, ICSE 2013. 
IEEE Press, Piscataway, pp 562\u2013571. https:\/\/doi.org\/10.1109\/ICSE.2013.6606602","DOI":"10.1109\/ICSE.2013.6606602"},{"key":"9996_CR100","doi-asserted-by":"publisher","unstructured":"Nistor A, Chang PC, Radoi C, Lu S (2015) Caramel: Detecting and fixing performance problems that have non-intrusive fixes. In: Proceedings of the 37th IEEE\/ACM International Conference on Software Engineering, ICSE 2015, vol 1. IEEE Press, Piscataway, pp 902\u2013912. https:\/\/doi.org\/10.1109\/ICSE.2015.100","DOI":"10.1109\/ICSE.2015.100"},{"key":"9996_CR101","doi-asserted-by":"publisher","unstructured":"de Oliveira A B, Petkovich J C, Reidemeister T, Fischmeister S (2013) DataMill: Rigorous performance evaluation made easy. In: Proceedings of the 4th ACM\/SPEC International Conference on Performance Engineering, ICPE 2013. ACM, New York, pp 137\u2013148. https:\/\/doi.org\/10.1145\/2479871.2479892","DOI":"10.1145\/2479871.2479892"},{"key":"9996_CR102","doi-asserted-by":"publisher","unstructured":"de Oliveira AB, Fischmeister S, Diwan A, Hauswirth M, Sweeney PF (2017) Perphecy: Performance regression test selection made simple but effective. In: Proceedings of the 10th IEEE International Conference on Software Testing, Verification and Validation, ICST 2017, pp 103\u2013113. https:\/\/doi.org\/10.1109\/ICST.2017.17","DOI":"10.1109\/ICST.2017.17"},{"key":"9996_CR103","doi-asserted-by":"publisher","unstructured":"Park C, Kim H, Wang M (2020) Investigation of finite-sample properties of robust location and scale estimators. Commun Stat Simul Comput:1\u201327. https:\/\/doi.org\/10.1080\/03610918.2019.1699114","DOI":"10.1080\/03610918.2019.1699114"},{"key":"9996_CR104","unstructured":"Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E (2011) Scikit-learn: Machine learning in python. J Mach Learn Res 12(85):2825\u20132830. 
http:\/\/jmlr.org\/papers\/v12\/pedregosa11a.html"},{"issue":"1","key":"9996_CR105","doi-asserted-by":"publisher","first-page":"81","DOI":"10.1007\/bf00116251","volume":"1","author":"JR Quinlan","year":"1986","unstructured":"Quinlan J R (1986) Induction of decision trees. Mach Learn 1(1):81\u2013106. https:\/\/doi.org\/10.1007\/bf00116251","journal-title":"Mach Learn"},{"key":"9996_CR106","doi-asserted-by":"publisher","unstructured":"Rodriguez-Cancio M, Combemale B, Baudry B (2016) Automatic microbenchmark generation to prevent dead code elimination and constant folding. In: Proceedings of the 31st IEEE\/ACM International Conference on Automated Software Engineering, ASE 2016. Association for Computing Machinery, New York, pp 132\u2013143. https:\/\/doi.org\/10.1145\/2970276.2970346","DOI":"10.1145\/2970276.2970346"},{"key":"9996_CR107","unstructured":"Romano J, Kromrey J, Coraggio J, Skowronek J (2006) Appropriate statistics for ordinal level data: Should we really be using t-test and Cohen\u2019s d for evaluating group differences on the NSSE and other surveys? In: Annual Meeting of the Florida Association of Institutional Research, pp 1\u20133"},{"key":"9996_CR108","doi-asserted-by":"publisher","unstructured":"Rubin D B (1987) Multiple imputation for nonresponse in surveys. Wiley. https:\/\/doi.org\/10.1002\/9780470316696","DOI":"10.1002\/9780470316696"},{"issue":"4","key":"9996_CR109","doi-asserted-by":"publisher","first-page":"296","DOI":"10.1109\/72.80266","volume":"1","author":"DW Ruck","year":"1990","unstructured":"Ruck D W, Rogers S K, Kabrisky M, Oxley M E, Suter B W (1990) The multilayer perceptron as an approximation to a Bayes optimal discriminant function. IEEE Trans Neural Netw 1(4):296\u2013298. https:\/\/doi.org\/10.1109\/72.80266","journal-title":"IEEE Trans Neural Netw"},{"key":"9996_CR110","doi-asserted-by":"publisher","unstructured":"Sandoval Alcocer J P, Bergel A (2015) Tracking down performance variation against source code evolution. 
In: Proceedings of the 11th Symposium on Dynamic Languages, DLS 2015. ACM, New York, pp 129\u2013139. https:\/\/doi.org\/10.1145\/2816707.2816718","DOI":"10.1145\/2816707.2816718"},{"key":"9996_CR111","doi-asserted-by":"publisher","unstructured":"Sandoval Alcocer J P, Bergel A, Valente M T (2016) Learning from source code history to identify performance failures. In: Proceedings of the 7th ACM\/SPEC on International Conference on Performance Engineering, ICPE 2016. ACM, New York, pp 37\u201348. https:\/\/doi.org\/10.1145\/2851553.2851571","DOI":"10.1145\/2851553.2851571"},{"key":"9996_CR112","doi-asserted-by":"publisher","first-page":"102415","DOI":"10.1016\/j.scico.2020.102415","volume":"191","author":"JP Sandoval Alcocer","year":"2020","unstructured":"Sandoval Alcocer J P, Bergel A, Valente M T (2020) Prioritizing versions for performance regression testing: The Pharo case. Sci Comput Program 191:102415. https:\/\/doi.org\/10.1016\/j.scico.2020.102415","journal-title":"Sci Comput Program"},{"key":"9996_CR113","doi-asserted-by":"publisher","unstructured":"Scheuner J, Leitner P (2018) Estimating cloud application performance based on micro-benchmark profiling. In: Proceedings of the 11th IEEE International Conference on Cloud Computing. IEEE, CLOUD 2018. https:\/\/doi.org\/10.1109\/cloud.2018.00019","DOI":"10.1109\/cloud.2018.00019"},{"key":"9996_CR114","doi-asserted-by":"publisher","unstructured":"Selakovic M, Pradel M (2016) Performance issues and optimizations in JavaScript: An empirical study. In: Proceedings of the 38th IEEE\/ACM International Conference on Software Engineering, ICSE 2016. ACM, New York, pp 61\u201372. https:\/\/doi.org\/10.1145\/2884781.2884829","DOI":"10.1145\/2884781.2884829"},{"key":"9996_CR115","unstructured":"Shipilev A (2018) Reconsider defaults for warmup and measurement iteration counts, durations. 
https:\/\/bugs.openjdk.java.net\/browse\/CODETOOLS-7902165"},{"key":"9996_CR116","doi-asserted-by":"publisher","unstructured":"Shivers O (1988) Control flow analysis in Scheme. In: Proceedings of the 1988 ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 1988. ACM, New York, pp 164\u2013174. https:\/\/doi.org\/10.1145\/960116.54007","DOI":"10.1145\/960116.54007"},{"key":"9996_CR117","doi-asserted-by":"publisher","unstructured":"Siegmund N, Grebhahn A, Apel S, K\u00e4stner C (2015) Performance-influence models for highly configurable systems. In: Proceedings of the 10th Joint Meeting on Foundations of Software Engineering. ACM, ESEC\/FSE. https:\/\/doi.org\/10.1145\/2786805.2786845","DOI":"10.1145\/2786805.2786845"},{"key":"9996_CR118","doi-asserted-by":"publisher","unstructured":"Song L, Lu S (2017) Performance diagnosis for inefficient loops. In: Proceedings of the 39th IEEE\/ACM International Conference on Software Engineering. IEEE, ICSE. https:\/\/doi.org\/10.1109\/icse.2017.41","DOI":"10.1109\/icse.2017.41"},{"key":"9996_CR119","doi-asserted-by":"publisher","unstructured":"Stefan P, Hork\u00fd V, Bulej L, T\u016fma P (2017) Unit testing performance in Java projects: Are we there yet? In: Proceedings of the 8th ACM\/SPEC on International Conference on Performance Engineering, ICPE 2017. ACM, New York, pp 401\u2013412. https:\/\/doi.org\/10.1145\/3030207.3030226","DOI":"10.1145\/3030207.3030226"},{"issue":"3","key":"9996_CR120","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3241743","volume":"27","author":"KJ Stol","year":"2018","unstructured":"Stol K J, Fitzgerald B (2018) The ABC of software engineering research. ACM Trans Softw Eng Methodol 27(3):1\u201351. 
https:\/\/doi.org\/10.1145\/3241743","journal-title":"ACM Trans Softw Eng Methodol"},{"issue":"7","key":"9996_CR121","doi-asserted-by":"publisher","first-page":"683","DOI":"10.1109\/TSE.2018.2794977","volume":"45","author":"C Tantithamthavorn","year":"2019","unstructured":"Tantithamthavorn C, McIntosh S, Hassan A E, Matsumoto K (2019) The impact of automated parameter optimization on defect prediction models. IEEE Trans Softw Eng 45(7):683\u2013711. https:\/\/doi.org\/10.1109\/TSE.2018.2794977","journal-title":"IEEE Trans Softw Eng"},{"issue":"11","key":"9996_CR122","doi-asserted-by":"publisher","first-page":"1200","DOI":"10.1109\/TSE.2018.2876537","volume":"46","author":"C Tantithamthavorn","year":"2020","unstructured":"Tantithamthavorn C, Hassan A E, Matsumoto K (2020) The impact of class rebalancing techniques on the performance and interpretation of defect prediction models. IEEE Trans Softw Eng 46(11):1200\u20131219. https:\/\/doi.org\/10.1109\/tse.2018.2876537","journal-title":"IEEE Trans Softw Eng"},{"issue":"5","key":"9996_CR123","doi-asserted-by":"publisher","first-page":"540","DOI":"10.1007\/s10664-008-9103-7","volume":"14","author":"B Turhan","year":"2009","unstructured":"Turhan B, Menzies T, Bener A B, Di Stefano J (2009) On the relative value of cross-company and within-company data for defect prediction. Empir Softw Eng 14(5):540\u2013578. https:\/\/doi.org\/10.1007\/s10664-008-9103-7","journal-title":"Empir Softw Eng"},{"issue":"3","key":"9996_CR124","doi-asserted-by":"publisher","first-page":"219","DOI":"10.1177\/0962280206074463","volume":"16","author":"S van Buuren","year":"2007","unstructured":"van Buuren S (2007) Multiple imputation of discrete and continuous data by fully conditional specification. Stat Methods Med Res 16(3):219\u2013242. 
https:\/\/doi.org\/10.1177\/0962280206074463","journal-title":"Stat Methods Med Res"},{"key":"9996_CR125","doi-asserted-by":"publisher","unstructured":"van Buuren S, Groothuis-Oudshoorn K (2011) mice: Multivariate imputation by chained equations in R. J Stat Softw 45(3). https:\/\/doi.org\/10.18637\/jss.v045.i03","DOI":"10.18637\/jss.v045.i03"},{"issue":"2","key":"9996_CR126","doi-asserted-by":"publisher","first-page":"101","DOI":"10.2307\/1165329","volume":"25","author":"A Vargha","year":"2000","unstructured":"Vargha A, Delaney H D (2000) A critique and improvement of the \u201cCL\u201d common language effect size statistics of McGraw and Wong. J Educ Behav Stat 25(2):101\u2013132. https:\/\/doi.org\/10.2307\/1165329","journal-title":"J Educ Behav Stat"},{"key":"9996_CR127","doi-asserted-by":"publisher","unstructured":"Wang W, Tian N, Huang S, He S, Srivastava A, Soffa M L, Pollock L (2018) Testing cloud applications under cloud-uncertainty performance effects. In: Proceedings of the 11th IEEE International Conference on Software Testing, Verification and Validation, ICST 2018, pp 81\u201392. https:\/\/doi.org\/10.1109\/ICST.2018.00018","DOI":"10.1109\/ICST.2018.00018"},{"issue":"12","key":"9996_CR128","doi-asserted-by":"publisher","first-page":"1147","DOI":"10.1109\/32.888628","volume":"26","author":"EJ Weyuker","year":"2000","unstructured":"Weyuker E J, Vokolos F I (2000) Experience with performance testing of software systems: Issues, an approach, and case study. IEEE Trans Softw Eng 26(12):1147\u20131156. https:\/\/doi.org\/10.1109\/32.888628","journal-title":"IEEE Trans Softw Eng"},{"issue":"6","key":"9996_CR129","doi-asserted-by":"publisher","first-page":"80","DOI":"10.2307\/3001968","volume":"1","author":"F Wilcoxon","year":"1945","unstructured":"Wilcoxon F (1945) Individual comparisons by ranking methods. Biometr Bullet 1(6):80. 
https:\/\/doi.org\/10.2307\/3001968","journal-title":"Biometr Bullet"},{"key":"9996_CR130","doi-asserted-by":"publisher","unstructured":"Woodside M, Franks G, Petriu DC (2007) The future of software performance engineering. In: Future of software engineering. IEEE, FOSE. https:\/\/doi.org\/10.1109\/fose.2007.32","DOI":"10.1109\/fose.2007.32"},{"issue":"2","key":"9996_CR131","doi-asserted-by":"publisher","first-page":"67","DOI":"10.1002\/stv.430","volume":"22","author":"S Yoo","year":"2012","unstructured":"Yoo S, Harman M (2012) Regression testing minimization, selection and prioritization: A survey. Softw Test Verif Reliab 22(2):67\u2013120. https:\/\/doi.org\/10.1002\/stv.430","journal-title":"Softw Test Verif Reliab"},{"issue":"5","key":"9996_CR132","doi-asserted-by":"publisher","first-page":"3034","DOI":"10.1007\/s10664-017-9578-1","volume":"23","author":"T Yu","year":"2017","unstructured":"Yu T, Pradel M (2017) Pinpointing and repairing performance bottlenecks in concurrent programs. Empir Softw Eng 23(5):3034\u20133071. https:\/\/doi.org\/10.1007\/s10664-017-9578-1","journal-title":"Empir Softw Eng"},{"key":"9996_CR133","doi-asserted-by":"publisher","unstructured":"Zhang L (2018) Hybrid regression test selection. In: Proceedings of the 40th IEEE\/ACM International Conference on Software Engineering, ICSE 2018. ACM, New York, pp 199\u2013209. https:\/\/doi.org\/10.1145\/3180155.3180198","DOI":"10.1145\/3180155.3180198"},{"key":"9996_CR134","doi-asserted-by":"publisher","unstructured":"Zhao Y, Xiao L, Wang X, Sun L, Chen B, Liu Y, Bondi AB (2020) How are performance issues caused and resolved?\u2013An empirical study from a design perspective. In: Proceedings of the 11th ACM\/SPEC International Conference on Performance Engineering. ACM, ICPE. 
https:\/\/doi.org\/10.1145\/3358960.3379130","DOI":"10.1145\/3358960.3379130"},{"key":"9996_CR135","doi-asserted-by":"publisher","unstructured":"Zimmermann T, Nagappan N, Gall H, Giger E, Murphy B (2009) Cross-project defect prediction. In: Proceedings of the 7th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on The Foundations of Software Engineering. ACM Press, ESEC\/FSE. https:\/\/doi.org\/10.1145\/1595696.1595713","DOI":"10.1145\/1595696.1595713"}],"container-title":["Empirical Software Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10664-021-09996-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10664-021-09996-y\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10664-021-09996-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,10,28]],"date-time":"2021-10-28T04:12:47Z","timestamp":1635394367000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10664-021-09996-y"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,8,18]]},"references-count":135,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2021,11]]}},"alternative-id":["9996"],"URL":"https:\/\/doi.org\/10.1007\/s10664-021-09996-y","relation":{},"ISSN":["1382-3256","1573-7616"],"issn-type":[{"value":"1382-3256","type":"print"},{"value":"1573-7616","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,8,18]]},"assertion":[{"value":"1 June 2021","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"18 August 
2021","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"114"}}