{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,30]],"date-time":"2026-04-30T05:01:23Z","timestamp":1777525283058,"version":"3.51.4"},"publisher-location":"New York, NY, USA","reference-count":56,"publisher":"ACM","license":[{"start":{"date-parts":[[2022,7,18]],"date-time":"2022-07-18T00:00:00Z","timestamp":1658102400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,7,18]]},"DOI":"10.1145\/3533767.3534405","type":"proceedings-article","created":{"date-parts":[[2022,7,15]],"date-time":"2022-07-15T14:28:50Z","timestamp":1657895330000},"page":"101-113","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":36,"title":["On the use of evaluation measures for defect prediction studies"],"prefix":"10.1145","author":[{"given":"Rebecca","family":"Moussa","sequence":"first","affiliation":[{"name":"University College London, UK"}]},{"given":"Federica","family":"Sarro","sequence":"additional","affiliation":[{"name":"University College London, UK"}]}],"member":"320","published-online":{"date-parts":[[2022,7,18]]},"reference":[{"key":"e_1_3_2_1_1_1","volume-title":"Massimiliano Di Penta, and Davide Falessi","author":"Ahluwalia Aalok","year":"2020","unstructured":"Aalok Ahluwalia , Massimiliano Di Penta, and Davide Falessi . 2020 . On the Need of Removing Last Releases of Data When Using or Validating Defect Prediction Models . arXiv preprint arXiv:2003.14376. Aalok Ahluwalia, Massimiliano Di Penta, and Davide Falessi. 2020. On the Need of Removing Last Releases of Data When Using or Validating Defect Prediction Models. arXiv preprint arXiv:2003.14376."},{"key":"e_1_3_2_1_2_1","first-page":"219","article-title":"A hitchhiker\u2019s guide to statistical tests for assessing randomized algorithms in software engineering","volume":"24","author":"Arcuri Andrea","year":"2014","unstructured":"Andrea Arcuri and Lionel Briand . 2014 . A hitchhiker\u2019s guide to statistical tests for assessing randomized algorithms in software engineering . STVR , 24 , 3 (2014), 219 \u2013 250 . Andrea Arcuri and Lionel Briand. 2014. A hitchhiker\u2019s guide to statistical tests for assessing randomized algorithms in software engineering. STVR, 24, 3 (2014), 219\u2013250.","journal-title":"STVR"},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jss.2009.06.055"},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10664-020-09878-9"},{"key":"e_1_3_2_1_5_1","article-title":"Evaluation measures for models assessment over imbalanced data sets","volume":"3","author":"Bekkar Mohamed","year":"2013","unstructured":"Mohamed Bekkar , Hassiba Kheliouane Djemaa , and Taklit Akrouf Alitouche . 2013 . Evaluation measures for models assessment over imbalanced data sets . J Inf Eng Appl , 3 , 10 (2013). Mohamed Bekkar, Hassiba Kheliouane Djemaa, and Taklit Akrouf Alitouche. 2013. Evaluation measures for models assessment over imbalanced data sets. J Inf Eng Appl, 3, 10 (2013).","journal-title":"J Inf Eng Appl"},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"crossref","unstructured":"K. E. Bennin K. Toda Y. Kamei J. Keung A. Monden and N. Ubayashi. 2016. Empirical Evaluation of Cross-Release Effort-Aware Defect Prediction Models. In Procs. of QRS. 214\u2013221. \t\t\t\t\t  K. E. Bennin K. Toda Y. Kamei J. Keung A. Monden and N. Ubayashi. 2016. Empirical Evaluation of Cross-Release Effort-Aware Defect Prediction Models. In Procs. of QRS. 214\u2013221.","DOI":"10.1109\/QRS.2016.33"},{"key":"e_1_3_2_1_7_1","unstructured":"Henry Bodkin. 2019. https:\/\/www.telegraph.co.uk\/news\/2018\/10\/19\/nhs-blunder-put-10000-patients-risk-wrong-prescription\/ \t\t\t\t\t  Henry Bodkin. 2019. https:\/\/www.telegraph.co.uk\/news\/2018\/10\/19\/nhs-blunder-put-10000-patients-risk-wrong-prescription\/"},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"crossref","unstructured":"David Bowes Tracy Hall and David Gray. 2012. Comparing the performance of fault prediction models which report multiple performance measures: recomputing the confusion matrix. In Procs. of PROMISE. 109\u2013118. \t\t\t\t\t  David Bowes Tracy Hall and David Gray. 2012. Comparing the performance of fault prediction models which report multiple performance measures: recomputing the confusion matrix. In Procs. of PROMISE. 109\u2013118.","DOI":"10.1145\/2365324.2365338"},{"key":"e_1_3_2_1_9_1","volume-title":"Random forests. Machine learning, 45, 1","author":"Breiman Leo","year":"2001","unstructured":"Leo Breiman . 2001. Random forests. Machine learning, 45, 1 ( 2001 ), 5\u201332. Leo Breiman. 2001. Random forests. Machine learning, 45, 1 (2001), 5\u201332."},{"key":"e_1_3_2_1_10_1","volume-title":"A tutorial on support vector machines for pattern recognition. Data mining and knowledge discovery, 2, 2","author":"Burges Christopher JC","year":"1998","unstructured":"Christopher JC Burges . 1998. A tutorial on support vector machines for pattern recognition. Data mining and knowledge discovery, 2, 2 ( 1998 ), 121\u2013167. Christopher JC Burges. 1998. A tutorial on support vector machines for pattern recognition. Data mining and knowledge discovery, 2, 2 (1998), 121\u2013167."},{"key":"e_1_3_2_1_11_1","volume-title":"The advantages of the MCC over F1 score and accuracy in binary classification evaluation. BMC genomics, 21, 1","author":"Chicco Davide","year":"2020","unstructured":"Davide Chicco and Giuseppe Jurman . 2020. The advantages of the MCC over F1 score and accuracy in binary classification evaluation. BMC genomics, 21, 1 ( 2020 ), 6. Davide Chicco and Giuseppe Jurman. 2020. The advantages of the MCC over F1 score and accuracy in binary classification evaluation. BMC genomics, 21, 1 (2020), 6."},{"key":"e_1_3_2_1_12_1","volume-title":"Mathematical methods of statistics","author":"Cramir Harald","unstructured":"Harald Cramir . 1946. Mathematical methods of statistics . Princeton U. Press . Harald Cramir. 1946. Mathematical methods of statistics. Princeton U. Press."},{"key":"e_1_3_2_1_13_1","unstructured":"Peter Flach and Meelis Kull. 2015. Precision-recall-gain curves: PR analysis done right. In Advances in neural information processing systems. 838\u2013846. \t\t\t\t\t  Peter Flach and Meelis Kull. 2015. Precision-recall-gain curves: PR analysis done right. In Advances in neural information processing systems. 838\u2013846."},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/1882471.1882479"},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"crossref","unstructured":"Takafumi Fukushima Yasutaka Kamei Shane McIntosh Kazuhiro Yamashita and Naoyasu Ubayashi. 2014. An Empirical Study of Just-in-Time Defect Prediction Using Cross-Project Models. In Procs. of MSR. 172\u2013181. \t\t\t\t\t  Takafumi Fukushima Yasutaka Kamei Shane McIntosh Kazuhiro Yamashita and Naoyasu Ubayashi. 2014. An Empirical Study of Just-in-Time Defect Prediction Using Cross-Project Models. In Procs. of MSR. 172\u2013181.","DOI":"10.1145\/2597073.2597075"},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2011.103"},{"key":"e_1_3_2_1_17_1","volume-title":"Measuring classifier performance: a coherent alternative to the area under the ROC curve. Machine learning, 77, 1","author":"Hand David J","year":"2009","unstructured":"David J Hand . 2009. Measuring classifier performance: a coherent alternative to the area under the ROC curve. Machine learning, 77, 1 ( 2009 ), 103\u2013123. David J Hand. 2009. Measuring classifier performance: a coherent alternative to the area under the ROC curve. Machine learning, 77, 1 (2009), 103\u2013123."},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"crossref","unstructured":"Mark Harman Syed Islam Yue Jia Leandro L Minku Federica Sarro and Komsan Srivisut. 2014. Less is more: Temporal fault predictive performance over multiple hadoop releases. In Procs. of SSBSE. 240\u2013246. \t\t\t\t\t  Mark Harman Syed Islam Yue Jia Leandro L Minku Federica Sarro and Komsan Srivisut. 2014. Less is more: Temporal fault predictive performance over multiple hadoop releases. In Procs. of SSBSE. 240\u2013246.","DOI":"10.1007\/978-3-319-09940-8_19"},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2008.239"},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2017.2724538"},{"key":"e_1_3_2_1_21_1","volume-title":"A Survey of Performance Optimization for Mobile Applications","author":"Hort Max","unstructured":"Max Hort , Maria Kechagia , Federica Sarro , and Mark Harman . 2021. A Survey of Performance Optimization for Mobile Applications . IEEE TSE. Max Hort, Maria Kechagia, Federica Sarro, and Mark Harman. 2021. A Survey of Performance Optimization for Mobile Applications. IEEE TSE."},{"key":"e_1_3_2_1_22_1","volume-title":"A Survey of Performance Optimization for Mobile Applications","author":"Hort Max","unstructured":"Max Hort , Maria Kechagia , Federica Sarro , and Mark Harman . 2021. A Survey of Performance Optimization for Mobile Applications . IEEE TSE. Max Hort, Maria Kechagia, Federica Sarro, and Mark Harman. 2021. A Survey of Performance Optimization for Mobile Applications. IEEE TSE."},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10664-008-9079-3"},{"key":"e_1_3_2_1_24_1","volume-title":"Yves Le Traon, and Mark Harman","author":"Jimenez Matthieu","year":"2019","unstructured":"Matthieu Jimenez , Renaud Rwemalika , Mike Papadakis , Federica Sarro , Yves Le Traon, and Mark Harman . 2019 . The importance of accounting for real-world labelling when predicting software vulnerabilities. In Procs. of ESEC\/FSE. 695\u2013705. Matthieu Jimenez, Renaud Rwemalika, Mike Papadakis, Federica Sarro, Yves Le Traon, and Mark Harman. 2019. The importance of accounting for real-world labelling when predicting software vulnerabilities. In Procs. of ESEC\/FSE. 695\u2013705."},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10664-015-9400-x"},{"key":"e_1_3_2_1_26_1","volume-title":"Defect Prediction: Accomplishments and Future Challenges. In Procs. of SANER. 33\u201345.","author":"Kamei Y.","year":"2016","unstructured":"Y. Kamei and E. Shihab . 2016 . Defect Prediction: Accomplishments and Future Challenges. In Procs. of SANER. 33\u201345. Y. Kamei and E. Shihab. 2016. Defect Prediction: Accomplishments and Future Challenges. In Procs. of SANER. 33\u201345."},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/MS.2014.50"},{"key":"e_1_3_2_1_28_1","volume-title":"Logistic regression","author":"Kleinbaum David G","unstructured":"David G Kleinbaum , K Dietz , M Gail , Mitchel Klein , and Mitchell Klein . 2002. Logistic regression . Springer . David G Kleinbaum, K Dietz, M Gail, Mitchel Klein, and Mitchell Klein. 2002. Logistic regression. Springer."},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2016.2630689"},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1016\/0005-2795(75)90109-9"},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2007.70721"},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2007.256941"},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10664-011-9193-5"},{"key":"e_1_3_2_1_34_1","unstructured":"Mariam El Mezouar Feng Zhang and Ying Zou. 2016. Local versus Global Models for Effort-Aware Defect Prediction. In Procs. of CASCON. 178\u2013187. \t\t\t\t\t  Mariam El Mezouar Feng Zhang and Ying Zou. 2016. Local versus Global Models for Effort-Aware Defect Prediction. In Procs. of CASCON. 178\u2013187."},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10664-020-09861-4"},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"crossref","unstructured":"Rebecca Moussa and Federica Sarro. 2022. On-line appendix - On the Use of Evaluation Measures for Defect Prediction Models. https:\/\/github.com\/SOLAR-group\/dpevalmeasures \t\t\t\t\t  Rebecca Moussa and Federica Sarro. 2022. On-line appendix - On the Use of Evaluation Measures for Defect Prediction Models. https:\/\/github.com\/SOLAR-group\/dpevalmeasures","DOI":"10.1145\/3533767.3534405"},{"key":"e_1_3_2_1_37_1","unstructured":"C. Ni X. Xia D. Lo X. Chen and Q. Gu. 2020. Revisiting Supervised and Unsupervised Methods for Effort-Aware Cross-Project Defect Prediction. IEEE TSE. \t\t\t\t\t  C. Ni X. Xia D. Lo X. Chen and Q. Gu. 2020. Revisiting Supervised and Unsupervised Methods for Effort-Aware Cross-Project Defect Prediction. IEEE TSE."},{"key":"e_1_3_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jss.2018.12.001"},{"key":"e_1_3_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jss.2019.110493"},{"key":"e_1_3_2_1_40_1","doi-asserted-by":"crossref","unstructured":"Jean Petri\u0107 David Bowes Tracy Hall Bruce Christianson and Nathan Baddoo. 2016. The jinx on the NASA software defect data sets. In Procs. of EASE. 1\u20135. \t\t\t\t\t  Jean Petri\u0107 David Bowes Tracy Hall Bruce Christianson and Nathan Baddoo. 2016. The jinx on the NASA software defect data sets. In Procs. of EASE. 1\u20135.","DOI":"10.1145\/2915970.2916007"},{"key":"e_1_3_2_1_41_1","unstructured":"David Martin Powers. 2011. Evaluation: from precision recall and F-measure to ROC informedness markedness and correlation. \t\t\t\t\t  David Martin Powers. 2011. Evaluation: from precision recall and F-measure to ROC informedness markedness and correlation."},{"key":"e_1_3_2_1_42_1","doi-asserted-by":"crossref","unstructured":"Federica Sarro Mark Harman Yue Jia and Yuanyuan Zhang. 2018. Customer rating reactions can be predicted purely using app features. In Procs. of RE. 76\u201387. \t\t\t\t\t  Federica Sarro Mark Harman Yue Jia and Yuanyuan Zhang. 2018. Customer rating reactions can be predicted purely using app features. In Procs. of RE. 76\u201387.","DOI":"10.1109\/RE.2018.00018"},{"key":"e_1_3_2_1_43_1","doi-asserted-by":"crossref","unstructured":"Federica Sarro Alessio Petrozziello and Mark Harman. 2016. Multi-objective software effort estimation. In Procs. of ICSE. 619\u2013630. \t\t\t\t\t  Federica Sarro Alessio Petrozziello and Mark Harman. 2016. Multi-objective software effort estimation. In Procs. of ICSE. 619\u2013630.","DOI":"10.1145\/2884781.2884830"},{"key":"e_1_3_2_1_44_1","unstructured":"J. Sayyad Shirabad and T.J. Menzies. 2005. The PROMISE Repository of Software Engineering Databases.. http:\/\/promise.site.uottawa.ca\/SERepository \t\t\t\t\t  J. Sayyad Shirabad and T.J. Menzies. 2005. The PROMISE Repository of Software Engineering Databases.. http:\/\/promise.site.uottawa.ca\/SERepository"},{"key":"e_1_3_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2014.2322358"},{"key":"e_1_3_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2018.2836442"},{"key":"e_1_3_2_1_47_1","first-page":"822","article-title":"Local versus Global Lessons for Defect Prediction and Effort Estimation","volume":"39","author":"Turhan B.","year":"2013","unstructured":"B. Turhan , T. Zimmermann , F. Shull , L. Layman , A. Marcus , A. Butcher , D. Cok , and T. Menzies . 2013 . Local versus Global Lessons for Defect Prediction and Effort Estimation . IEEE TSE , 39 , 06 (2013), 822 \u2013 834 . issn:1939-3520 B. Turhan, T. Zimmermann, F. Shull, L. Layman, A. Marcus, A. Butcher, D. Cok, and T. Menzies. 2013. Local versus Global Lessons for Defect Prediction and Effort Estimation. IEEE TSE, 39, 06 (2013), 822\u2013834. issn:1939-3520","journal-title":"IEEE TSE"},{"key":"e_1_3_2_1_48_1","doi-asserted-by":"crossref","unstructured":"Mauno Vihinen. 2012. How to evaluate performance of prediction methods? Measures and their interpretation in variation effect analysis. In BMC genomics. 13. \t\t\t\t\t  Mauno Vihinen. 2012. How to evaluate performance of prediction methods? Measures and their interpretation in variation effect analysis. In BMC genomics. 13.","DOI":"10.1186\/1471-2164-13-S4-S2"},{"key":"e_1_3_2_1_49_1","volume-title":"Predicting Defective Lines Using a Model-Agnostic Technique","author":"Wattanakriengkrai Supatsara","unstructured":"Supatsara Wattanakriengkrai , Patanamon Thongtanunam , Hideaki Tantithamthavorn , Chakkrit Hata , and Kenichi Matsumoto . 2020. Predicting Defective Lines Using a Model-Agnostic Technique . In IEEE IEEE TSE. Supatsara Wattanakriengkrai, Patanamon Thongtanunam, Hideaki Tantithamthavorn, Chakkrit Hata, and Kenichi Matsumoto. 2020. Predicting Defective Lines Using a Model-Agnostic Technique. In IEEE IEEE TSE."},{"key":"e_1_3_2_1_50_1","unstructured":"Ian H Witten Eibe Frank and Mark A Hall. 2005. Practical machine learning tools and techniques. Morgan Kaufmann 578. \t\t\t\t\t  Ian H Witten Eibe Frank and Mark A Hall. 2005. Practical machine learning tools and techniques. Morgan Kaufmann 578."},{"key":"e_1_3_2_1_51_1","doi-asserted-by":"crossref","unstructured":"Xiao Xuan David Lo Xin Xia and Yuan Tian. 2015. Evaluating Defect Prediction Approaches Using a Massive Set of Metrics: An Empirical Study. In Procs. of ACM SAC. 1644\u20131647. \t\t\t\t\t  Xiao Xuan David Lo Xin Xia and Yuan Tian. 2015. Evaluating Defect Prediction Approaches Using a Massive Set of Metrics: An Empirical Study. In Procs. of ACM SAC. 1644\u20131647.","DOI":"10.1145\/2695664.2695959"},{"key":"e_1_3_2_1_52_1","unstructured":"M. Yan X. Xia Y. Fan A. E. Hassan D. Lo and S. Li. 2020. Just-In-Time Defect Identification and Localization: A Two-Phase Framework. IEEE TSE 1\u20131. \t\t\t\t\t  M. Yan X. Xia Y. Fan A. E. Hassan D. Lo and S. Li. 2020. Just-In-Time Defect Identification and Localization: A Two-Phase Framework. IEEE TSE 1\u20131."},{"key":"e_1_3_2_1_53_1","doi-asserted-by":"crossref","unstructured":"Meng Yan Xin Xia Yuanrui Fan David Lo Ahmed E. Hassan and Xindong Zhang. 2020. Effort-Aware Just-in-Time Defect Identification in Practice: A Case Study at Alibaba. ACM 1308\u20131319. \t\t\t\t\t  Meng Yan Xin Xia Yuanrui Fan David Lo Ahmed E. Hassan and Xindong Zhang. 2020. Effort-Aware Just-in-Time Defect Identification in Practice: A Case Study at Alibaba. ACM 1308\u20131319.","DOI":"10.1145\/3368089.3417048"},{"key":"e_1_3_2_1_54_1","unstructured":"Jingxiu Yao and Martin Shepperd. 2020. Assessing software defection prediction performance: why using the Matthews correlation coefficient matters. 120\u2013129. \t\t\t\t\t  Jingxiu Yao and Martin Shepperd. 2020. Assessing software defection prediction performance: why using the Matthews correlation coefficient matters. 120\u2013129."},{"key":"e_1_3_2_1_55_1","doi-asserted-by":"crossref","unstructured":"Suraj Yatish Jirayus Jiarpakdee Patanamon Thongtanunam and Chakkrit Tantithamthavorn. 2019. Mining software defects: should we consider affected releases? In Procs. of ICSE. 654\u2013665. \t\t\t\t\t  Suraj Yatish Jirayus Jiarpakdee Patanamon Thongtanunam and Chakkrit Tantithamthavorn. 2019. Mining software defects: should we consider affected releases? In Procs. of ICSE. 654\u2013665.","DOI":"10.1109\/ICSE.2019.00075"},{"key":"e_1_3_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2007.70706"}],"event":{"name":"ISSTA '22: 31st ACM SIGSOFT International Symposium on Software Testing and Analysis","location":"Virtual South Korea","acronym":"ISSTA '22","sponsor":["SIGSOFT ACM Special Interest Group on Software Engineering"]},"container-title":["Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3533767.3534405","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3533767.3534405","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T18:43:41Z","timestamp":1750272221000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3533767.3534405"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,7,18]]},"references-count":56,"alternative-id":["10.1145\/3533767.3534405","10.1145\/3533767"],"URL":"https:\/\/doi.org\/10.1145\/3533767.3534405","relation":{},"subject":[],"published":{"date-parts":[[2022,7,18]]},"assertion":[{"value":"2022-07-18","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}