{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,1]],"date-time":"2026-04-01T02:32:17Z","timestamp":1775010737577,"version":"3.50.1"},"reference-count":49,"publisher":"Springer Science and Business Media LLC","issue":"5","license":[{"start":{"date-parts":[[2023,9,1]],"date-time":"2023-09-01T00:00:00Z","timestamp":1693526400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,9,18]],"date-time":"2023-09-18T00:00:00Z","timestamp":1694995200000},"content-version":"vor","delay-in-days":17,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62002148"],"award-info":[{"award-number":["62002148"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100011525","name":"Guangdong Key Laboratory of Fermentation and Enzyme Engineering","doi-asserted-by":"publisher","award":["2020B121201001"],"award-info":[{"award-number":["2020B121201001"]}],"id":[{"id":"10.13039\/501100011525","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100012540","name":"Guangdong Province Introduction of Innovative R&D Team","doi-asserted-by":"publisher","award":["2017ZT07X386"],"award-info":[{"award-number":["2017ZT07X386"]}],"id":[{"id":"10.13039\/100012540","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Research Institute of Trustworthy Autonomous Systems"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Empir Software Eng"],"published-print":{"date-parts":[[2023,9]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Just-In-Time Software Defect Prediction (JIT-SDP) is concerned with predicting whether software changes are defect-inducing or clean. 
It operates in scenarios where labels of software changes arrive over time with delay, which in part corresponds to the time we wait to label software changes as clean (waiting time). However, clean labels decided based on waiting time may be different from the true labels of software changes, i.e., there may be label noise. This typically overlooked issue has recently been shown to affect the validity of continuous performance evaluation procedures used to monitor the predictive performance of JIT-SDP models during the software development process. It is still unknown whether this issue could potentially also affect evaluation procedures that rely on retrospective collection of software changes such as those adopted in JIT-SDP research studies, affecting the validity of the conclusions of a large body of existing work. We conduct the first investigation of the extent to which the choice of waiting time and its corresponding label noise would affect the validity of retrospective performance evaluation procedures. Based on 13 GitHub projects, we found that the choice of waiting time did not have a significant impact on the validity and that even small waiting times resulted in high validity. 
Therefore, (1) the estimated predictive performances in JIT-SDP studies are likely reliable in view of different waiting times, and (2) future studies can make use of not only larger (5k+ software changes), but also smaller (1k software changes) projects for evaluating performance of JIT-SDP models.<\/jats:p>","DOI":"10.1007\/s10664-023-10341-8","type":"journal-article","created":{"date-parts":[[2023,9,18]],"date-time":"2023-09-18T10:01:57Z","timestamp":1695031317000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":7,"title":["On the validity of retrospective predictive performance evaluation procedures in just-in-time software defect prediction"],"prefix":"10.1007","volume":"28","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1172-8825","authenticated-orcid":false,"given":"Liyan","family":"Song","sequence":"first","affiliation":[]},{"given":"Leandro L.","family":"Minku","sequence":"additional","affiliation":[]},{"given":"Xin","family":"Yao","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,9,18]]},"reference":[{"key":"10341_CR1","doi-asserted-by":"crossref","unstructured":"Antoniol G, Ayari K, Di Penta M, Khomh F, Gueheneuc YG (2008) Is it a bug or an enhancement? a text-based approach to classify change requests. Conference of the Center for Advanced Studies on Collaborative Research: Meeting of Minds. Ontario, Canada, pp 304\u2013318","DOI":"10.1145\/1463788.1463819"},{"key":"10341_CR2","doi-asserted-by":"crossref","unstructured":"Aranda J, Venolia G (2009) The secret life of bugs: Going past the errors and omissions in software repositories. International Conference on Software Engineering. Vancouver, BC, Canada, pp 298\u2013308","DOI":"10.1109\/ICSE.2009.5070530"},{"key":"10341_CR3","first-page":"281","volume":"13","author":"J Bergstra","year":"2012","unstructured":"Bergstra J, Bengio Y (2012) Random search for hyper-parameter optimization. 
Journal of Machine Learning Research 13:281\u2013305","journal-title":"Journal of Machine Learning Research"},{"key":"10341_CR4","doi-asserted-by":"crossref","unstructured":"Bird C, Bachmann A, Aune E, Duffy J, Bernstein A, Filkov V, Devanbu P (2009) Fair and balanced? bias in bug-fix datasets. International Symposium on the Foundations of Software Engineering. The Netherlands, Amsterdam, pp 121\u2013130","DOI":"10.1145\/1595696.1595716"},{"key":"10341_CR5","doi-asserted-by":"publisher","unstructured":"Cabral GG, Minku LL (2023) Towards reliable online just-in-time software defect prediction. IEEE Transactions on Software Engineering 49(3):1342\u20131358 https:\/\/doi.org\/10.1109\/TSE.2022.3175789","DOI":"10.1109\/TSE.2022.3175789"},{"key":"10341_CR6","doi-asserted-by":"crossref","unstructured":"Cabral GG, Minku LL, Shihab E, Mujahid S (2019) Class imbalance evolution and verification latency in just-in-time software defect prediction. International Conference on Software Engineering. Montreal QC, Canada, pp 666\u2013676","DOI":"10.1109\/ICSE.2019.00076"},{"key":"10341_CR7","doi-asserted-by":"crossref","unstructured":"Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002) SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research 16:321\u2013357","DOI":"10.1613\/jair.953"},{"key":"10341_CR8","doi-asserted-by":"crossref","unstructured":"Chen TH, Nagappan M, Shihab E, Hassan AE (2014) An empirical study of dormant bugs. Conference on Mining Software Repositories. Hyderabad, India, pp 82\u201391","DOI":"10.1145\/2597073.2597108"},{"key":"10341_CR9","doi-asserted-by":"crossref","unstructured":"Da Costa DA, McIntosh S, Shang W, Kulesza U, Coelho R, Hassan AE (2017) A framework for evaluating the results of the SZZ approach for identifying bug-introducing changes. 
IEEE Transactions on Software Engineering 43(7):641\u2013657","DOI":"10.1109\/TSE.2016.2616306"},{"key":"10341_CR10","doi-asserted-by":"crossref","unstructured":"Ditzler G, Roveri M, Alippi C, Polikar R (2015) Learning in non-stationary environments: A survey. IEEE Computational Intelligence Magazine 10(4):12\u201325","DOI":"10.1109\/MCI.2015.2471196"},{"key":"10341_CR11","doi-asserted-by":"crossref","unstructured":"Ekanayake J, Tappolet J, Gall HC, Bernstein A (2012) Time variance and defect prediction in software projects. Empirical Software Engineering 17(4):348\u2013389","DOI":"10.1007\/s10664-011-9180-x"},{"key":"10341_CR12","doi-asserted-by":"crossref","unstructured":"Eyolfson J, Tan L, Lam P (2011) Do time of day and developer experience affect commit bugginess? International Workshop on Mining Software Repositories. Waikiki, Honolulu, HI, USA, pp 153\u2013162","DOI":"10.1145\/1985441.1985464"},{"key":"10341_CR13","doi-asserted-by":"crossref","unstructured":"Fan Y, Xia X, Da Costa DA, Lo D, Hassan AE, Li S (2019) The impact of mislabeled changes by SZZ on just-in-time defect prediction. IEEE Transactions on Software Engineering 47(8):1559\u20131586","DOI":"10.1109\/TSE.2019.2929761"},{"key":"10341_CR14","doi-asserted-by":"crossref","unstructured":"Gama J, Sebastiao R, Rodrigues PP (2013) On evaluating stream learning algorithms. Machine Learning 90(3):317\u2013346","DOI":"10.1007\/s10994-012-5320-9"},{"key":"10341_CR15","doi-asserted-by":"crossref","unstructured":"Harman M, Islam S, Jia Y, Minku LL, Sarro F, Srivisut K (2014) Less is more: Temporal fault predictive performance over multiple Hadoop releases. In: Symposium on Search-Based Software Engineering (SSBSE\u201914). Lecture Notes in Computer Science Volume 8636. Cham, pp 240\u2013246","DOI":"10.1007\/978-3-319-09940-8_19"},{"key":"10341_CR16","doi-asserted-by":"crossref","unstructured":"Hassan AE (2009) Predicting faults using the complexity of code changes. 
International Conference on Software Engineering. Vancouver, BC, Canada, pp 16\u201324","DOI":"10.1109\/ICSE.2009.5070510"},{"key":"10341_CR17","doi-asserted-by":"crossref","unstructured":"He H, Garcia EA (2009) Learning from imbalanced data. IEEE Transactions on Knowledge and Data Engineering 21(9):1263\u20131284","DOI":"10.1109\/TKDE.2008.239"},{"key":"10341_CR18","doi-asserted-by":"crossref","unstructured":"Herbold S, Trautsch A, Trautsch F, Ledel B (2022) Problems with SZZ and features: An empirical study of the state of practice of defect prediction data collection. Empirical Software Engineering 27(2):1\u201349","DOI":"10.1007\/s10664-021-10092-4"},{"key":"10341_CR19","doi-asserted-by":"crossref","unstructured":"Herzig K, Just S, Zeller A (2013) It is not a bug, it is a feature: How misclassification impacts bug prediction. International Conference on Software Engineering. San Francisco, CA, USA, pp 392\u2013401","DOI":"10.1109\/ICSE.2013.6606585"},{"key":"10341_CR20","doi-asserted-by":"crossref","unstructured":"Hoang T, Khanh Dam H, Kamei Y, Lo D, Ubayashi N (2019) DeepJIT: an end-to-end deep learning framework for just-in-time defect prediction. International Conference on Mining Software Repositories. Montreal QC, Canada, pp 34\u201345","DOI":"10.1109\/MSR.2019.00016"},{"key":"10341_CR21","doi-asserted-by":"crossref","unstructured":"Hoang T, Kang HJ, Lo D, Lawall J (2020) CC2Vec: Distributed representations of code changes. International Conference on Software Engineering. Seoul, South Korea, pp 518\u2013529","DOI":"10.1145\/3377811.3380361"},{"key":"10341_CR22","doi-asserted-by":"crossref","unstructured":"Kabir MA, Keung JW, Bennin KE, Zhang M (2019) Assessing the significant impact of concept drift in software defect prediction. Computer Software and Applications Conference. 
Milwaukee, WI, USA, pp 53\u201358","DOI":"10.1109\/COMPSAC.2019.00017"},{"key":"10341_CR23","doi-asserted-by":"crossref","unstructured":"Kamei Y, Shihab E, Adams B, Hassan AE, Mockus A, Sinha A, Ubayashi N (2013) A large-scale empirical study of just-in-time quality assurance. IEEE Transactions on Software Engineering 39(6):757\u2013773","DOI":"10.1109\/TSE.2012.70"},{"key":"10341_CR24","doi-asserted-by":"crossref","unstructured":"Kamei Y, Fukushima T, Mcintosh S, Yamashita K, Ubayashi N, Hassan AE (2016) Studying just-in-time defect prediction using cross-project models. Empirical Software Engineering 21(5):2072\u20132106","DOI":"10.1007\/s10664-015-9400-x"},{"key":"10341_CR25","doi-asserted-by":"crossref","unstructured":"Kim S, Zimmermann T, Pan K, Whitehead EJ (2006) Automatic identification of bug-introducing changes. Automated Software Engineering. Tokyo, Japan, pp 81\u201390","DOI":"10.1109\/ASE.2006.23"},{"key":"10341_CR26","doi-asserted-by":"crossref","unstructured":"Kim S, Whitehead EJ, Zhang Y (2008) Classifying software changes: Clean or buggy? IEEE Transactions on Software Engineering 34(2):181\u2013196","DOI":"10.1109\/TSE.2007.70773"},{"key":"10341_CR27","doi-asserted-by":"crossref","unstructured":"Kim S, Zhang H, Wu R, Gong L (2011) Dealing with noise in defect prediction. International Conference on Software Engineering. Honolulu, HI, USA, pp 481\u2013490","DOI":"10.1145\/1985793.1985859"},{"key":"10341_CR28","doi-asserted-by":"crossref","unstructured":"Kubat M, Holte R, Matwin S (1997) Learning when negative examples abound. In: European Conference on Machine Learning. Berlin, Heidelberg, pp 146\u2013153","DOI":"10.1007\/3-540-62858-4_79"},{"key":"10341_CR29","doi-asserted-by":"crossref","unstructured":"Mauchly JW (1940) Significance test for sphericity of a normal n-variate distribution. 
The Annals of Mathematical Statistics 11(2):204\u2013209","DOI":"10.1214\/aoms\/1177731915"},{"key":"10341_CR30","doi-asserted-by":"crossref","unstructured":"McIntosh S, Kamei Y (2018) Are fix-inducing changes a moving target? A longitudinal case study of just-in-time defect prediction. IEEE Transactions on Software Engineering 44(5):412\u2013428","DOI":"10.1109\/TSE.2017.2693980"},{"key":"10341_CR31","doi-asserted-by":"crossref","unstructured":"Misirli AT, Shihab E, Kamei Y (2016) Studying high impact fix-inducing changes. Empirical Software Engineering 21(2):605\u2013641","DOI":"10.1007\/s10664-015-9370-z"},{"key":"10341_CR32","doi-asserted-by":"crossref","unstructured":"Mockus A, Weiss DM (2000) Predicting risk of software change. Bell Labs Technical Journal 5(2):169\u2013180","DOI":"10.1002\/bltj.2229"},{"key":"10341_CR33","unstructured":"Montgomery DC (2017) Design and analysis of experiments. John Wiley & Sons"},{"key":"10341_CR34","doi-asserted-by":"crossref","unstructured":"Nam J, Kim S (2015) CLAMI: Defect prediction on unlabelled datasets. International Conference on Automated Software Engineering. Lincoln, NE, USA, pp 452\u2013463","DOI":"10.1109\/ASE.2015.56"},{"key":"10341_CR35","doi-asserted-by":"crossref","unstructured":"Neto EC, da Costa DA, Kulesza U (2018) The impact of refactoring changes on the SZZ algorithm: An empirical study. International Conference on Software Analysis. Evolution and Reengineering. Campobasso, Italy, pp 380\u2013390","DOI":"10.1109\/SANER.2018.8330225"},{"key":"10341_CR36","doi-asserted-by":"crossref","unstructured":"Nguyen HM, Cooper EW, Kamei K (2011) Online learning from imbalanced data streams. International Conference of Soft Computing and Pattern Recognition. Dalian, China, pp 347\u2013352","DOI":"10.1109\/SoCPaR.2011.6089268"},{"key":"10341_CR37","doi-asserted-by":"crossref","unstructured":"Pornprasit C, Tantithamthavorn CK (2021) JITLine: a simpler, better, faster, finer-grained just-in-time defect prediction. 
International Conference on Mining Software Repositories. Madrid, Spain, pp 369\u2013379","DOI":"10.1109\/MSR52588.2021.00049"},{"key":"10341_CR38","doi-asserted-by":"crossref","unstructured":"Rosen C, Grawi B, Shihab E (2015) Commit Guru: analytics and risk prediction of software commits. International Symposium on the Foundations of Software Engineering. pp 966\u2013969","DOI":"10.1145\/2786805.2803183"},{"key":"10341_CR39","doi-asserted-by":"crossref","unstructured":"Shihab E, Hassan AE, Adams B, Jiang ZM (2012) An industrial study on the risk of software changes. International Symposium on the Foundations of Software Engineering. Cary, North Carolina, pp 1\u201311","DOI":"10.1145\/2393596.2393670"},{"key":"10341_CR40","doi-asserted-by":"crossref","unstructured":"\u015aliwerski J, Zimmermann T, Zeller A (2005) When do changes induce fixes? ACM Sigsoft Software Engineering Notes 30(4):1\u20135","DOI":"10.1145\/1082983.1083147"},{"key":"10341_CR41","doi-asserted-by":"crossref","unstructured":"Song L, Minku LL (2023) A procedure to continuously evaluate predictive performance of just-in-time software defect prediction models during software development. IEEE Transactions on Software Engineering 49(2):646\u2013666","DOI":"10.1109\/TSE.2022.3158831"},{"key":"10341_CR42","doi-asserted-by":"crossref","unstructured":"Tabassum S, Minku LL, Feng D, Cabral GG, Song L (2020) An investigation of cross-project learning in online just-in-time software defect prediction. International Conference on Software Engineering. Seoul, South Korea, pp 554\u2013565","DOI":"10.1145\/3377811.3380403"},{"key":"10341_CR43","doi-asserted-by":"crossref","unstructured":"Tan M, Tan L, Dara S, Mayeux C (2015) Online defect prediction for imbalance data. International Conference on Software Engineering. 
Florence, Italy, pp 99\u2013108","DOI":"10.1109\/ICSE.2015.139"},{"key":"10341_CR44","doi-asserted-by":"crossref","unstructured":"Tantithamthavorn C, McIntosh S, Hassan AE, Ihara A, Matsumoto K (2015) The impact of mislabelling on the performance and interpretation of defect prediction models. International Conference on Software Engineering. Florence, Italy, pp 812\u2013823","DOI":"10.1109\/ICSE.2015.93"},{"key":"10341_CR45","doi-asserted-by":"crossref","unstructured":"Wang S, Yao X (2013) Using class imbalance learning for software defect prediction. IEEE Transactions on Reliability 62(2):434\u2013443","DOI":"10.1109\/TR.2013.2259203"},{"key":"10341_CR46","doi-asserted-by":"crossref","unstructured":"Wang S, Minku LL, Yao X (2015) Resampling-based ensemble methods for online class imbalance learning. IEEE Transactions on Knowledge and Data Engineering 27(5):1356\u20131368","DOI":"10.1109\/TKDE.2014.2345380"},{"key":"10341_CR47","doi-asserted-by":"crossref","unstructured":"Wang S, Minku LL, Yao X (2018) A systematic study of online class imbalance learning with concept drift. IEEE Transactions on Neural Networks and Learning Systems 29(10):4802\u20134821","DOI":"10.1109\/TNNLS.2017.2771290"},{"key":"10341_CR48","doi-asserted-by":"crossref","unstructured":"Yao J, Shepperd M (2021) The impact of using biased performance metrics on software defect prediction research. Information and Software Technology 139:106664","DOI":"10.1016\/j.infsof.2021.106664"},{"key":"10341_CR49","doi-asserted-by":"crossref","unstructured":"Yatish S, Jiarpakdee J, Thongtanunam P, Tantithamthavorn C (2019) Mining software defects: Should we consider affected releases? International Conference on Software Engineering. 
Montreal QC, Canada, pp 654\u2013665","DOI":"10.1109\/ICSE.2019.00075"}],"container-title":["Empirical Software Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10664-023-10341-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10664-023-10341-8\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10664-023-10341-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,10,4]],"date-time":"2023-10-04T12:21:50Z","timestamp":1696422110000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10664-023-10341-8"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,9]]},"references-count":49,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2023,9]]}},"alternative-id":["10341"],"URL":"https:\/\/doi.org\/10.1007\/s10664-023-10341-8","relation":{},"ISSN":["1382-3256","1573-7616"],"issn-type":[{"value":"1382-3256","type":"print"},{"value":"1573-7616","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,9]]},"assertion":[{"value":"15 May 2023","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"18 September 2023","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors have no relevant financial or non-financial interests to disclose. The authors have no competing interests to declare that are relevant to the content of this article. 
All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript. The authors have no financial or proprietary interests in any material discussed in this article.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"124"}}