{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,2]],"date-time":"2026-06-02T18:32:03Z","timestamp":1780425123585,"version":"3.54.1"},"reference-count":86,"publisher":"Springer Science and Business Media LLC","issue":"6","license":[{"start":{"date-parts":[[2016,12,30]],"date-time":"2016-12-30T00:00:00Z","timestamp":1483056000000},"content-version":"unspecified","delay-in-days":0,"URL":"http:\/\/www.springer.com\/tdm"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Mach Learn"],"published-print":{"date-parts":[[2017,6]]},"DOI":"10.1007\/s10994-016-5612-6","type":"journal-article","created":{"date-parts":[[2016,12,30]],"date-time":"2016-12-30T21:31:08Z","timestamp":1483133468000},"page":"911-949","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":26,"title":["Confidence curves: an alternative to null hypothesis significance testing for the comparison of classifiers"],"prefix":"10.1007","volume":"106","author":[{"given":"Daniel","family":"Berrar","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2016,12,30]]},"reference":[{"key":"5612_CR1","first-page":"117","volume-title":"What if there were no significance tests?","author":"R Abelson","year":"1997","unstructured":"Abelson, R. (1997). A retrospective on the significance test ban of 1999 (if there were no significance tests, they would need to be invented). In L. Harlow, S. Mulaik, & J. Steiger (Eds.), What if there were no significance tests? (pp. 117\u2013141). Mahwah, NJ: Psychology Press."},{"key":"5612_CR2","doi-asserted-by":"crossref","DOI":"10.1007\/978-0-230-36355-7","volume-title":"Serious stats: A guide to advanced statistics for the behavioral sciences","author":"T Baguley","year":"2012","unstructured":"Baguley, T. (2012). 
Serious stats: A guide to advanced statistics for the behavioral sciences. New York: Palgrave Macmillan."},{"issue":"6","key":"5612_CR3","doi-asserted-by":"publisher","first-page":"423","DOI":"10.1037\/h0020412","volume":"66","author":"D Bakan","year":"1966","unstructured":"Bakan, D. (1966). The test of significance in psychological research. Psychological Bulletin, 66(6), 423\u2013437.","journal-title":"Psychological Bulletin"},{"issue":"452","key":"5612_CR4","first-page":"1127","volume":"95","author":"M Bayarri","year":"2000","unstructured":"Bayarri, M., & Berger, J. (2000). P values for composite null models. Journal of the American Statistical Association, 95(452), 1127\u20131142.","journal-title":"Journal of the American Statistical Association"},{"key":"5612_CR5","unstructured":"Benavoli, A., Corani, G., Mangili, F., & Zaffalon, M. (2015). A Bayesian nonparametric procedure for comparing algorithms. In Proceedings of the 32nd international conference on machine learning, JMLR.org, JMLR Proceedings (Vol. 37, pp. 1264\u20131272)."},{"issue":"2","key":"5612_CR6","first-page":"159","volume":"76","author":"J Berger","year":"1988","unstructured":"Berger, J., & Berry, D. (1988). Statistical analysis and the illusion of objectivity. American Scientist, 76(2), 159\u2013165.","journal-title":"American Scientist"},{"issue":"3","key":"5612_CR7","doi-asserted-by":"publisher","first-page":"317","DOI":"10.1214\/ss\/1177013238","volume":"2","author":"J Berger","year":"1987","unstructured":"Berger, J., & Delampady, M. (1987). Testing precise hypotheses. Statistical Science, 2(3), 317\u2013352.","journal-title":"Statistical Science"},{"key":"5612_CR8","first-page":"112","volume":"82","author":"J Berger","year":"1987","unstructured":"Berger, J., & Sellke, T. (1987). Testing a point null hypothesis: The irreconcilability of p values and evidence. 
Journal of the American Statistical Association, 82, 112\u2013122.","journal-title":"Journal of the American Statistical Association"},{"issue":"2","key":"5612_CR9","doi-asserted-by":"publisher","first-page":"189","DOI":"10.1080\/0952813X.2012.680252","volume":"25","author":"D Berrar","year":"2013","unstructured":"Berrar, D., & Lozano, J. (2013). Significance tests or confidence intervals: Which are preferable for the comparison of classifiers? Journal of Experimental and Theoretical Artificial Intelligence, 25(2), 189\u2013206.","journal-title":"Journal of Experimental and Theoretical Artificial Intelligence"},{"key":"5612_CR10","doi-asserted-by":"publisher","first-page":"27","DOI":"10.1038\/nrd1927","volume":"5","author":"D Berry","year":"2006","unstructured":"Berry, D. (2006). Bayesian clinical trials. Nature Reviews Drug Discovery, 5, 27\u201336.","journal-title":"Nature Reviews Drug Discovery"},{"key":"5612_CR11","doi-asserted-by":"publisher","first-page":"112","DOI":"10.1214\/aoms\/1177705145","volume":"32","author":"A Birnbaum","year":"1961","unstructured":"Birnbaum, A. (1961). A unified theory of estimation. I. Annals of Mathematical Statistics, 32, 112\u2013135.","journal-title":"Annals of Mathematical Statistics"},{"key":"5612_CR12","doi-asserted-by":"publisher","unstructured":"Bouckaert, R., & Frank, E. (2004). Evaluating the replicability of significance tests for comparing learning algorithms. In Proceedings of the 8th Asia-Pacific conference on advances in knowledge discovery and data mining, Springer Lecture Notes in Computer Science (Vol. 3056, pp. 3\u201312).","DOI":"10.1007\/978-3-540-24775-3_3"},{"issue":"1","key":"5612_CR13","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1023\/A:1010933404324","volume":"45","author":"L Breiman","year":"2001","unstructured":"Breiman, L. (2001). Random forests. 
Machine Learning, 45(1), 5\u201332.","journal-title":"Machine Learning"},{"key":"5612_CR14","volume-title":"Classification and regression trees","author":"L Breiman","year":"1984","unstructured":"Breiman, L., Friedman, J., Olshen, R., & Stone, C. (1984). Classification and regression trees. New York: Chapman and Hall."},{"issue":"3","key":"5612_CR15","doi-asserted-by":"publisher","first-page":"378","DOI":"10.17763\/haer.48.3.t490261645281841","volume":"48","author":"R Carver","year":"1978","unstructured":"Carver, R. (1978). The case against statistical significance testing. Harvard Educational Review, 48(3), 378\u2013399.","journal-title":"Harvard Educational Review"},{"issue":"12","key":"5612_CR16","doi-asserted-by":"publisher","first-page":"1304","DOI":"10.1037\/0003-066X.45.12.1304","volume":"45","author":"J Cohen","year":"1990","unstructured":"Cohen, J. (1990). Things I have learned (so far). American Psychologist, 45(12), 1304\u20131312.","journal-title":"American Psychologist"},{"issue":"12","key":"5612_CR17","doi-asserted-by":"publisher","first-page":"997","DOI":"10.1037\/0003-066X.49.12.997","volume":"49","author":"J Cohen","year":"1994","unstructured":"Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49(12), 997\u20131003.","journal-title":"American Psychologist"},{"key":"5612_CR18","doi-asserted-by":"publisher","unstructured":"Corani, G., Benavoli, A., Mangili, F., & Zaffalon, M. (2015). Bayesian hypothesis testing in machine learning. In Proceedings of 2015 ECML-PKDD, Part III, Springer Lecture Notes in Artificial Intelligence (pp. 199\u2013202).","DOI":"10.1007\/978-3-319-23461-8_13"},{"issue":"2","key":"5612_CR19","doi-asserted-by":"publisher","first-page":"357","DOI":"10.1214\/aoms\/1177706618","volume":"29","author":"D Cox","year":"1958","unstructured":"Cox, D. (1958). Some problems connected with statistical inference. 
Annals of Mathematical Statistics, 29(2), 357\u2013372.","journal-title":"Annals of Mathematical Statistics"},{"issue":"2","key":"5612_CR20","first-page":"49","volume":"4","author":"D Cox","year":"1977","unstructured":"Cox, D. (1977). The role of significance tests. Scandinavian Journal of Statistics, 4(2), 49\u201370.","journal-title":"Scandinavian Journal of Statistics"},{"key":"5612_CR21","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4899-2887-0","volume-title":"Theoretical statistics","author":"D Cox","year":"1974","unstructured":"Cox, D., & Hinkley, D. (1974). Theoretical statistics. New York: Chapman and Hall\/CR."},{"key":"5612_CR22","volume-title":"Understanding the new statistics: Effect sizes, confidence intervals, and meta-analysis","author":"G Cummings","year":"2012","unstructured":"Cummings, G. (2012). Understanding the new statistics: Effect sizes, confidence intervals, and meta-analysis. New York, London: Routledge, Taylor & Francis Group."},{"key":"5612_CR23","first-page":"1","volume":"7","author":"J Dem\u0161ar","year":"2006","unstructured":"Dem\u0161ar, J. (2006). Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research, 7, 1\u201330.","journal-title":"Journal of Machine Learning Research"},{"key":"5612_CR24","unstructured":"Dem\u0161ar, J. (2008). On the appropriateness of statistical tests in machine learning. In Proceedings of the 3rd workshop on evaluation methods for machine learning, in conjunction with the 25th international conference on machine learning (pp. 1\u20134)."},{"key":"5612_CR25","doi-asserted-by":"publisher","first-page":"1895","DOI":"10.1162\/089976698300017197","volume":"10","author":"TG Dietterich","year":"1998","unstructured":"Dietterich, T. G. (1998). Approximate statistical tests for comparing supervised classification learning algorithms. Neural Computation, 10, 1895\u20131923.","journal-title":"Neural Computation"},{"key":"5612_CR26","unstructured":"Drummond, C. (2006). 
Machine learning as an experimental science, revisited. In Proceedings of the 21st national conference on artificial intelligence: Workshop on evaluation methods for machine learning, Technical Report WS-06-06 (pp. 1\u20135). AAAI Press."},{"key":"5612_CR27","unstructured":"Drummond, C. (2009). Replicability is not reproducibility: Nor is it good science. In Proceedings of evaluation methods for machine learning workshop at the 26th international conference on machine learning, Montreal (pp. 1\u20136)."},{"key":"5612_CR28","doi-asserted-by":"publisher","first-page":"67","DOI":"10.1080\/09528130903010295","volume":"2","author":"C Drummond","year":"2010","unstructured":"Drummond, C., & Japkowicz, N. (2010). Warning: Statistical benchmarking is addictive. Kicking the habit in machine learning. Journal of Experimental and Theoretical Artificial Intelligence, 2, 67\u201380.","journal-title":"Journal of Experimental and Theoretical Artificial Intelligence"},{"key":"5612_CR29","doi-asserted-by":"publisher","first-page":"103","DOI":"10.1080\/01621459.1943.10501783","volume":"38","author":"R Fisher","year":"1943","unstructured":"Fisher, R. (1943). Note on Dr. Berkson\u2019s criticism of tests of significance. Journal of the American Statistical Association, 38, 103\u2013104.","journal-title":"Journal of the American Statistical Association"},{"issue":"1","key":"5612_CR30","doi-asserted-by":"crossref","first-page":"69","DOI":"10.1111\/j.2517-6161.1955.tb00180.x","volume":"17","author":"R Fisher","year":"1955","unstructured":"Fisher, R. (1955). Statistical methods and scientific induction. Journal of the Royal Statistical Society, Series B, 17(1), 69\u201378.","journal-title":"Journal of the Royal Statistical Society, Series B"},{"key":"5612_CR31","volume-title":"Ideas of statistics","author":"J Folks","year":"1981","unstructured":"Folks, J. (1981). Ideas of statistics. 
New York: Wiley."},{"key":"5612_CR32","first-page":"149","volume-title":"Handbook of research methods in personality psychology","author":"R Fraley","year":"2007","unstructured":"Fraley, R., & Marks, M. (2007). The null hypothesis significance testing debate and its implications for personality research. In R. Robins, R. Fraley, & R. Krueger (Eds.), Handbook of research methods in personality psychology (pp. 149\u2013169). New York: Guilford."},{"issue":"200","key":"5612_CR33","doi-asserted-by":"publisher","first-page":"675","DOI":"10.1080\/01621459.1937.10503522","volume":"32","author":"M Friedman","year":"1937","unstructured":"Friedman, M. (1937). The use of ranks to avoid the assumption of normality implicit in the analysis of variance. Journal of the American Statistical Association, 32(200), 675\u2013701.","journal-title":"Journal of the American Statistical Association"},{"issue":"1","key":"5612_CR34","doi-asserted-by":"publisher","first-page":"86","DOI":"10.1214\/aoms\/1177731944","volume":"11","author":"M Friedman","year":"1940","unstructured":"Friedman, M. (1940). A comparison of alternative tests of significance for the problem of m rankings. Annals of Mathematical Statistics, 11(1), 86\u201392.","journal-title":"Annals of Mathematical Statistics"},{"key":"5612_CR35","first-page":"2677","volume":"9","author":"S Garc\u00eda","year":"2008","unstructured":"Garc\u00eda, S., & Herrera, F. (2008). An extension on statistical comparisons of classifiers over multiple data sets for all pairwise comparisons. Journal of Machine Learning Research, 9, 2677\u20132694.","journal-title":"Journal of Machine Learning Research"},{"key":"5612_CR36","doi-asserted-by":"publisher","first-page":"199","DOI":"10.1017\/S0140525X98281167","volume":"21","author":"G Gigerenzer","year":"1998","unstructured":"Gigerenzer, G. (1998). We need statistical thinking, not statistical rituals. 
Behavioral and Brain Sciences, 21, 199\u2013200.","journal-title":"Behavioral and Brain Sciences"},{"key":"5612_CR37","first-page":"391","volume-title":"The sage handbook of quantitative methodology for the social sciences","author":"G Gigerenzer","year":"2004","unstructured":"Gigerenzer, G., Krauss, S., & Vitouch, O. (2004). The null ritual\u2013What you always wanted to know about significance testing but were afraid to ask. In D. Kaplan (Ed.), The sage handbook of quantitative methodology for the social sciences (pp. 391\u2013408). Thousand Oaks, CA: Sage."},{"issue":"5","key":"5612_CR38","doi-asserted-by":"publisher","first-page":"485","DOI":"10.1093\/oxfordjournals.aje.a116700","volume":"137","author":"S Goodman","year":"1993","unstructured":"Goodman, S. (1993). p values, hypothesis tests, and likelihood: Implications for epidemiology of a neglected historical debate. American Journal of Epidemiology, 137(5), 485\u2013496.","journal-title":"American Journal of Epidemiology"},{"issue":"12","key":"5612_CR39","doi-asserted-by":"publisher","first-page":"995","DOI":"10.7326\/0003-4819-130-12-199906150-00008","volume":"130","author":"S Goodman","year":"1999","unstructured":"Goodman, S. (1999). Toward evidence-based medical statistics. 1: The p value fallacy. Annals of Internal Medicine, 130(12), 995\u20131004.","journal-title":"Annals of Internal Medicine"},{"issue":"3","key":"5612_CR40","doi-asserted-by":"publisher","first-page":"135","DOI":"10.1053\/j.seminhematol.2008.04.003","volume":"45","author":"S Goodman","year":"2008","unstructured":"Goodman, S. (2008). A dirty dozen: Twelve p-value misconceptions. Seminars in Hematology, 45(3), 135\u2013140.","journal-title":"Seminars in Hematology"},{"issue":"12","key":"5612_CR41","doi-asserted-by":"publisher","first-page":"1568","DOI":"10.2105\/AJPH.78.12.1568","volume":"78","author":"S Goodman","year":"1988","unstructured":"Goodman, S., & Royall, R. (1988). Evidence and scientific research. 
American Journal of Public Health, 78(12), 1568\u20131574.","journal-title":"American Journal of Public Health"},{"issue":"2","key":"5612_CR42","doi-asserted-by":"publisher","first-page":"175","DOI":"10.1111\/j.1469-8986.1996.tb02121.x","volume":"33","author":"A Greenwald","year":"1996","unstructured":"Greenwald, A., Gonzalez, R., Harris, R., & Guthrie, D. (1996). Effect sizes and p values: What should be reported and what should be replicated? Psychophysiology, 33(2), 175\u2013183.","journal-title":"Psychophysiology"},{"key":"5612_CR43","unstructured":"Guyon, I., Lemaire, V., Boull\u00e9, M., Dror, G., & Vogel, D. (2009). Analysis of the KDD Cup 2009: Fast scoring on a large Orange customer database. In JMLR: Workshop and conference proceedings (Vol. 7, pp. 1\u201322)."},{"key":"5612_CR44","volume-title":"What if there were no significance tests? Multivariate applications book series","author":"L Harlow","year":"1997","unstructured":"Harlow, L., Mulaik, S., & Steiger, J. (1997). What if there were no significance tests? Multivariate applications book series. Mahwah, NJ: Lawrence Erlbaum Associates Publishers."},{"key":"5612_CR45","volume-title":"Statistics","author":"W Hays","year":"1963","unstructured":"Hays, W. (1963). Statistics. New York: Holt, Rinehart and Winston."},{"key":"5612_CR46","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4899-7180-7","volume-title":"Multiple comparisons: Theory and methods","author":"J Hsu","year":"1996","unstructured":"Hsu, J. (1996). Multiple comparisons: Theory and methods. Boca Raton, FL: CRC Press."},{"issue":"3","key":"5612_CR47","doi-asserted-by":"publisher","first-page":"295","DOI":"10.1177\/0959354304043638","volume":"14","author":"R Hubbard","year":"2004","unstructured":"Hubbard, R. (2004). Alphabet soup\u2014blurring the distinctions between p\u2019s and \u03b1\u2019s in psychological research. 
Theory and Psychology, 14(3), 295\u2013327.","journal-title":"Theory and Psychology"},{"issue":"2","key":"5612_CR48","doi-asserted-by":"publisher","first-page":"114","DOI":"10.1177\/0273475306288399","volume":"28","author":"R Hubbard","year":"2006","unstructured":"Hubbard, R., & Armstrong, J. (2006). Why we don\u2019t really know what \u201cstatistical significance\u201d means: A major educational failure. Journal of Marketing Education, 28(2), 114\u2013120.","journal-title":"Journal of Marketing Education"},{"key":"5612_CR49","unstructured":"Hubbard, R., & Bayarri, M. (2003). P values are not error probabilities. Technical Report University of Valencia; Accessed 22 Sept. 2016 http:\/\/www.uv.es\/sestio\/TechRep\/tr14-03"},{"issue":"1","key":"5612_CR50","doi-asserted-by":"publisher","first-page":"69","DOI":"10.1177\/0959354307086923","volume":"18","author":"R Hubbard","year":"2008","unstructured":"Hubbard, R., & Lindsay, R. (2008). Why p values are not a useful measure of evidence in statistical significance testing. Theory Psychology, 18(1), 69\u201388.","journal-title":"Theory Psychology"},{"issue":"6","key":"5612_CR51","doi-asserted-by":"publisher","first-page":"571","DOI":"10.1080\/03610928008827904","volume":"9","author":"R Iman","year":"1980","unstructured":"Iman, R., & Davenport, J. (1980). Approximations of the critical region of the Friedman statistic. Communications in Statistics\u2014Theory and Methods, 9(6), 571\u2013595.","journal-title":"Communications in Statistics\u2014Theory and Methods"},{"issue":"5","key":"5612_CR52","doi-asserted-by":"publisher","first-page":"345","DOI":"10.1111\/j.0956-7976.2005.01538.x","volume":"16","author":"P Killeen","year":"2004","unstructured":"Killeen, P. (2004). An alternative to null hypothesis significance tests. 
Psychological Science, 16(5), 345\u2013353.","journal-title":"Psychological Science"},{"issue":"1","key":"5612_CR53","doi-asserted-by":"publisher","first-page":"16","DOI":"10.1037\/0003-066X.56.1.16","volume":"56","author":"J Krueger","year":"2001","unstructured":"Krueger, J. (2001). Null hypothesis significance testing\u2014On the survival of a flawed method. American Psychologist, 56(1), 16\u201326.","journal-title":"American Psychologist"},{"issue":"2","key":"5612_CR54","first-page":"43","volume":"5","author":"J Levin","year":"1998","unstructured":"Levin, J. (1998). What if there were no more bickering about statistical significance tests? Research in the Schools, 5(2), 43\u201353.","journal-title":"Research in the Schools"},{"key":"5612_CR55","unstructured":"Liaw, A., & Wiener, M. (2002). Classification and regression by randomforest. R News, 2(3), 18\u201322. http:\/\/CRAN.R-project.org\/doc\/Rnews\/"},{"key":"5612_CR56","unstructured":"Lichman, M, (2013). UCI machine learning repository. University of California, Irvine, School of Information and Computer Sciences. http:\/\/archive.ics.uci.edu\/ml"},{"key":"5612_CR57","doi-asserted-by":"publisher","first-page":"187","DOI":"10.1093\/biomet\/44.1-2.187","volume":"44","author":"D Lindley","year":"1957","unstructured":"Lindley, D. (1957). A statistical paradox. Biometrika, 44, 187\u2013192.","journal-title":"Biometrika"},{"issue":"4","key":"5612_CR58","doi-asserted-by":"publisher","first-page":"209","DOI":"10.1207\/S15327035EX1104_2","volume":"11","author":"P Morgan","year":"2003","unstructured":"Morgan, P. (2003). Null hypothesis significance testing: Philosophical and practical considerations of a statistical controversy. Exceptionality, 11(4), 209\u2013221.","journal-title":"Exceptionality"},{"key":"5612_CR59","doi-asserted-by":"publisher","first-page":"239","DOI":"10.1023\/A:1024068626366","volume":"52","author":"C Nadeau","year":"2003","unstructured":"Nadeau, C., & Bengio, Y. (2003). 
Inference for the generalization error. Machine Learning, 52, 239\u2013281.","journal-title":"Machine Learning"},{"key":"5612_CR60","unstructured":"Nemenyi, P. (1963). Distribution-free multiple comparisons. Ph.D. thesis, Princeton University, Princeton."},{"key":"5612_CR61","doi-asserted-by":"publisher","first-page":"289","DOI":"10.1098\/rsta.1933.0009","volume":"231","author":"J Neyman","year":"1933","unstructured":"Neyman, J., & Pearson, E. (1933). On the problem of the most efficient tests of statistical hypotheses. Philosophical Transactions of the Royal Society of London Series A, 231, 289\u2013337.","journal-title":"Philosophical Transactions of the Royal Society of London Series A"},{"key":"5612_CR62","doi-asserted-by":"publisher","first-page":"150","DOI":"10.1038\/506150a","volume":"506","author":"R Nuzzo","year":"2014","unstructured":"Nuzzo, R. (2014). Statistical errors. Nature, 506, 150\u2013152.","journal-title":"Nature"},{"key":"5612_CR63","doi-asserted-by":"publisher","first-page":"1236","DOI":"10.1136\/bmj.316.7139.1236","volume":"316","author":"T Perneger","year":"1998","unstructured":"Perneger, T. (1998). What\u2019s wrong with Bonferroni adjustments. British Medical Journal, 316, 1236\u20131238.","journal-title":"British Medical Journal"},{"issue":"77","key":"5612_CR64","doi-asserted-by":"publisher","first-page":"195","DOI":"10.2105\/AJPH.77.2.195","volume":"2","author":"C Poole","year":"1987","unstructured":"Poole, C. (1987). Beyond the confidence interval. American Journal of Public Health, 2(77), 195\u2013199.","journal-title":"American Journal of Public Health"},{"issue":"2","key":"5612_CR65","first-page":"241","volume":"4","author":"C Poole","year":"1991","unstructured":"Poole, C. (1991). Multiple comparisons? No problem!. 
Epidemiology, 4(2), 241\u2013243.","journal-title":"Epidemiology"},{"issue":"3","key":"5612_CR66","doi-asserted-by":"publisher","first-page":"291","DOI":"10.1097\/00001648-200105000-00005","volume":"12","author":"C Poole","year":"2001","unstructured":"Poole, C. (2001). Low p-values or narrow confidence intervals: Which are more durable? Epidemiology, 12(3), 291\u2013294.","journal-title":"Epidemiology"},{"key":"5612_CR67","unstructured":"R Development Core Team. (2009). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna. http:\/\/www.R-project.org , ISBN 3-900051-07-0"},{"issue":"1","key":"5612_CR68","doi-asserted-by":"publisher","first-page":"43","DOI":"10.1097\/00001648-199001000-00010","volume":"1","author":"K Rothman","year":"1990","unstructured":"Rothman, K. (1990). No adjustments are needed for multiple comparisons. Epidemiology, 1(1), 43\u201346.","journal-title":"Epidemiology"},{"issue":"3","key":"5612_CR69","doi-asserted-by":"publisher","first-page":"333","DOI":"10.1097\/00001648-199805000-00019","volume":"9","author":"K Rothman","year":"1998","unstructured":"Rothman, K. (1998). Writing for Epidemiology. Epidemiology, 9(3), 333\u2013337.","journal-title":"Epidemiology"},{"key":"5612_CR70","volume-title":"Modern epidemiology","author":"K Rothman","year":"2008","unstructured":"Rothman, K., Greenland, S., & Lash, T. (2008). Modern epidemiology (3rd ed.). Philadelphia: Wolters Kluwer.","edition":"3"},{"key":"5612_CR71","doi-asserted-by":"publisher","first-page":"416","DOI":"10.1037\/h0042040","volume":"57","author":"W Rozeboom","year":"1960","unstructured":"Rozeboom, W. (1960). The fallacy of the null hypothesis significance test. Psychological Bulletin, 57, 416\u2013428.","journal-title":"Psychological Bulletin"},{"key":"5612_CR72","first-page":"132","volume-title":"What if there were no significance tests?","author":"W Rozeboom","year":"1997","unstructured":"Rozeboom, W. (1997). 
Good science is abductive, not hypothetico-deductive. In L. Harlow, S. Mulaik, & J. Steiger (Eds.), What if there were no significance tests? (pp. 132\u2013149). Mahwah, NJ: Psychology Press."},{"issue":"6","key":"5612_CR73","first-page":"1","volume":"245","author":"V Savalei","year":"2015","unstructured":"Savalei, V., & Dunn, E. (2015). Is the call to abandon p-values the red herring of the replicability crisis? Frontiers in Psychology, 245(6), 1\u20134.","journal-title":"Frontiers in Psychology"},{"issue":"9","key":"5612_CR74","doi-asserted-by":"publisher","first-page":"813","DOI":"10.1093\/oxfordjournals.aje.a009532","volume":"147","author":"D Savitz","year":"1998","unstructured":"Savitz, D., & Olshan, A. (1998). Describing data requires no adjustment for multiple comparisons: A reply from Savitz and Olshan. American Journal of Epidemiology, 147(9), 813\u2013814.","journal-title":"American Journal of Epidemiology"},{"issue":"3","key":"5612_CR75","first-page":"203","volume":"50","author":"M Schervish","year":"1996","unstructured":"Schervish, M. (1996). P values: What they are and what they are not. The American Statistician, 50(3), 203\u2013206.","journal-title":"The American Statistician"},{"issue":"2","key":"5612_CR76","doi-asserted-by":"publisher","first-page":"115","DOI":"10.1037\/1082-989X.1.2.115","volume":"1","author":"F Schmidt","year":"1996","unstructured":"Schmidt, F. (1996). Statistical significance testing and cumulative knowledge in psychology: Implications for training of researchers. Psychological Methods, 1(2), 115\u2013129.","journal-title":"Psychological Methods"},{"key":"5612_CR77","first-page":"37","volume-title":"What if there were no significance tests?","author":"F Schmidt","year":"1997","unstructured":"Schmidt, F., & Hunter, J. (1997). Eight common but false objections to the discontinuation of significance testing in the analysis of research data. In L. Harlow, S. Mulaik, & J. 
Steiger (Eds.), What if there were no significance tests? (pp. 37\u201364). Mahwah, NJ: Psychology Press."},{"issue":"1","key":"5612_CR78","doi-asserted-by":"publisher","first-page":"62","DOI":"10.1198\/000313001300339950","volume":"55","author":"T Sellke","year":"2001","unstructured":"Sellke, T., Bayarri, M., & Berger, J. (2001). Calibration of p values for testing precise null hypotheses. The American Statistician, 55(1), 62\u201371.","journal-title":"The American Statistician"},{"key":"5612_CR79","volume-title":"Handbook of parametric and nonparametric statistical procedures","author":"D Sheskin","year":"2007","unstructured":"Sheskin, D. (2007). Handbook of parametric and nonparametric statistical procedures (4th ed.). London\/New York: Chapman and Hall.","edition":"4"},{"key":"5612_CR80","doi-asserted-by":"publisher","first-page":"225","DOI":"10.1007\/s10654-010-9440-x","volume":"25","author":"A Stang","year":"2010","unstructured":"Stang, A., Poole, C., & Kuss, O. (2010). The ongoing tyranny of statistical significance testing in biomedical research. European Journal of Epidemiology, 25, 225\u2013230.","journal-title":"European Journal of Epidemiology"},{"issue":"1","key":"5612_CR81","doi-asserted-by":"publisher","first-page":"39","DOI":"10.1097\/00001648-199001000-00009","volume":"1","author":"K Sullivan","year":"1990","unstructured":"Sullivan, K., & Foster, D. (1990). Use of the confidence interval function. Epidemiology, 1(1), 39\u201342.","journal-title":"Epidemiology"},{"key":"5612_CR82","unstructured":"Therneau, T., Atkinson, B., & Ripley, B. (2014). rpart: Recursive partitioning and regression trees. http:\/\/CRAN.R-project.org\/package=rpart , R package version 4.1-5."},{"issue":"2","key":"5612_CR83","doi-asserted-by":"publisher","first-page":"165","DOI":"10.1177\/095935439992006","volume":"9","author":"B Thompson","year":"1999","unstructured":"Thompson, B. (1999). 
If statistical significance tests are broken\/misused, what practices should supplement or replace them? Theory & Psychology, 9(2), 165\u2013181.","journal-title":"Theory & Psychology"},{"issue":"1","key":"5612_CR84","doi-asserted-by":"publisher","first-page":"100","DOI":"10.1214\/ss\/1177011945","volume":"6","author":"J Tukey","year":"1991","unstructured":"Tukey, J. (1991). The philosophy of multiple comparisons. Statistical Science, 6(1), 100\u2013116.","journal-title":"Statistical Science"},{"issue":"253","key":"5612_CR85","first-page":"19","volume":"46","author":"F Yates","year":"1951","unstructured":"Yates, F. (1951). The influence of statistical methods for research workers on the development of the science of statistics. Journal of the American Statistical Association, 46(253), 19\u201334.","journal-title":"Journal of the American Statistical Association"},{"issue":"1","key":"5612_CR86","doi-asserted-by":"publisher","first-page":"75","DOI":"10.1080\/00220973.1993.9943832","volume":"62","author":"D Zimmerman","year":"1993","unstructured":"Zimmerman, D., & Zumbo, B. (1993). Relative power of the Wilcoxon test, the Friedman test, and repeated-measures ANOVA on ranks. 
The Journal of Experimental Education, 62(1), 75\u201386.","journal-title":"The Journal of Experimental Education"}],"container-title":["Machine Learning"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/article\/10.1007\/s10994-016-5612-6\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/s10994-016-5612-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/s10994-016-5612-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,6,21]],"date-time":"2024-06-21T14:13:43Z","timestamp":1718979223000},"score":1,"resource":{"primary":{"URL":"http:\/\/link.springer.com\/10.1007\/s10994-016-5612-6"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016,12,30]]},"references-count":86,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2017,6]]}},"alternative-id":["5612"],"URL":"https:\/\/doi.org\/10.1007\/s10994-016-5612-6","relation":{},"ISSN":["0885-6125","1573-0565"],"issn-type":[{"value":"0885-6125","type":"print"},{"value":"1573-0565","type":"electronic"}],"subject":[],"published":{"date-parts":[[2016,12,30]]}}}