{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,29]],"date-time":"2026-03-29T07:48:39Z","timestamp":1774770519402,"version":"3.50.1"},"reference-count":103,"publisher":"Springer Science and Business Media LLC","issue":"3","license":[{"start":{"date-parts":[[2022,4,11]],"date-time":"2022-04-11T00:00:00Z","timestamp":1649635200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/www.springer.com\/tdm"},{"start":{"date-parts":[[2022,4,11]],"date-time":"2022-04-11T00:00:00Z","timestamp":1649635200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.springer.com\/tdm"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Data Min Knowl Disc"],"published-print":{"date-parts":[[2022,5]]},"DOI":"10.1007\/s10618-022-00828-1","type":"journal-article","created":{"date-parts":[[2022,4,11]],"date-time":"2022-04-11T12:04:09Z","timestamp":1649678649000},"page":"1102-1139","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":10,"title":["Using p-values for the comparison of classifiers: pitfalls and alternatives"],"prefix":"10.1007","volume":"36","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-7038-2601","authenticated-orcid":false,"given":"Daniel","family":"Berrar","sequence":"first","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2022,4,11]]},"reference":[{"key":"828_CR1","unstructured":"Abelson R (2016) A retrospective on the significance test ban of 1999 (if there were no significance tests, they would need to be invented). In: Harlow L, Mulaik S, Steiger J (eds) What if there were no significance tests?. 
Routledge Classic Editions, pp 107\u2013128"},{"key":"828_CR2","doi-asserted-by":"publisher","first-page":"1644","DOI":"10.1016\/j.athoracsur.2015.11.024","volume":"101","author":"A Althouse","year":"2016","unstructured":"Althouse A (2016) Adjust for multiple comparisons? It\u2019s not that simple. Ann Thorac Surg 101:1644\u20131645","journal-title":"Ann Thorac Surg"},{"issue":"4","key":"828_CR3","doi-asserted-by":"publisher","first-page":"4","DOI":"10.1038\/s41562-017-0224-0","volume":"2","author":"V Amrhein","year":"2018","unstructured":"Amrhein V, Greenland S (2018) Remove, rather than redefine, statistical significance. Nat Hum Behav 2(4):4","journal-title":"Nat Hum Behav"},{"key":"828_CR4","doi-asserted-by":"crossref","unstructured":"Amrhein V, Korner-Nievergelt F, Roth T (2017) The earth is flat ($$p > 0.05$$): significance thresholds and the crisis of unreplicable research. PeerJ 5:e3544","DOI":"10.7717\/peerj.3544"},{"key":"828_CR5","doi-asserted-by":"crossref","unstructured":"Bayarri M, Berger J (2000) P values for composite null models. J Am Stat Assoc 95(452):1127\u20131142","DOI":"10.1080\/01621459.2000.10474309"},{"key":"828_CR6","doi-asserted-by":"publisher","first-page":"90","DOI":"10.1016\/j.jmp.2015.12.007","volume":"72","author":"M Bayarri","year":"2016","unstructured":"Bayarri M, Benjamin D, Berger J, Sellke T (2016) Rejection odds and rejection ratios: a proposal for statistical practice in testing hypotheses. J Math Psychol 72:90\u2013103","journal-title":"J Math Psychol"},{"issue":"5","key":"828_CR7","first-page":"1","volume":"17","author":"A Benavoli","year":"2016","unstructured":"Benavoli A, Corani G, Mangili F (2016) Should we really use post-hoc tests based on mean-ranks? 
J Mach Learn Res 17(5):1\u201310","journal-title":"J Mach Learn Res"},{"issue":"77","key":"828_CR8","first-page":"1","volume":"18","author":"A Benavoli","year":"2017","unstructured":"Benavoli A, Corani G, Dem\u0161ar J, Zaffalon M (2017) Time for a change: a tutorial for comparing multiple classifiers through Bayesian analysis. J Mach Learn Res 18(77):1\u201336","journal-title":"J Mach Learn Res"},{"key":"828_CR9","unstructured":"Benjamin D, Berger J (2016) Comment: a simple alternative to $$p$$-values. Am Stat (Online Discussion: ASA Statement on Statistical Significance and $$P$$-values) 70:1\u20132"},{"issue":"sup1","key":"828_CR10","doi-asserted-by":"publisher","first-page":"186","DOI":"10.1080\/00031305.2018.1543135","volume":"73","author":"D Benjamin","year":"2019","unstructured":"Benjamin D, Berger J (2019) Three recommendations for improving the use of $$p$$-values. Am Stat 73(sup1):186\u2013191","journal-title":"Am Stat"},{"issue":"1","key":"828_CR11","doi-asserted-by":"publisher","first-page":"6","DOI":"10.1038\/s41562-017-0189-z","volume":"2","author":"D Benjamin","year":"2018","unstructured":"Benjamin D, Berger J, Johannesson M, Nosek B, Wagenmakers E, Berk R, Bollen K, Brembs B, Brown L, Camerer C, Cesarini D, Chambers C, Clyde M, Cook T, De Boeck P, Dienes Z, Dreber A, Easwaran K, Efferson C, Fehr E, Fidler F, Field A, Forster M, George E, Gonzalez R, Goodman S, Green E, Green D, Greenwald A, Hadfield J, Hedges L, Held L, Hua Ho T, Hoijtink H, Hruschka D, Imai K, Imbens G, Ioannidis J, Jeon M, Jones J, Kirchler M, Laibson D, List J, Little R, Lupia A, Machery E, Maxwell S, McCarthy M, Moore D, Morgan S, Munaf\u00f3 M, Nakagawa S, Nyhan B, Parker T, Pericchi L, Perugini M, Rouder J, Rousseau J, Savalei V, Sch\u00f6nbrodt F, Sellke T, Sinclair B, Tingley D, Van Zandt T, Vazire S, Watts D, Winship C, Wolpert R, Xie Y, Young C, Zinman J, Johnson V (2018) Redefine statistical significance. 
Nat Hum Behav 2(1):6\u201310","journal-title":"Nat Hum Behav"},{"key":"828_CR12","first-page":"159","volume":"76","author":"J Berger","year":"1988","unstructured":"Berger J, Berry D (1988) Statistical analysis and the illusion of objectivity. Am Sci 76:159\u2013165","journal-title":"Am Sci"},{"issue":"3","key":"828_CR13","first-page":"317","volume":"2","author":"J Berger","year":"1987","unstructured":"Berger J, Delampady M (1987) Testing precise hypotheses. Stat Sci 2(3):317\u2013352","journal-title":"Stat Sci"},{"key":"828_CR14","first-page":"112","volume":"82","author":"J Berger","year":"1987","unstructured":"Berger J, Sellke T (1987) Testing a point null hypothesis: the irreconcilability of $$p$$ values and evidence. J Am Stat Assoc 82:112\u2013122","journal-title":"J Am Stat Assoc"},{"key":"828_CR15","doi-asserted-by":"crossref","unstructured":"Berger J, Wolpert R (1988) The Likelihood Principle, 2nd edn. Institute of Mathematical Statistics, Hayward, California","DOI":"10.1214\/lnms\/1215466210"},{"key":"828_CR16","doi-asserted-by":"crossref","unstructured":"Berrar D (2017) Confidence curves: an alternative to null hypothesis significance testing for the comparison of classifiers. Mach Learn 106(6):911\u2013949","DOI":"10.1007\/s10994-016-5612-6"},{"issue":"4","key":"828_CR17","doi-asserted-by":"publisher","first-page":"247","DOI":"10.1007\/s41060-018-0148-4","volume":"7","author":"D Berrar","year":"2019","unstructured":"Berrar D, Dubitzky W (2019) Should significance testing be abandoned in machine learning? Int J Data Sci Anal 7(4):247\u2013257","journal-title":"Int J Data Sci Anal"},{"issue":"2","key":"828_CR18","doi-asserted-by":"publisher","first-page":"189","DOI":"10.1080\/0952813X.2012.680252","volume":"25","author":"D Berrar","year":"2013","unstructured":"Berrar D, Lozano J (2013) Significance tests or confidence intervals: which are preferable for the comparison of classifiers? 
J Exp Theor Artif Intell 25(2):189\u2013206","journal-title":"J Exp Theor Artif Intell"},{"issue":"2","key":"828_CR19","doi-asserted-by":"publisher","first-page":"143","DOI":"10.1007\/s41060-017-0057-y","volume":"4","author":"D Berrar","year":"2017","unstructured":"Berrar D, Lopes P, Dubitzky W (2017) Caveats and pitfalls in crowdsourcing research: the case of soccer referee bias. Int J Data Sci Anal 4(2):143\u2013151","journal-title":"Int J Data Sci Anal"},{"key":"828_CR20","doi-asserted-by":"publisher","first-page":"895","DOI":"10.1080\/01621459.2017.1316279","volume":"112","author":"D Berry","year":"2017","unstructured":"Berry D (2017) A $$p$$-value to die for. J Am Stat Assoc 112:895\u2013897","journal-title":"J Am Stat Assoc"},{"key":"828_CR21","doi-asserted-by":"publisher","first-page":"112","DOI":"10.1214\/aoms\/1177705145","volume":"32","author":"A Birnbaum","year":"1961","unstructured":"Birnbaum A (1961) A unified theory of estimation, I. Ann Math Stat 32:112\u2013135","journal-title":"Ann Math Stat"},{"key":"828_CR22","doi-asserted-by":"publisher","first-page":"100665","DOI":"10.1016\/j.swevo.2020.100665","volume":"54","author":"J Carrasco","year":"2020","unstructured":"Carrasco J, Garc\u00eda S, Rueda M, Das S, Herrera F (2020) Recent trends in the use of statistical tests for comparing swarm and evolutionary computing algorithms: practical guidelines and a critical review. Swarm Evol Comput 54:100665","journal-title":"Swarm Evol Comput"},{"issue":"3","key":"828_CR23","doi-asserted-by":"publisher","first-page":"378","DOI":"10.17763\/haer.48.3.t490261645281841","volume":"48","author":"R Carver","year":"1978","unstructured":"Carver R (1978) The case against statistical significance testing. 
Harv Educ Rev 48(3):378\u2013399","journal-title":"Harv Educ Rev"},{"issue":"2","key":"828_CR24","doi-asserted-by":"publisher","first-page":"121","DOI":"10.1198\/000313005X20871","volume":"59","author":"R Christensen","year":"2005","unstructured":"Christensen R (2005) Testing Fisher, Neyman, Pearson, and Bayes. Am Stat 59(2):121\u2013126","journal-title":"Am Stat"},{"issue":"8","key":"828_CR25","doi-asserted-by":"publisher","first-page":"70","DOI":"10.1145\/3360311","volume":"63","author":"A Cockburn","year":"2020","unstructured":"Cockburn A, Dragicevic P, Besan\u00e7on L, Gutwin C (2020) Threats of a replication crisis in empirical computer science. Commun ACM 63(8):70\u201379","journal-title":"Commun ACM"},{"issue":"12","key":"828_CR26","doi-asserted-by":"publisher","first-page":"1304","DOI":"10.1037\/0003-066X.45.12.1304","volume":"45","author":"J Cohen","year":"1990","unstructured":"Cohen J (1990) Things I have learned (so far). Am Psychol 45(12):1304\u20131312","journal-title":"Am Psychol"},{"issue":"12","key":"828_CR27","doi-asserted-by":"publisher","first-page":"997","DOI":"10.1037\/0003-066X.49.12.997","volume":"49","author":"J Cohen","year":"1994","unstructured":"Cohen J (1994) The earth is round ($$p <$$ .05). Am Psychol 49(12):997\u20131003","journal-title":"Am Psychol"},{"key":"828_CR28","doi-asserted-by":"publisher","first-page":"15","DOI":"10.1016\/0021-9681(79)90006-7","volume":"32","author":"P Cole","year":"1979","unstructured":"Cole P (1979) The evolving case-control study. J Chronic Dis 32:15\u201327","journal-title":"J Chronic Dis"},{"key":"828_CR29","doi-asserted-by":"publisher","first-page":"171085","DOI":"10.1098\/rsos.171085","volume":"4","author":"D Colquhoun","year":"2017","unstructured":"Colquhoun D (2017) The reproducibility of research and the misinterpretation of $$p$$-values. 
R Soc Open Sci 4:171085","journal-title":"R Soc Open Sci"},{"key":"828_CR30","volume-title":"Understanding the new statistics: effect sizes, confidence intervals, and meta-analysis","author":"G Cumming","year":"2012","unstructured":"Cumming G (2012) Understanding the new statistics: effect sizes, confidence intervals, and meta-analysis. Routledge, Taylor & Francis Group, New York\/London"},{"key":"828_CR31","doi-asserted-by":"crossref","unstructured":"Dau HA, Bagnall AJ, Kamgar K, Yeh CM, Zhu Y, Gharghabi S, Ratanamahatana CA, Keogh EJ (2019) The UCR time series archive. CoRR. arXiv:1810.07758","DOI":"10.1109\/JAS.2019.1911747"},{"key":"828_CR32","first-page":"1","volume":"7","author":"J Dem\u0161ar","year":"2006","unstructured":"Dem\u0161ar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1\u201330","journal-title":"J Mach Learn Res"},{"key":"828_CR33","doi-asserted-by":"publisher","first-page":"31","DOI":"10.1162\/089976698300017197","volume":"10","author":"T Dietterich","year":"1998","unstructured":"Dietterich T (1998) Approximate statistical tests for comparing supervised classification learning algorithms. Neural Comput 10:31\u201336","journal-title":"Neural Comput"},{"key":"828_CR34","unstructured":"Drummond C (2006) Machine learning as an experimental science, revisited. In: Proceedings of the 21st national conference on artificial intelligence: workshop on evaluation methods for machine learning. AAAI Press, pp 1\u20135"},{"key":"828_CR35","doi-asserted-by":"publisher","first-page":"67","DOI":"10.1080\/09528130903010295","volume":"2","author":"C Drummond","year":"2010","unstructured":"Drummond C, Japkowicz N (2010) Warning: statistical benchmarking is addictive. Kicking the habit in machine learning. J Exp Theor Artif Intell 2:67\u201380","journal-title":"J Exp Theor Artif Intell"},{"key":"828_CR36","unstructured":"Dua D, Graff C (2019) UCI machine learning repository. 
http:\/\/archive.ics.uci.edu\/ml"},{"key":"828_CR37","doi-asserted-by":"publisher","DOI":"10.1007\/978-0-387-49317-6","volume-title":"Multiple testing procedures with applications to genomics","author":"S Dudoit","year":"2008","unstructured":"Dudoit S, van der Laan M (2008) Multiple testing procedures with applications to genomics, 1st edn. Springer, New York","edition":"1"},{"key":"828_CR38","doi-asserted-by":"publisher","first-page":"675","DOI":"10.1080\/01621459.1937.10503522","volume":"32","author":"M Friedman","year":"1937","unstructured":"Friedman M (1937) The use of ranks to avoid the assumption of normality implicit in the analysis of variance. J Am Stat Assoc 32:675\u2013701","journal-title":"J Am Stat Assoc"},{"issue":"89","key":"828_CR39","first-page":"2677","volume":"9","author":"S Garc\u00eda","year":"2008","unstructured":"Garc\u00eda S, Herrera F (2008) An extension on statistical comparisons of classifiers over multiple data sets for all pairwise comparisons. J Mach Learn Res 9(89):2677\u20132694","journal-title":"J Mach Learn Res"},{"key":"828_CR40","unstructured":"Gelman A (2016) The problems with $$p$$-values are not just with $$p$$-values. The American Statistician, Online Discussion, pp 1\u20132"},{"issue":"0","key":"828_CR41","first-page":"1","volume":"0","author":"E Gibson","year":"2020","unstructured":"Gibson E (2020) The role of $$p$$-values in judging the strength of evidence and realistic replication expectations. Stat Biopharm Res 0(0):1\u201313","journal-title":"Stat Biopharm Res"},{"key":"828_CR42","doi-asserted-by":"publisher","first-page":"199","DOI":"10.1017\/S0140525X98281167","volume":"21","author":"G Gigerenzer","year":"1998","unstructured":"Gigerenzer G (1998) We need statistical thinking, not statistical rituals. 
Behav Brain Sci 21:199\u2013200","journal-title":"Behav Brain Sci"},{"key":"828_CR43","doi-asserted-by":"publisher","first-page":"587","DOI":"10.1016\/j.socec.2004.09.033","volume":"33","author":"G Gigerenzer","year":"2004","unstructured":"Gigerenzer G (2004) Mindless statistics. J Socio-Econ 33:587\u2013606","journal-title":"J Socio-Econ"},{"key":"828_CR44","first-page":"391","volume-title":"The Sage handbook of quantitative methodology for the social sciences","author":"G Gigerenzer","year":"2004","unstructured":"Gigerenzer G, Krauss S, Vitouch O (2004) The Null Ritual-What you always wanted to know about significance testing but were afraid to ask. In: Kaplan D (ed) The Sage handbook of quantitative methodology for the social sciences. Sage, Thousand Oaks, pp 391\u2013408"},{"key":"828_CR45","doi-asserted-by":"publisher","first-page":"875","DOI":"10.1002\/sim.4780110705","volume":"11","author":"S Goodman","year":"1992","unstructured":"Goodman S (1992) A comment on replication, $$p$$-values and evidence. Stat Med 11:875\u2013879","journal-title":"Stat Med"},{"key":"828_CR46","doi-asserted-by":"crossref","unstructured":"Goodman S (1993) P values, hypothesis tests, and likelihood: implications for epidemiology of a neglected historical debate. Am J Epidemiol 137(5):485\u2013496","DOI":"10.1093\/oxfordjournals.aje.a116700"},{"issue":"12","key":"828_CR47","doi-asserted-by":"publisher","first-page":"995","DOI":"10.7326\/0003-4819-130-12-199906150-00008","volume":"130","author":"S Goodman","year":"1999","unstructured":"Goodman S (1999) Toward evidence-based medical statistics 1: the P value fallacy. Ann Intern Med 130(12):995\u20131004","journal-title":"Ann Intern Med"},{"key":"828_CR48","doi-asserted-by":"crossref","unstructured":"Goodman S (2008) A dirty dozen: twelve P-value misconceptions. 
Semin Hematol 45(3):135\u2013140","DOI":"10.1053\/j.seminhematol.2008.04.003"},{"issue":"12","key":"828_CR49","doi-asserted-by":"publisher","first-page":"1568","DOI":"10.2105\/AJPH.78.12.1568","volume":"78","author":"S Goodman","year":"1988","unstructured":"Goodman S, Royall R (1988) Evidence and scientific research. Am J Public Health 78(12):1568\u20131574","journal-title":"Am J Public Health"},{"issue":"4","key":"828_CR50","doi-asserted-by":"publisher","first-page":"337","DOI":"10.1007\/s10654-016-0149-3","volume":"31","author":"S Greenland","year":"2016","unstructured":"Greenland S, Senn S, Rothman K, Carlin J, Poole C, Goodman S, Altman D (2016) Statistical tests, $$p$$ values, confidence intervals, and power: a guide to misinterpretations. Eur J Epidemiol 31(4):337\u2013350","journal-title":"Eur J Epidemiol"},{"key":"828_CR51","unstructured":"Gundersen OE, Kjensmo S (2018) State of the art: reproducibility in artificial intelligence. In: McIlraith SA, Weinberger KQ (eds) Proceedings of the 32nd AAAI conference on artificial intelligence. AAAI Press, pp 1644\u20131651"},{"issue":"1","key":"828_CR52","doi-asserted-by":"publisher","first-page":"15","DOI":"10.1037\/0003-066X.52.1.15","volume":"52","author":"R Hagen","year":"1997","unstructured":"Hagen R (1997) In praise of the null hypothesis significance test. Am Psychol 52(1):15\u201323","journal-title":"Am Psychol"},{"key":"828_CR53","volume-title":"Statistics","author":"W Hays","year":"1963","unstructured":"Hays W (1963) Statistics. Holt, Rinehart and Winston, New York"},{"issue":"5","key":"828_CR54","doi-asserted-by":"publisher","first-page":"1157","DOI":"10.3758\/s13423-013-0572-3","volume":"21","author":"R Hoekstra","year":"2014","unstructured":"Hoekstra R, Morey R, Rouder J, Wagenmakers E-J (2014) Robust misinterpretation of confidence intervals. 
Psychon Bull Rev 21(5):1157\u20131164","journal-title":"Psychon Bull Rev"},{"issue":"2","key":"828_CR55","first-page":"65","volume":"6","author":"S Holm","year":"1979","unstructured":"Holm S (1979) A simple sequentially rejective multiple test procedure. Scand J Stat 6(2):65\u201370","journal-title":"Scand J Stat"},{"issue":"3","key":"828_CR56","doi-asserted-by":"publisher","first-page":"295","DOI":"10.1177\/0959354304043638","volume":"14","author":"R Hubbard","year":"2004","unstructured":"Hubbard R (2004) Alphabet soup\u2014blurring the distinctions between $$p$$\u2019s and $$\\alpha $$\u2019s in psychological research. Theory Psychol 14(3):295\u2013327","journal-title":"Theory Psychol"},{"key":"828_CR57","doi-asserted-by":"crossref","unstructured":"Hubbard R (2019) Will the ASA\u2019s efforts to improve statistical practice be successful? Some evidence to the contrary. Am Stat 73(sup1: Statistical Inference in the 21st Century: A World Beyond $$p < 0.05$$):31\u201335","DOI":"10.1080\/00031305.2018.1497540"},{"key":"828_CR58","unstructured":"Hubbard R, Bayarri M (2003) P values are not error probabilities. Technical Report University of Valencia. http:\/\/www.uv.es\/sestio\/TechRep\/tr14-03.pdf. Accessed 8 February 2021"},{"issue":"6","key":"828_CR59","doi-asserted-by":"publisher","first-page":"571","DOI":"10.1080\/03610928008827904","volume":"9","author":"R Iman","year":"1980","unstructured":"Iman R, Davenport J (1980) Approximations of the critical region of the Friedman statistic. Commun Stat 9(6):571\u2013595","journal-title":"Commun Stat"},{"key":"828_CR60","doi-asserted-by":"crossref","unstructured":"Infanger D, Schmidt-Trucks\u00e4ss A (2019) P value functions: an underused method to present research results and to promote quantitative reasoning. 
Stat Med 38(21):4189\u20134197","DOI":"10.1002\/sim.8293"},{"issue":"14","key":"828_CR61","doi-asserted-by":"publisher","first-page":"1960","DOI":"10.1016\/j.patrec.2008.06.018","volume":"29","author":"A Isaksson","year":"2008","unstructured":"Isaksson A, Wallmana M, G\u00f6ransson H, Gustafsson M (2008) Cross-validation and bootstrapping are unreliable in small sample classification. Pattern Recogn Lett 29(14):1960\u20131965","journal-title":"Pattern Recogn Lett"},{"key":"828_CR62","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9780511921803","volume-title":"Evaluating learning algorithms: a classification perspective","author":"N Japkowicz","year":"2011","unstructured":"Japkowicz N, Shah M (2011) Evaluating learning algorithms: a classification perspective. Cambridge University Press, New York"},{"issue":"430","key":"828_CR63","doi-asserted-by":"publisher","first-page":"773","DOI":"10.1080\/01621459.1995.10476572","volume":"90","author":"R Kass","year":"1995","unstructured":"Kass R, Raftery A (1995) Bayes factors. J Am Stat Assoc 90(430):773\u2013795","journal-title":"J Am Stat Assoc"},{"issue":"5","key":"828_CR64","doi-asserted-by":"publisher","first-page":"658","DOI":"10.1002\/wcs.72","volume":"1","author":"J Kruschke","year":"2010","unstructured":"Kruschke J (2010) Bayesian data analysis. WIREs Cogn Sci 1(5):658\u2013676","journal-title":"WIREs Cogn Sci"},{"issue":"2","key":"828_CR65","doi-asserted-by":"publisher","first-page":"573","DOI":"10.1037\/a0029146","volume":"142","author":"J Kruschke","year":"2013","unstructured":"Kruschke J (2013) Bayesian estimation supersedes the $$t$$ test. J Exp Psychol Gen 142(2):573\u2013603","journal-title":"J Exp Psychol Gen"},{"key":"828_CR66","unstructured":"Kruschke J (2015) Doing Bayesian data analysis, 2nd edn. Elsevier Academic Press, Amsterdam. 
http:\/\/doingbayesiandataanalysis.blogspot.com\/"},{"issue":"2","key":"828_CR67","doi-asserted-by":"publisher","first-page":"270","DOI":"10.1177\/2515245918771304","volume":"1","author":"J Kruschke","year":"2018","unstructured":"Kruschke J (2018) Rejecting or accepting parameter values in Bayesian estimation. Adv Methods Pract Psychol Sci 1(2):270\u2013280","journal-title":"Adv Methods Pract Psychol Sci"},{"key":"828_CR68","doi-asserted-by":"publisher","first-page":"155","DOI":"10.3758\/s13423-017-1272-1","volume":"25","author":"J Kruschke","year":"2018","unstructured":"Kruschke J, Liddell T (2018) Bayesian data analysis for newcomers. Psychon Bull Rev 25:155\u2013177","journal-title":"Psychon Bull Rev"},{"issue":"3","key":"828_CR69","doi-asserted-by":"publisher","first-page":"639","DOI":"10.1177\/1745691620958012","volume":"16","author":"D Lakens","year":"2021","unstructured":"Lakens D (2021) The practical alternative to the $$p$$ value is the correctly used $$p$$ value. Perspect Psychol Sci 16(3):639\u2013648","journal-title":"Perspect Psychol Sci"},{"key":"828_CR70","doi-asserted-by":"publisher","first-page":"187","DOI":"10.1093\/biomet\/44.1-2.187","volume":"44","author":"D Lindley","year":"1957","unstructured":"Lindley D (1957) A statistical paradox. Biometrika 44:187\u2013192","journal-title":"Biometrika"},{"key":"828_CR71","doi-asserted-by":"publisher","first-page":"153","DOI":"10.1007\/BF02295996","volume":"12","author":"Q McNemar","year":"1947","unstructured":"McNemar Q (1947) Note on the sampling error of the difference between correlated proportions or percentages. Psychometrika 12:153\u2013157","journal-title":"Psychometrika"},{"key":"828_CR72","doi-asserted-by":"crossref","unstructured":"McShane BB, Gal D, Gelman A, Robert C, Tackett JL (2019) Abandon statistical significance. 
Am Stat 73(sup1: Statistical Inference in the 21st Century: A World Beyond $$p < 0.05$$):235\u2013245","DOI":"10.1080\/00031305.2018.1527253"},{"issue":"2","key":"828_CR73","doi-asserted-by":"publisher","first-page":"103","DOI":"10.1086\/288135","volume":"34","author":"P Meehl","year":"1967","unstructured":"Meehl P (1967) Theory-testing in psychology and physics: a methodological paradox. Philos Sci 34(2):103\u2013115","journal-title":"Philos Sci"},{"issue":"1","key":"828_CR74","doi-asserted-by":"publisher","first-page":"124","DOI":"10.3758\/s13423-015-0859-7","volume":"23","author":"J Miller","year":"2014","unstructured":"Miller J, Ulrich R (2014) Interpreting confidence intervals: a comment on Hoekstra, Morey, and Wagenmakers (2014). Psychon Bull Rev 23(1):124\u2013130","journal-title":"Psychon Bull Rev"},{"key":"828_CR75","unstructured":"Mulaik S, Raju N, Harshman R (2016) There is a time and a place for significance testing. In: Harlow L, Mulaik S, Steiger J (eds) What if there were no significance tests? Routledge Classic Editions"},{"key":"828_CR76","doi-asserted-by":"publisher","first-page":"239","DOI":"10.1023\/A:1024068626366","volume":"52","author":"C Nadeau","year":"2003","unstructured":"Nadeau C, Bengio Y (2003) Inference for the generalization error. Mach Learn 52:239\u2013281","journal-title":"Mach Learn"},{"issue":"11","key":"828_CR77","doi-asserted-by":"publisher","first-page":"2600","DOI":"10.1073\/pnas.1708274114","volume":"115","author":"B Nosek","year":"2018","unstructured":"Nosek B, Ebersole C, DeHaven A, Mellor D (2018) The preregistration revolution. Proc Natl Acad Sci USA 115(11):2600\u20132606","journal-title":"Proc Natl Acad Sci USA"},{"key":"828_CR78","doi-asserted-by":"publisher","first-page":"150","DOI":"10.1038\/506150a","volume":"506","author":"R Nuzzo","year":"2014","unstructured":"Nuzzo R (2014) Statistical errors. 
Nature 506:150\u2013152","journal-title":"Nature"},{"key":"828_CR79","doi-asserted-by":"publisher","first-page":"1236","DOI":"10.1136\/bmj.316.7139.1236","volume":"316","author":"T Perneger","year":"1998","unstructured":"Perneger T (1998) What\u2019s wrong with Bonferroni adjustments. BMJ 316:1236\u20131238","journal-title":"BMJ"},{"issue":"2","key":"828_CR80","doi-asserted-by":"publisher","first-page":"195","DOI":"10.2105\/AJPH.77.2.195","volume":"77","author":"C Poole","year":"1987","unstructured":"Poole C (1987) Beyond the confidence interval. Am J Public Health 77(2):195\u2013199","journal-title":"Am J Public Health"},{"key":"828_CR81","unstructured":"Raschka S (2018) Model evaluation, model selection, and algorithm selection in machine learning. CoRR. arXiv:1811.12808"},{"issue":"1","key":"828_CR82","doi-asserted-by":"publisher","first-page":"43","DOI":"10.1097\/00001648-199001000-00010","volume":"1","author":"K Rothman","year":"1990","unstructured":"Rothman K (1990) No adjustments are needed for multiple comparisons. Epidemiology 1(1):43\u201346","journal-title":"Epidemiology"},{"issue":"3","key":"828_CR83","doi-asserted-by":"publisher","first-page":"333","DOI":"10.1097\/00001648-199805000-00019","volume":"9","author":"K Rothman","year":"1998","unstructured":"Rothman K (1998) Writing for epidemiology. Epidemiology 9(3):333\u2013337","journal-title":"Epidemiology"},{"key":"828_CR84","unstructured":"Rothman K, Greenland S, Lash T (2008) Modern epidemiology, 3rd edn. Wolters Kluwer"},{"key":"828_CR85","doi-asserted-by":"publisher","first-page":"416","DOI":"10.1037\/h0042040","volume":"57","author":"W Rozeboom","year":"1960","unstructured":"Rozeboom W (1960) The fallacy of the null hypothesis significance test. 
Psychol Bull 57:416\u2013428","journal-title":"Psychol Bull"},{"key":"828_CR86","doi-asserted-by":"publisher","first-page":"317","DOI":"10.1023\/A:1009752403260","volume":"1","author":"S Salzberg","year":"1997","unstructured":"Salzberg S (1997) On comparing classifiers: pitfalls to avoid and a recommended approach. Data Min Knowl Disc 1:317\u2013327","journal-title":"Data Min Knowl Disc"},{"issue":"2","key":"828_CR87","doi-asserted-by":"publisher","first-page":"115","DOI":"10.1037\/1082-989X.1.2.115","volume":"1","author":"F Schmidt","year":"1996","unstructured":"Schmidt F (1996) Statistical significance testing and cumulative knowledge in psychology: implications for training of researchers. Psychol Methods 1(2):115\u2013129","journal-title":"Psychol Methods"},{"key":"828_CR88","unstructured":"Schmidt F, Hunter J (2016) Eight common but false objections to the discontinuation of significance testing in the analysis of research data. In: Harlow L, Mulaik S, Steiger J (eds) What if there were no significance tests? Routledge, pp 35\u201360"},{"issue":"1","key":"828_CR89","doi-asserted-by":"publisher","first-page":"411","DOI":"10.1007\/s11192-014-1251-5","volume":"102","author":"J Schneider","year":"2015","unstructured":"Schneider J (2015) Null hypothesis significance tests: a mix-up of two different theories-the basis for widespread confusion and numerous misinterpretations. Scientometrics 102(1):411\u2013432","journal-title":"Scientometrics"},{"issue":"1","key":"828_CR90","doi-asserted-by":"publisher","first-page":"62","DOI":"10.1198\/000313001300339950","volume":"55","author":"T Sellke","year":"2001","unstructured":"Sellke T, Bayarri M, Berger J (2001) Calibration of $$p$$ values for testing precise null hypotheses. 
Am Stat 55(1):62\u201371","journal-title":"Am Stat"},{"issue":"1","key":"828_CR91","doi-asserted-by":"publisher","first-page":"73","DOI":"10.1037\/0003-066X.40.1.73","volume":"40","author":"R Serlin","year":"1985","unstructured":"Serlin R, Lapsley D (1985) Rationality in psychological research: the good-enough principle. Am Psychol 40(1):73\u201383","journal-title":"Am Psychol"},{"key":"828_CR92","unstructured":"Sheskin D (2007) Handbook of parametric and nonparametric statistical procedures, 4th edn. Chapman and Hall, CRC"},{"key":"828_CR93","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/0197-2456(89)90015-9","volume":"10","author":"R Simon","year":"1989","unstructured":"Simon R (1989) Optimal two-stage designs for stage II clinical trials. Control Clin Trials 10:1\u201310","journal-title":"Control Clin Trials"},{"key":"828_CR94","doi-asserted-by":"publisher","first-page":"225","DOI":"10.1007\/s10654-010-9440-x","volume":"25","author":"A Stang","year":"2010","unstructured":"Stang A, Poole C, Kuss O (2010) The ongoing tyranny of statistical significance testing in biomedical research. Eur J Epidemiol 25:225\u2013230","journal-title":"Eur J Epidemiol"},{"issue":"1","key":"828_CR95","doi-asserted-by":"publisher","first-page":"100","DOI":"10.1214\/ss\/1177011945","volume":"6","author":"J Tukey","year":"1991","unstructured":"Tukey J (1991) The philosophy of multiple comparisons. Stat Sci 6(1):100\u2013116","journal-title":"Stat Sci"},{"key":"828_CR96","doi-asserted-by":"crossref","first-page":"317","DOI":"10.1111\/j.2517-6161.1993.tb01904.x","volume":"55","author":"V Vovk","year":"1993","unstructured":"Vovk V (1993) A logic of probability, with application to the foundations of statistics. 
J Roy Stat Soc B 55:317\u2013351","journal-title":"J Roy Stat Soc B"},{"issue":"5","key":"828_CR97","doi-asserted-by":"publisher","first-page":"779","DOI":"10.3758\/BF03194105","volume":"14","author":"E-J Wagenmakers","year":"2007","unstructured":"Wagenmakers E-J (2007) A practical solution to the pervasive problems of $$p$$ values. Psychon Bull Rev 14(5):779\u2013804","journal-title":"Psychon Bull Rev"},{"key":"828_CR98","unstructured":"Wagenmakers E-J, Ly A (2021) History and nature of the Jeffreys\u2013Lindley Paradox. https:\/\/arxiv.org\/abs\/2111.10191"},{"key":"828_CR99","doi-asserted-by":"publisher","unstructured":"Wagenmakers E-J, Gronau Q, Vandekerckhove J (2019) Five Bayesian intuitions for the stopping rule principle. PsyArXiv 1\u201313. https:\/\/doi.org\/10.31234\/osf.io\/5ntkd","DOI":"10.31234\/osf.io\/5ntkd"},{"issue":"2","key":"828_CR100","doi-asserted-by":"publisher","first-page":"129","DOI":"10.1080\/00031305.2016.1154108","volume":"70","author":"R Wasserstein","year":"2016","unstructured":"Wasserstein R, Lazar N (2016) The ASA\u2019s statement on $$p$$-values: context, process, and purpose (editorial). Am Stat 70(2):129\u2013133","journal-title":"Am Stat"},{"key":"828_CR101","doi-asserted-by":"crossref","unstructured":"Wasserstein R, Schirm A, Lazar N (2019) Moving to a world beyond \u201c$$p < 0.05$$\u201d. Am Stat 73(sup1: Statistical Inference in the 21st Century: A World Beyond $$p < 0.05$$):1\u201319","DOI":"10.1080\/00031305.2019.1583913"},{"issue":"6","key":"828_CR102","doi-asserted-by":"publisher","first-page":"80","DOI":"10.2307\/3001968","volume":"1","author":"F Wilcoxon","year":"1945","unstructured":"Wilcoxon F (1945) Individual comparisons by ranking methods. 
Biom Bull 1(6):80\u201383","journal-title":"Biom Bull"},{"issue":"7","key":"828_CR103","doi-asserted-by":"publisher","first-page":"1341","DOI":"10.1162\/neco.1996.8.7.1341","volume":"8","author":"D Wolpert","year":"1996","unstructured":"Wolpert D (1996) The lack of a priori distinctions between learning algorithms. Neural Comput 8(7):1341\u20131390","journal-title":"Neural Comput"}],"container-title":["Data Mining and Knowledge Discovery"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10618-022-00828-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10618-022-00828-1\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10618-022-00828-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,9,21]],"date-time":"2024-09-21T23:01:49Z","timestamp":1726959709000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10618-022-00828-1"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,4,11]]},"references-count":103,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2022,5]]}},"alternative-id":["828"],"URL":"https:\/\/doi.org\/10.1007\/s10618-022-00828-1","relation":{},"ISSN":["1384-5810","1573-756X"],"issn-type":[{"value":"1384-5810","type":"print"},{"value":"1573-756X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,4,11]]},"assertion":[{"value":"16 February 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"16 March 2022","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"11 April 
2022","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}