{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,16]],"date-time":"2026-05-16T02:05:48Z","timestamp":1778897148002,"version":"3.51.4"},"reference-count":42,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2022,12,2]],"date-time":"2022-12-02T00:00:00Z","timestamp":1669939200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,12,2]],"date-time":"2022-12-02T00:00:00Z","timestamp":1669939200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100000185","name":"Defense Advanced Research Projects Agency","doi-asserted-by":"publisher","award":["D17AC00001"],"award-info":[{"award-number":["D17AC00001"]}],"id":[{"id":"10.13039\/100000185","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Stat Comput"],"published-print":{"date-parts":[[2023,2]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Predictive modeling uncovers knowledge and insights regarding a hypothesized data generating mechanism (DGM). Results from different studies on a complex DGM, derived from different data sets, and using complicated models and algorithms, are hard to quantitatively compare due to random noise and statistical uncertainty in model results. This has been one of the main contributors to the <jats:italic>replication crisis<\/jats:italic> in the behavioral sciences. The contribution of this paper is to apply prediction scoring to the problem of comparing two studies, such as can arise when evaluating replications or competing evidence. We examine the role of predictive models in quantitatively assessing agreement between two datasets that are assumed to come from two distinct DGMs. 
We formalize a distance between the DGMs that is estimated using cross validation. We argue that the resulting prediction scores depend on the predictive models created by cross validation. In this sense, the prediction scores measure the distance between DGMs, along the dimension of the particular predictive model. Using human behavior data from experimental economics, we demonstrate that prediction scores can be used to evaluate preregistered hypotheses and provide insights comparing data from different populations and settings. We examine the asymptotic behavior of the prediction scores using simulated experimental data and demonstrate that leveraging competing predictive models can reveal important differences between underlying DGMs. Our proposed cross-validated prediction scores are capable of quantifying differences between unobserved data generating mechanisms and allow for the validation and assessment of results from complex models.<\/jats:p>","DOI":"10.1007\/s11222-022-10154-7","type":"journal-article","created":{"date-parts":[[2022,12,2]],"date-time":"2022-12-02T18:03:42Z","timestamp":1670004222000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":5,"title":["Prediction scoring of data-driven discoveries for reproducible research"],"prefix":"10.1007","volume":"33","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1507-0207","authenticated-orcid":false,"given":"Anna 
L.","family":"Smith","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Tian","family":"Zheng","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Andrew","family":"Gelman","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2022,12,2]]},"reference":[{"key":"10154_CR1","doi-asserted-by":"publisher","first-page":"291","DOI":"10.1080\/00031305.2018.1518270","volume":"73","author":"D Billheimer","year":"2019","unstructured":"Billheimer, D.: Predictive inference and scientific reproducibility. Am. Stat. 73, 291\u2013295 (2019)","journal-title":"Am. Stat."},{"key":"10154_CR2","doi-asserted-by":"crossref","unstructured":"Colling, L.J., Sz\u0171cs, D.: Statistical inference and the replication crisis. Rev. Philos. Psychol. pp. 1\u201327 (2018)","DOI":"10.1007\/s13164-018-0421-4"},{"key":"10154_CR3","doi-asserted-by":"publisher","first-page":"675","DOI":"10.1198\/106186006X136976","volume":"15","author":"SR Cook","year":"2006","unstructured":"Cook, S.R., Gelman, A., Rubin, D.B.: Validation of software for Bayesian models using posterior quantiles. J. Comput. Graph. Stat. 15, 675\u2013692 (2006)","journal-title":"J. Comput. Graph. Stat."},{"key":"10154_CR4","doi-asserted-by":"crossref","unstructured":"Davis, J., Goadrich, M.: The relationship between precision-recall and ROC curves. In: Proceedings of the 23rd International Conference on Machine Learning pp. 233\u2013240. ACM (2006)","DOI":"10.1145\/1143844.1143874"},{"key":"10154_CR5","unstructured":"Diego-Rosell, P.: Experiment 1, Open Science Framework, OSF (2017)"},{"key":"10154_CR6","volume-title":"Predictive Inference","author":"S Geisser","year":"2017","unstructured":"Geisser, S.: Predictive Inference. 
Routledge, London (2017)"},{"key":"10154_CR7","doi-asserted-by":"publisher","first-page":"432","DOI":"10.1198\/004017005000000661","volume":"48","author":"A Gelman","year":"2006","unstructured":"Gelman, A.: Multilevel (hierarchical) modeling: what it can and cannot do. Technometrics 48, 432\u2013435 (2006)","journal-title":"Technometrics"},{"key":"10154_CR8","doi-asserted-by":"publisher","first-page":"40","DOI":"10.1093\/pan\/mps032","volume":"21","author":"A Gelman","year":"2013","unstructured":"Gelman, A.: Preregistration of studies and mock reports. Polit. Anal. 21, 40\u201341 (2013)","journal-title":"Polit. Anal."},{"key":"10154_CR9","doi-asserted-by":"publisher","first-page":"997","DOI":"10.1007\/s11222-013-9416-2","volume":"24","author":"A Gelman","year":"2014","unstructured":"Gelman, A., Hwang, J., Vehtari, A.: Understanding predictive information criteria for Bayesian models. Stat. Comput. 24, 997\u20131016 (2014)","journal-title":"Stat. Comput."},{"key":"10154_CR10","doi-asserted-by":"publisher","first-page":"460","DOI":"10.1511\/2014.111.460","volume":"102","author":"A Gelman","year":"2014","unstructured":"Gelman, A., Loken, E.: The statistical crisis in science. Am. Sci. 102, 460\u2013465 (2014)","journal-title":"Am. Sci."},{"key":"10154_CR11","doi-asserted-by":"publisher","first-page":"243","DOI":"10.1111\/j.1467-9868.2007.00587.x","volume":"69","author":"T Gneiting","year":"2007","unstructured":"Gneiting, T., Balabdaoui, F., Raftery, A.E.: Probabilistic forecasts, calibration and sharpness. J. R. Stat. Soc. B 69, 243\u2013268 (2007)","journal-title":"J. R. Stat. Soc. B"},{"key":"10154_CR12","doi-asserted-by":"publisher","first-page":"125","DOI":"10.1146\/annurev-statistics-062713-085831","volume":"1","author":"T Gneiting","year":"2014","unstructured":"Gneiting, T., Katzfuss, M.: Probabilistic forecasting. Annu. Rev. Stat. Appl. 1, 125\u2013151 (2014)","journal-title":"Annu. Rev. Stat. 
Appl."},{"key":"10154_CR13","doi-asserted-by":"publisher","first-page":"359","DOI":"10.1198\/016214506000001437","volume":"102","author":"T Gneiting","year":"2007","unstructured":"Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. J. Am. Stat. Assoc. 102, 359\u2013378 (2007)","journal-title":"J. Am. Stat. Assoc."},{"key":"10154_CR14","volume-title":"The Elements of Statistical Learning","author":"T Hastie","year":"2017","unstructured":"Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning, 2nd edn. Springer, Berlin (2017)","edition":"2"},{"key":"10154_CR15","doi-asserted-by":"crossref","unstructured":"Held, L., Schr\u00f6dle, B., Rue, H.: Posterior and cross-validatory predictive checks: a comparison of MCMC and INLA. In: Statistical Modelling and Regression Structures, pp. 91\u2013110. Springer (2010)","DOI":"10.1007\/978-3-7908-2413-1_6"},{"key":"10154_CR16","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1093\/pan\/mps021","volume":"21","author":"M Humphreys","year":"2013","unstructured":"Humphreys, M., Sanchez de la Sierra, R., Van der Windt, P.: Fishing, commitment, and communication: a proposal for comprehensive nonbinding research registration. Polit. Anal. 21, 1\u201320 (2013)","journal-title":"Polit. Anal."},{"key":"10154_CR17","unstructured":"Jeske, D.: Statistical inference in the 21st century: A world beyond $$p < 0.05$$ [special issue]. Am. Stat. 73 (2019)"},{"key":"10154_CR18","first-page":"1","volume":"4","author":"AN Kolmogorov","year":"1933","unstructured":"Kolmogorov, A.N.: Sulla determinazione empirica di una legge di distribuzione. Giornale dell\u2019Istituto Italiano degli Attuari 4, 1\u201311 (1933)","journal-title":"Giornale dell\u2019Istituto Italiano degli Attuari"},{"key":"10154_CR19","volume-title":"Handbook of Experimental Economics","author":"JO Ledyard","year":"1995","unstructured":"Ledyard, J.O.: Public goods: some experimental results. In: Kagel, J., Roth, A. 
(eds.) Handbook of Experimental Economics. Princeton University Press, Princeton (1995)"},{"key":"10154_CR20","doi-asserted-by":"publisher","first-page":"881","DOI":"10.1007\/s11222-015-9577-2","volume":"26","author":"L Li","year":"2016","unstructured":"Li, L., Qiu, S., Zhang, B., Feng, C.X.: Approximating cross-validatory predictive evaluation in Bayesian latent variable models with integrated IS and WAIC. Stat. Comput. 26, 881\u2013897 (2016)","journal-title":"Stat. Comput."},{"key":"10154_CR21","doi-asserted-by":"publisher","first-page":"375","DOI":"10.1007\/s11222-017-9736-8","volume":"28","author":"RB Millar","year":"2018","unstructured":"Millar, R.B.: Conditional vs marginal estimation of the predictive loss of hierarchical models using WAIC and cross-validation. Stat. Comput. 28, 375\u2013385 (2018)","journal-title":"Stat. Comput."},{"key":"10154_CR22","unstructured":"Nosek, B.A., Spitzer, M., Russell, A., Tully, E., Rajtmajer, S., Ahn, S.-H., Zheng, T., Foy, D., Kluch, S.P., Stewart, C., et al.: NGS2 DARPA Program, Open Science Framework (2018). https:\/\/osf.io\/4jbx4\/"},{"key":"10154_CR23","doi-asserted-by":"publisher","first-page":"150","DOI":"10.1038\/506150a","volume":"506","author":"R Nuzzo","year":"2014","unstructured":"Nuzzo, R.: Statistical errors: P values, the \u2018gold standard\u2019 of statistical validity, are not as reliable as many scientists assume. Nature 506, 150\u2013153 (2014)","journal-title":"Nature"},{"key":"10154_CR24","unstructured":"Pawel, S., Held, L.: The sceptical Bayes factor for the assessment of replication success (2020). arXiv preprint arXiv:2009.01520"},{"key":"10154_CR25","doi-asserted-by":"publisher","first-page":"379","DOI":"10.1093\/biomet\/25.3-4.379","volume":"25","author":"K Pearson","year":"1933","unstructured":"Pearson, K.: On a method of determining whether a sample of size n supposed to have been drawn from a parent population having a known probability integral has probably been drawn at random. 
Biometrika 25, 379\u2013410 (1933)","journal-title":"Biometrika"},{"key":"10154_CR26","doi-asserted-by":"publisher","first-page":"e6287","DOI":"10.1371\/journal.pone.0006287","volume":"4","author":"TH Pers","year":"2009","unstructured":"Pers, T.H., Albrechtsen, A., Holst, C., S\u00f8rensen, T.I., Gerds, T.A.: The validation and assessment of machine learning: a game of prediction from high-dimensional data. PLoS ONE 4, e6287 (2009)","journal-title":"PLoS ONE"},{"key":"10154_CR27","doi-asserted-by":"publisher","first-page":"39","DOI":"10.1016\/S0304-4076(00)00030-0","volume":"99","author":"J Racine","year":"2000","unstructured":"Racine, J.: Consistent cross-validatory model-selection for dependent data: hv-block cross-validation. J. Econom. 99, 39\u201361 (2000)","journal-title":"J. Econom."},{"key":"10154_CR28","doi-asserted-by":"publisher","first-page":"19193","DOI":"10.1073\/pnas.1108243108","volume":"108","author":"DG Rand","year":"2011","unstructured":"Rand, D.G., Arbesman, S., Christakis, N.A.: Dynamic social networks promote cooperation in experiments with humans. Proc. Natl. Acad. Sci. 108, 19193\u201319198 (2011)","journal-title":"Proc. Natl. Acad. Sci."},{"key":"10154_CR29","doi-asserted-by":"publisher","first-page":"913","DOI":"10.1111\/ecog.02881","volume":"40","author":"DR Roberts","year":"2017","unstructured":"Roberts, D.R., Bahn, V., Ciuti, S., Boyce, M.S., Elith, J., Guillera-Arroita, G., Hauenstein, S., Lahoz-Monfort, J.J., Schr\u00f6der, B., Thuiller, W., et al.: Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure. Ecography 40, 913\u2013929 (2017)","journal-title":"Ecography"},{"key":"10154_CR30","doi-asserted-by":"publisher","first-page":"470","DOI":"10.1214\/aoms\/1177729394","volume":"23","author":"M Rosenblatt","year":"1952","unstructured":"Rosenblatt, M.: Remarks on a multivariate transformation. Ann. Math. Stat. 23, 470\u2013472 (1952)","journal-title":"Ann. Math. 
Stat."},{"key":"10154_CR31","doi-asserted-by":"publisher","first-page":"559","DOI":"10.1177\/0956797614567341","volume":"26","author":"U Simonsohn","year":"2015","unstructured":"Simonsohn, U.: Small telescopes: Detectability and the evaluation of replication results. Psychol. Sci. 26, 559\u2013569 (2015)","journal-title":"Psychol. Sci."},{"key":"10154_CR32","unstructured":"Smith, A.L.: PredictionScoring (2020)"},{"key":"10154_CR33","unstructured":"Suchow, J.W., Stewart, A.J., Morgan, T.J.H., Malkomes, G., Krafft, P., Lall, V., Mosleh, M., Arechar, A., Akcay, E., Morsky, B., Rand, D., Plotkin, J.B., Griffiths, T.L.: Innovation in adversarial collective-sensing game, Open Science Framework, OSF (2017). https:\/\/osf.io\/zpvd3"},{"key":"10154_CR34","doi-asserted-by":"crossref","unstructured":"Sz\u00e9kely, G.J., Rizzo, M.L., Bakirov, N.K.: Measuring and testing dependence by correlation of distances. Ann. Stat. 35, 2769\u20132794 (2007)","DOI":"10.1214\/009053607000000505"},{"key":"10154_CR35","unstructured":"Talts, S., Betancourt, M., Simpson, D., Vehtari, A., Gelman, A.: Validating Bayesian inference algorithms with simulation-based calibration (2018). arXiv preprint arXiv:1804.06788"},{"key":"10154_CR36","doi-asserted-by":"crossref","unstructured":"Tukey, J.W.: Data analysis, computation and mathematics. Q. Appl. Math. 30, 51\u201365 (1972)","DOI":"10.1090\/qam\/99740"},{"key":"10154_CR37","doi-asserted-by":"publisher","first-page":"1413","DOI":"10.1007\/s11222-016-9696-4","volume":"27","author":"A Vehtari","year":"2017","unstructured":"Vehtari, A., Gelman, A., Gabry, J.: Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Stat. Comput. 27, 1413\u20131432 (2017)","journal-title":"Stat. 
Comput."},{"key":"10154_CR38","doi-asserted-by":"publisher","first-page":"142","DOI":"10.1214\/12-SS102","volume":"6","author":"A Vehtari","year":"2012","unstructured":"Vehtari, A., Ojanen, J., et al.: A survey of Bayesian predictive methods for model assessment, selection and comparison. Stat. Surv. 6, 142\u2013228 (2012)","journal-title":"Stat. Surv."},{"key":"10154_CR39","first-page":"1","volume":"7","author":"W Wang","year":"2014","unstructured":"Wang, W., Gelman, A.: Difficulty of selecting among multilevel models using predictive accuracy. Stat. Interface 7, 1\u201388 (2014)","journal-title":"Stat. Interface"},{"key":"10154_CR40","doi-asserted-by":"publisher","first-page":"129","DOI":"10.1080\/00031305.2016.1154108","volume":"70","author":"RL Wasserstein","year":"2016","unstructured":"Wasserstein, R.L., Lazar, N.A.: The ASA\u2019s statement on p-values: context, process, and purpose. Am. Stat. 70, 129\u2013133 (2016)","journal-title":"Am. Stat."},{"key":"10154_CR41","doi-asserted-by":"publisher","first-page":"2600","DOI":"10.1016\/j.patrec.2005.06.006","volume":"26","author":"WA Yousef","year":"2005","unstructured":"Yousef, W.A., Wagner, R.F., Loew, M.H.: Estimating the uncertainty in the estimated mean area under the ROC curve of a classifier. Pattern Recogn. Lett. 26, 2600\u20132610 (2005)","journal-title":"Pattern Recogn. Lett."},{"key":"10154_CR42","volume-title":"The Cult of Statistical Significance: How the Standard Error Costs us Jobs, Justice, and Lives","author":"S Ziliak","year":"2008","unstructured":"Ziliak, S., McCloskey, D.N.: The Cult of Statistical Significance: How the Standard Error Costs us Jobs, Justice, and Lives. 
University of Michigan Press, Ann Arbor (2008)"}],"container-title":["Statistics and Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11222-022-10154-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11222-022-10154-7\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11222-022-10154-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,13]],"date-time":"2023-02-13T22:46:00Z","timestamp":1676328360000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11222-022-10154-7"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,12,2]]},"references-count":42,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2023,2]]}},"alternative-id":["10154"],"URL":"https:\/\/doi.org\/10.1007\/s11222-022-10154-7","relation":{},"ISSN":["0960-3174","1573-1375"],"issn-type":[{"value":"0960-3174","type":"print"},{"value":"1573-1375","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,12,2]]},"assertion":[{"value":"20 May 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"12 September 2022","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"2 December 2022","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors have no relevant financial or non-financial interests to 
disclose.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}},{"value":"Experimental data analyzed in Sect.\u00a0 is available on the Open Science Framework through associated GitHub links (Diego-Rosell, 2017). The simulated data and all code used in this paper are available in an additional public GitHub repository (Smith, 2020).","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Data and Code"}}],"article-number":"11"}}