{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,10]],"date-time":"2025-12-10T15:59:47Z","timestamp":1765382387793,"version":"3.37.3"},"reference-count":45,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2022,5,14]],"date-time":"2022-05-14T00:00:00Z","timestamp":1652486400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,5,14]],"date-time":"2022-05-14T00:00:00Z","timestamp":1652486400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100002341","name":"Academy of Finland","doi-asserted-by":"publisher","award":["298742","313122"],"award-info":[{"award-number":["298742","313122"]}],"id":[{"id":"10.13039\/501100002341","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100006136","name":"Teknologiateollisuuden 100-Vuotisjuhlas\u00e4\u00e4ti\u00f6","doi-asserted-by":"publisher","award":["70007503"],"award-info":[{"award-number":["70007503"]}],"id":[{"id":"10.13039\/501100006136","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Comput Stat"],"published-print":{"date-parts":[[2023,3]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Variable selection, or more generally, model reduction is an important aspect of the statistical workflow aiming to provide insights from data. In this paper, we discuss and demonstrate the benefits of using a reference model in variable selection. A reference model acts as a noise-filter on the target variable by modeling its data generating mechanism. As a result, using the reference model predictions in the model selection procedure reduces the variability and improves stability, leading to improved model selection performance. Assuming that a Bayesian reference model describes the true distribution of future data well, the theoretically preferred usage of the reference model is to project its predictive distribution to a reduced model, leading to projection predictive variable selection approach. We analyse how much the great performance of the projection predictive variable is due to the use of reference model and show that other variable selection methods can also be greatly improved by using the reference model as target instead of the original data. In several numerical experiments, we investigate the performance of the projective prediction approach as well as alternative variable selection methods with and without reference models. Our results indicate that the use of reference models generally translates into better and more stable variable selection.<\/jats:p>","DOI":"10.1007\/s00180-022-01231-6","type":"journal-article","created":{"date-parts":[[2022,5,14]],"date-time":"2022-05-14T04:03:03Z","timestamp":1652500983000},"page":"349-371","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":20,"title":["Using reference models in variable selection"],"prefix":"10.1007","volume":"38","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-8460-4762","authenticated-orcid":false,"given":"Federico","family":"Pavone","sequence":"first","affiliation":[]},{"given":"Juho","family":"Piironen","sequence":"additional","affiliation":[]},{"given":"Paul-Christian","family":"B\u00fcrkner","sequence":"additional","affiliation":[]},{"given":"Aki","family":"Vehtari","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2022,5,14]]},"reference":[{"key":"1231_CR1","doi-asserted-by":"crossref","unstructured":"Akaike H (1974) A new look at the statistical model identification selected papers of Hirotugu Akaike. Springer, pp 215\u2013222","DOI":"10.1007\/978-1-4612-1694-0_16"},{"issue":"473","key":"1231_CR2","doi-asserted-by":"publisher","first-page":"119","DOI":"10.1198\/016214505000000628","volume":"101","author":"E Bair","year":"2006","unstructured":"Bair E, Hastie T, Paul D, Tibshirani R (2006) Prediction by supervised principal components. J Am Stat Assoc 101(473):119\u2013137","journal-title":"J Am Stat Assoc"},{"key":"1231_CR3","doi-asserted-by":"crossref","unstructured":"Betancourt M (2017) A conceptual introduction to Hamiltonian Monte Carlo. arXiv:1701.02434","DOI":"10.3150\/16-BEJ810"},{"issue":"1","key":"1231_CR4","doi-asserted-by":"publisher","first-page":"1","DOI":"10.18637\/jss.v076.i01","volume":"76","author":"B Carpenter","year":"2017","unstructured":"Carpenter B, Gelman A, Hoffman MD, Lee D, Goodrich B, Betancourt M, Riddell A (2017) Stan: a probabilistic programming language. J Stat Softw 76(1):1\u201332","journal-title":"J Stat Softw"},{"key":"1231_CR5","unstructured":"Catalina A, B\u00fcrkner PC, Vehtari A (2020) Projection predictive inference for generalized linear and additive multilevel models. arXiv:2010.06994"},{"key":"1231_CR6","unstructured":"Catalina A, B\u00fcrkner P, Vehtari A (2021) Latent space projection predictive inference. arXiv:2109.04702"},{"issue":"1\u20132","key":"1231_CR7","doi-asserted-by":"publisher","first-page":"77","DOI":"10.1016\/S0378-3758(02)00286-0","volume":"111","author":"JA Dupuis","year":"2003","unstructured":"Dupuis JA, Robert CP (2003) Variable selection in qualitative models via an entropic explanatory power. J Stat Plan Inference 111(1\u20132):77\u201394","journal-title":"J Stat Plan Inference"},{"issue":"1","key":"1231_CR8","first-page":"1","volume":"23","author":"B Efron","year":"2008","unstructured":"Efron B (2008) Microarrays, empirical Bayes and the two-groups model. Stat Sci 23(1):1\u201322","journal-title":"Stat Sci"},{"key":"1231_CR9","doi-asserted-by":"crossref","unstructured":"Efron B (2011) Tweedie\u2019s formula and selection bias. J Am Stat Assoc 106(496):1602\u20131614","DOI":"10.1198\/jasa.2011.tm11181"},{"key":"1231_CR10","volume-title":"Large-scale inference: empirical Bayes methods for estimation, testing, and prediction","author":"B Efron","year":"2012","unstructured":"Efron B (2012) Large-scale inference: empirical Bayes methods for estimation, testing, and prediction. Cambridge University Press, Cambridge"},{"key":"1231_CR11","unstructured":"Efron B, Turnbull B, Narasimhan B (2015) locfdr: Computes local false discovery rates https:\/\/CRAN.R-project.org\/package=locfdr. R package version 1.1-8"},{"issue":"19","key":"1231_CR12","doi-asserted-by":"publisher","first-page":"2965","DOI":"10.1002\/sim.912","volume":"20","author":"D Faraggi","year":"2001","unstructured":"Faraggi D, LeBlanc M, Crowley J (2001) Understanding neural networks using regression trees: an application to multiple myeloma survival data. Stat Med 20(19):2965\u20132976","journal-title":"Stat Med"},{"issue":"2","key":"1231_CR13","doi-asserted-by":"publisher","first-page":"389","DOI":"10.1111\/rssa.12378","volume":"182","author":"J Gabry","year":"2019","unstructured":"Gabry J, Simpson D, Vehtari A, Betancourt M, Gelman A (2019) Visualization in Bayesian workflow. J R Stat Soc Ser A (Stat Soc) 182(2):389\u2013402","journal-title":"J R Stat Soc Ser A (Stat Soc)"},{"key":"1231_CR14","unstructured":"Gelman A, Vehtari A, Simpson D, Margossian CC, Carpenter B, Yao Y, Modr\u00e1k M (2020) Bayesian workflow. arXiv:2011.01808"},{"key":"1231_CR15","unstructured":"Goodrich B, Gabry J, Ali I Brilleman S (2019) rstanarm: Bayesian applied regression modeling via Stan. https:\/\/mc-stan.org\/rstanarm. R package version 2.19.3"},{"key":"1231_CR16","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-19425-7","volume-title":"Regression modeling strategies: with applications to linear models, logistic and ordinal regression, and survival analysis","author":"FE Harrell","year":"2015","unstructured":"Harrell FE (2015) Regression modeling strategies: with applications to linear models, logistic and ordinal regression, and survival analysis. Springer, Berlin"},{"key":"1231_CR17","doi-asserted-by":"crossref","unstructured":"Hawkins D (1989) Using U statistics to derive the asymptotic distribution of Fisher\u2019s Z statistic. Am Stat 43(4):235\u2013237","DOI":"10.1080\/00031305.1989.10475666"},{"issue":"3","key":"1231_CR18","doi-asserted-by":"publisher","first-page":"431","DOI":"10.1002\/bimj.201700067","volume":"60","author":"G Heinze","year":"2018","unstructured":"Heinze G, Wallisch C, Dunkler D (2018) Variable selection\u2014a review and recommendations for the practicing statistician. Biom J 60(3):431\u2013449","journal-title":"Biom J"},{"issue":"1","key":"1231_CR19","first-page":"1593","volume":"15","author":"MD Hoffman","year":"2014","unstructured":"Hoffman MD, Gelman A (2014) The No-U-Turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo. J Mach Learn Res 15(1):1593\u20131623","journal-title":"J Mach Learn Res"},{"key":"1231_CR20","doi-asserted-by":"crossref","unstructured":"Johnson RW (1996) Fitting percentage of body fat to simple body measurements. J Stat Educ 4(1)","DOI":"10.1080\/10691898.1996.11910505"},{"issue":"4","key":"1231_CR21","doi-asserted-by":"publisher","first-page":"1594","DOI":"10.1214\/009053604000000030","volume":"32","author":"IM Johnstone","year":"2004","unstructured":"Johnstone IM, Silverman BW (2004) Needles and straw in haystacks: empirical Bayes estimates of possibly sparse sequences. Ann Stat 32(4):1594\u20131649","journal-title":"Ann Stat"},{"issue":"1","key":"1231_CR22","doi-asserted-by":"crossref","first-page":"31","DOI":"10.1111\/j.2517-6161.1968.tb01505.x","volume":"30","author":"DV Lindley","year":"1968","unstructured":"Lindley DV (1968) The choice of variables in multiple regression. J Roy Stat Soc Ser B (Methodol) 30(1):31\u201353","journal-title":"J Roy Stat Soc Ser B (Methodol)"},{"issue":"1","key":"1231_CR23","first-page":"6345","volume":"18","author":"S Nogueira","year":"2017","unstructured":"Nogueira S, Sechidis K, Brown G (2017) On the stability of feature selection algorithms. J Mach Learn Res 18(1):6345\u20136398","journal-title":"J Mach Learn Res"},{"key":"1231_CR24","doi-asserted-by":"publisher","DOI":"10.1002\/9780470746684","volume-title":"Decision theory: principles and approaches","author":"G Parmigiani","year":"2009","unstructured":"Parmigiani G, Inoue L (2009) Decision theory: principles and approaches, vol 812. Wiley, New York"},{"key":"1231_CR25","doi-asserted-by":"crossref","unstructured":"Paul D, Bair E, Hastie T, Tibshirani R (2008) \u201cPreconditioning\u201d for feature selection and regression in high-dimensional problems. Ann Stat 36(4):1595\u20131618","DOI":"10.1214\/009053607000000578"},{"key":"1231_CR26","doi-asserted-by":"crossref","unstructured":"Piironen J, Vehtari A (2015) Projection predictive variable selection using Stan + R. arXiv:1508.02502","DOI":"10.1109\/MLSP.2016.7738829"},{"key":"1231_CR27","doi-asserted-by":"crossref","unstructured":"Piironen J, Vehtari A (2016) Projection predictive model selection for Gaussian processes. In: 2016 IEEE 26th international workshop on machine learning for signal processing (MLSP)","DOI":"10.1109\/MLSP.2016.7738829"},{"issue":"3","key":"1231_CR28","doi-asserted-by":"publisher","first-page":"711","DOI":"10.1007\/s11222-016-9649-y","volume":"27","author":"J Piironen","year":"2017","unstructured":"Piironen J, Vehtari A (2017a) Comparison of Bayesian predictive methods for model selection. Stat Comput 27(3):711\u2013735","journal-title":"Stat Comput"},{"issue":"2","key":"1231_CR29","doi-asserted-by":"publisher","first-page":"5018","DOI":"10.1214\/17-EJS1337SI","volume":"11","author":"J Piironen","year":"2017","unstructured":"Piironen J, Vehtari A (2017b) Sparsity information and regularization in the horseshoe and other shrinkage priors. Electron J Stat 11(2):5018\u20135051","journal-title":"Electron J Stat"},{"key":"1231_CR30","unstructured":"Piironen J, Vehtari A (2018) Iterative supervised principal components. In: Storkey A, Perez-Cruz F (eds) Proceedings of the 21st international conference on artificial intelligence and statistics, vol 84, pp 106\u2013114"},{"key":"1231_CR31","doi-asserted-by":"crossref","unstructured":"Piironen J, Paasiniemi M, Vehtari A (2019) projpred: projection predictive feature selection. http:\/\/mc-stan.org\/projpred, http:\/\/discourse.mc-stan.org\/","DOI":"10.32614\/CRAN.package.projpred"},{"issue":"1","key":"1231_CR32","doi-asserted-by":"publisher","first-page":"2155","DOI":"10.1214\/20-EJS1711","volume":"14","author":"J Piironen","year":"2020","unstructured":"Piironen J, Paasiniemi M, Vehtari A (2020) Projective inference in high-dimensional problems: prediction and feature selection. Electron J Stat 14(1):2155\u20132197","journal-title":"Electron J Stat"},{"key":"1231_CR33","unstructured":"R Core Team (2018) R: a language and environment for statistical computing Vienna, Austria. https:\/\/www.R-project.org\/"},{"issue":"11\u201312","key":"1231_CR34","doi-asserted-by":"publisher","first-page":"1221","DOI":"10.1002\/sim.4439","volume":"31","author":"V Rockova","year":"2012","unstructured":"Rockova V, Lesaffre E, Luime J, L\u00f6wenberg B (2012) Hierarchical Bayesian formulations for selecting variables in regression models. Stat Med 31(11\u201312):1221\u20131237","journal-title":"Stat Med"},{"key":"1231_CR35","unstructured":"Silverman BW, Evers L, Xu K, Carbonetto P, Stephens M (2017) Ebayesthresh: empirical bayes thresholding and related. https:\/\/CRAN.R-project.org\/package=EbayesThresh. R package version 1.4-12"},{"key":"1231_CR36","unstructured":"Sivula T, Magnusson, M Vehtari A (2020) Uncertainty in Bayesian leave-one-out cross-validation based model comparison. arXiv:2008.10296"},{"key":"1231_CR37","unstructured":"Stan Development Team (2019) RStan: the R interface to Stan. http:\/\/mc-stan.org\/. R package version 2.19.2"},{"key":"1231_CR38","doi-asserted-by":"crossref","unstructured":"Stein C (1956) Inadmissibility of the usual estimator for the mean of a multivariate normal distribution. Proceedings of the third Berkeley symposium on mathematical statistics and probability, volume 1: contributions to the theory of statistics","DOI":"10.1525\/9780520313880-018"},{"key":"1231_CR39","unstructured":"Stein C, James W (1961) Estimation with quadratic loss. In: Proceedings of the 4th Berkeley symposium mathematical statistics probability, vol 1, pp 361\u2013379"},{"issue":"1","key":"1231_CR40","doi-asserted-by":"crossref","first-page":"267","DOI":"10.1111\/j.2517-6161.1996.tb02080.x","volume":"58","author":"R Tibshirani","year":"1996","unstructured":"Tibshirani R (1996) Regression shrinkage and selection via the lasso. J Roy Stat Soc Ser B (Methodol) 58(1):267\u2013288","journal-title":"J Roy Stat Soc Ser B (Methodol)"},{"key":"1231_CR41","doi-asserted-by":"publisher","first-page":"142","DOI":"10.1214\/12-SS102","volume":"6","author":"A Vehtari","year":"2012","unstructured":"Vehtari A, Ojanen J (2012) A survey of Bayesian predictive methods for model assessment, selection and comparison. Stat Surv 6:142\u2013228","journal-title":"Stat Surv"},{"issue":"5","key":"1231_CR42","doi-asserted-by":"publisher","first-page":"1413","DOI":"10.1007\/s11222-016-9696-4","volume":"27","author":"A Vehtari","year":"2017","unstructured":"Vehtari A, Gelman A, Gabry J (2017) Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Stat Comput 27(5):1413\u20131432","journal-title":"Stat Comput"},{"key":"1231_CR43","volume-title":"Modern applied statistics with s-plus","author":"WN Venables","year":"2013","unstructured":"Venables WN, Ripley BD (2013) Modern applied statistics with s-plus. Springer, Berlin"},{"key":"1231_CR44","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-24277-4","volume-title":"ggplot2: elegant graphics for data analysis","author":"H Wickham","year":"2016","unstructured":"Wickham H (2016) ggplot2: elegant graphics for data analysis. Springer, New York"},{"issue":"43","key":"1231_CR45","doi-asserted-by":"publisher","first-page":"1686","DOI":"10.21105\/joss.01686","volume":"4","author":"H Wickham","year":"2019","unstructured":"Wickham H, Averick M, Bryan J, Chang W, McGowan LD, Fran\u00e7ois R, Yutani H (2019) Welcome to the tidyverse. J Open Source Softw 4(43):1686","journal-title":"J Open Source Softw"}],"container-title":["Computational Statistics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00180-022-01231-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s00180-022-01231-6\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00180-022-01231-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,9,24]],"date-time":"2024-09-24T19:41:24Z","timestamp":1727206884000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s00180-022-01231-6"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,5,14]]},"references-count":45,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2023,3]]}},"alternative-id":["1231"],"URL":"https:\/\/doi.org\/10.1007\/s00180-022-01231-6","relation":{},"ISSN":["0943-4062","1613-9658"],"issn-type":[{"type":"print","value":"0943-4062"},{"type":"electronic","value":"1613-9658"}],"subject":[],"published":{"date-parts":[[2022,5,14]]},"assertion":[{"value":"4 December 2020","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"19 April 2022","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"14 May 2022","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"12 September 2022","order":4,"name":"change_date","label":"Change Date","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"Update","order":5,"name":"change_type","label":"Change Type","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"Missing Open Access funding information has been added in the Funding Note.","order":6,"name":"change_details","label":"Change Details","group":{"name":"ArticleHistory","label":"Article History"}}]}}