{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,7]],"date-time":"2025-12-07T03:52:20Z","timestamp":1765079540536,"version":"3.37.3"},"reference-count":45,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2024,5,29]],"date-time":"2024-05-29T00:00:00Z","timestamp":1716940800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,5,29]],"date-time":"2024-05-29T00:00:00Z","timestamp":1716940800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100002341","name":"Research Council of Finland","doi-asserted-by":"crossref","award":["340721"],"award-info":[{"award-number":["340721"]}],"id":[{"id":"10.13039\/501100002341","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Comput Stat"],"published-print":{"date-parts":[[2025,2]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>The projection predictive variable selection is a decision-theoretically justified Bayesian variable selection approach achieving an outstanding trade-off between predictive performance and sparsity. Its projection problem is not easy to solve in general because it is based on the Kullback\u2013Leibler divergence from a restricted posterior predictive distribution of the so-called reference model to the parameter-conditional predictive distribution of a candidate model. Previous work showed how this projection problem can be solved for response families employed in generalized linear models and how an approximate latent-space approach can be used for many other response families. 
Here, we present an exact projection method for all response families with discrete and finite support, called the augmented-data projection. A simulation study for an ordinal response family shows that the proposed method performs better than or similarly to the previously proposed approximate latent-space projection. The cost of the slightly better performance of the augmented-data projection is a substantial increase in runtime. Thus, if the augmented-data projection\u2019s runtime is too high, we recommend the latent projection in the early phase of the model-building workflow and the augmented-data projection for final results. The ordinal response family from our simulation study is supported by both projection methods, but we also include a real-world cancer subtyping example with a nominal response family, a case that is not supported by the latent projection.<\/jats:p>","DOI":"10.1007\/s00180-024-01506-0","type":"journal-article","created":{"date-parts":[[2024,5,29]],"date-time":"2024-05-29T07:02:18Z","timestamp":1716966138000},"page":"701-721","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":4,"title":["Projection predictive variable selection for discrete response families with finite support"],"prefix":"10.1007","volume":"40","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4842-7922","authenticated-orcid":false,"given":"Frank","family":"Weber","sequence":"first","affiliation":[]},{"given":"\u00c4nne","family":"Glass","sequence":"additional","affiliation":[]},{"given":"Aki","family":"Vehtari","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,5,29]]},"reference":[{"key":"1506_CR1","doi-asserted-by":"publisher","unstructured":"Betancourt M (2018) A conceptual introduction to Hamiltonian Monte Carlo. 
https:\/\/doi.org\/10.48550\/arXiv.1701.02434","DOI":"10.48550\/arXiv.1701.02434"},{"issue":"1","key":"1506_CR2","doi-asserted-by":"publisher","first-page":"1","DOI":"10.18637\/jss.v080.i01","volume":"80","author":"PC B\u00fcrkner","year":"2017","unstructured":"B\u00fcrkner PC (2017) brms: an R package for Bayesian multilevel models using Stan. J Stat Softw 80(1):1\u201328. https:\/\/doi.org\/10.18637\/jss.v080.i01","journal-title":"J Stat Softw"},{"issue":"1","key":"1506_CR3","doi-asserted-by":"publisher","first-page":"395","DOI":"10.32614\/RJ-2018-017","volume":"10","author":"PC B\u00fcrkner","year":"2018","unstructured":"B\u00fcrkner PC (2018) Advanced Bayesian multilevel modeling with the R package brms. R J 10(1):395\u2013411. https:\/\/doi.org\/10.32614\/RJ-2018-017","journal-title":"R J"},{"key":"1506_CR4","unstructured":"B\u00fcrkner PC, Gabry J, Kay M et\u00a0al (2023) posterior: tools for working with posterior distributions. https:\/\/mc-stan.org\/posterior\/, R package, version 1.4.1"},{"issue":"1","key":"1506_CR5","doi-asserted-by":"publisher","first-page":"73","DOI":"10.1214\/12-BA703","volume":"7","author":"P Carbonetto","year":"2012","unstructured":"Carbonetto P, Stephens M (2012) Scalable variational inference for Bayesian variable selection in regression, and its accuracy in genetic association studies. Bayesian Anal 7(1):73\u2013108. https:\/\/doi.org\/10.1214\/12-BA703","journal-title":"Bayesian Anal"},{"issue":"1","key":"1506_CR6","doi-asserted-by":"publisher","first-page":"1","DOI":"10.18637\/jss.v076.i01","volume":"76","author":"B Carpenter","year":"2017","unstructured":"Carpenter B, Gelman A, Hoffman MD et al (2017) Stan: a probabilistic programming language. J Stat Softw 76(1):1\u201332. https:\/\/doi.org\/10.18637\/jss.v076.i01","journal-title":"J Stat Softw"},{"key":"1506_CR7","doi-asserted-by":"publisher","unstructured":"Catalina A, B\u00fcrkner P, Vehtari A (2021) Latent space projection predictive inference. 
https:\/\/doi.org\/10.48550\/arXiv.2109.04702","DOI":"10.48550\/arXiv.2109.04702"},{"key":"1506_CR8","unstructured":"Catalina A, B\u00fcrkner PC, Vehtari A (2022) Projection predictive inference for generalized linear and additive multilevel models. In: Camps-Valls G, Ruiz FJR, Valera I (eds) Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, Proceedings of Machine Learning Research, vol 151. PMLR, pp 4446\u20134461. https:\/\/proceedings.mlr.press\/v151\/catalina22a.html"},{"key":"1506_CR9","unstructured":"Clyde M (2022) BAS: Bayesian variable selection and model averaging using Bayesian adaptive sampling. https:\/\/CRAN.R-project.org\/package=BAS, R package, version 1.6.4"},{"key":"1506_CR10","unstructured":"Cs\u00e1rdi G (2019) cranlogs: download logs from the \u2019RStudio\u2019 \u2019CRAN\u2019 mirror. https:\/\/CRAN.R-project.org\/package=cranlogs, R package, version 2.1.1"},{"issue":"1\u20132","key":"1506_CR11","doi-asserted-by":"publisher","first-page":"77","DOI":"10.1016\/S0378-3758(02)00286-0","volume":"111","author":"JA Dupuis","year":"2003","unstructured":"Dupuis JA, Robert CP (2003) Variable selection in qualitative models via an entropic explanatory power. J Stat Plan Inference 111(1\u20132):77\u201394. https:\/\/doi.org\/10.1016\/S0378-3758(02)00286-0","journal-title":"J Stat Plan Inference"},{"key":"1506_CR12","unstructured":"Gabry J, \u010ce\u0161novar R (2022) cmdstanr: R interface to \u2019CmdStan\u2019. https:\/\/mc-stan.org\/cmdstanr\/, R package, version 0.5.3"},{"issue":"1","key":"1506_CR13","doi-asserted-by":"publisher","first-page":"155","DOI":"10.32614\/RJ-2018-021","volume":"10","author":"G Garcia-Donato","year":"2018","unstructured":"Garcia-Donato G, Forte A (2018) Bayesian testing, variable selection and model averaging in linear models using R with BayesVarSel. R J 10(1):155\u2013174. 
https:\/\/doi.org\/10.32614\/RJ-2018-021","journal-title":"R J"},{"issue":"4","key":"1506_CR14","doi-asserted-by":"publisher","first-page":"1360","DOI":"10.1214\/08-AOAS191","volume":"2","author":"A Gelman","year":"2008","unstructured":"Gelman A, Jakulin A, Pittau MG et al (2008) A weakly informative default prior distribution for logistic and other regression models. Ann Appl Stat 2(4):1360\u20131383. https:\/\/doi.org\/10.1214\/08-AOAS191","journal-title":"Ann Appl Stat"},{"key":"1506_CR15","unstructured":"Goodrich B, Gabry J, Ali I et\u00a0al (2023) rstanarm: Bayesian applied regression modeling via Stan. https:\/\/mc-stan.org\/rstanarm\/, R package, version 2.21.4"},{"issue":"1","key":"1506_CR16","doi-asserted-by":"publisher","first-page":"29","DOI":"10.1093\/biomet\/85.1.29","volume":"85","author":"C Goutis","year":"1998","unstructured":"Goutis C, Robert CP (1998) Model choice in generalised linear models: a Bayesian approach via Kullback\u2013Leibler projections. Biometrika 85(1):29\u201337","journal-title":"Biometrika"},{"issue":"1","key":"1506_CR17","doi-asserted-by":"publisher","first-page":"79","DOI":"10.1214\/aoms\/1177729694","volume":"22","author":"S Kullback","year":"1951","unstructured":"Kullback S, Leibler RA (1951) On information and sufficiency. Ann Math Stat 22(1):79\u201386. https:\/\/doi.org\/10.1214\/aoms\/1177729694","journal-title":"Ann Math Stat"},{"issue":"1","key":"1506_CR18","doi-asserted-by":"publisher","first-page":"31","DOI":"10.1111\/j.2517-6161.1968.tb01505.x","volume":"30","author":"DV Lindley","year":"1968","unstructured":"Lindley DV (1968) The choice of variables in multiple regression. J R Stat Soc Ser B Methodol 30(1):31\u201366","journal-title":"J R Stat Soc Ser B Methodol"},{"key":"1506_CR19","unstructured":"Liquet B, Sutton M (2017) MBSGS: multivariate Bayesian sparse group selection with spike and slab. 
https:\/\/CRAN.R-project.org\/package=MBSGS, R package, version 1.1.0"},{"key":"1506_CR20","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4899-3242-6","volume-title":"Generalized linear models","author":"P McCullagh","year":"1989","unstructured":"McCullagh P, Nelder JA (1989) Generalized linear models, 2nd edn. Chapman & Hall, London","edition":"2"},{"key":"1506_CR21","doi-asserted-by":"publisher","unstructured":"McLatchie Y, R\u00f6gnvaldsson S, Weber F et\u00a0al (2023) Robust and efficient projection predictive inference. https:\/\/doi.org\/10.48550\/arXiv.2306.15581","DOI":"10.48550\/arXiv.2306.15581"},{"key":"1506_CR22","unstructured":"Nikooienejad A, Johnson VE (2020) BVSNLP: Bayesian variable selection in high dimensional settings using nonlocal priors. https:\/\/CRAN.R-project.org\/package=BVSNLP, R package, version 1.1.9"},{"key":"1506_CR23","doi-asserted-by":"publisher","DOI":"10.1007\/s00180-022-01231-6","author":"F Pavone","year":"2022","unstructured":"Pavone F, Piironen J, B\u00fcrkner PC et al (2022) Using reference models in variable selection. Comput Stat. https:\/\/doi.org\/10.1007\/s00180-022-01231-6","journal-title":"Comput Stat"},{"issue":"2","key":"1506_CR24","doi-asserted-by":"publisher","first-page":"483","DOI":"10.1534\/genetics.114.164442","volume":"198","author":"P Perez","year":"2014","unstructured":"Perez P, de Campos G (2014) Genome-wide regression and prediction with the BGLR statistical package. Genetics 198(2):483\u2013495","journal-title":"Genetics"},{"issue":"3","key":"1506_CR25","doi-asserted-by":"publisher","first-page":"711","DOI":"10.1007\/s11222-016-9649-y","volume":"27","author":"J Piironen","year":"2017","unstructured":"Piironen J, Vehtari A (2017a) Comparison of Bayesian predictive methods for model selection. Stat Comput 27(3):711\u2013735. 
https:\/\/doi.org\/10.1007\/s11222-016-9649-y","journal-title":"Stat Comput"},{"key":"1506_CR26","unstructured":"Piironen J, Vehtari A (2017b) On the hyperprior choice for the global shrinkage parameter in the horseshoe prior. In: Singh A, Zhu J (eds) Proceedings of The 20th International Conference on Artificial Intelligence and Statistics, Proceedings of Machine Learning Research, vol 54. PMLR, pp 905\u2013913. https:\/\/proceedings.mlr.press\/v54\/piironen17a.html"},{"issue":"2","key":"1506_CR27","doi-asserted-by":"publisher","first-page":"5018","DOI":"10.1214\/17-EJS1337SI","volume":"11","author":"J Piironen","year":"2017","unstructured":"Piironen J, Vehtari A (2017c) Sparsity information and regularization in the horseshoe and other shrinkage priors. Electron J Stat 11(2):5018\u20135051. https:\/\/doi.org\/10.1214\/17-EJS1337SI","journal-title":"Electron J Stat"},{"issue":"1","key":"1506_CR28","doi-asserted-by":"publisher","first-page":"2155","DOI":"10.1214\/20-EJS1711","volume":"14","author":"J Piironen","year":"2020","unstructured":"Piironen J, Paasiniemi M, Vehtari A (2020) Projective inference in high-dimensional problems: prediction and feature selection. Electron J Stat 14(1):2155\u20132197. https:\/\/doi.org\/10.1214\/20-EJS1711","journal-title":"Electron J Stat"},{"key":"1506_CR29","unstructured":"Piironen J, Paasiniemi M, Catalina A et al (2023) projpred: projection predictive feature selection. https:\/\/mc-stan.org\/projpred\/, R package, version 2.5.0"},{"key":"1506_CR30","unstructured":"R Core Team (2023) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria, https:\/\/www.R-project.org\/"},{"key":"1506_CR31","unstructured":"Rossell D, Cook JD, Telesca D et al (2023) mombf: model selection with Bayesian methods and information criteria. 
https:\/\/CRAN.R-project.org\/package=mombf, R package, version 3.3.1"},{"issue":"1","key":"1506_CR32","doi-asserted-by":"publisher","first-page":"217","DOI":"10.1080\/10618600.2016.1276840","volume":"26","author":"C R\u00f6ver","year":"2017","unstructured":"R\u00f6ver C, Friede T (2017) Discrete approximation of a mixture distribution via restricted divergence. J Comput Graph Stat 26(1):217\u2013222. https:\/\/doi.org\/10.1080\/10618600.2016.1276840","journal-title":"J Comput Graph Stat"},{"issue":"14","key":"1506_CR33","doi-asserted-by":"publisher","first-page":"1","DOI":"10.18637\/jss.v043.i14","volume":"43","author":"F Scheipl","year":"2011","unstructured":"Scheipl F (2011) spikeSlabGAM: Bayesian variable selection, model choice and regularization for generalized additive mixed models in R. J Stat Softw 43(14):1\u201324. https:\/\/doi.org\/10.18637\/jss.v043.i14","journal-title":"J Stat Softw"},{"key":"1506_CR34","unstructured":"Stan Development Team (2022a) Runtime warnings and convergence problems. https:\/\/mc-stan.org\/misc\/warnings.html, version from March 10, 2022. Accessed 13 April 2022"},{"key":"1506_CR35","unstructured":"Stan Development Team (2022b) Stan modeling language users guide and reference manual, Version 2.31. https:\/\/mc-stan.org"},{"key":"1506_CR36","unstructured":"Stell L, Sabatti C (2015) ptycho: Bayesian variable selection with hierarchical priors. https:\/\/CRAN.R-project.org\/package=ptycho, R package, version 1.1-4"},{"key":"1506_CR37","doi-asserted-by":"publisher","first-page":"142","DOI":"10.1214\/12-SS102","volume":"6","author":"A Vehtari","year":"2012","unstructured":"Vehtari A, Ojanen J (2012) A survey of Bayesian predictive methods for model assessment, selection and comparison. Stat Surv 6:142\u2013228. 
https:\/\/doi.org\/10.1214\/12-SS102","journal-title":"Stat Surv"},{"issue":"5","key":"1506_CR38","doi-asserted-by":"publisher","first-page":"1413","DOI":"10.1007\/s11222-016-9696-4","volume":"27","author":"A Vehtari","year":"2017","unstructured":"Vehtari A, Gelman A, Gabry J (2017) Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Stat Comput 27(5):1413\u20131432. https:\/\/doi.org\/10.1007\/s11222-016-9696-4","journal-title":"Stat Comput"},{"issue":"2","key":"1506_CR39","doi-asserted-by":"publisher","first-page":"667","DOI":"10.1214\/20-BA1221","volume":"16","author":"A Vehtari","year":"2021","unstructured":"Vehtari A, Gelman A, Simpson D et al (2021) Rank-normalization, folding, and localization: an improved $$\\widehat{R}$$ for assessing convergence of MCMC (with discussion). Bayesian Anal 16(2):667\u2013718. https:\/\/doi.org\/10.1214\/20-BA1221","journal-title":"Bayesian Anal"},{"key":"1506_CR40","doi-asserted-by":"publisher","unstructured":"Vehtari A, Simpson D, Gelman A et\u00a0al (2022) Pareto smoothed importance sampling. https:\/\/doi.org\/10.48550\/arXiv.1507.02646","DOI":"10.48550\/arXiv.1507.02646"},{"key":"1506_CR41","doi-asserted-by":"crossref","unstructured":"Venables WN, Ripley BD (2002) Modern applied statistics with S, 4th edn. Springer, New York, https:\/\/www.stats.ox.ac.uk\/pub\/MASS4\/","DOI":"10.1007\/978-0-387-21706-2"},{"key":"1506_CR42","doi-asserted-by":"publisher","unstructured":"Wickham H (2016) ggplot2: elegant graphics for data analysis, 2nd edn. Springer, New York, https:\/\/doi.org\/10.1007\/978-3-319-24277-4, https:\/\/ggplot2.tidyverse.org","DOI":"10.1007\/978-3-319-24277-4"},{"issue":"538","key":"1506_CR43","doi-asserted-by":"publisher","first-page":"862","DOI":"10.1080\/01621459.2020.1825449","volume":"117","author":"YD Zhang","year":"2022","unstructured":"Zhang YD, Naughton BP, Bondell HD et al (2022) Bayesian regression using a prior on the model fit: the R2\u2013D2 shrinkage prior. 
J Am Stat Assoc 117(538):862\u2013874. https:\/\/doi.org\/10.1080\/01621459.2020.1825449","journal-title":"J Am Stat Assoc"},{"issue":"11","key":"1506_CR44","doi-asserted-by":"publisher","first-page":"1","DOI":"10.18637\/jss.v100.i11","volume":"100","author":"Z Zhao","year":"2021","unstructured":"Zhao Z, Banterle M, Bottolo L et al (2021) BayesSUR: an R package for high-dimensional multivariate Bayesian variable and covariance selection in linear regression. J Stat Softw 100(11):1\u201332. https:\/\/doi.org\/10.18637\/jss.v100.i11","journal-title":"J Stat Softw"},{"issue":"9","key":"1506_CR45","doi-asserted-by":"publisher","first-page":"1057","DOI":"10.1007\/s00120-019-0952-z","volume":"58","author":"A Zimpfer","year":"2019","unstructured":"Zimpfer A, Glass \u00c4, Zettl H et al (2019) Histopathologische Diagnose und Prognose des Nierenzellkarzinoms im Kontext der WHO-Klassifikation 2016. Urologe 58(9):1057\u20131065. https:\/\/doi.org\/10.1007\/s00120-019-0952-z","journal-title":"Urologe"}],"container-title":["Computational 
Statistics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00180-024-01506-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s00180-024-01506-0\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00180-024-01506-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,2,17]],"date-time":"2025-02-17T21:54:41Z","timestamp":1739829281000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s00180-024-01506-0"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,5,29]]},"references-count":45,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2025,2]]}},"alternative-id":["1506"],"URL":"https:\/\/doi.org\/10.1007\/s00180-024-01506-0","relation":{},"ISSN":["0943-4062","1613-9658"],"issn-type":[{"type":"print","value":"0943-4062"},{"type":"electronic","value":"1613-9658"}],"subject":[],"published":{"date-parts":[[2024,5,29]]},"assertion":[{"value":"26 April 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"3 May 2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"29 May 2024","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors have no conflicts of interest to declare that are relevant to the content of this article.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}