{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T02:31:33Z","timestamp":1760236293362,"version":"build-2065373602"},"reference-count":43,"publisher":"MDPI AG","issue":"11","license":[{"start":{"date-parts":[[2021,11,8]],"date-time":"2021-11-08T00:00:00Z","timestamp":1636329600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],"abstract":"<jats:p>Variational inference is an optimization-based method for approximating the posterior distribution of the parameters in Bayesian probabilistic models. A key challenge of variational inference is to approximate the posterior with a distribution that is computationally tractable yet sufficiently expressive. We propose a novel method for generating samples from a highly flexible variational approximation. The method starts with a coarse initial approximation and generates samples by refining it in selected, local regions. This allows the samples to capture dependencies and multi-modality in the posterior, even when these are absent from the initial approximation. We demonstrate theoretically that our method always improves the quality of the approximation (as measured by the evidence lower bound). In experiments, our method consistently outperforms recent variational inference methods in terms of log-likelihood and ELBO across three example tasks: the Eight-Schools example (an inference task in a hierarchical model), training a ResNet-20 (Bayesian inference in a large neural network), and the Mushroom task (posterior sampling in a contextual bandit problem).<\/jats:p>","DOI":"10.3390\/e23111475","type":"journal-article","created":{"date-parts":[[2021,11,8]],"date-time":"2021-11-08T22:08:41Z","timestamp":1636409321000},"page":"1475","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Sampling the Variational Posterior with Local Refinement"],"prefix":"10.3390","volume":"23","author":[{"given":"Marton","family":"Havasi","sequence":"first","affiliation":[{"name":"Department of Engineering, University of Cambridge, Cambridge CB2 1PZ, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jasper","family":"Snoek","sequence":"additional","affiliation":[{"name":"Brain Team, Google Research, Mountain View, CA 94043, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Dustin","family":"Tran","sequence":"additional","affiliation":[{"name":"Brain Team, Google Research, Mountain View, CA 94043, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jonathan","family":"Gordon","sequence":"additional","affiliation":[{"name":"Department of Engineering, University of Cambridge, Cambridge CB2 1PZ, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jos\u00e9 Miguel","family":"Hern\u00e1ndez-Lobato","sequence":"additional","affiliation":[{"name":"Department of Engineering, University of Cambridge, Cambridge CB2 1PZ, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2021,11,8]]},"reference":[{"key":"ref_1","unstructured":"Amodei, D., Olah, C., Steinhardt, J., Christiano, P., Schulman, J., and Man\u00e9, D. (2016). Concrete problems in AI safety. arXiv."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Gelman, A., Carlin, J.B., Stern, H.S., Dunson, D.B., Vehtari, A., and Rubin, D.B. (2013). Bayesian Data Analysis, Chapman and Hall\/CRC.","DOI":"10.1201\/b16018"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"1","DOI":"10.18637\/jss.v076.i01","article-title":"Stan: A probabilistic programming language","volume":"76","author":"Carpenter","year":"2017","journal-title":"J. Stat. Softw."},{"key":"ref_4","unstructured":"Yao, Y., Vehtari, A., Simpson, D., and Gelman, A. (2018, January 10\u201315). Yes, but Did It Work?: Evaluating Variational Inference. Proceedings of the International Conference on Machine Learning, Stockholm, Sweden."},{"key":"ref_5","unstructured":"Kucukelbir, A., Ranganath, R., Gelman, A., and Blei, D. (2015, January 7\u201312). Automatic variational inference in Stan. Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada."},{"key":"ref_6","first-page":"1593","article-title":"The No-U-Turn sampler: Adaptively setting path lengths in Hamiltonian Monte Carlo","volume":"15","author":"Hoffman","year":"2014","journal-title":"J. Mach. Learn. Res."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Salvatier, J., Wiecki, T.V., and Fonnesbeck, C. (2016). Probabilistic programming in Python using PyMC3. PeerJ Comput. Sci., 2.","DOI":"10.7717\/peerj-cs.55"},{"key":"ref_8","unstructured":"Graves, A. (2011, January 12\u201314). Practical variational inference for neural networks. Proceedings of the Advances in Neural Information Processing Systems, Granada, Spain."},{"key":"ref_9","unstructured":"Blundell, C., Cornebise, J., Kavukcuoglu, K., and Wierstra, D. (2015, January 7\u20139). Weight uncertainty in neural network. Proceedings of the International Conference on Machine Learning, Lille, France."},{"key":"ref_10","unstructured":"Louizos, C., and Welling, M. (2017, January 6\u201311). Multiplicative normalizing flows for variational Bayesian neural networks. Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia."},{"key":"ref_11","unstructured":"Lakshminarayanan, B., Pritzel, A., and Blundell, C. (2017, January 4\u20139). Simple and scalable predictive uncertainty estimation using deep ensembles. Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA."},{"key":"ref_12","unstructured":"Kingma, D.P., and Ba, J. (2014). Adam: A Method for Stochastic Optimization. arXiv."},{"key":"ref_13","unstructured":"Hern\u00e1ndez-Lobato, J.M., and Adams, R. (2015, January 7\u20139). Probabilistic backpropagation for scalable learning of Bayesian neural networks. Proceedings of the International Conference on Machine Learning, Lille, France."},{"key":"ref_14","unstructured":"Kingma, D.P., Salimans, T., and Welling, M. (2015). Variational Dropout and the Local Reparameterization Trick. Advances in Neural Information Processing Systems, MIT Press."},{"key":"ref_15","unstructured":"LeCun, Y., and Bengio, Y. (1995). Convolutional networks for images, speech, and time series. The Handbook of Brain Theory and Neural Networks, MIT Press. A Bradford Book."},{"key":"ref_16","unstructured":"Wen, Y., Vicol, P., Ba, J., Tran, D., and Grosse, R. (2018). Flipout: Efficient pseudo-independent weight perturbations on mini-batches. arXiv."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_18","unstructured":"Ovadia, Y., Fertig, E., Ren, J., Nado, Z., Sculley, D., Nowozin, S., Dillon, J.V., Lakshminarayanan, B., and Snoek, J. (2019, January 8\u20139). Can You Trust Your Model\u2019s Uncertainty? Evaluating Predictive Uncertainty Under Dataset Shift. Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada."},{"key":"ref_19","unstructured":"Ioffe, S., and Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv."},{"key":"ref_20","unstructured":"Osawa, K., Swaroop, S., Jain, A., Eschenhagen, R., Turner, R.E., Yokota, R., and Khan, M.E. (2019, January 8\u201314). Practical Deep Learning with Bayesian Principles. Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada."},{"key":"ref_21","unstructured":"Wen, Y., Tran, D., and Ba, J. (2019, January 6\u20139). BatchEnsemble: An Alternative Approach to Efficient Ensemble and Lifelong Learning. Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"285","DOI":"10.1093\/biomet\/25.3-4.285","article-title":"On the likelihood that one unknown probability exceeds another in view of the evidence of two samples","volume":"25","author":"Thompson","year":"1933","journal-title":"Biometrika"},{"key":"ref_23","unstructured":"Hern\u00e1ndez-Lobato, J.M., Requeima, J., Pyzer-Knapp, E.O., and Aspuru-Guzik, A. (2017, January 6\u201311). Parallel and distributed Thompson sampling for large-scale accelerated exploration of chemical space. Proceedings of the International Conference on Machine Learning, Sydney, Australia."},{"key":"ref_24","unstructured":"Guez, A. (2015). Sample-Based Search Methods for Bayes-Adaptive Planning. [Ph.D. Thesis, UCL (University College London)]."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Hinton, G., and Van Camp, D. (1993, January 26\u201328). Keeping neural networks simple by minimizing the description length of the weights. Proceedings of the 6th Ann. ACM Conf. on Computational Learning Theory, Santa Cruz, CA, USA.","DOI":"10.1145\/168304.168306"},{"key":"ref_26","first-page":"995","article-title":"A mean field theory learning algorithm for neural networks","volume":"1","author":"Peterson","year":"1987","journal-title":"Complex Syst."},{"key":"ref_27","unstructured":"Kingma, D.P., and Welling, M. (2013). Auto-encoding variational Bayes. arXiv."},{"key":"ref_28","unstructured":"Nguyen, C.V., Li, Y., Bui, T.D., and Turner, R.E. (2017). Variational continual learning. arXiv."},{"key":"ref_29","unstructured":"Riquelme, C., Tucker, G., and Snoek, J.R. (May, January 30). Deep Bayesian Bandits Showdown. Proceedings of the International Conference on Representation Learning, Vancouver, BC, Canada."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"2008","DOI":"10.1109\/TPAMI.2018.2889774","article-title":"Advances in variational inference","volume":"41","author":"Zhang","year":"2018","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_31","unstructured":"Agakov, F.V., and Barber, D. (2004, January 22\u201325). An auxiliary variational method. Proceedings of the International Conference on Neural Information Processing, Calcutta, India."},{"key":"ref_32","unstructured":"Ranganath, R., Tran, D., and Blei, D. (2016, January 20\u201322). Hierarchical variational models. Proceedings of the International Conference on Machine Learning, New York, NY, USA."},{"key":"ref_33","unstructured":"Salimans, T., Kingma, D., and Welling, M. (2015, January 7\u20139). Markov chain monte carlo and variational inference: Bridging the gap. Proceedings of the International Conference on Machine Learning, Lille, France."},{"key":"ref_34","unstructured":"Zhang, Y., Hern\u00e1ndez-Lobato, J.M., and Ghahramani, Z. (2018). Ergodic measure preserving flows. arXiv."},{"key":"ref_35","unstructured":"Ruiz, F., and Titsias, M. (2019, January 9\u201315). A Contrastive Divergence for Combining Variational Inference and MCMC. Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA."},{"key":"ref_36","unstructured":"Guo, F., Wang, X., Fan, K., Broderick, T., and Dunson, D.B. (2016). Boosting variational inference. arXiv."},{"key":"ref_37","unstructured":"Miller, A.C., Foti, N.J., and Adams, R.P. (2017, January 6\u201311). Variational Boosting: Iteratively Refining Posterior Approximations. Proceedings of the International Conference on Machine Learning, Sydney, Australia."},{"key":"ref_38","unstructured":"Locatello, F., Dresdner, G., Khanna, R., Valera, I., and Raetsch, G. (2018, January 3\u20138). Boosting Black Box Variational Inference. Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada."},{"key":"ref_39","unstructured":"Hjelm, D., Salakhutdinov, R.R., Cho, K., Jojic, N., Calhoun, V., and Chung, J. (2016, January 5\u201310). Iterative refinement of the approximate posterior for directed belief networks. Proceedings of the Advances in Neural Information Processing Systems, Barcelona, Spain."},{"key":"ref_40","unstructured":"Cremer, C., Li, X., and Duvenaud, D. (2018). Inference suboptimality in variational autoencoders. arXiv."},{"key":"ref_41","unstructured":"Kim, Y., Wiseman, S., Miller, A.C., Sontag, D., and Rush, A.M. (2018). Semi-amortized variational autoencoders. arXiv."},{"key":"ref_42","unstructured":"Marino, J., Yue, Y., and Mandt, S. (2018). Iterative amortized inference. arXiv."},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"123","DOI":"10.1007\/BF00058655","article-title":"Bagging predictors","volume":"24","author":"Breiman","year":"1996","journal-title":"Mach. Learn."}],"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1099-4300\/23\/11\/1475\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T07:27:37Z","timestamp":1760167657000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1099-4300\/23\/11\/1475"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,11,8]]},"references-count":43,"journal-issue":{"issue":"11","published-online":{"date-parts":[[2021,11]]}},"alternative-id":["e23111475"],"URL":"https:\/\/doi.org\/10.3390\/e23111475","relation":{},"ISSN":["1099-4300"],"issn-type":[{"type":"electronic","value":"1099-4300"}],"subject":[],"published":{"date-parts":[[2021,11,8]]}}}