{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,14]],"date-time":"2026-01-14T17:52:20Z","timestamp":1768413140692,"version":"3.49.0"},"reference-count":44,"publisher":"Springer Science and Business Media LLC","issue":"6","license":[{"start":{"date-parts":[[2022,10,17]],"date-time":"2022-10-17T00:00:00Z","timestamp":1665964800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,10,17]],"date-time":"2022-10-17T00:00:00Z","timestamp":1665964800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Stat Comput"],"published-print":{"date-parts":[[2022,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Stochastic processes provide a mathematically elegant way to model complex data. In theory, they provide flexible priors over function classes that can encode a wide range of interesting assumptions. However, in practice efficient inference by optimisation or marginalisation is difficult, a problem further exacerbated by big data and high-dimensional input spaces. We propose a novel variational autoencoder (VAE) called the prior encoding variational autoencoder (<jats:inline-formula><jats:alternatives><jats:tex-math>$$\\pi $$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                  <mml:mi>\u03c0<\/mml:mi>\n                <\/mml:math><\/jats:alternatives><\/jats:inline-formula>VAE). <jats:inline-formula><jats:alternatives><jats:tex-math>$$\\pi $$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                  <mml:mi>\u03c0<\/mml:mi>\n                <\/mml:math><\/jats:alternatives><\/jats:inline-formula>VAE is a new continuous stochastic process. 
We use <jats:inline-formula><jats:alternatives><jats:tex-math>$$\\pi $$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                  <mml:mi>\u03c0<\/mml:mi>\n                <\/mml:math><\/jats:alternatives><\/jats:inline-formula>VAE to learn low-dimensional embeddings of function classes by combining a trainable feature mapping with a generative model using a VAE. We show that our framework can accurately learn expressive function classes such as Gaussian processes, but also properties of functions such as their integrals. For popular tasks, such as spatial interpolation, <jats:inline-formula><jats:alternatives><jats:tex-math>$$\\pi $$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                  <mml:mi>\u03c0<\/mml:mi>\n                <\/mml:math><\/jats:alternatives><\/jats:inline-formula>VAE achieves state-of-the-art performance both in terms of accuracy and computational efficiency. Perhaps most usefully, we demonstrate an elegant and scalable means of performing fully Bayesian inference for stochastic processes within probabilistic programming languages such as Stan.<\/jats:p>","DOI":"10.1007\/s11222-022-10151-w","type":"journal-article","created":{"date-parts":[[2022,10,17]],"date-time":"2022-10-17T18:03:06Z","timestamp":1666029786000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":5,"title":["$$\\pi $$VAE: a stochastic process prior for Bayesian deep learning with 
MCMC"],"prefix":"10.1007","volume":"32","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-8759-5902","authenticated-orcid":false,"given":"Swapnil","family":"Mishra","sequence":"first","affiliation":[]},{"given":"Seth","family":"Flaxman","sequence":"additional","affiliation":[]},{"given":"Tresnia","family":"Berah","sequence":"additional","affiliation":[]},{"given":"Harrison","family":"Zhu","sequence":"additional","affiliation":[]},{"given":"Mikko","family":"Pakkanen","sequence":"additional","affiliation":[]},{"given":"Samir","family":"Bhatt","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2022,10,17]]},"reference":[{"issue":"6","key":"10151_CR1","doi-asserted-by":"publisher","first-page":"1152","DOI":"10.1214\/aos\/1176342871","volume":"2","author":"CE Antoniak","year":"1974","unstructured":"Antoniak, C.E.: Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. Ann. Stat. 2(6), 1152\u20131174 (1974)","journal-title":"Ann. Stat."},{"issue":"4A","key":"10151_CR2","doi-asserted-by":"publisher","first-page":"3109","DOI":"10.3150\/16-BEJ810","volume":"5","author":"M Betancourt","year":"2017","unstructured":"Betancourt, M., Byrne, S., Livingstone, S., Girolami, M.: The geometric foundations of Hamiltonian Monte Carlo. Bernoulli 5(4A), 3109\u20133138 (2017). https:\/\/doi.org\/10.3150\/16-BEJ810","journal-title":"Bernoulli"},{"key":"10151_CR3","unstructured":"Blundell, C., Cornebise, J., Kavukcuoglu, K., Wierstra, D.: Weight uncertainty in neural networks. In: 32nd international conference on machine learning, ICML 2015 (2015)"},{"key":"10151_CR4","unstructured":"Broomhead, D.S., Lowe, D.: Radial Basis Functions, Multi-Variable Functional Interpolation and Adaptive Networks. DTIC. 
Retrieved from https:\/\/apps.dtic.mil\/sti\/citations\/ADA196234 (1988)"},{"issue":"1","key":"10151_CR5","doi-asserted-by":"publisher","first-page":"1","DOI":"10.18637\/jss.v076.i01","volume":"76","author":"B Carpenter","year":"2017","unstructured":"Carpenter, B., Gelman, A., Hoffman, M.D., Lee, D., Goodrich, B., Betancourt, M., Riddell, A.: Stan: a probabilistic programming language. J. Stat. Softw. 76(1), 1\u201332 (2017). https:\/\/doi.org\/10.18637\/jss.v076.i01","journal-title":"J. Stat. Softw."},{"key":"10151_CR6","unstructured":"Caterini, A.L., Doucet, A., Sejdinovic, D.: Hamiltonian variational auto-encoder. Adv. Neural Info. Process. Syst. (2018)"},{"key":"10151_CR7","unstructured":"Gardner, J.R., Pleiss, G., Bindel, D., Weinberger, K.Q., Wilson, A.G.: GPyTorch: Blackbox matrix-matrix Gaussian process inference with GPU acceleration. Adv. Neural Info. Process. Syst. (2018)"},{"key":"10151_CR8","unstructured":"Garnelo, M., Rosenbaum, D., Maddison, C.J., Ramalho, T., Saxton, D., Shanahan, M., Eslami, S.M.: Conditional neural processes. In: 35th international conference on machine learning, ICML 2018 (2018)"},{"issue":"1","key":"10151_CR9","doi-asserted-by":"publisher","first-page":"83","DOI":"10.1093\/biomet\/58.1.83","volume":"58","author":"AG Hawkes","year":"1971","unstructured":"Hawkes, A.G.: Spectra of some self-exciting and mutually exciting point processes. Biometrika 58(1), 83\u201390 (1971). https:\/\/doi.org\/10.1093\/biomet\/58.1.83","journal-title":"Biometrika"},{"key":"10151_CR10","unstructured":"Hern\u00e1ndez-Lobato, J.M., Adams, R.: Probabilistic backpropagation for scalable learning of Bayesian neural networks. In: International conference on machine learning (pp. 
1861-1869) (2015)"},{"issue":"5786","key":"10151_CR11","doi-asserted-by":"publisher","first-page":"504","DOI":"10.1126\/science.1127647","volume":"313","author":"GE Hinton","year":"2006","unstructured":"Hinton, G.E., Salakhutdinov, R.R.: Reducing the dimensionality of data with neural networks. Science 313(5786), 504\u2013507 (2006)","journal-title":"Science"},{"issue":"1","key":"10151_CR12","first-page":"1303","volume":"14","author":"MD Hoffman","year":"2013","unstructured":"Hoffman, M.D., Blei, D.M., Wang, C., Paisley, J.: Stochastic variational inference. J. Mach. Learn. Res. 14(1), 1303\u20131347 (2013)","journal-title":"J. Mach. Learn. Res."},{"key":"10151_CR13","unstructured":"Huggins, J.H., Kasprzak, M., Campbell, T., Broderick, T.: Practical posterior error bounds from variational objectives. arXiv preprint arXiv:1910.04102 (2019)"},{"key":"10151_CR14","unstructured":"Jacot, A., Gabriel, F., Hongler, C.: Neural tangent kernel: convergence and generalization in neural networks. Adv. Neural Info. Process. Syst. (2018)"},{"key":"10151_CR15","unstructured":"Karhunen, K.: On linear methods in probability theory. In: Annales academiae scientiarum fennicae, Ser. A. I. (Vol. 37, pp. 3-79) (1947)"},{"key":"10151_CR16","unstructured":"Kim, H., Mnih, A., Schwarz, J., Garnelo, M., Eslami, S.M.A., Rosenbaum, D., Teh, Y.W.: Attentive Neural Processes. CoRR, abs\/1901.0 (2019)"},{"key":"10151_CR17","unstructured":"Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. arXiv preprint arXiv:1412.6980 (2014)"},{"key":"10151_CR18","unstructured":"Kingma, D.P., Welling, M.: Auto-Encoding Variational Bayes (VAE, reparameterization trick). ICLR 2014 (2014)"},{"key":"10151_CR19","doi-asserted-by":"crossref","unstructured":"Kingma, D.P., Welling, M.: An introduction to variational autoencoders. Found. Trends \u00ae Mach. Learn. 
12(4), 307-392 (2019)","DOI":"10.1561\/2200000056"},{"key":"10151_CR20","unstructured":"Lakshminarayanan, B., Pritzel, A., Blundell, C.: Simple and scalable predictive uncertainty estimation using deep ensembles. Adv. Neural Info. Process. Syst. (2017)"},{"issue":"4","key":"10151_CR21","doi-asserted-by":"publisher","first-page":"423","DOI":"10.1111\/j.1467-9868.2011.00777.x","volume":"73","author":"F Lindgren","year":"2011","unstructured":"Lindgren, F., Rue, H., Lindstr\u00f6m, J.: An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach. J. R. Stat. Soc.: Ser. B Stat. Methodol. 73(4), 423\u2013498 (2011). https:\/\/doi.org\/10.1111\/j.1467-9868.2011.00777.x","journal-title":"J. R. Stat. Soc.: Ser. B Stat. Methodol."},{"key":"10151_CR22","unstructured":"Loeve, M.: Fonctions al\u00e9atoires du second ordre. Processus stochastique et mouvement Brownien, pp. 366-420 (1948)"},{"key":"10151_CR23","unstructured":"Minka, T.P.: Expectation propagation for approximate Bayesian inference. In: Proceedings of the seventeenth conference on uncertainty in artificial intelligence (pp. 362-369). San Francisco, CA, USA: Morgan Kaufmann Publishers Inc (2001)"},{"key":"10151_CR24","doi-asserted-by":"crossref","unstructured":"Mishra, S., Rizoiu, M.-A., Xie, L.: Feature driven and point process approaches for popularity prediction. In: Proceedings of the 25th ACM international conference on information and knowledge management (pp. 1069-1078). New York, NY, USA: Association for Computing Machinery. Retrieved from https:\/\/doi.org\/10.1145\/2983323.2983812 (2016)","DOI":"10.1145\/2983323.2983812"},{"issue":"3","key":"10151_CR25","doi-asserted-by":"publisher","first-page":"451","DOI":"10.1111\/1467-9469.00115","volume":"25","author":"J M\u00f8ller","year":"1998","unstructured":"M\u00f8ller, J., Syversveen, A.R., Waagepetersen, R.P.: Log Gaussian Cox processes. Scand. J. Stat. 
25(3), 451\u2013482 (1998)","journal-title":"Scand. J. Stat."},{"key":"10151_CR26","doi-asserted-by":"crossref","unstructured":"Neal, R.: Bayesian Learning for Neural Networks. Lecture notes in statistics. Springer Verlag, New York (1996)","DOI":"10.1007\/978-1-4612-0745-0"},{"key":"10151_CR27","unstructured":"Neal, R.M.: Probabilistic inference using Markov chain Monte Carlo methods. Department of Computer Science, University of Toronto, Toronto, Ontario, Canada (1993)"},{"issue":"2","key":"10151_CR28","doi-asserted-by":"publisher","first-page":"246","DOI":"10.1162\/neco.1991.3.2.246","volume":"3","author":"J Park","year":"1991","unstructured":"Park, J., Sandberg, I.W.: Universal approximation using radial-basis-function networks. Neural Comput. 3(2), 246\u2013257 (1991). https:\/\/doi.org\/10.1162\/neco.1991.3.2.246","journal-title":"Neural Comput."},{"key":"10151_CR29","unstructured":"Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Chintala, S.: PyTorch: an imperative style, high-performance deep learning library. Adv. Neural Info. Process. Syst. (pp. 8024-8035) (2019)"},{"key":"10151_CR30","doi-asserted-by":"crossref","unstructured":"Pavliotis, G.A.: Stochastic processes and applications: diffusion processes, the Fokker-Planck and Langevin equations (Vol. 60). Springer (2014)","DOI":"10.1007\/978-1-4939-1323-7"},{"key":"10151_CR31","unstructured":"Rahimi, A., Recht, B.: Random features for large-scale kernel machines. Adv. Neural Info. Process. Syst. (pp. 1177-1184) (2008)"},{"key":"10151_CR32","doi-asserted-by":"crossref","unstructured":"Rasmussen, C.E., Williams, C.K.I.: Gaussian processes for machine learning. MIT Press. Retrieved from http:\/\/www.worldcat.org\/oclc\/61285753 (2006)","DOI":"10.7551\/mitpress\/3206.001.0001"},{"key":"10151_CR33","unstructured":"Rezende, D.J., Mohamed, S., Wierstra, D.: Stochastic backpropagation and approximate inference in deep generative models. International conference on machine learning (pp. 
1278-1286) (2014)"},{"key":"10151_CR34","unstructured":"Ritter, H., Botev, A., Barber, D.: A scalable Laplace approximation for neural networks. In: 6th international conference on learning representations, ICLR 2018 - conference track proceedings (2018)"},{"key":"10151_CR35","unstructured":"Ross, S.M.: Stochastic processes (Vol. 2). John Wiley & Sons (1996)"},{"key":"10151_CR36","unstructured":"Roy, D.M., Teh, Y.W.: The Mondrian process. Advances in neural information processing systems 21 - proceedings of the 2008 conference (2009)"},{"key":"10151_CR37","doi-asserted-by":"publisher","unstructured":"Rue, H., Martino, S., Chopin, N.: Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. J. R. Stat. Soc. Ser. B Stat Methodol. 71(2), 319\u2013392 (2009). https:\/\/doi.org\/10.1111\/j.1467-9868.2008.00700.x","DOI":"10.1111\/j.1467-9868.2008.00700.x"},{"key":"10151_CR38","unstructured":"Schwaighofer, A., Tresp, V.: Transductive and inductive methods for approximate Gaussian process regression. Adv. Neural Info. Process. Syst. (pp. 977-984) (2003)"},{"issue":"191","key":"10151_CR39","doi-asserted-by":"publisher","first-page":"20220094","DOI":"10.1098\/rsif.2022.0094","volume":"19","author":"E Semenova","year":"2022","unstructured":"Semenova, E., Xu, Y., Howes, A., Rashid, T., Bhatt, S., Mishra, S., Flaxman, S.: PriorVAE: encoding spatial priors with variational autoencoders for small-area estimation. J. R. Soc. Interface 19(191), 20220094 (2022)","journal-title":"J. R. Soc. Interface"},{"key":"10151_CR40","doi-asserted-by":"publisher","DOI":"10.1016\/j.spasta.2018.02.002","author":"J-F Ton","year":"2018","unstructured":"Ton, J.-F., Flaxman, S., Sejdinovic, D., Bhatt, S.: Spatial mapping with Gaussian processes and nonstationary Fourier features. Spat. Stat. (2018). https:\/\/doi.org\/10.1016\/j.spasta.2018.02.002","journal-title":"Spat. 
Stat."},{"key":"10151_CR41","unstructured":"Wang, K., Pleiss, G., Gardner, J., Tyree, S., Weinberger, K.Q., Wilson, A.G.: Exact Gaussian processes on a million data points. Adv. Neural Info. Process. Syst. (pp. 14622-14632) (2019)"},{"key":"10151_CR42","unstructured":"Welling, M., Teh, Y.W.: Bayesian learning via stochastic gradient Langevin dynamics. In: Proceedings of the 28th international conference on machine learning, ICML 2011 (2011)"},{"key":"10151_CR43","unstructured":"Yao, J., Pan, W., Ghosh, S., Doshi-Velez, F.: Quality of uncertainty quantification for Bayesian neural network inference. arXiv preprint arXiv:1906.09686 (2019)"},{"key":"10151_CR44","unstructured":"Yao, Y., Vehtari, A., Simpson, D., Gelman, A.: Yes, but did it work?: evaluating variational inference. In: 35th international conference on machine learning, ICML 2018 (2018)"}],"container-title":["Statistics and Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11222-022-10151-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11222-022-10151-w\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11222-022-10151-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,11,29]],"date-time":"2022-11-29T12:28:01Z","timestamp":1669724881000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11222-022-10151-w"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,10,17]]},"references-count":44,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2022,12]]}},"alternative-id":["10151"],"URL":"https:\/\/doi.org\/10.1007\/s11222-022-10151-w","relation":{},"ISSN":["0960-3174","1573-1375"]
,"issn-type":[{"value":"0960-3174","type":"print"},{"value":"1573-1375","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,10,17]]},"assertion":[{"value":"25 August 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"9 September 2022","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"17 October 2022","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"All authors declare that they have no conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}},{"value":"Not applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent to participate"}},{"value":"Not applicable.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"Not applicable.","order":5,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval"}}],"article-number":"96"}}