{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,21]],"date-time":"2025-11-21T06:30:46Z","timestamp":1763706646922,"version":"build-2065373602"},"reference-count":56,"publisher":"MDPI AG","issue":"10","license":[{"start":{"date-parts":[[2023,9,22]],"date-time":"2023-09-22T00:00:00Z","timestamp":1695340800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"NSF","award":["1718991","R01-EB026955"],"award-info":[{"award-number":["1718991","R01-EB026955"]}]},{"name":"NIH","award":["1718991","R01-EB026955"],"award-info":[{"award-number":["1718991","R01-EB026955"]}]},{"name":"Intel Corporation","award":["1718991","R01-EB026955"],"award-info":[{"award-number":["1718991","R01-EB026955"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],"abstract":"<jats:p>Energy-based models (EBMs) assign an unnormalized log probability to data samples. This functionality has a variety of applications, such as sample synthesis, data denoising, sample restoration, outlier detection, Bayesian reasoning and many more. But, the training of EBMs using standard maximum likelihood is extremely slow because it requires sampling from the model distribution. Score matching potentially alleviates this problem. In particular, denoising-score matching has been successfully used to train EBMs. Using noisy data samples with one fixed noise level, these models learn fast and yield good results in data denoising. However, demonstrations of such models in the high-quality sample synthesis of high-dimensional data were lacking. Recently, a paper showed that a generative model trained by denoising-score matching accomplishes excellent sample synthesis when trained with data samples corrupted with multiple levels of noise. Here we provide an analysis and empirical evidence showing that training with multiple noise levels is necessary when the data dimension is high. Leveraging this insight, we propose a novel EBM trained with multiscale denoising-score matching. Our model exhibits a data-generation performance comparable to state-of-the-art techniques such as GANs and sets a new baseline for EBMs. The proposed model also provides density information and performs well on an image-inpainting task.<\/jats:p>","DOI":"10.3390\/e25101367","type":"journal-article","created":{"date-parts":[[2023,9,22]],"date-time":"2023-09-22T05:32:45Z","timestamp":1695360765000},"page":"1367","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":4,"title":["Learning Energy-Based Models in High-Dimensional Spaces with Multiscale Denoising-Score Matching"],"prefix":"10.3390","volume":"25","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3720-0108","authenticated-orcid":false,"given":"Zengyi","family":"Li","sequence":"first","affiliation":[{"name":"Redwood Center for Theoretical Neuroscience, Berkeley, CA 94720, USA"},{"name":"Department of Physics, University of California Berkeley, Berkeley, CA 94720, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8930-3512","authenticated-orcid":false,"given":"Yubei","family":"Chen","sequence":"additional","affiliation":[{"name":"Redwood Center for Theoretical Neuroscience, Berkeley, CA 94720, USA"},{"name":"Berkeley AI Research, University of California Berkeley, Berkeley, CA 94720, USA"}]},{"given":"Friedrich T.","family":"Sommer","sequence":"additional","affiliation":[{"name":"Redwood Center for Theoretical Neuroscience, Berkeley, CA 94720, USA"},{"name":"Helen Wills Neuroscience Institute, University of California Berkeley, Berkeley, CA 94720, USA"},{"name":"Neuromorphic Computing Group, Intel Labs, 2200 Mission College Blvd., Santa Clara, CA 95054, USA"}]}],"member":"1968","published-online":{"date-parts":[[2023,9,22]]},"reference":[{"key":"ref_1","first-page":"12","article-title":"Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion","volume":"11","author":"Vincent","year":"2010","journal-title":"J. Mach. Learn. Res. (JMLR)"},{"key":"ref_2","unstructured":"Zhai, S., Cheng, Y., Lu, W., and Zhang, Z. (2016, January 16). Deep Structured Energy Based Models for Anomaly Detection. Proceedings of the International Conference on Machine Learning (ICML), New York, NY, USA."},{"key":"ref_3","unstructured":"Choi, H., Jang, E., and Alemi, A.A. (2018). Waic, but why? Generative ensembles for robust anomaly detection. arXiv."},{"key":"ref_4","unstructured":"Nijkamp, E., Hill, M., Han, T., Zhu, S.C., and Wu, Y.N. (February, January 27). On the Anatomy of MCMC-based Maximum Likelihood Learning of Energy-Based Models. Proceedings of the Conference on Artificial Intelligence (AAAI), Honolulu, HI, USA."},{"key":"ref_5","unstructured":"Du, Y., and Mordatch, I. (2019). Implicit generation and generalization in energy-based models. arXiv."},{"key":"ref_6","unstructured":"Welling, M., and Teh, Y.W. (July, January 28). Bayesian learning via stochastic gradient Langevin dynamics. Proceedings of the International Conference on Machine Learning (ICML), Bellevue, WA, USA."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Bakir, G., Hofman, T., Sch\u00f6lkopf, B., and Smola, A. (2006). Predicting Structured Data, MIT Press.","DOI":"10.7551\/mitpress\/7443.001.0001"},{"key":"ref_8","unstructured":"Ngiam, J., Chen, Z., Koh, P.W., and Ng, A.Y. (July, January 28). Learning deep energy models. Proceedings of the International Conference on Machine Learning (ICML), Bellevue, WA, USA."},{"key":"ref_9","unstructured":"Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. (2014, January 8\u201313). Generative adversarial nets. Proceedings of the Advances in Neural Information Processing Systems (NIPS), Montreal, QC, Canada."},{"key":"ref_10","unstructured":"Dinh, L., Krueger, D., and Bengio, Y. (2015, January 7\u20139). NICE: Non-linear Independent Components Estimation. Proceedings of the International Conference on Learning Representations (ICLR), San Diego, CA, USA."},{"key":"ref_11","unstructured":"Kingma, D.P., and Dhariwal, P. (2018). Advances in Neural Information Processing Systems 31 (NeurIPS 2018), Curran Associates, Inc."},{"key":"ref_12","unstructured":"van den Oord, A., Kalchbrenner, N., and Kavukcuoglu, K. (2016, January 19\u201324). Pixel Recurrent Neural Networks. Proceedings of the International Conference on Machine Learning (ICML), New York, NY, USA."},{"key":"ref_13","unstructured":"Ostrovski, G., Dabney, W., and Munos, R. (2018, January 10\u201315). Autoregressive Quantile Networks for Generative Modeling. Proceedings of the International Conference on Machine Learning (ICML), Vienna, Austria."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Hinton, G.E. (1999, January 14\u201317). Products of experts. Proceedings of the International Conference on Artificial Neural Networks (ICANN), Bratislava, Slovakia.","DOI":"10.1049\/cp:19991075"},{"key":"ref_15","unstructured":"Haarnoja, T., Tang, H., Abbeel, P., and Levine, S. (2017, January 7\u20139). Reinforcement learning with deep energy-based policies. Proceedings of the International Conference on Machine Learning (ICML), Sydney, Australia."},{"key":"ref_16","unstructured":"Kumar, R., Goyal, A., Courville, A., and Bengio, Y. (2019). Maximum Entropy Generators for Energy-Based Models. arXiv."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"1771","DOI":"10.1162\/089976602760128018","article-title":"Training products of experts by minimizing contrastive divergence","volume":"14","author":"Hinton","year":"2002","journal-title":"Neural Comput."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Tieleman, T. (2008, January 5\u20139). Training restricted Boltzmann machines using approximations to the likelihood gradient. Proceedings of the International Conference on Machine Learning (ICML), Helsinki, Finland.","DOI":"10.1145\/1390156.1390290"},{"key":"ref_19","first-page":"4","article-title":"Estimation of non-normalized statistical models by score matching","volume":"6","year":"2005","journal-title":"J. Mach. Learn. Res. (JMLR)"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"1065","DOI":"10.1214\/aoms\/1177704472","article-title":"On Estimation of a Probability Density Function and Mode","volume":"33","author":"Parzen","year":"1962","journal-title":"Ann. Math. Stat."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Johnson, O.T. (2004). Information Theory and the Central Limit Theorem, World Scientific.","DOI":"10.1142\/p341"},{"key":"ref_22","unstructured":"DasGupta, A. (2008). Asymptotic Theory of Statistics and Probability, Springer Science & Business Media."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"1661","DOI":"10.1162\/NECO_a_00142","article-title":"A connection between score matching and denoising autoencoders","volume":"23","author":"Vincent","year":"2011","journal-title":"Neural Comput."},{"key":"ref_24","unstructured":"Kingma, D.P., and LeCun, Y. (2010). Advances in Neural Information Processing Systems 23 (NIPS 2010), Curran Associates, Inc."},{"key":"ref_25","unstructured":"Drucker, H., and Le Cun, Y. (1991, January 8\u201312). Double backpropagation increasing generalization performance. Proceedings of the International Joint Conference on Neural Networks (IJCNN), Seattle, WA, USA."},{"key":"ref_26","unstructured":"Saremi, S., Mehrjou, A., Sch\u00f6lkopf, B., and Hyv\u00e4rinen, A. (2018). Deep energy estimator networks. arXiv."},{"key":"ref_27","unstructured":"Saremi, S., and Hyv\u00e4rinen, A. (2019). Neural Empirical Bayes. arXiv."},{"key":"ref_28","unstructured":"Song, Y., and Ermon, S. (2019). Advances in Neural Information Processing Systems 32 (NeurIPS 2019), Curran Associates, Inc."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"2319","DOI":"10.1126\/science.290.5500.2319","article-title":"A global geometric framework for nonlinear dimensionality reduction","volume":"290","author":"Tenenbaum","year":"2000","journal-title":"Science"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"2323","DOI":"10.1126\/science.290.5500.2323","article-title":"Nonlinear dimensionality reduction by locally linear embedding","volume":"290","author":"Roweis","year":"2000","journal-title":"Science"},{"key":"ref_31","first-page":"11","article-title":"Probabilistic non-linear principal component analysis with Gaussian process latent variable models","volume":"6","author":"Lawrence","year":"2005","journal-title":"J. Mach. Learn. Res. (JMLR)"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Vershynin, R. (2018). High-Dimensional Probability: An Introduction with Applications in Data Science, Cambridge University Press.","DOI":"10.1017\/9781108231596"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Tao, T. (2012). Topics in Random Matrix Theory, American Mathematical Society.","DOI":"10.1090\/gsm\/132"},{"key":"ref_34","unstructured":"Karklin, Y., and Simoncelli, E.P. (2011). Efficient coding of natural images with a population of noisy linear-nonlinear neurons. Adv. Neural Inf. Process. Syst. (NIPS), 24."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"671","DOI":"10.1126\/science.220.4598.671","article-title":"Optimization by simulated annealing","volume":"220","author":"Kirkpatrick","year":"1983","journal-title":"Science"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"125","DOI":"10.1023\/A:1008923215028","article-title":"Annealed importance sampling","volume":"11","author":"Neal","year":"2001","journal-title":"Stat. Comput."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1137\/S0036141096303359","article-title":"The variational formulation of the Fokker\u2013Planck equation","volume":"29","author":"Jordan","year":"1998","journal-title":"SIAM J. Math. Anal."},{"key":"ref_38","unstructured":"Bellec, G., Kappel, D., Maass, W., and Legenstein, R.A. (May, January 30). Deep Rewiring: Training very sparse deep networks. Proceedings of the International Conference on Learning Representations (ICLR), Vancouver, BC, Canada."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Liu, Z., Luo, P., Wang, X., and Tang, X. (2015, January 11\u201317). Deep learning face attributes in the wild. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Montreal, QC, Canada.","DOI":"10.1109\/ICCV.2015.425"},{"key":"ref_40","unstructured":"Krizhevsky, A., and Hinton, G. (2009). Learning Multiple Layers of Features from Tiny Images, University of Toronto. Technical Report."},{"key":"ref_41","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (July, January 26). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA."},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 12\u201314). Identity mappings in deep residual networks. Proceedings of the European Conference on computer vision (ECCV), Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46493-0_38"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Fan, F., Cong, W., and Wang, G. (2018). A new type of neurons for machine learning. Int. J. Numer. Methods Biomed. Eng. (JNMBE), 34.","DOI":"10.1002\/cnm.2920"},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"721","DOI":"10.1109\/TPAMI.1984.4767596","article-title":"Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images","volume":"6","author":"Geman","year":"1984","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell. (PAMI)"},{"key":"ref_45","unstructured":"Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., and Chen, X. (2016, January 5\u201310). Improved techniques for training gans. Proceedings of the Advances in Neural Information Processing Systems (NIPS), Barcelona, Spain."},{"key":"ref_46","unstructured":"Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., and Hochreiter, S. (2017, January 4\u20139). Gans trained by a two time-scale update rule converge to a local nash equilibrium. Proceedings of the Advances in Neural Information Processing Systems (NIPS), Long Beach, CA, USA."},{"key":"ref_47","unstructured":"Barratt, S., and Sharma, R. (July, January 10). A note on the inception score. Proceedings of the International Conference on Machine Learning (ICML), Workshop on Theoretical Foundations and Applications of Deep Generative Models, Stockholm, Sweden."},{"key":"ref_48","unstructured":"Neal, R.M. (2011). Handbook of Markov Chain Monte Carlo, CRC Press."},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Salakhutdinov, R., and Murray, I. (2008, January 5\u20139). On the quantitative analysis of deep belief networks. Proceedings of the International Conference on Machine learning (ICML), Helsinki, Finland.","DOI":"10.1145\/1390156.1390266"},{"key":"ref_50","unstructured":"Burda, Y., Grosse, R.B., and Salakhutdinov, R. (2015, January 9\u201312). Accurate and conservative estimates of MRF log-likelihood using reverse annealing. Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), San Diego, CA, USA."},{"key":"ref_51","unstructured":"Behrmann, J., Grathwohl, W., Chen, R.T.Q., Duvenaud, D., and Jacobsen, J. (2019, January 9\u201315). Invertible Residual Networks. Proceedings of the International Conference on Machine Learning (ICML), Long Beach, CA, USA."},{"key":"ref_52","unstructured":"Chen, T.Q., Behrmann, J., Duvenaud, D., and Jacobsen, J. (2019, January 8\u201314). Residual Flows for Invertible Generative Modeling. Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Vancouver, BC, Canada."},{"key":"ref_53","unstructured":"Miyato, T., Kataoka, T., Koyama, M., and Yoshida, Y. (May, January 30). Spectral Normalization for Generative Adversarial Networks. Proceedings of the International Conference on Learning Representations (ICLR), Vancouver, BC, Canada."},{"key":"ref_54","unstructured":"Nalisnick, E.T., Matsukawa, A., Teh, Y.W., G\u00f6r\u00fcr, D., and Lakshminarayanan, B. (2019, January 6\u20139). Do Deep Generative Models Know What They Don\u2019t Know?. Proceedings of the International Conference on Learning Representations (ICLR), New Orleans, LA, USA."},{"key":"ref_55","unstructured":"Song, Y., Garg, S., Shi, J., and Ermon, S. (2019, January 22\u201326). Sliced Score Matching: A Scalable Approach to Density and Score Estimation. Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI), Tel Aviv, Israel."},{"key":"ref_56","unstructured":"Russell, S.J., and Norvig, P. (2016). Artificial Intelligence: A Modern Approach, Pearson Education Limited."}],"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1099-4300\/25\/10\/1367\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T20:55:49Z","timestamp":1760129749000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1099-4300\/25\/10\/1367"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,9,22]]},"references-count":56,"journal-issue":{"issue":"10","published-online":{"date-parts":[[2023,10]]}},"alternative-id":["e25101367"],"URL":"https:\/\/doi.org\/10.3390\/e25101367","relation":{},"ISSN":["1099-4300"],"issn-type":[{"type":"electronic","value":"1099-4300"}],"subject":[],"published":{"date-parts":[[2023,9,22]]}}}