{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,8]],"date-time":"2026-05-08T15:57:53Z","timestamp":1778255873742,"version":"3.51.4"},"reference-count":67,"publisher":"MIT Press","issue":"5","content-domain":{"domain":["direct.mit.edu"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2025,4,17]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>A restricted Boltzmann machine (RBM) is a two-layer neural network with shared weights and has been extensively studied for dimensionality reduction, data representation, and recommendation systems in the literature. The traditional RBM requires a probabilistic interpretation of the values on both layers and a Markov chain Monte Carlo (MCMC) procedure to generate samples during the training. The contrastive divergence (CD) is efficient to train the RBM, but its convergence has not been proved mathematically. In this letter, we investigate the RBM by using a maximum a posteriori (MAP) estimate and the expectation\u2013maximization (EM) algorithm. We show that the CD algorithm without MCMC is convergent for the conditional likelihood object function. Another key contribution in this letter is the reformulation of the RBM into a deterministic model. Within the reformulated RBM, the CD algorithm without MCMC approximates the gradient descent (GD) method. This reformulated RBM can take the continuous scalar and vector variables on the nodes with flexibility in choosing the activation functions. Numerical experiments show its capability in both linear and nonlinear dimensionality reduction, and for the nonlinear dimensionality reduction, the reformulated RBM can outperform principal component analysis (PCA) by choosing the proper activation functions. 
Finally, we demonstrate its application to vector-valued nodes for the CIFAR-10 data set (color images) and the multivariate sequence data, which cannot be configured naturally with the traditional RBM. This work not only provides theoretical insights regarding the traditional RBM but also unifies the linear and nonlinear dimensionality reduction for scalar and vector variables.<\/jats:p>","DOI":"10.1162\/neco_a_01751","type":"journal-article","created":{"date-parts":[[2025,3,20]],"date-time":"2025-03-20T18:40:07Z","timestamp":1742496007000},"page":"1034-1055","update-policy":"https:\/\/doi.org\/10.1162\/mitpressjournals.corrections.policy","source":"Crossref","is-referenced-by-count":2,"title":["Reformulation of RBM to Unify Linear and Nonlinear Dimensionality Reduction"],"prefix":"10.1162","volume":"37","author":[{"given":"Jiangsheng","family":"You","sequence":"first","affiliation":[{"name":"Aspen Technology, Bedford, MA 01730, U.S.A. jason.you@aspentech.com"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chun-Yen","family":"Liu","sequence":"additional","affiliation":[{"name":"Aspen Technology, Bedford, MA 01730, U.S.A. 
mark.liu@aspentech.com"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"281","published-online":{"date-parts":[[2025,4,17]]},"reference":[{"issue":"1","key":"2025042219355149500_bib1","first-page":"147","article-title":"A learning algorithm for Boltzmann machines","volume":"9","author":"Ackley","year":"1985","journal-title":"Cognitive Science"},{"key":"2025042219355149500_bib2","doi-asserted-by":"publisher","first-page":"27","DOI":"10.1016\/S0076-5392(08)60869-3","article-title":"Canonical correlation analysis of time series and the use of an information criterion","volume":"126","author":"Akaike","year":"1976","journal-title":"Mathematics in Science and Engineering"},{"issue":"6","key":"2025042219355149500_bib3","first-page":"1601","article-title":"Learning deep architectures for AI","volume":"21","author":"Bengio","year":"2009","journal-title":"Foundations and Trends in Machine Learning"},{"issue":"6","key":"2025042219355149500_bib4","doi-asserted-by":"publisher","first-page":"1601","DOI":"10.1162\/neco.2008.11-07-647","article-title":"Justifying and generalizing contrastive divergence","volume":"21","author":"Bengio","year":"2009","journal-title":"Neural Computation"},{"issue":"11","key":"2025042219355149500_bib5","doi-asserted-by":"publisher","first-page":"7327","DOI":"10.1109\/TPAMI.2021.3116668","article-title":"Deep generative modelling: A comparative review of VAEs, GANs, normalizing flows, energy-based and autoregressive models","volume":"44","author":"Bond-Taylor","year":"2022","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"issue":"6","key":"2025042219355149500_bib6","article-title":"Introduction to the issue on robust subspace learning and tracking: Theory, algorithms, and applications","volume":"12","author":"Bouwmans","year":"2018","journal-title":"IEEE Journal of Selected Topics in Signal 
Processing"},{"key":"2025042219355149500_bib7","doi-asserted-by":"publisher","first-page":"A96","DOI":"10.1051\/0004-6361\/201424194","article-title":"Restricted Boltzmann machine: A non-linear substitute for PCA in spectral processing","volume":"576","author":"Bu","year":"2015","journal-title":"Astronomy and Astrophysics"},{"issue":"3","key":"2025042219355149500_bib8","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/1970392.1970395","article-title":"Robust principal component analysis","volume":"58","author":"Candes","year":"2009","journal-title":"Journal of the ACM"},{"key":"2025042219355149500_bib9","first-page":"59","article-title":"On contrastive divergence learning","author":"Carreira","year":"2005","journal-title":"Proceedings of the 10th International Workshop on Artificial Intelligence and Statistics"},{"key":"2025042219355149500_bib10","article-title":"Encoding musical style with transformer autoencoders","volume-title":"Proceedings of the 37th International Conference on Machine Learning","author":"Choi","year":"2020"},{"issue":"1","key":"2025042219355149500_bib11","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1111\/j.2517-6161.1977.tb01600.x","article-title":"Maximum likelihood from incomplete data via the EM algorithm","volume":"39","author":"Dempster","year":"1977","journal-title":"Journal of the Royal Statistical Society, Series B"},{"key":"2025042219355149500_bib12","doi-asserted-by":"publisher","first-page":"237","DOI":"10.1111\/insr.12294","article-title":"An updated literature review of distance correlation and its applications to time series","volume":"87","author":"Edelmann","year":"2019","journal-title":"International Statistical Review"},{"key":"2025042219355149500_bib13","first-page":"208","article-title":"Empirical analysis of the divergence of Gibbs sampling based learning algorithms for restricted Boltzmann machines","volume-title":"Proceedings of the International Conference on Artificial Neural 
Networks","author":"Fischer","year":"2010"},{"key":"2025042219355149500_bib14","doi-asserted-by":"publisher","first-page":"664","DOI":"10.1162\/NECO_a_00085","article-title":"Bounding the bias of contrastive divergence learning","volume":"23","author":"Fischer","year":"2011","journal-title":"Neural Computation"},{"key":"2025042219355149500_bib15","doi-asserted-by":"publisher","first-page":"25","DOI":"10.1016\/j.patcog.2013.05.025","article-title":"Training restricted Boltzmann machines: An introduction","volume":"47","author":"Fischer","year":"2014","journal-title":"Pattern Recognition"},{"key":"2025042219355149500_bib16","doi-asserted-by":"crossref","DOI":"10.1007\/978-1-4612-3094-6","volume-title":"Stochastic finite elements: A spectral approach","author":"Ghanem","year":"1991"},{"key":"2025042219355149500_bib17","volume-title":"Deep learning","author":"Goodfellow","year":"2015"},{"key":"2025042219355149500_bib18","volume-title":"Principal manifolds for data visualisation and dimension reduction.","author":"Gorban","year":"2007"},{"key":"2025042219355149500_bib19","volume-title":"Theory and applications of correspondence analysis","author":"Greenacre","year":"1983"},{"key":"2025042219355149500_bib20","article-title":"A comprehensive survey and analysis of generative models in machine learning","volume":"38","author":"Harshvardhan","year":"2020","journal-title":"Computer Science Review"},{"key":"2025042219355149500_bib21","doi-asserted-by":"publisher","first-page":"201","DOI":"10.1109\/TMI.1987.4307828","article-title":"Bayesian image processing in two dimensions","volume":"6","author":"Hart","year":"1987","journal-title":"IEEE Transactions on Medical Imaging"},{"key":"2025042219355149500_bib22","first-page":"525","article-title":"R\u00e9seau de neurones \u00e0 synapses modifiables: D\u00e9codage de messages sensoriels composites par apprentissage non supervis\u00e9 et permanent","volume":"299","author":"H\u00e9rault","year":"1984","journal-title":"Comptes Rendus 
de l\u2019Acad\u00e9mie des Sciences, S\u00e9rie III"},{"key":"2025042219355149500_bib23","doi-asserted-by":"publisher","first-page":"1771","DOI":"10.1162\/089976602760128018","article-title":"Training products of experts by minimizing contrastive divergence","volume":"14","author":"Hinton","year":"2002","journal-title":"Neural Computation"},{"key":"2025042219355149500_bib24","article-title":"A practical guide to training restricted Boltzmann machines","author":"Hinton","year":"2010"},{"key":"2025042219355149500_bib25","first-page":"358","article-title":"Learning representations by recirculation","volume-title":"Advances in neural information processing systems","author":"Hinton","year":"1987"},{"key":"2025042219355149500_bib26","doi-asserted-by":"publisher","first-page":"504","DOI":"10.1126\/science.1127647","article-title":"Reducing the dimensionality of data with neural networks","volume":"313","author":"Hinton","year":"2006","journal-title":"Science"},{"issue":"2","key":"2025042219355149500_bib27","first-page":"282","article-title":"Learning and relearning in Boltzmann machines","volume":"1","author":"Hinton","year":"1986","journal-title":"Parallel distributed processing: Explorations in the Microstructure of Cognition"},{"key":"2025042219355149500_bib28","author":"Hinton","year":"1984","journal-title":"Boltzmann machines: Constraint satisfaction networks that learn"},{"issue":"3","key":"2025042219355149500_bib29","doi-asserted-by":"publisher","first-page":"863","DOI":"10.1016\/j.patcog.2006.07.009","article-title":"Kernel PCA for novelty detection","volume":"40","author":"Hoffmann","year":"2007","journal-title":"Pattern Recognition"},{"key":"2025042219355149500_bib30","doi-asserted-by":"publisher","first-page":"543","DOI":"10.1111\/rssb.12336","article-title":"Unbiased Markov chain Monte Carlo methods with couplings","volume":"82","author":"Jacob","year":"2020","journal-title":"Journal of the Royal Statistical Society: Series 
B"},{"issue":"2065","key":"2025042219355149500_bib31","doi-asserted-by":"publisher","first-page":"20150202","DOI":"10.1098\/rsta.2015.0202","article-title":"Principal component analysis: A review and recent developments","volume":"374","author":"Jolliffe","year":"2016","journal-title":"Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences"},{"key":"2025042219355149500_bib32","doi-asserted-by":"publisher","first-page":"84","DOI":"10.1145\/3065386","article-title":"ImageNet classification with deep convolutional neural networks","volume":"60","author":"Krizhevsky","year":"2017","journal-title":"Communications of the ACM"},{"key":"2025042219355149500_bib33","first-page":"556","article-title":"Algorithms for non-negative matrix factorization","volume-title":"Advances in neural information processing systems","author":"Lee","year":"2000"},{"issue":"2","key":"2025042219355149500_bib34","doi-asserted-by":"publisher","first-page":"228","DOI":"10.1109\/34.908974","article-title":"PCA versus LDA","volume":"23","author":"Martinez","year":"2001","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"2025042219355149500_bib35","author":"McInnes","year":"2018","journal-title":"Uniform manifold approximation and projection for dimension reduction"},{"key":"2025042219355149500_bib36","volume-title":"Discriminant analysis and statistical pattern recognition","author":"McLachlan","year":"2004"},{"issue":"3","key":"2025042219355149500_bib37","doi-asserted-by":"publisher","first-page":"511","DOI":"10.1111\/1467-9868.00082","article-title":"The EM algorithm: An old folk-song sung to a fast new tune","volume":"59","author":"Meng","year":"1997","journal-title":"Journal of the Royal Statistical Society: Series B"},{"key":"2025042219355149500_bib38","first-page":"514","article-title":"Conditional restricted Boltzmann machines for structured output prediction","volume-title":"Proceedings of the Twenty-Seventh 
Conference on Uncertainty in Artificial Intelligence","author":"Mnih","year":"2011"},{"key":"2025042219355149500_bib39","first-page":"046116","article-title":"Detection and characterization of changes of the correlation structure in multivariate time series","volume":"E71","author":"M\u00fcller","year":"2005","journal-title":"Physical Review"},{"key":"2025042219355149500_bib40","first-page":"139","article-title":"High-order representation learning for multivariate time series forecasting","volume-title":"Proceedings of 38th ICML","author":"Nguyen","year":"2021"},{"issue":"2","key":"2025042219355149500_bib41","doi-asserted-by":"publisher","first-page":"117","DOI":"10.3847\/0004-637X\/824\/2\/117","article-title":"Detection and characterization of exoplanets using projections on Karhunen Loeve eigenimages: Forward modeling","volume":"824","author":"Pueyo","year":"2016","journal-title":"Astrophysical Journal"},{"issue":"2","key":"2025042219355149500_bib42","doi-asserted-by":"publisher","first-page":"104","DOI":"10.3847\/1538-4357\/aaa1f2","article-title":"Non-negative matrix factorization: Robust extraction of extended structures","volume":"852","author":"Ren","year":"2018","journal-title":"Astrophysical Journal"},{"key":"2025042219355149500_bib43","first-page":"448","article-title":"Deep Boltzmann machines","volume-title":"Proceedings of the 12th International Conference on Artificial Intelligence and Statistics","author":"Salakhutdinov","year":"2009"},{"key":"2025042219355149500_bib44","doi-asserted-by":"crossref","DOI":"10.1145\/1273496.1273596","article-title":"Restricted Boltzmann machines for collaborative filtering","volume-title":"Proceedings of the 24th International Conference on Machine Learning","author":"Salakhutdinov","year":"2007"},{"issue":"5","key":"2025042219355149500_bib45","doi-asserted-by":"crossref","first-page":"1299","DOI":"10.1162\/089976698300017467","article-title":"Nonlinear component analysis as a kernel eigenvalue 
problem","volume":"10","author":"Sch\u00f6lkopf","year":"1998","journal-title":"Neural Computation"},{"key":"2025042219355149500_bib46","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-540-73750-6_2","article-title":"Nonlinear principal component analysis: Neural network models and applications","volume-title":"Principal manifolds for data visualization and dimension reduction","author":"Scholz","year":"2008"},{"key":"2025042219355149500_bib47","author":"Shen","year":"2018","journal-title":"Nonlinear dimensionality reduction on graphs"},{"key":"2025042219355149500_bib48","doi-asserted-by":"publisher","first-page":"113","DOI":"10.1109\/TMI.1982.4307558","article-title":"Maximum likelihood reconstruction for emission tomography","volume":"1","author":"Shepp","year":"1982","journal-title":"IEEE Transaction on Medical Imaging"},{"key":"2025042219355149500_bib49","volume-title":"Parallel distributed processing: Explorations in the microstructure of cognition.","author":"Smolensky","year":"1986"},{"key":"2025042219355149500_bib50","doi-asserted-by":"crossref","DOI":"10.1109\/IJCNN.2016.7727482","article-title":"Learning Boltzmann machine with EM-like method","volume-title":"Proceedings of the International Joint Conference on Neural Networks","author":"Song","year":"2016"},{"key":"2025042219355149500_bib51","volume-title":"Probability, random processes, and estimation theory for engineers","author":"Stark","year":"1986"},{"key":"2025042219355149500_bib52","first-page":"789","article-title":"On the convergence properties of contrastive divergence","volume-title":"Proceedings of the 12th International Conference on Artificial Intelligence and Statistics","author":"Sutskever","year":"2010"},{"key":"2025042219355149500_bib53","first-page":"1","article-title":"A tutorial on stochastic approximation algorithms for training restricted Boltzmann machines and deep belief nets","volume-title":"Proceedings of the Information Theory and Application 
Workshop","author":"Swersky","year":"2010"},{"key":"2025042219355149500_bib54","article-title":"End-to-end training of deep Boltzmann machines by unbiased divergence with local mode initialization","volume-title":"Proceedings of the 40th International Conference on Machine Learning","author":"Taniguchi","year":"2023"},{"key":"2025042219355149500_bib55","volume-title":"Confirmatory factor analysis for applied research methodology in the social sciences","author":"Timothy","year":"2006"},{"key":"2025042219355149500_bib56","first-page":"2579","article-title":"Visualizing high-dimensional data using t-SNE","volume":"9","author":"van der Maaten","year":"2008","journal-title":"Journal of Machine Learning Research"},{"key":"2025042219355149500_bib57","author":"van der Maaten","year":"2009","journal-title":"Dimensionality reduction: A comparative review"},{"key":"2025042219355149500_bib58","article-title":"Attention is all you need","volume-title":"Advances in neural information processing systems","author":"Vaswani","year":"2017"},{"key":"2025042219355149500_bib59","first-page":"3371","article-title":"Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion","volume":"11","author":"Vincent","year":"2010","journal-title":"Journal of Machine Learning Research"},{"key":"2025042219355149500_bib60","doi-asserted-by":"publisher","first-page":"307","DOI":"10.1561\/2200000056","article-title":"An introduction to variational autoencoders","volume":"12","author":"Welling","year":"2019","journal-title":"Foundations and Trends in Machine Learning"},{"key":"2025042219355149500_bib61","author":"Woodford","year":"2006","journal-title":"Notes on contrastive divergence."},{"issue":"1","key":"2025042219355149500_bib62","doi-asserted-by":"publisher","first-page":"95","DOI":"10.1214\/aos\/1176346060","article-title":"On the convergence properties of the EM algorithm","volume":"11","author":"Wu","year":"1983","journal-title":"Annals of 
Statistics"},{"key":"2025042219355149500_bib63","doi-asserted-by":"publisher","first-page":"377","DOI":"10.4310\/18-SII552","article-title":"Accelerate training of restricted Boltzmann machine via iterative conditional maximum likelihood estimation","volume-title":"Statistics and Its Interface","author":"Wu","year":"2019"},{"key":"2025042219355149500_bib64","volume-title":"VAEBM: A symbiosis between variational autoencoders and energy-based models","author":"Xiao","year":"2021"},{"key":"2025042219355149500_bib65","doi-asserted-by":"crossref","DOI":"10.1109\/ICPR.2014.270","article-title":"To be Bernoulli or to be gaussian, for a restricted Boltzmann machine","volume-title":"Proceedings of 22th International Conference on Pattern Recognition","author":"Yamashita","year":"2014"},{"key":"2025042219355149500_bib66","doi-asserted-by":"publisher","first-page":"1696","DOI":"10.1109\/TNS.2007.901198","article-title":"Range condition and ML-EM checkerboard artifacts","volume":"54","author":"You","year":"2007","journal-title":"IEEE Transactions in Nuclear Science"},{"key":"2025042219355149500_bib67","doi-asserted-by":"publisher","first-page":"56","DOI":"10.38094\/jastt1224","article-title":"A comprehensive review of dimensionality reduction techniques for feature selection and feature extraction","volume":"1","author":"Zebari","year":"2022","journal-title":"Journal of Applied Science and Technology Trends"}],"container-title":["Neural 
Computation"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/direct.mit.edu\/neco\/article-pdf\/37\/5\/1034\/2508872\/neco_a_01751.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/direct.mit.edu\/neco\/article-pdf\/37\/5\/1034\/2508872\/neco_a_01751.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,4,22]],"date-time":"2025-04-22T23:36:11Z","timestamp":1745364971000},"score":1,"resource":{"primary":{"URL":"https:\/\/direct.mit.edu\/neco\/article\/37\/5\/1034\/128505\/Reformulation-of-RBM-to-Unify-Linear-and-Nonlinear"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,4,17]]},"references-count":67,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2025,4,17]]},"published-print":{"date-parts":[[2025,4,17]]}},"URL":"https:\/\/doi.org\/10.1162\/neco_a_01751","relation":{},"ISSN":["0899-7667","1530-888X"],"issn-type":[{"value":"0899-7667","type":"print"},{"value":"1530-888X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2025,5]]},"published":{"date-parts":[[2025,4,17]]}}}