{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,5]],"date-time":"2025-11-05T11:03:03Z","timestamp":1762340583624},"reference-count":24,"publisher":"MIT Press - Journals","issue":"3","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Neural Computation"],"published-print":{"date-parts":[[2013,3]]},"abstract":"<jats:p> Restricted Boltzmann machines (RBMs) are often used as building blocks in greedy learning of deep networks. However, training this simple model can be laborious. Traditional learning algorithms often converge only with the right choice of metaparameters that specify, for example, learning rate scheduling and the scale of the initial weights. They are also sensitive to specific data representation. An equivalent RBM can be obtained by flipping some bits and changing the weights and biases accordingly, but traditional learning rules are not invariant to such transformations. Without careful tuning of these training settings, traditional algorithms can easily get stuck or even diverge. In this letter, we present an enhanced gradient that is derived to be invariant to bit-flipping transformations. We experimentally show that the enhanced gradient yields more stable training of RBMs both when used with a fixed learning rate and an adaptive one. <\/jats:p>","DOI":"10.1162\/neco_a_00397","type":"journal-article","created":{"date-parts":[[2012,11,13]],"date-time":"2012-11-13T18:17:01Z","timestamp":1352830621000},"page":"805-831","source":"Crossref","is-referenced-by-count":37,"title":["Enhanced Gradient for Training Restricted Boltzmann Machines"],"prefix":"10.1162","volume":"25","author":[{"given":"KyungHyun","family":"Cho","sequence":"first","affiliation":[{"name":"Department of Information and Computer Science, Aalto University School of Science, Espoo, Uusimaa 02150, Finland"}]},{"given":"Tapani","family":"Raiko","sequence":"additional","affiliation":[{"name":"Department of Information and Computer Science, Aalto University School of Science, Espoo, Uusimaa 02150, Finland"}]},{"given":"Alexander","family":"Ilin","sequence":"additional","affiliation":[{"name":"Department of Information and Computer Science, Aalto University School of Science, Espoo, Uusimaa 02150, Finland"}]}],"member":"281","reference":[{"key":"B1","doi-asserted-by":"publisher","DOI":"10.1561\/2200000006"},{"key":"B2","doi-asserted-by":"publisher","DOI":"10.1162\/neco.2008.11-07-647"},{"key":"B3","first-page":"281","volume":"13","author":"Bergstra J.","year":"2012","journal-title":"Journal of Machine Learning Research"},{"key":"B5","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-21735-7_2"},{"key":"B6","doi-asserted-by":"publisher","DOI":"10.1109\/IJCNN.2010.5596837"},{"key":"B9","volume-title":"Tempered Markov chain Monte Carlo for training of restricted Boltzmann machines","author":"Desjardins G.","year":"2009"},{"key":"B10","first-page":"145","volume-title":"Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics","author":"Desjardins G.","year":"2010"},{"key":"B11","first-page":"625","volume":"11","author":"Erhan D.","year":"2010","journal-title":"Journal of Machine Learning Research"},{"key":"B12","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-15825-4_26"},{"key":"B13","doi-asserted-by":"publisher","DOI":"10.1162\/089976602760128018"},{"key":"B14","volume-title":"A practical guide to training restricted Boltzmann machines","author":"Hinton G. E.","year":"2010"},{"key":"B15","doi-asserted-by":"publisher","DOI":"10.1126\/science.1127647"},{"key":"B16","volume-title":"Convolutional deep belief networks on CIFAR-10","author":"Krizhevsky A.","year":"2010"},{"key":"B17","doi-asserted-by":"publisher","DOI":"10.1109\/5.726791"},{"key":"B18","first-page":"873","volume-title":"Advances in neural information processing systems, 20","author":"Lee H.","year":"2008"},{"key":"B19","doi-asserted-by":"publisher","DOI":"10.1145\/1553374.1553453"},{"key":"B20","first-page":"509","volume-title":"Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics","author":"Marlin B. M.","year":"2010"},{"key":"B21","volume-title":"Derivations on the enhanced gradient for the Boltzmann machine","author":"Raiko T.","year":"2011"},{"key":"B22","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2007.383157"},{"key":"B23","first-page":"1598","volume-title":"Advances in neural information processing systems, 22","author":"Salakhutdinov R.","year":"2009"},{"key":"B24","first-page":"448","volume-title":"Proceedings of the International Conference on Artificial Intelligence and Statistics","volume":"5","author":"Salakhutdinov R.","year":"2009"},{"key":"B26","first-page":"194","volume-title":"Parallel distributed processing: Explorations in the microstructure of cognition, Vol. 1: Foundations","author":"Smolensky P.","year":"1986"},{"key":"B27","doi-asserted-by":"publisher","DOI":"10.1145\/1390156.1390290"},{"key":"B28","doi-asserted-by":"publisher","DOI":"10.1007\/BF00341287"}],"container-title":["Neural Computation"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mitpressjournals.org\/doi\/pdf\/10.1162\/NECO_a_00397","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,3,12]],"date-time":"2021-03-12T21:39:42Z","timestamp":1615585182000},"score":1,"resource":{"primary":{"URL":"https:\/\/direct.mit.edu\/neco\/article\/25\/3\/805-831\/7857"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2013,3]]},"references-count":24,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2013,3]]}},"alternative-id":["10.1162\/NECO_a_00397"],"URL":"https:\/\/doi.org\/10.1162\/neco_a_00397","relation":{},"ISSN":["0899-7667","1530-888X"],"issn-type":[{"value":"0899-7667","type":"print"},{"value":"1530-888X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2013,3]]}}}