{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2022,4,3]],"date-time":"2022-04-03T15:36:25Z","timestamp":1649000185458},"reference-count":24,"publisher":"MIT Press - Journals","issue":"12","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Neural Computation"],"published-print":{"date-parts":[[2019,12]]},"abstract":"<jats:p> We analyze the joint probability distribution on the lengths of the vectors of hidden variables in different layers of a fully connected deep network, when the weights and biases are chosen randomly according to gaussian distributions. We show that if the activation function [Formula: see text] satisfies a minimal set of assumptions, satisfied by all activation functions that we know that are used in practice, then, as the width of the network gets large, the \u201clength process\u201d converges in probability to a length map that is determined as a simple function of the variances of the random weights and biases and the activation function [Formula: see text]. We also show that this convergence may fail for [Formula: see text] that violate our assumptions. We show how to use this analysis to choose the variance of weight initialization, depending on the activation function, so that hidden variables maintain a consistent scale throughout the network. <\/jats:p>","DOI":"10.1162\/neco_a_01235","type":"journal-article","created":{"date-parts":[[2019,10,15]],"date-time":"2019-10-15T21:52:21Z","timestamp":1571176341000},"page":"2562-2580","source":"Crossref","is-referenced-by-count":2,"title":["On the Effect of the Activation Function on the Distribution of Hidden Nodes in a Deep Network"],"prefix":"10.1162","volume":"31","author":[{"given":"Philip M.","family":"Long","sequence":"first","affiliation":[{"name":"Google Brain, Mountain View, CA 94043, U.S.A."}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hanie","family":"Sedghi","sequence":"additional","affiliation":[{"name":"Google Brain, Mountain View, CA 94043, U.S.A."}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"281","reference":[{"key":"B1","first-page":"242","author":"Allen-Zhu Z.","year":"2019","journal-title":"Proceedings of the International Conference on Machine Learning"},{"key":"B2","author":"Chen M.","year":"2018","journal-title":"Dynamical isometry and a mean field theory of RNNs: Gating enables signal propagation in recurrent neural networks"},{"key":"B3","first-page":"2253","volume-title":"Advances in neural information processing systems","volume":"29","author":"Daniely A.","year":"2016"},{"key":"B4","volume-title":"An introduction to probability theory and its applications","author":"Feller W.","year":"2008"},{"key":"B5","first-page":"249","author":"Glorot X.","year":"2010","journal-title":"Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics"},{"key":"B6","first-page":"580","volume-title":"Advances in neural information processing systems","volume":"31","author":"Hanin B.","year":"2018"},{"key":"B7","author":"Hayou S.","year":"2018","journal-title":"On the selection of initialization and activation function for deep neural networks"},{"key":"B8","volume-title":"Encyclopaedia of mathematics","author":"Hazewinkel M.","year":"2013"},{"key":"B9","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.123"},{"key":"B10","doi-asserted-by":"publisher","DOI":"10.1007\/s00222-006-0028-8"},{"key":"B11","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-49430-8_2"},{"key":"B12","author":"Lee J.","year":"2018","journal-title":"Proceedings of the International Conference on Learning Representations"},{"key":"B13","author":"Long P. M.","year":"2019","journal-title":"On the effect of the activation function on the distribution of hidden nodes in a deep network"},{"key":"B14","doi-asserted-by":"crossref","DOI":"10.1515\/9780691213194","volume-title":"Statistics in theory and practice","author":"Lupton R.","year":"1993"},{"key":"B15","author":"Matthews A. G. d. G.","year":"2018","journal-title":"Gaussian process behaviour in wide deep neural networks"},{"key":"B16","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4612-0745-0"},{"key":"B17","author":"Novak R.","year":"2019","journal-title":"Proceedings of the International Conference on Learning Representations"},{"key":"B18","first-page":"4785","volume-title":"Advances in neural information processing systems","volume":"30","author":"Pennington J.","year":"2017"},{"key":"B19","author":"Pennington J.","year":"2018","journal-title":"The emergence of spectral universality in deep networks"},{"key":"B20","first-page":"3360","volume-title":"Advances in neural information processing systems","volume":"29","author":"Poole B.","year":"2016"},{"key":"B21","author":"Schoenholz S. S.","year":"2016","journal-title":"Deep information propagation"},{"key":"B22","author":"Xiao L.","year":"2018","journal-title":"How to train 10,000-layer vanilla convolutional neural networks"},{"key":"B23","first-page":"7103","volume-title":"Advances in neural information processing systems","volume":"30","author":"Yang G.","year":"2017"},{"key":"B24","author":"Zou D.","year":"2018","journal-title":"CoRR"}],"container-title":["Neural Computation"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mitpressjournals.org\/doi\/pdf\/10.1162\/neco_a_01235","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,3,12]],"date-time":"2021-03-12T21:43:22Z","timestamp":1615585402000},"score":1,"resource":{"primary":{"URL":"https:\/\/direct.mit.edu\/neco\/article\/31\/12\/2562-2580\/95607"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,12]]},"references-count":24,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2019,12]]}},"alternative-id":["10.1162\/neco_a_01235"],"URL":"https:\/\/doi.org\/10.1162\/neco_a_01235","relation":{},"ISSN":["0899-7667","1530-888X"],"issn-type":[{"value":"0899-7667","type":"print"},{"value":"1530-888X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,12]]}}}