{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,1]],"date-time":"2026-05-01T17:02:20Z","timestamp":1777654940506,"version":"3.51.4"},"reference-count":35,"publisher":"MDPI AG","issue":"18","license":[{"start":{"date-parts":[[2021,9,13]],"date-time":"2021-09-13T00:00:00Z","timestamp":1631491200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001871","name":"Funda\u00e7\u00e3o para a Ci\u00eancia e a Tecnologia","doi-asserted-by":"publisher","award":["UIDP\/50025\/2020"],"award-info":[{"award-number":["UIDP\/50025\/2020"]}],"id":[{"id":"10.13039\/501100001871","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001871","name":"Funda\u00e7\u00e3o para a Ci\u00eancia e a Tecnologia","doi-asserted-by":"publisher","award":["UIDB\/50025\/2020"],"award-info":[{"award-number":["UIDB\/50025\/2020"]}],"id":[{"id":"10.13039\/501100001871","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001871","name":"Funda\u00e7\u00e3o para a Ci\u00eancia e a Tecnologia","doi-asserted-by":"publisher","award":["CEECIND\/04697\/2017"],"award-info":[{"award-number":["CEECIND\/04697\/2017"]}],"id":[{"id":"10.13039\/501100001871","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001871","name":"Funda\u00e7\u00e3o para a Ci\u00eancia e a Tecnologia","doi-asserted-by":"publisher","award":["UIDB\/50008\/2020-UIDP\/50008\/2020"],"award-info":[{"award-number":["UIDB\/50008\/2020-UIDP\/50008\/2020"]}],"id":[{"id":"10.13039\/501100001871","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Mathematics"],"abstract":"<jats:p>The function and performance of neural networks are largely determined by the evolution of their weights and biases in the process of training, starting from the initial configuration of these parameters to one of the local minima of the loss function. We perform the quantitative statistical characterization of the deviation of the weights of two-hidden-layer feedforward ReLU networks of various sizes trained via Stochastic Gradient Descent (SGD) from their initial random configuration. We compare the evolution of the distribution function of this deviation with the evolution of the loss during training. We observed that successful training via SGD leaves the network in the close neighborhood of the initial configuration of its weights. For each initial weight of a link we measured the distribution function of the deviation from this value after training and found how the moments of this distribution and its peak depend on the initial weight. We explored the evolution of these deviations during training and observed an abrupt increase within the overfitting region. This jump occurs simultaneously with a similarly abrupt increase recorded in the evolution of the loss function. Our results suggest that SGD\u2019s ability to efficiently find local minima is restricted to the vicinity of the random initial configuration of weights.<\/jats:p>","DOI":"10.3390\/math9182246","type":"journal-article","created":{"date-parts":[[2021,9,13]],"date-time":"2021-09-13T23:32:23Z","timestamp":1631575943000},"page":"2246","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":16,"title":["Effect of Initial Configuration of Weights on Training and Function of Artificial Neural Networks"],"prefix":"10.3390","volume":"9","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9651-4756","authenticated-orcid":false,"given":"Ricardo J.","family":"Jesus","sequence":"first","affiliation":[{"name":"Departamento de Eletr\u00f3nica, Telecomunica\u00e7\u00f5es e Inform\u00e1tica, Campus Universitario de Santiago, Universidade de Aveiro, 3810-193 Aveiro, Portugal"},{"name":"EPCC, The University of Edinburgh, Edinburgh EH8 9YL, UK"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6504-9441","authenticated-orcid":false,"given":"M\u00e1rio L.","family":"Antunes","sequence":"additional","affiliation":[{"name":"Departamento de Eletr\u00f3nica, Telecomunica\u00e7\u00f5es e Inform\u00e1tica, Campus Universitario de Santiago, Universidade de Aveiro, 3810-193 Aveiro, Portugal"},{"name":"Instituto de Telecomunica\u00e7\u00f5es, Campus Universitario de Santiago, Universidade de Aveiro, 3810-193 Aveiro, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9102-1362","authenticated-orcid":false,"given":"Rui A.","family":"da Costa","sequence":"additional","affiliation":[{"name":"Departamento de F\u00edsica & I3N, Campus Universitario de Santiago, Universidade de Aveiro, 3810-193 Aveiro, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3320-3387","authenticated-orcid":false,"given":"Sergey N.","family":"Dorogovtsev","sequence":"additional","affiliation":[{"name":"Departamento de F\u00edsica & I3N, Campus Universitario de Santiago, Universidade de Aveiro, 3810-193 Aveiro, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4707-5945","authenticated-orcid":false,"given":"Jos\u00e9 F. F.","family":"Mendes","sequence":"additional","affiliation":[{"name":"Departamento de F\u00edsica & I3N, Campus Universitario de Santiago, Universidade de Aveiro, 3810-193 Aveiro, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0107-6253","authenticated-orcid":false,"given":"Rui L.","family":"Aguiar","sequence":"additional","affiliation":[{"name":"Departamento de Eletr\u00f3nica, Telecomunica\u00e7\u00f5es e Inform\u00e1tica, Campus Universitario de Santiago, Universidade de Aveiro, 3810-193 Aveiro, Portugal"},{"name":"Instituto de Telecomunica\u00e7\u00f5es, Campus Universitario de Santiago, Universidade de Aveiro, 3810-193 Aveiro, Portugal"}]}],"member":"1968","published-online":{"date-parts":[[2021,9,13]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"436","DOI":"10.1038\/nature14539","article-title":"Deep learning","volume":"521","author":"LeCun","year":"2015","journal-title":"Nature"},{"key":"ref_2","unstructured":"Li, Y., and Liang, Y. (2018). Learning overparameterized neural networks via stochastic gradient descent on structured data. Advances in Neural Information Processing Systems 31, Curran Associates Inc."},{"key":"ref_3","unstructured":"Jacot, A., Gabriel, F., and Hongler, C. (2018). Neural tangent kernel: Convergence and generalization in neural networks. Advances in Neural Information Processing Systems 31, Curran Associates Inc."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Lee, J., Xiao, L., Schoenholz, S., Bahri, Y., Novak, R., Sohl-Dickstein, J., and Pennington, J. (2019). Wide neural networks of any depth evolve as linear models under gradient descent. Advances in Neural Information Processing Systems 32, Curran Associates Inc.","DOI":"10.1088\/1742-5468\/abc62b"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"LeCun, Y., Bottou, L., Orr, G.B., and M\u00fcller, K.R. (1998). Efficient BackProp. Neural Networks: Tricks of the Trade, Springer.","DOI":"10.1007\/3-540-49430-8_2"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"219","DOI":"10.1016\/S0925-2312(99)00127-7","article-title":"A weight initialization method for improving training speed in feedforward neural network","volume":"30","author":"Yam","year":"2000","journal-title":"Neurocomputing"},{"key":"ref_7","unstructured":"Glorot, X., and Bengio, Y. (2010, January 13\u201315). Understanding the difficulty of training deep feedforward neural networks. Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, Sardinia, Italy."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2015, January 7\u201313). Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.123"},{"key":"ref_9","unstructured":"Chapelle, O., and Erhan, D. (2011). Improved preconditioner for hessian free optimization. NIPS Workshop on Deep Learning and Unsupervised Feature Learning, Available online: https:\/\/citeseerx.ist.psu.edu\/viewdoc\/download?doi=10.1.1.297.3089&rep=rep1&type=pdf."},{"key":"ref_10","unstructured":"Krizhevsky, A., Sutskever, I., and Hinton, G.E. (2012). Imagenet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, Curran Associates Inc."},{"key":"ref_11","unstructured":"Sutskever, I., Martens, J., Dahl, G., and Hinton, G. (2013, January 16\u201321). On the importance of initialization and momentum in deep learning. Proceedings of the International Conference on Machine Learning, Atlanta, GA, USA."},{"key":"ref_12","unstructured":"Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep Learning, MIT Press."},{"key":"ref_13","unstructured":"Frankle, J., and Carbin, M. (2019, January 6\u20139). The lottery ticket hypothesis: Finding sparse, trainable neural networks. Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA."},{"key":"ref_14","unstructured":"Zhou, H., Lan, J., Liu, R., and Yosinski, J. (2019, January 8\u201314). Deconstructing lottery tickets: Zeros, signs, and the supermask. Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, Vancouver, BC, Canada."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Ramanujan, V., Wortsman, M., Kembhavi, A., Farhadi, A., and Rastegari, M. (2019). What is Hidden in a Randomly Weighted Neural Network?. arXiv.","DOI":"10.1109\/CVPR42600.2020.01191"},{"key":"ref_16","unstructured":"Du, S., Lee, J., Li, H., Wang, L., and Zhai, X. (2019, January 9\u201315). Gradient descent finds global minima of deep neural networks. Proceedings of the 36th International Conference on Machine Learning, PMLR, Long Beach, CA, USA."},{"key":"ref_17","unstructured":"Du, S.S., Zhai, X., P\u00f3czos, B., and Singh, A. (2019, January 6\u20139). Gradient descent provably optimizes over-parameterized neural networks. Proceedings of the 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA."},{"key":"ref_18","unstructured":"Allen-Zhu, Z., Li, Y., and Liang, Y. (2019). Learning and generalization in overparameterized neural networks, going beyond two layers. Advances in Neural Information Processing Systems 32, Curran Associates Inc."},{"key":"ref_19","first-page":"242","article-title":"A convergence theory for deep learning via over-parameterization","volume":"Volume 97","author":"Chaudhuri","year":"2019","journal-title":"Proceedings of the 36th International Conference on Machine Learning"},{"key":"ref_20","unstructured":"Allen-Zhu, Z., Li, Y., and Song, Z. (2019). On the convergence rate of training recurrent neural networks. Advances in Neural Information Processing Systems 32, Curran Associates, Inc."},{"key":"ref_21","first-page":"4951","article-title":"Overparameterized nonlinear learning: Gradient descent takes the shortest path?","volume":"Volume 97","author":"Oymak","year":"2019","journal-title":"Proceedings of the 36th International Conference on Machine Learning"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"84","DOI":"10.1109\/JSAIT.2020.2991332","article-title":"Toward Moderate Overparameterization: Global Convergence Guarantees for Training Shallow Neural Networks","volume":"1","author":"Oymak","year":"2020","journal-title":"IEEE J. Sel. Areas Inf. Theory"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"467","DOI":"10.1007\/s10994-019-05839-6","article-title":"Gradient descent optimizes over-parameterized deep ReLU networks","volume":"109","author":"Zou","year":"2020","journal-title":"Mach. Learn."},{"key":"ref_24","first-page":"322","article-title":"Fine-grained analysis of optimization and generalization for overparameterized two-layer neural networks","volume":"Volume 97","author":"Arora","year":"2019","journal-title":"Proceedings of the 36th International Conference on Machine Learning"},{"key":"ref_25","first-page":"8141","article-title":"On exact computation with an infinitely wide neural net","volume":"Volume 32","author":"Wallach","year":"2019","journal-title":"Advances in Neural Information Processing Systems"},{"key":"ref_26","unstructured":"Chizat, L., Oyallon, E., and Bach, F. (2019). On lazy training in differentiable programming. Advances in Neural Information Processing Systems 32, Curran Associates, Inc."},{"key":"ref_27","unstructured":"Frankle, J., Schwab, D.J., and Morcos, A.S. (2020, January 26\u201330). The early phase of neural network training. Proceedings of the International Conference on Learning Representations, Addis Ababa, Ethiopia."},{"key":"ref_28","unstructured":"Lu, Z., Pu, H., Wang, F., Hu, Z., and Wang, L. (2017). The expressive power of neural networks: A view from the width. Advances in Neural Information Processing Systems, Curran Associates, Inc."},{"key":"ref_29","unstructured":"Li, D., Ding, T., and Sun, R. (2018). On the benefit of width for neural networks: Disappearance of bad basins. arXiv."},{"key":"ref_30","unstructured":"Chollet, F. (2021, September 05). Keras. Available online: https:\/\/keras.io."},{"key":"ref_31","unstructured":"Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G.S., Davis, A., Dean, J., and Devin, M. (2021, September 05). TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. Available online: tensorflow.org."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"2278","DOI":"10.1109\/5.726791","article-title":"Gradient-based learning applied to document recognition","volume":"86","author":"LeCun","year":"1998","journal-title":"Proc. IEEE"},{"key":"ref_33","unstructured":"Xiao, H., Rasul, K., and Vollgraf, R. (2017). Fashion-MNIST: A Novel Image Dataset for Benchmarking Machine Learning Algorithms. arXiv."},{"key":"ref_34","unstructured":"Thoma, M. (2017). The hasyv2 dataset. arXiv."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"629","DOI":"10.1080\/02640410400021310","article-title":"The relative age effect in youth soccer across Europe","volume":"23","author":"Helsen","year":"2005","journal-title":"J. Sport. Sci."}],"container-title":["Mathematics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2227-7390\/9\/18\/2246\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T07:01:39Z","timestamp":1760166099000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2227-7390\/9\/18\/2246"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,9,13]]},"references-count":35,"journal-issue":{"issue":"18","published-online":{"date-parts":[[2021,9]]}},"alternative-id":["math9182246"],"URL":"https:\/\/doi.org\/10.3390\/math9182246","relation":{},"ISSN":["2227-7390"],"issn-type":[{"value":"2227-7390","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,9,13]]}}}