{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,14]],"date-time":"2025-10-14T00:40:19Z","timestamp":1760402419661,"version":"build-2065373602"},"reference-count":10,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2020,1,15]],"date-time":"2020-01-15T00:00:00Z","timestamp":1579046400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],"abstract":"<jats:p>This paper is a step towards developing a geometric understanding of a popular algorithm for training deep neural networks named stochastic gradient descent (SGD). We built upon a recent result which observed that the noise in SGD while training typical networks is highly non-isotropic. That motivated a deterministic model in which the trajectories of our dynamical systems are described via geodesics of a family of metrics arising from a certain diffusion matrix; namely, the covariance of the stochastic gradients in SGD. Our model is analogous to models in general relativity: the role of the electromagnetic field in the latter is played by the gradient of the loss function of a deep network in the former.<\/jats:p>","DOI":"10.3390\/e22010101","type":"journal-article","created":{"date-parts":[[2020,1,17]],"date-time":"2020-01-17T04:14:41Z","timestamp":1579234481000},"page":"101","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":4,"title":["A Geometric Interpretation of Stochastic Gradient Descent Using Diffusion Metrics"],"prefix":"10.3390","volume":"22","author":[{"given":"Rita","family":"Fioresi","sequence":"first","affiliation":[{"name":"Dipartimento di Matematica, piazza Porta San Donato 5, University of Bologna, 40126 Bologna, Italy"}]},{"given":"Pratik","family":"Chaudhari","sequence":"additional","affiliation":[{"name":"Department of Electrical and Systems Engineering, University of Pennsylvania, Philadelphia, PA 19104, USA"}]},{"given":"Stefano","family":"Soatto","sequence":"additional","affiliation":[{"name":"Computer Science Department, University of California, Los Angeles, CA 90095, USA"}]}],"member":"1968","published-online":{"date-parts":[[2020,1,15]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"436","DOI":"10.1038\/nature14539","article-title":"Deep learning","volume":"521","author":"LeCun","year":"2015","journal-title":"Nature"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"2278","DOI":"10.1109\/5.726791","article-title":"Gradient-based learning applied to document recognition","volume":"86","author":"LeCun","year":"1998","journal-title":"Proc. IEEE"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Chaudhari, P., and Soatto, S. (2017). Stochastic gradient descent performs variational inference, converges to limit cycles for deep networks. arXiv.","DOI":"10.1109\/ITA.2018.8503224"},{"key":"ref_4","unstructured":"Chaudhari, P., and Soatto, S. (2015). On the energy landscape of deep networks. arXiv."},{"key":"ref_5","unstructured":"Chaudhari, P., Choromanska, A., Soatto, S., LeCun, Y., Baldassi, C., Borgs, C., Chayes, J., Sagun, L., and Zecchina, R. (2016). Entropy-SGD: Biasing gradient descent into wide valleys. arXiv."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"251","DOI":"10.1162\/089976698300017746","article-title":"Natural Gradient Works Efficiently in Learning","volume":"10","author":"Amari","year":"1998","journal-title":"Neural Comput."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Adler, R., Bazin, M., and Schiffer, M. (1965). Introduction to General Relativity, McGraw-Hill.","DOI":"10.1063\/1.3047725"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Huang, G., Liu, Z., Van Der Maaten, L., and Weinberger, K.Q. (2017). Densely Connected Convolutional Networks. arXiv.","DOI":"10.1109\/CVPR.2017.243"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Achille, A., and Soatto, S. (2017). On the emergence of invariance and disentangling in deep representations. arXiv.","DOI":"10.1109\/ITA.2018.8503149"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Petersen, P. (1998). Riemannian Geometry, Springer. (GTM).","DOI":"10.1007\/978-1-4757-6434-5"}],"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1099-4300\/22\/1\/101\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,13]],"date-time":"2025-10-13T13:43:05Z","timestamp":1760362985000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1099-4300\/22\/1\/101"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,1,15]]},"references-count":10,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2020,1]]}},"alternative-id":["e22010101"],"URL":"https:\/\/doi.org\/10.3390\/e22010101","relation":{},"ISSN":["1099-4300"],"issn-type":[{"type":"electronic","value":"1099-4300"}],"subject":[],"published":{"date-parts":[[2020,1,15]]}}}