{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,11]],"date-time":"2026-03-11T23:06:09Z","timestamp":1773270369447,"version":"3.50.1"},"reference-count":16,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2026,3,11]],"date-time":"2026-03-11T00:00:00Z","timestamp":1773187200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2026,3,11]],"date-time":"2026-03-11T00:00:00Z","timestamp":1773187200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100016111","name":"Universit\u00e0 degli Studi Mediterranea di Reggio Calabria","doi-asserted-by":"crossref","id":[{"id":"10.13039\/100016111","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Optim Theory Appl"],"published-print":{"date-parts":[[2026,4]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>\n                    We introduce Geometric-Entropic Optimization (GEO), an algorithm for neural network training that integrates Riemannian gradient methods with entropy-regularized optimal transport. The algorithm operates on a parameter manifold equipped with a combined Fisher-Wasserstein metric and incorporates Sinkhorn-type projections to enforce distributional constraints on layer activations. 
We establish convergence guarantees showing that GEO achieves an\n                    <jats:inline-formula>\n                      <jats:alternatives>\n                        <jats:tex-math>$$O(1\/\\sqrt{T}) + O(\\rho ^{2K})$$<\/jats:tex-math>\n                        <mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                          <mml:mrow>\n                            <mml:mi>O<\/mml:mi>\n                            <mml:mrow>\n                              <mml:mo>(<\/mml:mo>\n                              <mml:mn>1<\/mml:mn>\n                              <mml:mo>\/<\/mml:mo>\n                              <mml:msqrt>\n                                <mml:mi>T<\/mml:mi>\n                              <\/mml:msqrt>\n                              <mml:mo>)<\/mml:mo>\n                            <\/mml:mrow>\n                            <mml:mo>+<\/mml:mo>\n                            <mml:mi>O<\/mml:mi>\n                            <mml:mrow>\n                              <mml:mo>(<\/mml:mo>\n                              <mml:msup>\n                                <mml:mi>\u03c1<\/mml:mi>\n                                <mml:mrow>\n                                  <mml:mn>2<\/mml:mn>\n                                  <mml:mi>K<\/mml:mi>\n                                <\/mml:mrow>\n                              <\/mml:msup>\n                              <mml:mo>)<\/mml:mo>\n                            <\/mml:mrow>\n                          <\/mml:mrow>\n                        <\/mml:math>\n                      <\/jats:alternatives>\n                    <\/jats:inline-formula>\n                    rate, where the first term reflects Riemannian gradient descent and the second captures the contraction of Sinkhorn iterations. Computational experiments on continuous control tasks and language modeling demonstrate consistent improvements over standard optimizers, with performance gains of approximately 20% on benchmark tasks. 
The theoretical framework unifies recent architectural innovations in deep learning, including manifold-constrained connections and orthogonality-preserving updates, within a coherent optimization-theoretic perspective rooted in the geometric dynamics tradition.\n                  <\/jats:p>","DOI":"10.1007\/s10957-026-02958-8","type":"journal-article","created":{"date-parts":[[2026,3,11]],"date-time":"2026-03-11T07:52:14Z","timestamp":1773215534000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Geometric-Entropic Optimization: Integrating Optimal Transport with Riemannian Gradient Methods for Neural Network Training"],"prefix":"10.1007","volume":"209","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3663-836X","authenticated-orcid":false,"given":"Massimiliano","family":"Ferrara","sequence":"first","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2026,3,11]]},"reference":[{"key":"2958_CR1","doi-asserted-by":"publisher","DOI":"10.1515\/9781400830244","volume-title":"Optimization Algorithms on Matrix Manifolds","author":"PA Absil","year":"2008","unstructured":"Absil, P.A., Mahony, R., Sepulchre, R.: Optimization Algorithms on Matrix Manifolds. Princeton University Press, Princeton (2008)"},{"issue":"2","key":"2958_CR2","doi-asserted-by":"publisher","first-page":"251","DOI":"10.1162\/089976698300017746","volume":"10","author":"SI Amari","year":"1998","unstructured":"Amari, S.I.: Natural gradient works efficiently in learning. Neural Comput. 10(2), 251\u2013276 (1998)","journal-title":"Neural Comput."},{"key":"2958_CR3","unstructured":"Cuturi, M.: Sinkhorn distances: Lightspeed computation of optimal transport. In: Advances in Neural Information Processing Systems, vol. 26, pp. 2292\u20132300 (2013)"},{"key":"2958_CR4","unstructured":"Gupta, V., Koren, T., Singer, Y.: Shampoo: Preconditioned stochastic tensor optimization. 
In: Proceedings of the 35th International Conference on Machine Learning, pp. 1842\u20131850 (2018)"},{"key":"2958_CR5","unstructured":"Haarnoja, T., Zhou, A., Abbeel, P., Levine, S.: Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In: Proceedings of the 35th International Conference on Machine Learning, pp. 1861\u20131870 (2018)"},{"key":"2958_CR6","doi-asserted-by":"publisher","DOI":"10.1137\/1.9780898717778","volume-title":"Functions of Matrices: Theory and Computation","author":"NJ Higham","year":"2008","unstructured":"Higham, N.J.: Functions of Matrices: Theory and Computation. SIAM, Philadelphia (2008)"},{"key":"2958_CR7","unstructured":"Jordan, K., Jin, Y., Boza, V., et al.: Muon: An optimizer for hidden layers in neural networks. Technical report (2024). https:\/\/kellerjordan.github.io\/posts\/muon\/"},{"key":"2958_CR8","unstructured":"Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7\u20139, 2015, Conference Track Proceedings, pp. 1\u201315 (2015). Available at: arXiv:1412.6980"},{"key":"2958_CR9","volume-title":"A path towards autonomous machine intelligence","author":"Y LeCun","year":"2022","unstructured":"LeCun, Y.: A path towards autonomous machine intelligence. Preprint, OpenReview (2022)"},{"key":"2958_CR10","unstructured":"Martens, J., Grosse, R.: Optimizing neural networks with Kronecker-factored approximate curvature. In: Proceedings of the 32nd International Conference on Machine Learning, pp. 2408\u20132417 (2015)"},{"issue":"5\u20136","key":"2958_CR11","first-page":"355","volume":"11","author":"G Peyr\u00e9","year":"2019","unstructured":"Peyr\u00e9, G., Cuturi, M.: Computational optimal transport. Found. Trends Mach. Learn. 11(5\u20136), 355\u2013607 (2019)","journal-title":"Found. Trends Mach. 
Learn."},{"issue":"2","key":"2958_CR12","doi-asserted-by":"publisher","first-page":"876","DOI":"10.1214\/aoms\/1177703591","volume":"35","author":"R Sinkhorn","year":"1964","unstructured":"Sinkhorn, R.: A relationship between arbitrary positive matrices and doubly stochastic matrices. Ann. Math. Stat. 35(2), 876\u2013879 (1964)","journal-title":"Ann. Math. Stat."},{"key":"2958_CR13","doi-asserted-by":"publisher","DOI":"10.1007\/978-94-011-4187-1","volume-title":"Geometric Dynamics","author":"C Udri\u015fte","year":"2000","unstructured":"Udri\u015fte, C.: Geometric Dynamics. Kluwer Academic Publishers, Dordrecht (2000)"},{"issue":"3","key":"2958_CR14","doi-asserted-by":"publisher","first-page":"1036","DOI":"10.1007\/s10957-012-0021-x","volume":"154","author":"C Udri\u015fte","year":"2012","unstructured":"Udri\u015fte, C., Ferrara, M., Zugr\u0103vescu, D., Munteanu, F.: Controllability of a nonholonomic macroeconomic system. J. Optim. Theory Appl. 154(3), 1036\u20131054 (2012)","journal-title":"J. Optim. Theory Appl."},{"key":"2958_CR15","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-71050-9","volume-title":"Optimal Transport: Old and New","author":"C Villani","year":"2009","unstructured":"Villani, C.: Optimal Transport: Old and New. Springer, Berlin (2009)"},{"key":"2958_CR16","unstructured":"Xie, Z., Wei, Y., Cao, H., et al.: mHC: Manifold-Constrained Hyper-Connections. 
Preprint arXiv:2501.01427 (2025)"}],"container-title":["Journal of Optimization Theory and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10957-026-02958-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10957-026-02958-8","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10957-026-02958-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,11]],"date-time":"2026-03-11T07:52:18Z","timestamp":1773215538000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10957-026-02958-8"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,3,11]]},"references-count":16,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2026,4]]}},"alternative-id":["2958"],"URL":"https:\/\/doi.org\/10.1007\/s10957-026-02958-8","relation":{},"ISSN":["0022-3239","1573-2878"],"issn-type":[{"value":"0022-3239","type":"print"},{"value":"1573-2878","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,3,11]]},"assertion":[{"value":"7 January 2026","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"16 February 2026","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"11 March 2026","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The author declares no conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of 
Interest"}}],"article-number":"2"}}