{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,2]],"date-time":"2026-04-02T06:52:05Z","timestamp":1775112725478,"version":"3.50.1"},"reference-count":30,"publisher":"Proceedings of the National Academy of Sciences","issue":"1","license":[{"start":{"date-parts":[[2019,12,23]],"date-time":"2019-12-23T00:00:00Z","timestamp":1577059200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc-nd\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000006","name":"DOD | United States Navy | Office of Naval Research","doi-asserted-by":"publisher","award":["N00014-17-1-2569"],"award-info":[{"award-number":["N00014-17-1-2569"]}],"id":[{"id":"10.13039\/100000006","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["www.pnas.org"],"crossmark-restriction":true},"short-container-title":["Proc. Natl. Acad. Sci. U.S.A."],"published-print":{"date-parts":[[2020,1,7]]},"abstract":"<jats:p>Learning in deep neural networks takes place by minimizing a nonconvex high-dimensional loss function, typically by a stochastic gradient descent (SGD) strategy. The learning process is observed to be able to find good minimizers without getting stuck in local critical points and such minimizers are often satisfactory at avoiding overfitting. How these 2 features can be kept under control in nonlinear devices composed of millions of tunable connections is a profound and far-reaching open question. In this paper we study basic nonconvex 1- and 2-layer neural network models that learn random patterns and derive a number of basic geometrical and algorithmic features which suggest some answers. 
We first show that the error loss function presents few extremely wide flat minima (WFM) which coexist with narrower minima and critical points. We then show that the minimizers of the cross-entropy loss function overlap with the WFM of the error loss. We also show examples of learning devices for which WFM do not exist. From the algorithmic perspective we derive entropy-driven greedy and message-passing algorithms that focus their search on wide flat regions of minimizers. In the case of SGD and cross-entropy loss, we show that a slow reduction of the norm of the weights along the learning process also leads to WFM. We corroborate the results by a numerical study of the correlations between the volumes of the minimizers, their Hessian, and their generalization performance on real data.<\/jats:p>","DOI":"10.1073\/pnas.1908636117","type":"journal-article","created":{"date-parts":[[2019,12,24]],"date-time":"2019-12-24T01:36:25Z","timestamp":1577151385000},"page":"161-170","update-policy":"https:\/\/doi.org\/10.1073\/pnas.cm10313","source":"Crossref","is-referenced-by-count":65,"title":["Shaping the learning landscape in neural networks around wide flat minima"],"prefix":"10.1073","volume":"117","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5451-8388","authenticated-orcid":false,"given":"Carlo","family":"Baldassi","sequence":"first","affiliation":[{"name":"Artificial Intelligence Lab, Institute for Data Science and Analytics, Bocconi University, 20136 Milan, Italy;"},{"name":"Istituto Nazionale di Fisica Nucleare, Sezione di Torino, 10125 Torino, Italy;"}]},{"given":"Fabrizio","family":"Pittorino","sequence":"additional","affiliation":[{"name":"Artificial Intelligence Lab, Institute for Data Science and Analytics, Bocconi University, 20136 Milan, Italy;"},{"name":"Department of Applied Science and Technology, Politecnico di Torino, 10129 Torino, 
Italy;"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1221-5207","authenticated-orcid":false,"given":"Riccardo","family":"Zecchina","sequence":"additional","affiliation":[{"name":"Artificial Intelligence Lab, Institute for Data Science and Analytics, Bocconi University, 20136 Milan, Italy;"},{"name":"International Centre for Theoretical Physics, 34151 Trieste, Italy"}]}],"member":"341","published-online":{"date-parts":[[2019,12,23]]},"reference":[{"key":"e_1_3_4_1_2","volume-title":"Information Theory, Inference and Learning Algorithms","author":"MacKay D. J.","year":"2003","unstructured":"D. J. MacKay, Information Theory, Inference and Learning Algorithms (Cambridge University Press, 2003)."},{"key":"e_1_3_4_2_2","doi-asserted-by":"crossref","first-page":"436","DOI":"10.1038\/nature14539","article-title":"Deep learning","volume":"521","author":"LeCun Y.","year":"2015","unstructured":"Y. LeCun, Y. Bengio, G. Hinton, Deep learning. Nature 521, 436\u2013444 (2015).","journal-title":"Nature"},{"key":"e_1_3_4_3_2","doi-asserted-by":"crossref","first-page":"128101","DOI":"10.1103\/PhysRevLett.115.128101","article-title":"Subdominant dense clusters allow for simple learning and high computational performance in neural networks with discrete synapses","volume":"115","author":"Baldassi C.","year":"2015","unstructured":"C. Baldassi, A. Ingrosso, C. Lucibello, L. Saglietti, R. Zecchina, Subdominant dense clusters allow for simple learning and high computational performance in neural networks with discrete synapses. Phys. Rev. Lett. 115, 128101 (2015).","journal-title":"Phys. Rev. Lett."},{"key":"e_1_3_4_4_2","doi-asserted-by":"crossref","first-page":"E7655","DOI":"10.1073\/pnas.1608103113","article-title":"Unreasonable effectiveness of learning neural networks: From accessible states and robust ensembles to basic algorithmic schemes","volume":"113","author":"Baldassi C.","year":"2016","unstructured":"C. 
Baldassi , Unreasonable effectiveness of learning neural networks: From accessible states and robust ensembles to basic algorithmic schemes. Proc. Natl. Acad. Sci. U.S.A. 113, E7655\u2013E7662 (2016).","journal-title":"Proc. Natl. Acad. Sci. U.S.A."},{"key":"e_1_3_4_5_2","unstructured":"N. S. Keskar D. Mudigere J. Nocedal M. Smelyanskiy P. T. P. Tang On large-batch training for deep learning: Generalization gap and sharp minima. arXiv:1609.04836 (15 September 2016)."},{"key":"e_1_3_4_6_2","doi-asserted-by":"crossref","first-page":"3057","DOI":"10.1051\/jphys:0198900500200305700","article-title":"Storage capacity of memory networks with binary couplings","volume":"50","author":"Krauth W.","year":"1989","unstructured":"W. Krauth, M. M\u00e9zard, Storage capacity of memory networks with binary couplings. J. Phys. France 50, 3057\u20133066 (1989).","journal-title":"J. Phys. France"},{"key":"e_1_3_4_7_2","doi-asserted-by":"crossref","first-page":"816","DOI":"10.1145\/3313276.3316383","volume-title":"Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing","author":"Ding J.","year":"2019","unstructured":"J. Ding, N. Sun, \u201cCapacity lower bound for the ising perceptron\u201d in Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing (ACM, 2019), pp. 816\u2013827."},{"key":"e_1_3_4_8_2","doi-asserted-by":"crossref","first-page":"052813","DOI":"10.1103\/PhysRevE.90.052813","article-title":"Origin of the computational hardness for learning with binary synapses","volume":"90","author":"Huang H.","year":"2014","unstructured":"H. Huang, Y. Kabashima, Origin of the computational hardness for learning with binary synapses. Phys. Rev. E 90, 052813 (2014).","journal-title":"Phys. Rev. E"},{"key":"e_1_3_4_9_2","doi-asserted-by":"crossref","first-page":"291","DOI":"10.1007\/BF01313839","article-title":"Dynamics of learning for the binary perceptron problem","volume":"86","author":"Horner H.","year":"1992","unstructured":"H. 
Horner, Dynamics of learning for the binary perceptron problem. Zeitschrift f\u00fcr Physik B Condens. Matter 86, 291\u2013308 (1992).","journal-title":"Zeitschrift f\u00fcr Physik B Condens. Matter"},{"key":"e_1_3_4_10_2","doi-asserted-by":"crossref","DOI":"10.1103\/PhysRevLett.96.030201","article-title":"Learning by message passing in networks of discrete synapses","volume":"96","author":"Braunstein A.","year":"2006","unstructured":"A. Braunstein, R. Zecchina, Learning by message passing in networks of discrete synapses. Phys. Rev. Lett. 96, 030201 (2006).","journal-title":"Phys. Rev. Lett."},{"key":"e_1_3_4_11_2","doi-asserted-by":"crossref","first-page":"11079","DOI":"10.1073\/pnas.0700324104","article-title":"Efficient supervised learning in networks with binary synapses","volume":"104","author":"Baldassi C.","year":"2007","unstructured":"C. Baldassi, A. Braunstein, N. Brunel, R. Zecchina, Efficient supervised learning in networks with binary synapses. Proc. Natl. Acad. Sci. U.S.A. 104, 11079\u201311084 (2007).","journal-title":"Proc. Natl. Acad. Sci. U.S.A."},{"key":"e_1_3_4_12_2","doi-asserted-by":"crossref","first-page":"902","DOI":"10.1007\/s10955-009-9822-1","article-title":"Generalization learning in a perceptron with binary synapses","volume":"136","author":"Baldassi C.","year":"2009","unstructured":"C. Baldassi, Generalization learning in a perceptron with binary synapses. J. Stat. Phys. 136, 902\u2013916 (2009).","journal-title":"J. Stat. Phys."},{"key":"e_1_3_4_13_2","doi-asserted-by":"crossref","first-page":"023301","DOI":"10.1088\/1742-5468\/2016\/02\/023301","article-title":"Local entropy as a measure for sampling solutions in constraint satisfaction problems","volume":"2016","author":"Baldassi C.","year":"2016","unstructured":"C. Baldassi, A. Ingrosso, C. Lucibello, L. Saglietti, R. Zecchina, Local entropy as a measure for sampling solutions in constraint satisfaction problems. J. Stat. Mech. Theory Exp. 2016, 023301 (2016).","journal-title":"J. 
Stat. Mech. Theory Exp."},{"key":"e_1_3_4_14_2","doi-asserted-by":"crossref","first-page":"052313","DOI":"10.1103\/PhysRevE.93.052313","article-title":"Learning may need only a few bits of synaptic precision","volume":"93","author":"Baldassi C.","year":"2016","unstructured":"C. Baldassi, F. Gerace, C. Lucibello, L. Saglietti, R. Zecchina, Learning may need only a few bits of synaptic precision. Phys. Rev. E 93, 052313 (2016).","journal-title":"Phys. Rev. E"},{"key":"e_1_3_4_15_2","doi-asserted-by":"crossref","first-page":"4146","DOI":"10.1103\/PhysRevA.45.4146","article-title":"Broken symmetries in multilayered perceptrons","volume":"45","author":"Barkai E.","year":"1992","unstructured":"E. Barkai, D. Hansel, H. Sompolinsky, Broken symmetries in multilayered perceptrons. Phys. Rev. A 45, 4146\u20134161 (1992).","journal-title":"Phys. Rev. A"},{"key":"e_1_3_4_16_2","doi-asserted-by":"crossref","first-page":"375","DOI":"10.1209\/0295-5075\/20\/4\/015","article-title":"Generalization in a large committee machine","volume":"20","author":"Schwarze H.","year":"1992","unstructured":"H. Schwarze, J. Hertz, Generalization in a large committee machine. Europhys. Lett. 20, 375\u2013380 (1992).","journal-title":"Europhys. Lett."},{"key":"e_1_3_4_17_2","doi-asserted-by":"crossref","first-page":"7590","DOI":"10.1103\/PhysRevA.45.7590","article-title":"Storage capacity and learning algorithms for two-layer neural networks","volume":"45","author":"Engel A.","year":"1992","unstructured":"A. Engel, H. M. K\u00f6hler, F. Tschepke, H. Vollmayr, A. Zippelius, Storage capacity and learning algorithms for two-layer neural networks. Phys. Rev. A 45, 7590\u20137609 (1992).","journal-title":"Phys. Rev. A"},{"key":"e_1_3_4_18_2","volume-title":"Spin Glass Theory and beyond: An Introduction to the Replica Method and Its Applications","author":"M\u00e9zard M.","year":"1987","unstructured":"M. M\u00e9zard, G. Parisi, M. 
Virasoro, Spin Glass Theory and beyond: An Introduction to the Replica Method and Its Applications (World Scientific Publishing Company, 1987), vol. 9."},{"key":"e_1_3_4_19_2","doi-asserted-by":"crossref","first-page":"2312","DOI":"10.1103\/PhysRevLett.65.2312","article-title":"Statistical mechanics of a multilayered neural network","volume":"65","author":"Barkai E.","year":"1990","unstructured":"E. Barkai, D. Hansel, I. Kanter, Statistical mechanics of a multilayered neural network. Phys. Rev. Lett. 65, 2312\u20132315 (1990).","journal-title":"Phys. Rev. Lett."},{"key":"e_1_3_4_20_2","doi-asserted-by":"crossref","first-page":"2432","DOI":"10.1103\/PhysRevLett.75.2432","article-title":"Weight space structure and internal representations: A direct approach to learning and generalization in multilayer neural networks","volume":"75","author":"Monasson R.","year":"1995","unstructured":"R. Monasson, R. Zecchina, Weight space structure and internal representations: A direct approach to learning and generalization in multilayer neural networks. Phys. Rev. Lett. 75, 2432\u20132435 (1995).","journal-title":"Phys. Rev. Lett."},{"key":"e_1_3_4_21_2","doi-asserted-by":"crossref","first-page":"4839","DOI":"10.1109\/TIT.2006.883541","article-title":"Weight distribution of low-density parity-check codes","volume":"52","author":"Di C.","year":"2006","unstructured":"C. Di, T. J. Richardson, R. L. Urbanke, Weight distribution of low-density parity-check codes. IEEE Trans. Inf. Theory 52, 4839\u20134855 (2006).","journal-title":"IEEE Trans. Inf. Theory"},{"key":"e_1_3_4_22_2","doi-asserted-by":"crossref","first-page":"170602","DOI":"10.1103\/PhysRevLett.123.170602","article-title":"Properties of the geometry of solutions and capacity of multilayer neural networks with rectified linear unit activations","volume":"123","author":"Baldassi C.","year":"2019","unstructured":"C. Baldassi, E. M. Malatesta, R. 
Zecchina, Properties of the geometry of solutions and capacity of multilayer neural networks with rectified linear unit activations. Phys. Rev. Lett. 123, 170602 (2019).","journal-title":"Phys. Rev. Lett."},{"key":"e_1_3_4_23_2","doi-asserted-by":"crossref","first-page":"268103","DOI":"10.1103\/PhysRevLett.120.268103","article-title":"Role of synaptic stochasticity in training low-precision neural networks","volume":"120","author":"Baldassi C.","year":"2018","unstructured":"C. Baldassi , Role of synaptic stochasticity in training low-precision neural networks. Phys. Rev. Lett. 120, 268103 (2018).","journal-title":"Phys. Rev. Lett."},{"key":"e_1_3_4_24_2","first-page":"1401","article-title":"Recipes for metastable states in spin glasses","volume":"5","author":"Franz S.","year":"1995","unstructured":"S. Franz, G. Parisi, Recipes for metastable states in spin glasses. J. de Physique I 5, 1401\u20131415 (1995).","journal-title":"J. de Physique I"},{"key":"e_1_3_4_25_2","volume-title":"Statistical Physics, Optimization, Inference, and Message-Passing Algorithms","author":"Krzakala F.","year":"2016","unstructured":"F. Krzakala , Statistical Physics, Optimization, Inference, and Message-Passing Algorithms (Oxford University Press, 2016)."},{"key":"e_1_3_4_26_2","unstructured":"W. C. Ridgway \u201cAn adaptive logic system with generalizing properties \u201d PhD thesis Stanford Electronics Labs. Rep. 1556-1 Stanford University Stanford CA (1962)."},{"key":"e_1_3_4_27_2","first-page":"288","article-title":"\u201cPattern-recognizing control systems\u201d in","author":"Widrow B.","year":"1964","unstructured":"B. Widrow, F. W. Smith (1964) \u201cPattern-recognizing control systems\u201d in Computer and Information Sciences: Collected Papers on Learning, Adaptation and Control in Information Systems, J. T. Tou, R. H. Wilcox, Eds. (COINS, Spartan Books, Washington DC. 1964), pp. 
288\u2013317.","journal-title":"Computer and Information Sciences: Collected Papers on Learning, Adaptation and Control in Information Systems"},{"key":"e_1_3_4_28_2","doi-asserted-by":"crossref","first-page":"345","DOI":"10.1007\/BF00204772","article-title":"Bounds on the learning capacity of some multi-layer networks","volume":"60","author":"Mitchison G.","year":"1989","unstructured":"G. Mitchison, R. Durbin, Bounds on the learning capacity of some multi-layer networks. Biol. Cybern. 60, 345\u2013365 (1989).","journal-title":"Biol. Cybern."},{"key":"e_1_3_4_29_2","unstructured":"H. Xiao K. Rasul R. Vollgraf Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms. arXiv:1708.07747 (25 August 2017)."},{"key":"e_1_3_4_30_2","doi-asserted-by":"crossref","first-page":"2847","DOI":"10.1103\/PhysRevLett.75.2847","article-title":"Structural glass transition and the entropy of the metastable states","volume":"75","author":"Monasson R.","year":"1995","unstructured":"R. Monasson, Structural glass transition and the entropy of the metastable states. Phys. Rev. Lett. 75, 2847\u20132850 (1995).","journal-title":"Phys. Rev. 
Lett."}],"container-title":["Proceedings of the National Academy of Sciences"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/www.pnas.org\/syndication\/doi\/10.1073\/pnas.1908636117","content-type":"unspecified","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/pnas.org\/doi\/pdf\/10.1073\/pnas.1908636117","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,6,7]],"date-time":"2022-06-07T21:27:05Z","timestamp":1654637225000},"score":1,"resource":{"primary":{"URL":"https:\/\/pnas.org\/doi\/full\/10.1073\/pnas.1908636117"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,12,23]]},"references-count":30,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2020,1,7]]}},"alternative-id":["10.1073\/pnas.1908636117"],"URL":"https:\/\/doi.org\/10.1073\/pnas.1908636117","relation":{},"ISSN":["0027-8424","1091-6490"],"issn-type":[{"value":"0027-8424","type":"print"},{"value":"1091-6490","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,12,23]]},"assertion":[{"value":"2019-12-23","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}