{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,9]],"date-time":"2026-04-09T04:31:18Z","timestamp":1775709078129,"version":"3.50.1"},"reference-count":46,"publisher":"IOP Publishing","issue":"1","license":[{"start":{"date-parts":[[2023,2,14]],"date-time":"2023-02-14T00:00:00Z","timestamp":1676332800000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,2,14]],"date-time":"2023-02-14T00:00:00Z","timestamp":1676332800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/iopscience.iop.org\/info\/page\/text-and-data-mining"}],"funder":[{"DOI":"10.13039\/501100002322","name":"Coordena\u00e7\u00e3o de Aperfei\u00e7oamento de Pessoal de N\u00edvel Superior","doi-asserted-by":"crossref","award":["001"],"award-info":[{"award-number":["001"]}],"id":[{"id":"10.13039\/501100002322","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/100010663","name":"H2020 European Research Council","doi-asserted-by":"crossref","award":["714608-SMiL"],"award-info":[{"award-number":["714608-SMiL"]}],"id":[{"id":"10.13039\/100010663","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["iopscience.iop.org"],"crossmark-restriction":false},"short-container-title":["Mach. Learn.: Sci. Technol."],"published-print":{"date-parts":[[2023,3,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>One of the most classical results in high-dimensional learning theory provides a closed-form expression for the generalisation error of binary classification with a single-layer teacher\u2013student perceptron on i.i.d. Gaussian inputs. Both Bayes-optimal (BO) estimation and empirical risk minimisation (ERM) were extensively analysed in this setting. At the same time, a considerable part of modern machine learning practice concerns multi-class classification. Yet, an analogous analysis for the multi-class teacher\u2013student perceptron was missing. In this manuscript we fill this gap by deriving and evaluating asymptotic expressions for the BO and ERM generalisation errors in the high-dimensional regime. For Gaussian teacher, we investigate the performance of ERM with both cross-entropy and square losses, and explore the role of ridge regularisation in approaching Bayes-optimality. In particular, we observe that regularised cross-entropy minimisation yields close-to-optimal accuracy. Instead, for Rademacher teacher we show that a first-order phase transition arises in the BO performance.<\/jats:p>","DOI":"10.1088\/2632-2153\/acb428","type":"journal-article","created":{"date-parts":[[2023,1,18]],"date-time":"2023-01-18T10:40:46Z","timestamp":1674038446000},"page":"015019","update-policy":"https:\/\/doi.org\/10.1088\/crossmark-policy","source":"Crossref","is-referenced-by-count":13,"title":["Learning curves for the multi-class teacher\u2013student perceptron"],"prefix":"10.1088","volume":"4","author":[{"given":"Elisabetta","family":"Cornacchia","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9944-2498","authenticated-orcid":true,"given":"Francesca","family":"Mignacco","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6835-4871","authenticated-orcid":true,"given":"Rodrigo","family":"Veiga","sequence":"additional","affiliation":[]},{"given":"C\u00e9dric","family":"Gerbelot","sequence":"additional","affiliation":[]},{"given":"Bruno","family":"Loureiro","sequence":"additional","affiliation":[]},{"given":"Lenka","family":"Zdeborov\u00e1","sequence":"additional","affiliation":[]}],"member":"266","published-online":{"date-parts":[[2023,2,14]]},"reference":[{"key":"mlstacb428bib1","doi-asserted-by":"publisher","first-page":"1983","DOI":"10.1088\/0305-4470\/22\/12\/004","article-title":"Three unfinished works on the optimal storage capacity of networks","volume":"22","author":"Gardner","year":"1989","journal-title":"J. Phys. A: Math. Gen."},{"key":"mlstacb428bib2","doi-asserted-by":"publisher","first-page":"6056","DOI":"10.1103\/PhysRevA.45.6056","article-title":"Statistical mechanics of learning from examples","volume":"45","author":"Seung","year":"1992","journal-title":"Phys. Rev. A"},{"key":"mlstacb428bib3","doi-asserted-by":"publisher","first-page":"499","DOI":"10.1103\/RevModPhys.65.499","article-title":"The statistical mechanics of learning a rule","volume":"65","author":"Watkin","year":"1993","journal-title":"Rev. Mod. Phys."},{"key":"mlstacb428bib4","author":"Engel","year":"2001"},{"key":"mlstacb428bib5","doi-asserted-by":"publisher","first-page":"7097","DOI":"10.1103\/PhysRevA.41.7097","article-title":"First-order transition to perfect generalization in a neural network with binary synapses","volume":"41","author":"Gy\u00f6rgyi","year":"1990","journal-title":"Phys. Rev. A"},{"key":"mlstacb428bib6","doi-asserted-by":"publisher","first-page":"1683","DOI":"10.1103\/PhysRevLett.65.1683","article-title":"Learning from examples in large neural networks","volume":"65","author":"Sompolinsky","year":"1990","journal-title":"Phys. Rev. Lett."},{"key":"mlstacb428bib7","doi-asserted-by":"publisher","first-page":"5451","DOI":"10.1073\/pnas.1802705116","article-title":"Optimal errors and phase transitions in high-dimensional generalized linear models","volume":"116","author":"Barbier","year":"2019","journal-title":"Proc. Natl Acad. Sci."},{"key":"mlstacb428bib8","first-page":"12199","article-title":"Generalization error in high-dimensional perceptrons: approaching bayes error with convex optimization","volume":"vol 33","author":"Aubin","year":"2020"},{"key":"mlstacb428bib9","article-title":"Learning curves for multi-task gaussian process regression","volume":"vol 25","author":"Sollich","year":"2012"},{"key":"mlstacb428bib10","first-page":"10144","article-title":"Learning gaussian mixtures with generalized linear models: precise asymptotics in high-dimensions","volume":"vol 34","author":"Loureiro","year":"2021"},{"key":"mlstacb428bib11","article-title":"Benign overfitting in multiclass classification: all roads lead to interpolation","author":"Wang","year":"2021"},{"key":"mlstacb428bib12","first-page":"pp 4020","article-title":"Phase transitions for one-vs-one and one-vs-all linear separability in multiclass gaussian mixtures","author":"Kini","year":"2021"},{"key":"mlstacb428bib13","article-title":"Theoretical insights into multiclass classification: a high-dimensional asymptotic view","author":"Thrampoulidis","year":"2020"},{"key":"mlstacb428bib14","first-page":"pp 3357","article-title":"A large scale analysis of logistic regression: asymptotic performance and new insights","author":"Mai","year":"2019"},{"key":"mlstacb428bib15","first-page":"pp 4267","article-title":"A model of double descent for high-dimensional logistic regression","author":"Deng","year":"2020"},{"key":"mlstacb428bib16","first-page":"pp 2527","article-title":"Analytic study of double descent in binary classification: the impact of loss","author":"Kini","year":"2020"},{"key":"mlstacb428bib17","first-page":"pp 6874","article-title":"The role of regularization in classification of high-dimensional noisy gaussian mixture","author":"Mignacco","year":"2020"},{"key":"mlstacb428bib18","doi-asserted-by":"publisher","DOI":"10.1088\/1742-5468\/ab43d2","article-title":"The committee machine: computational to statistical gaps in learning a two-layers neural network","volume":"2019","author":"Aubin","year":"2019","journal-title":"J. Stat. Mech."},{"key":"mlstacb428bib19","doi-asserted-by":"publisher","first-page":"597","DOI":"10.1093\/imaiai\/iaaa008","article-title":"Overlap matrix concentration in optimal bayesian inference","volume":"10","author":"Barbier","year":"2021","journal-title":"Inf. Inference A"},{"key":"mlstacb428bib20","first-page":"18137","article-title":"Learning curves of generic features maps for realistic datasets with a teacher\u2013student model","volume":"vol 34","author":"Loureiro","year":"2021","journal-title":"Advances in Neural Information Processing Systems"},{"key":"mlstacb428bib21","doi-asserted-by":"publisher","first-page":"5592","DOI":"10.1109\/TIT.2018.2840720","article-title":"Precise error analysis of regularized m-estimators in high dimensions","volume":"64","author":"Thrampoulidis","year":"2018","journal-title":"IEEE Trans. Inf. Theory"},{"key":"mlstacb428bib22","article-title":"Fluctuations, bias, variance & ensemble of learners: Exact asymptotics for convex losses in high-dimension","author":"Loureiro","year":"2022"},{"key":"mlstacb428bib23","doi-asserted-by":"publisher","first-page":"30063","DOI":"10.1073\/pnas.1907378117","article-title":"Benign overfitting in linear regression","volume":"117","author":"Bartlett","year":"2020","journal-title":"Proc. Natl Acad. Sci."},{"key":"mlstacb428bib24","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevX.10.041044","article-title":"Modeling the influence of data structure on learning in neural networks: the hidden manifold model","volume":"10","author":"Goldt","year":"2020","journal-title":"Phys. Rev. X"},{"key":"mlstacb428bib25","first-page":"pp 15568","article-title":"Kernel alignment risk estimator: risk prediction from training data","volume":"vol 33","author":"Jacot","year":"2020"},{"key":"mlstacb428bib26","first-page":"pp 1024","article-title":"Spectrum dependent learning curves in kernel regression and wide neural networks","author":"Bordelon","year":"2020"},{"key":"mlstacb428bib27","author":"Duda","year":"2012"},{"key":"mlstacb428bib28","article-title":"The shape of learning curves: a review","author":"Viering","year":"2021"},{"key":"mlstacb428bib29","doi-asserted-by":"publisher","first-page":"1997","DOI":"10.1109\/TIT.2011.2174612","article-title":"The LASSO risk for Gaussian matrices","volume":"58","author":"Bayati","year":"2011","journal-title":"IEEE Trans. Inf. Theory"},{"key":"mlstacb428bib30","doi-asserted-by":"publisher","first-page":"115","DOI":"10.1093\/imaiai\/iat004","article-title":"State evolution for general approximate message passing algorithms, with applications to spatial coupling","volume":"2","author":"Javanmard","year":"2013","journal-title":"Inf. Inference A"},{"key":"mlstacb428bib31","article-title":"Graph-based approximate message passing iterations","author":"Gerbelot","year":"2021"},{"key":"mlstacb428bib32","author":"Nishimori","year":"2001"},{"key":"mlstacb428bib33","first-page":"2825","article-title":"Scikit-learn: machine learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J. Mach. Learn. Res."},{"key":"mlstacb428bib34","first-page":"pp 1078","article-title":"The estimation error of general first order methods","author":"Celentano","year":"2020"},{"key":"mlstacb428bib35","doi-asserted-by":"publisher","first-page":"18914","DOI":"10.1073\/pnas.0909892106","article-title":"Message-passing algorithms for compressed sensing","volume":"106","author":"Donoho","year":"2009","journal-title":"Proc. Natl Acad. Sci."},{"key":"mlstacb428bib36","first-page":"pp 2168","article-title":"Generalized approximate message passing for estimation with random linear mixing","author":"Rangan","year":"2011"},{"key":"mlstacb428bib37","article-title":"Mean-field methods and algorithmic perspectives for high-dimensional machine learning","author":"Aubin","year":"2020"},{"key":"mlstacb428bib38","volume":"vol 47","author":"Vershynin","year":"2018"},{"key":"mlstacb428bib39","doi-asserted-by":"publisher","first-page":"596","DOI":"10.1137\/S0363012902407120","article-title":"Bregman monotone optimization algorithms","volume":"42","author":"Bauschke","year":"2003","journal-title":"SIAM J. Control Optim."},{"key":"mlstacb428bib40","article-title":"Joint minimization with alternating bregman proximity operators","volume":"2","author":"Bauschke","year":"2006","journal-title":"Pac. J. Optim."},{"key":"mlstacb428bib41","doi-asserted-by":"publisher","first-page":"764","DOI":"10.1109\/TIT.2010.2094817","article-title":"The dynamics of message passing on dense graphs, with applications to compressed sensing","volume":"57","author":"Bayati","year":"2011","journal-title":"IEEE Trans. Inf. Theory"},{"key":"mlstacb428bib42","doi-asserted-by":"publisher","first-page":"127","DOI":"10.1561\/2400000003","article-title":"Proximal algorithms","volume":"1","author":"Parikh","year":"2014","journal-title":"Found. Trends Optim."},{"key":"mlstacb428bib43","volume":"vol 408","author":"Bauschke","year":"2011"},{"key":"mlstacb428bib44","doi-asserted-by":"publisher","first-page":"453","DOI":"10.1080\/00018732.2016.1211393","article-title":"Statistical physics of inference: thresholds and algorithms","volume":"65","author":"Zdeborov\u00e1","year":"2016","journal-title":"Adv. Phys."},{"key":"mlstacb428bib45","doi-asserted-by":"publisher","first-page":"261","DOI":"10.1038\/s41592-019-0686-2","article-title":"SciPy 1.0: fundamental algorithms for scientific computing in Python","volume":"17","author":"Virtanen","year":"2020","journal-title":"Nat. Methods"},{"key":"mlstacb428bib46","doi-asserted-by":"crossref","DOI":"10.1145\/2833157.2833162","article-title":"Numba: a LLVM-based python JIT compiler","author":"Lam","year":"2015"}],"container-title":["Machine Learning: Science and Technology"],"original-title":[],"link":[{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/acb428","content-type":"text\/html","content-version":"am","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/acb428\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/acb428","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/acb428\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/acb428\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/acb428\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/acb428\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"similarity-checking"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/acb428\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,14]],"date-time":"2023-02-14T13:32:06Z","timestamp":1676381526000},"score":1,"resource":{"primary":{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/acb428"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,2,14]]},"references-count":46,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2023,2,14]]},"published-print":{"date-parts":[[2023,3,1]]}},"URL":"https:\/\/doi.org\/10.1088\/2632-2153\/acb428","relation":{},"ISSN":["2632-2153"],"issn-type":[{"value":"2632-2153","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,2,14]]},"assertion":[{"value":"Learning curves for the multi-class teacher\u2013student perceptron","name":"article_title","label":"Article Title"},{"value":"Machine Learning: Science and Technology","name":"journal_title","label":"Journal Title"},{"value":"paper","name":"article_type","label":"Article Type"},{"value":"\u00a9 2023 The Author(s). Published by IOP Publishing Ltd","name":"copyright_information","label":"Copyright Information"},{"value":"2022-09-21","name":"date_received","label":"Date Received","group":{"name":"publication_dates","label":"Publication dates"}},{"value":"2023-01-11","name":"date_accepted","label":"Date Accepted","group":{"name":"publication_dates","label":"Publication dates"}},{"value":"2023-02-14","name":"date_epub","label":"Online publication date","group":{"name":"publication_dates","label":"Publication dates"}}]}}