{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,7,10]],"date-time":"2026-07-10T14:29:09Z","timestamp":1783693749132,"version":"3.55.0"},"reference-count":19,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2022,7,11]],"date-time":"2022-07-11T00:00:00Z","timestamp":1657497600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,7,11]],"date-time":"2022-07-11T00:00:00Z","timestamp":1657497600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100002835","name":"Chalmers University of Technology","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100002835","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Neural Process Lett"],"published-print":{"date-parts":[[2023,4]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Recent results on optimization and generalization properties of neural networks showed that in a simple two-layer network, the alignment of the labels to the eigenvectors of the corresponding Gram matrix determines the convergence of the optimization during training. Such analyses also provide upper bounds on the generalization error. We experimentally investigate the implications of these results to deeper networks via embeddings. We regard the layers preceding the final hidden layer as producing different representations of the input data which are then fed to the two-layer model. We show that these representations improve both optimization and generalization. 
In particular, we investigate three kernel representations when fed to the final hidden layer: the Gaussian kernel and its approximation by random Fourier features, kernels designed to imitate representations produced by neural networks and finally an optimal kernel designed to align the data with target labels. The approximated representations induced by these kernels are fed to the neural network and the optimization and generalization properties of the final model are evaluated and compared.<\/jats:p>","DOI":"10.1007\/s11063-022-10958-8","type":"journal-article","created":{"date-parts":[[2022,7,12]],"date-time":"2022-07-12T06:06:40Z","timestamp":1657606000000},"page":"1681-1695","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["Do Kernel and Neural Embeddings Help in Training and Generalization?"],"prefix":"10.1007","volume":"55","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9159-7831","authenticated-orcid":false,"given":"Arman","family":"Rahbar","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Emilio","family":"Jorge","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Devdatt","family":"Dubhashi","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Morteza","family":"Haghir Chehreghani","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2022,7,11]]},"reference":[{"key":"10958_CR1","unstructured":"Arora S, Du SS, Hu W, Li Z, Wang R (2019) Fine-grained analysis of optimization and generalization for overparameterized two-layer neural networks. 
In: Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA, 322\u2013332"},{"key":"10958_CR2","unstructured":"Belkin M, Ma S, Mandal S (2018) To understand deep learning we need to understand kernel learning. In: Dy J, Krause A (eds.) Proceedings of the 35th International Conference on Machine Learning, Proceedings of Machine Learning Research, vol.\u00a080, 541\u2013549. PMLR"},{"issue":"8","key":"10958_CR3","doi-asserted-by":"publisher","first-page":"1798","DOI":"10.1109\/TPAMI.2013.50","volume":"35","author":"Y Bengio","year":"2013","unstructured":"Bengio Y, Courville AC, Vincent P (2013) Representation learning: A review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 35(8):1798\u20131828","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"10958_CR4","unstructured":"Cho Y, Saul LK (2009) Kernel methods for deep learning. In: Bengio Y, Schuurmans D, Lafferty JD, Williams CKI, Culotta A (eds.) Advances in Neural Information Processing Systems 22, 342\u2013350. Curran Associates, Inc"},{"key":"10958_CR5","first-page":"795","volume":"13","author":"C Cortes","year":"2012","unstructured":"Cortes C, Mohri M, Rostamizadeh A (2012) Algorithms for learning kernels based on centered alignment. J. Mach. Learn. Res. 13:795\u2013828","journal-title":"J. Mach. Learn. Res."},{"key":"10958_CR6","unstructured":"Cristianini N, Shawe-Taylor J, Elisseeff A, Kandola JS (2001) On kernel-target alignment. In: Advances in Neural Information Processing Systems 14 [Neural Information Processing Systems: Natural and Synthetic, NIPS 2001, December 3-8, 2001, Vancouver, British Columbia, Canada], 367\u2013373"},{"key":"10958_CR7","first-page":"2211","volume":"12","author":"M G\u00f6nen","year":"2011","unstructured":"G\u00f6nen M, Alpayd\u0131n E (2011) Multiple kernel learning algorithms. J. Mach. Learn. Res. 12:2211\u20132268","journal-title":"J. Mach. Learn. 
Res."},{"key":"10958_CR8","unstructured":"Jacot A, Gabriel F, Hongler C (2018) Neural tangent kernel: Convergence and generalization in neural networks. In: Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R (eds.) Advances in Neural Information Processing Systems 31, 8571\u20138580. Curran Associates, Inc"},{"issue":"7553","key":"10958_CR9","doi-asserted-by":"publisher","first-page":"436","DOI":"10.1038\/nature14539","volume":"521","author":"Y LeCun","year":"2015","unstructured":"LeCun Y, Bengio Y, Hinton GE (2015) Deep learning. Nature 521(7553):436\u2013444","journal-title":"Nature"},{"issue":"4","key":"10958_CR10","doi-asserted-by":"publisher","first-page":"395","DOI":"10.1007\/s11222-007-9033-z","volume":"17","author":"U von Luxburg","year":"2007","unstructured":"von Luxburg U (2007) A tutorial on spectral clustering. Statistics and Computing 17(4):395\u2013416","journal-title":"Statistics and Computing"},{"key":"10958_CR11","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4612-0745-0","volume-title":"Bayesian Learning for Neural Networks","author":"RM Neal","year":"1996","unstructured":"Neal RM (1996) Bayesian Learning for Neural Networks. Springer-Verlag, Berlin, Heidelberg"},{"key":"10958_CR12","unstructured":"Rahimi A, Recht B (2007) Random features for large-scale kernel machines. In: Advances in Neural Information Processing Systems 20, Proceedings of the Twenty-First Annual Conference on Neural Information Processing Systems, Vancouver, British Columbia, Canada, December 3-6, 2007, 1177\u20131184"},{"key":"10958_CR13","doi-asserted-by":"publisher","first-page":"85","DOI":"10.1016\/j.neunet.2014.09.003","volume":"61","author":"J Schmidhuber","year":"2015","unstructured":"Schmidhuber J (2015) Deep learning in neural networks: An overview. 
Neural Networks 61:85\u2013117","journal-title":"Neural Networks"},{"key":"10958_CR14","doi-asserted-by":"crossref","unstructured":"Sch\u00f6lkopf B, Smola AJ, Bach F, et\u00a0al (2002) Learning with kernels: support vector machines, regularization, optimization, and beyond. MIT press","DOI":"10.7551\/mitpress\/4175.001.0001"},{"key":"10958_CR15","unstructured":"Tsuchida R, Roosta-Khorasani F, Gallagher M (2018) Invariance of weight distributions in rectified mlps. In: Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsm\u00e4ssan, Stockholm, Sweden, July 10-15, 2018, 5002\u20135011"},{"issue":"5","key":"10958_CR16","doi-asserted-by":"publisher","first-page":"1203","DOI":"10.1162\/089976698300017412","volume":"10","author":"CKI Williams","year":"1998","unstructured":"Williams CKI (1998) Computation with infinite neural networks. Neural Computation 10(5):1203\u20131216","journal-title":"Neural Computation"},{"key":"10958_CR17","unstructured":"Williams CKI, Seeger M (2001) Using the nystr\u00f6m method to speed up kernel machines. In: Leen TK, Dietterich TG, Tresp V (eds.) Advances in Neural Information Processing Systems 13, 682\u2013688. MIT Press"},{"key":"10958_CR18","unstructured":"Yu F, Zhang Y, Song S, Seff A, Xiao J (2015) Lsun: Construction of a large-scale image dataset using deep learning with humans in the loop. arXiv preprint arXiv:1506.03365"},{"key":"10958_CR19","unstructured":"Zhang C, Bengio S, Hardt M, Recht B, Vinyals O (2017) Understanding deep learning requires rethinking generalization. 
In: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings"}],"container-title":["Neural Processing Letters"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11063-022-10958-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11063-022-10958-8\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11063-022-10958-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,9,28]],"date-time":"2024-09-28T23:00:24Z","timestamp":1727564424000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11063-022-10958-8"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,7,11]]},"references-count":19,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2023,4]]}},"alternative-id":["10958"],"URL":"https:\/\/doi.org\/10.1007\/s11063-022-10958-8","relation":{},"ISSN":["1370-4621","1573-773X"],"issn-type":[{"value":"1370-4621","type":"print"},{"value":"1573-773X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,7,11]]},"assertion":[{"value":"4 July 2022","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"11 July 2022","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}