{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,7]],"date-time":"2026-02-07T00:31:47Z","timestamp":1770424307497,"version":"3.49.0"},"reference-count":33,"publisher":"Springer Science and Business Media LLC","issue":"3","license":[{"start":{"date-parts":[[2025,2,7]],"date-time":"2025-02-07T00:00:00Z","timestamp":1738886400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc-nd\/4.0"},{"start":{"date-parts":[[2025,2,7]],"date-time":"2025-02-07T00:00:00Z","timestamp":1738886400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc-nd\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Mach Learn"],"published-print":{"date-parts":[[2025,3]]},"DOI":"10.1007\/s10994-024-06661-5","type":"journal-article","created":{"date-parts":[[2025,2,7]],"date-time":"2025-02-07T18:06:07Z","timestamp":1738951567000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["Per-example gradient regularization improves learning signals from noisy data"],"prefix":"10.1007","volume":"114","author":[{"given":"Xuran","family":"Meng","sequence":"first","affiliation":[]},{"given":"Yuan","family":"Cao","sequence":"additional","affiliation":[]},{"given":"Difan","family":"Zou","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,2,7]]},"reference":[{"key":"6661_CR1","unstructured":"Allen-Zhu, Z., & Li, Y. (2023). Towards understanding ensemble, knowledge distillation and self-distillation in deep learning. In 11th International conference on learning representations, ICLR."},{"key":"6661_CR2","unstructured":"Andriushchenko, M., & Flammarion, N. (2022). Towards understanding sharpness-aware minimization. In International conference on machine learning (pp. 639\u2013668). 
PMLR."},{"key":"6661_CR3","unstructured":"Barrett, D., & Dherin, B. (2021). Implicit gradient regularization. In International conference on learning representations."},{"issue":"48","key":"6661_CR4","doi-asserted-by":"publisher","first-page":"30063","DOI":"10.1073\/pnas.1907378117","volume":"117","author":"PL Bartlett","year":"2020","unstructured":"Bartlett, P. L., Long, P. M., Lugosi, G., & Tsigler, A. (2020). Benign overfitting in linear regression. Proceedings of the National Academy of Sciences, 117(48), 30063\u201330070.","journal-title":"Proceedings of the National Academy of Sciences"},{"key":"6661_CR5","unstructured":"Blanc, G., Gupta, N., Valiant, G., & Valiant, P. (2020). Implicit regularization for deep neural networks driven by an Ornstein\u2013Uhlenbeck like process. In Conference on learning theory (pp. 483\u2013513). PMLR."},{"key":"6661_CR6","first-page":"25237","volume":"35","author":"Y Cao","year":"2022","unstructured":"Cao, Y., Chen, Z., Belkin, M., & Gu, Q. (2022). Benign overfitting in two-layer convolutional neural networks. Advances in Neural Information Processing Systems, 35, 25237\u201325250.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"6661_CR7","first-page":"8407","volume":"34","author":"Y Cao","year":"2021","unstructured":"Cao, Y., Gu, Q., & Belkin, M. (2021). Risk bounds for over-parameterized maximum margin classification on sub-Gaussian mixtures. Advances in Neural Information Processing Systems, 34, 8407\u20138418.","journal-title":"Advances in Neural Information Processing Systems"},{"issue":"1","key":"6661_CR8","first-page":"5721","volume":"22","author":"NS Chatterji","year":"2021","unstructured":"Chatterji, N. S., & Long, P. M. (2021). Finite-sample analysis of interpolating linear classifiers in the overparameterized regime. 
Journal of Machine Learning Research, 22(1), 5721\u20135750.","journal-title":"Journal of Machine Learning Research"},{"key":"6661_CR9","unstructured":"Dherin, B., Munn, M., Rosca, M., & Barrett, D. G. T. (2022). Why neural networks find simple solutions: The many regularizers of geometric complexity."},{"key":"6661_CR10","unstructured":"Foret, P., Kleiner, A., Mobahi, H., & Neyshabur, B. (2021). Sharpness-aware minimization for efficiently improving generalization. In International conference on learning representations."},{"key":"6661_CR11","unstructured":"Frei, S., Chatterji, N. S., & Bartlett, P. L. (2022). Benign overfitting without linearity: Neural network classifiers trained by gradient descent for noisy linear data. arXiv preprint arXiv:2202.05928"},{"key":"6661_CR12","unstructured":"Geiping, J., Goldblum, M., Pope, P.E., Moeller, M., & Goldstein, T. (2021). Stochastic training is not necessary for generalization. In ICLR."},{"key":"6661_CR13","unstructured":"Hastie, T., Montanari, A., Rosset, S., & Tibshirani, R. J. (2019). Surprises in high-dimensional ridgeless least squares interpolation. arXiv preprint arXiv:1903.08560"},{"issue":"1","key":"6661_CR14","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1162\/neco.1997.9.1.1","volume":"9","author":"S Hochreiter","year":"1997","unstructured":"Hochreiter, S., & Schmidhuber, J. (1997). Flat minima. Neural Computation, 9(1), 1\u201342.","journal-title":"Neural Computation"},{"key":"6661_CR15","unstructured":"Jastrzebski, S., Kenton, Z., Arpit, D., Ballas, N., Fischer, A., Bengio, Y., & Storkey, A. (2017). Three factors influencing minima in SGD. arXiv preprint arXiv:1711.04623"},{"key":"6661_CR16","unstructured":"Jiang, Y., Neyshabur, B., Mobahi, H., Krishnan, D., & Bengio, S. (2020). Fantastic generalization measures and where to find them. In International conference on learning representations."},{"key":"6661_CR17","unstructured":"Keskar, N. S., Nocedal, J., Tang, P. T. 
P., Mudigere, D., & Smelyanskiy, M. (2017). On large-batch training for deep learning: Generalization gap and sharp minima. In 5th International conference on learning representations, ICLR."},{"key":"6661_CR18","unstructured":"Kou, Y., Chen, Z., Chen, Y., & Gu, Q. (2023). Benign overfitting in two-layer ReLU convolutional neural networks. In International conference on machine learning (pp. 17615\u201317659). PMLR."},{"key":"6661_CR19","unstructured":"Kuka\u010dka, J., Golkov, V., & Cremers, D. (2017). Regularization for deep learning: A taxonomy. arXiv preprint arXiv:1710.10686"},{"issue":"1","key":"6661_CR20","first-page":"7479","volume":"22","author":"CH Martin","year":"2021","unstructured":"Martin, C. H., & Mahoney, M. W. (2021). Implicit self-regularization in deep neural networks: Evidence from random matrix theory and implications for learning. The Journal of Machine Learning Research, 22(1), 7479\u20137551.","journal-title":"The Journal of Machine Learning Research"},{"issue":"4","key":"6661_CR21","doi-asserted-by":"publisher","first-page":"667","DOI":"10.1002\/cpa.22008","volume":"75","author":"S Mei","year":"2022","unstructured":"Mei, S., & Montanari, A. (2022). The generalization error of random features regression: Precise asymptotics and the double descent curve. Communications on Pure and Applied Mathematics, 75(4), 667\u2013766.","journal-title":"Communications on Pure and Applied Mathematics"},{"key":"6661_CR22","unstructured":"Meng, X., Cao, Y., & Zou, D. (2023). Per-example gradient regularization improves learning signals from noisy data. arXiv preprint arXiv:2303.17940"},{"key":"6661_CR23","first-page":"1","volume":"24","author":"X Meng","year":"2023","unstructured":"Meng, X., & Yao, J. (2023). Impact of classification difficulty on the weight matrices spectra in deep learning and application to early-stopping. 
Journal of Machine Learning Research, 24, 1\u201340.","journal-title":"Journal of Machine Learning Research"},{"key":"6661_CR24","unstructured":"Meng, X., Yao, J., & Cao, Y. (2022). Multiple descent in the multiple random feature model."},{"key":"6661_CR25","unstructured":"Shen, R., Bubeck, S., & Gunasekar, S. (2022). Data augmentation as feature manipulation. In International Conference on Machine Learning (pp. 19773\u201319808)."},{"key":"6661_CR26","unstructured":"Smith, S., Elsen, E., & De, S. (2020). On the generalization benefit of noise in stochastic gradient descent. In International conference on machine learning (pp. 9058\u20139067). PMLR."},{"key":"6661_CR27","unstructured":"Smith, S. L., Dherin, B., Barrett, D., & De, S. (2021). On the origin of implicit regularization in stochastic gradient descent. In International conference on learning representations."},{"key":"6661_CR28","unstructured":"Wen, K., Ma, T., & Li, Z. (2023). How does sharpness-aware minimization minimize sharpness?"},{"key":"6661_CR29","first-page":"10112","volume":"33","author":"D Wu","year":"2020","unstructured":"Wu, D., & Xu, J. (2020). On the optimal weighted regularization in overparameterized linear regression. Advances in Neural Information Processing Systems, 33, 10112\u201310123.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"6661_CR30","unstructured":"Zhao, Y., Zhang, H., & Hu, X. (2022). Penalizing gradient norm for efficiently improving generalization in deep learning. In International conference on machine learning (pp. 26982\u201326992). PMLR."},{"key":"6661_CR31","unstructured":"Zou, D., Cao, Y., Li, Y., & Gu, Q. (2023a). The benefits of mixup for feature learning. arXiv preprint arXiv:2303.08433"},{"key":"6661_CR32","unstructured":"Zou, D., Cao, Y., Li, Y., & Gu, Q. (2023b). Understanding the generalization of Adam in learning neural networks with proper regularization. 
In 11th International conference on learning representations, ICLR 2023."},{"key":"6661_CR33","unstructured":"Zou, D., Wu, J., Braverman, V., Gu, Q., & Kakade, S. (2021). Benign overfitting of constant-stepsize SGD for linear regression. In Conference on learning theory (pp. 4633\u20134635). PMLR."}],"container-title":["Machine Learning"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10994-024-06661-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10994-024-06661-5\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10994-024-06661-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,2,25]],"date-time":"2025-02-25T19:43:15Z","timestamp":1740512595000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10994-024-06661-5"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,2,7]]},"references-count":33,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2025,3]]}},"alternative-id":["6661"],"URL":"https:\/\/doi.org\/10.1007\/s10994-024-06661-5","relation":{},"ISSN":["0885-6125","1573-0565"],"issn-type":[{"value":"0885-6125","type":"print"},{"value":"1573-0565","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,2,7]]},"assertion":[{"value":"29 May 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"1 August 2024","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"13 December 2024","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"7 
February 2025","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}],"article-number":"74"}}