{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,30]],"date-time":"2026-03-30T20:50:28Z","timestamp":1774903828110,"version":"3.50.1"},"reference-count":44,"publisher":"MDPI AG","issue":"12","license":[{"start":{"date-parts":[[2024,12,17]],"date-time":"2024-12-17T00:00:00Z","timestamp":1734393600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"National Natural Science Foundation of China","award":["12071356"],"award-info":[{"award-number":["12071356"]}]},{"name":"National Natural Science Foundation of China","award":["2024AFC020"],"award-info":[{"award-number":["2024AFC020"]}]},{"name":"National Natural Science Foundation of China","award":["CZY23010"],"award-info":[{"award-number":["CZY23010"]}]},{"name":"Natural Science Foundation of Hubei Province in China","award":["12071356"],"award-info":[{"award-number":["12071356"]}]},{"name":"Natural Science Foundation of Hubei Province in China","award":["2024AFC020"],"award-info":[{"award-number":["2024AFC020"]}]},{"name":"Natural Science Foundation of Hubei Province in China","award":["CZY23010"],"award-info":[{"award-number":["CZY23010"]}]},{"name":"Fundamental Research Funds for the Central Universities, South-Central MinZu University","award":["12071356"],"award-info":[{"award-number":["12071356"]}]},{"name":"Fundamental Research Funds for the Central Universities, South-Central MinZu University","award":["2024AFC020"],"award-info":[{"award-number":["2024AFC020"]}]},{"name":"Fundamental Research Funds for the Central Universities, South-Central MinZu University","award":["CZY23010"],"award-info":[{"award-number":["CZY23010"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],"abstract":"<jats:p>Maximum correntropy criterion (MCC) has been an important method in machine learning and signal processing communities since it was 
successfully applied in various non-Gaussian noise scenarios. In comparison with the classical least squares (LS) method, which takes only the second-order moment of models into consideration and leads to a convex optimization problem, MCC captures the high-order information of models that plays a crucial role in robust learning, which usually requires solving non-convex optimization problems. As we know, theoretical research on convex optimization has made significant achievements, while the theoretical understanding of non-convex optimization is still far from mature. Motivated by the popularity of stochastic gradient descent (SGD) for solving non-convex problems, this paper considers SGD applied to the kernel version of MCC, which has been shown to be robust to outliers and non-Gaussian data in nonlinear structure models. As the existing theoretical results for the SGD algorithm applied to the kernel MCC are not well established, we present a rigorous analysis of the convergence behaviors and provide explicit convergence rates under some standard conditions. 
Our work can fill the gap between optimization process and convergence during the iterations: the iterates need to converge to the global minimizer while the obtained estimator cannot ensure the global optimality in the learning process.<\/jats:p>","DOI":"10.3390\/e26121104","type":"journal-article","created":{"date-parts":[[2024,12,17]],"date-time":"2024-12-17T08:17:00Z","timestamp":1734423420000},"page":"1104","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":4,"title":["Stochastic Gradient Descent for Kernel-Based Maximum Correntropy Criterion"],"prefix":"10.3390","volume":"26","author":[{"given":"Tiankai","family":"Li","sequence":"first","affiliation":[{"name":"School of Mathematics and Statistics, South-Central MinZu University, Wuhan 430074, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Baobin","family":"Wang","sequence":"additional","affiliation":[{"name":"School of Mathematics and Statistics, South-Central MinZu University, Wuhan 430074, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chaoquan","family":"Peng","sequence":"additional","affiliation":[{"name":"School of Mathematics and Statistics, South-Central MinZu University, Wuhan 430074, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0903-6344","authenticated-orcid":false,"given":"Hong","family":"Yin","sequence":"additional","affiliation":[{"name":"School of Mathematics, Renmin University of China, Beijing 100872, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2024,12,17]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"880","DOI":"10.1109\/LSP.2014.2319308","article-title":"Steady-state mean-square error analysis for adaptive filtering under the maximum correntropy criterion","volume":"21","author":"Chen","year":"2014","journal-title":"IEEE Signal 
Process. Lett."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"538","DOI":"10.1109\/LSP.2023.3273174","article-title":"An efficient parameter optimization of maximum correntropy criterion","volume":"30","author":"Shi","year":"2023","journal-title":"IEEE Signal Process. Lett."},{"key":"ref_3","first-page":"1339","article-title":"An improved variable kernel width for maximum correntropy criterion algorithm","volume":"67","author":"Shi","year":"2020","journal-title":"IEEE Trans. Circuits Syst. II Express Briefs"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"1032","DOI":"10.1109\/LSP.2023.3301808","article-title":"Euclidean direction search algorithm based on maximum correntropy criterion","volume":"30","author":"Wang","year":"2023","journal-title":"IEEE Signal Process. Lett."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Principe, J.C. (2010). Information Theoretic Learning: Renyi\u2019s Entropy and Kernel Perspectives, Springer.","DOI":"10.1007\/978-1-4419-1570-2"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"5286","DOI":"10.1109\/TSP.2007.896065","article-title":"Correntropy: Properties and Applications in Non-Gaussian Signal Processing","volume":"55","author":"Liu","year":"2007","journal-title":"IEEE Trans. Signal Process."},{"key":"ref_7","first-page":"993","article-title":"Learning with the Maximum Correntropy Criterion Induced Losses for Regression","volume":"16","author":"Feng","year":"2015","journal-title":"J. Mach. Learn. Res."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"4027","DOI":"10.1109\/TIP.2015.2456508","article-title":"Robust Hyperspectral Unmixing with Correntropy-Based Metric","volume":"24","author":"Wang","year":"2015","journal-title":"IEEE Trans. 
Image Process."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"144","DOI":"10.1016\/j.acha.2016.04.004","article-title":"Kernel-based sparse regression with the correntropy-induced loss","volume":"44","author":"Chen","year":"2018","journal-title":"Appl. Comput. Harmon. Anal."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"1561","DOI":"10.1109\/TPAMI.2010.220","article-title":"Maximum Correntropy Criterion for Robust Face Recognition","volume":"33","author":"He","year":"2011","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"1657","DOI":"10.1109\/TPWRS.2009.2030291","article-title":"Entropy and Correntropy Against Minimum Square Error in Offline and Online Three-Day Ahead Wind Power Forecasting","volume":"24","author":"Bessa","year":"2009","journal-title":"IEEE Trans. Power Syst."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"70","DOI":"10.1016\/j.automatica.2016.10.004","article-title":"Maximum correntropy Kalman filter","volume":"76","author":"Chen","year":"2017","journal-title":"Automatica"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"5248","DOI":"10.1021\/ie401347k","article-title":"Correntropy kernel learning for nonlinear system identification with outliers","volume":"53","author":"Liu","year":"2013","journal-title":"Ind. Eng. Chem. Res."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"1950","DOI":"10.1109\/TNET.2012.2187923","article-title":"An information theoretic approach of designing sparse kernel adaptive filters","volume":"20","author":"Liu","year":"2009","journal-title":"IEEE Trans. Neural Netw."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"6252","DOI":"10.1109\/TNNLS.2018.2827778","article-title":"A new correntropy-based conjugate gradient backpropagation algorithm for improving training in neural networks","volume":"16","author":"Heravi","year":"2018","journal-title":"IEEE Trans. Neural Netw. Learn. 
Syst."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"3390","DOI":"10.1109\/TCSI.2018.2825241","article-title":"Random fourier filters under maximum correntropy criterion","volume":"65","author":"Wang","year":"2018","journal-title":"IEEE Trans. Circuits Syst. I Regul. Pap."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"11","DOI":"10.1016\/j.sigpro.2015.04.024","article-title":"Kernel recursive maximum correntropy","volume":"117","author":"Wu","year":"2015","journal-title":"Signal Process."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"164","DOI":"10.1016\/j.dsp.2017.01.010","article-title":"Quantized kernel maximum correntropy and its mean square convergence analysis","volume":"63","author":"Wang","year":"2017","journal-title":"Digit. Signal Process."},{"key":"ref_19","first-page":"1159","article-title":"Robust multikernel maximum correntropy filters","volume":"67","author":"Xiong","year":"2020","journal-title":"IEEE Trans. Circuits Syst. II Express Briefs"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Zhao, S., Chen, B., and Pr\u00edncipe, J.C. (August, January 31). Kernel adaptive filtering with maximum correntropy criterion. Proceedings of the 2011 International Joint Conference on Neural Networks, San Jose, CA, USA.","DOI":"10.1109\/IJCNN.2011.6033473"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"5497","DOI":"10.1109\/TCYB.2019.2959834","article-title":"Kernel correntropy conjugate gradient algorithms based on half-quadratic optimization","volume":"51","author":"Xiong","year":"2021","journal-title":"IEEE Trans. Cybern."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Zhang, T. (2004, January 4\u20138). Solving large scale linear prediction problems using stochastic gradient descent algorithms. 
Proceedings of the Twenty-First International Conference on Machine Learning, Banff, AB, Canada.","DOI":"10.1145\/1015330.1015332"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"5008","DOI":"10.1109\/TNNLS.2017.2764960","article-title":"On the Generalization Ability of Online Gradient Descent Algorithm Under the Quadratic Growth Condition","volume":"29","author":"Chang","year":"2018","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"224","DOI":"10.1016\/j.acha.2015.08.007","article-title":"Unregularized Online Learning Algorithms with General Loss Functions","volume":"42","author":"Ying","year":"2017","journal-title":"Appl. Comput. Harmon. Anal."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"337","DOI":"10.1090\/S0002-9947-1950-0051437-7","article-title":"Theory of reproducing kernels","volume":"68","author":"Aronszajn","year":"1950","journal-title":"Trans. Am. Math. Soc."},{"key":"ref_26","first-page":"1","article-title":"A Statistical Learning Approach to Modal Regression","volume":"21","author":"Feng","year":"2020","journal-title":"J. Mach. Learn. Res."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"795","DOI":"10.1016\/j.acha.2019.09.001","article-title":"Learning with correntropy-induced losses for regression with mixture of symmetric stable noise","volume":"48","author":"Feng","year":"2020","journal-title":"Appl. Comput. Harmon. Anal."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"2341","DOI":"10.1137\/120880811","article-title":"Stochastic First- and Zeroth-order Methods for Nonconvex Stochastic Programming","volume":"23","author":"Ghadimi","year":"2013","journal-title":"Siam J. Optim."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Sun, D., Roth, S., and Black, M.J. (2010, January 13\u201318). Secrets of Optical Flow Estimation and Their Principles. 
Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA.","DOI":"10.1109\/CVPR.2010.5539939"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"065009","DOI":"10.1088\/1361-6420\/aabe55","article-title":"Gradient Descent for Robust Kernel-based Regression","volume":"34","author":"Guo","year":"2018","journal-title":"Inverse Probl."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"4394","DOI":"10.1109\/TNNLS.2019.2952219","article-title":"Stochastic Gradient Descent for Nonconvex Learning Without Bounded Gradient Assumptions","volume":"31","author":"Lei","year":"2020","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Karimi, H., Nutini, J., and Schmidt, M. (2016). Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Lojasiewicz Condition. Machine Learning and Knowledge Discovery in Databases: European Conference, Springer International Publishing.","DOI":"10.1007\/978-3-319-46128-1_50"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"864","DOI":"10.1016\/0041-5553(63)90382-3","article-title":"Gradient methods for the minimisation of functionals","volume":"3","author":"Polyak","year":"1963","journal-title":"USSR Comput. Math. Math. Phys."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"823","DOI":"10.1007\/s11590-013-0626-5","article-title":"On the optimization properties of the correntropic loss function in data analysis","volume":"8","author":"Syed","year":"2014","journal-title":"Optim. Lett."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"157","DOI":"10.1162\/neco_a_01334","article-title":"New insights into learning with correntropy-based regression","volume":"33","author":"Feng","year":"2021","journal-title":"Neural Comput."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Wang, B., and Hu, T. (2019). 
Online gradient descent for kernel-based maximum correntropy criterion. Entropy, 21.","DOI":"10.3390\/e21070644"},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"4775","DOI":"10.1109\/TIT.2006.883632","article-title":"Online regularized classification algorithm","volume":"52","author":"Ying","year":"2006","journal-title":"IEEE Trans. Inf. Theory"},{"key":"ref_38","unstructured":"Kingma, D.P., and Ba, L.J. (2015, January 7\u20139). Adam: A method for stochastic optimization. Proceedings of the International Conference on Learning Representations, San Diego, CA, USA."},{"key":"ref_39","first-page":"2121","article-title":"Adaptive subgradient methods for online learning and stochastic optimization","volume":"12","author":"Duchi","year":"2011","journal-title":"J. Mach. Learn. Res."},{"key":"ref_40","first-page":"26","article-title":"RMSProp: Divide the gradient by a running average of its recent magnitude","volume":"4","author":"Tieleman","year":"2012","journal-title":"Coursera Neural Netw. Mach. Learn."},{"key":"ref_41","unstructured":"Reddi, S.J., Kale, S., and Kumar, S. (May, January 30). On the convergence of Adam and beyond. Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"4159","DOI":"10.3934\/cpaa.2020186","article-title":"Kernel-based maximum correntropy criterion with gradient descent method","volume":"19","author":"Hu","year":"2020","journal-title":"Commun. Pure Appl. Anal."},{"key":"ref_43","unstructured":"Steinwart, I., and Christmann, A. (2008). Support Vector Machines, Springer Science and Business Media."},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"229","DOI":"10.1016\/j.acha.2019.01.002","article-title":"Distributed kernel gradient descent algorithm for minimum error entropy principle","volume":"49","author":"Hu","year":"2020","journal-title":"Appl. Comput. Harmon. 
Anal."}],"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1099-4300\/26\/12\/1104\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T16:53:57Z","timestamp":1760115237000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1099-4300\/26\/12\/1104"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,12,17]]},"references-count":44,"journal-issue":{"issue":"12","published-online":{"date-parts":[[2024,12]]}},"alternative-id":["e26121104"],"URL":"https:\/\/doi.org\/10.3390\/e26121104","relation":{},"ISSN":["1099-4300"],"issn-type":[{"value":"1099-4300","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,12,17]]}}}