{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T03:37:22Z","timestamp":1760240242320,"version":"build-2065373602"},"reference-count":52,"publisher":"MDPI AG","issue":"4",
"license":[{"start":{"date-parts":[[2019,4,16]],"date-time":"2019-04-16T00:00:00Z","timestamp":1555372800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],
"abstract":"<jats:p>Model-free variable selection has recently attracted increasing interest due to its flexibility in algorithmic design and outstanding performance in real-world applications. However, most existing statistical methods are formulated under the mean square error (MSE) criterion and are susceptible to non-Gaussian noise and outliers. Because the MSE criterion requires the data to satisfy a Gaussian noise condition, it potentially hampers the effectiveness of model-free methods in complex circumstances. To circumvent this issue, we present a new model-free variable selection algorithm that integrates kernel modal regression with gradient-based variable identification. The derived modal regression estimator is closely related to information theoretic learning under the maximum correntropy criterion, and assures algorithmic robustness to complex noise by replacing learning of the conditional mean with learning of the conditional mode. The gradient information of the estimator offers a model-free metric for screening the key variables. In theory, we investigate the theoretical foundations of our new model in terms of generalization bounds and variable selection consistency. In applications, the effectiveness of the proposed method is verified by data experiments.<\/jats:p>",
"DOI":"10.3390\/e21040403","type":"journal-article","created":{"date-parts":[[2019,4,17]],"date-time":"2019-04-17T03:02:01Z","timestamp":1555470121000},"page":"403","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":4,"title":["Robust Variable Selection and Estimation Based on Kernel Modal Regression"],"prefix":"10.3390","volume":"21",
"author":[{"given":"Changying","family":"Guo","sequence":"first","affiliation":[{"name":"College of Science, Huazhong Agricultural University, Wuhan 430070, China"}]},{"given":"Biqin","family":"Song","sequence":"additional","affiliation":[{"name":"College of Science, Huazhong Agricultural University, Wuhan 430070, China"}]},{"given":"Yingjie","family":"Wang","sequence":"additional","affiliation":[{"name":"College of Science, Huazhong Agricultural University, Wuhan 430070, China"}]},{"given":"Hong","family":"Chen","sequence":"additional","affiliation":[{"name":"College of Science, Huazhong Agricultural University, Wuhan 430070, China"}]},{"given":"Huijuan","family":"Xiong","sequence":"additional","affiliation":[{"name":"College of Science, Huazhong Agricultural University, Wuhan 430070, China"}]}],
"member":"1968","published-online":{"date-parts":[[2019,4,16]]},
"reference":[
{"key":"ref_1","doi-asserted-by":"crossref","first-page":"267","DOI":"10.1111\/j.2517-6161.1996.tb02080.x","article-title":"Regression shrinkage and selection via the Lasso","volume":"58","author":"Tibshirani","year":"1996","journal-title":"J. R. Stat. Soc. Ser. B"},
{"key":"ref_2","doi-asserted-by":"crossref","first-page":"49","DOI":"10.1111\/j.1467-9868.2005.00532.x","article-title":"Model selection and estimation in regression with grouped variables","volume":"68","author":"Yuan","year":"2006","journal-title":"J. R. Stat. Soc. Ser. B"},
{"key":"ref_3","doi-asserted-by":"crossref","first-page":"301","DOI":"10.1111\/j.1467-9868.2005.00503.x","article-title":"Regularization and variable selection via the elastic net","volume":"67","author":"Zou","year":"2005","journal-title":"J. R. Stat. Soc. Ser. B"},
{"key":"ref_4","doi-asserted-by":"crossref","first-page":"689","DOI":"10.1214\/aos\/1176349548","article-title":"Additive regression and other nonparametric models","volume":"13","author":"Stone","year":"1985","journal-title":"Ann. Stat."},
{"key":"ref_5","unstructured":"Hastie, T.J., and Tibshirani, R.J. (1990). Generalized Additive Models, Chapman and Hall."},
{"key":"ref_6","unstructured":"Kandasamy, K., and Yu, Y. (2016, January 19\u201324). Additive approximations in high dimensional nonparametric regression via the SALSA. Proceedings of the International Conference on Machine Learning (ICML), New York, NY, USA."},
{"key":"ref_7","doi-asserted-by":"crossref","first-page":"1620","DOI":"10.1109\/TIT.2016.2634401","article-title":"Nonparametric regression based on hierarchical interaction models","volume":"63","author":"Kohler","year":"2017","journal-title":"IEEE Trans. Inf. Theory"},
{"key":"ref_8","unstructured":"Chen, H., Wang, X., and Huang, H. (2017, January 4\u20139). Group sparse additive machine. Proceedings of the Advances in Neural Information Processing Systems (NIPS), Long Beach, CA, USA."},
{"key":"ref_9","doi-asserted-by":"crossref","first-page":"1009","DOI":"10.1111\/j.1467-9868.2009.00718.x","article-title":"SpAM: Sparse additive models","volume":"71","author":"Ravikumar","year":"2009","journal-title":"J. R. Stat. Soc. Ser. B"},
{"key":"ref_10","first-page":"2272","article-title":"Component selection and smoothing in multivariate nonparametric regression","volume":"34","author":"Lin","year":"2007","journal-title":"Ann. Stat."},
{"key":"ref_11","unstructured":"Yin, J., Chen, X., and Xing, E.P. (July, January 26). Group sparse additive models. Proceedings of the International Conference on Machine Learning (ICML), Edinburgh, UK."},
{"key":"ref_12","unstructured":"He, X., Wang, J., and Lv, S. (2018). Scalable kernel-based variable selection with sparsistency. arXiv."},
{"key":"ref_13","first-page":"1","article-title":"Model-free variable selection in reproducing kernel Hilbert space","volume":"17","author":"Yang","year":"2016","journal-title":"J. Mach. Learn. Res."},
{"key":"ref_14","doi-asserted-by":"crossref","first-page":"303","DOI":"10.1007\/s10994-012-5284-9","article-title":"Learning sparse gradients for variable selection and dimension reduction","volume":"87","author":"Ye","year":"2012","journal-title":"Mach. Learn."},
{"key":"ref_15","unstructured":"Gregorov\u00e1, M., Kalousis, A., and Marchand-Maillet, S. (2018). Structured nonlinear variable selection. arXiv."},
{"key":"ref_16","first-page":"519","article-title":"Learning coordinate covariances via gradients","volume":"7","author":"Mukherjee","year":"2006","journal-title":"J. Mach. Learn. Res."},
{"key":"ref_17","first-page":"1665","article-title":"Nonparametric sparsity and regularization","volume":"14","author":"Rosasco","year":"2013","journal-title":"J. Mach. Learn. Res."},
{"key":"ref_18","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1561\/2200000016","article-title":"Distributed optimization and statistical learning via the alternating direction method of multipliers","volume":"3","author":"Boyd","year":"2011","journal-title":"Found. Trends Mach. Learn."},
{"key":"ref_19","unstructured":"Feng, Y., Fan, J., and Suykens, J.A.K. (2017). A statistical learning approach to modal regression. arXiv."},
{"key":"ref_20","unstructured":"Wang, X., Chen, H., Cai, W., Shen, D., and Huang, H. (2017, January 4\u20139). Regularized modal regression with applications in cognitive impairment prediction. Proceedings of the Advances in Neural Information Processing Systems (NIPS), Long Beach, CA, USA."},
{"key":"ref_21","doi-asserted-by":"crossref","first-page":"1065","DOI":"10.1214\/aoms\/1177704472","article-title":"On estimation of a probability density function and mode","volume":"33","author":"Parzen","year":"1962","journal-title":"Ann. Math. Stat."},
{"key":"ref_22","doi-asserted-by":"crossref","first-page":"31","DOI":"10.1007\/BF02868560","article-title":"Estimation of the mode","volume":"16","author":"Chernoff","year":"1964","journal-title":"Ann. Inst. Stat. Math."},
{"key":"ref_23","doi-asserted-by":"crossref","first-page":"647","DOI":"10.1080\/10485252.2012.678848","article-title":"Local modal regression","volume":"24","author":"Yao","year":"2012","journal-title":"J. Nonparametr. Stat."},
{"key":"ref_24","first-page":"489","article-title":"Nonparametric modal regression","volume":"44","author":"Chen","year":"2014","journal-title":"Ann. Stat."},
{"key":"ref_25","doi-asserted-by":"crossref","first-page":"227","DOI":"10.1016\/0378-3758(86)90099-6","article-title":"A note on prediction via estimation of the conditional mode function","volume":"15","author":"Collomb","year":"1986","journal-title":"J. Stat. Plan. Inference"},
{"key":"ref_26","doi-asserted-by":"crossref","first-page":"337","DOI":"10.1016\/0304-4076(89)90057-2","article-title":"Mode regression","volume":"42","author":"Lee","year":"1989","journal-title":"J. Econom."},
{"key":"ref_27","doi-asserted-by":"crossref","first-page":"690","DOI":"10.1214\/aos\/1176345865","article-title":"Maximum likelihood estimation of isotonic modal regression","volume":"10","author":"Sager","year":"1982","journal-title":"Ann. Stat."},
{"key":"ref_28","first-page":"1687","article-title":"A nonparametric statistical approach to clustering via mode identification","volume":"8","author":"Li","year":"2007","journal-title":"J. Mach. Learn. Res."},
{"key":"ref_29","doi-asserted-by":"crossref","first-page":"5286","DOI":"10.1109\/TSP.2007.896065","article-title":"Correntropy: Properties and applications in non-Gaussian signal processing","volume":"55","author":"Liu","year":"2007","journal-title":"IEEE Trans. Signal Process."},
{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Pr\u00edncipe, J.C. (2010). Information Theoretic Learning: R\u00e9nyi\u2019s Entropy and Kernel Perspectives, Springer.","DOI":"10.1007\/978-1-4419-1570-2"},
{"key":"ref_31","first-page":"993","article-title":"Learning with the maximum correntropy criterion induced losses for regression","volume":"16","author":"Feng","year":"2015","journal-title":"J. Mach. Learn. Res."},
{"key":"ref_32","doi-asserted-by":"crossref","first-page":"937","DOI":"10.1137\/030600862","article-title":"Analysis of half-quadratic minimization methods for signal and image recovery","volume":"27","author":"Nikolova","year":"2005","journal-title":"SIAM J. Sci. Comput."},
{"key":"ref_33","doi-asserted-by":"crossref","first-page":"337","DOI":"10.1090\/S0002-9947-1950-0051437-7","article-title":"Theory of Reproducing Kernels","volume":"68","author":"Aronszajn","year":"1950","journal-title":"Trans. Am. Math. Soc."},
{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Cucker, F., and Zhou, D.X. (2007). Learning Theory: An Approximation Theory Viewpoint, Cambridge University Press.","DOI":"10.1017\/CBO9780511618796"},
{"key":"ref_35","doi-asserted-by":"crossref","first-page":"656","DOI":"10.1111\/sjos.12054","article-title":"A new regression model: Modal linear regression","volume":"41","author":"Yao","year":"2013","journal-title":"Scand. J. Stat."},
{"key":"ref_36","doi-asserted-by":"crossref","first-page":"144","DOI":"10.1016\/j.acha.2016.04.004","article-title":"Kernel-based sparse regression with the correntropy-induced loss","volume":"44","author":"Chen","year":"2018","journal-title":"Appl. Comput. Harmon. Anal."},
{"key":"ref_37","first-page":"3419","article-title":"Consistent selection of tuning parameters via variable selection stability","volume":"14","author":"Sun","year":"2012","journal-title":"J. Mach. Learn. Res."},
{"key":"ref_38","doi-asserted-by":"crossref","first-page":"275","DOI":"10.1007\/s10994-009-5104-z","article-title":"The generalization performance of ERM algorithm with strongly mixing observations","volume":"75","author":"Zou","year":"2009","journal-title":"Mach. Learn."},
{"key":"ref_39","doi-asserted-by":"crossref","first-page":"207","DOI":"10.1007\/s10444-011-9238-8","article-title":"Concentration estimates for learning with unbounded sampling","volume":"38","author":"Guo","year":"2013","journal-title":"Adv. Comput. Math."},
{"key":"ref_40","doi-asserted-by":"crossref","first-page":"286","DOI":"10.1016\/j.acha.2011.01.001","article-title":"Concentration estimates for learning with \u21131-regularizer and data dependent hypothesis spaces","volume":"31","author":"Shi","year":"2011","journal-title":"Appl. Comput. Harmon. Anal."},
{"key":"ref_41","doi-asserted-by":"crossref","first-page":"252","DOI":"10.1016\/j.acha.2012.05.001","article-title":"Learning theory estimates for coefficient-based regularized regression","volume":"34","author":"Shi","year":"2013","journal-title":"Appl. Comput. Harmon. Anal."},
{"key":"ref_42","doi-asserted-by":"crossref","first-page":"1107","DOI":"10.1162\/NECO_a_00421","article-title":"Error analysis of coefficient-based regularized algorithm for density-level detection","volume":"25","author":"Chen","year":"2013","journal-title":"Neural Comput."},
{"key":"ref_43","doi-asserted-by":"crossref","first-page":"1328","DOI":"10.1109\/TNNLS.2016.2609441","article-title":"k-Times markov sampling for SVMC","volume":"29","author":"Zou","year":"2018","journal-title":"IEEE Trans. Neural Networks Learn. Syst."},
{"key":"ref_44","doi-asserted-by":"crossref","first-page":"4166","DOI":"10.1109\/TNNLS.2017.2757140","article-title":"Learning with coefficient-based regularized regression on Markov resampling","volume":"29","author":"Li","year":"2018","journal-title":"IEEE Trans. Neural Networks Learn. Syst."},
{"key":"ref_45","unstructured":"Steinwart, I., and Christmann, A. (2008). Support Vector Machines, Springer Science and Business Media."},
{"key":"ref_46","doi-asserted-by":"crossref","first-page":"108","DOI":"10.1016\/j.jco.2006.06.007","article-title":"Multi-kernel regularized classifiers","volume":"23","author":"Wu","year":"2007","journal-title":"J. Complex."},
{"key":"ref_47","doi-asserted-by":"crossref","first-page":"211","DOI":"10.3150\/10-BEJ267","article-title":"Estimating conditional quantiles with the help of the pinball loss","volume":"17","author":"Steinwart","year":"2011","journal-title":"Bernoulli"},
{"key":"ref_48","first-page":"82","article-title":"\u21131-penalized quantile regression in high dimensional sparse models","volume":"39","author":"Belloni","year":"2009","journal-title":"Ann. Stat."},
{"key":"ref_49","unstructured":"Kato, K. (2011). Group Lasso for high dimensional sparse quantile regression models. arXiv."},
{"key":"ref_50","doi-asserted-by":"crossref","first-page":"781","DOI":"10.1214\/17-AOS1567","article-title":"Oracle inequalities for sparse additive quantile regression in reproducing kernel Hilbert space","volume":"46","author":"Lv","year":"2018","journal-title":"Ann. Stat."},
{"key":"ref_51","doi-asserted-by":"crossref","first-page":"1354","DOI":"10.1109\/TCYB.2016.2544852","article-title":"Correntropy matching pursuit with application to robust digit and face recognition","volume":"47","author":"Wang","year":"2017","journal-title":"IEEE Trans. Cybern."},
{"key":"ref_52","unstructured":"Rockafellar, R.T. (1997). Convex Analysis, Princeton Univ. Press."}
],
"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1099-4300\/21\/4\/403\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T12:45:49Z","timestamp":1760186749000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1099-4300\/21\/4\/403"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,4,16]]},"references-count":52,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2019,4]]}},"alternative-id":["e21040403"],"URL":"https:\/\/doi.org\/10.3390\/e21040403","relation":{},"ISSN":["1099-4300"],"issn-type":[{"type":"electronic","value":"1099-4300"}],"subject":[],"published":{"date-parts":[[2019,4,16]]}}}