{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,9]],"date-time":"2026-01-09T23:41:12Z","timestamp":1768002072824,"version":"3.49.0"},"reference-count":28,"publisher":"MDPI AG","issue":"11","license":[{"start":{"date-parts":[[2012,11,12]],"date-time":"2012-11-12T00:00:00Z","timestamp":1352678400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/3.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],"abstract":"<jats:p>Recent studies suggest that the minimum error entropy (MEE) criterion can outperform the traditional mean square error criterion in supervised machine learning, especially in nonlinear and non-Gaussian situations. In practice, however, one has to estimate the error entropy from the samples since in general the analytical evaluation of error entropy is not possible. By the Parzen windowing approach, the estimated error entropy converges asymptotically to the entropy of the error plus an independent random variable whose probability density function (PDF) corresponds to the kernel function in the Parzen method. This quantity of entropy is called the smoothed error entropy, and the corresponding optimality criterion is named the smoothed MEE (SMEE) criterion. In this paper, we study theoretically the SMEE criterion in supervised machine learning where the learning machine is assumed to be nonparametric and universal. Some basic properties are presented. In particular, we show that when the smoothing factor is very small, the smoothed error entropy equals approximately the true error entropy plus a scaled version of the Fisher information of error. We also investigate how the smoothing factor affects the optimal solution. In some special situations, the optimal solution under the SMEE criterion does not change with increasing smoothing factor. 
In general cases, when the smoothing factor tends to infinity, minimizing the smoothed error entropy will be approximately equivalent to minimizing error variance, regardless of the conditional PDF and the kernel.<\/jats:p>","DOI":"10.3390\/e14112311","type":"journal-article","created":{"date-parts":[[2012,11,12]],"date-time":"2012-11-12T11:37:24Z","timestamp":1352720244000},"page":"2311-2323","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":20,"title":["On the Smoothed Minimum Error Entropy Criterion"],"prefix":"10.3390","volume":"14","author":[{"given":"Badong","family":"Chen","sequence":"first","affiliation":[{"name":"Department of Electrical and Computer Engineering, University of Florida, Gainesville, FL 32611, USA"},{"name":"Department of Precision Instruments and Mechanology, Tsinghua University, Beijing, 100084, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jose","family":"Principe","sequence":"additional","affiliation":[{"name":"Department of Electrical and Computer Engineering, University of Florida, Gainesville, FL 32611, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2012,11,12]]},"reference":[{"key":"ref_1","unstructured":"Cover, T.M., and Thomas, J.A. (1991). Elements of Information Theory, John Wiley & Sons, Inc."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"264","DOI":"10.1109\/TIT.1970.1054444","article-title":"Entropy analysis of estimating systems","volume":"16","author":"Weidemann","year":"1970","journal-title":"IEEE Trans. Inform. Theor."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"101","DOI":"10.1016\/S0019-9958(76)90140-6","article-title":"An application of the information theory to estimation problems","volume":"32","author":"Tomita","year":"1976","journal-title":"Inf. 
Control"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"123","DOI":"10.1016\/0020-0255(94)90043-4","article-title":"Minimum entropy of error principle in estimation","volume":"79","author":"Janzura","year":"1994","journal-title":"Inf. Sci."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"937","DOI":"10.1016\/j.sigpro.2004.11.028","article-title":"Minimum-entropy estimation in semi-parametric models","volume":"85","author":"Wolsztynski","year":"2005","journal-title":"Signal Process."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"545","DOI":"10.1016\/j.jfranklin.2009.11.009","article-title":"On optimal estimations with minimum error entropy criterion","volume":"347","author":"Chen","year":"2010","journal-title":"J. Frankl. Inst. Eng. Appl. Math."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/0020-0255(79)90039-2","article-title":"Linear prediction, filtering and smoothing: An information theoretic approach","volume":"17","author":"Kalata","year":"1979","journal-title":"Inf. Sci."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"771","DOI":"10.1109\/9.587329","article-title":"Optimal state estimation for stochastic systems: An information theoretic approach","volume":"42","author":"Feng","year":"1997","journal-title":"IEEE Trans. Automat. Contr."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"695","DOI":"10.1109\/TAC.2006.872771","article-title":"Minimum entropy filtering for multivariate stochastic systems with non-Gaussian Noises","volume":"51","author":"Guo","year":"2006","journal-title":"IEEE Trans. Autom. Control"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"83","DOI":"10.3109\/00207457209147016","article-title":"Learning and information theory","volume":"3","author":"Pfaffelhuber","year":"1972","journal-title":"Int. J. 
Neurosci."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"295","DOI":"10.1162\/neco.1989.1.3.295","article-title":"Unsupervised learning","volume":"1","author":"Barlow","year":"1989","journal-title":"Neural Comput."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"1780","DOI":"10.1109\/TSP.2002.1011217","article-title":"An error-entropy minimization algorithm for supervised training of nonlinear adaptive systems","volume":"50","author":"Erdogmus","year":"2002","journal-title":"IEEE Trans. Signal Process."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"1035","DOI":"10.1109\/TNN.2002.1031936","article-title":"Generalized information potential criterion for adaptive system training","volume":"13","author":"Erdogmus","year":"2002","journal-title":"IEEE Trans. Neural Netw."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"1966","DOI":"10.1109\/TSP.2003.812843","article-title":"Convergence properties and data efficiency of the minimum error entropy criterion in Adaline training","volume":"51","author":"Erdogmus","year":"2003","journal-title":"IEEE Trans. Signal Process."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"14","DOI":"10.1109\/SP-M.2006.248709","article-title":"From linear adaptive filtering to nonlinear information processing\u2014The design and analysis of information processing systems","volume":"23","author":"Erdogmus","year":"2006","journal-title":"IEEE Signal Process. Mag."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"1184","DOI":"10.1109\/78.995074","article-title":"Entropy minimization for supervised digital communications channel equalization","volume":"50","author":"Santamaria","year":"2002","journal-title":"IEEE Trans. 
Signal Process."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"941","DOI":"10.1007\/s00034-007-9004-9","article-title":"Stochastic gradient algorithm under (h,\u03c6)-entropy criterion","volume":"26","author":"Chen","year":"2007","journal-title":"Circuits Syst. Signal Process."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Principe, J.C. (2010). Information Theoretic Learning: Renyi's Entropy and Kernel Perspectives, Springer.","DOI":"10.1007\/978-1-4419-1570-2"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Vapnik, V. (1995). The Nature of Statistical Learning Theory, Springer.","DOI":"10.1007\/978-1-4757-2440-0"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Scholkopf, B., and Smola, A.J. (2002). Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond, MIT Press.","DOI":"10.7551\/mitpress\/4175.001.0001"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Liu, W., and Principe, J.C. (2010). Kernel Adaptive Filtering: A Comprehensive Introduction, John Wiley & Sons, Inc.","DOI":"10.1002\/9780470608593"},{"key":"ref_22","unstructured":"Silverman, B.W. (1986). Density Estimation for Statistics and Data Analysis, Chapman & Hall."},{"key":"ref_23","first-page":"17","article-title":"Nonparametric entropy estimation: An overview","volume":"6","author":"Beirlant","year":"1997","journal-title":"Int. J. Math. Statist. Sci."},{"key":"ref_24","unstructured":"Linnik, Ju.V., and Ostrovskii, I.V. (1977). Decompositions of Random Variables and Vectors, American Mathematical Society."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"33","DOI":"10.1109\/TIT.2010.2090193","article-title":"Information theoretic proofs of entropy power inequalities","volume":"57","author":"Rioul","year":"2011","journal-title":"IEEE Trans. Inform. Theor."},{"key":"ref_26","unstructured":"Xu, J.-W., Erdogmus, D., and Principe, J.C. (2004, May 17\u201321). 
Minimizing Fisher information of the error in supervised adaptive filter training. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Montreal, Quebec, Canada."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"3166","DOI":"10.1109\/TIT.2008.924686","article-title":"On the minimum entropy of a mixture of unimodal and symmetric distributions","volume":"54","author":"Chen","year":"2008","journal-title":"IEEE Trans. Inf. Theor."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"97","DOI":"10.1016\/S0167-7152(98)00013-3","article-title":"Simple proofs of two results on convolutions of unimodal distributions","volume":"39","author":"Purkayastha","year":"1998","journal-title":"Statist. Prob. Lett."}],"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1099-4300\/14\/11\/2311\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T21:53:29Z","timestamp":1760219609000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1099-4300\/14\/11\/2311"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,11,12]]},"references-count":28,"journal-issue":{"issue":"11","published-online":{"date-parts":[[2012,11]]}},"alternative-id":["e14112311"],"URL":"https:\/\/doi.org\/10.3390\/e14112311","relation":{},"ISSN":["1099-4300"],"issn-type":[{"value":"1099-4300","type":"electronic"}],"subject":[],"published":{"date-parts":[[2012,11,12]]}}}