{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,2]],"date-time":"2025-11-02T08:08:36Z","timestamp":1762070916922,"version":"build-2065373602"},"reference-count":82,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2022,9,30]],"date-time":"2022-09-30T00:00:00Z","timestamp":1664496000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["MAKE"],"abstract":"<jats:p>The demands for machine learning and knowledge extraction methods have been booming due to the unprecedented surge in data volume and data quality. Nevertheless, challenges arise amid the emerging data complexity as significant chunks of information and knowledge lie within the non-ordinal realm of data. To address the challenges, researchers developed considerable machine learning and knowledge extraction methods regarding various domain-specific challenges. To characterize and extract information from non-ordinal data, all the developed methods pointed to the subject of Information Theory, established following Shannon\u2019s landmark paper in 1948. This article reviews recent developments in entropic statistics, including estimation of Shannon\u2019s entropy and its functionals (such as mutual information and Kullback\u2013Leibler divergence), concepts of entropic basis, generalized Shannon\u2019s entropy (and its functionals), and their estimations and potential applications in machine learning and knowledge extraction. 
With the knowledge of recent development in entropic statistics, researchers can customize existing machine learning and knowledge extraction methods for better performance or develop new approaches to address emerging domain-specific challenges.<\/jats:p>","DOI":"10.3390\/make4040044","type":"journal-article","created":{"date-parts":[[2022,10,9]],"date-time":"2022-10-09T01:43:11Z","timestamp":1665279791000},"page":"865-887","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["Entropic Statistics: Concept, Estimation, and Application in Machine Learning and Knowledge Extraction"],"prefix":"10.3390","volume":"4","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7527-758X","authenticated-orcid":false,"given":"Jialin","family":"Zhang","sequence":"first","affiliation":[{"name":"Department of Mathematics and Statistics, Mississippi State University, Mississippi State, MS 39762, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2022,9,30]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"379","DOI":"10.1002\/j.1538-7305.1948.tb01338.x","article-title":"A mathematical theory of communication","volume":"27","author":"Shannon","year":"1948","journal-title":"Bell Syst. Tech. J."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"79","DOI":"10.1214\/aoms\/1177729694","article-title":"On information and sufficiency","volume":"22","author":"Kullback","year":"1951","journal-title":"Ann. Math. Stat."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"563","DOI":"10.1080\/10485252.2016.1190357","article-title":"Entropic representation and estimation of diversity indices","volume":"28","author":"Zhang","year":"2016","journal-title":"J. Nonparametr. 
Stat."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"774","DOI":"10.1080\/10485252.2018.1482294","article-title":"Asymptotic normality for plug-in estimators of diversity indices on countable alphabets","volume":"30","author":"Grabchak","year":"2018","journal-title":"J. Nonparametr. Stat."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"158","DOI":"10.3390\/stats3020013","article-title":"Generalized Mutual Information","volume":"3","author":"Zhang","year":"2020","journal-title":"Stats"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Burnham, K.P., and Anderson, D.R. (1998). Model Selection and Inference, Springer.","DOI":"10.1007\/978-1-4757-2917-7"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"1501","DOI":"10.1109\/18.104312","article-title":"Information theoretic inequalities","volume":"37","author":"Dembo","year":"1991","journal-title":"IEEE Trans. Inf. Theory"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Chatterjee, S., and Hadi, A.S. (2006). Regression Analysis by Example, John Wiley & Sons.","DOI":"10.1002\/0470055464"},{"key":"ref_9","first-page":"885","article-title":"What is an analysis of variance?","volume":"15","author":"Speed","year":"1987","journal-title":"Ann. Stat."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Hardy, M.A. (1993). Regression with Dummy Variables, Sage.","DOI":"10.4135\/9781412985628"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"163","DOI":"10.1093\/biomet\/70.1.163","article-title":"Information gain and a general measure of correlation","volume":"70","author":"Kent","year":"1983","journal-title":"Biometrika"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"145","DOI":"10.1109\/18.61115","article-title":"Divergence measures based on the Shannon entropy","volume":"37","author":"Lin","year":"1991","journal-title":"IEEE Trans. Inf. 
Theory"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"3797","DOI":"10.1109\/TIT.2014.2320500","article-title":"R\u00e9nyi divergence and Kullback-Leibler divergence","volume":"60","author":"Harremos","year":"2014","journal-title":"IEEE Trans. Inf. Theory"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"441","DOI":"10.1109\/TPAMI.1982.4767278","article-title":"Hierarchical classifier design using mutual information","volume":"4","author":"Sethi","year":"1982","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"660","DOI":"10.1109\/21.97458","article-title":"A survey of decision tree classifier methodology","volume":"21","author":"Safavian","year":"1991","journal-title":"IEEE Trans. Syst. Man Cybern."},{"key":"ref_16","first-page":"1","article-title":"Feature selection: A data perspective","volume":"50","author":"Li","year":"2017","journal-title":"ACM Comput. Surv. (CSUR)"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"333","DOI":"10.1137\/1104033","article-title":"On a statistical estimate for the entropy of a sequence of independent random variables","volume":"4","author":"Basharin","year":"1959","journal-title":"Theory Probab. Appl."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Harris, B. (1975). The Statistical Estimation of Entropy in the Non-Parametric Case, Wisconsin Univ-Madison Mathematics Research Center. Technical Report.","DOI":"10.21236\/ADA020217"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"2745","DOI":"10.1109\/TIT.2011.2179702","article-title":"A normal law for the plug-in estimator of entropy","volume":"58","author":"Zhang","year":"2012","journal-title":"IEEE Trans. Inf. Theory"},{"key":"ref_20","unstructured":"Miller, G.A., and Madow, W.G. (1954). 
On the Maximum Likelihood Estimate of the Shannon-Weiner Measure of Information, Operational Applications Laboratory, Air Force Cambridge Research Center, Air Research and Development Command, Bolling Air Force Base."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"907","DOI":"10.2307\/1936227","article-title":"Jackknifing an index of diversity","volume":"58","author":"Zahl","year":"1977","journal-title":"Ecology"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Chen, C., Grabchak, M., Stewart, A., Zhang, J., and Zhang, Z. (2018). Normal Laws for Two Entropy Estimators on Infinite Alphabets. Entropy, 20.","DOI":"10.3390\/e20050371"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"163","DOI":"10.1002\/rsa.10019","article-title":"Convergence properties of functional estimates for discrete distributions","volume":"19","author":"Antos","year":"2001","journal-title":"Random Struct. Algorithms"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"1191","DOI":"10.1162\/089976603321780272","article-title":"Estimation of entropy and mutual information","volume":"15","author":"Paninski","year":"2003","journal-title":"Neural Comput."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"1368","DOI":"10.1162\/NECO_a_00266","article-title":"Entropy estimation in Turing\u2019s perspective","volume":"24","author":"Zhang","year":"2012","journal-title":"Neural Comput."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"2097","DOI":"10.1162\/NECO_a_00775","article-title":"A note on entropy estimation","volume":"27","year":"2015","journal-title":"Neural Comput."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"504","DOI":"10.1109\/TIT.2012.2217393","article-title":"Asymptotic normality of an entropy estimator with exponentially decaying bias","volume":"59","author":"Zhang","year":"2013","journal-title":"IEEE Trans. Inf. Theory"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Zhang, Z. (2016). 
Statistical Implications of Turing\u2019s Formula, John Wiley & Sons.","DOI":"10.1002\/9781119237150"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"429","DOI":"10.1023\/A:1026096204727","article-title":"Nonparametric estimation of Shannon\u2019s index of diversity when there are unseen species in sample","volume":"10","author":"Chao","year":"2003","journal-title":"Environ. Ecol. Stat."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Nemenman, I., Shafee, F., and Bialek, W. (2001). Entropy and inference, revisited. arXiv.","DOI":"10.7551\/mitpress\/1120.003.0065"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"297","DOI":"10.1007\/s10260-005-0121-y","article-title":"Bayesian inference for categorical data analysis","volume":"14","author":"Agresti","year":"2005","journal-title":"Stat. Methods Appl."},{"key":"ref_32","first-page":"1469","article-title":"Entropy inference and the James-Stein estimator, with application to nonlinear gene association networks","volume":"10","author":"Hausser","year":"2009","journal-title":"J. Mach. Learn. Res."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Shi, J., Zhang, J., and Ge, Y. (2019). CASMI\u2014An Entropic Feature Selection Method in Turing\u2019s Perspective. Entropy, 21.","DOI":"10.3390\/e21121179"},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"243","DOI":"10.1515\/sagmb-2014-0047","article-title":"A mutual information estimator with exponentially decaying bias","volume":"14","author":"Zhang","year":"2015","journal-title":"Stat. Appl. Genet. Mol. Biol."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"20180005","DOI":"10.1515\/sagmb-2018-0005","article-title":"On \u201cA mutual information estimator with exponentially decaying bias\u201d by Zhang and Zheng","volume":"17","author":"Zhang","year":"2018","journal-title":"Stat. Appl. Genet. Mol. Biol."},{"key":"ref_36","unstructured":"Williams, P.L., and Beer, R.D. (2010). 
Nonnegative decomposition of multivariate information. arXiv."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"2161","DOI":"10.3390\/e16042161","article-title":"Quantifying unique information","volume":"16","author":"Bertschinger","year":"2014","journal-title":"Entropy"},{"key":"ref_38","unstructured":"Griffith, V., and Koch, C. (2014). Guided Self-Organization: Inception, Springer."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Tax, T.M., Mediano, P.A., and Shanahan, M. (2017). The partial information decomposition of generative neural network models. Entropy, 19.","DOI":"10.3390\/e19090474"},{"key":"ref_40","unstructured":"Wollstadt, P., Schmitt, S., and Wibral, M. (2021). A rigorous information-theoretic definition of redundancy and relevancy in feature selection based on (partial) information decomposition. arXiv."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"635","DOI":"10.1162\/003465305775098170","article-title":"A divergence statistic for industrial localization","volume":"87","author":"Mori","year":"2005","journal-title":"Rev. Econ. Stat."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"2392","DOI":"10.1109\/TIT.2009.2016060","article-title":"Divergence estimation for multidimensional densities via k-Nearest-Neighbor distances","volume":"55","author":"Wang","year":"2009","journal-title":"IEEE Trans. Inf. Theory"},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"5847","DOI":"10.1109\/TIT.2010.2068870","article-title":"Estimating divergence functionals and the likelihood ratio by convex risk minimization","volume":"56","author":"Nguyen","year":"2010","journal-title":"IEEE Trans. Inf. 
Theory"},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"2570","DOI":"10.1162\/NECO_a_00646","article-title":"Nonparametric estimation of K\u00fcllback-Leibler divergence","volume":"26","author":"Zhang","year":"2014","journal-title":"Neural Comput."},{"key":"ref_45","unstructured":"Press, W.H., and Teukolsky, S.A. (1993). Numerical Recipes in Fortran: The Art of Scientific Computing, Cambridge University Press."},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"81","DOI":"10.1023\/A:1022694001379","article-title":"A distance-based attribute selection measure for decision tree induction","volume":"6","year":"1991","journal-title":"Mach. Learn."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"517","DOI":"10.1109\/TSMC.1987.4309069","article-title":"Entropy and correlation: Some comments","volume":"17","author":"Kvalseth","year":"1987","journal-title":"IEEE Trans. Syst. Man Cybern."},{"key":"ref_48","first-page":"583","article-title":"Cluster ensembles\u2014A knowledge reuse framework for combining multiple partitions","volume":"3","author":"Strehl","year":"2002","journal-title":"J. Mach. Learn. Res."},{"key":"ref_49","unstructured":"Yao, Y. (2003). Entropy Measures, Maximum Entropy Principle and Emerging Applications, Springer."},{"key":"ref_50","first-page":"2837","article-title":"Information theoretic measures for clusterings comparison: Variants, properties, normalization and correction for chance","volume":"11","author":"Vinh","year":"2010","journal-title":"J. Mach. Learn. Res."},{"key":"ref_51","unstructured":"Zhang, Z., and Stewart, A.M. (2016). Estimation of Standardized Mutual Information, UNC Charlotte Technical Report. Technical Report."},{"key":"ref_52","doi-asserted-by":"crossref","first-page":"1731","DOI":"10.1016\/j.jspi.2009.12.023","article-title":"Re-parameterization of multinomial distributions and diversity indices","volume":"140","author":"Zhang","year":"2010","journal-title":"J. Stat. Plan. 
Inference"},{"key":"ref_53","unstructured":"Chen, C. (2019). Goodness-of-Fit Tests under Permutations. [Ph.D. Thesis, The University of North Carolina at Charlotte]."},{"key":"ref_54","doi-asserted-by":"crossref","first-page":"688","DOI":"10.1038\/163688a0","article-title":"Measurement of diversity","volume":"163","author":"Simpson","year":"1949","journal-title":"Nature"},{"key":"ref_55","doi-asserted-by":"crossref","first-page":"124","DOI":"10.2307\/2223319","article-title":"Measurement of inequality of incomes","volume":"31","author":"Gini","year":"1921","journal-title":"Econ. J."},{"key":"ref_56","unstructured":"R\u00e9nyi, A. (1961, January 1). On measures of entropy and information. Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, CA, USA."},{"key":"ref_57","unstructured":"Emlen, J.M. (1977). Ecology: An Evolutionary Approach, Addison-Wesley."},{"key":"ref_58","doi-asserted-by":"crossref","first-page":"307","DOI":"10.1080\/03610926.2018.1536786","article-title":"Estimation of population size in entropic perspective","volume":"49","author":"Zhang","year":"2020","journal-title":"Commun. Stat. Theory Methods"},{"key":"ref_59","unstructured":"Beck, C., and Sch\u00f6gl, F. (1995). Thermodynamics of Chaotic Systems, Cambridge University Press."},{"key":"ref_60","doi-asserted-by":"crossref","unstructured":"Zhang, J., and Shi, J. (2022). Asymptotic Normality for Plug-In Estimators of Generalized Shannon\u2019s Entropy. Entropy, 24.","DOI":"10.3390\/e24050683"},{"key":"ref_61","unstructured":"Zhang, J., and Zhang, Z. (2022). A Normal Test for Independence via Generalized Mutual Information. arXiv."},{"key":"ref_62","doi-asserted-by":"crossref","first-page":"6053","DOI":"10.1109\/TIT.2016.2604842","article-title":"Estimating the directed information and testing for causality","volume":"62","author":"Kontoyiannis","year":"2016","journal-title":"IEEE Trans. Inf. 
Theory"},{"key":"ref_63","doi-asserted-by":"crossref","unstructured":"Huang, N., Lu, G., Cai, G., Xu, D., Xu, J., Li, F., and Zhang, L. (2016). Feature selection of power quality disturbance signals with an entropy-importance-based random forest. Entropy, 18.","DOI":"10.3390\/e18020044"},{"key":"ref_64","first-page":"27","article-title":"Conditional likelihood maximisation: A unifying framework for information theoretic feature selection","volume":"13","author":"Brown","year":"2012","journal-title":"J. Mach. Learn. Res."},{"key":"ref_65","doi-asserted-by":"crossref","unstructured":"Lewis, D.D. (1992, January 23\u201326). Feature selection and feature extraction for text categorization. Proceedings of the Speech and Natural Language: Proceedings of a Workshop Held at Harriman, New York, NY, USA.","DOI":"10.3115\/1075527.1075574"},{"key":"ref_66","doi-asserted-by":"crossref","first-page":"537","DOI":"10.1109\/72.298224","article-title":"Using mutual information for selecting features in supervised neural net learning","volume":"5","author":"Battiti","year":"1994","journal-title":"IEEE Trans. Neural Netw."},{"key":"ref_67","unstructured":"Yang, H., and Moody, J. (1999, January 22\u201325). Feature selection based on joint mutual information. Proceedings of the International ICSC Symposium on Advances in Intelligent Data Analysis, Rochester, NY, USA."},{"key":"ref_68","doi-asserted-by":"crossref","first-page":"682","DOI":"10.1038\/nn870","article-title":"Visual features of intermediate complexity and their use in classification","volume":"5","author":"Ullman","year":"2002","journal-title":"Nat. Neurosci."},{"key":"ref_69","first-page":"1205","article-title":"Efficient feature selection via analysis of relevance and redundancy","volume":"5","author":"Yu","year":"2004","journal-title":"J. Mach. Learn. Res."},{"key":"ref_70","unstructured":"Tesmer, M., and Est\u00e9vez, P.A. (2004, January 25\u201329). AMIFS: Adaptive feature selection by using mutual information. 
Proceedings of the 2004 IEEE International Joint Conference on Neural Networks (IEEE Cat. No. 04CH37541), Budapest, Hungary."},{"key":"ref_71","first-page":"1531","article-title":"Fast binary feature selection with conditional mutual information","volume":"5","author":"Fleuret","year":"2004","journal-title":"J. Mach. Learn. Res."},{"key":"ref_72","doi-asserted-by":"crossref","first-page":"1226","DOI":"10.1109\/TPAMI.2005.159","article-title":"Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy","volume":"27","author":"Peng","year":"2005","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_73","unstructured":"Jakulin, A. (2005). Machine Learning Based on Attribute Interactions. [Ph.D. Thesis, Univerza v Ljubljani]."},{"key":"ref_74","doi-asserted-by":"crossref","unstructured":"Lin, D., and Tang, X. (2006, January 7\u201313). Conditional infomax learning: An integrated framework for feature extraction and fusion. Proceedings of the European Conference on Computer Vision, Graz, Austria.","DOI":"10.1007\/11744023_6"},{"key":"ref_75","doi-asserted-by":"crossref","unstructured":"Meyer, P.E., and Bontempi, G. (2006, January 10\u201312). On the use of variable complementarity for feature selection in cancer classification. Proceedings of the Workshops on Applications of Evolutionary Computation, Budapest, Hungary.","DOI":"10.1007\/11732242_9"},{"key":"ref_76","first-page":"116","article-title":"A powerful feature selection approach based on mutual information","volume":"8","author":"Aboutajdine","year":"2008","journal-title":"Int. J. Comput. Sci. Netw. Secur."},{"key":"ref_77","first-page":"36","article-title":"Gait feature subset selection by mutual information","volume":"39","author":"Guo","year":"2008","journal-title":"IEEE Trans. Syst. Man Cybern. Part A Syst. Hum."},{"key":"ref_78","doi-asserted-by":"crossref","first-page":"210","DOI":"10.4218\/etrij.11.0110.0237","article-title":"Conditional Mutual Information-Based Feature Selection Analyzing for Synergy and Redundancy","volume":"33","author":"Cheng","year":"2011","journal-title":"ETRI J."},{"key":"ref_79","doi-asserted-by":"crossref","unstructured":"Singhal, A., and Sharma, D. (2021, January 19\u201320). Keyword extraction using Renyi entropy: A statistical and domain independent method. Proceedings of the 2021 7th International Conference on Advanced Computing and Communication Systems (ICACCS), Coimbatore, India.","DOI":"10.1109\/ICACCS51430.2021.9441909"},{"key":"ref_80","unstructured":"(2022, September 27). R Package Entropy. Available online: https:\/\/cran.r-project.org\/web\/packages\/entropy\/index.html."},{"key":"ref_81","unstructured":"(2022, September 27). R Package Bootstrap. Available online: https:\/\/cran.r-project.org\/web\/packages\/bootstrap\/index.html."},{"key":"ref_82","unstructured":"(2022, September 27). R Package EntropyEstimation. 
Available online: https:\/\/cran.r-project.org\/web\/packages\/EntropyEstimation\/index.html."}],"container-title":["Machine Learning and Knowledge Extraction"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-4990\/4\/4\/44\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T00:44:44Z","timestamp":1760143484000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-4990\/4\/4\/44"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,9,30]]},"references-count":82,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2022,12]]}},"alternative-id":["make4040044"],"URL":"https:\/\/doi.org\/10.3390\/make4040044","relation":{},"ISSN":["2504-4990"],"issn-type":[{"type":"electronic","value":"2504-4990"}],"subject":[],"published":{"date-parts":[[2022,9,30]]}}}