{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T03:52:23Z","timestamp":1760241143586,"version":"build-2065373602"},"reference-count":50,"publisher":"MDPI AG","issue":"12","license":[{"start":{"date-parts":[[2019,11,29]],"date-time":"2019-11-29T00:00:00Z","timestamp":1574985600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],"abstract":"<jats:p>Health data are generally complex in type and small in sample size. Such domain-specific challenges make it difficult to capture information reliably and contribute further to the issue of generalization. To assist the analytics of healthcare datasets, we develop a feature selection method based on the concept of coverage adjusted standardized mutual information (CASMI). The main advantages of the proposed method are: (1) it selects features more efficiently with the help of an improved entropy estimator, particularly when the sample size is small; and (2) it automatically learns the number of features to be selected based on the information from sample data. Additionally, the proposed method handles feature redundancy from the perspective of joint-distribution. The proposed method focuses on non-ordinal data, while it works with numerical data with an appropriate binning method. 
A simulation study comparing the proposed method to six widely cited feature selection methods shows that the proposed method performs better when measured by the Information Recovery Ratio, particularly when the sample size is small.<\/jats:p>","DOI":"10.3390\/e21121179","type":"journal-article","created":{"date-parts":[[2019,11,29]],"date-time":"2019-11-29T10:58:21Z","timestamp":1575025101000},"page":"1179","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["CASMI\u2014An Entropic Feature Selection Method in Turing\u2019s Perspective"],"prefix":"10.3390","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9017-3006","authenticated-orcid":false,"given":"Jingyi","family":"Shi","sequence":"first","affiliation":[{"name":"Department of Mathematics and Statistics, Mississippi State University, Mississippi State, Starkville, MS 39762, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7527-758X","authenticated-orcid":false,"given":"Jialin","family":"Zhang","sequence":"additional","affiliation":[{"name":"Department of Mathematics and Statistics, Mississippi State University, Mississippi State, Starkville, MS 39762, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yaorong","family":"Ge","sequence":"additional","affiliation":[{"name":"Department of Software and Information Systems, The University of North Carolina at Charlotte, Charlotte, NC 28223, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2019,11,29]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"e38","DOI":"10.2196\/medinform.5359","article-title":"Challenges and opportunities of big data in health care: A systematic review","volume":"4","author":"Kruse","year":"2016","journal-title":"JMIR Med. 
Inform."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"3","DOI":"10.23876\/j.krcp.2017.36.1.3","article-title":"Medical big data: Promise and challenges","volume":"36","author":"Lee","year":"2017","journal-title":"Kidney Res. Clin. Pract."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"245","DOI":"10.1016\/S0004-3702(97)00063-5","article-title":"Selection of relevant features and examples in machine learning","volume":"97","author":"Blum","year":"1997","journal-title":"Artif. Intell."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"273","DOI":"10.1016\/S0004-3702(97)00043-X","article-title":"Wrappers for feature subset selection","volume":"97","author":"Kohavi","year":"1997","journal-title":"Artif. Intell."},{"key":"ref_5","first-page":"94","article-title":"Feature selection: A data perspective","volume":"50","author":"Li","year":"2017","journal-title":"ACM Comput. Surv. (CSUR)"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"189","DOI":"10.1016\/j.jbi.2018.07.014","article-title":"Relief-based feature selection: introduction and review","volume":"85","author":"Urbanowicz","year":"2018","journal-title":"J. Biomed. Inform."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"70","DOI":"10.1016\/j.neucom.2017.11.077","article-title":"Feature selection in machine learning: A new perspective","volume":"300","author":"Cai","year":"2018","journal-title":"Neurocomputing"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"379","DOI":"10.1002\/j.1538-7305.1948.tb01338.x","article-title":"A mathematical theory of communication","volume":"27","author":"Shannon","year":"1948","journal-title":"Bell Syst. Tech. J."},{"key":"ref_9","unstructured":"Duda, R.O., Hart, P.E., and Stork, D.G. (2012). 
Pattern Classification, John Wiley & Sons."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"23","DOI":"10.1023\/A:1025667309714","article-title":"Theoretical and empirical analysis of ReliefF and RReliefF","volume":"53","author":"Kononenko","year":"2003","journal-title":"Mach. Learn."},{"key":"ref_11","unstructured":"Nie, F., Xiang, S., Jia, Y., Zhang, C., and Yan, S. (2008, January 13\u201317). Trace Ratio Criterion for Feature Selection. Proceedings of the 23rd National Conference on Artificial Intelligence, Chicago, IL, USA."},{"key":"ref_12","unstructured":"Jordan, M.I., LeCun, Y., and Solla, S.A. (2006). Laplacian Score for Feature Selection. Advances in Neural Information Processing Systems, NIPS."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Zhao, Z., and Liu, H. (2007, January 20\u201324). Spectral Feature Selection for Supervised and Unsupervised Learning. Proceedings of the 24th International Conference on Machine Learning, Corvallis, OR, USA.","DOI":"10.1145\/1273496.1273641"},{"key":"ref_14","first-page":"7","article-title":"SLEP: Sparse learning with efficient projections","volume":"6","author":"Liu","year":"2009","journal-title":"Arizona State Univ."},{"key":"ref_15","unstructured":"Nie, F., Huang, H., Cai, X., and Ding, C.H. (2010). Efficient and Robust Feature Selection via Joint 2, 1-norms Minimization. Advances in Neural Information Processing Systems, NIPS."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Cai, D., Zhang, C., and He, X. (2010, January 25\u201328). Unsupervised Feature Selection for Multi-Cluster Data. 
Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA.","DOI":"10.1145\/1835804.1835848"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"1080","DOI":"10.1109\/TCBB.2010.103","article-title":"Robust feature selection for microarray data based on multicriterion fusion","volume":"8","author":"Yang","year":"2011","journal-title":"IEEE\/ACM Trans. Comput. Biol. Bioinform."},{"key":"ref_18","unstructured":"Li, Z., Yang, Y., Liu, J., Zhou, X., and Lu, H. (2012, January 22\u201326). Unsupervised Feature Selection Using Nonnegative Spectral Analysis. Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence, Toronto, ON, Canada."},{"key":"ref_19","unstructured":"Davis, J.C., and Sampson, R.J. (1986). Statistics and Data Analysis in Geology, Wiley."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"267","DOI":"10.1111\/j.2517-6161.1996.tb02080.x","article-title":"Regression shrinkage and selection via the lasso","volume":"58","author":"Tibshirani","year":"1996","journal-title":"J. R. Stat. Soc. Ser. B (Methodol.)"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Lewis, D.D. (1992, January 23\u201326). Feature Selection and Feature Extraction for Text Categorization. Proceedings of the Workshop on Speech and Natural Language. Association for Computational Linguistics, Harriman, NY, USA.","DOI":"10.3115\/1075527.1075574"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"537","DOI":"10.1109\/72.298224","article-title":"Using mutual information for selecting features in supervised neural net learning","volume":"5","author":"Battiti","year":"1994","journal-title":"IEEE Trans. Neural Netw."},{"key":"ref_23","unstructured":"Yang, H.H., and Moody, J. (2000). Data Visualization and Feature Selection: New Algorithms for Nongaussian Data. 
Advances in Neural Information Processing Systems, NIPS."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Vidal-Naquet, M., and Ullman, S. (2003, January 13\u201316). Object Recognition with Informative Features and Linear Classification. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Nice, France.","DOI":"10.1109\/ICCV.2003.1238356"},{"key":"ref_25","first-page":"1531","article-title":"Fast binary feature selection with conditional mutual information","volume":"5","author":"Fleuret","year":"2004","journal-title":"J. Mach. Learn. Res."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"1226","DOI":"10.1109\/TPAMI.2005.159","article-title":"Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy","volume":"27","author":"Peng","year":"2005","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Lin, D., and Tang, X. (2006, January 7\u201313). Conditional Infomax Learning: An Integrated Framework for Feature Extraction and Fusion. Proceedings of the European Conference on Computer Vision, Graz, Austria.","DOI":"10.1007\/11744023_6"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Meyer, P.E., and Bontempi, G. (2006, January 10\u201312). On the use of Variable Complementarity for Feature Selection in Cancer Classification. Proceedings of the Workshops on Applications of Evolutionary Computation, Budapest, Hungary.","DOI":"10.1007\/11732242_9"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"8520","DOI":"10.1016\/j.eswa.2015.07.007","article-title":"Feature selection using joint mutual information maximisation","volume":"42","author":"Bennasar","year":"2015","journal-title":"Expert Syst. Appl."},{"key":"ref_30","unstructured":"Liu, H., and Setiono, R. (1995, January 5\u20138). Chi2: Feature Selection and Discretization of Numeric Attributes. 
Proceedings of the 7th IEEE International Conference on Tools with Artificial Intelligence, Herndon, VA, USA."},{"key":"ref_31","first-page":"3","article-title":"Variabilita e mutabilita, Studi Economico-Giuridici della R","volume":"3","author":"Gini","year":"1912","journal-title":"Univ. Cagliari"},{"key":"ref_32","unstructured":"Hall, M.A., and Smith, L.A. (1999, January 1\u20135). Feature Selection for Machine Learning: Comparing a Correlation-Based Filter Approach to the Wrapper. Proceedings of the FLAIRS Conference, Orlando, FL, USA."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Harris, B. (1975). The Statistical Estimation of Entropy in the Non-Parametric Case, Wisconsin Univ-Madison Mathematics Research Center. Technical Report.","DOI":"10.21236\/ADA020217"},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"1191","DOI":"10.1162\/089976603321780272","article-title":"Estimation of entropy and mutual information","volume":"15","author":"Paninski","year":"2003","journal-title":"Neural Comput."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"237","DOI":"10.1093\/biomet\/40.3-4.237","article-title":"The population frequencies of species and the estimation of population parameters","volume":"40","author":"Good","year":"1953","journal-title":"Biometrika"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"79","DOI":"10.1214\/aoms\/1177729694","article-title":"On information and sufficiency","volume":"22","author":"Kullback","year":"1951","journal-title":"Ann. Math. Stat."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"51","DOI":"10.1016\/S0019-9958(78)90026-8","article-title":"A definition of conditional mutual information for arbitrary ensembles","volume":"38","author":"Wyner","year":"1978","journal-title":"Inf. Control"},{"key":"ref_38","unstructured":"Guiasu, S. (1977). Information Theory with Applications, McGraw-Hill."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Zhang, Z. (2016). 
Statistical Implications of Turing\u2019s Formula, John Wiley & Sons.","DOI":"10.1002\/9781119237150"},{"key":"ref_40","unstructured":"Ohannessian, M.I., and Dahleh, M.A. (2012, January 25\u201327). Rare Probability Estimation Under Regularly Varying Heavy Tails. Proceedings of the 25th Conference on Learning Theory (COLT 2012), Edinburgh, Scotland."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"1368","DOI":"10.1162\/NECO_a_00266","article-title":"Entropy estimation in Turing\u2019s perspective","volume":"24","author":"Zhang","year":"2012","journal-title":"Neural Comput."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"243","DOI":"10.1515\/sagmb-2014-0047","article-title":"A mutual information estimator with exponentially decaying bias","volume":"14","author":"Zhang","year":"2015","journal-title":"Stat. Appl. Genet. Mol. Biol."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Dougherty, J., Kohavi, R., and Sahami, M. (1995). Supervised and Unsupervised Discretization of Continuous Features. Machine Learning Proceedings 1995, Elsevier.","DOI":"10.1016\/B978-1-55860-377-6.50032-3"},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"581","DOI":"10.1093\/biomet\/63.3.581","article-title":"Inference and missing data","volume":"63","author":"Rubin","year":"1976","journal-title":"Biometrika"},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"1355","DOI":"10.1056\/NEJMsr1203730","article-title":"The prevention and treatment of missing data in clinical trials","volume":"367","author":"Little","year":"2012","journal-title":"N. Engl. J. Med."},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"402","DOI":"10.4097\/kjae.2013.64.5.402","article-title":"The prevention and handling of the missing data","volume":"64","author":"Kang","year":"2013","journal-title":"Korean J. Anesthesiol."},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Zhang, J., and Chen, C. (2018). 
On \u2019A mutual information estimator with exponentially decaying bias\u2019 by Zhang and Zheng. Stat. Appl. Genet. Mol. Biol., 17.","DOI":"10.1515\/sagmb-2018-0005"},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"168","DOI":"10.1016\/j.neucom.2016.11.047","article-title":"Theoretical evaluation of feature selection methods based on mutual information","volume":"226","author":"Pascoal","year":"2017","journal-title":"Neurocomputing"},{"key":"ref_49","unstructured":"Shi, J. (2019, November 01). CASMI Simulation R Codes. Available online: https:\/\/github.com\/JingyiShi\/CASMI\/blob\/master\/SimulationEvaluationUsingGroundTruth.R."},{"key":"ref_50","unstructured":"Shi, J. (2019, November 01). CASMI in R. Available online: https:\/\/github.com\/JingyiShi\/CASMI."}],"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1099-4300\/21\/12\/1179\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T13:38:51Z","timestamp":1760189931000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1099-4300\/21\/12\/1179"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,11,29]]},"references-count":50,"journal-issue":{"issue":"12","published-online":{"date-parts":[[2019,12]]}},"alternative-id":["e21121179"],"URL":"https:\/\/doi.org\/10.3390\/e21121179","relation":{},"ISSN":["1099-4300"],"issn-type":[{"type":"electronic","value":"1099-4300"}],"subject":[],"published":{"date-parts":[[2019,11,29]]}}}