{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T02:08:41Z","timestamp":1760148521348,"version":"build-2065373602"},"reference-count":47,"publisher":"MDPI AG","issue":"5","license":[{"start":{"date-parts":[[2023,5,4]],"date-time":"2023-05-04T00:00:00Z","timestamp":1683158400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"National Science and Technology Council Taiwan","award":["110-2410-H-001-046"],"award-info":[{"award-number":["110-2410-H-001-046"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],"abstract":"<jats:p>This paper proposed the use of mutual information (MI) decomposition as a novel approach to identifying indispensable variables and their interactions for contingency table analysis. The MI analysis identified subsets of associative variables based on multinomial distributions and validated parsimonious log-linear and logistic models. The proposed approach was assessed using two real-world datasets dealing with ischemic stroke (with 6 risk factors) and banking credit (with 21 discrete attributes in a sparse table). This paper also provided an empirical comparison of MI analysis versus two state-of-the-art methods in terms of variable and model selections. 
The proposed MI analysis scheme can be used in the construction of parsimonious log-linear and logistic models with a concise interpretation of discrete multivariate data.<\/jats:p>","DOI":"10.3390\/e25050750","type":"journal-article","created":{"date-parts":[[2023,5,4]],"date-time":"2023-05-04T02:03:18Z","timestamp":1683165798000},"page":"750","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Modeling Categorical Variables by Mutual Information Decomposition"],"prefix":"10.3390","volume":"25","author":[{"given":"Jiun-Wei","family":"Liou","sequence":"first","affiliation":[{"name":"Department of Electrical Engineering, Ming Chi University of Technology, New Taipei City 243, Taiwan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0590-7977","authenticated-orcid":false,"given":"Michelle","family":"Liou","sequence":"additional","affiliation":[{"name":"Institute of Statistical Science, Academia Sinica, Taipei 115, Taiwan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Philip E.","family":"Cheng","sequence":"additional","affiliation":[{"name":"Institute of Statistical Science, Academia Sinica, Taipei 115, Taiwan"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2023,5,4]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"69","DOI":"10.1080\/00401706.1970.10488635","article-title":"Ridge regression: Applications to nonorthogonal problems","volume":"12","author":"Hoerl","year":"1970","journal-title":"Technometrics"},{"key":"ref_2","first-page":"661","article-title":"Some comments on Cp","volume":"15","author":"Mallows","year":"1973","journal-title":"Technometrics"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"716","DOI":"10.1109\/TAC.1974.1100705","article-title":"A new look at the statistical model 
identification","volume":"19","author":"Akaike","year":"1974","journal-title":"IEEE Trans. Autom. Control."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"461","DOI":"10.1214\/aos\/1176344136","article-title":"Estimating the dimension of a model","volume":"6","author":"Schwarz","year":"1978","journal-title":"Ann. Stat."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"373","DOI":"10.1080\/00401706.1995.10484371","article-title":"Better subset regression using the nonnegative garrote","volume":"37","author":"Breiman","year":"1995","journal-title":"Technometrics"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"261","DOI":"10.1177\/0049124104268644","article-title":"Multimodel inference: Understanding AIC and BIC in model selection","volume":"33","author":"Burnham","year":"2004","journal-title":"Sociol. Methods Res."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Fahrmeir, L., and Tutz, G. (1994). Multivariate Statistical Modeling Based on Generalized Linear Models, Springer.","DOI":"10.1007\/978-1-4899-0010-4"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"173","DOI":"10.1007\/BF02932566","article-title":"Finite sample selection criteria for multinomial models","volume":"27","author":"Linhart","year":"1986","journal-title":"Stat. Hefte"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"13","DOI":"10.1080\/03610927808827599","article-title":"Further analysis of the data by Akaike\u2019s information criterion and the finite corrections","volume":"7","author":"Sugiura","year":"1978","journal-title":"Commun. Stat.-Theory Methods"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Morozova, O., Levina, O., Uuskula, A., and Heimer, R. (2015). Comparison of subset selection methods in linear regression in the context of health-related quality of life and substance abuse in Russia. BMC Med. Res. 
Methodol., 15.","DOI":"10.1186\/s12874-015-0066-2"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"733","DOI":"10.1007\/s10654-009-9411-2","article-title":"Variable selection: Current practice in epidemiological studies","volume":"24","author":"Walter","year":"2009","journal-title":"Eur. J. Epidemiol."},{"key":"ref_12","first-page":"2313","article-title":"The Dantzig selector: Statistical estimation when p is much larger than n","volume":"35","author":"Candes","year":"2007","journal-title":"Ann. Stat."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"1348","DOI":"10.1198\/016214501753382273","article-title":"Variable selection via nonconcave penalized likelihood and its oracle properties","volume":"96","author":"Fan","year":"2001","journal-title":"J. Am. Stat. Assoc."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"1436","DOI":"10.1214\/009053606000000281","article-title":"High-dimensional graphs and variable selection with the lasso","volume":"34","author":"Meinshausen","year":"2006","journal-title":"Ann. Stat."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"267","DOI":"10.1111\/j.2517-6161.1996.tb02080.x","article-title":"Regression shrinkage and selection via the lasso","volume":"58","author":"Tibshirani","year":"1996","journal-title":"J. R. Stat. Soc. Ser. B"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"1418","DOI":"10.1198\/016214506000000735","article-title":"The adaptive lasso and its oracle properties","volume":"101","author":"Zou","year":"2006","journal-title":"J. Am. Stat. Assoc."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"697","DOI":"10.1007\/s11222-019-09914-9","article-title":"High-dimensional regression in practice: An empirical study of finite-sample prediction, variable selection and ranking","volume":"30","author":"Wang","year":"2020","journal-title":"Stat. Comput."},{"key":"ref_18","unstructured":"Bishop, Y.M., Fienberg, S.E., and Holland, P.W. (2007). 
Discrete Multivariate Analysis: Theory and Practice, Springer."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Christensen, R. (1990). Log-Linear Models and Logistic Regression, Springer.","DOI":"10.1007\/978-1-4757-4111-7"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"226","DOI":"10.1080\/01621459.1970.10481076","article-title":"The multivariate analysis of qualitative data: Interactions among multiple classifications","volume":"65","author":"Goodman","year":"1970","journal-title":"J. Am. Stat. Assoc."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"McCullagh, P., and Nelder, J.A. (1989). Generalized Linear Models, Chapman and Hall.","DOI":"10.1007\/978-1-4899-3242-6"},{"key":"ref_22","first-page":"370","article-title":"Generalized Linear Models","volume":"135","author":"Nelder","year":"1972","journal-title":"Encycl. Stat. Sci."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"297","DOI":"10.6339\/JDS.2007.05(3).442","article-title":"Linear information models: An introduction","volume":"5","author":"Cheng","year":"2007","journal-title":"J. Data Sci."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"79","DOI":"10.1214\/aoms\/1177729694","article-title":"On information and sufficiency","volume":"22","author":"Kullback","year":"1951","journal-title":"Ann. Math. Stat."},{"key":"ref_25","first-page":"535","article-title":"Information identities and testing hypotheses: Power analysis for contingency tables","volume":"18","author":"Cheng","year":"2008","journal-title":"Stat. Sin."},{"key":"ref_26","unstructured":"Breslow, N.E., Day, N.E., and Heseltine, E. (1980). Statistical methods in cancer research, The Analysis of Case-Control Studies."},{"key":"ref_27","first-page":"719","article-title":"Statistical aspects of the analysis of data from retrospective studies of disease","volume":"22","author":"Mantel","year":"1959","journal-title":"J. Natl. 
Cancer Inst."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"740","DOI":"10.1198\/jasa.2010.tm09061","article-title":"Likelihood ratio tests with three-way tables","volume":"105","author":"Cheng","year":"2010","journal-title":"J. Am. Stat. Assoc."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Lehmann, E.L., Romano, J.P., and Casella, G. (1986). Testing Statistical Hypotheses, Wiley.","DOI":"10.1007\/978-1-4757-1923-9"},{"key":"ref_30","unstructured":"Agresti, A. (2013). Categorical Data Analysis, Wiley."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Everitt, B.S. (1992). The Analysis of Contingency Tables, Chapman and Hall.","DOI":"10.1201\/b15072"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Kateri, M. (2014). Contingency Table Analysis, Methods and Implementation Using R, Birkhauser.","DOI":"10.1007\/978-0-8176-4811-4"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Cover, T.M., and Thomas, J.A. (2006). Elements of Information Theory, Wiley.","DOI":"10.1002\/047174882X"},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"387","DOI":"10.6339\/JDS.2006.04(4).369","article-title":"Data information in contingency tables: A fallacy of hierarchical loglinear models","volume":"4","author":"Cheng","year":"2006","journal-title":"J. Data Sci."},{"key":"ref_35","first-page":"115","article-title":"Multidimensional contingency tables","volume":"1","author":"Anderson","year":"1974","journal-title":"Scand. J. Stat."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"522","DOI":"10.1214\/aos\/1176345006","article-title":"Markov fields and log-linear interaction models for contingency tables","volume":"8","author":"Darroch","year":"1980","journal-title":"Ann. 
Stat."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"e2311","DOI":"10.1097\/MD.0000000000002311","article-title":"Middle Cerebral Artery Calcification: Association with ischemic stroke","volume":"94","author":"Kao","year":"2015","journal-title":"Medicine"},{"key":"ref_38","unstructured":"Whittaker, J. (1990). Graphical Models in Applied Multivariate Statistics, Wiley."},{"key":"ref_39","first-page":"357","article-title":"Application of the logistic function to bio-assay","volume":"39","author":"Berkson","year":"1944","journal-title":"J. Am. Stat. Assoc."},{"key":"ref_40","unstructured":"Fahrmeir, L., Hamerle, A., and Tutz, G. (1984). Multivariate Statistische Verfahren [Multivariate Statistical Analyses], Walter de Gruyter. (In German)."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"849","DOI":"10.1111\/j.1467-9868.2008.00674.x","article-title":"Sure independence screening for ultrahigh dimensional feature space","volume":"70","author":"Fan","year":"2008","journal-title":"J. R. Stat. Soc. Ser. B"},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"3567","DOI":"10.1214\/10-AOS798","article-title":"Sure independence screening in generalized linear models with NP-dimensionality","volume":"38","author":"Fan","year":"2010","journal-title":"Ann. Stat."},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"1","DOI":"10.18637\/jss.v083.i02","article-title":"SIS: An R package for sure independence screening in ultrahigh dimensional statistical models","volume":"83","author":"Saldana","year":"2018","journal-title":"J. Stat. Softw."},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"301","DOI":"10.1111\/j.1467-9868.2005.00503.x","article-title":"Regularization and variable selection via the elastic net","volume":"67","author":"Zou","year":"2005","journal-title":"J. R. Stat. Soc. Ser. 
B"},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"237","DOI":"10.1080\/07350015.2013.863158","article-title":"Feature screening for ultrahigh dimensional categorical data with applications","volume":"32","author":"Huang","year":"2014","journal-title":"J. Bus. Econ. Stat."},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"1335","DOI":"10.1214\/09-AOAS265","article-title":"Discovering influential variables: A method of partitions","volume":"3","author":"Chernoff","year":"2009","journal-title":"Ann. Appl. Stat."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"13892","DOI":"10.1073\/pnas.1518285112","article-title":"Why significant variables aren\u2019t automatically good predictors","volume":"112","author":"Lo","year":"2015","journal-title":"Proc. Natl. Acad. Sci. USA"}],"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1099-4300\/25\/5\/750\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T19:28:51Z","timestamp":1760124531000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1099-4300\/25\/5\/750"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,5,4]]},"references-count":47,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2023,5]]}},"alternative-id":["e25050750"],"URL":"https:\/\/doi.org\/10.3390\/e25050750","relation":{},"ISSN":["1099-4300"],"issn-type":[{"type":"electronic","value":"1099-4300"}],"subject":[],"published":{"date-parts":[[2023,5,4]]}}}