{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T01:59:33Z","timestamp":1760234373800,"version":"build-2065373602"},"reference-count":18,"publisher":"MDPI AG","issue":"5","license":[{"start":{"date-parts":[[2021,5,11]],"date-time":"2021-05-11T00:00:00Z","timestamp":1620691200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],"abstract":"<jats:p>We develop Categorical Exploratory Data Analysis (CEDA) with mimicking to explore and exhibit the complexity of information content that is contained within any data matrix: categorical, discrete, or continuous. Such complexity is shown through visible and explainable serial multiscale structural dependency with heterogeneity. CEDA is developed upon all features\u2019 categorical nature via histogram and it is guided by all features\u2019 associative patterns (order-2 dependence) in a mutual conditional entropy matrix. Higher-order structural dependency of k(\u22653) features is exhibited through block patterns within heatmaps that are constructed by permuting contingency-kD-lattices of counts. By growing k, the resultant heatmap series contains global and large scales of structural dependency that constitute the data matrix\u2019s information content. When involving continuous features, the principal component analysis (PCA) extracts fine-scale information content from each block in the final heatmap. Our mimicking protocol coherently simulates this heatmap series by preserving global-to-fine scales structural dependency. Upon every step of mimicking process, each accepted simulated heatmap is subject to constraints with respect to all of the reliable observed categorical patterns. 
For reliability and robustness in sciences, CEDA with mimicking enhances data visualization by revealing deterministic and stochastic structures within each scale-specific structural dependency. For inferences in Machine Learning (ML) and Statistics, it clarifies, upon which scales, which covariate feature-groups have major-vs.-minor predictive powers on response features. For the social justice of Artificial Intelligence (AI) products, it checks whether a data matrix incompletely prescribes the targeted system.<\/jats:p>","DOI":"10.3390\/e23050594","type":"journal-article","created":{"date-parts":[[2021,5,11]],"date-time":"2021-05-11T10:20:36Z","timestamp":1620728436000},"page":"594","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":6,"title":["Mimicking Complexity of Structured Data Matrix\u2019s Information Content: Categorical Exploratory Data Analysis"],"prefix":"10.3390","volume":"23","author":[{"given":"Fushing","family":"Hsieh","sequence":"first","affiliation":[{"name":"Department of Statistics, University of California at Davis, Davis, CA 95616, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8983-142X","authenticated-orcid":false,"given":"Elizabeth P.","family":"Chou","sequence":"additional","affiliation":[{"name":"Department of Statistics, National Chengchi University, Taibei 116, Taiwan"}]},{"given":"Ting-Li","family":"Chen","sequence":"additional","affiliation":[{"name":"Institute of Statistical Science, Academia Sinica, Taipei 115, Taiwan"}]}],"member":"1968","published-online":{"date-parts":[[2021,5,11]]},"reference":[{"key":"ref_1","unstructured":"Steinbeck, J. (1951). The chapter of March, 20, Easter. 
The Log From The Sea of Cortez, The Viking Press."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"393","DOI":"10.1126\/science.177.4047.393","article-title":"More is different","volume":"177","author":"Anderson","year":"1972","journal-title":"Science"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"745","DOI":"10.1080\/10618600.2017.1384734","article-title":"50 years of data science","volume":"26","author":"Donoho","year":"2017","journal-title":"J. Comput. Graph. Stat."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"369","DOI":"10.1111\/j.1751-5823.2003.tb00203.x","article-title":"A Bayesian formulation of exploratory data analysis and goodness-of-fit testing","volume":"71","author":"Gelman","year":"2003","journal-title":"Int. Stat. Rev."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Gelman, A., and Vehtari, A. (2021). What are the most important statistical ideas of the past 50 years?. arXiv.","DOI":"10.1080\/01621459.2021.1938081"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1214\/aoms\/1177704711","article-title":"The future of data analysis","volume":"33","author":"Tukey","year":"1962","journal-title":"Ann. Math. Statist."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"589","DOI":"10.1119\/1.1934921","article-title":"Effect of Spin and Speed on the Lateral Deflection (Curve) of a Baseball and the Magnus Effect for Smooth Spheres","volume":"27","author":"Briggs","year":"1959","journal-title":"Am. J. Phys."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"171026","DOI":"10.1098\/rsos.171026","article-title":"Complexity of Possibly-gapped Histogram and Analysis of Histogram (ANOHT)","volume":"5","author":"Fushing","year":"2018","journal-title":"R. Society Open Sci."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Fushing, H., Liu, S.-Y., Hsieh, Y.-C., and McCowan, B. (2018). 
From patterned response dependency to structured covariate dependency: Categorical-pattern-matching. PLoS ONE.","DOI":"10.1371\/journal.pone.0198253"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Cox, D.R., and Hinkley, D.V. (1974). Theoretical Statistics, Chapman and Hall.","DOI":"10.1007\/978-1-4899-2887-0"},{"key":"ref_11","unstructured":"Tufte, E.R. (1983). The Visual Display of Quantitative Information, Graphics Press."},{"key":"ref_12","unstructured":"Wilkinson, L. (2005). The Grammar of Graphics, Springer. [2nd ed.]."},{"key":"ref_13","unstructured":"Li, M., and Vitanyi, P.M.B. (2009). An Introduction to Kolmogorov Complexity and Its Applications, Springer."},{"key":"ref_14","unstructured":"Chou, E.P.-T., McVey, C., Hsieh, Y.-C., Enriquez, S., and Fushing, H. (2020). Extreme-K categorical samples problem. arXiv."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"7821","DOI":"10.1073\/pnas.122653799","article-title":"Community structure in social and biological networks","volume":"99","author":"Girvan","year":"2002","journal-title":"Proc. Natl. Acad. Sci. USA."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"041120","DOI":"10.1103\/PhysRevE.86.041120","article-title":"Multi-scale community geometry in network and its application","volume":"86","author":"Chen","year":"2012","journal-title":"Phys. Rev. E"},{"key":"ref_17","unstructured":"Fushing, H., and Chou, E.P. (2020). Categorical Exploratory Data Analysis: From Multiclass Classification and Response Manifold Analytics perspectives of baseball pitching dynamics. arXiv."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1214\/aos\/1176344552","article-title":"Bootstrap methods: Another look at the jackknife","volume":"7","author":"Efron","year":"1979","journal-title":"Ann. 
Statist."}],"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1099-4300\/23\/5\/594\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T05:59:17Z","timestamp":1760162357000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1099-4300\/23\/5\/594"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,5,11]]},"references-count":18,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2021,5]]}},"alternative-id":["e23050594"],"URL":"https:\/\/doi.org\/10.3390\/e23050594","relation":{},"ISSN":["1099-4300"],"issn-type":[{"type":"electronic","value":"1099-4300"}],"subject":[],"published":{"date-parts":[[2021,5,11]]}}}