{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,2]],"date-time":"2026-06-02T22:36:38Z","timestamp":1780439798278,"version":"3.54.1"},"reference-count":50,"publisher":"MDPI AG","issue":"5","license":[{"start":{"date-parts":[[2024,4,30]],"date-time":"2024-04-30T00:00:00Z","timestamp":1714435200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001659","name":"Deutsche Forschungsgemeinschaft (DFG, German Research Foundation)","doi-asserted-by":"publisher","award":["EXC 2075\u2013390740016"],"award-info":[{"award-number":["EXC 2075\u2013390740016"]}],"id":[{"id":"10.13039\/501100001659","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001659","name":"Deutsche Forschungsgemeinschaft (DFG, German Research Foundation)","doi-asserted-by":"publisher","award":["507884992"],"award-info":[{"award-number":["507884992"]}],"id":[{"id":"10.13039\/501100001659","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],"abstract":"<jats:p>Using information-theoretic quantities in practical applications with continuous data is often hindered by the fact that probability density functions need to be estimated in higher dimensions, which can become unreliable or even computationally unfeasible. To make these useful quantities more accessible, alternative approaches such as binned frequencies using histograms and k-nearest neighbors (k-NN) have been proposed. However, a systematic comparison of the applicability of these methods has been lacking. We wish to fill this gap by comparing kernel-density-based estimation (KDE) with these two alternatives in carefully designed synthetic test cases. 
Specifically, we wish to estimate the information-theoretic quantities: entropy, Kullback\u2013Leibler divergence, and mutual information, from sample data. As a reference, the results are compared to closed-form solutions or numerical integrals. We generate samples from distributions of various shapes in dimensions ranging from one to ten. We evaluate the estimators\u2019 performance as a function of sample size, distribution characteristics, and chosen hyperparameters. We further compare the required computation time and specific implementation challenges. Notably, k-NN estimation tends to outperform other methods, considering algorithmic implementation, computational efficiency, and estimation accuracy, especially with sufficient data. This study provides valuable insights into the strengths and limitations of the different estimation methods for information-theoretic quantities. It also highlights the significance of considering the characteristics of the data, as well as the targeted information-theoretic quantity when selecting an appropriate estimation technique. These findings will assist scientists and practitioners in choosing the most suitable method, considering their specific application and available data. 
We have collected the compared estimation methods in a ready-to-use open-source Python 3 toolbox and, thereby, hope to promote the use of information-theoretic quantities by researchers and practitioners to evaluate the information in data and models in various disciplines.<\/jats:p>","DOI":"10.3390\/e26050387","type":"journal-article","created":{"date-parts":[[2024,4,30]],"date-time":"2024-04-30T09:50:07Z","timestamp":1714470607000},"page":"387","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":10,"title":["On the Accurate Estimation of Information-Theoretic Quantities from Multi-Dimensional Sample Data"],"prefix":"10.3390","volume":"26","author":[{"ORCID":"https:\/\/orcid.org\/0009-0002-8990-3785","authenticated-orcid":false,"given":"Manuel","family":"\u00c1lvarez Chaves","sequence":"first","affiliation":[{"name":"Stuttgart Center for Simulation Science, Cluster of Excellence EXC 2075, University of Stuttgart, 70569 Stuttgart, Germany"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9855-2839","authenticated-orcid":false,"given":"Hoshin V.","family":"Gupta","sequence":"additional","affiliation":[{"name":"Hydrology and Atmospheric Sciences, The University of Arizona, Tucson, AZ 85721, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3454-8755","authenticated-orcid":false,"given":"Uwe","family":"Ehret","sequence":"additional","affiliation":[{"name":"Institute of Water and River Basin Management, Karlsruhe Institute of Technology (KIT), 76131 Karlsruhe, Germany"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2901-1603","authenticated-orcid":false,"given":"Anneli","family":"Guthke","sequence":"additional","affiliation":[{"name":"Stuttgart Center for Simulation Science, Cluster of Excellence EXC 2075, University of Stuttgart, 70569 Stuttgart, 
Germany"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2024,4,30]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Cover, T.M., and Thomas, J.A. (2006). Elements of Information Theory, Wiley. [2nd ed.].","DOI":"10.1002\/047174882X"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"379","DOI":"10.1002\/j.1538-7305.1948.tb01338.x","article-title":"A Mathematical Theory of Communication","volume":"27","author":"Shannon","year":"1948","journal-title":"Bell Syst. Tech. J."},{"key":"ref_3","unstructured":"MacKay, D.J.C. (2003). Information Theory, Inference, and Learning Algorithms, Cambridge University Press."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"56","DOI":"10.1002\/2013EO050007","article-title":"Applying Information Theory in the Geosciences to Quantify Process Uncertainty, Feedback, Scale","volume":"94","author":"Ruddell","year":"2013","journal-title":"Eos Trans. Am. Geophys. Union"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Nowak, W., and Guthke, A. (2016). Entropy-Based Experimental Design for Optimal Model Discrimination in the Geosciences. Entropy, 18.","DOI":"10.3390\/e18110409"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Timme, N.M., and Lapish, C. (2018). A Tutorial for Information Theory in Neuroscience. eNeuro, 5.","DOI":"10.1523\/ENEURO.0052-18.2018"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"940","DOI":"10.1111\/joes.12226","article-title":"Information Theoretic Approaches in Economics","volume":"32","author":"Yang","year":"2018","journal-title":"J. Econ. Surv."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"119","DOI":"10.1007\/s10827-013-0458-4","article-title":"Synergy, redundancy, and multivariate information measures: An experimentalist\u2019s perspective","volume":"36","author":"Timme","year":"2014","journal-title":"J. Comput. 
Neurosci."},{"key":"ref_9","first-page":"17","article-title":"Nonparametric Entropy Estimation: An Overview","volume":"6","author":"Beirlant","year":"1997","journal-title":"Int. J. Math. Stat. Sci."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Gupta, H.V., Ehsani, M.R., Roy, T., Sans-Fuentes, M.A., Ehret, U., and Behrangi, A. (2021). Computing Accurate Probabilistic Estimates of One-D Entropy from Equiprobable Random Samples. Entropy, 23.","DOI":"10.3390\/e23060740"},{"key":"ref_11","unstructured":"Silverman, B.W. (1998). Density Estimation for Statistics and Data Analysis, Chapman & Hall\/CRC. Number 26 in Monographs on Statistics and Applied Probability."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"229","DOI":"10.1016\/S0169-7161(04)24009-3","article-title":"Multidimensional Density Estimation","volume":"Volume 24","author":"Scott","year":"2005","journal-title":"Handbook of Statistics"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"683","DOI":"10.1007\/BF00057735","article-title":"Estimation of entropy and other functionals of a multivariate density","volume":"41","author":"Joe","year":"1989","journal-title":"Ann. Inst. Stat. Math."},{"key":"ref_14","unstructured":"Bishop, C.M. (2006). Pattern Recognition and Machine Learning, Springer. Information Science and Statistics (ISS)."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Bossomaier, T., Barnett, L., Harr\u00e9, M., and Lizier, J.T. (2016). An Introduction to Transfer Entropy, Springer International Publishing.","DOI":"10.1007\/978-3-319-43222-9"},{"key":"ref_16","unstructured":"Liu, H., Lafferty, J., and Wasserman, L. (2017, January 20\u201322). Sparse Nonparametric Density Estimation in High Dimensions Using the Rodeo. 
Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics, Fort Lauderdale, FL, USA."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"4523","DOI":"10.5194\/hess-24-4523-2020","article-title":"Histogram via entropy reduction (HER): An information-theoretic alternative for geostatistics","volume":"24","author":"Thiesen","year":"2020","journal-title":"Hydrol. Earth Syst. Sci."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"e2021WR031164","DOI":"10.1029\/2021WR031164","article-title":"Source Relationships and Model Structures Determine Information Flow Paths in Ecohydrologic Models","volume":"58","author":"Goodwell","year":"2022","journal-title":"Water Resour. Res."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Kim, J., Kim, G., An, S., Kwon, Y.K., and Yoon, S. (2013). Entropy-Based Analysis and Bioinformatics-Inspired Integration of Global Economic Information Transfer. PLoS ONE, 8.","DOI":"10.1371\/journal.pone.0051986"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"9","DOI":"10.1142\/S201019451200788X","article-title":"EEG transfer entropy tracks changes in information transfer on the onset of vision","volume":"17","author":"Madulara","year":"2012","journal-title":"Int. J. Mod. Phys. Conf. Ser."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"066138","DOI":"10.1103\/PhysRevE.69.066138","article-title":"Estimating mutual information","volume":"69","author":"Kraskov","year":"2004","journal-title":"Phys. Rev. E"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"025006","DOI":"10.1088\/2632-2153\/acc444","article-title":"A robust estimator of mutual information for deep learning interpretability","volume":"4","author":"Piras","year":"2023","journal-title":"Mach. Learn. Sci. 
Technol."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"2392","DOI":"10.1109\/TIT.2009.2016060","article-title":"Divergence Estimation for Multidimensional Densities Via k-Nearest-Neighbor Distances","volume":"55","author":"Wang","year":"2009","journal-title":"IEEE Trans. Inf. Theory"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"1065","DOI":"10.1214\/aoms\/1177704472","article-title":"On Estimation of a Probability Density Function and Mode","volume":"33","author":"Parzen","year":"1962","journal-title":"Ann. Math. Stat."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"832","DOI":"10.1214\/aoms\/1177728190","article-title":"Remarks on Some Nonparametric Estimates of a Density Function","volume":"27","author":"Rosenblatt","year":"1956","journal-title":"Ann. Math. Stat."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"2318","DOI":"10.1103\/PhysRevE.52.2318","article-title":"Estimation of mutual information using kernel density estimators","volume":"52","author":"Moon","year":"1995","journal-title":"Phys. Rev. E"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"605","DOI":"10.1093\/biomet\/66.3.605","article-title":"On optimal and data-based histograms","volume":"66","author":"Scott","year":"1979","journal-title":"Biometrika"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Wasserman, L. (2004). All of Statistics: A Concise Course in Statistical Inference, Springer. 
Springer Texts in Statistics.","DOI":"10.1007\/978-0-387-21736-9"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"129","DOI":"10.1007\/BF02603004","article-title":"Bin width selection in multivariate histograms by the combinatorial method","volume":"13","author":"Devroye","year":"2004","journal-title":"Test"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"044002","DOI":"10.7566\/JPSJ.88.044002","article-title":"Multidimensional Bin-Width Optimization for Histogram and Its Application to Four-Dimensional Neutron Inelastic Scattering Data","volume":"88","author":"Muto","year":"2019","journal-title":"J. Phys. Soc. Jpn."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"65","DOI":"10.1080\/01621459.1926.10502161","article-title":"The Choice of a Class Interval","volume":"21","author":"Sturges","year":"1926","journal-title":"J. Am. Stat. Assoc."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"453","DOI":"10.1007\/BF01025868","article-title":"On the histogram as a density estimator: L2 theory","volume":"57","author":"Freedman","year":"1981","journal-title":"Z. f\u00fcr Wahrscheinlichkeitstheorie und Verwandte Geb."},{"key":"ref_33","unstructured":"Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep Learning, MIT Press. Adaptive Computation and Machine Learning."},{"key":"ref_34","first-page":"95","article-title":"A statistical estimate for the entropy of a random vector","volume":"23","author":"Kozachenko","year":"1987","journal-title":"Probl. Inf. Transm."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"69","DOI":"10.1016\/j.jspi.2017.01.004","article-title":"On the Kozachenko\u2013Leonenko entropy estimator","volume":"185","author":"Delattre","year":"2017","journal-title":"J. Stat. Plan. 
Inference"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"372","DOI":"10.1109\/TIT.1976.1055550","article-title":"A nonparametric estimation of the entropy for absolutely continuous distributions (Corresp.)","volume":"22","author":"Ahmad","year":"1976","journal-title":"IEEE Trans. Inf. Theory"},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"5629","DOI":"10.1109\/TIT.2018.2807481","article-title":"Demystifying Fixed k-Nearest Neighbor Information Estimators","volume":"64","author":"Gao","year":"2018","journal-title":"IEEE Trans. Inf. Theory"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Piessens, R. (1983). QUADPACK: A Subroutine Package for Automatic Integration, Springer.","DOI":"10.1007\/978-3-642-61786-7"},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"261","DOI":"10.1038\/s41592-019-0686-2","article-title":"SciPy 1.0: Fundamental algorithms for scientific computing in Python","volume":"17","author":"Virtanen","year":"2020","journal-title":"Nat. Methods"},{"key":"ref_40","unstructured":"Van Rossum, G., and Drake, F.L. (2009). Python 3 Reference Manual, CreateSpace."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"357","DOI":"10.1038\/s41586-020-2649-2","article-title":"Array programming with NumPy","volume":"585","author":"Harris","year":"2020","journal-title":"Nature"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Huber, M.F., Bailey, T., Durrant-Whyte, H., and Hanebeck, U.D. (2008, January 20\u201322). On entropy approximation for Gaussian mixture random vectors. 
Proceedings of the 2008 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems, Seoul, Republic of Korea.","DOI":"10.1109\/MFI.2008.4648062"},{"key":"ref_43","first-page":"58152","article-title":"On the Properties of Kullback-Leibler Divergence between Multivariate Gaussian Distributions","volume":"Volume 36","author":"Oh","year":"2023","journal-title":"Proceedings of the Advances in Neural Information Processing Systems"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Shiryayev, A.N. (1993). Selected Works of A. N. Kolmogorov: Volume III: Information Theory and the Theory of Algorithms, Springer. Mathematics and Its Applications.","DOI":"10.1007\/978-94-017-2973-4"},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"709","DOI":"10.1109\/18.825848","article-title":"Entropy expressions for multivariate continuous distributions","volume":"46","author":"Darbellay","year":"2000","journal-title":"IEEE Trans. Inf. Theory"},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"42","DOI":"10.1111\/j.1467-9469.2011.00774.x","article-title":"Shannon Entropy and Mutual Information for Multivariate Skew-Elliptical Distributions","volume":"40","author":"Genton","year":"2013","journal-title":"Scand. J. Stat."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"102581","DOI":"10.1016\/j.dsp.2019.102581","article-title":"Optimal data-based binning for histograms and histogram-based probability density models","volume":"95","author":"Knuth","year":"2019","journal-title":"Digit. Signal Process."},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"59","DOI":"10.1080\/00031305.1997.10473591","article-title":"Data-Based Choice of Histogram Bin Width","volume":"51","author":"Wand","year":"1997","journal-title":"Am. Stat."},{"key":"ref_49","first-page":"9990","article-title":"Entropy Estimation via Normalizing Flow","volume":"36","author":"Ao","year":"2022","journal-title":"Proc. AAAI Conf. Artif. 
Intell."},{"key":"ref_50","unstructured":"Belghazi, M.I., Baratin, A., Rajeshwar, S., Ozair, S., Bengio, Y., Courville, A., and Hjelm, D. (2018, January 10\u201315). Mutual Information Neural Estimation. Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden."}],"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1099-4300\/26\/5\/387\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T14:37:15Z","timestamp":1760107035000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1099-4300\/26\/5\/387"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,4,30]]},"references-count":50,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2024,5]]}},"alternative-id":["e26050387"],"URL":"https:\/\/doi.org\/10.3390\/e26050387","relation":{},"ISSN":["1099-4300"],"issn-type":[{"value":"1099-4300","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,4,30]]}}}