{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,8]],"date-time":"2025-12-08T07:11:23Z","timestamp":1765177883611,"version":"build-2065373602"},"reference-count":26,"publisher":"MDPI AG","issue":"8","license":[{"start":{"date-parts":[[2018,8,13]],"date-time":"2018-08-13T00:00:00Z","timestamp":1534118400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100009133","name":"Karlsruher Institut f\u00fcr Technologie","doi-asserted-by":"publisher","award":["KIT publication fund"],"award-info":[{"award-number":["KIT publication fund"]}],"id":[{"id":"10.13039\/100009133","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],"abstract":"<jats:p>When constructing discrete (binned) distributions from samples of a data set, applications exist where it is desirable to assure that all bins of the sample distribution have nonzero probability. For example, if the sample distribution is part of a predictive model for which we require returning a response for the entire codomain, or if we use Kullback\u2013Leibler divergence to measure the (dis-)agreement of the sample distribution and the original distribution of the variable, which, in the described case, is inconveniently infinite. Several sample-based distribution estimators exist which assure nonzero bin probability, such as adding one counter to each zero-probability bin of the sample histogram, adding a small probability to the sample pdf, smoothing methods such as Kernel-density smoothing, or Bayesian approaches based on the Dirichlet and Multinomial distribution. Here, we suggest and test an approach based on the Clopper\u2013Pearson method, which makes use of the binominal distribution. Based on the sample distribution, confidence intervals for bin-occupation probability are calculated. The mean of each confidence interval is a strictly positive estimator of the true bin-occupation probability and is convergent with increasing sample size. For small samples, it converges towards a uniform distribution, i.e., the method effectively applies a maximum entropy approach. We apply this nonzero method and four alternative sample-based distribution estimators to a range of typical distributions (uniform, Dirac, normal, multimodal, and irregular) and measure the effect with Kullback\u2013Leibler divergence. While the performance of each method strongly depends on the distribution type it is applied to, on average, and especially for small sample sizes, the nonzero, the simple \u201cadd one counter\u201d, and the Bayesian Dirichlet-multinomial model show very similar behavior and perform best. We conclude that, when estimating distributions without an a priori idea of their shape, applying one of these methods is favorable.<\/jats:p>","DOI":"10.3390\/e20080601","type":"journal-article","created":{"date-parts":[[2018,8,13]],"date-time":"2018-08-13T11:27:13Z","timestamp":1534159633000},"page":"601","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":10,"title":["A Maximum-Entropy Method to Estimate Discrete Distributions from Samples Ensuring Nonzero Probabilities"],"prefix":"10.3390","volume":"20","author":[{"given":"Paul","family":"Darscheid","sequence":"first","affiliation":[{"name":"Institute of Water Resources and River Basin Management, Karlsruhe Institute of Technology\u2014KIT, 76131 Karlsruhe, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2901-1603","authenticated-orcid":false,"given":"Anneli","family":"Guthke","sequence":"additional","affiliation":[{"name":"Institute for Modelling Hydraulic and Environmental Systems (IWS), University of Stuttgart, 70569 Stuttgart, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3454-8755","authenticated-orcid":false,"given":"Uwe","family":"Ehret","sequence":"additional","affiliation":[{"name":"Institute of Water Resources and River Basin Management, Karlsruhe Institute of Technology\u2014KIT, 76131 Karlsruhe, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2018,8,13]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"79","DOI":"10.1214\/aoms\/1177729694","article-title":"On Information and Sufficiency","volume":"22","author":"Kullback","year":"1951","journal-title":"Ann. Math. Stat."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"380","DOI":"10.1061\/(ASCE)0733-9399(2002)128:4(380)","article-title":"Bayesian Updating of Structural Models and Reliability using Markov Chain Monte Carlo Simulation","volume":"128","author":"Beck","year":"2002","journal-title":"J. Eng. Mech."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"139","DOI":"10.1016\/S0167-4730(02)00047-4","article-title":"Important sampling in high dimensions","volume":"25","author":"Au","year":"2003","journal-title":"Struct. Saf."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Kavetski, D., Kuczera, G., and Franks, S.W. (2006). Bayesian analysis of input uncertainty in hydrological modeling: 1. Theory. Water Resour. Res., 42.","DOI":"10.1029\/2005WR004368"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1080\/02626667.2014.983516","article-title":"Robust informational entropy-based descriptors of flow in catchment hydrology","volume":"61","author":"Pechlivanidis","year":"2016","journal-title":"Hydrol. Sci. J."},{"key":"ref_6","unstructured":"Knuth, K.H. (arXiv, 2013). Optimal Data-Based Binning for Histograms, arXiv."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"1134","DOI":"10.1103\/PhysRevA.33.1134","article-title":"Independent coordinates for strange attractors from mutual information","volume":"33","author":"Fraser","year":"1986","journal-title":"Phys. Rev. A"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"1315","DOI":"10.1109\/18.761290","article-title":"Estimation of the information by an adaptive partitioning of the observation space","volume":"45","author":"Darbellay","year":"1999","journal-title":"IEEE Trans. Inf. Theory"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"066138","DOI":"10.1103\/PhysRevE.69.066138","article-title":"Estimating mutual information","volume":"69","author":"Kraskov","year":"2004","journal-title":"Phys. Rev. E"},{"key":"ref_10","first-page":"423","article-title":"Nonlinear Kernel Density Estimation for Binned Data: Convergence in Entropy","volume":"8","author":"Blower","year":"2002","journal-title":"Bernoulli"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Simonoff, J.S. (1996). Smoothing Methods in Statistics, Springer.","DOI":"10.1007\/978-1-4612-4026-6"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"237","DOI":"10.1093\/biomet\/40.3-4.237","article-title":"The Population Frequencies of Species and the Estimation of Population Parameters","volume":"40","author":"Good","year":"1953","journal-title":"Biometrika"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"783","DOI":"10.2307\/3315916","article-title":"Confidence curves and improved exact confidence intervals for discrete distributions","volume":"28","author":"Blaker","year":"2000","journal-title":"Can. J. Stat."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"963","DOI":"10.1111\/j.0006-341X.2001.00963.x","article-title":"On Small-Sample Confidence Intervals for Parameters in Discrete Distributions","volume":"57","author":"Agresti","year":"2001","journal-title":"Biometrics"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"809","DOI":"10.1002\/sim.4780120902","article-title":"Confidence intervals for a binomial proportion","volume":"12","author":"Vollset","year":"1993","journal-title":"Stat. Med."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"577","DOI":"10.1080\/01621459.1995.10476550","article-title":"Bayesian Density Estimation and Inference Using Mixtures","volume":"90","author":"Escobar","year":"1995","journal-title":"J. Am. Stat. Assoc."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"816","DOI":"10.1016\/j.csda.2009.11.002","article-title":"Bayesian density estimation and model selection using nonparametric hierarchical mixtures","volume":"54","author":"Argiento","year":"2010","journal-title":"Comput. Stat. Data Anal."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"63","DOI":"10.1080\/00031305.1996.10473544","article-title":"A Comparison of Approximate Interval Estimators for the Bernoulli Parameter","volume":"50","author":"Leemis","year":"1996","journal-title":"Am. Stat."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"404","DOI":"10.1093\/biomet\/26.4.404","article-title":"The Use of Confidence or Fiducial Limits Illustrated in the Case of the Binomial","volume":"26","author":"Clopper","year":"1934","journal-title":"Biometrika"},{"key":"ref_20","unstructured":"Larson, H.J. (1982). Introduction to Probability Theory and Statistical Inference, Wiley."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Bickel, P.J., and Doksum, K.A. (2015). Mathematical Statistics, CRC Press.","DOI":"10.1201\/b19822"},{"key":"ref_22","unstructured":"Bishop, C.M. (2006). Pattern Recognition and Machine Learning, Springer."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"101","DOI":"10.1214\/ss\/1009213286","article-title":"Interval estimation for a binomial proportion","volume":"16","author":"Brown","year":"2001","journal-title":"Stat. Sci."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"843","DOI":"10.1080\/01621459.1986.10478343","article-title":"Approximate binomial confidence limits","volume":"81","author":"Blyth","year":"1986","journal-title":"J. Am. Stat. Assoc."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Johnson, N.L., Kemp, A.W., and Kotz, S. (2005). Univariate Discrete Distributions, Wiley.","DOI":"10.1002\/0471715816"},{"key":"ref_26","unstructured":"Olver, F.W., Lozier, D.W., Boisvert, R.F., and Clark, C.W. (2010). NIST Handbook of Mathematical Functions, Cambridge University Press. [1st ed.]."}],"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1099-4300\/20\/8\/601\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T15:18:30Z","timestamp":1760195910000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1099-4300\/20\/8\/601"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,8,13]]},"references-count":26,"journal-issue":{"issue":"8","published-online":{"date-parts":[[2018,8]]}},"alternative-id":["e20080601"],"URL":"https:\/\/doi.org\/10.3390\/e20080601","relation":{},"ISSN":["1099-4300"],"issn-type":[{"type":"electronic","value":"1099-4300"}],"subject":[],"published":{"date-parts":[[2018,8,13]]}}}