{
  "status": "ok",
  "message-type": "work",
  "message-version": "1.0.0",
  "message": {
    "indexed": {
      "date-parts": [[2025, 10, 24]],
      "date-time": "2025-10-24T08:25:27Z",
      "timestamp": 1761294327415,
      "version": "build-2065373602"
    },
    "reference-count": 51,
    "publisher": "MDPI AG",
    "issue": "6",
    "license": [
      {
        "start": {
          "date-parts": [[2021, 6, 18]],
          "date-time": "2021-06-18T00:00:00Z",
          "timestamp": 1623974400000
        },
        "content-version": "vor",
        "delay-in-days": 0,
        "URL": "https://creativecommons.org/licenses/by/4.0/"
      }
    ],
    "content-domain": {"domain": [], "crossmark-restriction": false},
    "short-container-title": ["Entropy"],
    "abstract": "<jats:p>Learning and making inference from a finite set of samples are among the fundamental problems in science. In most popular applications, the paradigmatic approach is to seek a model that best explains the data. This approach has many desirable properties when the number of samples is large. However, in many practical setups, data acquisition is costly and only a limited number of samples is available. In this work, we study an alternative approach for this challenging setup. Our framework suggests that the role of the train-set is not to provide a single estimated model, which may be inaccurate due to the limited number of samples. Instead, we define a class of “reasonable” models. Then, the worst-case performance in the class is controlled by a minimax estimator with respect to it. Further, we introduce a robust estimation scheme that provides minimax guarantees, also for the case where the true model is not a member of the model class. Our results draw important connections to universal prediction, the redundancy-capacity theorem, and channel capacity theory. We demonstrate our suggested scheme in different setups, showing a significant improvement in worst-case performance over currently known alternatives.</jats:p>",
    "DOI": "10.3390/e23060773",
    "type": "journal-article",
    "created": {
      "date-parts": [[2021, 6, 18]],
      "date-time": "2021-06-18T11:19:20Z",
      "timestamp": 1624015160000
    },
    "page": "773",
    "update-policy": "https://doi.org/10.3390/mdpi_crossmark_policy",
    "source": "Crossref",
    "is-referenced-by-count": 5,
    "title": ["Robust Universal Inference"],
    "prefix": "10.3390",
    "volume": "23",
    "author": [
      {
        "ORCID": "https://orcid.org/0000-0002-5899-5608",
        "authenticated-orcid": false,
        "given": "Amichai",
        "family": "Painsky",
        "sequence": "first",
        "affiliation": [{"name": "The Industrial Engineering Department, Tel Aviv University, Tel Aviv 6997801, Israel"}]
      },
      {
        "ORCID": "https://orcid.org/0000-0002-1290-0482",
        "authenticated-orcid": false,
        "given": "Meir",
        "family": "Feder",
        "sequence": "additional",
        "affiliation": [{"name": "The School of Electrical Engineering, Tel Aviv University, Tel Aviv 6997801, Israel"}]
      }
    ],
    "member": "1968",
    "published-online": {"date-parts": [[2021, 6, 18]]},
    "reference": [
      {"unstructured": "Vapnik, V. (2013). The Nature of Statistical Learning Theory, Springer Science & Business Media.", "key": "ref_1"},
      {"unstructured": "Lehmann, E.L., and Casella, G. (2006). Theory of Point Estimation, Springer Science & Business Media.", "key": "ref_2"},
      {"key": "ref_3", "doi-asserted-by": "crossref", "first-page": "1135", "DOI": "10.1214/aos/1176345632", "article-title": "Estimation of the mean of a multivariate normal distribution", "volume": "9", "author": "Stein", "year": "1981", "journal-title": "Ann. Stat."},
      {"key": "ref_4", "doi-asserted-by": "crossref", "first-page": "1031", "DOI": "10.1214/aoms/1177703262", "article-title": "Uniform approximation of minimax point estimates", "volume": "35", "author": "Ghosh", "year": "1964", "journal-title": "Ann. Math. Stat."},
      {"key": "ref_5", "doi-asserted-by": "crossref", "first-page": "1416", "DOI": "10.1214/aos/1176347758", "article-title": "Minimax risk over hyperrectangles, and implications", "volume": "18", "author": "Donoho", "year": "1990", "journal-title": "Ann. Stat."},
      {"key": "ref_6", "doi-asserted-by": "crossref", "first-page": "1301", "DOI": "10.1214/aos/1176345646", "article-title": "Minimax estimation of the mean of a normal distribution when the parameter space is restricted", "volume": "9", "author": "Bickel", "year": "1981", "journal-title": "Ann. Stat."},
      {"key": "ref_7", "doi-asserted-by": "crossref", "first-page": "327", "DOI": "10.1016/S0167-7152(02)00089-5", "article-title": "On the minimax estimator of a bounded normal mean", "volume": "58", "author": "Marchand", "year": "2002", "journal-title": "Stat. Probab. Lett."},
      {"key": "ref_8", "first-page": "555", "article-title": "A robust minimax approach to classification", "volume": "3", "author": "Lanckriet", "year": "2002", "journal-title": "J. Mach. Learn. Res."},
      {"unstructured": "Eban, E., Mezuman, E., and Globerson, A. (2014, January 21–26). Discrete Chebyshev classifiers. Proceedings of the International Conference on Machine Learning, Beijing, China.", "key": "ref_9"},
      {"unstructured": "Razaviyayn, M., Farnia, F., and Tse, D. (2015). Discrete Rényi classifiers. Advances in Neural Information Processing Systems, MIT Press.", "key": "ref_10"},
      {"unstructured": "Farnia, F., and Tse, D. (2016). A minimax approach to supervised learning. Advances in Neural Information Processing Systems, Curran Associates Inc.", "key": "ref_11"},
      {"key": "ref_12", "doi-asserted-by": "crossref", "first-page": "446", "DOI": "10.1002/j.1538-7305.1948.tb01340.x", "article-title": "Spectra of Quantized Signals", "volume": "27", "author": "Bennett", "year": "1948", "journal-title": "Bell Syst. Tech. J."},
      {"unstructured": "Nisar, M.D. (2011). Minimax Robustness in Signal Processing for Communications, Shaker Verlag GmbH.", "key": "ref_13"},
      {"key": "ref_14", "doi-asserted-by": "crossref", "first-page": "433", "DOI": "10.1109/PROC.1985.13167", "article-title": "Robust techniques for signal processing: A survey", "volume": "73", "author": "Kassam", "year": "1985", "journal-title": "Proc. IEEE"},
      {"key": "ref_15", "doi-asserted-by": "crossref", "first-page": "2124", "DOI": "10.1109/18.720534", "article-title": "Universal prediction", "volume": "44", "author": "Merhav", "year": "1998", "journal-title": "IEEE Trans. Inf. Theory"},
      {"key": "ref_16", "doi-asserted-by": "crossref", "first-page": "328", "DOI": "10.1109/TIT.1984.1056876", "article-title": "On minimax robustness: A general approach and applications", "volume": "30", "author": "Verdu", "year": "1984", "journal-title": "IEEE Trans. Inf. Theory"},
      {"unstructured": "Takeuchi, J.I., and Barron, A.R. (1998, January 14–16). Robustly minimax codes for universal data compression. Proceedings of the ISITA, Mexico City, Mexico.", "key": "ref_17"},
      {"key": "ref_18", "doi-asserted-by": "crossref", "first-page": "1712", "DOI": "10.1109/18.930912", "article-title": "Strong optimality of the normalized ML models as universal codes and information in data", "volume": "47", "author": "Rissanen", "year": "2001", "journal-title": "IEEE Trans. Inf. Theory"},
      {"doi-asserted-by": "crossref", "unstructured": "Grünwald, P. (2012). The Safe Bayesian. International Conference on Algorithmic Learning Theory, Springer.", "key": "ref_19", "DOI": "10.1007/978-3-642-34106-9_16"},
      {"doi-asserted-by": "crossref", "unstructured": "Harremoës, P., and Tishby, N. (2007, January 24–29). The information bottleneck revisited or how to choose a good distortion measure. Proceedings of the 2007 IEEE International Symposium on Information Theory, Nice, France.", "key": "ref_20", "DOI": "10.1109/ISIT.2007.4557285"},
      {"unstructured": "Cover, T.M., and Thomas, J.A. (2012). Elements of Information Theory, John Wiley & Sons.", "key": "ref_21"},
      {"doi-asserted-by": "crossref", "unstructured": "Painsky, A., and Wornell, G. (2018, January 17–22). On the universality of the logistic loss function. Proceedings of the 2018 IEEE International Symposium on Information Theory (ISIT), Vail, CO, USA.", "key": "ref_22", "DOI": "10.1109/ISIT.2018.8437786"},
      {"key": "ref_23", "doi-asserted-by": "crossref", "first-page": "1658", "DOI": "10.1109/TIT.2019.2958705", "article-title": "Bregman Divergence Bounds and Universality Properties of the Logarithmic Loss", "volume": "66", "author": "Painsky", "year": "2019", "journal-title": "IEEE Trans. Inf. Theory"},
      {"unstructured": "Altman, D., Machin, D., Bryant, T., and Gardner, M. (2013). Statistics with Confidence: Confidence Intervals and Statistical Guidelines, John Wiley & Sons.", "key": "ref_24"},
      {"unstructured": "Gallager, R.G. (1968). Information Theory and Reliable Communication, Springer.", "key": "ref_25"},
      {"key": "ref_26", "doi-asserted-by": "crossref", "first-page": "101", "DOI": "10.1016/1385-7258(74)90000-6", "article-title": "On the Shannon capacity of an arbitrary channel", "volume": "77", "author": "Kemperman", "year": "1974", "journal-title": "Indagationes Mathematicae (Proceedings)"},
      {"key": "ref_27", "doi-asserted-by": "crossref", "first-page": "1276", "DOI": "10.1109/18.605594", "article-title": "A general minimax result for relative entropy", "volume": "43", "author": "Haussler", "year": "1997", "journal-title": "IEEE Trans. Inf. Theory"},
      {"unstructured": "Feder, M., and Polyanskiy, Y. (2021). Sequential prediction under log-loss and misspecification. arXiv.", "key": "ref_28"},
      {"key": "ref_29", "first-page": "3", "article-title": "Universal sequential coding of single messages", "volume": "23", "author": "Shtarkov", "year": "1988", "journal-title": "Probl. Inform. Transm."},
      {"doi-asserted-by": "crossref", "unstructured": "Raginsky, M. (2008, January 23–26). On the information capacity of Gaussian channels under small peak power constraints. Proceedings of the 2008 46th Annual Allerton Conference on Communication, Control, and Computing, Monticello, IL, USA.", "key": "ref_30", "DOI": "10.1109/ALLERTON.2008.4797569"},
      {"key": "ref_31", "doi-asserted-by": "crossref", "first-page": "2073", "DOI": "10.1109/TIT.2005.847707", "article-title": "Capacity-achieving probability measure for conditionally Gaussian channels with bounded inputs", "volume": "51", "author": "Chan", "year": "2005", "journal-title": "IEEE Trans. Inf. Theory"},
      {"key": "ref_32", "doi-asserted-by": "crossref", "first-page": "3907", "DOI": "10.1109/TIT.2018.2890208", "article-title": "On the capacity of the peak power constrained vector Gaussian channel: An estimation theoretic perspective", "volume": "65", "author": "Dytso", "year": "2019", "journal-title": "IEEE Trans. Inf. Theory"},
      {"key": "ref_33", "doi-asserted-by": "crossref", "first-page": "251", "DOI": "10.1016/S0378-3758(99)00047-6", "article-title": "Simultaneous confidence intervals for multinomial proportions", "volume": "82", "author": "Glaz", "year": "1999", "journal-title": "J. Stat. Plan. Inference"},
      {"key": "ref_34", "doi-asserted-by": "crossref", "first-page": "716", "DOI": "10.1214/aoms/1177703569", "article-title": "Simultaneous confidence intervals for contrasts among multinomial populations", "volume": "35", "author": "Goodman", "year": "1964", "journal-title": "Ann. Math. Stat."},
      {"key": "ref_35", "doi-asserted-by": "crossref", "first-page": "191", "DOI": "10.1080/00401706.1964.10490163", "article-title": "Large sample simultaneous confidence intervals for multinomial proportions", "volume": "6", "author": "Quesenberry", "year": "1964", "journal-title": "Technometrics"},
      {"doi-asserted-by": "crossref", "unstructured": "Powers, D.M. (1998). Applications and explanations of Zipf's law. New Methods in Language Processing and Computational Natural Language Learning, ACL.", "key": "ref_36", "DOI": "10.3115/1603899.1603924"},
      {"key": "ref_37", "doi-asserted-by": "crossref", "first-page": "125", "DOI": "10.1016/S0378-4371(99)00086-2", "article-title": "Zipf's law in income distribution of companies", "volume": "269", "author": "Okuyama", "year": "1999", "journal-title": "Phys. A Stat. Mech. Appl."},
      {"doi-asserted-by": "crossref", "unstructured": "Saichev, A.I., Malevergne, Y., and Sornette, D. (2009). Theory of Zipf's Law and Beyond, Springer Science & Business Media.", "key": "ref_38", "DOI": "10.1007/978-3-642-02946-2"},
      {"key": "ref_39", "doi-asserted-by": "crossref", "first-page": "199", "DOI": "10.1109/TIT.1981.1056331", "article-title": "The performance of universal encoding", "volume": "27", "author": "Krichevsky", "year": "1981", "journal-title": "IEEE Trans. Inf. Theory"},
      {"key": "ref_40", "doi-asserted-by": "crossref", "first-page": "1085", "DOI": "10.1109/18.87000", "article-title": "The zero-frequency problem: Estimating the probabilities of novel events in adaptive text compression", "volume": "37", "author": "Witten", "year": "1991", "journal-title": "IEEE Trans. Inf. Theory"},
      {"unstructured": "Orlitsky, A., and Suresh, A.T. (2015). Competitive distribution estimation: Why is Good-Turing good. Advances in Neural Information Processing Systems, Curran Associates, Inc.", "key": "ref_41"},
      {"unstructured": "Laplace, P.S. (1825). Pierre-Simon Laplace Philosophical Essay on Probabilities: Translated from the Fifth French Edition of 1825 with Notes by the Translator, Springer Science & Business Media.", "key": "ref_42"},
      {"key": "ref_43", "doi-asserted-by": "crossref", "first-page": "237", "DOI": "10.1093/biomet/40.3-4.237", "article-title": "The population frequencies of species and the estimation of population parameters", "volume": "40", "author": "Good", "year": "1953", "journal-title": "Biometrika"},
      {"key": "ref_44", "doi-asserted-by": "crossref", "first-page": "1191", "DOI": "10.1162/089976603321780272", "article-title": "Estimation of entropy and mutual information", "volume": "15", "author": "Paninski", "year": "2003", "journal-title": "Neural Comput."},
      {"key": "ref_45", "doi-asserted-by": "crossref", "first-page": "217", "DOI": "10.1080/09296179508590051", "article-title": "Good-Turing frequency estimation without tears", "volume": "2", "author": "Gale", "year": "1995", "journal-title": "J. Quant. Linguist."},
      {"key": "ref_46", "doi-asserted-by": "crossref", "first-page": "570", "DOI": "10.1287/opre.43.4.570", "article-title": "Breast cancer diagnosis and prognosis via linear programming", "volume": "43", "author": "Mangasarian", "year": "1995", "journal-title": "Oper. Res."},
      {"key": "ref_47", "doi-asserted-by": "crossref", "first-page": "171", "DOI": "10.2140/pjm.1958.8.171", "article-title": "On general minimax theorems", "volume": "8", "author": "Sion", "year": "1958", "journal-title": "Pac. J. Math."},
      {"doi-asserted-by": "crossref", "unstructured": "Boyd, S., and Vandenberghe, L. (2004). Convex Optimization, Cambridge University Press.", "key": "ref_48", "DOI": "10.1017/CBO9780511804441"},
      {"key": "ref_49", "doi-asserted-by": "crossref", "first-page": "460", "DOI": "10.1109/TIT.1972.1054855", "article-title": "Computation of channel capacity and rate-distortion functions", "volume": "18", "author": "Blahut", "year": "1972", "journal-title": "IEEE Trans. Inf. Theory"},
      {"key": "ref_50", "doi-asserted-by": "crossref", "first-page": "14", "DOI": "10.1109/TIT.1972.1054753", "article-title": "An algorithm for computing the capacity of arbitrary discrete memoryless channels", "volume": "18", "author": "Arimoto", "year": "1972", "journal-title": "IEEE Trans. Inf. Theory"},
      {"unstructured": "Yeung, R.W. (2008). Information Theory and Network Coding, Springer Science & Business Media.", "key": "ref_51"}
    ],
    "container-title": ["Entropy"],
    "original-title": [],
    "language": "en",
    "link": [
      {
        "URL": "https://www.mdpi.com/1099-4300/23/6/773/pdf",
        "content-type": "unspecified",
        "content-version": "vor",
        "intended-application": "similarity-checking"
      }
    ],
    "deposited": {
      "date-parts": [[2025, 10, 11]],
      "date-time": "2025-10-11T06:18:52Z",
      "timestamp": 1760163532000
    },
    "score": 1,
    "resource": {"primary": {"URL": "https://www.mdpi.com/1099-4300/23/6/773"}},
    "subtitle": [],
    "short-title": [],
    "issued": {"date-parts": [[2021, 6, 18]]},
    "references-count": 51,
    "journal-issue": {"issue": "6", "published-online": {"date-parts": [[2021, 6]]}},
    "alternative-id": ["e23060773"],
    "URL": "https://doi.org/10.3390/e23060773",
    "relation": {},
    "ISSN": ["1099-4300"],
    "issn-type": [{"type": "electronic", "value": "1099-4300"}],
    "subject": [],
    "published": {"date-parts": [[2021, 6, 18]]}
  }
}
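The record above follows the Crossref REST API `work` message schema: the bibliographic payload sits under `message`, with list-valued fields such as `title` and `container-title` even when they hold a single entry, and dates encoded as nested `date-parts` arrays. As a minimal sketch of how to consume this structure, the snippet below parses an abridged copy of the record (the full version can be fetched live from https://api.crossref.org/works/10.3390/e23060773) and assembles a citation string:

```python
import json

# Abridged copy of the Crossref "work" message above; the full record
# has the same shape plus license, link, and 51 reference entries.
record = json.loads("""
{"status": "ok", "message-type": "work",
 "message": {"DOI": "10.3390/e23060773",
             "title": ["Robust Universal Inference"],
             "container-title": ["Entropy"],
             "volume": "23", "issue": "6", "page": "773",
             "published": {"date-parts": [[2021, 6, 18]]},
             "author": [{"given": "Amichai", "family": "Painsky"},
                        {"given": "Meir", "family": "Feder"}],
             "references-count": 51}}
""")

msg = record["message"]
# Authors are ordered objects with "given"/"family" name parts.
authors = ", ".join(f"{a['given']} {a['family']}" for a in msg["author"])
# "date-parts" is a list of [year, month, day] lists; the year is first.
year = msg["published"]["date-parts"][0][0]
citation = (f"{authors} ({year}). {msg['title'][0]}. "
            f"{msg['container-title'][0]} {msg['volume']}({msg['issue']}), "
            f"{msg['page']}. https://doi.org/{msg['DOI']}")
print(citation)
# → Amichai Painsky, Meir Feder (2021). Robust Universal Inference. Entropy 23(6), 773. https://doi.org/10.3390/e23060773
```

Indexing `title[0]` and `date-parts[0][0]` reflects the schema's convention of wrapping scalars in lists; production code should guard against these lists being empty, since Crossref deposits vary in completeness.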