{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T01:42:57Z","timestamp":1760233377374,"version":"build-2065373602"},"reference-count":35,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2021,1,12]],"date-time":"2021-01-12T00:00:00Z","timestamp":1610409600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],"abstract":"<jats:p>In this paper, we deal with the classical Statistical Learning Theory\u2019s problem of bounding, with high probability, the true risk R(h) of a hypothesis h chosen from a set H of m hypotheses. The Union Bound (UB) allows one to state that PLR^(h),\u03b4qh\u2264R(h)\u2264UR^(h),\u03b4ph\u22651\u2212\u03b4 where R^(h) is the empirical errors, if it is possible to prove that P{R(h)\u2265L(R^(h),\u03b4)}\u22651\u2212\u03b4 and P{R(h)\u2264U(R^(h),\u03b4)}\u22651\u2212\u03b4, when h, qh, and ph are chosen before seeing the data such that qh,ph\u2208[0,1] and \u2211h\u2208H(qh+ph)=1. If no a priori information is available qh and ph are set to 12m, namely equally distributed. This approach gives poor results since, as a matter of fact, a learning procedure targets just particular hypotheses, namely hypotheses with small empirical error, disregarding the others. In this work we set the qh and ph in a distribution-dependent way increasing the probability of being chosen to function with small true risk. We will call this proposal Distribution-Dependent Weighted UB (DDWUB) and we will retrieve the sufficient conditions on the choice of qh and ph that state that DDWUB outperforms or, in the worst case, degenerates into UB. Furthermore, theoretical and numerical results will show the applicability, the validity, and the potentiality of DDWUB.<\/jats:p>","DOI":"10.3390\/e23010101","type":"journal-article","created":{"date-parts":[[2021,1,12]],"date-time":"2021-01-12T14:28:53Z","timestamp":1610461733000},"page":"101","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Distribution-Dependent Weighted Union Bound"],"prefix":"10.3390","volume":"23","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-8445-395X","authenticated-orcid":false,"given":"Luca","family":"Oneto","sequence":"first","affiliation":[{"name":"Department of Computer Science, Bioengineering, Robotics and Systems Engineering, University of Genoa, Via Opera Pia 11a, 16145 Genova, Italy"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sandro","family":"Ridella","sequence":"additional","affiliation":[{"name":"Department of Biophysical and Electronic Engineering, University of Genoa, Via Opera Pia 11a, 16145 Genova, Italy"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2021,1,12]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Kearns, M.J., and Vazirani, U.V. (1994). An Introduction to Computational Learning Theory, MIT Press.","DOI":"10.7551\/mitpress\/3897.001.0001"},{"key":"ref_2","unstructured":"Vapnik, V.N. (1998). Statistical Learning Theory, Wiley."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Friedman, J., Hastie, T., and Tibshirani, R. (2001). The Elements of Statistical Learning, Springer.","DOI":"10.1007\/978-0-387-21606-5"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Shalev-Shwartz, S., and Ben-David, S. (2014). Understanding Machine Learning: From Theory to Algorithms, Cambridge University Press.","DOI":"10.1017\/CBO9781107298019"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"85","DOI":"10.1023\/A:1013999503812","article-title":"Model selection and error estimation","volume":"48","author":"Bartlett","year":"2002","journal-title":"Mach. Learn."},{"key":"ref_6","first-page":"273","article-title":"Tutorial on practical prediction theory for classification","volume":"6","author":"Langford","year":"2005","journal-title":"J. Mach. Learn. Res."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"e1252","DOI":"10.1002\/widm.1252","article-title":"Model Selection and Error Estimation Without the Agonizing Pain","volume":"8","author":"Oneto","year":"2018","journal-title":"WIREs Data Min. Knowl. Discov."},{"key":"ref_8","unstructured":"Langford, J. (2002). Quantitatively Tight Sample Complexity Bounds. [Ph.D. Thesis, Carnegie Mellon University]."},{"key":"ref_9","first-page":"463","article-title":"Rademacher and Gaussian complexities: Risk bounds and structural results","volume":"3","author":"Bartlett","year":"2002","journal-title":"J. Mach. Learn. Res."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"1497","DOI":"10.1214\/009053605000000282","article-title":"Local Rademacher complexities","volume":"33","author":"Bartlett","year":"2005","journal-title":"Ann. Stat."},{"key":"ref_11","first-page":"499","article-title":"Stability and generalization","volume":"2","author":"Bousquet","year":"2002","journal-title":"J. Mach. Learn. Res."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"269","DOI":"10.1007\/BF00993593","article-title":"Sample compression, learnability, and the Vapnik-Chervonenkis dimension","volume":"21","author":"Floyd","year":"1995","journal-title":"Mach. Learn."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"355","DOI":"10.1023\/A:1007618624809","article-title":"Some PAC-Bayesian theorems","volume":"37","author":"McAllester","year":"1999","journal-title":"Mach. Learn."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Dwork, C., Feldman, V., Hardt, M., Pitassi, T., Reingold, O., and Roth, A.L. (2015, January 14\u201317). Preserving Statistical Validity in Adaptive Data Analysis. Proceedings of the ACM Symposium on Theory of Computing, Portland, OR, USA.","DOI":"10.1145\/2746539.2746580"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"404","DOI":"10.1093\/biomet\/26.4.404","article-title":"The use of confidence or fiducial limits illustrated in the case of the binomial","volume":"26","author":"Clopper","year":"1934","journal-title":"Biometrika"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"13","DOI":"10.1080\/01621459.1963.10500830","article-title":"Probability inequalities for sums of bounded random variables","volume":"58","author":"Hoeffding","year":"1963","journal-title":"J. Am. Stat. Assoc."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Hotz, T., Kelma, F., and Wieditz, J. (2016). Non-asymptotic confidence sets for circular means. Entropy, 18.","DOI":"10.3390\/e18100375"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"161","DOI":"10.1007\/s10444-004-7634-z","article-title":"Learning theory: Stability is sufficient for generalization and necessary and sufficient for consistency of empirical risk minimization","volume":"25","author":"Mukherjee","year":"2006","journal-title":"Adv. Comput. Math."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"1926","DOI":"10.1109\/18.705570","article-title":"Structural risk minimization over data-dependent hierarchies","volume":"44","author":"Bartlett","year":"1998","journal-title":"IEEE Trans. Inf. Theory"},{"key":"ref_20","first-page":"529","article-title":"Computable shell decomposition bounds","volume":"5","author":"Langford","year":"2004","journal-title":"J. Mach. Learn. Res."},{"key":"ref_21","unstructured":"Tikhonov, A.N., Arsenin, V.I.A., and John, F. (1977). Solutions of Ill-Posed Problems, Winston."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Sch\u00f6lkopf, B., Smola, A.J., and Bach, F. (2002). Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond, MIT Press.","DOI":"10.7551\/mitpress\/4175.001.0001"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Schuster, T., Kaltenbacher, B., Hofmann, B., and Kazimierski, K.S. (2012). Regularization Methods in Banach Spaces, Walter de Gruyter.","DOI":"10.1515\/9783110255720"},{"key":"ref_24","unstructured":"Oneto, L., Ridella, S., and Anguita, D. (2020, January 2\u20134). Improving the Union Bound: A Distribution Dependent Approach. Proceedings of the European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, Brugge, Belgium."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"193","DOI":"10.1023\/A:1006556606079","article-title":"The racing algorithm: Model selection for lazy learners","volume":"11","author":"Maron","year":"1997","journal-title":"Artif. Intell. Rev."},{"key":"ref_26","first-page":"398","article-title":"Genome-wide significance levels and weighted hypothesis testing","volume":"24","author":"Roeder","year":"2009","journal-title":"Stat. Sci. Rev. J. Inst. Math. Stat."},{"key":"ref_27","unstructured":"Catoni, O. (2007). PAC-Bayesian Supervised Classification, Institute of Mathematical Statistics."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"62","DOI":"10.1016\/j.neunet.2016.07.002","article-title":"A local Vapnik-Chervonenkis complexity","volume":"82","author":"Oneto","year":"2016","journal-title":"Neural Netw."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Polato, M., Lauriola, I., and Aiolli, F. (2018). A novel boolean kernels family for categorical data. Entropy, 20.","DOI":"10.3390\/e20060444"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"4","DOI":"10.1016\/j.tcs.2012.10.013","article-title":"Tighter PAC-Bayes bounds through distribution-dependent priors","volume":"473","author":"Lever","year":"2013","journal-title":"Theor. Comput. Sci."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"200","DOI":"10.1016\/j.patrec.2016.06.019","article-title":"PAC-Bayesian analysis of distribution dependent priors: Tighter risk bounds and stability analysis","volume":"80","author":"Oneto","year":"2016","journal-title":"Pattern Recognit. Lett."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"21","DOI":"10.1016\/j.neucom.2017.10.066","article-title":"Randomized Learning: Generalization Performance of Old and New Theoretically Grounded Algorithms","volume":"298","author":"Oneto","year":"2018","journal-title":"Neurocomputing"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"100","DOI":"10.1093\/biomet\/41.1-2.100","article-title":"Continuous inspection schemes","volume":"41","author":"Page","year":"1954","journal-title":"Biometrika"},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"309","DOI":"10.1023\/A:1007631014630","article-title":"Multiple comparisons in induction algorithms","volume":"38","author":"Jensen","year":"2000","journal-title":"Mach. Learn."},{"key":"ref_35","first-page":"26","article-title":"A remark on Stirling\u2019s formula","volume":"62","author":"Robbins","year":"1955","journal-title":"Am. Math. Mon."}],"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1099-4300\/23\/1\/101\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T05:10:09Z","timestamp":1760159409000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1099-4300\/23\/1\/101"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,1,12]]},"references-count":35,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2021,1]]}},"alternative-id":["e23010101"],"URL":"https:\/\/doi.org\/10.3390\/e23010101","relation":{},"ISSN":["1099-4300"],"issn-type":[{"type":"electronic","value":"1099-4300"}],"subject":[],"published":{"date-parts":[[2021,1,12]]}}}