{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T02:27:42Z","timestamp":1760236062602,"version":"build-2065373602"},"reference-count":30,"publisher":"MDPI AG","issue":"11","license":[{"start":{"date-parts":[[2021,10,20]],"date-time":"2021-10-20T00:00:00Z","timestamp":1634688000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"H2020 Marie Sklodowska-Curie-RISE Grant","award":["872181"],"award-info":[{"award-number":["872181"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>This paper presents the intrinsic limit determination algorithm (ILD Algorithm), a novel technique to determine the best possible performance, measured in terms of the AUC (area under the ROC curve) and accuracy, that can be obtained from a specific dataset in a binary classification problem with categorical features regardless of the model used. This limit, namely, the Bayes error, is completely independent of any model used and describes an intrinsic property of the dataset. The ILD algorithm thus provides important information regarding the prediction limits of any binary classification algorithm when applied to the considered dataset. In this paper, the algorithm is described in detail, its entire mathematical framework is presented and the pseudocode is given to facilitate its implementation. 
Finally, an example with a real dataset is given.<\/jats:p>","DOI":"10.3390\/a14110301","type":"journal-article","created":{"date-parts":[[2021,10,20]],"date-time":"2021-10-20T07:05:57Z","timestamp":1634713557000},"page":"301","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":4,"title":["A Model-Agnostic Algorithm for Bayes Error Determination in Binary Classification"],"prefix":"10.3390","volume":"14","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6060-5365","authenticated-orcid":false,"given":"Umberto","family":"Michelucci","sequence":"first","affiliation":[{"name":"TOELT LLC, Machine Learning Research and Development, Birchlenstr. 25, 8600 D\u00fcbendorf, Switzerland"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0756-8523","authenticated-orcid":false,"given":"Michela","family":"Sperti","sequence":"additional","affiliation":[{"name":"PolitoBIOMed Lab, Department of Mechanical and Aerospace Engineering, Politecnico di Torino, 10129 Turin, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7691-4886","authenticated-orcid":false,"given":"Dario","family":"Piga","sequence":"additional","affiliation":[{"name":"IDSIA\u2014Dalle Molle Institute for Artificial Intelligence, USI-SUPSI, Via la Santa 1, 6962 Lugano, Switzerland"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2562-9932","authenticated-orcid":false,"given":"Francesca","family":"Venturini","sequence":"additional","affiliation":[{"name":"TOELT LLC, Machine Learning Research and Development, Birchlenstr. 
25, 8600 D\u00fcbendorf, Switzerland"},{"name":"Institute of Applied Mathematics and Physics, Zurich University of Applied Sciences, Technikumstrasse 9, 8401 Winterthur, Switzerland"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1918-1772","authenticated-orcid":false,"given":"Marco A.","family":"Deriu","sequence":"additional","affiliation":[{"name":"PolitoBIOMed Lab, Department of Mechanical and Aerospace Engineering, Politecnico di Torino, 10129 Turin, Italy"}]}],"member":"1968","published-online":{"date-parts":[[2021,10,20]]},"reference":[{"key":"ref_1","unstructured":"Raschka, S. (2018). Model evaluation, model selection, and algorithm selection in machine learning. arXiv."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"40","DOI":"10.1214\/09-SS054","article-title":"A survey of cross-validation procedures for model selection","volume":"4","author":"Arlot","year":"2010","journal-title":"Stat. Surv."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"357","DOI":"10.3390\/make3020018","article-title":"Estimating neural network\u2019s performance with bootstrap: A tutorial","volume":"3","author":"Michelucci","year":"2021","journal-title":"Mach. Learn. Knowl. Extr."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Michelucci, U. (2018). Applied Deep Learning\u2014A Case-Based Approach to Understanding Deep Neural Networks, APRESS Media, LLC.","DOI":"10.1007\/978-1-4842-3790-8"},{"key":"ref_5","unstructured":"Yu, T., and Zhu, H. (2020). Hyper-parameter optimization: A review of algorithms and applications. arXiv."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"269","DOI":"10.1007\/s10044-007-0087-5","article-title":"On the k-NN performance in a challenging scenario of imbalance and overlapping","volume":"11","author":"Mollineda","year":"2008","journal-title":"Pattern Anal. 
Appl."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"4457","DOI":"10.1007\/s00521-020-05256-0","article-title":"A novel density-based adaptive k nearest neighbor method for dealing with overlapping problem in imbalanced datasets","volume":"33","author":"Yuan","year":"2021","journal-title":"Neural Comput. Appl."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"317","DOI":"10.1007\/BF00116895","article-title":"Incremental learning from noisy data","volume":"1","author":"Schlimmer","year":"1986","journal-title":"Mach. Learn."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"343","DOI":"10.1007\/BF00116829","article-title":"Learning from noisy examples","volume":"2","author":"Angluin","year":"1988","journal-title":"Mach. Learn."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"761","DOI":"10.1145\/2914770.2837671","article-title":"Learning programs from noisy data","volume":"51","author":"Raychev","year":"2016","journal-title":"ACM Sigplan Not."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"95","DOI":"10.1080\/10255810305042","article-title":"Bayes error rate estimation using classifier ensembles","volume":"5","author":"Tumer","year":"2003","journal-title":"Int. J. Smart Eng. Syst. Des."},{"key":"ref_12","unstructured":"Gareth, J., Daniela, W., Trevor, H., and Robert, T. (2013). An Introduction to Statistical Learning: With Applications in R, Springer."},{"key":"ref_13","unstructured":"Tumer, K., Bollacker, K., and Ghosh, J. (1998). A mutual information based ensemble method to estimate bayes error. Intelligent Engineering Systems through Artificial Neural Networks, ASME Press."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Ghosh, J. (2002). Multiclassifier systems: Back to the future. 
International Workshop on Multiple Classifier Systems, Springer.","DOI":"10.1007\/3-540-45428-4_1"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"461","DOI":"10.1162\/neco.1991.3.4.461","article-title":"Neural network classifiers estimate Bayesian a posteriori probabilities","volume":"3","author":"Richard","year":"1991","journal-title":"Neural Comput."},{"key":"ref_16","unstructured":"Shoemaker, P., Carlin, M., Shimabukuro, R., and Priebe, C. (1991). Least-Squares Learning and Approximation of Posterior Probabilities on Classification Problems by Neural Network Models, Technical Report; Naval Ocean Systems Center."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1007\/s11239-019-01940-8","article-title":"Machine learning versus traditional risk stratification methods in acute coronary syndrome: A pooled randomized clinical trial analysis","volume":"49","author":"Gibson","year":"2020","journal-title":"J. Thromb. Thrombolysis"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"1289","DOI":"10.1177\/1460458219871780","article-title":"A machine learning\u2013based 1-year mortality prediction model after hospital discharge for clinical patients with acute coronary syndrome","volume":"26","author":"Sherazi","year":"2020","journal-title":"Health Inform. J."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"e24018","DOI":"10.2196\/24018","article-title":"Machine learning to predict mortality and critical events in a cohort of patients with COVID-19 in New York City: Model development and validation","volume":"22","author":"Vaid","year":"2020","journal-title":"J. Med. Internet Res."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"e24225","DOI":"10.2196\/24225","article-title":"An Easy-to-Use Machine Learning Model to Predict the Prognosis of Patients with COVID-19: Retrospective Cohort Study","volume":"22","author":"Kim","year":"2020","journal-title":"J. Med. 
Internet Res."},{"key":"ref_21","unstructured":"Wang, S., Pathak, J., and Zhang, Y. (2019). Using electronic health records and machine learning to predict postpartum depression. MEDINFO 2019: Health and Wellbeing e-Networks for All, IOS Press, 1013 BG."},{"key":"ref_22","unstructured":"Hogg, R.V., Tanis, E.A., and Zimmerman, D.L. (2010). Probability and Statistical Inference, Pearson\/Prentice Hall."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"999","DOI":"10.1016\/S0140-6736(13)61752-3","article-title":"The Framingham Heart Study and the epidemiology of cardiovascular disease: A historical perspective","volume":"383","author":"Mahmood","year":"2014","journal-title":"Lancet"},{"key":"ref_24","unstructured":"Nocedal, J., and Wright, S. (2006). Numerical Optimization, Springer Science & Business Media."},{"key":"ref_25","unstructured":"(2021, June 29). Framingham Dataset Download, Kaggle Website. Available online: https:\/\/www.kaggle.com\/eeshanpaul\/framingham."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"1837","DOI":"10.1161\/01.CIR.97.18.1837","article-title":"Prediction of coronary heart disease using risk factor categories","volume":"97","author":"Wilson","year":"1998","journal-title":"Circulation"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"743","DOI":"10.1161\/CIRCULATIONAHA.107.699579","article-title":"General cardiovascular risk profile for use in primary care","volume":"117","author":"Vasan","year":"2008","journal-title":"Circulation"},{"key":"ref_28","unstructured":"World Health Organisation (2021, June 28). Cardiovascular Diseases (CVDs). Available online: https:\/\/www.who.int\/news-room\/fact-sheets\/detail\/cardiovascular-diseases-(cvds)."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Herschtal, A., and Raskutti, B. (2004, January 4\u20138). Optimising area under the ROC curve using gradient descent. 
Proceedings of the Twenty-First International Conference on Machine Learning, Banff, AB, Canada.","DOI":"10.1145\/1015330.1015366"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Joachims, T. (2005, January 7\u201311). A support vector method for multivariate performance measures. Proceedings of the 22nd International Conference on Machine Learning, Bonn, Germany.","DOI":"10.1145\/1102351.1102399"}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/14\/11\/301\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T07:18:44Z","timestamp":1760167124000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/14\/11\/301"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,10,20]]},"references-count":30,"journal-issue":{"issue":"11","published-online":{"date-parts":[[2021,11]]}},"alternative-id":["a14110301"],"URL":"https:\/\/doi.org\/10.3390\/a14110301","relation":{},"ISSN":["1999-4893"],"issn-type":[{"type":"electronic","value":"1999-4893"}],"subject":[],"published":{"date-parts":[[2021,10,20]]}}}