{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,5]],"date-time":"2026-01-05T07:24:23Z","timestamp":1767597863116,"version":"build-2065373602"},"reference-count":29,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2019,2,8]],"date-time":"2019-02-08T00:00:00Z","timestamp":1549584000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["MAKE"],"abstract":"<jats:p>Beta regression models are a class of supervised learning tools for regression problems with a univariate and limited response. Current fitting procedures for beta regression require variable selection based on (potentially problematic) information criteria. We propose model selection criteria that take into account the leverage, residuals, and influence of the observations, on both linear and nonlinear systematic components. To that end, we propose a Predictive Residual Sum of Squares (PRESS)-like machine learning tool and a prediction coefficient, namely the P^2 statistic, as a computational procedure. Monte Carlo simulation results on the finite sample behavior of the prediction-based model selection criterion P^2 are provided. We also evaluated two versions of the R^2 criterion. Finally, applications to real data are presented. 
The new criterion proved crucial for choosing models, taking into account the robustness of the maximum likelihood estimation procedure in the presence of influential cases.<\/jats:p>","DOI":"10.3390\/make1010026","type":"journal-article","created":{"date-parts":[[2019,2,11]],"date-time":"2019-02-11T03:26:01Z","timestamp":1549855561000},"page":"427-449","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":21,"title":["Model Selection Criteria on Beta Regression for Machine Learning"],"prefix":"10.3390","volume":"1","author":[{"given":"Patr\u00edcia L.","family":"Espinheira","sequence":"first","affiliation":[{"name":"Departamento de Estat\u00edstica, CAST \u2013 Computational Agriculture Statistics Laboratory, Universidade Federal de Pernambuco, Recife 50740-540, Brazil"}]},{"given":"Luana C. Meireles","family":"da Silva","sequence":"additional","affiliation":[{"name":"Departamento de Estat\u00edstica, CAST \u2013 Computational Agriculture Statistics Laboratory, Universidade Federal de Pernambuco, Recife 50740-540, Brazil"}]},{"given":"Alisson de Oliveira","family":"Silva","sequence":"additional","affiliation":[{"name":"Departamento de Estat\u00edstica, CAST \u2013 Computational Agriculture Statistics Laboratory, Universidade Federal de Pernambuco, Recife 50740-540, Brazil"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9884-9090","authenticated-orcid":false,"given":"Raydonal","family":"Ospina","sequence":"additional","affiliation":[{"name":"Departamento de Estat\u00edstica, CAST \u2013 Computational Agriculture Statistics Laboratory, Universidade Federal de Pernambuco, Recife 50740-540, Brazil"}]}],"member":"1968","published-online":{"date-parts":[[2019,2,8]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"348","DOI":"10.1016\/j.csda.2009.08.017","article-title":"Improved estimators for a general class of beta regression 
models","volume":"54","author":"Simas","year":"2010","journal-title":"Comput. Stat. Data Anal."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"111","DOI":"10.1007\/s00362-008-0125-4","article-title":"Inflated beta distributions","volume":"51","author":"Ospina","year":"2010","journal-title":"Stat. Pap."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"1609","DOI":"10.1016\/j.csda.2011.10.005","article-title":"A general class of zero-or-one inflated beta regression models","volume":"56","author":"Ospina","year":"2012","journal-title":"Comput. Stat. Data Anal."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"95","DOI":"10.1007\/s11749-010-0189-z","article-title":"Influence diagnostics in a general class of beta regression models","volume":"20","author":"Rocha","year":"2011","journal-title":"Test"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"133","DOI":"10.1111\/j.2517-6161.1986.tb01398.x","article-title":"Assessment of local influence (with discussion)","volume":"48","author":"Cook","year":"1986","journal-title":"J. R. Stat. Soc. B"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"720","DOI":"10.1080\/03610918.2014.977918","article-title":"Model selection criteria in beta regression with varying dispersion","volume":"46","author":"Bayer","year":"2017","journal-title":"Commun. Stat. Simul. Comput."},{"key":"ref_7","unstructured":"Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. Second International Symposium on Information Theory (Tsahkadsor, 1971), Akad\u00e9miai Kiad\u00f3."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"461","DOI":"10.1214\/aos\/1176344136","article-title":"Estimating the dimension of a model","volume":"6","author":"Schwarz","year":"1978","journal-title":"Ann. 
Stat."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"125","DOI":"10.1080\/00401706.1974.10489157","article-title":"The relationship between variable selection and data augmentation and a method for prediction","volume":"16","author":"Allen","year":"1974","journal-title":"Technometrics"},{"key":"ref_10","first-page":"44","article-title":"A comparison of the coefficient of predictive power, the coefficient of determination and AIC for linear regression","volume":"8","author":"Mediavilla","year":"2008","journal-title":"J. Appl. Bus. Econ."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"531","DOI":"10.1007\/s00362-014-0596-4","article-title":"Predictive performance of linear regression models","volume":"56","year":"2015","journal-title":"Stat. Pap."},{"key":"ref_12","first-page":"15","article-title":"Detection of Influential Observation in Linear Regression","volume":"19","author":"Cook","year":"1977","journal-title":"Technometrics"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"799","DOI":"10.1080\/0266476042000214501","article-title":"Beta regression for modelling rates and proportions","volume":"31","author":"Ferrari","year":"2004","journal-title":"J. Appl. Stat."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"705","DOI":"10.1214\/aos\/1176345513","article-title":"Logistic Regression Diagnostics","volume":"9","author":"Pregibon","year":"1981","journal-title":"Ann. Stat."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"407","DOI":"10.1080\/02664760701834931","article-title":"On beta regression residuals","volume":"35","author":"Espinheira","year":"2008","journal-title":"J. Appl. Stat."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"221","DOI":"10.1080\/00401706.1988.10488370","article-title":"Influence Measures in Ridge Regression","volume":"30","author":"Walker","year":"1988","journal-title":"Technometrics"},{"key":"ref_17","unstructured":"Cook, R.D., and Weisberg, S. (1982). 
Residuals and Influence in Regression, Chapman and Hall."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"4417","DOI":"10.1016\/j.csda.2008.02.028","article-title":"Influence diagnostics in beta regression","volume":"52","author":"Espinheira","year":"2008","journal-title":"Comput. Stat. Data Anal."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"570","DOI":"10.2307\/3109764","article-title":"Local Influence in Linear Mixed Models","volume":"54","author":"Lesaffre","year":"1998","journal-title":"Biometrics"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"691","DOI":"10.1093\/biomet\/78.3.691","article-title":"A note on a general definition of the coefficient of determination","volume":"78","author":"Nagelkerke","year":"1991","journal-title":"Biometrika"},{"key":"ref_21","unstructured":"Doornik, J.A. (2009). An Object-Oriented Matrix Programming Language Ox, Timberlake Consultants Ltd. [6th ed.]."},{"key":"ref_22","unstructured":"Espinheira, P.L., and Silva, A.O. (arXiv, 2018). Nonlinear simplex regression models, arXiv."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"445","DOI":"10.1002\/bimj.201600136","article-title":"On nonlinear beta regression residuals","volume":"59","author":"Espinheira","year":"2017","journal-title":"Biom. J."},{"key":"ref_24","unstructured":"Salazar, S.M.G. (2005). Contribuicion al Estudio de la Reaccion de Decomposicion de la Zeolita Y em Presencia de Vapor de Agua y Vanadio. [Master\u2019s Thesis, Universidad Nacional de Colombia]."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"1263","DOI":"10.1007\/s00180-014-0490-5","article-title":"Bootstrap prediction intervals in beta regressions","volume":"29","author":"Espinheira","year":"2014","journal-title":"Comput. Stat."},{"key":"ref_26","unstructured":"Rupert, G. (2012). 
Simultaneous Statistical Inference, Springer Science & Business Media."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"245","DOI":"10.1007\/s10115-013-0706-y","article-title":"Self-labeled techniques for semi-supervised learning: Taxonomy, software and empirical study","volume":"42","author":"Triguero","year":"2015","journal-title":"Knowl. Inf. Syst."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Livieris, I., Kanavos, A., Tampakas, V., and Pintelas, P. (2018). An auto-adjustable semi-supervised self-training algorithm. Algorithms, 11.","DOI":"10.3390\/a11090139"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"289","DOI":"10.1111\/j.2517-6161.1995.tb02031.x","article-title":"Controlling the false discovery rate: A practical and powerful approach to multiple testing","volume":"57","author":"Benjamini","year":"1995","journal-title":"J. R. Stat. Soc."}],"container-title":["Machine Learning and Knowledge Extraction"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-4990\/1\/1\/26\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T12:30:46Z","timestamp":1760185846000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-4990\/1\/1\/26"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,2,8]]},"references-count":29,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2019,3]]}},"alternative-id":["make1010026"],"URL":"https:\/\/doi.org\/10.3390\/make1010026","relation":{},"ISSN":["2504-4990"],"issn-type":[{"type":"electronic","value":"2504-4990"}],"subject":[],"published":{"date-parts":[[2019,2,8]]}}}