{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T03:38:48Z","timestamp":1760240328795,"version":"build-2065373602"},"reference-count":42,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2019,5,15]],"date-time":"2019-05-15T00:00:00Z","timestamp":1557878400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100003593","name":"Conselho Nacional de Desenvolvimento Cient\u00edfico e Tecnol\u00f3gico","doi-asserted-by":"publisher","award":["307556\/2017-4"],"award-info":[{"award-number":["307556\/2017-4"]}],"id":[{"id":"10.13039\/501100003593","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100002322","name":"Coordena\u00e7\u00e3o de Aperfei\u00e7oamento de Pessoal de N\u00edvel Superior","doi-asserted-by":"publisher","award":["001"],"award-info":[{"award-number":["001"]}],"id":[{"id":"10.13039\/501100002322","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100006162","name":"Funda\u00e7\u00e3o de Amparo \u00e0 Ci\u00eancia e Tecnologia do Estado de Pernambuco","doi-asserted-by":"publisher","award":["IBPG-0086-1.02\/13"],"award-info":[{"award-number":["IBPG-0086-1.02\/13"]}],"id":[{"id":"10.13039\/501100006162","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["MAKE"],"abstract":"<jats:p>A quantifier of similarity is generally a type of score that assigns a numerical value to a pair of sequences based on their proximity. Similarity measures play an important role in prediction problems with many applications, such as statistical learning, data mining, biostatistics, finance and others. 
Based on observed data, where a response variable of interest is assumed to be associated with some regressors, it is possible to make response predictions using a weighted average of observed response variables, where the weights depend on the similarity of the regressors. In this work, we propose a parametric regression model for continuous response based on empirical similarities for the case where the regressors are represented by categories. We apply the proposed method to predict tooth length growth in guinea pigs based on Vitamin C supplements considering three different dosage levels and two delivery methods. The inferential procedure is performed through maximum likelihood and least squares estimation under two types of similarity functions and two distance metrics. The empirical results show that the method yields accurate models with low dimension facilitating the parameters\u2019 interpretation.<\/jats:p>","DOI":"10.3390\/make1020038","type":"journal-article","created":{"date-parts":[[2019,5,15]],"date-time":"2019-05-15T11:37:40Z","timestamp":1557920260000},"page":"641-652","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":5,"title":["Prediction by Empirical Similarity via Categorical Regressors"],"prefix":"10.3390","volume":"1","author":[{"given":"Jeniffer Duarte","family":"Sanchez","sequence":"first","affiliation":[{"name":"Statistics Department, Universidade de S\u00e3o Paulo, S\u00e3o Paulo 05508-010, SP, Brazil"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4091-024X","authenticated-orcid":false,"given":"Leandro C.","family":"R\u00eago","sequence":"additional","affiliation":[{"name":"Statistics and Applied Math Department, Universidade Federal do Cear\u00e1, Fortaleza 60440-900, CE, Brazil"},{"name":"Statistics and Management Engineering Graduate Programs, Universidade Federal de Pernambuco, Recife 50740-540, PE, 
Brazil"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9884-9090","authenticated-orcid":false,"given":"Raydonal","family":"Ospina","sequence":"additional","affiliation":[{"name":"Statistics Department, CAST\u2014Computational Agriculture Statistics Laboratory, Universidade Federal de Pernambuco, Recife 50740-540, PE, Brazil"}]}],"member":"1968","published-online":{"date-parts":[[2019,5,15]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Fix, E., and Hodges, J.L. (1951). Discriminatory Analysis. Nonparametric Discrimination: Consistency Properties, USAF School of Aviation Medicine. Technical Report 4; Project Number 21-49-004.","DOI":"10.1037\/e471672008-001"},{"key":"ref_2","unstructured":"Fix, E., and Hodges, J.L. (1952). Discriminatory Analysis: Small Sample Performance, USAF School of Aviation Medicine. Technical Report 21-49-004."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"21","DOI":"10.1109\/TIT.1967.1053964","article-title":"Nearest Neighbor Pattern Classification","volume":"13","author":"Cover","year":"1967","journal-title":"IEEE Trans. Inf. Theory"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Devroye, L., Gy\u00f6rfi, L., and Lugosi, G. (1996). A Probabilistic Theory of Pattern Recognition, Springer.","DOI":"10.1007\/978-1-4612-0711-5"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"127","DOI":"10.1007\/BF02900741","article-title":"An approximation to the density function","volume":"6","author":"Akaike","year":"1954","journal-title":"Ann. Inst. Stat. Math."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"832","DOI":"10.1214\/aoms\/1177728190","article-title":"Remarks on some Nonparametric Estimates of a Density Function","volume":"27","author":"Rosenblatt","year":"1956","journal-title":"Ann. Math. 
Stat."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"1065","DOI":"10.1214\/aoms\/1177704472","article-title":"On the Estimation of a Probability Density Function and the Mode","volume":"33","author":"Parzen","year":"1962","journal-title":"Ann. Math. Stat."},{"key":"ref_8","unstructured":"Silverman, B.W. (1986). Density Estimation for Statistics and Data Analysis, Chapman & Hall."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Scott, D.W. (1992). Multivariate Density Estimation: Theory, Practice and Visualization, Wiley.","DOI":"10.1002\/9780470316849"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"433","DOI":"10.1162\/rest.88.3.433","article-title":"Empirical Similarity","volume":"88","author":"Gilboa","year":"2006","journal-title":"Rev. Econ. Stat."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"605","DOI":"10.2307\/2946694","article-title":"Case-based decision theory","volume":"110","author":"Gilboa","year":"1995","journal-title":"Q. J. Econ."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Gilboa, I., and Schmeidler, D. (2001). A Theory of Case-Based Decisions, Cambridge University Press.","DOI":"10.1017\/CBO9780511493539"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1111\/1468-0262.00388","article-title":"Inductive inference: An axiomatic approach","volume":"71","author":"Gilboa","year":"2003","journal-title":"Econometrica"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Gayer, G., Gilboa, I., and Lieberman, O. (2007). Rule-Based and Case-Based Reasoning in Housing Prices. BE J. Theor. Econ., 7.","DOI":"10.2202\/1935-1704.1284"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"124","DOI":"10.1016\/j.jeconom.2009.10.015","article-title":"A similarity-based approach to prediction","volume":"162","author":"Gilboa","year":"2011","journal-title":"J. 
Econ."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"1032","DOI":"10.1017\/S0266466609990454","article-title":"Asymptotic Theory for Empirical Similarity Models","volume":"4","author":"Lieberman","year":"2010","journal-title":"Econ. Theory"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"484","DOI":"10.1111\/j.1467-9892.2012.00783.x","article-title":"A Similarity-Based Approach to Time-Varying Coefficient Non-Stationary Autoregression","volume":"33","author":"Lieberman","year":"2012","journal-title":"J. Time Ser. Anal."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Davison, A.C. (2003). Statistical Models, Cambridge University Press.","DOI":"10.1017\/CBO9780511815850"},{"key":"ref_19","unstructured":"Wasserman, L. (2006). All of Nonparametric Statistics, Springer."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"1125","DOI":"10.1111\/j.1468-0262.2005.00611.x","article-title":"Probabilities as similarity-weighted frequencies","volume":"73","author":"Billot","year":"2005","journal-title":"Econometrica"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"107","DOI":"10.1016\/j.mathsocsci.2007.08.002","article-title":"Axiomatization of an exponential similarity function","volume":"55","author":"Billot","year":"2008","journal-title":"Math. Soc. Sci."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"79","DOI":"10.1007\/s11229-009-9473-4","article-title":"On the definition of objective probabilities by empirical similarity","volume":"172","author":"Gilboa","year":"2010","journal-title":"Synthese"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"592","DOI":"10.1111\/jtsa.12083","article-title":"Norming Rates and Limit Theory for Some Time-Varying Coefficient Autoregressions","volume":"35","author":"Lieberman","year":"2014","journal-title":"J. Time Ser. 
Anal."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"62","DOI":"10.1016\/j.jebo.2015.06.005","article-title":"Forecasting volatility with empirical similarity and Google Trends","volume":"117","author":"Hamid","year":"2015","journal-title":"J. Econ. Behav. Organ."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"263","DOI":"10.1080\/07474938.2017.1308054","article-title":"Similarity-based model for ordered categorical data","volume":"38","author":"Gayer","year":"2019","journal-title":"Econ. Rev."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"413","DOI":"10.1093\/biomet\/63.3.413","article-title":"Multivariate binary discrimination by the kernel method","volume":"63","author":"Aitchison","year":"1976","journal-title":"Biometrika"},{"key":"ref_27","first-page":"1477","article-title":"Nonparametric and semiparametric estimation with discrete regressors","volume":"63","author":"Delgado","year":"1995","journal-title":"Econom. J. Econom. Soc."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"463","DOI":"10.4310\/SII.2011.v4.n4.a5","article-title":"Nonparametric regression with discrete covariate and missing values","volume":"4","author":"Chen","year":"2011","journal-title":"Stat. Its Interface"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"48","DOI":"10.32614\/RJ-2012-012","article-title":"The crs Package: Nonparametric Regression Splines for Continuous and Categorical Predictors","volume":"4","author":"Nie","year":"2012","journal-title":"R J."},{"key":"ref_30","first-page":"515","article-title":"Additive regression splines with irrelevant categorical and continuous regressors","volume":"23","author":"Ma","year":"2013","journal-title":"Stat. 
Sin."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"199","DOI":"10.3390\/econometrics3020199","article-title":"Plug-in bandwidth selection for kernel density estimation with discrete data","volume":"3","author":"Chu","year":"2015","journal-title":"Econometrics"},{"key":"ref_32","unstructured":"Guo, C., and Berkhahn, F. (2016). Entity embeddings of categorical variables. arXiv."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"99","DOI":"10.1016\/S0304-4076(03)00157-X","article-title":"Nonparametric estimation of regression functions with both categorical and continuous data","volume":"119","author":"Racine","year":"2004","journal-title":"J. Econ."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"266","DOI":"10.1016\/S0047-259X(02)00025-8","article-title":"Nonparametric estimation of distributions with categorical and continuous data","volume":"86","author":"Li","year":"2003","journal-title":"J. Multivar. Anal."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"357","DOI":"10.1257\/aer.98.2.357","article-title":"Estimating average treatment effects with continuous and discrete covariates: The case of Swan-Ganz catheterization","volume":"98","author":"Li","year":"2008","journal-title":"Am. Econ. Rev."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Farn\u00e8, M., and Vouldis, A.T. (2018). A Methodology for Automatised Outlier Detection in High-Dimensional Datasets: An Application to Euro Area Banks\u2019 Supervisory Data, European Central Bank.","DOI":"10.2139\/ssrn.3224300"},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"491","DOI":"10.1093\/jn\/33.5.491","article-title":"The growth of the odontoblasts of the incisor tooth as a criterion of the vitamin C intake of the guinea pig","volume":"33","author":"Crampton","year":"1947","journal-title":"J. Nutr."},{"key":"ref_38","unstructured":"R Core Team (2018). 
R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"181","DOI":"10.1080\/00031305.1998.10480559","article-title":"Violin Plots: A Box Plot-Density Trace Synergism","volume":"52","author":"Hintze","year":"1998","journal-title":"Am. Stat."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"161","DOI":"10.1177\/1471082X16642560","article-title":"Regularized regression for categorical data","volume":"16","author":"Tutz","year":"2016","journal-title":"Stat. Model."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"228","DOI":"10.1177\/1471082X16644998","article-title":"On coding effects in regularized categorical regression","volume":"16","author":"Chiquet","year":"2016","journal-title":"Stat. Model."},{"key":"ref_42","unstructured":"Tibshirani, R., Wainwright, M., and Hastie, T. (2015). Statistical Learning with Sparsity: The Lasso and Generalizations, Chapman and Hall\/CRC."}],"container-title":["Machine Learning and Knowledge Extraction"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-4990\/1\/2\/38\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T12:52:12Z","timestamp":1760187132000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-4990\/1\/2\/38"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,5,15]]},"references-count":42,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2019,6]]}},"alternative-id":["make1020038"],"URL":"https:\/\/doi.org\/10.3390\/make1020038","relation":{},"ISSN":["2504-4990"],"issn-type":[{"type":"electronic","value":"2504-4990"}],"subject":[],"published":{"date-parts":[[2019,5,15]]}}}