{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,13]],"date-time":"2026-05-13T04:10:10Z","timestamp":1778645410864,"version":"3.51.4"},"reference-count":60,"publisher":"MDPI AG","issue":"9","license":[{"start":{"date-parts":[[2023,9,12]],"date-time":"2023-09-12T00:00:00Z","timestamp":1694476800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"FCT\u2014Funda\u00e7\u00e3o para a Ci\u00eancia e a Tecnologia","award":["UIDB\/00297\/2020"],"award-info":[{"award-number":["UIDB\/00297\/2020"]}]},{"name":"FCT\u2014Funda\u00e7\u00e3o para a Ci\u00eancia e a Tecnologia","award":["UIDP\/00297\/2020"],"award-info":[{"award-number":["UIDP\/00297\/2020"]}]},{"name":"FCT\u2014Funda\u00e7\u00e3o para a Ci\u00eancia e a Tecnologia","award":["UIDB\/04152\/2020"],"award-info":[{"award-number":["UIDB\/04152\/2020"]}]},{"name":"FCT\u2014Funda\u00e7\u00e3o para a Ci\u00eancia e a Tecnologia","award":["UIDB\/00315\/2020"],"award-info":[{"award-number":["UIDB\/00315\/2020"]}]},{"name":"Center for Mathematics and Applications","award":["UIDB\/00297\/2020"],"award-info":[{"award-number":["UIDB\/00297\/2020"]}]},{"name":"Center for Mathematics and Applications","award":["UIDP\/00297\/2020"],"award-info":[{"award-number":["UIDP\/00297\/2020"]}]},{"name":"Center for Mathematics and Applications","award":["UIDB\/04152\/2020"],"award-info":[{"award-number":["UIDB\/04152\/2020"]}]},{"name":"Center for Mathematics and Applications","award":["UIDB\/00315\/2020"],"award-info":[{"award-number":["UIDB\/00315\/2020"]}]},{"name":"Centro de Investiga\u00e7\u00e3o em Gest\u00e3o de Informa\u00e7\u00e3o (MagIC)","award":["UIDB\/00297\/2020"],"award-info":[{"award-number":["UIDB\/00297\/2020"]}]},{"name":"Centro de Investiga\u00e7\u00e3o em Gest\u00e3o de Informa\u00e7\u00e3o 
(MagIC)","award":["UIDP\/00297\/2020"],"award-info":[{"award-number":["UIDP\/00297\/2020"]}]},{"name":"Centro de Investiga\u00e7\u00e3o em Gest\u00e3o de Informa\u00e7\u00e3o (MagIC)","award":["UIDB\/04152\/2020"],"award-info":[{"award-number":["UIDB\/04152\/2020"]}]},{"name":"Centro de Investiga\u00e7\u00e3o em Gest\u00e3o de Informa\u00e7\u00e3o (MagIC)","award":["UIDB\/00315\/2020"],"award-info":[{"award-number":["UIDB\/00315\/2020"]}]},{"name":"BRU-ISCTE-IUL","award":["UIDB\/00297\/2020"],"award-info":[{"award-number":["UIDB\/00297\/2020"]}]},{"name":"BRU-ISCTE-IUL","award":["UIDP\/00297\/2020"],"award-info":[{"award-number":["UIDP\/00297\/2020"]}]},{"name":"BRU-ISCTE-IUL","award":["UIDB\/04152\/2020"],"award-info":[{"award-number":["UIDB\/04152\/2020"]}]},{"name":"BRU-ISCTE-IUL","award":["UIDB\/00315\/2020"],"award-info":[{"award-number":["UIDB\/00315\/2020"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Risks"],"abstract":"<jats:p>Modelling claim frequency and claim severity are topics of great interest in property-casualty insurance for supporting underwriting, ratemaking, and reserving actuarial decisions. Standard Generalized Linear Models (GLM) frequency\u2013severity models assume a linear relationship between a function of the response variable and the predictors, independence between the claim frequency and severity, and assign full credibility to the data. To overcome some of these restrictions, this paper investigates the predictive performance of Gradient Boosting with decision trees as base learners to model the claim frequency and the claim severity distributions of an auto insurance big dataset and compare it with that obtained using a standard GLM model. The out-of-sample performance measure results show that the predictive performance of the Gradient Boosting Model (GBM) is superior to the standard GLM model in the Poisson claim frequency model. 
Differently, in the claim severity model, the classical GLM outperformed the Gradient Boosting Model. The findings suggest that gradient boost models can capture the non-linear relation between the response variable and feature variables and their complex interactions and thus are a valuable tool for the insurer in feature engineering and the development of a data-driven approach to risk management and insurance.<\/jats:p>","DOI":"10.3390\/risks11090163","type":"journal-article","created":{"date-parts":[[2023,9,12]],"date-time":"2023-09-12T21:34:12Z","timestamp":1694554452000},"page":"163","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":23,"title":["Modelling Motor Insurance Claim Frequency and Severity Using Gradient Boosting"],"prefix":"10.3390","volume":"11","author":[{"given":"Carina","family":"Clemente","sequence":"first","affiliation":[{"name":"NOVA IMS\u2014Information Management School, Universidade Nova de Lisboa, 1070-312 Lisbon, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4805-2638","authenticated-orcid":false,"given":"Gracinda R.","family":"Guerreiro","sequence":"additional","affiliation":[{"name":"FCT NOVA, Universidade Nova de Lisboa, 2829-516 Caparica, Portugal"},{"name":"CMA-FCT-UNL, Universidade Nova de Lisboa, 2829-516 Caparica, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7389-5103","authenticated-orcid":false,"given":"Jorge M.","family":"Bravo","sequence":"additional","affiliation":[{"name":"NOVA IMS\u2014Information Management School, Universidade Nova de Lisboa, MagIC, 1070-312 Lisbon, Portugal"},{"name":"Department of Economics, University Paris-Dauphine PSL, 75016 Paris, France"},{"name":"CEFAGE-UE, 7000-809 \u00c9vora, Portugal"},{"name":"BRU-ISCTE-IUL, 1649-026 Lisbon, 
Portugal"}]}],"member":"1968","published-online":{"date-parts":[[2023,9,12]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"365","DOI":"10.1016\/j.aap.2013.06.014","article-title":"Multivariate spatial models of excess crash frequency at area level: Case of Costa Rica","volume":"59","year":"2013","journal-title":"Accident Analysis & Prevention"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"114835","DOI":"10.1016\/j.eswa.2021.114835","article-title":"A Conservative Approach for Online Credit Scoring","volume":"176","author":"Ashofteh","year":"2021","journal-title":"Expert Systems With Applications"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"109422","DOI":"10.1016\/j.asoc.2022.109422","article-title":"A New Ensemble Learning Strategy for Panel Time-Series Forecasting with Applications to Tracking Respiratory Disease Excess Mortality during the COVID-19 pandemic","volume":"128","author":"Ashofteh","year":"2022","journal-title":"Applied Soft Computing"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Ayuso, Mercedes, Bravo, Jorge M., Holzmann, Robert, and Palmer, Eduard (2021). Automatic indexation of pension age to life expectancy: When policy design matters. Risks, 9.","DOI":"10.3390\/risks9050096"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"69","DOI":"10.1016\/j.dss.2017.04.009","article-title":"The value of vehicle telematics data in insurance risk selection processes","volume":"98","author":"Baecke","year":"2017","journal-title":"Decision Support Systems"},{"key":"ref_6","unstructured":"Boehmke, Bradley, and Greenwel, Brandon (2020). Hands-On Machine Learning with R, CRC Press, Taylor & Francis. 
[1st ed.]."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"125","DOI":"10.1007\/s13385-021-00279-w","article-title":"Pricing Participating Longevity-Linked Life Annuities: A Bayesian Model Ensemble approach","volume":"12","author":"Bravo","year":"2021","journal-title":"European Actuarial Journal"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Bravo, Jorge M., and Ayuso, Mercedes (2021). Linking Pensions to Life Expectancy: Tackling Conceptual Uncertainty through Bayesian Model Averaging. Mathematics, 9.","DOI":"10.3390\/math9243307"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random Forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Machine Learning"},{"key":"ref_10","unstructured":"Chollet, Fran\u00e7ois (2021). Deep Learning with Python, Manning. [2nd ed.]."},{"key":"ref_11","unstructured":"Clemente, Carina (2023). A Refreshed Vision of Non-Life Insurance Pricing\u2014A Generalized Linear Model and Machine Learning Approach. [Master\u2019s thesis, NOVA IMS]."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Cunha, Louren\u00e7o, and Bravo, Jorge M. (, January June). Automobile Usage-Based-Insurance: Improving Risk Management using Telematics Data. Paper presented at 2022 17th Iberian Conference on Information Systems and Technologies (CISTI), Madrid, Spain.","DOI":"10.23919\/CISTI54924.2022.9820146"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"278","DOI":"10.1080\/03461238.2010.546147","article-title":"A mixed copula model for insurance claims and claim sizes","volume":"4","author":"Czado","year":"2012","journal-title":"Scandinavian Actuarial Journal"},{"key":"ref_14","unstructured":"European Parliament (2016). General Data Protection Regulation, European Parliament. 
Regulation (EU) 2016\/679."},{"key":"ref_15","first-page":"159","article-title":"The Accuracy of XGBoost for Insurance Claim Prediction","volume":"10","author":"Fauzan","year":"2018","journal-title":"International Journal of Advances in Soft Computing and Its Applications"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"165","DOI":"10.2143\/AST.39.1.2038061","article-title":"Actuarial applications of a hierarchical insurance claims model","volume":"39","author":"Frees","year":"2009","journal-title":"ASTIN Bulletin: The Journal of the IAA"},{"key":"ref_17","first-page":"360","article-title":"Copula credibility for aggregate loss models","volume":"38","author":"Frees","year":"2006","journal-title":"Insurance: Mathematics and Economics"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"377","DOI":"10.1080\/10920277.2011.10597626","article-title":"Predicting the frequency and amount of health care expenditures","volume":"15","author":"Frees","year":"2011","journal-title":"North American Actuarial Journal"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"1189","DOI":"10.1214\/aos\/1013203451","article-title":"Greedy function approximation: A gradient boosting machine","volume":"29","author":"Friedman","year":"2001","journal-title":"Annals of Statistics"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"916","DOI":"10.1214\/07-AOAS148","article-title":"Predictive learning via rule ensembles","volume":"2","author":"Friedman","year":"2008","journal-title":"The Annals of Applied Statistics"},{"key":"ref_21","first-page":"29","article-title":"Dependence modeling of frequency-severity of insurance claims using waiting time","volume":"109","author":"Gao","year":"2023","journal-title":"Insurance: Mathematics and Economics"},{"key":"ref_22","first-page":"205","article-title":"Generalized linear models for dependent frequency and severity of insurance 
claims","volume":"70","author":"Garrido","year":"2016","journal-title":"Insurance: Mathematics and Economics"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"44","DOI":"10.1080\/10618600.2014.907095","article-title":"Peeking inside the black box: Visualizing statistical learning with plots of individual conditional expectation","volume":"24","author":"Goldstein","year":"2015","journal-title":"Journal of Computational and Graphical Statistics"},{"key":"ref_24","first-page":"202","article-title":"Spatial modelling of claim frequency and claim size in non-life insurance","volume":"3","author":"Czado","year":"2007","journal-title":"Scandinavian Actuarial Journal"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Hanafy, Mohamed, and Ming, Ruixing (2021). Machine learning approaches for auto insurance big data. Risks, 9.","DOI":"10.3390\/risks9020042"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"993","DOI":"10.1109\/34.58871","article-title":"Neural networks Ensembles","volume":"12","author":"Hansen","year":"1990","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Hastie, Trevor, Tibshirani, Robert, and Friedman, Jerome (2009). The Elements of Statistical Learning\u2014Data Mining, Inference, and Prediction, Springer. [2nd ed.]. 
Springer Series in Statistics.","DOI":"10.1007\/978-0-387-84858-7"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"255","DOI":"10.1080\/10920277.2020.1745656","article-title":"Boosting Insights in Insurance Tariff Plans with Tree-Based Machine Learning Methods","volume":"25","author":"Henckaerts","year":"2021","journal-title":"North American Actuarial Journal"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"79","DOI":"10.1162\/neco.1991.3.1.79","article-title":"Adaptive mixtures of local experts","volume":"3","author":"Jacobs","year":"1991","journal-title":"Neural Computation"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"182","DOI":"10.1016\/j.insmatheco.2020.07.011","article-title":"Predictive compound risk models with dependence","volume":"94","author":"Jeong","year":"2020","journal-title":"Insurance: Mathematics and Economics"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"163","DOI":"10.1016\/j.ijforecast.2007.06.001","article-title":"Simple robust averages of forecasts: Some empirical results","volume":"24","author":"Jose","year":"2008","journal-title":"International Journal of Forecasting"},{"key":"ref_32","first-page":"187","article-title":"Statistical Concepts of a Priori and a Posteriori Risk Classification in Insurance","volume":"96","author":"Katrien","year":"2011","journal-title":"Advances in Statistical Analysis"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"117366","DOI":"10.1016\/j.eswa.2022.117366","article-title":"Bagging ensemble-based novel data generation method for univariate time series forecasting","volume":"203","author":"Kim","year":"2022","journal-title":"Expert Systems with Applications"},{"key":"ref_34","first-page":"829","article-title":"Total loss estimation using copula-based regression models","volume":"53","author":"Brechmann","year":"2013","journal-title":"Insurance: Mathematics and Economics"},{"key":"ref_35","unstructured":"Kuo, Kuo, and Lupton, Daniel (2023, 
September 05). Towards Explainability of Machine Learning Models in Insurance Pricing. Available online: https:\/\/variancejournal.org\/article\/68374-towards-explainability-of-machine-learning-models-in-insurance-pricing."},{"key":"ref_36","first-page":"115","article-title":"Actuarial intelligence in auto insurance: Claim frequency modeling with driving behavior features and improved boosted trees","volume":"106","author":"Meng","year":"2022","journal-title":"Insurance: Mathematics and Economics"},{"key":"ref_37","unstructured":"Noll, Alexander, Salzmann, Robert, and W\u00fcthrich, Mario V. (2020). Case Study: French Motor Third-Party Liability Claims. SSRN Electronic Journal, 1\u201341."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Ohlsson, Esbj\u00f6rn, and Johansson, Bj\u00f6rn (2010). Non-Life Insurance Pricing with Generalized Linear Models, Springer. [2nd ed.].","DOI":"10.1007\/978-3-642-10791-7"},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"470","DOI":"10.1007\/PL00011679","article-title":"Arbitrating among competing classifiers using learned referees","volume":"3","author":"Ortega","year":"2001","journal-title":"Knowledge and Information Systems"},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"192","DOI":"10.1016\/j.dss.2013.06.001","article-title":"Evaluation and aggregation of pay-as-you-drive insurance rate factors: A classification analysis approach","volume":"56","author":"Paefgen","year":"2013","journal-title":"Decision Support Systems"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Pesantez-Narvaez, Jessica, Guillen, Monserrat, and Alca\u00f1iz, Manuela (2019). Predicting motor insurance claims using telematics data\u2014XGBoost versus logistic regression. 
Risks, 7.","DOI":"10.20944\/preprints201905.0122.v1"},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"606","DOI":"10.1080\/10618600.2015.1005213","article-title":"Tweedie\u2019s Compound Poisson Model with Grouped Elastic Net","volume":"25","author":"Qian","year":"2016","journal-title":"Journal of Computational and Graphical Statistics"},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"377","DOI":"10.1515\/demo-2018-0022","article-title":"Predictive analytics of insurance claims using multivariate decision trees","volume":"6","author":"Quan","year":"2018","journal-title":"Dependence Modeling"},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"179","DOI":"10.1080\/01621459.1997.10473615","article-title":"Bayesian model averaging for linear regression models","volume":"92","author":"Raftery","year":"1997","journal-title":"Journal of the American Statistical Association"},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"265","DOI":"10.2143\/AST.24.2.2005070","article-title":"Modelling the claims process in the presence of covariates","volume":"24","author":"Renshaw","year":"1994","journal-title":"ASTIN Bulletin"},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"37","DOI":"10.1016\/j.neucom.2016.08.072","article-title":"Dynamic selection of forecast combiners","volume":"218","author":"Sergio","year":"2016","journal-title":"Neurocomputing"},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"198","DOI":"10.1080\/03461238.2014.921639","article-title":"Insurance ratemaking using a copula-based multivariate Tweedie model","volume":"2016","author":"Shi","year":"2016","journal-title":"Scandinavian Actuarial Journal"},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"357","DOI":"10.1214\/19-AOAS1299","article-title":"Regression for copula-linked compound distributions with application in modelling aggregate insurance claims","volume":"14","author":"Shi","year":"2020","journal-title":"The Annals of Applied 
Statistics"},{"key":"ref_49","first-page":"417","article-title":"Dependent frequency\u2013severity modeling of insurance claims","volume":"64","author":"Shi","year":"2015","journal-title":"Insurance: Mathematics and Economics"},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1029\/2003WR002816","article-title":"Artificial neural network ensembles and their application in pooled flood frequency analysis","volume":"40","author":"Shu","year":"2004","journal-title":"Water Resources Research"},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Staudt, Yves, and Wagner, Joel (2021). Assessing the performance of random forests for modeling claim severity in collision car insurance. Risks, 9.","DOI":"10.3390\/risks9030053"},{"key":"ref_52","doi-asserted-by":"crossref","first-page":"644","DOI":"10.1257\/jel.20191385","article-title":"Model Averaging and Its Use in Economics","volume":"58","author":"Steel","year":"2020","journal-title":"Journal of Economic Literature"},{"key":"ref_53","doi-asserted-by":"crossref","first-page":"e0238000","DOI":"10.1371\/journal.pone.0238000","article-title":"Stochastic gradient boosting frequency-severity model of insurance claims","volume":"15","author":"Su","year":"2020","journal-title":"PLoS ONE"},{"key":"ref_54","doi-asserted-by":"crossref","first-page":"1275","DOI":"10.1111\/rssc.12283","article-title":"Unravelling the predictive power of telematics data in car insurance pricing","volume":"67","author":"Verbelen","year":"2018","journal-title":"Journal of the Royal Statistical Society. Series C. Applied Statistics"},{"key":"ref_55","doi-asserted-by":"crossref","first-page":"241","DOI":"10.1016\/S0893-6080(05)80023-1","article-title":"Stacked generalization","volume":"5","author":"Wolpert","year":"1992","journal-title":"Neural Networks"},{"key":"ref_56","unstructured":"W\u00fcthrich, Mario V., and Buser, Christoph (2023). Data Analytics for Non-Life Insurance Pricing, ETH Zurich. 
Swiss Finance Institute Research Paper No. 16-68."},{"key":"ref_57","doi-asserted-by":"crossref","unstructured":"W\u00fcthrich, Mario V., and Merz, Michael (2023). Statistical Foundations of Actuarial Learning and Applications, Springer.","DOI":"10.1007\/978-3-031-12409-9"},{"key":"ref_58","doi-asserted-by":"crossref","first-page":"456","DOI":"10.1080\/07350015.2016.1200981","article-title":"Insurance premium prediction via gradient tree-boosted Tweedie compound Poisson models","volume":"36","author":"Yang","year":"2018","journal-title":"Journal of Business & Economic Statistics"},{"key":"ref_59","doi-asserted-by":"crossref","first-page":"184","DOI":"10.1016\/j.aap.2016.11.018","article-title":"A multivariate random-parameters Tobit model for analyzing highway crash rates by injury severity","volume":"99","author":"Zeng","year":"2017","journal-title":"Accident Analysis & Prevention"},{"key":"ref_60","doi-asserted-by":"crossref","first-page":"5507","DOI":"10.1080\/03610918.2020.1772302","article-title":"Tweedie Gradient Boosting for Extremely Unbalanced Zero-inflated Data","volume":"51","author":"Zhou","year":"2022","journal-title":"Communications in Statistics\u2014Simulation and 
Computation"}],"container-title":["Risks"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2227-9091\/11\/9\/163\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T20:49:26Z","timestamp":1760129366000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2227-9091\/11\/9\/163"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,9,12]]},"references-count":60,"journal-issue":{"issue":"9","published-online":{"date-parts":[[2023,9]]}},"alternative-id":["risks11090163"],"URL":"https:\/\/doi.org\/10.3390\/risks11090163","relation":{},"ISSN":["2227-9091"],"issn-type":[{"value":"2227-9091","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,9,12]]}}}