{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,22]],"date-time":"2026-03-22T08:07:05Z","timestamp":1774166825727,"version":"3.50.1"},"reference-count":34,"publisher":"MDPI AG","issue":"11","license":[{"start":{"date-parts":[[2021,11,15]],"date-time":"2021-11-15T00:00:00Z","timestamp":1636934400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Data"],"abstract":"<jats:p>The role of insurance in financial inclusion and economic growth, in general, is immense and is increasingly being recognized. However, low uptake impedes the growth of the sector, hence the need for a model that robustly predicts insurance uptake among potential clients. This study undertook a two-phase comparison of machine learning classifiers. Phase I had eight machine learning models compared for their performance in predicting the insurance uptake using 2016 Kenya FinAccess Household Survey data. Taking Phase I as a base in Phase II, random forest and XGBoost were compared with four deep learning classifiers using 2019 Kenya FinAccess Household Survey data. The random forest model trained on oversampled data showed the highest F1-score, accuracy, and precision. The area under the receiver operating characteristic curve was furthermore highest for random forest; hence, it could be construed as the most robust model for predicting the insurance uptake. Finally, the most important features in predicting insurance uptake as extracted from the random forest model were income, bank usage, and ability and willingness to support others. 
Hence, there is a need for a design and distribution of low income based products, and bancassurance could be said to be a plausible channel for the distribution of insurance products.<\/jats:p>","DOI":"10.3390\/data6110116","type":"journal-article","created":{"date-parts":[[2021,11,15]],"date-time":"2021-11-15T08:19:20Z","timestamp":1636964360000},"page":"116","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":10,"title":["A Comparative Analysis of Machine Learning Models for the Prediction of Insurance Uptake in Kenya"],"prefix":"10.3390","volume":"6","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8888-1110","authenticated-orcid":false,"given":"Nelson Kemboi","family":"Yego","sequence":"first","affiliation":[{"name":"African Center of Excellence in Data Science, University of Rwanda, Kigali, Rwanda"},{"name":"Faculty of Sciences, Department of Mathematics and Computing, Moi University, Eldoret 3900-30100, Kenya"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0941-9604","authenticated-orcid":false,"given":"Juma","family":"Kasozi","sequence":"additional","affiliation":[{"name":"African Center of Excellence in Data Science, University of Rwanda, Kigali, Rwanda"},{"name":"Faculty of Physical Sciences, Department of Mathematics, Makerere University, Kampala 7062-10218, Uganda"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8685-9806","authenticated-orcid":false,"given":"Joseph","family":"Nkurunziza","sequence":"additional","affiliation":[{"name":"African Center of Excellence in Data Science, University of Rwanda, Kigali, Rwanda"},{"name":"School of Economics, University of Rwanda, Kigali, Rwanda"}]}],"member":"1968","published-online":{"date-parts":[[2021,11,15]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"1150390","DOI":"10.1080\/23322039.2016.1150390","article-title":"Insurance penetration and economic growth in Africa: Dynamic effects analysis using Bayesian TVP-VAR 
approach","volume":"4","author":"Olayungbo","year":"2016","journal-title":"Cogent Econ. Financ."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Zhou, J., Guo, Y., Ye, Y., and Jiang, J. (2020, January 27\u201329). Multi-Label Entropy-Based Feature Selection with Applications to Insurance Purchase Prediction. Proceedings of the 2020 IEEE International Conference on Artificial Intelligence and Computer Applications (ICAICA), Dalian, China.","DOI":"10.1109\/ICAICA50127.2020.9181921"},{"key":"ref_3","unstructured":"African Union Commission (2017). Agenda2063-The Africa We Want, African Union Commission."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Lambregts, T.R., and Schut, F.T. (2019). A Systematic Review of the Reasons for Low Uptake of Long-Term Care Insurance and Life Annuities: Could Integrated Products Counter Them?, Netspar.","DOI":"10.1016\/j.jeoa.2020.100236"},{"key":"ref_5","unstructured":"AKI (2015). Insurance Industry Annual Report 2015, Association of Kenya Insurers. Technical Report."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Gine, X., Ribeiro, B., and Wrede, P. (2019). Beyond the S-Curve: Insurance Penetration, Institutional Quality and Financial Market Development, The World Bank.","DOI":"10.1596\/1813-9450-8925"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"257204","DOI":"10.1103\/PhysRevLett.120.257204","article-title":"Machine learning out-of-equilibrium phases of matter","volume":"120","author":"Venderley","year":"2018","journal-title":"Phys. Rev. Lett."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"L\u00f3pez Belmonte, J., Segura-Robles, A., Moreno-Guerrero, A.J., and Parra-Gonz\u00e1lez, M.E. (2020). Machine learning and big data in the impact literature. A bibliometric review with scientific mapping in Web of science. 
Symmetry, 12.","DOI":"10.3390\/sym12040495"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"523","DOI":"10.1002\/asmb.2543","article-title":"Machine learning applications in nonlife insurance","volume":"36","author":"Grize","year":"2020","journal-title":"Appl. Stoch. Model. Bus. Ind."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Krah, A.S., Nikoli\u0107, Z., and Korn, R. (2020). Machine learning in least-squares Monte Carlo proxy modeling of life insurance companies. Risks, 8.","DOI":"10.3390\/risks8010021"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"B\u00e4rtl, M., and Krummaker, S. (2020). Prediction of claims in export credit finance: A comparison of four machine learning techniques. Risks, 8.","DOI":"10.3390\/risks8010022"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Petrides, G., Moldovan, D., Coenen, L., Guns, T., and Verbeke, W. (2020). Cost-sensitive learning for profit-driven credit scoring. J. Oper. Res. Soc., 1\u201313.","DOI":"10.1080\/01605682.2020.1843975"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"16","DOI":"10.1016\/j.is.2015.04.007","article-title":"Time-series clustering\u2013a decade review","volume":"53","author":"Aghabozorgi","year":"2015","journal-title":"Inf. Syst."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Pavlyshenko, B.M. (2019). Machine-learning models for sales time series forecasting. Data, 4.","DOI":"10.3390\/data4010015"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Dashtipour, K., Gogate, M., Adeel, A., Ieracitano, C., Larijani, H., and Hussain, A. (2018, January 7\u20138). Exploiting deep learning for Persian sentiment analysis. 
Proceedings of the International Conference on Brain Inspired Cognitive Systems, Xi\u2019an, China.","DOI":"10.1007\/978-3-030-00563-4_58"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"651","DOI":"10.1016\/j.patrec.2009.09.011","article-title":"Data clustering: 50 years beyond K-means","volume":"31","author":"Jain","year":"2010","journal-title":"Pattern Recognit. Lett."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"788","DOI":"10.1016\/j.asoc.2015.09.040","article-title":"Artificial neural networks in business: Two decades of research","volume":"38","author":"Verner","year":"2016","journal-title":"Appl. Soft Comput."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"368","DOI":"10.1016\/j.engappai.2014.09.019","article-title":"A novel hybrid undersampling method for mining unbalanced datasets in banking and insurance","volume":"37","author":"Sundarkumar","year":"2015","journal-title":"Eng. Appl. Artif. Intell."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Shalev-Shwartz, S., and Ben-David, S. (2014). Understanding Machine Learning: From Theory to Algorithms, Cambridge University Press.","DOI":"10.1017\/CBO9781107298019"},{"key":"ref_20","first-page":"41","article-title":"Applications of support vector machine (SVM) learning in cancer genomics","volume":"15","author":"Huang","year":"2018","journal-title":"Cancer Genom.-Proteom."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"548","DOI":"10.1016\/j.procs.2020.01.049","article-title":"Effective Diagnosis of Alzheimer\u2019s Disease using Modified Decision Tree Classifier","volume":"165","author":"Naganandhini","year":"2019","journal-title":"Procedia Comput. 
Sci."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"S1","DOI":"10.1212\/WNL.50.3_Suppl_3.S1","article-title":"An algorithm (decision tree) for the management of Parkinson\u2019s disease: Treatment guidelines","volume":"50","author":"Olanow","year":"1998","journal-title":"Neurology"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"174","DOI":"10.1016\/j.proeng.2012.01.849","article-title":"Network anomaly detection by cascading k-Means clustering and C4. 5 decision tree algorithm","volume":"30","author":"Muniyandi","year":"2012","journal-title":"Procedia Eng."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"100","DOI":"10.1016\/j.geoderma.2017.12.002","article-title":"Spatial prediction of soil water retention in a P\u00e1ramo landscape: Methodological insight into machine learning using random forest","volume":"316","author":"Blanco","year":"2018","journal-title":"Geoderma"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"47","DOI":"10.1016\/j.foodres.2019.03.062","article-title":"Comparison between random forest and gradient boosting machine methods for predicting Listeria spp. prevalence in the environment of pastured poultry farms","volume":"122","author":"Golden","year":"2019","journal-title":"Food Res. Int."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"72","DOI":"10.1016\/j.energy.2019.05.230","article-title":"Predicting residential energy consumption using CNN-LSTM neural networks","volume":"182","author":"Kim","year":"2019","journal-title":"Energy"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Sun, J., Di, L., Sun, Z., Shen, Y., and Lai, Z. (2019). County-level soybean yield prediction using deep CNN-LSTM model. Sensors, 19.","DOI":"10.3390\/s19204363"},{"key":"ref_28","unstructured":"Central Bank of Kenya, FSD Kenya, and Kenya National Bureau of Statistics (2016). 
FinAccess Household Survey 2015, Central Bank of Kenya."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"7940","DOI":"10.1109\/ACCESS.2016.2619719","article-title":"Comparing oversampling techniques to handle the class imbalance problem: A customer churn prediction case study","volume":"4","author":"Amin","year":"2016","journal-title":"IEEE Access"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Pawluszek-Filipiak, K., and Borkowski, A. (2020). On the Importance of Train\u2013Test Split Ratio of Datasets in Automatic Landslide Detection by Supervised Classification. Remote Sens., 12.","DOI":"10.3390\/rs12183054"},{"key":"ref_31","unstructured":"Poria, S., Cambria, E., Hazarika, D., Majumder, N., Zadeh, A., and Morency, L.P. (2017, July 30\u2013August 4). Context-dependent sentiment analysis in user-generated videos. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vancouver, BC, Canada."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"118271","DOI":"10.1016\/j.conbuildmat.2020.118271","article-title":"An ensemble machine learning approach for prediction and optimization of modulus of elasticity of recycled aggregate concrete","volume":"244","author":"Han","year":"2020","journal-title":"Constr. Build. Mater."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Casalicchio, G., Molnar, C., and Bischl, B. (2018, January 10\u201314). Visualizing the feature importance for black box models. Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Dublin, Ireland.","DOI":"10.1007\/978-3-030-10925-7_40"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Pesantez-Narvaez, J., Guillen, M., and Alca\u00f1iz, M. (2019). Predicting motor insurance claims using telematics data\u2014XGBoost versus logistic regression. 
Risks, 7.","DOI":"10.20944\/preprints201905.0122.v1"}],"container-title":["Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2306-5729\/6\/11\/116\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T07:30:15Z","timestamp":1760167815000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2306-5729\/6\/11\/116"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,11,15]]},"references-count":34,"journal-issue":{"issue":"11","published-online":{"date-parts":[[2021,11]]}},"alternative-id":["data6110116"],"URL":"https:\/\/doi.org\/10.3390\/data6110116","relation":{},"ISSN":["2306-5729"],"issn-type":[{"value":"2306-5729","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,11,15]]}}}