{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,9]],"date-time":"2026-05-09T16:22:50Z","timestamp":1778343770704,"version":"3.51.4"},"reference-count":47,"publisher":"MDPI AG","issue":"12","license":[{"start":{"date-parts":[[2024,12,3]],"date-time":"2024-12-03T00:00:00Z","timestamp":1733184000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Future Internet"],"abstract":"<jats:p>Nowadays, predicting customer churn is essential for the success of any company. Loyal customers generate continuous revenue streams, resulting in long-term success and growth. Moreover, companies are increasingly prioritizing the retention of existing customers due to the higher costs associated with attracting new ones. Consequently, there has been a growing demand for advanced methods aimed at enhancing customer loyalty and satisfaction, as well as predicting churners. In our work, we focused on building a robust churn prediction model for the telecommunications industry based on large embeddings from large language models and logistic regression to accurately identify churners. We conducted extensive experiments using a range of embedding techniques, including OpenAI Text-embedding, Google Gemini Text Embedding, bidirectional encoder representations from transformers (BERT), Sentence-Transformers, Sent2vec, and Doc2vec, to extract meaningful features. Additionally, we tested various classifiers, including logistic regression, support vector machine, random forest, K-nearest neighbors, multilayer perceptron, naive Bayes, decision tree, and zero-shot classification, to build a robust model capable of making accurate predictions. The best-performing model in our experiments is the logistic regression classifier, which we trained using the extracted feature from the OpenAI Text-embedding-ada-002 model, achieving an accuracy of 89%. The proposed model demonstrates a high discriminative ability between churning and loyal customers.<\/jats:p>","DOI":"10.3390\/fi16120453","type":"journal-article","created":{"date-parts":[[2024,12,3]],"date-time":"2024-12-03T09:18:32Z","timestamp":1733217512000},"page":"453","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":9,"title":["Customer Churn Prediction Approach Based on LLM Embeddings and Logistic Regression"],"prefix":"10.3390","volume":"16","author":[{"ORCID":"https:\/\/orcid.org\/0009-0000-5728-5850","authenticated-orcid":false,"given":"Meryem","family":"Chajia","sequence":"first","affiliation":[{"name":"LISAC Laboratory, Department of Computer Science, Faculty of Sciences Dhar El Mahraz, Sidi Mohamed Ben Abdellah University, Fez 30000, Morocco"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5816-0897","authenticated-orcid":false,"given":"El Habib","family":"Nfaoui","sequence":"additional","affiliation":[{"name":"LISAC Laboratory, Department of Computer Science, Faculty of Sciences Dhar El Mahraz, Sidi Mohamed Ben Abdellah University, Fez 30000, Morocco"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2024,12,3]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"101","DOI":"10.1016\/j.procs.2020.03.187","article-title":"Churn Prediction in Telecommunication using Logistic Regression and Logit Boost","volume":"167","author":"Jain","year":"2020","journal-title":"Procedia Comput. Sci."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"165","DOI":"10.3390\/jtaer17010009","article-title":"Customer Churn in Retail E-Commerce Business: Spatial and Machine Learning Approach","volume":"17","author":"Kopczewska","year":"2022","journal-title":"Theor. Appl. Electron. Commer. Res."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"108491","DOI":"10.1016\/j.asoc.2022.108491","article-title":"Churn prediction in digital game-based learning using data mining techniques: Logistic regression, decision tree, and random forest","volume":"118","author":"Kiguchi","year":"2022","journal-title":"Appl. Soft Comput."},{"key":"ref_4","unstructured":"Blank, C., and Hermansson, T. (2024, July 03). A Machine Learning Approach to Churn Prediction in a Subscription-Based Service. Available online: https:\/\/www.diva-portal.org\/smash\/record.jsf?pid=diva2:1271985."},{"key":"ref_5","first-page":"34","article-title":"The hidden costs of customer dissatisfaction","volume":"14","author":"Tatikonda","year":"2013","journal-title":"Manag. Account. Q."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Agbemadon, K.B., Couturier, R., Laiymani, D., Agbemadon, K.B., and Couturier, R. (2022, January 14). Churn detection using machine learning in the retail industry. Proceedings of the 2022 2nd International Conference on Computer, Control and Robotics, Shanghai, China.","DOI":"10.1109\/ICCCR54399.2022.9790213"},{"key":"ref_7","unstructured":"De, S., Prabu, P., and Paulose, J. (2022, January 8). Application of Machine Learning in Customer Churn Prediction. Proceedings of the 3rd IEEE International Virtual Conference on Innovations in Power and Advanced Computing Technologies, Kuala Lumpur, Malaysia."},{"key":"ref_8","unstructured":"(2024, July 03). What Is Customer Churn?. Available online: https:\/\/blog.hubspot.com\/service\/what-is-customer-churn."},{"key":"ref_9","unstructured":"(2024, July 06). Churn Rate: What It Means, Examples, and Calculations. Available online: https:\/\/www.investopedia.com\/terms\/c\/churnrate.asp."},{"key":"ref_10","unstructured":"(2024, July 15). Customer Churn 101: What Is It, Types of Churn, and What to Do About It?. Available online: https:\/\/www.paddle.com\/resources\/customer-churn."},{"key":"ref_11","unstructured":"(2024, July 15). 50 Customer Retention Statistics to Know. Available online: https:\/\/blog.hubspot.com\/service\/statistics-on-customer-retention."},{"key":"ref_12","unstructured":"(2024, July 07). Customer Churn and How to Calculate Churn Rate. Available online: https:\/\/www.qualtrics.com\/experience-management\/customer\/customer-churn\/."},{"key":"ref_13","unstructured":"(2024, July 09). What is Customer Churn?\u2014NGDATA. Available online: https:\/\/www.ngdata.com\/what-is-customer-churn\/."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"100728","DOI":"10.1016\/j.measen.2023.100728","article-title":"E-commerce customer churn prevention using machine learning-based business intelligence strategy","volume":"27","author":"Gangadhar","year":"2023","journal-title":"Meas. Sens."},{"key":"ref_15","first-page":"145","article-title":"Customer churning analysis using machine learning algorithms","volume":"4","author":"Prabadevi","year":"2023","journal-title":"Int. J. Intell. Netw."},{"key":"ref_16","first-page":"100567","article-title":"Explaining customer churn prediction in telecom industry using tabular machine learning models","volume":"17","author":"Poudel","year":"2024","journal-title":"Mach. Learn. Appl."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"7","DOI":"10.1016\/j.dsm.2023.09.002","article-title":"Investigating customer churn in banking: A machine learning approach and visualization app for data science and management","volume":"7","author":"Singh","year":"2024","journal-title":"Data Sci. Manag."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"105389","DOI":"10.1016\/j.bandl.2024.105389","article-title":"Sentence-level embeddings reveal dissociable word- and sentence-level cortical representation across coarse- and fine-grained levels of meaning","volume":"250","author":"Fairhall","year":"2024","journal-title":"Brain Lang."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Moghadasi, M.N., and Zhuang, Y. (2021, January 19). Sent2Vec: A New Sentence Embedding Representation with Sentimental Semantic. Proceedings of the IEEE International Conference on Big Data (Big Data), Atlanta, GA, USA.","DOI":"10.1109\/BigData50022.2020.9378337"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"40","DOI":"10.1016\/j.procs.2020.12.007","article-title":"Unsupervised News Topic Modelling with Doc2Vec and Spherical Clustering","volume":"179","author":"Budiarto","year":"2021","journal-title":"Procedia Comput. Sci."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"35","DOI":"10.21786\/bbrc\/13.14\/9","article-title":"BERT for Opinion Mining and Sentiment Farming","volume":"13","author":"Buche","year":"2020","journal-title":"Biosc. Biotech. Res. Comm."},{"key":"ref_22","first-page":"112","article-title":"Hey BERT! Meet the Databases: Explorations of Bidirectional Encoder Representation from Transformers Model Use in Database Search Algorithms","volume":"18","author":"Coghill","year":"2021","journal-title":"J. Electron. Resour. Med. Libr."},{"key":"ref_23","first-page":"1641","article-title":"BERT Algorithm used in Google Search","volume":"70","author":"Singh","year":"2021","journal-title":"Math. Stat. Eng. Appl."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"506","DOI":"10.3390\/s23010506","article-title":"A BERT Framework to Sentiment Analysis of Tweets","volume":"23","author":"Bello","year":"2023","journal-title":"Sensors"},{"key":"ref_25","unstructured":"Ronfard, R., and de Verdi\u00e8re, R.C. (2020). OpenKinoAI: An Open Source Framework for Intelligent Cinematography and Editing of Live Performances. arXiv, Available online: https:\/\/arxiv.org\/abs\/2011.05203v1."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"242","DOI":"10.1016\/j.image.2019.02.011","article-title":"Prototype adjustment for zero shot classification","volume":"74","author":"Li","year":"2019","journal-title":"Signal Process Image Commun."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Tesfagergish, S.G., Kapo\u010di\u016bt\u0117-Dzikien\u0117, J., and Dama\u0161evi\u010dius, R. (2022). Zero-Shot Emotion Detection for Semi-Supervised Sentiment Analysis Using Sentence Transformers and Ensemble Learning. Appl. Sci., 12.","DOI":"10.3390\/app12178662"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Wang, G., Li, C., Wang, W., Zhang, Y., Shen, D., Zhang, X., Henao, R., and Carin, L. (2018, January 15\u201320). Joint Embedding of Words and Labels for Text Classification. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL), Melbourne, Australia.","DOI":"10.18653\/v1\/P18-1216"},{"key":"ref_29","unstructured":"(2024, July 16). OpenAI| New and Improved Embedding Model. Available online: https:\/\/openai.com\/index\/new-and-improved-embedding-model\/."},{"key":"ref_30","unstructured":"(2024, July 21). Gemini API | Google for Developers. Available online: https:\/\/ai.google.dev\/gemini-api\/docs\/models\/gemini."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"420","DOI":"10.1016\/j.procs.2024.05.049","article-title":"Enhancing Spam Detection with GANs and BERT Embeddings: A Novel Approach to Imbalanced Datasets","volume":"236","author":"Filali","year":"2024","journal-title":"Procedia Comput. Sci."},{"key":"ref_32","unstructured":"(2024, July 25). Sentence Transformers Documentation. Available online: https:\/\/www.sbert.net\/."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Wolf, T., Debut, L., Sanh, V., Chaumond, J., Delangue, C., Moi, A., Cistac, P., Rault, T., Louf, R., and Funtowicz, M. (, January 16\u201320). Transformers: State-of-the-Art Natural Language Processing. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Online.","DOI":"10.18653\/v1\/2020.emnlp-demos.6"},{"key":"ref_34","unstructured":"(2024, July 25). Hugging Face| Sentence-Transformers\/All-MiniLM-L6-v2. Available online: https:\/\/huggingface.co\/sentence-transformers\/all-MiniLM-L6-v2."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"119728","DOI":"10.1016\/j.ins.2023.119728","article-title":"Word2Vec-based efficient privacy-preserving shared representation learning for federated recommendation system in a cross-device setting","volume":"651","author":"Lee","year":"2023","journal-title":"Inf. Sci."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"769","DOI":"10.1016\/j.procs.2022.09.132","article-title":"Combining FastText and Glove Word Embedding for Offensive and Hate Speech Text Detection","volume":"207","author":"Badri","year":"2022","journal-title":"Procedia Comput. Sci."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"531","DOI":"10.1007\/978-3-7908-1807-9_21","article-title":"Support Vector Machines for Classification and Mapping of Reservoir Data","volume":"Volume 80","author":"Wong","year":"2002","journal-title":"Soft Computing for Reservoir Characterization and Modeling. Studies in Fuzziness and Soft Computing"},{"key":"ref_38","unstructured":"Dierckx, G. (2006). Logistic Regression Model. Encyclopedia of Actuarial Science, John Wiley & Sons, Ltd."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Singh, P. (2022). Random Forests Using PySpark. Machine Learning with PySpark, Springlink (Apress).","DOI":"10.1007\/978-1-4842-7777-5"},{"key":"ref_40","unstructured":"Murthy, K.V.S. (1996). On Growing Better Decision Trees from Data. [Ph.D. Dissertation, The Johns Hopkins University]."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"453","DOI":"10.1109\/TPAMI.2013.140","article-title":"Attribute-Based Classification for Zero-Shot Visual Object Categorization","volume":"36","author":"Lampert","year":"2014","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_42","first-page":"47","article-title":"A Customer Churn Prediction using Pearson Correlation Function and K Nearest Neighbor Algorithm for Telecommunication Industry","volume":"11","author":"Sjarif","year":"2019","journal-title":"Int. J. Adv. Soft Compu. Appl."},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"994","DOI":"10.1016\/j.asoc.2014.08.041","article-title":"Improved churn prediction in telecommunication industry using data mining techniques","volume":"24","author":"Keramati","year":"2014","journal-title":"Appl. Soft Comput."},{"key":"ref_44","first-page":"70","article-title":"Improved Accuracy of Naive Bayes Classifier for Determination of Customer Churn Uses SMOTE and Genetic Algorithms","volume":"1","author":"Safitri","year":"2020","journal-title":"J. Soft Comput. Explor."},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"339","DOI":"10.54691\/bcpbm.v44i.4840","article-title":"Customer Churn Prediction Based on the Decision Tree and Random Forest Model","volume":"44","author":"Zhao","year":"2023","journal-title":"BCP Bus. Manag."},{"key":"ref_46","unstructured":"(2024, June 05). Customer Churn. Available online: https:\/\/www.kaggle.com\/datasets\/undersc0re\/predict-the-churn-risk-rate."},{"key":"ref_47","first-page":"273","article-title":"Machine-Learning Techniques for Customer Retention: A Comparative Study","volume":"9","author":"Sabbeh","year":"2018","journal-title":"Int. J. Adv. Comput. Sci. Appl."}],"container-title":["Future Internet"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-5903\/16\/12\/453\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T16:46:01Z","timestamp":1760114761000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-5903\/16\/12\/453"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,12,3]]},"references-count":47,"journal-issue":{"issue":"12","published-online":{"date-parts":[[2024,12]]}},"alternative-id":["fi16120453"],"URL":"https:\/\/doi.org\/10.3390\/fi16120453","relation":{},"ISSN":["1999-5903"],"issn-type":[{"value":"1999-5903","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,12,3]]}}}