{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,13]],"date-time":"2026-02-13T06:22:17Z","timestamp":1770963737587,"version":"3.50.1"},"reference-count":36,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2023,2,8]],"date-time":"2023-02-08T00:00:00Z","timestamp":1675814400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Puglia Region (Italy)\u2014Project \u201cVOice Intelligence for Customer Experience (VO.I.C.E. First)\u201d"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>The paper deals with the analysis of conversation transcriptions between customers and agents in a call center of a customer care service. The objective is to support the analysis of text transcription of human-to-human conversations, to obtain reports on customer problems and complaints, and on the way an agent has solved them. The aim is to provide customer care service with a high level of efficiency and user satisfaction. To this aim, topic modeling is considered since it facilitates insightful analysis from large documents and datasets, such as a summarization of the main topics and topic characteristics. This paper presents a performance comparison of four topic modeling algorithms: (i) Latent Dirichlet Allocation (LDA); (ii) Non-negative Matrix Factorization (NMF); (iii) Neural-ProdLDA (Neural LDA) and Contextualized Topic Models (CTM). The comparison study is based on a database containing real conversation transcriptions in Italian Natural Language. Experimental results and different topic evaluation metrics are analyzed in this paper to determine the most suitable model for the case study. The gained knowledge can be exploited by practitioners to identify the optimal strategy and to perform and evaluate topic modeling on Italian natural language transcriptions of human-to-human conversations. This work can be an asset for grounding applications of topic modeling and can be inspiring for similar case studies in the domain of customer care quality.<\/jats:p>","DOI":"10.3390\/a16020094","type":"journal-article","created":{"date-parts":[[2023,2,8]],"date-time":"2023-02-08T05:37:31Z","timestamp":1675834651000},"page":"94","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":16,"title":["A Comparison of Different Topic Modeling Methods through a Real Case Study of Italian Customer Care"],"prefix":"10.3390","volume":"16","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1507-8188","authenticated-orcid":false,"given":"Gabriele","family":"Papadia","sequence":"first","affiliation":[{"name":"Department of Engineering for Innovation, University of Salento, 73100 Lecce, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3712-7932","authenticated-orcid":false,"given":"Massimo","family":"Pacella","sequence":"additional","affiliation":[{"name":"Department of Engineering for Innovation, University of Salento, 73100 Lecce, Italy"}]},{"given":"Massimiliano","family":"Perrone","sequence":"additional","affiliation":[{"name":"Department of Engineering for Innovation, University of Salento, 73100 Lecce, Italy"}]},{"given":"Vincenzo","family":"Giliberti","sequence":"additional","affiliation":[{"name":"IN & OUT S.p.A. a Socio Unico Teleperformance S.E., 74121 Taranto, Italy"}]}],"member":"1968","published-online":{"date-parts":[[2023,2,8]]},"reference":[{"key":"ref_1","first-page":"993","article-title":"Latent Dirichlet allocation","volume":"3","author":"Blei","year":"2003","journal-title":"J. Mach. Learn. Res."},{"key":"ref_2","unstructured":"Leen, T., Dietterich, T., and Tresp, V. Algorithms for Non-negative Matrix Factorization. Proceedings of the Advances in Neural Information Processing Systems."},{"key":"ref_3","unstructured":"Srivastava, A., and Sutton, C. (2017). Autoencoding Variational Inference For Topic Models. arXiv."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Bianchi, F., Terragni, S., Hovy, D., Nozza, D., and Fersini, E. (2021, January 19\u201323). Cross-lingual Contextualized Topic Models with Zero-shot Learning. Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, Online.","DOI":"10.18653\/v1\/2021.eacl-main.143"},{"key":"ref_5","unstructured":"Dieng, A.B., Ruiz, F.J., and Blei, D.M. (2019). The dynamic embedded topic model. arXiv."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/1852102.1852106","article-title":"A similarity measure for indefinite rankings","volume":"28","author":"Webber","year":"2010","journal-title":"ACM Trans. Inf. Syst. (TOIS)"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Papadia, G., Pacella, M., and Giliberti, V. (2022). Topic Modeling for Automatic Analysis of Natural Language: A Case Study in an Italian Customer Support Center. Algorithms, 15.","DOI":"10.3390\/a15060204"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3507900","article-title":"The evolution of topic modeling","volume":"54","author":"Churchill","year":"2022","journal-title":"ACM Comput. Surv."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"103","DOI":"10.1023\/A:1007692713085","article-title":"Text classification from labeled and unlabeled documents using EM","volume":"39","author":"Nigam","year":"2000","journal-title":"Mach. Learn."},{"key":"ref_10","unstructured":"Blei, D., and Lafferty, J. (2006, January 4\u20137). Correlated topic models. Proceedings of the NIPS\u201906, Vancouver, BC, Canada."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"439","DOI":"10.1162\/tacl_a_00325","article-title":"Topic modeling in embedding spaces","volume":"8","author":"Dieng","year":"2020","journal-title":"Trans. Assoc. Comput. Linguist."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Bianchi, F., Terragni, S., and Hovy, D. (2020). Pre-training is a hot topic: Contextualized document embeddings improve topic coherence. arXiv.","DOI":"10.18653\/v1\/2021.acl-short.96"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Lau, J.H., Newman, D., and Baldwin, T. (2014, January 26\u201330). Machine reading tea leaves: Automatically evaluating topic coherence and topic model quality. Proceedings of the EACL\u201914, Gothenburg, Sweden.","DOI":"10.3115\/v1\/E14-1056"},{"key":"ref_14","first-page":"3111","article-title":"Distributed representations of words and phrases and their compositionality","volume":"26","author":"Mikolov","year":"2013","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Xia, L., Luo, D., Zhang, C., and Wu, Z. (2019, January 25\u201328). A survey of topic models in text classification. Proceedings of the 2019 2nd International Conference on Artificial Intelligence and Big Data (ICAIBD), Chengdu, China.","DOI":"10.1109\/ICAIBD.2019.8836970"},{"key":"ref_16","first-page":"1","article-title":"A detailed survey on topic modeling for document and short text data","volume":"178","author":"Likhitha","year":"2019","journal-title":"Int. J. Comput. Appl."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"102131","DOI":"10.1016\/j.is.2022.102131","article-title":"Topic modeling algorithms and applications: A survey","volume":"112","author":"Abdelrazek","year":"2022","journal-title":"Inf. Syst."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Liu, Z., Ng, A., Lee, S., Aw, A.T., and Chen, N.F. (2019, January 14\u201318). Topic-aware pointer-generator networks for summarizing spoken conversations. Proceedings of the 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Singapore.","DOI":"10.1109\/ASRU46091.2019.9003764"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Tur, G., and De Mori, R. (2011). Spoken Language Understanding: Systems for Extracting Semantic Information from Speech, John Wiley & Sons.","DOI":"10.1002\/9781119992691"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"15169","DOI":"10.1007\/s11042-018-6894-4","article-title":"Latent Dirichlet Allocation (LDA) and Topic modeling: Models, applications, a survey","volume":"78","author":"Jelodar","year":"2019","journal-title":"Multimed. Tools Appl."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"319","DOI":"10.1002\/9781119992691.ch12","article-title":"Chapter 12: Topic identification","volume":"Volume 12","author":"Hazen","year":"2011","journal-title":"Spoken Language Understanding: Systems for Extracting Semantic Information from Speech"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Zhao, G., Zhao, J., Li, Y., Alt, C., Schwarzenberg, R., Hennig, L., Schaffer, S., Schmeier, S., Hu, C., and Xu, F. (2019). MOLI: Smart conversation agent for mobile customer service. Information, 10.","DOI":"10.3390\/info10020063"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Pota, M., Ventura, M., Catelli, R., and Esposito, M. (2020). An effective BERT-based pipeline for Twitter sentiment analysis: A case study in Italian. Sensors, 21.","DOI":"10.3390\/s21010133"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Agostino, D., Brambilla, M., Pavanetto, S., and Riva, P. (2021). The contribution of online reviews for quality evaluation of cultural tourism offers: The experience of Italian museums. Sustainability, 13.","DOI":"10.3390\/su132313340"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Aria, M., Cuccurullo, C., D\u2019Aniello, L., Misuraca, M., and Spano, M. (2022). Thematic analysis as a new culturomic tool: The social media coverage on COVID-19 pandemic in Italy. Sustainability, 14.","DOI":"10.3390\/su14063643"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Murdock, J., and Allen, C. (2015, January 25\u201330). Visualization Techniques for Topic Model Checking. Proceedings of the AAAI Conference on Artificial Intelligence, Austin, TX, USA.","DOI":"10.1609\/aaai.v29i1.9268"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"93","DOI":"10.1080\/19312458.2018.1430754","article-title":"Applying LDA topic modeling in communication research: Toward a valid and reliable methodology","volume":"12","author":"Maier","year":"2018","journal-title":"Commun. Methods Meas."},{"key":"ref_28","unstructured":"Kingma, D.P., and Welling, M. (2013). Auto-Encoding Variational Bayes. arXiv."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Terragni, S., Fersini, E., Galuzzi, B.G., Tropeano, P., and Candelieri, A. (2021, January 19\u201323). Octis: Comparing and optimizing topic models is simple!. Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations, Online.","DOI":"10.18653\/v1\/2021.eacl-demos.31"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"R\u00f6der, M., Both, A., and Hinneburg, A. (2015, January 2\u20136). Exploring the space of topic coherence measures. Proceedings of the Eighth ACM International Conference on Web Search and Data Mining, Shanghai, China.","DOI":"10.1145\/2684822.2685324"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Phan, X.H., Nguyen, L.M., and Horiguchi, S. (2008, January 21\u201325). Learning to classify short and sparse text & web with hidden topics from large-scale data collections. Proceedings of the 17th International Conference on World Wide Web, Beijing, China.","DOI":"10.1145\/1367497.1367510"},{"key":"ref_32","unstructured":"(2022, December 11). Simplemma: A Simple Multilingual Lemmatizer for Python [Computer Software]. Available online: https:\/\/github.com\/adbar\/simplemma."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Barbaresi, A., and Hein, K. (2017, January 27\u201331). Data-driven identification of German phrasal compounds. Proceedings of the International Conference on Text, Speech, and Dialogue, Prague, Czech Republic.","DOI":"10.1007\/978-3-319-64206-2_22"},{"key":"ref_34","unstructured":"Barbaresi, A. (2016, January 12). An unsupervised morphological criterion for discriminating similar languages. Proceedings of the 3rd Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2016), Osaka, Japan."},{"key":"ref_35","unstructured":"Barbaresi, A. (2016, January 19\u201321). Bootstrapped OCR error detection for a less-resourced language variant. Proceedings of the 13th Conference on Natural Language Processing (KONVENS 2016), Bochum, Germany."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Guo, L., Li, S., Lu, R., Yin, L., Gorson-Deruel, A., and King, L. (2018). The research topic landscape in the literature of social class and inequality. PLoS ONE, 13.","DOI":"10.1371\/journal.pone.0199510"}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/16\/2\/94\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T18:27:51Z","timestamp":1760120871000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/16\/2\/94"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,2,8]]},"references-count":36,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2023,2]]}},"alternative-id":["a16020094"],"URL":"https:\/\/doi.org\/10.3390\/a16020094","relation":{},"ISSN":["1999-4893"],"issn-type":[{"value":"1999-4893","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,2,8]]}}}