{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,12]],"date-time":"2026-02-12T13:40:28Z","timestamp":1770903628051,"version":"3.50.1"},"reference-count":46,"publisher":"MDPI AG","issue":"6","license":[{"start":{"date-parts":[[2022,6,13]],"date-time":"2022-06-13T00:00:00Z","timestamp":1655078400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Puglia Region (Italy)\u2013Project \u201cVOice Intelligence for Customer Experience (VO.I.C.E. First)\u201d"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>This paper focuses on the automatic analysis of conversation transcriptions in the call center of a customer care service. The goal is to recognize topics related to problems and complaints discussed in several dialogues between customers and agents. Our study aims to implement a framework able to automatically cluster conversation transcriptions into cohesive and well-separated groups based on the content of the data. The framework can alleviate the analyst selecting proper values for the analysis and the clustering processes. To pursue this goal, we consider a probabilistic model based on the latent Dirichlet allocation, which associates transcriptions with a mixture of topics in different proportions. A case study consisting of transcriptions in the Italian natural language, and collected in a customer support center of an energy supplier, is considered in the paper. Performance comparison of different inference techniques is discussed using the case study. The experimental results demonstrate the approach\u2019s efficacy in clustering Italian conversation transcriptions. It also results in a practical tool to simplify the analytic process and off-load the parameter tuning from the end-user. According to recent works in the literature, this paper may be valuable for introducing latent Dirichlet allocation approaches in topic modeling for the Italian natural language.<\/jats:p>","DOI":"10.3390\/a15060204","type":"journal-article","created":{"date-parts":[[2022,6,13]],"date-time":"2022-06-13T06:31:59Z","timestamp":1655101919000},"page":"204","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":11,"title":["Topic Modeling for Automatic Analysis of Natural Language: A Case Study in an Italian Customer Support Center"],"prefix":"10.3390","volume":"15","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1507-8188","authenticated-orcid":false,"given":"Gabriele","family":"Papadia","sequence":"first","affiliation":[{"name":"Department of Engineering for Innovation, University of Salento, 73100 Lecce, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3712-7932","authenticated-orcid":false,"given":"Massimo","family":"Pacella","sequence":"additional","affiliation":[{"name":"Department of Engineering for Innovation, University of Salento, 73100 Lecce, Italy"}]},{"given":"Vincenzo","family":"Giliberti","sequence":"additional","affiliation":[{"name":"IN & OUT S.p.A. a Socio Unico Teleperformance S.E., 74121 Taranto, Italy"}]}],"member":"1968","published-online":{"date-parts":[[2022,6,13]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"10","DOI":"10.1016\/j.inffus.2016.10.004","article-title":"A review of natural language processing techniques for opinion mining systems","volume":"36","author":"Sun","year":"2017","journal-title":"Inf. Fusion"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Mukhamediev, R.I., Symagulov, A., Kuchin, Y., Yakunin, K., and Yelis, M. (2021). From Classical Machine Learning to Deep Neural Networks: A Simplified Scientometric Review. Appl. Sci., 11.","DOI":"10.3390\/app11125541"},{"key":"ref_3","unstructured":"Gupta, P., and Narang, B. (2012). Role of text mining in business intelligence. Gian Jyoti E-J., 1."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Hofmann, T. (1999, January 15\u201319). Probabilistic latent semantic indexing. Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Berkeley, CA, USA.","DOI":"10.1145\/312624.312649"},{"key":"ref_5","unstructured":"Xu, W., Liu, X., and Gong, Y. (August, January 28). Document clustering based on non-negative matrix factorization. Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Toronto, ON, Canada."},{"key":"ref_6","first-page":"993","article-title":"Latent Dirichlet allocation","volume":"3","author":"Blei","year":"2003","journal-title":"J. Mach. Learn. Res."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"1064","DOI":"10.1111\/ajps.12103","article-title":"Structural topic models for open-ended survey responses","volume":"58","author":"Roberts","year":"2014","journal-title":"Am. J. Political Sci."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"2293","DOI":"10.1016\/j.ins.2011.01.029","article-title":"Enhanced clustering of biomedical documents using ensemble non-negative matrix factorization","volume":"181","author":"Huang","year":"2011","journal-title":"Inf. Sci."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"62","DOI":"10.1080\/10580530.2020.1746982","article-title":"Exploratory analysis of internet of things (IoT) in healthcare: A topic modelling & co-citation approaches","volume":"38","author":"Dantu","year":"2021","journal-title":"Inf. Syst. Manag."},{"key":"ref_10","first-page":"0165551520930907","article-title":"A topic analysis method based on a three-dimensional strategic diagram","volume":"47","author":"Feng","year":"2020","journal-title":"J. Inf. Sci."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1007\/s13278-021-00767-7","article-title":"Identifying Covid-19 misinformation tweets and learning their spatio-temporal topic dynamics using Nonnegative Coupled Matrix Tensor Factorization","volume":"11","author":"Balasubramaniam","year":"2021","journal-title":"Soc. Netw. Anal. Min."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Wallach, H.M., Murray, I., Salakhutdinov, R., and Mimno, D. (2009, January 14\u201318). Evaluation methods for topic models. Proceedings of the ICML\u201909, Montreal, QC, Canada.","DOI":"10.1145\/1553374.1553515"},{"key":"ref_13","unstructured":"Buntine, W. (2009, January 14\u201318). Estimating likelihoods for topic models. Proceedings of the ACML\u201909, Montreal, QC, Canada."},{"key":"ref_14","first-page":"136","article-title":"Sentiment analysis of Italian and English corpora of internet news: A comparison with some economic trends","volume":"5","author":"Pavan","year":"2022","journal-title":"Int. J. Linguist. Lit. Transl."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.neucom.2019.10.009","article-title":"A hybrid Persian sentiment analysis framework: Integrating dependency grammar based rules and deep neural networks","volume":"380","author":"Dashtipour","year":"2020","journal-title":"Neurocomputing"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Catelli, R., Pelosi, S., and Esposito, M. (2022). Lexicon-based vs. Bert-based sentiment analysis: A comparative study in Italian. Electronics, 11.","DOI":"10.3390\/electronics11030374"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Zubani, M., Sigalini, L., Serina, I., Putelli, L., Gerevini, A.E., and Chiari, M. (2022). A Performance Comparison of Different Cloud-Based Natural Language Understanding Services for an Italian e-Learning Platform. Future Internet, 14.","DOI":"10.3390\/fi14020062"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Tur, G., and De Mori, R. (2011). Spoken Language Understanding: Systems for Extracting Semantic Information from Speech, John Wiley & Sons.","DOI":"10.1002\/9781119992691"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"319","DOI":"10.1002\/9781119992691.ch12","article-title":"Topic identification","volume":"Volume 12","author":"Hazen","year":"2011","journal-title":"Spoken Language Understanding: Systems for Extracting Semantic Information from Speech"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Zhao, G., Zhao, J., Li, Y., Alt, C., Schwarzenberg, R., Hennig, L., Schaffer, S., Schmeier, S., Hu, C., and Xu, F. (2019). MOLI: Smart conversation agent for mobile customer service. Information, 10.","DOI":"10.3390\/info10020063"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"859","DOI":"10.1080\/01621459.2017.1285773","article-title":"Variational inference: A review for statisticians","volume":"112","author":"Blei","year":"2017","journal-title":"J. Am. Stat. Assoc."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"101582","DOI":"10.1016\/j.is.2020.101582","article-title":"A review of topic modeling methods","volume":"94","author":"Vayansky","year":"2020","journal-title":"Inf. Syst."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Foulds, J., Boyles, L., DuBois, C., Smyth, P., and Welling, M. (2013, January 11\u201314). Stochastic collapsed variational Bayesian inference for latent Dirichlet allocation. Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Chicago, IL, USA.","DOI":"10.1145\/2487575.2487697"},{"key":"ref_24","first-page":"2825","article-title":"Scikit-learn: Machine Learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J. Mach. Learn. Res."},{"key":"ref_25","unstructured":"Rehurek, R., and Sojka, P. (2010, January 22). Software framework for topic modelling with large corpora. Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, Valletta, Malta."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"5228","DOI":"10.1073\/pnas.0307752101","article-title":"Finding scientific topics","volume":"101","author":"Griffiths","year":"2004","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Porteous, I., Newman, D., Ihler, A., Asuncion, A., Smyth, P., and Welling, M. (2008, January 24\u201327). Fast collapsed gibbs sampling for latent dirichlet allocation. Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Las Vegas, NV, USA.","DOI":"10.1145\/1401890.1401960"},{"key":"ref_28","unstructured":"Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv."},{"key":"ref_29","unstructured":"Chen, M. (2017). Efficient vector representation for documents through corruption. arXiv."},{"key":"ref_30","unstructured":"Le, Q., and Mikolov, T. (2014, January 22\u201324). Distributed representations of sentences and documents. Proceedings of the International Conference on Machine Learning, PMLR, Beijing, China."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"391","DOI":"10.1002\/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9","article-title":"Indexing by latent semantic analysis","volume":"41","author":"Deerwester","year":"1990","journal-title":"J. Am. Soc. Inf. Sci."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"40","DOI":"10.22215\/timreview\/1170","article-title":"A topic modelling analysis of living Labs research","volume":"8","author":"Westerlund","year":"2018","journal-title":"Technol. Innov. Manag. Rev."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"106511","DOI":"10.1016\/j.compchemeng.2019.106511","article-title":"Forty years of Computers and Chemical Engineering: Analysis of the field via text mining techniques","volume":"129","author":"Zhang","year":"2019","journal-title":"Comput. Chem. Eng."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"275","DOI":"10.1016\/j.jbusres.2019.01.053","article-title":"A text mining and topic modelling perspective of ethnic marketing research","volume":"103","author":"Moro","year":"2019","journal-title":"J. Bus. Res."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Anantharaman, A., Jadiya, A., Siri, C.T.S., Adikar, B.N., and Mohan, B. (2019, January 23\u201325). Performance evaluation of topic modeling algorithms for text classification. Proceedings of the 2019 3rd International Conference on Trends in Electronics and Informatics (ICOEI), Tirunelveli, India.","DOI":"10.1109\/ICOEI.2019.8862599"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"979","DOI":"10.1080\/08839514.2019.1661576","article-title":"Review and implementation of topic modeling in Hindi","volume":"33","author":"Ray","year":"2019","journal-title":"Appl. Artif. Intell."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"5055","DOI":"10.1007\/s12652-020-01956-6","article-title":"Implementation and comparison of topic modeling techniques based on user reviews in e-commerce recommendations","volume":"12","author":"Chehal","year":"2021","journal-title":"J. Ambient. Intell. Humaniz. Comput."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"439","DOI":"10.1162\/tacl_a_00325","article-title":"Topic modeling in embedding spaces","volume":"8","author":"Dieng","year":"2020","journal-title":"Trans. Assoc. Comput. Linguist."},{"key":"ref_39","unstructured":"Wallach, H.M., Mimno, D.M., and McCallum, A. (2009, January 6\u20138). Rethinking LDA: Why priors matter. Proceedings of the NIPS\u201909, Vancouver, BC, Canada."},{"key":"ref_40","unstructured":"Teh, Y.W., Jordan, M.I., Beal, M.J., and Blei, D.M. (2005, January 5\u20138). Sharing clusters among related groups: Hierarchical Dirichlet processes. Proceedings of the NIPS\u201905, Vancouver, BC, Canada."},{"key":"ref_41","unstructured":"Asuncion, A., Welling, M., Smyth, P., and Teh, Y. (2009, January 18\u201321). On smoothing and inference for topic models. Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (UAI 2009), Montreal, QC, Canada."},{"key":"ref_42","first-page":"1353","article-title":"A collapsed variational Bayesian inference algorithm for latent Dirichlet allocation","volume":"19","author":"Teh","year":"2006","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_43","first-page":"1303","article-title":"Stochastic variational inference","volume":"14","author":"Hoffman","year":"2013","journal-title":"J. Mach. Learn. Res."},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Saleh, I., and El-Tazi, N. (2017, January 24\u201327). Automatic organization of semantically related tags using topic modelling. Proceedings of the European Conference on Advances in Databases and Information Systems, Nicosia, Cyprus.","DOI":"10.1007\/978-3-319-67162-8_23"},{"key":"ref_45","first-page":"1","article-title":"A heuristic approach to determine an appropriate number of topics in topic modeling","volume":"Volume 16","author":"Zhao","year":"2015","journal-title":"Proceedings of the BMC Bioinformatics"},{"key":"ref_46","unstructured":"Hinton, G.E., and Roweis, S. (2002). Stochastic neighbor embedding. Adv. Neural Inf. Process. Syst., 15."}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/15\/6\/204\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T23:28:35Z","timestamp":1760138915000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/15\/6\/204"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,6,13]]},"references-count":46,"journal-issue":{"issue":"6","published-online":{"date-parts":[[2022,6]]}},"alternative-id":["a15060204"],"URL":"https:\/\/doi.org\/10.3390\/a15060204","relation":{},"ISSN":["1999-4893"],"issn-type":[{"value":"1999-4893","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,6,13]]}}}