{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,4]],"date-time":"2026-05-04T00:29:26Z","timestamp":1777854566178,"version":"3.51.4"},"reference-count":62,"publisher":"SAGE Publications","issue":"3","license":[{"start":{"date-parts":[[2023,1,20]],"date-time":"2023-01-20T00:00:00Z","timestamp":1674172800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["Journal of Information Science"],"published-print":{"date-parts":[[2025,6]]},"abstract":"<jats:p>Probabilistic topic models have become one of the most widespread machine learning techniques in textual analysis. Topic discovering is an unsupervised process that does not guarantee the interpretability of its output. Hence, the automatic evaluation of topic coherence has attracted the interest of many researchers over the last decade, and it is an open research area. This article offers a new quality evaluation method based on statistically validated networks (SVNs). The proposed probabilistic approach consists of representing each topic as a weighted network of its most probable words. The presence of a link between each pair of words is assessed by statistically validating their co-occurrence in sentences against the null hypothesis of random co-occurrence. The proposed method allows one to distinguish between high-quality and low-quality topics, by making use of a battery of statistical tests. The statistically significant pairwise associations of words represented by the links in the SVN might reasonably be expected to be strictly related to the semantic coherence and interpretability of a topic. Therefore, the more connected the network, the more coherent the topic in question. 
We demonstrate the effectiveness of the method through an analysis of a real text corpus, which shows that the proposed measure is more correlated with human judgement than the state-of-the-art coherence measures.<\/jats:p>","DOI":"10.1177\/01655515221148369","type":"journal-article","created":{"date-parts":[[2023,1,20]],"date-time":"2023-01-20T07:44:25Z","timestamp":1674200665000},"page":"744-765","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":7,"title":["Ranking coherence in topic models using statistically validated networks"],"prefix":"10.1177","volume":"51","author":[{"given":"Andrea","family":"Simonetti","sequence":"first","affiliation":[{"name":"Department of Business Economics and Statistics, University of Palermo, Italy"}]},{"given":"Alessandro","family":"Albano","sequence":"additional","affiliation":[{"name":"Department of Business Economics and Statistics, University of Palermo, Italy"}]},{"given":"Antonella","family":"Plaia","sequence":"additional","affiliation":[{"name":"Department of Business Economics and Statistics, University of Palermo, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6045-6761","authenticated-orcid":false,"given":"Michele","family":"Tumminello","sequence":"additional","affiliation":[{"name":"Department of Business Economics and Statistics, University of Palermo, Italy; Institute for Biomedical Research and Innovation, National Research Council, Italy"}]}],"member":"179","published-online":{"date-parts":[[2023,1,20]]},"reference":[{"key":"e_1_3_5_2_2","doi-asserted-by":"crossref","unstructured":"Feldman R Dagan I. Knowledge discovery in textual databases (KDT) 1995 https:\/\/www.aaai.org\/Papers\/KDD\/1995\/KDD95-012.pdf","DOI":"10.1049\/ic:19950112"},{"key":"e_1_3_5_3_2","unstructured":"Allahyari M Pouriyeh S Assefi M et al. 
A brief survey of text mining: classification clustering and extraction techniques 2017 https:\/\/arxiv.org\/pdf\/1707.02919.pdf"},{"key":"e_1_3_5_4_2","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-28349-8_2"},{"key":"e_1_3_5_5_2","doi-asserted-by":"crossref","unstructured":"McGregor A Hall M Lorier P et al. Flow clustering using machine learning techniques. In: Proceedings of the international workshop on passive and active network measurement 2004 pp. 205\u2013214 https:\/\/www.researchgate.net\/publication\/220850213_Flow_Clustering_Using_Machine_Learning_Techniques","DOI":"10.1007\/978-3-540-24668-8_21"},{"key":"e_1_3_5_6_2","volume-title":"Clustering and information retrieval","author":"Wu W","year":"2003","unstructured":"Wu W, Xiong H, Shekhar S. Clustering and information retrieval. New York: Springer Science & Business Media, 2003."},{"key":"e_1_3_5_7_2","first-page":"993","article-title":"Latent Dirichlet allocation","volume":"3","author":"Blei D","year":"2003","unstructured":"Blei D, Ng A, Jordan M. Latent Dirichlet allocation. J Mach Learn Res 2003; 3: 993\u20131022.","journal-title":"J Mach Learn Res"},{"key":"e_1_3_5_8_2","doi-asserted-by":"publisher","DOI":"10.1145\/2133806.2133826"},{"key":"e_1_3_5_9_2","unstructured":"Chang J Gerrish S Wang C et al. Reading tea leaves: how humans interpret topic models 2009 https:\/\/proceedings.neurips.cc\/paper\/2009\/file\/f92586a25bb3145facd64ab20fd554ff-Paper.pdf"},{"key":"e_1_3_5_10_2","doi-asserted-by":"publisher","DOI":"10.1561\/9781680833096"},{"key":"e_1_3_5_11_2","unstructured":"Lau JH Newman D Baldwin T. Machine reading tea leaves: automatically evaluating topic coherence and topic model quality. In: Proceedings of the 14th conference of the European chapter of the Association for Computational Linguistics https:\/\/aclanthology.org\/E14-1056.pdf"},{"key":"e_1_3_5_12_2","unstructured":"Newman D Lau JH Grieser K et al. Automatic evaluation of topic coherence. 
In: Proceedings of the human language technologies: the 2010 annual conference of the North American chapter of the Association for Computational Linguistics https:\/\/aclanthology.org\/N10-1012.pdf"},{"key":"e_1_3_5_13_2","unstructured":"Aletras N Stevenson M. Evaluating topic coherence using distributional semantics. In: Proceedings of the 10th international conference on computational semantics (IWCS 2013: long papers) https:\/\/aclanthology.org\/W13-0102.pdf"},{"key":"e_1_3_5_14_2","unstructured":"Ramrakhiyani N Pawar S Hingmire S et al. Measuring topic coherence through optimal word buckets. In: Proceedings of the 15th conference of the European chapter of the Association for Computational Linguistics (vol. 2: short papers) https:\/\/aclanthology.org\/E17-2070.pdf"},{"key":"e_1_3_5_15_2","unstructured":"Hoyle A Goel P Hian-Cheong A et al. Is automated topic model evaluation broken? The incoherence of coherence 2021 https:\/\/openreview.net\/forum?id=tjdHCnPqoo"},{"key":"e_1_3_5_16_2","unstructured":"AlSumait L Barbar\u00e1 D Gentle J et al. Topic significance ranking of LDA generative models https:\/\/mimno.infosci.cornell.edu\/info6150\/readings\/ECML09_AlSumaitetal.pdf"},{"key":"e_1_3_5_17_2","doi-asserted-by":"publisher","DOI":"10.1177\/0165551515617393"},{"key":"e_1_3_5_18_2","doi-asserted-by":"crossref","unstructured":"R\u00f6der M Both A Hinneburg A. Exploring the space of topic coherence measures. In: Proceedings of the eighth ACM international conference on web search and data mining https:\/\/svn.aksw.org\/papers\/2015\/WSDM_Topic_Evaluation\/public.pdf","DOI":"10.1145\/2684822.2685324"},{"key":"e_1_3_5_19_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10115-015-0882-z"},{"key":"e_1_3_5_20_2","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0017994"},{"key":"e_1_3_5_21_2","doi-asserted-by":"crossref","unstructured":"Hofmann T. Probabilistic latent semantic indexing. 
In: Proceedings of the 22nd annual international ACM SIGIR conference on research and development in information retrieval 1999 https:\/\/sigir.org\/wp-content\/uploads\/2017\/06\/p211.pdf","DOI":"10.1145\/312624.312649"},{"key":"e_1_3_5_22_2","first-page":"147","article-title":"Correlated topic models","volume":"18","author":"Blei D","year":"2006","unstructured":"Blei D, Lafferty J. Correlated topic models. Adv Neur Inform Process Syst 2006; 18: 147.","journal-title":"Adv Neur Inform Process Syst"},{"key":"e_1_3_5_23_2","doi-asserted-by":"crossref","unstructured":"Li W McCallum A. Pachinko allocation: DAG-structured mixture models of topic correlations. In: Proceedings of the 23rd international conference on machine learning 2006 https:\/\/people.cs.umass.edu\/~mccallum\/papers\/pam-icml06.pdf","DOI":"10.1145\/1143844.1143917"},{"key":"e_1_3_5_24_2","unstructured":"Dieng A Ruiz F Blei D. The dynamic embedded topic model 2019 https:\/\/arxiv.org\/abs\/1907.05545"},{"key":"e_1_3_5_25_2","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00325"},{"key":"e_1_3_5_26_2","unstructured":"Srivastava A Sutton C. Autoencoding variational inference for topic models 2017 https:\/\/homepages.inf.ed.ac.uk\/csutton\/publications\/avitm.pdf"},{"key":"e_1_3_5_27_2","unstructured":"Wallach HM Murray I Salakhutdinov R et al. Evaluation methods for topic models. In: Proceedings of the 26th annual international conference on machine learning http:\/\/dirichlet.net\/pdf\/wallach09evaluation.pdf"},{"issue":"169","key":"e_1_3_5_28_2","first-page":"1","article-title":"In search of coherence and consensus: measuring the interpretability of statistical topics","volume":"18","author":"Morstatter F","year":"2018","unstructured":"Morstatter F, Liu H. In search of coherence and consensus: measuring the interpretability of statistical topics. 
J Mach Learn Res 2018; 18(169): 1\u201332.","journal-title":"J Mach Learn Res"},{"issue":"8","key":"e_1_3_5_29_2","first-page":"1639","article-title":"Topic discovery based on LDA\u2013col model and topic significance re-ranking","volume":"6","author":"Wang L","year":"2011","unstructured":"Wang L, Wei B, Yuan J. Topic discovery based on LDA\u2013col model and topic significance re-ranking. JCP 2011; 6(8): 1639\u20131647.","journal-title":"JCP"},{"key":"e_1_3_5_30_2","unstructured":"Newman D Karimi S Cavedon L. External evaluation of topic models. In: Proceedings of the Australasian document computing symposium 2009 https:\/\/www.researchgate.net\/publication\/255602484_External_Evaluation_of_Topic_Models"},{"key":"e_1_3_5_31_2","unstructured":"Bouma G. Normalized (pointwise) mutual information in collocation extraction. In: Proceedings of the GSCL 2009 https:\/\/svn.spraakdata.gu.se\/repos\/gerlof\/pub\/www\/Docs\/npmi-pfd.pdf"},{"key":"e_1_3_5_32_2","unstructured":"Mimno D Wallach H Talley E et al. Optimizing semantic coherence in topic models. In: Proceedings of the 2011 conference on empirical methods in natural language processing http:\/\/dirichlet.net\/pdf\/mimno11optimizing.pdf"},{"key":"e_1_3_5_33_2","doi-asserted-by":"publisher","DOI":"10.1177\/0165551515587839"},{"key":"e_1_3_5_34_2","doi-asserted-by":"publisher","DOI":"10.1016\/0306-4573(88)90021-0"},{"issue":"4","key":"e_1_3_5_35_2","first-page":"774","article-title":"Student mobility in higher education: Sicilian outflow network and chain migrations","volume":"12","author":"Genova VG","year":"2019","unstructured":"Genova VG, Tumminello M, Enea M et al. Student mobility in higher education: Sicilian outflow network and chain migrations. 
Electron J Appl Stat Anal 2019; 12(4): 774\u2013800.","journal-title":"Electron J Appl Stat Anal"},{"key":"e_1_3_5_36_2","doi-asserted-by":"publisher","DOI":"10.1088\/1742-5468\/ab16c5"},{"key":"e_1_3_5_37_2","doi-asserted-by":"publisher","DOI":"10.1177\/0165551518824577"},{"key":"e_1_3_5_38_2","unstructured":"Paranyushkin D. Identifying the pathways for meaning circulation using text network analysis. Nodus Labs 2011 https:\/\/noduslabs.com\/research\/pathways-meaning-circulation-text-network-analysis\/"},{"key":"e_1_3_5_39_2","doi-asserted-by":"crossref","unstructured":"Miller RG. Simultaneous statistical inference 1981 https:\/\/link.springer.com\/book\/10.1007\/978-1-4613-8122-8","DOI":"10.1007\/978-1-4613-8122-8"},{"key":"e_1_3_5_40_2","first-page":"391","volume-title":"Proceedings of the Pacific-Asia conference on knowledge discovery and data mining","author":"Arun R","unstructured":"Arun R, Suresh V, Madhavan CV et al. On finding the natural number of topics with latent Dirichlet allocation: some observations. In: Proceedings of the Pacific-Asia conference on knowledge discovery and data mining, Hyderabad, India, 21\u201324 June 2010, pp. 391\u2013402. Berlin: Springer."},{"key":"e_1_3_5_41_2","doi-asserted-by":"publisher","DOI":"10.3390\/make1010025"},{"key":"e_1_3_5_42_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11135-020-00976-w"},{"key":"e_1_3_5_43_2","unstructured":"Chuang J Gupta S Manning C et al. Topic model diagnostics: assessing domain relevance via topical alignment. In: Proceedings of the international conference on machine learning (PMLR) https:\/\/proceedings.mlr.press\/v28\/chuang13.html"},{"key":"e_1_3_5_44_2","doi-asserted-by":"publisher","DOI":"10.1093\/sysbio\/45.3.380"},{"key":"e_1_3_5_45_2","doi-asserted-by":"publisher","DOI":"10.2307\/1932409"},{"key":"e_1_3_5_46_2","volume-title":"Principles of numerical taxonomy","author":"Sokal RR","year":"1963","unstructured":"Sokal RR, Sneath PHA. Principles of numerical taxonomy. 
New York: W.H. Freeman & Co., 1963."},{"key":"e_1_3_5_47_2","doi-asserted-by":"publisher","DOI":"10.1080\/01621459.1983.10478008"},{"key":"e_1_3_5_48_2","doi-asserted-by":"crossref","unstructured":"Xing L Paul MJ Carenini G. Evaluating topic quality with posterior variability 2019 https:\/\/aclanthology.org\/D19-1349.pdf","DOI":"10.18653\/v1\/D19-1349"},{"key":"e_1_3_5_49_2","doi-asserted-by":"crossref","unstructured":"Hong L Davison BD. Empirical study of topic modeling in twitter. In: Proceedings of the first workshop on social media analytics https:\/\/snap.stanford.edu\/soma2010\/papers\/soma2010_12.pdf","DOI":"10.1145\/1964858.1964870"},{"key":"e_1_3_5_50_2","volume-title":"Introduction to information retrieval","author":"Sch\u00fctze H","year":"2008","unstructured":"Sch\u00fctze H, Manning CD, Raghavan P. Introduction to information retrieval, vol. 39. Cambridge: Cambridge University Press, 2008."},{"key":"e_1_3_5_51_2","doi-asserted-by":"publisher","DOI":"10.1017\/pan.2017.44"},{"key":"e_1_3_5_52_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2019.03.001"},{"key":"e_1_3_5_53_2","unstructured":"Waldherr A Heyer G J\u00e4hnichen P et al. Mining big data with computational methods https:\/\/www.researchgate.net\/publication\/290192966_Mining_Big_Data_With_Computational_Methods"},{"key":"e_1_3_5_54_2","doi-asserted-by":"publisher","DOI":"10.1007\/s00038-008-7068-3"},{"issue":"2","key":"e_1_3_5_55_2","first-page":"12","article-title":"Four common misuses of the Likert scale","volume":"18","author":"Pornel JB","year":"2013","unstructured":"Pornel JB, Salda\u00f1a GA. Four common misuses of the Likert scale. Philip J Soc Sci Hum Univ Philip Vis 2013; 18(2): 12\u201319.","journal-title":"Philip J Soc Sci Hum Univ Philip Vis"},{"key":"e_1_3_5_56_2","unstructured":"Taherdoost H. 
What is the best response scale for survey and questionnaire design; review of different lengths of rating scale\/attitude scale\/Likert scale 2019 https:\/\/www.researchgate.net\/publication\/343994538_What_Is_the_Best_Response_Scale_for_Survey_and_Questionnaire_Design_Review_of_Different_Lengths_of_Rating_Scale_Attitude_Scale_Likert_Scale"},{"key":"e_1_3_5_57_2","doi-asserted-by":"publisher","DOI":"10.1002\/pfi.21727"},{"key":"e_1_3_5_58_2","doi-asserted-by":"publisher","DOI":"10.1002\/mcda.313"},{"key":"e_1_3_5_59_2","first-page":"1","article-title":"Consensus measures among preference rankings: a new weighted correlation coefficient for linear and weak orderings","volume":"2021","author":"Plaia A","year":"2021","unstructured":"Plaia A, Buscemi S, Sciandra M. Consensus measures among preference rankings: a new weighted correlation coefficient for linear and weak orderings. J Classif 2021; 2021: 1\u201322.","journal-title":"J Classif"},{"issue":"1","key":"e_1_3_5_60_2","article-title":"Element weighted Kemeny distance for ranking data","volume":"14","author":"Albano A","year":"2021","unstructured":"Albano A, Plaia A. Element weighted Kemeny distance for ranking data. Electron J Appl Stat Anal 2021; 14(1): 117.s\u2013145.s.","journal-title":"Electron J Appl Stat Anal"},{"key":"e_1_3_5_61_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10260-010-0142-z"},{"key":"e_1_3_5_62_2","volume-title":"A new technique for high level decision support","author":"Emond EJ","year":"2000","unstructured":"Emond EJ, Mason DW. A new technique for high level decision support. 
Ottawa, ON, Canada: Operational Research Division, Department of National Defence, 2000."},{"key":"e_1_3_5_63_2","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.1000488107"}],"container-title":["Journal of Information Science"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/01655515221148369","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/01655515221148369","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/01655515221148369","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T23:10:00Z","timestamp":1777504200000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/01655515221148369"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,1,20]]},"references-count":62,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2025,6]]}},"alternative-id":["10.1177\/01655515221148369"],"URL":"https:\/\/doi.org\/10.1177\/01655515221148369","relation":{},"ISSN":["0165-5515","1741-6485"],"issn-type":[{"value":"0165-5515","type":"print"},{"value":"1741-6485","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,1,20]]}}}