{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,30]],"date-time":"2026-05-30T02:20:00Z","timestamp":1780107600919,"version":"3.54.0"},"reference-count":46,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2023,7,22]],"date-time":"2023-07-22T00:00:00Z","timestamp":1689984000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,7,22]],"date-time":"2023-07-22T00:00:00Z","timestamp":1689984000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100000024","name":"CIHR","doi-asserted-by":"crossref","award":["FDN 143303"],"award-info":[{"award-number":["FDN 143303"]}],"id":[{"id":"10.13039\/501100000024","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Med Inform Decis Mak"],"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Background<\/jats:title><jats:p>Topic models are a class of unsupervised machine learning models, which facilitate summarization, browsing and retrieval from large unstructured document collections. This study reviews several methods for assessing the quality of unsupervised topic models estimated using non-negative matrix factorization. Techniques for topic model validation have been developed across disparate fields. We synthesize this literature, discuss the advantages and disadvantages of different techniques for topic model validation, and illustrate their usefulness for guiding model selection on a large clinical text corpus.<\/jats:p><\/jats:sec><jats:sec><jats:title>Design, setting and data<\/jats:title><jats:p>Using a retrospective cohort design, we curated a text corpus containing 382,666 clinical notes collected between 01\/01\/2017 through 12\/31\/2020 from primary care electronic medical records in Toronto Canada.<\/jats:p><\/jats:sec><jats:sec><jats:title>Methods<\/jats:title><jats:p>Several topic model quality metrics have been proposed to assess different aspects of model fit. We explored the following metrics: reconstruction error, topic coherence, rank biased overlap, Kendall\u2019s weighted tau, partition coefficient, partition entropy and the Xie-Beni statistic. Depending on context, cross-validation and\/or bootstrap stability analysis were used to estimate these metrics on our corpus.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>Cross-validated reconstruction error favored large topic models (K\u2009\u2265\u2009100 topics) on our corpus. Stability analysis using topic coherence and the Xie-Beni statistic also favored large models (K\u2009=\u2009100 topics). Rank biased overlap and Kendall\u2019s weighted tau favored small models (K\u2009=\u20095 topics). Few model evaluation metrics suggested mid-sized topic models (25\u2009\u2264\u2009K\u2009\u2264\u200975) as being optimal. However, human judgement suggested that mid-sized topic models produced expressive low-dimensional summarizations of the corpus.<\/jats:p><\/jats:sec><jats:sec><jats:title>Conclusions<\/jats:title><jats:p>Topic model quality indices are transparent quantitative tools for guiding model selection and evaluation. Our empirical illustration demonstrated that different topic model quality indices favor models of different complexity; and may not select models aligning with human judgment. This suggests that different metrics capture different aspects of model goodness of fit. A combination of topic model quality indices, coupled with human validation, may be useful in appraising unsupervised topic models.<\/jats:p><\/jats:sec>","DOI":"10.1186\/s12911-023-02216-1","type":"journal-article","created":{"date-parts":[[2023,7,22]],"date-time":"2023-07-22T04:01:51Z","timestamp":1689998511000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":13,"title":["Quality indices for topic model selection and evaluation: a literature review and case study"],"prefix":"10.1186","volume":"23","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5429-5233","authenticated-orcid":false,"given":"Christopher","family":"Meaney","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Therese A.","family":"Stukel","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Peter C.","family":"Austin","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Rahim","family":"Moineddin","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Michelle","family":"Greiver","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Michael","family":"Escobar","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2023,7,22]]},"reference":[{"key":"2216_CR1","doi-asserted-by":"publisher","first-page":"535","DOI":"10.1257\/jel.20181020","volume":"57","author":"M Gentzkow","year":"2019","unstructured":"Gentzkow M, Kelly B, Taddy M. Text as Data. Journal of Economic Literature. 2019;57:535\u201374.","journal-title":"Journal of Economic Literature"},{"key":"2216_CR2","doi-asserted-by":"publisher","first-page":"391","DOI":"10.1002\/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9","volume":"41","author":"S Deerwester","year":"1990","unstructured":"Deerwester S, Dumais S, Furnas G, et al. Indexing by Latent Semantic Analysis. Journal of the American Society for Information Science. 1990;41:391\u2013408.","journal-title":"Journal of the American Society for Information Science"},{"key":"2216_CR3","doi-asserted-by":"publisher","first-page":"573","DOI":"10.1137\/1037127","volume":"37","author":"M Berry","year":"1995","unstructured":"Berry M, Dumais S, O\u2019Brien G. Using Linear Algebra for Intelligent Information Retrieval. SIAM Rev. 1995;37:573\u201395.","journal-title":"SIAM Rev"},{"key":"2216_CR4","doi-asserted-by":"publisher","first-page":"211","DOI":"10.1037\/0033-295X.104.2.211","volume":"104","author":"T Landauer","year":"1997","unstructured":"Landauer T, Dumais S. A Solution to Plato\u2019s Problem: The Latent Semantic Analysis Theory of Acquisition, Induction, and Representation of Knowledge. Psychological Reviews. 1997;104:211\u201340.","journal-title":"Psychological Reviews"},{"key":"2216_CR5","doi-asserted-by":"publisher","first-page":"788","DOI":"10.1038\/44565","volume":"401","author":"D Lee","year":"1999","unstructured":"Lee D, Seung S. Learning the Parts of an Object by Non-Negative Matrix Factorization. Nature. 1999;401:788\u201391.","journal-title":"Nature"},{"key":"2216_CR6","unstructured":"Lee D, Seung HS. Algorithms for non-negative matrix factorization. Advances in neural information processing systems. 2000;13."},{"key":"2216_CR7","doi-asserted-by":"crossref","unstructured":"Xu W, Liu X, Gong Y. Document clustering based on non-negative matrix factorization. In Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval; 2003. pp. 267\u201373.","DOI":"10.1145\/860435.860485"},{"key":"2216_CR8","doi-asserted-by":"crossref","unstructured":"Hofmann T. Probabilistic latent semantic indexing. In Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval; 1999. pp. 50\u20137.","DOI":"10.1145\/312624.312649"},{"key":"2216_CR9","first-page":"993","volume":"3","author":"D Blei","year":"2003","unstructured":"Blei D, Ng A, Jordan M. Latent Dirichlet Allocation. J Mach Learn Res. 2003;3:993\u20131022.","journal-title":"J Mach Learn Res"},{"key":"2216_CR10","volume-title":"Gibbs Sampling in the Generative Model of Latent Dirichlet Allocation","author":"M Griffiths","year":"2002","unstructured":"Griffiths M. Gibbs Sampling in the Generative Model of Latent Dirichlet Allocation. Technical Report: Stanford University; 2002."},{"key":"2216_CR11","unstructured":"Griffiths, T, Steyvers, M. Probabilistic Topic Models. In Handbook of Latent Semantic Analysis. Chapter 21. 2007."},{"key":"2216_CR12","doi-asserted-by":"publisher","first-page":"77","DOI":"10.1145\/2133806.2133826","volume":"55","author":"D Blei","year":"2012","unstructured":"Blei D. Probabilistic Topic Models. Commun ACM. 2012;55:77\u201384.","journal-title":"Commun ACM"},{"key":"2216_CR13","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1561\/2200000055","volume":"9","author":"M Udell","year":"2016","unstructured":"Udell M, Horn C, Zadeh R, et al. Generalized Low Rank Models. Foundations and Trends in Machine Learning. 2016;9:1\u2013118.","journal-title":"Foundations and Trends in Machine Learning"},{"key":"2216_CR14","doi-asserted-by":"publisher","first-page":"111","DOI":"10.1002\/env.3170050203","volume":"5","author":"P Paatero","year":"1994","unstructured":"Paatero P, Tapper U. Positive Matrix Factorization: A Non-Negative Factor Model with Optimal Utilization of Error Estimates of Data Values. Environmetrics. 1994;5:111\u201326.","journal-title":"Environmetrics"},{"key":"2216_CR15","doi-asserted-by":"publisher","first-page":"155","DOI":"10.1016\/j.csda.2006.11.006","volume":"52","author":"M Berry","year":"2007","unstructured":"Berry M, Browne M, Langville A, et al. Algorithms and Applications for Approximate Non-Negative Matrix Factorization. Comput Stat Data Anal. 2007;52:155\u201373.","journal-title":"Comput Stat Data Anal"},{"key":"2216_CR16","unstructured":"Chang J, Gerrish S, Wang C, Boyd-Graber J, Blei D. Reading tea leaves: How humans interpret topic models. Advances in neural information processing systems. 2009;22."},{"key":"2216_CR17","doi-asserted-by":"crossref","unstructured":"Doogan C, Buntine W. Topic model or topic twaddle? re-evaluating semantic interpretability measures. In Proceedings of the 2021 conference of the North American chapter of the association for computational linguistics: human language technologies; 2021. pp. 3824\u201348.","DOI":"10.18653\/v1\/2021.naacl-main.300"},{"key":"2216_CR18","doi-asserted-by":"crossref","unstructured":"Greene D, Cunningham P, Mayer R. Unsupervised learning and clustering. Machine learning techniques for multimedia: Case studies on organization and retrieval. 2008:51\u201390.","DOI":"10.1007\/978-3-540-75171-7_3"},{"key":"2216_CR19","unstructured":"Palacio-Nino, J, Berzal, F. Evaluation Metrics for Unsupervised Learning Algorithms. Arxiv. 2019; 1\u20139. URL: https:\/\/arxiv.org\/pdf\/1905.05667.pdf"},{"key":"2216_CR20","doi-asserted-by":"crossref","unstructured":"Matthews P. Human-In-The-Loop Topic Modelling: Assessing topic labelling and genre-topic relations with a movie plot summary corpus. In The Human Position in an Artificial World: Creativity, Ethics and AI in Knowledge Organization. Ergon-Verlag; 2019. pp. 181\u2013207.","DOI":"10.5771\/9783956505508-181"},{"key":"2216_CR21","volume-title":"Content Analysis: An Introduction to its Methodology","author":"K Krippendorff","year":"2008","unstructured":"Krippendorff K. Content Analysis: An Introduction to its Methodology. 2nd ed. Thousand Oaks, California: Sage Publications; 2008.","edition":"2"},{"key":"2216_CR22","doi-asserted-by":"crossref","unstructured":"Hastie T, Tibshirani R, Friedman JH, Friedman JH. The elements of statistical learning: data mining, inference, and prediction. New York: Springer; 2009. Vol. 2, pp. 1\u2013758.","DOI":"10.1007\/978-0-387-84858-7"},{"key":"2216_CR23","doi-asserted-by":"publisher","first-page":"387","DOI":"10.1080\/00401706.1978.10489693","volume":"20","author":"S Wold","year":"1978","unstructured":"Wold S. Cross-Validatory Estimation of the Number of Components in Factor Analysis and Principal Component Analysis. Technometrics. 1978;20:387\u2013405.","journal-title":"Technometrics"},{"key":"2216_CR24","doi-asserted-by":"publisher","first-page":"564","DOI":"10.1214\/08-AOAS227","volume":"3","author":"A Owen","year":"2009","unstructured":"Owen A, Perry P. Bi-Cross Validation of the SVD and Non-Negative Matrix Factorization. Annals of Applied Statistics. 2009;3:564\u201394.","journal-title":"Annals of Applied Statistics"},{"key":"2216_CR25","doi-asserted-by":"publisher","first-page":"1241","DOI":"10.1007\/s00216-007-1790-1","volume":"390","author":"R Bro","year":"2008","unstructured":"Bro R, Khejdahl K, Smilde A, et al. Cross-Validation of Component Model: A Critical Look at Current Methods. Annals of Bioanalytic Chemistry. 2008;390:1241\u201351.","journal-title":"Annals of Bioanalytic Chemistry"},{"key":"2216_CR26","doi-asserted-by":"crossref","unstructured":"Greene D, O\u2019Callaghan D, Cunningham P. How many topics? stability analysis for topic models. In Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2014, Nancy, France, September 15-19, 2014. Proceedings, Part I 14.\u00a0 Springer Berlin Heidelberg; 2014. pp. 498\u2013513.","DOI":"10.1007\/978-3-662-44848-9_32"},{"key":"2216_CR27","doi-asserted-by":"publisher","first-page":"1299","DOI":"10.1162\/089976604773717621","volume":"16","author":"T Lange","year":"2004","unstructured":"Lange T, Roth V, Braun M, et al. Stability Based Validation of Clustering Solutions. Neural Comput. 2004;16:1299\u2013323.","journal-title":"Neural Comput"},{"key":"2216_CR28","doi-asserted-by":"crossref","unstructured":"AlSumait L, Barbar\u00e1 D, Gentle J, Domeniconi C. Topic significance ranking of LDA generative models. In Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2009, Bled, Slovenia, September 7-11, 2009, Proceedings, Part I 20. Springer Berlin Heidelberg; 2009. pp. 67\u201382.\u00a0","DOI":"10.1007\/978-3-642-04180-8_22"},{"key":"2216_CR29","unstructured":"Newman D, Lau JH, Grieser K, Baldwin T. Automatic evaluation of topic coherence. In Human language technologies: The 2010 annual conference of the North American chapter of the association for computational linguistics; 2010. pp. 100\u20138."},{"key":"2216_CR30","unstructured":"Mimno D, Wallach H, Talley E, Leenders M, McCallum A. Optimizing semantic coherence in topic models. In Proceedings of the 2011 conference on empirical methods in natural language processing; 2011. pp. 262\u201372."},{"key":"2216_CR31","doi-asserted-by":"crossref","unstructured":"R\u00f6der M, Both A, Hinneburg A. Exploring the space of topic coherence measures. In Proceedings of the eighth ACM international conference on Web search and data mining; 2015. pp. 399\u2013408.","DOI":"10.1145\/2684822.2685324"},{"key":"2216_CR32","doi-asserted-by":"publisher","first-page":"134","DOI":"10.1137\/S0895480102412856","volume":"17","author":"R Fagin","year":"2003","unstructured":"Fagin R, Kumar R, Sivakumar D. Comparing Top-K Lists. SIAM Journal of Discrete Mathematics. 2003;17:134\u201360.","journal-title":"SIAM Journal of Discrete Mathematics"},{"key":"2216_CR33","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/1852102.1852106","volume":"28","author":"W Webber","year":"2010","unstructured":"Webber W, Moffat A, Zobel J. A Similarity Measure for Indefinite Rankings. ACM Transactions on Information Systems. 2010;28:1\u201334.","journal-title":"ACM Transactions on Information Systems"},{"key":"2216_CR34","doi-asserted-by":"publisher","first-page":"258","DOI":"10.1002\/bs.3830070216","volume":"7","author":"J Hurley","year":"1962","unstructured":"Hurley J, Cattell R. Producing Direct Rotation to Test a Hypothesized Factor Solution. Behavioural Science. 1962;7:258\u201362.","journal-title":"Behavioural Science"},{"key":"2216_CR35","doi-asserted-by":"publisher","first-page":"83","DOI":"10.1002\/nav.3800020109","volume":"2","author":"H Kuhn","year":"1955","unstructured":"Kuhn H. The Hungarian Method for the Assignment Problem. Naval Research Logistics Quarterly. 1955;2:83\u201397.","journal-title":"Naval Research Logistics Quarterly"},{"key":"2216_CR36","doi-asserted-by":"publisher","first-page":"81","DOI":"10.1093\/biomet\/30.1-2.81","volume":"30","author":"M Kendall","year":"1938","unstructured":"Kendall M. A New Measure of Rank Correlation. Biometrika. 1938;30:81\u20139.","journal-title":"Biometrika"},{"key":"2216_CR37","volume-title":"Handbook of Mixed Membership Models and Their Applications","author":"E Airoldi","year":"2015","unstructured":"Airoldi E, Blei D, Erosheva E, et al. Handbook of Mixed Membership Models and Their Applications. Boca Raton, Florida: Chapman and Hall Press; 2015."},{"key":"2216_CR38","doi-asserted-by":"publisher","first-page":"2095","DOI":"10.1016\/j.fss.2007.03.004","volume":"158","author":"W Wang","year":"2007","unstructured":"Wang W, Zhang Y. On Fuzzy Clustering Validity Indices. Fuzzy Sets Syst. 2007;158:2095\u2013117.","journal-title":"Fuzzy Sets Syst"},{"key":"2216_CR39","doi-asserted-by":"publisher","first-page":"58","DOI":"10.1080\/01969727308546047","volume":"3","author":"J Bezdek","year":"1974","unstructured":"Bezdek J. Cluster Validity with Fuzzy Sets. Cybernetics. 1974;3:58\u201373.","journal-title":"Cybernetics"},{"key":"2216_CR40","doi-asserted-by":"publisher","first-page":"613","DOI":"10.1016\/0167-8655(96)00026-8","volume":"17","author":"R Dave","year":"1996","unstructured":"Dave R. Validating Fuzzy Partitions Obtained Through C-Shells Partitions. Pattern Recogn Lett. 1996;17:613\u201323.","journal-title":"Pattern Recogn Lett"},{"key":"2216_CR41","doi-asserted-by":"publisher","first-page":"841","DOI":"10.1109\/34.85677","volume":"13","author":"X Xie","year":"1991","unstructured":"Xie X, Beni G. A Validity Measure for Fuzzy Clustering. IEEE Transactions of Pattern Analysis and Machine Learning. 1991;13:841\u20137.","journal-title":"IEEE Transactions of Pattern Analysis and Machine Learning"},{"key":"2216_CR42","doi-asserted-by":"publisher","first-page":"370","DOI":"10.1109\/91.413225","volume":"3","author":"N Pal","year":"1995","unstructured":"Pal N, Bezdek J. On Cluster Validity of the Fuzzy C-Means Model. IEEE Trans Fuzzy Syst. 1995;3:370\u20139.","journal-title":"IEEE Trans Fuzzy Syst"},{"key":"2216_CR43","doi-asserted-by":"crossref","unstructured":"Garies S, Birtwhistle R, Drummond N, Queenan J, Williamson T. Data resource profile: national electronic medical record data from the Canadian Primary Care Sentinel Surveillance Network (CPCSSN).\u00a0Int J Epidemiol. 2017;46(4):1091\u20132f.","DOI":"10.1093\/ije\/dyw248"},{"key":"2216_CR44","doi-asserted-by":"crossref","unstructured":"Webster JJ, Kit C. Tokenization as the initial phase in NLP. In COLING 1992 volume 4: The 14th international conference on computational linguistics; 1992.","DOI":"10.3115\/992424.992434"},{"key":"2216_CR45","doi-asserted-by":"crossref","unstructured":"D\u00edaz NPC, L\u00f3pez MJM. An analysis of biomedical tokenization: problems and strategies. In Proceedings of the Sixth International Workshop on Health Text Mining and Information Analysis; 2015. pp. 40\u20139.","DOI":"10.18653\/v1\/W15-2605"},{"key":"2216_CR46","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-05318-5","volume-title":"Automated Machine Learning","author":"F Hutter","year":"2019","unstructured":"Hutter F, Kotthoff L, Vanschoren J. Automated Machine Learning. Springer; 2019."}],"container-title":["BMC Medical Informatics and Decision Making"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12911-023-02216-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s12911-023-02216-1\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12911-023-02216-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,10,24]],"date-time":"2024-10-24T20:41:40Z","timestamp":1729802500000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcmedinformdecismak.biomedcentral.com\/articles\/10.1186\/s12911-023-02216-1"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,7,22]]},"references-count":46,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2023,12]]}},"alternative-id":["2216"],"URL":"https:\/\/doi.org\/10.1186\/s12911-023-02216-1","relation":{},"ISSN":["1472-6947"],"issn-type":[{"value":"1472-6947","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,7,22]]},"assertion":[{"value":"31 May 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"22 June 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"22 July 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"This study received ethics approval from North York General Hospital Research Ethics Board (REB ID: NYGH #20\u20130014). All participating primary care physicians provided written informed consent for the collection and analysis of their electronic medical record data at UTOPIAN; patients rostered to a particular primary care physician could opt-out of providing their data to UTOPIAN if they so chose.\u00a0This model of consent was approved by REB and is consistent with Ontario's privacy legislation (PHIPA Sect.\u00a044).","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare that they have no competing interests.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"132"}}