{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,30]],"date-time":"2026-01-30T02:40:47Z","timestamp":1769740847741,"version":"3.49.0"},"reference-count":51,"publisher":"Springer Science and Business Media LLC","issue":"12","license":[{"start":{"date-parts":[[2022,3,12]],"date-time":"2022-03-12T00:00:00Z","timestamp":1647043200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,3,12]],"date-time":"2022-03-12T00:00:00Z","timestamp":1647043200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"Fraunhofer-Institut f\u00fcr System- und Innovationsforschung ISI"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Scientometrics"],"published-print":{"date-parts":[[2022,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Policymakers and funding agencies tend to support scientific work across disciplines, thereby relying on indicators for interdisciplinarity. Recently, text-based quantitative methods have been proposed for the computation of interdisciplinarity that hold promise to have several advantages over the bibliometric approach. In this paper, we provide a systematic analysis of the computation of the text-based Rao index, based on probabilistic topic models, comparing a classical LDA model versus a neural network topic model. We provide a systematic analysis of model parameters that affect the diversity scores and make the interaction between its different components explicit. We present an empirical study on a real data set, upon which we quantify the diversity of the research within several departments of Fraunhofer and Max Planck Society by means of scientific abstracts published in Scopus between 2008 and 2018. Our experiments show that parameter variations, i.e. 
the choice of the number of topics, hyper-parameters, and size and balance of the underlying data used for training the model, have a strong effect on the topic model-based Rao metrics. In particular, we could observe that the quality of the topic models impacts the downstream task of computing the Rao index. Topic models that yield semantically cohesive topics are less affected by fluctuations when varying over the number of topics, and result in more stable measurements of the Rao index.<\/jats:p>","DOI":"10.1007\/s11192-022-04312-x","type":"journal-article","created":{"date-parts":[[2022,3,12]],"date-time":"2022-03-12T04:35:25Z","timestamp":1647059725000},"page":"7751-7768","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["Impact of model settings on the text-based Rao diversity index"],"prefix":"10.1007","volume":"127","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2092-6153","authenticated-orcid":false,"given":"Andrea","family":"Zielinski","sequence":"first","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2022,3,12]]},"reference":[{"key":"4312_CR1","doi-asserted-by":"crossref","unstructured":"Aletras, N., & Stevenson, M. (2014). Measuring the similarity between automatically generated topics. In: Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, volume 2: Short Papers\u00a0(pp. 22\u201327)","DOI":"10.3115\/v1\/E14-4005"},{"key":"4312_CR2","doi-asserted-by":"crossref","unstructured":"Bache, K., Newman, D., & Smyth, P. (2013). Text-based measures of document diversity. In: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining\u00a0(pp. 23\u201331)","DOI":"10.1145\/2487575.2487672"},{"issue":"6","key":"4312_CR3","first-page":"55","volume":"27","author":"D Blei","year":"2010","unstructured":"Blei, D., Carin, L., & Dunson, D. (2010). 
Probabilistic topic models. IEEE Signal Processing Magazine, 27(6), 55\u201365.","journal-title":"IEEE Signal Processing Magazine"},{"issue":"Jan","key":"4312_CR4","first-page":"993","volume":"3","author":"DM Blei","year":"2003","unstructured":"Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3(Jan), 993\u20131022.","journal-title":"Journal of Machine Learning Research"},{"issue":"1","key":"4312_CR6","doi-asserted-by":"publisher","first-page":"e0170296","DOI":"10.1371\/journal.pone.0170296","volume":"12","author":"L Cassi","year":"2017","unstructured":"Cassi, L., Champeimont, R., Mescheba, W., & De Turckheim, E. (2017). Analysing institutions interdisciplinarity by extensive use of Rao-Stirling diversity index. PLoS ONE, 12(1), e0170296.","journal-title":"PLoS ONE"},{"key":"4312_CR7","first-page":"288","volume":"22","author":"J Chang","year":"2009","unstructured":"Chang, J., Gerrish, S., Wang, C., Boyd-Graber, J., & Blei, D. (2009). Reading tea leaves: How humans interpret topic models. Advances in Neural Information Processing Systems, 22, 288\u2013296.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"4312_CR8","doi-asserted-by":"crossref","unstructured":"Chuang, J., Ramage, D., Manning, C., & Heer, J. (2012). Interpretation and trust: Designing model-driven visualizations for text analysis. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems\u00a0(pp. 443\u2013452)","DOI":"10.1145\/2207676.2207738"},{"key":"4312_CR10","doi-asserted-by":"publisher","first-page":"439","DOI":"10.1162\/tacl_a_00325","volume":"8","author":"AB Dieng","year":"2020","unstructured":"Dieng, A. B., Ruiz, F. J., & Blei, D. M. (2020). Topic modeling in embedding spaces. 
Transactions of the Association for Computational Linguistics, 8, 439\u2013453.","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"4312_CR11","doi-asserted-by":"crossref","unstructured":"Doan, T., & Hoang, T. (2021). Benchmarking neural topic models: An empirical study.\u00a0FINDINGS","DOI":"10.18653\/v1\/2021.findings-acl.382"},{"key":"4312_CR12","first-page":"1","volume":"36","author":"S Gershman","year":"2014","unstructured":"Gershman, S., & Goodman, N. D. (2014). Amortized inference in probabilistic reasoning. Cognitive Science, 36, 1.","journal-title":"Cognitive Science"},{"issue":"suppl 1","key":"4312_CR13","doi-asserted-by":"publisher","first-page":"5228","DOI":"10.1073\/pnas.0307752101","volume":"101","author":"TL Griffiths","year":"2004","unstructured":"Griffiths, T. L., & Steyvers, M. (2004). Finding scientific topics. Proceedings of the National Academy of Sciences, 101(suppl 1), 5228\u20135235.","journal-title":"Proceedings of the National Academy of Sciences"},{"key":"4312_CR14","doi-asserted-by":"crossref","unstructured":"Guo, Z., Zhu, S., Chi, Y., Zhang, Z., & Gong, Y. (2009). A latent topic model for linked documents. In: Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval\u00a0(pp. 720\u2013721)","DOI":"10.1145\/1571941.1572095"},{"key":"4312_CR15","doi-asserted-by":"crossref","unstructured":"Hall, D., Jurafsky, D., & Manning, C. D. (2008). Studying the history of ideas using topic models. In: Proceedings of the 2008 conference on empirical methods in natural language processing\u00a0(pp. 363\u2013371)","DOI":"10.3115\/1613715.1613763"},{"issue":"1","key":"4312_CR16","first-page":"1303","volume":"14","author":"MD Hoffman","year":"2013","unstructured":"Hoffman, M. D., Blei, D. M., Wang, C., & Paisley, J. (2013). Stochastic variational inference. 
The Journal of Machine Learning Research, 14(1), 1303\u20131347.","journal-title":"The Journal of Machine Learning Research"},{"key":"4312_CR17","unstructured":"Jagarlamudi, J., Daum\u00e9 III, H., & Udupa, R. (2012). Incorporating lexical priors into topic models. In: Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics\u00a0(pp. 204\u2013213)"},{"key":"4312_CR19","unstructured":"Kingma, D.P., & Welling, M. (2014). Auto-encoding variational Bayes.\u00a0CoRR. Retrieved from https:\/\/arxiv.org\/abs\/1312.6114"},{"key":"4312_CR20","doi-asserted-by":"crossref","unstructured":"Lau, J. H., Newman, D., & Baldwin, T. (2014, April). Machine reading tea leaves: Automatically evaluating topic coherence and topic model quality. In: Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics\u00a0(pp. 530\u2013539)","DOI":"10.3115\/v1\/E14-1056"},{"issue":"12","key":"4312_CR21","doi-asserted-by":"publisher","first-page":"1973","DOI":"10.1002\/asi.20914","volume":"59","author":"JM Levitt","year":"2008","unstructured":"Levitt, J. M., & Thelwall, M. (2008). Is multidisciplinary research more highly cited? A macrolevel study. Journal of the American Society for Information Science and Technology, 59(12), 1973\u20131984.","journal-title":"Journal of the American Society for Information Science and Technology"},{"issue":"3","key":"4312_CR22","doi-asserted-by":"publisher","first-page":"2113","DOI":"10.1007\/s11192-018-2810-y","volume":"116","author":"L Leydesdorff","year":"2018","unstructured":"Leydesdorff, L. (2018). Diversity and interdisciplinarity: How can one distinguish and recombine disparity, variety, and balance? 
Scientometrics, 116(3), 2113\u20132121.","journal-title":"Scientometrics"},{"issue":"2","key":"4312_CR23","doi-asserted-by":"publisher","first-page":"348","DOI":"10.1002\/asi.20967","volume":"60","author":"L Leydesdorff","year":"2009","unstructured":"Leydesdorff, L., & Rafols, I. (2009). A global map of science based on the ISI subject categories. Journal of the American Society for Information Science and Technology, 60(2), 348\u2013362.","journal-title":"Journal of the American Society for Information Science and Technology"},{"issue":"3","key":"4312_CR24","doi-asserted-by":"publisher","first-page":"904","DOI":"10.1016\/j.joi.2019.03.016","volume":"13","author":"L Leydesdorff","year":"2019","unstructured":"Leydesdorff, L., Wagner, C. S., & Bornmann, L. (2019). Diversity measurement: Steps towards the measurement of interdisciplinarity? Journal of Informetrics, 13(3), 904\u2013905. https:\/\/doi.org\/10.1016\/j.joi.2019.03.016","journal-title":"Journal of Informetrics"},{"key":"4312_CR25","doi-asserted-by":"crossref","unstructured":"Li, W., & McCallum, A. (2006). Pachinko allocation: DAG-structured mixture models of topic correlations. In: Proceedings of the 23rd international conference on Machine learning\u00a0(pp. 577\u2013584)","DOI":"10.1145\/1143844.1143917"},{"key":"4312_CR26","unstructured":"Mikolov, T., Sutskever, I., Chen, K., Corrado, G., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. NIPS"},{"key":"4312_CR27","unstructured":"Mimno, D., Wallach, H., Talley, E., Leenders, M., & McCallum, A. (2011). Optimizing semantic coherence in topic models. In: Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing\u00a0(pp. 262\u2013272)"},{"key":"4312_CR29","doi-asserted-by":"publisher","DOI":"10.1045\/september2016-nanni","author":"F Nanni","year":"2016","unstructured":"Nanni, F., Dietz, L., Faralli, S., Glava\u0161, G., & Ponzetto, S. P. (2016). 
Capturing interdisciplinarity in academic abstracts. D-Lib Magazine. https:\/\/doi.org\/10.1045\/september2016-nanni","journal-title":"D-Lib Magazine"},{"key":"4312_CR100","unstructured":"National Academies. (2005). National Science Foundation Committee\non Facilitating Interdisciplinary Research, Committee on Science, Engineering, and Public Policy (2004). Facilitating interdisciplinary research. Washington: National\nAcademy Press, p. 2."},{"key":"4312_CR30","first-page":"496","volume":"24","author":"D Newman","year":"2011","unstructured":"Newman, D., Bonilla, E. V., & Buntine, W. (2011). Improving topic coherence with regularized topic models. Advances in Neural Information Processing Systems, 24, 496\u2013504.","journal-title":"Advances in Neural Information Processing Systems"},{"issue":"3","key":"4312_CR31","doi-asserted-by":"publisher","first-page":"741","DOI":"10.1007\/s11192-014-1319-2","volume":"100","author":"LG Nichols","year":"2014","unstructured":"Nichols, L. G. (2014). A topic model approach to measuring interdisciplinarity at the National Science Foundation. Scientometrics, 100(3), 741\u2013754.","journal-title":"Scientometrics"},{"key":"4312_CR32","unstructured":"Paul, M., & Girju, R. (2009). Topic modeling of research fields: An interdisciplinary perspective. In: Proceedings of the International Conference RANLP-2009 (pp. 337\u2013342)"},{"issue":"3","key":"4312_CR101","doi-asserted-by":"publisher","first-page":"719","DOI":"10.1007\/s11192-008-2197-2","volume":"81","author":"A Porter","year":"2009","unstructured":"Porter, A., & Rafols, I. (2009). Is science becoming more interdisciplinary? Measuring and mapping six research fields over time. Scientometrics, 81(3), 719\u2013745.","journal-title":"Scientometrics"},{"key":"4312_CR33","unstructured":"Quan, X., Kit, C., Ge, Y., & Pan, S. J. (2015). Short and sparse text topic modeling via self-aggregation. 
In: Twenty-Fourth International Joint Conference on Artificial Intelligence"},{"key":"4312_CR35","unstructured":"Ramage, D., Manning, C. D., & McFarland, D. A. (2010). Which universities lead and lag? Toward university rankings based on scholarly output. In: Proceedings of NIPS Workshop on Computational Social Science and the Wisdom of the Crowds"},{"key":"4312_CR102","doi-asserted-by":"crossref","unstructured":"Ramage, D., Manning, C. D., & Dumais, S. (2011). Partially labeled topic models for interpretable text mining. In: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 457\u2013465).","DOI":"10.1145\/2020408.2020481"},{"issue":"1","key":"4312_CR36","doi-asserted-by":"publisher","first-page":"24","DOI":"10.1016\/0040-5809(82)90004-1","volume":"21","author":"CR Rao","year":"1982","unstructured":"Rao, C. R. (1982). Diversity and dissimilarity coefficients: A unified approach. Theoretical Population Biology, 21(1), 24\u201343.","journal-title":"Theoretical Population Biology"},{"key":"4312_CR37","doi-asserted-by":"crossref","unstructured":"R\u00f6der, M., Both, A., & Hinneburg, A. (2015). Exploring the space of topic coherence measures. In: Proceedings of the eighth ACM international conference on Web search and data mining\u00a0(pp. 399\u2013408)","DOI":"10.1145\/2684822.2685324"},{"issue":"3","key":"4312_CR38","doi-asserted-by":"publisher","first-page":"906","DOI":"10.1016\/j.joi.2019.03.015","volume":"13","author":"R Rousseau","year":"2019","unstructured":"Rousseau, R. (2019). On the Leydesdorff-Wagner-Bornmann proposal for diversity measurement. Journal of Informetrics, 13(3), 906\u2013907. https:\/\/doi.org\/10.1016\/j.joi.2019.03.015","journal-title":"Journal of Informetrics"},{"issue":"15","key":"4312_CR40","doi-asserted-by":"publisher","first-page":"707","DOI":"10.1098\/rsif.2007.0213","volume":"4","author":"A Stirling","year":"2007","unstructured":"Stirling, A. (2007). 
A general framework for analyzing diversity in science, technology and society. Journal of the Royal Society Interface, 4(15), 707\u2013719.","journal-title":"Journal of the Royal Society Interface"},{"issue":"10","key":"4312_CR41","doi-asserted-by":"publisher","first-page":"2464","DOI":"10.1002\/asi.23596","volume":"67","author":"A Suominen","year":"2016","unstructured":"Suominen, A., & Toivanen, H. (2016). Map of science with topic modeling: Comparison of unsupervised learning and human-assigned subject classification. Journal of the Association for Information Science and Technology, 67(10), 2464\u20132476.","journal-title":"Journal of the Association for Information Science and Technology"},{"key":"4312_CR42","doi-asserted-by":"crossref","unstructured":"Syed, S., & Spruit, M. (2018). Selecting priors for latent Dirichlet allocation. In: 2018 IEEE 12th International Conference on Semantic Computing (ICSC)\u00a0(pp. 194\u2013202). IEEE.","DOI":"10.1109\/ICSC.2018.00035"},{"issue":"6","key":"4312_CR43","doi-asserted-by":"publisher","first-page":"443","DOI":"10.1038\/nmeth.1619","volume":"8","author":"EM Talley","year":"2011","unstructured":"Talley, E. M., Newman, D., Mimno, D., Herr, B. W., II., Wallach, H. M., Burns, G. A., Leenders, A. G., & McCallum, A. (2011). Database of NIH grants using machine-learned categories and graphical clustering. Nature Methods, 8(6), 443.","journal-title":"Nature Methods"},{"key":"4312_CR44","unstructured":"Tang, J., Meng, Z., Nguyen, X., Mei, Q., & Zhang, M. (2014). Understanding the limiting factors of topic modeling via posterior contraction analysis. In: International Conference on Machine Learning\u00a0(pp. 190\u2013198)"},{"issue":"1","key":"4312_CR45","doi-asserted-by":"publisher","first-page":"14","DOI":"10.1016\/j.joi.2010.06.004","volume":"5","author":"CS Wagner","year":"2011","unstructured":"Wagner, C. S., Roessner, J. D., Bobb, K., Klein, J. T., Boyack, K. W., Keyton, J., Rafols, I., & B\u00f6rner, K. (2011). 
Approaches to understanding and measuring interdisciplinary scientific research (IDR): A review of the literature. Journal of Informetrics, 5(1), 14\u201326.","journal-title":"Journal of Informetrics"},{"key":"4312_CR46","first-page":"1973","volume":"22","author":"H Wallach","year":"2009","unstructured":"Wallach, H., Mimno, D., & McCallum, A. (2009). Rethinking LDA: Why priors matter. Advances in Neural Information Processing Systems, 22, 1973\u20131981.","journal-title":"Advances in Neural Information Processing Systems"},{"issue":"1","key":"4312_CR47","doi-asserted-by":"publisher","first-page":"239","DOI":"10.1162\/qss_a_00011","volume":"1","author":"Q Wang","year":"2020","unstructured":"Wang, Q., & Schneider, J. W. (2020). Consistency and validity of interdisciplinarity measures. Quantitative Science Studies, 1(1), 239\u2013263.","journal-title":"Quantitative Science Studies"},{"key":"4312_CR48","doi-asserted-by":"crossref","unstructured":"Wang, X., Fang, A., Ounis, I., & Macdonald, C. (2019). Evaluating similarity metrics for latent Twitter topics. European conference on information retrieval (pp. 787\u2013794). Springer.","DOI":"10.1007\/978-3-030-15712-8_54"},{"key":"4312_CR49","doi-asserted-by":"crossref","unstructured":"Wang, K., Sha, C., Wang, X., & Zhou, A. (2014). Based on citation diversity to explore influential papers for interdisciplinarity. In Asia-Pacific web conference (pp. 343\u2013354). Springer, Cham.","DOI":"10.1007\/978-3-319-11116-2_30"},{"issue":"3","key":"4312_CR50","doi-asserted-by":"publisher","first-page":"767","DOI":"10.1007\/s11192-014-1321-8","volume":"100","author":"CK Yau","year":"2014","unstructured":"Yau, C. K., Porter, A., Newman, N., & Suominen, A. (2014). Clustering scientific documents with topic modeling. 
Scientometrics, 100(3), 767\u2013786.","journal-title":"Scientometrics"},{"issue":"5","key":"4312_CR51","doi-asserted-by":"publisher","first-page":"1257","DOI":"10.1002\/asi.23487","volume":"67","author":"L Zhang","year":"2016","unstructured":"Zhang, L., Rousseau, R., & Gl\u00e4nzel, W. (2016). Diversity of references as an indicator of the interdisciplinarity of journals: Taking similarity between subject fields into account. Journal of the Association for Information Science and Technology, 67(5), 1257\u20131265.","journal-title":"Journal of the Association for Information Science and Technology"},{"key":"4312_CR52","doi-asserted-by":"crossref","unstructured":"Zhao, H., Du, L., Buntine, W., & Liu, G. (2017). MetaLDA: A topic model that efficiently incorporates meta information. In: 2017 IEEE International Conference on Data Mining (ICDM)\u00a0(pp. 635\u2013644). IEEE","DOI":"10.1109\/ICDM.2017.73"},{"issue":"3","key":"4312_CR53","doi-asserted-by":"publisher","first-page":"787","DOI":"10.1007\/s11192-012-0767-9","volume":"93","author":"Q Zhou","year":"2012","unstructured":"Zhou, Q., Rousseau, R., Yang, L., Yue, T., & Yang, G. (2012). A general framework for describing diversity within systems and similarity between systems with applications in informetrics. Scientometrics, 93(3), 787\u2013812.","journal-title":"Scientometrics"},{"key":"4312_CR54","unstructured":"Zielinski, A. (2021). Impact of model settings on the text-based Rao diversity index. In: 18th International Conference on Scientometrics and Informetrics Conference ISSI 2021\u00a0(pp. 
1405\u20131416)."}],"container-title":["Scientometrics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11192-022-04312-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11192-022-04312-x\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11192-022-04312-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,5]],"date-time":"2022-12-05T05:14:01Z","timestamp":1670217241000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11192-022-04312-x"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,3,12]]},"references-count":51,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2022,12]]}},"alternative-id":["4312"],"URL":"https:\/\/doi.org\/10.1007\/s11192-022-04312-x","relation":{},"ISSN":["0138-9130","1588-2861"],"issn-type":[{"value":"0138-9130","type":"print"},{"value":"1588-2861","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,3,12]]},"assertion":[{"value":"28 October 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"8 February 2022","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"12 March 2022","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}