{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,16]],"date-time":"2026-03-16T23:59:38Z","timestamp":1773705578356,"version":"3.50.1"},"reference-count":36,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2025,2,21]],"date-time":"2025-02-21T00:00:00Z","timestamp":1740096000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,2,21]],"date-time":"2025-02-21T00:00:00Z","timestamp":1740096000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100009092","name":"Universidad de Alicante","doi-asserted-by":"crossref","id":[{"id":"10.13039\/100009092","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Int J Digit Libr"],"published-print":{"date-parts":[[2025,3]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>Diversity indices have been traditionally used to capture the biodiversity of ecosystems by measuring the effective number of species or groups of species. In contrast to abundance, which grows with the amount of data available and is sensitive to the appearance of small groups, diversity indices provide a more robust indicator on the variability of individuals. These types of indices can be employed in the context of digital libraries to analyse their content and metadata. They can be used, for example, to identify trends in the distribution of topics, to compare the lexica employed by different authors or to analyse the coverage of semantic metadata. In this article, the lexical diversity is measured through one of the most common indices employed to evaluate diversity, the Shannon index. The experiments show that this index slowly grows with the length of the text used to calculate it. As this growth has the diversity value as ceiling, the curves show that the true value of diversity will only be reached for very large samples. Unfortunately, the available text is often not long enough to achieve the convergence. This paper introduces therefore a new model for the calculation of the asymptotic value of the Shannon diversity of the vocabulary which outperforms traditional models. As regards metadata in digital libraries, we use the new model to analyse the topical specialization of a digital library and its time evolution and propose a more robust way to measure the variety of tags (classes and properties) employed by digital libraries to describe their holdings in Linked Open Data repositories.<\/jats:p>","DOI":"10.1007\/s00799-025-00411-1","type":"journal-article","created":{"date-parts":[[2025,2,21]],"date-time":"2025-02-21T13:30:02Z","timestamp":1740144602000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["Measuring the diversity of data and metadata in digital libraries"],"prefix":"10.1007","volume":"26","author":[{"given":"Rafael","family":"C. Carrasco","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6122-0777","authenticated-orcid":false,"given":"Gustavo","family":"Candela","sequence":"additional","affiliation":[]},{"given":"Manuel","family":"Marco-Such","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,2,21]]},"reference":[{"key":"411_CR1","doi-asserted-by":"publisher","DOI":"10.5334\/johd.124","author":"H Alkemade","year":"2023","unstructured":"Alkemade, H., Claeyssens, S., Colavizza, G., et al.: Datasheets for digital cultural heritage datasets. J. Open Hum. Data (2023). https:\/\/doi.org\/10.5334\/johd.124","journal-title":"J. Open Hum. Data"},{"key":"411_CR2","doi-asserted-by":"publisher","unstructured":"Bache, K., Newman, D., Smyth, P: Text-based measures of document diversity. In: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery, New York, NY, USA, KDD \u201913, pp. 23\u20133. (2013). https:\/\/doi.org\/10.1145\/2487575.2487672","DOI":"10.1145\/2487575.2487672"},{"key":"411_CR3","unstructured":"Berners-Lee, T.: Linked data. https:\/\/www.w3.org\/DesignIssues\/LinkedData.html, (2006)"},{"key":"411_CR4","doi-asserted-by":"crossref","unstructured":"Berners-Lee, T., Hendler, J., Lassila, O. The Semantic Web. Scientific American 284, (2001)","DOI":"10.1038\/scientificamerican0501-34"},{"key":"411_CR5","doi-asserted-by":"publisher","unstructured":"del Carmen Calatrava Moreno, M., Auzinger, T., Werthner, H.: On the uncertainty of interdisciplinarity measurements due to incomplete bibliographic data. Scientometrics 107(1), 213\u201323 (2016). https:\/\/doi.org\/10.1007\/S11192-016-1842-4","DOI":"10.1007\/S11192-016-1842-4"},{"key":"411_CR6","doi-asserted-by":"publisher","unstructured":"Carrasco, R.C., Candela, G., Marco-Such, M.: rccarrasco\/dl_diversity: Initial release. https:\/\/doi.org\/10.5281\/zenodo.6389967, (2022)","DOI":"10.5281\/zenodo.6389967"},{"key":"411_CR7","doi-asserted-by":"publisher","unstructured":"Carroll, S.R., Garba, I., Figueroa-Rodr\u00edguez, O.L., et al.: The CARE principles for Indigenous Data Governance. Data Sci. J. 19, 43 (2020). https:\/\/doi.org\/10.5334\/DSJ-2020-043","DOI":"10.5334\/DSJ-2020-043"},{"issue":"1311","key":"411_CR8","doi-asserted-by":"publisher","first-page":"101","DOI":"10.1098\/rstb.1994.0091","volume":"345","author":"RK Colwell","year":"1994","unstructured":"Colwell, R.K., Coddington, J.A.: Estimating terrestrial biodiversity through extrapolation. Philos. Trans. R. Soc. Lond. B Biol. Sci. 345(1311), 101\u2013118 (1994). https:\/\/doi.org\/10.1098\/rstb.1994.0091","journal-title":"Philos. Trans. R. Soc. Lond. B Biol. Sci."},{"key":"411_CR9","doi-asserted-by":"publisher","DOI":"10.1111\/ecog.00814","author":"RK Colwell","year":"2014","unstructured":"Colwell, R.K., Elsensohn, J.E.: Estimates turns 20: statistical estimation of species richness and shared species from samples, with non-parametric extrapolation. Ecography (2014). https:\/\/doi.org\/10.1111\/ecog.00814","journal-title":"Ecography"},{"key":"411_CR10","doi-asserted-by":"publisher","unstructured":"Dobreva, M., Stefanov, K., Ivanova, K.: Data Spaces for Cultural Heritage: Insights from GLAM Innovation Labs. In: From Born-Physical to Born-Virtual: Augmenting Intelligence in Digital Libraries - 24th International Conference on Asian Digital Libraries, ICADL 2022, Hanoi, Vietnam, November 30 - December 2, 2022, Proceedings, pp 492\u2013500, (2022). https:\/\/doi.org\/10.1007\/978-3-031-21756-2_41","DOI":"10.1007\/978-3-031-21756-2_41"},{"key":"411_CR11","unstructured":"Europeana Foundation Deployment of a common European data space for cultural heritage. https:\/\/pro.europeana.eu\/files\/Europeana_Professional\/Publications\/data_space_annual_report_2022_2023.pdf. (2023)"},{"issue":"1","key":"411_CR12","doi-asserted-by":"crossref","first-page":"140","DOI":"10.14704\/WEB\/V17I1\/a213","volume":"17","author":"B Heshmati","year":"2020","unstructured":"Heshmati, B.: Global research trends of public libraries from 1968 to 2017: a bibliometric and visualization analysis. Webology 17(1), 140\u2013157 (2020)","journal-title":"Webology"},{"issue":"2","key":"411_CR13","doi-asserted-by":"publisher","first-page":"427","DOI":"10.2307\/1934352","volume":"54","author":"MO Hill","year":"1973","unstructured":"Hill, M.O.: Diversity and evenness: a unifying notation and its consequences. Ecology 54(2), 427\u201343 (1973). https:\/\/doi.org\/10.2307\/1934352","journal-title":"Ecology"},{"issue":"2","key":"411_CR14","doi-asserted-by":"publisher","first-page":"151","DOI":"10.1023\/A:1022673822140","volume":"37","author":"DL Hoover","year":"2003","unstructured":"Hoover, D.L.: Another perspective on vocabulary richness. Comput. Humanit. 37(2), 151\u201317 (2003). https:\/\/doi.org\/10.1023\/A:1022673822140","journal-title":"Comput. Humanit."},{"issue":"s1","key":"411_CR15","doi-asserted-by":"publisher","first-page":"87","DOI":"10.1111\/j.1467-9922.2012.00739.x","volume":"63","author":"S Jarvis","year":"2013","unstructured":"Jarvis, S.: Capturing the diversity in lexical diversity. Lang. Learn. 63(s1), 87\u201310 (2013). https:\/\/doi.org\/10.1111\/j.1467-9922.2012.00739.x","journal-title":"Lang. Learn."},{"key":"411_CR16","unstructured":"Karsdorp, F., Manjavacas, E., Fonteyn, L.: Introducing functional diversity: A novel approach to lexical diversity in (historical) corpora. In: Karsdorp F, Nielbo KL (eds) Proceedings of the Computational Humanities Research Conference 2022, CHR 2022, Antwerp, Belgium, December 12-14, 2022, CEUR Workshop Proceedings, vol 3290. CEUR-WS.org, pp 114\u2013126, https:\/\/ceur-ws.org\/Vol-3290\/short_paper2780.pdf. (2022)"},{"issue":"4","key":"411_CR17","doi-asserted-by":"publisher","first-page":"339","DOI":"10.1080\/09296174.2013.830552","volume":"20","author":"M Kubat","year":"2013","unstructured":"Kubat, M., Milicka, J.: Vocabulary richness measure in genres. J. Quant. Linguist. 20(4), 339 (2013). https:\/\/doi.org\/10.1080\/09296174.2013.830552","journal-title":"J. Quant. Linguist."},{"issue":"2","key":"411_CR18","doi-asserted-by":"publisher","first-page":"154","DOI":"10.1080\/15434303.2020.1844205","volume":"18","author":"K Kyle","year":"2021","unstructured":"Kyle, K., Crossley, S.A., Jarvis, S.: Assessing the validity of lexical diversity indices using direct judgements. Lang. Ass. Quart. 18(2), 154 (2021). https:\/\/doi.org\/10.1080\/15434303.2020.1844205","journal-title":"Lang. Ass. Quart."},{"issue":"4","key":"411_CR19","doi-asserted-by":"publisher","first-page":"424","DOI":"10.1080\/01616846.2022.2116886","volume":"42","author":"S Li","year":"2023","unstructured":"Li, S., Yang, F.: Green library research: a bibliometric analysis. Public Libr. Q. 42(4), 424\u201344 (2023). https:\/\/doi.org\/10.1080\/01616846.2022.2116886","journal-title":"Public Libr. Q."},{"key":"411_CR20","doi-asserted-by":"publisher","unstructured":"Mahey, M., Al-Abdulla, A., Ames, S., et\u00a0al.: Open a GLAM Lab. International GLAM Labs Community, Book Sprint, Doha, Qatar,https:\/\/doi.org\/10.21428\/16ac48ec.f54af6ae, (2019)","DOI":"10.21428\/16ac48ec.f54af6ae"},{"issue":"2","key":"411_CR21","doi-asserted-by":"publisher","first-page":"381","DOI":"10.3758\/brm.42.2.381","volume":"42","author":"PM McCarthy","year":"2010","unstructured":"McCarthy, P.M., Jarvis, S.: MTLD, vocd-d, and HD-d: a validation study of sophisticated approaches to lexical diversity assessment. Behav. Res. Methods 42(2), 381 (2010). https:\/\/doi.org\/10.3758\/brm.42.2.381","journal-title":"Behav. Res. Methods"},{"issue":"3","key":"411_CR22","doi-asserted-by":"publisher","first-page":"323","DOI":"10.1093\/llc\/15.3.323","volume":"15","author":"G McKee","year":"2000","unstructured":"McKee, G., Malvern, D., Richards, B.: Measuring vocabulary diversity using dedicated software. Liter. Linguist. Comput. 15(3), 323\u2013338 (2000). https:\/\/doi.org\/10.1093\/llc\/15.3.323","journal-title":"Liter. Linguist. Comput."},{"issue":"2","key":"411_CR23","doi-asserted-by":"publisher","first-page":"1145","DOI":"10.1007\/s11192-020-03481-x","volume":"125","author":"U Moschini","year":"2020","unstructured":"Moschini, U., Fenialdi, E., Daraio, C., et al.: A comparison of three multidisciplinarity indices based on the diversity of scopus subject areas of authors\u2019 documents, their bibliography and their citing papers. Scientometrics 125(2), 1145\u2013115 (2020). https:\/\/doi.org\/10.1007\/s11192-020-03481-x","journal-title":"Scientometrics"},{"issue":"2","key":"411_CR24","doi-asserted-by":"publisher","first-page":"175","DOI":"10.1016\/S0143-6228(02)00002-4","volume":"22","author":"H Nagendra","year":"2002","unstructured":"Nagendra, H.: Opposite trends in response for the Shannon and Simpson indices of landscape diversity. Appl. Geogr. 22(2), 175\u2013186 (2002). https:\/\/doi.org\/10.1016\/S0143-6228(02)00002-4","journal-title":"Appl. Geogr."},{"key":"411_CR25","doi-asserted-by":"publisher","unstructured":"Padilla T, Allen L, Frost H, et\u00a0al (2019) Final Report \u2014 Always Already Computational: Collections as Data. https:\/\/doi.org\/10.5281\/zenodo.3152935,","DOI":"10.5281\/zenodo.3152935"},{"key":"411_CR26","doi-asserted-by":"publisher","unstructured":"Padilla, T., Scates Kettler, H., Varner, S., et al.: Vancouver statement on collections as data (2023). https:\/\/doi.org\/10.5281\/zenodo.8342171","DOI":"10.5281\/zenodo.8342171"},{"key":"411_CR27","doi-asserted-by":"publisher","first-page":"1112","DOI":"10.3758\/s13423-014-0585-6","volume":"21","author":"ST Piantadosi","year":"2014","unstructured":"Piantadosi, S.T.: Zipf\u2019s word frequency law in natural language: a critical review and future directions. Psychonom. Bull. Rev. 21, 1112\u201330 (2014). https:\/\/doi.org\/10.3758\/s13423-014-0585-6","journal-title":"Psychonom. Bull. Rev."},{"issue":"1","key":"411_CR28","doi-asserted-by":"crossref","first-page":"24","DOI":"10.1016\/0040-5809(82)90004-1","volume":"21","author":"CR Rao","year":"1982","unstructured":"Rao, C.R.: Diversity and dissimilarity coefficients: a unified approach. Theor. Popul. Biol. 21(1), 24\u201343 (1982)","journal-title":"Theor. Popul. Biol."},{"issue":"7","key":"411_CR29","doi-asserted-by":"publisher","first-page":"729","DOI":"10.1080\/02664760600708970","volume":"33","author":"A Riba","year":"2006","unstructured":"Riba, A., Ginebra, J.: Diversity of vocabulary and homogeneity of literary style. J. Appl. Stat. 33(7), 729\u2013741 (2006). https:\/\/doi.org\/10.1080\/02664760600708970","journal-title":"J. Appl. Stat."},{"key":"411_CR30","doi-asserted-by":"publisher","unstructured":"Richards, B.: Type\/Token Ratios: what do they really tell us? J. Child Lang. 14(2), 201\u201320 (1987). https:\/\/doi.org\/10.1017\/S0305000900012885","DOI":"10.1017\/S0305000900012885"},{"issue":"3","key":"411_CR31","doi-asserted-by":"publisher","first-page":"321","DOI":"10.1111\/oik.07202","volume":"130","author":"M Roswell","year":"2021","unstructured":"Roswell, M., Dushoff, J., Winfree, R.: A conceptual guide to measuring species diversity. Oikos 130(3), 321\u201333 (2021). https:\/\/doi.org\/10.1111\/oik.07202","journal-title":"Oikos"},{"key":"411_CR32","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1108\/CC-01-2018-001","volume":"37","author":"A Sahu","year":"2018","unstructured":"Sahu, A., Jena, P.: Role of libraries in promoting education: a bibliometric analysis (2012\u20132016). Collect. Curat. 37, 1\u20138 (2018). https:\/\/doi.org\/10.1108\/CC-01-2018-001","journal-title":"Collect. Curat."},{"issue":"3","key":"411_CR33","doi-asserted-by":"publisher","first-page":"379","DOI":"10.1002\/j.1538-7305.1948.tb01338.x","volume":"27","author":"CE Shannon","year":"1948","unstructured":"Shannon, C.E.: A mathematical theory of communication. Bell Syst. Tech. J. 27(3), 379 (1948). https:\/\/doi.org\/10.1002\/j.1538-7305.1948.tb01338.x","journal-title":"Bell Syst. Tech. J."},{"key":"411_CR34","doi-asserted-by":"publisher","unstructured":"Smith-Yoshimura, K.: Transitioning to the next generation of metadata. https:\/\/doi.org\/10.25333\/rqgd-b343 (2020)","DOI":"10.25333\/rqgd-b343"},{"key":"411_CR35","unstructured":"World Wide Web Consortium: SPARQL query language for RDF. https:\/\/www.w3.org\/TR\/sparql11-overview\/ (2013)"},{"key":"411_CR36","unstructured":"World Wide Web Consortium: RDF 1.1 concepts and abstract syntax. https:\/\/www.w3.org\/TR\/rdf11-concepts\/ (2014)"}],"container-title":["International Journal on Digital Libraries"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00799-025-00411-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s00799-025-00411-1\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00799-025-00411-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,24]],"date-time":"2025-03-24T06:30:08Z","timestamp":1742797808000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s00799-025-00411-1"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,2,21]]},"references-count":36,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2025,3]]}},"alternative-id":["411"],"URL":"https:\/\/doi.org\/10.1007\/s00799-025-00411-1","relation":{},"ISSN":["1432-5012","1432-1300"],"issn-type":[{"value":"1432-5012","type":"print"},{"value":"1432-1300","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,2,21]]},"assertion":[{"value":"3 January 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"12 October 2024","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"19 January 2025","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"21 February 2025","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"5"}}