{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,28]],"date-time":"2026-01-28T21:12:55Z","timestamp":1769634775331,"version":"3.49.0"},"reference-count":66,"publisher":"SAGE Publications","issue":"6","license":[{"start":{"date-parts":[[2021,12,13]],"date-time":"2021-12-13T00:00:00Z","timestamp":1639353600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["Journal of Information Science"],"published-print":{"date-parts":[[2023,12]]},"abstract":"<jats:p>In the domain of Galleries, Libraries, Archives and Museums (GLAM) institutions, creative and innovative tools and methodologies for content delivery and user engagement have recently gained international attention. New methods have been proposed to publish digital collections as datasets amenable to computational use. Standardised benchmarks can be useful to broaden the scope of machine-actionable collections and to promote cultural and linguistic diversity. In this article, we propose a methodology to select datasets for computationally driven research applied to Spanish text corpora. This work seeks to encourage Spanish and Latin American institutions to publish machine-actionable collections based on best practices and avoiding common mistakes.<\/jats:p>","DOI":"10.1177\/01655515211060530","type":"journal-article","created":{"date-parts":[[2021,12,13]],"date-time":"2021-12-13T03:48:23Z","timestamp":1639367303000},"page":"1451-1461","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":4,"title":["A benchmark of Spanish language datasets for computationally driven research"],"prefix":"10.1177","volume":"49","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6122-0777","authenticated-orcid":false,"given":"Gustavo","family":"Candela","sequence":"first","affiliation":[{"name":"Universidad de Alicante, Spain"}]},{"given":"Mar\u00eda-Dolores","family":"S\u00e1ez","sequence":"additional","affiliation":[{"name":"Universidad de Alicante, Spain"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7705-5224","authenticated-orcid":false,"given":"Pilar","family":"Escobar","sequence":"additional","affiliation":[{"name":"Universidad de Alicante, Spain"}]},{"given":"Manuel","family":"Marco-Such","sequence":"additional","affiliation":[{"name":"Universidad de Alicante, Spain"}]}],"member":"179","published-online":{"date-parts":[[2021,12,13]]},"reference":[{"key":"e_1_3_3_2_2","volume-title":"Open a GLAM lab","author":"Mahey M","year":"2019","unstructured":"Mahey M, Al-Abdulla A, Ames S, et al. Open a GLAM lab. Doha, Qatar: QU Press, 2019."},{"key":"e_1_3_3_3_2","unstructured":"Library of Congress. Digital scholarship at the Library of Congress: a research guide https:\/\/guides.loc.gov\/digital-scholarship\/introduction"},{"key":"e_1_3_3_4_2","unstructured":"Padilla T. Responsible operations: data science machine learning and AI in libraries 2019 https:\/\/www.oclc.org\/research\/publications\/2019\/oclcresearch-responsible-operations-data-science-machine-learning-ai.html (accessed 26 June-2020)."},{"key":"e_1_3_3_5_2","doi-asserted-by":"crossref","unstructured":"Guti\u00e9rrez De la Torre SE Cuadros-S\u00e1nchez MD. Digital resources: the digital library of Ibero-American Heritage 2020 https:\/\/oxfordre.com\/latinamericanhistory\/view\/10.1093\/acrefore\/9780199366439.001.0001\/acrefore-9780199366439-e-798","DOI":"10.1093\/acrefore\/9780199366439.013.798"},{"key":"e_1_3_3_6_2","unstructured":"Mahey M Al-Abdulla A Ames S et al. Open a GLAM lab. Alicante: Biblioteca Virtual Miguel de Cervantes 2021 http:\/\/www.cervantesvirtual.com\/nd\/ark:\/59851\/bmc1066249"},{"key":"e_1_3_3_7_2","unstructured":"Unlocking the Colonial Archive. Harnessing artificial intelligence for indigenous and Spanish American collections 2021 https:\/\/unlockingarchives.com\/research\/"},{"key":"e_1_3_3_8_2","first-page":"31","article-title":"Mnemosyne: a digital library of the other silver age (origins, contents, perspectives)","volume":"30","author":"Gonz\u00e1lez Soriano JM","year":"2021","unstructured":"Gonz\u00e1lez Soriano JM. Mnemosyne: a digital library of the other silver age (origins, contents, perspectives). Signa 2021; 30: 31\u201358.","journal-title":"Signa"},{"key":"e_1_3_3_9_2","doi-asserted-by":"crossref","unstructured":"Sim SE Easterbrook SM Holt RC. Using benchmarking to advance research: a challenge to software engineering. In: Proceedings of the 25th international conference on software engineering Portland OR 3\u201310 May 2003 pp. 74\u201383. https:\/\/doi.org\/10.1109\/ICSE.2003.1201189","DOI":"10.1109\/ICSE.2003.1201189"},{"key":"e_1_3_3_10_2","doi-asserted-by":"publisher","DOI":"10.3233\/SW-180323"},{"key":"e_1_3_3_11_2","unstructured":"European Commission. Cultural heritage: digitisation online accessibility and digital preservation 2018 https:\/\/ec.europa.eu\/newsroom\/dae\/document.cfm?doc_id=60045 (accessed 26 June 2020)."},{"key":"e_1_3_3_12_2","unstructured":"Europeana. Issue 16: newspapers https:\/\/pro.europeana.eu\/page\/issue-16-newspapers"},{"key":"e_1_3_3_13_2","unstructured":"Lorang E Soh LK Liu Y et al. Digital libraries intelligent data analytics and augmented description: a demonstration project 2020 https:\/\/labs.loc.gov\/static\/labs\/work\/experiments\/final-report-revised_june-2020.pdf"},{"key":"e_1_3_3_14_2","unstructured":"Padilla T Allen L Frost H et al. Final report \u2013 always already computational: collections as data 2019 https:\/\/doi.org\/10.5281\/zenodo.3152935"},{"key":"e_1_3_3_15_2","unstructured":"Harris G Potter A Zwaard K et al. Digital scholarship at the library of congress 2020 https:\/\/labs.loc.gov\/static\/labs\/work\/reports\/DHWorkingGroupPaper-v1.0.pdf (accessed 26 June 2020)."},{"key":"e_1_3_3_16_2","unstructured":"Tasovac T Chambers S T\u00f3th-Czifra E. Cultural heritage data from a humanities research perspective: a DARIAH position paper 2020 https:\/\/hal.archives-ouvertes.fr\/hal-02961317"},{"key":"e_1_3_3_17_2","doi-asserted-by":"publisher","DOI":"10.1177\/2053951720970576."},{"key":"e_1_3_3_18_2","doi-asserted-by":"publisher","DOI":"10.1177\/0165551520950246."},{"key":"e_1_3_3_19_2","unstructured":"Davids A Gabriels N. Data-level access to Belgian historical censuses 2020 https:\/\/enrichingheritage.wordpress.com\/author\/nelegabrielsoutlookcom\/"},{"key":"e_1_3_3_20_2","unstructured":"Ziku M Gabriels N. Opening up a little more: a minimal-computing approach for developing Git and machine-actionable GLAM open data 2020 https:\/\/enrichingheritage.wordpress.com\/2020\/05\/01\/git-and-machine-actionable-data-pilot\/"},{"key":"e_1_3_3_21_2","unstructured":"Gijsbers P LeDell E Poirier S et al. An Open Source AutoML Benchmark. arXiv preprint arXiv:190700909 [csLG] 2019 https:\/\/arxiv.org\/abs\/1907.00909"},{"key":"e_1_3_3_22_2","unstructured":"Library of Congress. Selected datasets: a new Library of Congress collection 2020 https:\/\/blogs.loc.Gov\/thesignal\/2020\/06\/selected-datasets-a-new-library-of-congress-collection\/ (accessed 26 June 2020)."},{"key":"e_1_3_3_23_2","unstructured":"Library of Congress. Chronicling America https:\/\/chroniclingamerica.loc.gov\/about\/"},{"key":"e_1_3_3_24_2","doi-asserted-by":"publisher","unstructured":"Liceras-Garrido R Comino A Murrieta-Flores P. Transcripci\u00f3n del Cat\u00e1logo Monumental de Espa\u00f1a: Provincia de \u00c1vila por Manuel G\u00f3mez Moreno (1900-1901) 2020. DOI: 10.6084\/m9.figshare.12006318.v1.","DOI":"10.6084\/m9.figshare.12006318.v1."},{"key":"e_1_3_3_25_2","doi-asserted-by":"publisher","unstructured":"Liceras-Garrido R Comino A Murrieta-Flores P. Transcripci\u00f3n del Cat\u00e1logo Monumental de la Provincia de Soria por Juan Cabr\u00e9 (1916-1917) 2020. DOI: 10.6084\/m9.figshare.12006273.v1.","DOI":"10.6084\/m9.figshare.12006273.v1."},{"key":"e_1_3_3_26_2","doi-asserted-by":"publisher","unstructured":"Liceras-Garrido R Comino A Murrieta-Flores P. Transcripci\u00f3n del cat\u00e1logo monumental y art\u00edstico de la provincia de burgos por narciso sentenach (1925) 2020. DOI: 10.6084\/m9.figshare.12006327.v1.","DOI":"10.6084\/m9.figshare.12006327.v1."},{"key":"e_1_3_3_27_2","unstructured":"British Library. A collection of datasets released by the British Library https:\/\/data.bl.uk\/"},{"key":"e_1_3_3_28_2","unstructured":"Ministry of Culture. Mexicana 2017 https:\/\/mexicana.cultura.gob.mx\/en\/repositorio\/acerca"},{"key":"e_1_3_3_29_2","unstructured":"National Library of Scotland. Data Foundry. Data collections from the National Library of Scotland https:\/\/data.nls.uk\/"},{"key":"e_1_3_3_30_2","unstructured":"Austrian National Library. Data Sets. View use and reuse the digital data sets of the ONB Labs https:\/\/data.nls.uk\/"},{"key":"e_1_3_3_31_2","unstructured":"KB Labs. Datasets https:\/\/lab.kb.nl\/datasets"},{"key":"e_1_3_3_32_2","unstructured":"World Wide Web Consortium. SPARQL 1.1 query language 2013 https:\/\/www.w3.org\/TR\/sparql11-query\/"},{"key":"e_1_3_3_33_2","doi-asserted-by":"publisher","DOI":"10.3233\/SW-170274"},{"key":"e_1_3_3_34_2","unstructured":"IFLA Information Technology Section; IFLA Semantic Web Special Interest Group; Biblioth\u00e8que nationale de France. We grew up together: data.bnf.Fr from the BnF and Logilab perspectives. Paris Biblioth\u00e8que nationale de France Petit auditorium: IFLA Information Technology Section; IFLA Semantic Web Special Interest Group; Biblioth\u00e8que nationale de France 2014 http:\/\/ifla2014-satdata.bnf.fr\/program.html"},{"key":"e_1_3_3_35_2","unstructured":"British Library. Basic RDF\/XML 2014 http:\/\/www.bl.uk\/bibliographic\/datafree.html#basicrdfxml (accessed 26 June 2020)."},{"issue":"8","key":"e_1_3_3_36_2","first-page":"252","article-title":"Aggregation of linked data in the cultural heritage domain: a case study in the Europeana network","volume":"10","author":"Freire N","year":"2019","unstructured":"Freire N, Voorburg R, Cornelissen R, et al. Aggregation of linked data in the cultural heritage domain: a case study in the Europeana network. Inf 2019; 10(8): 252.","journal-title":"Inf"},{"key":"e_1_3_3_37_2","doi-asserted-by":"publisher","unstructured":"Beals M Bell E. The atlas of digitised newspapers and metadata: reports from Oceanic Exchanges 2020. DOI: 10.6084\/m9.figshare.11560059.v2.","DOI":"10.6084\/m9.figshare.11560059.v2."},{"key":"e_1_3_3_38_2","doi-asserted-by":"publisher","DOI":"10.2218\/ijdc.v11i2.421"},{"key":"e_1_3_3_39_2","unstructured":"Padilla T Allen L Frost H et al. 50 things \u2013 always already computational: collections as data 2019 https:\/\/doi.org\/10.5281\/zenodo.3066237"},{"key":"e_1_3_3_40_2","unstructured":"Project Jupyter https:\/\/jupyter.org\/"},{"key":"e_1_3_3_41_2","unstructured":"Sherratt T. Glam-workbench\/getting-started 2019 https:\/\/doi.org\/10.5281\/zenodo.3549636"},{"key":"e_1_3_3_42_2","unstructured":"Library of Congress. LC maps for robots 2020 https:\/\/blogs.loc.gov\/thesignal\/2020\/05\/lc-maps-for-robots\/"},{"key":"e_1_3_3_43_2","doi-asserted-by":"publisher","DOI":"10.1080\/15623599.2007.10773099"},{"key":"e_1_3_3_44_2","doi-asserted-by":"crossref","unstructured":"Heckman SS Williams L. On establishing a benchmark for evaluating static analysis alert prioritization and classification techniques. In: Proceedings of the second international symposium on empirical software engineering and measurement (ESEM 2008) Kaiserslautern 9\u201310 October 2008 pp. 41\u201350 https:\/\/doi.org\/10.1145\/1414004.1414013","DOI":"10.1145\/1414004.1414013"},{"key":"e_1_3_3_45_2","doi-asserted-by":"publisher","DOI":"10.1093\/database\/baz117"},{"key":"e_1_3_3_46_2","doi-asserted-by":"publisher","DOI":"10.1177\/0165551520930951."},{"key":"e_1_3_3_47_2","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pcbi.1006750"},{"key":"e_1_3_3_48_2","doi-asserted-by":"publisher","DOI":"10.1080\/07421222.1996.11518099"},{"key":"e_1_3_3_49_2","unstructured":"Library of Congress. OCR data https:\/\/chroniclingamerica.loc.gov\/ocr\/"},{"key":"e_1_3_3_50_2","unstructured":"Library of Congress. By the people https:\/\/crowd.loc.gov\/"},{"key":"e_1_3_3_51_2","unstructured":"Biblioteca Nacional de Espa\u00f1a. Comunidad BNE https:\/\/comunidad.bne.es\/"},{"key":"e_1_3_3_52_2","unstructured":"British Library. Libcrowds https:\/\/www.libcrowds.com\/"},{"key":"e_1_3_3_53_2","unstructured":"Europeana. Europeana transcribe https:\/\/europeana.transcribathon.eu"},{"key":"e_1_3_3_54_2","doi-asserted-by":"crossref","unstructured":"Lee BCG Mears J Jakeway E et al. The newspaper navigator dataset: Extracting and analyzing visual content from 16 million historic newspaper pages in chronicling America 2020 https:\/\/arxiv.org\/abs\/2005.01583","DOI":"10.1145\/3340531.3412767"},{"key":"e_1_3_3_55_2","unstructured":"Thomas Padilla. On a collections as data imperative https:\/\/labs.loc.gov\/static\/labs\/work\/reports\/tpadilla_OnaCollectionsasDataImperative_final.pdf"},{"key":"e_1_3_3_56_2","doi-asserted-by":"publisher","DOI":"10.1038\/sdata.2016.18"},{"key":"e_1_3_3_57_2","unstructured":"Snydman S Sanderson R Cramer T. The International Image Interoperability Framework (IIIF): a community & technology approach for web-based images 2015 https:\/\/stacks.stanford.edu\/file\/druid:df650pk4327\/2015ARCHIVING_IIIF.pdf"},{"key":"e_1_3_3_58_2","doi-asserted-by":"publisher","DOI":"10.1109\/CultureComputing.2013.10."},{"key":"e_1_3_3_59_2","unstructured":"Research Libraries UK. The role of academic and research libraries as active participants and leaders in the production of scholarly research 2021 https:\/\/www.rluk.ac.uk\/wp-content\/uploads\/2021\/07\/RLUK-Scoping-Study-Report.pdf"},{"key":"e_1_3_3_60_2","unstructured":"Binder https:\/\/mybinder.org\/"},{"key":"e_1_3_3_61_2","doi-asserted-by":"publisher","DOI":"10.1108\/JD-09-2016-0106"},{"key":"e_1_3_3_62_2","doi-asserted-by":"crossref","unstructured":"van Strien D Beelen K Ardanuy MC et al. Assessing the impact of OCR quality on downstream NLP tasks. In: Proceedings of the 12th international conference on agents and artificial intelligence (ICAART 2020) vol. 1 Valletta 22\u201324 February 2020 pp. 484\u2013496 https:\/\/doi.org\/10.5220\/0009169004840496","DOI":"10.5220\/0009169004840496"},{"key":"e_1_3_3_63_2","unstructured":"Poncelas A Aboomar M Buts J et al. A tool for facilitating OCR postediting in historical documents 2020 https:\/\/arxiv.org\/abs\/2004.11471"},{"key":"e_1_3_3_64_2","doi-asserted-by":"crossref","unstructured":"Colutto S Kahle P Hackl G et al. Transkribus. A platform for automated text recognition and searching of historical documents. In: 15th international conference on eScience (eScience 2019) San Diego CA 24\u201327 September 2019 pp. 463\u2013466 https:\/\/doi.org\/10.1109\/eScience.2019.00060","DOI":"10.1109\/eScience.2019.00060"},{"key":"e_1_3_3_65_2","doi-asserted-by":"crossref","unstructured":"IFLA Study Group on the Functional Requirements for Bibliographic Records. Functional requirements for bibliographic records 1998 https:\/\/www.ifla.org\/publications\/functional-requirements-for-bibliographic-records","DOI":"10.1515\/9783110962451"},{"key":"e_1_3_3_66_2","unstructured":"RDA Steering Committee. RDA registry 2014 http:\/\/www.rdaregistry.info\/"},{"key":"e_1_3_3_67_2","unstructured":"Temnikova I Baumgartner WAJr Hailu ND et al. Sublanguage corpus analysis toolkit: a tool for assessing the representativeness and sublanguage characteristics of corpora. In: Proceedings of the ninth international conference on language resources and evaluation (LREC\u201914). Reykjavik Iceland: European Language Resources Association (ELRA) pp. 1714\u20131718 http:\/\/www.lrec-conf.org\/proceedings\/lrec2014\/pdf\/675_Paper.pdf"}],"container-title":["Journal of Information Science"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/01655515211060530","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/01655515211060530","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/01655515211060530","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,1,6]],"date-time":"2026-01-06T08:07:43Z","timestamp":1767686863000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/01655515211060530"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,12,13]]},"references-count":66,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2023,12]]}},"alternative-id":["10.1177\/01655515211060530"],"URL":"https:\/\/doi.org\/10.1177\/01655515211060530","relation":{},"ISSN":["0165-5515","1741-6485"],"issn-type":[{"value":"0165-5515","type":"print"},{"value":"1741-6485","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,12,13]]}}}