{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,5,18]],"date-time":"2025-05-18T15:40:05Z","timestamp":1747582805518,"version":"3.40.5"},"reference-count":101,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2024,7,18]],"date-time":"2024-07-18T00:00:00Z","timestamp":1721260800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,7,18]],"date-time":"2024-07-18T00:00:00Z","timestamp":1721260800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"Artificial Intelligence and Computer Science Laboratory","award":["UIDB\/00027\/2020","UIDB\/00027\/2020"],"award-info":[{"award-number":["UIDB\/00027\/2020","UIDB\/00027\/2020"]}]},{"DOI":"10.13039\/501100006752","name":"Universidade do Porto","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100006752","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Lang Resources & Evaluation"],"published-print":{"date-parts":[[2025,6]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>This survey aims to thoroughly examine and evaluate the current landscape of electronic corpora in historical Portuguese. This is achieved through a comprehensive analysis of existing resources. The article makes two main contributions. The first is an exhaustive cataloguing of existing Portuguese historical corpora, where each corpus is meticulously detailed regarding linguistic periods, geographic origins, and thematic contents. The second contribution focuses on the digital accessibility of these corpora for researchers. 
These contributions are crucial in enhancing and progressing the study of historical corpora in the Portuguese language, laying a critical groundwork for future linguistic research in this field. Our survey identified 20 freely accessible corpora, comprising approximately 63.9 million tokens, and two private corpora, totalling 59.9 million tokens.<\/jats:p>","DOI":"10.1007\/s10579-024-09757-5","type":"journal-article","created":{"date-parts":[[2024,7,18]],"date-time":"2024-07-18T14:02:31Z","timestamp":1721311351000},"page":"1797-1832","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Historical Portuguese corpora: a survey"],"prefix":"10.1007","volume":"59","author":[{"given":"Tom\u00e1s Freitas","family":"Os\u00f3rio","sequence":"first","affiliation":[]},{"given":"Henrique","family":"Lopes Cardoso","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,7,18]]},"reference":[{"key":"9757_CR1","unstructured":"Abadji, J., Ortiz Suarez, P., Romary, L., & Sagot, B. (2022). Towards a cleaner document-oriented multilingual crawled corpus. arXiv e-prints, 2201\u201306642 arXiv:2201.06642 [cs.CL]"},{"key":"9757_CR2","unstructured":"Alatrash, R., Schlechtweg, D., Kuhn, J., & Walde, S. (2020). CCOHA: Clean Corpus of Historical American English. In: Proceedings of the Twelfth Language Resources and Evaluation Conference, pp. 6958\u20136966. European Language Resources Association, Marseille, France. https:\/\/aclanthology.org\/2020.lrec-1.859 Accessed 2023-09-23"},{"key":"9757_CR3","unstructured":"Arquivo dos A\u00e7ores. https:\/\/hdl.handle.net\/21.11129\/0000-000D-F8C0-2. Accessed: 16-5-2023"},{"key":"9757_CR4","unstructured":"ARQUIVO PESSOA. http:\/\/arquivopessoa.net\/. Accessed: 15-05-2023"},{"key":"9757_CR5","unstructured":"As Mem\u00f3rias Paroquiais de 1758. http:\/\/www.cidehusdigital.uevora.pt\/portugal1758. 
Accessed: 15-05-2023"},{"key":"9757_CR6","unstructured":"Barreto, J. F. (1671). Ortografia da L\u00edngua Portugueza. Biblioteca Nacional, [L-323-V] purl pt, biblioteca nacional digital, Portugal"},{"key":"9757_CR7","doi-asserted-by":"crossref","unstructured":"Bick, E. (2006). Functional aspects in portuguese ner. In: Vieira, R., Quaresma, P., Nunes, M.d.G.V., Mamede, N.J., Oliveira, C., Dias, M.C. (eds.) Computational Processing of the Portuguese Language, pp. 80\u201389. Springer, Berlin, Heidelberg","DOI":"10.1007\/11751984_9"},{"key":"9757_CR8","unstructured":"Bick, E. (2014). In: Sardinha, T., Ferreira, T. (eds.) PALAVRAS - A Constraint Grammar-Based Parsing System for Portuguese, pp. 279\u2013302. Bloomsbury Academic, New York."},{"key":"9757_CR9","doi-asserted-by":"crossref","unstructured":"Bick, E., & Zampieri, M. (2016). Grammatical annotation of historical portuguese: Generating a corpus-based diachronic dictionary. In P. Sojka, A. Hor\u00e1k, I. Kope\u010dek, & K. Pala (Eds.), Text, Speech, and Dialogue (pp. 3\u201311). Cham: Springer.","DOI":"10.1007\/978-3-319-45510-5_1"},{"key":"9757_CR10","doi-asserted-by":"publisher","unstructured":"Blank, A. (1999). In: Blank, A., Koch, P. (eds.) Why do new meanings occur? A cognitive typology of the motivations for lexical semantic change, pp. 61\u201390. De Gruyter Mouton, Berlin, Boston. https:\/\/doi.org\/10.1515\/9783110804195.61","DOI":"10.1515\/9783110804195.61"},{"key":"9757_CR11","doi-asserted-by":"publisher","unstructured":"Bowern, C. (2019). Semantic change and semantic stability: Variation is key. In: Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change, pp. 48\u201355. Association for Computational Linguistics, Florence, Italy. https:\/\/doi.org\/10.18653\/v1\/W19-4706. https:\/\/aclanthology.org\/W19-4706","DOI":"10.18653\/v1\/W19-4706"},{"key":"9757_CR12","doi-asserted-by":"crossref","unstructured":"Branco, A., Silva, J. R. (2006). 
A suite of shallow processing tools for Portuguese: LX-suite. In: Demonstrations, pp. 179\u2013182. https:\/\/aclanthology.org\/E06-2024","DOI":"10.3115\/1608974.1609003"},{"key":"9757_CR13","doi-asserted-by":"publisher","unstructured":"Calder\u00f3n\u00a0Campos, M., & D\u00edaz-Bravo, R. (2021). An online corpus for the study of historical dialectology: Oralia diacr\u00f3nica del espa\u00f1ol. Digital Scholarship in the Humanities 36(Supplement_2), 30\u201348. https:\/\/doi.org\/10.1093\/llc\/fqaa066. https:\/\/academic.oup.com\/dsh\/article-pdf\/36\/Supplement_2\/ii30\/41091229\/fqaa066.pdf","DOI":"10.1093\/llc\/fqaa066"},{"key":"9757_CR14","doi-asserted-by":"crossref","unstructured":"Carvalho, M. S. d., & Cabecinhas, R. (2013). The orthographic (dis)agreement and the portuguese identity threat. Lusofonia and Its Futures, 82\u201395","DOI":"10.62791\/xg98b252"},{"key":"9757_CR15","unstructured":"Chancelaria de D. Afonso III: documentos em portugu\u00eas. https:\/\/hdl.handle.net\/21.11129\/0000-000D-FE7C-B. Accessed: 16-5-2023"},{"key":"9757_CR16","unstructured":"Ciobanu, A. M., Dinu, L. P., \u015eulea, O.-M., Dinu, A., & Niculae, V. (2013). Temporal text classification for Romanian novels set in the past. In: Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013, pp. 136\u2013140. INCOMA Ltd. Shoumen, BULGARIA, Hissar, Bulgaria. https:\/\/aclanthology.org\/R13-1018"},{"key":"9757_CR17","unstructured":"Claridge, C. (2008). Historical corpora. In: L\u00fcdeling, A., Kyt\u00f6, M. (eds.) Corpus Linguistics : an International Handbook ; Volume 1"},{"key":"9757_CR18","unstructured":"Comunidade dos Pa\u00edses de L\u00edngua Portuguesa (CPLP). https:\/\/www.cplp.org\/id-2597.aspx. Accessed: 2022-10-26"},{"key":"9757_CR19","unstructured":"Corpus diacr\u00f3ico y diat\u00f3pico del espa\u00f1ol de Am\u00e9rica. https:\/\/www.cordiam.org\/. 
Accessed: 23-04-2024"},{"key":"9757_CR20","unstructured":"Corpus Eletr\u00f4nico de Documentos Hist\u00f3ricos do Sert\u00e3o. http:\/\/www5.uefs.br\/cedohs\/view\/home.html. Accessed: 16-5-2023"},{"key":"9757_CR21","unstructured":"Corpus Hist\u00f3rico da Linguagem da Medicina em Portugu\u00eas (S\u00e9culo XVIII): Terminologia Diacr\u00f4nica e Humanidades Digitais. https:\/\/sites.google.com\/view\/projeto38597. Accessed: 16-5-2023"},{"key":"9757_CR22","unstructured":"Corpus L\u00e9xico de Inventarios. https:\/\/corlexin.unileon.es\/el-corpus\/. Accessed: 23-04-2024"},{"key":"9757_CR23","doi-asserted-by":"publisher","unstructured":"Couss\u00e9, E. (2011). Een digitaal compilatiecorpus historisch nederlands. Lexikos 20, https:\/\/doi.org\/10.5788\/20-0-136","DOI":"10.5788\/20-0-136"},{"key":"9757_CR24","unstructured":"CTACorpus. http:\/\/teitok.clul.ul.pt\/cta\/. Accessed: 13-12-2022"},{"key":"9757_CR25","unstructured":"Culpeper, J., & Kyt\u00f6, M. (1997). Towards a corpus of dialogues, 1550-1750. In: Language in Time and Space: Studies in Honour of Wolfgang Viereck on the Occasion of His 60th Birthday. Zeitschrift f\u00fcr Dialektologie und Linguistik. Beihefte, vol. 97, pp. 60\u201373. Franz Steiner, Stuttgart."},{"issue":"1","key":"9757_CR26","doi-asserted-by":"publisher","first-page":"77","DOI":"10.1177\/00754240022004884","volume":"28","author":"A Curzan","year":"2000","unstructured":"Curzan, A. (2000). English historical corpora in the classroom: The intersection of teaching and research. Journal of English Linguistics, 28(1), 77\u201389. https:\/\/doi.org\/10.1177\/00754240022004884","journal-title":"Journal of English Linguistics"},{"key":"9757_CR27","unstructured":"Davies, M. (2006). Corpus do Portugu\u00eas: 45 Million Words, 1300s - 1900s. http:\/\/www.corpusdoportugues.org"},{"key":"9757_CR28","doi-asserted-by":"publisher","first-page":"121","DOI":"10.3366\/cor.2012.0024","volume":"7","author":"M Davies","year":"2012","unstructured":"Davies, M. 
(2012). Expanding horizons in historical linguistics with the 400-million word corpus of historical american english. Corpora, 7, 121\u2013157. https:\/\/doi.org\/10.3366\/cor.2012.0024","journal-title":"Corpora"},{"key":"9757_CR29","unstructured":"Delpher. https:\/\/www.delpher.nl\/over-delpher\/delpher-open-krantenarchief\/wat-zit-er-in-het-delpher-open-krantenarchief#e6bce. Accessed: 23-04-2024"},{"key":"9757_CR30","unstructured":"Digital Library of Dutch Literature. https:\/\/www.kb.nl\/en\/research-find\/datasets\/dbnl-dataset. Accessed: 23-04-2024"},{"key":"9757_CR31","unstructured":"Early English Books Online Corpus. https:\/\/www.english-corpora.org\/eebo\/. Accessed: 23-04-2024"},{"key":"9757_CR32","unstructured":"Early English Correspondence Corpus. https:\/\/www.helsinki.fi\/en\/researchgroups\/variation-contacts-and-change-in-english\/research\/corpus-of-early-english-correspondence. Accessed: 23-04-2024"},{"key":"9757_CR33","unstructured":"Early English Medical Writing. https:\/\/varieng.helsinki.fi\/series\/volumes\/14\/taavitsainen_pahta\/. Accessed: 23-04-2024"},{"key":"9757_CR34","unstructured":"Eberhard, D. M., Simons, G. F., & Fennig, C. D. (2023). Ethnologue: Languages of the World, 26 edn. SIL International, Dallas. http:\/\/www.ethnologue.com"},{"key":"9757_CR35","unstructured":"Eighteenth Century Collections Online. https:\/\/www.gale.com\/primary-sources\/eighteenth-century-collections-online. Accessed: 23-04-2024"},{"key":"9757_CR36","unstructured":"Evans Early American Imprints Collection. https:\/\/textcreationpartnership.org\/tcp-texts\/evans-tcp-evans-early-american-imprints\/. Accessed: 23-04-2024"},{"key":"9757_CR37","unstructured":"Evert, S. (2008). A lightweight and efficient tool for cleaning web pages. In: Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC\u201908). European Language Resources Association (ELRA), Marrakech, Morocco. 
http:\/\/www.lrec-conf.org\/proceedings\/lrec2008\/pdf\/885_paper.pdf"},{"key":"9757_CR38","doi-asserted-by":"publisher","unstructured":"Falc\u00e3o, M., Dias, M., & Lopes, C. T. (2022). Manual Transcriptions of Typewritten Digital Representations of Portuguese Cultural Heritage Documents from the 20th Century. INESC TEC. https:\/\/doi.org\/10.25747\/WPNA-JE39","DOI":"10.25747\/WPNA-JE39"},{"key":"9757_CR39","unstructured":"Feij\u00f3, J. M. M. (1739). Orthographia Ou Arte de Escrever e Pronunciar Com Acerto a L\u00edngua Portugueza. Biblioteca Nacional, [L-5049-A] purl.pt biblioteca nacional digital, Portugal."},{"key":"9757_CR40","unstructured":"Finatto, M. J., Quaresma, P., & Gon\u00e7alves, M. F. (2018). Portuguese corpora of the 18th century: old medicine texts for teaching and research. Proceedings of the Conference on Language Technologies & Digital Humanities"},{"key":"9757_CR41","unstructured":"Finatto, M. J., Quaresma, P., & Gon\u00e7alves, M. F. (2018). Portuguese corpora of the 18th century: old medicine texts for teaching and research. Proceedings of the Conference on Language Technologies & Digital Humanities, 114\u2013120. 10174\/23606"},{"issue":"9","key":"9757_CR42","doi-asserted-by":"publisher","first-page":"32","DOI":"10.1515\/ling.1964.2.9.32","volume":"2","author":"JA Fishman","year":"1964","unstructured":"Fishman, J. A. (1964). Language maintenance and language shift as a field of inquiry. a definition of the field and suggestions for its further development. Linguistics, 2(9), 32\u201370. https:\/\/doi.org\/10.1515\/ling.1964.2.9.32","journal-title":"Linguistics"},{"key":"9757_CR43","unstructured":"Gabay, S., Ortiz\u00a0Suarez, P., Bartz, A., Chagu\u00e9, A., Bawden, R., Gambette, P., Sagot, B. (2022). From FreEM to d\u2019AlemBERT: a large corpus and a language model for early Modern French. In: Proceedings of the Thirteenth Language Resources and Evaluation Conference, pp. 3367\u20133374. 
European Language Resources Association, Marseille, France. https:\/\/aclanthology.org\/2022.lrec-1.359"},{"key":"9757_CR44","doi-asserted-by":"publisher","first-page":"49","DOI":"10.1075\/lv.00004.gal","volume":"18","author":"C Galves","year":"2018","unstructured":"Galves, C. (2018). The tycho brahe corpus of historical portuguese: Methodology and results. Linguistic Variation, 18, 49\u201373. https:\/\/doi.org\/10.1075\/lv.00004.gal","journal-title":"Linguistic Variation"},{"key":"9757_CR45","doi-asserted-by":"crossref","unstructured":"G\u00e9n\u00e9reux, M., Hendrickx, I., & Mendes, A. (2012). A large portuguese corpus on-line: Cleaning and preprocessing. Computational Processing of the Portuguese Language. PROPOR, 113\u2013120","DOI":"10.1007\/978-3-642-28885-2_13"},{"key":"9757_CR46","unstructured":"Geyken, A., Haaf, S., Jurish, B., Schulz, M., Steinmann, J., Thomas, C., & Wiegand, F. (2010). Das deutsche textarchiv: Vom historischen korpus zum aktiven archiv. In: Digitale Wissenschaft. Stand und Entwicklung Digital Vernetzter Forschung in Deutschland, pp. 157\u2013161"},{"key":"9757_CR47","unstructured":"GMHP. http:\/\/www.usp.br\/gmhp\/CorpI.html. Accessed: 14-06-2022"},{"key":"9757_CR48","unstructured":"Gomes, M., Guilherme, A., Tavares, L., & Marquilhas, R. (2012). Project FLY: a multidisciplinary project within linguistics. In: Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC\u201912), pp. 2833\u20132837. European Language Resources Association (ELRA), Istanbul, Turkey. http:\/\/www.lrec-conf.org\/proceedings\/lrec2012\/pdf\/1031_Paper.pdf"},{"key":"9757_CR49","unstructured":"Grilo, S., Bolrinha, M., Silva, J., Vaz, R., & Branco, A. (2020). The bdcam\u00f5es collection of portuguese literary documents: a research resource for language technology and digital humanities. 
Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), 849\u2013854."},{"key":"9757_CR50","doi-asserted-by":"crossref","unstructured":"Haji\u010d, J., Ciaramita, M., Johansson, R., Kawahara, D., Mart\u00ed, M.A., M\u00e0rquez, L., Meyers, A., Nivre, J., Pad\u00f3, S., \u0160t\u011bp\u00e1nek, J., Stra\u0148\u00e1k, P., Surdeanu, M., Xue, N., & Zhang, Y. (2009). The CoNLL-2009 shared task: Syntactic and semantic dependencies in multiple languages. In: Haji\u010d, J. (ed.) Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL 2009): Shared Task, pp. 1\u201318. Association for Computational Linguistics, Boulder, Colorado. https:\/\/aclanthology.org\/W09-1201","DOI":"10.3115\/1596409.1596411"},{"key":"9757_CR51","doi-asserted-by":"publisher","unstructured":"Hamilton, W. L., Leskovec, J., Jurafsky, D. (2016). Diachronic word embeddings reveal statistical laws of semantic change. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1489\u20131501. Association for Computational Linguistics, Berlin, Germany. https:\/\/doi.org\/10.18653\/v1\/P16-1141. https:\/\/aclanthology.org\/P16-1141","DOI":"10.18653\/v1\/P16-1141"},{"key":"9757_CR52","doi-asserted-by":"publisher","unstructured":"Kepler, F. (2005). Um etiquetador morfo-sint\u00e1tico baseado em cadeias de Markov de tamanho vari\u00e1vel. https:\/\/doi.org\/10.11606\/D.45.2005.tde-20210729-141428","DOI":"10.11606\/D.45.2005.tde-20210729-141428"},{"key":"9757_CR53","doi-asserted-by":"publisher","unstructured":"Kerswill, P. (2006). Migration and Language, vol. 3, pp. 2271\u20132285. De Gruyter Mouton, Berlin, New York. https:\/\/doi.org\/10.1515\/9783110184181.3.10.2271","DOI":"10.1515\/9783110184181.3.10.2271"},{"key":"9757_CR54","doi-asserted-by":"publisher","unstructured":"Kissos, I., & Dershowitz, N. (2016). 
OCR error correction using character correction and feature-based word classification. 12th IAPR Workshop on Document Analysis Systems (DAS), 198\u2013203, https:\/\/doi.org\/10.1109\/DAS.2016.44","DOI":"10.1109\/DAS.2016.44"},{"key":"9757_CR55","unstructured":"Klein, T., & Dipper, S. (2016). Handbuch zum referenzkorpus mittelhochdeutsch. In: Bochumer Linguistische Arbeitsberichte, vol. 19"},{"key":"9757_CR56","unstructured":"Kroch, A. (2020). Penn Parsed Corpora of Historical English LDC2020T16. Web download. Philadelphia: Linguistic Data Consortium."},{"key":"9757_CR57","unstructured":"Kutuzov, A., \u00d8vrelid, L., Szymanski, T., & Velldal, E. (2018). Diachronic word embeddings and semantic shifts: a survey. In: Proceedings of the 27th International Conference on Computational Linguistics, pp. 1384\u20131397. Association for Computational Linguistics, Santa Fe, New Mexico, USA. https:\/\/aclanthology.org\/C18-1117"},{"key":"9757_CR58","doi-asserted-by":"publisher","unstructured":"Kyt\u00f6, M. (2010). Corpora and historical linguistics. Revista Brasileira de Lingu\u00edstica Aplicada 11, 417\u2013457. https:\/\/doi.org\/10.1590\/S1984-63982011000200007","DOI":"10.1590\/S1984-63982011000200007"},{"key":"9757_CR59","unstructured":"Lampeter Corpus of Early Modern English Tracts. http:\/\/korpus.uib.no\/icame\/manuals\/LAMPETER\/LAMPHOME.HTM. Accessed: 23-04-2024"},{"issue":"3","key":"9757_CR60","doi-asserted-by":"publisher","first-page":"279","DOI":"10.1108\/00220410410534185","volume":"60","author":"Z Liu","year":"2004","unstructured":"Liu, Z. (2004). The evolution of documents and its impacts. Journal of Documentation, 60(3), 279\u2013288. https:\/\/doi.org\/10.1108\/00220410410534185. Accessed 2023-12-08.","journal-title":"Journal of Documentation"},{"key":"9757_CR61","doi-asserted-by":"crossref","unstructured":"Manjavacas\u00a0Arevalo, E., & Fonteyn, L. (2022). Non-parametric word sense disambiguation for historical languages. 
In: Proceedings of the 2nd International Workshop on Natural Language Processing for Digital Humanities, pp. 123\u2013134. Association for Computational Linguistics, Taipei, Taiwan. https:\/\/aclanthology.org\/2022.nlp4dh-1.16.","DOI":"10.18653\/v1\/2022.nlp4dh-1.16"},{"key":"9757_CR62","doi-asserted-by":"publisher","unstructured":"Manjavacas, E., & Fonteyn, L. (2022). Adapting vs. Pre-training Language Models for Historical Languages. Journal of Data Mining & Digital Humanities NLP4DH, https:\/\/doi.org\/10.46298\/jdmdh.9152","DOI":"10.46298\/jdmdh.9152"},{"key":"9757_CR63","unstructured":"Marot, .-. Cl\u00e9ment, Bergerac, .-., Bergerac, .-., Moli\u00e8re, .-., Br\u00e9court, .-. Guillaume Marcoureau\u00a0de, Jouin, .-. Nicolas, Coustelier, d.. Antoine\u00a0Urbain: Paris speech in the past. Oxford Text Archive (2001). http:\/\/hdl.handle.net\/20.500.12024\/2423"},{"issue":"1","key":"9757_CR64","doi-asserted-by":"publisher","first-page":"101","DOI":"10.1515\/probus-2015-0002","volume":"27","author":"A Nevins","year":"2015","unstructured":"Nevins, A., Rodrigues, C., & Tang, K. (2015). The rise and fall of the l-shaped morphome: diachronic and experimental studies. Probus, 27(1), 101\u2013155. https:\/\/doi.org\/10.1515\/probus-2015-0002","journal-title":"Probus"},{"key":"9757_CR65","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3453476","volume":"54","author":"TTH Nguyen","year":"2022","unstructured":"Nguyen, T. T. H., Jatowt, A., Coustaty, M., & Doucet, A. (2022). Survey of post-ocr processing approaches. ACM Computing Surveys, 54, 1\u201337. https:\/\/doi.org\/10.1145\/3453476","journal-title":"ACM Computing Surveys"},{"key":"9757_CR66","doi-asserted-by":"publisher","unstructured":"Niculae, V., Zampieri, M., Dinu, L., & Ciobanu, A. M. (2014). Temporal text ranking and automatic dating of texts. In: Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, Volume 2: Short Papers, pp. 17\u201321. 
Association for Computational Linguistics, Gothenburg, Sweden. https:\/\/doi.org\/10.3115\/v1\/E14-4004. https:\/\/aclanthology.org\/E14-4004","DOI":"10.3115\/v1\/E14-4004"},{"key":"9757_CR67","doi-asserted-by":"publisher","first-page":"77","DOI":"10.1086\/686075","volume":"107","author":"B Ogilvie","year":"2016","unstructured":"Ogilvie, B. (2016). Scientific archives in the age of digitization. ISIS, 107, 77\u201385. https:\/\/doi.org\/10.1086\/686075","journal-title":"ISIS"},{"key":"9757_CR68","unstructured":"Old Bailey Corpus. https:\/\/www.oldbaileyonline.org\/. Accessed: 23-04-2024"},{"key":"9757_CR69","doi-asserted-by":"crossref","unstructured":"Pereira, S. (2015). A anota\u00e7\u00e3o sint\u00e1tica de textos medievais portugueses. In: Scriptum Digital, vol. 4, pp. 125\u2013142","DOI":"10.5565\/rev\/scriptum.59"},{"key":"9757_CR70","doi-asserted-by":"crossref","unstructured":"Pettersson, E., & Megyesi, B. (2018). The histcorp collection of historical corpora and resources. In: Digital Humanities in the Nordic Countries Conference. https:\/\/api.semanticscholar.org\/CorpusID:19243754","DOI":"10.5617\/dhnbpub.11045"},{"key":"9757_CR71","doi-asserted-by":"publisher","unstructured":"Philips, J., & Tabrizi, N. (2020). Historical document processing: A survey of techniques, tools, and trends. Proceedings of the 12th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management 1, 341\u2013349, https:\/\/doi.org\/10.5220\/0010177403410349","DOI":"10.5220\/0010177403410349"},{"key":"9757_CR72","unstructured":"Pichel\u00a0Campos, J. R., Gamallo, P., & Alegria, I. (2018). Measuring language distance among historical varieties using perplexity. application to European Portuguese. In: Zampieri, M., Nakov, P., Ljube\u0161i\u0107, N., Tiedemann, J., Malmasi, S., Ali, A. (eds.) Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018), pp. 145\u2013155. 
Association for Computational Linguistics, Santa Fe, New Mexico, USA. https:\/\/aclanthology.org\/W18-3916"},{"key":"9757_CR73","doi-asserted-by":"publisher","unstructured":"Popescu, O., Strapparava, C. (2015). SemEval 2015, task 7: Diachronic text evaluation. In: Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), pp. 870\u2013878. Association for Computational Linguistics, Denver, Colorado. https:\/\/doi.org\/10.18653\/v1\/S15-2147. https:\/\/aclanthology.org\/S15-2147","DOI":"10.18653\/v1\/S15-2147"},{"key":"9757_CR74","unstructured":"Project Gutenberg. https:\/\/www.gutenberg.org\/browse\/languages\/pt. Accessed: 15-05-2023"},{"key":"9757_CR75","unstructured":"Reis Gon\u00e7alves\u00a0Viana, A. (1904). Ortografia Nacional. Simplifica\u00e7\u00e3o e Uniformiza\u00e7\u00e3o Sistem\u00e1tica Das Ortografias Portuguesas. Lisboa Viuva Tavares Cardoso, Portugal."},{"key":"9757_CR76","unstructured":"Ricardo, M. M. C. (2009). Breve hist\u00f3ria do acordo ortogr\u00e1fico. Revista Lus\u00f3fona de Educa\u00e7\u00e3o 13"},{"key":"9757_CR77","doi-asserted-by":"publisher","unstructured":"Rigaud, C., Doucet, A., Coustaty, M., & Moreux, J.-P. (2019). Icdar 2019 competition on post-ocr text correction. International Conference on Document Analysis and Recognition (ICDAR), 1588\u20131593 https:\/\/doi.org\/10.1109\/ICDAR.2019.00255","DOI":"10.1109\/ICDAR.2019.00255"},{"key":"9757_CR78","doi-asserted-by":"publisher","unstructured":"Rijhwani, S., Anastasopoulos, A., & Neubig, G. (2020). Ocr post correction for endangered language texts. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 5931\u20135942 https:\/\/doi.org\/10.18653\/v1\/2020.emnlp-main.478","DOI":"10.18653\/v1\/2020.emnlp-main.478"},{"key":"9757_CR79","doi-asserted-by":"publisher","unstructured":"Sahlgren, M., & Lenci, A. (2016). The effects of data size and frequency range on distributional semantic models. 
In: Su, J., Duh, K., Carreras, X. (eds.) Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pp. 975\u2013980. Association for Computational Linguistics, Austin, Texas. https:\/\/doi.org\/10.18653\/v1\/D16-1099. https:\/\/aclanthology.org\/D16-1099","DOI":"10.18653\/v1\/D16-1099"},{"issue":"4","key":"9757_CR80","doi-asserted-by":"publisher","first-page":"1327","DOI":"10.1007\/s10579-013-9239-y","volume":"47","author":"F S\u00e1nchez-Mart\u00ednez","year":"2013","unstructured":"S\u00e1nchez-Mart\u00ednez, F., Mart\u00ednez-Sempere, I., Ivars-Ribes, X., & Carrasco, R. C. (2013). An open diachronic corpus of historical spanish. Language Resources and Evaluation, 47(4), 1327\u20131342.","journal-title":"Language Resources and Evaluation"},{"key":"9757_CR81","doi-asserted-by":"crossref","unstructured":"S\u00e1nchez-Prieto\u00a0Borja, P. Desarrollo y explotaci\u00f3n del \"corpus de documentos espa\u00f1oles anteriores a 1700\u201d (codea). Scriptum digital. Revista de corpus diacr\u00f2nics i edici\u00f3 digital en Lleng\u00fces iberorom\u00e0niques (1), 5\u201335 (1)","DOI":"10.5565\/rev\/scriptum.31"},{"key":"9757_CR82","doi-asserted-by":"publisher","unstructured":"Santos, D. (2021). Portuguese novel collection (eltec-por). https:\/\/doi.org\/10.5281\/zenodo.4288235","DOI":"10.5281\/zenodo.4288235"},{"key":"9757_CR83","doi-asserted-by":"publisher","first-page":"57","DOI":"10.5617\/osla.1462","volume":"7","author":"D Santos","year":"2015","unstructured":"Santos, D., & Mota, C. (2015). A admira\u00e7\u00e3o \u00e0 luz dos corpos. Oslo Studies in Language, 7, 57\u201377. https:\/\/doi.org\/10.5617\/osla.1462","journal-title":"Oslo Studies in Language"},{"key":"9757_CR84","unstructured":"Scheible, S., Whitt, R. J., Durrell, M., & Bennett, P. (2011). A gold standard corpus of early Modern German. In: Proceedings of the 5th Linguistic Annotation Workshop, pp. 124\u2013128. Association for Computational Linguistics, Portland, Oregon, USA. 
https:\/\/aclanthology.org\/W11-0415"},{"issue":"1","key":"9757_CR85","doi-asserted-by":"publisher","first-page":"163","DOI":"10.1146\/annurev.an.15.100186.001115","volume":"15","author":"BB Schieffelin","year":"1986","unstructured":"Schieffelin, B. B., & Ochs, E. (1986). Language socialization. Annual Review of Anthropology, 15(1), 163\u2013191. https:\/\/doi.org\/10.1146\/annurev.an.15.100186.001115","journal-title":"Annual Review of Anthropology"},{"key":"9757_CR86","unstructured":"Schmid, H. (1994). Probabilistic part-of-speech tagging using decision trees. In: Proceedings of the International Conference on New Methods in Language Processing, Manchester, UK."},{"key":"9757_CR87","doi-asserted-by":"crossref","unstructured":"Tahmasebi, N., & Risse, T. (2017). On the uses of word sense change for research in the digital humanities. In J. Kamps, G. Tsakonas, Y. Manolopoulos, L. Iliadis, & I. Karydis (Eds.), Research and Advanced Technology for Digital Libraries (pp. 246\u2013257). Cham: Springer.","DOI":"10.1007\/978-3-319-67008-9_20"},{"key":"9757_CR88","unstructured":"Tahmasebi, N., Borin, L., & Jatowt, A. (2019). Survey of computational approaches to lexical semantic change. arXiv preprint arXiv:1811.06278"},{"issue":"5","key":"9757_CR89","doi-asserted-by":"publisher","first-page":"649","DOI":"10.1017\/S1351324918000220","volume":"24","author":"X Tang","year":"2018","unstructured":"Tang, X. (2018). A state-of-the-art of semantic change computation. Natural Language Engineering, 24(5), 649\u2013676. https:\/\/doi.org\/10.1017\/S1351324918000220","journal-title":"Natural Language Engineering"},{"key":"9757_CR90","unstructured":"Teyssier, P. (2001). Hist\u00f3ria da L\u00edngua Portuguesa, pp. 31\u201335. S\u00e1 da Costa, Portugal."},{"key":"9757_CR91","unstructured":"The Corpus of Late Modern English Texts (Extended Version). 2006. Compiled by Hendrik De Smet. Department of Linguistics, University of Leuven. 
https:\/\/varieng.helsinki.fi\/CoRD\/corpora\/CLMETEV\/. Accessed: 23-04-2024"},{"key":"9757_CR92","doi-asserted-by":"publisher","unstructured":"Tian, Z., Jarrett, D., Escalona\u00a0Torres, J., & Amaral, P. (2021). BAHP: Benchmark of assessing word embeddings in historical Portuguese. In: Proceedings of the 5th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, pp. 113\u2013119. Association for Computational Linguistics, Punta Cana, Dominican Republic (online). https:\/\/doi.org\/10.18653\/v1\/2021.latechclfl-1.13. https:\/\/aclanthology.org\/2021.latechclfl-1.13","DOI":"10.18653\/v1\/2021.latechclfl-1.13"},{"key":"9757_CR93","unstructured":"Time Corpus. https:\/\/www.english-corpora.org\/time\/. Accessed: 23-04-2024"},{"key":"9757_CR94","unstructured":"Vaamonde, G., Costa, A. L., Marquilhas, R., Pinto, C., & Pratas, F. (2014). Post scriptum: Archivo digital de escritura cotidiana. Humanidades Digitales: desaf\u00edos, logros y perspectivas de futuro, 473\u2013482."},{"key":"9757_CR95","unstructured":"VERCIAL. https:\/\/www.linguateca.pt\/acesso\/corpus.php?corpus=VERCIAL. Accessed: 14-06-2022"},{"key":"9757_CR96","unstructured":"Wagner\u00a0Filho, J. A., Wilkens, R., Idiart, M., & Villavicencio, A. (2018). The brWaC corpus: A new open resource for Brazilian Portuguese. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). European Language Resources Association (ELRA), Miyazaki, Japan. https:\/\/aclanthology.org\/L18-1686"},{"key":"9757_CR97","doi-asserted-by":"publisher","unstructured":"Xavier, M. F. (2016). In: Kabatek, J. (ed.) O CIPM - Corpus Informatizado do Portugu\u00eas Medieval, fonte de um Dicion\u00e1rio exaustivo, pp. 137\u2013156. De Gruyter, Berlin, Boston. https:\/\/doi.org\/10.1515\/9783110462357-007.","DOI":"10.1515\/9783110462357-007"},{"key":"9757_CR98","unstructured":"Zampieri, M. (2017). 
Compiling and processing historical and contemporary portuguese corpora. CoRR arXiv:1710.00803"},{"key":"9757_CR99","unstructured":"Zampieri, M., & Becker, M. (2013). Colonia: Corpus of historical portuguese. Non-Standard Data Sources in Corpus-based Research. ZSM-Studien Series - Vol. 5."},{"key":"9757_CR100","unstructured":"Zampieri, M., Malmasi, S., & Dras, M. (2016). Modeling language change in historical corpora: The case of Portuguese. In: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC\u201916), pp. 4098\u20134104. European Language Resources Association (ELRA), Portoro\u017e, Slovenia. https:\/\/aclanthology.org\/L16-1647"},{"key":"9757_CR101","unstructured":"Zurich English Newspaper Corpus. https:\/\/www.es.uzh.ch\/en\/Subsites\/Projects\/zencorpus.html. Accessed: 23-04-2024"}],"container-title":["Language Resources and Evaluation"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10579-024-09757-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10579-024-09757-5\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10579-024-09757-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,5,18]],"date-time":"2025-05-18T15:04:14Z","timestamp":1747580654000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10579-024-09757-5"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,7,18]]},"references-count":101,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2025,6]]}},"alternative-id":["9757"],"URL":"https:\/\/doi.org\/10.1007\/s10579-024-09757-5","relation":{},"ISSN":["1574-020X","1574-0218"],"issn-type":
[{"type":"print","value":"1574-020X"},{"type":"electronic","value":"1574-0218"}],"subject":[],"published":{"date-parts":[[2024,7,18]]},"assertion":[{"value":"10 June 2024","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"18 July 2024","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no Conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}},{"value":"Not applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethical approval"}},{"value":"Not applicable.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent to participate"}},{"value":"Not applicable.","order":5,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent to publication"}}]}}