{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,23]],"date-time":"2026-03-23T23:34:02Z","timestamp":1774308842182,"version":"3.50.1"},"reference-count":83,"publisher":"Springer Science and Business Media LLC","issue":"3","license":[{"start":{"date-parts":[[2024,12,28]],"date-time":"2024-12-28T00:00:00Z","timestamp":1735344000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,12,28]],"date-time":"2024-12-28T00:00:00Z","timestamp":1735344000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Lang Resources &amp; Evaluation"],"published-print":{"date-parts":[[2025,9]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>The paper presents the results of the ParlaMint II project, which comprise comparable corpora of parliamentary debates of 29 European countries and autonomous regions, covering at least the period from 2015 to 2022, and containing over 1 billion words. The corpora are uniformly encoded, contain rich metadata about their 24 thousand speakers, and are linguistically annotated up to the level of Universal Dependencies syntax and named entities. The paper focuses on the enhancement made since the ParlaMint I project and presents the compilation of the corpora, including the encoding infrastructure, use of GitHub, the production of individual corpora, the common pipeline for producing their distribution, and use of CLARIN services for dissemination. 
It then gives a quantitative overview of the produced corpora, followed by the qualitative additions made within the ParlaMint II project, namely metadata localisation, the addition of new metadata, such as the political orientation of political parties, the machine translation of the corpora to English and its tagging with semantic classes, and the production of pilot speech corpora. Finally, outreach activities and further work are discussed.<\/jats:p>","DOI":"10.1007\/s10579-024-09798-w","type":"journal-article","created":{"date-parts":[[2024,12,27]],"date-time":"2024-12-27T20:27:32Z","timestamp":1735331252000},"page":"2071-2102","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":9,"title":["ParlaMint II: advancing comparable parliamentary corpora across Europe"],"prefix":"10.1007","volume":"59","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1560-4099","authenticated-orcid":false,"given":"Toma\u017e","family":"Erjavec","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7953-8783","authenticated-orcid":false,"given":"Maty\u00e1\u0161","family":"Kopp","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7169-9152","authenticated-orcid":false,"given":"Nikola","family":"Ljube\u0161i\u0107","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7436-9896","authenticated-orcid":false,"given":"Taja","family":"Kuzman","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1257-2191","authenticated-orcid":false,"given":"Paul","family":"Rayson","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4484-5027","authenticated-orcid":false,"given":"Petya","family":"Osenova","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3467-9424","authenticated-orcid":false,"given":"Maciej","family":"Ogrodniczuk","sequence":"additional","affiliation
":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1031-6327","authenticated-orcid":false,"given":"\u00c7a\u011fr\u0131","family":"\u00c7\u00f6ltekin","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2916-4856","authenticated-orcid":false,"given":"Danijel","family":"Kor\u017einek","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0464-9240","authenticated-orcid":false,"given":"Katja","family":"Meden","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2678-7695","authenticated-orcid":false,"given":"Jure","family":"Skubic","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0009-0000-9700-3686","authenticated-orcid":false,"given":"Peter","family":"Rupnik","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3063-2239","authenticated-orcid":false,"given":"Tommaso","family":"Agnoloni","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4345-2344","authenticated-orcid":false,"given":"Jos\u00e9","family":"Aires","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0009-0004-2739-1475","authenticated-orcid":false,"given":"Starka\u00f0ur","family":"Barkarson","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6829-6309","authenticated-orcid":false,"given":"Roberto","family":"Bartolini","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9346-7803","authenticated-orcid":false,"given":"N\u00faria","family":"Bel","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0830-4837","authenticated-orcid":false,"given":"Mar\u00eda","family":"Calzada 
P\u00e9rez","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9375-6410","authenticated-orcid":false,"given":"Roberts","family":"Dar\u0123is","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1678-7069","authenticated-orcid":false,"given":"Sascha","family":"Diwersy","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3895-4796","authenticated-orcid":false,"given":"Maria","family":"Gavriilidou","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9204-9220","authenticated-orcid":false,"given":"Ruben","family":"van Heusden","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6121-3902","authenticated-orcid":false,"given":"Mikel","family":"Iruskieta","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0511-5854","authenticated-orcid":false,"given":"Neeme","family":"Kahusk","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6414-7456","authenticated-orcid":false,"given":"Anna","family":"Kryvenko","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0851-7621","authenticated-orcid":false,"given":"No\u00e9mi","family":"Ligeti-Nagy","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3525-1304","authenticated-orcid":false,"given":"Carmen","family":"Magari\u00f1os","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9701-1771","authenticated-orcid":false,"given":"Martin","family":"M\u00f6lder","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4242-9249","authenticated-orcid":false,"given":"Costanza","family":"Navarretta","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3555-0179","authenticated-orcid":false,"given":"Kiril","family":"Simov","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/
orcid.org\/0009-0004-3166-1837","authenticated-orcid":false,"given":"Lars Magne","family":"Tungland","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4789-5676","authenticated-orcid":false,"given":"Jouni","family":"Tuominen","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3442-6862","authenticated-orcid":false,"given":"John","family":"Vidler","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3910-7820","authenticated-orcid":false,"given":"Adina Ioana","family":"Vladu","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1631-4560","authenticated-orcid":false,"given":"Tanja","family":"Wissik","sequence":"additional","affiliation":[]},{"given":"V\u00e4in\u00f6","family":"Yrj\u00e4n\u00e4inen","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9956-1689","authenticated-orcid":false,"given":"Darja","family":"Fi\u0161er","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,12,28]]},"reference":[{"key":"9798_CR1","unstructured":"Agnoloni, T., Bartolini, R., Frontini, F., Montemagni, S., Marchetti, C., Quochi, V., Venturi, G. (2022). Making Italian parliamentary records machine-actionable: the construction of the ParlaMint-IT corpus, in Proceedings of the workshop ParlaCLARIN III within the 13th language resources and evaluation conference (pp. 117\u2013124). https:\/\/aclanthology.org\/2022.parlaclarin-1.17"},{"key":"9798_CR2","unstructured":"Akbik, A., Bergmann, T., Blythe, D., Rasul, K., Schweter, S., Vollgraf, R. (2019). FLAIR: An easy-to-use framework for state-of-the-art NLP, in NAACL 2019, 2019 annual conference of the North American chapter of the Association for Computational Linguistics (demonstrations) (pp. 
54\u201359)."},{"issue":"4","key":"9798_CR3","doi-asserted-by":"publisher","first-page":"480","DOI":"10.1075\/ijcl.22016.ale","volume":"27","author":"M Alexander","year":"2022","unstructured":"Alexander, M., & Struan, A. (2022). In barbarous times and in uncivilized countries two centuries of the evolving uncivil in the Hansard Corpus. International Journal of Corpus Linguistics, 27(4), 480\u2013505.","journal-title":"International Journal of Corpus Linguistics"},{"key":"9798_CR4","unstructured":"Behzad, S., & Zeldes, A. (2020). A cross-genre ensemble approach to robust Reddit part of speech tagging, in Proceedings of the 12th web as corpus workshop (WAC-XII) (pp. 50\u201356)."},{"key":"9798_CR5","unstructured":"Bl\u00e4tte, A., & Blessing, A. (2018). The GermaParl corpus of parliamentary protocols. N.\u00a0Calzolari et\u00a0al. (Eds.), Proceedings of the eleventh international conference on language resources and evaluation (LREC 2018). Miyazaki, Japan: European Language Resources Association (ELRA). https:\/\/aclanthology.org\/L18-1130"},{"key":"9798_CR6","unstructured":"Branco, A., & Silva, J. (2004). Evaluating solutions for the rapid development of state-of-the-art POS taggers for Portuguese. M.T.\u00a0Lino, M.F.\u00a0Xavier, F.\u00a0Ferreira, R.\u00a0Costa, and R.\u00a0Silva (Eds.), Proceedings of the fourth international conference on language resources and evaluation (LREC\u201904). Lisbon, Portugal: European Language Resources Association (ELRA). http:\/\/www.lrec-conf.org\/proceedings\/lrec2004\/pdf\/572.pdf"},{"key":"9798_CR7","unstructured":"Branco, A., Silva, J.R., Gomes, L., Ant\u00f3nio\u00a0Rodrigues, J. (2022). Universal grammatical dependencies for Portuguese with CINTIL data, LX processing and CLARIN support. N.\u00a0Calzolari et\u00a0al. (Eds.), Proceedings of the thirteenth language resources and evaluation conference (pp. 5617\u20135626). Marseille, France: European Language Resources Association. 
https:\/\/aclanthology.org\/2022.lrec-1.603"},{"key":"9798_CR8","unstructured":"\u00c7\u00f6ltekin, \u00c7. (2010). A freely available morphological analyzer for Turkish, in Proceedings of the 7th international conference on language resources and evaluation (LREC 2010) (pp. 820\u2013827). http:\/\/www.lrec-conf.org\/proceedings\/lrec2010\/summaries\/109.html"},{"key":"9798_CR9","doi-asserted-by":"publisher","DOI":"10.5281\/zenodo.10450640","author":"\u00c7 \u00c7\u00f6ltekin","year":"2024","unstructured":"\u00c7\u00f6ltekin, \u00c7., Kopp, M., Morkevi\u010dius, V., Ljube\u0161i\u0107, N., Meden, K., & Erjavec, T. (2024). Training data for the shared task Ideology and Power Identification in Parliamentary Debates. Zenodo. https:\/\/doi.org\/10.5281\/zenodo.10450640","journal-title":"Zenodo"},{"key":"9798_CR10","doi-asserted-by":"crossref","unstructured":"Coppedge, M., Gerring, J., Glynn, A., Knutsen, C. H., Lindberg, S. I., Pemstein, D., et al. (2020). Varieties of democracy: Measuring two centuries of political change. Cambridge University Press.","DOI":"10.1017\/9781108347860"},{"issue":"2","key":"9798_CR11","first-page":"255","volume":"47","author":"M-C de Marneffe","year":"2021","unstructured":"de Marneffe, M.-C., Manning, C. D., Nivre, J., & Zeman, D. (2021). Universal dependencies. Computational Linguistics, 47(2), 255\u2013308.","journal-title":"Computational Linguistics"},{"key":"9798_CR12","unstructured":"Erjavec, T., Kopp, M., Meden, K. (2024). Experience of remote collaborative work in the ParlaMint project using git, in Proceedings of the TwinTalks Workshop at DH2023 (in print). Graz, Austria: CEUR. https:\/\/ceur-ws.org\/"},{"key":"9798_CR13","unstructured":"Erjavec, T., Kopp, M., Ogrodniczuk, M., Osenova, P., Agerri, R., Agirrezabal, M., ... Fi\u0161er, D. (2024). Linguistically annotated multilingual comparable corpora of parliamentary debates ParlaMint.ana 4.1. Slovenian language resource repository CLARIN.SI. 
http:\/\/hdl.handle.net\/11356\/1911"},{"key":"9798_CR14","unstructured":"Erjavec, T., Kopp, M., Ogrodniczuk, M., Osenova, P., Agirrezabal, M., Agnoloni, T., ... Fi\u0161er, D. (2024). Multilingual comparable corpora of parliamentary debates ParlaMint 4.1. Slovenian language resource repository CLARIN.SI. http:\/\/hdl.handle.net\/11356\/1912"},{"key":"9798_CR15","unstructured":"Erjavec, T., Meden, K., Skubic, J. (2023). Adding political orientation metadata to ParlaMint corpora. CLARIN annual conference 2023, book of abstracts. https:\/\/office.clarin.eu\/v\/CE-2023-2328_CLARIN2023_ConferenceProceedings.pdf"},{"issue":"1","key":"9798_CR16","doi-asserted-by":"publisher","first-page":"415","DOI":"10.1007\/s10579-021-09574-0","volume":"57","author":"T Erjavec","year":"2023","unstructured":"Erjavec, T., Ogrodniczuk, M., Osenova, P., Ljube\u0161i\u0107, N., Simov, K., Pan\u010dur, A., ... & Fi\u0161er, D. (2023). The ParlaMint corpora of parliamentary proceedings. Language Resources and Evaluation, 57(1), 415\u2013448. https:\/\/doi.org\/10.1007\/s10579-021-09574-0","journal-title":"Language Resources and Evaluation"},{"key":"9798_CR17","doi-asserted-by":"publisher","unstructured":"Erjavec, T., & Pan\u010dur, A. (2022). The Parla-CLARIN recommendations for encoding corpora of parliamentary proceedings. Journal of the Text Encoding Initiative (Selected Papers from the 2019 TEI Conference), (14), https:\/\/doi.org\/10.4000\/jtei.4133","DOI":"10.4000\/jtei.4133"},{"key":"9798_CR18","unstructured":"Fi\u0161er, D., & Lenardi\u010d, J. (2018). CLARIN corpora for parliamentary discourse research, in Proceedings of the LREC 2018 workshop ParlaCLARIN: Creating and using parliamentary corpora. European Language Resources Association. http:\/\/lrec-conf.org\/workshops\/lrec2018\/W2\/summaries\/14_W2.html"},{"key":"9798_CR19","doi-asserted-by":"crossref","unstructured":"Gr\u00fcnewald, S., Friedrich, A., Kuhn, J. (2021). 
Applying Occam\u2019s razor to transformer-based dependency parsing: What works, what doesn\u2019t, and what is really necessary. S.\u00a0Oepen, K.\u00a0Sagae, R.\u00a0Tsarfaty, G.\u00a0Bouma, D.\u00a0Seddah, and D.\u00a0Zeman (Eds.), Proceedings of the 17th international conference on parsing technologies and the IWPT 2021 shared task on parsing into enhanced Universal Dependencies (IWPT 2021) (pp. 131\u2013144). Online: Association for Computational Linguistics. https:\/\/aclanthology.org\/2021.iwpt-1.13","DOI":"10.18653\/v1\/2021.iwpt-1.13"},{"key":"9798_CR20","unstructured":"Gu\u00f0j\u00f3nsson, \u00c1.A., Loftsson, H., Da\u00f0ason, J.F. (2021). Icelandic NER API - ensemble model (21.09). http:\/\/hdl.handle.net\/20.500.12537\/159 (CLARIN-IS)"},{"key":"9798_CR21","unstructured":"Hladk\u00e1, B., Kopp, M., Stra\u0148\u00e1k, P. (2020). Compiling Czech parliamentary stenographic protocols into a corpus, in Proceedings of the LREC 2020 workshop on creating, using and linking of parliamentary corpora with other types of political discourse (ParlaCLARIN II) (pp. 18\u201322). Paris, France: European Language Resources Association (ELRA). https:\/\/www.aclweb.org\/anthology\/2020.parlaclarin-1.4"},{"key":"9798_CR22","unstructured":"Honnibal, M., Montani, I., Van\u00a0Landeghem, S., Boyd, A. (2020). spaCy: Industrial-strength natural language processing in Python."},{"key":"9798_CR23","unstructured":"Janssen, M. (2016). TEITOK: Text-faithful annotated corpora. N.\u00a0Calzolari et\u00a0al. (Eds.), Proceedings of the tenth international conference on language resources and evaluation (LREC\u201916) (pp. 4037\u20134043). Portoro\u017e, Slovenia: European Language Resources Association (ELRA). https:\/\/aclanthology.org\/L16-1637"},{"key":"9798_CR24","unstructured":"Janssen, M., & Kopp, M. (2024). ParlaMint in TEITOK. 
D.\u00a0Fi\u0161er, M.\u00a0Eskevich, and D.\u00a0Bordon (Eds.), Proceedings of the IV workshop on creating, analysing, and increasing accessibility of parliamentary corpora (ParlaCLARIN) @ LREC-COLING 2024 (pp. 121\u2013126). Torino, Italia: ELRA and ICCL. https:\/\/aclanthology.org\/2024.parlaclarin-1.18"},{"key":"9798_CR25","unstructured":"Jasonarson, A., Steingr\u00edmsson, S., Sigur\u00f0sson, E.F., Da\u00f0ason, J.F. (2022). COMBO-based UD parser 22.10. http:\/\/hdl.handle.net\/20.500.12537\/272 (CLARIN-IS)"},{"key":"9798_CR26","doi-asserted-by":"crossref","unstructured":"Jolly, S., Bakker, R., Hooghe, L., Marks, G., Polk, J., Rovny, J., & Vachudova, M. A. (2022). Chapel Hill Expert Survey trend file, 1999\u20132019. Electoral Studies, 75, 102420.","DOI":"10.1016\/j.electstud.2021.102420"},{"key":"9798_CR27","doi-asserted-by":"crossref","unstructured":"Jongejan, B., & Dalianis, H. (2009). Automatic training of lemmatization rules that handle morphological changes in pre-, in- and suffixes alike. K-Y.\u00a0Su, J.\u00a0Su, J.\u00a0Wiebe, and H.\u00a0Li (Eds.), Proceedings of the joint conference of the 47th annual meeting of the ACL and the 4th international joint conference on natural language processing of the AFNLP (pp. 145\u2013153). Suntec, Singapore: Association for Computational Linguistics. https:\/\/aclanthology.org\/P09-1017","DOI":"10.3115\/1687878.1687900"},{"key":"9798_CR28","doi-asserted-by":"crossref","unstructured":"Junczys-Dowmunt, M., Grundkiewicz, R., Dwojak, T., Hoang, H., Heafield, K., Neckermann, T., ... Birch, A. (2018). Marian: Fast neural machine translation in C++, in Proceedings of ACL 2018, system demonstrations (pp. 116\u2013121). Melbourne, Australia: Association for Computational Linguistics. 
http:\/\/www.aclweb.org\/anthology\/P18-4020","DOI":"10.18653\/v1\/P18-4020"},{"key":"9798_CR29","doi-asserted-by":"crossref","unstructured":"Kiesel, J., \u00c7\u00f6ltekin, \u00c7., Heinrich, M., Fr\u00f6be, M., Alshomary, M., De\u00a0Longueville, B.. Stein, B. (2024). Overview of Touch\u00e9 2024: Argumentation systems. European conference on information retrieval (pp. 466\u2013473).","DOI":"10.1007\/978-3-031-56069-9_64"},{"key":"9798_CR30","doi-asserted-by":"crossref","unstructured":"Kilgarriff, A., Baisa, V., Bu\u0161ta, J., Jakub\u00ed\u010dek, M., Kov\u00e1\u0159, V., Michelfeit, J., & Suchomel, V. (2014). The Sketch Engine: ten years on. Lexicography, 1, 7\u201336.","DOI":"10.1007\/s40607-014-0009-9"},{"key":"9798_CR31","doi-asserted-by":"crossref","unstructured":"Kondratyuk, D., & Straka, M. (2019). 75 languages, 1 model: Parsing Universal Dependencies universally, in Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP) (pp. 2779\u20132795). Hong Kong, China: Association for Computational Linguistics. https:\/\/www.aclweb.org\/anthology\/D19-1279","DOI":"10.18653\/v1\/D19-1279"},{"key":"9798_CR32","unstructured":"Kopp, M. (2022). ParCzech pipeline. https:\/\/github.com\/ufal\/ParCzech."},{"key":"9798_CR33","unstructured":"Kopp, M. (2024a). AudioPSP 24.01: Audio recordings of proceedings of the chamber of deputies of the parliament of the Czech Republic. LINDAT\/CLARIAH-CZ digital library. http:\/\/hdl.handle.net\/11234\/1-5404"},{"key":"9798_CR34","unstructured":"Kopp, M. (2024b). ParCzech 4.0. LINDAT\/CLARIAH-CZ digital library. http:\/\/hdl.handle.net\/11234\/1-5360"},{"key":"9798_CR35","unstructured":"Kopp, M., & Ljube\u0161i\u0107, N. (2024). Parliamentary spoken corpus of Czech ParlaSpeech-CZ 1.0. Slovenian language resource repository CLARIN.SI. 
http:\/\/hdl.handle.net\/11356\/1785"},{"key":"9798_CR36","doi-asserted-by":"crossref","unstructured":"Kopp, M., Stankov, V., Kr\u016fza, J., Stra\u0148\u00e1k, P., Bojar, O. (2021). ParCzech 3.0: A large Czech speech corpus with rich metadata. K.\u00a0Ek\u0161tein, F.\u00a0P\u00e1rtl, and M.\u00a0Konop\u00edk (Eds.), Text, speech, and dialogue (pp. 293\u2013304). Cham, Switzerland: Springer. https:\/\/doi.org\/10.1007\/978-3-030-83527-9_25","DOI":"10.1007\/978-3-030-83527-9_25"},{"key":"9798_CR37","unstructured":"Kor\u017einek, D., & Ljube\u0161i\u0107, N. (2024). Parliamentary spoken corpus of Polish ParlaSpeech-PL 1.0. Slovenian language resource repository CLARIN.SI. http:\/\/hdl.handle.net\/11356\/1686"},{"key":"9798_CR38","unstructured":"Kryvenko, A., Evkoski, B., Bordon, D., Meden, K. (2023). Splitting lips: polarization through parliamentary speech. Poster presented at the Helsinki digital humanities hackathon #DHH23. https:\/\/www.helsinki.fi\/assets\/drupal\/2023-06\/dhh23-parliament-poster.pdf"},{"key":"9798_CR39","unstructured":"Kryvenko, A., & Kopp, M. (2023). Workflow and metadata challenges in the ParlaMint project: Insights from building the ParlaMint-UA corpus. CLARIN annual conference proceedings 2023 (pp. 67\u201370). Leuven, Belgium: CLARIN ERIC. https:\/\/office.clarin.eu\/v\/CE-2023-2328_CLARIN2023_ConferenceProceedings.pdf"},{"key":"9798_CR40","unstructured":"Kryvenko, A., & Pahor\u00a0de Maiti, K. (2023). Combining corpus linguistics and discourse analysis to explore the parliamentary debates across Europe. https:\/\/digihubb.centre.ubbcluj.ro\/workshops\/ (Tutorial given at the European Summer University in Digital Humanities, Babe\u015f-Bolyai University, Cluj-Napoca, Romania)"},{"key":"9798_CR41","unstructured":"Kryvenko, A., Pahor\u00a0de Maiti, K., Osenova, P. (2023). Put Them In to Get Them Out: the ParlaMint Corpora for Digital Humanities and Social Sciences Research. 
https:\/\/dh2023.adho.org\/?page_id=616 (Tutorial given at the Digital Humanities conference 2023, Graz)"},{"key":"9798_CR42","unstructured":"Kuzman, T., Ljube\u0161i\u0107, N., Erjavec, T., Kopp, M., Ogrodniczuk, M., Osenova, P., ... Fi\u0161er, D. (2024). Linguistically annotated multilingual comparable corpora of parliamentary debates in English ParlaMint-en.ana 4.1. Slovenian language resource repository CLARIN.SI. http:\/\/hdl.handle.net\/11356\/1910"},{"key":"9798_CR43","unstructured":"Laur, S., Orasmaa, S., S\u00e4rg, D., Tammo, P. (2020). EstNLTK 1.6: Remastered Estonian NLP pipeline, in Proceedings of the 12th language resources and evaluation conference (pp. 7154\u20137162). Marseille, France: European Language Resources Association. https:\/\/www.aclweb.org\/anthology\/2020.lrec-1.884"},{"key":"9798_CR44","unstructured":"Lenardi\u010d, J., & Fi\u0161er, D. (2023). CLARIN resource families: Parliamentary corpora. https:\/\/www.clarin.eu\/resource-families\/parliamentary-corpora, Accessed 20 Jan 2024"},{"key":"9798_CR45","doi-asserted-by":"crossref","unstructured":"Ljube\u0161i\u0107, N., & Dobrovoljc, K. (2019). What does neural bring? analysing improvements in morphosyntactic annotation and lemmatisation of Slovenian, Croatian and Serbian, in Proceedings of the 7th workshop on Balto-Slavic natural language processing (pp. 29\u201334). Florence, Italy: Association for Computational Linguistics. https:\/\/www.aclweb.org\/anthology\/W19-3704","DOI":"10.18653\/v1\/W19-3704"},{"key":"9798_CR46","unstructured":"Ljube\u0161i\u0107, N., Kor\u017einek, D., Rupnik, P. (2024). Parliamentary spoken corpus of Croatian ParlaSpeech-HR 2.0. Slovenian language resource repository CLARIN.SI. http:\/\/hdl.handle.net\/11356\/1914"},{"key":"9798_CR47","unstructured":"Ljube\u0161i\u0107, N., Kor\u017einek, D., Rupnik, P., Jazbec, I-P. (2022). ParlaSpeech-HR - a freely available ASR dataset for Croatian bootstrapped from the ParlaMint corpus. 
D.\u00a0Fi\u0161er, M.\u00a0Eskevich, J.\u00a0Lenardi\u010d, and F.\u00a0de Jong (Eds.), Proceedings of the workshop ParlaCLARIN III within the 13th language resources and evaluation conference (pp. 111\u2013116). Marseille, France: European Language Resources Association. https:\/\/aclanthology.org\/2022.parlaclarin-1.16"},{"key":"9798_CR48","unstructured":"Ljube\u0161i\u0107, N., Kor\u017einek, D., Rupnik, P., Jazbec, I-P., Batanovi\u0107, V., Baj\u010deti\u0107, L., Evkoski, B. (2022). ASR training dataset for Croatian ParlaSpeech-HR v1.0. http:\/\/hdl.handle.net\/11356\/1494 (Slovenian language resource repository CLARIN.SI)"},{"key":"9798_CR49","doi-asserted-by":"crossref","unstructured":"Ljube\u0161i\u0107, N., Rupnik, P., Kor\u017einek, D. (2024). The ParlaSpeech collection of automatically generated speech and text datasets from parliamentary proceedings. arXiv preprint arXiv:2409.15397","DOI":"10.1007\/978-3-031-77961-9_10"},{"key":"9798_CR50","unstructured":"Ljube\u0161i\u0107, N., Rupnik, P., Kor\u017einek, D. (2024). Parliamentary spoken corpus of Serbian ParlaSpeech-RS 1.0. Slovenian language resource repository CLARIN.SI. http:\/\/hdl.handle.net\/11356\/1834"},{"key":"9798_CR51","unstructured":"Mach\u00e1lek, T. (2020). KonText: Advanced and flexible corpus query interface, in Proceedings of the 12th language resources and evaluation conference (pp. 7003\u20137008). Marseille, France: European Language Resources Association. https:\/\/www.aclweb.org\/anthology\/2020.lrec-1.865"},{"key":"9798_CR52","doi-asserted-by":"crossref","unstructured":"Manning, C.D., Surdeanu, M., Bauer, J., Finkel, J., Bethard, S.J., McClosky, D. (2014). The Stanford CoreNLP natural language processing toolkit. Association for Computational Linguistics (ACL) system demonstrations (pp. 55\u201360). 
http:\/\/www.aclweb.org\/anthology\/P\/P14\/P14-5010","DOI":"10.3115\/v1\/P14-5010"},{"key":"9798_CR53","doi-asserted-by":"crossref","unstructured":"Meden, K., Erjavec, T., Pan\u010dur, A. (2024). Slovenian parliamentary corpus siParl. Language Resources and Evaluation, 1\u201321, https:\/\/doi.org\/10.1007\/s10579-024-09746-8","DOI":"10.1007\/s10579-024-09746-8"},{"key":"9798_CR54","doi-asserted-by":"publisher","unstructured":"Mochtak, M. (2022). SVKCorp: Corpus of debates in the national council of the Slovak Republic. Zenodo. https:\/\/doi.org\/10.5281\/zenodo.7020534","DOI":"10.5281\/zenodo.7020534"},{"key":"9798_CR55","unstructured":"Monarch, R., & Munro, R. (2021). Human-in-the-loop machine learning: Active learning and annotation for human-centered AI. Simon and Schuster."},{"key":"9798_CR56","unstructured":"Nivre, J., Agi\u0107, \u017d., Ahrenberg, L., Aranzabe, M.J., Asahara, M., Atutxa, A.. Zhu, H. (2017). Universal Dependencies 2.0. http:\/\/hdl.handle.net\/11234\/1-1983 (LINDAT\/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (\u00daFAL), Faculty of Mathematics and Physics, Charles University)"},{"key":"9798_CR57","unstructured":"Orosz, G., Sz\u00e1nt\u00f3, Z., Berkecz, P., Szab\u00f3, G., Farkas, R. (2022). HuSpaCy: an industrial-strength Hungarian natural language processing toolkit. XVIII. Magyar Sz\u00e1m\u00edt\u00f3g\u00e9pes Nyelv\u00e9szeti Konferencia (pp. 59\u201373)."},{"key":"9798_CR58","unstructured":"Pan\u010dur, A., Erjavec, T., Meden, K., Ojster\u0161ek, M., \u0160orn, M., Blaj\u00a0Hribar, N. (2022). Slovenian parliamentary corpus (1990-2022) siParl 3.0. Slovenian language resource repository CLARIN.SI. http:\/\/hdl.handle.net\/11356\/1748"},{"key":"9798_CR59","unstructured":"Pan\u010dur, A., Meden, K., Erjavec, T., Ojster\u0161ek, M., \u0160orn, M., Blaj\u00a0Hribar, N. (2024). Slovenian parliamentary corpus (1990-2022) siParl 4.0. Slovenian language resource repository CLARIN.SI. 
http:\/\/hdl.handle.net\/11356\/1936"},{"key":"9798_CR60","unstructured":"Pan\u010dur, A., & Erjavec, T. (2020). The siParl corpus of Slovene parliamentary proceedings. D.\u00a0Fi\u0161er, M.\u00a0Eskevich, and F.\u00a0de Jong (Eds.), Proceedings of the second ParlaCLARIN workshop (pp. 28\u201334). Marseille, France: European Language Resources Association. https:\/\/aclanthology.org\/2020.parlaclarin-1.6"},{"key":"9798_CR61","doi-asserted-by":"crossref","unstructured":"Prokopidis, P., & Piperidis, S. (2020). A neural NLP toolkit for Greek, in 11th Hellenic conference on artificial intelligence (pp. 125\u2013128).","DOI":"10.1145\/3411408.3411430"},{"key":"9798_CR62","doi-asserted-by":"crossref","unstructured":"Qi, P., Zhang, Y., Zhang, Y., Bolton, J., Manning, C.D. (2020). Stanza: A Python natural language processing toolkit for many human languages, in Proceedings of the 58th annual meeting of the association for computational linguistics: System demonstrations.","DOI":"10.18653\/v1\/2020.acl-demos.14"},{"key":"9798_CR63","unstructured":"Rayson, P., Archer, D., Piao, S., McEnery, T. (2004). The UCREL semantic analysis system. Proceedings of the workshop on beyond named entity recognition: semantic labelling for NLP tasks, in association with LREC-04 (pp. 7\u201312)."},{"key":"9798_CR64","unstructured":"Silveira, N., Dozat, T., de Marneffe, M-C., Bowman, S., Connor, M., Bauer, J., Manning, C.D. (2014). A gold standard dependency corpus for English, in Proceedings of the ninth international conference on language resources and evaluation (LREC-2014)."},{"key":"9798_CR65","unstructured":"Skubic, J., Angermeier, J., Bruncrona, A., Evkoski, B., Leiminger, L. (2022). Networks of power: Gender analysis in selected European parliaments, in Proceedings of the 2nd workshop on computational linguistics for political text analysis (CPSS-2022). 
https:\/\/old.gscl.org\/en\/arbeitskreise\/cpss\/cpss-2022\/workshop-proceedings-2022"},{"key":"9798_CR66","unstructured":"Steingr\u00edmsson, S., Barkarson, S., \u00d6rn\u00f3lfsson, G.T. (2020). IGC-Parl: Icelandic corpus of parliamentary proceedings. D.\u00a0Fi\u0161er, M.\u00a0Eskevich, and F.\u00a0de Jong (Eds.), Proceedings of the second ParlaCLARIN workshop (pp. 11\u201317). Marseille, France: European Language Resources Association. https:\/\/aclanthology.org\/2020.parlaclarin-1.3"},{"issue":"5","key":"9798_CR67","doi-asserted-by":"publisher","first-page":"617","DOI":"10.1075\/jlp.18014.sto","volume":"17","author":"M Stopfner","year":"2018","unstructured":"Stopfner, M. (2018). Put your big girl voice on: Parliamentary heckling against female MPs. Journal of Language and Politics, 17(5), 617\u2013635.","journal-title":"Journal of Language and Politics"},{"key":"9798_CR68","unstructured":"Straka, M. (2018). UDPipe 2.0 prototype at CoNLL 2018 UD shared task, in Proceedings of the CoNLL 2018 shared task: Multilingual parsing from raw text to universal dependencies (pp. 197\u2013207). Brussels, Belgium: Association for Computational Linguistics. https:\/\/www.aclweb.org\/anthology\/K18-2020"},{"key":"9798_CR69","doi-asserted-by":"crossref","unstructured":"Strakov\u00e1, J., Straka, M., Haji\u010d, J. (2019). Neural architectures for nested NER through linearization, in Proceedings of the 57th annual meeting of the Association for Computational Linguistics (pp. 5326\u20135331). Stroudsburg, PA, USA: Association for Computational Linguistics.","DOI":"10.18653\/v1\/P19-1527"},{"key":"9798_CR70","unstructured":"Sylvester, C., Greene, Z., Ebing, B. (2022). ParlEE plenary speeches data set: Annotated full-text of 21.6 million sentence-level plenary speeches of eight EU states. https:\/\/doi.org\/10.7910\/DVN\/ZY3RV7, Harvard Dataverse, V1"},{"key":"9798_CR71","doi-asserted-by":"crossref","unstructured":"Tamper, M., Leskinen, P., Apajalahti, K., Hyv\u00f6nen, E. 
(2018). Using biographical texts as linked data for prosopographical research and applications. M.\u00a0Ioannides et\u00a0al. (Eds.), Digital heritage. Progress in cultural heritage: Documentation, preservation, and protection. 7th international conference, EuroMed 2018 (pp. 125\u2013137). Nicosia, Cyprus: Springer-Verlag.","DOI":"10.1007\/978-3-030-01762-0_11"},{"key":"9798_CR72","doi-asserted-by":"publisher","unstructured":"Tamper, M., Oksanen, A., Tuominen, J., Hietanen, A., Hyv\u00f6nen, E. (2020). Automatic annotation service APPI: Named entity linking in legal domain. A.\u00a0Harth et\u00a0al. (Eds.), The semantic web: ESWC 2020 satellite events (Vol. 12124, pp. 208\u2013213). Springer-Verlag. https:\/\/doi.org\/10.1007\/978-3-030-62327-2_36","DOI":"10.1007\/978-3-030-62327-2_36"},{"key":"9798_CR73","unstructured":"TEI Consortium (Ed.). (2017). TEI P5: Guidelines for electronic text encoding and interchange. TEI Consortium. http:\/\/www.tei-c.org\/Guidelines\/P5\/"},{"key":"9798_CR100","doi-asserted-by":"publisher","unstructured":"Ter\u010don, L., & Ljube\u0161i\u0107, N. (2023). CLASSLA-Stanza: The next step for linguistic processing of South Slavic languages. https:\/\/doi.org\/10.48550\/arXiv.2308.04255","DOI":"10.48550\/arXiv.2308.04255"},{"key":"9798_CR74","unstructured":"Tiedemann, J. (2012). Parallel data, tools and interfaces in OPUS. LREC\u201912 (Vol. 2012, pp. 2214\u20132218)."},{"key":"9798_CR75","unstructured":"Tiedemann, J., & Thottingal, S. (2020). OPUS-MT \u2013 Building open translation services for the world, in Proceedings of the 22nd annual conference of the European Association for Machine Translation."},{"key":"9798_CR76","doi-asserted-by":"crossref","unstructured":"Tjong Kim\u00a0Sang, E.F., & De\u00a0Meulder, F. (2003). Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition, in Proceedings of the seventh conference on natural language learning at HLT-NAACL 2003 - Volume 4 (pp. 142\u2013147). 
USA: Association for Computational Linguistics. https:\/\/doi.org\/10.3115\/1119176.1119195","DOI":"10.3115\/1119176.1119195"},{"issue":"3","key":"9798_CR77","doi-asserted-by":"publisher","first-page":"307","DOI":"10.1515\/zaa-2019-0025","volume":"67","author":"N Truan","year":"2019","unstructured":"Truan, N. (2019). Talking about, for, and to the people: Populism and representation in parliamentary debates on Europe. Zeitschrift f\u00fcr Anglistik und Amerikanistik, 67(3), 307\u2013337.","journal-title":"Zeitschrift f\u00fcr Anglistik und Amerikanistik"},{"key":"9798_CR78","doi-asserted-by":"crossref","unstructured":"Truan, N., & Romary, L. (2022). Building, encoding, and annotating a corpus of parliamentary debates in TEI XML: A cross-linguistic account. Journal of the Text Encoding Initiative, (14).","DOI":"10.4000\/jtei.4164"},{"key":"9798_CR79","doi-asserted-by":"publisher","DOI":"10.4000\/jtei.4214","author":"T Wissik","year":"2022","unstructured":"Wissik, T. (2022). Encoding interruptions in parliamentary data: From applause to interjections and laughter. Journal of the Text Encoding Initiative. https:\/\/doi.org\/10.4000\/jtei.4214","journal-title":"Journal of the Text Encoding Initiative"},{"key":"9798_CR80","unstructured":"Wissik, T., & Pirker, H. (2018). ParlAT beta corpus of Austrian parliamentary records, in Proceedings of the LREC 2018 workshop ParlaCLARIN: Creating and using parliamentary corpora."},{"issue":"3","key":"9798_CR81","doi-asserted-by":"publisher","first-page":"581","DOI":"10.1007\/s10579-016-9343-x","volume":"51","author":"A Zeldes","year":"2017","unstructured":"Zeldes, A. (2017). The GUM corpus: Creating multilayer resources in the classroom. Language Resources and Evaluation, 51(3), 581\u2013612. https:\/\/doi.org\/10.1007\/s10579-016-9343-x","journal-title":"Language Resources and Evaluation"},{"key":"9798_CR82","unstructured":"Znotins, A., & Cirule, E. (2018). NLP-PIPE: Latvian NLP tool pipeline. 
Human language technologies - the Baltic perspective (Vol.\u00a0307, pp. 183\u2013189). IOS Press. http:\/\/ebooks.iospress.nl\/volumearticle\/50320"}],"container-title":["Language Resources and Evaluation"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10579-024-09798-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10579-024-09798-w\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10579-024-09798-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,6]],"date-time":"2025-09-06T04:01:01Z","timestamp":1757131261000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10579-024-09798-w"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,12,28]]},"references-count":83,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2025,9]]}},"alternative-id":["9798"],"URL":"https:\/\/doi.org\/10.1007\/s10579-024-09798-w","relation":{"has-preprint":[{"id-type":"doi","id":"10.21203\/rs.3.rs-4176128\/v1","asserted-by":"object"}]},"ISSN":["1574-020X","1574-0218"],"issn-type":[{"value":"1574-020X","type":"print"},{"value":"1574-0218","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,12,28]]},"assertion":[{"value":"25 November 2024","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"28 December 2024","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors have no conflict of interest to disclose, 
financial or otherwise. One of the authors is a member of the editorial board of this journal.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interest"}},{"value":"Not applicable \/ The provider of this data and related work declares that, to the best of their knowledge, it is free of copyright restrictions and does not contain sensitive personal information or violate privacy laws.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethical approval"}},{"value":"No human subjects were involved in this work.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent to participate"}},{"value":"All authors and other individuals associated with the work described give their consent to the publication of the article.","order":5,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}}]}}