{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,8]],"date-time":"2025-09-08T05:58:46Z","timestamp":1757311126382,"version":"3.40.5"},"reference-count":47,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2024,6,2]],"date-time":"2024-06-02T00:00:00Z","timestamp":1717286400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,6,2]],"date-time":"2024-06-02T00:00:00Z","timestamp":1717286400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100004329","name":"Slovenian Research and Innovation Agency","doi-asserted-by":"crossref","award":["P2-010","I0-0013"],"award-info":[{"award-number":["P2-010","I0-0013"]}],"id":[{"id":"10.13039\/501100004329","id-type":"DOI","asserted-by":"crossref"}]},{"name":"CLARIN.SI"},{"name":"DARIAH-SI"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Lang Resources & Evaluation"],"published-print":{"date-parts":[[2025,6]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>Parliamentary debates represent an essential part of democratic discourse and provide insights into various socio-demographic and linguistic phenomena - parliamentary corpora, which contain transcripts of parliamentary debates and extensive metadata, are an important resource for parliamentary discourse analysis and other research areas. This paper presents the Slovenian parliamentary corpus siParl, the latest version of which contains transcripts of plenary sessions and other legislative bodies of the Assembly of the Republic of Slovenia from 1990 to 2022, comprising more than 1 million speeches and 210 million words. 
We outline the development history of the corpus and also mention other initiatives that have been influenced by siParl (such as the Parla-CLARIN encoding and the ParlaMint corpora of European parliaments), present the corpus creation process, ranging from the initial data collection to the structural development and encoding of the corpus, and given the growing influence of the ParlaMint corpora, compare siParl with the Slovenian ParlaMint-SI corpus. Finally, we discuss updates for the next version as well as the long-term development and enrichment of the siParl corpus.<\/jats:p>","DOI":"10.1007\/s10579-024-09746-8","type":"journal-article","created":{"date-parts":[[2024,6,2]],"date-time":"2024-06-02T14:01:24Z","timestamp":1717336884000},"page":"891-911","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["Slovenian parliamentary corpus siParl"],"prefix":"10.1007","volume":"59","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0464-9240","authenticated-orcid":false,"given":"Katja","family":"Meden","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1560-4099","authenticated-orcid":false,"given":"Toma\u017e","family":"Erjavec","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6143-6877","authenticated-orcid":false,"given":"Andrej","family":"Pan\u010dur","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2024,6,2]]},"reference":[{"key":"9746_CR1","unstructured":"Abercrombie, G., & Batista-Navarro, RT. (2018). \u2018Aye\u2019 or \u2018no\u2019? Speech-level sentiment analysis of Hansard UK parliamentary debate transcripts. 
In: Proceedings of the eleventh international conference on language resources and evaluation (LREC 2018)"},{"key":"9746_CR2","unstructured":"Arhar\u00a0Holdt, \u0160., \u010cibej, J., Dobrovoljc, K., Erjavec, T., Gantar, P., Krek, S., Munda, T., Robida, N., Ter\u010don, L., & \u017ditnik, S. (2024). SUK 1.0: A new training corpus for linguistic annotation of modern standard Slovene. In: Proceedings of the 2024 joint international conference on computational linguistics, language resources and evaluation, 20\u201325 May 2024, Torino, in print"},{"key":"9746_CR3","unstructured":"Arhar\u00a0Holdt, \u0160., Krek, S., Dobrovoljc, K., Erjavec, T., Gantar, P., \u010cibej, J., Pori, E., Ter\u010don, L., Munda, T., \u017ditnik, S., Robida, N., Blagus, N., Mo\u017ee, S., Ledinek, N., Holz, N., Zupan, K., Kuzman, T., Kav\u010di\u010d, T., \u0160krjanec, I., Marko, D., Jezer\u0161ek, L., & Zajc, A. (2023). Training corpus SUK 1.0. Jo\u017eef Stefan Institute. Retrieved from http:\/\/hdl.handle.net\/11356\/1747"},{"key":"9746_CR4","unstructured":"CLARIN ERIC. (2020). ParlaMint: Towards comparable parliamentary corpora. Retrieved 11 Feb 2023, from https:\/\/www.clarin.eu\/parlamint"},{"key":"9746_CR5","unstructured":"Dr\u017eavni zbor Republike Slovenije. (2020a). Sestava in organiziranost. Retrieved 25 Feb 2023, from https:\/\/www.dz-rs.si\/wps\/portal\/Home\/odz\/pristojnosti\/organiziranost"},{"key":"9746_CR6","unstructured":"Dr\u017eavni zbor Republike Slovenije. (2020b). Skup\u0161\u010dina 1990\u20131992: Prva demokrati\u010dno izvoljena skup\u0161\u010dina. Retrieved 25 Feb 2023, from https:\/\/www.dz-rs.si\/wps\/portal\/Home\/pos\/PretekliMandati\/Skupscina\/"},{"key":"9746_CR7","unstructured":"Erjavec, T., & Pan\u010dur, A. (2019a). Parla-CLARIN: A TEI schema for corpora of parliamentary proceedings. Retrieved from https:\/\/clarin-eric.github.io\/parla-clarin\/"},{"key":"9746_CR8","doi-asserted-by":"publisher","unstructured":"Erjavec, T., & Pan\u010dur, A. (2019b). 
Parla-CLARIN: TEI guidelines for corpora of parliamentary proceedings. Retrieved from https:\/\/doi.org\/10.5281\/zenodo.3446164","DOI":"10.5281\/zenodo.3446164"},{"key":"9746_CR9","unstructured":"Erjavec, T., Kopp, M., Ogrodniczuk, M., Osenova, P., Agerri, R., Agirrezabal, M., Agnoloni, T., Aires, J., Albini, M., Alkorta, J., Antiba-Cartazo, I., Arrieta, E., Barcala, M., Bardanca, D., Barkarson, S., Bartolini, R., Battistoni, R., Bel, N., Bonet\u00a0Ramos, MdM., ... Fi\u0161er, D. (2023a). Linguistically annotated multilingual comparable corpora of parliamentary debates ParlaMint.ana 4.0. CLARIN.EU. Retrieved from http:\/\/hdl.handle.net\/11356\/1860"},{"key":"9746_CR10","unstructured":"Erjavec, T., Kopp, M., Ogrodniczuk, M., Osenova, P., Agirrezabal, M., Agnoloni, T., Aires, J., Albini, M., Alkorta, J., Antiba-Cartazo, I., Arrieta, E., Barcala, M., Bardanca, D., Barkarson, S., Bartolini, R., Battistoni, R., Bel, N., Bonet\u00a0Ramos, MdM., ... Fi\u0161er, D. (2023b). Multilingual comparable corpora of parliamentary debates ParlaMint 4.0. CLARIN.EU. Retrieved from http:\/\/hdl.handle.net\/11356\/1859"},{"key":"9746_CR11","doi-asserted-by":"publisher","unstructured":"Erjavec, T., Ogrodniczuk, M., Osenova, P., Ljube\u0161i\u0107, N., Simov, K., Pan\u010dur, A., Rudolf, M., Kopp, M., Barkarson, S., Steingr\u00edmsson, S., \u00c7a\u01e7r\u0131, \u00c7., de\u00a0Does, J., Depuydt, K., Agnoloni, T., Venturi, G.,P\u00e9rez, MC., de\u00a0Macedo, LD., ... Fi\u0161er, D. (2022). The Parlamint corpora of parliamentary proceedings. Language Resources and Evaluation, 1\u201334. https:\/\/doi.org\/10.1007\/s10579-021-09574-0","DOI":"10.1007\/s10579-021-09574-0"},{"key":"9746_CR12","doi-asserted-by":"publisher","unstructured":"Erjavec, T., & Pan\u010dur, A. (2022). The Parla-CLARIN Recommendations for Encoding Corpora of Parliamentary Proceedings. Journal of the Text Encoding Initiative, 14. 
https:\/\/doi.org\/10.4000\/jtei.4133","DOI":"10.4000\/jtei.4133"},{"key":"9746_CR13","doi-asserted-by":"crossref","unstructured":"Fi\u0161er, D., Konov\u0161ek, T., & Pan\u010dur, A. (2023). Referencing the public by populist and non-populist parties in the Slovene parliament. Sloven\u0161\u010dina 20: empiri\u010dne, aplikativne in interdisciplinarne raziskave, 11(1), 69\u201390","DOI":"10.4312\/slo2.0.2023.1.69-90"},{"key":"9746_CR14","doi-asserted-by":"publisher","unstructured":"Fi\u0161er, D., & Pahor\u00a0de Maiti, K. (2020). Voices of the parliament. Modern Languages Open. https:\/\/doi.org\/10.3828\/mlo.v0i0.295","DOI":"10.3828\/mlo.v0i0.295"},{"key":"9746_CR15","doi-asserted-by":"crossref","unstructured":"Fi\u0161er, D., & Pahor\u00a0de Maiti, K. (2021) Voices of the parliament: A corpus approach to parliamentary discourse research. Retrieved from https:\/\/sidih.github.io\/voices\/index.html","DOI":"10.3828\/mlo.v0i0.295"},{"key":"9746_CR16","unstructured":"Heidar, K., & Koole, R. (eds.) (2000). Parliamentary party groups in European democracies: Political parties behind closed doors. In: ECPR studies in European political science (p. 13). Routledge: London; New York."},{"key":"9746_CR17","doi-asserted-by":"crossref","unstructured":"Hyv\u00f6nen, E., Leskinen, P., Sinikallio, L., La\u00a0Mela, M., Tuominen, J., Elo, K., Drobac, S., Koho, M., Ikkala, E., Tamper, M., et\u00a0al. (2022) Finnish parliament on the semantic web: Using ParliamentSampo data service and semantic portal for studying political culture and language. In: Digital Parliamentary data in Action (DiPaDa 2022), Workshop at the 6th Digital Humanities in Nordic and Baltic Countries Conference, CEUR Workshop Proceedings. Retrieved from  http:\/\/urn.fi\/URN:NBN:fi:aalto-202206083598","DOI":"10.5617\/dhnbpub.11261"},{"key":"9746_CR19","unstructured":"Kav\u010di\u010d, A., Mundjar, A., & Marolt, M. (2023a). Carniolan Provincial Assembly corpus Kranjska 1.0. 
Faculty of computer and information science. University of Ljubljana. Retrieved from http:\/\/hdl.handle.net\/11356\/1824"},{"key":"9746_CR18","unstructured":"Kav\u010di\u010d, A., Mundjar, A., & Marolt, M. (2023b). Parliamentary corpus of first Yugoslavia (1919\u20131939) yu1Parl 1.0. Faculty of computer and information science. University of Ljubljana. Retrieved from http:\/\/hdl.handle.net\/11356\/1845"},{"key":"9746_CR20","unstructured":"Krek, S., Dobrovoljc, K., Erjavec, T., Mo\u017ee, S., Ledinek, N., Holz, N., Zupan, K., Gantar, P., Kuzman, T., \u010cibej, J., Arhar\u00a0Holdt, \u0160., Kav\u010di\u010d, T., \u0160krjanec, I., Marko, D., Jezer\u0161ek, L., & Zajc, A. (2019.) Training corpus ssj500k 2.2. Centre for Language Resources and Technologies. University of Ljubljana. Retrieved from http:\/\/hdl.handle.net\/11356\/1210"},{"key":"9746_CR21","unstructured":"Kuzman, T., Ljube\u0161i\u0107, N., Erjavec, T., Kopp, M., Ogrodniczuk, M., Osenova, P., Rayson, P., Vidler, J., Agerri, R., Agirrezabal, M., Agnoloni, T., Aires, .J, Albini, M., Alkorta, J., Antiba-Cartazo, I., Arrieta, E., Barcala, M., Bardanca, D., ... Fi\u0161er, D. (2023). Linguistically annotated multilingual comparable corpora of parliamentary debates in English ParlaMint-en.ana 4.0. CLARIN.EU. Retrieved  from http:\/\/hdl.handle.net\/11356\/1864"},{"key":"9746_CR22","doi-asserted-by":"publisher","unstructured":"Ljube\u0161i\u0107, N., & Dobrovoljc, K. (2019). What does neural bring? Analysing improvements in morphosyntactic annotation and lemmatisation of Slovenian, Croatian and Serbian. In: Proceedings of the 7th workshop on Balto-Slavic natural language processing (pp 29\u201334). Association for Computational Linguistics, Florence. https:\/\/doi.org\/10.18653\/v1\/W19-3704","DOI":"10.18653\/v1\/W19-3704"},{"key":"9746_CR25","unstructured":"Ljube\u0161i\u0107, N., & Krsnik, L. (2022a). The CLASSLA-StanfordNLP model for lemmatisation of standard Slovenian 1.4. Jo\u017eef Stefan Institute. 
Retrieved from http:\/\/hdl.handle.net\/11356\/1478"},{"key":"9746_CR23","unstructured":"Ljube\u0161i\u0107, N., & Krsnik, L. (2022b). The CLASSLA-StanfordNLP model for morphosyntactic annotation of standard Slovenian 1.3. Jo\u017eef Stefan Institute. Retrieved from  http:\/\/hdl.handle.net\/11356\/1476"},{"key":"9746_CR24","unstructured":"Ljube\u0161i\u0107, N., Ter\u010don, L., & \u010cibej, J. (2023). The CLASSLA-Stanza model for morphosyntactic annotation of standard Slovenian 2.0. Jo\u017eef Stefan Institute. Retrieved from  http:\/\/hdl.handle.net\/11356\/1767"},{"key":"9746_CR26","unstructured":"Mochtak, M., Rupnik, P., Meden, K., & Ljube\u0161i\u0107, N. (2023). The multilingual sentiment dataset of parliamentary debates ParlaSent 1.0. Jo\u017eef Stefan Institute. Retrieved from  http:\/\/hdl.handle.net\/11356\/1868"},{"key":"9746_CR27","unstructured":"Pan\u010dur, A. (2016). Ozna\u010devanje zbirke zapisnikov sej slovenskega parlamenta s smernicami TEI. In: Zbornik konference Jezikovne tehnologije in digitalna humanistika (pp 142\u201348)."},{"key":"9746_CR28","unstructured":"Pan\u010dur, A., Erjavec, T., Meden, K., Ojster\u0161ek, M., \u0160orn, M., & Blaj\u00a0Hribar, N. (2022). Slovenian parliamentary corpus (1990\u20132022) siParl 3.0. Institute of Contemporary History. Retrieved from http:\/\/hdl.handle.net\/11356\/1748"},{"key":"9746_CR29","unstructured":"Pan\u010dur, A., Erjavec, T., Ojster\u0161ek, M., \u0160orn, M., & Blaj\u00a0Hribar, N. (2019). Slovenian parliamentary corpus (1990\u20132018) siParl 1.0. Institute of Contemporary History. Retrieved from http:\/\/hdl.handle.net\/11356\/1236"},{"key":"9746_CR30","unstructured":"Pan\u010dur, A., Erjavec, T., Ojster\u0161ek, M., \u0160orn, M., & Blaj\u00a0Hribar, N. (2020). Slovenian parliamentary corpus (1990\u20132018) siParl 2.0. Institute of Contemporary History. Retrieved from http:\/\/hdl.handle.net\/11356\/1300"},{"key":"9746_CR31","unstructured":"Pan\u010dur, A., \u0160orn, M., & Erjavec, T. 
(2016). Slovenian parliamentary corpus (1990\u20131992) SlovParl 1.0. Institute of Contemporary History. Retrieved from http:\/\/hdl.handle.net\/11356\/1075"},{"key":"9746_CR32","unstructured":"Pan\u010dur, A., \u0160orn, M., & Erjavec, T. (2017). Slovenian parliamentary corpus (1990\u20131992) SlovParl 2.0. Institute of Contemporary History. Retrieved from http:\/\/hdl.handle.net\/11356\/1167"},{"key":"9746_CR33","unstructured":"Pan\u010dur, A., \u0160orn, M., & Erjavec, T. (2018). Slovparl 2.0: The collection of Slovene parliamentary debates from the period of secession. In: D. Fi\u0161er, and J. Maria Eskevich (eds.) ParlaCLARIN 2018 Workshop Proceedings (vol\u00a07, p. 2018)."},{"key":"9746_CR34","unstructured":"Pan\u010dur, A., & Erjavec, T. (2020). The siParl corpus of Slovene parliamentary proceedings. In: Proceedings of the second ParlaCLARIN workshop (pp. 28\u201334)."},{"issue":"3","key":"9746_CR35","doi-asserted-by":"publisher","first-page":"130","DOI":"10.51663\/pnz.56.3.09","volume":"56","author":"A Pan\u010dur","year":"2016","unstructured":"Pan\u010dur, A., & \u0160orn, M. (2016). Smart big data: Use of Slovenian parliamentary papers in digital history. Contributions to Contemporary History, 56(3), 130\u2013146.","journal-title":"Contributions to Contemporary History"},{"key":"9746_CR36","unstructured":"Polani\u010d, P., & Dobrani\u0107, F. (2022). Corpus of political party programs Programi2022. Institute of Contemporary History. Retrieved from http:\/\/hdl.handle.net\/11356\/1734"},{"key":"9746_CR37","doi-asserted-by":"publisher","unstructured":"Proksch, S. O., & Slapin, J. B. (2014). The politics of parliamentary debate: Parties, rebels and representation (pp. 1\u201314). Cambridge University Press. 
https:\/\/doi.org\/10.1017\/CBO9781139680752","DOI":"10.1017\/CBO9781139680752"},{"issue":"1","key":"9746_CR38","doi-asserted-by":"publisher","first-page":"112","DOI":"10.1017\/pan.2019.26","volume":"28","author":"L Rheault","year":"2020","unstructured":"Rheault, L., & Cochrane, C. (2020). Word embeddings for the analysis of ideological placement in parliamentary corpora. Political Analysis, 28(1), 112\u2013133. https:\/\/doi.org\/10.1017\/pan.2019.26","journal-title":"Political Analysis"},{"key":"9746_CR39","unstructured":"Skubic, J., & Fi\u0161er, D. (2022). Parliamentary discourse research in sociology: Literature review. In: Proceedings of the workshop ParlaCLARIN III within the 13th language resources and evaluation conference (pp. 81\u201391)."},{"key":"9746_CR40","unstructured":"Steingr\u00edmsson, S., Barkarson, S., & \u00d6rn\u00f3lfsson, G. T. (2020). IGC-parl: Icelandic corpus of parliamentary proceedings. In: D. Fi\u0161er, M. Eskevich, and F. de\u00a0Jong (eds.) Proceedings of the second ParlaCLARIN workshop, European language resources association, Marseille, France (pp. 11\u201317). Retrieved from https:\/\/aclanthology.org\/2020.parlaclarin-1.3"},{"key":"9746_CR41","unstructured":"TEI Consortium. (2020). TEI P5: Guidelines for electronic text encoding and interchange. Retrieved 27 Feb 2023, from https:\/\/tei-c.org\/guidelines\/p5\/"},{"key":"9746_CR42","unstructured":"Ter\u010don, L., \u010cibej, J., & Ljube\u0161i\u0107, N. (2023). The CLASSLA-Stanza model for lemmatisation of standard Slovenian 2.0. Jo\u017eef Stefan Institute. Retrieved from http:\/\/hdl.handle.net\/11356\/1768"},{"key":"9746_CR43","unstructured":"Ter\u010don, L., & Ljube\u0161i\u0107, N. (2023). The CLASSLA-Stanza model for UD dependency parsing of standard Slovenian 2.0. Jo\u017eef Stefan Institute. Retrieved from http:\/\/hdl.handle.net\/11356\/1769"},{"key":"9746_CR44","doi-asserted-by":"publisher","unstructured":"Ter\u010don, L., & Ljube\u0161i\u0107, N. (2023). 
CLASSLA-Stanza: The next step for linguistic processing of South Slavic Languages. Retrieved from https:\/\/doi.org\/10.48550\/arXiv.2308.04255","DOI":"10.48550\/arXiv.2308.04255"},{"key":"9746_CR45","doi-asserted-by":"crossref","unstructured":"Truan, N., & Romary, L. (2022). Building, encoding, and annotating a corpus of parliamentary debates in TEI XML: A cross-linguistic account. Journal of the Text Encoding Initiative, (14).","DOI":"10.4000\/jtei.4164"},{"issue":"3","key":"9746_CR46","doi-asserted-by":"publisher","first-page":"307","DOI":"10.1515\/zaa-2019-0025","volume":"67","author":"N Truan","year":"2019","unstructured":"Truan, N. (2019). Talking about, for, and to the people: Populism and representation in parliamentary debates on Europe. Zeitschrift f\u00fcr Anglistik und Amerikanistik, 67(3), 307\u2013337. https:\/\/doi.org\/10.1515\/zaa-2019-0025","journal-title":"Zeitschrift f\u00fcr Anglistik und Amerikanistik"},{"key":"9746_CR47","doi-asserted-by":"crossref","unstructured":"Wissik, T. (2022) Encoding interruptions in parliamentary data: From applause to interjections and laughter. 
Journal of the Text Encoding Initiative, (14).","DOI":"10.4000\/jtei.4214"}],"container-title":["Language Resources and Evaluation"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10579-024-09746-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10579-024-09746-8\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10579-024-09746-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,5,18]],"date-time":"2025-05-18T15:04:00Z","timestamp":1747580640000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10579-024-09746-8"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,6,2]]},"references-count":47,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2025,6]]}},"alternative-id":["9746"],"URL":"https:\/\/doi.org\/10.1007\/s10579-024-09746-8","relation":{},"ISSN":["1574-020X","1574-0218"],"issn-type":[{"type":"print","value":"1574-020X"},{"type":"electronic","value":"1574-0218"}],"subject":[],"published":{"date-parts":[[2024,6,2]]},"assertion":[{"value":"29 April 2024","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"2 June 2024","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors have no relevant financial interests to disclose.The author T.E. serves on the Editorial Board of this journal.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}