{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,11]],"date-time":"2025-09-11T19:14:40Z","timestamp":1757618080508,"version":"3.44.0"},"reference-count":72,"publisher":"Springer Science and Business Media LLC","issue":"3","license":[{"start":{"date-parts":[[2025,5,28]],"date-time":"2025-05-28T00:00:00Z","timestamp":1748390400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,5,28]],"date-time":"2025-05-28T00:00:00Z","timestamp":1748390400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"Ministry of Science & Technology ,Israel","award":["3-17990","3-17990"],"award-info":[{"award-number":["3-17990","3-17990"]}]},{"name":"Idit PhD Fellowship, University of Haifa"},{"DOI":"10.13039\/501100005717","name":"University of Haifa","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100005717","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Lang Resources &amp; Evaluation"],"published-print":{"date-parts":[[2025,9]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>We present the Knesset Corpus, a corpus of Hebrew parliamentary proceedings containing over\u00a030 million sentences (over\u00a0384 million tokens) from all the (plenary and committee) protocols held in the Israeli parliament in the last three decades. Sentences are annotated with morpho-syntactic information and named entities, and are associated with detailed meta-information reflecting demographic and political properties of the speakers, based on a large database of parliament members and factions that we compiled. We discuss the structure and composition of the corpus and the various processing steps we applied to it. 
To demonstrate the utility of this novel dataset we present two use cases. We show that the corpus can be used to examine historical developments in the style of political discussions by showing a reduction in lexical richness in the proceedings over time. We also investigate some differences between the styles of male and female speakers. These use cases exemplify the potential of the corpus to shed light on important trends in the Israeli society, supporting research in linguistics, political science, communication, law, etc.<\/jats:p>","DOI":"10.1007\/s10579-025-09833-4","type":"journal-article","created":{"date-parts":[[2025,5,28]],"date-time":"2025-05-28T05:23:00Z","timestamp":1748409780000},"page":"2973-3004","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["The Knesset corpus: an annotated corpus of Hebrew parliamentary proceedings"],"prefix":"10.1007","volume":"59","author":[{"given":"Gili","family":"Goldin","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Nick","family":"Howell","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Noam","family":"Ordan","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ella","family":"Rabinovich","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shuly","family":"Wintner","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2025,5,28]]},"reference":[{"key":"9833_CR1","unstructured":"Abrami, G., Bagci, M., Hammerla, L., Mehler, A. (2022). German parliamentary corpus (GerParCor). In N. Calzolari,\u00a0F. B\u00e9chet & P. Blache (Eds.), Proceedings of the Thirteenth Language Resources and Evaluation Conference. 
European Language Resources Association, Marseille, France (pp 1900\u20131906). https:\/\/aclanthology.org\/2022.lrec-1.202\/"},{"key":"9833_CR2","unstructured":"Abrami, G., Bagci, M., Mehler, A. (2024). German parliamentary corpus (GerParCor) reloaded. In N. Calzolari, M. Y. Kan & V. Hoste (Eds.), Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). ELRA and ICCL, Torino, Italia (pp 7707\u20137716).\u00a0https:\/\/aclanthology.org\/2024.lrec-main.681\/"},{"key":"9833_CR3","unstructured":"Agnoloni, T., Bartolini, R., Frontini, F., Montemagni, S., Marchetti, C., Quochi, V., Ruisi, M., Venturi, G. (2022). Making Italian parliamentary records machine-actionable: the construction of the ParlaMint-IT corpus. In\u00a0D. Fi\u0161er,\u00a0M. Eskevich & J. Lenardi\u010d (Eds.), Proceedings of the Workshop ParlaCLARIN III within the 13th Language Resources and Evaluation Conference. European Language Resources Association, Marseille, France (pp 117\u2013124).\u00a0https:\/\/aclanthology.org\/2022.parlaclarin-1.17\/"},{"issue":"3","key":"9833_CR4","first-page":"321","volume":"23","author":"S Argamon","year":"2003","unstructured":"Argamon, S., Koppel, M., Fine, J., & Shimoni, A. R. (2003). Gender, genre, and writing style in formal written texts. Text and Talk, 23(3), 321\u201346.","journal-title":"Text and Talk"},{"key":"9833_CR5","unstructured":"Barbaresi, A. (2018) A corpus of German political speeches from the 21st century. In\u00a0N. Calzolari, K. Choukri &\u00a0C. Cieri (Eds.), Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). European Language Resources Association (ELRA), Miyazaki, Japan.\u00a0https:\/\/aclanthology.org\/L18-1127\/"},{"issue":"3","key":"9833_CR6","doi-asserted-by":"publisher","first-page":"849","DOI":"10.1017\/S0008423916001165","volume":"50","author":"K Beelen","year":"2017","unstructured":"Beelen, K., Thijm, T. 
A., Cochrane, C., Halvemaan, K., Hirst, G., Kimmins, M., Lijbrink, S., Marx, M., Naderi, N., Rheault, L., Polyanovsky, R., & Whyte, T. (2017). Digitization of the Canadian parliamentary debates. Canadian Journal of Political Science, 50(3), 849\u2013864. https:\/\/doi.org\/10.1017\/S0008423916001165","journal-title":"Canadian Journal of Political Science"},{"key":"9833_CR7","unstructured":"Blaette, A., Rakers, J., Leonhardt, C. (2022). How GermaParl evolves: Improving data quality by reproducible corpus preparation and user involvement. In\u00a0D. Fi\u0161er, M. Eskevich & J. Lenardi\u010d, (Eds.), Proceedings of the Workshop ParlaCLARIN III within the 13th Language Resources and Evaluation Conference. European Language Resources Association, Marseille, France (pp 7\u201315).\u00a0https:\/\/aclanthology.org\/2022.parlaclarin-1.2\/"},{"key":"9833_CR8","unstructured":"Bl\u00e4tte, A., Blessing, A. (2018). The GermaParl Corpus of Parliamentary Protocols. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). European Language Resources Association (ELRA)"},{"issue":"3","key":"9833_CR9","doi-asserted-by":"publisher","first-page":"710","DOI":"10.1044\/2019_JSLHR-19-00226","volume":"63","author":"KT Cunningham","year":"2020","unstructured":"Cunningham, K. T., & Haley, K. L. (2020). Measuring lexical diversity for discourse analysis in aphasia: Moving-average type-token ratio and word information measure. Journal of Speech, Language, and Hearing Research, 63(3), 710\u2013721.","journal-title":"Journal of Speech, Language, and Hearing Research"},{"issue":"2","key":"9833_CR10","doi-asserted-by":"publisher","first-page":"197","DOI":"10.1093\/applin\/24.2.197","volume":"24","author":"H Daller","year":"2003","unstructured":"Daller, H., Van Hout, R., & Treffers-Daller, J. (2003). Lexical richness in the spontaneous speech of bilinguals. 
Applied Linguistics, 24(2), 197\u2013222.","journal-title":"Applied Linguistics"},{"key":"9833_CR11","doi-asserted-by":"publisher","DOI":"10.4000\/jtei.4133","author":"T Erjavec","year":"2021","unstructured":"Erjavec, T., & Pan\u010dur, A. (2021). The Parla-CLARIN recommendations for encoding corpora of parliamentary proceedings. Journal of the Text Encoding Initiative. https:\/\/doi.org\/10.4000\/jtei.4133","journal-title":"Journal of the Text Encoding Initiative"},{"issue":"1","key":"9833_CR12","doi-asserted-by":"publisher","first-page":"415","DOI":"10.1007\/s10579-021-09574-0","volume":"57","author":"T Erjavec","year":"2022","unstructured":"Erjavec, T., Ogrodniczuk, M., Osenova, P., Ljube\u0161i\u0107, N., Simov, K., Pan\u010dur, A., Rudolf, M., Kopp, M., Barkarson, S., Steingr\u00edmsson, S., \u00c7\u00f6ltekin, \u00c7., de Does, J., Depuydt, K., Agnoloni, T., Venturi, G., P\u00e9rez, M. C., de Macedo, L. D., Navarretta, C., Luxardo, G., & Fi\u0161er, D. (2022). The ParlaMint corpora of parliamentary proceedings. Language Resources and Evaluation, 57(1), 415\u2013448. https:\/\/doi.org\/10.1007\/s10579-021-09574-0","journal-title":"Language Resources and Evaluation"},{"key":"9833_CR13","doi-asserted-by":"publisher","DOI":"10.1007\/s10579-024-09798-w","author":"T Erjavec","year":"2024","unstructured":"Erjavec, T., Kopp, M., Ljube\u0161i\u0107, N., Kuzman, T., Rayson, P., Osenova, P., Ogrodniczuk, M., \u00c7\u00f6ltekin, \u00c7., Kor\u017einek, D., Meden, K., & Skubic, J. (2024). ParlaMint II: Advancing comparable parliamentary corpora across Europe. Language Resources and Evaluation. https:\/\/doi.org\/10.1007\/s10579-024-09798-w","journal-title":"Language Resources and Evaluation"},{"key":"9833_CR14","doi-asserted-by":"publisher","unstructured":"Eyal, M., Noga, H., Aharoni, R., Szpektor, I., Tsarfaty R. (2023). Multilingual sequence-to-sequence models for Hebrew NLP. In A. Rogers, J. Boyd-Graber & N. 
Okazaki (Eds.), Findings of the Association for Computational Linguistics: ACL 2023. Association for Computational Linguistics, Toronto, Canada (pp 7700\u20137708).\u00a0https:\/\/doi.org\/10.18653\/v1\/2023.findings-acl.487","DOI":"10.18653\/v1\/2023.findings-acl.487"},{"key":"9833_CR15","doi-asserted-by":"crossref","unstructured":"Fabri, R., Gasser, M., Habash, N., Kiraz, G., Wintner S. (2014). Linguistic introduction: The orthography, morphology and syntax of Semitic languages. In I. Zitouni (Ed.), Natural Language Processing of Semitic Languages. Theory and Applications of Natural Language Processing, Springer, Berlin Heidelberg (pp. 3\u201341).","DOI":"10.1007\/978-3-642-45358-8_1"},{"key":"9833_CR16","unstructured":"Fi\u0161er, D., Eskevich, M., de\u00a0Jong F (Eds) (2020). Proceedings of the Second ParlaCLARIN Workshop, European Language Resources Association, Marseille, France, https:\/\/aclanthology.org\/2020.parlaclarin-1.0\/"},{"key":"9833_CR17","unstructured":"Fi\u0161er, D., Eskevich, M., Lenardi\u010d, J., (Eds) (2022). Proceedings of the Workshop ParlaCLARIN III within the 13th Language Resources and Evaluation Conference, European Language Resources Association, Marseille, France, https:\/\/aclanthology.org\/2022.parlaclarin-1.0\/"},{"key":"9833_CR18","unstructured":"Fi\u0161er, D., Eskevich, M., de\u00a0Jong F. (Eds) (2018). Proceedings of LREC2018 Workshop ParlaCLARIN: Creating and Using Parliamentary Corpora., European Language Resources Association (ELRA), Paris, France"},{"key":"9833_CR19","unstructured":"Fi\u0161er, D., Lenardi\u010d, J. (2018). CLARIN resources for parliamentary discourse research. In\u00a0D. Fi\u0161er,\u00a0M. Eskevich & de\u00a0Jong F (Eds.), Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). European Language Resources Association (ELRA), Paris, France"},{"key":"9833_CR20","unstructured":"Fi\u0161er, D., Eskevich, M., Bordon, D. (Eds) (2024). 
Proceedings of the IV Workshop on Creating, Analysing, and Increasing Accessibility of Parliamentary Corpora (ParlaCLARIN) @ LREC-COLING 2024, ELRA and ICCL, Torino, Italia. https:\/\/aclanthology.org\/2024.parlaclarin-1.0\/"},{"key":"9833_CR21","unstructured":"Frasnelli, V., Palmero\u00a0Aprosio, A. (2024). There\u2018s something new about the Italian parliament: The IPSA corpus. In N. Calzolari, M. Y. Kan,\u00a0V. Hoste (Eds.), Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). ELRA and ICCL, Torino, Italia (pp 16037\u201316046).\u00a0https:\/\/aclanthology.org\/2024.lrec-main.1394\/"},{"key":"9833_CR22","doi-asserted-by":"crossref","unstructured":"Garera, N., Yarowsky, D. (2009). Modeling latent biographic attributes in conversational genres. In K. Y. Su, J. Su & J. Wiebe (Eds.), Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP. Association for Computational Linguistics, Suntec, Singapore (pp 710\u2013718).\u00a0https:\/\/aclanthology.org\/P09-1080\/","DOI":"10.3115\/1690219.1690245"},{"issue":"1","key":"9833_CR23","doi-asserted-by":"publisher","first-page":"107","DOI":"10.1007\/s10579-013-9256-x","volume":"49","author":"S Gretz","year":"2015","unstructured":"Gretz, S., Itai, A., MacWhinney, B., Nir, B., & Wintner, S. (2015). Parsing Hebrew CHILDES transcripts. Language Resources and Evaluation, 49(1), 107\u2013145. https:\/\/doi.org\/10.1007\/s10579-013-9256-x","journal-title":"Language Resources and Evaluation"},{"key":"9833_CR24","unstructured":"Guibon, G., Courtin, M., Gerdes, K., Guillaume, B., (2020). When collaborative treebank curation meets graph grammars. In N. Calzolari, F. B\u00e9chet & P. Blache (Eds.), Proceedings of the Twelfth Language Resources and Evaluation Conference. 
European Language Resources Association, Marseille, France (pp 5291\u20135300).\u00a0https:\/\/aclanthology.org\/2020.lrec-1.651\/"},{"key":"9833_CR25","unstructured":"Hansen, D. H., Navarretta, C., Offersgaard, L. (2018). A pilot gender study of the Danish parliament corpus. In Proceedings of the ParlaClarin workshop at the Eleventh International Conference on Language Resources and Evaluation. European Language Resources Association"},{"key":"9833_CR26","unstructured":"Hladka, B., Kopp, M., Stra\u0148\u00e1k, P. (2020). Compiling Czech parliamentary stenographic protocols into a corpus. In\u00a0D. Fi\u0161er, M. Eskevich &\u00a0F. de\u00a0Jong (Eds.), Proceedings of the Second ParlaCLARIN Workshop. European Language Resources Association, Marseille, France (pp 18\u201322).\u00a0https:\/\/aclanthology.org\/2020.parlaclarin-1.4\/"},{"key":"9833_CR27","doi-asserted-by":"publisher","unstructured":"Hussain, M. M., Mahmud, I. (2019). pyMannKendall: a python package for non parametric Mann Kendall family of trend tests. Journal of Open Source Software. 4(39):1556. https:\/\/doi.org\/10.21105\/joss.01556,","DOI":"10.21105\/joss.01556"},{"key":"9833_CR28","doi-asserted-by":"crossref","unstructured":"Ilie, C. (2017). Parliamentary debates. In R. Wodak &\u00a0B. Forchtner (Eds.),The Routledge Handbook of Language and Politics. Taylor and Francis, chap\u00a020","DOI":"10.4324\/9781315183718-24"},{"issue":"1","key":"9833_CR29","doi-asserted-by":"publisher","first-page":"75","DOI":"10.1007\/s10579-007-9050-8","volume":"42","author":"A Itai","year":"2008","unstructured":"Itai, A., & Wintner, S. (2008). Language resources for Hebrew. Language Resources and\u00a0Evaluation, 42(1), 75\u201398.","journal-title":"Language Resources and\u00a0Evaluation"},{"key":"9833_CR30","doi-asserted-by":"publisher","unstructured":"Kawintiranon, K., Singh, L. (2021). Knowledge enhanced masked language model for stance detection. In K. Toutanova,\u00a0A. Rumshisky & L. 
Zettlemoyer (Eds.), Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, Online (pp 4725\u20134735).\u00a0https:\/\/doi.org\/10.18653\/v1\/2021.naacl-main.376","DOI":"10.18653\/v1\/2021.naacl-main.376"},{"key":"9833_CR31","doi-asserted-by":"crossref","unstructured":"Kessler, J. (2017). Scattertext: A browser-based tool for visualizing how corpora differ. In M. Bansal &\u00a0H. Ji (Eds.), Proceedings of ACL 2017, System Demonstrations. Association for Computational Linguistics, Vancouver, Canada (pp 85\u201390).\u00a0https:\/\/aclanthology.org\/P17-4015\/","DOI":"10.18653\/v1\/P17-4015"},{"key":"9833_CR32","unstructured":"Koehn, P. (2005). Europarl: A parallel corpus for statistical machine translation. In Proceedings of Machine Translation Summit X: Papers, Phuket, Thailand (pp 79\u201386).\u00a0https:\/\/aclanthology.org\/2005.mtsummit-papers.11\/"},{"key":"9833_CR33","unstructured":"Koppel, M., Argamon, S., Shimoni, A.R. (2003). Automatically categorizing written texts by author gender. Literary and Linguistic Computing 14(3)"},{"key":"9833_CR34","doi-asserted-by":"publisher","unstructured":"Kumar, S., Wintner, S., Smith, N. A., Tsvetkov, Y., (2019). Topics to avoid: Demoting latent confounds in text classification. In K. Inui, J. Jiang & V. Ng (Eds.), Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Association for Computational Linguistics, Hong Kong, China (pp 4153\u20134163).\u00a0https:\/\/doi.org\/10.18653\/v1\/D19-1425,","DOI":"10.18653\/v1\/D19-1425"},{"issue":"1","key":"9833_CR35","doi-asserted-by":"publisher","first-page":"45","DOI":"10.1017\/S0047404500000051","volume":"2","author":"R Lakoff","year":"1973","unstructured":"Lakoff, R. (1973). Language and woman\u2019s place. 
Language in Society, 2(1), 45\u201380.","journal-title":"Language in Society"},{"issue":"3","key":"9833_CR36","doi-asserted-by":"publisher","first-page":"873","DOI":"10.1007\/s10579-018-9411-5","volume":"52","author":"E Lapponi","year":"2018","unstructured":"Lapponi, E., S\u00f8yland, M. G., Velldal, E., & Oepen, S. (2018). The talk of Norway: A richly annotated corpus of the Norwegian parliament, 1998\u20132016. Language Resources and\u00a0Evaluation, 52(3), 873\u2013893. https:\/\/doi.org\/10.1007\/s10579-018-9411-5","journal-title":"Language Resources and\u00a0Evaluation"},{"key":"9833_CR37","doi-asserted-by":"crossref","unstructured":"Litvinova, T., Seredin, P., Litvinova, O., Zagorovskaya, O. (2017). Differences in type-token ratio and part-of-speech frequencies in male and female Russian written texts. In Proceedings of the Workshop on Stylistic Variation (pp 69\u201373)","DOI":"10.18653\/v1\/W17-4909"},{"issue":"2","key":"9833_CR38","doi-asserted-by":"publisher","first-page":"255","DOI":"10.1162\/coli_a_00402","volume":"47","author":"MC de Marneffe","year":"2021","unstructured":"de Marneffe, M. C., Manning, C. D., Nivre, J., & Zeman, D. (2021). Universal Dependencies. Computational Linguistics, 47(2), 255\u2013308. https:\/\/doi.org\/10.1162\/coli_a_00402","journal-title":"Computational Linguistics"},{"key":"9833_CR39","doi-asserted-by":"publisher","DOI":"10.1007\/s10579-024-09746-8","author":"K Meden","year":"2024","unstructured":"Meden, K., Erjavec, T., & Pan\u010dur, A. (2024). Slovenian parliamentary corpus siParl. Language Resources and Evaluation. https:\/\/doi.org\/10.1007\/s10579-024-09746-8","journal-title":"Language Resources and Evaluation"},{"issue":"4","key":"9833_CR40","doi-asserted-by":"publisher","first-page":"372","DOI":"10.1093\/pan\/mpn018","volume":"16","author":"BL Monroe","year":"2008","unstructured":"Monroe, B. L., Colaresi, M. P., & Quinn, K. M. (2008). 
Fightin\u2019words: Lexical feature selection and evaluation for identifying the content of political conflict. Political Analysis, 16(4), 372\u2013403.","journal-title":"Political Analysis"},{"key":"9833_CR41","unstructured":"Mor-Lan, G., Levi, E., Sheafer, T., Shenhav, S. R. (2024). IsraParlTweet: The Israeli parliamentary and Twitter resource. In N. Calzolari, M. Y. Kan & V. Hoste (Eds.), Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). ELRA and ICCL, Torino, Italia (pp 9372\u20139381).\u00a0https:\/\/aclanthology.org\/2024.lrec-main.819\/"},{"key":"9833_CR42","doi-asserted-by":"crossref","unstructured":"Muchnik, M. (2015). The Gender Challenge of Hebrew, The Brill reference library of Judaism (Vol. 42). Brill.","DOI":"10.1163\/9789004282711"},{"key":"9833_CR43","unstructured":"Nanni, F., Osman, M., Cheng, Y. R., Ponzetto, S. P., Dietz, L. (2018). Ukparl: A semantified and topically organized corpus of political speeches. In D. Fi\u0161er, M. Eskevich & F. de\u00a0Jong (Eds.), Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). European Language Resources Association (ELRA), Paris, France"},{"key":"9833_CR44","unstructured":"Navarretta, C., Haltrup\u00a0Hansen, D. (2024). Government and opposition in Danish parliamentary debates. In D. Fiser, M. Eskevich & D. Bordon (Eds.), Proceedings of the IV Workshop on Creating, Analysing, and Increasing Accessibility of Parliamentary Corpora (ParlaCLARIN) @ LREC-COLING 2024. ELRA and ICCL, Torino, Italia (pp 154\u2013162)\u00a0https:\/\/aclanthology.org\/2024.parlaclarin-1.23\/"},{"key":"9833_CR45","doi-asserted-by":"publisher","unstructured":"Neudecker, C., Baierer, K., Gerber, M., Clausner, C., Antonacopoulos, A., Pletschacher, S. (2021). A survey of OCR evaluation tools and metrics. In Proceedings of the 6th International Workshop on Historical Document Imaging and Processing. 
Association for Computing Machinery, New York, NY, USA, HIP \u201921 (p 13-18).\u00a0https:\/\/doi.org\/10.1145\/3476887.3476888,","DOI":"10.1145\/3476887.3476888"},{"key":"9833_CR46","doi-asserted-by":"crossref","unstructured":"Nguyen, M. V., Lai, V.D., Veyseh, A. P. B., Nguyen, T. H. (2021). Trankit: A light-weight transformer-based toolkit for multilingual natural language processing. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations","DOI":"10.18653\/v1\/2021.eacl-demos.10"},{"key":"9833_CR47","unstructured":"Ogrodniczuk, M., Nito\u0144, B. (2020). New developments in the Polish parliamentary corpus. In\u00a0D Fi\u0161er, M. Eskevich &\u00a0F. de\u00a0Jong (Eds.), Proceedings of the Second ParlaCLARIN Workshop. European Language Resources Association, Marseille, France (pp 1\u20134).\u00a0https:\/\/aclanthology.org\/2020.parlaclarin-1.1\/"},{"key":"9833_CR48","unstructured":"Pan\u010dur, A., Erjavec, T. (2020). The siParl corpus of Slovene parliamentary proceedings. In D. Fi\u0161er, M. Eskevich & F. de\u00a0Jong (Eds.), Proceedings of the Second ParlaCLARIN Workshop. European Language Resources Association, Marseille, France (pp 28\u201334).\u00a0https:\/\/aclanthology.org\/2020.parlaclarin-1.6\/"},{"key":"9833_CR49","doi-asserted-by":"publisher","unstructured":"Qi, P., Zhang, Y., Zhang, Y., Bolton, J., Manning, C. D. (2020). Stanza: A Python natural language processing toolkit for many human languages. In A. Celikyilmaz & T. H. Wen (Eds.), Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations. Association for Computational Linguistics, Online (pp 101\u2013108).\u00a0https:\/\/doi.org\/10.18653\/v1\/2020.acl-demos.14,","DOI":"10.18653\/v1\/2020.acl-demos.14"},{"key":"9833_CR50","doi-asserted-by":"publisher","unstructured":"Rabinovich, E., Sultani, M., Stevenson, S. (2019). 
CodeSwitch-Reddit: Exploration of written multilingual discourse in online discussion forums. In K. Inui,\u00a0J. Jiang & V. Ng (Eds.), Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Association for Computational Linguistics, Hong Kong, China (pp 4776\u20134786).\u00a0https:\/\/doi.org\/10.18653\/v1\/D19-1484","DOI":"10.18653\/v1\/D19-1484"},{"key":"9833_CR51","unstructured":"Ratcliff, J. W. (1988). Pattern matching: the gestalt approach. Dr. Dobb\u2019s Journal. https:\/\/www.drdobbs.com\/database\/pattern-matching-the-gestalt-approach\/184407970?pgno=5"},{"key":"9833_CR52","doi-asserted-by":"publisher","first-page":"807","DOI":"10.1007\/s10579-019-09458-4","volume":"53","author":"A Rubinstein","year":"2019","unstructured":"Rubinstein, A. (2019). Historical corpora meet the digital humanities: The Jerusalem Corpus of Emergent Modern Hebrew. Language Resources and Evaluation, 53, 807\u2013835.","journal-title":"Language Resources and Evaluation"},{"key":"9833_CR53","unstructured":"R\u00fanarsson, K., Sigur\u00f0sson, E. F. (2020). Parsing Icelandic Al\u00feingi transcripts: Parliamentary speeches as a genre. In D. Fi\u0161er, M. Eskevich & F. de\u00a0Jong (Eds.), Proceedings of the Second ParlaCLARIN Workshop. European Language Resources Association, Marseille, France (pp 44\u201350).\u00a0https:\/\/aclanthology.org\/2020.parlaclarin-1.9\/"},{"key":"9833_CR54","doi-asserted-by":"publisher","unstructured":"Sade, S., Seker, A., Tsarfaty, R. (2018). The Hebrew Universal Dependency treebank: Past present and future. In M. C. de\u00a0Marneffe, T. Lynn & S. Schuster (Eds.), Proceedings of the Second Workshop on Universal Dependencies (UDW 2018). 
Association for Computational Linguistics, Brussels, Belgium (pp 133\u2013143).\u00a0https:\/\/doi.org\/10.18653\/v1\/W18-6016","DOI":"10.18653\/v1\/W18-6016"},{"key":"9833_CR55","first-page":"185","volume-title":"Corpus linguistics by the Lune: A Festschrift for Geoffrey Leech","author":"HJ Schmid","year":"2003","unstructured":"Schmid, H. J. (2003). Do women and men really live in different cultures? evidence from the BNC. In A. Wilson, P. Rayson, & T. McEnery (Eds.), Corpus linguistics by the Lune: A Festschrift for Geoffrey Leech (pp. 185\u2013221). Peter Lang."},{"key":"9833_CR56","doi-asserted-by":"publisher","unstructured":"Seker, A., Bandel, E., Bareket, D., Brusilovsky, I., Greenfeld, R., Tsarfaty R. (2022). AlephBERT: Language model pre-training and evaluation from sub-word to sentence level. In S. Muresan, P. Nakov & A. Villavicencio (Eds.), Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Dublin, Ireland (pp 46\u201356).\u00a0https:\/\/doi.org\/10.18653\/v1\/2022.acl-long.4","DOI":"10.18653\/v1\/2022.acl-long.4"},{"key":"9833_CR57","doi-asserted-by":"publisher","unstructured":"Shibata, D., Wakamiya, S., Ito, K., Miyabe, M., Kinoshita, A., Aramaki E. (2018). Vocabchecker: Measuring language abilities for detecting early stage dementia. In Proceedings of the 23rd International Conference on Intelligent User Interfaces Companion. Association for Computing Machinery, New York, NY, USA, IUI \u201918 Companion.\u00a0https:\/\/doi.org\/10.1145\/3180308.3180332","DOI":"10.1145\/3180308.3180332"},{"key":"9833_CR58","doi-asserted-by":"publisher","unstructured":"Shmidman, A., Shmidman, C. S., Bareket, D., Koppel, M., Tsarfaty, R. (2023a). Do pretrained contextual language models distinguish between Hebrew homograph analyses? In A. Vlachos & I. 
Augenstein (Eds.), Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, Dubrovnik, Croatia (pp 849\u2013864).\u00a0https:\/\/doi.org\/10.18653\/v1\/2023.eacl-main.59,","DOI":"10.18653\/v1\/2023.eacl-main.59"},{"key":"9833_CR59","unstructured":"Shmidman, S., Shmidman, A., Koppel, M. (2023b). DictaBERT: A state-of-the-art BERT suite for Modern Hebrew. https:\/\/arxiv.org\/abs\/2308.16687,"},{"issue":"2","key":"9833_CR60","first-page":"247","volume":"42","author":"K Sima\u2019an","year":"2001","unstructured":"Sima\u2019an, K., Itai, A., Winter, Y., Altman, A., & Nativ, N. (2001). Building a tree-bank of Modern Hebrew text. Traitement Automatique des Langues, 42(2), 247\u2013380.","journal-title":"Traitement Automatique des Langues"},{"key":"9833_CR61","unstructured":"Skubic, J., Fi\u0161er, D. (2022). Parliamentary discourse research in sociology: Literature review. In D. Fi\u0161er, M. Eskevich & J. Lenardi\u010d (Eds.), Proceedings of the Workshop ParlaCLARIN III within the 13th Language Resources and Evaluation Conference. European Language Resources Association, Marseille, France (pp 81\u201391).\u00a0https:\/\/aclanthology.org\/2022.parlaclarin-1.12\/"},{"key":"9833_CR62","unstructured":"Solberg, P. E., Ortiz, P. (2022). The Norwegian parliamentary speech corpus. In N. Calzolari, F. B\u00e9chet &\u00a0P. Blach (Eds.), Proceedings of the Thirteenth Language Resources and Evaluation Conference. European Language Resources Association, Marseille, France (pp 1003\u20131008).\u00a0https:\/\/aclanthology.org\/2022.lrec-1.106\/"},{"key":"9833_CR63","doi-asserted-by":"publisher","unstructured":"Speer, R. (2022). rspeer\/wordfreq: v3.0. https:\/\/doi.org\/10.5281\/zenodo.7199437, software package","DOI":"10.5281\/zenodo.7199437"},{"key":"9833_CR64","first-page":"17","volume":"32","author":"S Staples","year":"2016","unstructured":"Staples, S., & Reppen, R. (2016). 
Understanding first-year L2 writing: A lexico-grammatical analysis across L1s, genres, and language ratings. Journal of Second Language\u00a0Writing, 32, 17\u201335.","journal-title":"Journal of Second Language\u00a0Writing"},{"key":"9833_CR65","unstructured":"\u0160tajner, S., Mitkov, R. (2012). Using comparable corpora to track diachronic and synchronic changes in lexical density and lexical richness. In R. Rapp, M. Tadic & S. Sharoff (Eds.), Proceedings of the 5th Workshop on Building and Using Comparable Corpora, held in conjunction with LREC\u201912. European Language Resources Association (ELRA), Istanbul, Turkey"},{"issue":"2","key":"9833_CR66","doi-asserted-by":"publisher","first-page":"431","DOI":"10.1007\/s10579-020-09510-8","volume":"55","author":"FM Tyers","year":"2021","unstructured":"Tyers, F. M., & Howell, N. (2021). Morphological analysis and disambiguation for Breton. Language Resources and Evaluation, 55(2), 431\u2013473. https:\/\/doi.org\/10.1007\/s10579-020-09510-8","journal-title":"Language Resources and Evaluation"},{"key":"9833_CR67","doi-asserted-by":"publisher","unstructured":"Wintner, S. (2014). Morphological processing of Semitic languages. In I. Zitouni (Ed.), Natural Language Processing of Semitic Languages. Theory and Applications of Natural Language Processing, Springer, Berlin Heidelberg (pp 43\u201366).\u00a0https:\/\/doi.org\/10.1007\/978-3-642-45358-8_2","DOI":"10.1007\/978-3-642-45358-8_2"},{"issue":"8","key":"9833_CR68","doi-asserted-by":"publisher","first-page":"1485","DOI":"10.4304\/tpls.3.8.1485-1489","volume":"3","author":"X Xia","year":"2013","unstructured":"Xia, X. (2013). Gender differences in using language. Theory and Practice in Language Studies, 3(8), 1485\u20131489. https:\/\/doi.org\/10.4304\/tpls.3.8.1485-1489","journal-title":"Theory and Practice in Language Studies"},{"key":"9833_CR69","unstructured":"Yrj\u00e4n\u00e4inen, V. 
A., Mohammadi\u00a0Nor\u00e9n, F., Borges, R., Jarlbrink, J., \u00c5berg Brorsson, L., Olsson, A. P., Snickars, P., Magnusson, M. (2024). The Swedish parliament corpus 1867 \u2013 2022. In N. Calzolari, M. Y. Kan & V. Hoste (Eds.), Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). ELRA and ICCL, Torino, Italia (pp 16100\u201316112).\u00a0https:\/\/aclanthology.org\/2024.lrec-main.1400\/"},{"key":"9833_CR70","doi-asserted-by":"publisher","unstructured":"Zeldes, A., Howell, N., Ordan, N., Ben Moshe, Y. (2022). A second wave of UD Hebrew treebanking and cross-domain parsing. In Y. Goldberg, Z. Kozareva &\u00a0Y. Zhang (Eds.), Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Abu Dhabi, United Arab Emirates (pp 4331\u20134344).\u00a0https:\/\/doi.org\/10.18653\/v1\/2022.emnlp-main.292","DOI":"10.18653\/v1\/2022.emnlp-main.292"},{"key":"9833_CR71","doi-asserted-by":"publisher","unstructured":"Zeman, D., Haji\u010d, J., Popel, M., Potthast, M., Straka, M., Ginter, F., Nivre, J., Petrov, S. (2018). CoNLL 2018 shared task: Multilingual parsing from raw text to Universal Dependencies. In D. Zeman &\u00a0J. Haji\u010d (Eds.), Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies. Association for Computational Linguistics, Brussels, Belgium (pp 1\u201321).\u00a0https:\/\/doi.org\/10.18653\/v1\/K18-2001","DOI":"10.18653\/v1\/K18-2001"},{"key":"9833_CR72","doi-asserted-by":"publisher","unstructured":"Zhou, X., Gao, Y., Lu, X. (2023). Lexical complexity changes in 100 years\u2019 academic writing: Evidence from Nature Biology Letters. Journal of English for Academic Purposes 64:101262. 
https:\/\/doi.org\/10.1016\/j.jeap.2023.101262, https:\/\/www.sciencedirect.com\/science\/article\/pii\/S1475158523000486","DOI":"10.1016\/j.jeap.2023.101262"}],"container-title":["Language Resources and Evaluation"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10579-025-09833-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10579-025-09833-4\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10579-025-09833-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,6]],"date-time":"2025-09-06T16:07:56Z","timestamp":1757174876000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10579-025-09833-4"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,5,28]]},"references-count":72,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2025,9]]}},"alternative-id":["9833"],"URL":"https:\/\/doi.org\/10.1007\/s10579-025-09833-4","relation":{},"ISSN":["1574-020X","1574-0218"],"issn-type":[{"type":"print","value":"1574-020X"},{"type":"electronic","value":"1574-0218"}],"subject":[],"published":{"date-parts":[[2025,5,28]]},"assertion":[{"value":"23 April 2025","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"28 May 2025","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}