{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,27]],"date-time":"2026-05-27T03:02:26Z","timestamp":1779850946690,"version":"3.53.1"},"reference-count":38,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2026,3,14]],"date-time":"2026-03-14T00:00:00Z","timestamp":1773446400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2026,3,14]],"date-time":"2026-03-14T00:00:00Z","timestamp":1773446400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100006470","name":"Aristotle University of Thessaloniki","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100006470","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Comput Soc Sc"],"published-print":{"date-parts":[[2026,5]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Political discourse datasets are important for gaining political insights, analyzing communication strategies or social science phenomena. Although numerous political discourse corpora exist, comprehensive, high-quality, annotated datasets are scarce. This is largely due to the substantial manual effort, multidisciplinarity, and expertise required for the nuanced annotation of rhetorical strategies and ideological contexts. In this paper, we present AgoraSpeech, a meticulously curated, high-quality dataset of 171 political speeches from six parties during the Greek national elections in 2023. The dataset includes annotations (per paragraph) for six natural language processing (NLP) tasks: text classification, topic identification, sentiment analysis, named entity recognition, polarization and populism detection. 
A two-step annotation was employed, starting with ChatGPT-generated annotations and followed by exhaustive human-in-the-loop validation. The dataset was initially used in a case study to provide insights during the pre-election period. However, it has general applicability by serving as a rich source of information for political and social scientists, journalists, or data scientists, while it can be used for benchmarking and fine-tuning NLP and large language models (LLMs).<\/jats:p>","DOI":"10.1007\/s42001-026-00469-0","type":"journal-article","created":{"date-parts":[[2026,3,14]],"date-time":"2026-03-14T02:59:00Z","timestamp":1773457140000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Agoraspeech: a multi-annotated comprehensive dataset of political discourse through the lens of humans and AI"],"prefix":"10.1007","volume":"9","author":[{"given":"Pavlos","family":"Sermpezis","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Stelios","family":"Karamanidis","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Eva","family":"Paraschou","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Ilias","family":"Dimitriadis","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Sofia","family":"Yfantidou","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Filitsa-Ioanna","family":"Kouskouveli","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Thanasis","family":"Troboukis","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Kelly","family":"Kiki","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":
"Antonis","family":"Galanopoulos","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Athena","family":"Vakali","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2026,3,14]]},"reference":[{"key":"469_CR1","doi-asserted-by":"crossref","unstructured":"Ahrens, K., Zeng, H., & Wong, S.-H.R. (2018). Using a corpus of english and chinese political speeches for metaphor analysis. In Proceedings of the eleventh international conference on language resources and evaluation (LREC 2018).","DOI":"10.63317\/2a8cuutk5yc9"},{"key":"469_CR2","doi-asserted-by":"publisher","DOI":"10.1057\/9780230245235_7","volume-title":"Metaphor and Gender in British Parliamentary Debates","author":"J Charteris-Black","year":"2009","unstructured":"Charteris-Black, J. (2009). Metaphor and Gender in British Parliamentary Debates. London: Springer."},{"issue":"1","key":"469_CR3","doi-asserted-by":"publisher","first-page":"33","DOI":"10.1080\/19331680802149608","volume":"5","author":"B Yu","year":"2008","unstructured":"Yu, B., Kaufmann, S., & Diermeier, D. (2008). Classifying party affiliation from political speech. Journal of Information Technology & Politics, 5(1), 33\u201348.","journal-title":"Journal of Information Technology & Politics"},{"key":"469_CR4","doi-asserted-by":"crossref","unstructured":"Dilai, I., & Dilai, M. (2020). Automatic extraction of keywords in political speeches. In 2020 IEEE 15th international conference on computer sciences and information technologies (CSIT), (Vol. 1, pp. 291\u2013294). IEEE.","DOI":"10.1109\/CSIT49958.2020.9322011"},{"key":"469_CR5","unstructured":"Populismus corpus_03 (improved edition) (2020). CLARIN:EL. http:\/\/hdl.handle.net\/11500\/AUTH-0000-0000-5DFC-D."},{"key":"469_CR6","doi-asserted-by":"crossref","unstructured":"Card, D., Boydstun, A., Gross, J. H., Resnik, P., & Smith, N. A. (2015). 
The media frames corpus: Annotations of frames across issues. In Proceedings of the 53rd annual meeting of the association for computational linguistics and the 7th international joint conference on natural language processing (Volume 2: Short Papers), pp. 438\u2013444.","DOI":"10.3115\/v1\/P15-2072"},{"key":"469_CR7","doi-asserted-by":"crossref","unstructured":"Wachsmuth, H., Naderi, N., Hou, Y., Bilu, Y., Prabhakaran, V., Thijm, T.A., Hirst, G., & Stein, B. (2017). Computational argumentation quality assessment in natural language. In Proceedings of the 15th conference of the European chapter of the association for computational linguistics Volume 1, Long Papers, pp. 176\u2013187.","DOI":"10.18653\/v1\/E17-1017"},{"key":"469_CR8","unstructured":"Language, I., & Center, S. P. -A. R. (2020). Greek Parliament topic modeling visualization corpus. CLARIN:EL. http:\/\/hdl.handle.net\/11500\/ATHENA-0000-0000-5E18-D."},{"key":"469_CR9","unstructured":"Kuzman, T., Ljube\u0161i\u0107, N., Erjavec, T., Kopp, M., Ogrodniczuk, M., Osenova, P., Rayson, P., Vidler, J., Agerri, R., Agirrezabal, M., Agnoloni, T., Aires, J., Albini, M., Alkorta, J., Antiba-Cartazo, I., Arrieta, E., Barcala, M., Bardanca, D., Barkarson, S., Bartolini, R., Battistoni, R., Bel, N., Bonet Ramos, M. d. 
M., Calzada P\u00e9rez, M., Cardoso, A., \u00c7\u00f6ltekin, \u00c7., Coole, M., Dar\u0123is, R., Does, J., Libano, R., Depoorter, G., Depuydt, K., Diwersy, S., Dod\u00e9, R., Fernandez, K., Fern\u00e1ndez Rei, E., Frontini, F., Garcia, M., Garc\u00eda D\u00edaz, N., Garc\u00eda Louzao, P., Gavriilidou, M., Gkoumas, D., Grigorov, I., Grigorova, V., Haltrup Hansen, D., Iruskieta, M., Jarlbrink, J., Jelencsik-M\u00e1tyus, K., Jongejan, B., Kahusk, N., Kirnbauer, M., Kryvenko, A., Ligeti-Nagy, N., Luxardo, G., Magari\u00f1os, C., Magnusson, M., Marchetti, C., Marx, M., Meden, K., Mendes, A., Mochtak, M., M\u00f6lder, M., Montemagni, S., Navarretta, C., Nito\u0144, B., Nor\u00e9n, F.M., Nwadukwe, A., Ojster\u0161ek, M., Pan\u010dur, A., Papavassiliou, V., Pereira, R., P\u00e9rez Lago, M., Piperidis, S., Pirker, H., Pisani, M., Pol, H.v.d., Prokopidis, P., Quochi, V., Regueira, X.L., Rudolf, M., Ruisi, M., Rupnik, P., Schopper, D., Simov, K., Sinikallio, L., Skubic, J., Tamper, M., Tungland, L.M., Tuominen, J., Heusden, R., Varga, Z., V\u00e1zquez Abu\u00edn, M., Venturi, G., Vidal Migu\u00e9ns, A., Vider, K., Vivel Couso, A., Vladu, A.I., Wissik, T., Yrj\u00e4n\u00e4inen, V., Zevallos, R., & Fi\u0161er, D. (2023). Linguistically annotated multilingual comparable corpora of parliamentary debates in English ParlaMint-en.ana 4.0. Slovenian language resource repository CLARIN.SI. http:\/\/hdl.handle.net\/11356\/1864."},{"key":"469_CR10","doi-asserted-by":"publisher","unstructured":"Lehmann, P., Franzmann, S., Al-Gaddooa, D., Burst, T., Ivanusch, C., Regel, S., Riethm\u00f6ller, F., Volkens, A., We\u00dfels, B., & Zehnter, L. (2024). The Manifesto Data Collection. Manifesto Project (MRG\/CMP\/MARPOR). Version 2024a. Wissenschaftszentrum Berlin f\u00fcr Sozialforschung \/ G\u00f6ttinger Institut f\u00fcr Demokratieforschung, Berlin \/ G\u00f6ttingen. 
https:\/\/doi.org\/10.25522\/manifesto.mpds.2024a.","DOI":"10.25522\/manifesto.mpds.2024a"},{"key":"469_CR11","unstructured":"Di Bona, G., Fraxanet, E., Komander, B., Sasso, A. L., Morini, V., Vendeville, A., Falkenberg, M., & Galeazzi, A. (2024). Sampled datasets risk substantial bias in the identification of political polarization on social media. Preprint retrieved from arXiv:2406.19867."},{"key":"469_CR12","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1007\/s42001-021-00117-9","volume":"5","author":"E Chen","year":"2022","unstructured":"Chen, E., Deb, A., & Ferrara, E. (2022). #Election2020: the first public twitter dataset on the 2020 US presidential election. Journal of Computational Social Science, 5, 1\u201318.","journal-title":"Journal of Computational Social Science"},{"key":"469_CR13","doi-asserted-by":"crossref","unstructured":"Li, M., Shi, T., Ziems, C., Kan, M.-Y., Chen, N. F., Liu, Z., & Yang, D. (2023). Coannotating: Uncertainty-guided work allocation between human and large language models for data annotation. Preprint retrieved from arXiv:2310.15638.","DOI":"10.18653\/v1\/2023.emnlp-main.92"},{"key":"469_CR14","unstructured":"Madiega, T. (2021). Artificial intelligence act. European Parliament: European Parliamentary Research Service."},{"key":"469_CR15","first-page":"1877","volume":"33","author":"T Brown","year":"2020","unstructured":"Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., et al. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, 1877\u20131901.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"469_CR16","doi-asserted-by":"crossref","unstructured":"Gilardi, F., Alizadeh, M., & Kubli, M. (2023). Chatgpt outperforms crowd-workers for text-annotation tasks. 
Preprint retrieved from arXiv:2303.15056.","DOI":"10.1073\/pnas.2305016120"},{"key":"469_CR17","unstructured":"Zhu, Y., Zhang, P., Haq, E. -U., Hui, P., & Tyson, G. (2023). Can chatgpt reproduce human-generated labels? a study of social computing tasks. Preprint retrieved from arXiv:2304.10145."},{"key":"469_CR18","doi-asserted-by":"publisher","unstructured":"Huang, F., Kwak, H., & An, J. (2023). Is chatgpt better than human annotators? potential and limitations of chatgpt in explaining implicit hate speech. In Companion proceedings of the ACM web conference 2023. WWW \u201923 Companion, pp. 294\u2013297. Association for Computing Machinery, New York. https:\/\/doi.org\/10.1145\/3543873.3587368.","DOI":"10.1145\/3543873.3587368"},{"key":"469_CR19","doi-asserted-by":"publisher","DOI":"10.7759\/cureus.35029","author":"M Sallam","year":"2023","unstructured":"Sallam, M., Salim, N. A., Ala\u2019a, B., Barakat, M., Fayyad, D., Hallit, S., Harapan, H., Hallit, R., Mahafzah, A., & Ala\u2019a, B. (2023). Chatgpt output regarding compulsory vaccination and covid-19 vaccine conspiracy: A descriptive study at the outset of a paradigm shift in online search for information. Cureus. https:\/\/doi.org\/10.7759\/cureus.35029","journal-title":"Cureus"},{"key":"469_CR20","unstructured":"Baldwin, T., Cook, P., Lui, M., MacKinlay, A., & Wang, L. (2013). How noisy social media text, how diffrnt social media sources? In Proceedings of the sixth international joint conference on natural language processing, pp. 356\u2013364."},{"key":"469_CR21","doi-asserted-by":"publisher","first-page":"101068","DOI":"10.1016\/j.elerap.2021.101068","volume":"48","author":"Q Deng","year":"2021","unstructured":"Deng, Q., Hine, M. J., Ji, S., & Wang, Y. (2021). Understanding consumer engagement with brand posts on social media: The effects of post linguistic styles. 
Electronic Commerce Research and Applications, 48, 101068.","journal-title":"Electronic Commerce Research and Applications"},{"key":"469_CR22","doi-asserted-by":"crossref","unstructured":"Kiela, D., Bartolo, M., Nie, Y., Kaushik, D., Geiger, A., Wu, Z., Vidgen, B., Prasad, G., Singh, A., & Ringshia, P., et al. (2021). Dynabench: Rethinking benchmarking in nlp. Preprint retrieved from arXiv:2104.14337.","DOI":"10.18653\/v1\/2021.naacl-main.324"},{"issue":"1","key":"469_CR23","first-page":"1","volume":"3","author":"Y Gu","year":"2021","unstructured":"Gu, Y., Tinn, R., Cheng, H., Lucas, M., Usuyama, N., Liu, X., Naumann, T., Gao, J., & Poon, H. (2021). Domain-specific language model pretraining for biomedical natural language processing. ACM Transactions on Computing for Healthcare (HEALTH), 3(1), 1\u201323.","journal-title":"ACM Transactions on Computing for Healthcare (HEALTH)"},{"issue":"1","key":"469_CR24","doi-asserted-by":"publisher","first-page":"96","DOI":"10.1038\/s41597-019-0103-9","volume":"6","author":"H Harutyunyan","year":"2019","unstructured":"Harutyunyan, H., Khachatrian, H., Kale, D. C., Ver Steeg, G., & Galstyan, A. (2019). Multitask learning and benchmarking with clinical time series data. Scientific Data, 6(1), 96.","journal-title":"Scientific Data"},{"key":"469_CR25","unstructured":"Guo, B., Zhang, X., Wang, Z., Jiang, M., Nie, J., Ding, Y., Yue, J., & Wu, Y. (2023). How close is chatgpt to human experts? Comparison corpus, evaluation, and detection. Preprint retrieved from arXiv:2301.07597."},{"key":"469_CR26","doi-asserted-by":"publisher","DOI":"10.4324\/9780203561218","volume-title":"Analysing Political Discourse: Theory and Practice","author":"P Chilton","year":"2004","unstructured":"Chilton, P. (2004). Analysing Political Discourse: Theory and Practice. 
Oxfordshire: Routledge."},{"issue":"2","key":"469_CR27","doi-asserted-by":"publisher","first-page":"491","DOI":"10.1111\/ajps.12423","volume":"63","author":"K Benoit","year":"2019","unstructured":"Benoit, K., Munger, K., & Spirling, A. (2019). Measuring and explaining political sophistication through textual complexity. American Journal of Political Science, 63(2), 491\u2013508.","journal-title":"American Journal of Political Science"},{"issue":"3","key":"469_CR28","doi-asserted-by":"publisher","first-page":"267","DOI":"10.1093\/pan\/mps028","volume":"21","author":"J Grimmer","year":"2013","unstructured":"Grimmer, J., & Stewart, B. M. (2013). Text as data: The promise and pitfalls of automatic content analysis methods for political texts. Political Analysis, 21(3), 267\u2013297.","journal-title":"Political Analysis"},{"issue":"10","key":"469_CR29","doi-asserted-by":"publisher","first-page":"1531","DOI":"10.1177\/0956797615594620","volume":"26","author":"P Barber\u00e1","year":"2015","unstructured":"Barber\u00e1, P., Jost, J. T., Nagler, J., Tucker, J. A., & Bonneau, R. (2015). Tweeting from left to right: Is online political communication more than an echo chamber? Psychological Science, 26(10), 1531\u20131542.","journal-title":"Psychological Science"},{"key":"469_CR30","unstructured":"Troboukis, T., Kiki, K., Galanopoulos, A., Sermpezis, P., Karamanidis, S., Dimitriadis, I., & Vakali, A. (2024). Towards hybrid intelligence in journalism: Findings and lessons learnt from a collaborative analysis of greek political rhetoric by ChatGPT and humans. https:\/\/arxiv.org\/abs\/2410.13400."},{"key":"469_CR31","unstructured":"iMEdD Lab (2023). Elections 2023 - iMEdD Lab. https:\/\/lab.imedd.org\/en\/elections-2023\/."},{"key":"469_CR32","unstructured":"Kiki, K., Troboukis, T., Galanopoulos, A., Sermpezis, P., Karamanidis, S., & Dimitriadis, I. (2023). How we analyze the campaign speeches of political leaders. 
https:\/\/lab.imedd.org\/en\/pos-analyoume-tis-proeklogikes-omilies-ton-politikon-archigon\/. iMEdD Lab."},{"key":"469_CR33","unstructured":"Galanopoulos, A. (2023). Populism in pre-election political discourse in Greece. https:\/\/lab.imedd.org\/en\/populism-in-pre-election-political-discourse-in-greece\/. iMEdD Lab."},{"key":"469_CR34","unstructured":"Galanopoulos, A. (2023). Detecting polarization in the pre-election political discourse in Greece. https:\/\/lab.imedd.org\/en\/polarization-greek-political-speech\/. iMEdD Lab."},{"issue":"9","key":"469_CR35","first-page":"1","volume":"55","author":"P Liu","year":"2023","unstructured":"Liu, P., Yuan, W., Fu, J., Jiang, Z., Hayashi, H., & Neubig, G. (2023). Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys, 55(9), 1\u201335.","journal-title":"ACM Computing Surveys"},{"key":"469_CR36","doi-asserted-by":"crossref","unstructured":"Bang, Y., Cahyawijaya, S., Lee, N., Dai, W., Su, D., Wilie, B., Lovenia, H., Ji, Z., Yu, T., & Chung, W., et al. (2023). A multitask, multilingual, multimodal evaluation of chatgpt on reasoning, hallucination, and interactivity. Preprint retrieved from arXiv:2302.04023.","DOI":"10.18653\/v1\/2023.ijcnlp-main.45"},{"key":"469_CR37","doi-asserted-by":"publisher","DOI":"10.1037\/11491-005","volume-title":"The Proof and Measurement of Association Between Two Things","author":"C Spearman","year":"1961","unstructured":"Spearman, C. (1961). The Proof and Measurement of Association Between Two Things. Illinois: Appleton-Century-Crofts."},{"key":"469_CR38","volume-title":"Mathematical Methods of Statistics","author":"H Cram\u00e9r","year":"1999","unstructured":"Cram\u00e9r, H. (1999). Mathematical Methods of Statistics (Vol. 26). 
Princeton: Princeton University Press."}],"container-title":["Journal of Computational Social Science"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s42001-026-00469-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s42001-026-00469-0","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s42001-026-00469-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,5,27]],"date-time":"2026-05-27T02:16:29Z","timestamp":1779848189000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s42001-026-00469-0"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,3,14]]},"references-count":38,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2026,5]]}},"alternative-id":["469"],"URL":"https:\/\/doi.org\/10.1007\/s42001-026-00469-0","relation":{},"ISSN":["2432-2717","2432-2725"],"issn-type":[{"value":"2432-2717","type":"print"},{"value":"2432-2725","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,3,14]]},"assertion":[{"value":"6 September 2025","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"4 February 2026","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"14 March 2026","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declared no potential Conflict of interest with respect to the research, authorship, and\/or publication of this 
article.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}],"article-number":"36"}}