{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,4]],"date-time":"2026-05-04T11:22:48Z","timestamp":1777893768193,"version":"3.51.4"},"reference-count":36,"publisher":"Springer Science and Business Media LLC","license":[{"start":{"date-parts":[[2026,5,1]],"date-time":"2026-05-01T00:00:00Z","timestamp":1777593600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2026,5,1]],"date-time":"2026-05-01T00:00:00Z","timestamp":1777593600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100002341","name":"Research Council of Finland","doi-asserted-by":"publisher","award":["356314"],"award-info":[{"award-number":["356314"]}],"id":[{"id":"10.13039\/501100002341","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100005222","name":"University of Jyv\u00e4skyl\u00e4","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100005222","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Scientometrics"],"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Some performance-based research funding systems rely on expert-assigned journal rankings to allocate resources and guide research evaluation. In Finland, the JuFo system provides journal rankings, determined by experts who assess journals using available metadata, such as bibliometric indicators, alongside qualitative judgment. While prior work has explored machine learning approaches to approximate these rankings, the recent emergence of large language models (LLMs) offers new possibilities for automated, data-driven evaluation. 
In this study, we examine how well LLMs can replicate JuFo rankings when given the same structured information available to experts, including citation metrics, disciplinary assignments, and publisher metadata. We systematically compare LLM predictions to expert-assigned JuFo ranks using a confusion-matrix analysis to identify cases of alignment and deviation. Our research addresses two key questions: (1) how accurately LLMs estimate journal rankings, and (2) in which situations their predictions diverge from expert judgments and which factors explain these discrepancies. Our findings show that LLMs approximate expert-assigned rankings with high overall accuracy, with most errors occurring between adjacent levels. However, their performance varies systematically across disciplines, and they tend to under-predict top-tier journals, particularly in social sciences and humanities fields.<\/jats:p>","DOI":"10.1007\/s11192-026-05644-8","type":"journal-article","created":{"date-parts":[[2026,5,1]],"date-time":"2026-05-01T07:01:42Z","timestamp":1777618902000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Comparing LLM and expert assessments of journal quality"],"prefix":"10.1007","author":[{"given":"Mirka","family":"Saarela","sequence":"first","affiliation":[]},{"given":"Janne","family":"P\u00f6l\u00f6nen","sequence":"additional","affiliation":[]},{"given":"Anna-Kaarina","family":"Linna","sequence":"additional","affiliation":[]},{"given":"Leena","family":"Wahlfors","sequence":"additional","affiliation":[]},{"given":"Ant\u00f3nio","family":"Correia","sequence":"additional","affiliation":[]},{"given":"Tommi","family":"K\u00e4rkk\u00e4inen","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2026,5,1]]},"reference":[{"issue":"4","key":"5644_CR1","doi-asserted-by":"publisher","first-page":"20","DOI":"10.2478\/jdis-2018-0018","volume":"3","author":"K 
Aagaard","year":"2018","unstructured":"Aagaard, K. (2018). Performance-based research funding in Denmark: The adoption and translation of the Norwegian model. Journal of Data and Information Science, 3(4), 20\u201330. https:\/\/doi.org\/10.2478\/jdis-2018-0018","journal-title":"Journal of Data and Information Science"},{"issue":"3","key":"5644_CR2","doi-asserted-by":"publisher","first-page":"767","DOI":"10.1007\/s11192-012-0632-x","volume":"92","author":"P Ahlgren","year":"2012","unstructured":"Ahlgren, P., Colliander, C., & Persson, O. (2012). Field normalized citation rates, field normalized journal impact and Norwegian weights for allocation of university research funds. Scientometrics, 92(3), 767\u2013780. https:\/\/doi.org\/10.1007\/s11192-012-0632-x","journal-title":"Scientometrics"},{"key":"5644_CR3","doi-asserted-by":"publisher","unstructured":"Cantone, G. G., Zheng, E.-T., Tomaselli, V., & Nightingale, P. (2025). Estimation of disciplinary similarity with large language models. Scientometrics, 130(10), 5345\u20135373. https:\/\/doi.org\/10.1007\/s11192-025-05385-0","DOI":"10.1007\/s11192-025-05385-0"},{"issue":"1","key":"5644_CR4","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipm.2025.104336","volume":"63","author":"J Chen","year":"2026","unstructured":"Chen, J., Wu, M., Liu, Q., & Zhang, Y. (2026). Explainable prediction of knowledge recombination: a synergized method with heterogeneous hypergraph learning and large language models. Information Processing & Management, 63(1), Article 104336. https:\/\/doi.org\/10.1016\/j.ipm.2025.104336","journal-title":"Information Processing & Management"},{"issue":"4","key":"5644_CR5","doi-asserted-by":"publisher","first-page":"2469","DOI":"10.1007\/s11192-024-04939-y","volume":"129","author":"J de Winter","year":"2024","unstructured":"de Winter, J. (2024). Can ChatGPT be used to predict citation counts, readership, and social media interaction? An exploration among 2222 scientific abstracts. 
Scientometrics, 129(4), 2469\u20132487. https:\/\/doi.org\/10.1007\/s11192-024-04939-y","journal-title":"Scientometrics"},{"issue":"2","key":"5644_CR6","doi-asserted-by":"publisher","first-page":"251","DOI":"10.1016\/j.respol.2011.09.007","volume":"41","author":"D Hicks","year":"2012","unstructured":"Hicks, D. (2012). Performance-based university research funding systems. Research Policy, 41(2), 251\u2013261. https:\/\/doi.org\/10.1016\/j.respol.2011.09.007","journal-title":"Research Policy"},{"issue":"7548","key":"5644_CR7","doi-asserted-by":"publisher","first-page":"429","DOI":"10.1038\/520429a","volume":"520","author":"D Hicks","year":"2015","unstructured":"Hicks, D., Wouters, P., Waltman, L., De Rijcke, S., & Rafols, I. (2015). The Leiden Manifesto for research metrics. Nature News, 520(7548), 429. https:\/\/doi.org\/10.1038\/520429a","journal-title":"Nature News"},{"issue":"1","key":"5644_CR8","doi-asserted-by":"publisher","first-page":"463","DOI":"10.1007\/s11192-018-2711-0","volume":"116","author":"E Kulczycki","year":"2018","unstructured":"Kulczycki, E., Engels, T. C., P\u00f6l\u00f6nen, J., Bruun, K., Du\u0161kov\u00e1, M., Guns, R., et al. (2018). Publication patterns in the social sciences and humanities: Evidence from eight European countries. Scientometrics, 116(1), 463\u2013486. https:\/\/doi.org\/10.1007\/s11192-018-2711-0","journal-title":"Scientometrics"},{"issue":"12","key":"5644_CR9","doi-asserted-by":"publisher","first-page":"1741","DOI":"10.1002\/asi.24706","volume":"73","author":"E Kulczycki","year":"2022","unstructured":"Kulczycki, E., Huang, Y., Zuccala, A. A., Engels, T. C., Ferrara, A., Guns, R., & Zhang, L. (2022). Uses of the journal impact factor in national journal rankings in China and Europe. Journal of the American Society for Information Science, 73(12), 1741\u20131754. 
https:\/\/doi.org\/10.1002\/asi.24706","journal-title":"Journal of the American Society for Information Science"},{"issue":"2","key":"5644_CR10","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3697010","volume":"34","author":"S Ouyang","year":"2025","unstructured":"Ouyang, S., Zhang, J. M., Harman, M., & Wang, M. (2025). An empirical study of the non-determinism of ChatGPT in code generation. ACM Transactions on Software Engineering and Methodology, 34(2), 1\u201328. https:\/\/doi.org\/10.1145\/3697010","journal-title":"ACM Transactions on Software Engineering and Methodology"},{"issue":"4","key":"5644_CR11","doi-asserted-by":"publisher","first-page":"31","DOI":"10.2478\/jdis-2018-0019","volume":"3","author":"J P\u00f6l\u00f6nen","year":"2018","unstructured":"P\u00f6l\u00f6nen, J. (2018). Applications of, and experiences with, the Norwegian Model in Finland. Journal of Data and Information Science, 3(4), 31\u201344. https:\/\/doi.org\/10.2478\/jdis-2018-0019","journal-title":"Journal of Data and Information Science"},{"key":"5644_CR12","doi-asserted-by":"crossref","unstructured":"P\u00f6l\u00f6nen, J., Guns, R., Engels, T.C. (2023). Journal metrics as predictors of Research Excellence Framework 2021 results: Comparison of impact factor quartiles and Finnish expert-ratings. 27th International Conference on Science, Technology and Innovation Indicators (STI 2023), 27-29 September, 2023, Leiden, Netherlands (pp. 1\u201311). https:\/\/repository.uantwerpen.be\/docman\/irua\/226ac5motoMc9","DOI":"10.55835\/643e529c0b149e8673ee2d95"},{"issue":"1","key":"5644_CR13","doi-asserted-by":"publisher","first-page":"50","DOI":"10.2478\/jdis-2021-0004","volume":"6","author":"J P\u00f6l\u00f6nen","year":"2021","unstructured":"P\u00f6l\u00f6nen, J., Guns, R., Kulczycki, E., Sivertsen, G., & Engels, T. C. (2021). National lists of scholarly publication channels: an overview and recommendations for their construction and maintenance. 
Journal of Data and Information Science, 6(1), 50\u201386. https:\/\/doi.org\/10.2478\/jdis-2021-0004","journal-title":"Journal of Data and Information Science"},{"key":"5644_CR14","doi-asserted-by":"publisher","unstructured":"Pylv\u00e4n\u00e4inen, E., & P\u00f6l\u00f6nen, J. (2023). Publication forum review of ratings in 2022 (Tech. Rep. No. 17). Federation of Finnish Learned Societies. https:\/\/doi.org\/10.23847\/tsv.618","DOI":"10.23847\/tsv.618"},{"issue":"2","key":"5644_CR15","doi-asserted-by":"publisher","DOI":"10.1016\/j.dsx.2024.102946","volume":"18","author":"A Saad","year":"2024","unstructured":"Saad, A., Jenko, N., Ariyaratne, S., Birch, N., Iyengar, K. P., Davies, A. M., & Botchu, R. (2024). Exploring the potential of ChatGPT in the peer review process: An observational study. Diabetes & Metabolic Syndrome: Clinical Research & Reviews, 18(2), Article 102946. https:\/\/doi.org\/10.1016\/j.dsx.2024.102946","journal-title":"Diabetes & Metabolic Syndrome: Clinical Research & Reviews"},{"key":"5644_CR16","doi-asserted-by":"publisher","unstructured":"Saarela, M. (2024). On the relation of causality-versus correlation-based feature selection on model fairness. Proceedings of the 39th ACM\/SIGAPP Symposium on Applied Computing (pp. 56\u201364). https:\/\/doi.org\/10.1145\/3605098.3636018","DOI":"10.1145\/3605098.3636018"},{"key":"5644_CR17","doi-asserted-by":"crossref","unstructured":"Saarela, M., Correia, A., K\u00e4rkk\u00e4inen, T. (2025). Explainable and interactive scientometrics with large language models and knowledge graphs. 2025 9th International Symposium on Multidisciplinary Studies and Innovative Technologies (ISMSIT) (p. 1\u20136). https:\/\/ieeexplore.ieee.org\/document\/11268107","DOI":"10.1109\/ISMSIT67332.2025.11268107"},{"issue":"2","key":"5644_CR18","doi-asserted-by":"publisher","DOI":"10.1016\/j.joi.2020.101008","volume":"14","author":"M Saarela","year":"2020","unstructured":"Saarela, M., & K\u00e4rkk\u00e4inen, T. (2020). 
Can we automate expert-based journal rankings? Analysis of the Finnish publication indicator. Journal of Informetrics, 14(2), Article 101008. https:\/\/doi.org\/10.1016\/j.joi.2020.101008","journal-title":"Journal of Informetrics"},{"issue":"3","key":"5644_CR19","doi-asserted-by":"publisher","first-page":"693","DOI":"10.1016\/j.joi.2016.03.004","volume":"10","author":"M Saarela","year":"2016","unstructured":"Saarela, M., K\u00e4rkk\u00e4inen, T., Lahtonen, T., & Rossi, T. (2016). Expert-based versus citation-based ranking of scholarly and scientific publication channels. Journal of Informetrics, 10(3), 693\u2013718. https:\/\/doi.org\/10.1016\/j.joi.2016.03.004","journal-title":"Journal of Informetrics"},{"issue":"19","key":"5644_CR20","doi-asserted-by":"publisher","first-page":"8884","DOI":"10.3390\/app14198884","volume":"14","author":"M Saarela","year":"2024","unstructured":"Saarela, M., & Podgorelec, V. (2024). Recent applications of explainable AI (XAI): A systematic literature review. Applied Sciences, 14(19), 8884. https:\/\/doi.org\/10.3390\/app14198884","journal-title":"Applied Sciences"},{"issue":"3","key":"5644_CR21","doi-asserted-by":"publisher","first-page":"364","DOI":"10.1057\/eps.2009.19","volume":"8","author":"JW Schneider","year":"2009","unstructured":"Schneider, J. W. (2009). An outline of the bibliometric indicator used for performance-based funding of research institutions in Norway. European Political Science, 8(3), 364\u2013378. https:\/\/doi.org\/10.1057\/eps.2009.19","journal-title":"European Political Science"},{"key":"5644_CR22","doi-asserted-by":"publisher","DOI":"10.1038\/s43588-025-00906-6","author":"E Shao","year":"2025","unstructured":"Shao, E., Wang, Y., Qian, Y., Pan, Z., Liu, H., & Wang, D. (2025). SciSciGPT: Advancing human-AI collaboration in the science of science. Nature Computational Science. 
https:\/\/doi.org\/10.1038\/s43588-025-00906-6","journal-title":"Nature Computational Science"},{"issue":"2","key":"5644_CR23","doi-asserted-by":"publisher","first-page":"357","DOI":"10.1007\/s11192-016-1845-1","volume":"107","author":"G Sivertsen","year":"2016","unstructured":"Sivertsen, G. (2016). Patterns of internationalization and criteria for research assessment in the social sciences and humanities. Scientometrics, 107(2), 357\u2013368. https:\/\/doi.org\/10.1007\/s11192-016-1845-1","journal-title":"Scientometrics"},{"issue":"4","key":"5644_CR24","doi-asserted-by":"publisher","first-page":"3","DOI":"10.2478\/jdis-2018-0017","volume":"3","author":"G Sivertsen","year":"2018","unstructured":"Sivertsen, G. (2018). The Norwegian model in Norway. Journal of Data and Information Science, 3(4), 3\u201319. https:\/\/doi.org\/10.2478\/jdis-2018-0017","journal-title":"Journal of Data and Information Science"},{"issue":"2","key":"5644_CR25","doi-asserted-by":"publisher","first-page":"61","DOI":"10.2478\/dim-2019-0008","volume":"3","author":"G Sivertsen","year":"2019","unstructured":"Sivertsen, G. (2019). Understanding and evaluating research and scholarly publishing in the Social Sciences and Humanities (SSH). Data and Information Management, 3(2), 61\u201371. https:\/\/doi.org\/10.2478\/dim-2019-0008","journal-title":"Data and Information Management"},{"key":"5644_CR26","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-69580-3_11","author":"G Sivertsen","year":"2024","unstructured":"Sivertsen, G., & Aagaard, K. (2024). Designing performance-based research funding systems. Challenges in Research Policy: Evidence-based policy briefs with recommendations. 
https:\/\/doi.org\/10.1007\/978-3-031-69580-3_11","journal-title":"Challenges in Research Policy: Evidence-based policy briefs with recommendations"},{"issue":"1","key":"5644_CR27","doi-asserted-by":"publisher","first-page":"7","DOI":"10.2478\/jdis-2025-0011","volume":"10","author":"M Thelwall","year":"2025","unstructured":"Thelwall, M. (2025a). Evaluating research quality with large language models: An analysis of ChatGPT\u2019s effectiveness with different settings and inputs. Journal of Data and Information Science, 10(1), 7\u201325. https:\/\/doi.org\/10.2478\/jdis-2025-0011","journal-title":"Journal of Data and Information Science"},{"issue":"10","key":"5644_CR28","doi-asserted-by":"publisher","first-page":"5309","DOI":"10.1007\/s11192-025-05361-8","volume":"130","author":"M Thelwall","year":"2025","unstructured":"Thelwall, M. (2025b). Research quality evaluation by AI in the era of large language models: advantages, disadvantages, and systemic effects-An opinion paper. Scientometrics, 130(10), 5309\u20135321. https:\/\/doi.org\/10.1007\/s11192-025-05361-8","journal-title":"Scientometrics"},{"key":"5644_CR29","doi-asserted-by":"publisher","unstructured":"Thelwall, M. (2025c). Responsible uses of large language models for research evaluation. Proceedings of the 20th International Conference on Scientometrics & Informetrics (ISSI 2025) (pp. 71\u201380). International Society for Scientometrics and Informetrics. https:\/\/doi.org\/10.51408\/issi2025_156","DOI":"10.51408\/issi2025_156"},{"issue":"4","key":"5644_CR30","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipm.2025.104123","volume":"62","author":"M Thelwall","year":"2025","unstructured":"Thelwall, M., Jiang, X., & Bath, P. A. (2025). Estimating the quality of published medical research with ChatGPT. Information Processing & Management, 62(4), Article 104123. 
https:\/\/doi.org\/10.1016\/j.ipm.2025.104123","journal-title":"Information Processing & Management"},{"issue":"2","key":"5644_CR31","doi-asserted-by":"publisher","first-page":"106","DOI":"10.2478\/jdis-2025-0016","volume":"10","author":"M Thelwall","year":"2025","unstructured":"Thelwall, M., & Kousha, K. (2025). Journal quality factors from ChatGPT: More meaningful than impact factors? Journal of Data and Information Science, 10(2), 106\u2013123. https:\/\/doi.org\/10.2478\/jdis-2025-0016","journal-title":"Journal of Data and Information Science"},{"key":"5644_CR32","doi-asserted-by":"publisher","DOI":"10.1007\/s11192-025-05393-0","author":"M Thelwall","year":"2025","unstructured":"Thelwall, M., & Kurt, Z. (2025). Research evaluation with ChatGPT: Is it age, country, length, or field biased? Scientometrics. https:\/\/doi.org\/10.1007\/s11192-025-05393-0","journal-title":"Scientometrics"},{"key":"5644_CR33","doi-asserted-by":"publisher","DOI":"10.1007\/s11192-026-05585-2","author":"M Thelwall","year":"2025","unstructured":"Thelwall, M., & Mohammadi, E. (2025). Can small and reasoning large language models score journal articles for research quality and do averaging and few-shot help? Scientometrics. https:\/\/doi.org\/10.1007\/s11192-026-05585-2","journal-title":"Scientometrics"},{"key":"5644_CR34","doi-asserted-by":"publisher","DOI":"10.1007\/s11192-025-05287-1","author":"M Thelwall","year":"2025","unstructured":"Thelwall, M., & Yaghi, A. (2025). Evaluating the predictive capacity of ChatGPT for academic peer review outcomes across multiple platforms. Scientometrics. https:\/\/doi.org\/10.1007\/s11192-025-05287-1","journal-title":"Scientometrics"},{"issue":"1","key":"5644_CR35","doi-asserted-by":"publisher","first-page":"105","DOI":"10.1093\/scipol\/scy041","volume":"46","author":"T Zacharewicz","year":"2019","unstructured":"Zacharewicz, T., Lepori, B., Reale, E., & Jonkers, K. (2019). 
Performance-based research funding in EU Member States-a comparative assessment. Science and Public Policy, 46(1), 105\u2013115. https:\/\/doi.org\/10.1093\/scipol\/scy041","journal-title":"Science and Public Policy"},{"key":"5644_CR36","doi-asserted-by":"crossref","unstructured":"Zhou, R., Chen, L., Yu, K. (2024). Is LLM a reliable reviewer? A comprehensive evaluation of LLM on automatic paper reviewing tasks. Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) (pp. 9340\u20139351). https:\/\/aclanthology.org\/2024.lrec-main.816\/","DOI":"10.63317\/48d359hjdvog"}],"container-title":["Scientometrics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11192-026-05644-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11192-026-05644-8","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11192-026-05644-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,5,1]],"date-time":"2026-05-01T07:01:44Z","timestamp":1777618904000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11192-026-05644-8"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,5,1]]},"references-count":36,"alternative-id":["5644"],"URL":"https:\/\/doi.org\/10.1007\/s11192-026-05644-8","relation":{},"ISSN":["0138-9130","1588-2861"],"issn-type":[{"value":"0138-9130","type":"print"},{"value":"1588-2861","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,5,1]]},"assertion":[{"value":"25 January 2026","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"22 
April 2026","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"1 May 2026","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}