{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,10]],"date-time":"2026-06-10T00:47:07Z","timestamp":1781052427348,"version":"3.54.1"},"reference-count":50,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2024,12,18]],"date-time":"2024-12-18T00:00:00Z","timestamp":1734480000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,12,18]],"date-time":"2024-12-18T00:00:00Z","timestamp":1734480000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100010663","name":"H2020 European Research Council","doi-asserted-by":"publisher","award":["883121"],"award-info":[{"award-number":["883121"]}],"id":[{"id":"10.13039\/100010663","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100006447","name":"University of Zurich","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100006447","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Comput Soc Sc"],"published-print":{"date-parts":[[2025,2]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>This paper studies the performance of open-source Large Language Models (LLMs) in text classification tasks typical for political science research. By examining tasks like stance, topic, and relevance classification, we aim to guide scholars in making informed decisions about their use of LLMs for text analysis and to establish a baseline performance benchmark that demonstrates the models\u2019 effectiveness. Specifically, we conduct an assessment of both zero-shot and fine-tuned LLMs across a range of text annotation tasks using news articles and tweets datasets. 
Our analysis shows that fine-tuning improves the performance of open-source LLMs, allowing them to match or even surpass zero-shot GPT-3.5 and GPT-4, though still lagging behind fine-tuned GPT-3.5. We further establish that fine-tuning is preferable to few-shot training with a relatively modest quantity of annotated text. Our findings show that fine-tuned open-source LLMs can be effectively deployed in a broad spectrum of text annotation applications. 
We provide a Python notebook facilitating the application of LLMs in text annotation for other researchers.<\/jats:p>","DOI":"10.1007\/s42001-024-00345-9","type":"journal-article","created":{"date-parts":[[2024,12,18]],"date-time":"2024-12-18T08:16:14Z","timestamp":1734509774000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":53,"title":["Open-source LLMs for text annotation: a practical guide for model setting and fine-tuning"],"prefix":"10.1007","volume":"8","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6696-6471","authenticated-orcid":false,"given":"Meysam","family":"Alizadeh","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Ma\u00ebl","family":"Kubli","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Zeynab","family":"Samei","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Shirin","family":"Dehghani","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Mohammadmasiha","family":"Zahedivafa","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Juan D.","family":"Bermeo","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Maria","family":"Korobeynikova","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Fabrizio","family":"Gilardi","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2024,12,18]]},"reference":[{"key":"345_CR1","unstructured":"Alghisi, S., Rizzoli, M., Roccabruna, G., Mousavi, S.M., & Riccardi, G. (2024). Should we fine-tune or RAG? Evaluating different techniques to adapt llms for dialogue. 
arXiv preprint arXiv:2406.06399"},{"key":"345_CR2","doi-asserted-by":"crossref","unstructured":"Alizadeh, M., Gilardi, F., Hoes, E., Kl\u00fcser, K.J., Kubli, M., & Marchal, N. (2022). Content moderation as a political issue: the twitter discourse around trump\u2019s ban. Journal of Quantitative Description: Digital Media, 2.","DOI":"10.51685\/jqd.2022.023"},{"issue":"1","key":"345_CR3","doi-asserted-by":"publisher","first-page":"13703","DOI":"10.1038\/s41598-023-40716-2","volume":"13","author":"Meysam Alizadeh","year":"2023","unstructured":"Alizadeh, Meysam, Hoes, Emma, & Gilardi, Fabrizio. (2023). Tokenization of social media engagements increases the sharing of false (and other) news but penalization moderates it. Scientific Reports, 13(1), 13703.","journal-title":"Scientific Reports"},{"issue":"1","key":"345_CR4","doi-asserted-by":"publisher","first-page":"19","DOI":"10.1017\/pan.2020.8","volume":"29","author":"Pablo Barber\u00e1","year":"2021","unstructured":"Barber\u00e1, Pablo, Boydstun, Amber E., Linn, Suzanna, McMahon, Ryan, & Nagler, Jonathan. (2021). Automated text classification of news articles: a practical guide. Political Analysis, 29(1), 19\u201342.","journal-title":"Political Analysis"},{"key":"345_CR5","unstructured":"Binz, M., & Schulz, E. (2023). Turning large language models into cognitive models. arXiv preprint arXiv:2306.03917."},{"key":"345_CR6","first-page":"1877","volume":"33","author":"Tom Brown","year":"2020","unstructured":"Brown, Tom, Mann, Benjamin, Ryder, Nick, Subbiah, Melanie, Kaplan, Jared D., Dhariwal, Prafulla, Neelakantan, Arvind, Shyam, Pranav, Sastry, Girish, Askell, Amanda, et al. (2020). Language models are few-shot learners. Advances in neural information processing systems, 33, 1877\u20131901.","journal-title":"Advances in neural information processing systems"},{"key":"345_CR7","doi-asserted-by":"crossref","unstructured":"Card, D., Boydstun, A., Gross, J.H., Resnik, P., & Smith, N.A. (2015). 
The media frames corpus: annotations of frames across issues. In Proceedings of the 53rd annual meeting of the association for computational linguistics and the 7th international joint conference on natural language processing (volume 2: short papers), 438-444.","DOI":"10.3115\/v1\/P15-2072"},{"key":"345_CR8","unstructured":"Chung, H.W., Hou, L., Longpre, S., Zoph, B., Tay, Y., Fedus, W., Li, E., Wang, X., Dehghani, M., Brahma, S., et al. (2022). Scaling instruction-finetuned language models. arXiv preprint arXiv:2210.11416."},{"key":"345_CR9","unstructured":"Dettmers, T., Pagnoni, A., Holtzman, A., & Zettlemoyer, L. (2023). Qlora: efficient finetuning of quantized llms. arXiv: 2305.14314 [cs.LG]."},{"key":"345_CR10","unstructured":"Dettmers, T., Pagnoni, A., Holtzman, A., & Zettlemoyer, L. (2024). Qlora: efficient finetuning of quantized llms. Advances in Neural Information Processing Systems 36."},{"key":"345_CR11","doi-asserted-by":"crossref","unstructured":"Ding, B., Qin, C., Liu, L., Chia, Y.K., Joty, S., Li, B., & Bing, L. (2023). Is GPT-3 a Good Data Annotator? In Proceedings of the 61st annual meeting of the association for computational linguistics. June. Accessed June 30, 2023.","DOI":"10.18653\/v1\/2023.acl-long.626"},{"issue":"1","key":"345_CR12","doi-asserted-by":"publisher","first-page":"104478","DOI":"10.1016\/j.jbi.2023.104478","volume":"145","author":"Johann Frei","year":"2023","unstructured":"Frei, Johann, & Kramer, Frank. (2023). Annotated dataset creation through large language models for non-english medical nlp. Journal of Biomedical Informatics, 145, 104478.","journal-title":"Journal of Biomedical Informatics"},{"issue":"30","key":"345_CR13","doi-asserted-by":"publisher","first-page":"e2305016120","DOI":"10.1073\/pnas.2305016120","volume":"120","author":"Fabrizio Gilardi","year":"2023","unstructured":"Gilardi, Fabrizio, Alizadeh, Meysam, & Kubli, Ma\u00ebl. (2023). ChatGPT outperforms crowd workers for text-annotation tasks. 
Proceedings of the National Academy of Sciences, 120(30), e2305016120.","journal-title":"Proceedings of the National Academy of Sciences"},{"key":"345_CR14","unstructured":"He, J., Zhou, C., Ma, X., Berg-Kirkpatrick, T., & Neubig, G. (2021). Towards a unified view of parameter-efficient transfer learning. arXiv preprint arXiv:2110.04366."},{"key":"345_CR15","unstructured":"Hoes, E., Altay, S., & Bermeo, J. (2023). Using ChatGPT to Fight Misinformation: ChatGPT Nails 72% of 12,000 Verified Claims."},{"key":"345_CR16","unstructured":"Hoes, E., Altay, S., & Bermeo, J. n.d. Using chatgpt to fight misinformation: chatgpt nails 72% of 12,000 verified claims."},{"key":"345_CR17","unstructured":"Hoffmann, J., Borgeaud, S., Mensch, A., Buchatskaya, E., Cai, T., Rutherford, E., de Las Casas, D., Hendricks, L.A., Welbl, J., Clark, A., et al. (2022). Training compute-optimal large language models. arXiv preprint arXiv:2203.15556."},{"key":"345_CR18","doi-asserted-by":"crossref","unstructured":"Howard, J., & Ruder, S. (2018). Universal language model fine-tuning for text classification. arXiv preprint arXiv:1801.06146.","DOI":"10.18653\/v1\/P18-1031"},{"key":"345_CR19","unstructured":"Hu, E.J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., & Chen, W. (2021). Lora: low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685."},{"key":"345_CR20","doi-asserted-by":"crossref","unstructured":"Hu, Z., Lan, Y., Wang, L., Xu, W., Lim, E.P., Lee, R.K.W., Bing, L., & Poria, S. (2023). Llm-adapters: an adapter family for parameter-efficient fine-tuning of large language models. arXiv preprint arXiv:2304.01933.","DOI":"10.18653\/v1\/2023.emnlp-main.319"},{"key":"345_CR21","unstructured":"Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., & Iwasawa, Y. (2022). Large language models are zero-shot reasoners. 
arXiv preprint arXiv:2205.11916."},{"key":"345_CR22","unstructured":"K\u00f6pf, Andreas, Kilcher, Yannic, von R\u00fctte, Dimitri, Anagnostidis, Sotiris, Tam, Zhi-Rui, Stevens, Keith, Barhoum, Abdullah, et al. (2023). Openassistant conversations - democratizing large language model alignment. arXiv: 2304.07327 [cs.CL]."},{"key":"345_CR23","doi-asserted-by":"crossref","unstructured":"Liesenfeld, A., Lopez, A., & Dingemanse, M. (2023). Opening up chatgpt: tracking openness, transparency, and accountability in instruction-tuned text generators. In Proceedings of the 5th international conference on conversational user interfaces, 1-6.","DOI":"10.1145\/3571884.3604316"},{"issue":"9","key":"345_CR24","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3560815","volume":"55","author":"Pengfei Liu","year":"2023","unstructured":"Liu, Pengfei, Yuan, Weizhe, Fu, Jinlan, Jiang, Zhengbao, Hayashi, Hiroaki, & Neubig, Graham. (2023). Pre-train, prompt, and predict: a systematic survey of prompting methods in natural language processing. ACM Computing Surveys, 55(9), 1\u201335.","journal-title":"ACM Computing Surveys"},{"key":"345_CR25","doi-asserted-by":"crossref","unstructured":"Marchal, N., Hoes, E., Kl\u00fcser, K.J., Hamborg, F., Alizadeh, M., Kubli, M., & Katzenbach, C. (2024). How negative media coverage impacts platform governance: evidence from facebook, twitter, and youtube. Political Communication, 1-19.","DOI":"10.1080\/10584609.2024.2377992"},{"key":"345_CR26","doi-asserted-by":"publisher","DOI":"10.1515\/9780691249643","volume-title":"AI snake oil: what artificial intelligence can do, what it can\u2019t, and how to tell the difference","author":"Arvind Narayanan","year":"2024","unstructured":"Narayanan, Arvind, & Kapoor, Sayash. (2024). AI snake oil: what artificial intelligence can do, what it can\u2019t, and how to tell the difference. 
Princeton University Press."},{"issue":"1","key":"345_CR27","doi-asserted-by":"publisher","first-page":"4","DOI":"10.1038\/s42256-023-00783-6","volume":"6","author":"\u00c9tienne Ollion","year":"2024","unstructured":"Ollion, \u00c9tienne, Shen, Rubing, Macanovic, Ana, & Chatelain, Arnault. (2024). The dangers of using proprietary LLMs for research. Nature Machine Intelligence, 6(1), 4\u20135.","journal-title":"Nature Machine Intelligence"},{"key":"345_CR28","unstructured":"Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C., Mishkin, P., Zhang, C., et al. (2022). Training language models to follow instructions with human feedback. In Advances in neural information processing systems, edited by S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, 35:27730-27744. Curran Associates, Inc. https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2022\/file\/b1efde53be364a73914f58805a001731-Paper-Conference.pdf."},{"key":"345_CR29","unstructured":"Pangakis, N., Wolken, S., & Fasching, N. (2023). Automated annotation with generative AI requires validation. arXiv preprint arXiv:2306.00176."},{"key":"345_CR30","doi-asserted-by":"crossref","unstructured":"Paul, M., Maglaras, L., Ferrag, M.A., & AlMomani, I. (2023). Digitization of healthcare sector: a study on privacy and security concerns. ICT Express.","DOI":"10.1016\/j.icte.2023.02.007"},{"key":"345_CR31","first-page":"121","volume":"3","author":"PP Ray","year":"2023","unstructured":"Ray, P. P. (2023). ChatGPT: A comprehensive review on background, applications, key challenges, bias, ethics, limitations and future scope. Internet of Things and Cyber-Physical Systems, 3, 121\u2013154.","journal-title":"Internet of Things and Cyber-Physical Systems"},{"key":"345_CR32","first-page":"206","volume":"1","author":"Cynthia Rudin","year":"2019","unstructured":"Rudin, Cynthia. (2019). 
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat Mach Intell, 1, 206\u2013215.","journal-title":"Nat Mach Intell"},{"key":"345_CR33","doi-asserted-by":"crossref","unstructured":"Sarti, G., Feldhus, N., Sickert, L., van der Wal, O., Nissim, M., & Bisazza, A. (2023). Inseq: an interpretability toolkit for sequence generation models. arXiv preprint arXiv:2302.13942.","DOI":"10.18653\/v1\/2023.acl-demo.40"},{"key":"345_CR34","unstructured":"Schick, T., Dwivedi-Yu, J., Dess\u00ec, R., Raileanu, R., Lomeli, M., Hambro, E., Zettlemoyer, L., Cancedda, N., & Scialom, T. (2024). Toolformer: language models can teach themselves to use tools. Advances in Neural Information Processing Systems 36."},{"key":"345_CR35","unstructured":"Shen, Y., Song, K., Tan, X., Li, D., Lu, W., & Zhuang, Y. (2024). Hugginggpt: solving AI tasks with chatgpt and its friends in hugging face. Advances in Neural Information Processing Systems 36."},{"issue":"7957","key":"345_CR36","doi-asserted-by":"publisher","first-page":"413","DOI":"10.1038\/d41586-023-01295-4","volume":"616","author":"Arthur Spirling","year":"2023","unstructured":"Spirling, Arthur. (2023). Why open-source generative AI models are an ethical way forward for science. Nature, 616(7957), 413.","journal-title":"Nature"},{"key":"345_CR37","unstructured":"T\u00f6rnberg, P. (2023a). ChatGPT-4 Outperforms Experts and Crowd Workers in Annotating Political Twitter Messages with Zero-Shot Learning."},{"key":"345_CR38","unstructured":"T\u00f6rnberg, P. (2023b). Chatgpt-4 outperforms experts and crowd workers in annotating political twitter messages with zero-shot learning. arXiv preprint arXiv:2304.06588."},{"key":"345_CR39","unstructured":"Touvron, H., Martin, L., Stone, K., Albert, P., Almahairi, A., Babaei, Y., Bashlykov, N., Batra, S., Bhargava, P., Bhosale, S., et al. (2023). Llama 2: open foundation and fine-tuned chat models. 
arXiv preprint arXiv:2307.09288."},{"issue":"7947","key":"345_CR40","doi-asserted-by":"publisher","first-page":"224","DOI":"10.1038\/d41586-023-00288-7","volume":"614","author":"Van Dis","year":"2023","unstructured":"van Dis, Eva A. M., Bollen, Johan, Zuidema, Willem, van Rooij, Robert, & Bockting, Claudi L. (2023). Chatgpt: five priorities for research. Nature, 614(7947), 224\u2013226.","journal-title":"Nature"},{"key":"345_CR41","doi-asserted-by":"publisher","unstructured":"Wang, Z., Wohlwend, J., & Lei, T. (2020). Structured pruning of large language models. In Proceedings of the 2020 conference on empirical methods in natural language processing (emnlp), 6151-6162. Online: Association for Computational Linguistics, November. https:\/\/doi.org\/10.18653\/v1\/2020.emnlp-main.496.","DOI":"10.18653\/v1\/2020.emnlp-main.496"},{"key":"345_CR42","unstructured":"Wei, J., Bosma, M., Zhao, V.Y., Guu, K., Yu, A.W., Lester, B., Du, N., Dai, A.M., & Le, Q.V. (2022). Finetuned language models are zero-shot learners. arXiv: 2109.01652 [cs.CL]."},{"key":"345_CR43","unstructured":"Wei, J., Wang, X., Schuurmans, D., Bosma, M., Chi, E., Le, Q., & Zhou, D. (2022). Chain of thought prompting elicits reasoning in large language models. arXiv preprint arXiv:2201.11903."},{"key":"345_CR44","unstructured":"von Werra, L., Belkada, Y., Tunstall, L., Beeching, E., Thrush, T., Lambert, N., & Huang, S. (2020). Trl: transformer reinforcement learning. https:\/\/github.com\/huggingface\/trl."},{"key":"345_CR45","unstructured":"Yang, J., Jin, H., Tang, R., Han, X., Feng, Q., Jiang, H., Yin, B., & Hu, X. (2023). Harnessing the power of llms in practice: a survey on chatgpt and beyond. arXiv preprint arXiv:2304.13712."},{"key":"345_CR46","unstructured":"Yang, W., Li, C., Zhang, J., & Zong, C. (2023). Bigtrans: augmenting large language models with multilingual translation capability over 100 languages. 
arXiv preprint arXiv:2305.18098."},{"key":"345_CR47","unstructured":"Zhang, B., Liu, Z., Cherry, C., & Firat, O. (2024). When scaling meets llm finetuning: the effect of data, model and finetuning method. arXiv preprint arXiv:2402.17193."},{"key":"345_CR48","unstructured":"Zhang, Z., Zhang, A., Li, M., Zhao, H., Karypis, G., & Smola, A. (2023). Multimodal chain-of-thought reasoning in language models. arXiv preprint arXiv:2302.00923."},{"key":"345_CR49","unstructured":"Zhu, Y., Zhang, P., Haq, EU., Hui, P., & Tyson, G. (2023). Can ChatGPT Reproduce Human-Generated Labels? A Study of Social Computing Tasks."},{"key":"345_CR50","doi-asserted-by":"crossref","unstructured":"Ziems, C., Held, W., Shaikh, O., Chen, J., Zhang, Z., & Yang, D. (2023). Can large language models transform computational social science? arXiv preprint arXiv:2305.03514.","DOI":"10.1162\/coli_a_00502"}],"container-title":["Journal of Computational Social Science"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s42001-024-00345-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s42001-024-00345-9\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s42001-024-00345-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,2,15]],"date-time":"2025-02-15T05:29:03Z","timestamp":1739597343000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s42001-024-00345-9"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,12,18]]},"references-count":50,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2025,2]]}},"alternative-id":["345"],"URL":"https:\/\/doi.org\/10.1007\/s42001-024-00345-9","relation":{}
,"ISSN":["2432-2717","2432-2725"],"issn-type":[{"value":"2432-2717","type":"print"},{"value":"2432-2725","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,12,18]]},"assertion":[{"value":"9 April 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"19 November 2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"18 December 2024","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"None.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}],"article-number":"17"}}