{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,13]],"date-time":"2026-03-13T11:47:15Z","timestamp":1773402435857,"version":"3.50.1"},"reference-count":31,"publisher":"Wiley","issue":"11","license":[{"start":{"date-parts":[[2025,7,17]],"date-time":"2025-07-17T00:00:00Z","timestamp":1752710400000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100005416","name":"Norges Forskningsr\u00e5d","doi-asserted-by":"publisher","award":["256223"],"award-info":[{"award-number":["256223"]}],"id":[{"id":"10.13039\/501100005416","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["72374160"],"award-info":[{"award-number":["72374160"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["71974150"],"award-info":[{"award-number":["71974150"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["L2424104"],"award-info":[{"award-number":["L2424104"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["asistdl.onlinelibrary.wiley.com"],"crossmark-restriction":true},"short-container-title":["Asso for Info Science &amp;amp; Tech"],"published-print":{"date-parts":[[2025,11]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>The classification of research according to its aims has been a longstanding focus in the fields of quantitative science studies and R&amp;D statistics. 
Since 1963, the Organization for Economic Co\u2010operation and Development (OECD) has employed a classical distinction among basic, applied, and experimental research. Building on this framework, our previous work highlighted the utility of differentiating between scientific and societal progress as two primary research objectives. This distinction enabled the quantitative analysis of scientific publication abstracts and the development of an automated method for large\u2010scale classification. In the current study, we systematically evaluate text classification techniques, including traditional text mining models, classification tools, BERT\u2010based language models, and decoder\u2010only large language models (LLMs) such as ChatGPT. Our findings show that the fine\u2010tuned GPT\u20104o\u2010mini model performs the best among single\u2010model approaches. However, traditional and BERT\u2010based models outperform it in certain fine\u2010grained classification tasks. Leveraging majority voting strategies to incorporate their strengths yields performance comparable to closed\u2010source GPT models. A case study on 10 biomedical journals further validates the method, demonstrating strong alignment between journal scopes, model predictions, and outputs generated by the fine\u2010tuned GPT\u20104o\u2010mini model. 
These results highlight the robustness and practical effectiveness of the proposed methodology for nuanced research aim classification.<\/jats:p>","DOI":"10.1002\/asi.70004","type":"journal-article","created":{"date-parts":[[2025,7,17]],"date-time":"2025-07-17T08:50:14Z","timestamp":1752742214000},"page":"1470-1487","update-policy":"https:\/\/doi.org\/10.1002\/crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["Scaling research aim identification: Language models for classifying scientific and societal\u2010oriented studies"],"prefix":"10.1002","volume":"76","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3956-7808","authenticated-orcid":false,"given":"Mengjia","family":"Wu","sequence":"first","affiliation":[{"name":"Australian Artificial Intelligence Institute, Faculty of Engineering and Information Technology University of Technology Sydney  Sydney New South Wales Australia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1020-3189","authenticated-orcid":false,"given":"Gunnar","family":"Sivertsen","sequence":"additional","affiliation":[{"name":"Nordic Institute for Studies in Innovation, Research and Education (NIFU)  Oslo Norway"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0526-9677","authenticated-orcid":false,"given":"Lin","family":"Zhang","sequence":"additional","affiliation":[{"name":"Center for Science, Technology &amp; Education Assessment (CSTEA) Wuhan University  Wuhan China"},{"name":"Center for Studies of Information Resources, School of Information Management Wuhan University  Wuhan China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1342-3371","authenticated-orcid":false,"given":"Fan","family":"Qi","sequence":"additional","affiliation":[{"name":"Center for Science, Technology &amp; Education Assessment (CSTEA) Wuhan University  Wuhan China"},{"name":"Center for Studies of Information Resources, School of Information Management Wuhan University  Wuhan 
China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7731-0301","authenticated-orcid":false,"given":"Yi","family":"Zhang","sequence":"additional","affiliation":[{"name":"Australian Artificial Intelligence Institute, Faculty of Engineering and Information Technology University of Technology Sydney  Sydney New South Wales Australia"}]}],"member":"311","published-online":{"date-parts":[[2025,7,17]]},"reference":[{"key":"e_1_2_11_2_1","unstructured":"Achiam J. Adler S. Agarwal S. Ahmad L. Akkaya I. Aleman F. L. Almeida D. Altenschmidt J. Altman S. &Anadkat S.(2023).GPT\u20104 technical report. arXiv:2303.08774 [cs.CL] https:\/\/doi.org\/10.48550\/arXiv.2303.08774"},{"key":"e_1_2_11_3_1","doi-asserted-by":"crossref","unstructured":"Beltagy I. Lo K. &Cohan A.(2019).SciBERT: A pretrained language model for scientific text. arXiv preprint arXiv:1903.10676.","DOI":"10.18653\/v1\/D19-1371"},{"key":"e_1_2_11_4_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.joi.2013.10.005"},{"key":"e_1_2_11_5_1","unstructured":"Devlin J. Chang M.\u2010W. Lee K. &Toutanova K.(2018).BERT: Pre\u2010training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805."},{"key":"e_1_2_11_6_1","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human language technologies, volume 1 (long and short papers), Minneapolis, Minnesota","author":"Devlin J.","year":"2019"},{"key":"e_1_2_11_7_1","doi-asserted-by":"publisher","DOI":"10.1002\/asi.23256"},{"key":"e_1_2_11_8_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11192-020-03516-3"},{"key":"e_1_2_11_9_1","unstructured":"Dubey A. Jauhri A. Pandey A. Kadian A. Al\u2010Dahle A. Letman A. Mathur A. Schelten A. Yang A. &Fan A.(2024).The llama 3 herd of models [Preprint]. 
arXiv:https:\/\/doi.org\/10.48550\/arXiv.2407.21783"},{"key":"e_1_2_11_10_1","doi-asserted-by":"publisher","DOI":"10.1177\/0162243906291865"},{"key":"e_1_2_11_11_1","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btz682"},{"key":"e_1_2_11_12_1","unstructured":"Liu Y.(2019).Roberta: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692."},{"key":"e_1_2_11_13_1","doi-asserted-by":"publisher","DOI":"10.1002\/asi.4630270104"},{"key":"e_1_2_11_14_1","doi-asserted-by":"publisher","DOI":"10.1787\/9789264239012-en"},{"key":"e_1_2_11_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2024.3352100"},{"key":"e_1_2_11_16_1","doi-asserted-by":"publisher","DOI":"10.1002\/asi.24655"},{"key":"e_1_2_11_17_1","doi-asserted-by":"publisher","DOI":"10.1162\/qss_a_00285"},{"key":"e_1_2_11_18_1","unstructured":"Priem J. Piwowar H. &Orr R.(2022).OpenAlex: A fully\u2010open index of scholarly works authors venues institutions and concepts. arXiv preprint arXiv:2205.01833."},{"issue":"140","key":"e_1_2_11_19_1","first-page":"1","article-title":"Exploring the limits of transfer learning with a unified text\u2010to\u2010text transformer","volume":"21","author":"Raffel C.","year":"2020","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_2_11_20_1","doi-asserted-by":"crossref","unstructured":"Reimers N. &Gurevych I.(2019).Sentence\u2010BERT: Sentence embeddings using siamese BERT\u2010networks. arXiv preprint arXiv:1908.10084.","DOI":"10.18653\/v1\/D19-1410"},{"key":"e_1_2_11_21_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11192-022-04602-4"},{"key":"e_1_2_11_22_1","doi-asserted-by":"publisher","DOI":"10.1093\/reseval\/rvz032"},{"key":"e_1_2_11_23_1","volume-title":"Pasteur's quadrant: Basic science and technological innovation","author":"Stokes D. 
E.","year":"1997"},{"key":"e_1_2_11_24_1","first-page":"265","volume-title":"Bibliographic control in the digital ecosystem","author":"Suominen O.","year":"2022"},{"key":"e_1_2_11_25_1","doi-asserted-by":"publisher","DOI":"10.1002\/asi.24642"},{"key":"e_1_2_11_26_1","unstructured":"Tay Y. Dehghani M. Tran V. Q. Garcia X. Wei J. Wang X. Chung H. W. Shakeri S. Bahri D. &Schuster T.(2022).Ul2: Unifying language learning paradigms. arXiv preprint arXiv:2205.05131."},{"key":"e_1_2_11_27_1","unstructured":"Touvron H. Lavril T. Izacard G. Martinet X. Lachaux M.\u2010A. Lacroix T. Rozi\u00e8re B. Goyal N. Hambro E. &Azhar F.(2023).Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971."},{"issue":"11","key":"e_1_2_11_28_1","first-page":"2579","article-title":"Visualizing data using t\u2010SNE","volume":"9","author":"Van der Maaten L.","year":"2008","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_2_11_29_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.techfore.2020.120513"},{"key":"e_1_2_11_30_1","doi-asserted-by":"publisher","DOI":"10.1002\/asi.23487"},{"key":"e_1_2_11_31_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11192-021-04171-y"},{"key":"e_1_2_11_32_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.joi.2018.09.004"}],"container-title":["Journal of the Association for Information Science and 
Technology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/asistdl.onlinelibrary.wiley.com\/doi\/pdf\/10.1002\/asi.70004","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,11,2]],"date-time":"2025-11-02T05:29:54Z","timestamp":1762061394000},"score":1,"resource":{"primary":{"URL":"https:\/\/asistdl.onlinelibrary.wiley.com\/doi\/10.1002\/asi.70004"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,7,17]]},"references-count":31,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2025,11]]}},"alternative-id":["10.1002\/asi.70004"],"URL":"https:\/\/doi.org\/10.1002\/asi.70004","archive":["Portico"],"relation":{},"ISSN":["2330-1635","2330-1643"],"issn-type":[{"value":"2330-1635","type":"print"},{"value":"2330-1643","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,7,17]]},"assertion":[{"value":"2025-01-23","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-07-01","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-07-17","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}