{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,20]],"date-time":"2026-04-20T03:12:03Z","timestamp":1776654723446,"version":"3.51.2"},"reference-count":43,"publisher":"MIT Press","issue":"3","license":[{"start":{"date-parts":[[2024,5,6]],"date-time":"2024-05-06T00:00:00Z","timestamp":1714953600000},"content-version":"vor","delay-in-days":126,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100014438","name":"Business Finland","doi-asserted-by":"publisher","award":["under the project \u201cMapping Sustainable Development Activity; Its Evolution and Impact in Science, Technology, Innovation and Businesses (INNOSDG)\u201d"],"award-info":[{"award-number":["under the project \u201cMapping Sustainable Development Activity; Its Evolution and Impact in Science, Technology, Innovation and Businesses (INNOSDG)\u201d"]}],"id":[{"id":"10.13039\/501100014438","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100007167","name":"VTT Technical Research Centre of Finland","doi-asserted-by":"crossref","award":["project 132376"],"award-info":[{"award-number":["project 132376"]}],"id":[{"id":"10.13039\/501100007167","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["direct.mit.edu"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2024,8,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>This paper examines the comparative effectiveness of a specialized compiled language model and a general-purpose model such as OpenAI\u2019s GPT-3.5 in detecting sustainable development goals (SDGs) within text data. It presents a critical review of large language models (LLMs), addressing challenges related to bias and sensitivity. The necessity of specialized training for precise, unbiased analysis is underlined. A case study using a company descriptions data set offers insight into the differences between the GPT-3.5 model and the specialized SDG detection model. While GPT-3.5 boasts broader coverage, it may identify SDGs with limited relevance to the companies\u2019 activities. In contrast, the specialized model zeroes in on highly pertinent SDGs. The importance of thoughtful model selection is emphasized, taking into account task requirements, cost, complexity, and transparency. Despite the versatility of LLMs, the use of specialized models is suggested for tasks demanding precision and accuracy. The study concludes by encouraging further research to find a balance between the capabilities of LLMs and the need for domain-specific expertise and interpretability.<\/jats:p>","DOI":"10.1162\/qss_a_00310","type":"journal-article","created":{"date-parts":[[2024,5,6]],"date-time":"2024-05-06T14:11:09Z","timestamp":1715004669000},"page":"736-756","update-policy":"https:\/\/doi.org\/10.1162\/mitpressjournals.corrections.policy","source":"Crossref","is-referenced-by-count":40,"title":["A critical review of large language models: Sensitivity, bias, and the path toward specialized AI"],"prefix":"10.1162","volume":"5","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2032-9180","authenticated-orcid":true,"given":"Arash","family":"Hajikhani","sequence":"first","affiliation":[{"name":"Quantitative Science and Technology Studies, VTT Technical Research Centre of Finland, Espoo, Finland"},{"name":"School of Business and Management, LUT University, Lappeenranta, Finland"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-6031-881X","authenticated-orcid":true,"given":"Carolyn","family":"Cole","sequence":"additional","affiliation":[{"name":"Quantitative Science and Technology Studies, VTT Technical Research Centre of Finland, Espoo, Finland"}]}],"member":"281","published-online":{"date-parts":[[2024,8,1]]},"reference":[{"issue":"2","key":"2024093020301740300_bib1","doi-asserted-by":"publisher","DOI":"10.59707\/hymrFBYA5348","article-title":"Harnessing large language models in medical research and scientific writing: A closer look to the future","volume":"1","author":"Abu-Jeyyab","year":"2023","journal-title":"High Yield Medical Reviews"},{"key":"2024093020301740300_bib2","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2212.06295","article-title":"Despite \u201csuper-human\u201d performance, current LLMs are unsuited for decisions about ethics and safety","author":"Albrecht","year":"2022","journal-title":"arXiv"},{"key":"2024093020301740300_bib3","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-11009-3_34","article-title":"Turning a blind eye: Explicit removal of biases and variation from deep neural network embeddings","volume-title":"Computer vision\u2014ECCV 2018 workshops","author":"Alvi","year":"2019"},{"key":"2024093020301740300_bib4","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2005.14165","article-title":"Language models are few-shot learners","author":"Brown","year":"2020","journal-title":"arXiv"},{"issue":"6","key":"2024093020301740300_bib5","doi-asserted-by":"publisher","first-page":"663","DOI":"10.1002\/sd.1735","article-title":"Is the Sustainable Development Goals (SDG) index an adequate framework to measure the progress of the 2030 Agenda?","volume":"26","author":"Diaz-Sarachaga","year":"2018","journal-title":"Sustainable Development"},{"key":"2024093020301740300_bib6","volume-title":"Sustainability science in a global landscape","author":"Elsevier","year":"2015"},{"issue":"1","key":"2024093020301740300_bib7","doi-asserted-by":"publisher","first-page":"e100978","DOI":"10.1136\/bmjhci-2023-100978","article-title":"Performance of large language models on advocating the management of meningitis: A comparative qualitative study","volume":"31","author":"Fisch","year":"2024","journal-title":"BMJ Health & Care Informatics"},{"key":"2024093020301740300_bib8","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-21743-2_21","article-title":"SDG-Meter: A deep learning based tool for automatic text classification of the Sustainable Development Goals","volume-title":"Intelligent information and database systems. ACIIDS 2022","author":"Guisiano","year":"2022"},{"key":"2024093020301740300_bib9","doi-asserted-by":"publisher","first-page":"6661","DOI":"10.1007\/s11192-022-04358-x","article-title":"Mapping the sustainable development goals (SDGs) in science, technology and innovation: application of machine learning in SDG-oriented artefact detection","volume":"127","author":"Hajikhani","year":"2022","journal-title":"Scientometrics"},{"key":"2024093020301740300_bib10","doi-asserted-by":"publisher","first-page":"106775","DOI":"10.1016\/j.ecolecon.2020.106775","article-title":"Frontrunners and laggards: How fast are the EU member states progressing towards the sustainable development goals?","volume":"177","author":"Hametner","year":"2020","journal-title":"Ecological Economics"},{"issue":"1","key":"2024093020301740300_bib11","doi-asserted-by":"publisher","first-page":"29","DOI":"10.1038\/s41368-023-00239-y","article-title":"ChatGPT for shaping the future of dentistry: The potential of multi-modal large language model","volume":"15","author":"Huang","year":"2023","journal-title":"International Journal of Oral Science"},{"issue":"20","key":"2024093020301740300_bib12","doi-asserted-by":"publisher","first-page":"5596","DOI":"10.3390\/su11205596","article-title":"Visualizing sustainability research in business and management (1990\u20132019) and emerging topics: A large-scale bibliometric analysis","volume":"11","author":"Jia","year":"2019","journal-title":"Sustainability"},{"key":"2024093020301740300_bib13","doi-asserted-by":"publisher","first-page":"1163","DOI":"10.1109\/TEM.2022.3152216","article-title":"Deep learning for technical document classification","volume":"71","author":"Jiang","year":"2022","journal-title":"IEEE Transactions on Engineering Management"},{"key":"2024093020301740300_bib15","doi-asserted-by":"publisher","first-page":"104","DOI":"10.1016\/j.csbj.2016.12.005","article-title":"Machine learning and data mining methods in diabetes research","volume":"15","author":"Kavakiotis","year":"2017","journal-title":"Computational and Structural Biotechnology Journal"},{"key":"2024093020301740300_bib16","doi-asserted-by":"publisher","first-page":"8","DOI":"10.1016\/j.csbj.2014.11.005","article-title":"Machine learning applications in cancer prognosis and prediction","volume":"13","author":"Kourou","year":"2015","journal-title":"Computational and Structural Biotechnology Journal"},{"key":"2024093020301740300_bib17","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2302.08091","article-title":"Do we still need clinical language models?","author":"Lehman","year":"2023","journal-title":"arXiv"},{"key":"2024093020301740300_bib18","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/j.cosust.2015.05.009","article-title":"The multiple roles of sustainability indicators in informational governance: Between intended use and unanticipated influence","volume":"18","author":"Lehtonen","year":"2016","journal-title":"Current Opinion in Environmental Sustainability"},{"key":"2024093020301740300_bib19","first-page":"6565","article-title":"Towards understanding and mitigating social biases in language models","volume":"139","author":"Liang","year":"2021","journal-title":"Proceedings of Machine Learning Research"},{"key":"2024093020301740300_bib20","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2309.06256","article-title":"Speciality vs generality: An empirical study on catastrophic forgetting in fine-tuning foundation models","author":"Lin","year":"2023","journal-title":"arXiv"},{"issue":"Suppl. 1","key":"2024093020301740300_bib21","doi-asserted-by":"publisher","first-page":"2328","DOI":"10.1182\/blood-2023-172710","article-title":"Toward AI-assisted clinical assessment for patients with multiple myeloma: Feature selection for large language models","volume":"142","author":"Malek","year":"2023","journal-title":"Blood"},{"key":"2024093020301740300_bib22","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2111.01243","article-title":"Recent advances in natural language processing via large pre-trained language models: A survey","author":"Min","year":"2021","journal-title":"arXiv"},{"key":"2024093020301740300_bib14","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2023.3333946","article-title":"Legal natural language processing from 2015\u20132022: A comprehensive systematic mapping study of advances and applications","author":"Quevedo","year":"2023","journal-title":"IEEE Access"},{"key":"2024093020301740300_bib23","doi-asserted-by":"publisher","DOI":"10.1101\/2023.07.13.23292613","article-title":"On the limitations of large language models in clinical diagnosis","author":"Reese","year":"2023","journal-title":"medRxiv"},{"issue":"3","key":"2024093020301740300_bib24","doi-asserted-by":"publisher","first-page":"588","DOI":"10.1002\/csr.1705","article-title":"Business contribution to the Sustainable Development Agenda: Organizational factors related to early adoption of SDG reporting","volume":"26","author":"Rosati","year":"2019","journal-title":"Corporate Social Responsibility and Environmental Management"},{"key":"2024093020301740300_bib25","doi-asserted-by":"publisher","first-page":"230","DOI":"10.1016\/j.jneumeth.2013.11.016","article-title":"Machine learning on brain MRI data for differential diagnosis of Parkinson\u2019s disease and progressive supranuclear palsy","volume":"222","author":"Salvatore","year":"2014","journal-title":"Journal of Neuroscience Methods"},{"key":"2024093020301740300_bib26","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2402.05140","article-title":"Tag-LLM: Repurposing general-purpose LLMs for specialized domains","author":"Shen","year":"2024","journal-title":"arXiv"},{"issue":"3","key":"2024093020301740300_bib27","doi-asserted-by":"publisher","first-page":"e0265409","DOI":"10.1371\/journal.pone.0265409","article-title":"Impact of the Sustainable Development Goals on the academic research agenda. A scientometric analysis","volume":"17","author":"Sianes","year":"2022","journal-title":"PLOS ONE"},{"issue":"7972","key":"2024093020301740300_bib28","doi-asserted-by":"publisher","first-page":"172","DOI":"10.1038\/s41586-023-06291-2","article-title":"Large language models encode clinical knowledge","volume":"620","author":"Singhal","year":"2023","journal-title":"Nature"},{"issue":"4","key":"2024093020301740300_bib29","doi-asserted-by":"publisher","first-page":"311","DOI":"10.1002\/sd.1657","article-title":"Hot air or comprehensive progress? A critical assessment of the SDGs","volume":"25","author":"Spangenberg","year":"2017","journal-title":"Sustainable Development"},{"issue":"Suppl. 5","key":"2024093020301740300_bib30","doi-asserted-by":"publisher","first-page":"v10","DOI":"10.1093\/noajnl\/vdad141.041","article-title":"10089-CO-4 development of a physician support system for analysis of genetic mutations in brain tumors and selection of clinical trials using large-scale language models (LLMs) with retriever","volume":"5","author":"Takahashi","year":"2023","journal-title":"Neuro-Oncology Advances"},{"key":"2024093020301740300_bib31","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2302.13971","article-title":"LLaMA: Open and efficient foundation language models","author":"Touvron","year":"2023","journal-title":"arXiv"},{"issue":"6","key":"2024093020301740300_bib32","doi-asserted-by":"publisher","first-page":"1584","DOI":"10.1002\/sd.2107","article-title":"Sustainable development goal interactions: An analysis based on the five pillars of the 2030 agenda","volume":"28","author":"Tremblay","year":"2020","journal-title":"Sustainable Development"},{"key":"2024093020301740300_bib33","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1805.12152","article-title":"Robustness may be at odds with accuracy","author":"Tsipras","year":"2018","journal-title":"arXiv"},{"key":"2024093020301740300_bib34","volume-title":"Transforming our world: The 2030 Agenda for Sustainable Development","author":"UN General Assembly","year":"2015"},{"key":"2024093020301740300_bib35","volume-title":"Sustainable development report","author":"UNSDG","year":"2019"},{"issue":"20","key":"2024093020301740300_bib37","doi-asserted-by":"publisher","first-page":"5783","DOI":"10.3390\/su11205783","article-title":"A bibliometric review of the knowledge base for innovation in sustainable development","volume":"11","author":"Vatananan-Thesenvitz","year":"2019","journal-title":"Sustainability"},{"key":"2024093020301740300_bib38","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2301.02120","article-title":"Reprogramming pretrained language models for protein sequence representation learning","author":"Vinod","year":"2023","journal-title":"arXiv"},{"key":"2024093020301740300_bib39","volume-title":"Voluntary National Review 2020 FINLAND: Report on the implementation of the 2030 Agenda for Sustainable Development","author":"VNK","year":"2020"},{"issue":"5","key":"2024093020301740300_bib40","doi-asserted-by":"publisher","first-page":"e2021865119","DOI":"10.1073\/pnas.2021865119","article-title":"One model for the learning of language","volume":"119","author":"Yang","year":"2017","journal-title":"Proceedings of the National Academy of Sciences of the United States of America"},{"key":"2024093020301740300_bib41","doi-asserted-by":"publisher","first-page":"2225","DOI":"10.18653\/v1\/2021.findings-emnlp.192","article-title":"GPT3Mix: Leveraging large-scale language models for text augmentation","volume-title":"Findings of the Association for Computational Linguistics","author":"Yoo","year":"2021"},{"key":"2024093020301740300_bib42","doi-asserted-by":"publisher","DOI":"10.1101\/2023.07.13.23292577","article-title":"Coding inequity: Assessing GPT-4\u2019s potential for perpetuating racial and gender biases in healthcare","author":"Zack","year":"2023","journal-title":"medRxiv"},{"key":"2024093020301740300_bib43","doi-asserted-by":"publisher","first-page":"1537","DOI":"10.1007\/978-3-030-70665-4_166","article-title":"Discover discriminatory bias in high accuracy models embedded in machine learning algorithms","volume-title":"Advances in natural computation, fuzzy systems and knowledge discovery","author":"Zhang","year":"2021"},{"key":"2024093020301740300_bib44","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2310.04945","article-title":"Balancing specialized and general skills in LLMs: The impact of modern tuning and data strategy","author":"Zhang","year":"2023","journal-title":"arXiv"}],"container-title":["Quantitative Science Studies"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/direct.mit.edu\/qss\/article-pdf\/doi\/10.1162\/qss_a_00310\/2472683\/qss_a_00310.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/direct.mit.edu\/qss\/article-pdf\/doi\/10.1162\/qss_a_00310\/2472683\/qss_a_00310.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,9,30]],"date-time":"2024-09-30T16:30:33Z","timestamp":1727713833000},"score":1,"resource":{"primary":{"URL":"https:\/\/direct.mit.edu\/qss\/article\/5\/3\/736\/120940\/A-critical-review-of-large-language-models"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024]]},"references-count":43,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2024,8,1]]}},"URL":"https:\/\/doi.org\/10.1162\/qss_a_00310","relation":{"has-review":[{"id-type":"doi","id":"10.1162\/QSS_A_00310\/v1\/decision1","asserted-by":"object"},{"id-type":"doi","id":"10.1162\/QSS_A_00310\/v1\/review1","asserted-by":"object"},{"id-type":"doi","id":"10.1162\/QSS_A_00310\/v2\/review1","asserted-by":"object"},{"id-type":"doi","id":"10.1162\/QSS_A_00310\/v2\/response1","asserted-by":"object"},{"id-type":"doi","id":"10.1162\/QSS_A_00310\/v2\/decision1","asserted-by":"object"}]},"ISSN":["2641-3337"],"issn-type":[{"value":"2641-3337","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2024]]},"published":{"date-parts":[[2024]]}}}