{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,5]],"date-time":"2026-06-05T14:16:55Z","timestamp":1780669015795,"version":"3.54.1"},"reference-count":27,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2023,10,20]],"date-time":"2023-10-20T00:00:00Z","timestamp":1697760000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,10,20]],"date-time":"2023-10-20T00:00:00Z","timestamp":1697760000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["npj Digit. Med."],"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Large language models (LLMs) are being integrated into healthcare systems; but these models may recapitulate harmful, race-based medicine. The objective of this study is to assess whether four commercially available large language models (LLMs) propagate harmful, inaccurate, race-based content when responding to eight different scenarios that check for race-based medicine or widespread misconceptions around race. Questions were derived from discussions among four physician experts and prior work on race-based medical misconceptions believed by medical trainees. We assessed four large language models with nine different questions that were interrogated five times each with a total of 45 responses per model. All models had examples of perpetuating race-based medicine in their responses. Models were not always consistent in their responses when asked the same question repeatedly. LLMs are being proposed for use in the healthcare setting, with some models already connecting to electronic health record systems. However, this study shows that based on our findings, these LLMs could potentially cause harm by perpetuating debunked, racist ideas.<\/jats:p>","DOI":"10.1038\/s41746-023-00939-z","type":"journal-article","created":{"date-parts":[[2023,10,20]],"date-time":"2023-10-20T10:02:19Z","timestamp":1697796139000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":320,"title":["Large language models propagate race-based medicine"],"prefix":"10.1038","volume":"6","author":[{"given":"Jesutofunmi A.","family":"Omiye","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jenna C.","family":"Lester","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1226-5527","authenticated-orcid":false,"given":"Simon","family":"Spichak","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0639-2677","authenticated-orcid":false,"given":"Veronica","family":"Rotemberg","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7988-9356","authenticated-orcid":false,"given":"Roxana","family":"Daneshjou","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2023,10,20]]},"reference":[{"key":"939_CR1","doi-asserted-by":"publisher","unstructured":"Harskamp, R. E. & Clercq, L. D. Performance of ChatGPT as an AI-assisted decision support tool in medicine: a proof-of-concept study for interpreting symptoms and management of common cardiac conditions (AMSTELHEART-2). 2023.03.25.23285475. Preprint at https:\/\/doi.org\/10.1101\/2023.03.25.23285475 (2023).","DOI":"10.1101\/2023.03.25.23285475"},{"key":"939_CR2","doi-asserted-by":"publisher","first-page":"E36","DOI":"10.1016\/j.bja.2023.04.033","volume":"131","author":"MJ Aldridge","year":"2023","unstructured":"Aldridge, M. J. & Penders, R. Artificial intelligence and anaesthesia examinations: exploring ChatGPT as a prelude to the future. Br. J. Anaesth 131, E36\u2013E37 (2023).","journal-title":"Br. J. Anaesth"},{"key":"939_CR3","doi-asserted-by":"publisher","first-page":"e230424","DOI":"10.1148\/radiol.230424","volume":"307","author":"HL Haver","year":"2023","unstructured":"Haver, H. L. et al. Appropriateness of breast cancer prevention and screening recommendations provided by ChatGPT. Radiology 307, e230424 (2023).","journal-title":"Radiology"},{"key":"939_CR4","unstructured":"Brown, T. et al. Language models are few-shot learners. in Advances in Neural Information Processing Systems 33 1877\u20131901 (Curran Associates, Inc., 2020)."},{"key":"939_CR5","unstructured":"Pichai, S. Google AI updates: Bard and new AI features in Search. https:\/\/blog.google\/technology\/ai\/bard-google-ai-search-updates\/ (2023)."},{"key":"939_CR6","unstructured":"Vig, J. et al. Investigating gender bias in language models using causal mediation analysis. in Advances in Neural Information Processing Systems. 33 12388\u201312401 (Curran Associates, Inc., 2020)."},{"key":"939_CR7","doi-asserted-by":"publisher","unstructured":"Nadeem, M., Bethke, A. & Reddy, S. StereoSet: Measuring stereotypical bias in pretrained language models. in Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) 5356\u20135371 (Association for Computational Linguistics, 2021). https:\/\/doi.org\/10.18653\/v1\/2021.acl-long.416.","DOI":"10.18653\/v1\/2021.acl-long.416"},{"key":"939_CR8","doi-asserted-by":"publisher","first-page":"268","DOI":"10.1053\/j.ajkd.2021.08.003","volume":"79","author":"C Delgado","year":"2022","unstructured":"Delgado, C. et al. A unifying approach for GFR estimation: recommendations of the NKF-ASN task force on reassessing the inclusion of race in diagnosing kidney disease. Am. J. Kidney Dis. 79, 268\u2013288.e1 (2022).","journal-title":"Am. J. Kidney Dis."},{"key":"939_CR9","doi-asserted-by":"publisher","first-page":"978","DOI":"10.1164\/rccm.202302-0310ST","volume":"207","author":"NR Bhakta","year":"2023","unstructured":"Bhakta, N. R. et al. Race and ethnicity in pulmonary function test interpretation: an official American thoracic society statement. Am. J. Respir. Crit. Care Med. 207, 978\u2013995 (2023).","journal-title":"Am. J. Respir. Crit. Care Med."},{"key":"939_CR10","doi-asserted-by":"publisher","first-page":"4296","DOI":"10.1073\/pnas.1516047113","volume":"113","author":"KM Hoffman","year":"2016","unstructured":"Hoffman, K. M., Trawalter, S., Axt, J. R. & Oliver, M. N. Racial bias in pain assessment and treatment recommendations, and false beliefs about biological differences between blacks and whites. Proc. Natl Acad. Sci. 113, 4296\u20134301 (2016).","journal-title":"Proc. Natl Acad. Sci."},{"key":"939_CR11","unstructured":"Eddy, N. Epic, Microsoft partner to use generative AI for better EHRs. Healthcare IT News. https:\/\/www.healthcareitnews.com\/news\/epic-microsoft-partner-use-generative-ai-better-ehrs (2023)."},{"key":"939_CR12","unstructured":"Removing Race from Estimates of Kidney Function. National Kidney Foundation. https:\/\/www.kidney.org\/news\/removing-race-estimates-kidney-function (2021)."},{"key":"939_CR13","doi-asserted-by":"publisher","first-page":"992","DOI":"10.2215\/CJN.00090108","volume":"3","author":"J Hsu","year":"2008","unstructured":"Hsu, J., Johansen, K. L., Hsu, C.-Y., Kaysen, G. A. & Chertow, G. M. Higher serum creatinine concentrations in black patients with chronic kidney disease: beyond nutritional status and body composition. Clin. J. Am. Soc. Nephrol. CJASN 3, 992\u2013997 (2008).","journal-title":"Clin. J. Am. Soc. Nephrol. CJASN"},{"key":"939_CR14","doi-asserted-by":"publisher","first-page":"76","DOI":"10.1016\/S0190-9622(00)90012-4","volume":"42","author":"SE Whitmore","year":"2000","unstructured":"Whitmore, S. E. & Sago, N. J. Caliper-measured skin thickness is similar in white and black women. J. Am. Acad. Dermatol. 42, 76\u201379 (2000).","journal-title":"J. Am. Acad. Dermatol."},{"key":"939_CR15","doi-asserted-by":"publisher","first-page":"e0000198","DOI":"10.1371\/journal.pdig.0000198","volume":"2","author":"TH Kung","year":"2023","unstructured":"Kung, T. H. et al. Performance of ChatGPT on USMLE: potential for AI-assisted medical education using large language models. PLOS Digit. Health 2, e0000198 (2023).","journal-title":"PLOS Digit. Health"},{"key":"939_CR16","unstructured":"Bolukbasi, T., Chang, K.-W., Zou, J. Y., Saligrama, V. & Kalai, A. T. Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. in Advances in Neural Information Processing Systems. 29 (Curran Associates, Inc., 2016)."},{"key":"939_CR17","doi-asserted-by":"publisher","unstructured":"Sheng, E., Chang, K.-W., Natarajan, P. & Peng, N. The woman worked as a babysitter: on biases in language generation. in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) 3407\u20133412 (Association for Computational Linguistics, 2019). https:\/\/doi.org\/10.18653\/v1\/D19-1339.","DOI":"10.18653\/v1\/D19-1339"},{"key":"939_CR18","first-page":"9","volume":"1","author":"A Radford","year":"2019","unstructured":"Radford, A. et al. Language models are unsupervised multitask learners. OpenAI Blog 1, 9 (2019).","journal-title":"OpenAI Blog"},{"key":"939_CR19","first-page":"42","volume":"3","author":"G Kleinberg","year":"2022","unstructured":"Kleinberg, G., Diaz, M. J., Batchu, S. & Lucke-Wold, B. Racial underrepresentation in dermatological datasets leads to biased machine learning models and inequitable healthcare. J. Biomed. Res. 3, 42\u201347 (2022).","journal-title":"J. Biomed. Res."},{"key":"939_CR20","first-page":"27730","volume":"35","author":"L Ouyang","year":"2022","unstructured":"Ouyang, L. et al. Training language models to follow instructions with human feedback. Adv. Neural Inf. Process. Syst. 35, 27730\u201327744 (2022).","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"939_CR21","unstructured":"Bai, Y. et al. Training a helpful and harmless assistant with reinforcement learning from human feedback. Preprint at http:\/\/arxiv.org\/abs\/2204.05862 (2022)."},{"key":"939_CR22","doi-asserted-by":"publisher","unstructured":"Bender, E. M., Gebru, T., McMillan-Major, A. & Shmitchell, S. On the dangers of stochastic parrots: can language models be too big? in Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency 610\u2013623 (ACM, 2021). https:\/\/doi.org\/10.1145\/3442188.3445922.","DOI":"10.1145\/3442188.3445922"},{"key":"939_CR23","unstructured":"Celikyilmaz, A., Clark, E. & Gao, J. Evaluation of text generation: a survey. Preprint at http:\/\/arxiv.org\/abs\/2006.14799 (2021)."},{"key":"939_CR24","unstructured":"OpenAI. Introducing ChatGPT. https:\/\/openai.com\/blog\/chatgpt (2022)."},{"key":"939_CR25","doi-asserted-by":"publisher","unstructured":"OpenAI. GPT-4 Technical Report. Preprint at https:\/\/doi.org\/10.48550\/arXiv.2303.08774 (2023).","DOI":"10.48550\/arXiv.2303.08774"},{"key":"939_CR26","unstructured":"OpenAI. GPT-4. https:\/\/openai.com\/research\/gpt-4 (2023)."},{"key":"939_CR27","unstructured":"Introducing Claude. Anthropic https:\/\/www.anthropic.com\/index\/introducing-claude (2023)."}],"container-title":["npj Digital Medicine"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.nature.com\/articles\/s41746-023-00939-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s41746-023-00939-z","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s41746-023-00939-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,18]],"date-time":"2023-11-18T14:25:34Z","timestamp":1700317534000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.nature.com\/articles\/s41746-023-00939-z"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,10,20]]},"references-count":27,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2023,12]]}},"alternative-id":["939"],"URL":"https:\/\/doi.org\/10.1038\/s41746-023-00939-z","relation":{},"ISSN":["2398-6352"],"issn-type":[{"value":"2398-6352","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,10,20]]},"assertion":[{"value":"13 July 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"29 September 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"20 October 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"R.D. has served as an advisor to MDAlgorithms and Revea and received consulting fees from Pfizer, L\u2019Oreal, Frazier Healthcare Partners, and DWA, and research funding from UCB. V.R. is an expert advisor for Inhabit Brands. The remaining authors declare no competing interests.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"195"}}