{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,8]],"date-time":"2026-01-08T00:37:54Z","timestamp":1767832674247,"version":"3.49.0"},"reference-count":59,"publisher":"Oxford University Press (OUP)","issue":"3","license":[{"start":{"date-parts":[[2021,12,16]],"date-time":"2021-12-16T00:00:00Z","timestamp":1639612800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,1,29]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Objective<\/jats:title><jats:p>To analyze gender bias in clinical trials, to design an algorithm that mitigates the effects of biases of gender representation on natural language processing (NLP) systems trained on text drawn from clinical trials, and to evaluate its performance.<\/jats:p><\/jats:sec><jats:sec><jats:title>Materials and Methods<\/jats:title><jats:p>We analyze gender bias in clinical trials described by 16\u00a0772 PubMed abstracts (2008\u20132018). We present a method to augment word embeddings, the core building block of NLP-centric representations, by weighting abstracts by the number of women participants in the trial. We evaluate the resulting gender-sensitive embeddings' performance on several clinical prediction tasks: comorbidity classification, hospital length of stay prediction, and intensive care unit (ICU) readmission prediction.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>For female patients, the gender-sensitive model's area under the receiver operating characteristic curve (AUROC) is 0.86 versus the baseline of 0.81 for comorbidity classification, mean absolute error 4.59 versus the baseline of 4.66 for length of stay prediction, and AUROC 0.69 versus 0.67 for ICU readmission. 
All results are statistically significant.<\/jats:p><\/jats:sec><jats:sec><jats:title>Discussion<\/jats:title><jats:p>Women have been underrepresented in clinical trials. Thus, using the broad clinical trials literature as training data for statistical language models could result in biased models, with deficits in knowledge about women. The method presented enables gender-sensitive use of publications as training data for word embeddings. In experiments, the gender-sensitive embeddings show better performance than baseline embeddings for the clinical tasks studied. The results highlight opportunities for recognizing and addressing gender and other representational biases in the clinical trials literature.<\/jats:p><\/jats:sec><jats:sec><jats:title>Conclusion<\/jats:title><jats:p>Addressing representational biases in data for training NLP embeddings can lead to better results on downstream tasks for underrepresented populations.<\/jats:p><\/jats:sec>","DOI":"10.1093\/jamia\/ocab279","type":"journal-article","created":{"date-parts":[[2021,12,10]],"date-time":"2021-12-10T12:20:30Z","timestamp":1639138830000},"page":"415-423","source":"Crossref","is-referenced-by-count":12,"title":["Gender-sensitive word embeddings for healthcare"],"prefix":"10.1093","volume":"29","author":[{"given":"Shunit","family":"Agmon","sequence":"first","affiliation":[{"name":"Computer Science Faculty, Technion - Israel Institute of Technology, Haifa, Israel"}]},{"given":"Plia","family":"Gillis","sequence":"additional","affiliation":[{"name":"Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel"}]},{"given":"Eric","family":"Horvitz","sequence":"additional","affiliation":[{"name":"Microsoft Research, Redmond, WA, USA"}]},{"given":"Kira","family":"Radinsky","sequence":"additional","affiliation":[{"name":"Computer Science Faculty, Technion - Israel Institute of Technology, Haifa, 
Israel"}]}],"member":"286","published-online":{"date-parts":[[2021,12,16]]},"reference":[{"issue":"1","key":"2022012920323004800_ocab279-B1","doi-asserted-by":"crossref","first-page":"708","DOI":"10.18549\/PharmPract.2016.01.708","article-title":"Women\u2019s involvement in clinical trials: historical perspective and future implications","volume":"14","author":"Liu","year":"2016","journal-title":"Pharm Pract (Granada)"},{"issue":"7","key":"2022012920323004800_ocab279-B2","doi-asserted-by":"crossref","first-page":"e196700","DOI":"10.1001\/jamanetworkopen.2019.6700","article-title":"Quantifying sex bias in clinical studies at scale with automated data extraction","volume":"2","author":"Feldman","year":"2019","journal-title":"JAMA Netw Open"},{"issue":"34","key":"2022012920323004800_ocab279-B3","article-title":"Sex bias in drug research: a call for change","volume":"14","author":"McGregor","year":"2016","journal-title":"Evaluation"},{"issue":"4","key":"2022012920323004800_ocab279-B4","doi-asserted-by":"crossref","first-page":"630","DOI":"10.1097\/ACM.0000000000002027","article-title":"The more things change, the more they stay the same: a study to evaluate compliance with inclusion and assessment of women and minorities in randomized controlled trials","volume":"93","author":"Geller","year":"2018","journal-title":"Acad Med"},{"key":"2022012920323004800_ocab279-B5"},{"issue":"3","key":"2022012920323004800_ocab279-B6","doi-asserted-by":"crossref","first-page":"116","DOI":"10.1007\/s11930-017-0111-8","article-title":"Non-binary\/genderqueer identities: a critical review of the literature","volume":"9","author":"Matsuno","year":"2017","journal-title":"Curr Sex Health Rep"},{"issue":"1","key":"2022012920323004800_ocab279-B7","doi-asserted-by":"crossref","first-page":"44","DOI":"10.3109\/09540261.2015.1115753","article-title":"Mental health and gender dysphoria: a review of the literature","volume":"28","author":"Dhejne","year":"2016","journal-title":"Int Rev 
Psychiatry"},{"issue":"11","key":"2022012920323004800_ocab279-B8","doi-asserted-by":"crossref","first-page":"1003","DOI":"10.1177\/009127009803801103","article-title":"Gender differences in adverse drug reactions","volume":"38","author":"Tran","year":"1998","journal-title":"J Clin Pharmacol"},{"issue":"10","key":"2022012920323004800_ocab279-B9","doi-asserted-by":"crossref","first-page":"999","DOI":"10.1007\/s00228-008-0494-6","article-title":"Women encounter ADRs more often than do men","volume":"64","author":"Zopf","year":"2008","journal-title":"Eur J Clin Pharmacol"},{"issue":"11","key":"2022012920323004800_ocab279-B10","first-page":"1254","article-title":"Sex-based differences in drug activity","volume":"80","author":"Whitley","year":"2009","journal-title":"Am Fam Physician"},{"issue":"8","key":"2022012920323004800_ocab279-B11","doi-asserted-by":"crossref","first-page":"689","DOI":"10.1056\/NEJMp1307972","article-title":"Zolpidem and driving impairment\u2014identifying persons at risk","volume":"369","author":"Farkas","year":"2013","journal-title":"N Engl J Med"},{"issue":"3","key":"2022012920323004800_ocab279-B12","doi-asserted-by":"crossref","first-page":"433","DOI":"10.1038\/s41591-018-0335-9","article-title":"Evaluation and accurate diagnoses of pediatric diseases using artificial intelligence","volume":"25","author":"Liang","year":"2019","journal-title":"Nat Med"},{"issue":"4","key":"2022012920323004800_ocab279-B13","doi-asserted-by":"crossref","first-page":"e0174708","DOI":"10.1371\/journal.pone.0174708","article-title":"Creating an automated trigger for sepsis clinical decision support at emergency department triage using machine learning","volume":"12","author":"Horng","year":"2017","journal-title":"PLoS One"},{"key":"2022012920323004800_ocab279-B14","first-page":"259","article-title":"CodeMagic: semi-automatic assignment of ICD-10-AM codes to patient records","author":"Arifo\u011flu","year":"2014","journal-title":"Information Sciences and 
Systems"},{"issue":"4","key":"2022012920323004800_ocab279-B15","doi-asserted-by":"crossref","first-page":"1234","DOI":"10.1093\/bioinformatics\/btz682","article-title":"BioBERT: a pre-trained biomedical language representation model for biomedical text mining","volume":"36","author":"Lee","year":"2020","journal-title":"Bioinformatics"},{"key":"2022012920323004800_ocab279-B16"},{"issue":"6464","key":"2022012920323004800_ocab279-B17","doi-asserted-by":"crossref","first-page":"447","DOI":"10.1126\/science.aax2342","article-title":"Dissecting racial bias in an algorithm used to manage the health of populations","volume":"366","author":"Obermeyer","year":"2019","journal-title":"Science"},{"issue":"16","key":"2022012920323004800_ocab279-B18","doi-asserted-by":"crossref","first-page":"E3635","DOI":"10.1073\/pnas.1720347115","article-title":"Word embeddings quantify 100 years of gender and ethnic stereotypes","volume":"115","author":"Garg","year":"2018","journal-title":"Proc Natl Acad Sci U S A"},{"issue":"6334","key":"2022012920323004800_ocab279-B19","doi-asserted-by":"crossref","first-page":"183","DOI":"10.1126\/science.aal4230","article-title":"Semantics derived automatically from language corpora contain human-like biases","volume":"356","author":"Caliskan","year":"2017","journal-title":"Science"},{"key":"2022012920323004800_ocab279-B20","author":"Zhang","year":"2020"},{"key":"2022012920323004800_ocab279-B21","article-title":"Efficient estimation of word representations in vector space","author":"Mikolov","year":"2013","journal-title":"arXiv Preprint"},{"key":"2022012920323004800_ocab279-B22","author":"Pennington","year":"2014"},{"key":"2022012920323004800_ocab279-B23","author":"Peters","year":"2018"},{"key":"2022012920323004800_ocab279-B24","author":"Devlin","year":"2019"},{"key":"2022012920323004800_ocab279-B25","first-page":"4349","article-title":"Man is to computer programmer as woman is to homemaker? 
debiasing word embeddings","volume":"29","author":"Bolukbasi","year":"2016","journal-title":"Adv Neural Inform Process Syst"},{"key":"2022012920323004800_ocab279-B26","article-title":"Learning gender-neutral word embeddings","author":"Zhao","year":"2018","journal-title":"arXiv Preprint"},{"key":"2022012920323004800_ocab279-B27","article-title":"Lipstick on a pig: debiasing methods cover up systematic gender biases in word embeddings but do not remove them","author":"Gonen","year":"2019","journal-title":"arXiv Preprint"},{"key":"2022012920323004800_ocab279-B28","author":"Kurita","year":"2, 2019;"},{"key":"2022012920323004800_ocab279-B29","author":"Basta","year":"2, 2019;"},{"key":"2022012920323004800_ocab279-B30","author":"Ravfogel","year":"2020"},{"key":"2022012920323004800_ocab279-B31"},{"issue":"1","key":"2022012920323004800_ocab279-B32","doi-asserted-by":"crossref","first-page":"55","DOI":"10.1111\/1471-0528.14711","article-title":"Epidemiology of endometriosis: a large population-based database study from a healthcare provider with 2 million members","volume":"125","author":"Eisenberg","year":"2018","journal-title":"BJOG"},{"issue":"2","key":"2022012920323004800_ocab279-B33","doi-asserted-by":"crossref","first-page":"402","DOI":"10.1016\/j.ajo.2014.04.026","article-title":"The Maccabi Glaucoma Study: prevalence and incidence of glaucoma in a large Israeli health maintenance organization","volume":"158","author":"Levkovitch-Verbin","year":"2014","journal-title":"Am J Ophthalmol"},{"issue":"6","key":"2022012920323004800_ocab279-B34","doi-asserted-by":"crossref","first-page":"1044","DOI":"10.1002\/jmv.24426","article-title":"Epidemiology of hepatitis C virus infection in a large Israeli health maintenance organization","volume":"88","author":"Weil","year":"2016","journal-title":"J Med Virol"},{"issue":"3","key":"2022012920323004800_ocab279-B35","doi-asserted-by":"crossref","first-page":"501","DOI":"10.1161\/HYPERTENSIONAHA.114.03718","article-title":"Prevalence 
and factors associated with resistant hypertension in a large health maintenance organization in Israel","volume":"64","author":"Weitzman","year":"2014","journal-title":"Hypertension"},{"key":"2022012920323004800_ocab279-B36","year":"2018"},{"issue":"1","key":"2022012920323004800_ocab279-B37","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/sdata.2016.35","article-title":"MIMIC-III, a freely accessible critical care database","volume":"3","author":"Johnson","year":"2016","journal-title":"Sci Data"},{"issue":"Database issue","key":"2022012920323004800_ocab279-B38","doi-asserted-by":"crossref","first-page":"D267","DOI":"10.1093\/nar\/gkh061","article-title":"The unified medical language system (UMLS): integrating biomedical terminology","volume":"32","author":"Bodenreider","year":"2004","journal-title":"Nucleic Acids Res"},{"key":"2022012920323004800_ocab279-B39","author":"Aronson","year":"3\u20137, 2001; ,"},{"issue":"9","key":"2022012920323004800_ocab279-B40","doi-asserted-by":"crossref","first-page":"e0203755","DOI":"10.1371\/journal.pone.0203755","article-title":"Fibromyalgia diagnosis and biased assessment: sex, prevalence and bias","volume":"13","author":"Wolfe","year":"2018","journal-title":"PLoS One"},{"key":"2022012920323004800_ocab279-B41","author":"Rios","year":"2020"},{"key":"2022012920323004800_ocab279-B42","article-title":"Clinical concept embeddings learned from massive sources of multimodal medical data","author":"Beam","year":"2018","journal-title":"arXiv Preprint"},{"issue":"11","key":"2022012920323004800_ocab279-B43","doi-asserted-by":"crossref","first-page":"e0225495","DOI":"10.1371\/journal.pone.0225495","article-title":"Discovering novel disease comorbidities using electronic medical records","volume":"14","author":"Chaganti","year":"2019","journal-title":"PLoS 
ONE"},{"issue":"7","key":"2022012920323004800_ocab279-B44","doi-asserted-by":"crossref","first-page":"1105","DOI":"10.1158\/1055-9965.EPI-16-0212","article-title":"Comprehensive evaluation of medical conditions associated with risk of non-Hodgkin lymphoma using Medicare claims (\u201cMedWAS\u201d)","volume":"25","author":"Engels","year":"2016","journal-title":"Cancer Epidemiol Biomarkers Prev"},{"issue":"4","key":"2022012920323004800_ocab279-B45","doi-asserted-by":"crossref","first-page":"e5203","DOI":"10.1371\/journal.pone.0005203","article-title":"Exploring clinical associations using \u2018-omics\u2019 based enrichment analyses","volume":"4","author":"Hanauer","year":"2009","journal-title":"PLoS One"},{"issue":"6","key":"2022012920323004800_ocab279-B46","doi-asserted-by":"crossref","first-page":"e21132","DOI":"10.1371\/journal.pone.0021132","article-title":"Discovering disease associations by integrating electronic clinical data and medical literature","volume":"6","author":"Holmes","year":"2011","journal-title":"PLoS One"},{"issue":"3","key":"2022012920323004800_ocab279-B47","doi-asserted-by":"crossref","first-page":"837","DOI":"10.2307\/2531595","article-title":"Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach","volume":"44","author":"DeLong","year":"1988","journal-title":"Biometrics"},{"key":"2022012920323004800_ocab279-B48"},{"key":"2022012920323004800_ocab279-B49"},{"key":"2022012920323004800_ocab279-B50"},{"issue":"D1","key":"2022012920323004800_ocab279-B51","doi-asserted-by":"crossref","first-page":"D607","DOI":"10.1093\/nar\/gky1131","article-title":"STRING v11: protein\u2013protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets","volume":"47","author":"Szklarczyk","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"2022012920323004800_ocab279-B52","author":"Folino","year":"12\u201315, 2010; 
,"},{"issue":"10","key":"2022012920323004800_ocab279-B53","doi-asserted-by":"crossref","first-page":"e109264","DOI":"10.1371\/journal.pone.0109264","article-title":"Data-driven decisions for reducing readmissions for heart failure: general methodology and case study","volume":"9","author":"Bayati","year":"2014","journal-title":"PLoS One"},{"issue":"7","key":"2022012920323004800_ocab279-B54","doi-asserted-by":"crossref","first-page":"e0218942","DOI":"10.1371\/journal.pone.0218942","article-title":"Analysis and prediction of unplanned intensive care unit readmission using recurrent neural networks with long short-term memory","volume":"14","author":"Lin","year":"2019","journal-title":"PLoS One"},{"issue":"9","key":"2022012920323004800_ocab279-B55","doi-asserted-by":"crossref","first-page":"e017199","DOI":"10.1136\/bmjopen-2017-017199","article-title":"Prediction of early unplanned intensive care unit readmission in a UK tertiary care hospital: a cross-sectional machine learning approach","volume":"7","author":"Desautels","year":"2017","journal-title":"BMJ Open"},{"key":"2022012920323004800_ocab279-B56","article-title":"Accurate and reproducible prediction of ICU readmissions","author":"Nguyen","year":"2021","journal-title":"medRxiv"},{"issue":"1","key":"2022012920323004800_ocab279-B57","first-page":"139","article-title":"Demarginalizing the intersection of race and sex: A black feminist critique of antidiscrimination doctrine, feminist theory and antiracist politics","volume":"1989","author":"Crenshaw","year":"1989","journal-title":"University of Chicago Legal Forum"},{"issue":"2","key":"2022012920323004800_ocab279-B58","doi-asserted-by":"crossref","first-page":"313","DOI":"10.1017\/S1742058X13000210","article-title":"Movement intersectionality: The case of race, gender, disability, and genetic technologies","volume":"10","author":"Roberts","year":"2013","journal-title":"Du Bois 
Rev"},{"issue":"2","key":"2022012920323004800_ocab279-B59","doi-asserted-by":"crossref","first-page":"156","DOI":"10.2522\/ptj.20070147","article-title":"Scales to assess the quality of randomized controlled trials: a systematic review","volume":"88","author":"Olivo","year":"2008","journal-title":"Phys Ther"}],"container-title":["Journal of the American Medical Informatics Association"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/jamia\/article-pdf\/29\/3\/415\/42333196\/ocab279.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/jamia\/article-pdf\/29\/3\/415\/42333196\/ocab279.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,13]],"date-time":"2023-11-13T22:52:02Z","timestamp":1699915922000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/jamia\/article\/29\/3\/415\/6464068"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,12,16]]},"references-count":59,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2021,12,16]]},"published-print":{"date-parts":[[2022,1,29]]}},"URL":"https:\/\/doi.org\/10.1093\/jamia\/ocab279","relation":{},"ISSN":["1067-5027","1527-974X"],"issn-type":[{"value":"1067-5027","type":"print"},{"value":"1527-974X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2022,3,1]]},"published":{"date-parts":[[2021,12,16]]}}}