{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,27]],"date-time":"2026-03-27T16:56:23Z","timestamp":1774630583700,"version":"3.50.1"},"reference-count":53,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2026,3,21]],"date-time":"2026-03-21T00:00:00Z","timestamp":1774051200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100017223","name":"The National Energy Research Scientific Computing Center","doi-asserted-by":"crossref","id":[{"id":"10.13039\/100017223","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/100000015","name":"The U.S. Department of Energy","doi-asserted-by":"crossref","id":[{"id":"10.13039\/100000015","id-type":"DOI","asserted-by":"crossref"}]},{"name":"The Sustainable Research Pathways Program"},{"name":"The Hood College Volpe Scholarship"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["BDCC"],"abstract":"<jats:p>Obstructive Sleep Apnea (OSA) is a common sleep disorder associated with serious health risks. This study leverages large language models (LLMs) to process and interpret clinical narratives in electronic health records. It develops clinically meaningful lexicons for predicting mortality and readmission risk, as well as for multiclass diagnostic classification in OSA patients. Using LLM-expanded lexicons, logistic regression models achieved ROC\u2013AUC scores of 0.844 for 6-month all-cause post-discharge mortality, 0.817 for 1-year all-cause post-discharge mortality, and 0.729 for all-cause hospital readmissions following the first discharge. Diagnostic performance was highest with smaller n-gram representations, indicating that additional contextual length did not improve performance. Compared with frequency-based n-gram models, LLM-expanded lexicons yielded sparser feature sets with lower computational cost and comparable performance. Our findings highlight the potential of LLM-expanded lexicons to enhance OSA diagnosis and clinical risk stratification.<\/jats:p>","DOI":"10.3390\/bdcc10030097","type":"journal-article","created":{"date-parts":[[2026,3,23]],"date-time":"2026-03-23T13:53:34Z","timestamp":1774274014000},"page":"97","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Predicting Mortality and Readmission in Obstructive Sleep Apnea via LLM-Expanded Clinical Concepts"],"prefix":"10.3390","volume":"10","author":[{"given":"Awwal","family":"Ahmed","sequence":"first","affiliation":[{"name":"Hood College, Frederick, MD 21701, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Anthony","family":"Rispoli","sequence":"additional","affiliation":[{"name":"Hood College, Frederick, MD 21701, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Carrie","family":"Wasieloski","sequence":"additional","affiliation":[{"name":"Hood College, Frederick, MD 21701, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ifrah","family":"Khurram","sequence":"additional","affiliation":[{"name":"San Juan Bautista School of Medicine, Caguas, PR 00727, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Rafael","family":"Zamora-Resendiz","sequence":"additional","affiliation":[{"name":"Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Destinee","family":"Morrow","sequence":"additional","affiliation":[{"name":"Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6005-755X","authenticated-orcid":false,"given":"Aijuan","family":"Dong","sequence":"additional","affiliation":[{"name":"Hood College, Frederick, MD 21701, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1919-1890","authenticated-orcid":false,"given":"Silvia","family":"Crivelli","sequence":"additional","affiliation":[{"name":"Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2026,3,21]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"108348","DOI":"10.1016\/j.rmed.2025.108348","article-title":"Unmasking obstructive sleep apnea: Estimated prevalence and impact in the United States","volume":"248","author":"Dupuy","year":"2025","journal-title":"Respir. Med."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"687","DOI":"10.1016\/S2213-2600(19)30198-5","article-title":"Estimation of the global prevalence and burden of obstructive sleep apnoea: A literature-based analysis","volume":"7","author":"Benjafield","year":"2019","journal-title":"Lancet Respir. Med."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"720","DOI":"10.1161\/CIRCOUTCOMES.111.964783","article-title":"Association of obstructive sleep apnea with risk of serious cardiovascular events: A systematic review and meta-analysis","volume":"5","author":"Loke","year":"2012","journal-title":"Circ. Cardiovasc. Qual. Outcomes"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"e648","DOI":"10.1212\/WNL.0000000000006904","article-title":"Prevalence of sleep-disordered breathing after stroke and TIA: A meta-analysis","volume":"92","author":"Seiler","year":"2019","journal-title":"Neurology"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"777","DOI":"10.1093\/sleep\/29.6.777","article-title":"Differences in polysomnography predictors for hypertension and impaired glucose tolerance","volume":"29","author":"Sulit","year":"2006","journal-title":"Sleep"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"11","DOI":"10.1016\/j.sleep.2018.03.016","article-title":"Relationship between obstructive sleep apnoea syndrome and essential hypertension: A dose-response meta-analysis","volume":"47","author":"Xia","year":"2018","journal-title":"Sleep Med."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"39","DOI":"10.1016\/j.smrv.2017.03.005","article-title":"Cognitive deficits in obstructive sleep apnea: Insights from a meta-review and comparison with deficits observed in COPD, insomnia, and sleep deprivation","volume":"38","author":"Olaithe","year":"2018","journal-title":"Sleep Med. Rev."},{"key":"ref_8","first-page":"613","article-title":"Sleep-Disordered Breathing, Hypoxia, and Risk of Mild Cognitive Impairment and Dementia in Older Women","volume":"306","author":"Yaffe","year":"2011","journal-title":"JAMA"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"812","DOI":"10.1097\/ALN.0b013e31816d83e4","article-title":"STOP questionnaire: A tool to screen patients for obstructive sleep apnea","volume":"108","author":"Chung","year":"2008","journal-title":"Anesthesiology"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"742","DOI":"10.1016\/S2213-2600(16)30075-3","article-title":"The NoSAS score for screening of sleep-disordered breathing: A derivation and validation study","volume":"4","author":"Hirotsu","year":"2016","journal-title":"Lancet Respir. Med."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"144","DOI":"10.1136\/amiajnl-2011-000681","article-title":"Methods and dimensions of electronic health record data quality assessment: Enabling reuse for clinical research","volume":"20","author":"Weiskopf","year":"2013","journal-title":"J. Am. Med. Inform. Assoc."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Chiu, C.C., Wu, C.M., Chien, T.N., Kao, L.J., Li, C., and Chu, C.M. (2023). Integrating structured and unstructured EHR data for predicting mortality by machine learning and latent Dirichlet allocation method. Int. J. Environ. Res. Public Health, 20.","DOI":"10.3390\/ijerph20054340"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Zhang, D., Yin, C., Zeng, J., Yuan, X., and Zhang, P. (2020). Combining structured and unstructured data for predictive models: A deep learning approach. BMC Med. Inform. Decis. Mak., 20.","DOI":"10.1186\/s12911-020-01297-6"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"18","DOI":"10.1038\/s41746-018-0029-1","article-title":"Scalable and accurate deep learning with electronic health records","volume":"1","author":"Rajkomar","year":"2018","journal-title":"NPJ Digit. Med."},{"key":"ref_15","unstructured":"Fensore, C., Carrillo-Larco, R.M., Patel, S.A., Morris, A.A., and Ho, J.C. (2024). Large Language Models for Integrating Social Determinant of Health Data: A Case Study on Heart Failure 30-Day Readmission Prediction. arXiv."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Park, S., Wee, C.W., Choi, S.H., Kim, K.H., Chang, J.S., Yoon, H.I., Lee, I.J., Kim, Y.B., Cho, J., and Keum, K.C. (2024). RT-Surv: Improving Mortality Prediction After Radiotherapy with Large Language Model Structuring of Large-Scale Unstructured Electronic Health Records. arXiv.","DOI":"10.1016\/j.radonc.2025.111052"},{"key":"ref_17","unstructured":"Devlin, J., Chang, M.W., Lee, K., and Toutanova, K. (2019, January 2\u20137). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Alsentzer, E., Murphy, J.R., Boag, W., Weng, W.H., Jin, D., Naumann, T., and McDermott, M. (2019). Publicly available clinical BERT embeddings. arXiv.","DOI":"10.18653\/v1\/W19-1909"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"1234","DOI":"10.1093\/bioinformatics\/btz682","article-title":"BioBERT: A pre-trained biomedical language representation model for biomedical text mining","volume":"36","author":"Lee","year":"2020","journal-title":"Bioinformatics"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"194","DOI":"10.1038\/s41746-022-00742-2","article-title":"A large language model for electronic health records","volume":"5","author":"Yang","year":"2022","journal-title":"NPJ Digit. Med."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"172","DOI":"10.1038\/s41586-023-06291-2","article-title":"Large language models encode clinical knowledge","volume":"620","author":"Singhal","year":"2023","journal-title":"Nature"},{"key":"ref_22","unstructured":"Nori, H., King, N., McKinney, S.M., Carignan, D., and Horvitz, E. (2023). Capabilities of GPT-4 on medical challenge problems. arXiv."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"58","DOI":"10.1186\/s13326-016-0093-x","article-title":"Expansion of medical vocabularies using distributional semantics on Japanese patient blogs","volume":"7","author":"Ahltorp","year":"2016","journal-title":"J. Biomed. Semant."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"246","DOI":"10.1093\/jamiaopen\/ooz007","article-title":"Using word embeddings to expand terminology of dietary supplements using clinical notes","volume":"2","author":"Fan","year":"2019","journal-title":"JAMIA Open"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"104497","DOI":"10.1016\/j.jbi.2023.104497","article-title":"Embedding-based terminology expansion via secondary data sources","volume":"147","author":"Kugic","year":"2023","journal-title":"J. Biomed. Inform."},{"key":"ref_26","unstructured":"Bommasani, R., Hudson, D.A., Adeli, E., Altman, R., Arora, S., von Arx, S., Bernstein, M.S., Bohg, J., Bosselut, A., and Brunskill, E. (2021). On the opportunities and risks of foundation models. arXiv."},{"key":"ref_27","unstructured":"Le Scao, T., Fan, A., Akiki, C., Pavlick, E., Ili\u0107, S., Hesslow, D., Castagn\u00e9, R., Luccioni, A.S., Yvon, F., and Gall\u00e9, M. (2022). Bloom: A 176b-parameter open-access multilingual language model. arXiv."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Peng, Y., Yan, S., and Lu, Z. (2019). Transfer learning in biomedical natural language processing: An evaluation of BERT and ELMo on ten benchmarking datasets. arXiv.","DOI":"10.18653\/v1\/W19-5006"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"e107","DOI":"10.1016\/S2589-7500(23)00021-3","article-title":"ChatGPT: The future of discharge summaries?","volume":"5","author":"Patel","year":"2023","journal-title":"Lancet Digit. Health"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"bbac409","DOI":"10.1093\/bib\/bbac409","article-title":"BioGPT: Generative pre-trained transformer for biomedical text generation and mining","volume":"23","author":"Luo","year":"2022","journal-title":"Briefings Bioinform."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Kumar, V., Rajawat, P.S., and Ntoutsi, E. (2025). Mitigating Semantic Drift: Evaluating LLMs\u2019 Efficacy in Psychotherapy through MI Dialogue Summarization Leveraging MITI Code. Proceedings of the International Joint Conference on Neural Networks (IJCNN), IEEE.","DOI":"10.1109\/IJCNN64981.2025.11228771"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"130","DOI":"10.1097\/CM9.0000000000003456","article-title":"Application of Large Language Models in Disease Diagnosis and Treatment","volume":"138","author":"Yang","year":"2025","journal-title":"Chin. Med. J."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"e59069","DOI":"10.2196\/59069","article-title":"Revolutionizing Health Care: The Transformative Impact of Large Language Models in Medicine","volume":"27","author":"Zhang","year":"2025","journal-title":"J. Med. Internet Res."},{"key":"ref_34","unstructured":"Qadrud-Din, J., Rabiou, A.B., Walker, R., Soni, R., Gajek, M., Pack, G., and Rangaraj, A. (2020). Transformer based language models for similar text retrieval and ranking. arXiv."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Chae, Y., and Davidson, T. (2023). Large Language Models for Text Classification: From Zero-Shot Learning to Fine-Tuning, Open Science Foundation.","DOI":"10.31235\/osf.io\/sthwk_v1"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Savelka, J., and Ashley, K.D. (2023). The unreasonable effectiveness of large language models in zero-shot semantic annotation of legal texts. Front. Artif. Intell., 6.","DOI":"10.3389\/frai.2023.1279794"},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"1297","DOI":"10.1093\/jamia\/ocz096","article-title":"Enhancing clinical concept extraction with contextual embeddings","volume":"26","author":"Si","year":"2019","journal-title":"J. Am. Med. Inform. Assoc."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Hernandez, B., Stiff, O., Ming, D.K., Ho Quang, C., Nguyen Lam, V., Nguyen Minh, T., Nguyen Van Vinh, C., Nguyen Minh, N., Nguyen Quang, H., and Phung Khanh, L. (2023). Learning meaningful latent space representations for patient risk stratification: Model development and validation for dengue and other acute febrile illness. Front. Digit. Health, 5.","DOI":"10.3389\/fdgth.2023.1057467"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Morrow, E., Zamora-Resendiz, R., Beckham, J.C., Kimbrel, N.A., McMahon, B.H., and Crivelli, S. (2026). Life events extraction from healthcare notes for veteran acute suicide risk prediction. J. Am. Med. Inform. Assoc. (JAMIA), ocaf197.","DOI":"10.1093\/jamia\/ocaf197"},{"key":"ref_40","first-page":"26","article-title":"What\u2019s in a note? Unpacking predictive value in clinical note representations","volume":"2018","author":"Boag","year":"2018","journal-title":"AMIA Summits Transl. Sci. Proc."},{"key":"ref_41","unstructured":"Huang, K., Altosaar, J., and Ranganath, R. (2019). Clinicalbert: Modeling clinical notes and predicting hospital readmission. arXiv."},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Wu, J., Ye, X., Mou, C., and Dai, W. (2023). Fineehr: Refine clinical note representations to improve mortality prediction. Proceedings of the 2023 11th International Symposium on Digital Forensics and Security (ISDFS), IEEE.","DOI":"10.1109\/ISDFS58141.2023.10131726"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Ye, J., Yao, L., Shen, J., Janarthanam, R., and Luo, Y. (2020). Predicting mortality in critically ill patients with diabetes using machine learning and clinical notes. BMC Med. Inform. Decis. Mak., 20.","DOI":"10.1186\/s12911-020-01318-4"},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"103256","DOI":"10.1016\/j.jbi.2019.103256","article-title":"Readmission prediction using deep learning on electronic health records","volume":"97","author":"Ashfaq","year":"2019","journal-title":"J. Biomed. Inform."},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"e38241","DOI":"10.2196\/38241","article-title":"Predicting postoperative mortality with deep neural networks and natural language processing: Model development and validation","volume":"10","author":"Chen","year":"2022","journal-title":"JMIR Med. Inform."},{"key":"ref_46","unstructured":"Jin, M., Bahadori, M.T., Colak, A., Bhatia, P., Celikkaya, B., Bhakta, R., Senthivel, S., Khalilia, M., Navarro, D., and Zhang, B. (2018). Improving hospital mortality prediction with medical named entities and multimodal learning. arXiv."},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Khadanga, S., Aggarwal, K., Joty, S., and Srivastava, J. (2019). Using clinical notes with time series data for ICU management. arXiv.","DOI":"10.18653\/v1\/D19-1678"},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"1190","DOI":"10.1177\/000313481808400736","article-title":"Predicting mortality in the surgical intensive care unit using artificial intelligence and natural language processing of physician documentation","volume":"84","author":"Parreco","year":"2018","journal-title":"Am. Surg."},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Nazih, W., Abuhmed, T., Alharbi, M., and El-Sappagh, S. (2025). Mortality Prediction for ICU Patients with Mental Disorders Using Large Language Models Ensemble and Unstructured Medical Notes. PLoS ONE, 20.","DOI":"10.1371\/journal.pone.0332134"},{"key":"ref_50","unstructured":"Johnson, A., Bulgarelli, L., Pollard, T., Horng, S., Celi, L.A., and Mark, R. (2023). MIMIC-IV (version 2.2). PhysioNet."},{"key":"ref_51","unstructured":"Ahmed, A., Rispoli, A., Wasieloski, C., Khurram, I., Zamora-Resendiz, R., Morrow, D., Dong, A., and Crivelli, S. (2024, January 9\u201312). Deep Phenotyping of Obstructive Sleep Apnea and Comorbidities with Large Language Models. Proceedings of the AIME24, Salt Lake City, UT, USA."},{"key":"ref_52","doi-asserted-by":"crossref","first-page":"175","DOI":"10.5664\/jcsm.8160","article-title":"Multisite validation of a simple electronic health record algorithm for identifying diagnosed obstructive sleep apnea","volume":"16","author":"Keenan","year":"2020","journal-title":"J. Clin. Sleep Med."},{"key":"ref_53","doi-asserted-by":"crossref","first-page":"ooab117","DOI":"10.1093\/jamiaopen\/ooab117","article-title":"Sleep apnea phenotyping and relationship to disease in a large clinical biobank","volume":"5","author":"Cade","year":"2022","journal-title":"JAMIA Open"}],"container-title":["Big Data and Cognitive Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-2289\/10\/3\/97\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,27]],"date-time":"2026-03-27T16:12:48Z","timestamp":1774627968000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-2289\/10\/3\/97"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,3,21]]},"references-count":53,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2026,3]]}},"alternative-id":["bdcc10030097"],"URL":"https:\/\/doi.org\/10.3390\/bdcc10030097","relation":{},"ISSN":["2504-2289"],"issn-type":[{"value":"2504-2289","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,3,21]]}}}