{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,4]],"date-time":"2026-04-04T21:36:17Z","timestamp":1775338577479,"version":"3.50.1"},"reference-count":29,"publisher":"Springer Science and Business Media LLC","issue":"S1","license":[{"start":{"date-parts":[[2022,7,1]],"date-time":"2022-07-01T00:00:00Z","timestamp":1656633600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,7,7]],"date-time":"2022-07-07T00:00:00Z","timestamp":1657152000000},"content-version":"vor","delay-in-days":6,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"crossref","award":["R01AT009457"],"award-info":[{"award-number":["R01AT009457"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["UL1TR002494"],"award-info":[{"award-number":["UL1TR002494"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Med Inform Decis Mak"],"published-print":{"date-parts":[[2022,7]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Background<\/jats:title><jats:p>Since no effective therapies exist for Alzheimer\u2019s disease (AD), prevention has become more critical through lifestyle status changes and interventions. Analyzing electronic health records (EHRs) of patients with AD can help us better understand lifestyle\u2019s effect on AD. However, lifestyle information is typically stored in clinical narratives. 
Thus, the objective of the study was to compare different natural language processing (NLP) models on classifying the lifestyle statuses (e.g., physical activity and excessive diet) from clinical texts in English.<\/jats:p><\/jats:sec><jats:sec><jats:title>Methods<\/jats:title><jats:p>Based on the collected concept unique identifiers (CUIs) associated with the lifestyle status, we extracted all related EHRs for patients with AD from the Clinical Data Repository (CDR) of the University of Minnesota (UMN). We automatically generated labels for the training data by using a rule-based NLP algorithm. We conducted weak supervision for pre-trained Bidirectional Encoder Representations from Transformers (BERT) models and three traditional machine learning models as baseline models on the weakly labeled training corpus. These models include the BERT base model, PubMedBERT (abstracts\u2009+\u2009full text), PubMedBERT (only abstracts), Unified Medical Language System (UMLS) BERT, Bio BERT, Bio-clinical BERT, logistic regression, support vector machine, and random forest. The rule-based model used for weak supervision was tested on the Gold Standard Corpus (GSC) for comparison. We performed two case studies, physical activity and excessive diet, to validate the effectiveness of BERT models in classifying lifestyle status. All models were evaluated and compared on the developed GSC for the two case studies.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>The UMLS BERT model achieved the best performance for classifying status of physical activity, with its precision, recall, and F-1 scores of 0.93, 0.93, and 0.92, respectively. 
Regarding classifying excessive diet, the Bio-clinical BERT model showed the best performance with precision, recall, and F-1 scores of 0.93, 0.93, and 0.93, respectively.<\/jats:p><\/jats:sec><jats:sec><jats:title>Conclusion<\/jats:title><jats:p>The proposed approach leveraging weak supervision could significantly increase the sample size, which is required for training the deep learning models. By comparing with the traditional machine learning models, the study also demonstrates the high performance of BERT models for classifying lifestyle status for Alzheimer\u2019s disease in clinical notes.<\/jats:p><\/jats:sec>","DOI":"10.1186\/s12911-022-01819-4","type":"journal-article","created":{"date-parts":[[2022,7,7]],"date-time":"2022-07-07T12:03:12Z","timestamp":1657195392000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":29,"title":["Classifying the lifestyle status for Alzheimer\u2019s disease from clinical notes using deep learning with weak supervision"],"prefix":"10.1186","volume":"22","author":[{"given":"Zitao","family":"Shen","sequence":"first","affiliation":[]},{"given":"Dalton","family":"Schutte","sequence":"additional","affiliation":[]},{"given":"Yoonkwon","family":"Yi","sequence":"additional","affiliation":[]},{"given":"Anusha","family":"Bompelli","sequence":"additional","affiliation":[]},{"given":"Fang","family":"Yu","sequence":"additional","affiliation":[]},{"given":"Yanshan","family":"Wang","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8258-3585","authenticated-orcid":false,"given":"Rui","family":"Zhang","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2022,7,7]]},"reference":[{"key":"1819_CR1","unstructured":"Alzheimer\u2019s Association: What is Alzheimer\u2019s? 
https:\/\/www.alz.org\/alzheimers-dementia\/what-is-alzheimers."},{"key":"1819_CR2","unstructured":"NIH: Alzheimer\u2019s Disease Fact Sheet. U.S. Department of Health and Human Services. https:\/\/www.nia.nih.gov\/health\/alzheimers-disease-fact-sheet."},{"issue":"4","key":"1819_CR3","doi-asserted-by":"publisher","first-page":"362","DOI":"10.2174\/1567205016666190315095151","volume":"16","author":"KS Frederiksen","year":"2019","unstructured":"Frederiksen KS, Gjerum L, Waldemar G, Hasselbalch SG. Physical activity as a moderator of alzheimer pathology: a systematic review of observational studies. Curr Alzheimer Res. 2019;16(4):362\u201378. https:\/\/doi.org\/10.2174\/1567205016666190315095151.","journal-title":"Curr Alzheimer Res"},{"issue":"4","key":"1819_CR4","doi-asserted-by":"publisher","first-page":"374","DOI":"10.1212\/WNL.0000000000009816","volume":"95","author":"K Dhana","year":"2020","unstructured":"Dhana K, Evans DA, Rajan KB, Bennett DA, Morris MC. Healthy lifestyle and the risk of Alzheimer dementia: findings from 2 longitudinal studies. Neurology. 2020;95(4):374\u201383.","journal-title":"Neurology"},{"issue":"6","key":"1819_CR5","doi-asserted-by":"publisher","first-page":"657","DOI":"10.1016\/j.jalz.2012.09.012","volume":"9","author":"M Kivipelto","year":"2013","unstructured":"Kivipelto M, Solomon A, Ahtiluoto S, Ngandu T, Lehtisalo J, Antikainen R, B\u00e4ckman L, H\u00e4nninen T, Jula A, Laatikainen T, et al. The finnish geriatric intervention study to prevent cognitive impairment and disability (finger): study design and progress. Alzheimer\u2019s Dement. 2013;9(6):657\u201365. https:\/\/doi.org\/10.1016\/j.jalz.2012.09.012.","journal-title":"Alzheimer\u2019s Dement"},{"key":"1819_CR6","unstructured":"Alzheimer\u2019s Association: A lifestyle intervention trial to support brain health and prevent cognitive decline. 
https:\/\/alz.org\/us-pointer\/overview.asp."},{"issue":"5","key":"1819_CR7","doi-asserted-by":"publisher","first-page":"382","DOI":"10.1056\/NEJMp0912825","volume":"362","author":"D Blumenthal","year":"2010","unstructured":"Blumenthal D. Launching hitech. N Engl J Med. 2010;362(5):382\u20135. https:\/\/doi.org\/10.1056\/NEJMp0912825.","journal-title":"N Engl J Med"},{"key":"1819_CR8","doi-asserted-by":"publisher","first-page":"34","DOI":"10.1016\/j.jbi.2017.11.011","volume":"77","author":"Y Wang","year":"2018","unstructured":"Wang Y, Wang L, Rastegar-Mojarad M, Moon S, Shen F, Afzal N, Liu S, Zeng Y, Mehrabi S, Sohn S, Liu H. Clinical information extraction applications: a literature review. J Biomed Inform. 2018;77:34\u201349. https:\/\/doi.org\/10.1016\/j.jbi.2017.11.011.","journal-title":"J Biomed Inform"},{"key":"1819_CR9","doi-asserted-by":"publisher","first-page":"11","DOI":"10.1016\/j.jbi.2018.10.005","volume":"88","author":"S Velupillai","year":"2018","unstructured":"Velupillai S, Suominen H, Liakata M, Roberts A, Shah AD, Morley K, Osborn D, Hayes J, Stewart R, Downs J, Chapman W. Using clinical natural language processing for health outcomes research: overview and actionable suggestions for future advances. J Biomed Inform. 2018;88:11\u20139. https:\/\/doi.org\/10.1016\/j.jbi.2018.10.005.","journal-title":"J Biomed Inform"},{"issue":"01","key":"1819_CR10","doi-asserted-by":"publisher","first-page":"194","DOI":"10.15265\/iy-2015-035","volume":"24","author":"A N\u00e9v\u00e9ol","year":"2015","unstructured":"N\u00e9v\u00e9ol A, Zweigenbaum P. Clinical natural language processing in 2014: foundational methods supporting efficient healthcare. Yearb Med Inform. 2015;24(01):194\u20138. https:\/\/doi.org\/10.15265\/iy-2015-035.","journal-title":"Yearb Med Inform"},{"key":"1819_CR11","unstructured":"Wu Y, Jiang M, Xu J, Zhi D, Xu H: Clinical named entity recognition using deep learning models. In: AMIA annual symposium proceedings, vol 2017. 
American Medical Informatics Association; 2017. p. 1812"},{"key":"1819_CR12","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-43742-2","volume-title":"Secondary analysis of electronic health records","author":"M Critical Data","year":"2016","unstructured":"Critical Data M. Secondary analysis of electronic health records. Springer; 2016."},{"issue":"1","key":"1819_CR13","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s12911-017-0537-y","volume":"17","author":"J-B Escudi\u00e9","year":"2017","unstructured":"Escudi\u00e9 J-B, Rance B, Malamut G, Khater S, Burgun A, Cellier C, Jannot A-S. A novel data-driven workflow combining literature and electronic health records to estimate comorbidities burden for a specific disease: a case study on autoimmune comorbidities in patients with celiac disease. BMC Med Inform Decis Mak. 2017;17(1):1\u201310.","journal-title":"BMC Med Inform Decis Mak"},{"key":"1819_CR14","doi-asserted-by":"publisher","DOI":"10.1016\/j.ijmedinf.2019.08.003","volume":"130","author":"X Zhou","year":"2019","unstructured":"Zhou X, Wang Y, Sohn S, Therneau TM, Liu H, Knopman DS. Automatic extraction and assessment of lifestyle exposures for Alzheimer\u2019s disease using natural language processing. Int J Med Inform. 2019;130: 103943. https:\/\/doi.org\/10.1016\/j.ijmedinf.2019.08.003.","journal-title":"Int J Med Inform"},{"key":"1819_CR15","doi-asserted-by":"crossref","unstructured":"Yi Y, Shen Z, Bompelli A, Yu F, Wang Y, Zhang R: Natural language processing methods to extract lifestyle exposures for Alzheimer\u2019s disease from clinical notes. In: HealthNLP workshop 2020. 2020 (in Press).","DOI":"10.1109\/ICHI48887.2020.9374320"},{"key":"1819_CR16","unstructured":"Devlin J, Chang M, Lee K, Toutanova K: BERT: pre-training of deep bidirectional transformers for language understanding. 
CoRR abs\/1810.04805; 2018."},{"issue":"1","key":"1819_CR17","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/sdata.2016.35","volume":"3","author":"AE Johnson","year":"2016","unstructured":"Johnson AE, Pollard TJ, Shen L, Li-Wei HL, Feng M, Ghassemi M, Moody B, Szolovits P, Celi LA, Mark RG. MIMIC-III, a freely accessible critical care database. Sci Data. 2016;3(1):1\u20139.","journal-title":"Sci Data"},{"key":"1819_CR18","doi-asserted-by":"crossref","unstructured":"Gu Y, Tinn R, Cheng H, Lucas M, Usuyama N, Liu X, Naumann T, Gao J, Poon H: Domain-specific language model pretraining for biomedical natural language processing. 2020. arXiv:2007.15779.","DOI":"10.1145\/3458754"},{"key":"1819_CR19","doi-asserted-by":"publisher","unstructured":"Alsentzer E, Murphy J, Boag W, Weng W-H, Jindi D, Naumann T, McDermott M: Publicly available clinical BERT embeddings. Association for Computational Linguistics; 2019. https:\/\/doi.org\/10.18653\/v1\/W19-1909. https:\/\/www.aclweb.org\/anthology\/W19-1909.","DOI":"10.18653\/v1\/W19-1909"},{"key":"1819_CR20","doi-asserted-by":"crossref","unstructured":"Gu Y, Tinn R, Cheng H, Lucas M, Usuyama N, Liu X, Naumann T, Gao J, Poon H: Domain-specific language model pretraining for biomedical natural language processing. arXiv preprint arXiv:2007.15779. 2020.","DOI":"10.1145\/3458754"},{"issue":"4","key":"1819_CR21","doi-asserted-by":"publisher","first-page":"1234","DOI":"10.1093\/bioinformatics\/btz682","volume":"36","author":"J Lee","year":"2019","unstructured":"Lee J, Yoon W, Kim S, Kim D, Kim S, So CH, Kang J. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics. 2019;36(4):1234\u201340. 
https:\/\/doi.org\/10.1093\/bioinformatics\/btz682.","journal-title":"Bioinformatics"},{"key":"1819_CR22","doi-asserted-by":"crossref","unstructured":"Michalopoulos G, Wang Y, Kaka H, Chen H, Wong A: Umlsbert: clinical domain knowledge augmentation of contextual embeddings using the unified medical language system metathesaurus. arXiv preprint arXiv:2010.10391. 2020.","DOI":"10.18653\/v1\/2021.naacl-main.139"},{"issue":"1","key":"1819_CR23","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s12911-018-0723-6","volume":"19","author":"Y Wang","year":"2019","unstructured":"Wang Y, Sohn S, Liu S, Shen F, Wang L, Atkinson EJ, Amin S, Liu H. A clinical text classification paradigm using weak supervision and deep representation. BMC Med Inform Decis Mak. 2019;19(1):1.","journal-title":"BMC Med Inform Decis Mak"},{"issue":"1","key":"1819_CR24","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/s41467-021-22328-4","volume":"12","author":"JA Fries","year":"2021","unstructured":"Fries JA, Steinberg E, Khattar S, Fleming SL, Posada J, Callahan A, Shah NH. Ontology-driven weak supervision for clinical entity classification in electronic health records. Nat Commun. 2021;12(1):1\u201311.","journal-title":"Nat Commun"},{"key":"1819_CR25","doi-asserted-by":"crossref","unstructured":"Liang C, Yu Y, Jiang H, Er S, Wang R, Zhao T, Zhang C: Bond: bert-assisted open-domain named entity recognition with distant supervision. In: Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery and data mining. 2020. p. 1054\u201364.","DOI":"10.1145\/3394486.3403149"},{"key":"1819_CR26","doi-asserted-by":"crossref","unstructured":"Patel D, Konam S, Selvaraj SP: Weakly supervised medication regimen extraction from medical conversations. arXiv preprint arXiv:2010.05317. 2020.","DOI":"10.18653\/v1\/2020.clinicalnlp-1.20"},{"key":"1819_CR27","unstructured":"Klie J-C: Inception: interactive machine-assisted annotation. 
In: Proceedings of the first biennial conference on design of experimental search and information retrieval systems; 2018. p. 105. http:\/\/tubiblio.ulb.tu-darmstadt.de\/106627\/."},{"key":"1819_CR28","unstructured":"Devlin J, Chang M-W, Lee K, Toutanova K: BERT: pre-training of deep bidirectional transformers for language understanding. 2018. 1810.04805."},{"key":"1819_CR29","doi-asserted-by":"publisher","first-page":"48","DOI":"10.1016\/j.ijmedinf.2017.07.002","volume":"106","author":"R Zhang","year":"2017","unstructured":"Zhang R, Simon G, Yu F. Advancing Alzheimer\u2019s research: a review of big data promises. Int J Med Inform. 2017;106:48\u201356.","journal-title":"Int J Med Inform"}],"container-title":["BMC Medical Informatics and Decision Making"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12911-022-01819-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s12911-022-01819-4\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12911-022-01819-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,11]],"date-time":"2023-02-11T02:20:56Z","timestamp":1676082056000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcmedinformdecismak.biomedcentral.com\/articles\/10.1186\/s12911-022-01819-4"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,7]]},"references-count":29,"journal-issue":{"issue":"S1","published-print":{"date-parts":[[2022,7]]}},"alternative-id":["1819"],"URL":"https:\/\/doi.org\/10.1186\/s12911-022-01819-4","relation":{},"ISSN":["1472-6947"],"issn-type":[{"value":"1472-6947","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,7]]},"assertion":[{"value":
"13 March 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"22 March 2022","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"7 July 2022","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"The authors declare that they have no competing interests.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}},{"order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}}],"article-number":"88"}}