{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,2]],"date-time":"2026-04-02T15:00:35Z","timestamp":1775142035354,"version":"3.50.1"},"reference-count":38,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2025,3,4]],"date-time":"2025-03-04T00:00:00Z","timestamp":1741046400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Digit. Health"],"abstract":"<jats:p>Large language models have shown improved accuracy in phenotype term normalization tasks when augmented with retrievers that suggest candidate normalizations based on term definitions. In this work, we introduce a simplified retriever that enhances large language model accuracy by searching the Human Phenotype Ontology (HPO) for candidate matches using contextual word embeddings from BioBERT without the need for explicit term definitions. Testing this method on terms derived from the clinical synopses of Online Mendelian Inheritance in Man (OMIM<jats:sup>\u00ae<\/jats:sup>), we demonstrate that the normalization accuracy of GPT-4o increases from a baseline of 62% without augmentation to 85% with retriever augmentation. This approach is potentially generalizable to other biomedical term normalization tasks and offers an efficient alternative to more complex retrieval methods.<\/jats:p>","DOI":"10.3389\/fdgth.2025.1495040","type":"journal-article","created":{"date-parts":[[2025,3,4]],"date-time":"2025-03-04T07:10:29Z","timestamp":1741072229000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":8,"title":["A simplified retriever to improve accuracy of phenotype normalizations by large language models"],"prefix":"10.3389","volume":"7","author":[{"given":"Daniel B.","family":"Hier","sequence":"first","affiliation":[]},{"given":"Thanh Son","family":"Do","sequence":"additional","affiliation":[]},{"given":"Tayo","family":"Obafemi-Ajayi","sequence":"additional","affiliation":[]}],"member":"1965","published-online":{"date-parts":[[2025,3,4]]},"reference":[{"key":"B1","doi-asserted-by":"publisher","first-page":"bbad493","DOI":"10.1093\/bib\/bbad493","article-title":"Opportunities and challenges for ChatGPT and large language models in biomedicine and health","volume":"25","author":"Tian","year":"2023","journal-title":"Brief Bioinform"},{"key":"B2","doi-asserted-by":"publisher","first-page":"1058","DOI":"10.1212\/WNL.0000000000207967","article-title":"Large language models in neurology research and future practice","volume":"101","author":"Romano","year":"2023","journal-title":"Neurology"},{"key":"B3","doi-asserted-by":"publisher","first-page":"141","DOI":"10.1038\/s43856-023-00370-1","article-title":"The future landscape of large language models in medicine","volume":"3","author":"Clusmann","year":"2023","journal-title":"Commun Med"},{"key":"B4","doi-asserted-by":"publisher","first-page":"ocad259","DOI":"10.1093\/jamia\/ocad259","article-title":"Improving large language models for clinical named entity recognition via prompt engineering","volume":"31","author":"Hu","year":"2024","journal-title":"J Am Med Inform Assoc"},{"key":"B5","doi-asserted-by":"publisher","first-page":"104458","DOI":"10.1016\/j.jbi.2023.104458","article-title":"Few-shot learning for medical text: a review of advances, trends, and opportunities","volume":"144","author":"Ge","year":"2023","journal-title":"J Biomed Inform"},{"key":"B6","article-title":"Entity decomposition with filtering: a zero-shot clinical named entity recognition framework.","author":"Averly","year":""},{"key":"B7","doi-asserted-by":"publisher","first-page":"ocae074","DOI":"10.1093\/jamia\/ocae074","article-title":"Large language models for biomedicine: foundations, opportunities, challenges, and best practices","volume":"31","author":"Sahoo","year":"2024","journal-title":"J Am Med Inform Assoc"},{"key":"B8","doi-asserted-by":"crossref","DOI":"10.18653\/v1\/2024.bionlp-1.29","article-title":"REAL: a retrieval-augmented entity linking approach for biomedical concept recognition.","author":"Shlyk","year":""},{"key":"B9","doi-asserted-by":"publisher","first-page":"507","DOI":"10.1038\/nrg.2016.86","article-title":"Towards precision medicine","volume":"17","author":"Ashley","year":"2016","journal-title":"Nat Rev Genet"},{"key":"B10","doi-asserted-by":"publisher","first-page":"793","DOI":"10.1056\/NEJMp1500523","article-title":"A new initiative on precision medicine","volume":"372","author":"Collins","year":"2015","journal-title":"N Engl J Med"},{"key":"B11","doi-asserted-by":"publisher","first-page":"777","DOI":"10.1002\/humu.22080","article-title":"Deep phenotyping for precision medicine","volume":"33","author":"Robinson","year":"2012","journal-title":"Hum Mutat"},{"key":"B12","doi-asserted-by":"publisher","first-page":"139","DOI":"10.1007\/978-981-10-1503-8_7","article-title":"Text mining for precision medicine: bringing structure to EHRs and biomedical literature to understand genes and health","volume":"939","author":"Simmons","year":"2016","journal-title":"Transl Biomed Inform"},{"key":"B13","doi-asserted-by":"publisher","first-page":"e1378","DOI":"10.1002\/wsbm.1378","article-title":"Integrated precision medicine: the role of electronic health records in delivering personalized treatment","volume":"9","author":"Sitapati","year":"2017","journal-title":"Wiley Interdiscip Rev Syst Biol Med"},{"key":"B14","doi-asserted-by":"publisher","first-page":"57","DOI":"10.1016\/bs.pmbts.2022.03.002","article-title":"Artificial intelligence and machine learning in precision medicine: a paradigm shift in big data analysis","volume":"190","author":"Sahu","year":"2022","journal-title":"Prog Mol Biol Transl Sci"},{"key":"B15","doi-asserted-by":"publisher","first-page":"e206","DOI":"10.1136\/amiajnl-2013-002428","article-title":"Electronic health records-driven phenotyping: challenges, recent advances, and perspectives","volume":"20","author":"Pathak","year":"2013","journal-title":"J Am Med Inform Assoc"},{"key":"B16","doi-asserted-by":"publisher","first-page":"37","DOI":"10.19044\/esj.2022.v18n4p37","article-title":"High throughput neurological phenotyping with MetaMap","volume":"18","author":"Hier","year":"2022","journal-title":"Eur Sci J"},{"key":"B17","doi-asserted-by":"publisher","first-page":"4","DOI":"10.19044\/esj.2022.v18n4p4","article-title":"A focused review of deep phenotyping with examples from Neurology","volume":"18","author":"Hier","year":"2022","journal-title":"Eur Sci J"},{"key":"B18","doi-asserted-by":"crossref","DOI":"10.1109\/EMBC53108.2024.10782119","article-title":"High throughput phenotyping of physician notes with large language and hybrid NLP models.","author":"Munzir","year":""},{"key":"B19","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/2041-1480-4-44","article-title":"Building a drug ontology based on RxNorm and other sources","volume":"4","author":"Hanna","year":"2013","journal-title":"J Biomed Semant"},{"key":"B20","doi-asserted-by":"publisher","first-page":"738","DOI":"10.1093\/jamia\/ocaa030","article-title":"The new international classification of diseases 11th edition: a comparative analysis with ICD-10 and ICD-10-CM","volume":"27","author":"Fung","year":"2020","journal-title":"J Am Med Inform Assoc"},{"key":"B21","doi-asserted-by":"publisher","first-page":"624","DOI":"10.1373\/49.4.624","article-title":"LOINC, a universal standard for identifying laboratory observations: a 5-year update","volume":"49","author":"McDonald","year":"2003","journal-title":"Clin Chem"},{"key":"B22","doi-asserted-by":"publisher","first-page":"221","DOI":"10.1136\/amiajnl-2013-001935","article-title":"A review of approaches to identifying patient phenotype cohorts using electronic health records","volume":"21","author":"Shivade","year":"2014","journal-title":"J Am Med Inform Assoc"},{"key":"B23","doi-asserted-by":"publisher","first-page":"104252","DOI":"10.1016\/j.jbi.2022.104252","article-title":"An overview of biomedical entity linking throughout the years","volume":"137","author":"French","year":"2023","journal-title":"J Biomed Inform"},{"key":"B24","article-title":"NCBO annotator: semantic annotation of biomedical data.","author":"Jonquet","year":""},{"key":"B25","doi-asserted-by":"publisher","first-page":"W566","DOI":"10.1093\/nar\/gkz386","article-title":"Doc2Hpo: a web application for efficient and accurate HPO concept curation","volume":"47","author":"Liu","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"B26","doi-asserted-by":"publisher","first-page":"1269","DOI":"10.1109\/TCBB.2022.3170301","article-title":"PhenoBERT: a combined deep learning method for automated recognition of human phenotype ontology","volume":"20","author":"Feng","year":"2022","journal-title":"IEEE\/ACM Trans Comput Biol Bioinform"},{"key":"B27","doi-asserted-by":"publisher","first-page":"4837","DOI":"10.1093\/bioinformatics\/btac598","article-title":"BERN2: an advanced neural biomedical named entity recognition and normalization tool","volume":"38","author":"Sung","year":"2022","journal-title":"Bioinformatics"},{"key":"B28","doi-asserted-by":"publisher","first-page":"D704","DOI":"10.1093\/nar\/gkz997","article-title":"The monarch initiative in 2019: an integrative data and analytic platform connecting phenotypes to genotypes across species","volume":"48","author":"Shefchek","year":"2020","journal-title":"Nucleic Acids Res"},{"key":"B29","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1093\/bioinformatics\/btae406","article-title":"FastHPOCR: pragmatic, fast, and accurate concept recognition using the human phenotype ontology","volume":"40","author":"Groza","year":"2024","journal-title":"Bioinformatics"},{"key":"B30","doi-asserted-by":"crossref","DOI":"10.1101\/362111","article-title":"ClinPhen extracts and prioritizes patient phenotypes directly from medical records to accelerate genetic disease diagnosis.","author":"Deisseroth","year":""},{"key":"B31","doi-asserted-by":"publisher","first-page":"bav005","DOI":"10.1093\/database\/bav005","article-title":"Automatic concept recognition using the human phenotype ontology reference and test suite corpora","volume":"2015","author":"Groza","year":"2015","journal-title":"Database"},{"key":"B32","article-title":"High-throughput phenotyping of clinical text using large language models.","author":"Hier","year":""},{"key":"B33","article-title":"Siren\u2019s song in the AI ocean: a survey on hallucination in large language models.","author":"Zhang","year":""},{"key":"B34","article-title":"Retrieval-augmented generation for large language models: a survey.","author":"Gao","year":""},{"key":"B35","doi-asserted-by":"publisher","first-page":"D789","DOI":"10.1093\/nar\/gku1205","article-title":"OMIM.org: online mendelian inheritance in man (OMIM\u00ae), an online catalog of human genes and genetic disorders","volume":"43","author":"Amberger","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"B36","article-title":"A large language model outperforms other computational approaches to the high-throughput phenotyping of physician notes.","author":"Munzir","year":""},{"key":"B37","article-title":"NCBO BioPortal (2024).","year":""},{"key":"B38","doi-asserted-by":"publisher","first-page":"1234","DOI":"10.1093\/bioinformatics\/btz682","article-title":"BioBERT: a pre-trained biomedical language representation model for biomedical text mining","volume":"36","author":"Lee","year":"2020","journal-title":"Bioinformatics"}],"container-title":["Frontiers in Digital Health"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fdgth.2025.1495040\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,4]],"date-time":"2025-03-04T07:10:32Z","timestamp":1741072232000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fdgth.2025.1495040\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,3,4]]},"references-count":38,"alternative-id":["10.3389\/fdgth.2025.1495040"],"URL":"https:\/\/doi.org\/10.3389\/fdgth.2025.1495040","relation":{},"ISSN":["2673-253X"],"issn-type":[{"value":"2673-253X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,3,4]]},"article-number":"1495040"}}