{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,15]],"date-time":"2026-05-15T07:10:09Z","timestamp":1778829009024,"version":"3.51.4"},"reference-count":39,"publisher":"Oxford University Press (OUP)","issue":"10","license":[{"start":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T00:00:00Z","timestamp":1760054400000},"content-version":"vor","delay-in-days":9,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,10,2]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Identifying promising therapeutic targets from thousands of genes in transcriptomic studies remains a major bottleneck in biomedical research. While large language models (LLMs) show potential for gene prioritization, they suffer from hallucination and lack systematic validation against expert knowledge.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>The framework identified 609 sepsis-relevant genes with &gt;94% filtering efficiency, demonstrating strong enrichment for inflammatory pathways including TNF-\u03b1 signaling, complement activation, and interferon responses. Literature validation yielded 30 ultra-high confidence therapeutic candidates, including both established sepsis genes (IL10, TREM1, S100A9, NLRP3) and novel targets warranting investigation. Benchmark validation against expert-curated databases achieved 71.2% recall, with systematic correlation between computational confidence and evidence quality. The final candidate set balanced discovery (11 novel genes) with validation (19 known genes), maintaining biological coherence throughout the filtering process. 
This framework demonstrates that rigorous methodology can transform unreliable LLM outputs into systematically validated biological insights. By combining computational efficiency with literature grounding, the approach provides a practical tool for prioritizing experimental validation efforts. The modular design enables adaptation to other diseases through knowledge base substitution, offering a systematic approach to literature-guided biomarker discovery.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>We developed a two-stage computational framework that combines LLM-based screening with literature validation for systematic gene prioritization. Starting with 10\u00a0824 genes from the BloodGen3 repertoire, we applied multi-criteria evaluation for sepsis relevance, followed by retrieval-augmented generation using 6346 curated sepsis publications. A novel faithfulness evaluation system verified that LLM predictions aligned with retrieved literature evidence. 
Source code and implementation details are available at https:\/\/github.com\/taushifkhan\/llm-geneprioritization-framework, vector database at https:\/\/doi.org\/10.5281\/zenodo.15802241, and Interactive demonstration at https:\/\/llm-geneprioritization.streamlit.app\/.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaf541","type":"journal-article","created":{"date-parts":[[2025,10,8]],"date-time":"2025-10-08T12:16:55Z","timestamp":1759925815000},"source":"Crossref","is-referenced-by-count":3,"title":["Automating candidate gene prioritization with large language models: from naive scoring to literature-grounded validation"],"prefix":"10.1093","volume":"41","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-7917-8965","authenticated-orcid":false,"given":"Taushif","family":"Khan","sequence":"first","affiliation":[{"name":"The Jackson Laboratory for Genomic Medicine , 10 Discovery Drive , Farmington, Connecticut, 06032,","place":["United States"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mohammed","family":"Toufiq","sequence":"additional","affiliation":[{"name":"The Jackson Laboratory for Genomic Medicine , 10 Discovery Drive , Farmington, Connecticut, 06032,","place":["United States"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Marina","family":"Yurieva","sequence":"additional","affiliation":[{"name":"The Jackson Laboratory for Genomic Medicine , 10 Discovery Drive , Farmington, Connecticut, 06032,","place":["United States"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Nitaya","family":"Indrawattana","sequence":"additional","affiliation":[{"name":"Research Department, Faculty of Medicine Siriraj Hospital Biomedical Research Incubator Unit, , Mahidol University, Bangkok,","place":["Thailand"]},{"name":"Faculty of Medicine Siriraj Hospital Siriraj Center of Research and Excellence in Allergy and Immunology, , Mahidol University , 
Bangkok,","place":["Thailand"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Akanitt","family":"Jittmittraphap","sequence":"additional","affiliation":[{"name":"Department of Microbiology and Immunology, Faculty of Tropical Medicine, Mahidol University , Bangkok, 73170,","place":["Thailand"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7465-0696","authenticated-orcid":false,"given":"Nathamon","family":"Kosoltanapiwat","sequence":"additional","affiliation":[{"name":"Department of Microbiology and Immunology, Faculty of Tropical Medicine, Mahidol University , Bangkok, 73170,","place":["Thailand"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Pornpan","family":"Pumirat","sequence":"additional","affiliation":[{"name":"Department of Microbiology and Immunology, Faculty of Tropical Medicine, Mahidol University , Bangkok, 73170,","place":["Thailand"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5590-0688","authenticated-orcid":false,"given":"Passanesh","family":"Sukphopetch","sequence":"additional","affiliation":[{"name":"Department of Microbiology and Immunology, Faculty of Tropical Medicine, Mahidol University , Bangkok, 73170,","place":["Thailand"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Muthita","family":"Vanaporn","sequence":"additional","affiliation":[{"name":"Department of Microbiology and Immunology, Faculty of Tropical Medicine, Mahidol University , Bangkok, 73170,","place":["Thailand"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Karolina","family":"Palucka","sequence":"additional","affiliation":[{"name":"The Jackson Laboratory for Genomic Medicine , 10 Discovery Drive , Farmington, Connecticut, 06032,","place":["United States"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Basirudeen","family":"Syed Ahamed Kabeer","sequence":"additional","affiliation":[{"name":"Sidra 
Medicine , Doha,","place":["Qatar"]},{"name":"Saveetha Institute of Medical and Technical Sciences (SIMATS), Saveetha University Department of Pathology, Saveetha Medical College and Hospital, , Chennai,","place":["India"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Darawan","family":"Rinchai","sequence":"additional","affiliation":[{"name":"St. Jude Children\u2019s Research Hospital , Memphis, TN,","place":["United States"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Damien","family":"Chaussabel","sequence":"additional","affiliation":[{"name":"The Jackson Laboratory for Genomic Medicine , 10 Discovery Drive , Farmington, Connecticut, 06032,","place":["United States"]}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2025,10,10]]},"reference":[{"key":"2025102308562129100_btaf541-B1","first-page":"e55991","article-title":"Comparing the performance of popular large language models on the national board of medical examiners sample questions","volume":"16","author":"Abbas","year":"2024","journal-title":"Cureus"},{"key":"2025102308562129100_btaf541-B2","doi-asserted-by":"crossref","first-page":"4385","DOI":"10.1038\/s41467-021-24584-w","article-title":"Development of a fixed module repertoire for the analysis and interpretation of blood transcriptome data","volume":"12","author":"Altman","year":"2021","journal-title":"Nat Commun"},{"key":"2025102308562129100_btaf541-B3","doi-asserted-by":"crossref","first-page":"711","DOI":"10.1084\/jem.20021553","article-title":"Interferon and granulopoiesis signatures in systemic lupus erythematosus blood","volume":"197","author":"Bennett","year":"2003","journal-title":"J Exp 
Med"},{"key":"2025102308562129100_btaf541-B4","doi-asserted-by":"publisher","author":"Brummaier","DOI":"10.1136\/bmjopen-2020-041631"},{"key":"2025102308562129100_btaf541-B5","doi-asserted-by":"crossref","first-page":"1003","DOI":"10.3390\/cells12071003","article-title":"The regulation of neutrophil migration in patients with sepsis: the complexity of the molecular mechanisms and their modulation in sepsis and the heterogeneity of sepsis patients","volume":"12","author":"Bruserud","year":"2023","journal-title":"Cells"},{"key":"2025102308562129100_btaf541-B6","doi-asserted-by":"crossref","first-page":"435","DOI":"10.1038\/ni.3151","article-title":"A vision and a prescription for big data-enabled medicine","volume":"16","author":"Chaussabel","year":"2015","journal-title":"Nat Immunol"},{"key":"2025102308562129100_btaf541-B7","doi-asserted-by":"crossref","first-page":"150","DOI":"10.1016\/j.immuni.2008.05.012","article-title":"A modular analysis framework for blood genomics studies: application to systemic lupus erythematosus","volume":"29","author":"Chaussabel","year":"2008","journal-title":"Immunity"},{"key":"2025102308562129100_btaf541-B8","doi-asserted-by":"crossref","first-page":"1173","DOI":"10.18653\/v1\/2024.acl-long.65","volume-title":"Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Chu","year":"2024"},{"key":"2025102308562129100_btaf541-B9","doi-asserted-by":"crossref","first-page":"D1257","DOI":"10.1093\/nar\/gkac833","article-title":"Comparative toxicogenomics database (CTD): update 2023","volume":"51","author":"Davis","year":"2023","journal-title":"Nucleic Acids Res"},{"key":"2025102308562129100_btaf541-B10","doi-asserted-by":"crossref","first-page":"1941","DOI":"10.1097\/JS9.0000000000001066","article-title":"Evaluation of large language models in breast cancer clinical scenarios: a comparative analysis based on ChatGPT-3.5, ChatGPT-4.0, and 
Claude2","volume":"110","author":"Deng","year":"2024","journal-title":"Int J Surg"},{"key":"2025102308562129100_btaf541-B11","doi-asserted-by":"crossref","first-page":"153","DOI":"10.1186\/s13059-017-1282-3","article-title":"Knowledge-guided gene prioritization reveals new insights into the mechanisms of chemoresistance","volume":"18","author":"Emad","year":"2017","journal-title":"Genome Biol"},{"key":"2025102308562129100_btaf541-B12","doi-asserted-by":"crossref","first-page":"W455","DOI":"10.1093\/nar\/gkr246","article-title":"G\u00e9nie: literature-based gene prioritization at multi genomic scale","volume":"39","author":"Fontaine","year":"2011","journal-title":"Nucleic Acids Res"},{"key":"2025102308562129100_btaf541-B13","doi-asserted-by":"crossref","first-page":"317","DOI":"10.1038\/nbt1385","article-title":"Direct multiplexed measurement of gene expression with color-coded probe pairs","volume":"26","author":"Geiss","year":"2008","journal-title":"Nat Biotechnol"},{"key":"2025102308562129100_btaf541-B15","doi-asserted-by":"crossref","first-page":"531","DOI":"10.1126\/science.286.5439.531","article-title":"Molecular classification of cancer: class discovery and class prediction by gene expression monitoring","volume":"286","author":"Golub","year":"1999","journal-title":"Science"},{"key":"2025102308562129100_btaf541-B16","doi-asserted-by":"crossref","first-page":"862","DOI":"10.1038\/nri3552","article-title":"Sepsis-induced immunosuppression: from cellular dysfunctions to immunotherapy","volume":"13","author":"Hotchkiss","year":"2013","journal-title":"Nat Rev Immunol"},{"key":"2025102308562129100_btaf541-B17","doi-asserted-by":"crossref","first-page":"2190","DOI":"10.1016\/j.ajhg.2024.08.010","article-title":"Assessing the utility of large language models for phenotype-driven gene prioritization in the diagnosis of rare genetic disease","volume":"111","author":"Kim","year":"2024","journal-title":"Am J Hum 
Genet"},{"key":"2025102308562129100_btaf541-B18","doi-asserted-by":"crossref","first-page":"523","DOI":"10.1093\/bioinformatics\/btt703","article-title":"Causal analysis approaches in ingenuity pathway analysis","volume":"30","author":"Kr\u00e4mer","year":"2014","journal-title":"Bioinformatics"},{"key":"2025102308562129100_btaf541-B19","doi-asserted-by":"crossref","first-page":"767","DOI":"10.1038\/s41590-023-01490-5","article-title":"Neutrophils and emergency granulopoiesis drive immune suppression and an extreme response endotype during sepsis","volume":"24","author":"Kwok","year":"2023","journal-title":"Nat Immunol"},{"key":"2025102308562129100_btaf541-B20","doi-asserted-by":"publisher","first-page":"89","DOI":"10.3390\/make7030089","article-title":"AlzheimerRAG: Multimodal retrieval-augmented generation for clinical use cases","volume":"7","author":"Lahiri","year":"2025","journal-title":"MAKE"},{"key":"2025102308562129100_btaf541-B21","doi-asserted-by":"crossref","first-page":"195","DOI":"10.1038\/ni.2789","article-title":"Molecular signatures of antibody responses derived from a systems biology study of five human vaccines","volume":"15","author":"Li","year":"2014","journal-title":"Nat Immunol"},{"key":"2025102308562129100_btaf541-B22","author":"Lin","year":"2024"},{"key":"2025102308562129100_btaf541-B23","unstructured":"Montani I, Honnibal M, Honnibal M \u00a0et al \u00a0Explosion\/spaCy: v3.7.2: fixes for APIs and requirements. Zenodo 2023. 
https:\/\/doi.org\/10.5281\/zenodo.10009823"},{"key":"2025102308562129100_btaf541-B24","doi-asserted-by":"crossref","first-page":"D1302","DOI":"10.1093\/nar\/gkaa1027","article-title":"Open targets platform: supporting systematic drug\u2013target identification and prioritisation","volume":"49","author":"Ochoa","year":"2021","journal-title":"Nucleic Acids Res"},{"key":"2025102308562129100_btaf541-B25","first-page":"D845","article-title":"The DisGeNET knowledge platform for disease genomics: 2019 update","volume":"48","author":"Pi\u00f1ero","year":"2020","journal-title":"Nucleic Acids Res"},{"key":"2025102308562129100_btaf541-B26","doi-asserted-by":"crossref","first-page":"D938","DOI":"10.1093\/nar\/gkad1082","article-title":"The monarch initiative in 2024: an analytic platform integrating phenotypes, genes and diseases across species","volume":"52","author":"Putman","year":"2024","journal-title":"Nucleic Acids Res"},{"key":"2025102308562129100_btaf541-B27","doi-asserted-by":"crossref","first-page":"e244","DOI":"10.1002\/ctm2.244","article-title":"Definition of erythroid cell-positive blood transcriptome phenotypes associated with severe respiratory syncytial virus infection","volume":"10","author":"Rinchai","year":"2020","journal-title":"Clin Transl Med"},{"key":"2025102308562129100_btaf541-B28","doi-asserted-by":"publisher","first-page":"994","DOI":"10.12688\/f1000research.122811.1","article-title":"A training curriculum for retrieving, structuring, and aggregating information derived from the biomedical literature and large-scale data repositories","volume":"11","author":"Rinchai","year":"2022","journal-title":"F1000Res"},{"key":"2025102308562129100_btaf541-B29","doi-asserted-by":"crossref","first-page":"eabp9961","DOI":"10.1126\/sciadv.abp9961","article-title":"High-temporal resolution profiling reveals distinct immune trajectories following the first and second doses of COVID-19 mRNA vaccines","volume":"8","author":"Rinchai","year":"2022","journal-title":"Sci 
Adv"},{"key":"2025102308562129100_btaf541-B30","doi-asserted-by":"crossref","first-page":"291","DOI":"10.1186\/s12967-020-02456-z","article-title":"A modular framework for the development of targeted covid-19 blood transcript profiling panels","volume":"18","author":"Rinchai","year":"2020","journal-title":"J Transl Med"},{"key":"2025102308562129100_btaf541-B31","doi-asserted-by":"crossref","first-page":"172","DOI":"10.1038\/s41586-023-06291-2","article-title":"Large language models encode clinical knowledge","volume":"620","author":"Singhal","year":"2023","journal-title":"Nature"},{"key":"2025102308562129100_btaf541-B32","doi-asserted-by":"crossref","first-page":"e1662","DOI":"10.1371\/journal.pone.0001662","article-title":"High throughput gene expression measurement with real time PCR in a microfluidic dynamic array","volume":"3","author":"Spurgeon","year":"2008","journal-title":"PLoS One"},{"key":"2025102308562129100_btaf541-B33","doi-asserted-by":"crossref","first-page":"23225","DOI":"10.1038\/s41598-024-73916-5","article-title":"Human-augmented large language model-driven selection of glutathione peroxidase 4 as a candidate blood transcriptional biomarker for circulating erythroid cells","volume":"14","author":"Subba","year":"2024","journal-title":"Sci Rep"},{"key":"2025102308562129100_btaf541-B34","doi-asserted-by":"crossref","first-page":"1510431","DOI":"10.3389\/fmed.2025.1510431","article-title":"From gene modules to gene markers: an integrated AI-human approach selects CD38 to represent plasma cell-associated transcriptional signatures","volume":"12","author":"Syed Ahamed Kabeer","year":"2025","journal-title":"Front Med (Lausanne)"},{"key":"2025102308562129100_btaf541-B35","doi-asserted-by":"crossref","first-page":"19","DOI":"10.1186\/s13326-024-00320-3","article-title":"Dynamic retrieval augmented generation of ontologies using artificial intelligence (DRAGON-AI)","volume":"15","author":"Toro","year":"2024","journal-title":"J Biomed 
Semantics"},{"key":"2025102308562129100_btaf541-B36","doi-asserted-by":"crossref","first-page":"728","DOI":"10.1186\/s12967-023-04576-8","article-title":"Harnessing large language models (LLMs) for candidate gene prioritization and selection","volume":"21","author":"Toufiq","year":"2023","journal-title":"J Transl Med"},{"key":"2025102308562129100_btaf541-B37","doi-asserted-by":"crossref","first-page":"407","DOI":"10.1038\/nri.2017.36","article-title":"The immunopathology of sepsis and potential therapeutic targets","volume":"17","author":"van der Poll","year":"2017","journal-title":"Nat Rev Immunol"},{"key":"2025102308562129100_btaf541-B38","doi-asserted-by":"crossref","first-page":"530","DOI":"10.1038\/415530a","article-title":"Gene expression profiling predicts clinical outcome of breast cancer","volume":"415","author":"van\u2019t Veer","year":"2002","journal-title":"Nature"},{"key":"2025102308562129100_btaf541-B39","doi-asserted-by":"publisher","author":"Wu","year":"2025","DOI":"10.48550\/arXiv.2503.12286"},{"key":"2025102308562129100_btaf541-B40","doi-asserted-by":"crossref","first-page":"6233","DOI":"10.18653\/v1\/2024.findings-acl.372","volume-title":"Findings of the Association for Computational Linguistics: ACL 
2024","author":"Xiong","year":"2024"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaf541\/64612806\/btaf541.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/10\/btaf541\/64612806\/btaf541.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/10\/btaf541\/64612806\/btaf541.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,23]],"date-time":"2025-10-23T12:56:31Z","timestamp":1761224191000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btaf541\/8280402"}},"subtitle":[],"editor":[{"given":"Zhiyong","family":"Lu","sequence":"additional","affiliation":[],"role":[{"role":"editor","vocabulary":"crossref"}]}],"short-title":[],"issued":{"date-parts":[[2025,10]]},"references-count":39,"journal-issue":{"issue":"10","published-print":{"date-parts":[[2025,10,2]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaf541","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2025,10]]},"published":{"date-parts":[[2025,10]]},"article-number":"btaf541"}}