{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,22]],"date-time":"2025-08-22T09:10:10Z","timestamp":1755853810006,"version":"3.44.0"},"reference-count":25,"publisher":"Oxford University Press (OUP)","issue":"8","license":[{"start":{"date-parts":[[2025,8,7]],"date-time":"2025-08-07T00:00:00Z","timestamp":1754524800000},"content-version":"vor","delay-in-days":6,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100009122","name":"Ministry of Education","doi-asserted-by":"publisher","award":["101101923"],"award-info":[{"award-number":["101101923"]}],"id":[{"id":"10.13039\/100009122","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,8,2]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Accurately identifying and prioritizing protein binding pockets is a foundational element of small-molecule drug discovery. Defining these known pockets currently relies on a laborious manual process of extracting key residue data from selected publications, reconciling inconsistent terminology, and independently computing volumetric representations. This manual curation to ensure biological relevance is time-consuming, error-prone, and represents a major bottleneck for efficient, high-throughput drug discovery.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>We present a novel approach for the identification and prioritization of protein binding pockets for small molecules by combining geometric pocket detection with large language models (LLMs). 
Our method leverages Fpocket to generate candidate pockets, which are then validated against published experimental data extracted from research articles using an LLM with a series of prompts fine-tuned to identify and extract residue-level information associated with experimentally confirmed binding sites. We developed a curated benchmark dataset of diverse proteins and associated literature to train and evaluate the LLM\u2019s performance in paper relevance assessment and pocket extraction.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>The developed benchmark dataset and methodology are freely available at the GitHub repository (https:\/\/github.com\/receptor-ai\/LLM-benchmark-dataset) and Zenodo (DOI: 10.5281\/zenodo.15798647).<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaf449","type":"journal-article","created":{"date-parts":[[2025,8,12]],"date-time":"2025-08-12T20:04:06Z","timestamp":1755029046000},"source":"Crossref","is-referenced-by-count":0,"title":["Leveraging large language models for literature-driven prioritization of protein binding pockets"],"prefix":"10.1093","volume":"41","author":[{"ORCID":"https:\/\/orcid.org\/0009-0001-6632-7642","authenticated-orcid":false,"given":"Roman","family":"Stratiichuk","sequence":"first","affiliation":[{"name":"Receptor.AI Inc. , London N1 7GU,","place":["United Kingdom"]},{"name":"Department of Biophysics and Medical Informatics, Educational and Scientific Centre \u201cInstitute of Biology and Medicine\u201d, Taras Shevchenko Kyiv National University , Kyiv 01601,","place":["Ukraine"]}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-5671-9809","authenticated-orcid":false,"given":"Mykola","family":"Melnychenko","sequence":"additional","affiliation":[{"name":"Receptor.AI Inc. 
, London N1 7GU,","place":["United Kingdom"]}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-4620-3175","authenticated-orcid":false,"given":"Ihor","family":"Koleiev","sequence":"additional","affiliation":[{"name":"Receptor.AI Inc. , London N1 7GU,","place":["United Kingdom"]},{"name":"Department of Physics of Biological Systems, Institute of Physics of The National Academy of Sciences of Ukraine , Kyiv 03038,","place":["Ukraine"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3127-3688","authenticated-orcid":false,"given":"Taras","family":"Voitsitskyi","sequence":"additional","affiliation":[{"name":"Receptor.AI Inc. , London N1 7GU,","place":["United Kingdom"]},{"name":"Department of Physics of Biological Systems, Institute of Physics of The National Academy of Sciences of Ukraine , Kyiv 03038,","place":["Ukraine"]}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-7398-2330","authenticated-orcid":false,"given":"Vladyslav","family":"Husak","sequence":"additional","affiliation":[{"name":"Receptor.AI Inc. , London N1 7GU,","place":["United Kingdom"]},{"name":"Department of Cellular, Computational and Integrative Biology, The University of Trento , Povo, Trento 38123,","place":["Italy"]}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-8474-9376","authenticated-orcid":false,"given":"Nazar","family":"Shevchuk","sequence":"additional","affiliation":[{"name":"Receptor.AI Inc. , London N1 7GU,","place":["United Kingdom"]}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-4644-3587","authenticated-orcid":false,"given":"Zakhar","family":"Ostrovsky","sequence":"additional","affiliation":[{"name":"Receptor.AI Inc. 
, London N1 7GU,","place":["United Kingdom"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0315-450X","authenticated-orcid":false,"given":"Volodymyr","family":"Bdzhola","sequence":"additional","affiliation":[{"name":"Institute of Molecular Biology and Genetics of The National Academy of Sciences of Ukraine , Kyiv 03143,","place":["Ukraine"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6748-8931","authenticated-orcid":false,"given":"Semen","family":"Yesylevskyy","sequence":"additional","affiliation":[{"name":"Receptor.AI Inc. , London N1 7GU,","place":["United Kingdom"]},{"name":"Department of Physics of Biological Systems, Institute of Physics of The National Academy of Sciences of Ukraine , Kyiv 03038,","place":["Ukraine"]},{"name":"Institute of Organic Chemistry and Biochemistry, Czech Academy of Sciences , Prague 6 CZ-166 10,","place":["Czech Republic"]},{"name":"Department of Physical Chemistry, Faculty of Science, Palack\u00fd University Olomouc , Olomouc 771 46,","place":["Czech Republic"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5103-0635","authenticated-orcid":false,"given":"Serhii","family":"Starosyla","sequence":"additional","affiliation":[{"name":"Receptor.AI Inc. , London N1 7GU,","place":["United Kingdom"]}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-8604-377X","authenticated-orcid":false,"given":"Alan","family":"Nafiiev","sequence":"additional","affiliation":[{"name":"Receptor.AI Inc. 
, London N1 7GU,","place":["United Kingdom"]}]}],"member":"286","published-online":{"date-parts":[[2025,8,7]]},"reference":[{"key":"2025082204563273400_btaf449-B1","doi-asserted-by":"publisher","first-page":"5069","DOI":"10.1021\/acs.jcim.1c00799","article-title":"DeepPocket: ligand binding site detection and segmentation using 3D convolutional neural networks","volume":"62","author":"Aggarwal","year":"2022","journal-title":"J Chem Inf Model"},{"key":"2025082204563273400_btaf449-B2","doi-asserted-by":"publisher","first-page":"aac5464","DOI":"10.1126\/science.aac5464","article-title":"Structural basis of Nav1.7 inhibition by an isoform-selective small-molecule antagonist","volume":"350","author":"Ahuja","year":"2015","journal-title":"Science"},{"key":"2025082204563273400_btaf449-B4","doi-asserted-by":"publisher","first-page":"5564","DOI":"10.1038\/s41467-024-49892-9","article-title":"In silico fragment-based discovery of CIB1-directed anti-tumor agents by FRASE-Bot","volume":"15","author":"An","year":"2024","journal-title":"Nat Commun"},{"key":"2025082204563273400_btaf449-B5","doi-asserted-by":"publisher","first-page":"e1000585","DOI":"10.1371\/journal.pcbi.1000585","article-title":"Predicting protein ligand binding sites by combining evolutionary sequence conservation and 3D structure","volume":"5","author":"Capra","year":"2009","journal-title":"PLoS Comput Biol"},{"key":"2025082204563273400_btaf449-B6","doi-asserted-by":"publisher","first-page":"5047","DOI":"10.1021\/ct500381c","article-title":"POVME 2.0: an enhanced tool for determining pocket shape and volume characteristics","volume":"10","author":"Durrant","year":"2014","journal-title":"J Chem Theory Comput"},{"key":"2025082204563273400_btaf449-B7","doi-asserted-by":"publisher","first-page":"417","DOI":"10.1002\/prot.22154","article-title":"Improving accuracy and efficiency of blind protein-ligand docking by focusing on predicted binding 
sites","volume":"74","author":"Ghersi","year":"2009","journal-title":"Proteins"},{"key":"2025082204563273400_btaf449-B8","doi-asserted-by":"publisher","first-page":"3128","DOI":"10.1021\/acs.jcim.3c00336","article-title":"Binding site detection remastered: enabling fast, robust, and reliable binding site detection and descriptor calculation with DoGSite3","volume":"63","author":"Graef","year":"2023","journal-title":"J Chem Inf Model"},{"key":"2025082204563273400_btaf449-B9","doi-asserted-by":"publisher","first-page":"66","DOI":"10.1186\/s13321-024-00865-6","article-title":"PUResNetV2.0: a deep learning model leveraging sparse representation for improved ligand binding site prediction","volume":"16","author":"Jeevan","year":"2024","journal-title":"J Cheminform"},{"key":"2025082204563273400_btaf449-B10","doi-asserted-by":"publisher","first-page":"3036","DOI":"10.1093\/bioinformatics\/btx350","article-title":"DeepSite: protein-Binding site predictor using 3D-convolutional neural networks","volume":"33","author":"Jim\u00e9nez","year":"2017","journal-title":"Bioinformatics"},{"key":"2025082204563273400_btaf449-B11","doi-asserted-by":"publisher","first-page":"65","DOI":"10.1186\/s13321-021-00547-7","article-title":"PUResNet: prediction of protein\u2013ligand binding sites using deep residual neural network","volume":"13","author":"Kandel","year":"2021","journal-title":"J Cheminform"},{"key":"2025082204563273400_btaf449-B12","doi-asserted-by":"publisher","first-page":"303","DOI":"10.1038\/s41586-020-2654-5","article-title":"Shared structural mechanisms of general anaesthetics and benzodiazepines","volume":"585","author":"Kim","year":"2020","journal-title":"Nature"},{"key":"2025082204563273400_btaf449-B13","doi-asserted-by":"publisher","first-page":"39","DOI":"10.1186\/s13321-018-0285-8","article-title":"P2Rank: machine learning based tool for rapid and accurate prediction of ligand binding sites from protein 
structure","volume":"10","author":"Kriv\u00e1k","year":"2018","journal-title":"J Cheminform"},{"key":"2025082204563273400_btaf449-B14","doi-asserted-by":"publisher","first-page":"549","DOI":"10.1038\/nrd4295","article-title":"Muscarinic acetylcholine receptors: novel opportunities for drug development","volume":"13","author":"Kruse","year":"2014","journal-title":"Nat Rev Drug Discov"},{"key":"2025082204563273400_btaf449-B15","doi-asserted-by":"publisher","first-page":"168","DOI":"10.1186\/1471-2105-10-168","article-title":"Fpocket: an open source platform for ligand pocket detection","volume":"10","author":"Le Guilloux","year":"2009","journal-title":"BMC Bioinformatics"},{"key":"2025082204563273400_btaf449-B16","doi-asserted-by":"publisher","first-page":"1884","DOI":"10.1002\/pro.5560070905","article-title":"Anatomy of protein pockets and cavities: measurement of binding site geometry and implications for ligand design","volume":"7","author":"Liang","year":"1998","journal-title":"Protein Sci"},{"key":"2025082204563273400_btaf449-B17","doi-asserted-by":"publisher","first-page":"W159","DOI":"10.1093\/nar\/gkac394","article-title":"CB-Dock2: improved protein\u2013ligand blind docking by integrating cavity detection, docking and homologous template fitting","volume":"50","author":"Liu","year":"2022","journal-title":"Nucleic Acids Res"},{"key":"2025082204563273400_btaf449-B18","doi-asserted-by":"publisher","first-page":"369","DOI":"10.1042\/BJ20131270","article-title":"Insights into the evolution of divergent nucleotide-binding mechanisms among pseudokinases revealed by crystal structures of human and mouse MLKL","volume":"457","author":"Murphy","year":"2014","journal-title":"Biochem J"},{"key":"2025082204563273400_btaf449-B19","doi-asserted-by":"publisher","first-page":"W363","DOI":"10.1093\/nar\/gky473","article-title":"CASTp 3.0: computed atlas of surface topography of proteins","volume":"46","author":"Tian","year":"2018","journal-title":"Nucleic Acids 
Res"},{"key":"2025082204563273400_btaf449-B20","doi-asserted-by":"publisher","first-page":"qzae001","DOI":"10.1093\/gpbjnl\/qzae001","article-title":"Q-BioLiP: a comprehensive resource for quaternary structure-based protein\u2013ligand interactions","volume":"22","author":"Wei","year":"2024","journal-title":"Genomics Proteomics Bioinform"},{"year":"2023","author":"Wei","key":"2025082204563273400_btaf449-B21"},{"key":"2025082204563273400_btaf449-B22","doi-asserted-by":"publisher","first-page":"3018","DOI":"10.1093\/bioinformatics\/btaa110","article-title":"Protein-Ligand binding residue prediction enhancement through hybrid deep heterogeneous learning of sequence and structure data","volume":"36","author":"Xia","year":"2020","journal-title":"Bioinformatics"},{"year":"2023","author":"Yao","key":"2025082204563273400_btaf449-B23"},{"volume-title":"J Comput Chem","author":"Yesylevskyy","key":"2025082204563273400_btaf449-B24","doi-asserted-by":"publisher","DOI":"10.1002\/jcc.27536"},{"key":"2025082204563273400_btaf449-B25","doi-asserted-by":"publisher","first-page":"D404","DOI":"10.1093\/nar\/gkad630","article-title":"BioLiP2: an updated structure database for biologically relevant Ligand-Protein interactions","volume":"52","author":"Zhang","year":"2024","journal-title":"Nucleic Acids Res"},{"key":"2025082204563273400_btaf449-B26","doi-asserted-by":"publisher","first-page":"9280","DOI":"10.3390\/ijms25179280","article-title":"A point cloud graph neural network for protein\u2013ligand binding site prediction","volume":"25","author":"Zhao","year":"2024","journal-title":"Int J Mol 
Sci"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaf449\/63978887\/btaf449.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/8\/btaf449\/63978887\/btaf449.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/8\/btaf449\/63978887\/btaf449.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,22]],"date-time":"2025-08-22T08:56:37Z","timestamp":1755852997000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btaf449\/8225722"}},"subtitle":[],"editor":[{"given":"Jonathan","family":"Wren","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2025,8]]},"references-count":25,"journal-issue":{"issue":"8","published-print":{"date-parts":[[2025,8,2]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaf449","relation":{},"ISSN":["1367-4811"],"issn-type":[{"type":"electronic","value":"1367-4811"}],"subject":[],"published-other":{"date-parts":[[2025,8]]},"published":{"date-parts":[[2025,8]]},"article-number":"btaf449"}}