{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T20:34:35Z","timestamp":1772138075530,"version":"3.50.1"},"reference-count":36,"publisher":"Oxford University Press (OUP)","issue":"6","license":[{"start":{"date-parts":[[2025,6,27]],"date-time":"2025-06-27T00:00:00Z","timestamp":1750982400000},"content-version":"vor","delay-in-days":26,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,6,2]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Binding sites are the key interfaces that determine a protein\u2019s biological activity, and therefore common targets for therapeutic intervention. Techniques that help us detect, compare, and contextualize binding sites are hence of immense interest to drug discovery.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>Here, we present an approach that integrates protein language models with a 3D tessellation technique to derive rich and versatile representations of binding sites that combine functional, structural, and evolutionary information with unprecedented detail. We demonstrate that the associated similarity metrics induce meaningful pocket clusterings by balancing local structure against global sequence effects. The resulting embeddings are shown to simplify a variety of downstream tasks: they help organize the \u2018pocketome\u2019 in a way that efficiently contextualizes new binding sites, construct performant druggability models, and define challenging train-test splits for believable benchmarking of pocket-centric machine-learning models.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>A Python package that implements the EPoCS method is freely available at https:\/\/github.com\/tugceoruc\/epocs.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaf284","type":"journal-article","created":{"date-parts":[[2025,6,27]],"date-time":"2025-06-27T07:37:03Z","timestamp":1751009823000},"source":"Crossref","is-referenced-by-count":1,"title":["Mapping the space of protein binding sites with sequence-based protein language models"],"prefix":"10.1093","volume":"41","author":[{"given":"Tu\u011f\u00e7e","family":"Oru\u00e7","sequence":"first","affiliation":[{"name":"Computational Chemistry & Informatics, Astex Pharmaceuticals , Cambridge CB4 0QA,","place":["United Kingdom"]}]},{"given":"Maria","family":"Kadukova","sequence":"additional","affiliation":[{"name":"Computational Chemistry & Informatics, Astex Pharmaceuticals , Cambridge CB4 0QA,","place":["United Kingdom"]}]},{"given":"Thomas G","family":"Davies","sequence":"additional","affiliation":[{"name":"Computational Chemistry & Informatics, Astex Pharmaceuticals , Cambridge CB4 0QA,","place":["United Kingdom"]}]},{"given":"Marcel","family":"Verdonk","sequence":"additional","affiliation":[{"name":"Computational Chemistry & Informatics, Astex Pharmaceuticals , Cambridge CB4 0QA,","place":["United Kingdom"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8945-2027","authenticated-orcid":false,"given":"Carl","family":"Poelking","sequence":"additional","affiliation":[{"name":"Computational Chemistry & Informatics, Astex Pharmaceuticals , Cambridge CB4 0QA,","place":["United Kingdom"]}]}],"member":"286","published-online":{"date-parts":[[2025,6,27]]},"reference":[{"key":"2025070408272840100_btaf284-B1","doi-asserted-by":"publisher","first-page":"787","DOI":"10.1016\/j.chembiol.2003.09.002","article-title":"The process of structure-based drug design","volume":"10","author":"Anderson","year":"2003","journal-title":"Chem Biol"},{"key":"2025070408272840100_btaf284-B2","doi-asserted-by":"crossref","first-page":"469","DOI":"10.1145\/235815.235821","article-title":"The Quickhull algorithm for convex hulls","volume":"22","author":"Barber","year":"1996","journal-title":"ACM Trans Math Softw"},{"key":"2025070408272840100_btaf284-B3","doi-asserted-by":"publisher","first-page":"654","DOI":"10.1016\/j.cels.2021.05.017","article-title":"Learning the protein language: evolution, structure, and function","volume":"12","author":"Bepler","year":"2021","journal-title":"Cell Syst"},{"key":"2025070408272840100_btaf284-B4","doi-asserted-by":"publisher","first-page":"3130","DOI":"10.1039\/D3SC04185A","article-title":"PoseBusters: AI-based docking methods fail to generate physically valid poses or generalise to novel sequences","volume":"15","author":"Buttenschoen","year":"2024","journal-title":"Chem Sci"},{"key":"2025070408272840100_btaf284-B5","doi-asserted-by":"publisher","first-page":"32","DOI":"10.1186\/s13321-024-00821-4","article-title":"Learnt representations of proteins can be used for accurate prediction of small molecule binding sites on experimentally determined and predicted protein structures","volume":"16","author":"Carbery","year":"2024","journal-title":"J Cheminform"},{"key":"2025070408272840100_btaf284-B6","doi-asserted-by":"publisher","first-page":"1600","DOI":"10.1021\/acs.jcim.5b00333","article-title":"Detection of binding site molecular interaction field similarities","volume":"55","author":"Chartier","year":"2015","journal-title":"J Chem Inf Model"},{"key":"2025070408272840100_btaf284-B7","doi-asserted-by":"publisher","first-page":"71","DOI":"10.1038\/nbt1273","article-title":"Structure-based maximal affinity model predicts small-molecule druggability","volume":"25","author":"Cheng","year":"2007","journal-title":"Nat Biotechnol"},{"key":"2025070408272840100_btaf284-B8","doi-asserted-by":"publisher","first-page":"eadg7492","DOI":"10.1126\/science.adg7492","article-title":"Accurate proteome-wide missense variant effect prediction with AlphaMissense","volume":"381","author":"Cheng","year":"2023","journal-title":"Science"},{"key":"2025070408272840100_btaf284-B9","doi-asserted-by":"publisher","first-page":"623","DOI":"10.1021\/ci300566n","article-title":"Encoding protein\u2013ligand interaction patterns in fingerprints and graphs","volume":"53","author":"Desaphy","year":"2013","journal-title":"J Chem Inf Model"},{"key":"2025070408272840100_btaf284-B10","doi-asserted-by":"publisher","author":"Devlin","year":"2019","DOI":"10.48550\/arXiv.1810.04805"},{"key":"2025070408272840100_btaf284-B11","doi-asserted-by":"publisher","author":"Durairaj","year":"2024","DOI":"10.1101\/2024.07.17.603955"},{"key":"2025070408272840100_btaf284-B12","doi-asserted-by":"publisher","first-page":"7127","DOI":"10.1021\/acs.jmedchem.0c00422","article-title":"A computer vision approach to align and compare protein cavities: application to fragment-based drug design","volume":"63","author":"Eguida","year":"2020","journal-title":"J Med Chem"},{"key":"2025070408272840100_btaf284-B13","doi-asserted-by":"publisher","first-page":"12462","DOI":"10.3390\/ijms232012462","article-title":"Estimating the similarity between protein pockets","volume":"23","author":"Eguida","year":"2022","journal-title":"Int J Mol Sci"},{"key":"2025070408272840100_btaf284-B14","doi-asserted-by":"publisher","first-page":"e1006483","DOI":"10.1371\/journal.pcbi.1006483","article-title":"A benchmark driven guide to binding site comparison: an exhaustive evaluation using tailor-made data sets (ProSPECCTs)","volume":"14","author":"Ehrt","year":"2018","journal-title":"PLoS Comput Biol"},{"key":"2025070408272840100_btaf284-B15","first-page":"1","article-title":"POT: Python Optimal Transport","volume":"22","author":"Flamary","year":"2021","journal-title":"J Mach Learn Res"},{"key":"2025070408272840100_btaf284-B16","doi-asserted-by":"publisher","first-page":"597","DOI":"10.1093\/bioinformatics\/btt024","article-title":"APoc: large-scale identification of similar protein pockets","volume":"29","author":"Gao","year":"2013","journal-title":"Bioinformatics"},{"key":"2025070408272840100_btaf284-B17","doi-asserted-by":"publisher","first-page":"2567","DOI":"10.1038\/s41467-022-29609-6","article-title":"The pocketome of G-protein-coupled receptors reveals previously untargeted allosteric sites","volume":"13","author":"Hedderich","year":"2022","journal-title":"Nat Commun"},{"key":"2025070408272840100_btaf284-B18","doi-asserted-by":"publisher","first-page":"275","DOI":"10.1038\/s41587-023-01763-2","article-title":"Efficient evolution of human antibodies from general protein language models","volume":"42","author":"Hie","year":"2024","journal-title":"Nat Biotechnol"},{"key":"2025070408272840100_btaf284-B19","doi-asserted-by":"publisher","first-page":"1160","DOI":"10.1093\/bioinformatics\/btq100","article-title":"ProBIS algorithm for detection of structurally similar protein binding sites by local structural alignment","volume":"26","author":"Konc","year":"2010","journal-title":"Bioinformatics"},{"key":"2025070408272840100_btaf284-B20","doi-asserted-by":"publisher","first-page":"D535","DOI":"10.1093\/nar\/gkr825","article-title":"Pocketome: an encyclopedia of small-molecule binding sites in 4D","volume":"40","author":"Kufareva","year":"2012","journal-title":"Nucleic Acids Res"},{"key":"2025070408272840100_btaf284-B21","doi-asserted-by":"publisher","first-page":"1432","DOI":"10.1002\/cmdc.200700075","article-title":"Functional classification of protein kinase binding sites using Cavbase","volume":"2","author":"Kuhn","year":"2007","journal-title":"ChemMedChem"},{"key":"2025070408272840100_btaf284-B22","doi-asserted-by":"publisher","first-page":"1123","DOI":"10.1126\/science.ade2574","article-title":"Evolutionary-scale prediction of atomic-level protein structure with a language model","volume":"379","author":"Lin","year":"2023","journal-title":"Science"},{"key":"2025070408272840100_btaf284-B23","doi-asserted-by":"publisher","first-page":"15910","DOI":"10.1073\/pnas.1518946112","article-title":"Detection of secondary binding sites in proteins using fragment screening","volume":"112","author":"Ludlow","year":"2015","journal-title":"Proc Natl Acad Sci USA"},{"key":"2025070408272840100_btaf284-B24","doi-asserted-by":"publisher","first-page":"1177","DOI":"10.1038\/s41467-023-36699-3","article-title":"Predicting locations of cryptic pockets from single protein structures using the PocketMiner graph neural network","volume":"14","author":"Meller","year":"2023","journal-title":"Nat Commun"},{"key":"2025070408272840100_btaf284-B25","doi-asserted-by":"publisher","first-page":"195","DOI":"10.1007\/978-3-319-21903-5_8","volume-title":"Hierarchical Clustering","author":"Nielsen","year":"2016"},{"key":"2025070408272840100_btaf284-B26","doi-asserted-by":"publisher","first-page":"1750","DOI":"10.1016\/j.csbj.2021.03.022","article-title":"The language of proteins: NLP, machine learning & protein sequences","volume":"19","author":"Ofer","year":"2021","journal-title":"Comput Struct Biotechnol J"},{"key":"2025070408272840100_btaf284-B27","doi-asserted-by":"crossref","first-page":"1131","DOI":"10.1002\/prot.25278","article-title":"VoroMQA: assessment of protein structure quality using interatomic contact areas","volume":"85","author":"Olechnovi\u010d","year":"2017","journal-title":"Proteins"},{"key":"2025070408272840100_btaf284-B28","doi-asserted-by":"publisher","first-page":"6296","DOI":"10.1038\/s41598-024-56893-7","article-title":"VirtuousPocketome: a computational tool for screening protein\u2013ligand complexes to identify similar binding sites","volume":"14","author":"Pallante","year":"2024","journal-title":"Sci Rep"},{"key":"2025070408272840100_btaf284-B29","doi-asserted-by":"publisher","first-page":"805","DOI":"10.1038\/s41589-022-01247-5","article-title":"Structural basis of efficacy-driven ligand selectivity at GPCRs","volume":"19","author":"Powers","year":"2023","journal-title":"Nat Chem Biol"},{"key":"2025070408272840100_btaf284-B30","doi-asserted-by":"publisher","first-page":"1755","DOI":"10.1002\/prot.21858","article-title":"A simple and fuzzy method to align and compare druggable ligand-binding sites","volume":"71","author":"Schalon","year":"2008","journal-title":"Proteins"},{"key":"2025070408272840100_btaf284-B31","doi-asserted-by":"publisher","first-page":"W337","DOI":"10.1093\/nar\/gki482","article-title":"SiteEngines: recognition and comparison of binding sites and protein\u2013protein interfaces","volume":"33","author":"Shulman-Peleg","year":"2005","journal-title":"Nucleic Acids Res"},{"key":"2025070408272840100_btaf284-B32","doi-asserted-by":"publisher","first-page":"2356","DOI":"10.1021\/acs.jcim.9b00554","article-title":"DeeplyTough: learning structural comparison of protein binding sites","volume":"60","author":"Simonovsky","year":"2020","journal-title":"J Chem Inf Model"},{"key":"2025070408272840100_btaf284-B33","doi-asserted-by":"publisher","first-page":"730","DOI":"10.1038\/s41592-022-01490-7","article-title":"ScanNet: an interpretable geometric deep learning model for structure-based protein binding site prediction","volume":"19","author":"Tubiana","year":"2022","journal-title":"Nat Methods"},{"key":"2025070408272840100_btaf284-B34","doi-asserted-by":"publisher","first-page":"13452","DOI":"10.1038\/s41598-021-92785-w","article-title":"The effect of protein mutations on drug binding suggests ensuing personalised drug selection","volume":"11","author":"Wan","year":"2021","journal-title":"Sci Rep"},{"key":"2025070408272840100_btaf284-B35","doi-asserted-by":"publisher","first-page":"2031","DOI":"10.1021\/ci3000776","article-title":"Pharmacophore fingerprint-based approach to binding site subpocket similarity and its application to bioisostere replacement","volume":"52","author":"Wood","year":"2012","journal-title":"J Chem Inf Model"},{"key":"2025070408272840100_btaf284-B36","doi-asserted-by":"publisher","first-page":"543","DOI":"10.1186\/1471-2105-9-543","article-title":"PocketMatch: a new algorithm to compare binding sites in protein structures","volume":"9","author":"Yeturu","year":"2008","journal-title":"BMC Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaf284\/63606584\/btaf284.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/6\/btaf284\/63606584\/btaf284.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/6\/btaf284\/63606584\/btaf284.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,4]],"date-time":"2025-07-04T08:27:35Z","timestamp":1751617655000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btaf284\/8176567"}},"subtitle":[],"editor":[{"given":"Arne","family":"Elofsson","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2025,6]]},"references-count":36,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2025,6,2]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaf284","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2024.07.24.604735","asserted-by":"object"}]},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2025,6]]},"published":{"date-parts":[[2025,6]]},"article-number":"btaf284"}}