{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,2]],"date-time":"2026-03-02T13:26:48Z","timestamp":1772458008766,"version":"3.50.1"},"reference-count":32,"publisher":"Oxford University Press (OUP)","issue":"1","license":[{"start":{"date-parts":[[2024,12,18]],"date-time":"2024-12-18T00:00:00Z","timestamp":1734480000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001824","name":"Czech Science Foundation","doi-asserted-by":"publisher","award":["23-07349S"],"award-info":[{"award-number":["23-07349S"]}],"id":[{"id":"10.13039\/501100001824","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,12,26]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Structure-based methods for detecting protein\u2013ligand binding sites play a crucial role in various domains, from fundamental research to biomedical applications. However, current prediction methodologies often rely on holo (ligand-bound) protein conformations for training and evaluation, overlooking the significance of the apo (ligand-free) states. This oversight is particularly problematic in the case of cryptic binding sites (CBSs) where holo-based assessment yields unrealistic performance expectations.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>To advance the development in this domain, we introduce CryptoBench, a benchmark dataset tailored for training and evaluating novel CBS prediction methodologies. CryptoBench is constructed upon a large collection of apo\u2013holo protein pairs, grouped by UniProtID, clustered by sequence identity, and filtered to contain only structures with substantial structural change in the binding site. CryptoBench comprises 1107 structures with predefined cross-validation splits, making it the most extensive CBS dataset to date. To establish a performance baseline, we measured the predictive power of sequence- and structure-based CBS residue prediction methods using the benchmark. We selected PocketMiner as the state-of-the-art representative of the structure-based methods for CBS detection, and P2Rank, a widely-used structure-based method for general binding site prediction that is not specifically tailored for cryptic sites. For sequence-based approaches, we trained a neural network to classify binding residues using protein language model embeddings. Our sequence-based approach outperformed PocketMiner and P2Rank across key metrics, including area under the curve, area under the precision-recall curve, Matthew\u2019s correlation coefficient, and F1 scores. These results provide baseline benchmark results for future CBS and potentially also non-CBS prediction endeavors, leveraging CryptoBench as the foundational platform for further advancements in the field.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>The CryptoBench dataset, including the benchmark model, is available on Open Science Framework\u2014https:\/\/osf.io\/pz4a9\/. The code and tutorial are available at the GitHub repository\u2014https:\/\/github.com\/skrhakv\/CryptoBench\/.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btae745","type":"journal-article","created":{"date-parts":[[2024,12,16]],"date-time":"2024-12-16T07:28:29Z","timestamp":1734334109000},"source":"Crossref","is-referenced-by-count":13,"title":["CryptoBench: cryptic protein\u2013ligand binding sites dataset and benchmark"],"prefix":"10.1093","volume":"41","author":[{"ORCID":"https:\/\/orcid.org\/0009-0007-0712-0661","authenticated-orcid":false,"given":"V\u00edt","family":"\u0160krh\u00e1k","sequence":"first","affiliation":[{"name":"Department of Software Engineering, Faculty of Mathematics and Physics, Charles University , 118 00 Prague, Czech Republic"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8788-3202","authenticated-orcid":false,"given":"Marian","family":"Novotn\u00fd","sequence":"additional","affiliation":[{"name":"Department of Cell Biology, Faculty of Science, Charles University , 128 43 Prague, Czech Republic"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7987-6045","authenticated-orcid":false,"given":"Christos P","family":"Feidakis","sequence":"additional","affiliation":[{"name":"Department of Cell Biology, Faculty of Science, Charles University , 128 43 Prague, Czech Republic"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7521-0844","authenticated-orcid":false,"given":"Radoslav","family":"Kriv\u00e1k","sequence":"additional","affiliation":[{"name":"Department of Software Engineering, Faculty of Mathematics and Physics, Charles University , 118 00 Prague, Czech Republic"},{"name":"Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences , 160 00 Prague, Czech Republic"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4679-0557","authenticated-orcid":false,"given":"David","family":"Hoksza","sequence":"additional","affiliation":[{"name":"Department of Software Engineering, Faculty of Mathematics and Physics, Charles University , 118 00 Prague, Czech Republic"}]}],"member":"286","published-online":{"date-parts":[[2024,12,18]]},"reference":[{"key":"2025011223412055400_btae745-B1","doi-asserted-by":"publisher","first-page":"493","DOI":"10.1038\/s41586-024-07487-w","article-title":"Accurate structure prediction of biomolecular interactions with alphafold 3","volume":"630","author":"Abramson","year":"2024","journal-title":"Nature"},{"key":"2025011223412055400_btae745-B2","doi-asserted-by":"publisher","first-page":"311","DOI":"10.1186\/s12859-019-2932-0","article-title":"ProteinNet: A standardized data set for machine learning of protein structure","volume":"20","author":"AlQuraishi","year":"2019","journal-title":"BMC Bioinformatics"},{"key":"2025011223412055400_btae745-B3","doi-asserted-by":"publisher","first-page":"E3416","DOI":"10.1073\/pnas.1711490115","article-title":"Exploring the structural origins of cryptic sites on proteins","volume":"115","author":"Beglov","year":"2018","journal-title":"Proc Natl Acad Sci U S A"},{"key":"2025011223412055400_btae745-B4","doi-asserted-by":"publisher","first-page":"709","DOI":"10.1016\/j.jmb.2016.01.029","article-title":"Cryptosite: expanding the druggable proteome by characterization and prediction of cryptic binding sites","volume":"428","author":"Cimermancic","year":"2016","journal-title":"J Mol Biol"},{"key":"2025011223412055400_btae745-B5","doi-asserted-by":"crossref","first-page":"167587","DOI":"10.1016\/j.jmb.2022.167587","article-title":"Ftmove: a web server for detection and analysis of cryptic and allosteric binding sites by mapping multiple protein structures","volume":"434","author":"Egbert","year":"2022","journal-title":"J Mol Biol"},{"key":"2025011223412055400_btae745-B6","author":"Ehrt","year":"2019"},{"key":"2025011223412055400_btae745-B7","doi-asserted-by":"publisher","first-page":"168545","DOI":"10.1016\/j.jmb.2024.168545","article-title":"AHoJ-DB: A PDB-wide assignment of apo & holo relationships based on individual protein-ligand interactions","volume":"436","author":"Feidakis","year":"2024","journal-title":"J Mol Biol"},{"key":"2025011223412055400_btae745-B8","doi-asserted-by":"crossref","first-page":"5452","DOI":"10.1093\/bioinformatics\/btac701","article-title":"Ahoj: rapid, tailored search and retrieval of apo and holo protein structures for user-defined ligands","volume":"38","author":"Feidakis","year":"2022","journal-title":"Bioinformatics"},{"key":"2025011223412055400_btae745-B9","doi-asserted-by":"publisher","first-page":"332","DOI":"10.1093\/bioinformatics\/bty464","article-title":"3DPatch: fast 3D structure visualization with residue conservation","volume":"35","author":"Jakubec","year":"2019","journal-title":"Bioinformatics"},{"key":"2025011223412055400_btae745-B10","doi-asserted-by":"crossref","first-page":"39","DOI":"10.1186\/s13321-018-0285-8","article-title":"P2rank: machine learning based tool for rapid and accurate prediction of ligand binding sites from protein structure","volume":"10","author":"Kriv\u00e1k","year":"2018","journal-title":"J Cheminform"},{"key":"2025011223412055400_btae745-B11","doi-asserted-by":"publisher","first-page":"654","DOI":"10.1021\/acs.accounts.9b00613","article-title":"Investigating cryptic binding sites by molecular dynamics simulations","volume":"53","author":"Kuzmanic","year":"2020","journal-title":"Acc Chem Res"},{"key":"2025011223412055400_btae745-B12","doi-asserted-by":"publisher","first-page":"379","DOI":"10.1016\/0022-2836(71)90324-X","article-title":"The interpretation of protein structures: estimation of static accessibility","volume":"55","author":"Lee","year":"1971","journal-title":"J Mol Biol"},{"key":"2025011223412055400_btae745-B13","doi-asserted-by":"publisher","first-page":"1123","DOI":"10.1126\/science.ade2574","article-title":"Evolutionary-scale prediction of atomic-level protein structure with a language model","volume":"379","author":"Lin","year":"2023","journal-title":"Science"},{"key":"2025011223412055400_btae745-B14","doi-asserted-by":"publisher","author":"Lin","year":".","DOI":"10.1101\/2022.07.20.500902"},{"key":"2025011223412055400_btae745-B15","doi-asserted-by":"publisher","first-page":"2314","DOI":"10.1021\/acs.jcim.9b01209","article-title":"Playmolecule crypticscout: predicting protein cryptic sites using mixed-solvent molecular simulations","volume":"60","author":"Martinez-Rosell","year":"2020","journal-title":"J Chem Inf Model"},{"key":"2025011223412055400_btae745-B16","doi-asserted-by":"publisher","first-page":"1177","DOI":"10.1038\/s41467-023-36699-3","article-title":"Predicting locations of cryptic pockets from single protein structures using the pocketminer graph neural network","volume":"14","author":"Meller","year":"2023","journal-title":"Nat Commun"},{"key":"2025011223412055400_btae745-B17","doi-asserted-by":"publisher","first-page":"151","DOI":"10.1146\/annurev.bb.06.060177.001055","article-title":"Areas, volumes, packing, and protein structure","volume":"6","author":"Richards","year":"1977","journal-title":"Annu Rev Biophys Bioeng"},{"key":"2025011223412055400_btae745-B18","doi-asserted-by":"publisher","first-page":"307","DOI":"10.1038\/nrd3410","article-title":"The resurgence of covalent drugs","volume":"10","author":"Singh","year":"2011","journal-title":"Nat Rev Drug Discov"},{"key":"2025011223412055400_btae745-B19","doi-asserted-by":"publisher","first-page":"1220","DOI":"10.1109\/BIBM.2016.7822693","author":"\u0160koda","year":"2016"},{"key":"2025011223412055400_btae745-B20","doi-asserted-by":"publisher","first-page":"2883","DOI":"10.1109\/BIBM58861.2023.10385497","author":"\u0160krh\u00e1k","year":"2023"},{"key":"2025011223412055400_btae745-B21","doi-asserted-by":"publisher","first-page":"1287","DOI":"10.1021\/acs.jcim.0c01002","article-title":"Identification of cryptic binding sites using mixmd with standard and accelerated molecular dynamics","volume":"61","author":"Smith","year":"2021","journal-title":"J Chem Inf Model"},{"key":"2025011223412055400_btae745-B22","doi-asserted-by":"publisher","author":"Smith","year":"2019","DOI":"10.1101\/816702"},{"key":"2025011223412055400_btae745-B23","doi-asserted-by":"publisher","first-page":"1026","DOI":"10.1038\/nbt.3988","article-title":"Mmseqs2 enables sensitive protein sequence searching for the analysis of massive data sets","volume":"35","author":"Steinegger","year":"2017","journal-title":"Nat Biotechnol"},{"key":"2025011223412055400_btae745-B24","doi-asserted-by":"publisher","first-page":"8","DOI":"10.1002\/pro.4218","article-title":"Panther: making genome-scale phylogenetics accessible to all","volume":"31","author":"Thomas","year":"2022","journal-title":"Protein Sci"},{"key":"2025011223412055400_btae745-B25","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/j.cbpa.2018.05.003","article-title":"Cryptic binding sites on proteins: definition, detection, and druggability","volume":"44","author":"Vajda","year":"2018","journal-title":"Curr Opin Chem Biol"},{"key":"2025011223412055400_btae745-B26","doi-asserted-by":"crossref","first-page":"D344","DOI":"10.1093\/nar\/gkz853","article-title":"Pdbe-kb: a community-driven resource for structural and functional annotations","volume":"48","author":"Varadi","year":"2020","journal-title":"Nucleic Acids Res"},{"key":"2025011223412055400_btae745-B27","doi-asserted-by":"crossref","first-page":"102396","DOI":"10.1016\/j.sbi.2022.102396","article-title":"Mapping the binding sites of challenging drug targets","volume":"75","author":"Wakefield","year":"2022","journal-title":"Curr Opin Struct Biol"},{"key":"2025011223412055400_btae745-B28","doi-asserted-by":"publisher","first-page":"D520","DOI":"10.1093\/nar\/gky949","article-title":"Protein data bank: the single global archive for 3D macromolecular structure data","volume":"47","author":"wwPDB Consortium","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"2025011223412055400_btae745-B29","doi-asserted-by":"publisher","first-page":"889","DOI":"10.1093\/bioinformatics\/btq066","article-title":"How significant is a protein structure similarity with tm-score = 0.5?","volume":"26","author":"Xu","year":"2010","journal-title":"Bioinformatics"},{"key":"2025011223412055400_btae745-B30","doi-asserted-by":"publisher","first-page":"D404","DOI":"10.1093\/nar\/gkad630","article-title":"BioLiP2: an updated structure database for biologically relevant ligand\u2013protein interactions","volume":"52","author":"Zhang","year":"2024","journal-title":"Nucleic Acids Res"},{"key":"2025011223412055400_btae745-B31","doi-asserted-by":"crossref","first-page":"417","DOI":"10.1016\/j.csbj.2020.02.008","article-title":"Exploring the computational methods for protein-ligand binding site prediction","volume":"18","author":"Zhao","year":"2020","journal-title":"Comput Struct Biotechnol J"},{"key":"2025011223412055400_btae745-B32","doi-asserted-by":"crossref","first-page":"416","DOI":"10.1002\/prot.26027","article-title":"Predicting cryptic ligand binding sites based on normal modes guided conformational sampling","volume":"89","author":"Zheng","year":"2021","journal-title":"Proteins"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btae745\/61228599\/btae745.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/1\/btae745\/61228599\/btae745.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/1\/btae745\/61228599\/btae745.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,1,12]],"date-time":"2025-01-12T18:41:41Z","timestamp":1736707301000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btae745\/7927823"}},"subtitle":[],"editor":[{"given":"Arne","family":"Elofsson","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2024,12,18]]},"references-count":32,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2024,12,26]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btae745","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2024.08.20.608828","asserted-by":"object"}]},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2025,1]]},"published":{"date-parts":[[2024,12,18]]},"article-number":"btae745"}}