{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,26]],"date-time":"2025-09-26T00:19:14Z","timestamp":1758845954218,"version":"3.44.0"},"reference-count":45,"publisher":"Oxford University Press (OUP)","license":[{"start":{"date-parts":[[2025,9,25]],"date-time":"2025-09-25T00:00:00Z","timestamp":1758758400000},"content-version":"vor","delay-in-days":267,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100007000","name":"Laboratory Directed Research and Development","doi-asserted-by":"publisher","award":["#20240876PRD4"],"award-info":[{"award-number":["#20240876PRD4"]}],"id":[{"id":"10.13039\/100007000","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,1,18]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>The PDBBind database has been widely utilized for the computational prediction of protein\u2013protein binding affinities. While the accuracy of the PDBBind-curated equilibrium dissociation constants (KD) has been reported for the protein\u2013ligand subset of the PDBBind database, the curation accuracy has not been reported for the protein\u2013protein subset. Here, we present a detailed manual analysis for the subset of PDBBind records with PubMed Central Open Access primary publications and find that ~19% of these records had KD values that were not supported by their primary publications. The impact of these putative curation errors on the machine learning-based prediction of KD from experimental protein\u2013protein 3D structures was evaluated and correcting the curation errors improved the Pearson correlation coefficient between measured and random forest-predicted log10(KD) values by ~8 percentage points. This finding underscores the importance of dataset accuracy for computational modelling and highlights the need for more stringent curation processes when extracting information from the scientific literature.<\/jats:p>","DOI":"10.1093\/database\/baaf061","type":"journal-article","created":{"date-parts":[[2025,9,25]],"date-time":"2025-09-25T14:26:19Z","timestamp":1758810379000},"source":"Crossref","is-referenced-by-count":0,"title":["The impact of curation errors in the PDBBind Database on machine learning predictions of protein\u2013protein binding affinity"],"prefix":"10.1093","volume":"2025","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-7151-3108","authenticated-orcid":false,"given":"Jason D","family":"Gans","sequence":"first","affiliation":[{"name":"Bioscience Division, Los Alamos National Laboratory , Los Alamos, NM, 87545 ,","place":["USA"]}]},{"given":"Justin E","family":"Miller","sequence":"additional","affiliation":[{"name":"Bioscience Division, Los Alamos National Laboratory , Los Alamos, NM, 87545 ,","place":["USA"]}]},{"given":"Esen","family":"Sokullu","sequence":"additional","affiliation":[{"name":"Bioscience Division, Los Alamos National Laboratory , Los Alamos, NM, 87545 ,","place":["USA"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4488-9126","authenticated-orcid":false,"given":"Nileena","family":"Velappan","sequence":"additional","affiliation":[{"name":"Bioscience Division, Los Alamos National Laboratory , Los Alamos, NM, 87545 ,","place":["USA"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5904-3441","authenticated-orcid":false,"given":"Ramesh K","family":"Jha","sequence":"additional","affiliation":[{"name":"Bioscience Division, Los Alamos National Laboratory , Los Alamos, NM, 87545 ,","place":["USA"]}]}],"member":"286","published-online":{"date-parts":[[2025,9,24]]},"reference":[{"key":"2025092510261029900_bib1","doi-asserted-by":"publisher","first-page":"31","DOI":"10.1016\/j.sbi.2016.10.016","article-title":"Protein\u2013protein interactions: scoring schemes and binding affinity","volume":"44","author":"Gromiha","year":"2017","journal-title":"Curr Opin Struct Biol"},{"key":"2025092510261029900_bib2","doi-asserted-by":"publisher","first-page":"1065703","DOI":"10.3389\/fbinf.2022.1065703","article-title":"Machine learning methods for protein-protein binding affinity prediction in protein design","volume":"2","author":"Guo","year":"2022","journal-title":"Front Bioinform"},{"key":"2025092510261029900_bib3","doi-asserted-by":"publisher","first-page":"20120835","DOI":"10.1098\/rsif.2012.0835","article-title":"On the binding affinity of macromolecular interactions: daring to ask why proteins interact","volume":"10","author":"Kastritis","year":"2013","journal-title":"J R Soc Interface"},{"key":"2025092510261029900_bib4","doi-asserted-by":"publisher","first-page":"e1448","DOI":"10.1002\/wcms.1448","article-title":"Computational prediction of protein\u2013protein binding affinities","volume":"10","author":"Siebenmorgen","year":"2020","journal-title":"WIREs Computat Mol Sci"},{"key":"2025092510261029900_bib5","doi-asserted-by":"publisher","first-page":"405","DOI":"10.1093\/bioinformatics\/btu626","article-title":"PDB-wide collection of binding data: current status of the PDBbind database","volume":"31","author":"Liu","year":"2015","journal-title":"Bioinformatics"},{"key":"2025092510261029900_bib6","doi-asserted-by":"publisher","first-page":"235","DOI":"10.1093\/nar\/28.1.235","article-title":"The Protein Data Bank","volume":"28","author":"Berman","year":"2000","journal-title":"Nucleic Acids Res"},{"key":"2025092510261029900_bib7","doi-asserted-by":"publisher","first-page":"e1716","DOI":"10.1002\/wcms.1716","article-title":"Modern machine-learning for binding affinity estimation of protein\u2013ligand complexes: progress, opportunities, and challenges","volume":"14","author":"Harren","year":"2024","journal-title":"WIREs Compu Mol Sci"},{"key":"2025092510261029900_bib8","doi-asserted-by":"publisher","first-page":"20200007","DOI":"10.1098\/rsfs.2020.0007","article-title":"Rapid, accurate, precise and reproducible ligand\u2013protein binding free energy prediction","volume":"10","author":"Wan","year":"2020","journal-title":"Interface Focus"},{"key":"2025092510261029900_bib9","doi-asserted-by":"publisher","first-page":"1169","DOI":"10.1093\/bioinformatics\/btq112","article-title":"A machine learning approach to predicting protein\u2013ligand binding affinity with applications to molecular docking","volume":"26","author":"Ballester","year":"2010","journal-title":"Bioinformatics"},{"key":"2025092510261029900_bib10","doi-asserted-by":"publisher","first-page":"308","DOI":"10.1016\/j.cels.2020.03.002","article-title":"MONN: a multi-objective neural network for predicting compound-protein interactions and affinities","volume":"10","author":"Li","year":"2020","journal-title":"Cell Syst"},{"key":"2025092510261029900_bib11","doi-asserted-by":"publisher","first-page":"482","DOI":"10.1002\/pro.580","article-title":"A structure-based benchmark for protein\u2013protein binding affinity","volume":"20","author":"Kastritis","year":"2011","journal-title":"Protein Sci"},{"key":"2025092510261029900_bib12","doi-asserted-by":"publisher","first-page":"56","DOI":"10.1016\/j.sbi.2008.11.009","article-title":"Computational evaluation of protein-small molecule binding","volume":"19","author":"Guvench","year":"2009","journal-title":"Curr Opin Struct Biol"},{"key":"2025092510261029900_bib13","doi-asserted-by":"publisher","first-page":"2977","DOI":"10.1021\/jm030580l","article-title":"The PDBbind Database: collection of binding affinities for protein-ligand complexes with known three-dimensional structures","volume":"47","author":"Wang","year":"2004","journal-title":"J Med Chem"},{"key":"2025092510261029900_bib14","doi-asserted-by":"publisher","first-page":"5441","DOI":"10.1039\/C8SC00148K","article-title":"Large-scale comparison of machine learning methods for drug target prediction on ChEMBL","volume":"9","author":"Mayr","year":"2018","journal-title":"Chem Sci"},{"key":"2025092510261029900_bib15","doi-asserted-by":"publisher","first-page":"1829","DOI":"10.1021\/acs.jproteome.2c00020","article-title":"PPI-affinity: a web tool for the prediction and optimization of protein\u2212peptide and protein\u2212protein binding affinity","volume":"21","author":"Romero-Molina","year":"2022","journal-title":"J Proteome Res"},{"key":"2025092510261029900_bib16","article-title":"Generalist equivariant transformer towards 3D molecular interaction learning","volume-title":"Proceedings of the 41st International Conference on Machine Learning","author":"Kong","year":"2024"},{"key":"2025092510261029900_bib17","doi-asserted-by":"publisher","first-page":"1127","DOI":"10.1002\/prot.26700","article-title":"ProBAN: neural network algorithm for predicting binding affinity in protein\u2013protein complexes","volume":"92","author":"Bogdanova","year":"2024","journal-title":"Proteins Struct Funct Bioinf"},{"key":"2025092510261029900_bib18","doi-asserted-by":"publisher","first-page":"5465","DOI":"10.1038\/s41467-021-25772-4","article-title":"A deep-learning framework for multi-level peptide\u2013protein interaction prediction","volume":"12","author":"Lei","year":"2021","journal-title":"Nat Commun"},{"key":"2025092510261029900_bib19","first-page":"1871","article-title":"Deep learning-based method for predicting and classifying the binding affinity of protein-protein complexes","volume":"6","author":"Nikam","year":"2023","journal-title":"Biochim Biophys Acta Proteins Proteom"},{"key":"2025092510261029900_bib20","first-page":"3454","article-title":"Predictive models and impact of interfacial contacts and amino acids on protein\u2212protein binding affinity","volume":"9","author":"Yi","year":"2024","journal-title":"ACS Omega"},{"key":"2025092510261029900_bib21","doi-asserted-by":"publisher","first-page":"4111","DOI":"10.1021\/jm048957q","article-title":"The PDBbind Database: methodologies and Updates","volume":"48","author":"Wang","year":"2005","journal-title":"J Med Chem"},{"key":"2025092510261029900_bib22","article-title":"PMC Open Access Subset","author":"National Library of Medicine","year":"2003"},{"key":"2025092510261029900_bib23","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Mach Learn"},{"key":"2025092510261029900_bib40","doi-asserted-by":"crossref","DOI":"10.1186\/1472-6807-14-17","article-title":"Characterization of the SAM domain of the PKD-related protein ANKS6 and its interaction with ANKS3","volume":"14","author":"Leettola","year":"2014","journal-title":"BMC Struct Biol"},{"key":"2025092510261029900_bib41","doi-asserted-by":"publisher","first-page":"6893","DOI":"10.1093\/nar\/gky542","article-title":"Structural motifs in eIF4G and 4E-BPs modulate their binding to eIF4E to regulate translation initiation in yeast","volume":"46","author":"Gruner","year":"2018","journal-title":"Nucleic Acids Res"},{"key":"2025092510261029900_bib42","doi-asserted-by":"publisher","first-page":"427","DOI":"10.1038\/nature13715","article-title":"Structure of malaria invasion protein RH5 with erythrocyte basigin and blocking antibodies","volume":"515","author":"Wright","year":"2014","journal-title":"Nature"},{"key":"2025092510261029900_bib43","doi-asserted-by":"publisher","first-page":"10024","DOI":"10.1074\/jbc.M114.550558","article-title":"Polymorphisms in the human inhibitory signal-regulatory protein \u03b1 do not affect binding to its ligand CD47","volume":"289","author":"Hatherley","year":"2014","journal-title":"J Biol Chem"},{"key":"2025092510261029900_bib44","doi-asserted-by":"publisher","first-page":"e20352","DOI":"10.7554\/eLife.20352","article-title":"Computationally designed high specificity inhibitors delineate the roles of BCL2 family proteins in cancer","volume":"5","author":"Berger","year":"2016","journal-title":"eLife"},{"key":"2025092510261029900_bib45","doi-asserted-by":"publisher","first-page":"e1000126","DOI":"10.1371\/journal.pbio.1000126","article-title":"ATP and MO25alpha regulate the conformational state of the STRADalpha pseudokinase and activation of the LKB1 tumour suppressor","volume":"7","author":"Zeqiraj","year":"2009","journal-title":"PLoS Biol"},{"key":"2025092510261029900_bib24","doi-asserted-by":"publisher","first-page":"e07454","DOI":"10.7554\/eLife.07454","article-title":"Contacts-based prediction of binding affinity in protein\u2013protein complexes","volume":"4","author":"Vangone","year":"2015","journal-title":"eLife"},{"key":"2025092510261029900_bib25","doi-asserted-by":"publisher","first-page":"1961","DOI":"10.1021\/ci100264e","article-title":"Leave-cluster-out cross-validation is appropriate for scoring functions derived from diverse protein data sets","volume":"50","author":"Kramer","year":"2010","journal-title":"J Chem Inf Model"},{"key":"2025092510261029900_bib26","doi-asserted-by":"publisher","first-page":"947","DOI":"10.1021\/acs.jcim.8b00712","article-title":"In need of bias control: evaluating chemical data for machine learning in structure-based virtual screening","volume":"59","author":"Sieg","year":"2019","journal-title":"J Chem Inf Model"},{"key":"2025092510261029900_bib27","first-page":"1","article-title":"Quantification of biases in predictions of protein\u2013protein binding affinity changes upon mutations","volume":"25","author":"Tsishyn","year":"2024","journal-title":"Briefings Bioinf"},{"key":"2025092510261029900_bib28","doi-asserted-by":"publisher","first-page":"5485","DOI":"10.1021\/acs.jcim.2c01149","article-title":"Assessment of the generalization abilities of machine-learning scoring functions for structure-based virtual screening","volume":"62","author":"Zhu","year":"2022","journal-title":"J Chem Inf Model"},{"key":"2025092510261029900_bib29","volume-title":"Single-Linkage Clustering","author":"Wikipedia","year":"2024"},{"key":"2025092510261029900_bib30","doi-asserted-by":"publisher","first-page":"195","DOI":"10.1016\/0022-2836(81)90087-5","article-title":"Identification of common molecular subsequences","volume":"147","author":"Smith","year":"1981","journal-title":"J Mol Biol"},{"key":"2025092510261029900_bib31","doi-asserted-by":"publisher","first-page":"2302","DOI":"10.1093\/nar\/gki524","article-title":"TM-align: a protein structure alignment algorithm based on the TM-score","volume":"33","author":"Zhang","year":"2005","journal-title":"Nucleic Acids Res"},{"key":"2025092510261029900_bib32","article-title":"Welch's t-Test","author":"Wikipedia","year":"2025"},{"key":"2025092510261029900_bib33","doi-asserted-by":"publisher","first-page":"1513","DOI":"10.1093\/bioinformatics\/bty880","article-title":"A natural upper bound to the accuracy of predicting protein stability changes upon mutations","volume":"35","author":"Montanucci","year":"2019","journal-title":"Bioinformatics"},{"key":"2025092510261029900_bib34","doi-asserted-by":"publisher","first-page":"895","DOI":"10.1021\/acs.jcim.8b00545","article-title":"Comparative assessment of scoring functions: the CASF-2016 update","volume":"59","author":"Su","year":"2019","journal-title":"J Chem Inf Model"},{"key":"2025092510261029900_bib35","doi-asserted-by":"publisher","first-page":"1079","DOI":"10.1021\/ci9000053","article-title":"Comparative assessment of scoring functions on a diverse test set","volume":"49","author":"Cheng","year":"2009","journal-title":"J Chem Inf Model"},{"key":"2025092510261029900_bib36","doi-asserted-by":"publisher","first-page":"1700","DOI":"10.1021\/ci500080q","article-title":"Comparative assessment of scoring functions on an updated benchmark: 1. Compilation of the test set","volume":"54","author":"Li","year":"2014","journal-title":"J Chem Inf Model"},{"key":"2025092510261029900_bib37","doi-asserted-by":"publisher","first-page":"1692","DOI":"10.1021\/acs.jcim.6b00196","article-title":"Improved computation of protein\u2212protein relative binding energies with the Nwat-MMGBSA method","volume":"56","author":"Maffucci","year":"2016","journal-title":"J Chem Inf Model"},{"key":"2025092510261029900_bib38","year":"2022."},{"key":"2025092510261029900_bib39","doi-asserted-by":"publisher","first-page":"10915","DOI":"10.1073\/pnas.89.22.10915","article-title":"Amino acid substitution matrices from protein blocks","volume":"89","author":"Henikoff","year":"1992","journal-title":"Proc Natl Acad Sci USA"}],"container-title":["Database"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/database\/article-pdf\/doi\/10.1093\/database\/baaf061\/64394782\/baaf061.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/database\/article-pdf\/doi\/10.1093\/database\/baaf061\/64394782\/baaf061.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,25]],"date-time":"2025-09-25T14:26:21Z","timestamp":1758810381000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/database\/article\/doi\/10.1093\/database\/baaf061\/8263854"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025]]},"references-count":45,"URL":"https:\/\/doi.org\/10.1093\/database\/baaf061","relation":{},"ISSN":["1758-0463"],"issn-type":[{"value":"1758-0463","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2025]]},"published":{"date-parts":[[2025]]},"article-number":"baaf061"}}