{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,27]],"date-time":"2026-04-27T05:56:01Z","timestamp":1777269361322,"version":"3.51.4"},"reference-count":51,"publisher":"Oxford University Press (OUP)","issue":"9","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2010,5,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Accurately predicting the binding affinities of large sets of diverse protein\u2013ligand complexes is an extremely challenging task. The scoring functions that attempt such computational prediction are essential for analysing the outputs of molecular docking, which in turn is an important technique for drug discovery, chemical biology and structural biology. Each scoring function assumes a predetermined theory-inspired functional form for the relationship between the variables that characterize the complex, which also include parameters fitted to experimental or simulation data and its predicted binding affinity. The inherent problem of this rigid approach is that it leads to poor predictivity for those complexes that do not conform to the modelling assumptions. Moreover, resampling strategies, such as cross-validation or bootstrapping, are still not systematically used to guard against the overfitting of calibration data in parameter estimation for scoring functions.<\/jats:p>\n               <jats:p>Results: We propose a novel scoring function (RF-Score) that circumvents the need for problematic modelling assumptions via non-parametric machine learning. In particular, Random Forest was used to implicitly capture binding effects that are hard to model explicitly. RF-Score is compared with the state of the art on the demanding PDBbind benchmark. Results show that RF-Score is a very competitive scoring function. Importantly, RF-Score's performance was shown to improve dramatically with training set size and hence the future availability of more high-quality structural and interaction data is expected to lead to improved versions of RF-Score.<\/jats:p>\n               <jats:p>Contact: \u00a0pedro.ballester@ebi.ac.uk; jbom@st-andrews.ac.uk<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btq112","type":"journal-article","created":{"date-parts":[[2010,3,18]],"date-time":"2010-03-18T00:14:28Z","timestamp":1268871268000},"page":"1169-1175","source":"Crossref","is-referenced-by-count":841,"title":["A machine learning approach to predicting protein\u2013ligand binding affinity with applications to molecular docking"],"prefix":"10.1093","volume":"26","author":[{"given":"Pedro J.","family":"Ballester","sequence":"first","affiliation":[{"name":"1 Unilever Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW and 2 Centre for Biomolecular Sciences, University of St Andrews, North Haugh, St Andrews KY16 9ST, UK"}]},{"given":"John B. O.","family":"Mitchell","sequence":"additional","affiliation":[{"name":"1 Unilever Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW and 2 Centre for Biomolecular Sciences, University of St Andrews, North Haugh, St Andrews KY16 9ST, UK"}]}],"member":"286","published-online":{"date-parts":[[2010,3,17]]},"reference":[{"key":"2023012508163577100_B1","doi-asserted-by":"crossref","first-page":"823","DOI":"10.1002\/prot.21782","article-title":"A general approach for developing system-specific functions to score protein-ligand docked complexes using support vector inductive logic programming","volume":"69","author":"Amini","year":"2007","journal-title":"Proteins"},{"key":"2023012508163577100_B2","doi-asserted-by":"crossref","first-page":"367","DOI":"10.1002\/(SICI)1097-0134(19981115)33:3<367::AID-PROT6>3.0.CO;2-W","article-title":"Flexible docking using Tabu search and an empirical estimate of binding affinity","volume":"33","author":"Baxter","year":"1998","journal-title":"Proteins: Struct., Funct., Genet."},{"key":"2023012508163577100_B3","doi-asserted-by":"crossref","first-page":"235","DOI":"10.1093\/nar\/28.1.235","article-title":"The Protein Data Bank","volume":"28","author":"Berman","year":"2000","journal-title":"Nucleic Acids Res."},{"key":"2023012508163577100_B4","doi-asserted-by":"crossref","first-page":"243","DOI":"10.1007\/BF00126743","article-title":"The development of a simple empirical scoring function to estimate the binding constant for a protein-ligand complex of known three-dimensional structure","volume":"8","author":"B\u00f6hm","year":"1994","journal-title":"J. Comput.-Aided Mol. Des."},{"key":"2023012508163577100_B5","doi-asserted-by":"crossref","first-page":"309","DOI":"10.1023\/A:1007999920146","article-title":"Prediction of binding constants of protein ligands: a fast method for the prioritization of hits obtained from de novo design or 3D database search programs","volume":"12","author":"B\u00f6hm","year":"1998","journal-title":"J. Comput.-Aided Mol. Des."},{"key":"2023012508163577100_B6","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random Forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Mach. Learn."},{"key":"2023012508163577100_B7","volume-title":"Classification and Regression Trees.","author":"Breiman","year":"1984"},{"key":"2023012508163577100_B8","doi-asserted-by":"crossref","first-page":"479","DOI":"10.1016\/j.drudis.2009.02.010","article-title":"A chemogenomic approach to drug discovery: focus on cardiovascular diseases","volume":"14","author":"Cases","year":"2009","journal-title":"Drug Discov. Today"},{"key":"2023012508163577100_B9","doi-asserted-by":"crossref","first-page":"4394","DOI":"10.1093\/bioinformatics\/bti721","article-title":"Prediction of protein-protein interactions using random decision forest framework","volume":"21","author":"Chen","year":"2005","journal-title":"Bioinformatics"},{"key":"2023012508163577100_B10","doi-asserted-by":"crossref","first-page":"1079","DOI":"10.1021\/ci9000053","article-title":"Comparative assessment of scoring functions on a diverse test set","volume":"49","author":"Cheng","year":"2009","journal-title":"J. Chem. Inf. Model."},{"key":"2023012508163577100_B11","doi-asserted-by":"crossref","first-page":"699","DOI":"10.1021\/ci034246+","article-title":"Predicting protein-ligand binding affinities using novel geometrical descriptors and machine-learning methods","volume":"44","author":"Deng","year":"2004","journal-title":"J. Chem. Inf. Comput. Sci."},{"key":"2023012508163577100_B12","doi-asserted-by":"crossref","first-page":"425","DOI":"10.1023\/A:1007996124545","article-title":"Empirical scoring functions: I. The development of a fast empirical scoring function to estimate the binding affinity of ligands in receptor complexes","volume":"11","author":"Eldridge","year":"1997","journal-title":"J. Comput.-Aided Mol. Des."},{"key":"2023012508163577100_B13","doi-asserted-by":"crossref","first-page":"855","DOI":"10.1016\/j.jmb.2007.10.065","article-title":"Molecular docking for substrate identification: the short-chain dehydrogenases\/reductases","volume":"375","author":"Favia","year":"2008","journal-title":"J. Mol. Biol."},{"key":"2023012508163577100_B14","doi-asserted-by":"crossref","first-page":"3032","DOI":"10.1021\/jm030489h","article-title":"Assessing scoring functions for protein-ligand interactions","volume":"47","author":"Ferrara","year":"2004","journal-title":"J. Med. Chem."},{"key":"2023012508163577100_B15","doi-asserted-by":"crossref","first-page":"1739","DOI":"10.1021\/jm0306430","article-title":"Glide: a new approach for rapid, accurate docking and scoring. 1. Method and assessment of docking accuracy","volume":"47","author":"Friesner","year":"2004","journal-title":"J. Med. Chem."},{"key":"2023012508163577100_B16","doi-asserted-by":"crossref","first-page":"6177","DOI":"10.1021\/jm051256o","article-title":"Extra precision glide: docking and scoring incorporating a model of hydrophobic enclosure for protein-ligand complexes","volume":"49","author":"Friesner","year":"2006","journal-title":"J. Med. Chem."},{"key":"2023012508163577100_B17","doi-asserted-by":"crossref","first-page":"317","DOI":"10.1016\/1074-5521(95)90050-0","article-title":"Molecular recognition of the inhibitor AG-1343 by HIV-1 Protease: conformationally flexible docking by evolutionary programming","volume":"2","author":"Gehlhaar","year":"1995","journal-title":"Chem. Biol."},{"key":"2023012508163577100_B18","doi-asserted-by":"crossref","first-page":"337","DOI":"10.1006\/jmbi.1999.3371","article-title":"Knowledge-based scoring function to predict protein-ligand interactions","volume":"295","author":"Gohlke","year":"2000","journal-title":"J. Mol. Biol."},{"key":"2023012508163577100_B19","doi-asserted-by":"crossref","first-page":"56","DOI":"10.1016\/j.sbi.2008.11.009","article-title":"Computational evaluation of protein-small molecule binding","volume":"19","author":"Guvench","year":"2009","journal-title":"Curr. Opin. Struct. Biol."},{"key":"2023012508163577100_B20","doi-asserted-by":"crossref","first-page":"500","DOI":"10.1186\/1471-2105-9-500","article-title":"Prediction of glycosylation sites using random forests","volume":"9","author":"Hamby","year":"2008","journal-title":"BMC Bioinformatics"},{"key":"2023012508163577100_B21","doi-asserted-by":"crossref","first-page":"5166","DOI":"10.1039\/B608269F","article-title":"Molecular mechanics methods for predicting protein-ligand binding","volume":"8","author":"Huang","year":"2006","journal-title":"Phys. Chem. Chem. Phys."},{"key":"2023012508163577100_B22","doi-asserted-by":"crossref","first-page":"193","DOI":"10.1007\/s10822-008-9189-4","article-title":"Community benchmarks for virtual screening","volume":"22","author":"Irwin","year":"2008","journal-title":"J. Comput.-Aided Mol. Des."},{"key":"2023012508163577100_B23","doi-asserted-by":"crossref","first-page":"427","DOI":"10.1007\/BF00124474","article-title":"Scoring noncovalent protein-ligand interactions: a continuous differentiable function tuned to compute binding affinities","volume":"10","author":"Jain","year":"1996","journal-title":"J. Comput.-Aided Mol. Des."},{"key":"2023012508163577100_B24","doi-asserted-by":"crossref","first-page":"43","DOI":"10.1016\/S0022-2836(95)80037-9","article-title":"Molecular recognition of receptor sites using a genetic algorithm with a description of desolvation","volume":"245","author":"Jones","year":"1995","journal-title":"J. Mol. Biol."},{"key":"2023012508163577100_B25","doi-asserted-by":"crossref","first-page":"727","DOI":"10.1006\/jmbi.1996.0897","article-title":"Development and validation of a genetic algorithm for flexible docking","volume":"267","author":"Jones","year":"1997","journal-title":"J. Mol. Biol."},{"key":"2023012508163577100_B26","doi-asserted-by":"crossref","first-page":"935","DOI":"10.1038\/nrd1549","article-title":"Docking and scoring in virtual screening for drug discovery: methods and applications","volume":"3","author":"Kitchen","year":"2004","journal-title":"Nat. Rev. Drug Discov."},{"key":"2023012508163577100_B27","doi-asserted-by":"crossref","first-page":"527","DOI":"10.1002\/qsar.200430926","article-title":"Knowledge based potentials: the reverse Boltzmann methodology, virtual screening and molecular weight dependence","volume":"24","author":"Konstantinou Kirtay","year":"2005","journal-title":"QSAR Comb. Sci."},{"key":"2023012508163577100_B28","doi-asserted-by":"crossref","first-page":"395","DOI":"10.1016\/j.jmgm.2004.11.007","article-title":"LigScore: a novel scoring function for predicting binding affinities","volume":"23","author":"Krammer","year":"2005","journal-title":"J. Mol. Graph. Model."},{"key":"2023012508163577100_B29","doi-asserted-by":"crossref","first-page":"1990","DOI":"10.1021\/ci800125k","article-title":"Information theory-based scoring function for the structure-based prediction of protein-ligand binding affinity","volume":"48","author":"Kulharia","year":"2008","journal-title":"J. Chem. Inf. Model."},{"key":"2023012508163577100_B30","volume-title":"Molecular Modelling: Principles and Applications","author":"Leach","year":"2001","edition":"2"},{"key":"2023012508163577100_B31","doi-asserted-by":"crossref","first-page":"5851","DOI":"10.1021\/jm060999m","article-title":"Prediction of protein-ligand interactions. docking and scoring: successes and gaps","volume":"49","author":"Leach","year":"2006","journal-title":"J. Med. Chem."},{"key":"2023012508163577100_B32","doi-asserted-by":"crossref","first-page":"1165","DOI":"10.1002\/(SICI)1096-987X(199908)20:11<1165::AID-JCC7>3.0.CO;2-A","article-title":"BLEEP - potential of mean force describing protein-ligand interactions: I. Generating potential","volume":"20","author":"Mitchell","year":"1999","journal-title":"J. Comput. Chem."},{"key":"2023012508163577100_B33","doi-asserted-by":"crossref","first-page":"1177","DOI":"10.1002\/(SICI)1096-987X(199908)20:11<1177::AID-JCC8>3.0.CO;2-0","article-title":"BLEEP - potential of mean force describing protein-ligand interactions: II. Calculation of binding energies and comparison with experimental data","volume":"20","author":"Mitchell","year":"1999","journal-title":"J. Comput. Chem."},{"key":"2023012508163577100_B34","doi-asserted-by":"crossref","first-page":"S7","DOI":"10.1038\/sj.bjp.0707515","article-title":"Towards the development of universal, fast and highly accurate docking\/scoring methods: a long way to go","volume":"153","author":"Moitessier","year":"2008","journal-title":"Br. J. Pharmacol."},{"key":"2023012508163577100_B35","doi-asserted-by":"crossref","first-page":"272","DOI":"10.1002\/prot.20588","article-title":"General and targeted statistical potentials for protein-ligand interactions","volume":"61","author":"Mooij","year":"2005","journal-title":"Proteins: Struct., Funct., Bioinf."},{"key":"2023012508163577100_B36","doi-asserted-by":"crossref","first-page":"99","DOI":"10.1023\/A:1008729005958","article-title":"A knowledge-based scoring function for protein-ligand interactions: probing the reference state","volume":"20","author":"Muegge","year":"2000","journal-title":"Perspect. Drug Discov. Des."},{"key":"2023012508163577100_B37","doi-asserted-by":"crossref","first-page":"418","DOI":"10.1002\/1096-987X(200103)22:4<418::AID-JCC1012>3.0.CO;2-3","article-title":"Effect of ligand volume correction on PMF scoring","volume":"22","author":"Muegge","year":"2001","journal-title":"J. Comput. Chem."},{"key":"2023012508163577100_B38","doi-asserted-by":"crossref","first-page":"5895","DOI":"10.1021\/jm050038s","article-title":"PMF scoring revisited","volume":"49","author":"Muegge","year":"2006","journal-title":"J. Med. Chem."},{"key":"2023012508163577100_B39","doi-asserted-by":"crossref","first-page":"791","DOI":"10.1021\/jm980536j","article-title":"A general and fast scoring function for protein-ligand interactions: a simplified potential approach","volume":"42","author":"Muegge","year":"1999","journal-title":"J. Med. Chem."},{"key":"2023012508163577100_B40","doi-asserted-by":"crossref","first-page":"2345","DOI":"10.1021\/ci700157b","article-title":"y-Randomization and its variants in QSPR\/QSAR","volume":"47","author":"Rucker","year":"2007","journal-title":"J. Chem. Inf. Model."},{"key":"2023012508163577100_B41","doi-asserted-by":"crossref","first-page":"170","DOI":"10.1021\/ci900382e","article-title":"Combining machine learning and pharmacophore-based interaction fingerprint for in silico screening","volume":"50","author":"Sato","year":"2010","journal-title":"J. Chem. Inf. Model."},{"key":"2023012508163577100_B42","doi-asserted-by":"crossref","first-page":"1947","DOI":"10.1021\/ci034160g","article-title":"Random forest: a classification and regression tool for compound classification and QSAR modeling","volume":"43","author":"Svetnik","year":"2003","journal-title":"J. Chem. Inf. Comput. Sci."},{"key":"2023012508163577100_B43","author":"The Discovery Studio Software, version 2.0","year":"2001"},{"key":"2023012508163577100_B44","author":"The Schr\u00f6dinger Software, version 8.0","year":"2005"},{"key":"2023012508163577100_B45","author":"The Sybyl Software, version 7.2","year":"2006"},{"key":"2023012508163577100_B46","doi-asserted-by":"crossref","first-page":"6296","DOI":"10.1021\/jm050436v","article-title":"DrugScoreCSD - knowledge-based scoring function derived from small molecule crystal data with superior recognition rate of near-native ligand poses and better affinity prediction","volume":"48","author":"Velec","year":"2005","journal-title":"J. Med. Chem."},{"key":"2023012508163577100_B47","doi-asserted-by":"crossref","first-page":"11","DOI":"10.1023\/A:1016357811882","article-title":"Further development and validation of empirical scoring functions for structure-based binding affinity prediction","volume":"16","author":"Wang","year":"2002","journal-title":"J. Comput.-Aided Mol. Des."},{"key":"2023012508163577100_B48","doi-asserted-by":"crossref","first-page":"2287","DOI":"10.1021\/jm0203783","article-title":"Comparative evaluation of 11 scoring functions for molecular docking","volume":"46","author":"Wang","year":"2003","journal-title":"J. Med. Chem."},{"key":"2023012508163577100_B49","doi-asserted-by":"crossref","first-page":"2114","DOI":"10.1021\/ci049733j","article-title":"An extensive test of 14 scoring functions using the PDBbind refined set of 800 protein-ligand complexes","volume":"44","author":"Wang","year":"2004","journal-title":"J. Chem. Inf. Comput. Sci."},{"key":"2023012508163577100_B50","doi-asserted-by":"crossref","first-page":"4111","DOI":"10.1021\/jm048957q","article-title":"The PDBbind database: methodologies and updates","volume":"48","author":"Wang","year":"2005","journal-title":"J. Med. Chem."},{"key":"2023012508163577100_B51","doi-asserted-by":"crossref","first-page":"e4783","DOI":"10.1371\/journal.pone.0004783","article-title":"Chemical probes that competitively and selectively inhibit Stat3 activation","volume":"4","author":"Xu","year":"2009","journal-title":"PLoS ONE"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/26\/9\/1169\/48856826\/bioinformatics_26_9_1169.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/26\/9\/1169\/48856826\/bioinformatics_26_9_1169.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,25]],"date-time":"2023-01-25T08:17:18Z","timestamp":1674634638000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/26\/9\/1169\/199938"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2010,3,17]]},"references-count":51,"journal-issue":{"issue":"9","published-print":{"date-parts":[[2010,5,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btq112","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2010,5,1]]},"published":{"date-parts":[[2010,3,17]]}}}