{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,6]],"date-time":"2026-03-06T08:34:13Z","timestamp":1772786053289,"version":"3.50.1"},"reference-count":43,"publisher":"Oxford University Press (OUP)","issue":"3","license":[{"start":{"date-parts":[[2019,8,26]],"date-time":"2019-08-26T00:00:00Z","timestamp":1566777600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/501100000266","name":"Engineering and Physical Sciences Research Council","doi-asserted-by":"publisher","award":["EP\/G03706X\/1"],"award-info":[{"award-number":["EP\/G03706X\/1"]}],"id":[{"id":"10.13039\/501100000266","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2020,2,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Machine learning scoring functions for protein\u2013ligand binding affinity prediction have been found to consistently outperform classical scoring functions. Structure-based scoring functions for universal affinity prediction typically use features describing interactions derived from the protein\u2013ligand complex, with limited information about the chemical or topological properties of the ligand itself.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>We demonstrate that the performance of machine learning scoring functions are consistently improved by the inclusion of diverse ligand-based features. For example, a Random Forest (RF) combining the features of RF-Score v3 with RDKit molecular descriptors achieved Pearson correlation coefficients of up to 0.836, 0.780 and 0.821 on the PDBbind 2007, 2013 and 2016 core sets, respectively, compared to 0.790, 0.746 and 0.814 when using the features of RF-Score v3 alone. Excluding proteins and\/or ligands that are similar to those in the test sets from the training set has a significant effect on scoring function performance, but does not remove the predictive power of ligand-based features. Furthermore a RF using only ligand-based features is predictive at a level similar to classical scoring functions and it appears to be predicting the mean binding affinity of a ligand for its protein targets.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>Data and code to reproduce all the results are freely available at http:\/\/opig.stats.ox.ac.uk\/resources.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btz665","type":"journal-article","created":{"date-parts":[[2019,8,21]],"date-time":"2019-08-21T15:27:21Z","timestamp":1566401241000},"page":"758-764","source":"Crossref","is-referenced-by-count":89,"title":["Learning from the ligand: using ligand-based features to improve binding affinity prediction"],"prefix":"10.1093","volume":"36","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4185-1229","authenticated-orcid":false,"given":"Fergus","family":"Boyles","sequence":"first","affiliation":[{"name":"Department of Statistics, University of Oxford , Oxford OX1 3LB, UK"}]},{"given":"Charlotte M","family":"Deane","sequence":"additional","affiliation":[{"name":"Department of Statistics, University of Oxford , Oxford OX1 3LB, UK"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1731-8405","authenticated-orcid":false,"given":"Garrett M","family":"Morris","sequence":"additional","affiliation":[{"name":"Department of Statistics, University of Oxford , Oxford OX1 3LB, UK"}]}],"member":"286","published-online":{"date-parts":[[2019,8,26]]},"reference":[{"key":"2023013110021870600_btz665-B1","doi-asserted-by":"crossref","first-page":"488","DOI":"10.1002\/jcc.540150503","article-title":"Icm \u2013 a new method for protein modeling and design: applications to docking and structure prediction from the distorted native conformation","volume":"15","author":"Abagyan","year":"1994","journal-title":"J. Comput. Chem"},{"key":"2023013110021870600_btz665-B2","doi-asserted-by":"crossref","first-page":"1023","DOI":"10.1039\/C4IB00175C","article-title":"Modelling ligand selectivity of serine proteases using integrative proteochemometric approaches improves model performance and allows the multi-target dependent interpretation of features","volume":"6","author":"Ain","year":"2014","journal-title":"Integr. Biol"},{"key":"2023013110021870600_btz665-B3","doi-asserted-by":"crossref","first-page":"207","DOI":"10.1039\/C5SC02678D","article-title":"Accurate calculation of the absolute free energy of binding for drug molecules","volume":"7","author":"Aldeghi","year":"2016","journal-title":"Chem. Sci"},{"key":"2023013110021870600_btz665-B4","doi-asserted-by":"crossref","first-page":"1169","DOI":"10.1093\/bioinformatics\/btq112","article-title":"A machine learning approach to predicting protein\u2013ligand binding affinity with applications to molecular docking","volume":"26","author":"Ballester","year":"2010","journal-title":"Bioinformatics"},{"key":"2023013110021870600_btz665-B5","doi-asserted-by":"crossref","first-page":"235","DOI":"10.1093\/nar\/28.1.235","article-title":"The Protein Data Bank","volume":"28","author":"Berman","year":"2000","journal-title":"Nucleic Acids Res"},{"key":"2023013110021870600_btz665-B6","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Mach. Learn"},{"key":"2023013110021870600_btz665-B7","doi-asserted-by":"crossref","first-page":"1079","DOI":"10.1021\/ci9000053","article-title":"Comparative assessment of scoring functions on a diverse test set","volume":"49","author":"Cheng","year":"2009","journal-title":"J. Chem. Inform. Model"},{"key":"2023013110021870600_btz665-B8","doi-asserted-by":"crossref","first-page":"2897","DOI":"10.1021\/ci2003889","article-title":"Nnscore 2.0: a neural-network receptor\u2013ligand scoring function","volume":"51","author":"Durrant","year":"2011","journal-title":"J. Chem. Inform. Model"},{"key":"2023013110021870600_btz665-B9","doi-asserted-by":"crossref","first-page":"1739","DOI":"10.1021\/jm0306430","article-title":"Glide: a new approach for rapid, accurate docking and scoring. 1. method and assessment of docking accuracy","volume":"47","author":"Friesner","year":"2004","journal-title":"J. Med. Chem"},{"key":"2023013110021870600_btz665-B10","doi-asserted-by":"crossref","first-page":"55","DOI":"10.1021\/cc9800071","article-title":"A knowledge-based approach in designing combinatorial or medicinal chemistry libraries for drug discovery. 1. A qualitative and quantitative characterization of known drug databases","volume":"1","author":"Ghose","year":"1999","journal-title":"J. Combinat. Chem"},{"key":"2023013110021870600_btz665-B11","doi-asserted-by":"crossref","first-page":"21","DOI":"10.1146\/annurev.biophys.36.040306.132550","article-title":"Calculation of protein-ligand binding affinities","volume":"36","author":"Gilson","year":"2007","journal-title":"Ann. Rev. Biophys. Biomol. Struct"},{"key":"2023013110021870600_btz665-B12","doi-asserted-by":"crossref","first-page":"1750","DOI":"10.1021\/jm030644s","article-title":"Glide: a new approach for rapid, accurate docking and scoring. 2. Enrichment factors in database screening","volume":"47","author":"Halgren","year":"2004","journal-title":"J. Med. Chem"},{"key":"2023013110021870600_btz665-B13","doi-asserted-by":"crossref","first-page":"12899","DOI":"10.1039\/c0cp00151a","article-title":"Scoring functions and their evaluation methods for protein\u2013ligand docking: recent advances and future directions","volume":"12","author":"Huang","year":"2010","journal-title":"Phys. Chem. Chem. Phys"},{"key":"2023013110021870600_btz665-B14","doi-asserted-by":"crossref","first-page":"499","DOI":"10.1021\/jm020406h","article-title":"Surflex: fully automatic flexible molecular docking using a molecular similarity-based search engine","volume":"46","author":"Jain","year":"2003","journal-title":"J. Med. Chem"},{"key":"2023013110021870600_btz665-B15","doi-asserted-by":"crossref","first-page":"287","DOI":"10.1021\/acs.jcim.7b00650","article-title":"K deep: protein\u2013ligand absolute binding affinity prediction via 3d-convolutional neural networks","volume":"58","author":"Jim\u00e9nez","year":"2018","journal-title":"J. Chem. Inform. Model"},{"key":"2023013110021870600_btz665-B16","doi-asserted-by":"crossref","first-page":"727","DOI":"10.1006\/jmbi.1996.0897","article-title":"Development and validation of a genetic algorithm for flexible docking","volume":"267","author":"Jones","year":"1997","journal-title":"J. Mol. Biol"},{"key":"2023013110021870600_btz665-B17","doi-asserted-by":"crossref","first-page":"1961","DOI":"10.1021\/ci100264e","article-title":"Leave-cluster-out cross-validation is appropriate for scoring functions derived from diverse protein data sets","volume":"50","author":"Kramer","year":"2010","journal-title":"J. Chem. Inform. Model"},{"key":"2023013110021870600_btz665-B18","doi-asserted-by":"crossref","first-page":"291.","DOI":"10.1186\/1471-2105-15-291","article-title":"Substituting random forest for multiple linear regression improves binding affinity prediction of scoring functions: cyscore as a case study","volume":"15","author":"Li","year":"2014","journal-title":"BMC Bioinformatics"},{"key":"2023013110021870600_btz665-B19","doi-asserted-by":"crossref","first-page":"115","DOI":"10.1002\/minf.201400132","article-title":"Improving autodock vina using random forest: the growing accuracy of binding affinity prediction by the effective exploitation of larger data sets","volume":"34","author":"Li","year":"2015","journal-title":"Mol. Informatics"},{"key":"2023013110021870600_btz665-B20","doi-asserted-by":"crossref","first-page":"10947","DOI":"10.3390\/molecules200610947","article-title":"Low-quality structural and interaction data improves binding affinity prediction via random forest","volume":"20","author":"Li","year":"2015","journal-title":"Molecules"},{"key":"2023013110021870600_btz665-B21","doi-asserted-by":"crossref","first-page":"12.","DOI":"10.3390\/biom8010012","article-title":"The impact of protein structure and sequence similarity on the accuracy of machine-learning scoring functions for binding affinity prediction","volume":"8","author":"Li","year":"2018","journal-title":"Biomolecules"},{"key":"2023013110021870600_btz665-B22","doi-asserted-by":"crossref","first-page":"1007","DOI":"10.1021\/acs.jcim.7b00049","article-title":"Structural and sequence similarity makes a significant impact on machine-learning-based scoring functions for protein\u2013ligand interactions","volume":"57","author":"Li","year":"2017","journal-title":"J. Chem. Inform Model"},{"key":"2023013110021870600_btz665-B23","doi-asserted-by":"crossref","first-page":"1700","DOI":"10.1021\/ci500080q","article-title":"Comparative assessment of scoring functions on an updated benchmark: 1. Compilation of the test set","volume":"54","author":"Li","year":"2014","journal-title":"J. Chem. Inform. Model"},{"key":"2023013110021870600_btz665-B24","doi-asserted-by":"crossref","first-page":"1717","DOI":"10.1021\/ci500081m","article-title":"Comparative assessment of scoring functions on an updated benchmark: 2. Evaluation methods and general results","volume":"54","author":"Li","year":"2014","journal-title":"J. Chem. Inform. Model"},{"key":"2023013110021870600_btz665-B25","doi-asserted-by":"crossref","first-page":"140.","DOI":"10.1038\/nmeth.2324","article-title":"A pharmacological organization of g protein\u2013coupled receptors","volume":"10","author":"Lin","year":"2013","journal-title":"Nat. Methods"},{"key":"2023013110021870600_btz665-B26","doi-asserted-by":"crossref","first-page":"337","DOI":"10.1016\/j.ddtec.2004.11.007","article-title":"Lead-and drug-like compounds: the rule-of-five revolution","volume":"1","author":"Lipinski","year":"2004","journal-title":"Drug Discov. Today Technol"},{"key":"2023013110021870600_btz665-B27","doi-asserted-by":"crossref","first-page":"302","DOI":"10.1021\/acs.accounts.6b00491","article-title":"Forging the basis for developing protein\u2013ligand interaction scoring functions","volume":"50","author":"Liu","year":"2017","journal-title":"Acc. Chem. Res"},{"key":"2023013110021870600_btz665-B28","doi-asserted-by":"crossref","first-page":"2785","DOI":"10.1002\/jcc.21256","article-title":"Autodock4 and autodocktools4: automated docking with selective receptor flexibility","volume":"30","author":"Morris","year":"2009","journal-title":"J. Comput. Chem"},{"key":"2023013110021870600_btz665-B29","doi-asserted-by":"crossref","first-page":"15","DOI":"10.1186\/s13321-015-0063-9","article-title":"Proteochemometric modelling coupled to in silico target prediction: an integrated approach for the simultaneous prediction of polypharmacology and binding affinity\/potency of small molecules","volume":"7","author":"Paricharak","year":"2015","journal-title":"J. Chemoinformatics"},{"key":"2023013110021870600_btz665-B30","first-page":"2825","article-title":"Scikit-learn: machine learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J. Mach. Learn. Res"},{"key":"2023013110021870600_btz665-B31","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1016\/j.sbi.2015.12.002","article-title":"Advances in free-energy-based simulations of protein folding and ligand binding","volume":"36","author":"Perez","year":"2016","journal-title":"Curr. Opin. Struct. Biol"},{"key":"2023013110021870600_btz665-B32","doi-asserted-by":"crossref","first-page":"470","DOI":"10.1006\/jmbi.1996.0477","article-title":"A fast flexible docking method using an incremental construction algorithm","volume":"261","author":"Rarey","year":"1996","journal-title":"J. Mol. Biol"},{"key":"2023013110021870600_btz665-B33","doi-asserted-by":"crossref","first-page":"e1004586.","DOI":"10.1371\/journal.pcbi.1004586","article-title":"Autodockfr: advances in protein-ligand docking with explicitly specified binding site flexibility","volume":"11","author":"Ravindranath","year":"2015","journal-title":"PLoS Comput. Biol"},{"key":"2023013110021870600_btz665-B34","doi-asserted-by":"crossref","first-page":"603","DOI":"10.4155\/fmc.12.18","article-title":"Analysis of structure-based virtual screening studies and characterization of identified active compounds","volume":"4","author":"Ripphausen","year":"2012","journal-title":"Future Med. Chem"},{"key":"2023013110021870600_btz665-B35","doi-asserted-by":"crossref","first-page":"395","DOI":"10.1002\/prot.22058","article-title":"Sfcscore: scoring functions for affinity prediction of protein\u2013ligand complexes","volume":"73","author":"Sotriffer","year":"2008","journal-title":"Proteins Struct. Funct. Bioinform"},{"key":"2023013110021870600_btz665-B36","doi-asserted-by":"crossref","first-page":"15","DOI":"10.1002\/prot.21082","article-title":"Protein-ligand docking: current status and future challenges","volume":"65","author":"Sousa","year":"2006","journal-title":"Proteins"},{"key":"2023013110021870600_btz665-B37","doi-asserted-by":"crossref","first-page":"2296","DOI":"10.2174\/0929867311320180002","article-title":"Protein-ligand docking in the new millennium \u2013 a retrospective of 10 years in the field","volume":"20","author":"Sousa","year":"2013","journal-title":"Curr. Med. Chem"},{"key":"2023013110021870600_btz665-B38","article-title":"Comparative assessment of scoring functions: the casf-2016 update","author":"Su","year":"2018","journal-title":"Journal of Chemical Information and Modeling"},{"key":"2023013110021870600_btz665-B39","doi-asserted-by":"crossref","first-page":"455","DOI":"10.1002\/jcc.21334","article-title":"Autodock vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading","volume":"31","author":"Trott","year":"2010","journal-title":"J. Comput. Chem"},{"key":"2023013110021870600_btz665-B40","doi-asserted-by":"crossref","first-page":"e27518.","DOI":"10.1371\/journal.pone.0027518","article-title":"Which compound to select in lead optimization? Prospectively validated proteochemometric models guide preclinical development","volume":"6","author":"van Westen","year":"2011","journal-title":"PLoS One"},{"key":"2023013110021870600_btz665-B41","doi-asserted-by":"crossref","first-page":"26.","DOI":"10.1186\/s13321-015-0078-2","article-title":"Open Drug Discovery Toolkit (ODDT): a new open-source player in the drug discovery field","volume":"7","author":"W\u00f3jcikowski","year":"2015","journal-title":"J. Cheminform"},{"key":"2023013110021870600_btz665-B42","doi-asserted-by":"crossref","first-page":"1334","DOI":"10.1093\/bioinformatics\/bty757","article-title":"Development of a protein\u2013ligand extended connectivity (PLEC) fingerprint and its application for binding affinity predictions","volume":"35","author":"W\u00f3jcikowski","year":"2018","journal-title":"Bioinformatics"},{"key":"2023013110021870600_btz665-B43","doi-asserted-by":"crossref","first-page":"1923","DOI":"10.1021\/ci400120b","article-title":"Sfcscore rf: a random forest-based scoring function for improved affinity prediction of protein\u2013ligand complexes","volume":"53","author":"Zilian","year":"2013","journal-title":"J. Chem. Inform. Model"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btz665\/30124491\/btz665.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/36\/3\/758\/48981883\/bioinformatics_36_3_758.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/36\/3\/758\/48981883\/bioinformatics_36_3_758.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,7,22]],"date-time":"2024-07-22T06:41:49Z","timestamp":1721630509000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/36\/3\/758\/5554651"}},"subtitle":[],"editor":[{"given":"Arne","family":"Elofsson","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2019,8,26]]},"references-count":43,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2020,2,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btz665","relation":{"has-preprint":[{"id-type":"doi","id":"10.26434\/chemrxiv.8174525.v1","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2020,2,1]]},"published":{"date-parts":[[2019,8,26]]}}}