{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,17]],"date-time":"2026-04-17T07:26:24Z","timestamp":1776410784479,"version":"3.51.2"},"reference-count":44,"publisher":"Oxford University Press (OUP)","issue":"10","license":[{"start":{"date-parts":[[2020,12,7]],"date-time":"2020-12-07T00:00:00Z","timestamp":1607299200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/501100005739","name":"UNAM","doi-asserted-by":"publisher","award":["LANCAD-UNAM-DGTIC-335"],"award-info":[{"award-number":["LANCAD-UNAM-DGTIC-335"]}],"id":[{"id":"10.13039\/501100005739","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,6,16]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Motivation<\/jats:title><jats:p>Machine-learning scoring functions (SFs) have been found to outperform standard SFs for binding affinity prediction of protein\u2013ligand complexes. A plethora of reports focus on the implementation of increasingly complex algorithms, while the chemical description of the system has not been fully exploited.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>Herein, we introduce Extended Connectivity Interaction Features (ECIF) to describe protein\u2013ligand complexes and build machine-learning SFs with improved predictions of binding affinity. ECIF are a set of protein\u2212ligand atom-type pair counts that take into account each atom\u2019s connectivity to describe it and thus define the pair types. ECIF were used to build different machine-learning models to predict protein\u2013ligand affinities (pKd\/pKi). 
The models were evaluated in terms of \u2018scoring power\u2019 on the Comparative Assessment of Scoring Functions 2016. The best models built on ECIF achieved Pearson correlation coefficients of 0.857 when used on its own, and 0.866 when used in combination with ligand descriptors, demonstrating ECIF descriptive power.<\/jats:p><\/jats:sec><jats:sec><jats:title>Availability and implementation<\/jats:title><jats:p>Data and code to reproduce all the results are freely available at https:\/\/github.com\/DIFACQUIM\/ECIF.<\/jats:p><\/jats:sec><jats:sec><jats:title>Supplementary information<\/jats:title><jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p><\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaa982","type":"journal-article","created":{"date-parts":[[2020,11,11]],"date-time":"2020-11-11T01:56:59Z","timestamp":1605059819000},"page":"1376-1382","source":"Crossref","is-referenced-by-count":96,"title":["Extended connectivity interaction features: improving binding affinity prediction through chemical description"],"prefix":"10.1093","volume":"37","author":[{"given":"Norberto","family":"S\u00e1nchez-Cruz","sequence":"first","affiliation":[{"name":"Department of Pharmacy, School of Chemistry, Universidad Nacional Aut\u00f3noma de M\u00e9xico , Mexico City 04510, Mexico"}]},{"given":"Jos\u00e9 L","family":"Medina-Franco","sequence":"additional","affiliation":[{"name":"Department of Pharmacy, School of Chemistry, Universidad Nacional Aut\u00f3noma de M\u00e9xico , Mexico City 04510, Mexico"}]},{"given":"Jordi","family":"Mestres","sequence":"additional","affiliation":[{"name":"Research Group on Systems Pharmacology, Research Program on Biomedical Informatics (GRIB), IMIM Hospital del Mar Medical Research Institute and University Pompeu Fabra, Parc de Recerca Biomedica (PRBB) , 08003 Barcelona, Catalonia, Spain"},{"name":"Chemotargets SL, Parc Cientific de Barcelona (PCB) , 08028 Barcelona, Catalonia, 
Spain"}]},{"given":"Xavier","family":"Barril","sequence":"additional","affiliation":[{"name":"Institut de Biomedicina de la Universitat de Barcelona (IBUB) and Facultat de Farmacia, Universitat de Barcelona , 08028 Barcelona, Spain"},{"name":"Catalan Institution for Research and Advanced Studies (ICREA) , 08010 Barcelona, Spain"}]}],"member":"286","published-online":{"date-parts":[[2020,12,7]]},"reference":[{"key":"2023051709341013200_btaa982-B1","doi-asserted-by":"crossref","first-page":"405","DOI":"10.1002\/wcms.1225","article-title":"Machine-learning scoring functions to improve structure-based binding affinity prediction and virtual screening","volume":"5","author":"Ain","year":"2015","journal-title":"Wiley Interdiscip. Rev. Comput. Mol. Sci"},{"key":"2023051709341013200_btaa982-B2","doi-asserted-by":"crossref","first-page":"944","DOI":"10.1021\/ci500091r","article-title":"Does a more precise chemical description of protein\u2013ligand complexes lead to more accurate prediction of binding affinity?","volume":"54","author":"Ballester","year":"2014","journal-title":"J. Chem. Inf. 
Model"},{"key":"2023051709341013200_btaa982-B3","doi-asserted-by":"crossref","first-page":"1169","DOI":"10.1093\/bioinformatics\/btq112","article-title":"A machine learning approach to predicting protein\u2013ligand binding affinity with applications to molecular docking","volume":"26","author":"Ballester","year":"2010","journal-title":"Bioinformatics"},{"key":"2023051709341013200_btaa982-B4","doi-asserted-by":"crossref","first-page":"758","DOI":"10.1093\/bioinformatics\/btz665","article-title":"Learning from the ligand: using ligand-based features to improve binding affinity prediction","volume":"36","author":"Boyles","year":"2020","journal-title":"Bioinformatics"},{"key":"2023051709341013200_btaa982-B5","doi-asserted-by":"crossref","DOI":"10.1371\/journal.pcbi.1005929","article-title":"Representability of algebraic topology for biomolecules in machine learning based scoring and virtual screening","author":"Cang","year":"2018"},{"key":"2023051709341013200_btaa982-B6","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1371\/journal.pcbi.1005690","article-title":"TopologyNet: topology based deep convolutional and multi-task neural networks for biomolecular property predictions","volume":"13","author":"Cang","year":"2017","journal-title":"PLoS Comput. Biol"},{"key":"2023051709341013200_btaa982-B8","doi-asserted-by":"crossref","first-page":"1079","DOI":"10.1021\/ci9000053","article-title":"Comparative assessment of scoring functions on a diverse test set","volume":"49","author":"Cheng","year":"2009","journal-title":"J. Chem. Inf. Model"},{"key":"2023051709341013200_btaa982-B10","doi-asserted-by":"crossref","first-page":"337","DOI":"10.1021\/jm030331x","article-title":"Structural Interaction Fingerprint (SIFt): a novel method for analyzing three-dimensional protein\u2212ligand binding interactions","volume":"47","author":"Deng","year":"2004","journal-title":"J. Med. 
Chem"},{"key":"2023051709341013200_btaa982-B11","doi-asserted-by":"crossref","first-page":"2897","DOI":"10.1021\/ci2003889","article-title":"NNScore 2.0: a neural-network receptor\u2013ligand scoring function","volume":"51","author":"Durrant","year":"2011","journal-title":"J. Chem. Inf. Model"},{"key":"2023051709341013200_btaa982-B12","doi-asserted-by":"crossref","first-page":"6177","DOI":"10.1021\/jm051256o","article-title":"Extra precision glide: docking and scoring incorporating a model of hydrophobic enclosure for protein\u2212ligand complexes","volume":"49","author":"Friesner","year":"2006","journal-title":"J. Med. Chem"},{"key":"2023051709341013200_btaa982-B13","doi-asserted-by":"crossref","first-page":"1739","DOI":"10.1021\/jm0306430","article-title":"Glide: a new approach for rapid, accurate docking and scoring. 1. Method and assessment of docking accuracy","volume":"47","author":"Friesner","year":"2004","journal-title":"J. Med. Chem"},{"key":"2023051709341013200_btaa982-B14","doi-asserted-by":"crossref","first-page":"1750","DOI":"10.1021\/jm030644s","article-title":"Glide: a new approach for rapid, accurate docking and scoring. 2. Enrichment factors in database screening","volume":"47","author":"Halgren","year":"2004","journal-title":"J. Med. Chem"},{"key":"2023051709341013200_btaa982-B15","doi-asserted-by":"crossref","first-page":"2791","DOI":"10.1021\/acs.jcim.0c00075","article-title":"RosENet: improving binding affinity prediction by leveraging molecular mechanics energies with an ensemble of 3D convolutional neural networks","volume":"60","author":"Hassan-Harrirou","year":"2020","journal-title":"J. Chem. Inf. Model"},{"key":"2023051709341013200_btaa982-B16","doi-asserted-by":"crossref","first-page":"287","DOI":"10.1021\/acs.jcim.7b00650","article-title":"KDEEP: protein\u2013ligand absolute binding affinity prediction via 3D-convolutional neural networks","volume":"58","author":"Jim\u00e9nez","year":"2018","journal-title":"J. Chem. Inf. 
Model"},{"key":"2023051709341013200_btaa982-B17","doi-asserted-by":"crossref","first-page":"727","DOI":"10.1006\/jmbi.1996.0897","article-title":"Development and validation of a genetic algorithm for flexible docking","volume":"267","author":"Jones","year":"1997","journal-title":"J. Mol. Biol"},{"key":"2023051709341013200_btaa982-B8549621","doi-asserted-by":"crossref","first-page":"84","DOI":"10.1145\/3065386","article-title":"ImageNet classification with deep convolutional neural networks","volume":"60","author":"Krizhevsky","year":"2017","journal-title":"Communications of the ACM"},{"key":"2023051709341013200_btaa982-B18","doi-asserted-by":"crossref","first-page":"822","DOI":"10.1016\/j.bmc.2009.11.050","article-title":"Novel and selective DNA methyltransferase inhibitors: docking-based virtual screening and experimental evaluation","volume":"18","author":"Kuck","year":"2010","journal-title":"Bioorg. Med. Chem"},{"key":"2023051709341013200_btaa982-B19","doi-asserted-by":"crossref","first-page":"665","DOI":"10.1038\/s41592-020-0848-2","article-title":"Macromolecular modeling and design in Rosetta: recent methods and frameworks","volume":"17","author":"Leman","year":"2020","journal-title":"Nat. Methods"},{"key":"2023051709341013200_btaa982-B20","doi-asserted-by":"crossref","first-page":"592","DOI":"10.1021\/ci300493w","article-title":"ID-score: a new empirical scoring function based on a comprehensive set of descriptors related to protein\u2013ligand interactions","volume":"53","author":"Li","year":"2013","journal-title":"J. Chem. Inf. 
Model"},{"key":"2023051709341013200_btaa982-B21","doi-asserted-by":"crossref","first-page":"3989","DOI":"10.1093\/bioinformatics\/btz183","article-title":"Classical scoring functions for docking are unable to exploit large volumes of structural and interaction data","volume":"35","author":"Li","year":"2019","journal-title":"Bioinformatics"},{"key":"2023051709341013200_btaa982-B22","doi-asserted-by":"crossref","first-page":"10947","DOI":"10.3390\/molecules200610947","article-title":"Low-quality structural and interaction data improves binding affinity prediction via random forest","volume":"20","author":"Li","year":"2015","journal-title":"Molecules"},{"key":"2023051709341013200_btaa982-B23","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1002\/wcms.1465","article-title":"Machine-learning scoring functions for structure-based drug lead optimization","volume":"10","author":"Li","year":"2020","journal-title":"Wiley Interdiscip. Rev. Comput. Mol. Sci"},{"key":"2023051709341013200_btaa982-B24","doi-asserted-by":"crossref","first-page":"12","DOI":"10.3390\/biom8010012","article-title":"The impact of protein structure and sequence similarity on the accuracy of machine-learning scoring functions for binding affinity prediction","volume":"8","author":"Li","year":"2018","journal-title":"Biomolecules"},{"key":"2023051709341013200_btaa982-B25","doi-asserted-by":"crossref","first-page":"666","DOI":"10.1038\/nprot.2017.114","article-title":"Assessing protein\u2013ligand interaction scoring functions with the CASF-2013 benchmark","volume":"13","author":"Li","year":"2018","journal-title":"Nat. Protoc"},{"key":"2023051709341013200_btaa982-B26","doi-asserted-by":"crossref","first-page":"1700","DOI":"10.1021\/ci500080q","article-title":"Comparative assessment of scoring functions on an updated benchmark: 1. Compilation of the test set","volume":"54","author":"Li","year":"2014","journal-title":"J. Chem. Inf. 
Model"},{"key":"2023051709341013200_btaa982-B27","doi-asserted-by":"crossref","first-page":"1717","DOI":"10.1021\/ci500081m","article-title":"Comparative assessment of scoring functions on an updated benchmark: 2. Evaluation methods and general results","volume":"54","author":"Li","year":"2014","journal-title":"J. Chem. Inf. Model"},{"key":"2023051709341013200_btaa982-B28","doi-asserted-by":"crossref","first-page":"475","DOI":"10.1021\/ci500731a","article-title":"Classification of current scoring functions","volume":"55","author":"Liu","year":"2015","journal-title":"J. Chem. Inf. Model"},{"key":"2023051709341013200_btaa982-B29","doi-asserted-by":"crossref","first-page":"302","DOI":"10.1021\/acs.accounts.6b00491","article-title":"Forging the basis for developing protein\u2013ligand interaction scoring functions","volume":"50","author":"Liu","year":"2017","journal-title":"Acc. Chem. Res"},{"key":"2023051709341013200_btaa982-B30","doi-asserted-by":"crossref","first-page":"405","DOI":"10.1093\/bioinformatics\/btu626","article-title":"PDB-wide collection of binding data: current status of the PDBbind database","volume":"31","author":"Liu","year":"2015","journal-title":"Bioinformatics"},{"key":"2023051709341013200_btaa982-B31","doi-asserted-by":"crossref","first-page":"4540","DOI":"10.1021\/acs.jcim.9b00645","article-title":"Incorporating explicit water molecules and ligand conformation stability in machine-learning scoring functions","volume":"59","author":"Lu","year":"2019","journal-title":"J. Chem. Inf. 
Model"},{"key":"2023051709341013200_btaa982-B32","doi-asserted-by":"crossref","first-page":"224","DOI":"10.1038\/s41586-019-0917-9","article-title":"Ultra-large library docking for discovering new chemotypes","volume":"566","author":"Lyu","year":"2019","journal-title":"Nature"},{"key":"2023051709341013200_btaa982-B33","doi-asserted-by":"crossref","first-page":"1715","DOI":"10.1021\/acs.jcim.7b00226","article-title":"Rigidity strengthening: a mechanism for protein\u2013ligand binding","volume":"57","author":"Nguyen","year":"2017","journal-title":"J. Chem. Inf. Model"},{"key":"2023051709341013200_btaa982-B34","doi-asserted-by":"crossref","first-page":"3291","DOI":"10.1021\/acs.jcim.9b00334","article-title":"AGL-score: algebraic graph learning score for protein\u2013ligand binding scoring, ranking, docking, and screening","volume":"59","author":"Nguyen","year":"2019","journal-title":"J. Chem. Inf. Model"},{"key":"2023051709341013200_btaa982-B35","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1002\/cnm.3179","article-title":"DG-GL: differential geometry-based geometric learning of molecular datasets","volume":"35","author":"Nguyen","year":"2019","journal-title":"Int. J. Numer. Method Biomed. Eng"},{"key":"2023051709341013200_btaa982-B36","first-page":"2825","article-title":"Scikit-learn: machine learning in {P}ython","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J. Mach. Learn. Res"},{"key":"2023051709341013200_btaa982-B37","doi-asserted-by":"crossref","first-page":"742","DOI":"10.1021\/ci100050t","article-title":"Extended-connectivity fingerprints","volume":"50","author":"Rogers","year":"2010","journal-title":"J. Chem. Inf. 
Model"},{"key":"2023051709341013200_btaa982-B38","doi-asserted-by":"crossref","first-page":"e1003571","DOI":"10.1371\/journal.pcbi.1003571","article-title":"rDock: a fast, versatile and open source program for docking ligands to proteins and nucleic acids","volume":"10","author":"Ruiz-Carmona","year":"2014","journal-title":"PLoS Comput. Biol"},{"key":"2023051709341013200_btaa982-B39","doi-asserted-by":"crossref","first-page":"3666","DOI":"10.1093\/bioinformatics\/bty374","article-title":"Development and evaluation of a deep learning model for protein\u2013ligand binding affinity prediction","volume":"34","author":"Stepniewska-Dziubinska","year":"2018","journal-title":"Bioinformatics"},{"key":"2023051709341013200_btaa982-B40","doi-asserted-by":"crossref","first-page":"895","DOI":"10.1021\/acs.jcim.8b00545","article-title":"Comparative assessment of scoring functions: the CASF-2016 update","volume":"59","author":"Su","year":"2019","journal-title":"J. Chem. Inf. Model"},{"key":"2023051709341013200_btaa982-B41","doi-asserted-by":"crossref","first-page":"455","DOI":"10.1002\/jcc.21334","article-title":"AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading","volume":"31","author":"Trott","year":"2009","journal-title":"J. Comput. Chem"},{"key":"2023051709341013200_btaa982-B42","doi-asserted-by":"crossref","first-page":"169","DOI":"10.1002\/jcc.24667","article-title":"Improving scoring-docking-screening powers of protein\u2013ligand scoring functions using random forest","volume":"38","author":"Wang","year":"2017","journal-title":"J. Comput. 
Chem"},{"key":"2023051709341013200_btaa982-B43","doi-asserted-by":"crossref","first-page":"1334","DOI":"10.1093\/bioinformatics\/bty757","article-title":"Development of a protein\u2013ligand extended connectivity (PLEC) fingerprint and its application for binding affinity predictions","volume":"35","author":"W\u00f3jcikowski","year":"2019","journal-title":"Bioinformatics"},{"key":"2023051709341013200_btaa982-B44","doi-asserted-by":"crossref","first-page":"15956","DOI":"10.1021\/acsomega.9b01997","article-title":"OnionNet: a multiple-layer intermolecular-contact-based convolutional neural network for protein\u2013ligand binding affinity prediction","volume":"4","author":"Zheng","year":"2019","journal-title":"ACS Omega"},{"key":"2023051709341013200_btaa982-B45","doi-asserted-by":"crossref","first-page":"1923","DOI":"10.1021\/ci400120b","article-title":"SFCscoreRF: a random forest-based scoring function for improved affinity prediction of protein\u2013ligand complexes","volume":"53","author":"Zilian","year":"2013","journal-title":"J. Chem. Inf. 
Model"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaa982\/34774458\/btaa982.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/10\/1376\/50360818\/btaa982.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/10\/1376\/50360818\/btaa982.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,10,26]],"date-time":"2023-10-26T19:14:37Z","timestamp":1698347677000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/37\/10\/1376\/5998664"}},"subtitle":[],"editor":[{"given":"Arne","family":"Elofsson","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2020,12,7]]},"references-count":44,"journal-issue":{"issue":"10","published-print":{"date-parts":[[2021,6,16]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaa982","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2021,5,15]]},"published":{"date-parts":[[2020,12,7]]}}}