{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,29]],"date-time":"2026-05-29T16:02:57Z","timestamp":1780070577825,"version":"3.54.0"},"reference-count":38,"publisher":"Oxford University Press (OUP)","issue":"Supplement_2","license":[{"start":{"date-parts":[[2022,9,1]],"date-time":"2022-09-01T00:00:00Z","timestamp":1661990400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"name":"Pakistan HEC","award":["ECCB2022"],"award-info":[{"award-number":["ECCB2022"]}]},{"name":"Pakistan HEC","award":["NRPU 6085"],"award-info":[{"award-number":["NRPU 6085"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,9,16]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Motivation<\/jats:title><jats:p>Machine-learning-based prediction of compound\u2013protein interactions (CPIs) is important for drug design, screening and repurposing. Despite numerous recent publication with increasing methodological sophistication claiming consistent improvements in predictive accuracy, we have observed a number of fundamental issues in experiment design that produce overoptimistic estimates of model performance.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>We systematically analyze the impact of several factors affecting generalization performance of CPI predictors that are overlooked in existing work: (i) similarity between training and test examples in cross-validation; (ii) synthesizing negative examples in absence of experimentally verified negative examples and (iii) alignment of evaluation protocol and performance metrics with real-world use of CPI predictors in screening large compound libraries. Using both state-of-the-art approaches by other researchers as well as a simple kernel-based baseline, we have found that effective assessment of generalization performance of CPI predictors requires careful control over similarity between training and test examples. We show that, under stringent performance assessment protocols, a simple kernel-based approach can exceed the predictive performance of existing state-of-the-art methods. We also show that random pairing for generating synthetic negative examples for training and performance evaluation results in models with better generalization in comparison to more sophisticated strategies used in existing studies. Our analyses indicate that using proposed experiment design strategies can offer significant improvements for CPI prediction leading to effective target compound screening for drug repurposing and discovery of putative chemical ligands of SARS-CoV-2-Spike and Human-ACE2 proteins.<\/jats:p><\/jats:sec><jats:sec><jats:title>Availability and implementation<\/jats:title><jats:p>Code and supplementary material available at https:\/\/github.com\/adibayaseen\/HKRCPI.<\/jats:p><\/jats:sec><jats:sec><jats:title>Supplementary information<\/jats:title><jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p><\/jats:sec>","DOI":"10.1093\/bioinformatics\/btac496","type":"journal-article","created":{"date-parts":[[2022,9,20]],"date-time":"2022-09-20T09:20:11Z","timestamp":1663665611000},"page":"ii75-ii81","source":"Crossref","is-referenced-by-count":10,"title":["Insights into performance evaluation of compound\u2013protein interaction prediction methods"],"prefix":"10.1093","volume":"38","author":[{"given":"Adiba","family":"Yaseen","sequence":"first","affiliation":[{"name":"Department of Computer and Information Sciences (DCIS), Pakistan Institute of Engineering and Applied Sciences (PIEAS) , Islamabad 45650, Pakistan"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Imran","family":"Amin","sequence":"additional","affiliation":[{"name":"National Institute for Biotechnology and Genetic Engineering , Faisalabad 38000, Pakistan"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Naeem","family":"Akhter","sequence":"additional","affiliation":[{"name":"Department of Computer and Information Sciences (DCIS), Pakistan Institute of Engineering and Applied Sciences (PIEAS) , Islamabad 45650, Pakistan"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Asa","family":"Ben-Hur","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Colorado State University , Fort Collins, CO 80523, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9129-1189","authenticated-orcid":false,"given":"Fayyaz","family":"Minhas","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Warwick , Coventry CV4 7AL, UK"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"286","published-online":{"date-parts":[[2022,9,18]]},"reference":[{"key":"2023041408003634700_","doi-asserted-by":"crossref","first-page":"S2","DOI":"10.1186\/1471-2105-7-S1-S2","article-title":"Choosing negative examples for the prediction of protein\u2013protein interactions","volume":"7","author":"Ben-Hur","year":"2006","journal-title":"BMC Bioinformatics"},{"key":"2023041408003634700_","doi-asserted-by":"crossref","first-page":"2397","DOI":"10.1093\/bioinformatics\/btp433","article-title":"Supervised prediction of drug\u2013target interactions using bipartite local models","volume":"25","author":"Bleakley","year":"2009","journal-title":"Bioinformatics"},{"key":"2023041408003634700_","doi-asserted-by":"crossref","first-page":"262","DOI":"10.1038\/nrg1317","article-title":"Chemogenomics: an emerging strategy for rapid target and drug discovery","volume":"5","author":"Bredel","year":"2004","journal-title":"Nat. Rev. Genet"},{"key":"2023041408003634700_","first-page":"14","article-title":"High-throughput screening for drug discovery","volume":"384","author":"Broach","year":"1996","journal-title":"Nature"},{"key":"2023041408003634700_","doi-asserted-by":"crossref","first-page":"1092","DOI":"10.1093\/bioinformatics\/btt105","article-title":"ChemoPy: freely available python package for computational biology and chemoinformatics","volume":"29","author":"Cao","year":"2013","journal-title":"Bioinformatics"},{"key":"2023041408003634700_","doi-asserted-by":"crossref","first-page":"e0220113","DOI":"10.1371\/journal.pone.0220113","article-title":"Hidden bias in the DUD-E dataset leads to misleading performance of deep learning in structure-based virtual screening","volume":"14","author":"Chen","year":"2019","journal-title":"PLoS ONE"},{"key":"2023041408003634700_","first-page":"4406","article-title":"TransformerCPI: improving compound\u2013protein interaction prediction by sequence-based deep learning with self-attention mechanism and label reversal experiments","volume":"36","author":"Chen","year":"2020","journal-title":"Bioinformatics (Oxford, England)"},{"key":"2023041408003634700_","doi-asserted-by":"crossref","first-page":"2208","DOI":"10.3390\/molecules23092208","article-title":"Machine learning for drug\u2013target interaction prediction","volume":"23","author":"Chen","year":"2018","journal-title":"Molecules"},{"key":"2023041408003634700_","doi-asserted-by":"crossref","first-page":"696","DOI":"10.1093\/bib\/bbv066","article-title":"Drug\u2013target interaction prediction: databases, web servers and computational models","volume":"17","author":"Chen","year":"2016","journal-title":"Brief. Bioinform"},{"key":"2023041408003634700_","doi-asserted-by":"crossref","first-page":"734","DOI":"10.1093\/bib\/bbt056","article-title":"Similarity-based machine learning methods for predicting drug\u2013target interactions: a brief review","volume":"15","author":"Ding","year":"2014","journal-title":"Brief. Bioinform"},{"key":"2023041408003634700_","doi-asserted-by":"crossref","first-page":"D1045","DOI":"10.1093\/nar\/gkv1072","article-title":"BindingDB in 2015: a public database for medicinal chemistry, computational chemistry and systems pharmacology","volume":"44","author":"Gilson","year":"2016","journal-title":"Nucleic Acids Res"},{"key":"2023041408003634700_","doi-asserted-by":"crossref","first-page":"2304","DOI":"10.1093\/bioinformatics\/bts360","article-title":"Predicting drug\u2013target interactions from chemical and genomic kernels using bayesian matrix factorization","volume":"28","author":"G\u00f6nen","year":"2012","journal-title":"Bioinformatics"},{"key":"2023041408003634700_","doi-asserted-by":"crossref","first-page":"D919","DOI":"10.1093\/nar\/gkm862","article-title":"SuperTarget and matador: resources for exploring drug\u2013target relationships","volume":"36","author":"G\u00fcnther","year":"2008","journal-title":"Nucleic Acids Res"},{"key":"2023041408003634700_","doi-asserted-by":"crossref","first-page":"i802","DOI":"10.1093\/bioinformatics\/bty573","article-title":"Predicting protein\u2013protein interactions through sequence-based deep learning","volume":"34","author":"Hashemifar","year":"2018","journal-title":"Bioinformatics"},{"key":"2023041408003634700_","first-page":"680","article-title":"CD-HIT suite: a web server for clustering and comparing biological sequences","volume":"26","author":"Huang","year":"2010","journal-title":"Bioinformatics (Oxford, England)"},{"key":"2023041408003634700_","first-page":"2149","article-title":"Protein\u2013ligand interaction prediction: an improved chemogenomics approach","volume":"24","author":"Jacob","year":"2008","journal-title":"Bioinformatics (Oxford, England)"},{"key":"2023041408003634700_","doi-asserted-by":"crossref","first-page":"1193","DOI":"10.1007\/s12272-016-0791-z","article-title":"Target identification for biologically active small molecules using chemical biology approaches","volume":"39","author":"Lee","year":"2016","journal-title":"Arch. Pharm. Res"},{"key":"2023041408003634700_","doi-asserted-by":"crossref","first-page":"1541","DOI":"10.1016\/j.csbj.2021.03.004","article-title":"A review on compound\u2013protein interaction prediction methods: data, format, representation and model","volume":"19","author":"Lim","year":"2021","journal-title":"Comput. Struct. Biotechnol. J"},{"key":"2023041408003634700_","doi-asserted-by":"crossref","first-page":"i221","DOI":"10.1093\/bioinformatics\/btv256","article-title":"Improving compound\u2013protein interaction prediction by building up highly credible negative samples","volume":"31","author":"Liu","year":"2015","journal-title":"Bioinformatics"},{"key":"2023041408003634700_","doi-asserted-by":"crossref","first-page":"1768","DOI":"10.2174\/1568026615666150506151533","article-title":"Application of SMILES notation based optimal descriptors in drug discovery and design","volume":"15","author":"Veselinovic","year":"2015","journal-title":"Curr. Top. Med. Chem"},{"key":"2023041408003634700_","first-page":"1141","article-title":"Large-scale data-driven integrative framework for extracting essential targets and processes from disease-associated gene data sets","volume":"19","author":"Mazandu","year":"2018","journal-title":"Brief. Bioinform"},{"key":"2023041408003634700_","doi-asserted-by":"crossref","first-page":"1142","DOI":"10.1002\/prot.24479","article-title":"PAIRpred: partner-specific prediction of interacting residues from sequence and structure","volume":"82","author":"Minhas","year":"2014","journal-title":"Proteins"},{"key":"2023041408003634700_","doi-asserted-by":"crossref","first-page":"6582","DOI":"10.1021\/jm300687e","article-title":"Directory of useful decoys, enhanced (DUD-E): better ligands and decoys for better benchmarking","volume":"55","author":"Mysinger","year":"2012","journal-title":"J. Med. Chem"},{"key":"2023041408003634700_","doi-asserted-by":"crossref","first-page":"1140","DOI":"10.1093\/bioinformatics\/btaa921","article-title":"GraphDTA: predicting drug\u2013target binding affinity with graph neural networks","volume":"37","author":"Nguyen","year":"2021","journal-title":"Bioinformatics"},{"key":"2023041408003634700_","doi-asserted-by":"crossref","first-page":"i821","DOI":"10.1093\/bioinformatics\/bty593","article-title":"DeepDTA: deep drug\u2013target binding affinity prediction","volume":"34","author":"\u00d6zt\u00fcrk","year":"2018","journal-title":"Bioinformatics"},{"key":"2023041408003634700_","article-title":"WideDTA: prediction of drug\u2013target binding affinity","author":"\u00d6zt\u00fcrk","year":"2019","journal-title":"ArXiv"},{"key":"2023041408003634700_","doi-asserted-by":"crossref","first-page":"27","DOI":"10.1038\/d41586-019-02307-y","article-title":"Three pitfalls to avoid in machine learning","volume":"572","author":"Riley","year":"2019","journal-title":"Nature"},{"key":"2023041408003634700_","doi-asserted-by":"crossref","first-page":"169","DOI":"10.1021\/ci8002649","article-title":"Maximum unbiased validation (MUV) data sets for virtual screening based on PubChem bioactivity data","volume":"49","author":"Rohrer","year":"2009","journal-title":"J. Chem. Inform. Model"},{"key":"2023041408003634700_","doi-asserted-by":"crossref","first-page":"82","DOI":"10.1016\/j.drudis.2015.08.001","article-title":"Identifying compound efficacy targets in phenotypic drug discovery","volume":"21","author":"Schirle","year":"2016","journal-title":"Drug Discov. Today"},{"key":"2023041408003634700_","doi-asserted-by":"crossref","first-page":"947","DOI":"10.1021\/acs.jcim.8b00712","article-title":"In need of bias control: evaluating chemical data for machine learning in structure-based virtual screening","volume":"59","author":"Sieg","year":"2019","journal-title":"J. Chem. Inform. Model"},{"key":"2023041408003634700_","doi-asserted-by":"crossref","first-page":"D1137","DOI":"10.1093\/nar\/gkx1088","article-title":"SuperDRUG2: a one stop resource for approved\/marketed drugs","volume":"46","author":"Siramshetty","year":"2018","journal-title":"Nucleic Acids Res"},{"key":"2023041408003634700_","doi-asserted-by":"crossref","first-page":"782","DOI":"10.3389\/fchem.2019.00782","article-title":"Comparison study of computational prediction tools for drug\u2013target binding affinities","volume":"7","author":"Thafar","year":"2019","journal-title":"Front. Chem"},{"key":"2023041408003634700_","first-page":"309","article-title":"Compound\u2013protein interaction prediction with end-to-end learning of neural networks for graphs and sequences","volume":"35","author":"Tsubaki","year":"2019","journal-title":"Bioinformatics (Oxford, England)"},{"key":"2023041408003634700_","first-page":"1132","author":"Wang","year":"2020"},{"key":"2023041408003634700_","doi-asserted-by":"crossref","first-page":"D901","DOI":"10.1093\/nar\/gkm958","article-title":"DrugBank: a knowledgebase for drugs, drug actions and drug targets","volume":"36","author":"Wishart","year":"2008","journal-title":"Nucleic Acids Res"},{"key":"2023041408003634700_","doi-asserted-by":"crossref","first-page":"403","DOI":"10.1021\/acs.jcim.6b00491","article-title":"Computational multitarget drug design","volume":"57","author":"Zhang","year":"2017","journal-title":"J. Chem. Inform. Model"},{"key":"2023041408003634700_","doi-asserted-by":"crossref","first-page":"690049","DOI":"10.3389\/fgene.2021.690049","article-title":"Graph neural networks and their current applications in bioinformatics","volume":"12","author":"Zhang","year":"2021","journal-title":"Front. Genet"},{"key":"2023041408003634700_","doi-asserted-by":"crossref","first-page":"2141","DOI":"10.1093\/bib\/bbaa044","article-title":"Identifying drug\u2013target interactions based on graph convolutional network and deep neural network","volume":"22","author":"Zhao","year":"2021","journal-title":"Brief. Bioinform"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/Supplement_2\/ii75\/49886471\/btac496.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/Supplement_2\/ii75\/49886471\/btac496.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,26]],"date-time":"2023-11-26T04:30:10Z","timestamp":1700973010000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/38\/Supplement_2\/ii75\/6702015"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,9,1]]},"references-count":38,"journal-issue":{"issue":"Supplement_2","published-print":{"date-parts":[[2022,9,16]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btac496","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2022,9,1]]},"published":{"date-parts":[[2022,9,1]]}}}