{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,11]],"date-time":"2026-03-11T11:55:10Z","timestamp":1773230110863,"version":"3.50.1"},"reference-count":37,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2026,3,10]],"date-time":"2026-03-10T00:00:00Z","timestamp":1773100800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Computers"],"abstract":"<jats:p>An accurate prediction of how strongly a drug binds to its target (where the drug will have the desired effect) is very important for drug discovery. It helps select the most promising compounds and saves money by doing fewer experiments. We present DTBAffinity, a multi-modal regression framework that integrates chemically meaningful ligand descriptors with diverse protein sequence features in a unified gradient-boosting model. The representation of ligands includes physicochemical and topological descriptors (RDKit and Mordred), structural keys (MACCS and FP4), circular fingerprints (ECFP\/Morgan), and SMILES-derived features from iFeatureOmega. For proteins, thousands of sequence-derived descriptors (composition, autocorrelations, physicochemical profiles, and evolutionary indices) from iFeatureOmega are used, together with contextual embeddings from large protein language models (ESM-1b, ESM-2). The feature matrices are cleaned up, variance filtered, z-score scaled, and univariate selected before being concatenated and modeled with regularized XGBoost ensembles. We evaluate DTBAffinity on two kinase-centric datasets that are commonly used: Davis (30,056 interactions: pKd values) and KIBA (118,254 interactions: integrated affinity scores). Various metrics are used to measure the performance, such as MSE, R2, Pearson\/Spearman correlations, Concordance Index (CI), rm2, and AUPR. 
On Davis, DTBAffinity yields MSE = 0.1885, CI = 0.9102, and AUPR = 0.8112, and on KIBA, it gives MSE = 0.1540, CI = 0.8686, and AUPR = 0.8361; thus, it is better than the state-of-the-art baselines such as KronRLS, SimBoost, DeepDTA, and GraphDTA. The findings here imply that the combination of interpretable descriptors and contextual embeddings in a robust boosting framework is a great way to realize accurate, interpretable, and generalizable DTBA prediction.<\/jats:p>","DOI":"10.3390\/computers15030182","type":"journal-article","created":{"date-parts":[[2026,3,10]],"date-time":"2026-03-10T14:45:39Z","timestamp":1773153939000},"page":"182","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["DTBAffinity: A Multi-Modal Feature Engineering and Gradient-Boosting Framework for Drug\u2013Target Binding Affinity on Davis and KIBA Benchmarks"],"prefix":"10.3390","volume":"15","author":[{"given":"Meshari","family":"Alazmi","sequence":"first","affiliation":[{"name":"College of Computer Science and Engineering, University of Ha\u2019il, Ha\u2019il 81411, Saudi Arabia"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2026,3,10]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"844","DOI":"10.1001\/jama.2020.1166","article-title":"Estimated research and development investment needed to bring a new medicine to market","volume":"323","author":"Wouters","year":"2020","journal-title":"JAMA"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"20","DOI":"10.1016\/j.jhealeco.2016.01.012","article-title":"Innovation in the pharmaceutical industry: New estimates of R&D costs","volume":"47","author":"DiMasi","year":"2016","journal-title":"J. 
Health Econ."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"1569","DOI":"10.1001\/jamainternmed.2017.3601","article-title":"Research and development spending to bring a single cancer drug to market","volume":"177","author":"Prasad","year":"2017","journal-title":"JAMA Intern. Med."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"1046","DOI":"10.1038\/nbt.1990","article-title":"Comprehensive analysis of kinase inhibitor selectivity","volume":"29","author":"Davis","year":"2011","journal-title":"Nat. Biotechnol."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"735","DOI":"10.1021\/ci400709d","article-title":"Making sense of large-scale kinase inhibitor bioactivity data sets","volume":"54","author":"Tang","year":"2014","journal-title":"J. Chem. Inf. Model."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"W623","DOI":"10.1093\/nar\/gkp456","article-title":"PubChem: A public information system for analyzing bioactivities of small molecules","volume":"37","author":"Wang","year":"2009","journal-title":"Nucleic Acids Res."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"The UniProt Consortium (2025). UniProt: The Universal Protein Knowledgebase in 2025. Nucleic Acids Res., 53, D609\u2013D619.","DOI":"10.1093\/nar\/gkae1010"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"325","DOI":"10.1093\/bib\/bbu010","article-title":"Toward more realistic drug\u2013target interaction predictions with lasso-regularized kernel ridge regression (KronRLS)","volume":"16","author":"Pahikkala","year":"2015","journal-title":"Brief. Bioinform."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"24","DOI":"10.1186\/s13321-017-0209-z","article-title":"SimBoost: A read-across approach for predicting drug\u2013target binding affinities using gradient boosting machines","volume":"9","author":"He","year":"2017","journal-title":"J. 
Cheminform."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"i821","DOI":"10.1093\/bioinformatics\/bty593","article-title":"DeepDTA: Deep drug\u2013target binding affinity prediction","volume":"34","author":"Ozkirimli","year":"2018","journal-title":"Bioinformatics"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"1140","DOI":"10.1093\/bioinformatics\/btaa921","article-title":"GraphDTA: Predicting drug\u2013target binding affinity with graph neural networks","volume":"37","author":"Nguyen","year":"2021","journal-title":"Bioinformatics"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"830","DOI":"10.1093\/bioinformatics\/btaa880","article-title":"MolTrans: Molecular interaction transformer for drug\u2013target interaction prediction","volume":"37","author":"Huang","year":"2021","journal-title":"Bioinformatics"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"4406","DOI":"10.1093\/bioinformatics\/btaa524","article-title":"TransformerCPI: Improving compound\u2013protein interaction prediction by sequence-based deep learning with self-attention mechanism and label reversal experiments","volume":"36","author":"Chen","year":"2021","journal-title":"Bioinformatics"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"e2016239118","DOI":"10.1073\/pnas.2016239118","article-title":"Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences","volume":"118","author":"Rives","year":"2021","journal-title":"Proc. Natl. Acad. Sci. 
USA"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"1123","DOI":"10.1126\/science.ade2574","article-title":"Evolutionary-scale prediction of atomic-level protein structure with a language model","volume":"379","author":"Lin","year":"2023","journal-title":"Science"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"67","DOI":"10.1038\/s41746-025-01464-x","article-title":"Dual modality feature fused neural network integrating binding site information for drug target affinity prediction","volume":"8","author":"He","year":"2025","journal-title":"NPJ Digit. Med."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"5021","DOI":"10.1038\/s41467-025-59917-6","article-title":"DeepDTAGen: A multitask deep learning framework for drug-target affinity prediction and target-aware drugs generation","volume":"16","author":"Shah","year":"2025","journal-title":"Nat. Commun."},{"key":"ref_18","unstructured":"(2024, March 12). RDKit: Open-Source Cheminformatics. Available online: https:\/\/www.rdkit.org\/."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"4","DOI":"10.1186\/s13321-018-0258-y","article-title":"Mordred: A molecular descriptor calculator","volume":"10","author":"Moriwaki","year":"2018","journal-title":"J. Cheminform."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"33","DOI":"10.1186\/1758-2946-3-33","article-title":"Open Babel: An open chemical toolbox","volume":"3","author":"Banck","year":"2011","journal-title":"J. Cheminform."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"1273","DOI":"10.1021\/ci010132r","article-title":"Reoptimization of MDL keys for use in drug discovery","volume":"42","author":"Durant","year":"2002","journal-title":"J. Chem. Inf. Comput. Sci."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"742","DOI":"10.1021\/ci100050t","article-title":"Extended-connectivity fingerprints","volume":"50","author":"Rogers","year":"2010","journal-title":"J. Chem. Inf. 
Model."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"31","DOI":"10.1021\/ci00057a005","article-title":"SMILES, a chemical language and information system","volume":"28","author":"Weininger","year":"1988","journal-title":"J. Chem. Inf. Comput. Sci."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"W434","DOI":"10.1093\/nar\/gkac351","article-title":"iFeatureOmega: An integrative platform for feature engineering, visualization and analysis of molecular data","volume":"50","author":"Chen","year":"2022","journal-title":"Nucleic Acids Res."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"10915","DOI":"10.1073\/pnas.89.22.10915","article-title":"Amino acid substitution matrices from protein blocks","volume":"89","author":"Henikoff","year":"1992","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"374","DOI":"10.1093\/nar\/28.1.374","article-title":"AAindex: Amino acid index database","volume":"28","author":"Kawashima","year":"2000","journal-title":"Nucleic Acids Res."},{"key":"ref_27","first-page":"1","article-title":"Prediction of protein subcellular locations by pseudo amino acid composition","volume":"214","author":"Chou","year":"2001","journal-title":"J. Theor. Biol."},{"key":"ref_28","first-page":"2825","article-title":"Scikit-learn: Machine learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J. Mach. Learn. Res."},{"key":"ref_29","first-page":"1157","article-title":"An introduction to variable and feature selection","volume":"3","author":"Guyon","year":"2003","journal-title":"J. Mach. Learn. Res."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Chen, T., and Guestrin, C. (2016, January 13\u201317). XGBoost: A scalable tree boosting system. 
Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA.","DOI":"10.1145\/2939672.2939785"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"361","DOI":"10.1002\/(SICI)1097-0258(19960229)15:4<361::AID-SIM168>3.0.CO;2-4","article-title":"Multivariable prognostic models: Issues and methods","volume":"15","author":"Harrell","year":"1996","journal-title":"Stat. Med."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"1071","DOI":"10.1002\/jcc.23231","article-title":"Some case studies on application of \u201crm2\u201d metrics for judging quality of quantitative structure\u2013activity relationship predictions: Emphasis on scaling of response data","volume":"34","author":"Roy","year":"2013","journal-title":"J. Comput. Chem."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Davis, J., and Goadrich, M. (2006, January 25\u201329). The relationship between Precision\u2013Recall and ROC curves. Proceedings of the 23rd International Conference on Machine Learning (ICML), Pittsburgh, PA, USA.","DOI":"10.1145\/1143844.1143874"},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"861","DOI":"10.1016\/j.patrec.2005.10.010","article-title":"An introduction to ROC analysis","volume":"27","author":"Fawcett","year":"2006","journal-title":"Pattern Recognit. Lett."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Zhao, L., Wang, J., Pang, L., Liu, Y., and Zhang, J. (2020). GANsDTA: Predicting drug-target binding affinity using GANs. Front. Genet., 10.","DOI":"10.3389\/fgene.2019.01243"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"4416","DOI":"10.1038\/s41598-021-83679-y","article-title":"Prediction of drug\u2013target binding affinity using similarity-based convolutional neural network","volume":"11","author":"Shim","year":"2021","journal-title":"Sci. 
Rep."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"573","DOI":"10.1007\/s10822-023-00533-1","article-title":"TeM-DTBA: Time-efficient drug target binding affinity prediction using multiple modalities with Lasso feature selection","volume":"37","author":"Liyaqat","year":"2023","journal-title":"J. Comput. Aided Mol. Des."}],"container-title":["Computers"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2073-431X\/15\/3\/182\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,10]],"date-time":"2026-03-10T14:53:06Z","timestamp":1773154386000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2073-431X\/15\/3\/182"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,3,10]]},"references-count":37,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2026,3]]}},"alternative-id":["computers15030182"],"URL":"https:\/\/doi.org\/10.3390\/computers15030182","relation":{},"ISSN":["2073-431X"],"issn-type":[{"value":"2073-431X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,3,10]]}}}