{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,12]],"date-time":"2025-12-12T13:45:46Z","timestamp":1765547146255,"version":"3.41.2"},"reference-count":36,"publisher":"Oxford University Press (OUP)","issue":"11","license":[{"start":{"date-parts":[[2023,10,9]],"date-time":"2023-10-09T00:00:00Z","timestamp":1696809600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62101094","62131004","62250028"],"award-info":[{"award-number":["62101094","62131004","62250028"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Municipal Government of Quzhou","award":["2022D040"],"award-info":[{"award-number":["2022D040"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023,11,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Motivation<\/jats:title><jats:p>Numerous high-accuracy drug\u2013target affinity (DTA) prediction models, whose performance is heavily reliant on the drug and target feature information, are developed at the expense of complexity and interpretability. Feature extraction and optimization constitute a critical step that significantly influences the enhancement of model performance, robustness, and interpretability. Many existing studies aim to comprehensively characterize drugs and targets by extracting features from multiple perspectives; however, this approach has drawbacks: (i) an abundance of redundant or noisy features; and (ii) the feature sets often suffer from high dimensionality.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>In this study, to obtain a model with high accuracy and strong interpretability, we utilize various traditional and cutting-edge feature selection and dimensionality reduction techniques to process self-associated features and adjacent associated features. These optimized features are then fed into learning to rank to achieve efficient DTA prediction. Extensive experimental results on two commonly used datasets indicate that, among various feature optimization methods, the regression tree-based feature selection method is most beneficial for constructing models with good performance and strong robustness. Then, by utilizing Shapley Additive Explanations values and the incremental feature selection approach, we obtain that the high-quality feature subset consists of the top 150D features and the top 20D features have a breakthrough impact on the DTA prediction. In conclusion, our study thoroughly validates the importance of feature optimization in DTA prediction and serves as inspiration for constructing high-performance and high-interpretable models.<\/jats:p><\/jats:sec><jats:sec><jats:title>Availability and implementation<\/jats:title><jats:p>https:\/\/github.com\/RUXIAOQING964914140\/FS_DTA.<\/jats:p><\/jats:sec>","DOI":"10.1093\/bioinformatics\/btad615","type":"journal-article","created":{"date-parts":[[2023,10,9]],"date-time":"2023-10-09T15:05:59Z","timestamp":1696863959000},"source":"Crossref","is-referenced-by-count":14,"title":["Optimization of drug\u2013target affinity prediction methods through feature processing schemes"],"prefix":"10.1093","volume":"39","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2968-6435","authenticated-orcid":false,"given":"Xiaoqing","family":"Ru","sequence":"first","affiliation":[{"name":"Department of Computer Science, University of Tsukuba , Tsukuba, Japan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6406-1142","authenticated-orcid":false,"given":"Quan","family":"Zou","sequence":"additional","affiliation":[{"name":"Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China , Chengdu, China"},{"name":"Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China , Quzhou, Zhejiang, China"}]},{"given":"Chen","family":"Lin","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Technology, School of Informatics, Xiamen University , Xiamen, Fujian, 361005, China"}]}],"member":"286","published-online":{"date-parts":[[2023,10,9]]},"reference":[{"key":"2023111005484295400_btad615-B1","first-page":"6","article-title":"Comparison between XGBoost, LightGBM and CatBoost using a home credit dataset","volume":"13","author":"Al Daoud","year":"2019","journal-title":"Int J Comput Inf Eng"},{"key":"2023111005484295400_btad615-B2","doi-asserted-by":"crossref","first-page":"bbab434","DOI":"10.1093\/bib\/bbab434","article-title":"MathFeature: feature extraction package for DNA, RNA and protein sequences based on mathematical descriptors","volume":"23","author":"Bonidia","year":"2022","journal-title":"Brief Bioinform"},{"first-page":"89","year":"2005","author":"Burges","key":"2023111005484295400_btad615-B3"},{"key":"2023111005484295400_btad615-B4","doi-asserted-by":"crossref","first-page":"151","DOI":"10.1016\/j.eswa.2016.12.008","article-title":"Automatic selection of molecular descriptors using random Forest: application to drug discovery","volume":"72","author":"Cano","year":"2017","journal-title":"Expert Syst Appl"},{"first-page":"129","year":"2007","author":"Cao","key":"2023111005484295400_btad615-B5"},{"key":"2023111005484295400_btad615-B6","doi-asserted-by":"crossref","first-page":"2208","DOI":"10.3390\/molecules23092208","article-title":"Machine learning for drug-target interaction prediction","volume":"23","author":"Chen","year":"2018","journal-title":"Molecules"},{"first-page":"785","year":"2016","author":"Chen","key":"2023111005484295400_btad615-B7"},{"key":"2023111005484295400_btad615-B8","doi-asserted-by":"crossref","first-page":"696","DOI":"10.1093\/bib\/bbv066","article-title":"Drug\u2013target interaction prediction: databases, web servers and computational models","volume":"17","author":"Chen","year":"2016","journal-title":"Brief Bioinform"},{"key":"2023111005484295400_btad615-B9","doi-asserted-by":"crossref","first-page":"38","DOI":"10.1186\/s13321-023-00702-2","article-title":"Deep generative model for drug design from protein target sequence","volume":"15","author":"Chen","year":"2023","journal-title":"J Cheminform"},{"key":"2023111005484295400_btad615-B10","doi-asserted-by":"crossref","first-page":"392","DOI":"10.1093\/bfgp\/elad012","article-title":"Molecular language models: RNNs or transformer?","volume":"22","author":"Chen","year":"2023","journal-title":"Brief Funct Genomics"},{"first-page":"46","year":"2021","author":"Fida","key":"2023111005484295400_btad615-B11"},{"key":"2023111005484295400_btad615-B12","doi-asserted-by":"crossref","first-page":"965","DOI":"10.1093\/biomet\/92.4.965","article-title":"Concordance probability and discriminatory power in proportional hazards regression","volume":"92","author":"G\u00f6nen","year":"2005","journal-title":"Biometrika"},{"key":"2023111005484295400_btad615-B13","doi-asserted-by":"crossref","first-page":"94","DOI":"10.1186\/s40537-020-00369-8","article-title":"CatBoost for big data: an interdisciplinary review","volume":"7","author":"Hancock","year":"2020","journal-title":"J Big Data"},{"key":"2023111005484295400_btad615-B14","doi-asserted-by":"crossref","first-page":"5545","DOI":"10.1093\/bioinformatics\/btaa1005","article-title":"DeepPurpose: a deep learning library for drug\u2013target interaction prediction","volume":"36","author":"Huang","year":"2021","journal-title":"Bioinformatics"},{"year":"2017","author":"Ke","key":"2023111005484295400_btad615-B15"},{"year":"2017","author":"Klambauer","key":"2023111005484295400_btad615-B16"},{"key":"2023111005484295400_btad615-B17","doi-asserted-by":"crossref","first-page":"765","DOI":"10.3390\/math8050765","article-title":"Predicting hard rock pillar stability using GBDT, XGBoost, and LightGBM algorithms","volume":"8","author":"Liang","year":"2020","journal-title":"Mathematics"},{"key":"2023111005484295400_btad615-B18","doi-asserted-by":"crossref","first-page":"225","DOI":"10.1561\/1500000016","article-title":"Learning to rank for information retrieval","volume":"3","author":"Liu","year":"2009","journal-title":"Found Trends Inf Retr"},{"year":"2017","author":"Lundberg","key":"2023111005484295400_btad615-B19"},{"key":"2023111005484295400_btad615-B20","doi-asserted-by":"crossref","first-page":"573","DOI":"10.1038\/s41467-017-00680-8","article-title":"A network integration approach for drug-target interaction prediction and computational drug repositioning from heterogeneous information","volume":"8","author":"Luo","year":"2017","journal-title":"Nat Commun"},{"key":"2023111005484295400_btad615-B21","doi-asserted-by":"crossref","first-page":"238","DOI":"10.1093\/bioinformatics\/bts670","article-title":"Drug\u2013target interaction prediction by learning from local information and neighbors","volume":"29","author":"Mei","year":"2013","journal-title":"Bioinformatics"},{"key":"2023111005484295400_btad615-B22","doi-asserted-by":"crossref","first-page":"1140","DOI":"10.1093\/bioinformatics\/btaa921","article-title":"GraphDTA: predicting drug\u2013target binding affinity with graph neural networks","volume":"37","author":"Nguyen","year":"2021","journal-title":"Bioinformatics"},{"key":"2023111005484295400_btad615-B23","doi-asserted-by":"crossref","first-page":"i821","DOI":"10.1093\/bioinformatics\/bty593","article-title":"DeepDTA: deep drug\u2013target binding affinity prediction","volume":"34","author":"\u00d6zt\u00fcrk","year":"2018","journal-title":"Bioinformatics"},{"key":"2023111005484295400_btad615-B24","doi-asserted-by":"crossref","first-page":"325","DOI":"10.1093\/bib\/bbu010","article-title":"Toward more realistic drug\u2013target interaction predictions","volume":"16","author":"Pahikkala","year":"2015","journal-title":"Brief Bioinform"},{"year":"2018","author":"Prokhorenkova","key":"2023111005484295400_btad615-B25"},{"key":"2023111005484295400_btad615-B26","doi-asserted-by":"crossref","first-page":"634","DOI":"10.1016\/j.asoc.2018.10.036","article-title":"Feature selection based on artificial bee colony and gradient boosting decision tree","volume":"74","author":"Rao","year":"2019","journal-title":"Appl Soft Comput"},{"key":"2023111005484295400_btad615-B27","doi-asserted-by":"crossref","first-page":"1964","DOI":"10.1093\/bioinformatics\/btac048","article-title":"NerLTR-DTA: drug\u2013target binding affinity prediction based on neighbor relationship and learning to rank","volume":"38","author":"Ru","year":"2022","journal-title":"Bioinformatics"},{"key":"2023111005484295400_btad615-B28","doi-asserted-by":"crossref","first-page":"4569","DOI":"10.1021\/acs.jcim.0c00485","article-title":"ivis dimensionality reduction framework for biomacromolecular simulations","volume":"60","author":"Tian","year":"2020","journal-title":"J Chem Inf Model"},{"key":"2023111005484295400_btad615-B29","doi-asserted-by":"crossref","first-page":"267","DOI":"10.1111\/j.2517-6161.1996.tb02080.x","article-title":"Regression shrinkage and selection via the lasso","volume":"58","author":"Tibshirani","year":"1996","journal-title":"J R Stat Soc Series B Methodol"},{"key":"2023111005484295400_btad615-B30","doi-asserted-by":"crossref","first-page":"815","DOI":"10.1049\/cje.2021.06.003","article-title":"A machine learning method for differentiating and predicting human-infective coronavirus based on physicochemical features and composition of the spike protein","volume":"30","author":"Wang","year":"2021","journal-title":"Chin J Electron"},{"key":"2023111005484295400_btad615-B31","doi-asserted-by":"crossref","first-page":"37","DOI":"10.1016\/0169-7439(87)80084-9","article-title":"Principal component analysis","volume":"2","author":"Wold","year":"1987","journal-title":"Chemometr Intell Lab Syst"},{"key":"2023111005484295400_btad615-B32","doi-asserted-by":"crossref","first-page":"i246","DOI":"10.1093\/bioinformatics\/btq176","article-title":"Drug-target interaction prediction from chemical, genomic and pharmacological data in an integrated framework","volume":"26","author":"Yamanishi","year":"2010","journal-title":"Bioinformatics"},{"first-page":"2061","year":"2009","author":"Ye","key":"2023111005484295400_btad615-B33"},{"key":"2023111005484295400_btad615-B34","doi-asserted-by":"crossref","first-page":"bbab506","DOI":"10.1093\/bib\/bbab506","article-title":"FusionDTA: attention-based feature polymerizer and knowledge distillation for drug-target binding affinity prediction","volume":"23","author":"Yuan","year":"2022","journal-title":"Brief Bioinform"},{"key":"2023111005484295400_btad615-B35","doi-asserted-by":"crossref","first-page":"bbab117","DOI":"10.1093\/bib\/bbab117","article-title":"Deep drug-target binding affinity prediction with multiple attention blocks","volume":"22","author":"Zeng","year":"2021","journal-title":"Brief Bioinform"},{"key":"2023111005484295400_btad615-B36","doi-asserted-by":"crossref","first-page":"218","DOI":"10.21037\/atm.2016.03.37","article-title":"Introduction to machine learning: k-nearest neighbors","volume":"4","author":"Zhang","year":"2016","journal-title":"Ann Transl Med"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btad615\/51960342\/btad615.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/39\/11\/btad615\/53006768\/btad615.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/39\/11\/btad615\/53006768\/btad615.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,10,30]],"date-time":"2024-10-30T09:11:50Z","timestamp":1730279510000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btad615\/7301469"}},"subtitle":[],"editor":[{"given":"Pier Luigi","family":"Martelli","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2023,10,9]]},"references-count":36,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2023,11,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btad615","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"type":"print","value":"1367-4803"},{"type":"electronic","value":"1367-4811"}],"subject":[],"published-other":{"date-parts":[[2023,11,1]]},"published":{"date-parts":[[2023,10,9]]},"article-number":"btad615"}}