{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,11]],"date-time":"2026-05-11T10:49:47Z","timestamp":1778496587307,"version":"3.51.4"},"reference-count":54,"publisher":"Oxford University Press (OUP)","issue":"6","license":[{"start":{"date-parts":[[2023,9,22]],"date-time":"2023-09-22T00:00:00Z","timestamp":1695340800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/pages\/standard-publication-reuse-rights"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62122089"],"award-info":[{"award-number":["62122089"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Beijing Outstanding Young Scientist Program","award":["BJJWZYJH012019100020098"],"award-info":[{"award-number":["BJJWZYJH012019100020098"]}]},{"name":"Intelligent Social Governance Platform, Major Innovation & Planning Interdisciplinary Platform"},{"DOI":"10.13039\/501100004260","name":"Renmin University of China","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100004260","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100012226","name":"Fundamental Research Funds for the Central Universities","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100012226","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Research Funds of Renmin University of China"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023,9,22]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Accurate prediction of drug\u2013target affinity (DTA) is of vital importance in early-stage drug discovery, facilitating the identification of drugs that can effectively interact with specific targets and regulate their activities. While wet experiments remain the most reliable method, they are time-consuming and resource-intensive, resulting in limited data availability that poses challenges for deep learning approaches. Existing methods have primarily focused on developing techniques based on the available DTA data, without adequately addressing the data scarcity issue. To overcome this challenge, we present the Semi-Supervised Multi-task training (SSM) framework for DTA prediction, which incorporates three simple yet highly effective strategies: (1) A multi-task training approach that combines DTA prediction with masked language modeling using paired drug\u2013target data. (2) A semi-supervised training method that leverages large-scale unpaired molecules and proteins to enhance drug and target representations. This approach differs from previous methods that only employed molecules or proteins in pre-training. (3) The integration of a lightweight cross-attention module to improve the interaction between drugs and targets, further enhancing prediction accuracy. Through extensive experiments on benchmark datasets such as BindingDB, DAVIS and KIBA, we demonstrate the superior performance of our framework. Additionally, we conduct case studies on specific drug\u2013target binding activities, virtual screening experiments, drug feature visualizations and real-world applications, all of which showcase the significant potential of our work. In conclusion, our proposed SSM-DTA framework addresses the data limitation challenge in DTA prediction and yields promising results, paving the way for more efficient and accurate drug discovery processes.<\/jats:p>","DOI":"10.1093\/bib\/bbad386","type":"journal-article","created":{"date-parts":[[2023,10,30]],"date-time":"2023-10-30T23:27:53Z","timestamp":1698708473000},"source":"Crossref","is-referenced-by-count":36,"title":["Breaking the barriers of data scarcity in drug\u2013target affinity prediction"],"prefix":"10.1093","volume":"24","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-7242-422X","authenticated-orcid":false,"given":"Qizhi","family":"Pei","sequence":"first","affiliation":[{"name":"Gaoling School of Artificial Intelligence, Renmin University of China , No.59, Zhong Guan Cun Avenue, Haidian District, 100872, Beijing , China"},{"name":"Engineering Research Center of Next-Generation Intelligent Search and Recommendation, Ministry of Education"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3530-590X","authenticated-orcid":false,"given":"Lijun","family":"Wu","sequence":"additional","affiliation":[{"name":"Microsoft Research AI4Science , No.5, Dan Ling Street, Haidian District, 100080, Beijing , China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2157-9077","authenticated-orcid":false,"given":"Jinhua","family":"Zhu","sequence":"additional","affiliation":[{"name":"CAS Key Laboratory of GIPAS , EEIS Department, , No.96, JinZhai Road, Baohe District, 230026, Hefei, Anhui Province , China"},{"name":"University of Science and Technology of China , EEIS Department, , No.96, JinZhai Road, Baohe District, 230026, Hefei, Anhui Province , China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9823-9033","authenticated-orcid":false,"given":"Yingce","family":"Xia","sequence":"additional","affiliation":[{"name":"Microsoft Research AI4Science , No.5, Dan Ling Street, Haidian District, 100080, Beijing , China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7126-0139","authenticated-orcid":false,"given":"Shufang","family":"Xie","sequence":"additional","affiliation":[{"name":"Gaoling School of Artificial Intelligence, Renmin University of China , No.59, Zhong Guan Cun Avenue, Haidian District, 100872, Beijing , China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9095-0776","authenticated-orcid":false,"given":"Tao","family":"Qin","sequence":"additional","affiliation":[{"name":"Microsoft Research AI4Science , No.5, Dan Ling Street, Haidian District, 100080, Beijing , China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7324-6632","authenticated-orcid":false,"given":"Haiguang","family":"Liu","sequence":"additional","affiliation":[{"name":"Microsoft Research AI4Science , No.5, Dan Ling Street, Haidian District, 100080, Beijing , China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Tie-Yan","family":"Liu","sequence":"additional","affiliation":[{"name":"Microsoft Research AI4Science , No.5, Dan Ling Street, Haidian District, 100080, Beijing , China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Rui","family":"Yan","sequence":"additional","affiliation":[{"name":"Gaoling School of Artificial Intelligence, Renmin University of China , No.59, Zhong Guan Cun Avenue, Haidian District, 100872, Beijing , China"},{"name":"Beijing Key Laboratory of Big Data Management and Analysis Methods"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2023,10,30]]},"reference":[{"key":"2023120619102176000_ref1","doi-asserted-by":"crossref","first-page":"203","DOI":"10.1038\/nrd3078","article-title":"How to improve r&d productivity: the pharmaceutical industry\u2019s grand challenge","volume":"9","author":"Paul","year":"2010","journal-title":"Nat Rev Drug Discov"},{"key":"2023120619102176000_ref2","doi-asserted-by":"crossref","DOI":"10.1002\/jcc.21334","article-title":"Autodock vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading","volume":"31","author":"Trott","year":"2010","journal-title":"J Comput Chem"},{"key":"2023120619102176000_ref3","doi-asserted-by":"crossref","first-page":"23","DOI":"10.1080\/17460441.2018.1403419","article-title":"Molecular dynamics simulations and novel drug discovery","volume":"13","author":"Liu","year":"2018","journal-title":"Expert Opin Drug Discov"},{"key":"2023120619102176000_ref4","doi-asserted-by":"crossref","first-page":"309","DOI":"10.1093\/bioinformatics\/bty535","article-title":"Compound\u2013protein interaction prediction with end-to-end learning of neural networks for graphs and sequences","volume":"35","author":"Tsubaki","year":"2019","journal-title":"Bioinformatics"},{"key":"2023120619102176000_ref5","doi-asserted-by":"crossref","first-page":"3329","DOI":"10.1093\/bioinformatics\/btz111","article-title":"Deepaffinity: interpretable deep learning of compound\u2013protein affinity through unified recurrent and convolutional neural networks","volume":"35","author":"Karimi","year":"2019","journal-title":"Bioinformatics"},{"key":"2023120619102176000_ref6","doi-asserted-by":"crossref","first-page":"5545","DOI":"10.1093\/bioinformatics\/btaa1005","article-title":"Deeppurpose: a deep learning library for drug\u2013target interaction prediction","volume":"36","author":"Huang","year":"2020","journal-title":"Bioinformatics"},{"key":"2023120619102176000_ref7","doi-asserted-by":"crossref","first-page":"4406","DOI":"10.1093\/bioinformatics\/btaa524","article-title":"Transformercpi: improving compound\u2013protein interaction prediction by sequence-based deep learning with self-attention mechanism and label reversal experiments","volume":"36","author":"Chen","year":"2020","journal-title":"Bioinformatics"},{"key":"2023120619102176000_ref8","author":"Devlin"},{"key":"2023120619102176000_ref9","article-title":"Roberta: a robustly optimized bert pretraining approach","author":"Liu","year":"2019","journal-title":"arXiv:190711692"},{"key":"2023120619102176000_ref10","doi-asserted-by":"crossref","first-page":"3636","DOI":"10.2174\/1568026616666160530181149","article-title":"Molecular docking for identification of potential targets for drug repurposing","volume":"16","author":"Luo","year":"2016","journal-title":"Curr Top Med Chem"},{"key":"2023120619102176000_ref11","doi-asserted-by":"crossref","first-page":"e1005678","DOI":"10.1371\/journal.pcbi.1005678","article-title":"Computational-experimental approach to drug-target interaction mapping: a case study on kinase inhibitors","volume":"13","author":"Cichonska","year":"2017","journal-title":"PLoS Comput Biol"},{"key":"2023120619102176000_ref12","doi-asserted-by":"crossref","first-page":"325","DOI":"10.1093\/bib\/bbu010","article-title":"Toward more realistic drug\u2013target interaction predictions","volume":"16","author":"Pahikkala","year":"2015","journal-title":"Brief Bioinform"},{"key":"2023120619102176000_ref13","article-title":"Simboost: a read-across approach for predicting drug\u2013target binding affinities using gradient boosting machines","volume":"9","author":"He","journal-title":"J Chem"},{"key":"2023120619102176000_ref14","doi-asserted-by":"crossref","first-page":"308","DOI":"10.1016\/j.cels.2020.03.002","article-title":"Monn: a multi-objective neural network for predicting compound-protein interactions and affinities","volume":"10","author":"Li","year":"2020","journal-title":"Cell Systems"},{"issue":"1","key":"2023120619102176000_ref15","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s12859-022-04905-6","article-title":"Generalizeddta: combining pre-training and multi-task learning to predict drug-target binding affinity for unknown drug discovery","volume":"23","author":"Lin","year":"2022","journal-title":"BMC Bioinformatics"},{"issue":"1","key":"2023120619102176000_ref16","doi-asserted-by":"crossref","first-page":"bbab506","DOI":"10.1093\/bib\/bbab506","article-title":"Fusiondta: attention-based feature polymerizer and knowledge distillation for drug-target binding affinity prediction","volume":"23","author":"Yuan","year":"2022","journal-title":"Brief Bioinform"},{"key":"2023120619102176000_ref17","doi-asserted-by":"crossref","first-page":"i821","DOI":"10.1093\/bioinformatics\/bty593","article-title":"Deepdta: deep drug\u2013target binding affinity prediction","volume":"34","author":"\u00d6zt\u00fcrk","year":"2018","journal-title":"Bioinformatics"},{"key":"2023120619102176000_ref18","doi-asserted-by":"crossref","first-page":"1995","DOI":"10.1093\/bioinformatics\/btac035","article-title":"Bacpi: a bi-directional attention neural network for compound\u2013protein interaction and binding affinity prediction","volume":"38","author":"Li","year":"2022","journal-title":"Bioinformatics"},{"key":"2023120619102176000_ref19","doi-asserted-by":"crossref","DOI":"10.1093\/bib\/bbac269","article-title":"Mitigating cold-start problems in drug-target affinity prediction with interaction knowledge transferring","volume":"23","author":"Nguyen","year":"2022","journal-title":"Brief Bioinform"},{"key":"2023120619102176000_ref20","article-title":"Molecular representation learning with language models and domain-relevant auxiliary tasks","author":"Fabian","year":"2020","journal-title":"arXiv:201113230"},{"key":"2023120619102176000_ref21","doi-asserted-by":"crossref","first-page":"31","DOI":"10.1021\/ci00057a005","article-title":"Smiles, a chemical language and information system. 1. Introduction to methodology and encoding rules","volume":"28","author":"Weininger","year":"1988","journal-title":"J Chem Inf Comput Sci"},{"key":"2023120619102176000_ref22","article-title":"Strategies for pre-training graph neural networks","author":"Hu","year":"2019"},{"key":"2023120619102176000_ref23","doi-asserted-by":"crossref","DOI":"10.1145\/3307339.3342186","article-title":"Smiles-bert: large scale unsupervised pre-training for molecular property prediction","volume-title":"International conference on bioinformatics, computational biology and health informatics","author":"Wang","year":"2019"},{"key":"2023120619102176000_ref24","article-title":"Chemberta: large-scale self-supervised pretraining for molecular property prediction","author":"Chithrananda","year":"2020"},{"key":"2023120619102176000_ref25","article-title":"N-gram graph: simple unsupervised representation for graphs, with applications to molecules","volume":"32","author":"Liu","journal-title":"NeurIPS"},{"key":"2023120619102176000_ref26","article-title":"Self-supervised graph transformer on large-scale molecular data","volume":"33","author":"Rong","journal-title":"NeurIPS"},{"key":"2023120619102176000_ref27","article-title":"Gcc: graph contrastive coding for graph neural network pre-training","author":"Qiu","year":"2020","journal-title":"SIGKDD"},{"key":"2023120619102176000_ref28","article-title":"Proteinbert: a universal deep-learning model of protein sequence and function","volume":"38","author":"Brandes"},{"key":"2023120619102176000_ref29","article-title":"Evaluating protein transfer learning with tape","volume":"32","author":"Rao","year":"2019","journal-title":"NeurIPS"},{"key":"2023120619102176000_ref30","article-title":"Et al.","volume":"118","author":"Rives","journal-title":"Proc Natl Acad Sci"},{"key":"2023120619102176000_ref31","article-title":"Attention is all you need","volume":"30","author":"Vaswani","journal-title":"NIPS"},{"key":"2023120619102176000_ref32","doi-asserted-by":"crossref","first-page":"48","DOI":"10.1016\/j.neucom.2021.03.091","article-title":"A review on the attention mechanism of deep learning","volume":"452","author":"Niu","year":"2021","journal-title":"Neurocomputing"},{"key":"2023120619102176000_ref33","article-title":"Recurrent neural network for text classification with multi-task learning","author":"Liu","year":"2016"},{"key":"2023120619102176000_ref34","doi-asserted-by":"crossref","first-page":"123912","DOI":"10.1109\/ACCESS.2021.3110269","article-title":"Pre-training of deep bidirectional protein sequence representations with structural information","volume":"9","author":"Min","year":"2021","journal-title":"IEEE Access"},{"key":"2023120619102176000_ref35","doi-asserted-by":"crossref","DOI":"10.1109\/ICCV.2019.00305","article-title":"Unsupervised pre-training of image features on non-curated data","volume-title":"ICCV","author":"Caron","year":"2019"},{"key":"2023120619102176000_ref36","doi-asserted-by":"crossref","article-title":"Video swin transformer","author":"Liu","DOI":"10.1109\/CVPR52688.2022.00320"},{"key":"2023120619102176000_ref37","volume-title":"Bridging the gap between pre-training and fine-tuning for end-to-end speech translation","author":"Chengyi, Wang"},{"key":"2023120619102176000_ref38","article-title":"Recall and learn: fine-tuning deep pretrained language models with less forgetting","author":"Chen","year":"2020","journal-title":"EMNLP"},{"key":"2023120619102176000_ref39","doi-asserted-by":"crossref","first-page":"D1202","DOI":"10.1093\/nar\/gkv951","article-title":"Pubchem substance and compound databases","volume":"44","author":"Kim","year":"2016","journal-title":"Nucleic Acids Res"},{"key":"2023120619102176000_ref40","doi-asserted-by":"crossref","first-page":"D412","DOI":"10.1093\/nar\/gkaa913","article-title":"Pfam: the protein families database in 2021","volume":"49","author":"Mistry","year":"2021","journal-title":"Nucleic Acids Res"},{"key":"2023120619102176000_ref41","doi-asserted-by":"crossref","first-page":"7140","DOI":"10.1093\/nar\/gkm859","article-title":"Bindingdb: a web-accessible database of experimentally determined protein\u2013ligand binding affinities","volume":"35","author":"Liu","year":"2007","journal-title":"Nucleic Acids Res"},{"key":"2023120619102176000_ref42","doi-asserted-by":"crossref","first-page":"1046","DOI":"10.1038\/nbt.1990","article-title":"Comprehensive analysis of kinase inhibitor selectivity","volume":"29","author":"Davis","year":"2011","journal-title":"Nat Biotechnol"},{"key":"2023120619102176000_ref43","doi-asserted-by":"crossref","first-page":"735","DOI":"10.1021\/ci400709d","article-title":"Making sense of large-scale kinase inhibitor bioactivity data sets: a comparative and integrative analysis","volume":"54","author":"Tang","year":"2014","journal-title":"J Chem Inf Model"},{"issue":"D1","key":"2023120619102176000_ref44","doi-asserted-by":"crossref","first-page":"D1102","DOI":"10.1093\/nar\/gky1033","article-title":"Pubchem 2019 update: improved access to chemical data","volume":"47","author":"Kim","year":"2019","journal-title":"Nucleic Acids Res"},{"issue":"16","key":"2023120619102176000_ref45","doi-asserted-by":"crossref","first-page":"8993","DOI":"10.3390\/ijms22168993","article-title":"Sag-dta: prediction of drug\u2013target affinity using self-attention graph network","volume":"22","author":"Zhang","year":"2021","journal-title":"Int J Mol Sci"},{"issue":"3","key":"2023120619102176000_ref46","doi-asserted-by":"crossref","first-page":"816","DOI":"10.1039\/D1SC05180F","article-title":"Mgraphdta: deep multiscale graph neural network for explainable drug\u2013target binding affinity prediction","volume":"13","author":"Yang","year":"2022","journal-title":"Chem Sci"},{"key":"2023120619102176000_ref47","doi-asserted-by":"crossref","first-page":"1140","DOI":"10.1093\/bioinformatics\/btaa921","article-title":"Graphdta: predicting drug\u2013target binding affinity with graph neural networks","volume":"37","author":"Nguyen","year":"2021","journal-title":"Bioinformatics"},{"key":"2023120619102176000_ref48","doi-asserted-by":"crossref","first-page":"4633","DOI":"10.1093\/bioinformatics\/btaa544","article-title":"Deepcda: deep cross-domain compound\u2013protein affinity prediction through lstm and convolutional neural networks","volume":"36","author":"Abbasi","year":"2020","journal-title":"Bioinformatics"},{"key":"2023120619102176000_ref49","doi-asserted-by":"crossref","first-page":"965","DOI":"10.1093\/biomet\/92.4.965","article-title":"Concordance probability and discriminatory power in proportional hazards regression","volume":"92","author":"G\u00f6nen","year":"2005","journal-title":"Biometrika"},{"key":"2023120619102176000_ref50","article-title":"Drug-target binding affinity prediction using transformers","author":"Saadat","year":"2022"},{"issue":"1","key":"2023120619102176000_ref51","doi-asserted-by":"crossref","first-page":"4751","DOI":"10.1038\/s41598-022-08787-9","article-title":"Affinity2vec: drug-target binding affinity prediction through representation learning, graph mining, and machine learning","volume":"12","author":"Thafar","year":"2022","journal-title":"Sci Rep"},{"issue":"1","key":"2023120619102176000_ref52","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s12864-022-08648-9","article-title":"Sequence-based drug-target affinity prediction using weighted graph neural networks","volume":"23","author":"Jiang","year":"2022","journal-title":"BMC Genomics"},{"key":"2023120619102176000_ref53","doi-asserted-by":"crossref","first-page":"D668","DOI":"10.1093\/nar\/gkj067","article-title":"Drugbank: a comprehensive resource for in silico drug discovery and exploration","volume":"34","author":"Wishart","year":"2006","journal-title":"Nucleic Acids Res"},{"key":"2023120619102176000_ref54","article-title":"Dual-view molecule pre-training","author":"Zhu"}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/24\/6\/bbad386\/54034646\/bbad386.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/24\/6\/bbad386\/54034646\/bbad386.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,12,7]],"date-time":"2023-12-07T05:36:01Z","timestamp":1701927361000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbad386\/7333673"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,9,22]]},"references-count":54,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2023,9,22]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbad386","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2023,11,1]]},"published":{"date-parts":[[2023,9,22]]},"article-number":"bbad386"}}