{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,21]],"date-time":"2026-05-21T02:58:27Z","timestamp":1779332307982,"version":"3.51.4"},"reference-count":49,"publisher":"Oxford University Press (OUP)","issue":"4","license":[{"start":{"date-parts":[[2022,7,7]],"date-time":"2022-07-07T00:00:00Z","timestamp":1657152000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,7,18]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Candidate compounds with high binding affinities toward a target protein are likely to be developed as drugs. Deep neural networks (DNNs) have attracted increasing attention for drug\u2013target affinity (DTA) estimation owing to their efficiency. However, the negative impact of batch effects caused by measure metrics, system technologies and other assay information is seldom discussed when training a DNN model for DTA. Suffering from the data deviation caused by batch effects, the DNN models can only be trained on a small amount of \u2018clean\u2019 data. Thus, it is challenging for them to provide precise and consistent estimations. We design a batch-sensitive training framework, namely BatchDTA, to train the DNN models. BatchDTA implicitly aligns multiple batches toward the same protein through learning the orders of candidate compounds with respect to the batches, alleviating the impact of the batch effects on the DNN models. Extensive experiments demonstrate that BatchDTA facilitates four mainstream DNN models to enhance the ability and robustness on multiple DTA datasets (BindingDB, Davis and KIBA). The average concordance index of the DNN models achieves a relative improvement of 4.0%. 
The case study reveals that BatchDTA can successfully learn the ranking orders of the compounds from multiple batches. In addition, BatchDTA can also be applied to the fused data collected from multiple sources to achieve further improvement.<\/jats:p>","DOI":"10.1093\/bib\/bbac260","type":"journal-article","created":{"date-parts":[[2022,7,6]],"date-time":"2022-07-06T23:57:25Z","timestamp":1657151845000},"source":"Crossref","is-referenced-by-count":13,"title":["BatchDTA: implicit batch alignment enhances deep learning-based drug\u2013target affinity estimation"],"prefix":"10.1093","volume":"23","author":[{"given":"Hongyu","family":"Luo","sequence":"first","affiliation":[{"name":"PaddleHelix team, Baidu Inc. , 518000, Shenzhen, China"}]},{"given":"Yingfei","family":"Xiang","sequence":"additional","affiliation":[{"name":"PaddleHelix team, Baidu Inc. , 518000, Shenzhen, China"}]},{"given":"Xiaomin","family":"Fang","sequence":"additional","affiliation":[{"name":"PaddleHelix team, Baidu Inc. , 518000, Shenzhen, China"}]},{"given":"Wei","family":"Lin","sequence":"additional","affiliation":[{"name":"PaddleHelix team, Baidu Inc. , 518000, Shenzhen, China"}]},{"given":"Fan","family":"Wang","sequence":"additional","affiliation":[{"name":"PaddleHelix team, Baidu Inc. , 518000, Shenzhen, China"}]},{"given":"Hua","family":"Wu","sequence":"additional","affiliation":[{"name":"Baidu Inc. , 100000, Beijing, China"}]},{"given":"Haifeng","family":"Wang","sequence":"additional","affiliation":[{"name":"Baidu Inc. 
, 100000, Beijing, China"}]}],"member":"286","published-online":{"date-parts":[[2022,7,7]]},"reference":[{"issue":"7","key":"2022071906190752600_ref1","doi-asserted-by":"crossref","first-page":"807","DOI":"10.1038\/ng0707-807","article-title":"On the design and analysis of gene expression studies in human populations","volume":"39","author":"Akey","year":"2007","journal-title":"Nat Genet"},{"key":"2022071906190752600_ref2","doi-asserted-by":"crossref","first-page":"40","DOI":"10.1214\/09-SS054","article-title":"A survey of cross-validation procedures for model selection","volume":"4","author":"Arlot","year":"2010","journal-title":"Statistics surveys"},{"issue":"4","key":"2022071906190752600_ref3","doi-asserted-by":"crossref","first-page":"583","DOI":"10.1677\/erc.1.00868","article-title":"High-resolution serum proteomic patterns for ovarian cancer detection","volume":"11","author":"Baggerly","year":"2004","journal-title":"Endocr Relat Cancer"},{"key":"2022071906190752600_ref4","doi-asserted-by":"crossref","first-page":"89","DOI":"10.1145\/1102351.1102363","volume-title":"Proceedings of the 22nd international conference on Machine learning","author":"Burges","year":"2005"},{"issue":"9","key":"2022071906190752600_ref5","doi-asserted-by":"crossref","DOI":"10.3390\/molecules23092208","article-title":"Machine learning for drug-target interaction prediction","volume":"23","author":"Chen","year":"2018","journal-title":"Molecules"},{"issue":"13","key":"2022071906190752600_ref6","doi-asserted-by":"crossref","first-page":"i509","DOI":"10.1093\/bioinformatics\/bty277","article-title":"Learning with multiple pairwise kernels for drug bioactivity prediction","volume":"34","author":"Cichonska","year":"2018","journal-title":"Bioinformatics"},{"issue":"8","key":"2022071906190752600_ref7","doi-asserted-by":"crossref","DOI":"10.1371\/journal.pcbi.1005678","article-title":"Computational-experimental approach to drug-target interaction mapping: a case study on kinase 
inhibitors","volume":"13","author":"Cichonska","year":"2017","journal-title":"PLoS Comput Biol"},{"key":"2022071906190752600_ref8","doi-asserted-by":"crossref","first-page":"373","DOI":"10.1007\/978-1-4939-9744-2_16","volume-title":"Mass spectrometry data analysis in proteomics","author":"\u010cuklina","year":"2020"},{"issue":"11","key":"2022071906190752600_ref9","doi-asserted-by":"crossref","first-page":"1046","DOI":"10.1038\/nbt.1990","article-title":"Comprehensive analysis of kinase inhibitor selectivity","volume":"29","author":"Davis","year":"2011","journal-title":"Nat Biotechnol"},{"issue":"5","key":"2022071906190752600_ref10","doi-asserted-by":"crossref","first-page":"1273","DOI":"10.1021\/ci010132r","article-title":"Reoptimization of MDL keys for use in drug discovery","volume":"42","author":"Durant","year":"2002","journal-title":"J Chem Inf Comput Sci"},{"issue":"4","key":"2022071906190752600_ref11","doi-asserted-by":"crossref","first-page":"965","DOI":"10.1093\/biomet\/92.4.965","article-title":"Concordance probability and discriminatory power in proportional hazards regression","volume":"92","author":"G\u00f6nen","year":"2005","journal-title":"Biometrika"},{"issue":"5","key":"2022071906190752600_ref12","doi-asserted-by":"crossref","first-page":"421","DOI":"10.1038\/nbt.4091","article-title":"Batch effects in single-cell rna-sequencing data are corrected by matching mutual nearest neighbors","volume":"36","author":"Haghverdi","year":"2018","journal-title":"Nat Biotechnol"},{"key":"2022071906190752600_ref13","first-page":"195","volume-title":"International workshop on artificial neural networks","author":"Han","year":"1995"},{"key":"2022071906190752600_ref14","doi-asserted-by":"crossref","first-page":"Springer","DOI":"10.1007\/978-0-387-84858-7","volume-title":"The elements of statistical learning: data mining, inference, and 
prediction","author":"Hastie","year":"2009"},{"issue":"1","key":"2022071906190752600_ref15","first-page":"1","article-title":"Simboost: a read-across approach for predicting drug\u2013target binding affinities using gradient boosting machines","volume":"9","author":"He","year":"2017","journal-title":"J Chem"},{"issue":"8","key":"2022071906190752600_ref16","doi-asserted-by":"crossref","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long short-term memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Comput"},{"issue":"6","key":"2022071906190752600_ref17","doi-asserted-by":"crossref","first-page":"830","DOI":"10.1093\/bioinformatics\/btaa880","article-title":"Moltrans: Molecular interaction transformer for drug\u2013target interaction prediction","volume":"37","author":"Huang","year":"2021","journal-title":"Bioinformatics"},{"issue":"6","key":"2022071906190752600_ref18","doi-asserted-by":"crossref","first-page":"1219","DOI":"10.1111\/j.1476-5381.2009.00604.x","article-title":"Ligand binding assays at equilibrium: validation and interpretation","volume":"161","author":"Hulme","year":"2010","journal-title":"Br J Pharmacol"},{"key":"2022071906190752600_ref19","first-page":"448","volume-title":"International conference on machine learning","author":"Ioffe","year":"2015"},{"issue":"24","key":"2022071906190752600_ref20","doi-asserted-by":"crossref","first-page":"18209","DOI":"10.1021\/acs.jmedchem.1c01830","article-title":"Interactiongraphnet: A novel and efficient deep graph representation learning framework for accurate protein\u2013ligand interaction predictions","volume":"64","author":"Jiang","year":"2021","journal-title":"J Med Chem"},{"issue":"35","key":"2022071906190752600_ref21","doi-asserted-by":"crossref","first-page":"20701","DOI":"10.1039\/D0RA02297G","article-title":"Drug\u2013target affinity prediction using graph neural network and contact 
maps","volume":"10","author":"Jiang","year":"2020","journal-title":"RSC Adv"},{"issue":"2","key":"2022071906190752600_ref22","doi-asserted-by":"crossref","first-page":"287","DOI":"10.1021\/acs.jcim.7b00650","article-title":"K deep: protein\u2013ligand absolute binding affinity prediction via 3d-convolutional neural networks","volume":"58","author":"Jim\u00e9nez","year":"2018","journal-title":"J Chem Inf Model"},{"issue":"4","key":"2022071906190752600_ref23","doi-asserted-by":"crossref","DOI":"10.1371\/journal.pone.0061007","article-title":"Comparability of mixed ic50 data\u2013a statistical analysis","volume":"8","author":"Kalliokoski","year":"2013","journal-title":"PloS one"},{"key":"2022071906190752600_ref24","article-title":"Adam: A method for stochastic optimization","author":"Kingma","year":"2014"},{"key":"2022071906190752600_ref25","volume-title":"Proceedings of the Conference Neural Information Processing Systems (NIPS)","author":"Krizhevsky"},{"key":"2022071906190752600_ref26","first-page":"1097","volume-title":"Advances in neural information processing systems","author":"Krizhevsky","year":"2012"},{"issue":"10","key":"2022071906190752600_ref27","first-page":"1995","article-title":"Convolutional networks for images, speech, and time series","volume":"3361","author":"LeCun","year":"1995","journal-title":"The handbook of brain theory and neural networks"},{"issue":"6","key":"2022071906190752600_ref28","doi-asserted-by":"crossref","DOI":"10.1371\/journal.pcbi.1007129","article-title":"Deepconv-dti: Prediction of drug-target interactions via deep learning with convolution on protein sequences","volume":"15","author":"Lee","year":"2019","journal-title":"PLoS Comput Biol"},{"issue":"6","key":"2022071906190752600_ref29","doi-asserted-by":"crossref","first-page":"882","DOI":"10.1093\/bioinformatics\/bts034","article-title":"The sva package for removing batch effects and other unwanted variation in high-throughput 
experiments","volume":"28","author":"Leek","year":"2012","journal-title":"Bioinformatics"},{"issue":"10","key":"2022071906190752600_ref30","doi-asserted-by":"crossref","first-page":"733","DOI":"10.1038\/nrg2825","article-title":"Tackling the widespread and critical impact of batch effects in high-throughput data","volume":"11","author":"Leek","year":"2010","journal-title":"Nat Rev Genet"},{"key":"2022071906190752600_ref31","doi-asserted-by":"crossref","first-page":"975","DOI":"10.1145\/3447548.3467311","volume-title":"Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining","author":"Li","year":"2021"},{"issue":"3","key":"2022071906190752600_ref32","doi-asserted-by":"crossref","first-page":"225","DOI":"10.1561\/1500000016","article-title":"Learning to rank for information retrieval","volume":"3","author":"Liu","year":"2009","journal-title":"Found Trends Inf Retr"},{"issue":"suppl_1","key":"2022071906190752600_ref33","doi-asserted-by":"crossref","first-page":"D198","DOI":"10.1093\/nar\/gkl999","article-title":"Bindingdb: a web-accessible database of experimentally determined protein\u2013ligand binding affinities","volume":"35","author":"Liu","year":"2007","journal-title":"Nucleic Acids Res"},{"issue":"1","key":"2022071906190752600_ref34","doi-asserted-by":"crossref","first-page":"573","DOI":"10.1038\/s41467-017-00680-8","article-title":"A network integration approach for drug-target interaction prediction and computational drug repositioning from heterogeneous information","volume":"8","author":"Luo","year":"2017","journal-title":"Nat Commun"},{"issue":"3","key":"2022071906190752600_ref35","doi-asserted-by":"crossref","first-page":"188","DOI":"10.1038\/nrd3368","article-title":"Impact of high-throughput screening in biomedical research","volume":"10","author":"Macarron","year":"2011","journal-title":"Nat Rev Drug 
Discov"},{"issue":"8","key":"2022071906190752600_ref36","doi-asserted-by":"crossref","first-page":"1140","DOI":"10.1093\/bioinformatics\/btaa921","volume":"37","author":"Nguyen","year":"2020","journal-title":"Bioinformatics"},{"key":"2022071906190752600_ref37","article-title":"Gefa: early fusion approach in drug-target affinity prediction","author":"Nguyen","year":"2021","journal-title":"IEEE\/ACM Trans Comput Biol Bioinform"},{"key":"2022071906190752600_ref38","article-title":"Chemboost: A chemical language based approach for protein\u2013ligand binding affinity prediction","author":"\u00d6z\u00e7elik","year":"2020","journal-title":"Molecular Informatics"},{"issue":"17","key":"2022071906190752600_ref39","doi-asserted-by":"crossref","first-page":"i821","DOI":"10.1093\/bioinformatics\/bty593","article-title":"DeepDTA: deep drug\u2013target binding affinity prediction","volume":"34","author":"\u00d6zt\u00fcrk","year":"2018","journal-title":"Bioinformatics"},{"key":"2022071906190752600_ref40","article-title":"Widedta: prediction of drug-target binding affinity","author":"\u00d6zt\u00fcrk","year":"2019"},{"issue":"5","key":"2022071906190752600_ref41","doi-asserted-by":"crossref","first-page":"742","DOI":"10.1021\/ci100050t","article-title":"Extended-connectivity fingerprints","volume":"50","author":"Rogers","year":"2010","journal-title":"J Chem Inf Model"},{"key":"2022071906190752600_ref42","first-page":"230","volume-title":"Machine Learning for Healthcare Conference","author":"Shin","year":"2019"},{"issue":"21","key":"2022071906190752600_ref43","doi-asserted-by":"crossref","first-page":"3666","DOI":"10.1093\/bioinformatics\/bty374","article-title":"Development and evaluation of a deep learning model for protein\u2013ligand binding affinity 
prediction","volume":"34","author":"Stepniewska-Dziubinska","year":"2018","journal-title":"Bioinformatics"},{"issue":"3","key":"2022071906190752600_ref44","doi-asserted-by":"crossref","first-page":"735","DOI":"10.1021\/ci400709d","article-title":"Making sense of large-scale kinase inhibitor bioactivity data sets: a comparative and integrative analysis","volume":"54","author":"Tang","year":"2014","journal-title":"J Chem Inf Model"},{"key":"2022071906190752600_ref45","volume-title":"Advances in Neural Information Processing Systems","author":"Vaswani","year":"2017"},{"key":"2022071906190752600_ref46","article-title":"Aidan N Gomez, \u0141ukasz Kaiser, and Illia Polosukhin","volume":"30","author":"Vaswani","year":"2017","journal-title":"Attention is all you need Advances in neural information processing systems"},{"key":"2022071906190752600_ref47","article-title":"Graph attention networks","author":"Veli\u010dkovi\u0107","year":"2017"},{"issue":"1","key":"2022071906190752600_ref48","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s40649-019-0069-y","article-title":"Graph convolutional networks: a comprehensive review","volume":"6","author":"Zhang","year":"2019","journal-title":"Computational Social Networks"},{"issue":"1","key":"2022071906190752600_ref49","doi-asserted-by":"crossref","first-page":"437","DOI":"10.1186\/s12859-019-3028-6","article-title":"Influence of batch effect correction methods on drug induced differential gene expression profiles","volume":"20","author":"Zhou","year":"2019","journal-title":"BMC Bioinformatics"}],"container-title":["Briefings in 
Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/23\/4\/bbac260\/45017546\/bbac260.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/23\/4\/bbac260\/45017546\/bbac260.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,7,19]],"date-time":"2022-07-19T02:21:25Z","timestamp":1658197285000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbac260\/6632927"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,7,7]]},"references-count":49,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2022,7,18]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbac260","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2021.11.23.469641","asserted-by":"object"}]},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2022,7,18]]},"published":{"date-parts":[[2022,7,7]]},"article-number":"bbac260"}}