{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,13]],"date-time":"2026-06-13T05:57:31Z","timestamp":1781330251469,"version":"3.54.1"},"reference-count":53,"publisher":"Oxford University Press (OUP)","issue":"16","license":[{"start":{"date-parts":[[2020,5,19]],"date-time":"2020-05-19T00:00:00Z","timestamp":1589846400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["81773634"],"award-info":[{"award-number":["81773634"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"National Science & Technology Major"},{"DOI":"10.13039\/501100013279","name":"Key New Drug Creation and Manufacturing Program","doi-asserted-by":"publisher","award":["2018ZX09711002"],"award-info":[{"award-number":["2018ZX09711002"]}],"id":[{"id":"10.13039\/501100013279","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Strategic Priority Research Program of the Chinese Academy of Sciences","award":["XDA12050201"],"award-info":[{"award-number":["XDA12050201"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2020,8,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Identifying compound\u2013protein interaction (CPI) is a crucial task in drug discovery and chemogenomics studies, and proteins without three-dimensional structure account for a large part of potential biological targets, which requires developing methods using only protein sequence information to predict CPI. However, sequence-based CPI models may face some specific pitfalls, including using inappropriate datasets, hidden ligand bias and splitting datasets inappropriately, resulting in overestimation of their prediction performance.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>To address these issues, we here constructed new datasets specific for CPI prediction, proposed a novel transformer neural network named TransformerCPI, and introduced a more rigorous label reversal experiment to test whether a model learns true interaction features. TransformerCPI achieved much improved performance on the new experiments, and it can be deconvolved to highlight important interacting regions of protein sequences and compound atoms, which may contribute chemical biology studies with useful guidance for further ligand structural optimization.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>https:\/\/github.com\/lifanchen-simm\/transformerCPI.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaa524","type":"journal-article","created":{"date-parts":[[2020,5,14]],"date-time":"2020-05-14T11:36:51Z","timestamp":1589456211000},"page":"4406-4414","source":"Crossref","is-referenced-by-count":540,"title":["TransformerCPI: improving compound\u2013protein interaction prediction by sequence-based deep learning with self-attention mechanism and label reversal experiments"],"prefix":"10.1093","volume":"36","author":[{"given":"Lifan","family":"Chen","sequence":"first","affiliation":[{"name":"Shanghai Institute of Materia Medica, Chinese Academy of Sciences Drug Discovery and Design Center, State Key Laboratory of Drug Research, , Shanghai 201203, China"},{"name":"University of Chinese Academy of Sciences , Beijing 100049, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Xiaoqin","family":"Tan","sequence":"additional","affiliation":[{"name":"Shanghai Institute of Materia Medica, Chinese Academy of Sciences Drug Discovery and Design Center, State Key Laboratory of Drug Research, , Shanghai 201203, China"},{"name":"University of Chinese Academy of Sciences , Beijing 100049, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Dingyan","family":"Wang","sequence":"additional","affiliation":[{"name":"Shanghai Institute of Materia Medica, Chinese Academy of Sciences Drug Discovery and Design Center, State Key Laboratory of Drug Research, , Shanghai 201203, China"},{"name":"University of Chinese Academy of Sciences , Beijing 100049, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Feisheng","family":"Zhong","sequence":"additional","affiliation":[{"name":"Shanghai Institute of Materia Medica, Chinese Academy of Sciences Drug Discovery and Design Center, State Key Laboratory of Drug Research, , Shanghai 201203, China"},{"name":"University of Chinese Academy of Sciences , Beijing 100049, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Xiaohong","family":"Liu","sequence":"additional","affiliation":[{"name":"Shanghai Institute of Materia Medica, Chinese Academy of Sciences Drug Discovery and Design Center, State Key Laboratory of Drug Research, , Shanghai 201203, China"},{"name":"ShanghaiTech University Shanghai Institute for Advanced Immunochemical Studies, School of Life Science and Technology, , Shanghai 200031, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Tianbiao","family":"Yang","sequence":"additional","affiliation":[{"name":"Shanghai Institute of Materia Medica, Chinese Academy of Sciences Drug Discovery and Design Center, State Key Laboratory of Drug Research, , Shanghai 201203, China"},{"name":"University of Chinese Academy of Sciences , Beijing 100049, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Xiaomin","family":"Luo","sequence":"additional","affiliation":[{"name":"Shanghai Institute of Materia Medica, Chinese Academy of Sciences Drug Discovery and Design Center, State Key Laboratory of Drug Research, , Shanghai 201203, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Kaixian","family":"Chen","sequence":"additional","affiliation":[{"name":"Shanghai Institute of Materia Medica, Chinese Academy of Sciences Drug Discovery and Design Center, State Key Laboratory of Drug Research, , Shanghai 201203, China"},{"name":"ShanghaiTech University Shanghai Institute for Advanced Immunochemical Studies, School of Life Science and Technology, , Shanghai 200031, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Hualiang","family":"Jiang","sequence":"additional","affiliation":[{"name":"Shanghai Institute of Materia Medica, Chinese Academy of Sciences Drug Discovery and Design Center, State Key Laboratory of Drug Research, , Shanghai 201203, China"},{"name":"ShanghaiTech University Shanghai Institute for Advanced Immunochemical Studies, School of Life Science and Technology, , Shanghai 200031, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3323-3092","authenticated-orcid":false,"given":"Mingyue","family":"Zheng","sequence":"additional","affiliation":[{"name":"Shanghai Institute of Materia Medica, Chinese Academy of Sciences Drug Discovery and Design Center, State Key Laboratory of Drug Research, , Shanghai 201203, China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"286","published-online":{"date-parts":[[2020,5,19]]},"reference":[{"key":"2023062213524808500_btaa524-B1","doi-asserted-by":"crossref","first-page":"1315","DOI":"10.1038\/s41592-019-0598-1","article-title":"Unified rational protein engineering with sequence-based deep representation learning","volume":"16","author":"Alley","year":"2019","journal-title":"Nat. Methods"},{"key":"2023062213524808500_btaa524-B2","doi-asserted-by":"crossref","first-page":"2397","DOI":"10.1093\/bioinformatics\/btp433","article-title":"Supervised prediction of drug\u2013target interactions using bipartite local models","volume":"25","author":"Bleakley","year":"2009","journal-title":"Bioinformatics"},{"key":"2023062213524808500_btaa524-B3","doi-asserted-by":"crossref","first-page":"262","DOI":"10.1038\/nrg1317","article-title":"Chemogenomics: an emerging strategy for rapid target and drug discovery","volume":"5","author":"Bredel","year":"2004","journal-title":"Nat. Rev. Genet"},{"key":"2023062213524808500_btaa524-B4","first-page":"3035","article-title":"GLASS: a comprehensive database for experimentally validated GPCR\u2013ligand associations","volume":"31","author":"Chan","year":"2015","journal-title":"Bioinformatics (Oxford, England)"},{"key":"2023062213524808500_btaa524-B5","doi-asserted-by":"crossref","first-page":"e0220113","DOI":"10.1371\/journal.pone.0220113","article-title":"Hidden bias in the DUD-E dataset leads to misleading performance of deep learning in structure-based virtual screening","volume":"14","author":"Chen","year":"2019","journal-title":"PLoS One"},{"key":"2023062213524808500_btaa524-B6","doi-asserted-by":"crossref","first-page":"2373","DOI":"10.1039\/c2mb25110h","article-title":"Prediction of chemical\u2013protein interactions: multitarget-QSAR versus computational chemogenomic methods","volume":"8","author":"Cheng","year":"2012","journal-title":"Mol. Biosyst"},{"key":"2023062213524808500_btaa524-B151","author":"Dai","year":"2019"},{"key":"2023062213524808500_btaa524-B7","first-page":"933","author":"Dauphin","year":"2016"},{"key":"2023062213524808500_btaa524-B152","first-page":"71","author":"Devlin","year":"2019"},{"key":"2023062213524808500_btaa524-B8","first-page":"3371","author":"Gao","year":"2018"},{"key":"2023062213524808500_btaa524-B9","doi-asserted-by":"crossref","first-page":"D1045","DOI":"10.1093\/nar\/gkv1072","article-title":"BindingDB in 2015: a public database for medicinal chemistry, computational chemistry and systems pharmacology","volume":"44","author":"Gilson","year":"2016","journal-title":"Nucleic Acids Res"},{"key":"2023062213524808500_btaa524-B10","doi-asserted-by":"crossref","first-page":"2304","DOI":"10.1093\/bioinformatics\/bts360","article-title":"Predicting drug\u2013target interactions from chemical and genomic kernels using Bayesian matrix factorization","volume":"28","author":"Gonen","year":"2012","journal-title":"Bioinformatics"},{"key":"2023062213524808500_btaa524-B11","doi-asserted-by":"crossref","first-page":"D919","DOI":"10.1093\/nar\/gkm862","article-title":"SuperTarget and Matador: resources for exploring drug\u2013target relationships","volume":"36","author":"Gunther","year":"2007","journal-title":"Nucleic Acids Res"},{"key":"2023062213524808500_btaa524-B12","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1002\/minf.201600045","article-title":"CGBVS-DNN: prediction of compound\u2013protein interactions based on deep learning","volume":"36","author":"Hamanaka","year":"2017","journal-title":"Mol. Inform"},{"key":"2023062213524808500_btaa524-B13","doi-asserted-by":"crossref","first-page":"24","DOI":"10.1186\/s13321-017-0209-z","article-title":"SimBoost: a read-across approach for predicting drug\u2013target binding affinities using gradient boosting machines","volume":"9","author":"He","year":"2017","journal-title":"J. Cheminform"},{"key":"2023062213524808500_btaa524-B14","doi-asserted-by":"crossref","first-page":"2149","DOI":"10.1093\/bioinformatics\/btn409","article-title":"Protein\u2013ligand interaction prediction: an improved chemogenomics approach","volume":"24","author":"Jacob","year":"2008","journal-title":"Bioinformatics"},{"key":"2023062213524808500_btaa524-B15","doi-asserted-by":"crossref","first-page":"3329","DOI":"10.1093\/bioinformatics\/btz111","article-title":"DeepAffinity: interpretable deep learning of compound\u2013protein affinity through unified recurrent and convolutional neural networks","volume":"35","author":"Karimi","year":"2019","journal-title":"Bioinformatics"},{"key":"2023062213524808500_btaa524-B16","author":"Kimothi","year":"2016"},{"key":"2023062213524808500_btaa524-B17","author":"Kipf","year":"2016"},{"key":"2023062213524808500_btaa524-B18","doi-asserted-by":"crossref","first-page":"e0141287","DOI":"10.1371\/journal.pone.0141287","article-title":"Continuous distributed representation of biological sequences for deep proteomics and genomics","volume":"10","author":"Kobeissy","year":"2015","journal-title":"PLoS One"},{"key":"2023062213524808500_btaa524-B19","doi-asserted-by":"crossref","first-page":"e1007129","DOI":"10.1371\/journal.pcbi.1007129","article-title":"DeepConv-DTI: prediction of drug\u2013target interactions via deep learning with convolution on protein sequences","volume":"15","author":"Lee","year":"2019","journal-title":"PLoS Comput. Biol"},{"key":"2023062213524808500_btaa524-B20","doi-asserted-by":"crossref","first-page":"i221","DOI":"10.1093\/bioinformatics\/btv256","article-title":"Improving compound\u2013protein interaction prediction by building up highly credible negative samples","volume":"31","author":"Liu","year":"2015","journal-title":"Bioinformatics"},{"key":"2023062213524808500_btaa524-B21","author":"Liu","year":"2019"},{"key":"2023062213524808500_btaa524-B22","doi-asserted-by":"crossref","first-page":"D198","DOI":"10.1093\/nar\/gkl999","article-title":"BindingDB: a web-accessible database of experimentally determined protein\u2013ligand binding affinities","volume":"35","author":"Liu","year":"2007","journal-title":"Nucleic Acids Res"},{"key":"2023062213524808500_btaa524-B23","author":"Mazzaferro","year":"2017"},{"key":"2023062213524808500_btaa524-B24","author":"Mikolov","year":"2013"},{"key":"2023062213524808500_btaa524-B25","first-page":"3111","article-title":"Distributed representations of words and phrases and their compositionality","volume":"26","author":"Mikolov","year":"2013","journal-title":"Adv. Neural Inform. Process. Syst"},{"key":"2023062213524808500_btaa524-B26","doi-asserted-by":"crossref","first-page":"6582","DOI":"10.1021\/jm300687e","article-title":"Directory of useful decoys, enhanced (DUD-E): better ligands and decoys for better benchmarking","volume":"55","author":"Mysinger","year":"2012","journal-title":"J. Med. Chem"},{"key":"2023062213524808500_btaa524-B27","author":"Nguyen","year":"2019"},{"key":"2023062213524808500_btaa524-B28","doi-asserted-by":"crossref","first-page":"i821","DOI":"10.1093\/bioinformatics\/bty593","article-title":"DeepDTA: deep drug\u2013target binding affinity prediction","volume":"34","author":"Ozturk","year":"2018","journal-title":"Bioinformatics"},{"key":"2023062213524808500_btaa524-B29","author":"\u00d6zt\u00fcrk","year":"2019"},{"key":"2023062213524808500_btaa524-B30","author":"Qiu","year":"2020"},{"key":"2023062213524808500_btaa524-B31","doi-asserted-by":"crossref","first-page":"27","DOI":"10.1038\/d41586-019-02307-y","article-title":"Three pitfalls to avoid in machine learning","volume":"572","author":"Riley","year":"2019","journal-title":"Nature"},{"key":"2023062213524808500_btaa524-B32","doi-asserted-by":"crossref","first-page":"61","DOI":"10.1109\/TNN.2008.2005605","article-title":"The graph neural network model","volume":"20","author":"Scarselli","year":"2009","journal-title":"IEEE Trans. Neural Netw"},{"key":"2023062213524808500_btaa524-B153","doi-asserted-by":"crossref","first-page":"1572","DOI":"10.1021\/acscentsci.9b00576","article-title":"Molecular Transformer: A Model for Uncertainty-Calibrated Chemical Reaction Prediction","volume":"5","author":"Schwaller","year":"2019","journal-title":"ACS Central Science"},{"key":"2023062213524808500_btaa524-B33","doi-asserted-by":"crossref","first-page":"947","DOI":"10.1021\/acs.jcim.8b00712","article-title":"In need of bias control: evaluating chemical data for machine learning in structure-based virtual screening","volume":"59","author":"Sieg","year":"2019","journal-title":"J. Chem. Inf. Model"},{"key":"2023062213524808500_btaa524-B34","doi-asserted-by":"crossref","first-page":"D380","DOI":"10.1093\/nar\/gkv1277","article-title":"STITCH 5: augmenting protein\u2013chemical interaction networks with tissue and affinity data","volume":"44","author":"Szklarczyk","year":"2016","journal-title":"Nucleic Acids Res"},{"key":"2023062213524808500_btaa524-B35","doi-asserted-by":"crossref","first-page":"735","DOI":"10.1021\/ci400709d","article-title":"Making sense of large-scale kinase inhibitor bioactivity data sets: a comparative and integrative analysis","volume":"54","author":"Tang","year":"2014","journal-title":"J. Chem. Inf. Model"},{"key":"2023062213524808500_btaa524-B36","doi-asserted-by":"crossref","first-page":"64","DOI":"10.1016\/j.ymeth.2016.06.024","article-title":"Boosting compound\u2013protein interaction prediction by deep learning","volume":"110","author":"Tian","year":"2016","journal-title":"Methods"},{"key":"2023062213524808500_btaa524-B37","doi-asserted-by":"crossref","first-page":"309","DOI":"10.1093\/bioinformatics\/bty535","article-title":"Compound\u2013protein interaction prediction with end-to-end learning of neural networks for graphs and sequences","volume":"35","author":"Tsubaki","year":"2019","journal-title":"Bioinformatics"},{"key":"2023062213524808500_btaa524-B38","doi-asserted-by":"crossref","first-page":"463","DOI":"10.1038\/s41573-019-0024-5","article-title":"Applications of machine learning in drug discovery and development","volume":"18","author":"Vamathevan","year":"2019","journal-title":"Nat. Rev. Drug Discov"},{"key":"2023062213524808500_btaa524-B39","doi-asserted-by":"crossref","first-page":"3036","DOI":"10.1093\/bioinformatics\/btr500","article-title":"Gaussian interaction profile kernels for predicting drug\u2013target interaction","volume":"27","author":"van Laarhoven","year":"2011","journal-title":"Bioinformatics"},{"key":"2023062213524808500_btaa524-B154","author":"Vaswani","year":"2017"},{"key":"2023062213524808500_btaa524-B40","author":"Wan","year":"2016"},{"key":"2023062213524808500_btaa524-B41","doi-asserted-by":"crossref","first-page":"478","DOI":"10.1016\/j.gpb.2019.04.003","article-title":"DeepCPI: a deep learning-based framework for large-scale in silico drug screening","volume":"17","author":"Wan","year":"2019","journal-title":"Genomics Proteomics Bioinf"},{"key":"2023062213524808500_btaa524-B42","doi-asserted-by":"crossref","first-page":"2821","DOI":"10.1021\/ci200264h","article-title":"Computational screening for active compounds targeting protein sequences: methodology and experimental validation","volume":"51","author":"Wang","year":"2011","journal-title":"J. Chem. Inf. Model"},{"key":"2023062213524808500_btaa524-B43","doi-asserted-by":"crossref","first-page":"i126","DOI":"10.1093\/bioinformatics\/btt234","article-title":"Predicting drug\u2013target interactions using restricted Boltzmann machines","volume":"29","author":"Wang","year":"2013","journal-title":"Bioinformatics"},{"key":"2023062213524808500_btaa524-B44","doi-asserted-by":"crossref","first-page":"D901","DOI":"10.1093\/nar\/gkm958","article-title":"DrugBank: a knowledgebase for drugs, drug actions and drug targets","volume":"36","author":"Wishart","year":"2008","journal-title":"Nucleic Acids Res"},{"key":"2023062213524808500_btaa524-B45","doi-asserted-by":"crossref","first-page":"i232","DOI":"10.1093\/bioinformatics\/btn162","article-title":"Prediction of drug\u2013target interaction networks from the integration of chemical and genomic spaces","volume":"24","author":"Yamanishi","year":"2008","journal-title":"Bioinformatics"},{"key":"2023062213524808500_btaa524-B46","doi-asserted-by":"crossref","first-page":"2642","DOI":"10.1093\/bioinformatics\/bty178","article-title":"Learned protein embeddings for machine learning","volume":"34","author":"Yang","year":"2018","journal-title":"Bioinformatics"},{"key":"2023062213524808500_btaa524-B155","first-page":"2978","author":"Yang","year":"2019"},{"key":"2023062213524808500_btaa524-B47","author":"Zhang","year":"2019"},{"key":"2023062213524808500_btaa524-B48","doi-asserted-by":"crossref","first-page":"134","DOI":"10.1038\/s42256-020-0152-y","article-title":"Predicting drug\u2013protein interaction using quasi-visual question answering system","volume":"2","author":"Zheng","year":"2020","journal-title":"Nat. Mach. Intell"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaa524\/33666312\/btaa524.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/36\/16\/4406\/50676855\/btaa524.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/36\/16\/4406\/50676855\/btaa524.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,6,23]],"date-time":"2023-06-23T12:06:32Z","timestamp":1687521992000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/36\/16\/4406\/5840724"}},"subtitle":[],"editor":[{"given":"Arne","family":"Elofsson","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"editor"}]}],"short-title":[],"issued":{"date-parts":[[2020,5,19]]},"references-count":53,"journal-issue":{"issue":"16","published-print":{"date-parts":[[2020,8,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaa524","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2020,8,15]]},"published":{"date-parts":[[2020,5,19]]}}}