{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,13]],"date-time":"2026-04-13T19:17:21Z","timestamp":1776107841524,"version":"3.50.1"},"reference-count":39,"publisher":"Oxford University Press (OUP)","issue":"6","license":[{"start":{"date-parts":[[2020,10,18]],"date-time":"2020-10-18T00:00:00Z","timestamp":1602979200000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["SCH SCH-2014438"],"award-info":[{"award-number":["SCH SCH-2014438"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["IIS-1418511"],"award-info":[{"award-number":["IIS-1418511"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["CCF-1533768"],"award-info":[{"award-number":["CCF-1533768"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["IIS-1838042"],"award-info":[{"award-number":["IIS-1838042"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"name":"National Institute of Health","award":["R01 1R01NS107291-01"],"award-info":[{"award-number":["R01 1R01NS107291-01"]}]},{"name":"National Institute of Health","award":["R56HL138415"],"award-info":[{"award-number":["R56HL138415"]}]},{"name":"IQVIA"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,5,5]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Drug\u2013target interaction (DTI) prediction is a foundational task for in-silico drug discovery, which is costly and time-consuming due to the need of experimental search over large drug compound space. Recent years have witnessed promising progress for deep learning in DTI predictions. However, the following challenges are still open: (i) existing molecular representation learning approaches ignore the sub-structural nature of DTI, thus produce results that are less accurate and difficult to explain and (ii) existing methods focus on limited labeled data while ignoring the value of massive unlabeled molecular data.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>We propose a Molecular Interaction Transformer (MolTrans) to address these limitations via: (i) knowledge inspired sub-structural pattern mining algorithm and interaction modeling module for more accurate and interpretable DTI prediction and (ii) an augmented transformer encoder to better extract and capture the semantic relations among sub-structures extracted from massive unlabeled biomedical data. We evaluate MolTrans on real-world data and show it improved DTI prediction performance compared to state-of-the-art baselines.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>The model scripts are available at https:\/\/github.com\/kexinhuang12345\/moltrans.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaa880","type":"journal-article","created":{"date-parts":[[2020,10,7]],"date-time":"2020-10-07T19:14:30Z","timestamp":1602098070000},"page":"830-836","source":"Crossref","is-referenced-by-count":524,"title":["MolTrans: Molecular Interaction Transformer for drug\u2013target interaction prediction"],"prefix":"10.1093","volume":"37","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6693-8390","authenticated-orcid":false,"given":"Kexin","family":"Huang","sequence":"first","affiliation":[{"name":"Health Data Science, Harvard University , Boston, MA 02120, USA"}]},{"given":"Cao","family":"Xiao","sequence":"additional","affiliation":[{"name":"Analytics Center of Excellence, IQVIA , Cambridge, MA 02139, USA"}]},{"given":"Lucas M","family":"Glass","sequence":"additional","affiliation":[{"name":"Analytics Center of Excellence, IQVIA , Cambridge, MA 02139, USA"}]},{"given":"Jimeng","family":"Sun","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Illinois at Urbana-Champaign , Urbana, IL 61801, USA"}]}],"member":"286","published-online":{"date-parts":[[2020,10,18]]},"reference":[{"key":"2023051705210136900_btaa880-B1","first-page":"217","volume-title":"Annual Reports in Computational Chemistry","author":"Bolton","year":"2008"},{"key":"2023051705210136900_btaa880-B2","doi-asserted-by":"crossref","first-page":"89","DOI":"10.1007\/978-1-59745-535-0_4","volume-title":"Plant Bioinformatics","author":"Boutet","year":"2007"},{"key":"2023051705210136900_btaa880-B3","first-page":"14","article-title":"High-throughput screening for drug discovery","volume":"384","author":"Broach","year":"1996","journal-title":"Nature"},{"key":"2023051705210136900_btaa880-B4","doi-asserted-by":"crossref","first-page":"960","DOI":"10.1093\/bioinformatics\/btt072","article-title":"propy: a tool to generate various modes of Chou\u2019s PseAAC","volume":"29","author":"Cao","year":"2013","journal-title":"Bioinformatics"},{"key":"2023051705210136900_btaa880-B5","doi-asserted-by":"crossref","first-page":"1046","DOI":"10.1038\/nbt.1990","article-title":"Comprehensive analysis of kinase inhibitor selectivity","volume":"29","author":"Davis","year":"2011","journal-title":"Nat. Biotechnol"},{"key":"2023051705210136900_btaa880-B6","doi-asserted-by":"crossref","first-page":"8700","DOI":"10.1073\/pnas.92.19.8700","article-title":"Prediction of protein folding class using global description of amino acid sequence","volume":"92","author":"Dubchak","year":"1995","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023051705210136900_btaa880-B7","doi-asserted-by":"crossref","first-page":"3593","DOI":"10.1016\/j.febslet.2011.10.028","article-title":"Crystal structure of the EphA4 protein tyrosine kinase domain in the apo-and dasatinib-bound state","volume":"585","author":"Farenc","year":"2011","journal-title":"FEBS Lett"},{"key":"2023051705210136900_btaa880-B8","first-page":"23","article-title":"A new algorithm for data compression","volume":"12","author":"Gage","year":"1994","journal-title":"C Users J"},{"key":"2023051705210136900_btaa880-B9","author":"Gao","year":"2018"},{"key":"2023051705210136900_btaa880-B10","doi-asserted-by":"crossref","first-page":"D1100","DOI":"10.1093\/nar\/gkr777","article-title":"ChEMBL: a large-scale bioactivity database for drug discovery","volume":"40","author":"Gaulton","year":"2012","journal-title":"Nucleic Acids Res"},{"key":"2023051705210136900_btaa880-B11","author":"Gong","year":"2018"},{"key":"2023051705210136900_btaa880-B12","doi-asserted-by":"crossref","first-page":"24","DOI":"10.1186\/s13321-017-0209-z","article-title":"SimBoost: a read-across approach for predicting drug\u2013target binding affinities using gradient boosting machines","volume":"9","author":"He","year":"2017","journal-title":"J. Cheminform"},{"key":"2023051705210136900_btaa880-B13","doi-asserted-by":"crossref","first-page":"5947","DOI":"10.4249\/scholarpedia.5947","article-title":"Deep belief networks","volume":"4","author":"Hinton","year":"2009","journal-title":"Scholarpedia"},{"key":"2023051705210136900_btaa880-B14","first-page":"599","article-title":"A practical guide to training restricted Boltzmann machines","author":"Hinton","year":"2012","journal-title":"Neural Networks: Tricks of the Trade"},{"key":"2023051705210136900_btaa880-B15","doi-asserted-by":"crossref","first-page":"1239","DOI":"10.1111\/j.1476-5381.2010.01127.x","article-title":"Principles of early drug discovery","volume":"162","author":"Hughes","year":"2011","journal-title":"Br. J. Pharmacol"},{"key":"2023051705210136900_btaa880-B16","doi-asserted-by":"crossref","first-page":"111","DOI":"10.1038\/nrd2683","article-title":"Mechanisms of drug combinations: interaction and network perspectives","volume":"8","author":"Jia","year":"2009","journal-title":"Nat. Rev. Drug Disc"},{"key":"2023051705210136900_btaa880-B17","author":"Krizhevsky","year":"2012"},{"key":"2023051705210136900_btaa880-B18","doi-asserted-by":"crossref","first-page":"26926","DOI":"10.1074\/jbc.M113.490706","article-title":"Histone deacetylase (HDAC) inhibitor kinetic rate constants correlate with cellular histone acetylation but not transcription and cell viability","volume":"288","author":"Lauffer","year":"2013","journal-title":"J. Biol. Chem"},{"key":"2023051705210136900_btaa880-B19","doi-asserted-by":"crossref","first-page":"e1007129","DOI":"10.1371\/journal.pcbi.1007129","article-title":"DeepConv-DTI: prediction of drug-target interactions via deep learning with convolution on protein sequences","volume":"15","author":"Lee","year":"2019","journal-title":"PLoS Comput. Biol"},{"key":"2023051705210136900_btaa880-B20","doi-asserted-by":"crossref","first-page":"130","DOI":"10.1042\/bj0630130","article-title":"Inhibition of cytochrome systems of heart muscle and certain bacteria by the antagonists of dihydrostreptomycin: 2-alkyl-4-hydroxyquinoline N-oxides","volume":"63","author":"Lightbown","year":"1956","journal-title":"Biochem. J"},{"key":"2023051705210136900_btaa880-B21","doi-asserted-by":"crossref","first-page":"D198","DOI":"10.1093\/nar\/gkl999","article-title":"BindingDB: a web-accessible database of experimentally determined protein\u2013ligand binding affinities","volume":"35","author":"Liu","year":"2007","journal-title":"Nucleic Acids Res"},{"key":"2023051705210136900_btaa880-B22","doi-asserted-by":"crossref","first-page":"5441","DOI":"10.1039\/C8SC00148K","article-title":"Large-scale comparison of machine learning methods for drug target prediction on chEMBL","volume":"9","author":"Mayr","year":"2018","journal-title":"Chem. Sci"},{"key":"2023051705210136900_btaa880-B23","doi-asserted-by":"crossref","first-page":"i821","DOI":"10.1093\/bioinformatics\/bty593","article-title":"DeepDTA: deep drug\u2013target binding affinity prediction","volume":"34","author":"\u00d6zt\u00fcrk","year":"2018","journal-title":"Bioinformatics"},{"key":"2023051705210136900_btaa880-B24","author":"\u00d6zt\u00fcrk","year":"2019"},{"key":"2023051705210136900_btaa880-B25","doi-asserted-by":"crossref","first-page":"325","DOI":"10.1093\/bib\/bbu010","article-title":"Toward more realistic drug\u2013target interaction predictions","volume":"16","author":"Pahikkala","year":"2015","journal-title":"Brief. Bioinform"},{"key":"2023051705210136900_btaa880-B26","author":"Paszke","year":"2019"},{"key":"2023051705210136900_btaa880-B27","doi-asserted-by":"crossref","first-page":"742","DOI":"10.1021\/ci100050t","article-title":"Extended-connectivity fingerprints","volume":"50","author":"Rogers","year":"2010","journal-title":"J. Chem. Inf. Model"},{"key":"2023051705210136900_btaa880-B28","doi-asserted-by":"crossref","first-page":"232","DOI":"10.1038\/nchembio.1199","article-title":"Target identification and mechanism of action in chemical biology and drug discovery","volume":"9","author":"Schenone","year":"2013","journal-title":"Nat. Chem. Biol"},{"key":"2023051705210136900_btaa880-B29","author":"Sennrich","year":"2016"},{"key":"2023051705210136900_btaa880-B30","doi-asserted-by":"crossref","first-page":"309","DOI":"10.1093\/bioinformatics\/bty535","article-title":"Compound-protein interaction prediction with end-to-end learning of neural networks for graphs and sequences","volume":"35","author":"Tsubaki","year":"2019","journal-title":"Bioinformatics"},{"key":"2023051705210136900_btaa880-B31","author":"Unterthiner","year":"2014"},{"key":"2023051705210136900_btaa880-B32","author":"Vaswani","year":"2017"},{"key":"2023051705210136900_btaa880-B33","doi-asserted-by":"crossref","first-page":"W623","DOI":"10.1093\/nar\/gkp456","article-title":"PubChem: a public information system for analyzing bioactivities of small molecules","volume":"37","author":"Wang","year":"2009","journal-title":"Nucleic Acids Res"},{"key":"2023051705210136900_btaa880-B34","doi-asserted-by":"crossref","first-page":"1401","DOI":"10.1021\/acs.jproteome.6b00618","article-title":"Deep-learning-based drug\u2013target interaction prediction","volume":"16","author":"Wen","year":"2017","journal-title":"J. Proteome Res"},{"key":"2023051705210136900_btaa880-B35","doi-asserted-by":"crossref","first-page":"D901","DOI":"10.1093\/nar\/gkm958","article-title":"DrugBank: a knowledgebase for drugs, drug actions and drug targets","volume":"36","author":"Wishart","year":"2008","journal-title":"Nucleic Acids Res"},{"key":"2023051705210136900_btaa880-B36","author":"Zhang","year":"2018"},{"key":"2023051705210136900_btaa880-B37","author":"Zheng","year":"2013"},{"key":"2023051705210136900_btaa880-B38","article-title":"BioSNAP datasets: Stanford biomedical network dataset collection","author":"Zitnik","year":"2018"},{"key":"2023051705210136900_btaa880-B39","doi-asserted-by":"crossref","first-page":"i457","DOI":"10.1093\/bioinformatics\/bty294","article-title":"Modeling polypharmacy side effects with graph convolutional networks","volume":"34","author":"Zitnik","year":"2018","journal-title":"Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaa880\/35204223\/btaa880.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/6\/830\/50357346\/btaa880.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/6\/830\/50357346\/btaa880.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,5,17]],"date-time":"2023-05-17T05:22:12Z","timestamp":1684300932000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/37\/6\/830\/5929692"}},"subtitle":[],"editor":[{"given":"Zhiyong","family":"Lu","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2020,10,18]]},"references-count":39,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2021,5,5]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaa880","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2021,3,15]]},"published":{"date-parts":[[2020,10,18]]}}}