{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T20:34:35Z","timestamp":1772138075566,"version":"3.50.1"},"reference-count":46,"publisher":"Oxford University Press (OUP)","issue":"7","license":[{"start":{"date-parts":[[2025,6,24]],"date-time":"2025-06-24T00:00:00Z","timestamp":1750723200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,7,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Target discovery is crucial in drug development, especially for complex chronic diseases. Recent advances in high-throughput technologies and the explosion of biomedical data have highlighted the potential of computational druggability prediction methods. However, most current methods rely on sequence-based features with machine learning, which often face challenges related to hand-crafted features, reproducibility, and accessibility. Moreover, the potential of raw sequence and protein structure has not been fully investigated.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>Here, we leveraged both protein sequence and structure using deep learning techniques, revealing that protein sequence, especially pre-trained embeddings, is more informative than protein structure. Next, we developed DrugTar, a high-performance deep learning algorithm integrating sequence embeddings from the ESM-2 pre-trained protein language model with gene ontologies to predict druggability. DrugTar achieved areas under the curve and precision\u2013recall curve values of 0.94, outperforming state-of-the-art methods. 
In conclusion, DrugTar streamlines target discovery as a bottleneck in developing novel therapeutics.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>DrugTar is available as a web server at www.DrugTar.com. The data and source code are at https:\/\/github.com\/NBorhani\/DrugTar.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaf360","type":"journal-article","created":{"date-parts":[[2025,6,29]],"date-time":"2025-06-29T01:19:28Z","timestamp":1751159968000},"source":"Crossref","is-referenced-by-count":3,"title":["DrugTar improves druggability prediction by integrating large language models and gene ontologies"],"prefix":"10.1093","volume":"41","author":[{"ORCID":"https:\/\/orcid.org\/0009-0004-1704-850X","authenticated-orcid":false,"given":"Niloofar","family":"Borhani","sequence":"first","affiliation":[{"name":"Department of Electrical and Computer Engineering, Isfahan University of Technology , Isfahan 84156-83111,","place":["Iran"]},{"name":"Regenerative Medicine Research Center, Isfahan University of Medical Sciences , Isfahan 81746-73461,","place":["Iran"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0476-502X","authenticated-orcid":false,"given":"Iman","family":"Izadi","sequence":"additional","affiliation":[{"name":"Department of Electrical and Computer Engineering, Isfahan University of Technology , Isfahan 84156-83111,","place":["Iran"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1140-3257","authenticated-orcid":false,"given":"Ali","family":"Motahharynia","sequence":"additional","affiliation":[{"name":"Regenerative Medicine Research Center, Isfahan University of Medical Sciences , Isfahan 81746-73461,","place":["Iran"]},{"name":"Isfahan Neuroscience Research Center, Isfahan University of Medical Sciences , Isfahan 
81839-83434,","place":["Iran"]}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-1221-5001","authenticated-orcid":false,"given":"Mahsa","family":"Sheikholeslami","sequence":"additional","affiliation":[{"name":"Regenerative Medicine Research Center, Isfahan University of Medical Sciences , Isfahan 81746-73461,","place":["Iran"]},{"name":"Department of Medicinal Chemistry, Isfahan University of Medical Sciences , Isfahan 81746-73461,","place":["Iran"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9665-1091","authenticated-orcid":false,"given":"Yousof","family":"Gheisari","sequence":"additional","affiliation":[{"name":"Regenerative Medicine Research Center, Isfahan University of Medical Sciences , Isfahan 81746-73461,","place":["Iran"]},{"name":"Department of Genetics and Molecular Biology, Isfahan University of Medical Sciences , Isfahan 81746-73461,","place":["Iran"]}]}],"member":"286","published-online":{"date-parts":[[2025,6,24]]},"reference":[{"key":"2025073113315032800_btaf360-B1","doi-asserted-by":"crossref","first-page":"23452","DOI":"10.1038\/s41598-021-02282-3","article-title":"Systems biology and machine learning approaches identify drug targets in diabetic nephropathy","volume":"11","author":"Abedi","year":"2021","journal-title":"Sci Rep"},{"key":"2025073113315032800_btaf360-B2","doi-asserted-by":"crossref","first-page":"1315","DOI":"10.1038\/s41592-019-0598-1","article-title":"Unified rational protein engineering with sequence-based deep representation learning","volume":"16","author":"Alley","year":"2019","journal-title":"Nat Methods"},{"key":"2025073113315032800_btaf360-B3","doi-asserted-by":"crossref","first-page":"e0141287","DOI":"10.1371\/journal.pone.0141287","article-title":"Continuous distributed representation of biological sequences for deep proteomics and genomics","volume":"10","author":"Asgari","year":"2015","journal-title":"PLoS 
One"},{"key":"2025073113315032800_btaf360-B4","doi-asserted-by":"crossref","first-page":"451","DOI":"10.1093\/bioinformatics\/btp002","article-title":"Properties and identification of human protein drug targets","volume":"25","author":"Bakheet","year":"2009","journal-title":"Bioinformatics"},{"key":"2025073113315032800_btaf360-B5","doi-asserted-by":"crossref","first-page":"235","DOI":"10.1093\/nar\/28.1.235","article-title":"The Protein Data Bank","volume":"28","author":"Berman","year":"2000","journal-title":"Nucleic Acids Res"},{"key":"2025073113315032800_btaf360-B6","doi-asserted-by":"crossref","first-page":"53","DOI":"10.1186\/s12859-022-04569-2","article-title":"A deep learning approach to predict inter-omics interactions in multi-layer networks","volume":"23","author":"Borhani","year":"2022","journal-title":"BMC Bioinformatics"},{"key":"2025073113315032800_btaf360-B7","doi-asserted-by":"crossref","first-page":"104883","DOI":"10.1016\/j.isci.2022.104883","article-title":"Computational prediction and interpretation of druggable proteins using a stacked ensemble-learning framework","volume":"25","author":"Charoenkwan","year":"2022","journal-title":"iScience"},{"key":"2025073113315032800_btaf360-B8","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1002\/pro.4555","article-title":"QuoteTarget: a sequence-based transformer protein language model to identify potentially druggable protein targets","volume":"32","author":"Chen","year":"2023","journal-title":"Protein Sci"},{"key":"2025073113315032800_btaf360-B9","doi-asserted-by":"crossref","first-page":"20170387","DOI":"10.1098\/rsif.2017.0387","article-title":"Opportunities and obstacles for deep learning in biology and medicine","volume":"15","author":"Ching","year":"2018","journal-title":"J R Soc Interface"},{"key":"2025073113315032800_btaf360-B10","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s12916-014-0241-z","article-title":"Transparent reporting of a multivariable prediction model for 
individual prognosis or diagnosis (TRIPOD): the TRIPOD statement","volume":"13","author":"Collins","year":"2015","journal-title":"BMC Med"},{"key":"2025073113315032800_btaf360-B11","doi-asserted-by":"crossref","first-page":"1170","DOI":"10.1038\/s41374-022-00830-7","article-title":"High-throughput proteomics: a methodological mini-review","volume":"102","author":"Cui","year":"2022","journal-title":"Lab Invest"},{"key":"2025073113315032800_btaf360-B12","doi-asserted-by":"crossref","first-page":"W612","DOI":"10.1093\/nar\/gkv352","article-title":"ChEMBL web services: streamlining access to drug discovery data and utilities","volume":"43","author":"Davies","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"2025073113315032800_btaf360-B13","doi-asserted-by":"crossref","first-page":"1914","DOI":"10.1038\/s41467-022-29443-w","article-title":"Learning meaningful representations of protein sequences","volume":"13","author":"Detlefsen","year":"2022","journal-title":"Nat Commun"},{"key":"2025073113315032800_btaf360-B14","first-page":"4171","author":"Devlin","year":"2019"},{"key":"2025073113315032800_btaf360-B15","doi-asserted-by":"crossref","first-page":"283","DOI":"10.1186\/1471-2105-11-283","article-title":"Optimal contact definition for reconstruction of contact maps","volume":"11","author":"Duarte","year":"2010","journal-title":"BMC Bioinformatics"},{"key":"2025073113315032800_btaf360-B16","doi-asserted-by":"crossref","first-page":"184","DOI":"10.1038\/s41592-019-0666-6","article-title":"Deciphering interaction fingerprints from protein molecular surfaces using geometric deep learning","volume":"17","author":"Gainza","year":"2020","journal-title":"Nat Methods"},{"key":"2025073113315032800_btaf360-B17","doi-asserted-by":"crossref","first-page":"100142","DOI":"10.1016\/j.patter.2020.100142","article-title":"Deep learning in protein structural modeling and design","volume":"1","author":"Gao","year":"2020","journal-title":"Patterns (N 
Y)"},{"key":"2025073113315032800_btaf360-B18","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1093\/bioinformatics\/btad738","article-title":"Seq-InSite: sequence supersedes structure for protein interaction site prediction","volume":"40","author":"Hosseini","year":"2024","journal-title":"Bioinformatics"},{"key":"2025073113315032800_btaf360-B19","doi-asserted-by":"crossref","first-page":"209","DOI":"10.2174\/1389200219666180925091851","article-title":"A review of recent advances and research on drug target identification methods","volume":"20","author":"Hu","year":"2019","journal-title":"Curr Drug Metab"},{"key":"2025073113315032800_btaf360-B20","doi-asserted-by":"crossref","first-page":"750","DOI":"10.1016\/j.jtbi.2009.11.002","article-title":"Predict potential drug targets from the ion channel proteins based on SVM","volume":"262","author":"Huang","year":"2010","journal-title":"J Theor Biol"},{"key":"2025073113315032800_btaf360-B21","doi-asserted-by":"crossref","first-page":"718","DOI":"10.1016\/j.drudis.2016.01.007","article-title":"DrugMiner: comparative analysis of machine learning algorithms for prediction of potential druggable proteins","volume":"21","author":"Jamali","year":"2016","journal-title":"Drug Discov Today"},{"key":"2025073113315032800_btaf360-B22","doi-asserted-by":"crossref","first-page":"8360","DOI":"10.1038\/s41598-022-12201-9","article-title":"Prediction of protein\u2013protein interaction using graph neural networks","volume":"12","author":"Jha","year":"2022","journal-title":"Sci Rep"},{"key":"2025073113315032800_btaf360-B23","doi-asserted-by":"crossref","first-page":"175","DOI":"10.1016\/j.compbiomed.2014.11.008","article-title":"Identification of human drug targets using machine-learning algorithms","volume":"56","author":"Kumari","year":"2015","journal-title":"Comput Biol Med"},{"key":"2025073113315032800_btaf360-B24","doi-asserted-by":"crossref","first-page":"436","DOI":"10.1038\/nature14539","article-title":"Deep 
learning","volume":"521","author":"LeCun","year":"2015","journal-title":"Nature"},{"key":"2025073113315032800_btaf360-B25","doi-asserted-by":"crossref","first-page":"353","DOI":"10.1186\/1471-2105-8-353","article-title":"Prediction of potential drug targets based on simple sequence properties","volume":"8","author":"Li","year":"2007","journal-title":"BMC Bioinformatics"},{"key":"2025073113315032800_btaf360-B26","doi-asserted-by":"crossref","first-page":"35","DOI":"10.1016\/j.artmed.2019.07.005","article-title":"Accurate prediction of potential druggable proteins based on genetic algorithm and Bagging-SVM ensemble classifier","volume":"98","author":"Lin","year":"2019","journal-title":"Artif Intell Med"},{"key":"2025073113315032800_btaf360-B27","doi-asserted-by":"crossref","first-page":"1123","DOI":"10.1126\/science.ade2574","article-title":"Evolutionary-scale prediction of atomic-level protein structure with a language model","volume":"379","author":"Lin","year":"2023","journal-title":"Science"},{"key":"2025073113315032800_btaf360-B28","doi-asserted-by":"crossref","first-page":"1604","DOI":"10.1093\/bib\/bbz176","article-title":"Biomedical data and computational models for drug repositioning: a comprehensive review","volume":"22","author":"Luo","year":"2021","journal-title":"Brief Bioinform"},{"key":"2025073113315032800_btaf360-B29","doi-asserted-by":"crossref","first-page":"188","DOI":"10.1038\/nrd3368","article-title":"Impact of high-throughput screening in biomedical research","volume":"10","author":"Macarron","year":"2011","journal-title":"Nat Rev Drug Discov"},{"key":"2025073113315032800_btaf360-B30","doi-asserted-by":"crossref","first-page":"121","DOI":"10.1016\/j.csbj.2018.02.009","article-title":"Prediction of dyslipidemia using gene mutations, family history of diseases and anthropometric indicators in children and adolescents: the CASPIAN-III study","volume":"16","author":"Marateb","year":"2018","journal-title":"Comput Struct Biotechnol 
J"},{"key":"2025073113315032800_btaf360-B31","doi-asserted-by":"crossref","first-page":"360","DOI":"10.1007\/s008940100038","article-title":"Generation and evaluation of dimension-reduced amino acid parameter representations by artificial neural networks","volume":"7","author":"Meiler","year":"2001","journal-title":"J. Mol. Model"},{"key":"2025073113315032800_btaf360-B32","doi-asserted-by":"crossref","first-page":"1750","DOI":"10.1016\/j.csbj.2021.03.022","article-title":"The language of proteins: NLP, machine learning & protein sequences","volume":"19","author":"Ofer","year":"2021","journal-title":"Comput Struct Biotechnol J"},{"key":"2025073113315032800_btaf360-B33","first-page":"77","author":"Qi","year":"2017"},{"key":"2025073113315032800_btaf360-B34","doi-asserted-by":"crossref","first-page":"1291","DOI":"10.1038\/s42003-022-04245-4","article-title":"DrugnomeAI is an ensemble machine-learning framework for predicting druggability of candidate drug targets","volume":"5","author":"Raies","year":"2022","journal-title":"Commun Biol"},{"key":"2025073113315032800_btaf360-B35","first-page":"1","article-title":"Evaluating protein transfer learning with TAPE","volume":"32","author":"Rao","year":"2019","journal-title":"Adv Neural Inf Process Syst"},{"key":"2025073113315032800_btaf360-B36","doi-asserted-by":"crossref","first-page":"5505","DOI":"10.1038\/s41598-022-09484-3","article-title":"XGB-DrugPred: computational prediction of druggable proteins using eXtreme gradient boosting and optimized features set","volume":"12","author":"Sikander","year":"2022","journal-title":"Sci Rep"},{"key":"2025073113315032800_btaf360-B37","doi-asserted-by":"crossref","first-page":"334","DOI":"10.1007\/s40484-018-0157-2","article-title":"Analysis of protein features and machine learning algorithms for prediction of druggable proteins","volume":"6","author":"Sun","year":"2018","journal-title":"Quant 
Biol"},{"key":"2025073113315032800_btaf360-B38","first-page":"2023","author":"Tian","year":"2023"},{"key":"2025073113315032800_btaf360-B39","doi-asserted-by":"crossref","first-page":"730","DOI":"10.1038\/s41592-022-01490-7","article-title":"ScanNet: an interpretable geometric deep learning model for structure-based protein binding site prediction","volume":"19","author":"Tubiana","year":"2022","journal-title":"Nat Methods"},{"key":"2025073113315032800_btaf360-B40","first-page":"2579","article-title":"Visualizing data using t-SNE","volume":"9","author":"Van der Maaten","year":"2008","journal-title":"J Mach Learn Res"},{"key":"2025073113315032800_btaf360-B41","doi-asserted-by":"crossref","first-page":"162","DOI":"10.1093\/bioinformatics\/btaa701","article-title":"Unsupervised protein embeddings outperform hand-crafted sequence and structure features at predicting molecular function","volume":"37","author":"Villegas-Morcillo","year":"2021","journal-title":"Bioinformatics"},{"key":"2025073113315032800_btaf360-B42","doi-asserted-by":"crossref","first-page":"D1074","DOI":"10.1093\/nar\/gkx1037","article-title":"DrugBank 5.0: a major update to the DrugBank database for 2018","volume":"46","author":"Wishart","year":"2018","journal-title":"Nucleic Acids Res"},{"key":"2025073113315032800_btaf360-B43","doi-asserted-by":"crossref","first-page":"219","DOI":"10.1016\/j.jare.2022.01.009","article-title":"The applications of deep learning algorithms on in silico druggable proteins identification","volume":"41","author":"Yu","year":"2022","journal-title":"J Adv Res"},{"key":"2025073113315032800_btaf360-B44","doi-asserted-by":"crossref","first-page":"D1180","DOI":"10.1093\/nar\/gkad1004","article-title":"The ChEMBL database in 2023: a drug discovery platform spanning multiple bioactivity data types and time periods","volume":"52","author":"Zdrazil","year":"2024","journal-title":"Nucleic Acids 
Res"},{"key":"2025073113315032800_btaf360-B45","doi-asserted-by":"crossref","first-page":"234","DOI":"10.3390\/pharmaceutics14020234","article-title":"Prediction of drug targets for specific diseases leveraging gene perturbation data: a machine learning approach","volume":"14","author":"Zhao","year":"2022","journal-title":"Pharmaceutics"},{"key":"2025073113315032800_btaf360-B46","doi-asserted-by":"crossref","first-page":"2141","DOI":"10.1093\/bib\/bbaa044","article-title":"Identifying drug\u2013target interactions based on graph convolutional network and deep neural network","volume":"22","author":"Zhao","year":"2021","journal-title":"Brief Bioinform"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaf360\/63569311\/btaf360.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/7\/btaf360\/63569311\/btaf360.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/7\/btaf360\/63569311\/btaf360.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,31]],"date-time":"2025-07-31T17:32:01Z","timestamp":1753983121000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btaf360\/8172716"}},"subtitle":[],"editor":[{"given":"Peter","family":"Robinson","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2025,6,24]]},"references-count":46,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2025,7,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaf360","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2
024.09.21.614218","asserted-by":"object"}]},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2025,7]]},"published":{"date-parts":[[2025,6,24]]},"article-number":"btaf360"}}