{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,1]],"date-time":"2025-11-01T21:32:07Z","timestamp":1762032727934,"version":"build-2065373602"},"reference-count":66,"publisher":"Oxford University Press (OUP)","issue":"11","license":[{"start":{"date-parts":[[2025,10,29]],"date-time":"2025-10-29T00:00:00Z","timestamp":1761696000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100004410","name":"Scientific and Technological Research Council of Turkey","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100004410","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,11,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Protein language models (pLMs) have emerged as powerful tools for capturing the intricate information encoded in protein sequences, facilitating various downstream protein prediction tasks. With numerous pLMs available, there is a critical need for diverse benchmarks to systematically evaluate their performance across biologically relevant tasks. Here, we introduce DARKIN, a zero-shot classification benchmark designed to assign phosphosites to understudied kinases, termed dark kinases. Kinases, which catalyze phosphorylation, are central to cellular signaling pathways. While phosphoproteomics enables the large-scale identification of phosphosites, determining the cognate kinase responsible for the phosphorylation event remains an experimental challenge.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>In DARKIN, we prepared training, validation, and test folds that respect the zero-shot nature of this classification problem, incorporating stratification based on kinase groups and sequence similarity. We evaluated multiple pLMs using two zero-shot classifiers: a novel, training-free k-NN-based method, and a bilinear classifier. Our findings indicate that ESM, ProtT5-XL, and SaProt exhibit superior performance on this task. DARKIN provides a challenging benchmark for assessing pLM efficacy and fosters deeper exploration of under-characterized (dark) kinases by offering a biologically relevant test bed.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>The DARKIN benchmark data and the scripts for generating additional splits are publicly available at: https:\/\/github.com\/tastanlab\/darkin<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaf480","type":"journal-article","created":{"date-parts":[[2025,8,28]],"date-time":"2025-08-28T11:45:03Z","timestamp":1756381503000},"source":"Crossref","is-referenced-by-count":0,"title":["DARKIN: a zero-shot benchmark for phosphosite\u2013dark kinase association using protein language models"],"prefix":"10.1093","volume":"41","author":[{"given":"Emine Ay\u015fe","family":"Sunar","sequence":"first","affiliation":[{"name":"Faculty of Engineering and Natural Sciences, Sabanci University , Istanbul 34956, T\u00fcrkiye"}]},{"given":"Zeynep","family":"I\u015f\u0131k","sequence":"additional","affiliation":[{"name":"Faculty of Engineering and Natural Sciences, Sabanci University , Istanbul 34956, T\u00fcrkiye"}]},{"given":"Mert","family":"Pekey","sequence":"additional","affiliation":[{"name":"Faculty of Engineering and Natural Sciences, Sabanci University , Istanbul 34956, T\u00fcrkiye"}]},{"given":"Ramazan G\u00f6kberk","family":"Cinbi\u015f","sequence":"additional","affiliation":[{"name":"Department of Computer Engineering, Middle East Technical University , Ankara 06800, T\u00fcrkiye"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7058-5372","authenticated-orcid":false,"given":"Oznur","family":"Tastan","sequence":"additional","affiliation":[{"name":"Faculty of Engineering and Natural Sciences, Sabanci University , Istanbul 34956, T\u00fcrkiye"}]}],"member":"286","published-online":{"date-parts":[[2025,10,29]]},"reference":[{"key":"2025110117301039800_btaf480-B2","doi-asserted-by":"crossref","first-page":"W547","DOI":"10.1093\/nar\/gkaf394","article-title":"The UniProt website API: facilitating programmatic access to protein knowledge","volume":"53","author":"Ahmad","year":"2025","journal-title":"Nucleic Acids Res"},{"first-page":"2927","year":"2015","author":"Akata","key":"2025110117301039800_btaf480-B3"},{"key":"2025110117301039800_btaf480-B4","doi-asserted-by":"crossref","first-page":"1425","DOI":"10.1109\/TPAMI.2015.2487986","article-title":"Label-embedding for image classification","volume":"38","author":"Akata","year":"2016","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"2025110117301039800_btaf480-B5","doi-asserted-by":"crossref","first-page":"e0141287","DOI":"10.1371\/journal.pone.0141287","article-title":"Continuous distributed representation of biological sequences for deep proteomics and genomics","volume":"10","author":"Asgari","year":"2015","journal-title":"PloS One"},{"key":"2025110117301039800_btaf480-B6","doi-asserted-by":"crossref","first-page":"304","DOI":"10.1093\/nar\/28.1.304","article-title":"The ENZYME database in 2000","volume":"28","author":"Bairoch","year":"2000","journal-title":"Nucleic Acids Res"},{"key":"2025110117301039800_btaf480-B7","doi-asserted-by":"crossref","first-page":"D154","DOI":"10.1093\/nar\/gki070","article-title":"The Universal Protein Resource (UniProt)","volume":"33","author":"Bairoch","year":"2005","journal-title":"Nucleic Acids Res"},{"key":"2025110117301039800_btaf480-B8","doi-asserted-by":"crossref","first-page":"D529","DOI":"10.1093\/nar\/gkaa853","article-title":"The dark kinase knowledgebase: an online compendium of knowledge and experimental results of understudied kinases","volume":"49","author":"Berginski","year":"2021","journal-title":"Nucleic Acids Res"},{"key":"2025110117301039800_btaf480-B9","doi-asserted-by":"crossref","first-page":"1444","DOI":"10.1038\/s41592-024-02362-y","article-title":"Guiding questions to avoid data leakage in biological machine learning applications","volume":"21","author":"Bernett","year":"2024","journal-title":"Nat Methods"},{"key":"2025110117301039800_btaf480-B10","doi-asserted-by":"crossref","first-page":"355","DOI":"10.1038\/35077225","article-title":"Oncogenic kinase signalling","volume":"411","author":"Blume-Jensen","year":"2001","journal-title":"Nature"},{"key":"2025110117301039800_btaf480-B11","doi-asserted-by":"crossref","first-page":"2102","DOI":"10.1093\/bioinformatics\/btac020","article-title":"ProteinBERT: a universal deep-learning model of protein sequence and function","volume":"38","author":"Brandes","year":"2022","journal-title":"Bioinformatics"},{"year":"2025","author":"Bri\u00e8re","key":"2025110117301039800_btaf480-B12","doi-asserted-by":"publisher","DOI":"10.1101\/2025.01.23.634511"},{"year":"2025","author":"Beyza \u00c7and\u0131r","key":"2025110117301039800_btaf480-B13","doi-asserted-by":"publisher","DOI":"10.1101\/2025.04.18.649584"},{"key":"2025110117301039800_btaf480-B14","doi-asserted-by":"crossref","first-page":"1989","DOI":"10.1038\/s41467-023-37572-z","article-title":"Improving the generalizability of protein-ligand binding predictions with AI-bind","volume":"14","author":"Chatterjee","year":"2023","journal-title":"Nat Commun"},{"key":"2025110117301039800_btaf480-B15","doi-asserted-by":"crossref","first-page":"E127","DOI":"10.1038\/ncb0502-e127","article-title":"The origins of protein phosphorylation","volume":"4","author":"Cohen","year":"2002","journal-title":"Nat Cell Biol"},{"key":"2025110117301039800_btaf480-B16","doi-asserted-by":"crossref","first-page":"551","DOI":"10.1038\/s41573-021-00195-4","article-title":"Kinase drug discovery 20 years after imatinib: progress and future directions","volume":"20","author":"Cohen","year":"2021","journal-title":"Nat Rev Drug Discov"},{"year":"2019","author":"Devlin","key":"2025110117301039800_btaf480-B17"},{"key":"2025110117301039800_btaf480-B18","doi-asserted-by":"crossref","first-page":"3652","DOI":"10.1093\/bioinformatics\/btaa013","article-title":"Deepkinzero: zero-shot learning for predicting kinase\u2013phosphosite associations involving understudied kinases","volume":"36","author":"Deznabi","year":"2020","journal-title":"Bioinformatics"},{"key":"2025110117301039800_btaf480-B19","doi-asserted-by":"crossref","first-page":"16","DOI":"10.1186\/s12859-016-1433-7","article-title":"Kinmap: a web-based tool for interactive navigation through human kinome data","volume":"18","author":"Eid","year":"2017","journal-title":"BMC Bioinformatics"},{"year":"2023","author":"Elnaggar","key":"2025110117301039800_btaf480-B20"},{"key":"2025110117301039800_btaf480-B21","doi-asserted-by":"crossref","first-page":"7112","DOI":"10.1109\/TPAMI.2021.3095381","article-title":"Prottrans: towards cracking the language of life\u2019s code through self-supervised deep learning and high performance computing","volume":"44","author":"Elnaggar","year":"2021","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"author":"ESM Team","key":"2025110117301039800_btaf480-B22"},{"key":"2025110117301039800_btaf480-B23","doi-asserted-by":"crossref","first-page":"gigabyte13","DOI":"10.46471\/gigabyte.13","article-title":"Epitopepredict: a tool for integrated MHC binding prediction","volume":"2021","author":"Farrell","year":"2021","journal-title":"GigaByte"},{"key":"2025110117301039800_btaf480-B24","doi-asserted-by":"crossref","first-page":"4348","DOI":"10.1038\/s41467-022-32007-7","article-title":"Protgpt2 is a deep unsupervised language model for protein design","volume":"13","author":"Ferruz","year":"2022","journal-title":"Nat Commun"},{"year":"2024","author":"Fournier","key":"2025110117301039800_btaf480-B25","doi-asserted-by":"publisher","DOI":"10.1101\/2024.09.23.614603"},{"key":"2025110117301039800_btaf480-B26","first-page":"2121","article-title":"Devise: a deep visual-semantic embedding model","author":"Frome","year":"2013"},{"key":"2025110117301039800_btaf480-B27","doi-asserted-by":"crossref","first-page":"480","DOI":"10.1038\/nrd2829","article-title":"Targeting innate immunity protein kinase signalling in inflammation","volume":"8","author":"Gaestel","year":"2009","journal-title":"Nat Rev Drug Discov"},{"key":"2025110117301039800_btaf480-B28","doi-asserted-by":"crossref","first-page":"ii95","DOI":"10.1093\/bioinformatics\/btac474","article-title":"Distilprotbert: a distilled protein language model used to distinguish between real proteins and their randomly shuffled counterparts","volume":"38","author":"Geffen","year":"2022","journal-title":"Bioinformatics"},{"key":"2025110117301039800_btaf480-B29","doi-asserted-by":"crossref","first-page":"2502","DOI":"10.4161\/cc.8.16.9335","article-title":"The Akt kinases: isoform specificity in metabolism and cancer","volume":"8","author":"Gonzalez","year":"2009","journal-title":"Cell Cycle"},{"key":"2025110117301039800_btaf480-B30","doi-asserted-by":"crossref","first-page":"850","DOI":"10.1126\/science.ads0018","article-title":"Simulating 500 million years of evolution with a language model","volume":"387","author":"Hayes","year":"2025","journal-title":"Science"},{"key":"2025110117301039800_btaf480-B31","doi-asserted-by":"crossref","first-page":"D261","DOI":"10.1093\/nar\/gkr1122","article-title":"Phosphositeplus: a comprehensive resource for investigating the structure and function of experimentally determined post-translational modifications in man and mouse","volume":"40","author":"Hornbeck","year":"2012","journal-title":"Nucleic Acids Res"},{"key":"2025110117301039800_btaf480-B32","doi-asserted-by":"crossref","first-page":"D512","DOI":"10.1093\/nar\/gku1267","article-title":"Phosphositeplus, 2014: mutations, PTMs and recalibrations","volume":"43","author":"Hornbeck","year":"2014","journal-title":"Nucleic Acids Res"},{"key":"2025110117301039800_btaf480-B33","doi-asserted-by":"crossref","first-page":"225","DOI":"10.1016\/0092-8674(95)90405-0","article-title":"Protein kinases and phosphatases: the yin and yang of protein phosphorylation and signalling","volume":"80","author":"Hunter","year":"1995","journal-title":"Cell"},{"key":"2025110117301039800_btaf480-B34","doi-asserted-by":"crossref","first-page":"583","DOI":"10.1038\/s41586-021-03819-2","article-title":"Highly accurate protein structure prediction with alphafold","volume":"596","author":"Jumper","year":"2021","journal-title":"Nature"},{"first-page":"3174","year":"2017","author":"Kodirov","key":"2025110117301039800_btaf480-B35"},{"year":"2022","author":"Lin","key":"2025110117301039800_btaf480-B36"},{"key":"2025110117301039800_btaf480-B37","doi-asserted-by":"crossref","first-page":"1123","DOI":"10.1126\/science.ade2574","article-title":"Evolutionary-scale prediction of atomic-level protein structure with a language model","volume":"379","author":"Lin","year":"2023","journal-title":"Science"},{"key":"2025110117301039800_btaf480-B38","doi-asserted-by":"crossref","first-page":"1912","DOI":"10.1126\/science.1075762","article-title":"The protein kinase complement of the human genome","volume":"298","author":"Manning","year":"2002","journal-title":"Science"},{"key":"2025110117301039800_btaf480-B39","first-page":"29287","volume-title":"Advances in Neural Information Processing Systems","author":"Meier","year":"2021"},{"key":"2025110117301039800_btaf480-B40","doi-asserted-by":"crossref","first-page":"679","DOI":"10.1038\/s41592-022-01488-1","article-title":"Colabfold: making protein folding accessible to all","volume":"19","author":"Mirdita","year":"2022","journal-title":"Nat Methods"},{"year":"2020","author":"Moret","key":"2025110117301039800_btaf480-B41"},{"key":"2025110117301039800_btaf480-B42","doi-asserted-by":"crossref","first-page":"818","DOI":"10.1038\/nchembio.1938","article-title":"The ins and outs of selective kinase inhibitor development","volume":"11","author":"M\u00fcller","year":"2015","journal-title":"Nat Chem Biol"},{"key":"2025110117301039800_btaf480-B43","doi-asserted-by":"crossref","first-page":"3185","DOI":"10.1016\/j.eswa.2010.09.005","article-title":"A new encoding technique for peptide classification","volume":"38","author":"Nanni","year":"2011","journal-title":"Expert Syst Appl"},{"key":"2025110117301039800_btaf480-B44","doi-asserted-by":"crossref","first-page":"eaau8645","DOI":"10.1126\/scisignal.aau8645","article-title":"Illuminating the dark phosphoproteome","volume":"12","author":"Needham","year":"2019","journal-title":"Science Signal"},{"year":"2024","author":"Ouyang-Zhang","key":"2025110117301039800_btaf480-B45","doi-asserted-by":"publisher","DOI":"10.1101\/2024.11.08.622579"},{"key":"2025110117301039800_btaf480-B46","doi-asserted-by":"crossref","first-page":"945","DOI":"10.1038\/s41592-025-02656-9","article-title":"PTM-Mamba: a PTM-aware protein language model with bidirectional gated mamba blocks","volume":"22","author":"Peng","year":"2025","journal-title":"Nat Methods"},{"author":"Rao","key":"2025110117301039800_btaf480-B47","first-page":"9689"},{"first-page":"2152","year":"2015","author":"Romera-Paredes","key":"2025110117301039800_btaf480-B48"},{"volume-title":"Introduction to Modern Information Retrieval","year":"1983","author":"Salton","key":"2025110117301039800_btaf480-B49"},{"key":"2025110117301039800_btaf480-B1","doi-asserted-by":"crossref","first-page":"102641","DOI":"10.1016\/j.sbi.2023.102641","article-title":"Finding functional motifs in protein sequences with deep learning and natural language models","volume":"81","year":"2023","journal-title":"Curr Opin Struct Biol"},{"key":"2025110117301039800_btaf480-B50","doi-asserted-by":"crossref","first-page":"7407","DOI":"10.1038\/s41467-024-51844-2","article-title":"Fine-tuning protein language models boosts predictions across diverse tasks","volume":"15","author":"Schmirler","year":"2024","journal-title":"Nat Commun"},{"key":"2025110117301039800_btaf480-B51","doi-asserted-by":"crossref","first-page":"4846","DOI":"10.1038\/ncomms5846","article-title":"The landscape of kinase fusions in cancer","volume":"5","author":"Stransky","year":"2014","journal-title":"Nat Commun"},{"author":"Su","key":"2025110117301039800_btaf480-B52"},{"key":"2025110117301039800_btaf480-B53","doi-asserted-by":"crossref","first-page":"770","DOI":"10.1109\/TGRS.2017.2754648","article-title":"Fine-grained object recognition and zero-shot learning in remote sensing imagery","volume":"56","author":"Sumbul","year":"2018","journal-title":"IEEE Trans Geosci Remote Sensing"},{"key":"2025110117301039800_btaf480-B54","doi-asserted-by":"crossref","first-page":"D523","DOI":"10.1093\/nar\/gkac1052","article-title":"Uniprot: the universal protein knowledgebase in 2023","volume":"51","author":"The UniProt Consortium","year":"2023","journal-title":"Nucleic Acids Res"},{"key":"2025110117301039800_btaf480-B55","doi-asserted-by":"crossref","first-page":"2927","DOI":"10.1093\/bioinformatics\/btr525","article-title":"Computational prediction of eukaryotic phosphorylation sites","volume":"27","author":"Trost","year":"2011","journal-title":"Bioinformatics"},{"key":"2025110117301039800_btaf480-B56","doi-asserted-by":"crossref","first-page":"227","DOI":"10.1038\/s42256-022-00457-9","article-title":"Learning functional properties of proteins with language models","volume":"4","author":"Unsal","year":"2022","journal-title":"Nat Mach Intell"},{"key":"2025110117301039800_btaf480-B57","doi-asserted-by":"crossref","first-page":"D344","DOI":"10.1093\/nar\/gkz853","article-title":"Pdbe-kb: a community-driven resource for structural and functional annotations","volume":"48","author":"Varadi","year":"2020","journal-title":"Nucleic Acids Res"},{"key":"2025110117301039800_btaf480-B58","doi-asserted-by":"crossref","first-page":"D439","DOI":"10.1093\/nar\/gkab1061","article-title":"Alphafold protein structure database: massively expanding the structural coverage of protein-sequence space with high-accuracy models","volume":"50","author":"Varadi","year":"2022","journal-title":"Nucleic Acids Res"},{"key":"2025110117301039800_btaf480-B59","doi-asserted-by":"crossref","first-page":"1077","DOI":"10.1038\/s41417-021-00408-3","article-title":"Diving into the dark kinome: lessons learned from lmtk3","volume":"29","author":"Vella","year":"2022","journal-title":"Cancer Gene Ther"},{"key":"2025110117301039800_btaf480-B60","doi-asserted-by":"crossref","first-page":"531","DOI":"10.1038\/nmeth.3396","article-title":"Mimp: predicting the impact of mutations on kinase-substrate phosphorylation","volume":"12","author":"Wagih","year":"2015","journal-title":"Nat Methods"},{"author":"Wang","key":"2025110117301039800_btaf480-B61"},{"key":"2025110117301039800_btaf480-B62","doi-asserted-by":"crossref","first-page":"169","DOI":"10.1038\/s41576-021-00434-9","article-title":"Navigating the pitfalls of applying machine learning in genomics","volume":"23","author":"Whalen","year":"2022","journal-title":"Nat Rev Genet"},{"key":"2025110117301039800_btaf480-B63","doi-asserted-by":"crossref","first-page":"220","DOI":"10.1038\/s41392-023-01439-y","article-title":"Targeting protein modifications in metabolic diseases: molecular mechanisms and targeted therapies","volume":"8","author":"Wu","year":"2023","journal-title":"Signal Transduct Target Ther"},{"year":"2017","author":"Xian","key":"2025110117301039800_btaf480-B64"},{"key":"2025110117301039800_btaf480-B66","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3641289","volume-title":"ACM Trans Intell Syst Technol","year":"2024"},{"key":"2025110117301039800_btaf480-B65","doi-asserted-by":"crossref","first-page":"e45","DOI":"10.1002\/imo2.45","article-title":"Hyena architecture enables fast and efficient protein language modeling","volume":"2","author":"Zhang","year":"2025","journal-title":"IMetaOmics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaf480\/65025564\/btaf480.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/11\/btaf480\/65025564\/btaf480.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/11\/btaf480\/65025564\/btaf480.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,11,1]],"date-time":"2025-11-01T21:30:23Z","timestamp":1762032623000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btaf480\/8306017"}},"subtitle":[],"editor":[{"given":"Lenore","family":"Cowen","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2025,10,29]]},"references-count":66,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2025,11,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaf480","relation":{},"ISSN":["1367-4811"],"issn-type":[{"type":"electronic","value":"1367-4811"}],"subject":[],"published-other":{"date-parts":[[2025,11]]},"published":{"date-parts":[[2025,10,29]]},"article-number":"btaf480"}}