{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T20:34:33Z","timestamp":1772138073665,"version":"3.50.1"},"reference-count":36,"publisher":"Oxford University Press (OUP)","issue":"4","license":[{"start":{"date-parts":[[2025,3,21]],"date-time":"2025-03-21T00:00:00Z","timestamp":1742515200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["#1901793"],"award-info":[{"award-number":["#1901793"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["#1564606"],"award-info":[{"award-number":["#1564606"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["#2215734"],"award-info":[{"award-number":["#2215734"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,3,29]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>The mapping from codon to amino acid is surjective due to codon degeneracy, suggesting that codon space might harbor higher information content. Embeddings from the codon language model have recently demonstrated success in various protein downstream tasks. However, predictive models for residue-level tasks such as phosphorylation sites, arguably the most studied Post-Translational Modification (PTM), and PTM sites prediction in general, have predominantly relied on representations in amino acid space.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>We introduce a novel approach for predicting phosphorylation sites by utilizing codon-level information through embeddings from the codon adaptation language model (CaLM), trained on protein-coding DNA sequences. Protein sequences are first reverse-translated into reliable coding sequences by mapping UniProt sequences to their corresponding NCBI reference sequences and extracting the exact coding sequences from their GenBank format using a dynamic programming-based global pairwise alignment. The resulting coding sequences are encoded using the CaLM encoder to generate codon-aware embeddings, which are subsequently integrated with amino acid-aware embeddings obtained from a protein language model, through an early fusion strategy. Next, a window-level representation of the site of interest, retaining the full sequence context, is constructed from the fused embeddings. A ConvBiGRU network extracts feature maps that capture spatiotemporal correlations between proximal residues within the window. This is followed by a prediction head based on a Kolmogorov-Arnold network (KAN) using the derivative of gaussian wavelet transform to generate the inference for the site. The overall model, dubbed CaLMPhosKAN, performs better than the existing approaches across multiple datasets.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>CaLMPhosKAN is publicly available at https:\/\/github.com\/KCLabMTU\/CaLMPhosKAN.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaf124","type":"journal-article","created":{"date-parts":[[2025,3,17]],"date-time":"2025-03-17T16:20:24Z","timestamp":1742228424000},"source":"Crossref","is-referenced-by-count":3,"title":["CaLMPhosKAN: prediction of general phosphorylation sites in proteins via fusion of codon aware embeddings with amino acid aware embeddings and wavelet-based Kolmogorov\u2013Arnold network"],"prefix":"10.1093","volume":"41","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4210-1200","authenticated-orcid":false,"given":"Pawel","family":"Pratyush","sequence":"first","affiliation":[{"name":"Golisano College of Computing and Information Sciences, Rochester Institute of Technology , Rochester, NY 14623,","place":["United States"]}]},{"given":"Callen","family":"Carrier","sequence":"additional","affiliation":[{"name":"College of Computing, Michigan Technological University , Houghton, MI 49931,","place":["United States"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1495-2953","authenticated-orcid":false,"given":"Suresh","family":"Pokharel","sequence":"additional","affiliation":[{"name":"Golisano College of Computing and Information Sciences, Rochester Institute of Technology , Rochester, NY 14623,","place":["United States"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2690-5655","authenticated-orcid":false,"given":"Hamid D","family":"Ismail","sequence":"additional","affiliation":[{"name":"College of Engineering, North Carolina Agricultural and Technical State University , Greensboro, NC 27411,","place":["United States"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9016-7549","authenticated-orcid":false,"given":"Meenal","family":"Chaudhari","sequence":"additional","affiliation":[{"name":"College of Applied Sciences and Technology, Illinois State University , Normal, IL 61761,","place":["United States"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7443-1928","authenticated-orcid":false,"given":"Dukka B","family":"KC","sequence":"additional","affiliation":[{"name":"Golisano College of Computing and Information Sciences, Rochester Institute of Technology , Rochester, NY 14623,","place":["United States"]}]}],"member":"286","published-online":{"date-parts":[[2025,3,21]]},"reference":[{"key":"2025041602170754100_btaf124-B1","doi-asserted-by":"crossref","first-page":"271","DOI":"10.3892\/ijmm.2017.3036","article-title":"The crucial role of protein phosphorylation in cell signaling and its use as targeted therapy","volume":"40","author":"Ardito","year":"2017","journal-title":"Int J Mol Med"},{"key":"2025041602170754100_btaf124-B2","doi-asserted-by":"crossref","first-page":"2646","DOI":"10.1016\/j.csbj.2021.04.042","article-title":"Codon-based indices for modeling gene expression and transcript evolution","volume":"19","author":"Bahiri-Elitzur","year":"2021","journal-title":"Comput Struct Biotechnol J"},{"key":"2025041602170754100_btaf124-B3","doi-asserted-by":"crossref","first-page":"D36","DOI":"10.1093\/nar\/gks1195","article-title":"Genbank","volume":"41","author":"Benson","year":"2013","journal-title":"Nucleic Acids Res"},{"key":"2025041602170754100_btaf124-B4","author":"Bozorgasl","year":"2024"},{"key":"2025041602170754100_btaf124-B5","doi-asserted-by":"publisher","first-page":"e82819","DOI":"10.7554\/eLife.82819","article-title":"Transformer-based deep learning for predicting protein properties in the life sciences","volume":"12","author":"Chandra","year":"2023","journal-title":"eLife"},{"key":"2025041602170754100_btaf124-B6","doi-asserted-by":"crossref","first-page":"7112","DOI":"10.1109\/TPAMI.2021.3095381","article-title":"ProtTrans: toward understanding the language of life through self-supervised learning","volume":"44","author":"Elnaggar","year":"2022","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"2025041602170754100_btaf124-B7","doi-asserted-by":"crossref","first-page":"346","DOI":"10.1021\/acs.jproteome.0c00431","article-title":"DeepPSP: a global\u2013local information-based deep neural network for the prediction of protein phosphorylation sites","volume":"20","author":"Guo","year":"2021","journal-title":"J Proteome Res"},{"key":"2025041602170754100_btaf124-B8","doi-asserted-by":"crossref","first-page":"16585","DOI":"10.1021\/ja408355p","article-title":"\u03c0-electron conjugation in two dimensions","volume":"135","author":"Gutzler","year":"2013","journal-title":"J Am Chem Soc"},{"key":"2025041602170754100_btaf124-B9","doi-asserted-by":"crossref","DOI":"10.1093\/bib\/bbae282","article-title":"Cryosegnet: accurate cryo-EM protein particle picking by integrating the foundational ai image segmentation model and attention-gated u-net","volume":"25","author":"Gyawali","year":"2024","journal-title":"Brief Bioinform"},{"key":"2025041602170754100_btaf124-B10","author":"Harbecke","year":"2022"},{"key":"2025041602170754100_btaf124-B11","doi-asserted-by":"crossref","first-page":"426","DOI":"10.1021\/pr0341033","article-title":"Identification of phosphorylation sites in protein kinase a substrates using artificial neural networks and mass spectrometry","volume":"3","author":"Hjerrild","year":"2004","journal-title":"J Proteome Res"},{"key":"2025041602170754100_btaf124-B12","doi-asserted-by":"crossref","first-page":"3281590","DOI":"10.1155\/2016\/3281590","article-title":"RF-Phos: a novel general phosphorylation site prediction tool based on random forest","volume":"2016","author":"Ismail","year":"2016","journal-title":"Biomed Res Int"},{"key":"2025041602170754100_btaf124-B13","doi-asserted-by":"publisher","first-page":"13566","DOI":"10.1038\/s41598-024-64211-4","article-title":"Protein embeddings predict binding residues in disordered regions","volume":"14","author":"Jahn","year":"2024","journal-title":"Sci Rep"},{"key":"2025041602170754100_btaf124-B14","doi-asserted-by":"crossref","first-page":"627","DOI":"10.1042\/BST0370627","article-title":"The regulation of protein phosphorylation","volume":"37","author":"Johnson","year":"2009","journal-title":"Biochem Soc Trans"},{"key":"2025041602170754100_btaf124-B15","doi-asserted-by":"publisher","first-page":"161","DOI":"10.1007\/s00210-014-1063-4","article-title":"Evidence of histidine and aspartic acid phosphorylation in human prostate cancer cells","volume":"388","author":"Lapek","year":"2015","journal-title":"Naunyn-Schmiedeberg\u2019s Arch Pharmacol"},{"key":"2025041602170754100_btaf124-B16","doi-asserted-by":"crossref","first-page":"827","DOI":"10.1038\/s41592-022-01524-0","article-title":"Histidine phosphorylation in human cells; a needle or phantom in the haystack?","volume":"19","author":"Leijten","year":"2022","journal-title":"Nat Methods"},{"key":"2025041602170754100_btaf124-B17","author":"Lin","year":"2022"},{"key":"2025041602170754100_btaf124-B18","author":"Liu","year":"2024"},{"key":"2025041602170754100_btaf124-B19","doi-asserted-by":"publisher","DOI":"10.1093\/bib\/bbab244","article-title":"DeepIPs: comprehensive assessment and computational identification of phosphorylation sites of sars-cov-2 infection using a deep learning-based approach","volume":"22","author":"Lv","year":"2021","journal-title":"Brief Bioinform"},{"key":"2025041602170754100_btaf124-B20","doi-asserted-by":"crossref","first-page":"228","DOI":"10.1016\/j.gpb.2022.06.004","article-title":"KinasePhos 3.0: redesign and expansion of the prediction on kinase-specific phosphorylation sites","volume":"21","author":"Ma","year":"2023","journal-title":"Genomics Proteomics Bioinf"},{"key":"2025041602170754100_btaf124-B21","doi-asserted-by":"publisher","first-page":"87","DOI":"10.1146\/annurev-biophys-030722-020555","article-title":"The effects of codon usage on protein structure and folding","volume":"53","author":"Moss","year":"2024","journal-title":"Annu Rev Biophys"},{"key":"2025041602170754100_btaf124-B22","doi-asserted-by":"crossref","first-page":"596","DOI":"10.1038\/nrd2056","article-title":"Drug discovery in the ubiquitin\u2013proteasome system","volume":"5","author":"Nalepa","year":"2006","journal-title":"Nat Rev Drug Discov"},{"key":"2025041602170754100_btaf124-B23","doi-asserted-by":"crossref","first-page":"443","DOI":"10.1016\/0022-2836(70)90057-4","article-title":"A general method applicable to the search for similarities in the amino acid sequence of two proteins","volume":"48","author":"Needleman","year":"1970","journal-title":"J Mol Biol"},{"key":"2025041602170754100_btaf124-B24","doi-asserted-by":"crossref","first-page":"170","DOI":"10.1038\/s42256-024-00791-0","article-title":"Codon language embeddings provide strong signals for use in protein engineering","volume":"6","author":"Outeiral","year":"2024","journal-title":"Nat Mach Intell"},{"key":"2025041602170754100_btaf124-B25","doi-asserted-by":"crossref","first-page":"2548","DOI":"10.1021\/acs.jproteome.2c00667","article-title":"LMPhossite: a deep learning-based approach for general protein phosphorylation site prediction using embeddings from the local window sequence and pretrained protein language model","volume":"22","author":"Pakhrin","year":"2023","journal-title":"J Proteome Res"},{"key":"2025041602170754100_btaf124-B26","doi-asserted-by":"crossref","first-page":"16000","DOI":"10.3390\/ijms242116000","article-title":"Integrating embeddings from multiple protein language models to improve protein O-GlcNAc site prediction","volume":"24","author":"Pokharel","year":"2023","journal-title":"Int J Mol Sci"},{"key":"2025041602170754100_btaf124-B27","doi-asserted-by":"crossref","first-page":"btae290","DOI":"10.1093\/bioinformatics\/btae290","article-title":"LMCrot: an enhanced protein crotonylation site predictor by leveraging an interpretable window-level embedding from a transformer-based protein language model","volume":"40","author":"Pratyush","year":"2024","journal-title":"Bioinformatics"},{"key":"2025041602170754100_btaf124-B28","doi-asserted-by":"crossref","first-page":"64","DOI":"10.2174\/092986610789909412","article-title":"Evaluation of protein phosphorylation site predictors","volume":"17","author":"Que","year":"2010","journal-title":"Protein Pept Lett"},{"key":"2025041602170754100_btaf124-B29","first-page":"1","article-title":"Exploring the limits of transfer learning with a unified text-to-text transformer","volume":"21","author":"Raffel","year":"2020","journal-title":"J Mach Learn Res"},{"key":"2025041602170754100_btaf124-B30","doi-asserted-by":"crossref","first-page":"e0118432","DOI":"10.1371\/journal.pone.0118432","article-title":"The precision-recall plot is more informative than the roc plot when evaluating binary classifiers on imbalanced datasets","volume":"10","author":"Saito","year":"2015","journal-title":"PLoS One"},{"key":"2025041602170754100_btaf124-B31","doi-asserted-by":"crossref","first-page":"24520","DOI":"10.1021\/acsomega.4c00523","article-title":"Normal mode analysis elicits conformational shifts in proteins at both proximal and distal regions to the phosphosite stemming from single-site phosphorylation","volume":"9","author":"Subhadarshini","year":"2024","journal-title":"ACS Omega"},{"key":"2025041602170754100_btaf124-B32","doi-asserted-by":"crossref","first-page":"2634","DOI":"10.2174\/0929867325666180508095242","article-title":"Inhibitors of serine\/threonine protein phosphatases: biochemical and structural studies provide insight for further development","volume":"26","author":"Swingle","year":"2019","journal-title":"Curr Med Chem"},{"key":"2025041602170754100_btaf124-B33","doi-asserted-by":"publisher","first-page":"12550","DOI":"10.1038\/s41598-021-91840-w","article-title":"A deep learning based approach for prediction of Chlamydomonas reinhardtii phosphorylation sites","volume":"11","author":"Thapa","year":"2021","journal-title":"Sci Rep"},{"key":"2025041602170754100_btaf124-B34","doi-asserted-by":"crossref","first-page":"W140","DOI":"10.1093\/nar\/gkaa275","article-title":"MusiteDeep: a deep-learning based webserver for protein post-translational modification site prediction and visualization","volume":"48","author":"Wang","year":"2020","journal-title":"Nucleic Acids Res"},{"key":"2025041602170754100_btaf124-B35","doi-asserted-by":"crossref","first-page":"1854","DOI":"10.3390\/biom12121854","article-title":"A novel capsule network with attention routing to identify prokaryote phosphorylation sites","volume":"12","author":"Wang","year":"2022","journal-title":"Biomolecules"},{"key":"2025041602170754100_btaf124-B36","doi-asserted-by":"crossref","first-page":"1443","DOI":"10.1016\/j.bbrc.2004.11.001","article-title":"GPS: a novel group-based phosphorylation predicting and scoring method","volume":"325","author":"Zhou","year":"2004","journal-title":"Biochem Biophys Res Commun"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaf124\/62520362\/btaf124.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/4\/btaf124\/62520362\/btaf124.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/4\/btaf124\/62520362\/btaf124.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,4,16]],"date-time":"2025-04-16T02:17:25Z","timestamp":1744769845000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btaf124\/8090024"}},"subtitle":[],"editor":[{"given":"Xin","family":"Gao","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2025,3,21]]},"references-count":36,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2025,3,29]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaf124","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2024.07.30.605530","asserted-by":"object"}]},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2025,4]]},"published":{"date-parts":[[2025,3,21]]},"article-number":"btaf124"}}