{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,29]],"date-time":"2026-05-29T13:45:24Z","timestamp":1780062324289,"version":"3.54.0"},"reference-count":32,"publisher":"Oxford University Press (OUP)","issue":"5","license":[{"start":{"date-parts":[[2024,4,25]],"date-time":"2024-04-25T00:00:00Z","timestamp":1714003200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["# 1901793"],"award-info":[{"award-number":["# 1901793"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["# 1564606"],"award-info":[{"award-number":["# 1564606"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,5,2]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Recent advancements in natural language processing have highlighted the effectiveness of global contextualized representations from protein language models (pLMs) in numerous downstream tasks. Nonetheless, strategies to encode the site-of-interest leveraging pLMs for per-residue prediction tasks, such as crotonylation (Kcr) prediction, remain largely uncharted.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>Herein, we adopt a range of approaches for utilizing pLMs by experimenting with different input sequence types (full-length protein sequence versus window sequence), assessing the implications of utilizing per-residue embedding of the site-of-interest as well as embeddings of window residues centered around it. Building upon these insights, we developed a novel residual ConvBiLSTM network designed to process window-level embeddings of the site-of-interest generated by the ProtT5-XL-UniRef50 pLM using full-length sequences as input. This model, termed T5ResConvBiLSTM, surpasses existing state-of-the-art Kcr predictors in performance across three diverse datasets. To validate our approach of utilizing full sequence-based window-level embeddings, we also delved into the interpretability of ProtT5-derived embedding tensors in two ways: firstly, by scrutinizing the attention weights obtained from the transformer\u2019s encoder block; and secondly, by computing SHAP values for these tensors, providing a model-agnostic interpretation of the prediction results. Additionally, we enhance the latent representation of ProtT5 by incorporating two additional local representations, one derived from amino acid properties and the other from supervised embedding layer, through an intermediate fusion stacked generalization approach, using an n-mer window sequence (or, peptide\/fragment). The resultant stacked model, dubbed LMCrot, exhibits a more pronounced improvement in predictive performance across the tested datasets.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>LMCrot is publicly available at https:\/\/github.com\/KCLabMTU\/LMCrot.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btae290","type":"journal-article","created":{"date-parts":[[2024,4,25]],"date-time":"2024-04-25T16:37:17Z","timestamp":1714063037000},"source":"Crossref","is-referenced-by-count":19,"title":["LMCrot: an enhanced protein crotonylation site predictor by leveraging an interpretable window-level embedding from a transformer-based protein language model"],"prefix":"10.1093","volume":"40","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4210-1200","authenticated-orcid":false,"given":"Pawel","family":"Pratyush","sequence":"first","affiliation":[{"name":"Department of Computer Science, Michigan Technological University , Houghton, MI 49931, United States"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Soufia","family":"Bahmani","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Michigan Technological University , Houghton, MI 49931, United States"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Suresh","family":"Pokharel","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Michigan Technological University , Houghton, MI 49931, United States"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Hamid D","family":"Ismail","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Michigan Technological University , Houghton, MI 49931, United States"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7443-1928","authenticated-orcid":false,"given":"Dukka B","family":"KC","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Michigan Technological University , Houghton, MI 49931, United States"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"286","published-online":{"date-parts":[[2024,4,25]]},"reference":[{"key":"2024051208013367700_btae290-B1","author":"Chandra","year":"2023"},{"key":"2024051208013367700_btae290-B2","author":"Elnaggar","year":"2020"},{"key":"2024051208013367700_btae290-B3","author":"Elnaggar","year":"2023"},{"key":"2024051208013367700_btae290-B4","doi-asserted-by":"crossref","first-page":"73","DOI":"10.1038\/s42003-023-04462-5","article-title":"Learning the protein language of proteome-wide protein\u2013protein binding sites via explainable ensemble deep learning","volume":"6","author":"Hou","year":"2023","journal-title":"Commun Biol"},{"key":"2024051208013367700_btae290-B5","doi-asserted-by":"crossref","first-page":"65","DOI":"10.1007\/978-1-0716-2317-6_3","volume-title":"Computational methods for predicting Post-Translational modification sites","author":"Ismail","year":"2022"},{"key":"2024051208013367700_btae290-B6","doi-asserted-by":"crossref","first-page":"703","DOI":"10.1038\/s41419-021-03987-z","article-title":"Protein lysine crotonylation: past, present, perspective","volume":"12","author":"Jiang","year":"2021","journal-title":"Cell Death Dis"},{"key":"2024051208013367700_btae290-B7","author":"Joulin","year":"2016"},{"key":"2024051208013367700_btae290-B8","doi-asserted-by":"crossref","first-page":"200","DOI":"10.1016\/j.jmgm.2017.08.020","article-title":"Prediction of lysine crotonylation sites by incorporating the composition of k-spaced amino acid pairs into Chou\u2019s general PseAAC","volume":"77","author":"Ju","year":"2017","journal-title":"J Mol Graph Model"},{"key":"2024051208013367700_btae290-B9","doi-asserted-by":"crossref","first-page":"bbab492","DOI":"10.1093\/bib\/bbab492","article-title":"DeepCap-Kcr: accurate identification and investigation of protein lysine crotonylation sites based on capsule network","volume":"23","author":"Khanal","year":"2022","journal-title":"Brief Bioinform"},{"key":"2024051208013367700_btae290-B10","doi-asserted-by":"crossref","first-page":"120","DOI":"10.1016\/j.csbj.2022.11.056","article-title":"CapsNh-Kcr: capsule network-based prediction of lysine crotonylation sites in human non-histone proteins","volume":"21","author":"Khanal","year":"2023","journal-title":"Comput Struct Biotechnol J"},{"key":"2024051208013367700_btae290-B11","doi-asserted-by":"crossref","first-page":"bbac037","DOI":"10.1093\/bib\/bbac037","article-title":"Adapt-Kcr: a novel deep learning framework for accurate prediction of lysine crotonylation sites based on learning embedding features and attention architecture","volume":"23","author":"Li","year":"2022","journal-title":"Brief Bioinform"},{"key":"2024051208013367700_btae290-B12","doi-asserted-by":"crossref","first-page":"1123","DOI":"10.1126\/science.ade2574","article-title":"Evolutionary-scale prediction of atomic-level protein structure with a language model","volume":"379","author":"Lin","year":"2023","journal-title":"Science"},{"key":"2024051208013367700_btae290-B13","doi-asserted-by":"crossref","first-page":"113903","DOI":"10.1016\/j.ab.2020.113903","article-title":"Prediction of protein crotonylation sites through LightGBM classifier based on smote and elastic net","volume":"609","author":"Liu","year":"2020","journal-title":"Anal Biochem"},{"key":"2024051208013367700_btae290-B14","author":"Lundberg"},{"key":"2024051208013367700_btae290-B15","doi-asserted-by":"crossref","first-page":"bbaa255","DOI":"10.1093\/bib\/bbaa255","article-title":"Deep-Kcr: accurate detection of lysine crotonylation sites using deep learning method","volume":"22","author":"Lv","year":"2021","journal-title":"Brief Bioinform"},{"key":"2024051208013367700_btae290-B16","doi-asserted-by":"crossref","first-page":"2548","DOI":"10.1021\/acs.jproteome.2c00667","article-title":"Lmphossite: a deep learning-based approach for general protein phosphorylation site prediction using embeddings from the local window sequence and pretrained protein language model","volume":"22","author":"Pakhrin","year":"2023","journal-title":"J Proteome Res"},{"key":"2024051208013367700_btae290-B17","author":"Peters","year":"2017"},{"key":"2024051208013367700_btae290-B18","doi-asserted-by":"crossref","first-page":"16933","DOI":"10.1038\/s41598-022-21366-2","article-title":"Improving protein succinylation sites prediction using embeddings from protein language model","volume":"12","author":"Pokharel","year":"2022","journal-title":"Sci Rep"},{"key":"2024051208013367700_btae290-B19","doi-asserted-by":"crossref","first-page":"16000","DOI":"10.3390\/ijms242116000","article-title":"Integrating embeddings from multiple protein language models to improve protein O-GlcNAc site prediction","volume":"24","author":"Pokharel","year":"2023","journal-title":"Int J Mol Sci"},{"key":"2024051208013367700_btae290-B20","first-page":"81","volume-title":"Machine learning in bioinformatics of protein sequences: Algorithms, databases and resources for modern protein bioinformatics","author":"Pokharel"},{"key":"2024051208013367700_btae290-B21","first-page":"37","volume-title":"J Mach Learn Technol","author":"Powers"},{"key":"2024051208013367700_btae290-B22","doi-asserted-by":"crossref","first-page":"41","DOI":"10.1186\/s12859-023-05164-9","article-title":"pLMSNOSite: an ensemble-based approach for predicting protein s-nitrosylation sites by integrating supervised word embedding and embedding from pre-trained protein language model","volume":"24","author":"Pratyush","year":"2023","journal-title":"BMC Bioinform"},{"key":"2024051208013367700_btae290-B23","doi-asserted-by":"crossref","first-page":"648","DOI":"10.1093\/bioinformatics\/btab712","article-title":"BERT-Kcr: prediction of lysine crotonylation sites by a transfer learning method with pre-trained BERT models","volume":"38","author":"Qiao","year":"2022","journal-title":"Bioinformatics"},{"key":"2024051208013367700_btae290-B24","doi-asserted-by":"crossref","first-page":"75","DOI":"10.1016\/j.artmed.2017.02.007","article-title":"Identify and analysis crotonylation sites in histone by using support vector machines","volume":"83","author":"Qiu","year":"2017","journal-title":"Artif Intell Med"},{"key":"2024051208013367700_btae290-B25","first-page":"5485","article-title":"Exploring the limits of transfer learning with a unified text-to-text transformer","volume":"21","author":"Raffel","year":"2020","journal-title":"J Mach Learn Res"},{"key":"2024051208013367700_btae290-B26","author":"Raschka"},{"key":"2024051208013367700_btae290-B27","doi-asserted-by":"crossref","first-page":"3013","DOI":"10.1038\/s41598-017-03369-6","article-title":"First comprehensive proteome analysis of lysine crotonylation in seedling leaves of Nicotiana tabacum","volume":"7","author":"Sun","year":"2017","journal-title":"Sci Rep"},{"key":"2024051208013367700_btae290-B28","author":"Vaswani"},{"key":"2024051208013367700_btae290-B29","doi-asserted-by":"crossref","first-page":"162","DOI":"10.1093\/bioinformatics\/btaa701","article-title":"Unsupervised protein embeddings outperform hand-crafted sequence and structure features at predicting molecular function","volume":"37","author":"Villegas-Morcillo","year":"2021","journal-title":"Bioinformatics"},{"key":"2024051208013367700_btae290-B30","doi-asserted-by":"crossref","first-page":"1169","DOI":"10.1016\/j.str.2022.05.001","article-title":"Protein language-model embeddings for fast, accurate, and alignment-free protein structure prediction","volume":"30","author":"Weissenow","year":"2022","journal-title":"Structure"},{"key":"2024051208013367700_btae290-B31","doi-asserted-by":"crossref","first-page":"241","DOI":"10.1016\/S0893-6080(05)80023-1","article-title":"Stacked generalization","volume":"5","author":"Wolpert","year":"1992","journal-title":"Neural Netw"},{"key":"2024051208013367700_btae290-B32","doi-asserted-by":"crossref","first-page":"eaay4697","DOI":"10.1126\/sciadv.aay4697","article-title":"Global crotonylome reveals CDYL-regulated rpa1 crotonylation in homologous recombination-mediated DNA repair","volume":"6","author":"Yu","year":"2020","journal-title":"Sci Adv"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btae290\/57334068\/btae290.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/5\/btae290\/57553699\/btae290.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/5\/btae290\/57553699\/btae290.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,5,12]],"date-time":"2024-05-12T08:01:57Z","timestamp":1715500917000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btae290\/7658304"}},"subtitle":[],"editor":[{"given":"Janet","family":"Kelso","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"editor"}]}],"short-title":[],"issued":{"date-parts":[[2024,4,25]]},"references-count":32,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2024,5,2]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btae290","relation":{},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2024,5,1]]},"published":{"date-parts":[[2024,4,25]]},"article-number":"btae290"}}