{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,2]],"date-time":"2026-06-02T20:32:56Z","timestamp":1780432376001,"version":"3.54.1"},"reference-count":60,"publisher":"MDPI AG","issue":"9","license":[{"start":{"date-parts":[[2022,9,14]],"date-time":"2022-09-14T00:00:00Z","timestamp":1663113600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Axioms"],"abstract":"<jats:p>Post-translational glycosylation and glycation are common types of protein post-translational modifications (PTMs) in which glycan binds to protein enzymatically or nonenzymatically, respectively. They are associated with various diseases such as coronavirus, Alzheimer\u2019s, cancer, and diabetes diseases. Identifying glycosylation and glycation sites is significant to understanding their biological mechanisms. However, utilizing experimental laboratory tools to identify PTM sites is time-consuming and costly. In contrast, computational methods based on machine learning are becoming increasingly essential for PTM site prediction due to their higher performance and lower cost. In recent years, advances in Transformer-based Language Models based on deep learning have been transferred from Natural Language Processing (NLP) into the proteomics field by developing language models for protein sequence representation known as Protein Language Models (PLMs). In this work, we proposed a novel method, PTG-PLM, for improving the performance of PTM glycosylation and glycation site prediction. PTG-PLM is based on convolutional neural networks (CNNs) and embedding extracted from six recent PLMs including ProtBert-BFD, ProtBert, ProtAlbert, ProtXlnet, ESM-1b, and TAPE. The model is trained and evaluated on two public datasets for glycosylation and glycation site prediction. The results show that PTG-PLM based on ESM-1b and ProtBert-BFD has better performance than PTG-PLM based on the other PLMs. Comparison results with the existing tools and representative supervised learning methods show that PTG-PLM surpasses the other models for glycosylation and glycation site prediction. The outstanding performance results of PTG-PLM indicate that it can be used to predict the sites of the other types of PTMs.<\/jats:p>","DOI":"10.3390\/axioms11090469","type":"journal-article","created":{"date-parts":[[2022,9,14]],"date-time":"2022-09-14T20:50:45Z","timestamp":1663188645000},"page":"469","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":23,"title":["PTG-PLM: Predicting Post-Translational Glycosylation and Glycation Sites Using Protein Language Models and Deep Learning"],"prefix":"10.3390","volume":"11","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0725-9923","authenticated-orcid":false,"given":"Alhasan","family":"Alkuhlani","sequence":"first","affiliation":[{"name":"Faculty of Computer and Information Technology, Sana\u2019a University, Sana\u2019a 1247, Yemen"},{"name":"Faculty of Computer and Information Science, Ain Shams University, Cairo 11566, Egypt"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7816-3518","authenticated-orcid":false,"given":"Walaa","family":"Gad","sequence":"additional","affiliation":[{"name":"Faculty of Computer and Information Science, Ain Shams University, Cairo 11566, Egypt"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9655-3229","authenticated-orcid":false,"given":"Mohamed","family":"Roushdy","sequence":"additional","affiliation":[{"name":"Faculty of Computers and Information Technology, Future University in Egypt, New Cairo 11835, Egypt"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4727-0089","authenticated-orcid":false,"given":"Michael Gr.","family":"Voskoglou","sequence":"additional","affiliation":[{"name":"Department of Applied Mathematics, Graduate Technological Educational Institute of Western Greece, 22334 Patras, Greece"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5013-4339","authenticated-orcid":false,"given":"Abdel-badeeh M.","family":"Salem","sequence":"additional","affiliation":[{"name":"Faculty of Computer and Information Science, Ain Shams University, Cairo 11566, Egypt"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2022,9,14]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"98","DOI":"10.1016\/j.compbiolchem.2017.10.004","article-title":"Predicting lysine glycation sites using bi-profile Bayes feature extraction","volume":"71","author":"Ju","year":"2017","journal-title":"Comput. Biol. Chem."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/bs.pmbts.2018.12.002","article-title":"Glycan-based biomarkers for diagnosis of cancers and other diseases: Past, present, and future","volume":"Volume 162","author":"Hu","year":"2019","journal-title":"Progress in Molecular Biology and Translational Science"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Regan, P., McClean, P.L., Smyth, T., and Doherty, M. (2019). Early Stage Glycosylation Biomarkers in Alzheimer\u2019s Disease. Medicines, 6.","DOI":"10.3390\/medicines6030092"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41467-020-16567-0","article-title":"Vulnerabilities in coronavirus glycan shields despite extensive glycosylation","volume":"11","author":"Watanabe","year":"2020","journal-title":"Nat. Commun."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"777","DOI":"10.1002\/prot.25511","article-title":"iProtGly-SS: Identifying protein glycation sites using sequence and structure based features","volume":"86","author":"Islam","year":"2018","journal-title":"Proteins Struct. Funct. Bioinform."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"142368","DOI":"10.1109\/ACCESS.2019.2944411","article-title":"DeepGly: A deep learning framework with recurrent and convolutional neural networks to identify protein glycation sites from imbalanced data","volume":"7","author":"Chen","year":"2019","journal-title":"IEEE Access"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Chauhan, J.S., Rao, A., and Raghava, G.P. (2013). In silico platform for prediction of N-, O-and C-glycosites in eukaryotic protein sequences. PloS ONE, 8.","DOI":"10.1371\/journal.pone.0067008"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"2749","DOI":"10.1093\/bioinformatics\/bty1043","article-title":"PredGly: Predicting lysine glycation sites for Homo sapiens based on XGboost feature optimization","volume":"35","author":"Yu","year":"2019","journal-title":"Bioinformatics"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"55","DOI":"10.1186\/s12859-018-2547-x","article-title":"GlyStruct: Glycation prediction using structural properties of amino acid residues","volume":"19","author":"Reddy","year":"2019","journal-title":"BMC Bioinform."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Li, F., Zhang, Y., Purcell, A.W., Webb, G.I., Chou, K.C., Lithgow, T., Li, C., and Song, J. (2019). Positive-unlabelled learning of glycosylation sites in the human proteome. Bmc Bioinform., 20.","DOI":"10.1186\/s12859-019-2700-1"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Hamby, S.E., and Hirst, J.D. (2008). Prediction of glycosylation sites using random forests. Bmc Bioinform., 9.","DOI":"10.1186\/1471-2105-9-500"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Chauhan, J.S., Bhat, A.H., Raghava, G.P., and Rao, A. (2012). GlycoPP: A webserver for prediction of N-and O-glycosites in prokaryotic protein sequences. PloS ONE, 7.","DOI":"10.1371\/journal.pone.0040155"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"1411","DOI":"10.1093\/bioinformatics\/btu852","article-title":"GlycoMine: A machine learning-based approach for predicting N-, C-and O-linked glycosylation in the human proteome","volume":"31","author":"Li","year":"2015","journal-title":"Bioinformatics"},{"key":"ref_14","first-page":"1","article-title":"GlycoMine struct: A new bioinformatics tool for highly accurate mapping of the human N-linked and O-linked glycoproteomes by incorporating structural features","volume":"6","author":"Li","year":"2016","journal-title":"Sci. Rep."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"4140","DOI":"10.1093\/bioinformatics\/btz215","article-title":"SPRINT-Gly: Predicting N-and O-linked glycosylation sites of human and mouse proteins by using sequence and predicted structural properties","volume":"35","author":"Taherzadeh","year":"2019","journal-title":"Bioinformatics"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41598-019-52341-z","article-title":"N-GlyDE: A two-stage N-linked glycosylation site prediction incorporating gapped dipeptides and pattern-based encoding","volume":"9","author":"Pitti","year":"2019","journal-title":"Sci. Rep."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"165944","DOI":"10.1109\/ACCESS.2020.3022629","article-title":"N-GlycoGo: Predicting Protein N-Glycosylation Sites on Imbalanced Data Sets by Using Heterogeneous and Comprehensive Strategy","volume":"8","author":"Chien","year":"2020","journal-title":"IEEE Access"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"12702","DOI":"10.1109\/ACCESS.2022.3146395","article-title":"PUStackNGly: Positive-Unlabeled and Stacking Learning for N-Linked Glycosylation Site Prediction","volume":"10","author":"Alkuhlani","year":"2022","journal-title":"IEEE Access"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"844","DOI":"10.1093\/glycob\/cwl009","article-title":"Analysis and prediction of mammalian protein glycation","volume":"16","author":"Johansen","year":"2006","journal-title":"Glycobiology"},{"key":"ref_20","first-page":"561547","article-title":"Predict and analyze protein glycation sites with the mRMR and IFS methods","volume":"2015","author":"Liu","year":"2015","journal-title":"Biomed Res. Int."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.gene.2016.11.021","article-title":"Gly-PseAAC: Identifying protein lysine glycation through sequences","volume":"602","author":"Xu","year":"2017","journal-title":"Gene"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Zhao, X., Zhao, X., Bao, L., Zhang, Y., Dai, J., and Yin, M. (2017). Glypre: In silico prediction of protein glycation sites by fusing multiple features and support vector machine. Molecules, 22.","DOI":"10.3390\/molecules22111891"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Liu, Y., Liu, Y., Wang, G., Cheng, Y., Bi, S., and Zhu, X. (2022). BERT-Kgly: A Bidirectional Encoder Representations from Transformers (BERT)-based Model for Predicting Lysine Glycation Site for Homo sapiens. Front. Bioinform., 12.","DOI":"10.3389\/fbinf.2022.834153"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"774","DOI":"10.2174\/1574893615666210108094847","article-title":"Intelligent Techniques Analysis for Glycosylation Site Prediction","volume":"16","author":"Alkuhlani","year":"2021","journal-title":"Curr. Bioinform."},{"key":"ref_25","unstructured":"Alkuhlani, A., Gad, W., Roushdy, M., and Salem, A.B.M. (2021). Artificial Intelligence for Glycation Site Prediction. IEICE Proc. Ser., 64."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Marquet, C., Heinzinger, M., Olenyi, T., Dallago, C., Erckert, K., Bernhofer, M., Nechaev, D., and Rost, B. (2021). Embeddings from protein language models predict conservation and variant effects. Hum. Genet., 1\u201319.","DOI":"10.21203\/rs.3.rs-584804\/v2"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"941","DOI":"10.1093\/bioinformatics\/btab801","article-title":"NetSolP: Predicting protein solubility in Escherichia coli using language models","volume":"38","author":"Thumuluri","year":"2022","journal-title":"Bioinformatics"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"654","DOI":"10.1016\/j.cels.2021.05.017","article-title":"Learning the protein language: Evolution, structure, and function","volume":"12","author":"Bepler","year":"2021","journal-title":"Cell Syst."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"1750","DOI":"10.1016\/j.csbj.2021.03.022","article-title":"The language of proteins: NLP, machine learning & protein sequences","volume":"19","author":"Ofer","year":"2021","journal-title":"Comput. Struct. Biotechnol. J."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"107398","DOI":"10.1016\/j.ymssp.2020.107398","article-title":"1D convolutional neural networks and applications: A survey","volume":"151","author":"Kiranyaz","year":"2021","journal-title":"Mech. Syst. Signal Process."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1007\/s10916-018-1003-9","article-title":"A survey of data mining and deep learning in bioinformatics","volume":"42","author":"Lan","year":"2018","journal-title":"J. Med. Syst."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"2673","DOI":"10.1109\/78.650093","article-title":"Bidirectional recurrent neural networks","volume":"45","author":"Schuster","year":"1997","journal-title":"IEEE Trans. Signal Process."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"2278","DOI":"10.1109\/5.726791","article-title":"Gradient-based learning applied to document recognition","volume":"86","author":"LeCun","year":"1998","journal-title":"Proc. IEEE"},{"key":"ref_34","first-page":"50","article-title":"Data Augmentation for Arabic Speech Recognition Based on End-to-End Deep Learning","volume":"21","author":"Alsayadi","year":"2021","journal-title":"Int. J. Intell. Comput. Inf. Sci."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"406","DOI":"10.1039\/D0ME00161A","article-title":"Sequence-based peptide identification, generation, and property prediction with deep learning: A review","volume":"6","author":"Chen","year":"2021","journal-title":"Mol. Syst. Des. Eng."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s12920-020-0677-2","article-title":"Convolutional neural network models for cancer type prediction based on gene expression","volume":"13","author":"Mostavi","year":"2020","journal-title":"BMC Med. Genom."},{"key":"ref_37","first-page":"1","article-title":"Performance improvement for a 2D convolutional neural network by using SSC encoding on protein\u2013protein interaction tasks","volume":"22","author":"Wang","year":"2021","journal-title":"BMC Bioinform."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41598-021-03431-4","article-title":"Protein embeddings and deep learning predict binding residues for various ligand classes","volume":"11","author":"Littmann","year":"2021","journal-title":"Sci. Rep."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"102844","DOI":"10.1016\/j.jvcir.2020.102844","article-title":"Protein secondary structure prediction based on integration of CNN and LSTM model","volume":"71","author":"Cheng","year":"2020","journal-title":"J. Vis. Commun. Image Represent"},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"2766","DOI":"10.1093\/bioinformatics\/bty1051","article-title":"DeepPhos: Prediction of protein phosphorylation sites with deep learning","volume":"35","author":"Luo","year":"2019","journal-title":"Bioinformatics"},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"3909","DOI":"10.1093\/bioinformatics\/btx496","article-title":"MusiteDeep: A deep-learning framework for general and kinase-specific phosphorylation site prediction","volume":"33","author":"Wang","year":"2017","journal-title":"Bioinformatics"},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"480","DOI":"10.1016\/j.procs.2021.12.273","article-title":"Protein post-translational modification site prediction using deep learning","volume":"198","author":"Deng","year":"2022","journal-title":"Procedia Comput. Sci."},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"9923112","DOI":"10.1155\/2021\/9923112","article-title":"LSTMCNNsucc: A Bidirectional LSTM and CNN-Based Deep Learning Method for Predicting Lysine Succinylation Sites","volume":"2021","author":"Huang","year":"2021","journal-title":"Biomed Res. Int."},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"D204","DOI":"10.1093\/nar\/gku989","article-title":"UniProt: A hub for protein information","volume":"43","author":"Consortium","year":"2015","journal-title":"Nucleic Acids Res."},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"3150","DOI":"10.1093\/bioinformatics\/bts565","article-title":"CD-HIT: Accelerated for clustering the next-generation sequencing data","volume":"28","author":"Fu","year":"2012","journal-title":"Bioinformatics"},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Elnaggar, A., Heinzinger, M., Dallago, C., Rihawi, G., Wang, Y., Jones, L., Gibbs, T., Feher, T., Angerer, C., and Steinegger, M. (2020). ProtTrans: Towards cracking the language of Life\u2019s code through self-supervised deep learning and high performance computing. arXiv.","DOI":"10.1101\/2020.07.12.199554"},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"e2016239118","DOI":"10.1073\/pnas.2016239118","article-title":"Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences","volume":"118","author":"Rives","year":"2021","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"ref_48","first-page":"9689","article-title":"Evaluating protein transfer learning with TAPE","volume":"32","author":"Rao","year":"2019","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_49","unstructured":"Devlin, J., Chang, M.W., Lee, K., and Toutanova, K. (2018). Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv."},{"key":"ref_50","unstructured":"Lan, Z., Chen, M., Goodman, S., Gimpel, K., Sharma, P., and Soricut, R. (2019). Albert: A lite bert for self-supervised learning of language representations. arXiv."},{"key":"ref_51","unstructured":"Yang, Z., Dai, Z., Yang, Y., Carbonell, J., Salakhutdinov, R.R., and Le, Q.V. (2019). Xlnet: Generalized autoregressive pretraining for language understanding. Adv. Neural Inf. Process. Syst., 32."},{"key":"ref_52","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41467-018-04964-5","article-title":"Clustering huge protein sequence sets in linear time","volume":"9","author":"Steinegger","year":"2018","journal-title":"Nat. Commun."},{"key":"ref_53","doi-asserted-by":"crossref","first-page":"926","DOI":"10.1093\/bioinformatics\/btu739","article-title":"UniRef clusters: A comprehensive and scalable alternative for improving sequence similarity searches","volume":"31","author":"Suzek","year":"2015","journal-title":"Bioinformatics"},{"key":"ref_54","doi-asserted-by":"crossref","first-page":"D222","DOI":"10.1093\/nar\/gkt1223","article-title":"Pfam: The protein families database","volume":"42","author":"Finn","year":"2014","journal-title":"Nucleic Acids Res."},{"key":"ref_55","doi-asserted-by":"crossref","unstructured":"Patil, A., and Rane, M. (2020, January 15\u201316). Convolutional neural networks: An overview and its applications in pattern recognition. Proceedings of the International Conference on Information and Communication Technology for Intelligent Systems, Ahmedabad, India.","DOI":"10.1007\/978-981-15-7078-0_3"},{"key":"ref_56","first-page":"2825","article-title":"Scikit-learn: Machine learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J. Mach. Learn. Res."},{"key":"ref_57","doi-asserted-by":"crossref","unstructured":"Chen, T., and Guestrin, C. (2016, January 13\u201317). Xgboost: A scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA.","DOI":"10.1145\/2939672.2939785"},{"key":"ref_58","doi-asserted-by":"crossref","first-page":"273","DOI":"10.1007\/BF00994018","article-title":"Support-vector networks","volume":"20","author":"Cortes","year":"1995","journal-title":"Mach. Learn."},{"key":"ref_59","doi-asserted-by":"crossref","first-page":"533","DOI":"10.1001\/jama.2016.7653","article-title":"Logistic regression: Relating patient characteristics to outcomes","volume":"316","author":"Tolles","year":"2016","journal-title":"JAMA"},{"key":"ref_60","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Mach. Learn."}],"container-title":["Axioms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2075-1680\/11\/9\/469\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T00:30:52Z","timestamp":1760142652000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2075-1680\/11\/9\/469"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,9,14]]},"references-count":60,"journal-issue":{"issue":"9","published-online":{"date-parts":[[2022,9]]}},"alternative-id":["axioms11090469"],"URL":"https:\/\/doi.org\/10.3390\/axioms11090469","relation":{},"ISSN":["2075-1680"],"issn-type":[{"value":"2075-1680","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,9,14]]}}}