{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,1]],"date-time":"2026-03-01T04:25:47Z","timestamp":1772339147951,"version":"3.50.1"},"reference-count":37,"publisher":"Oxford University Press (OUP)","issue":"11","license":[{"start":{"date-parts":[[2024,10,24]],"date-time":"2024-10-24T00:00:00Z","timestamp":1729728000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"U.H.D"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,11,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>O-linked glycosylation, an essential post-translational modification process in Homo sapiens, involves attaching sugar moieties to the oxygen atoms of serine and\/or threonine residues. It influences various biological and cellular functions. While threonine or serine residues within protein sequences are potential sites for O-linked glycosylation, not all serine and\/or threonine residues undergo this modification, underscoring the importance of characterizing its occurrence. This study presents a novel approach for predicting intracellular and extracellular O-linked glycosylation events on proteins, which are crucial for comprehending cellular processes. Two base multi-layer perceptron models were trained by leveraging a stacked generalization framework. These base models respectively use ProtT5 and Ankh O-linked glycosylation site-specific embeddings whose combined predictions are used to train the meta-multi-layer perceptron model. Trained on extensive O-linked glycosylation datasets, the stacked-generalization model demonstrated high predictive performance on independent test datasets. Furthermore, the study emphasizes the distinction between nucleocytoplasmic and extracellular O-linked glycosylation, offering insights into their functional implications that were overlooked in previous studies. By integrating the protein language model\u2019s embedding with stacked generalization techniques, this approach enhances predictive accuracy of O-linked glycosylation events and illuminates the intricate roles of O-linked glycosylation in proteomics, potentially accelerating the discovery of novel glycosylation sites.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>Stack-OglyPred-PLM produces Sensitivity, Specificity, Matthews Correlation Coefficient, and Accuracy of 90.50%, 89.60%, 0.464, and 89.70%, respectively on a benchmark NetOGlyc-4.0 independent test dataset. These results demonstrate that Stack-OglyPred-PLM is a robust computational tool to predict O-linked glycosylation sites in proteins.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>The developed tool, programs, training, and test dataset are available at https:\/\/github.com\/PakhrinLab\/Stack-OglyPred-PLM.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btae643","type":"journal-article","created":{"date-parts":[[2024,10,24]],"date-time":"2024-10-24T18:45:53Z","timestamp":1729795553000},"source":"Crossref","is-referenced-by-count":12,"title":["Prediction of human <i>O-<\/i>linked glycosylation sites using stacked generalization and embeddings from pre-trained protein language model"],"prefix":"10.1093","volume":"40","author":[{"ORCID":"https:\/\/orcid.org\/0009-0009-3310-2939","authenticated-orcid":false,"given":"Subash Chandra","family":"Pakhrin","sequence":"first","affiliation":[{"name":"Department of Computer Science and Engineering Technology, University of Houston-Downtown , Houston, TX 77002,","place":["United States"]}]},{"given":"Neha","family":"Chauhan","sequence":"additional","affiliation":[{"name":"School of Computing, Wichita State University , Wichita, KS 67260,","place":["United States"]}]},{"given":"Salman","family":"Khan","sequence":"additional","affiliation":[{"name":"Department of Computer Science, The University of Texas at Austin , Austin, TX 78712,","place":["United States"]}]},{"given":"Jamie","family":"Upadhyaya","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering Technology, University of Houston-Downtown , Houston, TX 77002,","place":["United States"]}]},{"given":"Moriah Rene","family":"Beck","sequence":"additional","affiliation":[{"name":"Department of Chemistry and Biochemistry, Wichita State University , Wichita, KS 67260,","place":["United States"]}]},{"given":"Eduardo","family":"Blanco","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Arizona , Tucson, AZ 85721,","place":["United States"]}]}],"member":"286","published-online":{"date-parts":[[2024,10,24]]},"reference":[{"key":"2024111117060445800_btae643-B1","doi-asserted-by":"crossref","first-page":"e48885","DOI":"10.15252\/embr.201948885","article-title":"O-glycan initiation directs distinct biological pathways and controls epithelial differentiation","volume":"21","author":"Bagdonaite","year":"2020","journal-title":"EMBO Rep"},{"key":"2024111117060445800_btae643-B2","doi-asserted-by":"crossref","first-page":"456","DOI":"10.1038\/nature12723","article-title":"The heterotaxy gene GALNT11 glycosylates Notch to orchestrate cilia type and laterality","volume":"504","author":"Boskovski","year":"2013","journal-title":"Nature"},{"key":"2024111117060445800_btae643-B3","doi-asserted-by":"crossref","first-page":"1616","DOI":"10.1074\/mcp.M114.046862","article-title":"Probing the O-glycoproteome of gastric cancer cell lines for biomarker discovery","volume":"14","author":"Campos","year":"2015","journal-title":"Mol Cell Proteomics"},{"key":"2024111117060445800_btae643-B4","doi-asserted-by":"crossref","first-page":"438","DOI":"10.1186\/1471-2105-8-438","article-title":"Glycosylation site prediction using ensembles of support vector machine classifiers","volume":"8","author":"Caragea","year":"2007","journal-title":"BMC Bioinform"},{"key":"2024111117060445800_btae643-B5","doi-asserted-by":"crossref","first-page":"535","DOI":"10.1007\/s11517-015-1268-9","article-title":"Prediction of O-glycosylation sites based on multi-scale composition of amino acids and feature selection","volume":"53","author":"Chen","year":"2015","journal-title":"Med Biol Eng Comput"},{"key":"2024111117060445800_btae643-B6","doi-asserted-by":"crossref","first-page":"101","DOI":"10.1186\/1471-2105-9-101","article-title":"Prediction of mucin-type O-glycosylation sites in mammalian proteins using the composition of k-spaced amino acid pairs","volume":"9","author":"Chen","year":"2008","journal-title":"BMC Bioinformatics"},{"key":"2024111117060445800_btae643-B7","doi-asserted-by":"crossref","DOI":"10.1101\/2023.01.16.524265","article-title":"Ankh\u2625: optimized protein language model unlocks general-purpose modelling","author":"Elnaggar","year":"2023"},{"key":"2024111117060445800_btae643-B8","doi-asserted-by":"crossref","first-page":"7112","DOI":"10.1109\/TPAMI.2021.3095381","article-title":"ProtTrans: toward understanding the language of life through self-supervised learning","volume":"44","author":"Elnaggar","year":"2022","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"2024111117060445800_btae643-B9","doi-asserted-by":"crossref","first-page":"825","DOI":"10.1146\/annurev-biochem-060608-102511","article-title":"Cross talk between O-GlcNAcylation and phosphorylation: roles in signaling, transcription, and chronic disease","volume":"80","author":"Hart","year":"2011","journal-title":"Annu Rev Biochem"},{"key":"2024111117060445800_btae643-B10","doi-asserted-by":"crossref","first-page":"95","DOI":"10.1021\/acs.jproteome.3c00458","article-title":"O-GlcNAcPRED-DL: prediction of protein O-GlcNAcylation sites based on an ensemble model of deep learning","volume":"23","author":"Hu","year":"2024","journal-title":"J Proteome Res"},{"key":"2024111117060445800_btae643-B11","doi-asserted-by":"crossref","first-page":"611","DOI":"10.1016\/j.gpb.2020.05.003","article-title":"OGP: a repository of experimentally characterized O-glycoproteins to facilitate studies on O-glycosylation","volume":"19","author":"Huang","year":"2021","journal-title":"Genomics Proteomics Bioinf"},{"key":"2024111117060445800_btae643-B12","doi-asserted-by":"crossref","first-page":"680","DOI":"10.1093\/bioinformatics\/btq003","article-title":"CD-HIT suite: a web server for clustering and comparing biological sequences","volume":"26","author":"Huang","year":"2010","journal-title":"Bioinformatics"},{"key":"2024111117060445800_btae643-B13","doi-asserted-by":"crossref","first-page":"153","DOI":"10.1093\/glycob\/cwh151","article-title":"Prediction, conservation analysis, and structural characterization of mammalian mucin-type O-glycosylation sites","volume":"15","author":"Julenius","year":"2005","journal-title":"Glycobiology"},{"key":"2024111117060445800_btae643-B14","doi-asserted-by":"crossref","first-page":"1411","DOI":"10.1093\/bioinformatics\/btu852","article-title":"GlycoMine: a machine learning-based approach for predicting N-, C- and O-linked glycosylation in the human proteome","volume":"31","author":"Li","year":"2015","journal-title":"Bioinformatics"},{"key":"2024111117060445800_btae643-B15","doi-asserted-by":"crossref","first-page":"203","DOI":"10.1016\/j.compbiolchem.2006.02.002","article-title":"Predicting O-glycosylation sites in mammalian proteins by using SVMs","volume":"30","author":"Li","year":"2006","journal-title":"Comput Biol Chem"},{"key":"2024111117060445800_btae643-B16","doi-asserted-by":"crossref","first-page":"D471","DOI":"10.1093\/nar\/gkab1017","article-title":"dbPTM in 2022: an updated database for exploring regulatory networks and functional associations of protein post-translational modifications","volume":"50","author":"Li","year":"2022","journal-title":"Nucleic Acids Res"},{"key":"2024111117060445800_btae643-B17","doi-asserted-by":"crossref","first-page":"23916","DOI":"10.1038\/s41598-021-03431-4","article-title":"Protein embeddings and deep learning predict binding residues for various ligand classes","volume":"11","author":"Littmann","year":"2021","journal-title":"Sci Rep"},{"key":"2024111117060445800_btae643-B18","first-page":"2579","article-title":"Visualizing data using t-SNE","volume":"9","author":"Maaten","year":"2008","journal-title":"Mach Learn Res"},{"key":"2024111117060445800_btae643-B19","doi-asserted-by":"crossref","first-page":"168","DOI":"10.1093\/glycob\/cwaa067","article-title":"ISOGlyP: de novo prediction of isoform-specific mucin-type O-glycosylation","volume":"31","author":"Mohl","year":"2021","journal-title":"Glycobiology"},{"key":"2024111117060445800_btae643-B20","doi-asserted-by":"crossref","first-page":"6257","DOI":"10.1038\/s41467-022-33806-8","article-title":"Global mapping of GalNAc-T isoform-specificities and O-glycosylation site-occupancy in a tissue-forming human cell line","volume":"13","author":"Nielsen","year":"2022","journal-title":"Nat Commun"},{"key":"2024111117060445800_btae643-B21","doi-asserted-by":"crossref","first-page":"855","DOI":"10.1016\/j.cell.2006.08.019","article-title":"Glycosylation in cellular mechanisms of health and disease","volume":"126","author":"Ohtsubo","year":"2006","journal-title":"Cell"},{"key":"2024111117060445800_btae643-B22","author":"Pakhrin","year":"2022"},{"key":"2024111117060445800_btae643-B23","doi-asserted-by":"crossref","first-page":"7314","DOI":"10.3390\/molecules26237314","article-title":"DeepNGlyPred: a deep neural network-based approach for human N-linked glycosylation site prediction","volume":"26","author":"Pakhrin","year":"2021","journal-title":"Molecules"},{"key":"2024111117060445800_btae643-B24","doi-asserted-by":"crossref","first-page":"411","DOI":"10.1093\/glycob\/cwad033","article-title":"LMNglyPred: prediction of human N-linked glycosylation sites using embeddings from a pre-trained protein language model","volume":"33","author":"Pakhrin","year":"2023","journal-title":"Glycobiology"},{"key":"2024111117060445800_btae643-B25","doi-asserted-by":"crossref","first-page":"2548","DOI":"10.1021\/acs.jproteome.2c00667","article-title":"LMPhosSite: a deep learning-based approach for general protein phosphorylation site prediction using embeddings from the local window sequence and pretrained protein language model","volume":"22","author":"Pakhrin","year":"2023","journal-title":"J Proteome Res"},{"key":"2024111117060445800_btae643-B26","doi-asserted-by":"crossref","first-page":"lqae011","DOI":"10.1093\/nargab\/lqae011","article-title":"SumoPred-PLM: human SUMOylation and SUMO2\/3 sites prediction using pre-trained protein language model","volume":"6","author":"Palacios","year":"2024","journal-title":"NAR Genom Bioinform"},{"key":"2024111117060445800_btae643-B27","first-page":"1","article-title":"Exploring the limits of transfer learning with a unified text-to-text transformer","volume":"21","author":"Raffel","year":"2020","journal-title":"J Mach Learn Res"},{"key":"2024111117060445800_btae643-B28","doi-asserted-by":"crossref","first-page":"1478","DOI":"10.1038\/emboj.2013.79","article-title":"Precision mapping of the human O-GalNAc glycoproteome through SimpleCell technology","volume":"32","author":"Steentoft","year":"2013","journal-title":"EMBO J"},{"key":"2024111117060445800_btae643-B29","doi-asserted-by":"crossref","first-page":"603","DOI":"10.1038\/s41592-019-0437-4","article-title":"Protein-level assembly increases protein sequence recovery from metagenomic samples manyfold","volume":"16","author":"Steinegger","year":"2019","journal-title":"Nat Methods"},{"key":"2024111117060445800_btae643-B30","doi-asserted-by":"crossref","first-page":"4140","DOI":"10.1093\/bioinformatics\/btz215","article-title":"SPRINT-Gly: predicting N- and O-linked glycosylation sites of human and mouse proteins by using sequence and predicted structural properties","volume":"35","author":"Taherzadeh","year":"2019","journal-title":"Bioinformatics"},{"key":"2024111117060445800_btae643-B31","doi-asserted-by":"crossref","first-page":"W228","DOI":"10.1093\/nar\/gkac278","article-title":"DeepLoc 2.0: multi-label subcellular localization prediction using protein language models","volume":"50","author":"Thumuluri","year":"2022","journal-title":"Nucleic Acids Res"},{"key":"2024111117060445800_btae643-B32","doi-asserted-by":"crossref","first-page":"6921","DOI":"10.1074\/jbc.R112.418558","article-title":"Mucin-type O-glycosylation during development","volume":"288","author":"Tran","year":"2013","journal-title":"J Biol Chem"},{"key":"2024111117060445800_btae643-B33","doi-asserted-by":"crossref","first-page":"D480","DOI":"10.1093\/nar\/gkaa1100","article-title":"UniProt: the universal protein knowledgebase in 2021","volume":"49","author":"UniProt","year":"2021","journal-title":"Nucleic Acids Res"},{"key":"2024111117060445800_btae643-B34","doi-asserted-by":"crossref","first-page":"776","DOI":"10.1038\/nchembio.1403","article-title":"Adaptive immune activation: glycosylation does matter","volume":"9","author":"Wolfert","year":"2013","journal-title":"Nat Chem Biol"},{"key":"2024111117060445800_btae643-B35","doi-asserted-by":"crossref","first-page":"241","DOI":"10.1016\/S0893-6080(05)80023-1","article-title":"Stacked generalization","volume":"5","author":"Wolpert","year":"1992","journal-title":"Neural Netw"},{"key":"2024111117060445800_btae643-B36","doi-asserted-by":"crossref","first-page":"445","DOI":"10.1007\/978-3-642-21802-6_72","volume-title":"Advanced Research on Computer Education, Simulation and Modeling","author":"Yang","year":"2011"},{"key":"2024111117060445800_btae643-B37","doi-asserted-by":"crossref","first-page":"2150029","DOI":"10.1142\/S0219720021500293","article-title":"O-glycosylation site prediction for Homo sapiens by combining properties and sequence features with support vector machine","volume":"20","author":"Zhu","year":"2022","journal-title":"J Bioinform Comput Biol"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btae643\/60071984\/btae643.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/11\/btae643\/60592567\/btae643.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/11\/btae643\/60592567\/btae643.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,11,11]],"date-time":"2024-11-11T17:06:26Z","timestamp":1731344786000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btae643\/7840257"}},"subtitle":[],"editor":[{"given":"Xin","family":"Gao","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2024,10,24]]},"references-count":37,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2024,11,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btae643","relation":{},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2024,11]]},"published":{"date-parts":[[2024,10,24]]},"article-number":"btae643"}}