{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,13]],"date-time":"2026-01-13T21:08:27Z","timestamp":1768338507503,"version":"3.49.0"},"reference-count":55,"publisher":"Oxford University Press (OUP)","issue":"12","license":[{"start":{"date-parts":[[2018,2,6]],"date-time":"2018-02-06T00:00:00Z","timestamp":1517875200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/about_us\/legal\/notices"}],"funder":[{"DOI":"10.13039\/501100012226","name":"Fundamental Research Funds for the Central Universities","doi-asserted-by":"crossref","award":["3132016306, 3132017048 and 3132017085"],"award-info":[{"award-number":["3132016306, 3132017048 and 3132017085"]}],"id":[{"id":"10.13039\/501100012226","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100012456","name":"National Social Science Foundation of China","doi-asserted-by":"crossref","award":["15CGL031"],"award-info":[{"award-number":["15CGL031"]}],"id":[{"id":"10.13039\/501100012456","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Program for Dalian High Level Talent Innovation Support","award":["2015R063"],"award-info":[{"award-number":["2015R063"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2018,6,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Protein O-GlcNAcylation (O-GlcNAc) is an important post-translational modification of serine (S)\/threonine (T) residues that involves multiple molecular and cellular processes. Recent studies have suggested that abnormal O-GlcNAcylation causes many diseases, such as cancer and various neurodegenerative diseases. 
With the available protein O-GlcNAcylation sites experimentally verified, it is highly desired to develop automated methods to rapidly and effectively identify O-GlcNAcylation sites. Although some computational methods have been proposed, their performance has been unsatisfactory, particularly in terms of prediction sensitivity.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>In this study, we developed an ensemble model O-GlcNAcPRED-II to identify potential O-GlcNAcylation sites. A K-means principal component analysis oversampling technique (KPCA) and fuzzy undersampling method (FUS) were first proposed and incorporated to reduce the proportion of the original positive and negative training samples. Then, rotation forest, a type of classifier-integrated system, was adopted to divide the eight types of feature space into several subsets using four sub-classifiers: random forest, k-nearest neighbour, naive Bayesian and support vector machine. We observed that O-GlcNAcPRED-II achieved a sensitivity of 81.05%, specificity of 95.91%, accuracy of 91.43% and Matthews correlation coefficient of 0.7928 for five-fold cross-validation run 10 times. 
Additionally, the results obtained by O-GlcNAcPRED-II on two independent datasets also indicated that the proposed predictor outperformed five published prediction tools.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>http:\/\/121.42.167.206\/OGlcPred\/<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/bty039","type":"journal-article","created":{"date-parts":[[2018,2,6]],"date-time":"2018-02-06T04:12:44Z","timestamp":1517890364000},"page":"2029-2036","source":"Crossref","is-referenced-by-count":134,"title":["O-GlcNAcPRED-II: an integrated classification algorithm for identifying O-GlcNAcylation sites based on fuzzy undersampling and a <i>K<\/i>-means PCA oversampling technique"],"prefix":"10.1093","volume":"34","author":[{"given":"Cangzhi","family":"Jia","sequence":"first","affiliation":[{"name":"Department of Mathematics, Dalian Maritime University, Dalian, China"}]},{"given":"Yun","family":"Zuo","sequence":"additional","affiliation":[{"name":"Department of Mathematics, Dalian Maritime University, Dalian, China"}]},{"given":"Quan","family":"Zou","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Tianjin University, Tianjin, China"}]}],"member":"286","published-online":{"date-parts":[[2018,2,6]]},"reference":[{"key":"2023012713380896500_bty039-B1","doi-asserted-by":"crossref","first-page":"1849","DOI":"10.1093\/bioinformatics\/btg249","article-title":"RVP-net: online prediction of real valued accessible surface area of proteins from single 
sequences","volume":"19","author":"Ahmad","year":"2003","journal-title":"Bioinformatics"},{"key":"2023012713380896500_bty039-B2","doi-asserted-by":"crossref","first-page":"629","DOI":"10.1002\/prot.10328","article-title":"Real value prediction of solvent accessibility from amino acid sequence","volume":"50","author":"Ahmad","year":"2003","journal-title":"Proteins"},{"key":"2023012713380896500_bty039-B3","doi-asserted-by":"crossref","first-page":"115","DOI":"10.1093\/nar\/gkh131","article-title":"UniProt: the Universal Protein knowledgebase","volume":"32","author":"Apweiler","year":"2004","journal-title":"Nucleic Acids Res"},{"key":"2023012713380896500_bty039-B4","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Mach. Learn"},{"key":"2023012713380896500_bty039-B5","doi-asserted-by":"crossref","first-page":"e67008.","DOI":"10.1371\/journal.pone.0067008","article-title":"Insilico platform for prediction of N-, O- and C-glycosites in eukaryotic protein sequences","volume":"8","author":"Chauhan","year":"2013","journal-title":"PLoS One"},{"key":"2023012713380896500_bty039-B6","doi-asserted-by":"crossref","first-page":"e68.","DOI":"10.1093\/nar\/gks1450","article-title":"iRSpot-PseDNC: identify recombination spots with pseudo dinucleotide composition","volume":"41","author":"Chen","year":"2013","journal-title":"Nucleic Acids Res"},{"key":"2023012713380896500_bty039-B7","article-title":"iATC-mHyb: a hybrid multi-label classifier for predicting the classification of anatomical therapeutic chemicals","volume":"8","author":"Cheng","year":"2017","journal-title":"Oncotarget"},{"key":"2023012713380896500_bty039-B8","doi-asserted-by":"crossref","first-page":"341","DOI":"10.1093\/bioinformatics\/btx387","article-title":"iATC-mISF: a multi-label classifier for predicting the classes of anatomical therapeutic 
chemicals","volume":"33","author":"Cheng","year":"2017","journal-title":"Bioinformatics"},{"key":"2023012713380896500_bty039-B9","doi-asserted-by":"crossref","first-page":"246.","DOI":"10.1002\/prot.1035","article-title":"Prediction of protein cellular attributes using pseudo amino acid composition","volume":"44","author":"Chou","year":"2001","journal-title":"Proteins Struct. Funct. Bioinf"},{"key":"2023012713380896500_bty039-B10","doi-asserted-by":"crossref","first-page":"1092","DOI":"10.1039\/c3mb25555g","article-title":"Some remarks on predicting multi-label attributes in molecular biosystems","volume":"9","author":"Chou","year":"2013","journal-title":"Mol. Biosyst"},{"key":"2023012713380896500_bty039-B11","doi-asserted-by":"crossref","first-page":"161","DOI":"10.1016\/S0304-4165(99)00176-2","article-title":"O-GlcNAc and the control of gene expression","volume":"1473","author":"Comer","year":"1999","journal-title":"Biochim. Biophys. Acta"},{"key":"2023012713380896500_bty039-B12","doi-asserted-by":"crossref","first-page":"3150","DOI":"10.1093\/bioinformatics\/bts565","article-title":"CD-HIT","volume":"28","author":"Fu","year":"2012","journal-title":"Bioinformatics"},{"key":"2023012713380896500_bty039-B13","first-page":"310","article-title":"Prediction of glycosylation across the human proteome and the correlation to protein function","volume":"7","author":"Gupta","year":"2002","journal-title":"Pac. Symp. Biocomput. Pac. Symp. 
Biocomput"},{"key":"2023012713380896500_bty039-B14","doi-asserted-by":"crossref","first-page":"370","DOI":"10.1093\/nar\/27.1.370","article-title":"O-GLYCBASE: a revised database of O-glycosylated proteins","volume":"27","author":"Hansen","year":"1999","journal-title":"Nucleic Acids Res"},{"key":"2023012713380896500_bty039-B15","doi-asserted-by":"crossref","first-page":"D261.","DOI":"10.1093\/nar\/gkr1122","article-title":"PhosphoSitePlus: a comprehensive resource for investigating the structure and function of experimentally determined post- translational modifications in man and mouse","volume":"40","author":"Hornbeck","year":"2012","journal-title":"Nucleic Acids Res"},{"key":"2023012713380896500_bty039-B16","author":"Hosseinzadeh","year":"2016"},{"key":"2023012713380896500_bty039-B17","doi-asserted-by":"crossref","first-page":"680","DOI":"10.1093\/bioinformatics\/btq003","article-title":"CD-HIT Suite: a web server for clustering and comparing biological sequences","volume":"5","author":"Huang","year":"2010","journal-title":"Bioinformatics"},{"key":"2023012713380896500_bty039-B18","doi-asserted-by":"crossref","first-page":"2909","DOI":"10.1039\/c3mb70326f","article-title":"O-GlcNAcPRED: a sensitive predictor to capture protein O-GlcNAcylation sites","volume":"9","author":"Jia","year":"2013","journal-title":"Mol. Biosyst"},{"key":"2023012713380896500_bty039-B19","doi-asserted-by":"crossref","first-page":"10410","DOI":"10.3390\/ijms150610410","article-title":"Prediction of protein S-nitrosylation sites based on adapted normal distribution bi-profile Bayes and Chou's pseudo amino acid composition","volume":"15","author":"Jia","year":"2014","journal-title":"Int. J. Mol. 
Sci"},{"key":"2023012713380896500_bty039-B20","doi-asserted-by":"crossref","first-page":"3133","DOI":"10.1093\/bioinformatics\/btw387","article-title":"pSumo-CD: predicting sumoylation sites in proteins with covariance discriminant algorithm by incorporating sequence-coupled effects into general PseAAC","volume":"32","author":"Jia","year":"2016","journal-title":"Bioinformatics"},{"key":"2023012713380896500_bty039-B21","doi-asserted-by":"crossref","first-page":"416","DOI":"10.1016\/j.bbapap.2013.12.002","article-title":"Validation of the reliability of computational O-GlcNAc prediction","volume":"1844","author":"Jochmann","year":"2014","journal-title":"BBA Proteins Proteomics"},{"key":"2023012713380896500_bty039-B22","doi-asserted-by":"crossref","first-page":"S10.","DOI":"10.1186\/1471-2105-16-S18-S10","article-title":"A two-layered machine learning method to identify protein O-GlcNAcylation sites with O-GlcNActransferase substrate motifs","volume":"16","author":"Kao","year":"2015","journal-title":"BMC Bioinformatics"},{"key":"2023012713380896500_bty039-B23","doi-asserted-by":"crossref","first-page":"622","DOI":"10.1093\/nar\/gkj083","article-title":"dbPTM: an information repository of protein post-translational modification","volume":"34","author":"Lee","year":"2006","journal-title":"Nucleic Acids Res"},{"key":"2023012713380896500_bty039-B24","doi-asserted-by":"crossref","first-page":"1411","DOI":"10.1093\/bioinformatics\/btu852","article-title":"GlycoMine: a machine learning-based approach for predicting N-, C- and O-linked glycosylation in the human proteome","volume":"31","author":"Li","year":"2015","journal-title":"Bioinformatics"},{"key":"2023012713380896500_bty039-B25","doi-asserted-by":"crossref","first-page":"34595.","DOI":"10.1038\/srep34595","article-title":"GlycoMinestruct: a new bioinformatics tool for highly accurate mapping of the human N-linked and O-linked glycoproteomes by incorporating structural 
features","volume":"6","author":"Li","year":"2016","journal-title":"Sci. Rep"},{"key":"2023012713380896500_bty039-B26","doi-asserted-by":"crossref","first-page":"634","DOI":"10.1039\/c3mb25466f","article-title":"iLoc-Animal: a multi-label learning classifier for predicting subcellular localization of animal proteins","volume":"9","author":"Lin","year":"2013","journal-title":"Mol. BioSyst"},{"key":"2023012713380896500_bty039-B27","doi-asserted-by":"crossref","first-page":"12961","DOI":"10.1093\/nar\/gku1019","article-title":"iPro54-PseKNC: a sequence-based predictor for identifying sigma-54 promoters in prokaryote with pseudo k-tuple nucleotide composition","volume":"42","author":"Lin","year":"2014","journal-title":"Nucleic Acids Res"},{"key":"2023012713380896500_bty039-B28","doi-asserted-by":"crossref","first-page":"W65","DOI":"10.1093\/nar\/gkv458","article-title":"Pse-in-One: a web server for generating various modes of pseudo components of DNA, RNA, and protein sequences","volume":"43","author":"Liu","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"2023012713380896500_bty039-B29","first-page":"67","article-title":"Pse-in-One 2.0: an improved package of web servers for generating various modes of pseudo components of DNA, RNA, and protein Sequences","volume":"9","author":"Liu","year":"2017","journal-title":"Nat. Sci"},{"key":"2023012713380896500_bty039-B30","doi-asserted-by":"crossref","first-page":"552","DOI":"10.2174\/1573406413666170515120507","article-title":"iPGK-PseAAC: identify lysine phosphoglycerylation sites in proteins by incorporating four different tiers of amino acid pairwise coupling information into the general PseAAC","volume":"13","author":"Liu","year":"2017","journal-title":"Med. 
Chem"},{"key":"2023012713380896500_bty039-B31","doi-asserted-by":"crossref","first-page":"947416.","DOI":"10.1155\/2014\/947416","article-title":"iMethyl-PseAAC: identification of protein methylation sites via a pseudo amino acid composition approach","volume":"2014","author":"Qiu","year":"2014","journal-title":"Biomed. Res. Int"},{"key":"2023012713380896500_bty039-B32","doi-asserted-by":"crossref","first-page":"1731","DOI":"10.1080\/07391102.2014.968875","article-title":"iUbiq-Lys: prediction of lysine ubiquitination sites in proteins by extracting sequence evolution information via a grey system model","volume":"33","author":"Qiu","year":"2015","journal-title":"J. Biomol. Struct. Dyn"},{"key":"2023012713380896500_bty039-B33","doi-asserted-by":"crossref","first-page":"44310","DOI":"10.18632\/oncotarget.10027","article-title":"iHyd-PseCp: identify hydroxyproline and hydroxylysine in proteins by incorporating sequence-coupled effects into general PseAAC","volume":"7","author":"Qiu","year":"2016","journal-title":"Oncotarget"},{"key":"2023012713380896500_bty039-B34","doi-asserted-by":"crossref","first-page":"3116","DOI":"10.1093\/bioinformatics\/btw380","article-title":"iPTM-mLys: identifying multiple lysine PTM sites and their different types","volume":"32","author":"Qiu","year":"2016","journal-title":"Bioinformatics"},{"key":"2023012713380896500_bty039-B35","article-title":"iPhos-PseEvo: identifying human phosphorylated proteins by incorporating evolutionary information into general PseAAC via grey system theory","volume":"36","author":"Qiu","year":"2017","journal-title":"Mol. Inf"},{"key":"2023012713380896500_bty039-B36","doi-asserted-by":"crossref","first-page":"1619","DOI":"10.1109\/TPAMI.2006.211","article-title":"Rotation forest: a new classifier ensemble method","volume":"28","author":"Rodriguez","year":"2006","journal-title":"IEEE Trans. Pattern Anal. Mach. 
Intell"},{"key":"2023012713380896500_bty039-B37","doi-asserted-by":"crossref","first-page":"e4920.","DOI":"10.1371\/journal.pone.0004920","article-title":"Computational identification of protein methylation sites through bi-Profile bayes feature extraction","volume":"4","author":"Shao","year":"2009","journal-title":"PLoS One"},{"key":"2023012713380896500_bty039-B38","doi-asserted-by":"crossref","first-page":"752","DOI":"10.1093\/bioinformatics\/btq043","article-title":"Cascleave: towards more accurate prediction of caspase substrate cleavage sites","volume":"26","author":"Song","year":"2010","journal-title":"Bioinformatics"},{"key":"2023012713380896500_bty039-B39","doi-asserted-by":"crossref","first-page":"3308","DOI":"10.1016\/S0021-9258(17)43295-9","article-title":"Topography and polypeptide distribution of terminal N- acetylglucosamine residues on the surfaces of intact lymphocytes","volume":"259","author":"Torres","year":"1984","journal-title":"J. Biol. Chem"},{"key":"2023012713380896500_bty039-B40","doi-asserted-by":"crossref","first-page":"2760","DOI":"10.1021\/acs.jproteome.6b00304","article-title":"DAPPLE 2: a tool for the homology-based prediction of post-translational modification sites","volume":"15","author":"Trost","year":"2016","journal-title":"J. Proteome Res"},{"key":"2023012713380896500_bty039-B41","doi-asserted-by":"crossref","first-page":"153","DOI":"10.1074\/mcp.M900268-MCP200","article-title":"Enrichment and site mapping of O-linked N-acetylglucosamine by a combination of chemical\/enzymatic tagging, photochemical cleavage, and electron transfer dissociation mass spectrometry","volume":"9","author":"Wang","year":"2010","journal-title":"Mol. Cell. 
Proteomics MCP"},{"key":"2023012713380896500_bty039-B42","doi-asserted-by":"crossref","first-page":"91","DOI":"10.1186\/1471-2105-12-91","article-title":"dbOGAP-an integrated bioinformatics resource for protein O-GlcNAcylation","volume":"2","author":"Wang","year":"2011","journal-title":"BMC Bioinformatics"},{"key":"2023012713380896500_bty039-B43","doi-asserted-by":"crossref","first-page":"2849.","DOI":"10.1039\/C6MB00314A","article-title":"SOHPRED: a new bioinformatics tool for the characterization and prediction of human S-sulfenylation sites","volume":"12","author":"Wang","year":"2016","journal-title":"Mol. Biosyst"},{"key":"2023012713380896500_bty039-B44","doi-asserted-by":"crossref","first-page":"e3261.","DOI":"10.7717\/peerj.3261","article-title":"Prediction of post-translational modification sites using multiple kernel support vector machine","volume":"5","author":"Wang","year":"2017","journal-title":"PeerJ"},{"key":"2023012713380896500_bty039-B45","doi-asserted-by":"crossref","first-page":"635.","DOI":"10.1016\/j.jmb.2004.02.002","article-title":"Prediction and functional analysis of native disorder in proteins from the three kingdoms of life","volume":"337","author":"Ward","year":"2004","journal-title":"J. Mol. Biol"},{"key":"2023012713380896500_bty039-B46","doi-asserted-by":"crossref","first-page":"3287","DOI":"10.1039\/c1mb05232b","article-title":"iLoc-Plant: a multi-label classifier for predicting the subcellular localization of plant proteins with both single and multiple sites","volume":"7","author":"Wu","year":"2011","journal-title":"Mol. 
BioSyst"},{"key":"2023012713380896500_bty039-B47","doi-asserted-by":"crossref","first-page":"S1.","DOI":"10.1186\/1471-2105-15-S16-S1","article-title":"Characterization and identification of protein O-GlcNAcylation sites with substrate specificity","volume":"15","author":"Wu","year":"2014","journal-title":"BMC Bioinformatics"},{"key":"2023012713380896500_bty039-B48","doi-asserted-by":"crossref","first-page":"168","DOI":"10.1016\/j.ab.2013.01.019","article-title":"iAMP-2L: a two-level multi-label classifier for identifying antimicrobial peptides and their functional types","volume":"436","author":"Xiao","year":"2013","journal-title":"Anal. Biochem"},{"key":"2023012713380896500_bty039-B49","doi-asserted-by":"crossref","first-page":"e55844","DOI":"10.1371\/journal.pone.0055844","article-title":"iSNO-PseAAC: predict cysteine S-nitrosylation sites in proteins by incorporating position specific amino acid propensity into pseudo amino acid composition","volume":"8","author":"Xu","year":"2013","journal-title":"PLoS One"},{"key":"2023012713380896500_bty039-B50","doi-asserted-by":"crossref","first-page":", e171","DOI":"10.7717\/peerj.171","article-title":"iSNO-AAPair: incorporating amino acid pairwise coupling into PseAAC for predicting cysteine S-nitrosylation sites in proteins","volume":"1","author":"Xu","year":"2013","journal-title":"Peerj"},{"key":"2023012713380896500_bty039-B51","doi-asserted-by":"crossref","first-page":"7594","DOI":"10.3390\/ijms15057594","article-title":"iHyd-PseAAC: predicting hydroxyproline and hydroxylysine in proteins by incorporating dipeptide position-specific propensity into pseudo amino acid composition","volume":"15","author":"Xu","year":"2014","journal-title":"Int. J. Mol. 
Sci"},{"key":"2023012713380896500_bty039-B52","doi-asserted-by":"crossref","first-page":"e105018","DOI":"10.1371\/journal.pone.0105018","article-title":"iNitro-Tyr: prediction of nitrotyrosine sites in proteins with general pseudo amino acid composition","volume":"9","author":"Xu","year":"2014","journal-title":"PLoS One"},{"key":"2023012713380896500_bty039-B53","doi-asserted-by":"crossref","first-page":"544.","DOI":"10.2174\/1573406413666170419150052","article-title":"iPreny-PseAAC: identify C-terminal cysteine prenylation sites in proteins by incorporating two tiers of sequence couplings into PseAAC","volume":"13","author":"Xu","year":"2017","journal-title":"Med. Chem"},{"key":"2023012713380896500_bty039-B54","doi-asserted-by":"crossref","first-page":"11204","DOI":"10.3390\/ijms150711204","article-title":"PSNO: predicting cysteine S-nitrosylation sites by incorporating various sequence-derived features into the general form of Chou's PseAAC","volume":"15","author":"Zhang","year":"2014","journal-title":"Int. J. Mol. Sci"},{"key":"2023012713380896500_bty039-B55","doi-asserted-by":"crossref","first-page":"524.","DOI":"10.1016\/j.jtbi.2015.06.026","article-title":"PGlcS: prediction of protein O-GlcNAcylation sites with multiple features and analysis","volume":"380","author":"Zhao","year":"2015","journal-title":"J. Theor. 
Biol"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/34\/12\/2029\/48935785\/bioinformatics_34_12_2029.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/34\/12\/2029\/48935785\/bioinformatics_34_12_2029.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,27]],"date-time":"2023-01-27T14:19:16Z","timestamp":1674829156000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/34\/12\/2029\/4840731"}},"subtitle":[],"editor":[{"given":"John","family":"Hancock","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2018,2,6]]},"references-count":55,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2018,6,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/bty039","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2018,6,15]]},"published":{"date-parts":[[2018,2,6]]}}}