{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,30]],"date-time":"2025-07-30T11:42:47Z","timestamp":1753875767347,"version":"3.41.2"},"reference-count":33,"publisher":"Oxford University Press (OUP)","issue":"6","license":[{"start":{"date-parts":[[2022,9,27]],"date-time":"2022-09-27T00:00:00Z","timestamp":1664236800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/501100012226","name":"Fundamental Research Funds for the Central Universities","doi-asserted-by":"publisher","award":["313202225"],"award-info":[{"award-number":["313202225"]}],"id":[{"id":"10.13039\/501100012226","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,11,19]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Glutarylation is a post-translational modification which plays an irreplaceable role in various functions of the cell. Therefore, it is very important to accurately identify the glutarylation substrates and its corresponding glutarylation sites. In recent years, many computational methods of glutarylation sites have emerged one after another, but there are still many limitations, among which noisy data and the class imbalance problem caused by the uncertainty of non-glutarylation sites are great challenges. In this study, we propose a new semi-supervised learning algorithm, named FCCCSR, to identify reliable non-glutarylation lysine sites from unlabeled samples as negative samples. FCCCSR first finds core objects from positive samples according to reverse nearest neighbor information, and then clusters core objects based on natural neighbor structure. Finally, reliable negative samples are selected according to clustering result. With FCCCSR algorithm, we propose a new method named FCCCSR_Glu for glutarylation sites identification. In this study, multi-view features are extracted and fused to describe peptides, including amino acid composition, BLOSUM62, amino acid factors and composition of k-spaced amino acid pairs. Then, reliable negative samples selected by FCCCSR and positive samples are combined to establish models and XGBoost optimized by differential evolution algorithm is used as the classifier. On the independent testing dataset, FCCCSR_Glu achieves 85.18%, 98.36%, 94.31% and 0.8651 in sensitivity, specificity, accuracy and Matthew\u2019s Correlation Coefficient, respectively, which is superior to state-of-the-art methods in predicting glutarylation sites. Therefore, FCCCSR_Glu can be a useful tool for glutarylation sites prediction and FCCCSR algorithm can effectively select reliable negative samples from unlabeled samples. The data and code are available on https:\/\/github.com\/xbbxhbc\/FCCCSR_Glu.git<\/jats:p>","DOI":"10.1093\/bib\/bbac421","type":"journal-article","created":{"date-parts":[[2022,9,28]],"date-time":"2022-09-28T06:28:02Z","timestamp":1664346482000},"source":"Crossref","is-referenced-by-count":5,"title":["FCCCSR_Glu: a semi-supervised learning model based on FCCCSR algorithm for prediction of glutarylation sites"],"prefix":"10.1093","volume":"23","author":[{"given":"Qiao","family":"Ning","sequence":"first","affiliation":[{"name":"Department of Information Science and Technology, Dalian Maritime University , Lingshui Street, 116026, Dalian, China"}]},{"given":"Zedong","family":"Qi","sequence":"additional","affiliation":[{"name":"Department of Information Science and Technology, Dalian Maritime University , Lingshui Street, 116026, Dalian, China"}]},{"given":"Yue","family":"Wang","sequence":"additional","affiliation":[{"name":"Department of Information Science and Technology, Dalian Maritime University , Lingshui Street, 116026, Dalian, China"}]},{"given":"Ansheng","family":"Deng","sequence":"additional","affiliation":[{"name":"Department of Information Science and Technology, Dalian Maritime University , Lingshui Street, 116026, Dalian, China"}]},{"given":"Chen","family":"Chen","sequence":"additional","affiliation":[{"name":"Naval Architecture and Ocean Engineering college, Dalian Maritime University , Lingshui Street, 116026, Dalian, China"}]}],"member":"286","published-online":{"date-parts":[[2022,9,27]]},"reference":[{"issue":"1","key":"2022112111115930700_ref1","doi-asserted-by":"crossref","DOI":"10.1038\/srep00090","article-title":"Proteome-wide post-translational modification statistics: frequency analysis and curation of the swiss-prot database","volume":"1","author":"Khoury","year":"2011","journal-title":"Sci Rep"},{"issue":"D1","key":"2022112111115930700_ref2","doi-asserted-by":"crossref","first-page":"D531","DOI":"10.1093\/nar\/gkt1093","article-title":"CPLM: A database of protein lysine modififications","volume":"42","author":"Liu","year":"2014","journal-title":"Nucleic Acids Res"},{"issue":"5","key":"2022112111115930700_ref3","doi-asserted-by":"crossref","first-page":"100","DOI":"10.1074\/mcp.M111.015875","article-title":"Lysine succinylation and lysine malonylation in histones","volume":"11","author":"Xie","year":"2012","journal-title":"Mol Cell Proteomics"},{"key":"2022112111115930700_ref4","doi-asserted-by":"crossref","first-page":"143","DOI":"10.1007\/978-3-319-50044-7_9","article-title":"The roles of SUMO in metabolic regulation","volume":"963","author":"Kamynina","year":"2017","journal-title":"Adv Exp Med Biol"},{"key":"2022112111115930700_ref5","doi-asserted-by":"crossref","first-page":"356","DOI":"10.1016\/j.jmgm.2017.07.022","article-title":"Prediction of lysine propionylation sites using biased SVM and incorporating four different sequence features into Chou\u2019s PseAAC","volume":"76","author":"Zhe","year":"2017","journal-title":"J Mol Gr Modell"},{"issue":"8","key":"2022112111115930700_ref6","doi-asserted-by":"crossref","first-page":"1857","DOI":"10.1016\/S0021-9258(18)96714-1","article-title":"The Methylation of lysine residues in protein","volume":"241","author":"Comb","year":"1966","journal-title":"Biol Chem"},{"issue":"1","key":"2022112111115930700_ref7","doi-asserted-by":"crossref","first-page":"43","DOI":"10.1038\/nrendo.2015.181","article-title":"Protein acetylation in metabolism-metabolites and cofactors","volume":"12","author":"Menzies","year":"2016","journal-title":"Nat Rev Endocrinol"},{"issue":"4","key":"2022112111115930700_ref8","doi-asserted-by":"crossref","first-page":"605","DOI":"10.1016\/j.cmet.2014.03.014","article-title":"Lysine glutarylation is a protein posttranslational modifification regulated by SIRT5","volume":"19","author":"Tan","year":"2014","journal-title":"Cell Metab"},{"key":"2022112111115930700_ref9","doi-asserted-by":"crossref","first-page":"1379","DOI":"10.1021\/acs.jproteome.5b00917","article-title":"Proteome-wide Lysine Glutarylation Profiling of the Mycobacterium tuberculosis H37Rv","volume":"15","author":"Xie","year":"2016","journal-title":"J Proteome Res"},{"issue":"1","key":"2022112111115930700_ref10","doi-asserted-by":"crossref","first-page":"191","DOI":"10.1021\/acs.jproteome.0c00314","article-title":"iGlu_AdaBoost: Identification of Lysine Glutarylation Using the AdaBoost Classifier","volume":"20","author":"Dou","year":"2021","journal-title":"J Proteome Res"},{"issue":"9","key":"2022112111115930700_ref11","first-page":"1023","article-title":"Accurately Predicting Glutarylation Sites Using Sequential Bi-Peptide-Based Evolutionary Features","volume":"11","author":"Arafat","year":"2020","journal-title":"Gen"},{"key":"2022112111115930700_ref12","doi-asserted-by":"crossref","first-page":"189","DOI":"10.1039\/C9MO00028C","article-title":"RF-GlutarySite: a random forest based predictor for glutarylation sites","volume":"15","author":"Albarakati","year":"2019","journal-title":"Mol Omics"},{"key":"2022112111115930700_ref13","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.ab.2018.04.005","article-title":"Prediction of lysine glutarylation sites by maximum relevance minimum redundancy feature selection","volume":"550","author":"Ju","year":"2018","journal-title":"Anal Biochem"},{"key":"2022112111115930700_ref14","first-page":"941","article-title":"DeepGlut: A Deep Learning Framework for Prediction of Glutarylation Sites in Proteins","author":"Sen","year":"2020","journal-title":"IEEE Region 10 Symposium"},{"key":"2022112111115930700_ref15","doi-asserted-by":"crossref","DOI":"10.3389\/fgene.2022.885929","article-title":"ProtTrans-Glutar: Incorporating Features From Pre-trained Transformer-Based Models for Predicting Glutarylation Sites","volume":"13","author":"Indriani","year":"2022","journal-title":"Front Genet"},{"issue":"S13","key":"2022112111115930700_ref16","doi-asserted-by":"crossref","first-page":"384","DOI":"10.1186\/s12859-018-2394-9","article-title":"Characterization and identification of lysine glutarylation based on intrinsic interdependence between positions in the substrate sites","volume":"19","author":"Huang","year":"2019","journal-title":"BMC Bioinform"},{"key":"2022112111115930700_ref17","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1080\/07391102.2021.1962738","article-title":"iGluK-Deep: computational identification of lysine glutarylation sites using deep neural networks with general pseudo amino acid compositions","author":"Naseer","year":"2021","journal-title":"J Biomol Struct Dyn"},{"issue":"1","key":"2022112111115930700_ref18","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41598-021-98458-y","article-title":"Computational identification of multiple lysine PTM sites by analyzing the instance hardness and feature importance","volume":"11","author":"Ahmed","year":"2021","journal-title":"Sci Rep"},{"issue":"5","key":"2022112111115930700_ref19","doi-asserted-by":"crossref","first-page":"243","DOI":"10.1016\/j.jgg.2017.03.007","article-title":"PLMD: An updated data resource of protein lysine modifications","volume":"44","author":"Xu","year":"2017","journal-title":"J Genet Genomics"},{"key":"2022112111115930700_ref20","doi-asserted-by":"crossref","first-page":"1658","DOI":"10.1093\/bioinformatics\/btl158","article-title":"Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences","volume":"22","author":"Li","year":"2006","journal-title":"Bioinformatics"},{"issue":"5","key":"2022112111115930700_ref21","doi-asserted-by":"crossref","first-page":"680","DOI":"10.1093\/bioinformatics\/btq003","article-title":"CD-HIT Suite: a web server for clustering and comparing biological sequences","volume":"26","author":"Huang","year":"2010","journal-title":"Bioinformatics"},{"issue":"6","key":"2022112111115930700_ref22","doi-asserted-by":"crossref","first-page":"1188","DOI":"10.1101\/gr.849004","article-title":"WebLogo: A sequence logo generator","volume":"14","author":"Crooks","year":"2004","journal-title":"Genome Res"},{"issue":"1","key":"2022112111115930700_ref23","article-title":"Prediction of mucintype Oglycosylation sites in mammalian proteins using the composition of k-spaced amino acid pairs","volume":"9","author":"Chen","year":"2008","journal-title":"Bioinformatics"},{"key":"2022112111115930700_ref24","article-title":"Classification of nuclear receptors based on amino acid composition and dipeptie composition","author":"Bhasin","year":"2017","journal-title":"J Biol Chem"},{"key":"2022112111115930700_ref25","doi-asserted-by":"crossref","first-page":"2","DOI":"10.1093\/bib\/bbac037","article-title":"Adapt-Kcr: a novel deep learning framework for accurate prediction of lysine crotonylation sites based on learning embedding feature, attention architecture","volume":"23","author":"Li","year":"2022","journal-title":"Brief Bioinform"},{"issue":"18","key":"2022112111115930700_ref26","doi-asserted-by":"crossref","first-page":"6395","DOI":"10.1073\/pnas.0408677102","article-title":"Solving the protein sequence metric problem","volume":"102","author":"Atchley","year":"2005","journal-title":"Proc Natl Acad"},{"key":"2022112111115930700_ref27","doi-asserted-by":"crossref","first-page":"30","DOI":"10.1016\/j.patrec.2016.05.007","article-title":"Natural neighbor: A self-adaptive neighborhood method without parameter k","volume":"80","author":"Zhu","year":"2016","journal-title":"Pattern RecognitLett"},{"key":"2022112111115930700_ref28","doi-asserted-by":"crossref","first-page":"58","DOI":"10.1016\/j.knosys.2017.11.025","article-title":"Density core-based clustering algorithm with dynamic scanning radius","volume":"142","author":"Xie","year":"2017","journal-title":"Knowl-Based Syst"},{"key":"2022112111115930700_ref29","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.is.2019.04.001","article-title":"A novel clustering algorithm based on the natural reverse nearest neighbor structure","volume":"84","author":"Dai","year":"2019","journal-title":"Inf Syst"},{"key":"2022112111115930700_ref30","doi-asserted-by":"crossref","DOI":"10.1145\/2939672.2939785","volume-title":"XGBoost: A Scalable Tree Boosting System","author":"Chen","year":"2016"},{"key":"2022112111115930700_ref31","first-page":"1","article-title":"A novel method for Identification of Glutarylation sites combining Borderline-SMOTE with Tomek links technique in imbalanced data","volume":"PP","author":"Ning","year":"2021","journal-title":"IEEE\/ACM Trans Comput Biol Bioinform"},{"issue":"4","key":"2022112111115930700_ref32","doi-asserted-by":"crossref","first-page":"341","DOI":"10.1023\/A:1008202821328","article-title":"Differential Evolution - A Simple and Efficient Heuristic for global Optimization over Continuous Spaces","volume":"11","author":"Storn","year":"1997","journal-title":"J Glob Optim"},{"issue":"21","key":"2022112111115930700_ref33","doi-asserted-by":"crossref","first-page":"2590","DOI":"10.1093\/bioinformatics\/btl441","article-title":"PSoL: a positive sample only learning algorithm for finding non-coding RNA genes","volume":"22","author":"Wang","year":"2006","journal-title":"Bioinformatics"}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/23\/6\/bbac421\/47143772\/bbac421.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/23\/6\/bbac421\/47143772\/bbac421.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,11,21]],"date-time":"2022-11-21T11:17:34Z","timestamp":1669029454000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbac421\/6720406"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,9,27]]},"references-count":33,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2022,11,19]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbac421","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"type":"print","value":"1467-5463"},{"type":"electronic","value":"1477-4054"}],"subject":[],"published-other":{"date-parts":[[2022,11]]},"published":{"date-parts":[[2022,9,27]]},"article-number":"bbac421"}}