{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,2]],"date-time":"2026-04-02T16:40:26Z","timestamp":1775148026511,"version":"3.50.1"},"reference-count":48,"publisher":"Oxford University Press (OUP)","issue":"5","license":[{"start":{"date-parts":[[2021,4,19]],"date-time":"2021-04-19T00:00:00Z","timestamp":1618790400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/501100012166","name":"National Key Research and Development Program of China","doi-asserted-by":"publisher","award":["2020YFA0908700"],"award-info":[{"award-number":["2020YFA0908700"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62072003"],"award-info":[{"award-number":["62072003"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61672037"],"award-info":[{"award-number":["61672037"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["31501169"],"award-info":[{"award-number":["31501169"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["11835014"],"award-info":[{"award-number":["11835014"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of 
China","doi-asserted-by":"publisher","award":["U19A2064"],"award-info":[{"award-number":["U19A2064"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Academic Scholar of the High Level University","award":["00298"],"award-info":[{"award-number":["00298"]}]},{"name":"Recruitment Program for Leading Talent Team of Anhui Province","award":["2019\u201316"],"award-info":[{"award-number":["2019\u201316"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,9,2]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Although synonymous mutations do not alter the encoded amino acids, they may impact protein function by interfering with the regulation of RNA splicing or altering transcript splicing. New progress on next-generation sequencing technologies has put the exploration of synonymous mutations at the forefront of precision medicine. Several approaches have been proposed for predicting the deleterious synonymous mutations specifically, but their performance is limited by imbalance of the positive and negative samples. In this study, we firstly expanded the number of samples greatly from various data sources and compared six undersampling strategies to solve the problem of the imbalanced datasets. The results suggested that cluster centroid is the most effective scheme. Secondly, we presented a computational model, undersampling scheme based method for deleterious synonymous mutation (usDSM) prediction, using 14-dimensional biology features and random forest classifier to detect the deleterious synonymous mutation. The results on the test datasets indicated that the proposed usDSM model can attain superior performance in comparison with other state-of-the-art machine learning methods. 
Lastly, we found that the deep learning model did not play a substantial role in deleterious synonymous mutation prediction through a lot of experiments, although it achieves superior results in other fields. In conclusion, we hope our work will contribute to the future development of computational methods for a more accurate prediction of the deleterious effect of human synonymous mutation. The web server of usDSM is freely accessible at http:\/\/usdsm.xialab.info\/.<\/jats:p>","DOI":"10.1093\/bib\/bbab123","type":"journal-article","created":{"date-parts":[[2021,3,16]],"date-time":"2021-03-16T12:26:22Z","timestamp":1615897582000},"source":"Crossref","is-referenced-by-count":20,"title":["usDSM: a novel method for deleterious synonymous mutation prediction using undersampling scheme"],"prefix":"10.1093","volume":"22","author":[{"given":"Xi","family":"Tang","sequence":"first","affiliation":[{"name":"GMU-GIBH Joint School of Life Sciences, Guangzhou Medical University and the Institutes of Physical Science and Information Technology, Anhui University, China"}]},{"given":"Tao","family":"Zhang","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Anhui University, China"}]},{"given":"Na","family":"Cheng","sequence":"additional","affiliation":[{"name":"Institutes of Physical Science and Information Technology, Anhui University, China"}]},{"given":"Huadong","family":"Wang","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Anhui University, China"}]},{"given":"Chun-Hou","family":"Zheng","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Anhui University, China"}]},{"given":"Junfeng","family":"Xia","sequence":"additional","affiliation":[{"name":"Institutes of Physical Science and Information Technology, Anhui University, China"}]},{"given":"Tiejun","family":"Zhang","sequence":"additional","affiliation":[{"name":"GMU-GIBH Joint School of Life Sciences, 
Guangzhou Medical University, China"}]}],"member":"286","published-online":{"date-parts":[[2021,4,19]]},"reference":[{"key":"2021090815223094600_ref1","doi-asserted-by":"crossref","first-page":"1129","DOI":"10.1016\/j.cell.2014.02.037","article-title":"Silent mutations make some noise","volume":"156","author":"Zheng","year":"2014","journal-title":"Cell"},{"key":"2021090815223094600_ref2","doi-asserted-by":"crossref","first-page":"13481","DOI":"10.1073\/pnas.1304227110","article-title":"Whole-genome sequencing identifies a recurrent functional synonymous mutation in melanoma","volume":"110","author":"Gartner","year":"2013","journal-title":"Proc Natl Acad Sci"},{"key":"2021090815223094600_ref3","doi-asserted-by":"crossref","first-page":"1324","DOI":"10.1016\/j.cell.2014.01.051","article-title":"Synonymous mutations frequently act as driver mutations in human cancers","volume":"156","author":"Supek","year":"2014","journal-title":"Cell"},{"key":"2021090815223094600_ref4","doi-asserted-by":"crossref","first-page":"389","DOI":"10.1038\/nrg1327","article-title":"Genomic variants in exons and introns: identifying the splicing spoilers","volume":"5","author":"Pagani","year":"2004","journal-title":"Nat Rev Genet"},{"key":"2021090815223094600_ref5","doi-asserted-by":"crossref","first-page":"46","DOI":"10.1038\/scientificamerican0609-46","article-title":"The price of silent mutations","volume":"300","author":"Chamary","year":"2009","journal-title":"Sci Am"},{"key":"2021090815223094600_ref6","doi-asserted-by":"crossref","first-page":"32","DOI":"10.1038\/nrg2899","article-title":"Synonymous but not the same: the causes and consequences of codon bias","volume":"12","author":"Plotkin","year":"2011","journal-title":"Nat Rev Genet"},{"key":"2021090815223094600_ref7","doi-asserted-by":"crossref","DOI":"10.1109\/TCBB.2020.2975181","article-title":"STIC: predicting single nucleotide variants and tumor purity in cancer genome","author":"Yuan","year":"2020","journal-title":"IEEE\/ACM 
Trans Comput Biol Bioinform"},{"key":"2021090815223094600_ref8","doi-asserted-by":"crossref","first-page":"970","DOI":"10.1093\/bib\/bbz047","article-title":"Comparison and integration of computational methods for deleterious synonymous mutation prediction","volume":"21","author":"Cheng","year":"2020","journal-title":"Brief Bioinform"},{"key":"2021090815223094600_ref9","doi-asserted-by":"crossref","first-page":"310","DOI":"10.1038\/ng.2892","article-title":"A general framework for estimating the relative pathogenicity of human genetic variants","volume":"46","author":"Kircher","year":"2014","journal-title":"Nat Genet"},{"key":"2021090815223094600_ref10","doi-asserted-by":"crossref","first-page":"761","DOI":"10.1093\/bioinformatics\/btu703","article-title":"DANN: a deep learning approach for annotating the pathogenicity of genetic variants","volume":"31","author":"Quang","year":"2015","journal-title":"Bioinformatics"},{"key":"2021090815223094600_ref11","doi-asserted-by":"crossref","first-page":"1536","DOI":"10.1093\/bioinformatics\/btv009","article-title":"An integrative approach to predicting the functional effects of non-coding and coding sequence variation","volume":"31","author":"Shihab","year":"2015","journal-title":"Bioinformatics"},{"key":"2021090815223094600_ref12","doi-asserted-by":"crossref","first-page":"511","DOI":"10.1093\/bioinformatics\/btx536","article-title":"FATHMM-XF: accurate prediction of pathogenic point mutations via extended features","volume":"34","author":"Rogers","year":"2018","journal-title":"Bioinformatics"},{"key":"2021090815223094600_ref13","doi-asserted-by":"crossref","first-page":"W247","DOI":"10.1093\/nar\/gkx369","article-title":"PhD-SNPg: a webserver and lightweight tool for scoring single nucleotide variants","volume":"45","author":"Capriotti","year":"2017","journal-title":"Nucleic Acids 
Res"},{"key":"2021090815223094600_ref14","doi-asserted-by":"crossref","first-page":"1843","DOI":"10.1093\/bioinformatics\/btt308","article-title":"Identification of deleterious synonymous variants in human genomes","volume":"29","author":"Buske","year":"2013","journal-title":"Bioinformatics"},{"key":"2021090815223094600_ref15","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41467-017-00141-2","article-title":"Annotating pathogenic non-coding variants in genic regions","volume":"8","author":"Gelfman","year":"2017","journal-title":"Nat Commun"},{"key":"2021090815223094600_ref16","doi-asserted-by":"crossref","first-page":"81","DOI":"10.1186\/s12920-018-0455-6","article-title":"Computational identification of deleterious synonymous variants in human genomes using a feature-based approach","volume":"12","author":"Shi","year":"2019","journal-title":"BMC Med Genomics"},{"key":"2021090815223094600_ref17","doi-asserted-by":"crossref","first-page":"1061","DOI":"10.1038\/nature09534","article-title":"A map of human genome variation from population-scale sequencing","volume":"467","author":"Consortium","year":"2010","journal-title":"Nature"},{"key":"2021090815223094600_ref18","volume-title":"Pattern Classification","author":"Duda","year":"2012"},{"key":"2021090815223094600_ref19","doi-asserted-by":"crossref","first-page":"529","DOI":"10.1016\/j.neucom.2014.07.064","article-title":"Neighbourhood sampling in bagging for imbalanced data","volume":"150","author":"Acta Medica OkayamaAapg Bulletin","year":"2015","journal-title":"Neurocomputing"},{"key":"2021090815223094600_ref20","doi-asserted-by":"crossref","first-page":"463","DOI":"10.1109\/TSMCC.2011.2161285","article-title":"A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches","volume":"42","author":"Galar","year":"2011","journal-title":"IEEE Trans Syst Man Cyber, Part C (Appl 
Rev)"},{"key":"2021090815223094600_ref21","doi-asserted-by":"crossref","first-page":"17","DOI":"10.1016\/j.ins.2017.05.008","article-title":"Clustering-based undersampling in class-imbalanced data","volume":"409","author":"Lin","year":"2017","journal-title":"Inform Sci"},{"key":"2021090815223094600_ref22","doi-asserted-by":"crossref","first-page":"665","DOI":"10.1007\/s00439-017-1779-6","article-title":"The human gene mutation database: towards a comprehensive repository of inherited mutation data for medical research, genetic diagnosis and next-generation sequencing studies","volume":"136","author":"Stenson","year":"2017","journal-title":"Hum Genet"},{"key":"2021090815223094600_ref23","doi-asserted-by":"crossref","first-page":"1914","DOI":"10.1093\/bioinformatics\/btw086","article-title":"dbDSM: a manually curated database for deleterious synonymous mutations","volume":"32","author":"Wen","year":"2016","journal-title":"Bioinformatics"},{"key":"2021090815223094600_ref24","doi-asserted-by":"crossref","first-page":"772","DOI":"10.1038\/jhg.2011.99","article-title":"Identification of independent risk loci for graves\u2019 disease within the MHC in the Japanese population","volume":"56","author":"Nakabayashi","year":"2011","journal-title":"J Hum Genet"},{"key":"2021090815223094600_ref25","doi-asserted-by":"crossref","first-page":"885","DOI":"10.1038\/ng.669","article-title":"A locus on 19p13 modifies risk of breast cancer in BRCA1 mutation carriers and is associated with hormone receptor\u2013negative breast cancer in the general population","volume":"42","author":"Antoniou","year":"2010","journal-title":"Nat Genet"},{"key":"2021090815223094600_ref26","doi-asserted-by":"crossref","first-page":"D980","DOI":"10.1093\/nar\/gkt1113","article-title":"ClinVar: public archive of relationships among sequence variation and human phenotype","volume":"42","author":"Landrum","year":"2014","journal-title":"Nucleic Acids Res"},{"key":"2021090815223094600_ref27","volume-title":"The 
NCBI Handbook","author":"Canese","year":"2013","edition":"2nd"},{"key":"2021090815223094600_ref28","doi-asserted-by":"crossref","first-page":"161","DOI":"10.1002\/humu.22727","article-title":"VariSNP, a benchmark database for variations from dbSNP","volume":"36","author":"Schaafsma","year":"2015","journal-title":"Hum Mutat"},{"key":"2021090815223094600_ref29","doi-asserted-by":"crossref","first-page":"D662","DOI":"10.1093\/nar\/gku1010","article-title":"Ensembl 2015","volume":"43","author":"Cunningham","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"2021090815223094600_ref30","doi-asserted-by":"crossref","first-page":"3439","DOI":"10.1093\/bioinformatics\/bti525","article-title":"BioMart and bioconductor: a powerful link between biological databases and microarray data analysis","volume":"21","author":"Durinck","year":"2005","journal-title":"Bioinformatics"},{"key":"2021090815223094600_ref31","doi-asserted-by":"crossref","first-page":"D862","DOI":"10.1093\/nar\/gkv1222","article-title":"ClinVar: public archive of interpretations of clinically relevant variants","volume":"44","author":"Landrum","year":"2016","journal-title":"Nucleic Acids Res"},{"key":"2021090815223094600_ref32","doi-asserted-by":"crossref","first-page":"1034","DOI":"10.1101\/gr.3715005","article-title":"Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes","volume":"15","author":"Siepel","year":"2005","journal-title":"Genome Res"},{"key":"2021090815223094600_ref33","doi-asserted-by":"crossref","first-page":"110","DOI":"10.1101\/gr.097857.109","article-title":"Detection of nonneutral substitution rates on mammalian phylogenies","volume":"20","author":"Pollard","year":"2010","journal-title":"Genome Res"},{"key":"2021090815223094600_ref34","doi-asserted-by":"crossref","first-page":"e1001025","DOI":"10.1371\/journal.pcbi.1001025","article-title":"Identifying a high fraction of the human genome to be under selective constraint using 
GERP++","volume":"6","author":"Davydov","year":"2010","journal-title":"PLoS Comput Biol"},{"key":"2021090815223094600_ref35","doi-asserted-by":"crossref","first-page":"520","DOI":"10.1093\/bioinformatics\/17.6.520","article-title":"Missing value estimation methods for DNA microarrays","volume":"17","author":"Troyanskaya","year":"2001","journal-title":"Bioinformatics"},{"key":"2021090815223094600_ref36","doi-asserted-by":"crossref","first-page":"1161","DOI":"10.1038\/s41588-018-0167-z","article-title":"Predicting the clinical impact of human mutation with deep neural networks","volume":"50","author":"Sundaram","year":"2018","journal-title":"Nat Genet"},{"key":"2021090815223094600_ref37","doi-asserted-by":"crossref","first-page":"2740","DOI":"10.1093\/bioinformatics\/bty179","article-title":"Deep learning improves antimicrobial peptide recognition","volume":"34","author":"Veltri","year":"2018","journal-title":"Bioinformatics"},{"key":"2021090815223094600_ref38","volume-title":"Proceedings of the ICML'03 Workshop on Learning from Imbalanced Data Sets, Washington, DC, 2003","author":"Mani","year":"2003"},{"key":"2021090815223094600_ref39","doi-asserted-by":"crossref","first-page":"294","DOI":"10.1038\/nmeth.2832","article-title":"Functional annotation of noncoding sequence variants","volume":"11","author":"Ritchie","year":"2014","journal-title":"Nat Methods"},{"key":"2021090815223094600_ref40","doi-asserted-by":"crossref","first-page":"3738","DOI":"10.1016\/j.patcog.2012.03.014","article-title":"Inverse random under sampling for class imbalance problem and its application to multi-label classification","volume":"45","author":"Tahir","year":"2012","journal-title":"Pattern Recogn"},{"key":"2021090815223094600_ref41","first-page":"2825","article-title":"Scikit-learn: machine learning in python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J Mach Learn 
Res"},{"key":"2021090815223094600_ref42","doi-asserted-by":"crossref","first-page":"2571","DOI":"10.3389\/fmicb.2018.02571","article-title":"PredT4SE-stack: prediction of bacterial type IV secreted effectors from protein sequences using a stacked ensemble method","volume":"9","author":"Xiong","year":"2018","journal-title":"Front Microbiol"},{"key":"2021090815223094600_ref43","doi-asserted-by":"crossref","first-page":"1254806","DOI":"10.1126\/science.1254806","article-title":"The human splicing code reveals new insights into the genetic determinants of disease","volume":"347","author":"Xiong","year":"2015","journal-title":"Science"},{"key":"2021090815223094600_ref44","doi-asserted-by":"crossref","first-page":"i121","DOI":"10.1093\/bioinformatics\/btu277","article-title":"Deep learning of the tissue-regulated splicing code","volume":"30","author":"Leung","year":"2014","journal-title":"Bioinformatics"},{"key":"2021090815223094600_ref45","doi-asserted-by":"crossref","first-page":"e107","DOI":"10.1093\/nar\/gkw226","article-title":"DanQ: a hybrid convolutional and recurrent deep neural network for quantifying the function of DNA sequences","volume":"44","author":"Quang","year":"2016","journal-title":"Nucleic Acids Res"},{"key":"2021090815223094600_ref46","doi-asserted-by":"crossref","first-page":"3387","DOI":"10.1093\/bioinformatics\/btx431","article-title":"DeepLoc: prediction of protein subcellular localization using deep learning","volume":"33","author":"Almagro Armenteros","year":"2017","journal-title":"Bioinformatics"},{"key":"2021090815223094600_ref47","doi-asserted-by":"crossref","first-page":"535","DOI":"10.1016\/j.cell.2018.12.015","article-title":"Predicting splicing from primary sequence with deep learning","volume":"176","author":"Jaganathan","year":"2019","journal-title":"Cell"},{"key":"2021090815223094600_ref48","doi-asserted-by":"crossref","first-page":"513","DOI":"10.1002\/humu.22768","article-title":"The evaluation of tools used to predict the impact of 
missense variants is hindered by two types of circularity","volume":"36","author":"Grimm","year":"2015","journal-title":"Hum Mutat"}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bib\/article-pdf\/22\/5\/bbab123\/40261711\/bbab123.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"http:\/\/academic.oup.com\/bib\/article-pdf\/22\/5\/bbab123\/40261711\/bbab123.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,8]],"date-time":"2021-09-08T15:23:17Z","timestamp":1631114597000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbab123\/6236069"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,4,19]]},"references-count":48,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2021,9,2]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbab123","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2021,9]]},"published":{"date-parts":[[2021,4,19]]},"article-number":"bbab123"}}