{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,3]],"date-time":"2026-03-03T06:27:08Z","timestamp":1772519228966,"version":"3.50.1"},"reference-count":55,"publisher":"Oxford University Press (OUP)","issue":"3","license":[{"start":{"date-parts":[[2020,6,24]],"date-time":"2020-06-24T00:00:00Z","timestamp":1592956800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"name":"Cancer Genomics Core funded by the Cancer Prevention and Research Institute of Texas","award":["RP180734"],"award-info":[{"award-number":["RP180734"]}]},{"name":"Cancer Genomics Core funded by the Cancer Prevention and Research Institute of Texas","award":["CPRIT RP170668"],"award-info":[{"award-number":["CPRIT RP170668"]}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["R01LM012806"],"award-info":[{"award-number":["R01LM012806"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,5,20]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>DNA N4-methylcytosine (4mC) modification represents a novel epigenetic regulation. It involves in various cellular processes, including DNA replication, cell cycle and gene expression, among others. In addition to experimental identification of 4mC sites, in silico prediction of 4mC sites in the genome has emerged as an alternative and promising approach. In this study, we first reviewed the current progress in the computational prediction of 4mC sites and systematically evaluated the predictive capacity of eight conventional machine learning algorithms as well as 12 feature types commonly used in previous studies in six species. Using a representative benchmark dataset, we investigated the contribution of feature selection and stacking approach to the model construction, and found that feature optimization and proper reinforcement learning could improve the performance. We next recollected newly added 4mC sites in the six species\u2019 genomes and developed a novel deep learning-based 4mC site predictor, namely Deep4mC. Deep4mC applies convolutional neural networks with four representative features. For species with small numbers of samples, we extended our deep learning framework with a bootstrapping method. Our evaluation indicated that Deep4mC could obtain high accuracy and robust performance with the average area under curve (AUC) values greater than 0.9 in all species (range: 0.9005\u20130.9722). In comparison, Deep4mC achieved an AUC value improvement from 10.14 to 46.21% when compared to previous tools in these six species. A user-friendly web server (https:\/\/bioinfo.uth.edu\/Deep4mC) was built for predicting putative 4mC sites in a genome.<\/jats:p>","DOI":"10.1093\/bib\/bbaa099","type":"journal-article","created":{"date-parts":[[2020,5,5]],"date-time":"2020-05-05T03:27:17Z","timestamp":1588649237000},"source":"Crossref","is-referenced-by-count":72,"title":["Deep4mC: systematic assessment and computational prediction for DNA N4-methylcytosine sites by deep learning"],"prefix":"10.1093","volume":"22","author":[{"given":"Haodong","family":"Xu","sequence":"first","affiliation":[{"name":"Center for Precision Health, School of Biomedical Informatics"}]},{"given":"Peilin","family":"Jia","sequence":"additional","affiliation":[{"name":"Center for Precision Health, School of Biomedical Informatics"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3477-0914","authenticated-orcid":false,"given":"Zhongming","family":"Zhao","sequence":"additional","affiliation":[{"name":"Center for Precision Health, School of Biomedical Informatics"}]}],"member":"286","published-online":{"date-parts":[[2020,6,24]]},"reference":[{"key":"2021052109515333100_ref1","first-page":"e148","article-title":"Base-resolution detection of N4-methylcytosine in genomic DNA using 4mC-Tet-assisted-bisulfite- sequencing","volume":"43","author":"Yu","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"2021052109515333100_ref2","doi-asserted-by":"crossref","first-page":"934","DOI":"10.1126\/science.1220671","article-title":"Quantitative sequencing of 5-methylcytosine and 5-hydroxymethylcytosine at single-base resolution","volume":"336","author":"Booth","year":"2012","journal-title":"Science"},{"key":"2021052109515333100_ref3","doi-asserted-by":"crossref","first-page":"306","DOI":"10.1016\/j.molcel.2018.06.015","article-title":"N6-methyladenine DNA modification in the human genome","volume":"71","author":"Xiao","year":"2018","journal-title":"Mol Cell"},{"key":"2021052109515333100_ref4","doi-asserted-by":"crossref","first-page":"839","DOI":"10.1038\/nature09586","article-title":"Impaired hydroxylation of 5-methylcytosine in myeloid cancers with mutant TET2","volume":"468","author":"Ko","year":"2010","journal-title":"Nature"},{"key":"2021052109515333100_ref5","doi-asserted-by":"crossref","first-page":"24","DOI":"10.1186\/s13072-015-0016-6","article-title":"Epigenetic regulatory functions of DNA modifications: 5-methylcytosine and beyond","volume":"8","author":"Breiling","year":"2015","journal-title":"Epigenetics Chromatin"},{"key":"2021052109515333100_ref6","doi-asserted-by":"crossref","first-page":"893","DOI":"10.1016\/j.cell.2015.04.018","article-title":"N6-methyladenine DNA modification in Drosophila","volume":"161","author":"Zhang","year":"2015","journal-title":"Cell"},{"key":"2021052109515333100_ref7","doi-asserted-by":"crossref","first-page":"939","DOI":"10.1128\/JB.169.3.939-943.1987","article-title":"N4-methylcytosine as a minor base in bacterial DNA","volume":"169","author":"Ehrlich","year":"1987","journal-title":"J Bacteriol"},{"key":"2021052109515333100_ref8","doi-asserted-by":"crossref","first-page":"1063","DOI":"10.1073\/pnas.77.2.1063","article-title":"Escherichia coli mutator mutants deficient in methylation-instructed DNA mismatch correction","volume":"77","author":"Glickman","year":"1980","journal-title":"P Natl Acad Sci"},{"key":"2021052109515333100_ref9","doi-asserted-by":"crossref","first-page":"571","DOI":"10.1093\/genetics\/104.4.571","article-title":"Effects of high levels of DNA adenine methylation on methyl-directed mismatch repair in Escherichia coli","volume":"104","author":"Pukkila","year":"1983","journal-title":"Genetics"},{"key":"2021052109515333100_ref10","doi-asserted-by":"crossref","first-page":"461","DOI":"10.1038\/nmeth.1459","article-title":"Direct detection of DNA methylation during single-molecule, real-time sequencing","volume":"7","author":"Flusberg","year":"2010","journal-title":"Nat Methods"},{"key":"2021052109515333100_ref11","doi-asserted-by":"crossref","first-page":"20170078","DOI":"10.1098\/rstb.2017.0078","article-title":"Selective recognition of N4-methylcytosine in DNA by engineered transcription-activator-like effectors","volume":"373","author":"Rathi","year":"2018","journal-title":"Philos Trans R Soc Lond B Biol Sci"},{"key":"2021052109515333100_ref12","doi-asserted-by":"crossref","first-page":"85","DOI":"10.1093\/nar\/gkw950","article-title":"MethSMRT: an integrative database for DNA N6-methyladenine and N4-methylcytosine generated by single-molecular real-time sequencing","volume":"45","author":"Ye","year":"2017","journal-title":"Nucleic Acids Res"},{"key":"2021052109515333100_ref13","doi-asserted-by":"crossref","first-page":"30","DOI":"10.1186\/s13321-019-0349-4","article-title":"DNAmod: the DNA modification database","volume":"11","author":"Sood","year":"2019","journal-title":"J Chem"},{"key":"2021052109515333100_ref14","doi-asserted-by":"crossref","first-page":"78","DOI":"10.1038\/s41438-019-0160-4","article-title":"MDR: an integrative DNA N6-methyladenine and N4-methylcytosine modification database for Rosaceae","volume":"6","author":"Liu","year":"2019","journal-title":"Hortic Res"},{"key":"2021052109515333100_ref15","doi-asserted-by":"crossref","first-page":"3257","DOI":"10.1093\/bioinformatics\/btaa113","article-title":"6mA-Finder: a novel online tool for predicting DNA N6-methyladenine sites in genomes","volume":"36","author":"Haodong","year":"2020","journal-title":"Bioinformatics"},{"key":"2021052109515333100_ref16","doi-asserted-by":"crossref","first-page":"3518","DOI":"10.1093\/bioinformatics\/btx479","article-title":"iDNA4mC: identifying DNA N4-methylcytosine sites based on nucleotide chemical properties","volume":"33","author":"Chen","year":"2017","journal-title":"Bioinformatics"},{"key":"2021052109515333100_ref17","doi-asserted-by":"crossref","first-page":"96","DOI":"10.1016\/j.ygeno.2018.01.005","article-title":"iDNA6mA-PseKNC: identifying DNA N6-methyladenosine sites by incorporating nucleotide physicochemical properties into PseKNC","volume":"111","author":"Feng","year":"2019","journal-title":"Genomics"},{"key":"2021052109515333100_ref18","doi-asserted-by":"crossref","first-page":"3150","DOI":"10.1093\/bioinformatics\/bts565","article-title":"CD-HIT: accelerated for clustering the next-generation sequencing data","volume":"28","author":"Fu","year":"2012","journal-title":"Bioinformatics"},{"key":"2021052109515333100_ref19","doi-asserted-by":"crossref","first-page":"e91","DOI":"10.1093\/nar\/gkw104","article-title":"SRAMP: prediction of mammalian N6-methyladenosine (m6A) sites based on sequence-derived features","volume":"44","author":"Zhou","year":"2016","journal-title":"Nucleic Acids Res"},{"key":"2021052109515333100_ref20","doi-asserted-by":"crossref","first-page":"10900","DOI":"10.1038\/srep10900","article-title":"Systematic analysis of the genetic variability that impacts SUMO conjugation and their involvement in human diseases","volume":"5","author":"Xu","year":"2015","journal-title":"Sci Rep"},{"key":"2021052109515333100_ref21","doi-asserted-by":"crossref","first-page":"911","DOI":"10.2174\/0929866511320080008","article-title":"Prediction of methylation sites using the composition of K-spaced amino acid pairs","volume":"20","author":"Zhang","year":"2013","journal-title":"Protein Pept Lett"},{"key":"2021052109515333100_ref22","doi-asserted-by":"crossref","first-page":"e127","DOI":"10.1093\/nar\/gkz740","article-title":"BioSeq-Analysis2.0: an updated platform for analyzing DNA, RNA and protein sequences at sequence level and residue level based on machine learning approaches","volume":"47","author":"Liu","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"2021052109515333100_ref23","first-page":"197","article-title":"A coding measure scheme employing electron-ion interaction pseudopotential (EIIP)","volume":"1","author":"Nair","year":"2006","journal-title":"Bioinformation"},{"key":"2021052109515333100_ref24","doi-asserted-by":"crossref","first-page":"767","DOI":"10.1039\/C7MB00054E","article-title":"EnhancerPred2.0: predicting enhancers and their strength based on position-specific trinucleotide propensity and electron\u2013ion interaction potential feature selection","volume":"13","author":"He","year":"2017","journal-title":"Mol Biosyst"},{"key":"2021052109515333100_ref25","doi-asserted-by":"crossref","first-page":"e68","DOI":"10.1093\/nar\/gks1450","article-title":"iRSpot-PseDNC: identify recombination spots with pseudo dinucleotide composition","volume":"41","author":"Chen","year":"2013","journal-title":"Nucleic Acids Res"},{"key":"2021052109515333100_ref26","doi-asserted-by":"crossref","first-page":"26","DOI":"10.1016\/j.ab.2015.08.021","article-title":"iRNA-methyl: identifying N6-methyladenosine sites using pseudo nucleotide composition","volume":"490","author":"Chen","year":"2015","journal-title":"Anal Biochem"},{"key":"2021052109515333100_ref27","doi-asserted-by":"crossref","first-page":"e20136","DOI":"10.1371\/journal.pone.0020136","article-title":"Exploiting nucleotide composition to engineer promoters","volume":"6","author":"Grabherr","year":"2011","journal-title":"PLoS One"},{"key":"2021052109515333100_ref28","doi-asserted-by":"crossref","first-page":"197","DOI":"10.1016\/j.ygeno.2015.01.005","article-title":"Identification of protein-interacting nucleotides in a RNA sequence using composition profile of tri-nucleotides","volume":"105","author":"Panwar","year":"2015","journal-title":"Genomics"},{"key":"2021052109515333100_ref29","doi-asserted-by":"crossref","first-page":"1746","DOI":"10.3390\/ijms15021746","article-title":"iRSpot-TNCPseAAC: identify recombination spots with trinucleotide composition and pseudo amino acid components","volume":"15","author":"Qiu","year":"2014","journal-title":"Int J Mol Sci"},{"key":"2021052109515333100_ref30","doi-asserted-by":"crossref","first-page":"127","DOI":"10.1186\/1471-2164-15-127","article-title":"Prediction and classification of ncRNAs using structural information","volume":"15","author":"Panwar","year":"2014","journal-title":"BMC Genomics"},{"key":"2021052109515333100_ref31","doi-asserted-by":"crossref","first-page":"2499","DOI":"10.1093\/bioinformatics\/bty140","article-title":"iFeature: a python package and web server for features extraction and selection from protein and peptide sequences","volume":"34","author":"Chen","year":"2018","journal-title":"Bioinformatics"},{"key":"2021052109515333100_ref32","doi-asserted-by":"crossref","first-page":"451","DOI":"10.1016\/j.gpb.2018.08.004","article-title":"Integration of a deep learning classifier with a random forest approach for predicting malonylation sites","volume":"16","author":"Chen","year":"2018","journal-title":"Genomics Proteomics Bioinformatics"},{"key":"2021052109515333100_ref33","article-title":"Comprehensive review and assessment of computational methods for predicting RNA post-transcriptional modification sites from RNA sequences","author":"Chen","year":"2019","journal-title":"Brief Bioinform"},{"key":"2021052109515333100_ref34","doi-asserted-by":"crossref","first-page":"1332","DOI":"10.3390\/cells8111332","article-title":"4mCpred-EL: an ensemble learning framework for identification of DNA N4-methylcytosine sites in the mouse genome","volume":"8","author":"Manavalan","year":"2019","journal-title":"Cell"},{"key":"2021052109515333100_ref35","first-page":"1669","article-title":"BERMP: a cross-species classifier for predicting m6A sites by integrating a deep learning algorithm and a random forest approach","volume":"14","author":"Huang","year":"2018","journal-title":"Int J Mol Sci"},{"key":"2021052109515333100_ref36","article-title":"iLearn: an integrated platform and meta-learner for feature engineering, machine-learning analysis and modeling of DNA, RNA and protein sequence data","author":"Chen","year":"2019","journal-title":"Brief Bioinform"},{"key":"2021052109515333100_ref37","first-page":"e332","article-title":"iRNA-PseU: identifying RNA pseudouridine sites","volume":"5","author":"Chen","year":"2016","journal-title":"Mol Ther-Nucl Acids"},{"key":"2021052109515333100_ref38","doi-asserted-by":"crossref","first-page":"739","DOI":"10.1016\/j.omtn.2019.10.008","article-title":"RNAm5CPred: prediction of RNA 5-methylcytosine sites based on three different kinds of nucleotide composition","volume":"18","author":"Fang","year":"2019","journal-title":"Mol Ther-Nucl Acids"},{"key":"2021052109515333100_ref39","doi-asserted-by":"crossref","first-page":"3748","DOI":"10.1093\/bioinformatics\/btv439","article-title":"SuccFind: a novel succinylation sites online prediction tool via enhanced characteristic strategy","volume":"31","author":"Xu","year":"2015","journal-title":"Bioinformatics"},{"key":"2021052109515333100_ref40","doi-asserted-by":"crossref","first-page":"3999","DOI":"10.1093\/bioinformatics\/bty444","article-title":"ProAcePred: prokaryote lysine acetylation sites prediction based on elastic net feature optimization","volume":"34","author":"Chen","year":"2018","journal-title":"Bioinformatics"},{"key":"2021052109515333100_ref41","doi-asserted-by":"crossref","first-page":"83","DOI":"10.1016\/j.chemolab.2006.01.007","article-title":"Recursive feature elimination with random forest for PTR-MS analysis of agroindustrial products","volume":"83","author":"Granitto","year":"2006","journal-title":"Chemometr Intell Lab"},{"key":"2021052109515333100_ref42","doi-asserted-by":"crossref","first-page":"394","DOI":"10.3390\/genes9080394","article-title":"A model stacking framework for identifying DNA binding proteins by orchestrating multi-view features and classifiers","volume":"9","author":"Liu","year":"2018","journal-title":"Genes"},{"key":"2021052109515333100_ref43","doi-asserted-by":"crossref","first-page":"24","DOI":"10.1093\/nsr\/nwx110","article-title":"Deep learning for natural language processing: advantages and challenges","volume":"5","author":"Li","year":"2017","journal-title":"Natl Sci Rev"},{"key":"2021052109515333100_ref44","first-page":"770","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"He","year":"2016"},{"key":"2021052109515333100_ref45","article-title":"Decoding regulatory structures and features from epigenomics profiles: a Roadmap-ENCODE Variational Auto-Encoder (RE-VAE) model","author":"Hu","year":"2019","journal-title":"Methods"},{"key":"2021052109515333100_ref46","doi-asserted-by":"crossref","first-page":"82","DOI":"10.1186\/s12864-018-5370-x","article-title":"Gene2vec: distributed representation of genes based on co-expression","volume":"20","author":"Du","year":"2019","journal-title":"BMC Genomics"},{"key":"2021052109515333100_ref47","doi-asserted-by":"crossref","DOI":"10.1016\/j.gpb.2020.01.001","article-title":"GPS 5.0: an update on the prediction of kinase-specific phosphorylation sites in proteins","author":"Wang","year":"2020","journal-title":"Genomics Proteomics Bioinformatics"},{"key":"2021052109515333100_ref48","doi-asserted-by":"crossref","first-page":"3909","DOI":"10.1093\/bioinformatics\/btx496","article-title":"MusiteDeep: a deep-learning framework for general and kinase-specific phosphorylation site prediction","volume":"33","author":"Wang","year":"2017","journal-title":"Bioinformatics"},{"key":"2021052109515333100_ref49","doi-asserted-by":"crossref","first-page":"1660","DOI":"10.1093\/bioinformatics\/bty842","article-title":"DeepHINT: understanding HIV-1 integration via deep learning with attention","volume":"35","author":"Hu","year":"2019","journal-title":"Bioinformatics"},{"key":"2021052109515333100_ref50","doi-asserted-by":"crossref","first-page":"593","DOI":"10.1093\/bioinformatics\/bty668","article-title":"4mCPred: machine learning methods for DNA N4-methylcytosine sites prediction","volume":"35","author":"He","year":"2018","journal-title":"Bioinformatics"},{"key":"2021052109515333100_ref51","doi-asserted-by":"crossref","first-page":"1326","DOI":"10.1093\/bioinformatics\/bty824","article-title":"Exploring sequence-based features for the improved prediction of DNA N4-methylcytosine sites in multiple species","volume":"35","author":"Wei","year":"2018","journal-title":"Bioinformatics"},{"key":"2021052109515333100_ref52","doi-asserted-by":"crossref","first-page":"4930","DOI":"10.1093\/bioinformatics\/btz408","article-title":"Iterative feature representations improve N4-methylcytosine site prediction","volume":"35","author":"Wei","year":"2019","journal-title":"Bioinformatics"},{"key":"2021052109515333100_ref53","doi-asserted-by":"crossref","first-page":"733","DOI":"10.1016\/j.omtn.2019.04.019","article-title":"Meta-4mCpred: a sequence-based meta-predictor for accurate DNA 4mC site prediction using effective feature representation","volume":"16","author":"Manavalan","year":"2019","journal-title":"Mol Ther-Nucl Acids"},{"key":"2021052109515333100_ref54","first-page":"2579","article-title":"Visualizing data using t-SNE","volume-title":"J Mach Learn Res","author":"Maaten","year":"2008"},{"key":"2021052109515333100_ref55","volume-title":"Hyperas: a very simple convenience wrapper around hyperopt for fast prototyping with keras models (2017)","author":"Pumperla","year":"2019"}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bib\/article-pdf\/22\/3\/bbaa099\/37965780\/bbaa099.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"http:\/\/academic.oup.com\/bib\/article-pdf\/22\/3\/bbaa099\/37965780\/bbaa099.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,9,30]],"date-time":"2023-09-30T17:58:13Z","timestamp":1696096693000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbaa099\/5856341"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,6,24]]},"references-count":55,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2021,5,20]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbaa099","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2021,5]]},"published":{"date-parts":[[2020,6,24]]},"article-number":"bbaa099"}}