{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,15]],"date-time":"2026-05-15T21:32:48Z","timestamp":1778880768517,"version":"3.51.4"},"reference-count":49,"publisher":"Oxford University Press (OUP)","issue":"1","license":[{"start":{"date-parts":[[2020,1,25]],"date-time":"2020-01-25T00:00:00Z","timestamp":1579910400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100018537","name":"National Science and Technology Major Project","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100018537","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61372138"],"award-info":[{"award-number":["61372138"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100018537","name":"National Science and Technology Major Project","doi-asserted-by":"crossref","award":["2018ZX10201002"],"award-info":[{"award-number":["2018ZX10201002"]}],"id":[{"id":"10.13039\/501100018537","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,1,18]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>By reviewing previous CpG-related studies, we consider that the transcription regulation of about half of the human genes, mostly housekeeping (HK) genes, involves CpG islands (CGIs), their methylation states, CpG spacing and other chromosomal parameters. However, the precise CGI definition and positioning of CGIs within gene structures, as well as specific CGI-associated regulatory mechanisms, all remain to be explained at individual gene and gene-family levels, together with consideration of species and lineage specificity. Although previous studies have already classified CGIs into high-CpG (HCGI), intermediate-CpG (ICGI) and low-CpG (LCGI) densities based on CpG density variation, the correlation between CGI density and gene expression regulation, such as co-regulation of CGIs and TATA box on HK genes, remains to be elucidated. First, this study introduces such a problem-solving protocol for human-genome annotation, which is based on a combination of GTEx, JBLA and Gene Ontology (GO) analysis. Next, we discuss why CGI-associated genes are most likely regulated by HCGI and tend to be HK genes; the HCGI\/TATA\u00b1 and LCGI\/TATA\u00b1 combinations show different GO enrichment, whereas the ICGI\/TATA\u00b1 combination is less characteristic based on GO enrichment analysis. Finally, we demonstrate that Hadoop MapReduce-based MR-JBLA algorithm is more efficient than the original JBLA in k-mer counting and CGI-associated gene analysis.<\/jats:p>","DOI":"10.1093\/bib\/bbz134","type":"journal-article","created":{"date-parts":[[2019,10,8]],"date-time":"2019-10-08T08:11:14Z","timestamp":1570522274000},"page":"515-525","source":"Crossref","is-referenced-by-count":43,"title":["CpG-island-based annotation and analysis of human housekeeping genes"],"prefix":"10.1093","volume":"22","author":[{"given":"Le","family":"Zhang","sequence":"first","affiliation":[{"name":"College of Computer Science, Sichuan University, Chengdu, 610065, PR China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zichun","family":"Dai","sequence":"additional","affiliation":[{"name":"Medical Big Data Center of Sichuan University, Sichuan University, Chengdu, 610065, PR China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jun","family":"Yu","sequence":"additional","affiliation":[{"name":"CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, PR China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ming","family":"Xiao","sequence":"additional","affiliation":[{"name":"University of Chinese Academy of Sciences, Beijing 100049, PR China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2020,1,25]]},"reference":[{"key":"2021012203305803000_ref1","doi-asserted-by":"crossref","first-page":"777","DOI":"10.1016\/S0140-6736(18)31268-6","article-title":"Principles of DNA methylation and their implications for biology and medicine","volume":"392","author":"Dor","year":"2018","journal-title":"Lancet"},{"key":"2021012203305803000_ref2","doi-asserted-by":"crossref","first-page":"45","DOI":"10.1016\/j.cell.2013.12.019","article-title":"Reversing DNA methylation: mechanisms, genomics, and biological functions","volume":"156","author":"Wu","year":"2014","journal-title":"Cell"},{"key":"2021012203305803000_ref3","first-page":"374","article-title":"Statistical method evaluation for differentially methylated CpGs in base resolution next-generation DNA sequencing data","volume":"19","author":"Zhang","year":"2016","journal-title":"Brief Bioinform"},{"key":"2021012203305803000_ref4","doi-asserted-by":"crossref","first-page":"503","DOI":"10.1126\/science.aag3260","article-title":"Integration of CpG-free DNA induces de novo methylation of CpG islands in pluripotent stem cells","volume":"356","author":"Takahashi","year":"2017","journal-title":"Science"},{"key":"2021012203305803000_ref5","doi-asserted-by":"crossref","first-page":"512","DOI":"10.1016\/j.bpj.2016.12.029","article-title":"Optical trapping nanometry of hypermethylated CPG-island DNA","volume":"112","author":"Pongor","year":"2017","journal-title":"Biophys J"},{"key":"2021012203305803000_ref6","doi-asserted-by":"crossref","first-page":"564","DOI":"10.1038\/nsmb.1594","article-title":"Developmental programming of CpG island methylation profiles in the human genome","volume":"16","author":"Straussman","year":"2009","journal-title":"Nat Struct Mol Biol"},{"key":"2021012203305803000_ref7","doi-asserted-by":"crossref","first-page":"R33","DOI":"10.1186\/gb-2005-6-4-r33","article-title":"Promoter features related to tissue specificity as measured by Shannon entropy","volume":"6","author":"Schug","year":"2005","journal-title":"Genome Biol"},{"key":"2021012203305803000_ref8","doi-asserted-by":"crossref","first-page":"481","DOI":"10.1016\/j.tig.2008.08.004","article-title":"On the nature of human housekeeping genes","volume":"24","author":"Zhu","year":"2008","journal-title":"Trends Genet"},{"key":"2021012203305803000_ref9","doi-asserted-by":"crossref","first-page":"1044","DOI":"10.1101\/gr.088773.108","article-title":"Distinct DNA methylation patterns characterize differentiated human embryonic stem cells and developing human fetal liver","volume":"19","author":"Brunner","year":"2009","journal-title":"Genome Res"},{"key":"2021012203305803000_ref10","doi-asserted-by":"crossref","first-page":"2998","DOI":"10.1093\/gbe\/evu238","article-title":"Conserved and divergent patterns of DNA methylation in higher vertebrates","volume":"6","author":"Ning","year":"2014","journal-title":"Genome Biol Evol"},{"key":"2021012203305803000_ref11","doi-asserted-by":"crossref","first-page":"421","DOI":"10.4161\/epi.19565","article-title":"Diametrically opposite methylome-transcriptome relationships in high- and low-CpG promoter genes in postmitotic neural rat tissue","volume":"7","author":"Hartung","year":"2012","journal-title":"Epigenetics"},{"key":"2021012203305803000_ref12","doi-asserted-by":"crossref","first-page":"737","DOI":"10.1093\/bib\/bbx013","article-title":"A survey of the approaches for identifying differential methylation using bisulfite sequencing data","volume":"19","author":"Shafi","year":"2017","journal-title":"Brief Bioinform"},{"key":"2021012203305803000_ref13","doi-asserted-by":"crossref","first-page":"457","DOI":"10.1038\/ng1990","article-title":"Distribution, silencing potential and evolutionary impact of promoter DNA methylation in the human genome","volume":"39","author":"Weber","year":"2007","journal-title":"Nat Genet"},{"key":"2021012203305803000_ref14","doi-asserted-by":"crossref","first-page":"1057","DOI":"10.1093\/nar\/gku1113","article-title":"The GOA database: gene ontology annotation updates for 2015","volume":"43","author":"Huntley","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"2021012203305803000_ref15","doi-asserted-by":"crossref","first-page":"298","DOI":"10.1093\/bib\/6.3.298","article-title":"Get ready to GO! A biologist's guide to the gene ontology","volume":"6","author":"Lomax","year":"2005","journal-title":"Brief Bioinform"},{"key":"2021012203305803000_ref16","doi-asserted-by":"crossref","first-page":"3624","DOI":"10.1093\/bioinformatics\/bty392","article-title":"Lineage-associated underrepresented permutations (LAUPs) of mammalian genomic sequences based on a jellyfish-based LAUPs analysis application (JBLA)","volume":"34","author":"Zhang","year":"2018","journal-title":"Bioinformatics"},{"key":"2021012203305803000_ref17","doi-asserted-by":"crossref","first-page":"580","DOI":"10.1038\/ng.2653","article-title":"The genotype-tissue expression (GTEx) project","volume":"45","author":"Kubicek","year":"2013","journal-title":"Nat Genet"},{"key":"2021012203305803000_ref18","doi-asserted-by":"crossref","first-page":"204","DOI":"10.1038\/nature24277","article-title":"Genetic effects on gene expression across human tissues","volume":"550","author":"Consortium","year":"2017","journal-title":"Nature"},{"key":"2021012203305803000_ref19","doi-asserted-by":"crossref","first-page":"D501","DOI":"10.1093\/nar\/gki025","article-title":"Reference sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins","volume":"33","author":"Pruitt","year":"2005","journal-title":"Nucleic Acids Res"},{"key":"2021012203305803000_ref20","doi-asserted-by":"crossref","first-page":"D67","DOI":"10.1093\/nar\/gkv1276","article-title":"GenBank","volume":"44","author":"Clark","year":"2016","journal-title":"Nucleic Acids Res"},{"key":"2021012203305803000_ref21","doi-asserted-by":"publisher","DOI":"10.1038\/ncomms11778","article-title":"Improving GENCODE reference gene annotation using a high-stringency proteogenomics workflow","volume":"7","author":"Wright","year":"2016","journal-title":"Nat Commun"},{"key":"2021012203305803000_ref22","doi-asserted-by":"crossref","first-page":"D762","DOI":"10.1093\/nar\/gkx1020","article-title":"The UCSC genome browser database: 2018 update","volume":"46","author":"Casper","year":"2018","journal-title":"Nucleic Acids Res"},{"key":"2021012203305803000_ref23","doi-asserted-by":"crossref","first-page":"261","DOI":"10.1016\/0022-2836(87)90689-9","article-title":"CpG islands in vertebrate genomes","volume":"196","author":"Gardinergarden","year":"1987","journal-title":"J Mol Biol"},{"key":"2021012203305803000_ref24","doi-asserted-by":"publisher","DOI":"10.1109\/TCBB.2019.2935971","article-title":"CGIDLA:developing the web server for CpG Island related density and LAUPs (lineage-associated underrepresented permutations) study","author":"Xiao","year":"2019","journal-title":"IEEE\/ACM Trans Comput Biol Bioinform"},{"key":"2021012203305803000_ref25","doi-asserted-by":"crossref","first-page":"172","DOI":"10.1186\/1471-2164-9-172","article-title":"How many human genes can be defined as housekeeping with current expression data?","volume":"9","author":"Zhu","year":"2008","journal-title":"BMC Genomics"},{"key":"2021012203305803000_ref26","doi-asserted-by":"crossref","first-page":"640","DOI":"10.1126\/science.aab3002","article-title":"Human genetics. GTEx detects genetic effects","volume":"348","author":"Gobson","year":"2015","journal-title":"Science"},{"key":"2021012203305803000_ref27","doi-asserted-by":"crossref","first-page":"284","DOI":"10.1089\/omi.2011.0118","article-title":"clusterProfiler: an R package for comparing biological themes among gene clusters","volume":"16","author":"Yu","year":"2012","journal-title":"OMICS"},{"key":"2021012203305803000_ref28","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1109\/TC.2016.2595566","article-title":"Storage computing for Hadoop MapReduce framework: challenges and possibilities","author":"Park","year":"2016","journal-title":"IEEE Trans Comput"},{"key":"2021012203305803000_ref29","first-page":"439","article-title":"Conservation between the RNA Polymerase I, II, and III transcription initiation machineries","volume-title":"Molecular cell","author":"Vannini","year":"2012"},{"key":"2021012203305803000_ref30","first-page":"307","volume-title":"ACM Symposium on Research in Applied Computation","author":"Ding","year":"2011"},{"key":"2021012203305803000_ref31","doi-asserted-by":"crossref","first-page":"46","DOI":"10.1109\/TSC.2015.2444838","article-title":"Processing Cassandra datasets with Hadoop-streaming based approaches","volume":"9","author":"Dede","year":"2016","journal-title":"IEEE Trans Serv Comput"},{"key":"2021012203305803000_ref32","first-page":"164","article-title":"Distributed extreme learning machine with alternating direction method of multiplier","author":"Luo","year":"2017"},{"key":"2021012203305803000_ref33","doi-asserted-by":"crossref","first-page":"117","DOI":"10.1016\/j.fss.2014.01.016","article-title":"Parallel sampling from big data with uncertainty distribution","volume":"258","author":"He","year":"2015","journal-title":"Fuzzy Set Syst"},{"key":"2021012203305803000_ref34","doi-asserted-by":"publisher","DOI":"10.1038\/srep38201","article-title":"A parallel Adaboost-backpropagation neural network for massive image dataset classification","volume":"6","author":"Cao","year":"2016","journal-title":"Sci Rep"},{"key":"2021012203305803000_ref35","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/gb-2008-9-5-r79","article-title":"CpG island density and its correlations with genomic features in mammalian genomes","volume":"9","author":"Han","year":"2008","journal-title":"Genome Biol"},{"key":"2021012203305803000_ref36","doi-asserted-by":"crossref","first-page":"1010","DOI":"10.1101\/gad.2037511","article-title":"CpG islands and regulation of transcription","volume":"25","author":"Deaton","year":"2011","journal-title":"Genes Dev"},{"key":"2021012203305803000_ref37","first-page":"107","volume-title":"International Symposium on Databases in Parallel and Distributed Systems, 1988","author":"Lakshmi","year":"2000"},{"key":"2021012203305803000_ref38","doi-asserted-by":"crossref","first-page":"181","DOI":"10.1093\/bib\/2.2.181","article-title":"On the parallelisation of bioinformatics applications","volume":"2","author":"Trelles","year":"2001","journal-title":"Brief Bioinform"},{"key":"2021012203305803000_ref39","first-page":"938","article-title":"Evaluation of preprocessing, mapping and postprocessing algorithms for analyzing whole genome bisulfite sequencing data","volume":"17","author":"Tsuji","year":"2016","journal-title":"Brief Bioinform"},{"key":"2021012203305803000_ref40","doi-asserted-by":"crossref","first-page":"773","DOI":"10.1016\/j.molcel.2013.02.011","article-title":"The hierarchy of the 3D genome","volume":"49","author":"Gibcus","year":"2013","journal-title":"Mol Cell"},{"key":"2021012203305803000_ref41","doi-asserted-by":"crossref","first-page":"733","DOI":"10.1093\/bib\/bbv085","article-title":"How computer science can help in understanding the 3D genome architecture","volume":"17","author":"Shavit","year":"2016","journal-title":"Brief Bioinform"},{"key":"2021012203305803000_ref42","doi-asserted-by":"crossref","first-page":"1829","DOI":"10.1002\/cnm.1444","article-title":"Employing graphics processing unit technology, alternating direction implicit method and domain decomposition to speed up the numerical diffusion solver for the biomedical engineering research","volume":"27","author":"Jiang","year":"2011","journal-title":"Int J Numer Method Biomed Eng"},{"key":"2021012203305803000_ref43","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.matcom.2014.07.003","article-title":"Novel 3D GPU based numerical parallel diffusion algorithms in cylindrical coordinates for health care simulation","volume":"109","author":"Jiang","year":"2015","journal-title":"Math Comput Simul"},{"key":"2021012203305803000_ref44","first-page":"1","article-title":"Building up a robust risk mathematical platform to predict colorectal cancer","volume":"2017","author":"Zhang","year":"2017","journal-title":"Complexity"},{"key":"2021012203305803000_ref45","doi-asserted-by":"crossref","first-page":"477","DOI":"10.1093\/jmcb\/mjx056","article-title":"EZH2-, CHD4-, and IDH-linked epigenetic perturbation and its association with survival in glioma patients","volume":"9","author":"Zhang","year":"2017","journal-title":"J Mol Cell Biol"},{"key":"2021012203305803000_ref46","doi-asserted-by":"crossref","first-page":"14877","DOI":"10.1039\/C6NR01637E","article-title":"Investigation of mechanism of bone regeneration in a porous biodegradable calcium phosphate (CaP) scaffold by a combination of a multi-scale agent-based model and experimental optimization\/validation","volume":"8","author":"Zhang","year":"2016","journal-title":"Nanoscale"},{"key":"2021012203305803000_ref47","doi-asserted-by":"crossref","first-page":"320","DOI":"10.1007\/s12539-019-00327-w","article-title":"An overview of scoring functions used for protein-ligand interactions in molecular docking","volume":"11","author":"Li","year":"2019","journal-title":"Interdiscip Sci"},{"key":"2021012203305803000_ref48","doi-asserted-by":"crossref","first-page":"193","DOI":"10.1186\/s12859-019-2741-5","article-title":"Computed tomography angiography-based analysis of high-risk intracerebral haemorrhage patients by employing a mathematical model","volume":"20","author":"Zhang","year":"2019","journal-title":"BMC Bioinformatics"},{"key":"2021012203305803000_ref49","first-page":"15","article-title":"Comprehensively benchmarking applications for detecting copy number variation","volume":"e1007069","author":"Zhang","year":"2019","journal-title":"PLoS Comput Biol"}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bib\/article-pdf\/22\/1\/515\/35934772\/bbz134.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"http:\/\/academic.oup.com\/bib\/article-pdf\/22\/1\/515\/35934772\/bbz134.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,1,23]],"date-time":"2021-01-23T12:04:58Z","timestamp":1611403498000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/22\/1\/515\/5715934"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,1,25]]},"references-count":49,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2020,1,25]]},"published-print":{"date-parts":[[2021,1,18]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbz134","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2021,1]]},"published":{"date-parts":[[2020,1,25]]}}}