{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,9]],"date-time":"2026-04-09T09:14:15Z","timestamp":1775726055816,"version":"3.50.1"},"reference-count":35,"publisher":"Springer Science and Business Media LLC","issue":"1","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2006,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Background<\/jats:title>\n            <jats:p>Despite their involvement in the regulation of gene expression and their importance as genomic markers for promoter prediction, no objective standard exists for defining CpG islands (CGIs), since all current approaches rely on a large parameter space formed by the thresholds of length, CpG fraction and G+C content.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results<\/jats:title>\n            <jats:p>Given the higher frequency of CpG dinucleotides at CGIs, as compared to bulk DNA, the distance distributions between neighboring CpGs should differ for bulk and island CpGs. A new algorithm (<jats:italic>CpGcluster<\/jats:italic>) is presented, based on the physical distance between neighboring CpGs on the chromosome and able to predict directly clusters of CpGs, while not depending on the subjective criteria mentioned above. By assigning a <jats:italic>p-value<\/jats:italic> to each of these clusters, the most statistically significant ones can be predicted as CGIs. <jats:italic>CpGcluster<\/jats:italic> was benchmarked against five other CGI finders by using a test sequence set assembled from an experimental CGI library. <jats:italic>CpGcluster<\/jats:italic> reached the highest overall accuracy values, while showing the lowest rate of false-positive predictions. Since a minimum-length threshold is not required, <jats:italic>CpGcluster<\/jats:italic> can find short but fully functional CGIs usually missed by other algorithms. The CGIs predicted by <jats:italic>CpGcluster<\/jats:italic> present the lowest degree of overlap with Alu retrotransposons and, simultaneously, the highest overlap with vertebrate Phylogenetic Conserved Elements (PhastCons). <jats:italic>CpGcluster's<\/jats:italic> CGIs overlapping with the Transcription Start Site (TSS) show the highest statistical significance, as compared to the islands in other genome locations, thus qualifying <jats:italic>CpGcluster<\/jats:italic> as a valuable tool in discriminating functional CGIs from the remaining islands in the bulk genome.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Conclusion<\/jats:title>\n            <jats:p>\n              <jats:italic>CpGcluster<\/jats:italic> uses only integer arithmetic, thus being a fast and computationally efficient algorithm able to predict statistically significant clusters of CpG dinucleotides. Another outstanding feature is that all predicted CGIs start and end with a CpG dinucleotide, which should be appropriate for a genomic feature whose functionality is based precisely on CpG dinucleotides. The only search parameter in <jats:italic>CpGcluster<\/jats:italic> is the distance between two consecutive CpGs, in contrast to previous algorithms. Therefore, none of the main statistical properties of CpG islands (neither G+C content, CpG fraction nor length threshold) are needed as search parameters, which may lead to the high specificity and low overlap with spurious Alu elements observed for <jats:italic>CpGcluster<\/jats:italic> predictions.<\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/1471-2105-7-446","type":"journal-article","created":{"date-parts":[[2006,10,12]],"date-time":"2006-10-12T21:40:10Z","timestamp":1160689210000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":143,"title":["CpGcluster: a distance-based algorithm for CpG-island detection"],"prefix":"10.1186","volume":"7","author":[{"given":"Michael","family":"Hackenberg","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Christopher","family":"Previti","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Pedro Luis","family":"Luque-Escamilla","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Pedro","family":"Carpena","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jos\u00e9","family":"Mart\u00ednez-Aroza","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jos\u00e9 L","family":"Oliver","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2006,10,12]]},"reference":[{"issue":"12","key":"1185_CR1","doi-asserted-by":"publisher","first-page":"4692","DOI":"10.1073\/pnas.87.12.4692","volume":"87","author":"J Sved","year":"1990","unstructured":"Sved J, Bird A: The expected equilibrium of the CpG dinucleotide in vertebrate genomes under a mutation model. Proc Natl Acad Sci USA 1990, 87(12):4692\u20136. 10.1073\/pnas.87.12.4692","journal-title":"Proc Natl Acad Sci USA"},{"issue":"8","key":"1185_CR2","doi-asserted-by":"publisher","first-page":"1647","DOI":"10.1007\/s00018-003-3088-6","volume":"60","author":"F Antequera","year":"2003","unstructured":"Antequera F: Structure, function and evolution of CpG island promoters. Cell Mol Life Sci 2003, 60(8):1647\u201358. 10.1007\/s00018-003-3088-6","journal-title":"Cell Mol Life Sci"},{"issue":"23","key":"1185_CR3","doi-asserted-by":"publisher","first-page":"7865","DOI":"10.1093\/nar\/10.23.7865","volume":"10","author":"M McClelland","year":"1982","unstructured":"McClelland M, Ivarie R: Asymmetrical distribution of CpG in an 'average' mammalian gene. Nucleic Acids Res 1982, 10(23):7865\u201377.","journal-title":"Nucleic Acids Res"},{"issue":"3","key":"1185_CR4","doi-asserted-by":"publisher","first-page":"647","DOI":"10.1093\/nar\/11.3.647","volume":"11","author":"DN Cooper","year":"1983","unstructured":"Cooper DN, Taggart MH, Bird AP: Unmethylated domains in vertebrate DNA. Nucleic Acids Res 1983, 11(3):647\u201358.","journal-title":"Nucleic Acids Res"},{"issue":"6067","key":"1185_CR5","doi-asserted-by":"publisher","first-page":"209","DOI":"10.1038\/321209a0","volume":"321","author":"AP Bird","year":"1986","unstructured":"Bird AP: CpG-rich islands and the function of DNA methylation. Nature 1986, 321(6067):209\u201313. 10.1038\/321209a0","journal-title":"Nature"},{"issue":"24","key":"1185_CR6","doi-asserted-by":"publisher","first-page":"11995","DOI":"10.1073\/pnas.90.24.11995","volume":"90","author":"F Antequera","year":"1993","unstructured":"Antequera F, Bird A: Number of CpG islands and genes in human and mouse. Proc Natl Acad Sci USA 1993, 90(24):11995\u20139. 10.1073\/pnas.90.24.11995","journal-title":"Proc Natl Acad Sci USA"},{"key":"1185_CR7","doi-asserted-by":"publisher","first-page":"6","DOI":"10.1101\/gad.947102","volume":"16","author":"AP Bird","year":"2002","unstructured":"Bird AP: DNA methylation patterns and epigenetic memory. Genes Dev 2002, 16: 6\u201321. 10.1101\/gad.947102","journal-title":"Genes Dev"},{"issue":"3","key":"1185_CR8","doi-asserted-by":"publisher","first-page":"503","DOI":"10.1016\/0092-8674(90)90015-7","volume":"62","author":"F Antequera","year":"1990","unstructured":"Antequera F, Boyes J, Bird A: High levels of de novo methylation and altered chromatin structure at CpG islands in cell lines. Cell 1990, 62(3):503\u201314. 10.1016\/0092-8674(90)90015-7","journal-title":"Cell"},{"issue":"8","key":"1185_CR9","first-page":"3225","volume":"61","author":"M Esteller","year":"2001","unstructured":"Esteller M, Corn PG, Baylin SB, Herman JG: A gene hypermethylation profile of human cancer. Cancer Res 2001, 61(8):3225\u20139.","journal-title":"Cancer Res"},{"issue":"7","key":"1185_CR10","doi-asserted-by":"publisher","first-page":"687","DOI":"10.1093\/hmg\/10.7.687","volume":"10","author":"SB Baylin","year":"2001","unstructured":"Baylin SB, Esteller M, Rountree MR, Bachman KE, Schuebel K, Herman JG: Aberrant patterns of DNA methylation, chromatin formation and gene expression in cancer. Hum Mol Genet 2001, 10(7):687\u201392. 10.1093\/hmg\/10.7.687","journal-title":"Hum Mol Genet"},{"issue":"12","key":"1185_CR11","doi-asserted-by":"publisher","first-page":"988","DOI":"10.1038\/nrc1507","volume":"4","author":"JP Issa","year":"2004","unstructured":"Issa JP: CpG island methylator phenotype in cancer. Nat Rev Cancer 2004, 4(12):988\u201393. 10.1038\/nrc1507","journal-title":"Nat Rev Cancer"},{"issue":"5","key":"1185_CR12","doi-asserted-by":"publisher","first-page":"1412","DOI":"10.1073\/pnas.0510310103","volume":"103","author":"S Saxonov","year":"2006","unstructured":"Saxonov S, Berg P, Brutlag DL: A genome-wide analysis of CpG dinucleotides in the human genome distinguishes two distinct classes of promoters. Proc Natl Acad Sci USA 2006, 103(5):1412\u20137. 10.1073\/pnas.0510310103","journal-title":"Proc Natl Acad Sci USA"},{"issue":"4","key":"1185_CR13","doi-asserted-by":"publisher","first-page":"1095","DOI":"10.1016\/0888-7543(92)90024-M","volume":"13","author":"F Larsen","year":"1992","unstructured":"Larsen F, Gundersen G, Lopez R, Prydz H: CpG islands as gene markers in the human genome. Genomics 1992, 13(4):1095\u2013107. 10.1016\/0888-7543(92)90024-M","journal-title":"Genomics"},{"key":"1185_CR14","doi-asserted-by":"publisher","first-page":"491","DOI":"10.1016\/S0097-8485(02)00010-4","volume":"26","author":"W Li","year":"2002","unstructured":"Li W, Bernaola-Galv\u00e1n PA, Haghighi F, Grosse I: Applications of recursive segmentation to the analysis of DNA sequences. Comput Chem 2002, 26: 491\u2013509. 10.1016\/S0097-8485(02)00010-4","journal-title":"Comput Chem"},{"issue":"4","key":"1185_CR15","doi-asserted-by":"publisher","first-page":"631","DOI":"10.1093\/bioinformatics\/18.4.631","volume":"18","author":"L Ponger","year":"2002","unstructured":"Ponger L, Mouchiroud D: CpGProD: identifying CpG islands associated with transcription start sites in large genomic mammalian sequences. Bioinformatics 2002, 18(4):631\u20133. 10.1093\/bioinformatics\/18.4.631","journal-title":"Bioinformatics"},{"issue":"6","key":"1185_CR16","doi-asserted-by":"publisher","first-page":"3740","DOI":"10.1073\/pnas.052410099","volume":"99","author":"D Takai","year":"2002","unstructured":"Takai D, Jones PA: Comprehensive analysis of CpG islands in human chromosomes 21 and 22. Proc Natl Acad Sci USA 2002, 99(6):3740\u20135. 10.1073\/pnas.052410099","journal-title":"Proc Natl Acad Sci USA"},{"issue":"3","key":"1185_CR17","first-page":"235","volume":"3","author":"D Takai","year":"2003","unstructured":"Takai D, Jones PA: The CpG island searcher: a new WWW resource. In Silico Biol 2003, 3(3):235\u201340.","journal-title":"In Silico Biol"},{"issue":"7","key":"1185_CR18","doi-asserted-by":"publisher","first-page":"1170","DOI":"10.1093\/bioinformatics\/bth059","volume":"20","author":"Y Wang","year":"2004","unstructured":"Wang Y, Leung FC: An evaluation of new criteria for CpG islands in the human genome as gene markers. Bioinformatics 2004, 20(7):1170\u20137. 10.1093\/bioinformatics\/bth059","journal-title":"Bioinformatics"},{"issue":"6 Pt 1","key":"1185_CR19","doi-asserted-by":"publisher","first-page":"061925","DOI":"10.1103\/PhysRevE.71.061925","volume":"71","author":"PL Luque-Escamilla","year":"2005","unstructured":"Luque-Escamilla PL, Martinez-Aroza J, Oliver JL, Gomez-Lopera JF, Roman-Roldan R: Compositional searching of CpG islands in the human genome. Phys Rev E Stat Nonlin Soft Matter Phys 2005, 71(6 Pt 1):061925.","journal-title":"Phys Rev E Stat Nonlin Soft Matter Phys"},{"issue":"2","key":"1185_CR20","doi-asserted-by":"publisher","first-page":"261","DOI":"10.1016\/0022-2836(87)90689-9","volume":"196","author":"M Gardiner-Garden","year":"1987","unstructured":"Gardiner-Garden M, Frommer M: CpG islands in vertebrate genomes. J Mol Biol 1987, 196(2):261\u201382. 10.1016\/0022-2836(87)90689-9","journal-title":"J Mol Biol"},{"issue":"1\u20132","key":"1185_CR21","doi-asserted-by":"publisher","first-page":"57","DOI":"10.1016\/S0378-1119(01)00672-2","volume":"276","author":"W Li","year":"2001","unstructured":"Li W: Delineating relative homogeneous G+C domains in DNA sequences. Gene 2001, 276(1\u20132):57\u201372. 10.1016\/S0378-1119(01)00672-2","journal-title":"Gene"},{"issue":"3","key":"1185_CR22","doi-asserted-by":"publisher","first-page":"353","DOI":"10.1006\/geno.1996.0298","volume":"34","author":"M Burset","year":"1996","unstructured":"Burset M, Guigo R: Evaluation of gene structure prediction programs. Genomics 1996, 34(3):353\u201367. 10.1006\/geno.1996.0298","journal-title":"Genomics"},{"issue":"1","key":"1185_CR23","doi-asserted-by":"publisher","first-page":"155","DOI":"10.1006\/dbio.2001.0560","volume":"243","author":"I Stancheva","year":"2002","unstructured":"Stancheva I, El-Maarri O, Walter J, Niveleau A, Meehan RR: DNA methylation at promoter regions regulates the timing of gene activation in Xenopus laevis embryos. Dev Biol 2002, 243(1):155\u201365. 10.1006\/dbio.2001.0560","journal-title":"Dev Biol"},{"issue":"2","key":"1185_CR24","doi-asserted-by":"publisher","first-page":"175","DOI":"10.1038\/ng886","volume":"31","author":"BW Futscher","year":"2002","unstructured":"Futscher BW, Oshiro MM, Wozniak RJ, Holtan N, Hanigan CL, Duan H, Domann FE: Role for DNA methylation in the control of cell type specific maspin expression. Nat Genet 2002, 31(2):175\u20139. 10.1038\/ng886","journal-title":"Nat Genet"},{"issue":"11","key":"1185_CR25","doi-asserted-by":"publisher","first-page":"7327","DOI":"10.1128\/MCB.19.11.7327","volume":"19","author":"C De Smet","year":"1999","unstructured":"De Smet C, Lurquin C, Lethe B, Martelange V, Boon T: DNA methylation is the primary silencing mechanism for a set of germ line- and tumor-specific genes with a CpG-rich promoter. Mol Cell Biol 1999, 19(11):7327\u201335.","journal-title":"Mol Cell Biol"},{"issue":"6","key":"1185_CR26","doi-asserted-by":"publisher","first-page":"830","DOI":"10.1101\/gr.3430605","volume":"15","author":"TH Kim","year":"2005","unstructured":"Kim TH, Barrera LO, Qu C, Van Calcar S, Trinklein ND, Cooper SJ, Luna RM, Glass CK, Rosenfeld MG, Myers RM, Ren B: Direct isolation and identification of promoters in the human genome. Genome Res 2005, 15(6):830\u20139. 10.1101\/gr.3430605","journal-title":"Genome Res"},{"issue":"6","key":"1185_CR27","doi-asserted-by":"publisher","first-page":"626","DOI":"10.1038\/ng1789","volume":"38","author":"P Carninci","year":"2006","unstructured":"Carninci P, Sandelin A, Lenhard B, Katayama S, Shimokawa K, Ponjavic J, Semple CA, Taylor MS, Engstrom PG, Frith MC, Forrest AR, Alkema WB, Tan SL, Plessy C, Kodzius R, Ravasi T, Kasukawa T, Fukuda S, Kanamori-Katayama M, Kitazume Y, Kawaji H, Kai C, Nakamura M, Konno H, Nakano K, Mottagui-Tabar S, Arner P, Chesi A, Gustincich S, Persichetti F, Suzuki H, Grimmond SM, Wells CA, Orlando V, Wahlestedt C, Liu ET, Harbers M, Kawai J, Bajic VB, Hume DA, Hayashizaki Y: Genome-wide analysis of mammalian promoter architecture and evolution. Nat Genet 2006, 38(6):626\u201335. 10.1038\/ng1789","journal-title":"Nat Genet"},{"issue":"2","key":"1185_CR28","doi-asserted-by":"publisher","first-page":"e17","DOI":"10.1371\/journal.pgen.0020017","volume":"2","author":"NC Wong","year":"2006","unstructured":"Wong NC, Wong LH, Quach JM, Canham P, Craig JM, Song JZ, Clark SJ, Choo KH: Permissive transcriptional activity at the centromere through pockets of DNA hypomethylation. PLoS Genet 2006, 2(2):e17. 10.1371\/journal.pgen.0020017","journal-title":"PLoS Genet"},{"issue":"8","key":"1185_CR29","doi-asserted-by":"publisher","first-page":"1034","DOI":"10.1101\/gr.3715005","volume":"15","author":"A Siepel","year":"2005","unstructured":"Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, Richards S, Weinstock GM, Wilson RK, Gibbs RA, Kent WJ, Miller W, Haussler D: Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res 2005, 15(8):1034\u201350. 10.1101\/gr.3715005","journal-title":"Genome Res"},{"key":"1185_CR30","unstructured":"UCSC Genome Browser[http:\/\/genome.ucsc.edu]"},{"key":"1185_CR31","unstructured":"The RefSeq Database[http:\/\/www.ncbi.nih.gov\/RefSeq]"},{"issue":"1","key":"1185_CR32","doi-asserted-by":"publisher","first-page":"61","DOI":"10.1038\/79189","volume":"26","author":"IP Ioshikhes","year":"2000","unstructured":"Ioshikhes IP, Zhang MQ: Large-scale human promoter mapping using CpG islands. Nat Genet 2000, 26(1):61\u20133. 10.1038\/79189","journal-title":"Nat Genet"},{"issue":"9","key":"1185_CR33","doi-asserted-by":"publisher","first-page":"2952","DOI":"10.1093\/nar\/gki582","volume":"33","author":"LE Heisler","year":"2005","unstructured":"Heisler LE, Torti D, Boutros PC, Watson J, Chan C, Winegarden N, Takahashi M, Yau P, Huang TH, Farnham PJ, Jurisica I, Woodgett JR, Bremner R, Penn LZ, Der SD: CpG Island microarray probe sequences derived from a physical library are representative of CpG Islands annotated on the human genome. Nucleic Acids Res 2005, 33(9):2952\u201361. 10.1093\/nar\/gki582","journal-title":"Nucleic Acids Res"},{"issue":"Database issue","key":"1185_CR34","doi-asserted-by":"publisher","first-page":"D86","DOI":"10.1093\/nar\/gkj129","volume":"34","author":"R Yamashita","year":"2006","unstructured":"Yamashita R, Suzuki Y, Wakaguri H, Tsuritani K, Nakai K, Sugano S: DBTSS: DataBase of Human Transcription Start Sites, progress report 2006. Nucleic Acids Res 2006, 34(Database issue):D86\u20139. 10.1093\/nar\/gkj129","journal-title":"Nucleic Acids Res"},{"issue":"6","key":"1185_CR35","first-page":"526","volume":"2","author":"SF Altschul","year":"1985","unstructured":"Altschul SF, Erickson BW: Significance of nucleotide sequence alignments: a method for random sequence permutation that preserves dinucleotide and codon usage. Mol Biol Evol 1985, 2(6):526\u201338.","journal-title":"Mol Biol Evol"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-7-446.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,1]],"date-time":"2021-09-01T11:04:03Z","timestamp":1630494243000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-7-446"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2006,10,12]]},"references-count":35,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2006,12]]}},"alternative-id":["1185"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-7-446","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2006,10,12]]},"assertion":[{"value":"22 June 2006","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"12 October 2006","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"12 October 2006","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"446"}}