{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,9]],"date-time":"2026-01-09T01:22:02Z","timestamp":1767921722035,"version":"3.49.0"},"reference-count":40,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2005,5,2]],"date-time":"2005-05-02T00:00:00Z","timestamp":1114992000000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/2.0\/"},{"start":{"date-parts":[[2005,5,2]],"date-time":"2005-05-02T00:00:00Z","timestamp":1114992000000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/2.0\/"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"abstract":"<jats:title>Abstract<\/jats:title><jats:sec>\n                        <jats:title>Background<\/jats:title>\n                        <jats:p>Understanding transcriptional regulation of gene expression is one of the greatest challenges of modern molecular biology. A central role in this mechanism is played by transcription factors, which typically bind to specific, short DNA sequence motifs usually located in the upstream region of the regulated genes. We discuss here a simple and powerful approach for the ab initio identification of these cis-regulatory motifs. The method we present integrates several elements: human-mouse comparison, statistical analysis of genomic sequences and the concept of coregulation. We apply it to a complete scan of the human genome.<\/jats:p>\n                     <\/jats:sec><jats:sec>\n                        <jats:title>Results<\/jats:title>\n                        <jats:p>By using the catalogue of conserved upstream sequences collected in the CORG database we construct sets of genes sharing the same overrepresented motif (short DNA sequence) in their upstream regions both in human and in mouse. 
We perform this construction for all possible motifs from 5 to 8 nucleotides in length and then filter the resulting sets looking for two types of evidence of coregulation: first, we analyze the Gene Ontology annotation of the genes in the set, searching for statistically significant common annotations; second, we analyze the expression profiles of the genes in the set as measured by microarray experiments, searching for evidence of coexpression. The sets which pass one or both filters are conjectured to contain a significant fraction of coregulated genes, and the upstream motifs characterizing the sets are thus good candidates to be the binding sites of the TFs involved in such regulation.<\/jats:p>\n                        <jats:p>In this way we find various known motifs and also some new candidate binding sites.<\/jats:p>\n                     <\/jats:sec><jats:sec>\n                        <jats:title>Conclusion<\/jats:title>\n                        <jats:p>We have discussed a new integrated algorithm for the \"ab initio\" identification of transcription factor binding sites in the human genome. The method is based on three ingredients: comparative genomics, overrepresentation, and different types of coregulation. 
The method is applied to a full scan of the human genome, giving satisfactory results.<\/jats:p>\n                     <\/jats:sec>","DOI":"10.1186\/1471-2105-6-110","type":"journal-article","created":{"date-parts":[[2005,5,3]],"date-time":"2005-05-03T18:24:57Z","timestamp":1115144697000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":21,"title":["Ab initio identification of putative human transcription factor binding sites by comparative genomics"],"prefix":"10.1186","volume":"6","author":[{"given":"D","family":"Cor\u00e0","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"C","family":"Herrmann","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"C","family":"Dieterich","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"F","family":"Di Cunto","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"P","family":"Provero","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"M","family":"Caselle","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2005,5,2]]},"reference":[{"key":"435_CR1","doi-asserted-by":"publisher","first-page":"276","DOI":"10.1038\/nrg1315","volume":"5","author":"WW Wasserman","year":"2004","unstructured":"Wasserman WW, Sandelin A: Applied bioinformatics for the identification of regulatory elements. Nat Rev Genet 2004, 5: 276\u201387. 10.1038\/nrg1315","journal-title":"Nat Rev Genet"},{"key":"435_CR2","doi-asserted-by":"publisher","first-page":"100","DOI":"10.1038\/35052548","volume":"2","author":"LA Pennacchio","year":"2001","unstructured":"Pennacchio LA, Rubin EM: Genomic strategies to identify mammalian regulatory sequences. 
Nat Rev Genet 2001, 2: 100\u2013109. 10.1038\/35052548","journal-title":"Nat Rev Genet"},{"key":"435_CR3","doi-asserted-by":"publisher","first-page":"369","DOI":"10.1016\/S0168-9525(00)02081-3","volume":"16","author":"R Hardison","year":"2000","unstructured":"Hardison R: Conserved non-coding sequences are reliable guides to regulatory elements. Trends Genet 2000, 16: 369\u2013372. 10.1016\/S0168-9525(00)02081-3","journal-title":"Trends Genet"},{"key":"435_CR4","doi-asserted-by":"publisher","first-page":"2315","DOI":"10.1093\/nar\/21.10.2315","volume":"21","author":"L Duret","year":"1993","unstructured":"Duret L, Dorkeld F, Gautier C: Strong conservation of non-coding sequences during vertebrates evolution: potential involvement in post-transcriptional regulation of gene expression. Nucleic Acids Res 1993, 21: 2315\u20132322.","journal-title":"Nucleic Acids Res"},{"key":"435_CR5","doi-asserted-by":"publisher","first-page":"136","DOI":"10.1126\/science.288.5463.136","volume":"288","author":"GG Loots","year":"2000","unstructured":"Loots GG, Locksley RM, Blankespoor CM, Wang ZE, Miller W, Rubin EM, Frazer KA: Identification of a coordinate regulator of interleukins 4, 13, and 5 by cross-species sequence comparisons. Science 2000, 288: 136\u2013140. 10.1126\/science.288.5463.136","journal-title":"Science"},{"key":"435_CR6","doi-asserted-by":"publisher","first-page":"181","DOI":"10.1038\/72635","volume":"18","author":"B Goettgens","year":"2000","unstructured":"Goettgens B, Barton L, Gilbert J, Bench A, Sanchez M, Bahn S, Mistry S, Grafham D, McMurray A, Vaudin M, Amaya E, Bentley D, Green A, Sinclair A: Analysis of vertebrate scl loci identifies conserved enhancers. Nat Biotechnol 2000, 18: 181\u2013186. 
10.1038\/72635","journal-title":"Nat Biotechnol"},{"key":"435_CR7","doi-asserted-by":"publisher","first-page":"371","DOI":"10.1093\/hmg\/10.4.371","volume":"10","author":"J Flint","year":"2001","unstructured":"Flint J, Tufarelli C, Peden J, Clark K, Daniels R, Hardison R, Miller W, Philipsen S, Tan-Un K, McMorrow T, Frampton J, Alter B, Frischauf A, Higgs D: Comparative genome analysis delimits a chromosomal domain and identifies key regulatory elements in the alpha globin cluster. Hum Mol Genet 2001, 10: 371\u2013382. 10.1093\/hmg\/10.4.371","journal-title":"Hum Mol Genet"},{"key":"435_CR8","doi-asserted-by":"publisher","first-page":"13","DOI":"10.1186\/1475-4924-2-13","volume":"2","author":"B Lenhard","year":"2003","unstructured":"Lenhard B, Sandelin A, Mendoza L, Engstr\u00f6m P, Jareborg N, Wasserman WW: Identification of conserved regulatory elements by comparative genome analysis. J Biol 2003, 2: 13. 10.1186\/1475-4924-2-13","journal-title":"J Biol"},{"key":"435_CR9","doi-asserted-by":"publisher","first-page":"11","DOI":"10.1186\/1475-4924-2-11","volume":"2","author":"Z Zhang","year":"2003","unstructured":"Zhang Z, Gerstein M: Of mice and men: phylogenetic footprinting aids the discovery of regulatory elements. J Biol 2003, 2: 11. 10.1186\/1475-4924-2-11","journal-title":"J Biol"},{"key":"435_CR10","doi-asserted-by":"publisher","first-page":"5549","DOI":"10.1093\/nar\/gkf669","volume":"30","author":"S Sinha","year":"2002","unstructured":"Sinha S, Tompa M: Discovery of novel transcription factor binding sites by statistical overrepresentation. Nucleic Acids Res 2002, 30: 5549. 10.1093\/nar\/gkf669","journal-title":"Nucleic Acids Res"},{"key":"435_CR11","doi-asserted-by":"publisher","first-page":"3586","DOI":"10.1093\/nar\/gkg618","volume":"31","author":"S Sinha","year":"2003","unstructured":"Sinha S, Tompa M: YMF: A program for discovery of novel transcription factor binding sites by statistical overrepresentation. Nucleic Acids Res 2003, 31: 3586. 
10.1093\/nar\/gkg618","journal-title":"Nucleic Acids Res"},{"key":"435_CR12","doi-asserted-by":"publisher","first-page":"1567","DOI":"10.1101\/gr.158301","volume":"11","author":"K Birnbaum","year":"2001","unstructured":"Birnbaum K, Benfey PN, Shasha DE: cis element\/transcription factor analysis (cis\/TF): a method for discovering transcription factor\/cis element relationships. Genome Research 2001, 11: 1567. 10.1101\/gr.158301","journal-title":"Genome Research"},{"key":"435_CR13","doi-asserted-by":"crossref","first-page":"775","DOI":"10.1101\/gr.9.8.775","volume":"9","author":"TG Wolfsberg","year":"1999","unstructured":"Wolfsberg TG, Gabrielian AE, Campbell MJ, Cho RJ, Spouge JL, Landsman D: Candidate regulatory sequence elements for cell cycle-dependent transcription in Saccharomyces cerevisiae. Genome Research 1999, 9: 775.","journal-title":"Genome Research"},{"issue":"1","key":"435_CR14","doi-asserted-by":"publisher","first-page":"7","DOI":"10.1186\/1471-2105-3-7","volume":"3","author":"M Caselle","year":"2002","unstructured":"Caselle M, Di Cunto F, Provero P: Correlating overrepresented upstream motifs to gene expression: a computational approach to regulatory element discovery in eukaryotes. BMC Bioinformatics 2002, 3(1):7. 10.1186\/1471-2105-3-7","journal-title":"BMC Bioinformatics"},{"issue":"1","key":"435_CR15","doi-asserted-by":"publisher","first-page":"57","DOI":"10.1186\/1471-2105-5-57","volume":"5","author":"D Cora'","year":"2004","unstructured":"Cora' D, Di Cunto F, Provero P, Silengo L, Caselle M: Computational identification of transcription factor binding sites by functional analysis of sets of genes sharing overrepresented upstream motifs. BMC Bioinformatics 2004, 5(1):57. 
10.1186\/1471-2105-5-57","journal-title":"BMC Bioinformatics"},{"issue":"5","key":"435_CR16","doi-asserted-by":"publisher","first-page":"827","DOI":"10.1006\/jmbi.1998.1947","volume":"281","author":"J van Helden","year":"1998","unstructured":"van Helden J, Andre B, Collado-Vides J: Extracting regulatory sites from the upstream region of yeast genes by computational analysis of oligonucleotide frequencies. J Mol Biol 281(5):827\u201342. 1998 Sep 4 10.1006\/jmbi.1998.1947","journal-title":"J Mol Biol"},{"issue":"4","key":"435_CR17","doi-asserted-by":"publisher","first-page":"326","DOI":"10.1093\/bioinformatics\/16.4.326","volume":"16","author":"LJ Jensen","year":"2000","unstructured":"Jensen LJ, Knudsen S: Automatic discovery of regulatory patterns in promoter regions based on whole cell expression data and functional annotation. Bioinformatics 2000, 16(4):326\u201333. 10.1093\/bioinformatics\/16.4.326","journal-title":"Bioinformatics"},{"key":"435_CR18","doi-asserted-by":"publisher","first-page":"41","DOI":"10.1002\/prot.340070105","volume":"7","author":"CE Lawrence","year":"1990","unstructured":"Lawrence CE, Reilly AA: An expectation maximization (EM) algorithm for the identification and characterization of common sites in unaligned biopolymer sequences. Proteins 1990, 7: 41.","journal-title":"Proteins"},{"key":"435_CR19","doi-asserted-by":"publisher","first-page":"447","DOI":"10.1089\/10665270252935566","volume":"9","author":"G Thijs","year":"2002","unstructured":"Thijs G, Marchal K, Lescot M, Rombauts S, De Moor B, Rouze P, Moreau Y: A Gibbs sampling method to detect overrepresented motifs in the upstream regions of coexpressed genes. J Comput Biol 2002, 9: 447. 
10.1089\/10665270252935566","journal-title":"J Comput Biol"},{"key":"435_CR20","doi-asserted-by":"publisher","first-page":"3580","DOI":"10.1093\/nar\/gkg608","volume":"31","author":"W Thompson","year":"2003","unstructured":"Thompson W, Rouchka EC, Lawrence CE: Gibbs Recursive Sampler: finding transcription factor binding sites. Nucleic Acids Res 2003, 31: 3580. 10.1093\/nar\/gkg608","journal-title":"Nucleic Acids Res"},{"key":"435_CR21","doi-asserted-by":"publisher","first-page":"1205","DOI":"10.1006\/jmbi.2000.3519","volume":"296","author":"JD Hughes","year":"2000","unstructured":"Hughes JD, Estep PW, Tavazoie S, Church GM: Computational identification of cis-regulatory elements associated with groups of functionally related genes in Saccharomyces cerevisiae. J Mol Biol 2000, 296: 1205. 10.1006\/jmbi.2000.3519","journal-title":"J Mol Biol"},{"key":"435_CR22","first-page":"W249","volume-title":"Nucleic Acids Res","author":"A Sandelin","year":"2004","unstructured":"Sandelin A, Wasserman WW, Lenhard B: ConSite: web-based prediction of regulatory elements using cross-species comparison. Nucleic Acids Res (32 (Web Server issue)):W249\u201352. 2004, Jul 1"},{"key":"435_CR23","first-page":"348","volume-title":"Pac Symp Biocomput","author":"A Prakash","year":"2004","unstructured":"Prakash A, Blanchette M, Sinha S, Tompa M: Motif discovery in heterogeneous sequence data. Pac Symp Biocomput 2004, 348\u201359."},{"issue":"26","key":"435_CR24","doi-asserted-by":"publisher","first-page":"12146","DOI":"10.1073\/pnas.92.26.12146","volume":"92","author":"K Ohtani","year":"1995","unstructured":"Ohtani K, DeGregori J, Nevins JR: Regulation of the cyclin E gene by transcription factor E2F1. 
PNAS 1995, 92(26):12146\u201350.","journal-title":"PNAS"},{"key":"435_CR25","doi-asserted-by":"publisher","first-page":"374","DOI":"10.1093\/nar\/gkg108","volume":"31","author":"V Matys","year":"2003","unstructured":"Matys V, Fricke E, Geffers R, Gossling E, Haubrock M, Hehl R, Hornischer K, Karas D, Kel AE, Kel-Margoulis OV, Kloos DU, Land S, Lewicki-Potapov B, Michael H, Munch R, Reuter I, Rotert S, Saxel H, Scheer M, Thiele S, Wingender E: TRANSFAC: transcriptional regulation, from patterns to profiles. Nucleic Acids Res 2003, 31: 374. 10.1093\/nar\/gkg108","journal-title":"Nucleic Acids Res"},{"issue":"19","key":"435_CR26","doi-asserted-by":"publisher","first-page":"6679","DOI":"10.1093\/nar\/11.19.6679","volume":"11","author":"DB Sittman","year":"1983","unstructured":"Sittman DB, Graves RA, Marzluff WF: Structure of a cluster of mouse histone genes. Nucleic Acids Res 1983, 11(19):6679\u201397.","journal-title":"Nucleic Acids Res"},{"issue":"4","key":"435_CR27","doi-asserted-by":"publisher","first-page":"1242","DOI":"10.1073\/pnas.88.4.1242","volume":"88","author":"BM Wentworth","year":"1991","unstructured":"Wentworth BM, Donoghue M, Engert JC, Berglund EB, Rosenthal N: Paired MyoD-binding sites regulate myosin light chain gene expression. PNAS 1991, 88(4):1242\u20136.","journal-title":"PNAS"},{"issue":"35","key":"435_CR28","doi-asserted-by":"crossref","first-page":"27013","DOI":"10.1016\/S0021-9258(19)61473-0","volume":"275","author":"Y Wang","year":"2000","unstructured":"Wang Y, Shen J, Arenzana N, Tirasophon W, Kaufman RJ, Prywes R: Activation of ATF6 and an ATF6 DNA binding site by the endoplasmic reticulum stress response. 
J Biol Chem 2000, 275(35):27013\u201320.","journal-title":"J Biol Chem"},{"issue":"3","key":"435_CR29","doi-asserted-by":"publisher","first-page":"2180","DOI":"10.1074\/jbc.M004430200","volume":"276","author":"K Mizugishi","year":"2001","unstructured":"Mizugishi K, Aruga J, Nakata K, Mikoshiba K: Molecular properties of Zic proteins as transcriptional regulators and their relationship to GLI proteins. J Biol Chem 276(3):2180\u20138. 2001 Jan 19 10.1074\/jbc.M004430200","journal-title":"J Biol Chem"},{"key":"435_CR30","doi-asserted-by":"publisher","first-page":"1723","DOI":"10.1101\/gr.301202","volume":"12","author":"P Sudarsanam","year":"2002","unstructured":"Sudarsanam P, Pilpel Y, Church GM: Genome-wide cooccurrence of promoter elements reveals a cis-regulatory cassette of rRNA transcription motifs in S. cerevisiae. Genome Research 2002, 12: 1723. 10.1101\/gr.301202","journal-title":"Genome Research"},{"key":"435_CR31","doi-asserted-by":"publisher","first-page":"R43","DOI":"10.1186\/gb-2003-4-7-r43","volume":"4","author":"DY Chiang","year":"2003","unstructured":"Chiang DY, Moses AM, Kellis M, Lander ES, Eisen MB: Phylogenetically and spatially conserved word pairs associated with gene-expression changes in yeasts. Genome Biology 2003, 4: R43. 10.1186\/gb-2003-4-7-r43","journal-title":"Genome Biology"},{"issue":"Suppl 2","key":"435_CR32","doi-asserted-by":"publisher","first-page":"S84","DOI":"10.1093\/bioinformatics\/18.suppl_2.S84","volume":"18","author":"C Dieterich","year":"2002","unstructured":"Dieterich C, Cusack B, Wang H, Rateitschak K, Krause A, Vingron M: Annotating regulatory DNA based on man-mouse genomic comparison. 
Bioinformatics 2002, 18(Suppl 2):S84.","journal-title":"Bioinformatics"},{"key":"435_CR33","doi-asserted-by":"publisher","first-page":"723","DOI":"10.1016\/0022-2836(87)90478-5","volume":"197","author":"MS Waterman","year":"1987","unstructured":"Waterman MS, Eggert M: A new algorithm for best subsequence alignments with application to tRNA-rRNA comparison. J Mol Biol 1987, 197: 723\u2013728. 10.1016\/0022-2836(87)90478-5","journal-title":"J Mol Biol"},{"key":"435_CR34","doi-asserted-by":"publisher","first-page":"367","DOI":"10.1214\/ss\/1177010382","volume":"9","author":"MS Waterman","year":"1994","unstructured":"Waterman MS, Vingron M: Sequence comparison significance and Poisson approximation. Statistical Science 1994, 9: 367\u2013381.","journal-title":"Statistical Science"},{"key":"435_CR35","doi-asserted-by":"publisher","first-page":"55","DOI":"10.1093\/nar\/gkg007","volume":"31","author":"C Dieterich","year":"2003","unstructured":"Dieterich C, Wang H, Rateitschak K, Luz H, Vingron M: CORG: a database for Comparative Regulatory Genomics. Nucleic Acids Res 2003, 31: 55\u201357. 10.1093\/nar\/gkg007","journal-title":"Nucleic Acids Res"},{"key":"435_CR36","doi-asserted-by":"publisher","first-page":"25","DOI":"10.1038\/75556","volume":"25","author":"The Gene Ontology Consortium","year":"2000","unstructured":"The Gene Ontology Consortium: Gene Ontology: tool for the unification of biology. Nature Genetics 2000, 25: 25\u201329. 10.1038\/75556","journal-title":"Nature Genetics"},{"issue":"6","key":"435_CR37","doi-asserted-by":"publisher","first-page":"1977","DOI":"10.1091\/mbc.02-02-0030.","volume":"13","author":"ML Whitfield","year":"2002","unstructured":"Whitfield ML, Sherlock G, Saldanha AJ, Murray JI, Ball CA, Alexander KE, Matese JC, Perou CM, Hurt MM, Brown PO, Botstein D: Identification of genes periodically expressed in the human cell cycle and their expression in tumors. Mol Biol Cell 2002, 13(6):1977\u20132000. 
10.1091\/mbc.02-02-0030.","journal-title":"Mol Biol Cell"},{"key":"435_CR38","doi-asserted-by":"crossref","first-page":"289","DOI":"10.1111\/j.2517-6161.1995.tb02031.x","volume":"B57","author":"Y Benjamini","year":"1995","unstructured":"Benjamini Y, Hochberg Y: Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Statist Soc 1995, B57: 289.","journal-title":"J R Statist Soc"},{"key":"435_CR39","doi-asserted-by":"publisher","first-page":"1183","DOI":"10.1073\/pnas.86.4.1183","volume":"86","author":"G Stormo","year":"1989","unstructured":"Stormo G, Hartzell GW: Identifying protein-binding sites from unaligned DNA fragments. PNAS 1989, 86: 1183\u20131187.","journal-title":"PNAS"},{"issue":"8","key":"435_CR40","doi-asserted-by":"publisher","first-page":"1808","DOI":"10.1093\/nar\/28.8.1808","volume":"28","author":"J van Helden","year":"2000","unstructured":"van Helden J, Rios AF, Collado-Vides J: Discovering regulatory elements in non-coding sequences by analysis of spaced dyads. Nucleic Acids Res 2000, 28(8):1808\u201318. 
10.1093\/nar\/28.8.1808","journal-title":"Nucleic Acids Res"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-6-110.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/1471-2105-6-110\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-6-110.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,10,7]],"date-time":"2024-10-07T12:11:33Z","timestamp":1728303093000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-6-110"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2005,5,2]]},"references-count":40,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2005,12]]}},"alternative-id":["435"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-6-110","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2005,5,2]]},"assertion":[{"value":"2 December 2004","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"2 May 2005","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"2 May 2005","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"110"}}