{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,3,11]],"date-time":"2024-03-11T17:10:07Z","timestamp":1710177007448},"reference-count":53,"publisher":"Oxford University Press (OUP)","issue":"16","license":[{"start":{"date-parts":[[2016,10,2]],"date-time":"2016-10-02T00:00:00Z","timestamp":1475366400000},"content-version":"vor","delay-in-days":2760,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/2.0\/uk\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2009,8,15]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Motivation: Statistical assessment of cis-regulatory modules (CRMs) is a crucial task in computational biology. Usually, one concludes from exceptional co-occurrences of DNA motifs that the corresponding transcription factors (TFs) are cooperative. However, similar DNA motifs tend to co-occur in random sequences due to the high probability of overlapping occurrences. Therefore, it is important to consider similarity of DNA motifs in the statistical assessment.<\/jats:p><jats:p>Results: Based on previous work, we propose to adjust the window size for co-occurrence detection. Using the derived approximation, one obtains different window sizes for different sets of DNA motifs depending on their similarities. This ensures that the probabilities of co-occurrences in random sequences are equal. Applying the approach to selected similar and dissimilar DNA motifs from human TFs shows the necessity of adjustment and confirms the accuracy of the approximation by comparison to simulated data. Furthermore, it becomes clear that approaches ignoring similarities strongly underestimate P-values for cooperativity of TFs with similar DNA motifs. In addition, the approach is extended to deal with overlapping windows. We derive Chen\u2013Stein error bounds for the approximation. 
Comparing the error bounds for similar and dissimilar DNA motifs shows that the approximation for similar DNA motifs yields large bounds. Hence, one has to be careful when using overlapping windows. Based on the error bounds, one can precompute the approximation errors and select an appropriate overlap scheme before running the analysis.<\/jats:p><jats:p>Availability: Software to perform the calculation for pairs of position frequency matrices (PFMs) is available at http:\/\/mosta.molgen.mpg.de as well as C++ source code for downloading.<\/jats:p><jats:p>Contact: utz.pape@molgen.mpg.de<\/jats:p>","DOI":"10.1093\/bioinformatics\/btp143","type":"journal-article","created":{"date-parts":[[2009,3,14]],"date-time":"2009-03-14T00:49:45Z","timestamp":1236991785000},"page":"2103-2109","source":"Crossref","is-referenced-by-count":8,"title":["Statistical detection of cooperative transcription factors with similarity adjustment"],"prefix":"10.1093","volume":"25","author":[{"given":"Utz J.","family":"Pape","sequence":"first","affiliation":[{"name":"1 Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Ihnestr. 73 and 2 Mathematics and Computer Science, Free University of Berlin, Takustr. 9, 14195 Berlin, Germany"}]},{"given":"Holger","family":"Klein","sequence":"additional","affiliation":[{"name":"1 Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Ihnestr. 73 and 2 Mathematics and Computer Science, Free University of Berlin, Takustr. 9, 14195 Berlin, Germany"}]},{"given":"Martin","family":"Vingron","sequence":"additional","affiliation":[{"name":"1 Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Ihnestr. 73 and 2 Mathematics and Computer Science, Free University of Berlin, Takustr. 
9, 14195 Berlin, Germany"}]}],"member":"286","published-online":{"date-parts":[[2009,3,13]]},"reference":[{"issue":"Suppl. 2","key":"2023013112092348100_B1","doi-asserted-by":"crossref","first-page":"ii5","DOI":"10.1093\/bioinformatics\/btg1052","article-title":"Computational detection of cis-regulatory modules","volume":"19","author":"Aerts","year":"2003","journal-title":"Bioinformatics"},{"key":"2023013112092348100_B2","doi-asserted-by":"crossref","first-page":"333","DOI":"10.1145\/360825.360855","article-title":"Efficient string matching","volume":"18","author":"Aho","year":"1975","journal-title":"CACM"},{"key":"2023013112092348100_B3","doi-asserted-by":"crossref","first-page":"1851","DOI":"10.1242\/dev.124.10.1851","article-title":"The hardwiring of development: organization and function of genomic regulatory systems","volume":"124","author":"Arnone","year":"1997","journal-title":"Development"},{"key":"2023013112092348100_B4","first-page":"403","article-title":"Poisson approximation and the Chen-Stein method","volume":"5","author":"Arratia","year":"1990","journal-title":"Stat. 
Sci."},{"key":"2023013112092348100_B5","doi-asserted-by":"crossref","first-page":"II16","DOI":"10.1093\/bioinformatics\/btg1054","article-title":"Searching for statistically significant regulatory modules","volume":"19","author":"Bailey","year":"2003","journal-title":"Bioinformatics"},{"key":"2023013112092348100_B6","doi-asserted-by":"crossref","DOI":"10.1093\/oso\/9780198522355.001.0001","volume-title":"Poisson Approximation.","author":"Barbour","year":"1992"},{"key":"2023013112092348100_B7","doi-asserted-by":"crossref","first-page":"R61","DOI":"10.1186\/gb-2004-5-9-r61","article-title":"Computational identification of developmental enhancers: conservation and function of transcription factor binding-site clusters in Drosophila melanogaster and Drosophila pseudoobscura","volume":"5","author":"Berman","year":"2004","journal-title":"Genome Biol."},{"key":"2023013112092348100_B8","doi-asserted-by":"crossref","first-page":"757","DOI":"10.1073\/pnas.231608898","article-title":"Exploiting transcription factor binding site clustering to identify cis-regulatory modules involved in pattern formation in the Drosophila genome","volume":"99","author":"Berman","year":"2002","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023013112092348100_B9","doi-asserted-by":"crossref","first-page":"R83","DOI":"10.1186\/gb-2007-8-5-r83","article-title":"A distance difference matrix approach to identifying transcription factors that regulate differential gene expression","volume":"8","author":"Bleser","year":"2007","journal-title":"Genome Biol."},{"key":"2023013112092348100_B10","doi-asserted-by":"crossref","first-page":"13","DOI":"10.1186\/1748-7188-2-13","article-title":"Exact p-value calculation for heterotypic clusters of regulatory motifs and its application in computational annotation of cis-regulatory modules","volume":"2","author":"Boeva","year":"2007","journal-title":"Algorithms Mol. 
Biol."},{"key":"2023013112092348100_B11","doi-asserted-by":"crossref","first-page":"86","DOI":"10.1006\/dbio.2002.0619","article-title":"New computational approaches for analysis of cis-regulatory networks","volume":"246","author":"Brown","year":"2002","journal-title":"Dev. Biol."},{"key":"2023013112092348100_B12","doi-asserted-by":"crossref","first-page":"223","DOI":"10.1016\/S0021-9258(18)55924-X","article-title":"The composition of the deoxyribonucleic acid of salmon sperm","volume":"192","author":"Chargaff","year":"1951","journal-title":"J. Biol. Chem."},{"key":"2023013112092348100_B13","doi-asserted-by":"crossref","first-page":"534","DOI":"10.1214\/aop\/1176996359","article-title":"Poisson approximation for dependent trials","volume":"3","author":"Chen","year":"1975","journal-title":"Ann. Probab."},{"key":"2023013112092348100_B14","doi-asserted-by":"crossref","first-page":"849","DOI":"10.1038\/nature02189","article-title":"A self-organizing system of repressor gradients establishes segmental complexity in Drosophila","volume":"426","author":"Clyde","year":"2003","journal-title":"Nature"},{"key":"2023013112092348100_B15","doi-asserted-by":"crossref","first-page":"1188","DOI":"10.1101\/gr.849004","article-title":"Weblogo: a sequence logo generator","volume":"14","author":"Crooks","year":"2004","journal-title":"Genome Res."},{"key":"2023013112092348100_B16","doi-asserted-by":"crossref","first-page":"8","DOI":"10.1006\/jmbi.1997.0965","article-title":"A statistical model for locating regulatory regions in genomic DNA","volume":"268","author":"Crowley","year":"1997","journal-title":"J. Mol. 
Biol."},{"key":"2023013112092348100_B17","doi-asserted-by":"crossref","first-page":"GC19","DOI":"10.1016\/0378-1119(95)00888-8","article-title":"Coordinate positioning of MEF2 and myogenin binding sites","volume":"172","author":"Fickett","year":"1996","journal-title":"Gene"},{"key":"2023013112092348100_B18","doi-asserted-by":"crossref","first-page":"3666","DOI":"10.1093\/nar\/gkg540","article-title":"Cluster-buster: finding dense clusters of motifs in DNA sequences","volume":"31","author":"Frith","year":"2003","journal-title":"Nucleic Acids Res."},{"key":"2023013112092348100_B19","doi-asserted-by":"crossref","first-page":"878","DOI":"10.1093\/bioinformatics\/17.10.878","article-title":"Detection of cis-element clusters in higher eukaryotic DNA","volume":"17","author":"Frith","year":"2001","journal-title":"Bioinformatics"},{"key":"2023013112092348100_B20","doi-asserted-by":"crossref","first-page":"3214","DOI":"10.1093\/nar\/gkf438","article-title":"Statistical significance of clusters of motifs represented by position specific scoring matrices in nucleotide sequences","volume":"30","author":"Frith","year":"2002","journal-title":"Nucleic Acids Res."},{"key":"2023013112092348100_B21","doi-asserted-by":"crossref","first-page":"1372","DOI":"10.1093\/nar\/gkh299","article-title":"Detection of functional DNA motifs via statistical over-representation","volume":"32","author":"Frith","year":"2004","journal-title":"Nucleic Acids Res."},{"key":"2023013112092348100_B22","doi-asserted-by":"crossref","first-page":"3585","DOI":"10.1093\/nar\/gkl372","article-title":"Computational identification of transcriptional regulatory elements in DNA sequence","volume":"34","author":"GuhaThakurta","year":"2006","journal-title":"Nucleic Acids Res."},{"key":"2023013112092348100_B23","doi-asserted-by":"crossref","first-page":"608","DOI":"10.1093\/bioinformatics\/17.7.608","article-title":"Identifying target sites for cooperatively binding 
factors","volume":"17","author":"GuhaThakurta","year":"2001","journal-title":"Bioinformatics"},{"key":"2023013112092348100_B24","doi-asserted-by":"crossref","first-page":"7079","DOI":"10.1073\/pnas.0408743102","article-title":"De novo cis-regulatory module elicitation for eukaryotic genomes","volume":"102","author":"Gupta","year":"2005","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023013112092348100_B25","doi-asserted-by":"crossref","first-page":"4278","DOI":"10.1093\/nar\/gkf535","article-title":"Predicting transcription factor synergism","volume":"30","author":"Hannenhalli","year":"2002","journal-title":"Nucleic Acids Res."},{"key":"2023013112092348100_B26","doi-asserted-by":"crossref","first-page":"99","DOI":"10.1038\/nature02800","article-title":"Transcriptional regulatory code of a eukaryotic genome","volume":"431","author":"Harbison","year":"2004","journal-title":"Nature"},{"key":"2023013112092348100_B27","first-page":"109","article-title":"Using transcription factor binding site co-occurrence to predict regulatory regions","volume":"18","author":"Klein","year":"2007","journal-title":"Genome Inform."},{"key":"2023013112092348100_B28","doi-asserted-by":"crossref","first-page":"180","DOI":"10.1093\/bioinformatics\/15.3.180","article-title":"Functional promoter modules can be detected by formal models independent of overall nucleotide sequence similarity","volume":"15","author":"Klingenhoff","year":"1999","journal-title":"Bioinformatics"},{"key":"2023013112092348100_B29","doi-asserted-by":"crossref","first-page":"413","DOI":"10.1142\/S021972000400065X","article-title":"Searching for transcription factor binding site clusters: how true are true positives?","volume":"2","author":"Krivan","year":"2004","journal-title":"J. Bioinform. Comput. 
Biol."},{"key":"2023013112092348100_B30","doi-asserted-by":"crossref","first-page":"579","DOI":"10.1101\/gr.668403","article-title":"Uniform clusters in Drosophila","volume":"13","author":"Lifanov","year":"2003","journal-title":"Genome Res."},{"key":"2023013112092348100_B31","article-title":"Detecting functional modules of transcription factor binding sites in the human genome","volume-title":"Lecture Notes in Computer Science.","author":"Manke","year":"2005"},{"key":"2023013112092348100_B32","doi-asserted-by":"crossref","first-page":"763","DOI":"10.1073\/pnas.012591199","article-title":"Genome-wide analysis of clustered dorsal binding sites identifies putative target genes in the Drosophila embryo","volume":"99","author":"Markstein","year":"2002","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023013112092348100_B33","doi-asserted-by":"crossref","first-page":"374","DOI":"10.1093\/nar\/gkg108","article-title":"TRANSFAC(R): transcriptional regulation, from patterns to profiles","volume":"31","author":"Matys","year":"2003","journal-title":"Nucleic Acids Res."},{"issue":"Suppl. 
1","key":"2023013112092348100_B34","doi-asserted-by":"crossref","first-page":"D108","DOI":"10.1093\/nar\/gkj143","article-title":"Transfac(r) and its module transcompel(r): transcriptional gene regulation in eukaryotes","volume":"34","author":"Matys","year":"2006","journal-title":"Nucleic Acids Res."},{"key":"2023013112092348100_B35","doi-asserted-by":"crossref","first-page":"1032","DOI":"10.1093\/bioinformatics\/btm047","article-title":"Clusterdraw web server: a tool to identify and visualize clusters of binding motifs for transcription factors","volume":"23","author":"Papatsenko","year":"2007","journal-title":"Bioinformatics"},{"key":"2023013112092348100_B36","doi-asserted-by":"crossref","first-page":"470","DOI":"10.1101\/gr.212502","article-title":"Extraction of functional binding sites from unique regulatory regions: the Drosophila early developmental enhancers","volume":"12","author":"Papatsenko","year":"2002","journal-title":"Genome Res."},{"key":"2023013112092348100_B37","volume-title":"Statistics for Transcription Factor Binding Sites.","author":"Pape","year":"2008"},{"key":"2023013112092348100_B38","article-title":"Statistics for co-occurrence of DNA motifs","volume-title":"Proceedings of the 4th International Workshop on Applied Probability.","author":"Pape","year":"2008"},{"key":"2023013112092348100_B39","first-page":"134","article-title":"A new statistical model to select target sequences bound by transcription factors","volume":"17","author":"Pape","year":"2006","journal-title":"Genome Inform."},{"key":"2023013112092348100_B40","doi-asserted-by":"crossref","first-page":"547","DOI":"10.1089\/cmb.2007.0084","article-title":"Compound Poisson approximation of number of occurrences of a position frequency matrix (PFM) on both strands","volume":"15","author":"Pape","year":"2008","journal-title":"J. Comput. 
Biol."},{"key":"2023013112092348100_B41","doi-asserted-by":"crossref","first-page":"350","DOI":"10.1093\/bioinformatics\/btm610","article-title":"Natural similarity measures between position frequency matrices with an application to clustering","volume":"24","author":"Pape","year":"2008","journal-title":"Bioinformatics"},{"key":"2023013112092348100_B42","doi-asserted-by":"crossref","first-page":"153","DOI":"10.1038\/ng724","article-title":"Identifying regulatory networks by combinatorial analysis of promoter elements","volume":"29","author":"Pilpel","year":"2001","journal-title":"Nat. Genet."},{"key":"2023013112092348100_B43","doi-asserted-by":"crossref","first-page":"151","DOI":"10.1007\/978-3-540-39763-2_12","article-title":"Dynamic programming algorithms for two statistical problems in computational biology","volume-title":"Proceedings of the 3rd Workshop of Algorithms in Bioinformatics (WABI).","author":"Rahmann","year":"2003"},{"key":"2023013112092348100_B44","doi-asserted-by":"crossref","first-page":"9888","DOI":"10.1073\/pnas.152320899","article-title":"Score: a computational approach to the identification of cis-regulatory modules and target genes in whole-genome sequence data. site clustering over random expectation","volume":"99","author":"Rebeiz","year":"2002","journal-title":"Proc. Natl Acad. Sci. 
USA"},{"key":"2023013112092348100_B45","doi-asserted-by":"crossref","first-page":"3589","DOI":"10.1093\/nar\/gkg544","article-title":"Target explorer: an automated tool for the identification of new target genes for a specified set of transcription factors","volume":"31","author":"Sosinsky","year":"2003","journal-title":"Nucleic Acids Res."},{"key":"2023013112092348100_B46","doi-asserted-by":"crossref","first-page":"16","DOI":"10.1093\/bioinformatics\/16.1.16","article-title":"DNA binding sites: representation and discovery","volume":"16","author":"Stormo","year":"2000","journal-title":"Bioinformatics"},{"key":"2023013112092348100_B47","doi-asserted-by":"crossref","first-page":"3594","DOI":"10.1093\/nar\/25.18.3594","article-title":"A computational genomics approach to the identification of gene networks","volume":"25","author":"Wagner","year":"1997","journal-title":"Nucleic Acids Res."},{"key":"2023013112092348100_B48","doi-asserted-by":"crossref","first-page":"776","DOI":"10.1093\/bioinformatics\/15.10.776","article-title":"Genes regulated cooperatively by one or more transcription factors and their identification in whole eukaryotic genomes","volume":"15","author":"Wagner","year":"1999","journal-title":"Bioinformatics"},{"key":"2023013112092348100_B49","doi-asserted-by":"crossref","first-page":"167","DOI":"10.1006\/jmbi.1998.1700","article-title":"Identification of regulatory regions which confer muscle-specific gene expression","volume":"278","author":"Wasserman","year":"1998","journal-title":"J. Mol. 
Biol."},{"key":"2023013112092348100_B50","doi-asserted-by":"crossref","first-page":"643","DOI":"10.1046\/j.1365-2443.1999.00291.x","article-title":"Long range interaction of cis-DNA elements mediated by architectural transcription factor bach1","volume":"4","author":"Yoshida","year":"1999","journal-title":"Genes Cells"},{"key":"2023013112092348100_B51","doi-asserted-by":"crossref","first-page":"4925","DOI":"10.1093\/nar\/gkl595","article-title":"Computational analysis of tissue-specific combinatorial gene regulation: predicting interaction between transcription factors in human tissues","volume":"34","author":"Yu","year":"2006","journal-title":"Nucleic Acids Res."},{"key":"2023013112092348100_B52","doi-asserted-by":"crossref","first-page":"1896","DOI":"10.1126\/science.279.5358.1896","article-title":"Genomic cis-regulatory logic: experimental and computational analysis of a sea urchin gene","volume":"279","author":"Yuh","year":"1998","journal-title":"Science"},{"key":"2023013112092348100_B53","doi-asserted-by":"crossref","first-page":"12114","DOI":"10.1073\/pnas.0402858101","article-title":"CisModule: de novo discovery of cis-regulatory modules by hierarchical mixture modeling","volume":"101","author":"Zhou","year":"2004","journal-title":"Proc. Natl Acad. 
Sci. USA"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/25\/16\/2103\/48992902\/bioinformatics_25_16_2103.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/25\/16\/2103\/48992902\/bioinformatics_25_16_2103.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,3,11]],"date-time":"2024-03-11T16:51:26Z","timestamp":1710175886000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/25\/16\/2103\/203951"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2009,3,13]]},"references-count":53,"journal-issue":{"issue":"16","published-print":{"date-parts":[[2009,8,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btp143","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2009,8,15]]},"published":{"date-parts":[[2009,3,13]]}}}