{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,3,19]],"date-time":"2025-03-19T16:53:40Z","timestamp":1742403220350},"reference-count":33,"publisher":"Oxford University Press (OUP)","issue":"4","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2011,2,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: The bindings between transcription factors (TFs) and transcription factor binding sites (TFBSs) are fundamental protein\u2013DNA interactions in transcriptional regulation. Extensive efforts have been made to better understand the protein\u2013DNA interactions. Recent mining on exact TF\u2013TFBS-associated sequence patterns (rules) has shown great potentials and achieved very promising results. However, exact rules cannot handle variations in real data, resulting in limited informative rules. In this article, we generalize the exact rules to approximate ones for both TFs and TFBSs, which are essential for biological variations.<\/jats:p>\n               <jats:p>Results: A progressive approach is proposed to address the approximation to alleviate the computational requirements. Firstly, similar TFBSs are grouped from the available TF\u2013TFBS data (TRANSFAC database). Secondly, approximate and highly conserved binding cores are discovered from TF sequences corresponding to each TFBS group. A customized algorithm is developed for the specific objective. We discover the approximate TF\u2013TFBS rules by associating the grouped TFBS consensuses and TF cores. The rules discovered are evaluated by matching (verifying with) the actual protein\u2013DNA binding pairs from Protein Data Bank (PDB) 3D structures. The approximate results exhibit many more verified rules and up to 300% better verification ratios than the exact ones. The customized algorithm achieves over 73% better verification ratios than traditional methods. Approximate rules (64\u201379%) are shown statistically significant. Detailed variation analysis and conservation verification on NCBI records demonstrate that the approximate rules reveal both the flexible and specific protein\u2013DNA interactions accurately. The approximate TF\u2013TFBS rules discovered show great generalized capability of exploring more informative binding rules.<\/jats:p>\n               <jats:p>Availability: \u00a0Supplementary Data are available on Bioinformatics online and http:\/\/www.cse.cuhk.edu.hk\/.<\/jats:p>\n               <jats:p>Contact: \u00a0tmchan@cse.cuhk.edu.hk<\/jats:p>","DOI":"10.1093\/bioinformatics\/btq682","type":"journal-article","created":{"date-parts":[[2010,12,31]],"date-time":"2010-12-31T01:59:43Z","timestamp":1293760783000},"page":"471-478","source":"Crossref","is-referenced-by-count":14,"title":["Discovering approximate-associated sequence patterns for protein\u2013DNA interactions"],"prefix":"10.1093","volume":"27","author":[{"given":"Tak-Ming","family":"Chan","sequence":"first","affiliation":[{"name":"1 Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, N. T., Hong Kong, 2Mathematical and Computer Sciences and Engineering Division, King Abdullah University of Science and Technology, Jeddah, KSA, 3School of Biomedical Sciences, The Chinese University of Hong Kong and 4Hong Kong Bioinformatics Centre, Shatin, N. T., Hong Kong"}]},{"given":"Ka-Chun","family":"Wong","sequence":"additional","affiliation":[{"name":"1 Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, N. T., Hong Kong, 2Mathematical and Computer Sciences and Engineering Division, King Abdullah University of Science and Technology, Jeddah, KSA, 3School of Biomedical Sciences, The Chinese University of Hong Kong and 4Hong Kong Bioinformatics Centre, Shatin, N. T., Hong Kong"},{"name":"1 Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, N. T., Hong Kong, 2Mathematical and Computer Sciences and Engineering Division, King Abdullah University of Science and Technology, Jeddah, KSA, 3School of Biomedical Sciences, The Chinese University of Hong Kong and 4Hong Kong Bioinformatics Centre, Shatin, N. T., Hong Kong"}]},{"given":"Kin-Hong","family":"Lee","sequence":"additional","affiliation":[{"name":"1 Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, N. T., Hong Kong, 2Mathematical and Computer Sciences and Engineering Division, King Abdullah University of Science and Technology, Jeddah, KSA, 3School of Biomedical Sciences, The Chinese University of Hong Kong and 4Hong Kong Bioinformatics Centre, Shatin, N. T., Hong Kong"}]},{"given":"Man-Hon","family":"Wong","sequence":"additional","affiliation":[{"name":"1 Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, N. T., Hong Kong, 2Mathematical and Computer Sciences and Engineering Division, King Abdullah University of Science and Technology, Jeddah, KSA, 3School of Biomedical Sciences, The Chinese University of Hong Kong and 4Hong Kong Bioinformatics Centre, Shatin, N. T., Hong Kong"}]},{"given":"Chi-Kong","family":"Lau","sequence":"additional","affiliation":[{"name":"1 Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, N. T., Hong Kong, 2Mathematical and Computer Sciences and Engineering Division, King Abdullah University of Science and Technology, Jeddah, KSA, 3School of Biomedical Sciences, The Chinese University of Hong Kong and 4Hong Kong Bioinformatics Centre, Shatin, N. T., Hong Kong"}]},{"given":"Stephen Kwok-Wing","family":"Tsui","sequence":"additional","affiliation":[{"name":"1 Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, N. T., Hong Kong, 2Mathematical and Computer Sciences and Engineering Division, King Abdullah University of Science and Technology, Jeddah, KSA, 3School of Biomedical Sciences, The Chinese University of Hong Kong and 4Hong Kong Bioinformatics Centre, Shatin, N. T., Hong Kong"},{"name":"1 Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, N. T., Hong Kong, 2Mathematical and Computer Sciences and Engineering Division, King Abdullah University of Science and Technology, Jeddah, KSA, 3School of Biomedical Sciences, The Chinese University of Hong Kong and 4Hong Kong Bioinformatics Centre, Shatin, N. T., Hong Kong"}]},{"given":"Kwong-Sak","family":"Leung","sequence":"additional","affiliation":[{"name":"1 Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, N. T., Hong Kong, 2Mathematical and Computer Sciences and Engineering Division, King Abdullah University of Science and Technology, Jeddah, KSA, 3School of Biomedical Sciences, The Chinese University of Hong Kong and 4Hong Kong Bioinformatics Centre, Shatin, N. T., Hong Kong"}]}],"member":"286","published-online":{"date-parts":[[2010,12,30]]},"reference":[{"key":"2023012511573551500_B1","doi-asserted-by":"crossref","first-page":"477","DOI":"10.1093\/bioinformatics\/btg432","article-title":"Analysis and prediction of dna-binding proteins and their binding residues based on composition, sequence and structural information","volume":"20","author":"Ahmad","year":"2004","journal-title":"Bioinformatics"},{"key":"2023012511573551500_B2","doi-asserted-by":"crossref","first-page":"5922","DOI":"10.1093\/nar\/gkn573","article-title":"Protein-DNA interactions: structural, thermodynamic and clustering patterns of conserved residues in DNA-binding proteins","volume":"36","author":"Ahmad","year":"2008","journal-title":"Nucleic Acids Res."},{"key":"2023012511573551500_B3","first-page":"28","article-title":"Fitting a mixture model by expectation maximization to discover motifs in biopolymers","volume-title":"Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology","author":"Bailey","year":"1994"},{"key":"2023012511573551500_B4","doi-asserted-by":"crossref","first-page":"D138","DOI":"10.1093\/nar\/gkh121","article-title":"The pfam protein families database","volume":"32","author":"Bateman","year":"2004","journal-title":"Nucleic Acids Res."},{"key":"2023012511573551500_B5","doi-asserted-by":"crossref","first-page":"235","DOI":"10.1093\/nar\/28.1.235","article-title":"The protein data bank","volume":"28","author":"Berman","year":"2000","journal-title":"Nucleic Acids Res."},{"key":"2023012511573551500_B6","doi-asserted-by":"crossref","first-page":"19","DOI":"10.1186\/1471-2105-9-19","article-title":"Nestedmica as an ab initio protein motif discovery tool","volume":"9","author":"Do\u011fruel","year":"2008","journal-title":"BMC Bioinformatics"},{"key":"2023012511573551500_B7","doi-asserted-by":"crossref","first-page":"595","DOI":"10.1146\/annurev.bi.53.070184.003115","article-title":"Three-dimensional structure of membrane and surface proteins","volume":"53","author":"Eisenberg","year":"1984","journal-title":"Annu. Rev. Biochem."},{"key":"2023012511573551500_B8","doi-asserted-by":"crossref","first-page":"3157","DOI":"10.1093\/nar\/5.9.3157","article-title":"DNAse footprinting: a simple method for the detection of protein-DNA binding specificity","volume":"5","author":"Galas","year":"1987","journal-title":"Nucleic Acids Res."},{"key":"2023012511573551500_B9","doi-asserted-by":"crossref","first-page":"3047","DOI":"10.1093\/nar\/9.13.3047","article-title":"A gel electrophoresis method for quantifying the binding of proteins to specific DNA regions: application to components of the escherichia coli lactose operon regulatory system","volume":"9","author":"Garner","year":"1981","journal-title":"Nucleic Acids Res."},{"issue":"Suppl. 1","key":"2023012511573551500_B10","first-page":"D245","article-title":"The 20 years of prosite","volume":"36","author":"Hulo","year":"2008","journal-title":"Nucleic Acids Res."},{"key":"2023012511573551500_B11","doi-asserted-by":"crossref","first-page":"1557","DOI":"10.1093\/bioinformatics\/bth127","article-title":"BioOptimizer: a Bayesian scoring function approach to motif discovery","volume":"20","author":"Jensen","year":"2004","journal-title":"Bioinformatics"},{"key":"2023012511573551500_B12","doi-asserted-by":"crossref","first-page":"188","DOI":"10.1214\/088342304000000107","article-title":"Computational discovery of gene regulatory binding motifs: a bayesian perspective","volume":"19","author":"Jensen","year":"2004","journal-title":"Stat. Sci."},{"key":"2023012511573551500_B13","doi-asserted-by":"crossref","first-page":"877","DOI":"10.1006\/jmbi.1999.2659","article-title":"Protein-dna interactions: a structural analysis","volume":"287","author":"Jones","year":"1999","journal-title":"J. Mol. Biol."},{"key":"2023012511573551500_B14","doi-asserted-by":"crossref","first-page":"7189","DOI":"10.1093\/nar\/gkg922","article-title":"Using electrostatic potentials to predict dna-binding sites on dna-binding proteins","volume":"31","author":"Jones","year":"2003","journal-title":"Nucleic Acids Res."},{"key":"2023012511573551500_B15","doi-asserted-by":"crossref","first-page":"532","DOI":"10.1093\/nar\/gkg161","article-title":"Structural classification of zinc fingers: survey and summary","volume":"31","author":"Krishna","year":"2003","journal-title":"Nucleic Acids Res."},{"key":"2023012511573551500_B16","doi-asserted-by":"crossref","first-page":"6324","DOI":"10.1093\/nar\/gkq500","article-title":"Discovering protein-DNA binding sequence patterns using association rule mining","volume":"38","author":"Leung","year":"2010","journal-title":"Nucleic Acids Res"},{"key":"2023012511573551500_B17","doi-asserted-by":"crossref","first-page":"73","DOI":"10.1006\/jcss.2002.1823","article-title":"Finding similar regions in many sequences","volume":"65","author":"Li","year":"2002","journal-title":"J. Comput. Syst. Sci."},{"key":"2023012511573551500_B18","doi-asserted-by":"crossref","first-page":"1658","DOI":"10.1093\/bioinformatics\/btl158","article-title":"CD-HIT: a fast program for clustering and comparing large sets of protein or nucleotide sequences","volume":"22","author":"Li","year":"2006","journal-title":"Bioinformatics"},{"key":"2023012511573551500_B19","doi-asserted-by":"crossref","first-page":"991","DOI":"10.1016\/S0022-2836(02)00571-5","article-title":"Protein-DNA interactions: amino acid conservation and the effects of mutations on binding specificity","volume":"320","author":"Luscombe","year":"2002","journal-title":"J. Mol. Biol."},{"key":"2023012511573551500_B20","doi-asserted-by":"crossref","first-page":"REVIEWS001","DOI":"10.1186\/gb-2000-1-1-reviews001","article-title":"An overview of the structures of protein-dna complexes","volume":"1","author":"Luscombe","year":"2000","journal-title":"Genome Biol."},{"key":"2023012511573551500_B21","doi-asserted-by":"crossref","first-page":"2860","DOI":"10.1093\/nar\/29.13.2860","article-title":"Amino acid-base interactions: a three-dimensional analysis of protein-dna interactions at an atomic level","volume":"29","author":"Luscombe","year":"2001","journal-title":"Nucleic Acids Res."},{"key":"2023012511573551500_B22","doi-asserted-by":"crossref","first-page":"e36","DOI":"10.1371\/journal.pcbi.0020036","article-title":"Practical strategies for discovering regulatory dna sequence motifs","volume":"2","author":"MacIsaac","year":"2006","journal-title":"PLoS Comput. Biol."},{"key":"2023012511573551500_B23","doi-asserted-by":"crossref","first-page":"2306","DOI":"10.1093\/nar\/26.10.2306","article-title":"Quantitative parameters for amino acid-base interaction: implications for prediction of protein-dna binding sites","volume":"26","author":"Mandel-Gutfreund","year":"1998","journal-title":"Nucleic Acids Res."},{"key":"2023012511573551500_B24","doi-asserted-by":"crossref","first-page":"370","DOI":"10.1006\/jmbi.1995.0559","article-title":"Comprehensive analysis of hydrogen bonds in regulatory protein dna-complexes: in search of common principles","volume":"253","author":"Mandel-Gutfreund","year":"1995","journal-title":"J. Mol. Biol."},{"key":"2023012511573551500_B25","doi-asserted-by":"crossref","first-page":"108","DOI":"10.1093\/nar\/gkj143","article-title":"Transfac and its module transcompel: transcriptional gene regulation in eukaryotes","volume":"34","author":"Matys","year":"2006","journal-title":"Nucleic Acids Res."},{"issue":"Pt 1","key":"2023012511573551500_B26","doi-asserted-by":"crossref","first-page":"245","DOI":"10.1042\/bj3440245","article-title":"Cloning and characterization of two nuclear receptors from the filarial nematode Brugia pahangi","volume":"344","author":"Moore","year":"1999","journal-title":"Biochem. J."},{"issue":"Suppl. 2","key":"2023012511573551500_B27","doi-asserted-by":"crossref","first-page":"W350","DOI":"10.1093\/nar\/gkl159","article-title":"Dilimot: discovery of linear motifs in proteins","volume":"34","author":"Neduva","year":"2006","journal-title":"Nucleic Acids Res."},{"key":"2023012511573551500_B28","doi-asserted-by":"crossref","first-page":"101","DOI":"10.1093\/bioinformatics\/bti1117","article-title":"Computational discovery of transcriptional regulatory rules","volume":"21","author":"Pham","year":"2005","journal-title":"Bioinformatics"},{"key":"2023012511573551500_B29","doi-asserted-by":"crossref","first-page":"55","DOI":"10.1093\/bioinformatics\/14.1.55","article-title":"Combinatorial pattern discovery in biological sequences: the teiresias algorithm","volume":"14","author":"Rigoutsos","year":"1998","journal-title":"Bioinformatics"},{"key":"2023012511573551500_B30","doi-asserted-by":"crossref","first-page":"379","DOI":"10.1146\/annurev.biophys.34.040204.144537","article-title":"Protein-dna recognition patterns and predictions","volume":"34","author":"Sarai","year":"2005","journal-title":"Annu. Rev. Biophys. Biomol. Struct."},{"key":"2023012511573551500_B31","doi-asserted-by":"crossref","first-page":"D5","DOI":"10.1093\/nar\/gkp967","article-title":"Database resources of the national center for biotechnology information","volume":"38","author":"Sayers","year":"2010","journal-title":"Nucleic Acids Res."},{"issue":"Suppl. 1","key":"2023012511573551500_B32","doi-asserted-by":"crossref","first-page":"i403","DOI":"10.1093\/bioinformatics\/bti1043","article-title":"Mining ChIP-chip data for transcription factor and cofactor binding sites","volume":"21","author":"Smith","year":"2005","journal-title":"Bioinformatics"},{"key":"2023012511573551500_B33","first-page":"241","article-title":"Computer methods for analyzing sequence recognition of nucleic acids","volume":"17","author":"Stormo","year":"1988","journal-title":"Annu. Rev. BioChem."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/27\/4\/471\/48864713\/bioinformatics_27_4_471.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/27\/4\/471\/48864713\/bioinformatics_27_4_471.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,25]],"date-time":"2023-01-25T12:45:21Z","timestamp":1674650721000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/27\/4\/471\/197544"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2010,12,30]]},"references-count":33,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2011,2,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btq682","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2011,2,15]]},"published":{"date-parts":[[2010,12,30]]}}}