{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,18]],"date-time":"2025-11-18T15:31:13Z","timestamp":1763479873153},"reference-count":52,"publisher":"Oxford University Press (OUP)","issue":"12","license":[{"start":{"date-parts":[[2016,10,2]],"date-time":"2016-10-02T00:00:00Z","timestamp":1475366400000},"content-version":"vor","delay-in-days":1576,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/3.0"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2012,6,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Recognition models for protein-DNA interactions, which allow the prediction of specificity for a DNA-binding domain based only on its sequence or the alteration of specificity through rational design, have long been a goal of computational biology. There has been some progress in constructing useful models, especially for C2H2 zinc finger proteins, but it remains a challenging problem with ample room for improvement. For most families of transcription factors the best available methods utilize k-nearest neighbor (KNN) algorithms to make specificity predictions based on the average of the specificities of the k most similar proteins with defined specificities. Homeodomain (HD) proteins are the second most abundant family of transcription factors, after zinc fingers, in most metazoan genomes, and as a consequence an effective recognition model for this family would facilitate predictive models of many transcriptional regulatory networks within these genomes.<\/jats:p>\n               <jats:p>Results: Using extensive experimental data, we have tested several machine learning approaches and find that both support vector machines and random forests (RFs) can produce recognition models for HD proteins that are significant improvements over KNN-based methods. 
Cross-validation analyses show that the resulting models are capable of predicting specificities with high accuracy. We have produced a web-based prediction tool, PreMoTF (Predicted Motifs for Transcription Factors) (http:\/\/stormo.wustl.edu\/PreMoTF), for predicting position frequency matrices from protein sequence using a RF-based model.<\/jats:p>\n               <jats:p>Contact: \u00a0stormo@wustl.edu<\/jats:p>","DOI":"10.1093\/bioinformatics\/bts202","type":"journal-article","created":{"date-parts":[[2012,6,11]],"date-time":"2012-06-11T14:09:18Z","timestamp":1339423758000},"page":"i84-i89","source":"Crossref","is-referenced-by-count":43,"title":["Recognition models to predict DNA-binding specificities of homeodomain proteins"],"prefix":"10.1093","volume":"28","author":[{"given":"Ryan G.","family":"Christensen","sequence":"first","affiliation":[{"name":"1 Department of Genetics, Washington University School of Medicine, St. Louis, MO 63108, 2Program in Gene Function and Expression, 3Department of Biochemistry and Molecular Pharmacology, 4Department of Molecular Medicine, University of Massachusetts Medical School, Worcester, MA 01605, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Metewo Selase","family":"Enuameh","sequence":"additional","affiliation":[{"name":"1 Department of Genetics, Washington University School of Medicine, St. Louis, MO 63108, 2Program in Gene Function and Expression, 3Department of Biochemistry and Molecular Pharmacology, 4Department of Molecular Medicine, University of Massachusetts Medical School, Worcester, MA 01605, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Marcus B.","family":"Noyes","sequence":"additional","affiliation":[{"name":"1 Department of Genetics, Washington University School of Medicine, St. 
Louis, MO 63108, 2Program in Gene Function and Expression, 3Department of Biochemistry and Molecular Pharmacology, 4Department of Molecular Medicine, University of Massachusetts Medical School, Worcester, MA 01605, USA"},{"name":"1 Department of Genetics, Washington University School of Medicine, St. Louis, MO 63108, 2Program in Gene Function and Expression, 3Department of Biochemistry and Molecular Pharmacology, 4Department of Molecular Medicine, University of Massachusetts Medical School, Worcester, MA 01605, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Michael H.","family":"Brodsky","sequence":"additional","affiliation":[{"name":"1 Department of Genetics, Washington University School of Medicine, St. Louis, MO 63108, 2Program in Gene Function and Expression, 3Department of Biochemistry and Molecular Pharmacology, 4Department of Molecular Medicine, University of Massachusetts Medical School, Worcester, MA 01605, USA"},{"name":"1 Department of Genetics, Washington University School of Medicine, St. Louis, MO 63108, 2Program in Gene Function and Expression, 3Department of Biochemistry and Molecular Pharmacology, 4Department of Molecular Medicine, University of Massachusetts Medical School, Worcester, MA 01605, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Scot A.","family":"Wolfe","sequence":"additional","affiliation":[{"name":"1 Department of Genetics, Washington University School of Medicine, St. Louis, MO 63108, 2Program in Gene Function and Expression, 3Department of Biochemistry and Molecular Pharmacology, 4Department of Molecular Medicine, University of Massachusetts Medical School, Worcester, MA 01605, USA"},{"name":"1 Department of Genetics, Washington University School of Medicine, St. 
Louis, MO 63108, 2Program in Gene Function and Expression, 3Department of Biochemistry and Molecular Pharmacology, 4Department of Molecular Medicine, University of Massachusetts Medical School, Worcester, MA 01605, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Gary D.","family":"Stormo","sequence":"additional","affiliation":[{"name":"1 Department of Genetics, Washington University School of Medicine, St. Louis, MO 63108, 2Program in Gene Function and Expression, 3Department of Biochemistry and Molecular Pharmacology, 4Department of Molecular Medicine, University of Massachusetts Medical School, Worcester, MA 01605, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2012,6,9]]},"reference":[{"key":"2023012512383574000_B1","doi-asserted-by":"crossref","first-page":"14601","DOI":"10.1021\/bi00044a040","article-title":"Specificity of minor-groove and major-groove interactions in a homeodomain-DNA complex","volume":"34","author":"Ades","year":"1995","journal-title":"Biochemistry"},{"key":"2023012512383574000_B2","doi-asserted-by":"crossref","first-page":"1012","DOI":"10.1093\/bioinformatics\/btn645","article-title":"Predicting the binding preference of transcription factors to individual DNA k-mers","volume":"25","author":"Alleyne","year":"2009","journal-title":"Bioinformatics"},{"key":"2023012512383574000_B3","doi-asserted-by":"crossref","first-page":"260","DOI":"10.1093\/nar\/27.1.260","article-title":"Pfam 3.1: 1313 multiple alignments and profile HMMs match the majority of proteins","volume":"27","author":"Bateman","year":"1999","journal-title":"Nucleic Acids Res."},{"key":"2023012512383574000_B4","first-page":"115","article-title":"SAMIE: statistical algorithm for modeling interaction energies","volume":"6","author":"Benos","year":"2001","journal-title":"Pac. Symp. 
Biocomput."},{"key":"2023012512383574000_B5","doi-asserted-by":"crossref","first-page":"466","DOI":"10.1002\/bies.10073","article-title":"Is there a code for protein-DNA recognition? Probab(ilistical)ly","volume":"24","author":"Benos","year":"2002","journal-title":"Bioessays"},{"key":"2023012512383574000_B6","doi-asserted-by":"crossref","first-page":"701","DOI":"10.1016\/S0022-2836(02)00917-8","article-title":"Probabilistic code for DNA recognition by proteins of the EGR family","volume":"323","author":"Benos","year":"2002","journal-title":"J. Mol. Biol."},{"key":"2023012512383574000_B7","doi-asserted-by":"crossref","first-page":"1266","DOI":"10.1016\/j.cell.2008.05.024","article-title":"Variation in homeodomain DNA binding revealed by high-resolution analysis of sequence preferences","volume":"133","author":"Berger","year":"2008","journal-title":"Cell"},{"key":"2023012512383574000_B8","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Mach. Learn."},{"key":"2023012512383574000_B9","doi-asserted-by":"crossref","first-page":"4173","DOI":"10.1093\/nar\/25.21.4173","article-title":"Analysis of TALE superclass homeobox genes (MEIS, PBC, KNOX, Iroquois, TGIF) reveals a novel domain conserved between plants and animals","volume":"25","author":"Burglin","year":"1997","journal-title":"Nucleic Acids Res."},{"key":"2023012512383574000_B10","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/1961189.1961199","article-title":"LIBSVM: a library for support vector machines","volume":"2","author":"Chang","year":"2011","journal-title":"ACM Trans. Intell. Syst. 
Technol."},{"key":"2023012512383574000_B11","doi-asserted-by":"crossref","first-page":"11168","DOI":"10.1073\/pnas.91.23.11168","article-title":"Selection of DNA binding sites for zinc fingers using rationally randomized DNA reveals coded interactions","volume":"91","author":"Choo","year":"1994","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023012512383574000_B12","doi-asserted-by":"crossref","first-page":"11163","DOI":"10.1073\/pnas.91.23.11163","article-title":"Toward a code for the interactions of zinc fingers with DNA: selection of randomized fingers displayed on phage","volume":"91","author":"Choo","year":"1994","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023012512383574000_B13","doi-asserted-by":"crossref","first-page":"117","DOI":"10.1016\/S0959-440X(97)80015-2","article-title":"Physical basis of a protein-DNA recognition code","volume":"7","author":"Choo","year":"1997","journal-title":"Curr. Opin. Struct. Biol."},{"key":"2023012512383574000_B14","doi-asserted-by":"crossref","first-page":"1188","DOI":"10.1101\/gr.849004","article-title":"WebLogo: a sequence logo generator","volume":"14","author":"Crooks","year":"2004","journal-title":"Genome Res."},{"key":"2023012512383574000_B15","doi-asserted-by":"crossref","first-page":"4992","DOI":"10.1002\/j.1460-2075.1996.tb00879.x","article-title":"A molecular code dictates sequence-specific DNA recognition by homeodomains","volume":"15","author":"Damante","year":"1996","journal-title":"The EMBO J."},{"key":"2023012512383574000_B16","doi-asserted-by":"crossref","first-page":"333","DOI":"10.1093\/bioinformatics\/btm604","article-title":"Mutual information without the influence of phylogeny or entropy dramatically improves residue contact prediction","volume":"24","author":"Dunn","year":"2008","journal-title":"Bioinformatics"},{"key":"2023012512383574000_B17","doi-asserted-by":"crossref","first-page":"755","DOI":"10.1093\/bioinformatics\/14.9.755","article-title":"Profile hidden Markov 
models","volume":"14","author":"Eddy","year":"1998","journal-title":"Bioinformatics"},{"key":"2023012512383574000_B18","doi-asserted-by":"crossref","first-page":"3551","DOI":"10.1002\/j.1460-2075.1994.tb06662.x","article-title":"The degree of variation in DNA sequence recognition among four Drosophila homeotic proteins","volume":"13","author":"Ekker","year":"1994","journal-title":"EMBO J."},{"key":"2023012512383574000_B19","doi-asserted-by":"crossref","first-page":"D211","DOI":"10.1093\/nar\/gkp985","article-title":"The Pfam protein families database","volume":"38","author":"Finn","year":"2010","journal-title":"Nucleic Acids Res."},{"key":"2023012512383574000_B20","doi-asserted-by":"crossref","first-page":"351","DOI":"10.1006\/jmbi.1998.2147","article-title":"Engrailed homeodomain-DNA complex at 2.2 A resolution: a detailed view of the interface and comparison with other engrailed structures","volume":"284","author":"Fraenkel","year":"1998","journal-title":"J. Mol. Biol."},{"key":"2023012512383574000_B21","doi-asserted-by":"crossref","first-page":"487","DOI":"10.1146\/annurev.bi.63.070194.002415","article-title":"Homeodomain proteins","volume":"63","author":"Gehring","year":"1994","journal-title":"Annu. Rev. Biochem."},{"key":"2023012512383574000_B22","doi-asserted-by":"crossref","first-page":"e1","DOI":"10.1371\/journal.pcbi.0010001","article-title":"Ab initio prediction of transcription factor targets using structural knowledge","volume":"1","author":"Kaplan","year":"2005","journal-title":"PLoS Comput. 
Biol."},{"key":"2023012512383574000_B23","doi-asserted-by":"crossref","first-page":"511","DOI":"10.1093\/nar\/gki198","article-title":"MAFFT version 5: improvement in accuracy of multiple sequence alignment","volume":"33","author":"Katoh","year":"2005","journal-title":"Nucleic Acids Res."},{"key":"2023012512383574000_B24","doi-asserted-by":"crossref","first-page":"579","DOI":"10.1016\/0092-8674(90)90453-L","article-title":"Crystal structure of an engrailed homeodomain-DNA complex at 2.8 A resolution: a framework for understanding homeodomain-DNA interactions","volume":"63","author":"Kissinger","year":"1990","journal-title":"Cell"},{"key":"2023012512383574000_B25","doi-asserted-by":"crossref","first-page":"565","DOI":"10.1038\/276565a0","article-title":"A gene complex controlling segmentation in Drosophila","volume":"276","author":"Lewis","year":"1978","journal-title":"Nature"},{"key":"2023012512383574000_B26","first-page":"18","article-title":"Classification and regression by randomForest","volume":"2","author":"Liaw","year":"2002","journal-title":"R News"},{"key":"2023012512383574000_B27","doi-asserted-by":"crossref","first-page":"257","DOI":"10.1016\/0006-291X(90)91385-6","article-title":"Crystallization and preliminary X-ray diffraction studies of the engrailed homeodomain and of an engrailed homeodomain\/DNA complex","volume":"171","author":"Liu","year":"1990","journal-title":"Biochem. Biophys. Res. 
Commun."},{"key":"2023012512383574000_B28","doi-asserted-by":"crossref","first-page":"1850","DOI":"10.1093\/bioinformatics\/btn331","article-title":"Context-dependent DNA recognition code for C2H2 zinc-finger transcription factors","volume":"24","author":"Liu","year":"2008","journal-title":"Bioinformatics"},{"key":"2023012512383574000_B29","doi-asserted-by":"crossref","first-page":"e61","DOI":"10.1371\/journal.pcbi.0030061","article-title":"DNA familial binding profiles made easy: comparison of various motif alignment and clustering strategies","volume":"3","author":"Mahony","year":"2007","journal-title":"PLoS Comput. Biol."},{"key":"2023012512383574000_B30","doi-asserted-by":"crossref","first-page":"i297","DOI":"10.1093\/bioinformatics\/btm215","article-title":"Inferring protein DNA dependencies using motif alignments and mutual information","volume":"23","author":"Mahony","year":"2007","journal-title":"Bioinformatics"},{"key":"2023012512383574000_B31","doi-asserted-by":"crossref","first-page":"294","DOI":"10.1038\/335294a0","article-title":"Protein-DNA interaction. 
No code for recognition","volume":"335","author":"Matthews","year":"1988","journal-title":"Nature"},{"key":"2023012512383574000_B32","doi-asserted-by":"crossref","first-page":"D77","DOI":"10.1093\/nar\/gkn660","article-title":"UniPROBE: an online database of protein binding microarray data on protein-DNA interactions","volume":"37","author":"Newburger","year":"2009","journal-title":"Nucleic Acids Res."},{"key":"2023012512383574000_B33","doi-asserted-by":"crossref","first-page":"1277","DOI":"10.1016\/j.cell.2008.05.023","article-title":"Analysis of homeodomain specificities allows the family-wide prediction of preferred recognition sites","volume":"133","author":"Noyes","year":"2008","journal-title":"Cell"},{"key":"2023012512383574000_B34","doi-asserted-by":"crossref","first-page":"597","DOI":"10.1006\/jmbi.2000.3918","article-title":"Geometric analysis and comparison of protein-DNA interfaces: why is there no simple code for recognition?","volume":"301","author":"Pabo","year":"2000","journal-title":"J. Mol. Biol."},{"key":"2023012512383574000_B35","doi-asserted-by":"crossref","first-page":"714","DOI":"10.1038\/17833","article-title":"Structure of a DNA-bound Ultrabithorax-Extradenticle homeodomain complex","volume":"397","author":"Passner","year":"1999","journal-title":"Nature"},{"key":"2023012512383574000_B36","doi-asserted-by":"crossref","first-page":"22","DOI":"10.1093\/bioinformatics\/btn580","article-title":"Predicting DNA recognition by Cys2His2 zinc finger proteins","volume":"25","author":"Persikov","year":"2009","journal-title":"Bioinformatics"},{"key":"2023012512383574000_B37","doi-asserted-by":"crossref","first-page":"035010","DOI":"10.1088\/1478-3975\/8\/3\/035010","article-title":"An expanded binding model for Cys(2)His(2) zinc finger protein-DNA interfaces","volume":"8","author":"Persikov","year":"2011","journal-title":"Phys. 
Biol."},{"key":"2023012512383574000_B38","doi-asserted-by":"crossref","first-page":"1017","DOI":"10.1016\/j.chembiol.2004.05.008","article-title":"Dissecting the Engrailed homeodomain-DNA interaction by phage-displayed shotgun scanning","volume":"11","author":"Sato","year":"2004","journal-title":"Chem. Biol."},{"key":"2023012512383574000_B39","doi-asserted-by":"crossref","first-page":"804","DOI":"10.1073\/pnas.73.3.804","article-title":"Sequence-specific recognition of double helical nucleic acids by proteins","volume":"73","author":"Seeman","year":"1976","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023012512383574000_B40","doi-asserted-by":"crossref","first-page":"1085","DOI":"10.1093\/nar\/gkl1155","article-title":"Structure-based prediction of C2H2 zinc-finger binding specificity: sensitivity to docking geometry","volume":"35","author":"Siggers","year":"2007","journal-title":"Nucleic Acids Res."},{"key":"2023012512383574000_B41","doi-asserted-by":"crossref","first-page":"1027","DOI":"10.1016\/j.jmb.2004.11.010","article-title":"Structural alignment of protein\u2013DNA interfaces: insights into the determinants of binding specificity","volume":"345","author":"Siggers","year":"2005","journal-title":"J. Mol. Biol."},{"key":"2023012512383574000_B42","doi-asserted-by":"crossref","first-page":"1219","DOI":"10.1534\/genetics.110.126052","article-title":"Maximally efficient modeling of DNA sequence motifs at all levels of complexity","volume":"187","author":"Stormo","year":"2011","journal-title":"Genetics"},{"key":"2023012512383574000_B43","doi-asserted-by":"crossref","first-page":"751","DOI":"10.1038\/nrg2845","article-title":"Determining the specificity of protein-DNA interactions","volume":"11","author":"Stormo","year":"2010","journal-title":"Nat. Rev. 
Genet."},{"key":"2023012512383574000_B44","doi-asserted-by":"crossref","first-page":"2997","DOI":"10.1093\/nar\/10.9.2997","article-title":"Use of the \u2018Perceptron\u2019 algorithm to distinguish translational initiation sites in E. coli","volume":"10","author":"Stormo","year":"1982","journal-title":"Nucleic Acids Res."},{"key":"2023012512383574000_B45","doi-asserted-by":"crossref","first-page":"832","DOI":"10.1038\/35057011","article-title":"Expressing the human genome","volume":"409","author":"Tupler","year":"2001","journal-title":"Nature"},{"key":"2023012512383574000_B46","doi-asserted-by":"crossref","first-page":"17400","DOI":"10.1073\/pnas.0505147102","article-title":"Identifying the conserved network of cis-regulatory sites of a eukaryotic genome","volume":"102","author":"Wang","year":"2005","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023012512383574000_B47","doi-asserted-by":"crossref","first-page":"517","DOI":"10.1016\/0092-8674(91)90526-5","article-title":"Crystal structure of a MAT alpha 2 homeodomain-operator complex suggests a general model for homeodomain-DNA interactions","volume":"67","author":"Wolberger","year":"1991","journal-title":"Cell"},{"key":"2023012512383574000_B48","doi-asserted-by":"crossref","first-page":"1917","DOI":"10.1006\/jmbi.1998.2421","article-title":"Analysis of zinc fingers optimized via phage display: evaluating the utility of a recognition code","volume":"285","author":"Wolfe","year":"1999","journal-title":"J. Mol. Biol."},{"key":"2023012512383574000_B49","doi-asserted-by":"crossref","first-page":"183","DOI":"10.1146\/annurev.biophys.29.1.183","article-title":"DNA recognition by Cys2His2 zinc finger proteins","volume":"29","author":"Wolfe","year":"2000","journal-title":"Ann. Rev. Biophys. Biomol. 
Struct."},{"key":"2023012512383574000_B50","doi-asserted-by":"crossref","first-page":"480","DOI":"10.1038\/nbt.1893","article-title":"Quantitative analysis demonstrates most transcription factors require only simple models of specificity","volume":"29","author":"Zhao","year":"2011","journal-title":"Nat. Biotechnol."},{"key":"2023012512383574000_B51","doi-asserted-by":"crossref","first-page":"556","DOI":"10.1101\/gr.090233.108","article-title":"High-resolution DNA-binding specificity analysis of yeast transcription factors","volume":"19","author":"Zhu","year":"2009","journal-title":"Genome Res."},{"key":"2023012512383574000_B52","doi-asserted-by":"crossref","first-page":"D111","DOI":"10.1093\/nar\/gkq858","article-title":"FlyFactorSurvey: a database of Drosophila transcription factor binding specificities determined using the bacterial one-hybrid system","volume":"39","author":"Zhu","year":"2011","journal-title":"Nucleic Acids Res."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/28\/12\/i84\/48874959\/bioinformatics_28_12_i84.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/28\/12\/i84\/48874959\/bioinformatics_28_12_i84.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,25]],"date-time":"2023-01-25T16:40:52Z","timestamp":1674664852000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/28\/12\/i84\/267417"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,6,9]]},"references-count":52,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2012,6,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/bts202","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"elect
ronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2012,6,15]]},"published":{"date-parts":[[2012,6,9]]}}}