{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,2,1]],"date-time":"2024-02-01T18:10:17Z","timestamp":1706811017178},"reference-count":25,"publisher":"Springer Science and Business Media LLC","issue":"1","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"abstract":"<jats:title>Abstract<\/jats:title><jats:sec>\n                <jats:title>Background<\/jats:title>\n                <jats:p>The identification of promoter regions that are regulated by a given transcription factor has traditionally relied upon the identification and distributions of binding sites recognized by the factor. In this study, we have developed a tandem machine learning approach for the identification of regulatory target genes based on these parameters and on the corresponding binding site information contents that measure the affinities of the factor for these cognate elements.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Results<\/jats:title>\n                <jats:p>This method has been validated using models of DNA binding sites recognized by the xenobiotic-sensitive nuclear receptor, PXR\/RXR\u03b1, for target genes within the human genome. An information theory-based weight matrix was first derived and refined from known PXR\/RXR\u03b1 binding sites. The promoter region of candidate genes was scanned with the weight matrix. A novel information density-based clustering algorithm was then used to identify clusters of information rich sites. Finally, transformed data representing metrics of location, strength and clustering of binding sites were used for classification of promoter regions using an ensemble approach involving neural networks, decision trees and Na\u00efve Bayesian classification. The method was evaluated on a set of 24 known target genes and 288 genes known not to be regulated by PXR\/RXR\u03b1. We report an average accuracy (proportion of correctly classified promoter regions) of 71%, sensitivity of 73%, and specificity of 70%, based on multiple cross-validation and the leave-one-out strategy. The performance on a test set of 13 genes showed that 10 were correctly classified.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Conclusion<\/jats:title>\n                <jats:p>We have developed a machine learning approach for the successful detection of gene targets for transcription factors with high accuracy. The method has been validated for the transcription factor PXR\/RXR\u03b1 and has the potential to be extended to other transcription factors.<\/jats:p>\n              <\/jats:sec>","DOI":"10.1186\/1471-2105-6-204","type":"journal-article","created":{"date-parts":[[2005,8,24]],"date-time":"2005-08-24T06:16:41Z","timestamp":1124864201000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":7,"title":["Tandem machine learning for the identification of genes regulated by transcription factors"],"prefix":"10.1186","volume":"6","author":[{"given":"Deendayal","family":"Dinakarpandian","sequence":"first","affiliation":[]},{"given":"Venetia","family":"Raheja","sequence":"additional","affiliation":[]},{"given":"Saumil","family":"Mehta","sequence":"additional","affiliation":[]},{"given":"Erin G","family":"Schuetz","sequence":"additional","affiliation":[]},{"given":"Peter K","family":"Rogan","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2005,8,22]]},"reference":[{"key":"529_CR1","doi-asserted-by":"publisher","first-page":"427","DOI":"10.1006\/jtbi.1997.0540","volume":"189","author":"TD Schneider","year":"1997","unstructured":"Schneider TD: Information content of individual genetic sequences. J Theor Biol 1997, 189: 427\u2013441. 10.1006\/jtbi.1997.0540","journal-title":"J Theor Biol"},{"key":"529_CR2","doi-asserted-by":"publisher","first-page":"16","DOI":"10.1093\/bioinformatics\/16.1.16","volume":"16","author":"GD Stormo","year":"2000","unstructured":"Stormo GD: DNA binding sites: representation and discovery. Bioinformatics 2000, 16: 16\u201323. 10.1093\/bioinformatics\/16.1.16","journal-title":"Bioinformatics"},{"key":"529_CR3","doi-asserted-by":"publisher","first-page":"3389","DOI":"10.1093\/nar\/25.17.3389","volume":"25","author":"SF Altschul","year":"1997","unstructured":"Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25: 3389\u20133402. 10.1093\/nar\/25.17.3389","journal-title":"Nucleic Acids Res"},{"key":"529_CR4","doi-asserted-by":"publisher","first-page":"374","DOI":"10.1093\/nar\/gkg108","volume":"31","author":"V Matys","year":"2003","unstructured":"Matys V, Fricke E, Geffers R, Gossling E, Haubrock M, Hehl R, Hornischer K, Karas D, Kel AE, Kel-Margoulis OV, Kloos DU, Land S, Lewicki-Potapov B, Michael H, Munch R, Reuter I, Rotert S, Saxel H, Scheer M, Thiele S, Wingender E: TRANSFAC: transcriptional regulation, from patterns to profiles. Nucleic Acids Res 2003, 31: 374\u2013378. 10.1093\/nar\/gkg108","journal-title":"Nucleic Acids Res"},{"issue":"Database Issue","key":"529_CR5","doi-asserted-by":"publisher","first-page":"D192","DOI":"10.1093\/nar\/gki069","volume":"33","author":"A Marchler-Bauer","year":"2005","unstructured":"Marchler-Bauer A, Anderson JB, Cherukuri PF, DeWeese-Scott C, Geer LY, Gwadz M, He S, Hurwitz DI, Jackson JD, Ke Z, Lanczycki CJ, Liebert CA, Liu C, Lu F, Marchler GH, Mullokandov M, Shoemaker BA, Simonyan V, Song JS, Thiessen PA, Yamashita RA, Yin JJ, Zhang D, Bryant SH: CDD: a Conserved Domain Database for protein classification. Nucleic Acids Res 2005, 33(Database Issue):D192\u20136. 10.1093\/nar\/gki069","journal-title":"Nucleic Acids Res"},{"key":"529_CR6","doi-asserted-by":"publisher","first-page":"46779","DOI":"10.1074\/jbc.M408395200","volume":"279","author":"CA Vyhlidal","year":"2004","unstructured":"Vyhlidal CA, Rogan PK, Leeder JS: Development and Refinement of Pregnane X Receptor (PXR) DNA Binding Site Model Using Information Theory: INSIGHTS INTO PXR-MEDIATED GENE REGULATION. J Biol Chem 2004, 279: 46779\u201346786. 10.1074\/jbc.M408395200","journal-title":"J Biol Chem"},{"key":"529_CR7","doi-asserted-by":"publisher","first-page":"207","DOI":"10.1097\/00008571-200304000-00005","volume":"13","author":"PK Rogan","year":"2003","unstructured":"Rogan PK, Svojanovsky S, Leeder JS: Information theory-based analysis of CYP2C19, CYP2D6 and CYP3A5 splicing mutations. Pharmacogenetics 2003, 13: 207\u2013218. 10.1097\/00008571-200304000-00005","journal-title":"Pharmacogenetics"},{"key":"529_CR8","doi-asserted-by":"publisher","first-page":"334","DOI":"10.1002\/humu.20151","volume":"25","author":"VK Nalla","year":"2005","unstructured":"Nalla VK, Rogan PK: Automated splicing mutation analysis by information theory. Hum Mutat 2005, 25: 334\u2013342. 10.1002\/humu.20151","journal-title":"Hum Mutat"},{"key":"529_CR9","doi-asserted-by":"publisher","first-page":"1269","DOI":"10.1210\/mend.16.6.0851","volume":"16","author":"M Podvinec","year":"2002","unstructured":"Podvinec M, Kaufmann MR, Handschin C, Meyer UA: NUBIScan, an in silico approach for prediction of nuclear receptor response elements. Mol Endocrinol 2002, 16: 1269\u20131279. 10.1210\/me.16.6.1269","journal-title":"Mol Endocrinol"},{"key":"529_CR10","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1146\/annurev.pharmtox.42.111901.111051","volume":"42","author":"B Goodwin","year":"2002","unstructured":"Goodwin B, Redinbo MR, Kliewer SA: Regulation of cyp3a gene transcription by the pregnane x receptor. Annu Rev Pharmacol Toxicol 2002, 42: 1\u201323. 10.1146\/annurev.pharmtox.42.111901.111051","journal-title":"Annu Rev Pharmacol Toxicol"},{"key":"529_CR11","doi-asserted-by":"publisher","first-page":"445","DOI":"10.1016\/S0076-6879(96)74036-3","volume":"274","author":"TD Schneider","year":"1996","unstructured":"Schneider TD: Reading of DNA sequence logos: prediction of major groove binding by information theory. Methods Enzymol 1996, 274: 445\u2013455.","journal-title":"Methods Enzymol"},{"key":"529_CR12","doi-asserted-by":"publisher","first-page":"776","DOI":"10.1093\/bioinformatics\/15.10.776","volume":"15","author":"A Wagner","year":"1999","unstructured":"Wagner A: Genes regulated cooperatively by one or more transcription factors and their identification in whole eukaryotic genomes. Bioinformatics 1999, 15: 776\u2013784. 10.1093\/bioinformatics\/15.10.776","journal-title":"Bioinformatics"},{"key":"529_CR13","doi-asserted-by":"publisher","first-page":"757","DOI":"10.1073\/pnas.231608898","volume":"99","author":"BP Berman","year":"2002","unstructured":"Berman BP, Nibu Y, Pfeiffer BD, Tomancak P, Celniker SE, Levine M, Rubin GM, Eisen MB: Exploiting transcription factor binding site clustering to identify cis-regulatory modules involved in pattern formation in the Drosophila genome. Proc Natl Acad Sci U S A 2002, 99: 757\u2013762. 10.1073\/pnas.231608898","journal-title":"Proc Natl Acad Sci U S A"},{"key":"529_CR14","doi-asserted-by":"publisher","first-page":"W195","DOI":"10.1093\/nar\/gkh387","volume":"32","author":"WB Alkema","year":"2004","unstructured":"Alkema WB, Johansson O, Lagergren J, Wasserman WW: MSCAN: identification of functional clusters of transcription factor binding sites. Nucleic Acids Res 2004, 32: W195\u20138.","journal-title":"Nucleic Acids Res"},{"key":"529_CR15","doi-asserted-by":"publisher","first-page":"763","DOI":"10.1073\/pnas.012591199","volume":"99","author":"M Markstein","year":"2002","unstructured":"Markstein M, Markstein P, Markstein V, Levine MS: Genome-wide analysis of clustered Dorsal binding sites identifies putative target genes in the Drosophila embryo. Proc Natl Acad Sci U S A 2002, 99: 763\u2013768. 10.1073\/pnas.012591199","journal-title":"Proc Natl Acad Sci U S A"},{"key":"529_CR16","doi-asserted-by":"publisher","first-page":"9888","DOI":"10.1073\/pnas.152320899","volume":"99","author":"M Rebeiz","year":"2002","unstructured":"Rebeiz M, Reeves NL, Posakony JW: SCORE: a computational approach to the identification of cis-regulatory modules and target genes in whole-genome sequence data. Site clustering over random expectation. Proc Natl Acad Sci U S A 2002, 99: 9888\u20139893. 10.1073\/pnas.152320899","journal-title":"Proc Natl Acad Sci U S A"},{"key":"529_CR17","doi-asserted-by":"publisher","first-page":"3666","DOI":"10.1093\/nar\/gkg540","volume":"31","author":"MC Frith","year":"2003","unstructured":"Frith MC, Li MC, Weng Z: Cluster-Buster: Finding dense clusters of motifs in DNA sequences. Nucleic Acids Res 2003, 31: 3666\u20133668. 10.1093\/nar\/gkg540","journal-title":"Nucleic Acids Res"},{"key":"529_CR18","doi-asserted-by":"publisher","first-page":"251","DOI":"10.1016\/j.taap.2003.12.027","volume":"199","author":"V Lamba","year":"2004","unstructured":"Lamba V, Yasuda K, Lamba JK, Assem M, Davila J, Strom S, Schuetz EG: PXR (NR1I2): splice variants in human tissues, including brain, and identification of neurosteroids and nicotine as PXR activators. Toxicol Appl Pharmacol 2004, 199: 251\u2013265. 10.1016\/j.taap.2003.12.027","journal-title":"Toxicol Appl Pharmacol"},{"key":"529_CR19","doi-asserted-by":"publisher","first-page":"38","DOI":"10.1186\/1471-2105-4-38","volume":"4","author":"S Gadiraju","year":"2003","unstructured":"Gadiraju S, Vyhlidal CA, Leeder JS, Rogan PK: Genome-wide prediction, display and refinement of binding sites with information theory-based models. BMC Bioinformatics 2003, 4: 38. 10.1186\/1471-2105-4-38","journal-title":"BMC Bioinformatics"},{"key":"529_CR20","first-page":"226","volume-title":"Proceedings of the 1996 International Conference on Knowledge Discovery and Data Mining (KDD '96)","author":"M Ester","year":"1996","unstructured":"Ester M, Kriegel HP, Sander J, Xu X: A density-based algorithm for discovering clusters in large spatial databases. Proceedings of the 1996 International Conference on Knowledge Discovery and Data Mining (KDD '96) 1996, 226\u2013231."},{"key":"529_CR21","first-page":"416","volume-title":"Data mining: Practical machine learning tools and techniques with Java implementations","author":"IH Witten","year":"1999","unstructured":"Witten IH, Frank E: Data mining: Practical machine learning tools and techniques with Java implementations. 1st edition. San Francisco, Morgan Kaufmann; 1999:416.","edition":"1st"},{"key":"529_CR22","first-page":"302","volume-title":"C4.5: Programs for machine learning","author":"JR Quinlan","year":"1993","unstructured":"Quinlan JR: C4.5: Programs for machine learning. San Francisco, Morgan Kaufmann; 1993:302."},{"key":"529_CR23","volume-title":"Proceedings of the Applications of Neural Networks Conference, SPIE","author":"AKTMNST Zell","year":"1991","unstructured":"Zell AKTMNST: Recent Developments of the Neural Network Simulator. Proceedings of the Applications of Neural Networks Conference, SPIE 1991., 1294:"},{"key":"529_CR24","doi-asserted-by":"publisher","first-page":"1481","DOI":"10.1109\/5.58326","volume":"78","author":"T Poggio","year":"1990","unstructured":"Poggio T, Girosi F: Networks for approximation and learning. Proceedings of the IEEE 1990, 78: 1481\u20131497. 10.1109\/5.58326","journal-title":"Proceedings of the IEEE"},{"key":"529_CR25","doi-asserted-by":"publisher","first-page":"649","DOI":"10.1124\/pr.55.4.2","volume":"55","author":"C Handschin","year":"2003","unstructured":"Handschin C, Meyer UA: Induction of drug metabolism: the role of nuclear receptors. Pharmacol Rev 2003, 55: 649\u2013673. 10.1124\/pr.55.4.2","journal-title":"Pharmacol Rev"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-6-204.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,2,1]],"date-time":"2024-02-01T17:50:02Z","timestamp":1706809802000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-6-204"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2005,8,22]]},"references-count":25,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2005,12]]}},"alternative-id":["529"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-6-204","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2005,8,22]]},"assertion":[{"value":"9 March 2005","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"22 August 2005","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"22 August 2005","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"204"}}