{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,6]],"date-time":"2025-11-06T11:38:29Z","timestamp":1762429109188,"version":"3.35.0"},"reference-count":40,"publisher":"Springer Science and Business Media LLC","issue":"1","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2008,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Background<\/jats:title><jats:p>Transcription factors (TFs) are core functional proteins which play important roles in gene expression control, and they are key factors for gene regulation network construction. Traditionally, they were identified and classified through experimental approaches. In order to save time and reduce costs, many computational methods have been developed to identify TFs from new proteins and to classify the resulted TFs. Though these methods have facilitated screening of TFs to some extent, low accuracy is still a common problem. With the fast growing number of new proteins, more precise algorithms for identifying TFs from new proteins and classifying the consequent TFs are in a high demand.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>The support vector machine (SVM) algorithm was utilized to construct an automatic detector for TF identification, where protein domains and functional sites were employed as feature vectors. Error-correcting output coding (ECOC) algorithm, which was originated from information and communication engineering fields, was introduced to combine with support vector machine (SVM) methodology for TF classification. The overall success rates of identification and classification achieved 88.22% and 97.83% respectively. Finally, a web site was constructed to let users access our tools (see Availability and requirements section for URL).<\/jats:p><\/jats:sec><jats:sec><jats:title>Conclusion<\/jats:title><jats:p>The SVM method was a valid and stable means for TFs identification with protein domains and functional sites as feature vectors. Error-correcting output coding (ECOC) algorithm is a powerful method for multi-class classification problem. When combined with SVM method, it can remarkably increase the accuracy of TF classification using protein domains and functional sites as feature vectors. In addition, our work implied that ECOC algorithm may succeed in a broad range of applications in biological data mining.<\/jats:p><\/jats:sec>","DOI":"10.1186\/1471-2105-9-282","type":"journal-article","created":{"date-parts":[[2008,6,17]],"date-time":"2008-06-17T06:13:44Z","timestamp":1213683224000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":31,"title":["The combination approach of SVM and ECOC for powerful identification and classification of transcription factor"],"prefix":"10.1186","volume":"9","author":[{"given":"Guangyong","family":"Zheng","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ziliang","family":"Qian","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Qing","family":"Yang","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chaochun","family":"Wei","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Lu","family":"Xie","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yangyong","family":"Zhu","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yixue","family":"Li","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2008,6,16]]},"reference":[{"key":"2267_CR1","doi-asserted-by":"publisher","first-page":"692","DOI":"10.1126\/science.281.5377.692","volume":"281","author":"SA Duncan","year":"1998","unstructured":"Duncan SA, Navas MA, Dufort D, Rossant J, Stoffel M: Regulation of a transcription factor network required for differentiation and metabolism. Science 1998, 281: 692\u2013695. 10.1126\/science.281.5377.692","journal-title":"Science"},{"key":"2267_CR2","doi-asserted-by":"publisher","first-page":"1057","DOI":"10.1126\/science.1079490","volume":"299","author":"S Hori","year":"2003","unstructured":"Hori S, Nomura T, Sakaguchi S: Control of regulatory T cell development by the transcription factor Foxp3. Science 2003, 299: 1057\u20131061. 10.1126\/science.1079490","journal-title":"Science"},{"key":"2267_CR3","doi-asserted-by":"publisher","first-page":"362","DOI":"10.1038\/377362a0","volume":"377","author":"PS Vaughan","year":"1995","unstructured":"Vaughan PS, Aziz F, van Wijnen AJ, Wu S, Harada H, Taniguchi T, Soprano KJ, Stein JL, Stein GS: Activation of a cell-cycle-regulated histone gene by the oncogenic transcription factor IRF-2. Nature 1995, 377: 362\u2013365. 10.1038\/377362a0","journal-title":"Nature"},{"key":"2267_CR4","doi-asserted-by":"publisher","first-page":"374","DOI":"10.1093\/nar\/gkg108","volume":"31","author":"V Matys","year":"2003","unstructured":"Matys V, Fricke E, Geffers R, Gossling E, Haubrock M, Hehl R, Hornischer K, Karas D, Kel AE, Kel-Margoulis OV, Kloos DU, Land S, Lewicki-Potapov B, Michael H, Munch R, Reuter I, Rotert S, Saxel H, Scheer M, Thiele S, Wingender E: TRANSFAC: transcriptional regulation, from patterns to profiles. Nucleic Acids Res 2003, 31: 374\u2013378. 10.1093\/nar\/gkg108","journal-title":"Nucleic Acids Res"},{"key":"2267_CR5","doi-asserted-by":"publisher","first-page":"D108","DOI":"10.1093\/nar\/gkj143","volume":"34","author":"V Matys","year":"2006","unstructured":"Matys V, Kel-Margoulis OV, Fricke E, Liebich I, Land S, Barre-Dirrie A, Reuter I, Chekmenev D, Krull M, Hornischer K, Voss N, Stegmaier P, Lewicki-Potapov B, Saxel H, Kel AE, Wingender E: TRANSFAC and its module TRANSCompel: transcriptional gene regulation in eukaryotes. Nucleic Acids Res 2006, 34: D108\u201310. 10.1093\/nar\/gkj143","journal-title":"Nucleic Acids Res"},{"key":"2267_CR6","doi-asserted-by":"publisher","first-page":"1053","DOI":"10.1146\/annurev.bi.61.070192.005201","volume":"61","author":"CO Pabo","year":"1992","unstructured":"Pabo CO, Sauer RT: Transcription factors: structural families and principles of DNA recognition. Annu Rev Biochem 1992, 61: 1053\u20131095. 10.1146\/annurev.bi.61.070192.005201","journal-title":"Annu Rev Biochem"},{"key":"2267_CR7","doi-asserted-by":"publisher","first-page":"463","DOI":"10.1186\/1471-2105-8-463","volume":"8","author":"M Kumar","year":"2007","unstructured":"Kumar M, Gromiha MM, Raghava GP: Identification of DNA-binding proteins using support vector machines and evolutionary profiles. BMC Bioinformatics 2007, 8: 463. 10.1186\/1471-2105-8-463","journal-title":"BMC Bioinformatics"},{"key":"2267_CR8","doi-asserted-by":"publisher","first-page":"634","DOI":"10.1093\/bioinformatics\/btl672","volume":"23","author":"S Hwang","year":"2007","unstructured":"Hwang S, Gou Z, Kuznetsov IB: DP-Bind: a web server for sequence-based prediction of DNA-binding residues in DNA-binding proteins. Bioinformatics 2007, 23: 634\u2013636. 10.1093\/bioinformatics\/btl672","journal-title":"Bioinformatics"},{"key":"2267_CR9","doi-asserted-by":"publisher","first-page":"845","DOI":"10.1016\/j.bbrc.2008.02.106","volume":"369","author":"SY Cho","year":"2008","unstructured":"Cho SY, Chung M, Park M, Park S, Lee YS: ZIFIBI: Prediction of DNA binding sites for zinc finger proteins. Biochem Biophys Res Commun 2008, 369: 845\u2013848. 10.1016\/j.bbrc.2008.02.106","journal-title":"Biochem Biophys Res Commun"},{"key":"2267_CR10","doi-asserted-by":"publisher","first-page":"308","DOI":"10.1093\/nar\/28.1.308","volume":"28","author":"D Ghosh","year":"2000","unstructured":"Ghosh D: Object-oriented transcription factors database (ooTFD). Nucleic Acids Res 2000, 28: 308\u2013310. 10.1093\/nar\/28.1.308","journal-title":"Nucleic Acids Res"},{"key":"2267_CR11","doi-asserted-by":"publisher","first-page":"2568","DOI":"10.1093\/bioinformatics\/bti334","volume":"21","author":"A Guo","year":"2005","unstructured":"Guo A, He K, Liu D, Bai S, Gu X, Wei L, Luo J: DATF: a database of Arabidopsis transcription factors. Bioinformatics 2005, 21: 2568\u20132569. 10.1093\/bioinformatics\/bti334","journal-title":"Bioinformatics"},{"key":"2267_CR12","doi-asserted-by":"publisher","first-page":"261","DOI":"10.1016\/S0968-0004(99)01416-4","volume":"24","author":"P Bork","year":"1999","unstructured":"Bork P, Doerks T, Springer TA, Snel B: Domains in plexins: links to integrins and transcription factors. Trends Biochem Sci 1999, 24: 261\u2013263. 10.1016\/S0968-0004(99)01416-4","journal-title":"Trends Biochem Sci"},{"key":"2267_CR13","doi-asserted-by":"publisher","first-page":"247","DOI":"10.1093\/dnares\/dsi011","volume":"12","author":"K Iida","year":"2005","unstructured":"Iida K, Seki M, Sakurai T, Satou M, Akiyama K, Toyoda T, Konagaya A, Shinozaki K: RARTF: database and tools for complete sets of Arabidopsis transcription factors. DNA Res 2005, 12: 247\u2013256. 10.1093\/dnares\/dsi011","journal-title":"DNA Res"},{"key":"2267_CR14","doi-asserted-by":"publisher","first-page":"141","DOI":"10.1016\/j.bbrc.2006.06.060","volume":"347","author":"Z Qian","year":"2006","unstructured":"Qian Z, Cai YD, Li Y: Automatic transcription factor classifier based on functional domain composition. Biochem Biophys Res Commun 2006, 347: 141\u2013144. 10.1016\/j.bbrc.2006.06.060","journal-title":"Biochem Biophys Res Commun"},{"key":"2267_CR15","doi-asserted-by":"publisher","first-page":"S296","DOI":"10.1093\/bioinformatics\/17.suppl_1.S296","volume":"17 Suppl 1","author":"J Wojcik","year":"2001","unstructured":"Wojcik J, Schachter V: Protein-protein interaction map inference using interacting domain profile pairs. Bioinformatics 2001, 17 Suppl 1: S296\u2013305.","journal-title":"Bioinformatics"},{"key":"2267_CR16","doi-asserted-by":"publisher","first-page":"1007","DOI":"10.1016\/j.bbrc.2004.07.059","volume":"321","author":"KC Chou","year":"2004","unstructured":"Chou KC, Cai YD: Predicting protein structural class by functional domain composition. Biochem Biophys Res Commun 2004, 321: 1007\u20131009. 10.1016\/j.bbrc.2004.07.059","journal-title":"Biochem Biophys Res Commun"},{"key":"2267_CR17","doi-asserted-by":"publisher","first-page":"187","DOI":"10.1186\/1471-2105-7-187","volume":"7","author":"X Yu","year":"2006","unstructured":"Yu X, Wang C, Li Y: Classification of protein quaternary structure by functional domain composition. BMC Bioinformatics 2006, 7: 187. 10.1186\/1471-2105-7-187","journal-title":"BMC Bioinformatics"},{"key":"2267_CR18","doi-asserted-by":"publisher","first-page":"366","DOI":"10.1016\/j.bbrc.2007.03.139","volume":"357","author":"P Jia","year":"2007","unstructured":"Jia P, Qian Z, Zeng Z, Cai Y, Li Y: Prediction of subcellular protein localization based on functional domain composition. Biochem Biophys Res Commun 2007, 357: 366\u2013370. 10.1016\/j.bbrc.2007.03.139","journal-title":"Biochem Biophys Res Commun"},{"key":"2267_CR19","doi-asserted-by":"publisher","first-page":"1257","DOI":"10.1016\/S0022-2836(02)00379-0","volume":"319","author":"LJ Jensen","year":"2002","unstructured":"Jensen LJ, Gupta R, Blom N, Devos D, Tamames J, Kesmir C, Nielsen H, Staerfeldt HH, Rapacki K, Workman C, Andersen CA, Knudsen S, Krogh A, Valencia A, Brunak S: Prediction of human protein function from post-translational modifications and localization features. J Mol Biol 2002, 319: 1257\u20131265. 10.1016\/S0022-2836(02)00379-0","journal-title":"J Mol Biol"},{"key":"2267_CR20","doi-asserted-by":"publisher","first-page":"793","DOI":"10.1038\/nrc1455","volume":"4","author":"AM Bode","year":"2004","unstructured":"Bode AM, Dong Z: Post-translational modification of p53 in tumorigenesis. Nat Rev Cancer 2004, 4: 793\u2013805. 10.1038\/nrc1455","journal-title":"Nat Rev Cancer"},{"key":"2267_CR21","doi-asserted-by":"publisher","first-page":"24266","DOI":"10.1074\/jbc.273.37.24266","volume":"273","author":"U Laufs","year":"1998","unstructured":"Laufs U, Liao JK: Post-transcriptional regulation of endothelial nitric oxide synthase mRNA stability by Rho GTPase. J Biol Chem 1998, 273: 24266\u201324271. 10.1074\/jbc.273.37.24266","journal-title":"J Biol Chem"},{"key":"2267_CR22","doi-asserted-by":"crossref","first-page":"111","DOI":"10.1111\/j.2517-6161.1974.tb00994.x","volume":"36","author":"M Stone","year":"1974","unstructured":"Stone M: Cross-validatory choice and assessment of statistical predictions. Journal of the Royal Statistical Society 1974, 36: 111\u2013147.","journal-title":"Journal of the Royal Statistical Society"},{"key":"2267_CR23","first-page":"1","volume":"61","author":"RG Miller","year":"1974","unstructured":"Miller RG: The jackknife-a review. Biometrika 1974, 61: 1\u201315.","journal-title":"Biometrika"},{"key":"2267_CR24","doi-asserted-by":"publisher","first-page":"36","DOI":"10.2307\/2685844","volume":"37","author":"BE G.Gong","year":"1983","unstructured":"G.Gong BE: A leisurely look at the bootstrap, the jackknife, and cross-validation. The American Statistician 1983, 37: 36\u201348. 10.2307\/2685844","journal-title":"The American Statistician"},{"key":"2267_CR25","unstructured":"The InterProScan webpage[http:\/\/www.ebi.ac.uk\/InterProScan\/]"},{"key":"2267_CR26","doi-asserted-by":"crossref","unstructured":"The Universal Protein Resource (UniProt) Nucleic Acids Res 2007, 35: D193\u20137. 10.1093\/nar\/gkl929","DOI":"10.1093\/nar\/gkl929"},{"key":"2267_CR27","doi-asserted-by":"publisher","first-page":"1658","DOI":"10.1093\/bioinformatics\/btl158","volume":"22","author":"W Li","year":"2006","unstructured":"Li W, Godzik A: Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 2006, 22: 1658\u20131659. 10.1093\/bioinformatics\/btl158","journal-title":"Bioinformatics"},{"key":"2267_CR28","doi-asserted-by":"publisher","first-page":"1589","DOI":"10.1093\/bioinformatics\/btg224","volume":"19","author":"G Wang","year":"2003","unstructured":"Wang G, Dunbrack RL Jr.: PISCES: a protein sequence culling server. Bioinformatics 2003, 19: 1589\u20131591. 10.1093\/bioinformatics\/btg224","journal-title":"Bioinformatics"},{"key":"2267_CR29","doi-asserted-by":"publisher","first-page":"D224","DOI":"10.1093\/nar\/gkl841","volume":"35","author":"NJ Mulder","year":"2007","unstructured":"Mulder NJ, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, Bork P, Buillard V, Cerutti L, Copley R, Courcelle E, Das U, Daugherty L, Dibley M, Finn R, Fleischmann W, Gough J, Haft D, Hulo N, Hunter S, Kahn D, Kanapin A, Kejariwal A, Labarga A, Langendijk-Genevaux PS, Lonsdale D, Lopez R, Letunic I, Madera M, Maslen J, McAnulla C, McDowall J, Mistry J, Mitchell A, Nikolskaya AN, Orchard S, Orengo C, Petryszak R, Selengut JD, Sigrist CJ, Thomas PD, Valentin F, Wilson D, Wu CH, Yeats C: New developments in the InterPro database. Nucleic Acids Res 2007, 35: D224\u20138. 10.1093\/nar\/gkl841","journal-title":"Nucleic Acids Res"},{"key":"2267_CR30","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4757-2440-0","volume-title":"The Nature of Statistical Learning Theory","author":"V.Vapnik","year":"1995","unstructured":"V.Vapnik: The Nature of Statistical Learning Theory. New York, Springer Verlag; 1995."},{"key":"2267_CR31","volume-title":"Statistical Learning Theory","author":"V.Vapnik","year":"1998","unstructured":"V.Vapnik: Statistical Learning Theory. 2nd edition. New York, John Wiley &Sons; 1998.","edition":"2nd"},{"key":"2267_CR32","unstructured":"The svmlight webpage[http:\/\/svmlight.joachims.org\/]"},{"key":"2267_CR33","volume-title":"Making large-Scale SVM Learing Practical. Advances in Kernal Methods - Support Vector Learing","author":"T Joachims","year":"1999","unstructured":"Joachims T: Making large-Scale SVM Learing Practical. Advances in Kernal Methods - Support Vector Learing. Edited by: Bernhard Scholkopf CJCBAJS. Cambridge, USA, MIT Press; 1999."},{"key":"2267_CR34","first-page":"124","volume-title":"Using Two-Class Classifiers for Multiclass Classification.: ; Quebec, Canada..","author":"David M.J. Tax and Robert P.W.Duin","year":"2002","unstructured":"David M.J. Tax and Robert P.W.Duin: Using Two-Class Classifiers for Multiclass Classification.: ; Quebec, Canada.. ; 2002:124\u2013127."},{"key":"2267_CR35","volume-title":"Data Mining Practical Machine Learning Tools and Techniques","author":"IHWE Frank","year":"2005","unstructured":"Frank IHWE: Data Mining Practical Machine Learning Tools and Techniques. 2nd edition. New York, Diane Cerra; 2005.","edition":"2nd"},{"key":"2267_CR36","first-page":"313","volume-title":"Error-Correcting Output Coding Corrects Bias and Variance: ; Tahoe City, CA.","author":"TGD Eun Bae Kong","year":"1995","unstructured":"Eun Bae Kong TGD: Error-Correcting Output Coding Corrects Bias and Variance: ; Tahoe City, CA. ; 1995:313\u2013321."},{"key":"2267_CR37","doi-asserted-by":"crossref","first-page":"263","DOI":"10.1613\/jair.105","volume":"2","author":"TGD G.Bakiri","year":"1995","unstructured":"G.Bakiri TGD: Solving Multiclass Learning Problems via Error-Correcting Output Codes. Journal of Artificial Intelligence Research 1995, 2: 263\u2013286.","journal-title":"Journal of Artificial Intelligence Research"},{"key":"2267_CR38","doi-asserted-by":"publisher","first-page":"349","DOI":"10.1093\/bioinformatics\/17.4.349","volume":"17","author":"CH Ding","year":"2001","unstructured":"Ding CH, Dubchak I: Multi-class protein fold recognition using support vector machines and neural networks. Bioinformatics 2001, 17: 349\u2013358. 10.1093\/bioinformatics\/17.4.349","journal-title":"Bioinformatics"},{"key":"2267_CR39","first-page":"218","volume":"14","author":"MN Nguyen","year":"2003","unstructured":"Nguyen MN, Rajapakse JC: Multi-class support vector machines for protein secondary structure prediction. Genome Inform 2003, 14: 218\u2013227.","journal-title":"Genome Inform"},{"key":"2267_CR40","doi-asserted-by":"publisher","first-page":"83","DOI":"10.1016\/j.patrec.2004.08.019","volume":"26","author":"LI Kuncheva","year":"2005","unstructured":"Kuncheva LI: Using diversity measures for generating error-correcting output codes in classifier ensembles. Pattern Recognition Letters 2005, 26: 83\u201390. 10.1016\/j.patrec.2004.08.019","journal-title":"Pattern Recognition Letters"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-9-282.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,1,30]],"date-time":"2025-01-30T18:32:23Z","timestamp":1738261943000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-9-282"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2008,6,16]]},"references-count":40,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2008,12]]}},"alternative-id":["2267"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-9-282","relation":{},"ISSN":["1471-2105"],"issn-type":[{"type":"electronic","value":"1471-2105"}],"subject":[],"published":{"date-parts":[[2008,6,16]]},"assertion":[{"value":"9 January 2008","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"16 June 2008","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"16 June 2008","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"282"}}