{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,28]],"date-time":"2026-02-28T02:06:52Z","timestamp":1772244412885,"version":"3.50.1"},"reference-count":57,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2024,7,25]],"date-time":"2024-07-25T00:00:00Z","timestamp":1721865600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Bioinform."],"abstract":"<jats:p>\n                    Transcription factors are essential DNA-binding proteins that regulate the transcription rate of several genes and control the expression of genes inside a cell. The prediction of transcription factors with high precision is important for understanding biological processes such as cell differentiation, intracellular signaling, and cell-cycle control. In this study, we developed a hybrid method that combines alignment-based and alignment-free methods for predicting transcription factors with higher accuracy. All models have been trained, tested, and evaluated on a large dataset that contains 19,406 transcription factors and 523,560 non-transcription factor protein sequences. To avoid biases in evaluation, the datasets were divided into training and validation\/independent datasets, where 80% of the data was used for training, and the remaining 20% was used for external validation. In the case of alignment-free methods, models were developed using machine learning techniques and the composition-based features of a protein. Our best alignment-free model obtained an AUC of 0.97 on an independent dataset. In the case of the alignment-based method, we used BLAST at different cut-offs to predict the transcription factors. Although the alignment-based method demonstrated excellent performance, it was unable to cover all transcription factors due to instances of no hits. To combine the strengths of both methods, we developed a hybrid method that combines alignment-free and alignment-based methods. In the hybrid method, we added the scores of the alignment-free and alignment-based methods and achieved a maximum AUC of 0.99 on the independent dataset. The method proposed in this study performs better than existing methods. We incorporated the best models in the webserver\/Python Package Index\/standalone package of \u201cTransFacPred\u201d (\n                    <jats:ext-link>https:\/\/webs.iiitd.edu.in\/raghava\/transfacpred<\/jats:ext-link>\n                    ).\n                  <\/jats:p>","DOI":"10.3389\/fbinf.2024.1425419","type":"journal-article","created":{"date-parts":[[2024,7,25]],"date-time":"2024-07-25T05:04:24Z","timestamp":1721883864000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":4,"title":["A hybrid approach for predicting transcription factors"],"prefix":"10.3389","volume":"4","author":[{"given":"Sumeet","family":"Patiyal","sequence":"first","affiliation":[]},{"given":"Palak","family":"Tiwari","sequence":"additional","affiliation":[]},{"given":"Mohit","family":"Ghai","sequence":"additional","affiliation":[]},{"given":"Aman","family":"Dhapola","sequence":"additional","affiliation":[]},{"given":"Anjali","family":"Dhall","sequence":"additional","affiliation":[]},{"given":"Gajendra P. S.","family":"Raghava","sequence":"additional","affiliation":[]}],"member":"1965","published-online":{"date-parts":[[2024,7,25]]},"reference":[{"key":"B1","doi-asserted-by":"publisher","first-page":"e24039","DOI":"10.1371\/journal.pone.0024039","article-title":"Identification of mannose interacting residues using local composition","volume":"6","author":"Agarwal","year":"2011","journal-title":"PloS one"},{"key":"B2","doi-asserted-by":"publisher","first-page":"1690","DOI":"10.3389\/fphar.2019.01690","article-title":"SAMbinder: a web server for predicting s-adenosyl-l-methionine binding residues of a protein from its amino acid sequence","volume":"10","author":"Agrawal","year":"2020","journal-title":"Front. Pharmacol."},{"key":"B3","doi-asserted-by":"publisher","first-page":"45","DOI":"10.1093\/nar\/28.1.45","article-title":"The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000","volume":"28","author":"Bairoch","year":"2000","journal-title":"Nucleic Acids Res."},{"key":"B4","doi-asserted-by":"publisher","first-page":"53","DOI":"10.1016\/j.trecan.2015.07.001","article-title":"Targeting transcription factors in cancer","volume":"1","author":"Bhagwat","year":"2015","journal-title":"Trends Cancer"},{"key":"B5","doi-asserted-by":"publisher","first-page":"89","DOI":"10.1007\/978-1-59745-535-0_4","article-title":"UniProtKB\/Swiss-Prot","volume":"406","author":"Boutet","year":"2007","journal-title":"Methods Mol. Biol."},{"key":"B6","doi-asserted-by":"publisher","first-page":"611","DOI":"10.1038\/s41568-019-0196-7","article-title":"Targeting transcription factors in cancer - from undruggable to reality","volume":"19","author":"Bushweller","year":"2019","journal-title":"Nat. Rev. Cancer."},{"key":"B7","doi-asserted-by":"publisher","first-page":"62","DOI":"10.1038\/s41392-019-0095-0","article-title":"Targeting epigenetic regulators for cancer therapy: mechanisms and advances in clinical trials","volume":"4","author":"Cheng","year":"2019","journal-title":"Signal Transduct. Target Ther."},{"key":"B8","doi-asserted-by":"publisher","first-page":"22","DOI":"10.1016\/j.cell.2012.03.003","article-title":"MYC on the path to cancer","volume":"149","author":"Dang","year":"2012","journal-title":"Cell"},{"key":"B9","doi-asserted-by":"publisher","first-page":"579","DOI":"10.1111\/j.1467-7652.2012.00688.x","article-title":"Overexpression of Arabidopsis and rice stress genes\u2019 inducible transcription factor confers drought and salinity tolerance to rice","volume":"10","author":"Datta","year":"2012","journal-title":"Plant Biotechnol. J."},{"key":"B10","doi-asserted-by":"publisher","first-page":"796","DOI":"10.1126\/science.1113832","article-title":"Gene regulatory networks and the evolution of animal body plans","volume":"311","author":"Davidson","year":"2006","journal-title":"Science"},{"key":"B11","doi-asserted-by":"publisher","first-page":"bbac192","DOI":"10.1093\/bib\/bbac192","article-title":"HLAncPred: a method for predicting promiscuous non-classical HLA binding sites","volume":"23","author":"Dhall","year":"2022","journal-title":"Brief. Bioinform."},{"key":"B12","doi-asserted-by":"publisher","first-page":"104780","DOI":"10.1016\/j.compbiomed.2021.104780","article-title":"Computer-aided prediction of inhibitors against STAT3 for managing COVID-19 associated cytokine storm","volume":"137","author":"Dhall","year":"2021","journal-title":"Comput. Biol. Med."},{"key":"B13","doi-asserted-by":"publisher","first-page":"e82238","DOI":"10.1371\/journal.pone.0082238","article-title":"TFpredict and SABINE: sequence-based prediction of structural and functional characteristics of transcription factors","volume":"8","author":"Eichner","year":"2013","journal-title":"PLoS One"},{"key":"B14","doi-asserted-by":"publisher","first-page":"568","DOI":"10.1016\/j.gde.2013.05.002","article-title":"Skeletal muscle programming and re-programming","volume":"23","author":"Fong","year":"2013","journal-title":"Curr. Opin. Genet. Dev."},{"key":"B15","doi-asserted-by":"publisher","first-page":"503","DOI":"10.1186\/1471-2105-9-503","article-title":"ESLpred2: improved method for predicting subcellular localization of eukaryotic proteins","volume":"9","author":"Garg","year":"2008","journal-title":"BMC Bioinforma."},{"key":"B16","doi-asserted-by":"publisher","first-page":"203","DOI":"10.1101\/gad.183434.111","article-title":"NF-\u03baB, the first quarter-century: remarkable progress and outstanding questions","volume":"26","author":"Hayden","year":"2012","journal-title":"Genes Dev."},{"key":"B17","doi-asserted-by":"publisher","first-page":"26","DOI":"10.1016\/j.molonc.2007.01.004","article-title":"Genetic and epigenetic alterations as biomarkers for cancer detection, diagnosis and prognosis","volume":"1","author":"Herceg","year":"2007","journal-title":"Mol. Oncol."},{"key":"B18","doi-asserted-by":"publisher","first-page":"1041","DOI":"10.1111\/j.1365-313X.2010.04124.x","article-title":"Research on plant abiotic stress responses in the post-genome era: past, present and future","volume":"61","author":"Hirayama","year":"2010","journal-title":"Plant J."},{"key":"B19","doi-asserted-by":"publisher","first-page":"681377","DOI":"10.3389\/fonc.2021.681377","article-title":"Transcription factors: the fulcrum between cell development and carcinogenesis","volume":"11","author":"Islam","year":"2021","journal-title":"Front. Oncol."},{"key":"B20","doi-asserted-by":"publisher","first-page":"262","DOI":"10.1159\/000448747","article-title":"Disorders of Transcriptional Regulation: an emerging category of multiple malformation syndromes","volume":"7","author":"Izumi","year":"2016","journal-title":"Mol. Syndromol."},{"key":"B21","doi-asserted-by":"publisher","first-page":"681","DOI":"10.1007\/s10555-020-09883-w","article-title":"FOXO transcription factor family in cancer and metastasis","volume":"39","author":"Jiramongkol","year":"2020","journal-title":"Cancer Metastasis Rev."},{"key":"B22","doi-asserted-by":"publisher","first-page":"2324","DOI":"10.1002\/pmic.200700597","article-title":"RSLpred: an integrative system for predicting subcellular localization of rice proteins combining compositional and evolutionary information","volume":"9","author":"Kaundal","year":"2009","journal-title":"Proteomics"},{"key":"B23","doi-asserted-by":"publisher","first-page":"740","DOI":"10.1016\/j.cell.2014.02.054","article-title":"Large-scale genetic perturbations reveal regulatory networks and an abundance of gene-specific repressors","volume":"157","author":"Kemmeren","year":"2014","journal-title":"Cell"},{"key":"B24","doi-asserted-by":"publisher","first-page":"e2021171118","DOI":"10.1073\/pnas.2021171118","article-title":"DeepTFactor: a deep learning-based tool for the prediction of transcription factors","volume":"118","author":"Kim","year":"2021","journal-title":"Proc. Natl. Acad. Sci. U. S. A."},{"key":"B25","doi-asserted-by":"publisher","first-page":"3583","DOI":"10.1038\/s41467-019-11526-w","article-title":"Saturation mutagenesis of twenty disease-associated regulatory elements at single base-pair resolution","volume":"10","author":"Kircher","year":"2019","journal-title":"Nat. Commun."},{"key":"B26","doi-asserted-by":"publisher","first-page":"49","DOI":"10.1097\/MOH.0000000000000567","article-title":"Driver mutations in acute myeloid leukemia","volume":"27","author":"Kishtagari","year":"2020","journal-title":"Curr. Opin. Hematol."},{"key":"B27","doi-asserted-by":"publisher","first-page":"8","DOI":"10.1086\/426833","article-title":"Long-range control of gene expression: emerging mechanisms and disruption in disease","volume":"76","author":"Kleinjan","year":"2005","journal-title":"Am. J. Hum. Genet."},{"key":"B28","doi-asserted-by":"publisher","first-page":"650","DOI":"10.1016\/j.cell.2018.01.029","article-title":"The human transcription factors","volume":"172","author":"Lambert","year":"2018","journal-title":"Cell"},{"key":"B29","doi-asserted-by":"publisher","first-page":"1237","DOI":"10.1016\/j.cell.2013.02.014","article-title":"Transcriptional regulation and its misregulation in disease","volume":"152","author":"Lee","year":"2013","journal-title":"Cell"},{"key":"B30","doi-asserted-by":"publisher","first-page":"147","DOI":"10.1038\/nature01763","article-title":"Transcription regulation and animal diversity","volume":"424","author":"Levine","year":"2003","journal-title":"Nature"},{"key":"B31","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/s41392-019-0089-y","article-title":"Applications of genome editing technology in the targeted therapy of human diseases: mechanisms, advances and prospects","volume":"5","author":"Li","year":"2020","journal-title":"Signal Transduct. Target. Ther."},{"key":"B32","doi-asserted-by":"publisher","first-page":"625","DOI":"10.1038\/ncb1589","article-title":"Pluripotency governed by Sox2 via regulation of Oct3\/4 expression in mouse embryonic stem cells","volume":"9","author":"Masui","year":"2007","journal-title":"Nat. Cell Biol."},{"key":"B33","doi-asserted-by":"publisher","first-page":"W20","DOI":"10.1093\/nar\/gkh435","article-title":"BLAST: at the core of a powerful and diverse set of sequence analysis tools","volume":"32","author":"McGinnis","year":"2004","journal-title":"Nucleic Acids Res."},{"key":"B34","doi-asserted-by":"publisher","first-page":"659761","DOI":"10.3389\/fimmu.2021.659761","article-title":"The interplay between Chromatin architecture and lineage-specific transcription factors and the regulation of rag gene expression","volume":"12","author":"Miyazaki","year":"2021","journal-title":"Front. Immunol."},{"key":"B35","doi-asserted-by":"publisher","first-page":"237","DOI":"10.1038\/ismej.2012.94","article-title":"Sizing up metatranscriptomics","volume":"7","author":"Moran","year":"2013","journal-title":"ISME J."},{"key":"B36","doi-asserted-by":"publisher","first-page":"167","DOI":"10.3390\/ijms21010167","article-title":"RNA-Seq and ChIP-seq as complementary approaches for comprehension of plant transcriptional regulatory mechanism","volume":"21","author":"Muhammad","year":"2019","journal-title":"Int. J. Mol. Sci."},{"key":"B37","doi-asserted-by":"publisher","first-page":"183","DOI":"10.1126\/science.1216379","article-title":"Using gene expression noise to understand gene regulation","volume":"336","author":"Munsky","year":"2012","journal-title":"Science"},{"key":"B38","doi-asserted-by":"publisher","first-page":"a008128","DOI":"10.1101\/cshperspect.a008128","article-title":"Pluripotency in the embryo and in culture","volume":"4","author":"Nichols","year":"2012","journal-title":"Cold Spring Harb. Perspect. Biol."},{"key":"B39","doi-asserted-by":"publisher","first-page":"175","DOI":"10.1007\/978-90-481-9069-0_8","article-title":"Identification of transcription factor-DNA interactions in vivo","volume":"52","author":"Odom","year":"2011","journal-title":"Subcell. Biochem."},{"key":"B40","doi-asserted-by":"publisher","first-page":"628","DOI":"10.1186\/1471-2164-13-628","article-title":"P2TF: a comprehensive resource for analysis of prokaryotic transcription factors","volume":"13","author":"Ortet","year":"2012","journal-title":"BMC Genomics"},{"key":"B41","doi-asserted-by":"publisher","first-page":"204","DOI":"10.1089\/cmb.2022.0241","article-title":"Pfeature: a tool for computing wide range of protein features and building prediction models","volume":"30","author":"Pande","year":"2023","journal-title":"J. Comput. Biol."},{"key":"B42","doi-asserted-by":"publisher","first-page":"201","DOI":"10.1002\/pro.3761","article-title":"NAGbinder: an approach for identifying N\u2010acetylglucosamine interacting residues of a protein from its primary sequence","volume":"29","author":"Patiyal","year":"2020","journal-title":"Protein Sci."},{"key":"B43","doi-asserted-by":"publisher","first-page":"bbac322","DOI":"10.1093\/bib\/bbac322","article-title":"A deep learning-based method for the prediction of DNA interacting residues in a protein","volume":"23","author":"Patiyal","year":"2022","journal-title":"Brief. Bioinform."},{"key":"B44","doi-asserted-by":"publisher","first-page":"132","DOI":"10.3390\/jcm9010132","article-title":"Bioinformatics and computational tools for next-generation sequencing analysis in clinical genetics","volume":"9","author":"Pereira","year":"2020","journal-title":"J. Clin. Med."},{"key":"B45","doi-asserted-by":"publisher","first-page":"102","DOI":"10.1038\/s41598-021-86919-3","article-title":"Transcriptional regulation of the first cell fate decision","volume":"1","author":"Rhee","year":"2017","journal-title":"J. Dev. Biol. Regen. Med."},{"key":"B46","doi-asserted-by":"publisher","first-page":"247","DOI":"10.1016\/j.tplants.2010.02.006","article-title":"WRKY transcription factors","volume":"15","author":"Rushton","year":"2010","journal-title":"Trends Plant Sci."},{"key":"B47","doi-asserted-by":"publisher","first-page":"17","DOI":"10.5582\/irdr.2014.01021","article-title":"ARID1B-mediated disorders: mutations and possible mechanisms","volume":"4","author":"Sim","year":"2015","journal-title":"Intractable Rare Dis. Res."},{"key":"B48","doi-asserted-by":"publisher","first-page":"211","DOI":"10.1016\/j.it.2014.03.006","article-title":"Gene regulatory networks in the immune system","volume":"35","author":"Singh","year":"2014","journal-title":"Trends Immunol."},{"key":"B49","doi-asserted-by":"publisher","first-page":"1569","DOI":"10.1242\/dev.121.6.1569","article-title":"Developmental biology of the pancreas","volume":"121","author":"Slack","year":"1995","journal-title":"Development"},{"key":"B50","doi-asserted-by":"publisher","first-page":"252","DOI":"10.1038\/nrg2538","article-title":"A census of human transcription factors: function, expression and evolution","volume":"10","author":"Vaquerizas","year":"2009","journal-title":"Nat. Rev. Genet."},{"key":"B51","doi-asserted-by":"publisher","first-page":"275","DOI":"10.1038\/nrm2147","article-title":"p53 in health and disease","volume":"8","author":"Vousden","year":"2007","journal-title":"Nat. Rev. Mol. Cell Biol."},{"key":"B52","doi-asserted-by":"publisher","first-page":"2867","DOI":"10.1093\/bioinformatics\/bty194","article-title":"BART: a transcription factor prediction tool with query gene sets or epigenomic profiles","volume":"34","author":"Wang","year":"2018","journal-title":"Bioinformatics"},{"key":"B53","doi-asserted-by":"publisher","first-page":"276","DOI":"10.1038\/nrg1315","article-title":"Applied bioinformatics for the identification of regulatory elements","volume":"5","author":"Wasserman","year":"2004","journal-title":"Nat. Rev. Genet."},{"key":"B54","doi-asserted-by":"publisher","first-page":"1377","DOI":"10.1093\/molbev\/msg140","article-title":"The evolution of transcriptional regulation in eukaryotes","volume":"20","author":"Wray","year":"2003","journal-title":"Mol. Biol. Evol."},{"key":"B55","doi-asserted-by":"publisher","first-page":"781","DOI":"10.1146\/annurev.arplant.57.032905.105444","article-title":"Transcriptional regulatory networks in cellular responses and tolerance to dehydration and cold stresses","volume":"57","author":"Yamaguchi-Shinozaki","year":"2006","journal-title":"Annu. Rev. Plant Biol."},{"key":"B56","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1155\/2022\/3288527","article-title":"Subcellular localization prediction of human proteins using multifeature selection methods","volume":"2022","author":"Zhang","year":"2022","journal-title":"Biomed. Res. Int."},{"key":"B57","doi-asserted-by":"publisher","first-page":"282","DOI":"10.1186\/1471-2105-9-282","article-title":"The combination approach of SVM and ECOC for powerful identification and classification of transcription factor","volume":"9","author":"Zheng","year":"2008","journal-title":"BMC Bioinforma."}],"container-title":["Frontiers in Bioinformatics"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fbinf.2024.1425419\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,7,25]],"date-time":"2024-07-25T05:04:28Z","timestamp":1721883868000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fbinf.2024.1425419\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,7,25]]},"references-count":57,"alternative-id":["10.3389\/fbinf.2024.1425419"],"URL":"https:\/\/doi.org\/10.3389\/fbinf.2024.1425419","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2022.07.13.499865","asserted-by":"object"}]},"ISSN":["2673-7647"],"issn-type":[{"value":"2673-7647","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,7,25]]},"article-number":"1425419"}}